<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Emblica's Blog]]></title><description><![CDATA[Biased samples from our Scientists and Engineers ]]></description><link>https://blog.emblica.fi/</link><image><url>https://blog.emblica.fi/favicon.png</url><title>Emblica&apos;s Blog</title><link>https://blog.emblica.fi/</link></image><generator>Ghost 5.75</generator><lastBuildDate>Mon, 13 Apr 2026 19:13:23 GMT</lastBuildDate><atom:link href="https://blog.emblica.fi/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Looking Back at 2025: A Year of Meaningful Partnerships in Physical AI]]></title><description><![CDATA[It’s been a year of steady work, clear progress, and a lot of good conversations about where physical AI is headed.]]></description><link>https://blog.emblica.fi/looking-back-at-2025-a-year-of-meaningful-partnerships-in-physical-ai/</link><guid isPermaLink="false">6943c134935c3500010d880c</guid><dc:creator><![CDATA[Teemu Heikkilä]]></dc:creator><pubDate>Thu, 18 Dec 2025 09:15:40 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2025/12/laita-ta-a-.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2025/12/laita-ta-a-.jpg" alt="Looking Back at 2025: A Year of Meaningful Partnerships in Physical AI"><p>As we head into the end of 2025, I&#x2019;ve been reflecting on everything we&#x2019;ve done over the past twelve months. It&#x2019;s been a year of steady work, clear progress, and a lot of good conversations about where physical AI is headed.</p><p>For a long time, we&#x2019;ve focused on making AI work in the real world, not just on a screen, but in heavy machinery and industrial environments. This year, that focus really paid off. 
We&#x2019;ve moved from just &quot;doing the work&quot; to being a core part of the conversation on how physical AI should be built and used.</p><p><strong>Growing our community</strong></p><p>We&#x2019;ve always believed that technology works better when people collaborate. This year, we got closer to the organizations shaping our industry: We joined <strong>FIMA</strong> (Forum for Intelligent Machines) and were invited to become an associate partner in the <strong>SIX Mobile Work Machines</strong> network. We also continued our work with <strong>AI Finland</strong> and <strong>AHK Finnland</strong>, helping to build a stronger AI ecosystem here and across Europe.</p><p>One thing the team is particularly proud of is our <strong>EmbliCats</strong> initiative. We want the tech world to be more open, so we hosted events like a LiDARLab workshop where we invited women and underrepresented groups in tech to learn about physical AI and LiDARs alongside our experts.</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/12/Teemu_wartsila.jpeg" class="kg-image" alt="Looking Back at 2025: A Year of Meaningful Partnerships in Physical AI" loading="lazy" width="1280" height="854" srcset="https://blog.emblica.fi/content/images/size/w600/2025/12/Teemu_wartsila.jpeg 600w, https://blog.emblica.fi/content/images/size/w1000/2025/12/Teemu_wartsila.jpeg 1000w, https://blog.emblica.fi/content/images/2025/12/Teemu_wartsila.jpeg 1280w" sizes="(min-width: 720px) 720px"></figure><p><strong>Getting out there</strong></p><p>It was great to see so many of you in person this year. 
Whether we were talking about synthetic data or sharing experiences of how physical AI systems perform in production, these events were the highlights of 2025: </p><p><strong>Bauma 2025:</strong> The world&#x2019;s leading trade fair for construction machinery and equipment.&#xA0;<br><strong>FIMA AI Day: </strong>We co-hosted this event focused on leveraging AI for the R&amp;D of intelligent mobile work machines. We welcomed speakers from Konecranes, Aalto University House of AI, FCAI, and Finnish AI Region FAIR, alongside our own experts.<br><strong>Future Mobile Work Machines 2025: </strong>The flagship event for the mobile machinery sector, where we demonstrated how physical AI systems are built in practice and contributed a keynote.<br><strong>Teknologia 25:</strong> Finland&apos;s leading event for industry and technology professionals. At SICK&#x2019;s booth, we demonstrated real-time human detection and tracking using LiDAR data and a model trained with synthetic data.<br><strong>AI Finland&#x2019;s AI Gala:</strong> A great opportunity to celebrate what the Finnish tech community is achieving together.</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/12/teknologiamessut_emblica--1-.jpg" class="kg-image" alt="Looking Back at 2025: A Year of Meaningful Partnerships in Physical AI" loading="lazy" width="2000" height="1500" srcset="https://blog.emblica.fi/content/images/size/w600/2025/12/teknologiamessut_emblica--1-.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2025/12/teknologiamessut_emblica--1-.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2025/12/teknologiamessut_emblica--1-.jpg 1600w, https://blog.emblica.fi/content/images/size/w2400/2025/12/teknologiamessut_emblica--1-.jpg 2400w" sizes="(min-width: 720px) 720px"></figure><p><strong>Looking toward 2026</strong></p><p>The coming year is looking even busier. 
We are preparing to grow the team and take on more international projects.</p><p>I want to say a sincere thank you to our customers and partners. We don&#x2019;t take your trust for granted, and we&#x2019;re excited to keep building things with you in 2026. If you see us at an event next year, please come over and say hi.</p><p>Wishing you all a great year ahead,</p><p><strong>Teemu Heikkil&#xE4;</strong></p><p>&#xA0;CEO, Emblica</p><hr><p><a href="https://emblica.com"><em>Emblica</em></a><em> is not your average data science team. We deliver tailored AI systems for partners working in unique operating environments and facing unique challenges. We design and develop tailored solutions for land, sea, air, and underground applications. By building strong partnerships, we focus on deeply understanding our customers&#x2019; goals, ensuring our development delivers on your vision.</em><br></p>]]></content:encoded></item><item><title><![CDATA[Embeddings and Mushrooms: Another Way of Finding Fungi]]></title><description><![CDATA[Getting your boots dirty is often required when test-driving a new AI concept. This time, I managed to mix that with one of my personal hobbies — mushroom foraging.]]></description><link>https://blog.emblica.fi/embeddings-and-mushrooms/</link><guid isPermaLink="false">68f8c73fe37d3b000140f11c</guid><category><![CDATA[AI]]></category><category><![CDATA[embeddings]]></category><dc:creator><![CDATA[Lotta Koponen]]></dc:creator><pubDate>Tue, 28 Oct 2025 12:29:25 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2025/10/Forest-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2025/10/Forest-1.jpg" alt="Embeddings and Mushrooms: Another Way of Finding Fungi"><p>This mini-project turned out to be a great example of how tailored AI systems come to life. While the first goal was simply a tasty dinner, the real takeaway was the power and flexibility of the tech behind it. 
Similar perceptual systems could easily <strong>scale to autonomous machines</strong> in agriculture, construction, or defense. These systems could enable real-time ground analysis and assess stability or risk in milliseconds to ensure operational safety and efficiency.</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/10/Screenshot-2025-10-28-at-10.41.24.png" class="kg-image" alt="Embeddings and Mushrooms: Another Way of Finding Fungi" loading="lazy" width="1038" height="830" srcset="https://blog.emblica.fi/content/images/size/w600/2025/10/Screenshot-2025-10-28-at-10.41.24.png 600w, https://blog.emblica.fi/content/images/size/w1000/2025/10/Screenshot-2025-10-28-at-10.41.24.png 1000w, https://blog.emblica.fi/content/images/2025/10/Screenshot-2025-10-28-at-10.41.24.png 1038w" sizes="(min-width: 720px) 720px"></figure><hr><p>The idea was simple: build a tool that could learn where the good spots are while we&#x2019;re walking.&#xA0;</p><p>At the core were <strong>embeddings</strong>. They are feature representations created from publicly available Earth Observation data, mainly Sentinel satellite imagery. These embeddings capture a lot of detail about the forest environment (things like soil moisture, canopy density, and terrain structure) at a resolution of about 10&#xD7;10 meters per grid cell. In short, they compress years of observations into a compact description of each location.</p><p>On top of the embeddings we trained a small neural network to predict the probability of finding mushrooms in any given 10&#xD7;10 meter square.</p><p>Then came the fun part &#x2014; we added an <strong>active feedback loop.</strong> While hiking, every time we found a mushroom patch, we marked that grid cell as &#x201C;positive.&#x201D; If we explored an area that looked promising but turned out empty, we marked it as &#x201C;negative.&#x201D; We even logged a few different mushroom species. 
The model updated itself on the spot, constantly learning from new data. To make that possible, we built a simple web app we could use right from our phones.</p><p>There are already apps out there for mushroom hunting, but most rely on static rules, like biological data and historical sightings. Ours, on the other hand, learns dynamically. It figures out the conditions for growth as they change. Once you find your first patch, it helps guide you to the next one.</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/10/Screenshot-2025-10-28-at-10.19.55.png" class="kg-image" alt="Embeddings and Mushrooms: Another Way of Finding Fungi" loading="lazy" width="1882" height="1002" srcset="https://blog.emblica.fi/content/images/size/w600/2025/10/Screenshot-2025-10-28-at-10.19.55.png 600w, https://blog.emblica.fi/content/images/size/w1000/2025/10/Screenshot-2025-10-28-at-10.19.55.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2025/10/Screenshot-2025-10-28-at-10.19.55.png 1600w, https://blog.emblica.fi/content/images/2025/10/Screenshot-2025-10-28-at-10.19.55.png 1882w" sizes="(min-width: 720px) 720px"></figure><p>After completing these steps, we&#x2019;d trained a model to classify terrain from satellite data just to find mushrooms, but the same input data and the same core capability can be repurposed for much more critical or industrial applications:</p><ul><li><strong>Real-time traversability estimation</strong>: The same features that help locate mushrooms can tell if a forest area is swampy, dense, or uneven. 
That&#x2019;s invaluable for machines in agriculture, forestry, and construction, helping them plan safe and efficient routes in real time.</li><li><strong>Defense and security</strong>: Terrain &#x201C;walkability&#x201D; is a key factor for operational planning and off-road movement.</li><li><strong>Environmental monitoring</strong>: These classifications can support forestry, conservation, and land management with high precision.</li></ul><hr><p>Our little mushroom seeker project proved that a high-resolution classification layer, trained quickly through active learning, can be incredibly effective. It&#x2019;s a great example of how to build tailored AI &#x2013; we take general models and technologies, combine them with high-quality, context-specific data, and produce a system that gives our clients a unique, practical advantage.</p><p>My biggest takeaway from that weekend? Sometimes, the smartest AI systems start with a simple human touch. A quick loop of feedback and labeling in the field turned a general satellite dataset into a focused, high-resolution solution.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2025/10/Screenshot-2025-10-28-at-10.42.20.png" class="kg-image" alt="Embeddings and Mushrooms: Another Way of Finding Fungi" loading="lazy" width="1074" height="1094" srcset="https://blog.emblica.fi/content/images/size/w600/2025/10/Screenshot-2025-10-28-at-10.42.20.png 600w, https://blog.emblica.fi/content/images/size/w1000/2025/10/Screenshot-2025-10-28-at-10.42.20.png 1000w, https://blog.emblica.fi/content/images/2025/10/Screenshot-2025-10-28-at-10.42.20.png 1074w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The model proved to be an excellent mushroom guide. </span></figcaption></figure><p>Want to learn how Emblica can build a cutting-edge AI system for your needs, even if they&#x2019;re as niche as mushroom foraging? 
<a href="https://www.emblica.com/contact" rel="noreferrer">Let&#x2019;s talk.&#xA0;</a></p><hr><p><a href="https://www.emblica.com/" rel="noreferrer"><em>Emblica</em></a><em>&#xA0;is not your average data team. We build customized solutions for collecting, processing, and utilizing data for all sectors, especially at the R&amp;D interface. Whether our target is a factory line, an online store or a field, you can find us busy at work, hands in the clay - at least at our office in Helsinki.</em></p><hr>]]></content:encoded></item><item><title><![CDATA[Introducing EmbliCats: Empowering Women in Tech at Emblica]]></title><description><![CDATA[On this International Women’s Day, we celebrate not just the achievements of women in technology but also the work being done to create a more supportive, and equal industry. In this blog post, we introduce EmbliCats—our initiative to drive impact on equality and inclusivity in the deep tech sector.]]></description><link>https://blog.emblica.fi/introducing-emblicats-empowering-women-in-tech-at-emblica/</link><guid isPermaLink="false">67caed862603910001f01430</guid><category><![CDATA[Women in tech]]></category><category><![CDATA[EmbliCats]]></category><dc:creator><![CDATA[Sanna Niemelä]]></dc:creator><pubDate>Sat, 08 Mar 2025 07:30:31 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2025/03/kone-2.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2025/03/kone-2.jpg" alt="Introducing EmbliCats: Empowering Women in Tech at Emblica"><p>Women remain underrepresented in the tech industry despite strong evidence that more diverse teams foster innovation, improve decision-making, and boost business performance. In the deep tech environment, where we work with machines, sensors, and advanced data science, the gender gap seems to be even more pronounced compared to the broader IT industry. This is why inclusivity in deep technology is particularly important. 
At Emblica, we believe that deep tech is for everyone&#x2014;and that&#x2019;s why we&#x2019;re proud to introduce EmbliCats, our internal network dedicated to supporting and uplifting women in tech.</p><p><strong>Who we are</strong></p><p>EmbliCats is a community of women at Emblica who share a passion for technology, equality, and professional growth. We are proud of the work we do at Emblica, tackling business challenges across 13 industries on land, sea, air, and underground with cutting-edge technology. From robotics and autonomous machines to intelligent data pipelines and predictive sensory systems, we create innovative solutions that drive long-term success for our clients. </p><p>Every woman at Emblica, including those who will join Emblica in the future, is a member of EmbliCats, bringing their unique insights and experiences to the table. Whether it&#x2019;s over breakfast meetups, coffee breaks, workshops, or other activities, we come together to exchange ideas, deepen our knowledge, support one another, and drive meaningful initiatives that empower women in tech.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2025/03/sohva2-1.jpg" class="kg-image" alt="Introducing EmbliCats: Empowering Women in Tech at Emblica" loading="lazy" width="2000" height="1335" srcset="https://blog.emblica.fi/content/images/size/w600/2025/03/sohva2-1.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2025/03/sohva2-1.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2025/03/sohva2-1.jpg 1600w, https://blog.emblica.fi/content/images/size/w2400/2025/03/sohva2-1.jpg 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">EmbliCats starting from the left: Sanna, Lotta, Venla and Anni</span></figcaption></figure><p><strong>Get to know the founding members</strong></p><p><strong>Venla</strong> &#x2013; Data Scientist and Project Manager at Emblica, delivering AI 
solutions to our clients. At home, I do my best to manage personal projects, such as parenting a teenager and building my camper van.</p><p><strong>Anni</strong> &#x2013; ML Engineer and Project Manager at Emblica, crafting code by day, spinning clay and bike by night. Proud cat mom of two and a certified member of EmbliCats.</p><p><strong>Lotta</strong> &#x2013; As CCO at Emblica, I help our customers make the most of AI while growing our business. Although I have lived and worked abroad and love exploring new places, my roots are on a small farm in Eastern Finland and I still relax best in nature with my fluffy dog.&#xA0;</p><p><strong>Sanna</strong> &#x2013; HR Manager at Emblica, the one who keeps things running smoothly and helps every Emblican to reach their full potential. Child of Lapland with a sunny soul shaped by Spain. You can also find me dancing, traveling, or diving into whatever new project I am currently obsessed with.</p><p><strong>What&#x2019;s Ahead</strong></p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/03/EmbliCats-goals.png" class="kg-image" alt="Introducing EmbliCats: Empowering Women in Tech at Emblica" loading="lazy" width="2000" height="955" srcset="https://blog.emblica.fi/content/images/size/w600/2025/03/EmbliCats-goals.png 600w, https://blog.emblica.fi/content/images/size/w1000/2025/03/EmbliCats-goals.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2025/03/EmbliCats-goals.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2025/03/EmbliCats-goals.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>We want to take things to the next level with an exciting lineup of events and initiatives:</p><ul><li>Collaboration &#x2013; Partnering with other female tech groups&#xA0;</li><li>Industry Engagement &#x2013; Participating in hackathons, networking events, and conferences</li><li>Knowledge Sharing &#x2013; Organizing workshops, writing blog posts, and developing 
internal discussions on key topics</li><li>Inspiring Future Talent &#x2013; School visits and student mentorships</li><li>Giving Back &#x2013; Charity projects that align with our mission</li><li>Fun &amp; Connection &#x2013; Hosting low-threshold events like meetups and hangouts</li><li>Strengthening the Female Community in Tech &#x2013; Creating more opportunities for women to thrive</li></ul><p>Wishing you all a wonderful International Women&#x2019;s Day!</p><p>P.S. <br>We are organizing our next open-invitation afterwork event on 03.04.2025 at 17:00, where we&#x2019;ll gather to play board games and hang out at Emblica&#x2019;s office. Read more and sign up on <a href="https://www.linkedin.com/events/joinusforarelaxedafterworkevent7304036899520708608/" rel="noreferrer">LinkedIn</a>. Hopefully, we&#x2019;ll see you there! </p>]]></content:encoded></item><item><title><![CDATA[Data Done Right: The Foundation for Successful AI]]></title><description><![CDATA[Getting started with AI applications can be a daunting task. One big part is thinking about what data you need to achieve your goals. This blog post will help you think about how your needs inform the data required to start integrating AI solutions into your business. 
]]></description><link>https://blog.emblica.fi/data-done-right-the-foundation-for-successful-ai/</link><guid isPermaLink="false">67b89053d19eda0001cc1b6b</guid><category><![CDATA[data]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Rick Joosten]]></dc:creator><pubDate>Mon, 03 Mar 2025 12:26:27 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2025/02/DJI_0545_cg-Large-1.jpeg" medium="image"/><content:encoded><![CDATA[<h2 id="what-data-do-i-need"><strong>What data do I need?</strong></h2><img src="https://blog.emblica.fi/content/images/2025/02/DJI_0545_cg-Large-1.jpeg" alt="Data Done Right: The Foundation for Successful AI"><p>The best course of action is often to think about your problem before considering the data. &#x201C;I think AI should go through all of our data and identify problems&#x201D; is a sentiment we hear often, but it creates an unfocused objective. In this <a href="https://blog.emblica.fi/the-you-get-what-you-order-problem-in-data-projects/"><u>blog post</u></a>, we discussed how having a clear problem definition will help you create better solutions.&#xA0;</p><p>Once the problem is clearly defined, it is time to think about the data. The quality of the data used to create your algorithmic solution will directly affect the quality of the solution&#x2019;s output. In short: garbage in, garbage out.&#xA0; One simple way to test whether your data is up to the task is to check whether an expert can solve the problem given the data.&#xA0;</p><p>If a human can solve a problem with the given data, so can a computer. 
A computer is generally faster and more efficient, which is why we want to automate things in the first place, but a human can often easily tell whether the data is garbage or not.&#xA0;&#xA0;</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/02/_V1A9741.jpg" class="kg-image" alt="Data Done Right: The Foundation for Successful AI" loading="lazy" width="2000" height="1333" srcset="https://blog.emblica.fi/content/images/size/w600/2025/02/_V1A9741.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2025/02/_V1A9741.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2025/02/_V1A9741.jpg 1600w, https://blog.emblica.fi/content/images/size/w2400/2025/02/_V1A9741.jpg 2400w" sizes="(min-width: 720px) 720px"></figure><p>There is, however, one major area where computers outperform human experts: combining information from several data sources. A senior mechanic might be able to tell how an engine is performing based on the sound alone. This comes with years of experience and is hard to teach to a newcomer. If an expert can assess performance from sound alone, what happens if we outfit the machine with not only a microphone but also other sensors that measure pressure, temperature, and vibrations? This additional information would allow an AI system to be much more precise in estimating aspects of engine condition that would normally be invisible to the engineer.&#xA0;</p><p><strong>Collecting data the smart way</strong></p><p>What if I know exactly what data I need but don&#x2019;t already have it? The most straightforward way is to simply start collecting data. However, some AI techniques require a large amount of data, which might be cost- or time-prohibitive to collect. Instead of collecting data manually, it might be possible to generate it in a simulated environment. This could mean simulating a full environment using programs such as Blender or Unity. 
For other applications, we can create statistical models from which we can sample as much data as we need.&#xA0;</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/02/emblica1.jpg" class="kg-image" alt="Data Done Right: The Foundation for Successful AI" loading="lazy" width="1456" height="816" srcset="https://blog.emblica.fi/content/images/size/w600/2025/02/emblica1.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2025/02/emblica1.jpg 1000w, https://blog.emblica.fi/content/images/2025/02/emblica1.jpg 1456w" sizes="(min-width: 720px) 720px"></figure><p>Using Blender, you can simulate complete environments. You are in control of the light, background, reflectivity of objects, and even their exact size and shape. If your goal is to detect objects in a warehouse, a person walking around with a camera taking pictures of objects all day would not capture the variety of data you need for the system to work in every environment. On top of that, you also immediately have the correct labels. Read more about this in our <a href="https://blog.emblica.fi/on-data-augmentation/"><u>blog post about data augmentation</u></a>.&#xA0;</p><p><strong>Do I need to label my data?</strong></p><p>Utilizing synthetic data immediately produces ground truth labels that can be used in supervised learning techniques. However, synthetic data cannot be used to solve all problems. In these cases, it is worth considering whether techniques that do not require labeled data can be used.&#xA0;</p><p>To illustrate, let&apos;s consider the task of detecting bad welds from pictures taken with a simple phone camera. One way is to have a group of people label a large set of pictures as good or bad. However, this can be error-prone and time-consuming. Using unsupervised learning techniques allows us to cluster the pictures based on the distinctive features of each weld without the need for labeled data. 
Good welds would cluster together, away from bad welds, making it possible to tell the good from the bad.</p><p>Lastly, foundation models such as large language models can also be used to build solutions without having to introduce any of your own data at all. The boom in language models such as ChatGPT, Gemini, and DeepSeek has made tasks such as summarization or sentiment analysis much more accessible. Privacy concerns aside, you can give such a model your text and get a good result. This makes it possible to run low-cost experiments without having to train a whole model yourself.&#xA0;&#xA0;</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2025/02/data_labeling.png" class="kg-image" alt="Data Done Right: The Foundation for Successful AI" loading="lazy" width="1262" height="737" srcset="https://blog.emblica.fi/content/images/size/w600/2025/02/data_labeling.png 600w, https://blog.emblica.fi/content/images/size/w1000/2025/02/data_labeling.png 1000w, https://blog.emblica.fi/content/images/2025/02/data_labeling.png 1262w" sizes="(min-width: 720px) 720px"></figure><p><strong>How to get started building my AI solution</strong></p><p>When building an AI solution for your business problem, you should consider the data required to make it happen. Here is my suggestion for how to utilize data when starting to build an AI application.&#xA0;</p><ol><li>Check if foundation models can be used to solve your problem. This might include using public-domain data or small sets of your own data. This is often possible in the computer vision or language domain.&#xA0;</li><li>If there is some data available specifically for your task, try building a simple model and check the quality of the model. 
If the model performs okay but does not match the requirements, this can be a good indicator that more quality data is required to get the results you need.</li><li>If, after testing with step 1 or 2, the results look OK but not good enough, it is worth exploring whether synthetic data can be employed to make your model fit your problem exactly.&#xA0;</li><li>Lastly, push for the last improvement in accuracy by labeling a small set of data by hand. This can bridge the gap between synthetic data and real-world data.&#xA0;</li></ol><p>Want to know how this can be applied to your specific problem? Please reach out to us at Emblica and we can discuss what would best fit your needs. </p><hr><p><a href="https://emblica.com"><em>Emblica</em></a><em> is not your average data team. We build customized solutions for collecting, processing, and utilizing data for all sectors, especially at the R&amp;D interface. Whether our target is a factory line, an online store or a field, you can find us busy at work, hands in the clay - at least at our office in Helsinki.</em><br></p>]]></content:encoded></item><item><title><![CDATA[Looking Back at 2024: A Year of Growth and New Adventures]]></title><description><![CDATA[2024 has been a transformative year for Emblica, marked by growth, innovation, and significant milestones. The CEO of Emblica reflects on the year’s highlights, including key events, impactful partnerships, and the dedicated team behind our projects. 
]]></description><link>https://blog.emblica.fi/looking-back-at-2024-a-year-of-growth-and-new-adventures/</link><guid isPermaLink="false">67643374f6e33a0001e1fbbd</guid><category><![CDATA[recap]]></category><category><![CDATA[emblica]]></category><category><![CDATA[Growth]]></category><dc:creator><![CDATA[Teemu Heikkilä]]></dc:creator><pubDate>Fri, 20 Dec 2024 12:09:47 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2024/12/sailing5-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2024/12/sailing5-1.jpg" alt="Looking Back at 2024: A Year of Growth and New Adventures"><p>As we wrap up 2024, I&#x2019;m excited to reflect on what has been a truly rewarding year for all of us at Emblica. We&#x2019;ve grown in many ways&#x2014;welcoming new faces to the team, exploring new markets, and pushing forward on projects we&#x2019;re passionate about. It&#x2019;s been a year full of hard work, learning, and plenty of good times together.</p><p>One of our big goals for 2024 was expanding into Germany, and I&#x2019;m happy to say we&#x2019;ve made great progress. We took part in events like <strong>SMM Hamburg</strong>, the leading maritime trade fair, building some fantastic connections along the way. 
Joining <strong>AHK</strong>, the German-Finnish Chamber of Commerce, has been another key step, leading to collaborative projects like webinars with <strong>DFKI</strong>, Germany&#x2019;s renowned AI research center, and even a keynote at the <strong>AHK Spring Forum</strong>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2024/12/IMG_5384-Emblica.jpg" class="kg-image" alt="Looking Back at 2024: A Year of Growth and New Adventures" loading="lazy" width="2000" height="1307" srcset="https://blog.emblica.fi/content/images/size/w600/2024/12/IMG_5384-Emblica.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2024/12/IMG_5384-Emblica.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/12/IMG_5384-Emblica.jpg 1600w, https://blog.emblica.fi/content/images/2024/12/IMG_5384-Emblica.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">AHK Spring Forum keynote, photo by AHK / Friedrich von der Hagen</span></figcaption></figure><p>Speaking of DFKI, we launched <a href="https://blog.emblica.fi/aid-forge/" rel="noreferrer"><strong>AI&#x2019;d Forge</strong></a> together this year&#x2014;a joint program that combines our expertise in AI with their deep research capabilities. We&#x2019;ve also been able to connect with like-minded organizations by becoming a part of <strong>SIX MWMs Cluster</strong> and <strong>AI Finland.</strong></p><p>Another highlight of the year was being recognized in the <strong>Kasvu Open Growth Path Programme</strong> where we made it to the <a href="https://blog.emblica.com/inside-emblicas-kasvu-open-success-insights-from-lotta-and-teemu/" rel="noreferrer"><strong>TOP 10</strong></a>. The program gave us fresh ideas and confirmed that we&#x2019;re on the right track with how we&#x2019;re growing the company.</p><p>On the home front, our team has grown with amazing new colleagues, and expanding the team continues next year. 
It&#x2019;s been great to see the energy and ideas they&#x2019;ve brought to the table. </p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/12/_V1A9513--2-.jpg" class="kg-image" alt="Looking Back at 2024: A Year of Growth and New Adventures" loading="lazy" width="2000" height="1240" srcset="https://blog.emblica.fi/content/images/size/w600/2024/12/_V1A9513--2-.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2024/12/_V1A9513--2-.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/12/_V1A9513--2-.jpg 1600w, https://blog.emblica.fi/content/images/2024/12/_V1A9513--2-.jpg 2102w" sizes="(min-width: 720px) 720px"></figure><p>We&#x2019;ve also kept our team spirit strong with a variety of events&#x2014;from a <strong>team-building trip to Tallinn</strong> and a <strong>remote work week at Emblim&#xF6;kki</strong> to sailing, painting, and even a Halloween party. I also want to give a shoutout to <strong>Emblica Ladies</strong>, who started gathering this year to discuss ways to improve opportunities for women in tech and make Emblica an even better place to work.</p><p>All in all, 2024 has been a year of progress, challenges, and plenty of reasons to celebrate. A huge thank you to our clients, partners, and collaborators&#x2014;you&#x2019;ve played an important part in everything we&#x2019;ve achieved. Here&#x2019;s to an exciting year ahead, filled with new projects, new connections, and more moments worth celebrating.</p><p>Wishing you all the best for the year to come,<br><br>Teemu Heikkil&#xE4;<br>CEO of Emblica</p><hr><p><a href="emblica.com"><em>Emblica</em></a><em> is not your average data team. We build customized solutions for collecting, processing, and utilizing data for all sectors, especially at the R&amp;D interface. 
Whether our target is a factory line, an online store or a field, you can find us busy at work, hands in the clay - at least at our office in Helsinki.</em></p>]]></content:encoded></item><item><title><![CDATA[Inside Emblica’s Kasvu Open success, insights from Lotta and Teemu]]></title><description><![CDATA[Emblica participated recently in Kasvu Open, a growth-focused program designed to help companies enhance their strategies and scale their operations. Lotta and Teemu share their experiences, insights gained, and what reaching the top 10 means for Emblica.]]></description><link>https://blog.emblica.fi/inside-emblicas-kasvu-open-success-insights-from-lotta-and-teemu/</link><guid isPermaLink="false">6736f4d88cdf6900018a7df5</guid><category><![CDATA[Kasvu Open]]></category><category><![CDATA[Growth]]></category><category><![CDATA[Strategy]]></category><dc:creator><![CDATA[Rick Joosten]]></dc:creator><pubDate>Fri, 15 Nov 2024 12:45:54 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2024/11/240828-master-HKI-122-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2024/11/240828-master-HKI-122-1.jpg" alt="Inside Emblica&#x2019;s Kasvu Open success, insights from Lotta and Teemu"><p>In this blog post, we sit down with Lotta (CCO) and Teemu (CEO) to discuss Emblica&#x2019;s recent participation in Kasvu Open, a growth-focused program designed to help companies enhance their strategies and scale their operations. Lotta and Teemu share their experiences, insights gained, and what reaching the top 10 means for Emblica.</p><p><strong>Kasvu Open describes itself as a program that &#x201C;organizes sparring sessions for companies to enhance their growth capabilities.&#x201D; How would you describe what Kasvu Open is?</strong></p><p><strong>Lotta:</strong> Kasvu Open is a fantastic opportunity for growth-oriented companies looking to refine their strategies and gain fresh perspectives. 
The program connects participants with industry experts and other companies, providing immersive sparring sessions and networking opportunities. It&apos;s a platform where businesses can receive valuable insights, all aimed at enhancing their growth capabilities.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2024/11/241010-KO-TOP60-339-3.jpg" class="kg-image" alt="Inside Emblica&#x2019;s Kasvu Open success, insights from Lotta and Teemu" loading="lazy" width="2000" height="1334" srcset="https://blog.emblica.fi/content/images/size/w600/2024/11/241010-KO-TOP60-339-3.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2024/11/241010-KO-TOP60-339-3.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/11/241010-KO-TOP60-339-3.jpg 1600w, https://blog.emblica.fi/content/images/2024/11/241010-KO-TOP60-339-3.jpg 2200w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Sparring Session, photo by Kasvu Open / Matias Ulfves</span></figcaption></figure><p><strong>Can you describe in more detail what participation in Kasvu Open entails?</strong></p><p><strong>Teemu:</strong> The program spanned about six months, during which we completed online tasks and detailed questionnaires related to our growth analysis. However, the core of the experience revolved around onsite days. These included 1:1 sparring sessions with experts, group discussions, and, of course, a lot of networking during and after the sessions.</p><p><strong>Lotta:</strong> We had the flexibility to influence the topics of our sparring discussions and could choose experts based on the areas we wanted to focus on. The sparring sessions were engaging and supportive, offering concrete tips and advice while also challenging our existing strategies. 
It was a great mix of friendly discussions and constructive critique.</p><p><strong>Why did Emblica decide to participate in Kasvu Open this year, and what was the application process like?</strong></p><p><strong>Teemu:</strong> This year felt like the perfect time to join Kasvu Open. Emblica is growing steadily, and we have ambitious projections for the coming years, especially as we consider international expansion. We were eager to gain external insights into our strategy and to network with experts and other like-minded companies.</p><p><strong>Lotta:</strong> The application process started with filling out a comprehensive Growth Analysis (Kasvuyritysanalyysi). This analysis covers everything from basic company information to deeper insights into our team, market potential, references, and growth potential. Experts evaluated our responses and our performance during the 1:1 meetings and we were selected to the top 60 companies. Eventually, the judges selected us for the top 10. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2024/11/241031-Karnevaali-314-2.jpg" class="kg-image" alt="Inside Emblica&#x2019;s Kasvu Open success, insights from Lotta and Teemu" loading="lazy" width="2000" height="1333" srcset="https://blog.emblica.fi/content/images/size/w600/2024/11/241031-Karnevaali-314-2.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2024/11/241031-Karnevaali-314-2.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/11/241031-Karnevaali-314-2.jpg 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/11/241031-Karnevaali-314-2.jpg 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Kasvu Open Carnival October 31, 2024: Final Pitch, photo by Kasvu Open / Matias Ulfves</span></figcaption></figure><p><strong>What do you think made Emblica a good candidate for Kasvu Open to be selected into the top 
10?</strong></p><p><strong>Teemu:</strong> The judges highlighted our consistent growth, clear vision, strong references, and our modern approach to leadership. I think a modern culture where people are trusted and taken care of is something that the judges would like to see more widely adopted in other organizations.</p><p><strong>Lotta:</strong> Additionally, Emblica operates in the rapidly growing data and AI market, which gives us a natural edge in scaling our business. The fact that we have a track record of 70+ implemented projects across multiple industries helps us leverage the market growth even better.</p><p><strong>What were the biggest insights you gained from the program that you are willing to share?</strong></p><p><strong>Teemu:</strong> The most interesting observation came from a mentor who said: &#x201C;It sounds like your customers are all forerunners.&#x201D; This was something we hadn&#x2019;t realized before, but it perfectly captures the type of clients we attract.</p><p><strong>Lotta:</strong> We got so many useful insights, from strategies for scaling our sales team efficiently and raising our brand awareness, to first-hand experiences of expanding into new markets. The direction we&#x2019;re headed looks promising, and now we have even more tools in our belt to choose from whenever an area needs improvement, as well as many professionals in our network to consult if needed.</p><p><strong>Were there any personal highlights or memorable moments you&#x2019;d like to share?</strong></p><p><strong>Lotta:</strong> One of the most rewarding aspects was when the mentors and judges asked challenging questions that I hadn&#x2019;t considered before. It pushed me to explain our current approaches and evaluate whether they are really the best ones. 
I enjoyed seeing how all the companies went through this and how open-minded and curious Kasvu Open participants were.</p><p><strong>Teemu:</strong> My main highlight is getting to know so many growth-oriented companies, many of which are looking to grow not only in Finland but also internationally just like us.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2024/11/241031-Karnevaali-318-1.jpg" class="kg-image" alt="Inside Emblica&#x2019;s Kasvu Open success, insights from Lotta and Teemu" loading="lazy" width="2000" height="1333" srcset="https://blog.emblica.fi/content/images/size/w600/2024/11/241031-Karnevaali-318-1.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2024/11/241031-Karnevaali-318-1.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/11/241031-Karnevaali-318-1.jpg 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/11/241031-Karnevaali-318-1.jpg 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Kasvu Open Carnival October 31, 2024: Judges&apos; Roast, photo by Kasvu Open / Matias Ulfves</span></figcaption></figure><p><strong>Now that Kasvu Open has concluded, can we expect any significant changes in how Emblica will operate?</strong></p><p><strong>Lotta:</strong> I wouldn&#x2019;t say there will be massive changes from a customer perspective. Our team will continue designing and developing top-tier solutions with a focus on delivering tangible business value. However, we&#x2019;re constantly improving our operations, and we will definitely incorporate the key learnings from Kasvu Open to further enhance our operations.</p><p><strong>Teemu:</strong> Agreed, no drastic changes are planned. One of the most important insights from Kasvu Open is to have the confirmation that we are doing the right things. It doesn&#x2019;t mean we&#x2019;ll stagnate, though. 
We are continuing to evolve in ways that matter for us and for our customers.</p><p><strong>Thank you for the interview. Any final thoughts?</strong></p><p><strong>Lotta:</strong> The Kasvu Open experience has been an amazing journey for Emblica, helping define our growth strategy and expand our network of industry experts. Making it to the top 10 was incredible and shows that our vision and expertise are set to take us to the next level.</p><hr><p><a href="https://www.emblica.com"><em>Emblica</em></a><em> is not your average data team. We build customized solutions for collecting, processing, and utilizing data for all sectors, especially at the R&amp;D interface. Whether our target is a factory line, an online store or a field, you can find us busy at work, hands in the clay - at least at our office in Helsinki.</em></p>]]></content:encoded></item><item><title><![CDATA[Solving business problems with AI-enhanced LiDAR data]]></title><description><![CDATA[The use of LiDAR data can create many opportunities to solve problems requiring spatial information. This post will explore and give examples of how these techniques can be applied to solve real business challenges. 
]]></description><link>https://blog.emblica.fi/solving-business-problems-with-ai-enhanced-lidar-data/</link><guid isPermaLink="false">66fcec145ddeb200017cd720</guid><category><![CDATA[AI]]></category><category><![CDATA[lidar]]></category><category><![CDATA[point clouds]]></category><category><![CDATA[forestry]]></category><category><![CDATA[agriculture]]></category><category><![CDATA[urban planning]]></category><category><![CDATA[warehouse]]></category><category><![CDATA[factory]]></category><dc:creator><![CDATA[Markku Leppälä]]></dc:creator><pubDate>Tue, 15 Oct 2024 10:09:23 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2024/10/Screenshot-2024-06-28-at-9.53.05.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2024/10/Screenshot-2024-06-28-at-9.53.05.png" alt="Solving business problems with AI-enhanced LiDAR data"><p>The use of LiDAR data can create many opportunities to solve problems requiring spatial information. <a href="https://blog.emblica.fi/points-about-point-clouds/" rel="noreferrer">In an earlier blog post</a> we dug deeper into how machine learning models can analyze 3D point clouds for several tasks such as classification, object localization, and object segmentation. This post will explore and give examples of how these techniques can be applied to solve real business challenges. </p><h2 id="a-quick-primer-on-lidar-sensors"><strong>A quick primer on LiDAR sensors</strong></h2><p>LiDAR, or Light Detection and Ranging, is a remote sensing method that uses light in the form of a pulsed laser to measure distances. This technology generates precise, three-dimensional information about the shape of the ground and its surface characteristics. 
The output, known as a point cloud, consists of a large number of points that represent the surface coordinates, appearing as a dense cluster of dots that collectively depict the depth and form of the physical environment.</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/10/forest2.png" class="kg-image" alt="Solving business problems with AI-enhanced LiDAR data" loading="lazy" width="2000" height="1128" srcset="https://blog.emblica.fi/content/images/size/w600/2024/10/forest2.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/10/forest2.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/10/forest2.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/10/forest2.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>There&apos;s no doubt about the superiority of LiDARs for obtaining high-quality spatial data, especially now that the price of this technology has become affordable for various use cases. However, systems integrating LiDARs can be flooded with more unlabeled data than they can process, a problem solvable by analyzing the data with AI.</p><h2 id="mapping-the-cityscape"><strong>Mapping the cityscape</strong></h2><p>LiDARs produce thousands of points per second, but not all of these are relevant to the application at hand. Imagine looking at a city from a bird&#x2019;s-eye view with the objective of counting all cars. The majority of the objects seen, such as buildings, parks, and pedestrians, might not be relevant to the task at all. The same applies to LiDAR data, where in many cases only a minority of the data is important for the given task.</p><p>AI-enhanced systems can automate the majority, if not all, of the object recognition and movement tracking tasks, even in real time and in complete darkness. AI systems can also be taught to filter unwanted elements from the point cloud, such as noise from rain or snow. 
This allows the LiDAR systems to gather data no matter the weather.&#xA0;</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/10/City.png" class="kg-image" alt="Solving business problems with AI-enhanced LiDAR data" loading="lazy" width="1472" height="812" srcset="https://blog.emblica.fi/content/images/size/w600/2024/10/City.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/10/City.png 1000w, https://blog.emblica.fi/content/images/2024/10/City.png 1472w" sizes="(min-width: 720px) 720px"></figure><p>Depending on the training of the AI-enhanced systems, these can offer highly flexible or very strict recognition. Coming back to the example of the city view, the AI could recognize all cars, trucks, and buses, or just convertible cars if necessary. This level of specificity is particularly useful in industries where precision is critical, such as logistics and urban traffic management.</p><p>Access to high-quality spatial data with the possibility to strip certain features in real time opens up new business opportunities. By focusing only on relevant data, companies can optimize resources and improve service delivery.</p><h2 id="more-examples"><strong>More examples</strong></h2><p>Different industries would benefit from AI-enhanced LiDAR systems. Here are some examples of how LiDAR data can be used in different cases.</p><p><strong>Forestry and Agriculture</strong>: Point clouds can be leveraged for biomass estimation. Suitable LiDARs are accurate enough to estimate the growth of individual crops. This precise measurement allows for better management of forests, planning of harvests, and monitoring of ecosystem health. Additionally, in agriculture, LiDAR can help map field topography, enabling precise irrigation planning and terrain analysis to maximize crop yield. 
Furthermore, having an accurate elevation map can help improve route planning through a forest and monitor ground erosion.&#xA0;</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/10/forest.png" class="kg-image" alt="Solving business problems with AI-enhanced LiDAR data" loading="lazy" width="2000" height="1128" srcset="https://blog.emblica.fi/content/images/size/w600/2024/10/forest.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/10/forest.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/10/forest.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/10/forest.png 2400w" sizes="(min-width: 720px) 720px"></figure><p><strong>Urban Planning and Infrastructure</strong>: In urban settings, LiDAR combined with AI proves invaluable in managing and analyzing traffic patterns and human flow. By creating detailed 3D maps of urban areas, planners can anticipate traffic jams and optimize traffic flow based on real-time data. Moreover, AI-enhanced LiDAR can assess flood risks by analyzing terrain and elevation data to predict water flow paths in extreme weather, aiding in the design of more effective water management and flood defense systems.</p><p><strong>Factory monitoring: </strong>LiDAR data is inherently private compared to the use of cameras. This opens opportunities to track people&apos;s movement in a room without infringing on their privacy. This way we can track current processes and use this data to improve efficiency. Additionally, the spatial resolution is much better than a camera&#x2019;s. This can potentially be used for full automation helping with object detection and optimized routing for autonomous robots. 
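</p><p><em>The filtering described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production pipeline: the point counts, coordinates, and the hard height threshold are all invented for the example, and real systems would use statistical or learned filters instead.</em></p>

```python
import numpy as np

# A point cloud is just an (N, 3) array of x/y/z coordinates (in meters).
# Here we fabricate one: a flat "ground" surface plus a few scattered
# returns high above it, standing in for noise from rain or snow.
rng = np.random.default_rng(42)
ground = np.column_stack([
    rng.uniform(0, 50, 1000),        # x
    rng.uniform(0, 50, 1000),        # y
    rng.normal(0.0, 0.05, 1000),     # z: near ground level
])
noise = np.column_stack([
    rng.uniform(0, 50, 30),
    rng.uniform(0, 50, 30),
    rng.uniform(5.0, 20.0, 30),      # z: implausibly high returns
])
cloud = np.vstack([ground, noise])

def crop_and_denoise(points, z_max=3.0):
    """Keep only points inside a plausible height band.

    A hard z-threshold is the simplest possible stand-in for the
    noise filters a real LiDAR pipeline would apply.
    """
    return points[points[:, 2] < z_max]

filtered = crop_and_denoise(cloud)
print(cloud.shape, "->", filtered.shape)
```

<p><em>The same boolean-mask idea extends to any per-point attribute, such as intensity or a class label predicted by a model, which is how stripping certain features in real time is typically realized.</em></p><p>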
</p><h2 id="getting-started-with-ai-and-lidar-sensors"><strong>Getting started with AI and LiDAR sensors</strong></h2><p>Exploring how LiDAR technology works with AI shows us its big impact across different areas like forestry, city planning, and managing infrastructure. LiDAR, enhanced by AI, helps businesses handle large amounts of spatial data quickly, focusing on the important bits to make smarter decisions. For example, it can help manage forests better or improve how traffic flows in cities, making sure the data used is accurate and specifically suited for the task.</p><p>Looking forward, the combination of LiDAR data and AI techniques is promising. It could not only improve current ways of doing things but also completely change industries. Wondering if LiDAR technology could help you and your business? Get in touch and we can help you explore the possibilities.</p><hr><p><a href="https://www.emblica.com" rel="noreferrer"><em>Emblica</em></a><em> is not your average data team. We build customized solutions for collecting, processing, and utilizing data for all sectors, especially at the R&amp;D interface. Whether our target is a factory line, an online store, or a field, you can find us busy at work, hands in the clay - at least at our office in Helsinki.</em></p>]]></content:encoded></item><item><title><![CDATA[The “You Get What You Pay For” Problem in Data Projects]]></title><description><![CDATA[When faced with a textbook example of a problem solvable by algorithms, data professionals often rush to apply their favorite algorithm without questioning the initial problem formulation. 
The old adage rings true: “If your only tool is a hammer, every problem looks like a nail.”]]></description><link>https://blog.emblica.fi/the-you-get-what-you-order-problem-in-data-projects/</link><guid isPermaLink="false">664b5130f9b7310001bc3278</guid><category><![CDATA[emblica]]></category><category><![CDATA[Churn Prediction]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[ML]]></category><category><![CDATA[AI]]></category><category><![CDATA[ENG]]></category><dc:creator><![CDATA[Markku Leppälä]]></dc:creator><pubDate>Mon, 17 Jun 2024 14:47:51 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2024/06/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-nuoli-09-2.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2024/06/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-nuoli-09-2.png" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects"><p>When faced with a textbook example of a problem solvable by algorithms, data professionals often rush to apply their favorite algorithm without questioning the initial problem formulation. 
The old adage rings true: &#x201C;If your only tool is a hammer, every problem looks like a nail.&#x201D;</p><p><em>This article has previously been published in </em><a href="https://blog.emblica.fi/saat-mita-tilaat-ongelma-dataprojekteissa/" rel="noreferrer"><em>Finnish</em></a></p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-HERO-UUSI-11.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="2000" height="1125" srcset="https://blog.emblica.fi/content/images/size/w600/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-HERO-UUSI-11.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-HERO-UUSI-11.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-HERO-UUSI-11.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-HERO-UUSI-11.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>When businesses start to use data in decision-making and optimization, defining the problem is one of the most error-prone parts of the development process. Together, the problem and the data determine what solution gets built and how it operates. 
Hence, data-utilizing projects can be impacted by what I call the <strong>&#x201C;you get what you pay for&#x201D; phenomenon</strong>.</p><h2 id="how-does-the-you-get-what-you-pay-for-problem-show-itself">How does the &quot;you get what you pay for&quot; problem show itself?</h2><p></p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/06/01-PROBLEM-DEFINING.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="2000" height="804" srcset="https://blog.emblica.fi/content/images/size/w600/2024/06/01-PROBLEM-DEFINING.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/06/01-PROBLEM-DEFINING.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/06/01-PROBLEM-DEFINING.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/06/01-PROBLEM-DEFINING.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>The success of data projects is directly influenced by the data utilized.</p><p>Cliched as they may be, the maxims about the impact of data quality on algorithm performance have deeply ingrained themselves in the minds of business decision-makers. In reality, &#x201C;data quality&#x201D; is a concept not fully understood by many. In all its complexity, the criteria for data quality are clearest to data professionals who understand what to demand from data and what to expect from it. These same data-savvy professionals may be blind to another stumbling block in data projects&#x2014;<strong>incorrect problem definition</strong>.</p><p>When a textbook example of an algorithm-solvable problem presents itself, even these professionals often rush to their <em>notebooks</em>. They should pause to consider <strong>whether the problem to be solved has been correctly set</strong>, even in cases that sound simple. 
Here the saying holds true among data professionals as well: &#x201C;If your only tool is a hammer, every problem looks like a nail.&#x201D;</p><h3 id="example-of-an-incorrectly-defined-problem">Example of an incorrectly defined problem</h3><p>Let&#x2019;s examine two example cases. Both address the same challenge: <strong> How to reduce churn.</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2024/06/02-PROBLEM-SOLVED.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="2000" height="1125" srcset="https://blog.emblica.fi/content/images/size/w600/2024/06/02-PROBLEM-SOLVED.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/06/02-PROBLEM-SOLVED.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/06/02-PROBLEM-SOLVED.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/06/02-PROBLEM-SOLVED.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Imaginary discussion between business and data science team. Churn prediction is a common project where &quot;you get what you pay for&quot; problem occurs.</em></i></figcaption></figure><p>Many companies have attempted to predict the turnover of paying customers (churn/attrition). Often, the idea is this: identify customers who are reducing or stopping their use of the service and target them with preventive actions, such as communications or offers.</p><p><strong>Forecasting customer turnover</strong> is exactly the kind of challenge that many data scientists drool over! Initially, it might sound like a useful and interesting case for data collection, but the problem formulation in the project contains a dangerous trap.</p><p>If we analyze business needs more broadly, we realize that the ultimate goal is to <strong>minimize customer turnover, not to predict it</strong>. 
The original problem definition did not hit the mark immediately but instead led to hastily solving the wrong issue.</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-nuoli-09.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="2000" height="1125" srcset="https://blog.emblica.fi/content/images/size/w600/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-nuoli-09.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-nuoli-09.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-nuoli-09.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/05/ONGELMAN-MAARITTELY-JA-KONEOPPIMINEN-nuoli-09.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>But what harm does incorrect problem framing actually cause? If we start predicting customer turnover instead of minimizing it, the model built will likely learn to identify user groups at risk of leaving as intended. However, this approach also creates significant problems:</p><ol><li><strong>The model predicting churn tells us that customers are at risk of leaving but cannot directly say why.</strong><br>The model&apos;s output generalizes all departing users into the same category. Different customer groups at risk of leaving require different actions to improve customer relations, but the model in its current form does not enable distinguishing between them.<br><br>A pertinent example of incorrect risk classification is classifying the so-called sleeping customers. This group may include subscribers to a service whose subscription is running as expected, and no problems have occurred. These customers haven&#x2019;t thought about the ongoing charge or canceling the subscription. 
As a thought experiment, the model could interpret such passivity as a sign of dissatisfaction and incorrectly place a well-functioning customer relationship in the danger zone. In the worst case, the company takes action and launches a targeted marketing campaign that annoys the customers and leads them to cancel the entire service&#x2014;even though they had been completely satisfied until now.</li><li><strong>In some cases, the model functions exactly as it should, predicting departure with certainty&#x2014;and nothing can be done</strong>. Some customers may indeed be lost cases, regardless of whether they are targeted with actions or not. The model works, but no customer relationship action does. Money is wasted on actions, but no results are produced.</li><li>When the model is designed, the focus is often on which signals could classify a user as &#x201C;in the danger zone.&#x201D; Frequently, only user-created signals, such as those generated by using the service, are chosen. However, when designing models for this type of problem, the actions the service takes towards the user are rarely considered. When the prediction model is utilized, <strong>the analysis-based actions taken towards customers contaminate the data</strong>. <br><br>Suppose that a grocery delivery service&#x2019;s customer service representative reviews loyalty accounts identified by the model as at risk of leaving. The customer service representative notices that some at-risk accounts had delivery problems and, as a corrective action, slips them all discount coupons. As a result, customers remain service users, but the model does not understand the reason behind the customers&#x2019; retention. 
When the model is retrained with new data, similar cases may no longer be classified in the same way, because according to the model&#x2019;s data, delivery issues did not cause the churn.</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2024/06/03-PROBLEM-GROUPS.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="1964" height="1198" srcset="https://blog.emblica.fi/content/images/size/w600/2024/06/03-PROBLEM-GROUPS.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/06/03-PROBLEM-GROUPS.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/06/03-PROBLEM-GROUPS.png 1600w, https://blog.emblica.fi/content/images/2024/06/03-PROBLEM-GROUPS.png 1964w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Churn prediction model is able to identify at-risk customers but cannot tell what should be done to retain them.</em></i></figcaption></figure><h2 id="how-can-the-%E2%80%9Cyou-get-what-you-pay-for%E2%80%9D-problem-be-turned-into-an-advantage">How can the &#x201C;you get what you pay for&#x201D; problem be turned into an advantage?</h2><p></p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/06/04-PROBLEM-SOLVED.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="2000" height="1295" srcset="https://blog.emblica.fi/content/images/size/w600/2024/06/04-PROBLEM-SOLVED.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/06/04-PROBLEM-SOLVED.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/06/04-PROBLEM-SOLVED.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/06/04-PROBLEM-SOLVED.png 2400w" sizes="(min-width: 720px) 720px"></figure><h3 
id="design-the-problem-chain-from-start-to-finish-and-consider-what-you-want-to-achieve">Design the problem chain from start to finish and consider what you want to achieve.</h3><p>&#x201C;You get what you pay for&#x201D; is not only a constant challenge in data projects but also a fundamental principle guiding their successful implementation. Defining the problem and objectives goes a long way.</p><blockquote>Changing the problem to be solved also changes the utilization approach. Ultimately, this may be the most challenging part of implementing a great algorithm, as people must also trust the machine&#x2019;s assessment of actions, even if it feels intuitively wrong.</blockquote><p>The problem should be defined from the beginning to describe the desired outcome, not an intermediate stage. In our example case, the right solution would be to build a model that <strong>predicts the best possible way to manage each customer relationship</strong>, revealing what kind of treatment satisfied and dissatisfied customers really need. Changing the problem to be solved also changes how the solution is used. Ultimately, this may be the most challenging part of adopting the new algorithm, as people must trust the machine&#x2019;s evaluation of actions even when it seems counterintuitive.</p><p>Concretely, a small change in problem definition significantly alters the outcome of the entire task. In our example case, comparing the two problem descriptions is relatively straightforward: we no longer aim to simply split the entire customer base into two classes, but to make personalized decisions for individual customers.</p><p>Additionally, we shift the assessment of effects onto algorithms rather than burdening people with it. 
In some situations, it may be possible to automate certain customer relationship management methods in the same pipeline, thereby further reducing the workload on people.</p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/06/05-PROBLEM-SOLVED.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="1600" height="988" srcset="https://blog.emblica.fi/content/images/size/w600/2024/06/05-PROBLEM-SOLVED.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/06/05-PROBLEM-SOLVED.png 1000w, https://blog.emblica.fi/content/images/2024/06/05-PROBLEM-SOLVED.png 1600w" sizes="(min-width: 720px) 720px"></figure><p>One challenge in the first example was that customer relationship management actions contaminated the data, making the results unreliable. In this case, the model is specifically created for assessing customer relationship management, so it learns from successful and failed actions to become a better predictor. The model can evaluate the effects of various management approaches on the customer and identify which actions are best for each customer at different times. The model might identify that the best approach for most customers could simply be to leave them alone. 
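</p><p>As a minimal sketch of this reframing (all numbers, segment names, and helper functions below are made up for illustration, not from any real project), the action model can be approximated by an uplift-style estimate: compare retention rates between customers who received an action and those who did not, per customer segment, and recommend an action only where it measurably helps:</p>

```python
from collections import defaultdict

def estimate_uplift(records):
    """Per customer segment, estimate how much an action (e.g. a discount
    coupon) changes retention: P(retained | action) - P(retained | no action)."""
    # segment -> treated? -> [retained count, total count]
    stats = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, retained in records:
        stats[segment][treated][0] += int(retained)
        stats[segment][treated][1] += 1
    return {
        segment: groups[True][0] / groups[True][1] - groups[False][0] / groups[False][1]
        for segment, groups in stats.items()
    }

def best_action(uplift, segment, threshold=0.05):
    """Act only when the action measurably improves retention;
    otherwise the best treatment may be to leave the customer alone."""
    return "send coupon" if uplift.get(segment, 0.0) > threshold else "leave alone"

# Synthetic records: (segment, got_coupon, stayed). Coupons help customers who
# had delivery issues, but slightly annoy already-satisfied customers.
data = (
    [("delivery_issue", True, True)] * 70 + [("delivery_issue", True, False)] * 30
    + [("delivery_issue", False, True)] * 40 + [("delivery_issue", False, False)] * 60
    + [("satisfied", True, True)] * 85 + [("satisfied", True, False)] * 15
    + [("satisfied", False, True)] * 90 + [("satisfied", False, False)] * 10
)
uplift = estimate_uplift(data)
print(best_action(uplift, "delivery_issue"))  # send coupon (retention 40% -> 70%)
print(best_action(uplift, "satisfied"))       # leave alone (coupon gives no lift)
```

<p>Unlike a plain churn classifier, a model trained this way learns from the actions themselves, so interventions enrich the data instead of contaminating it.</p><p>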
The previous model could not do this and might have incorrectly presented them as being at risk of leaving.</p><blockquote>The simple solution to the &#x201C;you get what you pay for&#x201D; problem: change your order.</blockquote><h2 id="in-a-nutshell-how-do-i-%E2%80%9Cget-what-i-pay-for%E2%80%9D-and-benefit-from-it">In a nutshell: how do I &#x201C;get what I pay for&#x201D; and benefit from it?</h2><p></p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2024/06/06-PROBLEM-SETTING-SUMMARY.png" class="kg-image" alt="The &#x201C;You Get What You Pay For&#x201D; Problem in Data Projects" loading="lazy" width="2000" height="1125" srcset="https://blog.emblica.fi/content/images/size/w600/2024/06/06-PROBLEM-SETTING-SUMMARY.png 600w, https://blog.emblica.fi/content/images/size/w1000/2024/06/06-PROBLEM-SETTING-SUMMARY.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2024/06/06-PROBLEM-SETTING-SUMMARY.png 1600w, https://blog.emblica.fi/content/images/size/w2400/2024/06/06-PROBLEM-SETTING-SUMMARY.png 2400w" sizes="(min-width: 720px) 720px"></figure><ol><li><strong>PROBLEM</strong>. Spend time defining the problem and ensure you are addressing the right issue. A simple flowchart can be a good tool for defining objectives, operational areas, available data, and desired outcomes.</li><li><strong>DATA</strong>. Ensure that you are collecting data on aspects that are crucially related to customer relationships and your defined objectives. If data is not yet available, can it be collected?</li><li><strong>TEST</strong>. 
Initially, testing can be done &#x201C;on paper.&#x201D; A good guideline is to ask: &#x201C;Does this outcome directly help solve the original problem?&#x201D; Even &#x201C;hard&#x201D; models can be tested all the way to production alongside business operations without too much disruption.</li></ol><p>There is a simple solution to the &#x201C;you get what you pay for&#x201D; problem: if the project does not produce the desired result, change what you order. In our example, the order was to minimize customer churn. Because of the methods typically used, such as relying on existing data and optimizing a chosen metric, this problem recurs in many algorithmic solutions. It is therefore crucial to recognize the risk posed by hasty problem definition and work around it.</p><hr><p><a href="https://emblica.com"><em>Emblica</em></a><em> is a technology company focused on data-intensive applications and artificial intelligence. Our customers include, for example, Sanoma, Uponor, Caruna, and the Tax Administration. 
Emblica is 100% owned by its employees.</em><br></p>]]></content:encoded></item><item><title><![CDATA[AI’d Forge – German-Finnish cooperation in the transfer of AI technologies to companies]]></title><description><![CDATA[AI’d Forge – German-Finnish cooperation in the transfer of AI technologies to companies]]></description><link>https://blog.emblica.fi/aid-forge/</link><guid isPermaLink="false">665de0fcf9b7310001bc32d5</guid><category><![CDATA[News]]></category><category><![CDATA[DE]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[ENG]]></category><dc:creator><![CDATA[Emblica]]></dc:creator><pubDate>Mon, 03 Jun 2024 09:00:00 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2024/06/IMG_3284-1.jpg" medium="image"/><content:encoded><![CDATA[<figure class="kg-card kg-bookmark-card kg-card-hascaption"><a class="kg-bookmark-container" href="https://www.dfki.de/web/news/aid-forge-deutsch-finnische-zusammenarbeit"><div class="kg-bookmark-content"><div class="kg-bookmark-title">AI&#x2019;d Forge &#x2013; Deutsch-finnische Zusammenarbeit beim Transfer von KI-Technologien in Unternehmen</div><div class="kg-bookmark-description">Das Deutsche Forschungszentrum f&#xFC;r K&#xFC;nstliche Intelligenz (DFKI) und der finnische Experte f&#xFC;r KI-Implementierung Emblica mit Sitz in Helsinki werden k&#xFC;nftig Unternehmen bei der digitalen Transformation unterst&#xFC;tzen. Das DFKI bringt dabei seine Forschungsexpertise in den Technologiefeldern Datenanalyse, Generative KI, Bild- und Signalerkennung- und -verarbeitung sowie KI-basierte Prognostik ein. Emblica steuert seine Erfahrung in der Beratung von Unternehmen im Bereich der Gesch&#xE4;ftsprozessoptimierung durch KI-basierte Softwarel&#xF6;sungen und deren Implementierung bei. Die Zusammenarbeit wird erg&#xE4;nzt durch das Beratungsunternehmen Remode ME und deren Expertise in der internationalen Gesch&#xE4;ftsentwicklung und im Venture Building. 
Das gemeinsame Programm unter der Bezeichnung &#x201E;AI&#x2019;d Forge&#x201C; wurde am 30. Mai 2024 auf der Digital Convention &#x201C;Noerd&#x201D; in Rostock erstmals vorgestellt.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.dfki.de/typo3conf/ext/wftmpl_dfki/Resources/Public/Images/favicon.ico" alt="AI&#x2019;d Forge &#x2013; German-Finnish cooperation in the transfer of AI technologies to companies"></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.dfki.de/fileadmin/_processed_/a/2/csm_Philipp_Teemu_6c5a14813f.jpeg" alt="AI&#x2019;d Forge &#x2013; German-Finnish cooperation in the transfer of AI technologies to companies"></div></a><figcaption><img src="https://blog.emblica.fi/content/images/2024/06/IMG_3284-1.jpg" alt="AI&#x2019;d Forge &#x2013; German-Finnish cooperation in the transfer of AI technologies to companies"><p dir="ltr"><span style="white-space: pre-wrap;">Auf Deutsch (DFKI.de)</span></p></figcaption></figure><p>The German Research Center for Artificial Intelligence (DFKI) and Finnish AI implementation expert Emblica, based in Helsinki, will support companies in digital transformation in the future. DFKI will contribute its research expertise in the technology fields of data analysis, generative AI, image and signal recognition and processing as well as AI-based forecasting. Emblica will share its experience in advising companies in the field of business process optimization through AI-based software solutions and their implementation. The collaboration is complemented by the consulting company Remode ME and their expertise in international business development and venture building. 
The joint program under the name &quot;AI&apos;d Forge&quot; was presented for the first time on May 30, 2024, at the digital convention &quot;Noerd&quot; in Rostock.</p><p>AI&apos;d Forge aims to support companies in identifying the potential of artificial intelligence in their operations, developing customized AI solutions, and implementing them.</p><p>Emblica focuses, in particular, on those areas in which AI promises a high return on investment and scaling opportunities. DFKI&apos;s broad research spectrum and Emblica&apos;s many years of practical experience offer companies a combination of in-depth industry knowledge and the latest research results. The aim is to implement these in such a way that the result is seamless integration into the existing company software.</p><p>Philipp Koch, Research Department Manager in DFKI&#x2019;s AI in Medical Image and Signal Processing research area:</p><p><em>&#x201C;We aim to support companies in the best possible way with the use of artificial intelligence through AI&apos;d Forge. In doing so, we rely not only on the expertise and know-how of the DFKI but also on the experience of Emblica to jointly identify challenges and develop tailored solutions for companies. Our focus is on a trustworthy and needs-based collaboration as well as knowledge transfer so that companies can optimally leverage AI opportunities for their success.&#x201D;</em></p><p>Teemu Heikkil&#xE4;, CEO of Emblica:</p><p><em>&quot;With AI&apos;d Forge, we support companies throughout their individual journey to the optimal use of artificial intelligence: from the initial idea to the implementation of the solution, its operation and its continuous further development. We always consider a company&#x2019;s needs and resources, and customize our support accordingly, including knowledge transfer to in-house developers. 
Together with DFKI&apos;s expertise and its broad research spectrum, we can significantly improve the way companies approach the implementation of AI.&quot;</em></p><p>To this end, DFKI and Emblica have defined a model process that holistically maps the introduction of AI processes in companies. It describes the entire process chain from knowledge transfer to the possible uses of AI through to implementation. In joint workshops, the AI&apos;d Forge team works with the company to identify the business areas with potential for optimization. The next step is to examine which AI technologies and processes could provide a remedy. Before implementation, AI&apos;d Forge draws up a business case that specifically elaborates the expected benefits. The DFKI and Emblica team then examines what is required to develop an adapted AI solution and estimates the time, personnel and financial costs involved. The roll-out occurs based on a concrete development and implementation plan, which also focuses on the transfer of knowledge to in-house developers.</p><p>Thanks to the collaboration between a German research partner and a Finnish implementation service provider, AI&apos;d Forge offers companies from both countries access to leading expertise in the field of AI. In the future, AI&apos;d Forge could serve as a model for cross-border and cross-sector collaboration with other European partners.</p><p><strong>About DFKI</strong></p><p>The German Research Center for Artificial Intelligence GmbH (DFKI) was founded in 1988 as a non-profit public-private partnership (PPP). It has locations in Kaiserslautern, Saarbr&#xFC;cken, Bremen and Lower Saxony, laboratories in Berlin, Darmstadt and L&#xFC;beck, as well as a branch office in Trier. DFKI has been researching AI for humans for over 35 years and is oriented towards social relevance and scientific excellence in the key future-oriented research and application areas of artificial intelligence. 
DFKI is one of the most important &quot;Centers of Excellence&quot; in the international scientific community. Around 1,560 employees from over 76 nations are currently researching innovative software solutions. In its laboratory in L&#xFC;beck, DFKI is researching AI in medical image and signal processing.</p><p><strong>About Emblica</strong></p><p>Emblica is a Finnish company that has specialized in artificial intelligence for over ten years. It develops and implements customized AI solutions for companies in the private and public sectors. With a focus on practical implementation and measurable results, Emblica has led AI implementation in numerous market-leading organizations.</p><p><strong>About Remode ME</strong></p><p>Remode ME is a consulting company based in Helsinki that helps innovative companies in Europe and Africa achieve growth and global success. Remode ME&apos;s international team develops and implements strategies for market entry, building B2B/BB2B partnerships, and venture building.</p><p><br><strong>Further information:&#xA0;</strong><a href="https://aidforge.eu/" rel="noreferrer">https://aidforge.eu</a></p><p><br><em>(Joint press release DFKI &amp; Emblica)</em></p>]]></content:encoded></item><item><title><![CDATA[How to fool an AI system]]></title><description><![CDATA[Emblica’s AI gatekeeper for an aMazed escape room.]]></description><link>https://blog.emblica.fi/how-to-fool-an-ai-system/</link><guid isPermaLink="false">659faf23aa08900001df2824</guid><category><![CDATA[Computer Vision]]></category><category><![CDATA[Data augmentation]]></category><dc:creator><![CDATA[Rick Joosten]]></dc:creator><pubDate>Tue, 06 Feb 2024 08:40:57 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2024/01/gatekeeper_background.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2024/01/gatekeeper_background.jpg" alt="How to fool an AI system"><p><em>Lyhyesti suomeksi: Rakensimme </em><a 
href="https://www.amazed.fi/" rel="noreferrer"><em>aMazedin</em></a><em> pakopelihuoneeseen teko&#xE4;ly&#xE4; hy&#xF6;dynt&#xE4;v&#xE4;n teht&#xE4;v&#xE4;n. T&#xE4;ss&#xE4; blogissa kerromme, miten teko&#xE4;lyj&#xE4;rjestelm&#xE4;n &quot;huijattavuus&quot; liittyy esimerkiksi tehdaslinjaston laaduntarkkailuun.</em></p><p>After escaping from two of the escape rooms at <a href="https://www.amazed.fi/" rel="noreferrer">aMazed</a> during an after-work event, we started thinking: would it be fun to add AI elements to an escape room? We had a brief discussion about such a feature with the team at aMazed before we even left the rooms. Shortly after, we received an email stating that aMazed was interested in working with Emblica to develop what ultimately became the AI gatekeeper. This post describes what the AI gatekeeper is and what we learned about fooling AI systems during its development. </p><p>Inevitably, this post contains some spoilers about the room. We tried to keep the spoilers to a minimum, but if you want to experience the room completely fresh, you might want to read the rest of this post after you have escaped from the room. More information about the room <a href="https://www.amazed.fi/alchemiaenglish" rel="noreferrer">Alch&#x1EBD;mia</a> can be found on aMazed&#x2019;s website. <a href="https://www.amazed.fi/blog/2023/alchemia" rel="noreferrer">This post</a> by aMazed, however, is safe to read and describes their experience with this AI project as well as what it takes to develop an escape room in general.</p><h1 id="the-ai-gatekeeper">The AI gatekeeper</h1><p>&#x26A0; <em>Spoilers for the escape room </em>Alch&#x1EBD;mia <em>are coming up, you have been warned. </em>&#x26A0;&#xA0;</p><p>Professor Happygolucky has developed the elixir of happiness that will take everyone&#x2019;s sorrows away. It is safely locked behind many puzzles and other safety measures so that it doesn&#x2019;t fall into the wrong hands. 
The task we developed is one such gatekeeper. To progress through the room, you need to prove to our &#x201C;face recognition&#x201D; system that you are the professor. Of course, there is only one professor, so the task is to look enough like her for the system to let you continue.&#xA0;</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://blog.emblica.fi/content/images/2024/01/853a5aaa-8f95-4dd6-aa3b-e741e5ea0992.jpeg" width="320" height="240" loading="lazy" alt="How to fool an AI system"></div><div class="kg-gallery-image"><img src="https://blog.emblica.fi/content/images/2024/02/ae80ad00-cecf-4d76-b791-641402842499--1-.jpeg" width="320" height="240" loading="lazy" alt="How to fool an AI system"></div></div></div></figure><p>The gatekeeper has multiple core elements. First, it uses machine learning on a camera feed to detect whether the elements associated with the professor are in view. Second, the results are shown to the players on a screen with a thematic user interface so that they can monitor their progress. Lastly, the computer is connected to a magnetic lock that unlocks the next part of the escape room. All this is wrapped up so that the game administrator can, if necessary, control the game from their control center and reset the gatekeeper for new players.</p><p></p><h2 id="making-fooling-the-gatekeeper-possible">Making fooling the gatekeeper possible</h2><p>To make sure the puzzle is possible to solve, we cannot make the gatekeeper too strict in how it determines whether to let you continue. If you solve the puzzle but the system does not let you continue, it can ruin the fun. On the other hand, if it lets everyone through no matter what, it is not a great puzzle either. 
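</p><p>As a toy sketch of that pipeline (the prop names, confidences, and threshold below are invented; the real gatekeeper&#x2019;s model and wiring are of course not public), one evaluation step could look like this:</p>

```python
def gatekeeper_step(detections, required, threshold=0.8):
    """One step of a hypothetical gatekeeper loop.

    detections: {prop: model confidence that the prop is in view}.
    Returns per-prop progress for the on-screen UI, and whether the
    magnetic lock should open (every required prop detected confidently).
    """
    progress = {prop: detections.get(prop, 0.0) >= threshold for prop in required}
    return progress, all(progress.values())

# Invented props standing in for "elements associated with the professor"
required_props = ["hat", "glasses", "lab_coat"]
progress, unlock = gatekeeper_step(
    {"hat": 0.91, "glasses": 0.95, "lab_coat": 0.42}, required_props
)
print(progress)  # lab_coat is not detected confidently enough yet
print(unlock)    # False: the lock stays closed
```

<p>Lowering the threshold makes the gatekeeper more forgiving; raising it makes the puzzle harder, which is exactly the balancing act described above.</p><p>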
The problem of finding a balance between these two extremes arises not only in this project but in almost every project that involves automatic decision-making. Let&#x2019;s step out of the escape room to illustrate these problems so we don&#x2019;t fully spoil the puzzle.</p><p>Consider a quality control system on an assembly line whose primary aim is to ensure that only products with all necessary components correctly assembled are shipped to customers. Striking the right balance is crucial; if the system is too lenient, it risks sending out defective products, while being too strict may lead to the rejection of perfectly functional items, increasing costs for the factory. How we collect data to train these models plays a pivotal role in determining how the system behaves. For example, if our training data comprises images taken from the assembly line with consistent angles and lighting conditions, the system may only recognize products that look identical. However, if a product arrives slightly rotated or a light breaks in the factory, the system might struggle to identify it as a good product and reject it. 
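</p><p>This trade-off can be made concrete with a tiny numeric sketch (the defect scores and counts below are invented for illustration): the same set of model scores produces different costs depending on where the rejection threshold is placed.</p>

```python
def qc_outcomes(items, threshold):
    """Reject an item when its defect score >= threshold.
    Returns (defective items shipped, good items scrapped): the two
    costs a factory trades off when tuning strictness."""
    shipped_defective = sum(1 for score, defective in items if score < threshold and defective)
    scrapped_good = sum(1 for score, defective in items if score >= threshold and not defective)
    return shipped_defective, scrapped_good

# Invented model outputs: (defect score, is the item actually defective?)
items = [(0.1, False), (0.2, False), (0.35, False), (0.4, True),
         (0.55, False), (0.6, True), (0.8, True), (0.9, True)]

for threshold in (0.3, 0.5, 0.7):
    shipped, scrapped = qc_outcomes(items, threshold)
    print(f"threshold={threshold}: {shipped} defects shipped, {scrapped} good items scrapped")
```

<p>A strict threshold (0.3) ships no defects but scraps good items, while a lenient one (0.7) scraps nothing but lets defects through.</p><p>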
On the flip side, if our data only includes extremely faulty cases without instances of minor defects, the model might miss the nuances of what constitutes a good product, potentially allowing some faulty items to slip through and be shipped to the customer.</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-1.jpg" width="1912" height="1232" loading="lazy" alt="How to fool an AI system" srcset="https://blog.emblica.fi/content/images/size/w600/2021/08/blogi-augmentation-1.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2021/08/blogi-augmentation-1.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2021/08/blogi-augmentation-1.jpg 1600w, https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-1.jpg 1912w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-2.jpg" width="1912" height="1232" loading="lazy" alt="How to fool an AI system" srcset="https://blog.emblica.fi/content/images/size/w600/2021/08/blogi-augmentation-2.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2021/08/blogi-augmentation-2.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2021/08/blogi-augmentation-2.jpg 1600w, https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-2.jpg 1912w" sizes="(min-width: 720px) 720px"></div></div></div></figure><p>Making sure the training data covers all sorts of situations it might face is crucial. One technique to create more diverse training data is data augmentation. You can dive into the nitty-gritty details in our blog post <a href="https://blog.emblica.fi/on-data-augmentation/"><u>here</u></a>. In short, instead of keeping things too predictable, we throw in different angles, lighting scenarios, and maybe even simulate some hiccups. 
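</p><p>As a toy illustration of such augmentations (a 2&#xD7;2 grid of pixel values stands in for a real image; real pipelines use libraries such as torchvision or Albumentations), flips, rotations, and brightness jitter can be generated in a few lines:</p>

```python
import random

def augment(image, seed=0):
    """Produce simple variants of a grayscale image (a list of pixel rows):
    a mirror flip, a 90-degree rotation, and brightness jitter stand in for
    the varying angles and lighting a real assembly line produces."""
    rng = random.Random(seed)
    flipped = [row[::-1] for row in image]              # horizontal mirror
    rotated = [list(row) for row in zip(*image[::-1])]  # rotate 90 degrees clockwise
    delta = rng.randint(-30, 30)                        # simulated lighting change
    brightened = [[max(0, min(255, p + delta)) for p in row] for row in image]
    return [flipped, rotated, brightened]

tiny_image = [[10, 200],
              [30, 40]]
for variant in augment(tiny_image):
    print(variant)
```

<p>Each variant becomes a new training example of the &#x201C;same&#x201D; product, teaching the model to tolerate variation instead of memorizing one viewpoint.</p><p>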
It&apos;s like giving our AI system a taste of real-life unpredictability, whether it&apos;s dealing with an escape room or managing an assembly line.</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-3.jpg" width="1912" height="1232" loading="lazy" alt="How to fool an AI system" srcset="https://blog.emblica.fi/content/images/size/w600/2021/08/blogi-augmentation-3.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2021/08/blogi-augmentation-3.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2021/08/blogi-augmentation-3.jpg 1600w, https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-3.jpg 1912w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-4.jpg" width="1912" height="1232" loading="lazy" alt="How to fool an AI system" srcset="https://blog.emblica.fi/content/images/size/w600/2021/08/blogi-augmentation-4.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2021/08/blogi-augmentation-4.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2021/08/blogi-augmentation-4.jpg 1600w, https://blog.emblica.fi/content/images/2021/08/blogi-augmentation-4.jpg 1912w" sizes="(min-width: 720px) 720px"></div></div></div></figure><p>Now, let&#x2019;s get back to the escape room. To make sure that our model makes a good puzzle we collected a diverse set of training data together with people from aMazed, and used different augmentation techniques. Is the puzzle hard enough, but still fun to solve? You will have to find it out by yourselves and see if you can fool the AI gatekeeper!</p><hr><p><a href="emblica.com"><em>Emblica</em></a><em> is a technology company focused on data-intensive applications and artificial intelligence. Our customers are e.g. 
Sanoma, Uponor, Caruna, and the Tax Administration. Emblica is 100% owned by its employees.</em></p>]]></content:encoded></item><item><title><![CDATA[Remote-remote working]]></title><description><![CDATA[“We support remote work” means very different things in different organizations. After cycling from Finland to Morocco for 5 months, Vili is interviewed by Anna about his experience of working full-time while changing locations often.]]></description><link>https://blog.emblica.fi/remote-remote-working/</link><guid isPermaLink="false">64429188cb2c1c0001838c3e</guid><category><![CDATA[remote]]></category><category><![CDATA[hybrid-work]]></category><category><![CDATA[company culture]]></category><category><![CDATA[ENG]]></category><dc:creator><![CDATA[Vili Hätönen]]></dc:creator><pubDate>Fri, 21 Apr 2023 13:41:00 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2023/04/Screenshot-from-2023-04-21-18-41-31.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2023/04/Screenshot-from-2023-04-21-18-41-31.png" alt="Remote-remote working"><p><em>Lyhyesti suomeksi: Kun yritys sanoo &#x201C;tukevansa et&#xE4;ty&#xF6;skentely&#xE4;&#x201D;, se voi tarkoittaa hyvin eri asioita. T&#xE4;ss&#xE4; blogissa Anna haastattelee Suomesta Marokkoon py&#xF6;r&#xE4;illytt&#xE4; varatoimitusjohtajaamme Vili&#xE4;.</em></p><p>Over the past three years, remote working has been discussed a lot. For an organization like Emblica, remote working was nothing new when the Covid-19 pandemic hit Finland in early 2020. Since the pandemic forced many organizations to take a leap in their digitalization and remote work practices, consulting roles are now widely expected to allow hybrid (remote + on-site) work. However, the reality of what hybrid work looks like varies a lot. </p><p>To push the limits of remote work, and as a personal experiment, Emblica&apos;s vice CEO Vili cycled from Northern Europe to Africa while working full-time. 
Our People Happiness Officer Anna interviewed Vili about his personal experiences of this remote-remote work experiment. </p><p><strong>Hi Vili! Thanks for joining me today to discuss remote work experiences at Emblica. You had a little cycling trip last year, could you tell us about it? </strong></p><p>Yes, we decided to try an unusual remote-remote working setup last year. By remote-remote working, I mean someone working remotely from somewhere other than their default remote working location, which would usually be their home. I had an idea to cycle to Africa while simultaneously working full-time, and we decided to give it a go. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2023/04/image.png" class="kg-image" alt="Remote-remote working" loading="lazy" width="649" height="1092" srcset="https://blog.emblica.fi/content/images/size/w600/2023/04/image.png 600w, https://blog.emblica.fi/content/images/2023/04/image.png 649w"><figcaption><span style="white-space: pre-wrap;">A social media update; the picture is from the coast of Denmark.</span></figcaption></figure><p><strong>How did you come up with the idea to cycle to Africa? How far away is that from Helsinki?</strong></p><p>Well, the previous year, someone asked me how I would get to my volunteering post in Namibia, and asked if I would cycle there. I thought they were crazy, and I told them that. But the idea had been planted, so a year or so later, I decided to act on it and cycle the 5000km from Finland to North Africa via ten different countries. Which probably makes me the crazy one. </p><p><strong>Yes, it sort of does. To clarify, we don&apos;t push or expect our employees to be like this. Also, we want to ensure everyone&apos;s well-being is taken care of in all respects, also while working remotely. 
So how did you plan and prepare for the trip in advance?</strong></p><p>The biggest part of the planning was organizing projects so that the trip would not disturb customers or our internal team dynamics. It took me half a year to sort these out. This was really helpful since it motivated me to document certain things better and share responsibilities. The remote-remote working equipment was light: a laptop, a charger, a hard drive, and some clothes appropriate for visiting an office somewhere along the way. You don&apos;t need more than that, although you need to look after your work ergonomics when working in different locations with often non-optimal tables and chairs.</p><p>I didn&apos;t really prepare for the cycling, but I figured that I would learn along the way. I originally estimated the duration and route by checking a map app for a route from Finland to southern Spain and dividing the distance by 100km, which was my guess for a daily cycling distance. Then add the work days between the cycling weekends and <em>tada</em>, I had my estimate of 5 months on the road.</p><figure class="kg-card kg-video-card kg-width-regular kg-card-hascaption" data-kg-thumbnail="https://blog.emblica.fi/content/images/2023/04/media-thumbnail-ember1424.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://blog.emblica.fi/content/media/2023/04/hateless_bikemore_FIN_MOR_2022_1to1.mp4" poster="https://img.spacergif.org/v1/1274x1274/0a/spacer.png" width="1274" height="1274" loop autoplay muted playsinline preload="metadata" style="background: transparent url(&apos;https://blog.emblica.fi/content/images/2023/04/media-thumbnail-ember1424.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container kg-video-hide">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">0:25</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            <figcaption><p><span style="white-space: pre-wrap;">The 5000km route from Helsinki to Tarifa. The color indicates altitude. The bar plot shows the distance covered over time.</span></p></figcaption>
        </figure><p><strong>How long did it take you in the end?</strong></p><p>Exactly 5 months, to the day. But only because I needed to make it, not because it was an easy pace. </p><p><strong>Well, you reap what you sow. But five months is a long time. How did not seeing your colleagues affect your experience and your work? </strong></p><p>I did miss the team for sure. You had so many great activities and fun times without me! But pair coding and working together on the same topic were hardly affected at all; hybrid work is such an ingrained part of our everyday routine. However, brainstorming and throwing ideas around spontaneously decreased a lot. When you meet remotely, you talk business and stay on topic. There is much less room to share random thoughts about recent advances in ML when you&apos;re not physically in the same space, sharing the short empty moments between work tasks. </p><p><strong>So working remotely hurts innovation?</strong></p><p>On the one hand, yes, you tend to share and brainstorm less. But on the other hand, when you are far away, you can see the big picture better. Just make sure you have time for big thoughts, the ideas that really create change. Sometimes I didn&apos;t manage to have that time, but that had a lot to do with the practical challenge of keeping myself alive while crossing the continent. 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2023/04/20221003_130143.jpg" class="kg-image" alt="Remote-remote working" loading="lazy" width="2000" height="2667" srcset="https://blog.emblica.fi/content/images/size/w600/2023/04/20221003_130143.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2023/04/20221003_130143.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2023/04/20221003_130143.jpg 1600w, https://blog.emblica.fi/content/images/size/w2400/2023/04/20221003_130143.jpg 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Sometimes one can find enthusiastic pair coders in unexpected places. Bordeaux, France</span></figcaption></figure><p><strong>Were there any other practical things that made work difficult?</strong></p><p>Working remotely went surprisingly smoothly. Of course, you need basics like electricity, an internet connection, and food in order not to lose time. Because I was crossing Europe, electricity for charging my devices and 4G for a decent internet connection were available in the cities over 98% of the time. Honestly, organizing lunch without spending half a day moving around a new city proved to be the most difficult part.</p><p><strong>Well, as you know, sometimes organizing a team lunch can also be challenging when working in the office! Great to hear things went smoothly. How about the data privacy and security topics? How did you cover that part? </strong></p><p>When traveling, the most important decision regarding privacy and data security is where you work. Some meetings you can certainly have in a caf&#xE9;, but most client work and anything HR-related needs to happen behind closed doors. In practice, that meant having a private room available for the whole day.</p><p><strong>Where were you staying? 
What kind of accommodations would you recommend for a remote worker?</strong></p><p>I mostly stayed in Airbnbs during the work week, mainly for three reasons: cooking and doing laundry are so much faster when you don&apos;t always need to leave your flat; you get a calm, private space for work calls; and the wifi is not shared with a hundred other guests. Staying in private accommodation was more efficient work-wise in many ways. </p><figure class="kg-card kg-image-card"><img src="https://blog.emblica.fi/content/images/2023/04/IMG_20221207_231734_723.jpg" class="kg-image" alt="Remote-remote working" loading="lazy" width="1080" height="1920" srcset="https://blog.emblica.fi/content/images/size/w600/2023/04/IMG_20221207_231734_723.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2023/04/IMG_20221207_231734_723.jpg 1000w, https://blog.emblica.fi/content/images/2023/04/IMG_20221207_231734_723.jpg 1080w" sizes="(min-width: 720px) 720px"></figure><p>Speaking of wifi, you can never trust that it will work. Always have a 4G plan available and your phone charged. I don&apos;t think there was a single week when I didn&apos;t need to share the internet from my phone. Luckily, mobile data costs are minimal for Finnish plans within Europe. If you work elsewhere, be prepared to sort out a prepaid plan as the first thing when you arrive in a new country. </p><p><strong>Apart from internet access, what about another crucial topic: rest and work-life balance?</strong></p><p>Both are really important and, in the case of my trip, both easier and more difficult than normal. Work-life balance was easy: when you cycle along the Danish coast or through the Spanish mountains, you don&apos;t think about work, and when you sit in your Airbnb, you have no distractions from work. I recovered well from work while cycling, and from cycling when I stopped to work. </p><p>But overall, my schedule was too tight to have time to recover from both. 
Working and cycling for five months with only a handful of rest days is not really sustainable, unless you cover a shorter distance or are in better physical condition. On the bright side, I didn&apos;t struggle to fall asleep once during the trip.</p><p><strong>Sure, in this case, exercise was provided by the choice of transportation. </strong></p><p>I think I would have found it easy to detach from work even if I had been traveling by train or car; being immersed in a new environment makes it easy to live in the moment and set projects aside. But organizing a healthy amount of exercise would have required extra effort in those cases. </p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2023/04/image-2.png" class="kg-image" alt="Remote-remote working" loading="lazy" width="1691" height="745" srcset="https://blog.emblica.fi/content/images/size/w600/2023/04/image-2.png 600w, https://blog.emblica.fi/content/images/size/w1000/2023/04/image-2.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2023/04/image-2.png 1600w, https://blog.emblica.fi/content/images/2023/04/image-2.png 1691w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">One gets to try many different remote working locations, and sometimes your work and travel days overlap. D&#xFC;sseldorf, Germany (left); a park bench near Valencia, Spain (middle); an old mansion in Moliets-et-Maa, France (right)</span></figcaption></figure><p><strong>Talking about extra effort, what unexpected extra work comes with remote-remote working like that?</strong></p><p>Booking accommodation and planning which places to go takes a surprising amount of time. And I didn&apos;t even need to book transportation. Having a bike did bring some maintenance and route-planning burden, though. But regardless of the mode of transportation, I was practically moving flats almost every week. 
And once you are in a new place, all everyday tasks like grocery shopping take longer. So if you&apos;re working full days, don&apos;t expect to experience the city during the week: your evenings are mostly spent staying alive and preparing for the next change of location.</p><p><strong>That sounds like a lot indeed. What were the good parts of the traveling? I bet it wasn&apos;t only extra work.</strong></p><p>The opportunity to live in 14 cities, and stay in ~46 more, was amazing. And traveling by land was superb compared to flying: you get to see and experience so much more! Even though I spent only a short time in each place, the experience differed greatly from a touristy visit. For example, I liked Paris much better after briefly living there than after a long-weekend visit as a tourist. But, as always, the people are the most important part. Whether working or cycling, I had plenty of opportunities to visit friends and business associates across Europe. I also spontaneously encountered like-minded people: remote workers, accommodation hosts, and cyclists on the road.  </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2023/04/image-3.png" class="kg-image" alt="Remote-remote working" loading="lazy" width="1810" height="1113" srcset="https://blog.emblica.fi/content/images/size/w600/2023/04/image-3.png 600w, https://blog.emblica.fi/content/images/size/w1000/2023/04/image-3.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2023/04/image-3.png 1600w, https://blog.emblica.fi/content/images/2023/04/image-3.png 1810w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">If your work enables you to cycle 20 km downhill to a sunset on the Mediterranean shore, taking that opportunity can be worth it. 
Murcia, Spain.</span></figcaption></figure><p><strong>What advice would you give someone planning a longer stay outside their standard remote working location?</strong></p><p>Oh, there are many things. First of all, communication: discuss your plan with your supervisor and team well before you intend to leave. It is much easier to get people to agree to unconventional plans when you show that you work well remotely and that communication between you and your team is fluent regardless of location.<br><br>Cybersecurity is another topic that is often overlooked. Have your devices encrypted, your password manager accessible from several devices, and two-factor authentication in use. We provide opsec training for our employees and ensure that our customers&apos; information is secure wherever we work. In computer security, the human is the weakest link. Make sure you&apos;re not the link that breaks; then people will also let you travel with work devices.<br><br>Another huge topic is the climate impact of your travel. The bare minimum is to be aware of roughly how much your travel emits and to consciously decide that you will cause those emissions. There are plenty of emission-estimation apps; personally, I trust the <a href="https://www.compensate.com/individuals">Compensate app</a> to stay on the safe side of the estimates.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2023/04/image-5.png" class="kg-image" alt="Remote-remote working" loading="lazy" width="1007" height="944" srcset="https://blog.emblica.fi/content/images/size/w600/2023/04/image-5.png 600w, https://blog.emblica.fi/content/images/size/w1000/2023/04/image-5.png 1000w, https://blog.emblica.fi/content/images/2023/04/image-5.png 1007w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Everything an IT consultant needs for half a year fits in a surprisingly small space. 
The work equipment, emergency shelter &amp; first aid (for the cycle and the cyclist), water, food, and bike locks in the Las Bardenas desert in Northern Spain.</span></figcaption></figure><p>Lastly: remote-remote working is easier than you think. You don&apos;t <em>need</em> as many things as you might first assume, and with mobile phones, international internet coverage, and all the apps, traveling is almost too easy. Yes, you need to be able to lead your own work and carry some responsibility for where your next meal will come from, but the opportunity to experience other places is very much worth it. If you work in IT, the biggest obstacle standing between you and a remote-remote working experience is most probably you and your own routines. Although most employers have restrictive policies on where and how much you are allowed to work abroad, traveling a few hours by land to a different city for a week or two is within the reach of most IT professionals.</p><p><strong>Would you say that everyone should work remotely?</strong></p><p>Absolutely not; there are a lot of aspects you can&apos;t replace when working remotely. Especially when you don&apos;t know your colleagues well, getting to know them in person is crucial for communication and work dynamics. That&apos;s why we have hybrid roles at Emblica. And this five-month remote stretch was a rather extreme experiment in many ways. That said, I would recommend that everyone try working remotely if possible, and organize a remote-remote working experience once the basics are in place. 
</p><p><strong>Those are wise words; thanks for sharing your experience!</strong><br><br>Thank you for taking the time!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2023/04/image-4.png" class="kg-image" alt="Remote-remote working" loading="lazy" width="633" height="928" srcset="https://blog.emblica.fi/content/images/size/w600/2023/04/image-4.png 600w, https://blog.emblica.fi/content/images/2023/04/image-4.png 633w"><figcaption><span style="white-space: pre-wrap;">The 5000km mark was reached just before the Rock of Gibraltar, the United Kingdom.</span></figcaption></figure><p>Anna: It&apos;s great that everyone at Emblica can work in a hybrid model. It is essential to meet your colleagues face to face, but as an employer it is also essential to make remote working easy when possible, since some of us need quiet days at home or want to stay longer at the summer cottage or abroad. If there is a well-planned option to choose where you work from, I believe it dramatically impacts work satisfaction! </p><p>In many ways, this was an exciting and even successful experiment, although I am not sure anyone will repeat the exact same thing soon. Nevertheless, at Emblica, we want to support ways of working that suit our employees, while naturally prioritizing our customers&apos; needs and team dynamics.</p><hr><p><em>Emblica is a </em><a href="https://emblica.com"><em>data and AI consultancy</em></a><em> solving real-world problems with a large toolbox of data-relevant technologies. 
We work on-site and remotely, prioritizing our customers&apos; needs while pushing the limits of ways of working to suit our employees&apos; preferences.</em><br></p>]]></content:encoded></item><item><title><![CDATA[Datatieteilijän sankarimyytti]]></title><description><![CDATA[Datatieteilijät ovat nykypäivän supersankareita, jotka visioivat suuria ja pelastavat mikä tahansa liiketoiminnan haasteen pyöräyttämällä dataa notebookeissa.  Kuulostaako uskottavalta? Lue tästä mitkä ovat datatieteilijöiden jalustalle nostamisen ongelmat, ja miten ne vältetään. ]]></description><link>https://blog.emblica.fi/datatieteilijan-sankarimyytti/</link><guid isPermaLink="false">636a3134e527370001e8c1e1</guid><category><![CDATA[datatiede]]></category><dc:creator><![CDATA[Teemu Heikkilä]]></dc:creator><pubDate>Fri, 24 Mar 2023 08:30:10 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2021/03/emblicaAjattelee1-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2021/03/emblicaAjattelee1-1.jpg" alt="Datatieteilij&#xE4;n sankarimyytti"><p><em>Shortly in English: Data scientists are today&apos;s superheroes who envision big and save any business challenge by spinning some data in notebooks. Sound plausible? Read here what are the problems of putting data scientists on a pedestal, and how to avoid them.</em></p><p>Analytiikka, big data, teko&#xE4;ly, AI, data science... paljon sanoja, viel&#xE4; enemm&#xE4;n hype&#xE4;, sinisi&#xE4; s&#xE4;hk&#xF6;aivoja, robotteja ja avaruusaluksia. Olemme kehitt&#xE4;m&#xE4;ss&#xE4; liiketoimintaa alueelle jossa fantasioidaan utuisista utopioista. Samaan aikaan todella iso osa yrityksist&#xE4; haluaa esiinty&#xE4; edell&#xE4;k&#xE4;vij&#xF6;in&#xE4; ja tarjoilee yksinkertaisimmatkin algoritminsa samalla kiilt&#xE4;v&#xE4;ll&#xE4; kuvastolla. 
Olemmeko siis jo saavuttaneet haavekuvat vai onko todellisuus niist&#xE4; viel&#xE4; kaukana?</p><p>Moni kirjoitus muistuttaa, ett&#xE4; teko&#xE4;ly&#xE4; ja teko&#xE4;lyksi miellett&#xE4;vi&#xE4; j&#xE4;rjestelmi&#xE4; on jo yhteiskunnassamme k&#xE4;yt&#xF6;ss&#xE4; laajalti. N&#xE4;iden ratkaisujen todellisesta toiminnasta, rakenteesta ja hy&#xF6;dyist&#xE4; puhutaan kuitenkin vain v&#xE4;h&#xE4;n, mink&#xE4; vuoksi ei ole ihme ett&#xE4; &#xE4;&#xE4;nek&#xE4;s teko&#xE4;lyhype t&#xE4;ytt&#xE4;&#xE4; ymm&#xE4;rryksen tyhji&#xF6;t&#xE4; utopistisilla lupauksilla. T&#xE4;m&#xE4;n seurauksena datatiede n&#xE4;ytt&#xE4;ytyy tavalliselle kansalaiselle l&#xE4;hes mustana magiana, joka on ns. &quot;tavan tallaajan&quot; ulottumattomissa.</p><h3 id="datatiede-on-nostettu-niin-korkealle-jalustalle-ett-se-alkaa-muistuttaa-norsunluutornia">Datatiede on nostettu niin korkealle jalustalle, ett&#xE4; se alkaa muistuttaa norsunluutornia</h3><p>Miten t&#xE4;m&#xE4; hehkutus ja teko&#xE4;lyn mystifiointi on vaikuttanut ammattiaan harjoittaviin datatieteilij&#xF6;ihin? Ensinn&#xE4;kin, moderni datatieteilij&#xE4; on saavuttanut l&#xE4;hes myyttisen imagon. Ty&#xF6;paikkoja on enemm&#xE4;n kuin tekij&#xF6;it&#xE4;, kompensaatio ty&#xF6;st&#xE4; on kova ja ty&#xF6;teht&#xE4;v&#xE4;t mielenkiintoisia. Moni juuri aloittanut datatieteilij&#xE4; ajattelee tulevan uransa koostuvan sankarimaisesta ongelmien ratkomisesta ja t&#xE4;ydellisten ratkaisujen tuottamisesta yhdess&#xE4; samalla tavoin ajattelevan eliittikollegion kanssa. 
Jokainen toki huomaa, ett&#xE4; todellisuus on ehk&#xE4; hieman ihannettaan karumpi, mutta sankari-datatieteilij&#xE4;n ihanne n&#xE4;ytt&#xE4;&#xE4; muovanneen alaa salakavalasti.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2021/03/Emblicaview2.jpg" class="kg-image" alt="Datatieteilij&#xE4;n sankarimyytti" loading="lazy" width="2000" height="1334" srcset="https://blog.emblica.fi/content/images/size/w600/2021/03/Emblicaview2.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2021/03/Emblicaview2.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2021/03/Emblicaview2.jpg 1600w, https://blog.emblica.fi/content/images/2021/03/Emblicaview2.jpg 2000w" sizes="(min-width: 1200px) 1200px"><figcaption>Sankari-eliittiin kuuluva datatieteilij&#xE4; visioi tulevaisuuden ratkaisuja norsunluutornissaan.</figcaption></figure><p>Harmillisen suuri osa suomalaisten organisaatioiden datatieteilij&#xF6;ist&#xE4; onkin j&#xE4;rjest&#xE4;ytynyt t&#xE4;t&#xE4; sankari-eliitti&#xE4; mukaillen. Siiloutunut, pieni porukka pohtii ongelmia usein erillisess&#xE4; data- tai analytiikkatiimiss&#xE4;, ik&#xE4;&#xE4;n kuin omana supersankarien joukkonaan, joka rient&#xE4;&#xE4; tarvittaessa organisaatiota painavien dataongelmien kimppuun. T&#xE4;m&#xE4; asetelma luo pohjan monenlaisille ongelmille:</p><ol><li>Ensimm&#xE4;inen ongelma on kulttuurinen: <strong>datatieteilij&#xF6;iden eliittiporukka yksinkertaisesti &#xE4;rsytt&#xE4;&#xE4;, eik&#xE4; sooloilu tuo projekteihin toimivia lopputuloksia.</strong> Digitalisaatio on pakottanut ohjelmistokehitt&#xE4;j&#xE4;t eritt&#xE4;in tiiviiseen yhteisty&#xF6;h&#xF6;n liiketoiminnan kanssa, ja onkin ilahduttavaa n&#xE4;hd&#xE4; kuinka sujuvasti Suomessa tehd&#xE4;&#xE4;n nyky&#xE4;&#xE4;n monialaisia softaprojekteja. Jostain syyst&#xE4; datatiimit n&#xE4;hd&#xE4;&#xE4;n kuitenkin t&#xE4;st&#xE4; erillisen&#xE4; porukkana. 
Kun kehitystiimiin tuodaan dataosaamista, painajaisten &#x201C;eliitti-tiimi&#x201D; vaatii p&#xE4;&#xE4;sy&#xE4; kaikkeen dataan ja tuuppaa kehitystiimin backlogille valtavasti teht&#xE4;vi&#xE4; liittyen esimerkiksi datan k&#xE4;sittelyyn. Lopputuloksena kaikesta vaivasta on kasa hankalasti integroitavia Python-notebookkeja ja lupaus paremmasta tulevaisuudesta. Harmillisesti syvemm&#xE4;lle yhteisty&#xF6;lle ei j&#xE4;&#xE4;nyt kuitenkaan aikaa, koska dataporukka on jo kutsuttu seuraaviin seikkailuihin.</li><li><strong>Toinen ongelma liittyy datatiimien diversiteetin puutteeseen.</strong> Koodareiden diversiteettiongelmista on keskusteltu i&#xE4;t ja ajat, mutta datatieteilij&#xE4;t voitaisiin lis&#xE4;t&#xE4; tilanteen polttavuuden vuoksi ongelmalistan k&#xE4;rkeen. Vaikka hyv&#xE4; tiimi osaa toki ottaa ulkopuolelta tulevia vaatimuksia ja n&#xE4;kemyksi&#xE4; huomioon, on tiivis samoin ajatteleva porukka kuitenkin pohjimmiltaan kaikukammio - ja se n&#xE4;kyy. Ratkaisut rakennetaan yhdenlaiseen maailmankuvaan ja moraalik&#xE4;sitykseen perustuen, eik&#xE4; toisenlaisia n&#xE4;k&#xF6;kulmia tajuta ajatella tarpeeksi laajasti. Edell&#xE4;mainittu erist&#xE4;ytyminen erikoisryhm&#xE4;ksi ei tietenk&#xE4;&#xE4;n helpota t&#xE4;t&#xE4; tilannetta, vaan pahentaa kuilua &#x201C;eliitti-tiimin&#x201D; sek&#xE4; moninaisten sidosryhmien v&#xE4;lill&#xE4;.</li><li><strong>Datatieteilij&#xF6;iden diversiteettiongelmaa voi laajentaa my&#xF6;s ty&#xF6;menetelmiin.</strong> Jostain syyst&#xE4; nimitt&#xE4;in tuntuu, ett&#xE4; ohjelmistokehityksen menetelm&#xE4;t koetaan jotenkin datatieteest&#xE4; irrallisena asiana, vaikka todellisuudessa datatieteilij&#xF6;iden menetelm&#xE4;t eiv&#xE4;t yksin&#xE4;&#xE4;n tuota suurtakaan arvoa, jos dataprojektien lopputulokset j&#xE4;&#xE4;v&#xE4;t ohjelmistoratkaisusta irrallisiksi. 
Eripuraakin syntyy kun koetaan, ett&#xE4; kalliiden &#x201C;eliitti-aivojen&#x201D; ty&#xF6;aikaa ei kannata tuhlata dataputken hiomiseen, eik&#xE4; siihen oikeastaan ole kiinnostustakaan. Versionhallinta, deploy, devops ja testaus ovat normaali osa jokaista ohjelmistoprojektia, eik&#xE4; datatieteilij&#xF6;iden &#x201C;magialla&#x201D; p&#xF6;tkit&#xE4; datan hy&#xF6;dynt&#xE4;misess&#xE4; pitk&#xE4;lle, mik&#xE4;li n&#xE4;m&#xE4; vaiheet puuttuvat.</li></ol><h3 id="kuinka-v-ist-sankarimyytin-tuomat-ongelmat">Kuinka v&#xE4;ist&#xE4;&#xE4; sankarimyytin tuomat ongelmat?</h3><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2021/03/EmblicaAjattelee3.jpg" class="kg-image" alt="Datatieteilij&#xE4;n sankarimyytti" loading="lazy" width="2000" height="1334" srcset="https://blog.emblica.fi/content/images/size/w600/2021/03/EmblicaAjattelee3.jpg 600w, https://blog.emblica.fi/content/images/size/w1000/2021/03/EmblicaAjattelee3.jpg 1000w, https://blog.emblica.fi/content/images/size/w1600/2021/03/EmblicaAjattelee3.jpg 1600w, https://blog.emblica.fi/content/images/2021/03/EmblicaAjattelee3.jpg 2000w" sizes="(min-width: 1200px) 1200px"><figcaption>Datatieteilij&#xE4;t ty&#xF6;skentelev&#xE4;t samassa ryhm&#xE4;ss&#xE4; muun devaustiimin kanssa.</figcaption></figure><ol><li><strong>Datatieteilij&#xE4;t eiv&#xE4;t vain suunnittele, vaan toteuttavat softaa yhdess&#xE4; muun tiimin kanssa.</strong> Mink&#xE4; tahansa dataprojektin kannalta on hy&#xF6;dyllist&#xE4;, ett&#xE4; datatieteilij&#xE4;t ja -insin&#xF6;&#xF6;rit kykenev&#xE4;t suoriutumaan datan hy&#xF6;dynt&#xE4;misen lis&#xE4;ksi my&#xF6;s projektissa vaadittavasta perinteisest&#xE4; ohjelmistokehityksest&#xE4;. T&#xE4;llaiseen ty&#xF6;skentelyyn kuuluu ymm&#xE4;rrys siit&#xE4;, ett&#xE4; koko tiimi rakentaa yhdess&#xE4; softaa johon tuodaan lis&#xE4;arvoa datalla. 
Onnistumisen kannalta kaikki toteutettava ty&#xF6; onkin siis arvokasta, ja koko tiimi kantaa siit&#xE4; yhteisen vastuun.</li><li><strong>Projektien omistajuus kuuluu alusta asti koko tiimille, etenkin asiakkaalle.</strong> </li><li><strong>Tiimin diversiteetin puuttuminen pit&#xE4;&#xE4; tiedostaa ja asian korjaamiseksi on teht&#xE4;v&#xE4; t&#xF6;it&#xE4;.</strong> On hyvin loogista, ett&#xE4; erilaisista taustoista tulevat ihmiset onnistuvat yhdess&#xE4; ymm&#xE4;rt&#xE4;m&#xE4;&#xE4;n laajempia k&#xE4;ytt&#xE4;j&#xE4;ryhmi&#xE4; sek&#xE4; k&#xE4;ytt&#xF6;tapauksia, jonka vuoksi heill&#xE4; on suurempi todenn&#xE4;k&#xF6;isyys onnistua &#xE4;lykk&#xE4;&#xE4;ss&#xE4;, useita k&#xE4;ytt&#xF6;tapauksia palvelevassa datan hy&#xF6;dynt&#xE4;misess&#xE4;. Siksi toimiva datatiimi saavutetaan palkkaamalla ihmisi&#xE4; erilaisista taustoista.</li><li><strong>Datan hy&#xF6;dynt&#xE4;misen tueksi tarvitaan ymm&#xE4;rryst&#xE4; bisneksen erityisominaisuuksista.</strong> Dataprojekteja ajaa l&#xE4;hes aina taloudellisen kasvun tai s&#xE4;&#xE4;st&#xF6;jen tavoittelu, eik&#xE4; projekteja voi siksi suunnitella tai toteuttaa ymm&#xE4;rt&#xE4;m&#xE4;tt&#xE4; kunkin liiketoiminta-alueen lainalaisuuksia. Meid&#xE4;n kannaltamme on ollut hy&#xF6;dyllist&#xE4;, ett&#xE4; l&#xE4;hes kaikissa projekteissamme on mukana palvelumuotoilua, joka mahdollistaa liiketoiminnasta keskustelun kokonaisuutena. 
Pelkk&#xE4;&#xE4;n teknologiaan keskittyminen tarjoaa mahdollisuuden syv&#xE4;&#xE4;n erikoistumiseen, mutta liiketoimintaymm&#xE4;rryksen tuominen osaksi projektien toteutusta tekee edistyneist&#xE4; ideoista oikeasti hy&#xF6;dyllisi&#xE4; ja k&#xE4;ytt&#xF6;kelpoisia.</li><li><strong>Emme toimita pelkki&#xE4; Python-notebookeja (paitsi jos niit&#xE4; hyv&#xE4;st&#xE4; syyst&#xE4; pyydet&#xE4;&#xE4;n).</strong> Ty&#xF6;tapojen monipuolisuuden osalta meill&#xE4; on pienest&#xE4; tiimist&#xE4; huolimatta laajasti kokemusta ja ambitioita erilaisia teknologioita ja ongelmia kohtaan, emmek&#xE4; j&#xE4;t&#xE4; t&#xF6;it&#xE4;mme nopean &#x201C;datapy&#xF6;r&#xE4;ytyksen&#x201D; tasolle. Vaativatpa projektit sitten arkkitehtuurin suunnittelua, syvi&#xE4; neuroverkkoja, optimointi- tai kombinatoristen ongelmien ratkaisua, tiimist&#xE4;mme l&#xF6;ytyy siihen sek&#xE4; kiinnostusta ett&#xE4; osaamista.</li></ol><p>Datatieteen tilanne ei ole miss&#xE4;&#xE4;n nimess&#xE4; menetetty, mutta juuri siksi aiheesta on syyt&#xE4; puhua. Ratkaisu ongelmaan l&#xF6;ytyy toimintatapojen ja tiimien diversiteetist&#xE4;, kollegoiden v&#xE4;lisest&#xE4; yhteispelist&#xE4; ja projektien laajasta omistajuudesta. N&#xE4;in rakennetaan onnistuneita dataprojekteja, joilla on kyky kasvattaa liiketoimintaa ja tuoda muutakin lis&#xE4;arvoa yhteiskuntaamme. Vastuu ja mahdollisuus vaikuttaa on meill&#xE4;, datan ammattilaisilla.</p><p>Etsimme jatkuvasti uusia tekij&#xF6;it&#xE4;, jotka eiv&#xE4;t pelk&#xE4;&#xE4; tuoda n&#xE4;kemyksi&#xE4;&#xE4;n julki. Tutustu meihin ty&#xF6;nantajana t&#xE4;&#xE4;ll&#xE4;: <a href="https://emblica.com/careers">emblica.com/careers</a> </p><hr><p><em><em><a href="https://emblica.com/">Emblica</a> ei ole se tavallinen datatiimi. Rakennamme r&#xE4;&#xE4;t&#xE4;l&#xF6;ityj&#xE4; ratkaisuja datan ker&#xE4;&#xE4;miseen, k&#xE4;sittelyyn ja hy&#xF6;dynt&#xE4;miseen alalle kuin alalle, etenkin R&amp;D:n rajapinnassa. 
Oli kohteemme tehdaslinjasto, verkkokauppa tai pelto, l&#xF6;yd&#xE4;t meid&#xE4;t ty&#xF6;n touhusta, k&#xE4;det savessa - ainakin toimistoltamme Helsingist&#xE4;.</em></em></p>]]></content:encoded></item><item><title><![CDATA[There Is More To Computer Vision Than Video Cameras]]></title><description><![CDATA[What advantages does millimeter-wave radar offer in perception problems where visibility and privacy are limited? Read how we at Emblica have utilized mmWave technology to solve problems where camera-based computer vision solutions are not viable.]]></description><link>https://blog.emblica.fi/there-is-more-to-computer-vision-than-video-cameras/</link><guid isPermaLink="false">63ecbcc049d0670001f8520b</guid><category><![CDATA[Computer Vision]]></category><category><![CDATA[radar]]></category><category><![CDATA[mmWave radar]]></category><dc:creator><![CDATA[Samuel Piirainen]]></dc:creator><pubDate>Mon, 20 Feb 2023 19:25:04 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2023/02/mmWave-radar-image--1-.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2023/02/mmWave-radar-image--1-.png" alt="There Is More To Computer Vision Than Video Cameras"><p><em>Shortly in Finnish: Mit&#xE4; etuja millimetriaaltotutka tarjoaa havainnointiongelmissa, joissa n&#xE4;kyvyydess&#xE4; on puutteita tai yksityisyydensuoja on ensisijaista? Lue t&#xE4;st&#xE4;, miten me Emblicalla olemme ratkaisseet mmWave-tekniikan avulla ongelmia, joissa kameraa hy&#xF6;dynt&#xE4;v&#xE4;t konen&#xE4;k&#xF6;ratkaisut eiv&#xE4;t ole olleet mahdollisia.</em></p><!--kg-card-begin: markdown--><p>For the past several years, automation and AI have had great levels of interest in almost any industry from car R&amp;D to agriculture <sup><a href="https://ieeexplore.ieee.org/abstract/document/9016391">1</a> </sup> <sup> <a href="https://www.sciencedirect.com/science/article/pii/S2589721719300182">2</a></sup>. 
<i>Computer Vision</i> (CV) in particular is a domain that continues to find countless applications via Machine Learning. However, these solutions largely rely on camera-based computer vision&#x2014;perhaps because it is the most intuitive kind of vision to us humans. At Emblica, we&apos;re comfortable also using less common tools and adopting technologies that are not common knowledge even among Machine Learning specialists. This lets us provide our customers with unique solutions to problems otherwise deemed too difficult to solve. In this blog post, we focus on one alternative to camera-based computer vision: the millimeter-wave radar.</p>
<h2 id="limitations-of-camera-based-computer-vision">Limitations of Camera-based Computer Vision</h2>
<p>As mentioned, camera-based systems are not always suitable for CV applications. Here are some reasons why:</p>
<ol>
<li>
<p><b>Privacy concerns</b>: Despite the availability of technology to blur faces, privacy remains a concern. There is always the risk of malicious third-party access to raw video data, even if it is anonymized in post-processing.</p>
</li>
<li>
<p><b>Environmental issues</b>: Cameras are susceptible to weather conditions, such as fog or dirt on the lens. This makes designing camera-based systems more complicated.</p>
</li>
<li>
<p><b>Hardware requirements</b>: Camera-based systems can have demanding hardware requirements, such as sufficient lighting, clean optics, and enough compute to process high-resolution video.</p>
</li>
</ol>
<h2 id="alternatives-to-cameras">Alternatives to Cameras</h2>
<p>In light of the challenges posed by camera-based computer vision solutions, it is important to consider alternative technologies that can address these issues. One such technology is radar, which offers several advantages over cameras. For instance, radar signals cannot be used to identify facial features, thus addressing privacy concerns. Additionally, radars can penetrate sparse matter, allowing for clear radar images even in low-visibility conditions like dust, rain, or snowfall. Furthermore, radars do not have a camera lens, eliminating the risk of obstructions in the field of view. These features make radars a promising technology for many specialized Computer Vision applications where cameras are inadequate.</p>
<p>One well-known technology&#x2014;in many ways analogous to radar&#x2014;is LiDAR. Devices utilizing LiDAR technology have found plenty of use in industrial applications like self-driving cars <sup><a href="https://opg.optica.org/directpdfaccess/ac471164-8be8-4760-9e28174371ac3f7e_380434/opn-29-1-26.pdf">3</a></sup>. For those interested in reading more about LiDAR technology, here is <a href="https://blog.emblica.com/points-about-point-clouds/">our blog post on the topic</a>. However, a downside of the technology is that these devices are very expensive, which makes them impractical for applications where the system must be affordable enough for mass production.</p>
<p><i>A millimeter-wave radar</i> (mmWave radar) is a small, relatively cheap (available at prices well under 200 euros at the time of publication) radar that operates on a very precise millimeter-wave signal. It is precise enough to discover the position and gait of a person even through sparse matter, such as obstructions in front of the sensor, without discovering their facial features.</p>
<h2 id="what-is-a-millimeter-wave-radar">What is a Millimeter-Wave Radar?</h2>
<p>A mmWave radar is a type of radar that emits short-wavelength electromagnetic waves known as millimeter waves (mmWaves), which objects in their path then reflect back <sup> <a href="http://www.ee.fju.edu.tw/images/file/data3169/9a41b2657fe717233110268d676d38ef.pdf">4</a></sup>. The radar captures these reflected signals, and the underlying software of the system can usually calculate properties such as the range, relative radial velocity, and angle of arrival of targets within the signal. These are physical properties traditional to a radar system, usually obtained via signal processing. This is in contrast to a camera, whose signal only reveals the 2D position.</p>
<p>An interesting property of the mmWave signal in particular is its short wavelength within the electromagnetic spectrum. This has two advantages: firstly, the radar system can be relatively small in size; secondly, the accuracy of the signal is high. For example, according to Texas Instruments, a mmWave radar operating at 76&#x2013;81 GHz can detect movements as small as a fraction of a millimeter <sup><a href="http://www.ee.fju.edu.tw/images/file/data3169/9a41b2657fe717233110268d676d38ef.pdf">4</a></sup>.</p>
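<p>To put these figures in context, the basic relations behind a frequency-modulated continuous-wave (FMCW) mmWave radar can be checked with a few lines of Python. This is a sketch of textbook formulas only; the 4 GHz sweep bandwidth and 79 GHz carrier below are illustrative assumptions, not the parameters of any specific device.</p>

```python
# Back-of-the-envelope FMCW radar figures (textbook formulas).
c = 3e8              # speed of light, m/s

bandwidth = 4e9      # chirp sweep bandwidth, Hz (illustrative: a full 77-81 GHz sweep)
range_resolution = c / (2 * bandwidth)   # smallest separable range difference

f_carrier = 79e9     # mid-band carrier frequency, Hz
wavelength = c / f_carrier               # ~3.8 mm, hence "millimeter wave"

# Sub-wavelength motion shows up as a phase shift of the reflection:
# a displacement of a quarter wavelength changes the round-trip path
# by half a wavelength, rotating the phase by pi. This is why
# fraction-of-a-millimeter movements are detectable.
print(f"range resolution: {range_resolution * 100:.2f} cm")
print(f"wavelength:       {wavelength * 1000:.2f} mm")
```

With a 4 GHz sweep, the range resolution works out to under 4 cm, which is why such a small, cheap sensor can resolve individual people in a room.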
<figure style="margin: 40px 0px 40px 0px;">
  <img style="margin-bottom:15px;" alt="There Is More To Computer Vision Than Video Cameras" src="https://blog.emblica.fi/content/images/2023/02/Screenshot-2023-02-15-at-13.49.24.png" width="600">
  <figcaption style="font-size:14px;">A top-down view of two humans observed in the mmWave radar signal.</figcaption>
</figure>
<blockquote>
<p>Because the domain is very different from camera-based computer vision, discovering important information from signals like the mmWave signal requires sophisticated technology and specialized expertise. Machine Learning tools can be a great help here.</p>
</blockquote>
<p>Moreover, many off-the-shelf mmWave radar systems come with firmware for transforming the radar signal into a three-dimensional point cloud. Once objects are observed in the vicinity, their position and relative radial velocity can be seen in the point cloud data output by the system. This is also where Machine Learning and AI come into the picture. Because the domain is very different from camera-based computer vision, discovering important information from signals like the mmWave signal requires sophisticated technology and specialized expertise. Machine Learning tools can be a great help here.</p>
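<p>As a minimal illustration of working with such point cloud data, the sketch below assumes each point carries an (x, y, z, radial velocity) tuple (a hypothetical layout; real devices emit vendor-specific formats) and removes static clutter with a simple velocity gate:</p>

```python
# Hypothetical mmWave point-cloud frame: one (x, y, z, radial_velocity)
# tuple per detected reflection, in metres and m/s. Real radars emit
# vendor-specific binary formats; this layout is illustrative only.
frame = [
    ( 0.2, 1.5, 0.9,  0.00),   # wall reflection (static)
    ( 0.1, 1.6, 1.1,  0.00),   # furniture (static)
    (-0.4, 2.3, 1.0,  0.65),   # walking person, moving away
    (-0.5, 2.4, 1.3,  0.71),   # same person, torso reflection
]

def remove_static_clutter(points, min_speed=0.1):
    """Keep only points whose radial speed exceeds a threshold.

    Static clutter (walls, shelves) reflects at ~zero Doppler, so a
    simple velocity gate already isolates moving targets like people.
    """
    return [p for p in points if abs(p[3]) >= min_speed]

moving = remove_static_clutter(frame)
print(len(moving))  # only the two person reflections remain
```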
<h2 id="millimeter-wave-radar-applications">Millimeter-Wave Radar Applications</h2>
<p>Millimeter-wave radar has numerous potential applications. One such use case is the detection and counting of humans within the radar&apos;s range: think of smart doors or elevators that can distinguish between someone simply walking by and someone waiting for the door to open. Another potential application is detecting falls among elderly residents in nursing homes, which can be crucial for their well-being and health. While difficult problems for classical algorithms, such tasks can be automated with modern Machine Learning tools, thanks to their capability to deal with large masses of data, without compromising the individuals&apos; privacy.</p>
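<p>A people counter of this kind can be reduced to counting clusters of reflections in the ground plane. The naive single-linkage grouping below is only a sketch of the idea; a real system would use a proper clustering algorithm such as DBSCAN, and the 0.5 m gap threshold is an arbitrary assumption:</p>

```python
def count_targets(points_xy, max_gap=0.5):
    """Count people as clusters of nearby points (naive single linkage).

    Reflections from one body land close together in the ground plane,
    so grouping points within `max_gap` metres of an existing cluster
    and counting the groups gives a crude person count.
    """
    clusters = []
    for (x, y) in points_xy:
        merged = None
        for cluster in clusters:
            if any((x - cx) ** 2 + (y - cy) ** 2 <= max_gap ** 2
                   for (cx, cy) in cluster):
                if merged is None:
                    cluster.append((x, y))   # join the first nearby cluster
                    merged = cluster
                else:
                    merged.extend(cluster)   # this point bridges two clusters
                    cluster.clear()
        clusters = [c for c in clusters if c]
        if merged is None:
            clusters.append([(x, y)])        # start a new cluster
    return len(clusters)

# Two tight groups of reflections -> two people near the door.
points = [(0.0, 1.0), (0.1, 1.1), (2.0, 3.0), (2.2, 3.1)]
print(count_targets(points))  # 2
```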
<blockquote>
<p>While difficult problems for classical algorithms, such tasks can be automated with modern Machine Learning tools, thanks to their capability to deal with large masses of data, without compromising the individuals&apos; privacy.</p>
</blockquote>
<blockquote>
<p>Even beyond cars, autonomous vehicles and machines are a prime example of environments where radar can be used, and where some of our customers are developing new solutions.</p>
</blockquote>
<p>Object detection solutions with the mmWave radar are more common than one might think. The radar is used by several car manufacturers (e.g. Tesla, Mercedes-Benz, and Audi <sup><a href="https://www.researchgate.net/publication/353766805_MmWave_Radar_and_Vision_Fusion_based_Object_Detection_for_Autonomous_Driving_A_Review">5</a></sup>) in sensor fusion, combined with other sensor types such as LiDAR, to improve the decision-making capabilities of the cars&apos; autonomous driving systems.</p>
<p>Even beyond cars, autonomous vehicles and machines are a prime example of environments where radar can be used, and where some of our customers are developing new solutions.</p>
<blockquote>
<p>mmWave radars have been used to measure heartbeats and breathing remotely.</p>
</blockquote>
<p>Advantages over the better-known LiDAR alone include improved long-range object detection, lower price, and more accurate detection of dynamic targets. In comparison to vision sensors, the object detection performance of mmWave radars is also less affected by extreme weather.</p>
<p>The possibilities do not end with object detection: mmWave radars have been used to measure heartbeats and breathing remotely, and with sophisticated algorithms, object identification from the signal is also viable. Perhaps we would like to know how many shopping mall customers prefer shopping carts over hand-held baskets? The difference between the objects can be seen in the mmWave signal and extracted using Machine Learning tools.</p>
<h2 id="human-identification-and-tracking-in-indoor-environments">Human Identification and Tracking in Indoor Environments</h2>
<p>At Emblica, we have investigated the viability of the mmWave radar in the problem domain of human detection and tracking in indoor spaces. In particular, we investigated the Emblica office and public transport. These are good examples of environments where the mmWave radar has an advantage: (1) the spaces are public, so anonymity has a high priority; (2) public surfaces are prone to getting dirty and degraded, whether through natural wear and tear or through vandalism.</p>
<figure style="margin: 40px 0px 40px 0px;">
  <img style="margin-bottom:15px;" alt="There Is More To Computer Vision Than Video Cameras" src="https://blog.emblica.fi/content/images/2023/02/setup--1-.jpeg" width="300">
  <figcaption style="font-size:14px;">MMWR-Tracker system data collection setup in Emblica premises.</figcaption>
</figure>
<p>This task can be reduced to a problem of tracking humans in an indoor space. To this end, Emblica engineered the <i>MMWR-Tracker</i>&#x2014;an end-to-end system for human detection and tracking in an indoor environment.</p>
<figure style="margin: 40px 0px 40px 0px;">
  <img style="margin-bottom:15px;" alt="There Is More To Computer Vision Than Video Cameras" src="https://blog.emblica.fi/content/images/2023/02/flow_blog.png" width="600">
  <figcaption style="font-size:14px;">The MMWR-Tracker system architecture.</figcaption>
</figure>
<p>The system uses the U-Net architecture <sup><a href="https://arxiv.org/pdf/1505.04597.pdf">6</a></sup>, a deep neural network for image segmentation applications, to first segment the signal from the mmWave radar system into a segmentation map in which signal noise is filtered away and the humans detected by the signal remain. These segmentation maps are passed to a tracking module that utilizes a Kalman filter and the Hungarian method, techniques commonly found in object tracking solutions in the domain, to track the detected humans.</p>
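<p>The data-association step at the heart of such a tracker can be sketched in a few lines: each Kalman-predicted track position is matched to this frame&apos;s detections by minimizing the total squared distance. For the handful of people in a room, exhaustive search over assignments (standing in for the Hungarian method; production code would use an optimized implementation such as <code>scipy.optimize.linear_sum_assignment</code>) is sufficient:</p>

```python
from itertools import permutations

def associate(tracks, detections):
    """Match predicted track positions to new detections.

    Minimizes the total squared distance over all assignments by brute
    force, which stands in for the Hungarian method when the number of
    targets is small. Returns (track_index, detection_index) pairs.
    """
    n = min(len(tracks), len(detections))
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(detections)), n):
        cost = sum((tracks[t][0] - detections[d][0]) ** 2 +
                   (tracks[t][1] - detections[d][1]) ** 2
                   for t, d in zip(range(n), perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(zip(range(n), perm))
    return best_pairs

# Two tracked people; this frame's detections arrive in swapped order.
tracks = [(1.0, 2.0), (4.0, 5.0)]        # Kalman-predicted positions
detections = [(4.1, 5.1), (1.1, 1.9)]    # segmentation-map centroids
print(associate(tracks, detections))     # [(0, 1), (1, 0)]
```

After association, each matched detection is fed back into its track&apos;s Kalman filter as the measurement update, while unmatched detections spawn new tracks.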
<figure style="margin: 40px 0px 40px 0px; display: flex; align-items: center;">
    <div style="display: inline-block; width: 48%; margin: 1%;">
          <img style="margin-bottom:20px;" alt="There Is More To Computer Vision Than Video Cameras" src="https://blog.emblica.fi/content/images/2023/02/setup_bus3--1--1.jpeg" width="300">
        <figcaption style="font-size:14px;">Data collection within a bus.</figcaption>
    </div>
        <div style="display: inline-block; width: 48%; margin: 1%;">
          <img style="margin-bottom:20px;" alt="There Is More To Computer Vision Than Video Cameras" src="https://blog.emblica.fi/content/images/2023/02/setup_bus2.jpeg" width="300">
            <figcaption style="font-size:14px;">The mmWave radar directed towards the bus aisle.</figcaption>
    </div>
</figure>
<p>Once installed in an <i>edge device backend</i>, the MMWR-Tracker works in real time, monitoring the environment and relaying the spatial information of detected objects to the cloud.</p>
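<p>What gets relayed to the cloud can stay deliberately sparse. The payload below is a made-up example schema (the actual MMWR-Tracker message format is deployment-specific): only anonymous spatial state leaves the edge device, never anything resembling raw video.</p>

```python
import json
import time

def make_report(frame_id, tracked):
    """Serialize tracker state for upload to the cloud.

    `tracked` maps an anonymous track id to (x, y, speed). The schema
    is hypothetical; the point is that only spatial information is
    transmitted, which preserves privacy by construction.
    """
    return json.dumps({
        "frame": frame_id,
        "timestamp": time.time(),
        "objects": [
            {"id": oid, "x": x, "y": y, "speed": v}
            for oid, (x, y, v) in sorted(tracked.items())
        ],
    })

report = make_report(17, {3: (1.2, 4.0, 0.6)})
print(report)
```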
<p>Tracking humans is just one possible application of the radar technology. Other possibilities include detecting whether a human or a pet is moving in its field of view&#x2014;think of smart lighting or security systems.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In conclusion, while many computer vision solutions rely on cameras, the mmWave radar offers an alternative that is especially useful in scenarios where visibility is limited or privacy is a priority. Although not as well-known as other computer vision tools, such as stereo cameras or LiDAR systems, the mmWave radar has unique properties that make it a valuable consideration for specific use cases. Emblica has explored the potential of this technology to address problems that cannot be solved with camera-based computer vision solutions.</p>
<!--kg-card-end: markdown--><hr><p><em><a>Emblica</a></em> <em>is not your average data team. We build customized solutions for collecting, processing, and utilizing data for all sectors, especially at the R&amp;D interface. Whether our target is a factory line, an online store, or a bus, you can find us busy at work, hands in the clay - at least at our office in Helsinki.</em></p>]]></content:encoded></item><item><title><![CDATA[Points about point clouds]]></title><description><![CDATA[<p>Representing 3D space digitally can be a challenging task. The most common way to process visuals about our world comes in the form of a 2D image, either from a photograph or video. Capturing 3D images might be more common than you think. Sensors using technologies such as LiDAR or</p>]]></description><link>https://blog.emblica.fi/points-about-point-clouds/</link><guid isPermaLink="false">636a3561e527370001e8c24d</guid><category><![CDATA[point clouds]]></category><category><![CDATA[lidar]]></category><category><![CDATA[radar]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[Machine Vision]]></category><category><![CDATA[ENG]]></category><category><![CDATA[ML]]></category><dc:creator><![CDATA[Rick Joosten]]></dc:creator><pubDate>Tue, 14 Feb 2023 08:00:00 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2022/11/colorful_pointcloud_of_the_outline_of_a_pyramid-_seen_from_0.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2022/11/colorful_pointcloud_of_the_outline_of_a_pyramid-_seen_from_0.png" alt="Points about point clouds"><p>Representing 3D space digitally can be a challenging task. The most common way to process visuals about our world comes in the form of a 2D image, either from a photograph or video. Capturing 3D images might be more common than you think. Sensors using technologies such as LiDAR or radar do this. These sensors often produce data in the form of point clouds. 
In this blog, we will walk you through:</p><ul><li>What point clouds are.</li><li>How point cloud data is created.</li><li>What kind of problems can be tackled using point clouds.</li></ul><p><em><strong>Lyhyesti suomeksi:</strong> Tutkat, lidarit ja muut anturit ovat kustannustehokas tapa tuottaa pistepilvi&#xE4;, eli moniulotteisia pistejoukkoja. Lue t&#xE4;st&#xE4; tarkemmin, miten pistepilvidataa luodaan ja millaisia ongelmia voidaan ratkaista pistepilvien avulla.</em></p><h1 id="what-are-point-clouds">What are point clouds?</h1><p>Point clouds are simply collections of points in space. Most of the time we are talking about 3D space, but a 2D scatterplot could also be considered a point cloud. The same goes for a higher-dimensional space.</p><p>There are two important characteristics of point clouds. First, a point cloud is unordered. This means that the set of points can be shuffled and put in any order and it would still be the same point cloud. In other words, the first point is unrelated to the second point. This is not the case in regular photographs: two pixels that are next to each other are related to each other. You cannot shuffle them around and still get the same picture. This difference is important to remember when discussing point cloud algorithms later.</p><p>The second characteristic is that the space in which the point cloud exists needs to have some way to calculate the distance between two points. 
The most commonly used space is 3D Euclidean space where each point has an x,y, and z coordinate.</p><h2 id="how-are-point-clouds-created">How are point clouds created?</h2><figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.emblica.fi/content/images/2022/11/lidar-3.png" class="kg-image" alt="Points about point clouds" loading="lazy" width="1380" height="489" srcset="https://blog.emblica.fi/content/images/size/w600/2022/11/lidar-3.png 600w, https://blog.emblica.fi/content/images/size/w1000/2022/11/lidar-3.png 1000w, https://blog.emblica.fi/content/images/2022/11/lidar-3.png 1380w" sizes="(min-width: 1200px) 1200px"></figure><p>The most common methods for generating 3D point cloud data are using LiDAR, radar, and Photogrammetry. LiDAR (<em>Light Detection and Ranging</em>) sensors use a pulsing laser to measure an object&#x2019;s distance from the lidar based on the time it takes for the light to bounce back to the LiDAR unit. Combining this with the location of the sensor, and its directionality, the 3D location from where the pulse was reflected can be calculated. &#xA0;The LiDAR sensor can be static on the ground or attached to a moving object such as a plane or a car. In the latter case, extra care has to be taken to know the exact location and direction of the sensor. 
These days even modern iPhones come with a built-in LiDAR sensor.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2022/11/iphonelidar.png" class="kg-image" alt="Points about point clouds" loading="lazy" width="1920" height="1080" srcset="https://blog.emblica.fi/content/images/size/w600/2022/11/iphonelidar.png 600w, https://blog.emblica.fi/content/images/size/w1000/2022/11/iphonelidar.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2022/11/iphonelidar.png 1600w, https://blog.emblica.fi/content/images/2022/11/iphonelidar.png 1920w" sizes="(min-width: 1200px) 1200px"><figcaption>Creating a 3D image using the iPhone&apos;s built-in LiDAR sensor, resulting in a 3D model of the scanned object.</figcaption></figure><p>Oftentimes a color value, as well as an intensity measure of how much of the laser beam is reflected back, is recorded for each point. Combining these with the depth information creates richer data and can increase the usefulness for solving machine learning problems. <br><br>Similarly to LiDAR sensors, radar can be used to create point cloud data. Instead of light, radar uses electromagnetic waves to detect objects. This can be beneficial compared to LiDAR sensors when detection must happen through objects that block light but not radio waves. One place where both light and radio waves cannot penetrate far enough is underwater. <a href="https://arxiv.org/pdf/2205.09361.pdf">Here sonar could be used</a> to create point clouds using the same principles as LiDAR or radar.</p><p>The last common method is photogrammetry. This method creates a 3D image by interpreting multiple 2D images of an object. This is analogous to how humans see in 3D. These images usually have a much lower resolution compared to the previously discussed methods but are generally much cheaper to create. 
For example, Google Earth uses photogrammetry to create a 3D image of buildings based on satellite images.</p><h1 id="what-machine-learning-tasks-can-we-do-on-point-clouds">What machine learning tasks can we do on point clouds?</h1><p>There is a point cloud, now what? Most problems on point cloud data that we can solve with machine learning fall into one of three categories: classification, segmentation, and localization. This section will discuss each of the three problems and how to solve them.</p><h2 id="classification">Classification</h2><p>Classification is the task of detecting what object is in the point cloud. For example, products coming down a factory conveyor belt can be scanned and classified as good or bad based on their shape. In this case, the point cloud only contains a single object to be classified into a class. Of course, the number of possible classes can be bigger than two.</p><h2 id="segmentation">Segmentation</h2><p>Segmentation concerns itself with dividing a point cloud into recognizable parts. This can be done at two levels. The first level is <em>part segmentation. </em>Here the task is to segment an object into recognizable parts. For instance, we know that the object is a table, but we want to find which points are part of the tabletop and which points are part of the legs. The second level is <em>semantic segmentation.</em> Instead of segmenting a single object into its separate parts, semantic segmentation is the task of dividing a point cloud with multiple objects into distinct objects, for example, dividing a scene that is created by a self-driving car into pedestrians, other cars, buildings, buses, etc. so that the car knows what kind of objects are around. </p><h2 id="localization">Localization</h2><p>Object localization tasks are not often mentioned in the classic academic point cloud algorithms but can prove extremely useful. The goal of the task is to pinpoint a specific location on an object. 
This could for example be useful on a factory line where a robot has to interact with a specific part of an object. &#xA0;Each object will be the same but there might be variations in the exact location which the robot has to account for. Furthermore, if we can localize an object over time, it is possible to track the object&#x2019;s trajectory.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2022/11/pointcloud.png" class="kg-image" alt="Points about point clouds" loading="lazy" width="2000" height="592" srcset="https://blog.emblica.fi/content/images/size/w600/2022/11/pointcloud.png 600w, https://blog.emblica.fi/content/images/size/w1000/2022/11/pointcloud.png 1000w, https://blog.emblica.fi/content/images/size/w1600/2022/11/pointcloud.png 1600w, https://blog.emblica.fi/content/images/2022/11/pointcloud.png 2244w" sizes="(min-width: 1200px) 1200px"><figcaption>Three different tasks: 1) Classification, what is the object in the point cloud? (a pyramid) 2) Segmentation, segmenting the point cloud into distinct elements 3) Localization, locating the coordinates of the pyramid&apos;s top point</figcaption></figure><h3 id="so-how-do-we-solve-these-problems-using-machine-learning">So how do we solve these problems using machine learning?</h3><p>One way is to turn to proven efficient techniques from computer vision utilizing convolutional neural networks (CNNs). From a 3D point cloud, it is possible to create a 2D depth map by essentially looking at the point cloud from a top-down birds-eye perspective. This creates an image where each pixel represents a column of the point cloud. Information about the height, number of points, reflectivity, or other characteristics can be encoded in separate channels just like a color image has a red, green, and blue channel. </p><p>The analogous structure of 2D pixels in 3D is voxels. 
Instead of a grid of 2D pixels, the point cloud can be represented as a grid of 3D boxes. Because the grid has a regular structure, voxels close to each other contain information about that region of the point cloud. Just like for 2D images, CNNs can be used to solve our tasks. However, the size of the representation in a voxel-based approach scales cubically with the resolution of the grid, compared to quadratically for 2D pixels. This makes using voxels significantly slower due to the increased overhead. </p><p>Projecting to a 2D picture is a good representation for machine learning tasks because it is a well-studied use case. Point clouds are however closer to what humans see. Machine learning using a 3D point cloud as input without processing it into a regular grid structure cannot rely on this structured nature. At the core of most solutions built upon unaltered point clouds lies PointNet (or its extension PointNet++), which deals with the unordered nature of point clouds. The next section will go deeper into the workings of PointNet. For more information on the pixel, voxel, and other approaches check out <a href="http://vedder.io/misc/KyleVedderWPEII2021.pdf">this document</a> by Kyle Vedder.</p><h3 id="pointnet">PointNet</h3><p>The PointNet architecture is designed to create a global representation of the whole point cloud in a single vector by using only permutation-invariant operations. Most importantly, the aggregation function cannot rely on the order of the input. PointNet uses <em>max pooling</em> to create the global representation (no matter what the order is, the maximum will always be the same).</p><p>With the global representation, we can now add the final part of our network depending on the task we want to perform. For classification, we add a classification head that predicts which class the whole point cloud belongs to. 
</p><p>If the goal is to predict the location of an object in a space we can add a regression head after the global representation. Using supervised learning methods this regression head can learn the exact location of a feature of the point cloud. For example, pinpoint the center of a wheel of a car and give its x,y, and z coordinates.</p><p>For segmentation, the model architecture can be similar to the classification architecture. However, instead of classifying the whole point cloud, segmentation requires the classification of each point. What makes segmentation tasks more challenging is that the global representation created by the PointNet might not contain enough fine-grained information about the object. One way to alleviate this problem is to concatenate the earlier layer&#x2019;s output to the global feature vector to also have information about earlier, less global, abstractions. This works reasonably well for many cases but for more challenging segmentation problems PointNet++ was created to deal with this problem.</p><p>This post will not explain PointNet++ in full detail. If you are interested, I highly recommend reading the <a href="https://arxiv.org/abs/1706.02413">original paper</a>, or <a href="https://github.com/maziarraissi/Applied-Deep-Learning/blob/main/01%20-%20Computer%20Vision/06%20-%203D.pdf">slides</a> and <a href="https://www.youtube.com/watch?v=FAqN0KK_2kg">youtube videos</a> made by Maziar Raissi about PointNet and other point cloud methods. In brief, PointNet++ clusters points in the point cloud and applies a mini-PointNet to each cluster instead of the whole point cloud. This ensures that more local information is retained and segmentation can perform better. </p><h3 id="final-thoughts">Final thoughts</h3><p>Point clouds are a great way to represent a 3D environment. Tasks that can be defined as a classification, segmentation, or localization task can be solved with the help of a PointNet. 
However, simply having a digital representation of a real-life environment <a href="https://www.wwf.org.uk/sites/default/files/2019-04/Lidar-WWF-guidelines.pdf">can already be beneficial</a>. For example, creating point clouds at different points in time allows for easy comparison of a 3D object. Furthermore, it is easy to zoom in on or rotate a digital object which might be difficult to do in the real world. As an additional benefit, most methods of creating point clouds are inherently more private compared to photos or videos because no personal data can easily be gathered from a point cloud.</p><hr><p><em><a href="https://blog.emblica.fi/ghost/emblica.com">Emblica</a> is not your average data team. We build customized solutions for collecting, processing, and utilizing data for all sectors, especially at the R&amp;D interface. Whether our target is a factory line, an online store, or a field, you can find us busy at work, hands in the clay - at least at our office in Helsinki.</em></p>]]></content:encoded></item><item><title><![CDATA[AI ethics guidelines and their shortcomings]]></title><description><![CDATA[Many organizations have created and published their own AI ethics guidelines to promote responsible and ethical development and use of AI. 
This blog post shows that having principled AI ethics guidelines alone is not enough to guarantee ethical conduct.]]></description><link>https://blog.emblica.fi/ai-ethics-guidelines-and-their-shortcomings/</link><guid isPermaLink="false">63be693e0f949e0001e2ed5b</guid><dc:creator><![CDATA[Rick Joosten]]></dc:creator><pubDate>Tue, 17 Jan 2023 08:00:24 GMT</pubDate><media:content url="https://blog.emblica.fi/content/images/2023/01/a_background_banner_image_representing_the_ethical_landscape_of_AI_0.png" medium="image"/><content:encoded><![CDATA[<img src="https://blog.emblica.fi/content/images/2023/01/a_background_banner_image_representing_the_ethical_landscape_of_AI_0.png" alt="AI ethics guidelines and their shortcomings"><p><em>Lyhyesti suomeksi: Monet organisaatiot ovat luoneet ja julkaisseet omia teko&#xE4;lyn eettisi&#xE4; ohjeistuksia edist&#xE4;&#xE4;kseen teko&#xE4;lyn vastuullista ja eettist&#xE4; kehitt&#xE4;mist&#xE4; ja k&#xE4;ytt&#xF6;&#xE4;. T&#xE4;m&#xE4; blogikirjoitus kertoo, miksi periaatteelliset teko&#xE4;lyn eettiset ohjeet eiv&#xE4;t yksin riit&#xE4; takaamaan eettisyytt&#xE4;.</em></p><p>The ethics of artificial intelligence (AI) is a broad research area that exists between philosophy and computer science. It encompasses everything from AI accountability (questions such as &quot;who is responsible when an AI system makes an error&quot;) to fairness and transparency. Many companies, organizations, governments, and institutions have created and published their own AI ethics principles and guidelines to promote responsible and ethical development and use of AI. </p><p>This blog will discuss what these principles and guidelines look like and whether self-created guidelines are actually useful when it comes to doing business ethically. 
This discussion will mostly be based on Brent Mittelstadt&#x2019;s <a href="https://arxiv.org/abs/1906.06668">paper</a> <em>Principles alone cannot guarantee ethical AI</em>, which is briefly summarized below; other sources and references are linked throughout the text. Finally, there is a brief section describing ways we can improve or supplement AI ethics guidelines.</p><h2 id="ai-ethics-statements">AI ethics statements</h2><p>Discussing the ethics of AI technology was initially the domain of academics. Early philosophical discussions on artificial general intelligence and machine consciousness made way for discussions about how we can ensure the ethical use of AI systems as AI technology advanced. Most of the current output from research, private, and political organizations is in the form of ethics guidelines such as the <a href="https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai">EU&#x2019;s high-level expert group on AI&#x2019;s guidelines</a>. These guidelines are designed to be used as a framework to give guidance on how to deal with ethical issues and inform policy decisions but are not legally binding in and of themselves. </p><p>The German organization AlgorithmWatch started collecting and labeling these AI ethics guidelines in 2019. Currently, their inventory holds 167 guidelines from various companies, organizations, and governments (this does not include legislation). These guidelines are mostly focused on systems for automated decision-making. Of these guidelines, only 8 have a binding agreement describing an enforcement mechanism. All others only contain either voluntary commitments or simple recommendations and principles. Furthermore, the guidelines come mostly from Europe and the US. 
This shows a heavy influence of Western values on publicly available <a href="https://arxiv.org/abs/1906.11668">guidelines</a> with almost no representation of African, South and Central American, and Central Asian countries (excluding India).<br></p><p>Current ethics initiatives started by the industry can be seen as virtue-signaling according to <a href="https://arxiv.org/abs/1906.11668">critics</a> <a href="https://link.springer.com/article/10.1007/s43681-020-00008-1">&#xB2;</a> <a href="https://link.springer.com/article/10.1007/s11023-020-09517-8">&#xB3;</a> <a href="https://www.emerald.com/insight/content/doi/10.1108/JICES-12-2019-0138/full/html">&#x2074;</a> <a href="https://link.springer.com/article/10.1007/s11023-018-9482-5">&#x2075;</a>. AI development companies have a vested interest in appearing to be self-regulating in order to delay any legislation that would make their work more difficult or that would cut into their profits. So by publishing their own guidelines, they can present themselves as responsible actors in AI development. Whether this is the intent of the companies or not is up for debate, but by publishing guidelines companies can point towards them and claim to conduct business ethically even when the guidelines are limited in scope. The main limitation is often that the guidelines stick with high-level principles. For example, <a href="https://ai.google/responsibilities/responsible-ai-practices/">Google&#x2019;s</a> <em>recommended practices for AI </em>(one of the shortest in the industry)<em> </em>contains: &#x201C;Use a human-centered design approach&#x201D;. This is expanded with some advice on what a <em>human-centered design approach</em> entails, such as <em>&#x201C;Design features with appropriate disclosures built-in&#x201D;</em>, without mentioning what appropriate disclosures are. 
Another noteworthy limitation is that only a few <a href="https://link.springer.com/article/10.1007/s11023-020-09517-8">guidelines</a> explicitly address topics like the possible effects of AI systems on democratic control, political abuse of AI systems, or the ability of AI systems to reduce social cohesion by creating echo chambers. The only ones that do are created by research institutes, the <a href="https://www.montrealdeclaration-responsibleai.com/">Montr&#xE9;al Declaration for Responsible Development of Artificial Intelligence</a>, and the <a href="https://ainowinstitute.org/AI_Now_2019_Report.pdf">AI Now 2019 Report</a>. The latter report is also the <a href="https://link.springer.com/article/10.1007/s11023-020-09517-8">only one </a>discussing &#x201C;hidden&#x201D; social and ecological costs such as AI technologies creating consumer practices that contradict sustainability goals. To meet sustainability goals, care should be taken to reflect on the need to mine one-way-use materials, the energy consumption of running AI services, or how electronic waste should be handled.</p><p>Critics say that most corporate AI ethics discussions should be described as ethics-washing. Ethics-washing can refer to cases where there is no real mechanism to implement these guidelines or to assess that development is more ethically aware. On the other hand, reducing ethical questions to solvable technological problems will also result in ethics-washing. While creating technical solutions to identify and mitigate ethical problems can be necessary to turn a high-level principle into practice, seeing most ethical problems as solvable using technology overlooks the broader social problems and reduces continuous discussion. 
Later in this post, there will be more discussion on how to make sure AI ethics does not collapse into &#x201C;technological solutionism&#x201D;.</p><p>While principles make broad gestures on which it is easy to find consensus among large and diverse groups of people, they do not provide any moral judgment. For example, the principle that &#x201C;AI should be fair&#x201D; is widely accepted but does not indicate what counts as fair. In some situations, equality, where everyone is treated exactly the same, is preferred. In others, you might want to give preferential treatment to compensate a group that has historically been disadvantaged. It is up to the individuals implementing an AI system to decide what fairness means in each case. The principles only dictate that the system should be fair; how that is operationalized is left to individuals. This shows that ethical principles can help guide ethical AI development but cannot guarantee that a system developed with these principles produces an ethical outcome. Most AI ethics researchers therefore seek to complement or change the principled approach to AI ethics. 
There are <a href="https://www.emerald.com/insight/content/doi/10.1108/JICES-12-2019-0138/full/html#sec77">some</a>, however, who claim that if the principles are comprehensive enough, the guidelines will not oversimplify the ethical debate.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://blog.emblica.fi/content/images/2023/01/Drawing-of-fairness-and-equality-in-an-ethical-landscape-1.jpg" class="kg-image" alt="AI ethics guidelines and their shortcomings" loading="lazy" width="512" height="384"><figcaption>Drawing of fairness and equality in an ethical landscape via Stable Diffusion.</figcaption></figure><h2 id="principles-cannot-guarantee-ethical-ai">Principles cannot guarantee ethical AI</h2><p>In the paper <a href="https://arxiv.org/abs/1906.06668"><em>Principles alone cannot guarantee ethical AI</em></a><em>, </em>Brent Mittelstadt takes the position that guidelines with principles are not enough to ensure AI organizations behave ethically. The stance is reinforced by <a href="https://dl.acm.org/doi/10.1145/3236024.3264833">empirical experiments</a> showing that reading ethics guidelines has no significant influence on the ethical decision-making of software developers. When looking at publicly available ethics guidelines, <a href="https://link.springer.com/article/10.1007/s11023-018-9482-5">reviews</a> <a href="https://arxiv.org/abs/1906.11668">show</a> that most center around four main principles: respect for human autonomy, prevention of harm, fairness, and explainability. These principles resemble those of medical ethics, which has a long history of principled ethical standards. However, Mittelstadt points out four main differences between medical ethics and AI ethics and critically examines why ethical principles are suitable for the field of medicine but not for AI development. 
This section will discuss these differences and go over suggestions on how these ethical principles can be supplemented.</p><h3 id="differences-between-principled-medical-ethics-and-ai-ethics">Differences between principled medical ethics and AI ethics</h3><h4 id="1-common-aims-and-fiduciary-duties">1 Common aims and fiduciary duties</h4><p>A medical professional has a moral duty to look after the well-being of their patient. This primary goal is absent in AI development: AI developers have no equivalent of a &#x201C;patient&#x201D;. Due to the lack of a fiduciary relationship with a &#x201C;patient&#x201D; and the common pressures to act against the public interest, users of AI systems cannot rely on developers to keep their best interests in mind.</p><h4 id="2-professional-history-and-norms">2 Professional history and norms</h4><p>The history of the medical profession is significantly longer than that of AI development, going back as far as the Hippocratic oath in the Western world. Throughout that history, ethical guidelines have been created to serve as a basis for &#x2018;good&#x2019; clinical decision-making. These long-standing traditions can be referenced when translating and contextualizing a high-level ethical guideline into practice. AI development cannot reference long-standing norms. Furthermore, AI development varies wildly in approaches, goals, backgrounds, histories, and moral obligations. These two things combined leave developers to interpret and resolve ethical dilemmas as they see fit, without a long history or common goal to guide them.</p><h4 id="3-proven-methods-to-translate-principles-into-practice">3 Proven methods to translate principles into practice</h4><p>High-level principles do not automatically translate into practice. A high-level principle has to be translated into norms and then into practical requirements. 
The normative decisions made while translating high-level principles must take into account the specific technology, applications, and local norms. This leads to conflicting requirements depending on the context in which the principles were translated. Because of this need for context-dependent interpretation and the lack of long-standing historical norms, principles cannot guarantee consistent ethical behavior. This could be alleviated by creating sector- and case-specific guidelines, technical solutions, and empirical knowledge of how AI can impact its surroundings, instead of top-down general guidelines.</p><p>Compared to the field of medicine, there are also no common bodies that help determine what constitutes ethical behavior in day-to-day practice or assess particularly difficult cases. Bodies like these create proven methods for translating principles into practice across an industry.</p><h4 id="4-legal-and-professional-accountability-mechanisms">4 Legal and professional accountability mechanisms</h4><p>Lastly, AI developers lack legal and professional accountability mechanisms. A doctor who has behaved unethically can be barred from practicing medicine because there are licensing schemes as well as ethics committees and professional medical boards. These mechanisms do not exist universally across AI development. Because of their absence, an AI developer cannot be sanctioned for not following ethical principles, so users cannot take it for granted that a developer has committed to some ethical framework. Additionally, there is no set pathway to correct wrongdoings when they occur. </p><p>As mentioned in the previous point, incorporating independent ethical auditing into AI development can hold bad actors accountable, assess difficult ethical cases, and set precedents when they come up. Mittelstadt is far from the only person to advocate for independent ethics boards. 
Most notably, the AI Now Report [<a href="https://ainowinstitute.org/AI_Now_2018_Report.pdf">2018</a>, <a href="https://ainowinstitute.org/AI_Now_2019_Report.pdf">2019</a>] repeatedly argues that the AI industry needs new approaches to governance in order to ensure ethical AI use. There are already initiatives for creating new governance mechanisms, such as <a href="https://ai-governance.eu/">AIGA</a> here in Finland. Moreover, with an outside organization, it would be possible to license developers who work on high-risk AI.</p><h3 id="conclusion">Conclusion</h3><p>This post showed the shortcomings of relying on high-level ethical principles. Many solutions to overcome these shortcomings require a shift in AI development, increasing outside oversight and legislation to ensure ethical business. While individual organizations cannot achieve this alone, they should be aware of the traps and weaknesses that high-level ethics principles bring with them. Most importantly, ethics will always be a process. We shouldn&#x2019;t see ethical challenges as something that can be fixed through technical solutions and principles alone.</p><p>The constant changes in technology and applications require us to approach ethics as an ever-changing process. New ethical challenges will come up and require open debate to make sure AI development reflects many viewpoints and ethical questions are not simplified to &#x2018;solvable&#x2019; problems. A first step towards continued ethical thinking could be incorporating AI ethics checklists into AI development. 
These checklists can guide developers to embrace ethical thinking, as long as the checklist is broad enough and requires active participation from its <a href="https://www.emerald.com/insight/content/doi/10.1108/JICES-12-2019-0138/full/html#sec77">participants</a>.</p><h2 id="emblica-and-ethics">Emblica and ethics</h2><p>Here at Emblica, we are strengthening our ethical commitments by implementing an active ethics review in our workflow. We have been reviewing statements and guidelines from several companies and institutions to gauge what is out there and what we can learn from them. As this blog post discussed, a set of AI principles alone cannot guarantee that our processes are ethical. That is why we are taking our time to review and consider which approaches will give the best practical ethics recommendations while avoiding the trap of ethics-washing. Once we have concluded our research, we will publish our approach to ensuring that our AI solutions are developed ethically.</p><hr><p><a href="https://blog.emblica.fi/ghost/emblica.com"><em>Emblica</em></a><em> is a technology company focused on data-intensive applications and artificial intelligence. Some examples of our customers are Sanoma, Uponor, Caruna, and the Tax Administration. Emblica is 100% owned by its employees.</em></p>]]></content:encoded></item></channel></rss>