Codos Case Study: 96% Accuracy, Less Data, Faster Performance


Industry:
Environmental protection
Country:
Switzerland
Service:
A neural network classifying the mode or method of transportation based on recorded data from a mobile phone.
Client_
Codos Foundation is a Swiss non-profit organization. It developed an app that encourages eco-friendly transportation choices.
Using AI and machine learning, the platform measures CO2 emission reductions, supporting users in adopting a sustainable lifestyle.
Project goal_
The goal of the project was to develop an algorithm that identifies the user’s mode of transportation for every minute of a journey, enabling precise calculation of the CO2 emissions generated. The system detects modes such as car, bus, tram, train, bicycle, electric scooter, subway, and walking, as well as stationary periods.
A key requirement was to ensure the algorithm could be easily expanded to include additional transportation modes without requiring significant changes to the code.
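To illustrate how per-minute labels could feed an emissions calculation, here is a minimal sketch. The emission factors, the label names, and the `trip_emissions` helper are assumptions made for illustration only; they are not Codos values or code. Keeping the modes in a single table is one way to let a new mode be added with a one-line change.

```python
from collections import Counter

# Hypothetical emission factors in grams of CO2 per minute of travel.
# Placeholder numbers for illustration, not measured values.
EMISSION_G_PER_MIN = {
    "car": 120.0,
    "bus": 40.0,
    "tram": 15.0,
    "train": 20.0,
    "subway": 15.0,
    "bicycle": 0.0,
    "electric_scooter": 2.0,
    "walking": 0.0,
    "stationary": 0.0,
}


def trip_emissions(labels_per_minute):
    """Sum emissions for a trip given one predicted mode label per minute."""
    unknown = set(labels_per_minute) - set(EMISSION_G_PER_MIN)
    if unknown:
        raise ValueError(f"no emission factor for: {sorted(unknown)}")
    minutes = Counter(labels_per_minute)
    return sum(EMISSION_G_PER_MIN[mode] * n for mode, n in minutes.items())


# Example trip: 5 minutes walking, 10 minutes by bus, 3 minutes walking.
trip = ["walking"] * 5 + ["bus"] * 10 + ["walking"] * 3
total = trip_emissions(trip)
print(f"{total:.0f} g CO2")
```

Supporting an additional mode in this sketch means adding one entry to `EMISSION_G_PER_MIN`; unknown labels fail loudly rather than being silently counted as zero.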
Solution Delivered_
The solution designed by the Euvic team was based on AWS cloud infrastructure, ensuring high scalability and performance. Advanced machine learning and artificial intelligence technologies were applied to achieve the project’s goals.
The core components included neural networks such as:
- Convolutional Neural Networks (CNNs): enabling feature extraction from input data in the form of spectrograms and numerical sequences.
- Transformers: analyzing processed data and context to effectively infer the mode of transportation.
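As a rough illustration of the spectrogram input mentioned above, the sketch below turns a one-dimensional sensor trace into a magnitude spectrogram with a short-time Fourier transform in NumPy. The frame length, hop size, sampling rate, and synthetic signal are assumptions; the source does not describe the project's actual preprocessing.

```python
import numpy as np


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)  # tapers frame edges to reduce leakage
    n_frames = 1 + (signal.size - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic accelerometer trace (assumed): 2 Hz oscillation, 32 Hz sampling, 8 s.
fs = 32
t = np.arange(8 * fs) / fs
accel = np.sin(2 * np.pi * 2.0 * t)

spec = spectrogram(accel)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * fs / 64  # bin width is fs / frame_len
print(spec.shape, peak_hz)
```

A 2D array like `spec` is the kind of image-like input a CNN can extract features from.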
Additionally, the solution incorporated:
- Hidden Markov Models (HMMs) and statistical methods to enhance the precision of classification and data analysis.
The combination of these technologies enabled the creation of an algorithm capable of accurately identifying and classifying modes of transportation in real time, which was a critical factor in the project’s success.
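One common way an HMM can sharpen per-minute predictions is to treat the classifier's probabilities as emission scores and decode the most likely label sequence with the Viterbi algorithm, which suppresses isolated one-minute flips. The sketch below assumes a simple transition model with a single `stay_prob` parameter and a uniform starting distribution; how the project actually combined HMMs with the network is not stated in the source.

```python
import math


def viterbi_smooth(probs, stay_prob=0.9):
    """Most likely label sequence given per-minute class probabilities.

    probs[t][k] is the classifier probability of class k at minute t.
    stay_prob is the assumed chance of keeping the same mode next minute.
    Requires at least two classes.
    """
    n_classes = len(probs[0])
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_classes - 1))
    eps = 1e-12  # avoids log(0)

    score = [math.log(p + eps) for p in probs[0]]
    back = []
    for row in probs[1:]:
        new_score, pointers = [], []
        for k in range(n_classes):
            # Best previous class for ending in class k at this minute.
            best_prev = max(
                range(n_classes),
                key=lambda j: score[j] + (log_stay if j == k else log_switch),
            )
            trans = log_stay if best_prev == k else log_switch
            new_score.append(score[best_prev] + trans + math.log(row[k] + eps))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final class.
    path = [max(range(n_classes), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Classes: 0 = car, 1 = bus. Minute 3 is a low-confidence flip to "bus".
noisy = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.85, 0.15]]
print(viterbi_smooth(noisy))
```

A sustained change of mode still comes through, because several consecutive confident minutes outweigh the cost of one switch.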
Solution architecture and key components_
The system was built on a flexible microservices architecture, supporting scalability and seamless integration with the client’s existing systems.
The solution included the following key components:
- Time-series data processing modules: responsible for processing data collected by users’ mobile devices.
- Transportation classification modules: performing transportation mode identification based on machine learning models.
- Monitoring and analytics services: enabling access to data, reports, and result visualizations.
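As a sketch of what a time-series processing step might do before classification, the example below groups raw samples into one feature row per minute. The input format of `(timestamp_s, speed_mps)` tuples and the chosen features are assumptions for illustration, not the project's actual schema.

```python
from collections import defaultdict
from statistics import mean, pstdev


def minute_features(samples):
    """Group (timestamp_s, speed_mps) samples into one feature row per minute."""
    buckets = defaultdict(list)
    for timestamp_s, speed_mps in samples:
        buckets[int(timestamp_s // 60)].append(speed_mps)
    return {
        minute: {
            "n": len(values),
            "mean_speed": mean(values),
            "std_speed": pstdev(values),
        }
        for minute, values in sorted(buckets.items())
    }


samples = [(0, 1.0), (30, 3.0), (61, 10.0)]
features = minute_features(samples)
print(features)
```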
The solution operated within the AWS cloud, utilizing services such as:
- Amazon S3 for storing large datasets,
- Amazon EC2 for advanced computations during model training,
- AWS Lambda for deploying models and managing their real-time operation.
Core technologies, such as CNNs and Transformers, were central to the algorithm, ensuring accurate classification and enabling the system to be easily extended to include new modes of transportation in the future.
Methodology_
The project utilized a range of frameworks and tools that enabled the efficient achievement of objectives. PyTorch was used to build machine learning models, allowing for flexible design and training of neural networks, including Convolutional Neural Networks (CNNs) and Transformers.
Pandas, NumPy, and Scikit-Learn were employed for time series data management and processing, while Matplotlib and Plotly were used for visualizing the results.
The entire process was managed and monitored using AWS, with Jira facilitating the overall algorithm development process.
Technologies used_
The project used a number of frameworks and tools to effectively achieve its goals.
PyTorch was used to build the machine learning models, which allowed flexible design and training of neural networks, including Convolutional Neural Networks (CNNs) and Transformers.
Pandas, NumPy, and Scikit-Learn were used to manage and process the time-series data, while visualizations of the results were created using Matplotlib and Plotly. The overall process was managed and monitored using AWS, and Jira was used to track the algorithm development process.
Benefits of the solution_
The solution significantly improved transport mode recognition for the client, raising accuracy on the initial test set from approximately 80% to 96%. The remaining errors were partly due to inaccuracies in the test dataset itself: the residual 4% fell within the margin of human error.
Additionally, the delivered model processed a reduced volume of data thanks to compression, which significantly lowered computational and storage costs and sped up the neural network’s processing.
The new model improved the user experience by classifying trips far more accurately. This translated into positive user feedback, as the application could correctly calculate the CO2 emissions generated during travel and reward users accordingly.
Challenges_
During the development of the algorithm, numerous challenges were encountered. A major issue was distinguishing between buses and cars, which was effectively addressed by modifying the neural network architecture, applying additional statistical data processing, and focusing the model on information extracted from the accelerometer.
From the start, the size and speed of the model also posed a challenge, as both the model and its data processing had to be time-efficient. To achieve this, the code was optimized using the PyTorch library, while data processing was enhanced with Polars and NumPy.
Another significant issue was the quality of training data, which did not always reflect real-world scenarios.
The solution involved initially implementing filtering based on logical and statistical factors. However, as the model improved and the need for accurate data grew, custom tools for manual analysis were developed and used in extreme cases.
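A logical filter of this kind can be as simple as rejecting training segments whose measured speed is implausible for the attached label. The sketch below is one such check; the speed ranges, the label set, and the helper names are hypothetical and chosen only to show the idea.

```python
from statistics import median

# Hypothetical plausible speed ranges in km/h per label; placeholder values.
PLAUSIBLE_KMH = {
    "walking": (0.0, 10.0),
    "bicycle": (3.0, 45.0),
    "car": (0.0, 200.0),
    "stationary": (0.0, 2.0),
}


def is_plausible(label, speeds_kmh):
    """Logical check: the median speed must fit the labeled mode."""
    low, high = PLAUSIBLE_KMH[label]
    return low <= median(speeds_kmh) <= high


def filter_segments(segments):
    """Keep only (label, speeds) pairs that pass the plausibility check."""
    return [(label, s) for label, s in segments if is_plausible(label, s)]


segments = [
    ("walking", [4.0, 5.0, 6.0]),
    ("walking", [60.0, 70.0, 65.0]),  # likely mislabeled: far too fast
    ("stationary", [0.0, 0.5, 1.0]),
]
kept = filter_segments(segments)
print([label for label, _ in kept])
```

Using the median rather than the mean keeps a single GPS spike from disqualifying an otherwise clean segment.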
Smaller challenges also arose, such as optimizing selected processes within the infrastructure, attempting to use mislabeled data in training, dealing with model training stability issues, and analyzing incorrect outputs. However, most of these were minor and could be resolved with focused attention and effort.
As a result, the overall system operation costs were significantly reduced in some areas. Additionally, both the speed and quality of the results improved markedly.
Lessons learned_
The project provided valuable experience in working with advanced neural network models based on CNN and Transformer architectures. Throughout its implementation, it was repeatedly confirmed that the quality and preprocessing of data play a crucial role, often even more so than its quantity. Detailed data analysis, both before starting and during the work, proved essential, while the method of data collection and quality assessment required precise alignment with the provider. Minor details, such as brief GPS signal loss or misclassifying traffic jams as stops, could significantly disrupt the model’s performance. Ultimately, the network achieved much better results when trained on a smaller amount of high-quality data with carefully selected features than on a full, unfiltered dataset.
Additionally, the incremental approach to experiments proved highly effective. It allowed for precise progress monitoring and enabled informed decisions to either abandon ineffective approaches or fully implement proven solutions. Supporting experiments with statistical indicators, such as a confusion matrix, F1 table, or custom metrics and charts tailored to the specific problem (e.g., a prediction consistency index or visualization of results on a map), significantly improved the analysis process and outcomes.
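The indicators mentioned above can be computed with very little code. The sketch below builds a confusion matrix and per-class F1 scores, plus one possible reading of a prediction consistency index: the share of consecutive minutes whose predicted label does not change. That definition is an assumption; the source does not define the team's custom metric.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def f1_per_class(matrix, labels):
    """F1 = 2*TP / (2*TP + FP + FN), computed for each class."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def consistency_index(y_pred):
    """Assumed definition: share of consecutive minutes with an unchanged label."""
    if len(y_pred) < 2:
        return 1.0
    same = sum(a == b for a, b in zip(y_pred, y_pred[1:]))
    return same / (len(y_pred) - 1)


labels = ["car", "bus"]
y_true = ["car", "car", "car", "bus", "bus", "bus"]
y_pred = ["car", "car", "bus", "bus", "bus", "car"]

matrix = confusion_matrix(y_true, y_pred, labels)
f1 = f1_per_class(matrix, labels)
print(matrix, f1, consistency_index(y_pred))
```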
The exceptional value of scientific publications and technical documents available online is also worth highlighting. These resources often offer ready-made solutions to problems similar to those encountered in the project. Combining such external insights with solutions developed within the team yielded excellent results.
Summary_
The project focused on significantly improving the precision of transport mode classification in the client’s application, increasing accuracy from approximately 80% to 96%. By implementing data compression, the model processed smaller amounts of information, reducing computational costs and improving processing speed.
From a business perspective, the enhanced classification positively impacted user experience, increasing the reliability of carbon footprint calculations and likely motivating users to adopt more eco-friendly behaviors.