Technical Advancements and Importance of Time-Series Data Augmentation

Time series data is a key factor in determining the performance of AI-based predictive models. In particular, the importance of time series data is increasing day by day in various areas such as demand forecasting, price forecasting, and failure prediction in the business environment.

In practice, however, it is hard to obtain enough high-quality time series data. Time series data augmentation addresses this shortage by generating new data points while preserving the statistical characteristics of the original data, and it is maturing into an advanced methodology in its own right.


Innovative development of time domain-based augmentation technology

Data augmentation in the time domain is a core technology that leverages the temporal characteristics of time-series data to generate new data.

Time shifting creates a new sequence by shifting the original data along the time axis within a bounded range. The key constraint is that the shifted sequence must preserve the temporal continuity and causality of the data.
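As a minimal sketch of time shifting in Python with NumPy, the function below shifts a series by a random lag and fills the vacated positions with the nearest edge value, so that no value wraps around from the other end of the series. The name `time_shift`, the `max_shift` parameter, and the edge-padding choice are illustrative assumptions rather than a prescribed method.

```python
import numpy as np

def time_shift(series, max_shift, rng=None):
    """Return a copy of `series` shifted by a random lag in [-max_shift, max_shift].

    Vacated positions repeat the nearest edge value, so the order of the
    observations is preserved and nothing wraps around.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(series, dtype=float)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    if shift == 0:
        return x.copy()
    shifted = np.empty_like(x)
    if shift > 0:
        # Delay: every value moves to a later time step.
        shifted[:shift] = x[0]
        shifted[shift:] = x[:-shift]
    else:
        # Advance: every value moves to an earlier time step.
        shifted[shift:] = x[-1]
        shifted[:shift] = x[-shift:]
    return shifted
```

A small `max_shift` relative to the dominant cycle length keeps the shifted series plausible. When the series is paired with a forecasting target, the target would need to be shifted by the same lag.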

In one study, a technique was developed that uses the Dynamic Time Warping (DTW) algorithm to preserve the local characteristics of time-series data while transforming the overall pattern.
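The study's exact procedure is not described here, so the following is only a minimal sketch of the alignment step that DTW itself provides, written in Python with NumPy. The function name `dtw_path` and the absolute-difference cost are illustrative assumptions. The recovered path shows which points of two series are matched, which is the information a DTW-guided augmentation would need in order to warp one series while keeping its local shapes.

```python
import numpy as np

def dtw_path(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between two 1-D series.

    Returns the total alignment cost and the warping path as (i, j) index pairs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end to recover which points were matched.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(cost[n, m]), path[::-1]

# The second series holds the value 1 for one extra step; DTW absorbs the stretch.
total_cost, path = dtw_path([0, 1, 2], [0, 1, 1, 2])
```

In this example the alignment cost is zero because the two series differ only in timing, not in shape.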

This time-domain augmentation performs especially well on data with strong periodicity, and in actual application cases it improved prediction accuracy by an average of 15%.


Securing data diversity and improving quality through amplitude modulation

Data augmentation in the amplitude domain generates varied scenarios by adjusting the magnitude and variability of time-series data. Gaussian noise injection increases the robustness of the model by adding statistically controlled noise to the original data. The strength of the noise should be set carefully, relative to the standard deviation of the original data.

Amplitude scaling adjusts the overall scale of the data while maintaining the statistical characteristics of the original data, which is particularly effective for financial time series data.
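Both amplitude-domain techniques fit in a few lines of Python with NumPy. In this sketch the noise level is expressed as a fraction of the series' own standard deviation, and scaling multiplies the whole series by a single random factor. The function names and the default values (`noise_ratio=0.05`, a scaling range of 0.9 to 1.1) are illustrative assumptions, not recommended settings.

```python
import numpy as np

def add_gaussian_noise(series, noise_ratio=0.05, rng=None):
    """Add zero-mean Gaussian noise whose std is `noise_ratio` times the series std."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(series, dtype=float)
    sigma = noise_ratio * x.std()
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale_amplitude(series, low=0.9, high=1.1, rng=None):
    """Multiply the whole series by one random factor drawn from [low, high)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(series, dtype=float)
    factor = rng.uniform(low, high)
    return x * factor
```

Using one factor for the whole series changes its level but leaves its shape, and therefore its correlation with the original, unchanged.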

These amplitude-based augmentation techniques are very effective for training models to respond to various market conditions and volatility.


Advanced augmentation techniques using frequency domain transformation

Data augmentation in the frequency domain is an advanced technique that takes advantage of the periodic nature of time series data. After converting time series data to the frequency domain using the Fourier transform, new data is generated by adjusting the amplitude or phase of a specific frequency band.

This method keeps the basic periodicity of the data while diversifying its fine-grained patterns. When it was applied to electricity demand forecasting, it diversified daily volatility while preserving long-term patterns such as seasonality.
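The frequency-domain procedure can be sketched with NumPy's real FFT. The function below leaves the lowest-frequency bins untouched, so that the mean level and the slowest cycles survive, and randomly perturbs the amplitude and phase of the remaining bins. The name `perturb_spectrum`, the `keep_lowest` cutoff, and the noise levels are assumptions chosen for illustration. Which bins count as "long-term" depends on the series length and sampling rate.

```python
import numpy as np

def perturb_spectrum(series, amp_sigma=0.1, phase_sigma=0.1, keep_lowest=3, rng=None):
    """Perturb amplitude and phase of all but the `keep_lowest` frequency bins."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(x)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    k = min(keep_lowest, len(spectrum))
    n_high = len(spectrum) - k
    # Scale amplitudes by a factor near 1 (clipped at 0) and nudge the phases.
    amplitude[k:] *= np.maximum(1.0 + rng.normal(0.0, amp_sigma, size=n_high), 0.0)
    phase[k:] += rng.normal(0.0, phase_sigma, size=n_high)
    new_spectrum = amplitude * np.exp(1j * phase)
    # Transform back to the time domain at the original length.
    return np.fft.irfft(new_spectrum, n=len(x))
```

Raising `keep_lowest` protects more of the slow structure, and lowering the two sigma values keeps the generated series closer to the original.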


Performance improvement mechanism of prediction model using augmented data

Technical principle of improving model robustness

Training with augmented data dramatically improves model robustness. A model trained on data containing varied fluctuation patterns and noise can maintain stable performance under the unexpected changes that occur in real environments.

In a manufacturing demand forecasting case, a model trained with data augmentation showed 30% more stable forecasting performance than the existing model, even under sudden market fluctuations. This means that data augmentation improves the quality of the model's learning itself, rather than simply increasing the amount of data.

In particular, the ability to handle outliers has been improved, enabling stable forecasting even in the case of sudden market changes or unexpected events.


Systematic process of improving generalization ability

Models trained with augmented data have significantly improved generalization ability for new patterns. Augmented data allows models to experience various scenarios, preventing overfitting and improving prediction performance in real-world environments.

In financial time series forecasting, models using augmented data showed 20% higher adaptability to new market trends than existing models. They also showed more stable performance in the cross-validation process, which significantly increased the practical value of the models.

This improvement in generalization ability was particularly noticeable in long-term forecasting: the longer the forecast horizon, the larger the performance gap with existing models.


Improved accuracy of uncertainty estimation

Augmenting time-series data allows for more accurate estimation of uncertainty in the predicted value. Models trained with augmented datasets can provide more reliable confidence intervals along with the predicted value.

This is a very important factor in business decision-making, as it directly helps with risk management and resource allocation optimization. In one case, a company that used this improved uncertainty estimation in its inventory management system was able to reduce its safety stock level by 15% while maintaining the same out-of-stock rate.
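The link between tighter uncertainty estimates and lower safety stock can be illustrated with the standard textbook formula, safety stock = z × σ × √L, where z is the normal quantile for the target service level, σ is the per-period forecast error standard deviation, and L is the lead time in periods. How the company in this case actually computed its safety stock is not stated, so the Python sketch below and all of its numbers are hypothetical.

```python
import math
from statistics import NormalDist

def safety_stock(service_level, forecast_error_std, lead_time_periods):
    """Textbook safety stock: z * sigma * sqrt(lead time)."""
    z = NormalDist().inv_cdf(service_level)
    return z * forecast_error_std * math.sqrt(lead_time_periods)

# Hypothetical numbers: same 95% service level, forecast error std drops from 100 to 85 units.
before = safety_stock(0.95, 100.0, 4)
after = safety_stock(0.95, 85.0, 4)
reduction = 1.0 - after / before
```

Because the formula is linear in σ, a 15% reduction in forecast error standard deviation yields a 15% reduction in safety stock at an unchanged service level.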


Industrial applications of time series data augmentation

Construction and manufacturing industries

The semiconductor industry has adopted an innovative data augmentation approach to predict equipment failures.

By combining two-dimensional time series data channel-wise, the amount of data was effectively expanded without compromising the characteristics of the original data. This approach has achieved remarkable results, improving the accuracy of process risk detection by up to 16.7%.

The case of Tirautech shows the potential of data augmentation in a manufacturing execution system (MES) environment. It verified that limited time-series data was suitable for training and effectively solved the problem of underfitting, which led to the optimisation of the manufacturing process and improved quality control.

Data augmentation techniques are also being used innovatively in the field of object recognition at construction sites.

Particularly noteworthy is a case in which six different augmentation techniques were experimentally applied to a very limited dataset of 50 images. This approach, combined with the YOLOv10 algorithm, has greatly contributed to safety monitoring and improved work efficiency at construction sites.


Demand forecasting

E-Mart, a leading Korean retailer, has reduced its demand forecasting error by 18% by using machine learning.

E-Mart operates about 140 stores nationwide, and each store is open for business 340 days a year. There are about 40 independent variables that need to be considered.

In this situation, demand forecasting based on past data alone makes it difficult to flexibly respond to unexpected external variables such as COVID-19.

To address this, E-Mart built a predictive model on two years of data. By learning how the sales volumes of specific products change with circumstances and with internal and external conditions, the machine learning model reportedly reduced the error rate of E-Mart's demand forecast by 18%.

IMPACTIVE AI has built a custom model specialized in forecasting demand for flagship products, using advanced time series models such as transformers. It has also built a model that captures changes in the environment by learning from more than 6 million pieces of external environmental data.

The resulting solution, Deepflow, provides insight into the optimal purchase price by predicting the prices of raw materials such as metals, plastics, glass, and lithium, which are the main raw materials for home appliances and mobile devices. It also predicts the required quantity alongside the raw material price, which helps determine the optimal purchase timing and order quantity.


Semiconductor industry

Similarly, data augmentation techniques are also being used in the failure detection system for semiconductor equipment.

In this field, multivariate data is applied to improve the failure prediction model, and augmentation techniques are applied to generate training data to solve the problem of data imbalance.

This has resulted in increased accuracy in failure detection and reduced maintenance costs.
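One simple way to use augmentation against class imbalance is to oversample the rare failure windows with small random perturbations until both classes are the same size. The Python sketch below assumes multivariate windows shaped `(n_windows, n_steps, n_channels)` and binary labels with `1` marking failures. The function name, the label convention, and the `noise_ratio` default are illustrative assumptions, not the method used in any particular fab.

```python
import numpy as np

def oversample_minority(windows, labels, minority_label=1, noise_ratio=0.03, rng=None):
    """Balance a two-class dataset by adding jittered copies of minority windows."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(windows, dtype=float)
    y = np.asarray(labels)
    minority = X[y == minority_label]
    n_needed = int((y != minority_label).sum()) - len(minority)
    if len(minority) == 0 or n_needed <= 0:
        return X.copy(), y.copy()
    # Draw minority windows with replacement as the basis for synthetic samples.
    picks = rng.integers(0, len(minority), size=n_needed)
    base = minority[picks]
    # Scale the noise to each window's own variability along the time axis.
    sigma = noise_ratio * base.std(axis=1, keepdims=True)
    synthetic = base + rng.normal(size=base.shape) * sigma
    X_out = np.concatenate([X, synthetic])
    y_out = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_out, y_out
```

Synthetic windows like these belong in the training split only. Validation and test data should remain real observations so that reported detection accuracy is not inflated.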


Financial industry

In the financial sector, time series data augmentation techniques are being used to improve the performance of fraud detection models.

For example, synthetic fraud cases can be generated and used to train the model so that it detects fraud more accurately in real-world scenarios. This approach increases the diversity of the data and contributes to improving the model's generalisation ability.


Traffic volume prediction

Data augmentation techniques were also used in the traffic volume prediction study. This study compared and analyzed various data augmentation techniques and explored ways to improve traffic volume prediction performance by applying deep learning models.

Data augmentation techniques are essential because traffic volume data is often imbalanced or insufficient. Broadly speaking, the following data augmentation methods are applied.

Noise addition: Simulate various situations by adding noise to the original data. This helps the model learn the changes in traffic volume in various environments.
Time shifting: Generate new data by shifting the time axis of the data. For example, you can train a prediction model by applying data from a specific time period to another time period.
Scaling: Generate a range of data by adjusting the scale of the data. This is useful for models to learn various traffic volume levels.
Synthetic data generation: Generate new synthetic data by combining multiple data sources. This increases the diversity of data and improves the model's generalisation ability.
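The four methods in the list above can be sketched together in Python with NumPy. The function takes one traffic series plus a second series from another source (for example, a neighbouring sensor) and returns one variant per method. The function name, the parameter ranges, the wrap-around shift, and the weighted-average form of "combining sources" are all assumptions made for illustration; the study's own implementations are not described here.

```python
import numpy as np

def augment_traffic(series, other, rng=None):
    """Return one augmented variant per method for a traffic volume series."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(series, dtype=float)
    z = np.asarray(other, dtype=float)
    shift = int(rng.integers(1, 4))
    weight = rng.uniform(0.3, 0.7)
    return {
        # Noise addition, clipped at zero because volumes cannot be negative.
        "noise": np.clip(x + rng.normal(0.0, 0.05 * x.std(), size=x.shape), 0.0, None),
        # Time shifting with wrap-around, reasonable only for periodic data.
        "time_shift": np.roll(x, shift),
        # Scaling the whole series to represent lighter or heavier traffic.
        "scaling": x * rng.uniform(0.8, 1.2),
        # Synthetic series as a weighted average of two sources.
        "synthetic": weight * x + (1.0 - weight) * z,
    }
```

Each variant has the same length as the input, so the results can be appended directly to a training set alongside the original series.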

Research is being conducted to improve traffic volume prediction performance by applying these data augmentation techniques.


Points to note when selecting data augmentation methods and future prospects

Selection and development of time series data augmentation techniques: Key points

The most important thing in the selection and development of time series data augmentation techniques is to accurately understand the intrinsic characteristics of the data and adopt the optimal approach.

The suitability of an augmentation technique should be determined by thoroughly analysing the periodicity, seasonality, and trend patterns in the demand forecast data. This analysis is a key factor in maximising the performance of advanced AI systems such as Deepflow.

In particular, strategic selection of data augmentation techniques is essential for making effective use of the vast datasets provided by Deepflow, which include more than 600,000 external price data points, more than 5 million market environment data points, and more than 6 million trend data points.

For data involving complex non-linear relationships, deep learning-based augmentation is effective, and statistical-based augmentation is appropriate when probability distributions or statistical characteristics are important. In addition, it is desirable to apply time series transformation and decomposition techniques to data showing strong periodicity or seasonality.
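For strongly seasonal data, a decomposition-based augmentation can be sketched as follows: split the series into trend, seasonal, and residual parts, reshuffle only the residual, and add the parts back together. The Python sketch below uses a simple moving-average decomposition with NumPy. The function name and the choice to permute residuals (which assumes they are roughly independent over time) are illustrative assumptions, and the series must span at least one full period.

```python
import numpy as np

def seasonal_residual_bootstrap(series, period, rng=None):
    """Keep trend and seasonal components, shuffle the residual component."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Trend: moving average over one full period, with edge padding.
    pad_left = period // 2
    pad_right = period - 1 - pad_left
    padded = np.pad(x, (pad_left, pad_right), mode="edge")
    trend = np.convolve(padded, np.ones(period) / period, mode="valid")
    # Seasonal: average detrended value at each position within the period.
    detrended = x - trend
    phase = np.arange(n) % period
    seasonal_means = np.array([detrended[phase == p].mean() for p in range(period)])
    seasonal = seasonal_means[phase]
    # Residual: whatever trend and seasonality do not explain.
    residual = x - trend - seasonal
    return trend + seasonal + rng.permutation(residual)
```

If the residuals are autocorrelated, shuffling them individually destroys that structure; resampling contiguous blocks of residuals would be a more careful variant.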

Deepflow combines these various augmentation techniques with 224 AI prediction models to systematically compare and verify performance, and creates models optimised for the characteristics of each item and product.

This approach enables predictions that take into account various variables related to future situations, rather than simply relying on past data, and thus produces more accurate and reliable results.


Expansion of multimodal data integration and augmented technology

Multimodal data integration collects various forms of data, such as text, images, voice, and video, and analyses them together. This makes it possible to identify complex information that individual data sources cannot reveal on their own, and to discover new meaning.

Meanwhile, augmented technology is a technology that superimposes digital information on the real world.

Augmented reality (AR), virtual reality (VR), and mixed reality (MR) are representative examples. Typical uses include characters appearing in real space in smartphone games and information about buildings displayed through smart glasses.

Figure: Expansion of multimodal data integration and augmented technology (source: Multimodal Seed Data Augmentation for Low-Resource Audio Latin Cuengh Language)

Recently, techniques that integrate time-series data with other types of data and augment them together have also been advancing. Augmentation techniques that combine various types of data, such as text, images, and sensor data, are expected to enable more sophisticated predictions.

When multimodal data integration and augmented technology meet, computers can better understand our words and actions and provide us with tailored information. For example, if you ask a question to your smartphone with your voice, you can get a detailed answer with an image.

It can also provide more realistic experiences in various fields such as gaming, education, and medicine. You can receive realistic training in a virtual space or learn how to assemble a complex machine through augmented reality.

Ultimately, multimodal data analysis can be used to discover new patterns and relationships, and develop new services and products based on them. For example, data from an entire city can be analysed to improve traffic flow or provide personalised healthcare services.


Conclusion

Time-series data augmentation technology has become a key tool for dramatically improving the performance of AI prediction models. This technology is evolving beyond simply increasing the amount of data to fundamentally improve the quality of model training and the reliability of predictions.

As new augmentation techniques are developed and applied, AI prediction models will deliver more accurate and reliable results, bringing greater value to corporate decision-making.