The Journal of Korean Institute of Information Technology - Vol. 20, No. 11, pp. 137-146
Abbreviation: Journal of KIIT
ISSN: 1598-8619 (Print) 2093-7571 (Online)
Print publication date 30 Nov 2022
Received 21 Sep 2022 Revised 21 Oct 2022 Accepted 24 Oct 2022
DOI: https://doi.org/10.14801/jkiit.2022.20.11.137

A Comparison of Time Series Forecast Models for Predicting the Outlier Particles in Semiconductor Cleanroom
Saksonita Khoeurn* ; Jae sung Kim** ; Wan sup Cho**
*Dept. of Big Data at Chungbuk National University
**Professor at Chungbuk National University (Corresponding Author)

Correspondence to : Jae sung Kim and Wan sup Cho Dept. of Big Data, Chungbuk National University Tel.: +82-43-261-3636 Email: comkjsb@chungbuk.ac.kr, wscho@chungbuk.ac.kr


Abstract

Cleanroom cleanliness is essential to maintaining product quality in the semiconductor manufacturing process. In a cleanroom environment, particles (airborne particles) decisively influence product quality and yield. Therefore, predicting and removing airborne particles to maintain cleanroom cleanliness is an essential factor in optimizing the semiconductor cleanroom environment. This study compared statistical, machine learning, and deep learning models, namely (S)ARIMAX, Facebook Prophet, LightGBM, XGBoost, GRU, and Bi-LSTM, for predicting fine-dust anomalies in a semiconductor cleanroom and evaluated which model is most suitable for the prediction task. The results identify GRU and LightGBM as efficient models for developing a prediction model for semiconductor cleanrooms. The paper offers insight into selecting resource-efficient models and is expected to serve as a practical guide for future research.



Keywords: predictive maintenance, LightGBM, Bi-LSTM, GRU, semiconductor, cleanroom maintenance

Ⅰ. Introduction

With the influence of the Fourth Industrial Revolution, the system semiconductor industry is advancing into a service industry in which software is integrated. Semiconductors are also essential to high-tech sectors such as smart devices, wearable devices, smart cars, and Artificial Intelligence(AI) technology, making them a crucial driver across many fields. Because system semiconductors are a promising, high-technology, high-growth, high-precision, and high-value-added industry serving high-tech IT demand, they have become the core of technological competition between countries and manufacturers alike.

To maintain and ensure the quality of manufacturing production, managing a clean production environment is a must for the manufacturer. In particular, the large number of airborne particles inside a controlled room, called a "cleanroom", must be managed[1]. Particle counts above the set limits can lead to production defects or failures, so excess particles inside the room must be removed to avoid the problem. Predictive maintenance should therefore foresee unexpected anomalies and alert the person in charge to make decisions[2].

Many academic and industry researchers have developed prediction algorithms, ranging from statistical to machine learning and deep learning methods. When deciding on a forecasting algorithm, decision-makers must also study different aspects of the prediction process, such as forecasting horizon length, objectives, frequency, structure, and data quality[3]. Choosing a suitable model for a prediction task can thus be painful. Therefore, the objectives of this study are the following: (1) to perform the prediction task using three types of methods, namely statistical, machine learning, and deep learning models, and (2) to compare those methods' performances.

This paper compares solutions to the prediction task and identifies the best among them. Specifically, Section II explains each chosen algorithm, Section III describes the methodology of the study, and Section IV discusses the implementation and results of the analysis. Finally, the paper concludes with a discussion and directions for future study.


Ⅱ. Literature Review

This section provides a brief review of the methodology used in the study and a short overview of the cleanroom.

2.1 Cleanroom

A cleanroom is designed to control the concentration of suspended particles and to minimize particle inflow, generation, and retention. It refers to a workspace where temperature, humidity, pressure, and related variables must be controlled. The cleanroom environment is a stepping stone to the stability and accuracy that guarantee the production quality of semiconductors. Since production efficiency rises when defects in the semiconductor manufacturing process are reduced, the semiconductor manufacturing smart factory must be maintained in accordance with cleanroom management standards[4].

2.2 Statistical Modelling

Statistical modeling is a mathematically specified method of approximating the process that generated the data and making forecasts from this approximation. Many statistical models, such as autoregressive(AR) models, moving average(MA) models, and autoregressive integrated moving average(ARIMA) models, have been widely used in time-series forecasting, and they continue to be applied in settings ranging from academic research to industrial modeling.

2.2.1 Autoregressive Integrated Moving Average Exogenous(ARIMAX)

ARIMA is a forecasting model for univariate data and is thus limited for this study's dataset, which is multivariate. A related model, ARIMAX, can account for more than just past values or residuals: the X appended to ARIMA stands for "exogenous." The model adds external variables to assist in modeling the endogenous variable[5]. In time series, an exogenous variable is a parallel time series that is not modeled directly but is used as a weighted input to the model. The model therefore fits a univariate time series with trend and seasonal components together with exogenous variables.
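As a concrete illustration, the following is a minimal sketch of fitting an ARIMAX-type model with the statsmodels SARIMAX class. The DataFrame df, its column names, and the future exogenous frame X_future are illustrative assumptions, not the paper's exact schema.

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# df: DataFrame with a DatetimeIndex; column names are assumptions.
y = df["particle05"]                              # endogenous target series
X = df[["temperature", "humidity"]]               # exogenous regressors (the "X")

model = SARIMAX(y, exog=X, order=(5, 0, 2))       # (p, d, q); d=0, no differencing
result = model.fit(disp=False)
print(result.aic)                                 # goodness-of-fit criterion

# Forecasting requires future values of the exogenous variables:
pred = result.forecast(steps=60, exog=X_future)   # X_future: 60 future rows of X
```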

2.3 Machine Learning Models

Many classification- and regression-based machine learning models have been widely used for forecasting tasks, ranging from well-known general-purpose libraries to decision-tree-based algorithms.

2.3.1 Facebook Prophet

Developed by Facebook, the model is known for its simplicity: a modular regression built from the key components of trend, seasonality, and holidays. This approach has a variety of practical benefits; in particular, the seasonal component flexibly models periodic changes such as weekly and yearly seasonality[6].
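A minimal sketch of a Prophet fit under these components follows; it assumes a training DataFrame train_df with Prophet's required columns ds (timestamp) and y (target), and the package published as prophet (formerly fbprophet).

```python
from prophet import Prophet

m = Prophet(daily_seasonality=True, weekly_seasonality=True)
m.fit(train_df)                                            # columns: ds, y

future = m.make_future_dataframe(periods=60, freq="min")   # 60 minutes ahead
forecast = m.predict(future)
fig = m.plot_components(forecast)                          # trend / weekly / daily panels
```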

2.3.2 Light Gradient Boosting Machine(LightGBM)

LightGBM[7] utilizes the boosting method to combine multiple weak learners into a strong learner. This robust and expressive model can be applied to most regression problems. Among general regression-tree algorithms, LightGBM has the advantages of preventing over-fitting, strong generalization ability, faster training speed, and lower memory consumption.
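A minimal regression sketch with the lightgbm Python package follows; X_train, y_train, X_valid, and y_valid are assumed to be the engineered features and target described later in the paper, and the hyperparameter values are illustrative assumptions.

```python
import lightgbm as lgb

model = lgb.LGBMRegressor(n_estimators=1000, learning_rate=0.05)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # curbs over-fitting
)
pred = model.predict(X_valid)
```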

2.3.3 eXtreme Gradient Boosting(XGBOOST)

XGBoost is a scalable boosted-tree machine learning technique. It takes the combined results of K trees as the final predicted value and supports shrinkage and column subsampling. Besides, XGBoost uses greedy and approximate algorithms to find the most acceptable split point[8].

2.4 Deep Learning Models

Deep learning neural networks can automatically learn arbitrarily complex mappings from inputs to outputs and natively support multiple inputs and outputs. These robust features open many prospects for time series forecasting, especially on problems with complex nonlinear dependencies, multi-step forecasting, and multivariate inputs. Moreover, the capabilities of more modern neural networks deliver significant promise, including the automatic feature learning offered by convolutional neural networks and the support for sequence data in recurrent neural networks. Among the neural network families, Recurrent Neural Networks(RNNs) surpass the other two, namely Convolutional Neural Networks(CNN) and Multilayer Perceptrons(MLP)[9].

2.4.1 Gated Recurrent Unit(GRU)

The GRU architecture was introduced in [10] and is regarded as a variation of Long Short-Term Memory(LSTM), as it uses similar mechanisms organized differently. To solve the vanishing gradient problem of a standard RNN, the GRU employs the so-called update gate and reset gate. These two vectors determine what information should be passed to the output. Their notable strength is that they can be trained to keep information from long ago without washing it out over time and to discard information unrelated to the prediction.

2.4.2 Bidirectional Long Short-Term Memory (Bi-LSTM)

A Bi-LSTM simply puts two independent recurrent networks together. This structure enables the network to hold both backward and forward information about the sequence at every step: unlike a unidirectional LSTM, a Bi-LSTM processes the input in two directions, one from past to future and one from future to past.


Ⅲ. Methodology

The following methodology, shown in Fig. 1, was used for particle forecasting:


Fig. 1. 
Flow chart of proposed method

⦁ Data sources: the datasets were integrated from several sources: (i) Weather data, collected through an API for the manufacturer's location; (ii) Differential pressure, collected from a sensor type installed in each zone; (iii) Temperature, humidity, and particles, collected from another sensor type installed in each zone; and (iv) Heating, Ventilation, and Air Conditioning(HVAC) variables from each zone where a further sensor type is installed. All data sources were selected within the same time frame for the study.

⦁ Data Preparation: this part comprises several sub-processes that prepare the data for modeling. First, the selected data were carefully checked for missing values and for chronological order. The study then performed fundamental statistical analysis, mainly the analysis needed for time-series data. Next, feature engineering prepared the datasets for the required models: moving averages and lagged values were generated for ARIMAX, Prophet, XGBOOST, and LightGBM, while the data were scaled for the deep learning models GRU and BiLSTM. Finally, the study split the datasets into training and validation sets in an 80%/20% ratio.

⦁ Analytics Models: the models were defined independently and trained on the data described above. The details of each model are explained in Section IV.

⦁ Evaluate the Models: after training, the validation data were used for prediction with all the models. Three metrics were used in this study to evaluate and compare the performance of the models (a computation sketch follows this list):

- Root Mean Square Error(RMSE): a commonly used measure of the differences between the values predicted by a model and the values observed.

- Mean Absolute Error(MAE): the most direct measure of prediction accuracy. As the name indicates, MAE is the mean of the absolute errors, where an absolute error is the absolute value of the difference between the forecasted and the actual value.

- Performing Time: the computation time of each model to identify the fastest one.
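As a minimal sketch, the two accuracy metrics can be computed with scikit-learn as below, assuming y_true holds the observed validation values and y_pred a model's forecasts.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # Root Mean Square Error
mae = mean_absolute_error(y_true, y_pred)           # Mean Absolute Error
```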


Ⅳ. Implementation and Result
4.1 Selected Dataset

Because THP(temperature, humidity, and particle) data were intermittently unavailable, the study selected only the data from November 29 to December 30, 2021, the period with the fewest missing values compared with other time frames. Missing values within that window were filled using the forward-fill method, which carries forward the last known value before the missing one. Notably, the sensor data are recorded at one-minute intervals. Previous research with the same data sources also found that Zone 100 contained most of the outliers in the particle variables.

Meanwhile, the external weather data from the API are collected at one-hour intervals. The study up-sampled them and interpolated neighboring data points to estimate the missing values.
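A minimal pandas sketch of both gap-filling steps follows, assuming sensors and weather are DataFrames with DatetimeIndexes at one-minute and one-hour frequency respectively (names are illustrative).

```python
# Sensor gaps: carry the last known value forward (forward fill).
sensors = sensors.ffill()

# Hourly weather: up-sample to the one-minute grid and interpolate
# between neighbouring hourly points.
weather_1min = weather.resample("1min").interpolate("linear")
```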

After integrating all the datasets from the data sources, 14 features with 40,564 records were selected for modeling.

4.2 Statistical Analysis
4.2.1 Exploratory Data Analysis(EDA)

As shown in Fig. 2, all three particle types contain outliers. We then examined the totals by day of the week and by hour of the day. Fig. 3 shows that all three particle counts peak on Thursday and, within a day, between 2 pm and 3 pm.


Fig. 2. 
Outliers present in zone 100


Fig. 3. 
Histogram grouped the particles by days and hours

4.2.2 Correlation Analysis

Since the data are not normally distributed and contain outliers, Spearman rank correlation analysis was conducted to check the correlation coefficients between the variables. As shown in Fig. 4, only particle05 is strongly correlated with particle03; there is no other significant relationship between the particle variables and the other variables. Therefore, the rest of the study focuses on forecasting particle05.


Fig. 4. 
Correlation analysis results using Spearman

4.2.3 Time Series Specific Exploratory Methods

⦁ Stationarity Test: the time series must be checked for stationarity, autocorrelation, trend, and seasonality. The study therefore used the Augmented Dickey-Fuller test(ADF) to check whether the data were stationary. The result shows that the data have no unit root and are stationary, meaning differencing is unnecessary.

⦁ Decompose Time Series: the study then decomposed the time series to check its trend and seasonality. As shown in Fig. 5, the data are seasonal, with a peak around 2 pm every day, matching the histogram from the EDA section mentioned earlier (a sketch of both checks follows Fig. 5).


Fig. 5. 
Seasonality test of particle 05
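A minimal statsmodels sketch of the two checks above, assuming series is the one-minute particle05 series; the daily period of 1,440 minutes is an assumption matching the observed daily 2 pm peak.

```python
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, *rest = adfuller(series)
print(p_value)  # p < 0.05: reject the unit root, i.e. the series is stationary

decomp = seasonal_decompose(series, period=1440)  # 1,440 minutes = one day
decomp.plot()                                     # trend / seasonal / residual panels
```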

4.2.4 Feature Engineering

Feature engineering for time-series data is relatively more involved than for simple datasets. The study computed moving averages and lagged values over seven-day, 15-day, and 30-day windows and included them, together with the existing features, as exogenous variables for the models. After adding these features, the dataset of 40,564 rows and 96 columns was ready to be split: the training data contain 32,500 rows, while the validation data contain 8,064 rows.
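A minimal pandas sketch of this feature construction and split; the window lengths below are placeholders, since the paper's windows must be expressed in the data's one-minute steps, and the column name is assumed.

```python
for w in (7, 15, 30):                     # placeholder window lengths, in steps
    df[f"ma_{w}"] = df["particle05"].rolling(window=w).mean()
    df[f"lag_{w}"] = df["particle05"].shift(w)

df = df.dropna()                          # rows without full history are unusable
split = int(len(df) * 0.8)                # chronological 80% / 20% split
train, valid = df.iloc[:split], df.iloc[split:]
```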

4.3 Analytics Models
4.3.1 ARIMAX, Facebook Prophet, LightGBM, and XGBoost

All models were trained on a desktop with an Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz processor and 16.0 GB of physical memory.

With the inclusion of exogenous variables, the following four models were trained on the engineered data and showed notable results.

⦁ SARIMAX: since the data are seasonal, a SARIMAX model was defined. The study used an existing library that performs an automatic stepwise search for the lowest Akaike Information Criterion(AIC), a mathematical measure of goodness of fit. The best model found was SARIMAX(5,0,2)(0,0,0)[0] with intercept, with AIC = 448901.578.

⦁ Facebook Prophet: the model performs decomposition, including trend, weekly, and daily pattern analysis. The result showed the same pattern as the histogram in the EDA.

⦁ LightGBM: the model performed feature selection automatically; 78 features were used in the model.

⦁ XGBOOST: time-series features such as date, hour, minute, day of the week, quarter, month, year, day of the year, day of the month, and week of the year were added to the data to enhance the model's performance (a sketch of this step is shown below).
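A sketch of the calendar-feature step and the XGBoost fit, assuming df has a DatetimeIndex and X_train/y_train/X_valid/y_valid come from the split described earlier; the hyperparameter values are illustrative assumptions.

```python
import xgboost as xgb

idx = df.index                                     # DatetimeIndex, one-minute steps
df["hour"] = idx.hour
df["minute"] = idx.minute
df["dayofweek"] = idx.dayofweek
df["quarter"] = idx.quarter
df["month"] = idx.month
df["year"] = idx.year
df["dayofyear"] = idx.dayofyear
df["dayofmonth"] = idx.day
df["weekofyear"] = idx.isocalendar().week.astype(int)

model = xgb.XGBRegressor(n_estimators=1000, learning_rate=0.05)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
pred = model.predict(X_valid)
```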

Fig. 6 and Table 1 below show the results of the four models. As highlighted in bold in Table 1, LightGBM outperforms the other three models with the lowest RMSE (93.42), the lowest MAE (56.54), and the shortest performing time (0.3 seconds).


Fig. 6. 
Comparison of four models

Table 1. 
RMSE, MAE scores, and performing time of the four models
Model RMSE MAE Performing time
ARIMAX 95.21 60.80 1745.947 seconds
Prophet 144.58 115.52 11.8 seconds
LightGBM 93.42 56.54 0.3 seconds
XGBOOST 120.52 66.96 0.4 seconds

4.3.2 Deep Learning Models: GRU and BiLSTM

Using the same training and validation data, the GRU and BiLSTM models were created with the same hyperparameters, described in Table 2 (a Keras sketch under these settings follows the table).

Table 2. 
Experimental setting
Parameters Values
Epoch Early stopping: 14
Batch size 16
Dropout rate 0.2
Optimizer Adam
Loss MSE
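A minimal Keras sketch of both networks under Table 2's settings; the layer width (64 units), the input window, and the reading of "early stopping: 14" as a patience of 14 epochs are assumptions the paper does not state.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(cell, n_steps, n_features):
    # cell: "gru" or "bilstm"; 64 units is an assumed layer width
    rnn = layers.GRU(64) if cell == "gru" else layers.Bidirectional(layers.LSTM(64))
    model = keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        rnn,
        layers.Dropout(0.2),                     # dropout rate from Table 2
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")  # optimizer and loss from Table 2
    return model

early = keras.callbacks.EarlyStopping(patience=14, restore_best_weights=True)
model = build_model("gru", n_steps=60, n_features=14)
model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
          batch_size=16, epochs=100, callbacks=[early])  # batch size from Table 2
```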

As shown in Fig. 7 and Fig. 8, GRU shows slightly better results than BiLSTM. Comparing all six models, GRU outperformed the others with the lowest RMSE (88.63) and MAE (54.58), while LightGBM took the shortest computing time (0.3 seconds), as shown in Table 3. Since GRU is a deep learning model, its performing time is longer than that of the machine learning models; however, its computation time can still be considered efficient because the model was trained on a CPU. Performance would improve further with higher-capability processors.


Fig. 7. 
BiLSTM prediction result


Fig. 8. 
GRU prediction results

Table 3. 
Comparing all the models
Models RMSE MAE Performing time
ARIMAX 95.21 60.8 1745.947 seconds
Prophet 144.58 115.52 11.8 seconds
LightGBM 93.42 56.54 0.3 seconds
XGBOOST 120.52 66.96 0.4 seconds
GRU 88.63 54.58 70 seconds
BiLSTM 90.92 55.74 80 seconds


Ⅴ. Conclusion

Choosing an efficient prediction model is resource-consuming, so this study ran the most applicable models and compared them. The six models used range from classic statistical models to deep learning: (S)ARIMAX, Facebook Prophet, LightGBM, XGBoost, GRU, and Bi-LSTM. Three metrics were used to evaluate the performance of the chosen models. The deep learning model GRU surpassed the others with the lowest RMSE and MAE scores among all six models, while the machine learning model LightGBM had the shortest computing time of the six and the lowest RMSE and MAE among the machine learning models. The study therefore concludes that GRU and LightGBM are efficient models for further development in predictive maintenance.

Overall, the research provides useful insight into choosing resource-efficient models and serves as a helpful guide for future research.


Acknowledgments

This research was financially supported by the Ministry of Trade, Industry and Energy(MOTIE) and the Korea Institute for Advancement of Technology(KIAT) through the International Cooperative R&D program (Project ID: P0011880).


References
1. M. Dobler, M. Rüb, and T. Billen, "Minienvironment solutions: special concepts for mask-systems", 27th European Mask and Lithography Conference, Germany, Vol. 7985, pp. 243-257, Apr. 2011.
2. Key Elements of Contamination Control, https://www.cleanroom-industries.com. [accessed: Jun. 13, 2022]
3. E. Mahmoud, "Accuracy in forecasting: A survey", Journal of Forecasting, Vol. 3, No. 2, pp. 139-159, Apr. 1984.
4. A. M. Dixon, "Environmental monitoring for cleanrooms and controlled environments", CRC Press, 2016.
5. What Is an ARIMAX Model?, https://365datascience.com. [accessed: Jun. 13, 2022]
6. S. J. Taylor and B. Letham, "Forecasting at scale", The American Statistician, Vol. 72, No. 1, pp. 37-45, Sep. 2017.
7. Q. Meng, G. Ke, T. Wang, W. Chen, Q. Ye, Z.-M. Ma, and T.-Y. Liu, "A communication- efficient parallel algorithm for decision tree", Advances in Neural Information Proc. Systems, Vol. 29, 2016.
8. N. Zhai, P. Yao, and X. Zhou, "Multivariate Time Series Forecast in Industrial Process Based on XGBoost and GRU", 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference(ITAIC), Chongqing, China, Vol. 9, pp. 1397-1400, Dec. 2020.
9. I. Méndez-Jiménez and M. Cárdenas-Montes, "Time series decomposition for improving the forecasting performance of convolutional neural networks", Conference of the Spanish Association for Artificial Intelligence, pp. 87-97, 2018.
10. K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation", arXiv preprint arXiv:1406.1078, Jun. 2014.

Authors
Saksonita Khoeurn

2017 : Master of Science in ICT Convergence, Handong Global University

2021 ~ present : PhD student in Dept. of Big Data, Chungbuk National University

Research interests : Deep learning, Big Data Analytics, Smart Devices, Machine Learning

Jae sung Kim

2017 : PhD in Management of Information System, Chungbuk National University

2017 ~ Present : Professor at Chungbuk National University

Research interests : Big Data, Smart Factory, Data mining

Wan sup Cho

1996 : PhD in Computer Science, KAIST

1997 ~ present : Professor of Management of Information System at Chungbuk National University

Research interests : Database, Big data, Blockchain, Artificial intelligence, Data governance