Korean Institute of Information Technology
[ Article ]
The Journal of Korean Institute of Information Technology - Vol. 20, No. 3, pp.147-156
ISSN: 1598-8619 (Print) 2093-7571 (Online)
Print publication date 28 Feb 2022
Received 10 Dec 2021 Revised 25 Jan 2022 Accepted 28 Jan 2022
DOI: https://doi.org/10.14801/jkiit.2022.20.3.147

Predicting the Impact of High-Speed Rail on Population Change in Local Cities by using a Naive Bayesian Classification-based Artificial Intelligence Model

Hyunjung Kim* ; Kyuseok Kim**
*Institute of Construction and Environmental Engineering, Seoul National University
**Dept. of Urban Planning, Seoul National University

Correspondence to: Kyuseok Kim, Department of Urban Planning, Seoul National University / Department of Data Convergence Software, Korea Polytechnics. Tel.: +82-31-696-8832, Email: kyuseokkim@kopo.ac.kr

Abstract

Korea faces a population disparity between the Seoul Capital Area and local cities. In response, the Korean government established and implemented regional balancing policies such as relocating public institutions to local cities and building high-speed railways. In this regard, we propose and empirically evaluate a Naive Bayesian Classification-based artificial intelligence model that predicts whether the launch of a high-speed railway increased or decreased the population of local cities. The spatial range of the learning data is all districts in Korea except Jeju Island, 227 districts in total, and the temporal range is 2014 to 2019, three years before and after the launch of the SRT (Super Rapid Train). The research results are as follows. First, the average accuracy of the proposed model was 0.64, and its average precision reached 0.67. Second, districts with SRT stations were predicted to gain population, and districts with both KTX and SRT stations showed a higher predicted value for population growth. This study is valuable as an early-stage study that applies the Naive Bayesian classification technique, known as a fast and accurate methodology, to analyze various data affecting population growth and to predict the increase or decrease of the urban population.

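Abstract (Korean)

This study proposes and empirically analyzes a Naive Bayesian classification-based artificial intelligence model for the increase or decrease in the population of local cities nationwide, by district, caused by the opening of high-speed rail over the three years before and after the SRT launch. In addition to SRT and KTX stations, the model reflects several variables such as the elderly ratio, total number of households, city area, number of employees, financial independence rate, single-person household ratio, number of schools and hospital beds per thousand people, and innovation city status. The results are as follows. First, the average accuracy of the proposed model was 0.64, and the average precision reached 0.67. Second, visualizing the predicted values for the population through GIS showed that the capital area and its neighboring regions were predicted to gain population, and that among local cities, districts with SRT stations were predicted to gain population. In addition, districts with both KTX and SRT stations showed a higher predicted value for population growth. This study is valuable as an early study that applies the Naive Bayesian classification technique to the prediction of urban population change, a field where machine learning has been used relatively little, and thereby proposes a fast and simple way to analyze the various data that affect population change.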

Keywords:

population change, high-speed railway, artificial intelligence, naive bayesian classification, local cities, Korea

Ⅰ. Introduction

Population decline is a primary concern in most developed countries. Korea in particular is experiencing a low birth rate, with a total fertility rate of 0.840 in 2020 [1]. Population decline is expected to have a large ripple effect across society and the economy. In particular, the decline tends to be concentrated in small local cities rather than in the metropolitan area, which is expected to cause a crisis of urban shrinkage. Small local cities in Korea are suffering from the dual effects of natural population decline and population outflow. Although the Seoul Capital Area (Seoul, Gyeonggi, and Incheon) occupies only 11% of the total land area of Korea, it accounts for more than 50% of the total population [2]. In this situation, the Korean government has proposed numerous policies to prevent the shrinkage of local cities. For example, the government has relocated the administrative capital from Seoul to Sejong City and has established Innovation Cities. As one of these policies, the Korean government tried to strengthen access between local and metropolitan areas by constructing several high-speed railways. As a result, in addition to the KTX (Korea Train eXpress), the SRT (Super Rapid Train) was launched in December 2016.

Transportation infrastructure plays an important role as an intervention event in this population movement [3]. High-speed rail, in particular, is known to play an important role in logistics and population movement by influencing socioeconomic conditions in the region and fundamentally altering the concept of accessibility [4][5]. There are two perspectives on the role of high-speed rail in population movement. One is the view that high-speed railroads generate so-called straw effects, accelerating the outflow of local populations to large cities [6][7]. The other is that high-speed rail performs a function of alleviating population concentration in the metropolitan area [8].

Deng et al. (2018) examine population movement induced by high-speed rail by dividing it into three effects: the agglomeration effect, the diffusion effect, and the siphoning effect. The agglomeration effect refers to the concentration of population and economic activity that follows from the benefits of accumulation due to improved accessibility by high-speed rail. High-speed rail, as an important form of SOC (Social Overhead Capital), is known to cause considerable agglomeration [9]. The diffusion effect refers to the spreading of population and economic activity along the high-speed railway line. Large-scale transportation infrastructure contributes to regional development along its routes, as in the case of the Gyeongbu Expressway, and this has also been shown for the KTX [10]. The siphoning effect refers to a phenomenon in which population and economic activity move from a relatively depressed place to a thriving place, as if absorbed. Most previous studies showing that high-speed rail accelerates the outflow of the local population point to the siphoning effect [4][6][9][11][12].

In other words, the agglomeration and diffusion effects cause imbalances between regions that high-speed rail passes through and regions it does not, while the siphoning effect drives imbalances between large cities and small cities. In sum, there are many previous studies on high-speed railways and population change; however, their findings differ depending on which effect dominates. Most previous studies in Korea focus on the effect of the KTX. Considering the government's regional rebalancing policies and the new launch of the SRT, it is necessary to empirically examine the impact of high-speed railways on population change in local cities. The question that arises is whether high-speed railways increase or decrease the population of local cities; therefore, in this study, we suggest an analytical model for predicting the population growth of local cities using a Naive Bayesian Classification-based artificial intelligence model.


Ⅱ. Data and Methodology

2.1 Study Area

The spatial range is nationwide, so that population movement across the country can be examined in light of the wide-area transportation network. The research data cover 227 districts in South Korea, excluding Jeju, which is an island region. Considering the SRT launch (December 2016), the temporal range of this study was set from 2014 to 2019 to examine the three years before and after the launch. In addition, the relocation of the administrative capital to Sejong City began in earnest in 2014, and the Innovation City project also falls within this temporal range. Fig. 1 shows the study area.

Fig. 1. Study area

2.2 Methodology: Naive Bayesian Classification

Machine learning techniques are trained on data in order to predict outputs. Machine learning is divided into three types: supervised learning, unsupervised learning, and reinforcement learning [13][14].

First, the input data of a supervised learning algorithm is provided as a labeled dataset, which indicates the correct answer for each example. Second, the input data of an unsupervised learning algorithm is not a complete, cleanly labeled dataset; the algorithm aims to explore the patterns in the data and predict the output from them. Finally, a reinforcement learning algorithm is neither supervised nor unsupervised. It learns through rewards or feedback from the environment [15].

Supervised learning addresses two types of problems: classification and regression [16]. Classification methods predict a discrete value. Examples include Naive Bayesian Classification, Support Vector Machines, and Logistic Regression [17][18].

Naive Bayesian Classification is based on Bayes' theorem, which is stated in equation (1). The left-hand side can be read as the probability that A occurs given that B has occurred. In this formula, A and B can be regarded as the output and input variables, respectively [17].

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)} \tag{1}$$

If more than one condition corresponds to B, it can be represented as the following equation (2).

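$$P(A \mid B_1, B_2, \ldots, B_n) = \frac{P(B_1 \mid A) \times P(B_2 \mid A) \times \cdots \times P(B_n \mid A) \times P(A)}{P(B_1) \times P(B_2) \times \cdots \times P(B_n)} \tag{2}$$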
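As a minimal sketch of equation (2), the snippet below scores two classes for a single district from a prior and per-input conditional probabilities. All probability values are hypothetical and chosen only for illustration; they are not estimates from the study data. Because the denominator of equation (2) is the same for every class, the sketch normalizes the numerators across classes instead of computing it explicitly.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods):
    """Return P(class | B1..Bn) for each class.

    priors:      {class: P(A)}
    likelihoods: {class: [P(B1|A), P(B2|A), ...]} for the observed inputs
    """
    # Numerator of equation (2): P(B1|A) x ... x P(Bn|A) x P(A)
    scores = {c: prod(likelihoods[c]) * priors[c] for c in priors}
    # The denominator is shared by all classes, so normalizing is enough.
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical inputs: a district observed with SRT = 1 and MC = 0.
priors = {"growth": 0.4, "decline": 0.6}
likelihoods = {
    "growth": [0.10, 0.50],   # P(SRT=1 | growth), P(MC=0 | growth)
    "decline": [0.02, 0.80],  # P(SRT=1 | decline), P(MC=0 | decline)
}
posterior = naive_bayes_posterior(priors, likelihoods)
predicted = max(posterior, key=posterior.get)
```

With these illustrative numbers the class with the larger numerator is selected as the prediction; the same comparison applies when PG is the output and the 16 inputs of Table 1 are used.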

Regarding equation (2), we set up and propose the research model with the variables of Table 1. PG is assigned as the output variable, and all the other 16 variables are assigned as the input variables for the proposed model.

Table 1. Variables and their sources

2.3 Model Validation

To validate a classification model, accuracy, precision, recall, and F1-score are widely used [19][20]. To calculate them, the counts of TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative) must be obtained in advance [19][20]. The values of accuracy, precision, recall, and F1-score are then calculated with Equations (3) to (6) [19][20].

First, accuracy indicates the ratio of correctly predicted observations to the total observations [19][20]. Second, precision indicates the ratio of correctly predicted positive observations to the total predicted positive observations [19][20]. Third, recall indicates the ratio of correctly predicted positive observations to all actual positive observations [19][20]. Finally, F1-score is the harmonic mean of precision and recall [19][20].

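$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{3}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$
$$\text{F1-Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{6}$$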
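Equations (3) to (6) can be sketched as a small helper that takes the four confusion-matrix counts. The counts in the usage line are hypothetical and only mimic a balanced 400-record test set; they are not results from this study. Returning 0.0 when a denominator is zero is a design choice of this sketch, not something the paper specifies.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)              # equation (3)
    precision = tp / (tp + fp) if (tp + fp) else 0.0        # equation (4)
    recall = tp / (tp + fn) if (tp + fn) else 0.0           # equation (5)
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0   # equation (6)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts: 200 actual positives (tp + fn), 200 actual negatives.
metrics = classification_metrics(tp=114, tn=141, fp=59, fn=86)
```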

2.4 Data and Variables

Table 1 lists the variables and their sources. The dependent variable is PG, which captures the population change this study aims to examine and takes the value 1 or 0. A value of 1 indicates growth and a value of 0 indicates decline (the raw data contained no record with an unchanged population, so 0 always denotes decline). The first independent variable, YY, indicates the year of the data. Following YY, there are six dummy variables.

The first and second dummy variables, KTX and SRT, are 1 or 0. A value of 1 indicates that the district has a KTX station or an SRT station, respectively; otherwise, the value is 0. The third dummy variable, KS, is 1 if the district has both KTX and SRT stations; otherwise, it is 0. The fourth and fifth dummy variables, MC and IC, are either 1 or 0. MC is 1 if the district belongs to a metropolitan city, and IC is 1 if the district belongs to an innovation city; otherwise, they are 0. The sixth dummy variable, HI, is either 1 or 0. It is listed in Table 1 as the interaction of HSR (high-speed rail) and innovation city: the H in HI indicates whether the district has a high-speed rail station, KTX or SRT. If the district has neither, HI is 0.
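The two interaction dummies can be derived from the base dummies, as the sketch below illustrates. It assumes that KS is the product KTX × SRT and that HI is the product of "has any high-speed rail station" and IC, which is one reading of the Table 1 labels; the function name is hypothetical.

```python
def interaction_dummies(ktx, srt, ic):
    """Derive KS and HI from the base dummies (each 0 or 1)."""
    ks = ktx * srt                      # 1 only when both stations exist
    has_hsr = 1 if (ktx or srt) else 0  # any high-speed rail station
    hi = has_hsr * ic                   # assumed reading of "HSR * innovation city"
    return {"KS": ks, "HI": hi}

# A district with a KTX station only that is also an innovation city.
example = interaction_dummies(ktx=1, srt=0, ic=1)
```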

Following those six dummy variables, there are nine numeric variables. EPR indicates the ratio of the elderly to the total population of the district. NOH indicates the number of households in the district. CA indicates the size of the urbanized area in the district. NOE indicates the number of employees in the district. IOF indicates the financial independence rate of the district. SHR indicates the proportion of single-person households among all households. SHR has been surveyed annually since 2015 but was surveyed every five years before that. Because no SHR data exist for 2014, the 2014 values are replaced with those of 2015. NOP indicates the number of primary schools per one thousand people in the district. NOA indicates the number of academies per one thousand people in the district. Finally, the number of hospital beds per one thousand people in the district is also abbreviated NOH, the same abbreviation as the number of households. To bring the distributions closer to normality, the logarithm of NOH, CA, and NOE was taken; the value ranges in Table 2 suggest that the log-scaled NOH is the hospital-bed variable, since the household NOH is reported in raw counts.
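The two preprocessing steps described above, filling the missing 2014 SHR value from 2015 and taking the logarithm of selected variables, can be sketched as follows. The records are hypothetical, the helper names are illustrative, and the natural logarithm is an assumption, since the text does not state the base.

```python
import math

# Hypothetical records for one district; values are illustrative only.
records = [
    {"YY": 2014, "SHR": None, "NOE": 52000},
    {"YY": 2015, "SHR": 27.4, "NOE": 53000},
    {"YY": 2016, "SHR": 28.1, "NOE": 54500},
]

def fill_shr_2014(rows):
    """Copy the 2015 SHR value into the 2014 row, which has no survey value."""
    shr_2015 = next(r["SHR"] for r in rows if r["YY"] == 2015)
    return [
        dict(r, SHR=shr_2015) if r["YY"] == 2014 and r["SHR"] is None else dict(r)
        for r in rows
    ]

def log_transform(rows, columns):
    """Replace the listed columns with their natural logarithm (assumed base)."""
    return [
        {k: (math.log(v) if k in columns else v) for k, v in r.items()}
        for r in rows
    ]

prepared = log_transform(fill_shr_2014(records), columns={"NOE"})
```

Both helpers return new rows and leave the input records unchanged, so the raw values remain available for the descriptive statistics.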

2.5 Research Flow

The research flow of this study is shown in Fig. 2. First, we set up the artificial intelligence model based on Naive Bayesian Classification to predict population change. Second, we collected the data required by the proposed model. Third, we computed descriptive statistics for the research data. Fourth, we performed Naive Bayesian Classification on the research data. Finally, we validated the proposed model.

Fig. 2. Research flow


Ⅲ. Result

3.1 Descriptive Statistics

The descriptive statistics are shown in Table 2. The total number of records is 1,362, that is, 227 districts multiplied by the 6 years of the temporal range. Looking at the variables, the average PG is 0.3950, which indicates that population growth occurred in 39.5% of the district-year records. The other dummy variables, KTX, SRT, KS, MC, IC, and HI, can be read in the same way.

Table 2. Descriptive statistics (n = 1,362)

Apart from the dummy variables, the remaining variables fall into two groups, A and B. Group A contains only NOP, whose data are widely scattered around the mean because its standard deviation is larger than its average. The other variables belong to group B.
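The grouping rule can be checked directly against the averages and standard deviations reported in Table 2. The sketch below copies those values; the labels that distinguish the two NOH rows are an assumption based on the row order of Table 1.

```python
# (average, standard deviation) for the numeric variables, copied from Table 2.
table2 = {
    "EPR": (19.2153, 7.9844),
    "NOH (households)": (93654.6557, 86939.0869),
    "CA": (17.6121, 1.0283),
    "NOE": (10.6791, 1.2368),
    "IOF": (23.7552, 13.5418),
    "SHR": (30.6282, 5.2848),
    "NOP": (13.7410, 17.8039),
    "NOA": (1.2679, 0.6175),
    "NOH (hospital beds)": (2.4690, 0.7544),
}

# Group A: standard deviation larger than the average; group B: the rest.
group_a = [name for name, (mean, std) in table2.items() if std > mean]
group_b = [name for name in table2 if name not in group_a]
```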

3.2 Results of Naive Bayesian Classification

Using the variables of Table 1 and equation (2), we performed Naive Bayesian Classification in Python and obtained the results shown in Table 3. One of the factors affecting the performance of machine learning models is the training data. If the model is fitted too closely to its training data, it is overtrained, which is called overfitting. Overfitting shows up as a low error rate on the training data but a high error rate on the test data. On the other hand, if the model cannot learn the relationship between the input and output data, for example because of a lack of training data, it produces high error rates on both the training data and the test data. This phenomenon is called underfitting. It is not easy to avoid both underfitting and overfitting when training machine learning models and predicting results. Another factor affecting the performance of a machine learning model is the test data used for validation.

Table 3. Validation results of Naive Bayesian classification

In this study, therefore, the model was trained with the default data size setting, 0.25, and the results were predicted on randomly collected test data. The randomly collected test data consist of 400 records: 200 records of population growth and 200 records of population decline or no change. Because the results depend on the random sample, we ran the research model 200 times and obtained the results shown in Table 3.
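The repeated evaluation can be sketched as follows: hold out a balanced random test set, train on the remaining records, and summarize accuracy over the repetitions. This is a sketch under stated assumptions, not the authors' code. The text only says that Python was used, so the Gaussian form of the likelihood, the synthetic stand-in data, the function names, and the reduced sizes (20 repetitions with 100 test records instead of 200 repetitions with 400) are all assumptions.

```python
import math
import random
import statistics

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-input mean/variance for each class."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [statistics.fmean(col) for col in cols]
        # A small floor keeps the variance positive for constant columns.
        variances = [statistics.pvariance(col) + 1e-9 for col in cols]
        model[c] = (len(rows) / len(y), means, variances)
    return model

def predict(model, x):
    """Pick the class with the largest log-numerator of equation (2)."""
    def log_score(c):
        prior, means, variances = model[c]
        score = math.log(prior)
        for value, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_score)

def accuracy_once(X, y, per_class, rng):
    """Hold out a balanced test set, train on the rest, return accuracy."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    test = set(rng.sample(pos, per_class) + rng.sample(neg, per_class))
    train = [i for i in range(len(y)) if i not in test]
    model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
    hits = sum(predict(model, X[i]) == y[i] for i in test)
    return hits / len(test)

# Synthetic stand-in data: two numeric inputs whose means differ by class.
rng = random.Random(0)
y = [1 if rng.random() < 0.4 else 0 for _ in range(600)]
X = [[rng.gauss(1.2 * label, 1.0), rng.gauss(-0.8 * label, 1.0)] for label in y]

scores = [accuracy_once(X, y, per_class=50, rng=rng) for _ in range(20)]
summary = {
    "min": min(scores),
    "max": max(scores),
    "mean": statistics.fmean(scores),
    "std": statistics.pstdev(scores),
}
```

The `summary` dictionary has the same columns as Table 3 (minimum, maximum, average, standard deviation), which is why repeating the split matters: a single split yields one number, while repetition shows how much the score varies with the sample.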

First, the average accuracy is 0.6369, which indicates that the proposed model classified 63.69% of the test records correctly. As mentioned earlier, the results depend on the training and test data, which is unavoidable when dealing with artificial intelligence models. Second, the average precision is 0.6645, which indicates that 66.45% of the observations predicted as positive by the proposed model were actually positive. Third, the average recall is 0.5692. This is the lowest of the four average validation values: accuracy, precision, recall, and F1-score. Recall depends on the value of FN, as shown in equation (5), whereas precision depends on FP; therefore, it is difficult to improve precision and recall at the same time. Finally, the average F1-score is 0.5973.

3.3 Discussion on the Result

Based on the validation result of Naive Bayesian Classification, we visualized the predicted population change for 2019 in Fig. 3. As seen in Fig. 3, the model predicted population growth for the Seoul metropolitan area (Seoul, Gyeonggi, and Incheon) and the nearby districts. Overall, districts with KTX stations were predicted to lose population, while districts with SRT stations were predicted to gain population. In particular, districts near Gangwon were predicted to lose population even though they have KTX stations. One interpretation is that the KTX has been in operation for a comparatively long time, whereas the newly launched SRT is more likely to increase the population of the areas around its stations. In addition, the dispersion of population to local cities was an important consideration when the SRT was launched, so other regional balancing projects may have been carried out with the SRT in mind.

Fig. 3. Validation result in 227 districts

The districts with both KTX and SRT stations showed a higher predicted value for population growth. However, among these districts, those in Jeolla had a lower predicted value. The result shows that the straw effect and the alleviation effect vary by district. For example, districts with SRT stations overall showed agglomeration toward the station districts, while districts with KTX stations, especially in Gangwon, can be interpreted as showing an outflow of the local population to large cities. Likewise, districts with both KTX and SRT stations were predicted to gain population, but the effect was smaller in the districts of Jeolla.


Ⅳ. Conclusion

This study proposed a Naive Bayesian Classification-based artificial intelligence model to predict population change. Previous studies usually applied traditional methodologies such as regression analysis to investigate population change. In contrast, this study proposed and carried out a classification-based prediction of population increase and decrease.

According to the research result, the average values of all validation indexes lie between 0.5692 and 0.6645. This performance indicates that artificial intelligence techniques can be used to predict social phenomena and that additional research is warranted. Considering the overfitting and underfitting issues, the parameters of the artificial intelligence model could be changed to improve its validation results. A comparative study of the performance of other models, such as decision trees and SVMs, could also be carried out in future work.

Nevertheless, the methodology used in this study takes the form of a black box, which makes it difficult to determine exactly which factors influenced the prediction and how. Future studies could address this point and take city characteristics into account to gain more detailed implications for population changes due to high-speed rail.

The implications of this study are as follows. First, this study proposed and carried out the prediction of population increase and decrease, which is one of the key topics in the field of social science, by using an artificial intelligence model that can overcome the limitations of extrapolation. Second, this study conducted an empirical analysis using open public data from the web. Finally, this study investigated the relationship between the opening of high-speed railways and population increase or decrease. This study is valuable as a preliminary investigation into population growth using artificial intelligence in transportation and urban planning.

References

  [1] Statistics Korea, https://www.index.go.kr/potal/main/EachDtlPageDetail.do?idx_cd=1428 [accessed: Nov. 30, 2021]
  [2] 2020 Population and Housing Census, https://www.census.go.kr/eng/html/index_en.jsp [accessed: Nov. 29, 2021]
  [3] J. M. Coronado, J. M. de Ureña, and J. L. Miralles, "Short-and long-term population and project implications of high-speed rail for served cities: analysis of all served Spanish cities and re-evaluation of Ciudad Real and Puertollano", Eur. Plan. Stud., Vol. 27, No. 3, pp. 434–460, 2019. [https://doi.org/10.1080/09654313.2018.1562652]
  [4] T. Deng, D. Wang, Y. Yang, and H. Yang, "Shrinking cities in growing China: Did high speed rail further aggravate urban shrinkage?", Cities, Vol. 86, pp. 210–219, Mar. 2019. [https://doi.org/10.1016/j.cities.2018.09.017]
  [5] B. H. Suh, G. Kim, C. H. Lim, and H.-K. Ha, "Effects of High-speed Rail's Competition Strategies on Price and Share in the Transportation Market", Journal of Korean Society of Transportation, Vol. 38, No. 2, pp. 97–111, Apr. 2020. [https://doi.org/10.7470/jkst.2020.38.2.097]
  [6] S. W. Lee, J. K. Jeong, W. S. Zhee, and J. K. Cho, "The effects of high speed rail on population distribution", Journal of The Korean Regional Development Association, Vol. 16, No. 1, pp. 119–138, Mar. 2004. https://www.earticle.net/Article/A171974
  [7] W. Li, X. Wang, and O. Hilmola, "Does High-Speed Railway Influence Convergence of Urban-Rural Income Gap in China?", Sustainability, Vol. 12, No. 10, pp. 4236, May 2020. [https://doi.org/10.3390/su12104236]
  [8] J. W. Hur, "A critical review on the straw effects of high speed train", Journal of the Korean Urban Management Association, Vol. 23, No. 4, pp. 59–74, Dec. 2010. https://www.dbpia.co.kr/Journal/rticleDetail?nodeId=NODE01583919
  [9] T. Hiramatsu, "Job and population location choices and economic scale as effects of high speed rail: Simulation analysis of Shinkansen in Kyushu, Japan", Research in Transportation Economics, Vol. 72, pp. 15–26, Dec. 2018. [https://doi.org/10.1016/j.retrec.2018.06.007]
  [10] J. U. Jo and M. J. Woo, "The impacts of high speed rail on regional economy and balanced development: Focused on Gyeongbu and Gyeongjeon Lines of Korea Train Express (KTX)", Journal of Korea Planning Association, Vol. 49, No. 5, pp. 263–278, Jun. 2014. [https://doi.org/10.17208/jkpa.2014.08.49.5.263]
  [11] J. Lee and Y. Yoon, "A Study on the Outliers Detection in the Number of Railway Passengers for the Gyeongbu Line From Seoul to Major Cities Using a Time Series Outlier Detection Technique", Journal of Korean Society of Transportation, Vol. 35, No. 6, pp. 469–480, Dec. 2017. [https://doi.org/10.7470/jkst.2017.35.6.469]
  [12] M. N. Zheng and J. H. Rho, "Empirical analysis for the effect of the inter-regional express rail system (KTX) on the change of the relative dependency between regions in Korea-Focused on five metropolitan cities", Journal of Korea Planning Association, Vol. 50, No. 7, pp. 141–153, Oct. 2015. [https://doi.org/10.17208/jkpa.2015.11.50.7.141]
  [13] H. Hihn and D. A. Braun, "Specialization in hierarchical learning systems", Neural Process. Letters, Vol. 52, No. 3, pp. 2319–2352, Sep. 2020. [https://doi.org/10.1007/s11063-020-10351-3]
  [14] T. Sasakawa, J. Hu, and K. Hirasawa, "A brainlike learning system with supervised, unsupervised, and reinforcement learning", Electrical Engineering in Japan, Vol. 162, No. 1, pp. 32–39, Sep. 2008. [https://doi.org/10.1002/eej.20600]
  [15] N. Xu, "Understanding the reinforcement learning", Journal of Physics: Conference Series, Vol. 1207, No. 1, pp. 12014, Jan. 2019. https://iopscience.iop.org/article/10.1088/1742-6596/1207/1/012014/meta [https://doi.org/10.1088/1742-6596/1207/1/012014]
  [16] B. T. Pham, I. Prakash, and D. T. Bui, "Spatial prediction of landslides using a hybrid machine learning approach based on random subspace and classification and regression trees", Geomorphology, Vol. 303, pp. 256–270, Feb. 2018. [https://doi.org/10.1016/j.geomorph.2017.12.008]
  [17] M. Karabatak, "A new classifier for breast cancer detection based on Naïve Bayesian", Measurement, Vol. 72, pp. 32–36, Aug. 2015. [https://doi.org/10.1016/j.measurement.2015.04.028]
  [18] Y. Ma, B. Xu, and X. Xu, "Real estate confidence index based on real estate news", Emerging Markets Finance and Trade, Vol. 54, No. 4, pp. 747–760, Oct. 2017. [https://doi.org/10.1080/1540496X.2016.1232193]
  [19] T. Jiang et al., "Tongue image quality assessment based on a deep convolutional neural network", BMC Medical Informatics and Decision Making, Vol. 21, No. 1, pp. 1–14, May 2021. [https://doi.org/10.1186/s12911-021-01508-8]
  [20] A. Grybauskas, V. Pilinkienė, and A. Stundžienė, "Predictive analytics using Big Data for the real estate market during the COVID-19 pandemic", Journal of Big Data, Vol. 8, No. 1, pp. 1–20, Aug. 2021. [https://doi.org/10.1186/s40537-021-00476-0]
  [21] KOSIS (Korean Statistical Information Service), https://kosis.kr/eng/ [accessed: Sep. 13, 2021]
  [22] KOSTAT, http://kostat.go.kr/portal/eng/index.action [accessed: Sep. 10, 2021]
  [23] Korea Train eXpress (KTX), https://www.letskorail.com [accessed: Sep. 10, 2021]
  [24] Super Rapid Train (SRT), https://www.srail.or.kr [accessed: Sep. 10, 2021]
  [25] Innovation City, MOLIT, https://innocity.molit.go.kr [accessed: Sep. 10, 2021]
Authors
Hyunjung Kim

2010 : B.S. degree in Economics (Major), Management (Major), Urban and Environmental Engineering (Minor), Handong Global University

2012 : M.S. degree in Civil and Environmental Engineering (Specialization: Urban Planning), Seoul National University

2015 : Ph.D. degree in Urban Engineering, The University of Tokyo

2017 : Manager, Environmental Systems Research Institute (ESRI) Korea

Present : Research Professor, Seoul National University

Research interests : Urban Analytics, Smart Cities, Spatio-temporal Big Data Analysis, Artificial Intelligence in Urban Studies, Geographic Information System and Location Based Services, Deep Learning and Machine Learning

Kyuseok Kim

2011 : B.S. degree in Information and Telecommunication Engineering, Korea Aerospace University

2019 : M.S. degree in Information and Communication Technology Engineering, Ajou University

Present : Ph.D Candidate in Urban Planning, Seoul National University

2019 : Senior Research Engineer, LG Electronics(c)

2020 : Professional, LGUplus(c)

Present : Assistant Professor, Korea Polytechnics

Research interests : Data Analysis, Context-awareness, Short-range Wireless Communication Technologies, Deep Learning and Machine Learning

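Table 1. Variables and their sources

| Variable | Abbr. | Source |
|---|---|---|
| Dummy variable for population growth | PG | KOSIS [21], KOSTAT [22] |
| Year | YY | KOSIS [21], KOSTAT [22] |
| Dummy variable for KTX | KTX | KTX [23], SRT [24] |
| Dummy variable for SRT | SRT | KTX [23], SRT [24] |
| Dummy variable for KTX*SRT | KS | KTX [23], SRT [24] |
| Dummy variable for metropolitan city | MC | KOSIS [21], KOSTAT [22] |
| Dummy variable for innovation city | IC | MOLIT [25] |
| Dummy variable for HSR (High-speed rail)*innovation city | HI | KTX [23], SRT [24] |
| Elderly people ratio | EPR | KOSIS [21], KOSTAT [22] |
| Number of households | NOH | KOSIS [21], KOSTAT [22] |
| City area | CA | KOSIS [21], KOSTAT [22] |
| Number of employees | NOE | KOSIS [21], KOSTAT [22] |
| Independence rate of finance | IOF | KOSIS [21], KOSTAT [22] |
| Single-person household ratio | SHR | KOSIS [21], KOSTAT [22] |
| Number of primary schools per thousand | NOP | KOSIS [21], KOSTAT [22] |
| Number of academies per thousand | NOA | KOSIS [21], KOSTAT [22] |
| Number of hospital beds per thousand | NOH | KOSIS [21], KOSTAT [22] |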

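Table 2. Descriptive statistics (n = 1,362)

| Var. | Minimum | Maximum | Average | Std. Dev. |
|---|---|---|---|---|
| PG | 0 | 1 | 0.3950 | 0.4890 |
| KTX | 0 | 1 | 0.1564 | 0.3634 |
| SRT | 0 | 1 | 0.0352 | 0.1845 |
| KS | 0 | 1 | 0.0264 | 0.1605 |
| MC | 0 | 1 | 0.3348 | 0.4721 |
| IC | 0 | 1 | 0.6028 | 0.4895 |
| HI | 0 | 1 | 0.1109 | 0.3141 |
| EPR | 6.1000 | 39.9000 | 19.2153 | 7.9844 |
| NOH (households) | 5,139 | 498,836 | 93,654.6557 | 86,939.0869 |
| CA | 15.1265 | 20.2046 | 17.6121 | 1.0283 |
| NOE | 7.4702 | 13.8889 | 10.6791 | 1.2368 |
| IOF | 4.0 | 72.7 | 23.7552 | 13.5418 |
| SHR | 17.3 | 49.5 | 30.6282 | 5.2848 |
| NOP | 0.2 | 110.0 | 13.7410 | 17.8039 |
| NOA | 0.1 | 7.4 | 1.2679 | 0.6175 |
| NOH (hospital beds) | -1.6094 | 4.2499 | 2.4690 | 0.7544 |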

Table 3. Validation results of Naive Bayesian classification

| Metric | Minimum | Maximum | Average | Std. Dev. |
|---|---|---|---|---|
| Accuracy | 0.5500 | 0.7050 | 0.6369 | 0.0241 |
| Precision | 0.5436 | 0.7742 | 0.6645 | 0.0392 |
| Recall | 0.3450 | 0.8450 | 0.5692 | 0.0957 |
| F1-score | 0.4240 | 0.7228 | 0.5973 | 0.0568 |