Open Access
Issue
Sci. Tech. Energ. Transition
Volume 79, 2024
Article Number 15
Number of page(s) 21
DOI https://doi.org/10.2516/stet/2024014
Published online 15 March 2024

© The Author(s), published by EDP Sciences, 2024

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Nomenclature

ABC: Artificial Bee Colony

AE: Absolute Error

AI: Artificial Intelligence

ANN: Artificial Neural Network

APE: Absolute Percentage Error

ARIMA: Autoregressive Integrated Moving Average

BPNN: Back Propagation Neural Network

BSA: Backtracking Search Algorithm

CKC: Carbon Kuznets Curve

CO2: Carbon dioxide

CNN: Convolutional Neural Network

DGM: Discrete Grey Modeling

DL: Deep Learning

E: Error

EO: Equilibrium Optimizer

GBR: Gradient Boosting Regression

GDP: Gross Domestic Product

GHG: GreenHouse Gas

GHGs: GreenHouse Gas Emissions

GM: Grey Model

GWO: Grey Wolf Optimizer

H-W: Holt-Winters

LMDI: Logarithmic Mean Divisia Index

LSA: Lightning Search Algorithm

MAE: Mean Absolute Error

MAPE: Mean Absolute Percentage Error

MENR: Ministry of Energy and Natural Resources

ML: Machine Learning

MNGM: Metabolic Nonlinear Grey Model

MPA: Marine Predators Algorithm

NSE: Nash–Sutcliffe Efficiency

OECD: Organization for Economic Cooperation and Development

OLS: Ordinary Least Squares Regression

PSO: Particle Swarm Optimization

RE: Relative Error

RFR: Random Forest Regressor

RMSE: Root Mean Squared Error

SMAPE: Symmetric Mean Absolute Percentage Error

SOS: Symbiotic Organisms Search

SVM: Support Vector Machine

SVR: Support Vector Regression

1 Introduction

The transportation sector is of great importance for the globalization of nations and for the sustainability of daily life [1, 2]. It provides a reliable service for moving people, goods, and materials from one point to another. Accordingly, automobile use and traffic density in the transportation network have increased exponentially over the past decades, driven by the sector's necessity and by improved living standards [3, 4]. However, this growth has triggered enormous consumption of limited, non-renewable fossil fuels [5], which particularly harms countries that do not have their own fossil fuel reserves. From the beginning, the transportation sector has been dominated by the combustion of fossil-based fuels in internal combustion engines. Even today, the share of fossil fuels in the transportation sector is estimated to be more than 99% [6]. From another perspective, the transportation sector alone consumes a quarter of the world's energy [7] and more than half of the world's oil production [8]. Numerically, nearly 11 billion liters of diesel, gasoline, heavy fuel oil, and jet fuel are burned per day to sustain the transportation sector [9]. This huge consumption is rapidly depleting fossil fuel reserves and pushing their prices steadily upward. Many countries now suffer from rising fossil fuel prices because these increases adversely affect their economies. Yet rising prices are not the only problem: such heavy depletion of fossil fuels is also the main trigger of many serious problems for the environment and living organisms. 
In other words, high fossil fuel consumption in this sector is also responsible for a growing share of pollutants in the atmosphere, which dramatically worsens air quality on local and global scales. Consequently, the effects of GreenHouse Gas emissions (GHGs) have become increasingly visible to humanity and to nature. It is now an accepted reality that the transportation sector is responsible for many human deaths and causes many serious diseases [10–12]. GHGs are also the leading cause of the climate change that the world is witnessing today. Numerically, fossil fuel consumption across all sectors accounts for nearly 70% of total GHGs worldwide [13] and approximately 80–90% of the world's carbon dioxide (CO2) emissions [14]. Among GHGs, CO2 occupies a prominent position because of its sheer volume: it accounts for 76% of total GHG emissions, followed by methane (16%), nitrous oxide (6%), and fluorinated gases (2%) [15]. For this reason, published works have generally focused on estimating CO2 emissions alone, and researchers have sought to link energy and economic indicators with carbon emissions.

Türkiye's transportation network is expanding year by year. In the country, the annual growth rate of registered motor vehicles is about 9%, and their number has nearly doubled in the past two decades [10]. This is a critical problem for sustainable economic development. Therefore, the government is actively seeking ways to reduce energy consumption and carbon emissions in the transportation sector. Another significant issue is that Türkiye is highly dependent on energy imports: it imports three-quarters of its primary energy needs and more than 90% of its petroleum needs [10, 16, 17]. In addition, the transportation sector accounted for more than a quarter of Türkiye's total energy consumption in 2017 [18, 19]. All of this clearly demonstrates that the transportation sector is critical for governments' sustainable development targets, for improving living standards, and for achieving a healthy environment. In this context, the ongoing efforts in the literature fall into two groups. On one hand, researchers have strongly relied on the idea that electric vehicles will be a robust solution for mitigating carbon emissions from the transportation sector [20–22]. However, despite significant initiatives for alternative fuels and propulsion systems, projections for 2050 still show a key role and a high share for fossil fuels in the transportation sector [9]. Therefore, the combustion community is dedicated to mitigating the carbon emissions arising from the transportation sector. On the other hand, fuel researchers have sought to develop alternatives to conventional fuels that offer high energy density, low cost, and fewer exhaust pollutants [23–25]. 
In both cases, researchers recognize the problem of the transportation sector and agree that it requires a solution as soon as possible. The solution must be found in advance, and the necessary steps must be taken starting today: yesterday's decisions affect today, and today's decisions will shape the world of tomorrow. That is why forecasting transportation-based energy consumption and carbon emissions is vital for any country. With such forecasts, decision-makers and policymakers can revise their future investments in all sectors, including transportation, so that they can reach their carbon mitigation and energy-saving goals. All of this has made the topic a widely debated issue in the global community. Many researchers have used various soft computing approaches, time series, and empirical models to forecast the carbon footprint and energy consumption of the transportation sector with minimum error. Table 1 summarizes recent works dealing with the forecasting of energy consumption and CO2 emissions from different sectors.

Table 1

Previous literature summary dealing with the forecasting of transportation-based energy and carbon emissions.

Isik et al. argue that economic expansion, population growth, and emission intensity all contribute to the CO2 emissions of Türkiye's transportation sector; fleet efficiency and fuel-switching incentives showed encouraging trends from 2000 to 2010, but the popularity of SUVs posed a challenge to emission reductions from 2010 to 2017. Their study addressed the factors influencing CO2 emissions in the transportation sector, the emission reduction potential, and the effects of fuel-switching incentives and fleet efficiency on emission mitigation. They also reported that reducing CO2 emissions in the transportation sector requires strategic measures, including: (i) meticulous planning of freight transportation demand management, encompassing comprehensive designs of production sites, material flows, and demand points, along with economic activities; (ii) passenger transportation strategies that decrease car travel, such as zoning for public transit corridors and enhancement of the public transportation system; and (iii) effective incentives that persuade individuals to adopt energy-efficient vehicles and clean energy technologies such as electric cars [19]. Bozdağ et al. trained Machine Learning (ML) algorithms (Artificial Neural Network (ANN), k-Nearest Neighbors (kNN), LASSO, Random Forest (RF), Support Vector Regression (SVR), and Extreme Gradient Boosting (XGBoost)) on PM10 concentrations collected from seven stations in Ankara, Türkiye. The inputs were PM10 concentrations from 2009 to 2017 at six of the stations, and the goal was to forecast the PM10 concentrations at the seventh station for 2018. The model development stage was repeated for each station, and the algorithms' performance and error rates were assessed by comparing their results with the observed values. 
The best results were achieved with the ANN, with a coefficient of determination (R²) of 0.58, a Root Mean Square Error (RMSE) of 20.8, and a Mean Absolute Error (MAE) of 14.4 [26]. Ozturk and Ozturk used 1970–2015 data on the consumption of coal, oil, natural gas, renewable energy sources, and total energy to predict Türkiye's energy consumption over the next 25 years with the Autoregressive Integrated Moving Average (ARIMA) model. The identified models were ARIMA (1, 1, 1) for coal consumption, ARIMA (0, 0, 0) for natural gas consumption, ARIMA (0, 1, 0) for oil consumption, ARIMA (1, 1, 0) for renewable energy consumption, and ARIMA (0, 1, 2) for total energy consumption. The findings suggest that Türkiye's energy consumption will follow a sustained upward trend until the end of 2040, with the consumption of coal, natural gas, oil, renewable energy, and total energy growing at average annual rates of 4.87%, 4.39%, 3.92%, 1.64%, and 4.20%, respectively, over the next 25 years [27].

The previous literature in Table 1 on the energy demand and carbon emissions of the transportation sector shows that researchers try to forecast countries' carbon emissions and energy consumption with minimum error, using various input parameters for different sectors. ANNs have also been used in other fields, for example to predict concrete compressive strength [28], to estimate tunnel convergence with regression [29], to estimate the safety factor of retaining walls with ANNs and genetic algorithms [30], and to estimate the fundamental period of infilled reinforced concrete structures [31]. In recent years, researchers have thus used various forecasting techniques, including ML algorithms, time series, empirical mathematical equations, and hybrid models. In line with this, the present research focuses on forecasting transportation-related carbon emissions and energy consumption in Türkiye. Year, gross domestic product, population, vehicle kilometers, transportation-based energy consumption, and transportation-based CO2 emissions in Türkiye between 1970 and 2016 are used as input and output parameters. To forecast CO2 emissions in Türkiye, the MLP, XGBoost, and SVM algorithms are applied under four scenarios: scenario 1: ENERGY/VK/POP/Y/GDP; scenario 2: ENERGY/VK/POP/Y; scenario 3: ENERGY/VK/POP; and scenario 4: ENERGY/VK. The performance of the algorithms is evaluated with the R², MAE, MAPE, MSE, RMSE, rRMSE, MBE, and a20 statistical metrics.

2 Methodology

This section explains the data sources, the ML algorithms, and the scenarios in detail. Section 2.1 describes how the dataset was prepared. Section 2.2 presents the ML techniques and detailed mathematical explanations of the ML algorithms chosen for this study in separate subsections.

2.1 Dataset

The data used in this study are transportation energy consumption/demand in Mtoe (ENERGY), vehicle-kilometers in km (VK), population in millions (POP), year (Y), gross domestic product per capita (GDP), and CO2 emissions from the transportation sector between 1970 and 2016. Among these, VK, GDP, POP, Y, and ENERGY were used as input parameters, and CO2 emissions from the transportation sector were chosen as the output parameter. Figure 1 shows how the raw data change over the years; all input parameters have generally increased. In addition, different combinations of input parameters (scenario 1: ENERGY/VK/POP/Y/GDP, scenario 2: ENERGY/VK/POP/Y, scenario 3: ENERGY/VK/POP, and scenario 4: ENERGY/VK) were tested and the results were compared. GDP per capita, population, transportation-sector energy consumption, and CO2 emissions data were obtained from the World Bank [50]. The vehicle and vehicle-kilometer data for Türkiye were obtained from the Turkish General Directorate of Highways [51] and the Turkish Statistical Institute [52]. Table 2 details the input parameters, their data sources, and their effects on the output parameter, and Table 3 gives descriptive statistics of the datasets used in this study.
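The four scenarios differ only in which input columns are fed to the algorithms. The sketch below shows one way to express them as feature subsets; the column names mirror the abbreviations in the text, while the record format (one dictionary per year) and the `build_xy` helper are hypothetical assumptions, not the authors' code.

```python
# Hypothetical feature subsets mirroring the four scenarios described in the text.
SCENARIOS = {
    1: ["ENERGY", "VK", "POP", "Y", "GDP"],
    2: ["ENERGY", "VK", "POP", "Y"],
    3: ["ENERGY", "VK", "POP"],
    4: ["ENERGY", "VK"],
}
TARGET = "CO2"  # output parameter: transportation-sector CO2 emissions


def build_xy(rows, scenario):
    """Split a list of per-year records (dicts) into an input matrix X and a target vector y."""
    features = SCENARIOS[scenario]
    X = [[row[name] for name in features] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y
```

With this layout, the same training code can be reused for every scenario by changing only the `scenario` argument.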

Fig. 1

The data used in this study.

Table 2

Input parameters and their effects on the output parameters, along with the data sources used in this study.

Table 3

Some of the statistical parameters of the dataset used in this study.

Because the parameters in Table 3 differ widely in scale, it is not appropriate to compare or plot their raw values directly. A scaled form makes the data mathematically meaningful and comparable, so each parameter was normalized to the range 0 to 1 in this study using the following equation [53]:

$$X_{\mathrm{normalized}}=\frac{X-X_{\min}}{X_{\max}-X_{\min}} \tag{1}$$

Here, $X_{\mathrm{normalized}}$ denotes the values of X normalized between 0 and 1, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values in the dataset, respectively.

Data normalization transforms data into a common scale that allows fair comparisons between features and avoids problems caused by disparate scales, which can improve data analysis and model training. There are various normalization methods, such as rescaling, mean normalization, and standardization. Min-max normalization is one of the most popular: it scales the range of each feature to a fixed interval, usually [0, 1] or [−1, 1], depending on the nature of the data. Min-max normalization is advantageous when the shape of the distribution and the exact minimum and maximum values matter. However, because it is sensitive to outliers, it may not work well if the dataset contains extreme values.
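A minimal sketch of Eq. (1) in plain Python is shown below, together with the inverse mapping needed to convert normalized predictions back to original units (for example, 10⁶ tons of CO2). The function names and the handling of a constant column are illustrative assumptions.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] using (x - min) / (max - min), as in Eq. (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # A constant column carries no information; map it to zeros to avoid division by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


def min_max_inverse(scaled, x_min, x_max):
    """Map normalized values (e.g. model predictions) back to the original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]


print(min_max_normalize([10, 15, 20]))  # [0.0, 0.5, 1.0]
```

One design note, offered as a general practice rather than as the authors' procedure: if the minimum and maximum are taken from the training portion only, the test data do not leak into the scaling.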

Table 4 lists the correlation coefficients of the variables, which were computed to examine the relationship between the input and output parameters. Across the table, the minimum correlation value is 0.89928 and the maximum is 0.99971. The strength of these relationships can be classified by the correlation coefficient. Following several studies in the literature [15, 54, 55], the coefficients can be interpreted as follows: |r| < 0.2 very weak relationship, 0.2 ≤ |r| < 0.4 weak relationship, 0.4 ≤ |r| < 0.6 moderate relationship, 0.6 ≤ |r| < 0.8 strong relationship, and |r| ≥ 0.8 very strong relationship. Accordingly, there is a very strong relationship between each input parameter and the output parameter. This suggests that these parameters can be used to train ML algorithms to estimate transportation-sector CO2 emissions under Türkiye's current conditions.
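The Pearson correlation coefficient and the strength classes above can be sketched as follows. This assumes Table 4 reports Pearson's r, which the text implies but does not state explicitly; the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strength(r):
    """Map |r| to the verbal classes used in the text."""
    a = abs(r)
    if a < 0.2:
        return "very weak"
    if a < 0.4:
        return "weak"
    if a < 0.6:
        return "moderate"
    if a < 0.8:
        return "strong"
    return "very strong"


print(strength(0.92063))  # very strong (GDP vs. CO2, value quoted in the text)
```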

Table 4

Correlation coefficients between the variables used in the present study.

Figure 2 shows the normalized values of the input and output parameters. Although the parameters rise and fall from year to year, their general trend is upward.

Fig. 2

Normalized data used in this study.

2.2 Artificial intelligence techniques

Artificial Intelligence (AI) is a field that enables computers to mimic human behavior and is applied in areas such as ML, Natural Language Processing (NLP), optimization, and image processing, while ML is a subset of AI that enables computer systems to learn from observed data [56]. In recent years, the use of AI and ML techniques for data analysis and intelligent information processing has increased [57]. ML works in integration with the smart hardware and automated systems that emerged with the Industry 4.0 revolution. Considering all this, these algorithms are very valuable for identifying the basic factors that affect quality of life. ML algorithms are grouped into four main categories: supervised, unsupervised, semi-supervised, and reinforcement learning. These algorithms are often used for classification and regression. Figure 3 shows the worldwide popularity of these algorithms over the last five years, obtained from Google Trends [58].

Fig. 3

Popularity scores of ML algorithms worldwide over the past 5 years.

Interest in AI algorithms continues to grow steadily every year with the development of technology, which indicates that ML algorithms will remain part of our lives for a long time. Non-traditional methods have become an important class of tools for complex engineering applications in recent years [59]. ANNs are common and useful models for classification, clustering, and prediction in multidisciplinary studies, and they are often preferred in ML over classical regression and statistical methods [60]. ML methods play an important role in classifying and predicting data and produce effective models in many application areas. Common supervised models include the Single Layer Perceptron (SLP) [61], the Multi-Layer Perceptron (MLP) [62], and linear classifiers [63]. Support Vector Machines (SVM) [64], kNN [65], naive Bayes [66], and decision trees [67] are other popular supervised ML algorithms. Many more algorithms have been proposed in the ML literature.

The characteristics of the algorithms applied to the dataset, and the way they were used in this study, are described in detail below.

2.2.1 MLP algorithm

MLPs are a common type of neural network used in fields such as system load calculation, function approximation, and the analysis of complex systems [68]. An MLP can contain multiple hidden layers between the input and output layers. The MLP is a supervised learning method and is also regarded as a deep learning method. To model nonlinear data, the algorithm uses nonlinear activation functions such as the sigmoid (logistic) function [69]; together with its hidden layers, this distinguishes it from logistic regression. Starting from the input layer, the data are propagated forward to the output layer. The error, that is, the difference between the known result and the predicted value, is then propagated backward to update the weights, and the algorithm runs until this error is minimized. Figure 4 shows an MLP with an input layer, one hidden layer, and a scalar output. The input features $\{x_i \mid i = 1, 2, \ldots, m\}$ are fed to the neurons of the input layer. The neurons in the hidden layer compute weighted sums of the values from the previous layer, and the output layer converts the hidden-layer values into the output result. A shallow MLP is less prone to overfitting than deep neural networks [70]; in addition, the number of parameters (neurons) was kept low in the experiments to prevent overfitting.
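The forward and backward passes described above can be illustrated with a minimal one-hidden-layer MLP written in plain Python. This is a sketch under stated assumptions, not the configuration used in the study: the sigmoid hidden layer, linear output neuron, learning rate, hidden-layer size, and the toy data are all illustrative, and `max_iter` is interpreted here as the number of passes over the data.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


class OneHiddenLayerMLP:
    """Hypothetical minimal MLP: m inputs -> sigmoid hidden layer -> one linear output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Net input of each hidden neuron: weighted sum of inputs plus bias, then activation.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Output neuron: weighted sum of hidden activations (linear, for regression).
        y_hat = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, y_hat

    def train_step(self, x, y, lr):
        """One forward pass plus backpropagation of the squared error 0.5 * (y_hat - y)^2."""
        hidden, y_hat = self.forward(x)
        err = y_hat - y
        for j, h in enumerate(hidden):
            # The hidden-layer gradient uses the output weight before it is updated.
            delta = err * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
            self.b1[j] -= lr * delta
        self.b2 -= lr * err

    def fit(self, X, Y, max_iter=500, lr=0.1):
        # max_iter echoes the 500-iteration cap in the text; here it counts full passes.
        for _ in range(max_iter):
            for x, y in zip(X, Y):
                self.train_step(x, y, lr)
        return self

    def predict(self, x):
        return self.forward(x)[1]


def mse(model, X, Y):
    return sum((model.predict(x) - y) ** 2 for x, y in zip(X, Y)) / len(Y)


# Toy, synthetic inputs on [0, 1] (not the study's dataset).
X = [[a / 4, b / 4] for a in range(5) for b in range(5)]
Y = [0.5 * x1 + 0.3 * x2 for x1, x2 in X]
model = OneHiddenLayerMLP(n_inputs=2, n_hidden=4, seed=1)
before = mse(model, X, Y)
model.fit(X, Y)
after = mse(model, X, Y)
```

Keeping the hidden layer small, as here, is in line with the text's remark that the number of neurons was kept low to limit overfitting.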

Fig. 4

MLP with one hidden layer.

2.2.2 XGBoost algorithm

Gradient Boosting Machines (GBM) are ensemble ML models in which a strong learner is built from a set of weak base learners [71]. In boosting tree algorithms, a regression tree serves as the weak learner; it is grown by splitting the data into two subsets at each level until the tree reaches its maximum depth. Each new tree is generally fitted to minimize the loss function by following its gradient. XGBoost is widely regarded as one of the best decision tree-based algorithms. It is an enhanced form of gradient boosting whose refinements give better results. XGBoost also uses the second-order gradients of the loss function when determining the best model, which, together with its regularization, increases performance and helps prevent overfitting [72]. In this study, XGBoost was chosen for estimating CO2 emissions for two main reasons. First, XGBoost is one of the most popular boosting tree implementations of GBM [73], and it is widely used in industrial applications because it achieves high performance with minimal resource requirements [74]. Second, XGBoost parallelizes tree construction, especially during training on the CPU, and optimizes its data structures and algorithms for cache use.
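To make the role of first- and second-order gradients concrete, the sketch below implements gradient boosting with depth-1 regression trees (stumps) and the Newton-style leaf weight $-G/(H+\lambda)$ and split gain used in XGBoost's formulation. It is a simplified illustration under stated assumptions (squared loss, stumps only, no column subsampling or pruning parameter), not the xgboost library and not the authors' setup; `lam`, `eta`, and all function names are hypothetical stand-ins.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf value -G / (H + lambda) from the second-order (Newton) approximation."""
    return -G / (H + lam)


def best_stump(X, grad, hess, lam):
    """Search every feature/threshold for the split with the largest second-order gain."""
    G, H = sum(grad), sum(hess)
    parent = G * G / (H + lam)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for f in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][f])
        GL = HL = 0.0
        for k in range(len(order) - 1):
            GL += grad[order[k]]
            HL += hess[order[k]]
            lo, hi = X[order[k]][f], X[order[k + 1]][f]
            if lo == hi:
                continue  # no threshold can separate equal feature values
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent)
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2, leaf_weight(GL, HL, lam), leaf_weight(GR, HR, lam))
    return best


def fit_boosted_stumps(X, y, n_rounds=50, eta=0.3, lam=1.0):
    """Gradient boosting with depth-1 trees and squared loss 0.5 * (pred - y)^2."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        grad = [p - t for p, t in zip(pred, y)]  # first derivative of the loss
        hess = [1.0] * len(y)                    # second derivative of the loss
        stump = best_stump(X, grad, hess, lam)
        if stump is None or stump[0] <= 0.0:
            break  # no split improves the regularized objective
        _, f, thr, w_left, w_right = stump
        trees.append((f, thr, w_left, w_right))
        # Shrink each tree's contribution by the learning rate eta.
        pred = [p + eta * (w_left if x[f] <= thr else w_right) for p, x in zip(pred, X)]
    return base, eta, trees


def predict(model, x):
    base, eta, trees = model
    return base + eta * sum(wl if x[f] <= thr else wr for f, thr, wl, wr in trees)


# Toy step-shaped data (not the study's dataset).
X = [[i] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = fit_boosted_stumps(X, y)
```

The regularization term `lam` shrinks every leaf weight toward zero, which is one concrete mechanism behind the overfitting resistance mentioned in the text.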

XGBoost is more resistant to overfitting than deep learning algorithms. Nevertheless, data pre-processing and the use of fewer attributes help prevent overfitting, as do appropriate hyperparameter settings. All of these factors were taken into consideration in this study to prevent overfitting.

2.2.3 SVM algorithm

SVM is an important technique widely used for classification and regression in ML, which is why it was also used in this study. By constructing hyperplanes in a high- or infinite-dimensional space, SVM maximizes the margin between classes and thus achieves a low generalization error and a clear separation of each class [75]. Training an SVM also corresponds to a convex optimization problem, which is important because it ensures that the optimal hyperplane parameters can be found. New data are then assigned to classes by this optimal boundary. Another important feature of SVM is that it can handle datasets that cannot be separated by a line or plane: for more complex data, such as non-separable and nonlinear datasets, it determines optimal boundaries between the classes, typically in a feature space induced by a kernel function.
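Since SVM is used here for regression (CO2 estimation), two of its core ingredients may help: the ε-insensitive loss of support vector regression, which ignores deviations inside a tube of width ε, and the Gaussian RBF kernel, which implicitly maps inputs into an infinite-dimensional space. The sketch assumes an RBF kernel and shows only the prediction function; solving the convex dual problem for the coefficients is left to a proper solver, and the kernel choice, `gamma`, `epsilon`, and all names are illustrative, not taken from the study.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVR loss: deviations inside the epsilon tube cost nothing, larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Kernel SVR decision function f(x) = sum_i beta_i * K(x_i, x) + b.

    beta_i (= alpha_i - alpha_i*) would come from solving the convex dual problem;
    here they are simply passed in.
    """
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias


print(svr_predict([0.0], [[0.0]], [2.0], 0.5))  # 2.5
```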

Of the 47 yearly samples from 1970 to 2016, 85% were used for training and 15% for testing for each algorithm, with the test samples selected randomly. The maximum number of iterations of the MLP network was set to 500. The model uses backpropagation, in which the weights are updated according to the gradient of the error function. When the information in the input cells is passed to the hidden layer, it is multiplied by the relevant weights, and the net input of each hidden neuron is calculated with the weighted sum function. The output of each hidden neuron is then obtained by passing this net input through the activation function. The weights were updated by backward calculation to minimize the error. Figure 5 shows the flowchart of the processes carried out in this study.
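The random 85/15 split of the 47 yearly samples can be sketched as follows. The seed value and the rounding rule are assumptions (the text does not report them); with standard rounding, 15% of 47 gives 7 test years and 40 training years.

```python
import random


def random_split(n_samples, test_fraction=0.15, seed=42):
    """Randomly assign sample indices to training and test sets (seed is a hypothetical choice)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(test_fraction * n_samples)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


years = list(range(1970, 2017))  # 1970 to 2016 inclusive: 47 samples
train_idx, test_idx = random_split(len(years))
test_years = [years[i] for i in test_idx]
```

Fixing the seed makes the split reproducible, which matters when several algorithms and scenarios are compared on the same test years.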

Fig. 5

Flowchart of the study.

The SVM algorithm is resistant to overfitting, which is less common than in artificial neural networks. Nevertheless, data normalization, adjustment of the C value and the seed value, and the use of fewer features help prevent overfitting.

3 Statistical indicators

The accuracy of a model gives an idea of how well it can predict, but it does not necessarily indicate how well the model will perform under real-world conditions. The fit (suitability) of a model, on the other hand, concerns whether the model captures the characteristics of the data. Both accuracy and suitability are important for model selection and performance evaluation. Therefore, the performance of the predictions generated by the ML algorithms (MLA) is discussed extensively with the following statistical indicators, which are frequently used in academic studies and can be calculated with equations (2)–(10). In these equations, n is the number of data points, ∑ denotes the sum over all data points, $x_i$ is the actual measured value, $y_i$ is the predicted value, and $\bar{x}$ is the mean of the actual values.

Mean absolute error (MAE) is particularly useful when evaluating models that are susceptible to systematic errors. It measures the average magnitude of the deviation between the predicted and actual values, offering a simple measure of the overall accuracy of the model [76]. In model comparisons, the model with the smaller MAE is considered more successful.

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|x_i-y_i\right| \tag{2}$$

Mean absolute percentage error (MAPE) is the average of the absolute percentage errors between actual and predicted values. It provides a percentage measure of forecast accuracy, which makes results easy to compare and interpret across datasets. However, because of its limitations, such as being misleading when the actual values are close to zero, MAPE should be interpreted carefully and used together with other metrics [77].

$$\mathrm{MAPE}=\frac{100}{n}\sum_{i=1}^{n}\left|\frac{x_i-y_i}{x_i}\right| \tag{3}$$

Mean Bias Error (MBE) is a statistical metric used to evaluate the bias of a model. A negative MBE indicates that the estimates are, on average, lower than the actual values, while a positive MBE indicates that they are, on average, higher [78, 79].

$$\mathrm{MBE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-x_i\right) \tag{4}$$

The Mean Square Error (MSE) is the mean of the squared differences between the actual and predicted values; in prediction studies, training algorithms aim to minimize it [80].

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(x_i-y_i\right)^2 \tag{5}$$

The coefficient of determination (R²) is a statistical measure of how well a regression model fits the data. It represents the proportion of the variance in the dependent variable that is explained by the independent variables. R² typically takes values between 0 and 1 [81], and the closer it is to 1, the better the fit [82].

$$R^2=1-\frac{\sum_{i=1}^{n}\left(x_i-y_i\right)^2}{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} \tag{6}$$

Root Mean Square Error (RMSE) measures how far a model's predictions are from the actual values. For RMSE, (i) lower values mean the predictions are closer to the actual values; (ii) the ideal value is 0; and (iii) the model with the lower value performs better. RMSE is always non-negative [83]. Because it is sensitive to large errors and outliers, RMSE is a good measure of model performance [84].

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-y_i\right)^2} \tag{7}$$

Relative root mean square error (rRMSE) is a modified version of RMSE that takes the scale of the measured data into account. It is calculated as the ratio of the RMSE to the mean of the measured values [85], which normalizes RMSE and allows a more meaningful comparison of the accuracy of different models.

$$\mathrm{rRMSE}=\frac{\mathrm{RMSE}}{\bar{x}} \tag{8}$$

The a20 index is the fraction of samples for which the absolute deviation between the actual and predicted values is at most 20% of the actual value. A higher a20 score indicates better prediction accuracy, because more predictions lie close to the true values; the best possible score is 1.0 [86, 87]. Here, m20 is the number of samples that satisfy the 20% condition.

$$a20=\frac{m20}{n} \tag{9}$$

$$m20=\left|\left\{\, i : \left|x_i-y_i\right| \le 0.2\,\left|x_i\right| \right\}\right| \tag{10}$$
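All eight indicators can be computed from the same pair of sequences. The sketch below follows the definitions above, with `actual` playing the role of $x_i$ and `predicted` the role of $y_i$; the function name and dictionary keys are illustrative choices, and MAPE is undefined when an actual value is zero.

```python
import math


def regression_metrics(actual, predicted):
    """Compute the indicators of equations (2)-(10); actual = x_i, predicted = y_i."""
    n = len(actual)
    x_bar = sum(actual) / n
    diff = [y - x for x, y in zip(actual, predicted)]  # predicted minus actual
    mse = sum(d * d for d in diff) / n
    rmse = math.sqrt(mse)
    ss_tot = sum((x - x_bar) ** 2 for x in actual)
    m20 = sum(1 for d, x in zip(diff, actual) if abs(d) <= 0.2 * abs(x))
    return {
        "MAE": sum(abs(d) for d in diff) / n,
        "MAPE": 100.0 / n * sum(abs(d / x) for d, x in zip(diff, actual)),  # needs x_i != 0
        "MBE": sum(diff) / n,  # negative means under-prediction on average
        "MSE": mse,
        "R2": 1.0 - n * mse / ss_tot,  # n * mse is the residual sum of squares
        "RMSE": rmse,
        "rRMSE": rmse / x_bar,
        "a20": m20 / n,
    }


m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

For this small example, only the last prediction is off (by 1.0, i.e., 25% of the actual value 4.0), so it counts against a20 and contributes all of the error terms.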

4 Result and discussion

This section presents the findings and discussion of the study in subsections. First, the year-by-year evolution of CO2 emissions from the transportation sector is discussed, together with the reasons behind their increase, to clarify how the analysis proceeds. Then, the results of the analyses performed with the three ML algorithms and four scenarios described in the previous sections are presented. These results are discussed using the statistical indicators most commonly used in the literature, and the most appropriate algorithm and scenario for this study are determined.

4.1 The outlook of transportation-based CO2 emissions in Türkiye

Although the transportation sector contributes to the economic development of countries, it also harms air quality, which is particularly problematic for developing countries. Today, the basic energy needs of the transportation sector are supplied by fossil-based fuels, and it is well known that many deaths worldwide are caused by diseases associated with the resulting air pollution. Knowing CO2 emission values, which depend on many parameters including energy demand, is therefore important for countries to shape and update their policies and investments in the coming years. Setting a price on CO2 is one of the most effective instruments for combating climate change and appears to be the most appropriate approach. Unfortunately, carbon prices in many countries are still too low to combat climate change effectively. However, according to a report published by the Organization for Economic Cooperation and Development (OECD), this is not the whole picture: although energy taxes are not designed as carbon prices, they, like emissions trading systems, also influence CO2 emissions. The OECD has computed effective carbon prices that account for these effects for more than four major countries worldwide and has mapped them to show which countries have the highest effective carbon prices [88]. Figure 6 plots the EU carbon permit price. The highest price was €104.80 in February 2023, and prices have increased sharply in recent years. 
Given growing environmental concerns, strict measures are essential. Restrictions, regulations, norms, legislation, and wide-ranging policies that reduce fossil-based fuel consumption in the transportation sector, and thus GHGs, are strongly recommended and should be supported.

Fig. 6

EU carbon permits data chart [89].

The variation in CO2 emissions from transportation in Türkiye between 1970 and 2016 is given in Figure 7. As the graph shows, although CO2 emissions fluctuated from year to year, their overall trend has been upward. This upward trend depends on many parameters, such as population, the energy consumed in the transportation sector, and the number of vehicles, so understanding the relationships between these parameters is very important. As Table 4 shows, the input parameters (vehicle kilometers, population, GDP, and energy consumption) are all very strongly correlated with CO2 emissions: the correlation coefficients are 0.96875 for vehicle kilometers, 0.92063 for GDP, 0.96002 for population, and 0.98563 for energy consumption.
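The correlation coefficients above are, presumably, Pearson coefficients; the text does not name the coefficient, so that is an assumption. The minimal sketch below computes a Pearson coefficient and ranks the inputs by the magnitude of the correlations reported in the text (Table 4, plus the year correlation of 0.95643 quoted in the Conclusion). The function name `pearson_r` and the toy numbers are hypothetical and serve only as an illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()  # center both series
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

# Correlation of each input with CO2 emissions as reported in the text
# (Table 4 and the Conclusion); Y denotes the year.
reported_r = {"ENERGY": 0.98563, "VK": 0.96875, "POP": 0.96002, "Y": 0.95643, "GDP": 0.92063}

# Rank the inputs by the strength (absolute value) of their correlation.
ranking = sorted(reported_r, key=lambda name: abs(reported_r[name]), reverse=True)

# Toy check on made-up numbers (NOT the study's data).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.1, 3.9, 6.2, 7.8]
r_toy = pearson_r(toy_x, toy_y)
```

The resulting order, ENERGY > VK > POP > Y > GDP, matches the ranking stated in the Conclusion.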

Fig. 7

Change of CO2 emissions from transportation between 1970 and 2016 in Türkiye.

CO2 emissions from the transportation sector in Türkiye rose from 11.503 × 10⁶ tons in 1970 to 68.016 × 10⁶ tons at the end of 2016 [50]. From 1970 to 2016, vehicle kilometers increased 18.47-fold, GDP 22.24-fold, population 2.29-fold, and energy consumption 8.06-fold. Overall, CO2 emissions over the years follow an approximately linear trend; to show this more clearly, a linear fit and its equation are drawn in Figure 7. On a yearly basis, CO2 emissions increased by approximately 18.778% in 1993 compared with the previous year. In the same year, energy consumption increased by 21.871%, population by 1.623%, GDP by 11.885%, and vehicle kilometers by 5.282%. In contrast, the largest reduction, approximately 9.221%, occurred in 1980. In that year, energy consumption did not change, population increased by 2.284%, GDP decreased by 24.768%, and vehicle kilometers increased by 1.927%. These results suggest that GDP and energy consumption have significant effects on CO2 emissions. The rapidly increasing population, socioeconomic development, and modernization have increased the burden on the transportation sector, which in turn raises CO2 emissions. According to the Turkish Statistical Institute, as of December 31, 2022, the population of Türkiye had grown by 599,280 people compared with the previous year, exceeding 85 million [52], and a large part of this population is of working age. Türkiye is a developing country: GDP grew by 3.5% in the last quarter of 2022 and by 5.6% over the whole of 2022 compared with 2021. Per capita income was $9,693 in 2018 and is expected to exceed $12,000 in 2023.
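The paragraph above combines three simple calculations: a multiplicative growth factor between two years, year-over-year percentage changes, and a least-squares linear trend. The sketch below shows one way to compute each with NumPy. Only the 1970 and 2016 endpoint values come from the text; the four-year series is a hypothetical toy series, since the study's yearly data are not reproduced here, and the helper names are illustrative.

```python
import numpy as np

def pct_change(series):
    """Year-over-year percentage change: 100 * (x[t] - x[t-1]) / x[t-1]."""
    x = np.asarray(series, dtype=float)
    return 100.0 * (x[1:] - x[:-1]) / x[:-1]

def linear_trend(years, values):
    """Least-squares straight line: values ~ slope * year + intercept."""
    slope, intercept = np.polyfit(years, values, deg=1)
    return float(slope), float(intercept)

# Endpoint values quoted in the text, in 10^6 tonnes of CO2.
co2_1970, co2_2016 = 11.503, 68.016
growth_factor = co2_2016 / co2_1970  # about 5.91-fold over 1970-2016

# Hypothetical toy series (NOT the study's data), used only to exercise the helpers.
years = np.array([2000, 2001, 2002, 2003])
toy_co2 = np.array([30.0, 33.0, 31.35, 34.0])
changes = pct_change(toy_co2)                      # +10 %, -5 %, about +8.45 %
largest_rise_year = int(years[1:][np.argmax(changes)])
largest_drop_year = int(years[1:][np.argmin(changes)])
slope, intercept = linear_trend(years, toy_co2)    # slope in 10^6 t per year
```

The same `pct_change` helper applied to the energy, population, GDP, and vehicle-kilometer series would yield the paired yearly changes quoted for 1980 and 1993.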

The growth described above directly affects the number of road vehicles in Türkiye. The number of motor vehicles registered to traffic in Türkiye over the years is presented in Figure 8. According to TurkStat data, the number of registered motor land vehicles in traffic in Türkiye, which was over 230 thousand in 1966, had increased to over 26 million by the end of 2022. Over the last decade alone, the number of registered motor land vehicles increased by 47.62%, and over the whole period it grew by an average of 8.98% annually. The highest annual increase occurred in 1979 and the lowest in 2019, at 1.27%. In addition, automobiles account for 53.88% of all registered vehicles in Türkiye. By fuel type, 26.8% of automobiles run on gasoline, 36.9% on diesel, and 35.1% on LPG, with the rest being hybrid, electric, etc.
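An "average annual growth rate" can mean either the compound annual growth rate (CAGR) linking the first and last values or the arithmetic mean of the individual yearly rates; the text does not say which definition yields 8.98%. The sketch below computes both. Applied to the rounded endpoints quoted above (about 230 thousand in 1966 and 26 million in 2022), the compound rate comes out at roughly 8.8%. Because the arithmetic mean of yearly rates is never smaller than the compound rate, the choice of definition could account for part of the gap, although that is an assumption. The helper names and the toy series are hypothetical.

```python
import numpy as np

def cagr_percent(start, end, n_years):
    """Compound annual growth rate: the constant yearly rate linking start to end."""
    return 100.0 * ((end / start) ** (1.0 / n_years) - 1.0)

def mean_annual_growth_percent(series):
    """Arithmetic mean of the individual year-over-year growth rates."""
    x = np.asarray(series, dtype=float)
    return float(np.mean(100.0 * (x[1:] / x[:-1] - 1.0)))

# Rounded endpoints quoted in the text: ~230 thousand (1966) and ~26 million (2022).
implied_cagr = cagr_percent(230_000, 26_000_000, 2022 - 1966)  # roughly 8.8 %

# Toy series (NOT real data): +10 % in one year, then -10 % in the next.
toy = [100.0, 110.0, 99.0]
arith = mean_annual_growth_percent(toy)              # 0 %: the rates cancel on average
geom = cagr_percent(toy[0], toy[-1], len(toy) - 1)   # about -0.50 %: the level actually fell
```

The toy series shows why the two definitions differ: the yearly rates average to zero, yet the series ends below its starting value.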

Fig. 8

Variation in the number of registered motor vehicles in Türkiye by year (plotted from the data in ref. [52]).

Fossil-based fuels such as coal, oil, and natural gas are the main sources of energy consumed in sectors such as trade, industry, housing, and transportation, and oil ranks first among them. In 2021, fossil fuels accounted for 83.4% of global primary energy consumption. Türkiye's foreign energy dependency ratio rose from 51.6% in 1990 to 70.1% in 2020. By sector, 18.3% of Türkiye's primary energy consumption is used in transportation [90]. Oil consumption nearly tripled, from 313.2 thousand b/d in 1980 to 922.6 thousand b/d in 2020, which shows that Türkiye's oil consumption is increasing very rapidly. As can be seen from Figure 8, the number of registered motor land vehicles in Türkiye can be expected to keep increasing over the years; in other words, the energy demand of the transportation sector will undoubtedly rise with the number of vehicles. Türkiye produced 50,127 b/d of crude oil in 2012 and 69,332 b/d in 2021, while its crude oil imports increased from 385 thousand b/d in 2012 to 631 thousand b/d in 2021. Because a large part of the energy need is met by oil and its derivatives, and oil reserves are constantly decreasing, humanity is being led toward long-lasting, non-depleting, clean, and environmentally friendly sources. Considering environmental problems such as acid rain, global warming, and climate change caused by the carbon dioxide and other GHGs released by fossil fuel use, alternative energy sources whose environmental effects can be controlled are becoming increasingly important.
In particular, Türkiye, which meets more than 90% of its oil needs through imports, is a net importer of energy resources. This situation not only hinders the country's economic growth but also raises environmental issues. It also makes it necessary to shift increasingly toward alternative and renewable energy sources to meet Türkiye's energy needs [90, 91].
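The crude oil figures quoted above can be combined into a simple import-dependency ratio, imports / (production + imports), as a worked check. This is a simplified definition chosen for illustration: it ignores crude exports, stock changes, and trade in refined products, so it may not match the basis behind the "more than 90%" figure in the text. All numbers are those quoted above, expressed in thousand b/d. The helper name is hypothetical.

```python
# Figures quoted in the text, in thousand barrels per day (b/d).
crude_production = {2012: 50.127, 2021: 69.332}
crude_imports = {2012: 385.0, 2021: 631.0}

def crude_import_share(year):
    """Percentage of crude supply (production + imports) met by imports.

    Simplification (assumption): ignores crude exports, stock changes,
    and refined-product trade.
    """
    supply = crude_production[year] + crude_imports[year]
    return 100.0 * crude_imports[year] / supply

share_2012 = crude_import_share(2012)   # about 88.5 %
share_2021 = crude_import_share(2021)   # about 90.1 %
consumption_ratio = 922.6 / 313.2       # 2020 vs 1980 oil consumption, about 2.95x
```

Under this simplified definition, the 2021 import share is just above 90%, and the consumption ratio of about 2.95 supports the statement that oil consumption nearly tripled between 1980 and 2020.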

Burning fossil fuels in internal combustion engines releases many harmful pollutants into the atmosphere, and these gases seriously threaten the health of humans and other living things [92]. In addition, the use of petroleum-based fuels significantly intensifies the greenhouse effect, leading to global warming [93]. To overcome global warming, one of the biggest environmental concerns of the century, CO2 emissions need to be reduced [94]. In light of the information presented in this section, the energy used in the transportation sector clearly has a great impact on both the environment and the country's economy, and this should be taken into account in future investments. The problems mentioned above can be overcome through concrete steps taken in the short, medium, and long term.

4.2 Prediction of transport-based CO2 emissions using several ML algorithms

In this study, supervised learning models were used because they learn from labeled data; supervised learning is an important ML technique, especially in classification and regression studies. The XGBoost ensemble learning algorithm, one of the most advanced boosting algorithms for both classification and regression, was used. The MLP algorithm was chosen to examine how the system responds to new samples. In addition, the SVM was included in the analysis because of its high accuracy in ML regression studies. Together, these algorithms provide a comprehensive test of a data-driven system. The data were analyzed using regression, which is a very popular ML modeling technique. In the applied regression method, the relationship between the dependent variable and the independent variables was examined using the best-fitting line. In classification, such a line serves as a boundary separating two classes, whereas in regression it reflects the linear relationship between two variables. MLP, on the other hand, is the most popular type of artificial neural network for solving nonlinear problems [95].
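To illustrate the boosting idea behind XGBoost, the sketch below implements plain gradient boosting for squared loss in NumPy. Each new one-split tree (a "stump") is fitted to the residuals left by the trees before it and added with a small learning rate. This is a simplified stand-in, not XGBoost and not the authors' code: XGBoost additionally uses a regularized objective, second-order gradient information, and deeper trees. All function names, the learning rate, the number of rounds, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_stump(X, r):
    """Find the single split (feature j, threshold) that best fits residuals r.

    Returns (j, threshold, left_value, right_value), or None if no split exists.
    """
    best_sse, best = np.inf, None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue  # cannot place a threshold between equal values
            left, right = rs[:i], rs[i:]
            sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if sse < best_sse:
                best_sse = sse
                best = (j, 0.5 * (xs[i - 1] + xs[i]), left.mean(), right.mean())
    return best

def predict_stump(stump, X):
    j, threshold, left_value, right_value = stump
    return np.where(X[:, j] <= threshold, left_value, right_value)

def fit_boosting(X, y, n_rounds=200, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        stump = fit_stump(X, residual)
        if stump is None:
            break
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, X):
    base, learning_rate, stumps = model
    pred = np.full(X.shape[0], base)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, X)
    return pred

# Synthetic illustration only (NOT the study's data): a smooth nonlinear target.
X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = 3.0 * X[:, 0] ** 2
model = fit_boosting(X, y)
train_mse = float(np.mean((predict_boosting(model, X) - y) ** 2))
```

The small learning rate is a deliberate trade-off. Each stump corrects only a fraction of the remaining error, so more rounds are needed, but the fitted function is built up gradually, which typically generalizes better than taking large steps.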

In this analysis, four scenarios were evaluated with each of the three ML methods. The scenarios are built from the input parameters described in the sections above: energy consumption (ENERGY), vehicle kilometers (VK), population (POP), year (Y), and GDP. They are scenario 1: ENERGY/VK/POP/Y/GDP, scenario 2: ENERGY/VK/POP/Y, scenario 3: ENERGY/VK/POP, and scenario 4: ENERGY/VK. Within the framework of these scenarios, CO2 emissions from the transportation sector were estimated for the period 1970–2016. Figure 9 shows the estimated CO2 emissions over this period, the actual values, and the estimation error. As the graph shows, CO2 emissions from the transportation sector in Türkiye are generally increasing, and the CO2 emissions estimated with the ML techniques are very close to the actual values.
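The four scenarios form nested input sets. Scenario 1 uses all five inputs, and each later scenario drops the input with the weakest correlation to CO2 (ranking ENERGY > VK > POP > Y > GDP, as stated in the Conclusion). The sketch below generates these subsets and stacks the chosen columns into a feature matrix. It is one possible way to organize such an experiment, not the authors' code. The helper names and the three-row toy table are hypothetical.

```python
import numpy as np

def build_scenarios(ranked_features, min_features=2):
    """Scenario 1 uses every feature; each later scenario drops the weakest remaining one."""
    n = len(ranked_features)
    return {k + 1: list(ranked_features[: n - k]) for k in range(n - min_features + 1)}

def design_matrix(columns, features):
    """Stack the chosen feature columns (name -> 1-D array) into an (n_samples, n_features) matrix."""
    return np.column_stack([np.asarray(columns[name], dtype=float) for name in features])

# Ranking by correlation with CO2, as reported in the text: ENERGY > VK > POP > Y > GDP.
RANKED = ["ENERGY", "VK", "POP", "Y", "GDP"]
scenarios = build_scenarios(RANKED)

# Hypothetical three-row table, used only to show the shapes involved.
toy_columns = {name: [1.0, 2.0, 3.0] for name in RANKED}
X_scenario4 = design_matrix(toy_columns, scenarios[4])  # shape (3, 2)
```

Each scenario's matrix would then be passed unchanged to every algorithm, so differences in the results reflect the input set rather than the preprocessing.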

Fig. 9

Prediction of transportation-based CO2 emissions using various ML algorithms: (a) scenario 1, (b) scenario 2, (c) scenario 3, (d) scenario 4.

For easier comparison, the statistical indicators obtained for each ML algorithm under each planned scenario are presented in detail in Tables 5–8. Prediction success was calculated from the target values and the predicted values. To measure prediction success, the success prediction ratio, R², MAE, MAPE, MSE, RMSE, rRMSE, and MBE were used; the equations of these statistical indicators are explained in detail in Section 3. Each statistical indicator should fall within a certain range: for example, R² should be in the range 0–1, RMSE 0–5, rRMSE 0–15%, MAPE 0–15%, MAE 0–5, and MBE 0–3 in absolute value. Considering all the results, each ML algorithm gave very satisfactory results in estimating CO2 emissions from transportation in Türkiye.
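For reference, the sketch below computes the indicators listed above using common textbook definitions. The exact formulas are given in Section 3 of the paper and may differ in detail, for example in the sign convention of MBE (here predicted minus observed), whether MAPE is expressed as a fraction or a percentage (here a percentage), or whether R² is computed as the squared Pearson correlation (here 1 − SS_res/SS_tot). The paper's "success prediction ratio" is not reproduced because its definition is not shown in this section. The function name and the toy values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Common regression indicators; MAPE and rRMSE are returned in percent."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = p - y  # MBE sign convention assumed: predicted minus observed
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum((y - p) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": float(np.mean(np.abs(err))),
        "MAPE_pct": float(100.0 * np.mean(np.abs(err / y))),  # undefined if any y == 0
        "MSE": mse,
        "RMSE": rmse,
        "rRMSE_pct": 100.0 * rmse / float(np.mean(y)),
        "MBE": float(np.mean(err)),
    }

# Toy values (NOT results from the study).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

In this toy case, the positive and negative errors cancel, so MBE is zero even though MAE is not. This is why a bias indicator is normally reported alongside an absolute-error indicator.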

Table 5

Statistical indicators of XGBoost, SVM, and MLP algorithms for scenario 1 to predict CO2 emissions.

Table 6

Statistical indicators of XGBoost, SVM, and MLP algorithms for scenario 2 to predict CO2 emissions.

Table 7

Statistical indicators of XGBoost, SVM, and MLP algorithms for scenario 3 to predict CO2 emissions.

Table 8

Statistical indicators of XGBoost, SVM, and MLP algorithms for scenario 4 to predict CO2 emissions.

The average MSE values in the study are very close to 0, as desired. The RMSE value should also be close to zero; in this study it lies in the range 0.03–0.1, very close to the desired value. MAE and MAPE should be as small as possible, and the two-feature XGBoost model (scenario 4) achieves values very close to the desired ones, 0.0278 and 0.0576, respectively. Across the scenarios, the XGBoost algorithm was the most successful: its highest R² value (0.9886) was reached in scenario 4, which uses the VK and ENERGY features. The highest R² values of the MLP and SVM algorithms, 0.9689 and 0.9883, respectively, were also obtained in scenario 4. With respect to the limit values stated above, the R² of the XGBoost algorithm in scenario 4 was the closest to 1. Together with RMSE = 0.0333, rRMSE = 3.4950%, MAPE = 0.0576, MAE = 0.0278, and MBE = −0.0010, this shows that XGBoost was the most appropriate algorithm.

5 Conclusion

This study aimed to estimate the CO2 emissions originating from the transportation sector in Türkiye with different artificial intelligence algorithms. Three ML algorithms (XGBoost, SVM, and MLP) were used to estimate CO2 emissions. To train the algorithms, correlation analysis was used to select the input parameters, identifying those most strongly correlated with the output parameter. In decreasing order of correlation, these are ENERGY > VK > POP > Y > GDP. Accordingly, four scenarios were prepared, and the effects of the parameter groups on CO2 emissions were assessed using seven statistical indicators of prediction performance. Based on this research, the following important findings can be listed:

  • There is a strong correlation between CO2 emissions and energy consumption from transportation (0.98563), vehicle kilometers (0.96875), population (0.96002), year (0.95643), and the economic indicator GDP (0.92063).

  • The metrics selected as input parameters in this study are expected to increase further in the coming years compared with today.

  • Overall, the XGBoost algorithm performed best in estimating CO2 emissions from the transportation sector, while the other algorithms generally gave lower results. However, when all metrics are considered, each ML algorithm offers very successful results in predicting CO2 emissions.

  • Of the four scenarios, scenario 4 (ENERGY and VK as inputs) was the best for estimating CO2 emissions, although all ML algorithms also produced very successful results when all parameters were used as inputs.

  • According to R², one of the statistical indicators most widely used in the literature, all algorithms give successful results in estimating CO2 emissions.

  • According to the MAPE indicator, all algorithms other than XGBoost (MAPE = 0.0576) gave lower prediction accuracy in estimating CO2 emissions.

  • The results show that artificial intelligence technologies can be used to estimate CO2 emissions, as in many other fields.

In summary, to sustain the momentum of Türkiye's economic development in recent years, decision-makers need to review energy policy extensively; otherwise, energy will become the biggest obstacle to economic development. In addition, because energy resources are limited, energy demand forecasting is necessary to use them more efficiently in the future, and it will remain an important consideration for CO2 emissions. As a continuation of this study, future values of parameters such as energy demand, number of vehicles, and CO2 emissions can be predicted with forward-looking time-series algorithms, and measures can be taken accordingly. Although the literature contains forecasting studies, they remain insufficient. The different ML algorithms preferred in this paper have provided very good results in estimating CO2 emissions from the transportation sector in Türkiye. However, the forecast results indicate that policymakers will have to take serious initiatives in the near future to control and mitigate the growth rate of the above-mentioned parameters. The results obtained from this study are expected to contribute to the development of investment plans for public institutions. In future studies, we plan to estimate CO2 emissions with different deep learning architectures and features.

6 Limitations and future work

This study has some limitations that should be acknowledged and addressed in future work. First, this study applied three artificial intelligence algorithms (MLP, XGBoost, and SVM) to estimate CO2 emissions in Türkiye’s transportation sector. Other algorithms (Random Forest, K-Nearest Neighbors, etc.) could also be explored and compared for their performance and suitability for this task. Second, this study considered five input parameters (energy consumption, vehicle kilometers, population, year, and gross domestic product per capita) that were assumed to have a linear relationship with CO2 emissions. However, there could be other factors, such as fuel types, traffic patterns, weather conditions, or vehicle types, that could have a nonlinear or interactive effect on CO2 emissions. Future research could incorporate these additional data sources to further refine CO2 emissions predictions and potentially identify specific emission hotspots. Third, this study designed four scenarios based on the correlation effect of the input parameters. However, there could be other scenarios that reflect different assumptions, projections, or policy interventions that could affect CO2 emissions in the transportation sector. Future research could develop scenario-based models that simulate the effects of various policy interventions, such as carbon taxes, subsidies, regulations, or incentives, on CO2 emissions and inform the design of effective emission mitigation strategies. Fourth, this study was restricted to Türkiye, which has a specific geographic, economic, and social context that could influence CO2 emissions in the transportation sector. Therefore, the findings of this study may not be generalizable or applicable to other regions or countries that have different characteristics or challenges. Future research could extend the scope of this study to other regions or countries and compare the results and implications for CO2 emissions in the transportation sector.

Acknowledgments

The authors would like to thank the Editors and anonymous reviewers for helping us to present a balanced account of our paper.

Conflict of interest

The authors declared that there is no competing financial interest in this research.

Data availability statement

The data used and/or analyzed throughout the present study are available from the authors upon reasonable request.

Authors’ contributions statement

Gökalp Çınarer and Kazım Kılıç: Methodology, Formal analysis, Visualization, Writing – original draft, Writing – review & editing. Murat Kadir Yeşilyurt and Ümit Ağbulut: Conceptualization, Investigation, Methodology, Data curation, Validation, Resources, Formal analysis, Visualization, Writing – original draft, Writing – review & editing. Zeki Yılbaşı: Investigation, Visualization, Writing – original draft, Writing – review & editing.

Ethics approval

The authors declared that no animal and human studies are presented in this manuscript and no potentially identifiable human images or data are given in this research.

References

