Open Access
Issue
Sci. Tech. Energ. Transition
Volume 78, 2023
Article Number 15
Number of page(s) 9
DOI https://doi.org/10.2516/stet/2023011
Published online 12 July 2023

© The Author(s), published by EDP Sciences, 2023

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

In recent years, non-renewable energy reserves have declined significantly, leading to the widespread development and utilization of renewable energy sources such as solar and wind power. Photovoltaic (PV) power generation is a clean and environmentally friendly form of energy that has been adopted in many regions [1]. However, because PV power output fluctuates strongly, its integration with power grid systems presents various security challenges [2]. Therefore, accurate prediction of PV power generation is crucial to enhance the safety and stability of the power system.

Currently, PV power forecasting methods can be broadly classified into two categories: physical forecasting and statistical forecasting. The physical forecasting method predicts the PV power output from the current weather and a model of the PV station. However, this approach depends heavily on the particular PV station, and its prediction accuracy cannot be guaranteed. Statistical prediction methods, on the other hand, rely on the analysis of historical power and meteorological data collected from PV stations to forecast PV power output. Popular statistical forecasting methods include Markov chains [3], neural network algorithms [4], and extreme learning machine methods [5]. Markov chain prediction estimates the state of the next time step from the previous state. Neural network algorithms use known data, self-learning, and continuous training to identify the mapping relationship between input and output, and can approximate complex nonlinear functions. Extreme learning machine methods are machine learning algorithms used to train single-hidden-layer neural networks.

Currently, many experts and scholars focus on combining artificial intelligence and intelligent optimization algorithms to achieve improved prediction performance. Literature [6] proposed a photovoltaic power prediction method combining Support Vector Regression (SVR) and a Kalman filter, but inaccurate parameter selection for the SVR reduced prediction accuracy. Literature [7] combined the cuckoo search algorithm with SVR to improve parameter selection, yet did not consider missing data and abnormal cases. In literature [8], a back-propagation (BP) neural network was optimized using the Sparrow Search Algorithm (SSA) to establish an SSA-BP PV power prediction model. Literature [9] combined genetic algorithms and BP neural networks, improving prediction accuracy to some extent but not in cloudy and rainy weather.

The shallow neural network algorithms used in the above literature are relatively basic, with insufficient learning depth and weak generalization ability, and they are vulnerable to local optima [10]. Hence, literature [11–15] proposed using deep learning networks, such as Long Short-Term Memory (LSTM) or Deep Belief Networks (DBN), in photovoltaic power prediction to enhance accuracy. Literature [11] combined Empirical Mode Decomposition (EMD), principal component analysis, and LSTM, but the EMD decomposition suffered from mode aliasing, which affected prediction accuracy. Literature [12] proposed a prediction method combining fuzzy neural clustering and LSTM with good clustering and prediction accuracy but lower accuracy in rainy weather. Literature [13] combined Ensemble EMD (EEMD) and LSTM for power prediction of photovoltaic power stations, yet LSTM showed poor performance and slow speed in long sequence processing. Literature [14] proposed using the simulated annealing algorithm to optimize short-term load prediction with deep belief networks, reducing the hysteresis of the predicted value but leaving room for improvement in the prediction accuracy of peak and trough load values. Literature [15] proposed a short-term power prediction method combining DBN and a Kalman filtering algorithm, which corrects the predicted value and reduces the prediction error. However, these studies considered only a single prediction algorithm, ignoring the influence of combining two algorithms on prediction performance. Combining LSTM networks and DBNs in a photovoltaic power prediction method could enhance the prediction system.

Hence, this study proposes a novel photovoltaic power prediction method that combines Long Short-Term Memory (LSTM) and Deep Belief Network (DBN) algorithms with intelligent algorithms. The approach uses Fast Correlation-Based Feature Selection (FCBF) to identify meteorological input features with strong correlations to photovoltaic power generation. The selected meteorological features are then decomposed using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) to reduce data volatility. The Sparrow Search Algorithm (SSA) is employed to optimize the DBN model parameters, and the combined LSTM-SSA-DBN model is used to forecast the CEEMDAN-decomposed data. The proposed model is validated through MATLAB simulation experiments, which show higher prediction accuracy than the other models considered.

2 Preliminary data processing

Preprocessing the original data is an essential step to improve the accuracy of power prediction. Selecting features with strong correlations to photovoltaic power and decomposing the original data can enhance prediction accuracy. To ensure that the selected features are suitable for prediction, this study employs the FCBF and CEEMDAN methods to preprocess the original data.

2.1 Selection of the meteorological factors

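The fast correlation filtering method uses Symmetric Uncertainty (SU) to select the meteorological factors. In its standard form, SU is expressed as

$$SU(\omega_{nj}, P) = 2\,\frac{IG(\omega_{nj} \mid P)}{E(\omega_{nj}) + E(P)} \tag{1}$$

where $SU(\omega_{nj}, P)$ is the correlation between the meteorological characteristic $\omega_{nj}$ and the photovoltaic power category $P$, $IG(\omega_{nj} \mid P)$ is the information gain, and $E(\omega_{nj})$ and $E(P)$ are the information entropies of the characteristic and of the power category, respectively. The FCBF algorithm itself, including the initially empty set $G$ used in Step 1, is described below.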

By determining the correlation between the features and categories as well as the relationship between the features, the feature subset having a high category correlation is selected, and the redundant features are deleted. In this study, photovoltaic power was taken as the category, and the total radiation, wind speed, radiation temperature, air pressure, and relative humidity were taken as the characteristics. It was assumed that the feature set of the photovoltaic power station and the class of photovoltaic power are as shown in Equations (2) and (3), respectively:

$$W = \{\omega_{ij}\}, \quad i = 1, \ldots, N, \quad j = 1, \ldots, D \tag{2}$$

where $\omega_{ij}$ is the value of the $j$-th feature of the $i$-th weather vector, $N$ is 100, and $D$ is 6.

$$P = \{p_i\}, \quad i = 1, \ldots, N \tag{3}$$

where $p_i$ is the photovoltaic power category of the $i$-th weather vector.

The steps of the FCBF algorithm for selecting the input meteorological characteristics are as follows:

Step 1: Correlation analysis. Initialise the empty sets $G$ and $Q$. Set the FCBF threshold $\sigma$ to 0.4 and calculate the correlation $SU(\omega_{nj}, P)$ between each feature of the meteorological dataset $W$ and the photovoltaic power category $P$. If $SU(\omega_{nj}, P) < \sigma$, delete the feature $\omega_{nj}$; put the remaining feature variables into $G$.

Step 2: Redundancy analysis. Rank the characteristic variables in $G$ by their correlation $SU(\omega_{nj}, P)$ and move the one with the greatest correlation into $Q$. Then, calculate the correlation between each remaining characteristic variable in $G$ and the variable just moved into $Q$. If $SU(\omega_{nj}, \omega_{ij}) > SU(\omega_{nj}, P)$, remove $\omega_{nj}$ from $G$.

Step 3: Go back to Step 2 until a final set of meteorological features is obtained.
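The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the meteorological series and the power category have already been discretised into bins (SU is defined on discrete variables), it uses the standard entropy-based SU, and the feature names and toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * IG(x | y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))      # joint entropy H(x, y)
    info_gain = hx + hy - hxy           # information gain (mutual information)
    return 2.0 * info_gain / (hx + hy)

def fcbf(features, target, threshold=0.4):
    """features: dict name -> discretised values; target: discretised power category."""
    # Step 1: keep only features whose SU with the target reaches the threshold.
    relevance = {name: symmetric_uncertainty(col, target)
                 for name, col in features.items()}
    candidates = sorted((n for n, su in relevance.items() if su >= threshold),
                        key=lambda n: relevance[n], reverse=True)
    selected = []
    # Steps 2-3: move the most relevant candidate into the selected set, drop
    # candidates that correlate more with it than with the target, and repeat.
    while candidates:
        best = candidates.pop(0)
        selected.append(best)
        candidates = [n for n in candidates
                      if symmetric_uncertainty(features[n], features[best]) <= relevance[n]]
    return selected, relevance

# Hypothetical, already-discretised toy data (not taken from the paper).
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "radiation": [0, 0, 0, 0, 1, 1, 1, 0],
    "radiation_copy": [0, 0, 0, 0, 1, 1, 1, 0],   # redundant duplicate
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],            # unrelated to the target
}
selected, relevance = fcbf(features, target)
print(selected)
```

In this toy run the unrelated feature fails the relevance threshold in Step 1, and the duplicate is discarded in Step 2 because it is more strongly correlated with the already selected feature than with the target.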

Twenty groups of original data values were selected to take an average value for the FCBF calculation, and the obtained FCBF values are shown in Table 1. The time-varying curve of the photovoltaic power and meteorological factors is shown in Figure 1.

Fig. 1. Time-varying curve of photovoltaic power and the different meteorological factors.

Table 1. FCBF values of each influencing factor.

As can be seen from Table 1 and Figure 1, the total radiation amount, wind speed, and radiation temperature have a strong correlation with photovoltaic power generation. Therefore, these three factors were selected as the input factors of the prediction model.

2.2 CEEMDAN data decomposition

CEEMDAN is an enhanced decomposition method based on the empirical mode decomposition algorithm. It decomposes the non-stationary, nonlinear series of photovoltaic power and its influencing factors into multiple intrinsic mode function (IMF) sub-sequences and a residual component. For detailed decomposition steps, refer to the literature [16]. Figure 2 shows the CEEMDAN decomposition results on a sunny day.

Fig. 2. Decomposition results of photovoltaic power data.

The high-frequency component obtained by CEEMDAN decomposition directly reflects the random characteristics of power changes, while the low-frequency component and remaining component represent the influence of slow-changing related factors, such as meteorological factors, on power data. In the decomposed frequency domain, the low-frequency component of photovoltaic power data accounts for a relatively high proportion, and its corresponding component changes relatively slowly, making the prediction process simple. Hence, a prediction model with a relatively simple structure is selected to forecast the low-frequency component. Additionally, to obtain high-precision overall power prediction results, processing the high-frequency components is crucial. Since the variation amplitude and frequency of the high-frequency component are large, a higher-precision prediction model is required.

Considering the above analysis, IMF1–IMF4 are selected as the high-frequency components, while the remaining components are treated as the low-frequency components. As predicting the high-frequency components is challenging, the SSA-DBN network is chosen as their prediction model. A simple LSTM network model, on the other hand, is selected to forecast the low-frequency components. The final prediction result is obtained by superimposing the predictions of all components.
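The routing and superposition described above can be sketched as follows. The sketch assumes that the decomposition is already available as a list ordered IMF1, IMF2, ..., residual, and that each component is forecast separately; `predict_high` and `predict_low` are hypothetical stand-ins for the trained SSA-DBN and LSTM models.

```python
def split_components(components, n_high=4):
    """Split an ordered list [IMF1, IMF2, ..., residual] into high/low-frequency groups."""
    return components[:n_high], components[n_high:]

def superimpose(series_list):
    """Point-wise sum of equally long series."""
    return [sum(values) for values in zip(*series_list)]

def combined_forecast(components, predict_high, predict_low, n_high=4):
    """Forecast every component with the model assigned to its group, then add them up."""
    high, low = split_components(components, n_high)
    forecasts = [predict_high(c) for c in high] + [predict_low(c) for c in low]
    return superimpose(forecasts)

# Hypothetical toy decomposition: four IMFs, one further IMF, and a residual.
components = [
    [1, -1, 1, -1],
    [2, 0, -2, 0],
    [0, 1, 0, 1],
    [1, 1, -1, -1],
    [3, 3, 4, 4],
    [10, 10, 10, 10],
]
identity = lambda series: list(series)   # placeholder "model" that returns its input
print(combined_forecast(components, identity, identity))
```

With identity placeholders the superposition simply reproduces the sum of the components, which is a convenient sanity check before real models are plugged in.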

3 Principles of deep learning

Deep learning networks can alleviate the shortcomings of shallow neural networks, such as entrapment in local optima, overfitting, and weak generalization performance. LSTM and DBN, which are typical representatives of deep learning networks, have been widely used in prediction models.

3.1 LSTM prediction model

The LSTM network is a prediction model that improves on the Recurrent Neural Network (RNN) and alleviates the exploding and vanishing gradient problems of classical RNNs [17]. The LSTM network consists of a "gated structure" with short-term memory (input gate, forget gate, and output gate) and a cell state with long-term memory [18]. Its structure is shown in Figure 3.

Fig. 3. Schematic showing the network structure of the LSTM.

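The gated structure of the entire LSTM network and the cell state of the long-term memory are expressed, in their standard form, as follows (Eqs. (4)–(8)):

Input gate:

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \tag{4}$$

Forget gate:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \tag{5}$$

Output gate:

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \tag{6}$$

Memory unit:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, x_t] + b_c) \tag{7}$$

Output value:

$$h_t = o_t \odot \tanh(C_t) \tag{8}$$

where $\sigma$ is the sigmoid activation function, $W$ is the weight matrix, $b$ is the offset parameter, $x_t$ is the cell input at time $t$, $h_t$ is the cell output at the current time $t$, $C_t$ is the cell state at time $t$, and $\odot$ denotes element-wise multiplication.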
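To make the gate interactions concrete, a single time step of the standard LSTM cell can be sketched for the simplest case of one hidden unit and a scalar input. The parameter layout (`params` mapping each gate to a triple of recurrent weight, input weight, and bias) is an assumption made for illustration and is not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM with scalar input.

    params maps a gate name ("i", "f", "o", "c") to (w_h, w_x, b), i.e. the
    weights applied to h_{t-1} and x_t plus the offset of that gate.
    """
    def pre_activation(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b

    i_t = sigmoid(pre_activation("i"))        # input gate
    f_t = sigmoid(pre_activation("f"))        # forget gate
    o_t = sigmoid(pre_activation("o"))        # output gate
    c_candidate = math.tanh(pre_activation("c"))
    c_t = f_t * c_prev + i_t * c_candidate    # updated cell state (long-term memory)
    h_t = o_t * math.tanh(c_t)                # cell output (short-term memory)
    return h_t, c_t

# With all-zero parameters every gate equals 0.5 and the candidate state is 0,
# so the cell simply halves its previous state.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("i", "f", "o", "c")}
h_t, c_t = lstm_step(x_t=0.3, h_prev=0.0, c_prev=1.0, params=zero_params)
print(h_t, c_t)
```

A vector-valued implementation replaces the scalar products with matrix products and the gate multiplications with element-wise products, but the data flow is the same.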

3.2 Optimization of the DBN model by SSA

The DBN model, proposed by Geoffrey Hinton, is composed of multiple hidden layers and a regression layer [19]. The DBN algorithm first trains the Restricted Boltzmann Machine (RBM) of each layer and then uses the Backpropagation (BP) algorithm to fine-tune and optimize the initial value obtained by pretraining, and finally provides the predicted result. Through unsupervised hierarchical pretraining and supervised tuning, DBN can deal with the nonlinear nature of photovoltaic power generation and obtain better prediction accuracy [20]. The DBN model structure is shown in Figure 4.

Fig. 4. Schematic showing the model structure of DBN.

However, the accuracy of the DBN prediction model completely depends on the manual adjustment of the network parameters. The parameter adjustment procedure lacks a theoretical basis, its accuracy cannot be guaranteed, and the process is complicated. In order to overcome the above limitations, the SSA was introduced in this work to adjust the parameters of each network structure of the DBN.

The SSA models three roles in a sparrow population: discoverers, joiners (entrants), and sparrows responsible for reconnaissance and early warning [21]. Using the sparrows' strong search ability and efficient parameter optimization, the network structure parameters of the DBN were dynamically adjusted. The SSA was used to optimize the number of neurons in the two hidden layers of the DBN and the learning rate of the entire DBN network in order to find the optimal solution of the objective function and obtain the optimal parameters of the DBN network structure.

The entire process of optimizing the DBN structure by SSA is shown in Figure 5.

Fig. 5. Flowchart showing the optimization procedure of the DBN structure using SSA.
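The SSA-based tuning of the two hidden-layer sizes and the learning rate can be illustrated with a simplified sparrow-search-style loop. This is a sketch under several assumptions and not the authors' implementation: the discoverer, joiner, and early-warning updates follow the commonly published SSA rules, but the joiner update uses a per-dimension simplification of the original matrix term, the early-warning denominator uses an absolute value for numerical safety, and `toy_dbn_error` is a hypothetical stand-in for "train a DBN with these hyperparameters and return its validation error".

```python
import math
import random

def sparrow_search(objective, bounds, n_sparrows=20, n_iter=50,
                   producer_frac=0.2, scout_frac=0.1, safety=0.8, seed=0):
    """Minimise objective(position) inside box bounds; returns (best_pos, best_fit)."""
    rng = random.Random(seed)
    dim, eps = len(bounds), 1e-12

    def clip(pos):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(pos, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_sparrows)]
    fit = [objective(p) for p in pop]
    best_fit = min(fit)
    best_pos = pop[fit.index(best_fit)][:]
    n_prod = max(1, int(producer_frac * n_sparrows))
    n_scout = max(1, int(scout_frac * n_sparrows))

    for _ in range(n_iter):
        order = sorted(range(n_sparrows), key=lambda k: fit[k])   # best first
        pop = [pop[k] for k in order]
        worst = pop[-1][:]
        # Discoverers: contract the search if no alarm, otherwise take a random step.
        alarm = rng.random()
        for i in range(n_prod):
            if alarm < safety:
                scale = math.exp(-(i + 1) / ((rng.random() + eps) * n_iter))
                pop[i] = clip([v * scale for v in pop[i]])
            else:
                step = rng.gauss(0.0, 1.0)
                pop[i] = clip([v + step for v in pop[i]])
        leader = min(pop[:n_prod], key=objective)
        # Joiners: the worse half flies elsewhere, the rest follows the best discoverer.
        for i in range(n_prod, n_sparrows):
            if i + 1 > n_sparrows / 2:
                q = rng.gauss(0.0, 1.0)
                pop[i] = clip([q * math.exp((w - v) / (i + 1) ** 2)
                               for w, v in zip(worst, pop[i])])
            else:
                pop[i] = clip([l + abs(v - l) * rng.choice((-1, 1)) / dim
                               for l, v in zip(leader, pop[i])])
        fit = [objective(p) for p in pop]
        k_best = min(range(n_sparrows), key=lambda k: fit[k])
        if fit[k_best] < best_fit:
            best_fit, best_pos = fit[k_best], pop[k_best][:]
        # Reconnaissance and early warning: a random subset reacts to danger.
        worst_fit = max(fit)
        worst = pop[fit.index(worst_fit)][:]
        for i in rng.sample(range(n_sparrows), n_scout):
            if fit[i] > best_fit:
                beta = rng.gauss(0.0, 1.0)
                pop[i] = clip([b + beta * abs(v - b) for b, v in zip(best_pos, pop[i])])
            else:
                k = rng.uniform(-1.0, 1.0)
                denom = abs(fit[i] - worst_fit) + eps
                pop[i] = clip([v + k * abs(v - w) / denom for v, w in zip(pop[i], worst)])
            fit[i] = objective(pop[i])
        k_best = min(range(n_sparrows), key=lambda k: fit[k])
        if fit[k_best] < best_fit:
            best_fit, best_pos = fit[k_best], pop[k_best][:]
    return best_pos, best_fit

def toy_dbn_error(position):
    """Hypothetical stand-in for the validation error of a DBN with the given
    hidden-layer sizes (rounded to integers) and learning rate."""
    n1, n2, lr = round(position[0]), round(position[1]), position[2]
    return ((n1 - 60) / 100) ** 2 + ((n2 - 30) / 100) ** 2 + (math.log10(lr) + 2) ** 2

# Search space: neurons in hidden layers 1 and 2, and the learning rate (assumed ranges).
bounds = [(10, 100), (10, 100), (1e-4, 1e-1)]
best_pos, best_fit = sparrow_search(toy_dbn_error, bounds)
print([round(best_pos[0]), round(best_pos[1]), best_pos[2]], best_fit)
```

In a real setting the objective would wrap DBN pretraining and fine-tuning, so each evaluation is expensive; the population size and iteration count then become the main cost drivers.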

3.3 Error evaluation index

This paper uses two commonly used evaluation metrics, namely Root Mean Square Error (RMSE) and the coefficient of determination (R²), to assess the accuracy of the prediction results. As these metrics are widely known, further explanation is not necessary here.
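For reference, the two metrics in their usual textbook form can be computed as in the following sketch; the function names and the toy values are illustrative only. RMSE is the square root of the mean squared residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares of the measured values.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy example: every prediction is off by exactly one unit.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(actual, predicted), r2_score(actual, predicted))
```

A lower RMSE and an R² closer to 1 both indicate a better fit; R² is undefined when the measured series is constant, since the total sum of squares is then zero.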

4 Process design of the combined prediction model

After presenting the data processing and decomposition procedures and the development of the deep learning model, this section presents the prediction method combining SSA-DBN and LSTM to solve the problems of high volatility and strong randomness of photovoltaic power generation. The specific process design of the combined prediction algorithm is shown in Figure 6, which is divided into four specific steps: data preprocessing, data decomposition, prediction and superposition of each component, and error analysis.

Fig. 6. Flowchart of the combined prediction algorithm.

  1. Data preprocessing: The FCBF method was used to extract the meteorological factors related to photovoltaic power generation, and the redundant meteorological factors were removed to prepare the data for photovoltaic power generation prediction.

  2. Data decomposition: The CEEMDAN method was used to decompose the selected data, and the decomposed components were categorized into high-frequency and low-frequency components (IMF1–IMF4 are high-frequency components, and IMF5–RE are low-frequency components), which reduced the volatility of the data. Decomposing the data with CEEMDAN thus mitigated the loss of prediction accuracy caused by data fluctuation.

  3. Data prediction: The high-frequency components of the CEEMDAN decomposition were input into the SSA-DBN model, and the low-frequency components were input into the LSTM model for prediction. The prediction results were superposed to obtain the final prediction results.

  4. Error analysis: Two error indices, Root Mean Square Error (RMSE) and the coefficient of determination (R²), were used for analyzing the error of the prediction results.

5 Simulation and analysis

To assess the accuracy of the proposed prediction model, this study used the monitoring data of a photovoltaic power station located in Hubei as the research subject. The sample data from April was used as training data, with the total radiation amount, radiation temperature, and wind speed selected by FCBF used as the input and the generated power used as the output to establish the mathematical model. According to the radiation amount, the weather in May was divided into three categories: sunny, cloudy, and rainy, as shown in Figure 7. The sample data from May was used as the test set. Since photovoltaic power generation typically occurs during daytime hours, this study focused on the time period from 7:00 to 19:00 for power prediction, with data recorded every 15 min. MATLAB software was used to predict the power output for the three weather types over the 7:00–19:00 window of a future day, and the simulation results of each prediction model were compared and verified. The evaluation indexes used were the root mean square error (RMSE) and the coefficient of determination R², which are common evaluation metrics in this field.
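As a small illustration of the sampling window described above, the following sketch enumerates the daily time slots. It assumes that both end points (7:00 and 19:00) are recorded, which the text does not state explicitly; the function name is hypothetical.

```python
from datetime import datetime, timedelta

def daily_slots(start="07:00", end="19:00", step_min=15):
    """List the HH:MM sampling instants from start to end (both included)."""
    current = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    slots = []
    while current <= stop:
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=step_min)
    return slots

slots = daily_slots()
print(len(slots), slots[0], slots[-1])
```

Under this assumption each day contributes 49 samples (48 intervals of 15 min), which fixes the length of every daily input and output sequence.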

Fig. 7. Irradiance under various weather conditions.

Table 2 displays the model parameters, including the network structure parameters, maximum iteration times, and learning rates of each prediction model.

Table 2. Parameter settings of the prediction model.

This paper conducted two sets of comparison experiments to verify the benefits of FCBF and the combined model. In the first set of experiments, the Fusion Model (FM) without feature selection was compared with the combination algorithm that includes feature selection. The prediction results for sunny, rainy, and cloudy weather are presented in Figures 8a–8c. In the second set, BP, LSTM, SSA-DBN, and the proposed algorithm were compared to highlight the advantages of the combined model. The prediction results for sunny, rainy, and cloudy weather are shown in Figures 9a–9c. Additionally, the forecast errors are presented in Figures 10a–10c.

Fig. 8. Prediction results of Experiment 1. (a) Sunny weather, (b) rainy days, (c) cloudy.

Fig. 9. Prediction results of Experiment 2. (a) Sunny weather, (b) rainy days, (c) cloudy.

Fig. 10. Prediction error of Experiment 2. (a) Sunny weather, (b) rainy days, (c) cloudy.

The results depicted in Figure 8 indicate that the prediction model without the FCBF algorithm for meteorological feature selection showed significant deviations from the actual value due to the inclusion of irrelevant input factors, such as relative humidity and air pressure, leading to reduced prediction accuracy. In contrast, the proposed combined algorithm employs the FCBF algorithm to screen input factors and select only those that are highly correlated with photovoltaic power generation, reducing data redundancy and improving prediction accuracy.

In Experiment 2, the comparison between the single and combined models demonstrated that the combined model fits the actual values best under the different weather types. In particular, for the cloudy weather type, the prediction error of the proposed model ranges between –10 MW and 10 MW, whereas the other models exhibit errors between –50 MW and 30 MW. These results suggest that the combined model has better prediction accuracy than the other models.

The results presented in Figure 10 indicate that even in regular sunny weather conditions, the BP model exhibits a certain deviation from the actual value. This can be attributed to BP's shallow neural network structure and limited generalization ability, which make it susceptible to local optima. By contrast, the combined prediction model uses the SSA-DBN and LSTM models to predict the high- and low-frequency components of the CEEMDAN decomposition, respectively, thus achieving more accurate prediction results.

The experimental results and error analysis of the two comparison groups indicate that the combined prediction model with FCBF feature selection performs better than FM and the single models in terms of prediction accuracy. These results support the proposed prediction method. Moreover, to quantitatively assess the accuracy of each model, the error evaluation indices were calculated for the prediction results of each model. The calculation results are presented in Table 3.

Table 3. Error evaluation indexes of a single model and combination model.

The following conclusions can be drawn from the error calculation results of each model in Table 3:

  1. The RMSE values of the combined LSTM and SSA-DBN model were generally lower than those of the individual models (BP, LSTM, and SSA-DBN), and its R² values were generally higher. Under similar weather conditions, the prediction results in this paper reached a goodness of fit of more than 95% according to the R² calculation results.

  2. When using only BP, LSTM, and SSA-DBN methods for prediction, the overall prediction error is largely due to the complexity of the original data. The predicted curves of BP, LSTM, and SSA-DBN change gently, and when the generation power changes drastically, their predicted curves differ greatly from the actual power curves. Particularly in the case of cloudy weather, the combined model showed an RMSE reduction of 15.23%, 17.448%, and 5.583%, respectively, compared to BP, LSTM, and SSA-DBN.

  3. The accuracy of the proposed combined model, which utilizes FCBF feature screening, is improved by approximately 5.4% compared to the FM without feature screening. However, FM produced better prediction results than the other models, demonstrating the superiority of the LSTM and SSA-DBN combination model.

  4. The RMSE and R² results for the single SSA-DBN model are worse than those of the proposed model. Its RMSE was higher by 19.34%, 19.07%, and 5.558% on sunny, rainy, and cloudy days, respectively, compared with the proposed model.

Overall, the proposed prediction method in this paper shows lower RMSE values compared to the BP, LSTM, SSA-DBN, and FM models, while its R² fitting coefficient is generally higher than that of the other models, which supports the validity and reliability of the proposed method.

6 Conclusion

In this study, a combined photovoltaic power prediction method based on SSA-DBN and LSTM was proposed in view of the characteristics of large power fluctuation and strong randomness of photovoltaic power generation. The total radiation, radiation temperature, and wind speed were selected using the fast correlation filtering algorithm as the input and the generating power as the output. Using the actual data of a photovoltaic power station in Hubei province, the short-term prediction of the photovoltaic power station for a future day was realized based on three weather types: sunny, rainy, and cloudy. The following are the key points of this study:

  1. The FCBF algorithm was used to eliminate the irrelevant factors and select the meteorological factors with a strong correlation as the input. The input factors of total radiation amount, radiation temperature, and wind speed were selected by calculation, which improved the prediction accuracy to a certain extent.

  2. The CEEMDAN algorithm was used to decompose the input data into multiple subsequences to reduce the volatility of the data.

  3. Using a combination of SSA-DBN and LSTM for prediction, the comparison between the simulation results and the actual data showed that the combined model proposed in this study provides a better prediction for the different weather types.

References

  1. Comello S., Reichelstein S., Sahoo A. (2018) The road ahead for solar PV power, Renewable Sustainable Energy Rev. 92, 744–756.
  2. Lin P., Peng Z., Lai Y., Cheng S., Chen Z., Wu L. (2018) Short-term power prediction for photovoltaic power plants using a hybrid improved Kmeans-GRA-Elman model based on multivariate meteorological factors and historical power datasets, Energy Convers. Manage. 177, 704–717.
  3. Li J., Luo Y., Yang S., Wei S., Huang Q. (2021) Review of renewable energy power uncertainty prediction methods, High Volt. Technol. 47, 4, 1144–1157.
  4. Sobri S., Koohi-Kamali S., Abd Rahim N. (2018) Solar photovoltaic generation forecasting methods: A review, Energy Convers. Manage. 156, 459–497.
  5. Yao X., Mao S. (2023) Electric supply and demand forecasting using seasonal grey model based on PSO-SVR, Grey Syst. Theory Appl. 13, 1, 141–171.
  6. Yu N., Li X., Fei K., Ren J., Ni X. (2020) Based on SVR-UKF photovoltaic power station power prediction, Autom. Instrum. 246, 4, 73–77.
  7. Wang X., Liu J., Hu B., Zheng L. (2020) Based on CS-SVR model of short-term wind power prediction, Comput. Meas. Control. 28, 1, 152–155.
  8. Li M., Gu Y., Zhang Y., Gao X., Wei G. (2023) Quantitative prediction of ternary mixed gases based on an SnO2 sensor array and an SSA-BP neural network model, Phys. Chem. Chem. Phys. 25, 10935–10945.
  9. Li N., Wang Y., Ma W., Xiao Z., An Z. (2022) A wind power prediction method based on DE-BP neural network, Front. Energy Res. 10, 15–25.
  10. Liu W., Guo Z.-Q., Wang D., Liu G.-W., Jiang F., Niu Y.-J., Ma L.-X. (2023) Whales algorithm and its weights in the shallow layer neural network search threshold optimization, J. Control Decis. 38, 4, 1144–1152.
  11. Zhang Y., Cheng Q., Jiang W., Liu X., Shen L., Chen Z.H. (2021) Photovoltaic power prediction model based on EMD-PCA-LSTM, Acta Energiae Solaris Sinica 42, 9, 62–69.
  12. Zhang L., Wang X., Wu H., Xie L., Teng Y., Wei Y. (2023) Based on FCM and LSTM photovoltaic power short-term prediction, Power Supply 40, 1, 10–17.
  13. Huang Y., Zhang X., Yang L. (2019) Short-term wind speed prediction based on EEMD-LSTM, J. Phys. Conf. Ser. 1314, 012105.
  14. Liu D., Zhou L., Zheng X. (2021) Super short-term power load forecasting based on SA-DBN, J. Guangxi Normal University (Natural Science Edition) 33, 4, 21–33.
  15. Zhang C., He Y., Jiang S., Wang T., Yuan L., Li B. (2019) Transformer fault diagnosis method based on self-powered RFID sensor tag, DBN, and MKSVM, IEEE Sens. J. 19, 18.
  16. Garai S., Paul R.K. (2023) Development of MCS based-ensemble models using CEEMDAN decomposition and machine intelligence, Intell. Syst. Appl. 18, 200202.
  17. He Y., Zhou C., Hu Y. (2023) Application of LSTM method combined with feature optimization in chiller failure detection, J. Phys. Conf. Ser. 2442, 012026.
  18. Wang J., Hao S., Li S., Wang T.-Z., Zhang W. Prediction of wind farm group power based on ES-GRU-LSTM, Comput. Technol. Automat. 202, 37–41 (in Chinese).
  19. Zeng W., Cao Y., Feng L., Fan J., Zhong M., Mo W., Tan Z. (2023) Hybrid CEEMDAN-DBN-ELM for online DGA serials and transformer status forecasting, Electr. Power Syst. Res. 217, 109176.
  20. Xue J. (2020) Research and application of a new swarm intelligence optimization technique, Donghua University.
  21. Jiankai X., Bo S. (2020) A novel swarm intelligence optimization approach: sparrow search algorithm, Syst. Sci. Control Eng. 8, 1, 22–34.
