Sci. Tech. Energ. Transition, Volume 79, 2024
Article Number: 27
Number of pages: 11
DOI: https://doi.org/10.2516/stet/2024024
Published online: 29 April 2024
Regular Article
Comparative analysis of the performance of supervised learning algorithms for photovoltaic system fault diagnosis
¹ Faculty of Engineering, Electronics and Communications Department, Misr University for Science and Technology, 6th of October City, Giza, Egypt
² Faculty of Engineering, Electronics and Communications Department, Cairo University, Giza, Egypt
³ Faculty of Engineering, Electrical Power Department, Cairo University, Giza, Egypt
* Corresponding author: ghada.shaban@must.edu.eg
Received: 13 December 2023
Accepted: 31 March 2024
New trends in the use of PhotoVoltaic (PV) energy have emerged, mostly driven by new international regulations that aim to reduce the use of fossil fuels. The efficiency of PV systems is significantly affected by environmental factors and by the occurrence of various faults. If these faults are not rapidly identified and fixed, they may have dangerous consequences. Many methods have been proposed in the literature to detect faults in PV systems, such as Current-Voltage (I-V) curve measurements, atmospheric models, and statistical methods. In this paper, various machine learning techniques, in particular supervised learning techniques, are used for PV array fault diagnosis. The main goal is to identify and classify several faults that strongly affect PV system performance, namely shadowing, degradation, open circuit, and short circuit faults. The results show that the techniques have a high fault diagnosis capability. The K-Nearest Neighbor (KNN) technique showed the best fault prediction performance, achieving a prediction accuracy of 99.2% and an Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC) score of 99.7%. This shows its superiority for fault prediction in PV systems over the other methods used: Decision Tree, Naïve Bayes, and Logistic Regression.
Key words: PV / Machine learning / Logistic regression / Decision tree
© The Author(s), published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Among the different energy sources, solar PhotoVoltaic (PV) systems have grown rapidly in recent years [1]. Several types of faults may occur in these systems because of environmental changes and possible failures during manufacturing, transportation, installation, and other processes [2, 3]. These faults may pose serious safety risks [4], including fire, electric shock, and physical harm, and they also strongly reduce the amount of power generated. Consequently, efficient PV fault detection and diagnosis is essential.
PV electrical data such as output power, output voltage, current on the DC or AC side, and the current-voltage characteristic (I-V curve) are used for fault diagnosis [5]. Fault diagnosis using the I-V curve is an important method [6] because the I-V curve gives rich information about the health state of PV modules. When I-V curves are needed, I-V tracers can measure them for a single module or for a small-scale string or array. Hardware solutions for periodically measuring I-V curves at the power plant level have become commercially available [7, 8]. In the literature, I-V curves have been used for fault diagnosis in a variety of ways. Important features of the curve such as V_{OC}, I_{SC}, V_{MPP}, I_{MPP}, FF, R_{s}, and R_{sh} can be extracted and used for fault diagnosis through threshold analysis, statistical methods, Machine Learning Techniques (MLT), etc. For example, in [9], Partial Shading (PS), Short Circuit (SC), Open Circuit (OC), and bypass diode failure were detected using the V_{OC}, I_{SC}, V_{MPP}, and I_{MPP} characteristics, with both the threshold method and an Artificial Neural Network (ANN) used to determine the faults. In [10], V_{OC}, V_{MPP}, I_{MPP}, and R_{s} extracted from the I-V curve are used to diagnose short circuit faults, partial shading, and degradation. The parameters V_{MPP}, I_{MPP}, and P_{MPP} were also used to detect partial shading faults in [11]. In [12], the V_{MPP} parameter was used to identify partial shading from the voltage drop on the I-V curves. Similar techniques can be found in [13, 14]. Another technique is to detect the steps on the I-V curves by calculating the curve's first or second derivative. In [15], the steps were extracted by analyzing the negative peaks on the I-V curve, and a threshold was then found to determine partial shading and crack faults.
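To make the feature-based approach concrete, the sketch below extracts V_{OC}, I_{SC}, V_{MPP}, I_{MPP}, P_{MPP} and FF from a sampled I-V curve. It is a minimal illustration, assuming a curve sampled at increasing voltage that starts at V = 0 and whose current crosses zero; the names IVFeatures and extract_iv_features are hypothetical, and R_{s} and R_{sh}, which usually require curve fitting or slope estimation, are not extracted.

```python
from dataclasses import dataclass


@dataclass
class IVFeatures:
    voc: float   # open-circuit voltage V_OC (V)
    isc: float   # short-circuit current I_SC (A)
    vmpp: float  # voltage at the maximum power point V_MPP (V)
    impp: float  # current at the maximum power point I_MPP (A)
    pmpp: float  # maximum power P_MPP (W)
    ff: float    # fill factor FF = P_MPP / (V_OC * I_SC)


def extract_iv_features(voltages, currents):
    """Extract V_OC, I_SC, V_MPP, I_MPP, P_MPP and FF from a sampled I-V curve.

    Assumes samples sorted by increasing voltage, with the first one at V = 0.
    """
    if len(voltages) != len(currents) or len(voltages) < 2:
        raise ValueError("need at least two matching (V, I) samples")

    isc = currents[0]  # current at V = 0 (assumed first sample)

    # V_OC: linear interpolation where I first drops to <= 0;
    # falls back to the last sampled voltage if I never reaches zero.
    voc = voltages[-1]
    for k in range(1, len(currents)):
        i0, i1 = currents[k - 1], currents[k]
        if i0 > 0 >= i1:
            v0, v1 = voltages[k - 1], voltages[k]
            voc = v0 + (v1 - v0) * i0 / (i0 - i1)
            break

    # MPP: the sample with the largest power P = V * I
    powers = [v * i for v, i in zip(voltages, currents)]
    k_mpp = max(range(len(powers)), key=powers.__getitem__)
    vmpp, impp, pmpp = voltages[k_mpp], currents[k_mpp], powers[k_mpp]

    return IVFeatures(voc, isc, vmpp, impp, pmpp, pmpp / (voc * isc))


# Toy curve with illustrative values only
features = extract_iv_features([0, 10, 20, 30, 35, 40],
                               [9.0, 8.9, 8.7, 8.0, 5.0, 0.0])
print(features)
```

The resulting feature values could then be passed to a threshold rule or a classifier, as in the studies cited above.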
I-V curves were divided into low- and high-voltage domains in [16]. More related studies can be found in [17, 18]. A third technique compares full I-V curves with simulated ones. This was applied in [19], where I-V curves generated from the double diode model were compared with the measured ones to diagnose faults such as partial shading and degradation using threshold analysis. Partial shading, ground faults, and short circuits were also identified using the same method in [15].
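The curve-comparison idea can be sketched as follows. This is a minimal illustration, assuming the reference (simulated) curve has already been computed, for example from a double diode model, at the same voltage samples as the measurement. The split voltage, the normalization by the reference I_{SC}, the 5% threshold, and the function names domain_deviation and flag_fault are all hypothetical choices, not values or methods taken from the cited studies.

```python
def _mean(values):
    return sum(values) / len(values) if values else 0.0


def domain_deviation(voltages, measured, reference, v_split):
    """Normalized mean |I_meas - I_ref| below, and at or above, v_split.

    Deviations are divided by the reference short-circuit current
    (reference[0], assumed to be sampled at V = 0).
    """
    if not voltages or not (len(voltages) == len(measured) == len(reference)):
        raise ValueError("curves must share the same, non-empty voltage grid")
    isc_ref = reference[0]
    low, high = [], []
    for v, i_meas, i_ref in zip(voltages, measured, reference):
        (low if v < v_split else high).append(abs(i_meas - i_ref) / isc_ref)
    return _mean(low), _mean(high)


def flag_fault(voltages, measured, reference, v_split, threshold=0.05):
    """True if either voltage domain deviates by more than `threshold`."""
    low, high = domain_deviation(voltages, measured, reference, v_split)
    return low > threshold or high > threshold


# Illustrative curves: a shaded module loses current in the high-voltage domain
V = [0, 10, 20, 30, 40]
ref = [9.0, 8.9, 8.7, 8.0, 0.0]
shaded = [9.0, 8.9, 4.5, 4.0, 0.0]
print(flag_fault(V, ref, ref, v_split=20), flag_fault(V, shaded, ref, v_split=20))
```

Evaluating the two domains separately keeps a large local deviation, such as a shading step, from being diluted by the undisturbed part of the curve.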
Machine learning is another technique applied for fault diagnosis and classification in PV systems. Artificial Intelligence (AI) refers to computer programs that emulate human cognitive functions to carry out sophisticated activities, such as data analysis, language translation, and decision-making, that were previously limited to humans. Machine learning is a branch of AI that trains algorithms on data sets to create models that can carry out particular tasks, including complicated ones such as sales forecasting, image sorting, and big data analysis. These models have been used in several applications with significant results, among them medical applications that can help save many lives. Convolutional Neural Networks (CNNs) have been used to classify normal and cancerous cells for the early detection of Acute Lymphoblastic Leukaemia (ALL) [28], achieving high prediction accuracy. Another application of MLT is the diagnosis of the symptoms of Coronary Artery Disease (CAD) [29], one of the primary causes of death worldwide. Many machine learning models have been applied to predict CAD symptoms, including Neural Networks (NN), Support Vector Machines (SVM), Random Forests (RF), Logistic Regression (LR), and K-Nearest Neighbors (KNN). MLT have also been applied to the diagnosis of malaria, which continues to be a major global health issue, with a high case incidence and a substantial annual death toll [30]. Effective treatment of malaria depends heavily on early detection and precise diagnosis. That study evaluated how well CNNs and conventional NNs identify and categorize various types of malaria from blood smear images; the CNN model outperformed the other diagnostic models, providing improved precision and consistency when categorizing malaria cases.
MLT have also been used for precise COVID-19 diagnosis and evaluation [31]. That study showed that deep learning approaches have substantial practical potential to offer a precise and effective intelligent system for identifying and gauging COVID-19 severity.
These techniques allow software applications to make more accurate predictions by providing prediction models based on statistical techniques. The input data are processed to obtain the predicted output, and the model is updated when new inputs become available. Machine learning has two main categories, supervised learning and unsupervised learning, which are used depending on the type of problem considered. Each category uses a different group of approaches to build accurate prediction models [20].
In supervised learning, labeled input-output pairs are provided to the algorithm, for example for categorization. Algorithms used to obtain the desired results include regression, Decision Tree (DT), RF, KNN, and logistic regression [21].
Unsupervised learning produces predictive models even when no labeled input-output pairs are available. Depending on the problem, techniques such as k-means for clustering problems and the Apriori algorithm [22] for association rule learning problems are used, among many others.
2 Literature on PV fault diagnosis techniques
Most prediction models depend on knowing the radiation incident on the PV solar park, from which the generated electrical power is determined. The curves produced by the PV panels have been used as data sources, together with several formulas and correlations [23]. To provide crucial information for fault prediction, prediction models often rely on evaluating statistical data on power generation over time and long-term meteorological data [24]. In [5], neural network techniques were used to forecast the energy produced by PV systems. The temperature of PV modules has also been predicted [25]. The PV prediction models are categorized as shown in Figure 1.
Fig. 1 PV classification models and strategies for prediction. 
2.1 Machine learning techniques
These techniques are based on artificial intelligence methods and need a large amount of input data to achieve high prediction accuracy in PV fault diagnosis. Some of the machine learning approaches used are listed below:

ANN: Multilayer Perceptron (MLP) networks are used in most studies, and ANN-based methods for predicting faults in PV systems have received considerable attention [5].

SVM: SVMs are used to diagnose faults in PV systems with a time-series analysis method and have attracted considerable interest [22].
2.2 Hybrid models
To combine the advantages of several approaches and improve the accuracy of fault prediction, some models integrate several techniques. For example, neuro-fuzzy systems combine fuzzy logic with the learning ability of a neural network; Adaptive Neuro-Fuzzy Inference Systems (ANFIS) have been used for fault diagnosis as well [26]. Another hybrid model, which combines a Back Propagation Neural Network with the Particle Swarm Optimization technique (BPNN-PSO), has also been used for fault diagnosis in PV systems [5].
2.3 Atmospheric models
These models focus on the meteorological variables relevant to the forecast. These variables are obtained from various meteorological institutes that provide numerical forecasting programs. Environmental conditions strongly affect fault occurrence in PV systems; the relevant meteorological factors include humidity, temperature, air pressure, shading, and wind speed.
3 Methodology
Several techniques used for PV fault diagnosis were reviewed in Section 2. In this paper, supervised machine learning techniques are applied for fault diagnosis in a PV system. The system used here is a microgrid that was designed to show the effect of using supervised learning techniques in fault diagnosis [27]. To determine which supervised learning model gives the most accurate predictions, several supervised learning models were applied. These models are suitable for simulation and implementation in other PV systems in the future and can improve system performance by avoiding the effects of different faults.
Four supervised learning techniques are used in this study for PV fault diagnosis: KNN, LR, DT, and Naïve Bayes (NB). The framework of the methodology is shown in Figure 2.
Fig. 2 The proposed methodology’s framework. 
The techniques were developed using "Orange", a data mining tool for machine learning and data visualization, integrated with MATLAB for numerical computations and algorithms. The four algorithms and the evaluation metrics are discussed in the following sections.
3.1 K-nearest neighbors
The KNN algorithm can be used for both classification and regression problems. It assumes that an object belongs to the same class as its nearest neighbors, and it requires a positive integer k. The algorithm finds the k points in the dataset whose patterns are most similar to the sample (its k nearest neighbors) [15], and predictions are then made from these neighbors [21]. The key features are the values of the secondary current I_{sec} and the load power. These data are arranged in a matrix X_{ij}, where each row i is the feature vector for a specific period. The feature vector y_{j} of a new data point at time t is then compared with every row of X_{ij}, giving the vector of Euclidean distances d_{i}:
$$d_{i} = \sqrt{\sum_{j}\left(X_{ij} - y_{j}\right)^{2}} \tag{1}$$
The distances are sorted in ascending order and the first k matches are selected. The average of the numerical values of these k nearest neighbors is used as the predicted value; for classification, the majority class among the k neighbors is used instead.
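A minimal, dependency-free sketch of Eq. (1) and the neighbor search is given below. It assumes the feature rows are already on comparable scales (in practice, features such as irradiance in W/m² and current in A would usually be normalized first) and is not the Orange implementation used in this work. The helper names are hypothetical: knn_predict applies a majority vote for classification, and knn_regress applies the averaging rule described above. The default k = 2 mirrors the setting reported in Section 5.1.

```python
import math
from collections import Counter


def euclidean_distances(X, y):
    """d_i = sqrt(sum_j (X[i][j] - y[j])^2) for every stored row i, as in Eq. (1)."""
    return [math.sqrt(sum((xij - yj) ** 2 for xij, yj in zip(row, y))) for row in X]


def k_nearest(X, y, k):
    """Indices of the k rows of X closest to y (distances sorted ascending)."""
    d = euclidean_distances(X, y)
    return sorted(range(len(X)), key=d.__getitem__)[:k]


def knn_predict(X, labels, y, k=2):
    """Classification: majority vote among the k nearest neighbors.

    Counter.most_common keeps first-seen order for equal counts, so a tied
    vote goes to the label of the closest neighbor.
    """
    nearest = [labels[i] for i in k_nearest(X, y, k)]
    return Counter(nearest).most_common(1)[0][0]


def knn_regress(X, values, y, k=2):
    """Regression: average of the values of the k nearest neighbors."""
    idx = k_nearest(X, y, k)
    return sum(values[i] for i in idx) / len(idx)
```

With an even k such as 2, ties between two classes are common, which is why the tie-breaking rule is spelled out in the docstring.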
3.2 Logistic regression
LR is a supervised machine learning method used mainly to solve classification problems. Its objective is to estimate the probability that a given instance belongs to a class. It is a statistical method that studies the relationship between a group of independent variables and a binary dependent variable, and it is an effective technique for decision-making. The logistic function and its inverse, known as the log odds or the natural logarithm of the odds, are given by:
$$p = \frac{1}{1 + e^{-\left(\beta_{0} + \beta_{1}x_{1} + \dots + \beta_{n}x_{n}\right)}} \tag{2}$$
$$\ln\left(\frac{p}{1 - p}\right) = \beta_{0} + \beta_{1}x_{1} + \dots + \beta_{n}x_{n} \tag{3}$$
where p is the probability that the instance belongs to the positive class, x_{1}, …, x_{n} are the independent variables, and β_{0}, …, β_{n} are the model coefficients.
The β parameters of this model are estimated using Maximum Likelihood Estimation (MLE). The fitted model must then be tested; the Hosmer-Lemeshow test is one technique for evaluating how well the model's predicted probabilities fit the observed outcomes.
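The logistic function, the log odds, and MLE of the β coefficients can be sketched in a few lines. This is a minimal illustration, assuming plain batch gradient ascent on the log-likelihood; the learning rate, the number of epochs, and the function names are hypothetical, and Orange or other libraries use their own optimizers.

```python
import math


def sigmoid(z):
    """Logistic function p = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_odds(p):
    """Logit ln(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def predict_proba(beta, x):
    """p = sigmoid(beta_0 + beta_1 x_1 + ... + beta_n x_n)."""
    return sigmoid(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Estimate beta = [beta_0, ..., beta_n] by maximizing the log-likelihood
    with batch gradient ascent (a simple MLE sketch)."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(beta, xi)  # d(log-likelihood)/dz for one sample
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


beta = fit_logistic([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])
print(beta, predict_proba(beta, [1.5]))
```

On perfectly separable data like this toy set, the maximum-likelihood estimate does not exist and β keeps growing; the fixed number of epochs (or a penalty such as the L1 term mentioned in Section 5.1) keeps it finite.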
3.3 Decision tree
This is a supervised learning technique used for both regression and classification. It has a hierarchical structure consisting of a root node, branches, internal nodes, and leaf nodes. This hierarchy is shown in Figure 3.
Fig. 3 A decision tree diagram. 
The leaf nodes represent all the possible outputs in the dataset. Pruning is used to prevent overfitting and reduce complexity, and cross-validation is then used to test the accuracy of the model.
To build the best DT, the feature chosen at each split is the one that leaves the lowest entropy, that is, the one that best classifies the training data and therefore gives the highest information gain. The entropy and information gain used for classification are given by equations (4) and (5):
$$\mathrm{Entropy}(S) = -\sum_{c \in C} p(c)\,\log_{2} p(c) \tag{4}$$
where S is the dataset for which the entropy is calculated, C is the set of classes in S, and p(c) is the proportion of the data points in S that belong to class c;
$$\mathrm{Gain}(S, a) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(a)} \frac{|S_{v}|}{|S|}\,\mathrm{Entropy}(S_{v}) \tag{5}$$
where a is a specific attribute, S_{v} is the subset of S in which a takes the value v, |S_{v}|/|S| is the proportion of the values of S that fall in S_{v}, and Entropy(S_{v}) is the entropy of S_{v}.
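Equations (4) and (5) can be checked with the short sketch below, which also picks the attribute with the highest information gain. It is a minimal illustration, assuming discrete attribute values; continuous features such as irradiance would first need thresholds or bins, a step omitted here. The function names and the toy attributes are hypothetical, and the sketch does not reproduce the Orange tree learner used in this work.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum_c p(c) * log2 p(c), as in Eq. (4)."""
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())


def information_gain(values, labels):
    """Gain(S, a) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v), as in Eq. (5).

    values[i] is the value of attribute a for sample i; labels[i] is its class.
    """
    n = len(labels)
    subsets = {}
    for v, c in zip(values, labels):
        subsets.setdefault(v, []).append(c)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(columns, labels):
    """Name of the attribute (column) with the highest information gain."""
    return max(columns, key=lambda name: information_gain(columns[name], labels))


labels = [1, 1, 0, 0]
columns = {"shading": ["yes", "yes", "no", "no"], "hour": ["am", "pm", "am", "pm"]}
print(best_attribute(columns, labels))
```

Here "shading" splits the classes perfectly (gain 1 bit), while "hour" leaves each subset fully mixed (gain 0), so the tree would split on "shading" first.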
3.4 Naïve Bayes
Naïve Bayes is a supervised classification technique based on Bayes' theorem, with the "naïve" assumption that the features are conditionally independent given the class. It is suitable for classifying large datasets and is one of the most efficient classification algorithms available. It predicts the class of a sample from the probability that the sample belongs to each class.
Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to calculate the probability of a hypothesis given prior knowledge. It relies on conditional probability.
The formula of Bayes' theorem is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{6}$$
where P(A|B) is the posterior probability, i.e., the probability of hypothesis A given the observed event B; P(B|A) is the likelihood, i.e., the probability of the evidence given that hypothesis A is true; P(A) is the prior probability of the hypothesis before observing the evidence; and P(B) is the marginal probability of the evidence.
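Applying Eq. (6) to every class under the naïve independence assumption gives the classifier sketched below. The Gaussian likelihood for continuous features is an assumption made here for illustration; the text does not state how the NB model in this work handles continuous inputs. The function names and the small variance floor are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Per-class prior P(c) and per-feature mean and variance."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for c, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        # small floor keeps the variance strictly positive
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Class with the highest log-posterior.

    P(B) in Eq. (6) is the same for every class, so it is dropped and
    log P(c) + sum_j log P(x_j | c) is compared instead.
    """
    best, best_score = None, -math.inf
    for c, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xj, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (xj - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))
```

Working with log-probabilities avoids numerical underflow when many small likelihoods are multiplied.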
3.5 Evaluation metrics
Performance evaluation metrics such as the F1 score, AUC-ROC, precision, and recall were used to determine how well the models predicted faults. Recall is the ratio of true positives (TP) to the sum of true positives and false negatives (FN):
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$
Precision is the ratio of true positives (TP) to all positive predictions, i.e., true positives plus false positives (FP):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$
Another evaluation metric is the F-score or F1 score, the harmonic mean of precision and recall:
$$F_{1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$
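For a multi-class problem such as this one (normal operation plus four fault types), Eqs. (7)–(9) are computed per class from the confusion matrix, treating each class in turn as positive. The sketch below is a minimal illustration; macro averaging is one common way to summarize the per-class scores, but the text does not state which averaging was used for Table 2, and the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, n_classes):
    """cm[i][j] = number of samples of actual class i predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm


def per_class_metrics(cm, c):
    """Recall, precision and F1 for class c, one-vs-rest (Eqs. 7-9)."""
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                 # actual c, predicted as another class
    fp = sum(row[c] for row in cm) - tp  # predicted c, actually another class
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, precision, f1


def macro_average(cm):
    """Unweighted mean of the per-class (recall, precision, F1)."""
    scores = [per_class_metrics(cm, c) for c in range(len(cm))]
    return tuple(sum(s[k] for s in scores) / len(scores) for k in range(3))


cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(cm, macro_average(cm))
```

Rows of cm are actual classes and columns are predicted classes, which matches the usual layout of confusion matrices such as those in Figures 5–8.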
The AUC-ROC is used to present the results of a classification model graphically and is a significant indicator of the model's effectiveness. The ROC (Receiver Operating Characteristic) curve shows how well a classification model performs at various threshold values. It plots the True Positive Rate (TPR, equal to the recall) against the False Positive Rate (FPR = FP/(FP + TN), where TN denotes true negatives). The two-dimensional area under the ROC curve (AUC) is then calculated.
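For one class treated as positive (one-vs-rest, as in the per-class ROC analysis of Section 5.2), the ROC points and the AUC can be computed as below. This is a minimal sketch, assuming each model outputs a score such as a predicted class probability; it is not the Orange implementation, and the function names are hypothetical.

```python
def roc_curve(labels, scores):
    """ROC points (FPR, TPR) obtained by sweeping the threshold over the scores.

    labels are 1 for the positive class and 0 otherwise. Samples with equal
    scores are added in one step, so ties produce a diagonal segment.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for k, (s, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # emit a point only after the last sample sharing this score
        if k == len(pairs) - 1 or pairs[k + 1][0] != s:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])))
```

An AUC of 1.0 means every positive sample is scored above every negative one, while 0.5 corresponds to random ranking.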
4 Data analysis of PV system in different conditions
Using a previously designed model of a microgrid PV system [27], the resulting dataset is used to validate and demonstrate the ability of supervised learning algorithms to predict different faults. Figure 4 shows the simulation blocks, where S_{1} and S_{2} are PV strings, each with eight 330 W PV modules and with temperature and irradiance as inputs.
Each string provides a simulated voltage V_{dc,s} and a simulated current I_{dc,s}, which are the inputs of a boost converter (B_{s}) that uses a Maximum Power Point Tracking (MPPT) algorithm (for S_{1} and S_{2}). The output is then routed to a full-bridge inverter (J_{1}), which converts the DC into a single-phase 127 V, 60 Hz output connected to a model of the utility grid.
To simulate faults in this system, several elements are used: switches that simulate open circuit faults, resistors that simulate string degradation, a variable that simulates partial shading, and switches that simulate short circuit faults. The simulation was used to build a training dataset covering a full range of temperature (T) and irradiance (G) for each of the five conditions. The upper temperature bound is 85 °C, the maximum operating temperature of the PV module, and the lower bound is 5 °C. The irradiance ranges from 100 W/m^{2} to 1000 W/m^{2}, corresponding to the interval between the start of inverter operation and the peak of power generation. For each of the faults considered, the irradiance was simulated in 19 steps of 50 W/m^{2}. In total, this configuration produced 10,000 samples.
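The arithmetic of the irradiance sweep can be checked quickly: going from 100 to 1000 W/m² in steps of 50 W/m², both ends included, gives 19 levels. The sketch below enumerates a simulation grid under stated assumptions. The temperature step is not given in the text, so temperature_step is a hypothetical parameter, and the grid is not claimed to reproduce the 10,000 samples reported, whose exact composition is not described.

```python
T_MIN, T_MAX = 5, 85   # °C, bounds stated in the text
CONDITIONS = 5         # normal operation plus four fault types


def irradiance_levels(g_min=100, g_max=1000, step=50):
    """Irradiance set-points in W/m^2, both ends included."""
    return list(range(g_min, g_max + 1, step))


def simulation_grid(temperature_step):
    """All (condition, T, G) combinations; temperature_step is hypothetical."""
    temps = range(T_MIN, T_MAX + 1, temperature_step)
    return [(c, t, g)
            for c in range(CONDITIONS)
            for t in temps
            for g in irradiance_levels()]


print(len(irradiance_levels()), len(simulation_grid(10)))
```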
4.1 Classification of faults
Fault diagnosis informs the user of the sources of any detected faults. The performance of four widely used supervised MLT was tested. The input variables of these algorithms form a feature vector:
$$\mathbf{x} = \left[\,G,\; T,\; V_{dc,s},\; I_{dc,s}\,\right] \tag{10}$$
where G is the irradiance, T is the temperature, and V_{dc,s} and I_{dc,s} are the simulated voltage and current.
The faults determined and diagnosed are short circuit faults, open circuit faults, degradation, and shadowing. The dataset includes 16 days of data from an on-grid PV plant operating under normal conditions and under faults. These conditions were labeled as shown in Table 1.
Table 1 Normal and faulty conditions and their assigned label values.
5 Results and discussion
5.1 Steps for algorithms application
Based on the analysis in the previous sections, four machine learning algorithms are applied for fault diagnosis in the PV system: KNN, LR, DT, and NB. The input variables of these models include solar irradiance, temperature, simulated voltage, and simulated current. Cross-validation is applied during training. This method divides the dataset into two groups: the first is used for training and the second for testing the classification accuracy. The procedure is repeated several times, each time with different data in each group, and care must be taken that both the training and test groups contain samples from every category. The result is the average over all iterations. In this work, the simulated dataset is divided into five equal parts; in each iteration, four parts (80% of the data) are used for training and the remaining part is used for testing, so that a different part is tested each time. The parameters of each technique were varied during cross-validation and set to the values that produced the best accuracy. The parameter settings are as follows:

KNN: k = 2 neighbors, with the Euclidean distance as the distance metric.

LR: The regularization type was set to Lasso (L1), which penalizes the absolute values of the coefficients and can shrink uninformative ones to zero, reducing model complexity.

DT: The minimum number of instances was set to 2 and the maximum tree depth was limited to 100. Splitting of a node stops when its majority class reaches 95%.

NB: The probability that each sample belongs to each category is computed from the sample's values, and the sample is assigned to the category with the highest probability.
The sampling type used for testing in all models is stratified five-fold cross-validation.
If a trained model does not give accurate predictions, it is trained again. The test portion of the data is then applied, and the predicted and actual labels are compared to determine the most accurate model.
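The stratified five-fold splitting described above can be sketched as follows. This is a minimal illustration, assuming integer class labels; the function name and the round-robin dealing strategy are hypothetical choices, and Orange's own sampler may assign samples to folds differently.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation.

    The indices of each class are shuffled and dealt round-robin into k folds,
    so every fold keeps roughly the class proportions of the whole dataset.
    Each fold is used once for testing while the other k - 1 folds (80% of
    the data when k = 5) are used for training.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for c in sorted(by_class):
        idx = by_class[c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # continue dealing where the previous class stopped
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


labels = [0] * 50 + [1] * 25 + [2] * 25
for train, test in stratified_kfold(labels, k=5, seed=1):
    print(len(train), len(test))
```

A model would be trained on each train split and scored on the matching test split, and the five scores averaged, as described in the text.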
5.2 Fault detection results
This section presents the outputs obtained when using the models to identify faults in the PV system. The total number of instances is 10,000, of which 8054 were used for training. They are divided into sets used to test the ability of each model to correctly predict the faults occurring in the PV system. The classified conditions are the normal condition, indexed "0", and the faults, indexed "1" to "4", as shown in Table 1.
Figures 5–8 show the confusion matrices of the fault detection results obtained with the applied techniques: KNN, LR, DT, and NB. Other classification models, such as SVM, RF, and NN, were also simulated; however, the results and discussion focus on the four models above because they gave the best fault detection results.
Fig. 5 Confusion matrix for fault detection based on KNN model. 
Fig. 6 Confusion matrix for faults detection using LR model. 
Fig. 7 Confusion matrix for faults detection using DT model. 
Fig. 8 Confusion matrix for faults detection using NB Model. 
The confusion matrices in Figures 5–8 were used to calculate the overall performance of the proposed fault detection models. On the test instances, the models are compared in terms of accuracy in predicting the normal operation of the system, ability to diagnose faults in the PV system, and numbers of false positives and false negatives.
Table 2 shows the performance metrics obtained with the different models for classifying the faults in the PV system. According to these results, the KNN model achieves the highest precision of the four models. The AUC-ROC is 99.4% for the DT model, 99.8% for the NB model, 99.9% for the LR model, and 99.7% for the KNN model. The precision is 98.6% for the DT model, 97.8% for the NB model, 97.8% for the LR model, and 99.2% for the KNN model.
Table 2 Performance metric scores for the classification of faults in the PV system using different classification models.
All the models classify the instances into their correct classes with high performance, and the KNN technique gives the highest prediction accuracy.
ROC curve analysis is used to display each classification model's performance for the faults in the PV system across all classification thresholds. The curve plots two parameters: the true positive rate and the false positive rate.
Here, ROC analysis was performed for every fault type to show the classification performance of the different learning models, considering all target classes 0, 1, 2, 3, and 4. Figures 9a–9e show the ROC analysis for all the classification models. The ROC analysis results show the superiority of the KNN algorithm over all the other classification algorithms in correctly predicting all target classes for fault diagnosis in the PV system.
Fig. 9 a) ROC Analysis for prediction of target class (0); b) ROC Analysis for prediction of target class (1); c) ROC Analysis for prediction of target class (2); d) ROC Analysis for prediction of target class (3); e) ROC Analysis for prediction of target class (4). 
6 Conclusion
This study compares the effectiveness of supervised learning methods for diagnosing PV system faults, using data from a grid-tied photovoltaic plant operating under both normal and faulty conditions for 16 days [27]. KNN, LR, DT, and NB were used to build four alternative prediction models.
As discussed, many techniques have been used in the literature to predict faults in solar PV systems. The analysis of these techniques shows that most studies use MLT for PV fault diagnosis because of their high efficiency in estimation and prediction. Supervised machine learning algorithms are useful tools for predicting faults that lead to PV system degradation. In this work, the faults predicted in the PV system are short circuit, degradation, open circuit, and shadowing, which occurred over 16 days in a working PV system. The techniques used for prediction (KNN, LR, DT, and NB) gave high prediction accuracy according to the error metrics. The KNN technique achieved a prediction precision of 99.2% and an AUC-ROC of 99.7%, outperforming LR, NB, and DT for PV system fault diagnosis.
As future work, a small PV system can be built as hardware to demonstrate the occurrence of faults that affect it. A complete study of the hardware design method and of the fault types that can be demonstrated on this system is planned, and techniques for eliminating the effects of these faults can also be applied.
Funding
This study was personally funded, without any external funding.
Conflicts of Interest
The authors declare that they have no competing interests.
Data availability statement
All data used in this paper are available from the sources cited in the references.
Author contribution statement
HK and MM supervised the work and contributed ideas. Eng. GE wrote the manuscript and generated all the research data used to obtain the results. All authors read and approved the final manuscript.
References
[1] Osmani K., Haddad A., Lemenand T., Castanier B., Ramadan M. (2020) A review on maintenance strategies for PV systems, Sci. Total Environ. 746, 141753.
[2] Khalil I.U., Ul-Haq A., Mahmoud Y., Jalal M., Aamir M., Ahsan M.U., Mehmood K. (2020) Comparative analysis of photovoltaic faults and performance evaluation of its detection techniques, IEEE Access 8, 26676–26700.
[3] Prasanna R., Karthik C., Chowdhury S., Khan B. (2022) Comprehensive review on modelling, estimation, and types of faults in solar photovoltaic system, Int. J. Photoenergy 2022, 3053317.
[4] Mellit A., Kalogirou S. (2022) Assessment of machine learning and ensemble methods for fault diagnosis of photovoltaic systems, Renew. Energy 184, 1074–1090.
[5] Eldeghady G.S., Kamal H.A., Moustafa Hassan M.A. (2023) Fault diagnosis for PV system using a deep learning optimized via PSO heuristic combination technique, Electr. Eng. 105, 2287–2301.
[6] Liu Y., Ding K., Zhang J., Lin Y., Yang Z., Chen X., Li Y., Chen X. (2022) Intelligent fault diagnosis of photovoltaic array based on variable predictive models and I-V curves, Solar Energy 237, 340–351.
[7] Londoño C.D., Cano J.B., Jaramillo F., Valencia J.A., Velilla E. (2023) Outdoor and synthetic performance data for PV devices concerning the weather conditions and capacitor values of I-V tracer, Data Brief 47, 109007.
[8] Padilla A., Londoño C., Jaramillo F., Tovar I., Cano J.B., Velilla E. (2022) Photovoltaic performance assess by correcting the I-V curves in outdoor tests, Solar Energy 237, 11–18.
[9] Li B., Diallo D., Migan-Dubois A., Delpha C. (2022) Performance evaluation of IEC 60891:2021 procedures for correcting I-V curves of photovoltaic modules under healthy and faulty conditions, Prog. Photovolt. Res. Appl. 31, 474–493.
[10] Raj R.D.A., Bhattacharjee S. (2020) An Inclusive Investigation of Potential Faults in Solar Photovoltaic Array, in: 2020 International Conference on Computer, Electrical & Communication Engineering (ICCECE), Kolkata, India, IEEE, pp. 1–6.
[11] Li B. (2021) Health monitoring of photovoltaic modules using electrical measurements, Dissertation, Université Paris-Saclay.
[12] Dhimish M., Chen Z. (2019) Novel open-circuit photovoltaic bypass diode fault detection algorithm, IEEE J. Photovol. 9, 1819–1827.
[13] Ghazali S.N.A.M., Mohd A., Sujod M.Z. (2023) A comparative analysis of solar photovoltaic advanced fault detection and monitoring techniques, Electrica 23, 1, 137–148.
[14] Ibrahim A.L.W., Fang Z., Ameur K., Min D., Shafik M.B., Al-Muthanna G. (2021) Comparative study of solar PV system performance under partial shaded condition utilizing different control approaches, Indian J. Sci. Technol. 14, 1864–1893.
[15] Delpha C., Migan-Dubois A., Diallo D. (2021) Fault diagnosis of photovoltaic panels using full I-V characteristics and machine learning techniques, Energy Convers. Manag. 248, 114785.
[16] Lin P., Qian Z., Lu X., Lin Y., Lai Y., Cheng S., Chen Z., Wu L. (2022) Compound fault diagnosis model for Photovoltaic array using multi-scale SE-ResNet, Sustain. Energy Technol. Assess. 50, 101785.
[17] Sarikh S., Raoufi M., Bennouna A., Benlarabi A., Ikken B. (2020) Implementation of a plug and play I-V curve tracer dedicated to characterization and diagnosis of PV modules under real operating conditions, Energy Convers. Manag. 209, 112613.
[18] Koester L., Lindig S., Louwen A., Astigarraga A., Manzolini G., Moser D. (2022) Review of photovoltaic module degradation, field inspection techniques and techno-economic assessment, Renew. Sustain. Energy Rev. 165, 112616.
[19] Thandaiah Prabu R., Parasuraman S., Sahoo S., Amirthalakshmi T.M., Ramesh S., Agnes Shifani S., Arockia Jayadhas S., Indra Reddy M., Al Obaid S., Alfarraj S., Kumar S.S. (2022) The numerical algorithms and optimization approach used in extracting the parameters of the single-diode and double-diode photovoltaic (PV) models, Int. J. Photoenergy 2022, 5473266.
[20] Edun A.S., LaFlamme C., Kingston S.R., Tetali H.V., Benoit E.J., Scarpulla M., Furse C.M., Harley J.B. (2020) Finding faults in PV systems: Supervised and unsupervised dictionary learning with SSTDR, IEEE Sens. J. 21, 4855–4865.
[21] Gutiérrez L., Patiño J., Duque-Grisales E. (2021) A comparison of the performance of supervised learning algorithms for solar power prediction, Energies 14, 4424.
[22] Li N., Shepperd M., Guo Y. (2020) A systematic review of unsupervised learning techniques for software defect prediction, Inf. Softw. Technol. 122, 106287.
[23] Humada A.M., Darweesh S.Y., Mohammed K.G., Kamil M., Mohammed S.F., Kasim N.K., Tahseen T.A., Awad O.I., Mekhilef S. (2020) Modeling of PV system and parameter extraction based on experimental data: review and investigation, Solar Energy 199, 742–760.
[24] Ahmed R., Sreeram V., Mishra Y., Arif M.D. (2020) A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization, Renew. Sustain. Energy Rev. 124, 109792.
[25] Dong X.J., Shen J.N., He G.X., Ma Z.F., He Y.J. (2021) A general radial basis function neural network assisted hybrid modeling method for photovoltaic cell operating temperature prediction, Energy 234, 121212.
[26] Kaloop M.R., Bardhan A., Kardani N., Samui P., Hu J.W., Ramzy A. (2021) Novel application of adaptive swarm intelligence techniques coupled with adaptive network-based fuzzy inference system in predicting photovoltaic power, Renew. Sustain. Energy Rev. 148, 111315.
[27] Lazzaretti A.E., da Costa C.H., Rodrigues M.P., Yamada G.D., Lexinoski G., Moritz G.L., Oroski E., de Goes R.E., Linhares R.R., Stadzisz P.C., Omori J.S., dos Santos R.B. (2020) A monitoring system for online fault detection and classification in photovoltaic plants, Sensors 20, 17, 4688.
[28] Ghaderzadeh M., Hosseini A., Asadi F., Abolghasemi H., Bashash D., Roshanpoor A. (2022) Automated detection model in classification of B-lymphoblast cells from normal B-lymphoid precursors in blood smear microscopic images based on the majority voting technique, Sci. Program. 2022, 1–8.
[29] Garavand A., Behmanesh A., Aslani N., Sadeghsalehi H., Ghaderzadeh M. (2023) Towards diagnostic aided systems in coronary artery disease detection: a comprehensive multiview survey of the state of the art, Int. J. Intell. Syst. 2023, 1–19.
[30] Fasihfar Z., Rokhsati H., Sadeghsalehi H., Ghadezadeh M., Gheisari M. (2023) AI-driven malaria diagnosis: developing a robust model for accurate detection and classification of malaria parasites, Iran. J. Blood Cancer 15, 112–124.
[31] Ghaderzadeh M., Asadi F., Ramezan Ghorbani N., Almasi S., Taami T. (2023) Toward artificial intelligence (AI) applications in the determination of COVID-19 infection severity: considering AI as a disease control strategy in future pandemics, Iran. J. Blood Cancer 15, 3, 93–111.
Fig. 4 Architecture of a photovoltaic simulator system [27]. 
