Artificial intelligence (AI) systems are widely accepted as a technology that offers an alternative way to tackle complex and ill-defined problems. These systems can learn from examples, are fault tolerant in the sense that they are able to handle noisy and incomplete data, are able to deal with non-linear problems, and once trained, can perform prediction and generalization at high speed (Rumelhart et al., 1986). They have been used in diverse applications in control, robotics, pattern recognition, forecasting, medicine, power systems, manufacturing, optimization, signal processing, and the social/psychological sciences. They are particularly useful in system modeling, such as in implementing complex mappings and system identification. AI systems comprise areas such as artificial neural networks, genetic algorithms, fuzzy logic, and various hybrid systems, which combine two or more techniques.
Artificial neural networks (ANNs) mimic somewhat the learning process of a human brain. ANNs are collections of small, individually interconnected processing units. Information is passed between these units along interconnections. An incoming connection has two values associated with it: an input value and a weight. The output of the unit is a function of the summed value. ANNs, though implemented on computers, are not programmed to perform specific tasks. Instead, they are trained with respect to data sets until they learn the patterns used as inputs. Once they are trained, new patterns may be presented to them for prediction or classification. ANNs can automatically learn to recognize patterns in data from real systems or physical models, computer programs, or other sources. They can handle many inputs and produce answers in a form suitable for designers.
Genetic algorithms (GAs) are inspired by the way living organisms adapt to the harsh realities of life in a hostile world, i.e., by evolution and inheritance. In the process the algorithm imitates the evolution of a population by selecting only fit individuals for reproduction. Therefore, a genetic algorithm is an optimum search technique based on the concepts of natural selection and survival of the fittest. It works with a fixed-size population of possible solutions to a problem, called individuals, which evolve over time. A genetic algorithm utilizes three principal genetic operators: selection, crossover, and mutation.
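As an illustration only, the three operators can be sketched in a few lines of Python; the bit-string encoding, population size, rates, and the "one-max" fitness function below are arbitrary choices, not tied to any particular application in this chapter:

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=20, generations=50,
                      crossover_rate=0.9, mutation_rate=0.02):
    """Minimal GA over a fixed-size population of bit strings."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def select():
            # Selection: binary tournament, the fitter of two random individuals wins
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < crossover_rate:
                # Crossover: exchange genetic material about a single cut point
                cut = random.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # Mutation: occasionally flip a bit to maintain diversity
                for i in range(n_bits):
                    if random.random() < mutation_rate:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)  # keep the best individual seen so far
    return best

random.seed(3)
solution = genetic_algorithm(sum)  # maximize the number of 1 bits ("one-max")
```

After 50 generations of selective breeding, the fittest individual found is returned; survival of the fittest drives the population toward the all-ones string.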
Fuzzy logic is used mainly in control engineering. It is based on fuzzy logic reasoning, which employs linguistic rules in the form of if-then statements. Fuzzy logic and fuzzy control feature a relative simplification of a control methodology description. This allows the application of a “human language” to describe the problems and their fuzzy solutions. In many control applications, the model of the system is unknown or the input parameters are highly variable and unstable. In such cases, fuzzy controllers can be applied. These are more robust and cheaper than conventional PID controllers. It is also easier to understand and modify fuzzy controller rules, which not only use a human operator’s strategy but are expressed in natural linguistic terms.
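To make the if-then idea concrete, the following is a minimal, hypothetical fuzzy rule base for a heater, written in Python. The triangular membership functions, the two rules, and the Sugeno-style weighted-average defuzzification are all illustrative choices:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_heater_power(error):
    """Map a temperature error (degC) to heater power (%) using two if-then rules.

    Rule 1: IF error is small THEN power is low.
    Rule 2: IF error is large THEN power is high.
    """
    mu_small = tri(error, -1, 0, 5)   # degree to which "error is small"
    mu_large = tri(error, 2, 10, 18)  # degree to which "error is large"
    # Defuzzification: weighted average of the rule outputs (low = 10%, high = 90%)
    rules = [(mu_small, 10.0), (mu_large, 90.0)]
    num = sum(mu * out for mu, out in rules)
    den = sum(mu for mu, _ in rules)
    return num / den if den > 0 else 0.0
```

For an error of 3 degC both rules fire partially, and the controller blends the two linguistic conclusions into an intermediate power level.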
Hybrid systems combine more than one of these technologies, either as part of an integrated method of problem solution or to perform a particular task, followed by a second technique, which performs some other task. For example, neuro-fuzzy controllers use neural networks and fuzzy logic for the same task, i.e., to control a process; whereas in another hybrid system, a neural network may be used to derive some parameters and a genetic algorithm might be used subsequently to find an optimum solution to a problem.
For the estimation of the flow of energy and the performance of solar energy systems, analytic computer codes are often used. The algorithms employed are usually complicated, involving the solution of complex differential equations. These programs usually require a great deal of computer power and need a considerable amount of time to give accurate predictions. Instead of complex rules and mathematical routines, artificial intelligence systems are able to learn the key information patterns within a multi-dimensional information domain. Data from solar energy systems, being inherently noisy, are good candidate problems to be handled with artificial intelligence techniques.
The major objective of this section is to illustrate how artificial intelligence techniques might play an important role in the modeling and prediction of the performance and control of solar processes. The aim of this material is to enable the reader to understand how artificial intelligence systems can be set up. Various examples of solar energy systems are given as references so that interested readers can find more details. The results presented in these examples are testimony to the potential of artificial intelligence as a design tool in many areas of solar engineering.
11.6.1 Artificial neural networks
The concept of ANN analysis was conceived nearly 50 years ago, but only in the last 20 years has applications software been developed to handle practical problems. The purpose of this section is to present a brief overview of how neural networks operate and to describe the basic features of some of the most widely used neural network architectures. A review of applications of ANNs in solar energy systems is also included.
ANNs are well suited to some tasks but poorly suited to others. Specifically, they are good for tasks involving incomplete data sets, fuzzy or incomplete information, and highly complex and ill-defined problems, where humans usually decide on an intuitive basis. They can learn from examples and are able to deal with non-linear problems. Furthermore, they exhibit robustness and fault tolerance. The tasks that ANNs cannot handle effectively are those requiring high accuracy and precision, as in logic and arithmetic. ANNs have been applied successfully in a number of application areas. Some of the most important ones are (Kalogirou, 2003b):
• Function approximation. The mapping of a multiple input to a single output is established. Unlike most statistical techniques, this can be done with adaptive model-free estimation of parameters.
• Pattern association and pattern recognition. This is a problem of pattern classification. ANNs can be effectively used to solve difficult problems in this field, for instance, in sound, image, or video recognition. This task can be performed even without an a priori definition of the pattern. In such cases, the network learns to identify totally new patterns.
• Associative memories. This is the problem of recalling a pattern when given only a subset clue. In such applications, the network structures used are usually complicated, composed of many interacting dynamical neurons.
• Generation of new meaningful patterns. This general field of application is relatively new. Some claims are made that suitable neuronal structures can exhibit rudimentary elements of creativity.
ANNs have been applied successfully in various fields of mathematics, engineering, medicine, economics, meteorology, psychology, neurology, and many others. Some of the most important applications are in pattern, sound, and speech recognition; the analysis of electromyographs and other medical signatures; the identification of military targets; and the identification of explosives in passenger suitcases. They have also been used in forecasting weather and market trends, predicting mineral exploration sites, predicting electrical and thermal loads, adaptive and robotic control, and many other areas. Neural networks are also used for process control because they can build predictive models of the process from multi-dimensional data routinely collected from sensors.
Neural networks obviate the need to use complex, mathematically explicit formulas, computer models, and impractical and costly physical models. Some of the characteristics that support the success of ANNs and distinguish them from the conventional computational techniques are (Nannariello and Fricke, 2001):
• The direct manner in which ANNs acquire information and knowledge about a given problem domain (learning interesting and possibly non-linear relationships) through the “training” phase.
• The ability to work with numerical or analog data that would be difficult to deal with by other means because of the form of the data or because there are many variables.
• Their “black-box” approach, which requires no sophisticated mathematical knowledge from the user.
• The compact form in which the acquired information and knowledge is stored within the trained network and the ease with which it can be accessed and used.
• The ability of solutions provided to be robust, even in the presence of “noise” in the input data.
• The high degree of accuracy reported when ANNs are used to generalize over a set of previously unseen data (not used in the “training” process) from the problem domain.
While neural networks can be used to solve complex problems, they do suffer from a number of shortcomings. The most important of them are:
• The need for data used to train neural networks to contain information that, ideally, is spread evenly throughout the entire range of the system.
• The limited theory to assist in the design of neural networks.
• The lack of guarantee of finding an acceptable solution to a problem.
• The limited opportunities to rationalize the solutions provided.
The following sections briefly explain how the artificial neuron is visualized from a biological one and the steps required to set up a neural network. Additionally, the characteristics of some of the most used neural network architectures are described.
Biological and artificial neurons
A biological neuron is shown in Figure 11.16. In the brain, coded information flows (using electrochemical media, the so-called neurotransmitters) from the synapses toward the axon. The axon of each neuron transmits information to a number of other neurons. The neuron receives information at the synapses from a large number of other neurons. It is estimated that each neuron may receive stimuli from as many as 10,000 other neurons. Groups of neurons are organized into subsystems, and the integration of these subsystems forms the brain. It is estimated that the human brain has around 100 billion interconnected neurons.

FIGURE 11.16 A schematic of a biological neuron.
Figure 11.17 shows a highly simplified model of an artificial neuron, which may be used to simulate some important aspects of the real biological neuron. An ANN is a group of interconnected artificial neurons, interacting with one another in a concerted manner. In such a system, excitation is applied to the input of the network. Following some suitable operation, it results in a desired output. At the synapses, there is an accumulation of some potential, which in the case of the artificial neurons, is modeled as a connection weight. These weights are continuously modified, based on suitable learning rules.

FIGURE 11.17 A simplified model of an artificial neuron.
Artificial neural network principles
According to Haykin (1994), a neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the human brain in two respects:
• Knowledge is acquired by the network through a learning process.
• Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.
ANN models may be used as an alternative method in engineering analysis and predictions. ANNs mimic somewhat the learning process of a human brain. They operate like a “black-box” model, requiring no detailed information about the system. Instead, they learn the relationship between the input parameters and the controlled and uncontrolled variables by studying previously recorded data, similar to the way a non-linear regression might perform. Another advantage of using ANNs is their ability to handle large, complex systems with many interrelated parameters. They seem to simply ignore excess input parameters that are of minimal significance and concentrate instead on the more important inputs.
A schematic diagram of a typical multi-layer, feed-forward neural network architecture is shown in Figure 11.18. The network usually consists of an input layer, some hidden layers, and an output layer. In its simple form, each single neuron is connected to other neurons of a previous layer through adaptable synaptic weights. Knowledge is usually stored as a set of connection weights (presumably corresponding to synapse efficacy in biological neural systems). Training is the process of modifying the connection weights in some orderly fashion, using a suitable learning method. The network uses a learning mode, in which an input is presented to the network along with the desired output and the weights are adjusted so that the network attempts to produce the desired output. The weights after training contain meaningful information, whereas before training they are random and have no meaning.

FIGURE 11.18 Schematic diagram of a multi-layer feed-forward neural network.
Figure 11.19 illustrates how information is processed through a single node. The node receives the weighted activation of other nodes through its incoming connections. First, these are added up (summation). The result is then passed through an activation function; the outcome is the activation of the node. For each of the outgoing connections, this activation value is multiplied by the specific weight and transferred to the next node.

FIGURE 11.19 Information processing in a neural network unit.
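The summation-then-activation step just described amounts to a few lines of code; the logistic (sigmoid) activation used here is one common choice:

```python
import math

def neuron_output(inputs, weights, bias):
    """One artificial neuron: weighted summation followed by an activation function."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias  # summation of weighted inputs
    return 1.0 / (1.0 + math.exp(-s))                       # logistic activation

# The resulting activation is what gets multiplied by the outgoing connection weights
out = neuron_output([1.0, 0.5], [0.4, -0.2], 0.1)
```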
A training set is a group of matched input and output patterns used for training the network, usually by suitable adaptation of the synaptic weights. The outputs are the dependent variables that the network produces for the corresponding input. It is important that all the information the network needs to learn is supplied to the network as a data set. When each pattern is read, the network uses the input data to produce an output, which is then compared to the training pattern, i.e., the correct or desired output. If there is a difference, the connection weights (usually but not always) are altered in such a direction that the error is decreased. After the network has run through all the input patterns, if the error is still greater than the maximum desired tolerance, the ANN runs again through all the input patterns repeatedly until all the errors are within the required tolerance. When the training reaches a satisfactory level, the network holds the weights constant and the trained network can be used to make decisions, identify patterns, or define associations in new input data sets not used to train it.
By learning, we mean that the system adapts (usually by changing suitable controllable parameters) in a specified manner so that some parts of the system suggest a meaningful behavior, projected as output. The controllable parameters have different names, such as synaptic weights, synaptic efficacies, free parameters, and others.
The classical view of learning is well interpreted and documented in approximation theories. In these, learning may be interpreted as finding a suitable hypersurface that fits known input–output data points in such a manner that the mapping is acceptably accurate. Such a mapping is usually accomplished by employing simple non-linear functions that are used to compose the required function (Poggio and Girosi, 1990).
A more general approach to learning is adopted by Haykin (1994), in which learning is a process by which the free parameters of a neural network are adapted through a continuing process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.
Generally, learning is achieved through any change in any characteristic of a network so that meaningful results are achieved, meaning that a desired objective is met with a satisfactory degree of success. Thus, learning could be achieved through synaptic weight modification, network structure modifications, appropriate choice of activation functions and other ways.
The objective is usually quantified by a suitable criterion or cost function. It is usually a process of minimizing an error function or maximizing a benefit function. In this respect, learning resembles optimization. That is why a genetic algorithm, which is an optimum search technique (see Section 11.6.2), can also be employed to train ANNs.
Several algorithms are commonly used to achieve the minimum error in the shortest time. There are also many alternative forms of neural networking systems and, indeed, many different ways in which they may be applied to a given problem. The suitability of an appropriate paradigm and strategy for an application depends very much on the type of problem to be solved.
The most popular learning algorithms are back-propagation (BP) and its variants (Barr and Feigenbaum, 1981; Werbos, 1974). The back-propagation algorithm is one of the most powerful learning algorithms in neural networks. Back-propagation training is a gradient descent algorithm. It tries to improve the performance of the neural network by reducing the total error by changing the weights along its gradient. The error is expressed by the root mean square value (RMS), which can be calculated by:
E = \sqrt{\dfrac{1}{p} \sum_{p} \sum_{i} \left( t_{pi} - o_{pi} \right)^{2}}  (11.125)
where E is the RMS error, t is the target (desired) output, and o is the actual network output, over all patterns, p. An error of zero would indicate that all the output patterns computed by the ANN perfectly match the expected values and the network is well trained. In brief, back-propagation training is performed by initially assigning random values to the weight terms (wij) in all nodes. Each time a training pattern is presented to the ANN, the activation for each node, αpi, is computed. After the output of the layer is computed, the error term, δpi, for each node is computed backward through the network. This error term is the product of the error function, E, and the derivative of the activation function and, hence, is a measure of the change in the network output produced by an incremental change in the node weight values. For the output layer nodes and the case of the logistic-sigmoid activation, the error term is computed as:
\delta_{pi} = \left( t_{pi} - o_{pi} \right) o_{pi} \left( 1 - o_{pi} \right)  (11.126)
For a node in a hidden layer,
\delta_{pi} = \alpha_{pi} \left( 1 - \alpha_{pi} \right) \sum_{k} \delta_{pk} w_{kj}  (11.127)
In this expression, the k subscript indicates a summation over all nodes in the downstream layer (the layer in the direction of the output layer). The j subscript indicates the weight position in each node. Finally, the δ and α terms for each node are used to compute an incremental change to each weight term via:
\Delta w_{ij}(\mathrm{new}) = \varepsilon \, \delta_{pi} \, \alpha_{pi} + m \, \Delta w_{ij}(\mathrm{old})  (11.128)
The term ε, referred to as the learning rate, determines the size of the weight adjustments during each training iteration. The term m is called the momentum factor. It is applied to the weight change used in the previous training iteration, Δwij(old). Both of these constant terms are specified at the start of the training cycle and determine the speed and stability of the network. The training of all patterns of a training data set is called an epoch.
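The training cycle described above (forward pass, backward error terms, weight updates with learning rate and momentum) can be sketched for a single-hidden-layer, single-output network. The topology, parameter values, and the OR truth table used as training data are illustrative choices:

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(patterns, n_hidden=3, epochs=2000, eps=0.5, m=0.5, seed=1):
    """Back-propagation with momentum for a 1-hidden-layer, single-output network."""
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # Each bias is treated as an extra weight acting on a constant input of 1.0
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_o = [0.0] * (n_hidden + 1)
    for _ in range(epochs):
        for x, t in patterns:
            # Forward pass: activations of the hidden nodes, then the output node
            xi = list(x) + [1.0]
            a_h = [logistic(sum(w * v for w, v in zip(ws, xi))) for ws in w_h]
            ah = a_h + [1.0]
            o = logistic(sum(w * v for w, v in zip(w_o, ah)))
            # Backward pass: error term of the output node, then of the hidden nodes
            d_o = (t - o) * o * (1 - o)
            d_h = [a * (1 - a) * d_o * w_o[i] for i, a in enumerate(a_h)]
            # Weight change: learning rate x delta x activation, plus momentum term
            for i in range(n_hidden + 1):
                dw_o[i] = eps * d_o * ah[i] + m * dw_o[i]
                w_o[i] += dw_o[i]
            for i in range(n_hidden):
                for j in range(n_in + 1):
                    dw_h[i][j] = eps * d_h[i] * xi[j] + m * dw_h[i][j]
                    w_h[i][j] += dw_h[i][j]
    def predict(x):
        xi = list(x) + [1.0]
        ah = [logistic(sum(w * v for w, v in zip(ws, xi))) for ws in w_h] + [1.0]
        return logistic(sum(w * v for w, v in zip(w_o, ah)))
    return predict

# One epoch = one pass over all training patterns; here, the OR truth table
or_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = train_bp(or_table)
```

After training, the network reproduces the target outputs for the patterns it was trained on; a real application would of course use recorded system data rather than a truth table.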
Network parameter selection
While most scholars are concerned with techniques for defining an ANN architecture, practitioners want to apply an ANN architecture to a model and obtain quick results. The term neural network architecture refers to the arrangement of neurons into layers and the connection patterns between layers, the activation functions, and the learning methods. The neural network model and architecture determine how a network transforms its input into an output. This transformation is, in fact, a computation. Success often depends on a clear understanding of the problem, regardless of the network architecture. However, to determine which neural network architecture provides the best prediction, it is necessary to build a good model. It is essential to be able to identify the most important variables in a process and to generate best-fit models. How to identify and define the best model remains controversial.
Despite the differences between traditional approaches and neural networks, both methods require preparing the model. The classical approach is based on the precise definition of the problem domain as well as the identification of a mathematical function or functions to describe it. It is, however, very difficult to identify an accurate mathematical function when the system is non-linear and parameters vary with time due to several factors. The control program often lacks the capability to adapt to the parameter changes. Neural networks are used to learn the behavior of the system and subsequently to simulate and predict its behavior. In defining the neural network model, first the process and the process control constraints have to be understood and identified. Then, the model is defined and validated.
When using a neural network for prediction, the following steps are crucial. First, a neural network needs to be built to model the behavior of the process, and the values of the output are predicted based on the model. Second, based on the neural network model obtained in the first phase, the output of the model is simulated using different scenarios. Third, the control variables are modified to control and optimize the output.
When building the neural network model, the process has to be identified with respect to the input and output variables that characterize it. The inputs include measurements of the physical dimensions, measurements of the variables specific to the environment or equipment, and controlled variables modified by the operator. Variables that have no effect on the variation of the measured output are discarded; they are identified from the contribution factors of the various input parameters. These factors indicate the contribution of each input parameter to the learning of the neural network and are usually estimated by the network, depending on the software employed.
The selection of training data plays a vital role in the performance and convergence of the neural network model. An analysis of historical data for identification of variables that are important to the process is important. Plotting graphs to check whether the various variables reflect what is known about the process from operating experience and for discovery of errors in data is very helpful.
All input and output values are usually scaled individually such that the overall variance in the data set is minimized. Therefore, the input and output values are normalized. This is necessary because it leads to faster learning. The scaling used is either in the range −1 to 1 or in the range 0 to 1, depending on the type of data and the activation function used.
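A simple linear (min-max) scaling of this kind, together with its inverse for converting network outputs back to physical units, might be implemented as follows (the function names are ours):

```python
def scale(values, lo=0.0, hi=1.0):
    """Linearly map a column of raw data into the range [lo, hi] before training."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin or 1.0  # guard against a constant column
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

def unscale(scaled, vmin, vmax, lo=0.0, hi=1.0):
    """Invert the mapping to recover predictions in physical units."""
    return [vmin + (vmax - vmin) * (s - lo) / (hi - lo) for s in scaled]
```

The choice of `lo` and `hi` (0 to 1 or −1 to 1) follows the activation function used, as noted above.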
The basic procedure for successfully handling a problem with ANNs is to select the appropriate architecture together with a suitable learning rate, momentum, number of neurons in each hidden layer, and activation function. Finding the best architecture and the other network parameters is laborious and time-consuming, but as experience is gathered, some parameters can be predicted easily, tremendously shortening the time required.
The first step is to collect the required data and prepare them in a spreadsheet format, with the columns representing the input and output parameters. If a large number of sequences or patterns is available in the input data file, a smaller training file containing samples as representative as possible of the whole problem domain may be created for selecting the required parameters, thereby avoiding a long training time; the complete data set is then used for the final training.
Three types of data files are required: a training data file, a test data file, and a validation data file. The training and validation files should contain representative samples of all the cases the network is required to handle, whereas the test file may contain about 10% of the cases contained in the training file.
During training, the network is tested against the test file to determine its accuracy, and training should be stopped when the average error remains unchanged for a number of epochs. This is done to avoid over-training, in which case the network learns the training patterns perfectly but is unable to make predictions when an unknown data set is presented to it.
In back-propagation networks, the number of hidden neurons determines how well a problem can be learned. If too many are used, the network will tend to memorize the problem and not generalize well later. If too few are used, the network will generalize well but may not have enough “power” to learn the patterns well. Getting the right number of hidden neurons is a matter of trial and error, since there is no science to it. In general, the number of hidden neurons (N) may be estimated by applying the following empirical formula (Ward Systems Group, Inc., 1996):
N = \dfrac{I + O}{2} + \sqrt{P_{i}}  (11.129)
I = number of input parameters;
O = number of output parameters; and
Pi = number of training patterns available.
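Assuming Eq. (11.129) takes the commonly quoted form N = (I + O)/2 + √Pi, the estimate is a one-line helper:

```python
import math

def hidden_neurons(n_inputs, n_outputs, n_patterns):
    """Empirical starting estimate for the number of hidden neurons, Eq. (11.129)."""
    return round((n_inputs + n_outputs) / 2 + math.sqrt(n_patterns))

# For example, 9 inputs, 1 output, and 100 training patterns suggest 15 hidden neurons
n = hidden_neurons(9, 1, 100)
```

As the text notes, this is only a starting point; the final number is settled by trial and error.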
The most important parameter to select in a neural network is the type of architecture. A number of architectures can be used in solar engineering problems. Three of the most important, back-propagation (BP), general regression neural networks (GRNNs), and the group method of data handling (GMDH), are described briefly in the following sections.
Back-propagation architecture
Architectures in the back-propagation category include standard networks, recurrent, feed-forward with multiple hidden slabs, and jump connection networks. Back-propagation networks are known for their ability to generalize well on a wide variety of problems. They are a supervised type of network, i.e., trained with both inputs and outputs. Back-propagation networks are used in a large number of working applications, since they tend to generalize well.
The first category of neural network architectures is the one where each layer is connected to the immediately previous layer (see Figure 11.18). Generally, three layers (input, hidden, and output) are sufficient for the majority of problems, and a three-layer back-propagation network with standard connections is suitable for almost all of them. One, two, or three hidden layers can be used, however, depending on the problem characteristics. The use of more than five layers in total generally offers no benefit and should be avoided.
The next category of architecture is the recurrent network with dampened feedback from either the input, hidden, or output layer. Its structure differs from the standard one in that an extra slab in the input layer is connected to the hidden layer, just like the other input slabs; this extra slab holds the contents of one of the layers (input, hidden, or output) as it existed when the previous pattern was trained. In this way, the network sees the previous knowledge it had about earlier inputs. The extra slab is sometimes called the network's long-term memory, since it remembers features detected in the raw data of previous patterns. Recurrent neural networks are particularly suitable for the prediction of sequences, so they are excellent for time-series data. A back-propagation network with standard connections responds to a given input pattern with exactly the same output pattern every time that input is presented, whereas a recurrent network may respond to the same input pattern differently at different times, depending on the patterns that had been presented as inputs just previously. Thus, the sequence of the patterns is as important as the input pattern itself. Recurrent networks are trained in the same way as standard back-propagation networks, except that the patterns must always be presented in the same order.
The third category is the feed-forward network with multiple hidden slabs. These network architectures are very powerful in detecting different features of the input vectors when different activation functions are given to the hidden slabs. This architecture has been used in a number of engineering problems for modeling and prediction with very good results (see the later section, “ANN Applications in Solar Energy Systems”). This is a feed-forward architecture with three hidden slabs, as shown in Figure 11.20. The information processing at each node is performed by combining all input numerical information from upstream nodes in a weighted average of the form:
\alpha_{pi} = f\left( \beta_{i} \right), \qquad \beta_{i} = \sum_{j} w_{ij} x_{j} + b_{1}  (11.130)
α(pi) = activation for each node.
b1 = a constant term referred to as the bias.
The final nodal output is computed via the activation function. This architecture has different activation functions in each slab. Referring to Figure 11.20, the input slab activation function is linear, i.e., α(pi) = βi (where βi is the weighted average obtained by combining all input numerical information from upstream nodes), while the activations used in the other slabs are as follows.

FIGURE 11.20 Feed-forward architecture with multiple hidden slabs.
Gaussian for slab 2,
f(\beta_{i}) = e^{-\beta_{i}^{2}}  (11.131)
Tanh for slab 3,
f(\beta_{i}) = \tanh(\beta_{i})  (11.132)
Gaussian complement for slab 4,
f(\beta_{i}) = 1 - e^{-\beta_{i}^{2}}  (11.133)
Logistic for the output slab,
f(\beta_{i}) = \dfrac{1}{1 + e^{-\beta_{i}}}  (11.134)
Different activation functions are applied to the hidden layer slabs to detect different features in a pattern processed through the network. The number of hidden neurons in the hidden layers may also be calculated with Eq. (11.129), although an increased number of hidden neurons may be used to provide more “degrees of freedom” and allow the network to store more complex patterns. This is usually done when the input data are highly non-linear. In this architecture, it is recommended to use a Gaussian function on one hidden slab to detect features in the middle range of the data and the Gaussian complement on another hidden slab to detect features at the upper and lower extremes of the data. Combining the two feature sets in the output layer may lead to a better prediction.
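The four activation functions named above translate directly into code; note that the Gaussian and its complement sum to one, which is why they respond to complementary regions of the data:

```python
import math

def gaussian(b):             # slab 2: responds most strongly in the mid-range
    return math.exp(-b * b)

def tanh_act(b):             # slab 3
    return math.tanh(b)

def gaussian_complement(b):  # slab 4: responds at the upper and lower extremes
    return 1.0 - math.exp(-b * b)

def logistic(b):             # output slab
    return 1.0 / (1.0 + math.exp(-b))
```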
General regression neural network architecture
Another type of architecture is general regression neural networks (GRNNs), which are known for their ability to train quickly on sparse data sets. In numerous tests, it was found that a GRNN responds much better than back-propagation to many types of problems, although this is not a rule. It is especially useful for continuous function approximation. A GRNN can have multi-dimensional input, and it will fit multi-dimensional surfaces through data. GRNNs work by measuring how far a given sample pattern is from patterns in the training set in N-dimensional space, where N is the number of inputs in the problem. The Euclidean distance is usually adopted.
A GRNN is a four-layer feed-forward neural network based on the non-linear regression theory, consisting of the input layer, the pattern layer, the summation layer, and the output layer (see Figure 11.21). There are no training parameters, such as learning rate and momentum, as in back-propagation networks, but a smoothing factor is applied after the network is trained. The smoothing factor determines how tightly the network matches its predictions to the data in the training patterns. Although the neurons in the first three layers are fully connected, each output neuron is connected only to some processing units in the summation layer. The summation layer has two types of processing units: summation units and a single division unit. The number of summation units is always the same as the number of the GRNN output units. The division unit only sums the weighted activations of the pattern units of the hidden layer, without using any activation function.

FIGURE 11.21 General regression neural network architecture.
Each GRNN output unit is connected only to its corresponding summation unit and the division unit (there are no weights in these connections). The function of the output units consists of a simple division of the signal coming from the summation unit by the signal coming from the division unit. The summation and output layers together basically perform a normalization of the output vector, making a GRNN much less sensitive to the proper choice of the number of pattern units. More details on GRNNs can be found in Tsoukalas and Uhrig (1997) and Ripley (1996).
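The pattern–summation–division structure just described can be sketched in a few lines of Python. This is an illustrative sketch only (the function name and the single smoothing factor `sigma` are assumptions, not from the text): each pattern unit fires according to its Euclidean distance from the input, the summation unit accumulates target-weighted activations, and the division unit accumulates the plain activations.

```python
import math

def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Sketch of a GRNN forward pass: one pattern unit per training pattern."""
    num = 0.0  # summation unit: activations weighted by the stored targets
    den = 0.0  # division unit: plain sum of the activations
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))  # squared Euclidean distance
        act = math.exp(-d2 / (2.0 * sigma ** 2))       # pattern-unit activation
        num += yi * act
        den += act
    return num / den  # output unit: simple division of the two signals
```

Because every training pattern becomes a hidden neuron, “training” amounts to storing the patterns; only the smoothing factor needs tuning.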
For GRNN networks, the number of neurons in the hidden pattern layer is usually equal to the number of patterns in the training set because the hidden layer consists of one neuron for each pattern in the training set. This number can be made larger if one wants to add more patterns, but it cannot be made smaller.
The training of the GRNN is quite different from the training used in other neural networks. It is completed after presentation of each input–output vector pair from the training data set to the GRNN input layer only once.
The GRNN may be trained using a genetic algorithm (see Section 11.6.2). The genetic algorithm is used to find the appropriate individual smoothing factors for each input as well as an overall smoothing factor. Genetic algorithms use a “fitness” measure to determine which individuals in the population survive and reproduce. Therefore, survival of the fittest causes good solutions to progress. A genetic algorithm works by selective breeding of a population of “individuals”, each of which could be a potential solution to the problem. In this case, a potential solution is a set of smoothing factors, and the genetic algorithm seeks to breed an individual that minimizes the mean squared error of the test set, which can be calculated by:
$$\text{MSE} = \frac{1}{n}\sum_{p=1}^{n}\left(t_p - o_p\right)^2 \qquad (11.135)$$
where
t = network output;
o = desired output vectors over all patterns (p) of the test set; and
n = number of patterns in the test set.
The larger the breeding pool size, the greater is its potential to produce a better individual. However, the networks produced by every individual must be applied to the test set on every reproductive cycle, so larger breeding pools take more time. After testing all the individuals in the pool, a new “generation” of individuals is produced for testing. Unlike the back-propagation algorithm, which propagates the error through the network many times while seeking a lower mean squared error between the network’s output and the actual output or answer, GRNN training patterns are presented to the network only once.
The input smoothing factor is an adjustment used to modify the overall smoothing to provide a new value for each input. At the end of training, the individual smoothing factors may be used as a sensitivity analysis tool; the larger the factor for a given input, the more important that input is to the model, at least as far as the test set is concerned. Inputs with low smoothing factors are candidates for removal for a later trial.
Individual smoothing factors are unique to each network. The numbers are relative to each other within a given network, and they cannot be used to compare inputs from different networks.
If the number of input, output, or hidden neurons is changed, however, the network must be retrained. This may occur when more training patterns are added, because GRNN networks require one hidden neuron for each training pattern.
Group method of data handling neural network architecture
One type of neural network that is very suitable for modeling is the group method of data handling (GMDH) neural network. The GMDH technique was invented by A.G. Ivakhenko, of the Institute of Cybernetics, Ukrainian Academy of Sciences (Ivakhenko, 1968, 1971), and was later enhanced by others (Farlow, 1984). This technique is also known as polynomial networks. Ivakhenko developed the GMDH technique to build more accurate predictive models of fish populations in rivers and oceans, and it has since worked well for modeling fisheries and many other applications (Hecht-Nielsen, 1991). The GMDH is a feature-based mapping network.
The GMDH technique works by building successive layers, with links that are simple polynomial terms. These polynomial terms are created by using linear and non-linear regression. The initial layer is simply the input layer. The first layer created is made by computing regressions of the input variables, from which the best ones are chosen. The second layer is created by computing regressions of the values in the first layer, along with the input variables. Only the best, called survivors, are chosen by the algorithm. This process continues until the network stops getting better, according to a prespecified selection criterion. More details on the GMDH technique can be found in the book by Hecht-Nielsen (1991).
The resulting network can be represented as a complex polynomial description of the model in the form of a mathematical equation. The complexity of the resulting polynomial depends on the variability of the training data. In some respects, GMDH is very much like using regression analysis but far more powerful. The GMDH network can build very complex models while avoiding overfitting problems. Additionally, an advantage of the GMDH technique is that it recognizes the best variables as it trains and, for problems with many variables, the ones with low contribution can be discarded.
The central idea behind the GMDH technique is to build a function (called a polynomial model) whose predicted output values are as close as possible to the actual ones. For many end users, it may be more convenient to have a model that makes predictions using widely understood polynomial formulas than a normal neural network, which operates as a “black-box” model. The most common approach to building such models is to use regression analysis. The first step is to decide the type of polynomial that regression should find. For example, a good idea is to choose, as terms of the polynomial, powers of the input variables along with their covariants and trivariants, such as:
$$x_i,\quad x_i^{2},\quad x_i^{3},\quad x_i x_j,\quad x_i x_j x_k,\quad \ldots \qquad (11.136)$$
The next step is to construct a linear combination of all the polynomial terms with variable coefficients. The algorithm determines the values of these coefficients by minimizing the squared sum of differences between sample outputs and model predictions, over all samples.
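The coefficient-fitting step described above is ordinary least squares. The sketch below (the function name is illustrative, not from the text) solves the normal equations with Gaussian elimination so that no external libraries are needed; each row of `terms` holds the evaluated polynomial terms for one sample.

```python
def fit_coeffs(terms, ys):
    """Least-squares coefficients c for y ~ sum(c_k * term_k), obtained by
    solving the normal equations (T'T)c = T'y via Gaussian elimination."""
    k = len(terms[0])
    # Build A = T'T and b = T'y from the sample rows
    A = [[sum(r[i] * r[j] for r in terms) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(terms, ys)) for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * k  # back substitution
    for r in range(k - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs
```

For example, fitting y = 2 + 3x from the samples (0, 2), (1, 5), (2, 8) with term rows [1, x] recovers the coefficients 2 and 3.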
The main problem when utilizing regression is how to choose the set of polynomial terms correctly. In addition, decisions need to be made on the degree of the polynomial; for example, how complex the terms should be, whether the model should evaluate terms such as x^10, or whether consideration should be limited to terms such as x^4 and lower. The GMDH technique handles these questions better than plain regression, by answering them automatically as it evaluates the possible combinations of terms.
The decision about the quality of each model must be made using some numeric criterion. The simplest criterion (a form of which is also used in linear regression analysis) is the sum, over all samples, of the squared differences between the actual output (ya) and the model’s prediction (yp) divided by the sum of the squared actual output. This is called the normalized mean squared error (NMSE). In equation form,
$$\text{NMSE} = \frac{\sum \left(y_a - y_p\right)^2}{\sum y_a^{2}} \qquad (11.137)$$
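The NMSE criterion translates directly into code; a minimal sketch (the function name is an assumption):

```python
def nmse(actual, predicted):
    """Normalized mean squared error over all samples, per Eq. (11.137):
    sum of squared errors divided by the sum of squared actual outputs."""
    num = sum((ya - yp) ** 2 for ya, yp in zip(actual, predicted))
    den = sum(ya ** 2 for ya in actual)
    return num / den
```

A perfect model gives an NMSE of 0, while predicting zero everywhere gives an NMSE of 1.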
However, if only the NMSE is used on real data, the NMSE value gets smaller and smaller as extra terms are added to the model, because a more complex model can fit the training data more exactly. This is always the case when NMSE is used alone, since it judges the quality of the model on the same information that was already used to build it. The result is an “over-complex” model, or model overfit, meaning the model does not generalize well because it pays too much attention to noise in the training data. This is similar to the over-training of other neural networks.
To avoid this danger, a more powerful criterion is needed, based on information other than that which was used to build the evaluated model. There are several ways to define such criteria. For example, the squared sum of differences between the known output and model prediction over some other set of experimental data (a test set) may be used. Another way to avoid overfitting is to introduce a penalty for model complexity. This is called the predicted squared error criterion.
Theoretical considerations show that increasing model complexity should be stopped when the selection criterion reaches a minimum value. This minimum value is a measure of model reliability.
The method of searching for the best model based on testing all possible models is usually called the combinatorial GMDH algorithm. To reduce computation time, the number of polynomial terms used to build the models to be evaluated should be reduced. To do so, a one-stage procedure of model selection should be changed to a multi-layer procedure. In this, the first two input variables are initially taken and combined into a simple set of polynomial terms. For example, if the first two input variables are x1 and x2, the set of polynomial terms would be {c, x1, x2, x1 × x2}, where (c) represents the constant term. Subsequently, all possible models made from these terms are checked and the best is chosen; any one of the evaluated models is a candidate for survival.
Then, another pair of input variables is taken and the operation is repeated, resulting in another candidate for survival, with its own value of the evaluation criterion. By repeating the same procedure for each possible pair of n input variables, n(n − 1)/2 candidates for survival are generated, each with its own value of the evaluation criterion.
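The pairing step is easy to make concrete: for n inputs, the standard library’s `itertools.combinations` yields exactly the n(n − 1)/2 pairs, each contributing the simple term set {c, xi, xj, xi·xj}. The function name below is illustrative.

```python
from itertools import combinations

def candidate_term_sets(input_names):
    """One simple GMDH term set {c, xi, xj, xi*xj} per pair of inputs."""
    return {(a, b): ("c", a, b, a + "*" + b)
            for a, b in combinations(input_names, 2)}

pairs = candidate_term_sets(["x1", "x2", "x3"])  # n = 3 gives 3 candidate pairs
```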
Subsequently, these values are compared, and several candidates for survival that give the best approximation of the output variable are chosen. Usually a predefined number of the best candidates are selected for survival and are stored in the first layer of the network and preserved for the next layer. The candidates selected are called survivors.
The layer of survivors is used for inputs in building the next layer in the network. The original network inputs used in the first layer may also be chosen as inputs to the new layer. Therefore, the next layer is built with polynomials of this broadened set of inputs. It should be noted that, since some inputs are already polynomials, the next layer may contain very complex polynomials.
The layer building of the GMDH procedure continues as long as the evaluation criteria continue to diminish. Each time a new layer is built the GMDH algorithm checks whether the new evaluation criterion is lower than the previous one and, if this is so, continues training; otherwise, it stops training.
ANN Applications in solar energy systems
Artificial neural networks have been used by the author in the field of solar energy, for modeling the heat-up response of a solar steam generating plant (Kalogirou et al., 1998), the estimation of a parabolic trough collector intercept factor (Kalogirou et al., 1996), the estimation of a parabolic trough collector local concentration ratio (Kalogirou, 1996a), the design of a solar steam-generation system (Kalogirou, 1996b), the performance prediction of a thermosiphon solar water heater (Kalogirou et al., 1999a), modeling solar domestic water-heating systems (Kalogirou et al., 1999b), the long-term performance prediction of forced circulation solar domestic water-heating systems (Kalogirou, 2000), and the thermosiphon solar domestic water-heating system’s long-term performance prediction (Kalogirou and Panteliou, 2000). A review of these models, together with other applications in the field of renewable energy, is given in an article by Kalogirou (2001). In most of those models, the multiple hidden layer architecture shown in Figure 11.20 was used. The errors reported are well within acceptable limits, which clearly suggests that ANNs can be used for modeling and prediction in other fields of solar energy engineering. What is required is to have a set of data (preferably experimental) representing the past history of a system so that a suitable neural network can be trained to learn the dependence of expected output on the input parameters.
11.6.2 Genetic algorithms
The genetic algorithm (GA) is a model of machine learning that derives its behavior from a representation of the processes of evolution in nature. This is done by the creation, within a machine or computer, of a population of individuals represented by chromosomes. Essentially, these are a set of character strings that are analogous to the chromosomes in the DNA of human beings. The individuals in the population then go through a process of evolution.
It should be noted that evolution, as it occurs in nature or elsewhere, is not a purposive or directed process; i.e., no evidence supports the assertion that the goal of evolution is to produce humankind. Indeed, the processes of nature seem to come down to different individuals competing for resources in the environment. Some are better than others; those that are better are more likely to survive and propagate their genetic material.
In nature, the encoding for the genetic information is done in a way that admits asexual reproduction and typically results in offspring that are genetically identical to the parent. Sexual reproduction allows the creation of genetically radically different offspring that are still of the same general species.
In an oversimplified consideration, at the molecular level, what happens is that a pair of chromosomes bump into one another, exchange chunks of genetic information, and drift apart. This is the recombination operation, which in GAs is generally referred to as crossover because of the way that genetic material crosses over from one chromosome to another.
The crossover operation happens in an environment where the selection of who gets to mate is a function of the fitness of the individual, i.e., how good the individual is at competing in its environment. Some GAs use a simple function of the fitness measure to select individuals (probabilistically) to undergo genetic operations such as crossover or asexual reproduction (i.e., propagation of the genetic material unaltered). This is called fitness-proportionate selection. Other implementations use a model in which certain randomly selected individuals in a subgroup compete and the fittest is selected. This is called tournament selection. The two processes that contribute most to evolution are crossover and fitness-based selection/reproduction; mutation also plays a role in the process.
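The two selection schemes just mentioned can be sketched as follows; the function names and the tournament size k are illustrative assumptions.

```python
import random

def roulette_select(pop, fits, rng):
    """Fitness-proportionate (roulette-wheel) selection: the probability of
    being chosen is proportional to the individual's fitness."""
    r = rng.uniform(0.0, sum(fits))
    acc = 0.0
    for individual, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return individual
    return pop[-1]  # guard against floating-point round-off

def tournament_select(pop, fits, rng, k=2):
    """Tournament selection: the fittest of k randomly drawn individuals wins."""
    contenders = [rng.randrange(len(pop)) for _ in range(k)]
    return pop[max(contenders, key=lambda i: fits[i])]
```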
GAs are used in a number of application areas. An example of this would be multi-dimensional optimization problems, in which the character string of the chromosome can be used to encode the values for the different parameters being optimized.
Therefore, in practice, this genetic model of computation can be implemented by having arrays of bits or characters to represent the chromosomes. Simple bit manipulation operations allow the implementation of crossover, mutation, and other operations.
When the GA is executed, the following cycle is usually involved:
1. Evaluate the fitness of all of the individuals in the population.
2. Create a new population by performing operations such as crossover, fitness-proportionate reproduction, and mutation on the individuals whose fitness has just been measured.
3. Discard the old population and iterate using the new population.
One iteration of this loop is referred to as a generation. The structure of the standard GA is shown in Figure 11.22 (Zalzala and Fleming, 1997).

FIGURE 11.22 The structure of a standard genetic algorithm.
With reference to Figure 11.22, in each generation, individuals are selected for reproduction according to their performance with respect to the fitness function. In essence, selection gives a higher chance of survival to better individuals. Subsequently, genetic operations are applied to form new and possibly better offspring. The algorithm is terminated either after a certain number of generations or when the optimal solution has been found. More details on genetic algorithms can be found in Goldberg (1989), Davis (1991), and Michalewicz (1996).
The first generation (generation 0) of this process operates on a population of randomly generated individuals. From there on, the genetic operations, in concert with the fitness measure, operate to improve the population.
During each step in the reproduction process, the individuals in the current generation are evaluated by a fitness function value, which is a measure of how well the individual solves the problem. Then, each individual is reproduced in proportion to its fitness. The higher the fitness, the higher is its chance to participate in mating (crossover) and produce an offspring. A small number of newborn offspring undergo the action of the mutation operator. After many generations, only those individuals who have the best genetics (from the point of view of the fitness function) survive. The individuals that emerge from this “survival-of-the-fittest” process are the ones that represent the optimal solution to the problem specified by the fitness function and the constraints.
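The whole generational cycle can be put together in a short sketch. The test problem below (“one-max”, i.e., maximizing the number of 1 bits in the chromosome) and all parameter values are illustrative assumptions; the structure — evaluate, select, cross over, mutate, replace — follows the description above.

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=30, generations=60,
                      p_cross=0.8, p_mut=0.02, seed=1):
    """Minimal generational GA: tournament selection, one-point crossover,
    bit-flip mutation, full replacement of the old population."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]          # evaluate fitness
        def select():                               # tournament of size 2
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]
        new_pop = []                                # create a new population
        while len(new_pop) < pop_size:
            a, b = select()[:], select()[:]
            if rng.random() < p_cross:              # one-point crossover
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for k in range(n_bits):
                    if rng.random() < p_mut:        # bit-flip mutation
                        child[k] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]                    # discard the old population
    return max(pop, key=fitness)

best = genetic_algorithm(sum)  # fitness = number of 1 bits ("one-max")
```

After a few dozen generations the survivors are chromosomes that are all (or nearly all) ones, the optimum for this fitness function.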
Genetic algorithms are suitable for finding the optimum solution in problems where a fitness function is present. Genetic algorithms use a “fitness” measure to determine which individuals in the population survive and reproduce. Thus, survival of the fittest causes good solutions to progress. A genetic algorithm works by selective breeding of a population of “individuals”, each of which could be a potential solution to the problem. The genetic algorithm seeks to breed an individual that either maximizes, minimizes, or is focused on a particular solution to a problem.
The larger the breeding pool size, the greater the potential for producing a better individual. However, since the fitness value produced by every individual must be compared with the fitness values of all other individuals on every reproductive cycle, larger breeding pools take more time. After testing all the individuals in the pool, a new “generation” of individuals is produced for testing.
During the setting up of the genetic algorithm, the user has to specify the adjustable chromosomes, i.e., the parameters that would be modified during evolution to obtain the maximum value of the fitness function. Additionally, the user has to specify the ranges of these values, called constraints.
A genetic algorithm is not gradient-based and uses an implicitly parallel sampling of the solution space. The population approach and multiple sampling mean that it is less likely to become trapped in local minima than traditional direct approaches and can navigate a large solution space with a comparatively small number of samples. Although not guaranteed to find the globally optimum solution, GAs have been shown to be highly efficient at reaching a very near-optimum solution in a computationally efficient manner.
The genetic algorithm is usually stopped after best fitness remains unchanged for a number of generations or when the optimum solution is reached.
An example of using GAs in this book is given in Chapter 3, Example 3.2, where the two glass temperatures are varied to get the same Qt/Ac value from Eqs (3.15), (3.17), and (3.22). In this case, the values of Tg1 and Tg2 are the adjustable chromosomes and the fitness function is the sum of the absolute difference between each Qt/Ac value from the mean Qt/Ac value (obtained from the aforementioned three equations). In this problem, the fitness function should be 0, so all Qt/Ac values are equal, which is the objective. Other applications of GAs in solar energy are given in the next section.
GA Applications in solar energy systems
Genetic algorithms were used by the author in a number of optimization problems: the optimal design of flat-plate solar collectors (Kalogirou, 2003c), predicting the optimal sizing coefficient of PV supply systems (Mellit and Kalogirou, 2006a), and the optimum selection of the fenestration openings in buildings (Kalogirou, 2007). They have also been used to optimize solar energy systems, in combination with TRNSYS and ANNs (Kalogirou, 2004a). In this, the system is modeled using the TRNSYS computer program and the climatic conditions of Cyprus. An ANN was trained, using the results of a small number of TRNSYS simulations, to learn the correlation of collector area and storage tank size on the auxiliary energy required by the system, from which the life cycle savings can be estimated. Subsequently, a genetic algorithm was employed to estimate the optimum size of these two parameters, for maximizing the life-cycle savings; thus, the design time is reduced substantially. As an example, the optimization of an industrial process heat system employing flat-plate collectors is presented (Kalogirou, 2004a). The optimum solutions obtained from the present methodology give increased life cycle savings of 4.9 and 3.1% when subsidized and non-subsidized fuel prices are used, respectively, as compared to solutions obtained by the traditional trial and error method. The present method greatly reduces the time required by design engineers to find the optimum solution and, in many cases, reaches a solution that could not be easily obtained from simple modeling programs or by trial and error, which in most cases depends on the intuition of the engineer.
GENOPT and TRNOPT programs
When simulation models are used to simulate and design a system, it is usually not easy to determine the parameter values that lead to optimal system performance. This is sometimes due to time constraints, since it is time-consuming for a user to change the input values, run the simulation, interpret the new results, and guess how to change the input for the next trial. Sometimes time is not a problem, but due to the complexity of the system analyzed, the user is simply not able to understand the non-linear interactions of the various parameters. However, using genetic algorithms, it is possible to do automatic single- or multi-parameter optimization with search techniques that require little effort from the user. GenOpt is a generic optimization program developed for such system optimization. It was designed by the Lawrence Berkeley National Laboratory and is available free of charge (GenOpt, 2011). GenOpt is used for finding the values of user-selected design parameters that minimize a so-called objective function, such as annual energy use, peak electrical demand, or predicted percentage of dissatisfied people (PPD value), leading to the best operation of a given system. The objective function is calculated by an external simulation program, such as TRNSYS (Wetter, 2001). GenOpt can also identify unknown parameters in a data-fitting process. GenOpt allows coupling of any simulation program (e.g., TRNSYS) with text-based input–output (I/O) by simply modifying a configuration file, without requiring code modification. Further, it has an open interface for easily adding custom minimization algorithms to its library. This allows the use of GenOpt as an environment for the development of optimization algorithms (Wetter, 2004).
Another tool that can be used is TRNopt, which is an interface program that allows TRNSYS users to quickly and easily utilize the GenOpt optimization tool to optimize combinations of continuous and discrete variables. GenOpt actually controls the simulation and the user sets up the optimization beforehand, using the TRNopt pre-processor program.
11.6.3 Fuzzy logic
Fuzzy logic is a logical system that is an extension of multi-valued logic. Additionally, fuzzy logic is almost synonymous with the theory of fuzzy sets, a theory that relates to classes of objects without sharp boundaries, in which membership is a matter of degree. Fuzzy logic is all about the relative importance of precision, i.e., how important it is to be exactly right when a rough answer will work. Fuzzy inference systems have been successfully applied in fields such as automatic control, data classification, decision analysis, expert systems, and computer vision. Fuzzy logic is a convenient way to map an input space to an output space, for example, adjusting a valve to the right setting according to the hot-water temperature required, or adjusting the fuel flow in a boiler according to the steam outlet temperature required. From these two examples, it can be seen that fuzzy logic mainly has to do with the design of controllers.
Conventional control is based on the derivation of a mathematical model of the plant from which a mathematical model of a controller can be obtained. When a mathematical model cannot be created, there is no way to develop a controller through classical control. Other limitations of conventional control are (Reznik, 1997):
• Plant non-linearity. Non-linear models are computationally intensive and have complex stability problems.
• Plant uncertainty. Accurate models cannot be created due to uncertainty and lack of perfect knowledge.
• Multi-variables, multi-loops, and environmental constraints. Multi-variable and multi-loop systems have complex constraints and dependencies.
• Uncertainty in measurements due to noise.
• Temporal behavior. Plants, controllers, environments, and their constraints vary with time. Additionally, time delays are difficult to model.
The advantages of fuzzy control are (Reznik, 1997):
• Fuzzy controllers are more robust than PID controllers, as they can cover a much wider range of operating conditions and operate with noise and disturbances of different natures.
• Their development is cheaper than that of a model-based or other controller to do the same thing.
• They are customizable, since it is easier to understand and modify their rules, which are expressed in natural linguistic terms.
• It is easy to learn how these controllers operate and how to design and apply them in an application.
• They can model non-linear functions of arbitrary complexity.
• They can be built on top of the experience of experts.
• They can be blended with conventional control techniques.
Fuzzy control should not be used when conventional control theory yields a satisfactory result and an adequate and solvable mathematical model already exists or can easily be created.
Fuzzy logic was initially developed in 1965 in the United States by Professor Lotfi Zadeh (1973). In fact, Zadeh’s theory not only offered a theoretical basis for fuzzy control but also established a bridge connecting artificial intelligence to control engineering. Fuzzy logic has emerged as a tool for controlling industrial processes, as well as household and entertainment electronics, diagnosis systems, and other expert systems. Fuzzy logic is basically a multi-valued logic that allows intermediate values to be defined between conventional evaluations such as yes–no, true–false, black–white, large–small, etc. Notions such as “rather warm” or “pretty cold” can be formulated mathematically and processed in computers. Thus, an attempt is made to apply a more humanlike way of thinking to the programming of computers.
A fuzzy-controller design process contains the same steps as any other design process. One needs initially to choose the structure and parameters of a fuzzy controller, test a model or the controller itself, and change the structure and/or parameters based on the test results (Reznik, 1997). A basic requirement for implementing fuzzy control is the availability of a control expert who provides the necessary knowledge for the control problem (Nie and Linkens, 1995). More details on fuzzy control and practical applications can be found in the works by Zadeh (1973), Mamdani (1974, 1977), and Sugeno (1985).
The linguistic description of the dynamic characteristics of a controlled process can be interpreted as a fuzzy model of the process. In addition to the knowledge of a human expert, a set of fuzzy control rules can be derived by using experimental knowledge. A fuzzy controller avoids rigorous mathematical models and, consequently, is more robust than a classical approach in cases that cannot, or only with great difficulty, be precisely modeled mathematically. Fuzzy rules describe in linguistic terms a quantitative relationship between two or more variables. Processing the fuzzy rules provides a mechanism for using them to compute the response to a given fuzzy controller input.
The basis of a fuzzy controller, or any fuzzy rule-based system, is the inference engine, which is responsible for fuzzification of the inputs, fuzzy processing, and defuzzification of the output. A schematic of the inference engine is shown in Figure 11.23. Fuzzification means that the actual inputs are converted into fuzzy inputs. Fuzzy processing means that the inputs are processed according to the rule set to produce fuzzy outputs. Defuzzification means producing a crisp real value for the fuzzy output, which is also the controller output.

FIGURE 11.23 Operation of a fuzzy controller.
The fuzzy logic controller’s goal is to achieve satisfactory control of a process. Based on the input parameters, the operation of the controller (output) can be determined. The typical design scheme of a fuzzy logic controller is shown in Figure 11.24 (Zadeh, 1973). The design of such a controller contains the following steps:
1. Define the inputs and the control variables.
2. Define the condition interface. Inputs are expressed as fuzzy sets.
3. Design the rule base, i.e., the set of fuzzy control rules.
4. Design the computational unit. Many ready-made programs are available for this purpose.
5. Determine the rules for defuzzification, i.e., to transform the fuzzy control output to a crisp control action.

FIGURE 11.24 Basic configuration of fuzzy logic controller.
Membership functions
A membership function is a curve that defines how each point in the input space is mapped to a membership value, or degree of membership, between 0 and 1. In the literature, the input space is sometimes referred to as the universe of discourse. The only condition a membership function must really satisfy is that it must vary between 0 and 1. Additionally, it is possible, in a fuzzy set, to have a partial membership, such as “the weather is rather hot”. The function itself can be an arbitrary curve whose shape can be defined as a function that suits the particular problem from the point of view of simplicity, convenience, speed, and efficiency.
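A triangular membership function, like the ones in Figure 11.25, can be written directly. The parameterization below (left foot a, peak b, right foot c) is one common convention, assumed here for illustration.

```python
def tri_membership(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), rising linearly
    to a degree of membership of 1 at the peak b, then falling back to 0."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge
```

Any curve that stays between 0 and 1 (trapezoidal, Gaussian, etc.) could replace this one; the triangle is merely the simplest choice.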
Based on signals usually obtained from sensors and common knowledge, membership functions for the input and output variables need to be defined. The inputs are described in terms of linguistic variables as, for example, very high, high, okay, low, and very low, as shown in Figure 11.25. It should be noted that, depending on the problem, different sensors could be used showing different parameters such as distance, angle, resistance, slope, etc.

FIGURE 11.25 Membership functions for linguistic variables describing an input sensor.
The output can be adjusted in a similar way, according to some membership functions—for example, the ones presented in Figure 11.26. In both cases, membership curves other than the triangular can be used, such as trapezoidal, quadratic, Gaussian (exponential), cos-function, and many others.

FIGURE 11.26 Membership functions for linguistic variables describing motor operation.
Logical operations
The most important thing to realize about fuzzy logical reasoning is that it is a superset of standard Boolean logic, i.e., if the fuzzy values are kept at their extremes of 1 (completely true) and 0 (completely false), the standard logical operations hold. In fuzzy logic, however, the truth of any statement is a matter of degree, and the input values can be real numbers between 0 and 1. The statement A AND B, where A and B are limited to the range [0, 1], can be resolved using min(A, B). Similarly, an OR operation can be replaced with the max function, so that A OR B becomes equivalent to max(A, B), and the operation NOT A is equivalent to 1 − A. Given these three functions, any construction can be resolved using fuzzy sets and the fuzzy logical operations AND, OR, and NOT. An example of the operations on fuzzy sets is shown in Figure 11.27.
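The min/max/complement correspondence reduces to three one-liners; at the Boolean extremes 0 and 1 they reproduce the classical truth tables (the function names are illustrative):

```python
def f_and(a, b):
    """Fuzzy AND (intersection) via the minimum operator."""
    return min(a, b)

def f_or(a, b):
    """Fuzzy OR (union) via the maximum operator."""
    return max(a, b)

def f_not(a):
    """Fuzzy NOT (complement)."""
    return 1.0 - a
```

For partial truths, f_and(0.3, 0.8) gives 0.3 and f_not(0.3) gives 0.7, while f_and(1, 0) = 0 and f_or(1, 0) = 1 match Boolean logic.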

FIGURE 11.27 Operations on fuzzy sets.
In Figure 11.27, only one particular correspondence between two-valued and multi-valued logical operations for AND, OR, and NOT is defined. This correspondence is by no means unique. In more general terms, what are known as the fuzzy intersection or conjunction (AND), fuzzy union or disjunction (OR), and fuzzy complement (NOT) can be defined.
The intersection of two fuzzy sets, A and B, is specified in general by a binary mapping, T, which aggregates two membership functions as:
μA∩B(x) = T(μA(x), μB(x))  (11.138)
The binary operator, T, may represent, for example, the multiplication of μA(x) and μB(x). These fuzzy intersection operators are usually referred to as T-norm (triangular norm) operators. Similarly to the fuzzy intersection, the fuzzy union operator is specified in general by a binary mapping, S, as:
μA∪B(x) = S(μA(x), μB(x))  (11.139)
The binary operator, S, may represent the addition of μA(x) and μB(x). These fuzzy union operators are usually referred to as T conorm (or S norm) operators.
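The min/max/complement operators from the text, and the algebraic product with its probabilistic-sum dual as one alternative T-norm/S-norm pair, can be sketched as follows (membership values assumed to lie in [0, 1]):

```python
def fuzzy_and(a, b):      # minimum T-norm: A AND B
    return min(a, b)

def fuzzy_or(a, b):       # maximum S-norm (T-conorm): A OR B
    return max(a, b)

def fuzzy_not(a):         # standard complement: NOT A
    return 1.0 - a

def product_and(a, b):    # algebraic-product T-norm (an alternative AND)
    return a * b

def prob_or(a, b):        # probabilistic-sum S-norm (its dual OR)
    return a + b - a * b

a, b = 0.7, 0.4
print(fuzzy_and(a, b))         # 0.4
print(fuzzy_or(a, b))          # 0.7
print(round(fuzzy_not(a), 1))  # 0.3
print(round(prob_or(a, b), 2)) # 0.82
```

At the extremes 0 and 1 both operator pairs reduce to ordinary Boolean AND/OR, which is exactly the superset property noted above.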
IF-THEN rules
Fuzzy sets and fuzzy operators are the subjects and verbs of fuzzy logic. While the differential equations are the language of conventional control, if–then rules, which determine the way a process is controlled, are the language of fuzzy control. Fuzzy rules serve to describe the quantitative relationship between variables in linguistic terms. These if-then rule statements are used to formulate the conditional statements that comprise fuzzy logic. Several rule bases of different complexity can be developed, such as:
IF Sensor 1 is Very Low AND Sensor 2 is Very Low THEN Motor is Fast Reverse
IF Sensor 1 is High AND Sensor 2 is Low THEN Motor is Slow Reverse
IF Sensor 1 is Okay AND Sensor 2 is Okay THEN Motor Off
IF Sensor 1 is Low AND Sensor 2 is High THEN Motor is Slow Forward
IF Sensor 1 is Very Low AND Sensor 2 is Very High THEN Motor is Fast Forward
In general form, a single fuzzy IF-THEN rule is of the form:
IF x is A AND y is B, THEN z is C  (11.140)
where A, B, and C are linguistic values defined by fuzzy sets on the ranges (universe of discourse) X, Y, and Z, respectively. In if-then rules, the term following the IF statement is called the premise or antecedent, and the term following THEN is called the consequent.
It should be noted that A and B are each represented by a number between 0 and 1, so the antecedent is an interpretation that returns a single number between 0 and 1. On the other hand, C is represented as a fuzzy set, so the consequent is an assignment that assigns the entire fuzzy set C to the output variable z. In the if-then rule, the word is is thus used in two entirely different ways, depending on whether it appears in the antecedent or the consequent. In general, the input to an if-then rule is the current value of the input variables (in Eq. (11.140), x and y), and the output is an entire fuzzy set (in Eq. (11.140), assigned to z). This set is later defuzzified, assigning one value to the output.
Interpreting an if-then rule involves two distinct parts:
1. Evaluate the antecedent, which involves fuzzifying the input and applying any necessary fuzzy operators.
2. Apply that result to the consequent, known as implication.
In the case of two-valued or binary logic, if-then rules present little difficulty. If the premise is true, then the conclusion is true. In the case of a fuzzy statement, if the antecedent is true to some degree of membership, then the consequent is also true to that same degree; that is,
In binary logic, p → q (if p is true, then q is true)
In fuzzy logic, 0.5p → 0.5q (a partially true antecedent implies a consequent true to the same degree)
It should be noted that both the antecedent and the consequent parts of a rule can have multiple components. For example, the antecedent part can be:
if temperature is high and sun is shining and pressure is falling, then …
In this case, all parts of the antecedent are calculated simultaneously and resolved to a single number using the logical operators described previously. The consequent of a rule can also have multiple parts, for example,
if temperature is very hot, then boiler valve is shut and public mains water valve is open
In this case, all consequents are affected equally by the result of the antecedent. The consequent specifies a fuzzy set assigned to the output. The implication function then modifies that fuzzy set to the degree specified by the antecedent. The most common way to modify the output set is truncation using the min function.
In general, interpreting if-then fuzzy rules is a three-part process:
1. Fuzzify inputs. All fuzzy statements in the antecedent are resolved to a degree of membership between 0 and 1.
2. Apply a fuzzy operator to multiple part antecedents. If there are multiple parts to the antecedent, apply fuzzy logic operators and resolve the antecedent to a single number between 0 and 1.
3. Apply the implication method. The degree of support for the entire rule is used to shape the output fuzzy set. The consequent of a fuzzy rule assigns an entire fuzzy set to the output. This fuzzy set is represented by a membership function that is chosen to indicate the quantities of the consequent. If the antecedent is only partially true, then the output fuzzy set is truncated according to the implication method.
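The three-step process above can be illustrated for the rule "IF Sensor 1 is Okay AND Sensor 2 is Okay THEN Motor Off". The triangular membership functions, sensor universes, and readings below are hypothetical stand-ins, not the actual curves of Figures 11.25 and 11.26.

```python
def tri(x, a, b, c):
    """Triangular membership function with break points a, b, c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Step 1: fuzzify the crisp sensor readings (universes assumed 0-10).
s1, s2 = 4.0, 5.5
okay1 = tri(s1, 3.0, 5.0, 7.0)   # degree to which Sensor 1 is "Okay": 0.5
okay2 = tri(s2, 3.0, 5.0, 7.0)   # degree to which Sensor 2 is "Okay": 0.75

# Step 2: resolve the two-part antecedent with the AND (min) operator.
firing_strength = min(okay1, okay2)  # 0.5

# Step 3: implication - truncate the output set "Motor Off" at that degree.
def motor_off(z):                # hypothetical output MF centred on zero speed
    return tri(z, -2.0, 0.0, 2.0)

def implied_output(z):
    return min(firing_strength, motor_off(z))

print(implied_output(0.0))  # 0.5: the peak of "Motor Off", truncated from 1.0
```

The truncated set produced in step 3 is what each rule contributes to the aggregation and defuzzification stages described next.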
Fuzzy inference system
Fuzzy inference is a method that interprets the values in the input vector and, based on some set of rules, assigns values to the output vector. In fuzzy logic, the truth of any statement is a matter of degree.
Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. The mapping then provides a basis from which decisions can be made or patterns discerned. The process of fuzzy inference involves all of the pieces described so far, i.e., membership functions, fuzzy logic operators, and if-then rules. Two main types of fuzzy inference systems can be implemented: Mamdani-type (1977) and Sugeno-type (1985). These two types of inference systems vary somewhat in the way outputs are determined.
Mamdani-type inference expects the output membership functions to be fuzzy sets. After the aggregation process, there is a fuzzy set for each output variable, which needs defuzzification. It is possible, and sometimes more efficient, to use a single spike as the output membership function rather than a distributed fuzzy set. This, sometimes called a singleton output membership function, can be considered a pre-defuzzified fuzzy set. It enhances the efficiency of the defuzzification process because it greatly simplifies the computation required by the more general Mamdani method, which finds the centroid of a two-dimensional function. Instead of integrating across the two-dimensional function to find the centroid, the weighted average of a few data points can be used.
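The computational saving of singleton outputs can be seen in a small sketch. The discrete centroid below stands in for the centroid of the aggregated two-dimensional Mamdani output; the sampled set, spike positions, and firing strengths are all hypothetical values.

```python
def centroid(zs, mus):
    """Discrete centroid: sum(z * mu) / sum(mu) over sampled points."""
    num = sum(z * m for z, m in zip(zs, mus))
    den = sum(mus)
    return num / den if den else 0.0

# General Mamdani case: the aggregated output fuzzy set, sampled over z.
zs  = [0.0, 1.0, 2.0, 3.0, 4.0]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]
print(centroid(zs, mus))  # 2.0

# Singleton (pre-defuzzified) case: each rule contributes one spike, so
# only a weighted average of a few points is needed - same formula, far
# fewer samples.
spikes  = [1.0, 3.0]    # singleton output positions
weights = [0.25, 0.75]  # rule firing strengths
print(centroid(spikes, weights))  # 2.5
```

With dense sampling the first call approximates integration across the output set, while the second collapses to a handful of multiplications, which is why singleton membership functions speed up defuzzification.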
The Sugeno method of fuzzy inference is similar to the Mamdani method in many respects. The first two parts of the fuzzy inference process, fuzzifying the inputs and applying the fuzzy operator, are exactly the same. The main difference between Mamdani-type and Sugeno-type fuzzy inferences is that the output membership functions are only linear or constant for the Sugeno-type fuzzy inference. A typical fuzzy rule in a first-order Sugeno fuzzy model has the form:
IF x is A AND y is B, THEN z = px + qy + r  (11.141)
where A and B are fuzzy sets in the antecedent, while p, q, and r are all constants. Higher-order Sugeno fuzzy models are possible, but they introduce significant complexity with little obvious merit. Because of the linear dependence of each rule on the system’s input variables, the Sugeno method is ideal for acting as an interpolating supervisor of multiple linear controllers that are to be applied, respectively, to different operating conditions of a dynamic non-linear system. A Sugeno fuzzy inference system is extremely well suited to the task of smoothly interpolating the linear gains that would be applied across the input space, i.e., it is a natural and efficient gain scheduler. Similarly, a Sugeno system is suitable for modeling non-linear systems by interpolating multiple linear models.
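A first-order Sugeno system of the form of Eq. (11.141) produces its crisp output directly as the firing-strength-weighted average of the rule consequents. The sketch below assumes hypothetical firing strengths and (p, q, r) constants, chosen only to show the interpolation between two linear laws:

```python
def sugeno_output(rules, x, y):
    """rules: list of (w, (p, q, r)) pairs, w being the rule firing
    strength already computed from the fuzzified antecedent."""
    num = sum(w * (p * x + q * y + r) for w, (p, q, r) in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0

# Two rules acting as an interpolating supervisor of two linear controllers:
rules = [
    (0.3, (1.0, 0.0, 0.0)),   # rule 1 consequent: z1 = x
    (0.7, (0.0, 1.0, 2.0)),   # rule 2 consequent: z2 = y + 2
]
print(round(sugeno_output(rules, 4.0, 1.0), 2))  # 3.3
```

At the point (x, y) = (4, 1) the two local laws give 4 and 3 respectively, and the output 3.3 is their blend weighted by the firing strengths, which is exactly the gain-scheduling behaviour described above.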
Fuzzy systems applications in solar energy systems
Applications of fuzzy systems in solar energy systems are far fewer. They concern the design of a fuzzy single-axis tracking mechanism controller (Kalogirou, 2002) and a neuro-fuzzy-based model for a PV power supply system (Mellit and Kalogirou, 2006b). In fact, the membership functions shown in Figures 11.25 and 11.26 and the rule base given previously are from the first application, whereas the latter is a hybrid system described in the next section.
11.6.4 Hybrid systems
Hybrid systems are systems that combine two or more artificial intelligence techniques to perform a task. The classical hybrid system is the neuro-fuzzy control, whereas other types combine genetic algorithms and fuzzy control or artificial neural networks and genetic algorithms as part of an integrated problem solution or to perform specific, separate tasks of the same problem. Since most of these techniques are problem specific, more details are given here for the first category.
A fuzzy system possesses great power in representing linguistic and structured knowledge using fuzzy sets and performing fuzzy reasoning and fuzzy logic in a qualitative manner. However, it usually relies on domain experts to provide the necessary knowledge for a specific problem. Neural networks, on the other hand, are particularly effective at representing non-linear mappings in a computational fashion. They are “constructed” through training procedures in which sample data are presented to them. Additionally, although the behavior of fuzzy systems can be understood easily, due to their logical structure and step-by-step inference procedures, a neural network generally acts as a “black box”, without providing explicit explanation facilities. The integration of the two technologies has quite recently led to a new kind of system, called neuro-fuzzy control, in which the strengths of both systems are utilized and combined appropriately.
More specifically, neuro-fuzzy control means (Nie and Linkens, 1995):
1. The controller has a structure resulting from a combination of fuzzy systems and ANNs.
2. The resulting control system consists of fuzzy systems and neural networks as independent components performing different tasks.
3. The design methodologies for constructing respective controllers are hybrid ones coming from ideas in fuzzy and neural control.
In this case, a trained neural network can be viewed as a means of knowledge representation. Instead of representing knowledge using if-then localized associations, as in fuzzy systems, a neural network stores knowledge through its structure and, more specifically, its connection weights and local processing units, in a distributed or localized manner. Many commercial software packages (such as MATLAB) include routines for neuro-fuzzy modeling.
The basic structure of a fuzzy inference system is described in Section 11.6.3. This is a model that maps inputs to input membership functions, input membership functions to rules, rules to a set of output characteristics, output characteristics to output membership functions, and the output membership functions to a single-valued output or decision. Throughout this process the membership functions are fixed. In this way, fuzzy inference can be applied to modeling systems whose rule structure is essentially predetermined by the user’s interpretation of the characteristics of the variables in the model.
In some modeling situations, the shape of the membership functions cannot be determined by just looking at the data. Instead of arbitrarily choosing the parameters associated with a given membership function, these parameters could be chosen to tailor the membership functions to the input–output data in order to account for these types of variations in the data values. If fuzzy inference is applied to a system for which a past history of input–output data is available, these can be used to determine the membership functions. Using a given input–output data set, a fuzzy inference system can be constructed, whose membership function parameters are tuned or adjusted using a neural network. This is called a neuro-fuzzy system.
The basic idea behind a neuro-fuzzy technique is to provide a method for the fuzzy modeling procedure to learn information about a data set, in order to compute the membership function parameters that best allow the associated fuzzy inference system to track the given input–output data. A neural network, which maps inputs through input membership functions and associated parameters, then through output membership functions and associated parameters to outputs, can be used to interpret the input–output map. The parameters associated with the membership functions will change through a learning process. Generally, the procedure followed is similar to any neural network technique described in Section 11.6.1.
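The tuning idea can be shown in a deliberately tiny sketch: a single-rule, zero-order Sugeno system whose Gaussian membership-function centre is adjusted by gradient descent to track one input–output pair. The data, learning rate, and shapes are all made-up illustrations of the principle, not the full ANFIS procedure.

```python
import math

def mu(x, c, sigma=1.0):
    """Gaussian membership function with tunable centre c."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# One-rule, zero-order Sugeno system: output = mu(x; c) * z_const
z_const = 2.0
c = 0.0                      # initial (deliberately wrong) MF centre
data = [(1.0, 2.0)]          # target: output 2.0 at x = 1, i.e. full membership

lr = 0.5
for _ in range(200):
    for x, t in data:
        y = mu(x, c) * z_const
        # Chain rule for the squared error e = (y - t)^2 with respect to c:
        dy_dc = z_const * mu(x, c) * (x - c)   # sigma = 1
        grad = 2 * (y - t) * dy_dc
        c -= lr * grad                          # gradient-descent update

print(round(c, 3))  # the centre has moved from 0.0 toward 1.0
```

In a real neuro-fuzzy system the same gradient machinery runs over all membership-function parameters and all rules at once, which is what allows the fuzzy model to be fitted to measured input–output data rather than hand-tuned.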
It should be noted that this type of modeling works well if the data presented to a neuro-fuzzy system for training and estimating the membership function parameters are representative of the features of the data that the trained fuzzy inference system is intended to model. However, this is not always the case: data may be collected using noisy measurements, or the training data may not be representative of all the features of the data that will be presented to the model. For this purpose, model validation can be used, as in any neural network system. Model validation is the process by which input vectors from input–output data sets that the neuro-fuzzy system has not seen before are presented to the trained system, to check how well the model predicts the corresponding data set output values.
