ISA Transactions 51 (2012) 111–119
Contents lists available at SciVerse ScienceDirect. Journal homepage: www.elsevier.com/locate/isatrans

Control chart pattern recognition using K-MICA clustering and neural networks

Ataollah Ebrahimzadeh*, Jalil Addeh, Zahra Rahmani
Faculty of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran

Article history: Received 26 May 2011; received in revised form 18 August 2011; accepted 23 August 2011; available online 28 October 2011.

Abstract: Automatic recognition of abnormal patterns in control charts is in increasing demand in manufacturing processes. This paper presents a novel hybrid intelligent method (HIM) for recognition of the common types of control chart pattern (CCP). The proposed method includes two main modules: a clustering module and a classifier module. In the clustering module, the input data are first clustered by a new technique, a suitable combination of the modified imperialist competitive algorithm (MICA) and the K-means algorithm. The Euclidean distance of each pattern from the determined cluster centers is then computed. The classifier module determines the membership of the patterns using the computed distances. In this module, several neural networks, such as the multilayer perceptron, probabilistic neural networks, and radial basis function neural networks, are investigated. Based on an experimental study, we choose the best classifier for recognizing the CCPs. Simulation results show that a high recognition accuracy, about 99.65%, is achieved.

Keywords: Control chart patterns; Clustering; Neural networks; Modified imperialist competitive algorithm; K-means algorithm

© 2011 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction
Control chart patterns (CCPs) are important statistical process control tools for determining whether a process is running in its intended mode or in the presence of unnatural patterns. CCPs exhibit six types of pattern: normal (NR), cyclic (CC), increasing trend (IT), decreasing trend (DT), upward shift (US), and downward shift (DS) [1]. Except for normal patterns, all other patterns indicate that the process being monitored is not functioning correctly and requires adjustment. Fig. 1 shows the six pattern types of control charts.

Over the years, numerous supplementary rules known as zone tests or run tests [2] have been proposed to analyze control charts. Interpretation of the process data nevertheless remains difficult because it involves pattern recognition tasks; it often relies on the skill and experience of quality control personnel to identify the existence of an unnatural pattern in the process. An efficient automated CCP recognition system can close this gap and ensure consistent and unbiased interpretation of CCPs, leading to fewer false alarms and better implementation of control charts. With this aim, several approaches have been proposed for CCP recognition. Some researchers have used expert systems [2], the fuzzy-clustering method [3], and decision tree (DT) based classifiers [4]. Other researchers have used artificial neural networks (ANNs) for recognition of CCPs [5–15]. ANNs can be broadly classified into two main categories: supervised ANNs and unsupervised ANNs. A literature review shows that the techniques that use supervised neural networks as the classifier achieve higher performance. The advantage of a neural network is that it does not require the provision of explicit rules or templates.

Most existing techniques use unprocessed data as the inputs of the CCP recognition system. The use of unprocessed CCP data has additional problems, such as the large amount of data to be processed. On the other hand, approaches that use features are more flexible in dealing with a complex process, especially when no prior information is available. If the features represent the characteristics of the patterns explicitly, and if their components are reproducible with the process conditions, the recognition accuracy of the classifier will increase [15]. Further, if a feature is amenable to reasoning, it helps in understanding how a particular decision was made, which makes the recognition process transparent. Features can be obtained in various forms, including principal component analysis shape features [11,13], correlations between the input and various reference vectors [16], and statistical correlation coefficients [17].

This paper presents a novel hybrid intelligent method (HIM) for recognition of the common types of control chart pattern. The proposed method includes two main modules: a clustering module and a classifier module. In the clustering module, the input data are first clustered by a new technique, a suitable combination of the modified imperialist competitive algorithm (MICA) and the K-means algorithm. The Euclidean distance of each pattern from the determined cluster centers is then computed and used as the characteristic feature. The classifier module determines the membership of the patterns using the computed distances.

The rest of the paper is organized as follows. Section 2 explains the general scheme of the proposed method. Sections 3 and 4 describe its main parts, i.e., the clustering technique and the classifier module, respectively. Section 5 shows some simulation results, and Section 6 concludes the paper.

* Corresponding author. E-mail address: [email protected] (A. Ebrahimzadeh). doi:10.1016/j.isatra.2011.08.005

Fig. 1. Six basic patterns of control charts: (a) normal pattern, (b) cyclic pattern, (c) upward trend, (d) downward trend, (e) upward shift, and (f) downward shift.

Fig. 2. General scheme of the proposed method.

2. General structure of the proposed method

The proposed method includes two main modules: a clustering module and a classifier module. Fig. 2 shows the general scheme of this method. In the clustering module, the input data are first clustered by a new technique, a suitable combination of the modified imperialist competitive algorithm (MICA) and the K-means algorithm, named the K-MICA clustering technique. The Euclidean distance of each pattern from the determined cluster centers is then computed and used as the characteristic feature. The Euclidean distance of each signal from its own cluster center is clearly smaller than its distance from the other cluster centers. Each signal is therefore represented as a 1 × 6 vector in which one entry has a small value and the remaining entries have large values. This approach decreases the input dimension of the classifier and yields a feature that is more efficient than the raw data. The classifier module determines the membership of the patterns using the computed distances. The following sections present the main modules.

3. K-MICA clustering technique

Clustering is an important problem that must often be solved as part of more complicated tasks in pattern recognition, image analysis, and other fields of science and engineering. Clustering procedures partition a set of objects into clusters such that objects in the same cluster are more similar to each other than to objects in different clusters, according to some predefined criteria. In this section, the applied clustering algorithm is described.

3.1. K-means

K-means is one of the simplest unsupervised learning algorithms. The procedure follows a simple and easy way to classify a given data set into a certain number of clusters (assume K clusters) fixed a priori. The main idea is to determine the K centroids. These centroids should be placed carefully, as different locations lead to different results; the best choice is to place them as far from each other as possible. The next step is to take each point of the data set and associate it with the nearest centroid. When no point remains, the first step is completed and an early grouping is done. At this point, the K new centroids are recalculated as the centers of the clusters resulting from the previous step. Given these K new centroids, a new binding is made between the data set points and the nearest new centroid, generating a loop. As a result of this loop, the K centroids change their locations step by step until no more changes are made; in other words, the centroids no longer move. Finally, the goal of this algorithm is to minimize an objective function, in this case a squared error function [18,19]. The objective function is calculated as follows:

cost(X) = Σ_{i=1}^{N} min_{j=1,...,K} ‖Y_i − X_j‖,  (1)

where ‖Y_i − X_j‖ is a chosen distance measure between a data input Y_i and the cluster center X_j, and N and K are the number of input data and the number of cluster centers, respectively. The algorithm is composed of the following steps.

1. Place K points into the space represented by the objects being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move.
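The four steps above can be sketched as follows. This is an illustrative minimal implementation, not the authors' code; the initialization strategy and the stopping tolerance are ordinary defaults chosen here for the sketch.

```python
import numpy as np

def kmeans(Y, K, max_iter=100, seed=0):
    """Minimal K-means: returns cluster centers X, labels, and the
    objective of Eq. (1) (sum of distances to the nearest center)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick K initial centroids from the data points.
    X = Y[rng.choice(len(Y), K, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid.
        d = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: recalculate centroids as the means of their clusters
        # (an empty cluster keeps its previous centroid).
        X_new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                          else X[j] for j in range(K)])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(X_new, X):
            break
        X = X_new
    return X, labels, d.min(axis=1).sum()
```

Running it on two well-separated groups of points recovers one centroid per group, while the sensitivity to initialization discussed next is exactly why the paper wraps K-means in a global optimizer.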
Although it can be proved that the procedure always terminates, the K-means algorithm does not necessarily find the optimal configuration corresponding to the global minimum of the objective function. The algorithm is also significantly sensitive to the initial, randomly selected cluster centres; it can be run multiple times to reduce this effect.

3.2. The imperialist competitive algorithm (ICA)

The imperialist competitive algorithm (ICA) is a population-based stochastic search algorithm introduced by Atashpaz and Lucas [20,21]. Since then, it has been used to solve several kinds of optimization problem. The algorithm is inspired by imperialistic competition: it models the social policy of imperialism, in which empires attempt to control more countries and use their resources, while colonies are dominated by certain rules. If one empire loses its power, the others compete to take possession of it. In the ICA, this process is simulated by individuals known as countries. The algorithm starts with a random initial population, for which the objective function is computed. The most powerful countries are selected as imperialists and the others become colonies of these imperialists. A competition between the imperialists then takes place to acquire more colonies; the best imperialist has the greatest chance of possessing more colonies. An imperialist together with its colonies forms an empire. Fig. 3 shows the initial population of each empire [20–22]. The bigger an empire, the more colonies it has, while the weaker ones have fewer. In this figure, Imperialist 1 is the most powerful and has the greatest number of colonies.

Fig. 3. Generating the initial empires.

After the colonies are divided among the imperialists, they approach their related imperialist countries. Fig. 4 represents this movement: each colony moves toward its imperialist by α units and reaches its new position, where

α ≈ U(0, β × S).  (2)

Here, α is a random variable with uniform (or any proper) distribution; β, a number greater than 1, causes colonies to approach their imperialists from different directions; and S is the distance between the colony and the imperialist.

Fig. 4. Moving colonies toward their relevant imperialist.

If, after this movement, a colony possesses more power than its relevant imperialist, the two exchange positions. To begin the competition between empires, the total objective function of each empire is calculated; it depends on the objective functions of both the imperialist and its colonies. Then the competition starts: the weakest empire loses its possessions, and powerful empires try to gain them. An empire that has lost all its colonies collapses. Finally, the most powerful empire takes possession of the other empires and wins the competition.

To apply the ICA for clustering, the following steps have to be taken [22].

Step 1: Generate the initial population of each empire randomly.
Step 2: Move the colonies toward their relevant imperialists.
Step 3: Exchange the positions of a colony and its imperialist if the colony's cost is lower.
Step 4: Compute the objective function of all empires.
Step 5: Pick the weakest colony and give it to one of the best empires.
Step 6: Eliminate the powerless empires.
Step 7: If just one empire remains, stop; otherwise, go to Step 2.

The last imperialist is the solution of the problem.

3.3. Modified ICA (MICA)

To improve the convergence speed and accuracy of the ICA, [23] recommends a modified imperialist competitive algorithm (MICA). Premature convergence may occur in different situations: the population converges to a local optimum of the objective function, or the search proceeds slowly or not at all. In [23], a new mutation operator is proposed. Mutation is a powerful strategy that diversifies the ICA population and improves performance by preventing premature convergence to local minima.

During the assimilation policy, each colony X moves toward its relevant imperialist by a unit, where the initial distance between them is S. The new position of each colony is X_{move,j}^{t+1} (t is the iteration number):

α ≅ U(0, β × S),
X_{move,j}^{t+1} = X_j^t + α,  (3)

where X_j^t and X_{move,j}^{t+1} denote the jth colony of each empire. After this movement, for each colony X a mutant colony X_{mut}^{t+1} is generated as follows:

X_{mut,j}^{t+1} = X_{m1}^t + rand(·) × (X_{m2}^t − X_{m3}^t),  (4)

with X_{mut,j} = [X_{mut,1}, X_{mut,2}, ..., X_{mut,b}]_{1×b}, b = k × d. The selected colony is then

X_{new,j}^{t+1} = [X_{new,1}, X_{new,2}, ..., X_{new,b}]_{1×b},
X_{new,z} = X_{mut,z} if rand(·) < γ, and X_{new,z} = X_z otherwise, z = 1, 2, ..., b,  (5)

where rand(·) is a random number between 0 and 1 and γ is a number less than 1. m1, m2, m3 are three individuals selected randomly from the initial colonies; in order to cover the entire colony space uniformly, it is better to select them such that m1 ≠ m2 ≠ m3 ≠ j. k is the number of clusters, and d is the dimension of each cluster center. To choose between X_{move,j} and X_{new,j} as the replacement for the jth colony X_j, an objective function is used:

X_j^{t+1} = X_{move,j}^{t+1} if cost(X_{move,j}^{t+1}) ≤ cost(X_{new,j}^{t+1}), and X_j^{t+1} = X_{new,j}^{t+1} otherwise.  (6)

3.4. Hybrid K-MICA

As mentioned before, K-means is used for its ease and simplicity of application. However, it has some drawbacks: its result may depend on the initial values, and it may converge to a local minimum.
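The MICA colony update of Eqs. (3)–(6) can be sketched as follows. This is a sketch under stated assumptions: the cost function is a generic placeholder, β and γ are illustrative values, and the uniform step of Eq. (3) is drawn once per colony along the line to the imperialist.

```python
import numpy as np

rng = np.random.default_rng(1)

def mica_colony_update(colonies, imperialist, cost, beta=2.0, gamma=0.4):
    """One MICA pass over all colonies: assimilation (Eq. 3),
    differential-style mutation (Eq. 4), component mixing (Eq. 5),
    and greedy selection (Eq. 6). `cost` maps a vector to a scalar."""
    n, b = colonies.shape
    new_colonies = colonies.copy()
    for j in range(n):
        # Eq. (3): move colony j toward the imperialist by alpha ~ U(0, beta*S).
        step = rng.uniform(0.0, beta)
        x_move = colonies[j] + step * (imperialist - colonies[j])
        # Eq. (4): mutant built from three distinct random colonies m1, m2, m3 != j.
        m1, m2, m3 = rng.choice([i for i in range(n) if i != j], 3, replace=False)
        x_mut = colonies[m1] + rng.random() * (colonies[m2] - colonies[m3])
        # Eq. (5): take mutant components with probability gamma.
        mask = rng.random(b) < gamma
        x_new = np.where(mask, x_mut, colonies[j])
        # Eq. (6): keep whichever candidate has the lower cost.
        new_colonies[j] = x_move if cost(x_move) <= cost(x_new) else x_new
    return new_colonies
```

The greedy choice in Eq. (6) is what lets the mutation diversify the population without ever accepting a candidate worse than the assimilation move.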
Recently, numerous ideas have been proposed to alleviate these drawbacks by using global optimization algorithms such as the genetic algorithm (GA) [24], hybrid PSO-SA [25], hybrid PSO-ACO-K [26], and HBMO [27]. In this study, we used a method given in [23], called hybrid K-MICA. According to the original ICA, first a primary population is generated and then the empires with their possessions appear. Applying K-means to each empire improves the initial population of colonies; it makes the hybrid algorithm converge more quickly and prevents it from falling into local optima. The outputs of K-means form the initial empires of the modified ICA.

To improve the outcome of the algorithm, it is better that a powerless imperialist is not removed when it loses all its possessions. This imperialist is one of the best answers, and it can continue to take part in the imperialistic competition as a weak colony or as one given to a powerful empire. The pseudo-code of this algorithm is shown in Fig. 5.

Fig. 5. Pseudo-code of the K-MICA algorithm.

To apply the ICA for clustering, the following steps have to be taken [23].

Step 1: Generate an initial population. An initial population of input data is generated by chaos initialization as follows:

Population = [X_1; X_2; ...; X_{N_initial}],
X_i = Country_i = [center_1, center_2, ..., center_k], i = 1, 2, ..., N_initial,  (7)
center_j = [x_1, x_2, ..., x_d], j = 1, 2, ..., k,  H = k × d,
X_0 = [x_0^1, x_0^2, ..., x_0^H],  x_0^j = rand(·) × (x_max^j − x_min^j) + x_min^j, j = 1, 2, ..., H,
X_i = [x_i^1, x_i^2, ..., x_i^H], i = 1, 2, ..., N_initial,
x_i^j = 4 × x_{i−1}^j × (1 − x_{i−1}^j), j = 1, 2, ..., H,  x_min^j < x^j < x_max^j,

where center_j is the jth cluster center of the ith country, X_i is one of the countries, N_initial is the population size, and d is the dimension of each cluster center. x_max^j and x_min^j are, respectively, the maximum and minimum values of each point belonging to the jth cluster center. k is the number of clusters, H is the number of state variables, and X_0 is an initial solution.

Step 2: Calculate the objective function value. Suppose that we have N sample feature vectors. The objective function is evaluated for each country as follows:
Step 2-1: Set i = 1 and Objec = 0.
Step 2-2: Select the ith sample.
Step 2-3: Calculate the distances between the ith sample and center_j (j = 1, 2, ..., K).
Step 2-4: Add the minimum distance calculated in Step 2-3 to Objec (Objec = Objec + min(|center_i − Y_m|, i = 1, 2, ..., K)).
Step 2-5: If all samples have been selected, go to the next step; otherwise set i = i + 1 and return to Step 2-2.
Step 2-6: cost(X) = Objec.

The objective function is calculated mathematically as

cost(X) = Σ_{m=1}^{N} min_{i=1,...,K} |center_i − Y_m|.  (8)

Step 3: Sort the initial population based on the objective function values, in increasing order of the objective function.

Step 4: Select the imperialist states. Countries with the minimum objective function values are selected as the imperialist states, and the remaining ones form the colonies of these imperialists.

Step 5: Divide the colonies among the imperialists. The colonies are divided among the imperialists based on the power of each imperialist, calculated as follows:

C_n = max_i{cost_i} − cost_n,  (9)
P_n = C_n / Σ_{i=1}^{N_imp} C_i,  (10)
C_n^{norm} = round(P_n × N_col).  (11)

In the above equations, cost_n is the cost of the nth imperialist and C_n its normalized cost. The normalized power of each imperialist is P_n; the initial number of colonies of each empire is then C_n^{norm}, where N_col and N_imp are the total numbers of colonies and imperialists.
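Steps 2-1 to 2-6, i.e., Eq. (8), amount to summing each sample's distance to its nearest cluster center. A minimal vectorized sketch (illustrative only; the array shapes are assumptions for the example):

```python
import numpy as np

def clustering_cost(centers, samples):
    """Eq. (8): sum over all samples of the distance to the nearest
    cluster center. `centers` is (K, d), `samples` is (N, d)."""
    # Pairwise distances between every sample and every center: shape (N, K).
    d = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    # Steps 2-1 to 2-6: accumulate the minimum distance per sample.
    return float(d.min(axis=1).sum())
```

For instance, with centers at (0, 0) and (10, 10) and one sample one unit away from each center, the cost is 2.0.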
Step 6: Use the K-means algorithm for each empire.

Step 7: Move the colonies toward their imperialist states, as described in Section 3.2.

Step 8: Use mutation to change the direction of the colonies, as described for the modified ICA in Section 3.3.

Step 9: Check the cost of all colonies in each empire. During the previous steps, the cost of each colony might have changed. If a colony has a lower cost than its relevant imperialist, exchange their positions.

Step 10: Check the total cost of each empire. The cost of each empire depends on the power of both the imperialist and its colonies. It is calculated as

TC_n = cost(imperialist_n) + ξ × mean{cost(colonies of empire_n)},  (12)

where TC_n is the total cost of the nth empire and ξ is an attenuation coefficient between 0 and 1 that reduces the effect of the colonies' costs.

Step 11: Perform the imperialistic competition. All empires, according to their power (total cost), try to take the colonies of the weakest empire:

TC_n^{norm} = max_i{TC_i} − TC_n,  (13)
PP_n = TC_n^{norm} / Σ_{i=1}^{N_imp} TC_i^{norm},  (14)

where TC_n^{norm} is the normalized total cost of the nth empire and PP_n its possession probability.

A roulette wheel can be used for stochastic selection of the winning empire, which will take the weakest colony of the weakest empire. To run the roulette wheel, the cumulative probability CP_n = Σ_{i=1}^{n} PP_i must be calculated; for n = 1 it equals PP_1, while for the last n it equals 1. A uniformly distributed random number is then generated and compared with all CP_n; sectors with higher probability have a greater chance of being chosen, and the winning empire is thereby specified.

As mentioned, the roulette wheel requires computing the cumulative distribution function. To reduce this time-consuming step, the following approach has been presented:

P = [PP_1, PP_2, ..., PP_{N_imp}],  (15)
R = [r_1, r_2, ..., r_{N_imp}],  r_1, r_2, ..., r_{N_imp} ≈ U(0, 1),  (16)
D = P − R = [PP_1 − r_1, PP_2 − r_2, ..., PP_{N_imp} − r_{N_imp}],  (17)

where P is the vector of possession probabilities of all empires and R is a vector of uniformly distributed random numbers. The index of the maximum element of D indicates the winning empire, which receives the colony. After the winner is determined, the weakest colony of the weakest empire is given to it: one member is subtracted from the population of the weak empire and one is added to the winner's population.

Step 12: Remove the weakest empire. If any empire is left without a colony, eliminate it. Replace one of the weakest colonies of the best (lowest-cost) empire with this imperialist.

Step 13: Apply chaotic local search (CLS) to search around the global solution. The ICA has gained much attention and widespread application in different fields; however, it often converges to local optima. To avoid this shortcoming, a CLS algorithm is used in this study to search around the global solution. CLS is based on the logistic equation:

Cx_i = [Cx_i^1, Cx_i^2, ..., Cx_i^H]_{1×H},  i = 0, 1, 2, ..., N_chaos,  (18)
Cx_{i+1}^j = 4 × Cx_i^j × (1 − Cx_i^j),  j = 1, 2, ..., H,
Cx_0^j = rand(·),  Cx_i^j ∈ [0, 1],  Cx_0^j ∉ {0.25, 0.5, 0.75}.

In the CLS, the best solution so far, X_ds^0, is taken as the initial solution X_cls^0 and is scaled into (0, 1) according to the following equation:

X_cls^0 = [x_{cls,0}^1, x_{cls,0}^2, ..., x_{cls,0}^H]_{1×H},
Cx_0^j = (x_{cls,0}^j − x_min^j) / (x_max^j − x_min^j),  j = 1, 2, ..., H.  (19)

The chaos population for the CLS is generated as follows:

X_cls^i = [x_{cls,i}^1, x_{cls,i}^2, ..., x_{cls,i}^H]_{1×H},  i = 1, 2, ..., N_chaos,  (20)
x_{cls,i}^j = cx_i^j × (x_max^j − x_min^j) + x_min^j,  j = 1, 2, ..., H,

where cx_i^j indicates the jth chaotic variable and N_chaos is the number of individuals of the CLS; rand(·) is a random number between 0 and 1. The objective function is evaluated for all individuals of the CLS, and one randomly selected country is replaced with the best solution among them.

Step 14: Check the number of empires. If there is just one empire remaining, stop; else go to Step 7.

4. Classifiers

This section briefly describes the neural network classifiers.

4.1. Multi-layer perceptron (MLP) neural networks

An MLP neural network consists of an input layer (of source nodes), one or more hidden layers (of computation nodes), and an output layer.
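As an illustration, a minimal forward pass for such a network can be sketched as follows. This is a sketch only: the layer sizes (6 distance features in, 20 hidden neurons, 6 CCP classes) and the random weights are hypothetical, and the activations mirror the tangent-sigmoid/linear choice reported later in Table 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 distance features in, 20 hidden neurons, 6 CCP classes.
n_in, n_hidden, n_out = 6, 20, 6
W1 = rng.standard_normal((n_hidden, n_in)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_out, n_hidden)) * 0.1
b2 = np.zeros(n_out)

def mlp_forward(x):
    """Forward pass: tangent-sigmoid hidden layer, linear output layer."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

scores = mlp_forward(np.ones(n_in))
predicted_class = int(np.argmax(scores))  # index of the recognized CCP type
```

Training such a network means adjusting W1, b1, W2, b2 to reduce an error function E, which is exactly what the two learning algorithms described next do.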
The recognition basically consists of two imp phases: training and testing. In the training stage, weights are R = [r1 , r2 , . . . , rNimp ] r1 , r2 , . . . , rNimp ≈ U (0, 1) (16) calculated according to the chosen learning algorithm. The issue of the learning algorithm and its speed is very important for the D = P − R = [D1 , D2 , . . . , DNimp ] MLP model. In this study, the following learning algorithms are = [PP1 − r1 , PP2 − r2 , . . . , PPN − rNimp ], imp (17) considered. where P is the vector of possession probability of all empires 4.1.1. Back-propagation with momentum (BP with momentum) and R is a vector with uniformly distributed random numbers. algorithm The maximum index in D shows the winner empire that gets the colony. The BP algorithm makes use of gradient descent with a momen- After realizing the winner empire, the weakest colony of the tum term to smooth out oscillation [28]. Eq. (21) gives the weight weakest empire will be given to the winner empire. Then we update for BP with momentum: should subtract one of the populations of this weak empire and δE δE add one to the winner’s population. 1Wij (t + 1) = −ε (t ) + µ (t − 1), (21) δ Wij δ Wij Step 12: Remove the weakest empire. If there is any empire without a colony, eliminate it. Replace where wij represents the weight value from neuron j to neuron i, ε one of the weakest colonies of the best empire (low cost) with this is the learning rate parameter, and E represents the error function. imperialist. It adds an extra momentum parameter, µ, to the weight changes. 116 A. Ebrahimzadeh et al. / ISA Transactions 51 (2012) 111–119 4.1.2. Resilient back-propagation (RPROP) algorithm Table 1 K-MICA parameters for clustering. The RPROP algorithm considers the sign of derivatives as the Parameter Value indication for the direction of the weight update [29]. In doing so, the size of the partial derivative does not influence the weight step. 
Npop 50 Nimp 12 The following equation shows the adaptation of the update values γ 0.4 of ∆ij (weight changes) for the RPROP algorithm. For initialization, ζ 0.1 all are set to small positive values: β 20 Maximum number of iterations 100 δE δE η+ × ∆ij (t − 1); if (t − 1) (t ) > 0 δ Wij δ Wij 4.3. Probabilistic neural networks (PNNs) ∆ij (t ) = δE δE (22) η− × ∆ij (t − 1); if (t − 1) (t ) < 0 δ Wij δ Wij η × ∆ij (t − 1); otherwise, A probabilistic neural network (PNN) is a kind of radial basis 0 network suitable for classification problems. PNNs have three where η0 = 0, 0 < η− < 1 < η+ , η−,0,+ are known as layers: input, pattern, and summation. The input layer has as many the update factors. Whenever the derivative of the corresponding elements as there are individual parameters needed to describe weight changes its sign, this implies that the previous update the samples to be classified [30]. The pattern layer organizes the value is too large and it has skipped a minimum. Therefore, the training set in such a way that an individual processing element update value is then reduced (η− ), as shown above. However, if the represents each input vector. The pattern units in the probabilistic derivative retains its sign, the update value is increased (η+ ). This network are used to store pattern examples, taken directly from will help to accelerate convergence in shallow areas. To avoid over- the training data. The entire set of training data is used, and so the acceleration, in the epoch following the application of (η+ ), the number of units in the first hidden layer is set equal to the number new update value is neither increased nor decreased (η0 ) from the of training cases. The summation layer has as many processing previous one. Note that the values of ∆ij remain non-negative in elements as there are classes to be recognized, and simply collects every epoch. 
This update value adaptation process is then followed the outputs from all hidden neurons of each respective class. The by the actual weight update process, which is governed by the products of the summation layer are forwarded to the output (one following equations: neuron for each data class), where the estimated probability of the new pattern being a member of that data class is computed. The δE transfer function is a radial basis function for the first layer and is −∆ij ; if (t ) > 0 a competitive function for the second layer. Only the first layer has δ Wij 1Wij (t ) = δE (23) biases. Training of the probabilistic neural network is much easier +∆ij ; if (t ) < 0 than with back-propagation. It can be simply finished by setting δ Wij the weights of the network using the training set. The outputs of 0; otherwise. summary layer are binary values. If yi is larger than input of other The values of the training parameters adopted for the algorithms neurons (which means that this input pattern belongs to class i), yi were determined empirically. They were η− = .05, η+ = 1.2. is set to 1, otherwise it is set to zero. 4.2. Radial basis function neural networks (RBFNNs) 5. Simulation results RBF neural networks with their structural simplicity and In this section, the performance of the proposed recognizer is training efficiency are a good candidate to perform a nonlinear evaluated. For this purpose, we have used practical and real-world mapping between the input and the output vector space. The data [31]. This dataset contains 600 examples of control charts. For RBFNN is fully connected feed forward structure and it consists this study, we have used 60% of the data for training the classifier of three layers, namely, an input layer, a single layer of nonlinear and the rest for testing. The easiest way to assess the performance processing units, and an output layer. 
The input layer is composed rate is to choose a test set independent of the training set and of input nodes that are equal to the dimension of the input vector validation set to classify its examples, count the examples that have x. The output of the jth hidden neuron with Gaussian transfer been correctly classified, and divide by the size of the test set. The function can be calculated as proportion of test-set examples that are classified correctly to the total samples estimates the performance of the recognizer for each hj = exp(−‖x − cj ‖2 /σ 2 ), (24) pattern. In order to obtain the recognition accuracy (RA) of system, one needs to compute the average value of the performances of the where hj is the output of the jth neuron, x ∈ ℜn×1 is an input CCPs. vector, cj ∈ ℜn×1 is the jth RBF center, σ is the center spread We apply the K-MICA for finding the optimum cluster centers. parameter, which controls the width of the RBF, and ‖.‖2 represent The parameters of the clustering algorithm used in this study the Euclidean norm. The output of any neuron at the output layer are shown in Table 1. These values were selected for the best of the RBF network is calculated as performance after several experiments. k As an example, patterns’ Euclidean distance of signals from cluster center 2 (normal pattern) is shown in Fig. 6. The horizontal − yi = wij hj , (25) j =1 axis shows the number of signals and the vertical axis shows the Euclidean distance of signals from cluster center 2. As shown in the where wij is the weight connecting hidden neuron j to output figure, the normal pattern signals have smaller values due to other neuron i and k is the number of hidden layer neurons. signals. A. Ebrahimzadeh et al. / ISA Transactions 51 (2012) 111–119 117 Table 6 Comparison of the effect of the clustering algorithm. 
Fig. 6. Euclidean distance of signals from cluster center 2.

Table 6
Comparison of the effect of the clustering algorithm.

Classifier             Clustering algorithm   RA (%)
MLP-RP                 GA                     98.6
MLP-RP                 FCM                    97.2
MLP-RP                 KMC                    97.3
MLP-BP with momentum   GA                     99.04
MLP-BP with momentum   FCM                    97.18
MLP-BP with momentum   KMC                    97.74
RBF                    GA                     98.73
RBF                    FCM                    97.85
RBF                    KMC                    97.44
PNN                    GA                     96.7
PNN                    FCM                    93.9
PNN                    KMC                    92.4

Table 7
Recognition accuracy of the recognizer for different values of the parameters.

Case   Npop   Nimp   β    ξ      γ     Classifier   RA (%)
1      50     12     25   0.5    0.6   MLP(RP)      99.54
2      50     10     25   0.1    0.4   MLP(RP)      99.55
3      50     8      25   0.05   0.2   MLP(RP)      99.60
4      50     12     20   0.1    0.4   MLP(RP)      99.65
5      50     10     20   0.5    0.8   MLP(RP)      99.51
6      50     8      20   0.05   0.6   MLP(RP)      99.61
7      50     10     15   0.5    0.8   MLP(RP)      99.56
8      50     8      15   0.05   0.4   MLP(RP)      99.63
9      50     10     10   0.1    0.8   MLP(RP)      99.60
10     50     8      10   0.05   0.6   MLP(RP)      99.62

5.1. Performance comparison of different neural networks and the proposed features

The training parameters and the configuration of the MLP used in this study are shown in Table 2. The MLP classifiers were tested with various numbers of neurons in a single hidden layer, and the best networks were selected. For the PNN and the RBFNN, a Gaussian activation function and a single hidden layer with 360 neurons were considered. These values were selected for the best performance after several experiments.

Table 2
MLP architecture and training parameters.

Number of layers             2
Number of output neurons     6
Learning algorithm           Resilient back-propagation (RP); back-propagation with momentum
Initial weights and biases   Random
Activation functions         Tangent-sigmoid (hidden layer); linear (output layer)

Tables 3 and 4 show the recognition accuracy (RA) of the different systems. In these tables, NNHL means the number of neurons in the hidden layer. The reported results are the averages of 10 independent runs. As depicted in Table 3, using various neural networks with unprocessed (raw) data, the highest accuracy is 95.33%, achieved by the RBF neural network. From Table 4, it can be seen that using the proposed features as the classifier input increases the accuracy of the system to 99.65%; this value is achieved by the MLP neural network with the RP learning algorithm.

Table 3
Performance comparisons of different classifiers with raw data.

Classifier            Parameter    RA (%)
MLP (RP)              NNHL = 20    93.81
MLP (with momentum)   NNHL = 10    93.57
RBFNN                 Spread = 2   95.33
PNN                   Spread = 5   93.75

Table 4
Performance comparisons of different classifiers with the Euclidean distance of signals from the cluster centers.

Classifier            Parameter    RA (%)
MLP (RP)              NNHL = 20    99.65
MLP (with momentum)   NNHL = 30    99.23
RBFNN                 Spread = 7   99.53
PNN                   Spread = 3   96.42

A comparison between Tables 3 and 4 shows the efficiency of the proposed features. As can be seen from these tables, the accuracies of the MLP (RP), the MLP (momentum), the RBF neural network, and the PNN increase to 99.65%, 99.23%, 99.53%, and 96.42%, respectively, which shows the significant role of the features in classifier performance.

In order to show the details of the recognition of each pattern, the confusion matrix of the recognizer without optimization is shown in Table 5. The values on the diagonal of the confusion matrix show the correct performance of the recognizer for each pattern; in other words, they show how many of the considered patterns are recognized correctly by the system. The other values show the mistakes of the system. For example, in the third row of this matrix, the value 97.90% is the percentage of upward trend patterns recognized correctly, and the value 2.10% shows that this type of pattern is sometimes wrongly recognized as an upward shift. To obtain the recognition accuracy (RA) of the system, one computes the average of the values appearing on the diagonal of the confusion matrix.
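Since the RA is the average of the diagonal of the (row-normalised) confusion matrix, the reported 99.65% can be checked directly from the diagonal of Table 5. A minimal sketch; the matrix below simply transcribes Table 5:

```python
def recognition_accuracy(confusion):
    """RA (%) = mean of the diagonal of a row-normalised confusion matrix."""
    return sum(confusion[i][i] for i in range(len(confusion))) / len(confusion)

# Table 5, rows/columns ordered NR, CC, UT, DT, US, DS (entries in %).
confusion = [
    [100.0,   0.0,  0.0,   0.0,   0.0,   0.0],
    [  0.0, 100.0,  0.0,   0.0,   0.0,   0.0],
    [  0.0,   0.0, 97.9,   0.0,   2.1,   0.0],  # upward trend occasionally read as upward shift
    [  0.0,   0.0,  0.0, 100.0,   0.0,   0.0],
    [  0.0,   0.0,  0.0,   0.0, 100.0,   0.0],
    [  0.0,   0.0,  0.0,   0.0,   0.0, 100.0],
]
ra = recognition_accuracy(confusion)  # (5 * 100 + 97.9) / 6 ≈ 99.65
```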
Table 5
Confusion matrix for best result.

                 Normal   Cyclic   Upward trend   Downward trend   Upward shift   Downward shift
Normal           100%     0        0              0                0              0
Cyclic           0        100%     0              0                0              0
Upward trend     0        0        97.9%          0                2.1%           0
Downward trend   0        0        0              100%             0              0
Upward shift     0        0        0              0                100%           0
Downward shift   0        0        0              0                0              100%

5.2. Performance comparison of different clustering algorithms

The performance of the classifiers has been compared with different clustering algorithms. For this purpose, fuzzy C-means clustering [32,33], K-means clustering [18,19], and genetic algorithm clustering [34] are considered. Table 6 shows the RA of the different systems. From Tables 4 and 6, it can be seen that the K-MICA clustering based system achieves the highest recognition accuracy, 99.65%.

5.3. Effects of the K-MICA parameters on the performance of the method

In this subsection, we report the sensitivity of the recognition system with respect to Nimp, Npop, β, ξ, and γ, which control the behavior, and thus the goodness, of the K-MICA search process. The results obtained for 10 sets of parameters are shown in Table 7. To investigate the influence of the parameters, in each case the neural network is tested 10 times independently with various numbers of neurons in the hidden layer, and the average of the best obtained values is given in the table. The results illustrate that this hybrid system has little dependency on the variation of the parameters.

5.4. Comparison of the proposed method (HIM) with other methods in the literature

Several researchers in the past have addressed the CCP detection and classification problem, using the CCP signals directly or by analyzing features extracted from them. Direct comparison with other works is difficult for control chart pattern recognition problems, mainly because there is no single unified data set available. A different setup of patterns (for example, the number of training and testing samples and the number of pattern classes) will lead to different performance. Besides, many different kinds of benchmarks are used to assess system quality, which causes difficulties for direct numerical comparison. Table 8 compares the different methods in terms of the recognition accuracy, the classifier used, and the inputs used.

Table 8
A summary of different classification algorithms together with their reported recognition accuracies.

Ref. no     Number of CCP types   Input                Classifier   Total recognition accuracy (%)
[5]         6                     Unprocessed data     MLP(RSFM)    97.46
[6]         6                     Unprocessed data     MLP          94.30
[7]         6                     Unprocessed data     PNN          95.58
[8]         6                     Unprocessed data     MLP          93.73
[9]         6                     Unprocessed data     LVQ          97.7
[35]        4                     Unprocessed data     MLP(SPA)     96.38
This work   6                     Euclidean distance   MLP          99.65

As for neural network-based CCP recognizers, Le et al. [5] introduced a new ANN model, and their numerical simulations showed that this model has a recognition accuracy of about 97.46% for recognition of six types of CCP. Pham and Oztemel [6] reported a generalization rate of 94.30%. Cheng and Ma [7] obtained a recognition accuracy (RA) of about 95.58%; however, the performance for some patterns is reported to be less than 90%. The method proposed in [8] reached an RA of about 93.73% for recognition of six types of CCP. In [9], the authors used an LVQ neural network and achieved an RA of about 97.70%. In [35], Guh and Tannock proposed a sequential pattern analysis method and reported a classification rate of about 96.38%. Compared to these papers, the system proposed in the current work provides better accuracy over a wider range of CCP types (six different classes).

6. Conclusion

Accurate recognition of control chart patterns (CCPs) is very important for producing high-quality products. This study investigated the design of an accurate system for automatic recognition of CCPs. The Euclidean distances of the patterns from the cluster centers are proposed as efficient features. For the extraction of these features, we have used a combination of the K-means algorithm and the modified imperialist competitive algorithm (MICA); this combination is named K-MICA. The extracted features are applied to different types of neural networks. Simulation results show that the MLP neural network with the RP learning algorithm has the highest recognition accuracy (RA), about 99.65%.

References

[1] Montgomery DC. Introduction to statistical quality control. 5th ed. Hoboken (NJ, USA): John Wiley; 2005.
[2] Swift JA, Mize JH. Out-of-control pattern recognition and analysis for quality control charts using LISP-based systems. Computers & Industrial Engineering 1995;28:81–91.
[3] Wang CH, Kuo W. Identification of control chart patterns using wavelet filtering and robust fuzzy clustering. Journal of Intelligent Manufacturing 2007;18:343–50.
[4] Wang CH, Guo RS, Chiang MH, Wong JY. Decision tree based control chart pattern recognition. International Journal of Production Research 2008;124–34.
[5] Le Q, Goal X, Teng L, Zhu M. A new ANN model and its application in pattern recognition of control charts. In: Proc. IEEE WCICA. 2008. p. 1807–11.
[6] Pham DT, Oztemel E. Control chart pattern recognition using neural networks. Journal of Systems Engineering 1992;2:256–62.
[7] Cheng Z, Ma Y. A research about pattern recognition of control chart using probability neural network. In: Proc. ISECS. 2008. p. 140–5.
[8] Sagiroujlu S, Besdoc E, Erler M. Control chart pattern recognition using artificial neural networks. Turkish Journal of Electrical Engineering 2000;8:137–47.
[9] Pham DT, Oztemel E. Control chart pattern recognition using linear vector quantization networks. International Journal of Production Research 1994;256–62.
[10] Guh RS. Robustness of the neural network based control chart pattern recognition system to non-normality. International Journal of Quality and Reliability Management 2002;19:97–112.
[11] Gauri SK, Chakraborty S. A study on the various features for effective control chart pattern recognition. International Journal of Advanced Manufacturing Technology 2007;34:385–98.
[12] Gou Y, Dooley KJ. Identification of change structure in statistical process control. International Journal of Production Research 1992;30:1655–69.
[13] Wani MA, Rashid S. Parallel algorithm for control chart pattern recognition. In: Proc. of the fourth IEEE international conference on machine learning and applications (ICMLA'05). 2005.
[14] Guh RS, Shiue YR. On-line identification of control chart patterns using self-organized approaches. International Journal of Production Research 2005;43:1225–54.
[15] Pacella M, Semeraro Q, Anglani A. Adaptive resonance theory-based neural algorithms for manufacturing process quality control. International Journal of Production Research 2004;43:4581–607.
[16] Al-Ghanim AM, Ludeman LC. Automated unnatural pattern recognition on control charts using correlation analysis techniques. Computers & Industrial Engineering 1997;32:679–90.
[17] Yang JH, Yang MS. A control chart pattern recognition scheme using a statistical correlation coefficient method. Computers & Industrial Engineering 2005;48:205–21.
[18] Morales AK, Erazo FR. A search space reduction methodology for data mining in large databases. Engineering Applications of Artificial Intelligence 2009;22:57–65.
[19] Sakai T, Imiya A. Unsupervised cluster discovery using statistics in scale space. Engineering Applications of Artificial Intelligence 2009;22:92–100.
[20] Atashpaz-Gargari E, Lucas C. Designing an optimal PID controller using colonial competitive algorithm. In: Proceedings of the first Iranian joint congress on fuzzy and intelligent systems. 2007.
[21] Atashpaz-Gargari E, Lucas C. Imperialist competitive algorithm: an algorithm for optimization inspired by imperialistic competition. In: Proceedings of the IEEE congress on evolutionary computation. 2007. p. 4661–7.
[22] Rajabioun R, Hashemzadeh F, Atashpaz-Gargari E. Colonial competitive algorithm, a novel approach for PID controller design in MIMO distillation column process. International Journal of Intelligent Computing and Cybernetics 2008;3:337–55.
[23] Niknam T, Taherian Fard E, Pourjafarian N, Rousta A. An efficient hybrid algorithm based on modified imperialist competitive algorithm and K-means for data clustering. Engineering Applications of Artificial Intelligence 2011;24:306–17.
[24] Krishna K, Murty M. Genetic K-means algorithm. IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics) 1999;29:433–9.
[25] Niknam T, Amiri B, Olamaie J, Arefi A. An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering. Journal of Zhejiang University Science 2009;10:512–9.
[26] Firouzi B, Sadeghi MS, Niknam T. A new hybrid algorithm based on PSO, SA, and K-means for cluster analysis. International Journal of Innovative Computing, Information and Control 2010;6:1–10.
[27] Fathian M, Amiri B. A honey-bee mating approach on clustering. The International Journal of Advanced Manufacturing Technology 2007;38(7–8):809–21.
[28] Haykin S. Neural networks: a comprehensive foundation. New York: MacMillan; 1999.
[29] Riedmiller M, Braun H. A direct adaptive method for faster back-propagation learning: the RPROP algorithm. In: Proc. ICNN. 1993. p. 586–91.
[30] Specht DF. Probabilistic neural networks. Neural Networks 1990;3:109–18.
[31] http://archive.ics.uci.edu/ml/databases/syntheticcontrol/syntheticcontrol.data.html.
[32] Dunn JC. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics 1973;3:32–57.
[33] Bezdek JC. Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press; 1981.
[34] Murthy CA, Chowdhury N. In search of optimal clusters using genetic algorithms. Pattern Recognition Letters 1996;17:825–32.
[35] Guh RS, Tannock JDT. A neural network approach to characterize pattern parameters in process control charts. Journal of Intelligent Manufacturing 1999;10:449–62.