Finite Mixture Estimation Algorithm for Arbitrary Function Approximation

The paper considers a new approach to approximating an arbitrary continuous function from a limited set of input data with the REBMIX algorithm, which was developed for finite mixture density estimation. Since REBMIX estimates the unknown parameters with a unique semiparametric method, it is assumed that it could also be used to estimate unknown parameters in fields that are not directly connected to density function estimation. The procedure required for approximating an arbitrary continuous function with the REBMIX algorithm is developed in the paper. The results obtained by the proposed procedure and by the radial basis function network for three different datasets are compared by calculating the RMSE values between the estimated and test output values. The adequacy of the proposed procedure is assessed by using both univariate and bivariate datasets. It can be concluded that with the developed procedure, the REBMIX algorithm can be applied successfully to continuous function approximation.


INTRODUCTION
Since the beginning of neural networks research [1], the field of neural networks has been established as an interdisciplinary subject with deep roots in neurosciences, psychology, mathematics, the physical sciences and engineering [2] to [7]. Radial Basis Function (RBF) networks emerged as a variant of artificial neural networks in the late 1980s. However, their roots reach further back to much older pattern recognition techniques, such as potential functions, clustering, functional approximation, spline interpolation and mixture models [8]. Until now the RBF networks have been successfully applied to a large diversity of applications including interpolation [9], classification [10], speech recognition [11], image restoration [12], 3-D object modelling [13], motion estimation and moving object segmentation [14], etc. Their excellent approximation capabilities have been studied by Park and Sandberg and by Poggio and Girosi [15] and [16]. Because of their excellent approximation properties and simple structure, RBF networks have been chosen in this research as the reference against which the results of the arbitrary continuous function approximation are compared.
REBMIX, which is the acronym for the Rough and Enhanced component parameter estimation that is followed by the Bayesian classification of the remaining observations for the finite MIXture estimation, is a numerical procedure that arises from an engineering viewpoint on the mixture estimation problem. The development of the REBMIX algorithm began in the late 1990s with the work of Nagode and Fajdiga [17]. Since then, it has evolved gradually over the years [18] to [21] and the latest improvements in modelling both univariate and multivariate finite mixtures can be found in [22] and [23] and in modelling load spectra growth in [24]. Until now it has also been noted in other research works concerning fatigue analysis [25] to [28], modelling the expected service usage [29] and [30], etc.
The paper presents an alternative perspective on the arbitrary continuous function approximation. Although the REBMIX algorithm was originally developed for finite mixture estimation problems, its unique semi-parametric method for the estimation of the unknown parameters indicates that it could also be used for parameter estimation in fields that are not directly connected with the probability density function. The unknown number of components and their parameters are estimated on the basis of the empirical densities calculated from the observed dataset. The calculated empirical densities thus represent the desired output values for a certain region of the input space, just like arbitrary measured data do. The resemblance between the empirical densities and the output values of an arbitrary measured signal implies that with the proper procedure, REBMIX can be used for the approximation of an arbitrary continuous function. The next logical step forward is thus to extend the REBMIX to the field of arbitrary continuous function approximation so that all of its properties, which have already proved themselves in the estimation of finite mixture densities, are preserved. The adequacy of the extended REBMIX is appraised against the results obtained by the RBF network.
The paper is structured as follows. In Section 1 the required definitions are cited. In Section 2 the results of the univariate and bivariate function estimations with the proposed procedure and the RBF network are presented and compared. Finally, in Section 3 the conclusions are listed and the adequacy of the proposed procedure is discussed.

Radial Basis Function Network
The radial basis function (RBF) network is based on the simple intuitive idea that an arbitrary function $y(\mathbf{x})$ can be approximated as the linear superposition of a set of localized basis functions $\phi_j(\mathbf{x})$ [3]. RBFs are embedded in a three-layer neural network shown in Fig. 1. The first layer, called the input layer, is made of source nodes (sensory units) that represent the components of the input vector. The second layer, the only hidden layer in the network, consists of hidden units, which implement radially activated functions and perform a nonlinear transformation from the input space to the hidden space. The third layer, called the output layer, is linear and contains units that represent a weighted sum of hidden unit outputs. Units in the output layer supply the response of the network to the activation pattern applied to the input layer and represent the components of the output vector [4].

Fig. 1. Three layer neural network
The origins of the RBF networks lie in techniques used for the exact interpolation between data points in high dimensional spaces. In applications of neural networks, exact interpolation is generally not of interest, since it can lead to particularly poor results when the trained network is presented with new data. Generally, a smooth approximation [31], which can be achieved by using fewer basis functions m than data points n and by minimizing a sum-of-squares error (SSE) function, can lead to much better results [2].
When m < n, the RBF neural network corresponds to a set of functions given by [2] and [3]:
$$y_k(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta}) = \sum_{j=0}^{m} w_{kj}\, \phi_j(\mathbf{x} \mid \boldsymbol{\theta}_j). \qquad (1)$$
Here $w_{kj}$ represents the weight of the j-th basis function output which contributes to the k-th network output $y_k$, and $\phi_j(\mathbf{x} \mid \boldsymbol{\theta}_j)$ represents the activation of hidden unit j when the network is presented with the d-dimensional input vector $\mathbf{x}$, see Fig. 1. A bias for the output units is included in Eq. (1) as an extra "basis function" $\phi_0$ whose activation is fixed to be $\phi_0 = 1$. For most applications the basis functions are chosen to be Gaussian:
$$\phi_j(\mathbf{x} \mid \boldsymbol{\theta}_j) = a_j \exp\left(-\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2}{2\sigma_j^2}\right), \qquad (2)$$
where $a_j$ controls the height of the peak, the vector $\boldsymbol{\mu}_j$ represents the center and the parameter $\sigma_j$ represents the width of the j-th basis function. Note that each basis function can have its own width parameter $\sigma_j$ [2] and [3].
To compare the estimated values with the target values, an error function has to be used. The most commonly used form of the error function for regression problems is the SSE function [2] and [32], given by:
$$E(\mathbf{w}) = \frac{1}{2}\sum_{q=1}^{n}\sum_{k=1}^{c}\left[y_k(\mathbf{x}_q \mid m, \mathbf{w}, \boldsymbol{\Theta}) - t_k^{q}\right]^2, \qquad (3)$$
where $\mathbf{x}_q$ denotes the q-th d-dimensional input training vector, $\mathbf{w}$ is the output layer weight vector for the current basis function parameters $\boldsymbol{\Theta}$ and $t_k^{q}$ is the k-th target value in the q-th c-dimensional target vector $\mathbf{t}_q$. Bishop [2] suggests assessing the performance of a trained network using an error function different from the one used to train it. If the SSE function is used in the network training phase, the root-mean-square error (RMSE) function should be used in network testing. The RMSE function is given by:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{q=1}^{n^*}\sum_{k=1}^{c}\left[y_k(\mathbf{x}_q^* \mid m, \mathbf{w}, \boldsymbol{\Theta}) - t_k^{q*}\right]^2}{\sum_{q=1}^{n^*}\sum_{k=1}^{c}\left[t_k^{q*} - \bar{t}_k^{*}\right]^2}}, \qquad (4)$$
where $\mathbf{x}_q^*$ denotes the q-th input test vector and $n^*$ is the number of input test vectors, $\mathbf{w}$ and $\boldsymbol{\Theta}$ denote the weight vector and the basis function parameters of the trained network, respectively, and $t_k^{q*}$ is the k-th target test value in the q-th c-dimensional target test vector $\mathbf{t}_q^*$. In Eq. (4), $\bar{t}_k^{*}$ stands for the average of the target test values:
$$\bar{t}_k^{*} = \frac{1}{n^*}\sum_{q=1}^{n^*} t_k^{q*}. \qquad (5)$$
Training of the RBF networks takes place in two successive stages. First, the centers and the basis function widths are determined. In the second stage the linear output layer weights are determined. For the determination of the basis function parameters there exists a variety of procedures [2] to [4] and [33]. Since the scope of the paper is not to search for the optimal learning procedure of the RBF networks but to assess the suitability of the REBMIX algorithm for arbitrary function approximation, only simple and fast procedures for the determination of the basis function centers and widths are selected in the paper, which in spite of their simplicity assure adequate network training.
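The network output with Gaussian basis functions and the two error measures described above can be illustrated with a minimal NumPy sketch. The function names are hypothetical and not part of the paper; the sketch assumes the common SSE form with a factor 1/2 and an RMSE normalized by the deviation of the targets from their mean, with inputs stored as arrays of shape (n, d).

```python
import numpy as np

def gaussian_basis(x, centers, widths, heights=None):
    """Design matrix of Gaussian activations with a leading bias column phi_0 = 1."""
    x = np.asarray(x, float).reshape(len(x), -1)                    # (n, d)
    centers = np.asarray(centers, float).reshape(len(centers), -1)  # (m, d)
    m = centers.shape[0]
    widths = np.broadcast_to(np.asarray(widths, float), (m,))       # one width per basis function
    heights = np.ones(m) if heights is None else np.asarray(heights, float)
    # Squared Euclidean distance between every input and every center: (n, m)
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = heights * np.exp(-sq_dist / (2.0 * widths ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])               # (n, m + 1)

def rbf_output(x, centers, widths, weights):
    """Network outputs y_k for weights of shape (c, m + 1); returns shape (n, c)."""
    return gaussian_basis(x, centers, widths) @ np.asarray(weights, float).T

def sse(y, t):
    """Sum-of-squares error used for training (assumed factor 1/2)."""
    return 0.5 * np.sum((np.asarray(y) - np.asarray(t)) ** 2)

def normalized_rmse(y, t):
    """RMSE relative to the spread of the targets around their mean."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    return np.sqrt(np.sum((y - t) ** 2) / np.sum((t - t.mean(axis=0)) ** 2))
```

With this normalization, a model that always predicts the mean of the test targets scores 1 and a perfect model scores 0, which makes limits such as 0.5 or 0.1 comparable across datasets.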
The first and simplest approach to determine the basis function centers, denoted by C1, is to set them at the highest output values in the training dataset [3]. This approach usually requires a large number of basis functions to achieve satisfactory results. The second approach for center determination, denoted by C2, is to select them randomly from the training dataset [3]. With this very commonly used learning technique the estimated function is much smoother and usually approximates the training data better with fewer basis functions. The disadvantage of this approach is that the network training is not reproducible.
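The two center selection approaches can be sketched as follows. This is an illustrative NumPy version with hypothetical function names, assuming a single output value per observation; passing a fixed seed to the random variant is one simple way to make a C2 run repeatable.

```python
import numpy as np

def centers_highest_outputs(x, t, m):
    """C1: place the m centers at the training inputs with the highest output values."""
    x = np.asarray(x, float).reshape(len(x), -1)
    order = np.argsort(np.asarray(t, float))[::-1]   # indices sorted by descending output
    return x[order[:m]]

def centers_random(x, m, seed=None):
    """C2: draw m distinct training inputs at random to serve as centers."""
    x = np.asarray(x, float).reshape(len(x), -1)
    rng = np.random.default_rng(seed)                # fixed seed gives a repeatable selection
    idx = rng.choice(len(x), size=m, replace=False)
    return x[idx]
```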
The widths of the Gaussian basis functions are also determined by using two simple approaches. In the first approach, denoted by S1, the basis function widths are set equal to the average Euclidean distance between adjacent basis function centers, which ensures that the basis functions overlap to some degree and hence give a relatively smooth approximation [2] and [3]. In the second approach, denoted by S2, the widths are no longer equal for all basis functions, but are determined on the basis of the average Euclidean distance to the p nearest centers [2] and [3].
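A minimal sketch of both width heuristics is given below. The function names are hypothetical, and "adjacent" centers are interpreted here as nearest neighbours, which is an assumption rather than a detail stated in the paper. S1 needs at least two centers and S2 needs more than p centers.

```python
import numpy as np

def _center_distances(centers):
    """Pairwise Euclidean distances between centers, with the diagonal set to infinity."""
    c = np.asarray(centers, float).reshape(len(centers), -1)
    dist = np.sqrt(((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)       # a center is not its own neighbour
    return dist

def widths_s1(centers):
    """S1: one common width, the mean distance from each center to its nearest neighbour."""
    dist = _center_distances(centers)
    return np.full(len(dist), dist.min(axis=1).mean())

def widths_s2(centers, p=2):
    """S2: individual widths, the mean distance from each center to its p nearest centers."""
    dist = _center_distances(centers)
    nearest = np.sort(dist, axis=1)[:, :p]
    return nearest.mean(axis=1)
```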
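The output layer weights are calculated so as to minimize the SSE function with respect to these weights. By inserting Eq. (1) into Eq. (3) and differentiating the SSE function with respect to the weights, the condition for the minimum can be written in matrix notation in the following form [2] and [3]:
$$\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}\,\mathbf{W}^{\mathrm{T}} = \boldsymbol{\Phi}^{\mathrm{T}}\mathbf{T}, \qquad (6)$$
where $(\mathbf{T})_{qk} = t_k^{q}$, $(\mathbf{W})_{kj} = w_{kj}$ and $(\boldsymbol{\Phi})_{qj} = \phi_j(\mathbf{x}_q \mid \boldsymbol{\theta}_j)$. The formal solution for the weights is given by:
$$\mathbf{W}^{\mathrm{T}} = \boldsymbol{\Phi}^{\dagger}\mathbf{T}, \qquad (7)$$
where the notation $\boldsymbol{\Phi}^{\dagger}$ denotes the pseudo-inverse of $\boldsymbol{\Phi}$ given by:
$$\boldsymbol{\Phi}^{\dagger} = \left(\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}\right)^{-1}\boldsymbol{\Phi}^{\mathrm{T}}. \qquad (8)$$

REBMIX Algorithm for the Finite Mixture Estimation
Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be an observed d-dimensional dataset of size n of continuous vector observations $\mathbf{x}_q$. Each observation is assumed to follow the predictive mixture density [34]:
$$f(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta}) = \sum_{j=1}^{m} w_j\, f(\mathbf{x} \mid \boldsymbol{\theta}_j) \qquad (9)$$
with conditionally independent component densities:
$$f(\mathbf{x} \mid \boldsymbol{\theta}_j) = \prod_{i=1}^{d} f(x_i \mid \boldsymbol{\theta}_{ij}) \qquad (10)$$
indexed by the vector parameter $\boldsymbol{\theta}_j$. The objective of the analysis is the inference about the unknowns: the number m of components, the component weights $w_j$ summing to 1 and the component parameters $\boldsymbol{\theta}_j$.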
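To make the mixture density concrete, the following sketch evaluates a finite mixture whose components are products of independent univariate normal densities. It only evaluates a mixture with given parameters; it is not the REBMIX estimation procedure, and the function names and argument layout are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Univariate normal density, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def mixture_density(x, weights, means, stds):
    """Finite mixture of m components with conditionally independent normal marginals.

    x: (n, d) inputs, weights: (m,), means and stds: (m, d).
    """
    weights = np.asarray(weights, float)
    if np.any(weights <= 0.0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("component weights must be positive and sum to 1")
    x = np.asarray(x, float).reshape(len(x), -1)
    means = np.asarray(means, float).reshape(len(weights), -1)
    stds = np.asarray(stds, float).reshape(len(weights), -1)
    # Product over the d dimensions gives each component density: (n, m)
    comp = normal_pdf(x[:, None, :], means[None, :, :], stds[None, :, :]).prod(axis=2)
    return comp @ weights                 # weighted sum over components: (n,)
```

Because the weights are positive and sum to 1, the resulting function is non-negative and integrates to unity, which is the property the postprocessing step of the proposed procedure has to undo.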
Since the description of the REBMIX algorithm estimation procedure and the proof of its convergence are extensive and published in [17] to [24], further details will not be presented here. For interested readers the REBMIX software is available at http://CRAN.R-project.org/package=rebmix.

Arbitrary Function Approximation with the REBMIX Algorithm
There are two major differences between estimating an arbitrary function from a set of data points with the REBMIX algorithm and with the RBF network.
The first difference is related to the component weights. In the REBMIX algorithm the weights are limited by two conditions, $w_j > 0$ for $j = 1, \ldots, m$ and $\sum_{j=1}^{m} w_j = 1$, while in the RBF network approach there are no limitations concerning the weights for regression problems. In fact, output layer weights can also be negative. This property can be very useful for better estimation of the function valleys and for observations with negative output values. Observations with negative output values can be processed with the REBMIX algorithm only if they are first shifted so that all the observed values are non-negative.
The second major difference is related to the function estimation. When using the RBF network, the arbitrary function can be estimated directly from the set of observed data and therefore it usually does not integrate to unity. With the REBMIX algorithm, the arbitrary function can be approximated only indirectly from the estimate of the finite mixture density, which integrates to unity. Therefore it is necessary to properly transform the measured training dataset and to postprocess the estimated finite mixture density $f(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta})$ in such a way that it can be compared to the observed output values.
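The procedure for the preparation of the observations and the postprocessing of the estimated function $f(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta})$ is depicted in Fig. 2 and relies on the following steps:
1. All the data are either raised, if $t_k^{\min} < 0$, or lowered by the minimal output value $t_k^{\min}$:
$$t_k^{q\prime} = t_k^{q} - t_k^{\min}, \qquad (11)$$
as it turned out that in such cases the REBMIX algorithm estimates the finite mixture density function much better. To improve the accuracy of the estimated function, $t_k^{q\prime}$ may be multiplied by a factor of 10, 100, etc. and rounded to the nearest integer.
2. The volume under the shifted data is calculated by:
$$V = \sum_{q=1}^{n} t_k^{q\prime} \prod_{i=1}^{d} h_i^{q}, \qquad (12)$$
where $h_i^{q}$ is the length of the hypersquare side for the q-th data point in the i-th dimension.
3. The measured training dataset is transformed in such a way that the REBMIX preprocessing methods can be used. For this purpose each d-dimensional input data vector $\mathbf{x}_q$ is copied $t_k^{q\prime}$ times so that the total number of vector observations used as input data for the finite mixture estimation equals $\sum_{q=1}^{n} t_k^{q\prime}$.
4. Finite mixture estimation with the REBMIX algorithm is performed.
5. The postprocessing of the estimated finite mixture density function is carried out in such a way that a continuous function representing the input-output mapping of the shifted dataset is obtained. The estimated finite mixture density function $f(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta})$ is multiplied by the volume (12) and the transformation function is obtained:
$$y_k'(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta}) = V\, f(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta}). \qquad (13)$$
In the case of a univariate dataset, the volume reduces to the area under the observed data.
6. To compare the estimated values to the actual measured output values, it is necessary to shift the estimated function $y_k'(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta})$ back by $t_k^{\min}$:
$$y_k(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta}) = y_k'(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta}) + t_k^{\min}. \qquad (14)$$
The estimated function $y_k(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta})$ can then be compared to the true one and to the function estimated by the RBF network.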
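The data preparation and postprocessing around the mixture estimation can be sketched in a few lines of NumPy. All names are hypothetical. The REBMIX estimation itself is not reproduced: any callable that returns a density which integrates to unity can be passed to the postprocessing step, and a plain histogram density is used below purely as a stand-in.

```python
import numpy as np

def prepare_observations(x, t, h, scale=1):
    """Shift the outputs, compute the volume and replicate the inputs.

    x: (n, d) inputs, t: (n,) outputs, h: hypersquare side lengths (scalar or (n, d)).
    scale: optional factor (10, 100, ...) applied before rounding the copy counts.
    """
    x = np.asarray(x, float).reshape(len(x), -1)
    t = np.asarray(t, float)
    h = np.broadcast_to(np.asarray(h, float), x.shape)
    t_min = t.min()
    t_shift = t - t_min                                # shifted outputs, all >= 0
    volume = np.sum(t_shift * np.prod(h, axis=1))      # volume under the shifted data
    copies = np.rint(scale * t_shift).astype(int)      # how often each input is copied
    replicated = np.repeat(x, copies, axis=0)          # observations for the density estimate
    return replicated, volume, t_min

def postprocess(density, volume, t_min):
    """Turn an estimated density into an estimate of the original function."""
    def y(x_new):
        return volume * density(x_new) + t_min         # rescale, then shift back
    return y

def histogram_density(samples, edges):
    """Stand-in univariate density estimate (assumption: not the REBMIX estimator)."""
    counts, _ = np.histogram(samples[:, 0], bins=edges)
    dens = counts / (counts.sum() * np.diff(edges))
    def f(x_new):
        idx = np.searchsorted(edges, np.asarray(x_new, float), side="right") - 1
        return dens[np.clip(idx, 0, len(dens) - 1)]
    return f

# Usage on a small univariate signal sampled with unit spacing
x = np.arange(5.0)
t = np.array([-2.0, 1.0, 4.0, 1.0, -2.0])
obs, V, t_min = prepare_observations(x, t, h=1.0)
y = postprocess(histogram_density(obs, np.arange(-0.5, 5.0, 1.0)), V, t_min)
```

In this toy case the bins coincide with the sampling intervals, so the rescaled and shifted histogram density reproduces the original outputs at the sampled inputs; a smooth mixture density would instead give a continuous approximation between them.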
The correctness of the proposed procedure for the function approximation is demonstrated by the following examples.

Vertical Wheel Forces Dataset
The univariate dataset used in the research derives from measurements of vertical wheel forces that occur when driving the vehicle on a test track. The entire signal, measured with a sampling rate of 250 Hz, is shown in Fig. 3. From all the measured data only a section, indicated with a square and containing 1070 successive data, is selected for further treatment in order to speed up the estimation process. Approximately 30% of these data are randomly selected to form the test dataset, which is used only for the evaluation of the estimated functions and is not present in the training phase when the number of components, their parameters and weights are estimated. The test dataset thus consists of n* = 320 data and the remaining n = 750 data form the training dataset.
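A random split of this kind can be sketched as follows. The function name and the seed are hypothetical; the paper does not state how the random selection was implemented.

```python
import numpy as np

def split_train_test(x, t, n_test, seed=0):
    """Randomly hold out n_test observations; the rest form the training dataset."""
    x, t = np.asarray(x), np.asarray(t)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    test_idx = np.sort(perm[:n_test])     # keep the original ordering within each subset
    train_idx = np.sort(perm[n_test:])
    return (x[train_idx], t[train_idx]), (x[test_idx], t[test_idx])

# Usage with the sizes reported for the wheel forces section (placeholder signal values)
x = np.arange(1070) / 250.0               # time stamps for a 250 Hz signal
t = np.zeros(1070)                        # placeholder for the measured forces
(train_x, train_t), (test_x, test_t) = split_train_test(x, t, n_test=320)
```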
When the RBF network is applied for the estimation of the function parameters and weights, no special preparation of the training dataset is necessary. Nevertheless, all the data used in the research are lowered by $t_k^{\min}$ to reduce the estimation error, especially at the edges of the observed function. For all combinations of the selected learning procedures C1-S1, C2-S1, C1-S2 and C2-S2 and for each $m \in \{1, \ldots, n\}$, the basis function parameters and weights are determined. Each trained RBF network $y_k(\mathbf{x} \mid m, \mathbf{w}, \boldsymbol{\Theta})$ is subjected to the test dataset and the RMSE is calculated. The network training is stopped if $\mathrm{RMSE} \le \mathrm{RMSE}_{\lim}$, where $\mathrm{RMSE}_{\lim} \in \{0.5, 0.3, 0.2, 0.1\}$, or if m = n. The results are shown in Table 1. In most cases the best learning combination turns out to be C2-S2. It results in the smallest number of basis functions and the lowest RMSE, while C1-S1 turns out to be the worst learning combination.
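The incremental training with its stopping rule can be illustrated by the sketch below. It is a simplified stand-in, not the authors' implementation: centers are taken as a growing prefix of one random permutation (a C2-like choice), a single common width is set from nearest-neighbour distances (an S1-like choice, with an arbitrary fallback of 1.0 for a single center), the weights are obtained with the pseudo-inverse and the RMSE is normalized by the spread of the test targets. All names are hypothetical.

```python
import numpy as np

def design_matrix(x, centers, width):
    """Gaussian activations with a leading bias column."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(x), 1)), np.exp(-sq_dist / (2.0 * width ** 2))])

def normalized_rmse(y, t):
    return np.sqrt(np.sum((y - t) ** 2) / np.sum((t - t.mean()) ** 2))

def train_until(x, t, x_test, t_test, rmse_lim, seed=0):
    """Add basis functions one at a time until RMSE <= rmse_lim or m = n."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    for m in range(1, len(x) + 1):
        centers = x[order[:m]]
        if m > 1:
            dist = np.sqrt(((centers[:, None] - centers[None]) ** 2).sum(axis=2))
            np.fill_diagonal(dist, np.inf)
            width = dist.min(axis=1).mean()        # mean nearest-neighbour distance
        else:
            width = 1.0                            # assumption: fallback for one center
        # Least-squares output weights via the pseudo-inverse of the design matrix
        w = np.linalg.pinv(design_matrix(x, centers, width)) @ t
        rmse = normalized_rmse(design_matrix(x_test, centers, width) @ w, t_test)
        if rmse <= rmse_lim:
            break
    return m, rmse

# Usage on a smooth toy signal, with alternating points used for training and testing
grid = np.linspace(0.0, 2.0 * np.pi, 90).reshape(-1, 1)
signal = np.sin(grid[:, 0])
m, rmse = train_until(grid[::2], signal[::2], grid[1::2], signal[1::2], rmse_lim=0.3)
```

Running the loop once per limiting RMSE value and per learning combination would yield the kind of entries a results table of required basis functions is built from.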
On the other hand, when the REBMIX is applied, all training data are lowered by $t_k^{\min}$ and the input training data points $\mathbf{x}_q$ are copied $t_k^{q\prime}$ times. Although the REBMIX allows the selection of different preprocessing methods, only the histogram and the Parzen window are suitable. The former is chosen in the article. For the finite mixture density, the normal parametric family is chosen. To determine the optimal number of components m, parameters $\boldsymbol{\Theta}$ and weights $\mathbf{w}$, finite mixture estimation is carried out for the number of bins $s \in \{10, \ldots, 750\}$. Thus all possible arrangements of observations are captured and the optimal number of bins is obtained according to both the information criterion and the positive relative deviation D. Estimations are carried out for all combinations of the available information criteria and values of D. Each estimated finite mixture density is postprocessed according to Eqs. (13) and (14), i.e. multiplied by the volume and then raised by $t_k^{\min}$, and the corresponding RMSE values are calculated. The calculated RMSE values are then used for the performance comparison of the proposed procedure with the RBF network.
The results are shown in Table 2. The optimal number of components increases with the decrease of D and stops increasing if D < 0.0005. If D < 0.001, the optimal number of components increases rapidly while there is only a small decrease of RMSE. The mixture of 13 components is thus supposed to be the optimal one. It can be noted that the RBF network requires 16 basis functions for a similar RMSE as the REBMIX (see the last row in Table 1). The corresponding functions are shown in Fig. 4. Both the RBF network and the REBMIX represent the middle section well, while larger deviations appear at the edges.

Two-Dimensional Gaussian Dataset
The bivariate dataset, derived from a mixture of four Gaussian functions used by Bors and Pitas [33], is studied next. From a mixture of four two-dimensional (2D) Gaussian functions with given vector parameters, a dataset is generated, among which n* = 132 randomly selected data form the test dataset and the remaining n = 309 data form the training dataset. In addition to the noise free bivariate dataset, random Gaussian noise with μ = 0 and σ = 0.6 is added to $y_q$ to simulate a noisy dataset, as is usually observed in measurements.
When the RBF network is applied, no preparation of the training dataset is carried out, as $t_k^{\min} = 0$. The results are shown in Tables 3 and 4 for the noise free and noisy dataset, respectively. The smallest number of basis functions and the lowest RMSE are obtained when the learning combinations C2-S1 and C2-S2 are applied. The worst learning combination turned out to be C1-S1.
The results obtained with the REBMIX are shown in Tables 5 and 6. For the noise free dataset the optimal number of components is the same for all D values, whereas for the noisy dataset the optimal number of components increases by one when D ≤ 0.01. With the increase of the number of components the RMSE value also increases. This means that the estimated function with a larger number of components overfits the data and consequently results in a worse estimate. The mixture of 5 components is thus supposed to be the optimal one. Unlike for the univariate dataset, for the presented bivariate datasets the optimal s > n in both dimensions. This indicates that some of the histogram bins stay empty after the observations are arranged. The RBF network requires 13 basis functions in the case of the noise free dataset and 17 basis functions in the case of the noisy dataset for a similar RMSE as the REBMIX (see the last row in Tables 3 and 4). The corresponding functions are shown in Figs. 5 and 6. In the case of the noise free dataset the function estimated by the REBMIX overestimates all four components at their peak values and slightly underestimates the simulated function in the valleys. By analogy with the univariate function estimation, it is expected that the REBMIX would estimate the underlying function even better if it were composed of a greater number of intermediate components.
On the other hand, the function estimated by the RBF network underestimates the first component considerably and the second and fourth components slightly, but estimates the valley between the second and the third component well. Similar results are also obtained in the case of the noisy dataset, where the function estimated by the REBMIX again overestimates the peak values of all four components and slightly underestimates the valleys between them (see Fig. 6). The function estimated by the RBF network represents the first three components very well and underestimates the fourth one.

CONCLUSION AND FUTURE WORK
In the article continuous functions are estimated with the REBMIX algorithm for the first time. Both univariate and bivariate datasets are used to evaluate its adequacy. The estimated functions are compared to the functions estimated by the elementary RBF network.
For the applied univariate and bivariate datasets it can be concluded that the functions estimated by the REBMIX using the proposed procedure approximate the actual functions well. Hence it can be assumed that the REBMIX can be applied for the estimation of univariate and bivariate continuous functions if the training dataset is transformed properly and the estimated finite mixture densities are postprocessed properly. Although the procedure requires the transformation of the training data and the postprocessing of the estimated function, the estimation times are still very short since all the properties of the REBMIX are preserved.
The future development of the REBMIX will be focused on its connection to the RBF networks. Possibly, the REBMIX can be used to determine the centers $\boldsymbol{\mu}_j$ and widths $\sigma_j$ of the basis functions in the first stage of the RBF network learning process. The determination of the final layer weights should remain unchanged. In this way the postprocessing of the estimated finite mixture density can be omitted, since the estimated function would already approximate the actual observed function. The entire estimation process could also be simplified if the transformation of the training data were incorporated into the REBMIX preprocessing.
To assess the benefits of the connection between the REBMIX and the RBF neural network, further investigations are to be carried out. Future work will thus be focused on additional testing, also using other parametric families and the Parzen window preprocessing. The tests will also have to be carried out for function estimations from multivariate datasets and from larger numbers of data. It is expected that by connecting these procedures the REBMIX can also be used to solve other problems covered by neural networks, such as classification problems, inverse problems, etc.


Fig. 4. Comparison between the measured univariate signal and both estimated functions with similar RMSE values

Fig. 5. Comparison between the simulated noise free bivariate function and both estimated functions with similar RMSE values

Table 1. The results of function estimation for the vertical wheel forces dataset with the RBF network; the - indicates that network training is stopped before the limiting RMSE value is reached

Table 2. The results of function estimation for the vertical wheel forces dataset with the REBMIX

Table 5. The results of function estimation for the noise free bivariate dataset with the REBMIX

Table 3. The results of function estimation for the noise free bivariate dataset with the RBF network

Table 4. The results of function estimation for the noisy bivariate dataset with the RBF network

Fig. 6. Comparison between the simulated noisy bivariate function and both estimated functions with similar RMSE values

Table 6. The results of function estimation for the noisy bivariate dataset with the REBMIX