
 Topics in Particle and Dispersion Science


PSD approximation and analysis with empirical orthogonal functions (EOF): The method

Let a vector $n_i(D) = [n_i(D_1), \ldots, n_i(D_M)] \equiv n_i(D_j)$, $j = 1, \ldots, M$, be the i-th data vector of dimension M from a set of N data vectors, where j numbers the intervals (bins) of the variable D. Each such vector with M components may represent, for example, a differential particle size distribution of an aerosol measured at M particle diameters, $D_j$, $j = 1, \ldots, M$. The mean data vector, $\langle n(D) \rangle$, is defined as follows:

$$\langle n(D_j) \rangle = N^{-1} \sum_{i=1}^{N} n_i(D_j), \qquad j = 1, \ldots, M \tag{1}$$

Hence, each data vector, $n_i(D)$, can be represented by a deviation vector, $\delta n_i(D)$, from the mean vector, as follows:

$$\delta n_i(D_j) = n_i(D_j) - \langle n(D_j) \rangle, \qquad i = 1, \ldots, N;\ j = 1, \ldots, M \tag{2}$$
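Eqs. 1 and 2 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the author's implementation: the data are synthetic, and the names `n`, `n_mean`, and `dn` are assumptions chosen here, with one data vector stored per row.

```python
import numpy as np

# Synthetic stand-in for measured data (assumption): N = 50 size
# distributions sampled at M = 8 diameters, one distribution per row.
rng = np.random.default_rng(0)
N, M = 50, 8
n = rng.lognormal(mean=0.0, sigma=0.5, size=(N, M))

# Eq. 1: mean data vector, averaged over the N data vectors.
n_mean = n.sum(axis=0) / N          # shape (M,)

# Eq. 2: deviation of every data vector from the mean vector.
# Broadcasting subtracts n_mean from each row of n.
dn = n - n_mean                     # shape (N, M)
```

By construction, the deviations average to zero in every size bin, which is a convenient sanity check on real data.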

Each deviation vector can be expanded into a series of orthogonal vectors, $h_k(D)$:

$$\delta n_i(D_j) = \sum_{k=1}^{M} \beta_{ik}\, h_k(D_j), \qquad i = 1, \ldots, N;\ j = 1, \ldots, M \tag{3}$$

which fulfill the following conditions of orthogonality and normalization:

$$\sum_{j=1}^{M} h_k(D_j)\, h_l(D_j) = M\, \delta_{kl}, \qquad \delta_{kl} = \begin{cases} 0 & \text{for } l \neq k \\ 1 & \text{for } l = k \end{cases} \tag{4}$$

The vectors $h_k(D)$ are variously called modes, principal components, characteristic vectors, or empirical orthogonal functions. The coefficients $\beta_{ik}$ are called amplitude functions or simply amplitudes (for example, Preisendorfer RW 1988, Lorenz EN 1956). These coefficients are calculated by projecting the deviation vectors onto the modes. Multiplying Eq. 3 by $h_l(D_j)$, summing over j, and applying the normalization of Eq. 4 gives:

$$\beta_{ik} = M^{-1} \sum_{j=1}^{M} \delta n_i(D_j)\, h_k(D_j), \qquad i = 1, \ldots, N;\ k = 1, \ldots, M \tag{5}$$
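The projection step can be checked numerically. Under Eq. 4 as written (the sum equals M, not 1), projecting Eq. 3 onto a mode yields M times the amplitude, so the sketch below divides the projection by M. The helper `amplitudes` and the randomly generated test basis are assumptions made for illustration only; they are not modes derived from data.

```python
import numpy as np

def amplitudes(dn, h):
    """Amplitudes beta[i, k] of deviations dn[i, j] on modes h[k, j].

    Assumes the rows of h satisfy Eq. 4, i.e. h @ h.T == M * identity.
    """
    M = h.shape[1]
    return dn @ h.T / M

# Hypothetical test basis: an orthogonal matrix rescaled so that every
# mode has squared norm M, as Eq. 4 requires.
rng = np.random.default_rng(1)
N, M = 20, 6
q, _ = np.linalg.qr(rng.normal(size=(M, M)))   # q is an orthogonal matrix
h = np.sqrt(M) * q.T                           # rows are the modes h_k

# Synthetic deviation vectors (assumption), centered as in Eq. 2.
dn = rng.normal(size=(N, M))
dn -= dn.mean(axis=0)

beta = amplitudes(dn, h)
dn_rebuilt = beta @ h                          # Eq. 3 with all M modes
```

With all M modes retained, `dn_rebuilt` reproduces `dn` to rounding error, which confirms that the expansion and the projection are mutually consistent.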

In the EOF method, the vectors $h_k$ are the eigenvectors of the covariance matrix, $\mathrm{Cov}(D_i, D_j)$, where $i, j = 1, \ldots, M$:

$$\mathrm{Cov}(D_i, D_j) = N^{-1} \sum_{k=1}^{N} \delta n_k(D_i)\, \delta n_k(D_j), \qquad i, j = 1, \ldots, M \tag{6}$$

calculated, as stated above, for an input set of empirical data vectors, $n_k(D)$, $k = 1, \ldots, N$. Hence, the vectors $h_k$ are solutions of the following equations:

$$\sum_{i=1}^{M} \mathrm{Cov}(D_i, D_j)\, h_k(D_i) = \lambda_k\, h_k(D_j), \qquad j, k = 1, \ldots, M \tag{7}$$

where $\lambda_k$ are the eigenvalues of the covariance matrix. The eigenvalues $\lambda_k$ and eigenvectors $h_k$ can be calculated, for example, by using Jacobi's method (Ralston A 1975). The eigenvalues and amplitudes are related in the following ways:

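$$\lambda_i\, \delta_{ij}\, M^{-1} = N^{-1} \sum_{k=1}^{N} \beta_{ki}\, \beta_{kj}, \qquad i, j = 1, \ldots, M \tag{8}$$
$$\lambda_j\, M^{-1} = N^{-1} \sum_{k=1}^{N} \beta_{kj}^{2}, \qquad j = 1, \ldots, M \tag{9}$$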
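The chain from the covariance matrix to the eigenvalue and amplitude relations can be sketched as follows. Note the substitutions: instead of Jacobi's method, the sketch uses NumPy's symmetric eigensolver `numpy.linalg.eigh`, and because that routine returns unit-norm eigenvectors, they are rescaled by the square root of M to meet Eq. 4. The data are synthetic and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 200, 5
# Synthetic correlated data (assumption), one data vector per row.
n = rng.normal(size=(N, M)) @ rng.normal(size=(M, M))
dn = n - n.mean(axis=0)

cov = dn.T @ dn / N                       # Eq. 6, shape (M, M)

# eigh returns eigenvalues in ascending order and unit-norm
# eigenvectors as columns; reorder so the largest eigenvalue is first.
lam, vecs = np.linalg.eigh(cov)
order = np.argsort(lam)[::-1]
lam = lam[order]
h = np.sqrt(M) * vecs[:, order].T         # rows are modes scaled per Eq. 4

# Amplitudes: projection onto the modes divided by M, which is what
# the normalization of Eq. 4 implies.
beta = dn @ h.T / M

eq7_residual = h @ cov - lam[:, None] * h # Eq. 7, should vanish
eq8_lhs = np.diag(lam) / M                # left-hand side of Eq. 8
eq8_rhs = beta.T @ beta / N               # right-hand side; diagonal is Eq. 9
```

The off-diagonal zeros in `eq8_rhs` express that the amplitudes of different modes are uncorrelated over the input set.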

It follows from the foregoing discussion that any empirical data vector from the input set, i.e. the set used to calculate the covariance matrix, can be approximated by the following equation:

$$n_i(D_j) \approx \langle n(D_j) \rangle + \sum_{k=1}^{L} \beta_{ik}\, h_k(D_j), \qquad i = 1, \ldots, N;\ j = 1, \ldots, M \tag{10}$$

where $L \le M$ is the number of modes retained, chosen as a compromise between minimizing L and maximizing the criterion R(L):

$$R(L) = V^{-1} \sum_{i=1}^{L} \lambda_i \tag{11}$$

defining the relative cumulative contribution of the first L eigenvalues to the total variance, V (the first line of Eq. 12), which simply equals the sum of all eigenvalues (the second line of Eq. 12; for example, Abdi H 2007):

$$\begin{aligned} V &= N^{-1} \sum_{k=1}^{N} \sum_{i=1}^{M} \delta n_k^{2}(D_i) \\ &= \sum_{i=1}^{M} \lambda_i \end{aligned} \tag{12}$$
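Both lines of Eq. 12 and the criterion of Eq. 11 are easy to verify numerically. The sketch below uses synthetic data and hypothetical names (`V_data`, `V_eig`, `R`); the eigenvalues come from `numpy.linalg.eigvalsh`, which is an assumption of this example and not a method named in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 100, 6
# Synthetic positive data (assumption), one data vector per row.
n = rng.gamma(shape=2.0, scale=1.0, size=(N, M))
dn = n - n.mean(axis=0)

# Eigenvalues of the covariance matrix, reordered to be descending.
lam = np.linalg.eigvalsh(dn.T @ dn / N)[::-1]

V_data = (dn ** 2).sum() / N      # first line of Eq. 12
V_eig = lam.sum()                 # second line of Eq. 12

def R(L):
    """Eq. 11: fraction of the total variance carried by the first L modes."""
    return lam[:L].sum() / V_eig
```

The two values of V agree because the total variance is the trace of the covariance matrix, and the trace of a matrix equals the sum of its eigenvalues.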

The eigenvalues $\lambda_i$ decrease quickly with increasing index i. This permits setting L to a low value, generally on the order of 1 to 3, while still achieving acceptable values of R(L), greater than about 0.90 to 0.95 (for example, Jankowski A 1994, Nielsen PB 1979).
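Choosing L and forming the truncated expansion of Eq. 10 might look as follows. This is a sketch under stated assumptions: the data are synthetic mixtures of two lognormal-like shapes plus weak noise, the diameter grid `D` is hypothetical, the 0.95 threshold is only an example taken from the range quoted above, and `choose_L` is a helper invented here.

```python
import numpy as np

def choose_L(lam, threshold=0.95):
    """Smallest L whose cumulative variance fraction R(L) reaches threshold."""
    R = np.cumsum(lam) / np.sum(lam)
    # searchsorted finds the first index with R >= threshold; clamp to M.
    return min(int(np.searchsorted(R, threshold)) + 1, len(lam))

rng = np.random.default_rng(4)
N, M = 300, 10
D = np.logspace(-1, 1, M)                       # hypothetical diameter grid
shape1 = np.exp(-0.5 * np.log(D / 0.5) ** 2)
shape2 = np.exp(-0.5 * np.log(D / 3.0) ** 2)
n = (rng.uniform(1, 2, (N, 1)) * shape1
     + rng.uniform(0, 1, (N, 1)) * shape2
     + 0.01 * rng.normal(size=(N, M)))          # synthetic data (assumption)

n_mean = n.mean(axis=0)
dn = n - n_mean
lam, vecs = np.linalg.eigh(dn.T @ dn / N)
lam, vecs = lam[::-1], vecs[:, ::-1]            # descending eigenvalues
h = np.sqrt(M) * vecs.T                         # rows are modes (Eq. 4 scaling)

L = choose_L(lam, 0.95)
beta = dn @ h[:L].T / M                         # amplitudes of the L modes
n_approx = n_mean + beta @ h[:L]                # Eq. 10

# Fraction of the total variance left unexplained by the truncation.
residual_fraction = ((n - n_approx) ** 2).sum() / (dn ** 2).sum()
```

For vectors of the input set, `residual_fraction` equals 1 - R(L), so the criterion of Eq. 11 directly measures the quality of the truncated expansion.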

The significance of the modes can be readily understood in terms of the fractions of the total variance they account for. Specifically, the mode $h_1$ accounts for the greatest fraction of the variance of the input data set. The mode $h_2$ accounts for the greatest fraction of the variance of data set 2, obtained by subtracting the contribution of the first mode from the initial data set. The mode $h_3$ accounts for the greatest fraction of the variance of data set 3, obtained by subtracting the contribution of the second mode from data set 2, and so on. However, the physical interpretation of the modes is less straightforward because some components of a mode can be negative (for example, Jonasz M 1983, Kitchen JC et al 1975). To avoid this problem, two alternative approaches have been developed: absolute principal component analysis (for example, Chan TW and Mozurkewich 2007a) and positive matrix factorization (for example, Paatero P and Tapper 1994).
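The successive-variance interpretation of the modes described above can be illustrated by explicit deflation. In the sketch below, the leading mode is found, its contribution is subtracted from every deviation vector, and the leading mode of the remaining data is found again. For brevity the vectors are kept at unit norm, which differs from Eq. 4 only by a constant factor and does not affect the variances. The helper `leading_mode` and the synthetic data are assumptions.

```python
import numpy as np

def leading_mode(dn):
    """Largest eigenvalue of the covariance matrix and its unit-norm eigenvector."""
    lam, vecs = np.linalg.eigh(dn.T @ dn / dn.shape[0])
    return lam[-1], vecs[:, -1]       # eigh sorts eigenvalues ascending

rng = np.random.default_rng(5)
N, M = 400, 6
# Synthetic data with well-separated variances per bin (assumption).
n = rng.normal(size=(N, M)) * np.array([3.0, 2.0, 1.0, 0.5, 0.3, 0.1])
dn = n - n.mean(axis=0)

# Mode 1: greatest variance of the input data set.
lam1, e1 = leading_mode(dn)

# "Data set 2": remove the contribution of mode 1 from every vector.
dn2 = dn - np.outer(dn @ e1, e1)

# Mode 2: greatest variance of data set 2.
lam2, e2 = leading_mode(dn2)

# Reference: all eigenvalues of the original covariance matrix, descending.
lam_all = np.linalg.eigvalsh(dn.T @ dn / N)[::-1]
```

The variance found after deflation matches the second eigenvalue of the original covariance matrix, and the two modes are orthogonal, which is the property the paragraph above describes.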

It is important to note that the value of R(L) obtained with Eq. 11 applies only to expansions, with Eq. 10, of data vectors from the input set. Expanding a data vector external to the input set, $n_{\mathrm{ext}}(D)$, in the eigenvectors of that set may not account for the same fraction R of the variance of the enlarged data set, i.e. the input data set plus $n_{\mathrm{ext}}(D)$.
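A simple numerical experiment illustrates this caveat. The sketch below does not compute the variance of the enlarged data set; it uses a simpler, related diagnostic, namely the fraction of an external vector's squared deviation from the input mean that the first L input-set modes reproduce. The input set mixes two shapes, while the hypothetical external vector `n_ext` contains a third shape absent from the input set. All names, shapes, and parameter values are assumptions.

```python
import numpy as np

def captured_fraction(x, n_mean, h, L):
    """Fraction of |x - n_mean|^2 reproduced by the first L modes (rows of h)."""
    M = h.shape[1]
    dx = x - n_mean
    beta = h[:L] @ dx / M             # amplitudes of the external vector
    approx = beta @ h[:L]             # truncated expansion of dx
    return float(approx @ approx / (dx @ dx))

def shape(D, D0):
    """Lognormal-like bump centered at diameter D0 (illustrative only)."""
    return np.exp(-0.5 * np.log(D / D0) ** 2)

rng = np.random.default_rng(6)
N, M, L = 200, 12, 2
D = np.logspace(-1, 1, M)             # hypothetical diameter grid
s1, s2, s3 = shape(D, 0.3), shape(D, 1.0), shape(D, 5.0)

# Input set: mixtures of s1 and s2 plus weak noise (synthetic).
n = (rng.uniform(0, 1, (N, 1)) * s1
     + rng.uniform(0, 1, (N, 1)) * s2
     + 0.005 * rng.normal(size=(N, M)))
n_mean = n.mean(axis=0)
dn = n - n_mean

lam, vecs = np.linalg.eigh(dn.T @ dn / N)
lam, vecs = lam[::-1], vecs[:, ::-1]
h = np.sqrt(M) * vecs.T               # rows are modes (Eq. 4 scaling)

R_L = lam[:L].sum() / lam.sum()       # Eq. 11 for the input set

# External vector containing a component (s3) not present in the input set.
n_ext = 0.5 * s1 + 0.5 * s2 + 1.0 * s3
f_ext = captured_fraction(n_ext, n_mean, h, L)
```

In this construction, the fraction captured for the external vector is lower than R(L) for the input set, because the modes carry no information about the third shape.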

CITATION:
Kuśmierczyk-Michulec J. 2008. PSD approximation and analysis with empirical orthogonal functions (EOF) (www.tpdsci.com/Tpc/PsdEof.php). In: Top. Part. Disp. Sci. (www.tpdsci.com).
HISTORY:
Published: 19-Jun-2008
Modified: 30-Jun-2008
Peer-reviewed: PENDING
Copyright 2005-2008 MJC Optical Technology. All rights reserved.