Strategies for optimizing the phase correction algorithms in Nuclear Magnetic Resonance spectroscopy

Nuclear Magnetic Resonance (NMR) spectroscopy is a popular medical diagnostic technique. NMR is also the favourite tool of chemists/biochemists to elucidate the molecular structure of small or big molecules; it is also a widely used tool in material science, in food science etc. In the case of medical diagnosis it allows for determining a metabolic composition of analysed tissue which may support the identification of tumour cells. Precession signal, that is a crucial part of MR phenomenon, contains distortions that must be filtered out before signal analysis. One of such distortions is phase error. Five popular algorithms: Automics, Shanon's entropy minimization, Ernst's method, Dispa and eDispa are presented and discussed. A novel adaptive tuning algorithm for Automics method was developed and numerically optimal solutions to automatic tuning of the other four algorithms were proposed. To validate the performance of the proposed techniques, two experiments were performed - the first one was done with the use of in silico generated data. For all presented methods, the fine tuning strategies significantly increased the correction accuracy. The highest improvement was observed for Automics algorithm, independently of noise level, with relative phase error dropping by average from 10.25% to 2.40% for low noise level and from 12.45% to 2.66% for high noise level. The second validation experiment, done with the use of phantom data, confirmed the in silico results. The obtained accuracy of the estimation of metabolite concentration was at 99.5%. Conclusions The proposed strategies for optimizing the phase correction algorithms significantly improve the accuracy of Nuclear Magnetic Resonance spectroscopy signal analysis.


Introduction
Magnetic resonance spectroscopy (abr. MRS) is a technique widely used in, among the others, modern oncology to determine metabolic profile of the tissues. It is especially useful to differentiate between healthy tissues and tumours. However the differences in metabolic profiles are in many cases slight, thus signal must be carefully pre-processed in order to accurately estimate amount of metabolites in examined tissues. One of the distortions that affects MR spectrum mostly is phase error. The first simplest mathematical model of phase error was proposed by Ernst [1] where the assumption on linearity of phase error along the spectrum, described by two factors frequency dependent and frequency independent one, was made. Such a model posits the phase error to be mainly a consequence of eddy current induction in the scanner. The extension of linear model is bilinear one, incorporating multiplicatory factor that stands for phase error fraction caused by magnetic field inhomogeneity [2,3]. In contrast to the linear models, higher order models were proposed assuming the nonlinearity of the phase error with respect to the frequency.
While considering MRS in clinical diagnosis, the influence of field inhomogeneity may be neglected due to small spectrum complexity (low dimensionality) and significantly lower field strength (than in chemical measurements), leaving the phase error related to the induction of eddy currents in coils only [4].
The linear model of phase error consists of two parts: zero and first order and it is given by: Zero order component 0 is representing the offset between absorption and dispersion spectrum while first order component 1 is representing the frequency dependant shift where dependency is modelled by straight line with slope coefficient k/N, where k denotes index of the point in spectrum and N is total number of points.
There are two possible approaches of phasing MR spectra. The first one is manual and requires expert knowledge about zero and first order component. Such a procedure is time consuming and requires an experienced human operator. The drawback of the manual method is the fact that it is hard to estimate the correction effect for the whole spectrum [5]. It is then done for small parts of spectrum and leads to overcorrection of analysed fragments and no correction of other fragments.
The second possible approach that is free of the mentioned drawback is the automatic correction. For such an approach the correction is done accordingly to the linear model, but the quality of spectrum is evaluated and maximized automatically, mostly with the use of optimization techniques. The automatic phase correction algorithms are still less popular than manual approach, which is partially caused by unsatisfactory accuracy of the existing techniques.
The most popular techniques of automatic phasing of MR spectra are: Automics [6], Shanons Entropy minimization [7], Ernst's method [1], Dispa [8] and eDispa [9]. The aim of this paper is to design, implement and verify the automatic tuning strategies for the above mentioned methods that will result in efficiency increase of phase error correction for 1H (Hydrogen isotope Protium) NMR spectra.

Methods
While looking at the algorithms designed for phase error correction in MR spectra, one can notice two groups of them: model based and model free techniques. In the family of methods that derive direct value of phase error (without a priori assumed error model) the most popular are: [5] which is based on filter diagonalisation method and [10] which performs the phase correction with the use of separately measured water signal. The newest approach of direct phase error correction seems to be the method by [11] requiring the registration of series of spectra.
The five chosen linear model based methods: Automics, Shanon's, Ernst's, Dispa and eDispa, as being the most popular in clinical approaches, were implemented following the description included in the original publications. The mechanism of finding optimal solution was examined for each technique and tuning of algorithms, by means of tuning of parameters or application of efficient numerical solution, was performed.

Automics
The first analysed algorithm, proposed by [6], is based on estimating linear model parameters 0 and 1 dependent on a phase evaluated at the tails of spectrum. The original method assumes definition of two intervals at each tail. For each pair of intervals a mean phase is calculated. Than having two values of phase: at the beginning and end of the spectrum, phase error is estimated. It is assumed that phase error in these intervals does not differ significantly. The length of the intervals may be understood as a parameter for the algorithm that may be optimized to find the best solution [6].
The developed procedure for interval length estimation starts with the initial interval as a single point and then extends stepwise the interval by other points as long as there is no significant trend in the data within the interval. To verify no trend hypothesis a linear regression model is constructed within each interval and statistical test on signal gradient being equal to zero is applied. For the purpose of this study significance level was set to 5% (α = 0.05). The procedure is repeated for each of two intervals -at the beginning and at the end of the spectrum. If a significant change of signal magnitude (and consequently phase error) is found in one of the intervals the procedure is terminated and the found length is treated as the optimal one. The parameters of phase error model are calculated by solving the set of two equations (for each tail) in the form given by equation below.
Where index j stands for the location of the interval: j = 1 for the interval located at the beginning of the spectrum; while j = 2 for the interval located at the end of the spectrum. R j,2 and I j,2 are the real and imaginary part of an element at the end of the interval; and R j,1 and I j,1 are the real and imaginary part of an element at the beginning of the interval, k j,1 and k j,2 are the indices of the beginning and the end of the j th interval, and N is a length of the spectrum.

Methods based on reformulation to the optimization problem
Three of the above mentioned algorithms might be tuned by application of a properly chosen numerical method for solving their optimization problem.

Shanon's entropy minimization
The Shanon's entropy minimization method is based on the assumption that ideal absorption spectrum should be positive. Such a spectrum has smaller Shanon's entropy than spectrum that contains points that are both: positive and negative [7]. The problem for this method is to find set of parameters 0 , 1 for which the spectrum phased with linear model has the smallest error of correction. The minimization problem is given by equation.
where H is the Shannon entropy of given spectrum, S A (k,...) is a magnitude of the absorption spectrum at k th data point and P is a penalty factor.

Ernst
Ernst method is based on the axiom that the integral of single line (peak) dispersion spectrum should be equal to zero. Since the spectrum is a composition of peaks it is clear that for no phase error the dispersion integral of whole spectrum should be zero [1]. Because of the noise it is rarely to be true. In the Ernst method the optimization problem is to find parameters of linear model for which the dispersion integral will be minimal.
where I is an integral value, S D is a magnitude of the dispersion spectrum; a and b are the integration limits equivalent to the minimum and maximum values on the frequency [Hz] or [ppm] scale of the spectrum.

eDispa
In eDispa method authors use linear model as well and perform the two-step quality calculation [9]. The first step is based on calculation of defined η -functional of the form given below.
Q is a functional defined as: where S A denotes the magnitude of the absorption spectrum, k is the index of data point, N is a length of the spectrum.

Solutions to the optimization problems
The solution to three above mentioned algorithms could be found with the use of classical optimization algorithms. In case of Shanon's entropy minimization and eDispa problem the Nelder-Mead algorithm can be applied [12]. As for Ernst's method the problem is more complex, and it may be solved with the use of integral global optimization [13].
One of the crucial steps during the Nelder-Mead optimization, due to the strong nonlinearity of the optimized functions, is setting of parameter initial values. To improve accuracy of tuned algorithms and to make whole optimization process faster the procedure for setting of initial conditions has been proposed. It is based on observation that water peak is located in the middle of spectrum and it is with no doubt the peak of maximal amplitude both in absorption and magnitude spectrum. As described in previous paragraph, a phase angle between absorption and dispersion part measured at peak maximum should be equal 0. If it is not, the measured value is a rough estimate of phase error at the half of spectrum: where Δ 0 is an initial estimate of the phase error, and S stands for the MR spectrum, index k max denotes the spectrum data point with maximum of magnitude spectrum.
Knowing the value of Δ 0 it is easy to estimate 0 and 1 just using the equation for linear phase error model and additional assumption that ratio of phase error components is equal ¼. This value is empirical and it was chosen after set of experiments done on clinical spectra. Example of effectiveness of the proposed initial condition is shown in Figure 1.
The proposed initial condition may be used even when water signal is partially suppressed during measurement procedure. When water signal is not present in the data (full water suppression) the maximum of signal may be used, however it is definitely not as good as water peak.

Dispa
Dispa method is based on the assumption that phase at the maximum of the peak should, in an ideal case, be equal 0 [8]. Assuming the linear model of Δ it is then easy to estimate 0 and 1 with use of just two neighbouring peaks. It was noticed that such approach might lead to wrong estimates because of noise presence and its influence on maximal point of peak. An idea for Dispa method is to evaluate phase value at max points of all significant peaks and then estimate Δ model with use of linear regression model.

Quality criterion
In order to properly estimate value of phase error that remains in the data after phase correction, a quality criterion was proposed. The assumed criterion uses the phase plot (relation between dispersion and absorption spectrum), obtained for last significant peak in the analysed spectrum. Because of signal sampling a peak and consequently phase plot is not a continuous line but a set of points. Because the criterion uses major radius of phase plot, an estimates of ellipse parameters are obtained from the data points. Having ellipse equation it is then easy to derive the equation for its major and minor radius. The assumption is that in case of no phase error, major radius of an ellipse phase plot should lay exactly on the real axis. The remaining phase error is the angle between real axis of the phase plane and major radius of an ellipse phase plot. The idea of the assumed quality criterion is shown in Figure 2.

Data
To verify the quality of spectra phasing two data sets were collected. The first one consists of numerically simulated spectra (named as synthetic data), while the second data set consists of 27 measurements obtained for a brain phantom that contained: 5 mM of Lactates at 0.5 ppm, 12.5 mM of N-acetylaspartate (abr. NAA) at 2.0 ppm, 10 mM of Creatine at 3.0 ppm, 3 mM of Choline at 3.2 ppm and 7.5 mM of myo-Inositol located at 4.6 ppm. The data were measured with the use of Philips Achieva scanner of 1.5 T magnetic field induction. The echo and repetition time were equal to 35 and 1500 ms. Every spectrum was averaged over 128 replicates. The number of points was equal to 1024 and the sequence type was PRESS.
To obtain the synthetic data set being similar in structure to the brain data, randomly chosen single spectrum was taken from phantom data and it was manually phased by human expert. Then the absorption spectrum was extracted and pre- Figure 2 Visualization of idea behind assumed quality criterion [15]. processed to filter out noise and baseline. The smoothed signal was used as the reference for synthetic signal generator.
Single synthetic spectrum was generated with the use of the following procedure: 1) To ensure signal complexity similar to the clinical spectra, additional peaks together with additive noise were randomly added to the reference signal in frequency domain.
2) The dispersion spectrum was reconstructed with the use Hilbert transformation. Since the noise component was purely random each combination of 0 and 1 was repeated 50 times. In total 1250 simulations were performed. Exemplary simulated spectrum is presented in Figure 3.

Evaluation
To evaluate the efficiency of the proposed strategies for algorithm tuning, every spectrum was corrected twice, by original and tuned algorithm. The residual post correction error was calculated following the procedure described in Quality criterion section. The error residuum expressed as a percentage of the applied phase distortion value was named a relative error.
The block diagram describing performed comparison study is shown in Figure 4.

Experiment I -synthetic data
The Exemplary spectrum obtained with the addition of the phase error equal to 10 degrees and low level noise. [15] with an additive noise of 30.75 dB (low) or 8.52 dB (high) and repeated 50 times what results in 1250 simulations in total. The correction was applied to every generated spectrum in both manners: with the use of original algorithm and with applied tuning routines. The relative error was calculated, and the results were grouped with respect to the total Δ value. For each group mean value, standard deviation and coefficient of variation CV were calculated.

Low level of additive noise
Basing on the above results it may be noticed that proposed tuning routines improve phase correction quality for each of the analysed algorithm. The highest increase was observed for algorithm Automics and the lowest increase was observed for Ernst algorithm. By looking at a mean value of relative error it may be concluded that with the increase of Δ the remaining phase error after the correction increases in both cases (before and after tuning). By looking at descriptive statistics, for correction with the use of tuned algorithms the dispersion of results among spectra with different noise is much lower for Automics, Shanon's and Ernst but remains at the same level for Dispa and eDispa.

High level of additive noise
The second part of the synthetic data experiment differs from the first by a much larger additive noise that was applied to all generated spectra. As previously a huge improvement in results obtained for with tuning was noticed. For all five algorithms the values of relative phase error are worse while compared to the low noise level results. The best results were obtained for Automics algorithm. It was also observed that addition of higher-level noise increases the dispersion of spectra generated for same combination of 0 and 1 values. With low noise this value was equal to~1% for all methods without tuning, while by increase of the noise level it doubles. By application of tuning it was possible to decrease the dispersion to about 1%.

Experiment II -brain phantom data analysis
In the second part of validation procedure a data obtained on brain phantom was analysed. The number of processed spectra was equal to 27. All signals were collected at different time frames with use of Philips Achieva (1.5 T) with parameters: echo time = 35 ms, repetition time = 1500 ms, number of averages = 128, number of points = 1024 and the sequence type PRESS. Thus it was assumed that the distortions such as phase errors or noise would be different from spectrum to spectrum. Following the results of the analysis performed on synthetic data, each phantom spectrum phase was corrected with the use of Automics algorithm only. The correction was performed twice: with and without parameter tuning. The spectra were then decomposed into Gaussian Mixture Model (abr. GMM) and the concentrations of metabolites were calculated [14]. The obtained estimates of concentrations are presented in the form of boxplots in Figure 7 and their descriptive statistics are included in table 5. One can conclude that the tuning routine applied to the Automics algorithms improves the results of phase correction giving more accurate estimates of metabolite concentrations. In comparison to the algorithm before tuning the increase is significant for each analysed metabolite. After application of tuning procedure the maximum difference between estimated mean and the true values of metabolite concentration is 0.4% (10.04 vs. 10.00 mM for Creatine and 5.02 vs. 5.0 for Lactate). Estimated mean concentration of Choline is exactly at desired value of 3.0 mM. By looking at the Figure 5 The relative phase error [%] and its confidence intervals obtained for phase correction algorithms a) before and b) after the application of tuning procedures -low noise.     values of standard deviation and coefficient of variation it may be noticed that for all metabolites dispersion of results among 27 spectra is smaller while compared to not tuned version (highest increase for myo-Inositol: standard deviation for not tuned Automics was 0.73 while 0.06 for tuned algorithm).

Discussion
The in silico experiment was performed to verify effectiveness of the proposed tuning for five popular phase correction algorithms. Parameter tuning increases the correction efficiency by at least 4%. The higher impact of tuning algorithm is observed for higher phase errors. The highest impact was observed for Automics algorithm for which modification was the most complex. The adaptive definition of interval length is more efficient than the fixed length option. Additionally, it minimizes the risk that interval contains points that significantly differ in magnitude and phase error. For the group of methods based on the reformulation to the optimization problem it was observed that implementation of efficient simplex algorithm increased accuracy of all three of them.
It is also a result of efficient setting of the initial condition. The high (around 5%) improvement was observed for the tuned version of Dispa algorithm. It is a result of accurate estimation of Δ parameters with the use of all peaks not just selected two. Because of the noise presented in the data, the position of maximal point in the peak  The presented descriptive statistics are: sample mean (x ), sample standard deviation (s), 95% confidence interval for population mean value (95%CI) and coefficient of variation (CV). Right column presents paired t test p-value resulting from testing the hypothesis on no accuracy improvement by algorithm tuning.
is shifted. If the observed maximum is not a true maximum, the phase evaluated at that point is also wrong. For lower noise the correction accuracy is slightly better but that was expected.
In the analysis of clinical phantom data it was proven that tuned algorithm outperforms not tuned version. Only the Automics algorithm was used for phase correction, as it was demonstrated to be the best performing during the synthetic data analysis. Its original version gives results that are satisfactory however we have proven that tuning may increase accuracy and may decrease the dispersion of metabolite concentration estimates among the spectra.

Conclusions
The proposed tuning routines significantly increase the accuracy of phase error correction for all examined algorithms: Automics, Shanon's entropy minimization, Ernst's, eDispa and Dispa. To understand the importance of proper spectrum phasing two-step validation experiment was performed. The first one was based on the analysis of spectra with known phase error disturbed by additional random noise (synthetic data), while the second validation experiment was performed on spectra with unknown phase error but known original concentration of metabolites. Both validation experiments showed that tuning routines increase the accuracy. The second, phantom based validation experiment has shown that phase error correction the crucial role in determining the metabolite concentration and may lead to more accurate clinical diagnosis.

Competing interests
We hereby confirm that the authors of this manuscript do not have any competing financial, professional or personal interests.

Authors' contributions
We hereby declare that FB contributed in the developing methodology for data analysis and performed the numerical experiments. JP contributed in designing the experiment, developing methodology for data analysis and analysed and discussed results. RT contributed in the NMR methodology part.