Picking peaks from N-dimensional NMR spectra
The problem
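We can model a spectrum, $S(\mathbf{x})$, as the convolution of a discrete set of peaks, $P$, with a lineshape function, $L$, plus a noise term, $\varepsilon$:
$$S(\mathbf{x}) = \sum_{\mathbf{p} \in P} L(\mathbf{x} - \mathbf{p}; \mathbf{w}) + \varepsilon(\mathbf{x})$$
where $\mathbf{w}$ is a vector of peak widths along each axis. $\varepsilon$ is distributed according to a Gaussian distribution with variance $\sigma_\varepsilon^2$ (Keeler, 2005).
The goal of peak picking is to take a spectrum, $S$, and determine the set of peaks, $P$, which gave rise to it.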
This article gives an overview of the algorithms that can be used to solve this problem.
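The model described above (a set of peaks convolved with a lineshape, plus Gaussian noise) can be sketched in code to produce test spectra for the algorithms that follow. This is only an illustration under stated assumptions: the lineshape is taken to be a separable Lorentzian with unit height, positions and widths are given in grid points, and the names `lorentzian` and `simulate_spectrum` are hypothetical.

```python
import numpy as np

def lorentzian(x, center, width):
    """1D Lorentzian with unit height; `width` is the full width at half maximum."""
    hwhm = width / 2.0
    return hwhm**2 / ((x - center) ** 2 + hwhm**2)

def simulate_spectrum(shape, peaks, widths, noise_sd, seed=0):
    """Sum of separable Lorentzian peaks plus Gaussian noise on an N-dimensional grid.

    Assumption: every peak has unit intensity and the same widths on each axis.
    """
    rng = np.random.default_rng(seed)
    axes = [np.arange(n, dtype=float) for n in shape]
    spectrum = np.zeros(shape)
    for peak in peaks:
        # Build one peak as the outer product of 1D lineshapes, one per axis.
        profile = np.ones(())
        for x, p, w in zip(axes, peak, widths):
            profile = np.multiply.outer(profile, lorentzian(x, p, w))
        spectrum += profile
    # Add independent zero-mean Gaussian noise to every grid point.
    return spectrum + rng.normal(0.0, noise_sd, size=shape)

# Example: a 64 x 64 spectrum with two peaks (positions chosen arbitrarily).
spectrum = simulate_spectrum((64, 64), [(20, 30), (45, 12)],
                             widths=(4.0, 6.0), noise_sd=0.02)
```

Setting `noise_sd=0.0` returns the noise-free spectrum, which is convenient for checking a picker against known peak positions.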
Local maxima
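A naïve algorithm identifies peaks at the local maxima of the spectrum, $S$:
$$\hat{P} = \{\mathbf{x} : S(\mathbf{x}) \geq S(\mathbf{x}') \text{ for all neighboring points } \mathbf{x}'\}$$
However, this also identifies peaks in the noise.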
This algorithm can be improved with two simple modifications. First, most of the noise peaks have a significantly lower intensity than the true peaks, so selecting only peaks above an intensity threshold eliminates many of them.
Second, applying a Gaussian blur to the spectrum, $G_\sigma * S$, where $G_\sigma$ is a Gaussian kernel with standard deviation $\sigma$, smooths the high-frequency noise and so reduces the number of noise peaks detected.
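The thresholded, blurred local-maximum search described above might look like the following sketch. It assumes the spectrum is a NumPy array and uses `scipy.ndimage`; the function name `pick_local_maxima` and its parameters are hypothetical, and the threshold and blur width would need to be chosen for each spectrum.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def pick_local_maxima(spectrum, threshold, sigma=0.0):
    """Return grid indices of local maxima above `threshold`, after optional blurring."""
    # Optional Gaussian blur to suppress high-frequency noise.
    data = gaussian_filter(spectrum, sigma) if sigma > 0 else spectrum
    # A point is a local maximum if it equals the maximum over the
    # 3-point-wide neighborhood centered on it along every axis.
    is_max = data == maximum_filter(data, size=3, mode="nearest")
    # Keep only maxima whose (blurred) intensity exceeds the threshold.
    return np.argwhere(is_max & (data > threshold))
```

Each row of the returned array is the index of one picked peak. Note that the threshold is compared against the blurred intensities, which are lower than the raw peak heights, and that flat plateaus would be reported at every point they contain.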
Laplacian of Gaussian
The Laplacian of a spectrum, $S$:
$$\nabla^2 S = \sum_{k=1}^{N} \frac{\partial^2 S}{\partial x_k^2}$$
is large in magnitude where the spectrum is strongly curved, and is close to zero in uniform areas. At a positive peak the curvature is negative along every axis, so the local minima of the Laplacian (equivalently, the local maxima of $-\nabla^2 S$) correspond to areas of high curvature, which are likely to be spectral peaks. However, taking the second derivative amplifies high-frequency noise. As before, this noise can be attenuated using a Gaussian blur with standard deviation $\sigma$. To compensate for the loss in contrast introduced by the blur, the Laplacian of Gaussian kernel is multiplied by a normalization factor of $\sigma^2$.
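Since differentiation commutes with convolution, this can be represented as the convolution of the spectrum with a single Laplacian-of-Gaussian kernel:
$$\sigma^2 \nabla^2 (G_\sigma * S) = (\sigma^2 \nabla^2 G_\sigma) * S$$
where $G_\sigma$ is a Gaussian kernel with standard deviation $\sigma$.
The parameter $\sigma$ determines the size of the features which are detected: a larger $\sigma$ causes larger features to be blurred away, while a smaller $\sigma$ retains more high-frequency information. Since the peaks in an NMR spectrum are all roughly the same scale, a single value of $\sigma$ can be used for the whole spectrum.
The Laplacian-of-Gaussian kernel can be well-approximated by the difference of two Gaussian blurs:
$$\sigma^2 \nabla^2 G_\sigma \approx \frac{G_{k\sigma} - G_\sigma}{k - 1}$$
where $k$ is a ratio slightly greater than 1.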
Since a Gaussian blur is separable, it can be calculated much faster than convolution with an arbitrary kernel, so this method is preferred, especially for large or high-dimensional spectra.
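To make the comparison concrete, the sketch below computes both the scale-normalized Laplacian of Gaussian and its difference-of-Gaussians approximation with `scipy.ndimage`. The function names and the default ratio `k=1.6` are assumptions made for illustration, and both responses are negated so that positive spectral peaks appear as maxima.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def log_response(spectrum, sigma):
    """Scale-normalized Laplacian of Gaussian, negated so positive peaks give maxima."""
    return -sigma**2 * gaussian_laplace(spectrum, sigma)

def dog_response(spectrum, sigma, k=1.6):
    """Difference-of-Gaussians approximation to `log_response`.

    Assumption: k = 1.6 is just a commonly quoted ratio; the approximation
    becomes tighter as k approaches 1.
    """
    narrow = gaussian_filter(spectrum, sigma)
    wide = gaussian_filter(spectrum, k * sigma)
    # G_{k sigma} - G_sigma ~ (k - 1) sigma^2 laplacian(G_sigma), so dividing the
    # (narrow - wide) difference by (k - 1) approximates the negated, normalized LoG.
    return (narrow - wide) / (k - 1)
```

Peaks can then be picked as thresholded local maxima of either response. The two responses agree in shape but not exactly in magnitude, so a threshold tuned for one would need re-tuning for the other.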
Wavelet transform
Local quadratic fitting
Curve fitting / GSD
- NMR lineshape
- Lorentzian distribution
- Drawbacks
qGSD
- For quantitative applications when knowledge of the exact lineshape or peak integral is required
- Not really necessary for most protein structural applications
Machine learning
- DEEP picker
Validation
- Peaks between different spectra of the same sample should be consistent.
- For example, each peak in an N-HSQC spectrum should correspond to a set of peaks in an HNCACB spectrum with the same H/N shifts.
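The consistency check in the list above could be sketched as a tolerance-based match on the shared H/N shifts. Everything specific here is an assumption: the function name `match_hn`, the column order of the peak lists, the tolerances (in ppm), and the shift values in the example, which are made-up numbers rather than real data.

```python
import numpy as np

def match_hn(hsqc_peaks, hncacb_peaks, tol_h=0.03, tol_n=0.3):
    """For each (H, N) HSQC peak, list the indices of 3D peaks sharing its H/N shifts."""
    hsqc = np.asarray(hsqc_peaks, dtype=float)      # assumed columns: H, N (ppm)
    hncacb = np.asarray(hncacb_peaks, dtype=float)  # assumed columns: H, N, C (ppm)
    matches = []
    for h, n in hsqc:
        # A 3D peak matches when both its H and N shifts fall within tolerance.
        close = (np.abs(hncacb[:, 0] - h) <= tol_h) & (np.abs(hncacb[:, 1] - n) <= tol_n)
        matches.append(np.flatnonzero(close).tolist())
    return matches

# Illustrative, made-up shifts: the first HSQC peak has two partners, the second none.
hsqc = [(8.20, 120.5), (7.95, 115.0)]
hncacb = [(8.21, 120.4, 56.1), (8.19, 120.6, 30.2), (7.50, 110.0, 45.0)]
matches = match_hn(hsqc, hncacb)
```

An HSQC peak with an empty match list, or a 3D peak that matches no HSQC peak, is a candidate for closer inspection as either a missed peak or a noise pick.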