Picking peaks from N-dimensional NMR spectra


The problem

We can model a spectrum, $S(\mathbf{x})$, as the convolution of a discrete set of peaks with a lineshape function $f_{\boldsymbol{\gamma}}(\mathbf{x})$, plus a noise term $\eta_\alpha(\mathbf{x})$:

$$S(\mathbf{x}) = \sum_{\mathbf{p} \in X} f_{\boldsymbol{\gamma}}(\mathbf{p} - \mathbf{x}) + \eta_\alpha(\mathbf{x}),$$

where $\boldsymbol{\gamma}$ is a vector of peak widths along each axis, and $\eta_\alpha$ follows a Gaussian distribution with variance $\alpha$ (Keeler, 2005).
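The model above can be sketched numerically. The example below is a minimal illustration, not a reference implementation: it assumes a Lorentzian lineshape that factorises along each axis and zero-mean noise, neither of which is fixed by the model itself. The function names (`lorentzian`, `simulate_spectrum`), the grid, and the peak positions are hypothetical choices made for the example.

```python
import numpy as np

def lorentzian(offset, gamma):
    """Assumed lineshape f_gamma: a product of 1D Lorentzians, one per axis.

    `offset` has shape (..., n) and `gamma` has shape (n,).
    The lineshape is normalised to a height of 1 at zero offset.
    """
    return np.prod(gamma**2 / (offset**2 + gamma**2), axis=-1)

def simulate_spectrum(peaks, gamma, alpha, axes, seed=0):
    """Evaluate S(x) = sum_p f_gamma(p - x) + eta_alpha(x) on a regular grid."""
    rng = np.random.default_rng(seed)
    gamma = np.asarray(gamma, dtype=float)
    # Coordinates of every grid point, shape (len(axis_1), ..., len(axis_n), n)
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    spectrum = np.zeros(grid.shape[:-1])
    for p in np.atleast_2d(peaks):
        spectrum += lorentzian(p - grid, gamma)
    # alpha is a variance, so the standard deviation is sqrt(alpha);
    # a zero mean is an assumption of this sketch
    noise = rng.normal(0.0, np.sqrt(alpha), size=spectrum.shape)
    return spectrum + noise

# Hypothetical 2D example: two peaks on a 101 x 101 grid
axes = [np.linspace(0.0, 10.0, 101), np.linspace(0.0, 10.0, 101)]
peaks = np.array([[3.0, 4.0], [7.0, 6.0]])
S = simulate_spectrum(peaks, gamma=[0.2, 0.3], alpha=1e-4, axes=axes)
```

The same function works for any number of dimensions, since the grid and the lineshape are both built from the list of axes.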

The goal of peak picking is to take a spectrum $S$ and determine the set of peaks, $X$, which gave rise to it.

This article gives an overview of the algorithms that can be used to solve this problem.

Local maxima

A naïve algorithm identifies peaks at the local maxima of $S$, which lie among its stationary points:

$$X = \left\{\mathbf{x} \in \mathbb{R}^n \mid \nabla S(\mathbf{x}) = \mathbf{0}\right\},$$

where

$$\nabla S(\mathbf{x}) = \sum_{\mathbf{p} \in X} \nabla f_{\boldsymbol{\gamma}}(\mathbf{p} - \mathbf{x}) + \nabla \eta_\alpha(\mathbf{x}).$$

However, because the noise term contributes stationary points of its own, this identifies peaks in the noise as well.

This algorithm can be improved with two simple modifications. Most noise peaks have a significantly lower intensity than the true peaks, so selecting only peaks above an intensity threshold eliminates many of them.

Furthermore, applying a Gaussian blur to the spectrum, $S_\text{blurred} = S * G_\sigma$, reduces the detection of noise peaks by smoothing out high-frequency noise. The blur kernel is the $n$-dimensional Gaussian

$$G_\sigma(\mathbf{x}) = \left(\sqrt{2\pi}\sigma\right)^{-n} \exp\left(-\frac{|\mathbf{x}|^2}{2\sigma^2}\right).$$
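The thresholded, blurred local-maximum search can be sketched with SciPy's `ndimage` module. This is one possible realisation under stated assumptions: the function name `pick_local_maxima`, the 3-point neighbourhood, the test spectrum, and the values of `sigma` (in grid points) and `threshold` are all illustrative choices rather than recommended settings.

```python
import numpy as np
from scipy import ndimage

def pick_local_maxima(spectrum, sigma=1.0, threshold=0.0):
    """Return grid indices of local maxima of the blurred spectrum above `threshold`."""
    # S_blurred = S * G_sigma, with sigma measured in grid points
    blurred = ndimage.gaussian_filter(spectrum, sigma=sigma)
    # A point is a local maximum if it equals the maximum of its 3^n neighbourhood
    neighbourhood_max = ndimage.maximum_filter(blurred, size=3, mode="nearest")
    is_peak = (blurred == neighbourhood_max) & (blurred > threshold)
    return np.argwhere(is_peak)

# Hypothetical test spectrum: two Lorentzian-shaped peaks plus Gaussian noise
rng = np.random.default_rng(0)
i, j = np.indices((101, 101))
clean = np.zeros((101, 101))
for pi, pj in [(30, 40), (70, 60)]:
    clean += (4.0 / ((i - pi) ** 2 + 4.0)) * (9.0 / ((j - pj) ** 2 + 9.0))
noisy = clean + rng.normal(0.0, 0.01, size=clean.shape)

picked = pick_local_maxima(noisy, sigma=1.0, threshold=0.3)
print(picked)  # one row of grid indices per detected peak
```

Lowering the threshold towards the noise level lets maxima in the noise through again, so in practice the threshold has to be chosen relative to an estimate of the noise.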

Laplacian of Gaussian

The Laplacian of a spectrum:

$$\nabla^2 S(\mathbf{x}) = \frac{\partial^2 S}{\partial x_1^2} + \cdots + \frac{\partial^2 S}{\partial x_n^2},$$

has a large magnitude where the spectrum intensity changes rapidly, and is close to zero in uniform areas. At the top of a peak the curvature is strongly negative, so the local maxima of the negated Laplacian, $-\nabla^2 S$, correspond to areas of high curvature which are likely to be spectral peaks. However, taking the second derivative amplifies high-frequency noise. As before, this noise can be attenuated using a Gaussian blur. To compensate for the loss in contrast introduced by the blur, the Laplacian-of-Gaussian kernel is multiplied by a normalization factor of $\sigma^2$.

Since $-\sigma^2\nabla^2 (G_\sigma * S) = (-\sigma^2\nabla^2 G_\sigma) * S$, this can be represented as the convolution of the spectrum with a single Laplacian-of-Gaussian kernel,

$$\text{LoG}_\sigma = -\sigma^2\nabla^2 G_\sigma.$$
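A Laplacian-of-Gaussian picker along these lines might look as follows. The sketch relies on `scipy.ndimage.gaussian_laplace`, which applies the Laplacian to a Gaussian-smoothed input. The helper names, the test spectrum with its constant baseline, and the values of `sigma` and `threshold` are assumptions made for illustration only.

```python
import numpy as np
from scipy import ndimage

def log_response(spectrum, sigma):
    """Scale-normalised, sign-flipped response: (-sigma^2 * laplacian(G_sigma)) * S."""
    return -sigma**2 * ndimage.gaussian_laplace(spectrum, sigma=sigma)

def pick_log_peaks(spectrum, sigma, threshold):
    """Return grid indices of local maxima of the LoG response above `threshold`."""
    response = log_response(spectrum, sigma)
    local_max = ndimage.maximum_filter(response, size=3, mode="nearest")
    return np.argwhere((response == local_max) & (response > threshold))

# Hypothetical test spectrum: two peaks, Gaussian noise, and a constant baseline.
# The baseline would defeat a plain intensity threshold, but the LoG response
# is nearly zero in uniform regions.
rng = np.random.default_rng(0)
i, j = np.indices((101, 101))
spectrum = np.full((101, 101), 1.0)
for pi, pj in [(30, 40), (70, 60)]:
    spectrum += (4.0 / ((i - pi) ** 2 + 4.0)) * (9.0 / ((j - pj) ** 2 + 9.0))
spectrum += rng.normal(0.0, 0.01, size=spectrum.shape)

# sigma is in grid points and the threshold is an illustrative value
log_peaks = pick_log_peaks(spectrum, sigma=2.0, threshold=0.1)
print(log_peaks)
```

Because the response is sign-flipped, peaks appear as maxima and can be thresholded in the same way as raw intensities.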

The parameter $\sigma$ determines the size of the features which are detected: a larger $\sigma$ blurs away finer detail and responds to broader features, while a smaller $\sigma$ retains more high-frequency information. Since the peaks in an NMR spectrum are all roughly the same scale, a single value of $\sigma$ can be used for the whole spectrum.

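$\text{LoG}_\sigma$ can be well approximated by the difference of two Gaussian blurs,

$$\text{LoG}_\sigma \approx \frac{2}{k^2 - 1} \left(G_\sigma - G_{k\sigma}\right),$$

where $k > 1$ is the ratio between the two blur widths. The approximation becomes more accurate as $k$ approaches 1.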

A Gaussian blur is separable, so it can be computed as a sequence of one-dimensional convolutions, which is much faster than convolution with an arbitrary kernel. This method is therefore preferred, especially for large or high-dimensional spectra.
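The quality of the difference-of-Gaussians approximation can be checked numerically. The sketch below works in one dimension, where $-\sigma^2 \, \mathrm{d}^2 G_\sigma / \mathrm{d}x^2 = (1 - x^2/\sigma^2)\, G_\sigma(x)$ follows from differentiating the Gaussian twice. The function names and the trial values of `k` are illustrative assumptions.

```python
import numpy as np

def gaussian(x, sigma):
    """1D Gaussian kernel G_sigma(x)."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def log_kernel(x, sigma):
    """Exact 1D kernel -sigma^2 * G_sigma''(x)."""
    return (1.0 - x**2 / sigma**2) * gaussian(x, sigma)

def dog_kernel(x, sigma, k):
    """Difference-of-Gaussians approximation with scale ratio k."""
    return 2.0 / (k**2 - 1.0) * (gaussian(x, sigma) - gaussian(x, k * sigma))

x = np.linspace(-10.0, 10.0, 2001)
sigma = 1.0
errors = {}
for k in (1.01, 1.1, 1.6):
    diff = np.abs(dog_kernel(x, sigma, k) - log_kernel(x, sigma))
    # Largest deviation, relative to the central value of the exact kernel
    errors[k] = diff.max() / log_kernel(0.0, sigma)
    print(f"k = {k}: max error relative to LoG(0) = {errors[k]:.3f}")
```

The discrepancy shrinks as `k` approaches 1. A larger `k` mainly changes the amplitude of the response, which matters less when peaks are selected as local maxima above a threshold tuned for that kernel.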

Wavelet transform

Local quadratic fitting

Curve fitting / GSD

  • NMR lineshape
  • Lorentzian distribution
  • Drawbacks

qGSD

  • For quantitative applications when knowledge of the exact lineshape or peak integral is required
  • Not really necessary for most protein structural applications

Machine learning

  • DEEP Picker

Validation

  • Peaks between different spectra of the same sample should be consistent.
  • For example, an N-HSQC peak should correspond to a set of peaks in an HNCACB spectrum with the same H/N shift.
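The consistency check described in this list could be sketched as a tolerance-based match on the shared H/N shifts. Everything specific here is an assumption for illustration: the function name `match_to_hsqc`, the axis order (H, N, C), the tolerances in ppm, and the peak lists themselves, which are made-up values rather than real data.

```python
import numpy as np

def match_to_hsqc(hsqc_peaks, hncacb_peaks, tol_h=0.03, tol_n=0.3):
    """For each HSQC (H, N) peak, list the indices of HNCACB (H, N, C) peaks
    whose H and N shifts agree within the given tolerances (assumed, in ppm)."""
    hsqc = np.asarray(hsqc_peaks, dtype=float)
    hncacb = np.asarray(hncacb_peaks, dtype=float)
    matches = []
    for h, n in hsqc:
        close = (np.abs(hncacb[:, 0] - h) <= tol_h) & (np.abs(hncacb[:, 1] - n) <= tol_n)
        matches.append(np.flatnonzero(close).tolist())
    return matches

# Made-up peak lists for illustration only
hsqc = [(8.20, 120.5), (7.95, 115.2)]
hncacb = [
    (8.21, 120.4, 56.1),
    (8.20, 120.6, 30.2),
    (8.19, 120.5, 58.3),
    (7.40, 110.0, 45.0),
]

matches = match_to_hsqc(hsqc, hncacb)
print(matches)
```

An HSQC peak with an empty match list, or an HNCACB peak that appears in no list, is a candidate for a false positive or a missed peak and can be flagged for inspection.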

References

Keeler J (2005) Understanding NMR Spectroscopy 2nd ed. Wiley