Random Fourier Features
30 Aug 2014A kernel $k(x,y)=\left\langle \phi(x),\phi(y)\right\rangle $ in general may correspond to an inner product in an infinitedimensional space whose feature map $\phi$ cannot be explicitly computed. In Rahimi & Recht 2007 , methods for computing an approximate feature maps $\hat{\phi}$ were proposed. The approximate feature maps are such that $k(x,y)\approx\hat{\phi}(x)^{\top}\hat{\phi}(y)$ (with equality in expectation) where $\hat{\phi}\in\mathbb{R}^{d}$ and $d$ is a free parameter. High $d$ yields a better approximation with higher computational cost.
Assume $k(x,y)=k(xy)$ (translation invariant) and $x,y\in\mathbb{R}^{m}$. Random Fourier features $\hat{\phi}(x)\in\mathbb{R}^{d}$ such that $k(x,y)\approx\hat{\phi}(x)^{\top}\hat{\phi}(y)$ are generated as following.

Compute the Fourier transform $\hat{k}$ of the kernel $k$. That is, $\hat{k}(\omega)=\frac{1}{2\pi}\int e^{j\omega^{\top}\delta}k(\delta)\, d\delta$. For a Gaussian kernel with unit width, $\hat{k}(\omega)=\left(2\pi\right)^{m/2}e^{\omega^{2}/2}$.

Draw $d$ i.i.d. samples $\omega_{1},\ldots,\omega_{d}\in\mathbb{R}^{m}$ from $p$ i.e.,
randn(m, d)
. Draw $d$ i.i.d samples $b_{1},\ldots,b_{d}\in\mathbb{R}$ from $U[0,2\pi]$ (uniform distribution). 
$\hat{\phi}(x)=\sqrt{\frac{2}{d}}\left(\cos\left(\omega_{1}^{\top}x+b_{1}\right),\ldots,\cos\left(\omega_{d}^{\top}x+b_{d}\right)\right)^{\top}$
Why it works ?
Bochnerâ€™s theorem: A continuous kernel $k(x,y)=k(xy)$ on $\mathbb{R}^{m}$ is positive definite iff $k(\delta)$ is the Fourier transform of a nonnegative measure.
Furthermore, if a translation invariant kernel $k(\delta)$ is properly scaled, Bochnerâ€™s theorem guarantees that its Fourier transform $\hat{k}(\omega)$ is a proper probability distribution. From this fact, we have where $\eta_{\omega}(x)=e^{j\omega^{\top}x}$ and $\cdot^{*}$ denotes the complex conjugate. Since both $\hat{k}$ and $k$ are real, the complex exponential contains only the cosine terms. Drawing $d$ samples is for lowering the variance of the approximation. The uniformly random numbers ${b_i}_i$ are to correct the bias of the estimator.