Interface
Data Orientation
Data matrices may be oriented in one of two ways with respect to the observations. Functions producing a kernel matrix accept an Orientation argument, σ, to specify the orientation of the observations within the provided data matrix.
Row Orientation (Default)
An orientation of Val(:row) identifies when each observation vector corresponds to a row of the data matrix. This is commonly used in the field of statistics in the context of design matrices.
For example, for data matrix $\mathbf{X}$ consisting of observations $\mathbf{x}_1$, $\mathbf{x}_2$, $\ldots$, $\mathbf{x}_n$:

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^{\intercal} \\ \mathbf{x}_2^{\intercal} \\ \vdots \\ \mathbf{x}_n^{\intercal} \end{bmatrix}$$
When row-major ordering is used, the kernel matrix of $\mathbf{X}$ will match the dimensions of $\mathbf{X}\mathbf{X}^{\intercal}$. Similarly, for row-major ordering of data matrices $\mathbf{X}$ and $\mathbf{Y}$, the kernel matrix will match the dimensions of $\mathbf{X}\mathbf{Y}^{\intercal}$.
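The following is a minimal sketch of the row-oriented case. GaussianKernel is assumed here purely as an example kernel constructor (any Kernel subtype works the same way); the orientation argument may also be omitted, since row orientation is the default.

```julia
using MLKernels

X = rand(20, 3)                    # 20 observations as rows, 3 features each
κ = GaussianKernel()               # assumed example kernel
K = kernelmatrix(Val(:row), κ, X)  # 20×20, matching the dimensions of X*X'
```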
Column Orientation
An orientation of Val(:col) identifies when each observation vector corresponds to a column of the data matrix:

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$$
With column-major ordering, the kernel matrix will match the dimensions of $\mathbf{X}^{\intercal}\mathbf{X}$. Similarly, the kernel matrix of data matrices $\mathbf{X}$ and $\mathbf{Y}$ will match the dimensions of $\mathbf{X}^{\intercal}\mathbf{Y}$.
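As a companion sketch, the same computation with observations stored as columns (again assuming a GaussianKernel only for illustration):

```julia
using MLKernels

Xc = rand(3, 20)                    # 20 observations as columns, 3 features each
κ = GaussianKernel()                # assumed example kernel
K = kernelmatrix(Val(:col), κ, Xc)  # 20×20, matching the dimensions of Xc'*Xc
```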
Essentials
MLKernels.ismercer — Method.

ismercer(κ::Kernel)

Returns true if the kernel κ is a Mercer kernel; false otherwise.

MLKernels.isnegdef — Method.

isnegdef(κ::Kernel)

Returns true if the kernel κ is a negative definite kernel; false otherwise.

MLKernels.isstationary — Method.

isstationary(κ::Kernel)

Returns true if the kernel κ is a stationary kernel; false otherwise.

MLKernels.isisotropic — Method.

isisotropic(κ::Kernel)

Returns true if the kernel κ is an isotropic kernel; false otherwise.

MLKernels.kernel — Method.

kernel(κ::Kernel, x, y)

Apply the kernel κ to x and y where x and y are vectors or scalars of some subtype of Real.
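The sketch below exercises these property queries and the pairwise kernel application; GaussianKernel is assumed purely as an example kernel type.

```julia
using MLKernels

κ = GaussianKernel()                  # assumed example kernel

ismercer(κ)                           # true if κ is a Mercer (positive definite) kernel
isnegdef(κ)                           # true if κ is negative definite
isstationary(κ)                       # true if κ depends only on x - y
isisotropic(κ)                        # true if κ depends only on the norm of x - y

kernel(κ, 1.0, 2.0)                   # scalar arguments
kernel(κ, [1.0, 2.0], [3.0, 4.0])     # vector arguments
```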
MLKernels.Orientation — Constant.

Orientation

Union of the two Val types representing the data matrix orientations:

Val{:row} identifies when each observation vector corresponds to a row of the data matrix
Val{:col} identifies when each observation vector corresponds to a column of the data matrix
MLKernels.kernelmatrix — Method.

kernelmatrix([σ::Orientation,] κ::Kernel, X::Matrix [, symmetrize::Bool])

Calculate the kernel matrix of X with respect to kernel κ.
MLKernels.kernelmatrix! — Method.

kernelmatrix!(K::Matrix, σ::Orientation, κ::Kernel, X::Matrix, symmetrize::Bool)

In-place version of kernelmatrix where the pre-allocated matrix K will be overwritten with the kernel matrix.
MLKernels.kernelmatrix — Method.

kernelmatrix([σ::Orientation,] κ::Kernel, X::Matrix, Y::Matrix)

Calculate the kernel matrix of X and Y with respect to kernel κ.
MLKernels.kernelmatrix! — Method.

kernelmatrix!(K::Matrix, σ::Orientation, κ::Kernel, X::Matrix, Y::Matrix)

In-place version of kernelmatrix where the pre-allocated matrix K will be overwritten with the kernel matrix.
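A brief sketch of the in-place variants follows; the output matrix must be pre-allocated with the correct dimensions, and GaussianKernel is again assumed only for illustration.

```julia
using MLKernels

X = rand(20, 3)
Y = rand(30, 3)
κ = GaussianKernel()                      # assumed example kernel

K = Matrix{Float64}(undef, 20, 20)
kernelmatrix!(K, Val(:row), κ, X, true)   # overwrite K; symmetrize the result

Kxy = Matrix{Float64}(undef, 20, 30)
kernelmatrix!(Kxy, Val(:row), κ, X, Y)    # rectangular kernel matrix of X and Y
```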
MLKernels.centerkernelmatrix! — Method.

centerkernelmatrix!(K::Matrix)

Centers the (rectangular) kernel matrix K with respect to the implicit Kernel Hilbert Space according to the following formula:

$$[\mathbf{K}]_{ij} = \left\langle \phi(\mathbf{x}_i) - \mathbf{\mu}_{\phi\mathbf{x}}, \; \phi(\mathbf{y}_j) - \mathbf{\mu}_{\phi\mathbf{y}} \right\rangle$$

Where $\mathbf{\mu}_{\phi\mathbf{x}}$ and $\mathbf{\mu}_{\phi\mathbf{y}}$ are given by:

$$\mathbf{\mu}_{\phi\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \phi(\mathbf{x}_i) \qquad \text{and} \qquad \mathbf{\mu}_{\phi\mathbf{y}} = \frac{1}{m} \sum_{j=1}^{m} \phi(\mathbf{y}_j)$$
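As a sanity check on the formula, the centering can be written directly in terms of the row, column, and grand means of K. The sketch below is only a transcription of the equations above, not the package's in-place implementation.

```julia
using Statistics

# Center a (possibly rectangular) kernel matrix K with K[i, j] = κ(xᵢ, yⱼ).
function centered(K::AbstractMatrix)
    rowμ   = mean(K, dims=2)   # ⟨φ(xᵢ), μ_φy⟩ for each row i
    colμ   = mean(K, dims=1)   # ⟨μ_φx, φ(yⱼ)⟩ for each column j
    grandμ = mean(K)           # ⟨μ_φx, μ_φy⟩
    return K .- rowμ .- colμ .+ grandμ
end
```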
Approximation
In many cases, a fast, approximate result is more important than an exact one. The Nystrom method can be used to generate a factorization that approximates a large, symmetric kernel matrix. Given data matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ (one observation per row) and kernel matrix $\mathbf{K} \in \mathbb{R}^{n \times n}$, the Nystrom method takes a sample $S$ of the observations of $\mathbf{X}$ of size $s < n$ and generates a factorization such that:

$$\mathbf{K} \approx \mathbf{C}^{\intercal}\mathbf{W}\mathbf{C}$$
Where $\mathbf{W}$ is the $s \times s$ pseudo-inverse of the sample kernel matrix based on $S$ and $\mathbf{C}$ is an $s \times n$ matrix.
The Nystrom method uses an eigendecomposition of the sample kernel matrix of $\mathbf{X}$ to estimate $\mathbf{K}$. Generally, the order of $\mathbf{K}$ must be quite large and the sampling ratio small (e.g. 15% or less) for the cost of computing the full kernel matrix to exceed that of the eigendecomposition. The method is more effective for kernels that are not a direct function of the dot product, since such kernels cannot make use of BLAS when computing the full matrix $\mathbf{K}$, and the cross-over point therefore occurs for a smaller $\mathbf{K}$.
MLKernels.jl implements the Nystrom approximation:
MLKernels.NystromFact — Type.

NystromFact

Type for storing a Nystrom factorization. The factorization contains two fields: W and C as described in the nystrom documentation.
MLKernels.nystrom — Function.

nystrom([σ::Orientation,] κ::Kernel, X::Matrix, [S::Vector])

Computes a Nystrom approximation of the square kernel matrix of data matrix X with respect to kernel κ. Returns a NystromFact struct which stores a Nystrom factorization satisfying:

$$\mathbf{K} \approx \mathbf{C}^{\intercal}\mathbf{W}\mathbf{C}$$
MLKernels.kernelmatrix — Method.

kernelmatrix(CᵀWC::NystromFact)

Compute the approximate kernel matrix based on the Nystrom factorization.
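Putting the pieces together, a minimal sketch of the approximation workflow might look like the following. GaussianKernel is assumed as an example kernel, and S is assumed here to be a vector of sampled observation indices.

```julia
using MLKernels

X = rand(1000, 5)                 # 1000 observations as rows, 5 features each
κ = GaussianKernel()              # assumed example kernel
S = rand(1:1000, 150)             # ~15% sample of observation indices (assumed form of S)

F = nystrom(Val(:row), κ, X, S)   # NystromFact storing the fields W and C
Kapprox = kernelmatrix(F)         # approximate 1000×1000 kernel matrix CᵀWC
```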