Interface
Data Orientation
Data matrices may be oriented in one of two ways with respect to the observations. Functions producing a kernel matrix require an `orient` argument to specify the orientation of the observations within the provided data matrix.
Row Orientation (Default)
An orientation of `Val(:row)` identifies when each observation vector corresponds to a row of the data matrix. This is commonly used in the field of statistics in the context of design matrices.

For example, for data matrix $\mathbf{X}$ consisting of observations $\mathbf{x}_1$, $\mathbf{x}_2$, $\ldots$, $\mathbf{x}_n$:

$$\mathbf{X}_{\text{row}} = \begin{bmatrix} \mathbf{x}_1^{\intercal} \\ \mathbf{x}_2^{\intercal} \\ \vdots \\ \mathbf{x}_n^{\intercal} \end{bmatrix}$$

When row orientation is used, the kernel matrix of $\mathbf{X}$ will match the dimensions of $\mathbf{X}\mathbf{X}^{\intercal}$. Similarly, the kernel matrix of data matrices $\mathbf{X}$ and $\mathbf{Y}$ will match the dimensions of $\mathbf{X}\mathbf{Y}^{\intercal}$.
Column Orientation
An orientation of `Val(:col)` identifies when each observation vector corresponds to a column of the data matrix:

$$\mathbf{X}_{\text{col}} = \mathbf{X}_{\text{row}}^{\intercal} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}$$

With column orientation, the kernel matrix will match the dimensions of $\mathbf{X}^{\intercal}\mathbf{X}$. Similarly, the kernel matrix of data matrices $\mathbf{X}$ and $\mathbf{Y}$ will match the dimensions of $\mathbf{X}^{\intercal}\mathbf{Y}$.
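To make the two orientations concrete, the sketch below computes the same kernel matrix from row-oriented and column-oriented copies of the same data. The use of `GaussianKernel` is an assumption for illustration; substitute any kernel the package provides.

```julia
using MLKernels

κ = GaussianKernel(1.0)

Xrow = rand(4, 3)                         # 4 observations as rows, 3 features
Krow = kernelmatrix(Val(:row), κ, Xrow)   # 4×4, matching the dimensions of X*Xᵀ

Xcol = collect(transpose(Xrow))           # the same observations as columns
Kcol = kernelmatrix(Val(:col), κ, Xcol)   # also 4×4, matching the dimensions of Xᵀ*X

Krow ≈ Kcol   # true: both orientations describe the same observations
```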
Essentials
MLKernels.ismercer — Method

`ismercer(κ::Kernel)`

Returns `true` if the kernel `κ` is a Mercer kernel; `false` otherwise.
MLKernels.isnegdef — Method

`isnegdef(κ::Kernel)`

Returns `true` if the kernel `κ` is a negative definite kernel; `false` otherwise.
MLKernels.isstationary — Method

`isstationary(κ::Kernel)`

Returns `true` if the kernel `κ` is a stationary kernel; `false` otherwise.
MLKernels.isisotropic — Method

`isisotropic(κ::Kernel)`

Returns `true` if the kernel `κ` is an isotropic kernel; `false` otherwise.
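These predicates allow code to branch on kernel properties at runtime. A brief sketch, assuming a `GaussianKernel` (which is Mercer, stationary, and isotropic):

```julia
using MLKernels

κ = GaussianKernel(1.0)
ismercer(κ)      # true:  the Gaussian kernel is positive definite
isnegdef(κ)      # false
isstationary(κ)  # true:  depends only on the difference x - y
isisotropic(κ)   # true:  depends only on the norm of x - y
```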
MLKernels.kernel — Method

`kernel(κ::Kernel, x, y)`

Apply the kernel `κ` to $x$ and $y$, where $x$ and $y$ are vectors or scalars of some subtype of `Real`.
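For instance, a single kernel evaluation on scalars or vectors might look like the following (again assuming a `GaussianKernel`):

```julia
using MLKernels

κ = GaussianKernel(1.0)
kernel(κ, 1.0, 2.0)                  # scalar arguments
kernel(κ, [1.0, 2.0], [3.0, 4.0])    # vector arguments of matching length
```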
MLKernels.Orientation — Constant

`Orientation`

Union of the two `Val` types representing the data matrix orientations:

- `Val{:row}` identifies when each observation vector corresponds to a row of the data matrix
- `Val{:col}` identifies when each observation vector corresponds to a column of the data matrix
MLKernels.kernelmatrix — Method

`kernelmatrix([σ::Orientation,] κ::Kernel, X::Matrix [, symmetrize::Bool])`

Calculate the kernel matrix of `X` with respect to kernel `κ`.
MLKernels.kernelmatrix! — Method

`kernelmatrix!(K::Matrix, σ::Orientation, κ::Kernel, X::Matrix, symmetrize::Bool)`

In-place version of `kernelmatrix` where pre-allocated matrix `K` will be overwritten with the kernel matrix.
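When the output buffer can be reused across calls, the in-place form avoids repeated allocation. A minimal sketch, assuming a `GaussianKernel`:

```julia
using MLKernels

κ = GaussianKernel(1.0)
X = rand(10, 3)                          # 10 row-oriented observations
K = Matrix{Float64}(undef, 10, 10)       # pre-allocated 10×10 output
kernelmatrix!(K, Val(:row), κ, X, true)  # overwrite K; symmetrize the result
```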
MLKernels.kernelmatrix — Method

`kernelmatrix([σ::Orientation,] κ::Kernel, X::Matrix, Y::Matrix)`

Calculate the kernel matrix of `X` and `Y` with respect to kernel `κ`.
MLKernels.kernelmatrix! — Method

`kernelmatrix!(K::Matrix, σ::Orientation, κ::Kernel, X::Matrix, Y::Matrix)`

In-place version of `kernelmatrix` where pre-allocated matrix `K` will be overwritten with the kernel matrix.
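The two-matrix form produces the rectangular matrix of pairwise kernel evaluations between the observations of `X` and those of `Y`. A short sketch, assuming a `GaussianKernel`:

```julia
using MLKernels

κ = GaussianKernel(1.0)
X = rand(6, 3)   # 6 observations
Y = rand(4, 3)   # 4 observations with the same number of features
K = kernelmatrix(Val(:row), κ, X, Y)   # 6×4: K[i,j] = k(xᵢ, yⱼ)
```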
MLKernels.centerkernelmatrix! — Method

`centerkernelmatrix!(K::Matrix)`

Centers the (rectangular) kernel matrix `K` with respect to the implicit Kernel Hilbert Space according to the following formula:

$$[\mathbf{K}]_{ij} = \langle \phi(\mathbf{x}_i) - \mathbf{\mu}_{\phi\mathbf{x}}, \; \phi(\mathbf{y}_j) - \mathbf{\mu}_{\phi\mathbf{y}} \rangle$$

Where $\mathbf{\mu}_{\phi\mathbf{x}}$ and $\mathbf{\mu}_{\phi\mathbf{y}}$ are given by:

$$\mathbf{\mu}_{\phi\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \phi(\mathbf{x}_i) \qquad \text{and} \qquad \mathbf{\mu}_{\phi\mathbf{y}} = \frac{1}{m} \sum_{j=1}^{m} \phi(\mathbf{y}_j)$$
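Expanding the inner product in the formula above reduces centering to row-mean, column-mean, and grand-mean corrections of each entry. The sketch below is a minimal out-of-place rendering of that expansion, not the package's own implementation:

```julia
using Statistics

# Center a kernel matrix K in the implicit feature space:
# K̃ᵢⱼ = Kᵢⱼ - rowmean(K)ᵢ - colmean(K)ⱼ + grandmean(K)
function centered(K::AbstractMatrix)
    rowμ = mean(K, dims=2)   # ⟨ϕ(xᵢ), μ_ϕy⟩ for each row i
    colμ = mean(K, dims=1)   # ⟨μ_ϕx, ϕ(yⱼ)⟩ for each column j
    return K .- rowμ .- colμ .+ mean(K)
end
```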
Approximation
In many cases, fast approximate results are more important than exact results. The Nystrom method can be used to generate a factorization that approximates a large, symmetric kernel matrix. Given data matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ (one observation per row) and kernel matrix $\mathbf{K} \in \mathbb{R}^{n \times n}$, the Nystrom method takes a sample $S$ of the observations of $\mathbf{X}$ of size $s < n$ and generates a factorization such that:

$$\mathbf{K} \approx \mathbf{C}^{\intercal}\mathbf{W}\mathbf{C}$$
Where $\mathbf{W}$ is the pseudo-inverse of the $s \times s$ sample kernel matrix based on $S$, and $\mathbf{C}$ is an $s \times n$ matrix.
The Nystrom method uses an eigendecomposition of the sample kernel matrix of $\mathbf{X}$ to estimate $\mathbf{K}$. Generally, the order of $\mathbf{K}$ must be quite large and the sampling ratio small (e.g. 15% or less) before the cost of computing the full kernel matrix exceeds that of the eigendecomposition. The method is more effective for kernels that are not a direct function of the dot product, since those kernels cannot make use of BLAS when computing the full matrix $\mathbf{K}$, so the cross-over point occurs for smaller $\mathbf{K}$.
MLKernels.jl implements the Nystrom approximation:
MLKernels.NystromFact — Type

`NystromFact`

Type for storing a Nystrom factorization. The factorization contains two fields: `W` and `C` as described in the `nystrom` documentation.
MLKernels.nystrom — Function

`nystrom([σ::Orientation,] κ::Kernel, X::Matrix, [S::Vector])`

Computes the Nystrom approximation of the square kernel matrix of data matrix `X` with respect to kernel `κ`. Returns a `NystromFact` struct which stores a Nystrom factorization satisfying:

$$\mathbf{K} \approx \mathbf{C}^{\intercal}\mathbf{W}\mathbf{C}$$
MLKernels.kernelmatrix — Method

`kernelmatrix(CᵀWC::NystromFact)`

Compute the approximate kernel matrix based on the Nystrom factorization.
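End to end, computing and expanding a Nystrom approximation might look like the following sketch. The kernel choice and the index-vector form of `S` are assumptions for illustration:

```julia
using MLKernels

κ = GaussianKernel(1.0)
X = rand(1_000, 5)                  # 1000 row-oriented observations
S = rand(1:1_000, 150)              # sample roughly 15% of the observations
CᵀWC = nystrom(Val(:row), κ, X, S)  # NystromFact with fields W and C
K̃ = kernelmatrix(CᵀWC)             # approximate 1000×1000 kernel matrix
```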