
Low Level APIs

Model Kernels

Kernel | API
RMSNorm | liger_kernel.transformers.LigerRMSNorm
LayerNorm | liger_kernel.transformers.LigerLayerNorm
RoPE | liger_kernel.transformers.liger_rotary_pos_emb
SwiGLU | liger_kernel.transformers.LigerSwiGLUMLP
GeGLU | liger_kernel.transformers.LigerGEGLUMLP
CrossEntropy | liger_kernel.transformers.LigerCrossEntropyLoss
Fused Linear CrossEntropy | liger_kernel.transformers.LigerFusedLinearCrossEntropyLoss
Multi Token Attention | liger_kernel.transformers.LigerMultiTokenAttention
Softmax | liger_kernel.transformers.LigerSoftmax
Sparsemax | liger_kernel.transformers.LigerSparsemax

RMS Norm

RMS Norm simplifies the LayerNorm operation by eliminating mean subtraction, which reduces computational complexity while retaining effectiveness.

This kernel performs normalization by scaling input vectors to have a unit root mean square (RMS) value. This method allows for a ~7x speed improvement and a ~3x reduction in memory footprint compared to implementations in PyTorch.

Try it out

You can experiment as shown in this example here.
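
For reference, the normalization the kernel accelerates can be written in a few lines of plain PyTorch. The sketch below is illustrative; the commented LigerRMSNorm construction assumes the usual (hidden_size, eps) convention and should be checked against the API reference.

```python
# Conceptual sketch of the computation the RMS Norm kernel accelerates.
import torch

def rms_norm_reference(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Scale each vector to unit root mean square, then apply a learned gain.
    inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * inv_rms * weight

x = torch.randn(2, 16, 4096)
weight = torch.ones(4096)
out = rms_norm_reference(x, weight)

# Assumption: the Liger module follows the common (hidden_size, eps) constructor
# convention of Llama-style RMSNorm layers; check the API reference before use.
# from liger_kernel.transformers import LigerRMSNorm
# norm = LigerRMSNorm(4096, eps=1e-6)
```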

RoPE

RoPE (Rotary Position Embedding) enhances the positional encoding used in transformer models.

The implementation allows for effective handling of positional information without incurring significant computational overhead.

Try it out

You can experiment as shown in this example here.
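
The sketch below illustrates the rotate-half form of rotary embeddings that such kernels fuse. It is plain PyTorch written for intuition and does not reflect the exact signature of liger_rotary_pos_emb.

```python
# Conceptual sketch of rotary position embedding (rotate-half convention).
# Illustrative only; not the exact signature of liger_rotary_pos_emb.
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_reference(q, k, cos, sin):
    # q, k: [batch, heads, seq, head_dim]; cos, sin: [seq, head_dim]
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

batch, heads, seq, head_dim = 1, 8, 32, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)

# Precompute the rotation angles from inverse frequencies.
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(seq).float(), inv_freq)  # [seq, head_dim / 2]
angles = torch.cat((angles, angles), dim=-1)               # [seq, head_dim]
q_rot, k_rot = apply_rope_reference(q, k, angles.cos(), angles.sin())
```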

SwiGLU

SwiGLU is the gated feed-forward block used in Llama-style models: one linear projection is passed through the SiLU (Swish) activation and used to gate a second projection before the down projection. A conceptual sketch of both gated variants appears below, after the GeGLU entry.

GeGLU

GeGLU follows the same gated feed-forward pattern with GELU in place of SiLU as the gating activation.
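
The sketch below illustrates the gated MLP pattern that both kernels fuse. Module and parameter names are illustrative assumptions, not the Liger API.

```python
# Conceptual sketch of the gated MLP pattern fused by the SwiGLU/GeGLU kernels.
# Module and parameter names are illustrative, not the Liger API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int, activation: str = "silu"):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        # SwiGLU gates with SiLU; GeGLU gates with GELU.
        self.act = F.silu if activation == "silu" else F.gelu

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))

mlp = GatedMLP(hidden_size=512, intermediate_size=1376)
out = mlp(torch.randn(2, 16, 512))
```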

CrossEntropy

This kernel provides an optimized implementation of the cross-entropy loss used in classification and next-token prediction tasks.

The kernel achieves a ~3x execution speed increase and a ~5x reduction in memory usage for substantial vocabulary sizes compared to implementations in PyTorch.

Try it out

You can experiment as shown in this example here.
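
A minimal usage sketch, assuming LigerCrossEntropyLoss is a drop-in replacement for torch.nn.CrossEntropyLoss (2D logits, integer class targets) and a CUDA device is available for the Triton kernel:

```python
# Minimal sketch; assumes LigerCrossEntropyLoss is drop-in compatible with
# torch.nn.CrossEntropyLoss and that a CUDA device is available (the kernel
# is written in Triton).
import torch
from liger_kernel.transformers import LigerCrossEntropyLoss

vocab_size = 128_256
logits = torch.randn(8, vocab_size, device="cuda", requires_grad=True)
targets = torch.randint(0, vocab_size, (8,), device="cuda")

loss = LigerCrossEntropyLoss()(logits, targets)
loss.backward()
```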

Fused Linear CrossEntropy

This kernel fuses the final linear projection with the cross-entropy loss into a single operation, avoiding the materialization of the full logits tensor and thereby reducing peak memory usage.

Try it out

You can experiment as shown in this example here.
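
For intuition, the sketch below shows the unfused baseline the kernel replaces; the [tokens, vocab] logits tensor it materializes is the memory bottleneck the fused version avoids. It is illustrative plain PyTorch, not the Liger call signature.

```python
# The unfused baseline the kernel replaces: projecting hidden states onto the
# vocabulary materializes a [tokens, vocab] logits tensor before the loss.
import torch
import torch.nn.functional as F

tokens, hidden, vocab = 1024, 2048, 32_000
hidden_states = torch.randn(tokens, hidden)
lm_head_weight = torch.randn(vocab, hidden)
targets = torch.randint(0, vocab, (tokens,))

logits = hidden_states @ lm_head_weight.T   # [tokens, vocab], the memory bottleneck
loss = F.cross_entropy(logits, targets)

# LigerFusedLinearCrossEntropyLoss computes the same loss from the hidden states
# and the lm_head weight without storing the full logits; see the API reference
# above for the exact call signature.
```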

Multi Token Attention

The Multi Token Attention kernel provides an optimized, fused implementation of multi-token attention relative to the PyTorch model baseline. This is a new attention mechanism, introduced by Meta Research, that can operate on multiple Q and K inputs.

Paper: https://arxiv.org/abs/2504.00927
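
The sketch below illustrates the key-query convolution idea from the paper: attention scores are convolved over neighboring query and key positions before the softmax, so each attention weight can depend on several queries and keys. It is a conceptual plain-PyTorch illustration with assumed shapes and filter size, not the LigerMultiTokenAttention API.

```python
# Conceptual sketch of key-query convolution for multi-token attention.
# Illustrative only; shapes and the depthwise filter are assumptions, and this
# is not the LigerMultiTokenAttention API.
import math
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 1, 4, 32, 64
kernel_size = 5  # convolution window over (query, key) positions

q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)
conv_weight = torch.randn(heads, 1, kernel_size, kernel_size) * 0.02  # one filter per head

scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)   # [batch, heads, seq, seq]
causal = torch.ones(seq, seq, dtype=torch.bool).tril()
scores = scores.masked_fill(~causal, 0.0)                 # zero future keys before the conv
mixed = F.conv2d(scores, conv_weight, padding=kernel_size // 2, groups=heads)
mixed = mixed.masked_fill(~causal, float("-inf"))         # re-apply the causal mask
out = torch.softmax(mixed, dim=-1) @ v                    # [batch, heads, seq, head_dim]
```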

Softmax

The Softmax kernel implementation provides an optimized implementation of the softmax operation, which is a fundamental component in neural networks for converting raw scores into probability distributions.

The implementation shows notable speedups compared to the PyTorch softmax implementation.
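
A minimal usage sketch, assuming LigerSoftmax takes no constructor arguments and normalizes over the last dimension like torch.softmax(x, dim=-1); check the API reference for the exact signature.

```python
# Minimal sketch; assumes LigerSoftmax takes no constructor arguments, applies
# softmax over the last dimension, and runs on a CUDA device (Triton kernel).
import torch
from liger_kernel.transformers import LigerSoftmax

x = torch.randn(4, 128, 4096, device="cuda")
probs = LigerSoftmax()(x)
print(torch.allclose(probs, torch.softmax(x, dim=-1), atol=1e-5))
```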

Sparsemax

Sparsemax is a sparse alternative to softmax that produces sparse probability distributions. This kernel implements an efficient version of the sparsemax operation that can be used as a drop-in replacement for softmax in attention mechanisms or classification tasks.

The implementation achieves significant speed improvements and memory savings compared to standard PyTorch implementations, particularly for large input tensors.
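
For reference, sparsemax is the Euclidean projection of the scores onto the probability simplex; the plain-PyTorch version below illustrates that computation and is not the kernel's internals.

```python
# Reference sparsemax (projection onto the probability simplex) over the last
# dimension; illustrative only, not the kernel's internals.
import torch

def sparsemax_reference(z: torch.Tensor) -> torch.Tensor:
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    cumsum = z_sorted.cumsum(dim=-1)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    # Support size k(z): the largest k with 1 + k * z_(k) > sum of the top-k scores.
    support = (1 + k * z_sorted) > cumsum
    k_z = support.sum(dim=-1, keepdim=True)
    tau = (cumsum.gather(-1, k_z - 1) - 1) / k_z.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)

x = torch.randn(2, 8)
p = sparsemax_reference(x)
print(p.sum(dim=-1))  # each row sums to 1, typically with exact zeros
```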

Alignment Kernels

Kernel | API
Fused Linear CPO Loss | liger_kernel.chunked_loss.LigerFusedLinearCPOLoss
Fused Linear DPO Loss | liger_kernel.chunked_loss.LigerFusedLinearDPOLoss
Fused Linear ORPO Loss | liger_kernel.chunked_loss.LigerFusedLinearORPOLoss
Fused Linear SimPO Loss | liger_kernel.chunked_loss.LigerFusedLinearSimPOLoss

Distillation Kernels

Kernel | API
KLDivergence | liger_kernel.transformers.LigerKLDIVLoss
JSD | liger_kernel.transformers.LigerJSD
Fused Linear JSD | liger_kernel.transformers.LigerFusedLinearJSD

Experimental Kernels

Kernel | API
Embedding | liger_kernel.transformers.experimental.LigerEmbedding
Matmul int2xint8 | liger_kernel.transformers.experimental.matmul