Feed Forward Networks (FFN)

About

The d9d.module.block.ffn package implements standard dense Feed-Forward networks used in Transformer blocks.

Features

SwiGLU

SwiGLU is a gated feed-forward layer that combines a SiLU-activated gate projection with an up projection.

Uses an efficient fused SiLU-Mul kernel.

Kernel Benchmarks (BF16, H100)

d9d.module.block.ffn

SwiGLU

Bases: Module, ModuleLateInit

Implements the SwiGLU Feed-Forward Network (FFN).

This module applies the gated activation function: down(SiLU(gate(x)) * up(x)). It corresponds to the standard MLP block used in architectures like LLaMA.
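The computation above can be sketched in plain PyTorch. This is a minimal reference sketch, not the d9d implementation: it assumes three unfused nn.Linear projections, whereas the actual module uses a fused SiLU-Mul kernel, and the class and attribute names here are illustrative.

```python
import torch
from torch import nn
from torch.nn import functional as F

class SwiGLUSketch(nn.Module):
    """Reference sketch of a SwiGLU FFN: down(SiLU(gate(x)) * up(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int, bias: bool = False):
        super().__init__()
        # Two parallel projections into the intermediate dimension...
        self.gate = nn.Linear(hidden_size, intermediate_size, bias=bias)
        self.up = nn.Linear(hidden_size, intermediate_size, bias=bias)
        # ...and one projection back down to the hidden dimension.
        self.down = nn.Linear(intermediate_size, hidden_size, bias=bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The SiLU-activated gate modulates the up projection elementwise.
        return self.down(F.silu(self.gate(x)) * self.up(x))

ffn = SwiGLUSketch(hidden_size=64, intermediate_size=128)
x = torch.randn(2, 8, 64)  # (batch_size, seq_len, hidden_dim)
y = ffn(x)
print(y.shape)  # torch.Size([2, 8, 64]) -- the hidden dimension is preserved
```

Note that the elementwise multiply requires gate and up to share the same intermediate_size; only the down projection returns to hidden_size.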

__init__(hidden_size, intermediate_size, bias=False)

Constructs a SwiGLU object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| hidden_size | int | The hidden dim size. | required |
| intermediate_size | int | The intermediate dim size of the FFN. | required |
| bias | bool | Whether to use bias in the linear projections. | False |

forward(x)

Applies the SwiGLU FFN to the input.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Tensor | Input tensor. Shape: (batch_size, seq_len, hidden_dim). | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Output tensor. Shape: (batch_size, seq_len, hidden_dim). |
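The shape contract above can be checked with a dependency-free NumPy sketch of the same computation, down(SiLU(gate(x)) * up(x)). The weight names are illustrative, not the actual d9d parameter names.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq, hidden, inter = 2, 4, 8, 16

x = rng.standard_normal((batch, seq, hidden))
w_gate = rng.standard_normal((hidden, inter))
w_up = rng.standard_normal((hidden, inter))
w_down = rng.standard_normal((inter, hidden))

def silu(z):
    # SiLU(z) = z * sigmoid(z)
    return z * (1.0 / (1.0 + np.exp(-z)))

# down(SiLU(gate(x)) * up(x)): the gate and up branches widen to the
# intermediate dim; the down projection restores the hidden dim.
y = (silu(x @ w_gate) * (x @ w_up)) @ w_down
print(y.shape)  # (2, 4, 8) -- same shape as the input
```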

reset_parameters()

Resets module parameters.