Positional Embeddings

About

The d9d.module.block.positional package manages positional encoding logic.

Features

Rotary Positional Encoding

Rotary Positional Encoding (RoPE), as introduced in RoFormer.

See the RotaryEmbeddingProvider and RotaryEmbeddingApplicator classes.

RotaryEmbeddingProvider is typically bound to a model class and provides the (cos, sin) embedding tensors for the specified position IDs.

RotaryEmbeddingApplicator is typically bound to an attention module implementation and rotates the query and key states at runtime.
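The split between the two roles can be illustrated with a minimal pure-Python sketch. It does not use d9d itself: the helper names `cos_sin_for_position`, `rotate`, and `score` are hypothetical, and the half-split layout and the frequency formula follow the common RoFormer-style convention, which is assumed here rather than taken from the d9d source. The example checks the property that makes RoPE useful: the attention score between a rotated query and key depends only on their relative offset.

```python
import math

def cos_sin_for_position(position, head_dim, rope_base=10000):
    """Provider role (illustrative): (cos, sin) lists of length head_dim."""
    half = head_dim // 2
    # One angle per feature pair; frequencies form a geometric progression.
    angles = [position * rope_base ** (-2 * i / head_dim) for i in range(half)]
    # Assumed half-split layout: the angles are repeated for both halves.
    angles = angles + angles
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]

def rotate(vec, cos, sin):
    """Applicator role (illustrative): rotate one query or key vector."""
    half = len(vec) // 2
    # Feature i is paired with feature i + half.
    partner = [-x for x in vec[half:]] + vec[:half]
    return [v * c + p * s for v, p, c, s in zip(vec, partner, cos, sin)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

def score(q_pos, k_pos):
    """Attention logit between q at q_pos and k at k_pos after rotation."""
    return dot(rotate(q, *cos_sin_for_position(q_pos, 4)),
               rotate(k, *cos_sin_for_position(k_pos, 4)))

# Both pairs of positions are two steps apart, so the scores agree.
same_offset = math.isclose(score(3, 1), score(7, 5), abs_tol=1e-9)
print(same_offset)
```

In the actual package the provider would be called once per forward pass and its (cos, sin) output passed down to every attention layer, so the trigonometric values are not recomputed per layer.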

Embedding Layout Styles

The package supports multiple internal memory layouts for RoPE operations via the RotaryEmbeddingStyle enumeration (HALF and INTERLEAVED). The provider and the applicator must be configured with the same style.

d9d.module.block.positional

Provides modules for positional embeddings, such as Rotary Positional Embeddings.

RotaryEmbeddingApplicator

Bases: Module

Applies Rotary Positional Embeddings (RoPE) to Q and K projections.

__init__(style)

Constructs a RotaryEmbeddingApplicator object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| style | RotaryEmbeddingStyle | Rotary embedding layout style. Must match the style of the RotaryEmbeddingProvider. | required |

forward(query_states, key_states, position_embedding_cos, position_embedding_sin)

Rotates query and key states using provided cosine and sine embeddings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query_states | Tensor | Query tensor. Shape: (batch, n_heads, seq_len, head_dim). | required |
| key_states | Tensor | Key tensor. Shape: (batch, n_kv_heads, seq_len, head_dim). | required |
| position_embedding_cos | Tensor | Cosine values for positions. Shape: (batch, seq_len, head_dim). | required |
| position_embedding_sin | Tensor | Sine values for positions. Shape: (batch, seq_len, head_dim). | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, Tensor] | A tuple containing the rotated query and key tensors. |

RotaryEmbeddingProvider

Bases: Module, ModuleLateInit

Module that manages and provides Rotary Positional Embeddings.

__init__(rope_base, head_dim, max_position_ids, style)

Constructs the RotaryEmbeddingProvider.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| rope_base | int | Base of the geometric progression of RoPE rotation periods. | required |
| head_dim | int | Dimensionality of the attention head. | required |
| max_position_ids | int | Maximum supported sequence length for caching. | required |
| style | RotaryEmbeddingStyle | Embedding layout style. Must match the style of the RotaryEmbeddingApplicator. | required |

forward(position_ids)

Retrieves cached cosine and sine embeddings for specific positions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| position_ids | Tensor | Tensor of position indices. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[Tensor, Tensor] | A tuple of (cos, sin) tensors aligned with the input positions. |
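The "cache, then look up" behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not the d9d implementation: `build_cache` and `lookup` are hypothetical helpers, nested lists stand in for tensors, and the frequency formula and half-split layout are the common RoPE convention. The parameter names mirror the constructor arguments documented above.

```python
import math

def build_cache(rope_base, head_dim, max_position_ids):
    """Precompute cos/sin rows for every position in [0, max_position_ids)."""
    # Assumed frequency schedule: rope_base ** (-2i / head_dim) per feature pair.
    inv_freq = [rope_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cos_cache, sin_cache = [], []
    for pos in range(max_position_ids):
        # Assumed half-split layout: angles repeated to fill head_dim.
        angles = [pos * f for f in inv_freq] * 2
        cos_cache.append([math.cos(a) for a in angles])
        sin_cache.append([math.sin(a) for a in angles])
    return cos_cache, sin_cache

def lookup(cache, position_ids):
    """Gather cached rows; output shape is (batch, seq_len, head_dim)."""
    cos_cache, sin_cache = cache
    cos = [[cos_cache[p] for p in row] for row in position_ids]
    sin = [[sin_cache[p] for p in row] for row in position_ids]
    return cos, sin

cache = build_cache(rope_base=10000, head_dim=8, max_position_ids=16)

# Position IDs need not start at zero, e.g. when decoding with a KV cache.
position_ids = [[0, 1, 2], [5, 6, 7]]  # (batch=2, seq_len=3)
cos, sin = lookup(cache, position_ids)
print(len(cos), len(cos[0]), len(cos[0][0]))
```

In this sketch a position ID equal to or greater than `max_position_ids` raises an `IndexError`; how d9d handles out-of-range positions is not stated in this reference.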

reset_parameters()

Resets the values populated in the module's buffers.

RotaryEmbeddingStyle

Bases: StrEnum

Supported Rotary Positional Embedding (RoPE) layout styles.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| HALF | | Applies transformations by splitting the feature dimension into two halves. |
| INTERLEAVED | | Applies transformations by treating adjacent feature elements as pairs. |
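The difference between the two styles, and the reason the provider and applicator must agree, can be shown with a small stdlib-only sketch. The functions `layout_angles`, `partner`, and `apply_rope` are hypothetical, the strings "half" and "interleaved" are stand-ins for the enum members, and the pairing rules follow the conventions commonly used for these two layouts, which is an assumption about d9d's internals.

```python
import math

def layout_angles(pair_angles, style):
    """Provider side: expand one angle per feature pair to head_dim entries."""
    if style == "half":
        return pair_angles + pair_angles                 # [a0, a1, a0, a1]
    return [a for a in pair_angles for _ in range(2)]    # [a0, a0, a1, a1]

def partner(vec, style):
    """Applicator side: signed other element of each feature's pair."""
    if style == "half":
        half = len(vec) // 2
        return [-x for x in vec[half:]] + vec[:half]     # pairs (i, i + half)
    out = []
    for i in range(0, len(vec), 2):                      # pairs (2i, 2i + 1)
        out += [-vec[i + 1], vec[i]]
    return out

def apply_rope(vec, pair_angles, provider_style, applicator_style):
    angles = layout_angles(pair_angles, provider_style)
    other = partner(vec, applicator_style)
    return [v * math.cos(a) + o * math.sin(a)
            for v, o, a in zip(vec, other, angles)]

def squared_norm(vec):
    return sum(x * x for x in vec)

x = [1.0, 2.0, 3.0, 4.0]
pair_angles = [math.pi / 2, math.pi]

half = apply_rope(x, pair_angles, "half", "half")                        # ≈ [-3, -2, 1, -4]
interleaved = apply_rope(x, pair_angles, "interleaved", "interleaved")   # ≈ [-2, 1, -3, -4]
mismatched = apply_rope(x, pair_angles, "interleaved", "half")           # ≈ [-3, -4, -3, -4]

# A proper rotation preserves the vector norm; the mismatched call does not.
print(squared_norm(x), squared_norm(half), squared_norm(mismatched))
```

With matching styles each feature pair undergoes a true 2D rotation, so the norm stays at 30. With mismatched styles the angles no longer line up with the pairs, and the result (squared norm 50 here) is no longer a rotation at all.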