hey, I’m Jeswanth 👋

take a seat and read along.

I’m a Member of Technical Staff at ZLabs, Zoho. I train LLMs, experiment with architectures, build evaluations, and apply what I learn from papers.

I write about things I’m actively exploring: GPU kernels, LLM inference, multimodal systems, and the occasional deep dive.

RoPE Implementation Explained Mathematically

Introduction: In this post, I explain the RoPE (Rotary Position Embedding) implementation mathematically. I assume readers have some high-level understanding of sinusoidal positional encoding and Rotary PE. The implementation discussed here is based on HuggingFace’s Transformers library (specifically the Llama model). RoPE was introduced in the paper “RoFormer: Enhanced Transformer with Rotary Position Embedding” by Su et al. (2021). The core idea is to encode positional information by rotating query and key vectors before computing attention scores, rather than adding a fixed positional vector to token embeddings. This results in an elegant property: attention scores naturally encode relative position between tokens. ...

June 4, 2026 · 9 min · Jeswanth Mukesh
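
To make the relative-position property from the excerpt concrete, here is a minimal sketch in plain Python. It is not the HuggingFace code: it assumes the interleaved pairing from the RoFormer paper, where dimensions (2i, 2i+1) are rotated together, whereas the Llama implementation in Transformers pairs dimension i with i + d/2. The helper names `rope_rotate` and `dot`, the toy vectors, and the `base=10000.0` default are illustrative choices of mine.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * theta_i.

    theta_i = base ** (-2i / dim), the frequency schedule from the RoFormer paper.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(dim // 2):
        theta = base ** (-2.0 * i / dim)
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x, y) by `angle`.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unscaled attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (arbitrary values, head dimension 4).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same offset (3) between query and key, at two very different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 105), rope_rotate(k, 102))

# A different offset (4), for contrast.
score_other = dot(rope_rotate(q, 6), rope_rotate(k, 2))

print(abs(score_near - score_far) < 1e-9)
print(abs(score_near - score_other) > 1e-6)
```

Both scores with offset 3 agree up to floating-point error, because rotating q by m·θ and k by n·θ leaves a dot product that depends only on (m − n)·θ. Changing the offset changes the score.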