Transformer Implementation
In this section, we'll take our letter-frequency solution and re-express it as a one-layer, attention-only transformer:
```mermaid
---
title: One-layer Attention-only Transformer
---
stateDiagram-v2
    Embedding: Embedding
    Embedding: 1. One-hot encode each token as a vector of numbers.
    Attention: Attention Block
    Attention: 2. Take the mean to obtain ciphertext letter frequencies.
    Unembedding: Unembedding
    Unembedding: 3. Compare the ciphertext letter frequencies to each rotation's expected frequencies.
    [*] --> Embedding: "d edb" (5 tokens)
    Embedding --> Attention
    Attention --> Unembedding
    Unembedding --> [*]: largest score indicates rotation 3 ("a bay")
```
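The three stages above can be sketched in a few lines of NumPy, under the assumption that attention is uniform, so the attention block reduces to a plain mean over the one-hot token embeddings. The names `ALPHABET`, `VOCAB`, and `decode_rotation` are illustrative, not part of the original code.

```python
import numpy as np

# Assumed vocabulary: 26 lowercase letters plus a space token (27 symbols).
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
VOCAB = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_rotation(ciphertext: str, plaintext_freqs: np.ndarray) -> int:
    """Return the most likely rotation, given expected plaintext frequencies."""
    # 1. Embedding: one-hot encode each token as a row vector.
    onehots = np.eye(len(ALPHABET))[[VOCAB[ch] for ch in ciphertext]]

    # 2. Attention: with uniform attention weights, the block outputs the
    #    mean of the embeddings, i.e. the ciphertext letter frequencies.
    cipher_freqs = onehots.mean(axis=0)

    # 3. Unembedding: score each rotation by comparing ciphertext frequencies
    #    against the expected frequencies shifted by that rotation.
    letter_freqs = plaintext_freqs[:26]
    scores = []
    for rot in range(26):
        shifted = np.roll(letter_freqs, rot)  # expected freqs under `rot`
        expected = np.concatenate([shifted, plaintext_freqs[26:]])
        scores.append(cipher_freqs @ expected)
    return int(np.argmax(scores))
```

With frequencies estimated from representative plaintext, the dot product peaks at the true rotation because it is the autocorrelation of the frequency distribution, which is maximal at zero shift.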