
OpenMythos: An Open-Source Theoretical Reconstruction of the Mythos Architecture


The OpenMythos project has been officially open-sourced, presenting a theoretical reconstruction of the Claude Mythos architecture. The implementation adopts a Recurrent-Depth Transformer (RDT) design, offering a novel approach to deep reasoning tasks.

What is OpenMythos?

OpenMythos is an open-source, theoretical model implementation designed to explore the Recurrent-Depth Transformer (RDT) architecture. Built upon publicly available research literature, it implements a three-stage architecture: a Prelude, a Recurrent Block, and a Coda.

Core Architecture Design

OpenMythos employs an innovative three-stage structural design:

Prelude

  • Consists of standard Transformer layers.

  • Runs once per forward pass.

  • Encodes the input and produces the initial hidden state.

Recurrent Block

  • Updates the hidden state during each cycle.

  • Injects the original encoded input at every cycle.

Coda

  • Consists of standard Transformer layers.

  • Runs once per forward pass.

  • Converts the final hidden state into output logits.

Recurrent Update Rule
In each recurrent step $t$, the hidden state is updated according to the following rule:

$$h_t = \mathcal{B}(h_{t-1} + e)$$

Where:

  • $\mathcal{B}$ is the shared recurrent Transformer block, which applies attention and MLP.

  • $e$ is the input encoding produced by the Prelude, re-injected at every step.

This design ensures the original input signal remains active throughout the recurrent depth, preventing the model from drifting during iterations.
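
To make the data flow concrete, below is a minimal sketch of the three-stage forward pass, assuming PyTorch. The module layout and names (prelude, recurrent_block, coda, lm_head) are illustrative stand-ins based on the description above, not the actual OpenMythos API.

import torch
import torch.nn as nn

class RecurrentDepthSketch(nn.Module):
    # Minimal sketch of the Prelude -> Recurrent Block -> Coda data flow.
    # Module names are illustrative, not the actual OpenMythos API.
    def __init__(self, vocab_size, dim, n_heads=8, n_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Stand-ins for stacks of standard Transformer layers.
        self.prelude = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.recurrent_block = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.coda = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.lm_head = nn.Linear(dim, vocab_size)
        self.n_loops = n_loops

    def forward(self, ids, n_loops=None):
        n_loops = n_loops or self.n_loops
        e = self.prelude(self.embed(ids))    # Prelude: runs once, encodes the input
        h = torch.zeros_like(e)              # initial hidden state; e seeds the loop via injection
        for _ in range(n_loops):             # one shared block, reused every cycle
            h = self.recurrent_block(h + e)  # re-inject the original encoding e
        return self.lm_head(self.coda(h))    # Coda: runs once, hidden state -> logits

ids = torch.randint(0, 1000, (2, 16))
logits = RecurrentDepthSketch(1000, 256)(ids, n_loops=8)  # deeper reasoning: more loops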

Key Technical Innovations

  1. LTI Stability Constraints
    Training recurrent models is notoriously unstable, primarily facing two failure modes: Residual Explosion (unbounded growth of the hidden state $h_t$) and Loss Spikes (sudden divergence due to large spectral norms of the injection parameters). OpenMythos employs Linear Time-Invariant (LTI) system theory to guarantee stability by constraining the injection parameters:

    • Use ZOH/Euler discretization with a learned scalar step size $\Delta t$.

    • Enforce negative eigenvalues by parameterizing $A := \mathrm{Diag}(-\exp(A_{\log}))$.
      Under this discretization, the transition matrix $\bar{A} = \exp(\Delta t \, A)$ has spectral radius $\rho(\bar{A}) < 1$, making the recurrent model more robust to hyperparameter choices and allowing stable training even at high learning rates.
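
A minimal sketch of this constraint, assuming PyTorch; the parameter names (A_log, log_dt) and the exact injection form are illustrative assumptions, not the project's actual code.

import torch
import torch.nn as nn

class StableInjection(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # A = Diag(-exp(A_log)): every eigenvalue is strictly negative by construction.
        self.A_log = nn.Parameter(torch.zeros(dim))
        # Learned scalar step size: dt = exp(log_dt) > 0.
        self.log_dt = nn.Parameter(torch.zeros(()))

    def transition(self):
        A = -torch.exp(self.A_log)   # strictly negative diagonal
        dt = torch.exp(self.log_dt)  # positive scalar step
        # ZOH discretization of a diagonal LTI system: A_bar = exp(dt * A).
        # Since A < 0 and dt > 0, every entry of A_bar lies in (0, 1),
        # so the spectral radius rho(A_bar) < 1 and the update is contractive.
        return torch.exp(dt * A)

    def forward(self, h, e):
        A_bar = self.transition()
        # The hidden state decays geometrically while the injected input anchors it,
        # ruling out residual explosion across recurrent iterations.
        return A_bar * h + (1.0 - A_bar) * e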

  2. Adaptive Computation Time (ACT)
    More loops are not always better. Beyond a certain depth, excessive looping degrades predictive performance—a failure mode known as "over-thinking." OpenMythos adopts an Adaptive Computation Time (ACT) mechanism:

    • Learns a scalar halting probability for each position.

    • Dynamically decides when to stop looping.

    • Difficult positions receive more compute; simple tokens stop early.
      This allows the model to learn a halting signal when the answer converges, rendering the model Turing complete under certain assumptions.
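
Below is a minimal sketch of such an ACT-style loop, assuming PyTorch. The halting head and the $1 - \epsilon$ threshold follow the standard ACT recipe; the names and details here are assumptions, not the project's exact implementation.

import torch
import torch.nn as nn

def act_loop(h, e, block, halt_head, max_iters=8, eps=0.01):
    # Sketch of Adaptive Computation Time over the shared recurrent block.
    # `block` is the recurrent Transformer block; `halt_head` is a learned
    # nn.Linear(dim, 1). Both names are illustrative stand-ins.
    B, T, _ = h.shape
    cum_halt = torch.zeros(B, T, device=h.device)                  # accumulated halting prob
    running = torch.ones(B, T, dtype=torch.bool, device=h.device)  # positions still looping
    for _ in range(max_iters):
        h_new = block(h + e)                                  # shared-weight update
        p_halt = torch.sigmoid(halt_head(h_new)).squeeze(-1)  # scalar per position
        h = torch.where(running.unsqueeze(-1), h_new, h)      # freeze halted positions
        cum_halt = cum_halt + p_halt * running                # only running positions accumulate
        running = running & (cum_halt < 1.0 - eps)            # halt once mass is ~1
        if not running.any():                                 # every position has halted
            break
    return h

# Example wiring (illustrative): block = nn.TransformerEncoderLayer(256, 8, batch_first=True)
# and halt_head = nn.Linear(256, 1); difficult positions then loop longer than easy ones.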

  3. Mixture of Experts (MoE) Design
    Each FFN in the recurrent block is replaced by a fine-grained MoE layer:

    • Splits the FFN into many small experts (each 1/m of the normal FFN size).

    • Routes each token to a small number of experts (top-$k$), alongside always-active shared experts, matching the n_experts, n_experts_per_tok, and n_shared_experts fields in the configuration below.
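
A hedged sketch of such a fine-grained MoE FFN, assuming PyTorch. It mirrors the n_experts / n_experts_per_tok / n_shared_experts / expert_dim fields of MythosConfig shown later, but the routing details are an assumption, and compute here is dense (masked) for clarity rather than sparsely dispatched.

import torch
import torch.nn as nn

class FineGrainedMoE(nn.Module):
    def __init__(self, dim, expert_dim, n_experts=8, n_experts_per_tok=2, n_shared_experts=1):
        super().__init__()
        def make_expert():
            # Each expert is a small FFN, a fraction of the usual width.
            return nn.Sequential(nn.Linear(dim, expert_dim), nn.GELU(),
                                 nn.Linear(expert_dim, dim))
        self.experts = nn.ModuleList(make_expert() for _ in range(n_experts))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared_experts))
        self.router = nn.Linear(dim, n_experts)
        self.k = n_experts_per_tok

    def forward(self, x):
        # Top-k routing weights per token.
        weights, idx = torch.topk(self.router(x).softmax(dim=-1), self.k, dim=-1)
        out = sum(s(x) for s in self.shared)            # shared experts see every token
        for slot in range(self.k):
            for e_id, expert in enumerate(self.experts):
                routed = (idx[..., slot] == e_id).unsqueeze(-1)  # tokens picking this expert
                w = weights[..., slot].unsqueeze(-1)
                out = out + routed * w * expert(x)      # dense masked compute, for clarity
        return out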


  4. Continuous Depth Batching
    Since all tokens share the same recurrent block, the model can exit the loop at different depths, processing simple inputs quickly within the same batch while using more iterations for difficult inputs. Theoretical analysis suggests inference throughput can be improved by 2-3x.
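
The sketch below illustrates the idea, assuming PyTorch; the convergence test (hidden-state change below a tolerance) is an illustrative stand-in for whatever learned halting criterion actually decides the exit depth.

import torch

def batched_variable_depth(h, e, block, max_iters=16, tol=1e-3):
    # All sequences share one recurrent block, but each exits at its own depth.
    B = h.shape[0]
    active = torch.ones(B, dtype=torch.bool, device=h.device)
    depths = torch.zeros(B, dtype=torch.long, device=h.device)
    for _ in range(max_iters):
        h_new = block(h + e)
        delta = (h_new - h).flatten(1).norm(dim=1)       # per-sequence update size
        h = torch.where(active.view(B, 1, 1), h_new, h)  # frozen sequences stop changing
        depths = depths + active.long()                  # count iterations actually used
        active = active & (delta > tol)                  # converged sequences drop out
        if not active.any():                             # easy inputs exit early
            break
    return h, depths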

Systematic Generalization Capability

Recurrent models exhibit unique advantages in handling knowledge combinations never seen during training, acquiring this capability through a three-phase process:

  1. Memorization Phase: The model fits the training distribution.

  2. In-Distribution Generalization: The model handles known combinations.

  3. Systematic Generalization: The model handles novel, out-of-distribution combinations.

This is why recurrent models feel qualitatively different from standard models on new problems: capabilities emerge via a phase transition rather than appearing gradually. When trained on 5-hop reasoning chains and tested on 10-hop chains, standard Transformers fail, whereas recurrent models succeed by running more loops at inference time.

Inference-Time Scaling

More loops at inference time improve quality, with gains following a predictable saturating-exponential curve: real but diminishing. This mirrors the inference-time scaling behavior of Chain-of-Thought. At 770M parameters, the recurrent model matches the downstream quality of a 1.3B fixed-depth Transformer trained on the same data, reaching similar quality with roughly half the parameters.
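
One way to write down this saturating behavior (the functional form is a common fit for such curves and an illustrative assumption here, not a result reported by the project):

$$Q(n) \approx Q_\infty - (Q_\infty - Q_0)\, e^{-\lambda n}$$

where $Q(n)$ is downstream quality after $n$ inference loops, $Q_0$ is the single-pass quality, $Q_\infty$ is the saturation ceiling, and $\lambda > 0$ sets how quickly each extra loop closes a constant fraction of the remaining gap.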

Implementation Features

Pre-configured Scales
OpenMythos provides pre-configured scales ranging from 1B to 1T parameters:

| Variant     | Dim   | Experts | Expert Dim | Recurrent Iters | Context | Max Output |
|-------------|-------|---------|------------|-----------------|---------|------------|
| mythos_1b   | 2048  | 64      | 2048       | 16              | 4K      | 4K         |
| mythos_3b   | 3072  | 64      | 4096       | 16              | 4K      | 4K         |
| mythos_10b  | 4096  | 128     | 5632       | 24              | 8K      | 4K         |
| mythos_50b  | 6144  | 256     | 9728       | 32              | 8K      | 4K         |
| mythos_100b | 8192  | 256     | 13568      | 32              | 1M      | 128K       |
| mythos_500b | 12288 | 512     | 23040      | 48              | 1M      | 128K       |
| mythos_1t   | 16384 | 512     | 34560      | 64              | 1M      | 128K       |

Quick Start

Install OpenMythos:

pip install open-mythos


Basic Usage Example:

import torch
from open_mythos.main import OpenMythos, MythosConfig

# Configure model
cfg = MythosConfig(
    vocab_size=1000,
    dim=256,
    n_heads=8,
    max_seq_len=128,
    max_loop_iters=4,
    prelude_layers=1,
    coda_layers=1,
    n_experts=8,
    n_shared_experts=1,
    n_experts_per_tok=2,
    expert_dim=64,
    attn_type="mla"  # or "gqa"
)

# Initialize model
model = OpenMythos(cfg)

# Forward propagation
ids = torch.randint(0, cfg.vocab_size, (2, 16))
logits = model(ids, n_loops=4)

# Generate text
out = model.generate(ids, max_new_tokens=8, n_loops=8)

Training Configuration

OpenMythos provides complete training scripts supporting single and multi-GPU setups. Key design choices include:

  • Optimizer: Muon for 2D weight matrices, AdamW for embeddings and norms.

  • Dataset: HuggingFaceFW/fineweb-edu (default sample-10BT).

  • Tokenizer: Uses OpenAI/gpt-oss-20b via MythosTokenizer.

  • Parallelization: PyTorch DDP via torchrun.

  • Precision: bfloat16 on H100/A100, float16 + GradScaler on older GPUs.
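
A minimal sketch of the described optimizer split, assuming a Muon implementation that exposes a torch.optim-style constructor (as in common open-source Muon releases); the import path, learning rates, and the embedding exclusion rule are illustrative assumptions, not the project's actual settings.

import torch

def build_optimizers(model):
    # 2D weight matrices go to Muon; embeddings, norms, and other
    # non-matrix parameters go to AdamW, as described above.
    matrix_params, other_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim == 2 and "embed" not in name:  # assumption: embeddings stay on AdamW
            matrix_params.append(p)
        else:
            other_params.append(p)
    from muon import Muon  # illustrative import path
    opt_muon = Muon(matrix_params, lr=0.02, momentum=0.95)  # hyperparameters illustrative
    opt_adamw = torch.optim.AdamW(other_params, lr=3e-4, weight_decay=0.1)
    return opt_muon, opt_adamw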

Technical Advantages Summary

  1. Inference Scaling: Inference-time compute scales with loop count, not model size.

  2. Deep Reasoning: Additional reasoning depth is "free" in parameter terms, since the same recurrent block is reused at every iteration.

  3. Training Stability: LTI constraints guarantee training stability.

  4. Adaptive Compute: ACT mechanism enables on-demand computation.

Application Scenarios

The OpenMythos architecture is particularly suitable for:

  • Multi-step mathematical reasoning.

  • Long-horizon planning.

  • Hierarchical argumentation.

  • Code generation and debugging.

  • Scientific reasoning.

Open Source Statement

OpenMythos is an independent, community-driven theoretical reconstruction project built solely on publicly available research and speculation. It is not affiliated with Anthropic or any of its proprietary systems. The project is open-sourced under the MIT license, and community contributions are welcome.
