LLM Performance Analysis
Evaluating frontier AI models on their ability to write, repair, and migrate Solana smart contracts across Anchor and native Rust.
View source on GitHub
Weighted score across 18 smart contract tasks, with a detailed breakdown and per-task results available for each model.
Each task is scored across four stages: Build (compiles successfully), Public Tests (provided test cases), Hidden Tests (secret correctness checks), and Adversarial Tests (security and edge-case exploits). Weighted scores are averaged across all 18 tasks.
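The page doesn't publish the stage weights, so the values below are illustrative assumptions. A minimal Rust sketch of how a four-stage task score and the cross-task average could be computed:

```rust
// Hypothetical scoring sketch. The stage weights are assumptions
// (not documented by the benchmark); they sum to 1.0.
const W_BUILD: f64 = 0.10;
const W_PUBLIC: f64 = 0.20;
const W_HIDDEN: f64 = 0.40;
const W_ADVERSARIAL: f64 = 0.30;

/// Score one task from its four stages. Test-stage inputs are pass
/// rates in [0.0, 1.0]; a failed build zeroes the whole task.
fn task_score(build: bool, public: f64, hidden: f64, adversarial: f64) -> f64 {
    if !build {
        return 0.0;
    }
    W_BUILD + W_PUBLIC * public + W_HIDDEN * hidden + W_ADVERSARIAL * adversarial
}

/// Average the weighted scores across all tasks (18 in the benchmark).
fn weighted_score(tasks: &[f64]) -> f64 {
    tasks.iter().sum::<f64>() / tasks.len() as f64
}

fn main() {
    // Example: a model that builds everywhere, passes all public tests,
    // 80% of hidden tests, and 50% of adversarial tests on every task.
    let tasks: Vec<f64> = vec![task_score(true, 1.0, 0.8, 0.5); 18];
    println!("{:.3}", weighted_score(&tasks)); // 0.770 under these assumed weights
}
```

With these assumed weights, a compile failure dominates: no amount of partial credit elsewhere can offset a build that doesn't succeed.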
Models are tested in three interaction modes: Generate (write from spec), Repair (fix broken code), and Migrate (upgrade existing contracts). Tasks span escrow, vaults, staking, multisig, vesting, and state management.
All models run in identical sandboxed environments with the same toolchain: Anchor 0.32.1, Solana 3.1.11, Rust 1.91.1. Runs are single-attempt and offline, with no retry loops, and every model receives the same prompt and starter code.