LLM Performance Analysis

Solana Smart Contract Benchmark

Evaluating frontier AI models on their ability to write, repair, and migrate Solana smart contracts across Anchor and native Rust.

18 Tasks
7 Categories
3 Modes
10 Models
View source on GitHub

Overall Score

Weighted score across 18 smart contract tasks


Model Results

Click any model for detailed breakdown and per-task results


Methodology

Scoring

Each task is scored across four stages: Build (compiles successfully), Public Tests (provided test cases), Hidden Tests (held-out correctness checks), and Adversarial Tests (security and edge-case exploits). The weighted per-task scores are then averaged across all 18 tasks.
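The aggregation above can be sketched in Rust. The stage weights below are illustrative placeholders, not the benchmark's published weighting; the only assumptions taken from the text are the four stages, that a build failure gates the test stages, and that per-task scores are averaged.

```rust
#[derive(Clone, Copy)]
struct StageResults {
    build: bool,      // did the contract compile?
    public: f64,      // fraction of public tests passed
    hidden: f64,      // fraction of hidden tests passed
    adversarial: f64, // fraction of adversarial tests passed
}

/// Weighted score for one task. The weights are hypothetical and chosen
/// only to make hidden and adversarial tests count more than the build.
fn task_score(r: StageResults) -> f64 {
    const W_BUILD: f64 = 0.10;
    const W_PUBLIC: f64 = 0.20;
    const W_HIDDEN: f64 = 0.40;
    const W_ADVERSARIAL: f64 = 0.30;
    // A contract that fails to build cannot run any tests, so it scores zero.
    if !r.build {
        return 0.0;
    }
    W_BUILD + W_PUBLIC * r.public + W_HIDDEN * r.hidden + W_ADVERSARIAL * r.adversarial
}

/// Overall score: unweighted mean of the per-task weighted scores.
fn overall_score(tasks: &[StageResults]) -> f64 {
    tasks.iter().map(|&r| task_score(r)).sum::<f64>() / tasks.len() as f64
}

fn main() {
    let tasks = [
        StageResults { build: true, public: 1.0, hidden: 0.8, adversarial: 0.5 },
        StageResults { build: false, public: 0.0, hidden: 0.0, adversarial: 0.0 },
    ];
    println!("{:.3}", overall_score(&tasks));
}
```

With these placeholder weights, a clean build with perfect tests scores 1.0, and a failed build scores 0.0 regardless of the other stages.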

Task Types

Models are tested in three interaction modes: Generate (write from spec), Repair (fix broken code), and Migrate (upgrade existing contracts). Tasks span escrow, vaults, staking, multisig, vesting, and state management.
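The task matrix described above can be modeled as a small Rust data structure. This is an illustrative sketch, not the harness's actual schema; the enum variants come from the modes and categories named in the text (six of the seven categories are listed), and the field names are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Generate, // write from spec
    Repair,   // fix broken code
    Migrate,  // upgrade existing contracts
}

// Six of the seven task categories are named in the summary.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Category {
    Escrow,
    Vault,
    Staking,
    Multisig,
    Vesting,
    StateManagement,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Framework {
    Anchor,
    NativeRust,
}

// Hypothetical task descriptor combining the three axes.
struct Task {
    id: u32,
    mode: Mode,
    category: Category,
    framework: Framework,
}

fn main() {
    let task = Task {
        id: 1,
        mode: Mode::Repair,
        category: Category::Escrow,
        framework: Framework::Anchor,
    };
    println!("task {}: {:?} / {:?}", task.id, task.mode, task.category);
}
```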

Environment

All models run in identical sandboxed environments with the same toolchain: Anchor 0.32.1, Solana 3.1.11, Rust 1.91.1. Each model gets a single attempt per task, offline, with no retry loops, and all models receive the same prompt and starter code.