LLM Performance Analysis
Evaluating frontier AI models on their ability to write, repair, and migrate Solana smart contracts across Anchor and native Rust.
View source on GitHub
Weighted score across 18 smart contract tasks, with a detailed breakdown and per-task results available for each model.
Each task is scored across four stages: Build (compiles successfully), Public Tests (provided test cases), Hidden Tests (secret correctness checks), and Adversarial Tests (security and edge-case exploits). Weighted scores are averaged across all 18 tasks.
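The page doesn't publish the stage weights, so the values below are illustrative assumptions. A minimal Rust sketch of how a four-stage task score and the cross-task average could be computed:

```rust
// Hypothetical scoring sketch. The stage weights are assumptions
// (not documented by the benchmark); they sum to 1.0.
const W_BUILD: f64 = 0.10;
const W_PUBLIC: f64 = 0.20;
const W_HIDDEN: f64 = 0.40;
const W_ADVERSARIAL: f64 = 0.30;

/// Score one task from its four stages. Test-stage inputs are pass
/// rates in [0.0, 1.0]; a failed build zeroes the whole task.
fn task_score(build: bool, public: f64, hidden: f64, adversarial: f64) -> f64 {
    if !build {
        return 0.0;
    }
    W_BUILD + W_PUBLIC * public + W_HIDDEN * hidden + W_ADVERSARIAL * adversarial
}

/// Average the weighted scores across all tasks (18 in the benchmark).
fn weighted_score(tasks: &[f64]) -> f64 {
    tasks.iter().sum::<f64>() / tasks.len() as f64
}

fn main() {
    // Example: a model that builds everywhere, passes all public tests,
    // 80% of hidden tests, and 50% of adversarial tests on every task.
    let tasks: Vec<f64> = vec![task_score(true, 1.0, 0.8, 0.5); 18];
    println!("{:.3}", weighted_score(&tasks)); // 0.770 under these assumed weights
}
```

With these assumed weights, a compile failure dominates: no amount of partial credit elsewhere can offset a build that doesn't succeed.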
Models are tested in three interaction modes: Generate (write from spec), Repair (fix broken code), and Migrate (upgrade existing contracts). Tasks span escrow, vaults, staking, multisig, vesting, and state management.
All models run in identical sandboxed environments with the same toolchain: Anchor 0.32.1, Solana 3.1.11, Rust 1.91.1. Runs are single-attempt and offline, with no retry loops, and every model receives the same prompt and starter code.