Mistral Devstral 2: A New Era for Coding AI
Mistral has introduced Devstral 2, a new state-of-the-art open-weights AI model designed as a highly capable coding agent for production-grade workflows. 🤖
Key Features & Benchmarks:
- Versions: Devstral 2 (123 billion parameters) and Devstral Small 2 (24 billion parameters, suitable for local deployment on consumer hardware). Both are currently free for a limited time in tools like Kilo Code.
- Benchmarks: Devstral 2 scores 72.2% on SWE-bench Verified, outperforming models such as GLM 4.6, MiniMax, and Qwen 3 while trailing DeepSeek V3.2 by only about 1%.
- Efficiency: It is reported to be more cost-efficient than Claude models on real-world tasks and approaches proprietary models from Google and Anthropic in performance, despite being open-weight and significantly smaller (Devstral 2 is 5x smaller and Devstral Small 2 is 28x smaller than DeepSeek V3.2).
- Human evaluation: In an independent annotation test, Devstral 2 achieved a 42.8% win rate against DeepSeek V3.2, though it remains behind Claude Sonnet 4.5.
- Context window: 256K tokens.
- Pricing: Devstral 2 costs $0.40 per 1 million input tokens and $2.00 per 1 million output tokens; Devstral Small 2 costs $0.10 per 1 million input tokens and $0.30 per 1 million output tokens. 💰
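To make the per-token pricing concrete, here is a minimal TypeScript sketch that turns the quoted rates into a cost estimate for a single request. The prices come from the figures above; the model keys (`devstral-2`, `devstral-small`), the `estimateCostUsd` helper, and the token counts in the example are illustrative assumptions, not official API identifiers or measured usage.

```typescript
// Prices in USD per 1 million tokens, as quoted in the summary above.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// The keys are shorthand labels for this sketch, not official model IDs.
const PRICING: Record<string, Pricing> = {
  "devstral-2": { inputPerMillion: 0.4, outputPerMillion: 2.0 },
  "devstral-small": { inputPerMillion: 0.1, outputPerMillion: 0.3 },
};

// Hypothetical helper: estimates the cost of one request in USD.
function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = PRICING[model];
  if (!price) {
    throw new Error(`Unknown model key: ${model}`);
  }
  const totalPerMillion =
    inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion;
  return totalPerMillion / 1000000;
}

// Assumed workload: 200,000 input tokens and 20,000 output tokens.
console.log(estimateCostUsd("devstral-2", 200000, 20000).toFixed(4)); // 0.1200
console.log(estimateCostUsd("devstral-small", 200000, 20000).toFixed(4)); // 0.0260
```

For this assumed workload the larger model costs roughly 12 cents and the smaller one under 3 cents, which shows how the gap in output-token price drives the difference on generation-heavy tasks.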
Use Cases & Testing:
- Backend Coding: For a URL shortener project using TypeScript, Hono, and better-sqlite3, Devstral 2 scaffolded the API from scratch, including database initialization, Zod validation, and multiple endpoints. It corrected TypeScript errors on its own and installed the required adapters itself. However, it occasionally used deprecated Zod schema syntax, so its output still needs manual review. All four endpoints it created (create, read, delete, redirect) worked as expected. ✅
- Frontend Coding: When tasked with building a "Deals" page for an existing React CRM dashboard, Devstral 2 created the page efficiently, reusing existing components and matching the application's established user experience (UX). CRUD operations, search, sorting, and filtering all worked without issue. 🎨
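For readers who want a feel for the backend task, the following is a dependency-free sketch of the four URL shortener operations (create, read, delete, redirect). It is not the code Devstral 2 generated. To stay runnable without installing anything, it assumes a `Map` in place of the better-sqlite3 table, a simple regular expression in place of Zod validation, and plain functions in place of Hono route handlers. All names and route paths in the comments are hypothetical.

```typescript
interface ShortLink {
  code: string;
  url: string;
  createdAt: string;
}

interface ApiResult {
  status: number;
  body?: unknown;
  location?: string; // set only for redirects
}

// In-memory stand-in for the database table (assumption of this sketch).
const links = new Map<string, ShortLink>();
let counter = 0;

// Hypothetical code generator: a base-36 counter keeps the example deterministic.
function nextCode(): string {
  counter += 1;
  return counter.toString(36);
}

// Minimal stand-in for schema validation: accepts only http(s) URLs with a host.
function isHttpUrl(value: unknown): value is string {
  return typeof value === "string" && /^https?:\/\/[^\s\/]+/i.test(value);
}

const notFound: ApiResult = { status: 404, body: { error: "not found" } };

// Create, e.g. POST /links (hypothetical path)
function createLink(payload: { url?: unknown }): ApiResult {
  if (!isHttpUrl(payload.url)) {
    return { status: 400, body: { error: "url must be an http(s) URL" } };
  }
  const link: ShortLink = {
    code: nextCode(),
    url: payload.url,
    createdAt: new Date().toISOString(),
  };
  links.set(link.code, link);
  return { status: 201, body: link };
}

// Read, e.g. GET /links/:code (hypothetical path)
function readLink(code: string): ApiResult {
  const link = links.get(code);
  return link ? { status: 200, body: link } : notFound;
}

// Delete, e.g. DELETE /links/:code (hypothetical path)
function deleteLink(code: string): ApiResult {
  return links.delete(code) ? { status: 204 } : notFound;
}

// Redirect, e.g. GET /:code (hypothetical path)
function redirect(code: string): ApiResult {
  const link = links.get(code);
  return link ? { status: 302, location: link.url } : notFound;
}

// Usage: create a link, follow it, then delete it.
const created = createLink({ url: "https://example.com/docs" });
const createdCode = (created.body as ShortLink).code;
console.log(created.status); // 201
console.log(redirect(createdCode).location); // https://example.com/docs
console.log(deleteLink(createdCode).status); // 204
console.log(redirect(createdCode).status); // 404
```

Keeping validation and storage behind small functions like these is one way to make the review step easier: a deprecated validation call would be confined to a single function.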
Overall Impression: Devstral 2 is described as a solid, practical, implementation-focused model. Because it is a "non-thinking" model, it excels at execution rather than deep planning. Its strengths include scaffolding functional backend APIs and integrating new frontend components into an existing design system. Its main weakness is an occasional reliance on outdated functions or patterns, so developers still need to review and correct its output. It is best suited to daily coding tasks, quick iterations, bug fixes, and legacy code. Nathan recommends Devstral 2 for its speed, capability, and helpfulness in accelerating development. 👍