Category: AI Model Comparisons - Page 2

10 Min Read

onApril 25, 2026

Allen Institute’s MolmoWeb 8B Beats GPT-4o on Web Navigation—First Open-Weight Agent to Outperform Proprietary Models Across 4 Major Benchmarks

An 8-billion parameter open-source model just outperformed GPT-4o at navigating the web. Allen Institute released everything—training code,…

9 Min Read

Artur Markus

onApril 24, 2026

Alibaba’s Qwen Hits 700 Million Downloads on Hugging Face—Overtakes Meta’s Llama as World’s Most Popular Open-Source AI Model

A Chinese AI model just quietly dethroned Meta’s Llama as the world’s most downloaded open-source system—and December’s…

9 Min Read

Artur Markus

onApril 15, 2026

Meta Llama 4 Launches April 5 with 109B-Parameter Scout Model—17B Active Parameters Fit on Single H100, Outperform GPT-4.5 and Claude 3.7 on STEM Benchmarks

Meta just made frontier AI fit on one GPU—and simultaneously banned 450 million Europeans from using it.

10 Min Read

Artur Markus

onApril 11, 2026

OpenAI o3-mini Launches January 31 with 95% Cost Cut Over GPT-4—First Small Reasoning Model with Function Calling Outperforms o1 on Coding Benchmarks

OpenAI’s cheapest reasoning model now beats its most expensive one at writing code. That sentence should make every CTO reconsider their…

14 Min Read

Artur Markus

onApril 4, 2026

Google Gemma 4 Ranks #3 on Arena AI Leaderboard—31B Open Model Hits 85.2% MMLU Pro and 89.2% AIME 2026, Outperforming Models 20× Larger

A 31-billion parameter model now outperforms systems with 600B+ parameters on reasoning and math benchmarks. Google just made the most…

9 Min Read

Artur Markus

onApril 3, 2026

Microsoft Maia 200 Cuts AI Token Costs 30% with 140 Billion Transistors—3nm Chip Deployed January 27 in US Data Centers, Outperforms Amazon Trainium and Google TPU on Inference

Microsoft just made your AI inference bill 30% cheaper—not through clever prompt engineering or model distillation, but by building the most…

8 Min Read

Artur Markus

onMarch 31, 2026

Gemini 3.1 Pro Preview Hits 94.1% on LM Council Benchmark—Google Takes Three of Top Four Leaderboard Spots as Claude 4.6 Opus and GPT-5.4 Trail by Double Digits

Google just claimed three of the top four spots on major AI benchmarks while charging 60% less than the competition. The Claude-GPT duopoly…

9 Min Read

Artur Markus

onMarch 16, 2026

University of Montreal Tests AI Against 100,000 Humans on Creativity—GPT-4 Beats 72% But Top 10% Still Win

GPT-4 now outperforms the average human on standardized creativity tests. But before you fire your creative team, consider this: when…

10 Min Read

Artur Markus

onMarch 5, 2026

Moonshot AI Releases Kimi K2.5 with 100-Agent Swarm Feature—Trained on 15 Trillion Tokens, Beats GPT-5.2 on Coding and Video Benchmarks

The AI scaling wars just took an unexpected turn: a Chinese startup released a model that orchestrates 100 parallel agents instead of chasing…

10 Min Read

Artur Markus

onFebruary 24, 2026

University of Montreal Study Proves AI Beats Average Humans on Creativity Tests—But Top 10% Still Outperform GPT-4

The world’s largest creativity study just revealed an uncomfortable truth: half of humanity is now less creative than a language model.…

AI Model Comparisons

Subscribe to my Blog