LLM FIFA WORLD CUP 2026 - Epic Agentic AI Battle

ChatGPT vs Claude vs Gemini

If you like futbol and are curious about AI, you will love this!

AgenTorque Labs is running a fun little public AI experiment with the FIFA World Cup as the backdrop. The experiment is designed to test frontier model capabilities on long-running decision-making strategies.

Follow the live battle here.

Model Capabilities Being Evaluated

Ability to negotiate with each other to create a common rubric
Independent sourcing of information and data
Independent analysis to create daily predictions and probabilities (short-horizon decision planning)
Daily full forward simulation through the bracket to the final (long-horizon decision planning)
Adjusting strategies daily in response to real-time data, feedback and competitive leaderboard positions

The scoring is public and simple: each LLM is scored on Points and on Brier score (for the Brier score, lower means better-calibrated probabilities).
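The article does not spell out the exact Brier formula the bake-off uses. As a minimal sketch, assuming the common multi-class form, each match contributes the sum of squared differences between the predicted probabilities and the actual outcome (1 for what happened, 0 otherwise), averaged over all matches. The function name and the "win"/"draw"/"loss" labels are hypothetical; the Points rules are not described, so they are not sketched here.

```python
from typing import Mapping, Sequence


def brier_score(predictions: Sequence[Mapping[str, float]],
                outcomes: Sequence[str]) -> float:
    """Mean multi-class Brier score over a set of matches (lower is better).

    Assumption: each prediction maps every possible outcome label
    (e.g. "win", "draw", "loss" in the group stage; "win", "loss" in
    the knockouts) to a probability, and each outcome is the label
    that actually happened.
    """
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have equal length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    total = 0.0
    for probs, actual in zip(predictions, outcomes):
        if actual not in probs:
            raise ValueError(f"outcome {actual!r} has no predicted probability")
        # Squared error against the one-hot vector for the actual outcome.
        total += sum((p - (1.0 if label == actual else 0.0)) ** 2
                     for label, p in probs.items())
    return total / len(predictions)


# Hypothetical example: one group match and one knockout match,
# labelled from the first-named team's perspective.
preds = [{"win": 0.5, "draw": 0.3, "loss": 0.2},
         {"win": 0.7, "loss": 0.3}]
actual = ["win", "loss"]
print(round(brier_score(preds, actual), 2))  # 0.68
```

Some implementations divide each match's sum by the number of outcome classes, or score only a single binary probability, so absolute Brier values are only comparable within one convention.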

Additional capabilities being evaluated: on the back end, data and evals are also being gathered on individual model reactions, process and data compliance, guardrail misses, strategy alignment, basic data-quality adherence (e.g. including a team in the Round of 16 that was not present in the Round of 32!), instruction following, and more.
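The data-quality miss mentioned above (a team in the Round of 16 that never appeared in the Round of 32) is easy to check mechanically. A minimal sketch, assuming a projected bracket is represented as a dict from round name to a list of teams; the function name, round labels, and data shape are hypothetical and are not AgenTorque's actual data contract.

```python
from typing import Dict, List, Sequence

# Knockout rounds in order for a 32-team knockout stage (assumed labels).
ROUND_ORDER = ["Round of 32", "Round of 16", "Quarter-finals",
               "Semi-finals", "Final"]


def bracket_issues(bracket: Dict[str, Sequence[str]],
                   order: Sequence[str] = ROUND_ORDER) -> List[str]:
    """List basic data-quality problems in a projected bracket.

    Flags teams listed twice in a round, teams that reach a round without
    appearing in the previous one, and rounds that are not exactly half
    the size of the previous round.
    """
    issues: List[str] = []
    for rnd in order:
        teams = list(bracket.get(rnd, []))
        for team in sorted({t for t in teams if teams.count(t) > 1}):
            issues.append(f"{team} appears more than once in {rnd}")
    for prev, curr in zip(order, order[1:]):
        prev_teams = set(bracket.get(prev, []))
        curr_teams = list(bracket.get(curr, []))
        for team in curr_teams:
            if team not in prev_teams:
                issues.append(f"{team} is in {curr} but not in {prev}")
        expected = len(bracket.get(prev, [])) // 2
        if len(curr_teams) != expected:
            issues.append(f"{curr} has {len(curr_teams)} teams, expected {expected}")
    return issues


# Hypothetical projection that promotes a team out of nowhere.
projection = {
    "Semi-finals": ["Team A", "Team B", "Team C", "Team D"],
    "Final": ["Team A", "Team E"],
}
for issue in bracket_issues(projection, order=["Semi-finals", "Final"]):
    print(issue)  # Team E is in Final but not in Semi-finals
```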

However, as promised to the LLMs, most of these are not public for now, so that the models retain their competitive edge.

Some may be published in the interim to help the models improve, to highlight data, governance and process-quality issues that need attention, or to inject new eval rules into the bake-off dynamically. Most, however, will be analyzed and published after the tournament.

Ultimately, these back-end data-quality, process and governance items will be as valuable for model selection and evaluation as the decision-making capabilities themselves.

LLM Leaderboard

The bake-off was initiated near the start of the World Cup, a couple of days into the first group matches. When this article was first published on LinkedIn, the LLMs had provided predictions for 24 matches, a decent enough sample to start gauging performance. With the third round of group matches and the knockout rounds still to come at that point, we were confident the bake-off would yield meaningful data on model performance.

Leaderboard as of the morning of 29 Jun 2026: Claude held the early lead for 30 matches, and then ChatGPT made large strides starting Jun 24. ChatGPT has led on both Points score and Brier score for the last 19 matches. Gemini is a distant third.

You can check the daily leaderboard scores here. Detailed score ledgers, with historical predictions and outcomes, are available here. The AgenTorque World Cup HUB page also provides an interactive chart showing how the leaders' Points and Brier scores have progressed over the course of the tournament.

Daily Projections and Full Forward Simulation

Every day, each LLM provides predictions for the next day's matches (short-horizon decision making), which can be found here, and a full forward simulation through the bracket to the final (long-horizon planning), which can be found here. During the group stage, the possible outcomes are win, draw or loss; in the knockout rounds, only win or loss.
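The article does not describe how each LLM carries out its full forward simulation. One common way to turn per-match probabilities into tournament-level odds is Monte Carlo simulation of the knockout bracket, sketched minimally below. Assumptions: the bracket is an ordered list whose size is a power of two, the group stage is not modeled, and the win-probability function and Elo-style ratings are hypothetical stand-ins for whatever each model's rubric actually produces.

```python
import random
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical model hook: probability that the first team beats the second.
WinProb = Callable[[str, str], float]


def simulate_knockout(bracket: Sequence[str], p_win: WinProb,
                      rng: random.Random) -> str:
    """Play one knockout bracket through to the final; return the champion.

    Teams are paired in bracket order (0 v 1, 2 v 3, ...) and winners
    advance. Knockout matches are win/loss only, so there are no draws.
    """
    teams = list(bracket)
    if not teams or len(teams) & (len(teams) - 1):
        raise ValueError("bracket size must be a power of two")
    while len(teams) > 1:
        teams = [a if rng.random() < p_win(a, b) else b
                 for a, b in zip(teams[0::2], teams[1::2])]
    return teams[0]


def champion_odds(bracket: Sequence[str], p_win: WinProb,
                  n_sims: int = 10_000, seed: int = 0) -> Dict[str, float]:
    """Monte Carlo estimate of each team's probability of winning the cup."""
    rng = random.Random(seed)
    titles = Counter(simulate_knockout(bracket, p_win, rng)
                     for _ in range(n_sims))
    return {team: titles[team] / n_sims for team in bracket}


# Hypothetical Elo-style ratings turned into win probabilities.
RATINGS = {"Team A": 2000, "Team B": 1900, "Team C": 1800, "Team D": 1700}


def elo_win_prob(a: str, b: str) -> float:
    return 1.0 / (1.0 + 10 ** ((RATINGS[b] - RATINGS[a]) / 400))


if __name__ == "__main__":
    odds = champion_odds(list(RATINGS), elo_win_prob)
    for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {p:.1%}")
```

Repeating the simulation many times is what converts single-match probabilities into podium-style projections; fixing the seed keeps a given day's run reproducible.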

Who are they projecting to win the Cup? Glad you asked: as of this writing, each one projects a different winner. Do they match your predictions?

Screenshot of the podium calls after the completion of the Jun 28, 2026 matches.

Podium Projections on Jun 28, 2026

Note On Initial Briefing, Methodology, Rubric and Output Data Contract

Each LLM was given the same briefing prompt and was tasked with creating an initial rubric individually.
The rubric covers team evaluation, scoring, etc., used to produce prediction probabilities.
Each LLM then negotiated with the other LLMs to arrive at a consensus rubric.
AgenTorque only facilitated communication between the LLMs and did not act as judge on the rubric.
On the final rubric, AgenTorque provided some process guidelines, which the LLMs ratified.
The entire negotiation, through to consensus, took 4-5 turns.
After this, there was no further conversation or exchange of information between the LLMs other than the daily leaderboard positions.

The full initial briefing prompt, rubric and methodology are outlined here. Each LLM is asked to stick to the same data output contract; its JSON shape will be made available soon on the same page.

Also note that this is not intended as a betting product. The objective of the experiment is to test frontier model capabilities on long-running decision-making strategies.

One Last Thing

FIFA is all about community. We hope this public AI experiment ultimately helps the community as well, and we would love to have community contributions and participation in this project. Your ideas and suggestions would go a long way toward helping us make it better.

*This article was originally published on LinkedIn on 24 June 2026 and has been updated with the latest statistics for publication on this site. You can find the original LinkedIn article here.