We all use ChatGPT or Claude daily at this point. But you know that moment when you get a confident answer and something feels off? So you open another tab, paste the same question into Gemini, get a completely different answer, and now you're more confused than before.
I got sick of the tab-switching game, so I set up a system where four models (GPT, Gemini, DeepSeek, Grok) debate a single question with assigned roles. One proposes, others critique, one synthesizes. They run multiple rounds until they converge on a verdict with a confidence score.
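The propose/critique/synthesize loop can be sketched as plain control flow. Everything below is a minimal, hypothetical illustration, not the author's actual code: `call_model` is a stand-in for the real provider APIs (OpenAI, Gemini, DeepSeek, Grok) and just returns canned text so the loop runs offline.

```python
# Minimal sketch of a role-based multi-model debate loop.
# `call_model` is a hypothetical placeholder; a real version would
# dispatch to each provider's API client.

MODELS = ["gpt", "gemini", "deepseek", "grok"]

def call_model(model: str, prompt: str) -> str:
    # Placeholder response so the control flow is runnable offline.
    return f"[{model}] response to: {prompt[:40]}"

def debate(question: str, rounds: int = 3) -> dict:
    # Assign roles: first model proposes, last synthesizes,
    # the rest act as critics.
    proposer, *rest = MODELS
    critics, synthesizer = rest[:-1], rest[-1]

    proposal = call_model(proposer, f"Propose an answer: {question}")
    for _ in range(rounds):
        # Each critic attacks the current proposal.
        critiques = [
            call_model(c, f"Critique this answer:\n{proposal}") for c in critics
        ]
        # Proposer revises in light of the critiques.
        proposal = call_model(
            proposer,
            "Revise your answer given these critiques:\n" + "\n".join(critiques),
        )
    # Synthesizer produces the final verdict with a confidence score.
    verdict = call_model(
        synthesizer,
        f"Synthesize a final verdict with a confidence score:\n{proposal}",
    )
    return {"verdict": verdict, "rounds": rounds}

result = debate("Which AI model is currently the best overall?")
```

In a real system you would also want a stopping condition (e.g. end early once the critics stop raising new objections) instead of a fixed round count.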
First thing I tested: “Which AI is the best overall?” Grok won at 75% consensus. GPT agreed, DeepSeek sided with Grok, Gemini was the only one pushing for itself. None picked Claude or ChatGPT.
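The post reports a 75% consensus for Grok (three of four models agreeing). One plausible way to compute a score like that, assuming each model casts a single vote, is a simple tally; this is a guess at the mechanism, not the author's actual scoring code.

```python
from collections import Counter

def consensus(votes: dict) -> tuple:
    """Return (winning candidate, fraction of models that voted for it).

    `votes` maps each model's name to the candidate it picked.
    """
    tally = Counter(votes.values())
    winner, count = tally.most_common(1)[0]
    return winner, count / len(votes)

# Matches the vote split described in the post: 3 of 4 models pick Grok.
winner, score = consensus(
    {"gpt": "grok", "deepseek": "grok", "grok": "grok", "gemini": "gemini"}
)
```

Here `score` comes out to 0.75, the 75% figure from the test run.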
The interesting part is that weak arguments die fast when three models are actively attacking them. A single model can hallucinate confidently, but it's much harder to sustain a bad argument when critics are poking holes in it every round.
Has anyone else experimented with multi-model debate or consensus systems? Curious what approaches others are taking.
Here's the full debate result: Which AI model is currently the best overall, in terms of…
submitted by /u/Fluffy-4213