Skip to content

sgoedecke/ai-poker-arena

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Poker Arena

Uses GitHub Models to compare the performance of many small models in a simulated game of poker:

Image

Setup

Set your GITHUB_TOKEN env var to a GitHub PAT (doesn't need any permissions). If you have gh installed you can use the one from gh auth token.

Then npm install and npm run start to start the local server.

You can plug in any model from the list.

Motivation

AI models are often evaluated against benchmarks or with direct human voting (e.g. LLMSYS/Chatbot Arena). Benchmarks have many known issues (leaking into training data, evaluating mostly-right answers, etc), and human voting biases towards longer and more impressive-sounding answers. A lot of the most informed people judge models based on vibe, or "big model smell". There's been some recent work at putting models in a simulated space (e.g. a Minecraft build-off here) to get a sense of their creativity and ability to construct a large or complex project, but that's really early so far.

I thought it'd be interesting to evaluate models based on their competition with each other in a simulated space: purely adversarial.

Disclaimer: I work on GitHub Models at GitHub, but this isn't a formal GitHub project or affiliated in any way. I built this on the weekend because I thought it was a neat idea.

About

Making multiple LLMs play Texas Holdem against each other

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors