AI Poker Arena

Uses GitHub Models to compare the performance of many small models in a simulated game of poker.

Setup

Set the GITHUB_TOKEN environment variable to a GitHub personal access token (PAT); the token doesn't need any permissions. If you have the gh CLI installed, you can use the token printed by gh auth token.

Then run npm install followed by npm run start to start the local server.
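The token step above can be checked before starting the server. The following is a minimal sketch, not code from this repository: resolveGithubToken is a hypothetical helper that reads GITHUB_TOKEN from the environment and, if it is unset, falls back to the output of gh auth token. It assumes the gh CLI is on your PATH for the fallback to work.

```javascript
// Hypothetical pre-flight helper; NOT part of this repository.
// Assumption: the server only needs a non-empty GITHUB_TOKEN value.
const { execSync } = require("node:child_process");

// Default fallback: ask the GitHub CLI for the current token.
// Throws if gh is not installed or not logged in.
function tokenFromGhCli() {
  return execSync("gh auth token", {
    encoding: "utf8",
    stdio: ["ignore", "pipe", "ignore"],
  });
}

// Returns a trimmed token from env.GITHUB_TOKEN, or from the fallback
// function if the variable is unset or blank. Throws if neither works.
function resolveGithubToken(env = process.env, runGh = tokenFromGhCli) {
  const fromEnv = (env.GITHUB_TOKEN || "").trim();
  if (fromEnv) return fromEnv;

  try {
    const fromGh = String(runGh()).trim();
    if (fromGh) return fromGh;
  } catch (err) {
    // gh is missing or unauthenticated; fall through to the error below.
  }

  throw new Error(
    "GITHUB_TOKEN is not set and `gh auth token` did not return a token"
  );
}

module.exports = { resolveGithubToken };
```

Calling resolveGithubToken() with no arguments uses the real environment and the real gh CLI; both parameters exist only so the lookup can be exercised without touching either.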

You can plug in any model from the list.

Motivation

AI models are often evaluated against benchmarks or with direct human voting (e.g. LMSYS/Chatbot Arena). Benchmarks have many known issues (leaking into training data, evaluating mostly-right answers, etc.), and human voting is biased towards longer and more impressive-sounding answers. A lot of the most informed people judge models based on vibe, or "big model smell". There's been some recent work on putting models in a simulated space (e.g. a Minecraft build-off) to get a sense of their creativity and ability to construct a large or complex project, but that work is still at a really early stage.

I thought it'd be interesting to evaluate models based on their competition with each other in a simulated space: purely adversarial.

Disclaimer: I work on GitHub Models at GitHub, but this isn't a formal GitHub project or affiliated in any way. I built this on the weekend because I thought it was a neat idea.
