A Picture is Worth 170 Tokens: How Does GPT-4o Encode Images?

Here’s a fact: GPT-4o charges 170 tokens to process each 512x512 tile used in high-res mode. At ~0.75 tokens/word, this suggests a picture is worth about 227 words—only a factor of four off from the traditional saying. (There’s also an 85 tokens charge for a low-res ‘master thumbnail’ of each picture and higher resolution images are broken into many such 512x512 tiles, but let’s just focus on a single high-res tile.
Read more...

Let's Play Jeopardy! with LLMs

Update 2024-05-14: Hot off the presses, the benchmark now includes the recently released GPT-4o model! How good are LLMs at trivia? I used the Jeopardy! dataset from Kaggle to benchmark ChatGPT and the new Llama 3 models. Here are the results: There you go. You’ve already gotten 90% of what you’re going to get out of this article. Some guy on the internet ran a half-baked benchmark on a handful of LLM models, and the results were largely in line with popular benchmarks and received wisdom on fine-tuning and RAG.
Read more...

My Dinner with ChatGPT

It's hard to talk about ChatGPT without cherry-picking. It's too easy to try a dozen different prompts, refresh each a handful of times, and report the most interesting or impressive thing from those sixty trials. While this problem plagues a lot of the public discourse around generative models, cherry-picking is particularly problematic for ChatGPT because it's actively using the chat history as context. (It might be using a $\mathcal{O}(n \log{} n)$ attention model like reformer or it might just be brute forcing it, but either it has an impressively long memory; about 2048 "
Read more...