weekend ai reads for 2024-04-26

📰 ABOVE THE FOLD: BENCHMARKS

“The first thing to recognise is that it’s very hard to really properly evaluate models in the same way it’s very hard to properly evaluate humans,” said Mike Volpi, a partner at venture capital firm Index Ventures. “If you look at one thing like ‘can you jump high or run fast?’ it’s easy. But human intelligence? It’s almost an impossible task.”

Traditional benchmarks are often static or closed-ended (e.g., MMLU multiple-choice QA), which do not satisfy the above requirements. On the other hand, models are evolving faster than ever, underscoring the need to build benchmarks with high separability.

We introduce Arena-Hard – a data pipeline to build high-quality benchmarks from live data in Chatbot Arena, which is a crowd-sourced platform for LLM evals. To measure its quality, we propose two key metrics:

1. Agreement with human preference: whether the benchmark score agrees strongly with human preference.

2. Separability: whether the benchmark can confidently separate models.

  • detailed look into what it takes to develop a benchmark for a specific domain
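To make the two metrics concrete, here is a minimal Python sketch. It is not Arena-Hard's own code. It assumes each model already has per-prompt scores (1 = win, 0 = loss against some fixed baseline) and that a human-preference ranking, such as one derived from Chatbot Arena, is available. The model names, data, and helper functions are hypothetical. Separability is taken here as the share of model pairs whose bootstrap 95% confidence intervals do not overlap. Agreement is the share of those confidently separated pairs that the benchmark orders the same way humans do.

```python
import random
from itertools import combinations
from statistics import mean


def bootstrap_ci(scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a model's mean score."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample prompts with replacement and record the mean each time.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def separability(cis):
    """Share of model pairs whose confidence intervals do not overlap."""
    pairs = list(combinations(cis, 2))
    separated = sum(
        1 for a, b in pairs if cis[a][1] < cis[b][0] or cis[b][1] < cis[a][0]
    )
    return separated / len(pairs)


def agreement(cis, human_rank):
    """Among confidently separated pairs, share ordered the same way as
    human_rank (a lower rank number means humans prefer that model)."""
    agree = total = 0
    for a, b in combinations(cis, 2):
        if cis[a][0] > cis[b][1]:      # benchmark confidently prefers a
            winner, loser = a, b
        elif cis[b][0] > cis[a][1]:    # benchmark confidently prefers b
            winner, loser = b, a
        else:
            continue                   # overlapping intervals: skip the pair
        total += 1
        if human_rank[winner] < human_rank[loser]:
            agree += 1
    return agree / total if total else 0.0


# Hypothetical per-prompt results (1 = beat the baseline, 0 = lost).
scores = {
    "model_a": [1] * 45 + [0] * 5,
    "model_b": [1] * 25 + [0] * 25,
    "model_c": [1] * 5 + [0] * 45,
}
cis = {m: bootstrap_ci(s) for m, s in scores.items()}
human_rank = {"model_a": 1, "model_b": 2, "model_c": 3}
print(separability(cis), agreement(cis, human_rank))
```

Requiring non-overlapping intervals is a conservative choice. A benchmark only gets credit for orderings it can make with confidence, so a small or noisy prompt set scores poorly on separability even if its point estimates happen to rank the models correctly.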

A.I. Has a Measurement Problem — Which A.I. system writes the best computer code or generates the most realistic image? Right now, there’s no easy way to answer those questions. / New York Times (8 minutes)

“All of these benchmarks are wrong, but some are useful,” he said. “Some of them can serve some utility for a fixed amount of time, but at some point, there’s so much pressure put on it that it reaches its breaking point.”

AI is getting so clever, so fast, that many of the benchmarks used up to now are obsolete. Indeed, researchers in this area are scrambling to develop new, more challenging benchmarks. To put it simply, AIs are getting so good at passing tests that we now need new tests – not to measure competence, but to highlight areas where humans and AIs are still different, and find where we still have an advantage.

📻 QUOTE OF THE WEEK

I think that people don’t realize how much they expose by simply putting a picture out there.

Michal Kosinski, associate professor of organizational behavior at Stanford University’s Graduate School of Business (source) (the paper)

🏗️ FOUNDATIONS & CULTURE

When it comes to artificial intelligence, what are we actually creating? Even those closest to its development are struggling to describe exactly where things are headed, says Microsoft AI CEO Mustafa Suleyman, one of the primary architects of the AI models many of us use today. He offers an honest and compelling new vision for the future of AI, proposing an unignorable metaphor — a new digital species — to focus attention on this extraordinary moment.

What does it mean to make something totally new, fundamentally different to any invention that we’ve known before?

WHY AI Works (32:40) / YouTube

How I Created an AI Version of Myself / Keith McNulty, Medium (15 minutes)

Now that I have my documents of an appropriate length, I will need to load them into a vector database. A vector database stores text both in its original form and as embeddings: large arrays of floating-point numbers that are fundamental to how large language models process language. Words, sentences, or documents whose embeddings are ‘close’ in multidimensional space will be closely related to each other in content.

  • related, Why did I deepfake myself? To see if conversing with an AI-generated version of myself can lead to self-reflection, new insights into my thought patterns, and deep truths. (15:00) / Reid Hoffman, Twitter (sorry)

    • simultaneously bizarre and fascinating

    • Reid Hoffman has a 15-minute conversation with his AI avatar, trained on his own books, writings, talks, etc.
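The vector-database step McNulty describes can be sketched in a few lines of Python. This is a toy, not his setup. `toy_embed` is a hypothetical stand-in for a real embedding model: it hashes words into a fixed-size vector so the example runs offline. `ToyVectorStore` is an illustrative in-memory store, not any real vector database's API. It does show the two ideas in the excerpt: each document is kept as original text plus embedding, and retrieval returns the texts whose embeddings are closest (by cosine similarity) to the query's.

```python
import hashlib
import math
import re

DIM = 1024  # hypothetical embedding size for this toy


def toy_embed(text, dim=DIM):
    """Stand-in for a learned embedding model: a hashed bag of words,
    scaled to unit length. sha256 is used (not hash()) so results are
    stable across Python processes."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z']+", text.lower()):
        idx = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class ToyVectorStore:
    """Keeps each document's original text alongside its embedding."""

    def __init__(self):
        self.items = []  # list of (text, embedding) pairs

    def add(self, text):
        self.items.append((text, toy_embed(text)))

    def query(self, question, k=2):
        q = toy_embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
for doc in [
    "People analytics helps organisations understand their workforce.",
    "Regression models estimate relationships between variables.",
    "My favourite football team had a great season.",
]:
    store.add(doc)
print(store.query("how do regression models work", k=1))
```

A hashed bag of words only captures shared vocabulary. A learned embedding model would also place paraphrases such as "pay" and "salary" close together, which is what makes vector search useful for retrieving passages in someone's own writing.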

🎓 EDUCATION

ASU+GSV 2024 Conference Notes / On EdTech Newsletter (7 minutes)

What was encouraging for me is that we seem to be getting over that annoying moral panic phase of AI where everything seemed driven and dominated by a fear of cheating and the attempts to detect it. I believe we are in a new phase now where colleges and universities and vendors and investors are exploring what AI can do and how it might fit into education, but crucially we haven’t figured it out yet.

The AI Tools in Education Database / EdTech Insiders, Notion

This database is intended to be a community resource for educators, researchers, students, and other edtech specialists looking to stay up to date.

  • 320 products with a description and summary of features (e.g., “Designed for Learners”, “Homework help”)

All of the questions apply a culturally aware perspective rather than a traditional edtech adoption perspective (though the traditional perspective is another useful lens to evaluate this moment of rapid AI integration).

Kaiden AI – AI Teaching Assistant

Kai is your AI-powered teaching assistant, designed to save you time on lesson planning, content creation, and grading. It integrates chat, file uploads, and a robust knowledge base to streamline your workflow.

📊 DATA & TECHNOLOGY

Vana — The first network for user-owned data

  • aims to empower individuals to own and control their data by creating a market and infrastructure to sell aggregated data to model trainers, with the goal of preventing a centralized state where the power of AI is held by a few

  • as data starts to carry a real dollar value (see related), expect personal agency over data to be discussed more

  • related (1), Inside Big Tech's underground race to buy AI training data / Reuters (10 minutes)

  • related (2), Is there enough text to feed the AI beast? / Semafor (4 minutes)

While the Phi-3 family of models has some general knowledge, it cannot beat GPT-4 or another LLM in breadth: there’s a big difference between the answers you can get from an LLM trained on the entirety of the internet and those from a smaller model like Phi-3.

🎉 FUN and/or PRACTICAL THINGS

AI Image Generator — Free Text-to-Image Generator

  • select a style and enter a prompt and it will extend the prompt accordingly

  • possibly a good way to learn image prompting techniques

  • related, updated diffusion model from Adobe Firefly

Limitless — Personalized AI powered by what you’ve seen, said, and heard

  • no

  • an AI-powered device that claims to enhance workplace productivity by automating tasks like meeting transcription and summarization

  • it seems like a privacy nightmare for non-consenting people since it is designed to continuously collect and analyze everything the wearer sees, says, and hears

  • related, The Ray-Ban Meta Smart Glasses have multimodal AI now / The Verge (6 minutes)

    • also no

MiniFigure AI — Turn Your Headshot Into a (Lego) MiniFigure

  • does what it advertises, while also generating some interesting glitches

🧿 AI-ADJACENT

function musicFor(task = 'programming') { return `A series of mixes intended for listening while ${task} to focus the brain and inspire the mind.`; }

  • good background music and great interface