weekend ai reads for 2024-09-13

📰 ABOVE THE FOLD: THE PETER PRINCIPLE (you know who you are)

And so, “good followers” get promoted. But there comes a point in a promotion cycle when you need to stop following and take the lead. You need out-of-the-box thinking and need to rally the troops for a brave, daring assault that no one else saw coming.

The employment agency through which I got my fake job made no epistemological distinction between knowing something and knowing about something. JavaScript, for example, a computer programming language: I knew about it but did not know it. Fine.

Will AI make us overconfident? — Like the internet or a magical sidekick, chatbots are reorganizing knowledge to be more interactive and more accessible. / Ted Underwood (8 minute read)

These field experiments, which were run by the companies as part of their ordinary course of business, provided a randomly selected subset of developers with access to GitHub Copilot, an AI-based coding assistant that suggests intelligent code completions. Though each separate experiment is noisy, combined across all three experiments and 4,867 software developers, our analysis reveals a 26.08% increase (SE: 10.3%) in the number of completed tasks among developers using the AI tool. Notably, less experienced developers showed higher adoption rates and greater productivity gains.
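To show how much uncertainty sits behind the headline figure, here is a back-of-the-envelope 95% confidence interval. It assumes the pooled estimate is roughly normal and that the SE is in the same percentage units as the estimate. Both are our assumptions; the excerpt gives only the point estimate and the SE.

```latex
\text{95\% CI} \approx \hat{\Delta} \pm 1.96\,\mathrm{SE}
  = 26.08\% \pm 1.96 \times 10.3\%
  = 26.08\% \pm 20.19\%
  \approx [5.9\%,\ 46.3\%]
```

Read this way, the pooled data fit anything from a modest gain to a very large one.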

GenAI Increases Productivity & Expands Capabilities / Boston Consulting Group (18 minute read)

Yet the people who participated in the coding task scored the same on the assessment as people who didn’t do the coding task. Performing the data-science tasks in our experiment thus did not increase participants’ knowledge.

  • the headline is misleading; they bury the lede, which is that it doesn’t actually make users better, it just helps them finish a task more efficiently

By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.

  • a minor annoyance this week

  • this was breathlessly touted all over but it’s clear that hardly anyone took time to read the paper

  • the study focused on research on prompting Claude, so “novel research ideas” is already a pretty constrained space

  • then they have Claude rewrite all the human submissions, which makes them worse? so of course the human submissions didn’t fare better

 

📻 QUOTE OF THE WEEK

The entire concept of work that we have had for thousands of years was a temporary model that was required to solve a temporary problem. Namely, people who are trying to build or sell something that required work they were unable to do by themselves.

Daniel Miessler (source)

 

🏗️ FOUNDATIONS & CULTURE

OpenAI o1 Hub — We've developed a new series of AI models designed to spend more time thinking before they respond. Here is the latest news on o1 research, product and other updates. / OpenAI blog (4 minute read)

Andrej Karpathy from OpenAI and Tesla / No Priors, YouTube (44 minute video)

In this episode, Andrej discusses the evolution of self-driving cars, comparing Tesla's and Waymo’s approaches, and the technical challenges ahead. They also cover Tesla’s Optimus humanoid robot, the bottlenecks of AI development today, and how AI capabilities could be further integrated with human cognition. Andrej shares more about his new mission Eureka Labs and his insights into AI-driven education and what young people should study to prepare for the reality ahead.

Part of the reason is that many applicants simply pick one of the preset CV templates offered by their software provider. These can be customised with text and images, but tend to look similar.

  • they get no sympathy from us because AI is vetting all these job applications for them anyway

How CEOs Are Using Gen AI for Strategic Planning / Harvard Business Review (7 minute read)

Apple’s iPhone 16 AI is useful so far, except when it’s bonkers — The new iPhone is all about artificial intelligence. But in tests of its prerelease software, it does an uncomfortable amount of making things up. / Washington Post (11 minute read)

Google’s AI Will Help Decide Whether Unemployed Workers Get Benefits — [Nevada] is working with Google on a first-of-its-kind generative AI system that will analyze transcripts from appeals hearings and issue a recommended decision in an effort to clear a stubborn backlog of claims. / Gizmodo (11 minute read)

 

🎓 EDUCATION

Here’s how ed-tech companies are pitching AI to teachers / MIT Technology Review (8 minute read)

“We know from plenty of research that teacher workload actually comes from data collection and analysis, reporting, and communications,” he says. “Those are all areas where AI can help.”

Then there are a host of not-so-menial tasks that teachers are more skeptical AI can excel at. They often come down to two core teaching responsibilities: lesson planning and grading.

Survey: College advisers could benefit from AI assistance — A new report from Tyton Partners encourages institutional leaders and academic advisers to consider the role of generative artificial intelligence to support advising caseloads and course mapping. / Inside Higher Ed (7 minute read)

Stanford students train AI to help with college essays — Two entrepreneurial Stanford students fed hundreds of essays—both high and low quality—into an AI model to train it on what top-tier colleges look for in admissions essays. / Inside Higher Ed (8 minute read)

  • crowded space; related, Athena — AI College Application Help

Students who used Athena this last application season saw a 3x higher acceptance rate to Top 15 universities.

Mayo Clinic launching AI education program — Harper Family Foundation provided $10M to train staff and medical professionals to deploy AI technology ethically for patients, system says. / Healthcare Finance News (5 minute read)

Cybersecurity, AI Remain Top Concerns for State Ed-Tech Leaders / Government Technology (5 minute read)

 

📊 DATA & TECHNOLOGY

For now, though, we’re making a calculation. We’re a startup that needs to move fast. We know other companies are making different choices, but we’re choosing to spend our precious time on what we know works instead of risking it on something new.

  • thoughtful analysis of why “throwing a chatbot at it” is not always the correct answer

Futures of the data foundry business model — Scale AI’s future versus further scaling of language model performance. How Nvidia may take all the margins from the data market, too. / Interconnects, Substack (sorry) (14 minute read)

Aicado — AI Implementation Hub for Non-Technicals

When A.I.’s Output Is a Threat to A.I. Itself — As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results. / New York Times (13 minute read)

  • useful visuals to understand the implications

  • related, How Will A.I. Learn Next? — As chatbots threaten their own best sources of data, they will have to find new kinds of knowledge. / The New Yorker (21 minute read)
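For a feel of the feedback loop the NYT piece describes, here is a toy simulation of a “model” trained only on its predecessor’s output. This is an illustration we chose, not the article’s method. It models the data as a 1-D Gaussian and uses 10 samples per generation, 200 generations, and a fixed seed; all of these are hypothetical choices. Each generation refits the mean and spread to a small synthetic sample, so estimation noise compounds and the spread tends to collapse.

```python
# Toy "model collapse" loop (illustrative assumptions: 1-D Gaussian model,
# n=10 samples per generation, 200 generations, fixed seed).
# Each generation is fit only to samples drawn from the previous generation.
import random
import statistics


def collapse(generations: int = 200, n: int = 10, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        # Synthetic data produced by the current generation's model.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # Next generation refits mean and spread (population/MLE std).
        mu = statistics.fmean(sample)
        sigma = statistics.pstdev(sample)
        history.append(sigma)
    return history


stds = collapse()
print(f"std at generations 0, 50, 200: {stds[0]:.3f}, {stds[50]:.3g}, {stds[-1]:.3g}")
```

Using `statistics.stdev` instead of `pstdev` slows the shrinkage. In this toy setup it does not stop it, because the noise from small samples still drifts the log-spread downward.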

The point of a dashboard isn’t to use a dashboard / Terence Eden’s Blog (8 minute read)

Every so often, an employer asks me to help make a dashboard.

Usually, this causes technologists to roll their eyes. They have a vision of a CEO grandly staring at a giant projection screen, watching the pretty graphs go up and down, and making real-time decisions about Serious Business. Ugh! What a waste of time!

The thing is - that's not what a dashboard is for. And that's generally not why a CEO wants it.

A dashboard shows that you have access to your data. And that is a huge deal.

  • honestly, this whole article should be the quote of the week

It is incredibly easy to game the LLM benchmarks. / Jim Fan, Twitter (sorry) (3 minute read)

It’s incredible that people still get excited by MMLU or HumanEval numbers in Sept, 2024. These benchmarks are seriously broken, and gaming them can be an undergrad homework project.

I would not trust any claims of a superior model until I see the following:

1. ELO points on LMSys Chatbot Arena. It’s difficult to game democracy in the wild.

2. Private LLM evaluation from a trusted 3rd party, such as Scale AI’s benchmark. The test set must be well-curated and held secret, otherwise it quickly loses potency.
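The first criterion depends on Elo-style ratings built from head-to-head votes. Below is a rough sketch of the classic Elo update applied to pairwise votes. The K-factor of 32, the starting rating of 1000, the model names, and the vote tuples are all hypothetical. Chatbot Arena’s actual leaderboard computation may differ from this simple online update.

```python
# Classic Elo update over pairwise votes (hypothetical K=32, start=1000).
# Each vote is (model_a, model_b, outcome): 1.0 = A wins, 0.0 = B wins, 0.5 = tie.
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(votes, k: float = 32.0, start: float = 1000.0) -> dict:
    ratings = defaultdict(lambda: start)
    for a, b, outcome in votes:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (outcome - e_a)
        ratings[a] += delta
        ratings[b] -= delta  # zero-sum: B loses exactly what A gains
    return dict(ratings)


votes = [
    ("model-x", "model-y", 1.0),
    ("model-x", "model-z", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-z", "model-x", 0.0),
]
for name, rating in sorted(elo_ratings(votes).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Each vote moves the two ratings by equal and opposite amounts, so the total stays fixed. The order in which votes arrive also changes the final numbers, which is one reason a batch fit over all votes is sometimes preferred.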

 

🎉 FUN and/or PRACTICAL THINGS

Superhuman Automated Forecasting / Center for AI Safety (8 minute read)

  • the prompt via Dan Hendrycks on Twitter (sorry)

  • as of writing time, “Will Trump win the 2024 US presidential election?” gives a probability of 65%; it was 52% prior to the debate; do with that what you will

Gentype — Make an alphabet out of anything

  • text-to-font; prompts are loosely guardrailed

  • from Google Labs; free, requires Google login

Amazon says its football AI can predict blitzes — The new Thursday Night Football system will tackle ‘a huge hole in our football stats toolbox.’ / Popular Science (6 minute read)

  • we were worried the smartest minds in the world wouldn’t get around to solving this problem

Focus Buddy — Boost Productivity with AI-Powered Focus Sessions

  • seems geared toward managing ADHD or ADHD-like tendencies

La Baye Aréa, French TV Series / TRBDRK, YouTube (1 minute video)

  • we’re not going to be able to describe this faster than you can watch it, but it’s a title sequence for a French series featuring La Baye Aréa luminaries like Jacques D’Orsay, Samuel Alumont, Marc Le Zuber, and many others

  • made with Midjourney 6 + Runway Gen-3 + Udio

  • the people still look very uncanny valley, but the lighting, lenses, camera movements, and production design are excellent

Trishasode, Episode 4 / Trisha Code, YouTube (19 minute video)

  • if the first 75 seconds don’t amuse you, don’t bother with the remaining 1,100 or so

 

🧿 AI-ADJACENT

  • three guesses to select the dataset that matches the line chart shown

  • daily puzzle, but you can go back and try others

  • our wish list: overlay the incorrect answers’ line charts on the main one after the round is over