That AI Thing
weekend ai reads for 2024-09-13

September 13, 2024

📰 ABOVE THE FOLD: THE PETER PRINCIPLE (you know who you are)

The “Peter Principle”: Why corporate incompetency is inevitable / Big Think (9 minute read)

And so, “good followers” get promoted. But there comes a point in a promotion cycle when you need to stop following and take the lead. You need out-of-the-box thinking and need to rally the troops for a brave, daring assault that no one else saw coming.

related and very good, The Contingency Contingent / N Plus One Magazine (38 minute read)

The employment agency through which I got my fake job made no epistemological distinction between knowing something and knowing about something. JavaScript, for example, a computer programming language: I knew about it but did not know it. Fine.

Will AI make us overconfident? — Like the internet or a magical sidekick, chatbots are reorganizing knowledge to be more interactive and more accessible. / Ted Underwood (8 minute read)

The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers / Social Science Research Network (28 minute read)

These field experiments, which were run by the companies as part of their ordinary course of business, provided a randomly selected subset of developers with access to GitHub Copilot, an AI-based coding assistant that suggests intelligent code completions. Though each separate experiment is noisy, combined across all three experiments and 4,867 software developers, our analysis reveals a 26.08% increase (SE: 10.3%) in the number of completed tasks among developers using the AI tool. Notably, less experienced developers showed higher adoption rates and greater productivity gains.

paper at author’s site [PDF]

GenAI Increases Productivity & Expands Capabilities / Boston Consulting Group (18 minute read)

Yet the people who participated in the coding task scored the same on the assessment as people who didn’t do the coding task. Performing the data-science tasks in our experiment thus did not increase participants’ knowledge.

the headline is misleading; they bury the lede, which is that it doesn't actually make users better; it just helps them accomplish a task more efficiently

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers / arXiv (35 minute read)

By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.

a minor annoyance this week
this was breathlessly touted all over, but it's clear that hardly anyone took the time to read the paper
the study focused on research on prompting Claude, so "novel research ideas" is already a pretty constrained space
then they had Claude rewrite all the human submissions, which presumably made them worse? so of course the human submissions didn't fare better

📻 QUOTE OF THE WEEK

The entire concept of work that we have had for thousands of years was a temporary model that was required to solve a temporary problem. Namely, people who are trying to build or sell something that required work they were unable to do by themselves.

Daniel Miessler (source)

🏗️ FOUNDATIONS & CULTURE

OpenAI o1 Hub — We've developed a new series of AI models designed to spend more time thinking before they respond. Here is the latest news on o1 research, product and other updates. / OpenAI blog (4 minute read)

Andrej Karpathy from OpenAI and Tesla / No Priors, YouTube (44 minute video)

In this episode, Andrej discusses the evolution of self-driving cars, comparing Tesla's and Waymo’s approaches, and the technical challenges ahead. They also cover Tesla’s Optimus humanoid robot, the bottlenecks of AI development today, and how AI capabilities could be further integrated with human cognition. Andrej shares more about his new mission Eureka Labs and his insights into AI-driven education and what young people should study to prepare for the reality ahead.

How AI is generating a ‘sea of sameness’ in job applications / Financial Times (7 minute read)

Part of the reason is that many applicants simply pick one of the preset CV templates offered by their software provider. These can be customised with text and images, but tend to look similar.

recruiters get no sympathy from us, because they're using AI to vet all these job applications anyway

How CEOs Are Using Gen AI for Strategic Planning / Harvard Business Review (7 minute read)

related, AI Playbook: Common questions from leaders (Part 2) / Designing with AI, Substack (sorry) (5 minute read)

Apple’s iPhone 16 AI is useful so far, except when it’s bonkers — The new iPhone is all about artificial intelligence. But in tests of its prerelease software, it does an uncomfortable amount of making things up. / Washington Post (11 minute read)

related, Why Apple Intelligence won’t change your iPhone anytime soon — Apple Intelligence is not that scary, not that advanced, and definitely not finished. / Vox (13 minute read)

Google’s AI Will Help Decide Whether Unemployed Workers Get Benefits — [Nevada] is working with Google on a first-of-its-kind generative AI system that will analyze transcripts from appeals hearings and issue a recommended decision in an effort to clear a stubborn backlog of claims. / Gizmodo (11 minute read)

this didn’t make the edit last week but very related, Judge Rules $400 Million Algorithmic System Illegally Denied Thousands of People’s Medicaid Benefits — Thousands of children and adults were automatically terminated from Medicaid and disability benefits programs by a computer system that was supposed to make applying for and receiving health coverage easier. / Gizmodo (5 minute read)

🎓 EDUCATION

Here’s how ed-tech companies are pitching AI to teachers / MIT Technology Review (8 minute read)

“We know from plenty of research that teacher workload actually comes from data collection and analysis, reporting, and communications,” he says. “Those are all areas where AI can help.”

Then there are a host of not-so-menial tasks that teachers are more skeptical AI can excel at. They often come down to two core teaching responsibilities: lesson planning and grading.

Survey: College advisers could benefit from AI assistance — A new report from Tyton Partners encourages institutional leaders and academic advisers to consider the role of generative artificial intelligence to support advising caseloads and course mapping. / Inside Higher Ed (7 minute read)

Stanford students train AI to help with college essays — Two entrepreneurial Stanford students fed hundreds of essays—both high and low quality—into an AI model to train it on what top-tier colleges look for in admissions essays. / Inside Higher Ed (8 minute read)

crowded space; related, Athena — AI College Application Help

Students who used Athena this last application season saw a 3x higher acceptance rate to Top 15 universities.

Mayo Clinic launching AI education program — Harper Family Foundation provided $10M to train staff and medical professionals to deploy AI technology ethically for patients, system says. / Healthcare Finance News (5 minute read)

Cybersecurity, AI Remain Top Concerns for State Ed-Tech Leaders / Government Technology (5 minute read)

📊 DATA & TECHNOLOGY

AI chatbots are banned from our docs… for now / Mux (9 minute read)

For now, though, we're making a calculation. We're a startup that needs to move fast. We know other companies are making different choices, but we're choosing to spend our precious time on what we know works instead of risking it on something new.

thoughtful analysis of why “throwing a chatbot at it” is not always the correct answer

Futures of the data foundry business model — Scale AI’s future versus further scaling of language model performance. How Nvidia may take all the margins from the data market, too. / Interconnects, Substack (sorry) (14 minute read)

Aicado — AI Implementation Hub for Non-Technicals

When A.I.’s Output Is a Threat to A.I. Itself — As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results. / New York Times (13 minute read)

useful visuals to understand the implications
related, How Will A.I. Learn Next? — As chatbots threaten their own best sources of data, they will have to find new kinds of knowledge. / The New Yorker (21 minute read)

The point of a dashboard isn’t to use a dashboard / Terence Eden’s Blog (8 minute read)

Every so often, an employer asks me to help make a dashboard.

Usually, this causes technologists to roll their eyes. They have a vision of a CEO grandly staring at a giant projection screen, watching the pretty graphs go up and down, and making real-time decisions about Serious Business. Ugh! What a waste of time!

The thing is - that's not what a dashboard is for. And that's generally not why a CEO wants it.

A dashboard shows that you have access to your data. And that is a huge deal.

honestly, this whole article should be the quote of the week

It is incredibly easy to game the LLM benchmarks. / Jim Fan, Twitter (sorry) (3 minute read)

It’s incredible that people still get excited by MMLU or HumanEval numbers in Sept, 2024. These benchmarks are seriously broken, and gaming them can be an undergrad homework project.

I would not trust any claims of a superior model until I see the following:

1. ELO points on LMSys Chatbot Arena. It’s difficult to game democracy in the wild.

2. Private LLM evaluation from a trusted 3rd party, such as Scale AI’s benchmark. The test set must be well-curated and held secret, otherwise it quickly loses potency.

🎉 FUN and/or PRACTICAL THINGS

Superhuman Automated Forecasting / Center for AI Safety (8 minute read)

the prompt via Dan Hendrycks on Twitter (sorry)
at the time of writing, "Will Trump win the 2024 US presidential election?" gives a probability of 65%; it was 52% before the debate; do with that what you will

Gentype — Make an alphabet out of anything

text-to-font; prompts are loosely guardrailed
from Google Labs; free, requires Google login

A.I. Can Now Create Lifelike Videos. Can You Tell What’s Real? / New York Times (10 minute read)

10 short videos that you have to identify as AI or not; most are obvious, but one or two will make you wonder
related, I put 7 leading AI image generators to the test with the same prompt — here’s the winner / Tom’s Guide (12 minute read)

Amazon says its football AI can predict blitzes — The new Thursday Night Football system will tackle ‘a huge hole in our football stats toolbox.’ / Popular Science (6 minute read)

we were worried the smartest minds in the world wouldn’t get around to solving this problem

Focus Buddy — Boost Productivity with AI-Powered Focus Sessions

seems geared to manage ADHD or ADHD-like tendencies

La Baye Aréa, French TV Series / TRBDRK, YouTube (1 minute video)

we're not going to be able to describe this faster than you can watch it, but it's a title sequence for a French series featuring La Baye Aréa luminaries like Jacques D'Orsay, Samuel Alumont, Marc Le Zuber, and many others
made with Midjourney 6 + Runway Gen-3 + Udio
the people still look very uncanny valley, but the lighting, lenses, camera movements, and production design are excellent

Trishasode, Episode 4 / Trisha Code, YouTube (19 minute video)

if the first 75 seconds don’t amuse you, don’t bother with the remaining 1,100 or so

🧿 AI-ADJACENT

Graphs

three guesses to select the dataset that matches the line chart shown
daily puzzle, but you can go back and try others
wish list: overlay the incorrect answers' line charts on the main one after the round is over

⋄