- That AI Thing
weekend ai reads for 2024-09-13
ABOVE THE FOLD: THE PETER PRINCIPLE (you know who you are)
The "Peter Principle": Why corporate incompetency is inevitable / Big Think (9 minute read)
And so, "good followers" get promoted. But there comes a point in a promotion cycle when you need to stop following and take the lead. You need out-of-the-box thinking and need to rally the troops for a brave, daring assault that no one else saw coming.
related and very good, The Contingency Contingent / N Plus One Magazine (38 minute read)
The employment agency through which I got my fake job made no epistemological distinction between knowing something and knowing about something. JavaScript, for example, a computer programming language: I knew about it but did not know it. Fine.
Will AI make us overconfident? - Like the internet or a magical sidekick, chatbots are reorganizing knowledge to be more interactive and more accessible. / Ted Underwood (8 minute read)
The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers / Social Science Research Network (28 minute read)
These field experiments, which were run by the companies as part of their ordinary course of business, provided a randomly selected subset of developers with access to GitHub Copilot, an AI-based coding assistant that suggests intelligent code completions. Though each separate experiment is noisy, combined across all three experiments and 4,867 software developers, our analysis reveals a 26.08% increase (SE: 10.3%) in the number of completed tasks among developers using the AI tool. Notably, less experienced developers showed higher adoption rates and greater productivity gains.
paper at author's site [PDF]
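back-of-envelope check on that SE: if you assume the pooled estimate is roughly normally distributed (our assumption, not something the abstract claims), a 95% interval is the estimate plus or minus 1.96 standard errors. `normal_ci` is just a helper name we made up for the sketch:

```python
# Rough 95% confidence interval for the pooled Copilot estimate quoted above.
# Assumes the estimate is approximately normal (our assumption, not the paper's claim).
Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def normal_ci(estimate: float, se: float, z: float = Z_95) -> tuple[float, float]:
    """Return (lower, upper) bounds of estimate +/- z * se."""
    margin = z * se
    return estimate - margin, estimate + margin


low, high = normal_ci(26.08, 10.3)  # percentage increase in completed tasks, and its SE
print(f"95% CI: {low:.1f}% to {high:.1f}%")
```

that works out to roughly 6% to 46%, so the gain is probably real, but its size is anyone's guess within a pretty wide band.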
GenAI Increases Productivity & Expands Capabilities / Boston Consulting Group (18 minute read)
Yet the people who participated in the coding task scored the same on the assessment as people who didn't do the coding task. Performing the data-science tasks in our experiment thus did not increase participants' knowledge.
the headline is misleading: they bury the lede, which is that it doesn't actually make users better; it just helps them accomplish a task more efficiently
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers / arXiv (35 minute read)
By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.
a minor annoyance this week
this was breathlessly touted all over, but it's clear that hardly anyone took the time to read the paper
the study focused on research on prompting Claude, so "novel research ideas" is already a pretty constrained space
then they have Claude rewrite all the human submissions, which makes them worse? so of course the human submissions didn't fare better
QUOTE OF THE WEEK
The entire concept of work that we have had for thousands of years was a temporary model that was required to solve a temporary problem. Namely, people who are trying to build or sell something that required work they were unable to do by themselves.
Daniel Miessler (source)
FOUNDATIONS & CULTURE
OpenAI o1 Hub - We've developed a new series of AI models designed to spend more time thinking before they respond. Here is the latest news on o1 research, product and other updates. / OpenAI blog (4 minute read)
Andrej Karpathy from OpenAI and Tesla / No Priors, YouTube (44 minute video)
In this episode, Andrej discusses the evolution of self-driving cars, comparing Tesla's and Waymo's approaches, and the technical challenges ahead. They also cover Tesla's Optimus humanoid robot, the bottlenecks of AI development today, and how AI capabilities could be further integrated with human cognition. Andrej shares more about his new mission Eureka Labs and his insights into AI-driven education and what young people should study to prepare for the reality ahead.
How AI is generating a "sea of sameness" in job applications / Financial Times (7 minute read)
Part of the reason is that many applicants simply pick one of the preset CV templates offered by their software provider. These can be customised with text and images, but tend to look similar.
they get no sympathy from us because AI is vetting all these job applications for them anyway
How CEOs Are Using Gen AI for Strategic Planning / Harvard Business Review (7 minute read)
related, AI Playbook: Common questions from leaders (Part 2) / Designing with AI, Substack (sorry) (5 minute read)
Apple's iPhone 16 AI is useful so far, except when it's bonkers - The new iPhone is all about artificial intelligence. But in tests of its prerelease software, it does an uncomfortable amount of making things up. / Washington Post (11 minute read)
related, Why Apple Intelligence won't change your iPhone anytime soon - Apple Intelligence is not that scary, not that advanced, and definitely not finished. / Vox (13 minute read)
Google's AI Will Help Decide Whether Unemployed Workers Get Benefits - [Nevada] is working with Google on a first-of-its-kind generative AI system that will analyze transcripts from appeals hearings and issue a recommended decision in an effort to clear a stubborn backlog of claims. / Gizmodo (11 minute read)
this didn't make the edit last week but very related, Judge Rules $400 Million Algorithmic System Illegally Denied Thousands of People's Medicaid Benefits - Thousands of children and adults were automatically terminated from Medicaid and disability benefits programs by a computer system that was supposed to make applying for and receiving health coverage easier. / Gizmodo (5 minute read)
EDUCATION
Here's how ed-tech companies are pitching AI to teachers / MIT Technology Review (8 minute read)
"We know from plenty of research that teacher workload actually comes from data collection and analysis, reporting, and communications," he says. "Those are all areas where AI can help."
Then there are a host of not-so-menial tasks that teachers are more skeptical AI can excel at. They often come down to two core teaching responsibilities: lesson planning and grading.
Survey: College advisers could benefit from AI assistance - A new report from Tyton Partners encourages institutional leaders and academic advisers to consider the role of generative artificial intelligence to support advising caseloads and course mapping. / Inside Higher Ed (7 minute read)
Stanford students train AI to help with college essays - Two entrepreneurial Stanford students fed hundreds of essays (both high and low quality) into an AI model to train it on what top-tier colleges look for in admissions essays. / Inside Higher Ed (8 minute read)
crowded space; related, Athena - AI College Application Help
Students who used Athena this last application season saw a 3x higher acceptance rate to Top 15 universities.
Mayo Clinic launching AI education program - Harper Family Foundation provided $10M to train staff and medical professionals to deploy AI technology ethically for patients, system says. / Healthcare Finance News (5 minute read)
Cybersecurity, AI Remain Top Concerns for State Ed-Tech Leaders / Government Technology (5 minute read)
DATA & TECHNOLOGY
AI chatbots are banned from our docs… for now / Mux (9 minute read)
For now, though, we're making a calculation. We're a startup that needs to move fast. We know other companies are making different choices, but we're choosing to spend our precious time on what we know works instead of risking it on something new.
thoughtful analysis of why "throwing a chatbot at it" is not always the correct answer
Futures of the data foundry business model - Scale AI's future versus further scaling of language model performance. How Nvidia may take all the margins from the data market, too. / Interconnects, Substack (sorry) (14 minute read)
Aicado - AI Implementation Hub for Non-Technicals
When A.I.'s Output Is a Threat to A.I. Itself - As A.I.-generated data becomes harder to detect, it's increasingly likely to be ingested by future A.I., leading to worse results. / New York Times (13 minute read)
useful visuals to understand the implications
related, How Will A.I. Learn Next? - As chatbots threaten their own best sources of data, they will have to find new kinds of knowledge. / The New Yorker (21 minute read)
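a toy version of the feedback loop, our own illustration rather than anything from the NYT piece: the "model" here is just a Gaussian that gets refit each generation on samples drawn only from the previous generation's fit. `collapse`, the sample size of 20, the seed, and the 200 generations are all arbitrary choices for the sketch. sampling noise plus a spread estimate that runs slightly low on average means the tails keep getting lost, and the fitted std tends to drift toward zero:

```python
import random
import statistics


def collapse(generations: int = 200, n: int = 20, seed: int = 0) -> list[float]:
    """Toy 'model collapse' (illustrative assumption, not the NYT's method):
    each generation fits a Gaussian (mean, population std) to n samples drawn
    from the previous generation's fitted Gaussian. Returns the fitted std at
    every generation, starting from the true std of 1.0."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # this generation's "training data" is purely synthetic output
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # MLE-style std: biased low on average
        history.append(sigma)
    return history


history = collapse()
for gen in (0, 50, 100, 150, 200):
    print(f"generation {gen:3d}: fitted std = {history[gen]:.4f}")
```

the exact numbers depend on the seed, but the direction is the point: when a model trains on its own output, diversity is the first thing to go.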
The point of a dashboard isn't to use a dashboard / Terence Eden's Blog (8 minute read)
Every so often, an employer asks me to help make a dashboard.
Usually, this causes technologists to roll their eyes. They have a vision of a CEO grandly staring at a giant projection screen, watching the pretty graphs go up and down, and making real-time decisions about Serious Business. Ugh! What a waste of time!
The thing is - that's not what a dashboard is for. And that's generally not why a CEO wants it.
A dashboard shows that you have access to your data. And that is a huge deal.
honestly, this whole article should be the quote of the week
It is incredibly easy to game the LLM benchmarks. / Jim Fan, Twitter (sorry) (3 minute read)
It's incredible that people still get excited by MMLU or HumanEval numbers in Sept, 2024. These benchmarks are seriously broken, and gaming them can be an undergrad homework project.
I would not trust any claims of a superior model until I see the following:
1. ELO points on LMSys Chatbot Arena. It's difficult to game democracy in the wild.
2. Private LLM evaluation from a trusted 3rd party, such as Scale AI's benchmark. The test set must be well-curated and held secret, otherwise it quickly loses potency.
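for anyone wondering how "democracy in the wild" turns into a leaderboard: Arena collects head-to-head human votes between anonymous models and converts them into ratings. below is a minimal sketch of the classic online Elo update; LMSys has described moving from online Elo to fitting a Bradley-Terry model, so treat this as the textbook idea rather than their actual pipeline. the K-factor of 32, the starting rating of 1000, and the model names and votes are all hypothetical:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one head-to-head vote.
    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    k (the step size) is an arbitrary, hypothetical choice here."""
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta  # zero-sum: B loses exactly what A gains


# hypothetical votes between two models that both start at 1000
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = ["model_a", "model_a", "model_b", "model_a"]  # winner of each matchup
for winner in votes:
    score_a = 1.0 if winner == "model_a" else 0.0
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"], score_a
    )
print({name: round(r, 1) for name, r in ratings.items()})
```

the appeal is that there's no fixed test set to memorize: a model only climbs by winning fresh matchups, judged by whoever happens to show up.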
FUN and/or PRACTICAL THINGS
Superhuman Automated Forecasting / Center for AI Safety (8 minute read)
the prompt via Dan Hendrycks on Twitter (sorry)
at the time of writing, "Will Trump win the 2024 US presidential election?" gives a probability of 65%; it was 52% before the debate; do with that what you will
Gentype - Make an alphabet out of anything
text-to-font; prompts are loosely guardrailed
from Google Labs; free, requires Google login
A.I. Can Now Create Lifelike Videos. Can You Tell What's Real? / New York Times (10 minute read)
10 short videos that you have to identify as AI or not; most are obvious, one or two will make you wonder
related, I put 7 leading AI image generators to the test with the same prompt - here's the winner / Tom's Guide (12 minute read)
Amazon says its football AI can predict blitzes - The new Thursday Night Football system will tackle "a huge hole in our football stats toolbox." / Popular Science (6 minute read)
we were worried the smartest minds in the world wouldn't get around to solving this problem
Focus Buddy - Boost Productivity with AI-Powered Focus Sessions
seems geared to manage ADHD or ADHD-like tendencies
La Baye Aréa, French TV Series / TRBDRK, YouTube (1 minute video)
we're not going to be able to describe this faster than you can watch it, but it's a title sequence for a French series featuring La Baye Aréa luminaries like Jacques D'Orsay, Samuel Alumont, Marc Le Zuber, and many others
made with Midjourney 6 + Runway Gen-3 + Udio
the people still look very uncanny valley, but the lighting, lenses, camera movements, and production design are excellent
Trishasode, Episode 4 / Trisha Code, YouTube (19 minute video)
if the first 75 seconds don't amuse you, don't bother with the remaining 1,100 or so
AI-ADJACENT
three guesses to select the dataset that matches the line chart shown
daily puzzle, but you can go back and try others
our wish list: overlay the incorrect answers' line charts on the main one after the round is over