weekend ai reads for 2025-06-13
ABOVE THE FOLD: "THE ILLUSION OF THINKING"
this paper received a lot of attention this week, for the wrong reasons; the paper is fine, but the conclusions some drew were either misinformed or misleading; we wanted to expand on the analysis to better understand it, mostly for ourselves
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity / Apple Machine Learning Research (3 minute read)
By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions and analyzing the models' computational behavior, shedding light on their strengths, limitations, and ultimately raising crucial questions about their true reasoning capabilities.
the paper, a brisk eleven pages without references or appendices, is at the link
Did Complexity Just Break AI's Brain? / Psychology Today (9 minute read)
Worse still, the models don't seem to know they're failing. They produce what appear to be sound, step-by-step answers. To a lay reader, or even a trained one, these outputs can seem rational. But they're not grounded in any algorithmic or consistent method. They're approximations of logic based on semantic coherence.
clickbait title; and no, it didn't
they are correct that LMs are "logic based on semantic coherence"
When billion-dollar AIs break down over puzzles a child can do, itās time to rethink the hype / Gary Marcus, The Guardian (7 minute read)
Apple did this by showing that leading models such as ChatGPT, Claude and Deepseek may "look smart – but when complexity rises, they collapse". In short, these models are very good at a kind of pattern recognition, but often fail when they encounter novelty that forces them beyond the limits of their training, despite being, as the paper notes, "explicitly designed for reasoning tasks".
Gary Marcus usually relishes it when LLMs fail to live up to their promise, but here he is too gleeful (in our opinion) and draws too-broad conclusions, all the while missing what the paper was really trying to say
however …
Give Me a Reason(ing Model) / Zvi Mowshowitz, Substack archive (12 minute read)
It seems important that this doesn't follow?
1. Not doing [X] in a given situation doesnāt mean you canāt do [X] in general.
2. Not doing [X] in a particular test especially doesnāt mean a model canāt do [X].
3. Not doing [X] can be a simple "you did not provide enough tokens to [X]" issue.
4. The more adversarial the example, the less evidence this provided.
5. Failure to do any given task requiring [X] does not mean you canāt [X] in general.
Or more generally, "won't" or "doesn't" [X] does not show "can't" [X]. It is of course often evidence, since doing [X] does prove you can [X]. How much evidence it provides depends on the circumstances.
I spent 20$ on poking holes in their paper lol / scaling01, XCancel (3 minute read)
Take a logic puzzle like Tower of Hanoi, which takes anywhere from tens to millions of moves to solve correctly, and check the first step where an LLM makes a mistake. Long problems aren't solved; models spend fewer thought tokens and make earlier mistakes on longer problems. 1/11 / Afinetheorem, XCancel (4 minute read)
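the thread's core objection is easy to check yourself: the optimal Tower of Hanoi solution for n disks takes 2^n − 1 moves, so the required output grows exponentially and quickly blows past any model's output-token budget. a minimal sketch (the tokens-per-move figure is our illustrative guess, not a number from the paper):

```python
def hanoi_moves(n: int) -> int:
    """Minimum number of moves to solve Tower of Hanoi with n disks."""
    return 2**n - 1

# Rough output-length estimate, assuming ~10 tokens to write out one move.
# (The per-move token cost is an illustrative assumption.)
TOKENS_PER_MOVE = 10

for n in (7, 10, 15, 20):
    moves = hanoi_moves(n)
    print(f"{n:2d} disks: {moves:>9,} moves ~ {moves * TOKENS_PER_MOVE:>10,} tokens")
```

at 15 disks the transcript alone would need hundreds of thousands of tokens, which is why "collapse" on large instances may say more about output limits than about reasoning.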
tl;dr
Beware General Claims about āGeneralizable Reasoning Capabilitiesā (of Modern AI Systems) / Lawrence Chen, Less Wrong (29 minute read)
QUOTES OF THE WEEK
Tasted a little tear gas… tasted like fascism
Like Dario Amodei's Machines of Loving Grace, the latest Altman essay spends the bulk of its time hand-waving away all the important concerns about such futures, both in terms of getting there and where the there is we even want to get. It's basically wish fulfillment.
FOR EVERYONE
The AI Proficiency Report [PDF] – AI investments are increasing, but proficiency is not / Section AI (10 minute read)
AI EXPERTS: 1% of the workforce
AI experts are the most proficient AI users in the workforce, as well as the most bullish about AI's potential. They receive the most resources for AI development, and 31% of them are saving more than 12 hours (a day and a half) of their work week by using AI.
Their stats
- Most likely to be C-suite
- 84 average proficiency score out of 100
- 69% are daily AI users
- 47% are saving 8-12 hours per week using AI
Their differentiators
- 94% are in companies that approve of AI
- 71% have received AI training
- 84% have managers that encourage AI
- 75% report their company having a clear AI policy
What if Making Cartoons Becomes 90% Cheaper? / New York Times (14 minute read)
Just a few years ago, lip-syncing a minute of animation could take up to four hours. An animator would listen to an audio track and laboriously adjust character mouths frame by frame. But Mr. Peck's one-minute scene took 15 minutes for the A.I. tool to sync, including time spent by an artist to refine a few spots by hand.
In Brazil's Amazon, AI is making healthcare safer – At overburdened clinics, pharmacists use AI to catch dangerous errors. It's frontier tech meets frontier medicine, with global implications. / Rest of World (9 minute read)
via david, Artificial Intelligence Is Not Intelligent – Despite what tech CEOs might say, large language models are not smart in any recognizably human sense of the word. / The Atlantic (9 minute read)
related (?), God is hungry for Context: First thoughts on o3 pro / Latent Space, Substack archive (7 minute read)
The plan o3 gave us was plausible, reasonable; but the plan o3 Pro gave us was specific and rooted enough that it actually changed how we are thinking about our future. [eds: emphasis theirs]
This is hard to capture in an eval.
FOUNDATIONS
Rethinking decision making to unlock AI potential / McKinsey & Company (14 minute read)
As agents take on high-frequency or transactional work, employees shift into roles that require more oversight, ethics, and judgment, including:
- Custodians who ensure the integrity of data, model performance, and customer outcomes.
- Judgment holders who handle ambiguous or high-stakes decisions where context, nuance, and trust are essential.
- Approvers and auditors who review exceptions, manage escalations, and reinforce compliance boundaries.
Why AI Agents Need a New Protocol / Frank Fiegel, Glama (7 minute read)
we are not API apologists, but many of the "API limitations" in this article are more about legacy API design patterns than fundamental limitations of APIs
it glosses over the real advantage of MCP, which is less about technical capabilities and more about having one consistent protocol that AI models can be specifically trained on, something today's API design patterns lack
anyway, still useful reading if you care about MCPs
related, The no-nonsense approach to AI agent development / Vercel blog (7 minute read)
UX Challenges with MCPs / Hardik Pandya (10 minute read)
First, configuration is unintuitive. MCPs work like IFTTT where you need to establish connections on both the app side and the LLM side to make them function. This creates setup friction that most users won't tolerate.
Second, the UX approach doesn't feel right for the future of apps with natural language capabilities. The way MCPs bolt conversational AI onto existing tools feels like a bridge solution rather than how apps will naturally evolve. The interaction patterns aren't optimized for mass-market usage.
To understand why these limitations matter, it's important to see the types of workflows that MCPs help with today.
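for concreteness, the "both sides" setup the author describes usually means hand-editing a client config; in Claude Desktop, for example, registering an MCP server in `claude_desktop_config.json` looks roughly like this (the server name, package, and path are placeholders, not real defaults):

```json
{
  "mcpServers": {
    "my-notes": {
      "command": "npx",
      "args": ["-y", "@example/mcp-notes-server"],
      "env": { "NOTES_DIR": "/path/to/notes" }
    }
  }
}
```

hard to argue with the "setup friction" point when the entry ticket is a JSON file most users will never open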
FOR LEADERS
How To Prepare Teams For Their New AI Agent Coworkers / Forbes (7 minute read)
Instead, leaders should set their teams up for success by helping to identify opportunities for AI-human collaboration. One way to do this is for teams to ensure that agents are working with good data, weeding out wrong or incomplete information and regularly optimizing and providing feedback.
A Chief AI Officer Won't Fix Your AI Problems / The New Stack (8 minute read)
In my experience, these organizations aren't necessarily resistant to innovation. Instead, they're working to balance progress with the need for stability, compliance, and alignment with existing operations. In these cases, appointing a Chief AI Officer can serve a valuable purpose by creating a focal point for AI strategy and helping to coordinate efforts across departments.
Moats in the Age of AI / Clouded Judgement (7 minute read)
Speed isn't just important, it is the moat. The ability to build, ship, learn, and adapt faster than everyone else is the only sustainable edge right now. In a world where everything is open source, everything is demo-able, and everything is one blog post away from being copied, speed is the only thing that compounds.
related (1), In Consumer AI, Momentum Is the Moat / Andreessen Horowitz (12 minute read)
related (2), What "Working" Means in the Era of AI Apps / Andreessen Horowitz (6 minute read)
The median enterprise company in our sample set reached more than $2 million in ARR in its first year, raising a Series A just nine months post-monetization. Median consumer companies performed even better, reaching $4.2 million in ARR and raising an A round within eight months. What was once considered "best in class" – the $0 to $1 million ARR ramp – is now on the lower end of growth we're seeing.
35 years of product design wisdom from Apple, Disney, Pinterest and beyond | Bob Baxley / Lenny's Podcast, YouTube (102 minute video)
thoughts on product management, design, AI and more
link above jumps to the benefits and risks of AI prototyping tools
FOR EDUCATORS
On Generative AI in the Classroom: Give Up, Give In, or Stand Up / Active History (16 minute read)
Part of our task in the face of generative AI is to make an argument for the value of thinking – laboured, painful, frustrating thinking. It is not an easy sell. But to give up on this is to give up on our students, most of whom are at an age where they can be easily seduced by techno-sirens promising instantaneous essays for minimal effort and with little chance of getting caught. They deserve better from us.
Your Campus Already Has AI – And That's the Problem / Marc Watkins, Substack archive (11 minute read)
While many of us are aware we should be mindful about uploading sensitive documents to AI systems, talking to a bot like it is a person and habitually revealing personal information to it is an extraordinary security risk when you deal with sensitive data. Our words are now prompts, our conversations become data, and the potential FERPA and HIPAA violations that may come from talking about someone with something are not being discussed enough.
College Grads Are Lab Rats in the Great AI Experiment / Bloomberg (7 minute read)
The PR worker also didn't seem to be doing "higher-level work," but simply doing analysis more quickly. The output provided by AI is clearly useful to a junior worker's bosses, but I'm skeptical that it's giving them a deeper understanding of how a business or industry works.
related (1), How OpenAI, maker of ChatGPT, plans to make "AI-native universities" / Business Standard (7 minute read)
OpenAI dubs its sales pitch "AI-native universities." "Our vision is that, over time, AI would become part of the core infrastructure of higher education," Leah Belsky, OpenAI's vice president of education, said.
related (2), Ohio State announces every student will use AI in class / NBC4 WCMH-TV (9 minute read)
The university will now require students to take an AI skills seminar, and it will incorporate workshops into existing frameworks like the First Year Seminar program. The seminars are optional one-credit courses tailored to first-year students in specialized subjects like Fantasy Worldbuilding in Television, Know Your Recreational Drugs and soon, AI.
China shuts down AI tools during nationwide college exams – Popular AI apps from Alibaba and ByteDance have disabled features like image recognition to prevent cheating. / The Verge (4 minute read)
FOR TECHNOLOGISTS
Data on AI Supercomputers / Epoch AI (11 minute read)
US & China are far outpacing the rest of the world
How Anthropic teams use Claude Code [PDF] / Anthropic (6 minute read)
list of use cases across various functions; no prompts or detailed how-to guides
related (1), AI-assisted coding for teams that canāt get away with vibes / Atharva Raykar, nilenso blog (12 minute read)
related (2), The Prompt Engineering Playbook for Programmers – Turn AI coding assistants into more reliable development partners / Addy Osmani, Substack archive (52 minute read)
Claude Code is My Computer / Peter Steinberger (7 minute read)
TL;DR: I run Claude Code in no-prompt mode; it saves me an hour a day and hasn't broken my Mac in two months. The $200/month Max plan pays for itself.
For the past two months, I've been living dangerously. I launch Claude Code (released in late February) with --dangerously-skip-permissions, the flag that bypasses all permission prompts. According to Anthropic's docs, this is meant "only for Docker containers with no internet", yet it runs perfectly on regular macOS.
reading this made our palms sweaty
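if you are tempted to copy the setup, a more cautious pattern is to give the flag only to a network-isolated container, along the lines the docs describe; a sketch, where the image name and mount point are placeholders for whatever environment you build, not real defaults:

```shell
# Run Claude Code with permission prompts disabled, but inside a container
# with no network access, so a runaway command can't reach the internet.
# "my-claude-image" and the mounted path are placeholders.
docker run --rm -it \
  --network none \
  -v "$PWD":/workspace -w /workspace \
  my-claude-image \
  claude --dangerously-skip-permissions
```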
FOR FUN
Seventh Sight ā Analyse Your Dreams
SeventhSight app uses patented Machine Learning Artificial Intelligence to analyse the meaning of your dream. Find powerful insights into your daily life by understanding what your subconscious is telling you!
we could not locate the patent in the USPTO search engine, if your usage of this hinged on that
Timbaland Announces New AI Entertainment Company – Timbo also introduced a new genre called "A-Pop." / Billboard archive (5 minute read)
Timbaland has launched his own AI entertainment company called Stage Zero, co-founded with Rocky Mudaliar and Zayd Portillo. Its first signee is an AI pop artist called TaTa, driven by Suno AI. The pop artist, along with a bevy of AI-driven creative tools, will all be under Timbo's new company.
AI-ADJACENT
How AI Saved My Company From a 2-Year Litigation Nightmare / Tyler Tringas (19 minute read)
Mental Health Tip: Do this research using voice mode while walking. Legal research isn't fun, but being outside in fresh air using your voice rather than staring at a screen makes it much more bearable.