- That AI Thing
weekend ai reads for 2024-02-09
📰 ABOVE THE FOLD: INFERENCE, via adam
Articles and releases with a common theme of increased access to LLMs: not only their weights (what you use to run the models) but also the data used to train them, the means to adapt them to end-user purposes, and the compute for all of the above, via the decreasing cost of training, experimentation, finetuning, and inference. These all point to the increasing commodification and democratization of AI modeling. Toward the end, a brief question about the implications for safe use of these models, and a new enterprise LLM leaderboard.
Increased access to model weights:
Allen Institute for AI releases OLMo (Open Language Model) and fully reproducible LLM pipeline, including Dolma, the dataset used for pretraining.
It’s not the best model out there at its size, but for its data and compute budget it’s close.
More importantly, they share the end-to-end training scripts and data, enabling scientific research on questions like “what happens to LLM performance if we include Books3 (pirated material) in the training mix?”
Increased access to datasets for training and finetuning:
Dolma (for pretraining the OLMo family of models): 5.4TB and “3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.”
OpenHermes-2.5: a “compilation and curation of many open source datasets and custom created synthetic datasets” used to finetune state-of-the-art 7b-size models.
This is useful if you’re curious about the data format used for instruction-tuning (as opposed to pretraining).
Increased ability to finetune/modify models:
MLX is a machine learning framework released by Apple that mimics the industry-standard PyTorch and NumPy frameworks.
MLX better leverages the integrated memory and GPUs of Apple machines, allowing developers to, e.g., finetune on their local laptop rather than a cluster of GPUs in the cloud to which they might not have access. This increases their ability to experiment.
Industry folks expect Apple to be releasing more AI models to local devices and this is likely part of that push.
Model Merging: Brief Overview and Technical Deep Dive
“The general idea of taking two model weights and smushing them together to get a new, sometimes useful, model out is having its moment in the sun. It’s a new way of experimenting with LLMs, which is especially useful to the GPU poor.”
This article is wild—I glossed over the technical stuff at the end but the beginning of the article helpfully outlines the trend of essentially averaging model weights to make Frankenstein’s monsters.
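The simplest merge the article describes is plain weight averaging (sometimes called a "model soup"). A toy sketch of the idea, with each checkpoint reduced to a dict of plain Python lists (parameter names and values here are invented; real merges operate on tensor state dicts):

```python
# Toy illustration of linear weight merging: every parameter of the
# merged model is an element-wise blend of the two source models.

def merge_weights(model_a, model_b, alpha=0.5):
    """Return a new model whose every weight is
    alpha * a + (1 - alpha) * b, element-wise."""
    merged = {}
    for name in model_a:
        wa, wb = model_a[name], model_b[name]
        merged[name] = [alpha * a + (1 - alpha) * b for a, b in zip(wa, wb)]
    return merged

# Two tiny "checkpoints" with matching parameter names and shapes.
chat_model = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
code_model = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

soup = merge_weights(chat_model, code_model, alpha=0.5)
print(soup)  # {'layer0.weight': [2.0, 3.0], 'layer0.bias': [0.5]}
```

The methods at the technical end of the article (e.g., spherical interpolation, task vectors) are more elaborate, but this is the core move: combining weights arithmetically rather than training.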
Increased access to compute/inference for non-hobbyist scenarios:
Semianalysis: The Inference Race to the Bottom
“It’s clear that pre-training of a GPT-3.5 caliber model has become completely commoditized. OpenAI is still the king of the hill with GPT-4, but that lead has been compressed significantly. While we believe that most of the long-term value will be captured by the highest-end models, it’s also clear that the next tier down in model quality and cost will enable a multi-billion-dollar niche in the marketplace, especially when fine-tuned. But who can actually make money off these models if they are everywhere? Firms with unique distribution due to direct access to the customer via a full software as a service or social media will have a unique advantage. [AKG: our curriculum partners?] Firms that offer full training or fine-tuning services for others on proprietary data by helping them with every stage of the process from data to serving will have a unique advantage. Firms that can provide data protections and ensure that all model use is legal, will have a unique advantage. Firms that simply serve open models will not have a competitive advantage.”
ArtificialAnalysis provides comparisons of AI models and hosting providers if you want to dive deeper into the claims above: “Smaller, emerging hosts are offering high throughput and at competitive prices.”
Given the above, we should continue to think about the implications of the potential for prompt injection attacks and sleeper agents using open source models or hosting providers as attack vectors. There are precious few approaches to dealing with these attacks, and those that do exist probably live with large foundation model providers like OpenAI, Anthropic, and Microsoft.
Finally, an interesting development in LLM benchmark leaderboards—we might take notes when thinking about how we frame industry-specific benchmarks:
Hugging Face releases the Enterprise Scenario Leaderboard, with metrics for, e.g., Customer Support Dialogue and Toxicity.
“Most LLM benchmarks use academic tasks and datasets, which have proven to be useful for comparing the performance of models in constrained settings. However, enterprise use cases often look very different. We have selected a set of tasks and datasets based on conversations with companies using LLMs in diverse real-world scenarios. We hope the leaderboard can be a useful starting point for users trying to understand which model to use for their practical applications.”
📻 QUOTE OF THE WEEK
A good question to ask whenever a media company rolls out a shiny new product is: Which came first, the product or the money?
(source)
🏗️ FOUNDATIONS & CULTURE
The analysis found that human players had made significantly better and more novel moves in response to the 2016 advent of superhuman AI. Between 1950 and 2015, the improvement in quality of play was comparatively small, with a median annual DQI oscillating between roughly -0.2 and 0.2. Whereas after superhuman AI, the DQI leapt upward, with median values above 0.7 from 2018 to 2021. In 2015, 63 per cent of games showed novel strategies, whereas by 2018, that figure had risen to 88 per cent.
one benefit of AI is that humans can learn by observing to discover new ways to think and approach novel problems
Mark Zuckerberg explained how Meta will crush Google and Microsoft at AI—and Meta warned it could cost more than $30 billion a year / Yahoo Finance
“There are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the common crawl data set,” Zuckerberg said on Meta’s earnings call on Thursday.
The results show LLMs demonstrate comparable, if not superior, performance in determining legal issues when compared to Junior Lawyers and LPOs. However, an LLM’s ability to locate issues within contracts, particularly where a standard is not present, is model-dependent and may not consistently outperform human practitioners, highlighting the importance of selecting the right model for the legal task.
the study also reports “99.97 percent reduction in cost over traditional methods”
‘Mad’ AI risks destroying the Information Age — Artificial intelligence's poisonous feedback loops threaten to warp reality / The Telegraph
Artificial intelligence-related lobbying reached new heights in 2023, with more than 450 organizations participating. It marks a 185% increase from the year before, when just 158 organizations did so.
HK firm scammed of $34 million after employee duped by video call with deepfake of CFO / The Straits Times
The employee, who works in the finance department, had received a message in January – from someone who appeared to be the company’s Britain-based CFO – asking for a transaction to be made.
Although the employee was initially doubtful, Mr Chan said the victim was fooled after being invited to the video conference call and seeing the company’s CFO and other “employees” in attendance.
Google's Gemini Advanced: Tasting Notes and Implications — And then there were two. / One Useful Thing (Ethan Mollick), Substack (sorry)
OpenAI Shifts AI Battleground to Software That Operates Devices, Automates Tasks / The Information ($)
OpenAI is developing a form of agent software to automate complex tasks by effectively taking over a customer's device. The customer could then ask the ChatGPT agent to transfer data from a document to a spreadsheet for analysis, for instance, or to automatically fill out expense reports and enter them in accounting software. Those kinds of requests would trigger the agent to perform the clicks, cursor movements, text typing and other actions humans take as they work with different apps, according to a person with knowledge of the effort.
non-paywalled summary: OpenAI reportedly developing two AI agents to automate entire work processes / Decoder
The AI supply chain: "It makes visible the connection between an engineer training an algorithm in the UK, a miner extracting tantalum in Kazakhstan, an engineer in Mexico working in a data centre, a worker in Taiwan manufacturing GPUs and a worker in Kenya dismantling e-waste” / ana vldv on Twitter (sorry)
🎓 EDUCATION
How (and Why) the University of Michigan Built Its Own Closed Generative AI Tools — Case Study / Educause Review
Learniverse AI — Learn any skill in minutes with personalized learning paths.
free tier available
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts / yifanzhang-pro, GitHub
AutoMathText is an extensive and carefully curated dataset encompassing around 200 GB of mathematical texts. It's a compilation sourced from a diverse range of platforms including various websites, arXiv, and GitHub (OpenWebMath, RedPajama, Algebraic Stack).
may pair well with some of the finetuning guides in the next section
Does military AI research at universities benefit humanity? — The Pentagon articulates a research focus that includes lethality. But universities that receive military funding often welcome the money with expressions of pride and altruism—and scant mention of the potential for harm. / Inside Higher Ed
Socratic — Get unstuck. Learn better.
With help from teachers, Socratic brings you visual explanations of important concepts in each subject.
AI Course Creator — Create a course in under one hour with AI
don’t worry, instructional designers … your jobs are still safe
📊 DATA & TECHNOLOGY
We need startups to fight prompt injection, the top LLM security risk | AI Venture Capital / SignalFire
So if input-scrubbing is a helpful tool but not a foolproof method for stopping prompt injections, what about checking the output from our LLMs? Once we’ve run the model, we can have a set of rules of what can/can’t be expressed.
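The output-checking idea can be sketched as a rule list applied to the model's response before it reaches the user. A minimal, deliberately naive illustration (the patterns and example strings are invented, and real guardrails are far more sophisticated):

```python
import re

# Deny rules applied to LLM output after generation. Each rule flags
# content that should never be shown to the user, regardless of what
# the prompt (possibly injected) asked for.
DENY_RULES = [
    re.compile(r"BEGIN\s+SYSTEM\s+PROMPT", re.IGNORECASE),  # leaked system prompt
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                   # SSN-like pattern
]

def check_output(text: str) -> bool:
    """Return True if the model output passes every rule."""
    return not any(rule.search(text) for rule in DENY_RULES)

print(check_output("The capital of France is Paris."))          # True
print(check_output("Sure! BEGIN SYSTEM PROMPT: you are ..."))   # False
```

As the article notes, this is a helpful layer but not foolproof either: an attacker who controls the prompt can often rephrase output to slip past any fixed rule set.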
Thesis on value accumulation in AI. / Irrational Exuberance
There are three fundamental components: Infrastructure (cloud providers, NVIDIA, etc), Modeling & Core (OpenAI, Anthropic, etc), and AI-enhanced products (Github Copilot, etc)
if we had “an image of the week” section, the header image would be it
How to Fine-Tune Mistral 7B on Your Own Data (22:02) / brev, YouTube
In this tutorial video, I walk you through how to fine-tune Mistral 7B, which outperforms Llama 2 13B on all tested benchmarks, on your own data.... like how I do with my journal entries from over the years, teen angst and all.
Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages / RWKV Blog
This aligns well with the team’s common goal of getting AI to support everyone, not just by allowing it to run cheaply and affordably even on lower-end hardware, but also by supporting their language.
multilingual models are de rigueur but small models that do it well are rare
Phinetuning 2.0 / g-ronimo, Hugging Face
This tutorial will guide you through fine-tuning Phi-2, demonstrating how to build a unique dataset and fine-tune the model using QLoRA.
crewAI - Multi-agent AI systems.
🎉 FUN and/or PRACTICAL THINGS
via soren, Contexto
Find the secret word. You have unlimited guesses.
The words were sorted by an artificial intelligence algorithm according to how similar they were to the secret word.
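The ranking mechanic can be sketched with cosine similarity over word embeddings. The 3-dimensional vectors below are invented for illustration; the real game uses high-dimensional embeddings from a trained language model:

```python
import math

# Toy word embeddings (invented). In Contexto-style games, every guess
# is ranked by how close its embedding is to the secret word's.
EMBEDDINGS = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "plane": [0.0, 0.9, 0.4],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_guesses(secret, guesses):
    """Return guesses ordered most-similar-first to the secret word."""
    s = EMBEDDINGS[secret]
    return sorted(guesses, key=lambda w: cosine(EMBEDDINGS[w], s), reverse=True)

print(rank_guesses("cat", ["plane", "dog"]))  # ['dog', 'plane']
```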
How AI Is Helping Us Learn About Birds — Machine learning is powering new insights into how birds migrate—and forecasts about where they’ll go next / The Markup
Enchanted / Augustdev, GitHub
Enchanted is an open source, Ollama-compatible, elegant iOS/iPad mobile app for chatting with privately hosted models such as Llama 2, Mistral, Vicuna, Starling, and more.
Apple will be in this space shortly (Google already is) but if you can’t wait, it works as advertised
DignifAI became popular on Twitter with the help of right-wing commentators like Jack Posobiec and Ian Miles Cheong, who posted screenshots from 4chan and examples of the tool putting more clothes on images of women.
GOODY-2 / Brian
GOODY-2 is a new AI model built with next-gen adherence to our industry-leading ethical principles. It’s so safe, it won’t answer anything that could possibly be construed as controversial or problematic.
parody but good prompt engineering
How AI is ‘amplifying creativity’ in the fashion world / The Guardian
In April last year Cyril Foiret’s generative AI studio, Maison Meta, hosted the first AI fashion week in New York, which included a competition for aspiring designers to use AI to create a fashion line.
HuggingChat - Assistants — Popular assistants made by the community
their answer to OpenAI’s GPTs
🧿 AI-ADJACENT
a16z Consumer Abundance Agenda — How AI will transform consumer technology / Gamma, Andreessen Horowitz
We invest at the intersection of culture and platform shifts, where products emerge that capitalize on or create new consumer behaviors. This is not just a function of timing, but how the technology comes to market in products that capture our imagination.
⋄