Practical Guide to LLMs
Andrej Karpathy’s video continues his series on large language models (LLMs), shifting from foundational theory to hands-on usage. Below is a timestamped summary of the key points covered.
Introduction
Karpathy introduces the video as a follow-up to his prior explanation of how LLMs are trained. This session focuses on how to actually use LLMs in daily life and work.
LLM Ecosystem
He overviews the current LLM landscape:
- ChatGPT by OpenAI is described as the “Original Gangster” of the space and the most feature-rich.
- Alternatives include Gemini, Claude, and Grok.
- For model performance, he references Chatbot Arena (2:10) and the Scale leaderboard (2:25).
Basic Interaction with LLMs
- Demonstrates basic text-in, text-out interaction (e.g., writing a haiku at 3:01).
- Explains tokenization (3:55) and how conversations build a context window (6:16–7:41).
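To make tokenization concrete, here is a minimal sketch (an illustration, not something from the video) using OpenAI's tiktoken library; the encoding name is an assumption and varies by model.

```python
# Sketch: inspect how text becomes tokens before it enters the context window.
# Requires `pip install tiktoken`; the encoding name is an assumption and
# differs between model families.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Write a haiku about large language models."
token_ids = enc.encode(prompt)

print(f"{len(token_ids)} tokens: {token_ids}")
# Decode each token id individually to see how the text was split.
print([enc.decode([t]) for t in token_ids])
```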
How Language Models Work
- LLMs are self-contained models built in two stages: pre-training (8:06) and post-training (10:39).
- Pre-training compresses vast amounts of internet text into the model’s parameters (8:19).
- Post-training gives models an assistant-like persona (10:50).
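To see the distinction in practice, here is a minimal sketch (not from the video) that samples a continuation from a small pre-trained base model via Hugging Face transformers; GPT-2 is used only because it is small and publicly available.

```python
# Sketch: a pre-trained base model is just a next-token predictor.
# Requires `pip install transformers torch`; GPT-2 is chosen only because it
# is small and public, not because it appears in the video.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The most popular drink containing caffeine is"
inputs = tokenizer(prompt, return_tensors="pt")

# The base model simply continues the text; it has no assistant persona.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Post-training then wraps this raw text predictor in a conversation format so it replies as an assistant rather than merely continuing the prompt.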
Real-World Examples and Advice
- Practical use cases: caffeine lookups (13:16) and medication info (14:31).
- Tips:
  - Start a new chat when switching topics to clear the context window (16:34).
  - Choose your model wisely (18:04), weighing cost against capability (18:52).
Thinking Models
- Introduces “thinking models” (22:56), which are trained with reinforcement learning (24:46).
- They shine on complex reasoning tasks such as math and code (29:41).
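As a hedged illustration of selecting a thinking model programmatically (not something shown in the video), the snippet below uses the OpenAI Python SDK; the model name is an example and will change over time.

```python
# Sketch: sending a hard problem to a reasoning ("thinking") model through the
# OpenAI Python SDK. The model name is an example only and changes over time.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",  # assumed thinking-class model; substitute a current one
    messages=[{"role": "user",
               "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)
```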
Tool Use: Internet Search
- Demonstrates live internet search (31:20), such as checking White Lotus episode dates (31:38).
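Under the hood, tool use amounts to the model emitting a structured call that client code executes. The sketch below uses OpenAI-style function calling; the search_web function is a hypothetical stub, and the model name is an example.

```python
# Sketch: letting the model request a web search via function calling.
# `search_web` is a hypothetical stub; a real version would call a search API.
import json
from openai import OpenAI

client = OpenAI()

def search_web(query: str) -> str:
    return f"(stub) top results for: {query}"  # replace with a real search API

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the internet for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user",
             "content": "When does the next episode of White Lotus air?"}]

first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]  # assumes the model chose to search

# Run the requested tool, then hand the result back so the model can answer.
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": search_web(**json.loads(call.function.arguments)),
})
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```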
Tool Use: Deep Research
- Combines internet search with extended thinking over longer timeframes (42:37).
- Example: producing a full research-style report (46:50).
File Uploads
- Uploading PDFs and other documents lets the LLM read their contents into its context window.
- Karpathy uses this to read papers (52:15) and books (55:01).
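Chat apps handle this behind the scenes; to reproduce the workflow against a raw API, one option is to extract the document text yourself. Below is a minimal sketch with the pypdf library, where the file name is a placeholder.

```python
# Sketch: extract a paper's text and place it in the context window yourself.
# Requires `pip install pypdf openai`; "paper.pdf" is a placeholder file name.
from pypdf import PdfReader
from openai import OpenAI

reader = PdfReader("paper.pdf")
paper_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Summarize the key findings of this paper:\n\n{paper_text}"}],
)
print(response.choices[0].message.content)
```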
Code Generation
- Models can generate code directly in their responses, making them well suited to programming tasks.
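As a small illustration (not from the video), model replies typically wrap code in Markdown fences, which can be pulled out programmatically; the regex below is a simplifying assumption that grabs only the first fenced block.

```python
# Sketch: ask for code and extract the fenced block from the reply.
# The regex is a simplifying assumption (it grabs the first ``` block only).
import re
from openai import OpenAI

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Write a Python function that checks if a word is a palindrome."}],
).choices[0].message.content

match = re.search(r"```(?:python)?\n(.*?)```", reply, re.DOTALL)
print(match.group(1) if match else reply)
```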
Advanced Data Analysis
- Describes how LLMs can act as junior data analysts, writing and running code to chart and plot uploaded data.
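For intuition, this is roughly the kind of code such a session writes and executes on the user's behalf; the CSV file and column names below are made-up placeholders.

```python
# Sketch: the sort of analysis code an LLM "junior data analyst" might write
# and run. The CSV path and column names are made-up placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")              # hypothetical uploaded file
monthly = df.groupby("month")["revenue"].sum()

monthly.plot(kind="bar", title="Revenue by month")
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.tight_layout()
plt.savefig("revenue_by_month.png")
```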
Artifacts
- Claude can generate “Artifacts”: standalone code snippets or small apps built from your input.
Coding with Cursor
- Highlights Cursor, an AI-powered code editor that integrates LLMs directly into the software development workflow.
Multimodality
- LLMs can now handle audio (speech), images, and video in addition to text.
Image Input
- Shows how uploading images lets the model describe them, extract information, and answer questions about their content.
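At the API level, image input looks like the sketch below (an illustration, not the video's workflow): the OpenAI chat endpoint accepts image URLs alongside text, and the URL here is a placeholder.

```python
# Sketch: sending an image plus a question in one message.
# The image URL is a placeholder; base64 data URLs also work.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What nutrition information is on this label?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/label.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```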
Image Output
- LLMs can generate images, e.g., YouTube thumbnails and diagrams.
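Image output is also available from code; a minimal sketch against the OpenAI images endpoint follows, where the model name is an example.

```python
# Sketch: generating an image (e.g., a thumbnail) from a text prompt.
# The model name is an example and may change.
from openai import OpenAI

client = OpenAI()
result = client.images.generate(
    model="dall-e-3",
    prompt="A clean, bold YouTube thumbnail about how to use LLMs day to day",
    size="1024x1024",
)
print(result.data[0].url)  # link to the generated image
```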
Video Input
- LLMs can analyze video input frame by frame, enabling visual comprehension of what the camera sees.
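One way to approximate this against an API (an assumed workflow, not what the video shows) is to sample frames yourself and send them as images; the video file name is a placeholder.

```python
# Sketch: sample a few frames from a video and send them as images.
# Requires `pip install opencv-python openai`; "clip.mp4" is a placeholder.
import base64
import cv2
from openai import OpenAI

cap = cv2.VideoCapture("clip.mp4")
frames = []
ok, frame = cap.read()
while ok and len(frames) < 4:            # keep only a handful of frames
    _, jpg = cv2.imencode(".jpg", frame)
    frames.append(base64.b64encode(jpg.tobytes()).decode())
    for _ in range(30):                  # skip ahead ~1 second at 30 fps
        ok, frame = cap.read()

client = OpenAI()
content = [{"type": "text", "text": "Describe what happens in these frames."}]
content += [{"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b}"}} for b in frames]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```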
Video Generation
- Briefly mentions rapid progress in this space, though he hasn’t fully adopted it.
Quality of Life Features
- Wraps up with enhancements that improve usability, such as chat memory, file support, and better UI.