Practical Guide to LLMs
Andrej Karpathy’s video continues his series on large language models (LLMs), shifting from foundational theory to hands-on usage. Below is a timestamped summary of the key points covered.
Introduction
Karpathy introduces the video as a follow-up to his prior explanation of how LLMs are trained. This session focuses on how to actually use LLMs in daily life and work.
LLM Ecosystem
He overviews the current LLM landscape:
- ChatGPT by OpenAI is described as the “Original Gangster” of the space and the most feature-rich.
- Alternatives include Gemini, Claude, and Grok.
- For model performance, he references Chatbot Arena (2:10) and the Scale leaderboard (2:25).
Basic Interaction with LLMs
- Demonstrates basic text-in, text-out interaction (e.g., writing a haiku at 3:01).
- Explains tokenization (3:55) and how conversations build a context window (6:16–7:41).
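To make tokenization concrete, here is a minimal sketch (an illustration, not something from the video) using OpenAI's tiktoken library; the encoding name is an assumption and varies by model.

```python
# Sketch: inspect how text becomes tokens before it enters the context window.
# Requires `pip install tiktoken`; the encoding name is an assumption and
# differs between model families.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Write a haiku about large language models."
token_ids = enc.encode(prompt)

print(f"{len(token_ids)} tokens: {token_ids}")
# Decode each token id individually to see how the text was split.
print([enc.decode([t]) for t in token_ids])
```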
How Language Models Work
- LLMs are self-contained models built in two stages: pre-training (8:06) and post-training (10:39).
- Pre-training compresses vast amounts of internet text into the model’s parameters (8:19).
- Post-training gives models an assistant-like persona (10:50).
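To see the distinction in practice, here is a minimal sketch (not from the video) that samples a continuation from a small pre-trained base model via Hugging Face transformers; GPT-2 is used only because it is small and publicly available.

```python
# Sketch: a pre-trained base model is just a next-token predictor.
# Requires `pip install transformers torch`; GPT-2 is chosen only because it
# is small and public, not because it appears in the video.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The most popular drink containing caffeine is"
inputs = tokenizer(prompt, return_tensors="pt")

# The base model simply continues the text; it has no assistant persona.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Post-training then wraps this raw text predictor in a conversation format so it replies as an assistant rather than merely continuing the prompt.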
Real-World Examples and Advice
- Practical use cases: caffeine lookups (13:16) and medication info (14:31).
- Tips:
  - Start a new chat when switching topics to clear the context window (16:34).
  - Choose your model wisely (18:04), weighing cost against capability (18:52).
Thinking Models
- Introduces “thinking models” (22:56), which are trained with reinforcement learning (24:46).
- They shine on complex reasoning tasks such as math and code (29:41).
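As a hedged illustration of selecting a thinking model programmatically (not something shown in the video), the snippet below uses the OpenAI Python SDK; the model name is an example and will change over time.

```python
# Sketch: sending a hard problem to a reasoning ("thinking") model through the
# OpenAI Python SDK. The model name is an example only and changes over time.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",  # assumed thinking-class model; substitute a current one
    messages=[{"role": "user",
               "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)
```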
Tool Use: Internet Search
- Demonstrates live internet search (31:20), such as checking White Lotus episode dates (31:38).
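Under the hood, tool use amounts to the model emitting a structured call that client code executes. The sketch below uses OpenAI-style function calling; the search_web function is a hypothetical stub, and the model name is an example.

```python
# Sketch: letting the model request a web search via function calling.
# `search_web` is a hypothetical stub; a real version would call a search API.
import json
from openai import OpenAI

client = OpenAI()

def search_web(query: str) -> str:
    return f"(stub) top results for: {query}"  # replace with a real search API

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the internet for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user",
             "content": "When does the next episode of White Lotus air?"}]

first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]  # assumes the model chose to search

# Run the requested tool, then hand the result back so the model can answer.
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": search_web(**json.loads(call.function.arguments)),
})
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```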
Tool Use: Deep Research
- Combines internet search with extended thinking over longer timeframes (42:37).
- Example: producing a full research-style report (46:50).
File Uploads
- Uploading PDFs and other documents lets the LLM read their contents into its context window.
- Karpathy uses this to read papers (52:15) and books (55:01).
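Chat apps handle this behind the scenes; to reproduce the workflow against a raw API, one option is to extract the document text yourself. Below is a minimal sketch with the pypdf library, where the file name is a placeholder.

```python
# Sketch: extract a paper's text and place it in the context window yourself.
# Requires `pip install pypdf openai`; "paper.pdf" is a placeholder file name.
from pypdf import PdfReader
from openai import OpenAI

reader = PdfReader("paper.pdf")
paper_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Summarize the key findings of this paper:\n\n{paper_text}"}],
)
print(response.choices[0].message.content)
```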
Code Generation
- Models can generate code directly in their responses, making them well suited to programming tasks.
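As a small illustration (not from the video), model replies typically wrap code in Markdown fences, which can be pulled out programmatically; the regex below is a simplifying assumption that grabs only the first fenced block.

```python
# Sketch: ask for code and extract the fenced block from the reply.
# The regex is a simplifying assumption (it grabs the first ``` block only).
import re
from openai import OpenAI

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Write a Python function that checks if a word is a palindrome."}],
).choices[0].message.content

match = re.search(r"```(?:python)?\n(.*?)```", reply, re.DOTALL)
print(match.group(1) if match else reply)
```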
Advanced Data Analysis
- Describes how LLMs can act as junior data analysts, writing and running code to chart and plot uploaded data.
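For intuition, this is roughly the kind of code such a session writes and executes on the user's behalf; the CSV file and column names below are made-up placeholders.

```python
# Sketch: the sort of analysis code an LLM "junior data analyst" might write
# and run. The CSV path and column names are made-up placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")              # hypothetical uploaded file
monthly = df.groupby("month")["revenue"].sum()

monthly.plot(kind="bar", title="Revenue by month")
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.tight_layout()
plt.savefig("revenue_by_month.png")
```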
Artifacts
- Claude can generate “Artifacts”: standalone code snippets or small apps built from your input.
Coding with Cursor
- Highlights Cursor, an AI-powered code editor that integrates LLMs directly into the software development workflow.
Multimodality
- LLMs can now handle audio (speech), images, and video in addition to text.
Image Input
- Shows how uploading images lets the model describe them, extract information, and answer questions about their content.
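At the API level, image input looks like the sketch below (an illustration, not the video's workflow): the OpenAI chat endpoint accepts image URLs alongside text, and the URL here is a placeholder.

```python
# Sketch: sending an image plus a question in one message.
# The image URL is a placeholder; base64 data URLs also work.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What nutrition information is on this label?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/label.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```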
Image Output
- LLMs can generate images, e.g., YouTube thumbnails and diagrams.
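Image output is also available from code; a minimal sketch against the OpenAI images endpoint follows, where the model name is an example.

```python
# Sketch: generating an image (e.g., a thumbnail) from a text prompt.
# The model name is an example and may change.
from openai import OpenAI

client = OpenAI()
result = client.images.generate(
    model="dall-e-3",
    prompt="A clean, bold YouTube thumbnail about how to use LLMs day to day",
    size="1024x1024",
)
print(result.data[0].url)  # link to the generated image
```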
Video Input
- LLMs can analyze video input frame by frame, enabling visual comprehension of what the camera sees.
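One way to approximate this against an API (an assumed workflow, not what the video shows) is to sample frames yourself and send them as images; the video file name is a placeholder.

```python
# Sketch: sample a few frames from a video and send them as images.
# Requires `pip install opencv-python openai`; "clip.mp4" is a placeholder.
import base64
import cv2
from openai import OpenAI

cap = cv2.VideoCapture("clip.mp4")
frames = []
ok, frame = cap.read()
while ok and len(frames) < 4:            # keep only a handful of frames
    _, jpg = cv2.imencode(".jpg", frame)
    frames.append(base64.b64encode(jpg.tobytes()).decode())
    for _ in range(30):                  # skip ahead ~1 second at 30 fps
        ok, frame = cap.read()

client = OpenAI()
content = [{"type": "text", "text": "Describe what happens in these frames."}]
content += [{"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b}"}} for b in frames]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```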
Video Generation
- Briefly mentions rapid progress in this space, though he hasn’t fully adopted it.
Quality of Life Features
- Wraps up with enhancements that improve usability, such as chat memory, file support, and better UI.