Series · Part 5 of 21

Foundation

Abhishek Saha · May 22, 2026 · AI / ML

Why AI Forgets

Mid-conversation, AI suddenly doesn't remember what you said earlier. This isn't a bug — it's the context window. Here's how it works and how to work around it.

#ai #context-window #memory #llm #interactive

You’re deep in a conversation. You’ve explained your problem, set up context, asked five follow-up questions. Then you ask something simple — and the AI acts like you never introduced yourself.

It’s not a glitch. It’s the context window.

CONTEXT WINDOW: AI's short-term memory

This conversation has 11 messages. In the interactive version, a slider shrinks the context window so you can watch the AI "forget" earlier messages.

Context window size: last 3 messages (41 tokens). The slider ranges from tiny (forgets everything) to full (205 tokens).

YOU

My name is Priya. I'm learning Python.

AI

Nice to meet you, Priya! Python is a great choice. What do you want to build?

YOU

I want to make a web scraper to collect cricket scores.

AI

Perfect for Python! You'd use requests + BeautifulSoup. Want me to show you a basic example?

YOU

Yes please, show me the example.

AI

Here's a starter:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://cricbuzz.com')
soup = BeautifulSoup(res.text, 'html.parser')
scores = soup.find_all('div', class_='score')
print(scores)

YOU

Thanks! Can you also help me with error handling?

AI

Of course! Wrap your requests in try/except to handle network errors and bad responses.

YOU

What if the site blocks me?

AI

Add headers to mimic a real browser: requests.get(url, headers={'User-Agent': 'Mozilla/5.0 ...'}).

AI RESPONDS TO: "By the way, what was my name again?"

"I don't have access to that information in our conversation. Could you remind me?"
← The message where you introduced yourself has scrolled out of the context window.

What’s Actually Happening

Every time the AI responds, it reads the entire conversation from the beginning. Not a compressed summary — the actual raw text, token by token, from the first message to the last.

But there’s a limit to how much it can read in one go. That limit is the context window.

When your conversation exceeds it, something has to give: the chat application usually drops the oldest messages first. The model no longer has access to things you said early on. From its perspective, those messages never existed.
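
The cut-off can be sketched in a few lines. This is a minimal illustration, not how any particular chat product does it: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words (real tokenizers split text differently), and `fit_to_window` is a name invented for this example.

```python
def count_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words.
    (Assumption: real models use subword tokenizers with different counts.)"""
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits in max_tokens."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message; stop when the budget runs out.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "My name is Priya. I'm learning Python.",
    "I want to make a web scraper to collect cricket scores.",
    "What if the site blocks me?",
    "By the way, what was my name again?",
]

visible = fit_to_window(conversation, max_tokens=15)
print(visible)
```

With a budget of 15 "tokens", only the last two messages survive, so the message that contains the name is no longer part of what the model sees.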

It’s Not “Memory” in the Human Sense

Human memory is associative and persistent: you remember things for years and can connect distant events. The model’s “memory” is just whatever fits in the current window.

This is why:

Long research sessions should be broken into fresh chats
Important context should be repeated or summarized at the start of each session
“As we discussed earlier…” doesn’t help if earlier is out of the window
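
One way to apply the "repeat or summarize" tip is to pin a short block of key facts to the start of every prompt and let only the rest of the history be trimmed. A minimal sketch: `build_prompt`, `pinned_facts`, and the word-count token estimate are all hypothetical names and simplifications for illustration, not any vendor's API.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def build_prompt(pinned_facts: list[str], history: list[str], max_tokens: int) -> list[str]:
    """Always include pinned facts, then add as many recent messages as fit."""
    budget = max_tokens - sum(estimate_tokens(fact) for fact in pinned_facts)
    if budget < 0:
        raise ValueError("pinned facts alone exceed the context window")

    recent: list[str] = []
    for message in reversed(history):  # newest first
        cost = estimate_tokens(message)
        if cost > budget:
            break
        recent.append(message)
        budget -= cost
    recent.reverse()  # back to chronological order
    return pinned_facts + recent


pinned = ["User's name is Priya. She is learning Python."]
history = [
    "I want to make a web scraper to collect cricket scores.",
    "Can you also help me with error handling?",
    "What if the site blocks me?",
    "By the way, what was my name again?",
]

prompt = build_prompt(pinned, history, max_tokens=25)
print(prompt)
```

Older messages still fall away, but the pinned summary stays in view, so a question about the user's name can still be answered.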

Context Windows Are Getting Larger

GPT-3 had 2,048 tokens, and GPT-3.5 had 4,096. GPT-4 Turbo has 128,000. Claude has up to 200,000. Gemini 1.5 can handle 1,000,000.

Bigger windows are better, but they’re also slower and more expensive. Models also still struggle with the “lost in the middle” problem: they tend to use information at the beginning and end of a long context more reliably than information in the middle.
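
The cost point follows from simple arithmetic: because the whole visible conversation is re-read on every turn, the total number of tokens processed grows much faster than the conversation itself. A rough sketch, assuming every message is the same length and nothing is cached or trimmed (`tokens_per_message` is an illustrative number, not a measurement):

```python
def total_tokens_processed(num_turns: int, tokens_per_message: int) -> int:
    """Sum the tokens read across all turns when the full history is
    re-sent each time. On turn k the model reads k messages."""
    return sum(turn * tokens_per_message for turn in range(1, num_turns + 1))


for turns in (10, 100, 1000):
    total = total_tokens_processed(turns, tokens_per_message=20)
    print(f"{turns:>5} turns -> {total:,} tokens read in total")
```

Under these assumptions, ten times more turns means roughly a hundred times more tokens read, which is one reason long conversations get slow and expensive.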

Next up: There’s another way AI misleads you — and it’s more subtle than forgetting. Sometimes it remembers confidently, but what it remembers is wrong. That’s hallucination.

