🤖

Series · Part 5 of 21

Foundation
AI Demystified
Abhishek Saha
Abhishek Saha
· 🤖 AI / ML

Why AI Forgets

Mid-conversation, AI suddenly doesn't remember what you said earlier. This isn't a bug — it's the context window. Here's how it works and how to work around it.

Why AI Forgets

You’re deep in a conversation. You’ve explained your problem, set up context, asked five follow-up questions. Then you ask something simple — and the AI acts like you never introduced yourself.

It’s not a glitch. It’s the context window.

CONTEXT WINDOW— AI's short-term memory

This conversation has 11 messages. Use the slider to shrink the context window and watch the AI "forget" earlier messages.

Context window sizelast 3 messages (41 tokens)
tiny (forgets everything)full (205 tokens)
YOU
My name is Priya. I'm learning Python.
AI
Nice to meet you, Priya! Python is a great choice. What do you want to build?
YOU
I want to make a web scraper to collect cricket scores.
AI
Perfect for Python! You'd use requests + BeautifulSoup. Want me to show you a basic example?
YOU
Yes please, show me the example.
AI
Here's a starter: import requests from bs4 import BeautifulSoup res = requests.get('https://cricbuzz.com') soup = BeautifulSoup(res.text, 'html.parser') scores = soup.find_all('div', class_='score') print(scores)
YOU
Thanks! Can you also help me with error handling?
AI
Of course! Wrap your requests in try/except to handle network errors and bad responses.
YOU
What if the site blocks me?
AI
Add headers to mimic a real browser: requests.get(url, headers={'User-Agent': 'Mozilla/5.0 ...'}).
AI RESPONDS TO: "By the way, what was my name again?"
"I don't have access to that information in our conversation. Could you remind me?"
← The message where you introduced yourself has scrolled out of the context window.

What’s Actually Happening

Every time the AI responds, it reads the entire conversation from the beginning. Not a compressed summary — the actual raw text, token by token, from the first message to the last.

But there’s a limit to how much it can read in one go. That limit is the context window.

When your conversation exceeds it, something has to give. Older messages get cut. The model no longer has access to things you said early on. From its perspective, those messages never existed.

It’s Not “Memory” in the Human Sense

Human memory is associative and persistent — you remember things for years and can connect distant events. The model’s “memory” is just: whatever fits in the current window.

This is why:

  • Long research sessions should be broken into fresh chats
  • Important context should be repeated or summarized at the start of each session
  • “As we discussed earlier…” doesn’t help if earlier is out of the window

Context Windows Are Getting Larger

GPT-3 had 4,096 tokens. GPT-4 Turbo has 128,000. Claude has up to 200,000. Gemini 1.5 can handle 1,000,000.

Bigger windows are better — but they’re also slower and more expensive. Models still struggle with the “lost in the middle” problem: they’re better at remembering the beginning and end of a long context than the stuff in the middle.

Next up: There’s another way AI misleads you — and it’s more subtle than forgetting. Sometimes it remembers confidently, but what it remembers is wrong. That’s hallucination.

AI Demystified · 16 of 21 published

  1. 0 Grounding 5 Mental Models You Need Before Diving Into AI
  2. 1 Foundation What Happens When You Ask AI Something?
  3. 2 Foundation Transformers — The Architecture That Changed Everything
  4. 3 Foundation How AI Learns, Thinks, and Decides
  5. 4 Foundation How AI Reads Your Words
  6. 5 Foundation Why AI Forgets
  7. 6 Foundation Why AI Lies (And Doesn't Know It)
  8. 7 Foundation What AI Cannot Do
  9. 8 Foundation How AI Reasons (And Why It Sometimes Breaks)
  10. 9 Practice Prompt Engineering — How to Talk to AI
  11. 10 Practice Embeddings & Vector Databases — The Memory Layer of AI
  12. 11 Practice RAG Explained — How AI Knows What You Didn't Train It On
  13. 12 Practice Fine-tuning vs. Prompting — When to Use Which
  14. 13 Practice Do You Really Need GPT-4?
  15. 14 Practice Latency, Tokens, and Cost — The Physics of AI Products
  16. 15 Practice How Do You Know AI Is Actually Working?
  17. 16 Hands-On Coding Setup — Your AI Development Environment soon
  18. 17 Hands-On MCP Tool Calling — How AI Uses Tools soon
  19. 18 Hands-On AI Agents — Beyond Chatbots soon
  20. 19 Hands-On Build Your First Real AI App soon
  21. 20 Hands-On Token Optimization — Spend Less, Get More soon

Related posts

⚙️
How DNS Works — The Internet's Phone Book ⚙️ Tech
Part 1 · How the Internet Works

How DNS Works — The Internet's Phone Book

You type google.com. Your browser stares at it blankly — it has no idea where that is. Here's how the internet turns a name into an address, and why it's faster than you'd expect.

read more →
⚙️
How HTTP Works — The Language of the Web ⚙️ Tech
Part 4 · How the Internet Works

How HTTP Works — The Language of the Web

The TLS tunnel is open. Now your browser and server need to speak the same language. Here's what GET, POST, 404, and 200 actually mean — and what really travels across the wire.

read more →
⚙️
How HTTP/2 Works — The Speed Upgrade ⚙️ Tech
Part 5 · How the Internet Works

How HTTP/2 Works — The Speed Upgrade

HTTP/1.1 can only send one request at a time. A modern page needs 50+ files. HTTP/2 loads everything at once over a single connection — here's how multiplexing actually works.

read more →
newsletter

Get new posts in your inbox

No spam. No digest. Just a note when I publish something new.

Discussion