Reasoning Models

“Show your work.” — Every math teacher ever

Learning Objectives

By the end of this module, you will be able to:

Understand how reasoning capabilities differ across models
Identify when a task requires extended reasoning
Use chain-of-thought and extended thinking effectively
Recognize when reasoning models help vs. waste time
Choose the right model/mode for reasoning-heavy tasks

The Reasoning Challenge

Standard LLMs generate text token-by-token, predicting what comes next. This works brilliantly for many tasks — but can fail on complex reasoning.

Why? Each token is generated based on what came before, with limited “thinking ahead.” For problems requiring:

Multiple logical steps
Considering and rejecting alternatives
Checking work along the way
Planning before executing

…standard generation can stumble.

Solution: Reasoning models and techniques that give AI “time to think.”

Types of Reasoning Approaches

1. Chain-of-Thought Prompting (You trigger it)

You explicitly ask the model to think step by step:

“Solve this problem. Think through each step before giving your final answer.”

This works with any model. You’re using prompt engineering to encourage reasoning.

2. Extended Thinking (Built-in capability)

Some models have explicit “thinking” modes:

Claude: Extended thinking mode
OpenAI o1/o3: Reasoning-focused models
Other providers: Similar features emerging

These models automatically “think longer” on complex problems, showing (or hiding) their reasoning process.

3. Multi-Turn Reasoning (Iterative)

Break complex problems into steps yourself:

Turn 1: "What are the key considerations for this problem?"
Turn 2: "Given those, what approaches could work?"
Turn 3: "Evaluate approach A vs B"
Turn 4: "Implement the better approach"

You control the reasoning flow explicitly.

Chain-of-Thought Prompting

The Technique

Add instructions that encourage step-by-step reasoning:

Basic:

“Think step by step.”

Structured:

“Before answering, work through this systematically:

Identify what we know

Identify what we need to find

Consider approaches

Execute the best approach

Verify the answer”

Domain-specific:

“Debug this code by:

First, identify what the code is supposed to do

Trace through with example input

Find where behavior diverges from expectation

Propose and verify a fix”

When Chain-of-Thought Helps

Task Type	Without CoT	With CoT
Simple arithmetic	Often correct	Still correct
Multi-step word problems	Often wrong	Usually correct
Logic puzzles	Unreliable	More reliable
Complex debugging	Misses steps	More thorough
Planning tasks	Jumps to conclusions	More considered

Example: Math Problem

Without chain-of-thought:

“A store has 3 shirts that cost $15 each and 2 pants that cost $25 each. There’s a 10% discount on the total. What’s the final price?”

AI might jump to an answer and get it wrong.

With chain-of-thought:

“A store has 3 shirts that cost $15 each and 2 pants that cost $25 each. There’s a 10% discount on the total. What’s the final price? Show your calculation step by step.”

Step 1: Calculate shirt total
3 × $15 = $45

Step 2: Calculate pants total
2 × $25 = $50

Step 3: Calculate subtotal
$45 + $50 = $95

Step 4: Calculate 10% discount
$95 × 0.10 = $9.50

Step 5: Calculate final price
$95 - $9.50 = $85.50

Final answer: $85.50

Now you can verify each step.

Extended Thinking Models

What’s Different

Extended thinking models are designed to:

Spend more “compute” on reasoning
Explore multiple approaches internally
Check their own work
Handle more complex problems

How to Use Them

Claude Extended Thinking: Often automatic for complex tasks, or can be requested:

“Take your time thinking through this carefully.”

The model shows its reasoning process (or summarizes it).

OpenAI o1/o3 Models: Explicitly designed for reasoning tasks. Use them when you need:

Complex analysis
Multi-step problem solving
Code architecture decisions

Visible vs. Hidden Thinking

Some models show their reasoning (“thinking out loud”). Others think internally and only show the final answer.

Visible thinking advantages:

You can verify the reasoning
You can catch errors in logic
You learn from the process

Hidden thinking advantages:

Cleaner output
Faster to read
Less token usage in some contexts

When to Use Extended Thinking

Use extended thinking for:

Complex coding problems
Architectural decisions
Multi-step analysis
Problems where errors are costly
When you need to verify reasoning

Don’t use extended thinking for:

Simple, quick tasks
Creative writing (thinking doesn’t help creativity)
When speed matters more than depth
Tasks with obvious solutions

Practical Patterns

Pattern 1: Think Then Execute

First, think through the design for this feature without writing code.
Consider:
- What data structures do we need?
- What's the algorithm approach?
- What edge cases exist?

Then, implement the solution.

Separates planning from execution.

Pattern 2: Checkpoint Reasoning

Solve this in stages. After each stage, pause and verify before continuing.

Stage 1: [First part]
Verification: Is this correct so far?

Stage 2: [Second part]
Verification: Does this follow from Stage 1?

Final: [Complete solution]

Forces verification at each step.

Pattern 3: Compare and Choose

For this problem, consider three different approaches:

Approach A: [describe]
Approach B: [describe]
Approach C: [describe]

For each, analyze:
- Pros
- Cons
- Complexity
- When it's best suited

Then recommend the best approach for my specific situation.

Ensures alternatives are considered.

Pattern 4: Devil’s Advocate

Here's my proposed solution: [solution]

Before accepting it, argue against this solution:
- What could go wrong?
- What am I missing?
- What's a better alternative?

Then, either defend the original or propose improvements.

Catches blind spots.

Reasoning Limitations

Even reasoning models have limits:

They Can Still Be Wrong

Longer thinking doesn’t guarantee correctness. The model can:

Make errors in any step
Have incorrect premises
Follow valid logic from wrong assumptions

Always verify, especially for important decisions.

They Can Overthink

Sometimes simple is better:

Task: Add two numbers
Model: "Let me consider multiple approaches. First, I could use
       direct addition. But let me also consider logarithmic
       approaches for potential numerical stability..."

For simple tasks, reasoning overhead is wasteful.

They Can Confabulate Reasoning

Models can generate convincing-sounding reasoning that’s actually post-hoc justification, not genuine thought process.

The reasoning looks logical but may not reflect how the answer was actually generated.

Choosing the Right Approach

Decision Framework

Is this a simple, straightforward task?
├── Yes → Standard prompting (no special reasoning needed)
└── No → Continue...

Does it require multiple logical steps?
├── Yes → Use chain-of-thought prompting
└── No → Standard may suffice

Is it a complex, high-stakes decision?
├── Yes → Use extended thinking model
└── No → Chain-of-thought is probably enough

Do you need to verify the reasoning?
├── Yes → Request visible reasoning
└── No → Hidden/summarized is fine

Effort vs. Benefit

Task Complexity	Approach	Reasoning Overhead
Simple	Standard	None
Moderate	Chain-of-thought prompt	Low
Complex	Extended thinking	Medium
Critical	Extended + verification	High

Match the tool to the task.

Practical Exercises

Exercise 1: Chain-of-Thought Comparison

Take this problem:

“You have 3 boxes. Box A has 5 red balls and 3 blue balls. Box B has 2 red balls and 6 blue balls. You randomly pick one box, then randomly pick one ball. It’s red. What’s the probability you picked from Box A?”

Ask without chain-of-thought
Ask with “think step by step”
Compare the answers and reasoning

Exercise 2: Debugging with Reasoning

Find buggy code (or intentionally write some). Ask the AI to debug it:

First, without reasoning prompts
Then, with “trace through the code step by step with example input”

Compare thoroughness.

Exercise 3: Architectural Reasoning

Describe a feature you want to build. Ask:

“How should I implement this?” (basic)
“Before recommending an approach, analyze at least 3 different options with tradeoffs, then recommend one” (reasoning)

Compare the depth of analysis.

Key Insights

Concept	Practical Rule
Chain-of-thought	Ask for step-by-step reasoning on complex problems
Extended thinking	Use dedicated reasoning modes for high-stakes decisions
Visible reasoning	Helps you verify; use when accuracy matters
Match to task	Don’t over-reason simple tasks
Still verify	Reasoning models can still be wrong

Connection to What’s Next

You now understand how to get AI to reason more carefully. Final module in Tier 2:

Module 09: Context and memory — managing long conversations and large contexts

Then Tier 3 covers agentic development, where reasoning becomes crucial for autonomous operation.

Reflection Questions

When have you accepted an AI answer too quickly that turned out to be wrong? Could reasoning techniques have helped?
“The model can generate convincing-sounding reasoning that’s actually post-hoc justification.” What are the implications of this for trusting AI explanations?
For your own work, which tasks would benefit most from extended reasoning? Which would be slowed down by it?

Next module: Context & Memory — managing long conversations and large contexts effectively.