Chain-of-Thought Prompting: Guiding AI Through Step-by-Step Reasoning
Master the Chain-of-Thought technique to significantly improve accuracy on complex tasks
Chapter Learning Objectives
Understand what Chain-of-Thought (CoT) prompting is and why it improves reasoning accuracy
Master Zero-shot CoT (trigger phrases) and Few-shot CoT (reasoning demonstrations)
Judge which tasks benefit from CoT and which do not
Learn the advanced techniques: Tree-of-Thought, ReAct, Self-Consistency, and structured reasoning frameworks
Weigh the cost of CoT (longer outputs, more tokens) against its accuracy gains
In the previous two chapters, we learned basic role setting, example provision, and format control. In this chapter, we delve into advanced techniques—Chain-of-Thought (CoT) prompting and its evolved versions: Tree-of-Thought, ReAct, and structured reasoning. These techniques can significantly improve AI accuracy on complex tasks.
What is Chain-of-Thought (CoT) Prompting?
The core idea of Chain-of-Thought prompting is simple: make the AI show its reasoning process before giving the final answer. It's like a teacher asking students to 'show your work' when solving a math problem—the process of showing the steps itself helps achieve a more accurate result.
Google first proposed this concept in the 2022 paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". Experiments showed that CoT improved the accuracy of the PaLM 540B model on the GSM8K mathematical reasoning benchmark from about 18% to about 57%—a massive improvement achieved solely by changing the prompt.
**Why does CoT work?** LLM output is generated token by token. When a model directly outputs the final answer, it needs to complete all reasoning in one 'prediction'—which is too difficult for complex problems. CoT allows the model to generate intermediate reasoning steps first. Each step serves as the 'input context' for the next, effectively giving the model more 'thinking space' and 'scratch paper.'
Zero-shot CoT: The Magic of One Sentence
The simplest use of CoT is to add a guiding phrase at the end of the prompt. Just this one sentence can significantly improve AI performance on complex reasoning tasks:
**Classic Triggers:** "Please think step by step," "Let's think step by step," "Please analyze first, then answer," "Please show your reasoning process."
Practical Comparison
**Without CoT:** "A bookstore has three types of books: novels cost 30 yuan each, textbooks cost 50 yuan each, and dictionaries cost 80 yuan each. Xiao Ming buys 2 novels, 1 textbook, and 1 dictionary, paying 200 yuan. How much change does he get?" — The AI might directly give a number, sometimes miscalculating.
**With CoT:** Add "Please calculate step by step" after the same question — The AI will write: "1) 2 novels = 60 yuan; 2) 1 textbook = 50 yuan; 3) 1 dictionary = 80 yuan; 4) Total price = 190 yuan; 5) Change = 10 yuan." Through step-by-step calculation, the error probability drops from about 30% to less than 5%.
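The trigger-plus-question pattern above can be sketched in code. This is a minimal illustration; the helper name `add_cot_trigger` is our own, but the trigger phrases are the ones listed in this section, and the arithmetic mirrors the bookstore example.

```python
# Zero-shot CoT sketch: append a trigger phrase to any prompt.
# Trigger texts come from this chapter; `add_cot_trigger` is illustrative.

COT_TRIGGERS = {
    "default": "Please think step by step.",
    "assumptions": "Please first list your assumptions, then analyze step by step.",
    "pro_con": "Please analyze from both positive and negative sides, then give a conclusion.",
}

def add_cot_trigger(prompt: str, style: str = "default") -> str:
    """Return the prompt with a Chain-of-Thought trigger appended."""
    return f"{prompt}\n\n{COT_TRIGGERS[style]}"

question = (
    "A bookstore sells novels at 30 yuan, textbooks at 50 yuan, and "
    "dictionaries at 80 yuan. Xiao Ming buys 2 novels, 1 textbook, and "
    "1 dictionary, paying 200 yuan. How much change does he get?"
)
cot_prompt = add_cot_trigger(question)

# The step-by-step reasoning the model should reproduce:
total = 2 * 30 + 1 * 50 + 1 * 80   # 190 yuan
change = 200 - total               # 10 yuan
```

Sending `cot_prompt` instead of the bare question is all that Zero-shot CoT requires.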
Advanced Triggers for Zero-shot CoT
Different triggers suit different scenarios:
**"Please first list your assumptions, then analyze step by step"** — Suitable for business analysis, making the model clarify premises.
**"Please analyze from both positive and negative sides, then give a conclusion"** — Suitable for decision-making problems, forcing the model to consider opposing arguments.
**"Please first summarize the conclusion in 30 words, then explain the reasoning process in detail"** — Suitable for scenarios requiring the conclusion upfront.
**"Please indicate your confidence level (High/Medium/Low)"** — Suitable for scenarios requiring reliability assessment, making the model self-evaluate.
Few-shot CoT: Providing Reasoning Demonstrations
A more powerful usage is to show the AI one or two examples containing complete reasoning processes in the prompt. The AI will imitate your reasoning style, logical structure, and analytical depth.
Practical Example: AI Impact Analysis on Jobs
"Please analyze the problem using the following reasoning framework:
Problem: A company's customer service department has 20 people. After an AI customer service system goes live, it can handle 60% of common inquiries. Analyze the impact on jobs.
Analysis: Step 1 — Determine AI coverage: 60% of common inquiries can be handled by AI. Step 2 — Assess remaining workload: 40% of complex inquiries still require human agents, plus tasks AI cannot handle like escalated complaints, VIP clients. Step 3 — Calculate manpower impact: Common inquiries account for about 70% of total workload. AI reduces manpower demand by approximately 20 x 70% x 60% = 8.4 people. Step 4 — Consider new roles: 2-3 people needed to maintain the AI system and train models. Step 5 — Comprehensive conclusion: Estimated net reduction of 5-6 traditional customer service roles, addition of 2-3 AI operations roles, with increased skill requirements and salaries for remaining customer service staff.
Now please use the same five-step framework to analyze: An accounting firm has 50 junior accountants. After AI financial software goes live, it can automatically complete 80% of basic bookkeeping work."
By providing a complete five-step reasoning demonstration, the AI clearly understands the expected analytical depth, logical structure, and information volume.
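The demonstration-then-new-question structure can be sketched as a small prompt builder. The function name `build_few_shot_cot` is our own illustration, and the demonstration text is abbreviated here; in practice you would paste the full five-step analysis.

```python
# Few-shot CoT sketch: concatenate (problem, worked reasoning) pairs,
# then pose the new problem in the same framework.

def build_few_shot_cot(demonstrations: list[tuple[str, str]], question: str) -> str:
    """Assemble a Few-shot CoT prompt from worked demonstrations plus a new question."""
    parts = ["Please analyze the problem using the following reasoning framework:\n"]
    for problem, reasoning in demonstrations:
        parts.append(f"Problem: {problem}\nAnalysis: {reasoning}\n")
    parts.append(f"Now please use the same framework to analyze: {question}")
    return "\n".join(parts)

demo = (
    "A customer service department has 20 people; AI handles 60% of common inquiries. "
    "Analyze the impact on jobs.",
    "Step 1 - Determine AI coverage ... Step 5 - Comprehensive conclusion: "
    "net reduction of 5-6 traditional roles, addition of 2-3 AI operations roles.",
)
prompt = build_few_shot_cot(
    [demo],
    "An accounting firm has 50 junior accountants; AI completes 80% of basic bookkeeping.",
)
```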
Key Points for Designing Few-shot CoT
**Number the reasoning steps:** Numbering makes the logical chain clearer and facilitates checking if a step is correct.
**Have an intermediate conclusion for each step:** Don't just list steps; have a clear intermediate result at the end of each step.
**Show numerical derivation:** If calculations are involved, show specific formulas and numbers, don't just say 'approximately XX.'
**Demonstrate handling edge cases:** In the example, show how you handle uncertainty ("If the proportion is between 60-80%, then the impact range is..."), and the model will learn this cautious attitude.
Practical Tip
The golden rule of Few-shot CoT: The rigor of your example's reasoning determines the rigor of the AI's output reasoning. Spending 20 minutes writing one high-quality reasoning example is more effective than adjusting prompt parameters 10 times.
Suitable and Unsuitable Scenarios for CoT
Scenarios Most Suitable for CoT
**Mathematical and logical reasoning:** Any problem requiring multi-step calculation or logical deduction. CoT shows the most significant improvement on such tasks.
**Causal analysis:** Deriving effects from causes or tracing causes from phenomena. E.g., "Why did the conversion rate drop by 20% this month?"
**Multi-factor decision-making:** Requiring weighing multiple factors to make a judgment. E.g., "Should we choose Plan A or Plan B?"
**Code debugging:** Having the AI check code logic step by step is far more effective than directly asking "What bugs are there?"
**Text analysis:** Tasks requiring understanding complex relationships, like analyzing contract clauses, interpreting policy documents, extracting key points from literature.
Scenarios Unsuitable for CoT
**Simple information queries:** "How do I use Python's len function?" No reasoning needed.
**Creative writing:** Writing poetry, stories, brainstorming—CoT can actually inhibit creativity, making the output 'formulaic.'
**Simple format conversion:** Mechanical tasks like translation, date format conversion.
**When token budget is tight:** CoT increases output length by 2-5 times. Use cautiously if cost control is needed per call.
Advanced Technique One: Tree-of-Thought (ToT)
**What is ToT?** Tree-of-Thought is an evolved version of CoT. CoT is 'single-line' reasoning (A → B → C → conclusion), while ToT is 'tree-like' reasoning—the model explores multiple possible directions at each node, evaluates the prospects of each, chooses the most promising to proceed, and abandons dead-end branches.
**Prompt Implementation of ToT:** "Please propose 3 different solution approaches to this problem. Analyze the pros, cons, and feasibility of each approach, then select the optimal one to elaborate in detail."
**Applicable Scenarios:** Open-ended problems (multiple feasible solutions), creative optimization (needing to compare multiple candidates), complex strategy formulation.
**Practical Example:** "I want to start an AI education and training company. Please propose 3 different business model directions. For each direction, evaluate: market size, competition level, startup cost, profitability timeline. Then select the optimal direction and provide a detailed launch plan."
Advanced Technique Two: ReAct (Reasoning + Acting)
**What is ReAct?** ReAct makes the AI take actions while reasoning—'think one step, act one step, observe the result, think the next step.' This is the core reasoning mode for AI Agents.
**ReAct Pattern:** Thought (think about what to do next) → Action (execute an operation, like search or calculate) → Observation (observe the result) → Thought (think about the next step based on the result)—repeat until the task is complete.
**Prompt Implementation:** "Please solve the problem using the following pattern: Thought: [Analyze the current situation] → Action: [Operation to execute] → Observation: [Execution result] → Thought: [Adjust direction based on result]. Repeat this process until the problem is solved."
**Applicable Scenarios:** Information gathering and research tasks, debugging and troubleshooting, multi-step complex operations.
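The Thought → Action → Observation loop can be sketched as a tiny deterministic agent. A real agent would call an LLM each turn; here a scripted stand-in model and a toy calculator tool make the control flow visible (all names are illustrative).

```python
import re

def calculator(expression: str) -> str:
    """Toy 'Action' tool: evaluate a simple arithmetic expression."""
    # Demo only; never eval untrusted input in real systems.
    return str(eval(expression, {"__builtins__": {}}))

# Canned model turns replaying the bookstore problem from earlier.
SCRIPT = iter([
    "Thought: I need the total price first.\nAction: calculate[2*30 + 50 + 80]",
    "Thought: Total is 190, change is 200 - 190.\nAction: calculate[200 - 190]",
    "Thought: The change is 10 yuan.\nAction: finish[10]",
])

def scripted_model(_context: str) -> str:
    return next(SCRIPT)

def react_loop(question: str, model, max_turns: int = 5) -> str:
    """Run Thought -> Action -> Observation until the model emits finish[...]."""
    context = question
    for _ in range(max_turns):
        step = model(context)
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        name, arg = match.group(1), match.group(2)
        if name == "finish":
            return arg                   # task complete
        observation = calculator(arg)    # execute the tool
        context += f"\n{step}\nObservation: {observation}"
    return "max turns reached"

answer = react_loop("How much change does Xiao Ming get?", scripted_model)
```

Swapping `scripted_model` for a real LLM call and `calculator` for search, code execution, or database tools gives you the skeleton of an AI Agent.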
Advanced Technique Three: Self-Consistency
**Principle:** Generate multiple answers (e.g., 5 times) for the same problem using CoT, then take the majority result. It's like having 5 experts independently analyze the same problem and adopting the majority conclusion.
**Why it works?** LLM output has randomness (when Temperature > 0), and different generations might follow different reasoning paths. If multiple paths point to the same conclusion, that conclusion is likely correct.
**Implementation:** Set Temperature=0.7 in API calls, send 5 identical requests, compare answers. In ChatGPT or Claude, you can manually 'regenerate' a few times to simulate this.
**Cost Consideration:** Self-consistency verification costs 3-5 times more than a normal query. Use only in high-risk decision scenarios, like business analysis, technical solution evaluation, legal issues.
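Majority voting itself is only a few lines. Here a list of canned sample answers stands in for five real API calls at Temperature 0.7:

```python
from collections import Counter

# Self-Consistency sketch: sample the same CoT prompt several times,
# then keep the answer most reasoning paths agree on.

def majority_vote(answers: list[str]) -> str:
    """Return the most common answer among the sampled reasoning paths."""
    return Counter(answers).most_common(1)[0][0]

# Stand-in for 5 independent CoT samples of the same question.
samples = ["10 yuan", "10 yuan", "20 yuan", "10 yuan", "10 yuan"]
final = majority_vote(samples)   # 4 of 5 paths agree on "10 yuan"
```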
Advanced Technique Four: Structured Reasoning Frameworks
When facing specific types of problems, you can embed professional reasoning frameworks in the prompt to guide the AI to think more systematically:
**SWOT Analysis:** "Please analyze using the SWOT framework: Strengths, Weaknesses, Opportunities, Threats, and finally provide comprehensive strategic recommendations."
**5 Whys Analysis:** "Please use the 5 Whys method to trace the root cause: Why did this problem occur? Ask 'why' consecutively 5 times to find the fundamental cause."
**MECE Principle:** "Please ensure your analysis follows the MECE principle—categories are Mutually Exclusive and Collectively Exhaustive."
**First Principles:** "Please analyze from first principles—go back to the most basic facts and axioms, rather than relying on analogies or conventions."
You don't need to master these frameworks—you just need to mention the framework name in the prompt, and the LLM knows how to apply it.
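A small prompt library can make these frameworks one function call away. The framework texts below are condensed from this section; the helper name is illustrative:

```python
# Structured-reasoning sketch: map framework names to prompt preambles.

FRAMEWORKS = {
    "SWOT": "Please analyze using the SWOT framework: Strengths, Weaknesses, "
            "Opportunities, Threats, and finally provide strategic recommendations.",
    "5whys": "Please use the 5 Whys method: ask 'why' consecutively 5 times "
             "to find the root cause.",
    "MECE": "Please ensure your analysis follows the MECE principle: categories "
            "are Mutually Exclusive and Collectively Exhaustive.",
    "first_principles": "Please analyze from first principles: go back to basic "
                        "facts and axioms rather than relying on analogies.",
}

def framed_prompt(question: str, framework: str) -> str:
    """Prepend the chosen reasoning framework to a question."""
    return f"{FRAMEWORKS[framework]}\n\nQuestion: {question}"

p = framed_prompt("Why did this month's conversion rate drop 20%?", "5whys")
```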
Caution
CoT makes AI responses 2-5 times longer, consuming more tokens and cost. Do not use CoT for simple queries and creative writing. Also, CoT is not a panacea—if the model's foundational knowledge is wrong, CoT will only make it 'make mistakes in an organized manner.'
Important Reminder
Zero-shot CoT is sufficient for ordinary tasks (add a phrase like 'please think step by step'). Only truly important decisions are worth using advanced techniques. One of the core competencies of a Prompt Engineer is judging 'what level of prompt technique does this task deserve?'
[Diagram: Reasoning Technique Hierarchy]
[Diagram: CoT Applicability Decision Tree]
[Diagram: Structured Reasoning Framework Selection]
Chapter Quiz
1. What is the simplest way to use Zero-shot CoT?
After mastering CoT and its advanced techniques, you now possess the core skills of an intermediate to advanced Prompt Engineer. In the next chapter, we will enter enterprise-level practice—building a complete Prompt Engineering system.
From Traditional Product Manager to AI Product Manager
If you are a traditional product manager looking to transition into an AI product direction, the first step is to understand the core differences between AI products and traditional products. This is not simply 'adding an AI feature' to an existing product, but a fundamental shift in product thinking. In 2024-2025, demand for AI Product Manager roles grew by over 200% year-over-year, making it one of the hottest product roles in the tech industry. This chapter will establish a complete cognitive framework for you as an AI Product Manager.
Traditional Product vs AI Product: Fundamental Differences
The core logic of traditional software products is: Input → Deterministic Rules → Deterministic Output. User clicks button A, result B always appears. This determinism is the cornerstone of traditional product experience—users can build a clear mental model, knowing what result each operation will yield.
The core logic of AI products is: Input → Probabilistic Model → Uncertain Output. The same question may yield different answers from the AI. This uncertainty is the core challenge AI Product Managers must face. Traditional PMs are accustomed to 'writing rules to control outcomes,' while AI PMs must accept the fact that 'you cannot fully control the AI's output.'
This difference permeates all aspects of product work: In requirement documents, you cannot write 'When user inputs X, the system returns Y,' but must write 'When user inputs X, the system returns results similar to Y, with an accuracy rate no less than 85%.' During testing, you cannot write simple assertions but must design evaluation systems. After launch, you cannot assume the feature will remain stable because model performance can fluctuate with changes in data distribution.
Classification and Forms of AI Products
Before diving deeper, you need to understand the two major categories of AI products—this directly determines your product strategy and working style.
AI-Native Products
The core functionality of these products is entirely built on AI; the product wouldn't exist without AI. Typical examples include: ChatGPT (conversational AI), Midjourney (AI image generation), GitHub Copilot (AI programming assistant), Perplexity (AI search engine). Product managers for AI-Native products need a deep understanding of model capabilities because the AI *is* the product itself.
AI-Enhanced Products
These products add AI capabilities to existing features to enhance the experience. Typical examples include: Notion AI (adding AI writing to a note-taking tool), Canva Magic (adding AI generation to a design tool), Feishu Miaoji (adding AI meeting notes to a collaboration tool). Product managers for AI-Enhanced products need to balance the relationship between traditional and AI features, judging which scenarios are suitable for introducing AI.
Practical Tip
If you are a traditional PM transitioning, it's recommended to start with AI-Enhanced products—adding AI features to existing products. This allows you to reuse existing product experience while gradually building AI product intuition. After accumulating AI product experience, then consider the AI-Native product direction.
Five Key Particularities of AI Products
1. Output Uncertainty
AI model output is probabilistic, not deterministic. This means you cannot perform precise functional testing like with traditional products—you don't have a 'standard answer' to compare against. AI Product Managers need to learn to measure output quality using evaluation metrics (like accuracy, BLEU score, F1 score) rather than simple right/wrong judgments.
In practical work, the biggest challenge brought by uncertainty is 'quality assurance.' You need to establish an evaluation benchmark: collect a set of typical input samples, manually annotate ideal outputs, then use this benchmark set to evaluate model performance. After each model update or prompt adjustment, run the evaluation benchmark to ensure quality hasn't degraded.
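The benchmark idea described above can be sketched as a tiny harness: annotated samples, a pluggable judge, and an accuracy score to re-run after every model or prompt change. All names here are illustrative stand-ins for your actual system.

```python
# Evaluation-benchmark sketch: score the current system against
# manually annotated samples.

benchmark = [
    {"input": "Summarize meeting A", "expected": "action items listed"},
    {"input": "Summarize meeting B", "expected": "action items listed"},
    {"input": "Summarize meeting C", "expected": "decisions captured"},
]

def evaluate(model_output, samples, judge) -> float:
    """Fraction of benchmark samples the current system handles correctly."""
    correct = sum(judge(model_output(s["input"]), s["expected"]) for s in samples)
    return correct / len(samples)

# Toy run: a fake model plus an exact-match judge. Real judges are often
# fuzzier (semantic similarity, rubric scoring, or an LLM-as-judge).
fake_model = lambda text: "action items listed"
exact_match = lambda out, expected: out == expected
accuracy = evaluate(fake_model, benchmark, exact_match)   # 2 of 3 correct
```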
2. Data-Driven, Not Rule-Driven
Traditional product behavior is defined by code logic; modifying behavior means modifying code. AI product behavior is determined by training data and model parameters; modifying behavior requires adjusting data or fine-tuning the model. This requires product managers to understand the role and limitations of data—insufficient data causes the model to 'not know,' biased data causes the model to 'learn incorrectly,' poor data quality causes the model to 'talk nonsense.'
Caution
AI product output is probabilistic. Never promise users 100% accuracy. Honestly label 'AI-generated, for reference only' in product copy and UI; this costs much less than handling complaints later. Many AI products have suffered user trust collapse due to overpromising—this is the hardest brand damage to repair.
3. User Expectation Management
Users often have two extreme expectations for AI products: either too high (thinking AI can do anything) or too low (thinking AI is unreliable). AI Product Managers need to reasonably guide user expectations through product design. Specific strategies include: clearly stating the AI's capabilities and limitations on the onboarding page ('I am good at... but not good at...'), displaying confidence or accuracy hints next to AI output, providing gentle reminders like 'AI may make mistakes, please verify,' and designing example guides to show users the best ways to use the AI.
4. Feedback Loop and Data Flywheel
AI products need to continuously collect user feedback to improve models. 'Like/Dislike' buttons, user editing of AI output, usage frequency—all this data is valuable feedback. Designing a good feedback collection mechanism is key to AI product success. Excellent AI products create a 'data flywheel' effect: users use the product → generate feedback data → model improves → product experience improves → more users use it. ChatGPT is a classic case of the data flywheel—billions of daily conversations continuously help the model improve.
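A minimal store for these feedback signals might look like the following; the field and function names are our own illustration, not a product's actual schema.

```python
from collections import defaultdict

# Feedback-collection sketch: count like/dislike/edit signals per AI feature.

feedback = defaultdict(lambda: {"up": 0, "down": 0, "edited": 0})

def record(feature: str, signal: str) -> None:
    """Log one user feedback event for an AI feature."""
    feedback[feature][signal] += 1

# Simulated session: three likes, one dislike, one user edit.
for signal in ["up", "up", "down", "edited", "up"]:
    record("ai_summary", signal)

stats = feedback["ai_summary"]
approval = stats["up"] / (stats["up"] + stats["down"])   # 3 / 4
```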
5. Ethics, Safety, and Compliance
AI products may generate biased, inaccurate, or even harmful content. AI Product Managers need to design safety guardrails, including content filtering (input and output bidirectional detection), human review processes (mandatory for high-risk scenarios), user reporting mechanisms, and compliance frameworks (like China's Generative AI Management Measures, the EU AI Act). Safety should not be an afterthought—it must be integrated into the PRD from day one of product design.
Important Reminder
Ethical issues in AI products are not just moral problems; they are business risks. One AI bias incident can lead to brand crisis, user churn, or even lawsuits. AI Product Managers must establish bias detection and mitigation mechanisms during the product design phase, not after an incident occurs.
Core Competency Model for AI Product Managers
AI Product Managers need to master four additional core competencies on top of traditional PM skills:
Competency One: AI Technical Literacy
You don't need to write model code, but you need to understand AI's capability boundaries. Specifically, you should be able to answer: Can this feature be done with AI? Use a large language model or computer vision? Use an off-the-shelf API or need fine-tuning? What's the approximate accuracy range of the model? What are the cost and latency?
Competency Two: Data Thinking
Understand the impact of data quality and quantity on AI products. You need to judge: Do we have enough training data? Is the data biased? How does user-generated data flow back to model improvement? How is data privacy compliance ensured?
Competency Three: Evaluation Methods
How to measure the quality of AI features. You need to master: designing evaluation benchmark sets, selecting appropriate evaluation metrics (accuracy, recall, F1 score, etc.), A/B testing AI features, analyzing user adoption and edit rates for AI output.
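The metrics named above follow standard definitions; a quick reference implementation lets you sanity-check a dashboard number by hand.

```python
# Standard classification metrics from raw counts:
# tp = true positives, fp = false positives, fn = false negatives.

def precision(tp: int, fp: int) -> float:
    """Of everything the model flagged, how much was right?"""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Of everything it should have flagged, how much did it catch?"""
    return tp / (tp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Example: 80 true positives, 20 false positives, 10 false negatives.
f1 = f1_score(80, 20, 10)   # precision 0.8, recall ~0.889, F1 ~0.842
```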
Competency Four: Prompt Engineering
The competitive edge of many AI products lies in prompt design. Understanding Prompt Engineering not only helps you design better products but also makes communication with engineers more efficient—you can validate product ideas directly with prompt prototypes without waiting for engineers to write code.
Practical Tip
The fastest path from traditional PM to AI PM: First learn Prompt Engineering (1-2 weeks), then learn basic model evaluation methods (1 week), then do an AI product side project (2-4 weeks). The whole process takes 1-2 months, allowing you to build basic AI product intuition. Recommended learning resources: Anthropic's official Prompt Engineering documentation, Google's Machine Learning Crash Course (free).
A Day in the Life of an AI Product Manager
What does a typical day look like for an AI Product Manager? Here's a description of a mid-level AI PM's workday:
**9:00-10:00 AM:** Check yesterday's model monitoring dashboard—metrics for AI feature accuracy, latency, user adoption rate, etc. Notice accuracy for a certain scenario dropped from 91% to 86%, flag it as an issue needing investigation.
**10:00-11:30 AM:** Weekly meeting with ML engineers, discussing reasons for last week's accuracy drop (user input patterns changed) and solutions (add new training samples, adjust prompts). Sync on the next version's model upgrade plan.
**1:30-3:00 PM:** Write PRD for a new AI feature—'AI auto-generates meeting action items.' Need to define model requirements, data requirements, fallback strategies, and evaluation metrics.
**3:00-4:00 PM:** Analyze user feedback data. Find that user edit rate for AI summaries is 35%—higher than expected, indicating room for optimization in summary quality. Extract the most common editing patterns as direction for the next round of prompt optimization.
**4:00-5:00 PM:** Discuss AI feature interaction design with designers—how to display AI confidence in the UI? How to gracefully degrade when the AI is uncertain?
Industry Demand and Market Analysis
Demand for AI Product Managers is growing explosively. According to data from LinkedIn and Lagou, the number of AI PM positions in 2025 grew by over 200% compared to 2023. This growth comes not only from AI companies (ByteDance, Baidu, SenseTime) but also from digital transformation in traditional industries—finance, healthcare, education, manufacturing are all hiring AI Product Managers.
The supply of AI PMs in the market is severely insufficient. Most candidates are either purely technical (understand AI but not product) or purely product-focused (understand product but not AI). Compound talent that understands both AI technology and product methodology is extremely scarce—this is your opportunity window.
After understanding the core particularities of AI products and the competency model for AI PMs, the next chapter will delve into AI product design methodology—how to assess AI feasibility, design AI interactions, and handle UX challenges unique to AI.
[Diagram: AI Product Decision Flow]
[Diagram: AI Product Classification]
[Diagram: AI Product Manager Competency Model]