How Do Large Language Models Work?
Understand the principles of LLMs in plain language, laying the foundation for learning Prompt Engineering
What You'll Learn in This Chapter
Understand how Large Language Models (LLMs) work in plain language
Differentiate the characteristics of mainstream LLMs (GPT/Claude/DeepSeek/Gemini)
Understand the meaning of core parameters like Temperature and Token
Recognize the capabilities and limitations of LLMs—what they can and cannot do
Establish the value proposition and learning framework for Prompt Engineering
Before learning Prompt Engineering, you need to understand how Large Language Models (LLMs) work. You don't need to dive into mathematical details, but understanding the basic principles will help you write better prompts and know why some prompts work while others don't.
What is a Large Language Model?
A Large Language Model (LLM) is essentially a 'text predictor' trained on massive amounts of text. Given a piece of text, it predicts the most likely text to follow. Under the hood, ChatGPT, Claude, DeepSeek, and Gemini are all LLMs.
By analogy: if you type 'today's weather' on your phone's keyboard, it might suggest words like 'is great' or 'is nice'. An LLM works on the same principle, but its 'vocabulary' and 'understanding' are vastly more powerful—it has read a large fraction of the public text on the internet, enabling it to generate coherent, logical, and even creative content.
More precisely, an LLM performs 'conditional probability prediction': given all previous Tokens (we'll explain Tokens in detail later), it calculates the probability distribution for the next Token and then selects one. This seemingly simple mechanism, at a sufficiently large model and data scale, has given rise to astonishing emergent capabilities—including reasoning, programming, translation, and creation.
How does it learn?
The training process of an LLM can be simplified into three stages:
Stage One: Pre-training
The model reads vast amounts of text data from the internet—books, articles, web pages, code, papers, Wikipedia, etc.—learning language patterns and knowledge. This is like a student reading all the books in the world, forming a broad knowledge base.
The amount of pre-training data is staggering. GPT-4-level models are reported to be trained on over 13 trillion Tokens (equivalent to tens of billions of pages of text). Training requires thousands of high-end GPUs (such as NVIDIA A100/H100) running for months, with estimated costs exceeding $100 million. This is why only a few large companies can train foundation models from scratch.
Stage Two: Supervised Fine-tuning (SFT)
After pre-training, the model is just a 'completer'—it can continue a half-sentence but isn't good at answering questions according to instructions. The SFT stage trains the model with a large number of high-quality 'instruction-response' pairs, teaching it to understand user intent and provide helpful answers.
This training data is typically written by professional human annotators. For example: the instruction is 'Write a quicksort function in Python,' and the response is a complete piece of Python code with explanations. Through tens of thousands to hundreds of thousands of such high-quality examples, the model learns the pattern of 'when a user asks like this, I should answer like that.'
Stage Three: Alignment / RLHF
Using techniques like Reinforcement Learning from Human Feedback (RLHF), the model learns to answer questions according to human values and preferences. This step addresses the 'safety' issue—making the model refuse harmful requests, avoid bias, and remain honest.
The RLHF process is: have the model generate multiple responses to the same question → human evaluators select the best one → train a 'reward model' to learn human preferences → use the reward model to guide the LLM to generate responses more aligned with human expectations.
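The reward-model step can be illustrated with the Bradley–Terry formulation commonly used in RLHF: the probability that humans prefer response A over response B is modeled as the sigmoid of the difference between the two responses' reward scores. The scores below are invented for illustration.

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry model: P(A preferred over B) = sigmoid(reward_A - reward_B).
    The reward model is trained so the human-preferred response scores higher."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

# Illustrative reward-model scores for two candidate answers:
p = preference_probability(2.1, 0.4)
print(p)  # well above 0.5: the model agrees with the human's pick
```

During training, the reward model's parameters are adjusted to push this probability toward 1 for every human-labeled preference pair; the resulting scorer then guides the LLM's updates.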
You can compare the entire process to education: Pre-training = reading extensively (building knowledge), SFT = doing exercises (learning how to answer questions), RLHF = teacher grading homework (learning what makes a good answer).
Practical Tip
Understanding the three training stages is important for Prompt Engineering: Pre-training determines what the model 'knows,' SFT determines 'how it answers,' and RLHF determines 'what it cannot say.' When your prompt triggers a safety filter, RLHF is at work.
Core LLM Parameters
When using AI tools, you'll encounter several key parameters. Understanding them helps you control the output more precisely:
Temperature
Controls the randomness of the output. At Temperature=0, the model always chooses the highest-probability Token, resulting in the most deterministic, almost identical output each time. At Temperature=1, the model explores more possibilities, resulting in more diverse but less predictable output.
**Usage Suggestions**: Factual Q&A, code generation → 0-0.3 (for accuracy); business copywriting, casual conversation → 0.5-0.7 (for naturalness); creative writing, brainstorming → 0.8-1.0 (for novelty).
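Under the hood, temperature rescales the model's raw scores (logits) before they are turned into probabilities. A minimal sketch, with invented logits:

```python
import math

def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw model scores (logits) into probabilities, sharpened or
    flattened by temperature (softmax over logits / temperature)."""
    if temperature <= 0:  # T=0: deterministic, all mass on the top token
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(apply_temperature(logits, 0.2))  # near-deterministic: top token dominates
print(apply_temperature(logits, 1.0))  # plain softmax
print(apply_temperature(logits, 2.0))  # flatter distribution: more diversity
```

Lower temperature concentrates probability on the top token; higher temperature spreads it out, which is exactly why low values suit factual tasks and high values suit creative ones.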
Context Window
The maximum number of Tokens the model can process at once, equivalent to the model's 'working memory.' GPT-4o supports 128K Tokens (about a 300-page book), Claude 3.5 supports 200K Tokens. Content beyond the window is truncated or lost.
**Impact on Prompt Engineering**: The context window determines how much background information, examples, and instructions you can include in one conversation. A larger window allows you to provide more context to the AI, usually improving answer quality—but Tokens are also more expensive.
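A common client-side consequence: when a conversation outgrows the window, older messages must be dropped. A rough sketch, assuming a crude 4-characters-per-token heuristic and an illustrative budget; real applications should count tokens with the provider's tokenizer.

```python
# Sketch of client-side context management: keep the most recent messages
# whose estimated token total fits the window. MAX_CONTEXT_TOKENS is an
# illustrative value, not any specific model's limit.
MAX_CONTEXT_TOKENS = 8000

def estimate_tokens(text: str) -> int:
    """Crude heuristic (~4 chars per English token); use a real tokenizer in production."""
    return max(1, len(text) // 4)

def fit_to_window(messages: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Drop the oldest messages until the estimated total fits the budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

This newest-first strategy is the simplest policy; production systems often summarize the dropped history instead of discarding it outright.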
Top-P (Nucleus Sampling)
Another parameter controlling diversity. Top-P=0.1 means selecting only from the top 10% of Tokens by cumulative probability. It's generally recommended to keep Top-P=1 and only adjust Temperature—adjusting both simultaneously can lead to unpredictable results.
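Top-P can be sketched as a filter applied before sampling: keep the smallest set of highest-probability tokens whose cumulative probability reaches the threshold, then renormalize. The distribution below is invented for illustration.

```python
def top_p_filter(probs: dict[str, float], top_p: float = 0.9) -> dict[str, float]:
    """Keep the smallest set of tokens whose cumulative probability reaches
    top_p (the 'nucleus'), then renormalize so the kept probabilities sum to 1."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, p in items:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in nucleus)
    return {token: p / total for token, p in nucleus}

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_p_filter(probs, top_p=0.8))  # keeps "a" and "b", renormalized
```

Note how the low-probability tail ("c" and "d") is cut off entirely; the model then samples only from the renormalized nucleus, which is why Top-P also reduces the chance of bizarre word choices.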
What Can LLMs Do? What Can't They Do?
Understanding how LLMs work lets us map their capability boundaries—crucial for writing good prompts:
Areas of Strength
**Language Understanding and Generation**: Translation, summarization, rewriting, expansion, style conversion—this is the core capability of LLMs.
**Code Generation and Debugging**: Generating code from natural-language descriptions, explaining code, finding bugs. On standard coding benchmarks, top models now perform at or above the level of many human developers.
**Creative Writing**: Stories, poetry, ad copy, scripts. Good at imitating various writing styles and tones.
**Information Integration and Analysis**: Reading large volumes of documents and extracting key information, generating structured summaries, comparative analysis.
**Logical Reasoning**: With sufficiently good prompt guidance, they can perform multi-step logical reasoning, causal analysis, and decision support.
Areas of Weakness
**Real-time Information**: Training data has a cutoff date. If you ask about recent events, the model may not know or give outdated information. Solution: Use RAG (Retrieval-Augmented Generation) or web search functionality.
**Precise Mathematical Calculation**: LLMs 'predict text' rather than truly 'calculate.' Multi-digit multiplication, solving complex equations often produce errors. Solution: Have the LLM call a code interpreter or calculator tool.
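The 'delegate the math' pattern can be sketched as follows. Here `model_reply` is a hypothetical stand-in for an actual API response; the prompt would instruct the model to reply with only an arithmetic expression, which is then evaluated by real code rather than trusted as predicted text.

```python
import ast
import operator

# Whitelist of arithmetic operators we allow in the model's expression.
SAFE_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

# Suppose the model was prompted: "Reply with only the arithmetic expression."
model_reply = "123456 * 789"  # illustrative model output, not a real API call
print(safe_eval(model_reply))  # computed by Python, not predicted by the LLM
```

Hosted code interpreters and calculator tools follow the same division of labor: the LLM translates the problem into an expression or program, and a deterministic engine does the arithmetic.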
**Memory Management**: LLMs have no long-term memory across conversations. Each new conversation starts from scratch. If information within the context window is too long, the model may 'forget' content in the middle (known as the 'lost-in-the-middle' problem).
**100% Accuracy**: LLMs 'hallucinate'—confidently fabricating non-existent facts. This is not a bug but an inherent limitation of the prediction mechanism.
Important Reminder
The hallucination problem of LLMs is something Prompt Engineers must always be vigilant about. For critical information involving data, law, medicine, finance, etc., always perform manual verification. A good Prompt Engineer knows how to design prompts to reduce hallucination risk (e.g., asking the model to cite sources, say 'I don't know' when uncertain).
Comparison of Mainstream LLMs
Understanding the characteristics of different models helps in choosing the most suitable tool:
**GPT-4o (OpenAI)**: Strong overall capabilities, excellent tool calling and coding abilities, richest ecosystem. Suitable for most scenarios.
**Claude 3.5 Sonnet (Anthropic)**: Outstanding long-text analysis and coding capabilities, leading safety design, 200K ultra-long context. Suitable for long document processing and scenarios requiring high safety.
**Gemini 1.5 Pro (Google)**: Ultra-long context window (1M Tokens), strong multimodal capabilities, deep integration with Google ecosystem. Suitable for processing ultra-long documents and video content.
**DeepSeek V3 (DeepSeek)**: Outstanding reasoning capabilities, excellent performance in Chinese, open-source and deployable. Suitable for scenarios requiring local deployment or with limited budgets.
**Qwen 2.5 (Alibaba)**: Top-tier Chinese capabilities, open-source and free for commercial use, many parameter size options. Suitable for Chinese-dominant application scenarios.
Caution
Don't blindly believe in the 'strongest model.' Different models have their own advantages in different tasks. A good Prompt Engineer chooses the most suitable (not necessarily the most expensive) model based on task characteristics. For example, simple text classification can be done with GPT-4o mini; there's no need to use GPT-4o.
How is this related to Prompt Engineering?
Understanding that the essence of an LLM is a 'text predictor' clarifies why prompts are so important—the prompts you give determine its prediction direction. Prompt Engineering is essentially 'guiding the model's prediction direction through carefully designed input to obtain optimal output.'
A good prompt is like a good exam question: clear, specific, with context. Vague questions get vague answers, just like a poorly worded exam question leaves students confused.
Comparison of Bad vs. Good Prompts
**Bad**: 'Write an article' → The model doesn't know the topic, length, style, or audience, resulting in random and generic output.
**Good**: 'You are a tech journalist with 10 years of experience. Please write an 800-word in-depth analysis for readers of 36Kr on the impact of AI on the accounting industry. Include data support and specific cases. The tone should be professional but not obscure.' → The model has a clear direction, and the output quality will be many times higher.
**Why is it good?** Because the good prompt contains five key pieces of information: role (tech journalist), audience (36Kr readers), format (800-word in-depth analysis), topic (AI's impact on accounting), and requirements (data + cases + tone). This information helps the model significantly narrow the prediction space, resulting in more precise content.
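Those five pieces of information can be turned into a reusable template. A minimal sketch using the standard library; the field names are our own invention:

```python
from string import Template

# Hypothetical five-slot prompt template: role, audience, format/length,
# topic, and extra requirements. Any prompt library would work equally well.
PROMPT_TEMPLATE = Template(
    "You are $role. Write $length $format for $audience on $topic. $requirements"
)

prompt = PROMPT_TEMPLATE.substitute(
    role="a tech journalist with 10 years of experience",
    length="an 800-word",
    format="in-depth analysis",
    audience="readers of 36Kr",
    topic="the impact of AI on the accounting industry",
    requirements=(
        "Include data support and specific cases. "
        "Keep the tone professional but accessible."
    ),
)
print(prompt)
```

Templating the checklist means each new task only requires filling in the slots, so no key piece of context gets forgotten.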
The Value of Prompt Engineering
Prompt Engineering currently offers the highest return on investment among AI skills, for four reasons:
**Zero Barrier to Entry**: No programming or math background required. You can start learning if you can type.
**Immediate Results**: You can apply a technique to your work immediately after learning it. Unlike learning programming, which requires long accumulation before producing results.
**High Versatility**: Regardless of your industry—marketing, law, finance, education, design—prompt skills can improve your efficiency in using AI.
**High Career Value**: According to LinkedIn 2025 data, positions requiring Prompt Engineering skills pay 15-25% more on average than similar roles. This skill is spreading from a standalone job to an essential skill for almost all knowledge workers.
Practical Tip
The best way to learn Prompt Engineering is 'learning by doing.' Starting today, consciously optimize your prompts each time you use an AI tool, comparing the effects before and after optimization. You'll notice significant improvement within a week.
In the next chapter, we'll move into practical application, learning the three foundational techniques: role setting, Few-shot, and format control—these three techniques alone can at least triple your AI efficiency.
(Figures: the three LLM training stages; LLM capability boundaries; the relationship between Temperature and output quality.)