AI Fundamentals and Models: From Machine Learning to Large Language Models
Systematically understand the differences between AI/ML/DL, training vs. inference, and open-source vs. closed-source models.
Key Points in This Chapter
Differentiate the meanings of the four levels: AI, AGI, ASI, ANI
Understand the relationship between Machine Learning, Deep Learning, and Neural Networks
Master the difference between Training and Inference
Understand the practical significance of model parameter counts (7B/70B/405B)
Differentiate the advantages and disadvantages of open-source vs. closed-source models
AI is reshaping every industry, but many people still have a vague understanding of its foundational concepts. This chapter will use the most accessible language to help you build a complete knowledge framework from AI to ML to DL, and understand core processes like training, inference, and fine-tuning.
AI, ML, DL: A Three-Layer Nested Relationship
These three terms are often used interchangeably, but they have a clear hierarchical relationship:
**Artificial Intelligence (AI)**: The broadest concept, referring to all technologies that enable machines to exhibit "intelligent behavior." Everything from the simplest rule engines (if-else logic) to the most advanced large language models falls under the umbrella of AI.
**Machine Learning (ML)**: A subset of AI. The core idea is to let machines automatically learn patterns from data, rather than manually writing rules. You give it 10,000 pictures of cats, and it learns to recognize cats on its own—without you telling it "cats have whiskers and pointy ears."
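To make "learning patterns from data" concrete, here is a deliberately tiny sketch: instead of hand-coding a spam rule such as "spam if the message is longer than N", the program picks the message-length threshold that best fits labeled examples. The data and the length-based spam task are invented purely for illustration.

```python
# Minimal "learn from data" sketch: the rule (a length threshold) is not
# written by hand -- it is chosen to best fit the labeled training examples.

def learn_threshold(examples):
    """Pick the length threshold that best separates the labeled examples."""
    lengths = sorted(length for length, _ in examples)
    best_threshold, best_correct = 0, -1
    for threshold in lengths:
        correct = sum(
            (length > threshold) == is_spam for length, is_spam in examples
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

# Labeled training data: (message length, is_spam) -- made-up examples.
training_data = [(5, False), (8, False), (12, False),
                 (40, True), (55, True), (60, True)]

threshold = learn_threshold(training_data)
print(threshold)       # → 12, a boundary learned from data, not hand-coded
print(70 > threshold)  # → True: classifying a new message of length 70
```

Real machine learning replaces this brute-force threshold search with far more powerful function-fitting, but the core idea is the same: the rules come from the data.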
**Deep Learning (DL)**: A subset of ML. It uses multi-layer neural networks to handle complex patterns. The underlying technology for the hottest current AI products like ChatGPT, Midjourney, and Sora is deep learning.
Practical Tip
Memory aid: AI is the largest circle, ML is the middle circle, DL is the smallest circle—they are inclusive relationships, not parallel ones. Pay attention to distinguishing them in interviews or daily communication.
Training vs. Inference
This is the most crucial pair of concepts for understanding how AI systems operate:
**Training**: The process of "teaching" a model to learn knowledge and capabilities using vast amounts of data. Training a model like GPT-4 requires tens of thousands of GPUs, months of time, and an investment in the tens to hundreds of millions of dollars. Training is the "learning" phase, and its output is a model file (i.e., the model weights).
**Inference**: The process of using a trained model to process new inputs and generate outputs. When you ask ChatGPT a question, it is performing inference. The cost of inference is far lower than training, but it requires real-time response and has latency requirements.
Analogy: Training is like a student spending 4 years in university to acquire knowledge; inference is like using that learned knowledge to work after graduation. University is expensive (high training cost), but the "brainpower cost" during work is relatively low (low inference cost).
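The training/inference split can be sketched in a few lines of Python. A toy one-parameter model is "trained" by gradient descent on example pairs, and "inference" is simply applying the learned weight to a new input. The model, data, and learning rate here are all made up for illustration; real training does the same thing with billions of parameters instead of one.

```python
# Toy illustration of training vs. inference: fit y = w * x from examples
# (training), then apply the learned weight to unseen input (inference).

# Training data: pairs (x, y) generated by the hidden rule y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

# --- Training: repeatedly adjust the parameter to reduce prediction error ---
w = 0.0                      # the model's single parameter ("weight")
learning_rate = 0.05
for _ in range(200):
    for x, y in data:
        error = w * x - y
        w -= learning_rate * error * x   # gradient step for squared error

# Training's output is the model weights -- here, one number close to 3.
print(round(w, 2))           # → 3.0

# --- Inference: use the trained weight on an input it has never seen ---
x_new = 10.0
print(round(w * x_new, 1))   # → 30.0
```

The expensive loop runs once, up front; after that, every inference is a single cheap multiplication, which mirrors why serving a trained model costs far less than building it.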
Fine-tuning
Fine-tuning involves performing additional training on a pre-trained base model using domain-specific data to make the model perform better in a particular professional direction.
**Why is fine-tuning needed?** Base models (like GPT-4, Claude) are generalists—they know a bit of everything but aren't specialized. If you need an AI specialized in medical Q&A, you can fine-tune a base model using medical literature and Q&A data to significantly improve its accuracy in the medical field.
**Cost of fine-tuning**: Much lower than training from scratch. Pre-training a base model can cost tens to hundreds of millions of dollars; fine-tuning might only cost thousands to tens of thousands of dollars.
**Common fine-tuning methods**: Full Fine-tuning (adjusting all parameters), LoRA (Low-Rank Adaptation, adjusting only a small number of parameters, more efficient), QLoRA (quantized version of LoRA, further reducing hardware requirements).
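The intuition behind LoRA's efficiency can be shown with a quick parameter count: instead of updating a full d×d weight matrix, LoRA trains two small low-rank factors B (d×r) and A (r×d) and adds their product to the frozen weights. The hidden size `d = 4096` and rank `r = 8` below are illustrative values, not a prescription.

```python
# Back-of-the-envelope view of why LoRA is cheap: count trainable parameters
# for one weight matrix under full fine-tuning vs. a rank-r LoRA update.

d = 4096    # hidden size of one weight matrix (illustrative)
r = 8       # LoRA rank -- small by design

full_params = d * d            # parameters updated by full fine-tuning
lora_params = d * r + r * d    # parameters trained by LoRA (B and A factors)

print(full_params)                                # → 16777216
print(lora_params)                                # → 65536
print(round(lora_params / full_params * 100, 2))  # → 0.39 (percent)
```

Training well under 1% of the parameters per matrix is what lets LoRA run on modest hardware; QLoRA then quantizes the frozen base weights to shrink memory further.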
Important Reminder
Distinguish three levels: training is constructing a building from scratch (tens to hundreds of millions of dollars), fine-tuning is doing the interior decoration (thousands to tens of thousands of dollars), and prompt engineering is telling the occupants how to use the rooms (almost zero cost). Most individuals and businesses need the latter two, not training from scratch.
Open-source Models vs. Closed-source Models
**Closed-source models**: Model weights are not public; they can only be used via API or official products. Examples: GPT-4 (OpenAI), Claude (Anthropic), Gemini (Google). Advantages: top-tier performance, easy to use; Disadvantages: limited data privacy, cannot be deployed or customized independently.
**Open-source models**: Model weights are public; anyone can download, deploy, and modify them. Examples: LLaMA 3 (Meta), Qwen (Alibaba), DeepSeek, Mistral. Advantages: can be deployed locally to protect data privacy, can be freely fine-tuned; Disadvantages: requires technical expertise for deployment and maintenance.
**Choosing between open-source and closed-source**: Personal learning and lightweight use → closed-source APIs are most convenient; Enterprise core business involving sensitive data → local deployment of open-source models is more secure; Need for deep customization → fine-tuning open-source models.
Model Parameter Count: What do 7B, 70B, 405B mean?
Numbers like "7B" and "70B" in model names represent the model's parameter count (B = Billion). Parameter count can be roughly understood as the model's "brain capacity"—more parameters mean a higher potential capability ceiling for the model, but also higher hardware requirements.
**Common parameter scales**: 7B (entry-level, can run on a single consumer-grade GPU), 13B-34B (medium, requires professional GPUs), 70B (high-end, requires multiple A100/H100 GPUs), 405B+ (top-tier, requires GPU clusters).
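As a rough rule of thumb, just loading a model's weights takes parameter count × bytes per parameter of memory. The sketch below estimates this for the common sizes above; it covers weights only, since activations, KV cache, and framework overhead add more in practice.

```python
# Rough GPU-memory estimate for model weights at different precisions.
# Weights only -- real deployments need extra headroom beyond this.

def weight_memory_gb(params_billions, bytes_per_param):
    """Memory (GiB) needed to hold the weights alone."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

for size in (7, 70, 405):
    fp16 = weight_memory_gb(size, 2)    # 16-bit floats: 2 bytes/parameter
    int4 = weight_memory_gb(size, 0.5)  # 4-bit quantization: 0.5 bytes each
    print(f"{size}B model: ~{fp16:.0f} GB in fp16, ~{int4:.0f} GB in 4-bit")
```

This arithmetic explains the tiers above: a 7B model at fp16 (~13 GB) fits on one consumer GPU with 16-24 GB of VRAM, while 70B (~130 GB) already needs multiple data-center GPUs unless aggressively quantized.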
Note
Parameter count does not equal model quality. A carefully fine-tuned 7B model may outperform a poorly trained 70B model on specific tasks. When choosing a model, comprehensively consider task requirements, hardware conditions, and usage costs.
Chapter Terminology Quick Reference
**AI (Artificial Intelligence)**: The umbrella term for technologies that enable machines to exhibit intelligent behavior.
**ML (Machine Learning)**: A subset of AI where machines learn patterns automatically from data.
**DL (Deep Learning)**: A subset of ML that uses multi-layer neural networks.
**Training**: The process of teaching a model using data.
**Inference**: Using a trained model to process new inputs.
**Fine-tuning**: Performing additional training on a base model using specialized data.
**LoRA**: An efficient fine-tuning method that trains only a small number of low-rank parameters.
**Parameter Count**: A metric for measuring model scale (B = billion).
(Diagram: AI/ML/DL Hierarchy)
(Diagram: Model Lifecycle)
Chapter Quiz
1. What is the relationship between AI, ML, and DL?
After mastering these foundational concepts, the next chapter will delve into the core mechanisms of large language models—Token, Embedding, and the Transformer architecture.