
Prompt Engineering
for Generative AI

James Phoenix & Mike Taylor — O’Reilly Media, 2024

70,000+ Students · Udemy Bootcamp

Every person who types a message into ChatGPT, Midjourney, or any AI system is doing prompt engineering — most just don’t know it. The quality of what you get out of an AI is almost entirely determined by the quality of what you put in. This O’Reilly book is the most complete technical guide to that input — written for people who want AI to do serious work, not just party tricks.

Before We Start — What Is a Prompt, Actually?

How AI Models Actually Read Your Words

When you type a message to an AI, you might imagine it reading your words the way a person reads a letter. In reality, something far stranger is happening. The AI does not read words; it reads tokens. A token is roughly three-quarters of a word: "Extraordinary" might be three tokens ("Extra", "ordin", "ary"). Every token in your prompt shifts the mathematical probability of every token in the response.

The model has been trained on essentially the entire text of the internet. It has seen the best and worst writing humans have ever produced, and it can emulate almost any of it — if you know the right way to ask. A weak prompt sends the model toward the average of everything it has ever read. A strong prompt sends it toward the specific, high-quality corner of its knowledge you actually need.

The Temperature Parameter — Randomness as a Dial

Every time the AI generates the next token, it computes a probability for every token in its vocabulary. A dial called temperature controls how much randomness is added to that choice. Temperature 0: always pick the most probable token. Temperature 1: add significant randomness, making the output more creative but less predictable. Temperature 2: so much randomness that the output starts to fall apart. For factual or structured tasks (JSON generation, code), use a low temperature. For creative writing or brainstorming, use a higher temperature. Most people never touch this dial, and it costs them quality on every task.
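The mechanics of that dial can be sketched in a few lines of Python. This is an illustrative toy, not any provider's actual sampler: real models sample over vocabularies of tens of thousands of tokens, and in practice you set temperature as an API request parameter.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Pick the next token from {token: logit}, scaled by temperature."""
    # Temperature 0 means greedy decoding: always the highest-logit token.
    if temperature == 0:
        return max(logits, key=logits.get)
    # Divide logits by temperature, then softmax into probabilities.
    # Higher temperature flattens the distribution (more randomness).
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return rng.choices(list(probs), weights=list(probs.values()))[0]

# Made-up logits for three candidate next tokens.
logits = {"the": 4.0, "a": 3.5, "banana": 0.5}
print(sample_with_temperature(logits, 0))  # greedy -> "the"
```

At temperature 0 the same prompt always yields the same continuation; as temperature rises, low-probability tokens like "banana" start to win occasionally.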

The Core Framework

The Five Principles of Prompting

Phoenix and Taylor spent years working with AI models before these principles emerged as stable patterns. They are not tips or hacks. They are conventions for briefing any intelligence, biological or artificial, and they hold across models, whether GPT-4 today or GPT-6 in three years.

1
Give Direction — Brief the AI Like You’d Brief a Human Expert
The most common failure in prompting is asking the AI to do something without telling it who it should be while doing it. “Write a product name” produces an average result. “Write a product name in the style of Steve Jobs — single word, no hyphens, evokes simplicity and personal technology” produces something completely different. Direction can be a persona (act as a senior software architect), a style (minimalist, technical, accessible), or a constraint (under 20 words, no jargon, written for a 12-year-old). A human copywriter at a branding agency needs a brief before they can do great work. So does the AI.
2
Specify Format — Tell It Exactly What Shape the Output Should Take
AI models are universal format translators. They can output plain text, JSON, YAML, Python code, CSV, Markdown, HTML, a numbered list, a table, a poem — anything. But if you do not specify the format, you get whatever the model guesses from context, which will be inconsistent. For production applications, this inconsistency causes real errors. One run returns a numbered list; the next returns a paragraph; the next adds an introduction paragraph before the data starts. If you are parsing AI output programmatically, you need a format guarantee. JSON and YAML are the authors’ top recommendations for structured outputs because they are machine-readable, human-readable, and parseable in any programming language.
3
Provide Examples — Show, Don’t Just Tell
This is the single most reliable way to improve output quality. Giving the AI 2-5 examples of the task done well — called few-shot prompting — dramatically narrows the probability space toward the quality you want. Zero-shot prompting (no examples) works for simple tasks. Few-shot prompting is essential for complex, nuanced, or stylistically specific tasks. The examples act as anchors. The model reverse-engineers the quality pattern from them and applies it to your new input. Crucially, you should update examples over time based on what results you like most — this makes your prompt system smarter month by month without ever retraining the model.
4
Evaluate Quality — Test, Score, and Iterate
A prompt is not finished when you write it. It is finished when you have tested it across enough varied inputs to trust its output. The book introduces systematic evaluation: build a set of test cases representing the range of real inputs, run the prompt across all of them, score the outputs against your criteria, then change one variable and re-run. This is the scientific method applied to prompting. For production systems, the book covers using a second LLM as the evaluator — asking GPT-4 to score GPT-4’s outputs, a technique called LLM-as-judge. This is how you scale quality assurance beyond what any human reviewer can manually check.
5
Divide Labor — Chain Prompts for Complex Tasks
The most common mistake for complex tasks is trying to do everything in one prompt. A single prompt that asks the AI to research, analyse, draft, format, and summarise is asking too much — quality collapses at every stage. Prompt chaining is the solution: break the task into sequential steps where the output of prompt N becomes the input of prompt N+1. Example: Prompt 1 extracts key facts from a document. Prompt 2 analyses those facts for contradictions. Prompt 3 writes a summary based on the analysis. Prompt 4 formats the summary as HTML. Each step is simpler, the model does it better, and the final result is dramatically higher quality than any single mega-prompt could produce.
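The evaluate-quality loop of Principle 4 can be sketched as a tiny harness. Everything named here is hypothetical scaffolding: `fake_model` stands in for a real model call, and the scoring rule (output under 20 words) is just one example criterion; in practice the scorer could itself be an LLM-as-judge call.

```python
def evaluate(run_prompt, test_cases, score):
    """Run a prompt function over varied inputs and score each output."""
    results = []
    for case in test_cases:
        output = run_prompt(case)
        results.append((case, output, score(output)))
    # The average score tells you whether a prompt change helped or hurt.
    avg = sum(s for _, _, s in results) / len(results)
    return avg, results

# Hypothetical stand-ins: a fake model and a word-count criterion.
def fake_model(case):
    return f"A short tagline about {case}."

def under_20_words(text):
    return 1.0 if len(text.split()) < 20 else 0.0

avg, results = evaluate(fake_model, ["shoes", "watches"], under_20_words)
print(avg)  # 1.0 for this stub
```

Change one variable in the prompt, re-run the harness, and compare averages: that is the scientific method applied to prompting.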
Before vs. After

The Same Task — Naive vs. Engineered

Here is the authors’ central demonstration from Chapter 1. The exact same task: generate product names for a shoe that fits any foot size. See how applying the principles transforms the output from generic to precise, parseable, and reusable.

✗   Naive Prompt (Zero Engineering)
Can I have a list of product names for a pair of shoes that can fit any foot size?

Problems: No style direction (gets “average internet” quality). Unspecified format (returns different structure each run). No examples (model guesses what “good” looks like). Cannot be parsed by software. Cannot be improved systematically.

✓   Engineered Prompt (All 5 Principles)
Brainstorm product names for a shoe that fits any foot size, in the style of Steve Jobs.

Return as:
Product names: [comma-separated list of 3]

## Examples
Product description: A beer-dispensing fridge
Product names: iBarFridge, iFridgeBeer, iDrinkBeerFridge

Product description: A watch accurate in space
Product names: iNaut, iSpace, iTime

Includes: Direction (Steve Jobs style), Format (comma-separated, fixed structure), Examples (worked reference cases). Result: iFitFoot, iPerfectFit, iShoeSize. Consistent, parseable, on-brand, updateable, and reusable in production.
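Because the engineered prompt pins the output to a fixed "Product names: …" line, software can parse it reliably. A minimal sketch; the response string below is simply the example result from above, not a live model call:

```python
def parse_product_names(response: str) -> list[str]:
    """Pull the comma-separated names out of a fixed-format response."""
    for line in response.splitlines():
        if line.startswith("Product names:"):
            names = line.removeprefix("Product names:")
            return [n.strip() for n in names.split(",")]
    return []  # format guarantee violated; the caller can retry

response = "Product names: iFitFoot, iPerfectFit, iShoeSize"
print(parse_product_names(response))  # ['iFitFoot', 'iPerfectFit', 'iShoeSize']
```

The empty-list fallback is the payoff of specifying format: a violation is detectable, so a production system can re-prompt instead of silently shipping garbage.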

“Average prompts will return average responses. These models have seen the best and worst of what humans have produced and are capable of emulating almost anything — if you know the right way to ask.”

— James Phoenix & Mike Taylor, Prompt Engineering for Generative AI
Advanced Technique 1

Chain-of-Thought & Inner Monologue

One of the most significant discoveries in prompt engineering is that AI models produce dramatically more accurate answers when you ask them to think out loud before answering. This is called chain-of-thought prompting. Instead of asking “What is 17% of 348?” — which the model may answer incorrectly by jumping to a conclusion — you add: “Think through this step by step before giving your final answer.”

The Inner Monologue Tactic — Hide the Reasoning

A refinement of chain-of-thought: you ask the model to put its working-out inside triple quotes """like this""", instructing that this section should be hidden from the user. The model reasons through the problem at length, then delivers only the clean final answer. This is how you get an AI tutor that will not just give students the answer — it reasons through the solution in its hidden monologue, then gives only a hint to guide the student toward the answer themselves. The reasoning improves the answer without cluttering the user-facing output.
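Stripping the hidden monologue before showing the user is a one-regex job. A sketch, assuming the model followed the instruction to wrap its reasoning in triple quotes:

```python
import re

def strip_monologue(raw: str) -> str:
    """Remove triple-quoted reasoning sections, keep the final answer."""
    visible = re.sub(r'"""[\s\S]*?"""', "", raw)
    return visible.strip()

raw = '"""17% of 348: 0.17 * 348 = 59.16""" The answer is 59.16.'
print(strip_monologue(raw))  # The answer is 59.16.
```

The model still gets the accuracy benefit of reasoning at length; the user only ever sees the clean final line.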

Pre-Warming — Ask the AI to Give Itself Instructions

A subtle but powerful technique: before giving the AI its main task, ask it to first list the best practices for that task. Then ask it to complete the task using the advice it just gave itself. This is called pre-warming or internal retrieval. The model’s own summary of expert advice becomes context that improves its subsequent answer. It costs you an extra prompt, but the quality improvement is substantial because the model is now completing the task in the context of best practices it articulated from its own training data.
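Pre-warming is just two calls, where the first answer is spliced into the second prompt. `call_llm` below is a hypothetical stand-in for whatever client you use; the stub makes the flow runnable offline:

```python
def prewarm_and_run(call_llm, task: str) -> str:
    """Ask for best practices first, then do the task in that context."""
    advice = call_llm(f"List the best practices for this task: {task}")
    return call_llm(
        f"Best practices:\n{advice}\n\n"
        f"Using the advice above, complete this task: {task}"
    )

# Deterministic stub standing in for a real model call.
def stub_llm(prompt: str) -> str:
    return f"[response to: {prompt[:30]}...]"

print(prewarm_and_run(stub_llm, "write a product description"))
```

The extra call costs tokens, but the second prompt now carries the model's own articulation of expert practice as context.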

Advanced Technique 2

Prompt Chaining — How Complex Work Actually Gets Done

Imagine a film production company wants to use AI to help create films. The temptation is to write one giant prompt: “Create a complete film with characters, plot, scenes, and screenplay.” The result will be superficial at every level. The professional approach is to chain prompts:

Prompt 1: Character creation (names, traits, backstory)
Prompt 2: Plot generation using the characters from Prompt 1
Prompt 3: Scene and world building from the plot in Prompt 2
Prompt 4: Dialogue for key scenes from Prompt 3

Each prompt does one thing well. The output of each stage becomes the context for the next. The final output is dramatically richer because each component was given the AI’s full attention. The authors use LangChain in the book to orchestrate these chains programmatically, allowing you to build AI pipelines that run automatically at scale.
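The book orchestrates these chains with LangChain; stripped of framework, a chain is just function composition where each stage's output becomes the next prompt's context. A minimal sketch with a hypothetical `stub` model call (the prompt templates are paraphrased from the film example above):

```python
from functools import reduce

def chain(call_llm, stages, initial: str) -> str:
    """Run prompts sequentially: output of stage N is context for N+1."""
    return reduce(
        lambda context, template: call_llm(template.format(context=context)),
        stages,
        initial,
    )

stages = [
    "Create characters for this premise: {context}",
    "Generate a plot using these characters: {context}",
    "Build scenes and world from this plot: {context}",
    "Write dialogue for the key scenes: {context}",
]

# Deterministic stub standing in for a real model call.
stub = lambda prompt: f"<{prompt}>"
result = chain(stub, stages, "a heist on Mars")
print(result)
```

Swap the stub for a real client and each stage gets the model's full attention on one narrow job, which is the whole point of dividing labor.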

Advanced Technique 3

RAG — Giving the AI Your Private Knowledge

Retrieval-Augmented Generation (RAG) solves one of AI’s most frustrating limitations: it does not know anything that happened after its training cutoff, and it does not know anything about your specific organisation, products, or documents. RAG fixes this by retrieving relevant information from your own data and injecting it into the prompt context at the moment of answering.

The process: (1) Take your documents — PDFs, databases, web pages, internal wikis. (2) Break them into chunks of text. (3) Convert each chunk to a numerical representation called an embedding using an embedding model. (4) Store all embeddings in a vector database (like Pinecone). (5) When a user asks a question, convert that question to an embedding, find the most similar chunks in the database, inject them into the prompt as context. (6) The AI answers based on your private data — accurately and with citations.
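The retrieval step (5) reduces to a nearest-neighbour search over embeddings. A toy sketch using cosine similarity; the three-dimensional vectors here are invented for illustration, since a real embedding model (and a vector database like Pinecone) works with hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Pretend chunk embeddings (hypothetical 3-D vectors for illustration).
chunks = {
    "Our refund window is 30 days.":      (0.9, 0.1, 0.0),
    "The office cafeteria opens at 8am.": (0.0, 0.2, 0.9),
}

# Pretend embedding of the question "What is the refund policy?"
query_embedding = (0.8, 0.2, 0.1)
best = max(chunks, key=lambda c: cosine(query_embedding, chunks[c]))
print(best)  # the refund chunk gets injected into the prompt as context
```

The chunk closest to the question in embedding space is the one injected into the prompt, which is why the quality of the chunks themselves matters so much.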

Why Chunking Strategy Matters More Than You Think

Chunking is splitting your documents into pieces before converting to embeddings. Bad chunking — for instance, fixed-size chunks that split a sentence in the middle — destroys the semantic coherence of each piece. The vector database then matches your question to a fragment with no clear meaning and injects irrelevant context. Sliding window chunking (where each chunk overlaps with the previous one by 20-30%) significantly improves match quality because key sentences are never orphaned at the boundary of a chunk. The book covers four strategies: fixed-size, sentence detection via spaCy NLP, sliding window, and semantic chunking.
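A sliding-window chunker is only a few lines. This sketch slides over words for simplicity (real implementations often slide over tokens or sentences), with 25% overlap as in the 20-30% range mentioned above:

```python
def sliding_window_chunks(text: str, size: int = 100, overlap: float = 0.25):
    """Split text into word chunks where each overlaps the previous one."""
    words = text.split()
    step = max(1, int(size * (1 - overlap)))  # advance 75% of a chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window reached the end of the document
    return chunks

chunks = sliding_window_chunks("one two three four five six seven eight", size=4)
# With size=4 and 25% overlap, each chunk shares its last word
# with the next chunk's first word, so no sentence fragment is orphaned
# at a boundary.
```

Compare this with naive fixed-size chunking, which is the same code with `overlap=0.0`: boundary sentences then belong to exactly one chunk and can lose their context entirely.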

Key Technical Concepts

The Vocabulary Every AI Engineer Needs

📋
Few-Shot Prompting

Providing 2-10 examples of the task done correctly inside the prompt. The model pattern-matches from examples rather than guessing from training data alone. Most reliable quality booster available for complex tasks.

🎯
Role Prompting

“You are a senior security engineer with 15 years of experience reviewing Python code.” Persona assignment dramatically changes the depth, vocabulary, and focus of responses. Works because the AI has absorbed vast amounts of domain-specific writing and can adopt the perspective accurately.

📈
Token Budget

You pay per token — both for what you send (prompt) and what you receive (completion). A well-engineered prompt achieves higher quality at fewer tokens. Prompt engineering is therefore both a quality and a cost-reduction discipline.

📚
Vector Embeddings

Text converted to numerical coordinates in high-dimensional space. Words with similar meaning end up close together. This is how RAG systems find relevant chunks: the question’s embedding is compared to all document chunk embeddings by distance.

🛠
LangChain

An open-source Python framework for building multi-step AI workflows. Handles prompt templates, chaining, output parsing, vector database integration, and agent tool use. The book uses it throughout to demonstrate production-grade implementations.

🧠
Meta Prompting

Using one prompt to generate or improve another prompt. “Based on this task description, write me the best possible prompt for achieving this result.” The AI becomes a co-designer of the system it will operate within.


Prompt Engineering for Generative AI

James Phoenix & Mike Taylor · O’Reilly Media

Published May 2024 by O’Reilly Media. 423 pages. Based on a Udemy bootcamp with 70,000+ students. Covers text and image prompting, LangChain, RAG, vector databases, agents, and production AI system design. Written by practitioners who consult on real AI deployments across industries.

Prompt Engineering · LLMs · LangChain · RAG · GPT-4 · Production AI · O’Reilly

The Gap Between Good and Great AI Output Is Almost Entirely in How You Ask.

The models are already extraordinary. The bottleneck is not the AI — it is the quality of the brief you give it. Prompt engineering is the skill that converts an impressive demo into a reliable production system. It is the most transferable technical skill of the current decade.


Yacine

Educator · Electronics Engineer · AI Curious

BTS instructor in Tangier teaching embedded systems and IoT. Writing about books at the frontier of AI, engineering, and human communication at yacine.love.