Prompt Engineering
for Generative AI
James Phoenix & Mike Taylor — O’Reilly Media, 2024
Every person who types a message into ChatGPT, Midjourney, or any AI system is doing prompt engineering — most just don’t know it. The quality of what you get out of an AI is almost entirely determined by the quality of what you put in. This O’Reilly book is the most complete technical guide to that input — written for people who want AI to do serious work, not just party tricks.
How AI Models Actually Read Your Words
When you type a message to an AI, you imagine it is reading your words like a person reads a letter. In reality, something far stranger is happening. The AI does not read words — it reads tokens. A token is roughly three-quarters of a word. “Extraordinary” might be three tokens: “Extra”, “ordin”, “ary”. Every token in your prompt literally changes the mathematical probability of every single token in the response.
The model has been trained on essentially the entire text of the internet. It has seen the best and worst writing humans have ever produced, and it can emulate almost any of it — if you know the right way to ask. A weak prompt sends the model toward the average of everything it has ever read. A strong prompt sends it toward the specific, high-quality corner of its knowledge you actually need.
The Temperature Parameter — Randomness as a Dial
Every time the AI generates the next token, it calculates the probability of every possible word. Then there is a dial called temperature that controls how much randomness is added to that choice. Temperature 0 = always pick the most probable word. Temperature 1 = add significant randomness, making the output more creative but less predictable. Temperature 2 = so much randomness the output starts to fall apart. For factual or structured tasks (JSON generation, code), use low temperature. For creative writing or brainstorming, use higher temperature. Most people never touch this dial — and it costs them quality on every task.
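The dial can be sketched directly. Below is a minimal softmax-with-temperature function over toy logits (made-up scores for three candidate tokens, not a real model's vocabulary), showing how temperature 0 collapses to the single most probable token while higher values flatten the distribution:

```python
import math

def sample_distribution(logits, temperature):
    """Convert raw model scores (logits) into a probability
    distribution, scaled by temperature. Lower temperature
    sharpens the distribution; higher temperature flattens it."""
    if temperature == 0:
        # Greedy decoding: all probability mass on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
print(sample_distribution(logits, 0))    # greedy: deterministic
print(sample_distribution(logits, 1.0))  # moderate randomness
print(sample_distribution(logits, 2.0))  # flatter: more random
```

At temperature 2 the top token's probability drops noticeably, which is exactly why output "starts to fall apart": unlikely tokens get sampled often enough to derail the text.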
The Five Principles of Prompting
Phoenix and Taylor spent years working with AI models before these five principles converged as patterns: Give Direction, Specify Format, Provide Examples, Evaluate Quality, and Divide Labor. They are not tips or hacks. They are conventions that apply to any intelligence, biological or artificial, and to any model — whether GPT-4 today or GPT-6 in three years.
The Same Task — Naive vs. Engineered
Here is the authors’ central demonstration from Chapter 1. The task is identical in both versions: generate product names for a shoe that fits any foot size. Applying the principles transforms the output from generic to precise, parseable, and reusable.
The naive prompt’s problems: no style direction (gets “average internet” quality); unspecified format (returns a different structure each run); no examples (the model guesses what “good” looks like); cannot be parsed by software; cannot be improved systematically.
The engineered prompt includes: Direction (Steve Jobs style), Format (comma-separated, fixed structure), and Examples (3 reference cases). Result: iFitFoot, iPerfectFit, iShoeSize — consistent, parseable, on-brand. Updateable. Reusable in production.
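A minimal sketch of how such an engineered prompt could be assembled in code. The wording below is illustrative, not the book’s verbatim prompt, and the reference names are reused from the result shown above purely as stand-ins:

```python
# Assemble the engineered shoe-naming prompt from three of the five
# principles this example applies: direction, format, and examples.
direction = "You are Steve Jobs naming a new Apple product."
task = "Suggest product names for a shoe that fits any foot size."
format_spec = "Return only a comma-separated list of names, nothing else."
examples = ["iFitFoot", "iPerfectFit", "iShoeSize"]  # illustrative references

prompt = (
    f"{direction}\n"
    f"{task}\n"
    f"{format_spec}\n"
    f"Examples of the naming style we want: {', '.join(examples)}"
)
print(prompt)
```

Because the format is pinned down, the response can be split on commas by downstream software — which is what makes the prompt reusable in production rather than a one-off.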
“Average prompts will return average responses. These models have seen the best and worst of what humans have produced and are capable of emulating almost anything — if you know the right way to ask.”
— James Phoenix & Mike Taylor, Prompt Engineering for Generative AI
Chain-of-Thought & Inner Monologue
One of the most significant discoveries in prompt engineering is that AI models produce dramatically more accurate answers when you ask them to think out loud before answering. This is called chain-of-thought prompting. Instead of asking “What is 17% of 348?” — which the model may answer incorrectly by jumping to a conclusion — you add: “Think through this step by step before giving your final answer.”
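In code, the two versions of the question differ by a single appended instruction. A small sketch (the phrasing is one common variant, not a fixed formula):

```python
# Naive prompt vs. chain-of-thought prompt for the same question.
question = "What is 17% of 348?"
naive_prompt = question
cot_prompt = question + " Think through this step by step before giving your final answer."

# For reference, the value the model should reach: 17% of 348 is 59.16.
print(naive_prompt)
print(cot_prompt)
```

The extra sentence costs a handful of tokens; the accuracy gain on multi-step reasoning tasks is typically far larger than that cost.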
The Inner Monologue Tactic — Hide the Reasoning
A refinement of chain-of-thought: you ask the model to put its working-out inside triple quotes """like this""", instructing that this section should be hidden from the user. The model reasons through the problem at length, then delivers only the clean final answer. This is how you get an AI tutor that will not just give students the answer — it reasons through the solution in its hidden monologue, then gives only a hint to guide the student toward the answer themselves. The reasoning improves the answer without cluttering the user-facing output.
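Hiding the monologue is then a post-processing step on the model’s raw output. A minimal sketch, assuming the model was instructed to wrap its working-out in triple quotes as described above (the sample output string is invented for illustration):

```python
import re

def strip_inner_monologue(raw_output: str) -> str:
    """Remove any triple-quoted reasoning sections so the user
    only sees the final answer."""
    visible = re.sub(r'""".*?"""', "", raw_output, flags=re.DOTALL)
    return visible.strip()

# Hypothetical raw model output: hidden reasoning, then the answer.
raw = '"""17% of 348: 10% is 34.8, 7% is 24.36, total 59.16.""" The answer is 59.16.'
print(strip_inner_monologue(raw))  # -> The answer is 59.16.
```

The model still gets the accuracy benefit of reasoning out loud; the user only ever sees the clean final line.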
Pre-Warming — Ask the AI to Give Itself Instructions
A subtle but powerful technique: before giving the AI its main task, ask it to first list the best practices for that task. Then ask it to complete the task using the advice it just gave itself. This is called pre-warming or internal retrieval. The model’s own summary of expert advice becomes context that improves its subsequent answer. It costs you an extra prompt, but the quality improvement is substantial because the model is now completing the task in the context of best practices it articulated from its own training data.
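The pattern is two calls over one growing message list. A minimal sketch, where `ask()` is a hypothetical stub standing in for a real chat-completion call (not a library API), and the sales-email task is an invented example:

```python
def ask(messages):
    # Stub standing in for an LLM call; a real implementation would
    # send `messages` to a chat API and return the reply text.
    return "stub reply"

# Step 1: pre-warm — the model lists best practices for the task.
messages = [
    {"role": "user",
     "content": "List the best practices for writing a cold sales email."}
]
best_practices = ask(messages)

# Step 2: the model completes the task with its own advice in context.
messages += [
    {"role": "assistant", "content": best_practices},
    {"role": "user",
     "content": "Now write the email, following the best practices above."}
]
email = ask(messages)
```

The key detail is that the model’s first answer is appended as an `assistant` message, so the second completion is conditioned on advice the model itself articulated.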
Prompt Chaining — How Complex Work Actually Gets Done
Imagine a film production company wants to use AI to help create films. The temptation is to write one giant prompt: “Create a complete film with characters, plot, scenes, and screenplay.” The result will be superficial at every level. The professional approach is to chain prompts: first generate the characters, then a plot built around those characters, then a scene-by-scene breakdown, then the screenplay for each scene.
Each prompt does one thing well. The output of each stage becomes the context for the next. The final output is dramatically richer because each component was given the AI’s full attention. The authors use LangChain in the book to orchestrate these chains programmatically, allowing you to build AI pipelines that run automatically at scale.
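The chain above can be sketched in plain Python, with `llm()` as a stub standing in for any chat-completion call (the book does this orchestration with LangChain; this is only the bare flow):

```python
def llm(prompt):
    # Stub standing in for a real model call.
    return f"<output for: {prompt[:30]}...>"

# Each stage's output becomes part of the next stage's prompt.
characters = llm("Invent three main characters for a heist film.")
plot = llm(f"Given these characters:\n{characters}\nWrite a three-act plot.")
scenes = llm(f"Given this plot:\n{plot}\nBreak it into a scene list.")
screenplay = llm(f"Given these scenes:\n{scenes}\nWrite the screenplay for scene 1.")
print(screenplay)
```

Because every call has one narrow job and the full output of the previous stage as context, each component gets the model’s full attention — the core of the Divide Labor idea.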
RAG — Giving the AI Your Private Knowledge
Retrieval-Augmented Generation (RAG) solves one of AI’s most frustrating limitations: it does not know anything that happened after its training cutoff, and it does not know anything about your specific organisation, products, or documents. RAG fixes this by retrieving relevant information from your own data and injecting it into the prompt context at the moment of answering.
The process: (1) Take your documents — PDFs, databases, web pages, internal wikis. (2) Break them into chunks of text. (3) Convert each chunk to a numerical representation called an embedding using an embedding model. (4) Store all embeddings in a vector database (like Pinecone). (5) When a user asks a question, convert that question to an embedding, find the most similar chunks in the database, inject them into the prompt as context. (6) The AI answers based on your private data — accurately and with citations.
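The six steps above can be sketched end to end in a toy form. Here the bag-of-words `embed()` is a stand-in for a real embedding model, the Python list is a stand-in for a vector database like Pinecone, and the two document chunks are invented — only the overall flow matches the real process:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": word counts. Real systems use a neural model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-4: chunk the documents and index their embeddings.
chunks = [
    "Our refund policy allows returns within 30 days.",
    "The warehouse ships orders every weekday morning.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # the "vector database"

# Steps 5-6: embed the question, retrieve the closest chunk, build the prompt.
question = "How many days do I have to return an item?"
q = embed(question)
best_chunk = max(index, key=lambda item: cosine(q, item[1]))[0]
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {question}"
print(best_chunk)
```

The retrieval step is pure nearest-neighbour search; everything RAG-specific happens in how the retrieved chunk is injected into the prompt as context.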
Why Chunking Strategy Matters More Than You Think
Chunking is splitting your documents into pieces before converting to embeddings. Bad chunking — for instance, fixed-size chunks that split a sentence in the middle — destroys the semantic coherence of each piece. The vector database then matches your question to a fragment with no clear meaning and injects irrelevant context. Sliding window chunking (where each chunk overlaps with the previous one by 20-30%) significantly improves match quality because key sentences are never orphaned at the boundary of a chunk. The book covers four strategies: fixed-size, sentence detection via spaCy NLP, sliding window, and semantic chunking.
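Sliding-window chunking is short enough to show in full. A minimal sketch over a word list (real pipelines often slide over tokens rather than words, and the 100/25 sizes here are arbitrary):

```python
def sliding_window_chunks(words, chunk_size=100, overlap=25):
    """Split a word list into chunks where each chunk repeats the last
    `overlap` words of the previous one (25% here), so sentences near a
    boundary appear intact in at least one chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already reaches the end of the text
    return chunks

words = [f"w{i}" for i in range(250)]  # stand-in for a tokenised document
chunks = sliding_window_chunks(words)
print(len(chunks))  # 3 overlapping chunks
```

Compare with naive fixed-size chunking, where `step == chunk_size`: any sentence straddling a boundary is cut in half in both neighbouring chunks, which is exactly the failure mode described above.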
The Vocabulary Every AI Engineer Needs
Few-shot prompting: Providing 2-10 examples of the task done correctly inside the prompt. The model pattern-matches from examples rather than guessing from training data alone. Most reliable quality booster available for complex tasks.
Role prompting: “You are a senior security engineer with 15 years of experience reviewing Python code.” Persona assignment dramatically changes the depth, vocabulary, and focus of responses. Works because the AI has absorbed vast amounts of domain-specific writing and can adopt the perspective accurately.
Tokens and cost: You pay per token — both for what you send (prompt) and what you receive (completion). A well-engineered prompt achieves higher quality at fewer tokens. Prompt engineering is therefore both a quality and a cost-reduction discipline.
Embedding: Text converted to numerical coordinates in high-dimensional space. Words with similar meaning end up close together. This is how RAG systems find relevant chunks: the question’s embedding is compared to all document chunk embeddings by distance.
LangChain: An open-source Python framework for building multi-step AI workflows. Handles prompt templates, chaining, output parsing, vector database integration, and agent tool use. The book uses it throughout to demonstrate production-grade implementations.
Meta-prompting: Using one prompt to generate or improve another prompt. “Based on this task description, write me the best possible prompt for achieving this result.” The AI becomes a co-designer of the system it will operate within.
Prompt Engineering for Generative AI
James Phoenix & Mike Taylor · O’Reilly Media. Published May 2024 by O’Reilly Media. 423 pages. Based on a Udemy bootcamp with 70,000+ students. Covers text and image prompting, LangChain, RAG, vector databases, agents, and production AI system design. Written by practitioners who consult on real AI deployments across industries.
The Gap Between Good and Great AI Output Is Almost Entirely in How You Ask.
The models are already extraordinary. The bottleneck is not the AI — it is the quality of the brief you give it. Prompt engineering is the skill that converts an impressive demo into a reliable production system. It is the most transferable technical skill of the current decade.
Yacine
Educator · Electronics Engineer · AI Curious. BTS instructor in Tangier teaching embedded systems and IoT. Writing about books at the frontier of AI, engineering, and human communication at yacine.love.