What prompt engineering actually is
Prompt engineering is writing instructions that produce reliable, high-quality outputs from language models. It’s not magic words or secret incantations — it’s clear communication with a system that follows patterns.
The core principle: the model matches the distribution of text that looks like your prompt. A vague prompt gets a vague answer (because the internet is full of vague answers to vague questions). A specific, structured prompt gets a specific, structured answer (because high-quality writing follows patterns that models have learned).
The six patterns that work everywhere
1. Role + Task + Format
Tell the model who it is, what to do, and how to present the answer.
You are a senior backend engineer reviewing a pull request.
Review this code for security vulnerabilities, performance issues, and maintainability.
Format: bullet list, one issue per bullet, severity (high/medium/low) prefix.
Why it works: constrains the model’s output distribution to the intersection of “expert writing” + “this specific task” + “this specific format.”
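The three parts compose mechanically, so they can be assembled by a small helper. A minimal sketch (the `build_prompt` function and its section layout are my own illustration, not a standard API):

```python
def build_prompt(role: str, task: str, fmt: str) -> str:
    """Assemble a Role + Task + Format prompt as three labeled sections."""
    return f"You are {role}.\n\n{task}\n\nFormat: {fmt}"

prompt = build_prompt(
    role="a senior backend engineer reviewing a pull request",
    task=(
        "Review this code for security vulnerabilities, "
        "performance issues, and maintainability."
    ),
    fmt="bullet list, one issue per bullet, severity (high/medium/low) prefix",
)
```

Keeping the three parts as separate arguments makes it easy to swap the role or format per task without rewriting the whole prompt.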
2. Few-shot examples
Show the model 2-3 examples of the input→output mapping you want before giving it the actual input.
Convert these natural language descriptions to SQL queries.
Description: "Show me all users who signed up last month"
SQL: SELECT * FROM users WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month') AND created_at < DATE_TRUNC('month', CURRENT_DATE);
Description: "Count orders by country this year"
SQL: SELECT country, COUNT(*) as order_count FROM orders WHERE created_at >= DATE_TRUNC('year', CURRENT_DATE) GROUP BY country ORDER BY order_count DESC;
Description: "Find products with no sales in 90 days"
SQL:
Why it works: examples define the pattern more precisely than instructions ever can. The model extrapolates the pattern to new inputs.
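Few-shot prompts have a fixed shape (instruction, worked examples, then the new input with an open slot), so they are easy to generate from data. A sketch, assuming the Description/SQL labels from the example above (the `few_shot_prompt` helper is hypothetical):

```python
def few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    new_input: str,
) -> str:
    """Lay out instruction, worked examples, then the new input
    ending with an open 'SQL:' slot for the model to complete."""
    parts = [instruction, ""]
    for description, sql in examples:
        parts += [f'Description: "{description}"', f"SQL: {sql}", ""]
    parts += [f'Description: "{new_input}"', "SQL:"]
    return "\n".join(parts)
```

Ending the prompt with the bare `SQL:` label matters: the model's most likely continuation of that pattern is the query itself, with no preamble.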
3. Chain of thought
Ask the model to think step-by-step before answering. This is not mere politeness: it forces intermediate reasoning that improves accuracy on logic, math, and multi-step problems.
Determine if this argument is logically valid. Think through each step before concluding.
Why it works: without chain-of-thought, the model must commit to an answer immediately, with no room for intermediate work. With it, the “thinking” tokens serve as working memory, letting the model break a complex problem into simpler subproblems.
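One practical wrinkle: when you let the model reason at length, you need a reliable way to pull the final answer out of the reasoning. A common trick (sketched here; the `Conclusion:` sentinel is my own convention, not a standard) is to request a machine-findable final line:

```python
def cot_prompt(question: str) -> str:
    """Ask for step-by-step reasoning with a machine-findable final line."""
    return (
        f"{question}\n\nThink through each step before concluding, "
        'then state your final answer on a line starting with "Conclusion:".'
    )

def extract_conclusion(response: str) -> str:
    """Pull the final answer out of the reasoning text.

    Falls back to the whole response if the sentinel line is missing."""
    for line in response.splitlines():
        if line.startswith("Conclusion:"):
            return line[len("Conclusion:"):].strip()
    return response.strip()
```

The fallback matters in production: models occasionally ignore the sentinel, and silently returning nothing is worse than returning the full text.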
4. Constraints and boundaries
Explicitly state what the model should NOT do. Models have strong tendencies (being helpful, being verbose, hedging with caveats). Override them directly.
Answer in 2 sentences maximum.
Do not include disclaimers or caveats.
If you don't know, say "I don't know" — do not guess.
Use only information from the provided context.
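Constraints stated in the prompt are not guaranteed to hold, so production systems often verify them on the output side. A rough sketch of such a checker (the function, the sentence-splitting regex, and the example disclaimer phrases are all my own illustration):

```python
import re

# Example phrases to flag -- tune this list to the failure modes you actually see.
DISCLAIMER_PHRASES = ("as an ai", "it's important to note", "i cannot")

def violates_constraints(text: str, max_sentences: int = 2) -> list[str]:
    """Return a list of constraint violations; an empty list means the output passed."""
    problems = []
    # Crude sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) > max_sentences:
        problems.append(f"{len(sentences)} sentences (limit {max_sentences})")
    for phrase in DISCLAIMER_PHRASES:
        if phrase in text.lower():
            problems.append(f"disclaimer phrase: {phrase!r}")
    return problems
```

When a check fails, you can retry the request with the violated constraint restated, rather than shipping the bad output.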
5. Structured output
Request specific formats. JSON, markdown tables, numbered lists, YAML — any format the model has seen frequently in training data.
Return a JSON object with these fields:
- "sentiment": one of "positive", "negative", "neutral"
- "confidence": float 0-1
- "key_phrases": array of strings
Why it works: structured formats reduce ambiguity. The model doesn’t have to decide how to organize the output — you’ve already decided.
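Structured output still needs defensive parsing: models sometimes wrap JSON in a markdown code fence or add prose around it. A sketch of a tolerant parser for the schema above (the `parse_model_json` helper and its error messages are my own):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Extract and validate the sentiment JSON, tolerating a wrapping code fence."""
    # Grab the outermost {...} span so fences and surrounding prose don't break json.loads.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    if data.get("sentiment") not in {"positive", "negative", "neutral"}:
        raise ValueError(f"bad sentiment: {data.get('sentiment')!r}")
    if not 0.0 <= float(data.get("confidence", -1)) <= 1.0:
        raise ValueError(f"confidence out of range: {data.get('confidence')!r}")
    if not isinstance(data.get("key_phrases"), list):
        raise ValueError("key_phrases must be an array")
    return data
```

Validating every field on arrival turns a silent downstream bug into an immediate, retryable error.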
6. Iterative refinement
Don’t try to write the perfect prompt. Write a decent prompt, see the output, then refine. Most production prompts go through 5-10 iterations.
Common refinement moves:
- Output too long → add word/sentence limit
- Output misses edge cases → add examples of those cases
- Output format inconsistent → add explicit format template
- Output too generic → add domain context or persona
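The refinement moves above are mechanical enough to encode as a lookup. A toy sketch of that idea (the failure-mode keys and corrective strings are illustrative examples, not a canonical set):

```python
# Map each observed failure mode to a corrective instruction to append.
REFINEMENTS = {
    "too_long": "Answer in 3 sentences maximum.",
    "missing_edge_cases": "Cover edge cases: empty input, nulls, duplicates.",
    "inconsistent_format": "Follow the output template exactly, with no extra prose.",
    "too_generic": "You are a domain expert; use concrete, domain-specific detail.",
}

def refine(prompt: str, problems: list[str]) -> str:
    """Append one corrective instruction per observed problem.

    Unknown problem keys are ignored rather than raising."""
    fixes = [REFINEMENTS[p] for p in problems if p in REFINEMENTS]
    if not fixes:
        return prompt
    return prompt + "\n" + "\n".join(fixes)
```

In practice you still eyeball each iteration's output; the table just keeps the fixes you have already discovered from being forgotten between sessions.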
Token economics matter
Every token in your prompt costs money and consumes context window. A 2,000-token system prompt that could be 200 tokens is wasting 90% of its budget.
Rules:
- Don’t repeat instructions. Say it once, clearly.
- Don’t explain why you want something unless it changes the output.
- Remove filler phrases (“I would like you to…”, “Could you please…”). Models don’t care about politeness — direct instructions produce identical or better results.
- Put the most important instruction first and last (primacy and recency effects exist in LLMs).
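The filler-stripping rule is easy to automate as a preprocessing pass. A sketch, assuming a hand-maintained phrase list and the common ~4-characters-per-token rule of thumb for English (exact counts require the model's own tokenizer):

```python
# Example filler openers to strip -- extend with phrases you see in your own prompts.
FILLER_PHRASES = ("I would like you to ", "Could you please ", "Please ")

def compress(prompt: str) -> str:
    """Strip filler phrases and collapse runs of whitespace; meaning is unchanged."""
    for phrase in FILLER_PHRASES:
        prompt = prompt.replace(phrase, "")
    return " ".join(prompt.split())

def rough_tokens(text: str) -> int:
    # Rule of thumb: roughly 4 characters per token for English prose.
    return max(1, len(text) // 4)
```

Run this once over a system prompt you pay for on every request, and the savings compound across every call.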
What doesn’t work
- Threatening the model (“If you get this wrong, I’ll be fired”) — produces anxious, over-hedged outputs.
- Excessive detail on trivial aspects — dilutes attention from what matters.
- Asking the model to “be creative” — creativity emerges from constraints, not from asking for it. Constrain the format, free the content.
- Expecting consistency across sessions — models are stateless. Every conversation starts from zero. If you need consistent behavior, put everything in the system prompt every time.
Articles in this guide
Chain-of-Thought vs. Direct Prompting — When Reasoning Steps Actually Help
Accuracy comparison data across task types for chain-of-thought vs. direct prompting, with token cost analysis, technique variants ranked, and a decision matrix for production use.
Output Formatting Control — JSON, Markdown, CSV, and Structured Responses
Format-specific prompt patterns with reliability rates per model, error handling for malformed output, and schema enforcement techniques for production AI systems.
How to Cut AI API Costs by 60-80% Without Losing Quality
Practical techniques for reducing LLM API spending: model routing, prompt compression, caching, batching, and output limits. Per-model pricing, cost projections at scale, and decision frameworks with real math.
Temperature and Top-P Explained — How Sampling Parameters Change Your Output
Practical guide to temperature and top-p settings with output behavior tables, recommended settings per use case, the reproducibility problem, parameter interaction matrix, and common misconceptions debunked.
Token Optimization — How to Get the Same Output Quality at 40% Lower Cost
Practical token reduction techniques with before/after prompt comparisons, per-model pricing tables, caching strategies, and batch processing math for production AI workloads.
System Prompt Patterns That Actually Work
Five battle-tested system prompt templates for common AI tasks. Copy, adapt, ship. No theory — just patterns.