Advanced Prompt Engineering for LLMs¶
Reading time: 40 minutes | Difficulty: Intermediate | Category: Artificial Intelligence
Summary¶
Prompt engineering is the art and science of designing effective instructions for LLMs. This guide covers professional techniques from zero-shot to chain-of-thought, with practical examples and evaluation frameworks for local models.
🎯 Why Prompt Engineering Matters¶
Impact of Prompts on Results¶
# Poorly designed prompt
bad_prompt = "give me info about kubernetes"
# Result: vague, not useful, unstructured
# Well-designed prompt
good_prompt = """
Act as a Kubernetes expert. Explain the concepts of Pods, Deployments and Services in the context of a 3-tier web application (frontend, backend, database).
Requirements:
- Audience: Developers with basic Docker knowledge
- Length: 300-400 words
- Include: 1 YAML example per concept
- Format: Markdown with H2 sections
Structure:
1. Pods - What they are and when to use them
2. Deployments - Replica management
3. Services - Exposing applications
"""
# Result: structured, relevant, actionable
Benefits of Good Prompts¶
- ✅ Reduced iterations: Correct result on first attempt
- ✅ Consistency: Predictable and reproducible outputs
- ✅ Superior quality: More precise and useful responses
- ✅ Token savings: Fewer corrections = less cost
- ✅ Effective automation: Integrable into pipelines (see the sketch below)
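To see the difference in practice, both prompts can be sent to a local model and compared. A minimal sketch, assuming a local Ollama server on localhost:11434 and the llama2:13b-chat-q4_0 model used throughout this guide; run_prompt is a hypothetical helper, not part of the original example:
import requests

def run_prompt(prompt: str, model: str = "llama2:13b-chat-q4_0") -> str:
    """Sends a prompt to the local Ollama API and returns the raw response text."""
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": prompt,
        "temperature": 0.2,
        "stream": False
    })
    response.raise_for_status()
    return response.json()["response"]

# Quick side-by-side comparison of the two prompts
print(run_prompt(bad_prompt)[:300])
print("-" * 80)
print(run_prompt(good_prompt)[:300])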
📋 Anatomy of a Good Prompt¶
Fundamental Components¶
class PromptTemplate:
def __init__(
self,
role: str, # LLM personality/expertise
task: str, # What it should do
context: str, # Background information
constraints: list, # Limitations and requirements
output_format: str, # Desired format
examples: list = None # Few-shot examples
):
self.role = role
self.task = task
self.context = context
self.constraints = constraints
self.output_format = output_format
self.examples = examples or []
def build(self) -> str:
"""Builds complete prompt."""
prompt_parts = []
# 1. Role/Persona
if self.role:
prompt_parts.append(f"Role: {self.role}\n")
# 2. Context
if self.context:
prompt_parts.append(f"Context:\n{self.context}\n")
# 3. Examples (few-shot)
if self.examples:
prompt_parts.append("Examples:")
for i, example in enumerate(self.examples, 1):
prompt_parts.append(f"\nExample {i}:")
prompt_parts.append(f"Input: {example['input']}")
prompt_parts.append(f"Output: {example['output']}\n")
# 4. Main task
prompt_parts.append(f"Task:\n{self.task}\n")
# 5. Constraints
if self.constraints:
prompt_parts.append("Requirements:")
for constraint in self.constraints:
prompt_parts.append(f"- {constraint}")
prompt_parts.append("")
# 6. Output format
if self.output_format:
prompt_parts.append(f"Output format:\n{self.output_format}\n")
return "\n".join(prompt_parts)
# Usage example
template = PromptTemplate(
role="Senior Docker Security Expert",
task="Audit this Dockerfile and suggest security improvements",
context="""
Current Dockerfile:
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python3
COPY . /app
WORKDIR /app
CMD ["python3", "app.py"]
""",
constraints=[
"Prioritize official and slim images",
"Non-root user mandatory",
"Multi-stage build if possible",
"Minimize layers"
],
output_format="""
JSON with:
{
"issues": ["..."],
"improvements": ["..."],
"dockerfile_improved": "..."
}
"""
)
prompt = template.build()
print(prompt)
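If the run_prompt helper sketched in the opening section is in scope, the built template can be sent to the local model in a single call:
audit_report = run_prompt(prompt)
print(audit_report)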
🎓 Technique 1: Zero-Shot Prompting¶
Definition¶
Give clear instructions without prior examples. The model must infer what to do from the description alone.
When to Use¶
- Simple and well-defined tasks
- Large models (13B+) with good comprehension
- When no examples are available
Practical Example¶
def zero_shot_classification(text: str, categories: list) -> str:
"""Classifies text into categories without prior examples."""
prompt = f"""
Classify the following text into ONE of these categories: {', '.join(categories)}
Text: "{text}"
Respond ONLY with the category name, no explanations.
Category:"""
response = requests.post("http://localhost:11434/api/generate", json={
"model": "llama2:13b-chat-q4_0",
"prompt": prompt,
"temperature": 0.1,
"stream": False
})
return response.json()["response"].strip()
# Usage
categories = ["Bug", "Feature Request", "Documentation", "Question"]
issue_text = "The application crashes when clicking the submit button"
category = zero_shot_classification(issue_text, categories)
print(f"Category: {category}") # Output: Bug
Zero-Shot Best Practices¶
# ❌ Vague prompt
bad_prompt = "Classify this: 'app crashes'"
# ✅ Clear and specific prompt
good_prompt = """
Task: Support ticket classification
Valid categories:
1. BUG - Functional error in application
2. FEATURE - New functionality request
3. DOCS - Documentation problem
4. QUESTION - User query
Text to classify: "The application crashes when clicking the submit button"
Instructions:
- Respond ONLY with category name
- If unsure, choose the most probable
- Format: Uppercase single word
Category:"""
🎯 Technique 2: Few-Shot Prompting¶
Definition¶
Provide input-output examples before the actual task to guide the model.
When to Use¶
- Complex tasks with specific format
- Medium models (7B-13B) needing guidance
- When consistent outputs are needed
Practical Example¶
def few_shot_entity_extraction(text: str) -> dict:
"""Extracts entities using few-shot learning."""
# Literal braces in the examples below are doubled so the f-string does not treat them as placeholders
prompt = f"""
Extract technical entities from incident descriptions.
Example 1:
Text: "PostgreSQL database on srv-db-01 is experiencing high CPU usage"
Entities: {{"technology": "PostgreSQL", "resource": "database", "server": "srv-db-01", "metric": "CPU usage", "status": "high"}}
Example 2:
Text: "Nginx reverse proxy returning 502 errors for api.example.com"
Entities: {{"technology": "Nginx", "resource": "reverse proxy", "error": "502", "domain": "api.example.com"}}
Example 3:
Text: "Kubernetes pod web-frontend-abc123 is in CrashLoopBackOff state"
Entities: {{"technology": "Kubernetes", "resource": "pod", "name": "web-frontend-abc123", "status": "CrashLoopBackOff"}}
Now extract entities from this text:
Text: "{text}"
Entities:"""
response = requests.post("http://localhost:11434/api/generate", json={
"model": "llama2:13b-chat-q4_0",
"prompt": prompt,
"temperature": 0.2,
"stream": False,
"format": "json"
})
import json
return json.loads(response.json()["response"])
# Usage
incident = "Redis cache cluster on redis-prod-cluster-01 showing memory leak"
entities = few_shot_entity_extraction(incident)
print(entities)
# Output: {"technology": "Redis", "resource": "cache cluster", "name": "redis-prod-cluster-01", "issue": "memory leak"}
Few-Shot Optimization¶
class FewShotOptimizer:
def __init__(self, model: str = "llama2:13b-chat-q4_0"):
self.model = model
self.ollama_url = "http://localhost:11434/api/generate"
def find_optimal_examples(
self,
task_description: str,
candidate_examples: list,
test_cases: list,
max_examples: int = 5
) -> list:
"""
Finds optimal number and selection of examples.
Args:
task_description: Task description
candidate_examples: Pool of possible examples
test_cases: Test cases for evaluation
max_examples: Maximum examples to test
Returns:
Optimal examples list
"""
best_score = 0
best_examples = []
# Try different combinations
from itertools import combinations
for n in range(1, min(max_examples + 1, len(candidate_examples) + 1)):
for example_combo in combinations(candidate_examples, n):
# Test with these examples
score = self.evaluate_examples(
task_description,
list(example_combo),
test_cases
)
if score > best_score:
best_score = score
best_examples = list(example_combo)
return best_examples
def evaluate_examples(
self,
task: str,
examples: list,
test_cases: list
) -> float:
"""Evaluates example quality on test cases."""
correct = 0
for test_case in test_cases:
# Build prompt with examples
prompt = self.build_few_shot_prompt(task, examples, test_case["input"])
# Get response
response = requests.post(self.ollama_url, json={
"model": self.model,
"prompt": prompt,
"temperature": 0.1,
"stream": False
})
output = response.json()["response"].strip()
# Compare with expected output
if output == test_case["expected_output"]:
correct += 1
return correct / len(test_cases) if test_cases else 0
def build_few_shot_prompt(self, task: str, examples: list, input_text: str) -> str:
"""Builds prompt with examples."""
prompt_parts = [task, ""]
for i, example in enumerate(examples, 1):
prompt_parts.append(f"Example {i}:")
prompt_parts.append(f"Input: {example['input']}")
prompt_parts.append(f"Output: {example['output']}")
prompt_parts.append("")
prompt_parts.append("Your turn:")
prompt_parts.append(f"Input: {input_text}")
prompt_parts.append("Output:")
return "\n".join(prompt_parts)
🧠 Technique 3: Chain-of-Thought (CoT)¶
Definition¶
Instruct the model to show its step-by-step reasoning before giving the final answer.
When to Use¶
- Complex problems requiring multiple steps
- Debugging and troubleshooting
- Analysis and diagnosis
Practical Example¶
def chain_of_thought_debug(error_log: str, context: str = "") -> dict:
"""Uses CoT for complex debugging."""
prompt = f"""
Act as an expert debugger. Analyze this error using step-by-step reasoning.
Error:
{error_log}
Context:
{context}
Think out loud, step by step:
Step 1 - Identify error type:
[Your reasoning here]
Step 2 - Analyze stack trace:
[Your reasoning here]
Step 3 - Identify relevant variables/state:
[Your reasoning here]
Step 4 - Root cause hypothesis:
[Your reasoning here]
Step 5 - Conclusion and solution:
[Your reasoning here]
Final format in JSON:
{{
"error_type": "...",
"root_cause": "...",
"reasoning_steps": ["step 1", "step 2", ...],
"solution": "...",
"confidence": 0.0-1.0
}}
"""
# Braces in the JSON template above are doubled because the prompt is an f-string
response = requests.post("http://localhost:11434/api/generate", json={
"model": "llama2:13b-chat-q4_0",
"prompt": prompt,
"temperature": 0.3,
"stream": False
})
# Extract JSON from end of response
full_response = response.json()["response"]
# Parse JSON
import json
import re
json_match = re.search(r'\{.*\}', full_response, re.DOTALL)
if json_match:
return json.loads(json_match.group())
return {"error": "Could not parse response"}
# Usage
error = """
TypeError: Cannot read property 'id' of undefined
at getUserProfile (app/controllers/user.js:42:18)
at Router.handle (node_modules/express/lib/router/index.js:284:7)
"""
context = """
Endpoint: GET /api/users/:id/profile
Request: user_id=12345
Database query returned empty result
"""
debug_result = chain_of_thought_debug(error, context)
print("Reasoning:")
for step in debug_result["reasoning_steps"]:
print(f"- {step}")
print(f"\nSolution: {debug_result['solution']}")
print(f"Confidence: {debug_result['confidence']}")
CoT with Self-Consistency¶
def chain_of_thought_with_consistency(
question: str,
num_samples: int = 5
) -> dict:
"""
Generates multiple CoT responses and selects most consistent.
"""
prompt_template = f"""
Solve this problem step by step:
{question}
Step-by-step reasoning:
1. [First step]
2. [Second step]
3. [Third step]
...
Final answer: [Your answer]
"""
responses = []
# Generate multiple responses with high temperature
for _ in range(num_samples):
response = requests.post("http://localhost:11434/api/generate", json={
"model": "llama2:13b-chat-q4_0",
"prompt": prompt_template,
"temperature": 0.7, # Higher variation
"stream": False
})
responses.append(response.json()["response"])
# Extract final answers
final_answers = []
for resp in responses:
# Search for "Final answer:" in response
import re
match = re.search(r'Final answer:\s*(.+)', resp, re.IGNORECASE)
if match:
final_answers.append(match.group(1).strip())
# Find most common answer (voting)
from collections import Counter
answer_counts = Counter(final_answers)
most_common_answer, count = answer_counts.most_common(1)[0]
return {
"answer": most_common_answer,
"confidence": count / num_samples,
"all_responses": responses,
"answer_distribution": dict(answer_counts)
}
# Usage
question = """
A Kubernetes pod is consuming 800MB of memory but its limit is 512MB.
The pod doesn't restart but new requests fail.
Why is this happening and how to fix it?
"""
result = chain_of_thought_with_consistency(question, num_samples=5)
print(f"Consensus answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Distribution: {result['answer_distribution']}")
🎨 Technique 4: Role Prompting¶
Definition¶
Assign a specific role or personality to the model to get more appropriate responses.
Practical Example¶
class RoleBasedPrompt:
ROLES = {
"devops_engineer": """
You are a Senior DevOps Engineer with 10+ years experience in:
- Kubernetes, Docker, Terraform
- AWS, GCP, Azure
- CI/CD (Jenkins, GitLab, GitHub Actions)
- Observability (Prometheus, Grafana, ELK)
Your style:
- Pragmatic and solution-oriented
- Focused on automation and scalability
- Prefers code over long explanations
- Considers security and costs in recommendations
""",
"security_expert": """
You are a Security Architect specializing in:
- Application Security (OWASP Top 10)
- Cloud Security (CIS Benchmarks)
- Container Security (trivy, falco)
- Compliance (SOC2, ISO 27001, GDPR)
Your style:
- Security first, always
- Assumes breach (zero trust)
- Provides evidence and references
- Balances security with usability
""",
"sre": """
You are a Site Reliability Engineer focused on:
- Availability and reliability (SLIs, SLOs, SLAs)
- Incident Management and Postmortems
- Capacity Planning
- Chaos Engineering
Your style:
- Data and metrics based
- Proactive in prevention
- Automates toil relentlessly
- Documents everything for future reference
"""
}
def __init__(self, role: str, model: str = "llama2:13b-chat-q4_0"):
self.role = self.ROLES.get(role, "")
self.model = model
self.ollama_url = "http://localhost:11434/api/generate"
def ask(self, question: str, context: str = "") -> str:
"""Asks question with assigned role."""
prompt = f"""
{self.role}
Additional context:
{context}
Question:
{question}
Your response (maintain role and style):
"""
response = requests.post(self.ollama_url, json={
"model": self.model,
"prompt": prompt,
"temperature": 0.4,
"stream": False
})
return response.json()["response"]
# Comparative usage
question = "How to deploy a Node.js application in Kubernetes?"
# DevOps perspective
devops = RoleBasedPrompt("devops_engineer")
devops_answer = devops.ask(question)
print("DevOps Engineer:")
print(devops_answer)
print("\n" + "="*80 + "\n")
# Security perspective
security = RoleBasedPrompt("security_expert")
security_answer = security.ask(question)
print("Security Expert:")
print(security_answer)
print("\n" + "="*80 + "\n")
# SRE perspective
sre = RoleBasedPrompt("sre")
sre_answer = sre.ask(question)
print("SRE:")
print(sre_answer)
📊 Prompt Evaluation¶
Evaluation Framework¶
from dataclasses import dataclass
from typing import Callable
import statistics
import requests
@dataclass
class PromptMetrics:
relevance: float # 0-1
accuracy: float # 0-1
completeness: float # 0-1
consistency: float # 0-1
tokens_used: int
latency_ms: float
class PromptEvaluator:
def __init__(self, model: str = "llama2:13b-chat-q4_0"):
self.model = model
self.ollama_url = "http://localhost:11434/api/generate"
def evaluate_prompt(
self,
prompt: str,
test_cases: list,
evaluation_criteria: dict
) -> PromptMetrics:
"""
Evaluates prompt across multiple dimensions.
Args:
prompt: Template prompt to evaluate
test_cases: Test case list
evaluation_criteria: Custom evaluation criteria
Returns:
Aggregated prompt metrics
"""
import time
results = []
total_tokens = 0
latencies = []
for test_case in test_cases:
# Execute prompt
full_prompt = prompt.format(**test_case["variables"])
start_time = time.time()
response = requests.post(self.ollama_url, json={
"model": self.model,
"prompt": full_prompt,
"temperature": 0.2,
"stream": False
})
latency = (time.time() - start_time) * 1000
output = response.json()["response"]
# Evaluate response
relevance = self._evaluate_relevance(output, test_case["expected_topics"])
accuracy = self._evaluate_accuracy(output, test_case["ground_truth"])
completeness = self._evaluate_completeness(output, test_case["required_elements"])
results.append({
"relevance": relevance,
"accuracy": accuracy,
"completeness": completeness
})
# Count tokens (approximate)
total_tokens += len(full_prompt.split()) + len(output.split())
latencies.append(latency)
# Calculate aggregated metrics
return PromptMetrics(
relevance=statistics.mean([r["relevance"] for r in results]),
accuracy=statistics.mean([r["accuracy"] for r in results]),
completeness=statistics.mean([r["completeness"] for r in results]),
consistency=1.0 - statistics.stdev([r["accuracy"] for r in results]) if len(results) > 1 else 1.0,
tokens_used=total_tokens,
latency_ms=statistics.mean(latencies)
)
def _evaluate_relevance(self, output: str, expected_topics: list) -> float:
"""Evaluates if response is relevant to expected topics."""
output_lower = output.lower()
matches = sum(1 for topic in expected_topics if topic.lower() in output_lower)
return matches / len(expected_topics) if expected_topics else 0.0
def _evaluate_accuracy(self, output: str, ground_truth: str) -> float:
"""Evaluates accuracy by comparing with ground truth."""
# Use another LLM for evaluation (LLM-as-Judge)
eval_prompt = f"""
Evaluate the accuracy of this response on a 0.0 to 1.0 scale.
Correct answer (ground truth):
{ground_truth}
Response to evaluate:
{output}
Criteria:
- 1.0: Completely correct
- 0.8: Mostly correct with minor errors
- 0.6: Partially correct
- 0.4: Incorrect but related
- 0.0: Completely incorrect
Respond ONLY with a number from 0.0 to 1.0:
"""
response = requests.post(self.ollama_url, json={
"model": self.model,
"prompt": eval_prompt,
"temperature": 0.1,
"stream": False
})
try:
score = float(response.json()["response"].strip())
return max(0.0, min(1.0, score))
except (ValueError, KeyError):
return 0.5  # Default when the score cannot be parsed
def _evaluate_completeness(self, output: str, required_elements: list) -> float:
"""Evaluates if response includes all required elements."""
output_lower = output.lower()
present = sum(1 for elem in required_elements if elem.lower() in output_lower)
return present / len(required_elements) if required_elements else 1.0
def compare_prompts(self, prompts: dict, test_cases: list) -> dict:
"""Compares multiple prompt variants."""
results = {}
for name, prompt in prompts.items():
print(f"Evaluating prompt: {name}...")
metrics = self.evaluate_prompt(prompt, test_cases, {})
results[name] = metrics
# Generate comparative report
return self._generate_comparison_report(results)
def _generate_comparison_report(self, results: dict) -> dict:
"""Generates comparative report of prompts."""
# Find best in each metric
best = {
"relevance": max(results.items(), key=lambda x: x[1].relevance),
"accuracy": max(results.items(), key=lambda x: x[1].accuracy),
"completeness": max(results.items(), key=lambda x: x[1].completeness),
"consistency": max(results.items(), key=lambda x: x[1].consistency),
"efficiency": min(results.items(), key=lambda x: x[1].tokens_used),
"speed": min(results.items(), key=lambda x: x[1].latency_ms)
}
return {
"all_results": results,
"best_per_metric": best,
"recommendation": self._recommend_best_prompt(results)
}
def _recommend_best_prompt(self, results: dict) -> str:
"""Recommends overall best prompt."""
# Weighted scoring
scores = {}
for name, metrics in results.items():
score = (
metrics.relevance * 0.3 +
metrics.accuracy * 0.4 +
metrics.completeness * 0.2 +
metrics.consistency * 0.1
)
scores[name] = score
best_name = max(scores.items(), key=lambda x: x[1])[0]
return best_name
# Usage
evaluator = PromptEvaluator()
# Prompts to compare
prompts = {
"simple": """
Explain what Kubernetes is.
""",
"structured": """
Explain what Kubernetes is.
Audience: Backend developers with Docker experience
Length: 200-300 words
Include: Main concepts, benefits, when to use
Format:
1. Brief definition
2. Key concepts
3. Benefits
4. When to use vs Docker Compose
""",
"role_based": """
You are a Senior Platform Engineer explaining to your team.
Explain what Kubernetes is practically and clearly.
Requirements:
- Audience: Developers using Docker
- Focus: Pragmatic, not theoretical
- Examples: Real use cases
- Length: 250 words
"""
}
# Test cases
test_cases = [
{
"variables": {},
"expected_topics": ["containers", "orchestration", "pods", "clusters"],
"ground_truth": "Kubernetes is a container orchestration platform...",
"required_elements": ["pods", "services", "deployments"]
}
]
# Compare
comparison = evaluator.compare_prompts(prompts, test_cases)
print("\n📊 Evaluation Results:\n")
for name, metrics in comparison["all_results"].items():
print(f"{name}:")
print(f" Relevance: {metrics.relevance:.2f}")
print(f" Accuracy: {metrics.accuracy:.2f}")
print(f" Completeness: {metrics.completeness:.2f}")
print(f" Tokens: {metrics.tokens_used}")
print(f" Latency: {metrics.latency_ms:.0f}ms\n")
print(f"🏆 Recommendation: {comparison['recommendation']}")
🔧 Advanced Techniques¶
1. Self-Consistency with Voting¶
Already covered in Chain-of-Thought, but here's the complete implementation:
def self_consistency_voting(
prompt: str,
num_samples: int = 7,
temperature: float = 0.8
) -> dict:
"""
Generates multiple responses and uses voting to determine consensus.
"""
responses = []
for i in range(num_samples):
response = requests.post("http://localhost:11434/api/generate", json={
"model": "llama2:13b-chat-q4_0",
"prompt": prompt,
"temperature": temperature,
"stream": False
})
responses.append(response.json()["response"])
# Use LLM to determine consensus
consensus_prompt = f"""
These are {num_samples} different responses to the same question:
{chr(10).join([f"{i+1}. {r}" for i, r in enumerate(responses)])}
Analyze the responses and determine:
1. Consensus points (what all or most say)
2. Divergence points (where they differ)
3. Final synthesized answer (combine the best from all)
JSON format:
{{
"consensus_points": ["..."],
"divergence_points": ["..."],
"final_answer": "...",
"confidence": 0.0-1.0
}}
"""
consensus_response = requests.post("http://localhost:11434/api/generate", json={
"model": "llama2:13b-chat-q4_0",
"prompt": consensus_prompt,
"temperature": 0.2,
"stream": False,
"format": "json"
})
import json
return json.loads(consensus_response.json()["response"])
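A short usage sketch for the voting helper above, assuming the model honours the requested JSON keys:
result = self_consistency_voting(
    prompt="Why might a Kubernetes Service show no endpoints even though its pods are Running?",
    num_samples=5
)
print(f"Final answer: {result['final_answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Consensus points: {result['consensus_points']}")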
2. Prompt Chaining¶
class PromptChain:
def __init__(self, model: str = "llama2:13b-chat-q4_0"):
self.model = model
self.ollama_url = "http://localhost:11434/api/generate"
self.chain_history = []
def add_step(self, prompt_template: str, use_previous: bool = True):
"""Adds step to chain."""
def step_function(input_data: dict) -> str:
# Build prompt with previous data if needed
if use_previous and self.chain_history:
previous_output = self.chain_history[-1]["output"]
input_data["previous_output"] = previous_output
prompt = prompt_template.format(**input_data)
response = requests.post(self.ollama_url, json={
"model": self.model,
"prompt": prompt,
"temperature": 0.3,
"stream": False
})
output = response.json()["response"]
self.chain_history.append({
"prompt": prompt,
"output": output,
"input_data": input_data
})
return output
return step_function
def execute(self, initial_data: dict) -> dict:
"""Executes entire chain."""
return {
"final_output": self.chain_history[-1]["output"] if self.chain_history else None,
"chain_history": self.chain_history
}
# Example: Code analysis pipeline
chain = PromptChain()
# Step 1: Analyze code
analyze_step = chain.add_step("""
Analyze this code and identify:
1. Main functionality
2. Potential bugs
3. Performance improvements
Code:
{code}
Analysis:
""", use_previous=False)
# Step 2: Generate refactor
refactor_step = chain.add_step("""
Based on this analysis:
{previous_output}
Generate refactored code implementing suggested improvements.
Refactored code:
""")
# Step 3: Document
document_step = chain.add_step("""
Generate complete documentation for this refactored code:
{previous_output}
Include:
- Function docstring
- Inline comments
- Usage examples
Documentation:
""")
# Execute chain
code_to_analyze = """
def process_data(data):
result = []
for item in data:
if item > 0:
result.append(item * 2)
return result
"""
analyze_step({"code": code_to_analyze})
refactor_step({})
document_step({})
final_result = chain.execute({})
print(final_result["final_output"])
📚 Best Practices and Anti-Patterns¶
✅ DO's¶
- Be specific and clear
  # ✅ Good
  prompt = "Generate a Python function that calculates factorial using recursion. Include error handling for negative inputs and type annotations for return value."
- Use clear delimiters
  # ✅ Good
  prompt = """
  Text to analyze:
  '''
  {user_input}
  '''
  Analysis:
  """
- Specify output format
  # ✅ Good
  prompt = "Respond in JSON format with these keys: {status, message, data}"
- Provide relevant context
  # ✅ Good
  prompt = f"Context: E-commerce web application with 1M daily users\nQuestion: {question}"
❌ DON'Ts¶
- Ambiguity
  # ❌ Bad
  prompt = "Give me info about that"
- Overly long prompts
  # ❌ Bad (>4000 words of unnecessary context)
  prompt = f"{entire_documentation}\nNow answer: {simple_question}"
- Implicit knowledge assumptions
  # ❌ Bad
  prompt = "Explain how this works"
- No output validation (see the validation sketch below)
  # ❌ Bad
  response = llm.generate(prompt)
  use_directly(response)  # No validation
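As a counterpoint to the last anti-pattern, here is a minimal validation sketch. It assumes the local Ollama endpoint used throughout this guide and the JSON key convention from the DO's above; generate_validated and json_prompt are illustrative names:
import json
import requests

def generate_validated(prompt: str, required_keys: list) -> dict:
    """Calls the local model and validates the JSON response before it is used."""
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama2:13b-chat-q4_0",
        "prompt": prompt,
        "stream": False,
        "format": "json"
    })
    try:
        data = json.loads(response.json()["response"])
    except (ValueError, KeyError) as exc:
        raise ValueError("Model did not return valid JSON") from exc
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"Response is missing keys: {missing}")
    return data

# ✅ Good: validate before use
json_prompt = "Summarize the state of the nginx deployment. Respond in JSON format with these keys: {status, message, data}"
result = generate_validated(json_prompt, required_keys=["status", "message", "data"])
print(result["status"])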
📚 Next Steps¶
After mastering prompt engineering, consider:
- Fine-tuning Basics - Customize models for your domain
- Model Evaluation - Metrics and benchmarks
- LLMs in Production - Deploy at scale
Have you developed effective prompting techniques? Share your strategies and learnings in the comments.