🛠️ Working with LLMs

From Prompting to Fine-Tuning: Practical Techniques

🎯 The Three Approaches

From Training to Using

You've learned how LLMs are trained from scratch. But here's the good news: you don't need to train your own model!

Instead, you can use existing models and adapt them to your needs using three increasingly powerful techniques:

💬

1. Prompt Engineering

Craft better instructions

💰 Free - $0.01

Minutes to implement

📚

2. RAG

Give models external knowledge

💰 $10 - $100

Hours to days to implement

🎓

3. Fine-Tuning

Teach models new behaviors

💰 $100 - $10,000+

Days to weeks to implement

🎯 Which Should You Use?

Start with prompting (it's free and fast!), then move to RAG if you need external data, and only fine-tune if you need specialized behavior that prompting can't achieve.

Think of it like cooking: Prompting is choosing from a menu, RAG is adding your own ingredients, and fine-tuning is teaching the chef a new cuisine!

💬 Prompt Engineering

The Art of Asking

Prompt engineering is the skill of crafting effective instructions to get better outputs from LLMs. It's like learning to ask better questions!

The same model can give vastly different results based on how you phrase your request.

❌ Bad Prompt

Write about dogs.
Output: Dogs are animals. They have four legs. People keep them as pets. They bark...

Problem: Too vague, generic output

✅ Good Prompt

Write a 200-word article about the benefits of adopting rescue dogs, targeting first-time pet owners. Use a warm, encouraging tone and include 3 specific benefits.
Output: Thinking about welcoming a furry friend into your home? Rescue dogs make wonderful companions, especially for first-time owners...

Success: Specific, targeted, useful output

🎯 Key Prompting Techniques

1️⃣ Be Specific

Instead of: "Explain quantum computing"

Try: "Explain quantum computing to a 10-year-old using analogies with toys and games. Keep it under 100 words."

2️⃣ Provide Context

Instead of: "Write code"

Try: "You are an expert Python developer. Write a function that validates email addresses using regex. Include error handling and docstrings."
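The "provide context" example above asks for an email validator; here is a minimal sketch of what such a function might look like. The regex is a simplified teaching pattern, not full RFC 5322 validation:

```python
import re

def is_valid_email(address: str) -> bool:
    """Return True if `address` looks like a valid email.

    Uses a simple pattern: allowed characters, an @ sign,
    a domain, and a dot-separated top-level domain.
    (Real-world email validation is far more permissive;
    this is a teaching sketch.)
    """
    if not isinstance(address, str):
        raise TypeError("address must be a string")
    pattern = r"[\w.+-]+@[\w-]+(\.[\w-]+)+"
    return re.fullmatch(pattern, address) is not None
```

A well-contextualized prompt tends to produce code in exactly this shape: a docstring, error handling, and a focused regex.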

3️⃣ Give Examples

Few-shot prompting: Show the model what you want

Example:
"Classify sentiment:
'I love this!' → Positive
'This is terrible' → Negative
'It's okay' → Neutral
'Best day ever!' → ?"

4️⃣ Chain of Thought

Make it think step-by-step:

Add: "Let's think through this step by step:"

This dramatically improves reasoning on complex problems!
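Techniques 3 and 4 are just careful string assembly. A minimal sketch of a few-shot prompt builder (the helper name and formatting are illustrative, not any particular library's API):

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, labeled
    examples, then the new input for the model to complete."""
    lines = [task]
    for text, label in examples:
        lines.append(f"'{text}' -> {label}")
    lines.append(f"'{query}' -> ")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify sentiment:",
    [("I love this!", "Positive"),
     ("This is terrible", "Negative"),
     ("It's okay", "Neutral")],
    "Best day ever!",
)

# Chain of thought: append a step-by-step cue for harder tasks.
cot_prompt = prompt + "\nLet's think through this step by step:"
```

The resulting string is sent to the model as-is; the examples teach the pattern, and the trailing arrow invites the model to continue it.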

✏️ Exercise: Try Different Prompts

Task: Get the model to explain photosynthesis. Try a vague prompt first ("Explain photosynthesis"), then a specific one with an audience, length, and tone, and compare the outputs.

🎨 System Prompts vs User Prompts

System Prompt: Sets the overall behavior and personality (like "You are a helpful assistant")

User Prompt: Your specific request or question

Example System Prompt: "You are a friendly coding tutor who explains concepts using simple analogies and always provides working code examples."
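In practice, most chat-style LLM APIs take the system prompt and user prompt as a list of role-tagged messages. This sketch uses the common `{"role", "content"}` convention; the exact schema varies by provider, so check your API's documentation:

```python
# A conversation is a list of messages, each tagged with a role.
# The system message sets behavior; user messages carry requests.
messages = [
    {"role": "system",
     "content": ("You are a friendly coding tutor who explains "
                 "concepts using simple analogies and always "
                 "provides working code examples.")},
    {"role": "user",
     "content": "What is a Python list comprehension?"},
]
```

The system message persists across turns, which is why it is the right place for personality and ground rules rather than repeating them in every user prompt.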

📚 RAG (Retrieval Augmented Generation)

Giving Models External Knowledge

The Problem: LLMs only know what they were trained on. They can't access your company's documents, recent news, or private data.

The Solution: RAG retrieves relevant information from external sources and includes it in the prompt!

How RAG Works

1. User Question
"What's our refund policy?"
2. Search Knowledge Base
Find relevant documents
3. Retrieve Context
"Refunds within 30 days..."
4. Augment Prompt
Question + Retrieved Context
5. LLM Generates Answer
Based on actual documents
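The five steps above can be sketched end to end in a few lines. This toy version retrieves by keyword overlap rather than embeddings, and the final LLM call is left as a placeholder (no particular provider assumed):

```python
# Toy RAG pipeline: retrieve -> augment -> (generate).
docs = [
    "Refund policy: refunds are issued within 30 days of purchase.",
    "Vacation policy: employees receive 20 days PTO per year.",
]

def retrieve(question, documents):
    """Steps 2-3: return the document sharing the most words with
    the question. Real systems use embeddings + a vector DB."""
    q_words = set(question.lower().split())
    return max(documents,
               key=lambda d: len(q_words & set(d.lower().split())))

def augment(question, context):
    """Step 4: combine retrieved context with the question."""
    return (f"Answer using only this context:\n{context}\n\n"
            f"Question: {question}")

question = "What's our refund policy?"
prompt = augment(question, retrieve(question, docs))
# Step 5: send `prompt` to any LLM; it now answers from your docs.
```

Swapping the keyword matcher for an embedding model and a vector database gives you a production-shaped pipeline without changing this overall structure.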

❌ Without RAG

User: "What's our company's vacation policy?"
LLM: "I don't have access to your company's specific policies. Typically, companies offer 10-15 days..." (Generic, possibly wrong)

✅ With RAG

User: "What's our company's vacation policy?"
[System retrieves: "Employees receive 20 days PTO..."]
LLM: "According to your company policy, employees receive 20 days of PTO per year, accrued monthly..." (Accurate, specific)

🔧 RAG Components

  • Document Store: Where your data lives (PDFs, databases, websites)
  • Embeddings: Convert documents into vectors (remember Lesson 4!)
  • Vector Database: Store and search embeddings efficiently
  • Retrieval: Find most relevant documents using similarity search
  • Generation: LLM creates answer using retrieved context
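Components 2–4 boil down to vector similarity. A minimal sketch with hand-made 3-dimensional "embeddings" (real embeddings come from a model and have hundreds of dimensions; the vectors here are invented for illustration):

```python
import math

# Pretend embeddings for two documents (illustrative values only).
doc_vectors = {
    "refund policy doc":   [0.9, 0.1, 0.0],
    "vacation policy doc": [0.1, 0.9, 0.0],
}

def cosine_similarity(a, b):
    """Similarity search metric: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vector = [0.8, 0.2, 0.0]  # pretend embedding of the user's question
best = max(doc_vectors,
           key=lambda name: cosine_similarity(query_vector, doc_vectors[name]))
```

A vector database does exactly this comparison, just at scale and with indexes that avoid scoring every document.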

📄 Use Case 1

Customer Support

Answer questions using your help docs, FAQs, and knowledge base

🔬 Use Case 2

Research Assistant

Query scientific papers, legal documents, or medical records

💼 Use Case 3

Internal Chatbot

Let employees query company policies, procedures, and documentation

⚡ Why RAG is Powerful

  • Always up-to-date: Add new documents anytime without retraining
  • Source attribution: Can cite which documents were used
  • Cost-effective: No expensive model training required
  • Privacy: Your data stays in your control
  • Reduces hallucinations: Model answers from actual documents

🎓 Fine-Tuning

Teaching Models New Behaviors

Fine-tuning means taking a pre-trained model and continuing its training on your specific data to teach it new behaviors, styles, or knowledge.

Think of it like this: The base model went to general school (pre-training), now you're sending it to specialized training (fine-tuning).

🎓 Pre-Training

What: Learning from massive internet data

Cost: Millions of dollars

Time: Weeks to months

Who: Large AI companies

Result: Base model

🎯 Fine-Tuning

What: Learning from your specific data

Cost: $100 - $10,000+

Time: Hours to days

Who: You!

Result: Specialized model

🎯 When to Fine-Tune

Fine-tuning is worth it when:

  • Consistent style needed: Medical reports, legal documents, specific writing tone
  • Domain expertise: Specialized knowledge not in base model (medical, legal, technical)
  • Structured outputs: Always respond in specific JSON format
  • Efficiency: Shorter prompts = lower costs at scale
  • Prompting isn't enough: Can't achieve desired behavior with prompts alone

📝 Example 1

Customer Service Bot

Fine-tune on 10,000 past support conversations to match your company's tone and policies

⚕️ Example 2

Medical Diagnosis Assistant

Fine-tune on medical literature and case studies for specialized medical knowledge

💻 Example 3

Code Generator

Fine-tune on your company's codebase to follow internal patterns and conventions

🔧 Fine-Tuning Process

  1. Prepare Training Data: Collect 100s-1000s of examples (input → desired output)
  2. Format Data: Usually JSON format with prompt/completion pairs
  3. Choose Base Model: Select an appropriate foundation model
  4. Train: Run fine-tuning (hours to days, costs $100-$10,000)
  5. Evaluate: Test on held-out examples
  6. Deploy: Use your custom model via API

Example Training Data Format

{"prompt": "Translate to French: Hello, how are you?", "completion": "Bonjour, comment allez-vous?"}
{"prompt": "Translate to French: Good morning", "completion": "Bonjour"}
{"prompt": "Translate to French: Thank you very much", "completion": "Merci beaucoup"}

After fine-tuning on thousands of examples like these, the model learns to translate consistently in your desired style.
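The format above is JSON Lines: one JSON object per line. A minimal sketch of step 2 (format data) in Python; the `prompt`/`completion` field names follow a common convention, but the exact schema varies by provider:

```python
import json

# Prompt/completion pairs to be written as fine-tuning data.
examples = [
    {"prompt": "Translate to French: Hello, how are you?",
     "completion": "Bonjour, comment allez-vous?"},
    {"prompt": "Translate to French: Good morning",
     "completion": "Bonjour"},
]

# JSON Lines: one json.dumps() per line, no enclosing array.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

In a real project this file would hold hundreds to thousands of pairs, and you would hold some back as an evaluation set (step 5).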

⚠️ Fine-Tuning Challenges

  • Data quality matters: Garbage in = garbage out
  • Overfitting risk: Model might memorize instead of generalize
  • Catastrophic forgetting: Might lose some general knowledge
  • Cost: Training and hosting custom models is expensive
  • Maintenance: Need to retrain as requirements change

⚖️ Comparing the Three Approaches

| Aspect | 💬 Prompt Engineering | 📚 RAG | 🎓 Fine-Tuning |
| --- | --- | --- | --- |
| Cost | Free - $0.01 | $10 - $100 | $100 - $10,000+ |
| Time to Implement | Minutes | Hours to Days | Days to Weeks |
| Technical Complexity | Low | Medium | High |
| Data Required | None | Documents/Knowledge Base | 100s-1000s of examples |
| Best For | General tasks, quick experiments | Accessing external/private data | Specialized behavior, consistent style |
| Updates | Instant (change prompt) | Easy (add documents) | Hard (retrain model) |
| Scalability | Excellent | Good | Requires infrastructure |

🎯 Decision Framework

Start here: Always try prompt engineering first!

Move to RAG if: You need to access external documents, databases, or real-time information

Consider fine-tuning if: Prompting + RAG can't achieve the behavior you need, and you have the budget and data

💡 Pro Tip: You can combine these! Use RAG to retrieve context, then fine-tune for consistent formatting.

💡 Key Takeaways

📚 Lesson 6 Summary

You've learned three powerful ways to work with LLMs without training from scratch!

What You've Learned

  • 💬 Prompt Engineering: Craft better instructions for better outputs (free, instant)
  • 📚 RAG: Give models access to external knowledge and documents (moderate cost)
  • 🎓 Fine-Tuning: Teach models specialized behaviors and styles (expensive, powerful)
  • ⚖️ Trade-offs: Balance cost, complexity, and capabilities
  • 🎯 Decision framework: Start simple, scale up only when needed
  • 🔄 Combination: These techniques can work together

🚀 Real-World Impact

These techniques power the AI applications you use every day:

  • AI assistants: Use system prompts + fine-tuning for helpful behavior
  • Code completion tools: Fine-tuned on code repositories
  • Customer support bots: RAG to access help docs + fine-tuning for tone
  • Search engines: RAG to retrieve relevant web pages

🚀 Ready for the Final Lesson?

Your Progress: 6 of 8 lessons complete • Almost there! 🌟

Next: Applications & Ethics

Complete your journey by exploring real-world applications, understanding limitations, and learning about responsible AI development.

📚 What You'll Explore:

  • Real-world use cases: How LLMs are transforming industries today
  • Limitations & challenges: Hallucinations, bias, and context limits
  • Ethical considerations: Privacy, misinformation, and responsible AI
  • Future directions: What's next for AI and your role in it