Foundation Study Guide
A conversational guide for preparing for the Foundation tier assessment. This playbook covers all 7 elements you need to understand, with study resources, practice activities, and mini challenges for each.
How This Works
The Foundation assessment is a conversation, not a written test. You'll discuss concepts with a Practitioner or Expert, walk through real examples, and demonstrate that you can reason about AI — not just recite definitions.
For each of the 7 elements below, you should be able to:
- Explain the concept clearly in your own words
- Describe trade-offs and limitations
- Talk through examples from your own work
Knowledge Check
Before moving to the assessment, you should feel confident answering "yes" to all of these for every element:
- I can clearly explain this concept to someone else
- I have applied this in at least one real scenario
- I understand common mistakes in this area
- I can describe limitations and trade-offs
- I know how to recover when something doesn't work as expected
Final Advice
Before scheduling your assessment:
- Review your real-world usage. Be ready with concrete examples.
- Be ready to explain your thinking process. "Why" matters more than "what."
- Think in terms of reasoning, not memorization. This is a conversation, not a quiz.
The 7 Elements
- Pr — Prompts
- Lg — LLMs
- Em — Embeddings
- Gr — Guardrails
- Cw — Context Windows
- Rg — RAG
- Ev — Evaluation
Prompts — What Is It?
A good prompt engineer knows how to give instructions in a way that is easy for the AI to understand — organizing the request clearly, avoiding confusion, and being specific about what is expected.
They also know how to improve a prompt when the first answer is not ideal. Instead of accepting a weak result, they analyze what went wrong and rewrite the prompt to guide the model better.
Key prompting techniques to understand:
- Zero-shot — Give the AI a task with no examples; it relies on general knowledge
- Few-shot — Provide examples in the prompt to show the format/style you want
- Chain-of-thought — Ask the model to reason step by step before answering
- Role-based — Assign a persona (e.g., "act as a senior recruiter") to guide tone and priorities
- You can identify the structure (role, context, format, task, constraints) within a prompt
- You understand that prompts often require refinement and are comfortable iterating
- You are familiar with at least two prompting techniques and know how to apply them
- You can use real examples to demonstrate your understanding, not just explain theory
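The structural pieces listed above (role, context, task, constraints, output format) can be sketched as a small template builder. This is a minimal illustration, not a required format — the function name and field choices here are assumptions for the example, and passing examples turns a zero-shot prompt into a few-shot one:

```python
def build_prompt(role, context, task, constraints, output_format, examples=None):
    """Assemble a structured prompt from its named parts.
    With no examples this is zero-shot; adding examples makes it few-shot."""
    parts = [
        f"You are {role}.",
        f"Context: {context}",
        f"Task: {task}",
        "Constraints: " + "; ".join(constraints),
        f"Output format: {output_format}",
    ]
    if examples:  # few-shot: show the model the shape of a good answer
        parts.append("Examples:")
        parts.extend(f"- {ex}" for ex in examples)
    return "\n".join(parts)

# Turning a vague instruction into a structured one:
structured = build_prompt(
    role="a senior engineering manager",
    context="a mid-level developer just shipped their first production feature",
    task="write constructive feedback on their code review habits",
    constraints=["be specific", "include one action item", "stay under 150 words"],
    output_format="three short paragraphs: strengths, growth area, next step",
)
print(structured)
```

Compare the output of the structured version against the bare instruction "Create feedback for a developer" — the difference in specificity is the point of the exercise above.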
Prompts — Study Resources
Start Here:
After watching, you should be able to explain: why prompt structure matters, the difference between vague vs. structured prompts, what the main techniques are, and why iteration is essential.
Going Deeper
- ChatGPT Prompt Engineering for Developers
- Effective Prompts for AI: The Essentials
- AI Prompting Guide
Use these to expand your understanding of prompt design principles, study structured examples across use cases, and explore advanced techniques like structured output formatting.
- Expecting the AI to "figure out" what they want without providing enough context or clarity
- Writing vague instructions like "fix this code" or "write an essay for me" without defining scope, format, or constraints
- Regenerating responses repeatedly without improving or refining the original prompt
- Not understanding why prompt structure matters or how it directly impacts output quality
Prompts — Practice
Hands-On Activity: Choose a vague instruction (e.g., "Write a project summary," "Fix this code," "Create interview questions"). Then:
- Rewrite it using a clear role, context, constraints, and desired output format
- Generate the output
- Refine the prompt at least twice, improving clarity and structure each time
- Compare the results — what changed, why did the output improve, which techniques did you apply?
Mini Challenge:
"Create feedback for a developer"
- Identify what is missing from this prompt (role, context, constraints, output format). Reflect on a time when a vague prompt like this failed you.
- Rewrite it using at least two prompting techniques (e.g., role-based + structured format, or few-shot + constraints).
- Justify your design decisions in 3-5 sentences.
- Show how the revised version produces a more consistent and useful result.
Create a second version optimized for a different audience (e.g., junior vs senior developer) and explain what changed.
Real-World Application
Apply structured prompting to something you are currently working on. Examples: improve a Slack message for leadership alignment, generate structured interview questions for a role, refactor documentation into a clearer format, create a decision framework with defined constraints.
Before using AI: define your objective, audience, constraints, and output format.
After using AI:
- Evaluate the result — how did you diagnose any issues?
- Refine the prompt strategically (not randomly)
- Document what worked — what changed in your revised prompt?
Ready? You're ready if you can explain your reasoning, diagnose prompt failures, intentionally apply techniques, and improve a vague prompt live.
LLMs — What Is It?
Large Language Models (LLMs) are the core reasoning engines behind modern AI tools — ChatGPT, Claude, Gemini, Llama, and others. They are trained on massive amounts of text data and learn patterns in language, reasoning, and structure.
An LLM does not "think" or "know" things in a human way. It predicts the most likely next word based on patterns learned during training. When properly prompted, this prediction ability can simulate reasoning, summarization, coding, analysis, and more.
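The "predict the next word" idea can be made concrete with a toy example. A real model computes scores (logits) over a vocabulary of tens of thousands of tokens; the hand-picked scores below are illustrative only. The sketch also shows why the same prompt can produce different outputs — generation samples from a probability distribution rather than always taking the top token:

```python
import math
import random

# Hand-made next-token scores for the prefix "The capital of France is".
# A real LLM produces these from learned weights; these values are invented.
logits = {"Paris": 9.0, "Lyon": 4.0, "a": 3.0, "not": 2.0}

def softmax(scores, temperature=1.0):
    """Convert raw scores into probabilities. Lower temperature sharpens
    the distribution (more deterministic); higher flattens it (more varied)."""
    exps = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

probs = softmax(logits)
print(probs)  # "Paris" dominates, but the others retain nonzero probability

# Sampling from the distribution is why reruns of the same prompt can differ:
random.seed(0)
tokens, weights = list(probs), list(probs.values())
print([random.choices(tokens, weights)[0] for _ in range(5)])
```

This is also a useful mental model for hallucination: a fluent but wrong continuation is simply a plausible-looking sample, not a retrieved fact.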
- You can explain what an LLM is at a high level — it generates text by predicting patterns, not by retrieving facts
- You understand core capabilities and limitations — what LLMs are strong at (drafting, summarizing, transformation) and where they struggle (guaranteed accuracy, real-time data, complex reasoning without structure)
- You recognize hallucination risks and understand that confident-sounding outputs are not automatically correct
- You understand training data limitations, including knowledge cutoffs and the absence of real-time awareness
- You can use real examples to demonstrate this understanding
LLMs — Study Resources
Start Here:
- How Large Language Models Work
- How to Use AI APIs: A Beginner's Guide to Anthropic, OpenAI, and More
After watching, you should be able to explain: what an LLM is and how it generates responses, what "hallucination" means, how to decide whether to trust output, differences between major systems, and strengths/limitations in real-world use.
Going Deeper
- The Illustrated Transformer
- Exposing biases, moods, personalities, and abstract concepts hidden in large language models
- What are large language models (LLMs)?
- LLM Leaderboard
Use these to learn about model architecture at a high level (tokens, probability, training data), explore hallucinations and bias, and understand differences between major LLM systems.
- Pattern-based Understanding: "The model gives answers because it knows the facts." — LLMs generate text by predicting tokens based on learned patterns, not retrieving facts.
- Hallucination Awareness: "If the model sounds confident, the answer is probably correct." — Confidence in tone does not equal correctness.
- Capability Boundaries: "The model is intelligent, so it should handle any task equally well." — LLMs have specific strengths and weaknesses.
- Data Awareness: "The model always knows the latest information." — LLMs have a knowledge cutoff and no built-in real-time awareness.
- Prompting Sensitivity: "If the output isn't good, the model just isn't capable." — Output quality is directly influenced by prompt clarity and structure.
LLMs — Practice
Hands-On Activity: Pick a real task you would normally use an LLM for (e.g., summarizing documentation, reviewing code). Then:
- Describe how an LLM processes your input at a high level — what is it actually doing? What role does probability play?
- Run the same prompt twice and compare outputs — what changed? What does this tell you about stochastic generation?
- Modify the prompt and observe how output variability changes
Mini Challenge:
"Why did the model hallucinate in this case?"
- Provide three possible causes related to: lack of context, ambiguity in the prompt, and model training limitations
- Propose one mitigation strategy for each cause
- Explain your reasoning clearly as if you were teaching a junior developer
Explain why hallucination is not necessarily a "bug," but a natural consequence of how LLMs generate text.
Real-World Application
Take a real workflow in your current role and analyze it through an LLM lens. Examples: If using AI for documentation, what risks exist regarding accuracy? If using AI for decision support, how do you validate outputs? If using AI for content generation, how do you control hallucination and variability?
Document: where the model could fail, what guardrails you would implement, and how you would explain model limitations to stakeholders.
Ready? You're ready if you can explain how LLMs generate responses, understand why variability and hallucination occur, can diagnose incorrect outputs, and suggest practical mitigation strategies.
Embeddings — What Is It?
Embeddings are about representing meaning as numbers. Instead of treating text as just words or keywords, embeddings convert text into numerical vectors (lists of numbers) that capture semantic meaning. This allows systems to compare ideas based on meaning, not just exact word matches.
For example, "How do I reset my password?" and "I forgot my login credentials" don't share many keywords, but an embedding model converts them into vectors that are close together in mathematical space — because they mean something very similar.
Embeddings make it possible to: perform semantic search, group similar content together, power recommendation systems, enable RAG, and improve classification systems.
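The "close together in mathematical space" idea can be shown with a few hand-made vectors. Real embeddings have hundreds or thousands of dimensions and come from an embedding model; the 4-dimensional vectors below are invented purely to illustrate how cosine similarity compares direction rather than exact words:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means the
    vectors point the same way (similar meaning); near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented vectors standing in for real embedding-model output:
reset_password = [0.9, 0.1, 0.8, 0.0]   # "How do I reset my password?"
forgot_login   = [0.8, 0.2, 0.7, 0.1]   # "I forgot my login credentials"
pizza_recipe   = [0.0, 0.9, 0.1, 0.8]   # "Best pizza dough recipe"

print(cosine_similarity(reset_password, forgot_login))  # high: similar meaning
print(cosine_similarity(reset_password, pizza_recipe))  # low: unrelated topics
```

Note that the two password-related texts score as highly similar despite sharing almost no keywords — exactly the behavior keyword search cannot provide.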
- An embedding is a numerical representation of meaning
- Text is converted into vectors in a high-dimensional space
- Concepts with similar meaning are positioned closer together
- Similarity metrics (like cosine similarity) measure how close meanings are
- Embeddings don't truly "understand" language — they capture statistical patterns
- Retrieval quality depends heavily on the embedding model and how content is chunked
Embeddings — Study Resources
Start Here:
After watching, you should be able to explain: how text is converted into vectors, how semantic similarity is calculated, how embeddings power search and clustering, and the trade-offs between keyword search and semantic search.
Going Deeper
- Getting Started With Embeddings
- Semantic Search with FAISS
- Understanding and Applying Text Embeddings
Use these to understand vector spaces and distance metrics, learn how embeddings are stored/retrieved using vector databases, explore common failure patterns (irrelevant retrieval, embedding mismatch, poor chunking), and evaluate when embeddings are the right tool.
- Confusing embeddings with simple keyword search
- Believing embeddings "understand" language the way humans do
- Not understanding that embeddings are numerical vector representations of meaning
- Not understanding how similarity metrics (e.g., cosine similarity) determine results
Embeddings — Practice
Hands-On Activity: Choose a small set of 10-20 short text samples (e.g., product descriptions, support tickets, Slack messages). Then:
- Generate embeddings for each text (using any embedding API or tool)
- Compare at least three pairs: semantically similar texts, texts that share keywords but differ in meaning, and texts with different wording but the same meaning
- Analyze which pairs have higher similarity and why
- Explain in your own words: what an embedding represents, why they're useful for similarity search, and why they're not a "database of knowledge"
Mini Challenge:
"Our search feature returns irrelevant results even though we're using embeddings."
- List three possible causes (e.g., poor chunking strategy, embedding model mismatch, low-quality input text, missing metadata filtering)
- Propose one corrective action for each cause
- Explain how cosine similarity works at a high level, as if explaining to a product manager
Explain why embeddings do not "understand" meaning, but still capture semantic relationships effectively.
Real-World Application
Think about your current role or a realistic business workflow. Examples: internal knowledge base search, support ticket clustering, duplicate detection, resume-to-job matching, semantic document retrieval.
For one of these: define the problem clearly, explain why embeddings are appropriate (or not), and describe how you would chunk the data, store vectors, compute similarity, and what guardrails you would add.
Be prepared to discuss trade-offs (speed vs. accuracy, chunk size vs. context retention) and failure modes (semantic drift, vague queries, domain mismatch).
Ready? You're ready if you can explain what embeddings are in simple terms, describe common use cases, diagnose typical issues (bad input text, poor chunking, wrong similarity expectations), and explain limitations and trade-offs.
Guardrails — What Is It?
Guardrails refer to the policies, technical controls, and design decisions that limit unsafe or inappropriate behavior in AI systems. At a practical level, guardrails are about:
- Preventing harmful outputs (dangerous instructions, illegal guidance, explicit abuse)
- Reducing misinformation and hallucinated content
- Protecting sensitive data
- Enforcing ethical and organizational standards
- Controlling how AI is used in real workflows
Understanding guardrails means recognizing that AI systems are probabilistic and can generate unintended outputs. A strong AI practitioner does not assume the model will "self-regulate" — they intentionally design workflows that anticipate failure modes and reduce risk.
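An application-level safeguard can be sketched as an explicit output check that runs before a response reaches the user. The patterns and blocked topics below are purely illustrative — production systems use dedicated PII detection, policy engines, and human review, not a handful of regexes:

```python
import re

# Illustrative patterns only; real deployments need far more robust detection.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
BLOCKED_TOPICS = ("legal advice", "medical diagnosis")

def check_output(text):
    """Application-level guardrail: flag risky content instead of trusting
    the model to self-regulate. Returns a list of flags (empty list = pass)."""
    flags = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            flags.append(f"possible {label} exposure")
    for topic in BLOCKED_TOPICS:
        if topic in text.lower():
            flags.append(f"restricted topic: {topic}")
    return flags

print(check_output("Contact jane.doe@example.com about the contract."))
print(check_output("Here is a summary of the meeting."))
```

The design point is the layering: this check sits outside the model, so it still works when the model's built-in safety behavior fails — and a non-empty flag list can route the response to human review rather than silently blocking it.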
- Understand what AI guardrails are and why they are necessary
- Recognize common risk areas: harmful instructions, hallucinated outputs, biased content, sensitive data exposure
- Explain why AI systems can generate unintended or unsafe outputs
- Understand that safety is a shared responsibility between the model, the prompt designer, the application layer, and the organization
Guardrails — Study Resources
Start Here:
After watching, you should be able to explain: why AI systems require guardrails, the difference between built-in model safety and application-level safeguards, common AI risk areas, how prompt constraints reduce unsafe outputs, and the trade-offs between strict guardrails and system usability.
Going Deeper
- Build GuardRails For Your AI with Open-Source
- What are AI guardrails?
- Safe and Reliable AI via Guardrails
Use these to expand your understanding of AI safety principles, learn how safety mechanisms work at different layers (model, prompt, application, governance), study real-world case studies of AI failures, and develop strategies for balancing usability and safety.
- Assuming AI systems are inherently safe and do not require additional safeguards
- Failing to anticipate harmful, misleading, or unintended outputs
- Ignoring sensitive data exposure risks when inputting proprietary or personal information
- Over-relying on automation without implementing validation or human review
Guardrails — Practice
Hands-On Activity: Choose a realistic AI use case (e.g., internal chatbot, code assistant, documentation generator). Then:
- Identify at least 3 potential risks in that system (e.g., harmful output, hallucinated information, sensitive data exposure, bias)
- For each risk, define one prompt-level safeguard, one application-level safeguard, and whether human review is required
- Explain your reasoning: why is this risk realistic? What would happen with no guardrails? What is the trade-off?
Mini Challenge:
"We launched an AI assistant internally. A user asks for legal advice, and the system provides a confident but incorrect answer."
- Identify what guardrails failed (or were missing)
- Propose: a prevention strategy (before output), a detection strategy (after output), and a recovery strategy (after discovery)
- Explain how you would communicate this incident to leadership
Explain how over-restricting the assistant could reduce its usefulness and how you would balance safety and productivity.
Real-World Application
Take something from your own workflow or organization. Examples: AI-generated documentation, resume screening, customer support automation, internal knowledge base chatbot, AI-powered decision support.
For one of these:
- Map the system (input, model, output, user)
- Identify where guardrails should exist (prompt design, output validation, data filtering, access control, human escalation)
- Define what could go wrong, how you would detect it, and how you would recover
Ready? You're ready if you can explain why guardrails are needed, identify likely failure modes in a given use case, propose layered safeguards and justify the trade-offs, and describe how you would detect issues and recover.
Context Windows — What Is It?
A context window defines how much information an AI model can process in a single interaction. Tokens are pieces of text (words, subwords, or characters) that the model reads and uses to generate a response. Every message — system instructions, user input, and previous responses — consumes tokens.
When the total token count exceeds the model's context window:
- Older messages may be truncated
- Important instructions may be dropped
- The model may "forget" earlier parts of the conversation
- Responses may degrade in quality or coherence
Understanding context windows means understanding that memory in LLMs is not persistent — it exists only within the current token window. Longer conversations increase token consumption, and prompts, instructions, and examples all compete for limited space.
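The budgeting described above can be sketched with a rough token estimate. The chars-per-token heuristic and window sizes below are assumptions for illustration — real counts come from the model's own tokenizer and real limits vary by model:

```python
def estimate_tokens(text):
    """Rough heuristic: roughly 4 characters per token for English text.
    A real tokenizer will give different (and authoritative) counts."""
    return max(1, len(text) // 4)

def fits_in_window(messages, window=8000, reserved_for_output=1000):
    """Every message competes for the same budget, and the model's own
    response consumes tokens too — so some of the window must be reserved."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserved_for_output <= window, used

conversation = [
    "System: You are a helpful support assistant.",
    "User: How do I reset my password?",
    "Assistant: Click 'Forgot password' on the login page...",
]
ok, used = fits_in_window(conversation)
print(f"estimated input tokens: {used}, fits: {ok}")
```

Running this on a long conversation makes the core trade-off visible: once `used + reserved_for_output` exceeds the window, something must be dropped, summarized, or split.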
- Understand what tokens are and how text is converted into tokens
- Understand that context windows have size limits measured in tokens
- Recognize that all input and output tokens count toward that limit
- Know that when the context window is exceeded, older information may be truncated or lost
- Explain why models may "forget" earlier parts of a long conversation
- Understand that context limits affect both quality and cost
Context Windows — Study Resources
Start Here:
After watching, you should be able to explain: what a context window is, what tokens are and how they relate to text length, why both input and output consume tokens, and how exceeding context limits degrades quality.
Going Deeper
- Advanced Prompt Engineering and Memory Management
- Thinking in Tokens: A Practical Guide to Context Engineering
- Top techniques to Manage Context Lengths in LLMs
Use these to expand your understanding of tokens and context limits, learn how context size impacts quality, cost, and reliability, explore why long conversations deteriorate over time, and understand trade-offs between keeping more history vs. staying focused.
- Believing LLMs have unlimited memory
- No awareness of token limits or how they affect behavior
- Feeding entire documents or long histories without managing context
- Misinterpreting degraded responses as intelligence issues instead of context overflow
Context Windows — Practice
Hands-On Activity: Choose one of the following: a long document (10+ pages), a long chat conversation, or a multi-step reasoning prompt. Then:
- Estimate how many tokens the input might consume
- Identify what parts are truly necessary for the task
- Reduce the context intentionally (summarize, remove redundancy, split into chunks)
- Compare the outputs: full raw input vs. context-managed version. Did quality or coherence improve?
Mini Challenge:
"A chatbot performs well for the first 10 messages, but after 40 exchanges it starts ignoring earlier instructions and contradicting itself."
- Diagnose what is likely happening
- Explain what a context window is in this scenario
- Propose at least three solutions (e.g., conversation summarization, memory pruning, retrieval-based context injection, system instruction reinforcement)
Explain how cost and latency are affected as context grows.
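One of the candidate fixes, memory pruning, can be sketched as a sliding window that pins the system instructions and keeps only the most recent turns that fit the budget. The token estimate and budget here are illustrative assumptions, not real tokenizer counts:

```python
def prune_history(system_prompt, turns, budget=2000,
                  estimate=lambda t: max(1, len(t) // 4)):
    """Keep the system prompt pinned and as many of the newest turns as fit
    the token budget; drop the oldest. Token counts use a rough chars/4
    heuristic — swap in a real tokenizer for production use."""
    remaining = budget - estimate(system_prompt)
    kept = []
    for turn in reversed(turns):          # walk newest-first
        cost = estimate(turn)
        if cost > remaining:
            break                         # oldest turns beyond here are dropped
        kept.append(turn)
        remaining -= cost
    return [system_prompt] + list(reversed(kept))

turns = [f"Turn {i}: " + "x" * 400 for i in range(40)]  # ~100 tokens each
context = prune_history("System: always answer in English.", turns)
print(f"kept {len(context) - 1} of {len(turns)} turns")
```

Pinning the system prompt is the key design choice: it directly addresses the failure in the challenge, where instructions silently fell out of the window as the conversation grew.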
Real-World Application
Think about a realistic AI system. Examples: customer support chatbot, internal knowledge base assistant, legal document analyzer, code review assistant.
For one of these:
- Map how context flows through the system (what is sent, how often, how large)
- Identify risks (context overflow, instruction loss, increased latency, rising token cost)
- Design a context management strategy (chunking, summarization, retrieval, selective memory retention)
Ready? You're ready if you can explain what a context window is and how token limits affect behavior, diagnose when poor output is caused by context overflow, and intentionally manage context using practical strategies.
RAG — What Is It?
RAG (Retrieval-Augmented Generation) is a method that improves AI responses by giving the model access to external information. Instead of relying only on training data, RAG retrieves relevant data from a knowledge source and uses it to generate a more accurate, grounded answer.
The flow works like this:
- A user submits a question
- The system searches an external knowledge base for relevant information
- The most relevant pieces of content are retrieved
- That retrieved context is injected into the prompt
- The LLM generates a grounded response based on that augmented prompt
The key idea is grounding. Without retrieval, an LLM may hallucinate or provide outdated information. With RAG, the model is guided by real, current, and domain-specific data.
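The retrieve-then-generate flow can be sketched end to end. To keep the example self-contained, retrieval below scores documents by shared words instead of real embeddings, and the generation step is stubbed out — in a real pipeline you would embed the chunks, query a vector store, and send the augmented prompt to an LLM:

```python
# Toy knowledge base standing in for chunked company documents.
KNOWLEDGE_BASE = [
    "Employees accrue 20 vacation days per year.",
    "Passwords must be rotated every 90 days.",
    "Expense reports are due by the 5th of each month.",
]

def retrieve(question, docs, top_k=1):
    """Rank documents by word overlap with the question.
    A real system would rank by embedding similarity instead."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def answer(question):
    context = retrieve(question, KNOWLEDGE_BASE)
    # Context injection: the retrieved text goes into the prompt so the
    # model answers from it rather than from training data alone.
    prompt = (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say 'I don't know'.\n"
        f"Context: {' '.join(context)}\n"
        f"Question: {question}"
    )
    return prompt  # a real system would now send this prompt to an LLM

print(answer("How many vacation days do employees get?"))
```

The "say 'I don't know'" instruction is itself a grounding guardrail: it tells the model what to do when retrieval comes back empty-handed, which is one of the failure modes discussed in this section.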
- Understand the retrieve-then-generate pattern
- Clearly explain how retrieval happens before generation
- Understand that RAG connects LLMs to external or private data sources
- Recognize that LLMs alone rely only on training data, while RAG provides fresh and domain-specific information
- Explain why RAG reduces hallucination (by grounding in retrieved context)
- Understand that RAG does not eliminate hallucination completely — it reduces risk when retrieval is done correctly
RAG — Study Resources
Start Here:
- Learn RAG From Scratch — Python AI Tutorial from a LangChain Engineer
- Complete RAG Crash Course With Langchain In 2 Hours
After watching, you should be able to explain: the retrieve-then-generate pattern, why LLMs alone can hallucinate or provide outdated information, the basic components of a RAG pipeline (chunking, embeddings, vector similarity search, context injection, generation), and why retrieval quality directly affects answer quality.
Going Deeper
- Retrieval Augmented Generation (RAG) Course
- Building a No-Code Chatbot with Docker and RAG: A Comprehensive Guide
- Build a RAG agent with LangChain
Use these to learn how retrieval works in practice, understand chunking strategies and their impact on quality, explore common failure modes (irrelevant retrieval, missing key context, context overload, grounded but wrong answers from bad sources), and understand trade-offs between precision vs. recall, latency vs. depth, and simplicity vs. complexity.
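One chunking strategy mentioned above, fixed-size windows with overlap, can be sketched briefly. Chunking by character count is the simplest option and the sizes below are illustrative — real systems often chunk by tokens, sentences, or document structure instead:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows. The overlap means a
    sentence that straddles a chunk boundary stays retrievable from either
    side — the trade-off is storing some text twice."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "A" * 500  # stand-in for a real document
pieces = chunk_text(doc)
print(len(pieces), [len(p) for p in pieces])
```

Chunk size is one of the trade-offs this section asks you to reason about: smaller chunks retrieve more precisely but lose surrounding context, while larger chunks keep context but dilute the similarity signal.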
- Confusing RAG with fine-tuning or simple prompting
- No concept of how to ground LLM responses in real data
- Not understanding how retrieval quality directly impacts generation quality
- Treating hallucination as purely a generation issue instead of sometimes a retrieval failure
RAG — Practice
Hands-On Activity: Pick a realistic use case (e.g., "internal policy assistant" or "customer support FAQ bot"). Describe the RAG flow in your own words:
- What kind of data would be stored in the knowledge source?
- How would the system decide what information to retrieve for a user question?
- What gets inserted into the prompt, and why?
- What should the model do if the retrieved context does not contain the answer?
- Explain why this approach is better than "just prompting the LLM" for this use case.
Mini Challenge:
"A user asks a question. The RAG system retrieves content that is relevant, but the final answer is still wrong."
- Give three possible reasons (retrieval returned the wrong chunk, retrieval returned the right chunk but it lacked a key detail, or the model misinterpreted or ignored the context)
- For each reason, propose one fix (no code, just explain the change)
- Explain how you would tell whether the problem is retrieval or generation
Explain why RAG reduces hallucination risk but doesn't eliminate it.
Real-World Application
Choose one real workflow you have seen (or could imagine at work). Examples: "Answer questions about internal documentation," "Summarize HR policies accurately," "Support ticket assistant that cites sources."
Describe a simple design plan:
- What are the top 2-3 types of questions users will ask?
- What data sources would you ground answers in?
- What would you retrieve (and how much) to stay within context limits?
- What safety/quality checks would you add (e.g., "cite sources," "say 'I don't know' when context is missing")?
Ready? You're ready if you can explain the retrieve-then-generate pattern, diagnose failures conceptually, and describe how you would ground answers in real data.
Evaluation — What Is It?
Evaluation is how we measure whether an AI system is actually working. It answers questions like: Is the output correct? Is it relevant? Is it grounded in the right data? Is it safe? Is it consistent? Is it useful for the intended user?
Evaluation is not just about checking correctness. It is about detecting failure patterns, identifying hallucination, comparing approaches, measuring trade-offs, and improving system reliability. Without evaluation, teams rely on intuition. With evaluation, teams rely on evidence.
Evaluation takes different forms:
- Quantitative Metrics — Accuracy, precision/recall, similarity scores, latency, error rates. Useful when tasks have clear expected outcomes.
- Benchmarks — Predefined datasets or scenarios used to compare performance across model versions, prompt variations, RAG pipeline changes, or guardrail adjustments.
- Human Evaluation — Assessing clarity, helpfulness, tone, logical reasoning, safety, and appropriateness. Essential when subjective judgment matters.
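Quantitative evaluation can be sketched as a set of named predicates run over a test set, yielding per-criterion pass rates. The criteria and test cases below are invented for illustration — the point is the shape: define "good" as checkable rules before measuring, then measure:

```python
# Each criterion is a named check; "good" is defined before measuring.
# These criteria and test cases are illustrative, not a standard suite.
CRITERIA = {
    "cites a source": lambda out: "[source:" in out.lower(),
    "no unsupported certainty": lambda out: "definitely" not in out.lower(),
    "within length limit": lambda out: len(out.split()) <= 60,
}

TEST_CASES = [
    "Our refund policy allows returns within 30 days [source: policy.md].",
    "You will definitely win the case.",
    "Resetting the password takes five minutes [source: faq.md].",
]

def evaluate(outputs, criteria):
    """Return per-criterion pass rates: evidence instead of 'feels better'."""
    return {
        name: sum(check(o) for o in outputs) / len(outputs)
        for name, check in criteria.items()
    }

scores = evaluate(TEST_CASES, CRITERIA)
for name, rate in scores.items():
    print(f"{name}: {rate:.0%}")
```

Running the same loop over Version A and Version B outputs turns "the responses feel better now" into a concrete comparison of pass rates — and rerunning it over time is a simple regression check.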
- Understand that AI systems must be evaluated intentionally — not assumed to be correct
- Explain why "it looks good to me" is not a reliable evaluation method
- Recognize different types of evaluation: quantitative metrics, structured test cases, human review
- Understand that evaluation criteria must align with the system's purpose
- Recognize that improving prompts, RAG pipelines, or guardrails requires measurable feedback
Evaluation — Study Resources
Start Here:
After watching, you should be able to explain: why evaluation is necessary for any AI system, the difference between quantitative metrics and human evaluation, why benchmarks and structured test cases matter, how to define what "good" means before measuring performance, and how evaluation supports iteration and continuous improvement.
Going Deeper
- Best Practices and Methods for LLM Evaluation
- LLM evaluation: a beginner's guide
- LLM Evaluation with Opik
Use these to expand your understanding of evaluation methods, learn how to design test sets that reflect real user behavior, study common evaluation pitfalls (overfitting to a small test set, measuring the wrong thing, using metrics that don't match the use case), and understand trade-offs between evaluation depth vs. time cost, automated scoring vs. human judgment, and speed vs. reliability.
- Assuming AI outputs are correct by default
- No concept of how to measure AI quality systematically
- Relying on subjective judgment instead of defined criteria
- Measuring the wrong metrics for the intended use case
Evaluation — Practice
Hands-On Activity: Choose a simple AI use case (e.g., FAQ assistant, document summarizer, resume screener). Then:
- Define what "good output" means for this system (accuracy? relevance? tone? speed? grounded answers?)
- Define 3 measurable criteria for success (e.g., "Answer must reference source," "Summary must include key points," "Response must not introduce unsupported claims")
- Describe how you would compare Version A (current system) vs. Version B (improved prompt or pipeline) — use criteria, not intuition
Mini Challenge:
"A team updates a prompt and says, 'The responses feel better now.'"
- Explain why this is not a sufficient evaluation
- Propose a simple evaluation plan: What would you measure? How many test cases? Would human review be required?
- Describe how you would detect regression over time
Explain the difference between evaluating a single output and evaluating system performance consistently.
Real-World Application
Think of a real workflow in your organization that uses AI (or could use AI).
For that workflow:
- Define the objective clearly
- Identify what failure looks like and what success looks like
- Design a simple evaluation loop: How will outputs be reviewed? How often? By whom? What metrics or criteria will be tracked?
Ready? You're ready if you can define what "good" means for a specific AI use case, propose measurable criteria instead of relying on intuition, compare two versions using structured reasoning, and understand that improvement requires evidence, not opinions.