We built a 12-week AI literacy curriculum for our coaching engagements. We’re publishing the structure here because most SMBs don’t need to buy training — they need a curriculum and a willing internal champion.
The full structure below is the same one we run in our AI & Leadership Training cohorts. The key insight is that structure beats content quality — a mediocre curriculum run with cohort discipline produces better outcomes than a brilliant curriculum delivered as videos.
Who This Is For
This curriculum is designed to be run by an SMB with:
- An internal champion (usually a senior engineer, EM, or technical product manager) willing to spend 4-6 hours/week facilitating
- 6-15 participants — non-technical and lightly technical managers in roles where AI workflows could plausibly help (sales ops, marketing ops, customer success, finance, HR)
- Buy-in from leadership for the time commitment (~4 hours/participant/week)
- A working organizational chat tool for async pair-work and a reasonable docs platform for assignments
If you don’t have all four, run a smaller version (e.g., 4 participants, 6 weeks) or hire an external program. Half-implementing the curriculum is worse than not running it.
The 12-Week Structure
Weeks 1-2: Production Prompt Engineering
Learning goal: Move from “I can write a prompt that works” to “I can write a prompt that works reliably.”
Topics:
- Structured outputs (JSON schemas, response formatting, system messages vs user messages)
- Prompt versioning — treating prompts as code, not text
- Few-shot vs zero-shot — when each is right
- A simple eval harness: 20 golden inputs, expected outputs, regression-style testing of prompt changes
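To make the structured-outputs topic concrete, here is a minimal sketch of checking a model response against an expected JSON shape before anything downstream consumes it. It uses only the Python standard library. The field names `category` and `summary` are hypothetical, chosen for illustration; a real workflow would define its own fields or use a full JSON Schema validator.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"category": str, "summary": str}


def parse_structured_output(raw: str) -> dict:
    """Parse a model response as JSON and check it has the expected fields.

    Raises ValueError if the response is not valid JSON, is not a JSON
    object, is missing a required field, or has a field of the wrong type.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(
                f"field {field!r} should be of type {expected_type.__name__}"
            )
    return data
```

Rejecting a malformed response loudly, rather than passing it along, is what lets a later eval case count it as a failure instead of a silent data bug.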
Weekly exercise: Pick one task you do manually that involves text. Write a prompt that automates it. Build a 10-input eval set. Iterate the prompt until it passes 90%+ of the eval cases.
Anti-pattern to flag: Prompt-engineering as art. The whole point of moving past “I can write a prompt that works” is to make prompts reproducible and testable.
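A minimal sketch of what "reproducible and testable" can look like in practice: a golden set of input and expected-output pairs, and a function that reports the pass rate. The `fake_task` function is a stand-in assumption for "prompt plus model call", and the golden cases are invented for illustration; exact-match scoring is the simplest choice and will not suit every task.

```python
from typing import Callable

# Each golden case pairs an input with the exact output we expect.
GoldenCase = tuple[str, str]


def run_eval(task: Callable[[str], str], golden: list[GoldenCase]) -> float:
    """Return the fraction of golden cases where task(input) equals expected."""
    if not golden:
        raise ValueError("golden set is empty")
    passed = sum(1 for text, expected in golden if task(text).strip() == expected)
    return passed / len(golden)


# Stand-in for "prompt + model call"; a real harness would call an LLM here.
def fake_task(text: str) -> str:
    return "refund" if "money back" in text.lower() else "other"


golden_set = [
    ("I want my money back", "refund"),
    ("Where is my order?", "other"),
    ("Money back please", "refund"),
    ("Please refund me", "refund"),  # the stand-in gets this one wrong
]

pass_rate = run_eval(fake_task, golden_set)
print(f"pass rate: {pass_rate:.0%}")
```

Re-running the same golden set after every prompt change turns "it feels better" into a number that can be compared across prompt versions.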
Weeks 3-4: Retrieval Architecture (RAG)
Learning goal: Understand what retrieval-augmented generation actually does, why naive implementations fail, and how to evaluate retrieval quality independently of generation quality.
Topics:
- Vector vs lexical search vs hybrid
- Chunking strategies: by token count, by semantic boundary, by document structure
- Retrieval evaluation: precision-at-k, recall-at-k, MRR
- The most common naive-RAG failure modes (low chunk diversity, embedding model mismatch, poor retrieval recall masking generation issues)
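As an illustration of the simplest chunking strategy in the list above (by token count), here is a sketch that splits text into fixed-size chunks with optional overlap. It treats whitespace-separated words as tokens, which is an assumption made to keep the example dependency-free; a real pipeline would count tokens with the tokenizer of its embedding model.

```python
def chunk_by_tokens(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace tokens.

    Consecutive chunks share `overlap` tokens so that a sentence cut at a
    chunk boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Chunking by semantic boundary or document structure replaces the fixed `step` with splits at sentence, paragraph, or heading boundaries.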
Weekly exercise: Set up a basic RAG over your team’s internal docs. Build a 30-question eval set. Measure retrieval quality independently of answer quality.
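The retrieval metrics named above can be computed without touching the generation step at all. The sketch below assumes each eval question comes with a ranked list of retrieved document IDs and a hand-labeled set of relevant IDs; the function names are illustrative, not taken from any particular library.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by a relevant document.

    Divides by k even if fewer than k documents were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per question."""
    if not runs:
        raise ValueError("no eval questions given")
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)
```

For a 30-question eval set, `runs` would hold 30 pairs, and a low recall-at-k points at chunking or embedding problems before any answer quality is judged.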
Weeks 5-7: Agentic Workflows
Learning goal: Know when agents are the right pattern and when they aren’t. Design tool-use that doesn’t spiral.
Topics:
- Decision tree: when to use a single LLM call, when to use chains, when to use agents
- Tool-use design — granularity, naming, error handling
- Orchestration patterns (linear, branching, with-human-in-the-loop)
- Fallback paths and kill-switches
- Cost ceilings: per-task, per-day, per-tenant
Weekly exercise: Convert one of your workflows from weeks 1-2 into an agentic version with 2-3 tools. Identify three failure modes and implement a fallback for each.
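One way to combine a per-task cost ceiling, a kill-switch, and a fallback path is sketched below. All names (`CostGuard`, `BudgetExceeded`, `run_with_fallback`) and the dollar amounts are hypothetical, and the agent step is a stand-in that simulates a loop that never finishes; a real agent would charge the guard with the actual cost of each model or tool call.

```python
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Raised when a charge would push a task past its cost ceiling."""


class CostGuard:
    """Tracks spend for a single task and rejects charges over the ceiling."""

    def __init__(self, ceiling_usd: float) -> None:
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        if self.spent_usd + amount_usd > self.ceiling_usd:
            raise BudgetExceeded(
                f"charge of ${amount_usd:.2f} would exceed ${self.ceiling_usd:.2f}"
            )
        self.spent_usd += amount_usd


def run_with_fallback(
    step: Callable[[CostGuard], Optional[str]],
    fallback: Callable[[], str],
    guard: CostGuard,
    max_steps: int,
) -> str:
    """Run agent steps until one returns a result.

    Falls back if the budget is exhausted or max_steps is reached.
    """
    try:
        for _ in range(max_steps):
            result = step(guard)
            if result is not None:
                return result
    except BudgetExceeded:
        pass  # kill-switch tripped: stop the agent and use the fallback
    return fallback()


# Stand-in agent step: pays for one tool call and never finishes.
def looping_step(guard: CostGuard) -> Optional[str]:
    guard.charge(0.04)
    return None


guard = CostGuard(ceiling_usd=0.10)
answer = run_with_fallback(looping_step, lambda: "escalate to a human", guard, max_steps=10)
print(answer)
```

Per-day and per-tenant ceilings can follow the same pattern, with the guard keyed by date or tenant ID instead of created fresh for each task.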
Required reading: Two case studies from our coaching alumni — one workflow that worked (with metrics) and one we killed (with the kill criteria that triggered).
Weeks 8-9: Evaluation Harnesses
Learning goal: Build an eval harness that scales beyond the demo and runs in CI.
Topics:
- Golden sets vs synthetic sets vs production-traffic replay
- Eval-as-CI: regression detection on every prompt change
- Offline (curated set) vs online (live A/B) evaluation
- Accuracy metrics (BLEU, exact match, embedding similarity) vs outcome metrics (task completion, time-to-resolution, business KPI)
Weekly exercise: Take the workflow from weeks 5-7 and wire an eval harness for it into CI. Demonstrate that a deliberate prompt regression triggers a CI failure.
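A sketch of the regression gate this exercise asks for: score a candidate prompt against a golden set and return a non-zero exit code if the pass rate drops below the recorded baseline. The two prompt functions are stand-ins for real model calls, and the golden cases, names, and baseline value are assumptions for illustration.

```python
from typing import Callable

# Hypothetical golden set: (input text, expected label).
GOLDEN = [
    ("Need this ASAP", "urgent"),
    ("urgent: server down", "urgent"),
    ("no rush on this", "normal"),
]


def pass_rate(task: Callable[[str], str], golden: list[tuple[str, str]]) -> float:
    """Fraction of golden cases the task answers exactly right."""
    return sum(1 for text, expected in golden if task(text) == expected) / len(golden)


def regression_gate(
    candidate: Callable[[str], str],
    golden: list[tuple[str, str]],
    baseline_rate: float,
    tolerance: float = 0.0,
) -> int:
    """Return 0 if the candidate holds the baseline pass rate, 1 otherwise."""
    rate = pass_rate(candidate, golden)
    if rate + tolerance < baseline_rate:
        print(f"FAIL: pass rate {rate:.0%} is below baseline {baseline_rate:.0%}")
        return 1
    print(f"OK: pass rate {rate:.0%}")
    return 0


# Stand-in for the current prompt version.
def current_prompt(text: str) -> str:
    lowered = text.lower()
    return "urgent" if "asap" in lowered or "urgent" in lowered else "normal"


# Stand-in for a deliberately regressed prompt that no longer handles "ASAP".
def regressed_prompt(text: str) -> str:
    return "urgent" if "urgent" in text.lower() else "normal"
```

In a CI job, the script would end with `sys.exit(regression_gate(...))` so that a return value of 1 fails the build.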
Weeks 10-11: Governance and Compliance
Learning goal: Build production AI workflows that survive a SOC 2 / ISO 27001 / EU AI Act audit.
Topics:
- PII detection and redaction in prompts
- Prompt-injection defense (input filtering, output validation)
- Audit logging: what to log, what NOT to log, retention policy
- EU AI Act risk classifications and which apply to typical SMB workflows
- SOC 2 controls relevant to AI-touching systems
Weekly exercise: Run a tabletop audit on the workflow from weeks 5-7. Identify anything that would fail a check against the EU AI Act's limited-risk obligations, then fix it.
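For the PII redaction topic, a minimal regex-based sketch is shown below. It only covers email addresses and phone-like digit runs, and the patterns are deliberately simple assumptions: they will miss some formats and occasionally over-match, so a production workflow would more likely pair them with a dedicated PII detection tool.

```python
import re

# Simple illustrative patterns; real-world PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


prompt = "Contact jan@example.com or +48 600 700 800."
print(redact_pii(prompt))
```

Running the redaction before the prompt is sent, and again before anything is written to the audit log, keeps raw identifiers out of both the model provider's systems and your own log retention.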
Week 12: Capstone
Learning goal: Ship one production-grade workflow with measurable outcomes.
Each participant presents to leadership:
- Their workflow
- The eval harness
- The cost-per-task projection
- The governance posture
- A 90-day measurement plan
The program's executive sponsor should attend the capstone presentations. That visible commitment is the single biggest factor in whether the workflows survive the next 6 months.
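The cost-per-task projection in the capstone list is mostly arithmetic over token counts and per-token prices. A sketch follows; every number in it (token counts, prices per million tokens, calls per task, task volume) is a made-up assumption for illustration, not a real vendor price.

```python
def cost_per_task(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    calls_per_task: int = 1,
) -> float:
    """Estimated USD cost of one task, given average tokens per model call.

    Prices are expressed in USD per million tokens.
    """
    per_call = (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return per_call * calls_per_task


def monthly_projection(task_cost: float, tasks_per_day: int, working_days: int = 22) -> float:
    """Projected monthly spend for a given daily task volume."""
    return task_cost * tasks_per_day * working_days


# Hypothetical workload: 4 model calls per task, 2,000 input and 500 output tokens each.
task_cost = cost_per_task(2_000, 500, 3.0, 15.0, calls_per_task=4)
monthly = monthly_projection(task_cost, tasks_per_day=200)
print(f"cost per task: ${task_cost:.4f}, projected monthly: ${monthly:.2f}")
```

Agentic workflows make several model calls per task, which is why `calls_per_task` is a separate parameter: it is usually the number that surprises people in the projection.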
What to Skip From This Curriculum
If 12 weeks is too long, here’s the priority order for cuts:
- Skip last: Weeks 10-11 (governance). Skipping this invites compliance incidents, which are the easiest way to turn the entire investment into a net loss.
- Skip second-last: Weeks 8-9 (evaluation). Without evaluation, you cannot tell whether a workflow is improving over time.
- Skip first if necessary: Weeks 5-7 (agents). Many SMB use cases work fine with non-agentic single-call patterns. You can always run agents in a follow-up program.
Common Failure Modes
In our coaching engagements we see four recurring patterns:
- The capstone gets cancelled. Schedule pressure mounts; the capstone gets postponed; participants don’t ship; the program quietly fails. Pre-commit the capstone date to leadership in week 1.
- Pair-work doesn’t happen. The async exercise pairing is essential because solo learning compounds more slowly. Track participation, and reach out in week 3 if pair sessions aren’t happening.
- Senior managers exempt themselves. “I’ll let my team go through the program; I’ll skim the materials.” This undermines the program’s authority. If senior people don’t participate, junior participants conclude the program isn’t real.
- No measurement plan beyond the capstone. Without a 90-day post-program measurement, you have no idea whether the workflows survived contact with reality. Make the 90-day check-in mandatory.
Materials and Templates
We maintain a repository of materials we use in our coached cohorts:
- Weekly assignment template
- Eval harness starter (Python and Node)
- Capstone presentation template
- 90-day measurement template
- Participant handbook
Related Reading
- AI Training Programs for Non-Technical SMB Managers (2026) — if you’d rather buy than build.
- Claude vs ChatGPT for SMB Operations — the vendor module of week 1.
- Pro Bono AI Consulting — if you’re a Polish non-profit or education organization, ask about our pro bono cohort.