Adult-son mode: Antoine is 25, so never child pacing. SKILL.md principle 10: 'treating him as a child was the worst regression of the old skill.'
Bar we are happy with
Grown-up pacing and interests, loose tempo, pubs/food/history/roots, zero child-led structure (no '2-3 anchors max, rest/snacks/toilet breaks').
Real output
Adult tempo throughout: 'The pace is loose. The point is to notice things.' Pub lunch, real ales, Turkish dinner corridor, architecture. No child pacing anywhere.
Other modes (solo, +Hava, family, adult+child, general)
PARTIAL
What I wanted
Distinct biases per mode: Boyd solo scenic/photo/hidden, +Hava couple rhythm, adult+child child-led ONLY when a real child is confirmed, general mode no personal leakage.
Bar we are happy with
Each mode produces a visibly different plan shape; general mode shows zero Boyd/Hava/Antoine leakage.
Real output
Boyd solo guide ('The Ghost Line to Golden Hour') now live. DOM-verified: 21 images, 0 em-dashes, golden-hour engine (21:10 sunset, 20:15 golden hour, 22:08 civil twilight), solo language throughout ('Boyd, Solo', 'Made for one person', 'Nobody else's pace matters today'), green/forest palette (accent #8aa36b) visibly distinct from Antoine amber/gold (#c8a97a), hidden/unusual bias (Spriggan, abandoned platforms, Parkland Walk ghost railway), cheap food spine (Dunns sausage roll, roast pork bap), pubs as route rewards (The Flask, Southampton Arms). Plan shape is clearly solo-biased vs Antoine adult-son guide. Partial: +Hava and general-traveller modes still not exercised.
SKILL.md pre-plan gates: ask 'Where are you based / coming from?' and 'Where are you sleeping?' BEFORE building any itinerary. 'Do NOT skip these and then hand-wave routing.' Register §11 names this an explicit live failure.
Bar we are happy with
Origin and base either established up front, OR explicitly labelled 'assumed' / 'origin unknown, can't plan on-the-way stops' in the guide.
Real output
Styled 'Planning assumptions' box near top: base/origin ASSUMED Finsbury Park, with 'If you are sleeping somewhere else, in central London, near Antoine's place... tell me and I will re-route.' Tagged logistics.
Mood-first discovery, 2-3 vivid option cards before interrogation, 'questionnaire goblin' banned, one-clarifier rule when mode ambiguous.
Bar we are happy with
A real discovery turn: emotional job asked first, 2-3 starter cards offered before deep planning.
Real output
The artifact is a finished guide, not the discovery conversation that preceded it.
Explicit decision-variable model
PASS
What I wanted
BB verbal (session 20260602_011336, msg 38885): 'what are the variables that determine how this consultant knows which path to choose. there must be a bunch of variables.' The pipeline accepted this and flagged 'New Stage 6 artifact needed: a decision-model.md reference file' (traveller mode, mood, energy, novelty appetite, friction tolerance, food/photo/nature priority, budget, weather, daylight, transport, base, crowd tolerance, booking lead, hard-nos). Never built.
Bar we are happy with
A decision-model.md reference exists mapping the variables to plan-shape decisions, and the model demonstrably routes a plan (e.g. high heat + photo priority => golden-hour route).
Real output
2026-06-10: references/decision-model.md written: who/state/priorities/conditions/logistics variables mapped to plan-shape decisions, conflict rules (hard no-gos > logistics reality > mode defaults > priorities; collide => sequence, don't average), SKILL.md pointer added. Demonstrably routes the live guide: photo priority + June daylight => the Chasing the Light eat-early re-sequence; base unknown => labelled assumption + re-route offer; worked example in the file maps each variable to the live guide.
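The conflict order quoted in this row can be sketched as a small resolver. This is an illustration only, not the contents of references/decision-model.md: the tier order comes from the row above, while `Constraint`, `resolve`, the `slot` field, and the sample constraints in the usage note are hypothetical, and reading "sequence, don't average" as "the loser moves to another slot" is an assumption.

```python
from dataclasses import dataclass

# Tier order as quoted in the row: a lower number wins a direct conflict.
TIERS = {"hard_no_go": 0, "logistics": 1, "mode_default": 2, "priority": 3}


@dataclass(frozen=True)
class Constraint:
    name: str
    tier: str  # one of the keys in TIERS
    slot: str  # hypothetical: the part of the day this constraint wants


def resolve(constraints):
    """Return (kept, sequenced).

    When two constraints want the same slot, the higher tier keeps it.
    The loser is neither dropped nor blended into a compromise: it is
    queued to be scheduled in a different slot.
    """
    kept, sequenced = {}, []
    # sorted() is stable, so same-tier constraints keep their input order.
    for c in sorted(constraints, key=lambda c: TIERS[c.tier]):
        if c.slot in kept:
            sequenced.append(c)  # slot already held by an equal or higher tier
        else:
            kept[c.slot] = c
    return kept, sequenced
```

For example, if a daylight-bound viewpoint is modelled as a `logistics` constraint and dinner as a `mode_default`, both wanting the evening slot, the viewpoint keeps the slot and dinner is returned in `sequenced` to be placed earlier, which is the shape of the eat-early re-sequence described above.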
A run of rounds with scored cards, a clear target profile, and a save-gate before any vault write.
Real output
A day-guide does not run Taste Deck.
Image/thumbnail layer on Taste Deck cards
NOT EXERCISED
What I wanted
Register §3: cards are text-only, 'should show a representative image per card' (flagged missing Stage 5).
Bar we are happy with
Each Taste Deck card carries a representative image.
Real output
No Taste Deck in this artifact.
4. Personal profiles (vault)
Read vault profile at start of personal-mode work
NOT EXERCISED
What I wanted
Read 60-Resources/Travel-Profiles/ profile at start of Boyd/Hava/Antoine work; quote-file-why-approve save gate; post-trip feedback ritual.
Bar we are happy with
Evidence the relevant profile was read and used quietly, plus a save-gate on any update.
Real output
Profile read/write happens in the planning loop, not visible in a static guide. Profiles are also still stubs (register §4).
Heat/shade tolerance preference capture
NOT EXERCISED
What I wanted
Register §4 roadmap: heat/shade tolerance (27-29C+ trigger) defined in pipeline, belongs in Boyd.md, not yet captured.
Bar we are happy with
Boyd.md carries a heat/shade tolerance field used when routing in hot weather.
Real output
Not buildable from a UK day-guide; needs profile work.
5. Research
NotebookLM default for serious new destinations
NOT EXERCISED
What I wanted
NotebookLM by default for serious new-destination planning (synthesis), official sources for truth; 10-50 curated source set; light-ask carve-out.
Bar we are happy with
For an unfamiliar multi-day destination: a built source set and NotebookLM synthesis, verified separately against official sources.
Real output
London is a known deep-knowledge market and this is a light same-area ask, so NotebookLM correctly not triggered. Cannot judge the research engine from this artifact.
Cheap-subagent image search
NOT EXERCISED
What I wanted
Register §5 roadmap: 'use cheap subagent/model for image search', never wired.
Bar we are happy with
An image-search step that curates real, location-matched images via a cheap model.
Real output
Not wired; images in this guide are hotlinked stock, not curated by a search step.
Local-language forum / source scan
NOT EXERCISED
What I wanted
BB verbal (session 20260602_011336, msg 38885): 'it can read local forums in that language... whereas maybe all the English sites that mention a place is showing that it's open, if you go into the Thai forums for that... they're all saying everything's closed.' Read local-language forums/blogs/notices, translate, summarise, label confidence, verify against official sources. Register §5 names it but 'weakly enforced'; never had a compliance row.
Bar we are happy with
For a non-English destination, the guide surfaces at least one local-language source insight (closure, scam, better entrance, seasonal note) the English web missed, translated and confidence-labelled.
Real output
London is English-language; no local-language scan needed for this artifact. Needs a foreign-language destination guide to judge.
6. Storytelling & voice
Private-guide voice, anti-beige
PASS
What I wanted
Sell the experience before scheduling; private-guide tone; anti-beige, anti-Wikipedia, anti-top-10; opinionated, rank don't dump.
Bar we are happy with
Vivid, opinionated, emotionally-led prose that reads like a private guide, not an itinerary bot.
Real output
Strong throughout: 'This isn't sightseeing. This is a homecoming.' / 'It should not work. It works.' Opinionated, warm, no top-10 dump.
Register §7 'biggest live failure'. SKILL.md hard rule: every recommendation names a specific place with a reason. BANNED: 'find a table somewhere', 'grab lunch around here'.
Bar we are happy with
Every food/cafe/pub/shop is a named venue with a reason; zero 'somewhere around here' gestures.
Real output
Zero hand-waves (grep verified). 'Selale, Gokyuzu... lamb shish or a mixed grill.' Every rec names a place + reason.
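The row says the result was grep verified but does not record the command. A minimal sketch of an equivalent check, assuming the guide text is available as a string: the two phrases are the ones quoted as BANNED above, and `find_hand_waves` is a hypothetical helper name.

```python
import re

# The two phrases quoted as BANNED in this row; extend the list as needed.
BANNED = ["find a table somewhere", "grab lunch around here"]


def find_hand_waves(text, banned=BANNED):
    """Return (line_number, phrase) for each banned phrase found, ignoring case."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in banned:
            if re.search(re.escape(phrase), line, flags=re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits
```

An empty list corresponds to the "zero hand-waves" result. A literal phrase list only catches known wording, so it complements rather than replaces a read-through for new vague gestures.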
Risks tagged from the taxonomy where relevant; any quoted review cited to its source.
Real output
Watch-out taxonomy used in cards ('Crowd / queue risk (crowd, timing): garden fills fast'). Cited snippet: Londonist reckons Selale's lamb shish 'might just be the best on the whole street'.
BB verbal (session 20260602_011336, msg 38885): 'whenever that sun's going down or whenever the golden hour is, are going to be ideal photo opportunities... So also thinking about those things.' Plan routes to END at the viewpoint at golden hour; check sunrise/sunset/blue hour, direction, cloud risk, safe return after dark. Applies generally, not just Boyd. Currently only lives as a Boyd-solo mode-bias line, never a measured feature.
Bar we are happy with
Where a photo payoff exists, the guide gives actual golden-hour / sunset timing for the day and routes the stop to land in that window, not just a generic 'nice in afternoon light' note.
Real output
2026-06-10, judged vs live DOM: 'Chasing the Light' section gives real June 2026 Southgate times (sunset 21:19-21:24, golden hour from ~20:20, civil twilight ends ~22:05, source sunrise-sunset.org, verify link to timeanddate) and routes TO the light: eat-early-at-18:00 re-sequence putting you at the Arnos viaduct inside the 20:20-21:20 window, with park-locking caveat. Times also wired into Leg 3 card, evening time block, and Check Before You Go.
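The "route TO the light" check reduces to simple time arithmetic. The sketch below uses the window quoted in this row (golden hour from about 20:20, sunset about 21:20) and the 18:00 dinner start; the meal length and travel time are made-up placeholders, and `arrival_in_window` is a hypothetical helper, not part of the guide.

```python
from datetime import datetime, timedelta


def parse(hhmm):
    """Parse an 'HH:MM' string into a datetime on an arbitrary fixed date."""
    return datetime.strptime(hhmm, "%H:%M")


def arrival_in_window(meal_start, meal_minutes, travel_minutes,
                      window_start, window_end):
    """Return (arrival 'HH:MM', True if arrival falls inside the window)."""
    arrival = parse(meal_start) + timedelta(minutes=meal_minutes + travel_minutes)
    in_window = parse(window_start) <= arrival <= parse(window_end)
    return arrival.strftime("%H:%M"), in_window


# Assumed durations: 90 min dinner, 60 min to reach the viaduct.
early = arrival_in_window("18:00", 90, 60, "20:20", "21:20")
late = arrival_in_window("19:30", 90, 60, "20:20", "21:20")
```

With those assumed durations, the 18:00 start arrives at 20:30, inside the window, while a 19:30 start arrives at 22:00, after sunset. Real durations and the day's actual times should come from the verify links the guide already carries.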
Register §8: compare bases/destinations before day-by-day; accommodation-anchored routing; transport-mode architecture; map-aware day cards (roadmap).
Bar we are happy with
For an open trip: 2-3 base/destination options compared with who-each-suits before any day plan.
Real output
Single base, single day. Architecture mode cannot show here.
On-the-way intelligence
PASS
What I wanted
Register §8 + SKILL.md: when origin is known, proactively surface one genuinely interesting stop between origin and the day's first anchor. BB's explicit complaint that this was not built.
Bar we are happy with
One real on-the-way stop tied to an established origin, OR an explicit note that origin is unknown so on-the-way can't be planned.
Real output
Faltering Fullback framed as on-the-way stop from the assumed base, with note it changes if origin changes ('as if you start the morning here').
Area intelligence (what's genuinely interesting around X)
PASS
What I wanted
Register §9 + SKILL.md: for any anchor surface local character, interesting nearby misses, history/oddity/food cluster, what changed recently. The Antoine guide 'had nothing on Southgate/Conway Rd beyond walk the street'.
Bar we are happy with
Every anchor area gets real what's-there-now / history / what-changed depth, at Southgate-Station level.
Real output
Real area intelligence: Arnos Park Victorian viaduct over Pymmes Brook, Broomfield walled garden C16-C18 listed walls, James I hunting-lodge history, Holden station depth.
Register §9 + SKILL.md: when a place has personal meaning, surface what's still there vs changed, what's worth re-experiencing, pair familiar with surprising. 'Antoine born in Southgate, grew up Conway Rd' currently hand-waved.
Bar we are happy with
Concrete then-vs-now for the personal street: named landmarks of childhood, specific changes, a real reason to walk it.
Real output
Roots grounded in verifiable fact, not vibes: 'Southgate is an established North London suburb in N14 that grew up around the 1933 Piccadilly line extension, with Holden's station as its civic centrepiece.'
Register §7/§9: pair the obvious with an unusual alternative; judge by genuine-interest rubric (ruins, viaducts, weird collections over plaques/monuments).
Bar we are happy with
At least one genuine hidden-gem pairing surfaced over the obvious pick.
Real output
Broomfield walled kitchen garden and the Arnos Park Victorian railway viaduct are real hidden-gem picks: 'a park most people don't know about'.
Contract lines 150-159 + principle 7: prices, hours, event times, ticket rules, closures, transport need official/current source links. 'Do not freeze stale prices as fact.'
Bar we are happy with
Every operational fact (hours, prices, transport times) carries a source/verify link.
Real output
22 source links incl official: falteringfullback.com, TfL journey planner, Wikipedia Southgate station, Enfield Council + GoParks Broomfield, Londonist/Eater Green Lanes, Met Office. Corrected the wrong pub hours (now 12:00 daily, was 'weekends 11am').
Register §10 'single biggest miss' + contract image workflow. NOTE: Boyd is fine with hotlinks / any source for personal use, so image SOURCE is not a failure; only RELEVANCE/specificity is the gap.
Bar we are happy with
Images depict the actual named places with specific alts, not generic stock.
Real output
2026-06-10, judged vs live DOM: 18 images, all named-place specific (Fullback frontage x2, Holden drum x3, Conway/Minchenden streets x3, Broomfield walled garden + conservatory, Green Lanes street level, Arnos viaduct x2), every alt names the place. Browser-validated: 18/18 naturalWidth > 0, none broken. Source is Commons/official hotlinks, which Boyd accepts for personal use; relevance bar met.
Contract mobile CSS: max ~760px, rounded cards, tap-friendly links, high contrast, no wide tables.
Bar we are happy with
Renders clean and readable on a phone, narrow, high contrast.
Real output
2026-06-10: verified in a real browser at 320, 360 and 390px layout widths WITH the defensive overflow-x guard disabled: zero elements extend past the viewport, scrollWidth equals viewport at all three widths. The earlier wobble culprit no longer exists (gallery grid collapses to single column under 560px); guard retained as belt-and-braces only.
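The pass condition used here can be written down as a pure function. This is a sketch of the rule only, not the script that was actually run: it assumes the measurements were collected in the browser beforehand (for instance the document's `scrollWidth` and each element's bounding-box right edge), and `overflow_report` and its input shape are hypothetical.

```python
def overflow_report(viewport_width, scroll_width, element_rights):
    """Evaluate one viewport width.

    element_rights maps a label or selector to that element's right edge
    in CSS pixels. The page passes when the document is no wider than the
    viewport and no element extends past it.
    """
    offenders = sorted(
        name for name, right in element_rights.items() if right > viewport_width
    )
    passes = scroll_width <= viewport_width and not offenders
    return {"passes": passes, "offenders": offenders}
```

Running this once per width (320, 360, 390) with the overflow-x guard disabled mirrors the test described above; listing offenders by name makes it easier to find the next culprit if the layout regresses.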
Register §10: event cards with thumbnails, official links, fit reasons, crowd/effort, ticket caveats.
Bar we are happy with
Where a time-bound event applies: an event card with official link and ticket caveat.
Real output
No time-bound event in this day; nothing to render an event card against.
Visual richness (multiple images per location)
PASS
What I wanted
BB verbal gold-standard, NOT previously in contract: the guide must feel like an enticing magazine, not a blog post. Multiple images per location (several per stop), roughly 10+ images minimum across the guide, real photos that connect to the specific content of each section. NO placeholders. Each location visually sold.
Bar we are happy with
Every major stop has 2+ relevant images woven into its content (not one token header shot). Total 10+ images. Images map to what the text describes. A reader is enticed by each location visually before reading a word.
Real output
2026-06-10, judged vs live DOM: 18 images total, woven as duo/trio galleries plus inline shots, not token headers. Per major stop: Finsbury Park/Fullback 3, Southgate station 3, Conway Road streets 3, Broomfield 3, Green Lanes 2, Arnos viaduct 2, plus hero and evening closer. Each location visually sold before its text; reads magazine, not blog.
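The bar for this row (2+ images per major stop, 10+ in total) is easy to check mechanically once the counts are known. The sketch below uses the per-stop counts reported above; counting the hero and the evening closer as one image each is an inference from the stated total of 18, and `check_visual_bar` is a hypothetical helper.

```python
# Per-stop counts as reported in this row.
IMAGES_PER_STOP = {
    "Finsbury Park/Fullback": 3,
    "Southgate station": 3,
    "Conway Road streets": 3,
    "Broomfield": 3,
    "Green Lanes": 2,
    "Arnos viaduct": 2,
}
# Assumed to be one image each, which reconciles with the stated total of 18.
EXTRAS = {"hero": 1, "evening closer": 1}


def check_visual_bar(per_stop, extras, min_per_stop=2, min_total=10):
    """Return the total image count, any under-illustrated stops, and pass/fail."""
    total = sum(per_stop.values()) + sum(extras.values())
    thin_stops = [stop for stop, n in per_stop.items() if n < min_per_stop]
    return {
        "total": total,
        "thin_stops": thin_stops,
        "passes": total >= min_total and not thin_stops,
    }


result = check_visual_bar(IMAGES_PER_STOP, EXTRAS)
```

A count check only covers quantity; whether each image actually depicts the named place still needs the DOM and alt-text review described in the image-relevance row.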
A guide that moves through the lifecycle with a near-departure re-check and a post-trip feedback capture.
Real output
Single draft snapshot; lifecycle and salvage mode play out over a trip, not in one static guide.
How to read status: pass = clears the bar · partial = present but weak · fail = should be here, absent or hand-waved · not exercised = needs a different test artifact.