System Guide
The product experience is simple: “my assistant talks to my people so I don’t have to.” The technical reality is a reliable orchestration engine that can send messages, wait, retry tomorrow, and never lose track of what’s going on.
In Phase 1, we prove one loop works end-to-end (sitter scheduling) and build the system so it can expand to activities and healthcare later.
Built vs Not Built (Explicit Boundary)
The most important thing to understand about this MVP is what it does today and what it intentionally does not do yet. This keeps the pilot simple and prevents us from accidentally claiming “a full realtime voice agent” exists when it doesn’t.
Built Now
- SMS-first assistant (Twilio inbound SMS webhook + outbound SMS)
- Email outreach + inbound email replies (proxy-friendly)
- Durable workflow engine (tasks, outreach, options, confirmation)
- Safety rules: max 5 active tasks; max 1 awaiting-parent at a time
- Voice: Twilio Voice scripted calling (availability + booking)
- Voice: optional structured ingestion (POST /webhooks/voice/result)
- Admin UI: create families/contacts/tasks, simulate voice results, start calls
Not Built Yet (By Design)
- A streaming/realtime voice agent (Twilio Media Streams + OpenAI/Grok/etc.)
- Calendar booking integrations (Google/iOS)
- End-user login, dashboards, marketplace/discovery, payments/insurance
- Formal HIPAA compliance program (BAAs, audits, policies)
The Core Loop
Every successful coordination assistant follows the same repeating pattern. The system exists to remove the friction in that pattern: waiting, following up, summarizing options, and confirming.
flowchart LR
A["1. Parent Request<br/>Inbound SMS"] --> B["2. System Outreach<br/>SMS or Email"]
B --> C["3. Aggregate Options<br/>YES/NO -> options list"]
C --> D["4. Parent Choice<br/>Reply 1/2/3"]
D --> E["5. Confirmation<br/>Notify winner + close"]
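Step 4 of the loop depends on reading a short numeric reply reliably. The following is a minimal sketch of that step, not the repo's actual parser: `parseChoice` is a hypothetical name, and the tolerance for trailing "." or ")" is an assumption.

```typescript
// Hypothetical helper (not taken from the repo).
// Maps a parent's SMS reply such as "2" or " 3. " to a zero-based option index.
// Returns null when the reply is not a valid choice for the offered options.
function parseChoice(body: string, optionCount: number): number | null {
  const match = /^\s*(\d+)\s*[.)]?\s*$/.exec(body);
  if (!match) return null;
  const n = Number(match[1]);
  if (!Number.isInteger(n) || n < 1 || n > optionCount) return null;
  return n - 1;
}
```

Returning null instead of guessing lets the caller re-prompt the parent rather than confirm the wrong option.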
What Runs Where
Think of this as three moving parts. The API handles incoming messages and sends outgoing messages. The worker handles delayed jobs. Postgres is the durable memory so the system doesn’t lose state.
flowchart TB
subgraph Providers
TwilioSMS["Twilio<br/>SMS"]
TwilioVoice["Twilio<br/>Voice (calls + speech)"]
Resend["Resend<br/>Outbound Email"]
Proxy["Email Proxy<br/>(Gmail+Zapier/Make)"]
VoiceBridge["Optional Voice Bridge<br/>(LLM agent)"]
end
subgraph OurSystem["Family Coordination Assistant"]
API["API (Fastify)<br/>webhooks + orchestration<br/>admin UI"]
Worker["Worker (pg-boss)<br/>delayed jobs + cleanup"]
DB["Postgres<br/>tasks + outreach + voice_jobs + logs"]
end
TwilioSMS -->|inbound webhook| API
API -->|outbound SMS| TwilioSMS
TwilioVoice -->|/webhooks/twilio/voice/*| API
API -->|outbound email| Resend
Proxy -->|POST /webhooks/email/inbound| API
VoiceBridge -->|POST /webhooks/voice/result| API
API <--> DB
Worker <--> DB
API -->|enqueue jobs| Worker
API
- Receives inbound webhooks (SMS + email + voice)
- Updates the workflow state in Postgres
- Sends outbound SMS/email
- Returns TwiML for Twilio Voice calls
- Provides pilot admin UI
Worker
- Runs delayed jobs (compile options, next-day retry)
- Dials voice jobs (availability calls + booking calls)
- Runs retention cleanup (deletes message logs older than 30 days)
- Keeps the “waiting” work off the API
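The worker's delayed jobs come down to two time calculations: when to retry a non-responder, and which rows are old enough to delete. Here is a minimal sketch of both. The function names are hypothetical, and treating "next day" as exactly 24 hours later is an assumption (the real job may schedule relative to the family's timezone).

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const RETENTION_DAYS = 30; // the guide states a 30-day retention window

// Earliest time a non-responder may be contacted again.
// Assumption: "next day" means exactly 24 hours after the last attempt.
function nextRetryAt(lastAttempt: Date): Date {
  return new Date(lastAttempt.getTime() + DAY_MS);
}

// Rows created before this instant are eligible for retention cleanup.
function retentionCutoff(now: Date): Date {
  return new Date(now.getTime() - RETENTION_DAYS * DAY_MS);
}
```

Keeping these as pure functions of an explicit `now` makes the schedule easy to test without waiting for real time to pass.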
Recommended Architecture (How This Scales)
The long-term goal is to support many coordination “domains” (sitters, activities, clinics) without turning the system into a fragile black box. The recommended approach is to keep one reliable “coordination core” and treat each communication method (SMS, email, voice) as a swappable adapter.
flowchart TB
subgraph Core["Coordination Core (this repo)"]
Inbound["Inbound Webhooks<br/>SMS / Email / Twilio Voice / Voice-Result"]
Orchestrator["Orchestrator<br/>state machine + safety rules"]
Outbound["Outbound Messaging<br/>SMS + Email + Voice"]
Jobs["Background Jobs<br/>retry + compile + retention"]
Store["Postgres<br/>durable lists (tasks, contacts, logs)"]
Inbound --> Orchestrator
Orchestrator --> Store
Orchestrator --> Outbound
Jobs --> Store
end
subgraph Voice["Voice Bridge (Phase 1 in-core; separable later)"]
Dial["Telephony (Twilio Voice)<br/>places phone calls"]
Agent["Scripted Call Flow (Phase 1)<br/>or LLM agent (future)"]
Extract["Slot Extractor<br/>turns conversation into offeredSlots[]"]
Dial --> Agent --> Extract
Extract -->|"POST /webhooks/voice/result"| Inbound
end
Parent["Parent (SMS)"] -->|"texts assistant"| Inbound
Outbound -->|"options + confirmations"| Parent
Clinic["Clinic Reception"] <-->|"phone call"| Dial
Why Keep a “Coordination Core”
- One durable source of truth (tasks + options) prevents confusion.
- Safety rules stay consistent across channels (no mixed requests).
- Retries and dedupe are centralized (providers will retry webhooks).
- You can swap providers without rewriting the workflow logic.
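One way to picture the "swappable adapter" idea is a small contract that every channel implements, with the core routing by channel name only. This is an illustrative sketch: `ChannelAdapter`, `FakeAdapter`, and `OutboundRouter` are hypothetical names, not the repo's actual types.

```typescript
type Channel = "sms" | "email" | "voice";

interface OutboundMessage {
  to: string;
  body: string;
}

// Hypothetical adapter contract; the repo's real interfaces may differ.
interface ChannelAdapter {
  readonly channel: Channel;
  send(message: OutboundMessage): Promise<{ providerMessageId: string }>;
}

// In-memory adapter for illustration. A real adapter would call Twilio or Resend here.
class FakeAdapter implements ChannelAdapter {
  readonly channel: Channel;
  readonly sent: OutboundMessage[] = [];

  constructor(channel: Channel) {
    this.channel = channel;
  }

  async send(message: OutboundMessage): Promise<{ providerMessageId: string }> {
    this.sent.push(message);
    return { providerMessageId: `${this.channel}-${this.sent.length}` };
  }
}

// The core only knows the adapter contract, never a specific provider.
class OutboundRouter {
  private readonly adapters = new Map<Channel, ChannelAdapter>();

  register(adapter: ChannelAdapter): void {
    this.adapters.set(adapter.channel, adapter);
  }

  async send(channel: Channel, message: OutboundMessage): Promise<{ providerMessageId: string }> {
    const adapter = this.adapters.get(channel);
    if (!adapter) throw new Error(`No adapter registered for channel: ${channel}`);
    return adapter.send(message);
  }
}
```

With this shape, swapping a provider means registering a different adapter for the same channel; the workflow logic that calls `send` does not change.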
Why Voice Can Be a Separate “Bridge”
- Phone calls are long-running and failure-prone (hold music, transfers).
- Audio + transcripts have higher privacy risk than SMS scheduling texts.
- You can iterate on the call script/agent without risking the core system.
- The bridge can be replaced (Twilio+Grok, Twilio+OpenAI, etc.).
Deeper design doc: docs/recommended-architecture.md
The Workflow Engine (State Machine)
A “task” is one coordination job: “Find a sitter Fri 6–10.” The system moves a task through a small set of states so it stays reliable and doesn’t get confused.
stateDiagram-v2
[*] --> intent_created
intent_created --> collecting: outreach queued/sent
intent_created --> awaiting_parent: missing info (time window / contacts)
collecting --> options_ready: enough YES replies
collecting --> collecting: keep waiting
collecting --> cancelled: parent/admin cancels
options_ready --> confirmed: parent chooses 1/2/3
options_ready --> cancelled: parent/admin cancels
awaiting_parent --> intent_created: parent provides missing info
awaiting_parent --> cancelled: cancel
confirmed --> [*]
cancelled --> [*]
The key safety rule is: only one task can be awaiting_parent at a time. That prevents mixing answers across multiple open requests in the same SMS thread.
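The diagram above can be expressed as a transition table, which makes illegal moves easy to reject in one place. This is a sketch that mirrors the documented transitions only; `canTransition` and `transition` are hypothetical names, and the repo may enforce the rules differently.

```typescript
type TaskStatus =
  | "intent_created"
  | "collecting"
  | "awaiting_parent"
  | "options_ready"
  | "confirmed"
  | "cancelled";

// Allowed next states for each state, copied from the state diagram.
const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  intent_created: ["collecting", "awaiting_parent"],
  collecting: ["collecting", "options_ready", "cancelled"],
  options_ready: ["confirmed", "cancelled"],
  awaiting_parent: ["intent_created", "cancelled"],
  confirmed: [], // terminal
  cancelled: [], // terminal
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws if the move is not in the table.
function transition(from: TaskStatus, to: TaskStatus): TaskStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal task transition: ${from} -> ${to}`);
  }
  return to;
}
```

For example, `transition("collecting", "options_ready")` succeeds, while `transition("confirmed", "collecting")` throws because confirmed is a terminal state.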
Data Model (Durable Memory)
You can think of Postgres as a set of durable lists that keep the system honest. These tables make the workflow restart-safe and debuggable.
erDiagram
families ||--o{ family_authorized_phones : has
families ||--o{ contacts : has
families ||--o{ tasks : owns
tasks ||--o{ task_outreach : sends
tasks ||--o{ task_contact_responses : receives
tasks ||--o{ task_options : aggregates
tasks ||--o{ message_events : logs
contacts ||--o{ task_outreach : targeted
contacts ||--o{ task_contact_responses : replies
contacts ||--o{ task_options : offered
families {
uuid id
text assistant_phone_e164
text display_name
text timezone
}
contacts {
uuid id
text name
text category
text phone_e164
text email
text channel_pref
bool sms_opted_out
bool email_opted_out
}
tasks {
uuid id
text intent_type
text status
bool awaiting_parent
text awaiting_parent_reason
timestamptz requested_start
timestamptz requested_end
jsonb metadata
}
Channels + Webhooks
The system handles three kinds of inbound webhooks today: SMS, email, and voice. Parents always talk to the assistant via SMS. Contacts are reached by SMS or email, based on their channel_pref; voice calls are described in their own section.
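Choosing how to reach a contact can be sketched from the contact columns in the data model. The example below is an assumption-heavy illustration: `pickChannel` is a hypothetical name, the channel_pref values "sms" and "email" are assumed, and the choice not to fall back to the other channel is a design assumption rather than documented behavior.

```typescript
// Subset of the contacts table relevant to channel selection.
interface ContactChannels {
  phone_e164: string | null;
  email: string | null;
  channel_pref: "sms" | "email"; // assumed value set
  sms_opted_out: boolean;
  email_opted_out: boolean;
}

// Returns the contact's preferred channel if it is usable, otherwise null.
// Assumption: no automatic fallback to the non-preferred channel.
function pickChannel(contact: ContactChannels): "sms" | "email" | null {
  if (contact.channel_pref === "sms") {
    return contact.phone_e164 !== null && !contact.sms_opted_out ? "sms" : null;
  }
  return contact.email !== null && !contact.email_opted_out ? "email" : null;
}
```

Returning null makes the "cannot reach this contact" case explicit, so outreach can skip the contact instead of sending on a channel they opted out of.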
Inbound SMS (Twilio)
Twilio posts inbound messages to our API:
POST /webhooks/twilio/sms
The API then sends outbound SMS through Twilio’s REST API. Inbound webhook retries are deduped using (provider, provider_message_id).
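The dedupe rule can be illustrated with an in-memory stand-in for a unique constraint on (provider, provider_message_id). This is only a sketch: `SeenMessages` is a hypothetical class, and in Postgres the same effect is commonly achieved with `INSERT ... ON CONFLICT DO NOTHING` against a unique index, which is an assumption about how this repo does it.

```typescript
// In-memory stand-in for a unique constraint on (provider, provider_message_id).
class SeenMessages {
  private readonly keys = new Set<string>();

  // Returns true the first time a message is seen, false for a provider retry.
  markIfNew(provider: string, providerMessageId: string): boolean {
    // JSON encoding keeps the two parts unambiguous, e.g. ("a", "b:c") vs ("a:b", "c").
    const key = JSON.stringify([provider, providerMessageId]);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}
```

A webhook handler would process the message only when `markIfNew` returns true, and still answer a retry with a success status so the provider stops resending.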
Inbound Email (Proxy-friendly)
If you do not have a domain yet, you can still pilot email replies using a proxy mailbox (Gmail). Outbound email sets Reply-To to a plus-address that encodes the family id:
assistant+<familyId>@gmail.com
The proxy forwards replies to:
POST /webhooks/email/inbound
x-inbound-token: <INBOUND_EMAIL_TOKEN>
Voice Calling (Twilio Voice)
Phase 1 includes a deterministic voice flow: the worker places an outbound call via Twilio Voice, and Twilio hits our API for TwiML plus speech transcripts.
POST /webhooks/twilio/voice/answer
POST /webhooks/twilio/voice/gather
POST /webhooks/twilio/voice/status
The system converts the speech transcript into offered appointment slots using rule-based parsing, then prompts the parent by SMS.
Optional: if you later build a separate “voice bridge” (streaming agent), it can POST structured results to:
POST /webhooks/voice/result
x-inbound-token: <INBOUND_VOICE_TOKEN>
Minimal payload:
{
"id": "provider-message-id",
"provider": "twilio",
"familyId": "<uuid>",
"taskId": "<uuid>",
"contactId": "<uuid>",
"transcript": "Receptionist offered: Tue 3:30, Thu 4:15",
"offeredSlots": [
{ "start": "2026-02-12T22:30:00.000Z", "end": "2026-02-12T23:15:00.000Z" }
]
}
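A voice bridge is an external caller, so the payload is worth validating before it touches task state. The sketch below checks the fields shown in the minimal payload. `parseVoiceResult` is a hypothetical name, and treating every field as required (and requiring each slot's start to precede its end) is an assumption.

```typescript
interface OfferedSlot {
  start: string; // ISO 8601 instant
  end: string;   // ISO 8601 instant
}

interface VoiceResultPayload {
  id: string;
  provider: string;
  familyId: string;
  taskId: string;
  contactId: string;
  transcript: string;
  offeredSlots: OfferedSlot[];
}

function isDateString(value: unknown): value is string {
  return typeof value === "string" && !Number.isNaN(Date.parse(value));
}

// Returns a typed payload, or null if the input does not match the expected shape.
function parseVoiceResult(input: unknown): VoiceResultPayload | null {
  if (typeof input !== "object" || input === null) return null;
  const o = input as Record<string, unknown>;

  const { id, provider, familyId, taskId, contactId, transcript } = o;
  for (const field of [id, provider, familyId, taskId, contactId, transcript]) {
    if (typeof field !== "string" || field === "") return null;
  }

  const rawSlots = o["offeredSlots"];
  if (!Array.isArray(rawSlots)) return null;
  const offeredSlots: OfferedSlot[] = [];
  for (const raw of rawSlots) {
    if (typeof raw !== "object" || raw === null) return null;
    const { start, end } = raw as Record<string, unknown>;
    if (!isDateString(start) || !isDateString(end)) return null;
    if (Date.parse(start) >= Date.parse(end)) return null; // assumed: slots must have positive length
    offeredSlots.push({ start, end });
  }

  return {
    id: id as string,
    provider: provider as string,
    familyId: familyId as string,
    taskId: taskId as string,
    contactId: contactId as string,
    transcript: transcript as string,
    offeredSlots,
  };
}
```

A route handler could respond with a 400 when this returns null, after first checking the x-inbound-token header.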
Minimal Inbound Email Payload
{
"id": "provider-message-id",
"from": "Person <person@example.com>",
"to": "assistant+<familyId>@gmail.com",
"text": "YES"
}
The API extracts <familyId> from the recipient address and uses it to route the message to the correct family.
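The plus-address routing can be sketched as a small parser. `extractFamilyId` is a hypothetical name; the UUID check follows from families.id being a uuid in the data model, while accepting both bare addresses and the "Name <address>" form is an assumption about what the proxy forwards.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Extracts the family id from "local+<familyId>@domain" or "Name <local+<familyId>@domain>".
// Returns null when there is no plus-tag or the tag is not a UUID.
function extractFamilyId(recipient: string): string | null {
  const angle = /<([^<>]+)>/.exec(recipient);
  const address = (angle ? angle[1] : recipient).trim();

  const at = address.lastIndexOf("@");
  if (at <= 0) return null;

  const local = address.slice(0, at);
  const plus = local.indexOf("+");
  if (plus === -1) return null;

  const tag = local.slice(plus + 1);
  return UUID_PATTERN.test(tag) ? tag.toLowerCase() : null;
}
```

Rejecting anything that is not a UUID keeps a malformed or spoofed recipient from being used as a database lookup key.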
Safety + Reliability Rules
- Max 5 active tasks per family: prevents overwhelm and complexity.
- Max 1 awaiting-parent task: prevents mixing answers between tasks.
- Per-family sequential processing: family row is locked while a message is processed.
- Webhook dedupe: if Twilio/Resend retries, we do not double-process.
- Next-day retry: for non-responders, we attempt again tomorrow.
- 30-day retention: message logs are deleted after 30 days.
- Parent commands: text STATUS to list active requests, or CANCEL to cancel the current request.
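The two task limits can be checked with a single admission function before a new task is created. This is a sketch under stated assumptions: `canOpenTask` is a hypothetical name, and "active" is assumed to mean any task that is neither confirmed nor cancelled.

```typescript
const MAX_ACTIVE_TASKS = 5;
const MAX_AWAITING_PARENT = 1;

interface TaskSummary {
  status: string;
  awaiting_parent: boolean;
}

type Admission =
  | { ok: true }
  | { ok: false; reason: "too_many_active" | "already_awaiting_parent" };

// Assumption: a task is "active" until it is confirmed or cancelled.
function isActive(task: TaskSummary): boolean {
  return task.status !== "confirmed" && task.status !== "cancelled";
}

// Decides whether a family may open another task.
// needsParentInput is true when the new task would start in awaiting_parent.
function canOpenTask(existing: TaskSummary[], needsParentInput: boolean): Admission {
  const active = existing.filter(isActive);
  if (active.length >= MAX_ACTIVE_TASKS) {
    return { ok: false, reason: "too_many_active" };
  }
  const awaiting = active.filter((task) => task.awaiting_parent).length;
  if (needsParentInput && awaiting >= MAX_AWAITING_PARENT) {
    return { ok: false, reason: "already_awaiting_parent" };
  }
  return { ok: true };
}
```

For the limits to hold under concurrent messages, a check like this would need to run while the family row is locked, as the sequential-processing rule describes.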
Code Map (Where Things Live)
This repo is intentionally small: one API process, one worker process, and one database. The most important logic is the orchestration layer that turns inbound messages into task state transitions.
src/
  index.ts                       API entrypoint (starts Fastify + pg-boss)
  worker.ts                      Worker entrypoint (job runners + schedules)
  http/
    buildServer.ts               Fastify wiring (routes + form parsing)
    routes/
      health.ts                  GET /health
      twilioSms.ts               POST /webhooks/twilio/sms
      resendInbound.ts           POST /webhooks/resend/inbound + /webhooks/email/inbound
      voiceResult.ts             POST /webhooks/voice/result (structured voice results)
      twilioVoice.ts             POST /webhooks/twilio/voice/* (TwiML + speech)
      admin.ts                   JSON admin API (pilot)
      adminUi.ts                 HTML admin UI (pilot)
  orchestrator/
    handleInboundSms.ts          Main SMS workflow (parent + contact paths)
    handleInboundEmail.ts        Email reply workflow (YES/NO + STOP/START)
    handleInboundVoiceResult.ts  Voice result ingestion (offered slots + prompt parent)
    messaging.ts                 Send-and-log wrappers (writes message_events)
  workers/
    sitterJobs.ts                compile options + next-day retry outreach
    voiceJobs.ts                 dial voice_jobs (availability + booking)
    retentionCleanup.ts          deletes message_events older than 30 days
  db/
    pool.ts                      Postgres pool + transaction helper
    migrate.ts                   CLI to apply migrations
  migrations/
    001_init.sql                 schema (families, contacts, tasks, etc.)
    002_indexes.sql              indexes + dedupe constraints
    003_voice.sql                voice: opt-out + indexes
    004_voice_jobs.sql           voice: durable outbound call jobs
  domain/
    parsing/                     rule-based parsers (time window, yes/no, offered slots, contact lists)
Pilot Ops (What You Do To Run This)
The system is designed so a pilot can run with minimal UI and no end-user login. Everything is controlled by one admin token.
Deploy (Railway)
- API service: pnpm start
- Worker service: pnpm start:worker
- Postgres plugin
Required env vars include:
DATABASE_URL
ADMIN_TOKEN
TWILIO_ACCOUNT_SID
TWILIO_AUTH_TOKEN
PUBLIC_BASE_URL
TWILIO_VOICE_WEBHOOK_TOKEN
RESEND_API_KEY
EMAIL_FROM
EMAIL_REPLY_TO
INBOUND_EMAIL_TOKEN
INBOUND_VOICE_TOKEN
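A startup check that fails fast on missing configuration saves debugging time during a pilot deploy. The sketch below uses the variable names listed above; `missingEnv` is a hypothetical helper, and since the guide says the required variables "include" this list, the real set may be larger.

```typescript
// Names copied from the list above; the full required set may include more.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "ADMIN_TOKEN",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "PUBLIC_BASE_URL",
  "TWILIO_VOICE_WEBHOOK_TOKEN",
  "RESEND_API_KEY",
  "EMAIL_FROM",
  "EMAIL_REPLY_TO",
  "INBOUND_EMAIL_TOKEN",
  "INBOUND_VOICE_TOKEN",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node entrypoint this could be called as `missingEnv(process.env)`, exiting with an error message that lists every missing name at once.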
Operate (Admin UI)
Open:
/admin-ui
Auth:
- Basic auth username: admin
- Password: the value of ADMIN_TOKEN
Use it to:
- Create a family and set timezone
- Authorize parent phone(s)
- Add contacts (SMS or email)
Done vs Next
Done (Phase 1 MVP)
- Sitter coordination loop (SMS parent, SMS/email contacts)
- Voice calling loop (Twilio Voice scripted availability + booking)
- Optional voice result ingestion webhook (structured offered slots) + admin simulation
- Progressive onboarding (missing time window, missing contacts)
- Durable workflow state in Postgres
- Background retries + retention cleanup
- Pilot admin UI + admin token auth
- Tests: parsing, SMS routes, email routes, Twilio Voice routes, worker jobs
Next (Pilot Completion)
- Railway deployment + env wiring
- Twilio number + webhook configuration
- Resend domain/sender setup (optional) + email proxy wiring
- Pilot runbook execution with real phones and contacts
- Optional: upgrade to a streaming voice agent (Media Streams + realtime model)