System Guide

The product experience is simple: “my assistant talks to my people so I don’t have to.” The technical reality is a reliable orchestration engine that can send messages, wait, retry tomorrow, and never lose track of what’s going on.

In Phase 1, we prove one loop works end-to-end (sitter scheduling) and build the system so it can expand to activities and healthcare later.

Built vs Not Built (Explicit Boundary)

The most important thing to understand about this MVP is what it does today and what it intentionally does not do yet. This keeps the pilot simple and prevents us from accidentally claiming “a full realtime voice agent” exists when it doesn’t.

Built Now

  • SMS-first assistant (Twilio inbound SMS webhook + outbound SMS)
  • Email outreach + inbound email replies (proxy-friendly)
  • Durable workflow engine (tasks, outreach, options, confirmation)
  • Safety rules: max 5 active tasks; max 1 awaiting-parent at a time
  • Voice: Twilio Voice scripted calling (availability + booking)
  • Voice: optional structured ingestion (POST /webhooks/voice/result)
  • Admin UI: create families/contacts/tasks, simulate voice results, start calls

Not Built Yet (By Design)

  • A streaming/realtime voice agent (Twilio Media Streams + OpenAI/Grok/etc.)
  • Calendar booking integrations (Google/iOS)
  • End-user login, dashboards, marketplace/discovery, payments/insurance
  • Formal HIPAA compliance program (BAAs, audits, policies)

The Phase 1 voice system is intentionally deterministic (scripted + rule-based parsing). A future “voice bridge” can replace the call logic entirely as long as it produces a structured result and posts it to POST /webhooks/voice/result.

The Core Loop

Every successful coordination assistant follows the same repeating pattern. The system exists to remove the friction in that pattern: waiting, following up, summarizing options, and confirming.

flowchart LR
  A["1. Parent Request<br/>Inbound SMS"] --> B["2. System Outreach<br/>SMS or Email"]
  B --> C["3. Aggregate Options<br/>YES/NO -> options list"]
  C --> D["4. Parent Choice<br/>Reply 1/2/3"]
  D --> E["5. Confirmation<br/>Notify winner + close"]

The most important engineering choice is durability: if your server restarts, if Twilio retries, if someone replies tomorrow, the workflow still completes.
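Steps 3 and 4 of the loop can be sketched in a few lines. This is a minimal illustration, not the repo's implementation: the `ContactReply` and `SitterOption` shapes, the function names, and the cap of three options are all assumptions (the cap is inferred only from the "Reply 1/2/3" prompt).

```typescript
// Hypothetical shapes; the real task_contact_responses / task_options columns are not shown in this guide.
type ContactReply = { contactId: string; contactName: string; answer: "YES" | "NO" };
type SitterOption = { index: number; contactId: string; contactName: string };

// Step 3: keep only YES replies and number them 1..N (capped at `max`, assumed to be 3).
function buildOptions(replies: ContactReply[], max = 3): SitterOption[] {
  return replies
    .filter((r) => r.answer === "YES")
    .slice(0, max)
    .map((r, i) => ({ index: i + 1, contactId: r.contactId, contactName: r.contactName }));
}

// Step 4: map the parent's reply ("1", " 2 ") to an option, or null if it is not a valid choice.
function resolveChoice(options: SitterOption[], replyText: string): SitterOption | null {
  const trimmed = replyText.trim();
  if (!/^\d+$/.test(trimmed)) return null;
  const chosen = Number(trimmed);
  return options.find((o) => o.index === chosen) ?? null;
}
```

Returning null for anything that is not a listed number lets the caller re-prompt the parent instead of guessing.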

What Runs Where

Think of this as three moving parts. The API handles incoming messages and sends outgoing messages. The worker handles delayed jobs. Postgres is the durable memory so the system doesn’t lose state.

flowchart TB
  subgraph Providers
    TwilioSMS["Twilio<br/>SMS"]
    TwilioVoice["Twilio<br/>Voice (calls + speech)"]
    Resend["Resend<br/>Outbound Email"]
    Proxy["Email Proxy<br/>(Gmail+Zapier/Make)"]
    VoiceBridge["Optional Voice Bridge<br/>(LLM agent)"]
  end

  subgraph OurSystem["Family Coordination Assistant"]
    API["API (Fastify)<br/>webhooks + orchestration<br/>admin UI"]
    Worker["Worker (pg-boss)<br/>delayed jobs + cleanup"]
    DB["Postgres<br/>tasks + outreach + voice_jobs + logs"]
  end

  TwilioSMS -->|inbound webhook| API
  API -->|outbound SMS| TwilioSMS
  TwilioVoice -->|/webhooks/twilio/voice/*| API
  API -->|outbound email| Resend
  Proxy -->|POST /webhooks/email/inbound| API
  VoiceBridge -->|POST /webhooks/voice/result| API
  API <--> DB
  Worker <--> DB
  API -->|enqueue jobs| Worker

API

  • Receives inbound webhooks (SMS + email + voice)
  • Updates the workflow state in Postgres
  • Sends outbound SMS/email
  • Returns TwiML for Twilio Voice calls
  • Provides pilot admin UI

Worker

  • Runs delayed jobs (compile options, next-day retry)
  • Dials voice jobs (availability calls + booking calls)
  • Runs retention cleanup (30-day transcript deletion)
  • Keeps the “waiting” work off the API

The Workflow Engine (State Machine)

A “task” is one coordination job: “Find a sitter Fri 6–10.” The system moves a task through a small set of states so it stays reliable and doesn’t get confused.

stateDiagram-v2
  [*] --> intent_created

  intent_created --> collecting: outreach queued/sent
  intent_created --> awaiting_parent: missing info (time window / contacts)

  collecting --> options_ready: enough YES replies
  collecting --> collecting: keep waiting
  collecting --> cancelled: parent/admin cancels

  options_ready --> confirmed: parent chooses 1/2/3
  options_ready --> cancelled: parent/admin cancels

  awaiting_parent --> intent_created: parent provides missing info
  awaiting_parent --> cancelled: cancel

  confirmed --> [*]
  cancelled --> [*]
            

The key safety rule is: only one task can be awaiting_parent at a time. That prevents mixing answers across multiple open requests in the same SMS thread.
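The state diagram translates directly into a transition table. The sketch below copies the arrows from the diagram; the names `TaskStatus`, `TRANSITIONS`, `canTransition`, and `canAwaitParent` are illustrative, and it assumes the one-awaiting-parent rule is scoped to a single family.

```typescript
type TaskStatus =
  | "intent_created"
  | "collecting"
  | "options_ready"
  | "awaiting_parent"
  | "confirmed"
  | "cancelled";

// Allowed transitions, copied from the state diagram above.
const TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  intent_created: ["collecting", "awaiting_parent"],
  collecting: ["options_ready", "collecting", "cancelled"],
  options_ready: ["confirmed", "cancelled"],
  awaiting_parent: ["intent_created", "cancelled"],
  confirmed: [], // terminal
  cancelled: [], // terminal
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Safety rule: a task may enter awaiting_parent only if no other task
// in the same family (assumed scope) is already waiting on the parent.
function canAwaitParent(otherTaskStatuses: TaskStatus[]): boolean {
  return !otherTaskStatuses.includes("awaiting_parent");
}
```

Checking `canTransition` before every status write means a late or duplicated message cannot push a confirmed or cancelled task back into an open state.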

Data Model (Durable Memory)

You can think of Postgres as a set of durable lists that keep the system honest. These tables make the workflow restart-safe and debuggable.

erDiagram
  families ||--o{ family_authorized_phones : has
  families ||--o{ contacts : has
  families ||--o{ tasks : owns

  tasks ||--o{ task_outreach : sends
  tasks ||--o{ task_contact_responses : receives
  tasks ||--o{ task_options : aggregates
  tasks ||--o{ message_events : logs

  contacts ||--o{ task_outreach : targeted
  contacts ||--o{ task_contact_responses : replies
  contacts ||--o{ task_options : offered

  families {
    uuid id
    text assistant_phone_e164
    text display_name
    text timezone
  }

  contacts {
    uuid id
    text name
    text category
    text phone_e164
    text email
    text channel_pref
    bool sms_opted_out
    bool email_opted_out
  }

  tasks {
    uuid id
    text intent_type
    text status
    bool awaiting_parent
    text awaiting_parent_reason
    timestamptz requested_start
    timestamptz requested_end
    jsonb metadata
  }
            
Message transcripts are stored in message_events for support and debugging, but a daily job deletes anything older than 30 days (privacy).
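The retention rule reduces to one cutoff timestamp. As a minimal sketch, assuming each message_events row carries a creation timestamp (the column is not shown in this guide) and using illustrative function names:

```typescript
const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Everything created before this instant is eligible for deletion.
function retentionCutoff(now: Date, days: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - days * MS_PER_DAY);
}

// True when a row created at `createdAt` is older than the retention window.
function isExpired(createdAt: Date, now: Date): boolean {
  return createdAt.getTime() < retentionCutoff(now).getTime();
}
```

A cleanup job would typically pass the cutoff as a parameter to a single DELETE statement rather than filtering rows in application code.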

Channels + Webhooks

The system supports two messaging channels today, SMS and email, alongside scripted Twilio Voice calls. Parents always talk to the assistant via SMS. Contacts can be reached by SMS or email, based on their channel_pref.
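Choosing a contact's channel can be sketched as a pure function over the contacts columns listed in the data model. The camelCase field names, the `"sms"`/`"email"` values for channel_pref, and the choice to return null instead of falling back to the other channel are assumptions made for illustration.

```typescript
type OutreachChannel = "sms" | "email";

// Mirrors the contacts table (phone_e164, email, channel_pref, *_opted_out).
interface ContactRecord {
  channelPref: OutreachChannel;
  phoneE164: string | null;
  email: string | null;
  smsOptedOut: boolean;
  emailOptedOut: boolean;
}

// Returns the preferred channel if it is usable, otherwise null.
// No fallback to the other channel: that behavior is not described in this guide.
function pickChannel(contact: ContactRecord): OutreachChannel | null {
  if (contact.channelPref === "sms") {
    return contact.phoneE164 && !contact.smsOptedOut ? "sms" : null;
  }
  return contact.email && !contact.emailOptedOut ? "email" : null;
}
```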

Inbound SMS (Twilio)

Twilio posts inbound messages to our API:

POST /webhooks/twilio/sms

The API then sends outbound SMS through Twilio’s REST API. Inbound webhook retries are deduped using (provider, provider_message_id).
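The dedupe rule can be illustrated with an in-memory stand-in. In the real system the (provider, provider_message_id) pair would be enforced durably in Postgres (the code map lists "dedupe constraints" in 002_indexes.sql); the `InboundDeduper` class below is a hypothetical sketch that only shows the first-delivery-wins logic.

```typescript
// In-memory stand-in for a unique constraint on (provider, provider_message_id).
class InboundDeduper {
  private seen = new Set<string>();

  // Returns true the first time a pair is seen, false for any retry of it.
  firstDelivery(provider: string, providerMessageId: string): boolean {
    // JSON-encode the pair so ("a", "b:c") and ("a:b", "c") cannot collide.
    const key = JSON.stringify([provider, providerMessageId]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

A webhook handler would process the message only when `firstDelivery` returns true and still answer retries with a success status, so the provider stops resending.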

Inbound Email (Proxy-friendly)

If you do not have a domain yet, you can still pilot email replies using a proxy mailbox (Gmail). Outbound email sets Reply-To to a plus-address that encodes the family id:

assistant+<familyId>@gmail.com

The proxy forwards replies to:

POST /webhooks/email/inbound
x-inbound-token: <INBOUND_EMAIL_TOKEN>

Voice Calling (Twilio Voice)

Phase 1 includes a deterministic voice flow: the worker places an outbound call via Twilio Voice, and Twilio hits our API for TwiML plus speech transcripts.

POST /webhooks/twilio/voice/answer
POST /webhooks/twilio/voice/gather
POST /webhooks/twilio/voice/status

The system converts the speech transcript into offered appointment slots (rule-based) and then prompts the parent by SMS.

Optional: if you later build a separate “voice bridge” (streaming agent), it can POST structured results to:

POST /webhooks/voice/result
x-inbound-token: <INBOUND_VOICE_TOKEN>

Minimal payload:

{
  "id": "provider-message-id",
  "provider": "twilio",
  "familyId": "<uuid>",
  "taskId": "<uuid>",
  "contactId": "<uuid>",
  "transcript": "Receptionist offered: Tue 3:30, Thu 4:15",
  "offeredSlots": [
    { "start": "2026-02-12T22:30:00.000Z", "end": "2026-02-12T23:15:00.000Z" }
  ]
}
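A voice bridge author may want to check a payload before posting it. The validator below is a sketch under one assumption: every field in the minimal payload above is required. The guide does not say which fields are optional, and the names `validateVoiceResult` and `isIsoInstant` are illustrative.

```typescript
const REQUIRED_STRING_FIELDS = ["id", "provider", "familyId", "taskId", "contactId", "transcript"];

// True for strings that Date.parse can read, such as ISO 8601 instants.
function isIsoInstant(value: unknown): value is string {
  return typeof value === "string" && !Number.isNaN(Date.parse(value));
}

// Returns a list of problems; an empty list means the payload looks well-formed.
function validateVoiceResult(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return ["body must be a JSON object"];
  const fields = body as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of REQUIRED_STRING_FIELDS) {
    const value = fields[key];
    if (typeof value !== "string" || value === "") errors.push(`${key} must be a non-empty string`);
  }

  const slots = fields["offeredSlots"];
  if (!Array.isArray(slots)) {
    errors.push("offeredSlots must be an array");
    return errors;
  }
  slots.forEach((slot: unknown, i: number) => {
    const s = (typeof slot === "object" && slot !== null ? slot : {}) as Record<string, unknown>;
    const start = s["start"];
    const end = s["end"];
    if (!isIsoInstant(start) || !isIsoInstant(end)) {
      errors.push(`offeredSlots[${i}] needs ISO start and end`);
    } else if (Date.parse(end) <= Date.parse(start)) {
      errors.push(`offeredSlots[${i}] must end after it starts`);
    }
  });
  return errors;
}
```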

Minimal Inbound Email Payload

{
  "id": "provider-message-id",
  "from": "Person <person@example.com>",
  "to": "assistant+<familyId>@gmail.com",
  "text": "YES"
}

The API extracts <familyId> from the recipient address and uses it to route the message to the correct family.
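Extracting the family id from a plus-address might look like the sketch below. It assumes the id is the families.id UUID and accepts either a bare address or the "Name <address>" form; the function name and the decision to lowercase the result are illustrative, not taken from the repo.

```typescript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Pulls <familyId> out of "local+<familyId>@domain"; returns null when absent or not a UUID.
function extractFamilyId(to: string): string | null {
  // Accept "Display Name <addr>" as well as a bare address.
  const angle = /<([^<>]+)>\s*$/.exec(to);
  const address = (angle ? angle[1] : to).trim();

  // local part, "+", tag, "@", domain
  const match = /^[^+@\s]+\+([^@\s]+)@[^@\s]+$/.exec(address);
  if (!match) return null;

  const tag = match[1];
  return UUID_RE.test(tag) ? tag.toLowerCase() : null;
}
```

Rejecting tags that are not UUIDs keeps stray plus-addressed mail (for example `assistant+newsletter@...`) from being routed to a family.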

Safety + Reliability Rules

  • Max 5 active tasks per family: prevents overwhelm and complexity.
  • Max 1 awaiting-parent task: prevents mixing answers between tasks.
  • Per-family sequential processing: family row is locked while a message is processed.
  • Webhook dedupe: if Twilio/Resend retries, we do not double-process.
  • Next-day retry: for non-responders, we attempt again tomorrow.
  • 30-day retention: message logs are deleted after 30 days.
  • Parent commands: text STATUS to list active requests, or CANCEL to cancel the current request.

Healthcare note: Phase 1 is built to be “HIPAA-ready” in structure (durable logs, retention, access controls), but it is not “HIPAA-compliant” by default. Treat this as a scheduling coordinator and keep PHI out of messages until contracts and controls exist.
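Two of the rules in the list above, the active-task cap and the parent commands, are simple enough to sketch. The definition of "active" (any status other than confirmed or cancelled) and the case-insensitive exact match on STATUS and CANCEL are assumptions; the function names are hypothetical.

```typescript
const MAX_ACTIVE_TASKS = 5;

// Assumption: a task is "active" until it reaches confirmed or cancelled.
function isActive(status: string): boolean {
  return status !== "confirmed" && status !== "cancelled";
}

// True when the family is still under the five-active-task cap.
function canCreateTask(existingStatuses: string[]): boolean {
  return existingStatuses.filter(isActive).length < MAX_ACTIVE_TASKS;
}

type ParentCommand = "STATUS" | "CANCEL";

// Recognizes a command only when it is the entire message, ignoring case and padding.
function parseParentCommand(text: string): ParentCommand | null {
  const normalized = text.trim().toUpperCase();
  return normalized === "STATUS" || normalized === "CANCEL" ? normalized : null;
}
```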

Code Map (Where Things Live)

This repo is intentionally small: one API process, one worker process, and one database. The most important logic is the orchestration layer that turns inbound messages into task state transitions.

src/
  index.ts                     API entrypoint (starts Fastify + pg-boss)
  worker.ts                    Worker entrypoint (job runners + schedules)

  http/
    buildServer.ts             Fastify wiring (routes + form parsing)
    routes/
      health.ts                GET /health
      twilioSms.ts             POST /webhooks/twilio/sms
      resendInbound.ts         POST /webhooks/resend/inbound + /webhooks/email/inbound
      voiceResult.ts           POST /webhooks/voice/result (structured voice results)
      twilioVoice.ts           POST /webhooks/twilio/voice/* (TwiML + speech)
      admin.ts                 JSON admin API (pilot)
      adminUi.ts               HTML admin UI (pilot)

  orchestrator/
    handleInboundSms.ts        Main SMS workflow (parent + contact paths)
    handleInboundEmail.ts      Email reply workflow (YES/NO + STOP/START)
    handleInboundVoiceResult.ts Voice result ingestion (offered slots + prompt parent)
    messaging.ts               Send-and-log wrappers (writes message_events)

  workers/
    sitterJobs.ts              compile options + next-day retry outreach
    voiceJobs.ts               dial voice_jobs (availability + booking)
    retentionCleanup.ts        deletes message_events older than 30 days

  db/
    pool.ts                    Postgres pool + transaction helper
    migrate.ts                 CLI to apply migrations
    migrations/
      001_init.sql             schema (families, contacts, tasks, etc.)
      002_indexes.sql          indexes + dedupe constraints
      003_voice.sql            voice: opt-out + indexes
      004_voice_jobs.sql       voice: durable outbound call jobs

  domain/
    parsing/                   rule-based parsers (time window, yes/no, offered slots, contact lists)

If you only read two files to understand the behavior, start with: src/orchestrator/handleInboundSms.ts and src/workers/sitterJobs.ts.

Pilot Ops (What You Do To Run This)

The system is designed so a pilot can run with minimal UI and no end-user login. Everything is controlled by one admin token.

Deploy (Railway)

  • API service: pnpm start
  • Worker service: pnpm start:worker
  • Postgres plugin

Required env vars include:

DATABASE_URL
ADMIN_TOKEN
TWILIO_ACCOUNT_SID
TWILIO_AUTH_TOKEN
PUBLIC_BASE_URL
TWILIO_VOICE_WEBHOOK_TOKEN
RESEND_API_KEY
EMAIL_FROM
EMAIL_REPLY_TO
INBOUND_EMAIL_TOKEN
INBOUND_VOICE_TOKEN
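A startup check can fail fast when any of these are missing. The sketch below covers only the variables named above (the list may not be exhaustive), and `missingEnv` is a hypothetical helper, not a function from the repo.

```typescript
// The variables listed in this guide; the real service may require more.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "ADMIN_TOKEN",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "PUBLIC_BASE_URL",
  "TWILIO_VOICE_WEBHOOK_TOKEN",
  "RESEND_API_KEY",
  "EMAIL_FROM",
  "EMAIL_REPLY_TO",
  "INBOUND_EMAIL_TOKEN",
  "INBOUND_VOICE_TOKEN",
] as const;

// Returns the names of variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Typical use at process start (Node): const missing = missingEnv(process.env);
```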

Operate (Admin UI)

Open:

/admin-ui

Auth:

  • Basic auth user admin
  • Password = ADMIN_TOKEN
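The Basic auth check can be sketched as a pure function over the Authorization header. This is an illustration, not the repo's code: `isAdminAuthorized` is a hypothetical name, and it relies on the global `atob`, which is available in browsers and recent Node versions.

```typescript
// Validates "Authorization: Basic base64(admin:<ADMIN_TOKEN>)".
function isAdminAuthorized(authorizationHeader: string | undefined, adminToken: string): boolean {
  // Refuse everything if the token is unset, so an empty password can never match.
  if (!authorizationHeader || adminToken.length === 0) return false;

  const match = /^Basic\s+([A-Za-z0-9+/=]+)$/i.exec(authorizationHeader.trim());
  if (!match) return false;

  let decoded: string;
  try {
    decoded = atob(match[1]);
  } catch {
    return false; // not valid base64
  }

  // Split on the first colon only: the password itself may contain colons.
  const separator = decoded.indexOf(":");
  if (separator < 0) return false;
  const user = decoded.slice(0, separator);
  const password = decoded.slice(separator + 1);
  return user === "admin" && password === adminToken;
}
```

For anything beyond a pilot, a constant-time comparison (for example Node's `crypto.timingSafeEqual`) is a safer choice than `===` for the password check.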

Use it to:

  • Create a family and set timezone
  • Authorize parent phone(s)
  • Add contacts (SMS or email)

If you want an inbox-based proxy without a custom domain, use Gmail plus-addressing and Zapier/Make to forward replies to /webhooks/email/inbound. See docs/email-proxy-gmail.md in the repo.

Done vs Next

Done (Phase 1 MVP)

  • Sitter coordination loop (SMS parent, SMS/email contacts)
  • Voice calling loop (Twilio Voice scripted availability + booking)
  • Optional voice result ingestion webhook (structured offered slots) + admin simulation
  • Progressive onboarding (missing time window, missing contacts)
  • Durable workflow state in Postgres
  • Background retries + retention cleanup
  • Pilot admin UI + admin token auth
  • Tests: parsing, SMS routes, email routes, Twilio Voice routes, worker jobs

Next (Pilot Completion)

  • Railway deployment + env wiring
  • Twilio number + webhook configuration
  • Resend domain/sender setup (optional) + email proxy wiring
  • Pilot runbook execution with real phones and contacts
  • Optional: upgrade to a streaming voice agent (Media Streams + realtime model)