Voice Agents Need Receipts

RecallMEM now has a memory-backed voice agent.

RecallMEM can talk now.

And no, I do not mean "press a mic button, transcribe one sentence, and send it as text." That is useful. It is not a voice agent. It is a keyboard with extra steps.

RecallMEM now has a real voice-agent mode built on Deepgram's Voice Agent WebSocket API.

The important part is not just voice. It is voice with memory.

Voice Tests Memory

The browser connects to Deepgram with a temporary token. The long-lived Deepgram API key stays on the server, which creates a short-lived browser token through Deepgram's grant endpoint. Then the browser opens the agent socket with that token.

wss://agent.deepgram.com/v1/agent/converse

The stack is straightforward: Deepgram Nova-3 listens, the LLM thinks, and Deepgram Aura speaks back. Audio streams in as 16kHz linear PCM and comes back as 24kHz linear PCM.

The important part is that voice uses the same memory layer as text chat. When the agent needs context, it calls RecallMEM's memory endpoint. The app searches facts and transcript chunks, mixes semantic matches with exact text matches, and returns only the relevant context.

This next part matters: the voice model does not get direct database access. It cannot mutate memory directly. Same rule as chat: the model does not own memory.

That matters more for voice than text. Typing gives you time to be precise. Voice is messy. People say "that thing from yesterday," "the same project," "what was that error again," and expect the agent to follow immediately.

A memory system that only works for clean typed prompts is not enough.

Voice turns also save back into the current chat, the same way typed turns do. Talking to the agent should not become a dead-end session like it does in a lot of voice apps. The transcript can feed the same future memory extraction, retrieval, and profile updates.

I also added the mute that actually matters: muting your own mic. The agent can keep speaking without getting interrupted by a loud room, a side conversation, echoes, or hearing itself. That sounds small. It is the difference between a voice demo and something you can use in public.

Once voice worked, the next problem got sharper. A voice agent with bad memory is worse than a text box with bad memory. It can sound natural while being wrong.

So the memory layer had to get stricter.

No quote, no memory.

Valid JSON Is Not Proof

RecallMEM already had a hard boundary around memory.

The LLM does not query Postgres. It does not browse pgvector. It does not write rows. TypeScript does the database work. TypeScript builds the context before the model even has a chance to answer.

Writes work the same way. A background model can propose candidate memories, but it does not write them directly into the database. The app validates, deduplicates, categorizes, embeds, and only then inserts.

Before, the extractor could return this:

{
  "facts": [
    "User prefers concise technical explanations."
  ]
}

That is valid JSON. It fits the shape. But it is not proof.

Now the extractor has to return a receipt:

{
  "facts": [
    {
      "text": "User prefers concise technical explanations.",
      "quote": "I prefer concise technical explanations."
    }
  ]
}

Then TypeScript checks the quote against the transcript. If the quote is not there, the memory is rejected.

This does not prove the user will always prefer concise explanations. It proves something narrower and more useful: the memory is supported by the conversation.

The model can still hallucinate. The change is that a hallucination cannot write itself into the database just because it was shaped correctly. That is the whole point.

Better Failure Modes Matter

This makes memory stricter.

RecallMEM may reject a real memory if the extractor fails to include the supporting quote. I am fine with that.

Missing a memory is annoying, but storing a fake one is worse.

If the LLM forgets something, I can tell it again. If it stores something false, that fake fact can quietly become part of future answers.

So the system now prefers a boring miss over a confident lie.

Time Is Part Of Memory

Receipts fix fake facts. Dates fix stale facts.

This looks harmless:

User: My trial renews in 2 weeks.

Before, the app could store this:

User's trial renews in 2 weeks.

That memory gets worse after every session.

Now the extractor gets the conversation date and has to ground relative time. If the conversation happened on 2026-05-23, the fact should become:

User said on 2026-05-23 that their trial renews around 2026-06-06.

If it stays as "in 2 weeks," it gets rejected.

Vectors Are Not Enough

I also added hybrid retrieval.

pgvector is good at meaning. If I ask about "that PDF upload bug," it can find old debugging notes even if I do not remember the exact error.

But exact strings are different. Model IDs, branch names, error messages, prices, migrations, and weird product names are not vibes.

Cannot find module 'pdf.worker.mjs'

That should be found because the exact string exists somewhere.

So RecallMEM now combines semantic retrieval with Postgres text search over facts, transcript chunks, and receipt quotes.

The voice agent uses the same upgrade.

Less Magic. More Receipts.

Here is the short version of what changed:

Change Why
Quote-backed facts Reject unsupported memories
Grounded dates Stop relative time from rotting
Hybrid retrieval Find meaning and exact strings
Receipt-backed memory page Let users inspect why a fact exists
Voice memory upgrade Give voice the same recall path

None of this makes memory feel more magical.

That is the point.

I want memory to be boring, inspectable, and hard to fake.

Better models still help. But RecallMEM should not need blind faith in the model to keep memory clean.

The model can talk.

The app keeps the receipts.


Chris Dabatos - DevRel Engineer

Chris Dabatos

DevRel Engineer, Builder, and Technical Storyteller based in Las Vegas. He builds things with AI and writes about what breaks.

Sections
Intro
0:00 / 0:00