OpenClaw proactive assistant

A self-hosted, always-on personal assistant on the OpenClaw runtime: a fully offline WhatsApp voice loop (a self-run faster-whisper STT microservice plus sherpa-onnx/Piper TTS, zero cloud speech) and a fleet of isolated cron jobs that run unattended and report over WhatsApp/Telegram. One of those jobs captured 6,277 Google AI Overviews in a single ~22-hour run at <0.1% errors. I am the integrator and operator of a third-party runtime, not its author.

AGENTS · OPENCLAW · VOICE · CRON · WHATSAPP

Honest outcomes

AI Overviews, one run: 6,277 (6,647 requests · ~22h · <0.1% errors)

cloud STT/TTS: 0 (fully offline speech)

scheduled agent jobs: 8 (isolated cron workers)

channels: WhatsApp · Telegram

role: integrator/operator (extends a 3rd-party runtime)

01 —

Why

Most "AI assistants" are chat boxes: they answer when spoken to and forget everything after. I wanted the opposite — a persistent assistant that runs on a schedule, takes real actions through tools, remembers context, and reaches me on the channels I already use, all on hardware I control.

I built this on OpenClaw, a self-hosted agent runtime, on my own Ubuntu homelab. To be precise about credit: OpenClaw is a third-party runtime, and I am its integrator, operator, and extender, not the author of the agent OS. What I built is the layer on top: the voice loop, the offline speech services, the scheduled automations, and a provenance-aware memory.

One caveat about the evidence: several skill manifests in the repository snapshot captured here are empty placeholders, because the working code lives on the homelab. What I rely on is therefore the running configuration and the session logs, not those stubs.

A real assistant does not wait to be asked — it runs on a schedule, acts through real tools, and reports back over the channels you already use.

integrator & operator, not the OS author

02 —

What

The headline is a fully offline WhatsApp voice assistant: a voice note arrives over WhatsApp, a local faster-whisper microservice transcribes it (int8, on CPU), the agent acts, and the reply is spoken back through a local sherpa-onnx / Piper voice with a hand-tuned ffmpeg filter chain — zero cloud speech services in the loop, verified end to end.
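The transcription step can be sketched in a few lines. This is a minimal illustration assuming the faster-whisper Python package, not the code running on the homelab: the source names only the library, int8 quantization, and CPU inference, so the model size and the function names here are hypothetical.

```python
def join_segments(segments):
    """Concatenate non-empty segment texts into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_voice_note(path, model_size="small"):
    """Transcribe one audio file locally on CPU with int8 weights.

    `model_size` is an assumed default; the source does not say which
    Whisper model the microservice loads.
    """
    # Imported lazily so join_segments stays usable without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus run metadata.
    segments, info = model.transcribe(str(path))
    return info.language, join_segments(segments)
```

Calling `transcribe_voice_note("voice-note.ogg")` (a hypothetical path) would return the detected language and the transcript. A long-running service would normally load the model once at startup instead of on every call.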

On top sits a fleet of scheduled, isolated cron jobs that run unattended and report over WhatsApp and Telegram. One of them is a captcha-aware Google AI-Overview scraper that, in a single ~22-hour run, captured 6,277 AI Overviews from 6,647 requests at under 0.1% errors — concrete proof from the run logs that the proactive automation actually runs at scale.
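The run figures can be checked with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are mine, and the source does not break down why the remaining requests returned no AI Overview.

```python
requests = 6_647   # total requests in the ~22-hour run
captured = 6_277   # AI Overviews captured

capture_rate = captured / requests
without_overview = requests - captured

# "<0.1% errors" means errors < requests / 1000, so the largest
# whole number of failed requests consistent with the claim is:
max_errors = (requests - 1) // 1000

print(f"{capture_rate:.1%}")   # 94.4%
print(without_overview)        # 370
print(max_errors)              # 6
```

So roughly 94% of requests yielded an AI Overview, and an error rate under 0.1% corresponds to at most 6 failed requests out of 6,647.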

Underneath is a provenance-aware ontology memory — an append-only graph with a YAML schema and a query-memory-first recall rule — and an NL→cron scheduling skill that turns "remind me every weekday at 9" into a real, DST-safe scheduled job.
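To make the NL→cron idea concrete, here is a minimal sketch, assuming a tiny phrase grammar and a five-field cron line. The function names and the supported phrases are hypothetical, and the real skill is not shown in the source. DST safety here comes from computing the next run in local wall-clock time on a timezone-aware datetime.

```python
import re
from datetime import datetime, timedelta, timezone

# Cron day-of-week field per phrase unit (cron counts 0 = Sunday).
DAY_FIELDS = {"weekday": "1-5", "day": "*"}


def phrase_to_cron(phrase):
    """Map e.g. 'every weekday at 9' or 'every day at 6:30 pm' to cron."""
    m = re.search(r"every (weekday|day) at (\d{1,2})(?::(\d{2}))?\s*(am|pm)?",
                  phrase.lower())
    if not m:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    unit, hour, minute, ampm = m.group(1), int(m.group(2)), int(m.group(3) or 0), m.group(4)
    if ampm == "pm" and hour < 12:
        hour += 12
    if ampm == "am" and hour == 12:
        hour = 0
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid time in phrase: {phrase!r}")
    return f"{minute} {hour} * * {DAY_FIELDS[unit]}"


def next_run(cron, now):
    """Next firing time for an 'M H * * DOW' line, in now's timezone.

    Adding whole days to an aware datetime keeps the wall-clock hour,
    so a 09:00 job stays at 09:00 local time across DST changes.
    """
    minute, hour, _, _, dow = cron.split()
    allowed = range(7) if dow == "*" else range(1, 6)
    candidate = now.replace(hour=int(hour), minute=int(minute),
                            second=0, microsecond=0)
    # isoweekday() % 7 converts Monday=1..Sunday=7 to cron's Sunday=0.
    while candidate <= now or candidate.isoweekday() % 7 not in allowed:
        candidate += timedelta(days=1)
    return candidate
```

For example, `phrase_to_cron("remind me every weekday at 9")` gives `"0 9 * * 1-5"`. Passing `now` with a `zoneinfo.ZoneInfo` zone makes `next_run` follow that zone's local clock. The sketch does not handle times that are skipped or repeated on the transition day itself.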

03 —

How

The voice loop is a Node event hook I wrote against the runtime: it intercepts inbound audio, routes it through a local STT microservice (a systemd service exposing faster-whisper), and on the way out renders TTS through sherpa-onnx / Piper and an ffmpeg DSP chain for a consistent voice. Everything speech-related is local, which is the whole point — privacy and zero per-minute cost.
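The outbound half of the loop, post-processing synthesized speech with ffmpeg, might look like the sketch below. It is written in Python rather than Node for brevity, and the filter values are assumptions for illustration: the hand-tuned chain itself is not published in the source. The filters used (`highpass`, `lowpass`, `loudnorm`) and the `libopus` encoder are standard ffmpeg components.

```python
import subprocess

# Assumed filter settings, NOT the author's tuned chain.
VOICE_FILTERS = [
    "highpass=f=80",                  # remove low-frequency rumble
    "lowpass=f=8000",                 # trim harsh high frequencies
    "loudnorm=I=-16:TP=-1.5:LRA=11",  # EBU R128 loudness normalization
]


def build_ffmpeg_command(src_wav, dst_ogg, filters=VOICE_FILTERS):
    """Build the ffmpeg argv that filters a WAV and encodes mono Ogg/Opus."""
    return [
        "ffmpeg", "-y",
        "-i", str(src_wav),
        "-af", ",".join(filters),
        "-ar", "48000", "-ac", "1",      # Opus-friendly rate, mono
        "-c:a", "libopus", "-b:a", "32k",
        str(dst_ogg),
    ]


def render_voice_reply(src_wav, dst_ogg):
    """Run ffmpeg; raises CalledProcessError if encoding fails."""
    subprocess.run(build_ffmpeg_command(src_wav, dst_ogg), check=True)
```

Keeping the filter chain in one list means every reply passes through the same DSP settings, which is one way to get a consistent voice regardless of which TTS engine produced the audio.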

Proactivity comes from cron. Each scheduled job runs in its own isolated agent session, so one job's context cannot bleed into another, and a hard run-guard prevents a job from starting while its previous run is still active. Results are delivered over WhatsApp / Telegram through a retry queue. The Google AI-Overview job is the GEO connection: it is the same at-scale answer-collection problem, solved here as a personal scheduled worker.
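The run-guard and retried delivery described above can be illustrated with a short sketch. It assumes a Linux host and uses an advisory file lock as the guard. A simple retry-with-backoff loop stands in for the retry queue. All names are hypothetical, and the runtime's own mechanism may differ.

```python
import fcntl
import os
import time
from contextlib import contextmanager


class AlreadyRunning(RuntimeError):
    """Raised when a previous run of the same job still holds the lock."""


@contextmanager
def run_guard(lock_path):
    """Hold an exclusive, non-blocking flock for the duration of a job."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise AlreadyRunning(lock_path) from exc
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def deliver_with_retry(send, message, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(message), backing off exponentially between failures."""
    for attempt in range(attempts):
        try:
            return send(message)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A cron entry point would wrap its work in `with run_guard("/path/to/job.lock"):` and exit quietly on `AlreadyRunning`. Because the kernel drops the lock when the process dies, a crashed job cannot leave a stale guard behind.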

What I am careful about: I did not write OpenClaw, and one bundled skill in the tree is a third-party template I did not author. My contributions are the voice hook, the offline STT/TTS services, the cron orchestration, the ontology-memory design, and operating the whole thing on real hardware — which is what the configs and session logs actually show.

04 —

Where it stands

A self-hosted, always-on personal assistant that takes action on a schedule: an offline WhatsApp voice loop with no cloud speech dependency, eight isolated cron workers, provenance-aware memory, and a scaled scrape (6,277 AI Overviews in one ~22-hour run) proving the automation is real.

Kept honest: OpenClaw is a third-party runtime I extend and operate, not my creation; several skill files in this snapshot are placeholders with the real code on the homelab; and one bundled skill is someone else’s template. The differentiator is genuine — proactive, tool-using, channel-delivering assistants on infrastructure I own — but I scope my credit to what I actually built.

05 —

Stack

OpenClaw · faster-whisper (STT) · sherpa-onnx / Piper (TTS) · ffmpeg · systemd · WhatsApp (Baileys) / Telegram
