Abdul Rehman Baber is an AI and full-stack engineer with about four years building and operating production LLM systems. His main lane is the reliability side of AI engineering — agents and Model Context Protocol (MCP) tooling, Retrieval-Augmented Generation (RAG), and the evaluation, observability and cost work that keeps those systems honest in production. A specific focus inside that lane is building practical agentic assistants on OpenClaw and Hermes Agent — the kind that reason, use real tools, automate workflows, remember context and take action, rather than just chat. His sharper edge is AI-search visibility, or Generative Engine Optimization (GEO): measuring whether and how a brand gets mentioned and cited inside AI answers across ChatGPT, Google AI Overview, Perplexity, Claude and Gemini.
He is Lead AI & Full-Stack Developer at Project Hamburg Research (remote); he joined around 2022 as an AI & Full-Stack Developer and moved up to Lead in 2024. It is his only employer, roughly four years in one place, and most of his AI-visibility and SaaS work (the Buzzmatic product brand, including BuzzView and CiteStreak) was built in and around that role. His depth comes from a real title, real projects and open-source work rather than a long list of companies. He works in Python and TypeScript, and he is known for being careful about what he claims: he would rather show a test or a log line than a green dashboard.
On the academic side he holds an MSc in Project Management (SZABIST University, Islamabad, 2021–2023) and a BSc in Computer Science (National University of Modern Languages, Islamabad, 2015–2020). Continuing education is DeepLearning.AI work — AI Agents with LangGraph, Multimodal RAG, Knowledge Graphs for RAG, LLMOps Testing & Evaluation — plus an AWS Solutions Architect bootcamp.
Lead since 2024 · joined 2022
MSc Project Management · BSc Computer Science
One employer, ~4 years
01 —
How he works
each one tied to something he built
He treats a wrong answer as worse than no answer — and designs for abstention.
the throughline
Intellectual honesty is the consistent signal: he refuses to inflate, he won't claim he “built” a product he only operated, and his own project notes openly flag what is a prototype, what is in-memory-only, and what is specified but not yet built. Each value below is grounded in something he actually shipped or did — not a slogan.
A wrong answer is worse than no answer
On his RAG backend — the OER textbook-matching prototype, OERMatch — he built a deterministic gate plus evidence-required reranking so the system refuses to match when it is not confident. The proof is a negative-control query, a UK insolvency-law citation, that came back as no_match across all 15 candidates instead of forcing a false positive. He designs for abstention.
OERMatch · RAG · NEGATIVE CONTROLS
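The abstention pattern described for OERMatch can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `Candidate` fields, the `min_score` threshold and the `decide` function are hypothetical names, and the real gate and reranker may work quite differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical shape of one reranked candidate.
    doc_id: str
    score: float  # reranker confidence, assumed to lie in [0, 1]
    evidence: list[str] = field(default_factory=list)  # quoted supporting passages


def decide(candidates: list[Candidate], min_score: float = 0.75) -> Optional[Candidate]:
    """Return the best candidate that clears the gate, or None for no_match."""
    # A candidate must clear the score threshold AND carry evidence;
    # a high score with no supporting passage is not enough.
    passing = [c for c in candidates if c.score >= min_score and c.evidence]
    if not passing:
        return None  # abstain rather than force a match
    return max(passing, key=lambda c: c.score)
```

Under these assumptions, a negative-control query whose 15 candidates all lack evidence yields `None`, which the caller would report as no_match instead of picking the least-bad option.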
He does not trust a green dashboard
Operating a production GEO analytics platform (BuzzView), he found a severity-1 observability blind spot: a log-prefix bug had silently broken level-based alerting in the Grafana/Loki pipeline, so error-level queries had been quietly returning nothing for months. He found it by querying the log store directly rather than believing the dashboard, then fixed ingestion and left copy-paste verification commands behind. He assumes monitoring can lie until he has checked it himself.
BuzzView · GRAFANA / LOKI · INCIDENT S1
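The page does not say what the prefix was or how levels were extracted, so the following is only a hypothetical illustration of this class of bug: a level parser that assumes the level is the first token silently stops matching once a prefix appears, and error counts drop to zero without any failure being raised. The log format, regexes and function names are all assumptions.

```python
import re

# Hypothetical strict parser: assumes the level is the first token on the line.
STRICT = re.compile(r"^(DEBUG|INFO|WARN|ERROR)\b")
# More tolerant variant: finds the first level keyword anywhere in the line.
# It can over-match if a message body happens to contain a level word.
TOLERANT = re.compile(r"\b(DEBUG|INFO|WARN|ERROR)\b")


def level_strict(line: str):
    m = STRICT.match(line)
    return m.group(1) if m else None


def level_tolerant(line: str):
    m = TOLERANT.search(line)
    return m.group(1) if m else None


def count_errors(lines, extract) -> int:
    """Count lines whose extracted level is ERROR."""
    return sum(1 for line in lines if extract(line) == "ERROR")
```

Comparing the two counts on the same raw lines is the code-level equivalent of querying the log store directly: if the strict count is zero while a plain text search finds errors, the level field itself is broken.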
He validates claims with numbers — including cost
On the same platform’s five-provider LLM layer he drove the validation pass that confirmed a model-registry and cost-calculator correction, and caught a missing-API-key gap that had silently broken one provider in staging before it reached customers. He cares about per-request cost and whether the billing math is actually right.
BuzzView · COST ACCOUNTING · 5 PROVIDERS
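Per-request cost accounting of the kind described here usually reduces to token counts multiplied by per-token prices. The sketch below assumes prices are quoted in USD per million tokens; the `PRICES` registry, the model name and the numbers in it are placeholders, not BuzzView's registry or any provider's real pricing.

```python
from decimal import Decimal

# Hypothetical registry: USD per 1M tokens. Placeholder values only.
PRICES = {
    "model-a": {"input": Decimal("1.00"), "output": Decimal("4.00")},
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request in USD, using exact decimal arithmetic."""
    try:
        price = PRICES[model]
    except KeyError:
        # Fail loudly: a missing registry entry should never bill as zero.
        raise KeyError(f"no price registered for model {model!r}") from None
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / MILLION
```

Using `Decimal` avoids floating-point drift when many small per-request costs are summed, and raising on an unknown model makes a registry gap visible instead of silently under-billing.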
He ships things you can run, tested and offline-first
His public geocheck tool is unit-tested with transparent metric formulas, has an extractor graded against human labels behind a CI gate, and runs offline with no API key so anyone can try it. CiteStreak’s foundation landed with its test suite green and a CI migration-safety gate that blocks destructive schema changes. He would rather hand you something you can run than a screenshot.
geocheck · CiteStreak · CI GATES
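A migration-safety gate like the one mentioned for CiteStreak could, in its simplest form, scan pending migration SQL for destructive statements and fail the build if any are found. The sketch below is an assumption about how such a check might look, not CiteStreak's implementation; the keyword list and the `destructive_statements` name are hypothetical.

```python
import re

# Statement shapes treated as destructive in this sketch.
DESTRUCTIVE = re.compile(
    r"\b(DROP\s+(TABLE|COLUMN|SCHEMA|DATABASE)|TRUNCATE)\b",
    re.IGNORECASE,
)


def destructive_statements(sql: str) -> list[str]:
    """Return the statements in `sql` that match a destructive pattern."""
    hits = []
    # Naive split on ';' -- good enough for a sketch, but it would
    # mis-split semicolons inside string literals or function bodies.
    for stmt in sql.split(";"):
        stmt = stmt.strip()
        if stmt and DESTRUCTIVE.search(stmt):
            hits.append(stmt)
    return hits
```

A CI step would run this over each new migration file and exit non-zero when the returned list is non-empty, leaving an explicit override for the rare intentional drop.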
02 —
Skills & stack
breadth with a spine
Genuinely full-stack, but organised around a clear centre of gravity: reliable LLM systems and the AI-visibility / GEO niche. Proficiency is justified by evidence, not aspiration — expert means shipped repeatedly at depth; strong means real working systems, defensible end to end; working means used but bounded; exposure is flagged honestly as learning, not mastery.
A formal eval harness — held-out sets, Precision@K, judge-vs-human agreement, a CI regression gate — is specified and partly demonstrated in geocheck, and actively being closed; he does not claim it as long-shipped everywhere.
Shipped observability is Grafana / Loki / Tempo + OpenTelemetry. Langfuse, Arize Phoenix and LangSmith are domain-known, not shipped — listed as exposure only.
Fine-tuning (QLoRA / FSDP) was applied on real managed-GPU infra but adapted from public tutorials — range, not a specialty.
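Two of the eval-harness metrics named above, Precision@K and judge-vs-human agreement, can be sketched as follows. The function names are illustrative, and plain percent agreement is used here as an assumption; the page does not say which agreement statistic the harness actually uses.

```python
def precision_at_k(ranked_ids, relevant_ids, k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Divides by k even when fewer than k items were returned,
    # so short result lists are penalised.
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def agreement_rate(judge_labels, human_labels) -> float:
    """Share of items where the LLM judge and the human gave the same label."""
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(1 for j, h in zip(judge_labels, human_labels) if j == h)
    return matches / len(judge_labels)
```

A CI regression gate would then compare these numbers on a held-out set against a stored baseline and fail when they drop below an agreed tolerance.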
03 —
What he's working toward
open to new roles & engagements
Shape
Open to remote full-time, contract, or a founding / early-engineer seat. Comfortable owning a product end to end or going deep on the reliability and infra layer.
Markets
Remote-first US, Europe / EU, and the Gulf / MENA region.
Logistics
Based in Islamabad (UTC+5). Full overlap with EU and Gulf hours, plus a solid block of US-Eastern mornings — so real-time collaboration works on either side.
Role fit
AI / GenAI application engineering (agents, MCP, RAG, evals, LLM-ops), full-stack on AI products, and GEO / AI-visibility work for AI-search and martech teams.
Visa sponsorship
Not required
Timezone
UTC+5 · EU + US-East AM
Tenure
~4 yrs · one employer
Remote · contract · full-time · founding
US · EU · Gulf / MENA
No visa sponsorship required
EU + US-Eastern overlap
Salary, rates and specific terms are between Abdul and the person hiring — the honest answer there is to reach out directly.
Ask my AI twin
Grounded in Abdul's work. Try to break it.
It answers from the full corpus of projects, incidents and decisions, and names the relevant project in prose when it helps. No source chips, no spin.
Ask me anything about Abdul's work — his projects, stack, how he works, or what he's looking for. I answer only from his record, and I'll say so when I don't know.