ABOUT · ISLAMABAD · UTC+5

Engineer first, careful by default

Abdul Rehman Baber is an AI and full-stack engineer with about four years of experience building and operating production LLM systems. His main lane is the reliability side of AI engineering: agents and Model Context Protocol (MCP) tooling, Retrieval-Augmented Generation (RAG), and the evaluation, observability and cost work that keeps those systems honest in production. A specific focus inside that lane is building practical agentic assistants on OpenClaw and Hermes Agent, the kind that reason, use real tools, automate workflows, remember context and take action rather than just chat. His sharper edge is AI-search visibility, or Generative Engine Optimization (GEO): measuring whether and how a brand gets mentioned and cited inside AI answers across ChatGPT, Google AI Overviews, Perplexity, Claude and Gemini.

He is Lead AI & Full-Stack Developer at Project Hamburg Research (remote). He joined around 2022 as an AI & Full-Stack Developer and moved up to Lead in 2024. It is his only employer, roughly four years in one place, and most of his AI-visibility and SaaS work (the Buzzmatic product brand, including BuzzView and CiteStreak) was built in and around that role. His depth comes from a real title, real projects and open-source work rather than a long list of companies. He works in Python and TypeScript, and he is known for being careful about what he claims: he would rather show a test or a log line than a green dashboard.

On the academic side he holds an MSc in Project Management (SZABIST University, Islamabad, 2021–2023) and a BSc in Computer Science (National University of Modern Languages, Islamabad, 2015–2020). Continuing education is DeepLearning.AI work — AI Agents with LangGraph, Multimodal RAG, Knowledge Graphs for RAG, LLMOps Testing & Evaluation — plus an AWS Solutions Architect bootcamp.

Lead since 2024 · joined 2022
MSc Project Management · BSc Computer Science
One employer, ~4 years
01

How he works

each one tied to something he built
He treats a wrong answer as worse than no answer — and designs for abstention.
the throughline

Intellectual honesty is the consistent signal: he refuses to inflate, he won't claim he “built” a product he only operated, and his own project notes openly flag what is a prototype, what is in-memory-only, and what is specified but not yet built. Each value below is grounded in something he actually shipped or did — not a slogan.

A wrong answer is worse than no answer

On his RAG backend — the OER textbook-matching prototype, OERMatch — he built a deterministic gate plus evidence-required reranking so the system refuses to match when it is not confident. The proof is a negative-control query, a UK insolvency-law citation, that came back as no_match across all 15 candidates instead of forcing a false positive. He designs for abstention.

OERMatch · RAG · NEGATIVE CONTROLS
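
As a rough illustration of the gate-then-abstain idea described above, the Python sketch below refuses to return a match unless a candidate both carries supporting evidence and clears a score threshold. It is not OERMatch's code: the `Candidate` fields, the `decide` function and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

NO_MATCH = "no_match"


@dataclass
class Candidate:
    doc_id: str
    score: float              # hypothetical reranker confidence in [0, 1]
    evidence: Optional[str]   # quoted passage supporting the match, if any


def decide(candidates: list[Candidate], min_score: float = 0.8) -> str:
    """Return the best eligible candidate's id, or NO_MATCH when none qualifies."""
    # Evidence-required: a candidate without a supporting passage is never eligible,
    # however high its score is.
    eligible = [c for c in candidates if c.evidence and c.score >= min_score]
    if not eligible:
        return NO_MATCH  # abstain rather than force the least-bad candidate
    return max(eligible, key=lambda c: c.score).doc_id
```

A negative control in this setup is simply a query where every candidate should fail the gate, so the expected result is `NO_MATCH` regardless of how many candidates were retrieved.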

He does not trust a green dashboard

Operating a production GEO analytics platform (BuzzView), he found a severity-1 observability blind spot: a log-prefix bug had silently broken level-based alerting in the Grafana/Loki pipeline, so error-level queries had been quietly returning nothing for months. He found it by querying the log store directly rather than believing the dashboard, then fixed ingestion and left copy-paste verification commands behind. He assumes monitoring can lie until he has checked it himself.

BuzzView · GRAFANA / LOKI · INCIDENT S1
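
The page does not describe the exact log format or prefix involved, so the following Python sketch is a hypothetical reconstruction of the failure class, not of the BuzzView incident itself. It assumes a parser that only recognises `level=<name>` at the very start of a line, and shows how a prepended prefix leaves every line unlabelled while the raw text still contains errors.

```python
import re
from collections import Counter

# Assumed format: the pipeline expects each line to begin with "level=<name>".
LEVEL_AT_START = re.compile(r"^level=(\w+)")


def parsed_levels(lines: list[str]) -> Counter:
    """Count lines per level the way a start-anchored parser would see them."""
    counts: Counter = Counter()
    for line in lines:
        m = LEVEL_AT_START.match(line)
        counts[m.group(1) if m else "unparsed"] += 1
    return counts


def blind_spot(lines: list[str]) -> bool:
    """True when the raw text has error lines that the parser never labels."""
    raw_errors = sum("level=error" in line for line in lines)
    return raw_errors > 0 and parsed_levels(lines)["error"] == 0
```

The check mirrors the habit described above: compare what the labelled view reports against a direct search of the raw lines, and treat a mismatch as a monitoring fault rather than as good news.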

He validates claims with numbers — including cost

On the same platform’s five-provider LLM layer he drove the validation pass that confirmed a model-registry and cost-calculator correction, and caught a missing-API-key gap that had silently broken one provider in staging before it reached customers. He cares about per-request cost and whether the billing math is actually right.

BuzzView · COST ACCOUNTING · 5 PROVIDERS
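
To make the per-request cost math concrete, here is a minimal Python sketch of a model registry with a cost calculator and a missing-key check. Everything in it is assumed for illustration: the model names, the environment variable names and the prices are placeholders, not BuzzView's registry or any provider's real rates.

```python
import os
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelPrice:
    input_per_mtok: Decimal   # USD per 1,000,000 input tokens
    output_per_mtok: Decimal  # USD per 1,000,000 output tokens
    key_env: str              # environment variable holding the provider's API key


# Hypothetical registry entries; prices are placeholders, not real rates.
MODEL_REGISTRY = {
    "provider-a/small": ModelPrice(Decimal("0.50"), Decimal("1.50"), "PROVIDER_A_API_KEY"),
    "provider-b/large": ModelPrice(Decimal("3.00"), Decimal("15.00"), "PROVIDER_B_API_KEY"),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request in USD, using Decimal to avoid float rounding drift."""
    price = MODEL_REGISTRY[model]
    total = price.input_per_mtok * input_tokens + price.output_per_mtok * output_tokens
    return total / Decimal(1_000_000)


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Names of API-key variables that are unset or empty for registered models."""
    env = os.environ if env is None else env
    return sorted({p.key_env for p in MODEL_REGISTRY.values() if not env.get(p.key_env)})
```

Running `missing_keys()` at startup in each environment is one way a silently unconfigured provider could surface in staging instead of in front of customers.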

He ships things you can run, tested and offline-first

His public geocheck tool is unit-tested with transparent metric formulas, has an extractor graded against human labels behind a CI gate, and runs offline with no API key so anyone can try it. CiteStreak’s foundation landed with its test suite green and a CI migration-safety gate that blocks destructive schema changes. He would rather hand you something you can run than a screenshot.

geocheck · CiteStreak · CI GATES
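
A migration-safety gate of the kind mentioned above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not CiteStreak's actual gate: the pattern list and the `destructive_statements` helper are hypothetical, and the statement splitting is deliberately naive (it splits on `;` and only strips `--` line comments).

```python
import re

# Assumed definition of "destructive" for this sketch; a real gate may cover more.
DESTRUCTIVE = re.compile(
    r"\b(DROP\s+(TABLE|COLUMN|SCHEMA|DATABASE)|TRUNCATE)\b", re.IGNORECASE
)


def destructive_statements(sql: str) -> list[str]:
    """Return the statements in a migration that match a destructive pattern."""
    # Strip "--" line comments so commented-out SQL does not trip the gate.
    without_comments = re.sub(r"--[^\n]*", "", sql)
    statements = [s.strip() for s in without_comments.split(";") if s.strip()]
    return [s for s in statements if DESTRUCTIVE.search(s)]
```

A CI step could call this on each new migration file and exit non-zero when the returned list is not empty. Note the limits of the pattern: a column drop written without the `COLUMN` keyword, or SQL containing `;` inside string literals, would need a real SQL parser.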
02

Skills & stack

breadth with a spine

Genuinely full-stack, but organised around a clear centre of gravity: reliable LLM systems and the AI-visibility / GEO niche. Proficiency is justified by evidence, not aspiration — expert means shipped repeatedly at depth; strong means real working systems, defensible end to end; working means used but bounded; exposure is flagged honestly as learning, not mastery.

AI / LLM systems

Python & TypeScript · ~4 years in production

  • LangGraph (strong)
  • LangChain / LCEL (strong)
  • Custom tool authoring (strong)
  • Multi-model benchmarking (strong)
  • Structured output: Pydantic / Zod (strong)
  • Prompt engineering (working)
  • Fine-tuning: QLoRA / FSDP (exposure)

RAG & retrieval

  • Hybrid retrieval: dense + sparse (strong)
  • LLM reranking (strong)
  • pgvector / HNSW (strong)
  • Entity resolution & corroboration (strong)
  • Knowledge-graph RAG: Neo4j / Cypher (working)
  • Multimodal RAG: CLIP / Chroma (working)
  • Semantic caching (working)

Evals & reliability

Lead-with signal — reliability primitives shipped; full eval harness in progress

  • Negative controls / anti-hallucination (strong)
  • Deterministic fallback gating (strong)
  • Retry / backoff / idempotency (strong)
  • Multi-key rotation / rate-limit resilience (strong)
  • CI test & eval gates (working)
  • LLM-as-judge loops (working)
  • Formal eval harness: Precision@K (exposure, building)

Agents & MCP

Agentic assistants on OpenClaw & Hermes — the applied edge

  • MCP server authoring (strong)
  • MCP client integration, multi-surface (strong)
  • Multi-step decomposition / tool streaming (strong)
  • OpenClaw: agent orchestration (strong)
  • Hermes Agent: proactive assistants (strong)
  • Subagent delegation & coordination (strong)
  • Tool-use & API automation (strong)
  • Agent memory systems (working)
  • Scheduled / proactive workers (working)
  • Channel delivery: WhatsApp / Telegram (working)
  • Transport: SSE / stdio / WebSocket (working)

AI-visibility / GEO

The differentiating niche

  • GEO KPI & metric design (strong)
  • Entity-authority scoring (strong)
  • Multi-engine answer collection (strong)
  • AI-Overview / SERP scraping at scale (strong)
  • llms.txt & AI-crawler policy (strong)
  • JSON-LD entity graphs (strong)
  • Competitive / market intelligence (working)

Observability & LLM-ops

  • Grafana / Loki / Promtail (strong)
  • Structured logging & incident discipline (strong)
  • Production crash / loop diagnosis (strong)
  • Model registry & cost accounting (strong)
  • OpenTelemetry / Tempo tracing (working)
  • Langfuse / Phoenix / LangSmith (exposure)

Backend & distributed

  • NestJS / Fastify (strong)
  • FastAPI (strong)
  • Multi-tenancy & Row-Level Security (strong)
  • Billing correctness: ledger / Stripe (strong)
  • Parallel orchestration & partial-failure (strong)
  • ETL with provenance (strong)
  • Temporal durable workflows (working)
  • BullMQ / Redis · Elasticsearch (working)

Frontend

  • Next.js: App Router / RSC / Server Actions (strong)
  • React 18 / 19 (strong)
  • TypeScript (strong)
  • Auth: NextAuth / Better Auth / OAuth (strong)
  • Prisma / Drizzle on Neon (strong)
  • Tailwind / shadcn (strong)
  • PWA / offline-first (strong)
  • Doc generation: PDF / DOCX (strong)
  • Chart.js data-viz (working)

Infra & DevOps

  • AWS: EC2 / Lambda / SageMaker / S3 / IAM (strong)
  • Docker / Compose (strong)
  • CI/CD: GitHub Actions (strong)
  • Vercel / Railway deploy (strong)
  • Linux networking / nftables (strong)
  • From-source C builds (working)
  • Nginx / TLS / fail2ban (working)
  • Agentic DevOps over SSH (working)

Languages, tooling & platforms

  • Python (expert)
  • TypeScript / JavaScript / Node (strong)
  • SQL / Cypher (strong)
  • PostgreSQL · pgvector (strong)
  • Google Analytics 4 (working)
  • Google Search Console (working)
  • AWS · GCP (working)

Honesty flags

  • A formal eval harness (held-out sets, Precision@K, judge-vs-human agreement, a CI regression gate) is specified and partly demonstrated in geocheck, and the remaining gap is actively being closed; he does not claim it as long-shipped everywhere.
  • Shipped observability is Grafana / Loki / Tempo + OpenTelemetry. Langfuse, Arize Phoenix and LangSmith are domain-known, not shipped — listed as exposure only.
  • Fine-tuning (QLoRA / FSDP) was applied on real managed-GPU infra but adapted from public tutorials — range, not a specialty.
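
For readers unfamiliar with the two metrics named in the first flag, the Python sketch below shows their standard definitions. Precision@K is the share of the top K retrieved items that are relevant, and judge-vs-human agreement is shown here as a plain match rate. The function names are illustrative only and are not taken from geocheck.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable], k: int) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Dividing by k (not by the number returned) penalises short result lists.
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def agreement_rate(judge_labels: Sequence[Hashable], human_labels: Sequence[Hashable]) -> float:
    """Share of items where the LLM judge's label equals the human label."""
    if not judge_labels or len(judge_labels) != len(human_labels):
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)
```

A CI regression gate would then compare these numbers on a held-out set against a stored baseline and fail the build when they drop. A raw match rate does not correct for chance agreement, so a stricter harness might use a chance-corrected statistic instead.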
03

What he's working toward

open to new roles & engagements

Shape

Open to remote full-time, contract, or a founding / early-engineer seat. Comfortable owning a product end to end or going deep on the reliability and infra layer.

Markets

Remote-first US, Europe / EU, and the Gulf / MENA region.

Logistics

Based in Islamabad (UTC+5). Full overlap with EU and Gulf hours, plus a solid block of US-Eastern mornings — so real-time collaboration works on either side.

Role fit

AI / GenAI application engineering (agents, MCP, RAG, evals, LLM-ops), full-stack on AI products, and GEO / AI-visibility work for AI-search and martech teams.

Visa sponsorship: Not required
Timezone: UTC+5 · EU + US-East AM
Tenure: ~4 yrs · one employer

Remote · contract · full-time · founding
US · EU · Gulf / MENA
No visa sponsorship required
EU + US-Eastern overlap

Salary, rates and specific terms are between Abdul and the person hiring — the honest answer there is to reach out directly.

04

Contact

the best ways to reach him

The fastest way to vet me

Ask my AI twin anything about the work

It answers from the full corpus of projects, incidents and decisions — and names the relevant project in prose when it helps. No source chips, no spin.