ABOUT · ISLAMABAD · UTC+5

Engineer first, careful by default

Abdul Rehman Baber is an AI and full-stack engineer with about four years building and operating production LLM systems. His main lane is the reliability side of AI engineering — agents and Model Context Protocol (MCP) tooling, Retrieval-Augmented Generation (RAG), and the evaluation, observability and cost work that keeps those systems honest in production. A specific focus inside that lane is building practical agentic assistants on OpenClaw and Hermes Agent — the kind that reason, use real tools, automate workflows, remember context and take action, rather than just chat. His sharper edge is AI-search visibility, or Generative Engine Optimization (GEO): measuring whether and how a brand gets mentioned and cited inside AI answers across ChatGPT, Google AI Overview, Perplexity, Claude and Gemini.

He is Lead AI & Full-Stack Developer at Project Hamburg Research (remote). He joined around 2022 as an AI & Full-Stack Developer and moved up to Lead in 2024. It is his only employer, roughly four years in one place, and most of his AI-visibility and SaaS work, including the production GEO platform he operates and the GEO monitoring SaaS he is building, was done in and around that role. His depth comes from a real title, real projects and open-source work rather than a long list of companies. He works in Python and TypeScript, and he is known for being careful about what he claims: he would rather show a test or a log line than a green dashboard.

On the academic side he holds an MSc in Project Management (SZABIST University, Islamabad, 2021–2023) and a BSc in Computer Science (National University of Modern Languages, Islamabad, 2015–2020). His continuing education consists of DeepLearning.AI coursework (AI Agents with LangGraph, Multimodal RAG, Knowledge Graphs for RAG, LLMOps Testing & Evaluation) plus an AWS Solutions Architect bootcamp.

Lead since 2024 · joined 2022
MSc Project Management · BSc Computer Science
One employer, ~4 years

01 —

How he works

each one tied to something he built

He treats a wrong answer as worse than no answer — and designs for abstention.

the throughline

Intellectual honesty is the consistent signal: he does not inflate his record, he won't claim he “built” a product he only operated, and his own project notes openly flag what is a prototype, what is in-memory-only, and what is specified but not yet built. Each value below is grounded in something he actually shipped or did, not a slogan.

A wrong answer is worse than no answer

On his RAG backend — the OER textbook-matching prototype, OERMatch — he built a deterministic gate plus evidence-required reranking so the system refuses to match when it is not confident. The proof is a negative-control query, a UK insolvency-law citation, that came back as no_match across all 15 candidates instead of forcing a false positive. He designs for abstention.

OERMatch · RAG · NEGATIVE CONTROLS
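The abstention behaviour described in this card can be illustrated with a minimal sketch. It is not OERMatch's actual code: the `Candidate` shape, the `MIN_SCORE` threshold and the `match_or_abstain` helper are all hypothetical, chosen only to show a deterministic gate that needs both a confident score and supporting evidence before it returns a match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    title: str
    score: float             # reranker confidence in [0, 1] (hypothetical scale)
    evidence: Optional[str]  # quoted passage supporting the match, if any


# Hypothetical threshold; a real system would tune this on labelled data.
MIN_SCORE = 0.75


def match_or_abstain(candidates: list[Candidate]) -> str:
    """Return the best candidate's title, or "no_match" when nothing qualifies."""
    # Deterministic gate: a candidate must clear the score threshold AND
    # carry non-empty evidence; a high score alone is not enough.
    qualified = [
        c for c in candidates
        if c.score >= MIN_SCORE and c.evidence and c.evidence.strip()
    ]
    if not qualified:
        return "no_match"
    return max(qualified, key=lambda c: c.score).title


# Negative control: an off-topic query should yield no qualifying candidate.
negative_control = [Candidate(f"Textbook {i}", 0.30, None) for i in range(15)]

positive_case = [
    Candidate("Intro Biology", 0.91, "Chapter 3 covers cell membranes."),
    Candidate("Physics I", 0.95, None),  # higher score, but no evidence
]
```

In this sketch the negative control abstains across all 15 candidates, and the positive case picks the evidenced candidate even though an unevidenced one scored higher.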

He does not trust a green dashboard

Operating a production GEO analytics platform, he found a severity-1 observability blind spot: a log-prefix bug had silently broken level-based alerting in the Grafana/Loki pipeline, so error-level queries had been quietly returning nothing for months. He found it by querying the log store directly rather than believing the dashboard, then fixed ingestion and left copy-paste verification commands behind. He assumes monitoring can lie until he has checked it himself.

GEO PLATFORM · GRAFANA / LOKI · INCIDENT S1
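A small sketch shows how a log prefix can blind level-based alerting while the dashboard stays green. The log format, the `[api]` prefix and both regular expressions are assumptions for illustration, not the platform's real pipeline configuration. The point is the cross-check: compare what the level label sees against what the raw log lines contain.

```python
import re

# Hypothetical log format: "<LEVEL> <message>". The anchored pattern mirrors
# a pipeline stage that expects the level at the very start of the line.
STRICT_LEVEL = re.compile(r"^(DEBUG|INFO|WARN|ERROR)\b")
# Tolerant pattern: finds the first level token anywhere in the line.
TOLERANT_LEVEL = re.compile(r"\b(DEBUG|INFO|WARN|ERROR)\b")


def count_errors(lines: list[str], pattern: re.Pattern) -> int:
    """Count lines whose extracted level is ERROR under the given pattern."""
    total = 0
    for line in lines:
        found = pattern.search(line)
        if found and found.group(1) == "ERROR":
            total += 1
    return total


# A prefix (here a hypothetical "[api] " tag) pushes the level off column 0.
raw_lines = [
    "[api] INFO request accepted",
    "[api] ERROR upstream timeout",
    "[api] ERROR provider returned 500",
]

labelled = count_errors(raw_lines, STRICT_LEVEL)    # what the alert rule sees
in_store = count_errors(raw_lines, TOLERANT_LEVEL)  # what the raw logs contain

# A mismatch means level-based alerts are blind to real errors.
blind_spot = in_store > 0 and labelled == 0
```

Here the strict parser finds no error-level lines while the raw lines contain two. A direct query against the log store exposes that mismatch, and a dashboard built on the level label does not.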

He validates claims with numbers — including cost

On the same platform’s five-provider LLM layer he drove the validation pass that confirmed a model-registry and cost-calculator correction, and caught a missing-API-key gap that had silently broken one provider in staging before it reached customers. He cares about per-request cost and whether the billing math is actually right.

GEO PLATFORM · COST ACCOUNTING · 5 PROVIDERS
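The kind of check this card describes can be sketched as follows. The registry contents, prices, model names and environment-variable names are placeholders, not the platform's real values. The sketch shows per-request cost computed from a model registry with exact decimal arithmetic, plus a pre-flight check for missing provider keys.

```python
from decimal import Decimal

# Hypothetical registry: model name -> USD price per 1M input / output tokens.
# These figures are placeholders, not real provider prices.
MODEL_REGISTRY = {
    "model-a": {"input": Decimal("1.00"), "output": Decimal("4.00")},
    "model-b": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}
PER_MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request in USD; raises KeyError for unregistered models."""
    price = MODEL_REGISTRY[model]
    return (
        price["input"] * input_tokens + price["output"] * output_tokens
    ) / PER_MILLION


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Names of required API-key variables that are absent or blank."""
    return [name for name in required if not env.get(name, "").strip()]


# Usage: 1,000 input and 500 output tokens on the placeholder "model-a".
example_cost = request_cost("model-a", 1000, 500)

# Usage: a staging environment where one provider key was never set.
staging_env = {"PROVIDER_A_KEY": "set", "PROVIDER_B_KEY": ""}
gaps = missing_keys(staging_env, ["PROVIDER_A_KEY", "PROVIDER_B_KEY", "PROVIDER_C_KEY"])
```

Using `Decimal` instead of floats keeps the billing math exact. Raising on an unregistered model makes a registry gap fail loudly instead of silently costing zero.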

He ships things you can run, tested and offline-first

His public geocheck tool is unit-tested with transparent metric formulas, has an extractor graded against human labels behind a CI gate, and runs offline with no API key so anyone can try it. His GEO-monitoring SaaS foundation landed with its test suite green and a CI migration-safety gate that blocks destructive schema changes. He would rather hand you something you can run than a screenshot.

geocheck · GEO SAAS · CI GATES
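Grading an extractor against human labels behind a CI gate could look roughly like the sketch below. It does not reproduce geocheck's formulas: the function names, the brand names and the 0.9 / 0.8 thresholds are assumptions made for the example.

```python
def precision_recall(predicted: set[str], human: set[str]) -> tuple[float, float]:
    """Precision and recall of extracted items against human-labelled items."""
    true_positives = len(predicted & human)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(human) if human else 0.0
    return precision, recall


def gate_passes(
    predicted: set[str],
    human: set[str],
    min_precision: float = 0.9,  # hypothetical threshold
    min_recall: float = 0.8,     # hypothetical threshold
) -> bool:
    """True when the extractor meets both thresholds on the labelled set."""
    precision, recall = precision_recall(predicted, human)
    return precision >= min_precision and recall >= min_recall


# Brands a human annotator found in one AI answer (made-up names).
human_labels = {"Acme", "Globex", "Umbrella"}

good_run = {"Acme", "Globex", "Umbrella"}      # matches the labels exactly
regressed_run = {"Acme", "Globex", "Initech"}  # one miss, one false positive
```

A CI job would exit non-zero whenever `gate_passes` returns `False`, so a regression in the extractor blocks the merge before it ships.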

02 —

Skills & stack

breadth with a spine

Genuinely full-stack, but organised around a clear centre of gravity: reliable LLM systems and the AI-visibility / GEO niche. Proficiency is justified by evidence, not aspiration — expert means shipped repeatedly at depth; strong means real working systems, defensible end to end; working means used but bounded; exposure is flagged honestly as learning, not mastery.

AI / LLM systems

Python & TypeScript · ~4 years in production

LangGraph (strong)
LangChain / LCEL (strong)
Custom tool authoring (strong)
Multi-model benchmarking (strong)
Structured output: Pydantic / Zod (strong)
Prompt engineering (working)
Fine-tuning: QLoRA / FSDP (exposure)

RAG & retrieval

Hybrid retrieval: dense + sparse (strong)
LLM reranking (strong)
pgvector / HNSW (strong)
Entity resolution & corroboration (strong)
Knowledge-graph RAG: Neo4j / Cypher (working)
Multimodal RAG: CLIP / Chroma (working)
Semantic caching (working)

Evals & reliability

Lead-with signal — reliability primitives shipped; full eval harness in progress

Negative controls / anti-hallucination (strong)
Deterministic fallback gating (strong)
Retry / backoff / idempotency (strong)
Multi-key rotation / rate-limit resilience (strong)
CI test & eval gates (working)
LLM-as-judge loops (working)
Formal eval harness: Precision@K (exposure, building)

Agents & MCP

Agentic assistants on OpenClaw & Hermes — the applied edge

MCP server authoring (strong)
MCP client integration, multi-surface (strong)
Multi-step decomposition / tool streaming (strong)
OpenClaw: agent orchestration (strong)
Hermes Agent: proactive assistants (strong)
Subagent delegation & coordination (strong)
Tool-use & API automation (strong)
Agent memory systems (working)
Scheduled / proactive workers (working)
Channel delivery: WhatsApp / Telegram (working)
Offline voice: faster-whisper / Piper (working)
Transport: SSE / stdio / WebSocket (working)

AI-visibility / GEO

The differentiating niche

GEO KPI & metric design (strong)
Entity-authority scoring (strong)
Multi-engine answer collection (strong)
AI-Overview / SERP scraping at scale (strong)
Anti-bot / TLS-fingerprint collection (strong)
Multi-cloud scraper deploy: Lambda / Cloud Run (working)
llms.txt & AI-crawler policy (strong)
JSON-LD entity graphs (strong)
Competitive / market intelligence (working)

Observability & LLM-ops

Grafana / Loki / Promtail (strong)
Structured logging & incident discipline (strong)
Production crash / loop diagnosis (strong)
Model registry & cost accounting (strong)
OpenTelemetry / Tempo tracing (working)
Langfuse / Phoenix / LangSmith (exposure)

Backend & distributed

NestJS / Fastify (strong)
FastAPI (strong)
Multi-tenancy & Row-Level Security (strong)
Billing correctness: ledger / Stripe (strong)
Parallel orchestration & partial-failure (strong)
ETL with provenance (strong)
Temporal durable workflows (working)
BullMQ / Redis · Elasticsearch (working)

Frontend

Next.js: App Router / RSC / Server Actions (strong)
React 18 / 19 (strong)
TypeScript (strong)
Auth: NextAuth / Better Auth / OAuth (strong)
Prisma / Drizzle on Neon (strong)
Tailwind / shadcn (strong)
PWA / offline-first (strong)
Doc generation: PDF / DOCX (strong)
Chart.js data-viz (working)

Infra & DevOps

AWS: EC2 / Lambda / SageMaker / S3 / IAM (strong)
Docker / Compose (strong)
CI/CD: GitHub Actions (strong)
Vercel / Railway deploy (strong)
Linux networking / nftables (strong)
From-source C builds (working)
Nginx / TLS / fail2ban (working)
Agentic DevOps over SSH (working)

Languages, tooling & platforms

Python (expert)
TypeScript / JavaScript / Node (strong)
SQL / Cypher (strong)
PostgreSQL · pgvector (strong)
Google Analytics 4 (working)
Google Search Console (working)
AWS · GCP (working)

Honesty flags

A formal eval harness (held-out sets, Precision@K, judge-vs-human agreement, a CI regression gate) is specified and partly demonstrated in geocheck, and the remaining gap is actively being closed; he does not claim it as long-shipped everywhere.
Shipped observability is Grafana / Loki / Tempo + OpenTelemetry. Langfuse, Arize Phoenix and LangSmith are domain-known, not shipped — listed as exposure only.
Fine-tuning (QLoRA / FSDP) was applied on real managed-GPU infra but adapted from public tutorials — range, not a specialty.
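For readers unfamiliar with two of the metrics named in the first flag above, here is a minimal sketch of Precision@K and a plain judge-vs-human agreement rate. These are the textbook definitions, written with hypothetical function names and sample data; they are not taken from his harness.

```python
def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant.

    Divides by k even when fewer than k results come back, so a short
    result list is penalised instead of flattered.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def agreement_rate(judge: list[bool], human: list[bool]) -> float:
    """Share of items where the LLM judge and the human label coincide."""
    if not judge or len(judge) != len(human):
        raise ValueError("label lists must be non-empty and the same length")
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


# Usage with made-up data: 2 of the top 3 retrieved documents are relevant.
p_at_3 = precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=3)

# Usage with made-up data: judge and human agree on 2 of 4 verdicts.
agreement = agreement_rate([True, False, True, True], [True, True, True, False])
```

Raw agreement is the simplest choice here. A chance-corrected statistic such as Cohen's kappa would be a stricter alternative when the labels are imbalanced.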

03 —

What he's working toward

open to new roles & engagements

Shape

Open to remote full-time, contract, or a founding / early-engineer seat. Comfortable owning a product end to end or going deep on the reliability and infra layer.

Markets

Remote-first US, Europe / EU, and the Gulf / MENA region.

Logistics

Based in Islamabad (UTC+5). Full overlap with EU and Gulf hours, plus a solid block of US-Eastern mornings — so real-time collaboration works on either side.

Role fit

AI / GenAI application engineering (agents, MCP, RAG, evals, LLM-ops), full-stack on AI products, and GEO / AI-visibility work for AI-search and martech teams.

Visa sponsorship

Not required

Timezone

UTC+5 · EU + US-East AM

Tenure

~4 yrs · one employer

Remote · contract · full-time · founding
US · EU · Gulf / MENA
No visa sponsorship required
EU + US-Eastern overlap

Salary, rates and specific terms are between Abdul and the person hiring — the honest answer there is to reach out directly.

04 —

Contact

the best ways to reach him

a.r.b.plato@gmail.com

Best route for work and hiring

github.com/ar-baber

The work — and a way to verify it

linkedin.com/in/abdul-rehman-baber-4b5b36200

Background and history

The fastest way to vet me

Ask my AI twin anything about the work

It answers from the full corpus of projects, incidents and decisions — and names the relevant project in prose when it helps. No source chips, no spin.
