Shipped full-stack product

MapleScholar

A production, nonprofit AI research-discovery PWA. Paste a DOI, title, arXiv ID, or OpenAlex ID to resolve a paper across 7 scholarly APIs, chat with it in 70+ languages, and read it as interactive HTML. Sole primary engineer, end to end, with a first-class GEO (generative engine optimization) program.

Next.js · RAG · Supabase · GEO · PWA


Honest outcomes

7 scholarly APIs resolved

70+ chat / translation languages

44 AI crawlers allow-listed for GEO citation

3 runtimes in production (Vercel + Railway + Supabase)

~168 build sessions logged over ~9 months

01 —

Why

Academic papers are locked behind four separate frictions at once: language barriers, inconsistent identifiers (DOI vs arXiv vs OpenAlex vs a raw title), paywalled or awkward PDFs, and discovery that favours already-famous work. Each is annoying on its own; together they keep open science from actually being open.

MapleScholar attacks all four. Paste any identifier, or even a fuzzy title, and it resolves the paper, lets you chat with it in your own language, and renders the PDF as clean, translatable HTML. It is live, in use, and run as a real product by a registered nonprofit, not a demo.

It also had to be found. The growth engine is a deliberate GEO / AI-visibility layer: the goal was for the tool to surface correctly inside ChatGPT, Perplexity, and Claude answers, and that goal was treated as an engineering deliverable rather than marketing.

A static GitHub Pages slideshow became a multi-runtime production platform. The part I am proudest of is the discipline behind it: "appear correctly in an AI answer" was treated as a shipping requirement, not a nice-to-have.

sole primary engineer, ~9 months

02 —

What

Resolution fans a normalized query out to seven scholarly providers (OpenAlex, Crossref, Semantic Scholar, Europe PMC, CORE, Unpaywall, and arXiv), then aggregates the candidates and scores them by title similarity, token overlap, and how many sources corroborate each one, emitting a confidence level and a human-readable explanation of why it matched.
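To make the aggregation step concrete, here is a minimal sketch of scoring already-fetched candidates, written in Python for brevity. The `Candidate` shape, the 0.5/0.3/0.2 weights, the confidence bands, and DOI-based grouping are all hypothetical assumptions for illustration; the production scorer may weigh and group signals differently. Only the three signals themselves (title similarity, token overlap, corroboration count) come from the description above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    source: str               # provider name, e.g. "openalex" or "crossref"
    title: str
    doi: str | None = None


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumeric tokens, so punctuation and case don't matter."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two normalized token sets (0.0 to 1.0)."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def resolve(query: str, candidates: list[Candidate], total_sources: int = 7) -> dict | None:
    # Group candidates describing the same paper: by DOI when present, else by normalized title.
    groups: dict[str, list[Candidate]] = {}
    for c in candidates:
        key = c.doi.lower() if c.doi else normalize(c.title)
        groups.setdefault(key, []).append(c)

    best = None
    for members in groups.values():
        title = members[0].title
        similarity = SequenceMatcher(None, normalize(query), normalize(title)).ratio()
        overlap = token_overlap(query, title)
        sources = sorted({m.source for m in members})
        corroboration = len(sources) / total_sources
        # Hypothetical weights: string similarity dominates, corroboration breaks near-ties.
        score = 0.5 * similarity + 0.3 * overlap + 0.2 * corroboration
        if best is None or score > best["score"]:
            best = {"title": title, "score": score, "similarity": similarity,
                    "overlap": overlap, "sources": sources}

    if best is None:
        return None
    s = best["score"]
    # Hypothetical confidence bands.
    best["confidence"] = "high" if s >= 0.8 else "medium" if s >= 0.6 else "low"
    best["explanation"] = (
        f"title similarity {best['similarity']:.2f}, token overlap {best['overlap']:.2f}, "
        f"corroborated by {len(best['sources'])} of {total_sources} sources "
        f"({', '.join(best['sources'])})"
    )
    return best
```

Keeping the per-signal numbers in the result is what makes the explanation string cheap: the "why it matched" text is just the inputs to the score, printed.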

Comprehension rests on two features: real-time, selection-based translation across 70+ languages (RTL- and LTR-aware), and a paper-grounded AI chat whose highest-priority rule is to answer in the user’s own language. A curated discovery feed classifies papers as Trending, Hidden Gem, or Breakthrough using an explainable, metrics-based model over citation percentiles, FWCI (field-weighted citation impact), and Altmetric attention.
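The feed's classifier is described only as explainable and metrics-based, so the following is a hedged sketch of what such a rule set could look like, not the shipped model. Every threshold constant and the rule priority order (Breakthrough, then Trending, then Hidden Gem) are hypothetical assumptions; only the three input metrics come from the text. Each label is returned together with the human-readable conditions that produced it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PaperMetrics:
    citation_percentile: float   # 0-100, rank within a field/year cohort
    fwci: float                  # field-weighted citation impact; 1.0 = world average
    altmetric: float             # Altmetric attention score


# Hypothetical thresholds, chosen only to illustrate the shape of the rules.
BREAKTHROUGH_MIN_PERCENTILE = 99.0
BREAKTHROUGH_MIN_FWCI = 4.0
TRENDING_MIN_ATTENTION = 100.0
GEM_MIN_FWCI = 1.5
GEM_MAX_ATTENTION = 20.0


def classify(m: PaperMetrics) -> tuple[str | None, list[str]]:
    """Return (label, reasons). Rules run in priority order, so every label is traceable."""
    if m.citation_percentile >= BREAKTHROUGH_MIN_PERCENTILE and m.fwci >= BREAKTHROUGH_MIN_FWCI:
        return "Breakthrough", [
            f"citation percentile {m.citation_percentile:g} >= {BREAKTHROUGH_MIN_PERCENTILE:g}",
            f"FWCI {m.fwci:g} >= {BREAKTHROUGH_MIN_FWCI:g}",
        ]
    if m.altmetric >= TRENDING_MIN_ATTENTION:
        return "Trending", [f"Altmetric attention {m.altmetric:g} >= {TRENDING_MIN_ATTENTION:g}"]
    if m.fwci >= GEM_MIN_FWCI and m.altmetric < GEM_MAX_ATTENTION:
        return "Hidden Gem", [
            f"FWCI {m.fwci:g} >= {GEM_MIN_FWCI:g} (above-average field impact)",
            f"Altmetric attention {m.altmetric:g} < {GEM_MAX_ATTENTION:g} (little public attention)",
        ]
    return None, ["no rule matched"]
```

Plain threshold rules trade some ranking nuance for the property the feed advertises: a reader can see exactly which numbers earned a paper its badge.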

Rendering converts arbitrary PDFs into a clean HTML reading shell via a separate Dockerized FastAPI worker. The worker is deliberately split off the serverless platform because the native converter needs Linux system libraries and exceeds serverless time limits: a runtime-fit decision, not an accident.

03 —

How

The system is multi-runtime: a Next.js 14 App Router web app on Vercel, a Supabase Postgres backend with an admin panel and dry-run-guarded migration scripts, and the Dockerized PDF→HTML worker on Railway with shared-secret auth, SSRF and path-traversal guards, and job caching. Server API routes hold every secret; the discovery feed renders client-side over Supabase.
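The worker's guards can be illustrated with standard-library Python. This is a minimal sketch of the three checks named above plus a cache key; the function names, the idea of hashing the source URL as the job key, and the exact checks are hypothetical assumptions, not the worker's actual code. The SSRF check rejects non-HTTP schemes and any host that resolves to a non-public address (loopback, private ranges, link-local cloud metadata).

```python
from __future__ import annotations

import hashlib
import hmac
import ipaddress
import socket
from pathlib import Path
from urllib.parse import urlparse


def secret_ok(provided: str | None, expected: str) -> bool:
    """Constant-time comparison of the shared secret sent by the web app."""
    if not provided:
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())


def assert_public_url(url: str) -> None:
    """Reject URLs that could make the worker fetch internal resources (SSRF)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    # Check every address the host resolves to, not just the first one.
    for info in socket.getaddrinfo(parsed.hostname, port, proto=socket.IPPROTO_TCP):
        addr = ipaddress.ip_address(info[4][0].split("%")[0])  # drop any IPv6 zone id
        if not addr.is_global:
            raise ValueError(f"non-public address {addr} for {parsed.hostname}")


def safe_output_path(base_dir: Path, name: str) -> Path:
    """Join name onto base_dir, refusing anything that escapes it (path traversal)."""
    base = base_dir.resolve()
    target = (base / name).resolve()
    try:
        target.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes output dir: {name!r}") from None
    return target


def job_key(pdf_url: str) -> str:
    """Hypothetical cache key so repeat conversions of one URL can reuse a finished job."""
    return hashlib.sha256(pdf_url.encode()).hexdigest()
```

One caveat on this design: resolving once and then letting the HTTP client resolve again leaves a DNS-rebinding gap, so a hardened fetch would connect to the checked address and re-run the check on every redirect.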

The Gemini AI layer is hardened for reliability: API-key rotation across GCP projects for independent rate limits, a two-model preference-then-fallback chain, and three-attempt JSON-repair retries against a structured-output contract. Enhancement keys stay server-side so they never reach the client bundle.
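The reliability layer combines three loops: key rotation, model fallback, and JSON repair. The sketch below shows one way to nest them. It deliberately does not call any real Gemini SDK: `call` is a hypothetical injected client `(model, key, prompt) -> raw text`, `RateLimited` is a hypothetical quota error, and the "contract" is reduced to a list of required keys. Model names passed in are placeholders.

```python
from __future__ import annotations

import itertools
import json
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error a model client raises when a key hits its quota (e.g. HTTP 429)."""


def extract_json(text: str) -> dict:
    """Best-effort repair: drop code fences or chatter around the outermost {...} and parse it."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model output")
    return json.loads(text[start : end + 1])  # json.JSONDecodeError is a ValueError


def generate_json(
    prompt: str,
    *,
    models: Sequence[str],                  # preferred model first, fallback after
    keys: Sequence[str],                    # one key per GCP project, each with its own quota
    call: Callable[[str, str, str], str],   # hypothetical client: (model, key, prompt) -> text
    required: Sequence[str],                # minimal output contract: keys that must exist
    attempts: int = 3,
) -> dict:
    key_cycle = itertools.cycle(keys)
    last_error: Exception | None = None
    for model in models:
        current = prompt
        for attempt in range(1, attempts + 1):
            key = next(key_cycle)           # rotate keys on every attempt
            try:
                data = extract_json(call(model, key, current))
            except RateLimited as exc:
                last_error = exc
                continue
            except ValueError as exc:
                last_error = exc
                current = f"{prompt}\n\nRespond with a single valid JSON object only."
                continue
            missing = [k for k in required if k not in data]
            if missing:
                last_error = ValueError(f"missing keys: {missing}")
                current = f"{prompt}\n\nRespond with JSON containing keys: {', '.join(required)}."
                continue
            return {"model": model, "attempt": attempt, "data": data}
    raise RuntimeError(f"all models and attempts failed; last error: {last_error}")
```

In this sketch a rate-limited call still consumes one of the three attempts, which bounds total latency; an alternative is to retry quota errors on the next key without counting them.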

The GEO subsystem is a first-class part of "done": an llms.txt source of truth, a robots policy explicitly allow-listing 44 AI crawlers, a JSON-LD entity graph, an agent-facing research.json, and a written discipline requiring the sitemap, llms.txt, and per-page metadata to be updated with every public-route change.
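The robots policy and the route-change discipline both lend themselves to small build-time checks. The sketch below renders an explicit AI-crawler allow-list and reports public routes missing from the sitemap, llms.txt, or metadata. The crawler tokens are real user-agent names but only an illustrative subset, not the production 44-entry list; the `/sitemap.xml` path, the `/admin` disallow, and the metadata shape (`title` plus `description` per route) are hypothetical assumptions.

```python
from __future__ import annotations

import re
from typing import Iterable, Mapping
from urllib.parse import urlparse

# Illustrative subset of real AI crawler user-agent tokens; NOT the 44-entry production list.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
               "PerplexityBot", "Google-Extended", "Applebot-Extended", "CCBot"]


def render_robots(crawlers: Iterable[str], sitemap_url: str, disallow: Iterable[str] = ()) -> str:
    """Build robots.txt with an explicit allow-list group for AI crawlers plus a default group."""
    # A crawler obeys only the most specific group that names it, so shared
    # Disallow rules are repeated in both groups rather than only under "*".
    rules = ["Allow: /"] + [f"Disallow: {p}" for p in disallow]
    lines = [f"User-agent: {ua}" for ua in crawlers] + rules + [""]
    lines += ["User-agent: *"] + rules + [""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def _path(url: str) -> str:
    return urlparse(url).path.rstrip("/") or "/"


def geo_gaps(
    public_routes: Iterable[str],
    sitemap_urls: Iterable[str],
    llms_txt: str,
    metadata: Mapping[str, Mapping[str, str]],
) -> dict[str, list[str]]:
    """Report public routes missing from the sitemap, llms.txt links, or per-page metadata."""
    in_sitemap = {_path(u) for u in sitemap_urls}
    # llms.txt is Markdown; collect the paths of its [title](url) links.
    in_llms = {_path(u) for u in re.findall(r"\]\((https?://[^)\s]+)\)", llms_txt)}
    gaps: dict[str, list[str]] = {}
    for route in public_routes:
        missing = []
        if route not in in_sitemap:
            missing.append("sitemap")
        if route not in in_llms:
            missing.append("llms.txt")
        meta = metadata.get(route, {})
        if not (meta.get("title") and meta.get("description")):
            missing.append("metadata")
        if missing:
            gaps[route] = missing
    return gaps
```

Run in CI, a non-empty result from `geo_gaps` would turn the written discipline into a failing check instead of a reminder.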

04 —

Where it stands

MapleScholar is live at maplescholar.projecthamburg.org. A single primary engineer has iterated on it across roughly nine months and ~168 documented build sessions, including the full v1→v2 migration from a static GitHub Pages prototype to the production platform.

Honest caveats, kept on the record: the self-reported capacity figures (a handful to a few dozen concurrent users, a few hundred page loads per day) are README estimates, not measured analytics, so I present them as estimates or leave them off. The build also intentionally relaxes its type-check and lint gates, and I am candid about that. The shipped product and the GEO program are the real, defensible claims.

05 —

Stack

Next.js 14 · React 18 · Supabase / Postgres · Vercel · Google Gemini · FastAPI worker (Railway)
