2026

PubMed literature alert service

A self-hosted service that watches PubMed for papers matching saved queries and delivers them as a daily email digest. Built as the capstone for the boot.dev backend curriculum, using a real need as the ground for deliberate practice.

The problem — and why it’s such a large project

This project has two separate motivations, and both should be stated plainly.

The first is a real need: emergency-department shifts have dead stretches between cases, and I wanted to keep up — passively, without having to remember to log into PubMed and run searches by hand — with the literature relevant to my own practice: chest pain risk stratification, sepsis recognition, renal colic, the things I see most days. PubMed’s own email alerts are tedious to manage and don’t filter the way I want; literature-monitoring services are designed for researchers writing reviews, not for a clinician scanning for something useful between patients.

The second is this: it was my capstone project for the boot.dev backend developer curriculum. Which means the project’s size — Go and Python and RabbitMQ and PostgreSQL and Docker and a React frontend, three separate services, a queue — is not the size the literature-alert problem requires. Nobody reaches for RabbitMQ to email themselves papers. That size is the size the curriculum requires. I should say this up front: the project used a real need as the ground for deliberately building something larger than necessary — because the goal was partly the problem and partly the practice.

So this project answers a different question than the others in this portfolio. The others answer “how did you solve this operational problem”; the question this one answers is “can you build a complete, distributed, multi-component backend system end to end, to current professional standards?”

What I built

A self-hosted service. The user defines the searches they want to track, in PubMed’s standard search syntax; the system runs those searches against PubMed at set intervals, fetches newly matching articles, filters them, and every morning sends a digest email of that day’s new articles.
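
The polling step maps directly onto NCBI's public E-utilities `esearch` endpoint. A minimal sketch of building such a query in Python (the actual poller is written in Go, and the helper name here is hypothetical; the endpoint and parameters are standard E-utilities usage):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_poll_url(query: str, days_back: int = 1) -> str:
    """Build an esearch URL returning PMIDs for articles that entered
    PubMed in the last `days_back` days and match `query`."""
    params = {
        "db": "pubmed",
        "term": query,          # standard PubMed search syntax
        "reldate": days_back,   # restrict to recent records
        "datetype": "edat",     # Entrez date: when the record appeared
        "retmode": "json",
        "retmax": 200,
    }
    return f"{EUTILS}?{urlencode(params)}"

url = build_poll_url('("chest pain"[Title/Abstract]) AND risk')
```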

Filters can be set per search: publication-type allow/block lists (to weed out comments, errata, retraction notices), a minimum abstract length (to weed out conference abstracts and letters), and a poll interval. If a search is too noisy, its filters can be tightened and the search re-run; if it’s too narrow, the search string can be broadened.
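
The per-search filtering logic can be sketched as follows (the field names and the exact publication-type strings are assumptions for illustration, not the real schema):

```python
from dataclasses import dataclass, field

@dataclass
class SearchFilters:
    # Hypothetical field names; the service's real schema may differ.
    blocked_pub_types: set = field(default_factory=set)
    min_abstract_len: int = 0

def passes(article: dict, f: SearchFilters) -> bool:
    """Apply one search's filters to a parsed PubMed record."""
    if f.blocked_pub_types & set(article.get("pub_types", [])):
        return False  # e.g. comments, errata, retraction notices
    if len(article.get("abstract", "")) < f.min_abstract_len:
        return False  # weeds out letters and conference abstracts
    return True

f = SearchFilters(blocked_pub_types={"Comment", "Erratum"}, min_abstract_len=400)
```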

The system itself is three services: a Go service that polls PubMed at intervals and serves the HTTP API; a Python worker that fetches, parses, and filters the articles; and a separate Python worker that builds and sends the daily digest. A RabbitMQ queue between them, a PostgreSQL database beneath. The frontend is a React app embedded inside the Go binary.

What was technically interesting

By the nature of a capstone, most of the technical content is in the “did this for the first time” category — authentication, session management, message queues, multi-service architecture, CI, deployment. Listing all of it would be a résumé. Instead, a few decisions I found genuinely interesting:

Decoupling the services with a queue. The three services don’t call each other directly; there’s a RabbitMQ queue between them. This looked like needless complexity at first — but the loop that polls PubMed and the loop that fetches an article genuinely run at different speeds, and fail in different ways. The queue makes them independent of each other: even if the fetch service slows down, the polling service isn’t affected — work piles up in the queue and is processed afterward.
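
The shape of that decoupling can be illustrated with Python's standard-library in-memory queue; the real system uses RabbitMQ across process boundaries, so this shows only the pattern, not the implementation:

```python
import queue
import threading
import time

jobs = queue.Queue()
fetched = []

def poller():
    # Fast loop: discovers new PMIDs and enqueues them.
    # It never waits on the slow fetch step.
    for pmid in ["38000001", "38000002", "38000003"]:
        jobs.put(pmid)

def fetcher():
    # Slow loop: drains the queue at its own pace.
    # If it falls behind, the backlog simply accumulates in `jobs`.
    while True:
        try:
            pmid = jobs.get(timeout=0.2)
        except queue.Empty:
            break
        time.sleep(0.05)  # simulate a slow network fetch
        fetched.append(pmid)
        jobs.task_done()

t1 = threading.Thread(target=poller); t1.start(); t1.join()
t2 = threading.Thread(target=fetcher); t2.start(); t2.join()
```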

Deduplication before the network call. Before fetching an article from PubMed, the worker checks whether that article is already in the database. This is about being polite to NCBI: not requesting the same article twice, not generating unnecessary network traffic. A small detail, but it’s the correct behavior when you’re consuming someone else’s API.
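
A minimal sketch of that check, using in-memory SQLite as a stand-in for the real PostgreSQL database:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (pmid TEXT PRIMARY KEY)")
db.execute("INSERT INTO articles VALUES ('38000001')")

def needs_fetch(pmid: str) -> bool:
    """True only if the article is not already stored,
    so the same PMID is never requested from NCBI twice."""
    row = db.execute(
        "SELECT 1 FROM articles WHERE pmid = ?", (pmid,)
    ).fetchone()
    return row is None
```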

The digest worker’s idempotency. The daily digest email must not be sent twice for the same day — even if the server restarts, even if the scheduler fires twice. This is guaranteed by a partial unique index in the database on (user, local_date): a second digest record for the same user-day pair can’t be created.
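
The mechanism can be demonstrated with SQLite, which also supports partial unique indexes; the `WHERE` clause here is an assumption about why the real index is partial, and the column names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE digests (user_id INTEGER, local_date TEXT, status TEXT)")
# At most one non-failed digest row per (user, local day).
db.execute("""
    CREATE UNIQUE INDEX one_digest_per_day
    ON digests (user_id, local_date)
    WHERE status != 'failed'
""")

def record_digest(user_id: int, local_date: str) -> bool:
    """Return True if this run won the right to send today's digest.
    A second attempt for the same user-day pair hits the unique
    index and is rejected, no matter why the scheduler fired twice."""
    try:
        db.execute(
            "INSERT INTO digests VALUES (?, ?, 'sent')",
            (user_id, local_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```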

On the deployment side there were small surprises too — for instance, Railway blocking the outbound SMTP port, which forced sending email over an HTTPS API instead.
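
Sending over an HTTPS API means POSTing a JSON body instead of speaking SMTP. One widely used shape is SendGrid's v3 `/mail/send` payload; the write-up doesn't say which provider the project actually uses, so this is a generic sketch of the idea:

```python
import json

def digest_email_payload(to_addr: str, from_addr: str,
                         subject: str, html: str) -> str:
    """Build a SendGrid-v3-shaped JSON body for one digest email.
    The request itself would be an HTTPS POST with a bearer token,
    which Railway does not block the way it blocks outbound SMTP."""
    return json.dumps({
        "personalizations": [{"to": [{"email": to_addr}]}],
        "from": {"email": from_addr},
        "subject": subject,
        "content": [{"type": "text/html", "value": html}],
    })
```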

Outcome

The service runs on Railway, and it feeds my own tracking searches — right now, every morning, I get a digest of the articles published since the previous day that pass the filters I defined. A few other people use it by invitation.

But the real output of this project isn’t the service itself. The real output is this: I walked, once and end to end, the whole distance from tools that are just separate files to a fully distributed, tested, CI-gated, cloud-running multi-component system. The other projects in this portfolio solve specific problems; this one was built to widen the set of tools available for solving them, which is to say, to push the limit of what I can do.

The repo is public and MIT-licensed. The running instance is invitation-only; anyone can stand up their own with Docker.