Weather Muse
The weather app that grades its own forecasts — and tells you how much to trust tomorrow.
Consumer weather apps fail in a specific, repeatable way: they project false certainty. They show a single forecast with a sunny icon, silently revise it hours later, and never acknowledge when they were wrong. Users cope by screenshotting forecasts just to prove to themselves that the prediction changed.
The people this hurts most are the ones who plan around weather and have been burned by it — weddings and outdoor events, travel, allergy- and pressure-sensitive individuals, and weather enthusiasts who simply want to know the truth.
The core gap is trust calibration. No mainstream app tells you how reliable a given forecast actually is, how much it has shifted since yesterday, or what historically happened on this date in this city. Weather Muse is built to answer exactly those questions.
Weather Muse (internally, Almanac) is a forecast-accountability platform rather than a forecast display. A single long-lived Node process wraps Next.js so a cron scheduler can snapshot forecasts daily, reconcile them against independently sourced actuals, and score reliability — turning weather from a disposable prediction into an auditable, accumulating record.
Centralized daily collection
A single node-cron scheduler runs four daily jobs — a 6:00 forecast snapshot, an 8:00 actuals pull, an 8:30 changeability pass, and a 9:00 prediction resolution. Because collection is centralized rather than per-user, every new user gets full historical depth immediately on signup, which solves the cold-start problem outright.
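As a rough illustration, the four daily runs could be written as a small job table. The cron expressions use the standard five-field syntax that node-cron accepts; the job names, the ScheduleFn type, and the registerDailyJobs helper are assumptions made for this sketch, not the project's actual identifiers. The scheduler is passed in so the sketch has no dependencies; in a real process one would hand it node-cron's schedule function.

```typescript
// Hypothetical job table for the four daily runs described above.
// Names and handlers are placeholders, not the project's real code.
type Job = { name: string; cron: string; run: () => void };

// Shape compatible with node-cron's schedule(expression, task).
type ScheduleFn = (expression: string, task: () => void) => unknown;

const dailyJobs: Job[] = [
  { name: "forecast-snapshot", cron: "0 6 * * *", run: () => {} },     // 6:00
  { name: "actuals-pull", cron: "0 8 * * *", run: () => {} },          // 8:00
  { name: "changeability-pass", cron: "30 8 * * *", run: () => {} },   // 8:30
  { name: "prediction-resolution", cron: "0 9 * * *", run: () => {} }, // 9:00
];

// Registers every job with the supplied scheduler and returns the job names.
function registerDailyJobs(schedule: ScheduleFn, jobs: Job[] = dailyJobs): string[] {
  for (const job of jobs) {
    schedule(job.cron, job.run);
  }
  return jobs.map((j) => j.name);
}
```

Keeping the schedule as data makes the ordering easy to audit: the actuals pull has to land before the changeability pass and the prediction resolution that depend on it.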
Forecast vs. ground-truth separation
Forecasts come from Tomorrow.io; recorded actuals come from Open-Meteo. Keeping the prediction source separate from the grading source avoids the integrity problem of a provider grading its own homework — accountability has to be independent to mean anything.
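To make the separation concrete, here is a minimal sketch of grading a forecast against an independently sourced actual. The record shapes, field names, and the reconcile helper are assumptions for illustration; the real Tomorrow.io and Open-Meteo payloads differ and would first need to be normalized into shapes like these.

```typescript
// Hypothetical normalized shapes; the real provider payloads are not shown here.
interface ForecastRecord { city: string; targetDate: string; highC: number; source: "tomorrow.io" }
interface ActualRecord { city: string; date: string; highC: number; source: "open-meteo" }
interface Reconciled { city: string; date: string; forecastHighC: number; actualHighC: number; errorC: number }

function reconcile(forecasts: ForecastRecord[], actuals: ActualRecord[]): Reconciled[] {
  // Index actuals by city and calendar date so each forecast can find its grading record.
  const byKey = new Map<string, ActualRecord>();
  for (const a of actuals) byKey.set(`${a.city}|${a.date}`, a);

  const out: Reconciled[] = [];
  for (const f of forecasts) {
    const a = byKey.get(`${f.city}|${f.targetDate}`);
    if (!a) continue; // no actual yet: leave the forecast ungraded rather than guess
    out.push({
      city: f.city,
      date: a.date,
      forecastHighC: f.highC,
      actualHighC: a.highC,
      errorC: f.highC - a.highC, // positive means the forecast ran warm
    });
  }
  return out;
}
```

Because the two sides only meet at the join key, neither provider's data can influence how the other is scored.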
Time-series schema
ForecastSnapshot rows capture how the prediction for a given day evolves across successive daily snapshots, and each row carries a weatherProvider field so multi-provider accuracy comparison can be added later without a schema migration. The result is a forecast-evolution record no standard app keeps.
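A sketch of what such a row and an evolution query might look like follows. Only ForecastSnapshot and weatherProvider are named in this write-up; the remaining fields and the forecastDrift helper are assumed placeholders.

```typescript
// Only ForecastSnapshot and weatherProvider come from the write-up;
// every other field here is an assumed placeholder.
interface ForecastSnapshot {
  city: string;
  targetDate: string;   // the day being forecast, ISO date
  snapshotDate: string; // the day the forecast was captured, ISO date
  highC: number;
  weatherProvider: string;
}

// Day-over-day changes in the forecast high for one target day and one provider.
function forecastDrift(
  rows: ForecastSnapshot[],
  city: string,
  targetDate: string,
  provider: string,
): number[] {
  const series = rows
    .filter((r) => r.city === city && r.targetDate === targetDate && r.weatherProvider === provider)
    .sort((a, b) => a.snapshotDate.localeCompare(b.snapshotDate)); // ISO dates sort lexicographically
  const deltas: number[] = [];
  for (let i = 1; i < series.length; i++) {
    deltas.push(series[i].highC - series[i - 1].highC);
  }
  return deltas;
}
```

Filtering on the provider field is what would let a second provider's snapshots sit in the same table and be compared later without touching the schema.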
The scoring engine
Eight algorithms produce the scores: temperature reliability, forecast stability, four precipitation scores (occurrence, calibration, amount, and type), a confidence-adjusted composite, and a forecast-value score against a climatological baseline. Weights live in a WeightConfig model exposed through an admin “Score Observatory,” and a ForecastFeedback table validates the algorithms against user perception. This is an honest acknowledgment that the weights are informed estimates that need a feedback loop.
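The actual formulas are not given here, so the following is only a sketch of two of the ideas: a weighted composite driven by tunable weights, and a skill-style value score relative to a climatological baseline. The WeightConfig field layout, the 0 to 100 scale, the use of mean absolute error, and both function names are assumptions; the confidence adjustment is left out because its definition is not described.

```typescript
// WeightConfig is named in the write-up; its shape and values here are assumed.
type WeightConfig = { [component: string]: number | undefined };

// Weighted mean of component scores (each assumed to be on a 0-100 scale).
// Components without a weight are ignored; weights need not sum to 1.
function compositeScore(scores: Record<string, number>, weights: WeightConfig): number {
  let total = 0;
  let weightSum = 0;
  for (const [name, score] of Object.entries(scores)) {
    const w = weights[name] ?? 0;
    total += w * score;
    weightSum += w;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Skill against a climatological baseline, using mean absolute error:
// 1 is perfect, 0 matches climatology, negative is worse than climatology.
function forecastValue(forecastMae: number, climatologyMae: number): number {
  // Skill is undefined when the baseline has no error; treated as neutral in this sketch.
  if (climatologyMae <= 0) return 0;
  return 1 - forecastMae / climatologyMae;
}
```

For example, a forecast with a mean absolute error of 1 degree against a climatological error of 4 degrees would score 0.75, meaning it removed three quarters of the error a "typical for this date" guess would have made.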
The consumer game layer
On top of the data sits the engagement surface: daily rain predictions, streaks, achievements, bingo, and leaderboards, plus a watchlist of tracked future dates (“Date Watch”) and AI-generated daily content — motivators, nerd facts, and almanac entries from the Anthropic API.
Runtime & deployment
A custom Node server wraps Next.js — deliberately not serverless — so the cron scheduler runs in the same long-lived process, initialized after Next.js is ready. NextAuth v5 with a credentials provider and JWT gates the admin routes by role, and the app runs on a Hetzner VPS under PM2 for restart persistence, alongside a co-resident Telegram bot.
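The start-up ordering can be sketched with the pieces injected, so the example runs without Next.js installed. The PreparableApp interface and the bootstrap function are assumptions for this sketch; in a real custom server the app would be the object returned by next({ dev }), and listen would wrap an HTTP server built around app.getRequestHandler(). Whether the scheduler starts before or after the server begins listening is not stated here, so the order below is one plausible choice.

```typescript
// Minimal stand-in for the Next.js app object; only prepare() is needed here.
interface PreparableApp {
  prepare(): Promise<void>;
}

async function bootstrap(
  app: PreparableApp,
  listen: () => void,
  startScheduler: () => void,
): Promise<void> {
  await app.prepare(); // wait until Next.js is ready
  listen();            // begin serving HTTP from this long-lived process
  startScheduler();    // only now register the cron jobs, in the same process
}
```

The point of the ordering is that cron jobs never fire against an app that has not finished preparing, and that they share the process lifetime PM2 is supervising.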
The reasoning behind the build — and what each choice cost.
Centralized daily collection instead of per-user fetching
One scheduler pulls forecasts for all 64 cities once a day, so every user gets full historical depth the moment they sign up — the cold-start problem disappears and the dataset is identical for everyone.
Coverage is fixed to the seeded city list rather than any arbitrary user location, and the single process must stay healthy for the data to keep accruing.
Separate the forecast source from the ground-truth source
Tomorrow.io supplies forecasts and Open-Meteo supplies actuals, so no provider grades its own homework — the accuracy scores are independent and therefore credible.
Two external dependencies to reconcile, including aligning their data shapes and timing for a fair comparison.
A custom Node server wrapping Next.js, not serverless
The cron scheduler needs to live in the same long-lived process as the app, initialized once Next.js is ready — something serverless functions cannot provide.
Gives up serverless auto-scaling and managed ops in exchange for a persistent process that must be supervised (PM2) and patched.
Time-series schema with a provider field baked in
Storing every daily snapshot — each tagged with its weatherProvider — turns the database into an append-only evolution record and leaves room for multi-provider comparison without a future migration.
Storage grows monotonically forever, and the schema commits early to a design whose payoff only arrives with time.
Expose scoring weights as tunable, feedback-validated parameters
A WeightConfig model and an admin Score Observatory make the scoring weights adjustable, and a ForecastFeedback table checks algorithm output against user perception — an explicit admission that the weights are estimates that need a loop to refine.
Tunable weights add admin surface and the risk of overfitting to feedback; the scores are only as good as the loop maintaining them.
NextAuth v5 with role-gated admin routes
Credentials-provider auth with JWT and middleware role checks protects the admin observatory and APIs cleanly within the App Router.
v5 was the most friction-prone part of go-live — its AUTH_TRUST_HOST requirement and the missing auto-generated sign-in page both surfaced only in production.
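The role check itself reduces to a small pure function that middleware can call with the request path and the role carried in the JWT. The role names, the path prefixes, and the isAllowed helper below are hypothetical; the write-up does not list the real ones.

```typescript
// Hypothetical role names and path prefixes, chosen for illustration only.
type Role = "admin" | "user";
const ADMIN_PREFIXES = ["/admin", "/api/admin"];

// Admin paths require the admin role; everything else is left to other checks.
function isAllowed(pathname: string, role: Role | undefined): boolean {
  const isAdminPath = ADMIN_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/"),
  );
  return isAdminPath ? role === "admin" : true;
}
```

Matching on the prefix plus a slash avoids accidentally gating unrelated routes that merely begin with the same letters.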
Weather Muse is a launch-state system, so the results so far are operational rather than user metrics. The full four-phase build was deployed to production in a single day, and the final build compiles all 50 routes and passes a strict TypeScript check (tsc --noEmit) with zero errors.
The shipped scope is substantial: 64 cities seeded; 13 consumer, 11 admin, and 6 game API routes plus auth; 8 admin pages; 8 scoring algorithms; and 4 daily cron jobs. The app and a co-resident Telegram bot both run under PM2 with restart persistence. Restart counts settled into single digits once the ts-node/ESM and NextAuth host-trust issues were resolved.
By design, the real value compounds with time. Every daily cron run permanently adds an irreplaceable day of forecast-vs-actual data across all cities — data that cannot be backfilled. The metrics that matter from here are forecast accuracy by provider, city, and lead time; prediction-game retention; and Date Watch engagement as tracked dates arrive.