Intelligence · PRJ-006

DealFlow CRM

Turning thousands of MLS listings into a ranked, auditable shortlist of real deals.

~5,192

Properties in the scored lead universe

≈3.4s

Per property — scrape, enrich, score

0–100

Auditable deal score on every listing

The Problem

Real estate wholesalers and fix-and-flip investors in Dallas–Fort Worth face a needle-in-a-haystack problem. Hundreds of new and updated MLS listings appear every day, but only a tiny fraction are real deals: properties where a motivated seller is likely to accept an offer at or below the wholesaler’s Maximum Allowable Offer (MAO), which is 70% of the After Repair Value (ARV) minus rehab costs.
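As a quick illustration of the 70% rule, the arithmetic can be sketched as below. The function name and the sample dollar figures are hypothetical; only the 70% factor and the "minus rehab costs" step come from the description above.

```typescript
// Maximum Allowable Offer under the 70% rule: MAO = 0.70 * ARV - rehab.
// `maxAllowableOffer` is an illustrative name, not the project's own function.
function maxAllowableOffer(arv: number, rehabCost: number, factor = 0.7): number {
  // Round once at the end so floating-point noise does not leak into the result.
  return Math.round(arv * factor - rehabCost);
}

// Hypothetical numbers: a home worth $300,000 after repairs, needing $40,000 of work.
const exampleMao = maxAllowableOffer(300000, 40000);
console.log(exampleMao); // 170000
```

A listing priced well above this figure is only a deal if the seller is motivated enough to close the gap, which is why the motivation signals matter as much as the math.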

Finding those few means manually reading the remarks on every listing for distress signals — “motivated,” “as-is,” “estate,” “must sell” — then estimating ARV and a repair budget, which normally requires pulling comps and walking the property. Texas is a non-disclosure state, so even sold prices are hard to come by.

On top of that, listings are not static. Price cuts, re-lists, and back-on-market events are the actual buying windows, and tracking which leads were contacted, which are under contract, and which just dropped their price is normally scattered across spreadsheets. The tool is built for a single operator who needs to triage thousands of listings down to a confident shortlist, fast.

The Architecture

DealFlow CRM is deliberately split in two. A React 19 single-page app owns the interface and the CRM pipeline, while a separate local Express server owns everything that needs secrets, a real browser, or heavy compute — MLS scraping, OpenAI calls, and comp enrichment. Scoring is deterministic-first; AI is reserved for the genuinely subjective judgments. Supabase/Postgres is the single source of truth, written from both sides.

Lead ingestion — two paths

Listings enter either by manual CSV import (Papaparse parses Matrix Hotsheet / Flexmls / generic MLS exports with flexible column mapping) or by automated sync: a Playwright headless-Chromium bot with a stealth plugin logs into NTREIS Matrix MLS, handles 2FA, sweeps the operator’s saved searches, and upserts listings. Each row is scored on the way in.

Deterministic deal-scoring engine

A pure, auditable function assigns every property a 0–100 score plus strategy tags — no AI in the loop. It rewards days-on-market fatigue, price-drop percentage (a ≥20% cut flags a “PANIC_CUT”), low photo counts, and re-lists, and reads keyword signals across five buckets (distress, condition, cash-only, vacancy, lot-value). It even penalizes “finished flip” language, since turnkey homes aren’t wholesale targets.
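A minimal sketch of what such a pure scoring function could look like follows. The 20% PANIC_CUT threshold and the five bucket names come from the description above; every point value, keyword list, and cutoff (60 days on market, five photos) is a placeholder assumption, not the project's actual tuning.

```typescript
interface Listing {
  daysOnMarket: number;
  originalPrice: number;
  listPrice: number;
  photoCount: number;
  isRelist: boolean;
  remarks: string;
}

interface ScoreResult {
  score: number; // clamped to 0..100
  tags: string[];
  reasons: string[]; // human-readable audit trail
}

// Hypothetical keyword buckets and point values, for illustration only.
const KEYWORD_BUCKETS = [
  { bucket: "distress", words: ["motivated", "must sell", "estate"], points: 15 },
  { bucket: "condition", words: ["as-is", "needs work", "tlc"], points: 10 },
  { bucket: "cash-only", words: ["cash only"], points: 10 },
  { bucket: "vacancy", words: ["vacant"], points: 5 },
  { bucket: "lot-value", words: ["lot value", "tear down"], points: 10 },
];
const FINISHED_FLIP = ["fully renovated", "move-in ready", "turnkey"];

function scoreListing(l: Listing): ScoreResult {
  let score = 0;
  const tags: string[] = [];
  const reasons: string[] = [];
  const remarks = l.remarks.toLowerCase();

  if (l.daysOnMarket >= 60) {
    score += 15;
    reasons.push(`${l.daysOnMarket} days on market`);
  }

  const dropPct =
    l.originalPrice > 0 ? ((l.originalPrice - l.listPrice) / l.originalPrice) * 100 : 0;
  if (dropPct > 0) {
    score += Math.min(30, Math.round(dropPct));
    reasons.push(`${Math.round(dropPct)}% price drop from original`);
  }
  if (dropPct >= 20) tags.push("PANIC_CUT"); // threshold taken from the text

  if (l.photoCount <= 5) {
    score += 5;
    reasons.push("low photo count");
  }
  if (l.isRelist) {
    score += 10;
    reasons.push("re-listed");
  }

  // Each bucket scores at most once, on its first matching keyword.
  for (const { bucket, words, points } of KEYWORD_BUCKETS) {
    const hit = words.find((w) => remarks.includes(w));
    if (hit) {
      score += points;
      reasons.push(`${bucket} keyword: "${hit}"`);
    }
  }

  // Turnkey language counts against the listing.
  if (FINISHED_FLIP.some((w) => remarks.includes(w))) {
    score -= 25;
    reasons.push("finished-flip language");
  }

  return { score: Math.max(0, Math.min(100, score)), tags, reasons };
}
```

Because the function takes only the listing and returns the reasons alongside the number, the same input always yields the same score and the same explanation, which is the property that makes the ranking auditable.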

Deal-viability model

A second model decomposes viability into three weighted sub-scores — Math (45%: do the numbers work against 70% ARV), Motivation (35%: will the seller accept a lowball), and Timing (20%: is there a window right now, e.g. a price cut in the last seven days) — and buckets each property into STRIKE_NOW, WARMING, or COLD.
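The weighting above can be sketched as a small function. The 45/35/20 weights and the three tier names are from the description; the tier cutoffs (70 and 40) and the assumption that each sub-score is on a 0–100 scale are placeholders.

```typescript
type Tier = "STRIKE_NOW" | "WARMING" | "COLD";

// Each sub-score is assumed to be on a 0..100 scale.
interface SubScores {
  math: number;
  motivation: number;
  timing: number;
}

// Weights as stated in the text.
const WEIGHTS = { math: 0.45, motivation: 0.35, timing: 0.2 };

// Tier cutoffs are hypothetical; the source does not state them.
function viability(s: SubScores): { total: number; tier: Tier } {
  const total = Math.round(
    s.math * WEIGHTS.math + s.motivation * WEIGHTS.motivation + s.timing * WEIGHTS.timing
  );
  const tier: Tier = total >= 70 ? "STRIKE_NOW" : total >= 40 ? "WARMING" : "COLD";
  return { total, tier };
}

// Strong math, decent motivation, and a fresh price cut.
console.log(viability({ math: 80, motivation: 60, timing: 100 })); // total 77, STRIKE_NOW
```

Keeping the three sub-scores separate, rather than reporting only the total, lets the operator see whether a property is cold because the numbers fail or simply because there is no window yet.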

AI ARV & rehab analysis

On demand, GPT-4o Vision analyzes up to eight listing photos plus remarks and sold comps to produce a three-point ARV (conservative / likely / aggressive) and a line-item rehab budget driven by a fixed DFW cost table. A deterministic comp-adjustment engine applies pure-math adjustments (per condition point, per sqft, per bedroom, pool, garage, stories) so every dollar is auditable; GPT-4o is used only to normalize the condition ratings of the subject and comp photos. The model runs at temperature 0.2 with JSON-object output for reliability.
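The comp-adjustment idea can be illustrated with a short sketch: each difference between subject and comp becomes one signed dollar line, and the adjusted price is the sold price plus the sum of those lines. The adjustment categories come from the description above; all dollar rates, type names, and field names are hypothetical.

```typescript
interface HomeFacts {
  sqft: number;
  bedrooms: number;
  condition: number; // normalized condition rating, e.g. 1..10 (assumed scale)
  hasPool: boolean;
  garageSpaces: number;
  stories: number;
}

interface Comp extends HomeFacts {
  soldPrice: number;
}

interface AdjustmentLine {
  label: string;
  amount: number; // signed dollars added to the comp's sold price
}

// Hypothetical rates; the real engine's numbers are not given in the source.
const RATES = {
  perSqft: 100,
  perBedroom: 5000,
  perConditionPoint: 8000,
  pool: 15000,
  perGarageSpace: 7000,
  perStory: 3000,
};

// Adjust the comp toward the subject: features the subject has and the comp
// lacks add value, and features the comp has and the subject lacks subtract it.
function adjustComp(subject: HomeFacts, comp: Comp): { adjustedPrice: number; lines: AdjustmentLine[] } {
  const lines: AdjustmentLine[] = [
    { label: "sqft", amount: (subject.sqft - comp.sqft) * RATES.perSqft },
    { label: "bedrooms", amount: (subject.bedrooms - comp.bedrooms) * RATES.perBedroom },
    { label: "condition", amount: (subject.condition - comp.condition) * RATES.perConditionPoint },
    { label: "pool", amount: (Number(subject.hasPool) - Number(comp.hasPool)) * RATES.pool },
    { label: "garage", amount: (subject.garageSpaces - comp.garageSpaces) * RATES.perGarageSpace },
    { label: "stories", amount: (subject.stories - comp.stories) * RATES.perStory },
  ].filter((line) => line.amount !== 0);

  const adjustedPrice = comp.soldPrice + lines.reduce((sum, line) => sum + line.amount, 0);
  return { adjustedPrice, lines };
}
```

Returning the individual lines, not just the adjusted price, is what allows each dollar of the estimate to be traced back to a specific difference.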

CRM pipeline & change detection

Properties flow through statuses (Interested → Contacted → Offer Sent → Under Contract → Deal Closed) on a drag-and-drop kanban that tracks contract terms, option periods, title company, assignment fees, and document uploads. A metrics page computes the conversion funnel and average contract-price as a percent of ARV. “Flag Backs” re-detect change events — price drops, re-lists, DOM spikes, photo removals, agent changes — and resurface a known lead the moment something material shifts.
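One way to picture the Flag Back check is as a diff between the stored snapshot of a listing and the freshly synced one. The event categories below mirror those named above; the field names, the event identifiers, the 30-day spike threshold, and the "new MLS number means re-list" rule are all assumptions made for this sketch.

```typescript
interface Snapshot {
  mlsNumber: string;
  listPrice: number;
  daysOnMarket: number;
  photoCount: number;
  agentId: string;
}

type ChangeEvent = "PRICE_DROP" | "RELIST" | "DOM_SPIKE" | "PHOTO_REMOVAL" | "AGENT_CHANGE";

// Hypothetical threshold: a jump of this many days between syncs counts as a spike.
const DOM_SPIKE_DAYS = 30;

// Compare the previously stored snapshot with the newly synced one and
// return every material change, in a fixed order.
function detectChanges(prev: Snapshot, next: Snapshot): ChangeEvent[] {
  const events: ChangeEvent[] = [];
  if (next.listPrice < prev.listPrice) events.push("PRICE_DROP");
  // Assumption: a re-list shows up as a new MLS number for the same property.
  if (next.mlsNumber !== prev.mlsNumber) events.push("RELIST");
  if (next.daysOnMarket - prev.daysOnMarket >= DOM_SPIKE_DAYS) events.push("DOM_SPIKE");
  if (next.photoCount < prev.photoCount) events.push("PHOTO_REMOVAL");
  if (next.agentId !== prev.agentId) events.push("AGENT_CHANGE");
  return events;
}
```

A non-empty result is the signal to resurface the lead in the pipeline, regardless of which status column it currently sits in.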

Data & cost control

Supabase/Postgres is the single source of truth, written by the frontend via the anon key and the server via the service key; RLS is intentionally permissive because it is a single-user app. The scarce resource is RapidAPI quota, so an enrichment cache-guard looks up the existing record and skips the API call unless the listing is new or its price changed. Comp sourcing is tiered: Matrix first (free), then RapidAPI (paid), with Nominatim as the free geocoding fallback.
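The cache-guard decision reduces to a small predicate, sketched here under assumptions. The "new or price changed" rule, the forceEnrich override, and the figure of roughly two RapidAPI calls per property are from the write-up; the type names, field names, and in-memory Map standing in for the database lookup are hypothetical.

```typescript
interface StoredProperty {
  mlsId: string;
  listPrice: number;
}

interface IncomingListing {
  mlsId: string;
  listPrice: number;
}

// Decide whether a listing should spend enrichment quota on this sweep.
function shouldEnrich(
  existing: StoredProperty | undefined,
  incoming: IncomingListing,
  forceEnrich = false
): boolean {
  if (forceEnrich) return true; // manual override
  if (!existing) return true; // never seen before
  return existing.listPrice !== incoming.listPrice; // re-enrich only on a price change
}

// Roughly two RapidAPI calls per enriched property, per the write-up.
const CALLS_PER_PROPERTY = 2;

// Estimate how many API calls a sweep would spend with the guard in place.
function estimateCalls(stored: Map<string, StoredProperty>, sweep: IncomingListing[]): number {
  const toEnrich = sweep.filter((l) => shouldEnrich(stored.get(l.mlsId), l));
  return toEnrich.length * CALLS_PER_PROPERTY;
}
```

With the guard, a sweep in which most listings are unchanged spends quota only on the handful that are new or repriced, instead of on every row.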

Key Decisions & Tradeoffs

The reasoning behind the build — and what each choice cost.

Split the app — a React SPA plus a separate local Express server

Why

The browser never touches MLS or AI directly. The Express server owns Playwright scraping (which needs a persistent login/2FA session a SPA cannot hold) and keeps the OpenAI, MLS, and RapidAPI keys server-side; the frontend only holds the Supabase anon and Maps keys.

Tradeoff

Two processes to run and deploy instead of one, and a local server in the loop for the heavy operations.

Deterministic-first scoring, AI second

Why

The deal score and comp adjustments are pure functions — fast, free, explainable, and consistent — so the ranking is defensible. GPT-4o Vision is reserved for the genuinely subjective calls (condition from photos, ARV reasoning, rehab line items), capping cost and keeping the score auditable.

Tradeoff

The deterministic rules need hand-tuning and domain knowledge to stay accurate, and they cannot catch nuance the way a model might.

Cache-guard on enrichment to control RapidAPI quota

Why

RapidAPI quota was the scarce resource (~2 calls per property). The sync endpoint now looks up the existing record and skips enrichment unless the listing is new or its price changed, so unchanged listings spend no quota. A forceEnrich override bypasses the guard.

Tradeoff

Adds a staleness window (a property is not re-enriched until its price moves) and extra bookkeeping to track enrichment state.

Tiered comp sourcing with fallbacks

Why

Matrix MLS scraping is tried first (free, unlimited, most accurate), RapidAPI is the paid fallback, and Nominatim is the free geocoding fallback — maximizing accuracy while minimizing paid calls.

Tradeoff

Multiple data paths to maintain and reconcile, each with its own failure modes.

Supabase with intentionally permissive RLS for a single user

Why

Because it is a single-operator tool, permissive RLS (“allow all”) trades multi-tenant security for development speed; real protection comes from the un-exposed service key and a private deployment.

Tradeoff

Not safe to open to multiple users as-is — multi-tenancy would require real RLS policies first.

Lean stack — no Redux/Router, Vite-only build

Why

Plain React 19 state with Supabase as the backing store, a hand-rolled page switcher instead of React Router, and a Vite-only build (no separate type-check step) kept a solo project fast to iterate on.

Tradeoff

Skipping a router and the type-check build trades some long-term structure and safety for short-term velocity.

Outcome & Performance

DealFlow CRM is a working personal tool rather than a benchmarked product, so its results are operational — and concrete. The live database holds roughly 5,192 property records, the working universe the engine ranks, and a representative full sync logged “33 upserted, 0 errors, 113.7s” — about 3.4 seconds per property end to end, including scrape and enrichment.

The clearest measurable win is cost. The enrichment cache-guard converted a recurring quota problem — repeated “maxed on the API” alerts — into a controlled one by spending RapidAPI calls only on new or price-changed listings instead of every property on every sweep. AI cost is bounded too: token usage is logged per call and inputs are capped at eight photos for ARV and six for condition.

Above all, the value is leverage and explainability. Every property carries a 0–100 score, human-readable reasons (“27% price drop from original — seller deeply motivated”), strategy tags, and a STRIKE_NOW / WARMING / COLD tier with separate math, motivation, and timing sub-scores — so one person can triage thousands of listings down to a ranked, auditable shortlist with AI-backed ARV and rehab numbers, and know exactly why each property ranked where it did.

Capabilities

React 19 · Vite 7
TypeScript
Node · Express
Supabase · Postgres
OpenAI GPT-4o Vision
Playwright (stealth)
RapidAPI Realty-in-US
Google Maps · Nominatim
@dnd-kit kanban
Papaparse CSV
Vercel