An interactive companion to the 2-page brief — live calibration of the proposed policy thresholds against the provided dataset, full SQL query replay via DuckDB-WASM in your browser, and a forward-projection model for 30/60/90 rollout scenarios.
Leadership suspects buyers are gaming refunds. The data has no confirmed-abuse label, so the analytical lens is buyer concentration within the gameable-reason universe (99% of approved refunds). Ranked by a 2-feature ex-ante risk score, a small buyer cohort drives most of the refund spend — and a disproportionate share of the refund queue.
[Stat cards: Σ refund_amount_usd WHERE refund_requested=1 · Σ refund_amount_usd WHERE refund_approved=1 · requested=1 AND approved=0]

Within the top-24 cohort, three hard-to-verify reason codes dominate: Marked delivered not received, Missing item, Wrong item — the reasons buyers can file without proof. Outside the cohort, these reasons are much less common.
Residual is not proof of abuse — it's the headroom a policy could plausibly address. Likely legitimate refunds (outside-cohort + cohort baseline) ≈ —.
The raw file mixed order rows with buyer aggregates, which is a trap: SUM() over a column that's already aggregated inflates numbers if you forget to dedupe. I split it into 5 tables with explicit primary keys + foreign keys. Every metric on this page is sourced from one of these tables, and every cited number shows the exact SQL that produced it — click any SQL badge on the page to verify.
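The double-counting trap can be shown with a minimal sketch. Column and table contents here are illustrative stand-ins, not the shipped schema:

```python
# Toy illustration of the aggregate-duplication trap: the raw export repeats
# a buyer-level aggregate (buyer_total_refund_usd) on every order row, so a
# naive SUM over that column multiplies each buyer's total by their order count.
rows = [
    # (order_id, buyer_id, refund_amount_usd, buyer_total_refund_usd)
    ("o1", "B_A_013", 40.0, 100.0),
    ("o2", "B_A_013", 60.0, 100.0),  # same buyer aggregate repeated per order
    ("o3", "B_B_004", 25.0, 25.0),
]

naive = sum(r[3] for r in rows)                   # inflated: counts B_A_013 twice
dedup = sum({r[1]: r[3] for r in rows}.values())  # one aggregate row per buyer
row_level = sum(r[2] for r in rows)               # order-level sum agrees with dedup

print(naive, dedup, row_level)
```

Splitting into normalized tables with explicit keys makes the dedup step structural rather than something each query must remember to do.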
Drag the sliders to set risk thresholds and support floors. Every move re-routes all 297 historical refund requests through the decision tree and updates the tier distribution, expected approved-$ reduction, and ops manual-review load. Defaults match the values proposed in the 2-page brief.
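The re-tiering loop behind the sliders can be sketched as follows. Tier cut-offs and the three-tier collapse are illustrative assumptions, not the brief's exact decision tree:

```python
# Sketch of the simulator loop: every slider move re-scores all historical
# requests against the new thresholds and rebuilds the tier distribution.
# Thresholds p80/p90 and the tier labels are illustrative assumptions.
def assign_tier(score, p80, p90):
    if score >= p90:
        return 5  # highest-risk tier: hold / manual review
    if score >= p80:
        return 4  # evidence-required tier
    return 1      # default approval path

requests = [0.12, 0.55, 0.83, 0.91, 0.97]  # risk_likelihood_score per request
dist = {}
for s in requests:
    t = assign_tier(s, p80=0.80, p90=0.90)
    dist[t] = dist.get(t, 0) + 1

print(dist)  # tier -> request count under the current thresholds
```

The expected approved-$ reduction and ops load then follow from summing request dollars and headcount per tier.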
This page runs DuckDB-WASM entirely in your browser against the parquet data we ship. No server, no install. Click Run on any query to see the output; edit the SQL inline and re-run. Use the provided parquet files (orders.parquet, buyers.parquet, rq_buyers.parquet) to replicate in your own DuckDB, Snowflake, or Redshift environment.
Forward projection over 90 days. Toggle pilot aggressiveness; the model re-estimates refund-$ reduction trajectory, appeal-overturn projection, and ops-review volume. Assumes calibration against ~10% of monthly refund requests for QA.
risk_likelihood_score(b, t) = avg(
percentile(prior_excess_ratio(b, t)),
percentile(prior_gameable_request_count(b, t))
)
prior_excess_ratio(b, t) =
    [ Σ refund_amount_usd WHERE buyer=b AND is_gameable_reason=1 AND refund_request_date < t ]
  / [ Σ order_value_usd WHERE buyer=b AND order_date < t ]
  − [ Σ(order_value_i × seller_defect_rate_i) / Σ order_value_i ]   ← prior-spend-weighted seller baseline
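A minimal sketch of the 2-feature score defined above, with made-up buyer values; here percentile is implemented as the fraction of the buyer population at or below a value:

```python
# Averaging two percentile ranks keeps the score scale-free: a buyer must be
# extreme on refund-excess AND request volume to rank near the top.
def pct_rank(value, population):
    """Fraction of the population at or below `value`."""
    return sum(v <= value for v in population) / len(population)

excess_ratio = {"B1": 0.02, "B2": 0.10, "B3": 0.45}  # prior_excess_ratio
req_count    = {"B1": 1,    "B2": 4,    "B3": 9}     # prior_gameable_request_count

er_pop, rc_pop = list(excess_ratio.values()), list(req_count.values())
score = {
    b: (pct_rank(excess_ratio[b], er_pop) + pct_rank(req_count[b], rc_pop)) / 2
    for b in excess_ratio
}
print(score)  # B3 ranks highest on both features
```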
Approved refund $ is never part of the score (avoids circular ranking). current_requested_refund_amount controls evidence burden + queue order only, never tier assignment.
buyer_gameable_refund_count is documented as "approved gameable refunds," but the per-buyer values consistently match the request count, not the approved count. Example: buyer B_A_013 has count=10, but only 8 of their 10 gameable requests were approved.
Treated throughout as prior_gameable_request_count. Approved exposure is computed directly from row-level refund_approved=1. Flagged so reviewers know I read the data.
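The consistency check that surfaced the mislabel is simple to sketch. Row tuples below are illustrative; the real check runs over the row-level refund table:

```python
# Compare the documented per-buyer aggregate against request-level and
# approval-level recounts. For B_A_013 the aggregate matches requests (10),
# not approvals (8), so the column is a request count despite its docs.
refund_rows = (  # (buyer_id, is_gameable_reason, refund_approved)
    [("B_A_013", 1, 1)] * 8 + [("B_A_013", 1, 0)] * 2
)

documented_count = 10  # buyer_gameable_refund_count from the xlsx

requested = sum(1 for _, g, _ in refund_rows if g)
approved  = sum(1 for _, g, a in refund_rows if g and a)

print(requested, approved)  # documented_count matches the first, not the second
```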
Percentile breakpoints in the simulator (P50, P80, P90, etc.) are computed over the full-period buyer aggregates provided in the xlsx. This is a descriptive view, not a clean backtest of request-time decisions.
Production policy would compute prior_* features using only data known at refund-request time (refund_request_date = order_date + days_to_refund_request), and thresholds would be recalibrated on the prior-at-request distribution before Tier 4/5 enforcement goes live.
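The point-in-time discipline reduces to a strict date filter. A minimal sketch, with illustrative function and field names and dates as integer day offsets:

```python
# Point-in-time feature: prior gameable request count for a buyer at request
# time t may only see requests strictly before t -- no leakage from the
# request being scored or anything after it.
def prior_gameable_request_count(history, buyer, t):
    """history: list of (buyer_id, request_day, is_gameable)."""
    return sum(
        1 for b, day, g in history
        if b == buyer and g and day < t  # strictly before t
    )

history = [("B1", 5, 1), ("B1", 9, 1), ("B1", 12, 1), ("B2", 3, 1)]
n = prior_gameable_request_count(history, "B1", 10)
print(n)  # sees the day-5 and day-9 requests only
```

prior_excess_ratio gets the same treatment: both numerator and denominator sums filtered to dates before t, then thresholds recalibrated on that prior-at-request distribution.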
This analysis runs in DuckDB-WASM against ~160 KB of parquet — perfect for demonstrating logic, overkill for production volume.
At Whatnot scale, the policy would ingest from Kafka into Snowflake/Redshift; feature store computes prior_* features at request time; rule engine (or lightweight ML scorer as v2) emits a tier label for every refund request; manual-review queue is prioritized by action_priority = risk_likelihood × current_requested_refund_amount. Shadow mode for 2 weeks before enforcement, recalibration quarterly.
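The queue ordering is the one place dollars enter the pipeline. A sketch of action_priority with made-up values (field names mirror the text):

```python
# Manual-review queue ordering: action_priority = risk_likelihood x requested
# dollars, so a mid-risk large request can outrank a high-risk small one.
# Tier assignment stays dollar-blind; only queue position uses the amount.
queue = [
    {"rq_id": "r1", "risk_likelihood": 0.95, "requested_usd": 12.0},
    {"rq_id": "r2", "risk_likelihood": 0.60, "requested_usd": 240.0},
    {"rq_id": "r3", "risk_likelihood": 0.85, "requested_usd": 90.0},
]
for rq in queue:
    rq["action_priority"] = rq["risk_likelihood"] * rq["requested_usd"]

queue.sort(key=lambda rq: rq["action_priority"], reverse=True)
print([rq["rq_id"] for rq in queue])
```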