
Methodology

What Sanjaya does, where every number comes from, and what the current approximations are. This page is meant to be readable by journalists, researchers, and policy analysts who need to understand the limits before citing anything here.

What Sanjaya is

Sanjaya aggregates public Indian petroleum pricing data into a single daily view. It shows, at each of the four metros, what every rupee of the pump price pays for: refinery transfer price, dealer commission, central excise, and state VAT. It then overlays those components on nine years of history alongside Brent crude so that policy shocks and price stickiness become visible.

Sanjaya is independent. It is not commissioned by, endorsed by, or funded by any oil marketing company, ministry, or government body.

Data sources

How the waterfall is computed

For each (city, product) on the latest available date, Sanjaya computes a five-line breakdown: the retail selling price and the four components it decomposes into.

  1. Retail selling price (RSP) — taken directly from the PPAC daily PDF.
  2. Central excise total — Basic Excise + SAED + RIC + AIDC from the current-FY PPAC table.
  3. Dealer commission — the fixed per-kilolitre component from the latest effective PPAC row, converted to paise per litre. The hybrid 'percentage of billable price' portion is not yet modelled in the displayed number; it typically adds 0.05 to 0.10 paise per litre.
  4. State VAT — derived from the pre-VAT base using only the percentage component of the state's formula. Since RSP = base × (1 + percent/100), base = RSP / (1 + percent/100) and VAT = RSP − base.
  5. Refinery transfer price — the plug. What is left when you subtract excise, dealer commission, and state VAT from RSP. Any error in the upstream components lands here.
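
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not Sanjaya's actual code: the function and field names are hypothetical, every value is expressed in rupees per litre for simplicity, and the input figures are invented round numbers rather than PPAC data.

```python
from dataclasses import dataclass


@dataclass
class Waterfall:
    rsp: float                # retail selling price, Rs/litre
    central_excise: float     # Rs/litre
    dealer_commission: float  # Rs/litre
    state_vat: float          # Rs/litre
    refinery_transfer: float  # Rs/litre, the plug


def compute_waterfall(rsp, excise_parts, dealer_commission, vat_percent):
    """Split a pump price into its components (hypothetical helper)."""
    central_excise = sum(excise_parts)
    # RSP = base * (1 + percent/100)  =>  base = RSP / (1 + percent/100)
    base = rsp / (1 + vat_percent / 100)
    state_vat = rsp - base
    # The plug: whatever remains after the three known components.
    refinery_transfer = rsp - central_excise - dealer_commission - state_vat
    return Waterfall(rsp, central_excise, dealer_commission, state_vat,
                     refinery_transfer)


# Illustrative inputs only, not actual PPAC figures.
w = compute_waterfall(
    rsp=100.0,
    excise_parts=[1.40, 13.00, 2.50, 3.00],  # Basic Excise, SAED, RIC, AIDC
    dealer_commission=4.00,
    vat_percent=25.0,
)
print(w)
```

Because the refinery transfer price is computed last as a remainder, the four components always sum back to the RSP, and any mistake in excise, commission, or VAT shows up as an equal and opposite error in that final line.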

Brent intelligence

The Brent intelligence page is designed as a 24-hour reading tool, not a black-box trading model. A nontechnical visitor should read it as: what is the next trading-session Brent range, which external forces are driving that range, and what those forces mean for India's import exposure.

Sanjaya does not forecast pump-price action or OMC margin pass-through. Those are policy-gated decisions in India, not clean market-clearing model outputs. Retail prices, taxes, and margins remain useful historical facts elsewhere on the site, but they are not treated as Brent-read targets.

The current model is benchmark-calibrated, not black-box. It first runs simple price models and an adaptive error-weighted ensemble over a recent window of rolling one-session-ahead forecasts, selects the winner by a skill score that combines MAE, RMSE, interval calibration, and direction hit rate, and calibrates the displayed dollar band from the selected model's actual residuals. It then scores Brent momentum, WTI confirmation, EIA weekly crude stocks, and quantified real-world news as a capped live overlay. FX is displayed beside the call because USD/INR changes India's landed-cost interpretation, but FX does not set the Brent direction score.
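
The benchmark step can be illustrated with a small rolling comparison. The sketch below is an assumption-laden simplification, not the production model: it compares only two of the simple models (last-price carry and one-session momentum), the `skill` weighting is hypothetical and uses MAE and RMSE alone, and the price series is invented. It shows the idea of scoring every model on the same forecast dates and then deriving the band from the winner's own residuals.

```python
import math


def carry(history):
    """Last-price carry: the next session equals the latest close."""
    return history[-1]


def momentum(history):
    """One-session momentum: repeat the latest one-session change."""
    return history[-1] + (history[-1] - history[-2])


def rolling_benchmark(prices, model, warmup=2):
    """Score one-session-ahead forecasts made at each date in turn."""
    residuals = []
    for t in range(warmup, len(prices)):
        forecast = model(prices[:t])  # only data available before date t
        residuals.append(prices[t] - forecast)
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return {"n": n, "mae": mae, "rmse": rmse, "residuals": residuals}


def skill(stats):
    """Hypothetical skill score, lower is better (weights are assumed)."""
    return 0.5 * stats["mae"] + 0.5 * stats["rmse"]


def empirical_half_width(residuals, coverage=0.85):
    """Smallest half-width covering `coverage` of past absolute residuals."""
    ranked = sorted(abs(r) for r in residuals)
    k = math.ceil(coverage * len(ranked)) - 1
    return ranked[max(k, 0)]


MODELS = {"carry": carry, "momentum": momentum}

# Invented prices in $/bbl, for illustration only.
prices = [80.0, 81.0, 80.5, 81.5, 81.0, 82.0]
results = {name: rolling_benchmark(prices, fn) for name, fn in MODELS.items()}
winner = min(results, key=lambda name: skill(results[name]))
half_width = empirical_half_width(results[winner]["residuals"])
print(winner, results[winner]["mae"], half_width)
```

On this short, choppy series the carry model wins because momentum keeps extrapolating moves that reverse. A real run would use a much longer window and would fold interval calibration and direction hit rate into the score, as described above.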

This is not the best mathematical model possible. It is the strongest honest public version currently supported by Sanjaya's public data: small enough to audit, empirically benchmarked, and useful before a fully calibrated feature model exists. A more advanced model should beat this baseline on out-of-sample error, directional accuracy, and interval calibration before replacing it.

  • Direction is thresholded from the score: bullish at +18 or above, bearish at -14 or below, otherwise rangebound for the next trading session.
  • The benchmark suite compares last-price carry, one-session momentum, 20-session mean reversion, EWMA drift, a blended model, and an adaptive ensemble on the same rolling forecast dates. The winner is selected by the lowest skill score, not by which story sounds best.
  • The displayed dollar range starts from the selected benchmark forecast, adds a capped live-signal overlay of at most plus/minus $1.25/bbl, and uses the benchmark's empirical residual band as the interval width. The residual band targets roughly 85 percent recent coverage.
  • News is quantified, not treated as unstructured sentiment. Sanjaya classifies live crisis and trend headlines into route disruption, physical supply loss, geopolitics, producer restraint, gas spillover, demand weakness, risk easing, and supply increase. The signed news shock is recency-weighted, source-diversity adjusted, and capped before it can move the Brent range.
  • Signal quality means source coverage and agreement, not a statistically calibrated probability. If sources are stale, missing, or disagreeing, the model should widen the band or lower confidence before it gets more assertive.
  • The Brent page exposes the benchmark winner, sample count, MAE, RMSE, empirical band, direction hit rate, news shock, and nearest challenger so technical readers can see whether the model is earning its claim.
  • The driver deck separates observed inputs from known unknowns: OPEC+ compliance, shipping and war premium, inventory expectations, and positioning/liquidity can move Brent before India has full visibility.
  • The next mathematical upgrade is a regularised historical feature model that archives headline shocks and backtests WTI, spread, volatility, EIA inventory surprises, and crisis indicators on the same forecast dates before those live inputs are allowed to carry more weight.
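
The direction thresholds and the capped overlay described in the list above can be written down directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the band is assumed to be symmetric around the overlay-adjusted forecast, and the example inputs are invented.

```python
BULLISH_AT = 18      # score at or above this is bullish
BEARISH_AT = -14     # score at or below this is bearish
OVERLAY_CAP = 1.25   # maximum live-signal overlay, $/bbl, either direction


def direction(score):
    """Map the signal score to a next-session direction label."""
    if score >= BULLISH_AT:
        return "bullish"
    if score <= BEARISH_AT:
        return "bearish"
    return "rangebound"


def displayed_range(benchmark_forecast, live_overlay, residual_half_width):
    """Return (low, high) in $/bbl for the next trading session.

    Assumes a symmetric band: the overlay shifts the centre and the
    benchmark's empirical residual band sets the width.
    """
    overlay = max(-OVERLAY_CAP, min(OVERLAY_CAP, live_overlay))
    centre = benchmark_forecast + overlay
    return centre - residual_half_width, centre + residual_half_width


# Invented inputs: a raw overlay of +2.00 is clipped to +1.25.
low, high = displayed_range(82.0, 2.0, 1.5)
print(direction(20), direction(0), direction(-14), (low, high))
```

Clipping the overlay before it touches the forecast means live signals can move the centre of the range only by a bounded amount, while the width still comes entirely from how the benchmark has actually performed.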

The useful citation is therefore the model state and the evidence rows together: range, score, source health, top drivers, and the stated caveat. Do not cite the exact dollar endpoints as a guaranteed 24-hour forecast.

Known approximations

The current build uses the following simplifications. These will be tightened in future releases and are flagged in the data files via the raw_formula field.

  • State VAT uses only the percentage component of each formula. States with per-litre additives (Tamil Nadu adds Rs 11.52/L) or a surcharge on VAT (Bihar adds 30 percent) will show understated VAT in the waterfall. Because the refinery transfer price is the plug, it is overstated by the same amount.
  • For 'percentage OR Rs X per litre whichever is higher' rules, the percentage is assumed dominant at current price levels. This holds for petrol and diesel across 2022 to 2026 but may not hold at lower underlying prices.
  • Crude overlay is Brent, not the authoritative Indian Basket. The two are directionally close but not identical, and Sanjaya now leaves missing official spot dates blank rather than approximating them with futures.
  • Excise rates are currently captured only for the current fiscal year. Backfill of pre-FY26 excise history is pending.
  • State VAT history is not yet captured; only the current-rate snapshot is shown. Backfill of effective-dated rows is pending.
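
To show how much the percentage-only simplification can matter, the sketch below compares it against two fuller formulas. It rests on assumptions that should be checked against each state's actual notification: that a per-litre additive is charged on top of the percentage VAT, that a surcharge is levied on the VAT amount itself, and that a "whichever is higher" floor is compared with the percentage VAT in rupees per litre. The 25 percent rate, the Rs 15 floor, and the Rs 100 pump price are invented; only the Rs 11.52/L additive and the 30 percent surcharge come from the text above.

```python
def vat_percent_only(rsp, percent):
    """Current simplification: RSP = base * (1 + percent/100)."""
    base = rsp / (1 + percent / 100)
    return rsp - base


def vat_with_additive(rsp, percent, per_litre):
    """Assumes RSP = base * (1 + percent/100) + per_litre."""
    base = (rsp - per_litre) / (1 + percent / 100)
    return rsp - base


def vat_with_surcharge(rsp, percent, surcharge_percent):
    """Assumes the surcharge is applied to the VAT amount itself."""
    effective = (percent / 100) * (1 + surcharge_percent / 100)
    base = rsp / (1 + effective)
    return rsp - base


def percentage_dominates(rsp, percent, floor_per_litre):
    """Check the 'percentage OR Rs X per litre, whichever is higher' rule."""
    return vat_percent_only(rsp, percent) >= floor_per_litre


rsp = 100.0  # invented pump price, Rs/litre
simple = vat_percent_only(rsp, 25.0)
additive = vat_with_additive(rsp, 25.0, 11.52)
surcharge = vat_with_surcharge(rsp, 25.0, 30.0)
print(round(simple, 3), round(additive, 3), round(surcharge, 3))
print(percentage_dominates(rsp, 25.0, 15.0))
```

Under these assumptions the gap between the simplified and fuller VAT figures is the amount that the waterfall currently attributes to the refinery transfer price instead.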

Refresh cadence

A GitHub Actions workflow runs daily at 02:30 UTC (08:00 IST) and scrapes the PPAC daily PDF, ECB FX feed, official EIA Daily Prices spot close, and the FRED Brent history series. A second workflow runs every Sunday at 03:00 UTC and refreshes the weekly tables: state VAT, central excise, dealer commission. Any new data lands in the repository as a commit, which triggers a Vercel rebuild; the static pages you are viewing are typically updated within 10 minutes of PPAC publication.

How to cite or reuse

The complete dataset is in the GitHub repository under /data. Every file is plain JSON or JSONL, typed in TypeScript, and kept small enough for direct clone. Git history is the audit trail: every scrape is a commit, every row has a scrape timestamp.

If you are a journalist, researcher, or policy analyst: fork the repo, cite Sanjaya as the intermediary, and cite PPAC as the primary source. If you find an error, raise a GitHub issue with the specific row and the expected value.

Related deep-dives

For specific topical reads that build on this methodology:

  • Russia crude tracker — Eurasia, Middle East, Africa, Americas import-region shares over time.
  • Hormuz watch — Gulf chokepoint exposure for Indian crude and product flows.
  • Gas dynamics — Henry Hub vs crude parity, ceiling pricing, and fertiliser/power feedstock impact.
  • Global pump prices — where Indian retail sits in the international comparison.

Last reviewed: 2026-04-21. Methodology is versioned in the repo; substantive changes will be noted in the changelog.