FlareForge

Guide · Oracle

Backtest an FTSO data-provider strategy programmatically

If you run or plan to run an FTSO data-provider on Flare, the question you want answered is: "would my proposed set of data sources + aggregation have landed me in tier A over the last 30 days?" The Oracle Lab backtester on this site simulates exactly that against indexed mainnet consensus values. This guide shows how to hit the same endpoint from Python or TypeScript and interpret the tier scoring.

The request body

One POST, four required fields. Every entry in sources is a synthetic data source with its own noise distribution (jitter in basis points) and relative weight in the aggregation step.

POST /api/v1/oracle-lab/simulate
Content-Type: application/json

{
  "feed_id": "0x014254432f55534400000000000000000000000000",
  "sources": [
    { "name": "kraken",   "jitter_bps": 8,  "weight": 1.0 },
    { "name": "coinbase", "jitter_bps": 10, "weight": 1.0 },
    { "name": "binance",  "jitter_bps": 6,  "weight": 1.5 }
  ],
  "aggregation": "weighted_mean",
  "hours": 720,
  "seed": 42
}

Aggregation options: mean, median, weighted_mean, trimmed_mean. The last one drops the highest and lowest source values before averaging, which is the realistic defence against a single bad feed.
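To make the four modes concrete, here is a small sketch of the aggregation math on one tick's worth of source values. These helper functions are this guide's own illustration, not part of the API; the server's exact implementation may differ in details like tie-breaking.

```python
# Illustration of the four aggregation modes on one tick's source values.
# These helpers are a sketch, not the endpoint's actual implementation.
def mean(values):
    return sum(values) / len(values)

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def trimmed_mean(values):
    # Drop the single highest and lowest value, then take a plain mean.
    # Needs at least three sources to leave anything behind.
    s = sorted(values)
    return sum(s[1:-1]) / (len(s) - 2)

ticks = [67012.0, 67004.0, 67950.0]  # one bad quote in the set
print(mean(ticks))          # → 67322.0, pulled up by the outlier
print(trimmed_mean(ticks))  # → 67012.0, outlier discarded
```

Note how the trimmed mean ignores the bad quote entirely, which is exactly the property the paragraph above is describing.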

Python end-to-end

import json
from urllib.request import Request, urlopen

URL = "https://flareforge.io/api/v1/oracle-lab/simulate"

body = {
    "feed_id": "0x014254432f55534400000000000000000000000000",  # BTC/USD
    "sources": [
        {"name": "kraken",   "jitter_bps": 8,  "weight": 1.0},
        {"name": "coinbase", "jitter_bps": 10, "weight": 1.0},
        {"name": "binance",  "jitter_bps": 6,  "weight": 1.5},
    ],
    "aggregation": "weighted_mean",
    "hours": 720,
    "seed": 42,
}

req = Request(URL, data=json.dumps(body).encode("utf-8"),
              headers={"Content-Type": "application/json"})
with urlopen(req, timeout=30) as resp:
    result = json.loads(resp.read())

s = result["summary"]
print(f"Total ticks:     {s['total_ticks']}")
print(f"Tier A rate:     {float(s['tier_a_rate']) * 100:.1f} %")
print(f"Tier B rate:     {float(s['tier_b_rate']) * 100:.1f} %")
print(f"Reward score:    {s['reward_score_pct']} %")
print(f"Median deviation: {s['median_deviation_bps']} bps")

What the numbers mean

tier_a_rate and tier_b_rate come back as fractions of 1 (hence the * 100 in the script above): the share of simulated ticks on which your aggregated value would have landed in tier A or tier B against the indexed consensus. reward_score_pct is already a percentage, median_deviation_bps is the median distance from consensus in basis points, and total_ticks is the number of ticks simulated across the requested hours window.

Tuning loop in practice

The realistic workflow is: add one source at a time, keep seed constant so each run is deterministic, and measure how reward_score_pct moves. If adding a fourth source doesn't move the score, it isn't earning its RPC bill. If switching from weighted_mean to trimmed_mean raises the tier A rate by more than 5 percentage points, you have a dirty source in your set and should find out which one.
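The loop above can be sketched as a small function. Here simulate is an injected callable (an assumption of this sketch, so the loop logic stands alone and is testable offline); in practice it would wrap the POST from the Python section. The 0.5 pp threshold is this guide's own knob, not anything the API prescribes.

```python
# Sketch of the add-one-source-at-a-time tuning loop. `simulate` is any
# callable that takes a request body and returns the summary dict; in
# practice it would POST to /api/v1/oracle-lab/simulate.
def score_candidates(simulate, base, candidates, threshold_pct=0.5):
    """Return (name, delta) for candidates whose addition moves
    reward_score_pct by more than threshold_pct percentage points.
    `base` carries a fixed seed, so every run is deterministic."""
    baseline = float(simulate(base)["reward_score_pct"])
    keepers = []
    for cand in candidates:
        trial = dict(base, sources=base["sources"] + [cand])
        score = float(simulate(trial)["reward_score_pct"])
        if score - baseline > threshold_pct:
            keepers.append((cand["name"], score - baseline))
    return keepers
```

Anything score_candidates does not return is a source that, on this history, isn't paying for itself.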

Rate limits

The simulate endpoint is more expensive than a read endpoint (it iterates over N ticks of history per source). It's rate-limited at 20 requests per minute and 200 per hour per IP, enforced via Redis. For a tuning loop this is generous; if you plan to grid-search across hundreds of parameter combinations, run it locally against the backend instead.
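If you do script against the hosted endpoint, a minimal client-side pacer keeps you under the published 20-per-minute budget. This is purely local bookkeeping (a sketch; the endpoint's throttling headers and status codes aren't documented here, so nothing below reads server responses):

```python
import time

class Pacer:
    """Client-side pacing to stay under a requests-per-minute budget.
    Purely local bookkeeping; it does not inspect any server headers."""
    def __init__(self, per_minute=20):
        self.min_interval = 60.0 / per_minute
        self.last = 0.0

    def wait(self):
        # Sleep just long enough that calls are at least min_interval apart.
        delay = self.last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()

pacer = Pacer(per_minute=20)
# for body in parameter_grid:
#     pacer.wait()
#     result = simulate(body)  # your POST from the Python section above
```

Remember the hourly cap too: even perfectly paced, 200 requests per hour is the hard ceiling per IP.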

Related on the site