uniswap_auto_clp/todo/data_needed_for_optymalization.md

Analysis Request: Data needed for system optimization

Status: Completed | Date: 2025-12-20 | Priority: Medium


1. User Description & Query

  • Goal: Check and analyze what kind of historical data is needed for further optimization of the system.
  • Context: Learning from the past how the system behaved (both well and badly) will help improve it going forward. Assume any publicly available data can be used.
  • Desired Behavior: A list of data types and sources that will help improve the system now and in the future.

Specific Questions

  1. What kind of price data is needed?
  2. CSV or a database?
  3. Which sources of data (Hyperliquid, Uniswap)?
  4. Any other data sources? Please propose.

2. Agent Summary

  • Objective: Define a comprehensive data strategy to support backtesting, parameter optimization, and performance analysis for the Uniswap CLP + Hyperliquid Hedger system.
  • Key Constraints:
    • High Frequency: Hedging logic runs on ~1s ticks. 1-minute candles are insufficient for simulating slippage and "whipsaw" events.
    • Dual Venue: Must correlate Uniswap V3 (Spot/Liquidity) events with Hyperliquid (Perp/Hedge) actions.
    • Storage: High-frequency data grows rapidly; format matters.

3. Main Analysis

3.1 Data Types Required

To fully reconstruct and optimize the strategy, you need three distinct layers of data:

A. Market Data (The "Environment")

  1. Tick-Level Trades (Hyperliquid):
    • Why: To simulate realistic slippage, fill probability, and exact trigger timing for the hedger.
    • Fields: timestamp_ms, price, size, side, liquidation (bool).
  2. Order Book Snapshots (Hyperliquid):
    • Why: To calculate "effective impact price" for large hedges. The mid-price might be $3000, but selling $50k might execute at $2998.
    • Frequency: Every 1-5 seconds.
  3. Uniswap V3 Pool Events (Arbitrum):
    • Why: To track the exact "Health" of the CLP. Knowing when the price crosses a tick boundary is critical for "In Range" status.
    • Events: Swap (Price changes), Mint, Burn.
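
A minimal sketch of how these records could be typed for logging or Parquet storage; the class and field names below are illustrative, not an existing schema in this repo:

```python
# Illustrative record layouts for the market-data layer (assumed field names).
from dataclasses import dataclass

@dataclass
class HlTrade:
    timestamp_ms: int      # exchange timestamp in milliseconds
    price: float
    size: float
    side: str              # "buy" or "sell" (taker side)
    liquidation: bool      # True if the trade was a forced liquidation

@dataclass
class UniswapSwapEvent:
    block_number: int
    timestamp: int
    sqrt_price_x96: int    # pool price after the swap (Q64.96 fixed point)
    tick: int              # current tick; used to check "in range" status
    amount0: int
    amount1: int
```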

B. System State Data (The "Bot's Brain")

  • Why: To understand why the bot made a decision. A trade might look bad in hindsight but may have been correct given the data available at that millisecond.
  • Fields: timestamp, current_hedge_delta, target_hedge_delta, rebalance_threshold_used, volatility_metric, pnl_unrealized, pnl_realized.
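
A minimal sketch of appending one decision-metrics row per loop, assuming the field list above and the hedge_metrics.csv file proposed in the conclusion:

```python
import csv
from pathlib import Path

FIELDS = ["timestamp", "current_hedge_delta", "target_hedge_delta",
          "rebalance_threshold_used", "volatility_metric",
          "pnl_unrealized", "pnl_realized"]

def append_metrics_row(row: dict, path: str = "hedge_metrics.csv") -> None:
    """Append one snapshot of the bot's internal state; write the header once."""
    new_file = not Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```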

C. External "Alpha" Data (Optimization Signals)

  • Funding Rates (Historical): To optimize long/short bias.
  • Gas Prices (Arbitrum): To optimize mint/burn timing (don't rebalance CLP if gas > expected fees).
  • Implied Volatility (Deribit options): Compare realized vol vs. implied vol to adjust DYNAMIC_THRESHOLD_MULTIPLIER.
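
A sketch of how the realized-vs-implied comparison could feed the threshold; DYNAMIC_THRESHOLD_MULTIPLIER is the existing config value, everything else here (function names, clamping range) is illustrative:

```python
import numpy as np

def realized_vol_annualized(prices: np.ndarray, seconds_per_sample: float = 1.0) -> float:
    """Annualized realized volatility from a series of sampled prices."""
    log_returns = np.diff(np.log(prices))
    samples_per_year = (365 * 24 * 3600) / seconds_per_sample
    return float(log_returns.std() * np.sqrt(samples_per_year))

def adjust_threshold_multiplier(base_multiplier: float, realized: float, implied: float) -> float:
    """Widen the rebalance threshold (DYNAMIC_THRESHOLD_MULTIPLIER) when implied vol
    exceeds realized vol, tighten it otherwise. Purely illustrative logic."""
    ratio = implied / realized if realized > 0 else 1.0
    return base_multiplier * min(max(ratio, 0.5), 2.0)  # clamp to a sane range
```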

3.2 Technical Options / Trade-offs

| Option | Pros | Cons | Complexity |
| --- | --- | --- | --- |
| A. CSV Files (Flat) | Simple, human-readable, portable. Good for daily logs. | Slow to query large datasets. Hard to merge multiple streams (e.g., matching a Uniswap swap to an HL trade). | Low |
| B. SQLite (Local DB) | Single file, supports SQL queries, better performance than CSV. | Concurrency limits (one writer). Not great for massive tick data (TB scale). | Low-Medium |
| C. Time-Series DB (InfluxDB / QuestDB) | Optimized for high-frequency timestamps. Native downsampling. | Requires running a server/container. Overkill for simple analysis. | High |
| D. Parquet / HDF5 | Extremely fast read/write for Python (Pandas). High compression. | Not human-readable. Best for "cold" storage (backtesting). | Medium |
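
For option D, the pandas round trip is short. This sketch assumes pyarrow (or fastparquet) is installed and uses made-up file and column names:

```python
import pandas as pd

# Write one day of tick data to a compressed Parquet file.
ticks = pd.DataFrame({
    "timestamp_ms": [1734700800000, 1734700801000],
    "price": [3000.5, 3000.1],
    "size": [0.25, 1.10],
    "side": ["buy", "sell"],
})
ticks.to_parquet("hl_trades_2025-12-20.parquet", compression="snappy", index=False)

# Read it back for analysis; selecting columns keeps memory usage low.
subset = pd.read_parquet("hl_trades_2025-12-20.parquet", columns=["timestamp_ms", "price"])
```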

3.3 Proposed Solution Design

Architecture: "Hot" Logging + "Cold" Archival

  1. Live Logging (Hot): Continue using JSON status files and Log files for immediate state.
  2. Data Collector Script: A separate process (or async thread) that dumps high-frequency data into daily CSVs or Parquet files.
  3. Backtest Engine: A Python script that loads these Parquet files to simulate "What if threshold was 0.08 instead of 0.05?".
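
A sketch of what the backtest engine's core loop could look like, assuming the decision-metrics schema above has already been archived to Parquet (column names are the proposed ones, not guaranteed to exist yet):

```python
import pandas as pd

def count_rebalances(metrics: pd.DataFrame, threshold: float) -> int:
    """Replay recorded deltas and count how often |current - target| would
    have exceeded a given threshold (proxy for hedge trade count)."""
    deviation = (metrics["current_hedge_delta"] - metrics["target_hedge_delta"]).abs()
    return int((deviation > threshold).sum())

metrics = pd.read_parquet("hedge_metrics.parquet")  # or pd.read_csv("hedge_metrics.csv")
for threshold in (0.05, 0.08, 0.10):
    print(threshold, count_rebalances(metrics, threshold))
```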

Data Sources

  • Hyperliquid: Public API (Info) provides L2 snapshots and recent trade history.
  • Uniswap: The Graph (Subgraphs) or RPC eth_getLogs (see the sketch after this list).
  • Dune Analytics: Great for exporting historical Uniswap V3 data (fees, volumes) to CSV for free/cheap.
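
A sketch of the raw-RPC route for Uniswap V3 Swap events via web3.py; the pool address is a placeholder and the block numbers are arbitrary:

```python
from web3 import Web3

# RPC URL and pool address are placeholders; substitute the real Arbitrum
# endpoint and the pool the CLP position lives in.
w3 = Web3(Web3.HTTPProvider("https://arb1.arbitrum.io/rpc"))
POOL = Web3.to_checksum_address("0x0000000000000000000000000000000000000000")

# keccak hash of the Uniswap V3 Swap event signature, used as topic0.
SWAP_TOPIC = Web3.keccak(
    text="Swap(address,address,int256,int256,uint160,uint128,int24)"
)

logs = w3.eth.get_logs({
    "address": POOL,
    "topics": [SWAP_TOPIC],
    "fromBlock": 250_000_000,   # narrow block ranges keep RPC providers happy
    "toBlock": 250_001_000,
})
print(len(logs), "swap events")
```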

3.4 KPI & Performance Metrics

To truly evaluate success, we need more than just PnL; we need to compare against benchmarks (a sketch of a few of these calculations follows the list).

  1. NAV vs. Benchmark (HODL):

    • Metric: (Current Wallet Value + Position Value) - (Net Inflows) vs. (Initial ETH * Current Price).
    • Goal: Did we beat simply holding ETH?
    • Frequency: Hourly.
  2. Hedging Efficiency (Delta Neutrality):

    • Metric: Net Delta Exposure = (Uniswap Delta + Hyperliquid Delta).
    • Goal: Should be close to 0. A high standard deviation here means the bot is "loose" or slow.
    • Frequency: Per-Tick (or aggregated per minute).
  3. Cost of Hedge (The "Insurance Premium"):

    • Metric: (Hedge Fees Paid + Funding Paid + Hedge Slippage) / Total Portfolio Value.
    • Goal: Keep this below the APR earned from Uniswap fees.
    • Frequency: Daily.
  4. Fee Coverage Ratio:

    • Metric: Uniswap Fees Earned / Cost of Hedge.
    • Goal: Must be > 1.0. If < 1.0, the strategy is burning money to stay neutral.
    • Frequency: Daily.
  5. Impermanent Loss (IL) Realized:

    • Metric: Value lost from selling ETH low / buying high during CLP rebalances, compared against fees earned over the same period.
    • Frequency: Per-Rebalance.
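
A sketch of two of these KPIs as plain functions; argument names mirror the metric definitions above and are not existing code:

```python
def nav_vs_hodl(wallet_value: float, position_value: float, net_inflows: float,
                initial_eth: float, current_price: float) -> float:
    """KPI 1: strategy NAV minus the buy-and-hold benchmark (positive = beat HODL)."""
    nav = wallet_value + position_value - net_inflows
    hodl = initial_eth * current_price
    return nav - hodl

def fee_coverage_ratio(uniswap_fees_earned: float, hedge_fees: float,
                       funding_paid: float, hedge_slippage: float) -> float:
    """KPI 4: fees earned per unit of hedging cost; should stay above 1.0."""
    cost_of_hedge = hedge_fees + funding_paid + hedge_slippage
    return uniswap_fees_earned / cost_of_hedge if cost_of_hedge > 0 else float("inf")
```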

4. Risk Assessment

  • Risk: Data Gaps. If the bot goes offline, you miss market data.
    • Mitigation: Use public historical APIs (like Hyperliquid's archive or Dune) to fill gaps, rather than relying solely on local recording.
  • Risk: Storage Bloat. Storing every millisecond tick can fill a hard drive in weeks.
    • Mitigation: Aggregate. Store "1-second OHLC" + "Tick Volume" instead of every raw trade, unless debugging specific slippage events.
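
A sketch of the aggregation mitigation with pandas, assuming a raw tick file with the illustrative columns from section 3.1:

```python
import pandas as pd

# Raw tick DataFrame with a millisecond timestamp column (illustrative file name).
trades = pd.read_parquet("hl_trades_2025-12-20.parquet")
trades["ts"] = pd.to_datetime(trades["timestamp_ms"], unit="ms")
trades = trades.set_index("ts")

# Downsample to 1-second OHLC plus per-second tick volume; far smaller than raw ticks.
bars = trades["price"].resample("1s").ohlc()
bars["volume"] = trades["size"].resample("1s").sum()
bars = bars.dropna(subset=["open"])  # drop seconds with no trades
```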

5. Conclusion

Recommendation:

  1. Immediate: Start logging Internal System State (Thresholds, Volatility metrics) to a structured CSV (hedge_metrics.csv). You can't get this from public APIs later.
  2. External Data: Don't build a complex scraper yet. Rely on downloading public data (Dune/Hyperliquid) when you are ready to backtest.
  3. Format: Use Parquet (via Pandas) for storing price data. It is typically several times faster to load and significantly smaller on disk than CSV.

6. Implementation Plan

  • Step 1: Create tools/data_collector.py to fetch and save public trade history (HL) daily.
  • Step 2: Modify clp_hedger.py to append "Decision Metrics" (Vol, Threshold, Delta) to hedge_metrics.csv every loop.
  • Step 3: Use a notebook (Colab/Jupyter) to load hedge_metrics.csv and visualize "Threshold vs. Price Deviation" (see the sketch below).
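
A sketch of Step 3, assuming hedge_metrics.csv contains the columns proposed in section 3.1 B; "deviation" here is derived from the logged deltas:

```python
import pandas as pd
import matplotlib.pyplot as plt

metrics = pd.read_csv("hedge_metrics.csv")
deviation = (metrics["current_hedge_delta"] - metrics["target_hedge_delta"]).abs()

# Scatter of the threshold in force vs. how far the hedge actually drifted.
plt.scatter(metrics["rebalance_threshold_used"], deviation, s=4, alpha=0.4)
plt.xlabel("Rebalance threshold used")
plt.ylabel("|delta deviation|")
plt.title("Threshold vs. deviation")
plt.show()
```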