Analysis Request: Data needed for system optimization
Status: Completed
Date: 2025-12-20
Priority: Medium
1. User Description & Query
Goal: Check and analyze what kind of historical data is needed to further optimize the system.
Context: Learning how the system behaved in the past (both well and badly) will help in the future. Assume that any publicly available data can be used.
Desired Behavior: A list of data that will help improve the system now and in the future.
Specific Questions
- Price data: what kind?
- CSV or a DB?
- Source of data (Hyperliquid, Uniswap)?
- Other sources of data? Please propose some.
2. Agent Summary
- Objective: Define a comprehensive data strategy to support backtesting, parameter optimization, and performance analysis for the Uniswap CLP + Hyperliquid Hedger system.
- Key Constraints:
- High Frequency: Hedging logic runs on ~1s ticks. 1-minute candles are insufficient for simulating slippage and "whipsaw" events.
- Dual Venue: Must correlate Uniswap V3 (Spot/Liquidity) events with Hyperliquid (Perp/Hedge) actions.
- Storage: High-frequency data grows rapidly; format matters.
3. Main Analysis
3.1 Data Types Required
To fully reconstruct and optimize the strategy, you need three distinct layers of data:
A. Market Data (The "Environment")
- Tick-Level Trades (Hyperliquid):
  - Why: To simulate realistic slippage, fill probability, and exact trigger timing for the hedger.
  - Fields: `timestamp_ms`, `price`, `size`, `side`, `liquidation` (bool).
- Order Book Snapshots (Hyperliquid):
  - Why: To calculate the "effective impact price" for large hedges. The mid-price might be $3000, but selling $50k might execute at $2998 (see the walk-the-book sketch after this list).
  - Frequency: Every 1-5 seconds.
- Uniswap V3 Pool Events (Arbitrum):
  - Why: To track the exact "Health" of the CLP. Knowing when the price crosses a tick boundary is critical for "In Range" status.
  - Events: `Swap` (price changes), `Mint`, `Burn`.
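To make the impact-price point concrete, here is a minimal sketch that walks the bid side of a recorded L2 snapshot to get the volume-weighted fill price of a market sell. The `(price, size)` level format is an assumption; adapt it to whatever snapshot schema you actually record.

```python
# Minimal sketch, assuming L2 bids arrive as (price, size) pairs
# sorted best-first. Illustrative only, not the bot's actual code.

def effective_impact_price(bids, sell_size):
    """Volume-weighted average fill price for a market sell."""
    remaining = sell_size
    notional = 0.0
    for price, size in bids:
        take = min(remaining, size)
        notional += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order exceeds visible book depth")
    return notional / sell_size

# Mid may sit near $3000, but a large sell walks down the book:
bids = [(2999.5, 5.0), (2999.0, 8.0), (2998.5, 20.0)]
print(effective_impact_price(bids, sell_size=16.0))  # VWAP below the touch
```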
B. System State Data (The "Bot's Brain")
- Why: To understand why the bot made a decision. A trade might look bad in hindsight but have been correct given the data available at that millisecond.
- Fields: `timestamp`, `current_hedge_delta`, `target_hedge_delta`, `rebalance_threshold_used`, `volatility_metric`, `pnl_unrealized`, `pnl_realized` (a minimal logging sketch follows this list).
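A minimal sketch of capturing these fields once per hedger loop, using only the standard library. The field names mirror the list above, and `hedge_metrics.csv` matches the file proposed in the Conclusion; the example values are illustrative.

```python
# Sketch: append one row of per-loop decision metrics to a CSV.
import csv
import os
import time

FIELDS = ["timestamp", "current_hedge_delta", "target_hedge_delta",
          "rebalance_threshold_used", "volatility_metric",
          "pnl_unrealized", "pnl_realized"]

def log_state(path, state):
    """Append one row of bot state; write the header on first use."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({k: state.get(k) for k in FIELDS})

# Called once per hedger loop, e.g.:
log_state("hedge_metrics.csv", {
    "timestamp": time.time(),
    "current_hedge_delta": -0.42,
    "target_hedge_delta": -0.45,
    "rebalance_threshold_used": 0.05,
    "volatility_metric": 0.012,
    "pnl_unrealized": 13.7,
    "pnl_realized": 2.1,
})
```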
C. External "Alpha" Data (Optimization Signals)
- Funding Rates (Historical): To optimize long/short bias.
- Gas Prices (Arbitrum): To optimize mint/burn timing (don't rebalance CLP if gas > expected fees).
- Implied Volatility (Deribit options): Compare realized vol vs. implied vol to adjust `DYNAMIC_THRESHOLD_MULTIPLIER` (sketched after this list).
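One way the implied/realized comparison could feed the dynamic threshold, as a sketch: the adjustment formula and the `multiplier` parameter are illustrative assumptions, not the bot's actual logic.

```python
# Hypothetical sketch: annualized realized vol from 1-second closes,
# compared to an implied-vol figure (e.g. from Deribit) to widen the
# rebalance threshold. Formula and names are assumptions.
import math

def realized_vol(prices):
    """Annualized realized volatility from 1-second close prices."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    return math.sqrt(var * 365 * 24 * 3600)  # seconds per year

def adjusted_threshold(base, rv, iv, multiplier):
    """Widen the threshold when implied vol runs above realized vol."""
    return base * (1 + multiplier * max(iv / rv - 1, 0))

rv = realized_vol([3000.0, 3000.4, 2999.8, 3001.1, 3000.6])
print(adjusted_threshold(0.05, rv, iv=0.60, multiplier=1.5))
```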
3.2 Technical Options / Trade-offs
| Option | Pros | Cons | Complexity |
|---|---|---|---|
| A. CSV Files (Flat) | Simple, human-readable, portable. Good for daily logs. | Slow to query large datasets. Hard to merge multiple streams (e.g., matching Uniswap swap to HL trade). | Low |
| B. SQLite (Local DB) | Single file, supports SQL queries, better performance than CSV. | Concurrency limits (one writer). Not great for massive tick data (TB scale). | Low-Medium |
| C. Time-Series DB (InfluxDB / QuestDB) | Optimized for high-frequency timestamps. Native downsampling. | Requires running a server/container. Likely overkill for simple analysis. | High |
| D. Parquet / HDF5 | Extremely fast read/write for Python (Pandas). High compression. | Not human-readable. Best for "Cold" storage (backtesting). | Medium |
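To illustrate option D: with pandas (plus `pyarrow` or `fastparquet` installed), writing and reading Parquet is a couple of lines. The file name and sample rows are illustrative.

```python
# Sketch: write tick data to Parquet and read it back with pandas.
import pandas as pd

df = pd.DataFrame({
    "timestamp_ms": [1703030400000, 1703030400512],
    "price": [3001.5, 3001.2],
    "size": [0.8, 2.1],
    "side": ["buy", "sell"],
})
df.to_parquet("trades_2025-12-20.parquet")  # columnar, compressed

# Reading back is a one-liner and far faster than parsing CSV:
trades = pd.read_parquet("trades_2025-12-20.parquet")
```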
3.3 Proposed Solution Design
Architecture: "Hot" Logging + "Cold" Archival
- Live Logging (Hot): Continue using JSON status files and log files for immediate state.
- Data Collector Script: A separate process (or async thread) that dumps high-frequency data into daily CSV or Parquet files.
- Backtest Engine: A Python script that loads these Parquet files to simulate "What if threshold was 0.08 instead of 0.05?".
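A sketch of what such a sweep could look like. The toy function below only counts threshold crossings on recorded prices; the real engine would also model fees, slippage, and CLP fee income, which this deliberately omits.

```python
# Hypothetical sketch of a parameter sweep over rebalance thresholds.
import pandas as pd

def count_rebalances(prices, threshold):
    """Rebalance whenever price drifts more than `threshold`
    (fractional) from the last rebalance price."""
    anchor = prices.iloc[0]
    n = 0
    for p in prices.iloc[1:]:
        if abs(p / anchor - 1) > threshold:
            anchor = p
            n += 1
    return n

ticks = pd.read_parquet("trades_2025-12-20.parquet")  # from the collector
for thr in (0.03, 0.05, 0.08):
    print(thr, count_rebalances(ticks["price"], thr))
```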
Data Sources
- Hyperliquid: The public Info API provides L2 snapshots and recent trade history (see the fetch sketch after this list).
- Uniswap: The Graph (subgraphs) or RPC `eth_getLogs`.
- Dune Analytics: Great for exporting historical Uniswap V3 data (fees, volumes) to CSV for free or cheaply.
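As a starting point for the Hyperliquid source, a sketch using the public Info endpoint. The request shape follows the public docs, but verify the response field names against the current API before depending on them.

```python
# Sketch: fetch one L2 snapshot from Hyperliquid's public Info endpoint.
import requests

resp = requests.post(
    "https://api.hyperliquid.xyz/info",
    json={"type": "l2Book", "coin": "ETH"},
    timeout=10,
)
resp.raise_for_status()
book = resp.json()
# "levels" is [bids, asks]; each level carries px (price), sz (size).
best_bid = book["levels"][0][0]
print(best_bid["px"], best_bid["sz"])
```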
3.4 KPI & Performance Metrics
To truly evaluate "Success," we need more than just PnL. We need to compare against benchmarks.
- NAV vs. Benchmark (HODL):
  - Metric: `(Current Wallet Value + Position Value) - (Net Inflows)` vs. `(Initial ETH * Current Price)`.
  - Goal: Did we beat simply holding ETH?
  - Frequency: Hourly.
- Hedging Efficiency (Delta Neutrality):
  - Metric: `Net Delta Exposure = (Uniswap Delta + Hyperliquid Delta)`.
  - Goal: Should stay close to 0. A high standard deviation here means the bot is "loose" or slow.
  - Frequency: Per tick (or aggregated per minute).
- Cost of Hedge (The "Insurance Premium"):
  - Metric: `(Hedge Fees Paid + Funding Paid + Hedge Slippage) / Total Portfolio Value`.
  - Goal: Keep this below the APR earned from Uniswap fees.
  - Frequency: Daily.
- Fee Coverage Ratio:
  - Metric: `Uniswap Fees Earned / Cost of Hedge`.
  - Goal: Must be > 1.0. If < 1.0, the strategy is burning money to stay neutral.
  - Frequency: Daily.
- Impermanent Loss (IL) Realized:
  - Metric: Value lost to selling ETH low / buying high during CLP rebalances vs. fees earned.
  - Frequency: Per rebalance.
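A sketch of computing two of these KPIs from a `hedge_metrics.csv`-style DataFrame. The `uniswap_delta`, `uniswap_fees_earned`, and `hedge_costs` columns are illustrative assumptions beyond the fields listed in section 3.1-B.

```python
# Sketch: Hedging Efficiency and Fee Coverage Ratio from logged metrics.
# Columns uniswap_delta, uniswap_fees_earned, hedge_costs are assumed.
import pandas as pd

m = pd.read_csv("hedge_metrics.csv")

# Hedging efficiency: net delta should hover near 0 with low dispersion.
net_delta = m["current_hedge_delta"] + m["uniswap_delta"]
print("net delta std:", net_delta.std())

# Fee coverage ratio: must stay above 1.0.
coverage = m["uniswap_fees_earned"].sum() / m["hedge_costs"].sum()
print("fee coverage ratio:", coverage)
```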
4. Risk Assessment
- Risk: Data Gaps. If the bot goes offline, you miss market data.
  - Mitigation: Use public historical APIs (like Hyperliquid's archive or Dune) to fill gaps rather than relying solely on local recording.
- Risk: Storage Bloat. Storing every millisecond tick can fill a hard drive in weeks.
  - Mitigation: Aggregate. Store 1-second OHLC plus tick volume instead of every raw trade, unless debugging specific slippage events (see the resampling sketch below).
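The aggregation step is a few lines with pandas; a sketch, assuming the raw trades carry the `timestamp_ms`, `price`, and `size` fields from section 3.1-A.

```python
# Sketch: downsample raw trades to 1-second OHLC + tick volume.
import pandas as pd

trades = pd.read_parquet("trades_2025-12-20.parquet")
trades.index = pd.to_datetime(trades["timestamp_ms"], unit="ms")

bars = trades["price"].resample("1s").ohlc()
bars["volume"] = trades["size"].resample("1s").sum()
bars["ticks"] = trades["price"].resample("1s").count()
bars.to_parquet("bars_1s_2025-12-20.parquet")
```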
5. Conclusion
Recommendation:
- Immediate: Start logging internal system state (thresholds, volatility metrics) to a structured CSV (`hedge_metrics.csv`). You cannot get this data from public APIs later.
- External Data: Don't build a complex scraper yet. Rely on downloading public data (Dune/Hyperliquid) when you are ready to backtest.
- Format: Use Parquet (via pandas) for storing price data. It is roughly 10x faster and smaller than CSV.
6. Implementation Plan
- Step 1: Create `tools/data_collector.py` to fetch and save public trade history (HL) daily.
- Step 2: Modify `clp_hedger.py` to append "Decision Metrics" (vol, threshold, delta) to a `metrics.csv` every loop.
- Step 3: Use a notebook (Colab/Jupyter) to load `metrics.csv` and visualize "Threshold vs. Price Deviation" (a plotting sketch follows).
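A sketch of the Step 3 visualization. It uses the gap between current and target hedge delta as the deviation measure, which is an assumption about what "Price Deviation" should plot; swap in the column you actually care about.

```python
# Sketch: threshold in force vs. observed deviation, per loop iteration.
import pandas as pd
import matplotlib.pyplot as plt

m = pd.read_csv("metrics.csv")
m["deviation"] = (m["current_hedge_delta"] - m["target_hedge_delta"]).abs()

plt.scatter(m["rebalance_threshold_used"], m["deviation"], s=4, alpha=0.4)
plt.xlabel("rebalance_threshold_used")
plt.ylabel("|current - target| hedge delta")
plt.title("Threshold vs. Price Deviation")
plt.show()
```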