# AGENTS.md - AI Coding Assistant Guidelines

## Project Overview

BTC Accumulation Bot - Data Collection Phase. High-performance async data collection system for cbBTC on Hyperliquid with TimescaleDB storage. Python 3.11, asyncio, FastAPI, asyncpg, WebSockets.

## Build/Run Commands

### Docker (Primary deployment - Synology DS218+)

```bash
# Build and start all services (timescaledb, data_collector, api_server)
cd docker && docker-compose up -d --build

# View logs
docker-compose logs -f data_collector
docker-compose logs -f api_server

# Full deploy (creates dirs, pulls, builds, starts)
bash scripts/deploy.sh
```

### Development

```bash
# API server (requires DB running)
cd src/api && uvicorn server:app --reload --host 0.0.0.0 --port 8000
# Docs: http://localhost:8000/docs | Dashboard: http://localhost:8000/dashboard

# Data collector
cd src/data_collector && python -m data_collector.main
```

### Testing

```bash
# Run all tests
pytest

# Run a specific test file
pytest tests/data_collector/test_websocket_client.py

# Run a single test by name
pytest tests/data_collector/test_websocket_client.py::test_websocket_connection -v

# Run with coverage
pytest --cov=src --cov-report=html
```

Note: The `tests/` directory structure exists, but test files have not been written yet. When creating tests, use pytest with pytest-asyncio for async test support.
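Until real tests exist, a first async test can follow the sketch below. `dispatch_candle` is a hypothetical stand-in for the collector's callback dispatch, not actual project code, and the test runs under plain `asyncio.run()` so the sketch stays self-contained — in the real suite you would decorate the coroutine with `@pytest.mark.asyncio` instead, per the guidelines below.

```python
import asyncio
from unittest.mock import AsyncMock


async def dispatch_candle(candle: dict, on_candle) -> None:
    """Deliver one candle to the registered async callback (illustrative stand-in)."""
    await on_candle(candle)


async def test_callback_receives_candle() -> None:
    # Mock the downstream consumer (e.g., the buffer's add method)
    on_candle = AsyncMock()
    candle = {"t": 1700000000000, "o": "97000.0", "c": "97100.0"}

    await dispatch_candle(candle, on_candle)

    # AsyncMock records awaited calls, so we can assert exact delivery
    on_candle.assert_awaited_once_with(candle)


asyncio.run(test_callback_receives_candle())
```

`unittest.mock.AsyncMock` (stdlib since Python 3.8) covers most WebSocket/database mocking needs without extra dependencies.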
### Linting & Formatting

```bash
# No config files exist for these tools; use these flags:
flake8 src/ --max-line-length=100 --extend-ignore=E203,W503
black --check src/   # Check formatting
black src/           # Auto-format
mypy src/ --ignore-missing-imports
```

## Project Structure

```
src/
├── data_collector/          # WebSocket client, buffer, database
│   ├── __init__.py
│   ├── main.py              # Entry point, orchestration, signal handling
│   ├── websocket_client.py  # Hyperliquid WS client, Candle dataclass
│   ├── candle_buffer.py     # Circular buffer with async flush
│   ├── database.py          # asyncpg/TimescaleDB interface
│   └── backfill.py          # Historical data backfill from REST API
└── api/
    ├── server.py            # FastAPI app, all endpoints
    └── dashboard/static/
        └── index.html       # Real-time web dashboard
config/data_config.yaml      # Non-secret operational config
docker/
├── docker-compose.yml       # 3-service orchestration
├── Dockerfile.api / .collector  # python:3.11-slim based
└── init-scripts/            # 01-schema.sql, 02-optimization.sql
scripts/                     # deploy.sh, backup.sh, health_check.sh, backfill.sh
tests/data_collector/        # Test directory (empty - tests not yet written)
```

## Code Style Guidelines

### Imports

Group in this order, separated by blank lines:

1. Standard library (`import asyncio`, `from datetime import datetime`)
2. Third-party (`import websockets`, `import asyncpg`, `from fastapi import FastAPI`)
3. Local/relative (`from .websocket_client import Candle`)

Use relative imports (`.module`) within the `data_collector` package. Use absolute imports for third-party packages.
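To make the `candle_buffer.py` entry in the structure above concrete, here is a minimal stdlib-only sketch of a bounded buffer with an async flush. The class shape and method names are assumptions for illustration, not the project's actual implementation:

```python
import asyncio
from collections import deque
from typing import Deque, Dict, List


class CandleBuffer:
    """Bounded in-memory candle buffer drained by an async flush (illustrative sketch)."""

    def __init__(self, max_size: int = 1000) -> None:
        self._buffer: Deque[Dict] = deque()
        self._max_size = max_size
        self._lock = asyncio.Lock()

    async def add(self, candle: Dict) -> bool:
        """Add a candle to the buffer

        Returns True if added, False if buffer full and candle dropped"""
        async with self._lock:
            if len(self._buffer) >= self._max_size:
                return False
            self._buffer.append(candle)
            return True

    async def flush(self) -> List[Dict]:
        """Drain and return all buffered candles for a batch insert."""
        async with self._lock:
            drained = list(self._buffer)
            self._buffer.clear()
            return drained
```

The `asyncio.Lock` keeps `add()` and `flush()` from interleaving when the flush loop runs as a background task, matching the Async Patterns conventions below.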
### Formatting

- Line length: 100 characters max
- Indentation: 4 spaces
- Strings: double quotes (single only to avoid escaping)
- Trailing commas in multi-line collections
- Formatter: black

### Type Hints

- Required on all function parameters and return values
- `Optional[Type]` for nullable values
- `List[Type]`, `Dict[str, Any]` from the `typing` module
- `@dataclass` for data-holding classes (e.g., `Candle`, `BufferStats`)
- Callable types for callbacks: `Callable[[Candle], Awaitable[None]]`

### Naming Conventions

- Classes: `PascalCase` (`DataCollector`, `CandleBuffer`)
- Functions/variables: `snake_case` (`get_candles`, `buffer_size`)
- Constants: `UPPER_SNAKE_CASE` (`DB_HOST`, `MAX_BUFFER_SIZE`)
- Private methods: `_leading_underscore` (`_handle_reconnect`, `_flush_loop`)

### Docstrings

- Triple double quotes on all modules, classes, and public methods
- Brief one-line description on the first line
- Optional blank line + detail if needed
- No Args/Returns sections (not strict Google-style)

```python
"""Add a candle to the buffer

Returns True if added, False if buffer full and candle dropped"""
```

### Error Handling

- `try/except` with specific exceptions (never bare `except:`)
- Log errors with `logger.error()` before re-raising in critical paths
- Catch `asyncio.CancelledError` to break loops cleanly
- Use `finally` blocks for cleanup (always call `self.stop()`)
- Use `@asynccontextmanager` for resource acquisition (DB connections)

### Async Patterns

- `async/await` for all I/O operations
- `asyncio.Lock()` for thread-safe buffer access
- `asyncio.Event()` for stop/flush coordination
- `asyncio.create_task()` for background loops
- `asyncio.gather(*tasks, return_exceptions=True)` for parallel cleanup
- `asyncio.wait_for(coro, timeout)` for graceful shutdown
- `asyncio.run(main())` as the entry point

### Logging

- Module-level: `logger = logging.getLogger(__name__)` in every file
- Format: `'%(asctime)s - %(name)s - %(levelname)s - %(message)s'`
- Log level from env: `getattr(logging, os.getenv('LOG_LEVEL', 'INFO'))`
- Use f-strings in log messages with relevant context
- Levels: DEBUG (candle receipt), INFO (lifecycle), WARNING (gaps), ERROR (failures)

### Database (asyncpg + TimescaleDB)

- Connection pool: `asyncpg.create_pool(min_size=1, max_size=N)`
- `@asynccontextmanager` wrapper for connection acquisition
- Batch inserts with `executemany()`
- Upserts with `ON CONFLICT ... DO UPDATE`
- Positional params: `$1, $2, ...` (not `%s`)
- Use `conn.fetch()`, `conn.fetchrow()`, `conn.fetchval()` for results

### Configuration

- Secrets via environment variables (`os.getenv('DB_PASSWORD')`)
- Non-secret config in `config/data_config.yaml`
- Constructor defaults fall back to env vars
- Never commit `.env` files (they contain real credentials)

## Common Tasks

### Add New API Endpoint

1. Add a route in `src/api/server.py` with `@app.get()`/`@app.post()`
2. Type-hint query params with `Query()`; return a `dict` or raise `HTTPException`
3. Use the `asyncpg` pool for database queries

### Add New Data Source

1. Create a module in `src/data_collector/` following the `websocket_client.py` pattern
2. Implement async `connect()`, `disconnect()`, `receive()` methods
3. Use the callback architecture: `on_data`, `on_error` callables

### Database Schema Changes

1. Update `docker/init-scripts/01-schema.sql`
2. Update `DatabaseManager` methods in `src/data_collector/database.py`
3. Rebuild: `docker-compose down -v && docker-compose up -d --build`

### Writing Tests

1. Create test files in `tests/data_collector/` (e.g., `test_websocket_client.py`)
2. Use `pytest-asyncio` for async tests: `@pytest.mark.asyncio`
3. Mock external services (WebSocket, database) with `unittest.mock`
4. Use descriptive names: `test_websocket_reconnection_with_backoff`

### Historical Data Backfill

The `backfill.py` module downloads historical candle data from Hyperliquid's REST API.
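Because responses are capped at 500 candles, a backfill has to split its time range into request-sized windows. The arithmetic can be sketched as follows — `paginate_windows` and its constants are hypothetical names for illustration, not the real `backfill.py` API:

```python
from datetime import datetime, timezone
from typing import List, Tuple

CANDLES_PER_REQUEST = 500  # per-response page size (API limit)

# Interval durations in milliseconds (subset for illustration)
INTERVAL_MS = {"1m": 60_000, "1h": 3_600_000, "1d": 86_400_000}


def paginate_windows(interval: str, days_back: int) -> List[Tuple[int, int]]:
    """Split a backfill range into 500-candle [start_ms, end_ms) request windows.

    Illustrative sketch; the real backfill module may paginate differently."""
    step = INTERVAL_MS[interval] * CANDLES_PER_REQUEST
    end = int(datetime.now(timezone.utc).timestamp() * 1000)
    start = end - days_back * 86_400_000
    windows: List[Tuple[int, int]] = []
    while start < end:
        windows.append((start, min(start + step, end)))
        start += step
    return windows
```

For example, one day of 1m candles (1440 candles) needs three requests under a 500-candle page size.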
**API Limitations:**

- Max 5000 candles per coin/interval combination
- 500 candles per response (requires pagination)
- Available intervals: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 8h, 12h, 1d, 3d, 1w, 1M

**Usage - Python Module:**

```python
from data_collector.backfill import HyperliquidBackfill
from data_collector.database import DatabaseManager

async with HyperliquidBackfill(db, coin="BTC", intervals=["1m", "1h"]) as backfill:
    # Backfill last 7 days for all configured intervals
    results = await backfill.backfill_all_intervals(days_back=7)

    # Or backfill a specific interval
    count = await backfill.backfill_interval("1m", days_back=3)
```

**Usage - CLI:**

```bash
# Backfill 7 days of 1m candles for BTC
cd src/data_collector && python -m data_collector.backfill --coin BTC --days 7 --intervals 1m

# Backfill multiple intervals
python -m data_collector.backfill --coin BTC --days 30 --intervals 1m 5m 1h

# Backfill MAXIMUM available data (5000 candles per interval)
python -m data_collector.backfill --coin BTC --days max --intervals 1m 1h 1d

# Or use the convenience script
bash scripts/backfill.sh BTC 7 "1m 5m 1h"
bash scripts/backfill.sh BTC max "1m 1h 1d"  # Maximum data
```

**Data Coverage by Interval:**

- 1m candles: ~3.5 days (5000 candles)
- 1h candles: ~7 months (5000 candles)
- 1d candles: ~13.7 years (5000 candles)
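The coverage figures above follow directly from the 5000-candle cap. A quick sanity-check helper (hypothetical, not part of the codebase):

```python
# Days of history covered by the 5000-candle cap, per interval
INTERVAL_MINUTES = {"1m": 1, "1h": 60, "1d": 1440}
MAX_CANDLES = 5000


def coverage_days(interval: str) -> float:
    """Days of history that 5000 candles of the given interval cover."""
    return MAX_CANDLES * INTERVAL_MINUTES[interval] / 1440
```

This yields roughly 3.5 days for 1m candles, about 208 days (~7 months) for 1h, and 5000 days (~13.7 years) for 1d, matching the table above.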