feat: Update data ingestion and add training reports

Scripts:
- Update ingest_ohlcv_polygon.py for improved data processing

Reports:
- Add attention model training reports (2x)
- Add standard training reports (2x)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Adrian Flores Cortes 2026-01-30 12:24:53 -06:00
parent d015e2b0f3
commit dcfe83bb44
5 changed files with 602 additions and 57 deletions

View File

@ -0,0 +1,118 @@
# Attention Score Model Training Report
**Generated:** 2026-01-25 06:00:49
## Overview
The attention model learns to identify high-flow market moments using volume, volatility, and money flow indicators - WITHOUT hardcoding specific trading hours or sessions.
## Configuration
- **Symbols:** XAUUSD
- **Timeframes:** 5m
- **Training Data Cutoff:** 2026-01-20
- **Training Years:** 1.0
- **Holdout Years:** 0.1
### Model Parameters
| Parameter | Value |
|-----------|-------|
| Factor Window | 200 |
| Horizon Bars | 3 |
| Low Flow Threshold | 1.0 |
| High Flow Threshold | 2.0 |
### Features Used (9 total)
| Feature | Description |
|---------|-------------|
| volume_ratio | volume / rolling_median(volume, 20) |
| volume_z | z-score of volume over 20 periods |
| ATR | Average True Range (14 periods) |
| ATR_ratio | ATR / rolling_median(ATR, 50) |
| CMF | Chaikin Money Flow (20 periods) |
| MFI | Money Flow Index (14 periods) |
| OBV_delta | diff(OBV) / rolling_std(OBV, 20) |
| BB_width | (BB_upper - BB_lower) / close |
| displacement | (close - open) / ATR |
## Training Results
| Model | Symbol | TF | Reg MAE | Reg R2 | Clf Acc | Clf F1 | N Train | High Flow % |
|-------|--------|-----|---------|--------|---------|--------|---------|-------------|
| XAUUSD_5m_attention | XAUUSD | 5m | 0.8237 | 0.2248 | 58.35% | 56.10% | 52983 | 40.2% |
## Class Distribution (Holdout Set)
| Model | Low Flow | Medium Flow | High Flow |
|-------|----------|-------------|-----------|
| XAUUSD_5m_attention | 62 (0.9%) | 3890 (58.8%) | 2661 (40.2%) |
## Feature Importance
### XAUUSD_5m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.4268 |
| 2 | ATR | 0.1115 |
| 3 | displacement | 0.0801 |
| 4 | BB_width | 0.0801 |
| 5 | volume_ratio | 0.0776 |
| 6 | CMF | 0.0637 |
| 7 | volume_z | 0.0592 |
| 8 | MFI | 0.0524 |
| 9 | OBV_delta | 0.0486 |
## Interpretation
### Attention Score (Regression)
- **< 1.0**: Low flow period - below average market movement expected
- **1.0 - 2.0**: Medium flow period - average market conditions
- **> 2.0**: High flow period - above average movement expected (best trading opportunities)
### Flow Class (Classification)
- **0 (low_flow)**: move_multiplier < 1.0
- **1 (medium_flow)**: 1.0 <= move_multiplier < 2.0
- **2 (high_flow)**: move_multiplier >= 2.0
## Trading Recommendations
1. **Filter by attention_score**: Only trade when attention_score > 1.0
2. **Adjust position sizing**: Increase size when attention_score > 2.0
3. **Combine with base models**: Use attention_score as feature #51 in prediction models
4. **Time-agnostic**: The model identifies flow without hardcoded sessions
## Usage Example
```python
from training.attention_trainer import AttentionModelTrainer
# Load trained models
trainer = AttentionModelTrainer.load('models/attention/')
# Get attention score for new OHLCV data
attention = trainer.get_attention_score(df_ohlcv, 'XAUUSD', '5m')
# Filter trades
mask_trade = attention > 1.0 # Only trade in medium/high flow
# Or use as feature in base models
df['attention_score'] = attention
```
## Files Generated
- `models/attention/{symbol}_{timeframe}_attention/` - Model directories
- `models/attention/trainer_metadata.joblib` - Trainer configuration
- `models/attention/training_summary.csv` - Summary metrics
---
*Report generated by Attention Model Training Pipeline*

View File

@ -0,0 +1,294 @@
# Attention Score Model Training Report
**Generated:** 2026-01-25 06:09:11
## Overview
The attention model learns to identify high-flow market moments using volume, volatility, and money flow indicators - WITHOUT hardcoding specific trading hours or sessions.
## Configuration
- **Symbols:** XAUUSD, EURUSD, BTCUSD, GBPUSD, USDJPY, AUDUSD
- **Timeframes:** 5m, 15m
- **Training Data Cutoff:** 2026-01-20
- **Training Years:** 1.0
- **Holdout Years:** 0.1
### Model Parameters
| Parameter | Value |
|-----------|-------|
| Factor Window | 200 |
| Horizon Bars | 3 |
| Low Flow Threshold | 1.0 |
| High Flow Threshold | 2.0 |
### Features Used (9 total)
| Feature | Description |
|---------|-------------|
| volume_ratio | volume / rolling_median(volume, 20) |
| volume_z | z-score of volume over 20 periods |
| ATR | Average True Range (14 periods) |
| ATR_ratio | ATR / rolling_median(ATR, 50) |
| CMF | Chaikin Money Flow (20 periods) |
| MFI | Money Flow Index (14 periods) |
| OBV_delta | diff(OBV) / rolling_std(OBV, 20) |
| BB_width | (BB_upper - BB_lower) / close |
| displacement | (close - open) / ATR |
## Training Results
| Model | Symbol | TF | Reg MAE | Reg R2 | Clf Acc | Clf F1 | N Train | High Flow % |
|-------|--------|-----|---------|--------|---------|--------|---------|-------------|
| XAUUSD_5m_attention | XAUUSD | 5m | 0.8237 | 0.2248 | 58.35% | 56.10% | 52983 | 40.2% |
| XAUUSD_15m_attention | XAUUSD | 15m | 0.8585 | 0.1183 | 54.28% | 51.34% | 17744 | 41.0% |
| EURUSD_5m_attention | EURUSD | 5m | 0.4217 | 0.2472 | 63.32% | 57.12% | 55849 | 10.5% |
| EURUSD_15m_attention | EURUSD | 15m | 0.5014 | 0.1783 | 61.45% | 55.96% | 18577 | 15.0% |
| BTCUSD_5m_attention | BTCUSD | 5m | 0.5886 | 0.2960 | 62.27% | 55.79% | 79155 | 12.8% |
| BTCUSD_15m_attention | BTCUSD | 15m | 0.6936 | 0.1764 | 60.25% | 56.22% | 26330 | 17.7% |
| GBPUSD_5m_attention | GBPUSD | 5m | 0.5024 | 0.2676 | 59.39% | 56.40% | 55618 | 23.5% |
| GBPUSD_15m_attention | GBPUSD | 15m | 0.5831 | 0.2139 | 59.53% | 56.19% | 18550 | 24.6% |
| USDJPY_5m_attention | USDJPY | 5m | 0.6536 | 0.1863 | 58.56% | 56.72% | 55687 | 26.2% |
| USDJPY_15m_attention | USDJPY | 15m | 0.7212 | 0.0821 | 53.98% | 50.89% | 18567 | 26.6% |
| AUDUSD_5m_attention | AUDUSD | 5m | 0.3715 | -0.2385 | 77.20% | 70.93% | 55315 | 1.9% |
| AUDUSD_15m_attention | AUDUSD | 15m | 0.4988 | -0.1155 | 69.43% | 62.51% | 18387 | 4.6% |
## Class Distribution (Holdout Set)
| Model | Low Flow | Medium Flow | High Flow |
|-------|----------|-------------|-----------|
| XAUUSD_5m_attention | 62 (0.9%) | 3890 (58.8%) | 2661 (40.2%) |
| XAUUSD_15m_attention | 0 (0.0%) | 1312 (59.0%) | 912 (41.0%) |
| EURUSD_5m_attention | 229 (3.3%) | 5964 (86.2%) | 727 (10.5%) |
| EURUSD_15m_attention | 60 (2.6%) | 1908 (82.4%) | 347 (15.0%) |
| BTCUSD_5m_attention | 113 (1.1%) | 9053 (86.1%) | 1347 (12.8%) |
| BTCUSD_15m_attention | 287 (8.2%) | 2597 (74.1%) | 621 (17.7%) |
| GBPUSD_5m_attention | 296 (4.3%) | 4985 (72.2%) | 1621 (23.5%) |
| GBPUSD_15m_attention | 97 (4.2%) | 1648 (71.2%) | 568 (24.6%) |
| USDJPY_5m_attention | 443 (6.4%) | 4661 (67.4%) | 1810 (26.2%) |
| USDJPY_15m_attention | 63 (2.7%) | 1636 (70.7%) | 615 (26.6%) |
| AUDUSD_5m_attention | 231 (3.3%) | 6580 (94.8%) | 130 (1.9%) |
| AUDUSD_15m_attention | 30 (1.3%) | 2181 (94.1%) | 106 (4.6%) |
## Feature Importance
### XAUUSD_5m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.4268 |
| 2 | ATR | 0.1115 |
| 3 | displacement | 0.0801 |
| 4 | BB_width | 0.0801 |
| 5 | volume_ratio | 0.0776 |
| 6 | CMF | 0.0637 |
| 7 | volume_z | 0.0592 |
| 8 | MFI | 0.0524 |
| 9 | OBV_delta | 0.0486 |
### XAUUSD_15m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.2016 |
| 2 | volume_ratio | 0.1666 |
| 3 | displacement | 0.1320 |
| 4 | volume_z | 0.0976 |
| 5 | BB_width | 0.0921 |
| 6 | ATR | 0.0830 |
| 7 | MFI | 0.0786 |
| 8 | OBV_delta | 0.0763 |
| 9 | CMF | 0.0722 |
### EURUSD_5m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR | 0.3272 |
| 2 | ATR_ratio | 0.2003 |
| 3 | BB_width | 0.1141 |
| 4 | volume_z | 0.0970 |
| 5 | volume_ratio | 0.0849 |
| 6 | CMF | 0.0468 |
| 7 | displacement | 0.0468 |
| 8 | MFI | 0.0438 |
| 9 | OBV_delta | 0.0392 |
### EURUSD_15m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.2958 |
| 2 | volume_z | 0.1386 |
| 3 | volume_ratio | 0.1346 |
| 4 | ATR | 0.1167 |
| 5 | BB_width | 0.0719 |
| 6 | MFI | 0.0636 |
| 7 | CMF | 0.0615 |
| 8 | displacement | 0.0598 |
| 9 | OBV_delta | 0.0574 |
### BTCUSD_5m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.3239 |
| 2 | BB_width | 0.1286 |
| 3 | volume_ratio | 0.1037 |
| 4 | volume_z | 0.0852 |
| 5 | ATR | 0.0816 |
| 6 | CMF | 0.0793 |
| 7 | displacement | 0.0701 |
| 8 | MFI | 0.0661 |
| 9 | OBV_delta | 0.0616 |
### BTCUSD_15m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.3038 |
| 2 | volume_ratio | 0.1570 |
| 3 | BB_width | 0.0998 |
| 4 | ATR | 0.0983 |
| 5 | volume_z | 0.0901 |
| 6 | displacement | 0.0679 |
| 7 | CMF | 0.0664 |
| 8 | OBV_delta | 0.0597 |
| 9 | MFI | 0.0569 |
### GBPUSD_5m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR | 0.3587 |
| 2 | ATR_ratio | 0.1753 |
| 3 | volume_ratio | 0.1055 |
| 4 | BB_width | 0.0981 |
| 5 | volume_z | 0.0852 |
| 6 | displacement | 0.0514 |
| 7 | CMF | 0.0474 |
| 8 | OBV_delta | 0.0419 |
| 9 | MFI | 0.0365 |
### GBPUSD_15m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.3105 |
| 2 | volume_z | 0.1476 |
| 3 | volume_ratio | 0.1287 |
| 4 | ATR | 0.1145 |
| 5 | BB_width | 0.0708 |
| 6 | MFI | 0.0605 |
| 7 | CMF | 0.0590 |
| 8 | OBV_delta | 0.0587 |
| 9 | displacement | 0.0499 |
### USDJPY_5m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR_ratio | 0.3854 |
| 2 | ATR | 0.1623 |
| 3 | volume_z | 0.1100 |
| 4 | volume_ratio | 0.0971 |
| 5 | BB_width | 0.0901 |
| 6 | displacement | 0.0479 |
| 7 | OBV_delta | 0.0365 |
| 8 | MFI | 0.0359 |
| 9 | CMF | 0.0349 |
### USDJPY_15m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | volume_ratio | 0.2208 |
| 2 | volume_z | 0.2110 |
| 3 | ATR_ratio | 0.1173 |
| 4 | BB_width | 0.0934 |
| 5 | displacement | 0.0857 |
| 6 | ATR | 0.0829 |
| 7 | CMF | 0.0666 |
| 8 | OBV_delta | 0.0638 |
| 9 | MFI | 0.0585 |
### AUDUSD_5m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | ATR | 0.2316 |
| 2 | volume_ratio | 0.1677 |
| 3 | ATR_ratio | 0.1320 |
| 4 | volume_z | 0.1139 |
| 5 | MFI | 0.0923 |
| 6 | BB_width | 0.0796 |
| 7 | displacement | 0.0718 |
| 8 | CMF | 0.0717 |
| 9 | OBV_delta | 0.0394 |
### AUDUSD_15m_attention
| Rank | Feature | Combined Importance |
|------|---------|--------------------|
| 1 | volume_z | 0.1874 |
| 2 | volume_ratio | 0.1795 |
| 3 | BB_width | 0.1206 |
| 4 | ATR_ratio | 0.1140 |
| 5 | ATR | 0.0936 |
| 6 | CMF | 0.0923 |
| 7 | MFI | 0.0819 |
| 8 | displacement | 0.0779 |
| 9 | OBV_delta | 0.0529 |
## Interpretation
### Attention Score (Regression)
- **< 1.0**: Low flow period - below average market movement expected
- **1.0 - 2.0**: Medium flow period - average market conditions
- **> 2.0**: High flow period - above average movement expected (best trading opportunities)
### Flow Class (Classification)
- **0 (low_flow)**: move_multiplier < 1.0
- **1 (medium_flow)**: 1.0 <= move_multiplier < 2.0
- **2 (high_flow)**: move_multiplier >= 2.0
## Trading Recommendations
1. **Filter by attention_score**: Only trade when attention_score > 1.0
2. **Adjust position sizing**: Increase size when attention_score > 2.0
3. **Combine with base models**: Use attention_score as feature #51 in prediction models
4. **Time-agnostic**: The model identifies flow without hardcoded sessions
## Usage Example
```python
from training.attention_trainer import AttentionModelTrainer
# Load trained models
trainer = AttentionModelTrainer.load('models/attention/')
# Get attention score for new OHLCV data
attention = trainer.get_attention_score(df_ohlcv, 'XAUUSD', '5m')
# Filter trades
mask_trade = attention > 1.0 # Only trade in medium/high flow
# Or use as feature in base models
df['attention_score'] = attention
```
## Files Generated
- `models/attention/{symbol}_{timeframe}_attention/` - Model directories
- `models/attention/trainer_metadata.joblib` - Trainer configuration
- `models/attention/training_summary.csv` - Summary metrics
---
*Report generated by Attention Model Training Pipeline*

View File

@ -0,0 +1,46 @@
# Symbol-Timeframe Model Training Report
**Generated:** 2026-01-25 06:10:13
## Configuration
- **Training Data Cutoff:** 2024-12-31 (excluding 2025 for backtesting)
- **Dynamic Factor Weighting:** Enabled
- **Sample Weight Method:** Softplus with beta=4.0, w_max=3.0
## Training Results Summary
| Model | Symbol | Timeframe | Target | MAE | RMSE | R2 | Dir Accuracy | Train | Val |
|-------|--------|-----------|--------|-----|------|----|--------------| ----- | --- |
## Model Files
Models saved to: `/mnt/c/Empresas/ISEM/workspace-v2/projects/trading-platform/apps/ml-engine/models/symbol_timeframe_models`
### Model Naming Convention
- `{symbol}_{timeframe}_high_h{horizon}.joblib` - High range predictor
- `{symbol}_{timeframe}_low_h{horizon}.joblib` - Low range predictor
## Usage Example
```python
from training.symbol_timeframe_trainer import SymbolTimeframeTrainer
# Load trained models
trainer = SymbolTimeframeTrainer()
trainer.load('models/symbol_timeframe_models/')
# Predict for XAUUSD 15m
predictions = trainer.predict(features, 'XAUUSD', '15m')
print(f"Predicted High: {predictions['high']}")
print(f"Predicted Low: {predictions['low']}")
```
## Notes
1. Models exclude 2025 data for out-of-sample backtesting
2. Dynamic factor weighting emphasizes high-movement samples
3. Separate models for HIGH and LOW predictions per symbol/timeframe
---
*Report generated by Symbol-Timeframe Training Pipeline*

View File

@ -0,0 +1,46 @@
# Symbol-Timeframe Model Training Report
**Generated:** 2026-01-25 06:10:49
## Configuration
- **Training Data Cutoff:** 2024-12-31 (excluding 2025 for backtesting)
- **Dynamic Factor Weighting:** Enabled
- **Sample Weight Method:** Softplus with beta=4.0, w_max=3.0
## Training Results Summary
| Model | Symbol | Timeframe | Target | MAE | RMSE | R2 | Dir Accuracy | Train | Val |
|-------|--------|-----------|--------|-----|------|----|--------------| ----- | --- |
## Model Files
Models saved to: `/mnt/c/Empresas/ISEM/workspace-v2/projects/trading-platform/apps/ml-engine/models/symbol_timeframe_models`
### Model Naming Convention
- `{symbol}_{timeframe}_high_h{horizon}.joblib` - High range predictor
- `{symbol}_{timeframe}_low_h{horizon}.joblib` - Low range predictor
## Usage Example
```python
from training.symbol_timeframe_trainer import SymbolTimeframeTrainer
# Load trained models
trainer = SymbolTimeframeTrainer()
trainer.load('models/symbol_timeframe_models/')
# Predict for XAUUSD 15m
predictions = trainer.predict(features, 'XAUUSD', '15m')
print(f"Predicted High: {predictions['high']}")
print(f"Predicted Low: {predictions['low']}")
```
## Notes
1. Models exclude 2025 data for out-of-sample backtesting
2. Dynamic factor weighting emphasizes high-movement samples
3. Separate models for HIGH and LOW predictions per symbol/timeframe
---
*Report generated by Symbol-Timeframe Training Pipeline*

View File

@ -61,12 +61,15 @@ SYMBOL_CONFIG = {
}
# PostgreSQL connection (matches trading_platform config)
# Note: sslmode=disable for local WSL connections to avoid SSL timeout issues
PG_CONFIG = {
"host": os.getenv("DB_HOST", "localhost"),
"port": int(os.getenv("DB_PORT", "5432")),
"dbname": os.getenv("DB_NAME", "trading_platform"),
"user": os.getenv("DB_USER", "trading_user"),
"password": os.getenv("DB_PASSWORD", "trading_dev_2026"),
"sslmode": os.getenv("DB_SSLMODE", "disable"),
"connect_timeout": 30,
}
@ -198,13 +201,12 @@ def get_last_timestamp(conn, ticker_id: int) -> datetime:
return row[0] if row and row[0] else None
def insert_bars_to_postgres(conn, ticker_id: int, bars: list) -> int:
def insert_bars_to_postgres(ticker_id: int, bars: list) -> int:
"""
Insert OHLCV bars into PostgreSQL market_data.ohlcv_5m.
Uses ON CONFLICT for upsert behavior.
Uses ON CONFLICT for upsert behavior. Manages its own connection.
Args:
conn: psycopg2 connection
ticker_id: Ticker ID from market_data.tickers
bars: List of Polygon API bar dicts
@ -244,16 +246,43 @@ def insert_bars_to_postgres(conn, ticker_id: int, bars: list) -> int:
ts_ms, # ts_epoch in ms
))
# Insert in batches
batch_size = 5000
# Insert in smaller batches with fresh connection per batch group
batch_size = 500 # Small batches for reliability
total_inserted = 0
conn = None
with conn.cursor() as cur:
try:
conn = psycopg2.connect(**PG_CONFIG)
for i in range(0, len(rows), batch_size):
batch = rows[i : i + batch_size]
psycopg2.extras.execute_batch(cur, insert_sql, batch, page_size=1000)
conn.commit()
total_inserted += len(batch)
retries = 3
while retries > 0:
try:
with conn.cursor() as cur:
psycopg2.extras.execute_batch(cur, insert_sql, batch, page_size=100)
conn.commit()
total_inserted += len(batch)
break
except psycopg2.OperationalError as e:
retries -= 1
logger.warning(f" DB error ({3 - retries}/3): {str(e)[:50]}")
try:
conn.close()
except Exception:
pass
conn = psycopg2.connect(**PG_CONFIG)
if retries == 0:
raise
# Progress every 10 batches
batch_num = i // batch_size + 1
if batch_num % 20 == 0:
logger.info(f" Progress: {total_inserted:,} / {len(rows):,} rows")
finally:
if conn:
try:
conn.close()
except Exception:
pass
return total_inserted
@ -264,7 +293,6 @@ def insert_bars_to_postgres(conn, ticker_id: int, bars: list) -> int:
async def ingest_symbol(
api_key: str,
conn,
symbol: str,
start_date: datetime,
end_date: datetime,
@ -272,6 +300,7 @@ async def ingest_symbol(
) -> dict:
"""
Full ingestion pipeline for a single symbol.
Uses fresh DB connection for each symbol to avoid stale connections.
Returns:
Dict with results summary
@ -283,55 +312,69 @@ async def ingest_symbol(
polygon_symbol = config["polygon"]
# Get ticker ID from DB
# Fresh connection for this symbol
try:
ticker_id = get_ticker_id(conn, symbol)
except ValueError as e:
logger.error(str(e))
conn = psycopg2.connect(**PG_CONFIG)
except Exception as e:
logger.error(f"PostgreSQL connection failed: {e}")
return {"symbol": symbol, "status": "error", "error": str(e)}
# In incremental mode, start from last timestamp + 5 min
actual_start = start_date
if incremental:
last_ts = get_last_timestamp(conn, ticker_id)
if last_ts:
actual_start = last_ts + timedelta(minutes=5)
logger.info(f" Incremental: continuing from {actual_start}")
try:
# Get ticker ID from DB
try:
ticker_id = get_ticker_id(conn, symbol)
except ValueError as e:
logger.error(str(e))
return {"symbol": symbol, "status": "error", "error": str(e)}
if actual_start >= end_date:
logger.info(f" {symbol}: already up to date")
return {"symbol": symbol, "status": "up_to_date", "rows": 0}
# In incremental mode, start from last timestamp + 5 min
actual_start = start_date
if incremental:
last_ts = get_last_timestamp(conn, ticker_id)
if last_ts:
actual_start = last_ts + timedelta(minutes=5)
logger.info(f" Incremental: continuing from {actual_start}")
# Fetch from Polygon
logger.info(f"Fetching {symbol} ({polygon_symbol}) from {actual_start.date()} to {end_date.date()}...")
bars = await fetch_polygon_bars(
api_key=api_key,
polygon_symbol=polygon_symbol,
start_date=actual_start,
end_date=end_date,
)
if actual_start >= end_date:
logger.info(f" {symbol}: already up to date")
return {"symbol": symbol, "status": "up_to_date", "rows": 0}
if not bars:
logger.warning(f" {symbol}: no data received from API")
return {"symbol": symbol, "status": "no_data", "rows": 0}
# Fetch from Polygon
logger.info(f"Fetching {symbol} ({polygon_symbol}) from {actual_start.date()} to {end_date.date()}...")
bars = await fetch_polygon_bars(
api_key=api_key,
polygon_symbol=polygon_symbol,
start_date=actual_start,
end_date=end_date,
)
# Show range
first_ts = datetime.fromtimestamp(bars[0]["t"] / 1000)
last_ts = datetime.fromtimestamp(bars[-1]["t"] / 1000)
logger.info(f" {symbol}: {len(bars)} bars from {first_ts} to {last_ts}")
if not bars:
logger.warning(f" {symbol}: no data received from API")
return {"symbol": symbol, "status": "no_data", "rows": 0}
# Insert into PostgreSQL
logger.info(f" Inserting {len(bars)} bars into market_data.ohlcv_5m...")
inserted = insert_bars_to_postgres(conn, ticker_id, bars)
logger.info(f" {symbol}: {inserted} rows inserted/updated")
# Show range
first_ts = datetime.fromtimestamp(bars[0]["t"] / 1000)
last_ts = datetime.fromtimestamp(bars[-1]["t"] / 1000)
logger.info(f" {symbol}: {len(bars)} bars from {first_ts} to {last_ts}")
return {
"symbol": symbol,
"status": "success",
"rows": inserted,
"first_bar": str(first_ts),
"last_bar": str(last_ts),
}
# Insert into PostgreSQL (uses its own connection)
conn.close() # Close read connection before bulk insert
logger.info(f" Inserting {len(bars)} bars into market_data.ohlcv_5m...")
inserted = insert_bars_to_postgres(ticker_id, bars)
logger.info(f" {symbol}: {inserted} rows inserted/updated")
return {
"symbol": symbol,
"status": "success",
"rows": inserted,
"first_bar": str(first_ts),
"last_bar": str(last_ts),
}
finally:
try:
conn.close()
except Exception:
pass
async def main():
@ -385,15 +428,16 @@ async def main():
logger.info(f"Incremental: {args.incremental}")
logger.info(f"Database: {PG_CONFIG['host']}:{PG_CONFIG['port']}/{PG_CONFIG['dbname']}")
# Connect to PostgreSQL
# Test PostgreSQL connectivity
try:
conn = psycopg2.connect(**PG_CONFIG)
logger.info("PostgreSQL connected")
test_conn = psycopg2.connect(**PG_CONFIG)
test_conn.close()
logger.info("PostgreSQL connectivity verified")
except Exception as e:
logger.error(f"PostgreSQL connection failed: {e}")
sys.exit(1)
# Process each symbol
# Process each symbol (each gets its own connection)
results = []
for symbol in args.symbols:
logger.info(f"\n{'='*40}")
@ -402,7 +446,6 @@ async def main():
result = await ingest_symbol(
api_key=api_key,
conn=conn,
symbol=symbol,
start_date=start_date,
end_date=end_date,
@ -410,8 +453,6 @@ async def main():
)
results.append(result)
conn.close()
# Print summary
logger.info("\n" + "=" * 60)
logger.info("INGESTION SUMMARY")