---
id: "MAP-IAI-007"
title: "EPIC IAI-007 Webscraper Map"
type: "Navigation Map"
epic: "IAI-007"
project: "inmobiliaria-analytics"
created_date: "2026-01-04"
updated_date: "2026-01-04"
---

# _MAP: EPIC IAI-007 - Web Scraping and ETL

**EPIC:** IAI-007
**Name:** Web Scraping and ETL System
**Status:** Draft
**Story Points:** 55 (estimated)

---

## EPIC Structure

```
IAI-007-webscraper/
├── _MAP.md                          # This file
├── README.md                        # EPIC overview
│
├── requerimientos/
│   ├── _MAP.md
│   ├── RF-SCR-001.md                # Scraping engine
│   ├── RF-SCR-002.md                # Proxy management
│   ├── RF-SCR-003.md                # ETL pipeline
│   ├── RF-SCR-004.md                # Scheduling and jobs
│   └── RF-SCR-005.md                # Monitoring
│
├── especificaciones/
│   ├── _MAP.md
│   ├── ET-SCR-001-scraper.md        # Playwright scraping engine
│   ├── ET-SCR-002-etl.md            # ETL pipeline and normalization
│   └── ET-SCR-003-proxies.md        # Proxy pool management
│
├── historias-usuario/
│   ├── _MAP.md
│   ├── US-SCR-001.md                # Inmuebles24 scraping
│   ├── US-SCR-002.md                # Vivanuncios scraping
│   ├── US-SCR-003.md                # Data normalization
│   ├── US-SCR-004.md                # Job scheduling
│   └── US-SCR-005.md                # Monitoring dashboard
│
├── tareas/
│   └── _MAP.md
│
└── implementacion/
    ├── _MAP.md
    ├── CHANGELOG.md                 # Change history
    └── TRACEABILITY.yml             # Traceability
```
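
As a rough aid for keeping this map in sync with the repository, the tree above can be checked mechanically. The following is a minimal sketch, not part of the EPIC's tooling: the helper name `missing_paths` and the abbreviated `EXPECTED` list are assumptions made for illustration.

```python
from pathlib import Path

# Abbreviated, hypothetical subset of the tree above; extend as needed.
EXPECTED = [
    "_MAP.md",
    "README.md",
    "requerimientos/_MAP.md",
    "especificaciones/_MAP.md",
    "historias-usuario/_MAP.md",
    "tareas/_MAP.md",
    "implementacion/_MAP.md",
    "implementacion/CHANGELOG.md",
    "implementacion/TRACEABILITY.yml",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist as files under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_file()]


if __name__ == "__main__":
    # Run from the directory that contains IAI-007-webscraper/.
    for rel in missing_paths("IAI-007-webscraper"):
        print(f"missing: {rel}")
```

An empty result means every listed file is present; anything printed points to a file that the map mentions but the repository lacks.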

---

## Functional Requirements

| ID | Name | Priority | Status |
|----|------|----------|--------|
| RF-SCR-001 | Scraping engine with anti-detection | High | Pending |
| RF-SCR-002 | Proxy pool management | High | Pending |
| RF-SCR-003 | ETL pipeline and normalization | High | Pending |
| RF-SCR-004 | Scheduling and job management | Medium | Pending |
| RF-SCR-005 | Monitoring and alerts | Medium | Pending |

---

## User Stories

| ID | Title | SP | Priority | Sprint |
|----|-------|----|----------|--------|
| US-SCR-001 | Scrape properties from Inmuebles24 | 13 | High | - |
| US-SCR-002 | Scrape properties from Vivanuncios | 8 | High | - |
| US-SCR-003 | Normalize data from multiple sources | 8 | High | - |
| US-SCR-004 | Schedule update jobs | 5 | Medium | - |
| US-SCR-005 | Monitor scraping status | 5 | Medium | - |

**Total Story Points:** 39
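
The total can be checked directly from the table: 13 + 8 + 8 + 5 + 5 = 39. The short sketch below repeats that arithmetic and compares it with the 55-point estimate in the EPIC header. The variable names are illustrative only, and this map does not say how the remaining points are allocated.

```python
# Story points per user story, copied from the table above.
story_points = {
    "US-SCR-001": 13,
    "US-SCR-002": 8,
    "US-SCR-003": 8,
    "US-SCR-004": 5,
    "US-SCR-005": 5,
}

EPIC_ESTIMATE = 55  # from the Story Points field in the EPIC header

total = sum(story_points.values())
difference = EPIC_ESTIMATE - total

print(f"stories: {total}, epic estimate: {EPIC_ESTIMATE}, difference: {difference}")
# stories: 39, epic estimate: 55, difference: 16
```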

---

## Dependencies

### Depends on:
- IAI-001: Foundations (authentication for the internal API)
- Infrastructure: Redis, PostgreSQL

### Blocks:
- IAI-002: Properties (needs data)
- IAI-008: ML/Analytics (needs data)

---

## Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Blocking by Cloudflare | High | High | Anti-detection, residential proxies |
| Changes in HTML structure | Medium | Medium | Flexible selectors, alerts |
| Proxy costs | Medium | Low | Low-cost providers, optimization |
| Legal issues | Low | High | Comply with robots.txt, aggregate data |
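
For the robots.txt mitigation, Python's standard library already includes a parser. The following is a minimal sketch using `urllib.robotparser`; the rules, the user agent string, and the URLs are invented for illustration and do not reflect any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Invented rules; a real scraper would fetch the target site's own /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

USER_AGENT = "iai-scraper"  # hypothetical user agent name


def allowed(url: str) -> bool:
    """Return True if the sample rules permit USER_AGENT to fetch url."""
    return parser.can_fetch(USER_AGENT, url)


print(allowed("https://example.com/listings/123"))  # True
print(allowed("https://example.com/admin/users"))   # False
```

Calling a check like this before each request keeps the compliance decision in one place rather than scattered across individual scrapers.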

---

## Success Metrics

- [ ] 10,000 properties scraped in the first week
- [ ] Success rate > 85%
- [ ] Normalization time < 1 s per property
- [ ] Zero permanent blocks
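
These targets can be expressed as simple threshold checks. The sketch below is one possible encoding, assuming hypothetical run statistics: `RunStats`, its field names, and the definition of success rate as successful requests over total requests are not defined anywhere in the EPIC. Only the thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical first-week aggregate; field names are assumptions."""
    properties_scraped: int
    requests_ok: int
    requests_total: int
    avg_normalization_seconds: float
    permanent_blocks: int


def evaluate(stats: RunStats) -> dict:
    """Map each success metric to True/False using the thresholds above."""
    # Assumed definition: successful requests divided by total requests.
    rate = stats.requests_ok / stats.requests_total if stats.requests_total else 0.0
    return {
        "10,000 properties in first week": stats.properties_scraped >= 10_000,
        "success rate > 85%": rate > 0.85,
        "normalization < 1 s per property": stats.avg_normalization_seconds < 1.0,
        "zero permanent blocks": stats.permanent_blocks == 0,
    }


# Made-up numbers, for illustration only.
sample = RunStats(12_500, 9_000, 10_000, 0.4, 0)
for name, ok in evaluate(sample).items():
    print("[x]" if ok else "[ ]", name)
```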

---

## Technical Specifications

| ID | Title | Status | Main Content |
|----|-------|--------|--------------|
| ET-SCR-001 | Scraping Engine | Created | Playwright, stealth mode, BrowserManager |
| ET-SCR-002 | ETL Pipeline | Created | Extractors, normalization, geocoding, dedup |
| ET-SCR-003 | Proxy Management | Created | Pool manager, rotation, health checks |
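
To make the ET-SCR-003 summary more concrete, here is a minimal sketch of round-robin rotation with failure-based health tracking. It is not the design in the specification, which this map only summarizes; the class name `ProxyPool`, the `max_failures` threshold, and the proxy URLs are assumptions.

```python
from itertools import cycle


class ProxyPool:
    """Round-robin proxy rotation that skips proxies marked unhealthy."""

    def __init__(self, proxies, max_failures=3):
        self._failures = {p: 0 for p in proxies}
        self._max_failures = max_failures
        self._ring = cycle(list(proxies))

    def is_healthy(self, proxy):
        """A proxy is healthy while its consecutive failures stay below the limit."""
        return self._failures[proxy] < self._max_failures

    def next_proxy(self):
        """Return the next healthy proxy, or raise if none remain."""
        for _ in range(len(self._failures)):
            proxy = next(self._ring)
            if self.is_healthy(proxy):
                return proxy
        raise RuntimeError("no healthy proxies available")

    def report(self, proxy, ok):
        """Record a request outcome; a success resets the failure count."""
        self._failures[proxy] = 0 if ok else self._failures[proxy] + 1


pool = ProxyPool(["http://proxy-a:8080", "http://proxy-b:8080"], max_failures=1)
pool.report("http://proxy-a:8080", ok=False)  # proxy-a is now unhealthy
print(pool.next_proxy())  # http://proxy-b:8080
```

Resetting the counter on success means a proxy is excluded only after consecutive failures, which is one simple policy among several a real pool manager might use.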

---

**Last updated:** 2026-01-04