---
id: "RF-SCR-005"
title: "Monitoring and Alerts"
type: "Functional Requirement"
epic: "IAI-007"
priority: "Medium"
status: "Draft"
project: "inmobiliaria-analytics"
created_date: "2026-01-04"
updated_date: "2026-01-04"
---

# RF-IA-007-005: Monitoring and Alerts

---
## Description

The system must provide real-time monitoring of scraping status, performance metrics, anomaly detection, and automatic alerts when problems are detected.

---
## Justification

Scraping is a fragile process that can fail for many reasons (HTML changes, blocking, network errors). Proactive monitoring makes it possible to detect and resolve problems quickly, before they affect data quality.

---
## Functional Requirements

### RF-005.1: Metrics

| ID | Requirement | Priority |
|----|-------------|----------|
| RF-005.1.1 | The system must record properties scraped per source and hour | High |
| RF-005.1.2 | The system must calculate the success rate per source and proxy | High |
| RF-005.1.3 | The system must measure average latency per request | High |
| RF-005.1.4 | The system must count errors by type and source | High |
| RF-005.1.5 | The system must track the state of the proxy pool | High |

### RF-005.2: Dashboard

| ID | Requirement | Priority |
|----|-------------|----------|
| RF-005.2.1 | The system must show the current status of jobs | High |
| RF-005.2.2 | The system must display metrics in real time | High |
| RF-005.2.3 | The system must show the execution history | Medium |
| RF-005.2.4 | The system must allow drill-down by source and job | Medium |

### RF-005.3: Alerts

| ID | Requirement | Priority |
|----|-------------|----------|
| RF-005.3.1 | The system must alert when the success rate is below 80% | High |
| RF-005.3.2 | The system must alert when a job fails | High |
| RF-005.3.3 | The system must alert when the proxy pool falls below the configured threshold | High |
| RF-005.3.4 | The system must alert when it detects a change in HTML structure | Medium |
| RF-005.3.5 | The system must support these channels: email, Slack, webhook | Medium |

### RF-005.4: Logs

| ID | Requirement | Priority |
|----|-------------|----------|
| RF-005.4.1 | The system must write structured logs (JSON) | High |
| RF-005.4.2 | The system must include a correlation ID per job | High |
| RF-005.4.3 | The system must allow the log level to be adjusted | Medium |
| RF-005.4.4 | The system must retain logs for at least 30 days | Medium |
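
RF-005.4.1 and RF-005.4.2 can be met with the Python standard library alone. The following is a minimal sketch, not the project's logger: the `JsonFormatter` class, the `job_logger` helper, the JSON field names, and the job ID `job-123` are all assumptions made for illustration.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set by the LoggerAdapter below; None if logged without one.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def job_logger(job_id: str, level: int = logging.INFO) -> logging.LoggerAdapter:
    """Return a logger that stamps every record with one correlation ID."""
    logger = logging.getLogger(f"scraper.job.{job_id}")
    logger.setLevel(level)  # adjustable log level (RF-005.4.3)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logging.LoggerAdapter(logger, {"correlation_id": str(uuid.uuid4())})


log = job_logger("job-123")  # hypothetical job ID
log.info("job started")
log.warning("retrying request")
```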

---

## Defined Metrics

```yaml
metricas:
  counters:
    - scraper_properties_total:
        labels: [source, type, status]
        description: "Total properties processed"

    - scraper_requests_total:
        labels: [source, status_code]
        description: "Total HTTP requests"

    - scraper_errors_total:
        labels: [source, error_type]
        description: "Total errors by type"

  gauges:
    - scraper_active_jobs:
        labels: [source]
        description: "Currently active jobs"

    - scraper_proxy_pool_size:
        labels: [status]
        description: "Proxies by status"

    - scraper_queue_size:
        description: "Tasks pending in the queue"

  histograms:
    - scraper_request_duration_seconds:
        labels: [source]
        buckets: [0.1, 0.5, 1, 2, 5, 10]
        description: "Request duration"

    - scraper_job_duration_seconds:
        labels: [source, type]
        description: "Total job duration"
```
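
Both the alert rules and the dashboard refer to a success rate, which is not among the metrics defined above. One way to derive it is from `scraper_requests_total`, as sketched below. The sketch assumes that a 2xx status code counts as success and uses an in-memory `Counter` in place of a real metrics backend; `record_request`, `success_rate`, and the source name `portal_a` are hypothetical.

```python
from collections import Counter
from typing import Optional

# In-memory stand-in for scraper_requests_total{source, status_code}.
requests_total: Counter = Counter()


def record_request(source: str, status_code: int) -> None:
    """Increment the counter for one finished HTTP request."""
    requests_total[(source, status_code)] += 1


def success_rate(source: str) -> Optional[float]:
    """Fraction of requests for `source` that returned 2xx, or None if there were none."""
    total = sum(n for (src, _), n in requests_total.items() if src == source)
    if total == 0:
        return None
    ok = sum(
        n
        for (src, code), n in requests_total.items()
        if src == source and 200 <= code < 300  # assumption: 2xx means success
    )
    return ok / total


for code in (200, 200, 200, 403, 500):
    record_request("portal_a", code)

print(success_rate("portal_a"))  # 0.6, which is below the 0.8 alert threshold
```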

---

## Alert Configuration

```yaml
alerts:
  success_rate_low:
    condition: "scraper_success_rate < 0.8"
    duration: "5m"
    severity: warning
    channels: [slack, email]
    message: "Low success rate on {source}: {value}%"

  job_failed:
    condition: "scraper_job_status == 'failed'"
    severity: critical
    channels: [slack, email, pagerduty]
    message: "Job failed: {job_id} on {source}"

  proxy_pool_low:
    condition: "scraper_proxy_pool_size{status='active'} < 20"
    duration: "10m"
    severity: warning
    channels: [slack]
    message: "Proxy pool low: {value} active"

  no_data:
    # increase() over the window: a counter's running total never returns to 0
    condition: "increase(scraper_properties_total[1h]) == 0"
    duration: "1h"
    severity: critical
    channels: [slack, email]
    message: "No properties scraped in the last hour"

  html_change_detected:
    condition: "scraper_selector_failures > 10"
    duration: "15m"
    severity: warning
    channels: [slack]
    message: "Possible change in HTML structure on {source}"
```
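
One plausible reading of the `duration` field, matching the `for` clause of Prometheus alerting rules, is that the condition must hold continuously for that long before the alert fires. A minimal sketch of that behavior follows, assuming samples arrive with explicit timestamps in seconds; `AlertRule`, `notify`, the `senders` mapping, and the source name `portal_a` are hypothetical and not part of any alerting library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlertRule:
    name: str
    condition: Callable[[float], bool]  # e.g. lambda v: v < 0.8
    duration_s: float                   # how long the condition must hold
    channels: list[str]
    pending_since: Optional[float] = None
    firing: bool = False

    def evaluate(self, value: float, now: float) -> bool:
        """Feed one sample; return True only on the transition to firing."""
        if not self.condition(value):
            # Condition cleared: reset the pending timer and the firing state.
            self.pending_since = None
            self.firing = False
            return False
        if self.pending_since is None:
            self.pending_since = now
        if not self.firing and now - self.pending_since >= self.duration_s:
            self.firing = True
            return True
        return False


def notify(rule: AlertRule, message: str, senders: dict[str, Callable[[str], None]]) -> None:
    """Send the message to each of the rule's channels that has a sender."""
    for channel in rule.channels:
        if channel in senders:
            senders[channel](message)


rule = AlertRule("success_rate_low", lambda v: v < 0.8, duration_s=300, channels=["slack", "email"])
sent: list[str] = []
senders = {"slack": sent.append}  # stand-in; a real sender would call the channel's API

# The rate stays below 0.8 for 300 s, so the alert fires on the third sample.
for t, value in [(0, 0.75), (120, 0.70), (300, 0.72)]:
    if rule.evaluate(value, now=t):
        notify(rule, f"Low success rate on portal_a: {value}", senders)

print(sent)  # ['Low success rate on portal_a: 0.72']
```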

---

## Dashboard Widgets

```yaml
dashboard:
  row_1:
    - widget: "stat"
      title: "Properties Today"
      # last 24 hours; the raw counter sum would be cumulative since startup
      metric: "sum(increase(scraper_properties_total{status='success'}[24h]))"

    - widget: "stat"
      title: "Success Rate"
      metric: "scraper_success_rate * 100"
      format: "percent"

    - widget: "stat"
      title: "Active Jobs"
      metric: "scraper_active_jobs"

    - widget: "stat"
      title: "Active Proxies"
      metric: "scraper_proxy_pool_size{status='active'}"

  row_2:
    - widget: "timeseries"
      title: "Properties per Hour"
      # increase() yields the count per hour; rate() would give a per-second average
      metric: "increase(scraper_properties_total[1h])"
      group_by: source

    - widget: "timeseries"
      title: "Success Rate"
      metric: "scraper_success_rate"
      group_by: source

  row_3:
    - widget: "table"
      title: "Recent Jobs"
      query: "SELECT * FROM scraping_jobs ORDER BY created_at DESC LIMIT 10"

    - widget: "piechart"
      title: "Errors by Type"
      metric: "scraper_errors_total"
      group_by: error_type
```
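
The properties-per-hour panel needs the number of properties added within each hour. For an ever-increasing counter, that is the sum of the differences between consecutive samples, with counter resets taken into account. The sketch below illustrates the idea in simplified form: Prometheus's own `increase()` additionally extrapolates to the window boundaries, which is omitted here, and the sample values are made up.

```python
def increase(samples: list[tuple[float, float]]) -> float:
    """Total growth of a counter across (timestamp, value) samples.

    A drop between consecutive samples is treated as a counter reset
    (for example a scraper restart), so the new value is counted from zero.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


# Hypothetical samples of scraper_properties_total over one hour: (seconds, value).
hour = [(0, 1000.0), (900, 1120.0), (1800, 1250.0), (2700, 40.0), (3600, 160.0)]

per_hour = increase(hour)      # 120 + 130 + 40 (after reset) + 120
per_second = per_hour / 3600   # the per-second average a rate-style query reports
print(per_hour)  # 410.0
```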

---

## Acceptance Criteria

- [ ] Metrics are recorded correctly in Prometheus or a similar system
- [ ] The dashboard shows data in real time
- [ ] Alerts fire within the configured duration
- [ ] Notifications reach the configured channels
- [ ] Logs are structured and contain correlation IDs
- [ ] Metric history is available for 30+ days

---
## Dependencies

- Prometheus or similar for metrics
- Grafana or similar for the dashboard
- Alertmanager or similar for alerts
- ELK Stack or similar for logs

---
## Related User Stories

- US-SCR-005: Monitoring dashboard

---
**Author:** Tech Lead
**Date:** 2026-01-04