# DevOps Documentation - ERP Generic

**Last Updated:** 2025-11-24

**Owner:** DevOps Team

**Status:** ✅ Production-Ready


---

## 1. OVERVIEW

This folder contains all the DevOps documentation needed to deploy, monitor, maintain, and secure ERP Generic in production environments.

ERP Generic is a modular system with:

- **14 modules** (MGN-001 to MGN-014)
- **Stack:** NestJS 10 + Prisma 5 + PostgreSQL 16 + Redis 7 + React 18 + TypeScript 5
- **Multi-tenancy:** Schema-level isolation + Row-Level Security (RLS)
- **9 PostgreSQL schemas:** auth, core, financial, inventory, purchase, sales, analytics, projects, system
- **Architecture:** Microservices-ready, cloud-native, container-based

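
The nine schemas above can be created idempotently when the database is first initialized. The following is a minimal sketch that only generates the SQL. The helper name `schema_init_sql` is hypothetical, and tenant Row-Level Security policies are not covered:

```bash
#!/usr/bin/env bash
# Sketch: generate idempotent CREATE SCHEMA statements for the nine schemas
# listed in this README. The helper name is hypothetical.
set -euo pipefail

SCHEMAS=(auth core financial inventory purchase sales analytics projects system)

# Print one CREATE SCHEMA IF NOT EXISTS statement per schema.
schema_init_sql() {
  local s
  for s in "${SCHEMAS[@]}"; do
    printf 'CREATE SCHEMA IF NOT EXISTS %s;\n' "$s"
  done
}
```

To apply the statements, the output could be piped to `psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f -`, where `DATABASE_URL` is an assumed variable. Because of `IF NOT EXISTS`, re-runs are harmless.
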

---

## 2. MAIN DOCUMENTS

### 2.1 [DEPLOYMENT-GUIDE.md](./DEPLOYMENT-GUIDE.md)

**Purpose:** Complete deployment guide for all environments.

**Contents:**

- Complete Docker setup (Dockerfile + docker-compose.yml)
- PostgreSQL 16 initialization (9 schemas)
- Redis configuration
- Environment variable management
- Multi-environment deployment strategy (Dev, QA, Staging, Production)
- Zero-downtime deployment (blue-green)
- Rollback procedures

**Audience:** DevOps Engineers, SREs, Infrastructure Team

**Implementation Time:** 4-6 hours the first time; 15-30 minutes for subsequent deployments

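
Blue-green deployment keeps two identical stacks and moves traffic only after the idle one passes its health check. The decision logic can be sketched as below. The color names, function names, retry count, and `HEALTH_INTERVAL` are assumptions. The actual traffic switch depends on the proxy or load balancer in use, so it is only marked with a comment:

```bash
#!/usr/bin/env bash
# Sketch of a blue-green switch decision. Hypothetical: color names, function
# names, retry count and HEALTH_INTERVAL. The real traffic switch (proxy or
# load-balancer change) is only marked with a comment.
set -euo pipefail

# Print the idle color, given the color currently serving traffic.
idle_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# wait_healthy RETRIES CMD...: retry CMD until it succeeds or RETRIES runs out.
wait_healthy() {
  local retries="$1" i
  shift
  for ((i = 1; i <= retries; i++)); do
    if "$@"; then
      return 0
    fi
    if ((i < retries)); then
      sleep "${HEALTH_INTERVAL:-5}"
    fi
  done
  return 1
}

# choose_active ACTIVE CMD...: after deploying to the idle stack, print the
# color that should serve traffic. CMD is the idle stack's health check.
choose_active() {
  local active="$1" target
  shift
  target=$(idle_color "$active") || return 1
  if wait_healthy 3 "$@"; then
    echo "$target"   # healthy: switch traffic to $target here
  else
    echo "$active"   # unhealthy: keep traffic on $active (implicit rollback)
  fi
}
```

If the idle stack never becomes healthy, traffic stays where it was, which is what makes the strategy safe to roll back.
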

---

### 2.2 [MONITORING-OBSERVABILITY.md](./MONITORING-OBSERVABILITY.md)

**Purpose:** Complete monitoring and observability strategy.

**Contents:**

- Prometheus setup (metrics collection)
- Grafana dashboards (Application, Database, Business)
- Alert rules (CPU, memory, DB connections, error rate)
- Logging strategy (Winston + ELK/Loki)
- Application Performance Monitoring (APM)
- Health check endpoints
- Distributed tracing (OpenTelemetry)

**Audience:** DevOps Engineers, SREs, On-call Engineers

**Implementation Time:** 6-8 hours initial setup


---

### 2.3 [BACKUP-RECOVERY.md](./BACKUP-RECOVERY.md)

**Purpose:** Backup and disaster recovery procedures.

**Contents:**

- Backup strategy (full + incremental)
- Automated backup scripts (PostgreSQL, multi-schema)
- Multi-tenant backup isolation
- Retention policies (7 daily + 4 weekly + 12 monthly)
- Point-in-Time Recovery (PITR)
- Disaster recovery playbook (RTO 4h, RPO 15 min)
- Backup testing procedures

**Audience:** DevOps Engineers, DBAs, Security Team

**Implementation Time:** 4-5 hours initial setup, plus monthly testing

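
The 7 daily + 4 weekly + 12 monthly policy is a grandfather-father-son scheme. The following minimal sketch shows a keep/delete decision for a single backup. It assumes that weekly backups are the ones taken on Sundays, that monthly backups are the ones taken on the 1st, and that GNU `date` is available. The name `keep_backup` is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch of a grandfather-father-son retention rule (7 daily, 4 weekly,
# 12 monthly). Assumptions: weekly = Sunday backups, monthly = backups taken
# on the 1st, dates in YYYY-MM-DD form, GNU date available.
set -euo pipefail

# keep_backup BACKUP_DATE TODAY: exit 0 if the backup should be kept.
keep_backup() {
  local backup_date="$1" today="$2" age_days dow dom
  # Whole days between the dates, both taken at UTC midnight (no DST drift).
  age_days=$(( ($(date -u -d "$today" +%s) - $(date -u -d "$backup_date" +%s)) / 86400 ))
  dow=$(date -u -d "$backup_date" +%u)           # 7 = Sunday
  dom=$((10#$(date -u -d "$backup_date" +%d)))   # day of month, no leading zero

  if ((age_days < 7)); then return 0; fi                # daily tier
  if ((dow == 7 && age_days < 28)); then return 0; fi   # weekly tier, 4 weeks
  if ((dom == 1 && age_days < 365)); then return 0; fi  # monthly tier, ~12 months
  return 1
}
```

A cleanup job could call `keep_backup` for each dated dump and delete only the ones for which it fails.
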

---

### 2.4 [SECURITY-HARDENING.md](./SECURITY-HARDENING.md)

**Purpose:** Complete security hardening of the system.

**Contents:**

- OWASP Top 10 mitigations
- Rate limiting configuration
- JWT security (rotation, expiration, refresh tokens)
- SQL injection prevention
- XSS/CSRF protection
- CORS configuration
- Security headers (Helmet.js)
- Secrets management (Vault/AWS Secrets Manager)
- SSL/TLS certificate management

**Audience:** Security Team, DevOps Engineers, Backend Developers

**Implementation Time:** 8-10 hours for full implementation

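
A quick way to confirm that the Helmet.js configuration actually reaches clients is to inspect the response headers. The sketch below checks an assumed subset of the headers that Helmet sets by default. The function name and header list are illustrative and do not form a complete policy:

```bash
#!/usr/bin/env bash
# Sketch: report security headers missing from an HTTP response.
# REQUIRED_HEADERS is an assumed subset of the headers Helmet sets by default.
set -euo pipefail

REQUIRED_HEADERS=(
  strict-transport-security
  x-content-type-options
  x-frame-options
  content-security-policy
  referrer-policy
)

# Reads raw response headers on stdin, prints each missing header name
# (lowercase) and returns 1 if any is missing. Names are case-insensitive.
missing_security_headers() {
  local headers h missing=0
  headers=$(tr '[:upper:]' '[:lower:]')
  for h in "${REQUIRED_HEADERS[@]}"; do
    if ! grep -q "^${h}:" <<< "$headers"; then
      echo "$h"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `curl -sS -D - -o /dev/null https://staging.erp-generic.com | missing_security_headers` prints nothing and exits 0 when all listed headers are present.
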

---

### 2.5 [CI-CD-PIPELINE.md](./CI-CD-PIPELINE.md)

**Purpose:** Complete continuous integration and continuous deployment pipeline.

**Contents:**

- GitHub Actions workflows (CI, CD-QA, CD-Production)
- Automated test integration (Jest + Vitest + Playwright)
- Code quality gates (SonarQube)
- Security scanning (Snyk + OWASP Dependency-Check)
- Docker image build and push to registry
- Automated deployment (QA automatic, Production with manual approval)
- Rollback automation
- Notifications (Slack/Discord)

**Audience:** DevOps Engineers, Tech Lead, Development Team

**Implementation Time:** 10-12 hours initial setup

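
Immutable image tags make rollback automation straightforward, because rolling back means redeploying the tag of an earlier commit. The following minimal sketch derives tags for the build-and-push stage. It assumes a short-SHA tag plus a moving per-branch tag. The helper name `image_tags` is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: derive Docker image tags for the build-and-push stage.
# Assumption: a short commit SHA tag (immutable, usable for rollback)
# plus a moving tag per branch. The helper name is hypothetical.
set -euo pipefail

# image_tags IMAGE BRANCH SHA: print one full image reference per line.
image_tags() {
  local image="$1" branch="$2" sha="$3"
  # Docker tags cannot contain '/', so feature/x becomes feature-x.
  local branch_tag="${branch//\//-}"
  printf '%s:%s\n' "$image" "${sha:0:7}"
  printf '%s:%s\n' "$image" "$branch_tag"
}
```

In a GitHub Actions step, the branch and commit could come from the default `GITHUB_REF_NAME` and `GITHUB_SHA` variables, with each printed tag passed to `docker tag` and `docker push`.
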

---

## 3. SCRIPTS

### 3.1 [scripts/backup-postgres.sh](./scripts/backup-postgres.sh)

Automated PostgreSQL backup script with multi-tenant support.

**Features:**

- Full backup + per-schema backups
- Automatic compression
- Retention policy (7 days)
- Optional upload to S3/Cloud Storage
- Logging and notifications

**Execution:** Daily cron job at 2:00 AM (crontab entry):

```bash
0 2 * * * /opt/erp-generic/scripts/backup-postgres.sh
```

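
The backup flow described above can be sketched as follows. This is a minimal illustration, not the contents of `scripts/backup-postgres.sh`. `BACKUP_DIR`, `DATABASE_URL`, `DRY_RUN`, and the function names are assumptions, and the S3 upload and notifications are omitted:

```bash
#!/usr/bin/env bash
# Minimal sketch of the backup flow (not the real scripts/backup-postgres.sh):
# one full dump, one dump per schema, then a 7-day cleanup.
# BACKUP_DIR, DATABASE_URL and DRY_RUN are assumed variable names.
set -euo pipefail

BACKUP_DIR="${BACKUP_DIR:-/var/backups/erp-generic}"
SCHEMAS=(auth core financial inventory purchase sales analytics projects system)

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

backup_all() {
  local ts s
  ts=$(date +%Y%m%d_%H%M%S)   # e.g. full_20251124_020000.dump
  run mkdir -p "$BACKUP_DIR"
  # -Fc (custom format) is compressed by default and restorable with pg_restore.
  run pg_dump -Fc -f "$BACKUP_DIR/full_${ts}.dump" "$DATABASE_URL"
  for s in "${SCHEMAS[@]}"; do
    run pg_dump -Fc -n "$s" -f "$BACKUP_DIR/${s}_${ts}.dump" "$DATABASE_URL"
  done
  # Retention: delete dumps last modified more than 7 days (10080 min) ago.
  run find "$BACKUP_DIR" -name '*.dump' -mmin +10080 -delete
}
```

Running `DRY_RUN=1 backup_all` prints the `mkdir` command, the ten `pg_dump` commands (one full, nine per schema), and the `find` cleanup command without executing any of them.
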

---

### 3.2 [scripts/restore-postgres.sh](./scripts/restore-postgres.sh)

Backup restore script with validation and verification.

**Features:**

- Full or per-schema restore
- Integrity validation before restoring
- Safety backup (takes a snapshot before restoring)
- Dry-run mode for testing
- Detailed logging

**Execution:** Manual (disaster recovery)

```bash
./restore-postgres.sh --backup=full_20251124_020000.dump --target=staging
```

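
The validation and dry-run behavior listed above can be sketched as follows. This is an illustration, not the real `scripts/restore-postgres.sh`. `--backup` and `--target` mirror the example. The `--schema` and `--dry-run` flags, the function name, and passing `--target` straight to `pg_restore --dbname` are assumptions (a real script would more likely map an environment name to connection settings). The safety snapshot is omitted:

```bash
#!/usr/bin/env bash
# Sketch of the restore flow (not the real scripts/restore-postgres.sh).
# --backup and --target mirror the example above; --schema, --dry-run, and
# passing --target straight to --dbname are assumptions. The safety snapshot
# taken before restoring is omitted.
set -euo pipefail

restore() {
  local backup="" target="" schema="" dry_run=0 arg
  for arg in "$@"; do
    case "$arg" in
      --backup=*) backup="${arg#*=}" ;;
      --target=*) target="${arg#*=}" ;;
      --schema=*) schema="${arg#*=}" ;;
      --dry-run)  dry_run=1 ;;
      *) echo "unknown option: $arg" >&2; return 2 ;;
    esac
  done
  if [[ -z "$backup" || -z "$target" ]]; then
    echo "--backup and --target are required" >&2
    return 2
  fi

  local cmd=(pg_restore --clean --if-exists --no-owner --dbname="$target")
  if [[ -n "$schema" ]]; then
    cmd+=(--schema="$schema")   # per-schema restore
  fi
  cmd+=("$backup")

  if ((dry_run)); then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  # Basic integrity check: fails if the archive header or table of contents
  # cannot be read (it does not read every data block).
  pg_restore --list "$backup" > /dev/null
  "${cmd[@]}"
}
```

For example, `restore --backup=full_20251124_020000.dump --target=staging --dry-run` only prints the `pg_restore` command it would run.
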

---

### 3.3 [scripts/health-check.sh](./scripts/health-check.sh)

Full system health check script.

**Features:**

- Checks the backend API (/health)
- Checks PostgreSQL (connection + queries)
- Checks Redis (connection + ping)
- Checks the frontend (HTTP 200)
- Exit codes for monitoring
- Structured logging

**Execution:** Cron every 5 minutes; also used by Kubernetes liveness/readiness probes

```bash
*/5 * * * * /opt/erp-generic/scripts/health-check.sh
```

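
The following sketch shows one way to combine the individual checks into the single exit code that cron and monitoring consume, with one structured log line per check. `API_URL`, `FRONTEND_URL`, `DATABASE_URL`, `REDIS_URL`, and the function names are assumptions, not documented configuration:

```bash
#!/usr/bin/env bash
# Sketch: combine several checks into one exit code with JSON-lines output.
# API_URL, FRONTEND_URL, DATABASE_URL and REDIS_URL are assumed variable names.
set -uo pipefail   # no -e: every check should run even if an earlier one fails

failures=0

# check NAME CMD...: run CMD quietly, log one JSON line, count failures.
check() {
  local name="$1"
  shift
  if "$@" > /dev/null 2>&1; then
    printf '{"check":"%s","status":"ok"}\n' "$name"
  else
    printf '{"check":"%s","status":"fail"}\n' "$name"
    failures=$((failures + 1))
  fi
}

# Compare the reply to PONG instead of relying only on redis-cli's exit code.
redis_pong() {
  [[ "$(redis-cli -u "$REDIS_URL" ping)" == PONG ]]
}

run_all_checks() {
  check backend  curl -fsS --max-time 5 "$API_URL/health"
  check postgres psql "$DATABASE_URL" -tAc 'SELECT 1'
  check redis    redis_pong
  check frontend curl -fsS --max-time 5 -o /dev/null "$FRONTEND_URL"
  # Exit status 1 signals cron or monitoring that at least one check failed.
  ((failures == 0))
}
```

Called as `run_all_checks; exit $?`, the script's exit code reflects whether every check passed.
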

---

## 4. QUICK START

### First Deployment (Fresh Install)

```bash
# 1. Clone the repository
git clone https://github.com/company/erp-generic.git
cd erp-generic

# 2. Configure environment variables
cp .env.example .env
# Edit .env with real values

# 3. Start the services with Docker Compose
docker-compose up -d

# 4. Run database migrations
docker-compose exec backend npm run prisma:migrate:deploy

# 5. Seed initial data
docker-compose exec backend npm run seed:initial

# 6. Verify health
./scripts/health-check.sh
```

**Total Time:** 15-20 minutes

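
Step 2 can fail silently, because Compose still starts with empty values. A small pre-flight check can refuse to continue while required values are missing. The variable names in `REQUIRED_VARS` and the function name are hypothetical; the real list should come from `.env.example`:

```bash
#!/usr/bin/env bash
# Sketch: pre-flight check that required variables have values in a .env file.
# REQUIRED_VARS is hypothetical; take the real list from .env.example.
set -euo pipefail

REQUIRED_VARS=(DATABASE_URL REDIS_URL JWT_SECRET)

# missing_env_vars FILE: print each required variable that is absent or has an
# empty value; return 1 if any is missing. Quoted empty values (VAR="") are
# not detected by this simple pattern.
missing_env_vars() {
  local file="$1" var missing=0
  for var in "${REQUIRED_VARS[@]}"; do
    # Accept VAR=value, optionally prefixed with "export ".
    if ! grep -Eq "^(export[[:space:]]+)?${var}=.+" "$file"; then
      echo "$var"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `missing_env_vars .env || exit 1` before `docker-compose up -d` lists the missing names and stops the deployment.
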

---

### Update Deployment (Existing System)

```bash
# 1. Pull the latest changes
git pull origin main

# 2. Back up the database (safety net)
./scripts/backup-postgres.sh

# 3. Build the new images
docker-compose build

# 4. Run migrations from the NEW image: `run` starts a one-off container from
#    the freshly built image, whereas `exec` would use the old running container.
#    Migrations must stay backward compatible with the version still running.
docker-compose run --rm backend npm run prisma:migrate:deploy

# 5. Recreate the services with the new images (each container restarts
#    briefly; true zero downtime requires the blue-green strategy)
docker-compose up -d --no-deps backend
docker-compose up -d --no-deps frontend

# 6. Verify health
./scripts/health-check.sh

# 7. Run smoke tests
npm run test:smoke
```

**Total Time:** 5-10 minutes


---

## 5. ENVIRONMENTS

| Environment | URL | Deploy Method | Database | Purpose |
|-------------|-----|---------------|----------|---------|
| **Development** | http://localhost:3000 | Manual (local) | Local PostgreSQL | Local development |
| **CI/CD** | - | Automatic (GitHub Actions) | PostgreSQL (Testcontainers) | Automated testing |
| **QA** | https://qa.erp-generic.local | Automatic (push to `develop`) | PostgreSQL (anonymized production data) | Manual QA testing |
| **Staging** | https://staging.erp-generic.com | Manual (approval) | PostgreSQL (production clone) | Pre-release validation |
| **Production** | https://erp-generic.com | Manual (approval) | PostgreSQL (production) | Live system |


---

## 6. SLA AND TARGETS

### 6.1 Availability Targets

- **Uptime:** 99.9% (at most 8.76 hours of downtime per year)
- **Planned Maintenance Window:** Saturdays, 2:00-4:00 AM (48 hours' advance notice; schedule in section 8.1)
- **Unplanned Downtime:** <30 min/month

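
The downtime allowance follows directly from the availability percentage. The following minimal sketch shows the arithmetic, assuming a 365-day year (8760 hours) and a 30-day month. `downtime_budget` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: convert an availability target into a downtime budget.
# Assumes a 365-day year (8760 h) and a 30-day month.
set -euo pipefail

# downtime_budget PERCENT: print "<hours per year> <minutes per 30-day month>".
downtime_budget() {
  awk -v p="$1" 'BEGIN {
    down = 1 - p / 100                  # allowed unavailable fraction
    printf "%.2f %.1f\n", 8760 * down, 30 * 24 * 60 * down
  }'
}
```

`downtime_budget 99.9` prints `8.76 43.2`: 8.76 hours per year, or about 43 minutes per 30-day month, so the <30 min/month unplanned-downtime target fits within that budget. A 2-hour maintenance window every month adds up to 24 hours per year, so the 99.9% target can only hold if planned maintenance is excluded from the availability calculation.
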

### 6.2 Performance Targets

- **API Response Time:** p50 <100ms, p95 <300ms, p99 <500ms
- **Page Load Time:** p95 <2s (First Contentful Paint)
- **Database Query Time:** p95 <50ms
- **Throughput:** >1000 req/s at peak load

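
To check a load-test run against these targets, you need to compute percentiles from the raw samples. The following is a minimal nearest-rank sketch, assuming one latency value per line in a plain text file. The helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: nearest-rank percentile over a file of latencies (one number per
# line, e.g. milliseconds from a load test). The input format is assumed.
set -euo pipefail

# percentile P FILE: print the smallest sample with at least P% of samples <= it.
percentile() {
  sort -n "$2" | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      x = p * NR / 100
      rank = int(x)
      if (rank < x) rank++      # rank = ceil(p/100 * N)
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

For example, `percentile 95 latencies.txt` (with a hypothetical `latencies.txt`) prints the p95 latency to compare with the 300 ms API target.
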

### 6.3 Recovery Targets

- **RTO (Recovery Time Objective):** 4 hours
- **RPO (Recovery Point Objective):** 15 minutes (the 4-hourly incremental backups alone cannot meet this, so it implies Point-in-Time Recovery with continuous WAL archiving)
- **Backup Frequency:** Full daily (2:00 AM) + incremental every 4 hours
- **Backup Retention:** 7 daily + 4 weekly + 12 monthly


### 6.4 Security Targets

- **Critical Vulnerabilities:** Fix within 24 hours
- **High Vulnerabilities:** Fix within 7 days
- **Security Scans:** Daily (automated in CI/CD)
- **Penetration Testing:** Quarterly (external vendor)
- **Security Audits:** Twice a year (compliance)


---

## 7. INCIDENT RESPONSE

### 7.1 Severity Levels

| Severity | Description | Response Time | Resolution Time |
|----------|-------------|---------------|-----------------|
| **P0 (Critical)** | System down, data loss | 15 min | 4 hours |
| **P1 (High)** | Major feature broken | 1 hour | 24 hours |
| **P2 (Medium)** | Minor feature broken | 4 hours | 72 hours |
| **P3 (Low)** | Cosmetic issue | 24 hours | Next sprint |

### 7.2 On-Call Rotation

- **Primary On-Call:** DevOps Engineer (24/7)
- **Secondary On-Call:** Backend Tech Lead
- **Escalation:** CTO

### 7.3 Incident Procedure

1. **Detection:** Alerts via Prometheus/Grafana → PagerDuty
2. **Acknowledge:** The on-call engineer acknowledges within 15 min
3. **Assess:** Determine the severity level
4. **Mitigate:** Apply an immediate fix or roll back
5. **Communicate:** Update the status page and notify stakeholders
6. **Resolve:** Deploy the permanent fix
7. **Post-Mortem:** Document lessons learned (within 48 hours)


---

## 8. MAINTENANCE WINDOWS

### 8.1 Regular Maintenance

**Frequency:** Monthly (first Saturday of the month)

**Time:** 2:00-4:00 AM (server time zone)

**Notice:** 48 hours in advance via email and an in-app banner

**Typical activities:**

- Database maintenance (VACUUM, ANALYZE, REINDEX)
- SSL certificate renewal
- OS security patches
- PostgreSQL minor version updates
- Log rotation and cleanup

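
Cron cannot express "first Saturday of the month" directly. When both the day-of-month and day-of-week fields are restricted, crontab(5) runs the job when *either* field matches, so `0 2 1-7 * 6` would fire on days 1-7 and on every Saturday. A common workaround is to schedule the job every Saturday (`0 2 * * 6`) and exit early unless the date falls on day 1-7. The function name is hypothetical, and GNU `date -d` is assumed:

```bash
#!/usr/bin/env bash
# Guard for a maintenance job scheduled with "0 2 * * 6" (every Saturday).
# Requires GNU date for `date -d`.
set -euo pipefail

# is_first_saturday [DATE]: exit 0 only if DATE (default: today) is the
# first Saturday of its month.
is_first_saturday() {
  local d="${1:-today}" dow dom
  dow=$(date -d "$d" +%u)             # 1 = Monday ... 6 = Saturday, 7 = Sunday
  dom=$((10#$(date -d "$d" +%d)))     # strip the leading zero (08 is not octal)
  ((dow == 6 && dom <= 7))
}
```

At the top of the maintenance script, `is_first_saturday || exit 0` turns the other Saturdays into no-ops.
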

### 8.2 Emergency Maintenance

**Trigger:** Critical security vulnerability (P0)

**Notice:** At least 2 hours in advance

**Approval:** CTO + Product Owner


---

## 9. CONTACT INFORMATION

### 9.1 Teams

**DevOps Team:**

- Email: devops@erp-generic.com
- Slack: #devops-team
- On-Call: +1-XXX-XXX-XXXX (PagerDuty)

**Security Team:**

- Email: security@erp-generic.com
- Slack: #security-alerts
- Incident: security-incident@erp-generic.com

**Database Team:**

- Email: dba@erp-generic.com
- Slack: #database-team

**Development Team:**

- Email: dev@erp-generic.com
- Slack: #development

### 9.2 Escalation Path

1. **L1:** On-Call DevOps Engineer
2. **L2:** Backend Tech Lead + DBA
3. **L3:** CTO + Infrastructure Manager
4. **L4:** CEO (only for business-critical incidents)


---

## 10. TOOLS AND ACCESS

### 10.1 Infrastructure

- **Cloud Provider:** AWS / Azure / GCP (TBD)
- **Container Registry:** Docker Hub / AWS ECR / GitHub Container Registry
- **CI/CD:** GitHub Actions
- **Secrets Management:** HashiCorp Vault / AWS Secrets Manager

### 10.2 Monitoring & Observability

- **APM:** Prometheus + Grafana
- **Logging:** Winston + ELK Stack (Elasticsearch + Logstash + Kibana) / Grafana Loki
- **Alerting:** Prometheus Alertmanager → PagerDuty
- **Uptime Monitoring:** UptimeRobot / Pingdom
- **Error Tracking:** Sentry

### 10.3 Security

- **SAST:** Snyk, SonarQube
- **DAST:** OWASP ZAP
- **Dependency Scanning:** Snyk, npm audit
- **Secret Scanning:** GitGuardian, TruffleHog
- **Penetration Testing:** External vendor (quarterly)

### 10.4 Collaboration

- **Project Management:** Jira
- **Documentation:** Confluence
- **Chat:** Slack / Microsoft Teams
- **Video:** Zoom / Google Meet
- **On-Call:** PagerDuty


---

## 11. COMPLIANCE & AUDITING

### 11.1 Standards

- **GDPR:** Data protection and privacy (EU)
- **CCPA:** California Consumer Privacy Act
- **SOC 2 Type II:** Security, availability, processing integrity (target)
- **ISO 27001:** Information security management (target)

### 11.2 Audit Logs

- **Database Audit:** pgaudit extension enabled
- **Application Audit:** Winston structured logging + ELK
- **Infrastructure Audit:** AWS CloudTrail / Azure Activity Log
- **Retention:** 1 year (compliance requirement)

### 11.3 Data Residency

- **Primary Region:** us-east-1 (Virginia) / eu-west-1 (Ireland) - TBD
- **Backup Region:** us-west-2 (Oregon) / eu-central-1 (Frankfurt) - TBD
- **Data Sovereignty:** EU data stays in the EU (GDPR compliance)


---

## 12. CHANGELOG

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-11-24 | DevOps Architect | Complete initial documentation |


---

## 13. REFERENCES

**Related Documentation:**

- [Test Plans](../04-test-plans/MASTER-TEST-PLAN.md)
- [Architecture Decision Records](../adr/)
- [Database Schemas](../02-modelado/database-design/schemas/)
- [User Stories](../03-user-stories/)

**External References:**

- [Docker Documentation](https://docs.docker.com/)
- [PostgreSQL 16 Documentation](https://www.postgresql.org/docs/16/)
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)
- [12-Factor App Methodology](https://12factor.net/)


---

## 14. LICENSE AND COPYRIGHT

**Copyright © 2025 ERP Generic Team. All rights reserved.**

This documentation is confidential and intended solely for internal use by the ERP Generic development and operations teams.

**Classification:** Internal Use Only

**Retention:** Permanent (update with every release)


---

**Document:** README.md

**Location:** `/projects/erp-generic/docs/05-devops/`

**Next Review:** 2025-12-24 (monthly)