# DevOps Documentation - ERP Generic

**Last updated:** 2025-11-24

**Owner:** DevOps Team

**Status:** ✅ Production-Ready

---

## 1. OVERVIEW

This folder contains all the DevOps documentation needed to deploy, monitor, maintain, and secure ERP Generic in production environments.

ERP Generic is a modular system with:

- **14 modules** (MGN-001 to MGN-014)
- **Stack:** NestJS 10 + Prisma 5 + PostgreSQL 16 + Redis 7 + React 18 + TypeScript 5
- **Multi-tenancy:** Schema-level isolation + Row-Level Security (RLS)
- **9 PostgreSQL schemas:** auth, core, financial, inventory, purchase, sales, analytics, projects, system
- **Architecture:** Microservices-ready, cloud-native, container-based

---

## 2. MAIN DOCUMENTS

### 2.1 [DEPLOYMENT-GUIDE.md](./DEPLOYMENT-GUIDE.md)

**Purpose:** Complete deployment guide for all environments.

**Contents:**

- Complete Docker setup (Dockerfile + docker-compose.yml)
- PostgreSQL 16 initialization (9 schemas)
- Redis configuration
- Environment variables management
- Multi-environment deployment strategy (Dev, QA, Staging, Production)
- Zero-downtime deployment (Blue-green)
- Rollback procedures

**Audience:** DevOps Engineers, SREs, Infrastructure Team

**Implementation time:** 4-6 hours the first time, 15-30 min for subsequent deployments

---

### 2.2 [MONITORING-OBSERVABILITY.md](./MONITORING-OBSERVABILITY.md)

**Purpose:** Complete monitoring and observability strategy.

**Contents:**

- Prometheus setup (metrics collection)
- Grafana dashboards (Application, Database, Business)
- Alert rules (CPU, memory, DB connections, error rate)
- Logging strategy (Winston + ELK/Loki)
- Application Performance Monitoring (APM)
- Health check endpoints
- Distributed tracing (OpenTelemetry)

**Audience:** DevOps Engineers, SREs, On-call Engineers

**Implementation time:** 6-8 hours for initial setup

---

### 2.3 [BACKUP-RECOVERY.md](./BACKUP-RECOVERY.md)

**Purpose:** Backup and disaster recovery procedures.

**Contents:**

- Backup strategy (Full + incremental)
- Automated backup scripts (PostgreSQL multi-schema)
- Multi-tenant backup isolation
- Retention policies (7 days + 4 weeks + 12 months)
- Point-in-Time Recovery (PITR)
- Disaster recovery playbook (RTO 4h, RPO 15min)
- Backup testing procedures

**Audience:** DevOps Engineers, DBAs, Security Team

**Implementation time:** 4-5 hours for initial setup, plus monthly testing

---

### 2.4 [SECURITY-HARDENING.md](./SECURITY-HARDENING.md)

**Purpose:** Complete security hardening of the system.

**Contents:**

- OWASP Top 10 mitigations
- Rate limiting configuration
- JWT security (rotation, expiration, refresh tokens)
- SQL injection prevention
- XSS/CSRF protection
- CORS configuration
- Security headers (Helmet.js)
- Secrets management (Vault/AWS Secrets Manager)
- SSL/TLS certificate management

**Audience:** Security Team, DevOps Engineers, Backend Developers

**Implementation time:** 8-10 hours for the complete implementation

---

### 2.5 [CI-CD-PIPELINE.md](./CI-CD-PIPELINE.md)

**Purpose:** Complete continuous integration and continuous deployment pipeline.

**Contents:**

- GitHub Actions workflows (CI, CD-QA, CD-Production)
- Automated testing integration (Jest + Vitest + Playwright)
- Code quality gates (SonarQube)
- Security scanning (Snyk + OWASP Dependency Check)
- Docker build & push to registry
- Automated deployment (QA auto, Production manual approval)
- Rollback automation
- Notifications (Slack/Discord)

**Audience:** DevOps Engineers, Tech Lead, Development Team

**Implementation time:** 10-12 hours for initial setup

---

## 3. SCRIPTS

### 3.1 [scripts/backup-postgres.sh](./scripts/backup-postgres.sh)

Automated PostgreSQL backup script with multi-tenant support.

**Features:**

- Full backup + per-schema backups
- Automatic compression
- Retention policy (7 days)
- Optional upload to S3/Cloud Storage
- Logging and notifications

**Execution:** Daily cron job at 2:00 AM

```bash
0 2 * * * /opt/erp-generic/scripts/backup-postgres.sh
```

---

### 3.2 [scripts/restore-postgres.sh](./scripts/restore-postgres.sh)

Backup restore script with validation and verification.

**Features:**

- Full or per-schema restore
- Integrity validation before restoring
- Backup safety (creates a snapshot before restoring)
- Dry-run mode for testing
- Detailed logging

**Execution:** Manual (disaster recovery)

```bash
./restore-postgres.sh --backup=full_20251124_020000.dump --target=staging
```

---

### 3.3 [scripts/health-check.sh](./scripts/health-check.sh)

Complete system health check script.

**Features:**

- Checks the backend API (/health)
- Checks PostgreSQL (connection + queries)
- Checks Redis (connection + ping)
- Checks the frontend (HTTP 200)
- Exit codes for monitoring
- Structured logging

**Execution:** Cron job every 5 minutes; also used by Kubernetes liveness/readiness probes

```bash
*/5 * * * * /opt/erp-generic/scripts/health-check.sh
```

---

## 4. QUICK START

### First Deployment (Fresh Install)

```bash
# 1. Clone repository
git clone https://github.com/company/erp-generic.git
cd erp-generic

# 2. Configure environment variables
cp .env.example .env
# Edit .env with real values

# 3. Start services with Docker Compose
docker-compose up -d

# 4. Run database migrations
docker-compose exec backend npm run prisma:migrate:deploy

# 5. Seed initial data
docker-compose exec backend npm run seed:initial

# 6. Verify health
./scripts/health-check.sh
```

**Total time:** 15-20 minutes

---

### Update Deployment (Existing System)

```bash
# 1. Pull latest changes
git pull origin main

# 2. Backup database (safety)
./scripts/backup-postgres.sh

# 3. Build new images
docker-compose build

# 4. Run migrations from the newly built image
#    (assumes migration files ship inside the image; the container
#    that is still running was started from the old image)
docker-compose run --rm backend npm run prisma:migrate:deploy

# 5. Update services one at a time (each container is recreated)
docker-compose up -d --no-deps --build backend
docker-compose up -d --no-deps --build frontend

# 6. Verify health
./scripts/health-check.sh

# 7. Run smoke tests
npm run test:smoke
```

**Total time:** 5-10 minutes

---

## 5. ENVIRONMENTS

| Environment | URL | Deploy Method | Database | Purpose |
|-------------|-----|---------------|----------|---------|
| **Development** | http://localhost:3000 | Manual (local) | PostgreSQL local | Local development |
| **CI/CD** | - | Auto (GitHub Actions) | PostgreSQL (TestContainers) | Automated testing |
| **QA** | https://qa.erp-generic.local | Auto (push to develop) | PostgreSQL (anonymized prod) | Manual QA testing |
| **Staging** | https://staging.erp-generic.com | Manual (approval) | PostgreSQL (prod clone) | Pre-release validation |
| **Production** | https://erp-generic.com | Manual (approval) | PostgreSQL (prod) | Live system |

---

## 6. SLA AND OBJECTIVES

### 6.1 Availability Targets

- **Uptime:** 99.9% (at most 8.76 hours of downtime per year)
- **Planned Maintenance Window:** Saturdays 2:00-4:00 AM (48 h advance notice)
- **Unplanned Downtime:** <30 min/month

### 6.2 Performance Targets

- **API Response Time:** p50 <100ms, p95 <300ms, p99 <500ms
- **Page Load Time:** p95 <2s (First Contentful Paint)
- **Database Query Time:** p95 <50ms
- **Throughput:** >1000 req/s @ peak load

### 6.3 Recovery Targets

- **RTO (Recovery Time Objective):** 4 hours
- **RPO (Recovery Point Objective):** 15 minutes
- **Backup Frequency:** Full daily (2:00 AM) + Incremental every 4 hours
- **Backup Retention:** 7 daily + 4 weekly + 12 monthly

### 6.4 Security Targets

- **Critical Vulnerabilities:** Fix within 24 hours
- **High Vulnerabilities:** Fix within 7 days
- **Security Scans:** Daily (automated in CI/CD)
- **Penetration Testing:** Quarterly (external vendor)
- **Security Audits:** Bi-annual (compliance)

---

## 7. INCIDENT RESPONSE

### 7.1 Severity Levels

| Severity | Description | Response Time | Resolution Time |
|----------|-------------|---------------|-----------------|
| **P0 (Critical)** | System down, data loss | 15 min | 4 hours |
| **P1 (High)** | Major feature broken | 1 hour | 24 hours |
| **P2 (Medium)** | Minor feature broken | 4 hours | 72 hours |
| **P3 (Low)** | Cosmetic issue | 24 hours | Next sprint |

### 7.2 On-Call Rotation

- **Primary On-Call:** DevOps Engineer (24/7)
- **Secondary On-Call:** Backend Tech Lead
- **Escalation:** CTO

### 7.3 Incident Procedure

1. **Detection:** Alerts via Prometheus/Grafana → PagerDuty
2. **Acknowledge:** On-call engineer acknowledges within 15 min
3. **Assess:** Determine severity level
4. **Mitigate:** Apply immediate fix or rollback
5. **Communicate:** Update status page + notify stakeholders
6. **Resolve:** Permanent fix deployed
7. **Post-Mortem:** Document lessons learned (within 48 h)

---

## 8. MAINTENANCE WINDOWS

### 8.1 Regular Maintenance

**Frequency:** Monthly (first Saturday of the month)

**Schedule:** 2:00-4:00 AM (server time zone)

**Notification:** 48 hours in advance via email + in-app banner

**Typical activities:**

- Database maintenance (VACUUM, ANALYZE, REINDEX)
- SSL certificate renewal
- OS security patches
- PostgreSQL minor version updates
- Log rotation and cleanup

### 8.2 Emergency Maintenance

**Criterion:** Critical security vulnerability (P0)

**Notification:** At least 2 hours in advance

**Approval:** CTO + Product Owner

---

## 9. CONTACT INFORMATION

### 9.1 Teams

**DevOps Team:**

- Email: devops@erp-generic.com
- Slack: #devops-team
- On-Call: +1-XXX-XXX-XXXX (PagerDuty)

**Security Team:**

- Email: security@erp-generic.com
- Slack: #security-alerts
- Incident: security-incident@erp-generic.com

**Database Team:**

- Email: dba@erp-generic.com
- Slack: #database-team

**Development Team:**

- Email: dev@erp-generic.com
- Slack: #development

### 9.2 Escalation Path

1. **L1:** On-Call DevOps Engineer
2. **L2:** Backend Tech Lead + DBA
3. **L3:** CTO + Infrastructure Manager
4. **L4:** CEO (only for business-critical incidents)

---

## 10. TOOLS AND ACCESS

### 10.1 Infrastructure

- **Cloud Provider:** AWS / Azure / GCP (TBD)
- **Container Registry:** Docker Hub / AWS ECR / GitHub Container Registry
- **CI/CD:** GitHub Actions
- **Secrets Management:** HashiCorp Vault / AWS Secrets Manager

### 10.2 Monitoring & Observability

- **APM:** Prometheus + Grafana
- **Logging:** Winston + ELK Stack (Elasticsearch + Logstash + Kibana) / Grafana Loki
- **Alerting:** Prometheus Alertmanager → PagerDuty
- **Uptime Monitoring:** UptimeRobot / Pingdom
- **Error Tracking:** Sentry

### 10.3 Security

- **SAST:** Snyk, SonarQube
- **DAST:** OWASP ZAP
- **Dependency Scanning:** Snyk, npm audit
- **Secret Scanning:** GitGuardian, TruffleHog
- **Penetration Testing:** External vendor (quarterly)

### 10.4 Collaboration

- **Project Management:** Jira
- **Documentation:** Confluence
- **Chat:** Slack / Microsoft Teams
- **Video:** Zoom / Google Meet
- **On-Call:** PagerDuty

---

## 11. COMPLIANCE & AUDITING

### 11.1 Standards

- **GDPR:** Data protection and privacy (EU)
- **CCPA:** California Consumer Privacy Act
- **SOC 2 Type II:** Security, availability, processing integrity (target)
- **ISO 27001:** Information security management (target)

### 11.2 Audit Logs

- **Database Audit:** pgaudit extension enabled
- **Application Audit:** Winston structured logging + ELK
- **Infrastructure Audit:** AWS CloudTrail / Azure Activity Log
- **Retention:** 1 year (compliance requirement)

### 11.3 Data Residency

- **Primary Region:** us-east-1 (Virginia) / eu-west-1 (Ireland) - TBD
- **Backup Region:** us-west-2 (Oregon) / eu-central-1 (Frankfurt) - TBD
- **Data Sovereignty:** EU data stays in EU (GDPR compliance)

---

## 12. CHANGELOG

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-11-24 | DevOps Architect | Complete initial documentation |
| | | | |
| | | | |

---

## 13. REFERENCES

**Related Documentation:**

- [Test Plans](../04-test-plans/MASTER-TEST-PLAN.md)
- [Architecture Decision Records](../adr/)
- [Database Schemas](../02-modelado/database-design/schemas/)
- [User Stories](../03-user-stories/)

**External References:**

- [Docker Documentation](https://docs.docker.com/)
- [PostgreSQL 16 Documentation](https://www.postgresql.org/docs/16/)
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)
- [12-Factor App Methodology](https://12factor.net/)

---

## 14. LICENSE AND COPYRIGHT

**Copyright © 2025 ERP Generic Team. All rights reserved.**

This documentation is confidential and intended solely for internal use by the ERP Generic development and operations team.

**Classification:** Internal Use Only

**Retention:** Permanent (update with every release)

---

**Document:** README.md

**Location:** `/projects/erp-generic/docs/05-devops/`

**Next review:** 2025-12-24 (monthly)