DevOps Documentation - ERP Generic
Last updated: 2025-11-24 · Owner: DevOps Team · Status: ✅ Production-Ready
1. OVERVIEW
This folder contains all the DevOps documentation needed to deploy, monitor, maintain, and secure ERP Generic in production environments.
ERP Generic is a modular system with:
- 14 modules (MGN-001 to MGN-014)
- Stack: NestJS 10 + Prisma 5 + PostgreSQL 16 + Redis 7 + React 18 + TypeScript 5
- Multi-tenancy: Schema-level isolation + Row-Level Security (RLS)
- 9 PostgreSQL schemas: auth, core, financial, inventory, purchase, sales, analytics, projects, system
- Arquitectura: Microservices-ready, Cloud-native, Container-based
2. MAIN DOCUMENTS
2.1 DEPLOYMENT-GUIDE.md
Purpose: Complete guide to deploying in every environment.
Contents:
- Docker setup completo (Dockerfile + docker-compose.yml)
- PostgreSQL 16 initialization (9 schemas)
- Redis configuration
- Environment variables management
- Multi-environment deployment strategy (Dev, QA, Staging, Production)
- Zero-downtime deployment (Blue-green)
- Rollback procedures
Audience: DevOps Engineers, SREs, Infrastructure Team
Implementation time: 4-6 hours the first time, 15-30 min for subsequent deployments
2.2 MONITORING-OBSERVABILITY.md
Purpose: Complete monitoring and observability strategy.
Contents:
- Prometheus setup (metrics collection)
- Grafana dashboards (Application, Database, Business)
- Alert rules (CPU, memory, DB connections, error rate)
- Logging strategy (Winston + ELK/Loki)
- Application Performance Monitoring (APM)
- Health check endpoints
- Distributed tracing (OpenTelemetry)
Audience: DevOps Engineers, SREs, On-call Engineers
Implementation time: 6-8 hours initial setup
2.3 BACKUP-RECOVERY.md
Purpose: Backup and disaster recovery procedures.
Contents:
- Backup strategy (full + incremental)
- Automated backup scripts (PostgreSQL multi-schema)
- Multi-tenant backup isolation
- Retention policies (7 daily + 4 weekly + 12 monthly)
- Point-in-Time Recovery (PITR)
- Disaster recovery playbook (RTO 4 h, RPO 15 min)
- Backup testing procedures
Audience: DevOps Engineers, DBAs, Security Team
Implementation time: 4-5 hours initial setup, plus monthly testing
2.4 SECURITY-HARDENING.md
Purpose: Complete security hardening of the system.
Contents:
- OWASP Top 10 mitigations
- Rate limiting configuration
- JWT security (rotation, expiration, refresh tokens)
- SQL injection prevention
- XSS/CSRF protection
- CORS configuration
- Security headers (Helmet.js)
- Secrets management (Vault/AWS Secrets Manager)
- SSL/TLS certificate management
Audience: Security Team, DevOps Engineers, Backend Developers
Implementation time: 8-10 hours for the full implementation
2.5 CI-CD-PIPELINE.md
Purpose: Complete continuous integration and continuous deployment pipeline.
Contents:
- GitHub Actions workflows (CI, CD-QA, CD-Production)
- Automated testing integration (Jest + Vitest + Playwright)
- Code quality gates (SonarQube)
- Security scanning (Snyk + OWASP Dependency Check)
- Docker build & push to registry
- Automated deployment (QA auto, Production manual approval)
- Rollback automation
- Notifications (Slack/Discord)
Audience: DevOps Engineers, Tech Lead, Development Team
Implementation time: 10-12 hours initial setup
3. SCRIPTS
3.1 scripts/backup-postgres.sh
Automated PostgreSQL backup script with multi-tenant support.
Features:
- Full backup + per-schema backups
- Automatic compression
- Retention policy (7 days)
- Optional upload to S3/Cloud Storage
- Logging and notifications
Execution: daily cron job at 2:00 AM
```
0 2 * * * /opt/erp-generic/scripts/backup-postgres.sh
```
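The following is a minimal sketch of what the core of backup-postgres.sh might look like. It is not the actual script. It assumes the following:
- Connection settings come from the standard libpq environment variables (PGHOST, PGUSER, PGDATABASE and so on).
- Dumps use pg_dump's custom format, which is compressed by default.
- The function names (`backup_name`, `run_backups`, `prune_backups`) and the `/var/backups/erp-generic` directory are hypothetical.

S3 upload and notifications are omitted.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; function names and paths are hypothetical.
set -euo pipefail

# The nine schemas listed in the Overview section.
SCHEMAS=(auth core financial inventory purchase sales analytics projects system)

# backup_name SCOPE TIMESTAMP -> e.g. full_20251124_020000.dump
backup_name() {
  printf '%s_%s.dump\n' "$1" "$2"
}

# run_backups DIR: one full dump plus one dump per schema, sharing a timestamp.
# Connection details come from libpq environment variables (PGHOST, PGUSER, ...).
run_backups() {
  local dir="$1" ts schema
  ts="$(date +%Y%m%d_%H%M%S)"
  pg_dump --format=custom --file="$dir/$(backup_name full "$ts")"
  for schema in "${SCHEMAS[@]}"; do
    pg_dump --format=custom --schema="$schema" \
      --file="$dir/$(backup_name "$schema" "$ts")"
  done
}

# prune_backups DIR DAYS: delete *.dump files whose age in whole days exceeds
# DAYS (find's -mtime +N rounding), i.e. the 7-day local retention above.
prune_backups() {
  find "$1" -maxdepth 1 -type f -name '*.dump' -mtime "+$2" -delete
}

# Entry point (not executed here so the functions can be sourced and tested):
#   run_backups /var/backups/erp-generic && prune_backups /var/backups/erp-generic 7
```

The full dump and the per-schema dumps share one timestamp. This makes it easy to tell which per-schema files belong to the same run when restoring a single tenant's schema.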
3.2 scripts/restore-postgres.sh
Backup restore script with validation and verification.
Features:
- Full or per-schema restore
- Integrity validation before restoring
- Backup safety (takes a snapshot before restoring)
- Dry-run mode for testing
- Detailed logging
Execution: manual (disaster recovery)
```bash
./restore-postgres.sh --backup=full_20251124_020000.dump --target=staging
```
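The documented invocation takes `--backup` and `--target`. The sketch below shows one way the argument handling, integrity check and dry-run mode could fit together. The following are assumptions rather than the real script's interface:
- the `--schema` and `--dry-run` flag names
- the function names
- the database name passed to `run_restore`

The pre-restore snapshot step is left out. `pg_restore --list` only reads the archive header and table of contents, so it is a cheap first check rather than a full verification.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; --schema, --dry-run and function names are assumptions.
set -euo pipefail

BACKUP="" TARGET="" SCHEMA="" DRY_RUN=0

# parse_args ARGS...: fill the globals above; returns 2 on bad or missing arguments.
parse_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --backup=*) BACKUP="${arg#*=}" ;;
      --target=*) TARGET="${arg#*=}" ;;
      --schema=*) SCHEMA="${arg#*=}" ;;
      --dry-run)  DRY_RUN=1 ;;
      *) echo "unknown argument: $arg" >&2; return 2 ;;
    esac
  done
  if [[ -z "$BACKUP" || -z "$TARGET" ]]; then
    echo "usage: restore-postgres.sh --backup=FILE --target=ENV [--schema=NAME] [--dry-run]" >&2
    return 2
  fi
}

# validate_backup FILE: pg_restore --list exits non-zero if the archive
# header or table of contents cannot be read.
validate_backup() {
  pg_restore --list "$1" > /dev/null
}

# run_restore DBNAME: restore BACKUP (optionally one schema) into DBNAME.
# In dry-run mode the command is printed instead of executed.
run_restore() {
  local cmd=(pg_restore --clean --if-exists --dbname="$1")
  if [[ -n "$SCHEMA" ]]; then
    cmd+=(--schema="$SCHEMA")
  fi
  cmd+=("$BACKUP")
  if (( DRY_RUN )); then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

How `--target=staging` maps to a concrete database name is left to the real script. In this sketch the caller passes it to `run_restore` explicitly.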
3.3 scripts/health-check.sh
Full-system health check script.
Features:
- Checks the backend API (/health)
- Checks PostgreSQL (connection + queries)
- Checks Redis (connection + PING)
- Checks the frontend (HTTP 200)
- Exit codes for monitoring
- Structured logging
Execution: cron every 5 minutes; also used by the Kubernetes liveness/readiness probes
```
*/5 * * * * /opt/erp-generic/scripts/health-check.sh
```
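The sketch below shows how health-check.sh could aggregate its checks into a single exit code, with one structured (logfmt-style) log line per check. It makes these assumptions:
- The URLs and ports are placeholders that can be overridden through environment variables.
- PostgreSQL settings come from the libpq environment variables.
- The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; URLs, ports and function names are assumptions.
set -euo pipefail

BACKEND_URL="${BACKEND_URL:-http://localhost:3001/health}"   # hypothetical port
FRONTEND_URL="${FRONTEND_URL:-http://localhost:3000}"

FAILURES=0

# http_200 URL: succeed only if URL answers with status 200 within 5 seconds.
http_200() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")" = "200" ]
}

# redis_pong: succeed if redis-cli gets PONG back from the server.
redis_pong() {
  [ "$(redis-cli ping 2>/dev/null)" = "PONG" ]
}

# check NAME CMD...: run CMD, log one logfmt line, count failures.
check() {
  local name="$1" status=ok
  shift
  if ! "$@" > /dev/null 2>&1; then
    status=fail
    FAILURES=$((FAILURES + 1))
  fi
  printf 'ts=%s check=%s status=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$status"
}

# run_all: exit status 0 only if every check passed (usable by cron and probes).
run_all() {
  FAILURES=0
  check backend        http_200 "$BACKEND_URL"
  check postgres       pg_isready --timeout=5
  check postgres_query psql -tAc 'SELECT 1'
  check redis          redis_pong
  check frontend       http_200 "$FRONTEND_URL"
  [ "$FAILURES" -eq 0 ]
}

# Entry point (omitted so the functions can be sourced): run_all; exit $?
```

All checks run even after one fails. The log therefore always shows the full picture, while the exit status still reflects whether everything is healthy.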
4. QUICK START
First Deployment (Fresh Install)
```bash
# 1. Clone repository
git clone https://github.com/company/erp-generic.git
cd erp-generic
# 2. Configure environment variables
cp .env.example .env
# Edit .env with real values
# 3. Start services with Docker Compose
docker-compose up -d
# 4. Run database migrations
docker-compose exec backend npm run prisma:migrate:deploy
# 5. Seed initial data
docker-compose exec backend npm run seed:initial
# 6. Verify health
./scripts/health-check.sh
```
Total time: 15-20 minutes
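`docker-compose up -d` returns once the containers have started, not when PostgreSQL is ready to accept connections. On a fresh install, step 4 can therefore race the database. A small retry helper such as the hypothetical `retry` below can gate the migration step. The `postgres` service name and the `-U postgres` user in the usage comment are assumptions about the compose file.

```bash
#!/usr/bin/env bash
# Illustrative helper; not part of the documented scripts.
set -euo pipefail

# retry MAX_ATTEMPTS DELAY_SECONDS CMD...: rerun CMD until it succeeds,
# sleeping DELAY_SECONDS between attempts; returns 1 once attempts run out.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible use between steps 3 and 4 (service and user names are assumptions):
#   docker-compose up -d
#   retry 30 2 docker-compose exec -T postgres pg_isready -U postgres
#   docker-compose exec backend npm run prisma:migrate:deploy
```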
Update Deployment (Existing System)
```bash
# 1. Pull latest changes
git pull origin main
# 2. Back up the database (safety)
./scripts/backup-postgres.sh
# 3. Build new images
docker-compose build
# 4. Run migrations from the new image (they must stay backward-compatible with the running version)
docker-compose run --rm backend npm run prisma:migrate:deploy
# 5. Recreate services one at a time (each restarts briefly; true zero-downtime requires the blue-green setup in DEPLOYMENT-GUIDE.md)
docker-compose up -d --no-deps backend
docker-compose up -d --no-deps frontend
# 6. Verify health
./scripts/health-check.sh
# 7. Run smoke tests
npm run test:smoke
```
Total time: 5-10 minutes
5. ENVIRONMENTS
| Ambiente | URL | Deploy Method | Database | Purpose |
|---|---|---|---|---|
| Development | http://localhost:3000 | Manual (local) | PostgreSQL local | Local development |
| CI/CD | - | Auto (GitHub Actions) | PostgreSQL (TestContainers) | Automated testing |
| QA | https://qa.erp-generic.local | Auto (push to develop) | PostgreSQL (anonymized prod) | Manual QA testing |
| Staging | https://staging.erp-generic.com | Manual (approval) | PostgreSQL (prod clone) | Pre-release validation |
| Production | https://erp-generic.com | Manual (approval) | PostgreSQL (prod) | Live system |
6. SLA AND TARGETS
6.1 Availability Targets
- Uptime: 99.9% (at most 8.76 hours of downtime per year)
- Planned Maintenance Window: Saturdays 2:00-4:00 AM (48 h advance notice)
- Unplanned Downtime: <30 min/month
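The downtime figure follows from (1 - availability) × period: 0.001 × 8,760 h = 8.76 h per year, or 43.2 minutes in a 30-day month. The hypothetical helper below does that arithmetic for any target and period.

One consequence is worth checking against section 8.1: a two-hour monthly maintenance window alone adds up to 24 hours per year. The 99.9% target therefore only holds if planned maintenance is excluded from the availability measurement.

```bash
#!/usr/bin/env bash
set -euo pipefail

# downtime_budget_minutes AVAILABILITY_PERCENT PERIOD_HOURS
# Prints the allowed downtime in minutes, rounded to one decimal.
downtime_budget_minutes() {
  LC_ALL=C awk -v slo="$1" -v hours="$2" \
    'BEGIN { printf "%.1f\n", (1 - slo / 100) * hours * 60 }'
}

# downtime_budget_minutes 99.9 8760   -> 525.6 (8.76 hours per year)
# downtime_budget_minutes 99.9 720    -> 43.2  (30-day month)
```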
6.2 Performance Targets
- API Response Time: p50 <100ms, p95 <300ms, p99 <500ms
- Page Load Time: p95 <2s (First Contentful Paint)
- Database Query Time: p95 <50ms
- Throughput: >1000 req/s @ peak load
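Nearest-rank percentiles are handy for spot checks of the latency targets outside Prometheus, for example on a list of response times extracted from access logs. The `percentile` function below is an illustrative sketch that reads one value per line on stdin. Prometheus's `histogram_quantile()` interpolates within histogram buckets, so its p95/p99 can differ slightly from an exact nearest-rank value.

```bash
#!/usr/bin/env bash
set -euo pipefail

# percentile P: read one numeric sample per line on stdin and print the
# nearest-rank P-th percentile (rank = ceil(P/100 * N)). Fails on empty input.
percentile() {
  LC_ALL=C sort -n | LC_ALL=C awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = p / 100 * NR
      rank = (r == int(r)) ? r : int(r) + 1   # ceiling
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# printf '%s\n' 120 80 95 300 110 | percentile 95   -> 300
```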
6.3 Recovery Targets
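- RTO (Recovery Time Objective): 4 hours
- RPO (Recovery Point Objective): 15 minutes
- Backup Frequency: Full daily (2:00 AM) + Incremental every 4 hours. Meeting the 15-minute RPO additionally requires continuous WAL archiving for PITR, since the scheduled backups alone allow up to 4 hours of data loss.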
- Backup Retention: 7 daily + 4 weekly + 12 monthly
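The 7 daily + 4 weekly + 12 monthly policy is a grandfather-father-son scheme. The sketch below decides whether a backup taken on a given date should be kept. It relies on GNU `date -d`. The source does not say which backups fill the weekly and monthly tiers, so the sketch assumes the following:
- The weekly copy is the Sunday backup.
- The monthly copy is the one from the 1st of the month.

```bash
#!/usr/bin/env bash
set -euo pipefail

# keep_backup BACKUP_DATE TODAY (YYYY-MM-DD): succeed if the backup must be kept.
# Assumed tiers: daily = last 7 days, weekly = Sundays within 28 days,
# monthly = 1st of the month within the last 12 calendar months.
keep_backup() {
  local d="$1" today="$2" age_days dow dom y1 m1 y2 m2 age_months
  age_days=$(( ( $(date -u -d "$today" +%s) - $(date -u -d "$d" +%s) ) / 86400 ))
  dow=$(date -u -d "$d" +%u)   # ISO weekday, 7 = Sunday
  dom=$(date -u -d "$d" +%d)
  y1=$(date -u -d "$today" +%Y); m1=$(date -u -d "$today" +%m)
  y2=$(date -u -d "$d" +%Y);     m2=$(date -u -d "$d" +%m)
  age_months=$(( (y1 - y2) * 12 + 10#$m1 - 10#$m2 ))   # 10# avoids octal "08"

  if (( age_days < 7 )); then return 0; fi                             # daily tier
  if (( dow == 7 && age_days < 28 )); then return 0; fi                # weekly tier
  if [[ "$dom" == "01" ]] && (( age_months < 12 )); then return 0; fi  # monthly tier
  return 1
}

# Example: a pruning job could delete every dated backup for which
#   keep_backup "$backup_date" "$(date -u +%F)"
# fails.
```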
6.4 Security Targets
- Critical Vulnerabilities: Fix within 24 hours
- High Vulnerabilities: Fix within 7 days
- Security Scans: Daily (automated in CI/CD)
- Penetration Testing: Quarterly (external vendor)
- Security Audits: Bi-annual (compliance)
7. INCIDENT RESPONSE
7.1 Severity Levels
| Severity | Description | Response Time | Resolution Time |
|---|---|---|---|
| P0 (Critical) | System down, data loss | 15 min | 4 hours |
| P1 (High) | Major feature broken | 1 hour | 24 hours |
| P2 (Medium) | Minor feature broken | 4 hours | 72 hours |
| P3 (Low) | Cosmetic issue | 24 hours | Next sprint |
7.2 On-Call Rotation
- Primary On-Call: DevOps Engineer (24/7)
- Secondary On-Call: Backend Tech Lead
- Escalation: CTO
7.3 Incident Procedure
- Detection: Alerts via Prometheus/Grafana → PagerDuty
- Acknowledge: On-call engineer acknowledges within 15 min
- Assess: Determine severity level
- Mitigate: Apply immediate fix or rollback
- Communicate: Update status page + notify stakeholders
- Resolve: Permanent fix deployed
- Post-Mortem: Document lessons learned (within 48 h)
8. MAINTENANCE WINDOWS
8.1 Regular Maintenance
- Frequency: Monthly (first Saturday of the month)
- Time: 2:00-4:00 AM (server timezone)
- Notice: 48 hours in advance via email + in-app banner
Typical activities:
- Database maintenance (VACUUM, ANALYZE, REINDEX)
- SSL certificate renewal
- OS security patches
- PostgreSQL minor version updates
- Log rotation and cleanup
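Standard cron cannot express "first Saturday of the month" directly. When both the day-of-month and day-of-week fields are restricted, cron runs the job if either one matches, so `0 2 1-7 * 6` would also fire every Saturday. A common workaround is to schedule the job on days 1-7 and let a guard decide.

`is_first_saturday` below is a hypothetical helper that assumes GNU `date`. The `maintenance.sh` name in the crontab comment is a placeholder.

```bash
#!/usr/bin/env bash
set -euo pipefail

# is_first_saturday [DATE]: succeed only if DATE (default: today) is the
# first Saturday of its month. Uses GNU date's -d and %-d.
is_first_saturday() {
  local d="${1:-today}" dom dow
  dom=$(date -d "$d" +%-d)   # day of month without leading zero
  dow=$(date -d "$d" +%u)    # ISO weekday: 6 = Saturday
  (( dom <= 7 && dow == 6 ))
}

# Crontab equivalent (% must be escaped in crontab; script name is a placeholder):
#   0 2 1-7 * * [ "$(date +\%u)" = 6 ] && /opt/erp-generic/scripts/maintenance.sh
```

Cron evaluates the schedule in the server's timezone, which matches the "server timezone" convention above.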
8.2 Emergency Maintenance
- Criterion: Critical security vulnerability (P0)
- Notice: at least 2 hours in advance
- Approval: CTO + Product Owner
9. CONTACT INFORMATION
9.1 Teams
DevOps Team:
- Email: devops@erp-generic.com
- Slack: #devops-team
- On-Call: +1-XXX-XXX-XXXX (PagerDuty)
Security Team:
- Email: security@erp-generic.com
- Slack: #security-alerts
- Incident: security-incident@erp-generic.com
Database Team:
- Email: dba@erp-generic.com
- Slack: #database-team
Development Team:
- Email: dev@erp-generic.com
- Slack: #development
9.2 Escalation Path
- L1: On-Call DevOps Engineer
- L2: Backend Tech Lead + DBA
- L3: CTO + Infrastructure Manager
- L4: CEO (only for business-critical incidents)
10. TOOLS AND ACCESS
10.1 Infrastructure
- Cloud Provider: AWS / Azure / GCP (TBD)
- Container Registry: Docker Hub / AWS ECR / GitHub Container Registry
- CI/CD: GitHub Actions
- Secrets Management: HashiCorp Vault / AWS Secrets Manager
10.2 Monitoring & Observability
- APM: Prometheus + Grafana
- Logging: Winston + ELK Stack (Elasticsearch + Logstash + Kibana) / Grafana Loki
- Alerting: Prometheus Alertmanager → PagerDuty
- Uptime Monitoring: UptimeRobot / Pingdom
- Error Tracking: Sentry
10.3 Security
- SAST: Snyk, SonarQube
- DAST: OWASP ZAP
- Dependency Scanning: Snyk, npm audit
- Secret Scanning: GitGuardian, TruffleHog
- Penetration Testing: External vendor (quarterly)
10.4 Collaboration
- Project Management: Jira
- Documentation: Confluence
- Chat: Slack / Microsoft Teams
- Video: Zoom / Google Meet
- On-Call: PagerDuty
11. COMPLIANCE & AUDITING
11.1 Standards
- GDPR: Data protection and privacy (EU)
- CCPA: California Consumer Privacy Act
- SOC 2 Type II: Security, availability, processing integrity (target)
- ISO 27001: Information security management (target)
11.2 Audit Logs
- Database Audit: pgaudit extension enabled
- Application Audit: Winston structured logging + ELK
- Infrastructure Audit: AWS CloudTrail / Azure Activity Log
- Retention: 1 year (compliance requirement)
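Enabling pgaudit takes two steps:
1. List the library in `shared_preload_libraries`, which requires a server restart.
2. Create the extension in the database; `pgaudit.log` then selects the statement classes to audit.

The hypothetical `pgaudit_sql` helper below prints the SQL for the second step. Its default classes (`write, ddl, role`) are an assumption, not the project's configured policy.

```bash
#!/usr/bin/env bash
set -euo pipefail

# pgaudit_sql [CLASSES]: print SQL that enables pgaudit once the library is
# preloaded via shared_preload_libraries. CLASSES defaults to an assumed policy.
pgaudit_sql() {
  local classes="${1:-write, ddl, role}"
  cat <<SQL
CREATE EXTENSION IF NOT EXISTS pgaudit;
ALTER SYSTEM SET pgaudit.log = '${classes}';
-- Skip session logging for statements that only touch pg_catalog (less noise).
ALTER SYSTEM SET pgaudit.log_catalog = off;
SELECT pg_reload_conf();
SQL
}

# Usage as a superuser:
#   pgaudit_sql | psql -v ON_ERROR_STOP=1
```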
11.3 Data Residency
- Primary Region: us-east-1 (Virginia) / eu-west-1 (Ireland) - TBD
- Backup Region: us-west-2 (Oregon) / eu-central-1 (Frankfurt) - TBD
- Data Sovereignty: EU data stays in EU (GDPR compliance)
12. CHANGELOG
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-11-24 | DevOps Architect | Complete initial documentation |
13. REFERENCES
Related Documentation:
External References:
- Docker Documentation
- PostgreSQL 16 Documentation
- Prometheus Documentation
- Grafana Documentation
- OWASP Top 10
- 12-Factor App Methodology
14. LICENSE AND COPYRIGHT
Copyright © 2025 ERP Generic Team. All rights reserved.
This documentation is confidential and intended solely for internal use by the ERP Generic development and operations teams.
- Classification: Internal Use Only
- Retention: Permanent (update with every release)
Document: README.md
Location: /projects/erp-generic/docs/05-devops/
Next review: 2025-12-24 (monthly)