
DevOps Documentation - ERP Generic

Last updated: 2025-11-24
Owner: DevOps Team
Status: Production-Ready


1. OVERVIEW

This folder contains the DevOps documentation needed to deploy, monitor, maintain, and secure ERP Generic in production environments.

ERP Generic is a modular system with:

  • 14 modules (MGN-001 to MGN-014)
  • Stack: NestJS 10 + Prisma 5 + PostgreSQL 16 + Redis 7 + React 18 + TypeScript 5
  • Multi-tenancy: Schema-level isolation + Row-Level Security (RLS)
  • 9 PostgreSQL schemas: auth, core, financial, inventory, purchase, sales, analytics, projects, system
  • Architecture: Microservices-ready, Cloud-native, Container-based
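
As a rough illustration of the schema layout listed above, the sketch below prints one idempotent CREATE SCHEMA statement per schema. The variable `ERP_SCHEMAS` and the function `emit_schema_sql` are hypothetical names, not taken from the repository; the actual initialization procedure is the one described in DEPLOYMENT-GUIDE.md.

```shell
# Hypothetical helper, not part of the repository.
# The nine schema names come from the list above.
ERP_SCHEMAS="auth core financial inventory purchase sales analytics projects system"

# Print one idempotent CREATE SCHEMA statement per schema.
emit_schema_sql() {
  for schema in $ERP_SCHEMAS; do
    printf 'CREATE SCHEMA IF NOT EXISTS %s;\n' "$schema"
  done
}
```

The output can be piped into psql, for example `emit_schema_sql | psql "$DATABASE_URL"`, where `DATABASE_URL` is an assumed connection string.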

2. MAIN DOCUMENTS

2.1 DEPLOYMENT-GUIDE.md

Purpose: Complete deployment guide for all environments.

Contents:

  • Docker setup completo (Dockerfile + docker-compose.yml)
  • PostgreSQL 16 initialization (9 schemas)
  • Redis configuration
  • Environment variables management
  • Multi-environment deployment strategy (Dev, QA, Staging, Production)
  • Zero-downtime deployment (Blue-green)
  • Rollback procedures

Audience: DevOps Engineers, SREs, Infrastructure Team

Implementation time: 4-6 hours the first time, 15-30 min for subsequent deployments


2.2 MONITORING-OBSERVABILITY.md

Purpose: Complete monitoring and observability strategy.

Contents:

  • Prometheus setup (metrics collection)
  • Grafana dashboards (Application, Database, Business)
  • Alert rules (CPU, memory, DB connections, error rate)
  • Logging strategy (Winston + ELK/Loki)
  • Application Performance Monitoring (APM)
  • Health checks endpoints
  • Distributed tracing (OpenTelemetry)

Audience: DevOps Engineers, SREs, On-call Engineers

Implementation time: 6-8 hours for initial setup


2.3 BACKUP-RECOVERY.md

Purpose: Backup and disaster recovery procedures.

Contents:

  • Backup strategy (Full + incremental)
  • Automated backup scripts (PostgreSQL multi-schema)
  • Multi-tenant backup isolation
  • Retention policies (7 daily + 4 weekly + 12 monthly)
  • Point-in-Time Recovery (PITR)
  • Disaster recovery playbook (RTO 4h, RPO 15min)
  • Backup testing procedures

Audience: DevOps Engineers, DBAs, Security Team

Implementation time: 4-5 hours for initial setup, plus monthly testing


2.4 SECURITY-HARDENING.md

Purpose: Complete security hardening of the system.

Contents:

  • OWASP Top 10 mitigations
  • Rate limiting configuration
  • JWT security (rotation, expiration, refresh tokens)
  • SQL injection prevention
  • XSS/CSRF protection
  • CORS configuration
  • Security headers (Helmet.js)
  • Secrets management (Vault/AWS Secrets Manager)
  • SSL/TLS certificate management

Audience: Security Team, DevOps Engineers, Backend Developers

Implementation time: 8-10 hours for the complete implementation


2.5 CI-CD-PIPELINE.md

Purpose: Complete continuous integration and deployment pipeline.

Contents:

  • GitHub Actions workflows (CI, CD-QA, CD-Production)
  • Automated testing integration (Jest + Vitest + Playwright)
  • Code quality gates (SonarQube)
  • Security scanning (Snyk + OWASP Dependency Check)
  • Docker build & push to registry
  • Automated deployment (QA auto, Production manual approval)
  • Rollback automation
  • Notifications (Slack/Discord)

Audience: DevOps Engineers, Tech Lead, Development Team

Implementation time: 10-12 hours for initial setup


3. SCRIPTS

3.1 scripts/backup-postgres.sh

Automated PostgreSQL backup script with multi-tenant support.

Features:

  • Full backup + per-schema backups
  • Automatic compression
  • Retention policy (7 days)
  • Optional upload to S3/Cloud Storage
  • Logging and notifications

Execution: Daily cron job at 2:00 AM

0 2 * * * /opt/erp-generic/scripts/backup-postgres.sh
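
The core of such a script might look like the sketch below, which covers the full dump, the per-schema dump, and the 7-day pruning. It is not the actual backup-postgres.sh: `BACKUP_DIR`, `DATABASE_URL`, and the function names are assumptions made for illustration.

```shell
# Hypothetical sketch; the real backup-postgres.sh may differ.
# BACKUP_DIR and DATABASE_URL are assumed names, not taken from the repository.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/erp-generic}"
RETENTION_DAYS=7

# Build a file name such as full_20251124_020000.dump from a label and a timestamp.
backup_name() {
  printf '%s_%s.dump\n' "$1" "$2"
}

# Dump the whole database, or a single schema when a second argument is given.
# pg_dump's custom format (-Fc) is compressed by default.
run_backup() {
  stamp=$1
  schema=${2:-}
  if [ -n "$schema" ]; then
    pg_dump -Fc -n "$schema" -f "$BACKUP_DIR/$(backup_name "$schema" "$stamp")" "$DATABASE_URL"
  else
    pg_dump -Fc -f "$BACKUP_DIR/$(backup_name full "$stamp")" "$DATABASE_URL"
  fi
}

# Delete dump files whose modification time is past the retention window.
prune_backups() {
  find "$BACKUP_DIR" -name '*.dump' -type f -mtime +"$RETENTION_DAYS" -delete
}
```

A nightly run could then call `run_backup "$(date +%Y%m%d_%H%M%S)"`, repeat the call once per schema, and finish with `prune_backups`.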

3.2 scripts/restore-postgres.sh

Backup restoration script with validation and verification.

Features:

  • Full or per-schema restore
  • Integrity validation before restoring
  • Backup safety (creates a snapshot before restoring)
  • Dry-run mode for testing
  • Detailed logging

Execution: Manual (disaster recovery)

./restore-postgres.sh --backup=full_20251124_020000.dump --target=staging
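
A minimal sketch of how the options in the command above could be parsed and how a dump could be validated before restoring. The `--dry-run` flag name and both function names are assumptions; the real restore-postgres.sh may use different options.

```shell
# Hypothetical sketch of the option parsing; the real restore-postgres.sh may differ.
# --dry-run is an assumed flag name for the dry-run mode listed above.
parse_restore_args() {
  BACKUP_FILE=""
  TARGET_ENV=""
  DRY_RUN=0
  for arg in "$@"; do
    case "$arg" in
      --backup=*) BACKUP_FILE=${arg#--backup=} ;;
      --target=*) TARGET_ENV=${arg#--target=} ;;
      --dry-run)  DRY_RUN=1 ;;
      *) echo "unknown option: $arg" >&2; return 2 ;;
    esac
  done
  if [ -z "$BACKUP_FILE" ] || [ -z "$TARGET_ENV" ]; then
    echo "usage: restore-postgres.sh --backup=FILE --target=ENV [--dry-run]" >&2
    return 2
  fi
}

# Check that the archive is readable before touching the database.
# pg_restore --list only reads the archive's table of contents.
validate_backup() {
  pg_restore --list "$1" >/dev/null
}
```

Parsing into plain variables keeps the rest of the script simple: the restore step can check `DRY_RUN` and print the commands it would run instead of executing them.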

3.3 scripts/health-check.sh

Comprehensive system health check script.

Features:

  • Checks the backend API (/health)
  • Checks PostgreSQL (connection + queries)
  • Checks Redis (connection + ping)
  • Checks the frontend (HTTP 200)
  • Exit codes for monitoring
  • Structured logging

Execution: Cron every 5 minutes; also used by Kubernetes liveness/readiness probes

*/5 * * * * /opt/erp-generic/scripts/health-check.sh
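
The checks listed above could be combined roughly as follows. This is a sketch, not the actual health-check.sh: the URLs, the `PGHOST` and `REDIS_HOST` defaults, and all function names are assumptions.

```shell
# Hypothetical sketch; the real health-check.sh may differ.
# API_URL, FRONTEND_URL, PGHOST and REDIS_HOST are assumed names and defaults.
API_URL="${API_URL:-http://localhost:3000/health}"
FRONTEND_URL="${FRONTEND_URL:-http://localhost:8080}"

# Backend API: curl -f fails on HTTP error responses.
check_api() { curl -fsS --max-time 5 "$API_URL" >/dev/null; }

# PostgreSQL: pg_isready reports whether the server accepts connections.
check_postgres() { pg_isready -q -h "${PGHOST:-localhost}"; }

# Redis: a healthy server answers PING with PONG.
check_redis() { [ "$(redis-cli -h "${REDIS_HOST:-localhost}" ping)" = "PONG" ]; }

# Frontend: require exactly HTTP 200.
check_frontend() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$FRONTEND_URL")" = "200" ]
}

# Run every named check, log one line per result, and return the number of
# failures so that cron, monitoring, or a probe can act on a non-zero exit code.
run_checks() {
  failures=0
  for check in "$@"; do
    if "$check" >/dev/null 2>&1; then
      echo "status=ok check=$check"
    else
      echo "status=fail check=$check"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

A script built on this would end with `run_checks check_api check_postgres check_redis check_frontend`, so its exit status is zero only when every component is healthy.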

4. QUICK START

First Deployment (Fresh Install)

# 1. Clone repository
git clone https://github.com/company/erp-generic.git
cd erp-generic

# 2. Configure environment variables
cp .env.example .env
# Edit .env with real values

# 3. Start services with Docker Compose
docker-compose up -d

# 4. Run database migrations
docker-compose exec backend npm run prisma:migrate:deploy

# 5. Seed initial data
docker-compose exec backend npm run seed:initial

# 6. Verify health
./scripts/health-check.sh

Total time: 15-20 minutes
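
Step 2 is easy to get wrong when .env.example gains new variables. The sketch below lists every variable that is defined in the example file but missing from the local .env. The function name `missing_env_keys` is hypothetical and not part of the repository.

```shell
# missing_env_keys EXAMPLE_FILE ENV_FILE
# Prints every variable name defined in EXAMPLE_FILE but absent from ENV_FILE.
# Hypothetical helper, not part of the repository.
missing_env_keys() {
  # Extract NAME from lines of the form NAME=value; comments and blanks are skipped.
  for key in $(sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1"); do
    grep -q "^${key}=" "$2" || echo "$key"
  done
}
```

Running `missing_env_keys .env.example .env` before `docker-compose up -d` prints nothing when the local file is complete.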


Update Deployment (Existing System)

# 1. Pull latest changes
git pull origin main

# 2. Backup database (safety)
./scripts/backup-postgres.sh

# 3. Build new images
docker-compose build

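# 4. Run migrations from the newly built image (exec would run them in the old container)
docker-compose run --rm backend npm run prisma:migrate:deploy

# 5. Update services one at a time (each container restarts briefly; see DEPLOYMENT-GUIDE.md for blue-green)
docker-compose up -d --no-deps --build backend
docker-compose up -d --no-deps --build frontend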

# 6. Verify health
./scripts/health-check.sh

# 7. Run smoke tests
npm run test:smoke

Total time: 5-10 minutes
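
Services usually need a few seconds after a restart before the health check passes, so step 6 is more reliable with retries. A minimal sketch, assuming a hypothetical helper named `wait_healthy`:

```shell
# wait_healthy ATTEMPTS DELAY_SECONDS COMMAND...
# Re-run COMMAND until it succeeds or ATTEMPTS runs have failed.
# Hypothetical helper, not part of the repository.
wait_healthy() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_healthy 10 3 ./scripts/health-check.sh || echo "health check failed, consider a rollback"` retries for roughly 30 seconds before giving up.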


5. ENVIRONMENTS

| Environment | URL | Deploy Method | Database | Purpose |
|---|---|---|---|---|
| Development | http://localhost:3000 | Manual (local) | PostgreSQL local | Local development |
| CI/CD | - | Auto (GitHub Actions) | PostgreSQL (TestContainers) | Automated testing |
| QA | https://qa.erp-generic.local | Auto (push to develop) | PostgreSQL (anonymized prod) | Manual QA testing |
| Staging | https://staging.erp-generic.com | Manual (approval) | PostgreSQL (prod clone) | Pre-release validation |
| Production | https://erp-generic.com | Manual (approval) | PostgreSQL (prod) | Live system |

6. SLAS AND TARGETS

6.1 Availability Targets

  • Uptime: 99.9% (at most 8.76 hours of downtime per year)
  • Planned Maintenance Window: Saturdays 2:00-4:00 AM (48h advance notice)
  • Unplanned Downtime: <30 min/month
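
The 8.76-hour figure follows directly from the uptime percentage: 0.1% of the 8760 hours in a 365-day year. The sketch below reproduces that arithmetic; the function name is hypothetical.

```shell
# downtime_budget_hours UPTIME_PERCENT
# Prints the yearly downtime allowance in hours, assuming a 365-day year (8760 h).
downtime_budget_hours() {
  awk -v pct="$1" 'BEGIN { printf "%.2f\n", (100 - pct) / 100 * 8760 }'
}
```

`downtime_budget_hours 99.9` prints 8.76, matching the target above.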

6.2 Performance Targets

  • API Response Time: p50 <100ms, p95 <300ms, p99 <500ms
  • Page Load Time: p95 <2s (First Contentful Paint)
  • Database Query Time: p95 <50ms
  • Throughput: >1000 req/s @ peak load
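
To check a set of measurements against the percentile targets above, a nearest-rank percentile can be computed with standard tools. This is a sketch under assumptions: the function name is hypothetical, and the input is taken to be one latency value in milliseconds per line.

```shell
# percentile P
# Reads one numeric sample per line on stdin and prints the P-th percentile
# using the nearest-rank method: the value at position ceil(P * N / 100)
# of the sorted samples.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = p * NR / 100
      idx = int(rank)
      if (idx < rank) idx++
      if (idx < 1) idx = 1
      print v[idx]
    }'
}
```

For example, `percentile 95 < latencies.txt` prints the p95 value, which can then be compared against the 300 ms target; `latencies.txt` is an assumed file name.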

6.3 Recovery Targets

  • RTO (Recovery Time Objective): 4 hours
  • RPO (Recovery Point Objective): 15 minutes
  • Backup Frequency: Full daily (2:00 AM) + Incremental every 4 hours (the 15-minute RPO additionally depends on the PITR setup covered in BACKUP-RECOVERY.md)
  • Backup Retention: 7 daily + 4 weekly + 12 monthly
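
One way to read the 7 daily + 4 weekly + 12 monthly policy is sketched below. The tier rules are assumptions, not taken from BACKUP-RECOVERY.md: the weekly copy is the Sunday backup, the monthly copy is the first-of-month backup, 4 weeks are counted as 28 days, and 12 months as 365 days.

```shell
# retention_tier AGE_DAYS DAY_OF_WEEK DAY_OF_MONTH
#   AGE_DAYS      age of the backup in whole days
#   DAY_OF_WEEK   1..7 for the backup date, Monday = 1 (as printed by `date +%u`)
#   DAY_OF_MONTH  1..31 for the backup date
# Prints daily, weekly, monthly, or expired.
# Assumed rules: weekly = Sunday backup, monthly = first-of-month backup.
retention_tier() {
  age=$1
  dow=$2
  dom=$3
  if [ "$age" -lt 7 ]; then
    echo daily
  elif [ "$dow" -eq 7 ] && [ "$age" -lt 28 ]; then
    echo weekly
  elif [ "$dom" -eq 1 ] && [ "$age" -lt 365 ]; then
    echo monthly
  else
    echo expired
  fi
}
```

A pruning job would delete only the backups for which this prints `expired`.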

6.4 Security Targets

  • Critical Vulnerabilities: Fix within 24 hours
  • High Vulnerabilities: Fix within 7 days
  • Security Scans: Daily (automated in CI/CD)
  • Penetration Testing: Quarterly (external vendor)
  • Security Audits: Bi-annual (compliance)

7. INCIDENT RESPONSE

7.1 Severity Levels

| Severity | Description | Response Time | Resolution Time |
|---|---|---|---|
| P0 (Critical) | System down, data loss | 15 min | 4 hours |
| P1 (High) | Major feature broken | 1 hour | 24 hours |
| P2 (Medium) | Minor feature broken | 4 hours | 72 hours |
| P3 (Low) | Cosmetic issue | 24 hours | Next sprint |

7.2 On-Call Rotation

  • Primary On-Call: DevOps Engineer (24/7)
  • Secondary On-Call: Backend Tech Lead
  • Escalation: CTO

7.3 Incident Procedure

  1. Detection: Alerts via Prometheus/Grafana → PagerDuty
  2. Acknowledge: On-call engineer acknowledges within 15 min
  3. Assess: Determine severity level
  4. Mitigate: Apply immediate fix or rollback
  5. Communicate: Update status page + notify stakeholders
  6. Resolve: Permanent fix deployed
  7. Post-Mortem: Document lessons learned (within 48 hours)

8. MAINTENANCE WINDOWS

8.1 Regular Maintenance

Frequency: Monthly (first Saturday of the month)
Schedule: 2:00-4:00 AM (server timezone)
Notification: 48 hours in advance via email + in-app banner

Typical activities:

  • Database maintenance (VACUUM, ANALYZE, REINDEX)
  • SSL certificate renewal
  • OS security patches
  • PostgreSQL minor version updates
  • Log rotation and cleanup

8.2 Emergency Maintenance

Criteria: Critical security vulnerability (P0)
Notification: At least 2 hours in advance
Approval: CTO + Product Owner


9. CONTACT INFORMATION

9.1 Teams

DevOps Team:

Security Team:

Database Team:

Development Team:

9.2 Escalation Path

  1. L1: On-Call DevOps Engineer
  2. L2: Backend Tech Lead + DBA
  3. L3: CTO + Infrastructure Manager
  4. L4: CEO (only for business-critical incidents)

10. TOOLS AND ACCESS

10.1 Infrastructure

  • Cloud Provider: AWS / Azure / GCP (TBD)
  • Container Registry: Docker Hub / AWS ECR / GitHub Container Registry
  • CI/CD: GitHub Actions
  • Secrets Management: HashiCorp Vault / AWS Secrets Manager

10.2 Monitoring & Observability

  • APM: Prometheus + Grafana
  • Logging: Winston + ELK Stack (Elasticsearch + Logstash + Kibana) / Grafana Loki
  • Alerting: Prometheus Alertmanager → PagerDuty
  • Uptime Monitoring: UptimeRobot / Pingdom
  • Error Tracking: Sentry

10.3 Security

  • SAST: Snyk, SonarQube
  • DAST: OWASP ZAP
  • Dependency Scanning: Snyk, npm audit
  • Secret Scanning: GitGuardian, TruffleHog
  • Penetration Testing: External vendor (quarterly)

10.4 Collaboration

  • Project Management: Jira
  • Documentation: Confluence
  • Chat: Slack / Microsoft Teams
  • Video: Zoom / Google Meet
  • On-Call: PagerDuty

11. COMPLIANCE & AUDITING

11.1 Standards

  • GDPR: Data protection and privacy (EU)
  • CCPA: California Consumer Privacy Act
  • SOC 2 Type II: Security, availability, processing integrity (target)
  • ISO 27001: Information security management (target)

11.2 Audit Logs

  • Database Audit: pgaudit extension enabled
  • Application Audit: Winston structured logging + ELK
  • Infrastructure Audit: AWS CloudTrail / Azure Activity Log
  • Retention: 1 year (compliance requirement)

11.3 Data Residency

  • Primary Region: us-east-1 (Virginia) / eu-west-1 (Ireland) - TBD
  • Backup Region: us-west-2 (Oregon) / eu-central-1 (Frankfurt) - TBD
  • Data Sovereignty: EU data stays in EU (GDPR compliance)

12. CHANGELOG

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-11-24 | DevOps Architect | Complete initial documentation |

13. REFERENCES

Related Documentation:

External References:


Copyright © 2025 ERP Generic Team. All rights reserved.

This documentation is confidential and intended solely for internal use by the ERP Generic development and operations teams.

Classification: Internal Use Only
Retention: Permanent (update with every release)


Document: README.md
Location: /projects/erp-generic/docs/05-devops/
Next Review: 2025-12-24 (monthly)