# BACKUP & RECOVERY - ERP Generic

**Last updated:** 2025-11-24
**Owner:** DevOps Team / DBA Team
**Status:** ✅ Production-Ready

---

## TABLE OF CONTENTS

1. [Overview](#1-overview)
2. [Backup Strategy](#2-backup-strategy)
3. [Backup Scripts](#3-backup-scripts)
4. [Multi-Tenant Backup Isolation](#4-multi-tenant-backup-isolation)
5. [Retention Policy](#5-retention-policy)
6. [Recovery Procedures](#6-recovery-procedures)
7. [Point-in-Time Recovery (PITR)](#7-point-in-time-recovery-pitr)
8. [Disaster Recovery Playbook](#8-disaster-recovery-playbook)
9. [Backup Testing](#9-backup-testing)
10. [References](#10-references)

---

## 1. OVERVIEW

### 1.1 Backup Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                   PostgreSQL 16 Database                    │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐        │
│  │   auth   │ │   core   │ │ financial│ │ inventory│        │
│  │  schema  │ │  schema  │ │  schema  │ │  schema  │        │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐        │
│  │ purchase │ │  sales   │ │ analytics│ │ projects │        │
│  │  schema  │ │  schema  │ │  schema  │ │  schema  │        │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘        │
│  ┌──────────┐                                               │
│  │  system  │                                               │
│  │  schema  │                                               │
│  └──────────┘                                               │
└──────┬──────────────────────────────────────────────────────┘
       │
       ↓ (Automated backup every 4 hours)
┌─────────────────────────────────────────────────────────────┐
│               Local Backup Storage (/backups)               │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐         │
│  │  Full Backup │ │ Incremental  │ │  Per-Schema  │         │
│  │   (Daily)    │ │  (4 hours)   │ │   Backups    │         │
│  └──────────────┘ └──────────────┘ └──────────────┘         │
└──────┬──────────────────────────────────────────────────────┘
       │ (Sync every hour)
       ↓
┌─────────────────────────────────────────────────────────────┐
│            Cloud Storage (S3 / Azure Blob / GCS)            │
│  Versioning Enabled | Lifecycle Rules | Encryption          │
│  Retention: 7d + 4w + 12m                                   │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│               WAL (Write-Ahead Logs) Archive                │
│  Continuous archiving for Point-in-Time Recovery (PITR)     │
│  Retention: 7 days                                          │
└─────────────────────────────────────────────────────────────┘
```

### 1.2 Backup Objectives

**RTO (Recovery Time Objective):** 4 hours
- Maximum time to restore service after a failure

**RPO (Recovery Point Objective):** 15 minutes
- Maximum acceptable data loss (achieved via WAL archiving)

**Data Durability:** 99.999999999% (11 nines)
- Multi-region cloud storage replication

**Backup Types:**

1. **Full Backup:** Complete database snapshot (daily at 2:00 AM)
2. **Incremental Backup:** Changes since the last full backup (every 4 hours)
3. **WAL Archive:** Continuous archiving for PITR (segment switch forced every 60 seconds)
4. **Per-Schema Backup:** Individual schema backups for multi-tenant isolation (daily)

---

## 2. BACKUP STRATEGY

### 2.1 Backup Schedule

| Backup Type | Frequency | Retention | Size (Est.) | Duration |
|-------------|-----------|-----------|-------------|----------|
| **Full Backup** | Daily (2:00 AM) | 7 days local + 12 months cloud | 50-100 GB | 30-45 min |
| **Incremental Backup** | Every 4 hours | 7 days local | 5-10 GB | 5-10 min |
| **WAL Archive** | Continuous (every 16 MB) | 7 days | 100-200 GB/week | Real-time |
| **Per-Schema Backup** | Daily (3:00 AM) | 7 days local + 4 weeks cloud | 5-15 GB each | 5-10 min each |
| **Config Backup** | On change + daily | 30 days | <100 MB | <1 min |

### 2.2 Backup Storage Locations

**Primary Storage (Local):**
- **Path:** `/backups/postgres/`
- **Filesystem:** XFS (optimized for large files)
- **Capacity:** 500 GB minimum
- **RAID:** RAID 10 (performance + redundancy)

**Secondary Storage (Cloud):**
- **Provider:** AWS S3 / Azure Blob Storage / Google Cloud Storage
- **Bucket:** `erp-generic-backups-prod`
- **Region:** Multi-region replication (e.g., us-east-1 + us-west-2)
- **Encryption:** AES-256 at rest
- **Versioning:** Enabled (30 versions max)
- **Lifecycle Rules:**
  - Move to Glacier after 30 days
  - Delete after 1 year (except annual backups)

**Tertiary Storage (Offsite):**
- **Type:** Tape backup / cold storage
- **Frequency:** Monthly
- **Retention:** 7 years (compliance requirement)

### 2.3 Backup Validation

**Automated Validation (Daily):**

```bash
# 1. Verify backup file integrity
md5sum /backups/postgres/full_20251124_020000.dump > /backups/postgres/full_20251124_020000.dump.md5

# 2. Test restore to the staging environment (weekly)
pg_restore --dbname=erp_generic_staging --clean --if-exists /backups/postgres/full_20251124_020000.dump

# 3. Run smoke tests on the restored database
psql -d erp_generic_staging -c "SELECT COUNT(*) FROM auth.users;"
psql -d erp_generic_staging -c "SELECT COUNT(*) FROM core.partners;"

# 4. Compare row counts with production
# (pg_stat_user_tables exposes the table name as "relname", not "tablename")
diff <(psql -d erp_generic -tAc "SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname") \
     <(psql -d erp_generic_staging -tAc "SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname")
```

**Manual Validation (Monthly):**
- Full disaster recovery drill
- Restore to an isolated environment
- Verify business-critical data
- Test application functionality
- Document findings in a post-mortem

---

## 3. BACKUP SCRIPTS

### 3.1 Full Backup Script

**File:** `scripts/backup-postgres.sh`

```bash
#!/bin/bash
# =====================================================
# ERP GENERIC - PostgreSQL Full Backup Script
# Performs a full database backup with multi-schema support
# =====================================================

set -euo pipefail

# Configuration
BACKUP_DIR="/backups/postgres"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
RETENTION_DAYS=7
DB_HOST="${POSTGRES_HOST:-postgres}"
DB_PORT="${POSTGRES_PORT:-5432}"
DB_NAME="${POSTGRES_DB:-erp_generic}"
DB_USER="${POSTGRES_USER:-erp_user}"
PGPASSWORD="${POSTGRES_PASSWORD}"
export PGPASSWORD

# Logging
LOG_FILE="/var/log/erp-generic/backup.log"
exec > >(tee -a "$LOG_FILE")
exec 2>&1

echo "===== PostgreSQL Backup Started at $(date) ====="

# Create the backup directory if it does not exist
mkdir -p "$BACKUP_DIR"

# 1. Full database backup
echo "1. Creating full database backup..."
FULL_BACKUP_FILE="${BACKUP_DIR}/full_${TIMESTAMP}.dump"

# Test the command directly: with `set -e`, a separate `$?` check is never reached
if pg_dump -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -Fc -v -d "$DB_NAME" -f "$FULL_BACKUP_FILE"; then
    echo "✓ Full backup created: $FULL_BACKUP_FILE"
    FILE_SIZE=$(du -h "$FULL_BACKUP_FILE" | cut -f1)
    echo "  Size: $FILE_SIZE"
else
    echo "✗ Full backup failed!"
    exit 1
fi

# 2. Generate MD5 checksum
echo "2. Generating checksum..."
md5sum "$FULL_BACKUP_FILE" > "${FULL_BACKUP_FILE}.md5"
echo "✓ Checksum saved: ${FULL_BACKUP_FILE}.md5"

# 3. Per-schema backups (multi-tenant isolation)
echo "3. Creating per-schema backups..."
SCHEMAS=("auth" "core" "financial" "inventory" "purchase" "sales" "analytics" "projects" "system")

for schema in "${SCHEMAS[@]}"; do
    SCHEMA_BACKUP_FILE="${BACKUP_DIR}/${schema}_${TIMESTAMP}.dump"
    echo "  Backing up schema: $schema"

    # A single failed schema should not abort the whole run
    if pg_dump -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -Fc -n "$schema" -d "$DB_NAME" -f "$SCHEMA_BACKUP_FILE"; then
        SCHEMA_SIZE=$(du -h "$SCHEMA_BACKUP_FILE" | cut -f1)
        echo "  ✓ $schema backup created ($SCHEMA_SIZE)"
    else
        echo "  ✗ $schema backup failed!"
    fi
done

# 4. Backup database roles and permissions
echo "4. Backing up database roles..."
ROLES_BACKUP_FILE="${BACKUP_DIR}/roles_${TIMESTAMP}.sql"
pg_dumpall -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" --roles-only -f "$ROLES_BACKUP_FILE"
echo "✓ Roles backup created: $ROLES_BACKUP_FILE"

# 5. Backup PostgreSQL configuration
echo "5. Backing up PostgreSQL configuration..."
CONFIG_BACKUP_DIR="${BACKUP_DIR}/config_${TIMESTAMP}"
mkdir -p "$CONFIG_BACKUP_DIR"

# Copy config files (if accessible)
if [ -f /etc/postgresql/16/main/postgresql.conf ]; then
    cp /etc/postgresql/16/main/postgresql.conf "$CONFIG_BACKUP_DIR/"
    cp /etc/postgresql/16/main/pg_hba.conf "$CONFIG_BACKUP_DIR/"
    echo "✓ Config files backed up"
fi

# 6. Record WAL archive status
echo "6. Recording WAL archive status..."
psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -tAc "SELECT * FROM pg_stat_archiver;" > "${BACKUP_DIR}/wal_status_${TIMESTAMP}.txt"

# 7. Upload to cloud storage (S3)
if [ -n "${AWS_S3_BUCKET:-}" ]; then
    echo "7. Uploading to S3..."
    aws s3 cp "$FULL_BACKUP_FILE" "s3://${AWS_S3_BUCKET}/postgres/${TIMESTAMP}/" --storage-class STANDARD_IA
    aws s3 cp "${FULL_BACKUP_FILE}.md5" "s3://${AWS_S3_BUCKET}/postgres/${TIMESTAMP}/"

    # Upload per-schema backups
    for schema in "${SCHEMAS[@]}"; do
        SCHEMA_BACKUP_FILE="${BACKUP_DIR}/${schema}_${TIMESTAMP}.dump"
        if [ -f "$SCHEMA_BACKUP_FILE" ]; then
            aws s3 cp "$SCHEMA_BACKUP_FILE" "s3://${AWS_S3_BUCKET}/postgres/${TIMESTAMP}/schemas/" --storage-class STANDARD_IA
        fi
    done
    echo "✓ Backup uploaded to S3"
fi

# 8. Clean up old local backups
echo "8. Cleaning up old backups (older than $RETENTION_DAYS days)..."
find "$BACKUP_DIR" -type f -name "*.dump" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -type f -name "*.sql" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -type f -name "*.md5" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -type d -name "config_*" -mtime +$RETENTION_DAYS -exec rm -rf {} + 2>/dev/null || true
echo "✓ Old backups cleaned up"

# 9. Verify backup integrity
echo "9. Verifying backup integrity..."
if md5sum -c "${FULL_BACKUP_FILE}.md5"; then
    echo "✓ Backup integrity verified"
else
    echo "✗ Backup integrity check failed!"
    exit 1
fi

# 10. Send notification
echo "10. Sending backup notification..."
BACKUP_SIZE=$(du -sh "$BACKUP_DIR" | cut -f1)

# Slack notification (optional)
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
    curl -X POST "$SLACK_WEBHOOK_URL" \
        -H 'Content-Type: application/json' \
        -d "{\"text\": \"✅ PostgreSQL backup completed successfully\n• Database: $DB_NAME\n• Size: $FILE_SIZE\n• Timestamp: $TIMESTAMP\n• Total backup dir size: $BACKUP_SIZE\"}"
fi

echo "===== PostgreSQL Backup Completed at $(date) ====="
echo "Total backup size: $BACKUP_SIZE"
echo "Backup location: $BACKUP_DIR"

exit 0
```

### 3.2 Incremental Backup Script

**File:** `scripts/backup-postgres-incremental.sh`

```bash
#!/bin/bash
# =====================================================
# ERP GENERIC - PostgreSQL "Incremental" Backup Script
# NOTE: this is a full logical dump taken on a 4-hour cadence, not a true
# incremental backup. For real incrementals, use WAL-based tooling
# (pg_basebackup + WAL archiving, or pgBackRest/Barman).
# =====================================================

set -euo pipefail

BACKUP_DIR="/backups/postgres/incremental"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
DB_HOST="${POSTGRES_HOST:-postgres}"
DB_NAME="${POSTGRES_DB:-erp_generic}"
DB_USER="${POSTGRES_USER:-erp_user}"
PGPASSWORD="${POSTGRES_PASSWORD}"
export PGPASSWORD

mkdir -p "$BACKUP_DIR"

echo "===== Incremental Backup Started at $(date) ====="

# Get the timestamp of the last full backup
LAST_FULL_BACKUP=$(ls -t /backups/postgres/full_*.dump | head -1 | grep -oP '\d{8}_\d{6}')
echo "Last full backup: $LAST_FULL_BACKUP"

# Simplified approach: dump everything again.
# In production, consider WAL-based incremental backups or pg_basebackup.
INCREMENTAL_FILE="${BACKUP_DIR}/incremental_${TIMESTAMP}.dump"
pg_dump -h "$DB_HOST" -U "$DB_USER" -Fc -d "$DB_NAME" -f "$INCREMENTAL_FILE"

echo "✓ Incremental backup created: $INCREMENTAL_FILE"
echo "===== Incremental Backup Completed at $(date) ====="
```

### 3.3 WAL Archiving Configuration

**File:** `postgresql.conf` (WAL archiving section)

```ini
# WAL settings for Point-in-Time Recovery
wal_level = replica
archive_mode = on
# Note: AWS_S3_BUCKET must be set in the postmaster's environment for the
# variable to expand, and a failing S3 upload here blocks WAL recycling;
# consider syncing to S3 asynchronously (e.g. from cron) instead.
archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f && aws s3 cp /backups/wal/%f s3://${AWS_S3_BUCKET}/wal/'
archive_timeout = 60s
max_wal_senders = 3
wal_keep_size = 1GB
```
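The 15-minute RPO from Section 1.2 only holds while archiving keeps up, so the archiver deserves monitoring. A minimal sketch, assuming the standard `pg_stat_archiver` view; the `wal_lag_ok` helper is hypothetical and not part of the shipped scripts:

```bash
#!/usr/bin/env bash
# Decide whether WAL archiving lag is within the RPO budget.
# Usage: wal_lag_ok <last_archived_epoch> <now_epoch> [budget_seconds]
wal_lag_ok() {
    local last="$1" now="$2" budget="${3:-900}"  # 900 s = 15-minute RPO
    local lag=$(( now - last ))
    if [ "$lag" -le "$budget" ]; then
        echo "OK: archiver lag ${lag}s"
    else
        echo "ALERT: archiver lag ${lag}s exceeds ${budget}s"
        return 1
    fi
}

# Example wiring (requires database access):
# last=$(psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -tAc \
#     "SELECT EXTRACT(EPOCH FROM last_archived_time)::bigint FROM pg_stat_archiver;")
# wal_lag_ok "$last" "$(date +%s)"
```

If this proves useful, it can run from the same cron file as the backup jobs and feed the existing Slack webhook.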
### 3.4 Cron Schedule

**File:** `/etc/cron.d/erp-backup`

```cron
# ERP Generic Backup Schedule

# Full backup daily at 2:00 AM
0 2 * * * root /opt/erp-generic/scripts/backup-postgres.sh >> /var/log/erp-generic/backup.log 2>&1

# Incremental backup every 4 hours
0 */4 * * * root /opt/erp-generic/scripts/backup-postgres-incremental.sh >> /var/log/erp-generic/backup.log 2>&1

# Verify backups daily at 4:00 AM
0 4 * * * root /opt/erp-generic/scripts/verify-backup.sh >> /var/log/erp-generic/backup.log 2>&1

# Clean up old WAL files daily at 5:00 AM
0 5 * * * root find /backups/wal -type f -mtime +7 -delete

# Weekly full disaster recovery test (Sundays at 3:00 AM)
0 3 * * 0 root /opt/erp-generic/scripts/test-restore.sh >> /var/log/erp-generic/backup-test.log 2>&1
```

---

## 4. MULTI-TENANT BACKUP ISOLATION

### 4.1 Per-Tenant Backup Strategy

**Why per-schema backups?**
- Restore an individual tenant without affecting others
- Compliance: GDPR right to erasure (delete tenant data)
- Tenant migration to a dedicated instance
- Faster restore times for single-tenant issues

**Backup Structure:**

```
/backups/postgres/
├── full_20251124_020000.dump       # All schemas
├── auth_20251124_020000.dump       # Auth schema only
├── core_20251124_020000.dump       # Core schema only
├── financial_20251124_020000.dump  # Financial schema only
├── inventory_20251124_020000.dump  # Inventory schema only
├── purchase_20251124_020000.dump   # Purchase schema only
├── sales_20251124_020000.dump      # Sales schema only
├── analytics_20251124_020000.dump  # Analytics schema only
├── projects_20251124_020000.dump   # Projects schema only
└── system_20251124_020000.dump     # System schema only
```

### 4.2 Restore Single Tenant

```bash
#!/bin/bash
# Restore a single tenant (schema isolation)

TENANT_ID="tenant-abc"
SCHEMA_NAME="financial"  # Example: restore the financial schema only
BACKUP_FILE="/backups/postgres/financial_20251124_020000.dump"

echo "Restoring schema $SCHEMA_NAME for tenant $TENANT_ID..."

# Option 1: Drop and restore the entire schema
psql -h postgres -U erp_user -d erp_generic -c "DROP SCHEMA IF EXISTS $SCHEMA_NAME CASCADE;"
pg_restore -h postgres -U erp_user -d erp_generic -n "$SCHEMA_NAME" --clean --if-exists "$BACKUP_FILE"

# Option 2: Restore into a scratch database, then copy back only the
# tenant's rows (pg_restore cannot rename a schema on the fly)
createdb -h postgres -U erp_user erp_generic_scratch
pg_restore -h postgres -U erp_user -d erp_generic_scratch -n "$SCHEMA_NAME" "$BACKUP_FILE"

# Copy tenant-specific data (table name "invoices" is illustrative)
psql -h postgres -U erp_user -d erp_generic_scratch -c \
  "\COPY (SELECT * FROM ${SCHEMA_NAME}.invoices WHERE tenant_id = '${TENANT_ID}') TO STDOUT" | \
psql -h postgres -U erp_user -d erp_generic -c \
  "\COPY ${SCHEMA_NAME}.invoices FROM STDIN"
```
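Before attempting a per-tenant restore, it is worth confirming that the per-schema dumps for the chosen timestamp actually exist on disk. A small sketch; `check_schema_backups` is a hypothetical helper built around the naming convention shown above:

```bash
#!/usr/bin/env bash
# Verify that every expected per-schema dump exists for a given timestamp.
# Usage: check_schema_backups <backup_dir> <timestamp>
# Prints any missing dumps and returns non-zero if one is absent.
check_schema_backups() {
    local dir="$1" ts="$2" missing=0
    local schemas=(auth core financial inventory purchase sales analytics projects system)
    for schema in "${schemas[@]}"; do
        if [ ! -f "${dir}/${schema}_${ts}.dump" ]; then
            echo "MISSING: ${schema}_${ts}.dump"
            missing=1
        fi
    done
    return $missing
}
```

For example, `check_schema_backups /backups/postgres 20251124_020000` before kicking off the restore turns a silent "file not found" mid-restore into an explicit pre-flight failure.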
---

## 6. RECOVERY PROCEDURES

### 6.1 Full Database Restore

**File:** `scripts/restore-postgres.sh`

```bash
#!/bin/bash
# =====================================================
# ERP GENERIC - PostgreSQL Restore Script
# =====================================================

set -euo pipefail

if [ $# -lt 1 ]; then
    echo "Usage: $0 <backup_file> [--target=<database>] [--no-prompt]"
    echo "Example: $0 /backups/postgres/full_20251124_020000.dump --target=erp_generic_staging"
    exit 1
fi

BACKUP_FILE="$1"
TARGET_DB="erp_generic"
NO_PROMPT=false

# Parse optional arguments
for arg in "$@"; do
    case $arg in
        --target=*)
            TARGET_DB="${arg#*=}"
            ;;
        --no-prompt)
            NO_PROMPT=true
            ;;
    esac
done

DB_HOST="${POSTGRES_HOST:-postgres}"
DB_USER="${POSTGRES_USER:-erp_user}"
PGPASSWORD="${POSTGRES_PASSWORD}"
export PGPASSWORD

echo "===== PostgreSQL Restore Started at $(date) ====="
echo "Backup file: $BACKUP_FILE"
echo "Target database: $TARGET_DB"

# Verify the backup file exists
if [ ! -f "$BACKUP_FILE" ]; then
    echo "✗ Backup file not found: $BACKUP_FILE"
    exit 1
fi

# Verify the checksum if one exists
if [ -f "${BACKUP_FILE}.md5" ]; then
    echo "Verifying backup integrity..."
    if ! md5sum -c "${BACKUP_FILE}.md5"; then
        echo "✗ Backup integrity check failed!"
        exit 1
    fi
    echo "✓ Backup integrity verified"
fi

# Safety prompt (unless --no-prompt)
if [ "$NO_PROMPT" = false ]; then
    echo ""
    echo "⚠️  WARNING: This will OVERWRITE all data in database '$TARGET_DB'"
    echo "⚠️  Make sure you have a recent backup before proceeding!"
    echo ""
    read -p "Are you sure you want to continue? (yes/no): " -r
    if [[ ! $REPLY =~ ^[Yy][Ee][Ss]$ ]]; then
        echo "Restore cancelled by user."
        exit 0
    fi
fi

# Create a safety backup of the target database (unless staging/dev/test)
if [[ ! "$TARGET_DB" =~ (staging|dev|test) ]]; then
    echo "Creating safety backup of $TARGET_DB before restore..."
    SAFETY_BACKUP="/backups/postgres/safety_${TARGET_DB}_$(date +%Y%m%d_%H%M%S).dump"
    pg_dump -h "$DB_HOST" -U "$DB_USER" -Fc -d "$TARGET_DB" -f "$SAFETY_BACKUP"
    echo "✓ Safety backup created: $SAFETY_BACKUP"
fi

# Terminate active connections to the target database
echo "Terminating active connections to $TARGET_DB..."
psql -h "$DB_HOST" -U "$DB_USER" -d postgres <<SQL
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = '$TARGET_DB'
  AND pid <> pg_backend_pid();
SQL

# Restore the database
echo "Restoring database from $BACKUP_FILE..."
if pg_restore -h "$DB_HOST" -U "$DB_USER" -d "$TARGET_DB" --clean --if-exists --verbose "$BACKUP_FILE"; then
    echo "✓ Database restored successfully"
else
    echo "✗ Restore failed!"
    exit 1
fi

# Verify the restore
echo "Verifying restore..."
TABLE_COUNT=$(psql -h "$DB_HOST" -U "$DB_USER" -d "$TARGET_DB" -tAc "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema NOT IN ('pg_catalog', 'information_schema');")
USER_COUNT=$(psql -h "$DB_HOST" -U "$DB_USER" -d "$TARGET_DB" -tAc "SELECT COUNT(*) FROM auth.users;" 2>/dev/null || echo "N/A")
echo "Tables restored: $TABLE_COUNT"
echo "Users in auth.users: $USER_COUNT"

# Rebuild statistics
echo "Rebuilding database statistics..."
psql -h "$DB_HOST" -U "$DB_USER" -d "$TARGET_DB" -c "ANALYZE;"
echo "✓ Statistics rebuilt"

echo "===== PostgreSQL Restore Completed at $(date) ====="
echo ""
echo "Next steps:"
echo "1. Verify application functionality"
echo "2. Run smoke tests: npm run test:smoke"
echo "3. Check logs for errors"
echo "4. Notify team that restore is complete"

exit 0
```
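In practice the restore script is usually pointed at the newest dump, and because the `full_<timestamp>.dump` names sort lexicographically by date, picking it can be automated. A sketch; the `latest_full_backup` helper is illustrative, not part of the shipped scripts:

```bash
#!/usr/bin/env bash
# Print the newest full_*.dump in a backup directory, or fail if none exist.
latest_full_backup() {
    local dir="$1"
    # YYYYMMDD_HHMMSS timestamps sort lexicographically, so the last entry
    # in sorted order is the most recent backup.
    local newest
    newest=$(ls "${dir}"/full_*.dump 2>/dev/null | sort | tail -1)
    [ -n "$newest" ] || { echo "no full backups found in ${dir}" >&2; return 1; }
    echo "$newest"
}
```

Typical use: `./scripts/restore-postgres.sh "$(latest_full_backup /backups/postgres)" --target=erp_generic_staging`.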
### 6.2 Schema-Level Restore

```bash
# Restore a single schema
SCHEMA="financial"
BACKUP_FILE="/backups/postgres/financial_20251124_020000.dump"

pg_restore -h postgres -U erp_user -d erp_generic -n "$SCHEMA" --clean --if-exists "$BACKUP_FILE"
```

### 6.3 Table-Level Restore

```bash
# Restore a single table
# Note: pg_restore's -t takes a bare table name (no schema qualifier);
# combine it with -n to select the schema.
BACKUP_FILE="/backups/postgres/full_20251124_020000.dump"

pg_restore -h postgres -U erp_user -d erp_generic -n auth -t users --data-only "$BACKUP_FILE"
```

---

## 7. POINT-IN-TIME RECOVERY (PITR)

### 7.1 PITR Process

**Use Case:** Restore the database to a specific point in time (e.g., just before an accidental `DELETE`).

**Requirements:**
- A physical base backup taken with `pg_basebackup` (a logical `pg_dump` file cannot be used for PITR)
- WAL archives covering the interval from the base backup to the target time

**Steps:**

```bash
#!/bin/bash
# Point-in-Time Recovery (PITR)

TARGET_TIME="2025-11-24 14:30:00"
# Physical base backup taken earlier with pg_basebackup (path illustrative)
BASE_BACKUP="/backups/postgres/base/base_20251124.tar.gz"
WAL_ARCHIVE_DIR="/backups/wal"

echo "===== Point-in-Time Recovery to $TARGET_TIME ====="

# 1. Stop PostgreSQL
docker-compose stop postgres

# 2. Move the current data directory aside (safety)
mv /data/postgres /data/postgres_backup_$(date +%Y%m%d_%H%M%S)

# 3. Restore the base backup
mkdir -p /data/postgres
tar -xzf "$BASE_BACKUP" -C /data/postgres

# 4. Configure recovery. PostgreSQL 12+ no longer reads recovery.conf;
#    recovery settings go in postgresql.auto.conf plus a recovery.signal file.
cat >> /data/postgres/postgresql.auto.conf <<EOF
restore_command = 'cp ${WAL_ARCHIVE_DIR}/%f %p'
recovery_target_time = '${TARGET_TIME}'
recovery_target_action = 'promote'
EOF
touch /data/postgres/recovery.signal

# 5. Start PostgreSQL; it replays WAL up to the target time, then promotes
docker-compose start postgres
```

---

## 8. DISASTER RECOVERY PLAYBOOK

**Scenario 2: Accidental Data Deletion**

**Recovery Steps:**

```bash
# 1.-2. Restore the latest backup into a temporary recovery database
#       (these steps are reconstructed; the recovery database name is illustrative)
psql -h postgres -U erp_user -d postgres -c "CREATE DATABASE erp_generic_recovery;"
pg_restore -h postgres -U erp_user -d erp_generic_recovery /backups/postgres/full_20251124_020000.dump

# 3. Export the deleted records from the recovery database
#    (filter column "created_at" is illustrative)
psql -h postgres -U erp_user -d erp_generic_recovery -c \
  "\COPY (SELECT * FROM sales.orders WHERE created_at >= '2025-11-24 14:00:00') TO '/tmp/recovered_orders.csv' CSV HEADER;"

# 4. Import the recovered records into production
psql -h postgres -U erp_user -d erp_generic -c "\COPY sales.orders FROM '/tmp/recovered_orders.csv' CSV HEADER;"

echo "✓ Deleted records recovered"
```

**Estimated RTO:** 1 hour
**Estimated RPO:** 0 (no data loss if caught quickly)

---

**Scenario 3: Database Corruption**

**Symptoms:**
- PostgreSQL fails to start
- "corrupt page" errors in the logs
- Data inconsistencies

**Recovery Steps:**

```bash
# 1. Attempt a WAL reset. WARNING: pg_resetwal is a last resort, not an
#    "automatic repair": it can lose committed transactions, and the server
#    process must not be running when it executes.
docker-compose exec postgres pg_resetwal /var/lib/postgresql/data

# 2. If the reset does not help, restore from backup
./scripts/restore-postgres.sh /backups/postgres/full_20251124_020000.dump

# 3. Run VACUUM and ANALYZE
psql -h postgres -U erp_user -d erp_generic -c "VACUUM FULL; ANALYZE;"

# 4. Rebuild indexes
psql -h postgres -U erp_user -d erp_generic -c "REINDEX DATABASE erp_generic;"
```

---

## 9. BACKUP TESTING

### 9.1 Monthly Restore Test

**File:** `scripts/test-restore.sh`

```bash
#!/bin/bash
# Monthly backup restore test

BACKUP_FILE=$(ls -t /backups/postgres/full_*.dump | head -1)
TEST_DB="erp_generic_restore_test"

echo "===== Backup Restore Test ====="
echo "Backup: $BACKUP_FILE"

# 1. Drop the test database if it exists
psql -h postgres -U erp_user -d postgres -c "DROP DATABASE IF EXISTS $TEST_DB;"

# 2. Create the test database
psql -h postgres -U erp_user -d postgres -c "CREATE DATABASE $TEST_DB;"

# 3. Restore the backup
pg_restore -h postgres -U erp_user -d "$TEST_DB" --clean --if-exists "$BACKUP_FILE"

# 4. Run smoke tests (queries reconstructed: they return no rows, and thus
#    signal failure, if the core tables came back empty)
psql -h postgres -U erp_user -d "$TEST_DB" <<SQL
SELECT 'auth.users' AS tbl, COUNT(*) FROM auth.users HAVING COUNT(*) > 0;
SELECT 'core.partners' AS tbl, COUNT(*) FROM core.partners HAVING COUNT(*) > 0;
SQL

# 5. Cleanup
psql -h postgres -U erp_user -d postgres -c "DROP DATABASE $TEST_DB;"

echo "✓ Backup restore test passed"
```

### 9.2 Quarterly DR Drill

**Checklist:**
- [ ] Provision new infrastructure (staging environment)
- [ ] Restore from cloud backup (S3)
- [ ] Verify all 9 schemas restored
- [ ] Run the full test suite (unit + integration + E2E)
- [ ] Measure RTO (actual time to restore)
- [ ] Measure RPO (amount of data lost)
- [ ] Document findings and improvements
- [ ] Update the DR playbook

**Success Criteria:**
- RTO < 4 hours
- RPO < 15 minutes
- All tests passing
- Zero critical data loss
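Measuring RTO during a drill is easier when the pass/fail decision is mechanical. A sketch against the 4-hour budget from Section 1.2; the `rto_check` helper is illustrative:

```bash
#!/usr/bin/env bash
# Compare a measured restore duration against the RTO budget.
# Usage: rto_check <elapsed_seconds>; prints PASS/FAIL and returns accordingly.
RTO_BUDGET_SECONDS=$((4 * 3600))  # 4-hour RTO from Section 1.2

rto_check() {
    local elapsed="$1"
    if [ "$elapsed" -le "$RTO_BUDGET_SECONDS" ]; then
        echo "PASS: restore took ${elapsed}s (budget ${RTO_BUDGET_SECONDS}s)"
    else
        echo "FAIL: restore took ${elapsed}s (budget ${RTO_BUDGET_SECONDS}s)"
        return 1
    fi
}
```

During the drill, wrap the restore: `start=$(date +%s); ./scripts/restore-postgres.sh <backup_file> --no-prompt; rto_check $(( $(date +%s) - start ))`, and record the printed line in the drill report.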
---

## 10. REFERENCES

**Internal Documentation:**
- [Deployment Guide](./DEPLOYMENT-GUIDE.md)
- [Monitoring & Observability](./MONITORING-OBSERVABILITY.md)
- [Database Schemas](../02-modelado/database-design/schemas/)

**External Resources:**
- [PostgreSQL Backup Documentation](https://www.postgresql.org/docs/16/backup.html)
- [PostgreSQL PITR](https://www.postgresql.org/docs/16/continuous-archiving.html)
- [AWS RDS Backup Best Practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html)

---

**Document:** BACKUP-RECOVERY.md
**Version:** 1.0
**Last Updated:** 2025-11-24