Health Controller

The Health Controller provides system health monitoring, status checks, and diagnostic information for the VChata platform.

Base Path

/health

Overview

This controller provides essential system monitoring capabilities:
  • 💚 Health Checks - System and service health monitoring
  • 📊 Status Information - Detailed system status and metrics
  • 🔍 Diagnostic Tools - System diagnostics and troubleshooting
  • 📈 Performance Metrics - System performance and resource usage
  • 🛡️ Security Status - Security and authentication status
  • 🔧 Service Dependencies - External service connectivity checks

Authentication & Authorization

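  • 🔓 Public Access - The basic health check (GET /health) is publicly accessible so that load balancers and monitoring systems can poll it
  • 🔐 Diagnostic Access - Detailed health, status, metrics, and diagnostic endpoints require a bearer token (Authorization: Bearer <token>)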
  • 🏢 Organization Scoped - Organization-specific health checks require authentication

Health Checks

Basic Health Check

GET /health
{
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "uptime": 86400,
  "version": "1.2.3",
  "environment": "production"
}
Description: Basic health check endpoint for load balancers and monitoring systems.
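
A monitoring script might interpret this response as sketched below. The base URL is a placeholder, and treating every status other than "healthy" as a failure is an assumption rather than documented behavior.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not taken from the docs


def is_healthy(payload: dict) -> bool:
    """Return True only when the top-level status is "healthy"."""
    return payload.get("status") == "healthy"


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Call GET /health and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


# Offline check against the sample response shown above.
sample = {
    "status": "healthy",
    "timestamp": "2024-01-20T16:00:00Z",
    "uptime": 86400,
    "version": "1.2.3",
    "environment": "production",
}
print(is_healthy(sample))                  # True
print(is_healthy({"status": "degraded"}))  # False
```

fetch_health is defined but not called here, so the example runs without network access.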

Detailed Health Check

GET /health/detailed
Authorization: Bearer <token>
{
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "uptime": 86400,
  "version": "1.2.3",
  "environment": "production",
  "services": {
    "database": {
      "status": "healthy",
      "responseTime": 12,
      "lastChecked": "2024-01-20T16:00:00Z"
    },
    "redis": {
      "status": "healthy",
      "responseTime": 5,
      "lastChecked": "2024-01-20T16:00:00Z"
    },
    "stripe": {
      "status": "healthy",
      "responseTime": 150,
      "lastChecked": "2024-01-20T16:00:00Z"
    },
    "facebook": {
      "status": "healthy",
      "responseTime": 200,
      "lastChecked": "2024-01-20T16:00:00Z"
    }
  },
  "metrics": {
    "cpu": {
      "usage": 45.2,
      "load": 1.8
    },
    "memory": {
      "used": 2048,
      "total": 4096,
      "percentage": 50.0
    },
    "disk": {
      "used": 1024,
      "total": 2048,
      "percentage": 50.0
    }
  }
}
Description: Detailed health check with service dependencies and system metrics.
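
The services block lends itself to simple triage. The sketch below assumes the response shape shown above; the helper names are illustrative and not part of the API.

```python
def failing_services(payload: dict) -> list:
    """Names of services whose status is anything other than "healthy"."""
    return [
        name
        for name, info in payload.get("services", {}).items()
        if info.get("status") != "healthy"
    ]


def slowest_service(payload: dict) -> tuple:
    """(name, responseTime) of the service with the highest responseTime."""
    services = payload["services"]
    name = max(services, key=lambda n: services[n]["responseTime"])
    return name, services[name]["responseTime"]


# Trimmed copy of the sample response above.
sample = {
    "services": {
        "database": {"status": "healthy", "responseTime": 12},
        "redis": {"status": "healthy", "responseTime": 5},
        "stripe": {"status": "healthy", "responseTime": 150},
        "facebook": {"status": "healthy", "responseTime": 200},
    }
}
print(failing_services(sample))  # []
print(slowest_service(sample))   # ('facebook', 200)
```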

System Status

Get System Status

GET /health/status
Authorization: Bearer <token>
{
  "status": "operational",
  "timestamp": "2024-01-20T16:00:00Z",
  "services": {
    "api": {
      "status": "operational",
      "uptime": 86400,
      "requestsPerMinute": 1500,
      "averageResponseTime": 250
    },
    "database": {
      "status": "operational",
      "connections": 45,
      "maxConnections": 100,
      "queryTime": 12
    },
    "cache": {
      "status": "operational",
      "hitRate": 85.5,
      "memoryUsage": 512,
      "maxMemory": 1024
    },
    "external": {
      "stripe": {
        "status": "operational",
        "lastCheck": "2024-01-20T16:00:00Z",
        "responseTime": 150
      },
      "facebook": {
        "status": "operational",
        "lastCheck": "2024-01-20T16:00:00Z",
        "responseTime": 200
      }
    }
  },
  "incidents": [],
  "maintenance": []
}
Description: Comprehensive system status including all services and external dependencies.

Get Service Status

GET /health/services/database
Authorization: Bearer <token>
{
  "service": "database",
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "details": {
    "type": "PostgreSQL",
    "version": "14.5",
    "host": "db.vchata.com",
    "port": 5432,
    "database": "vchata_production",
    "connections": {
      "active": 45,
      "idle": 12,
      "max": 100
    },
    "performance": {
      "queryTime": 12,
      "slowQueries": 2,
      "cacheHitRatio": 95.8
    },
    "replication": {
      "status": "healthy",
      "lag": 0
    }
  }
}
Description: Detailed status information for a specific service.
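
An authenticated request for a single service could be built as follows. This is a sketch: the host and token are placeholders, and the idea that other service names fit the same /health/services/<name> pattern is inferred from the database example, not stated by the documentation.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host


def service_status_request(service: str, token: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the bearer token."""
    url = f"{base_url}/health/services/{quote(service, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = service_status_request("database", "example-token")
print(req.get_method())                 # GET
print(req.full_url)                     # https://api.example.com/health/services/database
print(req.get_header("Authorization"))  # Bearer example-token
```

Passing the request to urllib.request.urlopen would send it.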

Performance Metrics

Get Performance Metrics

GET /health/metrics?timeRange=1h
Authorization: Bearer <token>
{
  "timestamp": "2024-01-20T16:00:00Z",
  "timeRange": "1h",
  "metrics": {
    "system": {
      "cpu": {
        "usage": 45.2,
        "load": [1.8, 1.5, 1.2],
        "cores": 8
      },
      "memory": {
        "used": 2048,
        "total": 4096,
        "percentage": 50.0,
        "swap": {
          "used": 0,
          "total": 0
        }
      },
      "disk": {
        "used": 1024,
        "total": 2048,
        "percentage": 50.0,
        "io": {
          "read": 125.5,
          "write": 89.2
        }
      },
      "network": {
        "bytesIn": 1024000,
        "bytesOut": 2048000,
        "packetsIn": 15000,
        "packetsOut": 12000
      }
    },
    "application": {
      "requests": {
        "total": 90000,
        "perMinute": 1500,
        "errors": 45,
        "errorRate": 0.05
      },
      "responseTime": {
        "average": 250,
        "p50": 180,
        "p95": 800,
        "p99": 1500
      },
      "throughput": {
        "requestsPerSecond": 25,
        "bytesPerSecond": 1024000
      }
    },
    "database": {
      "connections": {
        "active": 45,
        "idle": 12,
        "max": 100
      },
      "queries": {
        "total": 450000,
        "perSecond": 125,
        "slow": 25
      },
      "performance": {
        "averageQueryTime": 12,
        "cacheHitRatio": 95.8,
        "indexUsage": 98.5
      }
    }
  }
}
Description: Comprehensive performance metrics for system monitoring.
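
The sample figures are consistent with one another if errorRate is read as a percentage and the totals cover the one-hour timeRange. That reading is inferred from the numbers, not stated by the API. A quick check:

```python
def error_rate_percent(errors: int, total: int) -> float:
    """Errors as a percentage of all requests; 0.0 when there were no requests."""
    return 100.0 * errors / total if total else 0.0


# "requests" block from the sample response above.
requests_block = {"total": 90000, "perMinute": 1500, "errors": 45, "errorRate": 0.05}

rate = error_rate_percent(requests_block["errors"], requests_block["total"])
per_second = requests_block["perMinute"] / 60   # matches throughput.requestsPerSecond
per_hour = requests_block["perMinute"] * 60     # matches requests.total for a 1h range

print(round(rate, 4))  # 0.05
print(per_second)      # 25.0
print(per_hour)        # 90000
```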

Get Historical Metrics

GET /health/metrics/historical?start=2024-01-20T00:00:00Z&end=2024-01-20T23:59:59Z&granularity=1h
Authorization: Bearer <token>
{
  "start": "2024-01-20T00:00:00Z",
  "end": "2024-01-20T23:59:59Z",
  "granularity": "1h",
  "data": [
    {
      "timestamp": "2024-01-20T00:00:00Z",
      "cpu": 42.1,
      "memory": 48.5,
      "disk": 49.8,
      "requests": 1400,
      "responseTime": 245
    },
    {
      "timestamp": "2024-01-20T01:00:00Z",
      "cpu": 38.7,
      "memory": 47.2,
      "disk": 49.8,
      "requests": 1200,
      "responseTime": 230
    }
  ]
}
Description: Historical performance metrics for trend analysis.
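
The query string and a basic trend summary can be sketched as below. The parameter names come from the example request; the summary field names (avgCpu and so on) are made up for illustration.

```python
from statistics import mean
from urllib.parse import urlencode


def historical_metrics_path(start: str, end: str, granularity: str) -> str:
    """Build the request path; safe=":" keeps ISO 8601 colons unescaped."""
    query = urlencode({"start": start, "end": end, "granularity": granularity}, safe=":")
    return f"/health/metrics/historical?{query}"


def summarize(points: list) -> dict:
    """Average CPU and response time, plus the peak request count."""
    return {
        "avgCpu": mean(p["cpu"] for p in points),
        "avgResponseTime": mean(p["responseTime"] for p in points),
        "peakRequests": max(p["requests"] for p in points),
    }


# The two data points from the sample response above.
data = [
    {"timestamp": "2024-01-20T00:00:00Z", "cpu": 42.1, "requests": 1400, "responseTime": 245},
    {"timestamp": "2024-01-20T01:00:00Z", "cpu": 38.7, "requests": 1200, "responseTime": 230},
]
summary = summarize(data)
print(historical_metrics_path("2024-01-20T00:00:00Z", "2024-01-20T23:59:59Z", "1h"))
print(round(summary["avgCpu"], 1))  # 40.4
print(summary["avgResponseTime"])   # 237.5
print(summary["peakRequests"])      # 1400
```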

Diagnostic Tools

System Diagnostics

GET /health/diagnostics
Authorization: Bearer <token>
{
  "timestamp": "2024-01-20T16:00:00Z",
  "diagnostics": {
    "system": {
      "os": "Linux",
      "version": "Ubuntu 20.04.3 LTS",
      "kernel": "5.4.0-89-generic",
      "architecture": "x86_64"
    },
    "runtime": {
      "node": "18.17.0",
      "v8": "10.2.154.26-node.26",
      "platform": "linux",
      "arch": "x64"
    },
    "application": {
      "name": "vchata-backend",
      "version": "1.2.3",
      "environment": "production",
      "uptime": 86400
    },
    "dependencies": {
      "database": "PostgreSQL 14.5",
      "redis": "Redis 6.2.7",
      "stripe": "stripe@<version>"
    }
  }
}
Description: System diagnostic information for troubleshooting.

Connectivity Test

POST /health/connectivity-test
Authorization: Bearer <token>
Content-Type: application/json

{
  "services": ["database", "redis", "stripe", "facebook"],
  "timeout": 5000
}

Response:
{
  "timestamp": "2024-01-20T16:00:00Z",
  "results": [
    {
      "service": "database",
      "status": "success",
      "responseTime": 12,
      "error": null
    },
    {
      "service": "redis",
      "status": "success",
      "responseTime": 5,
      "error": null
    },
    {
      "service": "stripe",
      "status": "success",
      "responseTime": 150,
      "error": null
    },
    {
      "service": "facebook",
      "status": "success",
      "responseTime": 200,
      "error": null
    }
  ],
  "summary": {
    "total": 4,
    "successful": 4,
    "failed": 0,
    "averageResponseTime": 91.75
  }
}
Description: Tests connectivity to external services and dependencies.
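
The summary block can be recomputed from the results array, which is a useful cross-check on the client side. The sketch assumes that averageResponseTime is taken over every result, failed ones included; the sample data cannot confirm this because no check failed.

```python
def summarize_results(results: list) -> dict:
    """Rebuild the summary object from the per-service results."""
    total = len(results)
    successful = sum(1 for r in results if r["status"] == "success")
    average = sum(r["responseTime"] for r in results) / total if total else 0.0
    return {
        "total": total,
        "successful": successful,
        "failed": total - successful,
        "averageResponseTime": average,
    }


# Results from the sample response above.
results = [
    {"service": "database", "status": "success", "responseTime": 12, "error": None},
    {"service": "redis", "status": "success", "responseTime": 5, "error": None},
    {"service": "stripe", "status": "success", "responseTime": 150, "error": None},
    {"service": "facebook", "status": "success", "responseTime": 200, "error": None},
]
print(summarize_results(results))
# {'total': 4, 'successful': 4, 'failed': 0, 'averageResponseTime': 91.75}
```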

Organization Health

Get Organization Health

GET /health/organization/org_abc123
Authorization: Bearer <token>
{
  "organizationId": "org_abc123",
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "health": {
    "accounts": {
      "total": 5,
      "active": 5,
      "inactive": 0,
      "issues": 0
    },
    "integrations": {
      "social": {
        "connected": 3,
        "healthy": 3,
        "issues": 0
      },
      "billing": {
        "status": "active",
        "paymentMethod": "valid",
        "subscription": "active"
      }
    },
    "usage": {
      "apiCalls": {
        "last24h": 15000,
        "limit": 100000,
        "percentage": 15.0
      },
      "storage": {
        "used": 1024,
        "limit": 10240,
        "percentage": 10.0
      }
    },
    "alerts": [],
    "recommendations": [
      {
        "type": "optimization",
        "message": "Consider upgrading your plan for better performance",
        "priority": "low"
      }
    ]
  }
}
Description: Organization-specific health information and recommendations.
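
The percentage fields in the usage block follow from the used and limit values, so a client can flag quotas that are close to exhaustion. In this sketch the 80% warning level and the helper names are hypothetical.

```python
def usage_percentage(used: float, limit: float) -> float:
    """Share of the limit consumed, in percent; 0.0 when no limit is set."""
    return 100 * used / limit if limit else 0.0


def quotas_near_limit(usage: dict, warn_at: float = 80.0) -> list:
    """Names of usage entries at or above the warning level."""
    near = []
    for name, quota in usage.items():
        # apiCalls reports "last24h", storage reports "used".
        used = quota.get("last24h", quota.get("used", 0))
        if usage_percentage(used, quota["limit"]) >= warn_at:
            near.append(name)
    return near


# "usage" block from the sample response above.
usage = {
    "apiCalls": {"last24h": 15000, "limit": 100000, "percentage": 15.0},
    "storage": {"used": 1024, "limit": 10240, "percentage": 10.0},
}
print(usage_percentage(15000, 100000))  # 15.0
print(usage_percentage(1024, 10240))    # 10.0
print(quotas_near_limit(usage))         # []
```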

Health Status Values

System Status

  • healthy - All systems operational
  • degraded - Some services experiencing issues
  • unhealthy - Critical services down
  • maintenance - System under maintenance

Service Status

  • operational - Service running normally
  • degraded - Service experiencing performance issues
  • outage - Service completely unavailable
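  • maintenance - Service under maintenance

The example responses on this page do not always follow this split: /health/detailed and /health/services/database report per-service status as healthy, while /health/status reports the overall status as operational.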
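
Because a client may receive values from either list, it can help to map both onto one severity scale. The ranking below is an assumption made for illustration; the documentation does not define an ordering. Unknown values are treated as most severe.

```python
# Hypothetical severity ranking: higher means worse.
SEVERITY = {
    "healthy": 0,
    "operational": 0,
    "maintenance": 1,
    "degraded": 2,
    "unhealthy": 3,
    "outage": 3,
}
UNKNOWN_SEVERITY = 3  # fail closed on values missing from the table


def severity(status: str) -> int:
    return SEVERITY.get(status, UNKNOWN_SEVERITY)


def worst_status(statuses: list) -> str:
    """Most severe status in the list (first one wins on ties)."""
    return max(statuses, key=severity)


print(worst_status(["operational", "degraded", "maintenance"]))  # degraded
print(worst_status(["healthy", "operational"]))                   # healthy
print(severity("error"))                                          # 3
```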

Error Responses

Common Errors

{
  "status": "unhealthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "message": "Service temporarily unavailable",
  "services": {
    "database": {
      "status": "outage",
      "error": "Connection timeout"
    }
  }
}

{
  "status": "error",
  "timestamp": "2024-01-20T16:00:00Z",
  "message": "Internal server error during health check",
  "error": "Database connection failed"
}
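
Both error shapes can be reduced to a single log line. This sketch relies only on the fields visible in the two examples above (message, error, and a per-service error); the function name is illustrative.

```python
def describe_failure(payload: dict) -> str:
    """Join the message, top-level error, and per-service errors into one line."""
    parts = [payload.get("message", "no message")]
    if "error" in payload:
        parts.append(payload["error"])
    for name, info in payload.get("services", {}).items():
        if "error" in info:
            parts.append(f"{name}: {info['error']}")
    return "; ".join(parts)


service_outage = {
    "status": "unhealthy",
    "message": "Service temporarily unavailable",
    "services": {"database": {"status": "outage", "error": "Connection timeout"}},
}
check_failure = {
    "status": "error",
    "message": "Internal server error during health check",
    "error": "Database connection failed",
}
print(describe_failure(service_outage))
# Service temporarily unavailable; database: Connection timeout
print(describe_failure(check_failure))
# Internal server error during health check; Database connection failed
```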

Monitoring Integration

Prometheus Metrics

The health controller exposes Prometheus-compatible metrics at /health/metrics/prometheus:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1500

# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 800
http_request_duration_seconds_bucket{le="0.5"} 1200
http_request_duration_seconds_bucket{le="1.0"} 1450
http_request_duration_seconds_bucket{le="+Inf"} 1500

# HELP system_cpu_usage CPU usage percentage
# TYPE system_cpu_usage gauge
system_cpu_usage 45.2

# HELP system_memory_usage Memory usage percentage
# TYPE system_memory_usage gauge
system_memory_usage 50.0
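
Histogram buckets in this format are cumulative, so the share of requests that finished within a given bound is that bucket divided by the +Inf bucket. The parser below is a deliberately small sketch for the sample shown: it ignores timestamps and label values containing braces, and a dedicated parsing library is a better fit for production use.

```python
import re

# metric name, optional {labels}, then the value
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$')


def parse_samples(text: str) -> dict:
    """Map 'name{labels}' to a float value, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[name + (labels or "")] = float(value)
    return samples


def fraction_within(samples: dict, le: str) -> float:
    """Share of requests at or below the bucket bound `le`."""
    prefix = "http_request_duration_seconds_bucket"
    return samples[f'{prefix}{{le="{le}"}}'] / samples[f'{prefix}{{le="+Inf"}}']


text = """
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 800
http_request_duration_seconds_bucket{le="0.5"} 1200
http_request_duration_seconds_bucket{le="1.0"} 1450
http_request_duration_seconds_bucket{le="+Inf"} 1500
# TYPE system_cpu_usage gauge
system_cpu_usage 45.2
"""
samples = parse_samples(text)
print(samples["system_cpu_usage"])               # 45.2
print(round(fraction_within(samples, "0.5"), 2))  # 0.8
```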

Grafana Dashboard

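Health metrics can be visualized in Grafana dashboards for real-time monitoring, typically by scraping /health/metrics/prometheus with Prometheus and adding that Prometheus server as a Grafana data source.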

Alerting

Health Check Alerts

The system can be configured to send alerts based on health check results:
  • Service Down - Alert when critical services become unavailable
  • Performance Degradation - Alert when response times exceed thresholds
  • Resource Usage - Alert when CPU, memory, or disk usage is high
  • Error Rate - Alert when error rates exceed acceptable levels
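
A resource-usage rule of this kind might be evaluated against the metrics block of GET /health/detailed as sketched here. The threshold values are hypothetical; the documentation does not specify any.

```python
THRESHOLDS = {"cpu": 80.0, "memory": 85.0, "disk": 90.0}  # hypothetical limits, in percent


def resource_alerts(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one message per resource whose usage exceeds its threshold."""
    readings = {
        "cpu": metrics["cpu"]["usage"],
        "memory": metrics["memory"]["percentage"],
        "disk": metrics["disk"]["percentage"],
    }
    return [
        f"{name} at {value}% exceeds {thresholds[name]}%"
        for name, value in readings.items()
        if value > thresholds[name]
    ]


# "metrics" block from the detailed health check sample.
metrics = {
    "cpu": {"usage": 45.2, "load": 1.8},
    "memory": {"used": 2048, "total": 4096, "percentage": 50.0},
    "disk": {"used": 1024, "total": 2048, "percentage": 50.0},
}
print(resource_alerts(metrics))  # []

hot = dict(metrics, cpu={"usage": 91.5, "load": 6.2})
print(resource_alerts(hot))      # ['cpu at 91.5% exceeds 80.0%']
```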

Integration with Monitoring Tools

  • DataDog - Custom metrics and dashboards
  • New Relic - Application performance monitoring
  • PagerDuty - Incident management and alerting
  • Slack - Real-time notifications and status updates

Security Considerations

Access Control

  • Public Endpoints - Basic health checks are publicly accessible
  • Authenticated Endpoints - Detailed diagnostics require authentication
  • Rate Limiting - Health endpoints are rate-limited to prevent abuse
  • IP Restrictions - Sensitive diagnostic endpoints can be restricted to specific IP addresses

Data Protection

  • Minimal Information - Health checks expose only necessary information
  • No Sensitive Data - No passwords, tokens, or personal data in health responses
  • Sanitized Output - All diagnostic output is sanitized before exposure
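
As an illustration of what sanitizing diagnostic output can involve, the sketch below redacts values whose keys look sensitive. It is not the platform's actual implementation; the marker list and the "[REDACTED]" placeholder are assumptions.

```python
SENSITIVE_MARKERS = ("password", "secret", "token")  # hypothetical key fragments


def sanitize(value):
    """Recursively replace values under sensitive-looking keys."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if any(marker in key.lower() for marker in SENSITIVE_MARKERS)
            else sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value


raw = {
    "database": {"host": "db.vchata.com", "password": "example-only"},
    "checks": [{"apiToken": "example-only", "ok": True}],
}
print(sanitize(raw))
# {'database': {'host': 'db.vchata.com', 'password': '[REDACTED]'}, 'checks': [{'apiToken': '[REDACTED]', 'ok': True}]}
```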