Health Controller

The Health Controller provides system health monitoring, status checks, and diagnostic information for the VChata platform.

Base Path

/health

Overview

This controller provides essential system monitoring capabilities:

💚 Health Checks - System and service health monitoring
📊 Status Information - Detailed system status and metrics
🔍 Diagnostic Tools - System diagnostics and troubleshooting
📈 Performance Metrics - System performance and resource usage
🛡️ Security Status - Security and authentication status
🔧 Service Dependencies - External service connectivity checks

Authentication & Authorization

🔓 Public Access - The basic health check (GET /health) is publicly accessible for load balancers and monitoring systems
🔐 Diagnostic Access - Endpoints shown below with an Authorization header require a Bearer token
🏢 Organization Scoped - Organization-specific health checks require authentication

Health Checks

Basic Health Check

GET /health

{
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "uptime": 86400,
  "version": "1.2.3",
  "environment": "production"
}

Description: Basic health check endpoint for load balancers and monitoring systems.
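A monitoring script might consume this endpoint roughly as follows. This is a minimal sketch, not the platform's own client: the base URL passed to probe is a placeholder, and treating anything other than "status": "healthy" as a failure is an assumption about how a caller would want to react.

```python
import json
import urllib.request


def is_healthy(payload: object) -> bool:
    """Return True only when the body is a JSON object whose status is "healthy"."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def probe(base_url: str, timeout: float = 3.0) -> bool:
    """Call GET /health; treat network errors, HTTP errors, and bad JSON as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts derive from OSError;
        # json.JSONDecodeError derives from ValueError.
        return False
```

For example, probe("https://api.example.com") would return True for the response shown above, where the host name is hypothetical.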

Detailed Health Check

GET /health/detailed
Authorization: Bearer <token>

{
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "uptime": 86400,
  "version": "1.2.3",
  "environment": "production",
  "services": {
    "database": {
      "status": "healthy",
      "responseTime": 12,
      "lastChecked": "2024-01-20T16:00:00Z"
    },
    "redis": {
      "status": "healthy",
      "responseTime": 5,
      "lastChecked": "2024-01-20T16:00:00Z"
    },
    "stripe": {
      "status": "healthy",
      "responseTime": 150,
      "lastChecked": "2024-01-20T16:00:00Z"
    },
    "facebook": {
      "status": "healthy",
      "responseTime": 200,
      "lastChecked": "2024-01-20T16:00:00Z"
    }
  },
  "metrics": {
    "cpu": {
      "usage": 45.2,
      "load": 1.8
    },
    "memory": {
      "used": 2048,
      "total": 4096,
      "percentage": 50.0
    },
    "disk": {
      "used": 1024,
      "total": 2048,
      "percentage": 50.0
    }
  }
}

Description: Detailed health check with service dependencies and system metrics.
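The per-service entries in this response lend themselves to simple triage. The sketch below assumes the payload shape shown above; the helper names unhealthy_services and slowest_service are illustrative only, and the sample reuses the response times from the example.

```python
from typing import Optional


def unhealthy_services(detailed: dict) -> list:
    """Names of services whose status is anything other than "healthy"."""
    services = detailed.get("services", {})
    return sorted(
        name for name, info in services.items() if info.get("status") != "healthy"
    )


def slowest_service(detailed: dict) -> Optional[tuple]:
    """(name, responseTime) of the slowest dependency, or None if none are listed."""
    services = detailed.get("services", {})
    if not services:
        return None
    name = max(services, key=lambda n: services[n].get("responseTime", 0))
    return name, services[name].get("responseTime", 0)


# Trimmed copy of the example response above.
sample = {
    "status": "healthy",
    "services": {
        "database": {"status": "healthy", "responseTime": 12},
        "redis": {"status": "healthy", "responseTime": 5},
        "stripe": {"status": "healthy", "responseTime": 150},
        "facebook": {"status": "healthy", "responseTime": 200},
    },
}

print(unhealthy_services(sample))  # []
print(slowest_service(sample))     # ('facebook', 200)
```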

System Status

Get System Status

GET /health/status
Authorization: Bearer <token>

{
  "status": "operational",
  "timestamp": "2024-01-20T16:00:00Z",
  "services": {
    "api": {
      "status": "operational",
      "uptime": 86400,
      "requestsPerMinute": 1500,
      "averageResponseTime": 250
    },
    "database": {
      "status": "operational",
      "connections": 45,
      "maxConnections": 100,
      "queryTime": 12
    },
    "cache": {
      "status": "operational",
      "hitRate": 85.5,
      "memoryUsage": 512,
      "maxMemory": 1024
    },
    "external": {
      "stripe": {
        "status": "operational",
        "lastCheck": "2024-01-20T16:00:00Z",
        "responseTime": 150
      },
      "facebook": {
        "status": "operational",
        "lastCheck": "2024-01-20T16:00:00Z",
        "responseTime": 200
      }
    }
  },
  "incidents": [],
  "maintenance": []
}

Description: Comprehensive system status including all services and external dependencies.
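The capacity fields in this response (connections against maxConnections, memoryUsage against maxMemory) can be turned into saturation ratios. The following is a small sketch under the assumption that the payload has exactly the shape shown above; the function name saturation is hypothetical.

```python
def saturation(status: dict) -> dict:
    """Fraction of capacity in use for the database pool and the cache."""
    services = status["services"]
    db = services["database"]
    cache = services["cache"]
    return {
        "database_connections": db["connections"] / db["maxConnections"],
        "cache_memory": cache["memoryUsage"] / cache["maxMemory"],
    }


# Values taken from the example response above.
sample = {
    "services": {
        "database": {"connections": 45, "maxConnections": 100},
        "cache": {"memoryUsage": 512, "maxMemory": 1024},
    }
}

print(saturation(sample))  # {'database_connections': 0.45, 'cache_memory': 0.5}
```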

Get Service Status

GET /health/services/database
Authorization: Bearer <token>

{
  "service": "database",
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "details": {
    "type": "PostgreSQL",
    "version": "14.5",
    "host": "db.vchata.com",
    "port": 5432,
    "database": "vchata_production",
    "connections": {
      "active": 45,
      "idle": 12,
      "max": 100
    },
    "performance": {
      "queryTime": 12,
      "slowQueries": 2,
      "cacheHitRatio": 95.8
    },
    "replication": {
      "status": "healthy",
      "lag": 0
    }
  }
}

Description: Detailed status information for a specific service, named in the final path segment (database in this example).

Performance Metrics

Get Performance Metrics

GET /health/metrics?timeRange=1h
Authorization: Bearer <token>

{
  "timestamp": "2024-01-20T16:00:00Z",
  "timeRange": "1h",
  "metrics": {
    "system": {
      "cpu": {
        "usage": 45.2,
        "load": [1.8, 1.5, 1.2],
        "cores": 8
      },
      "memory": {
        "used": 2048,
        "total": 4096,
        "percentage": 50.0,
        "swap": {
          "used": 0,
          "total": 0
        }
      },
      "disk": {
        "used": 1024,
        "total": 2048,
        "percentage": 50.0,
        "io": {
          "read": 125.5,
          "write": 89.2
        }
      },
      "network": {
        "bytesIn": 1024000,
        "bytesOut": 2048000,
        "packetsIn": 15000,
        "packetsOut": 12000
      }
    },
    "application": {
      "requests": {
        "total": 90000,
        "perMinute": 1500,
        "errors": 45,
        "errorRate": 0.05
      },
      "responseTime": {
        "average": 250,
        "p50": 180,
        "p95": 800,
        "p99": 1500
      },
      "throughput": {
        "requestsPerSecond": 25,
        "bytesPerSecond": 1024000
      }
    },
    "database": {
      "connections": {
        "active": 45,
        "idle": 12,
        "max": 100
      },
      "queries": {
        "total": 450000,
        "perSecond": 125,
        "slow": 25
      },
      "performance": {
        "averageQueryTime": 12,
        "cacheHitRatio": 95.8,
        "indexUsage": 98.5
      }
    }
  }
}

Description: Comprehensive performance metrics for system monitoring.

Query Parameters

timeRange (string) - Time range for metrics (5m, 15m, 1h, 4h, 24h)
metric (string) - Specific metric to retrieve (system, application, database)
granularity (string) - Data granularity (1m, 5m, 15m, 1h)
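The query parameters for GET /health/metrics can be validated on the client before a request is sent. The sketch below builds the request URL from the allowed values listed above; the base URL is a placeholder, and rejecting unknown values client-side is a design choice of this example rather than documented server behavior.

```python
from urllib.parse import urlencode

# Allowed values copied from the parameter list above.
ALLOWED = {
    "timeRange": {"5m", "15m", "1h", "4h", "24h"},
    "metric": {"system", "application", "database"},
    "granularity": {"1m", "5m", "15m", "1h"},
}


def metrics_url(base_url: str, **params: str) -> str:
    """Build a /health/metrics URL, raising ValueError on unknown names or values."""
    for name, value in params.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown query parameter: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid value for {name}: {value}")
    query = urlencode(params)
    return f"{base_url}/health/metrics" + (f"?{query}" if query else "")


print(metrics_url("https://api.example.com", timeRange="1h", metric="system"))
# https://api.example.com/health/metrics?timeRange=1h&metric=system
```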

Get Historical Metrics

GET /health/metrics/historical?start=2024-01-20T00:00:00Z&end=2024-01-20T23:59:59Z&granularity=1h
Authorization: Bearer <token>

{
  "start": "2024-01-20T00:00:00Z",
  "end": "2024-01-20T23:59:59Z",
  "granularity": "1h",
  "data": [
    {
      "timestamp": "2024-01-20T00:00:00Z",
      "cpu": 42.1,
      "memory": 48.5,
      "disk": 49.8,
      "requests": 1400,
      "responseTime": 245
    },
    {
      "timestamp": "2024-01-20T01:00:00Z",
      "cpu": 38.7,
      "memory": 47.2,
      "disk": 49.8,
      "requests": 1200,
      "responseTime": 230
    }
  ]
}

Description: Historical performance metrics for trend analysis.
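As one illustration of trend analysis over this response, the sketch below computes the mean and the peak of a single field across the returned data points. It assumes the flat per-point shape shown above; the function name summarize is hypothetical.

```python
from statistics import mean


def summarize(history: dict, field: str) -> dict:
    """Mean and peak of one metric across the data points of a historical response."""
    points = history["data"]
    peak = max(points, key=lambda p: p[field])
    return {
        "mean": mean(p[field] for p in points),
        "max": peak[field],
        "max_at": peak["timestamp"],
    }


# The two data points from the example response above.
history = {
    "granularity": "1h",
    "data": [
        {"timestamp": "2024-01-20T00:00:00Z", "cpu": 42.1, "requests": 1400},
        {"timestamp": "2024-01-20T01:00:00Z", "cpu": 38.7, "requests": 1200},
    ],
}

cpu = summarize(history, "cpu")
print(cpu["max"], cpu["max_at"])  # 42.1 2024-01-20T00:00:00Z
```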

Diagnostic Tools

System Diagnostics

GET /health/diagnostics
Authorization: Bearer <token>

{
  "timestamp": "2024-01-20T16:00:00Z",
  "diagnostics": {
    "system": {
      "os": "Linux",
      "version": "Ubuntu 20.04.3 LTS",
      "kernel": "5.4.0-89-generic",
      "architecture": "x86_64"
    },
    "runtime": {
      "node": "18.17.0",
      "v8": "10.2.154.26-node.26",
      "platform": "linux",
      "arch": "x64"
    },
    "application": {
      "name": "vchata-backend",
      "version": "1.2.3",
      "environment": "production",
      "uptime": 86400
    },
    "dependencies": {
      "database": "PostgreSQL 14.5",
      "redis": "Redis 6.2.7",
      "stripe": "stripe@12.0.0"
    }
  }
}

Description: System diagnostic information for troubleshooting.

Connectivity Test

POST /health/connectivity-test
Authorization: Bearer <token>
Content-Type: application/json

{
  "services": ["database", "redis", "stripe", "facebook"],
  "timeout": 5000
}

{
  "timestamp": "2024-01-20T16:00:00Z",
  "results": [
    {
      "service": "database",
      "status": "success",
      "responseTime": 12,
      "error": null
    },
    {
      "service": "redis",
      "status": "success",
      "responseTime": 5,
      "error": null
    },
    {
      "service": "stripe",
      "status": "success",
      "responseTime": 150,
      "error": null
    },
    {
      "service": "facebook",
      "status": "success",
      "responseTime": 200,
      "error": null
    }
  ],
  "summary": {
    "total": 4,
    "successful": 4,
    "failed": 0,
    "averageResponseTime": 91.75
  }
}

Description: Tests connectivity to the services named in the request body. Of the two JSON objects above, the first is the request body and the second is the response.
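The summary object can be re-derived from the results array, which is useful when checking a response or aggregating several runs. This sketch assumes the average is taken over every result that reports a responseTime; the source does not say whether failed checks are included, and in the example above all four succeed.

```python
def summarize_results(results: list) -> dict:
    """Rebuild the summary block from the per-service results of a connectivity test."""
    successful = [r for r in results if r["status"] == "success"]
    times = [r["responseTime"] for r in results if r.get("responseTime") is not None]
    return {
        "total": len(results),
        "successful": len(successful),
        "failed": len(results) - len(successful),
        "averageResponseTime": sum(times) / len(times) if times else None,
    }


# Results copied from the example response above.
results = [
    {"service": "database", "status": "success", "responseTime": 12, "error": None},
    {"service": "redis", "status": "success", "responseTime": 5, "error": None},
    {"service": "stripe", "status": "success", "responseTime": 150, "error": None},
    {"service": "facebook", "status": "success", "responseTime": 200, "error": None},
]

print(summarize_results(results))
# {'total': 4, 'successful': 4, 'failed': 0, 'averageResponseTime': 91.75}
```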

Organization Health

Get Organization Health

GET /health/organization/org_abc123
Authorization: Bearer <token>

{
  "organizationId": "org_abc123",
  "status": "healthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "health": {
    "accounts": {
      "total": 5,
      "active": 5,
      "inactive": 0,
      "issues": 0
    },
    "integrations": {
      "social": {
        "connected": 3,
        "healthy": 3,
        "issues": 0
      },
      "billing": {
        "status": "active",
        "paymentMethod": "valid",
        "subscription": "active"
      }
    },
    "usage": {
      "apiCalls": {
        "last24h": 15000,
        "limit": 100000,
        "percentage": 15.0
      },
      "storage": {
        "used": 1024,
        "limit": 10240,
        "percentage": 10.0
      }
    },
    "alerts": [],
    "recommendations": [
      {
        "type": "optimization",
        "message": "Consider upgrading your plan for better performance",
        "priority": "low"
      }
    ]
  }
}

Description: Organization-specific health information and recommendations.
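The usage block reports consumption against plan limits, so a client can recompute the percentages and flag quotas that are running low. The sketch below assumes the shape shown above, where apiCalls reports its consumption as last24h and storage as used; the warn_at threshold of 80 percent is a hypothetical default, not a documented value.

```python
def usage_report(org_health: dict, warn_at: float = 80.0) -> dict:
    """Percentage of each quota consumed, with a flag when it reaches warn_at."""
    report = {}
    for name, usage in org_health["health"]["usage"].items():
        # apiCalls reports "last24h"; storage reports "used".
        consumed = usage["used"] if "used" in usage else usage["last24h"]
        percentage = 100 * consumed / usage["limit"]
        report[name] = {"percentage": percentage, "warn": percentage >= warn_at}
    return report


# Usage values from the example response above.
org = {
    "organizationId": "org_abc123",
    "health": {
        "usage": {
            "apiCalls": {"last24h": 15000, "limit": 100000},
            "storage": {"used": 1024, "limit": 10240},
        }
    },
}

print(usage_report(org))
# {'apiCalls': {'percentage': 15.0, 'warn': False}, 'storage': {'percentage': 10.0, 'warn': False}}
```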

Health Status Values

System Status

healthy - All systems operational
degraded - Some services experiencing issues
unhealthy - Critical services down
maintenance - System under maintenance

Service Status

operational - Service running normally
degraded - Service experiencing performance issues
outage - Service completely unavailable
maintenance - Service under maintenance
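One way to relate the two vocabularies is to roll individual service statuses up into a system status. The sketch below is an illustrative interpretation of the definitions above, not the platform's documented logic: which services count as critical, and the rule that the system is in maintenance only when every service is, are both assumptions.

```python
def rollup(services: dict, critical: set) -> str:
    """Derive a system status from a mapping of service name to service status."""
    # "unhealthy - Critical services down"
    if any(status == "outage" for name, status in services.items() if name in critical):
        return "unhealthy"
    # Assumption: the system is under maintenance only when every service is.
    if services and all(status == "maintenance" for status in services.values()):
        return "maintenance"
    # "degraded - Some services experiencing issues"
    if any(status != "operational" for status in services.values()):
        return "degraded"
    return "healthy"


critical = {"database"}  # hypothetical choice of critical services
print(rollup({"database": "operational", "stripe": "outage"}, critical))  # degraded
print(rollup({"database": "outage", "stripe": "operational"}, critical))  # unhealthy
```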

Error Responses

Common Errors

{
  "status": "unhealthy",
  "timestamp": "2024-01-20T16:00:00Z",
  "message": "Service temporarily unavailable",
  "services": {
    "database": {
      "status": "outage",
      "error": "Connection timeout"
    }
  }
}

{
  "status": "error",
  "timestamp": "2024-01-20T16:00:00Z",
  "message": "Internal server error during health check",
  "error": "Database connection failed"
}

Monitoring Integration

Prometheus Metrics

The health controller exposes Prometheus-compatible metrics at /health/metrics/prometheus:

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1500

# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 800
http_request_duration_seconds_bucket{le="0.5"} 1200
http_request_duration_seconds_bucket{le="1.0"} 1450
http_request_duration_seconds_bucket{le="+Inf"} 1500

# HELP system_cpu_usage CPU usage percentage
# TYPE system_cpu_usage gauge
system_cpu_usage 45.2

# HELP system_memory_usage Memory usage percentage
# TYPE system_memory_usage gauge
system_memory_usage 50.0
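Histogram buckets in this format are cumulative, so the share of requests at or below a latency bound is the bucket count divided by the +Inf bucket. The sketch below parses the sample output above with a deliberately small regular expression and computes that share; it is not a full parser for the exposition format (it ignores optional timestamps and escaped label values), and the function names are hypothetical.

```python
import re

SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

EXPOSITION = """
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1500
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 800
http_request_duration_seconds_bucket{le="0.5"} 1200
http_request_duration_seconds_bucket{le="1.0"} 1450
http_request_duration_seconds_bucket{le="+Inf"} 1500
# TYPE system_cpu_usage gauge
system_cpu_usage 45.2
"""


def parse_samples(text: str) -> list:
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if match is None:  # blank line, comment, or unsupported syntax
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fraction_within(samples: list, metric: str, le: str) -> float:
    """Share of observations at or below the bucket bound `le` (buckets are cumulative)."""
    buckets = {
        labels["le"]: value
        for name, labels, value in samples
        if name == f"{metric}_bucket"
    }
    return buckets[le] / buckets["+Inf"]


samples = parse_samples(EXPOSITION)
print(fraction_within(samples, "http_request_duration_seconds", "0.5"))  # 0.8
```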

Grafana Dashboard

Health metrics can be visualized in Grafana dashboards for real-time monitoring.

Alerting

Health Check Alerts

The system can be configured to send alerts based on health check results:

Service Down - Alert when critical services become unavailable
Performance Degradation - Alert when response times exceed configured thresholds
Resource Usage - Alert when CPU, memory, or disk usage exceeds configured thresholds
Error Rate - Alert when error rates exceed acceptable levels
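The threshold-based alerts in this list (performance degradation, resource usage, error rate) could be evaluated against a GET /health/metrics response along these lines. Every threshold value here is hypothetical and chosen only for illustration; the source does not specify defaults or how alert rules are configured.

```python
# Hypothetical thresholds; real values would come from the alerting configuration.
THRESHOLDS = {
    "cpu": 90.0,        # system.cpu.usage
    "memory": 90.0,     # system.memory.percentage
    "disk": 85.0,       # system.disk.percentage
    "errorRate": 1.0,   # application.requests.errorRate
    "p95": 1000,        # application.responseTime.p95
}


def evaluate_alerts(payload: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one message per metric whose observed value exceeds its threshold."""
    system = payload["metrics"]["system"]
    app = payload["metrics"]["application"]
    observed = {
        "cpu": system["cpu"]["usage"],
        "memory": system["memory"]["percentage"],
        "disk": system["disk"]["percentage"],
        "errorRate": app["requests"]["errorRate"],
        "p95": app["responseTime"]["p95"],
    }
    return [
        f"{name}: {value} exceeds {thresholds[name]}"
        for name, value in observed.items()
        if value > thresholds[name]
    ]


# Trimmed copy of the GET /health/metrics example response.
sample = {
    "metrics": {
        "system": {
            "cpu": {"usage": 45.2},
            "memory": {"percentage": 50.0},
            "disk": {"percentage": 50.0},
        },
        "application": {
            "requests": {"errorRate": 0.05},
            "responseTime": {"p95": 800},
        },
    }
}

print(evaluate_alerts(sample))  # []
```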

Integration with Monitoring Tools

DataDog - Custom metrics and dashboards
New Relic - Application performance monitoring
PagerDuty - Incident management and alerting
Slack - Real-time notifications and status updates

Security Considerations

Access Control

Public Endpoints - The basic health check is publicly accessible
Authenticated Endpoints - Detailed diagnostics require authentication
Rate Limiting - Health endpoints are rate-limited to prevent abuse
IP Restrictions - Sensitive diagnostic endpoints can be IP-restricted

Data Protection

Minimal Information - Health checks expose only necessary information
No Sensitive Data - No passwords, tokens, or personal data in health responses
Sanitized Output - All diagnostic output is sanitized before exposure
