How to monitor PerfectScale Agent

PerfectScale psc_exporter metrics and associated alerts overview

The PerfectScale Exporter installed in your Kubernetes cluster exposes metrics for monitoring, reviewing, and alerting on the behavior of the PerfectScale platform. Together with the alerts described below, these metrics help you detect abnormal behavior quickly and get notified when it occurs.

Metrics

The Exporter exposes metrics in the Prometheus format and uses two metric types:

  1. Gauge - the system state at a specific point in time. Its value can go up or down.

  2. Counter - a cumulative value that only increases, resetting to zero when the Exporter restarts.
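The practical difference between the two types shows up when you read them over time. A gauge is meaningful as a single snapshot, while a counter is read as the increase between samples, where a drop in value signals a reset (for example, an Exporter restart). The following minimal sketch illustrates that logic in plain Python; the helper names and sample readings are hypothetical, and the reset handling mirrors how Prometheus treats counter decreases.

```python
def counter_increase(samples):
    """Total increase of a Prometheus-style counter across successive samples.

    A drop in value is treated as a counter reset (e.g. the Exporter restarted),
    so the post-reset value itself counts as new increase, as Prometheus does.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def gauge_current(samples):
    """A gauge is a snapshot: only the latest sample matters."""
    return samples[-1]


# Hypothetical successive readings of psc_exporter_ksm_scraping_errors_total;
# the drop from 5 to 1 models an Exporter restart.
error_counter = [0, 2, 5, 1, 3]
# Hypothetical successive readings of psc_exporter_ksm_instances_scraped.
instances_gauge = [1, 1, 0, 1, 1]

print(counter_increase(error_counter))  # 8.0
print(gauge_current(instances_gauge))   # 1
```

Summing raw counter values, or differencing them without reset handling, would give misleading results after a restart, which is why alerting on counters is usually based on their increase over a window.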

The PerfectScale Exporter takes kube-state-metrics (KSM) and cAdvisor metrics as inputs and uploads compressed metrics to the PerfectScale SaaS platform as output.

KSM Metrics

These metrics track the communication between the Exporter and KSM in your cluster.

ksm_instances_scraped

  • Metric Name: psc_exporter_ksm_instances_scraped

  • Description: The number of KSM instances the Exporter tried to scrape during the latest scraping round. Usually there is only one.

  • Type: Gauge

ksm_scraping_errors_total

  • Metric Name: psc_exporter_ksm_scraping_errors_total

  • Description: Total number of errors that occurred while scraping KSM.

  • Type: Counter
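To inspect these values by hand, you can read the Exporter's Prometheus-format output and pick out the psc_exporter_ metrics. The sketch below is a deliberately simple parser written under stated assumptions: the sample text is hypothetical (real values and any labels depend on your cluster), label sets are summed per metric name, and label values containing "}" are not supported. How you reach the Exporter's metrics endpoint depends on your deployment and is not shown.

```python
import re

# Matches "name value" or "name{labels} value", optionally followed by a timestamp.
# Simplification: label values containing "}" are not supported.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_psc_metrics(exposition_text, prefix="psc_exporter_"):
    """Return {metric_name: value} for metrics whose names start with `prefix`.

    Values for the same name with different label sets are summed (labels dropped).
    """
    values = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match and match.group(1).startswith(prefix):
            name, raw_value = match.group(1), match.group(3)
            values[name] = values.get(name, 0.0) + float(raw_value)
    return values


# Hypothetical scrape output.
SAMPLE = """\
# TYPE psc_exporter_ksm_instances_scraped gauge
psc_exporter_ksm_instances_scraped 1
# TYPE psc_exporter_ksm_scraping_errors_total counter
psc_exporter_ksm_scraping_errors_total 4
go_goroutines 42
"""

metrics = parse_psc_metrics(SAMPLE)
print(metrics)
```

Non-PerfectScale metrics such as go_goroutines are filtered out by the prefix. If the prometheus_client package is available, its prometheus_client.parser.text_string_to_metric_families function is a more complete parser for the same format.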

cAdvisor Metrics

The cAdvisor component provides the Exporter with information about individual containers and their resource usage.

cadvisor_instances_scraped

  • Metric Name: psc_exporter_cadvisor_instances_scraped

  • Description: The number of cAdvisor instances the Exporter tried to scrape during the latest scraping round.

  • Type: Gauge

cadvisor_scraping_errors_total

  • Metric Name: psc_exporter_cadvisor_scraping_errors_total

  • Description: Total number of errors that occurred while scraping cAdvisor.

  • Type: Counter
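The KSM and cAdvisor metrics pair naturally into a quick input-health check: the instances_scraped gauge should not be zero, and the scraping_errors_total counter should not grow between two readings. The following is a minimal sketch of such a check, not PerfectScale's alert logic; the function name scrape_health and the sample snapshots are hypothetical, and the checks are illustrative only.

```python
def scrape_health(previous, current):
    """Compare two snapshots of Exporter metrics and list suspected problems.

    `previous` and `current` map metric names to values, e.g. as parsed from the
    Exporter's Prometheus-format output. The checks are illustrative, not the
    chart's alert rules.
    """
    problems = []
    for source in ("ksm", "cadvisor"):
        scraped = current.get(f"psc_exporter_{source}_instances_scraped")
        if scraped is None:
            problems.append(f"{source}: instances_scraped metric missing")
        elif scraped == 0:
            problems.append(f"{source}: no instances scraped in the latest round")

        errors_name = f"psc_exporter_{source}_scraping_errors_total"
        before, after = previous.get(errors_name, 0.0), current.get(errors_name, 0.0)
        # A lower value means the counter reset, so everything since then is new errors.
        new_errors = after - before if after >= before else after
        if new_errors > 0:
            problems.append(f"{source}: {new_errors:g} new scraping error(s)")
    return problems


# Hypothetical snapshots taken one scraping round apart.
prev = {
    "psc_exporter_ksm_instances_scraped": 1,
    "psc_exporter_ksm_scraping_errors_total": 2,
    "psc_exporter_cadvisor_instances_scraped": 3,
    "psc_exporter_cadvisor_scraping_errors_total": 0,
}
curr = {
    "psc_exporter_ksm_instances_scraped": 1,
    "psc_exporter_ksm_scraping_errors_total": 2,
    "psc_exporter_cadvisor_instances_scraped": 0,
    "psc_exporter_cadvisor_scraping_errors_total": 3,
}
print(scrape_health(prev, curr))
```

Here KSM looks healthy, while cAdvisor reports both zero scraped instances and three new errors.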

PerfectScale Metrics

PerfectScale provides an additional group of metrics that covers the Exporter's communication with the PerfectScale SaaS platform.

time_windows_upload_errors_total

  • Metric Name: psc_exporter_time_windows_upload_errors_total

  • Description: Total number of failed attempts to upload time windows to PerfectScale.

  • Type: Counter

auth_errors_total

  • Metric Name: psc_exporter_auth_errors_total

  • Description: Total number of authorization errors encountered by the PerfectScale Exporter.

  • Type: Counter

upload_policy_errors_total

  • Metric Name: psc_exporter_upload_policy_errors_total

  • Description: Total number of errors when updating the upload policy.

  • Type: Counter

Alerts

Alerts notify you promptly when the Exporter behaves abnormally, based on the metrics described above.

How to configure the Alerts

To enable or configure the Alerts, update the Helm values according to your requirements.

Helm Values example

serviceMonitor:
  enable: true

prometheusRule:
  enable: true
  labels:
    customLabel: "value"
  annotations:
    customAnnotation: "value"
  team: "operations"
  severity: "critical"
  cAdvisorScraping:
    timeRange: "15m"
    threshold: 0.5
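Before rolling out changes to cAdvisorScraping, it can help to sanity-check the two values. The sketch below assumes that timeRange uses Prometheus duration syntax (such as "15m") and that threshold is a failure fraction between 0 and 1, as the example value 0.5 suggests; both are assumptions, as are the helper names parse_duration and validate_cadvisor_scraping. The values dictionary mirrors the Helm values example above.

```python
import re

# Prometheus-style duration units in seconds ("y" is 365 days in Prometheus).
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_PART_RE = re.compile(r"(\d+)(ms|[smhdwy])")


def parse_duration(text):
    """Convert a duration such as "15m" or "1h30m" to seconds.

    Lenient sketch: it does not enforce Prometheus's largest-to-smallest unit order.
    """
    if not re.fullmatch(r"(?:\d+(?:ms|[smhdwy]))+", text):
        raise ValueError(f"not a duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in _PART_RE.findall(text))


def validate_cadvisor_scraping(values):
    """Check prometheusRule.cAdvisorScraping in a values dict; return (seconds, threshold)."""
    settings = values.get("prometheusRule", {}).get("cAdvisorScraping", {})
    seconds = parse_duration(str(settings["timeRange"]))
    threshold = float(settings["threshold"])
    # Assumption: threshold is a failure fraction, so it must lie in (0, 1].
    if not 0 < threshold <= 1:
        raise ValueError(f"threshold must be in (0, 1], got {threshold}")
    return seconds, threshold


# Mirrors the Helm values example (labels and annotations omitted).
values = {
    "serviceMonitor": {"enable": True},
    "prometheusRule": {
        "enable": True,
        "cAdvisorScraping": {"timeRange": "15m", "threshold": 0.5},
    },
}
print(validate_cadvisor_scraping(values))  # (900, 0.5)
```

A value such as "15 minutes" or a threshold of 50 would raise ValueError here, which is easier to diagnose than an alert that silently never fires.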

Alerts overview

PerfectScale Exporter High KSM Scraping Error Rate

  • Alert Name: PerfectScale Exporter High KSM Scraping Error Rate

  • Description: Within the last 5 minutes, over 30% of kube-state-metrics scraping attempts have failed.

PerfectScale Exporter High cAdvisor Scraping Error Rate

  • Alert Name: PerfectScale Exporter High cAdvisor Scraping Error Rate

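  • Description: More than the configured threshold (prometheusRule.cAdvisorScraping.threshold, for example 0.5) of cAdvisor scraping attempts failed within the configured time range (prometheusRule.cAdvisorScraping.timeRange, for example 15m).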
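The cAdvisor alert compares a failure fraction against the configured threshold. One way to approximate that fraction from the two cAdvisor metrics is sketched below. This is a hypothetical approximation: it treats the sum of psc_exporter_cadvisor_instances_scraped over the rounds in the time range as the number of attempts and the increase of psc_exporter_cadvisor_scraping_errors_total as the number of failures. The chart's actual PromQL expression may compute this differently.

```python
def cadvisor_error_ratio(rounds):
    """Approximate the fraction of failed cAdvisor scrapes across scraping rounds.

    `rounds` is a chronological list of (instances_scraped, errors_total) pairs,
    one per round inside the alert's time range. Hypothetical approximation:
    attempts = sum of instances_scraped after the first round,
    failures = increase of errors_total over the same rounds.
    """
    if len(rounds) < 2:
        return 0.0
    attempts = sum(instances for instances, _ in rounds[1:])
    failures = 0.0
    for (_, prev), (_, cur) in zip(rounds, rounds[1:]):
        failures += cur - prev if cur >= prev else cur  # drop = counter reset
    return failures / attempts if attempts else 0.0


def cadvisor_alert_fires(rounds, threshold=0.5):
    """True when the failure fraction exceeds `threshold` (0.5 as in the Helm example)."""
    return cadvisor_error_ratio(rounds) > threshold


rounds = [(3, 0), (3, 1), (3, 3), (3, 4)]  # hypothetical readings
print(round(cadvisor_error_ratio(rounds), 3))       # 0.444
print(cadvisor_alert_fires(rounds))                 # False
print(cadvisor_alert_fires(rounds, threshold=0.3))  # True
```

With 4 failures out of 9 attempts, the example stays below the default-looking 0.5 threshold but would fire at 0.3.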

PerfectScale Exporter Time Windows Upload Error Rate

  • Alert Name: PerfectScale exporter Time Windows Upload Error Rate

  • Description: Within 1 hour, three or more errors occurred while uploading time windows to PerfectScale.

PerfectScale Exporter Authorization Errors

  • Alert Name: PerfectScale exporter Authorization Errors

  • Description: Within 1 hour, two or more PerfectScale Exporter authorization errors occurred.

PerfectScale Exporter Upload Policy Refresh Errors

  • Alert Name: PerfectScale exporter Upload Policy Refresh Errors

  • Description: Within 1 hour, two or more errors occurred when updating the upload policy.
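The last three alerts share one pattern: a counter's increase over the past hour meets a fixed minimum. The sketch below evaluates that pattern against timestamped samples. The mapping of each alert to a metric is inferred from the alert and metric names, the thresholds come from the alert descriptions, and the function names and sample data are hypothetical. Unlike Prometheus's increase() function, this sketch does not extrapolate to the edges of the window.

```python
# Thresholds from the alert descriptions: N or more errors within 1 hour.
ALERT_RULES = {
    "psc_exporter_time_windows_upload_errors_total": 3,
    "psc_exporter_auth_errors_total": 2,
    "psc_exporter_upload_policy_errors_total": 2,
}
WINDOW_SECONDS = 3600


def increase_in_window(samples, now, window=WINDOW_SECONDS):
    """Counter increase over samples with timestamps in [now - window, now].

    `samples` is a chronological list of (unix_timestamp, counter_value).
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    total = 0.0
    for prev, cur in zip(recent, recent[1:]):
        total += cur - prev if cur >= prev else cur  # drop = counter reset
    return total


def firing_alerts(series, now):
    """Return metric names whose increase meets or exceeds the alert threshold."""
    return [
        name
        for name, minimum in ALERT_RULES.items()
        if increase_in_window(series.get(name, []), now) >= minimum
    ]


# Hypothetical samples, timestamps in seconds.
series = {
    "psc_exporter_time_windows_upload_errors_total": [(0, 0), (1200, 1), (2400, 2), (3600, 3)],
    "psc_exporter_auth_errors_total": [(0, 5), (1800, 6)],
    "psc_exporter_upload_policy_errors_total": [(0, 4), (1800, 0), (3000, 2)],
}
print(firing_alerts(series, now=3600))
```

In this example the time windows upload rule fires (3 errors) and the upload policy rule fires (2 errors after a counter reset), while a single authorization error stays below its minimum of 2.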
