Understanding 'At Risk' indicators

Explore the wide range of 'At Risk' indicators that PerfectScale provides.

Resilience indicators

OOM

Out-of-Memory events usually occur in the following situations:

  • The memory limit for a pod is set too low. An event is triggered when the pod's memory usage reaches the defined limit.
  • The node is experiencing memory pressure and tries to evict some pods.
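
The raw signal behind this indicator can also be inspected directly in the cluster. The sketch below is an illustration only (it uses the official kubernetes Python client rather than the PerfectScale agent, and assumes kubeconfig access): it lists containers whose last termination reason was OOMKilled.

```python
# Illustrative only: find containers that were recently OOM-killed.
# Assumes the official `kubernetes` Python client and a reachable kubeconfig.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_pod_for_all_namespaces().items:
    for status in (pod.status.container_statuses or []):
        terminated = status.last_state.terminated if status.last_state else None
        if terminated and terminated.reason == "OOMKilled":
            print(f"{pod.metadata.namespace}/{pod.metadata.name} "
                  f"container {status.name} was OOM-killed "
                  f"(restarts so far: {status.restart_count})")
```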

CPU Throttling

CPU Throttling occurs when a pod reaches its defined CPU limit, which can add latency to the application's responses. Kubernetes uses the CFS quota mechanism to implement the limit (see the official Kubernetes documentation). The quota is based on CPU time per period, not on available CPU power: cfs_period_us defines the period and is always 100000us (100ms). For example, a container with a 1-core limit that keeps all node cores busy is throttled after 50ms of each period on a 2-core node and after 25ms on a 4-core node, because the quota counts the total CPU time consumed across all cores.
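
The quota arithmetic from the example above can be sketched in a few lines; this is a back-of-the-envelope illustration of CFS behaviour, not PerfectScale code.

```python
# Back-of-the-envelope CFS math: the quota is CPU *time* per 100ms period,
# so a container that keeps several cores busy burns through a 1-core quota
# proportionally faster.
CFS_PERIOD_MS = 100  # cfs_period_us is always 100000us (100ms)

def throttled_after_ms(cpu_limit_cores: float, busy_cores: int) -> float:
    """Wall-clock time into each period at which throttling starts, assuming
    the container runs `busy_cores` threads flat out."""
    quota_ms = cpu_limit_cores * CFS_PERIOD_MS  # CPU time allowed per period
    return quota_ms / busy_cores                # consumed `busy_cores` times faster

print(throttled_after_ms(1.0, busy_cores=2))  # 50.0 -> throttled after 50ms on a 2-core node
print(throttled_after_ms(1.0, busy_cores=4))  # 25.0 -> throttled after 25ms on a 4-core node
```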

RestartsObserved

Frequent restarts indicate a problem with a high potential of harming the desired SLA.

Eviction

Eviction is the forceful termination and removal of a running pod from its node. Eviction events usually occur due to memory or CPU pressure on the node. When an eviction is observed, an alert is triggered immediately to inform the users. Make sure that you have configured an integration profile and assigned it to the cluster to receive timely notifications in a Slack or MS Teams channel.

HPAAtMaxReplicasObserved

As demand for a service or application increases, HPA will scale the system to handle the additional load by dynamically adding more replicas. Once the maximum configured limit of replicas is reached, PerfectScale will raise the HPAAtMaxReplicasObserved indicator, which means the system cannot scale further based on the existing settings.

The severity of the indicator varies with the workload's running time at maximum replicas: the longer a workload runs at maximum replicas, the higher the severity.
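
As an illustration, the sketch below uses the official kubernetes Python client (assuming the autoscaling/v2 API is available in the cluster) to list HPAs that are currently pinned at their configured maxReplicas, which is the raw condition behind this indicator.

```python
# Illustrative only: list HPAs that cannot scale out any further.
from kubernetes import client, config

config.load_kube_config()
autoscaling = client.AutoscalingV2Api()

for hpa in autoscaling.list_horizontal_pod_autoscaler_for_all_namespaces().items:
    current = hpa.status.current_replicas
    if current is not None and current >= hpa.spec.max_replicas:
        print(f"{hpa.metadata.namespace}/{hpa.metadata.name} "
              f"is at its maximum of {hpa.spec.max_replicas} replicas")
```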

Limit/Request not set indicators

CpuRequestNotSet

This indicator is raised when a container has no CPU request defined. Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, making sure that the cluster nodes' capacity meets the demand.

MemRequestNotSet

This indicator is raised when a container has no MEMORY request defined. Setting proper MEMORY requests helps the Kubernetes scheduler allocate the right amount of memory for each container, making sure that the cluster nodes' capacity meets the demand.

MemLimitNotSet

This indicator is raised when a container has no MEMORY limit defined. Setting a proper MEMORY limit helps protect your worker nodes from OOM by preventing the risk of memory over-allocation.

Unlike CPU, which is compressible (a new CFS period starts every 100ms), MEMORY is incompressible and cannot be throttled back once consumed.
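
For reference, this is roughly how explicit requests and limits look when a container spec is built with the official kubernetes Python client; the names and values below are placeholders, not recommendations.

```python
# Placeholder values for illustration only; choose requests and limits based
# on the workload's actual usage.
from kubernetes import client

container = client.V1Container(
    name="api",                      # hypothetical container name
    image="example/api:1.0",         # hypothetical image
    resources=client.V1ResourceRequirements(
        # Requests drive scheduling: the scheduler reserves this much capacity
        # for the container on the chosen node.
        requests={"cpu": "250m", "memory": "256Mi"},
        # Limits cap consumption: the CPU limit triggers CFS throttling,
        # and exceeding the memory limit triggers an OOM kill.
        limits={"cpu": "500m", "memory": "512Mi"},
    ),
)
print(container.resources.requests, container.resources.limits)
```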

UnderProvisioning indicators

UnderProvisionedCpuRequest

Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, making sure that the cluster nodes' capacity meets the demand. When the CPU request is under-provisioned, the scheduler may place the pod on a node with too little spare CPU, so the container receives a smaller share of CPU time under contention, which can increase latency.

UnderProvisionedMemRequest

Setting proper MEMORY requests helps the Kubernetes scheduler allocate the right amount of memory for each container, making sure that the cluster nodes' capacity meets the demand. When the memory request is under-provisioned, the scheduler may pack more pods onto a node than its memory can actually sustain, leading to node memory pressure and evictions.

UnderProvisionedMemLimit

Setting a proper MEMORY limit helps protect your worker nodes from OOM by preventing the risk of memory over-allocation. However, an under-provisioned MEMORY limit can cause unwanted OOM events at the pod level, potentially harming the desired SLA.
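
A simple way to reason about this risk is to compare the configured limit with the observed peak usage. The helper below is an illustrative check only; the 20% headroom is an arbitrary example value, not a PerfectScale recommendation.

```python
# Illustrative check, not PerfectScale's algorithm: flag a memory limit that
# leaves too little headroom above observed peak usage, since normal spikes
# would then end in pod-level OOM kills.
def memory_limit_at_risk(peak_usage_mib: float, limit_mib: float,
                         headroom: float = 0.2) -> bool:
    """Return True when the limit is below peak usage plus a safety headroom."""
    return limit_mib < peak_usage_mib * (1 + headroom)

# Example: a container peaking at 900 MiB with a 1024 MiB limit has less than
# 20% headroom, so it would be flagged.
print(memory_limit_at_risk(peak_usage_mib=900, limit_mib=1024))  # True
```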

Waste indicators

OverProvisionedCpuRequest

Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, ensuring that the cluster nodes' capacity meets the demand. In cases of over-provisioned CPU requests, cloud resources are wasted unnecessarily, since they are allocated but never utilized.

OverProvisionedMemoryRequest

Setting proper MEMORY requests helps the Kubernetes scheduler allocate the right amount of memory for each container, ensuring that the cluster nodes' capacity meets the demand. However, when a memory request is over-provisioned, it wastes cloud resources, which are allocated but never used.
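
Both waste and under-provisioning come down to comparing requests with actual usage. The sketch below does a single point-in-time comparison for CPU using the official kubernetes Python client and the metrics.k8s.io API (so it assumes a running metrics-server); PerfectScale's own analysis is based on historical usage data, and the 30% threshold here is purely illustrative.

```python
# Illustrative only: compare each container's CPU request with its current
# usage from metrics-server and flag obvious over/under-provisioning.
from kubernetes import client, config

def cpu_to_millicores(quantity: str) -> float:
    """Convert common CPU quantity formats ('250m', '1', '12345678n') to millicores."""
    if quantity.endswith("n"):
        return float(quantity[:-1]) / 1_000_000
    if quantity.endswith("u"):
        return float(quantity[:-1]) / 1_000
    if quantity.endswith("m"):
        return float(quantity[:-1])
    return float(quantity) * 1000

config.load_kube_config()
core = client.CoreV1Api()
metrics = client.CustomObjectsApi()

# Index CPU requests by (namespace, pod, container).
requests = {}
for pod in core.list_pod_for_all_namespaces().items:
    for c in pod.spec.containers:
        reqs = (c.resources.requests or {}) if c.resources else {}
        if "cpu" in reqs:
            key = (pod.metadata.namespace, pod.metadata.name, c.name)
            requests[key] = cpu_to_millicores(reqs["cpu"])

# Current usage reported by metrics-server via the metrics.k8s.io API.
usage = metrics.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "pods")
for item in usage["items"]:
    ns, pod_name = item["metadata"]["namespace"], item["metadata"]["name"]
    for c in item["containers"]:
        key = (ns, pod_name, c["name"])
        if key not in requests:
            continue
        used, req = cpu_to_millicores(c["usage"]["cpu"]), requests[key]
        if used > req:
            print(f"{key}: usage {used:.0f}m exceeds request {req:.0f}m (under-provisioned)")
        elif used < 0.3 * req:  # 30% is an illustrative threshold only
            print(f"{key}: usage {used:.0f}m far below request {req:.0f}m (possible waste)")
```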
