
© PerfectScale 2025


Self-healing mechanism for unschedulable pods

Autonomously identify unschedulable pods and fix the issues, ensuring continuous smooth operation



Upgrade psc-autoscaler to at least v1.0.18 to enable this feature.
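The version requirement can be checked with a numeric comparison rather than a string comparison, since "1.0.9" sorts after "1.0.18" as text. The following is a minimal sketch; the helper names are hypothetical, and it assumes the installed version is available as a tag of the form `v1.0.18`.

```python
from typing import Tuple

# Minimum psc-autoscaler version stated in the requirement above.
MIN_SELF_HEALING_VERSION = "v1.0.18"


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'v1.0.18' into (1, 0, 18) for numeric comparison."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def supports_self_healing(installed: str) -> bool:
    """Hypothetical helper: True if `installed` is at least the minimum version."""
    return parse_version(installed) >= parse_version(MIN_SELF_HEALING_VERSION)
```

For example, `supports_self_healing("v1.0.9")` is false even though "v1.0.9" compares greater than "v1.0.18" as a plain string.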

In some situations, a pod cannot be scheduled onto any node, for example because no node has enough free resources to satisfy the pod's requests. The PerfectScale automation mechanism is designed to identify such pods and resolve the issue autonomously by reverting the workload's resources to the previous valid configuration.

This feature is only available to customers using Karpenter or Cluster Autoscaler.

General concept

The self-healing mechanism for unschedulable pods supports automated workloads of the following types: Deployments, Jobs, and CronJobs.

When automation has increased a pod's resources and no node can provide the requested resources, PerfectScale considers the pod unschedulable. To keep the pod running, automation reverts the resources to their previous values. If this resolves the issue, the pod continues running, and automation does not increase resources for this pod for a couple of days, to confirm that it runs stably and to prevent recurrence. If reverting to the previous values does not help, automation returns the pod's resources to the values defined in the spec.
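The fallback order described above can be pictured as a short sequence of stages. The sketch below is an illustrative model only, not PerfectScale's implementation. The stage name `automated` and the function `heal` are hypothetical, while `previous-cycle` and `original-spec` mirror the label values used on this page.

```python
from enum import Enum
from typing import List, Set


class Stage(Enum):
    AUTOMATED = "automated"            # resources as raised by automation (hypothetical name)
    PREVIOUS_CYCLE = "previous-cycle"  # resources reverted to their previous values
    ORIGINAL_SPEC = "original-spec"    # resources returned to the workload spec


# Order in which configurations are tried while the pod stays unschedulable.
FALLBACK_ORDER = [Stage.AUTOMATED, Stage.PREVIOUS_CYCLE, Stage.ORIGINAL_SPEC]


def heal(schedulable_at: Set[Stage]) -> List[Stage]:
    """Return the stages visited until the pod becomes schedulable.

    `schedulable_at` is the set of stages whose resource requests fit on
    some node. If no stage is schedulable, all stages are visited and the
    pod remains unschedulable at the original spec.
    """
    visited = []
    for stage in FALLBACK_ORDER:
        visited.append(stage)
        if stage in schedulable_at:
            break
    return visited
```

For a pod that only fits with its previous values, `heal({Stage.PREVIOUS_CYCLE})` visits `AUTOMATED` and then stops at `PREVIOUS_CYCLE`. The pause of a couple of days before resources are increased again is not modeled here.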

Indication

When PerfectScale marks a pod as unschedulable, the indicator Automation: Limited by Rule will be triggered.

Once automation returns resources to their previous values, the label is updated to perfectscale.io/selfHealing:previous-cycle.

When automation returns the pod resources to the spec, the label perfectscale.io/selfHealing:original-spec will be assigned accordingly.
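To make the two label values concrete, the sketch below maps a labels dictionary to a human-readable state. It assumes the label key is `perfectscale.io/selfHealing` and the text after the colon is its value. The function name and the descriptions are hypothetical, and this page does not state which object carries the label, so the input is simply whatever labels mapping you have retrieved.

```python
from typing import Mapping

# Assumed split of "perfectscale.io/selfHealing:<value>" into key and value.
SELF_HEALING_LABEL = "perfectscale.io/selfHealing"

DESCRIPTIONS = {
    "previous-cycle": "resources reverted to their previous values",
    "original-spec": "resources returned to the workload spec",
}


def describe_self_healing(labels: Mapping[str, str]) -> str:
    """Hypothetical helper: explain the self-healing state encoded in labels."""
    value = labels.get(SELF_HEALING_LABEL)
    if value is None:
        # The label is absent before self-healing starts and after it is resolved.
        return "no self-healing label present"
    return DESCRIPTIONS.get(value, f"unrecognized value: {value}")
```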

Resolution

Once the issue has been resolved and the pod is successfully scheduled on a node, PerfectScale will remove the selfHealing label and continue to operate as usual.

If the pod is still unschedulable after reverting to the spec, the cause likely lies elsewhere. In that case, verify other configurations, such as the Cluster Autoscaler configuration.
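When investigating a pod that stays unschedulable, the Kubernetes scheduler's own explanation is a useful starting point. Kubernetes records it in the pod's `PodScheduled` condition, with status `False` and reason `Unschedulable`. The following sketch extracts that message from a pod object already loaded as a dictionary, for instance from the JSON output of `kubectl get pod`. The function name is hypothetical, and this check is a general Kubernetes diagnostic, not part of PerfectScale.

```python
from typing import Any, Mapping, Optional


def unschedulable_message(pod: Mapping[str, Any]) -> Optional[str]:
    """Return the scheduler's message if the pod is unschedulable, else None."""
    conditions = pod.get("status", {}).get("conditions") or []
    for cond in conditions:
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            # The message typically describes why no node was suitable.
            return cond.get("message", "")
    return None
```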
