Self-healing mechanism for unschedulable pods
Autonomously identify unschedulable pods and fix the issues, ensuring continuous smooth operation
In some situations, pods may become unschedulable. This can happen for various reasons, for example when no node has sufficient resources to accommodate the pod. The PerfectScale automation mechanism is designed to actively identify such issues and resolve them autonomously by reverting the workload's resources to the previous valid configuration.
When automation has increased a pod's resources and no node with the requested resources is available, PerfectScale considers the pod unschedulable. To keep the pod operational, automation reverts the resources to their previous values. If returning to the previous resource request resolves the issue, the pod continues running, and automation will not increase resources for this pod for a couple of days to ensure its stable operation and prevent recurrence. If returning to the previous resource request does not help, automation returns the pod's resources to the spec.
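To see this behavior in your own cluster, you can check for pods stuck in the Pending phase and inspect their scheduling events. This is a generic Kubernetes check, not a PerfectScale-specific command; the pod name below is hypothetical.

```shell
# List pods stuck in Pending (candidates for being unschedulable)
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect the scheduling failure reason for a specific pod
# ("my-workload-pod" is a hypothetical name; look for FailedScheduling events)
kubectl describe pod my-workload-pod
```

A FailedScheduling event typically states the reason, such as "Insufficient cpu" or "Insufficient memory".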
When PerfectScale marks a pod as unschedulable, the Automation: Limit Reached indicator is triggered.
Once automation returns resources to their previous values, the pod is labeled perfectscale.io/selfHealing: previous-cycle.
When automation returns the pod resources to the spec, the label is updated to perfectscale.io/selfHealing: original-spec.
Once the issue has been resolved and the pod is successfully scheduled on a node, PerfectScale will remove the selfHealing label and continue to operate as usual.
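If you want to see which pods are currently in a self-healing state, you can query the label directly with standard kubectl label selectors. These are generic Kubernetes commands, and the pod name below is hypothetical.

```shell
# Find all pods currently carrying the selfHealing label, regardless of value
kubectl get pods --all-namespaces -l perfectscale.io/selfHealing

# Show the label value (previous-cycle or original-spec) for a specific pod
# ("my-workload-pod" is a hypothetical name; dots in the label key are escaped)
kubectl get pod my-workload-pod \
  -o jsonpath='{.metadata.labels.perfectscale\.io/selfHealing}'
```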
If the pod is still unschedulable after reverting to the spec, the underlying issue likely lies elsewhere. In such cases, it is recommended to verify other configurations, such as the cluster autoscaler settings.