Self-healing mechanism for unschedulable pods
Autonomously identify unschedulable pods and fix the issues, ensuring continuous smooth operation
In some situations, pods may become unschedulable. This can happen for various reasons, for example when no node has sufficient resources to accommodate the pod. The PerfectScale automation mechanism is designed to actively identify such issues and resolve them autonomously by reverting the workload's resources to the previous valid configuration.
When automation has increased a pod's resources and no node with the requested resources is available, PerfectScale considers the pod unschedulable. To keep the pod operational, automation reverts the resources to their previous values. If returning to the previous resource request resolves the issue, the pod continues running, and automation will not increase resources for this pod for a couple of days to ensure its stable operation and prevent recurrence. If returning to the previous resource request does not help, automation returns the pod's resources to the spec.
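To see this behavior in your own cluster, you can check for pods stuck in the Pending phase and inspect their scheduling events. This is a generic Kubernetes check, not a PerfectScale-specific command; the pod name below is hypothetical.

```shell
# List pods stuck in Pending (candidates for being unschedulable)
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect the scheduling failure reason for a specific pod
# ("my-workload-pod" is a hypothetical name; look for FailedScheduling events)
kubectl describe pod my-workload-pod
```

A FailedScheduling event typically states the reason, such as "Insufficient cpu" or "Insufficient memory".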
When PerfectScale marks a pod as unschedulable, the Automation: Limit Reached indicator is triggered.
Once automation returns resources to their previous values, the pod is labeled perfectscale.io/selfHealing: previous-cycle.
When automation returns the pod resources to the spec, the label is updated to perfectscale.io/selfHealing: original-spec.
Once the issue has been resolved and the pod is successfully scheduled on a node, PerfectScale will remove the selfHealing label and continue to operate as usual.
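If you want to see which pods are currently in a self-healing state, you can query the label directly with standard kubectl label selectors. These are generic Kubernetes commands, and the pod name below is hypothetical.

```shell
# Find all pods currently carrying the selfHealing label, regardless of value
kubectl get pods --all-namespaces -l perfectscale.io/selfHealing

# Show the label value (previous-cycle or original-spec) for a specific pod
# ("my-workload-pod" is a hypothetical name; dots in the label key are escaped)
kubectl get pod my-workload-pod \
  -o jsonpath='{.metadata.labels.perfectscale\.io/selfHealing}'
```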
If the pod is still unschedulable after reverting to the spec, the underlying issue likely lies elsewhere. In such cases, it is recommended to verify other configurations, such as the cluster autoscaler settings.