
© PerfectScale 2025


Podfit | vertical pod right-sizing

Explore a granular, comprehensive view of your clusters' health and costs, identify and prioritize areas that need attention while autonomously optimizing workloads


Last updated 1 month ago

PerfectScale Podfit provides comprehensive insights into the health and costs of your cluster and its components, helping you quickly pinpoint areas requiring attention along with data-driven, actionable recommendations to streamline and enhance your optimization process.

Cluster overview and telemetry

The overview and telemetry section delivers a comprehensive summary of performance risks, costs, and waste insights for the selected cluster, along with identified optimization opportunities you can quickly achieve with PerfectScale. This view enables a quick evaluation of your cluster's overall health and efficiency, pinpointing configuration issues and empowering you to streamline and enhance your optimization process effectively.

Overview section

Cluster selector allows for dynamic switching between clusters, enabling seamless management and monitoring of a multi-cluster environment.

Tenant - the account name (PerfectScale in the example above).

Optimization Policy - displays the optimization policy of the selected cluster. Optimization Policy allows you to specify how your resources should be allocated in order to support the individual needs of your workloads. Define the policies that best suit your environment and business goals, depending on whether you want to maximize cost savings or provide extra headroom to maintain the resilience of mission-critical services.

  • MaxSavings - maximum cost savings, the best for non-production environments

  • Balanced (default) - optimally balances cost and resiliency

  • ExtraHeadroom - the best fit for latency-sensitive environments

  • MaxHeadroom - keeps the environment above the highest spikes

Timeframe allows you to adjust the period for reviewing metrics, enabling a focused analysis for a specific time range.

Export allows you to easily download your data as a .csv file, enabling smooth analysis and effortless sharing.

Telemetry section

The telemetry section provides a comprehensive overview of aggregated data for the selected cluster, offering key insights into the cluster's health and efficiency. This helps you evaluate the cluster's performance easily and identifies opportunities for optimization, giving you a clear view of its overall status.

Unused Resources provides insights into the resources within the cluster that are not being effectively utilized:

  • Pod Waste displays the total cost of wasted resources within the cluster. Clicking on this metric will direct you to the workload waste report, offering a detailed visual breakdown of the workloads contributing to the waste. This allows you to quickly identify the most impactful areas requiring attention, enabling more efficient optimization.

  • Node Idle indicates the total cost of unutilized node space. By clicking on this metric, you'll be navigated to a comprehensive view of the cluster at the infrastructure level. This view provides valuable insights into the behavior of different node groups and types, enabling you to optimize the underlying infrastructure for your workloads effectively.

Potential Savings is a powerful widget that offers insights into the total costs incurred compared to the actual resource utilization. This information helps you evaluate whether the cluster is well-balanced, over-provisioned, or under-provisioned. Additionally, the widget provides a Recommended Cost, reflecting the potential savings achievable through PerfectScale's recommendations, ensuring your cluster operates efficiently and cost-effectively. Clicking on this metric will direct you to the cluster cost report for further investigation.

Negative savings indicate an under-provisioned environment.
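As a rough illustration of how the figures in this widget relate, the sketch below assumes that potential savings are simply the current cost minus the Recommended Cost. The function names and the rule that positive savings mean over-provisioning are assumptions made for this example, not part of PerfectScale's API.

```python
def potential_savings(current_cost: float, recommended_cost: float) -> float:
    """Assumed relation: savings = current cost - recommended cost."""
    return current_cost - recommended_cost


def provisioning_state(savings: float) -> str:
    """Classify the cluster from the sign of the savings figure.

    Negative savings indicate an under-provisioned environment (stated above);
    treating positive savings as over-provisioned is an assumption.
    """
    if savings < 0:
        return "under-provisioned"
    if savings > 0:
        return "over-provisioned"
    return "well-balanced"
```

For example, a cluster that costs 1000 with a Recommended Cost of 1200 yields savings of -200, which this sketch reports as under-provisioned.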

CPU/Memory Utilization Over Time provides a comprehensive visual representation of resource allocation, requests, and usage trends within your cluster. Tracking these metrics over a specified timeframe allows you to analyze historical data to understand how resource dynamics have changed, compare actual usage with allocated and requested resources, and identify utilization patterns.

  • Used - p99 of utilization

  • Requested - p99 of the combined requests of all the workloads

  • Allocated - p99 of available cluster compute
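To make the p99 values above more concrete, here is a minimal sketch of a percentile calculation using the nearest-rank definition. The page does not state which percentile method PerfectScale uses, so both the method and the `percentile` helper are assumptions for illustration only.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small q still returns a sample.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]
```

With 100 utilization samples, `percentile(samples, 99)` returns the 99th smallest value, so a single extreme spike does not define the series.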

Workloads table

The Workload table provides a detailed overview of all the workloads running in your cluster. Each row represents a specific workload and its containers, including critical metrics like cost, waste, and potential cost increase due to under-provisioned resources. This view will help you quickly identify workloads that are misaligned with resource demands, highlighting optimization opportunities and areas at risk that require attention. With dynamic filtering and sorting options, you can easily focus on specific namespaces, labels, or workloads, making it easier to prioritize optimization tasks and run clusters efficiently.

Workloads are sets of pods that belong to a Deployment, StatefulSet, DaemonSet, Job, or custom resource (for example, Runner or SparkJob).

Hover over the column name to view hints.

Filtering resiliency issues

The dot count is a visual indicator of risk levels, with three levels: Low, Medium, and High (three dots represent the High-risk level).

Hollow dots indicate a muted workload, while shaded dots indicate the presence of a workflow ticket in progress.

Automation status

This column shows the current automation status of each workload. You can quickly filter the data by automation status, prioritizing and focusing on the most relevant workloads for further investigation.

Multiselect is available.

Type

This column identifies the workload type (e.g., Deployment, StatefulSet). You can use filtering, sorting, and multi-select options to tailor the data display, making it easier to focus on specific workload types.

Namespace

The namespace column shows the namespace of each workload. You can apply filtering, sorting, and multi-select options to customize the data display, allowing you to focus on specific namespaces.

If PerfectScale does not detect any workload in a Namespace for 7 consecutive days, that Namespace is consolidated into a separate Namespace, __deleted-namespaces__.
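The consolidation rule can be sketched as follows. This is only an illustration: the function name, the `grace_days` parameter, and the choice to treat the threshold as met once 7 full days have passed since the last detection are assumptions, not PerfectScale's documented implementation.

```python
from datetime import date

DELETED_NAMESPACES = "__deleted-namespaces__"


def displayed_namespace(name: str, last_workload_seen: date, today: date,
                        grace_days: int = 7) -> str:
    """Return the namespace name to show in the table.

    Assumption: a namespace is consolidated once `grace_days` full days
    have passed since a workload was last detected in it.
    """
    days_without_workloads = (today - last_workload_seen).days
    if days_without_workloads >= grace_days:
        return DELETED_NAMESPACES
    return name
```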

Running Hours

The workload running hours column indicates the total duration each workload, including its replicas, has been actively running in the cluster during the selected period. You can use the sorting option to arrange the data in your preferred order.

Cost/h

The workload cost per hour column indicates the total hourly expense of the workload. You can use the sorting option to arrange the data in your preferred order.

Total cost

The workload total cost column shows the total expense of the workload for the selected period, considering both its hourly cost and the duration it has been actively running. You can easily identify the most costly workloads in the cluster with a single click using the sorting option.

Increase Needed

The increase needed column shows the projected rise in workload cost based on PerfectScale’s recommendations, indicating that the workload is under-provisioned. This helps you predict the cost adjustments required to maintain cluster stability.

Pod Waste

The workload waste column indicates the cost of over-provisioned resources allocated to a workload and represents the potential savings achievable through PerfectScale's recommendations. You can easily identify the most wasteful workloads in the cluster with a single click using the sorting option.

Container

The container column lists the containers associated with each workload. You can use filtering options to display the data for a specific container(s). Multi-select is available.

View Customization

Recommendations view

Name
Description

CPU Request

PerfectScale guidelines for CPU Request.

CPU Limit

PerfectScale guidelines for CPU Limit.

Memory Request

PerfectScale guidelines for Memory Request.

Memory Limit

PerfectScale guidelines for Memory Limit.

To customize your recommendations view, use the Resource Change View drop-down menu.

  • Detailed - to display the changes made to resources (shows both the previous and new values).

  • Total Impact in Units - to display changes made to resources as an absolute number, factoring in replica count.

  • Single Instance Impact in Units - to display changes made to resources as an absolute number.

  • Single Instance Impact in % - to display changes made to resources in a percentage format.

When the recommendation view is set to Total Impact in Units, the resource change impact summary is available. This view provides a clear understanding of the effect of total resource adjustments, enabling seamless evaluation of the optimization process.
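The four Resource Change View options can be thought of as different presentations of the same current and recommended values. The sketch below is a hypothetical illustration of that relationship; the `resource_change` function and its arguments are not part of the product, and the exact rounding and units used in the UI are not covered here.

```python
def resource_change(current: float, recommended: float, replicas: int) -> dict:
    """Express one resource change in each of the four view formats."""
    delta = recommended - current
    return {
        # Previous and new values side by side.
        "Detailed": (current, recommended),
        # Absolute change multiplied by the replica count.
        "Total Impact in Units": delta * replicas,
        # Absolute change for a single pod.
        "Single Instance Impact in Units": delta,
        # Relative change for a single pod; undefined when current is 0.
        "Single Instance Impact in %": (delta / current * 100) if current else None,
    }
```

For a CPU request lowered from 500m to 250m on a workload with 4 replicas, this gives a single-instance change of -250m (-50%) and a total impact of -1000m.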

Labels and Policies view

Using your existing labels can help you manage the workloads more effectively by allowing you to focus on the most important ones.

PerfectScale collects and supports Workloads and Namespaces labels.

To customize the Labels View, PerfectScale allows you to choose two label keys. Each column in the Labels Table corresponds to a selected key and displays its relevant data for each workload.

To configure the label, click on the gear button. Then, choose the desired labels to be displayed and click the Apply button. Once the changes are applied, the values that correspond to the selected keys for workloads will be displayed.

When configuring the label view, it is possible to operate with the labels of Workloads and Namespaces. All the labels appear in the same list.

The Workload labels have higher precedence than Namespace labels. If the Workload label and Namespace label have the same name, only the Workload label will be displayed.
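The precedence rule above behaves like a dictionary merge in which workload labels are applied last. The following is a minimal sketch; `effective_labels` and the sample label keys are hypothetical.

```python
def effective_labels(namespace_labels: dict, workload_labels: dict) -> dict:
    """Merge labels so that a workload label wins over a namespace label
    with the same key."""
    merged = dict(namespace_labels)
    merged.update(workload_labels)  # workload values overwrite namespace values
    return merged
```

For instance, if the namespace carries `team=platform` and the workload carries `team=checkout`, only `team=checkout` is shown for that workload.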

The label set listed in the Podfit Labels Profile attached to the cluster takes precedence over any manually applied labels. If the cluster has a Podfit Labels Profile attached, it will always revert to its label set. However, if no such profile is attached, any manual label changes will be saved.

HPA view

The HPA view provides a clear overview of workloads utilizing Horizontal Pod Autoscaler (HPA). This feature enables users to quickly identify the workloads where HPA has been introduced and adjust HPA thresholds with provided informative tooltips that offer tailored recommendations. These recommendations are particularly helpful in optimizing scaling decisions, minimizing resource waste, and ensuring efficient operation of workloads.

Column
Description

HPA

Indicates whether HPA has been introduced for the workload. You can easily sort the column by clicking the header or apply specific filters.

CPU (%)

Displays the trigger for HPA by CPU. For insights on threshold recommendations, simply hover over the warning tooltip. You can easily sort the column by its values by clicking the header.

There are two types of indicators to be aware of:

  • A red indication signifies that the threshold is below 60%, indicating potential significant CPU waste.

  • A yellow indication suggests that the threshold falls between 60% and 80%, pointing to potential moderate CPU waste.

Memory (%)

Displays the trigger for HPA by Memory. For insights on threshold recommendations, simply hover over the warning tooltip. You can easily sort the column by its values by clicking on the header.

There are two types of indicators to be aware of:

  • A red indication signifies that the threshold is below 60%, indicating potential significant Memory waste.

  • A yellow indication suggests that the threshold falls between 60% and 80%, pointing to potential moderate Memory waste.

Custom metric

Indicates whether a Custom metric has been detected. You can easily sort the column by clicking the header or apply specific filters.
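The CPU and Memory indicator rules in this table can be summarized in a few lines. This is a sketch under stated assumptions: the function name is hypothetical, and treating exactly 60% and exactly 80% as yellow is a guess, since the page does not say whether the boundaries are inclusive.

```python
def hpa_threshold_indicator(threshold_percent: float):
    """Return the warning color for an HPA CPU or Memory threshold,
    or None when no warning applies."""
    if threshold_percent < 60:
        return "red"      # potential significant waste
    if threshold_percent <= 80:
        return "yellow"   # potential moderate waste
    return None
```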

Detailed workload analysis

The zoom-in window provides comprehensive details of the workload's current state and behavior, along with historical data over time, delivering detailed metrics and unmatched visibility on resource utilization efficiency and performance risks. It provides actionable recommendations for adjusting resource allocations to enhance performance and minimize waste, and emphasizes the impact once they are implemented. Additionally, users can explore the Revision History, which displays all updates and changes, including automated or manual adjustments made to the workload, simplifying further analysis and helping track optimization progress over time.

By clicking on the workload, you will be directed to its zoom-in window:

Top panel

The top panel displays the currently selected workload and offers easy access to the Workload Optimization Policy management menu as well as the actions menu.

Workload Optimization Policy

This displays the optimization policy of the selected workload. The optimization policy specifies how resources should be allocated to achieve the desired level of resiliency and meet application demand. This ensures that your system maintains optimal performance and stability according to your predefined standards.

  • MaxSavings - the best fit for non-production environments (Low Resiliency)

  • Balanced (default) - optimally balances cost and resiliency (Medium Resiliency)

  • ExtraHeadroom - the best fit for latency-sensitive environments (High Resiliency)

  • MaxHeadroom - keeps the environment above the highest spikes (Highest Resiliency)

Set ExtraHeadroom or MaxHeadroom Optimization Policy with just a few clicks for your mission-critical production services, ensuring continuous optimal performance.

To change the policy for the workload, select the desired one from the drop-down list and click the Save button to apply the changes.

The Optimization Policy can be set for the entire cluster and for a specific workload. The workload's Optimization Policy takes precedence and will override the value defined at the cluster level. If the Optimization Policy is not specified for the workload, PerfectScale will use the default policy set for the cluster.
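The fallback described above can be sketched as a simple lookup. The `effective_policy` helper is hypothetical and only illustrates the precedence order; it is not how PerfectScale stores or resolves policies internally.

```python
from typing import Optional


def effective_policy(cluster_policy: str,
                     workload_policy: Optional[str] = None) -> str:
    """Workload-level policy overrides the cluster-level policy;
    the cluster policy is used when the workload has none."""
    if workload_policy is not None:
        return workload_policy
    return cluster_policy
```

A workload with no policy of its own in a Balanced cluster resolves to Balanced, while a workload set to MaxHeadroom keeps MaxHeadroom regardless of the cluster setting.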

Actions

The actions menu provides quick access to various tasks for streamlined workload management.

Workload summary panel

This panel provides a comprehensive overview of key cost metrics, highlighting potential savings and identifying existing performance risks. Additionally, it shows the average number of observed workload replicas and indicates whether HPA has been introduced, along with its associated thresholds.

At the top of the panel, you can see the type of the selected workload, along with the corresponding namespace and cluster. Running hours (Running Hrs) refers to the total duration the workload, including its replicas, has been actively running in the cluster over the last 30 days.

Cost reflects the total expenses associated with the workload over the past 30 days.

Waste reflects the total price of unutilized resources associated with the workload over the past 30 days.

Potential savings shows the reducible workload cost through PerfectScale's recommendations, all while maintaining peak performance.

The revision section shows the ID of the selected revision (the current by default), its start and end time, and the associated risks overview:

  • The risk level associated with the revision performance issues, based on their potential impact

  • The count of the most pressing risks associated with the revision, such as evictions, observed restarts, or when the maximum configured limit of replicas for HPA is reached, if detected.

  • Comprehensive overview of all risks per container associated with the particular revision. Click View All Risks to access the full list of the risks.

This view is particularly helpful for quickly evaluating the workload's health at any given moment over the past 30 days.

Replicas displays the average number of pod replicas.

The HPA section shows whether HPA has been introduced in the workload, displays its associated thresholds, and indicates if the custom metric has been detected.

Workload details panel

The workload details panel offers unparalleled visibility into workload resource utilization at any moment over the past 30 days, identifying inefficiencies in real-time and uncovering new optimization opportunities. Providing data-driven optimization recommendations empowers you to take action proactively, enhance efficiency, and reduce costs while maintaining high performance.

Recommendations

The recommendations are only available for the current revision.

To apply the recommendations manually, click the View as YAML button and deploy the recommendations to your cluster.

Automation Limited by Rule

When one or more resources have reached their CRD-defined size constraints, the recommendations cannot be executed by automation. In this case, the relevant indicator will be displayed on the recommendations panel.

To get quick access to the automation configuration associated with the workload, click the View Config CR button.

Historical versions of specific Config CRs can be easily accessed, allowing for a comprehensive review of their changes over time.

Clicking Show History will open an additional panel with the list of historical CR versions, enabling you to review all the changes that were made to the CR configuration over time. Select any previous version to preview it with highlighted changes, showing the differences between the selected version and the current one.

CPU and Memory Over Time

These widgets provide a granular historical view of CPU and Memory utilization per container over the past 30 days, including the p90, p95, and p99.9 utilization percentiles, enabling you to seamlessly evaluate the efficiency of resource distribution by leveraging a comprehensive visual comparison of these values with set resource requests and limits.

  1. Container selector allows you to easily switch between containers to display the data for the specific container. Click the drop-down menu and select the needed container from the list.

  2. Control panel includes toggles that allow you to easily add or remove quantile lines from the chart. This feature is especially useful for managing data display, enabling smooth workload analysis with just a few clicks. Clicking on the toggles will either include or exclude the corresponding quantile lines from the chart.

Cost vs Waste

This widget provides a comprehensive historical cost and waste overview across all containers within the workload. It is particularly helpful for understanding cost and waste trends as well as identifying anomalies and spikes.

Cost is determined by the maximum of resources allocated or used (p90) on each machine. Any remaining machine headroom is not distributed across multiple workloads.
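A minimal sketch of the cost rule above, assuming a single resource dimension and a flat price per unit. The function names and the `unit_price` parameter are hypothetical, and real pricing (per resource type, per node type) is more involved than this example.

```python
def machine_cost(allocated: float, used_p90: float, unit_price: float) -> float:
    """Cost on one machine: the larger of allocated and used (p90) resources,
    multiplied by an assumed price per unit."""
    return max(allocated, used_p90) * unit_price


def workload_cost(per_machine, unit_price: float) -> float:
    """Sum the cost over all machines.

    `per_machine` is a list of (allocated, used_p90) pairs, one per machine.
    """
    return sum(machine_cost(a, u, unit_price) for a, u in per_machine)
```

On a machine where a workload is allocated 1.0 unit but its p90 usage is 1.25 units, the sketch bills 1.25 units, because usage above the allocation still consumes machine capacity.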

Replicas

This widget provides a comprehensive view of workload replicas, allowing you to track scaling trends over time. This view supports both scaling scenarios: a static replica count when HPA is not used and dynamic scaling when HPA has been introduced.

When the number of replicas is static, PerfectScale shows the average number of replicas captured.

When HPA is enabled for a workload, the replicas widget provides a comprehensive view of all key configured parameters alongside the actual scaling values over time. On the right side of the widget, you’ll also find the configured HPA triggers and an indicator showing whether the maximum replica count has been reached, helping you monitor scaling behavior, detect spikes, and ensure efficient resource allocation.

Revisions History

Effective optimization of the environment requires understanding the release content and its impact on cost and resilience. PerfectScale built an advanced solution to address such issues and provide users with a comprehensive breakdown of every revision for each container. This enables easy comparison of versions to track issues and remediation effectiveness.

This view is particularly helpful for tracking when, why, and how resource allocations have changed over time, whether the change has been propagated manually or through automation, enabling users to streamline their investigations and remain aware of the impact of those changes.

Revisions History Timeline displays all the revisions for the last 30 days, where each block corresponds to a particular revision. Hover over the revision to see its details:

  • Revision ID

  • Date

  • Risks

  • The number of Restarts

The Optimization Policy can be set for the entire cluster and for a specific workload. The workload's Optimization Policy takes precedence and will override the value defined at the cluster level.

Discover more about customizing the Optimization Policy here.

Current Risks shows the total risks identified within the cluster for the selected period. This value is dynamic and updates based on the filters applied in the workload table.

Status indicates workloads at risk. Workloads can be easily filtered by the resiliency risk level or by a particular indicator. Risk indicators are dynamic, i.e., the presence of the OOM indicator in the list means that at least one workload experienced an out-of-memory event in the given timeframe.

Easily jump between Recommendations, Labels and Policies, and HPA views using the switcher above the table.

The Recommendations Table offers clear insights into necessary workload resource adjustments to maintain the cluster's stability and cost efficiency. To access more information, click on the workload. This will open up a Zoom-in window that provides a comprehensive breakdown.

If one or more resources have reached their CRD-defined size constraints, the recommendations will not be executed. In this case, the Limited by Rule indicator, along with an explanatory tooltip, will be displayed near the recommendations.

Learn more about resource allocation constraints here.

Podfit Labels Profile enables users to create and save sets of labels, which can then be applied to clusters. The label set listed in the Podfit Labels Profile will be applied to the clusters attached to this profile. Learn here how to configure the profile.

Optimization Policy outlines how resources should be allocated to meet the unique requirements of each workload. The Optimization Policy can be set for the entire cluster and for a specific workload.

Discover more about customizing the Optimization Policy here.

The previous version of the Zoom-in window is accessible. To change to this version, click the Switch to Legacy UI button in the Actions menu.

Clicking View in Observability will direct you to the observability tool connected to the cluster. Learn more here about how to integrate your preferred observability tool and receive insights from PerfectScale directly in your dashboard.

Click Create Ticket to create a ticket in the defined project with all the details about the needed changes and assign it automatically to the relevant engineer or team. Learn how to integrate your Jira with PerfectScale here.

If the ticket already exists, you can use one of the following options: View Task or Delete Task.

Mute Workload is a useful feature when you want to stop receiving notifications for a specific workload. By muting it, you'll no longer get alerts related to that workload, even if there's an Alert Profile linked to it. If you want to start receiving alerts for the previously muted workload, click Un-Mute Workload in the same menu.

Clicking Revert to Default Layout will reset the order of the widgets in the Workload details panel.

You can easily manage the order of widgets in the workload details panel. Click the widget name and drag the widget up or down to the desired place on the panel. To reset the widgets to the default order, select Revert to Default Layout from the Actions menu.

The recommendations widget provides recommendations for workload right-sizing. With this comprehensive view, you can effortlessly review current resource requests and limits per container, followed by the recommended values based on the actual resource consumption.

You can seamlessly configure autonomous workload optimization to actively maintain your environment in prime condition and ensure peak K8s performance at minimal cost. Check the automation status in the top right corner (learn more about automation statuses here).

Learn more about resource allocation constraints here.

Use the gear button to define the custom percentile.

The Recommendations section displays the resource requests and limits recommendations for the selected container compared to the current values.

Click the controls above the graph to display or conceal specific parameters.

Clicking the revision will highlight this revision on the charts and display the corresponding data on the Workload summary panel, enabling you to access all the needed data and streamline the analysis with a single click.

Status
Description

Active

Once the configuration is completed, automation will be indicated as successfully enabled.

Limited by Rule

When one or more resources have reached the size constraint you configured in the CRD, the recommendations can't be executed. The indicator will also be displayed in the Zoom-in recommendation panel. Learn more about resource allocation constraints here.

Delayed

If the defined CRD causes time constraints, the execution of recommendations will be postponed.

Disabled

The merged CRD will disable automation for the workload. For example, if the cluster-level configuration enables automation while the namespace-level configuration disables it, the namespace-level configuration takes precedence, resulting in disabled automation for the particular workloads within the cluster.

Stopped

PerfectScale will forcibly stop the automation, for example to protect your environment from recursive resource increases such as those resulting from memory leaks.
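The Disabled status depends on how the cluster-level and namespace-level configurations merge. The sketch below illustrates one way to read that rule: the most specific configuration that sets a value wins. Only the namespace-over-cluster precedence comes from the table above; extending it to the workload level, the default of disabled when nothing is set, and the `automation_enabled` helper itself are assumptions.

```python
from typing import Optional


def automation_enabled(cluster: Optional[bool] = None,
                       namespace: Optional[bool] = None,
                       workload: Optional[bool] = None) -> bool:
    """Resolve the merged automation setting.

    Each argument is True (enabled), False (disabled), or None (not set
    at that level). The most specific level that sets a value wins.
    """
    for setting in (workload, namespace, cluster):
        if setting is not None:
            return setting
    return False  # assumption: automation is off when nothing is configured
```

With automation enabled for the cluster and disabled for a namespace, workloads in that namespace resolve to disabled, matching the example in the table.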
