Infrafit | node right-sizing

InfraFit is an advanced feature that provides a granular view of resource allocation across the various nodes supporting your Kubernetes clusters.

Infrafit overview

Infrafit is an advanced feature that provides comprehensive visibility into the entire Kubernetes environment at the infrastructure level. It helps you understand the behavior of specific node groups and node types, and provides actionable, data-driven insights for optimizing the infrastructure underlying your workloads. Infrafit also offers a clear historical view of node utilization across the entire environment.

This level of visibility allows you to identify idle capacity, adjust node sizes based on actual resource utilization, and select the optimal node types for your workloads. Additionally, Infrafit provides a unique view into scheduling results, showing which workloads are being scheduled to each node group, what is preventing scale-down, and what causes overcommitment over time.

Infrafit & Podfit connectivity

Infrafit and Podfit together offer a multidimensional approach to reducing wasted resources in your nodes. When you drill down into an individual node type or node group, you can see which workloads are being scheduled to it, what is preventing scale-down, and what causes overcommitment over time.

While Podfit helps you right-size your workloads based on their actual resource utilization, Infrafit provides data-driven recommendations that help you choose the optimal node type to best serve the needs of your workloads.

These recommendations can help you bin-pack workloads more efficiently and size your nodes to maximize resource utilization. They also help you lower your overall node count while improving the efficiency of your node autoscalers (such as Karpenter or Cluster Autoscaler), keeping your costs optimized.

By combining the capabilities of both features, you gain a comprehensive understanding of your infrastructure and workloads, enabling more efficient management and cost reduction.

Key benefits

Integrated Insights: Infrafit provides detailed visibility into node groups and node types, while Podfit offers insights into workload scheduling and resource allocation. Together, they give you a holistic view of your environment.
Optimized Resource Allocation: By analyzing the data from Infrafit and Podfit, you can make informed decisions on adjusting node sizes, selecting the best nodes for your workloads, improving bin-packing, and optimizing the performance of node-autoscalers (e.g., Karpenter or Cluster Autoscaler).
Enhanced Efficiency: Identify workloads that prevent scaling down or cause overcommitment and adjust resource allocations accordingly. This helps in lowering the overall node count while maximizing resource utilization.
Cost Reduction: With a clear understanding of resource usage and waste, you can implement strategies to reduce costs and improve the efficiency of your infrastructure.

How it works

Workload Scheduling Visibility: Infrafit provides a unique view into scheduling results, showing which workloads are being scheduled to each node group, what prevents scaling down, and what causes overcommitment over time.
Resource Utilization Insights: Podfit complements this by offering detailed insights into workload runtime and resource usage, helping you identify areas of improvement.
Data-Driven Recommendations: Together, Infrafit and Podfit provide actionable recommendations to optimize your Kubernetes environment, ensuring that resources are allocated efficiently and costs are minimized.

By combining Infrafit and Podfit, you can achieve a more efficient, cost-effective, and well-optimized Kubernetes infrastructure. The rest of the documentation explains how to use these capabilities.

Upper Panel

1. Tenant name

Displays the name of the account and enables you to switch between different accounts quickly.

2. Clusters drop-down

This menu allows you to switch between clusters and displays the associated data.

3. In-app path

Shows your current location within the app, helping you easily navigate and understand where you are in the interface.

4. Optimization Policy

Displays the cluster's optimization policy. Optimization policy allows you to specify how your resources should be allocated in order to support the individual needs of your workloads. Define the policies that best suit your environment and business goals, depending on whether you want to maximize cost savings or provide extra headroom to maintain the resilience of mission-critical services.

MaxSavings - maximum cost savings, the best for non-production environments
Balanced (default) - optimally balances cost and resiliency
ExtraHeadroom - the best fit for latency-sensitive environments
MaxHeadroom - keeps the environment above the highest spikes

The Optimization Policy can be set for the entire cluster or for a specific workload. A workload's Optimization Policy takes precedence and overrides the value defined at the cluster level.
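The precedence rule can be sketched in a few lines of Python. This is a minimal illustration, not PerfectScale's implementation: the `effective_policy` helper and its arguments are hypothetical, and only the four policy names and the Balanced default come from the list above.

```python
from typing import Optional

# Policy names as listed above; "Balanced" is the documented default.
POLICIES = ("MaxSavings", "Balanced", "ExtraHeadroom", "MaxHeadroom")
DEFAULT_POLICY = "Balanced"


def effective_policy(cluster_policy: Optional[str],
                     workload_policy: Optional[str]) -> str:
    """Return the policy that applies to a workload (hypothetical helper).

    A workload-level policy, when set, overrides the cluster-level one.
    If neither is set, the default applies.
    """
    for policy in (workload_policy, cluster_policy):
        if policy is not None:
            if policy not in POLICIES:
                raise ValueError(f"unknown optimization policy: {policy}")
            return policy
    return DEFAULT_POLICY


# The cluster is tuned for savings, but one workload asks for maximum headroom.
print(effective_policy("MaxSavings", "MaxHeadroom"))  # workload value wins
print(effective_policy("MaxSavings", None))           # falls back to the cluster value
```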

If a custom policy is set through the exporter when installing the PerfectScale Agent, it cannot be modified in the UI afterward. You can still change the policy by upgrading the exporter with the new value, or you can return it to the default by upgrading the exporter without specifying any value (this will also enable the option to change the custom time window through the UI).

Discover more about customizing the Optimization policy here.

5. Timeframe

Allows you to check the data for a specific time period: click on the drop-down list in the upper right corner and select one of the options.

6. Export

Exports the displayed data to a .csv file so you can analyze and share it. Click the Export button, and the file is downloaded to your local machine within a few seconds.

Nodes Resource Utilization Over Time Panel

PerfectScale provides comprehensive GPU visibility when GPU nodes are detected in the cluster. This visibility enables you to monitor GPU usage in real time, identify underutilized or idle resources, and make informed decisions to optimize GPU allocation and reduce waste. Learn more about GPU optimization here.

Hover over a specific time point on the chart to view the data for that time.

1. Node Groups/Node Types selector

This selector allows easy switching between node groups and node types, giving you different perspectives of your infrastructure for more comprehensive analysis.

2. Utilization

This drop-down selector lets you choose the resource utilization percentile. The selection affects both the graphs and the Utilization chart in the table below, and helps you better understand resource utilization patterns.
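To see why the chosen percentile changes the picture, consider the sketch below. It uses the nearest-rank method, which is an assumption: the exact percentile calculation PerfectScale applies is not described here. The `percentile` helper and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample that is greater than or
    equal to at least pct percent of all samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical CPU usage samples (cores) with a single short spike.
cpu_cores_used = [1.0, 1.2, 1.1, 1.3, 4.0, 1.2, 1.1, 1.0, 1.2, 1.1]

for pct in (50, 90, 100):
    print(f"p{pct}: {percentile(cpu_cores_used, pct)} cores")
```

A lower percentile hides the short spike, while the 100th percentile is driven entirely by it, which is why the same node group can look comfortably idle or nearly saturated depending on the selection.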

3. Cost per Node Group/Node Type

Displays the cost of the Node Groups or Node Types over the selected timeframe.

The diagram displays the 10 node groups or node types with the highest costs.
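As a rough illustration of how such a ranking could be derived from per-node cost data, the sketch below sums cost per node group and keeps the most expensive ones. The input shape and the `top_groups_by_cost` helper are hypothetical and not part of PerfectScale.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_groups_by_cost(node_costs: Iterable[Tuple[str, float]],
                       limit: int = 10) -> List[Tuple[str, float]]:
    """Sum cost per group and return the `limit` most expensive groups.

    `node_costs` yields (group_name, node_cost) pairs; this input shape is
    an assumption made for the example.
    """
    totals: Dict[str, float] = defaultdict(float)
    for group, cost in node_costs:
        totals[group] += cost
    # Highest total first; ties are ordered by name so the result is stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical per-node costs over the selected timeframe.
sample = [("general", 120.0), ("gpu", 300.0), ("general", 80.0), ("spot", 50.0)]
print(top_groups_by_cost(sample, limit=2))
```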

4. CPU Over Time

Displays the allocated, requested, and used amounts of CPU (cores) at the selected utilization percentile over the selected timeframe.

5. Memory Over Time

Displays the allocated, requested, and used amounts of Memory (GB) at the selected utilization percentile over the selected timeframe.
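The gaps between the three series on the CPU and Memory charts are what point to waste. The sketch below is an illustration only: it assumes that "allocated" means the capacity provisioned on the nodes, and the `ResourcePoint` type and its property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourcePoint:
    """One chart sample for a resource, in cores (CPU) or GB (memory)."""
    allocated: float  # assumed: capacity provisioned on the nodes
    requested: float  # sum of workload requests
    used: float       # usage at the selected percentile

    @property
    def unrequested(self) -> float:
        """Capacity that no workload has requested."""
        return self.allocated - self.requested

    @property
    def over_requested(self) -> float:
        """Capacity that is requested but not used; never negative."""
        return max(self.requested - self.used, 0.0)

    @property
    def utilization(self) -> float:
        """Used share of allocated capacity; 0.0 when nothing is allocated."""
        return self.used / self.allocated if self.allocated else 0.0


# Hypothetical sample: 64 cores allocated, 40 requested, 16 actually used.
point = ResourcePoint(allocated=64.0, requested=40.0, used=16.0)
print(point.unrequested, point.over_requested, point.utilization)
```

In this reading, a large unrequested gap suggests the node group could be smaller or packed more tightly, while a large over-requested gap suggests workload requests could be lowered first.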

Data Table Display Options

Use the Node Groups/Node Types switcher to change between the two table views.


