GPU optimization
Reduce cloud GPU costs with real-time utilization insights
PerfectScale provides visibility into GPU utilization so you can monitor and optimize GPU resources within your Kubernetes clusters. This feature helps teams identify optimization opportunities, reduce resource waste, and improve overall K8s efficiency.
To enable GPU visibility in PerfectScale, install the NVIDIA DCGM exporter and set the GPU-specific configuration parameters when deploying or upgrading the PerfectScale agent. Learn more here.
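Before configuring the agent, it can help to confirm that the DCGM exporter is actually emitting GPU metrics. The exporter publishes metrics in the Prometheus text format, and `DCGM_FI_DEV_GPU_UTIL` is the DCGM field for GPU utilization. The following is a minimal sketch, not PerfectScale's own mechanism: the sample payload, its label values, and the helper name `parse_gpu_util` are assumptions for illustration, and in practice you would feed in the text fetched from your exporter's metrics endpoint.

```python
import re

# Illustrative sample of Prometheus text exposition (assumption: label
# names and values here are made up; your exporter's labels may differ).
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 3
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a"} 10240
"""

# Matches one utilization sample line: metric name, label set, numeric value.
LINE_RE = re.compile(r'^DCGM_FI_DEV_GPU_UTIL\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_gpu_util(text):
    """Return a list of (labels, utilization_percent) for each GPU sample.

    Hypothetical helper: it only understands the simple line shape above,
    not the full Prometheus exposition grammar.
    """
    samples = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels")))
            samples.append((labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for labels, util in parse_gpu_util(SAMPLE_METRICS):
        print(f"GPU {labels.get('gpu')} on {labels.get('Hostname')}: {util:.0f}%")
```

If the parsed list is empty for a node that has GPUs, the exporter is likely not running there or not exposing utilization, and GPU widgets may not appear in the UI.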
When PerfectScale detects active GPU resources within a cluster, it automatically enables GPU-specific widgets and utilization insights in the UI. These components provide detailed metrics on GPU usage, allocation efficiency, and workload distribution, enabling data-driven K8s optimization.
PodFit GPU visibility
To quickly identify GPU-allocated workloads in PodFit, switch to the GPU view by clicking the GPU tab in the view selector, as shown below, then sort the table by GPU usage. This brings all GPU-consuming workloads to the top.

Click a workload to open the detailed Zoom-in view. This panel provides in-depth information about the workload’s current state and behavior, along with historical data on resource allocation and utilization over time. It includes GPU utilization metrics, other resource usage, and detected performance risks. To learn more about zoom-in capabilities, go here.

InfraFit GPU visibility
To see detailed GPU usage across your infrastructure, go to InfraFit. The GPU chart shows how much GPU capacity is in use versus how much was requested, making it easy to spot inefficiencies and find ways to optimize.

This view lets you compare requested GPU resources with actual usage and pinpoint underutilized or idle GPU capacity across your clusters.
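The requested-versus-used comparison can be illustrated with a small calculation. This is a sketch under stated assumptions, not how PerfectScale computes its chart: the `NodeGroupGpu` type, the `underutilized` helper, the 50% threshold, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeGroupGpu:
    """Hypothetical per-node-group GPU summary (names are illustrative)."""
    name: str
    requested_gpus: float  # GPUs requested by workloads in the group
    used_gpus: float       # average GPUs actually in use

    @property
    def idle_gpus(self) -> float:
        # Requested capacity that is not being used; never negative.
        return max(self.requested_gpus - self.used_gpus, 0.0)

    @property
    def utilization(self) -> float:
        # Share of requested GPUs in use; 0.0 when nothing is requested.
        if self.requested_gpus <= 0:
            return 0.0
        return self.used_gpus / self.requested_gpus


def underutilized(groups, threshold=0.5):
    """Return groups below the utilization threshold, most idle GPUs first."""
    flagged = [g for g in groups if g.requested_gpus > 0 and g.utilization < threshold]
    return sorted(flagged, key=lambda g: g.idle_gpus, reverse=True)


# Made-up sample data for illustration only.
GROUPS = [
    NodeGroupGpu("training", requested_gpus=8, used_gpus=6.4),
    NodeGroupGpu("inference", requested_gpus=4, used_gpus=1.0),
    NodeGroupGpu("batch", requested_gpus=2, used_gpus=0.0),
]

if __name__ == "__main__":
    for group in underutilized(GROUPS):
        print(f"{group.name}: {group.utilization:.0%} used, {group.idle_gpus:.1f} GPUs idle")
```

Sorting by idle GPUs rather than by utilization percentage puts the largest absolute waste first, which is usually where right-sizing pays off most.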
Click a node group to get a granular breakdown of the individual instances within that group, along with key metrics for each one.

Click a specific instance to display the list of workloads running on that machine, allowing for deeper investigation and analysis.