GPU optimization

Reduce cloud GPU costs with real-time utilization insights

PerfectScale currently supports only NVIDIA Data Center GPU Manager (DCGM) as a source of GPU metrics. GPU support is available starting with exporter version 1.0.55.
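If you track exporter versions in your own tooling, the version requirement above can be checked with a short script. The following is a minimal sketch, assuming the exporter version is available to you as a dotted string such as `1.0.55`; the function names are hypothetical and are not part of any PerfectScale API.

```python
# Minimum exporter version with GPU support, as stated in this page.
MIN_GPU_EXPORTER_VERSION = (1, 0, 55)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.0.55' or 'v1.0.55' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def supports_gpu(exporter_version: str) -> bool:
    """Return True if the given exporter version is 1.0.55 or later."""
    return parse_version(exporter_version) >= MIN_GPU_EXPORTER_VERSION


if __name__ == "__main__":
    for candidate in ("1.0.54", "1.0.55", "1.1.0"):
        print(candidate, supports_gpu(candidate))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating `1.0.9` as newer than `1.0.55`.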

PerfectScale provides GPU utilization visibility so you can monitor and optimize GPU resources within your Kubernetes clusters. This feature helps teams identify optimization opportunities, reduce resource waste, and improve overall K8s efficiency.

When PerfectScale detects active GPU resources within a cluster, it automatically enables GPU-specific widgets and utilization insights in the UI. These components provide detailed metrics on GPU usage, allocation efficiency, and workload distribution, enabling data-driven K8s optimization.

PodFit GPU visibility

To quickly identify GPU-allocated workloads in PodFit, switch to the GPU view by clicking the GPU tab in the view selector, as shown below, then sort the table by GPU usage. This brings all GPU-consuming workloads to the top.

GPU view - PodFit

Click a workload to open the detailed Zoom-in view. This panel provides in-depth information about the workload’s current state and behavior, along with historical data on resource allocation and utilization over time. It includes GPU utilization metrics alongside other resource usage and any detected performance risks. To learn more about zoom-in capabilities, go here.

Workload details - GPU widget

InfraFit GPU visibility

To see detailed GPU usage across your infrastructure, go to InfraFit. The GPU chart compares the GPU capacity actually in use with the GPU capacity that was requested.

GPU view - InfraFit

This view helps you quickly evaluate the difference between requested GPU resources and actual usage, making it easy to pinpoint underutilized or idle GPU capacity across your clusters.
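The requested-versus-used comparison can be illustrated with a few lines of code. This is a minimal sketch with made-up numbers, not PerfectScale's own calculation: the `NodeGroupGpu` type, its fields, and the sample values are all hypothetical, and it assumes you have per-node-group figures for requested and used GPUs from your own metrics source.

```python
from dataclasses import dataclass


@dataclass
class NodeGroupGpu:
    """Hypothetical per-node-group GPU figures (not a PerfectScale data model)."""
    name: str
    requested_gpus: float  # GPUs requested by workloads in the group
    used_gpus: float       # GPUs' worth of capacity actually in use


def idle_gpus(group: NodeGroupGpu) -> float:
    """Requested GPU capacity that is not being used (never negative)."""
    return max(group.requested_gpus - group.used_gpus, 0.0)


def utilization_pct(group: NodeGroupGpu) -> float:
    """Used capacity as a percentage of requested capacity."""
    if group.requested_gpus == 0:
        return 0.0
    return 100.0 * group.used_gpus / group.requested_gpus


if __name__ == "__main__":
    # Illustrative sample data only.
    groups = [
        NodeGroupGpu("training", requested_gpus=8, used_gpus=6.0),
        NodeGroupGpu("inference", requested_gpus=4, used_gpus=1.0),
    ]
    # List the groups with the most idle requested capacity first.
    for group in sorted(groups, key=idle_gpus, reverse=True):
        print(
            f"{group.name}: {idle_gpus(group):.1f} idle GPUs, "
            f"{utilization_pct(group):.0f}% of requested capacity in use"
        )
```

Sorting by idle capacity rather than by utilization percentage surfaces the groups where the absolute waste is largest, which is usually where optimization has the most cost impact.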

Clicking a specific node group shows a granular breakdown of the individual instances within that group, along with key metrics for each one.

GPU utilization by instance

Clicking a specific instance displays the list of workloads running on that machine, allowing for deeper investigation and analysis.
