How to Define a pGPU Cost Model
TOC
- Prerequisites
- About Alauda Build of NVIDIA GPU Device Plugin
- Procedure
- Create a PrometheusRule to Generate the Needed Metrics
- Add Collection Config (Cost Management Agent)
- Add Display/Storage Config (Cost Management Server)
- Add Price For a pGPU Cost Model
- Billing Method Description
- Cost Details and Cost Statistics
Prerequisites
The following must be installed in the GPU cluster:
- Alauda Build of NVIDIA GPU Device Plugin
- Cost Management Agent
About Alauda Build of NVIDIA GPU Device Plugin
The NVIDIA device plugin for Kubernetes is a DaemonSet that allows you to automatically:
- Expose the number of GPUs on each node of your cluster.
- Keep track of the health of your GPUs.
- Run GPU-enabled containers in your Kubernetes cluster.
Procedure
Create a PrometheusRule to Generate the Needed Metrics
Create a PrometheusRule in the GPU cluster.
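The rule definitions themselves are not reproduced in this section. As a rough illustration of the shape of such a resource, the sketch below builds a PrometheusRule manifest (Prometheus Operator API group monitoring.coreos.com/v1) and prints it as JSON, which kubectl accepts as well as YAML. The resource name, namespace, group name, recording rule name, and expression are all hypothetical placeholders, not the rules the Cost Management Agent expects. The expression assumes kube-state-metrics is deployed and exposes kube_pod_container_resource_requests.

```python
import json

# Hypothetical PrometheusRule manifest. Every name and the expression below
# are placeholders for illustration, NOT the rules required by the product.
rule = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "PrometheusRule",
    "metadata": {
        "name": "pgpu-cost-rules",   # hypothetical name
        "namespace": "monitoring",   # hypothetical namespace
    },
    "spec": {
        "groups": [
            {
                "name": "pgpu.rules",  # hypothetical group name
                "rules": [
                    {
                        # Hypothetical recording rule: whole GPUs requested per pod,
                        # assuming kube-state-metrics is installed.
                        "record": "namespace_pod:pgpu_requests:sum",
                        "expr": (
                            "sum by (namespace, pod) "
                            '(kube_pod_container_resource_requests{resource="nvidia_com_gpu"})'
                        ),
                    }
                ],
            }
        ]
    },
}

# Print the manifest as JSON so it can be saved to a file.
print(json.dumps(rule, indent=2))
```

If the output is saved to a file, it could be applied with kubectl apply -f in the GPU cluster. Replace the placeholder rule with the rules your Cost Management Agent configuration actually queries.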
Add Collection Config (Cost Management Agent)
Create a ConfigMap in the GPU cluster where the Cost Management Agent runs to declare what to collect.
After adding the YAML, restart the Agent Pod to reload the configuration.
Add Display/Storage Config (Cost Management Server)
Create a ConfigMap in the cluster where the Cost Management Server runs to declare billing items, methods, units, and display names. This tells the server what and how to bill.
After adding the YAML, restart the Server Pod to reload the configuration.
Add Price For a pGPU Cost Model
Billing Method Description
If the GPU cluster does not have a cost model yet, create one first. Then add a price to the GPU cluster's cost model:
- Select pGPU in Billing Items.
- Select Request Usage (count-hours) or Project Quota (count-hours) in Method.
- Set Default Price.
- Configure Price By Label (optional).
Example:
key: modelName
value: "Tesla P100-PCIE-16GB" or "Tesla T4" or "NVIDIA A30" (obtain it by running nvidia-smi)
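To make the count-hours method concrete, here is a minimal sketch of how a default price and per-label prices might combine. It assumes that a price configured for a modelName label value takes precedence over the default price, and that cost equals GPU count multiplied by hours multiplied by the hourly price. The class, the function, and all price values are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass, field


@dataclass
class PGpuPricing:
    """Hypothetical pGPU price configuration (not a product API)."""
    default_price: float                                 # price per GPU per hour
    price_by_model: dict = field(default_factory=dict)   # modelName -> price per GPU per hour

    def price_for(self, model_name: str) -> float:
        # Assumption: a label-specific price overrides the default price.
        return self.price_by_model.get(model_name, self.default_price)


def pgpu_cost(pricing: PGpuPricing, model_name: str, gpu_count: int, hours: float) -> float:
    """Cost of gpu_count GPUs held for the given hours, billed in count-hours."""
    count_hours = gpu_count * hours
    return count_hours * pricing.price_for(model_name)


# Illustrative prices only.
pricing = PGpuPricing(
    default_price=1.0,
    price_by_model={"Tesla T4": 0.5, "NVIDIA A30": 2.0},
)

# 2 GPUs x 3 hours = 6 count-hours at the Tesla T4 label price of 0.5
print(pgpu_cost(pricing, "Tesla T4", gpu_count=2, hours=3))              # 3.0
# No label price is configured for this model, so the default price of 1.0 applies
print(pgpu_cost(pricing, "Tesla P100-PCIE-16GB", gpu_count=1, hours=4))  # 4.0
```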
Cost Details and Cost Statistics
After waiting an hour or more, you can view the cost details in Cost Details, broken down by namespace and card UUID. Total costs by cluster, project, and namespace are available in Cost Statistics.
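The relationship between the two views can be pictured as a roll-up: detail records carry namespace and card UUID, and statistics sum them along coarser dimensions. The sketch below illustrates that idea only. The record fields, sample values, and the total_by helper are hypothetical and do not reflect how the Cost Management Server stores its data.

```python
from collections import defaultdict

# Hypothetical cost-detail records; field names and values are illustrative only.
details = [
    {"cluster": "gpu-1", "project": "ml", "namespace": "train", "card_uuid": "GPU-aaa", "cost": 2.0},
    {"cluster": "gpu-1", "project": "ml", "namespace": "train", "card_uuid": "GPU-bbb", "cost": 1.5},
    {"cluster": "gpu-1", "project": "ml", "namespace": "infer", "card_uuid": "GPU-ccc", "cost": 0.5},
]


def total_by(records, *keys):
    """Sum the cost field over records grouped by the given dimension keys."""
    totals = defaultdict(float)
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group] += record["cost"]
    return dict(totals)


# Totals per cluster
print(total_by(details, "cluster"))
# Totals per cluster, project, and namespace
print(total_by(details, "cluster", "project", "namespace"))
```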