The SiliconMark™ agent will collect statistics on the performance of your GPU(s) and upload them to SiliconData. This will trigger the production of a PDF report detailing the results and comparison to public performance data.
To run a benchmark, you can create a test job using the SiliconMark API (api/silicon-mark/test-job/create), or using the SiliconData developer site. You can also run a quick test on a single node without creating a job, but it will not be saved to your account and you will only get limited results and no report.
You can choose between the executable test agent and the containerized agent. The executable agent is recommended for most users, as it provides the most comprehensive results, though it has more complex prerequisites to configure. The containerized agent is particularly useful for CI/CD pipelines or when you want to isolate the benchmarking process from your host system. It also reduces the prerequisites required to run the agent, as most of them are bundled into the container.
The agent is currently only available for x86_64 Linux systems with NVIDIA GPUs. If you have a use case for other GPU vendors, Arm, Windows, or macOS versions, please contact us with details.
NVIDIA drivers with CUDA support must be installed. We recommend updating to the latest driver for your GPU, following NVIDIA’s instructions.
Python 3.10+ and PyTorch 2.5.1+ must also be installed. Some systems ship Python configurations that make installation more involved; PyTorch's installation documentation covers the common cases.
If you have Python 3.10+ installed, you can install the dependencies with the following command:
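The original command was not preserved here; a minimal sketch, assuming pip is available and that the stable PyTorch wheel supports your GPU (the exact command for your CUDA version is given in PyTorch's installation documentation):

```shell
# Install PyTorch; the default wheel bundles the CUDA runtime libraries.
# The 2.5.1+ constraint matches the prerequisite stated above.
pip3 install "torch>=2.5.1"
```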
(Optional) For systems equipped with RTX 5070, RTX 5070 Ti, RTX 5080, or RTX 5090 GPUs, a pre-release PyTorch build may be required. In that case, use the following command:
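The original command was not preserved here either; a hedged sketch, assuming pip and the nightly CUDA 12.8 index that PyTorch publishes (check PyTorch's "Get Started" page for the index URL that matches your driver and CUDA version):

```shell
# Install a pre-release (nightly) PyTorch build with CUDA 12.8 support,
# which includes kernels for Blackwell-generation (RTX 50-series) GPUs.
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
```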
When you create a job, you will receive an id_token. The developer site also has a convenient cURL link that will copy the commands required to download and run the agent using the token for that job.
On the system you want to test, download the agent and run /bin/bash ./agent -api-key <token> to execute it. You can add -quiet to suppress output when running from an automation script or pipeline.
Example using shell to execute:
# Set credentials
SD_EMAIL=<your email address>
SD_PASSWORD=<Your SiliconData password>

# Use current date and time to generate a unique job name
SD_JOBNAME=$(date +%x-%R)

# Log in to get a bearer token:
SD_LOGIN=$(cat <<EOF
{ "email":"$SD_EMAIL", "password":"$SD_PASSWORD"}
EOF
)
TOKEN=$(curl --location 'https://api.silicondata.com/api/user/login' \
  --header 'Content-Type: application/json' \
  --data-raw "$SD_LOGIN" | jq -r '.data.id_token')

# Create a new job and get a job token
SD_JOBDATA=$(cat <<EOF
{ "name": "$SD_JOBNAME", "test_plan_name": "QuickMark", "description": "Job created by $SD_EMAIL"}
EOF
)
curl --location 'https://api.silicondata.com/api/silicon-mark/test-job/create' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $TOKEN" \
  --data "$SD_JOBDATA" > job.json
JOB_TOKEN=$(jq -r '.data.token' job.json)
JOB_ID=$(jq -r '.data.id' job.json)

# Download and run the agent using the Job Token
wget -O ./agent https://downloads.silicondata.com/agent
chmod +x ./agent
./agent -api-key $JOB_TOKEN

echo "Job $SD_JOBNAME complete, retrieve your report at https://silicondata.com/silicon-mark"
The agent will take some time to run; the more GPUs in the test system, the longer it will take. You can monitor it to ensure it is active by opening an additional connection, running nvidia-smi dmon, and observing the GPU activity. The test mostly focuses on one GPU at a time, but the final test should exercise all the GPUs in the system.
After it is complete, you can download the report from the SiliconData portal, or retrieve a download URL from the SiliconMark API (PUT api/silicon-mark/test-task/pdf-report-url) with {"id": <task-id>} as the payload.
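The API call above can be sketched as follows. This is a hedged example: the TASK_ID variable and the .data.url field in the response are assumptions based on the patterns in the job-creation script, not confirmed API details.

```shell
# TASK_ID and TOKEN are assumed to come from earlier API responses
# (see the job-creation script above).
curl --location --request PUT 'https://api.silicondata.com/api/silicon-mark/test-task/pdf-report-url' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $TOKEN" \
  --data "{\"id\": \"$TASK_ID\"}" > report.json

# Extract the download URL from the response
# (the .data.url field name is an assumption).
jq -r '.data.url' report.json
```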
The SiliconMark™ QuickMark benchmark measures five widely reported performance statistics of a system: memory bandwidth, FLOPS for three floating point data types (dTypes), and GPU energy consumption for running the benchmark.

Memory bandwidth is important because many GPU operations involve moving large volumes of data from RAM to the SM processors and back out. If there isn't enough bandwidth, the GPU compute cores can sit idle waiting for data to process. Bandwidth is impacted by the vRAM speed and the memory interface bus width.

The next three metrics measure the floating point operations per second that a GPU can process. GPUs excel at floating point calculations when compared to traditional CPUs. Floating point dTypes come in different sizes that trade off data precision against the storage used. An FP32 dType uses 32 bits to store a floating point number, which gives it large range and precision, but requires significant memory storage and bandwidth to move data to and from the GPU, and more computational time to process. FP16 is more efficient, and brain floating point (bfloat16) sacrifices precision for performance, being a space-efficient format optimized for ML workloads.

Finally, energy consumption is read from the GPU at the beginning of the benchmark process and again at the end, with the delta providing the energy consumed executing the benchmark. All executions of the QuickMark benchmark complete the same activities, so the energy consumed is comparable between different GPUs and can be used to guide sustainable consumption decision-making, especially for time-insensitive workloads. The temperature of the GPU is also recorded and can be used to understand how the GPU performs under load, and whether it is being thermally throttled.

For multi-node cluster systems, SiliconMark also tests inter-node bandwidth and latency, giving you a realistic view of how your system handles distributed workloads, a critical factor in modern AI and HPC applications.
SiliconMark assembles a list of systems from the nodes registering for a job and builds a graph of all the connections required to fully test every node. The agents then run bandwidth and latency tests to populate the graph with data for each link.