SiliconMark™ User Guide

The SiliconMark™ agent collects performance statistics for your GPU(s) and uploads them to SiliconData. The upload triggers the production of a PDF report that details the results and compares them with public performance data.

Select Test Configuration

To run a benchmark, create a test job using the SiliconMark API (api/silicon-mark/v1/jobs) or the SiliconData developer site. You can also run a quick test on a single node without creating a job, but the results are not saved to your account, and you get only limited results and no report.

You can choose to use the executable test agent, or the containerized agent. The executable agent is recommended for most users, as it provides the most comprehensive results, though it has more complex pre-requisites to configure. The containerized agent is particularly useful for CI/CD pipelines or when you want to isolate the benchmarking process from your host system. It also reduces the pre-requisites required to run the agent, as they are mostly bundled into the container.


Pre-requisites

The agent is currently available only for x86_64 Linux systems with NVIDIA GPUs. If you have a use case for other GPU vendors, Arm, Windows, or macOS, please contact us with details.
NVIDIA drivers with CUDA support must be installed. We recommend updating to the latest driver for your GPU, following NVIDIA’s instructions.
Python 3.10+ and PyTorch 2.5.1+ must also be installed. Some systems have Python configurations that complicate the installation; the PyTorch installation documentation covers the available options.

If you have Python 3.10+ installed, you can install the dependencies with the following command:

pip3 install --upgrade torch torchvision torchaudio pynvml

(Optional) For systems equipped with RTX 5070, RTX 5070 Ti, RTX 5080, or RTX 5090 GPUs, a pre-release PyTorch version may be required. In such cases, use the following command:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 --force-reinstall && pip3 install pynvml 

Validate pre-requisites

You can confirm that these requirements are in place by running two commands:

nvidia-smi should run and provide a basic inventory of your GPU hardware.
python3 -c 'import torch; print(torch.cuda.is_available())' should output True.
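
The two checks above, together with the version requirements listed under Pre-requisites, can be combined into a single script. The following is a minimal sketch rather than part of the SiliconMark agent: the function names (version_tuple, check_prerequisites) are hypothetical, and the script only reports whether each requirement appears to be met.

```python
import importlib.util
import re
import shutil
import sys

MIN_PYTHON = (3, 10)    # Python 3.10+ per the pre-requisites
MIN_TORCH = (2, 5, 1)   # PyTorch 2.5.1+ per the pre-requisites


def version_tuple(version):
    """Turn a version string such as '2.5.1+cu124' into (2, 5, 1)."""
    release = version.split("+")[0]  # drop the local build suffix
    numbers = [int(n) for n in re.findall(r"\d+", release)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad '2.6' to (2, 6, 0)
    return tuple(numbers)


def check_prerequisites():
    """Return a dict mapping each requirement to True or False."""
    results = {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        "torch": False,
        "cuda": False,
    }
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch
        results["torch"] = version_tuple(torch.__version__) >= MIN_TORCH
        results["cuda"] = bool(torch.cuda.is_available())
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

A MISSING entry for nvidia-smi only means the command was not found on the PATH; run nvidia-smi directly to confirm that the driver actually reports your GPUs.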

Download the agent

When you create a job, you will receive a job token (data.token in the API response). The developer site also has a convenient cURL link that copies the commands required to download and run the agent using the token for that job.
On the system you want to test, download the agent and run /bin/bash ./agent -api-key followed by the job token. Add -quiet to suppress all output when running from an automation script or pipeline.

Example using shell to execute:

# Set credentials (replace the placeholder values)
SD_EMAIL="<your email address>"
SD_PASSWORD="<your SiliconData password>"
# Use current date and time to generate a unique job name
SD_JOBNAME=$(date +%x-%R)
# Log in to get a bearer token:
SD_LOGIN=$(cat <<EOF
{
  "email":"$SD_EMAIL",
  "password":"$SD_PASSWORD"
}
EOF
)
TOKEN=$(curl --location 'https://api.silicondata.com/api/user/login' \
--header 'Content-Type: application/json' \
--data-raw "$SD_LOGIN" | jq -r '.data.id_token')
# Create a new job and get a job token
SD_JOBDATA=$(cat <<EOF
{
  "name": "$SD_JOBNAME",
  "benchmarks": ["quick_mark"],
  "description": "Job created by $SD_EMAIL"
}
EOF
)
curl --location 'https://api.silicondata.com/api/silicon-mark/v1/jobs' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data "$SD_JOBDATA" > job.json
JOB_TOKEN=$(jq -r '.data.token' job.json)
JOB_ID=$(jq -r '.data.id' job.json)
# Download and run the agent using the Job Token
wget -O ./agent https://downloads.silicondata.com/agent
chmod +x ./agent
./agent -api-key "$JOB_TOKEN"
echo "Job $SD_JOBNAME complete, retrieve your report at https://silicondata.com/silicon-mark"
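
For automation scripts and pipelines, the final step of the example above can be wrapped so that the caller decides whether to pass -quiet. The following is a sketch under stated assumptions: build_agent_command and run_agent are hypothetical helper names, the only flags used are the -api-key and -quiet flags named in this guide, and the meaning of the agent's exit status is not documented here, so treating non-zero as failure is an assumption to verify.

```python
import subprocess


def build_agent_command(job_token, agent_path="./agent", quiet=False):
    """Assemble the agent command line from the flags named in this guide."""
    command = [agent_path, "-api-key", job_token]
    if quiet:
        command.append("-quiet")  # suppress output for unattended runs
    return command


def run_agent(job_token, agent_path="./agent", quiet=True):
    """Run the downloaded agent and return its exit status.

    Assumption: a non-zero exit status signals failure. The guide does not
    document the agent's exit codes, so confirm this before relying on it.
    """
    completed = subprocess.run(build_agent_command(job_token, agent_path, quiet))
    return completed.returncode
```

Passing the command as a list avoids shell quoting problems if the token contains unusual characters.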

Monitoring the Agent run

The agent takes some time to run, and the more GPUs in the test system, the longer it takes. To confirm that it is active, open an additional connection, run nvidia-smi dmon, and observe the GPU activity. Most of the tests focus on one GPU at a time, but the final test should exercise all the GPUs in the system.
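
If you prefer a scripted check to watching nvidia-smi dmon, the same information can be sampled with nvidia-smi's query mode. The sketch below is illustrative only: the helper names and the 10% utilization threshold are arbitrary choices, not part of SiliconMark. It uses the standard --query-gpu and --format=csv,noheader,nounits options.

```python
import subprocess

QUERY_FIELDS = ["index", "utilization.gpu", "temperature.gpu", "power.draw"]


def parse_gpu_rows(csv_text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) == len(QUERY_FIELDS):
            rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def busy_gpus(rows, threshold_percent=10):
    """Return the indexes of GPUs at or above the utilization threshold."""
    busy = []
    for row in rows:
        try:
            if float(row["utilization.gpu"]) >= threshold_percent:
                busy.append(row["index"])
        except ValueError:
            pass  # the field can be reported as "[N/A]" on some systems
    return busy


def sample_gpus():
    """Take one sample from nvidia-smi (requires the NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_rows(output)
```

Calling busy_gpus(sample_gpus()) every few seconds should show one GPU index at a time for most of the run, matching the behavior described above.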

Download Reports

After the run is complete, you can download the report from the SiliconData portal or retrieve a URL to download it from the SiliconMark API (PUT api/silicon-mark/v1/task/{task_id}).
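
The API call can be sketched as follows. Several details here are assumptions rather than documented behavior: the base URL is taken from the job-creation example, the Bearer header is assumed to work the same way as it does when creating a job, and how task_id relates to the job is not stated in this guide. The shape of the response is also not documented here, so the sketch returns the decoded JSON without picking out a field.

```python
import json
import urllib.request

# Base URL as used in the job-creation example in this guide.
API_BASE = "https://api.silicondata.com/"


def build_report_request(task_id, bearer_token):
    """Build the PUT request for api/silicon-mark/v1/task/{task_id}."""
    url = f"{API_BASE}api/silicon-mark/v1/task/{task_id}"
    return urllib.request.Request(
        url,
        method="PUT",
        # Assumption: same Bearer authentication as job creation.
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def fetch_report_info(task_id, bearer_token):
    """Send the request and return the decoded JSON response as-is."""
    request = build_report_request(task_id, bearer_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Inspect the returned structure once to locate the download URL before scripting against it.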

Output

The SiliconMark™ QuickMark benchmark measures five widely reported performance statistics of a system: memory bandwidth, FLOPS for three floating point data types (dtypes), and the GPU energy consumed by running the benchmark.

Memory bandwidth is important because many GPU operations move large volumes of data from GPU memory (VRAM) to the streaming multiprocessors (SMs) and back out. If there isn’t enough bandwidth, the GPU compute cores can sit idle waiting for data to process. Bandwidth is determined by the VRAM speed and the width of the memory interface bus.

The next three metrics measure the floating point operations per second (FLOPS) that a GPU can process. GPUs excel at floating point calculations when compared to traditional CPUs. Floating point dtypes come in different sizes that affect the precision of the data and the storage it uses. FP32 uses 32 bits to store a floating point number, which gives it a large range and high precision but requires significant memory storage, more bandwidth to move data to and from the GPU, and more computational time to process. FP16 halves the storage to 16 bits at the cost of range and precision. Brain floating point (bfloat16) also uses 16 bits but keeps the range of FP32 and gives up precision instead, making it a space-efficient format optimized for ML workloads.

Finally, energy consumption is read from the GPU at the beginning of the benchmark process and again at the end, and the delta gives the energy consumed by executing the benchmark. Every execution of the QuickMark benchmark completes the same activities, so the energy consumed is comparable between different GPUs and can guide sustainable consumption decisions, especially for time-insensitive workloads. The temperature of the GPU is also recorded and can be used to understand how the GPU performs under load and whether it is being thermally throttled.

For multi-node cluster systems, SiliconMark also tests inter-node bandwidth and latency, giving you a realistic view of how your system handles distributed workloads, a critical factor in modern AI and HPC applications.
SiliconMark assembles a list of systems based on nodes registering for a job and builds a graph of all the connections required to fully test every node. The agents run bandwidth and latency tests to populate the graph with latency and bandwidth data for each link.
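
To make the idea of the connection graph concrete, the sketch below builds one entry per pair of registered nodes and leaves the measurements empty for the agents to fill in. It assumes that "fully test every node" means every pair of nodes is linked; SiliconMark's actual scheduling may differ, and the node names, field names, and build_link_graph helper are hypothetical.

```python
from itertools import combinations


def build_link_graph(nodes):
    """Create one undirected link per pair of nodes, with empty results."""
    return {
        pair: {"bandwidth_gbs": None, "latency_us": None}
        for pair in combinations(sorted(nodes), 2)
    }


# Four registered nodes need 4 * 3 / 2 = 6 links to cover every pair.
graph = build_link_graph(["node-a", "node-b", "node-c", "node-d"])
print(len(graph))  # 6
```

Under this assumption the number of links grows as n(n-1)/2, which is one reason larger clusters take longer to test.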

Example Output

{
  "machine_uuid": "59cfa587-c971-502e-b770-6dc5f989eb48",
  "config_id": "085ed573-2683-5144-bef1-51d77a207b43",
  "location": "US",
  "gpu_count": 1,
  "cpu_info": {
    "cpu_count": "1",
    "arch": "x86_64",
    "os": "linux",
    "hardware_vendor": "Advanced Micro Devices, Inc.",
    "hardware_model": "AMD EPYC 7R13 Processor"
  },
  "gpu_info": {
    "name": "NVIDIA XX",
    "total_memory": "XX GB",
    "driver_version": "570.133.20",
    "cuda_version": "12.8",
    "pci_info": {
      "generation_max": "4",
      "generation_current": "1",
      "link_width_max": "16",
      "link_width_current": "8"
    }
  },
  "ram_info": {
    "total_memory": "XX GB",
    "memory_module_count": 1,
    "memory_module_type": "DDR4",
    "memory_module_speed": "3200 MT/s"
  },
  "disk_info": {
    "name": "nvme0n1",
    "model": "Amazon Elastic Block Store",
    "rota": false,
    "size": "500 GB",
    "type": "disk",
    "mountpoints": [
      "/"
    ]
  },
  "network_info": {
    "download_speed_mbps": "8835",
    "upload_speed_mbps": "1612",
    "open_ports": "0",
    "machine_ip": "172.17.0.2"
  },
  "benchmark_results": [
    {
      "gpu_id": "GPU-7cc898ab-df4c-7837-fe85-b18f350bbf01",
      "fp32_tflops": XX.XX,
      "fp32_cuda_core_tflops": XX.XX,
      "fp16_tflops": XX.XX,
      "bf16_tflops": XX.XX,
      "mixed_precision_tflops": XX.XX,
      "memory_bandwidth_gbs": XX.XX,
      "temperature_centigrade": XX,
      "power_consumption_watts": XX.XX,
      "host_to_device_bandwidth_gbs": XX.XX,
      "device_to_host_bandwidth_gbs": XX.XX,
      "kernel_launch_overhead_us": XX.XX,
      "l2_bandwidth_gbs": XXX.XX
    }
  ]
}
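
One practical use of this output is checking whether the GPU's PCIe link is running below its maximum, since that can limit the host-to-device and device-to-host bandwidth figures. The sketch below reads the pci_info values from the example above; pcie_link_status is a hypothetical helper, not part of SiliconMark.

```python
def pcie_link_status(pci_info):
    """Compare the current PCIe link against its reported maximum."""
    generation_current = int(pci_info["generation_current"])
    generation_max = int(pci_info["generation_max"])
    width_current = int(pci_info["link_width_current"])
    width_max = int(pci_info["link_width_max"])
    return {
        "generation_below_max": generation_current < generation_max,
        "width_below_max": width_current < width_max,
    }


# Values copied from the pci_info block in the example output.
example_pci_info = {
    "generation_max": "4",
    "generation_current": "1",
    "link_width_max": "16",
    "link_width_current": "8",
}
print(pcie_link_status(example_pci_info))
# {'generation_below_max': True, 'width_below_max': True}
```

A generation below the maximum is not necessarily a fault, because GPUs can lower the link speed while idle to save power. A reduced link width is more likely to reflect the slot or platform configuration and is worth investigating if transfer bandwidth looks low.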

Benefits

Quick Results: Delivers actionable insights within minutes.
Holistic Evaluation: Supports evaluation of both single and multi-GPU setups.
Precision Tracking: Ensures reproducibility and tracks performance over time.

Contact and Support

For additional information or technical support, please contact:

Email: support@silicondata.com
