The SiliconMark™ agent collects performance statistics from your GPU(s) and uploads them to SiliconData. The upload triggers the production of a PDF report that details the results and compares them with public performance data.

Select Test Configuration

  • To run a benchmark, create a test job using the SiliconMark API (api/silicon-mark/v1/jobs) or the SiliconData developer site. You can also run a quick test on a single node without creating a job, but the results are not saved to your account, and you get only limited results and no PDF report.
You can use either the executable test agent or the containerized agent. The executable agent is recommended for most users because it provides the most comprehensive results. It is also the best option if you want to run the agent on a single node or have specific requirements for how the agent is executed. The containerized agent is particularly useful for CI/CD pipelines or when you want to isolate the benchmarking process from your host system. It also reduces the pre-requisites needed to run the agent, as most of them are bundled into the container.

Pre-requisites

  1. Either AMD drivers with ROCm or NVIDIA drivers with CUDA support must be installed. We recommend updating to the latest driver for your GPU, following AMD’s instructions or NVIDIA’s instructions.
  2. Python 3.10+ and PyTorch must also be installed. The recommended approach is to pass the --setup flag to the agent, which automatically creates a self-contained Python virtual environment and installs the correct version of PyTorch for your GPU.
If you prefer to set up the environment yourself, install the dependencies manually:
# NVIDIA
pip install torch numpy pynvml
# AMD
pip install torch numpy amdsmi
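If you set up the environment manually, it can help to confirm that the pre-requisites are visible from Python before starting the agent. The following is a minimal sketch and is not part of the agent: `check_environment` is a hypothetical helper name, and the only PyTorch call it relies on is `torch.cuda.is_available()`, which also reports GPU visibility on ROCm builds of PyTorch.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 10)):
    """Report whether the Python pre-requisites listed above look satisfied."""
    report = {
        "python_ok": sys.version_info[:2] >= min_version,
        # find_spec checks that a package can be located without importing it.
        "numpy_installed": importlib.util.find_spec("numpy") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "gpu_visible": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["gpu_visible"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install or driver mismatch is reported as "no GPU visible".
            pass
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```

Any line printed as NO points at a pre-requisite to fix, or at a reason to let --setup build the environment instead.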

Download the agent

  • When you create a job, you will receive an id_token. The developer site also has a convenient cURL link that will copy the commands required to download and run the agent using the token for that job.
  • On the system you want to test, download the agent and run it with /bin/bash ./agent -api-key {id_token} --setup.
Example shell script that logs in, creates a job, then downloads and runs the agent:
# Set credentials
SD_EMAIL="<your email address>"
SD_PASSWORD="<your SiliconData password>"
# Use current date and time to generate a unique job name
SD_JOBNAME=$(date +%x-%R)
# Log in to get a bearer token:
SD_LOGIN=$(cat <<EOF
{
  "email":"$SD_EMAIL",
  "password":"$SD_PASSWORD"
}
EOF
)
TOKEN=$(curl --location 'https://api.silicondata.com/api/user/login' \
--header 'Content-Type: application/json' \
--data-raw "$SD_LOGIN" | jq -r '.data.id_token')
# Create a new job and get a job token
SD_JOBDATA=$(cat <<EOF
{
  "name": "$SD_JOBNAME",
  "benchmarks": ["inference_benchmark"],
  "description": "Job created by $SD_EMAIL"
}
EOF
)
curl --location 'https://api.silicondata.com/api/silicon-mark/v1/jobs' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data "$SD_JOBDATA" > job.json
JOB_TOKEN=$(jq -r '.data.token' job.json)
JOB_ID=$(jq -r '.data.id' job.json)
# Download and run the agent using the Job Token
wget -O ./agent https://downloads.silicondata.com/agent
chmod +x ./agent
./agent -api-key "$JOB_TOKEN" --setup
echo "Job $SD_JOBNAME complete, retrieve your report at https://silicondata.com/silicon-mark"
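For automation that is already written in Python, the login and job-creation calls from the shell script can be sketched with the standard library alone. The URLs, request bodies, and response fields (`data.id_token`, `data.token`, `data.id`) are taken from the script above. The function names are hypothetical, and POST is assumed because that is what curl sends when given a request body.

```python
import json
import urllib.request

API_BASE = "https://api.silicondata.com/api"


def login_request(email, password):
    """Build the request that exchanges credentials for a bearer token."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/user/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_job_request(bearer_token, name, description=""):
    """Build the request that creates a job with the inference benchmark."""
    body = json.dumps({
        "name": name,
        "benchmarks": ["inference_benchmark"],
        "description": description,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/silicon-mark/v1/jobs",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {bearer_token}",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def create_job(email, password, name):
    """Log in, create a job, and return (job_id, job_token)."""
    bearer = send(login_request(email, password))["data"]["id_token"]
    job = send(create_job_request(bearer, name, f"Job created by {email}"))["data"]
    return job["id"], job["token"]
```

Building the requests separately from sending them keeps the payloads easy to inspect or log before any network call is made. The returned job token is the value the shell script passes to the agent's -api-key flag.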

Monitoring the Agent run

  • The agent will take some time to run. The more GPUs in the test system, the longer it will take. You can monitor it to ensure it is active by opening an additional connection and observing GPU activity:
nvidia-smi dmon   # NVIDIA
amd-smi monitor   # AMD
The test will mostly be focused on one GPU at a time, but the final test should exercise all the GPUs in the system. When running in a terminal, the agent displays a live progress UI. In non-interactive environments (scripts, pipes), it falls back to plain log output automatically.

Download Reports

  • After the run is complete, you can download the report from the SiliconData portal or retrieve a URL to download it from the SiliconMark API:
PUT api/silicon-mark/test-task/pdf-report-url
Body: {"id": <task-id>}
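As a rough sketch, the report request could be issued from Python as shown here. The PUT method, path, and body come from the lines above. Three things are assumptions: that the endpoint is served from the same host as the job API (https://api.silicondata.com), that it accepts the same bearer token used to create jobs, and the helper names themselves. The response format is not documented here, so the decoded JSON is returned unchanged.

```python
import json
import urllib.request

# Assumption: the report endpoint lives on the same host as the job API.
API_HOST = "https://api.silicondata.com"


def report_url_request(task_id, bearer_token):
    """Build the PUT request that asks for a PDF report download URL."""
    body = json.dumps({"id": task_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_HOST}/api/silicon-mark/test-task/pdf-report-url",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: same bearer-token scheme as job creation.
            "Authorization": f"Bearer {bearer_token}",
        },
        method="PUT",
    )


def fetch_report_response(task_id, bearer_token):
    """Send the request and return the decoded JSON response as-is."""
    request = report_url_request(task_id, bearer_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```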

Output

The SiliconMark™ QuickMark benchmark measures 5 widely reported performance statistics of a system: memory bandwidth, FLOPS for three floating point data types (dTypes) of different sizes, and the GPU energy consumed by running the benchmark.
Memory bandwidth is important because many GPU operations move large volumes of data from GPU memory (vRAM) to the streaming multiprocessors (SMs) and back out. If there isn’t enough bandwidth, the GPU compute cores can sit idle waiting for data to process. Bandwidth is determined by the vRAM speed and the width of the memory interface bus.
The next 3 metrics measure the floating point operations per second that a GPU can process. GPUs excel at floating point calculations when compared to traditional CPUs. Floating point dTypes come in different sizes that affect the precision of the data and the storage it uses. An FP32 dType uses 32 bits to store a floating point number, which gives it a large range and high precision, but requires significant memory storage, more bandwidth to move data to and from the GPU, and more computational time to process. FP16 halves the storage to 16 bits, which makes it more efficient at the cost of range and precision. Brain floating point (bfloat16) also uses 16 bits but keeps the 8-bit exponent of FP32, so it retains the range of FP32 while sacrificing precision for performance; it is a space-efficient format optimized for ML workloads.
Finally, energy consumption is read from the GPU at the beginning of the benchmark process and again at the end, and the delta gives the energy consumed by executing the benchmark. Every execution of the QuickMark benchmark completes the same activities, so the energy consumed is comparable between different GPUs and can be used to guide sustainable consumption decision-making, especially for time-insensitive workloads. The temperature of the GPU is also recorded and can be used to understand how the GPU is performing under load and whether it is being thermally throttled.
For multi-node cluster systems, SiliconMark also tests inter-node bandwidth and latency, giving you a realistic view of how your system handles distributed workloads, which is a critical factor in modern AI and HPC applications.
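The relationship between dType size, memory bandwidth, and energy can be made concrete with a little arithmetic. This sketch is illustrative only and is not how the benchmark itself measures anything: the helper names are hypothetical, the tensor size is made up, and 1 GB/s is assumed to mean 1e9 bytes per second. The 3025 GB/s figure is the memory bandwidth from the example output in this document.

```python
# Storage size in bytes of one element for each dType discussed above.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def tensor_bytes(num_elements, dtype):
    """Memory needed to store num_elements values of the given dType."""
    return num_elements * BYTES_PER_ELEMENT[dtype]


def transfer_seconds(num_bytes, bandwidth_gbs):
    """Lower bound on the time to move num_bytes (assumes 1 GB = 1e9 bytes)."""
    return num_bytes / (bandwidth_gbs * 1e9)


def energy_consumed_wh(start_reading_wh, end_reading_wh):
    """Energy used by a run, as the delta between two counter readings."""
    return end_reading_wh - start_reading_wh


if __name__ == "__main__":
    elements = 1_000_000_000  # hypothetical tensor of one billion values
    for dtype in BYTES_PER_ELEMENT:
        size = tensor_bytes(elements, dtype)
        millis = transfer_seconds(size, 3025.0) * 1e3
        print(f"{dtype}: {size / 1e9:.0f} GB, at least {millis:.2f} ms per pass")
```

Halving the element size halves both the storage and the minimum time to stream the tensor through memory, which is why the 16-bit formats help bandwidth-bound workloads.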
SiliconMark assembles a list of systems based on nodes registering for a job and builds a graph of all the connections required to fully test every node. The agents run bandwidth and latency tests to populate the graph with latency and bandwidth data for each link.
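One way to picture the connection graph is as the set of all unordered node pairs, sketched here. This is an assumption about what "all the connections required" means, not a description of SiliconMark's actual scheduling. The function names, node names, and measurement values are hypothetical.

```python
from itertools import combinations


def links_to_test(nodes):
    """Return every unordered pair of nodes: the edges of a complete graph."""
    return list(combinations(sorted(nodes), 2))


def record_result(graph, link, bandwidth_gbs, latency_us):
    """Store the measurements for one link in the graph."""
    graph[link] = {"bandwidth_gbs": bandwidth_gbs, "latency_us": latency_us}


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
    graph = {link: None for link in links_to_test(nodes)}
    # Placeholder numbers, only to show the shape of a populated entry.
    record_result(graph, ("node-a", "node-b"), bandwidth_gbs=12.5, latency_us=3.0)
    print(f"{len(nodes)} nodes -> {len(graph)} links to measure")
```

Under this assumption a cluster of n nodes has n(n-1)/2 links, so the number of pairwise tests grows quadratically with cluster size.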

Example Output

{
  "machine_uuid": "59cfa587-c971-502e-b770-6dc5f989eb48",
  "config_id": "085ed573-2683-5144-bef1-51d77a207b43",
  "location": "US",
  "gpu_vendor": "NVIDIA",
  "gpu_count": 1,
  "cpu_info": {
    "cpu_count": "64",
    "arch": "x86_64",
    "os": "linux",
    "hardware_vendor": "Advanced Micro Devices, Inc.",
    "hardware_model": "AMD EPYC 7R13 Processor",
    "processor_brand": "AMD EPYC 7R13 Processor",
    "virtualization": "guest"
  },
  "gpu_info": [
    {
      "name": "NVIDIA H100 80GB HBM3",
      "total_memory": "81559 MiB",
      "driver_version": "570.133.20",
      "cuda_version": "12.8",
      "pci_info": {
        "generation_max": "5",
        "generation_current": "5",
        "link_width_max": "16",
        "link_width_current": "16"
      }
    }
  ],
  "ram_info": {
    "total_memory": "1842 GB",
    "memory_module_count": 16,
    "memory_module_type": "DDR5",
    "memory_module_speed": "4800 MT/s"
  },
  "disk_info": [
    {
      "name": "nvme0n1",
      "model": "Amazon Elastic Block Store",
      "rota": false,
      "size": "500 GB",
      "type": "disk",
      "mountpoints": ["/"],
      "read_speed": "3500 MB/s",
      "write_speed": "3000 MB/s"
    }
  ],
  "network_info": {
    "download_speed_mbps": "8835",
    "upload_speed_mbps": "1612",
    "open_ports": "12",
    "machine_ip": "10.132.64.57"
  },
  "benchmark_results": {
    "quick_mark": {
      "test_results": [
        {
          "gpu_id": "GPU-7cc898ab-df4c-7837-fe85-b18f350bbf01",
          "gpu_model": "NVIDIA H100 80GB HBM3",
          "fp32_tflops": 367.5,
          "fp32_cuda_core_tflops": 53.6,
          "fp16_tflops": 684.6,
          "bf16_tflops": 729.6,
          "fp8_tflops": 1456.2,
          "mixed_precision_tflops": 648.9,
          "memory_bandwidth_gbs": 3025.0,
          "l2_bandwidth_gbs": 415.5,
          "host_to_device_bandwidth_gbs": 27.7,
          "device_to_host_bandwidth_gbs": 28.5,
          "kernel_launch_overhead_us": 7.6,
          "power_consumption_watts": 654.3,
          "temperature_centigrade": 59,
          "energy_consumption_wh": 45.2
        }
      ],
      "aggregate_results": {
        "gpu_id": "aggregate",
        "fp32_tflops": 367.5,
        "fp16_tflops": 684.6,
        "bf16_tflops": 729.6,
        "memory_bandwidth_gbs": 3025.0,
        "power_consumption_watts": 654.3,
        "temperature_centigrade": 59
      }
    }
  }
}
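A report in this shape is straightforward to post-process. The sketch below derives two ratios per GPU and checks whether the PCIe link is running below its maximum. `SAMPLE` is a trimmed copy of the example output above, the function names are hypothetical, and the derived ratios are illustrative rather than metrics that SiliconMark reports.

```python
import json

# Trimmed copy of the example output, keeping only the fields used here.
SAMPLE = json.loads("""
{
  "gpu_info": [
    {"name": "NVIDIA H100 80GB HBM3",
     "pci_info": {"generation_max": "5", "generation_current": "5",
                  "link_width_max": "16", "link_width_current": "16"}}
  ],
  "benchmark_results": {"quick_mark": {"test_results": [
    {"gpu_model": "NVIDIA H100 80GB HBM3",
     "fp32_tflops": 367.5, "fp16_tflops": 684.6, "bf16_tflops": 729.6,
     "power_consumption_watts": 654.3}
  ]}}
}
""")


def pcie_link_degraded(pci_info):
    """True when the current PCIe generation or width is below the maximum."""
    return (
        int(pci_info["generation_current"]) < int(pci_info["generation_max"])
        or int(pci_info["link_width_current"]) < int(pci_info["link_width_max"])
    )


def summarize(report):
    """Derive per-GPU ratios from the quick_mark test results."""
    rows = []
    for result in report["benchmark_results"]["quick_mark"]["test_results"]:
        rows.append({
            "gpu_model": result["gpu_model"],
            "bf16_tflops_per_watt":
                result["bf16_tflops"] / result["power_consumption_watts"],
            "fp16_vs_fp32": result["fp16_tflops"] / result["fp32_tflops"],
        })
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
    for gpu in SAMPLE["gpu_info"]:
        print(gpu["name"], "PCIe degraded:", pcie_link_degraded(gpu["pci_info"]))
```

The PCIe fields are strings in the example output, so they are converted to integers before being compared.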

Benefits

  • Quick Results: Delivers actionable insights within minutes.
  • Holistic Evaluation: Supports evaluation of both single and multi-GPU setups.
  • Precision Tracking: Ensures reproducibility and tracks performance over time.

Contact and Support

For additional information or technical support, please contact: