# SiliconMark™ User Guide

> Comprehensive guide to SiliconMark™.

The **SiliconMark™** agent collects performance statistics from your GPU(s) and uploads them to SiliconData. The upload triggers generation of a PDF report that details the results and compares them with public performance data.

## Select Test Configuration

* To run a benchmark, create a test job using the SiliconMark API (`api/silicon-mark/v1/jobs`) or the [SiliconData developer site](https://portal.silicondata.com/silicon-mark). You can also run a quick test on a single node without creating a job, but the results are limited, are not saved to your account, and do not include a PDF report.

You can choose to use the executable test agent, or the containerized agent. The executable agent is recommended for most users, as it provides the most comprehensive results. It is also the best option for users who want to run the agent on a single node or have specific requirements for how the agent is executed.

The containerized agent is particularly useful for CI/CD pipelines or when you want to isolate the benchmarking process from your host system. It also has fewer pre-requisites, as most of them are bundled into the container.

<Tabs>
  <Tab title="Executable Agent">
    ### Pre-requisites

    1. AMD Drivers and ROCm or NVIDIA drivers with CUDA support must be installed. We recommend updating to the latest driver for your GPU, following [AMD's instructions](https://www.amd.com/en/support) or [NVIDIA's instructions](https://www.nvidia.com/en-us/drivers/).
    2. Python 3.10+ and PyTorch must also be installed. The recommended approach is the agent's `--setup` flag, which automatically creates a self-contained Python virtual environment and installs the correct version of PyTorch for your GPU.

    If you prefer to set up the environment yourself, install the dependencies manually:

    ```bash theme={null}
    # NVIDIA
    pip install torch numpy pynvml
    # AMD
    pip install torch numpy amdsmi
    ```
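    If you go the manual route, a quick check of the interpreter version can catch problems before the agent starts. The following is a minimal sketch, not part of the agent: `python_version_ok` is a hypothetical helper name, and the commented PyTorch check relies on `torch.cuda.is_available()`, which PyTorch also uses for ROCm builds.

    ```bash
    # Return success when a "major.minor[.patch]" version string is at least 3.10.
    python_version_ok() {
      local major minor
      IFS=. read -r major minor _ <<< "$1"
      [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 10 ]; }
    }

    # Usage: check the interpreter on PATH, then confirm PyTorch can see a GPU.
    # python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" \
    #   && python3 -c 'import torch; print(torch.cuda.is_available())'
    ```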

    ### Download the agent

    * When you create a job, you will receive an **id\_token**. The developer site also has a convenient **cURL** link that will copy the commands required to download and run the agent using the token for that job.
    * On the system you want to test, [download the agent](https://downloads.silicondata.com/agent) and run `/bin/bash ./agent -api-key {id_token} --setup`, replacing `{id_token}` with the token for your job.

    Example using shell to execute:

    ```bash theme={null}
    # Set credentials
    SD_EMAIL=<your email address>
    SD_PASSWORD=<Your SiliconData password>
    # Use current date and time to generate a unique job name
    SD_JOBNAME=$(date +%x-%R)
    # Log in to get a bearer token:
    SD_LOGIN=$(cat <<EOF
    {
      "email":"$SD_EMAIL",
      "password":"$SD_PASSWORD"
    }
    EOF
    )
    TOKEN=$(curl --location 'https://api.silicondata.com/api/user/login' \
    --header 'Content-Type: application/json' \
    --data-raw "$SD_LOGIN" | jq -r '.data.id_token')
    # Create a new job and get a job token
    SD_JOBDATA=$(cat <<EOF
    {
      "name": "$SD_JOBNAME",
      "benchmarks": ["inference_benchmark"],
      "description": "Job created by $SD_EMAIL"
    }
    EOF
    )
    curl --location 'https://api.silicondata.com/api/silicon-mark/v1/jobs' \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $TOKEN" \
    --data "$SD_JOBDATA" > job.json
    JOB_TOKEN=$(jq -r '.data.token' job.json)
    JOB_ID=$(jq -r '.data.id' job.json)
    # Download and run the agent using the Job Token
    wget -O ./agent https://downloads.silicondata.com/agent
    chmod +x ./agent
    ./agent -api-key "$JOB_TOKEN" --setup
    echo "Job $SD_JOBNAME complete, retrieve your report at https://silicondata.com/silicon-mark"
    ```
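    If the login or job-creation request fails, `jq -r '.data.token'` prints the literal string `null` and the agent would be started with an unusable token. A small guard can stop the script first. This is a sketch under that assumption; `require_job_token` is a hypothetical helper, not something the agent provides.

    ```bash
    # Fail early when jq returned nothing usable for the job token.
    # `jq -r` prints the literal string "null" when the key is missing.
    require_job_token() {
      if [ -z "$1" ] || [ "$1" = "null" ]; then
        echo "No job token returned; inspect job.json for an error message" >&2
        return 1
      fi
    }

    # Usage, placed just before the agent is launched:
    # require_job_token "$JOB_TOKEN" && ./agent -api-key "$JOB_TOKEN" --setup
    ```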
  </Tab>

  <Tab title="Containerized">
    ### Pre-requisites

    1. Ensure you have Docker installed on your system. You can follow the [Docker installation guide](https://docs.docker.com/get-docker/) for your specific operating system.
    2. NVIDIA Container Toolkit must be installed to allow Docker to access the GPU. Follow the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

    ### How to Use

    (Optional) Obtain your job token from the SiliconData developer site or API. This is not required if you are running a quick test.

    **Pull the Docker Image**: Pull the latest SiliconMark™ Docker image using the following command:

    ```bash theme={null}
    docker pull ghcr.io/silicon-data/siliconmark-agent:test
    ```

    ### Run the Container

    Use the following command to run the SiliconMark™ agent:

    ```bash theme={null}
    docker run --gpus all --privileged ghcr.io/silicon-data/siliconmark-agent:test [id_token]
    ```

    * Replace `[id_token]` with your actual job token if you have one. Omit it to run a quick test.
    * You can run the container without privileged mode, but the agent then collects less inventory data and cannot run the full test suite.
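    For scripted runs, the two choices above (token or no token, privileged or not) can be folded into one helper that prints the command to execute. This is only a sketch: the function name and the `SM_IMAGE` and `SM_PRIVILEGED` environment variables are hypothetical and are not read by the agent itself.

    ```bash
    # Print the docker command for a run.
    # SM_IMAGE and SM_PRIVILEGED are hypothetical knobs for this sketch only.
    siliconmark_docker_cmd() {
      local token="${1:-}"
      local image="${SM_IMAGE:-ghcr.io/silicon-data/siliconmark-agent:test}"
      local cmd=(docker run --gpus all)
      # Privileged mode stays on unless explicitly disabled.
      if [ "${SM_PRIVILEGED:-1}" = "1" ]; then
        cmd+=(--privileged)
      fi
      cmd+=("$image")
      # Without a token the container runs as a quick test.
      if [ -n "$token" ]; then
        cmd+=("$token")
      fi
      echo "${cmd[*]}"
    }

    # Usage: review the command, then execute it.
    # siliconmark_docker_cmd "$JOB_TOKEN"
    # $(siliconmark_docker_cmd "$JOB_TOKEN")
    ```

    Printing the command instead of running it directly keeps the exact invocation visible in CI logs.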

    ### Run Across All Cluster Nodes

    If you want the agent to run on every node in your cluster, the two most common approaches are:

    1. Kubernetes DaemonSet (one pod per node)

    * Prerequisites: NVIDIA Device Plugin installed on the cluster so pods can request GPUs. See NVIDIA docs: [https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/kubernetes.html](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/kubernetes.html)
    * Apply the DaemonSet below (update image tag and add your job token as needed):

    ```yaml theme={null}
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: siliconmark-agent
      namespace: default
    spec:
      selector:
        matchLabels:
          app: siliconmark-agent
      template:
        metadata:
          labels:
            app: siliconmark-agent
        spec:
          tolerations:
            - operator: Exists
          containers:
            - name: agent
              image: ghcr.io/silicon-data/siliconmark-agent:test
              securityContext:
                privileged: true
              resources:
                limits:
                  nvidia.com/gpu: <GPU_COUNT_PER_NODE>
              env:
                - name: JOB_TOKEN
                  value: "<your_job_token_if_any>"
              args: ["$(JOB_TOKEN)"]
          nodeSelector:
            kubernetes.io/os: linux
    ```

    * Apply with:

    ```bash theme={null}
    kubectl apply -f siliconmark-daemonset.yaml
    ```

    * This schedules one agent pod per node. If you need to target only GPU nodes, add a label (e.g., `gpu=true`) to those nodes and set `nodeSelector: { gpu: "true" }` accordingly.

    2. SSH/Ansible fan-out (non-orchestrated clusters)

    * Simple SSH loop (replace hosts and token):

    ```bash theme={null}
    NODES=(node1.example.com node2.example.com node3.example.com)
    JOB_TOKEN=<your_job_token>
    for n in "${NODES[@]}"; do
      ssh "$n" \
        "docker pull ghcr.io/silicon-data/siliconmark-agent:test && \
         docker run --gpus all --privileged ghcr.io/silicon-data/siliconmark-agent:test $JOB_TOKEN" &
    done
    wait
    ```

    * Minimal Ansible play (inventory `hosts.ini` and job token variable required):

    ```yaml theme={null}
    ---
    - hosts: all
      become: true
      tasks:
        - name: Pull agent image
          ansible.builtin.shell: |
            docker pull ghcr.io/silicon-data/siliconmark-agent:test
        - name: Run agent container
          ansible.builtin.shell: |
            docker run --gpus all --privileged ghcr.io/silicon-data/siliconmark-agent:test {{ job_token }}
    ```

    * Run with:

    ```bash theme={null}
    ansible-playbook -i hosts.ini run-siliconmark.yml -e job_token=<your_job_token>
    ```

    ### Notes

    * Use a job token when you want results saved to your account and multi-node network tests enabled.
    * Privileged mode increases hardware inventory visibility; disable only if your environment requires stricter isolation.
    * In Kubernetes, set `nvidia.com/gpu` equal to the GPU count on target nodes (e.g., 8) so the agent can use all GPUs. For mixed-size clusters, deploy separate DaemonSets per node group with `nodeSelector` and appropriate limits.
    * For Kubernetes, GPU access requires the NVIDIA device plugin; for SSH/Ansible, ensure the NVIDIA Container Toolkit is installed on each node.
  </Tab>
</Tabs>

## Monitor the Agent Run

* The agent will take some time to run. The more GPUs in the test system, the longer it will take. You can monitor it to ensure it is active by opening an additional connection and observing GPU activity:

```bash theme={null}
nvidia-smi dmon   # NVIDIA
amd-smi monitor   # AMD
```

The test will mostly be focused on one GPU at a time, but the final test should exercise all the GPUs in the system. When running in a terminal, the agent displays a live progress UI. In non-interactive environments (scripts, pipes), it falls back to plain log output automatically.
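On hosts where you do not know the GPU vendor in advance, the choice between the two monitoring commands can be automated. The sketch below assumes that whichever vendor tool is on `PATH` is the right one; `pick_gpu_monitor` is a hypothetical helper name.

```bash
# Print the monitoring command for whichever vendor tool is on PATH.
pick_gpu_monitor() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi dmon"
  elif command -v amd-smi >/dev/null 2>&1; then
    echo "amd-smi monitor"
  else
    echo "no supported GPU monitoring tool found" >&2
    return 1
  fi
}

# Usage, in a second terminal while the agent runs:
# $(pick_gpu_monitor)
```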

## Download Reports

* After the run completes, download the report from the [SiliconData portal](https://www.silicondata.com/silicon-mark) or request a download URL from the SiliconMark API:

```
PUT api/silicon-mark/test-task/pdf-report-url
Body: {"id": <task-id>}
```
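The request above could be sent with `curl` along these lines. Treat this as a sketch with several assumptions: the `https://api.silicondata.com` host and the bearer-token header are carried over from the job-creation example rather than confirmed for this endpoint, the response format is not documented here, and both function names are hypothetical. Pass the task id as a JSON literal, quoting it yourself if it is a string.

```bash
# Build the request body in the documented form {"id": <task-id>}.
report_request_body() {
  printf '{"id": %s}' "$1"
}

# Send the documented PUT request.
# Assumptions: API_BASE host and bearer auth match the job-creation example.
fetch_report_url() {
  local api_base="${API_BASE:-https://api.silicondata.com}"
  curl --silent --show-error --request PUT \
    "$api_base/api/silicon-mark/test-task/pdf-report-url" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $TOKEN" \
    --data "$(report_request_body "$1")"
}

# Usage (TOKEN obtained from the login call):
# fetch_report_url 12345
```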

## Output

The SiliconMark™ QuickMark benchmark measures five widely reported performance statistics of a system: memory bandwidth, FLOPS for three floating-point data types (dtypes), and the GPU energy consumed while running the benchmark.

Memory bandwidth is important because many GPU operations involve moving large volumes of data from GPU memory to the streaming multiprocessors (SMs) and back out. If there isn’t enough bandwidth, the GPU compute cores can sit idle waiting for data to process. Bandwidth is determined by the VRAM speed and the width of the memory interface bus.

The next three metrics measure the floating-point operations per second (FLOPS) that a GPU can process. GPUs excel at floating-point calculations compared with traditional CPUs. Floating-point dtypes come in different sizes, which determine their precision and storage cost. FP32 uses 32 bits per number, which gives it a large range and high precision but requires more memory, more bandwidth to move data to and from the GPU, and more computational time to process. FP16 halves the storage to 16 bits. Brain floating point (bfloat16) also uses 16 bits but keeps the 8-bit exponent of FP32, trading precision for range, which makes it a space-efficient format optimized for ML workloads.
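To make the storage difference concrete, the sketch below computes how much memory a square matrix occupies in each data type. The function names are illustrative only and the matrix size is arbitrary; this is not the workload the benchmark runs.

```bash
# Bytes per element for each data type discussed above.
dtype_bytes() {
  case "$1" in
    fp32) echo 4 ;;
    fp16|bf16) echo 2 ;;
    *) echo "unknown dtype: $1" >&2; return 1 ;;
  esac
}

# Storage in MiB for an n-by-n matrix of the given data type.
matrix_mib() {
  local n="$1" bytes
  bytes=$(dtype_bytes "$2") || return 1
  echo $(( n * n * bytes / 1048576 ))
}

# Usage:
# matrix_mib 8192 fp32   # prints 256
# matrix_mib 8192 bf16   # prints 128
```

Halving the element size halves both the memory footprint and the data that must cross the memory bus for the same matrix.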

Finally, energy consumption is read from the GPU at the beginning of the benchmark and again at the end; the delta is the energy consumed by executing the benchmark. Every execution of the QuickMark benchmark completes the same activities, so the energy consumed is comparable between different GPUs and can guide sustainable consumption decisions, especially for time-insensitive workloads. The GPU temperature is also recorded and can be used to understand how the GPU performs under load and whether it is being thermally throttled.
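The start-and-end reading can be illustrated with a little arithmetic. The sketch below assumes a cumulative energy counter reported in millijoules, which is how NVIDIA's NVML exposes total energy consumption; how the agent actually reads the counter is not described here, and `energy_delta_wh` is a hypothetical helper.

```bash
# Convert two cumulative energy readings in millijoules into watt-hours used.
# 1 Wh = 3600 J = 3,600,000 mJ.
energy_delta_wh() {
  awk -v start="$1" -v end="$2" 'BEGIN { printf "%.3f\n", (end - start) / 3600000 }'
}

# Usage: a counter that advanced by 3,600,000 mJ corresponds to 1 Wh.
# energy_delta_wh 1000000 4600000   # prints 1.000
```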

For multi-node cluster systems, SiliconMark also tests inter-node bandwidth and latency, giving you a realistic view of how your system handles distributed workloads, a critical factor in modern AI and HPC applications. SiliconMark assembles a list of systems from the nodes that register for a job and builds a graph of all the connections required to fully test every node. The agents then run bandwidth and latency tests to populate each link in the graph with its measurements.
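As a rough illustration of how the number of links grows with cluster size, the sketch below lists every unordered pair of nodes. It assumes a full mesh in which each pair is tested once; the actual graph SiliconMark builds may differ, and `node_pairs` is a hypothetical helper.

```bash
# Print every unordered pair of the given node names, one "a b" pair per line.
node_pairs() {
  local nodes=("$@") i j
  for (( i = 0; i < ${#nodes[@]}; i++ )); do
    for (( j = i + 1; j < ${#nodes[@]}; j++ )); do
      echo "${nodes[i]} ${nodes[j]}"
    done
  done
}

# Usage: four nodes produce six links, that is n * (n - 1) / 2.
# node_pairs node1 node2 node3 node4
```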

### Example Output

```json theme={null}
{
  "machine_uuid": "59cfa587-c971-502e-b770-6dc5f989eb48",
  "config_id": "085ed573-2683-5144-bef1-51d77a207b43",
  "location": "US",
  "gpu_vendor": "NVIDIA",
  "gpu_count": 1,
  "cpu_info": {
    "cpu_count": "64",
    "arch": "x86_64",
    "os": "linux",
    "hardware_vendor": "Advanced Micro Devices, Inc.",
    "hardware_model": "AMD EPYC 7R13 Processor",
    "processor_brand": "AMD EPYC 7R13 Processor",
    "virtualization": "guest"
  },
  "gpu_info": [
    {
      "name": "NVIDIA H100 80GB HBM3",
      "total_memory": "81559 MiB",
      "driver_version": "570.133.20",
      "cuda_version": "12.8",
      "pci_info": {
        "generation_max": "5",
        "generation_current": "5",
        "link_width_max": "16",
        "link_width_current": "16"
      }
    }
  ],
  "ram_info": {
    "total_memory": "1842 GB",
    "memory_module_count": 16,
    "memory_module_type": "DDR5",
    "memory_module_speed": "4800 MT/s"
  },
  "disk_info": [
    {
      "name": "nvme0n1",
      "model": "Amazon Elastic Block Store",
      "rota": false,
      "size": "500 GB",
      "type": "disk",
      "mountpoints": ["/"],
      "read_speed": "3500 MB/s",
      "write_speed": "3000 MB/s"
    }
  ],
  "network_info": {
    "download_speed_mbps": "8835",
    "upload_speed_mbps": "1612",
    "open_ports": "12",
    "machine_ip": "10.132.64.57"
  },
  "benchmark_results": {
    "quick_mark": {
      "test_results": [
        {
          "gpu_id": "GPU-7cc898ab-df4c-7837-fe85-b18f350bbf01",
          "gpu_model": "NVIDIA H100 80GB HBM3",
          "fp32_tflops": 367.5,
          "fp32_cuda_core_tflops": 53.6,
          "fp16_tflops": 684.6,
          "bf16_tflops": 729.6,
          "fp8_tflops": 1456.2,
          "mixed_precision_tflops": 648.9,
          "memory_bandwidth_gbs": 3025.0,
          "l2_bandwidth_gbs": 415.5,
          "host_to_device_bandwidth_gbs": 27.7,
          "device_to_host_bandwidth_gbs": 28.5,
          "kernel_launch_overhead_us": 7.6,
          "power_consumption_watts": 654.3,
          "temperature_centigrade": 59,
          "energy_consumption_wh": 45.2
        }
      ],
      "aggregate_results": {
        "gpu_id": "aggregate",
        "fp32_tflops": 367.5,
        "fp16_tflops": 684.6,
        "bf16_tflops": 729.6,
        "memory_bandwidth_gbs": 3025.0,
        "power_consumption_watts": 654.3,
        "temperature_centigrade": 59
      }
    }
  }
}
```
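If you have results in this shape saved to a file, `jq` can pull out the headline numbers per GPU. This is a sketch that assumes the JSON above is stored locally (the file name `result.json` is hypothetical); `summarize_quickmark` is an illustrative helper, not an agent command.

```bash
# Print one tab-separated line per GPU: model, FP16 TFLOPS, memory bandwidth (GB/s).
summarize_quickmark() {
  jq -r '.benchmark_results.quick_mark.test_results[]
         | [.gpu_model, .fp16_tflops, .memory_bandwidth_gbs] | @tsv' "$1"
}

# Usage:
# summarize_quickmark result.json
```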

## Benefits

* **Quick Results**: Delivers actionable insights within minutes.
* **Holistic Evaluation**: Supports evaluation of both single and multi-GPU setups.
* **Precision Tracking**: Ensures reproducibility and tracks performance over time.

***

## Contact and Support

For additional information or technical support, please contact:

* **Email**: [support@silicondata.com](mailto:support@silicondata.com)
