Available Benchmarks
SiliconMark supports the following benchmarks for GPU performance testing:
- QuickMark benchmark - A comprehensive single-node GPU performance test that always runs first
- Cluster Network benchmark - Multi-node network connectivity and bandwidth testing
- LLM Fine-Tuning benchmark - Single-node LLM fine-tuning performance benchmarks
QuickMark Benchmark
Overview
| Field | Value |
|---|---|
| Benchmark ID | quick_mark |
| Type | Single-node |
| Min Nodes | 1 |
| Description | Comprehensive GPU performance test |
Configuration
No configuration required - uses defaults.
Result Structure
Field Metadata
| Field | Display Name | Unit |
|---|---|---|
| bf16_tflops | BF16 Performance | TFLOPS |
| fp16_tflops | FP16 Performance | TFLOPS |
| fp32_tflops | FP32 Performance | TFLOPS |
| fp32_cuda_core_tflops | FP32 CUDA Core Performance | TFLOPS |
| mixed_precision_tflops | Mixed Precision Performance | TFLOPS |
| l2_bandwidth_gbs | L2 Cache Bandwidth | GB/s |
| memory_bandwidth_gbs | Memory Bandwidth | GB/s |
| temperature_centigrade | GPU Temperature | °C |
| power_consumption_watts | Power Consumption | W |
| kernel_launch_overhead_us | Kernel Launch Overhead | μs |
| device_to_host_bandwidth_gbs | Device to Host Bandwidth | GB/s |
| host_to_device_bandwidth_gbs | Host to Device Bandwidth | GB/s |
| allreduce_bandwidth_gbs | AllReduce Bandwidth | GB/s |
| broadcast_bandwidth_gbs | Broadcast Bandwidth | GB/s |
| fp16_tflops_per_peak_watt | FP16 TFLOPS per Peak Watt | TFLOPS/W |
| fp32_tflops_per_peak_watt | FP32 TFLOPS per Peak Watt | TFLOPS/W |
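For illustration, here is a minimal sketch of how the field metadata above might be used to label raw QuickMark results. The field names, display names, and units come from the table; the dictionary and helper function are hypothetical and not part of SiliconMark.

```python
# Hypothetical helper that labels raw QuickMark result fields using the
# metadata from the table above (a representative subset is shown).
FIELD_METADATA = {
    "bf16_tflops": ("BF16 Performance", "TFLOPS"),
    "fp16_tflops": ("FP16 Performance", "TFLOPS"),
    "fp32_tflops": ("FP32 Performance", "TFLOPS"),
    "memory_bandwidth_gbs": ("Memory Bandwidth", "GB/s"),
    "temperature_centigrade": ("GPU Temperature", "°C"),
    "power_consumption_watts": ("Power Consumption", "W"),
    "kernel_launch_overhead_us": ("Kernel Launch Overhead", "μs"),
    "allreduce_bandwidth_gbs": ("AllReduce Bandwidth", "GB/s"),
}

def format_results(results: dict) -> None:
    """Print each result field with its display name and unit."""
    for field, value in results.items():
        display_name, unit = FIELD_METADATA.get(field, (field, ""))
        print(f"{display_name}: {value} {unit}".strip())
```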
Cluster Network Benchmark
Overview
| Field | Value |
|---|---|
| Benchmark ID | cluster_network |
| Type | Multi-node |
| Min Nodes | 2 |
| Description | Tests network connectivity and bandwidth between cluster nodes |
Result Structure (Cluster-level)
Field Metadata
| Field | Display Name | Unit |
|---|---|---|
| throughput_gbps | Throughput | Gbps |
| throughput_mbps | Throughput | Mbps |
| latency_ms | Latency | ms |
| avg_bandwidth_gbps | Average Bandwidth | Gbps |
| min_bandwidth_gbps | Minimum Bandwidth | Gbps |
| total_links_tested | Total Links Tested | |
| node_count | Total Node Count | |
Llama3 Fine-tuning Single Node
This benchmark measures LLM fine-tuning performance using NVIDIA’s NeMo framework with automatic memory-aware parallelism configuration.
Overview
| Field | Value |
|---|---|
| Benchmark ID | llama3_ft_single |
| Type | Single-node |
| Min Nodes | 1 |
| Description | Llama3 model fine-tuning performance benchmark |
Configuration
| Field | Type | Required | Description | Default | Constraints |
|---|---|---|---|---|---|
| model_size | string | No | Model size | "8b" | "8b", "70b", "405b" |
| dtype | string | No | Data type | "fp16" | "fp8", "fp16", "bf16" |
| fine_tune_type | string | No | Fine-tuning method | "lora" | "lora", "sft" |
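For example, a request to fine-tune the 70B model in FP8 with LoRA might be configured as follows. Only the field names, defaults, and allowed values come from the table; the dictionary shape is an assumption.

```python
# Hypothetical llama3_ft_single configuration; omitted fields fall back to
# their defaults ("8b", "fp16", "lora").
config = {
    "model_size": "70b",       # "8b", "70b", or "405b"
    "dtype": "fp8",            # "fp8", "fp16", or "bf16"
    "fine_tune_type": "lora",  # "lora" or "sft"
}
```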
Fixed Parameters
- Sequence Length: 4096 tokens
- Micro Batch Size: 1 (optimized for packed sequences)
- Max Steps: 50 (default, configurable)
- Training Data: Synthetic (SquadDataModule)
Requirements
Software Requirements
- NeMo Container: nvcr.io/nvidia/nemo:24.12 will be downloaded automatically.
- HuggingFace Token: Required (set as the HF_TOKEN environment variable). Get your token from https://huggingface.co/settings/tokens
- Disk Space Requirements:
  - 8B model: ~75GB (55GB base + 20GB model)
  - 70B model: ~205GB (55GB base + 150GB model)
  - 405B model: ~905GB (55GB base + 850GB model)
Hardware Requirements
The benchmark automatically calculates memory requirements based on model configuration:
| Model | Type | FP8 Memory | BF16 Memory | LoRA Reduction |
|---|---|---|---|---|
| 8B | Full | 20GB | 35GB | ~30% |
| 70B | Full | 85GB | 160GB | ~30% |
| 405B | Full | 450GB | 850GB | ~30% |
- 8B LoRA FP8: ~14GB total (fits on single 40GB GPU)
- 70B LoRA FP8: ~60GB total (needs TP=2 on 40GB GPUs or single 80GB GPU)
- 405B LoRA FP8: ~315GB total (needs TP=8 on 80GB GPUs)
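The LoRA totals above follow directly from the table: take the full fine-tuning memory and apply the ~30% LoRA reduction. A minimal sketch, assuming a flat 0.7 factor (FP16 memory is not listed in the table; the tensor-parallelism example below treats it like BF16):

```python
# Memory estimate based on the table above; the exact values SiliconMark uses
# internally may differ. The 0.7 factor reflects the "~30%" LoRA reduction.
FULL_FT_MEMORY_GB = {
    "8b":   {"fp8": 20,  "bf16": 35},
    "70b":  {"fp8": 85,  "bf16": 160},
    "405b": {"fp8": 450, "bf16": 850},
}

def estimate_memory_gb(model_size: str, dtype: str, fine_tune_type: str) -> float:
    """Estimate total GPU memory (GB) needed for a configuration."""
    base = FULL_FT_MEMORY_GB[model_size][dtype]
    return base * 0.7 if fine_tune_type == "lora" else float(base)

# Matches the bullets above:
#   estimate_memory_gb("8b", "fp8", "lora")   -> 14.0  (~14GB)
#   estimate_memory_gb("70b", "fp8", "lora")  -> 59.5  (~60GB)
#   estimate_memory_gb("405b", "fp8", "lora") -> 315.0 (~315GB)
```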
Parallelism Strategy
The benchmark automatically calculates optimal parallelism using a memory-aware strategy.
Auto-Parallelism Algorithm
- Calculate Total Memory: Based on model size, dtype, and fine-tuning type
- Determine Tensor Parallelism (TP): Smallest power of 2 such that the per-GPU model shard fits in GPU memory
- Calculate Data Parallelism (DP): DP = gpu_count / TP, using the GPUs remaining after TP
- Set Global Batch Size (GBS): GBS = min(DP * 2, model_cap)
Parallelism Components
- Tensor Parallelism (TP)
  - Splits model layers across GPUs when the model exceeds single-GPU memory
  - Automatically calculated as the smallest power of 2 that fits
  - Example: 70B model with FP16 (160GB) on 80GB GPUs requires TP=2
- Data Parallelism (DP)
  - Uses remaining GPUs for parallel batch processing: DP = gpu_count / TP
  - Each DP replica processes different samples simultaneously
  - Example: 8 GPUs with TP=2 gives DP=4
- Global Batch Size (GBS)
  - Total samples per training iteration
  - Auto-scaled based on DP with model-specific caps
  - Formula: GBS = min(DP * 2, model_cap)
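Putting the three components together, here is a minimal sketch of the memory-aware calculation. The function signature and the cap value passed in are assumptions; only the TP/DP/GBS rules come from this section.

```python
# Sketch of the auto-parallelism rules described above. `model_cap` is the
# model-specific GBS cap, whose exact values are not documented here.
def auto_parallelism(total_memory_gb: float, gpu_memory_gb: float,
                     gpu_count: int, model_cap: int) -> tuple[int, int, int]:
    """Return (TP, DP, GBS) for a model needing `total_memory_gb` of memory."""
    tp = 1
    # TP: smallest power of 2 whose per-GPU shard fits in GPU memory.
    while total_memory_gb / tp > gpu_memory_gb and tp < gpu_count:
        tp *= 2
    dp = gpu_count // tp          # DP: remaining GPUs process batches in parallel
    gbs = min(dp * 2, model_cap)  # GBS: auto-scaled with a model-specific cap
    return tp, dp, gbs

# Matches the second row of the table below (70B FP8 + LoRA, ~60GB total):
#   auto_parallelism(60, 80, 8, model_cap=16) -> (1, 8, 16)
```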
Example Configurations
| GPUs | Model | Data Type | TP | DP | GBS |
|---|---|---|---|---|---|
| 8×A100-80GB | 8B | FP8 + LoRA | 1 | 8 | 16 |
| 8×H100-80GB | 70B | FP8 + LoRA | 1 | 8 | 16 |
Result Structure
Metrics Calculation
- Tokens per Step = global_batch_size × 4096
- Tokens per Second = tokens_per_step ÷ train_step_time_mean
- Time to 1T Tokens = 10^12 ÷ (tokens_per_second × 86400) days
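A short sketch of these formulas, using the 4096-token sequence length listed under Fixed Parameters; the function itself is illustrative.

```python
# Illustrative implementation of the metric formulas above.
SEQ_LEN = 4096           # fixed sequence length (tokens)
SECONDS_PER_DAY = 86_400

def derive_metrics(global_batch_size: int, train_step_time_mean: float) -> dict:
    """Derive throughput metrics from the measured mean step time (seconds)."""
    tokens_per_step = global_batch_size * SEQ_LEN
    tokens_per_second = tokens_per_step / train_step_time_mean
    time_to_1t_tokens_days = 1e12 / (tokens_per_second * SECONDS_PER_DAY)
    return {
        "tokens_per_step": tokens_per_step,
        "tokens_per_second": tokens_per_second,
        "time_to_1t_tokens_days": time_to_1t_tokens_days,
    }
```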
Field Metadata
| Field | Display Name | Unit |
|---|---|---|
| tokens_per_step | Tokens per Step | tokens |
| tokens_per_second | Tokens per Second | tokens/s |
| train_step_time_mean | Training Step Time (Mean) | s |
| train_step_time_std | Training Step Time (Std Dev) | s |
| step_time_cv_percent | Step Time CV% | % |
| time_to_1t_tokens_days | Time to 1T Tokens | days |