AMD Instinct™ MI325X Accelerators

Revolutionizing AI Performance and Scalability

The AMD Instinct™ MI325X GPU accelerators set a new benchmark in AI computing, delivering unmatched performance and efficiency for training and inference workloads.


Introducing the AMD Instinct MI325X Accelerator and Platform

3rd Gen AMD CDNA™ Architecture

  • Built on advanced die stacking and chiplet technology to deliver superior scalability and efficiency.

  • Purpose-designed for demanding AI workloads, providing exceptional performance for AI inferencing, training, and data analytics.

Unparalleled Memory Capacity

  • Equipped with 256 GB of high-capacity HBM3E memory, enabling seamless handling of massive datasets and complex computations.

  • Platforms powered by eight AMD Instinct™ MI325X accelerators deliver a combined 2 TB of HBM3E memory with low latency (see the sizing sketch after this list).

  • Serves as a drop-in replacement for the AMD Instinct MI300X Platform, enabling seamless scaling of existing deployments.

  • Improves multitasking efficiency and supports large AI models and multiple virtual machines.
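
For a sense of scale, here is a rough sizing sketch in plain Python; the model size and precision are illustrative assumptions, not AMD figures:

    # Rough sizing arithmetic for an 8-GPU MI325X platform (2 TB HBM3E total).
    GPU_MEM_GB = 256                       # per-accelerator HBM3E capacity
    PLATFORM_GB = 8 * GPU_MEM_GB           # 2048 GB across the platform

    params_billion = 405                   # assumed model size (illustrative)
    bytes_per_param = 2                    # FP16/BF16 weights

    weights_gb = params_billion * bytes_per_param   # 810 GB
    print(f"Weights: {weights_gb} GB of {PLATFORM_GB} GB "
          f"({weights_gb / PLATFORM_GB:.0%} of platform memory)")
    # -> Weights: 810 GB of 2048 GB (40% of platform memory)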

High-Speed Memory Bandwidth

  • Features 6 TB/s of peak memory bandwidth for accelerated data transfer and reduced latency.

  • Enhances scalability and supports high-resolution, data-intensive applications.

  • Ideal for real-time AI inferencing and advanced data processing tasks.
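
Because autoregressive inference is typically memory-bandwidth bound, a back-of-the-envelope sketch shows the per-token latency floor implied by 6 TB/s; the model size here is again an assumption for illustration:

    # Lower bound on decode latency for a memory-bound model: each generated
    # token streams the resident weights from HBM at least once.
    weights_gb = 70 * 2            # assumed 70B parameters at FP16 = 140 GB
    bandwidth_gb_s = 6000          # 6 TB/s peak memory bandwidth

    floor_s = weights_gb / bandwidth_gb_s
    print(f"{floor_s * 1e3:.1f} ms/token floor, "
          f"~{1 / floor_s:.0f} tokens/s ceiling at peak bandwidth")
    # -> 23.3 ms/token floor, ~43 tokens/s ceiling at peak bandwidth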


Built on 3rd Gen AMD CDNA Architecture

The AMD Instinct MI325X leverages the CDNA 3 architecture, featuring:

  • Enhanced AMD Matrix Core technology for improved throughput and streamlined compute performance.

  • AMD Infinity Fabric™ technology for optimized I/O efficiency, enabling seamless scaling within and across accelerators.

  • PCIe® Gen 5 interface with 16 lanes for high-speed host connections.

  • Seven Infinity Fabric links per GPU, providing a fully connected peer-to-peer topology among the eight GPUs on a platform.

  • Pre-configured in the MI325X Platform with eight accelerators linked by the AMD Universal Base Board (UBB 2.0) featuring HGX host connectors.
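
On a deployed system, these properties can be sanity-checked from software. The sketch below assumes a ROCm build of PyTorch, which exposes Instinct accelerators through the familiar torch.cuda API:

    import torch

    # ROCm builds of PyTorch surface AMD GPUs under the "cuda" device type.
    assert torch.cuda.is_available(), "No ROCm/HIP devices visible"

    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # On AMD hardware, multi_processor_count reports Compute Units.
        print(f"GPU {i}: {props.name}, "
              f"{props.total_memory / 2**30:.0f} GiB, "
              f"{props.multi_processor_count} CUs")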


Multi-Chip Architecture

The MI325X utilizes a multi-chip architecture for dense computing and high-bandwidth memory integration:

  • Eight accelerated compute dies (XCDs), each equipped with:

    • 38 Compute Units (CUs), 32 KB of L1 cache per CU, and 4 MB of shared L2 cache.

    • Support for multiple precisions for AI/ML and HPC tasks, including native hardware support for sparsity.

  • 256 MB of AMD Infinity Cache™ shared across all eight XCDs.

  • Advanced media decoding: supports HEVC/H.265, AVC/H.264, VP9, and AV1, plus a dedicated 8-core JPEG/MPEG CODEC.

  • 256 GB of HBM3E memory with 6 TB/s peak throughput for data-intensive applications.

  • SR-IOV support for up to 8 partitions, enabling virtualization and multi-user environments.
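
To illustrate the multi-precision support in practice, here is a minimal mixed-precision sketch (standard PyTorch autocast on a ROCm build; nothing here is MI325X-specific API) that runs a matrix multiply through the BF16 matrix pipelines:

    import torch

    device = "cuda"  # ROCm maps Instinct GPUs to the cuda device type
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)

    # autocast dispatches eligible ops (like matmul) to BF16 Matrix Cores
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        c = a @ b

    print(c.dtype)  # torch.bfloat16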


Coherent Shared Memory for Large Models

The MI325X accelerators facilitate large-scale AI and HPC workloads with:

  • Coherent shared memory between eight accelerators on a UBB.

  • 128 GB/s of peak bidirectional bandwidth per GPU-to-GPU link for rapid data exchange (exercised in the sketch after this list).

  • Enhanced performance for memory-intensive AI, ML, and HPC models.
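
A simple way to exercise those GPU-to-GPU links is a timed peer copy. This sketch assumes at least two visible devices and a ROCm build of PyTorch; measured throughput will sit below the 128 GB/s per-link peak:

    import time
    import torch

    assert torch.cuda.device_count() >= 2, "Needs at least two GPUs"
    print("P2P capable:", torch.cuda.can_device_access_peer(0, 1))

    x = torch.empty(8 * 2**30, dtype=torch.uint8, device="cuda:0")  # 8 GiB
    x.to("cuda:1")                          # warm-up copy (allocates on peer)

    torch.cuda.synchronize("cuda:0")
    torch.cuda.synchronize("cuda:1")
    t0 = time.perf_counter()
    y = x.to("cuda:1")                      # device-to-device copy
    torch.cuda.synchronize("cuda:1")
    elapsed = time.perf_counter() - t0

    print(f"{8 / elapsed:.1f} GiB/s achieved over the fabric")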


Transforming AI and HPC Capabilities

The AMD Instinct MI325X accelerators redefine AI performance, enabling developers and enterprises to handle the most complex AI workloads with superior efficiency, scalability, and data processing speed. Whether it’s AI training, inference, or data analytics, the MI325X platform delivers the power and flexibility needed to drive cutting-edge AI solutions.

Contact us for pricing


Tech Specs 

Product Basics

Name

AMD Instinct™ MI325X

Family

Instinct

Series

Instinct MI300 Series

Form Factor

Servers

Launch Date

10/10/2024

GPU Specifications

GPU Architecture

AMD CDNA™ 3

Lithography

TSMC 5nm | 6nm FinFET

Stream Processors

19,456

Matrix Cores

1,216

Compute Units

304

Peak Engine Clock

2100 MHz

Peak Eight-bit Precision (FP8) Performance (E5M2, E4M3)

2.61 PFLOPs

Peak Eight-bit Precision (FP8) Performance with Structured Sparsity (E5M2, E4M3)

5.22 PFLOPs

Peak Half Precision (FP16) Performance

1.3 PFLOPs

Peak Half Precision (FP16) Performance with Structured Sparsity

2.61 PFLOPs

Peak Single Precision (TF32 Matrix) Performance

653.7 TFLOPs

Peak Single Precision (TF32) Performance with Structured Sparsity

1.3 PFLOPs

Peak Single Precision Matrix (FP32) Performance

163.4 TFLOPs

Peak Double Precision Matrix (FP64) Performance

163.4 TFLOPs

Peak Single Precision (FP32) Performance

163.4 TFLOPs

Peak Double Precision (FP64) Performance

81.7 TFLOPs

Peak INT8 Performance

2.61 POPs

Peak INT8 Performance with Structured Sparsity

5.22 POPs

Peak bfloat16 (BF16) Performance

1.3 PFLOPs

Peak bfloat16 (BF16) Performance with Structured Sparsity

2.61 PFLOPs

Transistor Count

153 Billion
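
As a cross-check, the peak-throughput entries above follow from the compute unit count and peak engine clock; the per-CU ops/clock values in this sketch are inferred from the published totals rather than taken from AMD documentation:

    # Peak throughput = CUs x engine clock x ops per CU per clock.
    # Ops/clock figures are back-solved from the published peaks (assumption).
    cus, clock_ghz = 304, 2.1
    for label, ops in [("FP16/BF16", 2048), ("FP8/INT8", 4096),
                       ("FP32 vector", 256), ("FP64 vector", 128)]:
        print(f"{label}: {cus * clock_ghz * ops / 1000:.1f} TFLOPS")
    # -> 1307.4, 2614.9, 163.4, and 81.7 TFLOPS respectively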

Requirements

External Power Connectors

54V UBB

Typical Board Power (TBP)

1000W Peak

GPU Memory

Last Level Cache (LLC)

256 MB

Dedicated Memory Size

256 GB

Dedicated Memory Type

HBM3E

Infinity Cache

Yes

Memory Interface

8192-bit

Memory Clock

6 GHz

Peak Memory Bandwidth

6 TB/s

Memory ECC Support

Yes (Full-Chip)
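
The bandwidth entry is consistent with the interface width and data rate; this quick check reads the 6 GHz memory clock as an effective 6 Gbps per pin (an assumption about how the entry is quoted):

    # Peak bandwidth = interface width x per-pin data rate / 8 bits per byte
    print(8192 * 6 / 8)   # 6144.0 GB/s, quoted as 6 TB/s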

Board Specifications

GPU Form Factor

OAM Module

Bus Type

PCIe® 5.0 x16

Infinity Fabric™ Links

8

Peak Infinity Fabric™ Link Bandwidth

128 GB/s

Cooling

Passive OAM

Additional Features

Supported Technologies

AMD CDNA™ 3 Architecture, AMD ROCm™ (Ecosystem without Borders), AMD Infinity Architecture

RAS Support

Yes

Page Retirement

Yes

Page Avoidance

Yes

SR-IOV

Yes


Footnotes

  1. MI325-002 - Calculations conducted by AMD Performance Labs as of May 28th, 2024 for the AMD Instinct™ MI325X GPU resulted in 1307.4 TFLOPS peak theoretical half precision (FP16), 1307.4 TFLOPS peak theoretical Bfloat16 format precision (BF16), 2614.9 TFLOPS peak theoretical 8-bit precision (FP8), and 2614.9 TOPs peak theoretical INT8 performance. Actual performance will vary based on final specifications and system configuration.
    Published results on the Nvidia H200 SXM (141GB) GPU: 989.4 TFLOPS peak theoretical half precision tensor (FP16 Tensor), 989.4 TFLOPS peak theoretical Bfloat16 tensor format precision (BF16 Tensor), 1,978.9 TFLOPS peak theoretical 8-bit precision (FP8), and 1,978.9 TOPs peak theoretical INT8 performance. BFLOAT16 Tensor Core, FP16 Tensor Core, FP8 Tensor Core, and INT8 Tensor Core performance were published by Nvidia using sparsity; for the purposes of comparison, AMD converted these numbers to non-sparsity/dense by dividing by 2, and those numbers appear above.
    Nvidia H200 sources: https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446 and https://www.anandtech.com/show/21136/nvidia-at-sc23-h200-accelerator-with-hbm3e-and-jupiter-supercomputer-for-2024
    Note: Nvidia H200 GPUs have the same published FLOPs performance as H100 products (https://resources.nvidia.com/en-us-tensor-core/).

  2. MI325-008 - Calculations conducted by AMD Performance Labs as of October 2nd, 2024 for the AMD Instinct™ MI325X (1000W) GPU, designed with AMD CDNA™ 3 5nm | 6nm FinFET process technology at a 2,100 MHz peak boost engine clock, resulted in 163.4 TFLOPs peak theoretical double precision Matrix (FP64 Matrix), 81.7 TFLOPs peak theoretical double precision (FP64), 163.4 TFLOPs peak theoretical single precision Matrix (FP32 Matrix), 163.4 TFLOPs peak theoretical single precision (FP32), 653.7 TFLOPS peak theoretical TensorFloat-32 (TF32), and 1307.4 TFLOPS peak theoretical half precision (FP16). Actual performance may vary based on final specifications and system configuration.
    Published results on the Nvidia H200 SXM (141GB) GPU: 66.9 TFLOPs peak theoretical double precision tensor (FP64 Tensor), 33.5 TFLOPs peak theoretical double precision (FP64), 66.9 TFLOPs peak theoretical single precision (FP32), 494.7 TFLOPs peak TensorFloat-32 (TF32), and 989.5 TFLOPS peak theoretical half precision tensor (FP16 Tensor). TF32 Tensor Core performance was published by Nvidia using sparsity; for the purposes of comparison, AMD converted this number to non-sparsity/dense by dividing by 2, and that number appears above.
    Nvidia H200 sources: https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446 and https://www.anandtech.com/show/21136/nvidia-at-sc23-h200-accelerator-with-hbm3e-and-jupiter-supercomputer-for-2024
    Note: Nvidia H200 GPUs have the same published FLOPs performance as H100 products (https://resources.nvidia.com/en-us-tensor-core/). Nvidia H200 GPUs do not support FP32 Tensor.

  3. MI325-001A - Calculations conducted by AMD Performance Labs as of September 26th, 2024, based on current specifications and/or estimation. The AMD Instinct™ MI325X OAM accelerator will have 256GB HBM3E memory capacity and 6 TB/s GPU peak theoretical memory bandwidth performance. Actual results based on production silicon may vary.
    The highest published results on the Nvidia Hopper H200 (141GB) SXM GPU accelerator are 141GB HBM3E memory capacity and 4.8 TB/s GPU memory bandwidth performance. https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446
    The highest published results on the Nvidia Blackwell HGX B100 (192GB) 700W GPU accelerator are 192GB HBM3E memory capacity and 8 TB/s GPU memory bandwidth performance.
    The highest published results on the Nvidia Blackwell HGX B200 (192GB) GPU accelerator are 192GB HBM3E memory capacity and 8 TB/s GPU memory bandwidth performance.
    Nvidia Blackwell specifications at https://resources.nvidia.com/en-us-blackwell-architecture