AMD Instinct™ MI325X Accelerators
Revolutionizing AI Performance and Scalability
The AMD Instinct™ MI325X GPU accelerators set a new benchmark in AI computing, delivering unmatched performance and efficiency for training and inference workloads.
Introducing the AMD Instinct MI325X Accelerator and Platform
3rd Generation AMD CDNA™ Core Architecture
Built on advanced die stacking and chiplet technology to deliver superior scalability and efficiency.
Purpose-designed for demanding AI workloads, providing exceptional performance for AI inferencing, training, and data analytics.
Unparalleled Memory Capacity
Equipped with high-capacity HBM3E memory, enabling seamless handling of massive datasets and complex computations.
Platforms powered by 8 AMD Instinct™ MI325X accelerators deliver an astounding 2 terabytes of memory with low latency.
Serves as a drop-in replacement for the AMD Instinct MI300X Platform, simplifying upgrades.
Optimizes multitasking efficiency and supports extensive AI models and virtual machines.
High-Speed Memory Bandwidth
Features 6 TB/s of peak memory bandwidth for accelerated data transfer and reduced latency.
Enhances scalability and supports high-resolution, data-intensive applications.
Ideal for real-time AI inferencing and advanced data processing tasks.
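The quoted bandwidth figure follows directly from the memory configuration in the spec table (an 8192-bit interface at a 6 GHz effective data rate). A quick back-of-envelope check, offered only as an illustration:

```python
# Peak memory bandwidth = interface width (in bytes) x effective data rate.
# Figures taken from the MI325X spec table; this is a sanity check, not an AMD formula.
def peak_bandwidth_tb_s(interface_bits: int, data_rate_ghz: float) -> float:
    bytes_per_transfer = interface_bits / 8            # 8192 bits -> 1024 bytes
    return bytes_per_transfer * data_rate_ghz / 1000   # GB/s -> TB/s

print(peak_bandwidth_tb_s(8192, 6.0))  # 6.144, quoted as "6 TB/s"
```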
Built on 3rd Gen AMD CDNA Architecture
The AMD Instinct MI325X leverages the CDNA 3 architecture, featuring:
Enhanced AMD Matrix Core technology for improved throughput and streamlined compute performance.
AMD Infinity Fabric™ technology for optimized I/O efficiency, enabling seamless scaling within and across accelerators.
PCIe® Gen 5 interface with 16 lanes for high-speed host connections.
Seven Infinity Fabric links per GPU, providing direct, fully connected peer-to-peer links among the eight GPUs on the platform.
Pre-configured in the MI325X Platform with eight accelerators linked by the AMD Universal Base Board (UBB 2.0) featuring HGX host connectors.
Multi-Chip Architecture
The MI325X utilizes a multi-chip architecture for dense computing and high-bandwidth memory integration:
Eight accelerated compute dies (XCDs), each equipped with:
38 Compute Units (CUs), 32 KB L1 cache per CU, and 4 MB shared L2 cache.
256 MB of AMD Infinity Cache™ shared across 8 XCDs.
Support for multiple precisions for AI/ML and HPC tasks, including native hardware for sparsity.
Advanced Media Decoding: Supports HEVC/H.265, AVC/H.264, VP9, and AV1, plus an additional 8-core JPEG/MPEG codec.
256 GB of HBM3E memory with 6 TB/s peak throughput for data-intensive applications.
SR-IOV support for up to 8 partitions, enabling virtualization and multi-user environments.
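The per-die figures above multiply out to the device totals in the spec table. The per-CU counts below (64 stream processors and 4 matrix cores per CU) are the standard CDNA 3 values inferred from those totals; this is an illustrative consistency check:

```python
# Totals implied by the multi-chip layout: 8 XCDs x 38 CUs each.
XCDS = 8
CUS_PER_XCD = 38
SP_PER_CU = 64             # stream processors per CU (CDNA 3)
MATRIX_CORES_PER_CU = 4    # inferred: 1216 matrix cores / 304 CUs

cus = XCDS * CUS_PER_XCD                  # 304 Compute Units
stream_processors = cus * SP_PER_CU       # 19,456 Stream Processors
matrix_cores = cus * MATRIX_CORES_PER_CU  # 1,216 Matrix Cores
print(cus, stream_processors, matrix_cores)
```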
Coherent Shared Memory for Large Models
The MI325X accelerators facilitate large-scale AI and HPC workloads with:
Coherent shared memory between eight accelerators on a UBB.
128 GB/s bidirectional bandwidth between GPUs to ensure rapid data exchange.
Enhanced performance for memory-intensive AI, ML, and HPC models.
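The platform-level numbers above follow from the per-GPU figures. A short sketch (the aggregate per-GPU peer bandwidth is an illustration assuming the seven peer links described earlier, not a published AMD figure):

```python
# Platform totals from per-GPU figures on a UBB with 8 accelerators.
GPUS_PER_UBB = 8
HBM_PER_GPU_GB = 256
LINK_BW_GB_S = 128        # bidirectional bandwidth per Infinity Fabric link
PEER_LINKS_PER_GPU = 7    # one direct link to each of the other seven GPUs

platform_memory_tb = GPUS_PER_UBB * HBM_PER_GPU_GB / 1024  # 2.0 TB coherent HBM3E
aggregate_peer_bw_gb_s = PEER_LINKS_PER_GPU * LINK_BW_GB_S # 896 GB/s per GPU (illustrative)
print(platform_memory_tb, aggregate_peer_bw_gb_s)
```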
Transforming AI and HPC Capabilities
The AMD Instinct MI325X accelerators redefine AI performance, enabling developers and enterprises to handle the most complex AI workloads with superior efficiency, scalability, and data processing speed. Whether it’s AI training, inference, or data analytics, the MI325X platform delivers the power and flexibility needed to drive cutting-edge AI solutions.
Contact us for pricing
Tech Specs
Product Basics
Name | AMD Instinct™ MI325X
Family | Instinct |
Series | Instinct MI300 Series |
Form Factor | Servers |
Launch Date | 10/10/2024 |
GPU Specifications
GPU Architecture | CDNA 3
Lithography | TSMC 5nm | 6nm FinFET |
Stream Processors | 19,456 |
Matrix Cores | 1216 |
Compute Units | 304 |
Peak Engine Clock | 2100 MHz |
Peak Eight-bit Precision (FP8) Performance (E5M2, E4M3) | 2.61 PFLOPs |
Peak Eight-bit Precision (FP8) Performance with Structured Sparsity (E5M2, E4M3) | 5.22 PFLOPs |
Peak Half Precision (FP16) Performance | 1.3 PFLOPs |
Peak Half Precision (FP16) Performance with Structured Sparsity | 2.61 PFLOPs |
Peak Single Precision (TF32 Matrix) Performance | 653.7 TFLOPs |
Peak Single Precision (TF32) Performance with Structured Sparsity | 1.3 PFLOPs |
Peak Single Precision Matrix (FP32) Performance | 163.4 TFLOPs |
Peak Double Precision Matrix (FP64) Performance | 163.4 TFLOPs |
Peak Single Precision (FP32) Performance | 163.4 TFLOPs |
Peak Double Precision (FP64) Performance | 81.7 TFLOPs |
Peak INT8 Performance | 2.61 POPs
Peak INT8 Performance with Structured Sparsity | 5.22 POPs |
Peak bfloat16 | 1.3 PFLOPs |
Peak bfloat16 with Structured Sparsity | 2.61 PFLOPs
Transistor Count | 153 Billion |
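The peak-rate rows above all follow the same formula: compute units x peak clock x FLOPs per CU per clock. The per-clock rates below are inferred from the table itself (e.g. 81.7 TFLOPs FP64 / (304 CUs x 2.1 GHz) = 128), not taken from AMD documentation; this sketch just shows the arithmetic:

```python
# Peak FLOPS = CUs x clock (GHz) x FLOPs-per-CU-per-clock, reported in TFLOPs.
CUS = 304
CLOCK_GHZ = 2.1  # 2100 MHz peak engine clock

def peak_tflops(flops_per_cu_per_clock: int) -> float:
    return CUS * CLOCK_GHZ * flops_per_cu_per_clock / 1000

print(peak_tflops(128))    # ~81.7   (FP64 vector)
print(peak_tflops(256))    # ~163.4  (FP64/FP32 matrix)
print(peak_tflops(2048))   # ~1307.4 (FP16/BF16 dense)
```

Doubling the dense FP16/BF16 rate gives the structured-sparsity figure (2.61 PFLOPs) quoted in the table.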
Requirements
External Power Connectors | 54V UBB |
Typical Board Power (TBP) | 1000W Peak |
GPU Memory
Last Level Cache (LLC) | 256 MB |
Dedicated Memory Size | 256 GB |
Dedicated Memory Type | HBM3E |
Infinity Cache | Yes |
Memory Interface | 8192-bit |
Memory Clock | 6 GHz |
Peak Memory Bandwidth | 6 TB/s |
Memory ECC Support | Yes (Full-Chip) |
Board Specifications
GPU Form Factor | OAM Module |
Bus Type | PCIe® 5.0 x16 |
Infinity Fabric™ Links | 8 |
Peak Infinity Fabric™ Link Bandwidth | 128 GB/s |
Cooling | Passive OAM |
Additional Features
Supported Technologies | AMD CDNA™ 3 Architecture , AMD ROCm™ - Ecosystem without Borders , AMD Infinity Architecture |
RAS Support | Yes |
Page Retirement | Yes |
Page Avoidance | Yes |
SR-IOV | Yes |
Footnotes
MI325-002 - Calculations conducted by AMD Performance Labs as of May 28th, 2024 for the AMD Instinct™ MI325X GPU resulted in 1307.4 TFLOPS peak theoretical half precision (FP16), 1307.4 TFLOPS peak theoretical Bfloat16 format precision (BF16), 2614.9 TFLOPS peak theoretical 8-bit precision (FP8), 2614.9 TOPs INT8 floating-point performance. Actual performance will vary based on final specifications and system configuration.
Published results on Nvidia H200 SXM (141GB) GPU: 989.4 TFLOPS peak theoretical half precision tensor (FP16 Tensor), 989.4 TFLOPS peak theoretical Bfloat16 tensor format precision (BF16 Tensor), 1,978.9 TFLOPS peak theoretical 8-bit precision (FP8), 1,978.9 TOPs peak theoretical INT8 floating-point performance. BFLOAT16 Tensor Core, FP16 Tensor Core, FP8 Tensor Core and INT8 Tensor Core performance were published by Nvidia using sparsity; for the purposes of comparison, AMD converted these numbers to non-sparsity/dense by dividing by 2, and these numbers appear above.
Nvidia H200 source: https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446 and https://www.anandtech.com/show/21136/nvidia-at-sc23-h200-accelerator-with-hbm3e-and-jupiter-supercomputer-for-2024
Note: Nvidia H200 GPUs have the same published FLOPs performance as H100 products https://resources.nvidia.com/en-us-tensor-core/.
MI325-008 - Calculations conducted by AMD Performance Labs as of October 2nd, 2024 for the AMD Instinct™ MI325X (1000W) GPU designed with AMD CDNA™ 3 5nm | 6nm FinFET process technology at 2,100 MHz peak boost engine clock resulted in 163.4 TFLOPs peak theoretical double precision Matrix (FP64 Matrix), 81.7 TFLOPs peak theoretical double precision (FP64), 163.4 TFLOPs peak theoretical single precision Matrix (FP32 Matrix), 163.4 TFLOPs peak theoretical single precision (FP32), 653.7 TFLOPS peak theoretical TensorFloat-32 (TF32), 1307.4 TFLOPS peak theoretical half precision (FP16). Actual performance may vary based on final specifications and system configuration.
Published results on Nvidia H200 SXM (141GB) GPU: 66.9 TFLOPs peak theoretical double precision tensor (FP64 Tensor), 33.5 TFLOPs peak theoretical double precision (FP64), 66.9 TFLOPs peak theoretical single precision (FP32), 494.7 TFLOPs peak TensorFloat-32 (TF32), 989.5 TFLOPS peak theoretical half precision tensor (FP16 Tensor). TF32 Tensor Core performance were published by Nvidia using sparsity; for the purposes of comparison, AMD converted these numbers to non-sparsity/dense by dividing by 2, and this number appears above.
Nvidia H200 source: https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446 and https://www.anandtech.com/show/21136/nvidia-at-sc23-h200-accelerator-with-hbm3e-and-jupiter-supercomputer-for-2024
Note: Nvidia H200 GPUs have the same published FLOPs performance as H100 products https://resources.nvidia.com/en-us-tensor-core/.
*Nvidia H200 GPUs don’t support FP32 Tensor.
MI325-001A - Calculations conducted by AMD Performance Labs as of September 26th, 2024, based on current specifications and/or estimation. The AMD Instinct™ MI325X OAM accelerator will have 256GB HBM3E memory capacity and 6 TB/s GPU peak theoretical memory bandwidth performance. Actual results based on production silicon may vary.
The highest published results on the NVidia Hopper H200 (141GB) SXM GPU accelerator resulted in 141GB HBM3E memory capacity and 4.8 TB/s GPU memory bandwidth performance. https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446
The highest published results on the NVidia Blackwell HGX B100 (192GB) 700W GPU accelerator resulted in 192GB HBM3E memory capacity and 8 TB/s GPU memory bandwidth performance.
The highest published results on the NVidia Blackwell HGX B200 (192GB) GPU accelerator resulted in 192GB HBM3E memory capacity and 8 TB/s GPU memory bandwidth performance.
Nvidia Blackwell specifications at https://resources.nvidia.com/en-us-blackwell-architecture
All brand names and trademarks are referred to here for descriptive purposes only and are the properties of their respective owners.
© Will Imaging. All rights reserved.
