Product images are provided for reference and may not represent the exact model, configuration, or included components.

Overview

SKU: S6A73C
Condition: New

HPE S6A73C NVIDIA RTX PRO 6000 96G PCIE

HPE S6A73C NVIDIA RTX PRO 6000 96GB GDDR7 PCIe Professional GPU: The HPE S6A73C is a full-fat professional GPU based on NVIDIA's Blackwell architecture …

$38,327.99
Ships same business day
In stock

Compatibility guidance available for your deployment
Senior specialists for pre- and post-sales support
Authorized sourcing and documentation support
Shipping and lead-time confirmation before install

Laura Bennett, IPSD Senior Specialist

200+ hrs training • U.S.-based

Senior Specialist • 877-277-7147

No Bots, Just Experts

Questions about this product? Free pre-sales support from a senior specialist — product questions, compatibility checks, BOM quotes, price confirmation — typically answered within one business day. Need camera placement or system design work? Engineering time is $175 per hour (qty 1 = 1 hour). Hardware buyers get up to one hour ($175) credited back on their order.

Description

HPE S6A73C NVIDIA RTX PRO 6000 96GB GDDR7 PCIe Professional GPU

The HPE S6A73C is a full-fat professional GPU based on NVIDIA's Blackwell architecture — 96 GB of GDDR7 memory, 24,064 CUDA parallel processing cores, and a 512-bit memory interface delivering 1,597 GB/s of bandwidth. This is not a workstation card trimmed for cost; it is the RTX PRO 6000, positioned for demanding AI inferencing, large-model training runs, simulation, and 3D rendering workloads where GPU memory capacity and raw FP32 throughput are the binding constraints. If you are sizing a server for LLM inference, a visualization node, or a GPU-dense AI appliance and 96 GB of on-card memory matters, the S6A73C is the configuration to evaluate.

Key Features

  • 96 GB GDDR7 on a 512-bit Bus: 96 GB of capacity and 1,597 GB/s of memory bandwidth mean large models can stay resident on the card instead of spilling to slower system RAM. That is a decisive factor when running 70B-class models at FP8 (roughly 70 GB of weights; the same model at FP16 needs about 140 GB and will not fit on one card) or batching high-resolution inference requests. GDDR7 at this bus width is a meaningful step over prior-generation GDDR6 and GDDR6X configurations.
  • 24,064 CUDA Cores + 120 TFLOPS FP32: Single-precision throughput at 120 TFLOPS handles traditional rendering, simulation, and general-purpose GPU compute. For environments running mixed GPU workloads — visualization alongside AI — the CUDA core count keeps both lanes fed without bottlenecking either.
  • 188 Fourth-Generation RT Cores (355 TFLOPS Peak RT): Hardware ray tracing at 355 TFLOPS peak means real-time photorealistic rendering at professional resolution. Architectural visualization, digital twin environments, and broadcast-grade virtual production pipelines can leverage this without offloading to a dedicated render farm.
  • Tensor Cores (4 PFLOPS FP4 / 2 PFLOPS FP8 / 1 PFLOP FP16): The tiered Tensor Core stack lets you match numerical precision to the task: FP4 for high-throughput quantized inference where memory bandwidth is the ceiling, FP8 for training runs that tolerate reduced precision, FP16/BF16 for standard deep learning. TF32 at 234 TFLOPS covers training workloads that have not been tuned for the lower-precision formats. From FP16 to FP8 to FP4, each step doubles the listed peak throughput, so picking the right tier can cut inference latency substantially. Validate accuracy after quantizing, since lower precision can degrade model output.
  • Blackwell GPU Architecture: NVIDIA's Blackwell generation brings second-generation Transformer Engine improvements and an updated driver and CUDA toolkit baseline. This is a PCIe card, so plan multi-GPU configurations around PCIe peer-to-peer bandwidth rather than NVLink, and confirm that the target server platform's driver and CUDA toolkit versions support Blackwell.
  • PCIe Interface: Standard PCIe slot installation means compatibility with a broad range of HPE ProLiant and third-party server platforms that support full-length, high-power PCIe cards. No proprietary carrier required — though physical slot clearance and auxiliary power delivery must be validated against the target chassis before ordering.
  • 2.87 lb Card Weight: At 2.87 lb, mechanical support in a server chassis matters. Verify the target platform's GPU retention and riser arm specifications, particularly for 1U/2U deployments where physical clearance and card sag are real installation concerns.
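The capacity claims above come down to simple arithmetic: parameter count times bytes per parameter, compared against 96 GB. The sketch below is a rough sizing aid, not a vendor tool. The function names and the 20% headroom reserved for KV cache and activations are illustrative assumptions, and 1 GB is taken as 1e9 bytes.

```python
GPU_MEMORY_GB = 96  # from the spec sheet

# Bytes per parameter for each precision tier (standard storage widths).
BYTES_PER_PARAM = {"FP32": 4.0, "FP16/BF16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Weights-only footprint in GB, assuming 1 GB = 1e9 bytes."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_card(params_billion: float, precision: str,
                 headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit while reserving headroom for KV cache etc.

    headroom_fraction is an illustrative assumption, not a vendor figure.
    """
    usable_gb = GPU_MEMORY_GB * (1.0 - headroom_fraction)
    return weight_footprint_gb(params_billion, precision) <= usable_gb


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_footprint_gb(70, precision)
        verdict = "fits" if fits_on_card(70, precision) else "does not fit"
        print(f"70B @ {precision}: {gb:.0f} GB of weights, {verdict}")
```

Under these assumptions a 70B model fits on one card at FP8 or FP4 but not at FP16 or FP32. Real deployments also need room for the KV cache, which grows with batch size and context length.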

Integration and Compatibility

The S6A73C installs via PCIe and is positioned within the HPE server GPU portfolio for high-memory AI and professional visualization use cases. It carries a UNSPSC code of 43201401 (graphics cards/accelerators), consistent with procurement classification in enterprise hardware catalogs. Country of origin is listed as CN, TW, VN. Procurement teams with TAA or NDAA supply-chain requirements should verify current HPE certification status for this SKU independently through official channels.

For deployments targeting AI inference at scale, pair this card with sufficient system memory and a high-core-count CPU that can feed the PCIe bus without creating a host-side bottleneck. NVMe storage bandwidth is also worth sizing carefully when model weights exceed GPU memory and paging is unavoidable.

Buyers integrating this card into an AI compute cluster or GPU server build should also evaluate high-throughput networking to keep the data pipeline matched to the GPU's throughput ceiling. For reference on broader HPE compute and GPU options, see the full HPE catalog. If you are building an AI inferencing rack rather than a single-node workstation, compare this against other GPU accelerator configurations for total memory-per-rack economics.

Frequently Asked Questions

Q: What is the GPU memory configuration on the HPE S6A73C?

A: The S6A73C is equipped with 96 GB of GDDR7 memory on a 512-bit memory interface, delivering 1,597 GB/s of memory bandwidth. This is the full RTX PRO 6000 memory configuration — not a reduced-capacity variant.

Q: What PCIe slot does the S6A73C require?

A: The S6A73C uses a standard PCIe interface. Verify slot physical length, auxiliary power connector availability, and chassis airflow clearance against your target server platform before ordering — a card of this class typically requires a full-length slot and dedicated auxiliary power.

Q: Is the S6A73C suitable for large language model inference?

A: The 96 GB GDDR7 frame buffer is a primary reason buyers specify this card for LLM inference. Larger on-card memory allows bigger models (or larger batches) to reside entirely on the GPU, avoiding the latency penalty of weight paging. The 1,597 GB/s bandwidth supports high token throughput on quantized workloads at FP4 (4 PFLOPS) or FP8 (2 PFLOPS) precision.

Q: Does the S6A73C support hardware ray tracing?

A: Yes. The card includes 188 fourth-generation RT Cores with peak ray tracing performance of 355 TFLOPS, suitable for real-time photorealistic rendering in architectural visualization, digital twin, and virtual production workflows.

Q: What is the weight of the S6A73C and are there chassis installation considerations?

A: The card weighs 2.87 lb. In server deployments, especially rack-mounted 1U or 2U chassis, verify GPU retention bracket and riser arm load ratings. Card sag or inadequate retention can cause PCIe contact issues over time in high-vibration data center environments.

Q: What compute precision tiers does the S6A73C support?

A: The S6A73C supports FP4 (4 PFLOPS), FP8 (2 PFLOPS), FP16/BF16 (1 PFLOP), TF32 (234 TFLOPS), and FP32 (120 TFLOPS) via its Tensor Core and CUDA core stack. Choose precision based on your workload tolerance for numerical accuracy vs. throughput — FP4 maximizes inference speed; FP32 is for workloads requiring full single-precision compute.
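One way to reason about which tier a workload can actually exploit is the roofline "ridge point": peak throughput divided by memory bandwidth, which gives the FLOPs a kernel must perform per byte of memory traffic before it stops being bandwidth-bound. The sketch below uses only the listed spec figures. The function names are hypothetical, and the listed numbers are peaks that real kernels rarely reach.

```python
MEMORY_BANDWIDTH_GBPS = 1597  # GB/s, from the spec sheet

# Peak throughput per precision tier in TFLOPS, from the spec sheet.
PEAK_TFLOPS = {
    "FP4": 4000,
    "FP8": 2000,
    "FP16/BF16": 1000,
    "TF32": 234,
    "FP32": 120,
}


def ridge_point(precision: str) -> float:
    """FLOPs per byte of memory traffic needed to become compute-bound."""
    peak_flops = PEAK_TFLOPS[precision] * 1e12
    bytes_per_second = MEMORY_BANDWIDTH_GBPS * 1e9
    return peak_flops / bytes_per_second


def is_memory_bound(flops_per_byte: float, precision: str) -> bool:
    """True if a kernel with this arithmetic intensity is bandwidth-limited."""
    return flops_per_byte < ridge_point(precision)


if __name__ == "__main__":
    for precision in PEAK_TFLOPS:
        print(f"{precision}: ridge point ~{ridge_point(precision):.0f} FLOP/byte")
```

The lower the precision, the higher the ridge point, so low-intensity work such as small-batch token generation tends to be limited by the 1,597 GB/s figure rather than by the FP4 or FP8 peak.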

Jerry Tildsen

The number that defines the S6A73C for production AI deployments is 1,597 GB/s — that is the memory bandwidth this card delivers via its 512-bit GDDR7 interface, and it is the figure that separates GPU platforms that can sustain high-throughput inference batches from those that stall waiting on memory reads. When I am specifying a GPU for a 70B-class language model running at FP8 precision, memory bandwidth is the first number I check, and 1,597 GB/s is competitive at the top of the current PCIe GPU tier.
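To make the bandwidth argument concrete, a common back-of-envelope ceiling for token generation divides memory bandwidth by the size of the weights, since each decode step has to stream the weights from GPU memory. The sketch below is an upper bound under stated assumptions, not a benchmark: it ignores KV-cache traffic, kernel overheads, and compute limits, and the function name is hypothetical.

```python
MEMORY_BANDWIDTH_GBPS = 1597  # GB/s, from the spec sheet


def decode_ceiling_tokens_per_s(params_billion: float,
                                bytes_per_param: float,
                                batch_size: int = 1) -> float:
    """Upper bound on aggregate decode throughput when weight reads dominate.

    Assumes every decode step streams all weights from GPU memory once and
    that the read is shared by the whole batch. KV-cache traffic and compute
    limits are ignored, so measured numbers will be lower.
    """
    weight_gb = params_billion * bytes_per_param  # 1 GB = 1e9 bytes
    steps_per_second = MEMORY_BANDWIDTH_GBPS / weight_gb
    return steps_per_second * batch_size


if __name__ == "__main__":
    per_stream = decode_ceiling_tokens_per_s(70, 1.0)  # 70B at FP8
    batched = decode_ceiling_tokens_per_s(70, 1.0, batch_size=16)
    print(f"70B @ FP8, batch 1:  {per_stream:.1f} tokens/s ceiling")
    print(f"70B @ FP8, batch 16: {batched:.0f} tokens/s ceiling")
```

For a 70B model at FP8 this gives a ceiling of roughly 23 tokens per second per stream, which is why batching matters for throughput on a single card.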

Technical Highlights:

  • 96 GB GDDR7 / 512-bit / 1,597 GB/s: The combination of capacity and bandwidth is what makes this card viable for large-model inference with fewer quantization trade-offs. At 2 bytes per parameter, 96 GB holds FP16 weights for models up to roughly 40B parameters with headroom left for KV cache and activations; larger models need FP8 or FP4 to stay on one card. The 512-bit bus keeps bandwidth in proportion to the memory size rather than bottlenecking it.
  • Precision Stack (4 PFLOPS FP4 → 120 TFLOPS FP32): Having a full precision stack means you can tune workloads without swapping hardware: FP4 for latency-critical batch inference, TF32 at 234 TFLOPS for training runs, and FP32 on the CUDA cores for simulation or visualization tasks that need full numerical fidelity. One card, multiple workload profiles.
  • 24,064 CUDA Cores + 188 4th-Gen RT Cores: The RT Core count (355 TFLOPS peak) is not incidental — if this card is going into a digital twin or architectural visualization environment alongside AI workloads, you are not sacrificing rendering capability for AI throughput. Both pipelines are fully provisioned.

Deployment Considerations:

  • PCIe interface means broad platform compatibility across HPE ProLiant and third-party server chassis, but confirm physical slot dimensions, auxiliary power connector count, and chassis airflow before finalizing the server BOM — a 96 GB GDDR7 card at this performance tier has a non-trivial thermal envelope that smaller chassis may not adequately support.
  • Country of origin (CN, TW, VN) requires independent verification against current TAA and NDAA Section 889 compliance tables if the deployment is for a federal, DoD, or government-adjacent customer — do not assume compliance without confirming against active HPE certifications for this specific SKU.

The S6A73C is the right specification for an AI inference node or multi-GPU visualization server where 96 GB of on-card memory is the architectural requirement — not a luxury. If your workload fits comfortably in 48 GB or less, a lower-memory configuration in the same family will deliver better price-per-TFLOP. But if you are running large foundation models at production batch sizes without quantization, this is the card that removes memory capacity as a constraint.

Specifications
Weight: 2.87 lb
Country of Origin: CN, TW, VN
Interface: PCIe
UNSPSC Code: 43201401
GPU Architecture: NVIDIA Blackwell Architecture
CUDA Parallel Processing Cores: 24,064
NVIDIA RT Cores: 188 (4th Gen)
FP4 Tensor Core: 4 PFLOPS
FP8 Tensor Core: 2 PFLOPS
FP16 | BF16 Tensor Core: 1 PFLOP
TF32 Tensor Core: 234 TFLOPS
Single-Precision Performance (FP32): 120 TFLOPS
Peak RT Core Performance: 355 TFLOPS
GPU Memory: 96 GB GDDR7
Memory Interface: 512-bit
Memory Bandwidth: 1,597 GB/s