AI Novel Memory - HBF Technology
2026-03-02 10:12:13

     High Bandwidth Flash (HBF) is a novel memory architecture designed specifically for AI workloads. Its structure is akin to that of High Bandwidth Memory (HBM), which stacks DRAM dies; HBF instead is built by stacking NAND flash dies.

     In terms of design, HBF combines the characteristics of 3D NAND flash memory and HBM to better meet the needs of AI inference. Like HBM, HBF stacks multiple high-performance flash core dies and connects them to a logic die that accesses parallel flash sub-arrays through Through-Silicon Vias (TSVs). Specifically, it builds on SanDisk's BiCS 3D NAND technology and uses a CMOS directly Bonded to Array (CBA) design, in which the 3D NAND storage array is bonded directly onto the I/O die.
       HBF breaks away from traditional NAND design by implementing independently accessible memory sub-arrays. Its core innovations include:
     Distributed control structure: each group of NAND dies can be accessed independently and in parallel. Optimized controller algorithms compress NAND's inherent millisecond-level latency to the microsecond level, matching the requirements of AI inference.
     Dense interconnect architecture: chip-to-wafer bonding builds a densely interconnected storage structure that supports parallel access to multiple NAND arrays, significantly increasing I/O bandwidth.
     Non-volatile storage: because it is based on NAND flash, HBF retains data for long periods without needing refresh, reducing power consumption and improving reliability.
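The benefit of independently accessible sub-arrays can be illustrated with a toy throughput model: if each array serves one page read per access latency, pipelining reads across many independent arrays scales aggregate bandwidth roughly linearly. All numbers below are illustrative assumptions, not HBF specifications.

```python
# Toy model of parallel sub-array access. The page size, read latency,
# and sub-array count are assumed for illustration only.

PAGE_BYTES = 16 * 1024   # assumed NAND page size: 16 KiB
T_READ_S = 50e-6         # assumed per-array read latency: 50 microseconds
NUM_SUBARRAYS = 64       # assumed independently accessible sub-arrays

def aggregate_bandwidth_gbps(num_subarrays, page_bytes=PAGE_BYTES, t_read=T_READ_S):
    """Pipelined reads across independent sub-arrays scale bandwidth
    roughly linearly with the number of arrays kept busy."""
    return num_subarrays * page_bytes / t_read / 1e9

print(aggregate_bandwidth_gbps(1))              # one array alone: ~0.33 GB/s
print(aggregate_bandwidth_gbps(NUM_SUBARRAYS))  # 64 in parallel: ~21 GB/s
```

In practice the scaling saturates once the TSV interface or controller becomes the bottleneck, but the sketch shows why distributed control is the key to HBM-class bandwidth from NAND.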
       HBF can match the bandwidth of HBM while achieving 8 to 16 times the capacity per stack at a similar cost. Each HBF stack uses 16 core dies for a capacity of up to 512GB, and eight stacks provide 4TB, enough to run large AI models directly on GPU hardware; this high capacity is where HBF excels. A single HBF stack can hold a complete 64B model, which is expected to enable local deployment of large models on mobile devices, as well as low-power, high-capacity edge-AI storage for devices such as autonomous vehicles, AI toys, and IoT hardware.
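The capacity figures above can be checked with simple arithmetic. One assumption is made explicit here: "64B model" is read as 64 billion parameters, with the weight footprint estimated as parameters times bytes per parameter (ignoring KV cache and activations).

```python
# Back-of-envelope check of the HBF capacity figures quoted above.

DIES_PER_STACK = 16
STACK_CAPACITY_GB = 512
STACKS_PER_GPU = 8

per_die_gb = STACK_CAPACITY_GB / DIES_PER_STACK        # capacity per NAND core die
total_tb = STACK_CAPACITY_GB * STACKS_PER_GPU / 1024   # total across 8 stacks

def model_weight_gb(params_billions, bytes_per_param):
    """Approximate weight footprint: 1e9 params * bytes/param ~ GB.
    Ignores KV cache and activation memory."""
    return params_billions * bytes_per_param

print(per_die_gb)              # 32.0 GB per die
print(total_tb)                # 4.0 TB across eight stacks
print(model_weight_gb(64, 2))  # 128 GB for a 64B model at FP16 (2 bytes/param)
```

At FP16 the assumed 64B-parameter model needs roughly 128GB of weights, which fits comfortably inside a single 512GB stack.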
     However, HBF's strengths are high bandwidth and high capacity; the technology targets read-intensive AI inference tasks rather than latency-sensitive applications. Growing demand for memory and shrinking demand for computation have given rise to a new paradigm, "storage-centric AI", which is well suited to HBF-based systems. Skeptics initially argued that NAND-based technology could not meet the demands of AI: NAND latency is too high, its write speed cannot match DRAM, and it has endurance limitations. HBF, however, is a reimagined NAND with extremely high performance. In a simulation based on the 405-billion-parameter Llama 3.1 model comparing GPUs equipped with HBF against those equipped with HBM, the overall performance difference across the stages of the inference pipeline, capacity aside, was within 2.2%.
     The HBF market is estimated to reach $12 billion by 2030. Although that is only about 10% of the HBM market in the same year (approximately $117 billion), HBF is expected to complement HBM and accelerate its growth. With its advantages of high capacity, high bandwidth, low cost, and low power consumption, HBF is poised to become a new storage favorite for AI inference.

