LZ4 and zstd IP cores for NVMe expansion

LZ4 and zstd compression and decompression, implemented as hardware-accelerated IP cores. Today, costly host cycles are spent on software-based compression: data is compressed before it is sent over the storage interconnect and decompressed when it is retrieved. These accelerators unlock up to 50% more bandwidth and 2-4x more storage capacity.

LZ4 IP core

Standards
  • Compression algorithm: LZ4 compression and decompression

Architecture
  • Modular architecture that scales to meet customer throughput requirements

  • Flexible integration based on customer architectural requirements

  • Architectural configuration parameters exposed for fine-tuning performance

Deliverables

Performance evaluation license

  • C++ compression model for integration in customer performance simulation model

FPGA evaluation license

  • Encrypted IP delivery (Xilinx)

HDL Source Licenses

  • Synthesizable SystemVerilog RTL (encrypted)

  • Implementation constraints

  • UVM testbench (self-checking)

  • Vectors for testbench and expected results

  • User Documentation

Background

The LZ4 compression algorithm is a fast, lossless data compression technology renowned for its high-speed performance and low latency. LZ4 offers impressive compression and decompression speeds, making it an excellent choice for real-time applications and environments where quick data access is critical. LZ4 maintains a balance between compression ratio and speed, providing an efficient solution for various data types and use cases.
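One reason the format decodes quickly is that an LZ4 block is byte-aligned and has no entropy-coding stage: each sequence is a token, a run of literals, and a back-reference into already decoded data. The following is a minimal software sketch of a block decoder in Python, for illustration only. It is not the IP core's implementation, the function name `lz4_block_decompress` is made up for this example, and the LZ4 frame layer (headers, checksums) is not handled.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (illustrative sketch, minimal error handling)."""
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        token = src[i]
        i += 1

        # High nibble: literal length; 15 means "more length bytes follow".
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                b = src[i]
                i += 1
                lit_len += b
                if b != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len
        if i >= n:
            break  # the last sequence carries literals only

        # 2-byte little-endian offset back into the decoded output.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")

        # Low nibble: match length minus 4, with the same extension rule.
        match_len = (token & 0x0F) + 4
        if (token & 0x0F) == 15:
            while True:
                b = src[i]
                i += 1
                match_len += b
                if b != 255:
                    break

        # Byte-by-byte copy so overlapping matches (offset < length) work.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)


# Hand-built block: 3 literals "abc", a 6-byte match at offset 3, then 5 literals.
block = bytes([0x32]) + b"abc" + bytes([0x03, 0x00]) + bytes([0x50]) + b"hello"
print(lz4_block_decompress(block))  # b'abcabcabchello'
```

The match in the example overlaps its own output (offset 3, length 6), which is how LZ4 encodes short repeating patterns.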

Key benefits of LZ4
  • Lightning-Fast Speeds: Extremely fast compression and decompression.

  • Low Latency: Minimal delay in accessing compressed data, enhancing real-time performance.

  • Enhanced Performance: Dedicated hardware acceleration dramatically improves data processing speeds for both compression and decompression.

  • Reduced CPU Load: Offloads compression and decompression tasks from the CPU, allowing it to handle other critical functions.

  • Energy Efficiency: Hardware-based compression and decompression are more power-efficient than software solutions, ideal for energy-sensitive applications.

  • Scalability: Adaptable for a wide range of devices, from low-power IoT devices to high-performance computing systems.

  • Compatibility: Easy to integrate. Compatible with software-based LZ4 compression and decompression.

Applications

The LZ4 compression and decompression hardware IP can be integrated into various applications, providing performance and efficiency improvements:

  • ZRAM/ZSWAP: Replace software-based decompression with hardware-accelerated decompression. Same functionality, more performance per watt.

  • Data Storage Systems: Speed up data compression and decompression, enhancing the performance of storage solutions.

  • Big Data Analytics: Accelerate the processing of large datasets, enabling quicker insights and decision-making.

  • Cloud Services: Improve the efficiency of cloud storage and computing, reducing latency and operating costs.

  • Network Appliances: Increase the throughput of network devices by accelerating data compression and decompression.

  • Consumer Electronics: Enhance user experience in devices such as smartphones, tablets, and multimedia players by speeding up data access.

  • Embedded Systems: Suitable for resource-constrained environments, providing efficient compression and decompression capabilities without taxing the CPU.
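For the ZRAM/ZSWAP case, a rough feel for per-page timing can be had from the 4 KB-block decompression figure quoted under Performance / KPI (4.7 GB/s at 1.5 GHz). The sketch below assumes a 4 KB page means 4096 bytes and that 1 GB/s means 10^9 bytes per second; it counts only the time to stream a block through one engine and ignores pipeline latency, bus transfers, and driver overhead. The helper name `stream_time_ns` is hypothetical.

```python
def stream_time_ns(block_bytes: int, throughput_gb_s: float) -> float:
    """Time to stream one block through an engine, in nanoseconds.

    1 GB/s is 1e9 bytes per second, which is exactly 1 byte per nanosecond,
    so bytes divided by GB/s gives nanoseconds directly.
    """
    if block_bytes < 0 or throughput_gb_s <= 0:
        raise ValueError("block size must be >= 0 and throughput > 0")
    return block_bytes / throughput_gb_s


print(round(stream_time_ns(4 * 1024, 4.7)))   # 871   (one 4 KB page)
print(round(stream_time_ns(64 * 1024, 6.1)))  # 10744 (one 64 KB block)
```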

Integration

Designed with ease of use in mind, our LZ4 compression/decompression IP core operates stand-alone, relieving the host of the demanding task of data compression and decompression. It is integrated on the SoC as a hardware accelerator within the customer's xPU architecture, either on the host CPU or on a GPU, DPU, NPU, or SmartNIC.

It offers a fully synchronous design, ensuring straightforward integration into a variety of systems. With multiple configurations available, users can tailor the IP to their specific bus widths and throughput requirements.

Technical specification

Compression Ratio: Maintains high compression ratios in line with the LZ4 algorithm.

Throughput: Achieves up to 1.5 GB/s for compression and 6.1 GB/s for decompression.

Resource Utilization: Optimized for FPGA/ASIC implementation with efficient resource utilization.

Configurability: Allows customization of compression parameters for diverse applications.
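Because the architecture is modular, a first-order sizing exercise is to ask how many engines a given throughput target would need. The sketch below uses the per-engine peak figures quoted above and assumes ideal linear scaling across parallel engines, which is an assumption rather than a documented property of the IP. The 12 GB/s target and the function name `engines_needed` are hypothetical.

```python
import math

# Per-engine peak figures quoted in the specification (GB/s).
LZ4_COMPRESS_GB_S = 1.5
LZ4_DECOMPRESS_GB_S = 6.1


def engines_needed(target_gb_s: float, per_engine_gb_s: float) -> int:
    """Smallest engine count whose summed peak throughput meets the target.

    Assumes throughput adds up linearly across engines (idealized).
    """
    if target_gb_s <= 0 or per_engine_gb_s <= 0:
        raise ValueError("throughputs must be positive")
    return math.ceil(target_gb_s / per_engine_gb_s)


target = 12.0  # hypothetical system requirement in GB/s
print(engines_needed(target, LZ4_COMPRESS_GB_S))    # 8
print(engines_needed(target, LZ4_DECOMPRESS_GB_S))  # 2
```

Real designs would also need to account for arbitration, block-size mix, and memory bandwidth, so the result is best read as a lower bound.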

Performance / KPI

Feature | Performance
Compression ratio | According to standard specification
History buffer size (block and window size) | Configurable at design time, up to 256 KB
History table size and ways (compression lookup table) | Configurable at design time, up to 65,536 entries distributed across up to 8 memory ways
Single decompression engine throughput | Average 6.1 GB/s for 32/64 KB blocks, 4.7 GB/s for 4 KB blocks (at 1.5 GHz)
Single compression engine throughput | Average 1.5 GB/s for streams up to 128 KB (at 1.5 GHz)
Clock frequency | Up to 2.0 GHz (TSMC N5)
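To illustrate what the history table parameters mean, the sketch below models a set-associative match-candidate table in Python. With the maximum configuration of 65,536 entries over 8 ways, the table has 8,192 sets and needs a 13-bit index. The multiplicative hash, the newest-first replacement policy, and the names `table_geometry` and `HistoryTable` are assumptions made for this example; the source does not state how the IP core hashes or replaces entries.

```python
def table_geometry(entries: int, ways: int) -> tuple[int, int]:
    """Return (sets, index_bits) for a set-associative table."""
    if ways <= 0 or entries % ways:
        raise ValueError("entries must be a positive multiple of ways")
    sets = entries // ways
    if sets & (sets - 1):
        raise ValueError("number of sets must be a power of two")
    return sets, (sets - 1).bit_length()


class HistoryTable:
    """Software model: each set keeps up to `ways` recent positions."""

    def __init__(self, entries: int = 65536, ways: int = 8):
        self.sets, self.index_bits = table_geometry(entries, ways)
        self.ways = ways
        self.table = [[] for _ in range(self.sets)]

    def _index(self, data: bytes, pos: int) -> int:
        # Hash the 4 bytes at `pos` with a 32-bit multiplicative hash
        # (assumed here; the IP's real hash function is not documented).
        seq = int.from_bytes(data[pos:pos + 4], "little")
        return ((seq * 2654435761) & 0xFFFFFFFF) >> (32 - self.index_bits)

    def candidates(self, data: bytes, pos: int) -> list[int]:
        """Earlier positions whose 4-byte prefix hashed to the same set."""
        return list(self.table[self._index(data, pos)])

    def insert(self, data: bytes, pos: int) -> None:
        ways = self.table[self._index(data, pos)]
        ways.insert(0, pos)   # newest first
        del ways[self.ways:]  # evict the oldest beyond the way count


print(table_geometry(65536, 8))  # (8192, 13)

data = b"abcdXXXXabcd"
table = HistoryTable()
table.insert(data, 0)
print(table.candidates(data, 8))  # [0]
```

More ways give the compressor more candidate matches per lookup at the cost of more memory ports and comparisons, which is the usual ratio versus area trade-off behind such a parameter.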

zstd IP core

Standards
  • Compression algorithm: zstd decompression

Architecture
  • Modular architecture that scales to meet customer throughput requirements

  • Flexible integration based on customer architectural requirements

  • Architectural configuration parameters exposed for fine-tuning performance

Deliverables

Performance evaluation license

  • C++ compression model for integration in customer performance simulation model

FPGA evaluation license

  • Encrypted IP delivery (Xilinx)

HDL Source Licenses

  • Synthesizable SystemVerilog RTL (encrypted)

  • Implementation constraints

  • UVM testbench (self-checking)

  • Vectors for testbench and expected results

  • User Documentation

Background

The zstd (Zstandard) compression algorithm is an advanced, lossless data compression technology. It has quickly become a popular choice for a variety of applications due to its high compression ratios, fast compression and decompression speeds, and efficient resource usage. Zstd is designed to offer the flexibility of adjusting compression levels, making it suitable for different performance needs, ranging from real-time applications to large data archival.

Key benefits of zstd
  • High Compression Ratios: Achieves significant reduction in data size, saving storage space and bandwidth.

  • Fast Decompression: Ensures quick access to compressed data, enhancing performance in data-intensive applications.

  • Increased Performance: Offloading decompression tasks to dedicated hardware significantly boosts data processing speeds.

  • Reduced CPU Load: Frees up the CPU to handle other critical tasks, improving overall system efficiency.

  • Energy Efficiency: Hardware decompression is more power-efficient than software-based solutions, ideal for energy-conscious applications.

  • Scalability: Suitable for a wide range of devices, from IoT devices to high-performance computing systems.

  • Compatibility: Easy to integrate. Compatible with software-based zstd compression and decompression.

Applications

The zstd decompression hardware IP can be integrated into various applications, providing performance and efficiency improvements:

  • ZRAM/ZSWAP: Replace software-based decompression with hardware-accelerated decompression. Same functionality, more performance per watt.

  • Data Storage Systems: Enhance the performance of storage solutions by accelerating data retrieval times.

  • Big Data Analytics: Speed up the processing of large datasets, enabling faster insights and decision-making.

  • Cloud Services: Improve the efficiency of cloud storage and computing, reducing latency and operating costs.

  • Network Appliances: Increase the throughput of network devices by accelerating data decompression.

  • Consumer Electronics: Enhance user experience in devices such as smartphones, tablets, and multimedia players by speeding up data access.

  • Embedded Systems: Suitable for resource-constrained environments, providing efficient decompression capabilities without taxing the CPU.

Ease of Use and Integration

Designed with ease of use in mind, our zstd hardware-accelerated decompression IP operates stand-alone, relieving the host CPU of the demanding task of data decompression. It offers a fully synchronous design, ensuring straightforward integration into a variety of systems. With multiple configurations available, users can tailor the IP to their specific bus widths and throughput requirements.

Technical Specification

Throughput: Achieves up to 6.1 GB/s for decompression.

Resource Utilization: Optimized for FPGA/ASIC implementation with efficient resource utilization.

Configurability: Allows customization of accelerator for diverse applications and throughput/area requirements.

Performance / KPI

Feature | Performance
Compression ratio | According to standard specification
History buffer size (block and window size) | Configurable at design time, up to 128 KB. At runtime: any block size up to the design-time configured size
Single decompression engine throughput | Average 6.1 / 6.1 / 4.7 GB/s for a stream of 64 KB / 32 KB / 4 KB blocks (at 1.5 GHz)
Clock frequency | Up to 1.6 GHz (Samsung 4nm / TSMC N5)
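Since the history buffer is bounded at design time, software that produces zstd frames for the accelerator may want to confirm that a frame's declared window fits. The sketch below reads the window size from a Zstandard frame header as laid out in the public format specification and compares it against a 128 KB (assumed to be 131,072 bytes) buffer. Whether and how the IP core performs such a check is not stated in the source; the function names `zstd_window_size` and `fits_history_buffer` are hypothetical, and skippable frames are not handled.

```python
ZSTD_MAGIC = 0xFD2FB528  # stored little-endian: 28 B5 2F FD


def zstd_window_size(frame: bytes) -> int:
    """Return the window size declared in a Zstandard frame header."""
    if len(frame) < 6 or int.from_bytes(frame[:4], "little") != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")
    fhd = frame[4]                   # Frame_Header_Descriptor
    fcs_flag = fhd >> 6              # Frame_Content_Size_flag
    single_segment = (fhd >> 5) & 1  # Single_Segment_flag
    dict_id_flag = fhd & 3           # Dictionary_ID_flag

    if not single_segment:
        # Window_Descriptor: 5-bit exponent, 3-bit mantissa.
        wd = frame[5]
        window_base = 1 << (10 + (wd >> 3))
        return window_base + (window_base >> 3) * (wd & 7)

    # Single-segment frames have no Window_Descriptor; the window equals
    # Frame_Content_Size, which follows the optional Dictionary_ID.
    pos = 5 + (0, 1, 2, 4)[dict_id_flag]
    fcs_bytes = (1, 2, 4, 8)[fcs_flag]
    if len(frame) < pos + fcs_bytes:
        raise ValueError("truncated frame header")
    fcs = int.from_bytes(frame[pos:pos + fcs_bytes], "little")
    return fcs + 256 if fcs_bytes == 2 else fcs


def fits_history_buffer(frame: bytes, history_bytes: int = 128 * 1024) -> bool:
    """True if the frame's declared window fits the configured buffer."""
    return zstd_window_size(frame) <= history_bytes


magic = bytes.fromhex("28b52ffd")
print(zstd_window_size(magic + bytes([0x00, 0x38])))     # 131072
print(fits_history_buffer(magic + bytes([0x00, 0x38])))  # True
print(fits_history_buffer(magic + bytes([0x00, 0x40])))  # False (256 KiB window)
```

On the software side, the window a compressor uses is normally controlled through its window-log setting, so frames can be produced to stay within whatever buffer size the core was configured with.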

Cache MX

The Cache MX compression solution doubles cache capacity with an 80% area and power saving compared with the equivalent SRAM capacity.

SuperRAM

High performance and low latency hardware accelerated compression at unmatched power efficiency.

Ziptilion™ BW

Delivers up to 25% more (LP)DDR bandwidth at nominal frequency and power, enabling a significantly higher-performing and more energy-efficient SoC.

DenseMem

Double the CXL-connected memory capacity with DenseMem.

Flash MX

Extend NVMe storage capacity 2-4x with LZ4 or zstd hardware-accelerated compression.

SphinX

High Performance and Low Latency AES-XTS industry-standard encryption / decryption. Independent non-blocking encryption and decryption channels.