Zstd and LZ4 IP cores
Overview
Zstd and LZ4 compression and decompression, implemented as hardware-accelerated IP cores. Today, costly host cycles are spent on software-based compression: data is compressed before it is sent and decompressed when it is retrieved over the storage interconnect. Offloading this work to these accelerators unlocks up to 50% more bandwidth and 2-4x more storage capacity.
zstd IP core
Standards
Compression algorithm: zstd decompression
Architecture
Modular architecture that scales to meet customer throughput requirements
Flexible integration based on customer architectural requirements
Architectural configuration parameters exposed to fine-tune performance
Deliverables
Performance evaluation license
C++ compression model for integration in customer performance simulation model
FPGA evaluation license
Encrypted IP delivery (Xilinx)
HDL Source Licenses
Synthesizable SystemVerilog RTL (encrypted)
Implementation constraints
UVM testbench (self-checking)
Vectors for testbench and expected results
User Documentation
Background
The zstd (Zstandard) compression algorithm is an advanced, lossless data compression technology. It has quickly become a popular choice for a variety of applications due to its high compression ratios, fast compression and decompression speeds, and efficient resource usage. Zstd is designed to offer the flexibility of adjusting compression levels, making it suitable for different performance needs, ranging from real-time applications to large data archival.
Key benefits of zstd
High Compression Ratios: Achieves significant reduction in data size, saving storage space and bandwidth.
Fast Decompression: Ensures quick access to compressed data, enhancing performance in data-intensive applications.
Increased Performance: Offloading decompression tasks to dedicated hardware significantly boosts data processing speeds.
Reduced CPU Load: Frees up the CPU to handle other critical tasks, improving overall system efficiency.
Energy Efficiency: Hardware decompression is more power-efficient than software-based solutions, ideal for energy-conscious applications.
Scalability: Suitable for a wide range of devices, from IoT devices to high-performance computing systems.
Compatibility: Easy to integrate. Compatible with software-based zstd compression and decompression.
Applications
The zstd decompression hardware IP can be integrated into various applications, providing performance and efficiency improvements:
ZRAM/ZSWAP: Replace software-based decompression with hardware-accelerated decompression. Same functionality, higher performance per watt.
Data Storage Systems: Enhance the performance of storage solutions by accelerating data retrieval times.
Big Data Analytics: Speed up the processing of large datasets, enabling faster insights and decision-making.
Cloud Services: Improve the efficiency of cloud storage and computing, reducing latency and operating costs.
Network Appliances: Increase the throughput of network devices by accelerating data decompression.
Consumer Electronics: Enhance user experience in devices such as smartphones, tablets, and multimedia players by speeding up data access.
Embedded Systems: Suitable for resource-constrained environments, providing efficient decompression capabilities without taxing the CPU.
Ease of Use and Integration
Designed with ease of use in mind, our zstd hardware-accelerated decompression IP operates stand-alone, relieving the host CPU of the demanding task of data decompression. It offers a fully synchronous design, ensuring straightforward integration into a variety of systems. With multiple configurations available, users can tailor the IP to their specific bus widths and throughput requirements.
Technical Specification
Throughput: Achieves up to 6.1 GB/s for decompression with a single engine at 1.5 GHz.
Resource Utilization: Optimized for FPGA/ASIC implementation with efficient resource utilization.
Configurability: Allows customization of accelerator for diverse applications and throughput/area requirements.
Performance / KPI
Feature | Performance
Compression ratio | According to standard specification
History buffer size (block and window size) | Configurable at design time up to 128 KB; at runtime, any block size up to the design-time configured size
Single decompression engine throughput | Avg: 6.1 / 6.1 / 4.7 GB/s for a stream of 64 KB / 32 KB / 4 KB blocks (@1.5 GHz)
Clock frequency | Up to 1.6 GHz (@Samsung 4nm / TSMC N5)
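As a rough illustration of what the throughput figures above imply, the following Python sketch converts them into average bytes per clock cycle and time per block, and estimates how many engines a target rate would need. It assumes 1 GB = 10^9 bytes and ideal linear scaling across engines; neither assumption is stated in the specification, and the function names are hypothetical.

```python
import math

# Figures quoted in the table above (single zstd decompression engine at 1.5 GHz).
CLOCK_HZ = 1.5e9
THROUGHPUT_GBPS = {64 * 1024: 6.1, 32 * 1024: 6.1, 4 * 1024: 4.7}  # block bytes -> GB/s


def bytes_per_cycle(gb_per_s, clock_hz=CLOCK_HZ):
    """Average bytes moved per clock cycle (assumes 1 GB = 1e9 bytes)."""
    return gb_per_s * 1e9 / clock_hz


def block_time_us(block_bytes, gb_per_s):
    """Average time to stream one block at the quoted rate, in microseconds."""
    return block_bytes / (gb_per_s * 1e9) * 1e6


def engines_needed(target_gb_per_s, gb_per_s):
    """Engines required for a target rate, assuming ideal linear scaling."""
    return math.ceil(target_gb_per_s / gb_per_s)


for size, rate in THROUGHPUT_GBPS.items():
    print(f"{size // 1024:>2} KB blocks: {bytes_per_cycle(rate):.2f} B/cycle, "
          f"{block_time_us(size, rate):.2f} us per block")
```

Under these assumptions a 4 KB block, the typical page size in a ZRAM/ZSWAP setting, streams through a single engine in a little under one microsecond on average.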
LZ4 IP core
Standards
Compression algorithm: LZ4 compression and decompression
Architecture
Modular architecture that scales to meet customer throughput requirements
Flexible integration based on customer architectural requirements
Architectural configuration parameters exposed to fine-tune performance
Deliverables
Performance evaluation license
C++ compression model for integration in customer performance simulation model
FPGA evaluation license
Encrypted IP delivery (Xilinx)
HDL Source Licenses
Synthesizable SystemVerilog RTL (encrypted)
Implementation constraints
UVM testbench (self-checking)
Vectors for testbench and expected results
User Documentation
Background
The LZ4 compression algorithm is a fast, lossless data compression technology renowned for its high-speed performance and low latency. LZ4 offers impressive compression and decompression speeds, making it an excellent choice for real-time applications and environments where quick data access is critical. LZ4 maintains a balance between compression ratio and speed, providing an efficient solution for various data types and use cases.
Key benefits of LZ4
Lightning-Fast Speeds: Extremely fast compression and decompression.
Low Latency: Minimal delay in accessing compressed data, enhancing real-time performance.
Enhanced Performance: Dedicated hardware acceleration dramatically improves data processing speeds for both compression and decompression.
Reduced CPU Load: Offloads compression and decompression tasks from the CPU, allowing it to handle other critical functions.
Energy Efficiency: Hardware-based compression and decompression are more power-efficient compared to software solutions, ideal for energy-sensitive applications.
Scalability: Adaptable for a wide range of devices, from low-power IoT devices to high-performance computing systems.
Compatibility: Easy to integrate. Compatible with software-based LZ4 compression and decompression.
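To make the compatibility point concrete, here is a minimal software sketch of decoding a raw LZ4 block, the format that software and hardware implementations exchange. It is an illustrative pure-Python decoder, not the IP core's implementation; the function name and the hand-built test block are hypothetical, and frame headers and checksums are out of scope.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header, no checksums)."""
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        token = src[i]
        i += 1
        # Literal run: length in the high nibble, extended by 255-valued bytes.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                b = src[i]
                i += 1
                lit_len += b
                if b != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len
        if i >= n:
            break  # the final sequence carries literals only
        # Match: 2-byte little-endian offset back into the decoded history.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        match_len = (token & 0x0F) + 4  # minimum match length is 4
        if (token & 0x0F) == 15:
            while True:
                b = src[i]
                i += 1
                match_len += b
                if b != 255:
                    break
        # Copy byte by byte so overlapping matches (offset < length) repeat correctly.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)


# Hand-built block: 3 literals "abc", a 15-byte match at offset 3, then 5 literals.
block = bytes([0x3B]) + b"abc" + bytes([0x03, 0x00]) + bytes([0x50]) + b"12345"
assert lz4_block_decompress(block) == b"abc" * 6 + b"12345"
```

The match copy reads from data that has already been decoded, which is why a decompressor needs a history buffer at least as large as the maximum match offset it accepts.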
Applications
The LZ4 compression and decompression hardware IP can be integrated into various applications, providing performance and efficiency improvements:
ZRAM/ZSWAP: Replace software-based compression and decompression with hardware-accelerated compression and decompression. Same functionality, higher performance per watt.
Data Storage Systems: Speed up data compression and decompression, enhancing the performance of storage solutions.
Big Data Analytics: Accelerate the processing of large datasets, enabling quicker insights and decision-making.
Cloud Services: Improve the efficiency of cloud storage and computing, reducing latency and operating costs.
Network Appliances: Increase the throughput of network devices by accelerating data compression and decompression.
Consumer Electronics: Enhance user experience in devices such as smartphones, tablets, and multimedia players by speeding up data access.
Embedded Systems: Suitable for resource-constrained environments, providing efficient compression and decompression capabilities without taxing the CPU.
Integration
Designed with ease of use in mind, our LZ4 compression / decompression IP core operates stand-alone, relieving the host of the demanding task of data compression and decompression. It is integrated on the SoC as a hardware accelerator according to the customer's xPU architecture, either on the host CPU or on a GPU, DPU, NPU or SmartNIC.
It offers a fully synchronous design, ensuring straightforward integration into a variety of systems. With multiple configurations available, users can tailor the IP to their specific bus widths and throughput requirements.
Technical specification
Compression Ratio: Maintains high compression ratios in line with the LZ4 algorithm.
Throughput: Achieves up to 1.5 GB/s for compression and 6.1 GB/s for decompression.
Resource Utilization: Optimized for FPGA/ASIC implementation with efficient resource utilization.
Configurability: Allows customization of compression parameters for diverse applications.
Performance / KPI
Feature | Performance
Compression ratio | According to standard specification
History buffer size (block and window size) | Configurable at design time up to 256 KB
History table size and ways (compression lookup table) | Configurable at design time up to 65,536 separate entries distributed across up to 8 separate memory ways
Single decompression engine throughput | Avg: 6.1 GB/s for 32 KB and 64 KB blocks, 4.7 GB/s for 4 KB blocks (@1.5 GHz)
Single compression engine throughput | Avg: 1.5 GB/s for streams up to 128 KB (@1.5 GHz)
Clock frequency | Up to 2.0 GHz (@TSMC N5)
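The history table limits above can be turned into a rough geometry estimate. The Python sketch below assumes that entries are split evenly across the ways and that each entry stores only a position in the history buffer (no tag or valid bits). The specification does not state the entry format, so these are assumptions and the function name is hypothetical.

```python
def history_table_geometry(entries=65_536, ways=8, history_bytes=256 * 1024):
    """Estimate lookup-table geometry under the stated assumptions."""
    sets = entries // ways                            # entries per way
    index_bits = (sets - 1).bit_length()              # hash bits selecting a set
    position_bits = (history_bytes - 1).bit_length()  # bits to address the history buffer
    table_bytes = entries * position_bits // 8        # position storage only
    return {
        "sets": sets,
        "index_bits": index_bits,
        "position_bits": position_bits,
        "table_bytes": table_bytes,
    }


geometry = history_table_geometry()
for name, value in geometry.items():
    print(f"{name}: {value}")
```

At the maximum configuration this gives 8,192 sets addressed by a 13-bit hash, 18-bit positions, and 144 KiB of position storage. Smaller design-time settings shrink the table proportionally, which is the area side of the throughput/area trade-off.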
Cache MX
The Cache MX compression solution increases cache capacity by 2x, with an 80% area and power saving compared with the equivalent SRAM capacity.
SuperRAM
High performance and low latency hardware accelerated compression at unmatched power efficiency.
Ziptilion™ BW
Delivers up to 25% more (LP)DDR bandwidth at nominal frequency and power, enabling a significantly higher-performance and more energy-efficient SoC.
DenseMem
Double the CXL-connected memory capacity with DenseMem.
NVMe expansion
Extend NVMe storage capacity 2-4x with LZ4 or zstd hardware-accelerated compression.
SphinX
High Performance and Low Latency AES-XTS industry-standard encryption / decryption. Independent non-blocking encryption and decryption channels.