CubeHash: a simple hash function



CubeHash hardware

CubeHash fits into exceptionally small hardware area for a wide range of real-world throughput targets. CubeHash hardware implementations naturally provide future interoperability, supporting both CubeHash256 and CubeHash512 (and even CubeHash512x) without compromising area. Algorithms published in November 2010 show that the regular structure of CubeHash provides an unprecedented level of flexibility for the hardware designer, squeezing CubeHash into smaller and smaller areas while maintaining high speed.

Important questions to keep in mind when reading a CubeHash hardware implementation report:

  • Does the report say that it takes advantage of the November 2010 CubeHash algorithms?
  • Does the report show the flexibility of CubeHash in being implementable with not just 1 cycle per round but also 2 cycles per round, 4 cycles per round, 8 cycles per round, 16 cycles per round, and 32 cycles per round, with the area dropping accordingly? (Reports using 0.5 cycles per round should be disregarded; 0.5 cycles per round is an academic latency-minimization exercise, not a sensible choice for real-world applications.)
  • Does the report say that it takes advantage of SRAM cells to save area? This is particularly important for CubeHash as the number of cycles per round increases.
  • Does the report say exactly which ASIC library it is using? This is important because areas reported in "gate equivalents" (GE) are normalized to the size of a basic gate in that library (typically a two-input NAND), so GE counts from different libraries do not measure the same physical area. An ASIC using a high-performance 130nm library will be faster than an ASIC using a standard-performance 130nm library, and will have about the same "GE" count, but will actually use considerably more area.
  • Does the report include post-place-and-route figures? The following example illustrates the importance of place and route. A 2009.11.11 comparison report by Tillich et al. included only post-synthesis figures, such as 21229Mbps for Keccak in 56316 GE (UMC 180nm FSA0A_C), and stated "Place & route is expected to have only little impact on the performance figures estimated after synthesis." An updated 2010.08.04 comparison report by the same authors included post-place-and-route figures, such as only 11624Mbps for Keccak in 56713 GE (UMC 180nm FSA0A_C), and quietly removed the erroneous "little impact" statement.
  • Did the authors provide all information necessary to verify the report? Would measurement errors and copy-and-paste errors be caught by third parties? Is the VHDL/Verilog code available online?
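The cycles-per-round tradeoff in the list above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from any of the reports: it assumes the CubeHash16/32 parameters (16 rounds per 32-byte block), which are not stated on this page, and it ignores initialization, finalization, and I/O overhead. The function name and defaults are hypothetical.

```python
def required_clock_mhz(throughput_mbps, cycles_per_round=1,
                       rounds_per_block=16, block_bytes=32):
    """Clock frequency (MHz) needed to sustain a given throughput (Mbps).

    Assumption: rounds_per_block=16 and block_bytes=32 correspond to
    CubeHash16/32; change them for other parameter choices.
    Finalization rounds and I/O overhead are ignored.
    """
    cycles_per_block = rounds_per_block * cycles_per_round
    bits_per_block = 8 * block_bytes
    # Mbps * (cycles per bit) = millions of cycles per second = MHz
    return throughput_mbps * cycles_per_block / bits_per_block


if __name__ == "__main__":
    for cpr in (1, 2, 4, 8, 16, 32):
        mhz = required_clock_mhz(200, cycles_per_round=cpr)
        print(f"{cpr:2d} cycles/round: {mhz:6.1f} MHz for 200 Mbps")
```

Under these assumptions, a 200Mbps target needs only 12.5 MHz at 1 cycle per round and 200 MHz at 16 cycles per round. Clock rates of that order leave room to trade cycles for area, which is the tradeoff the questions above ask a report to explore.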

Even before the November 2010 algorithms, CubeHash was ranked very highly in SHA-3 hardware comparison reports. For example:

  • CubeHash was ranked #1 among second-round SHA-3-256 candidates in a comparison by Guo, Huang, Nazhandali, and Schaumont of the FPGA area (Xilinx Virtex-5 XC5VLX330-2FF1760) required to hash at 200Mbps: 622 slices for CubeHash (at 84.36 mW), 740 slices for SHA-256 (at 73.47 mW), 788 slices for Skein-256 (at 88.65 mW), 930 slices for Hamsi-256 (at 86.53 mW), ..., 4790 slices for SIMD-256 (at 362.12 mW), 5935 slices for BMW-256 (at 321.89 mW).
  • CubeHash was ranked #3 among second-round SHA-3-256 candidates in a comparison by the same authors of the ASIC area (130nm FSC0G_D_SC_TP_2006Q1v2.0) required to hash at 200Mbps: 23484 GEs for Hamsi-256 (at 2.77 mW), 26167 GEs for SHA-256 (at 2.20 mW; listed as a reference point, not a candidate), 29931 GEs for Skein-256 (at 4.41 mW), 34443 GEs for CubeHash (at 3.31 mW), ..., 113202 GEs for SIMD-256 (at 4.56 mW), 149858 GEs for BMW-256 (at 1.11 mW). This was a 1-cycle-per-round CubeHash implementation; a 16-cycle-per-round CubeHash implementation will achieve the same throughput in much smaller area, bringing CubeHash to #1.
  • CubeHash was ranked #3 among second-round SHA-3-256 candidates in a multiple-FPGA comparison of throughput-to-area ratio by Homsirikamol, Rogawski, and Gaj. According to the report, Keccak, Luffa, and CubeHash are the only SHA-3-256 candidates that have been shown to provide consistently better throughput-to-area ratio than SHA-256 on Spartan 3, Virtex 4, Virtex 5, Cyclone II, Cyclone III, Stratix II, and Stratix III.
  • CubeHash was ranked #2 among second-round SHA-3-512 candidates in the same comparison. The only SHA-3-512 candidates providing better overall throughput-to-area ratio than SHA-512 are Keccak (20% better on average) and CubeHash (10% better on average).
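As a quick check of how the ASIC ranking above is read, the sketch below tabulates only the figures quoted on this page for the 200Mbps ASIC comparison and ranks the candidates by area, treating SHA-256 as a baseline rather than a candidate. The entries elided with "..." are not reconstructed; since the list appears to be in ascending order of area, they should not affect the positions of the entries before them. The helper names are hypothetical, and the energy-per-bit figure is simply the quoted power divided by the 200Mbps throughput.

```python
# (name, area in GE, power in mW, is a SHA-3 candidate)
# Figures copied from the 200Mbps ASIC comparison quoted above.
ASIC_200MBPS = [
    ("Hamsi-256", 23484, 2.77, True),
    ("SHA-256", 26167, 2.20, False),  # baseline, not a candidate
    ("Skein-256", 29931, 4.41, True),
    ("CubeHash", 34443, 3.31, True),
    ("SIMD-256", 113202, 4.56, True),
    ("BMW-256", 149858, 1.11, True),
]


def rank_by_area(rows, name):
    """1-based rank of `name` among candidates, smallest area first."""
    candidates = sorted((r for r in rows if r[3]), key=lambda r: r[1])
    return [r[0] for r in candidates].index(name) + 1


def energy_per_bit_pj(power_mw, throughput_mbps=200):
    """Energy per hashed bit in pJ: mW / Mbps gives nJ/bit, times 1000."""
    return power_mw / throughput_mbps * 1000


if __name__ == "__main__":
    print("CubeHash area rank:", rank_by_area(ASIC_200MBPS, "CubeHash"))
    for name, area, power, _ in ASIC_200MBPS:
        print(f"{name:10s} {area:7d} GE  {energy_per_bit_pj(power):6.2f} pJ/bit")
```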

Version

This is version 2010.12.03 of the hardware.html web page.