哈希值,又称:散列函式(或散列算法,又称哈希函式,英语:Hash Function)是一种从任何一种数据中创建小的数字“指纹”的方法。想要了解哈希值,就需要了解哈希函数的性质。

哈希函数的逆向算法

Inverse of a hash function
 
I’ve used Thomas Wang’s integer hash functions for years for various purposes. Using techniques invented by Bob Jenkins for general hashing (e.g., hashes of strings), Wang derived several hash specialized for fixed size integer input. His 64-bit version is
 
uint64_t hash(uint64_t key) {
  key = (~key) + (key << 21); // key = (key << 21) - key - 1;
  key = key ^ (key >> 24);
  key = (key + (key << 3)) + (key << 8); // key * 265
  key = key ^ (key >> 14);
  key = (key + (key << 2)) + (key << 4); // key * 21
  key = key ^ (key >> 28);
  key = key + (key << 31);
  return key;
}
 
Key properties include avalanche (changing any input bit changes about half of the output bits) and invertibility. Recently I wanted to make explicit use of the inverse, in order to verify that zero would never arise as the hash of a given set of inputs. This property would allow me to initialize the hash table in question (which takes up several gigabytes) with zeros and avoid an explicit occupied bit on each entry. Thus, I needed inverse_hash(0).
 
 
Our function is the composition of the functions on each line, so we need to invert each one. Multiplication by 21 and 265 is easy; both numbers are odd, and therefore have multiplicative inverses mod 264. The rest of the lines are invertible because they’re Feistel functions; they break the key into two pieces, leave one piece alone, run the second function through an invertible function that depends on the first. For example, the line key = key ^ (key >> 24) leaves the top 24 bits alone. Once you know the top 24 bits, you can reconstruct the next 24 bits with an xor, and one more round gives the remaining bits. The full inverse is
 
uint64_t inverse_hash(uint64_t key) {
  uint64_t tmp;
 
  // Invert key = key + (key << 31)
  tmp = key-(key<<31);
  key = key-(tmp<<31);
 
  // Invert key = key ^ (key >> 28)
  tmp = key^key>>28;
  key = key^tmp>>28;
 
  // Invert key *= 21
  key *= 14933078535860113213u;
 
  // Invert key = key ^ (key >> 14)
  tmp = key^key>>14;
  tmp = key^tmp>>14;
  tmp = key^tmp>>14;
  key = key^tmp>>14;
 
  // Invert key *= 265
  key *= 15244667743933553977u;
 
  // Invert key = key ^ (key >> 24)
  tmp = key^key>>24;
  key = key^tmp>>24;
 
  // Invert key = (~key) + (key << 21)
  tmp = ~key;
  tmp = ~(key-(tmp<<21));
  tmp = ~(key-(tmp<<21));
  key = ~(key-(tmp<<21));
 
  return key;
}
 
The cleverness of the original hash function is that each invertible step is also extremely fast. The inverse is slower, but only moderately.
 
 
Finally, I did indeed luck out:
inverse_hash(0) = 0x7ffffbffffdfffff
This isn’t a valid pentago board in my packed representation, so zero initialization works. This turned out to be obvious in hindsight: all but the first step in the hash function leaves zero alone, and the last step is complement on the lower 21 bits, which is enough to know that inverse_hash(0) can’t be a valid pentago board. It’s still cool to have the full inverse, though.
 
 
I tried to email Thomas Wang in case he hadn’t had the occasion to write down the inverse explicitly, but unfortunately his HP email bounced. Impermanent email addresses make me sad.
 

哈希世界创造了一个平行世界,只为你们而存在,而你们可以在其中与你们接触。
Copyright © 2022 www.57jianzhi.com 哈希hash论坛 版权所有