说一说java里面的hashcode()—String.hashcode()_JAVA

您所在的位置：程序员俱乐部 > 编程开发 > JAVA > 说一说java里面的hashcode()—String.hashcode()

说一说java里面的hashcode()—String.hashcode()

2012/2/24 9:58:30 核桃博客程序员俱乐部我要评论(0)

摘要：http://www.hetaoblog.com/%E8%AF%B4%E4%B8%80%E8%AF%B4java%E9%87%8C%E9%9D%A2%E7%9A%84hashcode-string-hashcode/前一篇文章说了Object.hashcode(),现在来看下String.hashcode(),因为很多情况下HashMap和HashTable的key是String；下面是jdk里面的代码和注释Returnsahashcodeforthisstring
标签：has Hash Java

http://www.hetaoblog.com/%E8%AF%B4%E4%B8%80%E8%AF%B4java%E9%87%8C%E9%9D%A2%E7%9A%84hashcode-string-hashcode/
前一篇文章说了Object.hashcode(),现在来看下String.hashcode(), 因为很多情况下HashMap和HashTable的key是String；
下面是jdk里面的代码和注释

Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

public int hashCode() {
int h = hash;
int len = count;
if (h == 0 && len > 0) {
int off = offset;
char val[] = value;

for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}

这个意思是说

1. String.Hashcode()的算法是s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1],就是每个字符的ascii值以质数31作为进制计算后得到的结果就是，相当于31进制；

实际的计算过程是从左向右计算，每移一位对之前算的值乘以31

2. 同时，有个成员变量hash保存了计算过的hashcode()，第一次调用String.hashcode() 后，计算出来的hashcode()就保存在hash成员变量中，以后就直接返回了；当然，这里有个特殊的地方是String是常量，就是创建以后值就不再变了， “Strings are constant; their values cannot be changed after they are created.”, 这是jdk里面的说明;所以他的hashcode()也永远不会变化，可以缓存在成员变量里面

3. 既然String.hashcode()的算法已经明确了，那么就可以尝试构造冲突，

假设两个字符串都是有两个字符，那么如果满足 a1*31+b1 = a2*31 +b2这个等式，这两个字符串的hashcode()就一样

也就是(a1-a2)*31=b2-b1 ，如果设a1-a2=1, 那么b2-b1=31就好， ascii表里面满足这个等式的字符串可以找出很多来, 下面是一段测试代码

public void testHash()

{

    System.out.println("A:" + (int)'A');

    System.out.println("B:" + (int)'B');

    System.out.println("a:" + (int)'a');

    System.out.println("Aa".hashCode() + "," + "BB".hashCode());

    System.out.println("Ba".hashCode() + "," + "CB".hashCode());

    System.out.println("Ca".hashCode() + "," + "DB".hashCode());

    System.out.println("Da".hashCode() + "," + "EB".hashCode());

}

打印的结果

A:65

B:66

a:97

2112,2112

2143,2143

2174,2174

2205,2205

可以看到，这些字符串都满足上面的等式，都发生了hashcode()冲突的情况；并且，只要按照上面的方法，可以构造出很多两个字符串的冲突来；

更进一步，假设两个字符串都有4个字符串，按照公式，假设前两个字符串产生hashcode为m, 后两个字符串产生hashcode为n,

那么，4个字符串的hashcode是m*31^2+n, 所以，如果两个字符串组成的hashcode()是一样的，那么4个字符串组成的hashcode也是一样的！

所以，”AaAa”, “BBBB”,”AaBB”,”BBAa”的hashcode都一样，都是2031744，

而”CaDB”,”CaCa”,”DBCa”,”DBDB”的hashcode都一样，是2091388；

并且按照这个方法，可以构造6个字符串，8个字符串等hashcode相同的字符串来；

去年的hash冲突导致dos攻击的漏洞，基本就可以按照这个方法构造错误

酷壳的Hash Collision DoS 问题里面也详细说明了当时的情况，不过陈浩举例的hash函数其实是HashMap的hash函数，

那个其实是对已经产生的hashcode()做二次hash，并不是String.hashcode()的代码，

可能读者看那个代码和后续的说明不能直接看明白为什么Aa和BB的hash值是一样的，也不知道为什么进一步可以得出AaBB和AaAa的hash值也一样的

相信读者看了这篇文章应该会清楚了

当然，陈浩的技术很好,我一直都很喜欢看他的文章

上一篇： JDK 1.5自带的几个小工具下一篇： MongoDB Java Driver 源码分析(7)：com.mongodb.DBAPILayer