static double cosine_similarity(Map<String, Double> v1, Map<String, Double> v2) {
Set<String> both = Sets.newHashSet(v1.keySet());
both.retainAll(v2.keySet());
double sclar = 0, norm1 = 0, norm2 = 0;
for (String k : both) sclar += v1.get(k) * v2.get(k);
for (String k : v1.keySet()) norm1 += v1.get(k) * v1.get(k);
for (String k : v2.keySet()) norm2 += v2.get(k) * v2.get(k);
return sclar / Math.sqrt(norm1 * norm2);
}
http://stackoverflow.com/questions/1997750/cosine-similarity/1997775#1997775
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.htmlhttp://comscience.tistory.com/6
--------------------------------------------------------------------------------------
Have a look at: http://en.wikipedia.org/wiki/Cosine_similarity.
If you have vectors A and B.
The similarity is defined as:
cosine(theta) = A . B / ||A|| ||B||
For a vector A = (a1, a2), ||A|| is defined as sqrt(a1^2 + a2^2)
For vector A = (a1, a2) and B = (b1, b2), A . B is defined as a1 b1 + a2 b2;
So for vector A = (a1, a2) and B = (b1, b2), the cosine similarity is given as:
(a1 b1 + a2 b2) / sqrt(a1^2 + a2^2) sqrt(b1^2 + b2^2)
Example:
A = (1, 0.5), B = (0.5, 1)
cosine(theta) = (0.5 + 0.5) / sqrt(5/4) sqrt(5/4) = 4/5