[infinispan-dev] MurmurHash fixes

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[infinispan-dev] MurmurHash fixes

Manik Surtani
Guys,

The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.

Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.

So here is what I propose:

1) Fix it in 4.2.x as MurmurHash2A
2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g., <hash function="MurmurHash2">)

This will even give us the ability to use MurmurHash3 in 5.0 when we have it.

WDYT?

Cheers
Manik
--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

david marion

Manik,

 

  Being new here, I'm not totally sure of JBoss' stance on including other libraries. Hadoop has a Murmur2 hash implementation that I have been using at work for a while with no problems. Just a thought.

 

-- Dave Marion


 
> From: [hidden email]

> Date: Thu, 13 Jan 2011 19:35:44 +0000
> To: [hidden email]
> Subject: [infinispan-dev] MurmurHash fixes
>
> Guys,
>
> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32. It means the distribution result isn't as good as it could be.
>
> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>
> So here is what I propose:
>
> 1) Fix it in 4.2.x as MurmurHash2A
> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2. (e.g., <hash function="MurmurHash2">)
>
> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>
> WDYT?
>
> Cheers
> Manik
> --
> Manik Surtani
> [hidden email]
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Bela Ban
In reply to this post by Manik Surtani
I would fix the existing MurmurHash; IMO backward compat doesn't extend
to consistent hashing...

Does leaving the 1 bit out create extreme anomalies wrt hashing
distribution ? I wouldn't think so...

I'm fine with the other solution, too.



On 1/13/11 8:35 PM, Manik Surtani wrote:

> Guys,
>
> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.
>
> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>
> So here is what I propose:
>
> 1) Fix it in 4.2.x as MurmurHash2A
> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g.,<hash function="MurmurHash2">)
>
> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>
> WDYT?


--
Bela Ban
Lead JGroups / Clustering Team
JBoss
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Bela Ban
In reply to this post by david marion
I'd favor not including it, unless it's a standalone small JAR. You guys
know my stance: we're already including way too much in Infinispan,
let's not make it worse...

Anyway, David, have you checked distribution created with Hadoop's
murmur hash impl ? It is more or less even ? What cluster sizes did you
test on ?

In any case, let's run RadarGun with Hadoop's murmur hash a few times,
and see what the distribution (and, I hope, consequently) the
performance is...

Of couse, this assumes uneven distribution leads to bad performance...


On 1/14/11 4:20 AM, david marion wrote:
>
> Manik,
>
>    Being new here, I'm not totally sure of JBoss' stance on including other libraries. Hadoop has a Murmur2 hash implementation that I have been using at work for a while with no problems. Just a thought.


--
Bela Ban
Lead JGroups / Clustering Team
JBoss
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Emmanuel Bernard
In reply to this post by Bela Ban
Or release a 4.3.

On 14 janv. 2011, at 09:32, Bela Ban wrote:

> I would fix the existing MurmurHash; IMO backward compat doesn't extend
> to consistent hashing...
>
> Does leaving the 1 bit out create extreme anomalies wrt hashing
> distribution ? I wouldn't think so...
>
> I'm fine with the other solution, too.
>
>
>
> On 1/13/11 8:35 PM, Manik Surtani wrote:
>> Guys,
>>
>> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.
>>
>> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>>
>> So here is what I propose:
>>
>> 1) Fix it in 4.2.x as MurmurHash2A
>> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g.,<hash function="MurmurHash2">)
>>
>> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>>
>> WDYT?
>
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Sanne Grinovero
I'm wondering why you're considering backwards compatibility of the
hashing distribution. Is Infinispan supposed to support multiple
different versions to be connected in a cluster, providing some form
of rolling upgrade?

Cheers,
Sanne

2011/1/14 Emmanuel Bernard <[hidden email]>:

> Or release a 4.3.
>
> On 14 janv. 2011, at 09:32, Bela Ban wrote:
>
>> I would fix the existing MurmurHash; IMO backward compat doesn't extend
>> to consistent hashing...
>>
>> Does leaving the 1 bit out create extreme anomalies wrt hashing
>> distribution ? I wouldn't think so...
>>
>> I'm fine with the other solution, too.
>>
>>
>>
>> On 1/13/11 8:35 PM, Manik Surtani wrote:
>>> Guys,
>>>
>>> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.
>>>
>>> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>>>
>>> So here is what I propose:
>>>
>>> 1) Fix it in 4.2.x as MurmurHash2A
>>> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g.,<hash function="MurmurHash2">)
>>>
>>> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>>>
>>> WDYT?
>>
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss
>> _______________________________________________
>> infinispan-dev mailing list
>> [hidden email]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Manik Surtani
We do support rolling upgrade.  And further, we also support the following scenario:

1) Run a cluster, using a cache store that is *not* shared.  Entries get hashed to specific nodes.
2) Shut down the cluster, upgrade Infinispan.
3) Should still be able to locate your entries, as they would be loaded from the cache stores.

If the hash function changes between upgrades, then requests for keys would now be mapped to different nodes, and the keys won't be found in those nodes' cache stores.


On 14 Jan 2011, at 09:45, Sanne Grinovero wrote:

> I'm wondering why you're considering backwards compatibility of the
> hashing distribution. Is Infinispan supposed to support multiple
> different versions to be connected in a cluster, providing some form
> of rolling upgrade?
>
> Cheers,
> Sanne
>
> 2011/1/14 Emmanuel Bernard <[hidden email]>:
>> Or release a 4.3.
>>
>> On 14 janv. 2011, at 09:32, Bela Ban wrote:
>>
>>> I would fix the existing MurmurHash; IMO backward compat doesn't extend
>>> to consistent hashing...
>>>
>>> Does leaving the 1 bit out create extreme anomalies wrt hashing
>>> distribution ? I wouldn't think so...
>>>
>>> I'm fine with the other solution, too.
>>>
>>>
>>>
>>> On 1/13/11 8:35 PM, Manik Surtani wrote:
>>>> Guys,
>>>>
>>>> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.
>>>>
>>>> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>>>>
>>>> So here is what I propose:
>>>>
>>>> 1) Fix it in 4.2.x as MurmurHash2A
>>>> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g.,<hash function="MurmurHash2">)
>>>>
>>>> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>>>>
>>>> WDYT?
>>>
>>>
>>> --
>>> Bela Ban
>>> Lead JGroups / Clustering Team
>>> JBoss
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [hidden email]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> [hidden email]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Manik Surtani
In reply to this post by david marion
<base href="x-msg://14/">I don't have a problem with including libraries in principle, but it really needs to be worth it.  In this case, there is no point dragging in a whole bunch of Hadoop stuff for a single class implementation of a hash function.  :-)

On 14 Jan 2011, at 03:20, david marion wrote:

Manik,
 
  Being new here, I'm not totally sure of JBoss' stance on including other libraries. Hadoop has a Murmur2 hash implementation that I have been using at work for a while with no problems. Just a thought.
 
-- Dave Marion

 

> From: [hidden email]
> Date: Thu, 13 Jan 2011 19:35:44 +0000
> To: [hidden email]
> Subject: [infinispan-dev] MurmurHash fixes
> 
> Guys,
> 
> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32. It means the distribution result isn't as good as it could be.
> 
> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
> 
> So here is what I propose:
> 
> 1) Fix it in 4.2.x as MurmurHash2A
> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2. (e.g., <hash function="MurmurHash2">)
> 
> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
> 
> WDYT?
> 
> Cheers
> Manik
> --
> Manik Surtani
> [hidden email]
> twitter.com/maniksurtani
> 
> Lead, Infinispan
> http://www.infinispan.org
> 
> 
> 
> 
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev



_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Manik Surtani
In reply to this post by Emmanuel Bernard
Well, even with a 4.3 we'd still need to provide some form of backward compat.

The last, and most expensive, approach is to use the fixed function, and if a request doesn't find anything, retry with the old function (!)

On 14 Jan 2011, at 08:36, Emmanuel Bernard wrote:

> Or release a 4.3.
>
> On 14 janv. 2011, at 09:32, Bela Ban wrote:
>
>> I would fix the existing MurmurHash; IMO backward compat doesn't extend
>> to consistent hashing...
>>
>> Does leaving the 1 bit out create extreme anomalies wrt hashing
>> distribution ? I wouldn't think so...
>>
>> I'm fine with the other solution, too.
>>
>>
>>
>> On 1/13/11 8:35 PM, Manik Surtani wrote:
>>> Guys,
>>>
>>> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.
>>>
>>> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>>>
>>> So here is what I propose:
>>>
>>> 1) Fix it in 4.2.x as MurmurHash2A
>>> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g.,<hash function="MurmurHash2">)
>>>
>>> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>>>
>>> WDYT?
>>
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss
>> _______________________________________________
>> infinispan-dev mailing list
>> [hidden email]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Sanne Grinovero
In reply to this post by Manik Surtani
Very nice.
wouldn't it make sense to be able to run Infinispan in a "migration
mode", to cover all possible future cases in which a hash function is
upgraded, or for those cases people want to replace their function?
We could provide a feature to use a secondary function to try fetching
what was not found using the primary function, and when this triggers
it should also take care of moving data around the stores.
You could see this operation as an extension to the rehashing process,
so even if it has a performance hit it could reuse some code and
benefit of all optimizations you do during rehashing.

Thinking it as a rehashing operation, means we could potentially offer
means to change the hash algorithm on the fly; quite nice to
experiment on big datasets with different implementations.

Sanne

2011/1/14 Manik Surtani <[hidden email]>:

> We do support rolling upgrade.  And further, we also support the following scenario:
>
> 1) Run a cluster, using a cache store that is *not* shared.  Entries get hashed to specific nodes.
> 2) Shut down the cluster, upgrade Infinispan.
> 3) Should still be able to locate your entries, as they would be loaded from the cache stores.
>
> If the hash function changes between upgrades, then requests for keys would now be mapped to different nodes, and the keys won't be found in those nodes' cache stores.
>
>
> On 14 Jan 2011, at 09:45, Sanne Grinovero wrote:
>
>> I'm wondering why you're considering backwards compatibility of the
>> hashing distribution. Is Infinispan supposed to support multiple
>> different versions to be connected in a cluster, providing some form
>> of rolling upgrade?
>>
>> Cheers,
>> Sanne
>>
>> 2011/1/14 Emmanuel Bernard <[hidden email]>:
>>> Or release a 4.3.
>>>
>>> On 14 janv. 2011, at 09:32, Bela Ban wrote:
>>>
>>>> I would fix the existing MurmurHash; IMO backward compat doesn't extend
>>>> to consistent hashing...
>>>>
>>>> Does leaving the 1 bit out create extreme anomalies wrt hashing
>>>> distribution ? I wouldn't think so...
>>>>
>>>> I'm fine with the other solution, too.
>>>>
>>>>
>>>>
>>>> On 1/13/11 8:35 PM, Manik Surtani wrote:
>>>>> Guys,
>>>>>
>>>>> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.
>>>>>
>>>>> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>>>>>
>>>>> So here is what I propose:
>>>>>
>>>>> 1) Fix it in 4.2.x as MurmurHash2A
>>>>> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g.,<hash function="MurmurHash2">)
>>>>>
>>>>> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>>>>>
>>>>> WDYT?
>>>>
>>>>
>>>> --
>>>> Bela Ban
>>>> Lead JGroups / Clustering Team
>>>> JBoss
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> [hidden email]
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [hidden email]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> [hidden email]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Manik Surtani
> [hidden email]
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Manik Surtani

On 14 Jan 2011, at 11:53, Sanne Grinovero wrote:

>
> Thinking it as a rehashing operation, means we could potentially offer
> means to change the hash algorithm on the fly; quite nice to
> experiment on big datasets with different implementations.

+1.  Although no way this can be enabled by default.  :)

--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Mircea Markus
In reply to this post by Manik Surtani

On 13 Jan 2011, at 19:35, Manik Surtani wrote:

> Guys,
>
> The MurmurHash2 impl we have in 4.2.0 is buggy in that my translation from the original C source was faulty and it effectively hashes over just 31 bits instead of 32.  It means the distribution result isn't as good as it could be.
>
> Now it isn't that easy for me to just *fix* this in 4.2.1, since it means keys mapped to nodes using 4.2.0 may not map to the same node in 4.2.1.
>
> So here is what I propose:
>
> 1) Fix it in 4.2.x as MurmurHash2A
> 2) Use MurmurHash2A by default, *unless* a config flag is provided that forces the use of MurmurHash2.  (e.g., <hash function="MurmurHash2">)
>
> This will even give us the ability to use MurmurHash3 in 5.0 when we have it.
>
> WDYT?
+1. An production upgrade should not be affected by this change.

>
> Cheers
> Manik
> --
> Manik Surtani
> [hidden email]
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [infinispan-dev] MurmurHash fixes

Mircea Markus
In reply to this post by Bela Ban

On 14 Jan 2011, at 08:32, Bela Ban wrote:

> I would fix the existing MurmurHash; IMO backward compat doesn't extend
> to consistent hashing...
-1. Not doing so might "break" an application upgrading from 4.2.0 to 4.2.1 if e.g. cache loaders are used.  
>
> Does leaving the 1 bit out create extreme anomalies wrt hashing
> distribution ? I wouldn't think so...
>
> I'm fine with the other solution, too.
>
>
>


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Loading...