[infinispan-dev] Eventual consistency

[infinispan-dev] Eventual consistency

Manik Surtani
As consistency models go, Infinispan is primarily strongly consistent (with 2-phase commit between data owners), with the exception of during a rehash, where a temporary degree of eventual consistency (the inability to get a valid response to a remote GET) forces us to wait for more responses, a quorum if you like.  Not dissimilar to Paxos [1] in some ways.

I'm wondering whether, for the sake of performance, we should also offer a fully eventually consistent model?  What I am thinking is that changes *always* occur only on the primary data owner.  Single phase, no additional round trips, etc.  The primary owner then asynchronously propagates changes to the other data owners.  This would mean things run much faster in a stable cluster, and durability is maintained.  However, during rehashes, when keys are moved, the notion of the primary owner may change.  So to deal with this, we could use vector clocks [2] to version each entry.  Vector clocks allow us to "merge" state nicely in most cases, and in the case of reads, we'd flip back to a Paxos-style quorum during a rehash to get the most "correct" version.
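To illustrate, here is a minimal sketch of the kind of per-entry vector clock this implies. The class and API below are hypothetical, not existing Infinispan code: each version maps node addresses to logical counters, and two versions are either ordered, equal or concurrent, the last case being the one that needs merging.

    import java.util.HashMap;
    import java.util.Map;

    /** Hypothetical per-entry version for the proposed eventually consistent mode. */
    public final class VectorClock {
        public enum Ordering { BEFORE, AFTER, EQUAL, CONCURRENT }

        private final Map<String, Long> counters = new HashMap<>();

        /** Record a write performed by the given node (the primary owner). */
        public void increment(String nodeAddress) {
            counters.merge(nodeAddress, 1L, Long::sum);
        }

        /** Compare two versions; CONCURRENT means neither causally precedes the other. */
        public Ordering compareTo(VectorClock other) {
            boolean thisBigger = false, otherBigger = false;
            Map<String, Long> union = new HashMap<>(counters);
            other.counters.forEach((node, c) -> union.merge(node, c, Math::max));
            for (String node : union.keySet()) {
                long a = counters.getOrDefault(node, 0L);
                long b = other.counters.getOrDefault(node, 0L);
                if (a > b) thisBigger = true;
                if (b > a) otherBigger = true;
            }
            if (thisBigger && otherBigger) return Ordering.CONCURRENT;
            if (thisBigger) return Ordering.AFTER;
            if (otherBigger) return Ordering.BEFORE;
            return Ordering.EQUAL;
        }

        /** Pointwise maximum: the merged clock dominates both inputs. */
        public VectorClock merge(VectorClock other) {
            VectorClock merged = new VectorClock();
            merged.counters.putAll(counters);
            other.counters.forEach((node, c) -> merged.counters.merge(node, c, Math::max));
            return merged;
        }
    }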

In terms of implementation, almost all of this would only affect the DistributionInterceptor and the DistributionManager, so we could easily have eventually consistent flavours of these two components.  
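To sketch the shape such a flavour might take (all placeholder types, reusing the hypothetical VectorClock above; this is not the real DistributionInterceptor or RpcManager API): the primary applies the write locally in a single phase, bumps the entry's version, and replicates to the backups without waiting for acks.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    interface Address {}
    interface Rpc {
        void invokeSync(Address target, Object command);   // blocks for one round trip
        void invokeAsync(Address target, Object command);  // fire-and-forget
    }
    interface Topology {
        Address primaryOwner(String key);
        List<Address> backupOwners(String key);
    }
    record PutCommand(String key, Object value) {}
    record ReplicateCommand(String key, Object value, VectorClock version) {}

    final class EventuallyConsistentWriter {
        private final Address self;
        private final Rpc rpc;
        private final Topology topology;
        private final Map<String, Object> store = new ConcurrentHashMap<>();
        private final Map<String, VectorClock> versions = new ConcurrentHashMap<>();

        EventuallyConsistentWriter(Address self, Rpc rpc, Topology topology) {
            this.self = self; this.rpc = rpc; this.topology = topology;
        }

        void put(String key, Object value) {
            Address primary = topology.primaryOwner(key);
            if (!self.equals(primary)) {
                // Not the primary: a single synchronous hop, no 2PC.
                rpc.invokeSync(primary, new PutCommand(key, value));
                return;
            }
            // Primary owner: apply locally and bump this entry's vector clock.
            store.put(key, value);
            VectorClock version = versions.computeIfAbsent(key, k -> new VectorClock());
            version.increment(self.toString());

            // Propagate to the backup owners asynchronously. No ack is awaited,
            // which is exactly where a crash of the primary can lose the update.
            for (Address backup : topology.backupOwners(key)) {
                rpc.invokeAsync(backup, new ReplicateCommand(key, value, version));
            }
        }
    }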

Thoughts?

Cheers
Manik

[1] http://en.wikipedia.org/wiki/Paxos_algorithm
[2] http://en.wikipedia.org/wiki/Vector_clock



Re: [infinispan-dev] Eventual consistency

Erik Salter

+1 – but I thought the “eagerLockSingleNode” option behaved in this manner already?

Erik

Re: [infinispan-dev] Eventual consistency

Sanne Grinovero
In reply to this post by Manik Surtani
Hi Manik,
can you explain the first point: why is it that during a rehash you're unable to get an answer to a GET?
Say node A has installed the new view T'' and receives a GET request from node B, which still has the outdated view T'. A is no longer the owner under T'' and has just transferred the requested value to a node C. A definitely knows how to handle the request by forwarding it to C: A was the owner before (otherwise it wouldn't have received the request), and since it no longer is, it must be aware of the new hash configuration. A stays in the middle of the conversation and then sends the requested value back to B, along with enough information about the new view to avoid further misdirected requests.

About your proposal: what would happen if the primary owner crashes before it has asynchronously written the changes to a secondary node?

Cheers,
Sanne


Re: [infinispan-dev] Eventual consistency

Bela Ban
In reply to this post by Manik Surtani
The thing is that a causal history does not lead to automatic merges all the time; Dynamo, for instance, leaves it up to the app developer to resolve merge conflicts by comparing vector clocks.

+1 for experimenting with an eventual consistency model in Infinispan


--
Bela Ban
Lead JGroups / Clustering Team
JBoss

Re: [infinispan-dev] Eventual consistency

Manik Surtani-2
No, not all the time. To some degree, vector clock based versions can be merged, but as you say there are circumstances where manual intervention may be necessary.
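To make that concrete with the hypothetical VectorClock sketched earlier: two concurrent writes through different primaries (say, on either side of a view change) produce versions where neither dominates, so the clocks merge mechanically but the winning value cannot be chosen automatically.

    // Two replicas diverge: each bumps its own counter for the same entry.
    VectorClock left = new VectorClock();
    VectorClock right = new VectorClock();
    left.increment("nodeA");   // write applied on nodeA
    right.increment("nodeB");  // concurrent write applied on nodeB

    // Neither clock dominates, so this is CONCURRENT: merging the clocks is
    // mechanical, but picking the winning *value* needs application logic,
    // as with Dynamo's sibling resolution.
    assert left.compareTo(right) == VectorClock.Ordering.CONCURRENT;
    VectorClock merged = left.merge(right);  // dominates both inputs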

Sent from my mobile phone


Re: [infinispan-dev] Eventual consistency

Manik Surtani-2
In reply to this post by Sanne Grinovero
A GET may produce incorrect results if a node not involved in a rehash, Node X, asks for a key. E.g., it may ask Node C for the entry, since Node C is a new owner. However, Node C may not have finished applying the state it received, so it would return null. In normal circumstances this would be considered a valid response, but if X is aware that a rehash is going on, it waits for more responses (from A and B).

As for node failure before an async RPC completes, this would result in data loss.

Sent from my mobile phone


Re: [infinispan-dev] Eventual consistency

Manik Surtani-2
In reply to this post by Erik Salter
Close, but not quite. EagerLockSingleNode applies just to locking. The actual replication of entries, which happens when a transaction commits, still goes to all data owners.

Sent from my mobile phone


Re: [infinispan-dev] Eventual consistency

Sanne Grinovero
In reply to this post by Manik Surtani-2
I'd say that because C is aware that it's performing a rehash, and aware that it doesn't yet know the correct value for the key being requested, it should not return null but wait until it knows the proper answer and return that; ideally it could give some hints to the ongoing state transfer to prioritize this specific key, as there is immediate demand for it.
In any case, C is going to receive this value soon anyway, as it's now an owner of it, so it doesn't look to me like this approach would add network traffic.

Otherwise, how could a client requesting a key know whether the returned null is genuine? Should I deal with it in the application?


Re: [infinispan-dev] Eventual consistency

Manik Surtani-2
The way it works right now: if a rehash is in progress, C returns an UnsureResponse, which prompts X to wait for further responses from A or B and use one of those instead.  X sends the GET out to all of A, B and C in parallel anyway.
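A rough sketch of that read path, reusing the placeholder types from earlier (only UnsureResponse corresponds to a real Infinispan class; the rest is invented for illustration): X broadcasts the GET, skips unsure answers, and settles on the first definitive one.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;

    interface Response { Object value(); }
    final class UnsureResponse implements Response {  // stand-in for Infinispan's UnsureResponse
        public Object value() { throw new IllegalStateException("unsure"); }
    }
    record GetCommand(String key) {}
    interface AsyncRpc { Future<Response> invokeAsync(Address target, Object command); }
    interface ReadTopology { List<Address> owners(String key); }

    final class RehashAwareReader {
        private final AsyncRpc rpc;
        private final ReadTopology topology;

        RehashAwareReader(AsyncRpc rpc, ReadTopology topology) {
            this.rpc = rpc; this.topology = topology;
        }

        Object remoteGet(String key) throws InterruptedException, ExecutionException {
            // X fires the GET at all owners (e.g. A, B and C) in parallel.
            List<Future<Response>> futures = new ArrayList<>();
            for (Address owner : topology.owners(key)) {
                futures.add(rpc.invokeAsync(owner, new GetCommand(key)));
            }
            // An UnsureResponse (e.g. from C, still applying rehashed state) is
            // skipped; we keep waiting until some owner answers definitively.
            for (Future<Response> f : futures) {
                Response r = f.get();
                if (r instanceof UnsureResponse) continue;
                return r.value();
            }
            throw new IllegalStateException("no sure response for key " + key);
        }
    }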


--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org





Re: [infinispan-dev] Eventual consistency

Sanne Grinovero
Aha, thanks for all the explanations.
Sanne


Re: [infinispan-dev] Eventual consistency

Mircea Markus
In reply to this post by Manik Surtani

On 2 Mar 2011, at 17:43, Manik Surtani wrote:

> As consistency models go, Infinispan is primarily strongly consistent (with 2-phase commit between data owners), with the exception of during a rehash [...]

We are strongly consistent even during the rehash: we make sure that the user receives the latest piece of data or no data at all (TimeoutException).

> I'm wondering whether, for the sake of performance, we should also offer a fully eventually consistent model? What I am thinking is that changes *always* occur only on the primary data owner. [...]

Don't we already do that with "eagerLockSingleNode"?

> Thoughts?

+1

Re: [infinispan-dev] Eventual consistency

Mircea Markus
In reply to this post by Manik Surtani-2
I see.
IMO we should first offer a complete (i.e. even during rehash) strongly consistent approach.
