[infinispan-dev] REPL async semantics in the context of Hibernate 2LC

Galder Zamarreño
Hi all,

Forgive me if we've discussed this before (I vaguely remember...), but the current async semantics always throw me off a bit. Let me explain:

I've been working on and off on a Hibernate 2LC tutorial that demonstrates how to run 2LC in embedded, WildFly and Spring setups and, for each of them, explains how it all works in local vs clustered mode.

One of the sections involves working with queries: updating an entity that's part of the query, and seeing how that query gets re-executed from the db. When an entity is updated, that entity's update timestamp gets updated in a cache, which in a clustered environment is configured with repl async.

If you have two nodes, A and B, the expectation was that after updating the entity on node A you'd want to wait a tiny bit before running the query on node B, so that the timestamp update has time to propagate to node B.

However, the recent async semantics work in such a way that if you update the entity on node A and then want to execute the query on node A, you still might want to add a little delay...

The reason is that the logic changes depending on whether the owner of the entity type key in the update timestamp cache is node A or node B. If the owner is node A, the cache is updated directly by the main thread, so you can execute a query on node A immediately after the update and it'll be fine.

However, if the owner is node B, even if the update was done in node A, node A will only be updated asynchronously. So, if after calling an update on node A, you do a query on node A, in this scenario you'd get outdated results for a small period of time. [1]
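The two cases above can be sketched with a small toy model. This is not Infinispan code: the `AsyncReplModel` class, its `put` and `deliverAll` methods, and the idea that the remote owner applies the write and then replicates it back to the originator are all assumptions made for illustration. Pending replication is modelled as a queue that is drained explicitly, so the stale window is deterministic.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of the behaviour described above; hypothetical, not Infinispan internals. */
final class AsyncReplModel {
    /** One node's local copy of the update timestamps. */
    static final class Node {
        final String name;
        final Map<String, Long> data = new HashMap<>();
        Node(String name) { this.name = name; }
    }

    final Node a = new Node("A");
    final Node b = new Node("B");

    // Messages that have been sent but not yet applied on the receiving node.
    private final Queue<Runnable> inFlight = new ArrayDeque<>();

    /** A write issued on 'origin' for a key whose owner is 'owner'. */
    void put(Node origin, Node owner, String key, long value) {
        if (origin == owner) {
            // Owner is local: the calling thread applies the write directly,
            // and the other node is updated asynchronously.
            owner.data.put(key, value);
            Node other = (owner == a) ? b : a;
            inFlight.add(() -> other.data.put(key, value));
        } else {
            // Owner is remote: the write is only sent to the owner. The origin's
            // own copy is updated later (assumed here: when the owner replicates back).
            inFlight.add(() -> {
                owner.data.put(key, value);
                inFlight.add(() -> origin.data.put(key, value));
            });
        }
    }

    /** Applies every pending message, including ones queued while draining. */
    void deliverAll() {
        Runnable message;
        while ((message = inFlight.poll()) != null) {
            message.run();
        }
    }

    public static void main(String[] args) {
        AsyncReplModel ownerLocal = new AsyncReplModel();
        ownerLocal.put(ownerLocal.a, ownerLocal.a, "Person", 1L);
        System.out.println("owner=A, read on A right away: " + ownerLocal.a.data.get("Person"));

        AsyncReplModel ownerRemote = new AsyncReplModel();
        ownerRemote.put(ownerRemote.a, ownerRemote.b, "Person", 1L);
        System.out.println("owner=B, read on A right away: " + ownerRemote.a.data.get("Person"));
        ownerRemote.deliverAll();
        System.out.println("owner=B, read on A after delivery: " + ownerRemote.a.data.get("Person"));
    }
}
```

In this model the read on A directly after the write only sees the new timestamp when A owns the key; otherwise it sees nothing until the pending messages are delivered, which corresponds to the short window of outdated query results.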

So, my question here is: can we do anything to make this more predictable from a user's perspective? Or is it just not worth doing? Or is it just a side effect that we must be aware of?

Cheers,

[1] https://gist.github.com/galderz/676f689884969658b01a7695f08dd7a2
--
Galder Zamarreño
Infinispan, Red Hat


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] REPL async semantics in the context of Hibernate 2LC

Radim Vansa
Hi Galder,

I think that this was changed in Infinispan version 5.3 or so :) The
reason for this is that updates, even in an async cache, are applied in
the same order on all owners. If you updated local node A first to X and
then asynchronously updated the other node B, there could be a
concurrent update to Y on node B, and the cluster would likely end up
with A having Y and B having X, without anything eventually resolving
this. Some locking has to be involved, too, and the algorithm in 5.3
actually did not allow the values to diverge, but caused a deadlock.
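The divergence scenario can be worked through in a few lines. The sketch below is a toy model, not Infinispan code: `UpdateOrderModel`, `finalValue` and `converged` are hypothetical names, and each node is reduced to the list of updates in the order it applied them, with the last one winning.

```java
import java.util.Arrays;
import java.util.List;

/** Toy last-write-wins model of two nodes applying the same updates; hypothetical. */
final class UpdateOrderModel {
    /** A node's final value is simply the last update it applied. */
    static String finalValue(List<String> updatesInAppliedOrder) {
        String value = null;
        for (String update : updatesInAppliedOrder) {
            value = update;
        }
        return value;
    }

    /** True if both nodes end up holding the same (non-null) value. */
    static boolean converged(List<String> appliedOnA, List<String> appliedOnB) {
        String onA = finalValue(appliedOnA);
        return onA != null && onA.equals(finalValue(appliedOnB));
    }

    public static void main(String[] args) {
        // Apply locally first, replicate later: each node sees its own write
        // first and the other node's write second, so the messages cross.
        List<String> localFirstOnA = Arrays.asList("X", "Y");
        List<String> localFirstOnB = Arrays.asList("Y", "X");
        System.out.println("local-first: A=" + finalValue(localFirstOnA)
                + ", B=" + finalValue(localFirstOnB)
                + ", converged=" + converged(localFirstOnA, localFirstOnB));

        // Same order on all owners: one order (here X then Y) is applied everywhere.
        List<String> sameOrder = Arrays.asList("X", "Y");
        System.out.println("same order:  A=" + finalValue(sameOrder)
                + ", B=" + finalValue(sameOrder)
                + ", converged=" + converged(sameOrder, sameOrder));
    }
}
```

With local-first application node A ends on Y and node B on X, and nothing in the model ever reconciles them; applying one agreed order on both nodes leaves them equal.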

In 2LC, this can be eliminated in some cases, though - e.g. if we do
putIfAbsents with the same value, it's safe to apply the value locally
and send the update asynchronously to the other node. For removals, it's
safe, too. Therefore, I have recently replaced the distribution & locking
interceptors with 'optimized' versions [1][2].
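Why these operations tolerate reordering can be checked with plain `java.util.Map` calls. This is only a sketch under stated assumptions: it does not use the linked interceptors, `OrderInsensitiveOps` and its methods are hypothetical names, and the key and values are made up. It replays the same two operations in both orders and compares the resulting maps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Checks which operations give the same result regardless of arrival order; hypothetical. */
final class OrderInsensitiveOps {
    /** Applies the operations, in the given order, to a copy of the initial state. */
    static Map<String, String> replay(Map<String, String> initial,
                                      List<Consumer<Map<String, String>>> ops) {
        Map<String, String> copy = new HashMap<>(initial);
        for (Consumer<Map<String, String>> op : ops) {
            op.accept(copy);
        }
        return copy;
    }

    /** True if applying the two operations in either order yields the same map. */
    static boolean orderIndependent(Map<String, String> initial,
                                    Consumer<Map<String, String>> fromA,
                                    Consumer<Map<String, String>> fromB) {
        return replay(initial, Arrays.asList(fromA, fromB))
                .equals(replay(initial, Arrays.asList(fromB, fromA)));
    }

    static boolean sameValuePutIfAbsentConverges() {
        return orderIndependent(new HashMap<>(),
                m -> m.putIfAbsent("entity#1", "row-v1"),
                m -> m.putIfAbsent("entity#1", "row-v1"));
    }

    static boolean differentValuePutIfAbsentConverges() {
        return orderIndependent(new HashMap<>(),
                m -> m.putIfAbsent("entity#1", "row-v1"),
                m -> m.putIfAbsent("entity#1", "row-v2"));
    }

    static boolean removalsConverge() {
        Map<String, String> initial = new HashMap<>();
        initial.put("entity#1", "row-v1");
        return orderIndependent(initial,
                m -> m.remove("entity#1"),
                m -> m.remove("entity#1"));
    }

    public static void main(String[] args) {
        System.out.println("putIfAbsent, same value:       " + sameValuePutIfAbsentConverges());
        System.out.println("putIfAbsent, different values: " + differentValuePutIfAbsentConverges());
        System.out.println("removals:                      " + removalsConverge());
    }
}
```

The contrast case with different values is included only to show that the property depends on both nodes writing the same value, which is the condition stated above.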

While I am a strong adversary of the *_ASYNC modes in general, I think
that the consistent order of updates should be preserved there. And if
you do an async put to a dist cache, you can't be sure that a following
read will return the value either (and repl is just a read-optimized,
failure-resilient case of dist).

Radim

[1]
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/main/java/org/hibernate/cache/infinispan/access/UnorderedDistributionInterceptor.java
[2]
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/main/java/org/hibernate/cache/infinispan/access/LockingInterceptor.java

On 01/26/2017 01:24 PM, Galder Zamarreño wrote:

> [...]


--
Radim Vansa <[hidden email]>
JBoss Performance Team

Re: [infinispan-dev] REPL async semantics in the context of Hibernate 2LC

Galder Zamarreño
Hahaha, yeah, it changed a while back but it keeps catching me every time :)

Makes sense: the change came in as a result of having a single owner node for a key, and hence being able to apply changes in the right order.

I'll add a bit more detail to the 2LC docs so that this is made clearer.

Cheers,
--
Galder Zamarreño
Infinispan, Red Hat

> On 26 Jan 2017, at 14:30, Radim Vansa <[hidden email]> wrote:
>
> [...]

