[infinispan-dev] CloudTM: Additional Atomic broadcast based replication mechanism integrated in Infinispan

[infinispan-dev] CloudTM: Additional Atomic broadcast based replication mechanism integrated in Infinispan

Paolo Romano
Hi,

At INESC-ID, we have just finished brewing another (full) replication
protocol for Infinispan, in the context of the Cloud-TM project. The
credits go to Pedro Ruivo, an MSc student working in our group, who
developed this as part of his thesis work.

Like Infinispan's current 2PC-based replication mechanism, this is also
a multi-master scheme in which transactions are executed locally by
any node, without any coordination among replicas until the commit phase.

Rather than using 2PC, however, this scheme relies on a single
Atomic broadcast (AB, a.k.a. total order broadcast) to disseminate
transactions' writesets while ensuring totally ordered delivery of
messages across the various nodes.

The replication scheme is, in fact, quite simple. Upon delivery of the
AB with the writeset of a transaction, we ensure that each replica
executes the following operations in the order established by the AB layer:
- acquiring locks on the writeset, possibly stealing the lock from any
locally running transaction (and flagging it as aborted)
- performing a write-skew check (in case Infinispan is configured to do
this check)
- applying the writeset
- releasing the locks.
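
For concreteness, here is a minimal sketch of that per-delivery processing; every type and method name below (WriteSet, LockManager, DataContainer, AbDeliveryHandler, ...) is a hypothetical placeholder, not Infinispan's actual API:

import java.util.Map;

// Hypothetical types, for illustration only.
interface LockManager {
    // Acquires the write lock on key for the remote tx, stealing it from
    // (and thereby aborting) any local transaction that currently holds it.
    void lockOrSteal(Object key, Object remoteTxId);
    void unlock(Object key, Object remoteTxId);
}

interface DataContainer {
    long version(Object key);
    void put(Object key, Object value, long newVersion);
}

final class WriteSet {
    final Object txId;
    final Map<Object, Object> writes;       // key -> new value
    final Map<Object, Long> readVersions;   // key -> version observed when the tx read it
    WriteSet(Object txId, Map<Object, Object> writes, Map<Object, Long> readVersions) {
        this.txId = txId; this.writes = writes; this.readVersions = readVersions;
    }
}

final class AbDeliveryHandler {
    private final LockManager locks;
    private final DataContainer data;
    private final boolean writeSkewCheckEnabled;

    AbDeliveryHandler(LockManager locks, DataContainer data, boolean writeSkewCheckEnabled) {
        this.locks = locks; this.data = data; this.writeSkewCheckEnabled = writeSkewCheckEnabled;
    }

    /** Invoked once per write set, in the total order established by the AB layer. */
    boolean onAbDeliver(WriteSet ws) {
        // 1. Acquire locks on the write set, stealing them from local transactions.
        for (Object key : ws.writes.keySet()) {
            locks.lockOrSteal(key, ws.txId);
        }
        try {
            // 2. Write-skew check (only if configured): abort if any key read by the
            //    transaction has changed since it was read.
            if (writeSkewCheckEnabled) {
                for (Map.Entry<Object, Long> read : ws.readVersions.entrySet()) {
                    if (data.version(read.getKey()) != read.getValue()) {
                        return false;   // certification failed -> abort
                    }
                }
            }
            // 3. Apply the write set.
            for (Map.Entry<Object, Object> write : ws.writes.entrySet()) {
                data.put(write.getKey(), write.getValue(), data.version(write.getKey()) + 1);
            }
            return true;                // committed
        } finally {
            // 4. Release the locks (on both commit and abort).
            for (Object key : ws.writes.keySet()) {
                locks.unlock(key, ws.txId);
            }
        }
    }
}

Since every replica runs exactly this sequence in the same total order, no further inter-replica coordination is needed at this stage, which is where the deadlock-freedom comes from.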

ABcast can be implemented in many ways, but for simplicity we used
the sequencer-based implementation available in JGroups. This scheme is
quite simple:
- upon an atomic broadcast, the message is simply broadcast to all nodes
- upon receipt of the message, however, the nodes do not deliver it
immediately to the application layer (Infinispan in this case). Instead,
a special node (e.g. the one with the lowest id in the group), namely the
sequencer, broadcasts back the order in which processes should deliver
this message.
- finally, the nodes deliver the atomic broadcast message in the order
specified by the sequencer.
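
As a toy, single-process model of this sequencer scheme (hypothetical names only, no actual JGroups classes; the Network is assumed to loop broadcasts back to the sender as well):

import java.util.HashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.UUID;

final class SequencerAbcast {

    static final class Data  { final UUID msgId; Data(UUID id) { this.msgId = id; } }
    static final class Order { final UUID msgId; final long seqNo;
                               Order(UUID id, long s) { this.msgId = id; this.seqNo = s; } }

    interface Network   { void broadcast(Object msg); }   // plain (non-ordered) broadcast
    interface Deliverer { void deliver(UUID msgId); }      // hand-off to the application layer

    private final Network net;
    private final Deliverer app;
    private final boolean iAmSequencer;                    // e.g. the node with the lowest id
    private long nextSeqNo = 0;                            // used only by the sequencer
    private long nextToDeliver = 0;
    private final Set<UUID> received = new HashSet<>();
    private final NavigableMap<Long, UUID> ordered = new TreeMap<>();

    SequencerAbcast(Network net, Deliverer app, boolean iAmSequencer) {
        this.net = net; this.app = app; this.iAmSequencer = iAmSequencer;
    }

    // Step 1: an atomic broadcast is just a plain broadcast of the payload.
    void abcast(UUID msgId) { net.broadcast(new Data(msgId)); }

    // Step 2: on receipt the message is buffered, not delivered; the sequencer
    // broadcasts back the position it assigns to the message.
    synchronized void onData(Data d) {
        received.add(d.msgId);
        if (iAmSequencer) net.broadcast(new Order(d.msgId, nextSeqNo++));
        tryDeliver();
    }

    synchronized void onOrder(Order o) {
        ordered.put(o.seqNo, o.msgId);
        tryDeliver();
    }

    // Step 3: deliver strictly in the sequencer's order, and only once both the
    // payload and its ordering message have arrived.
    private synchronized void tryDeliver() {
        while (!ordered.isEmpty()
                && ordered.firstKey() == nextToDeliver
                && received.contains(ordered.firstEntry().getValue())) {
            app.deliver(ordered.pollFirstEntry().getValue());
            nextToDeliver++;
        }
    }
}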

The main advantage of this replication mechanism is that it avoids
distributed deadlocks, which makes its performance way better at high
contention (note that local deadlocks may still occur, as Infinispan's
encounter-time locking scheme does not prevent them, but these are much
rarer since the number of "contending" threads is much lower).
Further, locks are held for a shorter time than with 2PC. With 2PC, the
cohort nodes maintain the locks on behalf of the coordinator from the
reception of the prepare until they receive the commit message. This
encompasses a round trip (collecting votes from all cohorts, and sending
back the decision). With the ABcast scheme, instead, the locks acquired
on behalf of remote transactions are held only for the time strictly
required to write back locally.
Finally, the sequencer is, in fact, a privileged node: it can commit
transactions much faster than the other nodes, as it can self-assign the
order in which transactions need to be processed. This may not be very
fair at high contention rates, as the sequencer gets a higher chance of
committing its transactions, but it does make the system MUCH faster overall.

Concerning blocking scenarios in case of failures: just like 2PC is
blocking in case of crashes of a node coordinating a transaction, this
replication scheme is also blocking, but this time in case of crashes of
the sequencer. The comparison in terms of liveness guarantees therefore
seems quite fair. (Note that it would have been possible to make
this replication mechanism non-blocking, at the cost of one extra
communication step. But we opted not to do it, to compare the protocols more
fairly).

To evaluate performance we ran the same kind of experiments we used in
our recent mail where we evaluated a primary-backup-based replication
scheme. All nodes only run write transactions of 10 statements, one of
which is a put. Accesses are uniformly distributed over 1K, 10K, and
100K data items. Machines have 8 cores and 8GB of RAM, and Radargun is
using 10 threads per node.

Summary of results (see attached pdf):
- Throughput, page 1: the plots show significant speedups, especially at
high contention, where 2PC stumbles upon very frequent distributed
deadlocks. Note that we enabled deadlock detection in all experiments
running 2PC.
- Abort rate, page 2: abort rates are similar in both protocols (note
the logscale on y-axis). But the AB-based solution, avoiding deadlocks,
detects the need to abort transactions much faster than 2PC.
- Average Commit duration, page 3: at high contention, 2PC thrashes due
to deadlocks (despite the deadlock detection mechanism), and this is
reflected in the Avg. commit duration as well. In the other cases (10K
and 100K keys), where there is less contention, the commit duration of
the two protocols is similar (see next).
- To bundle or not to bundle?, page 4: the throughput results shown on
page 1 actually have a little trick :-P We used "some" bundling ONLY in
the case of transactions spanning 100K data items. We noticed that
without bundling the commit time of the AB-based scheme got way worse,
so we experimented with a few bundling values till we got decent
performance (see plot on page 4). This is explainable: with 100K keys,
contention being low, fewer transactions get aborted during their
execution. This means that more transactions reach the commit phase
and, in turn, more atomic broadcasts hit the network layer, putting
more load on the sequencing node. Bundling, as you can see on page 4,
does the trick, however. We did not retry the tests with the other
protocols with and without bundling, as this would have been very time
consuming. BTW, I believe that the results would not have been very
different.

Let me open a small parenthesis about bundling: this can be an extremely
effective weapon, but manual configuration is a big headache and
extremely time consuming. I believe that it would be very useful to have
some self-tuning mechanism for it in JGroups. In fact, we've recently
got very nice results using reinforcement learning to tackle this
problem (well, to be precise, not really the same problem, but a very
similar one):

     http://www.inesc-id.pt/ficheiros/publicacoes/7163.pdf

...but I've implemented the prototype of this solution in Appia, as I
knew it much better than JGroups. What do you think about integrating a
similar mechanism in JGroups' ergonomics?

Back to replication: we are now working on a similar solution for
partial replication (a.k.a. distribution in Infinispan).

I'll keep you posted on our progress!

Cheers,

     Paolo

PS: The code is available on GitHub, if you want to take a look at it:
https://github.com/cloudtm/infinispan/tree/abcast_replication


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Attachment: Performance Evaluation.pdf (102K)

Re: [infinispan-dev] CloudTM: Additional Atomic broadcast based replication mechanism integrated in Infinispan

Bela Ban


On 4/16/11 10:47 PM, Paolo Romano wrote:

> Hi,
>
> at INESC-ID, we have just finished brewing another (full) replication
> protocol for Infinispan, in the context of the Cloud-TM project. The
> credits go to Pedro Ruivo, a Msc student working in our group, who
> developed this as part of his thesis work.
>
> Like the current Infinispan's 2PC-based replication mechanism, also this
> is a multi-master scheme in which transactions are executed locally by
> any node and without any coordination among replicas till the commit phase.
>
> Rather than using 2PC, however, this scheme scheme relies on a single
> Atomic broadcast (AB, a.k.a. total order broadcast) to disseminate
> transactions' writesets while ensuring totally ordered delivery of
> messages across the various nodes.
>
> The replication scheme is, in fact, quite simple. Upon delivery of the
> AB with the writeset of a transaction, we ensure that each replica
> executes the following operations in the order established by the AB layer:
> - acquiring locks on the writeset, possibly stealing the lock from any
> locally running transaction (and flagging it as aborted)
> - performing a write-skew check (in case Infinispan is configured to do
> this check)
> - applying the writeset
> - release the locks.


Using atomic broadcasts instead of 2PC to disseminate updates is
something I always thought was a good idea !
(I think something similar has been implemented in the context of a
research project at ETH Zurich in Gustavo Alonso's group (with PostgreSQL))

I think your performance results show that this scheme results in fewer
(lock) collisions and faster updates - despite the sequencer approach
used. This *might* even be faster with a token-based total order protocol...


> ABcast can be implemented in many many ways, but for simplicity we used
> the sequencer based implementation available in JGroups. This scheme is
> quite simple:
> - upon an atomic broadcast, the message is simply broadcast to all nodes
> - upon receipt of the message, however, the nodes do not deliver it
> immediately to the application layer (Infinispan in this case). Instead,
> a special node (e.g. the one with lesser id in the group), namely the
> sequencer, broadcasts back the order in which processes should deliver
> this message.
> - finally, the nodes deliver the atomic broadcast message in the order
> specified by the sequencer.
>
> The main advantage of this replication mechanism is that it avoids
> distributed deadlocks, which makes its performance way better at high
> contention


+1


> (note that local deadlocks may still occur, as Infinispan's
> encounter locking scheme does not prevent them. But these are much more
> rare as the number of "contending" threads is much lower).
> Further, locks are held shorter with respect to 2PC.


Exactly !


>  With 2PC, the
> cohort nodes, maintain the locks on behalf of the coordinator, since the
> reception of the prepare and until they receive the commit message. This
> encompasses a round trip (collecting votes from all cohorts, and sending
> back the decision). With the ABcast scheme, instead, the locks acquired
> on behalf of remote transactions, are held only for the time strictly
> required to writeback locally.
> Finally the sequencer is, in fact, a privileged node, as it can commit
> transactions much faster than the other nodes, as it can self-assign the
> order in which transactions need to be processed.


This is something I haven't thought about yet. JGroups itself doesn't
re-arrange the total order; it basically establishes it on a
first-come-first-served basis. But you're right, arranging incoming
requests into a different total order might make sense. Example: if you
have P1, R1, P2, P3, R2, it might make sense to package {P1, P2 and P3} and
{R1 and R2} together, to make this even more efficient... Interesting !
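
Purely as a hypothetical illustration of that idea (made-up names, not JGroups code): a sequencer that numbers whatever is currently buffered grouped by request type, rather than strictly first-come-first-served.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;

final class BatchingSequencer {

    enum Type { PREPARE, ROLLBACK }

    static final class Request {
        final Type type; final long arrival; long seqNo;
        Request(Type type, long arrival) { this.type = type; this.arrival = arrival; }
    }

    private final Queue<Request> buffered = new ArrayDeque<>();
    private long nextSeqNo = 0;

    void onReceive(Request r) { buffered.add(r); }

    /** Flush the buffer, packaging requests of the same type next to each other. */
    List<Request> assignOrder() {
        List<Request> batch = new ArrayList<>(buffered);
        buffered.clear();
        batch.sort(Comparator.comparing((Request r) -> r.type)   // group by type...
                .thenComparingLong(r -> r.arrival));              // ...keep arrival order within a group
        for (Request r : batch) {
            r.seqNo = nextSeqNo++;                                // this is the total order broadcast back
        }
        return batch;
    }
}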



>  This may not be very
> fair at high contention rate, as it gets higher chances of committing
> transactions, but does make it MUCH faster overall.


Agreed. Compare it to java.util.concurrent reentrant locks: fairness
slows down this implementation, so IMO non-fairness is ok as long as
there is no starvation.



> - To bundle or not to bundle?, page 4: the throughput results shown in
> page 1 have actually a little trick :-P We used "some" bundling ONLY in
> the case of transactions spanning 100K data items. We noticed that w/o
> bundling, the commit time duration of the AB-based scheme got way
> lousier and experimented with a few bundling values till we got decent
> performance (See plot at page 4). This is explainable since with 100K
> keys, being the contention low, less transactions get aborted along
> their execution. This means that more transaction reach the commit
> phase, and, in its turn, more atomic broadcasts hit the network layer,
> and more load for the sequencing node. Bundling, as you can see in page
> 4, makes the trick however. We did not retry the tests with the other
> protocols with and w/o bundling, as this would have been very time
> consuming. BTW, I believe that the results would not have been very
> different.


Did you apply bundling at the configuration level, or per message ? Not
sure if you know about the ability to override bundling on a per-message
basis: the NO_BUNDLE flag set in a message overrides the bundling
configuration, and a message with such a flag set is sent immediately.
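
For readers who have not used bundling: conceptually, messages queue up until either a size or a time threshold is hit and are then flushed as one packet, while a per-message flag bypasses the queue entirely. A toy sketch of that behaviour (placeholder names, not the JGroups implementation or its real attribute/flag names):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ToyBundler {
    private final int maxBundleSizeBytes;
    private final List<byte[]> queue = new ArrayList<>();
    private int queuedBytes = 0;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    ToyBundler(int maxBundleSizeBytes, long maxBundleTimeMs) {
        this.maxBundleSizeBytes = maxBundleSizeBytes;
        // Time threshold: flush whatever is queued every maxBundleTimeMs milliseconds.
        timer.scheduleAtFixedRate(this::flush, maxBundleTimeMs, maxBundleTimeMs, TimeUnit.MILLISECONDS);
    }

    synchronized void send(byte[] msg, boolean noBundle) {
        if (noBundle) {                       // per-message override: skip bundling entirely
            transmit(Collections.singletonList(msg));
            return;
        }
        queue.add(msg);
        queuedBytes += msg.length;
        if (queuedBytes >= maxBundleSizeBytes) {
            flush();                          // size threshold reached: flush immediately
        }
    }

    private synchronized void flush() {
        if (queue.isEmpty()) return;
        transmit(new ArrayList<>(queue));
        queue.clear();
        queuedBytes = 0;
    }

    private void transmit(List<byte[]> bundle) {
        // Stand-in for handing a single packet containing all bundled messages to the transport.
        System.out.println("sending packet with " + bundle.size() + " message(s)");
    }
}

The trade-off Paolo describes is visible here: a larger size/time threshold amortizes per-packet cost over more messages (good for the 100K-key, low-contention runs), at the price of extra latency for individual messages.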



> Let me open a small parenthesis about bundling: this can be an extremely
> effective weapon, but manual configuration is a bid headache and
> extremely time consuming. I believe that it would be very useful to have
> some self-tuning mechanism for it in JGroups. In fact, we've recently
> got very nice results using Reinforcement-learning to tackle this
> problem (well, to be precise, not really the same problem, but a very
> similar one):
>
> http://www.inesc-id.pt/ficheiros/publicacoes/7163.pdf


I'll take a look. In the context of ergonomics, I've been wanting to set
the max_bundle_time value dynamically, so this input is certainly helpful !


> ...but I've implemented the prototype of this solution in Appia, as I
> knew it much better than JGroups. What do you think about integrating a
> similar mechanism in JGroup's ergonomics?


+1. Definitely something I've been wanting to do...




--
Bela Ban
Lead JGroups / Clustering Team
JBoss
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [Cloudtm-discussion] CloudTM: Additional Atomic broadcast based replication mechanism integrated in Infinispan

Manik Surtani
In reply to this post by Paolo Romano
Excellent stuff, Paolo and Pedro.  My comments below, inline.  Cc'ing infinispan-dev as well.

On 16 Apr 2011, at 21:47, Paolo Romano wrote:

> Hi,
>
> at INESC-ID, we have just finished brewing another (full) replication protocol for Infinispan, in the context of the Cloud-TM project. The credits go to Pedro Ruivo, a Msc student working in our group, who developed this as part of his thesis work.
>
> Like the current Infinispan's 2PC-based replication mechanism, also this is a multi-master scheme in which transactions are executed locally by any node and without any coordination among replicas till the commit phase.
>
> Rather than using 2PC, however, this scheme scheme relies on a single Atomic broadcast (AB, a.k.a. total order broadcast) to disseminate transactions' writesets while ensuring totally ordered delivery of messages across the various nodes.
>
> The replication scheme is, in fact, quite simple. Upon delivery of the AB with the writeset of a transaction, we ensure that each replica executes the following operations in the order established by the AB layer:
> - acquiring locks on the writeset, possibly stealing the lock from any locally running transaction (and flagging it as aborted)
> - performing a write-skew check (in case Infinispan is configured to do this check)
> - applying the writeset
> - release the locks.
I presume this implementation still plays nice with XA/JTA?  In that transactions marked for rollback, etc. are respected?

>
> ABcast can be implemented in many many ways, but for simplicity we used the sequencer based implementation available in JGroups. This scheme is quite simple:
> - upon an atomic broadcast, the message is simply broadcast to all nodes
> - upon receipt of the message, however, the nodes do not deliver it immediately to the application layer (Infinispan in this case). Instead, a special node (e.g. the one with lesser id in the group), namely the sequencer, broadcasts back the order in which processes should deliver this message.
> - finally, the nodes deliver the atomic broadcast message in the order specified by the sequencer.
>
> The main advantage of this replication mechanism is that it avoids distributed deadlocks, which makes its performance way better at high contention (note that local deadlocks may still occur, as Infinispan's encounter locking scheme does not prevent them. But these are much more rare as the number of "contending" threads is much lower).

Definitely.

> Further, locks are held shorter with respect to 2PC. With 2PC, the cohort nodes, maintain the locks on behalf of the coordinator, since the reception of the prepare and until they receive the commit message. This encompasses a round trip (collecting votes from all cohorts, and sending back the decision). With the ABcast scheme, instead, the locks acquired on behalf of remote transactions, are held only for the time strictly required to writeback locally.

Very true.  Nice.

> Finally the sequencer is, in fact, a privileged node, as it can commit transactions much faster than the other nodes, as it can self-assign the order in which transactions need to be processed. This may not be very fair at high contention rate, as it gets higher chances of committing transactions, but does make it MUCH faster overall.
>
> Concerning blocking scenarios in case of failures. Just like 2PC is blocking in case of crashes of a node coordinating a transaction, this replication scheme is also blocking, but this time in case of crashes of the sequencer. The comparison in terms of liveness guarantees seem therefore quite fair. (Note that it would have been possible to make this replication mechanism non-blocking, at the cost of one extra communication step. But we opted not to do it, to compare protocols more fairly).

When you say make the replication mechanism non-blocking, are you referring to asynchronous communication?

> To evaluate performance we ran the same kind of experiments we used in our recent mail where we evaluted a Primary backup-based replication scheme. All nodes only do write operations, transactions of 10 statements, one of which being a put. Accesses are uniformly distributed to 1K, 10K, 100K data items. Machines are 8cores, 8GB RAM, radargun is using 10 threads per ndoe.

Is your test still forcing deadlocks by working on the same keyset on each node?

> Summary of results (see attached pdf):
> - Throughput, page 1: the plots show significant speedups. Especially at high contention, where 2PC stumbles upon very frequent distributed deadlocks. Note that we enabled deadlock detection in all experiments running 2PC.
> - Abort rate, page 2: abort rates are similar in both protocols (note the logscale on y-axis). But the AB-based solution, avoiding deadlocks, detects the need to abort transactions much faster than 2PC.
> - Average Commit duration, page 3: at high contention, 2PC trashes due to deadlocks (despite the deadlock detection mechanism), and this is reflected in the Avg. commit duration as well. In the other cases (10K and 100K keys), where there is less contention, the commit duration of the two protocols is similar (see next).
> - To bundle or not to bundle?, page 4: the throughput results shown in page 1 have actually a little trick :-P We used "some" bundling ONLY in the case of transactions spanning 100K data items. We noticed that w/o bundling, the commit time duration of the AB-based scheme got way lousier and experimented with a few bundling values till we  got decent performance (See plot at page 4). This is explainable since with 100K keys, being the contention low, less transactions get aborted along their execution. This means that more transaction reach the commit phase, and, in its turn, more atomic broadcasts hit the network layer, and more load for the sequencing node. Bundling, as you can see in page 4, makes the trick however. We did not retry the tests with the other protocols with and w/o bundling, as this would have been very time consuming. BTW, I believe that the results would not have been very different.
>
> Let me open a small parenthesis about bundling: this can be an extremely effective weapon, but manual configuration is a bid headache and extremely time consuming. I believe that it would be very useful to have some self-tuning mechanism for it in JGroups. In fact, we've recently got very nice results using Reinforcement-learning to tackle this problem (well, to be precise, not really the same problem, but a very similar one):
>
>    http://www.inesc-id.pt/ficheiros/publicacoes/7163.pdf
>
> ...but I've implemented the prototype of this solution in Appia, as I knew it much better than JGroups. What do you think about integrating a similar mechanism in JGroup's ergonomics?
>
> Back to replication: we are now working on a similar solution for partial replication (a.k.a. distribution in Infinispan).
Very keen to see this work with distribution.  :-)

Cheers
Manik
--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Attachment: Performance Evaluation.pdf (102K)

Re: [infinispan-dev] [Cloudtm-discussion] CloudTM: Additional Atomic broadcast based replication mechanism integrated in Infinispan

Paolo Romano
On 4/17/11 7:55 PM, Manik Surtani wrote:
> Excellent stuff, Paolo and Pedro.  My comments below, inline.  Cc'ing infinispan-dev as well.
Thanks Manik!
> On 16 Apr 2011, at 21:47, Paolo Romano wrote:
>
> ...
> I presume this implementation still plays nice with XA/JTA?  In that transactions marked for rollback, etc are respected?
Kind of :)

We didn't really polish the code to have it fully integrated with
XA/JTA, but this could be done without too many problems.

Basically, if Infinispan is the only resource in a distributed xact, and
it is configured to work in replicated mode, then it could externalize a
1-phase-commit interface. In this case, the outcome of the commit phase
would be that of the AB-based certification, avoiding 2PC entirely.

If, on the other hand, Infinispan (replicated) needs to be enlisted in a
distributed transaction encompassing other resources, 2PC is
unavoidable. In this case, when a (replicated) Infinispan node receives
a prepare message from some external coordinator it could i) AB-cast the
prepare message to the other replicas, and ii) do the lock acquisition
and write-skew validation to determine the vote to be sent back to the
external coordinator. (Note that all replicas are guaranteed to
determine the same outcome here, given the total order guarantees of the
ABcast and the fact that the certification procedure is deterministic.) The
write-back and lock release should instead be done upon receipt of the
final decision (2nd phase of the 2PC) from the coordinator.
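
A rough sketch of that prepare path, just to make the flow concrete; every interface and method name below is hypothetical, not Infinispan's actual XA integration:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class AbCertifyingResource {

    interface AtomicBroadcast { void abcast(Object msg); }   // total-order broadcast
    interface Certifier {
        boolean lockAndValidate(Object txId);                 // lock acquisition + write-skew check
        void applyAndUnlock(Object txId, boolean commit);     // write-back (or discard) and lock release
    }

    static final class Prepare { final Object txId; Prepare(Object txId) { this.txId = txId; } }

    private final AtomicBroadcast ab;
    private final Certifier certifier;
    // One future per transaction, completed when the AB-delivered prepare has been certified locally.
    private final Map<Object, CompletableFuture<Boolean>> votes = new ConcurrentHashMap<>();

    AbCertifyingResource(AtomicBroadcast ab, Certifier certifier) {
        this.ab = ab; this.certifier = certifier;
    }

    /** Prepare from the external coordinator: i) AB-cast it, ii) wait for the local vote. */
    boolean prepare(Object txId) {
        CompletableFuture<Boolean> vote = votes.computeIfAbsent(txId, id -> new CompletableFuture<>());
        ab.abcast(new Prepare(txId));
        return vote.join();   // the same vote is computed deterministically at every replica
    }

    /** Invoked by the AB layer, in total order, on every replica. */
    void onAbDeliver(Prepare p) {
        boolean ok = certifier.lockAndValidate(p.txId);
        votes.computeIfAbsent(p.txId, id -> new CompletableFuture<>()).complete(ok);
    }

    /** Second phase of the external 2PC: only now are writes applied and locks released. */
    void onFinalDecision(Object txId, boolean commit) {
        certifier.applyAndUnlock(txId, commit);
        votes.remove(txId);
    }
}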

Does this answer your question?

> ...
>> Finally the sequencer is, in fact, a privileged node, as it can commit transactions much faster than the other nodes, as it can self-assign the order in which transactions need to be processed. This may not be very fair at high contention rate, as it gets higher chances of committing transactions, but does make it MUCH faster overall.
>>
>> Concerning blocking scenarios in case of failures. Just like 2PC is blocking in case of crashes of a node coordinating a transaction, this replication scheme is also blocking, but this time in case of crashes of the sequencer. The comparison in terms of liveness guarantees seem therefore quite fair. (Note that it would have been possible to make this replication mechanism non-blocking, at the cost of one extra communication step. But we opted not to do it, to compare protocols more fairly).
> When you say make the replication mechanism non-blocking, you are referring to asynchronous communication?
No. I was referring to the need to block to wait for the recovery of a
crashed node in order to determine the outcome of transactions stuck in
their commit phase. 2PC needs to block (or to resort to heuristic
decisions, possibly violating atomicity) upon crash of the corresponding
coordinator node. The implemented AB-based replication mechanism needs
to block in case the sequencer crashes, as it may have ordered and
committed transactions (before crashing) that the other nodes have not
seen.
>> To evaluate performance we ran the same kind of experiments we used in our recent mail where we evaluted a Primary backup-based replication scheme. All nodes only do write operations, transactions of 10 statements, one of which being a put. Accesses are uniformly distributed to 1K, 10K, 100K data items. Machines are 8cores, 8GB RAM, radargun is using 10 threads per ndoe.
> Is your test still forcing deadlocks by working on the same keyset on each node?
Yes. In the test, we've generated transactions with 9 reads and 1 write.
The accesses are distributed uniformly over keysets of sizes {1K, 10K, 100K}.
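
In other words, each transaction looks roughly like the sketch below (the Cache and Tx interfaces are placeholders, not the actual Radargun or Infinispan APIs):

import java.util.concurrent.ThreadLocalRandom;

final class UniformWriteTxWorkload {

    interface Cache { Object get(String key); void put(String key, Object value); }
    interface Tx    { void begin(); void commit() throws Exception; void rollback(); }

    /** One transaction: 9 reads and 1 put, keys drawn uniformly from a fixed keyset. */
    static void runOneTransaction(Cache cache, Tx tx, int keySetSize) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        tx.begin();
        try {
            for (int i = 0; i < 9; i++) {
                cache.get("key-" + rnd.nextInt(keySetSize));     // 9 read operations
            }
            cache.put("key-" + rnd.nextInt(keySetSize), "v");    // 1 write operation
            tx.commit();
        } catch (Exception aborted) {
            tx.rollback();   // e.g. lock stolen by a remote write set, or write-skew abort
        }
    }
}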
> ...
> Cheers


Cheers

     Paolo
> Manik

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [Cloudtm-discussion] CloudTM: Additional Atomic broadcast based replication mechanism integrated in Infinispan

Manik Surtani

On 20 Apr 2011, at 18:18, Paolo Romano wrote:

> On 4/17/11 7:55 PM, Manik Surtani wrote:
>> Excellent stuff, Paolo and Pedro.  My comments below, inline.  Cc'ing infinispan-dev as well.
> Thanks Manik!
>> On 16 Apr 2011, at 21:47, Paolo Romano wrote:
>>
>> ...
>> I presume this implementation still plays nice with XA/JTA?  In that transactions marked for rollback, etc are respected?
> Kind of :)
>
> We didn't really polish the code to have it fully integrated with
> XA/JTA, but this could be done without too many problems.
>
> Basically, if Infinispan is the only resource in a distributed xact, and
> it is configured to work in replicated mode, then it could externalize a
> 1 phase commit interface. In this case, the output of the commit phase
> would be that of the AB-based certification, avoiding totally 2PC.
>
> If on the other hand, Infinispan (replicated) needs to be enlisted in a
> distributed transaction encompassing other resources, 2PC is
> unavoidable. In this case, when an (replicated) Infinispan node receives
> a prepare message from some external coordinator it could i) AB-cast the
> prepare message to the other replicas, and ii) do the lock acquisition
> and write-skew validation to determine the vote to be sent back to the
> external coordinator. (Note that all replicas are guaranteed to
> determine the same outcome here, given the total order guarantees of the
> ABcast and that the certification procedure is deterministic.) The
> write-back and lock release should instead be done upon receipt of the
> final decision (2nd phase of the 2PC) from the coordinator.
>
> Does this answer your question?

Yes.  Is there any benefit, then, to participating in a distributed transaction (with other resources using 2PC) while Infinispan uses AB?  We still have the network roundtrips and hold locks for the entire duration...

--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [Cloudtm-discussion] CloudTM: Additional Atomic broadcast based replication mechanism integrated in Infinispan

Paolo Romano
On 4/20/11 7:25 PM, Manik Surtani wrote:

> On 20 Apr 2011, at 18:18, Paolo Romano wrote:
>
>> On 4/17/11 7:55 PM, Manik Surtani wrote:
>>> Excellent stuff, Paolo and Pedro.  My comments below, inline.  Cc'ing infinispan-dev as well.
>> Thanks Manik!
>>> On 16 Apr 2011, at 21:47, Paolo Romano wrote:
>>>
>>> ...
>>> I presume this implementation still plays nice with XA/JTA?  In that transactions marked for rollback, etc are respected?
>> Kind of :)
>>
>> We didn't really polish the code to have it fully integrated with
>> XA/JTA, but this could be done without too many problems.
>>
>> Basically, if Infinispan is the only resource in a distributed xact, and
>> it is configured to work in replicated mode, then it could externalize a
>> 1 phase commit interface. In this case, the output of the commit phase
>> would be that of the AB-based certification, avoiding totally 2PC.
>>
>> If on the other hand, Infinispan (replicated) needs to be enlisted in a
>> distributed transaction encompassing other resources, 2PC is
>> unavoidable. In this case, when an (replicated) Infinispan node receives
>> a prepare message from some external coordinator it could i) AB-cast the
>> prepare message to the other replicas, and ii) do the lock acquisition
>> and write-skew validation to determine the vote to be sent back to the
>> external coordinator. (Note that all replicas are guaranteed to
>> determine the same outcome here, given the total order guarantees of the
>> ABcast and that the certification procedure is deterministic.) The
>> write-back and lock release should instead be done upon receipt of the
>> final decision (2nd phase of the 2PC) from the coordinator.
>>
>> Does this answer your question?
> Yes.  Is there any benefit then of participating in a distributed transaction (with other resources using 2PC) while Infinispan uses AB?  We still have the network roundtrips and hold locks for the entire duration...

In general settings the benefits would be reduced... the main advantage
would be that the Infinispan replicas would not suffer from deadlocks
and would reply faster to the prepare messages.

This can be very effective, however, in those use cases where i)
Infinispan is used as the only in-memory fault-tolerant transactional
data store, and ii) data is persisted asynchronously to some back-end
storage outside the boundaries of the Infinispan transaction.

Cheers

     Paolo

> --
> Manik Surtani
> [hidden email]
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev