[infinispan-dev] SysAdmin operations for recovering transactions


Mircea Markus
Hi,

This is about the stage where the TM's recovery process finds an in-doubt transaction and notifies the sysadmin about it: what hooks does ISPN provide to the sysadmin in order to "fix" the tx?
E.g. step >= 3.3 : http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png

Here is what I have in mind:

Expose (JMX) two operations:

   // all the params together fully describe an Xid
   replayTx(byte[] txBranch, byte[] txId, int formatId);
   forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);

Here is how these two ops would work:
A. replayTx
    1. the node has the PrepareCommand associated with that Xid locally
        - re-issues a prepare: TransactionXAResource.prepare
        - if successful, re-issues a commit: TransactionXAResource.commit
        - if a failure happens at any step, the user is informed and can re-do the JMX call
        - on success, the recovery information is removed from the cluster (async)
    2. the node doesn't have the PrepareCommand associated with that Xid
        - broadcasts ReplayTxCommand(Xid)
        - when a node receives ReplayTxCommand:
                - if it doesn't have a PrepareCommand associated with the Xid, it ignores the command
                - if it has a PrepareCommand...
                        - is it the first node in the view that has it [1]?
                                - yes: it executes A.1, then returns the result to the node that broadcast ReplayTxCommand. This is guaranteed to happen on at most [2] one node in the cluster
                                - no: it ignores the command
        - on success, the recovery information is removed from the cluster (async)
B. forceRollbackTx
   - the node broadcasts RollbackCommand
   - each node that has the PrepareCommand forces a rollback
   - each node that doesn't have the PrepareCommand ignores it
   - on success, the recovery information is removed from the cluster (async)
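
To make step A.1 a bit more concrete, here is a rough sketch of the local replay against the standard javax.transaction.xa.XAResource interface. The class name ReplaySketch, the Outcome enum and the replayLocally method are made up for illustration and are not existing ISPN code; the async cleanup of the recovery information is left out.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public final class ReplaySketch {

    /** Result reported back to the sysadmin (hypothetical type). */
    public enum Outcome { COMMITTED, READ_ONLY, FAILED }

    private ReplaySketch() {
    }

    /**
     * Case A.1: re-issue prepare and, if it succeeds, re-issue commit.
     * Any XAException is reported as FAILED so the JMX call can be retried.
     */
    public static Outcome replayLocally(XAResource resource, Xid xid) {
        try {
            int vote = resource.prepare(xid);
            if (vote == XAResource.XA_RDONLY) {
                // A read-only branch has nothing left to commit.
                return Outcome.READ_ONLY;
            }
            // onePhase = false: this is the second phase of 2PC.
            resource.commit(xid, false);
            return Outcome.COMMITTED;
        } catch (XAException e) {
            return Outcome.FAILED;
        }
    }
}
```

A real implementation would presumably hand the XAException details back to the caller rather than collapsing them into a single FAILED value.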

Cheers,
Mircea

[1] This is determined by building the set of nodes over which the tx is spread, based on the tx's state, and then picking the first of those nodes in the view.
[2] It is possible that this happens on no node at all, as the PrepareCommand might have been removed from all nodes in the meantime (node failures, expiration from the recovery cache).
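
The selection rule in [1] could look roughly like the sketch below. It assumes that every node sees the same ordered view and derives the same set of tx nodes from the tx's state; node addresses are modelled as plain Strings, and the names ReplayCoordinatorSketch, firstInView and shouldReplay are invented for this example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public final class ReplayCoordinatorSketch {

    private ReplayCoordinatorSketch() {
    }

    /** Returns the first member of the view that is also one of the tx's nodes. */
    public static Optional<String> firstInView(List<String> view, Set<String> txNodes) {
        return view.stream().filter(txNodes::contains).findFirst();
    }

    /** True only on the single node that should execute A.1 for this Xid. */
    public static boolean shouldReplay(String self, List<String> view, Set<String> txNodes) {
        return firstInView(view, txNodes).map(self::equals).orElse(false);
    }
}
```

Because the result depends only on the view order and the tx node set, all nodes reach the same answer without exchanging extra messages. If none of the tx's nodes is left in the view, no node replays, which matches [2].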

   


 
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] SysAdmin operations for recovering transactions

Manik Surtani

On 18 Mar 2011, at 12:13, Mircea Markus wrote:

Hi,

It's about the stage where TM's recovery  process finds a in-doubt transaction and notifies the sys admin about it: what hooks does ISPN provide to the sys admin in order to "fix" the tx.
E.g. step >= 3.3 : http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png

Here is what I have in mind:

Expose (JMX) two operations:

  //all the params together fully describe a xid.
  replayTx(byte[] txBranch, byte[] txId, int formatId);
  forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);

You expect a sysadmin to type a byte array into a JMX console?  :-)  You might get death threats from sysadmins... 

Here is how these two ops would work:
A. replayTx
   1. the node has locally the PrepareCommand associated with that XID
- re-issues a prepare: TransactionXAResource.prepare
- if successful re-issues a commit: TransactionXAResource.commit
       -if failure happens at any step the user is informed and she/he can re-do the JMX call
- if success the recovery information is removed from the cluster (async)
   2. the node doesn't have the PrepareCommand associated with that XID
- broadcast ReplayTxCommand (Xid)
       - when a node receives ReplayTxCommand
- if doesn't have a PreparedCommand associated with the Xid ignores it
- if has a PreparedCommand...
- is it the first in the view that has it [1]?

How does a node know the answer to this question?  Is the list of nodes that hold the prepare replay info stored on the PrepareCommand?


Re: [infinispan-dev] SysAdmin operations for recovering transactions

Mircea Markus-2




On 18 Mar 2011, at 12:32, Manik Surtani <[hidden email]> wrote:


On 18 Mar 2011, at 12:13, Mircea Markus wrote:

Hi,

It's about the stage where TM's recovery  process finds a in-doubt transaction and notifies the sys admin about it: what hooks does ISPN provide to the sys admin in order to "fix" the tx.
E.g. step >= 3.3 : http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png

Here is what I have in mind:

Expose (JMX) two operations:

  //all the params together fully describe a xid.
  replayTx(byte[] txBranch, byte[] txId, int formatId);
  forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);

You expect a sysadmin to type a byte array into a JMX console?  :-)  You might get death threats from sysadmins... 
I imagine untraceable threats, right?
String then...

Here is how these two ops would work:
A. replayTx
   1. the node has locally the PrepareCommand associated with that XID
- re-issues a prepare: TransactionXAResource.prepare
- if successful re-issues a commit: TransactionXAResource.commit
       -if failure happens at any step the user is informed and she/he can re-do the JMX call
- if success the recovery information is removed from the cluster (async)
   2. the node doesn't have the PrepareCommand associated with that XID
- broadcast ReplayTxCommand (Xid)
       - when a node receives ReplayTxCommand
- if doesn't have a PreparedCommand associated with the Xid ignores it
- if has a PreparedCommand...
- is it the first in the view that has it [1]?

How does a node know the answer to this question?  Is the list of nodes that holds the prepare replay info stored on the PrepareCommand?
No, [1] explains it


Re: [infinispan-dev] SysAdmin operations for recovering transactions

Manik Surtani

On 18 Mar 2011, at 12:41, Mircea Markus wrote:





On 18 Mar 2011, at 12:32, Manik Surtani <[hidden email]> wrote:


On 18 Mar 2011, at 12:13, Mircea Markus wrote:

Hi,

It's about the stage where TM's recovery  process finds a in-doubt transaction and notifies the sys admin about it: what hooks does ISPN provide to the sys admin in order to "fix" the tx.
E.g. step >= 3.3 : http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png

Here is what I have in mind:

Expose (JMX) two operations:

  //all the params together fully describe a xid.
  replayTx(byte[] txBranch, byte[] txId, int formatId);
  forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);

You expect a sysadmin to type a byte array into a JMX console?  :-)  You might get death threats from sysadmins... 
I imagine untraceble threats, right?
String then...

Can an XID be mapped to a String (and vice versa) reliably, in a TransactionManager-independent manner?


Here is how these two ops would work:
A. replayTx
   1. the node has locally the PrepareCommand associated with that XID
- re-issues a prepare: TransactionXAResource.prepare
- if successful re-issues a commit: TransactionXAResource.commit
       -if failure happens at any step the user is informed and she/he can re-do the JMX call
- if success the recovery information is removed from the cluster (async)
   2. the node doesn't have the PrepareCommand associated with that XID
- broadcast ReplayTxCommand (Xid)
       - when a node receives ReplayTxCommand
- if doesn't have a PreparedCommand associated with the Xid ignores it
- if has a PreparedCommand...
- is it the first in the view that has it [1]?

How does a node know the answer to this question?  Is the list of nodes that holds the prepare replay info stored on the PrepareCommand?
No, [1] explains it

Ok, as long as this is deterministic.



Re: [infinispan-dev] SysAdmin operations for recovering transactions

Mircea Markus

On 18 Mar 2011, at 12:47, Manik Surtani wrote:


On 18 Mar 2011, at 12:41, Mircea Markus wrote:





On 18 Mar 2011, at 12:32, Manik Surtani <[hidden email]> wrote:


On 18 Mar 2011, at 12:13, Mircea Markus wrote:

Hi,

It's about the stage where TM's recovery  process finds a in-doubt transaction and notifies the sys admin about it: what hooks does ISPN provide to the sys admin in order to "fix" the tx.
E.g. step >= 3.3 : http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png

Here is what I have in mind:

Expose (JMX) two operations:

  //all the params together fully describe a xid.
  replayTx(byte[] txBranch, byte[] txId, int formatId);
  forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);

You expect a sysadmin to type a byte array into a JMX console?  :-)  You might get death threats from sysadmins... 
I imagine untraceble threats, right?
String then...

Can an XID be mapped to a String (and vice versa) reliably, in a TransactionManager-independent manner?
An Xid can be reliably mapped to (byte[] txBranch, byte[] txId, int formatId). The only part left is converting a String (as received from the JMX operation) to the corresponding byte[]. Seems doable.
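
One possible way to do the String to byte[] conversion is a plain hex encoding, sketched below. The choice of hex is an assumption made for this example only (the thread does not settle on a format), and XidStringSketch, toHex and fromHex are hypothetical names.

```java
public final class XidStringSketch {

    private XidStringSketch() {
    }

    /** Encodes bytes as lowercase hex, two digits per byte. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    /** Decodes a hex String (upper or lower case) back into bytes. */
    public static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

With something like this, the JMX operations could take (String txBranch, String txId, int formatId) and decode the two Strings before looking up the Xid. Since the bytes are round-tripped unchanged, the mapping would not depend on the TransactionManager in use.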


Here is how these two ops would work:
A. replayTx
   1. the node has locally the PrepareCommand associated with that XID
- re-issues a prepare: TransactionXAResource.prepare
- if successful re-issues a commit: TransactionXAResource.commit
       -if failure happens at any step the user is informed and she/he can re-do the JMX call
- if success the recovery information is removed from the cluster (async)
   2. the node doesn't have the PrepareCommand associated with that XID
- broadcast ReplayTxCommand (Xid)
       - when a node receives ReplayTxCommand
- if doesn't have a PreparedCommand associated with the Xid ignores it
- if has a PreparedCommand...
- is it the first in the view that has it [1]?

How does a node know the answer to this question?  Is the list of nodes that holds the prepare replay info stored on the PrepareCommand?
No, [1] explains it

Ok, as long as this is deterministic.
It is, see [1] :-)

