[infinispan-dev] Infinispan and change data capture


[infinispan-dev] Infinispan and change data capture

Randall Hauch
The Debezium project [1] is working on building change data capture connectors for a variety of databases. MySQL is available now, MongoDB will be soon, and PostgreSQL and Oracle are next on our roadmap. 

One way in which Debezium and Infinispan can be used together is when Infinispan is being used as a cache for data stored in a database. In this case, Debezium can capture the changes to the database and produce a stream of events; a separate process can consume these changes and evict entries from an Infinispan cache.

If Infinispan is to be used as a data store, then it would be useful for Debezium to be able to capture those changes so other apps/services can consume the changes. First of all, does this make sense? Secondly, if it does, then Debezium would need an Infinispan connector, and it’s not clear to me how that connector might capture the changes from Infinispan.

Debezium typically monitors the log of transactions/changes that are committed to a database. Of course how this works varies for each type of database. For example, MySQL internally produces a transaction log that contains information about every committed row change, and MySQL ensures that every committed change is included and that non-committed changes are excluded. The MySQL mechanism is actually part of the replication mechanism, so slaves update their internal state by reading the master’s log. The Debezium MySQL connector [2] simply reads the same log.

Infinispan has several mechanisms that may be useful:

  • Interceptors - See [3]. This seems pretty straightforward and IIUC provides access to all internal operations. However, it’s not clear to me whether a single interceptor will see all the changes in a cluster (perhaps in local and replicated modes) or only those changes that happen on that particular node (in distributed mode). It’s also not clear whether this interceptor is called within the context of the cache’s transaction, so if a failure happens at just the wrong time, whether a change might be made to the cache but not be seen by the interceptor (or vice versa). (A rough sketch of this approach follows this list.)
  • Cross-site replication - See [4][5]. A potential advantage of this mechanism appears to be that it is defined (more) globally, and it appears to function if the remote backup comes back online after being offline for a period of time.
  • State transfer - is it possible to participate as a non-active member of the cluster, and to effectively read all state transfer activities that occur within the cluster?
  • Cache store - tie into the cache store mechanism, perhaps by wrapping an existing cache store and sitting between the cache and the cache store.
  • Monitor the cache store - don’t monitor Infinispan at all, and instead monitor the store in which Infinispan is storing entries. (This is probably the least attractive, since some stores can’t be monitored, or because the store is persisting an opaque binary value.)
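
For concreteness, here is roughly what I imagine the interceptor option might look like, assuming the CommandInterceptor-style API described in [3]; the class and method names are from memory and should be checked against the docs, and the emit() helper is purely hypothetical:

    import org.infinispan.commands.write.PutKeyValueCommand;
    import org.infinispan.commands.write.RemoveCommand;
    import org.infinispan.context.InvocationContext;
    import org.infinispan.interceptors.base.BaseCustomInterceptor;

    public class ChangeCaptureInterceptor extends BaseCustomInterceptor {

      @Override
      public Object visitPutKeyValueCommand(InvocationContext ctx, PutKeyValueCommand cmd) throws Throwable {
        Object ret = invokeNextInterceptor(ctx, cmd); // let the write proceed first
        emit("put", cmd.getKey(), cmd.getValue());    // then record the change
        return ret;
      }

      @Override
      public Object visitRemoveCommand(InvocationContext ctx, RemoveCommand cmd) throws Throwable {
        Object ret = invokeNextInterceptor(ctx, cmd);
        emit("remove", cmd.getKey(), null);
        return ret;
      }

      private void emit(String op, Object key, Object value) {
        // hypothetical: hand the change to whatever transport the connector uses
      }
    }

This sketch doesn't answer the node-locality or transaction-timing questions above; those are exactly what I'd like to understand.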

Are there other mechanisms that might be used?

There are a couple of important requirements for change data capture to be able to work correctly:

  1. Upon initial connection, the CDC connector must be able to obtain a snapshot of all existing data, followed by seeing all changes to data that may have occurred since the snapshot was started. If the connector is stopped or fails, upon restart it needs to be able to reconnect and either see all changes that occurred since it was last capturing changes, or perform a snapshot. (Performing a snapshot upon restart is very inefficient and undesirable.) This works as follows: the CDC connector only records the “offset” in the source’s sequence of events; what this “offset” entails depends on the source. Upon restart, the connector can use this offset information to coordinate with the source where it wants to start reading. (In MySQL and PostgreSQL, every event includes the filename of the log and the position in that file; MongoDB includes in each event the monotonically increasing timestamp of the transaction.) See the sketch of such offsets after this list.
  2. No change can be missed, even when things go wrong and components crash.
  3. When a new entry is added, the “after” state of the entity will be included. When an entry is updated, the “after” state will be included in the event; if possible, the event should also include the “before” state. When an entry is removed, the “before” state should be included in the event.
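
To make requirement 1 concrete, here is a small sketch of the kind of offsets Debezium connectors record; the field names below are illustrative rather than the connectors' actual keys, and the Infinispan offset is purely hypothetical, since producing a replayable position is exactly what is missing today:

    import java.util.HashMap;
    import java.util.Map;

    public class OffsetExamples {
      public static void main(String[] args) {
        // Offset for a MySQL-style source (illustrative field names):
        Map<String, Object> mysqlOffset = new HashMap<>();
        mysqlOffset.put("file", "mysql-bin.000003"); // binlog filename
        mysqlOffset.put("pos", 154L);                // position within that file

        // Offset for a MongoDB-style source (illustrative field names):
        Map<String, Object> mongoOffset = new HashMap<>();
        mongoOffset.put("ts_sec", 1481299200);       // oplog timestamp, seconds
        mongoOffset.put("ord", 7);                   // ordinal within that second

        // A hypothetical Infinispan offset: some id the server could replay from.
        Map<String, Object> infinispanOffset = new HashMap<>();
        infinispanOffset.put("eventId", 123456L);

        System.out.println(mysqlOffset + " " + mongoOffset + " " + infinispanOffset);
      }
    }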

Any thoughts or advice would be greatly appreciated.

Best regards,

Randall


[1] http://debezium.io
[2] http://debezium.io/docs/connectors/mysql/
[3] http://infinispan.org/docs/stable/user_guide/user_guide.html#_custom_interceptors_chapter
[4] http://infinispan.org/docs/stable/user_guide/user_guide.html#CrossSiteReplication
[5] https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Replication

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Adrian Nistor
Hi Randall,

Infinispan supports both push and pull access models. The push model is supported by events (and listeners), which are cluster wide and are available in both library and remote mode (hotrod). The notification system is pretty advanced as there is a filtering mechanism available that can use a hand coded filter / converter or one specified in jpql (experimental atm). Getting a snapshot of the initial data is also possible. But infinispan does not produce a transaction log to be used for determining all changes that happened since a previous connection time, so you'll always have to get a new full snapshot when re-connecting.

So if Infinispan is the data store I would base the Debezium connector implementation on Infinispan's event notification system. Not sure about the other use case though.
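
For example, a minimal remote (Hot Rod) listener would look roughly like this; the cache name and the emit() helper are placeholders, and note that without a converter the events carry only the key and version, so the value has to be re-read:

    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.RemoteCacheManager;
    import org.infinispan.client.hotrod.annotation.ClientCacheEntryCreated;
    import org.infinispan.client.hotrod.annotation.ClientCacheEntryModified;
    import org.infinispan.client.hotrod.annotation.ClientCacheEntryRemoved;
    import org.infinispan.client.hotrod.annotation.ClientListener;
    import org.infinispan.client.hotrod.event.ClientCacheEntryCreatedEvent;
    import org.infinispan.client.hotrod.event.ClientCacheEntryModifiedEvent;
    import org.infinispan.client.hotrod.event.ClientCacheEntryRemovedEvent;

    @ClientListener(includeCurrentState = true) // replay existing entries as create events on registration
    public class CaptureListener {

      @ClientCacheEntryCreated
      public void created(ClientCacheEntryCreatedEvent<String> e) {
        emit("create", e.getKey(), e.getVersion());
      }

      @ClientCacheEntryModified
      public void modified(ClientCacheEntryModifiedEvent<String> e) {
        emit("update", e.getKey(), e.getVersion());
      }

      @ClientCacheEntryRemoved
      public void removed(ClientCacheEntryRemovedEvent<String> e) {
        emit("delete", e.getKey(), -1L);
      }

      private void emit(String op, String key, long version) {
        // placeholder: forward to the connector's pipeline
      }

      public static void main(String[] args) {
        RemoteCacheManager rcm = new RemoteCacheManager();
        RemoteCache<String, String> cache = rcm.getCache("someCache");
        cache.addClientListener(new CaptureListener());
      }
    }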

Adrian



_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Randall Hauch

On Jul 11, 2016, at 3:42 AM, Adrian Nistor <[hidden email]> wrote:

Hi Randall,

Infinispan supports both push and pull access models. The push model is supported by events (and listeners), which are cluster wide and are available in both library and remote mode (hotrod). The notification system is pretty advanced as there is a filtering mechanism available that can use a hand coded filter / converter or one specified in jpql (experimental atm). Getting a snapshot of the initial data is also possible. But infinispan does not produce a transaction log to be used for determining all changes that happened since a previous connection time, so you'll always have to get a new full snapshot when re-connecting.

So if Infinispan is the data store I would base the Debezium connector implementation on Infinispan's event notification system. Not sure about the other use case though.


Thanks, Adrian, for the feedback. A couple of questions.

You mentioned Infinispan has a pull model — is this just using the normal API to read the entries?

With event listeners, a single connection will receive all of the events that occur in the cluster, correct? Is it possible (e.g., a very unfortunately timed crash) for a change to be made to the cache without an event being produced and sent to listeners? What happens if the network fails or partitions? How does cross site replication address this?

Has there been any thought about adding to Infinispan a write ahead log or transaction log to each node or, better yet, for the whole cluster?

Thanks again!




_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Galder Zamarreño

--
Galder Zamarreño
Infinispan, Red Hat

> On 11 Jul 2016, at 16:41, Randall Hauch <[hidden email]> wrote:
>
>>
>> On Jul 11, 2016, at 3:42 AM, Adrian Nistor <[hidden email]> wrote:
>>
>> Hi Randall,
>>
>> Infinispan supports both push and pull access models. The push model is supported by events (and listeners), which are cluster wide and are available in both library and remote mode (hotrod). The notification system is pretty advanced as there is a filtering mechanism available that can use a hand coded filter / converter or one specified in jpql (experimental atm). Getting a snapshot of the initial data is also possible. But infinispan does not produce a transaction log to be used for determining all changes that happened since a previous connection time, so you'll always have to get a new full snapshot when re-connecting.
>>
>> So if Infinispan is the data store I would base the Debezium connector implementation on Infinispan's event notification system. Not sure about the other use case though.
>>
>
> Thanks, Adrian, for the feedback. A couple of questions.
>
> You mentioned Infinispan has a pull model — is this just using the normal API to read the entries?
>
> With event listeners, a single connection will receive all of the events that occur in the cluster, correct? Is it possible (e.g., a very unfortunately timed crash) for a change to be made to the cache without an event being produced and sent to listeners?

^ Yeah, that can happen due to the async nature of remote events. However, there's the possibility for clients, upon receiving a new topology, to receive the current state of the server as events; see [1] and [2].

[1] http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_listener_state_consumption
[2] http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_listener_failure_handling

> What happens if the network fails or partitions? How does cross site replication address this?

In terms of cross-site, it depends on what the client is connected to. Clients can now fail over between sites, so they should be able to deal with events too, in the same way as explained above.

>
> Has there been any thought about adding to Infinispan a write ahead log or transaction log to each node or, better yet, for the whole cluster?

Not that I'm aware of, but we've recently added a security audit log, so a transaction log might make sense too.

Cheers,

>
> Thanks again!


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Randall Hauch
Reviving this old thread, and as before I appreciate any help the Infinispan community might provide. There definitely is interest in Debezium capturing the changes being made to an Infinispan cluster. This isn’t as important when Infinispan is used as a cache, but when Infinispan is used as a store then it is important for other apps/services to be able to accurately keep up with the changes being made in the store.

On Jul 29, 2016, at 8:47 AM, Galder Zamarreño <[hidden email]> wrote:


--
Galder Zamarreño
Infinispan, Red Hat

On 11 Jul 2016, at 16:41, Randall Hauch <[hidden email]> wrote:


On Jul 11, 2016, at 3:42 AM, Adrian Nistor <[hidden email]> wrote:

Hi Randall,

Infinispan supports both push and pull access models. The push model is supported by events (and listeners), which are cluster wide and are available in both library and remote mode (hotrod). The notification system is pretty advanced as there is a filtering mechanism available that can use a hand coded filter / converter or one specified in jpql (experimental atm). Getting a snapshot of the initial data is also possible. But infinispan does not produce a transaction log to be used for determining all changes that happened since a previous connection time, so you'll always have to get a new full snapshot when re-connecting. 

So if Infinispan is the data store I would base the Debezium connector implementation on Infinispan's event notification system. Not sure about the other use case though.


Thanks, Adrian, for the feedback. A couple of questions.

You mentioned Infinispan has a pull model — is this just using the normal API to read the entries?

With event listeners, a single connection will receive all of the events that occur in the cluster, correct? Is it possible (e.g., a very unfortunately timed crash) for a change to be made to the cache without an event being produced and sent to listeners?

^ Yeah, that can happen due to the async nature of remote events. However, there's the possibility for clients, upon receiving a new topology, to receive the current state of the server as events; see [1] and [2].

[1] http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_listener_state_consumption
[2] http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_listener_failure_handling

It is critical that any change event stream be consistent with the store; without that consistency the stream is worthless. Only when the change event stream is an accurate representation of what changed can downstream consumers use the stream to rebuild their own perfect copy of the upstream store and keep those copies consistent with the upstream store.

So, given that the events are handled asynchronously, how are multiple changes to a single entry handled in a cluster? For example, if a client sets entry <A,Foo>, and a short time after that the same (or another) client sets entry <A,Bar>, is it guaranteed that a client listening to events will see <A,Foo> first and <A,Bar> some time later? Or is it possible that a listening client might first see <A,Bar> and then <A,Foo>?


What happens if the network fails or partitions? How does cross site replication address this?

In terms of cross-site, it depends on what the client is connected to. Clients can now fail over between sites, so they should be able to deal with events too, in the same way as explained above.


Has there been any thought about adding to Infinispan a write ahead log or transaction log to each node or, better yet, for the whole cluster?

Not that I'm aware of, but we've recently added a security audit log, so a transaction log might make sense too.

Without a transaction log, Debezium would have to use a client listener with includeCurrentState=true to obtain the state every time it reconnects. If Debezium simply included all of this state in the event stream, the stream might contain lots of superfluous or unnecessary events, which would impact all downstream consumers by forcing them to spend a lot of time processing changes that never really happened. So the only way to avoid that would be for Debezium to use an external store to track the changes it has seen so far, so that it doesn’t include unnecessary events in the change event stream. It’d be a shame to have to require this much infrastructure.
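
To illustrate the kind of bookkeeping that implies, here is a rough sketch of such an external dedup store; the in-memory map stands in for whatever durable store would be needed in practice, and it assumes the client events expose getVersion():

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.infinispan.client.hotrod.annotation.ClientCacheEntryCreated;
    import org.infinispan.client.hotrod.annotation.ClientListener;
    import org.infinispan.client.hotrod.event.ClientCacheEntryCreatedEvent;

    @ClientListener(includeCurrentState = true)
    public class DedupListener {

      // Hypothetical external store of key -> last version already emitted downstream.
      // In practice this would have to be durable (e.g. a compacted Kafka topic or a local DB).
      private final Map<String, Long> seen = new ConcurrentHashMap<>();

      @ClientCacheEntryCreated
      public void created(ClientCacheEntryCreatedEvent<String> e) {
        Long last = seen.get(e.getKey());
        if (last != null && last == e.getVersion()) {
          return; // replayed state we have already emitted; drop it
        }
        seen.put(e.getKey(), e.getVersion());
        emit("upsert", e.getKey(), e.getVersion());
      }

      private void emit(String op, String key, long version) {
        // placeholder: append to the change event stream
      }
    }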

A transaction log would really be a great way to solve this problem. Has there been any more thought about Infinispan using and exposing a transaction log? Or perhaps Infinispan could record the changes in a Kafka topic directly?

(I guess if the Infinispan cache used relational database(s) as its cache store(s), then Debezium could just capture the changes from there. That seems like a big constraint, though.)

Thoughts?




_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Gustavo Fernandes-2
On Wed, Dec 7, 2016 at 9:20 PM, Randall Hauch <[hidden email]> wrote:
Reviving this old thread, and as before I appreciate any help the Infinispan community might provide. There definitely is interest in Debezium capturing the changes being made to an Infinispan cluster. This isn’t as important when Infinispan is used as a cache, but when Infinispan is used as a store then it is important for other apps/services to be able to accurately keep up with the changes being made in the store.



I recently updated a proposal [1] based on several discussions we had in the past that is essentially about introducing an event storage mechanism (write ahead log) in order to improve reliability, failover and "replayability" for the remote listeners, any feedback greatly appreciated.


[1] https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvement-proposal

Thanks,
Gustavo


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Randall Hauch
On Dec 8, 2016, at 3:13 AM, Gustavo Fernandes <[hidden email]> wrote:

I recently updated a proposal [1] based on several discussions we had in the past that is essentially about introducing an event storage mechanism (write ahead log) in order to improve reliability, failover and "replayability" for the remote listeners, any feedback greatly appreciated.


[1] https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvement-proposal


Hi, Gustavo. Thanks for the response. I like the proposal a lot, and have a few specific comments and questions. Let me know if there is a better forum for this feedback.

  1. It is smart to require the application using the HotRod client to know/manage the id of the latest event it has seen. This allows an application to restart from where it left off, but it also allows the application to replay some events if needed. For example, an application may “fully-process” an event asynchronously from the “handle” method (e.g., the event handler method just puts the event into a queue and immediately returns), so only the application knows which ids it has fully processed. If anything goes wrong, the client is in full control over where it wants to restart. (See the sketch after this list.)
  2. When a client first registers, it should always obtain the id of the most recent event in the log. When using "includeState=true”, the client will first receive the state of all entries, and then needs to start reading events from the point at which the state transfer started (this is the only way to ensure that every change is seen at least once).
  3. It must be possible to enable this logging on an existing cache, and doing this will likely mean the log starts out capturing only the changes made since the log was enabled. This should be acceptable, since clients that want all entries can optionally start out with a snapshot (e.g., “includeState=true”).
  4. Is the log guaranteed to have the same order of changes as they were applied to the cache?
  5. Will the log be configured with a TTL for the events or a fixed size? TTLs are easy to understand but require a variable amount of storage; a capped storage size is easy to manage but harder to understand.
  6. Will the log store the “before” state of the entry? This increases the size of the events and therefore the log, but it means client applications can do a lot more with the events without storing (as much) state.
  7. It would be very useful for the HotRod client to automatically fail over when it loses its connectivity with Infinispan. I presume this would be based upon the id of the last event successfully provided to and handled by the listener method.
  8. Will the log include transaction boundaries, or at least a transaction id/number in each event?
  9. Do/will the events include the entry version numbers? Are the versions included in events when "includeCurrentState=true” is set?
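
On point 1, here is a sketch of the kind of client-side bookkeeping I have in mind; the Event shape is hypothetical and the way the restart position would be handed back to the server (e.g. a reconnectFrom() call) doesn't exist yet, since the proposal's API isn't defined:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicLong;

    public class EventPipeline {

      static final class Event {
        final long id;
        final Object key;
        Event(long id, Object key) { this.id = id; this.key = key; }
      }

      private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
      private final AtomicLong lastFullyProcessed = new AtomicLong(-1L);

      // Called from the listener's handle method: enqueue and return immediately.
      public void handle(Event e) {
        queue.add(e);
      }

      // Run on a separate worker thread: only here do we advance the restart position.
      public void processLoop() throws InterruptedException {
        while (true) {
          Event e = queue.take();
          publishDownstream(e);         // e.g. write to Kafka and wait for the ack
          lastFullyProcessed.set(e.id); // durable in practice, in memory here
        }
      }

      // On restart the application would pass lastFullyProcessed.get() back to the
      // server via whatever registration call the proposal ends up defining.
      private void publishDownstream(Event e) {
        // placeholder
      }
    }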

I hope this helps; let me know if you want clarification on any of these. I can’t wait to have this feature!

Best regards,

Randall





_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Radim Vansa
On 12/08/2016 10:13 AM, Gustavo Fernandes wrote:
>
> I recently updated a proposal [1] based on several discussions we had
> in the past that is essentially about introducing an event storage
> mechanism (write ahead log) in order to improve reliability, failover
> and "replayability" for the remote listeners, any feedback greatly
> appreciated.

Hi Gustavo,

while I really like the pull-style architecture and reliable events, I
see some problematic parts here:

1) 'cache that would persist the events with a monotonically increasing id'

I assume that you mean globally (for all entries) monotonic. How will
you obtain such an ID? Currently, commands have unique IDs that are
<Address, Long>, where the number part is monotonic per node. That's
easy to achieve. But introducing a globally monotonic counter means that
there will be a single contention point. (You can introduce more
contention points by adding backups, but this is probably unnecessary, as
you can find out the last id from the indexed cache data.) Per-segment
monotonic counters would probably be more scalable, though that increases complexity.

2) 'The write to the event log would be async in order to not affect
normal data writes'

Who should write to the cache?
a) the originator - what if the originator crashes (even though the change
has been applied)? Besides, the originator would have to do an (async) RPC
to the primary owner (which will be the primary owner of the event, too).
b) the primary owner - with triangle, the primary does not really know if the
change has been written on the backup. Piggybacking that info won't be
trivial - we don't want to send another message explicitly. But even if
we get the confirmation, since the write to the event cache is async, if the
primary owner crashes before replicating the event to the backup, we lose
the event.
c) all owners, but locally - that will require more complex reconciliation
of whether the event really happened on all surviving nodes or not. And
backups could have trouble resolving order, too.

IIUC clustered listeners are called from the primary owner before the change
is really confirmed on backups (@Pedro correct me if I am wrong,
please), but for this reliable event cache you need a higher level of
consistency.

3) The log will also have to filter out retried operations (based on
command ID - though this can be indexed, too). Still, I would prefer to
see a per-event command-id log to deal with retries properly.

4) The client should pull data, but I would keep push notifications that
'something happened' (throttled on the server). There could be a use case for
rarely updated caches, where polling the servers would be excessive.

Radim

>
>
> [1]
> https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvement-proposal
>
> Thanks,
> Gustavo


--
Radim Vansa <[hidden email]>
JBoss Performance Team

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan and change data capture

Randall Hauch

On Dec 9, 2016, at 3:13 AM, Radim Vansa <[hidden email]> wrote:

On 12/08/2016 10:13 AM, Gustavo Fernandes wrote:

I recently updated a proposal [1] based on several discussions we had
in the past that is essentially about introducing an event storage
mechanism (write ahead log) in order to improve reliability, failover
and "replayability" for the remote listeners, any feedback greatly
appreciated.

Hi Gustavo,

while I really like the pull-style architecture and reliable events, I
see some problematic parts here:

1) 'cache that would persist the events with a monotonically increasing id'

I assume that you mean globally (for all entries) monotonic. How will
you obtain such an ID? Currently, commands have unique IDs that are
<Address, Long>, where the number part is monotonic per node. That's
easy to achieve. But introducing a globally monotonic counter means that
there will be a single contention point. (You can introduce more
contention points by adding backups, but this is probably unnecessary, as
you can find out the last id from the indexed cache data.) Per-segment
monotonic counters would probably be more scalable, though that increases complexity.

It is complicated, but one way to do this is to have one “primary” node maintain the log and have the others replicate from it. The cluster does need to use consensus to agree on which node is the primary, and to know which secondary becomes the primary if the primary fails. Consensus is not trivial, but JGroups Raft (http://belaban.github.io/jgroups-raft/) may be an option. However, this approach ensures that the replica logs are identical to the primary’s, since they simply record the primary’s log as-is. Of course, another challenge is what happens during a failure of the primary log node, and whether any transactions can be performed/completed while the primary is unavailable.

Another option is to have each node maintain its own log, and to have an aggregator log that merges/combines the various logs into one. I'm not sure how feasible it is to merge logs by getting rid of duplicates and determining a total order, but if it is feasible then it may have better fault-tolerance characteristics.

Of course, it is possible to have node-specific monotonic IDs. For example, MySQL Global Transaction IDs (GTIDs) use a unique UUID for each node, and then GTIDs consists of the node’s UUID plus a monotonically-increasing value (e.g., “31fc48cd-ecd4-46ad-b0a9-f515fc9497c4:1001”). The transaction log contains a mix of GTIDs, and MySQL replication uses a “GTID set” to describe the ranges of transactions known by a server (e.g., “u1:1-100,u2:1-10000,u3:3-5” where “u1”, “u2”, and “u3” are actually UUIDs). So, when a MySQL replica connects, it says “I know about this GTID set", and this tells the master where that client wants to start reading.
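
A tiny sketch of that scheme adapted to Infinispan nodes; all names here are hypothetical, it just shows how little is needed to get per-node monotonic ids plus a compact "what I've seen" summary a client could send on reconnect:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class NodeEventIds {

      private final UUID nodeId = UUID.randomUUID();   // fixed per node, like a MySQL server UUID
      private final AtomicLong seq = new AtomicLong(); // monotonic only within this node

      // Produces an id such as "31fc48cd-...-f515fc9497c4:1001"
      public String nextId() {
        return nodeId + ":" + seq.incrementAndGet();
      }

      // A "GTID set" equivalent: highest sequence seen per node UUID.
      // A reconnecting client would send this map to say where it wants to resume.
      public static void recordSeen(Map<UUID, Long> seenSet, UUID node, long sequence) {
        seenSet.merge(node, sequence, Math::max);
      }

      public static void main(String[] args) {
        NodeEventIds ids = new NodeEventIds();
        Map<UUID, Long> seen = new ConcurrentHashMap<>();
        System.out.println(ids.nextId());
        recordSeen(seen, ids.nodeId, 1L);
      }
    }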


2) 'The write to the event log would be async in order to not affect
normal data writes'

Who should write to the cache?
a) the originator - what if the originator crashes (even though the change
has been applied)? Besides, the originator would have to do an (async) RPC
to the primary owner (which will be the primary owner of the event, too).
b) the primary owner - with triangle, the primary does not really know if the
change has been written on the backup. Piggybacking that info won't be
trivial - we don't want to send another message explicitly. But even if
we get the confirmation, since the write to the event cache is async, if the
primary owner crashes before replicating the event to the backup, we lose
the event.
c) all owners, but locally - that will require more complex reconciliation
of whether the event really happened on all surviving nodes or not. And
backups could have trouble resolving order, too.

IIUC clustered listeners are called from the primary owner before the change
is really confirmed on backups (@Pedro correct me if I am wrong,
please), but for this reliable event cache you need a higher level of
consistency.

This could be handled by writing a confirmation or “commit” event to the log when the write is confirmed or the transaction is committed. Then, only those confirmed events/transactions would be exposed to client listeners. This requires some buffering, but this could be done in each HotRod client.
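
Roughly, the client-side buffering I mean looks like this; the Event and transaction-id shapes are hypothetical:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Consumer;

    public class TxBuffer {

      public record Event(String txId, Object key, Object value) {}

      private final Map<String, List<Event>> pending = new ConcurrentHashMap<>();
      private final Consumer<Event> downstream;

      public TxBuffer(Consumer<Event> downstream) {
        this.downstream = downstream;
      }

      // Data events are buffered per transaction, not exposed yet.
      public void onDataEvent(Event e) {
        pending.computeIfAbsent(e.txId(), id -> new ArrayList<>()).add(e);
      }

      // Only when the commit marker arrives do the buffered events reach listeners.
      public void onCommit(String txId) {
        List<Event> events = pending.remove(txId);
        if (events != null) {
          events.forEach(downstream);
        }
      }

      // A rollback (or a retry marker, per point 3) just discards the buffer.
      public void onRollback(String txId) {
        pending.remove(txId);
      }
    }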


3) The log will also have to filter out retried operations (based on
command ID - though this can be indexed, too). Though, I would prefer to
see per-event command-id log to deal with retries properly.

IIUC, a “commit” event would work here, too.


4) Client should pull data, but I would keep push notifications that
'something happened' (throttled on server). There could be use case for
rarely updated caches, and polling the servers would be excessive there.

IMO the clients should poll, but if the server has nothing to return it blocks until there is something or until a timeout occurs. This makes it easy for clients and actually reduces network traffic compared to constantly polling.
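Roughly, the loop could look like this (a sketch only; the EventLogClient interface is invented here, the real transport would be HotRod):

    import java.util.List;

    // Long-poll loop: the server call blocks until events are available or a
    // timeout expires, so an empty result simply means "nothing happened yet".
    public class LongPollLoop implements Runnable {

        interface EventLogClient {
            // Blocks server-side for up to timeoutMillis; returns buffered events.
            List<String> poll(long offset, long timeoutMillis);
        }

        private final EventLogClient client;
        private volatile boolean running = true;
        private long offset;

        LongPollLoop(EventLogClient client, long startOffset) {
            this.client = client;
            this.offset = startOffset;
        }

        @Override
        public void run() {
            while (running) {
                List<String> events = client.poll(offset, 30_000);
                for (String event : events) {
                    offset++;          // advance this consumer's position
                    process(event);
                }
            }
        }

        private void process(String event) {
            System.out.println("event: " + event);
        }

        void stop() {
            running = false;
        }
    }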

BTW, a lot of this is replicating the functionality of Kafka, which is already quite mature and feature rich. It’s actually possible to *embed* Kafka to simplify operations, but I don’t think that’s recommended. And, it introduces a very complex codebase that would need to be supported.


Radim



[1]
https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvement-proposal

Thanks,
Gustavo





--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Infinispan and change data capture

Emmanuel Bernard
Randall and I had a chat on $subject. Here is a proposal worth
exploring as it is very lightweight on Infinispan's code.

Does an operation have a unique id shared by the master and replicas?
If not, could we add that?

The proposal itself:

The total order would not be global but per key.
Each node has a Debezium connector instance embedded that listens to the
operations happening (primary and replicas alike).
All of this processing happens async compared to the operation.
Per key, a log of operations is kept in memory (it contains the key, the
operation, the operation's unique id and an ack status).
If on the key owner, the operation is written by the Debezium connector
to Kafka when it has been acked (whatever that means is where I'm less
knowledgeable - too many bi-cache, tri-cache and quadri latency mixed in
my brain).
On a replica, the Kafka partition is read regularly to clear the
in-memory log of operations already stored in Kafka.
If the replica becomes the owner, it reads the Kafka partition to see
what operations are already in and writes the missing ones.

There are a few cool things:
- few to no changes in what Infinispan does
- no global ordering simplifies things and frankly is fine for most
  Debezium cases. In the end a global order could be defined after the
  fact (by not partitioning for example). But that's a pure downstream
  concern.
- everything is async compared to the Infinispan ops
- the in-memory log can remain in memory as it is protected by replicas
- the in-memory log is self cleaning thanks to the state in Kafka

Everyone wins. But it does require some sort of globally unique id per
operation to dedup.
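To make the per-key in-memory log concrete, a rough, untested sketch (names are invented; "acked" is deliberately left as whatever definition the thread settles on):

    import java.util.Deque;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedDeque;

    // Per-key in-memory log: each entry keeps the operation, its globally
    // unique id and whether it has been acked / flushed to Kafka.
    public class PerKeyOpLog {

        public static final class LogEntry {
            final UUID opId;        // globally unique operation id (for dedup)
            final Object operation; // e.g. the write command or a serialized form of it
            volatile boolean acked; // set once the write is considered confirmed

            LogEntry(UUID opId, Object operation) {
                this.opId = opId;
                this.operation = operation;
            }
        }

        private final Map<Object, Deque<LogEntry>> log = new ConcurrentHashMap<>();

        public void record(Object key, UUID opId, Object operation) {
            log.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>())
               .add(new LogEntry(opId, operation));
        }

        // Called when a replica sees the operation already stored in the Kafka
        // partition: everything up to and including that entry can be dropped.
        public void clearUpTo(Object key, UUID opId) {
            Deque<LogEntry> entries = log.get(key);
            if (entries == null) return;
            while (!entries.isEmpty() && !entries.peekFirst().opId.equals(opId)) {
                entries.pollFirst();
            }
            if (!entries.isEmpty()) entries.pollFirst(); // drop the matching entry too
        }
    }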

Emmanuel



Re: [infinispan-dev] Infinispan and change data capture

Randall Hauch
In reply to this post by Randall Hauch

Emmanuel and I were talking offline. Another approach entirely is to have each node (optionally) write the changes it is making as a leader directly to Kafka, meaning that Kafka becomes the event log and delivery mechanism. Upon failure of that node, the node that becomes the new leader would write any of its events not already written by the former leader, and then continue writing the new changes it is making as a leader. Thus, Infinispan would not be producing a single log with a total order of all changes to a cache (which doesn’t exist in Infinispan anyway), but rather a total order per key. (Kafka does this very nicely via topic partitions, where all changes for a given key always get written to the same partition, and each partition has a total order.) This approach may still need separate “commit” events to reflect how Infinispan currently works internally.

Obviously Infinispan wouldn’t require this to be done, but when it’s enabled it might provide a much simpler way of capturing the history of changes to the entries in an Infinispan cache. The HotRod client could consume the events directly from Kafka, or that could be left to a completely different client/utility. It does add a dependency on Kafka, but it means the Infinispan community doesn’t need to build much of the same functionality.
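As an illustration only (topic name and event format are made up; nothing below is an existing Infinispan integration), publishing each change keyed by the cache key is all it takes for Kafka to give a per-key total order:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Each leader publishes its changes keyed by the cache key, so all events
    // for a given key land in the same partition, in the order they were sent.
    public class ChangePublisher implements AutoCloseable {
        private final KafkaProducer<String, String> producer;
        private final String topic;

        public ChangePublisher(String bootstrapServers, String topic) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("acks", "all"); // wait for Kafka's own replicas before the event counts as logged
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            this.producer = new KafkaProducer<>(props);
            this.topic = topic;
        }

        public void publish(String cacheKey, String changeEventJson) {
            producer.send(new ProducerRecord<>(topic, cacheKey, changeEventJson));
        }

        @Override
        public void close() {
            producer.close();
        }
    }

A consumer (the HotRod client or anything else) then just reads the topic and relies on Kafka's per-partition ordering.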




Re: [infinispan-dev] Infinispan and change data capture

Radim Vansa
In reply to this post by Randall Hauch
On 12/09/2016 05:08 PM, Randall Hauch wrote:

>
>> On Dec 9, 2016, at 3:13 AM, Radim Vansa <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> On 12/08/2016 10:13 AM, Gustavo Fernandes wrote:
>>>
>>> I recently updated a proposal [1] based on several discussions we had
>>> in the past that is essentially about introducing an event storage
>>> mechanism (write ahead log) in order to improve reliability, failover
>>> and "replayability" for the remote listeners, any feedback greatly
>>> appreciated.
>>
>> Hi Gustavo,
>>
>> while I really like the pull-style architecture and reliable events, I
>> see some problematic parts here:
>>
>> 1) 'cache that would persist the events with a monotonically
>> increasing id'
>>
>> I assume that you mean globally (for all entries) monotonous. How will
>> you obtain such ID? Currently, commands have unique IDs that are
>> <Address, Long> where the number part is monotonous per node. That's
>> easy to achieve. But introducing globally monotonous counter means that
>> there will be a single contention point. (you can introduce another
>> contention points by adding backups, but this is probably unnecessary as
>> you can find out the last id from the indexed cache data). Per-segment
>> monotonous would be probably more scalabe, though that increases
>> complexity.
>
> It is complicated, but one way to do this is to have one “primary”
> node maintain the log and to have other replicate from it. The cluster
> does need to use consensus to agree which is the primary, and to know
> which secondary becomes the primary if the primary is failing.
> Consensus is not trivial, but JGroups Raft
> (http://belaban.github.io/jgroups-raft/) may be an option. However,
> this approach ensures that the replica logs are identical to the
> primary since they are simply recording the primary’s log as-is. Of
> course, another challenge is what happens during a failure of the
> primary log node, and can any transactions be performed/completed
> while the primary is unavailable.

I am not sure here whether you propose to store all events in a log on
one node, to use Raft for the monotonic counter, or just for some node
selection that will source the ids. In either case, you introduce a
bottleneck - Raft does not scale performance-wise, and neither does any
solution that uses a single node for every operation, no matter how
simple that operation is.

>
> Another option is to have each node maintain their own log, and to
> have an aggregator log that merges/combines the various logs into one.
> Not sure how feasible it is to merge logs by getting rid of duplicates
> and determining a total order, but if it is then it may have better
> fault tolerance characteristics.
>
> Of course, it is possible to have node-specific monotonic IDs. For
> example, MySQL Global Transaction IDs (GTIDs) use a unique UUID for
> each node, and then GTIDs consists of the node’s UUID plus a
> monotonically-increasing value (e.g.,
> “31fc48cd-ecd4-46ad-b0a9-f515fc9497c4:1001”). The transaction log
> contains a mix of GTIDs, and MySQL replication uses a “GTID set” to
> describe the ranges of transactions known by a server (e.g.,
> “u1:1-100,u2:1-10000,u3:3-5” where “u1”, “u2”, and “u3” are actually
> UUIDs). So, when a MySQL replica connects, it says “I know about this
> GTID set", and this tells the master where that client wants to start
> reading.

Yes, a similar node + monotonic id scheme is used both for transactional
and for non-transactional commands in Infinispan. I would say that in
complexity it's similar to per-segment counters, but so far we have a
constant number of segments as opposed to a varying number of nodes.

Node-specific monotonic ids do not give you a monotonic order of commits,
just unique ids: if node A does operations 1 and 2, this does not mean
that 1 will be committed before 2; 2 can be finished (and pushed to the
log) before 1. But I don't think you really need a monotonic sequence. In
Infinispan, all the nodes should push the events in the same order,
though, so the log will know where to start from if a client asks for
all messages after op 1 - as long as duplicates are properly filtered out.
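For illustration, that duplicate filtering on the consumer side could be as simple as a bounded window of recently seen command ids (a rough sketch; it assumes a retry shows up reasonably close to the original in the stream):

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    // Drops events whose unique command id was already seen, keeping only a
    // bounded window of recent ids.
    public class DedupFilter {
        private final Set<String> recentIds;

        public DedupFilter(int windowSize) {
            this.recentIds = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > windowSize;
                }
            });
        }

        // Returns true the first time an id is seen, false for duplicates.
        public boolean firstTimeSeen(String commandId) {
            return recentIds.add(commandId);
        }
    }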

>
>>
>> 2) 'The write to the event log would be async in order to not affect
>> normal data writes'
>>
>> Who should write to the cache?
>> a) originator - what if originator crashes (despite the change has been
>> added)? Besides, originator would have to do (async) RPC to primary
>> owner (which will be the primary owner of the event, too).
>> b) primary owner - with triangle, primary does not really know if the
>> change has been written on backup. Piggybacking that info won't be
>> trivial - we don't want to send another message explicitly. But even if
>> we get the confirmation, since the write to event cache is async, if the
>> primary owner crashes before replicating the event to backup, we lost
>> the event
>> c) all owners, but locally - that will require more complex
>> reconciliation if the event did really happen on all surviving nodes or
>> not. And backups could have some trouble to resolve order, too.
>>
>> IIUC clustered listeners are called from primary owner before the change
>> is really confirmed on backups (@Pedro correct me if I am wrong,
>> please), but for this reliable event cache you need higher level of
>> consistency.
>
> This could be handled by writing a confirmation or “commit” event to
> the log when the write is confirmed or the transaction is committed.
> Then, only those confirmed events/transactions would be exposed to
> client listeners. This requires some buffering, but this could be done
> in each HotRod client.

I would put this under "originator". So, if the node that writes the
"commit" event crashes, the data is changed (and consistent) in the
cluster but nobody will be notified about that.
Note that Infinispan does not guarantee that data being written by a
crashing node will end up consistent on all owners, because it is the
originator who retries the operation if one of the owners crashed (or
generally, when the topology changes during the command). So it's not
that bad a solution after all, if you're okay with missing an
effectively committed operation on a node crash.

>
>>
>> 3) The log will also have to filter out retried operations (based on
>> command ID - though this can be indexed, too). Though, I would prefer to
>> see per-event command-id log to deal with retries properly.
>
> IIUC, a “commit” event would work here, too.
>
>>
>> 4) Client should pull data, but I would keep push notifications that
>> 'something happened' (throttled on server). There could be use case for
>> rarely updated caches, and polling the servers would be excessive there.
>
> IMO the clients should poll, but if the server has nothing to return
> it blocks until there is something or until a timeout occurs. This
> makes it easy for clients and actually reduces network traffic
> compared to constantly polling.

I would say that a client waiting on a blocked connection is a push
(maybe there's a way to implement push over a TCP connection otherwise,
but I am not aware of it - please forgive my ignorance).

>
> BTW, a lot of this is replicating the functionality of Kafka, which is
> already quite mature and feature rich. It’s actually possible to
> *embed* Kafka to simplify operations, but I don’t think that’s
> recommended. And, it introduces a very complex codebase that would
> need to be supported.

I wouldn't use a complex third-party project on the same tier as
JGroups/Infinispan to implement basic functionality (which remote
listeners are), but for Debezium it could be a fit. Let's discuss your
Kafka-based proposal in the follow-up mail thread.

R.

>
>>
>> Radim
>>
>>>
>>>
>>> [1]
>>> https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvement-proposal
>>>
>>> Thanks,
>>> Gustavo
>>>
>>>
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [hidden email]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>> --
>> Radim Vansa <[hidden email] <mailto:[hidden email]>>
>> JBoss Performance Team
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> [hidden email] <mailto:[hidden email]>
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Infinispan and change data capture

Radim Vansa
In reply to this post by Emmanuel Bernard
On 12/09/2016 06:25 PM, Emmanuel Bernard wrote:
> Randall and I had a chat on $subject. Here is a proposal worth
> exploring as it is very lightweight on Infinispan's code.
>
> Does an operation have a unique id shared by the master and replicas?
> If not, could we add that?

Yes, each modification has a unique CommandInvocationId in non-tx
caches, and there are GlobalTransaction ids in tx caches.

>
> The proposal itself:
>
> The total order would not be global but per key.
> Each node has a Debezium connector instance embedded that listens to the
> operations happening (primary and replicas alike).
> All of this process is happening async compared to the operation.
> Per key, a log of operations is kept in memory (it contains the key, the
> operation, the operation unique id and a ack status.
> If on the key owner, the operation is written by the Debezium connector
> to Kafka when it has been acked (whatever that means is where I'm less
> knowledgable - too many bi-cache, tri-cache and quadri latency mixed in
> my brain).

And the ack is what you have to define. All Infinispan gives you is:

the operation was confirmed on the originator => all owners (primary +
backups) have stored the value.

If you send the ack from the originator to the primary, it could be lost
(when the originator crashes).
If you write to Kafka on the originator, you don't have any order, and
the update could be lost by crashing before somehow replicating to Kafka.
If you write to Kafka on the primary, you need the ack from all backups
(a minor technical difficulty), and if the primary crashes after it has
sent the update to all backups, the data is effectively modified but
Kafka is not. The originator has to detect the primary crashing in order
to retry - so the primary could probably only send the ack to the
originator after it gets an ack from all backups AND updates Kafka. But
this is exactly what Triangle eliminated. And you still have the problem
when the originator crashes as well, but at least you're resilient to a
single-node (primary) failure.

So you probably intend to forget any "acks" and, as soon as the primary
executes the write locally, just push it to Kafka, no matter the actual
"outcome" of the operation. E.g. with putIfAbsent there could be a
topology change during replication to a backup, which will cause the
operation to be retried (from the originator). In the meantime, there
would be another write, and the retried putIfAbsent will fail. You will
end up with one successful and one unsuccessful putIfAbsent in the log,
with the same ID.

> On a replica, the kafka partition is read regularly to clear the
> in-memory log from operations stored in Kafka
> If the replica becomes the owner, it reads the kafka partition to see
> what operations are already in and writes the missing ones.

A backup owner (becoming the primary owner) having the write locally
logged does not mean that the operation was successfully finished on
all owners.

>
> There are a few cool things:
> - few to no change in what Infinispan does
> - no global ordering simplifies things and frankly is fine for most
>    Debezium cases. In the end a global order could be defined after the
>    fact (by not partitioning for example). But that's a pure downstream
>    concern.
> - everything is async compared to the Infinispan ops
> - the in-memory log can remain in memory as it is protected by replicas
> - the in-memory log is self cleaning thanks to the state in Kafka
>
> Everyone wins. But it does require some sort of globally unique id per
> operation to dedup.

And a suitable definition, for Debezium, of whether the operation "happened" or not.

Radim

>
> Emmanuel


--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Infinispan and change data capture

Bela Ban
Could BiCache help here? It does establish a total order per key
(essentially established by the primary), and since multiple nodes (or
all of them) have the ordering information, modifications can always be
rolled forward...


--
Bela Ban, JGroups lead (http://www.jgroups.org)


Re: [infinispan-dev] Infinispan and change data capture

Gustavo Fernandes-2
In reply to this post by Radim Vansa
On Fri, Dec 9, 2016 at 9:13 AM, Radim Vansa <[hidden email]> wrote:
1) 'cache that would persist the events with a monotonically increasing id'

I assume that you mean globally (for all entries) monotonous. How will
you obtain such ID? Currently, commands have unique IDs that are
<Address, Long> where the number part is monotonous per node. That's
easy to achieve. But introducing globally monotonous counter means that
there will be a single contention point. (you can introduce another
contention points by adding backups, but this is probably unnecessary as
you can find out the last id from the indexed cache data). Per-segment
monotonous would be probably more scalabe, though that increases complexity.

Having it per segment would imply that only operations involving the same key would be ordered,
which is probably fine for most cases.

Could this order be affected during topology changes, though? From what I could observe, there is a small
window where there is more than one primary owner for a given key, due to the fact that the CH propagation
is not complete.
 

2) 'The write to the event log would be async in order to not affect
normal data writes'

Who should write to the cache?
a) originator - what if originator crashes (despite the change has been
added)? Besides, originator would have to do (async) RPC to primary
owner (which will be the primary owner of the event, too).
b) primary owner - with triangle, primary does not really know if the
change has been written on backup. Piggybacking that info won't be
trivial - we don't want to send another message explicitly. But even if
we get the confirmation, since the write to event cache is async, if the
primary owner crashes before replicating the event to backup, we lost
the event
c) all owners, but locally - that will require more complex
reconciliation if the event did really happen on all surviving nodes or
not. And backups could have some trouble to resolve order, too.

IIUC clustered listeners are called from primary owner before the change
is really confirmed on backups (@Pedro correct me if I am wrong,
please), but for this reliable event cache you need higher level of
consistency.

Async writes to a cache event log would not provide the best of guarantees, agreed.

OTOH, to have the writes done synchronously, it'd be hard to avoid extra RPCs.
Some can be prevented by using a KeyPartitioner similar to the one used by the AffinityIndexManager [1],
so that Segment(K) = Segment(KE), where K is the data key and KE the related event-log key.
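A sketch of that routing (not the actual AffinityIndexManager code; the SegmentMapper interface below is a stand-in so the example is self-contained - the real hook would be the cache's KeyPartitioner):

    // Routes a hypothetical event-log key to the same segment as the data key
    // it refers to, so the event and the entry share their owners.
    public class EventLogKeyRouting {

        public interface SegmentMapper {
            int getSegment(Object key);
        }

        // Invented event-log key: the data key plus a unique event id.
        public static final class EventLogKey {
            final Object dataKey;
            final long eventId;

            public EventLogKey(Object dataKey, long eventId) {
                this.dataKey = dataKey;
                this.eventId = eventId;
            }
        }

        private final SegmentMapper dataMapper;

        public EventLogKeyRouting(SegmentMapper dataMapper) {
            this.dataMapper = dataMapper;
        }

        // Segment(KE) == Segment(K)
        public int getSegment(Object key) {
            if (key instanceof EventLogKey) {
                return dataMapper.getSegment(((EventLogKey) key).dataKey);
            }
            return dataMapper.getSegment(key);
        }
    }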

Still RPCs would happen to replicate events, and as you pointed out, it is not trivial to piggyback this on the triangle'd
data RPCs.

I'm starting to think that an extra cache to store events is overkill.

An alternative could be to bypass the event log cache altogether and store the events in a Lucene index directly.
For this, a custom interceptor would write them to a local index when it's "safe" to do so, similar to what the QueryInterceptor
does with the Index.ALL flag, but writing only on the primary + backups, more like a hypothetical "Index.OWNER" setup.

This index does not necessarily need to be stored in extra caches (like the Infinispan directory does) but can use a local MMap-based
directory, making it OS-cache friendly. At event consumption time, though, broadcast queries to the primary owners would be
needed to collect the events on each of the nodes and merge them before serving them to the clients.



3) The log will also have to filter out retried operations (based on
command ID - though this can be indexed, too). Though, I would prefer to
see per-event command-id log to deal with retries properly.

4) Client should pull data, but I would keep push notifications that
'something happened' (throttled on server). There could be use case for
rarely updated caches, and polling the servers would be excessive there.

Radim


Makes sense: the push could be a notification that the event log changed, and the
client would then proceed with its normal pull.


>
>
> [1]
> https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvement-proposal
>
> Thanks,
> Gustavo
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Infinispan and change data capture

Gustavo Fernandes-2
In reply to this post by Radim Vansa


On Fri, Dec 9, 2016 at 9:13 AM, Radim Vansa <[hidden email]> wrote:
But introducing globally monotonous counter means that
there will be a single contention point.

I wonder if the trade-off of Flake Ids [1] could be acceptable for this use case.
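A rough sketch of the idea (a 64-bit toy version, not the 128-bit layout from the article): ids are roughly time-ordered across nodes and unique without any cluster-wide coordination, thanks to a per-node component and a per-millisecond sequence:

    import java.time.Clock;

    public class FlakeIdGenerator {

        // Custom epoch (2016-01-01T00:00:00Z) so the timestamp fits in 41 bits.
        private static final long EPOCH = 1451606400000L;

        private final long nodeId;   // e.g. derived from the node's address
        private final Clock clock = Clock.systemUTC();
        private long lastTimestamp = -1;
        private long sequence;

        public FlakeIdGenerator(long nodeId) {
            this.nodeId = nodeId & 0x3FF;      // keep 10 bits for the node id
        }

        public synchronized long nextId() {
            long now = clock.millis() - EPOCH;
            if (now == lastTimestamp) {
                sequence++;                    // several ids in the same millisecond
            } else {
                sequence = 0;
                lastTimestamp = now;
            }
            // 41 bits of time | 10 bits of node id | 12 bits of sequence
            return (now << 22) | (nodeId << 12) | (sequence & 0xFFF);
        }
    }

The ids are only k-ordered (roughly time-sorted), not a gap-free sequence, which is exactly the trade-off being discussed.

[1] http://yellerapp.com/posts/2015-02-09-flake-ids.html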



Re: [infinispan-dev] Infinispan and change data capture

Sanne Grinovero-3
I'm reading many clever suggestions for various aspects of such a
system, but I fail to see a clear definition of the goal.

From Randall's opening email I understand how MySQL does this, but
it's an example and I'm not sure which aspects are implementation
details of how MySQL happens to accomplish this, and which aspects are
requirements for the Infinispan enhancement proposals.

I remember a meeting with Manik Surtani, Jonathan Halliday and Mark
Little, whose outcome was a general agreement that Infinispan would
eventually need both tombstones and versioned entries, not just for
change data capture but to improve several other aspects;
unfortunately that was in December 2010 and never became a priority,
but the benefits are clear.
The complexities which have put off such plans lie in the "garbage
collection", i.e. the need to keep the history from growing without
bounds by dropping or compacting it.

So I'm definitely sold on the need to add a certain amount of history,
but we need to define how much of this history is expected to be held.

In short, what's the ultimate goal? I see two main but different
options intertwined:
 - allow to synchronize the *final state* of a replica
 - inspect specific changes

For the first case, it would be enough for us to be able to provide a
"squashed history" (as in Git squash), but we'd need to keep versioned
snapshots around and someone needs to tell you which ones can be
garbage collected.
For example when a key is: written, updated, updated, deleted since
the snapshot, we'll send only "deleted" as the intermediary states are
irrelevant.
For the second case, say the goal is to inspect fluctuations of price
variations of some item, then the intermediary states are not
irrelevant.

Which one will we want to solve? Both?
Personally, attempting to solve the second one seems like a huge pivot
for the project; the current data structures and storage are not
designed for this. I see the value of such benefits, but maybe
Infinispan is not the right tool for such a problem.

I'd prefer to focus on the benefits of the squashed history, and have
versioned entries soon, but even in that case we need to define which
versions need to be kept around, and how garbage collection /
vacuuming is handled.
This can be designed to be transparent to the client: handled as an
internal implementation detail which we use to improve performance of
Infinispan itself, or it can be exposed to clients to implement change
data capture, but in this case we need to track which clients are
still going to need an older snapshot; this has an impact as clients
would need to be registered, and has a significant impact on the
storage strategies.

Within Kafka the log compaction strategies are configurable; I have no
experience with Kafka but the docs seem to suggest that it's most
often used to provide the last known value of each key. That would be
doable for us, but Kafka optionally also allows wider-scope
retention strategies: can we agree that that would not be an option
with Infinispan? If not, these goals need to be clarified.

My main concern is that if we don't limit the scope of which
capabilities we want Infinispan to provide, it risks becoming the same
thing as Kafka, rather than integrating with it. I don't think we want
to pivot all our storage design towards efficiently handling
large-scale logs.

In short, I'd like to see an agreement that analyzing e.g.
fluctuations in stock prices would be a non-goal, if these are stored
as {"stock name", value} key/value pairs. One could still implement
such a thing by using a more sophisticated model, just don't expect to
be able to see all intermediary values each entry has ever had since
the key was first used.

# Commenting on specific proposals

On ID generation: I'd definitely go with IDs per segment rather than
IDs per key for the purpose of change data capture. If you go with
independent IDs per key, the client would need to keep track of each
version of each entry, which has an high overhead and degree of
complexity for the clients.
On the other hand, we already guarantee that each segment is managed
by a single primary owner, so attaching the "segment transaction id"
to each internal entry being changed can be implemented efficiently by
Infinispan.
Segment ownership handoff needs to be highly consistent during cluster
topology changes, but that requirement already exists; we'd just need
to make sure that this monotonic counter is included in the handoff of
primary ownership of a segment.
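
As a rough sketch of the idea (a hypothetical class, not existing
Infinispan code): the primary owner of each segment stamps changes
with a per-segment counter and hands the counter over together with
ownership:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class SegmentChangeSequencer {

        private final ConcurrentMap<Integer, AtomicLong> counters = new ConcurrentHashMap<>();

        // Called on the primary owner for every committed change in the segment.
        public long nextSequence(int segment) {
            return counters.computeIfAbsent(segment, s -> new AtomicLong()).incrementAndGet();
        }

        // Called when this node becomes primary owner of a segment: resume from
        // the last value handed over by the previous owner.
        public void takeOwnership(int segment, long lastSequenceFromPreviousOwner) {
            counters.put(segment, new AtomicLong(lastSequenceFromPreviousOwner));
        }

        // Called when handing the segment off, so the new owner can continue the sequence.
        public long snapshotForHandoff(int segment) {
            return counters.getOrDefault(segment, new AtomicLong()).get();
        }
    }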

Thanks,

Sanne





On 12 December 2016 at 13:58, Gustavo Fernandes <[hidden email]> wrote:

>
>
> On Fri, Dec 9, 2016 at 9:13 AM, Radim Vansa <[hidden email]> wrote:
>>
>> But introducing a globally monotonic counter means that
>> there will be a single contention point.
>
>
> I wonder if the trade-off of Flake Ids [1] could be acceptable for this use
> case.
>
> [1] http://yellerapp.com/posts/2015-02-09-flake-ids.html
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|

Re: [infinispan-dev] Infinispan and change data capture

Gustavo Fernandes-2
On Mon, Dec 12, 2016 at 3:13 PM, Sanne Grinovero <[hidden email]> wrote:
 
In short, what's the ultimate goal? I see two main but different
options intertwined:
 - allow synchronizing the *final state* of a replica

I'm assuming this case is already in place when using remote listeners and includeCurrentState=true and we are
discussing how to improve it, as described in the proposal in the wiki and on the 5th email of this thread.
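
For context, the remote listener mechanism referred to here is roughly
the Hot Rod client API below; a minimal sketch, with the listener class
name and printouts as placeholders:

    import org.infinispan.client.hotrod.annotation.ClientCacheEntryCreated;
    import org.infinispan.client.hotrod.annotation.ClientCacheEntryModified;
    import org.infinispan.client.hotrod.annotation.ClientCacheEntryRemoved;
    import org.infinispan.client.hotrod.annotation.ClientListener;
    import org.infinispan.client.hotrod.event.ClientCacheEntryCreatedEvent;
    import org.infinispan.client.hotrod.event.ClientCacheEntryModifiedEvent;
    import org.infinispan.client.hotrod.event.ClientCacheEntryRemovedEvent;

    // includeCurrentState = true replays the existing entries as "created"
    // events before live changes start flowing to this listener.
    @ClientListener(includeCurrentState = true)
    public class StateAndChangesListener {

        @ClientCacheEntryCreated
        public void created(ClientCacheEntryCreatedEvent<String> e) {
            System.out.println("created: " + e.getKey());
        }

        @ClientCacheEntryModified
        public void modified(ClientCacheEntryModifiedEvent<String> e) {
            System.out.println("modified: " + e.getKey());
        }

        @ClientCacheEntryRemoved
        public void removed(ClientCacheEntryRemovedEvent<String> e) {
            System.out.println("removed: " + e.getKey());
        }
    }

    // Registered with: remoteCache.addClientListener(new StateAndChangesListener());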
 
 - inspect specific changes

For the first case, it would be enough for us to provide a
"squashed history" (as in Git squash), but we'd need to keep versioned
snapshots around, and someone needs to decide which ones can be
garbage collected.
For example, when a key is written, updated, updated and then deleted
since the snapshot, we'd send only the "deleted" event, as the
intermediary states are irrelevant.
For the second case, say the goal is to inspect the fluctuations in
the price of some item; then the intermediary states are very much
relevant.

Which one will we want to solve? Both?
 

Looking at http://debezium.io/, it implies the second case.

"[...] Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates,
and deletes that other apps commit to your databases. [...] your apps can respond quickly and never miss an event,
even when things go wrong."

IMO the choice between squashed/full history, and even the retention time, is highly application specific. Deletes might
not even be involved; one may be interested in answering "what is the peak value of a certain key during the day?"

 
Personally, attempting to solve the second one seems like a huge
pivot for the project: the current data structures and storage are not
designed for this.


+1, as I wrote earlier about ditching the idea of event cache storage in favor of Lucene.

 
I see the value of such capabilities, but maybe
Infinispan is not the right tool for such a problem.

I'd prefer to focus on the benefits of the squashed history, and have
versioned entries soon, but even in that case we need to define which
versions need to be kept around, and how garbage collection /
vacuuming is handled.

Is that proposal written/recorded somewhere? It'd be interesting to know how a client interested in data
changes would consume those multi-versioned entries (push/pull with an offset? sorted/unsorted? tracking per client/per key/per version?),
as it seems there is some storage impedance as well.
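
Just to make those questions concrete, a purely hypothetical
pull-style interface; nothing like this exists in Infinispan today,
all names and shapes below are invented:

    import java.util.List;

    // Hypothetical: a client pulls versioned changes per segment, starting
    // from an offset that the client itself tracks and persists.
    public interface VersionedChangeReader<K, V> {

        final class VersionedEntry<K, V> {
            public final K key;
            public final V value;            // null for a delete
            public final long segmentOffset; // per-segment monotonic version
            public VersionedEntry(K key, V value, long segmentOffset) {
                this.key = key;
                this.value = value;
                this.segmentOffset = segmentOffset;
            }
        }

        // Return up to maxEntries changes of the given segment with an offset
        // strictly greater than fromOffset, in offset order.
        List<VersionedEntry<K, V>> poll(int segment, long fromOffset, int maxEntries);
    }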
 

In short, I'd like to see an agreement that analyzing e.g.
fluctuations in stock prices would be a non-goal, if these are stored
as {"stock name", value} key/value pairs. One could still implement
such a thing with a more sophisticated model; just don't expect to be
able to see every intermediary value each entry has ever had since the
key was first used.


Continuous Queries listen to key/value data using a query; should they not be expected to
see all the intermediary values when changes on the server cause an entry to start/stop matching
the query?
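
For reference, the Continuous Query API being discussed looks roughly
like the sketch below; the Stock type, the query string and the
remoteCache variable are assumptions, and the exact query DSL depends
on the client version:

    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.Search;
    import org.infinispan.query.api.continuous.ContinuousQuery;
    import org.infinispan.query.api.continuous.ContinuousQueryListener;
    import org.infinispan.query.dsl.Query;
    import org.infinispan.query.dsl.QueryFactory;

    public class StockWatcher {

        // Placeholder entity; in practice this would be a protobuf-mapped type,
        // and "sample.Stock" below is an assumed message name.
        public static class Stock {
            public String name;
            public double price;
        }

        public void watch(RemoteCache<String, Stock> remoteCache) {
            ContinuousQuery<String, Stock> cq = Search.getContinuousQuery(remoteCache);
            QueryFactory qf = Search.getQueryFactory(remoteCache);
            Query query = qf.create("FROM sample.Stock s WHERE s.price > 100");

            cq.addContinuousQueryListener(query, new ContinuousQueryListener<String, Stock>() {
                @Override
                public void resultJoining(String key, Stock value) {
                    // the entry started matching the query
                }
                @Override
                public void resultUpdated(String key, Stock value) {
                    // a matching entry changed but still matches
                }
                @Override
                public void resultLeaving(String key) {
                    // the entry stopped matching the query (or was removed)
                }
            });
        }
    }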

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|

Re: [infinispan-dev] Infinispan and change data capture

Radim Vansa
On 12/12/2016 06:56 PM, Gustavo Fernandes wrote:

> On Mon, Dec 12, 2016 at 3:13 PM, Sanne Grinovero <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     In short, what's the ultimate goal? I see two main but different
>     options intertwined:
>      - allow synchronizing the *final state* of a replica
>
>
> I'm assuming this case is already in place when using remote listeners
> and includeCurrentState=true and we are
> discussing how to improve it, as described in the proposal in the wiki
> and on the 5th email of this thread.

I don't think the guarantees for any listeners are explicitly stated
anywhere in the docs. There are two parts to it:

- ideal state: I assume that in the ideal state we don't want to miss
any committed operation, but we have to define "committed". We should
also mention that events can be received multiple times (we aim for
at-least-once semantics).
- current limitations: behaviour that does not match the ideal but
that we haven't been able to fix so far. Even [1] does not mention
listeners (and it would be outdated anyway).

[1]
https://github.com/infinispan/infinispan/wiki/Consistency-guarantees-in-Infinispan
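
On the at-least-once point: a consumer would then have to deduplicate
on its side, for example by remembering the highest version it has
already applied per key. A minimal sketch; the string key and numeric
version are assumptions, not an existing Infinispan API:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Events may arrive more than once; apply them idempotently by tracking
    // the highest version already applied for each key.
    public class IdempotentConsumer {

        private final Map<String, Long> appliedVersions = new ConcurrentHashMap<>();

        public void onEvent(String key, long version, Runnable applyChange) {
            appliedVersions.compute(key, (k, last) -> {
                if (last != null && version <= last) {
                    return last;        // duplicate or out-of-date delivery: skip
                }
                applyChange.run();      // apply the change once per version
                return version;
            });
        }
    }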

>      - inspect specific changes
>
>     For the first case, it would be enough for us to provide a
>     "squashed history" (as in Git squash), but we'd need to keep versioned
>     snapshots around, and someone needs to decide which ones can be
>     garbage collected.
>     For example, when a key is written, updated, updated and then deleted
>     since the snapshot, we'd send only the "deleted" event, as the
>     intermediary states are irrelevant.
>     For the second case, say the goal is to inspect the fluctuations in
>     the price of some item; then the intermediary states are very much
>     relevant.
>
>     Which one will we want to solve? Both?
>
>
> Looking at http://debezium.io/, it implies the second case.
>
> "[...] Start it up, point it at your databases, and your apps can
> start responding to all of the inserts, updates,
> and deletes that other apps commit to your databases. [...] your apps
> can respond quickly and never miss an event,
> even when things go wrong."
>
> IMO the choice between squashed/full history, and even the retention
> time, is highly application specific. Deletes might
> not even be involved; one may be interested in answering "what is the
> peak value of a certain key during the day?"
>
>     Personally, attempting to solve the second one seems like a huge
>     pivot for the project: the current data structures and storage are not
>     designed for this.
>
>
>
> +1, as I wrote earlier about ditching the idea of event cache storage
> in favor of Lucene.
>
>     I see the value of such capabilities, but maybe
>     Infinispan is not the right tool for such a problem.
>
>     I'd prefer to focus on the benefits of the squashed history, and have
>     versioned entries soon, but even in that case we need to define which
>     versions need to be kept around, and how garbage collection /
>     vacuuming is handled.
>
>
> Is that proposal written/recorded somewhere? It'd be interesting to
> know how a client interested in data
> changes would consume those multi-versioned entries (push/pull with an
> offset? sorted/unsorted? tracking per client/per key/per version?),
> as it seems there is some storage impedance as well.
>
>
>     In short, I'd like to see an agreement that analyzing e.g.
>     fluctuations in stock prices would be a non-goal, if these are stored
>     as {"stock name", value} key/value pairs. One could still implement
>     such a thing with a more sophisticated model; just don't expect to be
>     able to see every intermediary value each entry has ever had since the
>     key was first used.
>
>
>
> Continuous Queries listen to key/value data using a query; should
> they not be expected to
> see all the intermediary values when changes on the server cause an
> entry to start/stop matching
> the query?

In Konstanz we were discussing listeners with Dan and later with Adrian
and found out that CQ expects listeners to be much more reliable than
they actually are. So CQ is already broken and people can live with
that; theoretically Debezium could do the same: boldly claim that "your
apps can respond quickly and never miss an event, even when things go
wrong" and push the blame onto Infinispan :)

Radim



--
Radim Vansa <[hidden email]>
JBoss Performance Team

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Reply | Threaded
Open this post in threaded view
|

Re: [infinispan-dev] Infinispan and change data capture

Galder Zamarreño
In reply to this post by Gustavo Fernandes-2

--
Galder Zamarreño
Infinispan, Red Hat

> On 12 Dec 2016, at 14:58, Gustavo Fernandes <[hidden email]> wrote:
>
>
>
> On Fri, Dec 9, 2016 at 9:13 AM, Radim Vansa <[hidden email]> wrote:
> But introducing a globally monotonic counter means that
> there will be a single contention point.
>
> I wonder if the trade-off of Flake Ids [1] could be acceptable for this use case.

Not exactly the same, but org.infinispan.container.versioning.NumericVersionGenerator uses a view id + node rank + local counter combo for generating version numbers.
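
A rough sketch of that combination (illustrative only; the bit layout
below is invented and is not what NumericVersionGenerator actually
uses):

    import java.util.concurrent.atomic.AtomicLong;

    // View id and node rank in the high bits, a node-local counter in the low
    // bits: ids stay unique cluster-wide without a single contention point.
    public class RankedVersionGenerator {

        private final AtomicLong counter = new AtomicLong();
        private volatile long prefix;

        public void onViewChange(int viewId, int nodeRank) {
            prefix = (((long) viewId << 16) | (nodeRank & 0xFFFF)) << 32;
            counter.set(0);
        }

        public long nextVersion() {
            return prefix | (counter.incrementAndGet() & 0xFFFFFFFFL);
        }
    }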

>
> [1] http://yellerapp.com/posts/2015-02-09-flake-ids.html


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev