[infinispan-dev] Transactional consistency of query

Radim Vansa
Hi,

while working on ISPN-7806 I am wondering how queries should work with
transactions. Right now it seems that updates to the index are done during
either regular command execution (on the originator [A]) or the prepare
command on remote nodes [B]. Both of these make changes from rolled-back
transactions visible to queries, so they must be treated as bugs [C].

If we index the data after committing the transaction, there would be a
time window when we could see the updated entries but the index would
not reflect that. That might be an acceptable limitation if a query
merely misses some matching entity, but it's also possible that we
retrieve the query result key-set and then (after retrieving full
entities) we return something that does not match the query. One of the
reproducers for ISPN-7806 I've written [1] triggers a situation where
listing all Persons could return an Animal (a different entity type), so
I think that there's no validity post-check (though these reproducers
don't use transactions).

Therefore, I wonder if the index should contain only the key; maybe we
should store a unique version and invalidate the query if some of the
entries have changed.
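
A minimal sketch of the version idea above, assuming the index stores only the key plus the version it indexed. All names here (VersionCheckedQuery, Versioned, IndexHit) are hypothetical and not part of Infinispan; the cache is modelled as a plain Map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: none of these names exist in Infinispan. */
final class VersionCheckedQuery {

    /** A cache value together with a version that changes on every write. */
    record Versioned<V>(V value, long version) {}

    /** What the index would hold per hit: the key and the version it indexed. */
    record IndexHit<K>(K key, long indexedVersion) {}

    /**
     * Loads the entries for the given index hits. Returns Optional.empty() if
     * any entry is missing or was written after it was indexed, so the caller
     * can re-run the query instead of returning a non-matching entity.
     */
    static <K, V> Optional<List<V>> load(List<IndexHit<K>> hits, Map<K, Versioned<V>> cache) {
        List<V> results = new ArrayList<>(hits.size());
        for (IndexHit<K> hit : hits) {
            Versioned<V> current = cache.get(hit.key());
            if (current == null || current.version() != hit.indexedVersion()) {
                return Optional.empty(); // index is stale for this key: invalidate
            }
            results.add(current.value());
        }
        return Optional.of(results);
    }
}
```

An empty result means "the index was stale for at least one hit", not "no matches"; whether to retry or fail at that point is left open here.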

If we index the data before committing the transaction, a similar
situation could happen: the index will return keys for entities that
will match in the future, but the list actually returned will contain
stale entities.

What's the overall plan? Do we just accept inconsistencies? In that
case, please add a verbose statement in docs and point me to that.

And if I've misinterpreted something and raised the red flag in error,
please let me know.

Radim

[A] This seems to be a regression after moving towards async
interceptors - our impl of
org.hibernate.search.backend.TransactionContext is incorrectly bound to
TransactionManager. Then we seem to be running outside of a transaction
and are happy to index right away. The thread that executes the
interceptor handler also depends on ownership (due to remote
LockCommand execution), so I think that it does not fail the local-mode
tests.

[B] ... and it does so twice as a regression after ISPN-7840 but that's
easy to fix.

[C] Indexing in the prepare command was OK before ISPN-7840 with
pessimistic locking, which does not send the CommitCommand, but now that
the QI (QueryInterceptor) has been moved below the EWI
(EntryWrappingInterceptor) it means that we're indexing before storing
the actual values. Optimistic locking was not correct, though.

[1]
https://github.com/rvansa/infinispan/commit/1d62c9b84888c7ac21a9811213b5657aa44ff546


--
Radim Vansa <[hidden email]>
JBoss Performance Team

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Transactional consistency of query

Adrian Nistor
My feeling regarding this was to accept such inconsistencies, but maybe
I'm wrong. I've always regarded indexing as being async in general, even
though it did behave as if being sync in some not so rare circumstances,
which probably made people believe it is expected to be sync in general.
I'm curious what Sanne and Gustavo have in mind.

Please note that updating the index synchronously during tx commit was
always regarded as a performance bottleneck, so it was out of the
question. And that would not always work anyway; it all depends on the
underlying indexing technology. For example, when using Hibernate Search
(HS) with Elasticsearch you have to accept that Elasticsearch indexing
is always async.

And there might not be an index at all. It's very possible that the
query runs unindexed. In that case it will use distributed streams which
have their own transaction issues.

In the past we had some bugs where a matching entry was deleted/evicted
right before the search results were returned to the user, so loading of
those values failed silently. Those queries mistakenly returned
some unexpected nulls among other valid results. The fix was to just
filter out those nulls. We could enhance that to double-check that the
returned entry is indeed of the requested type, to also cover the issue
that you encountered.
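
The null filtering plus type check described above could look roughly like this. It is only a sketch: the cache is a plain Map, and ResultTypeFilter, Person and Animal are hypothetical stand-ins rather than Infinispan classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a post-check on loaded query results. */
final class ResultTypeFilter {

    // Stand-in entity types, for illustration only.
    record Person(String name) {}
    record Animal(String species) {}

    /**
     * Loads the values for the keys returned by the index and keeps only
     * those that still exist and are of the requested type.
     */
    static <K, T> List<T> loadOfType(List<K> keys, Map<K, ?> cache, Class<T> expectedType) {
        List<T> results = new ArrayList<>();
        for (K key : keys) {
            Object value = cache.get(key);
            // isInstance(null) is false, so deleted/evicted entries are dropped too.
            if (expectedType.isInstance(value)) {
                results.add(expectedType.cast(value));
            }
        }
        return results;
    }
}
```

This silently shrinks the result list, so a caller relying on an exact hit count or on pagination would still see the inconsistency.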

Adrian

On 07/28/2017 01:38 PM, Radim Vansa wrote:

> [...]

Re: [infinispan-dev] Transactional consistency of query

Radim Vansa
On 07/28/2017 02:59 PM, Adrian Nistor wrote:

> My feeling regarding this was to accept such inconsistencies, but
> maybe I'm wrong. I've always regarded indexing as being async in
> general, even though it did behave as if being sync in some not so
> rare circumstances, which probably made people believe it is expected
> to be sync in general. I'm curious what Sanne and Gustavo have in mind.
>
> Please note that updating the index synchronously during tx commit was
> always regarded as a performance bottleneck, so it was out of the
> question. And that would not always work anyway, it all depends on the
> underlying indexing technology. For example when using HS with elastic
> search you have to accept that elastic indexing is always async.

OK, queries being inherently async would be acceptable for me (as long
as we document it - preferably blogging about the limitations, too).
But async should mean that the result looks as if it had been computed
at some earlier point, maybe with the ordering mixed up a bit, but not
that it's inconsistent (e.g. returning entries that do not match the
criteria). Also, in case we store fields in the index and return a
projection, those values should not expose any non-committed data.

I guess that expecting a query in a transaction to reflect uncommitted
state would probably be too much :)

>
> And there might not be an index at all. It's very possible that the
> query runs unindexed. In that case it will use distributed streams
> which have their own transaction issues.

Yes; please leave non-indexed queries aside from this discussion.

>
> In the past we had some bugs were a matching entry was deleted/evicted
> right before the search results were returned to the user, so loading
> of those values failed in a silent way. Those queries mistakenly
> returned some unexpected nulls among other valid results. The fix was
> to just filter out those nulls. We could enhance that to double check
> that the returned entry is indeed of the requested type, to also cover
> the issue that you encountered.

It's not just the entity type; the criteria may be invalidated by any
field change. Would a full criteria check on the returned entities be too
expensive? Can you even check e.g. native queries against a provided set
of objects?
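
For illustration, a full criteria re-check might be sketched as below. This assumes the criteria can be expressed as a java.util.function.Predicate over the loaded value, which is exactly what may not hold for native queries; CriteriaRecheck is a hypothetical name and the cache is a plain Map.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: re-evaluate the query criteria on the loaded entities. */
final class CriteriaRecheck {

    /**
     * Loads the values for the keys returned by the index and keeps only
     * those that are still present and still satisfy the criteria.
     * The cost is one predicate evaluation per hit.
     */
    static <K, V> List<V> loadMatching(Collection<K> keys, Map<K, V> cache,
                                       Predicate<? super V> criteria) {
        List<V> results = new ArrayList<>();
        for (K key : keys) {
            V value = cache.get(key);
            if (value != null && criteria.test(value)) {
                results.add(value);
            }
        }
        return results;
    }
}
```

Such a check only removes false positives; entities that match now but are not yet in the index are still missed.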

Radim

Re: [infinispan-dev] Transactional consistency of query

Gustavo Fernandes-2
In reply to this post by Adrian Nistor
IMO, indexing should be eventually consistent, as this offers the best performance.

On tx-caches, although Lucene has hooks to be enlisted in a transaction [1], some backends (Elasticsearch) don't
expose this, and Hibernate Search by design doesn't make use of it. So currently we must deal with inconsistencies
after the fact: checking for nulls, mismatched types and so on.

[1] https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TwoPhaseCommit.html


Re: [infinispan-dev] Transactional consistency of query

Tristan Tarrant-2
Shouldn't we use an appropriate conflict resolution strategy for this, so
that in case of partitions we repair the index?

Tristan

On 7/31/17 10:41 AM, Gustavo Fernandes wrote:

> [...]

--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat

Re: [infinispan-dev] Transactional consistency of query

Radim Vansa
On 07/31/2017 11:12 AM, Tristan Tarrant wrote:
> Shouldn't we use an appropriate conflict resolution strategy for this so
> that in case of partitions we repair the index ?

This is not about eventual consistency in case of partitions, just
eventually publishing the change in the index after the transaction
completes.
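
As a toy model of "publish to the index only after the transaction completes": index updates are buffered per transaction, applied on commit and dropped on rollback. DeferredIndexer and its methods are hypothetical, the index is a plain Map, and nothing here reflects how Infinispan or Hibernate Search actually wire this.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: index updates become visible only after commit. */
final class DeferredIndexer<K, V> {

    // What queries see.
    private final Map<K, V> index = new ConcurrentHashMap<>();
    // Updates buffered per transaction; each buffer is assumed to be
    // touched by one thread at a time.
    private final Map<Object, Map<K, V>> pending = new ConcurrentHashMap<>();

    /** Records an index update without making it visible to queries. */
    void onWrite(Object tx, K key, V value) {
        pending.computeIfAbsent(tx, t -> new LinkedHashMap<>()).put(key, value);
    }

    /** Publishes the buffered updates; queries may lag until this runs. */
    void afterCommit(Object tx) {
        Map<K, V> updates = pending.remove(tx);
        if (updates != null) {
            index.putAll(updates);
        }
    }

    /** Drops the buffered updates, so rolled-back writes are never indexed. */
    void afterRollback(Object tx) {
        pending.remove(tx);
    }

    /** Read-only view of the index as seen by queries. */
    Map<K, V> visibleIndex() {
        return Collections.unmodifiableMap(index);
    }
}
```

The window between the data commit and afterCommit() is the eventual-consistency gap discussed in this thread; the sketch only guarantees that rolled-back writes never reach the index.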

Making the index consistent after a split brain (even with the DENY_ALL
policy some operations may end up in a half-complete state) is a completely
different issue and I think nobody has ever tried to deal with that.

R.

Re: [infinispan-dev] Transactional consistency of query

Adrian Nistor
In reply to this post by Gustavo Fernandes-2
Yup, I also meant 'eventually consistent' when saying such inconsistencies should be acceptable. At some point after transactions have been committed, topology changes have been handled (state transfer completed) and we have reached a steady state, we should see a consistent index when querying.

On 07/31/2017 11:41 AM, Gustavo Fernandes wrote:
IMO, indexing should be eventually consistent, as this offers the best performance.

On tx-caches, although Lucene has hooks to be enlisted in a transaction [1], some backends (elasticsearch) don't
expose this, and Hibernate Search by design doesn't make use of it. So currently we must deal with inconsistencies
after the fact: checking for nulls, mismatched types and so on.

[1] https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TwoPhaseCommit.html


On Fri, Jul 28, 2017 at 1:59 PM, Adrian Nistor <[hidden email]> wrote:
My feeling regarding this was to accept such inconsistencies, but maybe
I'm wrong. I've always regarded indexing as being async in general, even
though it did behave synchronously in some not-so-rare circumstances,
which probably led people to believe it is expected to be sync in general.
I'm curious what Sanne and Gustavo have in mind.

Please note that updating the index synchronously during tx commit was
always regarded as a performance bottleneck, so it was out of the
question. It would not always work anyway, since it all depends on the
underlying indexing technology. For example, when using Hibernate Search
with Elasticsearch you have to accept that Elasticsearch indexing is
always async.

And there might not be an index at all. It's very possible that the
query runs unindexed. In that case it will use distributed streams which
have their own transaction issues.

In the past we had some bugs where a matching entry was deleted/evicted
right before the search results were returned to the user, so loading of
those values failed in a silent way. Those queries mistakenly returned
some unexpected nulls among other valid results. The fix was to just
filter out those nulls. We could enhance that to double check that the
returned entry is indeed of the requested type, to also cover the issue
that you encountered.
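
To make the idea concrete, here is a minimal sketch of such a post-check in plain Java. It is only an illustration, not the actual Infinispan query code: the class QueryResultPostCheck, the method loadMatching, the Person/Animal classes and the use of a plain Map in place of the cache are all hypothetical names chosen for this example. It loads the values for the keys returned by the index and drops both nulls and values that are not of the requested type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from Infinispan.
final class QueryResultPostCheck {

    // Stand-ins for two indexed entity types.
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static final class Animal {
        final String name;
        Animal(String name) { this.name = name; }
    }

    // Loads the values for the keys the index returned and keeps only the
    // entries that still exist and are instances of the requested type.
    static <T> List<T> loadMatching(Map<String, Object> cache,
                                    List<String> keysFromIndex,
                                    Class<T> requestedType) {
        List<T> results = new ArrayList<>();
        for (String key : keysFromIndex) {
            Object value = cache.get(key);
            if (value == null) {
                continue; // entry was removed/evicted after the index lookup
            }
            if (!requestedType.isInstance(value)) {
                continue; // entry was overwritten with a different entity type
            }
            results.add(requestedType.cast(value));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("k1", new Person("Alice"));
        cache.put("k2", new Animal("Rex")); // was a Person when it was indexed
        // "k3" is still in the (stale) index but no longer in the cache.

        List<Person> persons =
            loadMatching(cache, Arrays.asList("k1", "k2", "k3"), Person.class);
        System.out.println(persons.size()); // prints 1
    }
}
```

Note that this only guards against missing entries and wrong types. An entry of the right type whose fields no longer match the query predicate would still be returned, so the result remains eventually consistent at best.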

Adrian

On 07/28/2017 01:38 PM, Radim Vansa wrote:
> Hi,
>
> while working on ISPN-7806 I am wondering how should queries work with
> transactions. Right now it seems that updates to index are done during
> either regular command execution (on originator [A]) or prepare command
> on remote nodes [B]. Both of these cause rolled-back transactions to be
> seen, so these must be treated as bugs [C].
>
> If we index the data after committing the transaction, there would be a
> time window when we could see the updated entries but the index would
> not reflect that. That might be acceptable limitation if a
> query-matching misses some entity, but it's also possible that we
> retrieve the query result key-set and then (after retrieving full
> entities) we return something that does not match the query. One of the
> reproducers for ISPN-7806 I've written [1] triggers a situation where
> listing all Persons could return Animal (different entity type), so I
> think that there's no validity post-check (though these reproducers
> don't use transactions).
>
> Therefore, I wonder if the index should contain only the key; maybe we
> should store an unique version and invalidate the query if some of the
> entries has changed.
>
> If we index the data before committing the transaction, similar
> situation could happen: the index will return keys for entities that
> will match in the future but the actually returned list will contain
> stale entities.
>
> What's the overall plan? Do we just accept inconsistencies? In that
> case, please add a verbose statement in docs and point me to that.
>
> And if I've misinterpreted something and raised the red flag in error,
> please let me know.
>
> Radim
>
> [A] This seems to be a regression after moving towards async
> interceptors - our impl of
> org.hibernate.search.backend.TransactionContext is incorrectly bound to
> TransactionManager. Then we seem to be running out of transaction and
> are happy to index it right away. The thread that executes the
> interceptor handler is also dependent on ownership (due to remote
> LockCommand execution), so I think that it does not fail the local-mode
> tests.
>
> [B] ... and it does so twice as a regression after ISPN-7840 but that's
> easy to fix.
>
> [C] Indexing in prepare command was OK before ISPN-7840 with pessimistic
> locking which does not send the CommitCommand, but now that the QI has
> been moved below EWI it means that we're indexing before storing the
> actual values. Optimistic locking was not correct, though.
>
> [1]
> https://github.com/rvansa/infinispan/commit/1d62c9b84888c7ac21a9811213b5657aa44ff546
>
>
