[infinispan-dev] Major version cleaning

28 messages

Re: [infinispan-dev] Major version cleaning

Tristan Tarrant-2


On 21/02/17 17:59, Sanne Grinovero wrote:

> On 21 February 2017 at 16:16, Tristan Tarrant <[hidden email]> wrote:
>> On 21/02/17 16:29, Sanne Grinovero wrote:
>>>> You haven't explained what "flush" means. Since you separate that from
>>>> atomicity/consistency, I assume that batches on non-tx cache are just
>>>> ordered putOrRemoveAll operations, immediately visible on flush without
>>>> any atomicity.
>>
>> I assume that in Sanne's idea, ordering in a batch doesn't matter, aside
>> from operations on the same key. Having ordering in there would for
>> example not allow us to parallelize by segment.
>>
>>> So I want to write a first chunk, in our code that looks like:
>>>
>>> startBatch
>>> put(chunk1/A, [some large value])
>>> put(chunk1/B, [some small metadata])
>>> put(chunk1/C, [some small metadata])
>>> endBatch
>>> There is no reason to use a transaction, in fact we had to disable
>>> transactions as some of these entries could be large.
>>> There also is no reason for the batch, other than optimising the latency.
>> Let me summarize to see if we have the requirements for a useful
>> batching system (which is sort of patterned on the JDBC statement batching):
>>
>> - a batch is not an atomic operation, i.e. it is not backed by a transaction
>> - it can be wrapped in a transaction if needed
>> - batches cannot be nested
>> - batches only involve unconditional write operations (put, putAll, remove)
>> - ordering of operations within a batch is unimportant aside from
>> modifications to the same key where we apply "last one wins"
>> - when a batch is "flushed" (i.e. endBatch is invoked) the ops are
>> grouped by segment and sent to the appropriate owner for processing,
>> potentially in parallel
>>
>> As Radim has called it, this is essentially a putOrRemoveAll op (with an
>> async counterpart).
>>
>> Is that summary correct ?
>
> Yes! Thanks
>
> BTW I don't fully subscribe that conditional operations shouldn't be
> considered, but I'm happy to keep that pandora box for another time.
> Remember optimistic transactions are useful in some cases.

Why I thought that putIfAbsent might not be appropriate, I don't know. It
makes perfect sense. Even the replace ops.
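The non-tx flush semantics summarized above (last one wins per key, then group by segment, dispatch per owner) can be sketched in plain Java. All names here are illustrative stand-ins, not Infinispan internals, and the segment function is a toy hash:

```java
import java.util.*;

public class BatchFlushSketch {
    static final int NUM_SEGMENTS = 4;

    // Stand-in for the consistent hash's key-to-segment mapping.
    static int segment(String key) {
        return Math.floorMod(key.hashCode(), NUM_SEGMENTS);
    }

    // A batch is an ordered list of writes; a null value means "remove"
    // (hence "putOrRemoveAll"). Flushing deduplicates per key with
    // last-one-wins, then groups by segment so each group can be sent to
    // its primary owner, potentially in parallel.
    static Map<Integer, Map<String, String>> flush(List<Map.Entry<String, String>> batch) {
        Map<String, String> lastWins = new HashMap<>();
        for (Map.Entry<String, String> op : batch) {
            lastWins.put(op.getKey(), op.getValue());
        }
        Map<Integer, Map<String, String>> bySegment = new HashMap<>();
        for (Map.Entry<String, String> e : lastWins.entrySet()) {
            bySegment.computeIfAbsent(segment(e.getKey()), s -> new HashMap<>())
                     .put(e.getKey(), e.getValue());
        }
        return bySegment;
    }
}
```

Note that because ordering across distinct keys is irrelevant, nothing here preserves the original batch order except within a single key.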

--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Major version cleaning

Radim Vansa
In reply to this post by Tristan Tarrant-2
On 02/21/2017 05:16 PM, Tristan Tarrant wrote:

> Let me summarize to see if we have the requirements for a useful
> batching system (which is sort of patterned on the JDBC statement batching):
>
> - a batch is not an atomic operation, i.e. it is not backed by a transaction
> - it can be wrapped in a transaction if needed
> - batches cannot be nested
> - batches only involve unconditional write operations (put, putAll, remove)
> - ordering of operations within a batch is unimportant aside from
> modifications to the same key where we apply "last one wins"
> - when a batch is "flushed" (i.e. endBatch is invoked) the ops are
> grouped by segment and sent to the appropriate owner for processing,
> potentially in parallel
>
> As Radim has called it, this is essentially a putOrRemoveAll op (with an
> async counterpart).

It is putOrRemoveAll when applied to a non-tx cache, and actually
implementing that shouldn't be complex. However, when transactions come
into play, it is different, because Sanne wants us to remove the
modifications in a completed batch from the local transactional invocation
context and 'cache' them on the owners. Since reads have to be
transactionally consistent, we need to inspect the transaction on the
remote nodes (remote repeatable read).

Sanne's request makes sense to me. However, as the current implementation
creates the false impression that it works like JDBC batches while
it's nothing but crippled JTA, and as I don't see anyone shouting "I'll
implement that, next week it's done!", I second deprecating/removing the
API for the time being.

I don't find the current API ideal either, as it depends on thread
locals (as JTA does, but...) and does not seem useful enough
to me. I would prefer:

interface BatchingCache<K, V> {
     Batch<K, V> start();
}

@NotThreadSafe
interface Batch<K, V> {
     void put(K key, V value);
     ...
     void execute();
     void drop(); // maybe not needed
}
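A hypothetical in-memory rendering of this sketch, just to illustrate the intended semantics (writes buffered locally, nothing visible until execute(), no thread-locals); all class names below are made up:

```java
import java.util.*;

// Illustrative in-memory take on the Batch API sketched above: the batch
// buffers writes locally and only touches the cache when execute() is
// called; drop() discards the buffered writes.
class SketchCache<K, V> {
    final Map<K, V> data = new HashMap<>();

    Batch<K, V> start() {
        return new Batch<>(this);
    }

    static class Batch<K, V> { // not thread-safe, as in the sketch
        private final SketchCache<K, V> cache;
        private final Map<K, V> buffered = new LinkedHashMap<>();

        Batch(SketchCache<K, V> cache) { this.cache = cache; }

        void put(K key, V value) { buffered.put(key, value); }

        void execute() { cache.data.putAll(buffered); buffered.clear(); }

        void drop() { buffered.clear(); }
    }
}
```

Because the batch is an explicit object rather than thread-local state, a caller can hold several independent batches at once, or hand one off between threads (with external synchronization).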

Radim



--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Major version cleaning

Dan Berindei
But doesn't the functional API's evalMany() provide something very
close to the API you're suggesting?

Dan


On Tue, Feb 21, 2017 at 7:55 PM, Radim Vansa <[hidden email]> wrote:

>
> It is putOrRemoveAll when applied on a non-tx cache, and actually
> implementing that shouldn't be complex. However, when transactions come
> into play, it is different, because Sanne wants us to remove the
> modifications in completed batch from the local transactional invocation
> context and 'cache' them on the owners. Since reads have to be
> transactionally consistent, we need to inspect the transaction on the
> remote nodes (remote repeatable read).
>
> Sanne's request makes sense to me. However as the current implementation
> is providing a false assumption that it could work as JDBC batches while
> it's nothing but crippled JTA, and as I don't see anyone shouting "I'll
> implement that, next week it's done!", I second deprecating/removing the
> API for the time being.
>

Exactly my thoughts: it's definitely not an unreasonable request, but
it would require a lot of work to implement correctly.


> I don't find the current API ideal either, as it depends on thread
> locals (JTA does as well, but...) while it does not seem useful enough
> to me. I would prefer
>
> interface BatchingCache {
>      Batch start();
> }
>
> @NotThreadSafe
> interface Batch {
>      void put(K k, V value);
>      ...
>      void execute();
>      void drop(); // maybe not needed
> }
>

I was hoping the functional API's evalMany() would be good enough, but
I see now that it replicates the function argument (which holds all
the values) everywhere. So putAll() is still more efficient, unless
using groups to make sure all keys in the batch have the same owners.
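The "groups" idea mentioned above can be sketched as follows: if the segment is derived from a group (say, the chunk prefix) instead of the whole key, all keys of one batch land on the same owners. Infinispan's grouping API works on this principle; the prefix-extraction logic below is purely illustrative:

```java
import java.util.*;

// Sketch of key co-location via groups: the segment is computed from the
// "group" (here, the chunk prefix before '/') rather than the whole key,
// so all entries of one chunk map to the same segment and hence the same
// owners. The group-extraction rule is made up for this example.
public class GroupingSketch {
    static final int NUM_SEGMENTS = 256;

    static String group(String key) {
        int slash = key.indexOf('/');
        return slash < 0 ? key : key.substring(0, slash);
    }

    static int segment(String key) {
        // Hash the group, not the full key.
        return Math.floorMod(group(key).hashCode(), NUM_SEGMENTS);
    }
}
```

With this, the chunk1/A, chunk1/B, chunk1/C writes from Sanne's example all go to one owner set, so a batch for one chunk becomes a single remote call.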

Dan

Re: [infinispan-dev] Major version cleaning

Radim Vansa
On 02/21/2017 07:14 PM, Dan Berindei wrote:

> But doesn't the functional API's evalMany() provide something very
> close to the API you're suggesting?
>
> Dan
>
> I was hoping the functional API's evalMany() would be good enough, but
> I see now that it replicates the function argument (which holds all
> the values) everywhere. So putAll() is still more efficient, unless
> using groups to make sure all keys in the batch have the same owners.

Uh, doesn't putAll do the same? Is the modifications list isolated per
target segment when replicating?

R.



--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Major version cleaning

Dan Berindei
On Tue, Feb 21, 2017 at 8:28 PM, Radim Vansa <[hidden email]> wrote:

> On 02/21/2017 07:14 PM, Dan Berindei wrote:
>> I was hoping the functional API's evalMany() would be good enough, but
>> I see now that it replicates the function argument (which holds all
>> the values) everywhere. So putAll() is still more efficient, unless
>> using groups to make sure all keys in the batch have the same owners.
>
> Uh, doesn't putAll do the same? Is the modifications list isolated per
> target segment when replicating?
>

I missed the evalMany(Map, BiFunction) overload yesterday, and I
assume WriteOnlyMap.evalMany(Map,
MarshallableFunctions.setValueConsumer()) would behave exactly the
same as putAll(Map). Of course, it's not better either, as you still
have to create the map beforehand...
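The claimed equivalence can be modeled with plain collections; this only mimics the per-entry shape of WriteOnlyMap.evalMany(), not the real Infinispan API, and the WriteView interface is a made-up stand-in for the write-only entry view:

```java
import java.util.*;
import java.util.function.BiConsumer;

public class EvalManySketch {
    // Minimal stand-in for a write-only entry view: the function can only
    // write the entry's value.
    interface WriteView<V> { void set(V value); }

    // Models evalMany(Map, f): f is applied once per argument-map entry,
    // with the matching cache entry's write view.
    static <K, V> void evalMany(Map<K, V> args, Map<K, V> cache,
                                BiConsumer<V, WriteView<V>> f) {
        args.forEach((k, v) -> f.accept(v, value -> cache.put(k, value)));
    }

    // The setValueConsumer analogue: just store the argument in the entry,
    // which is exactly what putAll does per entry.
    static <V> BiConsumer<V, EvalManySketch.WriteView<V>> setValueConsumer() {
        return (v, view) -> view.set(v);
    }
}
```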

OTOH I'm pretty sure you're the one who wrote the code to split the
input keys/values by target segments both for putAll() and for
evalMany() in NonTxDistributionInterceptor :)

Dan

Re: [infinispan-dev] Major version cleaning

Radim Vansa
On 02/22/2017 09:11 AM, Dan Berindei wrote:

> On Tue, Feb 21, 2017 at 8:28 PM, Radim Vansa <[hidden email]> wrote:
>> Uh, doesn't putAll do the same? Is the modifications list isolated per
>> target segment when replicating?
>>
> I missed the evalMany(Map, BiFunction) overload yesterday, and I
> assume WriteOnlyMap.evalMany(Map,
> MarshallableFunctions.setValueConsumer()) would behave exactly the
> same as putAll(Map). Of course, it's not better either, as you still
> have to create the map beforehand...
>
> OTOH I'm pretty sure you're the one who wrote the code to split the
> input keys/values by target segments both for putAll() and for
> evalMany() in NonTxDistributionInterceptor :)

I was referring to "I see now that it replicates the function argument
(which holds all the values) everywhere. So putAll() is still more
efficient." And besides sending the setValue operation along, I don't
see what you mean.

* In non-tx mode, evalMany() wraps the entries using
ReadOnlySegmentAwareMap, which filters out other nodes' segments during
marshalling. putAll() does exactly the same.
* In tx mode, both commands are added to the modifications list, and this
is sent (in the PrepareCommand) to all participating nodes, including
entries that are not owned by a given node (that leaves room for
optimization).
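A rough model of the segment filtering described in the first bullet, with made-up names: when a command is marshalled for a given target node, only the entries in that node's segments are written out.

```java
import java.util.*;
import java.util.stream.Collectors;

public class SegmentFilterSketch {
    static final int NUM_SEGMENTS = 4;

    // Toy stand-in for the consistent hash's key-to-segment mapping.
    static int segment(String key) {
        return Math.floorMod(key.hashCode(), NUM_SEGMENTS);
    }

    // Models what a segment-aware map view does at marshalling time: for a
    // target node, only entries whose segments that node owns are serialized.
    static Map<String, String> filterForNode(Map<String, String> entries,
                                             Set<Integer> ownedSegments) {
        return entries.entrySet().stream()
                .filter(e -> ownedSegments.contains(segment(e.getKey())))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```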

Anyway, while evalMany() would be sufficient for non-transactional mode
(and the batching API would be only a convenience), it is not enough in
tx mode, as it does not wipe out non-local modifications/reads on the
originator during the flush.
When implementing tx functional commands, I considered keeping
modifications on the remote node during the transaction (before the
prepare phase) for repeatable functional reads, but in the end I went
with sending all modifications for a given entry along with the read.

Radim





--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Major version cleaning

Radim Vansa
On 02/22/2017 09:53 AM, Radim Vansa wrote:

> I was referring to "I see now that it replicates the function argument
> (which holds all the values) everywhere. So putAll() is still more
> efficient." And besides sending the setValue operation along, I don't
> see what you mean.
>
> * In non-tx mode, evalMany() wraps the entries using
> ReadOnlySegmentAwareMap which filters segments for given node during
> marshalling. putAll does exactly the same.
> * In tx mode, both commands are added to modifications list and this is
> sent (in PrepareCommand) to all participating nodes, including entries
> that are not owned by a given node. (that's a space for optimizations).
>
> Anyway, while evalMany would be sufficient for non-transactional mode
> (and batching API would be only a convenience), it is not enough in tx
> mode as it does not wipe out non-local modifications/reads on originator
> during the flush.
> When implementing tx functional commands, I was considering keeping
> modifications on remote node during the transaction (before the prepare
> phase) for repeatable functional reads, but in the end I went with
> sending all modifications for a given entry along with the read.

Silly me :-/ When reviewing the code, I realized that non-functional
reads don't send these modifications, which is a bug [1].

R.

[1] https://issues.jboss.org/browse/ISPN-7505




--
Radim Vansa <[hidden email]>
JBoss Performance Team


Re: [infinispan-dev] Major version cleaning

Dan Berindei
In reply to this post by Radim Vansa
On Wed, Feb 22, 2017 at 10:53 AM, Radim Vansa <[hidden email]> wrote:

> On 02/22/2017 09:11 AM, Dan Berindei wrote:
>> On Tue, Feb 21, 2017 at 8:28 PM, Radim Vansa <[hidden email]> wrote:
>>> On 02/21/2017 07:14 PM, Dan Berindei wrote:
>>>> But doesn't the functional API's evalMany() provide something very
>>>> close to the API you're suggesting?
>>>>
>>>> Dan
>>>>
>>>>
>>>> On Tue, Feb 21, 2017 at 7:55 PM, Radim Vansa <[hidden email]> wrote:
>>>>> On 02/21/2017 05:16 PM, Tristan Tarrant wrote:
>>>>>> On 21/02/17 16:29, Sanne Grinovero wrote:
>>>>>>>> You haven't explained what "flush" means. Since you separate that from
>>>>>>>> atomicity/consistency, I assume that batches on non-tx cache are just
>>>>>>>> ordered putOrRemoveAll operations, immediately visible on flush without
>>>>>>>> any atomicity.
>>>>>> I assume that in Sanne's idea, ordering in a batch doesn't matter, aside
>>>>>> from operations on the same key. Having ordering in there would for
>>>>>> example not allow us to parallelize by segment.
>>>>>>
>>>>>>> So I want to write a first chunk, in our code that looks like:
>>>>>>>
>>>>>>> startBatch
>>>>>>> put(chunk1/A, [some large value])
>>>>>>> put(chunk1/B, [some small metadata])
>>>>>>> put(chunk1/C, [some small metadata])
>>>>>>> endBatch
>>>>>>> There is no reason to use a transaction, in fact we had to disable
>>>>>>> transactions as some of these entries could be large.
>>>>>>> There also is no reason for the batch, other than optimising the latency.
>>>>>> Let me summarize to see if we have the requirements for a useful
>>>>>> batching system (which is sort of patterned on the JDBC statement batching):
>>>>>>
>>>>>> - a batch is not an atomic operation, i.e. it is not backed by a transaction
>>>>>> - it can be wrapped in a transaction if needed
>>>>>> - batches cannot be nested
>>>>>> - batches only involve unconditional write operations (put, putAll, remove)
>>>>>> - ordering of operations within a batch is unimportant aside from
>>>>>> modifications to the same key where we apply "last one wins"
>>>>>> - when a batch is "flushed" (i.e. endBatch is invoked) the ops are
>>>>>> grouped by segment and sent to the appropriate owner for processing,
>>>>>> potentially in parallel
>>>>>>
>>>>>> As Radim has called it, this is essentially a putOrRemoveAll op (with an
>>>>>> async counterpart).
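[Editorial sketch: the flush semantics summarized above — unordered except for "last one wins" per key, then grouped by segment for the owners — can be modeled in a few lines. The `Op` type and `segmentOf` function are illustrative assumptions, not Infinispan APIs.]

```java
import java.util.*;

// Sketch of flushing a non-atomic batch: ops are deduplicated per key
// ("last one wins"), then grouped by segment so each group can be shipped
// to its owner, potentially in parallel. Op/segmentOf are illustrative.
public class BatchFlushSketch {
    record Op(String key, String value) {}          // value == null means remove

    static int segmentOf(String key, int numSegments) {
        return Math.floorMod(key.hashCode(), numSegments);
    }

    static Map<Integer, Map<String, String>> flush(List<Op> batch, int numSegments) {
        // Last write wins: later ops on the same key replace earlier ones.
        Map<String, String> latest = new LinkedHashMap<>();
        for (Op op : batch) {
            latest.put(op.key(), op.value());
        }
        // Group the surviving ops by segment for per-owner dispatch.
        Map<Integer, Map<String, String>> bySegment = new HashMap<>();
        latest.forEach((k, v) ->
            bySegment.computeIfAbsent(segmentOf(k, numSegments), s -> new HashMap<>())
                     .put(k, v));
        return bySegment;
    }

    public static void main(String[] args) {
        List<Op> batch = List.of(new Op("a", "1"), new Op("b", "2"), new Op("a", "3"));
        Map<Integer, Map<String, String>> groups = flush(batch, 4);
        // "a" keeps only its last value.
        if (!"3".equals(groups.get(segmentOf("a", 4)).get("a")))
            throw new AssertionError("expected last-one-wins");
        System.out.println("ok");
    }
}
```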
>>>>> It is putOrRemoveAll when applied on a non-tx cache, and actually
>>>>> implementing that shouldn't be complex. However, when transactions come
>>>>> into play, it is different, because Sanne wants us to remove the
>>>>> modifications in completed batch from the local transactional invocation
>>>>> context and 'cache' them on the owners. Since reads have to be
>>>>> transactionally consistent, we need to inspect the transaction on the
>>>>> remote nodes (remote repeatable read).
>>>>>
>>>>> Sanne's request makes sense to me. However, as the current implementation
>>>>> gives the false impression that it can work like JDBC batches while it's
>>>>> nothing but crippled JTA, and as I don't see anyone shouting "I'll
>>>>> implement that, next week it's done!", I second deprecating/removing the
>>>>> API for the time being.
>>>>>
>>>> Exactly my thoughts, it's definitely not an unreasonable request, but
>>>> it would require a lot of work to implement correctly.
>>>>
>>>>
>>>>> I don't find the current API ideal either, as it depends on thread
>>>>> locals (JTA does as well, but...) while it does not seem useful enough
>>>>> to me. I would prefer
>>>>>
>>>>> interface BatchingCache {
>>>>>        Batch start();
>>>>> }
>>>>>
>>>>> @NotThreadSafe
>>>>> interface Batch {
>>>>>        void put(K k, V value);
>>>>>        ...
>>>>>        void execute();
>>>>>        void drop(); // maybe not needed
>>>>> }
>>>>>
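[Editorial sketch: a minimal in-memory take on the `Batch` interface proposed above. The names come from the snippet; the HashMap-backed implementation and the null-value-means-remove convention are purely illustrative.]

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory model of the proposed Batch API: writes are buffered
// locally and only become visible in the backing map on execute().
// Deliberately not thread-safe, matching the @NotThreadSafe intent.
public class BatchSketch {
    interface Batch<K, V> {
        void put(K k, V v);
        void remove(K k);
        void execute();   // flush buffered ops to the cache
        void drop();      // discard buffered ops
    }

    static <K, V> Batch<K, V> start(Map<K, V> backing) {
        return new Batch<>() {
            private final Map<K, V> buffered = new HashMap<>(); // null value == remove

            public void put(K k, V v) { buffered.put(k, v); }
            public void remove(K k) { buffered.put(k, null); }
            public void execute() {
                buffered.forEach((k, v) -> {
                    if (v == null) backing.remove(k); else backing.put(k, v);
                });
                buffered.clear();
            }
            public void drop() { buffered.clear(); }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        Batch<String, String> b = start(cache);
        b.put("chunk1/A", "large value");
        b.put("chunk1/B", "metadata");
        if (!cache.isEmpty()) throw new AssertionError("nothing visible before execute");
        b.execute();
        if (!"metadata".equals(cache.get("chunk1/B"))) throw new AssertionError();
        System.out.println("ok");
    }
}
```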
>>>> I was hoping the functional API's evalMany() would be good enough, but
>>>> I see now that it replicates the function argument (which holds all
>>>> the values) everywhere. So putAll() is still more efficient, unless
>>>> using groups to make sure all keys in the batch have the same owners.
>>> Uh, doesn't putAll do the same? Is the modifications list isolated per
>>> target segment when replicating?
>>>
>> I missed the evalMany(Map, BiFunction) overload yesterday, and I
>> assume WriteOnlyMap.evalMany(Map,
>> MarshallableFunctions.setValueConsumer()) would behave exactly the
>> same as putAll(Map). Of course, it's not better either, as you still
>> have to create the map beforehand...
>>
>> OTOH I'm pretty sure you're the one who wrote the code to split the
>> input keys/values by target segments both for putAll() and for
>> evalMany() in NonTxDistributionInterceptor :)
>
> I was referring to "I see now that it replicates the function argument
> (which holds all the values) everywhere. So putAll() is still more
> efficient." And besides sending the setValue operation along, I don't
> see what you mean.
>
> * In non-tx mode, evalMany() wraps the entries in a
> ReadOnlySegmentAwareMap, which filters segments for a given node during
> marshalling. putAll does exactly the same.
> * In tx mode, both commands are added to the modifications list and this is
> sent (in PrepareCommand) to all participating nodes, including entries
> that are not owned by a given node (that leaves room for optimization).
>
> Anyway, while evalMany would be sufficient for non-transactional mode
> (and batching API would be only a convenience), it is not enough in tx
> mode as it does not wipe out non-local modifications/reads on originator
> during the flush.

I should have clarified that I was only talking about non-tx. With
transactions, the owners can change between the write and the prepare
(and the prepare can also be retried because of a topology change), so
it should be the PrepareCommand's job to filter updates by
destination.
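[Editorial sketch: Dan's point is that owners can change between the write and a (possibly retried) prepare, so the originator must re-filter modifications against the current topology each time the prepare is sent. The `Mod` type and the key-to-owners map below are illustrative assumptions, not Infinispan types.]

```java
import java.util.*;

// Sketch: filter the transaction's modifications by destination at prepare
// time rather than at write time, so a retried prepare after a topology
// change automatically targets the new owners.
public class PrepareFilterSketch {
    record Mod(String key, String value) {}

    static Map<String, List<Mod>> prepare(List<Mod> modifications,
                                          Map<String, List<String>> ownersByKey) {
        Map<String, List<Mod>> perNode = new HashMap<>();
        for (Mod m : modifications) {
            for (String owner : ownersByKey.getOrDefault(m.key(), List.of())) {
                perNode.computeIfAbsent(owner, n -> new ArrayList<>()).add(m);
            }
        }
        return perNode;
    }

    public static void main(String[] args) {
        List<Mod> mods = List.of(new Mod("a", "1"), new Mod("b", "2"));
        // Topology at prepare time: node1 owns a and b, node2 owns b only.
        Map<String, List<String>> topology =
            Map.of("a", List.of("node1"), "b", List.of("node1", "node2"));
        Map<String, List<Mod>> perNode = prepare(mods, topology);
        if (perNode.get("node1").size() != 2) throw new AssertionError();
        if (perNode.get("node2").size() != 1) throw new AssertionError();
        System.out.println("ok");
    }
}
```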

> When implementing tx functional commands, I was considering keeping
> modifications on remote node during the transaction (before the prepare
> phase) for repeatable functional reads, but in the end I went with
> sending all modifications for a given entry along with the read.
>

Hmmm, neither seems like a very good fit...

I was wondering about making the CompletableFutures returned by
eval/evalMany during a transaction only complete after the transaction
is done, and have methods thenEval()/thenEvalMany() to add another
read/write to the transaction. The weird part would be that some other
code might extend the scope of our transaction by starting an outer
transaction; committing the inner transaction then wouldn't run the
batched operations and wouldn't complete anything. The transaction
manager wouldn't even notify the cache when the inner transaction is
committed, so perhaps this could only work with our own "functional
batching" API.
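[Editorial sketch: a rough model of the "functional batching" idea — futures returned during the transaction complete only once the batch commits, so chained stages run after the transaction is done. Every name here (eval, commit, the queue) is hypothetical, not a real Infinispan API.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical model: eval() queues an operation and returns a future that
// completes only when commit() runs the batch, so thenApply-style chaining
// observes results strictly after the transaction completes.
public class FunctionalBatchSketch {
    private final List<Runnable> queued = new ArrayList<>();

    <T> CompletableFuture<T> eval(Supplier<T> op) {
        CompletableFuture<T> cf = new CompletableFuture<>();
        queued.add(() -> cf.complete(op.get()));
        return cf;
    }

    void commit() {
        queued.forEach(Runnable::run);
        queued.clear();
    }

    public static void main(String[] args) {
        FunctionalBatchSketch tx = new FunctionalBatchSketch();
        CompletableFuture<Integer> read = tx.eval(() -> 42);
        // Chained stage; only runs once commit() completes the batch.
        CompletableFuture<Integer> chained = read.thenApply(v -> v + 1);
        if (read.isDone()) throw new AssertionError("must not complete before commit");
        tx.commit();
        if (chained.join() != 43) throw new AssertionError();
        System.out.println("ok");
    }
}
```

This also makes the quoted problem concrete: if an outer transaction extends the scope, nothing here triggers commit(), so every queued future stays incomplete.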

Cheers
Dan