[infinispan-dev] DataContainer performance review

[infinispan-dev] DataContainer performance review

Vladimir Blagojevic
Hi,

I would like to review recent DataContainer performance claims and I was
wondering if any of you have some spare cycles to help me out.

I've added a test[1] to MapStressTest that measures and contrasts single
node Cache performance to synchronized HashMap, ConcurrentHashMap and
BCHM variants.


Performance for container BoundedConcurrentHashMap (LIRS)
Average get ops/ms 1063
Average put ops/ms 101
Average remove ops/ms 421
Size = 480
Performance for container BoundedConcurrentHashMap (LRU)
Average get ops/ms 976
Average put ops/ms 306
Average remove ops/ms 521
Size = 463
Performance for container CacheImpl
Average get ops/ms 94
Average put ops/ms 61
Average remove ops/ms 65
Size = 453
Performance for container ConcurrentHashMap
Average get ops/ms 484
Average put ops/ms 326
Average remove ops/ms 376
Size = 49870
Performance for container SynchronizedMap
Average get ops/ms 96
Average put ops/ms 85
Average remove ops/ms 96
Size = 49935


I ran MapStressTest on my MacBook Air, with 32 threads continually doing
get/put/remove ops. For more details see [1]. If my measurements are
correct, a Cache instance seems to be capable of about 220 ops per
millisecond on my crappy hardware setup. As you can see, performance of
the entire cache structure does not seem to be much worse than a
SynchronizedMap, which is great on one hand, but it also leaves us some
room for potential improvement, since ConcurrentHashMap and BCHM seem to
be substantially faster. I have not tested the impact of having a cache
store for passivation; I will do that tomorrow/next week.
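
For reference, the measurement loop is essentially of the following shape
(an illustrative sketch only, not the actual MapStressTest code): one
reader thread walks the shared key list and reports its own throughput;
writer and remover threads look the same but call put()/remove() instead.
The per-operation averages above would then be derived from per-thread
results like this one.

    import java.util.List;
    import java.util.Map;

    // Illustrative sketch, not the actual MapStressTest code: per-thread get throughput.
    static double measureGets(Map<String, Integer> map, List<String> keys) {
       long start = System.nanoTime();
       long ops = 0;
       for (String key : keys) {
          map.get(key);
          ops++;
       }
       double elapsedMs = (System.nanoTime() - start) / 1000000.0;
       return ops / elapsedMs;   // get ops per millisecond for this thread
    }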

Any comments/ideas going forward?

[1] https://github.com/infinispan/infinispan/pull/404

Re: [infinispan-dev] DataContainer performance review

Sanne Grinovero-3
Hi Vladimir,
this looks very interesting, I couldn't resist starting some runs.

I noticed the test is quite quick to finish, so I raised my LOOP_FACTOR
to 200, but it still finishes in a few minutes, which IMHO is not long
enough for these numbers to be really representative.
I've also noticed that the test has a "warmup" boolean, but it's not
being used, while I think it should be.
Also, the three different operations of course need to happen all
together to properly "shuffle" the data, but when interpreting these
numbers we have to consider that some operations will finish before the
others, so part of the results achieved by the remaining operations are
not disturbed by the other operations. Maybe it would be more interesting
to have the three operations run in a predictable sequence, or have them
all work as fast as they can for a given timebox instead of "until the
keys are finished"?

Here are my results, in case any comparison is useful. If I had to
conclude something from this data, it looks to me like something is
indeed wrong with the put operations under LIRS. Also, trying to add
more writers worsens the scenario for LIRS significantly.

When running the test with "doTest(map, 28, 8, 8, true, testName);"
(adding more put and remove operations) the synchronizedMap is
significantly faster than the CacheImpl.

Performance for container BoundedConcurrentHashMap
Average get ops/ms 1711
Average put ops/ms 63
Average remove ops/ms 1108
Size = 480
Performance for container BoundedConcurrentHashMap
Average get ops/ms 1851
Average put ops/ms 665
Average remove ops/ms 1199
Size = 463
Performance for container CacheImpl
Average get ops/ms 349
Average put ops/ms 213
Average remove ops/ms 250
Size = 459
Performance for container ConcurrentHashMap
Average get ops/ms 776
Average put ops/ms 611
Average remove ops/ms 606
Size = 562
Performance for container SynchronizedMap
Average get ops/ms 244
Average put ops/ms 222
Average remove ops/ms 236
Size = 50000

Now with doTest(map, 28, 8, 8, true, testName):

Performance for container Infinispan Cache implementation
Average get ops/ms 71
Average put ops/ms 47
Average remove ops/ms 51
Size = 474
Performance for container ConcurrentHashMap
Average get ops/ms 606
Average put ops/ms 227
Average remove ops/ms 246
Size = 49823
Performance for container synchronizedMap
Average get ops/ms 175
Average put ops/ms 141
Average remove ops/ms 160

At first glance it doesn't look very nice, but these runs were not long
enough at all.

Sanne


Re: [infinispan-dev] DataContainer performance review

Vladimir Blagojevic
Sanne & others,

I think we might be onto something. I changed the test to run for a
specified period of time and used 10-minute test runs (you need to pull
this change into MapStressTest manually until it is integrated). I
noticed that as we raise the map capacity, BCHM and CacheImpl performance
starts to degrade, while ConcurrentHashMap and SynchronizedMap do not.
See results below.
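
For reference, the timed run is essentially a loop of the following shape
(just a sketch, not the exact change to MapStressTest): each thread runs
flat out for a fixed duration instead of stopping when its key sequence
is exhausted, so all three operation types keep competing for the whole run.

    import java.util.Map;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    // Illustrative timeboxed get-loop; put/remove workers would use the same shape.
    static double timeboxedGets(Map<String, Integer> map, int numKeys, long minutes) {
       Random rnd = new Random();
       long start = System.nanoTime();
       long durationNanos = TimeUnit.MINUTES.toNanos(minutes);
       long ops = 0;
       while (System.nanoTime() - start < durationNanos) {
          map.get("key-" + rnd.nextInt(numKeys));
          ops++;
       }
       return ops / (double) TimeUnit.MINUTES.toMillis(minutes);   // average ops per millisecond
    }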


max capacity = 512
Performance for container BoundedConcurrentHashMap
Average get ops/ms 382
Average put ops/ms 35
Average remove ops/ms 195
Size = 478
Performance for container BoundedConcurrentHashMap
Average get ops/ms 388
Average put ops/ms 54
Average remove ops/ms 203
Size = 462
Performance for container CacheImpl
Average get ops/ms 143
Average put ops/ms 16
Average remove ops/ms 26
Size = 418
Performance for container ConcurrentHashMap
Average get ops/ms 176
Average put ops/ms 67
Average remove ops/ms 74
Size = 43451
Performance for container SynchronizedMap
Average get ops/ms 58
Average put ops/ms 47
Average remove ops/ms 60
Size = 30996


max capacity = 16384
Performance for container BoundedConcurrentHashMap
Average get ops/ms 118
Average put ops/ms 7
Average remove ops/ms 11
Size = 16358
Performance for container BoundedConcurrentHashMap
Average get ops/ms 76
Average put ops/ms 5
Average remove ops/ms 6
Size = 15488
Performance for container CacheImpl
Average get ops/ms 48
Average put ops/ms 4
Average remove ops/ms 16
Size = 12275
Performance for container ConcurrentHashMap
Average get ops/ms 251
Average put ops/ms 107
Average remove ops/ms 122
Size = 17629
Performance for container SynchronizedMap
Average get ops/ms 51
Average put ops/ms 42
Average remove ops/ms 51
Size = 36978


max capacity = 32768
Performance for container BoundedConcurrentHashMap
Average get ops/ms 72
Average put ops/ms 7
Average remove ops/ms 9
Size = 32405
Performance for container BoundedConcurrentHashMap
Average get ops/ms 13
Average put ops/ms 5
Average remove ops/ms 2
Size = 29214
Performance for container CacheImpl
Average get ops/ms 14
Average put ops/ms 2
Average remove ops/ms 4
Size = 23887
Performance for container ConcurrentHashMap
Average get ops/ms 235
Average put ops/ms 102
Average remove ops/ms 115
Size = 27823
Performance for container SynchronizedMap
Average get ops/ms 55
Average put ops/ms 48
Average remove ops/ms 53
Size = 39650



Re: [infinispan-dev] DataContainer performance review

Galder Zamarreno
Vladimir,

I think it's better if you run your tests in one of the cluster or perf machines cos that way everyone has access to the same base system and results can be compared, particularly when changes are made. Also, you avoid local apps or CPU usage affecting your test results.

I agree with Sanne, put ops for LIRS don't look good in comparison with LRU. Did you run any profiling?

Cheers,


--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



Re: [infinispan-dev] DataContainer performance review

Vladimir Blagojevic
On 11-06-28 10:06 AM, Galder Zamarreño wrote:
> Vladimir,
>
> I think it's better if you run your tests in one of the cluster or perf machines cos that way everyone has access to the same base system and results can be compared, particularly when changes are made. Also, you avoid local apps or CPU usage affecting your test results.
>
> I agree with Sanne, put ops for LIRS don't look good in comparison with LRU. Did you run any profiling?
>
Hey,

Very likely you are right and that is a better approach, but it does not
take much to notice a trend of deteriorating BCHM performance for large
map/cache sizes. Looking to do some profiling now.

Cheers



Re: [infinispan-dev] DataContainer performance review

Vladimir Blagojevic
Hey, good news!

I have found that the main culprit of poor DataContainer performance for
large caches (100K+ entries) is in fact the use of the default concurrency
level of 32. If users are going to use caches with many entries then they
should also increase the concurrency level. I found that a concurrency
level of 512 works fairly well for caches of up to a million entries. Also
note that if users are using such large caches (1M+ entries) I do not see
the point of having eviction; they should just use an unbounded
DataContainer.
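
For illustration only, this is what raising the level means at
construction time; the snippet uses the plain JDK ConcurrentHashMap
(whose third constructor argument is the concurrency level, i.e. roughly
the number of internal segments), not BCHM itself, but it is the same knob:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Illustrative sketch: a backing map sized for a large cache, with a raised
    // concurrency level instead of the default of 32.
    static <K, V> ConcurrentMap<K, V> largeCacheBackingMap(int expectedEntries) {
       int concurrencyLevel = 512;   // worked fairly well in these tests up to ~1M entries
       return new ConcurrentHashMap<K, V>(expectedEntries, 0.75f, concurrencyLevel);
    }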

I am also looking to chart these results for easy review, a forum post,
and a DataContainer performance tuning wiki. Tomorrow I'll determine the
impact of passivation on DataContainer performance.

Cheers,
Vladimir



Re: [infinispan-dev] DataContainer performance review

Manik Surtani
Awesome!  What's the default concurrency level we set in Infinispan?  Surely it is much higher than 32?


--
Manik Surtani
[hidden email]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org





Re: [infinispan-dev] DataContainer performance review

Mircea Markus

On 30 Jun 2011, at 01:18, Vladimir Blagojevic wrote:

> Hey, good news!
>
> I have found that the main culprit of poor DataContainer performance for
> large caches (100K+ entries) is in fact the use of the default concurrency
> level of 32.
Does that cause BCHM to create only 32 segments, thus resulting in lots of contention on concurrent updates?
> If users are going to use caches with many entries then they should
> also increase the concurrency level. I found that a concurrency level of
> 512 works fairly well for caches of up to a million entries. Also note
> that if users are using such large caches (1M+ entries) I do not see the
> point of having eviction; they should just use an unbounded DataContainer.
I'm not sure this is true for all use cases: e.g. 1M Integers occupy circa 4 MB, and people might want to allocate up to gigabytes to cache data.
What I think we can do is suggest to them (via a log message), based on the DC size, that they increase the concurrencyLevel when needed.
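
Something along these lines, purely as a hypothetical sketch (the helper,
its name and the threshold are made up, not existing Infinispan code):

    // Hypothetical sketch of the suggestion above: return a hint when the configured
    // concurrencyLevel looks too low for the container's max capacity, or null if it looks fine.
    static String concurrencyLevelHint(int concurrencyLevel, int maxEntries) {
       int entriesPerSegment = maxEntries / Math.max(1, concurrencyLevel);
       int threshold = 100;   // illustrative threshold only, not a tuned value
       if (entriesPerSegment <= threshold) {
          return null;
       }
       return "concurrencyLevel " + concurrencyLevel + " gives ~" + entriesPerSegment
             + " entries per segment for maxEntries " + maxEntries
             + "; consider raising concurrencyLevel";
    }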


Re: [infinispan-dev] DataContainer performance review

Vladimir Blagojevic
On 11-06-30 6:08 AM, Mircea Markus wrote:
> Does that cause BCHM to create only 32 segments, thus resulting in lots of contention on concurrent updates?

Yes, only 32 segments! Much of the performance impact in BCHM also comes
from the node-tracking overhead per segment (queues, lists, etc.). When we
increase the segment count this overhead falls substantially. CHM performs
well with both 32 and 512 segments under these tests. It is just that BCHM
basically grinds to a halt for large caches if we have only 32 segments.

> I'm not sure this is true for all use cases: e.g. 1M Integers occupy circa 4 MB, and people might want to allocate up to gigabytes to cache data.
> What I think we can do is suggest to them (via a log message), based on the DC size, that they increase the concurrencyLevel when needed.

True, both of these parameters (object size and object count) should be
considered!


Re: [infinispan-dev] DataContainer performance review

Mircea Markus

On 30 Jun 2011, at 16:58, Vladimir Blagojevic wrote:

> Yes, only 32 segments! Much of the performance impact in BCHM also comes from the node-tracking overhead per segment (queues, lists, etc.). When we increase the segment count this overhead falls substantially. CHM performs well with both 32 and 512 segments under these tests. It is just that BCHM basically grinds to a halt for large caches if we have only 32 segments.
Can't we adjust the segment size dynamically?

Re: [infinispan-dev] DataContainer performance review

Vladimir Blagojevic
On 11-06-30 12:23 PM, Mircea Markus wrote:
> Can't we adjust the segment size dynamically?
I don't think we can ignore user settings for concurrency and max
capacity. The concurrency level essentially equals the segment count. We
can warn the user if the concurrency level is inappropriate for the
container's max capacity.

Re: [infinispan-dev] DataContainer performance review

Galder Zamarreno

On Jun 30, 2011, at 2:18 AM, Vladimir Blagojevic wrote:

> Hey, good news!
>
> I have found that the main culprit of poor DataContainer performance for large caches (100K+ entries) is in fact the use of the default concurrency level of 32. If users are going to use caches with many entries then they should also increase the concurrency level. I found that a concurrency level of 512 works fairly well for caches of up to a million entries. Also note that if users are using such large caches (1M+ entries) I do not see the point of having eviction; they should just use an unbounded DataContainer.

This might have come up before in the forums (http://community.jboss.org/message/609061#609061). Out of the box we're not as performant for bigger caches. Sure, we could increase the concurrency level but what would be the impact for small caches?

Could concurrency level be a bit more ergonomic?


--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



Re: [infinispan-dev] DataContainer performance review

Galder Zamarreno

On Jun 30, 2011, at 12:08 PM, Mircea Markus wrote:

>
> On 30 Jun 2011, at 01:18, Vladimir Blagojevic wrote:
>
>> If users are going to use caches with many entries then they should
>> also increase the concurrency level. I found that a concurrency level of
>> 512 works fairly well for caches of up to a million entries. Also note
>> that if users are using such large caches (1M+ entries) I do not see the
>> point of having eviction; they should just use an unbounded DataContainer.

By the way, I forgot to mention: why isn't there a point in using eviction with 1M+ entries? I still wanna try to keep my memory consumption in check, regardless of the amount of data that I put in the cache.

What is it that you're trying to imply here exactly? That the data container is not performant once the number of max entries goes beyond some limit?

> I'm not sure this is true for all use cases: e.g. 1M Integers occupy circa 4 MB, and people might want to allocate up to gigabytes to cache data.
> What I think we can do is suggest to them (via a log message), based on the DC size, that they increase the concurrencyLevel when needed.

I think we should go further: we should try to be more ergonomic, adapt to the circumstances, and do as much as we can so that the data container is tuned at runtime. And this applies not only to the data container, but to buffer sizes, etc.

Logging suggestions is poor man's tuning.


--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



Re: [infinispan-dev] DataContainer performance review

Vladimir Blagojevic
On 11-07-04 3:13 AM, Galder Zamarreño wrote:
> On Jun 30, 2011, at 2:18 AM, Vladimir Blagojevic wrote:
>
> This might have come up before in the forums (http://community.jboss.org/message/609061#609061). Out of the box we're not as performant for bigger caches. Sure, we could increase the concurrency level but what would be the impact for small caches?
>
> Could concurrency level be a bit more ergonomic?

I am all for it, but how would you do it? Should we completely ignore the
concurrencyLevel setting from the user?

Re: [infinispan-dev] DataContainer performance review

Dan Berindei
Hi Vladimir

On Thu, Jun 30, 2011 at 5:31 PM, Vladimir Blagojevic
<[hidden email]> wrote:

> I don't think we can ignore user settings for concurrency and max
> capacity. The concurrency level essentially equals the segment count. We
> can warn the user if the concurrency level is inappropriate for the
> container's max capacity.

I don't think it is a good idea to tie the concurrency level to the
container capacity. ConcurrentHashMap defines the concurrency level as
"the allowed concurrency among update operations"; we should try to
stick to that definition.

For comparison, I updated your MapStressTest to use a synchronized
java.util.LinkedHashMap with LRU eviction and also a (very incomplete)
ConcurrentHashMap clone using synchronized LinkedHashMaps as segments.
LinkedHashMap uses a neat trick: the hashtable entries are the same as
the LRU queue entries, so moving an element to the head of the queue
is an O(1) operation. This makes its performance much closer to CHM
than to our LRU implementation.
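
For anyone curious, the LinkedHashMap-based map in my test is essentially
the standard idiom below (a minimal sketch, not the exact code from the
pull request):

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal sketch of a synchronized LRU map: with accessOrder=true, get() moves the
    // entry to the end of the internal linked list in O(1), and removeEldestEntry()
    // evicts the least-recently-used entry once maxEntries is exceeded.
    static <K, V> Map<K, V> synchronizedLruMap(final int maxEntries) {
       return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
             return size() > maxEntries;
          }
       });
    }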

I don't think we can use this trick with our BoundedConcurrentHashMap
as it is, because we try to keep the map and the eviction policies
completely separate and that forces the LRU eviction to look up the
key in a LinkedList on average once for every get(), which is an O(n)
operation. I tried to use a BidirectionalLinkedHashMap instead of a
LinkedList and I failed, but it would never be as fast as the
LinkedHashMap implementation anyway.

I'm not sure if the LinkedHashMap trick can be applied for the LIRS
algorithm as well, since it uses both a stack and a queue (at least
that's what their names say). But it also tends to look up keys in the
queue, which is a LinkedList, so I definitely think we should try it.

The big advantage I see in the LinkedHashMap trick is that, with faster
gets, we can eliminate the batching logic from the eviction algorithms,
which would offset the cost of completely reimplementing the Segment
implementation for each eviction policy.

I also made some changes to the test itself to make it more "realistic":
1. I added a warm-up period
2. I used a Gaussian distribution for the keys (see the sketch after this list)
3. I made the keys Strings to make the equals() calls more expensive
4. I made each thread start from another index in the keys sequence,
so they don't do exactly the same sequence of operations
5. At the end of the test I also print the standard deviation of the
keys in the map, which should broadly show how effective the eviction
policy is in keeping the most accessed keys in the map.
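
To illustrate item 2 (a hypothetical sketch only, not the code in the
pull request): keys are picked with a Gaussian distribution centred on
the middle of the key space, so a minority of "hot" keys receives most of
the accesses.

    import java.util.Random;

    // Hypothetical sketch of a Gaussian key picker; the divisor controlling the spread
    // is illustrative only.
    static String gaussianKey(Random rnd, int numKeys) {
       int index = (int) Math.round(rnd.nextGaussian() * (numKeys / 6.0) + numKeys / 2.0);
       index = Math.max(0, Math.min(numKeys - 1, index));   // clamp to the valid key range
       return "key-" + index;
    }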

I've created a pull request at
https://github.com/infinispan/infinispan/pull/414 , please have a look
and tell me what you think.

Dan