[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

[infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
Hi everyone,

I am working on integrating Infinispan 9.2.Final in vertx-infinispan. Before merging, I wanted to make sure the test suite passed, but it doesn't. It's not always the same test that fails.

In the logs, I see a lot of messages like "After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode."
The caches involved are "___counter_configuration" and "org.infinispan.LOCKS".

Most often it's harmless but, sometimes, I also see this exception: "ISPN000210: Failed to request state of cache".
Again, the cache involved is either "___counter_configuration" or "org.infinispan.LOCKS".
After this exception, the cache manager is unable to stop. It blocks in the "terminate" method (join on the cache future).

I thought the test suite was too rough (we stop all nodes at the same time). So I changed it to make sure that:
- nodes start one after the other
- a new node is started only when the previous one indicates HEALTHY status
- nodes stop one after the other
- a node is stopped only when it indicates HEALTHY status
Pretty much what we do on Kubernetes for the readiness check, actually (see the sketch below).
But it didn't get any better.
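
For what it's worth, here is a minimal sketch of that sequenced shutdown, assuming the Infinispan Health API (EmbeddedCacheManager#getHealth()) is what backs the HEALTHY status we poll; this is illustrative, not the actual vertx-infinispan test code:

    import java.util.List;
    import org.infinispan.health.HealthStatus;
    import org.infinispan.manager.DefaultCacheManager;

    // Stop nodes one after the other, and only once the node reports a HEALTHY cluster.
    static void stopSequentially(List<DefaultCacheManager> nodes) throws InterruptedException {
        for (DefaultCacheManager node : nodes) {
            // poll the cluster health, as a readiness check would
            while (node.getHealth().getClusterHealth().getHealthStatus() != HealthStatus.HEALTHY) {
                Thread.sleep(100);
            }
            node.stop();
        }
    }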

Attached are the logs of such a failing test.

Note that the Vert.x test itself does not fail; it's only when closing nodes that we have issues.

Here's our XML config:
https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resources/default-infinispan.xml

Does that ring a bell? Do you need more info?

Regards,
Thomas



Attachment: InfinispanClusteredEventBusTest#testPublishByteArray.log (205K)
Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Pedro Ruivo-2
Hi Thomas,

Is the test in question using any counter/lock?

I did see similar behavior with the counters in our server test suite.
The partition handling puts the cache in degraded mode because nodes are
starting and stopping concurrently.

I'm not sure if there is a JIRA tracking this. Ryan, Dan, do you know?
If there is none, one should be created.

I improved the counters by making the cache start lazily when you first
get or define a counter [1]. This workaround solved the issue for us.

As a workaround for your test suite, I suggest making sure the caches
(___counter_configuration and org.infinispan.LOCKS) have finished their
state transfer before stopping the cache managers, by invoking
DefaultCacheManager.getCache(*cache-name*) on all the cache managers.
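
A minimal sketch of that workaround, assuming a list holding all the cache managers and the internal cache names seen in this thread (whether getCache() may be invoked for these internal caches can depend on the Infinispan version):

    import java.util.List;
    import org.infinispan.manager.DefaultCacheManager;

    // Touch the internal caches on every manager so their state transfer has
    // completed, then stop the managers.
    static void awaitInternalCachesAndStop(List<DefaultCacheManager> managers) {
        for (DefaultCacheManager cm : managers) {
            // getCache() blocks until the cache is running (initial state transfer included)
            cm.getCache("___counter_configuration");
            cm.getCache("org.infinispan.LOCKS");
        }
        managers.forEach(DefaultCacheManager::stop);
    }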

Sorry for the inconvenience and the delay in replying.

Cheers,
Pedro

[1] https://issues.jboss.org/browse/ISPN-8860

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Radim Vansa
This looks similar to [1], which has had a fix [2] ready for a while. Please
try with it and see if it solves your problem.

[1] https://issues.jboss.org/browse/ISPN-8859
[2] https://github.com/infinispan/infinispan/pull/5786


--
Radim Vansa <[hidden email]>
JBoss Performance Team

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
In reply to this post by Pedro Ruivo-2
Hi Pedro,

2018-03-23 13:25 GMT+01:00 Pedro Ruivo <[hidden email]>:
> Hi Thomas,
>
> Is the test in question using any counter/lock?

I have seen the problem on a test for counters, on another one for locks, as well as on tests using caches only.
But Vert.x starts the ClusteredLockManager and the CounterManager in all cases (even if no lock/counter is created/used).
 

> I did see similar behavior with the counters in our server test suite.
> The partition handling puts the cache in degraded mode because nodes are
> starting and stopping concurrently.

As for me, I was able to observe the problem even when stopping nodes one after the other and waiting for the cluster to go back to HEALTHY status.
Is it possible that the status of the counter and lock caches is not taken into account in cluster health?
 

> I'm not sure if there is a JIRA tracking this. Ryan, Dan, do you know?
> If there is none, one should be created.
>
> I improved the counters by making the cache start lazily when you first
> get or define a counter [1]. This workaround solved the issue for us.
>
> As a workaround for your test suite, I suggest making sure the caches
> (___counter_configuration and org.infinispan.LOCKS) have finished their
> state transfer before stopping the cache managers, by invoking
> DefaultCacheManager.getCache(*cache-name*) on all the cache managers.
>
> Sorry for the inconvenience and the delay in replying.

No problem.
 

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
In reply to this post by Radim Vansa
I'll give it a try and keep you posted. Thanks

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Pedro Ruivo-2
In reply to this post by Thomas SEGISMONT


On 23-03-2018 15:06, Thomas SEGISMONT wrote:

> As for me I was able to observe the problem even when stopping nodes one
> after the other and waiting for cluster to go back to HEALTHY status.
> Is it possible that the status of the counter and lock caches are not
> taken into account in cluster health?

The counter and lock caches are private, so they aren't included in the
cluster health, nor are their names returned by the getCacheNames() method.

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT


2018-03-26 13:16 GMT+02:00 Pedro Ruivo <[hidden email]>:


> The counter and lock caches are private, so they aren't included in the
> cluster health, nor are their names returned by the getCacheNames() method.

Thanks for the details.

I'm not concerned with these internal caches not being listed when calling getCacheNames.

However, the cluster health status should include their status as well.
Cluster status testing is the recommended way to implement readiness checks on Kubernetes, for example.

What do you think Sebastian?
 

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Sebastian Laskawiec
At the moment, the cluster health status checker enumerates all caches in the cache manager [1] and checks whether those caches are running and not in degraded mode [2].

I'm not sure how counter caches have been implemented. One thing is for sure - they should be taken into account in this loop [3].

[1] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
[2] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
[3] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
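
As an illustration of that check (my own sketch based on the description above and the linked classes, not the actual ClusterHealthImpl/CacheHealthImpl code), a cache counts as healthy when it is running and not in degraded mode:

    import org.infinispan.AdvancedCache;
    import org.infinispan.lifecycle.ComponentStatus;
    import org.infinispan.manager.EmbeddedCacheManager;
    import org.infinispan.partitionhandling.AvailabilityMode;

    // True only if every user-visible cache is RUNNING and AVAILABLE (i.e. not degraded).
    static boolean allCachesHealthy(EmbeddedCacheManager cm) {
        return cm.getCacheNames().stream().allMatch(name -> {
            AdvancedCache<?, ?> cache = cm.getCache(name).getAdvancedCache();
            return cache.getStatus() == ComponentStatus.RUNNING
                && cache.getAvailability() == AvailabilityMode.AVAILABLE;
        });
    }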


Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
Thanks Sebastian. Is there a JIRA for this already?

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Pedro Ruivo-2
In reply to this post by Sebastian Laskawiec


On 27-03-2018 09:03, Sebastian Laskawiec wrote:
> At the moment, the cluster health status checker enumerates all caches
> in the cache manager [1] and checks whether those caches are running and
> not in degraded mode [2].
>
> I'm not sure how counter caches have been implemented. One thing is for
> sure - they should be taken into account in this loop [3].

The private caches aren't listed by CacheManager.getCacheNames(). We
have to check them via InternalCacheRegistry.getInternalCacheNames().
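
A rough sketch of that, assuming the 9.x component registry API (illustrative only, not the proposed patch):

    import java.util.HashSet;
    import java.util.Set;
    import org.infinispan.manager.EmbeddedCacheManager;
    import org.infinispan.registry.InternalCacheRegistry;

    // Collect the user-visible cache names plus the internal ones
    // (___counter_configuration, org.infinispan.LOCKS, ...) so a health
    // check could cover them too.
    static Set<String> allCacheNames(EmbeddedCacheManager cm) {
        Set<String> names = new HashSet<>(cm.getCacheNames());
        InternalCacheRegistry registry = cm.getGlobalComponentRegistry()
                                           .getComponent(InternalCacheRegistry.class);
        if (registry != null) {
            names.addAll(registry.getInternalCacheNames());
        }
        return names;
    }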

I'll open a JIRA if you don't mind :)

>
> [1]
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
> [2]
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
> [3]
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Pedro Ruivo-2
JIRA: https://issues.jboss.org/browse/ISPN-8994

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
Hi folks,

Sorry I've been busy on other things and couldn't get back to you earlier.

I tried running the vertx-infinispan test suite with 9.2.1.Final today. There are still some problems, but I can't say which ones yet because I hit https://jira.qos.ch/browse/LOGBACK-1027

We use logback for test logs and all I get is:

2018-04-18 11:37:46,678 [stateTransferExecutor-thread--p4453-t24] ERROR o.i.executors.LimitedExecutor - Exception in task
java.lang.StackOverflowError: null
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:54)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
... and so on and so forth

I will run the suite again without logback and tell you what the actual problem is.

Regards,
Thomas

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
So here's the suppressed exception with the circular reference:

[stateTransferExecutor-thread--p221-t33] 2018-04-18T16:15:06.662+02:00 WARN [org.infinispan.statetransfer.InboundTransferTask]  ISPN000210: Failed to request state of cache __vertx.subs from node sombrero-25286, segments {0}
org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
    at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
    at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:51)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addRequest(JGroupsTransport.java:921)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:815)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:123)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeCommand(RpcManagerImpl.java:138)
    at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
    at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
    at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.lambda$requestState$2(StateReceiverImpl.java:164)
    at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:101)
    at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
    at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
    at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
        at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:260)
        ... 10 more
    [CIRCULAR REFERENCE:org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected]

It does not happen with 9.2.0.Final, and it prevents using Infinispan embedded with logback. Do you want me to file an issue?

2018-04-18 11:45 GMT+02:00 Thomas SEGISMONT <[hidden email]>:
Hi folks,

Sorry I've been busy on other things and couldn't get back to you earlier.

I tried running vertx-infinispan test suite with 9.2.1.Final today. There are some problems still but I can't say which ones yet because I hit: https://jira.qos.ch/browse/LOGBACK-1027

We use logback for test logs and all I get is:

2018-04-18 11:37:46,678 [stateTransferExecutor-thread--p4453-t24] ERROR o.i.executors.LimitedExecutor - Exception in task
java.lang.StackOverflowError: null
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:54)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
... so on so forth

I will run the suite again without logback and tell you what the actual problem is.

Regards,
Thomas

2018-03-27 11:15 GMT+02:00 Pedro Ruivo <[hidden email]>:
JIRA: https://issues.jboss.org/browse/ISPN-8994

On 27-03-2018 10:08, Pedro Ruivo wrote:
>
>
> On 27-03-2018 09:03, Sebastian Laskawiec wrote:
>> At the moment, the cluster health status checker enumerates all caches
>> in the cache manager [1] and checks whether those cashes are running
>> and not in degraded more [2].
>>
>> I'm not sure how counter caches have been implemented. One thing is
>> for sure - they should be taken into account in this loop [3].
>
> The private caches aren't listed by CacheManager.getCacheNames(). We
> have to check them via InternalCacheRegistry.getInternalCacheNames().
>
> I'll open a JIRA if you don't mind :)
>
>>
>> [1]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
>>
>> [2]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
>>
>> [3]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
I tried our test suite on a slower machine (iMac from 2011). It passes consistently there.

On my laptop, I keep seeing this from time to time (in different tests):

2018-04-19T19:53:09.513 WARN [Context=org.infinispan.LOCKS]ISPN000320: After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode. Current members are [sombrero-19385], lost members are [sombrero-42917], stable members are [sombrero-42917, sombrero-19385]

It happens when we shut down nodes one after the other (even when waiting for the cluster status to be "healthy", plus an extra 2 seconds; see the sketch after the stack trace below).

After that, the nodes remain blocked in DefaultCacheManager.stop:

2018-04-19T19:49:29.242 AVERTISSEMENT Thread Thread[vert.x-worker-thread-5,5,main] has been blocked for 60774 ms, time limit is 60000
io.vertx.core.VertxException: Thread blocked
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
    at org.infinispan.manager.DefaultCacheManager.terminate(DefaultCacheManager.java:688)
    at org.infinispan.manager.DefaultCacheManager.stopCaches(DefaultCacheManager.java:734)
    at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:711)
    at io.vertx.ext.cluster.infinispan.InfinispanClusterManager.lambda$leave$5(InfinispanClusterManager.java:285)
    at io.vertx.ext.cluster.infinispan.InfinispanClusterManager$$Lambda$421/578931659.handle(Unknown Source)
    at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:265)
    at io.vertx.core.impl.ContextImpl$$Lambda$27/1330754528.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
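
For context, the "healthy" wait in the test suite is roughly the following sketch (simplified, not the actual vertx-infinispan code; it assumes the manager is a DefaultCacheManager so the Health API is reachable):

import org.infinispan.health.HealthStatus;
import org.infinispan.manager.DefaultCacheManager;

class HealthWait {
    static void awaitHealthy(DefaultCacheManager cacheManager, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        // Poll the cluster health until it reports HEALTHY.
        while (cacheManager.getHealth().getClusterHealth().getHealthStatus() != HealthStatus.HEALTHY) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("Cluster did not become HEALTHY in time");
            }
            Thread.sleep(100);
        }
        Thread.sleep(2000); // the extra 2 seconds mentioned above
    }
}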



2018-04-18 17:00 GMT+02:00 Thomas SEGISMONT <[hidden email]>:
So here's the circular-referenced suppressed exception:

[stateTransferExecutor-thread--p221-t33] 2018-04-18T16:15:06.662+02:00 WARN [org.infinispan.statetransfer.InboundTransferTask]  ISPN000210: Failed to request state of cache __vertx.subs from node sombrero-25286, segments {0}
org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
    at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
    at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:51)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addRequest(JGroupsTransport.java:921)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:815)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:123)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeCommand(RpcManagerImpl.java:138)
    at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
    at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
    at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.lambda$requestState$2(StateReceiverImpl.java:164)
    at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:101)
    at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
    at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
    at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
        at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:260)
        ... 10 more
    [CIRCULAR REFERENCE:org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected]

It does not happen with 9.2.0.Final and prevents us from using ISPN embedded with logback. Do you want me to file an issue?

2018-04-18 11:45 GMT+02:00 Thomas SEGISMONT <[hidden email]>:
Hi folks,

Sorry I've been busy on other things and couldn't get back to you earlier.

I tried running vertx-infinispan test suite with 9.2.1.Final today. There are some problems still but I can't say which ones yet because I hit: https://jira.qos.ch/browse/LOGBACK-1027

We use logback for test logs and all I get is:

2018-04-18 11:37:46,678 [stateTransferExecutor-thread--p4453-t24] ERROR o.i.executors.LimitedExecutor - Exception in task
java.lang.StackOverflowError: null
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:54)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
... and so on

I will run the suite again without logback and tell you what the actual problem is.

Regards,
Thomas

2018-03-27 11:15 GMT+02:00 Pedro Ruivo <[hidden email]>:
JIRA: https://issues.jboss.org/browse/ISPN-8994

On 27-03-2018 10:08, Pedro Ruivo wrote:
>
>
> On 27-03-2018 09:03, Sebastian Laskawiec wrote:
>> At the moment, the cluster health status checker enumerates all caches
>> in the cache manager [1] and checks whether those caches are running
>> and not in degraded mode [2].
>>
>> I'm not sure how counter caches have been implemented. One thing is
>> for sure - they should be taken into account in this loop [3].
>
> The private caches aren't listed by CacheManager.getCacheNames(). We
> have to check them via InternalCacheRegistry.getInternalCacheNames().
>
> I'll open a JIRA if you don't mind :)
>
>>
>> [1]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
>>
>> [2]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
>>
>> [3]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

William Burns-3
On Fri, Apr 20, 2018 at 9:43 AM Thomas SEGISMONT <[hidden email]> wrote:
I tried our test suite on a slower machine (iMac from 2011). It passes consistently there.

On my laptop, I keep seeing this from time to time (in different tests):

2018-04-19T19:53:09.513 WARN [Context=org.infinispan.LOCKS]ISPN000320: After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode. Current members are [sombrero-19385], lost members are [sombrero-42917], stable members are [sombrero-42917, sombrero-19385]

I would expect the nodes to be leaving gracefully, which shouldn't cause a merge. I am not sure how your test is producing that. Can you produce a TRACE log and a JIRA for it?
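
If logback is still in the picture, raising the org.infinispan category to TRACE programmatically at test start should be enough to capture that; a sketch, assuming logback-classic is the SLF4J binding in the test JVM:

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

class TraceLogging {
    static void enableInfinispanTrace() {
        // The cast is only valid when logback-classic backs SLF4J.
        Logger ispn = (Logger) LoggerFactory.getLogger("org.infinispan");
        ispn.setLevel(Level.TRACE);
    }
}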

However, if there is a merge and you go down to a single node, the cache will always be in DEGRADED mode when partition handling is enabled. This is due to not having a simple majority, as described in http://infinispan.org/docs/stable/user_guide/user_guide.html#partition_handling
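
For a regular user cache, the behaviour described above corresponds to a configuration along these lines (a sketch for a 9.x embedded cache, for illustration only; the internal counter/lock caches are configured by Infinispan itself):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.partitionhandling.PartitionHandling;

class PartitionHandlingExample {
    static Configuration distCacheWithPartitionHandling() {
        // DENY_READ_WRITES puts a partition in DEGRADED mode (no reads, no writes)
        // whenever it cannot see a simple majority of the stable members,
        // e.g. one surviving node out of a two-node cluster.
        return new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.DIST_SYNC)
            .partitionHandling().whenSplit(PartitionHandling.DENY_READ_WRITES)
            .build();
    }
}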
 

It happens when we shut down nodes one after the other (even when waiting for the cluster status to be "healthy", plus an extra 2 seconds).

After that, the nodes remain blocked in DefaultCacheManager.stop:

2018-04-19T19:49:29.242 AVERTISSEMENT Thread Thread[vert.x-worker-thread-5,5,main] has been blocked for 60774 ms, time limit is 60000
io.vertx.core.VertxException: Thread blocked
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
    at org.infinispan.manager.DefaultCacheManager.terminate(DefaultCacheManager.java:688)
    at org.infinispan.manager.DefaultCacheManager.stopCaches(DefaultCacheManager.java:734)
    at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:711)
    at io.vertx.ext.cluster.infinispan.InfinispanClusterManager.lambda$leave$5(InfinispanClusterManager.java:285)
    at io.vertx.ext.cluster.infinispan.InfinispanClusterManager$$Lambda$421/578931659.handle(Unknown Source)
    at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:265)
    at io.vertx.core.impl.ContextImpl$$Lambda$27/1330754528.run(Unknown Source)

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

This looks like the exact issue that Radim mentioned with https://issues.jboss.org/browse/ISPN-8859.
 



2018-04-18 17:00 GMT+02:00 Thomas SEGISMONT <[hidden email]>:
So here's the circular-referenced suppressed exception:

[stateTransferExecutor-thread--p221-t33] 2018-04-18T16:15:06.662+02:00 WARN [org.infinispan.statetransfer.InboundTransferTask]  ISPN000210: Failed to request state of cache __vertx.subs from node sombrero-25286, segments {0}
org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
    at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
    at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:51)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addRequest(JGroupsTransport.java:921)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:815)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:123)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeCommand(RpcManagerImpl.java:138)
    at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
    at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
    at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.lambda$requestState$2(StateReceiverImpl.java:164)
    at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:101)
    at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
    at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
    at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
        at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:260)
        ... 10 more
    [CIRCULAR REFERENCE:org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected]

It does not happen with 9.2.0.Final and prevents us from using ISPN embedded with logback. Do you want me to file an issue?

2018-04-18 11:45 GMT+02:00 Thomas SEGISMONT <[hidden email]>:
Hi folks,

Sorry I've been busy on other things and couldn't get back to you earlier.

I tried running vertx-infinispan test suite with 9.2.1.Final today. There are some problems still but I can't say which ones yet because I hit: https://jira.qos.ch/browse/LOGBACK-1027

We use logback for test logs and all I get is:

2018-04-18 11:37:46,678 [stateTransferExecutor-thread--p4453-t24] ERROR o.i.executors.LimitedExecutor - Exception in task
java.lang.StackOverflowError: null
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:54)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
... and so on

I will run the suite again without logback and tell you what the actual problem is.

Regards,
Thomas

2018-03-27 11:15 GMT+02:00 Pedro Ruivo <[hidden email]>:
JIRA: https://issues.jboss.org/browse/ISPN-8994

On 27-03-2018 10:08, Pedro Ruivo wrote:
>
>
> On 27-03-2018 09:03, Sebastian Laskawiec wrote:
>> At the moment, the cluster health status checker enumerates all caches
>> in the cache manager [1] and checks whether those caches are running
>> and not in degraded mode [2].
>>
>> I'm not sure how counter caches have been implemented. One thing is
>> for sure - they should be taken into account in this loop [3].
>
> The private caches aren't listed by CacheManager.getCacheNames(). We
> have to check them via InternalCacheRegistry.getInternalCacheNames().
>
> I'll open a JIRA if you don't mind :)
>
>>
>> [1]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
>>
>> [2]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
>>
>> [3]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] 9.2 EmbeddedCacheManager blocked at shutdown

Thomas SEGISMONT
Hi Will,

I will create the JIRA and provide the TRACE level logs as soon as possible.

Thanks for the update.


2018-04-20 20:10 GMT+02:00 William Burns <[hidden email]>:
On Fri, Apr 20, 2018 at 9:43 AM Thomas SEGISMONT <[hidden email]> wrote:
I tried our test suite on a slower machine (iMac from 2011). It passes consistently there.

On my laptop, I keep seeing this from time to time (in different tests):

2018-04-19T19:53:09.513 WARN [Context=org.infinispan.LOCKS]ISPN000320: After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode. Current members are [sombrero-19385], lost members are [sombrero-42917], stable members are [sombrero-42917, sombrero-19385]

I would expect the nodes to be leaving gracefully, which shouldn't cause a merge. I am not sure how your test is producing that. Can you produce a TRACE log and a JIRA for it?

However, if there is a merge and you go down to a single node, the cache will always be in DEGRADED mode when partition handling is enabled. This is due to not having a simple majority, as described in http://infinispan.org/docs/stable/user_guide/user_guide.html#partition_handling
 

It happens when we shut down nodes one after the other (even when waiting for the cluster status to be "healthy", plus an extra 2 seconds).

After that, the nodes remain blocked in DefaultCacheManager.stop:

2018-04-19T19:49:29.242 AVERTISSEMENT Thread Thread[vert.x-worker-thread-5,5,main] has been blocked for 60774 ms, time limit is 60000
io.vertx.core.VertxException: Thread blocked
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
    at org.infinispan.manager.DefaultCacheManager.terminate(DefaultCacheManager.java:688)
    at org.infinispan.manager.DefaultCacheManager.stopCaches(DefaultCacheManager.java:734)
    at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:711)
    at io.vertx.ext.cluster.infinispan.InfinispanClusterManager.lambda$leave$5(InfinispanClusterManager.java:285)
    at io.vertx.ext.cluster.infinispan.InfinispanClusterManager$$Lambda$421/578931659.handle(Unknown Source)
    at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:265)
    at io.vertx.core.impl.ContextImpl$$Lambda$27/1330754528.run(Unknown Source)

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

This looks like the exact issue that Radim mentioned with https://issues.jboss.org/browse/ISPN-8859.
 



2018-04-18 17:00 GMT+02:00 Thomas SEGISMONT <[hidden email]>:
So here's the circular-referenced suppressed exception:

[stateTransferExecutor-thread--p221-t33] 2018-04-18T16:15:06.662+02:00 WARN [org.infinispan.statetransfer.InboundTransferTask]  ISPN000210: Failed to request state of cache __vertx.subs from node sombrero-25286, segments {0}
org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
    at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:33)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
    at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
    at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:51)
    at org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:42)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addRequest(JGroupsTransport.java:921)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:815)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeCommand(JGroupsTransport.java:123)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeCommand(RpcManagerImpl.java:138)
    at org.infinispan.statetransfer.InboundTransferTask.startTransfer(InboundTransferTask.java:134)
    at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:113)
    at org.infinispan.conflict.impl.StateReceiverImpl$SegmentRequest.lambda$requestState$2(StateReceiverImpl.java:164)
    at org.infinispan.executors.LimitedExecutor.lambda$executeAsync$1(LimitedExecutor.java:101)
    at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
    at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
    at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
        at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:260)
        ... 10 more
    [CIRCULAR REFERENCE:org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node sombrero-25286 was suspected]

It does not happen with 9.2.0.Final and prevents us from using ISPN embedded with logback. Do you want me to file an issue?

2018-04-18 11:45 GMT+02:00 Thomas SEGISMONT <[hidden email]>:
Hi folks,

Sorry I've been busy on other things and couldn't get back to you earlier.

I tried running vertx-infinispan test suite with 9.2.1.Final today. There are some problems still but I can't say which ones yet because I hit: https://jira.qos.ch/browse/LOGBACK-1027

We use logback for test logs and all I get is:

2018-04-18 11:37:46,678 [stateTransferExecutor-thread--p4453-t24] ERROR o.i.executors.LimitedExecutor - Exception in task
java.lang.StackOverflowError: null
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:54)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:60)
    at ch.qos.logback.classic.spi.ThrowableProxy.<init>(ThrowableProxy.java:72)
... and so on

I will run the suite again without logback and tell you what the actual problem is.

Regards,
Thomas

2018-03-27 11:15 GMT+02:00 Pedro Ruivo <[hidden email]>:
JIRA: https://issues.jboss.org/browse/ISPN-8994

On 27-03-2018 10:08, Pedro Ruivo wrote:
>
>
> On 27-03-2018 09:03, Sebastian Laskawiec wrote:
>> At the moment, the cluster health status checker enumerates all caches
>> in the cache manager [1] and checks whether those caches are running
>> and not in degraded mode [2].
>>
>> I'm not sure how counter caches have been implemented. One thing is
>> for sure - they should be taken into account in this loop [3].
>
> The private caches aren't listed by CacheManager.getCacheNames(). We
> have to check them via InternalCacheRegistry.getInternalCacheNames().
>
> I'll open a JIRA if you don't mind :)
>
>>
>> [1]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
>>
>> [2]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
>>
>> [3]
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24

_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev