[infinispan-dev] multi-mapping with indexing - do we need big-table

[infinispan-dev] multi-mapping with indexing - do we need big-table

kapil nayar
We have two data sets {A1, A2, A3...} and {B1, B2, B3...}
Each B has some associated data {C1, C2, C3, ...} in a 1:1 mapping.

The mappings would be something like this (assume that C is stored alongside B):
A1-> B1, B2
A2-> B3, B5
A3-> B4, B6, B7

Now, we would need the following indexes:
A->B and B->A

Notice that each B appears under exactly one A. However, as shown, an A can map to multiple Bs.
The big-table type of data structure allows this and makes it pretty easy off the shelf.

Now, I am trying to explore if we can implement these mappings with Infinispan.
We may need a basic multi-map - to store multiple values for the same key in the cache.

1. The "get" would return the complete list of the values.
2. The "put" would add the new value without replacing the existing value.
3. The "remove" would remove a specific value or optionally all values associated with the key.
4. These operations (especially "put") on the same key can occur simultaneously from multiple nodes.
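
A rough sketch of how these four requirements could be met without transactions, using a compare-and-swap retry loop over the standard ConcurrentMap operations (putIfAbsent, replace(key, old, new), remove(key, old)), which Infinispan's Cache interface inherits. This is only an illustration under assumptions: the class name CasMultiMap is hypothetical, a ConcurrentHashMap stands in for the cache so that the example runs on its own, and whether the conditional operations behave atomically across nodes depends on the Infinispan version and cache configuration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical multi-map built on ConcurrentMap compare-and-swap operations. */
public class CasMultiMap<K, V> {
    // Stand-in for an Infinispan cache; stored values are immutable snapshots.
    private final ConcurrentMap<K, Set<V>> store;

    public CasMultiMap() {
        this(new ConcurrentHashMap<K, Set<V>>());
    }

    public CasMultiMap(ConcurrentMap<K, Set<V>> store) {
        this.store = store;
    }

    /** 1. Returns every value stored under the key (empty set if none). */
    public Set<V> get(K key) {
        Set<V> current = store.get(key);
        return current == null ? Collections.<V>emptySet() : current;
    }

    /** 2. Adds a value without replacing the existing ones. */
    public void put(K key, V value) {
        while (true) {
            Set<V> current = store.get(key);
            if (current == null) {
                if (store.putIfAbsent(key, Collections.singleton(value)) == null) {
                    return;
                }
            } else {
                Set<V> updated = new HashSet<V>(current);
                updated.add(value);
                if (store.replace(key, current, Collections.unmodifiableSet(updated))) {
                    return;
                }
            }
            // Another writer changed the entry first: re-read and retry.
        }
    }

    /** 3a. Removes one specific value; drops the key once no values remain. */
    public void remove(K key, V value) {
        while (true) {
            Set<V> current = store.get(key);
            if (current == null || !current.contains(value)) {
                return;
            }
            Set<V> updated = new HashSet<V>(current);
            updated.remove(value);
            boolean done = updated.isEmpty()
                    ? store.remove(key, current)
                    : store.replace(key, current, Collections.unmodifiableSet(updated));
            if (done) {
                return;
            }
        }
    }

    /** 3b. Removes all values associated with the key. */
    public void removeAll(K key) {
        store.remove(key);
    }
}
```

Each update copies the whole value set, so this approach suits keys with a modest number of values; under heavy contention on a single key the retry loop may spin several times before succeeding.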

I know there is an atomic map option in Infinispan which may be applicable, but AFAIK it requires transactions (which we want to avoid..).

Alternatively, perhaps Infinispan (in combination with Lucene) can be used.
1. We should be able to create data structure {B, C} and store A-> {B,C} with indexes defined for B.
2. Also, the key A could be structured as a combination of A+B to store multiple entries like A1B1->{B1,C1} and A1B2->{B2,C2}. Lucene would allow wildcard searches, e.g. to look for all A1 values we could search for A1*, which should return both A1B1 and A1B2. I may be making some assumptions here (feel free to correct!)
3. There seems to be one bottleneck though - since the cache mode is "distribution", it seems it is mandatory to use a backend DB to store these indexes and moreover the DB needs to be shared. This requirement actually seems to defeat the purpose of using Infinispan.
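
The composite-key idea in point 2 could look roughly like the sketch below. A sorted map stands in for the Lucene index here, and a prefix range scan plays the role of the wildcard search A1*. All names (CompositeKeyIndex, the '|' separator) are hypothetical. The separator is an addition to the scheme described above: with plain concatenation, the prefix A1 would also match keys belonging to A10 or A11.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch: composite "A|B" keys, prefix scan for A->B, side map for B->A. */
public class CompositeKeyIndex {
    private static final char SEP = '|'; // assumed never to occur inside A or B

    // Composite key "A|B" -> associated data C. The TreeMap stands in for the index.
    private final TreeMap<String, String> entries = new TreeMap<>();
    // Reverse index B -> A; each B belongs to exactly one A.
    private final Map<String, String> ownerOfB = new HashMap<>();

    public void put(String a, String b, String c) {
        String previousA = ownerOfB.put(b, a);
        if (previousA != null && !previousA.equals(a)) {
            entries.remove(previousA + SEP + b); // keep the B->A mapping unique
        }
        entries.put(a + SEP + b, c);
    }

    /** A->B: the equivalent of a wildcard search for "A|*". */
    public List<String> findBs(String a) {
        String from = a + SEP;
        String to = a + (char) (SEP + 1); // smallest string above every "A|..." key
        SortedMap<String, String> range = entries.subMap(from, to);
        List<String> result = new ArrayList<>();
        for (String key : range.keySet()) {
            result.add(key.substring(from.length()));
        }
        return result;
    }

    /** B->A: returns the owning A, or null if B is unknown. */
    public String findA(String b) {
        return ownerOfB.get(b);
    }
}
```

With entries for A1/B1, A1/B2 and A10/B3, findBs("A1") returns only B1 and B2. In Lucene terms the range scan corresponds to a prefix query on the key field.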

Any ideas for achieving this implementation would be greatly appreciated.

For reference this email is further to the user forum thread http://community.jboss.org/message/622996#622996

Thanks,
Kapil


_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] multi-mapping with indexing - do we need big-table

Manik Surtani
Hi Kapil - please don't post such questions to this mail list; use the user forums instead.


Re: [infinispan-dev] multi-mapping with indexing - do we need big-table

kapil nayar
Hi Manik,

Actually, I started the thread on the user forum. However, Sanne suggested discussing the use case on the developers' mailing list (see the link to the user-forum discussion at the end of my last email):
"If you have suggestions or interesting use cases they are very welcome on the developer's mailing list, or you can open feature requests directly on JIRA if you have a very clear idea of your need already."

I do find this case interesting, but let me know if it is trivial and we can continue the discussion on the user forum.

Thanks,
Kapil

On Fri, Sep 9, 2011 at 5:56 AM, Manik Surtani <[hidden email]> wrote:
> Hi Kapil - please don't post such questions to this mail list; use the user forums instead.


Re: [infinispan-dev] multi-mapping with indexing - do we need big-table

Manik Surtani
Hi Kapil

After reading through this again, it is indeed an interesting use case.  My comments inline:

On 9 Sep 2011, at 05:23, kapil nayar wrote:

> Now, I am trying to explore if we can implement these mappings with Infinispan.
> We may need a basic multi-map - to store multiple values for the same key in the cache.
>
> 1. The "get" would return the complete list of the values.
> 2. The "put" would add the new value without replacing the existing value.
> 3. The "remove" would remove a specific value or optionally all values associated with the key.
> 4. These operations (especially "put") on the same key can occur simultaneously from multiple nodes.
>
> I know there is an atomic map option in Infinispan which may be applicable, but AFAIK it requires transactions (which we want to avoid..).

The AtomicMap does do this, but will lock the entire map for any operation.  We're working on a FineGrainedMap as well, which will allow concurrent updates to the contents of the map.  See https://issues.jboss.org/browse/ISPN-1115

However, this too is likely to require JTA transactions for consistency.  Could you explain why you wish to avoid transactions?

> Alternatively, perhaps Infinispan (in combination with Lucene) can be used.
> 1. We should be able to create data structure {B, C} and store A-> {B,C} with indexes defined for B.
> 2. Also, the key A could be structured as a combination of A+B to store multiple entries like A1B1->{B1,C1} and A1B2->{B2,C2}. Lucene would allow wildcard searches, e.g. to look for all A1 values we could search for A1*, which should return both A1B1 and A1B2. I may be making some assumptions here (feel free to correct!)

Yes, this should be possible.

> 3. There seems to be one bottleneck though - since the cache mode is "distribution", it seems it is mandatory to use a backend DB to store these indexes and moreover the DB needs to be shared. This requirement actually seems to defeat the purpose of using Infinispan.

Not necessarily.  You can configure Lucene to store indexes in a replicated Infinispan cache as well.  This means the indexes are globally available, and in-memory.  You would need a lot of memory though!  :)

Cheers
Manik

Re: [infinispan-dev] multi-mapping with indexing - do we need big-table

kapil nayar
Thanks Manik.

We want to avoid transactions because of the additional perceptible overhead. Our opinion is that the use case does not literally involve multiple datastores/caches and hence transactions should be avoided as far as possible. Do you have different thoughts/inputs?

If we go with the Lucene option, is there a way to calculate the memory footprint required for the indexes based on the field lengths etc.?
  
Kapil

On Wed, Sep 14, 2011 at 8:22 AM, Manik Surtani <[hidden email]> wrote:
> However this too is likely to require JTA transactions for consistency.  Could you explain why you wish to avoid transactions?
> [...]
> You can configure Lucene to store indexes in a replicated Infinispan cache as well.  This means the indexes are globally available, and in-memory.  You would need a lot of memory though!  :)

Re: [infinispan-dev] multi-mapping with indexing - do we need big-table

Manik Surtani
You should still try the transactions option, using synchronisations instead of full XA.  Most modern transaction managers can optimise this a lot if you are the only resource in the transaction.  This way at least you know for sure whether you can or cannot cope with the overhead of transactions.

On 15 Sep 2011, at 14:39, kapil nayar wrote:

> We want to avoid transactions because of the additional perceptible overhead. Our opinion is that the use case does not literally involve multiple datastores/caches and hence transactions should be avoided as far as possible. Do you have different thoughts/inputs?
