[infinispan-dev] Providing a context for object de-serialization

[infinispan-dev] Providing a context for object de-serialization

Sanne Grinovero-3
Imagine I have a value object which needs to be stored in Infinispan:

class Person {
   final String nationality;
   final String fullName;

   Person(String nationality, String fullName) {
      this.nationality = nationality;
      this.fullName = fullName;
   }
}

And now let's assume that - as you could expect - most Person
instances have the same value for the nationality String, but a
different name.

I want to define a custom Externalizer for my type, but the current
Externalizer API doesn't let me refer to any shared application
context, which could be extremely useful when deserializing this Person
instance:

we could avoid filling the memory of my Grid with multiple copies
of the nationality String repeated all over, when a single String [1]
could be reused.

Would it be a good idea to give Externalizer instances an
initialization phase that receives a ComponentRegistry, so I could look up
some custom service to de-duplicate or otherwise optimize my in-memory
data representation?
Personally I'd prefer to receive it injected via the constructor, so
that I could store it in a final field when my custom Externalizer is
constructed.
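
To make the idea concrete, here is a minimal sketch under stated assumptions, not actual Infinispan API: SketchExternalizer is a hypothetical stand-in for the Externalizer contract (a writeObject/readObject pair), and StringPool is a hypothetical application service. The pool arrives via the constructor, so the field can be final, and readObject swaps each freshly decoded nationality for the pooled instance.

```java
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in mirroring the shape of an Externalizer contract.
interface SketchExternalizer<T> {
   void writeObject(ObjectOutput output, T object) throws IOException;
   T readObject(ObjectInput input) throws IOException, ClassNotFoundException;
}

// Hypothetical application service handing out one canonical instance per value.
final class StringPool {
   private final ConcurrentMap<String, String> canonical = new ConcurrentHashMap<>();

   String canonicalize(String value) {
      String previous = canonical.putIfAbsent(value, value);
      return previous != null ? previous : value;
   }
}

final class Person {
   final String nationality;
   final String fullName;

   Person(String nationality, String fullName) {
      this.nationality = nationality;
      this.fullName = fullName;
   }
}

final class PersonExternalizer implements SketchExternalizer<Person> {
   private final StringPool pool; // final, because it arrives via the constructor

   PersonExternalizer(StringPool pool) {
      this.pool = pool;
   }

   @Override
   public void writeObject(ObjectOutput output, Person person) throws IOException {
      output.writeUTF(person.nationality);
      output.writeUTF(person.fullName);
   }

   @Override
   public Person readObject(ObjectInput input) throws IOException {
      // Re-use the pooled nationality instead of keeping the freshly decoded copy.
      String nationality = pool.canonicalize(input.readUTF());
      return new Person(nationality, input.readUTF());
   }
}
```

With this shape, two Person entries decoded through the same PersonExternalizer end up pointing at one nationality String, provided both went through the same StringPool.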

This is OGM related.

Cheers,
Sanne


1 - or any immutable object: I'm using String as an example, so let's
forget about the static String pool optimizations the JVM might
enable.
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Providing a context for object de-serialization

Mircea Markus-2


----- Original Message -----

> From: "Sanne Grinovero" <[hidden email]>
> To: "infinispan -Dev List" <[hidden email]>
> Sent: Tuesday, June 26, 2012 5:13:28 PM
> Subject: [infinispan-dev] Providing a context for object de-serialization
>
> Would it be a good idea to have the Externalizer instances have an
> initialization phase receiving a ComponentRegistry, so I could look
> up
> some custom service to de-duplicate or otherwise optimize my
> in-memory
> data representation?
> Personally I'd prefer to receive it injected via the constructor so
> that I could use a final field when my custom Externalizer is
> constructed.
+1. That's pretty easy to achieve by registering the Externalizers in the ComponentRegistry (or, simpler, by just running the component registry wiring code on each Externalizer object in the ExternalizerTable). Unfortunately, our internal dependency injection framework doesn't support constructor injection yet.
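
As a rough sketch of the trade-off (all names here are hypothetical, and the wire method merely stands in for whatever wiring step the registry would run): without constructor injection the collaborator has to arrive after construction, so the field cannot be final, and the externalizer should cope with not being wired at all.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical pooling service that the application would register somewhere.
final class CanonicalStrings {
   private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

   String canonicalize(String value) {
      String previous = pool.putIfAbsent(value, value);
      return previous != null ? previous : value;
   }
}

// Hypothetical externalizer helper that is wired after construction.
final class LateWiredExternalizer {
   // Not final: the wiring step assigns it once the object already exists.
   private volatile CanonicalStrings strings;

   // Stand-in for the wiring step; this is not an Infinispan method.
   void wire(CanonicalStrings strings) {
      this.strings = strings;
   }

   // Called on each decoded String; falls back to the decoded copy when unwired.
   String resolve(String decoded) {
      CanonicalStrings current = strings;
      return current == null ? decoded : current.canonicalize(decoded);
   }
}
```

The fallback keeps behaviour unchanged when nothing is wired, at the price of a mutable (here volatile) field.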

Re: [infinispan-dev] Providing a context for object de-serialization

Sanne Grinovero-3
I'll mention another good reason why this would be awesome (on top of
memory consumption):

In one of the OGM tests we saw a significant cost in the
EntityKey#equals method. EntityKey is the key used to store some of
the most frequently accessed elements in the Cache, so it's no wonder
this equals method is heavily exercised.
We improved it a bit with a carefully crafted equals implementation,
but it still has to compare several fields, most of which are the same in
most cases.

Back to our Person example, this would allow me to change an #equals
implementation from

[...]
   if (name == null) {
      if (other.name != null)
         return false;
   } else if (!name.equals(other.name))
      return false;
   if (nationality == null) {
      if (other.nationality != null)
         return false;
   } else if (!nationality.equals(other.nationality))
      return false;

into
   if (nationality != other.nationality)
      return false;
   if (name == null) {
      if (other.name != null)
         return false;
   } else if (!name.equals(other.name))
      return false;

See the dirty trick? String comparison isn't exactly cheap if you
have to run it on hundreds of elements.
Of course, in this example simply comparing name first makes it
unlikely that the second check is ever performed, but guessing the
best order in a real-world scenario is far more complex... plus I
have arrays of Strings to compare.
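
Spelled out as a complete class, the trick could look like the sketch below. It assumes (and nothing here enforces it) that every nationality instance comes from one shared pool; PooledPerson is a hypothetical name used only for illustration.

```java
final class PooledPerson {
   final String nationality; // assumed to always be a canonical, pooled instance
   final String name;

   PooledPerson(String nationality, String name) {
      this.nationality = nationality;
      this.name = name;
   }

   @Override
   public boolean equals(Object obj) {
      if (this == obj)
         return true;
      if (!(obj instanceof PooledPerson))
         return false;
      PooledPerson other = (PooledPerson) obj;
      // Identity check: only valid while every nationality comes from the same pool.
      if (nationality != other.nationality)
         return false;
      return name == null ? other.name == null : name.equals(other.name);
   }

   @Override
   public int hashCode() {
      int result = name == null ? 0 : name.hashCode();
      return 31 * result + (nationality == null ? 0 : nationality.hashCode());
   }
}
```

A copy that bypassed the pool would compare as different even with the same characters, which is why the trick only works if deserialization goes through the pool as well.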


On 26 June 2012 20:05, Mircea Markus <[hidden email]> wrote:

> +1. That's pretty easy to achieve by registering the Externalizers into the ComponentRegistry (or simpler just run the component registry wiring code on each externalizer object in the ExternalizerTable). Unfortunately our internal dependency injection framework doesn't support c-tor injection yet.

Thanks:
https://issues.jboss.org/browse/ISPN-2133

Sanne


Re: [infinispan-dev] Providing a context for object de-serialization

Mircea Markus-2


----- Original Message -----

> From: "Sanne Grinovero" <[hidden email]>
> To: "infinispan -Dev List" <[hidden email]>
> Sent: Tuesday, June 26, 2012 8:59:00 PM
> Subject: Re: [infinispan-dev] Providing a context for object de-serialization
>
> See the dirty trick? String comparison isn't exactly cheap, if you
> have to run it on hundreds of elements.
Yep, especially when the strings are actually equal (but distinct instances): it then takes O(String.length) to compare them.
In your nationality example there's a high chance they are, so equals should be costly indeed.
(I think I'd model nationality as an enum though, but still an excellent point).
 

Re: [infinispan-dev] Providing a context for object de-serialization

Sanne Grinovero-3
On 27 June 2012 12:57, Mircea Markus <[hidden email]> wrote:

> yep, especially when the strings are actual equals, it takes O(String.length) to compare them.
> In your example with nationality there's a high chance they are, so equal should be costly indeed.
> (I think I'd model nationality as an enum though, but still excellent point).

Right; in this specific case an enum would be nice, but it's just an
example I'm making up to save you from a gory explanation of OGM
internals. In the real case the available types are not known to the
application at compile time but depend on the schema being used.


Re: [infinispan-dev] Providing a context for object de-serialization

Galder Zamarreno
In reply to this post by Sanne Grinovero-3

On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:

> Would it be a good idea to have the Externalizer instances have an
> initialization phase receiving a ComponentRegistry, so I could look up
> some custom service to de-duplicate or otherwise optimize my in-memory
> data representation?
> Personally I'd prefer to receive it injected via the constructor so
> that I could use a final field when my custom Externalizer is
> constructed.
>
> This is OGM related.

^ Makes sense, but it only solves one part of the problem.

String is probably a bad example here [as you already said, due to 1], but a better example would be a Nationality class with country name, time zone, etc. in it.

My point is, your suggestion works for the nodes to which data is replicated, but on the original node, where you've created 100 Person instances with Spanish nationality, you'd still potentially have 100 instances.

Did you have anything in mind for this?

Btw, I'm not sure about the need for ComponentRegistry here. IMO, this kind of feature should work for Hot Rod clients too, where Externalizers might be used in the future, and where there's no ComponentRegistry (unless it's a RemoteCacheStore...)


--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



Re: [infinispan-dev] Providing a context for object de-serialization

Sanne Grinovero-3
On 6 July 2012 15:06, Galder Zamarreño <[hidden email]> wrote:

> ^ Makes sense, but only solves one part of the problem.
>
> String is probably a bad example here [as you already said, due to 1], but a better example is if you have a Nationality class with country name, timezone…etc in it.
>
> My point is, your suggestion works for nodes to which data is replicated to, but in the original node where you've created 100 Person instances for Spanish nationality, you'd still potentially have 100 instances.
>
> Did you have anything in mind for this?

That's where the ComponentRegistry's role kicks in: the user
application creates these object instances before storing them on the
original node, and if it is designed with a bit of care it will have
something like a Map of immutable Nationality instances, so that every
time it needs Spanish it looks up the same instance.

Consequently, the custom externalizer implementation needs access to
the same service instance as used by the application, so that it can
make use of the same pool rather than having to create its own pool
instance: the essence of my proposal is really to have the user
application and the Externalizer framework share the same Factory.
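
A minimal sketch of that sharing, with every name hypothetical: NationalityFactory plays the role of the application's Map of immutable instances, and NationalityReader stands in for the read side of a custom externalizer that was handed the same factory. Keying the pool by country name alone is a simplification made for brevity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Immutable value along the lines discussed in the thread (country name, time zone).
final class Nationality {
   final String country;
   final String timeZone;

   Nationality(String country, String timeZone) {
      this.country = country;
      this.timeZone = timeZone;
   }
}

// Hypothetical factory: one instance is shared by the application and the externalizer.
final class NationalityFactory {
   private final ConcurrentMap<String, Nationality> byCountry = new ConcurrentHashMap<>();

   Nationality get(String country, String timeZone) {
      Nationality fresh = new Nationality(country, timeZone);
      Nationality existing = byCountry.putIfAbsent(country, fresh);
      return existing != null ? existing : fresh;
   }
}

// Hypothetical read side of an externalizer; only the already decoded fields are modelled.
final class NationalityReader {
   private final NationalityFactory factory;

   NationalityReader(NationalityFactory factory) {
      this.factory = factory;
   }

   Nationality fromDecodedFields(String country, String timeZone) {
      // Ask the shared factory instead of calling "new Nationality(...)" directly.
      return factory.get(country, timeZone);
   }
}
```

Because application code and the reader go through one factory, the instance the application created for Spanish is the very one that deserialization hands back.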

> Btw, not sure about the need of ComponentRegistry here. IMO, this kind of feature should work for Hot Rod clients too, where Externalizers might be used in the future, and where there's no ComponentRegistry (unless it's a RemoteCacheStore...)

It doesn't need to be literally a ComponentRegistry interface
implementation, just anything which allows the Externalizer to be
initialized using some externally provided service, as in the above
example.

This optimisation should have no functional impact; it's just an
optional trick which saves some memory. So if we can think of a way
to do the same for Hot Rod, that's very cool, but it doesn't
necessarily have to use the same components and (internal) interfaces.

I'm thinking of this as a similar "optionality" to the one we have when
choosing between Serializable and custom Externalizers: people can
plug one in if they know what they're doing (for example, these instances
should definitely be immutable), but everything just works fine if they
don't.
I'm not really sure if there is a wide range of applications, nor do I
have any idea of the amount of memory it could save in practice... just
an idea I wanted to sketch.
I suspect it might allow me to do some cool things with both OGM and
Lucene Directory, as you can rehydrate complex object graphs from
different cache entries, reassembling them with direct references...
dreaming?


Re: [infinispan-dev] Providing a context for object de-serialization

Mircea Markus
On 06/07/2012 22:48, Sanne Grinovero wrote:

> I'm not really sure if there is a wide range of applications, nor have
> any idea of the amount of memory it could save in practice... just an
> idea I wanted to sketch.
I think this might be quite useful; the flyweight pattern [1] was
created to solve exactly this kind of *existing* problem.
Just as a note, there is a simple, not necessarily nice, workaround for
this: make the object pool statically accessible (or, even better, use Enums).

[1] http://en.wikipedia.org/wiki/Flyweight_pattern
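
For illustration only, the workaround could be sketched as below; StaticNationalityPool and KnownNationality are hypothetical names. A static pool needs no injection because any externalizer can reach it directly, and an enum gives the same single-instance guarantee when the set of values is known at compile time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Statically accessible pool: reachable from any externalizer without wiring.
final class StaticNationalityPool {
   private static final ConcurrentMap<String, String> POOL = new ConcurrentHashMap<>();

   private StaticNationalityPool() {
   }

   static String canonical(String value) {
      String previous = POOL.putIfAbsent(value, value);
      return previous != null ? previous : value;
   }
}

// Enum variant: each constant exists exactly once per class loader.
enum KnownNationality {
   ITALIAN, SPANISH
}
```

An externalizer would then call StaticNationalityPool.canonical(decoded) on read, or write the enum name and resolve it with KnownNationality.valueOf(name).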


Re: [infinispan-dev] Providing a context for object de-serialization

Galder Zamarreno
In reply to this post by Sanne Grinovero-3

On Jul 6, 2012, at 11:48 PM, Sanne Grinovero wrote:

> That's where the ComponentRegistry's role kicks in: the user
> application created these object instances before storing them in the
> original node, and if it is a bit cleverly designed it will have
> something like a Map of immutable Nationality instances, so that every
> time it needs Spanish it looks up the same instance.

^ Yeah. The problem I was trying to highlight is what happens to the original instances.

I guess the problem of the original object instances goes away if the rest of the JVM no longer keeps references to them, so they can be garbage collected. This of course depends on the client application, but hints would need to be provided for the OGM case so that users avoid such an anti-pattern, right?

So, assuming no other refs are kept, you're left with the references that the cache holds, which are reduced by the process you explained.

> I suspect it might allow me to do some cool things with both OGM and
> Lucene Directory, as you can rehydrate complex object graphs from
> different cache entries, reassembling them with direct references...
> dreaming?

Not dreaming :). For sure we should focus on the most important use case here, which is OGM. We can always work on extending it to other bits at a later stage.


--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



Re: [infinispan-dev] Providing a context for object de-serialization

Galder Zamarreno
In reply to this post by Mircea Markus

On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:

> On 06/07/2012 22:48, Sanne Grinovero wrote:
>> On 6 July 2012 15:06, Galder Zamarreño <[hidden email]> wrote:
>>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
>>>
>>>> Imagine I have a value object which needs to be stored in Infinispan:
>>>>
>>>> class Person {
>>>>   final String nationality = ...
>>>>   final String fullName = ...
>>>> [constructor]
>>>> }
>>>>
>>>> And now let's assume that - as you could expect - most Person
>>>> instances have the same value for the nationality String, but a
>>>> different name.
>>>>
>>>> I want to define a custom Externalizer for my type, but the current
>>>> Externalizer API doesn't allow to refer to some common application
>>>> context, which might be extremely useful to deserialize this Person
>>>> instance:
>>>>
>>>> we could avoid filling the memory of my Grid by having multiple copies
>>>> of the nationality String repeated all over, when a String [1] could
>>>> be reused.
>>>>
>>>> Would it be a good idea to have the Externalizer instances have an
>>>> initialization phase receiving a ComponentRegistry, so I could look up
>>>> some custom service to de-duplicate or otherwise optimize my in-memory
>>>> data representation?
>>>> Personally I'd prefer to receive it injected via the constructor so
>>>> that I could use a final field when my custom Externalizer is
>>>> constructed.
>>>>
>>>> This is OGM related.
>>> ^ Makes sense, but only solves one part of the problem.
>>>
>>> String is probably a bad example here [as you already said, due to 1], but a better example is if you have a Nationality class with country name, timezone…etc in it.
>>>
>>> My point is, your suggestion works for nodes to which data is replicated to, but in the original node where you've created 100 Person instances for Spanish nationaility, you'd still potentially have 100 instances.
>>>
>>> Did you have anything in mind for this?
>> That's where the ComponentRegistry's role kicks in: the user
>> application created these object instances before storing them in the
>> original node, and if it is a bit cleverly designed it will have
>> something like a Map of immutable Nationality instances, so that every
>> time it needs Spanish it looks up the same instance.
>>
>> Consequently the custom externalizer implementation needs access to
>> the same service instance as used by the application, so that it can
>> make use of the same pool rather than having to create its own pool
>> instance: the essence of my proposal is really to have the user
>> application and the Externalizer framework to share the same Factory.
>>
>>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind of feature should work for Hot Rod clients too, where Externalizers might be used in the future, and where there's no ComponentRegistry (unless it's a RemoteCacheStore...)
>> It doesn't need to be literally a ComponentRegistry interface
>> implementation, just anything which allows the Externalizer to be
>> initialized using some externally provided service as in the above
>> example.
>>
>> This optimisation should have no functional impact but just an
>> optionally implementable trick which saves some memory.. so if we can
>> think of a way to do the same for Hot Rod that's very cool but doesn't
>> necessarily have to use the same components and (internal) interfaces.
>>
>> I'm thinking of this as a similar "optionality" as we have when
>> choosing between Serializable vs. custom Externalizers : people can
>> plug one in if they know what they're doing (like these instances
>> should definitely be immutable) but everything just works fine if you
>> don't.
>> I'm not really sure if there is a wide range of applications, nor have
>> any idea of the amount of memory it could save in practice... just an
>> idea I wanted to sketch.
> I think this might be quite useful; the flyweight pattern[1] was
> created to solve exactly this kind of *existing* problem.
> Just as a note, there is a simple, not necessarily nice, workaround for
> this: make the object pool statically accessible (or even better Enums).

It's wise to avoid static object pools, because they can lead to classloader leaks. Enums might be better...

>
> [1] http://en.wikipedia.org/wiki/Flyweight_pattern
>
>> I suspect it might allow me to do some cool things with both OGM and
>> Lucene Directory, as you can rehydrate complex object graphs from
>> different cache entries, reassembling them with direct references...
>> dreaming?
>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>>
>>>> 1 - or any immutable object: I'm using String as an example so let's
>>>> forget about the static String pool optimizations the JVM might
>>>> enable..
>>> --
>>> Galder Zamarreño
>>> Sr. Software Engineer
>>> Infinispan, JBoss Cache
>>>
>>>

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



Re: [infinispan-dev] Providing a context for object de-serialization

Dan Berindei
On Tue, Jul 10, 2012 at 2:15 PM, Galder Zamarreño <[hidden email]> wrote:

On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:

> On 06/07/2012 22:48, Sanne Grinovero wrote:
>> On 6 July 2012 15:06, Galder Zamarreño <[hidden email]> wrote:
>>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
>>>
>>>> Imagine I have a value object which needs to be stored in Infinispan:
>>>>
>>>> class Person {
>>>>   final String nationality = ...
>>>>   final String fullName = ...
>>>> [constructor]
>>>> }
>>>>
>>>> And now let's assume that - as you could expect - most Person
>>>> instances have the same value for the nationality String, but a
>>>> different name.
>>>>
>>>> I want to define a custom Externalizer for my type, but the current
>>>> Externalizer API doesn't allow to refer to some common application
>>>> context, which might be extremely useful to deserialize this Person
>>>> instance:
>>>>
>>>> we could avoid filling the memory of my Grid by having multiple copies
>>>> of the nationality String repeated all over, when a String [1] could
>>>> be reused.
>>>>
>>>> Would it be a good idea to have the Externalizer instances have an
>>>> initialization phase receiving a ComponentRegistry, so I could look up
>>>> some custom service to de-duplicate or otherwise optimize my in-memory
>>>> data representation?
>>>> Personally I'd prefer to receive it injected via the constructor so
>>>> that I could use a final field when my custom Externalizer is
>>>> constructed.
>>>>
>>>> This is OGM related.
>>> ^ Makes sense, but only solves one part of the problem.
>>>
>>> String is probably a bad example here [as you already said, due to 1], but a better example is if you have a Nationality class with country name, timezone…etc in it.
>>>
>>> My point is, your suggestion works for nodes to which data is replicated, but in the original node where you've created 100 Person instances for Spanish nationality, you'd still potentially have 100 instances.
>>>
>>> Did you have anything in mind for this?
>> That's where the ComponentRegistry's role kicks in: the user
>> application created these object instances before storing them in the
>> original node, and if it is a bit cleverly designed it will have
>> something like a Map of immutable Nationality instances, so that every
>> time it needs Spanish it looks up the same instance.
>>
>> Consequently the custom externalizer implementation needs access to
>> the same service instance as used by the application, so that it can
>> make use of the same pool rather than having to create its own pool
>> instance: the essence of my proposal is really to have the user
>> application and the Externalizer framework to share the same Factory.
>>
>>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind of feature should work for Hot Rod clients too, where Externalizers might be used in the future, and where there's no ComponentRegistry (unless it's a RemoteCacheStore...)
>> It doesn't need to be literally a ComponentRegistry interface
>> implementation, just anything which allows the Externalizer to be
>> initialized using some externally provided service as in the above
>> example.
>>
>> This optimisation should have no functional impact but just an
>> optionally implementable trick which saves some memory.. so if we can
>> think of a way to do the same for Hot Rod that's very cool but doesn't
>> necessarily have to use the same components and (internal) interfaces.
>>
>> I'm thinking of this as a similar "optionality" as we have when
>> choosing between Serializable vs. custom Externalizers : people can
>> plug one in if they know what they're doing (like these instances
>> should definitely be immutable) but everything just works fine if you
>> don't.
>> I'm not really sure if there is a wide range of applications, nor have
>> any idea of the amount of memory it could save in practice... just an
>> idea I wanted to sketch.
> I think this might be quite useful; the flyweight pattern[1] was
> created to solve exactly this kind of *existing* problem.
> Just as a note, there is a simple, not necessarily nice, workaround for
> this: make the object pool statically accessible (or even better Enums).

It's wise to avoid static object pools, because they can lead to classloader leaks. Enums might be better...


Sanne already mentioned in another email that OGM doesn't know the actual data type at compile time, so switching to an enum is definitely not an option.

Although it might work well enough when you know the fields ahead of time, a single static cache does seem a bit simplistic for the general case. I think in general you'd want a cache per field, e.g. so that you can give up on caching once there are too many different values for that field.
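The per-field idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: FieldValueCache and its threshold parameter are hypothetical names, not part of Infinispan or OGM, and the size check is deliberately approximate under concurrency.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical per-field canonicalizing cache: hands back a shared instance
 * for values it has already seen, and stops caching once the field turns
 * out to have too many distinct values.
 */
final class FieldValueCache<T> {
    private final int maxDistinctValues;
    private final ConcurrentMap<T, T> canonical = new ConcurrentHashMap<>();
    private volatile boolean disabled;

    FieldValueCache(int maxDistinctValues) {
        this.maxDistinctValues = maxDistinctValues;
    }

    /** Returns the shared instance equal to value, or value itself once caching is off. */
    T canonicalize(T value) {
        if (value == null || disabled) {
            return value;
        }
        T existing = canonical.get(value);
        if (existing != null) {
            return existing;
        }
        // Approximate check: concurrent callers may overshoot the limit slightly.
        if (canonical.size() >= maxDistinctValues) {
            disabled = true;   // give up on this field for good
            canonical.clear(); // release the memory held by the cache
            return value;
        }
        T raced = canonical.putIfAbsent(value, value);
        return raced != null ? raced : value;
    }

    boolean isDisabled() {
        return disabled;
    }
}
```

One such cache would be created per field, for example one for nationality and none at all for fullName, so a high-cardinality field cannot evict or bloat the entries of a low-cardinality one.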


>
> [1] http://en.wikipedia.org/wiki/Flyweight_pattern
>
>> I suspect it might allow me to do some cool things with both OGM and
>> Lucene Directory, as you can rehydrate complex object graphs from
>> different cache entries, reassembling them with direct references...
>> dreaming?
>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>>
>>>> 1 - or any immutable object: I'm using String as an example so let's
>>>> forget about the static String pool optimizations the JVM might
>>>> enable..



Re: [infinispan-dev] Providing a context for object de-serialization

Sanne Grinovero-2
On 10 July 2012 12:48, Dan Berindei <[hidden email]> wrote:

> On Tue, Jul 10, 2012 at 2:15 PM, Galder Zamarreño <[hidden email]> wrote:
>>
>>
>> On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:
>>
>> > On 06/07/2012 22:48, Sanne Grinovero wrote:
>> >> On 6 July 2012 15:06, Galder Zamarreño <[hidden email]> wrote:
>> >>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
>> >>>
>> >>>> Imagine I have a value object which needs to be stored in Infinispan:
>> >>>>
>> >>>> class Person {
>> >>>>   final String nationality = ...
>> >>>>   final String fullName = ...
>> >>>> [constructor]
>> >>>> }
>> >>>>
>> >>>> And now let's assume that - as you could expect - most Person
>> >>>> instances have the same value for the nationality String, but a
>> >>>> different name.
>> >>>>
>> >>>> I want to define a custom Externalizer for my type, but the current
>> >>>> Externalizer API doesn't allow to refer to some common application
>> >>>> context, which might be extremely useful to deserialize this Person
>> >>>> instance:
>> >>>>
>> >>>> we could avoid filling the memory of my Grid by having multiple
>> >>>> copies
>> >>>> of the nationality String repeated all over, when a String [1] could
>> >>>> be reused.
>> >>>>
>> >>>> Would it be a good idea to have the Externalizer instances have an
>> >>>> initialization phase receiving a ComponentRegistry, so I could look
>> >>>> up
>> >>>> some custom service to de-duplicate or otherwise optimize my
>> >>>> in-memory
>> >>>> data representation?
>> >>>> Personally I'd prefer to receive it injected via the constructor so
>> >>>> that I could use a final field when my custom Externalizer is
>> >>>> constructed.
>> >>>>
>> >>>> This is OGM related.
>> >>> ^ Makes sense, but only solves one part of the problem.
>> >>>
>> >>> String is probably a bad example here [as you already said, due to 1],
>> >>> but a better example is if you have a Nationality class with country name,
>> >>> timezone…etc in it.
>> >>>
>> >>> My point is, your suggestion works for nodes to which data is
>> >>> replicated, but in the original node where you've created 100 Person
>> >>> instances for Spanish nationality, you'd still potentially have 100
>> >>> instances.
>> >>>
>> >>> Did you have anything in mind for this?
>> >> That's where the ComponentRegistry's role kicks in: the user
>> >> application created these object instances before storing them in the
>> >> original node, and if it is a bit cleverly designed it will have
>> >> something like a Map of immutable Nationality instances, so that every
>> >> time it needs Spanish it looks up the same instance.
>> >>
>> >> Consequently the custom externalizer implementation needs access to
>> >> the same service instance as used by the application, so that it can
>> >> make use of the same pool rather than having to create its own pool
>> >> instance: the essence of my proposal is really to have the user
>> >> application and the Externalizer framework to share the same Factory.
>> >>
>> >>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind
>> >>> of feature should work for Hot Rod clients too, where Externalizers might be
>> >>> used in the future, and where there's no ComponentRegistry (unless it's a
>> >>> RemoteCacheStore...)
>> >> It doesn't need to be literally a ComponentRegistry interface
>> >> implementation, just anything which allows the Externalizer to be
>> >> initialized using some externally provided service as in the above
>> >> example.
>> >>
>> >> This optimisation should have no functional impact but just an
>> >> optionally implementable trick which saves some memory.. so if we can
>> >> think of a way to do the same for Hot Rod that's very cool but doesn't
>> >> necessarily have to use the same components and (internal) interfaces.
>> >>
>> >> I'm thinking of this as a similar "optionality" as we have when
>> >> choosing between Serializable vs. custom Externalizers : people can
>> >> plug one in if they know what they're doing (like these instances
>> >> should definitely be immutable) but everything just works fine if you
>> >> don't.
>> >> I'm not really sure if there is a wide range of applications, nor have
>> >> any idea of the amount of memory it could save in practice... just an
>> >> idea I wanted to sketch.
>> > I think this might be quite useful; the flyweight pattern[1] was
>> > created to solve exactly this kind of *existing* problem.
>> > Just as a note, there is a simple, not necessarily nice, workaround for
>> > this: make the object pool statically accessible (or even better Enums).
>>
>> It's wise to avoid static object pools, because they can lead to
>> classloader leaks. Enums might be better...
>>
>
> Sanne already mentioned in another email that OGM doesn't know the actual
> data type at compile time, so switching to an enum is definitely not an
> option.

+1, thanks.

> Although it might work well enough when you know the fields ahead of time, a
> single static cache does seem a bit simplistic for the general case. I think
> in general you'd want a cache per field, e.g. so that you can give up on
> caching once there are too many different values for that field.

Not sure what you mean by fields. I'm not intending to specify how
such a component would need to be designed; what I'd like is to be
able to access my application-provided services from a custom
Externalizer implementation. I would then be able to do something
clever, leaving the clever details to whatever suits the application
best. So I don't think Infinispan should try to enforce any logic,
just expose the integration points.

To talk specifics, I wouldn't do this for user-type fields: as you say
there might be too many distinct values, and my "cache" would need
complex eviction logic. But I know that some specific fields are very
likely to be the same across entries; think of the "table name" field
which records the table an entry relates to, or of column names and
relation roles. So not the values, but still a good boost, likely more
than halving the memory overhead, since each entry carries more
metadata than actual user values.

>> > [1] http://en.wikipedia.org/wiki/Flyweight_pattern

Exactly.

>> >> I suspect it might allow me to do some cool things with both OGM and
>> >> Lucene Directory, as you can rehydrate complex object graphs from
>> >> different cache entries, reassembling them with direct references...
>> >> dreaming?
>> >>
>> >>>> Cheers,
>> >>>> Sanne
>> >>>>
>> >>>>
>> >>>> 1 - or any immutable object: I'm using String as an example so let's
>> >>>> forget about the static String pool optimizations the JVM might
>> >>>> enable..
>
>
>


Re: [infinispan-dev] Providing a context for object de-serialization

Sanne Grinovero-3
In reply to this post by Galder Zamarreno
On 10 July 2012 12:09, Galder Zamarreño <[hidden email]> wrote:

>
> On Jul 6, 2012, at 11:48 PM, Sanne Grinovero wrote:
>
>> On 6 July 2012 15:06, Galder Zamarreño <[hidden email]> wrote:
>>>
>>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
>>>
>>>> Imagine I have a value object which needs to be stored in Infinispan:
>>>>
>>>> class Person {
>>>>  final String nationality = ...
>>>>  final String fullName = ...
>>>> [constructor]
>>>> }
>>>>
>>>> And now let's assume that - as you could expect - most Person
>>>> instances have the same value for the nationality String, but a
>>>> different name.
>>>>
>>>> I want to define a custom Externalizer for my type, but the current
>>>> Externalizer API doesn't allow to refer to some common application
>>>> context, which might be extremely useful to deserialize this Person
>>>> instance:
>>>>
>>>> we could avoid filling the memory of my Grid by having multiple copies
>>>> of the nationality String repeated all over, when a String [1] could
>>>> be reused.
>>>>
>>>> Would it be a good idea to have the Externalizer instances have an
>>>> initialization phase receiving a ComponentRegistry, so I could look up
>>>> some custom service to de-duplicate or otherwise optimize my in-memory
>>>> data representation?
>>>> Personally I'd prefer to receive it injected via the constructor so
>>>> that I could use a final field when my custom Externalizer is
>>>> constructed.
>>>>
>>>> This is OGM related.
>>>
>>> ^ Makes sense, but only solves one part of the problem.
>>>
>>> String is probably a bad example here [as you already said, due to 1], but a better example is if you have a Nationality class with country name, timezone…etc in it.
>>>
>>> My point is, your suggestion works for nodes to which data is replicated, but in the original node where you've created 100 Person instances for Spanish nationality, you'd still potentially have 100 instances.
>>>
>>> Did you have anything in mind for this?
>>
>> That's where the ComponentRegistry's role kicks in: the user
>> application created these object instances before storing them in the
>> original node, and if it is a bit cleverly designed it will have
>> something like a Map of immutable Nationality instances, so that every
>> time it needs Spanish it looks up the same instance.
>
> ^ Yeah. The problem I was trying to highlight is what happens to the original instances.
>
> I guess the problem of the original object instances goes away if no references to them are kept any more by the rest of the JVM, so they can be garbage collected. This is of course dependent on the client application, but hints would need to be provided for the OGM case so that users avoid such an anti-pattern, right?

I intend to use this optimisation only on selected types, specifically
objects which are not exposed to the application: as you say that
would be tricky.

There is no such "original instance", as all instances would be
created by the same factory; that's why I want to share the factory
instance between the Externalizer and the application: for it to work,
they should not have two different pools.
This implies there will always be a single Nationality instance with
value "Spanish" both in the Infinispan internals and the app using it.
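A small sketch of the shared-factory point, assuming a hypothetical NationalityFactory owned by the application and handed to the Externalizer. The names are illustrative only and not an existing API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Immutable value type; instances are meant to come from NationalityFactory only. */
final class Nationality {
    final String name;

    Nationality(String name) {
        this.name = name;
    }
}

/** Hypothetical factory, one instance shared by the application and the Externalizer. */
final class NationalityFactory {
    private final ConcurrentMap<String, Nationality> instances = new ConcurrentHashMap<>();

    /** Returns the single Nationality for this name, creating it on first use. */
    Nationality of(String name) {
        return instances.computeIfAbsent(name, Nationality::new);
    }
}
```

With one factory, application code calling factory.of("Spanish") and a readObject calling factory.of(nameReadFromTheStream) end up with the same object. With two separate factories, each side would hold its own "Spanish" instance, which is the duplication the proposal tries to avoid.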

>
> So, assuming no other refs are kept, you're left with the references that the cache holds, which are reduced by the process you explained.
>
>> Consequently the custom externalizer implementation needs access to
>> the same service instance as used by the application, so that it can
>> make use of the same pool rather than having to create its own pool
>> instance: the essence of my proposal is really to have the user
>> application and the Externalizer framework to share the same Factory.
>>
>>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind of feature should work for Hot Rod clients too, where Externalizers might be used in the future, and where there's no ComponentRegistry (unless it's a RemoteCacheStore...)
>>
>> It doesn't need to be literally a ComponentRegistry interface
>> implementation, just anything which allows the Externalizer to be
>> initialized using some externally provided service as in the above
>> example.
>>
>> This optimisation should have no functional impact but just an
>> optionally implementable trick which saves some memory.. so if we can
>> think of a way to do the same for Hot Rod that's very cool but doesn't
>> necessarily have to use the same components and (internal) interfaces.
>>
>> I'm thinking of this as a similar "optionality" as we have when
>> choosing between Serializable vs. custom Externalizers : people can
>> plug one in if they know what they're doing (like these instances
>> should definitely be immutable) but everything just works fine if you
>> don't.
>> I'm not really sure if there is a wide range of applications, nor have
>> any idea of the amount of memory it could save in practice... just an
>> idea I wanted to sketch.
>> I suspect it might allow me to do some cool things with both OGM and
>> Lucene Directory, as you can rehydrate complex object graphs from
>> different cache entries, reassembling them with direct references...
>> dreaming?
>
> Not dreaming :). For sure we should focus on the most important use case here, which is OGM. We can always work on extending it to other bits at a later stage.
>
>>
>>>
>>>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>>
>>>> 1 - or any immutable object: I'm using String as an example so let's
>>>> forget about the static String pool optimizations the JVM might
>>>> enable..
>>>
>>> --
>>> Galder Zamarreño
>>> Sr. Software Engineer
>>> Infinispan, JBoss Cache
>>>
>>>
>
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
>
>


Re: [infinispan-dev] Providing a context for object de-serialization

Mircea Markus
In reply to this post by Dan Berindei

On 10 Jul 2012, at 12:48, Dan Berindei wrote:

On Tue, Jul 10, 2012 at 2:15 PM, Galder Zamarreño <[hidden email]> wrote:

On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:

> On 06/07/2012 22:48, Sanne Grinovero wrote:
>> On 6 July 2012 15:06, Galder Zamarreño <[hidden email]> wrote:
>>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
>>>
>>>> Imagine I have a value object which needs to be stored in Infinispan:
>>>>
>>>> class Person {
>>>>   final String nationality = ...
>>>>   final String fullName = ...
>>>> [constructor]
>>>> }
>>>>
>>>> And now let's assume that - as you could expect - most Person
>>>> instances have the same value for the nationality String, but a
>>>> different name.
>>>>
>>>> I want to define a custom Externalizer for my type, but the current
>>>> Externalizer API doesn't allow to refer to some common application
>>>> context, which might be extremely useful to deserialize this Person
>>>> instance:
>>>>
>>>> we could avoid filling the memory of my Grid by having multiple copies
>>>> of the nationality String repeated all over, when a String [1] could
>>>> be reused.
>>>>
>>>> Would it be a good idea to have the Externalizer instances have an
>>>> initialization phase receiving a ComponentRegistry, so I could look up
>>>> some custom service to de-duplicate or otherwise optimize my in-memory
>>>> data representation?
>>>> Personally I'd prefer to receive it injected via the constructor so
>>>> that I could use a final field when my custom Externalizer is
>>>> constructed.
>>>>
>>>> This is OGM related.
>>> ^ Makes sense, but only solves one part of the problem.
>>>
>>> String is probably a bad example here [as you already said, due to 1], but a better example is if you have a Nationality class with country name, timezone…etc in it.
>>>
>>> My point is, your suggestion works for nodes to which data is replicated, but in the original node where you've created 100 Person instances for Spanish nationality, you'd still potentially have 100 instances.
>>>
>>> Did you have anything in mind for this?
>> That's where the ComponentRegistry's role kicks in: the user
>> application created these object instances before storing them in the
>> original node, and if it is a bit cleverly designed it will have
>> something like a Map of immutable Nationality instances, so that every
>> time it needs Spanish it looks up the same instance.
>>
>> Consequently the custom externalizer implementation needs access to
>> the same service instance as used by the application, so that it can
>> make use of the same pool rather than having to create its own pool
>> instance: the essence of my proposal is really to have the user
>> application and the Externalizer framework to share the same Factory.
>>
>>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind of feature should work for Hot Rod clients too, where Externalizers might be used in the future, and where there's no ComponentRegistry (unless it's a RemoteCacheStore...)
>> It doesn't need to be literally a ComponentRegistry interface
>> implementation, just anything which allows the Externalizer to be
>> initialized using some externally provided service as in the above
>> example.
>>
>> This optimisation should have no functional impact but just an
>> optionally implementable trick which saves some memory.. so if we can
>> think of a way to do the same for Hot Rod that's very cool but doesn't
>> necessarily have to use the same components and (internal) interfaces.
>>
>> I'm thinking of this as a similar "optionality" as we have when
>> choosing between Serializable vs. custom Externalizers : people can
>> plug one in if they know what they're doing (like these instances
>> should definitely be immutable) but everything just works fine if you
>> don't.
>> I'm not really sure if there is a wide range of applications, nor have
>> any idea of the amount of memory it could save in practice... just an
>> idea I wanted to sketch.
> I think this might be quite useful; the flyweight pattern[1] was
> created to solve exactly this kind of *existing* problem.
> Just as a note, there is a simple, not necessarily nice, workaround for
> this: make the object pool statically accessible (or even better Enums).

It's wise to avoid static object pools, because they can lead to classloader leaks. Enums might be better...


Sanne already mentioned in another email that OGM doesn't know the actual data type at compile time, so switching to an enum is definitely not an option.
The way I understand this, the pool would be required in the custom Externalizer implementation, more specifically in o.i.marshall.Externalizer.readObject.
Even if the types of the objects are not known at compile time, the associated Externalizer implementation, where the caching logic resides, is aware of its fields' types, which can be enums.
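A rough sketch of the enum variant, assuming the set of values is fixed when the Externalizer is written. Country, Resident and ResidentExternalizer are hypothetical names, and the class only mirrors the writeObject/readObject shape rather than implementing the real Infinispan interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical closed set of values, known when the Externalizer is written. */
enum Country { SPAIN, ITALY, ROMANIA }

/** Hypothetical value type with one low-cardinality field and one unique field. */
final class Resident {
    final Country country;
    final String fullName;

    Resident(Country country, String fullName) {
        this.country = country;
        this.fullName = fullName;
    }
}

/** Externalizer-like class; no pool is needed because enum constants are singletons. */
final class ResidentExternalizer {
    void writeObject(ObjectOutput out, Resident r) throws IOException {
        out.writeUTF(r.country.name()); // the name survives reordering of the constants
        out.writeUTF(r.fullName);
    }

    Resident readObject(ObjectInput in) throws IOException {
        Country country = Country.valueOf(in.readUTF()); // same instance for every read
        return new Resident(country, in.readUTF());
    }

    /** Serializes then deserializes, standing in for a transfer to another node. */
    Resident roundTrip(Resident r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                writeObject(out, r);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readObject(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only helps when the possible values can be enumerated in code ahead of time; for values discovered at runtime, such as OGM table or column names, some kind of pool is still needed.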


Although it might work well enough when you know the fields ahead of time, a single static cache does seem a bit simplistic for the general case. I think in general you'd want a cache per field, e.g. so that you can give up on caching once there are too many different values for that field.




Re: [infinispan-dev] Providing a context for object de-serialization

Dan Berindei
In reply to this post by Sanne Grinovero-2
On Thu, Jul 12, 2012 at 2:15 AM, Sanne Grinovero <[hidden email]> wrote:
On 10 July 2012 12:48, Dan Berindei <[hidden email]> wrote:
> On Tue, Jul 10, 2012 at 2:15 PM, Galder Zamarreño <[hidden email]> wrote:
>>
>>
>> On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:
>>
>> > On 06/07/2012 22:48, Sanne Grinovero wrote:
>> >> On 6 July 2012 15:06, Galder Zamarreño <[hidden email]> wrote:
>> >>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
>> >>>
>> >>>> Imagine I have a value object which needs to be stored in Infinispan:
>> >>>>
>> >>>> class Person {
>> >>>>   final String nationality = ...
>> >>>>   final String fullName = ...
>> >>>> [constructor]
>> >>>> }
>> >>>>
>> >>>> And now let's assume that - as you could expect - most Person
>> >>>> instances have the same value for the nationality String, but a
>> >>>> different name.
>> >>>>
>> >>>> I want to define a custom Externalizer for my type, but the current
>> >>>> Externalizer API doesn't allow to refer to some common application
>> >>>> context, which might be extremely useful to deserialize this Person
>> >>>> instance:
>> >>>>
>> >>>> we could avoid filling the memory of my Grid by having multiple
>> >>>> copies
>> >>>> of the nationality String repeated all over, when a String [1] could
>> >>>> be reused.
>> >>>>
>> >>>> Would it be a good idea to have the Externalizer instances have an
>> >>>> initialization phase receiving a ComponentRegistry, so I could look
>> >>>> up
>> >>>> some custom service to de-duplicate or otherwise optimize my
>> >>>> in-memory
>> >>>> data representation?
>> >>>> Personally I'd prefer to receive it injected via the constructor so
>> >>>> that I could use a final field when my custom Externalizer is
>> >>>> constructed.
>> >>>>
>> >>>> This is OGM related.
>> >>> ^ Makes sense, but only solves one part of the problem.
>> >>>
>> >>> String is probably a bad example here [as you already said, due to 1],
>> >>> but a better example is if you have a Nationality class with country name,
>> >>> timezone…etc in it.
>> >>>
>> >>> My point is, your suggestion works for nodes to which data is
>> >>> replicated, but in the original node where you've created 100 Person
>> >>> instances for Spanish nationality, you'd still potentially have 100
>> >>> instances.
>> >>>
>> >>> Did you have anything in mind for this?
>> >> That's where the ComponentRegistry's role kicks in: the user
>> >> application created these object instances before storing them in the
>> >> original node, and if it is a bit cleverly designed it will have
>> >> something like a Map of immutable Nationality instances, so that every
>> >> time it needs Spanish it looks up the same instance.
>> >>
>> >> Consequently the custom externalizer implementation needs access to
>> >> the same service instance as used by the application, so that it can
>> >> make use of the same pool rather than having to create its own pool
>> >> instance: the essence of my proposal is really to have the user
>> >> application and the Externalizer framework to share the same Factory.
>> >>
>> >>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind
>> >>> of feature should work for Hot Rod clients too, where Externalizers might be
>> >>> used in the future, and where there's no ComponentRegistry (unless it's a
>> >>> RemoteCacheStore...)
>> >> It doesn't need to be literally a ComponentRegistry interface
>> >> implementation, just anything which allows the Externalizer to be
>> >> initialized using some externally provided service as in the above
>> >> example.
>> >>
>> >> This optimisation should have no functional impact; it is just an
>> >> optionally implementable trick which saves some memory. So if we can
>> >> think of a way to do the same for Hot Rod that's very cool, but it doesn't
>> >> necessarily have to use the same components and (internal) interfaces.
>> >>
>> >> I'm thinking of this as a similar "optionality" as we have when
>> >> choosing between Serializable vs. custom Externalizers : people can
>> >> plug one in if they know what they're doing (like these instances
>> >> should definitely be immutable) but everything just works fine if you
>> >> don't.
>> >> I'm not really sure if there is a wide range of applications, nor have
>> >> any idea of the amount of memory it could save in practice... just an
>> >> idea I wanted to sketch.
>> > I think this might be quite useful; the flyweight pattern[1] was
>> > created to solve exactly this kind of *existing* problem.
>> > Just as a note, there is a simple, not necessarily nice, workaround for
>> > this: make the object pool statically accessible (or even better Enums).
>>
>> It's wise to avoid static object pools, because they can lead to classloader
>> leak issues. Enums might be better...
>>
>
> Sanne already mentioned in another email that OGM doesn't know the actual
> data type at compile time, so switching to an enum is definitely not an
> option.

+1, thanks.

> Although it might work well enough when you know the fields ahead of time, a
> single static cache does seem a bit simplistic for the general case. I think
> in general you'd want a cache per field, e.g. so that you can give up on
> caching once there are too many different values for that field.

Not sure what you mean by fields. I'm not intending to specify how
such a component would need to be designed, what I'd like is to be
able to access my application-provided services from a custom
Externalizer implementation. I would then be able to do something
clever, but leaving clever details to what is most suited for the
application, so I don't think Infinispan should try to enforce any logic,
just expose the integration points.
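
As a rough illustration of the integration point being asked for, here is a minimal sketch of an externalizer that receives an application-provided service through its constructor. Everything named here is hypothetical: NationalityPool stands in for whatever factory the application already uses, and SimpleExternalizer is a local stand-in with a write/read pair, not the Infinispan Externalizer interface. How the pool actually reaches the constructor is exactly the open question in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical application service: hands out one canonical instance per value.
final class NationalityPool {
    private final ConcurrentMap<String, String> canonical = new ConcurrentHashMap<>();

    String canonicalize(String value) {
        String previous = canonical.putIfAbsent(value, value);
        return previous != null ? previous : value;
    }
}

final class Person {
    final String nationality;
    final String fullName;

    Person(String nationality, String fullName) {
        this.nationality = nationality;
        this.fullName = fullName;
    }
}

// Local stand-in for an externalizer contract (assumption, not the Infinispan API).
interface SimpleExternalizer<T> {
    void writeObject(ObjectOutput output, T object) throws IOException;
    T readObject(ObjectInput input) throws IOException, ClassNotFoundException;
}

final class PersonExternalizer implements SimpleExternalizer<Person> {
    // Injected via the constructor, so the field can be final.
    private final NationalityPool pool;

    PersonExternalizer(NationalityPool pool) {
        this.pool = pool;
    }

    @Override
    public void writeObject(ObjectOutput output, Person person) throws IOException {
        output.writeUTF(person.nationality);
        output.writeUTF(person.fullName);
    }

    @Override
    public Person readObject(ObjectInput input) throws IOException {
        // Reuse the shared instance instead of keeping the freshly read copy.
        String nationality = pool.canonicalize(input.readUTF());
        String fullName = input.readUTF();
        return new Person(nationality, fullName);
    }
}

final class ExternalizerDemo {
    // Serializes and deserializes a Person through the given externalizer.
    static Person roundTrip(PersonExternalizer externalizer, Person person) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                externalizer.writeObject(out, person);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return externalizer.readObject(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the application and the externalizer share the same pool instance, a Person created locally and a Person deserialized from the wire end up referencing the same nationality object. Without the pool, nothing changes functionally; each instance just keeps its own copy.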


I meant a regular Java field, since that's what the Externalizer deals with. But what I had in mind was a generic Externalizer for user-supplied classes (registered at runtime), so the externalizer would need to get the field metadata from a central registry and, based on the current conditions, decide whether to cache the deserialized value or not.

I think we all agree that Infinispan should not be concerned about how exactly this will be implemented. The discussion seems to be around whether there really is a need for such a smarter externalizer.
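
To make the per-field idea concrete (cache a field's values, but give up once there are too many distinct ones), here is a small sketch. BoundedFieldInterner and its maxDistinct threshold are made-up names for illustration only; a real generic externalizer would also need the field metadata registry mentioned above, which is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field interner: deduplicates values until the field turns out
// to have too many distinct values, then stops caching and drops what it held.
final class BoundedFieldInterner<T> {
    private final int maxDistinct;
    private Map<T, T> seen = new HashMap<>(); // null once caching has been abandoned

    BoundedFieldInterner(int maxDistinct) {
        this.maxDistinct = maxDistinct;
    }

    synchronized T intern(T value) {
        if (value == null || seen == null) {
            return value; // nothing to deduplicate, or caching already given up
        }
        T existing = seen.get(value);
        if (existing != null) {
            return existing; // reuse the instance seen earlier
        }
        if (seen.size() >= maxDistinct) {
            seen = null; // too many distinct values: release the cache for good
            return value;
        }
        seen.put(value, value);
        return value;
    }

    synchronized boolean isCaching() {
        return seen != null;
    }
}
```

One instance per field keeps a high-cardinality field such as fullName from evicting or bloating the cache used for a low-cardinality field such as nationality.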

 
To talk specifics, I wouldn't do this per user-type field: as you say
I might have too many, and my "cache" would need complex eviction logic.
But I know that some specific fields are all very likely the same;
think of "table name" for example, when storing the field "which
table this entry is related to", as well as column names and relation
roles. So not the values, but still a good boost, and likely more than
halving the memory overhead, as for each entry we have more metadata
than actual user values.


The table name example isn't convincing enough; I think String.intern() would actually be a great fit here as you don't really need eviction :)
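
For reference, a tiny sketch of what String.intern() gives you in the table name case. InternDemo is a hypothetical helper; the only real API used is String.intern(), which returns the canonical instance from the JVM string pool.

```java
final class InternDemo {
    // True when both arguments resolve to the same canonical pooled instance.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }
}
```

Two separately deserialized copies of the same table name are distinct objects, but their interned forms are one and the same reference, with no application-side pool to manage. This only covers Strings, not arbitrary immutable objects.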


>> > [1] http://en.wikipedia.org/wiki/Flyweight_pattern

Exactly.

>> >> I suspect it might allow me to do some cool things with both OGM and
>> >> Lucene Directory, as you can rehydrate complex object graphs from
>> >> different cache entries, reassembling them with direct references...
>> >> dreaming?
>> >>
>> >>>> Cheers,
>> >>>> Sanne
>> >>>>
>> >>>>
>> >>>> 1 - or any immutable object: I'm using String as an example so let's
>> >>>> forget about the static String pool optimizations the JVM might
>> >>>> enable..
>
