Re: [infinispan-dev] Extend GridFS


Re: [infinispan-dev] Extend GridFS

Yuri de Wit
Hi Galder,

Thanks for your reply. Let me continue this discussion here first to validate my thinking before I create any issues in JIRA (forgive me for the lengthy follow-up).

First of all, thanks for this wonderful project! I started looking into Ehcache as the default caching implementation, but found it lacking in some key features when using JGroups. My guess is that all the development there is going towards the Terracotta distribution instead of JGroups. Terracotta does seem like a wonderful product, but I was hoping to stick to a JGroups-based caching impl, so I was happy to have found Infinispan.

I need to create a distributed cache that loads data from the file system. It's a tree of folders/files containing mostly metadata info that changes seldom, but does change. Our mid-term goal is to move the metadata away from the file system and into a database, but that is not feasible now due to a tight deadline and the risk of refactoring too much of the code base.

So I was happy to see the GridFilesystem implementation in Infinispan and the fact that clustered caches can be lazily populated (the metadata tree in the FS can be large, and having all nodes in the cluster preloaded with all the data would not work for us). However, it defines its own persistence scheme with specific file names and serialized buckets, which would require us to use a cache-aside strategy to read our metadata tree and populate the GridFilesystem with it.

What I am looking for is to be able to plug into the GridFilesystem a new FileCacheStore that can load directly from an existing directory tree, transparently. This would basically lazy-load FS content across the cluster automatically, without having to pre-populate the GridFilesystem programmatically.

At first I was hoping to extend the existing FileCacheStore to support this (hence my asking for a GridInputStream.skip()/available() implementation and for the constructors to be protected instead of package-private), but I later realized that what I needed was an entirely new implementation, since the buckets abstraction there is not really appropriate.

The good news is that I am close to 75% complete with the impl here. It is working beautifully, with a few caveats, on a single node, but I am facing some issues trying to launch a second node in the cluster (most of it my ignorance, I am sure).

** Do you see any issues with this approach that I am not aware of?

In addition, I am having a couple of issues launching the second node in the cluster: a couple of NPEs and a "java.net.NoRouteToHostException: No route to host". I will send the details of these exceptions in a follow-up email.

This is where I am stuck at the moment. In my setup I have two configuration files:
* cache-master.xml
* cache-slave.xml
Both define the data and metadata caches required by GridFilesystem, but -master.xml configures the custom FileCacheStore I implemented and -slave.xml uses the ClusterCacheLoader.

These are some of the items/todos for this custom FileCacheStore impl:
** Implement chunked writes, with a special chunking protocol to signal when the last chunk has been delivered
** Custom configuration to simplify setup for GridFilesystem.

regards,
-- yuri







One exception is supporting a safe chunked write: for now I am sending the whole file content when writing to the cache, since a chunked write would require additional changes to GridFS, such as a protocol to let the loader know that the current chunk is the last one, so it can finally update the underlying file as a whole.
The idea is that a cache get can be parsed and translated into a file read on the real file system.


Any chance to implement the skip() and available() methods in GridInputStream, or to make the constructors in the GridFilesystem package public so I can easily extend them?

I am trying to plug in a custom FileMetadataCacheStore and FileDataCacheStore implementation under the metadata/data caches used by the GridFS so that loading from an existing FS is completely transparent and lazy (I'll be happy to contribute if it makes sense). The problem is that any BufferedInputStream wrapped around the GridInputStream calls available()/skip(), but they are not implemented in GridFS.
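
For illustration, a minimal sketch of what those two methods could look like, assuming the stream tracks its current offset and knows the total file length from the file's Metadata entry (the class and field names below are made up for the example, not the actual org.infinispan.io code):
----------------------------------------------------------------------
import java.io.IOException;
import java.io.InputStream;

// Sketch only: a chunk-backed stream with the two methods GridInputStream lacks.
// 'position' and 'length' are assumed fields, not the real Infinispan ones.
abstract class ChunkedInputStreamSketch extends InputStream {
    protected long position; // current offset into the logical file
    protected long length;   // total file size, known from the file's Metadata entry

    @Override
    public int available() throws IOException {
        // bytes left between the current offset and the end of the file
        return (int) Math.min(Integer.MAX_VALUE, Math.max(0, length - position));
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) return 0;
        long skipped = Math.min(n, length - position);
        position += skipped; // chunks are fetched lazily, so only the offset moves
        return skipped;
    }
}
----------------------------------------------------------------------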

Do you also see any issues with the above approach?

regards,


On Mon, Jul 11, 2011 at 12:06 PM, galderz <[hidden email]> wrote:

> Hey Yuri,
>
> Why do you need two new file based stores? Can't you plug Infinispan with a file based cache store to give you FS persistence?
>
> Anyway, I'd suggest you discuss it in the Infinispan dev list (http://lists.jboss.org/pipermail/infinispan-dev/) and in parallel, create an issue in https://issues.jboss.org/browse/ISPN
>
> Cheers,
> Galder
>
> --
> Reply to this email directly or view it on GitHub:
> http://github.com/inbox/9770320#reply
>

Re: [infinispan-dev] Extend GridFS

Yuri de Wit
Following up on my setup for the GridFilesystem + custom FileCacheStore...

There is a single node (master) that is responsible for reading/writing to the file system. There will be many more nodes (slaves) that will fetch file content when needed. So my setup has two configuration files:
* cache-master.xml, and
* cache-slave.xml

The cache-master.xml defines (using the std jgroups-udp.xml conf):
----------------------------------------------------------------------
        <namedCache name="type-metadata">
                <clustering mode="replication">
                        <stateRetrieval timeout="20000" fetchInMemoryState="true"
                                alwaysProvideInMemoryState="true" />
                        <sync replTimeout="20000" />
                </clustering>
                <loaders passivation="false" shared="true" preload="true">
                        <loader class="com.my.cache.loaders.FileMetadataCacheStore"
                                fetchPersistentState="false" purgeOnStartup="false">
                                <properties>
                                        <property name="location" value="/data" />
                                </properties>
                        </loader>
                </loaders>
        </namedCache>

        <namedCache name="type-data">
                <clustering mode="invalidation">
                        <sync replTimeout="20000" />
                </clustering>
                <loaders passivation="false" shared="true" preload="false">
                        <loader class="com.my.cache.loaders.FileDataCacheStore"
                                fetchPersistentState="false" purgeOnStartup="false">
                                <properties>
                                        <property name="location" value="/data" />
                                </properties>
                        </loader>
                </loaders>
        </namedCache>
----------------------------------------------------------------------
And here is the cache-slave.xml (also using the std jgroups-udp.xml conf):
----------------------------------------------------------------------
        <namedCache name="type-metadata">
                <clustering mode="replication">
                        <stateRetrieval timeout="20000" fetchInMemoryState="true"
                                alwaysProvideInMemoryState="true" />
                        <sync replTimeout="20000" />
                </clustering>
                <loaders preload="true">
                        <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
                                <properties>
                                        <property name="remoteCallTimeout" value="20000" />
                                </properties>
                        </loader>
                </loaders>
        </namedCache>

        <namedCache name="type-data">
                <clustering mode="invalidation">
                        <sync replTimeout="20000" />
                </clustering>
                <loaders preload="false">
                        <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
                                <properties>
                                        <property name="remoteCallTimeout" value="20000" />
                                </properties>
                        </loader>
                </loaders>
        </namedCache>
----------------------------------------------------------------------
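
For orientation, a minimal sketch of how a node could wire these two caches into a GridFilesystem, assuming the org.infinispan.io API of that era (the class name, chunk size and println are illustrative; a slave would simply load cache-slave.xml instead):
----------------------------------------------------------------------
import org.infinispan.Cache;
import org.infinispan.io.GridFile;
import org.infinispan.io.GridFilesystem;
import org.infinispan.manager.DefaultCacheManager;

public class GridFsBootstrap {
    public static void main(String[] args) throws Exception {
        // The master loads cache-master.xml (file-backed stores); a slave would
        // load cache-slave.xml (ClusterCacheLoader). Cache names match the configs above.
        DefaultCacheManager cm = new DefaultCacheManager("cache-master.xml");
        Cache<String, byte[]> data = cm.getCache("type-data");
        Cache<String, GridFile.Metadata> metadata = cm.getCache("type-metadata");

        // 8192 is an illustrative chunk size; it must match whatever the custom
        // FileDataCacheStore is configured with (see later in the thread).
        GridFilesystem fs = new GridFilesystem(data, metadata, 8192);
        System.out.println("GridFilesystem ready: " + fs);
    }
}
----------------------------------------------------------------------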

The master starts up fine, but when starting the 1st slave I get:

java.net.NoRouteToHostException: No route to host
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
        at java.net.Socket.connect(Socket.java:529)
        at org.jgroups.util.Util.connect(Util.java:276)
        at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:510)
        at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:462)
        at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:223)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:189)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
        at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:613)
        at org.jgroups.protocols.UNICAST.up(UNICAST.java:294)


Any ideas?

thanks,
-- yuri



Re: [infinispan-dev] Extend GridFS

Bela Ban
In reply to this post by Yuri de Wit


On 7/12/11 2:58 AM, Yuri de Wit wrote:

> Hi Galder,
>
> Thanks for your reply. Let me continue this discussion here first to
> validate my thinking before I create any issues in JIRA (forgive me
> for the lengthy follow up).
>
> First of all, thanks for this wonderful project! I started looking
> into Ehcache as the default caching implementation, but found it
> lacking on some key features when using JGroups. My guess is that all
> the development there is going towards the Terracotta distribution
> instead of JGroups. Terracotta does seems like a wonderful product,
> but I was hoping to stick to JGroups based caching impl. So I was
> happy to have found Infinispan.


Yes, I guess Terracotta (the company) has no interest in using something
other than Terracotta (the product) in ehcache, let alone supporting a
competitor...

However, I heard that recently the JGroups plugin for ehcache (which
used to be terrible) was updated by some outside contributor...


> I need to create a distributed cache that loads data from the file
> system. It's a tree of folders/files containing mostly metadata info
> that changes seldom, but changes. Our mid-term goal is to move the
> metadata away from the file system and into a database, but that is
> not feasible now due to a tight deadline and the risks of refactoring
> too much of the code base.
>
> So I was happy to see the GridFilesystem implementation in Infinispan
> and the fact that clustered caches can be lazily populated (the
> metadata tree in the FS can be large and having all nodes in the
> cluster preloaded with all the data would not work for us).


Note that I wrote GridFS as a prototype in JGroups, and then Manik copied it over to Infinispan. Code quality is beta and not all methods have been implemented. So, in short, GridFS needs some work before it can be used in production!


>  However,
> it defines it's own persistence scheme with specific file names and
> serialized buckets, which would require us to have a cache-aside
> strategy to read our metadata tree and populate the GridFilesystem
> with it.
>
> What I am looking for is to be able to plug into the GridFilesystem a
> new FileCacheStore that can load directly from an existing directory
> tree, transparently. This will basically automatically lazy load FS
> content across the cluster without having to pre-populate the
> GridFilesystem programatically.


Interesting... I guess that loader would have to know the mapping of
files to chunks, e.g. if a file is 10K, and the chunk size 2k, then a
get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from
/home/bela/dump.txt' from the file system and return it, unless it's in
the local cache.

This requires that your loader knows the chunk size and the
mapping/naming between files and chunks...

Hmm. Perhaps the mapping can be more intuitive? Maybe instead of the chunk number, the suffix should incorporate the index (in bytes), e.g. /home/bela/dump.txt.#6000?

Also, a put() on the cache loader would have to update the real file,
and *not* store a chunk named "/home/bela/dump.txt.#3"...
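
To make that mapping concrete, a rough sketch of such a loader's read path (all names are illustrative, and whether chunk numbers are 0- or 1-based has to follow whatever GridFS actually emits):
----------------------------------------------------------------------
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch only: translate a GridFS chunk key such as "/home/bela/dump.txt.#3"
// into a read of the corresponding byte range from the real file.
class ChunkReader {
    private final int chunkSize; // must match the GridFilesystem's chunk size

    ChunkReader(int chunkSize) { this.chunkSize = chunkSize; }

    byte[] loadChunk(String key) throws IOException {
        int sep = key.lastIndexOf(".#");
        if (sep < 0) throw new IllegalArgumentException("not a chunk key: " + key);
        String path = key.substring(0, sep);
        int chunkNumber = Integer.parseInt(key.substring(sep + 2));
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long offset = (long) chunkNumber * chunkSize; // 0-based indexing assumed
            if (offset >= raf.length()) return new byte[0];
            int len = (int) Math.min(chunkSize, raf.length() - offset);
            byte[] chunk = new byte[len];
            raf.seek(offset);
            raf.readFully(chunk);
            return chunk;
        }
    }
}
----------------------------------------------------------------------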




--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat

Re: [infinispan-dev] Extend GridFS

Bela Ban
In reply to this post by Yuri de Wit
If you enable tracing for org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER, you should see why the socket cannot be created. The NoRouteToHost exception might be misleading (but do check your routing table anyway!); it might be a missing jgroups.bind_addr system property, which is used by STREAMING_STATE_TRANSFER.
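
For reference, the same properties can also be pinned from code before the channel (i.e. the cache manager) is created; the class name and address below are just examples:
----------------------------------------------------------------------
public class BindAddrExample {
    public static void main(String[] args) {
        // Equivalent to -Djgroups.bind_addr=... on the command line; must be set
        // before the JGroups channel (i.e. the CacheManager) is created.
        System.setProperty("jgroups.bind_addr", "192.168.1.102");
        System.setProperty("java.net.preferIPv4Stack", "true");
        // ... create the DefaultCacheManager / GridFilesystem after this point
    }
}
----------------------------------------------------------------------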



--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat

Re: [infinispan-dev] Extend GridFS

Yuri de Wit
In reply to this post by Bela Ban
Hi Bela,

Thanks for the note re: status. I read the experimental note in the docs and also noticed that some of the methods were not implemented (e.g. InputStream.skip() and InputStream.available()). Refactoring our code base is a bigger risk at this point, and what I was able to get working so far is promising.

> Interesting... I guess that loader would have to know the mapping of
> files to chunks, e.g. if a file is 10K, and the chunk size 2k, then a
> get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from
> /home/bela/dump.txt' from the file system and return it, unless it's in
> the local cache.

Correct. This is exactly how I implemented the store after looking
into the existing FileCacheStore. It is basically a base
FileCacheStore extending LockSupportCacheStore and two subclasses:
FileMetadataCacheStore and FileDataCacheStore. The first subclass
returns Metadata entries and the second one returns byte[] chunks.

> This requires that your loader knows the chunk size and the
> mapping/naming between files and chunks...

Right now I am setting the preferred chunk size in the FileDataCacheStore properties in my config file, and when I instantiate the GridFilesystem (traversing the configs in getCacheLoaderManagerConfig) I pass in the same value as the GridFilesystem's default chunk size.

> Hmm. Perhaps the mapping can be more intuitive ? Maybe instead of the
> chunk number, the suffix should incorporate the index (in bytes), e.g.
> /home/bela/dump.txt.#6000 ?

Interesting. This could be a bit more reliable, but it wouldn't eliminate the need to define the chunk size. In theory, the OutputStream chunk size could be different from the input chunking, with the former client-driven and the latter loader-driven. However, I am not sure of the actual benefits, and maybe a single chunk size for the cluster could be good enough.

Chunking writes is a bit more complex, since you don't want to write chunk #5 and then have the client node stop writing to the OutputStream, for instance (or multiple clients writing at the same time). For now I have disabled write chunking (worst case, the slaves are read-only and writes go only through the master), but I could envision a protocol where the chunks are written to a temp file keyed by a unique client stream id, triggered by an OutputStream.close(). A close would push a 'closing' chunk, with or without actual data, that would replace the original file on disk. The locking scheme in LockSupportCacheStore would make sure there is some FS protection, and the last client closing the stream would win if multiple clients are writing to the same file at once (or maybe an explicit lock using the Cache API?).
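
A minimal sketch of that proposed protocol, with every name hypothetical (none of this exists in GridFS today):
----------------------------------------------------------------------
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Sketch of the write protocol proposed above; all names are hypothetical.
// Chunks for an open stream accumulate in "<path>.<streamId>.tmp"; the final
// ("closing") chunk promotes the temp file to the real one.
class ChunkedWriteProtocol {
    void storeChunk(String path, String streamId, int chunkNumber, int chunkSize,
                    byte[] data, boolean closing) throws IOException {
        File tmp = new File(path + "." + streamId + ".tmp");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.seek((long) chunkNumber * chunkSize);
            raf.write(data);
        }
        if (closing) {
            // Last chunk for this stream: replace the real file. Concurrent writers
            // would be arbitrated by the store's locking scheme; last close wins.
            Files.move(tmp.toPath(), new File(path).toPath(),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
----------------------------------------------------------------------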

Another issue I found is with creating directories, though it is most likely an issue with my rewrite. A new GridFile could become either a folder or a file, so the Metadata must have no flags set, in order to (1) mimic the behavior of a real File and (2) make sure the impl can properly implement mkdir(), mkdirs() and exists().

>
> Also, a put() on the cache loader would have to update the real file,
> and *not* store a chunk named "/home/bela/dump.txt.#3"...
>
This is already taken care of by the new implementation I created. The only caveat is the chunking-on-write issue described above.

I have cloned the infinispan project on GitHub and would be happy to commit the changes somewhere there so you could take a peek, if interested.

One last note is regarding configuration. It seems that the metadata
cache has to use full replication (or at least it would make the most
sense) and the data cache has to use distribution mode. It took me a
few rounds to get it somewhat working (again, I am still learning the
product and JGroups), but it seems that some of this configuration
could be hidden from the user. Food for thought and the least of my
concerns right now.

Let me know what you think.

- yuri




Re: [infinispan-dev] Extend GridFS

Yuri de Wit
In reply to this post by Bela Ban
I got past the NoRouteToHostException by adding the following properties:

-Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=192.168.1.102

I guess it was selecting the wrong interface by default.

thanks,
-- yuri



Re: [infinispan-dev] Extend GridFS

Yuri de Wit
In reply to this post by Yuri de Wit
Ok, I made good progress today, and I have close to 100% of my existing app running on top of GridFS, loading chunked data directly from our existing data files. I have made bug fixes and enhancements to the org.infinispan.io package, and the rest of the implementation went into the three new classes I mentioned before: FileCacheStore, FileDataCacheStore and FileMetadataCacheStore. I basically got the functionality right running on a single node, and now I want to start testing in a cluster.

One of the todos is to implement chunked writes, and I will see if I can tackle that later this week. Another todo that seems worth exploring is GZipping the chunks, either as soon as they are read from the file system or only when marshalling and sending them over the wire. Are there ways to plug that in? I saw something like StreamingMarshaller that could decouple this from the actual GridFilesystem.
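
As a point of comparison, gzipping a chunk in the store itself is only a few lines; whether the store, a custom marshaller, or the transport is the right place is exactly the open question here (the class below is illustrative):
----------------------------------------------------------------------
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Illustrative only: gzip a chunk before it is put into the data cache.
final class ChunkCompressor {
    static byte[] gzip(byte[] chunk) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(chunk.length);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(chunk);
        }
        return bos.toByteArray();
    }
}
----------------------------------------------------------------------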

-- yuri



Re: [infinispan-dev] Extend GridFS

Bela Ban
In reply to this post by Yuri de Wit


On 7/12/11 4:30 PM, Yuri de Wit wrote:
> Hi Bela,

>> Interesting... I guess that loader would have to know the mapping of
>> files to chunks, e.g. if a file is 10K, and the chunk size 2k, then a
>> get("/home/bela/dump.txt.#3") would mean 'read the 3rd chunk from
>> /home/bela/dump.txt' from the file system and return it, unless it's in
>> the local cache.
>
> Correct. This is exactly how I implemented the store after looking
> into the existing FileCacheStore. It is basically a base
> FileCacheStore extending LockSupportCacheStore and two subclasses:
> FileMetadataCacheStore and FileDataCacheStore. The first subclass
> returns Metadata entries and the second one returns byte[] chunks.


OK


>> This requires that your loader knows the chunk size and the
>> mapping/naming between files and chunks...
>
> Right now I am setting the preferred chunk size in the
> FileDataCacheStore properties in my config file and when I instantiate
> the GridFilesystem (traversing the configs in
> getCacheLoaderManagerConfig) I pass in the same value as the default
> chunk size there


OK, makes sense.


>> Hmm. Perhaps the mapping can be more intuitive ? Maybe instead of the
>> chunk number, the suffix should incorporate the index (in bytes), e.g.
>> /home/bela/dump.txt.#6000 ?
>
> Interesting. This could be a bit more reliable, but it wouldnt
> eliminate the need to define the chunk size. In theory, OutputStream
> chunk size could be different than input chunking and the former could
> be client driven and the latter loader driven. However, I am not sure
> the actual benefits and maybe a single chunk size for the cluster
> could be good enough.


Yes, I agree. If you wanted different chunk sizes for different files, you could always store the chunk size in the metadata for a given file, though.


> Chunking writes is a bit more complex since you don't want to write
> chunk #5 and have the client node stop writing to the OutputStream,
> for instance (or multiple clients writing at the same time). For now I
> have disabled write chunking (worse case scenario the slaves are
> read-only and writes only through master), but I could envision a
> protocol where the chunks are written to a temp file based on a unique
> client stream id and triggered by an OutputStream.close(). A close
> would push a 'closing' chunk with or without actual data that would
> replace the original file on disk. The locking scheme on
> LockSupportCacheStore would make sure there is some FS protection and
> the last client closing the stream would win if multiple clients are
> writing to the same file at once (or maybe an explicit lock using the
> Cache API?).


OK


> Another issue I found is with creating directories, but most likely
> with my rewrite. A new GridFile could become a folder or a file so the
> Metadata must have no flags set to (1) mimic the behavior of a real
> File and to (2) make sure the impl can properly implement mkdir(),
> mkdirs() and exists().


Yep, as I said the impl is incomplete...


> I have cloned the infinispan project on github and would be happy to
> commit the changes somewhere there so you could take a peak, if
> interested.


Interested, yes, but I have no time to look at this... :-( I'm busy working on JGroups 3.0, which should reach beta1 soon...

I hope, though, that your changes go into the Infinispan Git repo, and maybe you should publish an article about this on InfoQ?

> One last note is regarding configuration. It seems that the metadata
> cache has to use full replication (or at least it would make the most
> sense) and the data cache has to use distribution mode.


Yes, that's the idea. Metadata should be small(ish), so full replication
is warranted. This of course also depends on what we cram into metadata;
if it becomes too big, or we have many small files, then it might make
sense to switch to distribution. Anyway, at the end of the day, this is
a configuration issue and doesn't require code changes.


--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat

Re: [infinispan-dev] Extend GridFS

Bela Ban
In reply to this post by Yuri de Wit


On 7/13/11 4:24 AM, Yuri de Wit wrote:
> Ok, I made good progress today and I have close to 100% of my existing
> app running on top of GridFS loading chunked data directly from our
> existing data files. I have made bug fixes and enhancements to the
> org.infinispan.io package and the rest of the implementation went into
> the 3 new classes I mentioned before: FileCacheStore,
> FileDataCacheStore and FileMetadataCacheStore.


Excellent !

I suggest opening a pull request against Infinispan, so the changes make it into the code base.


>  I basically got the
> functionality right by running single node and now I want to start
> testing in a cluster.


OK


> One of the todo's is to implement chunked writes and I will see if I
> can tackle that later this week. Another todo that seems worth
> exploring is GZipping the chunks as soon as it is read from the file
> system or only when marshalling and sending it over the wire.

Yes. One option is to add COMPRESS to the JGroups protocol stack used. Enabling tracing for org.jgroups.protocols.COMPRESS lets you see the compression ratio, to confirm that it makes sense to use compression. Of course this depends on the type of data you put into the virtual file system.


> Are there ways to plug that in? I saw something like StreamingMarshaller
> that could decouple this from the actual GridFilesystem.

Yes, you could always do this at the Infinispan level (or even at the
level of GridFS, but that would mean duplicating functionality).

--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
_______________________________________________
infinispan-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/infinispan-dev