
All times are UTC - 5 hours [ DST ]



 Post subject: Hibernate Search : Load existing index into RamDirectory?
PostPosted: Fri Oct 23, 2009 7:17 am 
Regular

Joined: Thu Oct 08, 2009 10:34 am
Posts: 55
Hi guys,
I have created my index on disk, but I would also like to load it into memory on demand (I'm just playing around with Hibernate Search at the moment).

I would have thought that configuring HS like so
Code:
<property name="hibernate.search.default.directory_provider" value="org.hibernate.search.store.RAMDirectoryProvider" />
<property name="hibernate.search.default.indexBase" value="C:/lucene/indexes" />       


would make HS load the index into memory automatically under the hood, but it doesn't. I see that Lucene offers this ability through its API, so how do I do it in HS?
http://lucene.apache.org/java/2_4_0/api ... io.File%29

I also see an earlier ticket on this topic, but nothing has been done on it:
http://opensource.atlassian.com/project ... SEARCH-135
My index takes a lot of time to build. I don't want to re-create it every time I want to use the RAMDirectory, so it would be nice to be able to load an existing index into it.
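
For illustration only: the request above boils down to "copy the files of an existing index directory into memory once, instead of rebuilding them". The sketch below is not Lucene or Hibernate Search code; it uses only the JDK to show that idea, and `InMemoryIndexCopy` and `loadAll` are hypothetical names. In Lucene 2.4 the equivalent step would go through the `RAMDirectory` constructor linked above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: copies every regular file of a directory into memory. */
final class InMemoryIndexCopy {

    private InMemoryIndexCopy() {
    }

    /** Reads each regular file directly under indexDir into a name -> bytes map. */
    static Map<String, byte[]> loadAll(Path indexDir) {
        Map<String, byte[]> files = new HashMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(indexDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files.put(entry.getFileName().toString(), Files.readAllBytes(entry));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

The copy is a snapshot: anything changed in the map afterwards is not reflected on disk unless it is written back explicitly.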

One other thing I couldn't find anything on: when using the FSDirectoryProvider, is there any way to configure it so that the index may take up to xyz MB in the VM, with the rest read from disk? So if you have configured your directory to allow 100 MB in memory, anything above that is evicted back to disk. I'm looking for something like what JBoss Cache has with its cache loaders: you can allow a certain amount to be loaded in a region, and if that is exceeded, entries are evicted from memory. If something doesn't exist in memory, it goes to disk and retrieves it.

So, sort of a hybrid between the RAMDirectory and FSDirectory. I have looked at "ram_buffer_size" and the rest of the properties, but they don't seem to be what I'm looking for. Does this functionality exist?
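
To make the "hybrid" idea concrete, here is a minimal JDK-only sketch of a bounded cache that keeps a limited number of blocks in memory and reloads evicted ones from a slower source. It is an assumption-laden illustration, not an existing Hibernate Search or Lucene feature: `BoundedBlockCache` and `diskLoader` are hypothetical names, and the bound is a number of entries rather than a size in MB.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache: keeps at most maxEntries blocks in memory. */
final class BoundedBlockCache {

    private final Function<String, byte[]> diskLoader;
    private final Map<String, byte[]> blocks;

    BoundedBlockCache(final int maxEntries, Function<String, byte[]> diskLoader) {
        this.diskLoader = diskLoader;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the block from memory, loading it from "disk" on a miss. */
    synchronized byte[] get(String name) {
        byte[] cached = blocks.get(name);
        if (cached == null) {
            cached = diskLoader.apply(name);
            blocks.put(name, cached);
        }
        return cached;
    }

    /** Number of blocks currently held in memory. */
    synchronized int cachedEntries() {
        return blocks.size();
    }
}
```

A size-based limit would need to track the byte length of each entry instead of relying on `size()`.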

Thanks for any help,
LL


 Post subject: Re: Hibernate Search : Load existing index into RamDirectory?
PostPosted: Tue Oct 27, 2009 7:00 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
interesting ideas; you give me the chance to report some tests I did way back. In short:

Quote:
I also see an earlier ticket on this topic, but nothing has been done on it:
http://opensource.atlassian.com/project ... SEARCH-135
My index takes a lot of time to build. I don't want to re-create it every time I want to use the RAMDirectory, so it would be nice to be able to load an existing index into it.

Why would you like to use RAMDirectory instead?
You might think for performance reasons. I thought the same some time back, but benchmarking proved me wrong on this assumption. There's no real reason to use RAMDirectory, other than in testing scenarios, where it makes it easy to reset the state after execution. (Copying the index to RAM means you lose all changes at shutdown!)

Quote:
One other thing I couldn't find anything on: when using the FSDirectoryProvider, is there any way to configure it so that the index may take up to xyz MB in the VM, with the rest read from disk? So if you have configured your directory to allow 100 MB in memory, anything above that is evicted back to disk. I'm looking for something like what JBoss Cache has with its cache loaders: you can allow a certain amount to be loaded in a region, and if that is exceeded, entries are evicted from memory. If something doesn't exist in memory, it goes to disk and retrieves it.

Right, you've spotted another good reason not to use RAMDirectory in production: you can't predict the amount of memory it's going to need.
Trying to forcefully cache I/O resources in memory is not the best approach. The operating system dedicates available memory to this purpose; just make sure you don't assign too much memory to the JVM, and any clever operating system will cache file reads.
As the name says, RAMDirectory forces RAM usage, even if you didn't want it. FSDirectory forces nothing; it just delegates to the FS layer. With a good FS, reads are served fast and extremely well, and not necessarily by spinning your disk: the data may well stay in RAM anyway.
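
As a rough illustration of "delegating to the FS layer", the JDK-only sketch below reads a file through a read-only memory mapping, which leaves it to the operating system to decide which pages stay in RAM. It is not how FSDirectory itself is implemented; `MappedFileReader` and `checksum` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical example of reading a file via the OS page cache. */
final class MappedFileReader {

    private MappedFileReader() {
    }

    /**
     * Sums the unsigned bytes of a file through a read-only mapping.
     * A single mapping is limited to files smaller than 2 GB.
     */
    static long checksum(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The JVM heap holds none of the file content here, so a second call on the same file is typically served from the OS cache without the application managing any memory budget.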

Also, Hibernate Search caches and reuses structured segments of the index; see:
http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/hibernate/search/reader/SharingBufferReaderProvider.java?r=17630
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-279

Quote:
So sort of like a hybrid between the RamDirectory and FSDirectory.

FSDirectory is a good hybrid, as your OS is going to apply all the optimizations you need and use an optimal share of the available memory for caching. RAMDirectory is more of a toy for tests, as it's just a HashMap. Additionally, with FSDirectory you can be sure that once a change is committed, it won't be lost after an application shutdown or kill.

Nice that you're interested in this aspect. Help and more suggestions are welcome; you might like to see:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-402
The most interesting performance improvement is having an IndexReader "warmed up" before handing it to the application: the first query on a just-opened IndexReader is always the slowest, so running a fake query in the background on a directory that is going to be shared among the application threads should improve throughput.
However, this requires opening the index "a while before" the application needs it, which breaks the guarantee of an always up-to-date index.
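
The warm-up idea can be sketched without any Lucene types. The holder below opens a new reader, runs a warm-up action on it, and only then publishes it to other threads. This is a hypothetical illustration of the approach described above, not the HSEARCH-402 design; `WarmedReaderHolder`, `opener`, and `warmUp` are names invented for the example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical holder that publishes a reader only after it has been warmed up. */
final class WarmedReaderHolder<R> {

    private final AtomicReference<R> current = new AtomicReference<>();
    private final Supplier<R> opener;
    private final Consumer<R> warmUp;

    WarmedReaderHolder(Supplier<R> opener, Consumer<R> warmUp) {
        this.opener = opener;
        this.warmUp = warmUp;
    }

    /**
     * Opens a fresh reader, runs the warm-up on it, then swaps it in.
     * Returns the previously published reader (or null) so the caller can close it.
     */
    R refresh() {
        R fresh = opener.get();
        warmUp.accept(fresh); // the slow first query happens here, not in a user request
        return current.getAndSet(fresh);
    }

    /** Returns the most recently warmed reader, or null before the first refresh. */
    R current() {
        return current.get();
    }
}
```

Calling `refresh()` from a background thread keeps the warm-up cost away from application threads, at the price noted above: between refreshes, `current()` may return a reader that no longer reflects the latest index state.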

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search : Load existing index into RamDirectory?
PostPosted: Thu Oct 29, 2009 9:21 am 
Regular

Joined: Thu Oct 08, 2009 10:34 am
Posts: 55
Hi Sanne,

Quote:
Why would you like to use RamDirectory instead?

As I said, I'm playing around with HS. I'm curious, and I just wanted to see for myself whether there was much of a difference between RAMDirectory and FSDirectory. I think any user beginning with HS would like to see the difference for themselves. I assume one can do this programmatically (I haven't had time to come back to playing with it yet), but as a user playing around with HS, I thought it would come out of the box (remember also that Lucene offers this feature to its users). If a user's index can be held in memory, then surely people will want to try it out with both RAMDirectory and FSDirectory. In my case, the DEV machine with my DB running on it is nowhere near up to spec, and indexing takes a while (I'm currently awaiting proper hardware, but it won't come until December). Hence, having generated my index for FSDirectory, I would just like to load it into the RAMDirectory.

Quote:
You might think for performance reasons. I thought the same some time back, but benchmarking proved me wrong on this assumption. There's no real reason to use RAMDirectory, other than in testing scenarios, where it makes it easy to reset the state after execution.

I would have thought that every case is unique. There are so many variable factors that one shouldn't automatically assume FSDirectory will fit everyone's solution. I'm sure hardware, operating system, queries, dataset, etc. all play a part in whether RAMDirectory or FSDirectory is suitable. And all the literature I have read so far indicates that RAM is quicker than FSDirectory every time. By how much probably depends on your hardware and OS setup, but faster nonetheless.

Quote:
(Copying the index to RAM means you lose all changes at shutdown!)


If I used the RAM option with default.indexBase set, I would also expect the index to be written back to where it was read from, resulting in no loss of changes.
One other reason why I wanted to try my index in RAM mode was that the exact same queries produced a big difference in timings between Luke and my own test client: queries in Luke would take under 10 ms, while the test client was taking 300 ms. Everything was done on my local machine, and I wanted to see if I could get closer to the figures Luke was producing. I assume that Luke uses RAM under the hood, and I wanted to switch effortlessly into RAM mode using my already built index. Unfortunately, it looks like HS doesn't offer this out of the box.
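
One way to check whether a timing gap like this comes from a cold first query rather than from the directory type is to time the same query several times in one process and compare the first run with the later ones. The harness below is a minimal JDK-only sketch; `QueryTimer` is a hypothetical name, and the `Runnable` stands in for whatever code executes the query.

```java
/** Hypothetical timing harness: records the duration of each run separately. */
final class QueryTimer {

    private QueryTimer() {
    }

    /** Runs the query the given number of times and returns each duration in nanoseconds. */
    static long[] timeRuns(Runnable query, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }
}
```

If only the first entry of the returned array is large, the cost is in opening and warming up rather than in steady-state reads from disk.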

Thanks for the other information Sanne, much appreciated.

Cheers,
LL


 Post subject: Re: Hibernate Search : Load existing index into RamDirectory?
PostPosted: Thu Oct 29, 2009 9:38 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
You have some good points I didn't consider, thanks.
A patch would be quite simple; would you like to contribute it? As you say, Lucene already provides support, so you know what to do.
Have a look at org.hibernate.search.store.RAMDirectoryProvider; it's extremely simple, and feel free to ask for more directions.
The properties object contains the configuration settings you need to read.

Some points to consider:
* read from FS at startup (optional, should be configurable)
* write back to FS at shutdown (optional, should be configurable)
* document the behaviour, especially warning about this case: some setups do local clustering, with 2 instances sharing the directory. Writing back obviously can't work there, so they should avoid using the write-back-to-FS feature.
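
The points above can be sketched as a small JDK-only class with two optional, configurable steps. This is not the RAMDirectoryProvider patch and uses no Hibernate Search API: `DiskBackedMemoryStore` and both property names are hypothetical, made up for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory store that can load from and write back to a directory. */
final class DiskBackedMemoryStore {

    // Hypothetical property names, not real Hibernate Search settings.
    static final String LOAD_AT_START = "sketch.load_from_disk_at_start";
    static final String WRITE_AT_STOP = "sketch.write_to_disk_at_stop";

    private final Path indexBase;
    private final boolean loadAtStart;
    private final boolean writeAtStop;
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    DiskBackedMemoryStore(Path indexBase, Properties props) {
        this.indexBase = indexBase;
        this.loadAtStart = Boolean.parseBoolean(props.getProperty(LOAD_AT_START, "false"));
        this.writeAtStop = Boolean.parseBoolean(props.getProperty(WRITE_AT_STOP, "false"));
    }

    /** Optionally copies the files under indexBase into memory. */
    void start() {
        if (!loadAtStart || !Files.isDirectory(indexBase)) {
            return;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(indexBase)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files.put(entry.getFileName().toString(), Files.readAllBytes(entry));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(String name, byte[] content) {
        files.put(name, content.clone());
    }

    /** Returns the in-memory content, or null if the name is unknown. */
    byte[] get(String name) {
        return files.get(name);
    }

    /** Optionally writes every in-memory file back; unsafe if another instance shares indexBase. */
    void stop() {
        if (!writeAtStop) {
            return;
        }
        try {
            Files.createDirectories(indexBase);
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                Files.write(indexBase.resolve(file.getKey()), file.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both flags default to off, so the store behaves as a plain in-memory map unless the behaviour is requested explicitly. The write-back step overwrites files without any locking, which is exactly why the shared-directory setup mentioned above should leave it disabled.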

Adding the above to JIRA.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.