Hi,
interesting ideas; you give me the chance to report on some tests I did a while back. In short:
Quote:
I also see a previous ticket on this topic but nothing done on it
http://opensource.atlassian.com/project ... SEARCH-135
My index takes a lot of time to build. I don't want to have to re-create it every time I want to use the RamDirectory, and it would be nice to be able to load an existing index into the RamDirectory.
Why would you like to use RAMDirectory instead?
You might think for performance reasons. I thought the same some time back, but benchmarking proved me wrong on this assumption. There's no real reason to use RAMDirectory, except that it's handy in testing scenarios for resetting the state after execution. (Copying the index to RAM means you lose all changes at shutdown!)
Quote:
Also one other thing that I couldn't find anything on: when using the FSDirectoryProvider, is there any way to configure it so that the index is allowed to take xyz MB in the VM and the rest has to be fetched from disk? So if you have configured your directory to allow 100MB in memory, anything above that gets evicted back to disk. I'm looking for something like what JBoss Cache has with its cache loaders: you can allow a certain amount to be loaded in the region, and if it exceeds that, it gets evicted from memory. If something doesn't exist in memory, it then goes to disk and retrieves it.
Right, you've seen another good reason to not use RAMDirectory in production: you can't predict the amount of memory it's going to need.
Trying to forcefully cache I/O resources in memory is not the best approach: the operating system already dedicates available memory to this purpose. Just make sure you don't assign too much memory to the JVM, and any decent operating system will cache file reads.
As the name says, "RAMDirectory" forces RAM usage, even if you don't want it. "FSDirectory" doesn't force anything; it just delegates to the filesystem layer. With a good filesystem the reads are fast and well optimized, and they don't necessarily make your disk spin: the data may well stay in RAM anyway.
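To illustrate what "delegating to the FS layer" means, here is a minimal sketch in plain JDK code (no Lucene involved; the class and method names are made up for this example). It reads a file through a small fixed buffer, so the application itself never holds more than 8 KB of the file on the heap. Whether a repeated read hits the disk or is served from RAM is left entirely to the operating system's file cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, only for illustration: not part of Lucene or Hibernate Search.
class PageCacheSketch {

    // Sums the unsigned value of every byte in the file.
    // Only one 8 KB buffer lives on the JVM heap; the OS decides
    // which parts of the file stay cached in RAM between calls.
    static long checksum(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            long sum = 0;
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    sum += buffer.get() & 0xFF;
                }
                buffer.clear();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("index-segment", ".bin");
        try {
            Files.write(file, "abc".getBytes(StandardCharsets.US_ASCII));
            // Reading twice gives the same result; the second read is
            // typically answered by the OS cache, but that is up to the OS.
            System.out.println(checksum(file));
            System.out.println(checksum(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The point of the sketch is that nothing in the code pins the data in memory, yet the JVM heap stays small and predictable.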
Also Hibernate Search does cache and reuse structured segments of the Index, see:
http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/hibernate/search/reader/SharingBufferReaderProvider.java?r=17630
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-279
Quote:
So sort of like a hybrid between the RamDirectory and FSDirectory.
FSDirectory is a good hybrid, as your OS is going to apply all the optimizations you need and use an optimal amount of the available memory for caching. RAMDirectory is more of a toy for tests, as it's just a HashMap. With FSDirectory you also have the advantage of being sure that once a change is committed, it's not going to be lost after an application shutdown or kill.
Nice that you're interested in this aspect. Help and more suggestions are welcome; you might like to see:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-402
The most interesting performance improvement is having an IndexReader "warmed up" before giving it back to the application: the first query on a just-opened IndexReader is always the slowest, so running a fake query in the background on a directory which is going to be shared among the application threads should improve throughput.
However, this requires opening the index "a while before" the application needs it, which breaks the guarantee of an always-up-to-date index.
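The warm-up idea could be sketched roughly like this. It is not how Hibernate Search implements anything; `WarmedReaderHolder`, `opener` and `warmUp` are names invented for the example, and the reader type is left generic so the sketch runs without Lucene on the classpath. A new reader is opened, a throwaway query is run against it, and only then is it published to the application threads.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical holder, only for illustration: R stands in for an IndexReader.
final class WarmedReaderHolder<R> {

    private final Supplier<R> opener;   // assumption: opens a fresh reader on the index
    private final Consumer<R> warmUp;   // assumption: runs a fake query on that reader
    private final AtomicReference<R> current = new AtomicReference<>();

    WarmedReaderHolder(Supplier<R> opener, Consumer<R> warmUp) {
        this.opener = opener;
        this.warmUp = warmUp;
        refresh();
    }

    // Opens a new reader, warms it, then publishes it.
    // Meant to be called from a background thread; until it completes,
    // callers of get() keep seeing the previous (possibly stale) reader.
    void refresh() {
        R fresh = opener.get();
        warmUp.accept(fresh);
        current.set(fresh);
    }

    // Returns the most recently published, already warmed reader.
    R get() {
        return current.get();
    }

    public static void main(String[] args) {
        AtomicInteger generation = new AtomicInteger();
        WarmedReaderHolder<String> holder = new WarmedReaderHolder<>(
                () -> "reader-" + generation.incrementAndGet(),
                reader -> System.out.println("warming " + reader));
        System.out.println("serving from " + holder.get());
        holder.refresh();
        System.out.println("serving from " + holder.get());
    }
}
```

The sketch makes the trade-off visible: between two calls to `refresh()` every thread gets the same warmed reader, so changes committed in the meantime are not seen. Closing the replaced reader once no thread uses it any more is left out here and would need something like reference counting.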