
All times are UTC - 5 hours [ DST ]



This topic is locked; you cannot edit posts or make further replies. [ 7 posts ]
 Post subject: [Search] Any known shard-ed DirectoryProviders?
Posted: Tue Sep 06, 2011 3:22 pm
Beginner

Joined: Tue May 11, 2004 12:20 am
Posts: 33
Hi,

I've learned a lot from the "Clustering" chapter in "Hibernate Search in Action".
And I understand the "default vendor-independent suggestion" of a JMS-based master/slave setup. Thanks for the clear description!

Still, I wonder about the other route mentioned there: sharded implementations of a Lucene Directory (vendor-specific, with an appropriate DirectoryProvider).
Could I kindly ask for references to specific "sharded Lucene Directories" that really integrate with Hibernate Search?
E.g. I would love to find some "Hadoop Lucene Directory" that I can just plug into Hibernate Search (through a DirectoryProvider) and automatically get sharded behavior.

I would note that our data is "appropriate for sharding", e.g. a chain of stores where each store can be a "shard".
Thanks :)


 Post subject: Re: [Search] Any known shard-ed DirectoryProviders?
Posted: Tue Sep 06, 2011 7:14 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi solmyr72,
I'm not sure where we mention anything "vendor specific". Sharding works out of the box with the provided DirectoryProvider implementations; no extra implementation is needed, although you can plug in your own components if you need special customization.

To enable sharding all you need to do is configure it as explained here: http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-configuration-directory-sharding
Any DirectoryProvider implementation will work fine in combination with sharding.
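For reference, the sharding options in that section boil down to a handful of configuration properties. The sketch below simply collects them in a java.util.Properties object. The key names follow the "Sharding indexes" example of the Hibernate Search 3.x/4.x reference as far as I recall it, so please double-check them against the docs for your version; the index name Book, the paths, the shard names and the strategy class com.example.search.CountryShardingStrategy are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper collecting sharding-related Hibernate Search settings
// for an index named "Book". Index name, paths, shard names and the custom
// strategy class are illustrative assumptions, not taken from the thread.
class ShardingConfigSketch {

    static Properties bookIndexSharding() {
        Properties p = new Properties();
        // Directory provider used by every index unless overridden.
        p.setProperty("hibernate.search.default.directory_provider", "filesystem");
        p.setProperty("hibernate.search.default.indexBase", "/var/lucene/indexes");
        // Split the Book index into two shards (e.g. one per store).
        p.setProperty("hibernate.search.Book.sharding_strategy.nbr_of_shards", "2");
        // Optional custom strategy deciding which shard receives a document
        // (hypothetical class name).
        p.setProperty("hibernate.search.Book.sharding_strategy",
                "com.example.search.CountryShardingStrategy");
        // Per-shard overrides are addressed by shard number.
        p.setProperty("hibernate.search.Book.0.indexName", "Book_japan");
        p.setProperty("hibernate.search.Book.1.indexName", "Book_uk");
        return p;
    }
}
```

Without a custom sharding_strategy the default strategy spreads documents across the shards by hashing their id, which is why routing by store would need a custom strategy.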

_________________
Sanne
http://in.relation.to/


 Post subject: Re: [Search] Any known shard-ed DirectoryProviders?
Posted: Sun Sep 11, 2011 11:26 am
Beginner

Joined: Tue May 11, 2004 12:20 am
Posts: 33
Hi,

Thanks. I'm afraid I got it wrong, and would greatly appreciate some pointers.
Here's what I'm trying to achieve:
Say I have a chain of bookstores, each store in a different country.
- Books from the Japanese store should be indexed on 2 machines, "japan1" and "japan2" (replicas, for load-balancing/failover)
- Books from the UK store should be indexed on 2 other machines, "uk1" and "uk2" (replicas again)
- etc...

By default, users search their local store (e.g. Japanese users search the Japan store).
On less frequent occasions, they ask to "search the entire chain", collecting results from all indexes.

If I were to write it "from scratch" (without Hibernate Search), I'd probably:
- Set up a master/slave pair inside each country (e.g. "japan1" and "japan2" are master/slave, sharing a Lucene index)
- When indexing books, I'd update the correct index (on the correct machine) based on the Book.country property
- When searching the entire chain, I'd use some kind of "map/reduce"
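The write-side part of that plan (route each book to the index of its Book.country) can be sketched without any framework. This is only a model of the routing rule: it does NOT implement Hibernate Search's sharding strategy interface, and CountryShardRouter, Book and shardForAddition are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, framework-free sketch of country-based routing: each store's
// country maps to one shard, and a book is indexed only in that shard.
// Not the Hibernate Search API; names are invented for illustration.
class CountryShardRouter {

    // Minimal stand-in for the Book entity; only the routing property matters.
    static final class Book {
        final long id;
        final String country;

        Book(long id, String country) {
            this.id = id;
            this.country = country;
        }
    }

    private final Map<String, Integer> shardByCountry;

    CountryShardRouter(Map<String, Integer> shardByCountry) {
        // Defensive copy so later changes to the caller's map don't re-route books.
        this.shardByCountry = new HashMap<>(shardByCountry);
    }

    // Returns the number of the single shard that should index this book.
    int shardForAddition(Book book) {
        Integer shard = shardByCountry.get(book.country);
        if (shard == null) {
            throw new IllegalArgumentException("No shard configured for country " + book.country);
        }
        return shard;
    }
}
```

One design point to keep in mind: if a delete request only carries the entity id, a country-based router cannot tell which shard holds the document unless the id encodes the country, so deletes may have to target every shard.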

Is this supported by Hibernate Search in some elegant way?
Or, if not, could you briefly tell me which alternative you recommend?
Thanks.


 Post subject: Re: [Search] Any known shard-ed DirectoryProviders?
Posted: Sun Sep 11, 2011 1:51 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Here's what I'm trying to achieve:
Say I have a chain of bookstores, each store in a different country.
- Books from the Japanese store should be indexed on 2 machines, "japan1" and "japan2" (replicas, for load-balancing/failover)
- Books from the UK store should be indexed on 2 other machines, "uk1" and "uk2" (replicas again)
- etc...

With Hibernate Search 3.x the sharding implementation still expects that all shards "share" the same master, so if you need to use that version your options are:

A- Use two different Hibernate instances in the same application. In this case you can configure each one completely differently, but at application level you would have to code the search against both. It's not very easy to send queries remotely and then aggregate the results, especially as relevance values might not be comparable, or you'll have to sort potentially big result sets in memory. So in this case I'd recommend mapping the other index locally too, in read-only mode, and either replicating it with an external script such as rsync or having the nodes share the indexes through Infinispan (in-memory sharing of all indexes across all nodes).

B- Use sharding, but host japan1 and uk1 on the same machine, and have a second machine host both japan2 and uk2. This solution is very simple, but it assumes that a single machine performs well enough to host both the UK and the Japan applications.
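To illustrate why relevance values from separately searched indexes "might not be comparable" (option A), consider Lucene's classic inverse document frequency, which in the 3.x DefaultSimilarity is 1 + ln(numDocs / (docFreq + 1)) and enters each term's score. Every independent index computes it from its own statistics. The document counts below are invented for a single term.

```java
// Sketch: why scores from independently searched indexes are hard to compare.
// Uses the classic Lucene 3.x DefaultSimilarity idf formula,
//   idf = 1 + ln(numDocs / (docFreq + 1)),
// with invented document counts for one term in two store indexes.
class IdfComparabilitySketch {

    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Returns {japanLocalIdf, ukLocalIdf, combinedIdf} for the hypothetical counts.
    static double[] example() {
        long japanDocs = 1000, japanDocFreq = 500; // term is common in the Japan index
        long ukDocs = 1000, ukDocFreq = 1;         // term is rare in the UK index
        double japanLocal = idf(japanDocFreq, japanDocs);
        double ukLocal = idf(ukDocFreq, ukDocs);
        // The value a single index containing both stores would use.
        double combined = idf(japanDocFreq + ukDocFreq, japanDocs + ukDocs);
        return new double[] {japanLocal, ukLocal, combined};
    }
}
```

With these numbers the Japan index uses an idf of about 1.69, the UK index about 7.21, and a single index holding both stores about 2.38, so merging raw scores by value would systematically favor hits from whichever index happens to contain the term rarely.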

Quote:
If I were to write it "from scratch" (without Hibernate Search), I'd probably:
- Set up a master/slave pair inside each country (e.g. "japan1" and "japan2" are master/slave, sharing a Lucene index)
- When indexing books, I'd update the correct index (on the correct machine) based on the Book.country property
- When searching the entire chain, I'd use some kind of "map/reduce"


With Hibernate Search 4 you can use a different master/slave configuration on each index, or even on each shard of an index. You would be able to search "the entire chain" using the standard API (no map/reduce is needed): just define the usual fulltext query and enable a ShardSensitiveOnlyFilter to select which indexes you want to be searched.
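A rough, framework-free model of that selection step, under stated assumptions: in Hibernate Search 4 a filter is enabled with fullTextQuery.enableFullTextFilter(name) and parameterized via setParameter(...), and, as I understand it, the sharding strategy can inspect the enabled filters to decide which shards to open. Here a plain map stands in for the enabled filters; the filter name country, the parameter countries and the shard names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of query-time shard selection; it does not use the
// Hibernate Search API. "enabledFilters" stands in for the full-text filters
// enabled on a query (filter name -> parameters).
class QueryShardSelector {

    // Insertion-ordered so "search everything" returns shards in a stable order.
    private final Map<String, String> shardByCountry = new LinkedHashMap<>();

    QueryShardSelector(Map<String, String> shardByCountry) {
        this.shardByCountry.putAll(shardByCountry);
    }

    List<String> shardsToSearch(Map<String, Map<String, Object>> enabledFilters) {
        Map<String, Object> countryFilter = enabledFilters.get("country");
        if (countryFilter == null) {
            // No shard-selecting filter enabled: search the entire chain.
            return new ArrayList<>(shardByCountry.values());
        }
        List<String> selected = new ArrayList<>();
        Object countries = countryFilter.get("countries");
        if (countries instanceof Collection) {
            for (Object country : (Collection<?>) countries) {
                String shard = shardByCountry.get(String.valueOf(country));
                // Skip unknown countries and avoid opening the same shard twice.
                if (shard != null && !selected.contains(shard)) {
                    selected.add(shard);
                }
            }
        }
        return selected;
    }
}
```

The default (no filter) covers the "search the entire chain" case, while a local-store search enables the filter with a single country.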

_________________
Sanne
http://in.relation.to/


 Post subject: Re: [Search] Any known shard-ed DirectoryProviders?
Posted: Mon Sep 12, 2011 3:36 pm
Beginner

Joined: Tue May 11, 2004 12:20 am
Posts: 33
s.grinovero wrote:
It's not very easy to send queries remotely and then aggregate the results, especially as relevance values might not be comparable, or you'll have to sort potentially big result sets in memory(...)

With Hibernate Search 4 you can use a different master/slave configuration on each index, or even on each shard of an index. You would be able to search "the entire chain" using the standard API (no map/reduce is needed)


Thanks so much for this professional reply.

Could I just kindly ask about Hibernate Search 4: how does it manage such "entire chain" searches across sharded indexes?
Namely, how does it handle the ranking problem you mentioned?
Does it encapsulate some very brilliant map/reduce algorithm?
Or does it copy all available index files into some local copy (as you described)? If so, is it file-system copying, or Infinispan/memory-based?

Thanks again.


 Post subject: Re: [Search] Any known shard-ed DirectoryProviders?
Posted: Mon Sep 12, 2011 6:49 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Could I just kindly ask about Hibernate Search 4: how does it manage such "entire chain" searches across sharded indexes?
Namely, how does it handle the ranking problem you mentioned?
Does it encapsulate some very brilliant map/reduce algorithm?
Or does it copy all available index files into some local copy (as you described)? If so, is it file-system copying, or Infinispan/memory-based?

Thanks again.

It's actually not very complex: all we do in Hibernate Search is integrate Lucene and leverage a fair amount of experience with its low-level workings.
All of Lucene's search services are built on top of the IndexReader API. An IndexReader doesn't expose a stream of data but statistics about term frequencies, so it's not very hard to open a virtual IndexReader that combines several IndexReaders and answers, for each term, with the combined statistics; the queries are then run on top of it. It also has to expose an API similar to an enumeration of documents; in this case you can enumerate the first set and then continue with the second one. I'm of course oversimplifying and using very informal terminology; my point is just that we combine the indexes without making copies, using the Lucene API, which provides extensive support for running queries across multiple indexes.
In theory the exact scores provided by such a combined index can differ slightly, but in practice the difference is usually not significant, at least not enough to noticeably alter the order of results when they are sorted by relevance.
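The combining idea can be illustrated with a toy model. This is not Lucene code: ShardStats and CombinedReaderSketch are hypothetical names, and the sketch covers only the two points described above, namely that term statistics (here docFreq) are summed over the sub-indexes, and that documents are enumerated shard after shard by giving each shard a starting offset for its document ids, which is roughly the role Lucene's MultiReader plays over real IndexReaders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, framework-free model of a "virtual" reader over several indexes.
// Hypothetical classes, far simpler than Lucene's IndexReader/MultiReader.
class CombinedReaderSketch {

    // Minimal per-shard "index": document count and docFreq per term.
    static final class ShardStats {
        final int maxDoc;
        final Map<String, Integer> docFreq = new HashMap<>();

        ShardStats(int maxDoc) {
            this.maxDoc = maxDoc;
        }
    }

    private final List<ShardStats> shards;
    private final int[] docBase; // first global doc id of each shard

    CombinedReaderSketch(List<ShardStats> shards) {
        this.shards = new ArrayList<>(shards);
        this.docBase = new int[shards.size()];
        int base = 0;
        for (int i = 0; i < shards.size(); i++) {
            docBase[i] = base;
            base += shards.get(i).maxDoc;
        }
    }

    // Total number of documents visible through the combined view.
    int maxDoc() {
        int total = 0;
        for (ShardStats s : shards) total += s.maxDoc;
        return total;
    }

    // Combined statistic: documents containing the term in any shard.
    int docFreq(String term) {
        int total = 0;
        for (ShardStats s : shards) total += s.docFreq.getOrDefault(term, 0);
        return total;
    }

    // Maps a shard-local document id to its id in the combined enumeration.
    int globalDocId(int shard, int localDocId) {
        return docBase[shard] + localDocId;
    }
}
```

For example, with 500 and 1 matching documents in two shards of 1,000 documents each, the combined view reports a docFreq of 501 out of 2,000 documents, and local document 7 of the second shard becomes global document 1007, so hits from both shards are scored against the same statistics.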

For details, please see the code; for Hibernate Search 4, a starting point is
org.hibernate.search.reader.impl.MultiReaderFactory
and
org.apache.lucene.index.MultiReader

_________________
Sanne
http://in.relation.to/


 Post subject: Re: [Search] Any known shard-ed DirectoryProviders?
Posted: Tue Sep 13, 2011 9:16 am
Beginner

Joined: Tue May 11, 2004 12:20 am
Posts: 33
Thanks, you guys are the greatest.
I learned a lot :)


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.