Quote:
Here's what I'm trying to achieve:
Say I have a Chain of Book-stores, each store in a different country.
- Books from the Japanese store should be indexed on 2 machines, "japan1" and "japan2" (replicas, for load-balancing/failover)
- Books from the UK store should be indexed on 2 other machines, "uk1" and "uk2" (replicas again)
- etc...
With Hibernate Search 3.x, the sharding implementation still expects all shards to share the same master, so if you need to use that version, your options are:
A- Use two different Hibernate instances in the same application. Each one can be configured completely differently, but your application code then has to run the search against both. Sending queries remotely and aggregating the results is not easy, especially since relevance values from separate indexes might not be comparable, and you may have to sort potentially large result sets in memory. So in this case I'd recommend mapping the other index locally too, in read-only mode, and either replicating it with an external script such as rsync or sharing the index through Infinispan (which shares all indexes in memory across all nodes).
B- Use sharding, but host japan1 and uk1 on the same machine, and host uk2 and japan2 together on a second machine. This solution is very simple, but it assumes that a single machine is powerful enough to host both the UK and Japan applications.
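To make the aggregation cost in option A concrete, here is a minimal plain-Java sketch of merging ranked hits from two separately searched indexes. It does not use any Hibernate Search or Lucene API; the Hit type, the mergeTopK helper and the sample scores are all made up for illustration. Note that every hit has to be held and sorted in memory, and that the ordering is only meaningful if the scores from the different indexes are comparable, which is exactly the caveat above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergedSearch {

    // Hypothetical hit type: source index name, document id, relevance score.
    public static final class Hit {
        public final String index;
        public final String docId;
        public final float score;

        public Hit(String index, String docId, float score) {
            this.index = index;
            this.docId = docId;
            this.score = score;
        }
    }

    // Collect the hits of every index into one list, sort by descending
    // score, and keep the first k. All hits are held in memory at once.
    public static List<Hit> mergeTopK(List<List<Hit>> perIndexHits, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            all.addAll(hits);
        }
        // Assumption: scores from different indexes are comparable.
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> japan = Arrays.asList(
                new Hit("japan", "jp-1", 0.9f),
                new Hit("japan", "jp-2", 0.2f));
        List<Hit> uk = Arrays.asList(
                new Hit("uk", "uk-1", 0.5f));

        for (Hit hit : mergeTopK(Arrays.asList(japan, uk), 2)) {
            System.out.println(hit.index + " " + hit.docId + " " + hit.score);
        }
    }
}
```

Mapping the other index locally, as recommended above, avoids this step because a single query then runs over both indexes.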
Quote:
If I were to write it "from scratch" (without Hibernate Search), I'd probably:
- Setup a master/slave inside each country (e.g. "japan1" and "japan2" are master/slave, sharing a lucene index)
- When indexing books, I'd update the correct index (on the correct machine) based on property Book.country
- When searching the entire chain, I'd have some "map/reduce"
With Hibernate Search 4 you can use a different master/slave configuration for each index, or even for each shard of an index. You can search "the entire chain" using the standard API, with no map/reduce needed: just define the usual full-text query and enable a ShardSensitiveOnlyFilter to select which indexes should be searched.
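As a rough model of the routing involved, the plain-Java sketch below shows the two decisions a country-based sharding strategy has to make: which shard a book is written to, and which shards a query touches depending on whether a country filter is enabled. This is not the Hibernate Search API; CountryShardRouter, its methods, and the shard names "books_japan" and "books_uk" are hypothetical. In Hibernate Search itself this logic would live in a custom sharding strategy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountryShardRouter {

    // Hypothetical mapping from Book.country to a shard (index) name.
    private final Map<String, String> shardByCountry = new LinkedHashMap<>();

    public CountryShardRouter register(String country, String shardName) {
        shardByCountry.put(country, shardName);
        return this;
    }

    // Indexing: a book is written only to the shard of its own country.
    public String shardForIndexing(String country) {
        String shard = shardByCountry.get(country);
        if (shard == null) {
            throw new IllegalArgumentException("No shard for country: " + country);
        }
        return shard;
    }

    // Searching: with no country filter (null) every shard is searched,
    // otherwise only the shard of the requested country.
    public List<String> shardsForQuery(String countryFilter) {
        if (countryFilter == null) {
            return new ArrayList<>(shardByCountry.values());
        }
        List<String> selected = new ArrayList<>();
        selected.add(shardForIndexing(countryFilter));
        return selected;
    }

    public static void main(String[] args) {
        CountryShardRouter router = new CountryShardRouter()
                .register("JP", "books_japan")
                .register("UK", "books_uk");

        System.out.println(router.shardForIndexing("JP"));
        System.out.println(router.shardsForQuery(null));
        System.out.println(router.shardsForQuery("UK"));
    }
}
```

Searching the entire chain corresponds to the unfiltered case, while enabling the filter with a country parameter narrows the query to that country's shard.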