
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
 Post subject: Creating Multiple Indexes based on language
PostPosted: Tue May 17, 2011 2:03 am 
Regular

Joined: Tue May 17, 2011 1:45 am
Posts: 52
Hi,

I need to index documents in the following languages:

1. English
2. Chinese
3. Japanese
4. Korean
5. German
6. French
7. Spanish
8. Dutch

My questions are as follows:

1. Can Lucene index documents in the above languages?
2. How can Hibernate Search be used to index a document in all these languages? An example with code for any two languages would be very helpful.
3. During manual indexing, how can Hibernate Search be configured or coded so that the same document gets a separate index for each language? In other words, can Hibernate Search index the document in all the languages during the indexing process, and how?
4. At query time, how do we determine which index to use, given that the language is known and passed as a parameter to the DAO?

Thanks
David


 Post subject: Re: Creating Multiple Indexes based on language
PostPosted: Tue May 17, 2011 6:38 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
1. Can Lucene index documents in the above languages?

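Yes. You could index them all the same way using a StandardAnalyzer (I'm not sure how well that works for Japanese), or you could use some of the additional libraries in Lucene, which perform language-specialized analysis. The Snowball stemmers are considered very good at this and cover the European languages you listed (English, German, French, Spanish and Dutch). Snowball has no stemmers for Chinese, Japanese or Korean; for those, Lucene's contrib CJKAnalyzer (or SmartChineseAnalyzer for Chinese) is the usual choice. You don't need to use Snowball for all of them: you can pick the analyzer of your choice, or write your own, for each language.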
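Chinese and Japanese text is written without spaces between words, which is why a generic analyzer may not produce useful terms for it. Lucene's contrib CJKAnalyzer handles this by indexing overlapping pairs of characters (bigrams). The plain-Java sketch below illustrates only that bigram idea; it is not Lucene's implementation and skips details such as stop words and Lucene's exact character classes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of CJK bigram tokenization (the idea behind Lucene's
// contrib CJKAnalyzer); not Lucene's code, and it ignores its finer details.
final class CjkBigramSketch {
    // Han, Hiragana, Katakana and Hangul characters are treated as CJK.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    // Runs of CJK characters become overlapping two-character tokens;
    // other text is split on non-alphanumeric characters and lowercased.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        StringBuilder cjkRun = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flushWord(word, tokens);
                cjkRun.appendCodePoint(cp);
            } else {
                flushCjkRun(cjkRun, tokens);
                if (Character.isLetterOrDigit(cp)) {
                    word.appendCodePoint(Character.toLowerCase(cp));
                } else {
                    flushWord(word, tokens);
                }
            }
        }
        flushWord(word, tokens);
        flushCjkRun(cjkRun, tokens);
        return tokens;
    }

    private static void flushWord(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // A lone CJK character is kept as a single token; longer runs yield bigrams.
    private static void flushCjkRun(StringBuilder run, List<String> tokens) {
        int[] cps = run.codePoints().toArray();
        if (cps.length == 1) {
            tokens.add(new String(cps, 0, 1));
        }
        for (int j = 0; j + 1 < cps.length; j++) {
            tokens.add(new String(cps, j, 2));
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        // "Tokyo-to" written in kanji, followed by Latin-script words.
        System.out.println(tokenize("\u6771\u4eac\u90fd Hibernate Search"));
    }
}
```

For the kanji 東京都 followed by "Hibernate Search", `main` prints `[東京, 京都, hibernate, search]`: the CJK run becomes two overlapping bigrams, while the Latin words are split and lowercased as usual. Because queries are tokenized the same way, a search for 京都 matches without any word segmentation dictionary.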

Quote:
2. How can Hibernate Search be used to index a document in all these languages? An example with code for any two languages would be very helpful.

Simple solution: use the same analysis strategy for all documents.
Better solution: pick the proper analyzer for each entity instance; there is an example in the docs:
http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#d0e3385
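The linked section covers dynamic analyzer selection: a property is annotated with `@AnalyzerDiscriminator(impl = ...)`, and the discriminator returns the name of an analyzer declared with `@AnalyzerDef`. The sketch below shows only the discriminator logic. So that it compiles without the Hibernate Search jar, it declares a local `Discriminator` interface whose method mirrors `getAnalyzerDefinitionName(Object value, Object entity, String field)` from Hibernate Search 3.x (check the exact signature against your version). The `Article` entity and the analyzer definition names ("en", "de", ...) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch of per-instance analyzer selection. The Discriminator interface below
// is a local stand-in mirroring Hibernate Search 3.x's
// org.hibernate.search.analyzer.Discriminator, so this compiles without the jar.
final class LanguageDiscriminatorSketch {
    interface Discriminator {
        String getAnalyzerDefinitionName(Object value, Object entity, String field);
    }

    // Hypothetical entity: each instance records the language it is written in.
    static final class Article {
        final String language;
        final String body;

        Article(String language, String body) {
            this.language = language;
            this.body = body;
        }
    }

    // Assumed analyzer definition names, one per language, which would be
    // declared elsewhere with @AnalyzerDef (e.g. Snowball-based or CJK chains).
    static final Set<String> ANALYZER_DEFS = new HashSet<>(
            Arrays.asList("en", "zh", "ja", "ko", "de", "fr", "es", "nl"));

    static final class LanguageDiscriminator implements Discriminator {
        @Override
        public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
            // 'value' is the value of the annotated property (the language code),
            // 'field' is the document field being analyzed.
            // Returning null leaves the default analyzer in place.
            if (value == null || !(entity instanceof Article)) {
                return null;
            }
            String language = value.toString().toLowerCase(Locale.ROOT);
            return ANALYZER_DEFS.contains(language) ? language : null;
        }
    }

    public static void main(String[] args) {
        Discriminator discriminator = new LanguageDiscriminator();
        Article article = new Article("DE", "Hibernate Search mit deutschem Text");
        System.out.println(discriminator.getAnalyzerDefinitionName(article.language, article, "body")); // de
        System.out.println(discriminator.getAnalyzerDefinitionName("pt", article, "body"));             // null
    }
}
```

If the language property is also indexed untokenized, queries can be restricted to one language with a plain term query on it, even without sharding.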

Quote:
3. During manual indexing, how can Hibernate Search be configured or coded so that the same document gets a separate index for each language? In other words, can Hibernate Search index the document in all the languages during the indexing process, and how?


Also have a look at index sharding: you can define a strategy that keeps each language in its own index. During search, you'll transparently search across all indexes, or you can use a custom filter as explained here:
http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#query-filter-shard
This filter can "pick" the appropriate index according to the filter parameter (e.g. to implement something like "search for documents about Hibernate in the Dutch language").
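One way to picture the routing that a per-language sharding strategy and a sharding-aware filter perform is the simplified plain-Java model below. It is not Hibernate Search's actual `IndexShardingStrategy` interface (which also deals with entity types, ids and Lucene documents); it assumes one shard per language, numbered in the order of the language list above, and a hypothetical language filter whose parameter is the language code the DAO receives.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of per-language shard routing; NOT Hibernate Search's
// IndexShardingStrategy interface, only the decision logic it would contain.
final class LanguageShardingSketch {
    // Assumed configuration: one shard per language; shard i holds LANGUAGES.get(i).
    static final List<String> LANGUAGES =
            Arrays.asList("en", "zh", "ja", "ko", "de", "fr", "es", "nl");

    // Indexing time: each document goes to exactly one shard, chosen by its language.
    static int shardForAddition(String language) {
        int shard = LANGUAGES.indexOf(language);
        if (shard < 0) {
            throw new IllegalArgumentException("No shard configured for language: " + language);
        }
        return shard;
    }

    // Query time: without the (hypothetical) language filter, search every shard;
    // with it enabled, search only the shard for the requested language.
    static int[] shardsForQuery(String languageFilterOrNull) {
        if (languageFilterOrNull == null) {
            int[] all = new int[LANGUAGES.size()];
            for (int i = 0; i < all.length; i++) {
                all[i] = i;
            }
            return all;
        }
        return new int[] { shardForAddition(languageFilterOrNull) };
    }

    public static void main(String[] args) {
        System.out.println(shardForAddition("de"));                // 4
        System.out.println(Arrays.toString(shardsForQuery("nl"))); // [7]
        System.out.println(shardsForQuery(null).length);           // 8
    }
}
```

With the filter enabled, a query for Dutch documents touches only one of the eight indexes, while an unfiltered query still spans all of them.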

Quote:
4. At query time, how do we determine which index to use, given that the language is known and passed as a parameter to the DAO?

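As above ;) The language passed to your DAO can be used as the filter parameter, so only that language's index is searched.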

Let me know if you need more pointers. When you have it working, blog about it! Many people ask about this, so it would be nice to find your instructions.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.