Quote:
If a user searches for a book by name, say they enter "Microsoft Access 2007", books whose title or description contains "microsoft", "access" or "2007" are returned. That is what we expected. However, some of the books are totally unrelated and match only because of the keyword "2007". I am looking for a way to take the importance of each keyword into account. In this case, "2007" is less important to the search, but at the moment the search makes no difference between "microsoft", "access" and "2007".
Are you testing on a small corpus? The importance of each token is relative to how frequently it is used: if you have many books mentioning "2007", then "2007" automatically becomes less significant in the calculation of the score. So basically, Lucene should solve this for you without any explicit direction.
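To make the frequency effect concrete, here is a minimal sketch of the inverse document frequency term as used by Lucene's classic TF-IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). It is plain Java with no Lucene dependency; the class name IdfSketch and the corpus numbers are hypothetical, and the real score also involves term frequency and normalization factors that are left out here.

```java
// Sketch only: reproduces the classic TF-IDF idf term, not Lucene's full scoring.
class IdfSketch {

    // docFreq = number of documents containing the term,
    // numDocs = total number of documents in the index.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 1000; // hypothetical corpus size

        // Hypothetical document frequencies: "2007" appears in many books,
        // "access" in only a few.
        System.out.printf("idf(2007)   = %.3f%n", idf(600, numDocs));
        System.out.printf("idf(access) = %.3f%n", idf(12, numDocs));
    }
}
```

With numbers like these, the common term gets a clearly lower weight than the rare one. On a very small test corpus every term has a similar document frequency, so the weights end up nearly equal.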
Quote:
The second use case: is there a good analyzer that can be used for indexing and querying to support multi-word phrases? I thought the default analyzer of Hibernate Search just tokenizes the search text into single words?
That's correct, that's the default. There are many alternatives, like org.apache.lucene.search.MultiPhraseQuery and org.apache.lucene.search.PhraseQuery, both supported by the QueryBuilder DSL; this is taken from the testsuite (which you can find in the sources):
Code:
final QueryBuilder monthQb = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( Month.class ).get();
Query query = monthQb
    .phrase()
        .onField( "mythology" )
        .sentence( "colder and whitening" )
        .createQuery();
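To illustrate the difference between matching single tokens and matching a phrase, here is a conceptual sketch in plain Java. It is not how Lucene implements phrase queries (Lucene works on indexed term positions), and the class name, the whitespace tokenizer and the sample texts are all hypothetical; only the phrase "colder and whitening" comes from the snippet above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Conceptual sketch: keyword matching vs. exact phrase matching on tokens.
class PhraseMatchSketch {

    // Very naive analyzer stand-in: lowercase and split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Keyword-style match: true if any query token occurs in the text.
    static boolean matchesAnyToken(String text, String query) {
        List<String> doc = tokenize(text);
        for (String token : tokenize(query)) {
            if (doc.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Phrase-style match: all tokens must occur adjacent and in order.
    static boolean matchesPhrase(String text, String phrase) {
        return Collections.indexOfSubList(tokenize(text), tokenize(phrase)) >= 0;
    }

    public static void main(String[] args) {
        String phrase = "colder and whitening";
        String reordered = "whitening and colder"; // hypothetical text
        System.out.println(matchesAnyToken(reordered, phrase));
        System.out.println(matchesPhrase(reordered, phrase));
    }
}
```

The keyword check accepts the reordered text because it shares tokens with the query, while the phrase check rejects it because the tokens are not in the same order.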