Hibernate Search: Implementing case-insensitive sort

zentropy · **Joined:** Wed Dec 05, 2007 9:17 pm **Posts:** 4

Hello,

I am trying to implement case-insensitive sorting using Hibernate Search for a couple of text-based fields.

Out of the box, Lucene sorts case-sensitively when using the following sort type:

Code:

org.apache.lucene.search.SortField.STRING

I read the Lucene documentation and have gone down the path of extending the Lucene class:

Code:

org.apache.lucene.search.SortComparator

This worked for one field, but not for the other. The first field sorts nicely in a case-INsensitive manner. The second field, however, gives me this stack trace at search time:

Code:

java.lang.NullPointerException
        at org.apache.lucene.search.SortComparator$1.compare(SortComparator.java:54)
        at org.apache.lucene.search.FieldSortedHitQueue.lessThan(FieldSortedHitQueue.java:125)
        at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:80)
        at org.apache.lucene.search.FieldSortedHitQueue.insertWithOverflow(FieldSortedHitQueue.java:108)
        at org.apache.lucene.search.TopFieldDocCollector.collect(TopFieldDocCollector.java:61)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:124)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100)
        at org.apache.lucene.search.Hits.<init>(Hits.java:77)
        at org.apache.lucene.search.Searcher.search(Searcher.java:64)
        at org.hibernate.search.query.FullTextQueryImpl.getHits(FullTextQueryImpl.java:270)
        at org.hibernate.search.query.FullTextQueryImpl.list(FullTextQueryImpl.java:232)
        at com.connect.vine.search.SearchManagerImpl.searchProductsWithMultiCriteria(SearchManagerImpl.java:193)

The failing line is this one in SortComparator.java:

Code:

      public int compare (ScoreDoc i, ScoreDoc j) {
        return cachedValues[i.doc].compareTo (cachedValues[j.doc]); // cachedValues is null at either i.doc or j.doc
      }
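A likely cause of that NullPointerException: SortComparator caches one value per document, built from the terms of the sort field, so a document with no term in that field (for example, an entity whose property is null) would be left with a null entry, and the compare method quoted above then calls compareTo on it. The plain-Java sketch below is not Lucene code (CaseInsensitiveSortSketch and its members are hypothetical names); it only shows a case-insensitive comparison that tolerates missing values by sorting them last. Since the failing compare method lives in Lucene's SortComparator rather than in a subclass, logic like this would have to go into a comparator you implement yourself, or every document would need some value in the sort field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper illustrating a null-safe, case-insensitive comparison of sort values. */
public class CaseInsensitiveSortSketch {

    /** Compares two field values ignoring case; null (missing) values sort after all others. */
    static final Comparator<String> NULL_SAFE_IGNORE_CASE = new Comparator<String>() {
        public int compare(String a, String b) {
            if (a == null && b == null) return 0;
            if (a == null) return 1;   // missing value goes last
            if (b == null) return -1;
            return a.toLowerCase(Locale.ROOT).compareTo(b.toLowerCase(Locale.ROOT));
        }
    };

    /** Returns a sorted copy of the values, leaving the input list untouched. */
    static List<String> sort(List<String> values) {
        List<String> copy = new ArrayList<String>(values);
        Collections.sort(copy, NULL_SAFE_IGNORE_CASE);
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<String>();
        Collections.addAll(values, "banana", null, "Apple", "cherry");
        System.out.println(sort(values)); // [Apple, banana, cherry, null]
    }
}
```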

Before I go into detail about how I've configured Hibernate Search and my code, I'm just wondering: am I going down the right path by using Lucene's SortComparator? Or is there a simpler way to do case-insensitive sorting with Hibernate Search?

I have been struggling with this for almost a week now, but have made no progress.

Thank you very much!

-Frank

Hibernate Search version: 3.0.1.GA
Lucene version: 2.3.1

Hibernate version: 3.2.6.ga
Hibernate Annotations version: 3.3.0.ga
Hibernate commons-annotations version: 3.3.0.ga

Name and version of the database you are using: Oracle

hardy.ferentschik · **Posted:** Sat Apr 12, 2008 5:13 am

Hi Frank,

Using SortComparator is one way of doing it. Two other ways come to mind:

StringBridge

http://www.hibernate.org/hib_docs/search/reference/en/html_single/#d0e1536

Indexing the property twice with @Fields, using a lower-casing analyzer for the sort field

Code:

    @Fields( {
            @Field(index = Index.TOKENIZED),
            @Field(name = "summary_forSort", index = Index.TOKENIZED, analyzer = @Analyzer(impl = LowerCaseAnalyzer.class))
            } )
    public String getSummary() {
        return summary;
    }

The easiest (and perhaps best) option might be the StringBridge, but that's a matter of taste, I guess.
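A sketch of the StringBridge option, assuming Hibernate Search's org.hibernate.search.bridge.StringBridge contract (a single String objectToString(Object object) method). To keep the example self-contained, a local interface with the same shape stands in for the real one, and LowerCaseStringBridge is a hypothetical name. Because a bridge converts the property value before it reaches Lucene, the lowercased string becomes the indexed term even on an Index.UN_TOKENIZED field, where no analyzer runs.

```java
import java.util.Locale;

public class LowerCaseStringBridgeSketch {

    /** Local stand-in with the same shape as org.hibernate.search.bridge.StringBridge. */
    interface StringBridge {
        String objectToString(Object object);
    }

    /** Hypothetical bridge: turns the property value into a lowercase sort key. */
    static class LowerCaseStringBridge implements StringBridge {
        public String objectToString(Object object) {
            // Assumption: Hibernate Search skips the field when the bridge returns null
            return object == null ? null : object.toString().toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        StringBridge bridge = new LowerCaseStringBridge();
        System.out.println(bridge.objectToString("Hibernate Search")); // hibernate search
    }
}
```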

--Hardy

boudewijn · **Joined:** Fri Feb 24, 2006 9:15 am **Posts:** 17

Hi,

I'm trying to do the same thing: case-insensitive sorting.
As I already have a custom analyzer defined, I thought I'd just use it for the sort field:

Code:

@Field(name = "nombre_forSort", index = Index.UN_TOKENIZED, store = Store.YES, analyzer = @Analyzer(definition = "customanalyzer"))

The customanalyzer has the following definition:

Code:

@AnalyzerDef(name = "customanalyzer", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = {
   @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
   @TokenFilterDef(factory = LowerCaseFilterFactory.class) })

So in theory, it should lowercase the field value.

However, when I look in the index with Luke, I can see that it still has uppercase characters.
(The index seems to have been updated correctly: I changed the store value to Store.YES, and I can now see the content of the field.)

Is it possible that filters are somehow not applied to fields? Or is there another reason the field isn't lowercased?

Cheers, Bo

sanne.grinovero · **Posted:** Fri Jun 17, 2011 7:18 pm

Code:

/**
* Index the field's value without using an Analyzer, so it can be searched.
* As no analyzer is used the value will be stored as a single term. This is
* useful for unique Ids like product numbers.
*/
UN_TOKENIZED

So the problem is that, with Index.UN_TOKENIZED, your custom analyzer is never applied. I'm wondering whether we should raise an exception at deploy time; what do you think? If so, please open a JIRA issue. We won't be able to detect people expecting the analyzer currently in scope, the global analyzer, or a class-level analyzer to be applied, but we could at least reject an analyzer specified explicitly on an un-tokenized field, as you did.
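A toy model in plain Java of the behaviour described above (not Hibernate Search or Lucene code; every name here is hypothetical, and the analyzer stand-in splits on whitespace for simplicity instead of using StandardTokenizer and accent folding): for an un-tokenized field the whole value becomes one term exactly as given, so the uppercase characters survive, while only a tokenized field passes through the analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class IndexModeSketch {

    enum IndexMode { TOKENIZED, UN_TOKENIZED }

    /** Rough stand-in for an analyzer like "customanalyzer": splits on whitespace, then lowercases. */
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String token : value.trim().split("\\s+")) {
            tokens.add(token.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Returns the terms this toy model would index for one field value. */
    static List<String> termsFor(String value, IndexMode mode) {
        if (mode == IndexMode.UN_TOKENIZED) {
            // No analyzer at all: one term, original case preserved
            return Arrays.asList(value);
        }
        return analyze(value);
    }

    public static void main(String[] args) {
        System.out.println(termsFor("Juan Perez", IndexMode.UN_TOKENIZED)); // [Juan Perez]
        System.out.println(termsFor("Juan Perez", IndexMode.TOKENIZED));    // [juan, perez]
    }
}
```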

Keep in mind that for sorting purposes it's not strictly required that the field be un-tokenized, but it's very important that the analyzer output a single token (so by all means avoid whitespace splitting, which most tokenizers do by default).
As in most practical cases, I'd suggest indexing a field that needs to be both sortable and searchable twice: once analyzed for searching, and once as a single token for sorting.
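The "index it twice" advice can be modelled the same way, again as a toy in plain Java rather than Hibernate Search code. The summary and summary_forSort field names follow the example earlier in the thread; everything else is hypothetical. The searchable field keeps one term per word, while the sort field holds exactly one lowercased term per document, so every document has a single, well-defined sort key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class IndexTwiceSketch {

    /** Builds one toy document: field name -> indexed terms. */
    static Map<String, List<String>> index(String summary) {
        Map<String, List<String>> doc = new LinkedHashMap<String, List<String>>();
        List<String> searchTerms = new ArrayList<String>();
        for (String word : summary.trim().split("\\s+")) {
            searchTerms.add(word.toLowerCase(Locale.ROOT)); // searchable: one term per word
        }
        doc.put("summary", searchTerms);
        // sortable: the whole value, lowercased, as exactly one term
        doc.put("summary_forSort", Collections.singletonList(summary.toLowerCase(Locale.ROOT)));
        return doc;
    }

    /** Returns the documents ordered by their single sort term. */
    static List<Map<String, List<String>>> sortBySortField(List<Map<String, List<String>>> docs) {
        List<Map<String, List<String>>> sorted = new ArrayList<Map<String, List<String>>>(docs);
        Collections.sort(sorted, new Comparator<Map<String, List<String>>>() {
            public int compare(Map<String, List<String>> a, Map<String, List<String>> b) {
                return a.get("summary_forSort").get(0).compareTo(b.get("summary_forSort").get(0));
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = Arrays.asList(
                index("zebra Crossing"), index("Apple pie"), index("banana Split"));
        for (Map<String, List<String>> doc : sortBySortField(docs)) {
            System.out.println(doc.get("summary_forSort").get(0) + " <- searchable terms " + doc.get("summary"));
        }
    }
}
```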

Also keep in mind, when looking at the index with Luke, that what you have stored is always a copy of the original string, so it is not analyzed either.