All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
 Post subject: Hibernate Search: Implementing case-insensitive sort
PostPosted: Fri Apr 11, 2008 8:53 pm 
Newbie

Joined: Wed Dec 05, 2007 9:17 pm
Posts: 4
Hello,

I am trying to implement case-insensitive sorting using Hibernate Search for a couple of text-based fields.

Out of the box, Lucene sorts case-sensitively when the following sort type is used:

Code:
org.apache.lucene.search.SortField.STRING


I read the Lucene documentation and went down the path of extending the Lucene class:

Code:
org.apache.lucene.search.SortComparator


This worked for one field, but not the other. The first field sorts great in a case-INsensitive manner. The second field, however, gives me a stack trace at search time:

Code:
java.lang.NullPointerException
        at org.apache.lucene.search.SortComparator$1.compare(SortComparator.java:54)
        at org.apache.lucene.search.FieldSortedHitQueue.lessThan(FieldSortedHitQueue.java:125)
        at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:80)
        at org.apache.lucene.search.FieldSortedHitQueue.insertWithOverflow(FieldSortedHitQueue.java:108)
        at org.apache.lucene.search.TopFieldDocCollector.collect(TopFieldDocCollector.java:61)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:124)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100)
        at org.apache.lucene.search.Hits.<init>(Hits.java:77)
        at org.apache.lucene.search.Searcher.search(Searcher.java:64)
        at org.hibernate.search.query.FullTextQueryImpl.getHits(FullTextQueryImpl.java:270)
        at org.hibernate.search.query.FullTextQueryImpl.list(FullTextQueryImpl.java:232)
        at com.connect.vine.search.SearchManagerImpl.searchProductsWithMultiCriteria(SearchManagerImpl.java:193)


The failing line in SortComparator.java is this one:

Code:
      public int compare (ScoreDoc i, ScoreDoc j) {
        return cachedValues[i.doc].compareTo (cachedValues[j.doc]); // cachedValues is null at either i.doc or j.doc
      }
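
One plausible reading of the trace (an assumption, since the mapping and the custom comparator are not shown) is that some matching documents have no value in the second sort field, so their cache slot stays null and compareTo is called on null. The plain-Java sketch below is independent of Lucene and only illustrates a case-insensitive comparison that tolerates missing values; the class and method names are hypothetical, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NullSafeCaseInsensitiveSort {

    // Case-insensitive ordering; null (standing in for a document
    // without a value in the sort field) is placed last instead of throwing.
    static final Comparator<String> IGNORE_CASE_NULLS_LAST =
            Comparator.nullsLast(String.CASE_INSENSITIVE_ORDER);

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sortIgnoringCase(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(IGNORE_CASE_NULLS_LAST);
        return copy;
    }

    public static void main(String[] args) {
        // prints [Apple, banana, cherry, null]
        System.out.println(sortIgnoringCase(Arrays.asList("banana", null, "Apple", "cherry")));
    }
}
```

The same idea could be applied inside a custom comparator by checking both cached values for null before comparing them.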


Before I go into the details of how I've configured Hibernate Search and my code, I'm just wondering: am I on the right path using Lucene's SortComparator? Or is there a simpler way of doing case-insensitive sorting using Hibernate Search?

I have been struggling with this for almost a week now, but still no progress.

Thank you very much!

-Frank


Hibernate Search version: 3.0.1.GA
Lucene version: 2.3.1

Hibernate version: 3.2.6.ga
Hibernate Annotations version: 3.3.0.ga
Hibernate commons-annotations version: 3.3.0.ga

Name and version of the database you are using: Oracle


 Post subject:
PostPosted: Sat Apr 12, 2008 5:13 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi Frank,

Using SortComparator is one way of doing it. There are two other ways which come to my mind:

    You can write a custom StringBridge (http://www.hibernate.org/hib_docs/search/reference/en/html_single/#d0e1536) which you use on the property you want to sort on. All this 'LowerCaseStringBridge' would do is to lowercase the property and put it into the index. If you need this property also in a different form, for example tokenized, you can just index the property multiple times using @Fields.
    Another way would be to work with a custom analyzer. You could implement a custom LowerCaseAnalyzer which internally just uses a LowerCaseFilter (check the Lucene API) to lowercase the tokens. This could look something like this:
    Code:
        @Fields( {
                @Field(index = Index.TOKENIZED),
                @Field(name = "summary_forSort", index = Index.TOKENIZED,
                        analyzer = @Analyzer(impl = LowerCaseAnalyzer.class))
        } )
        public String getSummary() {
            return summary;
        }



The easiest/best might be the StringBridge, but that's a matter of taste I guess.
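
A minimal sketch of what such a bridge might look like is given here. The nested StringBridge interface is only a stand-in with the same shape as org.hibernate.search.bridge.StringBridge, declared so the example compiles without Hibernate Search on the classpath; the class names and the null handling are assumptions, not code from this thread.

```java
import java.util.Locale;

class LowerCaseBridgeSketch {

    // Stand-in mirroring org.hibernate.search.bridge.StringBridge;
    // in a real project the library interface would be imported instead.
    interface StringBridge {
        String objectToString(Object object);
    }

    // Hypothetical bridge: the value written to the index is the
    // lowercased string, so a plain string sort ignores case.
    static class LowerCaseStringBridge implements StringBridge {
        @Override
        public String objectToString(Object object) {
            // Locale.ROOT keeps lowercasing independent of the JVM default locale.
            return object == null ? null : object.toString().toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        // prints hibernate search
        System.out.println(new LowerCaseStringBridge().objectToString("Hibernate Search"));
    }
}
```

In a real mapping the bridge would presumably be attached with @FieldBridge(impl = LowerCaseStringBridge.class) on an untokenized field such as "summary_forSort", and the query would then sort on that field with SortField.STRING.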

--Hardy


 Post subject: Re: Hibernate Search: Implementing case-insensitive sort
PostPosted: Fri Jun 17, 2011 5:42 am 
Newbie

Joined: Fri Feb 24, 2006 9:15 am
Posts: 17
Hi,

I'm trying to do the same thing: case-insensitive sorting.
As I already have a customanalyzer defined, I thought I'd just use it for the sort field:
Code:
@Field(name = "nombre_forSort", index = Index.UN_TOKENIZED, store = Store.YES, analyzer = @Analyzer(definition = "customanalyzer"))


The customanalyzer has the following definition:
Code:
@AnalyzerDef(name = "customanalyzer", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = {
   @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
   @TokenFilterDef(factory = LowerCaseFilterFactory.class) })

So in theory, it should lower case the field value.

However, when I look in the index with Luke, I can see that it still has uppercase characters.
(The index seems to have been updated correctly, as I changed the store value to Store.YES, and I can now see the content of the field)

Is it possible that filters are not applied somehow on fields? Or any other idea why the field isn't lowercased?

Cheers, Bo


 Post subject: Re: Hibernate Search: Implementing case-insensitive sort
PostPosted: Fri Jun 17, 2011 7:18 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Code:
/**
* Index the field's value without using an Analyzer, so it can be searched.
* As no analyzer is used the value will be stored as a single term. This is
* useful for unique Ids like product numbers.
*/
UN_TOKENIZED


So the problem is that you're not applying the custom analyzer: with UN_TOKENIZED no analyzer is used at all. I'm wondering if we shouldn't raise an exception at deploy time; what do you think? If so, please open a JIRA issue. We won't be able to detect people expecting the current in-scope analyzer, the global analyzer or the class-level analyzer to be applied, but at least specifying one explicitly, as you did, could be prevented.

Keep in mind that for sorting purposes it's not strictly required that the field be untokenized, but it's very important that the analyzer outputs a single token (so by all means avoid whitespace splitting, which most tokenizers do by default).
As in most practical cases, I'd suggest indexing a field twice when it needs to be both sortable and searchable.

Also keep in mind when looking into the index with Luke that what you have stored is always a copy of the original string, so it is not analyzed either.
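
As a rough sketch of that advice, the mapping could look something like the fragment below. It is a configuration fragment, not a complete class: the analyzer name "sortanalyzer", the property name nombre and its getter are assumptions made for illustration, and KeywordTokenizerFactory is used on the assumption that the Solr analysis factories are already on the classpath, as they are for the other factories in the definition above.

```java
// Hypothetical second analyzer for sorting: KeywordTokenizerFactory emits the
// whole value as one token, which is then accent-folded and lowercased.
@AnalyzerDef(name = "sortanalyzer",
   tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
   filters = {
      @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
      @TokenFilterDef(factory = LowerCaseFilterFactory.class) })

// The property is indexed twice: once for searching, once for sorting.
// Both fields are TOKENIZED so that the named analyzers are actually applied.
@Fields( {
   @Field(name = "nombre", index = Index.TOKENIZED,
      analyzer = @Analyzer(definition = "customanalyzer")),
   @Field(name = "nombre_forSort", index = Index.TOKENIZED,
      analyzer = @Analyzer(definition = "sortanalyzer"))
} )
public String getNombre() {
   return nombre;
}
```

Queries would then search on "nombre" and sort on "nombre_forSort".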

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.