
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
 Post subject: Multilingual unicode sorting
PostPosted: Mon Feb 13, 2012 7:27 pm 
Regular

Joined: Tue May 17, 2011 1:45 am
Posts: 52
Hi,

How can I sort text that comes in multiple languages? I have a DB field called Country whose values are entered by users from 15 different countries. In the UI we use Hibernate Search to display and sort the data. However, sorting is causing a problem: it does not take the different languages into account, and only English seems to be handled. For example, if I have 4 texts like
1. 中國
2. China
3. Germany
4. France

When sorted, they come out like this:

1. China
2. 中國
3. France
4. Germany


I was expecting 中國 to be at the end because its Unicode code points are greater than those of the Latin letters.
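To make that expectation concrete, here is a minimal plain-Java sketch of a sort by Unicode code point, with no Hibernate Search or Lucene involved. The class name CodePointOrder and its helper methods are hypothetical and only illustrate the ordering described above; 中國 is written as the escapes \u4e2d\u570b so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders strings purely by Unicode code point.
class CodePointOrder {

    // Walks both strings one code point at a time and compares numerically.
    static final Comparator<String> BY_CODE_POINT = (a, b) -> {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The string that still has characters left sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    // Returns a sorted copy and leaves the input untouched.
    static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(BY_CODE_POINT);
        return copy;
    }

    public static void main(String[] args) {
        List<String> countries =
                Arrays.asList("\u4e2d\u570b", "China", "Germany", "France");
        System.out.println(sorted(countries));
    }
}
```

With this comparator the Latin names come first and the CJK entry comes last, and an empty string sorts before everything else. It says nothing about how the indexed field behaves; it only shows the order a pure code point sort would give.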

I am using the following code to index the city field:
@AnalyzerDef(name = "keyword",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = CollationKeyFilterFactory.class, params = {
            @Parameter(name = "language", value = ""),
            @Parameter(name = "strength", value = "primary")
        }),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    })


@Transient
@Field(name = "city", index = Index.UN_TOKENIZED, store = Store.YES, analyzer = @Analyzer(definition = "keyword"))
public String getCity() {
......
}

However, this does not seem to work.
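For context, a collation filter conceptually replaces each term with a sort key derived from a java.text.Collator. The sketch below uses only the JDK; the class name CollationKeySketch and its helpers are hypothetical, and it assumes Locale.ENGLISH purely for illustration. It shows that at primary strength the keys already ignore case, and that keys compare in the same order as the collator itself.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration of collation keys, independent of Lucene/Solr.
class CollationKeySketch {

    // Builds a collator that ignores case and accent differences.
    static Collator primaryCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Compares two strings through their collation keys.
    static int compareByKey(Collator collator, String a, String b) {
        CollationKey keyA = collator.getCollationKey(a);
        CollationKey keyB = collator.getCollationKey(b);
        return keyA.compareTo(keyB);
    }

    public static void main(String[] args) {
        Collator collator = primaryCollator(Locale.ENGLISH);
        // Case differences vanish at primary strength.
        System.out.println(compareByKey(collator, "china", "China"));
        // The key is opaque binary data, not readable text.
        System.out.println(collator.getCollationKey("China").toByteArray().length);
    }
}
```

Since the key is binary data rather than text, any filter placed after the collation step (such as the lower-case filter above) would operate on the encoded key and not on the original characters. Whether that contributes to the ordering seen here is an assumption worth testing, for example by removing that filter or moving it before the collation step.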


 Post subject: Re: Multilingual unicode sorting
PostPosted: Thu Feb 16, 2012 6:27 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

I think your problem is that you don't specify a language parameter for the CollationKeyFilterFactory. Leaving this parameter blank means that the system Locale is used for indexing and searching. I am not sure whether there is a Collator that sorts purely by a Unicode-defined order.

--Hardy
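As a rough illustration of how much the chosen locale matters, here is a small JDK-only sketch that sorts the same values with a Collator for an explicitly given Locale. The class name LocaleSortSketch and the method sortWith are hypothetical, and the Swedish locale is just an assumed example; the result for it depends on the locale data available in the running JDK, so no output is claimed for that line.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: sorts strings with the collator of an explicit locale.
class LocaleSortSketch {

    // Returns a sorted copy using the default strength of the locale's collator.
    static List<String> sortWith(Locale locale, List<String> input) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator is itself a Comparator
        return copy;
    }

    public static void main(String[] args) {
        // "\u00e4" is the letter a with diaeresis.
        List<String> values = Arrays.asList("z", "\u00e4", "a");
        System.out.println(sortWith(Locale.ENGLISH, values));
        // Assumed example: Swedish rules may order the same letters differently.
        System.out.println(sortWith(new Locale("sv", "SE"), values));
    }
}
```

With English rules the accented letter sorts next to the plain "a". If a single explicit language is acceptable for the field, the same language code could be passed as the language parameter of the filter instead of an empty string.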


 Post subject: Re: Multilingual unicode sorting
PostPosted: Thu Feb 16, 2012 7:16 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Maybe a RuleBasedCollator is something you could look into, but I am not sure how much effort it is to create the required sorting rules. Have a look at the java.text.RuleBasedCollator docs.
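To give an idea of what such rules look like, here is a deliberately tiny sketch of a java.text.RuleBasedCollator. The class name CustomRulesSketch and the three-letter rule string are hypothetical and only demonstrate the rule syntax; real multilingual rules would be far longer.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of custom collation rules.
class CustomRulesSketch {

    // Toy rule set: "c" sorts first, then "b", then "a".
    static final String REVERSED_ABC = "< c < b < a";

    // Builds a collator from a rule string; bad rules raise ParseException.
    static RuleBasedCollator fromRules(String rules) throws ParseException {
        return new RuleBasedCollator(rules);
    }

    // Returns a sorted copy of the input using the given rules.
    static List<String> sortWithRules(String rules, List<String> input)
            throws ParseException {
        List<String> copy = new ArrayList<>(input);
        copy.sort(fromRules(rules));
        return copy;
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(sortWithRules(REVERSED_ABC, Arrays.asList("a", "b", "c")));
    }
}
```

The point is only that the order is fully under your control once you write the rules yourself; the effort lies in covering every script you need.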


 Post subject: Re: Multilingual unicode sorting
PostPosted: Tue Feb 21, 2012 2:53 pm 
Regular

Joined: Tue May 17, 2011 1:45 am
Posts: 52
Well, according to the Solr wiki, leaving the language parameter empty implies that all the available languages are supported.

I am wondering whether there is any other way. Interestingly, even empty values are not in the correct order in the sort.

Is there an ASCII sort available in Lucene?


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.