-->

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 7 posts ] 
 Post subject: Colons in terms - indexing vs parsing
PostPosted: Mon Jan 31, 2011 11:07 pm 
Beginner

Joined: Tue Oct 07, 2008 7:05 pm
Posts: 27
I have an unusual issue in that I'm searching on terms that have colons in them. We have objects that have statuses on them of this form:

myapplication:statuscodes:new
myapplication:statuscodes:updated

When indexing this status field, I leave it untokenized because I want to search on the entire term. Examining the index, sure enough, the exact string field with colons is found.

Now I'm trying to search on this status term. However, when I run the term through the QueryParser, the colons are replaced with spaces. For example, the query looks like this: status:"myapplication statuscodes new". So, the problem is, I end up getting no results because the indexed terms have the colon yet the search is looking for the status with spaces instead of colons.

The best solution seems to be to index the field untokenized but with spaces instead of colons; however, I can't figure out how to do that. Any ideas?

-JF
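
To make the mismatch described above concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: splitOnPunctuation is a hypothetical stand-in for an analyzer that breaks text on anything that is not a letter or digit (an assumption about what the analyzer in this setup did to the colons), and the "index" is just a set of whole, untokenized values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ColonMismatchSketch {

    // Hypothetical stand-in for an analyzer that breaks text on every
    // character that is not a letter or digit. Not Lucene code.
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // An untokenized field holds the whole value as a single term, so a
    // query token only matches when it equals that whole value.
    static boolean anyTokenMatches(Set<String> indexedTerms, List<String> queryTokens) {
        for (String token : queryTokens) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexedTerms = Set.of(
                "myapplication:statuscodes:new",
                "myapplication:statuscodes:updated");

        // What the query side looks like after the toy "analysis"
        List<String> analyzed = splitOnPunctuation("myapplication:statuscodes:new");
        System.out.println(analyzed);                                  // [myapplication, statuscodes, new]
        System.out.println(anyTokenMatches(indexedTerms, analyzed));   // false

        // The untouched value, used as a single term, does match
        System.out.println(anyTokenMatches(indexedTerms,
                List.of("myapplication:statuscodes:new")));            // true
    }
}
```

None of the three query tokens equals a stored term, which is why the search comes back empty even though the value is in the index.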


 Post subject: Re: Colons in terms - indexing vs parsing
PostPosted: Wed Feb 02, 2011 12:39 pm 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

there are several approaches to solving your problem; it all depends on your use case. The problem is that you index un-tokenized (as is), but search tokenized (the QueryParser uses an analyzer to parse the query).
If you want to keep the codes as is, I would not use a QueryParser, but a TermQuery. You could use a BooleanQuery to combine the term query part with the rest of the query, which might still use the QueryParser.
An alternative would be to customize the analyzer which gets used for indexing and searching (either via configuration or by implementing your own).
You will have to spend some time thinking through how you want your search to behave. Is a search for "updated" supposed to return something?

--Hardy


 Post subject: Re: Colons in terms - indexing vs parsing
PostPosted: Wed Feb 02, 2011 12:45 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
the indexed format should indeed match the search format, so you could replace the text you write in the index - as you suggest - using a FieldBridge to "encode" your string in your custom format.

However, it's easier if you index it as you are doing, and then avoid the QueryParser and create the Query as a TermQuery:

Code:
// exact term lookup: the value is not passed through an analyzer
TermQuery q = new TermQuery( new Term( "fieldname", "myapplication:statuscodes:updated" ) );


If you need to use the QueryParser, you can customize the text analysis by defining your own Analyzer. You can either implement one yourself (hard), or use the annotations to create an analyzer definition as described in "1.6. Analyzer"; you basically just need a Tokenizer which doesn't split terms on ":".

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Colons in terms - indexing vs parsing
PostPosted: Wed Feb 02, 2011 6:48 pm 
Beginner

Joined: Tue Oct 07, 2008 7:05 pm
Posts: 27
Thanks for the tips. You guys are spot on with the advice. In fact, that's the way I used to do it before the new requirement came up that I needed to search for two statuses. So before, I had a multi-field query that did a text search on a couple fields and a term query that searched on the one status and combined those using a Boolean query with MUST arguments. Everything worked great.

Now, I need documents that satisfy the text search, but also satisfy one status code OR the other status code. Boolean queries use MUST and SHOULD which doesn't work if I add another term query on the other status. I know I can create the status query like this:

status:"myapp:statuscodes:new" status:"myapp:statuscodes:updated"

Which essentially ORs the two statuses. I tried doing that and using it in my Boolean query. The problem is, the only way I know to create a Query object that satisfies the BooleanClause constructor is to use a parser, and the parser uses the analyzer, which strips out the colons. Is there another way to create a Lucene Query object besides using a parser? Or can I use a parser with a different analyzer that won't strip out colons?

-JF


 Post subject: Re: Colons in terms - indexing vs parsing
PostPosted: Wed Feb 02, 2011 7:07 pm 
Beginner

Joined: Tue Oct 07, 2008 7:05 pm
Posts: 27
OK, turns out I was able to use a WhitespaceAnalyzer on the query parser and that didn't strip out the colons and it created the query I need. Everything seems to be working, so hopefully this is a proper solution?

-JF
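
As a rough illustration of why a whitespace-only analyzer leaves these codes intact, the sketch below models whitespace tokenization in plain Java. It is a toy model, not the Lucene WhitespaceAnalyzer itself; splitOnWhitespace is a hypothetical helper that breaks text on runs of whitespace and touches nothing else.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceSplitSketch {

    // Toy model of whitespace-only tokenization: split on runs of
    // whitespace, keep every other character (colons included) as is.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two status codes separated by a space stay two whole tokens
        System.out.println(splitOnWhitespace(
                "myapp:statuscodes:new myapp:statuscodes:updated"));
        // [myapp:statuscodes:new, myapp:statuscodes:updated]
    }
}
```

Two things to keep in mind with this approach: a whitespace-only analyzer does no lowercasing, so the case of the query value has to match the indexed value exactly, and any value that itself contains whitespace would still be split into several tokens.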


 Post subject: Re: Colons in terms - indexing vs parsing
PostPosted: Wed Feb 02, 2011 7:41 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Your solution is fine; IMHO you just don't need a parser at all, as you're building this programmatically.
Parsers are good for parsing human input; you could use the QueryBuilder for this instead.

Code:
final QueryBuilder qbuilder = fullTextSession.getSearchFactory()
   .buildQueryBuilder()
   .forEntity( YourEntity.class )
   .get();

// either status may match (two should() clauses behave like OR)
Query statusQuery = qbuilder.bool()
   .should( qbuilder.keyword().onField( "statuscodeField" ).matching( "myapplication:statuscodes:new" ).createQuery() )
   .should( qbuilder.keyword().onField( "statuscodeField" ).matching( "myapplication:statuscodes:updated" ).createQuery() )
   .createQuery();

// the status query and the other condition are both required
Query finalQuery = qbuilder.bool()
   .must( statusQuery )
   .must( qbuilder.keyword().onField( "requiredField" ).matching( "requiredValue" ).createQuery() )
   .createQuery();

This should make it easier to handle multiple fields/values and new requirements in the future; for example, the should() part could be built in a for loop over all the values you need, still avoiding slow String concatenation and parsing.

Alternatively, just look into using the lower level org.apache.lucene.search.BooleanQuery

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Colons in terms - indexing vs parsing
PostPosted: Thu Feb 03, 2011 12:56 pm 
Beginner

Joined: Tue Oct 07, 2008 7:05 pm
Posts: 27
Sanne, thanks, that looks like a great solution. I didn't think of using a boolean query inside a boolean query. I was a bit confused on what the "should" term means. Based on your code, it looks like it is the same as OR'ing the two statuses. I'll give it a shot.

-JF
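
On the meaning of "should": in a Lucene boolean query that contains only SHOULD clauses, a document matches when at least one clause matches, so it behaves like OR; nesting that query as a MUST clause next to another MUST clause gives (A OR B) AND C. The sketch below models only this matching logic in plain Java. It is not the Lucene or Hibernate Search API: the helper names are hypothetical, a "document" is just a map of field values, and the text condition is simplified to an exact comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ShouldMustSketch {

    // A clause that matches when the field holds exactly this value.
    static Predicate<Map<String, String>> fieldEquals(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    // Only optional ("should") clauses: at least one has to match (OR).
    static Predicate<Map<String, String>> anyOf(List<Predicate<Map<String, String>>> clauses) {
        return doc -> {
            for (Predicate<Map<String, String>> clause : clauses) {
                if (clause.test(doc)) {
                    return true;
                }
            }
            return false;
        };
    }

    // Only required ("must") clauses: every one has to match (AND).
    static Predicate<Map<String, String>> allOf(List<Predicate<Map<String, String>>> clauses) {
        return doc -> {
            for (Predicate<Map<String, String>> clause : clauses) {
                if (!clause.test(doc)) {
                    return false;
                }
            }
            return true;
        };
    }

    // (status is any of the given values) AND (requiredField = requiredValue)
    static Predicate<Map<String, String>> buildQuery(List<String> statuses) {
        List<Predicate<Map<String, String>>> statusClauses = new ArrayList<>();
        for (String status : statuses) {   // the OR part, built in a loop
            statusClauses.add(fieldEquals("status", status));
        }
        return allOf(List.of(anyOf(statusClauses),
                fieldEquals("requiredField", "requiredValue")));
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> query = buildQuery(List.of(
                "myapplication:statuscodes:new",
                "myapplication:statuscodes:updated"));

        System.out.println(query.test(Map.of(
                "status", "myapplication:statuscodes:updated",
                "requiredField", "requiredValue")));   // true
        // "closed" is a made-up status used only for this example
        System.out.println(query.test(Map.of(
                "status", "myapplication:statuscodes:closed",
                "requiredField", "requiredValue")));   // false
    }
}
```

Building the status clauses from a list keeps the query shape the same when a third or fourth status is added later.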


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.