All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 6 posts ] 
Author Message
 Post subject: 2 letter search performance
PostPosted: Thu Jan 29, 2009 2:39 am 
Newbie

Joined: Thu Jan 08, 2009 9:23 am
Posts: 9
Hi,

we are facing a serious performance issue when users run two-letter searches, e.g. ho, jo, pa ma, um ar, ma fi. These searches take between 10 and 15 seconds. The search runs on 7 fields, uses a PrefixQuery on every field, and combines the terms with AND.

We show only the top 100 documents.

Our index size is 300 MB.

We use StandardAnalyzer and StandardTokenizer for indexing and searching.

Please let me know how we can improve the performance.



Regards,
Sourabh


 Post subject:
PostPosted: Sun Feb 01, 2009 6:08 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
are you combining "two letter search" with other constraints, or do you always run only this kind of search?

You are currently building the worst-case query for Lucene IMHO.
This is more of a Lucene forum question, but I think you could solve it by building a custom analyzer which emits "two letter" tokens. The number of matching Terms would be high, possibly increasing the index size, but the index would be highly optimized for your kind of search, bringing you back to the usual blazing fast queries.
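
To make the two-letter token idea concrete, here is a minimal plain-Java sketch of what such an analyzer could emit. It does not use the Lucene analyzer API: the class TwoLetterTokens and the method analyze are hypothetical names, and the tokenization rule (lowercase, split on anything that is not a letter or digit) is an assumption, not the exact behaviour of StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in for the output of a custom analyzer; not Lucene API.
final class TwoLetterTokens {

    // Lowercases the text, splits it on anything that is not a letter or
    // digit, and returns each word followed by its two-letter prefix when
    // the word is longer than two characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (word.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            tokens.add(word);
            if (word.length() > 2) {
                tokens.add(word.substring(0, 2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hotel Paris, Jo"));
    }
}
```

For the input "Hotel Paris, Jo" this yields hotel, ho, paris, pa, jo. A search for "ho" can then be a single exact-term match instead of a prefix expansion.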

You may also try an additional field to combine the 7 ANDed fields into one at indexing time.
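
A rough sketch of the combined-field idea, in plain Java and without any Hibernate Search or Lucene API. The class CombinedField, the method combine and the field names in main are all hypothetical; the point is only that the value of the extra field is built once, at indexing time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CombinedField {

    // Joins all non-blank field values into one string, in insertion order,
    // so that a single catch-all field can be indexed next to the originals.
    static String combine(Map<String, String> fields) {
        StringJoiner joined = new StringJoiner(" ");
        for (String value : fields.values()) {
            if (value != null && !value.trim().isEmpty()) {
                joined.add(value.trim());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("firstName", "Maria");   // hypothetical field names
        doc.put("lastName", "Fischer");
        doc.put("city", "");
        System.out.println(combine(doc));
    }
}
```

Note that with one combined field, AND means "every term occurs somewhere in the document", which may or may not be what the original seven-field query expressed.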

These suggestions break down if you also want to support other kinds of searches; in that case you may want to keep two sets of fields in the index, so that you can use the best set for each search type, possibly combining them.
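
One way to picture the "two sets of fields" setup is a small router that picks the field by term length. This is only a sketch: FieldRouter and the field names allText_prefix2 and allText are made up for illustration, and the returned string stands in for whatever query object would really be built.

```java
import java.util.Locale;

final class FieldRouter {

    // Hypothetical field names, not taken from any real mapping.
    static final String PREFIX2_FIELD = "allText_prefix2";
    static final String FULL_FIELD = "allText";

    // Two-letter terms go to the field that holds two-letter tokens;
    // everything else goes to the regular field.
    static String fieldFor(String term) {
        return term.trim().length() == 2 ? PREFIX2_FIELD : FULL_FIELD;
    }

    // Describes the kind of lookup that would be issued for the term.
    static String describe(String term) {
        String t = term.trim().toLowerCase(Locale.ROOT);
        if (t.length() == 2) {
            return "exact term '" + t + "' on " + PREFIX2_FIELD;
        }
        return "prefix '" + t + "' on " + FULL_FIELD;
    }

    public static void main(String[] args) {
        System.out.println(describe("Ho"));
        System.out.println(describe("hotel"));
    }
}
```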

_________________
Sanne
http://in.relation.to/


 Post subject:
PostPosted: Tue Feb 03, 2009 6:59 am 
Newbie

Joined: Thu Jan 08, 2009 9:23 am
Posts: 9
Hi,

according to the Lucene documentation, I found two ways to improve performance:

1. Retrieving the needed documents in docID order (sorting them by docID first) improves performance.
2. We can avoid loading all fields of a document by using FieldSelectorResult.LAZY_LOAD.
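
The first point can be sketched in plain Java as follows. This only shows the ordering trick (load in docID order, keep the ranked order for display) and does not use Lucene or Hibernate Search; DocIdOrder and loadOrder are hypothetical names, and whether this helps in a given setup is an assumption to verify by measuring.

```java
import java.util.Arrays;

final class DocIdOrder {

    // Takes the doc ids of the ranked hits (best hit first) and returns the
    // positions of those hits ordered by ascending doc id. Loading documents
    // in this order and storing each one at its position keeps the ranking.
    static Integer[] loadOrder(int[] rankedDocIds) {
        Integer[] positions = new Integer[rankedDocIds.length];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, (a, b) -> Integer.compare(rankedDocIds[a], rankedDocIds[b]));
        return positions;
    }

    public static void main(String[] args) {
        int[] ranked = {42, 7, 19}; // doc ids in score order
        System.out.println(Arrays.toString(loadOrder(ranked)));
    }
}
```

For ranked doc ids 42, 7, 19 the load order is positions 1, 2, 0, that is doc 7 first, then doc 19, then doc 42.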

Please correct me if I am wrong. I want to implement these suggestions and would like to know whether Hibernate Search supports them.

sourabh

_________________
sourabh


 Post subject:
PostPosted: Tue Feb 03, 2009 9:17 am 
Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205
Hi

Please find info from Erick Erickson on prefix queries:

Quote:
Prefix queries are expensive here. The problem is
that each one forms a very large OR clause on all
the terms that start with those two letters. For instance,
if a field in your index contained
mine
milanta
mica

a prefix search on "mi" would form
mine OR milanta OR mica.

Doing this across seven fields could get expensive.

Two things:
1> what is the problem you are trying to solve? Perhaps some
of the folks on the list can give you some suggestions. You can
think about many strategies depending upon what you want
to accomplish. A 300M index isn't very big, so you could, for
instance, think about indexing a separate field that contains only
the two beginning letters and search *that* in this case. I'll
assume that three letter prefix queries are OK.

2> How are you measuring query time? If you're measuring the
time it takes when you first start a searcher, be aware that the
first few queries are usually slow because the caches haven't
been filled. Further, are you measuring total response time or
are you measuring *just* the query time? It's possible that the
time is being spent assembling the response in your code
rather than actual searching. You might insert some timers
to determine that.

This is taken from the Lucene mailing list.
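
To illustrate the expansion described in the quote, here is a small plain-Java sketch that mimics it with a sorted set of terms. It is not how Lucene implements PrefixQuery internally; PrefixExpansion, expand and asOrClause are hypothetical names used only to show why a short prefix turns into a large OR clause.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

final class PrefixExpansion {

    // Returns every term of the sorted dictionary that starts with prefix.
    // The upper bound appends the largest char value, so it sorts after any
    // ordinary term that begins with the prefix.
    static SortedSet<String> expand(TreeSet<String> terms, String prefix) {
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    // Renders the expansion the way the quoted mail writes it.
    static String asOrClause(TreeSet<String> terms, String prefix) {
        return String.join(" OR ", expand(terms, prefix));
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("mine", "milanta", "mica", "maple", "nano"));
        System.out.println(asOrClause(terms, "mi"));
    }
}
```

With the terms from the quote plus two unrelated ones, the prefix "mi" expands to "mica OR milanta OR mine". In a real index a two-letter prefix can match thousands of terms, and the original query repeats that for seven fields.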

HTH


 Post subject:
PostPosted: Tue Feb 03, 2009 9:24 am 
Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205
OK, sorry folks. I just realised that the quote I copied was actually a reply to the original author of this post.

Sorry again!


 Post subject:
PostPosted: Tue Feb 03, 2009 10:33 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
hi,

I think the best approach is either Sanne's custom analyzer, which emits/adds the two-letter tokens to the token stream, or indexing the two letters into a separate field, as suggested in the quoted email.

You definitely want to get away from the expensive PrefixQuery and complex BooleanQueries.
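
As a rough picture of why the separate two-letter field is cheap to query, here is a toy in-memory index in plain Java. It is not Lucene or Hibernate Search code: PrefixIndex, add and searchAll are hypothetical names, and the tokenization (lowercase, split on non-alphanumerics) is an assumption. Each two-letter search term is one map lookup, and AND is a set intersection.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PrefixIndex {

    // Maps a two-letter prefix to the ids of the documents containing a
    // word that starts with it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexes the first two letters of every word of the text.
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (word.length() >= 2) {
                postings.computeIfAbsent(word.substring(0, 2), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // AND search: returns the documents that match every given prefix.
    Set<Integer> searchAll(String... prefixes) {
        Set<Integer> result = null;
        for (String prefix : prefixes) {
            Set<Integer> docs = postings.getOrDefault(
                    prefix.toLowerCase(Locale.ROOT), Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        PrefixIndex index = new PrefixIndex();
        index.add(1, "Maria Fischer"); // made-up sample data
        index.add(2, "Mark Paulson");
        index.add(3, "Paul Martin");
        System.out.println(index.searchAll("ma", "fi"));
        System.out.println(index.searchAll("pa", "ma"));
    }
}
```

With the sample data, "ma fi" matches document 1 only and "pa ma" matches documents 2 and 3, without any term expansion at query time.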

All this is possible in Hibernate Search right now.

--Hardy


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.