
All times are UTC - 5 hours [ DST ]



 Post subject: Integrating the Lucene SpellChecker with Hibernate Search
PostPosted: Mon Jun 14, 2010 1:08 pm 
Newbie

I'm using Hibernate Search 3.1.1 GA with Lucene 2.9.2. First, I'm curious if anyone else has attempted to integrate HS with Lucene's SpellChecker? If so, how did you deal with the following issue:

Lucene's SpellChecker builds a supplementary index from the terms in your main index. If your main index changes, e.g. after adding a new entity, then you need to update the spell correction index. From what I can tell, there is no clear-cut way to determine when to update the spell checker index, especially if you're using hibernate.search.worker.execution=async. In other words, you don't know when Hibernate Search has finished updating the Lucene index. It would be nice if Hibernate Search raised an event, such as HS_WORK_COMPLETED, after it finished making updates to the index. The application could register a listener for this event to trigger updates to the spell correction index.

For now, I solved this by manually updating the spell checker index after an entity is inserted or updated, but I wanted to put the idea out there of having HS raise an event after it has finished updating the index.
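
To make the listener idea concrete, here is a minimal sketch in plain Java. Nothing in it is Hibernate Search API: `IndexWorkListener`, `AsyncIndexBackend`, and `workCompleted` are hypothetical names, and a single-threaded executor stands in for the async worker. It only shows the shape of the callback: the backend applies the work, then notifies listeners on the worker thread, not on the thread that submitted the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical callback; Hibernate Search 3.1 defines no such interface. */
interface IndexWorkListener {
    void workCompleted(List<String> indexedTerms);
}

/** Stand-in for an asynchronous indexing backend (an assumption, not a Hibernate Search class). */
final class AsyncIndexBackend {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<IndexWorkListener> listeners = new CopyOnWriteArrayList<>();
    private final List<String> mainIndex = new CopyOnWriteArrayList<>();

    void addListener(IndexWorkListener listener) {
        listeners.add(listener);
    }

    /** Queues the terms for indexing; the returned future completes after all listeners have run. */
    CompletableFuture<Void> submit(List<String> terms) {
        return CompletableFuture.runAsync(() -> {
            mainIndex.addAll(terms);                   // 1. update the main index
            for (IndexWorkListener listener : listeners) {
                listener.workCompleted(terms);         // 2. notify, still on the worker thread
            }
        }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }
}

final class SpellIndexListenerDemo {
    public static void main(String[] args) {
        AsyncIndexBackend backend = new AsyncIndexBackend();
        List<String> spellIndex = new CopyOnWriteArrayList<>();

        // The listener plays the role of the spell checker index update.
        backend.addListener(terms -> spellIndex.addAll(terms));

        backend.submit(Arrays.asList("lucene", "hibernate")).join();
        backend.shutdown();
        System.out.println(spellIndex);
    }
}
```

In a real integration the listener body would rebuild or extend the spell correction index (for example with Lucene's `SpellChecker.indexDictionary`), which is left out here to keep the sketch free of external dependencies.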


 Post subject: Re: Integrating the Lucene SpellChecker with Hibernate Search
PostPosted: Fri Jun 18, 2010 4:48 am 
Hibernate Team

Hi thelabdude, welcome and congrats on your great blog post on Hibernate Search.

Don't you think that, especially in the async case, the async backend should handle this to update both indexes instead of one?
I don't think the async worker could send an event to the original thread, but you'd be welcome to prototype something.

Could you explain why you need to update the spell correction index constantly?

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Integrating the Lucene SpellChecker with Hibernate Search
PostPosted: Fri Jun 18, 2010 10:27 am 
Newbie

Hi Sanne,

Yes, I think it would be better to have the backend perform the update to both indexes. I'll give some thought to how that might work ... probably through the use of a @SpellChecker annotation ... Also, as I thought more about the event listener approach, I realized that it might be hard for the application to know what to do in the HS event listener since, as you point out, that would be in a different thread.

As for updating the spell correction index constantly, it's my understanding that you need the spell correction index to be in sync with your main index so that the SpellChecker can find any new terms added in the last update to the main index. Otherwise, SpellChecker might think words are misspelled when they are just new to the system.

Thanks for the kind words about the blog ;-)


 Post subject: Re: Integrating the Lucene SpellChecker with Hibernate Search
PostPosted: Fri Jun 18, 2010 10:32 am 
Hibernate Team

Quote:
Otherwise, SpellChecker might think words are misspelled when they are just new to the system.

Only if you have a requirement to "learn" from your users, adding as a valid term every word people use.
You can also build an index from a dictionary beforehand and consider that the only reliable reference for spell checking.

Even if you want to learn from your users, you might want some kind of control over that, such as inserting only terms used by at least 2-3 users. I also wonder whether you really want to perform the dictionary update "in transaction" with the insertion of new text; that's unlikely to be needed. You could have a nightly job for this, or even a weekly one seems reasonable.
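
A rough sketch of that kind of control, in plain Java with no Lucene or Hibernate Search calls. `DictionaryCandidates`, `record`, and `drainAccepted` are hypothetical names, and the threshold of 2 users is just an example value. Terms are recorded cheaply at write time, and only a periodic job promotes the ones seen from enough distinct users.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: tracks which users used which terms, outside any transaction. */
final class DictionaryCandidates {
    private final Map<String, Set<String>> usersByTerm = new HashMap<>();
    private final int minUsers;

    DictionaryCandidates(int minUsers) {
        this.minUsers = minUsers;
    }

    /** Records that a user used a term; terms are compared case-insensitively. */
    synchronized void record(String userId, String term) {
        usersByTerm
            .computeIfAbsent(term.toLowerCase(Locale.ROOT), t -> new HashSet<>())
            .add(userId);
    }

    /** For the periodic job: returns terms that reached the threshold and forgets them. */
    synchronized Set<String> drainAccepted() {
        Set<String> accepted = new TreeSet<>();
        Iterator<Map.Entry<String, Set<String>>> it = usersByTerm.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            if (entry.getValue().size() >= minUsers) {
                accepted.add(entry.getKey());
                it.remove();
            }
        }
        return accepted;
    }
}

final class DictionaryCandidatesDemo {
    public static void main(String[] args) {
        DictionaryCandidates candidates = new DictionaryCandidates(2);
        candidates.record("alice", "Solr");
        candidates.record("bob", "solr");
        candidates.record("alice", "lucnee");  // a typo seen from one user only

        // A nightly or weekly job would call this and feed the result to the spell index.
        System.out.println(candidates.drainAccepted());  // prints [solr]
    }
}
```

The job itself could be driven by a `ScheduledExecutorService` or an external scheduler; terms below the threshold stay in the candidate map until enough users have used them.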

_________________
Sanne
http://in.relation.to/

