All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 5 posts ] 
 Post subject: Hibernate search: capture unique stemmed words in document
Posted: Sat Jan 31, 2009 7:45 pm
Newbie

Joined: Sat Jan 31, 2009 7:39 pm
Posts: 5
Hello

I am using the latest version of Hibernate Search (3.1.0.GA). It was very easy to get running. Thanks for a great product.


For each indexed document, I need to capture the documentId and the list of unique, stemmed words. Proximity is not important.

Is there a combination of custom analyzer and/or PostInsertEventListener that will work for this?


Thanks,

-brian


 Post subject:
Posted: Sun Feb 01, 2009 6:00 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, glad you like it.
Could you elaborate on your requirement?
What do you mean by "capture"? Do you want to store the Lucene Document somewhere before it gets written to the index?

_________________
Sanne
http://in.relation.to/


 Post subject:
Posted: Sun Feb 01, 2009 11:44 am
Newbie

Joined: Sat Jan 31, 2009 7:39 pm
Posts: 5
Given:

Code:
public class Post {
   @Id
   @GeneratedValue(strategy = GenerationType.AUTO)
   @DocumentId
   private Long articleId;

   @Column(columnDefinition = "MEDIUMBLOB")
   @Field(index = Index.TOKENIZED, store = Store.NO)
   private String cleanedContent;

   // ........
}

public class GenericDAO<T, PK extends Serializable> extends HibernateDaoSupport {

   public PK create(T o) {
      Serializable pk = getHibernateTemplate().save(o);
      return (PK) pk;
   }
}




I have several use cases that require the list of unique, stemmed words from Post.cleanedContent. Ideally we would fire off these use cases based on the insert into the index.
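To make the requirement concrete, here is a minimal sketch of the hand-off being asked for: one document ID paired with its set of unique stems, delivered to whichever use cases are interested. Every name in it (StemmedWordsPublisher, addListener, publish) is hypothetical and is not part of Hibernate Search or Lucene; it only illustrates the shape of the data, not where the stems come from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical hand-off point: none of these names come from Hibernate Search or Lucene.
final class StemmedWordsPublisher {
    // Each listener stands for one downstream use case.
    private final List<BiConsumer<Long, Set<String>>> listeners = new ArrayList<>();

    void addListener(BiConsumer<Long, Set<String>> listener) {
        listeners.add(listener);
    }

    // Deduplicates the stems of one document (keeping first-seen order)
    // and passes the result to every registered listener.
    Set<String> publish(long documentId, Iterable<String> stems) {
        Set<String> unique = new LinkedHashSet<>();
        for (String stem : stems) {
            unique.add(stem);
        }
        Set<String> readOnly = Collections.unmodifiableSet(unique);
        for (BiConsumer<Long, Set<String>> listener : listeners) {
            listener.accept(documentId, readOnly);
        }
        return readOnly;
    }
}
```

A use case would register itself once with addListener and then receive (articleId, unique stems) pairs; the open question in this thread is which part of the indexing pipeline could call publish.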

My challenge is in understanding the API and determining a place to insert our code.

It seems that @AnalyzerDef / @TokenizerDef / @TokenFilterDef might be one place. However, a TokenFilter does not seem to know the @DocumentId of the entity whose field it is processing.

Another place might be FullTextIndexEventListener.onPostInsert. However, it isn't clear to me what org.hibernate.event.PostInsertEvent contains regarding tokens. Perhaps PostInsertEvent.getState() contains the tokens?

I'd really prefer not to repeat the tokenizing and stemming that Hibernate Search already does.







 Post subject:
Posted: Tue Feb 03, 2009 11:00 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

I am still not quite sure what you want to achieve. Are you looking for a way to implement word stemming for your cleanedContent field, or do you actually need the generated (and potentially stemmed) tokens from the indexing process in a separate part of your application?

For the former, @AnalyzerDef together with an appropriate chain of tokenizers and token filters will work. For the latter, you might want to implement a custom analyzer, which gives you access to the indexing process. You could, for example, intercept the tokens from the indexing process and do whatever you want with them. You can specify a custom analyzer as part of the @Field annotation.
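The interception idea can be sketched as a pass-through decorator: it hands every token on unchanged and remembers each distinct one it sees. The sketch below is plain Java and deliberately avoids the Lucene API; TermSource is a hypothetical stand-in for a token stream (it returns null when exhausted), and CapturingTermSource and TermSources are likewise invented for illustration. In a real custom analyzer the same pattern would presumably live in a Lucene TokenFilter placed at the end of the filter chain, after stemming.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a token stream; NOT Lucene's TokenStream API.
// Returns the next term, or null once the stream is exhausted.
interface TermSource {
    String next();
}

// Decorator that forwards every term unchanged while recording the unique ones.
final class CapturingTermSource implements TermSource {
    private final TermSource delegate;
    private final Set<String> seen = new LinkedHashSet<>();

    CapturingTermSource(TermSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public String next() {
        String term = delegate.next();
        if (term != null) {
            seen.add(term); // a Set silently ignores repeats
        }
        return term;
    }

    // Snapshot of the unique terms seen so far, in first-seen order.
    Set<String> uniqueTerms() {
        return new LinkedHashSet<>(seen);
    }
}

// Small helper that builds a TermSource from fixed terms, for trying the decorator out.
final class TermSources {
    private TermSources() {
    }

    static TermSource of(String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

Wrapping TermSources.of("run", "fast", "run") and draining it with next() leaves uniqueTerms() holding run and fast. Note that the decorator only sees the field's tokens; it has no knowledge of the owning entity's ID, so pairing tokens with a document ID would still have to happen elsewhere.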

--Hardy


 Post subject:
Posted: Tue Feb 03, 2009 11:30 am
Newbie

Joined: Sat Jan 31, 2009 7:39 pm
Posts: 5
>actually need the generated (and potentially stemmed tokens) from the indexing process in a separate part of your application?

Yes, exactly. I'll look at creating the custom analyzer.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.