
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
 Post subject: Hibernate-Search: Indexing wiki-formatted fields
Posted: Mon Apr 11, 2011 4:31 am
Beginner

Joined: Mon Jul 05, 2004 9:29 am
Posts: 38
Hi,

I'm starting to use Hibernate-Search (with Seam) and would like to know whether somebody has already indexed fields containing seam-text / wiki markup.

I would like the text to be indexed without the formatting instructions because, when a Lucene highlighter is used to display only the matching text, the formatting instructions could be cut in the middle and the (ANTLR) seam-text parser would then fail.

As an example, suppose I have a Book entity whose Description field is "*Java Persistence* _in action_ blah blah blah". If I search for "persistence" and use a highlighter to show the best-matching fragment with a maximum size of 15, it could return "... Persistence* _in..." where the formatting instructions * and _ are no longer valid (and I wouldn't want those instructions shown to the user anyway).
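To make the problem concrete, here is a minimal plain-Java sketch (no Lucene involved) that imitates a highlighter cutting a fixed-size fragment starting at the match. It assumes only the two markers from the example, `*...*` and `_..._`; the class name `SeamTextSnippet` and the regex-based stripping are illustrative assumptions, not part of Seam or Lucene.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: strips the two seam-text
// markers used in the example and cuts a naive fixed-size fragment.
final class SeamTextSnippet {

    // Assumption: only *...* and _..._ spans are handled here;
    // real seam-text has more constructs than these two.
    private static final Pattern STAR = Pattern.compile("\\*([^*]+)\\*");
    private static final Pattern UNDERSCORE = Pattern.compile("_([^_]+)_");

    private SeamTextSnippet() {}

    /** Removes the markers but keeps the text they enclose. */
    static String strip(String wiki) {
        String text = STAR.matcher(wiki).replaceAll("$1");
        return UNDERSCORE.matcher(text).replaceAll("$1");
    }

    /**
     * Crude stand-in for a highlighter: returns at most maxLen characters
     * starting at the first case-insensitive match of term, or "" if absent.
     */
    static String fragment(String text, String term, int maxLen) {
        int start = text.toLowerCase(Locale.ROOT).indexOf(term.toLowerCase(Locale.ROOT));
        if (start < 0) {
            return "";
        }
        return text.substring(start, Math.min(text.length(), start + maxLen));
    }

    public static void main(String[] args) {
        String description = "*Java Persistence* _in action_ blah blah blah";
        // Cutting the raw field leaves dangling markers: "Persistence* _i"
        System.out.println(fragment(description, "persistence", 15));
        // Cutting the stripped field gives clean text: "Persistence in "
        System.out.println(fragment(strip(description), "persistence", 15));
    }
}
```

The regexes are deliberately naive (for instance, they would also eat the underscores in `some_snake_case_name`), which is one reason to rely on the real seam-text parser rather than hand-written patterns.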

I had in mind to subclass SeamTextParser and override some protected method like headline1(String) so that it replaces the generated HTML tags with empty strings, but not every tag has a protected method that could be overridden. Moreover, seam-text can contain HTML tags, and those should ideally be stripped too. This special parser would then be used in a hibernate-search FieldBridge before indexing.

The last solution I came up with would be to create a FieldBridge that converts the wiki text to HTML and then uses CyberNekoHTML to strip all tags... but that seems a bit like overkill to me.
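A minimal sketch of the render-then-strip idea in plain Java. The hypothetical `MarkupStrippingBridge` class has an `objectToString(Object)` method shaped like the one in Hibernate Search's `StringBridge` interface, so the logic could be moved into a FieldBridge, but the sketch itself has no Hibernate Search dependency. The wiki-to-HTML step is passed in as a `Function<String, String>` (an assumption standing in for the seam-text parser), and tag stripping uses a regular expression as a deliberately crude substitute for a real HTML parser such as CyberNekoHTML.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only. In a real project this logic
// would live in a Hibernate Search bridge; here it has no library dependency.
final class MarkupStrippingBridge {

    // Crude tag matcher; a real HTML parser handles comments, scripts,
    // and attribute values containing '>' far better than this regex.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Assumption: stands in for rendering seam-text to HTML.
    private final Function<String, String> wikiToHtml;

    MarkupStrippingBridge(Function<String, String> wikiToHtml) {
        this.wikiToHtml = Objects.requireNonNull(wikiToHtml);
    }

    /** Returns the plain text to index, or null when the value is null. */
    String objectToString(Object value) {
        if (value == null) {
            return null;
        }
        String html = wikiToHtml.apply(value.toString());
        // Replace each tag with a space so words on either side do not merge.
        String text = TAG.matcher(html).replaceAll(" ");
        text = decodeBasicEntities(text);
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // Decodes only a few common entities; &amp; goes last so that
    // "&amp;lt;" becomes the literal text "&lt;" rather than "<".
    private static String decodeBasicEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");
    }
}
```

For example, `new MarkupStrippingBridge(html -> html).objectToString("<p><b>Java Persistence</b> &amp; <i>in action</i></p>")` yields `Java Persistence & in action`. Replacing tags with a space rather than an empty string keeps `<li>one</li><li>two</li>` from being indexed as `onetwo`, at the cost of splitting a word that was marked up mid-word.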

If somebody has another solution, I would be happy to hear it.

Thanks,
Xavier


 Post subject: Re: Hibernate-Search: Indexing wiki-formatted fields
Posted: Mon Apr 11, 2011 4:44 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

I think you are on the right track. Your problem is much more Lucene- and Seam-related than Hibernate Search-related. You will have to decide how to process the markup: do you want to highlight the actual markup text or only the text?
I think the best approach is to index w/o any markup. I don't know anything about the Seam markup parser, though; for that you are probably best off asking on the Seam forum.

--Hardy


 Post subject: Re: Hibernate-Search: Indexing wiki-formatted fields
Posted: Mon Apr 11, 2011 7:32 am
Beginner
Beginner

Joined: Mon Jul 05, 2004 9:29 am
Posts: 38
Thanks for your quick answer!

I think I will trigger the formatting with seam-text then strip all markup with jericho-html all in a FieldBridge before indexing (that way, I'm sure only the relevant text is indexed and highlighted).

By the way, when I look at http://in.relation.to/search_d.seam?query=swing, I can see some wiki instructions/html in the results... Do you think it was intended? If not, I think seam should provide a custom FieldBridge as part of its integration with hibernate-search to remove the markup/instructions.

I hope seam-text or equivalent will find its way in Seam 3, I can't see anything related to text formatting in the seam modules (I know, I know, I should post that in the seam forum =)).

Xavier


 Post subject: Re: Hibernate-Search: Indexing wiki-formatted fields
Posted: Mon Apr 11, 2011 9:02 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Quote:
I think I will trigger the formatting with seam-text then strip all markup with jericho-html all in a FieldBridge before indexing (that way, I'm sure only the relevant text is indexed and highlighted).


Sounds reasonable.

Quote:
By the way, when I look at http://in.relation.to/search_d.seam?query=swing, I can see some wiki instructions/html in the results... Do you think it was intended? If not, I think seam should provide a custom FieldBridge as part of its integration with hibernate-search to remove the markup/instructions.


I don't think so. My guess would be that this is a trade-off on how much work to put into the search functionality.

Quote:
I hope seam-text or equivalent will find its way in Seam 3, I can't see anything related to text formatting in the seam modules (I know, I know, I should post that in the seam forum =)).

Definitely. If you want it to be/stay in Seam you need to make your voice heard there ;-)

--Hardy


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.