
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 15 posts ] 
Author Message
 Post subject: Hibernate Search: projection of document id? (highlighter)
PostPosted: Wed Jan 09, 2008 4:31 pm 
Newbie

Joined: Wed Jan 09, 2008 4:00 pm
Posts: 11
I'm trying to implement search highlighting as shown in the Lucene Highlighter example code using TokenSources.getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer).

It seems like there currently is no way to retrieve the docId from Hibernate Search for use in this call. Are there any plans to project the docId from the hit?

Has anyone else run into this issue and solved it in a different way? Are there any drawbacks to just retrieving the token stream out of the field in the document (by projecting it)?


 
 Post subject:
PostPosted: Wed Jan 09, 2008 7:50 pm 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
Christian uses the highlighter in the Seam wiki (check the example directory of the Seam 2 distro). I suspect he uses plain Lucene.
I am interested in feedback, though, to help build a better HSearch integration, either by exposing the actual docId or by offering something higher-level.

_________________
Emmanuel


 
 Post subject:
PostPosted: Thu Jan 10, 2008 9:55 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Yes, match highlighting would be high on my list of features as well. Is there a feature request for this already?


 
 Post subject:
PostPosted: Thu Jan 10, 2008 10:25 am 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
Nope, go ahead.

_________________
Emmanuel


 
 Post subject:
PostPosted: Thu Jan 10, 2008 2:49 pm 
Newbie

Joined: Wed Jan 09, 2008 4:00 pm
Posts: 11
FYI, Christian's code appears to just reparse the indexed text, rather than trying to get the token stream out of Lucene.

Code:
// Use the same analyzer as the indexer!
TokenStream tokenStream = new StandardAnalyzer().tokenStream(null, new StringReader(indexedText));

String unescapedFragments =
        highlighter.getBestFragments(tokenStream, indexedText, numOfFragments, getFragmentSeparator());


I think projecting the document id would allow use of TokenSources.getAnyTokenStream to get a TokenStream without having to reparse the full text when it is already available in Lucene.


 
 Post subject:
PostPosted: Fri Jan 11, 2008 6:31 am 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
Open a JIRA issue then.
But the fact that TokenSources.getAnyTokenStream takes an analyzer as a parameter tells me that Lucene actually reparses the text as well. So it does no more and no less work than what Christian did.

_________________
Emmanuel


 
 Post subject:
PostPosted: Fri Jan 11, 2008 2:09 pm 
Newbie

Joined: Wed Jan 09, 2008 4:00 pm
Posts: 11
Actually getAnyTokenStream may reparse the document, but only does so if it has to. If it can reconstruct the token stream without reparsing (if the term positions are stored, as I understand it), it does so. I will open a JIRA issue.
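
The "only reparse if it has to" behaviour described here can be pictured with a small stand-alone sketch. This is a simplified model and not Lucene code: the class TokenSourceSketch, its Result type and the position map are hypothetical names made up for illustration, and the "analysis" step is just lowercasing and splitting on non-word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Simplified model only: none of these types are Lucene classes.
class TokenSourceSketch {

    /** Tokens plus a flag recording which path produced them. */
    static final class Result {
        final List<String> tokens;
        final boolean reanalyzed;

        Result(List<String> tokens, boolean reanalyzed) {
            this.tokens = tokens;
            this.reanalyzed = reanalyzed;
        }
    }

    /**
     * @param storedPositions position -> term recorded at index time
     *                        (hypothetical stand-in for stored term positions),
     *                        or null if nothing was stored
     * @param storedText      the stored field value, used only as a fallback
     */
    static Result anyTokenStream(Map<Integer, String> storedPositions, String storedText) {
        if (storedPositions != null) {
            // Cheap path: replay the terms in position order, no analysis needed.
            List<String> tokens =
                    new ArrayList<>(new TreeMap<Integer, String>(storedPositions).values());
            return new Result(tokens, false);
        }
        // Fallback: re-analyze the text (here: lowercase, split on non-word characters).
        List<String> tokens = new ArrayList<>();
        for (String t : storedText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return new Result(tokens, true);
    }

    public static void main(String[] args) {
        Map<Integer, String> positions = new TreeMap<>();
        positions.put(0, "hibernate");
        positions.put(1, "search");
        System.out.println(anyTokenStream(positions, "Hibernate Search").reanalyzed);
        System.out.println(anyTokenStream(null, "Hibernate Search").reanalyzed);
    }
}
```

In this model both paths yield the same tokens; the difference is only whether the text had to be analyzed again, which is the cost the docId projection would help avoid when positions are stored.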


 
 Post subject:
PostPosted: Thu Apr 17, 2008 5:51 am 
Beginner

Joined: Thu Feb 28, 2008 4:58 am
Posts: 37
Hello,
I have search based on HS and it works fine.
I see HS now has a projection for DOCUMENT_ID.
I've tried to use highlighting as described above.
In the debugger I can see that the token stream gets its data correctly, but there is a problem with the score: nothing is highlighted because the scores of the text fragments are always 0 (even though the scores returned by the projection look fine).
My code is below.


Code:
String searchPattern = "text*";
Analyzer analyzer = new StandardAnalyzer();
// The first QueryParser argument is the default field, not the search pattern
QueryParser parser = new QueryParser("name", analyzer);
Query luceneQuery = parser.parse(searchPattern);

Session session = (Session) em.getDelegate();
FullTextSession fts = Search.createFullTextSession(session);
Transaction tx = fts.beginTransaction();
FullTextQuery query = fts.createFullTextQuery(luceneQuery, Person.class);

// Projection order: [0] entity, [1] docId, [2] Lucene Document, [3] score, [4] boost
query.setProjection(FullTextQuery.THIS, FullTextQuery.DOCUMENT_ID, FullTextQuery.DOCUMENT, FullTextQuery.SCORE, FullTextQuery.BOOST);
Collection<Object[]> lista = query.list();

SearchFactory searchFactory = fts.getSearchFactory();

String fragmentSeparator = "...";
Fragmenter fragmenter = new SimpleFragmenter();
int numOfFragments = 5;

// Open a reader on the index that holds Person documents
ReaderProvider readerProvider = searchFactory.getReaderProvider();
DirectoryProvider directoryProvider = searchFactory.getDirectoryProviders(Person.class)[0];
IndexReader reader = readerProvider.openReader(directoryProvider);

// Intended to expand the wildcard query against the index
luceneQuery.rewrite(reader);
Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(luceneQuery));
highlighter.setTextFragmenter(fragmenter);

for (Object[] zlozony : lista) {
  int docId = Integer.parseInt(zlozony[1].toString());
  Document document = (Document) zlozony[2];
  TokenStream tokenStream = TokenSources.getAnyTokenStream(reader, docId, "name", analyzer);
  org.apache.lucene.document.Field field = document.getField("name");
  String highlight = highlighter.getBestFragments(tokenStream, field.stringValue(), numOfFragments, fragmentSeparator);
}
readerProvider.closeReader(reader);

tx.commit();
fts.close(); //session.close();



 
 Post subject:
PostPosted: Thu Apr 17, 2008 8:03 am 
Beginner

Joined: Thu Feb 28, 2008 4:58 am
Posts: 37
I've found the problem: I was using the metacharacter "*" in the query.
Is highlighting supposed to work with metacharacters? (I know it's more of a Lucene question, but Lucene doesn't have a regular newsgroup or forum, only a mailing list.)


 
 Post subject:
PostPosted: Thu Apr 17, 2008 12:46 pm 
Newbie

Joined: Wed Jan 09, 2008 4:00 pm
Posts: 11
I'm not an expert, but as I understand it the luceneQuery.rewrite() is supposed to take care of expanding the terms when you're using wildcards. Are you sure you don't have an Analyzer mismatch between your query and your index? I believe that the Analyzer must match in order for the query to work right.
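
One detail that may matter for the code posted earlier in this thread: in Lucene, Query.rewrite(IndexReader) returns the rewritten query and does not change the query it is called on, so a bare luceneQuery.rewrite(reader); would hand the unexpanded wildcard query to QueryScorer. The stand-alone sketch below models that effect. It is a simplified illustration and not Lucene code: RewriteSketch, its Query class and highlight method are hypothetical, and only trailing-"*" prefix terms are handled.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model only: none of these types are Lucene classes.
class RewriteSketch {

    /** Immutable stand-in for a query: a set of literal or prefix ("foo*") terms. */
    static final class Query {
        final Set<String> terms;

        Query(Collection<String> terms) {
            this.terms = Collections.unmodifiableSet(new LinkedHashSet<>(terms));
        }

        /** Returns a NEW query with prefix terms expanded; this query is unchanged. */
        Query rewrite(Collection<String> indexTerms) {
            Set<String> expanded = new LinkedHashSet<>();
            for (String term : terms) {
                if (term.endsWith("*")) {
                    String prefix = term.substring(0, term.length() - 1);
                    for (String indexTerm : indexTerms) {
                        if (indexTerm.startsWith(prefix)) {
                            expanded.add(indexTerm);
                        }
                    }
                } else {
                    expanded.add(term);
                }
            }
            return new Query(expanded);
        }
    }

    /** Wraps every token that literally equals a query term in <B> tags. */
    static String highlight(Query query, String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            boolean hit = query.terms.contains(token.toLowerCase(Locale.ROOT));
            out.append(hit ? "<B>" + token + "</B>" : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> indexTerms = Arrays.asList("text", "textile", "other");
        Query parsed = new Query(Arrays.asList("text*"));

        parsed.rewrite(indexTerms); // result discarded: "parsed" still holds "text*"
        System.out.println(highlight(parsed, "some textile here"));

        Query rewritten = parsed.rewrite(indexTerms); // keep the returned query
        System.out.println(highlight(rewritten, "some textile here"));
    }
}
```

If the same thing applies to the real code, assigning the result (luceneQuery = luceneQuery.rewrite(reader);) before building the QueryScorer would be the first thing to try.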


 
 Post subject:
PostPosted: Thu Apr 17, 2008 1:13 pm 
Beginner

Joined: Thu Feb 28, 2008 4:58 am
Posts: 37
I use StandardAnalyzer with Polish stop words in both cases (indexing and querying). I will try a simpler analyzer, maybe stopwordAnalyzer?


 
 Post subject:
PostPosted: Sun Jun 01, 2008 11:49 am 
Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205
Is there a new feature request for search highlighting? I recently did a demo using Hibernate Search, and some of my colleagues said that highlighting would be a great feature. I understand you can do it with Lucene natively, but doing it via Hibernate seems like a nice approach too.



Thanks


 
 Post subject:
PostPosted: Tue Jun 03, 2008 5:39 pm 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
If someone shows me a compelling API for it (i.e. much better than the Lucene one), why not :)

_________________
Emmanuel


 
 Post subject: Re:
PostPosted: Tue Feb 09, 2010 5:07 pm 
Newbie

Joined: Mon Feb 08, 2010 3:47 pm
Posts: 3
wiedmann wrote:
I'm not an expert, but as I understand it the luceneQuery.rewrite() is supposed to take care of expanding the terms when you're using wildcards. Are you sure you don't have an Analyzer mismatch between your query and your index? I believe that the Analyzer must match in order for the query to work right.


Well, according to this link, http://www.gossamer-threads.com/lists/l ... -dev/42240, it is necessary. I'm doing everything else correctly, but if I don't rewrite the query after parsing it, highlighting doesn't work.


So, is there some method in Hibernate that calls rewrite automatically? I'd rather not retrieve the IndexReader and perform these operations on it myself. I tried flush(), but it doesn't do any rewrite.

If anyone has news about this issue, please share it with us.


 
 Post subject: Re: Hibernate Search: projection of document id? (highlighter)
PostPosted: Wed Feb 10, 2010 5:30 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hello,
I got this working some time ago, but honestly I don't remember how.
If you could show some test code and point me to what you're expecting, I'd be more than happy to look into it; a real unit test would be great, but some pseudocode will do.

The sources are packed with very simple examples of working unit tests; pick one as an example:
http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/test/java/org/hibernate/search/test/RamDirectoryTest.java?r=18737

_________________
Sanne
http://in.relation.to/


 
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.