-->

All times are UTC - 5 hours [ DST ]



 Post subject: Search issues
PostPosted: Fri Jan 02, 2009 4:14 pm 
Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205
Hi

First of all, I would like to apologise for posting this issue here, but unfortunately I am not getting any response from the Lucene user mailing list.

Here is what I asked:

--
I have created an RTFHandler which takes an RTF file and creates a Lucene Document, which is then indexed. The RTFHandler looks something like this:

Code:
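if (bodyText != null) {
    // Build a Lucene Document with the extracted RTF text in the body field.
    Document document = new Document();
    // Store the trimmed text and run it through the analyzer at index time.
    Field field = new Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}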


I am using Java's built-in RTF text extraction. When I run my test to verify that the document contains the text I expect, it works fine. I get the following when I print the document:

Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf document that will be indexed.

Amin Mohammed-Coleman> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<type:RTF_INDEXER> stored/uncompressed,indexed<summary:This is a >>


The problem is that when I use the following to search, I get no results:

Code:
MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
// A TermQuery matches the exact term as given; the text is not analyzed.
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
// Ask for the single best-scoring hit.
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();



rtfIndexSearcher is configured with the directory that holds the RTF documents. I have used Luke to look at the document, and the Overview tab shows the following terms for it:

1 body test
1 id 1234
1 name rtfDocumentToIndex.rtf
1 path rtfDocumentToIndex.rtf
1 summary This is a
1 type RTF_INDEXER
1 body rtf


However, on the Document tab I get the following (in the body field):

This is a test rtf document that will be indexed.

Amin Mohammed-Coleman


I would expect to get a hit using "Amin" or even "document". I am not sure whether the line:
TopDocs topDocs = multiSearcher.search(termQuery, 1);

is incorrect, as I am not too sure what "Finds the top n hits for query." means for search(Query query, int n) in the Javadocs.

----

It looks as though I cannot use Integer.MAX_VALUE in search(query, n), as this is a known issue which hasn't been resolved. Also, I am using the StandardAnalyzer in order to index and search for results.
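
On the meaning of n in search(Query query, int n): as far as I understand the API, n only caps how many hits are returned, while totalHits still counts every matching document, so passing 1 should not by itself produce a count of 0. The sketch below is a toy model in plain Java, not Lucene code; `TopNSketch`, `Hit`, and `Result` are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only, NOT Lucene code: imitates the "top n hits" contract.
class TopNSketch {

    // Hypothetical stand-in for one scored match.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Hypothetical stand-in for the object returned by a search.
    static final class Result {
        final int totalHits;   // every match, regardless of n
        final List<Hit> top;   // at most n matches, best score first
        Result(int totalHits, List<Hit> top) {
            this.totalHits = totalHits;
            this.top = top;
        }
    }

    static Result search(List<Hit> allMatches, int n) {
        // Sort a copy by descending score, then keep only the first n.
        List<Hit> sorted = new ArrayList<>(allMatches);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int keep = Math.min(n, sorted.size());
        return new Result(allMatches.size(), new ArrayList<>(sorted.subList(0, keep)));
    }

    public static void main(String[] args) {
        List<Hit> matches = new ArrayList<>();
        matches.add(new Hit(1, 0.2f));
        matches.add(new Hit(2, 0.9f));
        matches.add(new Hit(3, 0.5f));
        Result r = search(matches, 1);
        System.out.println(r.totalHits + " matches, returned " + r.top.size());
    }
}
```

In this model a count of 0 can only come from an empty match list, which points at what was indexed rather than at the value of n.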


Once again, I do apologise for posting this here!

Cheers
Amin


 Post subject: Search Issues
PostPosted: Sun Jan 04, 2009 6:53 am 
Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205
I managed to get the issue resolved. It turns out that I was using the MaxFieldLength inner class with a limit of 2, which meant that only a maximum of 2 terms per field were being indexed.
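
For anyone who lands on this thread later, the effect of that limit can be reproduced with a toy model. The sketch below is plain Java and not Lucene code: `FieldLengthSketch`, `indexedTerms`, and the abbreviated stop-word list are hypothetical names made up for illustration. It only assumes that the analyzer lowercases tokens and drops common stop words before the per-field limit is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only, NOT Lucene code: imitates an analyzer that lowercases
// tokens and drops stop words, then keeps at most maxFieldLength terms.
class FieldLengthSketch {

    // Hypothetical, abbreviated stop-word list chosen for this example.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("this", "is", "a", "that", "will", "be"));

    static List<String> indexedTerms(String text, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        // Split on anything that is not a letter or digit.
        for (String raw : text.split("[^A-Za-z0-9]+")) {
            if (terms.size() >= maxFieldLength) {
                break; // limit reached: the rest of the field is ignored
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(token);
        }
        return terms;
    }

    public static void main(String[] args) {
        String body = "This is a test rtf document that will be indexed.\n\nAmin Mohammed-Coleman";
        // With a limit of 2, only the first two surviving terms are kept.
        System.out.println(indexedTerms(body, 2));
        // Without a practical limit, the later terms are kept as well.
        System.out.println(indexedTerms(body, Integer.MAX_VALUE));
    }
}
```

With a limit of 2 the model keeps only test and rtf, which matches the two body terms Luke showed in the first post. The lowercasing step also suggests searching for "amin" rather than "Amin" when using a TermQuery, since a TermQuery does not run the analyzer over the search term.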

