Hibernate Community • View topic

All times are UTC - 5 hours [ DST ]

Search issues

amin-mc

Post subject: Search issues

Posted: Fri Jan 02, 2009 4:14 pm


Hi

First of all, I would like to apologise for posting this issue here, but unfortunately I am not getting any response or help from the Lucene user mailing list.

Here is what I asked:

--
I have created an RTFHandler which takes an RTF file and creates a Lucene Document, which is then indexed. The RTFHandler looks something like this:

Code:

if (bodyText != null) {
    // Build a Lucene Document whose body field is both stored and analyzed (tokenized).
    Document document = new Document();
    Field field = new Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}

I am using Java's built-in RTF text extraction. When I run my test to verify that the document contains the text I expect, it works fine. I get the following when I print the document:

Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf document that will be indexed.

Amin Mohammed-Coleman> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<type:RTF_INDEXER> stored/uncompressed,indexed<summary:This is a >>
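For readers unfamiliar with the built-in extraction mentioned above, here is a minimal sketch of how plain text can be pulled out of RTF with the JDK's javax.swing.text.rtf.RTFEditorKit. The class name RtfTextExtractor and the sample RTF string are hypothetical and are not taken from the RTFHandler in this post; the actual handler may well differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper for illustration; not the RTFHandler from the post.
class RtfTextExtractor {

    // Parses RTF from the stream and returns the plain text it contains.
    static String extractText(InputStream rtf) {
        RTFEditorKit kit = new RTFEditorKit();
        DefaultStyledDocument styled = new DefaultStyledDocument();
        try {
            // Read the RTF content into the styled document, starting at offset 0.
            kit.read(rtf, styled, 0);
            return styled.getText(0, styled.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException("Could not read parsed RTF text", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input, loosely modelled on the test document described above.
        String rtf = "{\\rtf1\\ansi This is a test rtf document.\\par Amin Mohammed-Coleman}";
        InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(extractText(in).trim());
    }
}
```

The returned string is what would be passed as bodyText when building the Lucene field.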

The problem is that when I use the following to search, I get no results:

Code:

MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();

rtfIndexSearcher is configured with the directory that holds the RTF documents. I have used Luke to look at the document, and the Overview tab shows the following for it:

1 body test
1 id 1234
1 name rtfDocumentToIndex.rtf
1 path rtfDocumentToIndex.rtf
1 summary This is a
1 type RTF_INDEXER
1 body rtf

However, on the Document tab I am getting (in the body field):

This is a test rtf document that will be indexed.

Amin Mohammed-Coleman

I would expect to get a hit using "Amin" or even "document". I am not sure whether the line:
TopDocs topDocs = multiSearcher.search(termQuery, 1);

is incorrect, as I am not sure what "Finds the top n hits for query." means for search(Query query, int n) in the Javadocs.
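On the meaning of n: in Lucene, search(query, n) returns at most the n highest-scoring documents, while TopDocs.totalHits reports how many documents matched in total. So n = 1 limits how many hits come back but should not by itself make totalHits zero. The toy sketch below illustrates that distinction with plain Java collections; TopNSketch and Result are hypothetical names, and this is not Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy illustration of "top n hits" semantics; names are hypothetical, not Lucene API.
class TopNSketch {

    // Stand-in for a search result: total match count plus the best n scores.
    static final class Result {
        final int totalHits;
        final List<Double> topScores;

        Result(int totalHits, List<Double> topScores) {
            this.totalHits = totalHits;
            this.topScores = topScores;
        }
    }

    // Keeps only the n best scores but still counts every match.
    static Result search(List<Double> matchingScores, int n) {
        List<Double> sorted = new ArrayList<>(matchingScores);
        sorted.sort(Comparator.reverseOrder());
        int kept = Math.min(n, sorted.size());
        return new Result(sorted.size(), new ArrayList<>(sorted.subList(0, kept)));
    }

    public static void main(String[] args) {
        Result result = search(Arrays.asList(0.2, 0.9, 0.5), 1);
        System.out.println("total matches: " + result.totalHits);
        System.out.println("returned hits: " + result.topScores.size());
    }
}
```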

----

It looks as though I cannot use Integer.MAX_VALUE in search(query, n), as it is a known issue which hasn't been resolved. Also, I am using the StandardAnalyzer in order to index and search for results.
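One thing worth checking, offered as an assumption rather than a diagnosis confirmed in this thread: StandardAnalyzer lowercases tokens at index time, whereas a TermQuery is not passed through any analyzer, so new Term("body", "Amin") looks for the exact term "Amin". The toy sketch below mimics that mismatch with a simplified lowercasing tokenizer; AnalyzedTermLookupSketch and its methods are hypothetical and only approximate what a real analyzer does. Note that this would not explain a miss on "document", which is already lowercase.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of an analyzed field versus an un-analyzed term lookup; not Lucene code.
class AnalyzedTermLookupSketch {

    // Simplified analyzer: split on non-alphanumeric characters and lowercase each token.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Exact lookup with no analysis applied to the query term.
    static boolean exactTermMatch(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> body = analyze("This is a test rtf document that will be indexed.\n\nAmin Mohammed-Coleman");
        System.out.println("Amin: " + exactTermMatch(body, "Amin"));
        System.out.println("amin: " + exactTermMatch(body, "amin"));
    }
}
```

If this turns out to be the cause, searching for the lowercase term, or building the query with a query parser that uses the same analyzer as indexing, would be the usual ways to line the two sides up.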

Once again, I do apologise for posting this here!

Cheers
Amin


amin-mc

Post subject: Search Issues

Posted: Sun Jan 04, 2009 6:53 am
