Hi
First of all, I would like to apologise for posting this issue here, but unfortunately I have not received any response from the Lucene user mailing list.
Here is what I asked:
--
I have created an RTFHandler which takes an RTF file and creates a Lucene Document, which is then indexed. The RTFHandler looks something like this:
Code:
if (bodyText != null) {
    Document document = new Document();
    Field field = new Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}
I am using Java's built-in RTF text extraction. When I run my test to verify that the document contains the text I expect, it works fine. I get the following when I print the document:
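For reference, the extraction step is roughly the following. This is a minimal sketch, assuming "built-in" means the JDK's javax.swing.text.rtf.RTFEditorKit; the class name RtfTextExtractor and the inline sample RTF are made up for illustration and are not my actual handler.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfTextExtractor {

    // Reads RTF from the stream and returns its plain text content.
    // Note: Document here is the Swing text document, not the Lucene one.
    static String extractText(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input, standing in for rtfDocumentToIndex.rtf.
        String rtf = "{\\rtf1\\ansi This is a test rtf document that will be indexed.\\par Amin Mohammed-Coleman}";
        InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(extractText(in));
    }
}
```

The returned string is what ends up as bodyText in the handler code above.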
Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf document that will be indexed.
Amin Mohammed-Coleman> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<type:RTF_INDEXER> stored/uncompressed,indexed<summary:This is a >>
The problem is that when I use the following to search, I get no results:
Code:
MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();
rtfIndexSearcher is configured with the directory that holds the RTF documents. I have used Luke to look at the index, and the Overview tab shows the following for the document:
1 body test
1 id 1234
1 name rtfDocumentToIndex.rtf
1 path rtfDocumentToIndex.rtf
1 summary This is a
1 type RTF_INDEXER
1 body rtf
However, on the Document tab I am getting the following (in the body field):
This is a test rtf document that will be indexed.
Amin Mohammed-Coleman
I would expect to get a hit using "Amin" or even "document". I am not sure whether the line:
TopDocs topDocs = multiSearcher.search(termQuery, 1);
is incorrect, as I am not sure what "Finds the top n hits for query." means for search(Query query, int n) in the Javadocs.
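My reading of that Javadoc sentence is that n only caps how many of the best-scoring documents are returned, while totalHits still counts every match, so passing 1 should not by itself give a total of 0. The sketch below shows that reading in plain Java. It makes no Lucene calls, and Hit, Result and search here are hypothetical stand-ins rather than the real classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TopHitsSketch {

    // Hypothetical stand-in for one scored match (not a Lucene class).
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Hypothetical stand-in for the search result holder.
    static final class Result {
        final int totalHits;   // number of all matching documents
        final List<Hit> top;   // at most n best-scoring matches
        Result(int totalHits, List<Hit> top) { this.totalHits = totalHits; this.top = top; }
    }

    // Keeps the n highest-scoring matches but reports the full match count.
    static Result search(List<Hit> allMatches, int n) {
        List<Hit> sorted = new ArrayList<>(allMatches);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Hit> top = new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
        return new Result(allMatches.size(), top);
    }

    public static void main(String[] args) {
        List<Hit> matches = Arrays.asList(new Hit(1, 0.2f), new Hit(2, 0.9f), new Hit(3, 0.5f));
        Result r = search(matches, 1);
        System.out.println(r.totalHits);        // prints 3
        System.out.println(r.top.size());       // prints 1
        System.out.println(r.top.get(0).docId); // prints 2
    }
}
```

If that reading is right, a totalHits of 0 means the query itself matched nothing, regardless of n.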
----
It looks as though I cannot use Integer.MAX_VALUE in search(query, n), as that is a known issue which hasn't been resolved. Also, I am using the StandardAnalyzer both to index and to search for results.
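One thing I am wondering about: as far as I understand, StandardAnalyzer lowercases tokens at index time, whereas the text inside a TermQuery is not analysed, so new Term("body", "Amin") may be compared verbatim against lowercased terms. The sketch below illustrates that suspicion in plain Java. It makes no Lucene calls, and the toy analyze method and the set used as an index are hypothetical and much simpler than the real analyzer.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TermCaseSketch {

    // Toy analyzer (assumption): split on non-alphanumeric characters and lowercase.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Exact lookup with no analysis, mimicking how I believe a term query compares terms.
    static boolean exactTermMatch(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("This is a test rtf document that will be indexed.\nAmin Mohammed-Coleman");
        System.out.println(exactTermMatch(indexed, "Amin")); // prints false
        System.out.println(exactTermMatch(indexed, "amin")); // prints true
    }
}
```

If this is what is happening, searching for "amin" in lowercase, or building the query through the same analyzer, might return the hit, although it would not explain why "document" does not match.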
Once again, I do apologise for posting this here!
Cheers
Amin