Update Index using Lucene

amin-mc · **Joined:** Wed Oct 03, 2007 2:31 pm **Posts:** 205

Hi

I have a requirement to update the index created by HSearch and add some additional fields that are not mapped and cannot be mapped. So basically I was thinking of doing the following, and I was wondering whether it is OK.

Code:

public void addAdditionalFieldsToIndex(Map<String, String> valuesToIndex, Long id) throws Exception{
      if (valuesToIndex == null || valuesToIndex.isEmpty()) {
         return;
      }
      FullTextSession fullTextSession= Search.getFullTextSession(sessionFactory.getCurrentSession());
      SearchFactory searchFactory = fullTextSession.getSearchFactory();
      @SuppressWarnings("unchecked")DirectoryProvider[] providers = searchFactory.getDirectoryProviders(Risk.class);
      ReaderProvider readerProvider = searchFactory.getReaderProvider();
      
      IndexReader indexReader = readerProvider.openReader(providers);
      IndexWriter indexWriter = null;
      Term idTerm= new Term("id", String.valueOf(id));
      try {
         TermDocs termDocs = indexReader.termDocs(idTerm);
         Analyzer analyzer = searchFactory.getAnalyzer(IndexedClass.class);
         indexWriter = new IndexWriter(indexReader.directory(), analyzer, MaxFieldLength.UNLIMITED);
         while (termDocs.next()) {
            Document loadedDoc = indexReader.document(termDocs.doc());
            for(Map.Entry<String, String> entry: valuesToIndex.entrySet()) {
               loadedDoc.add(new Field(entry.getKey(), entry.getValue(), Field.Store.YES, Field.Index.ANALYZED));
            }
            indexWriter.updateDocument(idTerm, loadedDoc);
         }
      } finally {
         if (indexWriter != null) {
            try {
               indexWriter.close();
            } catch (IOException ioex) {
               throw new IllegalStateException(ioex);
            }
         }
         readerProvider.closeReader(indexReader);
         fullTextSession.close();
      }
      
      
   }

Is the above OK to use, or are there any issues I need to be concerned about, given that I am changing the document? We will be using a master/slave configuration; do we have to create a LuceneWork item and place it on the queue?

sanne.grinovero · **Posted:** Thu Sep 17, 2009 5:09 pm

The code itself looks OK, but you're losing a lot of flexibility. This is not going to work in a master/slave configuration, of course, as you shouldn't write to the local index but have to send changes in the form of LuceneWork(s).
Keep in mind that IndexWriter.updateDocument() does a delete followed by an insert, so we usually prefer to map updates that way, as some optimizations can then be applied. Generally speaking, Hibernate Search takes care of most optimizations, but this way you'll have to learn how they work and apply them yourself. Of course, LuceneWork only knows about add and delete, so you'll have to map your updates to adds and deletes anyway.
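
To make the delete-then-add mapping concrete, here is a minimal, library-free Java sketch. The `Work` class, the `Kind` enum, and the in-memory `index` map are hypothetical stand-ins for illustration only; they are not the Hibernate Search LuceneWork classes, and real work items would go through the configured backend rather than a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeleteThenAddSketch {

    // Hypothetical work kinds: only delete and add, no dedicated "update".
    enum Kind { DELETE, ADD }

    // Hypothetical work item; models the idea of a unit of index work.
    static final class Work {
        final Kind kind;
        final String id;
        final Map<String, String> document; // null for DELETE

        Work(Kind kind, String id, Map<String, String> document) {
            this.kind = kind;
            this.id = id;
            this.document = document;
        }
    }

    // Express an update as a delete followed by an add, in that order.
    static List<Work> updateAsDeleteThenAdd(String id, Map<String, String> newDocument) {
        List<Work> queue = new ArrayList<>();
        queue.add(new Work(Kind.DELETE, id, null));
        queue.add(new Work(Kind.ADD, id, new HashMap<>(newDocument)));
        return queue;
    }

    // Apply the queued work, in order, to an in-memory "index" keyed by id.
    static void apply(Map<String, Map<String, String>> index, List<Work> queue) {
        for (Work work : queue) {
            if (work.kind == Kind.DELETE) {
                index.remove(work.id);
            } else {
                index.put(work.id, work.document);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> index = new HashMap<>();
        Map<String, String> oldDoc = new HashMap<>();
        oldDoc.put("title", "old");
        index.put("42", oldDoc);

        Map<String, String> newDoc = new HashMap<>();
        newDoc.put("title", "new");
        apply(index, updateAsDeleteThenAdd("42", newDoc));

        System.out.println(index.get("42").get("title"));
    }
}
```

Because the old document is removed before the new one is added, any field that is not present in the new document disappears from the index entry.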

On the point of flexibility, why don't you use a ClassBridge? You can define all the fields yourself, as you did with the Map<String,String>, but get away from IO concerns, transaction concerns, and a lot of code.

Another way to map fields to the index without mapping them to the database is to combine the @Transient and @Field annotations on the same getter: the getter logic defines how the value for the unmapped field is computed, and you still have all the declarative features to define how it is indexed. This is IMHO much better as it is also future-proof: in the next version you'll be able to use the MassIndexer to rebuild your data, or use the QueryBuilder API, while with your code Hibernate Search can't help you out with nice new features.
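
The getter-based approach can be sketched roughly as follows. So that the snippet compiles on a bare JDK, it declares local stand-in annotations named `Transient` and `Field`; in a real project these would be imported from `javax.persistence` and `org.hibernate.search.annotations` instead. The `RiskEntity` class, its `rating` property, and the tiny reflective `collectIndexedFields` helper are all hypothetical and only mimic the idea that the indexing engine reads annotated getters; this is not how Hibernate Search is implemented.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class TransientFieldSketch {

    // Hypothetical mini "indexer": collects the value of every getter marked @Field.
    static Map<String, String> collectIndexedFields(Object entity) {
        Map<String, String> fields = new TreeMap<>();
        for (Method m : entity.getClass().getMethods()) {
            boolean isGetter = m.getName().startsWith("get") && m.getParameterTypes().length == 0;
            if (isGetter && m.isAnnotationPresent(Field.class)) {
                // getRating -> rating
                String property = Character.toLowerCase(m.getName().charAt(3)) + m.getName().substring(4);
                try {
                    fields.put(property, String.valueOf(m.invoke(entity)));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(collectIndexedFields(new RiskEntity(7L, " aa+ ")));
    }
}

// Local stand-in for javax.persistence.Transient (assumption, see lead-in).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Transient {}

// Local stand-in for org.hibernate.search.annotations.Field (assumption, see lead-in).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Field {}

class RiskEntity {
    private final Long id;
    private final String rawRating; // hypothetical value with no mapped column of its own

    RiskEntity(Long id, String rawRating) {
        this.id = id;
        this.rawRating = rawRating;
    }

    public Long getId() {
        return id;
    }

    // Not persisted (@Transient) but still indexed (@Field):
    // the getter body decides how the unmapped value is produced.
    @Transient
    @Field
    public String getRating() {
        return rawRating == null ? "" : rawRating.trim();
    }
}
```

In this sketch only `rating` is collected, since `getId()` carries no `@Field` marker; the value is whatever the getter computes at indexing time, so it is recomputed on every re-index rather than patched into the index afterwards.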

amin-mc · **Joined:** Wed Oct 03, 2007 2:31 pm **Posts:** 205

Thanks for your reply. I would like to do what you recommend, but unfortunately the model that I am working with makes it almost impossible. Basically, we receive an XML message from our client, which is magically converted into an object that is persisted; however, the data I need is stored somewhere completely different, and it is impossible to link it to the object I am indexing. I have spent a couple of days trying to create a relationship, but with no luck.

So if I want to use the master/slave configuration, I need to do the following:

Code:

  FullTextSession fullTextSession = Search.getFullTextSession(sessionFactory.getCurrentSession());
        DirectoryProvider[] directoryProviders = fullTextSession.getSearchFactory().getDirectoryProviders(IndexedClass.class);
        ReaderProvider readerProvider =  fullTextSession.getSearchFactory().getReaderProvider();
        IndexReader indexReader = readerProvider.openReader(directoryProviders[0]);
        IndexWriter indexWriter = null;
        final List<LuceneWork>  queue = new ArrayList<LuceneWork>();
        try {
            Term t = new Term("id", String.valueOf(id));
            TermDocs termDocs = indexReader.termDocs(t);
            if (termDocs.next()) {
                if (IndexWriter.isLocked(directoryProviders[0].getDirectory())) {
                    IndexWriter.unlock(directoryProviders[0].getDirectory());
                }
                Document docLoaded = indexReader.document(termDocs.doc());
                for(Map.Entry<String, String> entry: values.entrySet()) {
                    docLoaded.add(new Field(entry.getKey(), entry.getValue(), Field.Store.YES, Field.Index.ANALYZED));
                }
                LuceneWork deleteWork = new DeleteLuceneWork(id, id.toString(), IndexedClass.class);
                LuceneWork addWork = new AddLuceneWork(id, id.toString(), IndexedClass.class, docLoaded);
                queue.add(deleteWork);
                queue.add(addWork);
         
                jmsTemplate.send(destination, new MessageCreator() {
                      public Message createMessage(Session session) throws JMSException {
                        ObjectMessage objectMessage = session.createObjectMessage();
                        objectMessage.setObject((Serializable)queue);
                        return objectMessage;
                    }
                });
            }
        } finally {
            readerProvider.closeReader(indexReader);
        }

The above would participate in an existing transaction. Is creating a delete work item and an add work item OK?

Any help is extremely appreciated!

amin-mc · **Joined:** Wed Oct 03, 2007 2:31 pm **Posts:** 205

Just realised the solution may not work. The code handles the case where the object is created. However, if the object is updated, we will lose the additional fields that we added to the Lucene Document. Back to the drawing board.

amin-mc · **Joined:** Wed Oct 03, 2007 2:31 pm **Posts:** 205

Hi

Is it possible to add some logic to the FullTextIndexEventListener to add the additional fields? Could it be extended to get the additional data via SQL and then add it to the document created? I couldn't see anywhere where this was done. Another alternative I was thinking of is adding the fields to the document when we receive the message in the AbstractJMSHibernateSearchController. This would mean hitting the database to get the data, adding it to the document, and then letting super.onMessage do the work.

Is there a way to test this without using JMS?

Cheers
Amin

sanne.grinovero · **Posted:** Fri Sep 18, 2009 5:44 am

Quote:

if (IndexWriter.isLocked(directoryProviders[0].getDirectory())) {
IndexWriter.unlock(directoryProviders[0].getDirectory());
}

Don't do this. Why are you unlocking? The index is in read-only mode anyway when opened by a ReaderProvider, and you don't want to apply changes to it directly, as that would also break the master/slave configuration.

Another problem: you can't just read the local version of the Document, add fields, and then send it back to the master. You might have an out-of-date Document, as the master/slave index copy is asynchronous. Don't trust the current state of the index; trust the database: that is what's really transactional and what you regularly back up.
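
As a rough illustration of "trust the database", the sketch below rebuilds the whole document from values read from the database instead of patching a copy loaded from the local index. Everything in it is hypothetical: `buildDocument`, the `entityRow` and `extraValues` maps, and the field names are placeholders for whatever your persistence layer actually returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RebuildFromDatabaseSketch {

    // Hypothetical: both arguments hold values freshly read from the database
    // in the current transaction, never from a (possibly stale) index copy.
    static Map<String, String> buildDocument(Map<String, String> entityRow,
                                             Map<String, String> extraValues) {
        Map<String, String> document = new LinkedHashMap<>(entityRow);
        document.putAll(extraValues);
        return document;
    }

    public static void main(String[] args) {
        // What the database says right now.
        Map<String, String> entityRow = new LinkedHashMap<>();
        entityRow.put("id", "42");
        entityRow.put("status", "CLOSED");

        // The additional, otherwise unmapped values, also read from the database.
        Map<String, String> extraValues = new LinkedHashMap<>();
        extraValues.put("rating", "AA");

        // A slave's index copy might still say status=OPEN; it is simply never consulted.
        System.out.println(buildDocument(entityRow, extraValues));
    }
}
```

Since the document is rebuilt in full every time, the extra fields are not lost when the entity itself is later updated and re-indexed.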

Quote:

Is it possible to add some logic to the FullTextIndexEventListener to add the additional fields? Could it be extended to get the additional data via sql and then add to the document created? I couldn't see anywhere where it was done.

You can extend FullTextIndexEventListener and register your own instead of the default; still, I don't like this solution.

Could we go back to your first sentence, and elaborate a bit more?

Quote:

I have a requirement to update the index created by HSearch and add some additional fields that are not mapped and cannot be mapped.

Where do the values for these additional fields come from, and why can't you map them?

amin-mc · **Joined:** Wed Oct 03, 2007 2:31 pm **Posts:** 205

Hi

Thanks for your response. Some of the code that I added was experimental, and I agree completely with what you mentioned. After talking to the original developer of the domain model, I managed to find a way to map the fields (using formula, etc.). I now have the fields that I want, and these are a part of the indexed entity.

Apologies for the long-winded thread... our domain model is overly complicated for what we are doing!

Cheers

sanne.grinovero · **Posted:** Fri Sep 18, 2009 6:44 am

Quote:

After spending a few more hours and talking to the developer who originally wrote the domain model (who has left the company as well), we managed to find a solution that works with HibernateSearch. No updates to anything except for the domain model.

Thanks!

Cool! As a final thought, it would have been really weird to have fields coming from nowhere, and also quite hard to present the query results back without having a DTO, which you could have mapped to HS and which was probably good to save in the database for several reasons (records, backup, index rebuilding, data state validation, not to mention debugging).

amin-mc · **Joined:** Wed Oct 03, 2007 2:31 pm **Posts:** 205

You have to see our codebase...:)