
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 6 posts ] 
 Post subject: Urgent Plz. Infinispan Caches Write Behind-Disaster Recovery
PostPosted: Thu Feb 16, 2017 9:27 am 
Beginner

Joined: Sun Aug 16, 2015 3:21 am
Posts: 27
Hi Guys,

I need urgent help on this please...

We are performance testing Hibernate Search 5.4.0 with Infinispan 7.2. At the moment Infinispan is configured as a local cache with a cache loader backed by a relational database, using the write-behind strategy.

We had a system failure last night - reason: the DB ran out of space! This disaster resulted in lots of entries not being written to the DB (please see the failure exceptions at the end of this post). When we restarted our application, it failed with this exception:
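For reference, our write-behind setup looks roughly like this - a sketch, not our exact config: the datasource JNDI name, table prefix and column types are illustrative, and the same store element is repeated for the LuceneIndexesData and LuceneIndexesLocking caches.

```xml
<local-cache name="LuceneIndexesMetadata">
    <persistence passivation="false">
        <string-keyed-jdbc-store xmlns="urn:infinispan:config:store:jdbc:7.2"
                                 preload="true" purge="false">
            <data-source jndi-url="java:jboss/datasources/IndexDS"/>
            <string-keyed-table prefix="ISPN_INDEX">
                <id-column name="ID_COLUMN" type="VARCHAR(255)"/>
                <data-column name="DATA_COLUMN" type="BLOB"/>
                <timestamp-column name="TIMESTAMP_COLUMN" type="BIGINT"/>
            </string-keyed-table>
            <!-- Removing this element switches the store to write-through -->
            <write-behind modification-queue-size="1024" thread-pool-size="5"/>
        </string-keyed-jdbc-store>
    </persistence>
</local-cache>
```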

Quote:
Caused by: org.hibernate.search.exception.SearchException: HSEARCH000103: Unable to initialize IndexManager named 'com.*.*.OurIndexedObject'
at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:260)
at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:513)
at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManagers(IndexManagerHolder.java:482)
at org.hibernate.search.indexes.impl.IndexManagerHolder.buildEntityIndexBinding(IndexManagerHolder.java:91)
at org.hibernate.search.spi.SearchIntegratorBuilder.initDocumentBuilders(SearchIntegratorBuilder.java:358)
at org.hibernate.search.spi.SearchIntegratorBuilder.buildNewSearchFactory(SearchIntegratorBuilder.java:199)
at org.hibernate.search.spi.SearchIntegratorBuilder.buildSearchIntegrator(SearchIntegratorBuilder.java:117)
at org.hibernate.search.hcore.impl.HibernateSearchSessionFactoryObserver.sessionFactoryCreated(HibernateSearchSessionFactoryObserver.java:73)
at org.hibernate.internal.SessionFactoryObserverChain.sessionFactoryCreated(SessionFactoryObserverChain.java:35)
at org.hibernate.internal.SessionFactoryImpl.<init>(SessionFactoryImpl.java:541)
at org.hibernate.boot.internal.SessionFactoryBuilderImpl.build(SessionFactoryBuilderImpl.java:444)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.build(EntityManagerFactoryBuilderImpl.java:802)
... 57 more
Caused by: org.hibernate.search.exception.SearchException: Unable to open Lucene IndexReader for IndexManager com.*.*.OurIndexedObject
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.createReader(SharingBufferReaderProvider.java:113)
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.initialize(SharingBufferReaderProvider.java:91)
at org.hibernate.search.indexes.impl.PropertiesParseHelper.createDirectoryBasedReaderProvider(PropertiesParseHelper.java:172)
at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.createIndexReader(DirectoryBasedIndexManager.java:224)
at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.initialize(DirectoryBasedIndexManager.java:109)
at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:256)
... 68 more
Caused by: java.io.IOException: Read past EOF
at org.infinispan.lucene.impl.SingleChunkIndexInput.readByte(SingleChunkIndexInput.java:54)
at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:98)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:57)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:923)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:53)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:67)
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.readerFactory(SharingBufferReaderProvider.java:131)
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.<init>(SharingBufferReaderProvider.java:206)
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.createReader(SharingBufferReaderProvider.java:108)


Quote:
My question: we are dealing with multiple memory buffers here:
1. The Lucene memory buffer - multiple writes are buffered before being sent to persistence.
2. In our case Infinispan is the persistence layer behind Lucene, and we currently use the write-behind strategy. That means the in-memory state is not the same as the DB - in a failure like the one above, it can be well behind, by seconds.

I am now testing the write-through option, to make the memory and DB writes happen in the same transaction. But will both caches (LuceneIndexesMetadata and LuceneIndexesData) be updated in the same transaction? If not, we might end up with the metadata cache saying we have X segments while the data cache has no clue about them, if the failure happens right after the metadata cache was updated successfully.

1. What sort of disaster recovery can we do? Rebuilding all indexes for large databases is not a good idea, as it can take hours.
2. Losing data for a certain period (the time the DB was down) might be acceptable to the business since it can be restored, but having the whole system down because of that - not even able to search other data - won't be acceptable...
Quote:
This is because we have a way to mass index part of the data with the help of DB views.

3. How much trust can we put in Hibernate Search from a DR perspective?

Please do share your thoughts, as I am out of ideas for addressing this issue. Thanks.



Failure exceptions during UPDATEs - DB space full

Quote:
03:42:44,762 ERROR AsyncStoreProcessor-LuceneIndexesMetadata-5 org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore - ISPN008024: Error while storing string key to database; key: '_3r1y.cfe|M|com.*.*.OurIndexedObject'
com.ibm.db2.jcc.am.SqlException: The file system is full.. SQLCODE=-968, SQLSTATE=57011, DRIVER=3.69.49
at com.ibm.db2.jcc.am.gd.a(Unknown Source)
at com.ibm.db2.jcc.am.gd.a(Unknown Source)
at com.ibm.db2.jcc.am.gd.a(Unknown Source)
at com.ibm.db2.jcc.am.yo.b(Unknown Source)
at com.ibm.db2.jcc.am.yo.c(Unknown Source)
at com.ibm.db2.jcc.t4.bb.l(Unknown Source)
at com.ibm.db2.jcc.t4.bb.a(Unknown Source)
at com.ibm.db2.jcc.t4.p.a(Unknown Source)
at com.ibm.db2.jcc.t4.wb.b(Unknown Source)
at com.ibm.db2.jcc.am.zo.qc(Unknown Source)
at com.ibm.db2.jcc.am.zo.b(Unknown Source)
at com.ibm.db2.jcc.am.zo.ic(Unknown Source)
at com.ibm.db2.jcc.am.zo.executeUpdate(Unknown Source)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecuteUpdate(WSJdbcPreparedStatement.java:1187)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeUpdate(WSJdbcPreparedStatement.java:804)
at org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore.write(JdbcStringBasedStore.java:174)
at org.infinispan.persistence.async.AsyncCacheWriter.applyModificationsSync(AsyncCacheWriter.java:154)
at org.infinispan.persistence.async.AsyncCacheWriter$AsyncStoreProcessor.retryWork(AsyncCacheWriter.java:329)
at org.infinispan.persistence.async.AsyncCacheWriter$AsyncStoreProcessor.run(AsyncCacheWriter.java:313)

03:42:44,888 ERROR AsyncStoreProcessor-LuceneIndexesData-4 org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore - ISPN008024: Error while storing string key to database; key: '_3r20.si|0|1048576|com.*.*.OurIndexedObject'
com.ibm.db2.jcc.am.SqlException: The file system is full.. SQLCODE=-968, SQLSTATE=57011, DRIVER=3.69.49
at com.ibm.db2.jcc.am.gd.a(Unknown Source)
at com.ibm.db2.jcc.am.gd.a(Unknown Source)
at com.ibm.db2.jcc.am.gd.a(Unknown Source)
at com.ibm.db2.jcc.am.yo.b(Unknown Source)
at com.ibm.db2.jcc.am.yo.c(Unknown Source)
at com.ibm.db2.jcc.t4.bb.l(Unknown Source)
at com.ibm.db2.jcc.t4.bb.a(Unknown Source)
at com.ibm.db2.jcc.t4.p.a(Unknown Source)
at com.ibm.db2.jcc.t4.wb.b(Unknown Source)
at com.ibm.db2.jcc.am.zo.qc(Unknown Source)
at com.ibm.db2.jcc.am.zo.b(Unknown Source)
at com.ibm.db2.jcc.am.zo.ic(Unknown Source)
at com.ibm.db2.jcc.am.zo.executeUpdate(Unknown Source)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecuteUpdate(WSJdbcPreparedStatement.java:1187)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeUpdate(WSJdbcPreparedStatement.java:804)
at org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore.write(JdbcStringBasedStore.java:174)


 Post subject: Re: Urgent Plz. Infinispan Caches Write Behind-Disaster Recovery
PostPosted: Thu Feb 16, 2017 11:33 am 
Beginner

Joined: Sun Aug 16, 2015 3:21 am
Posts: 27
Hibernate Team: please reply! (Sanne, are you there? Sorry for calling you out directly :))


 Post subject: Re: Urgent Plz. Infinispan Caches Write Behind-Disaster Recovery
PostPosted: Thu Feb 16, 2017 12:33 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hello,

using the "write-behind" strategy implies that some data might be lost in scenarios like this one. There is nothing we can do about that: if it's not acceptable, the slower but more reliable "write-through" option should be used.

The exception suggests that some key section of the index was lost, compromising its integrity. In other words, your index is corrupted and needs to be rebuilt from scratch using the MassIndexer.

The best disaster recovery strategy is to make sure you have tested and tuned the MassIndexer so that you can run it within a decent timeframe. That's why it has many tuning options, and also exposes various metrics (you can plug in a custom org.hibernate.search.batchindexing.MassIndexerProgressMonitor to collect many details) which should help with tuning its performance.
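A minimal sketch of such a tuned MassIndexer run - the tuning values are only illustrative starting points, and you'll want to benchmark your own:

```java
import javax.persistence.EntityManager;

import org.hibernate.CacheMode;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

public class Reindexer {

    /** Rebuilds the index for one entity type; blocks until done. */
    public static void rebuildIndex(EntityManager em) throws InterruptedException {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
        ftem.createIndexer(OurIndexedObject.class) // your indexed entity (package elided)
            .batchSizeToLoadObjects(25)   // entities loaded per batch
            .idFetchSize(150)             // JDBC fetch size for the id scroll
            .threadsToLoadObjects(8)      // parallel entity-loading threads
            .cacheMode(CacheMode.IGNORE)  // don't pollute the second-level cache
            .startAndWait();
    }
}
```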

Quote:
But will both caches (LuceneIndexesMetadata and LuceneIndexesData) be updated in the same transaction?

It's not using transactions, unless you enabled a transactional Infinispan Cache. I wouldn't recommend that: index writing would become considerably slower, and a transaction wouldn't have helped in this case anyway, as it doesn't prevent data loss when using write-behind.

Quote:
We are performance testing Hibernate Search 5.4.0 with Infinispan 7.2

Since you seem interested in performance, I would recommend upgrading to the latest versions: as usual we made lots of performance improvements, and Infinispan 8.2.x has a lot of bugfixes as well.

HTH

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Urgent Plz. Infinispan Caches Write Behind-Disaster Recovery
PostPosted: Thu Feb 16, 2017 4:40 pm 
Beginner

Joined: Sun Aug 16, 2015 3:21 am
Posts: 27
Thank you very much for the reply. The latest version depends on Java 8, and that is a blocker for us.

We are using write-through now, hoping that in such cases (I mean in case of a disaster) we will not lose data or end up with corrupted indexes.


A few more concerns, please (I was looking through the Infinispan and Lucene code today to figure out the points below):


Quote:
1. Will the write-through option prevent inconsistencies between the metadata and data caches? I noticed the data cache is always updated first in the cache store, followed by the metadata updates, BUT it seems they are updated in two separate contexts. Does this mean that if, say, the data cache is updated, then the DB goes down and the metadata updates are lost, we will still end up with index corruption?


Quote:
2. Where does the Lucene memory buffer (hibernate.search.default.indexwriter.ram_buffer_size) kick in? I mean, when we call FullTextEntityManager.index(object) it asks Lucene to update the indexes, and I am hoping Lucene hands this change to Infinispan in the same transaction... If that is not the case, will the Lucene memory buffer contain data that has not been shared with Infinispan?
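For context, the only related tuning we have is the stock property - the value shown here is illustrative, and as far as I understand it is the buffer size in MB before Lucene flushes a segment:

```properties
# Size (in MB) of the Lucene IndexWriter RAM buffer before a flush (illustrative value)
hibernate.search.default.indexwriter.ram_buffer_size = 64
```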


 Post subject: Re: Urgent Plz. Infinispan Caches Write Behind-Disaster Recovery
PostPosted: Thu Feb 16, 2017 7:15 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Regarding question 1:

Yes, the two writes to data & metadata are fully independent, yet if one of them is lost the index is likely corrupted. However, remember that Infinispan is a highly available data grid; the DB storage is just the last stage of storage. If you have Infinispan configured for replication (or distribution), a failure to write to the database is not a problem, as the entries are first and foremost safely stored on the other nodes. In case of a node crash during a DB sync, for example, another node will take over, and when a read of that specific entry is needed it will be served from there.

In your specific case you're running with local caches, so replication is not available, but the local memory still is. If you had not restarted the application, the index state might have survived, as there was a functional replica within the Infinispan data container.
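For illustration, a replicated setup would look roughly like this (a sketch: the container, stack and cluster names are placeholders, and you would keep your JDBC store configured inside each cache):

```xml
<cache-container name="HibernateSearch" default-cache="default">
    <transport stack="udp" cluster="hsearch-cluster"/>
    <replicated-cache name="LuceneIndexesMetadata" mode="SYNC"/>
    <replicated-cache name="LuceneIndexesData" mode="SYNC"/>
    <replicated-cache name="LuceneIndexesLocking" mode="SYNC"/>
</cache-container>
```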

Regarding the order of stores: bear in mind that - when you use write-behind - the actual writes of the entries to the CacheStore happen out of order. There is no guarantee about which entries will be written first; the only guarantees you have are that it won't overwrite an entry with an older one, nor delete an entry which should not be deleted.
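To illustrate the idea (a toy sketch, not Infinispan code): a write-behind queue coalesces pending modifications per key, so the store only ever sees the newest value for a key - intermediate writes may never happen at all:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch of write-behind coalescing: pending modifications are
 * kept per key, newest wins, and the async flush drains the queue.
 */
public class WriteBehindSketch {
    private final Map<String, String> queue = new LinkedHashMap<>(); // pending modifications
    private final Map<String, String> store = new LinkedHashMap<>(); // the "database"
    private int writes = 0;

    /** Caller-facing write: replaces any older pending value for this key. */
    public void put(String key, String value) {
        queue.put(key, value);
    }

    /** Simulates the async store thread draining the queue to the DB. */
    public void flush() {
        for (Map.Entry<String, String> e : queue.entrySet()) {
            store.put(e.getKey(), e.getValue());
            writes++;
        }
        queue.clear();
    }

    public String stored(String key) { return store.get(key); }
    public int writeCount() { return writes; }
}
```

Two rapid updates to the same key result in a single DB write carrying the latest value; the intermediate state never reaches the store, and a crash before flush() loses both.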

Question 2:
The IndexWriter buffer sits before the flushing of indexing operations into the index store - so before reaching Infinispan. In terms of source code, this means it's within the org.apache.lucene codebase. Flush operations are controlled by Hibernate Search, depending on your configuration: typically it will flush after each commit of your application's transaction. If you use NRT it flushes less frequently - that's why the documentation says:
Quote:
As a trade-off it requires a non-clustered and non-shared index.
If it doesn't flush, the changeset of the transaction isn't visible to other nodes. (When using NRT the buffers are still usable for local queries, but only for queries running in the same JVM - with a direct reference to those buffers.)

Another case in which flushing is disabled is during MassIndexing: flushing would slow it down, and the goal of the MassIndexer is to give you a tool to rebuild the index at maximum speed. It will flush occasionally - whenever the buffers are full - and once more at the end of the process.

You mentioned invoking FullTextEntityManager.index(object) directly. Why do you use this method? Most users let indexing happen automatically.
To answer your question: when you invoke that method it will also flush the IndexWriter - at the end of the current transaction if there is one, or immediately if there is no active transaction.
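In code, a manual indexing call looks like this (a sketch: the entity lookup and id are placeholders):

```java
import javax.persistence.EntityManager;

import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

// Inside your application code, with an open EntityManager `em`:
FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
em.getTransaction().begin();
OurIndexedObject entity = em.find(OurIndexedObject.class, someId); // hypothetical lookup
ftem.index(entity);           // indexing work is queued...
em.getTransaction().commit(); // ...and the IndexWriter is flushed here
```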

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Urgent Plz. Infinispan Caches Write Behind-Disaster Recovery
PostPosted: Fri Feb 17, 2017 6:46 am 
Beginner

Joined: Sun Aug 16, 2015 3:21 am
Posts: 27
Quote:
You mentioned invoking FullTextEntityManager.index(object) directly. Why do you do use this method? Most users let it index automatically.

This is because the application that takes care of indexing and serving search requests is separate from the main application. We have a messaging setup built to send it updates, much like with Elasticsearch - we use SOAP and JMS.

Index Writer Flush to Infinispan
Quote:
I am glad that the IndexWriter flushes the data as part of the index operation to the underlying directory - which in our case is Infinispan - and that using the write-through option will keep the indexes in the DB fully consistent with memory and ready to be backed up at any given time.
In our setup we never lose messages, and we can resend them again for any date range. So the main focus is to have a DB backup that we can use as a starting point after a fail-over, re-submitting the missing messages. Those won't be too many, even if we go back some days before the time of the disaster. At least there is no need to run the MassIndexer for the whole dataset and wait for hours (a relative figure based on size: currently we index 100,000 entities in 30 minutes end to end - that will need to improve too, as you recommended), because we have millions of entities and the potential to grow over the years.


Fail-Over Node
Quote:
And yes, we will have a second node in the real production environment. We have two options to configure.

Standalone (Own Copy of Indexes)
1. Keep both as standalone instances with local caches, as we can easily control sending updates to both of them. Both nodes will then have up-to-date, self-owned indexes in memory and in the cache store to serve from. A fail-over on one side is then not a fail-over in any sense on the other side - not even at the underlying cache store level.

OR

Master Slave
2. Use a normal cluster with a master/slave architecture and let the nodes communicate with each other through JGroups to share state. In this case a failure at the DB level means they will both have up-to-date data in memory but out-of-date data in the underlying cache store. (Talking purely write-through here - am I right?) Or can they later sync with the underlying cache store and make it consistent once the problem at the cache store level is sorted out?





© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.