All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
 Post subject: Lost index
Posted: Thu Apr 21, 2016 9:42 am
Newbie

Joined: Thu Apr 21, 2016 9:16 am
Posts: 3
I am working on an application that was extremely slow to commit data to the index, so we changed the hibernate.search.default.indexmanager setting to near-real-time, which greatly improved performance.

However, with that change, if GlassFish is killed (kill -9), some items are lost from the Lucene index. In some cases three weeks of results were lost from the Lucene index, and only a full re-index brought them back. I assume those results had not yet been saved to the Lucene index, as would have happened with a clean shutdown.

We thought that "near real time" was something close to "right now", and we were very surprised it could involve results older than 3 weeks.

Is it possible to control when the Lucene index is physically written to disk, or is this data loss a bug?

Cheers,
Davide

platforms: Linux (Ubuntu 12 and 14)
database: MySQL 5.6.27
mvn dependencies:
    hibernate-jpa-2.1-api 1.0.0.Final
    hibernate-entity-manager 4.3.9.Final
    hibernate-common-annotations 4.0.5.Final
    hibernate-core 4.3.9.Final
    hibernate-search-orm 5.2.0.Final
    hibernate-search-engine 5.2.0.Final
    hibernate-search-infinispan 5.3.0.Beta1
    lucene-facet 4.10.4
hibernate-search configuration:
    <property name="hibernate.dialect" value="org.hibernate.dialect.MySQL5InnoDBDialect" />
    <property name="hibernate.hbm2ddl.auto" value="" />
    <property name="hibernate.search.default.directory_provider" value="filesystem" />
    <property name="hibernate.search.default.indexBase" value="/var/lucene/indexes" />
    <property name="hibernate.search.lucene_version" value="LUCENE_36" />
    <property name="hibernate.search.default.worker.thread_pool.size" value="20"/>
    <property name="hibernate.search.default.exclusive_index_use" value="true"/>
    <property name="hibernate.search.default.ram_buffer_size" value="1"/>
    <property name="hibernate.search.worker.batch_size" value="250"/>
    <property name="hibernate.search.default.use_compound_file" value="false"/>
    <property name="hibernate.search.analyzer" value="my.DefaultAnalyser" />
    <property name="hibernate.transaction.jta.platform" value="org.hibernate.service.jta.platform.internal.SunOneJtaPlatform" />
    <property name="hibernate.connection.connectionCollation" value="utf8mb4_unicode_ci" />
    <property name="hibernate.connection.useUnicode" value="true" />
    <property name="hibernate.show_sql" value="false" />
    <property name="hibernate.connection.isolation" value="2" />
    <property name="hibernate.search.default.sharding_strategy.nbr_of_shards" value="1" />
    <property name="hibernate.search.default.indexmanager" value="near-real-time" />


 Post subject: Re: Lost index
Posted: Fri Apr 22, 2016 8:06 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Davide,

The near-real-time strategy might buffer writes indefinitely: it flushes when its buffers are full, but there is no time limit that forces a flush.

Is it possible that for three weeks there was almost no write activity on your system?
If there had been write activity, it would have triggered a flush to disk.
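To make the flushing behaviour described above concrete, here is a toy Java model. It is not Hibernate Search or Lucene code; the class name SizeTriggeredBuffer and its threshold are hypothetical stand-ins for the RAM buffer. Writes accumulate in memory and only become durable when the buffer reaches a size threshold; nothing flushes on a timer, so a kill -9 discards whatever is still buffered.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Hibernate Search or Lucene code. Writes become durable
// solely when the in-memory buffer reaches a size threshold; there is no timer.
public class SizeTriggeredBuffer {
    private final int flushThreshold;                        // hypothetical stand-in for the RAM buffer size
    private final List<String> buffer = new ArrayList<>();   // buffered, not yet on disk
    private final List<String> durable = new ArrayList<>();  // what would survive kill -9
    private int flushCount = 0;

    public SizeTriggeredBuffer(int flushThreshold) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be >= 1");
        }
        this.flushThreshold = flushThreshold;
    }

    public void write(String doc) {
        buffer.add(doc);
        if (buffer.size() >= flushThreshold) {
            flush(); // size is the only flush trigger in this model
        }
    }

    private void flush() {
        durable.addAll(buffer);
        buffer.clear();
        flushCount++;
    }

    // Simulates kill -9: everything still buffered is discarded and reported as lost.
    public List<String> crash() {
        List<String> lost = new ArrayList<>(buffer);
        buffer.clear();
        return lost;
    }

    public int durableCount() { return durable.size(); }

    public int flushCount() { return flushCount; }

    public static void main(String[] args) {
        SizeTriggeredBuffer index = new SizeTriggeredBuffer(1000);
        for (int i = 0; i < 2500; i++) {
            index.write("doc-" + i);
        }
        List<String> lost = index.crash();
        System.out.println("durable=" + index.durableCount() + " lost=" + lost.size());
    }
}
```

Running main prints `durable=2000 lost=500`: the last 500 writes never reached the threshold, so only the simulated crash "sees" them. A threshold of 1 makes every write durable at the cost of one flush per write, which is the reliability versus speed trade-off of not using NRT.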

You might be interested in this feature, if you can live with asynchronous indexing:
- https://hibernate.atlassian.net/browse/HSEARCH-1693
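A minimal sketch of what enabling that feature could look like, assuming a Hibernate Search release that includes HSEARCH-1693 (the 5.2.0.Final listed in the original post may not). The property keys `hibernate.search.default.worker.execution` and `hibernate.search.default.index_flush_interval` are assumptions to verify against the reference documentation of the version in use, and the class name AsyncFlushSettings is hypothetical. The sketch leaves out the near-real-time indexmanager setting, since the reply notes that no existing policy combines NRT with periodic flushing.

```java
import java.util.Properties;

// Sketch only. ASSUMPTION: these keys match a Hibernate Search release that
// includes HSEARCH-1693; verify them against that version's documentation.
public class AsyncFlushSettings {
    public static Properties build(long flushIntervalMillis) {
        Properties p = new Properties();
        // Apply index changes asynchronously instead of during the transaction commit.
        p.setProperty("hibernate.search.default.worker.execution", "async");
        // Assumed key: interval in milliseconds between periodic flushes of pending index writes.
        p.setProperty("hibernate.search.default.index_flush_interval",
                Long.toString(flushIntervalMillis));
        return p;
    }

    public static void main(String[] args) {
        build(1000).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties could be passed as the overrides map of `Persistence.createEntityManagerFactory(String, Map)`, or the same keys could be copied into persistence.xml as `<property>` elements like the ones in the original post.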

The policy added by HSEARCH-1693 triggers a flush periodically. We don't have a policy that applies the NRT optimisations and also flushes to disk periodically. It would be quite easy to add one; have a look at:
- org.hibernate.search.backend.impl.lucene.ScheduledCommitPolicy
- org.hibernate.search.backend.impl.lucene.NRTCommitPolicy
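A hypothetical sketch of such a hybrid policy, written as plain Java rather than against the internal ScheduledCommitPolicy/NRTCommitPolicy classes (whose interfaces are not shown in this thread). The name PeriodicNrtCommitPolicy and the commitAction callback, which stands in for committing the Lucene IndexWriter, are illustrative assumptions: change sets are applied without committing, as with NRT, and a ScheduledExecutorService commits at a fixed delay whenever changes are pending.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// HYPOTHETICAL sketch: does NOT implement Hibernate Search's internal commit policy
// interface. commitAction stands in for "commit the IndexWriter to disk".
public class PeriodicNrtCommitPolicy implements AutoCloseable {
    private final Runnable commitAction;
    private final AtomicInteger pendingChanges = new AtomicInteger();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "periodic-nrt-commit");
                t.setDaemon(true); // do not keep the JVM alive just for commits
                return t;
            });

    public PeriodicNrtCommitPolicy(Runnable commitAction) {
        this.commitAction = commitAction;
    }

    // Start committing pending changes every intervalMillis milliseconds.
    public void start(long intervalMillis) {
        scheduler.scheduleWithFixedDelay(this::commitIfPending,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called after each change set is applied: NRT-style, no commit here.
    public void onChangeSetApplied() {
        pendingChanges.incrementAndGet();
    }

    // Commits only if something changed since the last commit; returns true if it committed.
    public synchronized boolean commitIfPending() {
        if (pendingChanges.getAndSet(0) == 0) {
            return false;
        }
        commitAction.run();
        return true;
    }

    @Override
    public void close() {
        scheduler.shutdown();
        commitIfPending(); // final commit on a clean shutdown
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger commits = new AtomicInteger();
        try (PeriodicNrtCommitPolicy policy = new PeriodicNrtCommitPolicy(commits::incrementAndGet)) {
            policy.start(100);
            for (int i = 0; i < 5; i++) {
                policy.onChangeSetApplied();
            }
            Thread.sleep(350);
        }
        System.out.println("commits=" + commits.get());
    }
}
```

Committing only when changes are pending avoids needless disk I/O on an idle index. The commit in close() covers a clean shutdown, while a kill -9 would still lose roughly the changes made since the last scheduled commit.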

I am not sure this would have solved your problem, though: the data not yet flushed to disk could contain critical metadata, and if that is lost, a large segment might have to be considered potentially corrupted.
If you need the best reliability, don't use NRT. Of course, it will then be considerably slower, as it will flush to disk much more often.

BTW, the "thread_pool.size" option doesn't need to be set to 20 threads. With the latest design, 1 thread should be more than enough when using NRT.
"ram_buffer_size" is very low, though. Consider setting it to 128 MB or more; the non-NRT backend might then be considerably faster.
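The tuning suggested above, sketched as Java Properties. The keys are copied from the configuration in the original post and the values follow the reply (one worker thread with NRT, a RAM buffer of 128 MB instead of 1); the class name TunedIndexSettings is hypothetical. Some Hibernate Search versions document the buffer setting as `indexwriter.ram_buffer_size`, so treat the exact key as an assumption to check for your version.

```java
import java.util.Properties;

// Sketch of the suggested tuning. Keys are copied from the original post's
// configuration; values follow the reply. Verify key names for your version.
public class TunedIndexSettings {
    public static Properties build(boolean useNrt) {
        Properties p = new Properties();
        // RAM buffer in MB: 1 MB is very low, 128 MB or more was suggested.
        p.setProperty("hibernate.search.default.ram_buffer_size", "128");
        if (useNrt) {
            p.setProperty("hibernate.search.default.indexmanager", "near-real-time");
            // With NRT, a single worker thread should be more than enough.
            p.setProperty("hibernate.search.default.worker.thread_pool.size", "1");
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println("NRT: " + build(true));
        System.out.println("default backend: " + build(false));
    }
}
```

Keeping the two variants behind one flag makes it easy to compare the NRT setup against the default backend with a larger buffer, which is the trade-off the reply describes.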

Quote:
We thought that "near real time" was something close to "right now", and we were very surprised it could involve results older than 3 weeks.

The "near real time" name refers to its perceived write performance. To achieve this efficiency, it avoids disk flushes, which means data might be lost.

But I agree that three weeks is rather extreme; I had never heard of that. It might be a case of bad luck, with the wrong metadata being lost, and for some reason your application never needed to flush that small 1 MB buffer.

Also: wouldn't you have had to reindex everything even if you had lost only 1 minute of changes?

Thanks for the feedback!

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Lost index
Posted: Tue May 24, 2016 10:15 am
Newbie

Joined: Thu Apr 21, 2016 9:16 am
Posts: 3
Thank you for the quick answer!

We used to have a buffer of 256 MB, and I changed it to 1 MB to check whether it would flush at some point. But even after more than 10K results, it didn't.
I spent most of my time reading the sources and trying different possible solutions, and it was an interesting deep dive!

The system receives results all the time, and in those weeks it received over 60K new entries.
After many experiments and some tuning, the feature you linked seems to work fine for us.

I'll keep you updated.
And thank you again for your excellent answer!

Best,
Davide


 Post subject: Re: Lost index
Posted: Tue May 24, 2016 11:26 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Thanks, it's always nice to hear about happy users :)

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.