All times are UTC - 5 hours [ DST ]



 Post subject: potential FileNotFoundException on slave copy
PostPosted: Tue Oct 26, 2010 6:23 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
I think I may have uncovered a potential race condition in the master/slave file synchronization mechanism. Specifically, if the master instance switches out the source while the slave is copying over a file from that source AND the new source no longer has the file being copied, the slave will fail with the following stack:

Code:
Caused by: java.io.FileNotFoundException: C:\Users\mogleym\Scratch\csmc\mds\lucene\dev\master-copy\org.csmc.mds.model.Expertise\1\segments_1 (The system cannot find the file specified)
   at java.io.FileInputStream.open(Native Method)
   at java.io.FileInputStream.<init>(FileInputStream.java:106)
   at org.hibernate.search.util.FileHelper.copyFile(FileHelper.java:157)
   at org.hibernate.search.util.FileHelper.synchronize(FileHelper.java:144)
   at org.hibernate.search.util.FileHelper.synchronize(FileHelper.java:132)
   at org.hibernate.search.store.FSSlaveDirectoryProvider.start(FSSlaveDirectoryProvider.java:131)


Since you probably don't want to have to implement locks on the source, I would suggest catching the FileNotFoundException on the slave side, and if thrown, simply deleting the corresponding slave file being synchronized. This should be a safe bet, since the file it's attempting to sync is actually no longer present in the source.

What do you guys think? And what workaround would you suggest in the meantime?

Michael


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Oct 28, 2010 4:02 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
The master only deletes files from the non-active index just before marking it as "active", which means the master switched twice while a slave couldn't finish a single copy.
Looking at the code, it currently doesn't expect that a copy by a slave could be slower than a full master cycle. So a temporary workaround would be to lengthen your master's refresh period; it's too short.

This should be fixed, as a FileNotFoundException is not a desirable outcome. Still, even if we can prevent the error, you should slow down the master, as there's no point in writing faster than your slaves are able to copy anyway.

Could you please open a new JIRA issue and point to this forum post?
If you have a test case or fix, even better.

thanks

_________________
Sanne
http://in.relation.to/


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Oct 28, 2010 4:26 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
The way I discovered this: I was trying to debug an ancillary issue with the master-slave copy. Since stepping through the code was drastically slowing down the copy operation, I hit the FileNotFoundException when the master deleted a file from the source from under my feet.


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Oct 28, 2010 4:37 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
Sanne, it could also happen for other reasons besides the slave being slow to copy. It's always possible, though improbable, that while the slave has begun synchronizing the source, the master happens to switch it out in that interval. Since there is no directory locking, this is a clear possibility. I think the fix should simply be to catch the possible FileNotFoundException and delete the corresponding file on the slave side.


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Oct 28, 2010 5:16 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
It should only happen if a copy takes longer than two refresh periods: the master alternates between two directories, and only switches the active marker at the end of a copy.
The slaves read the marker and then perform a copy. So nothing should go wrong unless this copy takes so long that, after the slave started and before it finished, the master triggered again on the other directory, and yet again on the same directory.

It doesn't seem to me a matter of improbability, but rather of timing: there is no real locking, but the marker file does help a bit.
BTW you can't just delete some files :) The index would be corrupted, so ignoring the error doesn't seem a good idea. Rather, the client should retry from scratch; but then again, if it's so slow that the master would do another turn in the same time, it would loop forever.
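
The "retry from scratch" idea above could be sketched roughly as follows. This is only a minimal illustration using plain java.nio.file, not Hibernate Search's actual FileHelper code; the class RetryingIndexCopy, its method names and the staging-directory approach are all assumptions made for the example. It copies into a separate staging directory, throws the partial copy away if a source file disappears, and gives up after a bounded number of attempts so that a slave which is always too slow cannot loop forever.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Hibernate Search.
final class RetryingIndexCopy {

    /** Copies every regular file of a flat directory; throws if a file vanishes mid-copy. */
    public static void copyOnce(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, target.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    /** Deletes a flat directory and the files in it; does nothing if it is absent. */
    public static void deleteFlatDirectory(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                Files.deleteIfExists(file);
            }
        }
        Files.delete(dir);
    }

    /**
     * Tries a full copy into the staging directory up to maxAttempts times.
     * A partial copy is never kept: on a missing source file the staging
     * directory is discarded and the copy restarts from scratch.
     * Returns true if one attempt completed, false if all attempts failed.
     */
    public static boolean copyWithRetry(Path source, Path staging, int maxAttempts)
            throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            deleteFlatDirectory(staging);
            try {
                copyOnce(source, staging);
                return true;
            } catch (NoSuchFileException | FileNotFoundException e) {
                // The source changed underneath us; fall through and retry.
            }
        }
        deleteFlatDirectory(staging);
        return false;
    }
}
```

A caller would swap the staging directory in as the live slave index only when copyWithRetry returns true, and keep serving the previous copy otherwise. In a real slave the marker would presumably also be re-read before each attempt, which this sketch leaves out.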

_________________
Sanne
http://in.relation.to/


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Oct 28, 2010 5:25 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
My point is that even if the slave is quick to copy, there is still that finite window of time in which, if the master happens to switch the active copy, the slave will get the error. Also, my suggestion to delete the file was based on what the code seemed to already be doing if the source file was determined to NOT exist. But it's possible I'm misunderstanding that code.


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Oct 28, 2010 5:26 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
OK, rereading your message, I do see what you're saying about the master switching the markers. I guess it does depend on the master being much faster than the slave.


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Oct 28, 2010 6:12 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
thank you very much for reporting the issue

_________________
Sanne
http://in.relation.to/


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Jul 21, 2011 6:29 am 
Newbie

Joined: Thu Jul 21, 2011 6:26 am
Posts: 8
hi,

I am getting this error quite a lot on a production site. Has the issue been resolved, or is there a workaround? I tried increasing the master refresh interval, but it is still occurring. I'd appreciate any advice.


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Jul 21, 2011 6:36 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
https://hibernate.onjira.com/browse/HSEARCH-323 fixes the issue of the slave coming up before the master. It's part of Search 3.4

--Hardy


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Jul 21, 2011 6:43 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Depends on which issue you're having exactly.
If it's this one, it's still open: we didn't consider it a priority, as it seemed to be more of a configuration issue:
https://hibernate.onjira.com/browse/HSEARCH-614

If that's not good enough, I think we should either use a snapshot policy when making the copy, use marker files as locks, or use the Infinispan DirectoryProvider instead.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Jul 21, 2011 7:15 am 
Newbie

Joined: Thu Jul 21, 2011 6:26 am
Posts: 8
Yes, it's this issue HSEARCH-614.

I have set the master refresh interval to 120 seconds and the slave refresh period to 160 seconds. I have 4 search slaves. It happens on all of the slaves at different but regular intervals. I have had 3 instances on one slave alone today.

Here is the trace. Is there anything else (configuration wise) I could try to at least alleviate the problem in the short term?

Code:
Caused by: java.io.FileNotFoundException: /home/declan/lucene/slave/index/publishedAds/1/segments_9w61t (No such file or directory)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
   at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
   at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
   at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:62)
   at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:49)
   at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:482)
   at org.apache.lucene.index.SegmentInfos$2.doBody(SegmentInfos.java:369)
   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
   at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
   at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:237)
   at org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:145)
   at org.hibernate.search.reader.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:242)


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Jul 21, 2011 7:25 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
Quote:
I have set the master refresh interval to 120 seconds and the slave refresh period to 160 seconds. I have 4 search slaves. It happens on all of the slaves at different but regular intervals. I have had 3 instances on one slave alone today.

That's not good timing. See what happens to the trigger times (in seconds):
master  slave
120     160
240     320
360     480
480     640
600     800
720     960
840     1120
960     1280

You have to make sure the timers are in sync, and still leave enough time for the clients to finish between triggers. So if you take 120 seconds, for example, you should make sure that clients are able to copy the index in less than ~100 seconds. That depends on your disk and network speed and on your index sizes.
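
To see how the two timers drift against each other, here is a small sketch. It assumes a simplified model in which both timers start at time 0 and fire exactly on schedule; the class and method names are made up for the illustration. For each slave trigger it computes how many seconds remain until the next master trigger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of timer drift, not Hibernate Search code.
final class RefreshDrift {

    /**
     * For the first slaveTriggers firings of the slave timer, returns the
     * number of seconds left until the master timer fires next.
     * Assumes both timers start at t = 0; a slave trigger that coincides
     * with a master trigger counts as a full master period remaining.
     */
    public static List<Integer> secondsUntilNextMasterTrigger(
            int masterPeriod, int slavePeriod, int slaveTriggers) {
        List<Integer> remaining = new ArrayList<>();
        for (int i = 1; i <= slaveTriggers; i++) {
            int slaveTime = i * slavePeriod;
            remaining.add(masterPeriod - (slaveTime % masterPeriod));
        }
        return remaining;
    }
}
```

With a 120-second master and a 160-second slave, the remaining time cycles through 80, 40 and 120 seconds, so the relative timing of the two nodes keeps shifting. With both periods set to 120 it stays constant at 120.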

_________________
Sanne
http://in.relation.to/


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Jul 21, 2011 7:37 am 
Newbie

Joined: Thu Jul 21, 2011 6:26 am
Posts: 8
The index size is relatively small, ~95MB. It takes about 2 seconds to copy the index across. All servers are on a local network, so network speed is not an issue, and the index is on a very fast SAN, so disk speed is certainly OK too.

Should I ensure both the master and slave(s) start at exactly the same time and that the refresh periods match?


 Post subject: Re: potential FileNotFoundException on slave copy
PostPosted: Thu Jul 21, 2011 8:56 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Should I ensure both the master and slave(s) start at exactly the same time and that the refresh periods match?

yes
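
As a rough sketch of what matching refresh periods could look like in configuration: the snippet below builds the settings for a master and a slave node from one shared refresh value. The property names and provider class names are the ones I understand the Hibernate Search 3.x documentation to use, so please check them against your version; the helper class and the paths in the usage note are hypothetical. Note that these properties only set the period. Starting both nodes at the same moment is not something they control.

```java
import java.util.Properties;

// Hypothetical helper that keeps master and slave refresh periods identical.
final class MatchedRefreshConfig {

    private static final String PREFIX = "hibernate.search.default.";

    /** Settings for the node that writes the index and publishes copies to sourceBase. */
    public static Properties master(String sourceBase, String indexBase, int refreshSeconds) {
        return build("org.hibernate.search.store.FSMasterDirectoryProvider",
                sourceBase, indexBase, refreshSeconds);
    }

    /** Settings for a node that pulls copies from sourceBase into its local indexBase. */
    public static Properties slave(String sourceBase, String indexBase, int refreshSeconds) {
        return build("org.hibernate.search.store.FSSlaveDirectoryProvider",
                sourceBase, indexBase, refreshSeconds);
    }

    private static Properties build(String provider, String sourceBase,
                                    String indexBase, int refreshSeconds) {
        Properties props = new Properties();
        props.setProperty(PREFIX + "directory_provider", provider);
        props.setProperty(PREFIX + "sourceBase", sourceBase);
        props.setProperty(PREFIX + "indexBase", indexBase);
        // Refresh period in seconds; pass the same value on every node.
        props.setProperty(PREFIX + "refresh", Integer.toString(refreshSeconds));
        return props;
    }
}
```

For example, MatchedRefreshConfig.master("/mnt/shared/master-copy", "/var/lucene/master", 120) and MatchedRefreshConfig.slave("/mnt/shared/master-copy", "/var/lucene/slave", 120) would give both nodes a 120-second period.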

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.