RE: Regarding journal replay

Yeah, thanks Sage for confirming this.

Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sweil@xxxxxxxxxx]
Sent: Thursday, September 10, 2015 3:04 PM
To: Somnath Roy
Cc: ceph-devel
Subject: Re: Regarding journal replay

On Thu, 10 Sep 2015, Somnath Roy wrote:
> Sage et al.,
> Could you please let me know what will happen during journal replay in this scenario?
>
> 1. Say the last committed seq is 3, and after that one more independent
> transaction, seq 4, arrives. Transaction seq 4 contains, say: delete xattr,
> delete object, create a new object, set xattr.
>
> 2. Seq 4 is committed to the journal, and the system crashes halfway through applying it (say all the deletes are done and the new object is created, but the set xattr is not done).
>
> 3. During restart, the OSD will try to replay seq 4.
>
> Now, my understanding is that it will blindly run the entire transaction again. But:
>
> 1. The delete will fail since the file no longer exists.
>
> 2. It will create the new object again even though it has already been created,
> and will probably get an "already exists" error (?)
>
> The question is: how will it determine whether the error is caused by filesystem corruption or by a half-executed transaction?
> I saw in the code that we ignore these errors during replay. Is that correct?
> Any information on this will be helpful.

There is a subset of operations where errors (or certain errors) are ignored (and expected) during replay.  ENOENT on delete is one of them.
The largest set of them is whitelisted here

https://github.com/ceph/ceph/blob/master/src/os/FileStore.cc#L2781

but if you grep for 'replaying' you'll see several other instances elsewhere.
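
To make the idea of tolerated replay errors concrete, here is a minimal Python sketch. It is a simplified model and not the FileStore code: the table TOLERATED_ON_REPLAY, the two operation names, and run_transaction() are all hypothetical, and xattr operations are left out to keep it portable. It assumes that only ENOENT on remove and EEXIST on create are forgiven, and only while replaying.

```python
import errno
import os
import tempfile

# Hypothetical table for this sketch: errno values forgiven per op during replay.
TOLERATED_ON_REPLAY = {
    "remove": {errno.ENOENT},   # object was already deleted before the crash
    "create": {errno.EEXIST},   # object was already created before the crash
}

def apply_op(root, op, name):
    """Apply one operation against a directory that stands in for the store."""
    path = os.path.join(root, name)
    if op == "remove":
        os.unlink(path)
    elif op == "create":
        # O_EXCL makes a second create fail with EEXIST instead of succeeding.
        os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
    else:
        raise ValueError("unknown op: " + op)

def run_transaction(root, ops, replaying):
    """Run all ops in order; return the errors that were forgiven."""
    ignored = []
    for op, name in ops:
        try:
            apply_op(root, op, name)
        except OSError as e:
            if replaying and e.errno in TOLERATED_ON_REPLAY.get(op, ()):
                ignored.append((op, name, errno.errorcode[e.errno]))
                continue
            raise  # outside replay, or an unexpected errno: treat as fatal
    return ignored

def demo():
    """Apply a transaction fully, then replay it as if recovering from a crash."""
    root = tempfile.mkdtemp()
    open(os.path.join(root, "old"), "w").close()
    txn = [("remove", "old"), ("create", "new")]
    run_transaction(root, txn, replaying=False)
    return run_transaction(root, txn, replaying=True)
```

Calling demo() returns the forgiven errors from the second pass. Note that the sketch has the same blind spot as the real thing: a forgiven ENOENT looks identical whether it came from the earlier partial apply or from some unrelated loss of the file.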

Sadly you can't tell if these are happening because of the timing of the crash or because of some other corruption. The combination of a write-ahead transaction log and POSIX is far from ideal.  In general, though, operations are all safe to replay.  In the cases where they are not, there is the replay_guard machinery to prevent certain things from being replayed.
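
As a rough illustration of why a guard helps for operations that are not safe to replay, here is a small Python model. It is only a sketch of the general idea and does not reproduce how replay_guard is implemented in FileStore: the Store and Obj classes, the guard_seq field, and the use of append as the non-idempotent operation are all assumptions made for the example. The model records, per object, the highest journal seq already applied, and skips any replayed operation at or below that seq.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    data: bytes = b""
    guard_seq: int = 0  # highest journal seq already applied to this object

class Store:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}

    def append(self, seq, name, payload, replaying=False):
        """Append payload to an object; return True if the op was applied."""
        obj = self.objects.setdefault(name, Obj())
        if replaying and obj.guard_seq >= seq:
            # The op at this seq already took effect before the crash.
            # Appending again would duplicate the payload, so skip it.
            return False
        obj.data += payload
        obj.guard_seq = seq
        return True

store = Store()
store.append(4, "obj", b"xy")                   # applied before the crash
store.append(4, "obj", b"xy", replaying=True)   # replay of seq 4 is skipped
store.append(5, "obj", b"z", replaying=True)    # seq 5 was never applied, so it runs
```

In this model the data and the guard are updated together in memory. A real store would have to make the guard durable in the right order relative to the operation it protects, which the sketch does not attempt to show.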

sage

