On Thu, 10 Sep 2015, Somnath Roy wrote:
> Sage et al.,
> Could you please let me know what will happen during journal replay in this scenario?
>
> 1. Say the last committed seq is 3, and after that one more independent transaction, say seq 4, came in. Transaction seq 4 has, say: delete xattr, delete object, create a new object, set xattr.
>
> 2. Seq 4 is committed in the journal, and halfway through applying it (say all the deletes are done and the new object is created, but the set xattr is not done) the system crashes.
>
> 3. During restart the OSD will try to replay seq 4.
>
> Now, my understanding is that it will blindly run the entire transaction again. But...
>
> 1. The delete will fail since the file doesn't exist.
>
> 2. It will create the new object again even though it is already created, probably getting an already-exists error (?)
>
> The question is, how will it determine whether the error is due to filesystem corruption or to a half-executed transaction?
> I saw in the code that we are ignoring these errors during replay; is that correct?
> Any information on this will be helpful.

There is a subset of operations where errors (or certain errors) are ignored (and expected) during replay. ENOENT on delete is one of them. The largest set of them is whitelisted here

https://github.com/ceph/ceph/blob/master/src/os/FileStore.cc#L2781

but if you grep for 'replaying' you'll see several other instances elsewhere.

Sadly, you can't tell whether these are happening because of the timing of the crash or because of some other corruption... the combination of a write-ahead transaction log and POSIX is far from ideal.

In general, though, operations are all safe to replay. In the cases where they are not, there is the replay_guard machinery to prevent certain things from being replayed.

sage
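To make the replay behaviour above concrete, here is a minimal toy model in Python. It is a sketch, not FileStore code: the `Store` class, the op names, the `REPLAY_TOLERATED` table and `apply_guarded` are all hypothetical. It assumes that "create" behaves like `open()` with `O_CREAT` but without `O_EXCL` (so re-creating an existing object is not an error), that ENOENT on xattr removal and object removal is tolerated only while replaying, and that a replay guard simply records the last sequence number applied to an object.

```python
import errno

# Hypothetical toy object store (NOT Ceph's FileStore API):
# each object is a dict of xattrs; "_data" is a reserved key holding the payload.


class StoreError(Exception):
    def __init__(self, code: int, msg: str):
        super().__init__(msg)
        self.code = code


# Hypothetical whitelist: errors that are expected, and ignored, only while replaying.
REPLAY_TOLERATED = {
    "rmattr": {errno.ENOENT},
    "remove": {errno.ENOENT},
}


class Store:
    def __init__(self):
        self.objects: dict[str, dict[str, bytes]] = {}
        self.guards: dict[str, int] = {}  # object -> last seq recorded by the guard

    def _op(self, kind, name, key=None, value=None):
        obj = self.objects.get(name)
        if kind == "rmattr":
            if obj is None or key not in obj:
                raise StoreError(errno.ENOENT, f"rmattr {name}:{key}")
            del obj[key]
        elif kind == "remove":
            if obj is None:
                raise StoreError(errno.ENOENT, f"remove {name}")
            del self.objects[name]
        elif kind == "create":
            # Like open(O_CREAT) without O_EXCL: an existing object is not an error.
            self.objects.setdefault(name, {})
        elif kind == "setattr":
            if obj is None:
                raise StoreError(errno.ENOENT, f"setattr {name}")
            obj[key] = value
        elif kind == "append":
            # Not idempotent: replaying it a second time would duplicate data.
            if obj is None:
                raise StoreError(errno.ENOENT, f"append {name}")
            obj["_data"] = obj.get("_data", b"") + value
        else:
            raise ValueError(f"unknown op {kind}")

    def apply(self, ops, replaying=False, crash_after=None):
        """Apply ops in order; crash_after=i simulates a crash just before op i."""
        for i, (kind, *args) in enumerate(ops):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash")
            try:
                self._op(kind, *args)
            except StoreError as e:
                if replaying and e.code in REPLAY_TOLERATED.get(kind, set()):
                    continue  # assume the op already took effect before the crash
                raise

    def apply_guarded(self, seq, name, ops, replaying=False):
        """Hypothetical replay guard: on replay, skip ops on `name` already recorded at >= seq."""
        if replaying and self.guards.get(name, -1) >= seq:
            return False
        self.apply(ops, replaying=replaying)
        self.guards[name] = seq
        return True
```

Replaying the scenario's transaction after a crash just before the set-xattr step ends in the same state as an uninterrupted run, because the ENOENTs from the deletes that already happened are swallowed; the same ENOENT outside replay is raised. Note that the swallowed ENOENT looks exactly the same whether the delete ran before the crash or the object was lost to corruption, which is the ambiguity Sage points out. The guard here is an in-memory map for brevity; a real guard would have to be persisted together with the object to survive the crash.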