On Sep 15, 2018, at 12:58 AM, 焦晓冬 <milestonejxd@xxxxxxxxx> wrote:
>
> On Sat, Sep 15, 2018 at 6:23 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>
>> On Fri, Sep 14, 2018 at 05:06:44PM +0800, 焦晓冬 wrote:
>>> Hi, all,
>>>
>>> A probably bit of complex question:
>>> Does nowadays practical filesystems, eg., extX, btfs, preserve metadata
>>> operation order through a crash/power failure?
>>
>> Yes.
>>
>> Behaviour is filesystem dependent, but we have tests in fstests that
>> specifically exercise order preservation across filesystem failures.
>>
>>> What I know is modern filesystems ensure metadata consistency
>>> after crash/power failure. Journal filesystems like extX do that by
>>> write-ahead logging of metadata operations into transactions. Other
>>> filesystems do that in various ways as btfs do that by COW.
>>>
>>> What I'm not so far clear is whether these filesystems preserve
>>> metadata operation order after a crash.
>>>
>>> For example,
>>> op 1. rename(A, B)
>>> op 2. rename(C, D)
>>>
>>> As mentioned above, metadata consistency is ensured after a crash.
>>> Thus, B is either the original B(or not exists) or has been replaced by A.
>>> The same to D.
>>>
>>> Is it possible that, after a crash, D has been replaced by C but B is still
>>> the original file(or not exists)?
>>
>> Not for XFS, ext4, btrfs or f2fs. Other filesystems might be
>> different.
>
> Thanks, Dave,
>
> I found this archive:
> https://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg31937.html
>
> It seems btrfs people thinks reordering could happen.
>
> It is a relatively old reply. Has the implement changed? Or is there
> some new standard that requires reordering not happen?

There is nothing in POSIX that requires any particular ordering. However,
the sequence "A, B, C, sync C" on ext3/ext4 has "always" resulted in A, B
also being sync'd to disk (including parent directory creation, etc).
For a while, ext4 with delayed allocation could leave "B" without any data
after the sequence "write A, rename A->B" (commit v2.6.29-5120-g8750c6d).
While such applications are depending on non-POSIX behaviour, this ordering
behaviour has been around long enough that applications have grown to depend
on it, and their authors consider the filesystem to have a bug when it
doesn't behave that way.

If you want to write a robust application, you should fsync() the files
you care about (possibly with AIO so you get a notification on completion
rather than waiting).

Cheers, Andreas
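
The fsync() advice above can be sketched roughly as follows. This is a minimal
Python illustration, not code from this thread: the helper name
durable_replace is hypothetical, and it assumes a Linux system where a
directory can be opened read-only and passed to fsync(). It writes the new
contents to a temporary file, fsyncs that file, renames it over the target,
and then fsyncs the parent directory so the rename itself is on disk.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` without relying on filesystem ordering.

    Hypothetical helper for illustration; assumes Linux semantics for
    fsync() on a directory file descriptor.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory so the rename
    # below never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Force the file data to stable storage *before* the rename,
            # so the new name can never point at an empty file.
            os.fsync(f.fileno())
        # Atomically replace the target with the fully written file.
        os.replace(tmp_path, path)
    except BaseException:
        # The rename did not happen; remove the leftover temporary file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Persist the directory entry change (the rename) as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling durable_replace("B", contents) once per file makes each replacement
durable on its own, so the application no longer depends on whether the
filesystem happens to preserve the order of two separate renames. Note that
mkstemp() creates the file with owner-only permissions, which may need
adjusting for the real target.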