Re: metadata operation reordering regards to crash

On Sep 15, 2018, at 12:58 AM, 焦晓冬 <milestonejxd@xxxxxxxxx> wrote:
> 
> On Sat, Sep 15, 2018 at 6:23 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> 
>> On Fri, Sep 14, 2018 at 05:06:44PM +0800, 焦晓冬 wrote:
>>> Hi, all,
>>> 
>>> A possibly somewhat complex question:
>>> Do today's practical filesystems, e.g., extX, btrfs, preserve metadata
>>> operation order through a crash/power failure?
>> 
>> Yes.
>> 
>> Behaviour is filesystem dependent, but we have tests in fstests that
>> specifically exercise order preservation across filesystem failures.
>> 
>>> What I know is that modern filesystems ensure metadata consistency
>>> after a crash/power failure. Journaling filesystems like extX do that by
>>> write-ahead logging of metadata operations into transactions. Other
>>> filesystems do it in various ways; btrfs, for example, does it by COW.
>>> 
>>> What I'm not yet clear on is whether these filesystems preserve
>>> metadata operation order after a crash.
>>> 
>>> For example,
>>> op 1.  rename(A, B)
>>> op 2.  rename(C, D)
>>> 
>>> As mentioned above, metadata consistency is ensured after a crash.
>>> Thus, B is either the original B (or does not exist) or has been replaced by A.
>>> The same goes for D.
>>> 
>>> Is it possible that, after a crash, D has been replaced by C but B is still
>>> the original file (or does not exist)?
>> 
>> Not for XFS, ext4, btrfs or f2fs. Other filesystems might be
>> different.
> 
> Thanks, Dave,
> 
> I found this archive:
> https://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg31937.html
> 
> It seems the btrfs developers think reordering could happen.
> 
> It is a relatively old reply. Has the implementation changed? Or is there
> some new standard that requires that reordering not happen?

There is nothing in POSIX that requires any particular ordering.  However,
the sequence "A, B, C, sync C" on ext3/ext4 has "always" resulted in A, B
also being sync'd to disk (including parent directory creation, etc.).
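
Because POSIX promises no ordering, an application that needs one rename to be durable before the next (the rename(A, B) / rename(C, D) case in the quoted question) can impose the order itself by fsync()ing the parent directory between the two operations. A minimal Python sketch, assuming Linux, that all files sit in one directory, and hypothetical helper names `fsync_dir` and `ordered_renames`:

```python
import os
import tempfile


def fsync_dir(path):
    """fsync a directory so entry changes inside it (creates, renames) are persisted."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)  # O_DIRECTORY: Linux/POSIX
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def ordered_renames(dirpath, pairs):
    """Apply renames in order, making each durable before starting the next.

    POSIX gives no crash-ordering guarantee, so the directory fsync acts as an
    explicit barrier: after a crash, a later rename can only be visible if all
    earlier ones are. Assumes every (src, dst) pair lives in `dirpath`.
    """
    for src, dst in pairs:
        os.rename(os.path.join(dirpath, src), os.path.join(dirpath, dst))
        fsync_dir(dirpath)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("A", "C"):
            with open(os.path.join(d, name), "w") as f:
                f.write(name)
        ordered_renames(d, [("A", "B"), ("C", "D")])
        print(sorted(os.listdir(d)))  # ['B', 'D']
```

Renames across directories would need both parent directories synced; this sketch does not handle that case.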

For a while, ext4 with delayed allocation meant that the sequence write A,
rename A->B could leave "B" without any data (commit v2.6.29-5120-g8750c6d).
While such applications depend on non-POSIX behaviour, the operation
ordering behaviour has been around so long that applications have grown to
depend on it, and consider the filesystem to have a bug when it doesn't
behave that way.
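
The write A, rename A->B pattern is commonly made robust by putting an fsync() between the write and the rename, so the rename can never reach disk ahead of the data, and then syncing the directory so the rename itself is durable. A minimal Python sketch of that pattern, assuming Linux; `replace_durably` is a hypothetical helper name, not an existing API:

```python
import os
import tempfile


def replace_durably(path, data: bytes):
    """Replace `path` so that a crash leaves either the old or the new contents."""
    dirpath = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays within one filesystem.
    # Note: mkstemp creates it with mode 0600; copy the old mode if that matters.
    fd, tmp = tempfile.mkstemp(dir=dirpath, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data blocks are on disk before the rename
        os.replace(tmp, path)  # atomic rename over the target
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Persist the directory entry change made by the rename.
    dfd = os.open(dirpath, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "B")
        replace_durably(target, b"old contents")
        replace_durably(target, b"new contents")
        with open(target, "rb") as f:
            print(f.read())  # b'new contents'
```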

If you want to write a robust application, you should fsync() the files you
care about (possibly with AIO so you get a notification on completion rather
than waiting).
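
Python's standard library has no AIO fsync, so this sketch swaps in a worker thread as a rough analogue: asyncio's `run_in_executor()` runs the blocking `os.fsync()` off the event loop, and the awaiting coroutine resumes when the sync completes. This is not Linux AIO itself, and `write_and_fsync` is a hypothetical name:

```python
import asyncio
import os
import tempfile


async def write_and_fsync(path, data: bytes):
    """Write `data` to `path`, then fsync it without blocking the event loop.

    os.fsync() runs in the loop's default thread pool; this coroutine is
    resumed once it returns. For a newly created file, the parent directory
    would also need an fsync for the entry to survive a crash (omitted here).
    """
    loop = asyncio.get_running_loop()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write() may write fewer bytes than requested
            n = os.write(fd, view)
            view = view[n:]
        await loop.run_in_executor(None, os.fsync, fd)
    finally:
        os.close(fd)
    return len(data)


async def main(dirpath):
    # Several files written and synced concurrently; gather() completes
    # once every fsync has been reported done.
    paths = [os.path.join(dirpath, f"file{i}") for i in range(3)]
    return await asyncio.gather(*(write_and_fsync(p, b"payload") for p in paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(asyncio.run(main(d)))  # [7, 7, 7]
```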

Cheers, Andreas




