Re: persistent overlay inode nlink

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 19, 2017 at 12:06 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Mon, Jun 19, 2017 at 10:45 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> On Sat, Jun 10, 2017 at 1:56 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>> On Fri, Jun 9, 2017 at 4:14 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>> On Fri, Jun 9, 2017 at 1:49 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>>> On Fri, Jun 9, 2017 at 11:38 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>
>>>>> The challenge is in accounting the number of link-ups atomically with
>>>>> the link-up itself.  No ideas off-hand.
>>>>
>>>> Following would work I think:
>>>>
>>>> - copy up to indexdir (it now has st_nlink == 1)
>>>> - set overlay.nlinkup to "1-2"
>>>>   + the first number indicates the target number of linkups
>>>>   + the second indicates the target st_nlink on the file
>>>> - fsync the file (hopefully this writes the xattrs as well as the data)
>>>> - link the file from indexdir to upper
>>>> - set overlay.nlinkup to "1"
>>>>    + this indicates that the linkup was finished and number of linkups is 1.
>>>>
>>>
>>> Or like this:
>>>
>>> struct ovl_nlink_adjust {
>>>   int nlink_diff;
>>>   bool from_lower;
>>> }
>>>
>>> Init overlay inode:
>>> overlay_nlink := (from_lower ? lower_nlink : upper_nlink) + nlink_diff
>>>
>>> Copy/link up:
>>> - take ovl_inode lock
>>> - set overlay.nlink { (overlay_nlink - lower_link), true }
>>> - link index (won't change diff from lower inode nlink on fail/success)
>>>
>>> Unlink/link/rename:
>>> - take ovl_inode lock
>>> - set overlay.nlink { (overlay_nlink - upper_nlink), false }
>>> - unlink/link/rename (won't change diff from upper nlink on fail/success)
>
> What I don't like about this is how it ends up in different states
> depending on the last operation.  How about moving "set overlay.nlink
> { (overlay_nlink - upper_nlink), false }" into the link-up region as
> well?
>

Sure. that's not a problem.
But then I have 2 options in case get_inode finds diff from lower:
- either set it right a way to diff from upper
- or let the next copyup/nlink operation store the consistent state

I guess you prefer the former?

Note that I do have to change the unlink/rename/link operations
to grab the ovl_inode lock anyway, so at least w.r.t. code complexity
it is simple to add those ovl_set_nlink() calls. to unlink/rename/link.

>>> I don't think fsync matters to this game.
>>> I'd be surprised if the nlink changing operations (link/unlink/rename) can be
>>> reordered with the setxattr operations on a power fail safe fs.
>
> Why not?  I know data changes can be reordered around rename without
> an fsync, for example.
>

Certainly, data changes are not journalled in common ext4/xfs configurations
and the data=ordered of ext4 guaranties only that data is flushed
before *related*
metadata is journalled and does not guaranty at all that data/metadata
are flushed
in order that they were changed.

The thing with rename vs. data change is that rename doesn't necessarily
change the renamed inode, so journalling the rename operation doesn't
need to journal the inode itself and therefore, there are no *related* metadata
changes that require flushing the inode data.

But unlike rename(taget, foo), the operations unlink(target),
link(target, foo) and
rename(foo, target) all change nlink of target inode and therefore must journal
the target inode, the same target inode whose xattr are being written before and
after the nlink change, and which also implies ctime change.

So I'm pretty sure that reordering xattr writes with nlink change of
the target inode
are not possible with the traditional journalling file system, but I
don't know of any
written rules about this.

Do you suggest that we call vfs_fsync() at the end of ovl_copy_up_inode()
after ovl_set_origin() instead of only at the end of ovl_copy_up_data()?
I am not at ease with the performance overhead this could generate, but
perhaps there is no reason for concern.

Amir.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux