Re: [potential issue, question] whiteout shows up in merged directory


 



Hi Amir,

On 2023/9/4 22:07, Amir Goldstein wrote:
On Mon, Sep 4, 2023 at 4:27 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:



On 2023/9/4 20:49, Jingbo Xu wrote:

...


Thanks for the reply and it's really helpful to me.

I understand that in the normal use case a whiteout cannot appear in a
non-merged directory without the origin xattr, unless it's hand crafted.

But we do hit this issue in the tarfs mode of erofs-utils that we are
developing. As described previously, in tarfs mode erofs-utils can
convert each tar layer into one separate erofs image, and then merge
these erofs images into one merged erofs image in an overlayfs-like model.

Suppose:

layer 0 + layer 1   +   layer 2              -->  merged
          /foo/bar      /foo/bar (whiteout)


To speed up the merging process, we may merge the two top-most layers
(layer 1 and layer 2) first, and then merge layer 0 into the final
merged image as:



          layer 1   +   layer 2              -->  merged-intermediate
          /foo/bar      /foo/bar (whiteout)

layer 0 + merged-intermediate                -->  merged


I could add some more background to this. Assume layer 0 is a
baseos layer (e.g. almost all images use this layer), and layer 1 +
layer 2 belong to some specific workload images.

Since layer 1 + layer 2 are always used together, we could merge
them into a new merged layer to avoid the extra overhead of too many
overlay layer dirs (to simplify, we just illustrate layer 1 and
layer 2 here; there could be layer 3, 4, ...). But layer 1 + layer 2
have no relationship with layer 0 in principle (the merge tool
doesn't need to know whether layer 0 or any underlying layer
exists).

So if we merge layer 1 + layer 2 here first, and use layer 0 together
with the merged layer, it could generate the whiteout cases
described before.

...

Then there comes the problem: when merging layer 1 and layer 2, I need to
keep the whiteout in the intermediate merged image even though the target
of the whiteout has shown up in the underlying layer (/foo/bar in layer 1),
because I have no idea whether "/foo/bar" exists in a further
underlying layer (layer 0).  Reusing this logic, the whiteout is kept
there in the final merged image after merging layer 0 and
merged-intermediate.

Then if "/foo" is not a merged directory, the "/foo/bar" whiteout will
be exposed in the overlayfs unexpectedly.
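
The merge order described above can be modelled with a small sketch. This
is only an illustration, not erofs-utils code: layers are assumed to be
plain {path: kind} dicts, merge_layers() is a hypothetical helper, and
opaque directories are not modelled at all.

```python
WHITEOUT = "whiteout"

def merge_layers(upper, lower):
    """Merge two layers given as {path: kind} dicts; the upper layer wins."""
    merged = dict(lower)
    for path, kind in upper.items():
        if kind != "dir":
            # A file or whiteout in the upper layer hides the lower entry
            # at this path and anything beneath it.
            prefix = path + "/"
            for hidden in [p for p in merged if p == path or p.startswith(prefix)]:
                del merged[hidden]
        # Whiteouts are always carried over: the tool cannot know whether a
        # further underlying layer still contains the path.
        merged[path] = kind
    return merged

layer1 = {"/foo": "dir", "/foo/bar": "file"}
layer2 = {"/foo": "dir", "/foo/bar": WHITEOUT}
layer0 = {"/usr": "dir"}  # hypothetical base layer that has no /foo

intermediate = merge_layers(layer2, layer1)
final = merge_layers(intermediate, layer0)
print(final)
```

In this model the "/foo/bar" whiteout survives both merge steps, so it ends
up inside a "/foo" directory that exists in only one image.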

Currently we work around this on the erofs-utils side.  Apart from setting
the origin xattr on the parent directory of the whiteout, I'm not sure if
the above use case is reasonable enough to justify a fix on the kernel side.

Anyway, we could work around this in the merge tool, but I'm not
sure if it's a design constraint of overlayfs.


Let me put it this way:
If there was an official offline tool to merge overlayfs layers
I would expect that tool to mark the offline merged directories
with an empty "trusted.overlay.origin", to be able to distinguish
them from pure non-merge directories.

I do not consider dealing with this on the erofs-utils side a workaround;
I consider it crafting layers in the expected overlayfs format.
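
As a rough sketch of what such marking could look like on an unpacked
directory tree (erofs-utils would write the xattr into the image instead):
the {path: kind} layer model and both helper names are assumptions made
for illustration. The xattr name is overlayfs's "trusted.overlay.origin";
os.setxattr() is Linux-only and writing trusted.* xattrs requires
CAP_SYS_ADMIN.

```python
import os
import posixpath

ORIGIN_XATTR = "trusted.overlay.origin"

def dirs_needing_origin(layer):
    """Return the sorted parent directories of every whiteout in the layer."""
    return sorted({posixpath.dirname(path)
                   for path, kind in layer.items() if kind == "whiteout"})

def mark_origin(root, dirs):
    """Set an empty origin xattr on each directory below root (needs root)."""
    for d in dirs:
        os.setxattr(os.path.join(root, d.lstrip("/")), ORIGIN_XATTR, b"")

merged = {"/foo": "dir", "/foo/bar": "whiteout", "/usr": "dir", "/usr/bin": "dir"}
print(dirs_needing_origin(merged))
```

Only dirs_needing_origin() is exercised here; mark_origin() is left
uncalled because it needs privileges and a real directory tree.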

Thanks for the hints.

Ok, marking impure makes sense as long as it's properly described.

I just tried to describe the background since I think the question
is not quite erofs-utils specific. Btw, if there could be some
official reference offline tool, that would be great!


You should know that there are potential costs for marking a directory
as a merged directory: the ovl_iterate() implementation for merged dirs,
which needs to filter out whiteouts, is quite different from the
ovl_iterate_real() case.
The entire dir needs to be read into cache before any response
can be returned. For very large dirs this may matter.
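
The cost difference can be pictured with a loose Python model. This is not
the kernel code: iterate_real(), iterate_merged() and counted() are
hypothetical stand-ins, and directory entries are assumed to be simple
(name, kind) tuples.

```python
def counted(entries, counter):
    """Yield entries while counting how many have been read so far."""
    for entry in entries:
        counter[0] += 1
        yield entry

def iterate_real(entries):
    """Non-merged dir: every entry is passed through as soon as it is read."""
    for name, _kind in entries:
        yield name

def iterate_merged(layers):
    """Merged dir: read all layers (top first) into a cache, then filter."""
    cache = {}
    for entries in layers:
        for name, kind in entries:
            cache.setdefault(name, kind)  # the upper-most entry for a name wins
    for name, kind in cache.items():
        if kind != "whiteout":
            yield name

entries = [("a", "file"), ("b", "file"), ("bar", "whiteout")]

real_reads = [0]
first_real = next(iterate_real(counted(entries, real_reads)))

merged_reads = [0]
first_merged = next(iterate_merged([counted(entries, merged_reads)]))

print(real_reads[0], merged_reads[0])
```

In this model the pass-through variant returns its first name after one
read but would also return the whiteout itself, while the merged variant
hides the whiteout at the price of reading the whole directory first.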

So you may want your tool to be able to clear the unneeded whiteouts
and unneeded origin xattr eventually.

Yes, I know there is some overhead, so I tend to add
an option to the merge tool called "--keep-whiteout=0" to
formally drop unneeded whiteouts in the end, and I think we
also need to clear unneeded origin xattrs later.  Jingbo
wanted to confirm the best way to describe such a situation
in order to work out the merge tool.
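
A minimal sketch of what such a final pass might do, under the assumption
that the merged image becomes the bottom-most layer, so that no whiteout
has anything left to hide. The {path: kind} layer model and the finalize()
helper are hypothetical; only the idea of a keep-whiteout switch comes
from the discussion above.

```python
import posixpath

def finalize(merged, keep_whiteouts):
    """Return (layer, dirs_to_mark) for the final merged image.

    With keep_whiteouts=False all whiteouts are dropped, which is only
    valid when nothing will ever be stacked below this image.
    """
    if keep_whiteouts:
        result = dict(merged)
    else:
        result = {p: k for p, k in merged.items() if k != "whiteout"}
    # Only directories that still contain a whiteout need the origin xattr.
    to_mark = sorted({posixpath.dirname(p)
                      for p, k in result.items() if k == "whiteout"})
    return result, to_mark

merged = {"/foo": "dir", "/foo/bar": "whiteout", "/usr": "dir"}
print(finalize(merged, keep_whiteouts=False))
print(finalize(merged, keep_whiteouts=True))
```

Dropping the whiteouts also empties the list of directories to mark, so
those directories can go back to the cheaper non-merged iteration path.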


OTOH, ovl_dir_read_impure() with xino enabled on layers
not from the same fs has quite a similar impact.
Not sure if this configuration is relevant for your use case.

Thanks for the reminder, we will check later (off work now..)

Thanks,
Gao Xiang


Thanks,
Amir.


