Re: [PATCH 0/6] ovl: consistent_fd feature

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 30, 2017 at 1:34 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Wed, Mar 29, 2017 at 5:36 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> Mikos,
>>
>> This patch set implement the 'simple' solution we discussed on LSF.
>> For the special case of all overlays on the same fs with clone support,
>> files are copied up on open for read (as O_TMPFILE) and linked to
>> upperdir on first open for write.
>>
>> - Patches 1-2 are the refactoring I sent you earlier.  They are not
>>   strictly needed for the consistent_fd feature - up to you.
>> - Patches 3-4 test 'samefs' and 'cloneup' properties of underlying fs.
>> - Patch 5 copies up open for read (for the special case).
>> - Patch 6 defers linking the tmpfile to first open for write.
>>
>> Some of the design choices for patch 6 are questionable:
>> the storing of tempfile in ovl_dentry_update(),
>> temp dentry is freed only on overlay dentry release.
>> awaiting your feedback about those choises.

My gut feel is that we should throw away the tempfile once all files
referring to the dentry are closed.  Memory pressure on dcache does
not seem like a good way to control the number inodes allocated on
underlying fs.

I'm hoping that this is going to be a temporary solution, because I
think it's unreasonably heavyweight compared to the rareness of the
issue it is solving.  I think this should be emphasized: this is for
the paranoid and it will cause a performance to degrade and resource
use to increase (although it may be an acceptable amount).

I also have a feeling that we must get the inode number from the lower
file in this case or it's going to break things (e.g. the pattern:
stat(), open(), fstat() verify st_dev/st_ino unchanged). Same goes for
atime, although that's a much smaller issue.

Yeah, yeah, copy-up has that issue, but lets remember that copy-up is
a rare event compared to open(O_RDONLY).

And we need to fix the constant inode number issue anyway.

So I thought about the constant inode number thing (for now only for
the samefs case) and here are my ideas:

 - need a database of upper_ino -> lower_ino mapping
 - on copy-up: append new mapping to database
 - on remove: delete mapping from database (since upper_ino might be reused)
 - on readdir and stat: check database and replace st_ino/d_ino if needed

For NFS export and hard-link-copy-up support, just need to extend this
with a reverse mapping.

The interesting questions are:

 - where to keep the database(s)
 - and how to cache them efficiently in the kernel.

I'd refrain from changes in disk format (i.e. those that rely on
backward incompatible features for underlying filesystems).  At least
for the first version.

Simplest solution is to keep mapping in the upper inode itself
(xattr).  Takes care of removal automatically.  Can be cached in
ovl_entry.  Makes readdir() really inefficient... need to cache
whether mapping needed for any directory entries in directory, which
complicates rename.

The other idea is to store the database separately from the upper tree
(this is what aufs does, apparently).  This works for reverse mapping
as well.  Makes all operations (except rename) more complicated.
Keeping whole mapping in kernel memory is prone to resource hogging
and DoS.   Could have a bitmap created by hashing the ino's that are
in the map and setting the bit for those.  Then only reach out to the
disk db if there's a possibility of hit.  Would still need to design
an efficient way to store and access the data in the db (and I have
zero experience with that sort of thing).

Thoughts?

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux