On 8/12/20 9:31 PM, Amir Goldstein wrote:
On Wed, Aug 12, 2020 at 5:00 PM Ritesh Harjani <riteshh@xxxxxxxxxxxxx> wrote:
Hello Amir / Miklos
Hi Ritesh,
Those are good questions.
If you don't mind, please reply with the overlayfs list in CC, so
that others could benefit from my answers.
Done & thanks for your answers. :)
I took the liberty to CC Chengguang Xu, who may be interested in this as well.
Sorry for getting late on this. I was pulled into some other work
before, so I couldn't get to this earlier.
I have resumed this work now and and for starters I am going through
some key overlayfs functions and data structures (since it has
changed since I last checked - more than a year back). I guess more
upcoming use cases in containers is the reason for lot of active
development in overlayfs :)
- While going through ovl_copy_up_tmpfile(), I had this query on
why are we setting "temp" dentry as the __upperdentry of ovl_inode?
"ovl_inode_update(d_inode(c->dentry), temp);"
And similarly in ovl_copy_up_workdir()
For two different reasons:
In ovl_copy_up_workdir() d_move() moves 'temp' dentry onto 'upper'.
In ovl_copy_up_tmpfile() we could use either 'temp' or 'upper'.
I don't think there was a specific reason for choosing 'temp' beyond the
fact that before:
b10cdcdc2012 ovl: untangle copy up call chain
There was more code in common between these two functions.
The reason that we could use either 'temp' or 'upper' is because nothing
AFAIK reads d_parent and d_name of __upperdentry of a non-directory inode
and the values of those dentry members would be arbitrary anyway with a
hardlinked/indexed inode.
I agree. I was curious that the above mentioned commit specifically
changed it from "upper" to "temp". While we could have used "upper" as
well.
Shouldn't we set "upper" dentry as __upperdentry and do d_put on temp.
Although even with current code, I don't see a bug in the functionality,
but I was wondering why was this change done in below commit.
commit: 6b52243f633eb552 (ovl: fold copy-up helpers into callers)
Am I missing anything? Shouldn't the obvious thing to do was to make the
"upper" as the upperdentry in ovl_inode?
- I did take a brief look at your commits as well in that ovl-aops-wip
branch. IIUC, below is a brief summary of what we are trying to solve
with aops implementation.
Currently if we have a r/o MAP_SHARED file mmaped on the lowerlayer. And
if some other process changes the underlying file contents of this file,
the process which had r/o mmaped is not able to see those modified
changes. This behavior is not valid as per POSIX standard.
And the problem really happens because in overlayfs if a lower layer
inode is opened for write, it will usually create another inode
in the upper layer, will copy_up and will write the changes to new
inode. Now since these are two different inodes (which could even be
residing in different FS), hence the MAP_SHARED inconsistency problem.
Solution - what you proposed it to implement aops for overlayfs
which should *(also)* help in solving above MAP_SHARED inconsistency
problem.
More or less. yes. I can't even remember the fine details anymore,
but IIRC, MAP_SHARED and upper inodes would always use the overlay inode
pages, while MAP_PRIVATE would use the real inode pages for non-upper inodes,
in order to share pages among containers with the same lower layers.
Some more optimizations may be possible going forward for sharing more pages.
- I guess once I start going through your patches in more detail,
I may approach you for any other query :)
Does overlayfs has any IRC channel as well?
Not that I know of.
Or is it mostly through emails?
Mostly emails.
Sure.
One thing to note is that the syncfs work [1] by Chengguang Xu will be
made redundant
by overlayfs aops.
ohk.
I don't know if he is planning to follow up on his work. It is not so
fair to blocks it until
aops is completed which may take time or take forever, but you may
Sure, agreed.
want to talk to
him about collaborating on implementing the aops solution.
Sure. So here is what I was planning.
I am about to start going over patches mentioned in Amir's ovl-aops-wip
branch. My first preference would be to port it to latest upstream and
start cleaning it up.
I will update the branch details once those patches are ported to
latest upstream. Let me know if we should do this in any different way.
I will also be joining this year's plumber, in case if we want
to discuss anything on this. Although I am not sure if by then, I would
be able to get any substantial work done to discuss with a wider
audience. But nevertheless, we can always have a call/email exchanges
for this, both before and after the conference :)
Looks like he already has a lot of experience with inode writeback and
overlayfs.
That's really nice!!
-ritesh