Re: [RFC PATCH 2/2] libfs: Improve behavior when directory offset values wrap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Nov 22, 2024, at 7:57 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> 
> On Thu, Nov 21, 2024 at 02:54:16PM +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Nov 21, 2024, at 3:34 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote:
>>> 
>>> On Wed, Nov 20, 2024 at 10:05:42AM -0500, Chuck Lever wrote:
>>>> On Wed, Nov 20, 2024 at 09:59:54AM +0100, Christian Brauner wrote:
>>>>> On Mon, Nov 18, 2024 at 03:58:09PM -0500, Chuck Lever wrote:
>>>>>> 
>>>>>> I've been trying to imagine a solution that does not depend on the
>>>>>> range of an integer value and has solidly deterministic behavior in
>>>>>> the face of multiple child entry creations during one timer tick.
>>>>>> 
>>>>>> We could partially re-use the legacy cursor/list mechanism.
>>>>>> 
>>>>>> * When a child entry is created, it is added at the end of the
>>>>>> parent's d_children list.
>>>>>> * When a child entry is unlinked, it is removed from the parent's
>>>>>> d_children list.
>>>>>> 
>>>>>> This includes creation and removal of entries due to a rename.
>>>>>> 
>>>>>> 
>>>>>> * When a directory is opened, mark the current end of the d_children
>>>>>> list with a cursor dentry. New entries would then be added to this
>>>>>> directory following this cursor dentry in the directory's
>>>>>> d_children list.
>>>>>> * When a directory is closed, its cursor dentry is removed from the
>>>>>> d_children list and freed.
>>>>>> 
>>>>>> Each cursor dentry would need to refer to an opendir instance
>>>>>> (using, say, a pointer to the "struct file" for that open) so that
>>>>>> multiple cursors in the same directory can reside in its d_chilren
>>>>>> list and won't interfere with each other. Re-use the cursor dentry's
>>>>>> d_fsdata field for that.
>>>>>> 
>>>>>> 
>>>>>> * offset_readdir gets its starting entry using the mtree/xarray to
>>>>>> map ctx->pos to a dentry.
>>>>>> * offset_readdir continues iterating by following the .next pointer
>>>>>> in the current dentry's d_child field.
>>>>>> * offset_readdir returns EOD when it hits the cursor dentry matching
>>>>>> this opendir instance.
>>>>>> 
>>>>>> 
>>>>>> I think all of these operations could be O(1), but it might require
>>>>>> some additional locking.
>>>>> 
>>>>> This would be a bigger refactor of the whole stable offset logic. So
>>>>> even if we end up doing that I think we should use the jiffies solution
>>>>> for now.
>>>> 
>>>> How should I mark patches so they can be posted for discussion and
>>>> never applied? This series is marked RFC.
>>> 
>>> There's no reason to not have it tested.
>> 
>> 1/2 is reasonable to apply.
>> 
>> 2/2 is work-in-progress. So, fair enough, if you are applying
>> for testing purposes.
>> 
>> 
>>> Generally I don't apply RFCs
>>> but this code has caused various issues over multiple kernel releases so
>>> I like to test this early.
>> 
>> The biggest problem has been that I haven't found an
>> authoritative and comprehensive explanation of how
>> readdir / getdents needs to behave around renames.
>> 
>> The previous cursor-based solution has always been a "best
>> effort" implementation, and most of the other file systems
>> have interesting gaps and heuristics in this area. It's
>> difficult to piece all of these together to get the idealized
>> design requirements, and also a get a sense of where
>> compromises can be made.
>> 
>> Any advice/guidance is welcome.
> 
> I didn't mean to make it sound like you did anything wrong or I was
> blaming you. I was literally just trying to say we had weird behavior in
> this area for legitimate reasons. Posix states this:
> 
>    If posix_getdents() is called concurrently with an operation that
>    adds, deletes, or modifies a directory entry, the results from
>    posix_getdents() shall reflect either all of the effects of the
>    concurrent operation or none of them. If a sequence of calls to
>    posix_getdents() is made that reads from offset zero to end-of-file
>    and a file is removed from or added to the directory between the
>    first and last of those calls, whether the sequence of calls returns
>    an entry for that file is unspecified.
> 
> Which to me all reads like we're pretty much free in what to do as long
> as we clearly document it.

Agreed, it's that documentation that I'm sorely missing.


>>>> I am actually half-way through implementing the approach described
>>>> here. It is not as big a re-write as you might think, and addresses
>>>> some fundamental misunderstandings in the offset_iterate_dir() code.
>>> 
>>> Ok, great then let's see it.
>> 
>> I'm finishing it up now. Unfortunately I have several other
>> (NFSD and not) bugs I'm working through.
> 
> No hurry.

What is making this difficult or maybe even impossible is
commit da549bdd15c2 ("dentry: switch the lists of children to
hlist"). When I last worked on libfs, d_subdirs was a normal
list_head. Now, as an hlist, d_children has no tail pointer.

It's no longer easy or fast to walk the list of children from
oldest to newest, so I'll have to find a solution that
continues to use the mtree for iterating through directories.


--
Chuck Lever






[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux