Re: Spooling large metadata updates / Proposal for a new API/feature in the Linux Kernel (VFS/Filesystems):

On Sat, Jan 11, 2025 at 10:18 AM Artem S. Tashkinov <aros@xxxxxxx> wrote:
>
> Hello,
>
> I had this idea on 2021-11-07, then I thought it was wrong/stupid, now
> I've asked AI and it said it was actually not bad, so I'm bringing it
> forward now:
>
> Imagine the following scenarios:
>
>   * You need to delete tens of thousands of files.
>   * You need to change the permissions, ownership, or security context
> (chmod, chown, chcon) for tens of thousands of files.
>   * You need to update timestamps for tens of thousands of files.
>
> All these operations are currently relatively slow because they are
> executed sequentially, generating significant I/O overhead.
>
> What if these operations could be spooled and performed as a single
> transaction? By bundling metadata updates into one atomic operation,

Atomicity is not implied by the use case you described.
IOW, the use case should not care how many sub-transactions
the changes are executed in.

> such tasks could become near-instant or significantly faster. This would
> also reduce the number of writes, leading to less wear and tear on
> storage devices.
>
> Does this idea make sense? If it already exists, or if there’s a reason
> it wouldn’t work, please let me know.

Yes, this is already how journaled filesystems work, but userspace can
only request a commit of the current transaction (a.k.a. fsync), so
transactions can be committed too frequently, or in a manner that is
inefficient for the workload (e.g. rm -rf).
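
To illustrate the point about userspace only controlling when a commit
is forced, here is a minimal sketch (the helper name chmod_many is made
up for this example). It applies all metadata updates first and asks for
a single flush at the end instead of one per file. Note that os.sync()
flushes everything and is coarser than fsync; how the filesystem groups
the updates into journal transactions is still entirely up to it.

```python
import os

def chmod_many(paths, mode):
    """Apply one mode to many files, then request a single flush."""
    for p in paths:
        os.chmod(p, mode)  # metadata change only, no per-file fsync
    os.sync()              # one request to write everything back
    return len(paths)
```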

There was a big effort, IIRC around v6.1, to improve the scalability of
the rm -rf workload in xfs, which led to a long series of
regression-and-fix cycles.

I think that an API for rm -rf is interesting because:
- It is a *very* common use case, which is often very inefficient
- Filesystems already have "orphan" lists to deal with deferred work
  on deleted inodes

What could be done in principle:
1. Filesystems could opt in to implementing
    unlink(path, AT_REMOVE_NONEMPTY_DIR)
2. This API will fail if the directory has subdirs (i_nlink != 2)
3. If the directory has only files, it can be unlinked and its inode
    added to an "orphan" list as a special "batch delete" transaction
4. When executed, the "batch delete" transaction will iterate the
    directory entries and decrement the nlink of their inodes, likely
    adding those inodes to the "orphan" list
5. rm -rf will iterate DFS, calling unlink(path, AT_REMOVE_NONEMPTY_DIR)
    on leaf directories whose nlink is 2
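
The userspace side of steps 2 and 5 could be sketched roughly as below.
AT_REMOVE_NONEMPTY_DIR does not exist today (and among the existing
syscalls it is unlinkat(), not unlink(), that takes a flags argument),
so batch_remove_leaf() is only a hypothetical stand-in that emulates the
proposed call with ordinary unlink/rmdir. The sketch also scans the
directory entries instead of testing nlink == 2, because not every
filesystem reports directory link counts that way.

```python
import errno
import os

def batch_remove_leaf(path):
    """Emulate the proposed call: remove a directory holding only non-directories."""
    with os.scandir(path) as it:
        entries = list(it)
    # Step 2: fail if the directory still has subdirectories.
    if any(e.is_dir(follow_symlinks=False) for e in entries):
        raise OSError(errno.ENOTEMPTY, os.strerror(errno.ENOTEMPTY), path)
    # A real implementation would defer this work to the filesystem.
    for e in entries:
        os.unlink(e.path)
    os.rmdir(path)

def rm_rf(root):
    """Step 5: depth-first walk; returns the number of batch calls made."""
    with os.scandir(root) as it:
        subdirs = [e.path for e in it if e.is_dir(follow_symlinks=False)]
    calls = 0
    for d in subdirs:
        calls += rm_rf(d)
    # All subdirectories are gone, so root is now a leaf directory.
    batch_remove_leaf(root)
    return calls + 1
```

With this shape, rm -rf issues one call per directory rather than one
per file, which is where the potential saving would come from.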

Among other complications, this API does not take into account the
permissions for unlinking the child inodes, which depend on child inode
attributes such as the immutable flag, or on LSM security policies.

This could be an interesting TOPIC for LSFMM.

Thanks,
Amir.




