Re: [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Aug 04, 2024 at 10:13:22PM +0100, Al Viro wrote:
> On Sun, Aug 04, 2024 at 08:18:14AM -0700, Linus Torvalds wrote:
> > On Sat, 3 Aug 2024 at 20:47, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >         bitmap_copy_and_extend(nfdt->open_fds, ofdt->open_fds,
> > >                         copy_words * BITS_PER_LONG, nwords * BITS_PER_LONG);
> > 
> > Ok, thinking about it, I like this interface, since it looks like the
> > compiler would see that it's in BITS_PER_LONG chunks if we inline it
> > and decide it's important.
> > 
> > So make it so.
> 
> FWIW, what I'm testing right now is the patch below; it does fix the reproducer
> and it seems to be having no problems with xfstests so far.  Needs profiling;
> again, no visible slowdowns on xfstests, but...

Seems to work, no visible slowdowns.  Pushed into #fixes.

BTW, speaking of fdtable, the longer I look at it, the more I'm tempted to
try and kill it off completely.

Look:
	->fdt->max_fds is non-decreasing
	->fdt->full_fds_bits only accessed under ->files_lock
	->fdt->open_fds is sometimes used with rcu_read_lock() alone,
but such users are very few - it's
	bitmap_weight(fdt->open_fds, fdt->max_fds);
in proc_readfd_count() (stat on /proc/*/fd) and max_select_fds();
the latter is checking if anything in the fd_set_bits passed to
select(2) is not opened, in addition to looking for the maximal
descriptor passed to select(2) in that.  Then we do fdget() on
all of them during the main loop of select().  Sure, we want
-EBADF if it hadn't been opened at all and we want EPOLLNVAL
on subsequent passes, but that could've been handled easily
in the main loop itself.  As for the bitmap_weight(), I seriously
suspect that keeping the count (all updates are under ->files_lock)
might be cheap enough, and IIRC we already had cgroup (or was it bpf?)
folks trying to get that count in really scary ways...
	->fdt->close_on_exec has two places accessing it
with rcu_read_lock() alone (F_GETFD and alloc_bprm()).  In both
cases we have already done fdget() on the same descrpiptor.

So... do we really need that indirect?  The problem would be
seeing ->max_fds update before that of the array pointers.
Smells like that should be doable with barrier(s) - and not
many, at that...  

Note that if we coallocate the file references' array and bitmaps,
we could e.g. slap rcu_head over the beginning of ->open_fds
bitmap after the sucker's gone from files_struct and do RCU-delayed
freeing on that.  Am I missing something obvious here?




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux