Patch "fs: fd tables have to be multiples of BITS_PER_LONG" has been added to the 5.15-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    fs: fd tables have to be multiples of BITS_PER_LONG

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     fs-fd-tables-have-to-be-multiples-of-bits_per_long.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit eded3deb384662cb55c3bb23112402f5e9520e10
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Tue Mar 29 15:06:39 2022 -0700

    fs: fd tables have to be multiples of BITS_PER_LONG
    
    [ Upstream commit 1c24a186398f59c80adb9a967486b65c1423a59d ]
    
    This has always been the rule: fdtables have several bitmaps in them,
    and as a result they have to be sized properly for bitmaps.  We walk
    those bitmaps in chunks of 'unsigned long' in serveral cases, but even
    when we don't, we use the regular kernel bitops that are defined to work
    on arrays of 'unsigned long', not on some byte array.
    
    Now, the distinction between arrays of bytes and 'unsigned long'
    normally only really ends up being noticeable on big-endian systems, but
    Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps()
    could be called with an argument that wasn't even a multiple of
    BITS_PER_BYTE.  And then it fails to do the proper copy even on
    little-endian machines.
    
    The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which
    didn't actually sanitize the fdtable size sufficiently, and never made
    sure it had the proper BITS_PER_LONG alignment.
    
    That's partly because the alignment historically came not from having to
    explicitly align things, but simply from previous fdtable sizes, and
    from count_open_files(), which counts the file descriptors by walking
    them one 'unsigned long' word at a time and thus naturally ends up doing
    sizing in the proper 'chunks of unsigned long'.
    
    But with the introduction of close_range(), we now have an external
    source of "this is how many files we want to have", and so
    sane_fdtable_size() needs to do a better job.
    
    This also adds that explicit alignment to alloc_fdtable(), although
    there it is mainly just for documentation at a source code level.  The
    arithmetic we do there to pick a reasonable fdtable size already aligns
    the result sufficiently.
    
    In fact,clang notices that the added ALIGN() in that function doesn't
    actually do anything, and does not generate any extra code for it.
    
    It turns out that gcc ends up confusing itself by combining a previous
    constant-sized shift operation with the variable-sized shift operations
    in roundup_pow_of_two().  And probably due to that doesn't notice that
    the ALIGN() is a no-op.  But that's a (tiny) gcc misfeature that doesn't
    matter.  Having the explicit alignment makes sense, and would actually
    matter on a 128-bit architecture if we ever go there.
    
    This also adds big comments above both functions about how fdtable sizes
    have to have that BITS_PER_LONG alignment.
    
    Fixes: 60997c3d45d9 ("close_range: add CLOSE_RANGE_UNSHARE")
    Reported-by: Fedor Pchelkin <aissur0002@xxxxxxxxx>
    Reported-by: Alexey Khoroshilov <khoroshilov@xxxxxxxxx>
    Link: https://lore.kernel.org/all/20220326114009.1690-1-aissur0002@xxxxxxxxx/
    Tested-and-acked-by: Christian Brauner <brauner@xxxxxxxxxx>
    Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/fs/file.c b/fs/file.c
index 97d212a9b814..c01c29417ae6 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -87,6 +87,21 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
 	copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds);
 }
 
+/*
+ * Note how the fdtable bitmap allocations very much have to be a multiple of
+ * BITS_PER_LONG. This is not only because we walk those things in chunks of
+ * 'unsigned long' in some places, but simply because that is how the Linux
+ * kernel bitmaps are defined to work: they are not "bits in an array of bytes",
+ * they are very much "bits in an array of unsigned long".
+ *
+ * The ALIGN(nr, BITS_PER_LONG) here is for clarity: since we just multiplied
+ * by that "1024/sizeof(ptr)" before, we already know there are sufficient
+ * clear low bits. Clang seems to realize that, gcc ends up being confused.
+ *
+ * On a 128-bit machine, the ALIGN() would actually matter. In the meantime,
+ * let's consider it documentation (and maybe a test-case for gcc to improve
+ * its code generation ;)
+ */
 static struct fdtable * alloc_fdtable(unsigned int nr)
 {
 	struct fdtable *fdt;
@@ -102,6 +117,7 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
 	nr /= (1024 / sizeof(struct file *));
 	nr = roundup_pow_of_two(nr + 1);
 	nr *= (1024 / sizeof(struct file *));
+	nr = ALIGN(nr, BITS_PER_LONG);
 	/*
 	 * Note that this can drive nr *below* what we had passed if sysctl_nr_open
 	 * had been set lower between the check in expand_files() and here.  Deal
@@ -269,11 +285,25 @@ static unsigned int count_open_files(struct fdtable *fdt)
 	return i;
 }
 
+/*
+ * Note that a sane fdtable size always has to be a multiple of
+ * BITS_PER_LONG, since we have bitmaps that are sized by this.
+ *
+ * 'max_fds' will normally already be properly aligned, but it
+ * turns out that in the close_range() -> __close_range() ->
+ * unshare_fd() -> dup_fd() -> sane_fdtable_size() we can end
+ * up having a 'max_fds' value that isn't already aligned.
+ *
+ * Rather than make close_range() have to worry about this,
+ * just make that BITS_PER_LONG alignment be part of a sane
+ * fdtable size. Becuase that's really what it is.
+ */
 static unsigned int sane_fdtable_size(struct fdtable *fdt, unsigned int max_fds)
 {
 	unsigned int count;
 
 	count = count_open_files(fdt);
+	max_fds = ALIGN(max_fds, BITS_PER_LONG);
 	if (max_fds < NR_OPEN_DEFAULT)
 		max_fds = NR_OPEN_DEFAULT;
 	return min(count, max_fds);



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux