[Bug 196405] mkdir mishandles st_nlink in ext4 directory with 64997 subdirectories

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Thu, 20 Jul 2017 00:59:11 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=196405

Theodore Tso (tytso@xxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@xxxxxxx

--- Comment #15 from Theodore Tso (tytso@xxxxxxx) ---
I don't think you need to disable the optimization for all of Linux.  All you
need to do is to disable the optimization if the link count on the directory is
1.    A traditional Unix directory will always have a link count of 2, because
if /mnt/dir is a directory, there will be one hard link for "/mnt/dir", and
another hard link for "/mnt/dir/."   Hence it should be very simple for glibc
to detect the case where the link count is 1 and realize that it shouldn't try
to use the optimization.

There are other Linux file systems which use this same convention.  For
example, directories in /proc:

# stat  /proc/sys/kernel/
  File: /proc/sys/kernel/
  Size: 0               Blocks: 0          IO Block: 1024   directory
Device: 4h/4d   Inode: 10298       Links: 1
   ...                             ^^^^^^^^^

The reason why I thought this was a regression in find is because you said that
code which understood the n_links=1 convention was in the old find code?  
Regardless, this behavior has been around for decades.   I suspect if I checked
a Linux 0.99 kernel, it would show this behavior in procfs.

There are a few things which I think we are getting wrong.  First, the
documentation is not quite right.  It claims that the limit is 65,000
subdirectories, when in fact what dir_nlink does is to exempt the 65,000
maximum number of hard links limitation from applying to subdirectories in a
directory.

Secondly, the ext4 code will silently set the dir_link feature flag if there is
an attempt to create a subdirectory which exceeds the EXT4_MAX_LINK and the
directory is using directory indexing.   There have been times in the past when
ext4 will silently set feature flags, but I believe that's a bad thing to do. 
Back in 2007 is was apparently still tolerated, but I think we should change
things such that if the dir_nlink feature is not enabled, the kernel should
return an error if creating a subdirectory would violate EXT4_MAX_LINK instead
of automagically setting the feature flag.

Finally, allowing tune2fs to clear the dir_nlink flag is not a safe thing to
do.   We could allow it if tune2fs were to scan the whole file system making
sure there are no directories with an i_links_count of 1.  But it's easier to
just disallow it clearing the flag.

I disagree that we should disable dir_nlink in the future.   Old find utilities
apparently had old code that would do the right thing.  The fact that it is not
in ftw is unfortunate, but I will note that ftw will break for Linux's
/proc/sys directories as well, and this behavior has been around for a long,
Long, LONG time.   The fact that glibc was mistaken in assuming the
optimization was always safe for Linux is a glibc bug.  I don't understand why
you resist the suggestion of disabling the optimization iff st_nlinks==1.  
That is a clearly safe thing to do.

As far as other programs who might make the same mistake glibc did, since Posix
does not guarantee that '.' and '..' are implemented as hard links, having an
st_link of 1 for directories is completely allowed by Posix.  (i.e., a Posix
environment which does this is a conforming environment).  Hence, a Strictly
Conforming (or Strictly Portable) Posix application should not be making this
assumption.

The fact that we've gone ten years without anyone noticing or complaining is a
pretty strong indicator to me that this isn't a serious portability problem.

In terms of checking the ext4 code, I think you're confused.  It's always done
what I've described, although how it does the check is a bit confusing.   See
the following in ext4.h:

#define is_dx(dir) (ext4_has_feature_dir_index((dir)->i_sb) && \
                    ext4_test_inode_flag((dir), EXT4_INODE_INDEX))
#define EXT4_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= EXT4_LINK_MAX)

Then see the very beginning of ext4_mkdir() and ext4_inc_count() in
fs/ext4/namei.c.

I believe we should add a check for ext4_has_feature_dir_nlink(), as described
above, but the behavior that ext4 has been exhibiting hasn't changed in a very
long time.  That's why you saw the behavior you did on your old RHEL6 system.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.