https://bugzilla.kernel.org/show_bug.cgi?id=196405 --- Comment #18 from Theodore Tso (tytso@xxxxxxx) --- On Fri, Jul 21, 2017 at 08:22:05AM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote: > I'm worried about code intended to run on traditional Unix and GNU/Linux, not > about portable POSIX code. There is a reasonable amount of code that uses > st_nlink as a way to avoid unnecessary stat calls when traversing a file > system. This provides a significant performance boost on traditional Unix and > GNU/Linux, and it would be a shame to lose this performance benefit. One of the things which confuses me is why you think there's so much code which tries to use the st_nlink hack. It's ***much*** simpler to just rely on d_type if it exists (and it does on most systems). If the answer is that d_type might not be supported on all file systems, and a recursive descent might enter a file system which didn't support d_type: 1) The assumption that st_nlink always has the property that it is >2 and can be used to derive the number of subdirectories was never valid across all file system types, so you could descend into a file system type where that wasn't true. 2) If you did descend into a file system which didn't support d_type, d_type would be DT_UNKNOWN instead of DT_REG or DT_DIR 3) Using DT_DIR is means you can skip the stat check for all directory entries. If you are doing a recursive descent where you care about the name, you need to call readdir() on all of the directory entries anyway, so you will have access to d_type. If you are doing a recursive descent where you are checking on file ownership, you are doing the stat(2) anyway, so why not check S_ISDIR(st.st_mode) instead of blindly using the st_nlink hack? 4) Using d_type/DT_DIR is implemented in anything that was BSD derived, and many of the SysV derived systems (to the extent that they imported in the BSD Fast Filesystem), would have also had d_type support. So if your argument is what about legacy Unix code, I would think that most of them would have used the much more performant and simpler-to-use d_type interface. > > allowing tune2fs to clear the dir_nlink flag is not a safe thing to do. > That depends on what the dir_nlink flag is supposed to mean. (Since the flag > does not work now, we can define it to mean what we like. :-) If dir_nlink 1 > means "set a directory link count to 1 if it would overflow", and if a link > count of 1 never changes regardless of what dir_nlink is set to, then why > would > it be a problem to allow tunefs to alter the dir_nlink flag? dir_nlink would > affect only future calls to mkdir, not past ones. Well, it's mostly safe now because ten years have passed and even the most pathetically obsolete Enterprice Distro installation has updated to something more recent. But the reason why dir_nlink was defined as an RO_INCOMPAT feature (EXT4_FEATURE_RO_COMPAT_DIR_NLINK) was because a dir_nlink oblivious kernel could get upset when trying to rmdir a directory where n_link was 1. > > The fact that we've gone ten years without anyone noticing or complaining > More accurately, we've gone ten years before people connected the dots. This > time, the original bug report was about 'ls'. Can you give me a pointer to the original bug report? I'm curious how things were misbehaving. > From my point of view The worst thing about all this, is that the dir_nlink > feature is misdocumented and does not work as intended (i.e., it's a flag > that > in effect cannot be turned off). Either dir_nlink needs to be documented and > fixed; or failing that, the dir_nlink flag should be withdrawn and the ext4 > documentation should clearly say that the link count of a directory is > permanently set to 1 after it overflows past 64999. If you take the latter > approach, you needn't update the ext4 code at all, just the documentation > (though the documentation should note that 64999 is off-by-one compared to > the > 65000 that is nominally the maximum link count). There was a time when we tried documenting things in terms that users could understand, as opposed to going into gory details that only a fs programmer would love. And that retrospect as a mistake. We should have done both, with the civilian-friendly bit coming first. It was also a mistake to have dir_nlink be automatically turned on in the case of ext4 file systems. As I said, we used to do this much more often, and ten years ago we weren't so strict about this rule. The issue is that at this point there are multiple implementations of the ext2/3/4 file system format, and not just Linux, so we can't assume that turning on some feature which is supported by a particular Linux kernel won't break things on some other implementation (e.g., Grub, BSD etc.). It was a bad idea back then as well because sometimes people want to downgrade to old kernels, so having a new kernel automatically do something which causes the file system to become unmountable by a new kernel is unfriendly. Since dir_nlink was close to one of the first ext4 features that was added, it was perhaps more excusable --- but it was still wrong. The problem withdrawing the feature is that it would break a lot of users who want to have more than 65,000 subdirectories. Ext4 has been out there for a long time, and while it's true that many people don't create directory trees which are that bushy, I've gotten people screaming at us for much more minor bugs. So that's why I'm curious to see the original ls bug. (Maybe because most people don't use ls -lR on huge directory trees, because they don't like to grow old waiting for it to complete? :-) So I think the right thing to do is to fix the documentation. We actually added the feature first, and only added the documentation much later, so it is the documentation which is wrong. So this is not a case of writing the spec first and then implementing to the spec. This is much more like Unix came first, and Posix came later to document how Unix worked. Except we weren't smart enough to add a clause, as Posix did, that anything that was allowed to use the Unix(tm) trademark was automatically spec/documentation compliant. :-) And even before we added the documentation, it wasn't like it was a secret; it was just not that well known. But there were some blog entries that talked about it[1], and the description of how it worked was in the git commit message. [1] https://www.whatastruggle.com/maximum-number-of-sub-directories-in-ext4 -- You are receiving this mail because: You are watching the assignee of the bug.