On Thu, Aug 03, 2023 at 11:35:53AM -0700, Linus Torvalds wrote: > And yes, those exist. See at least 'adfs_dir_ops', and > 'ceph_dir_fops'. They may be broken, but people actually did do things > like that historically, maybe there's a reason adfs and ceph allow it. Huh? adfs is a red herring - there is a method called 'read' in struct adfs_dir_ops, but it has nothing to do with read(2) or anything of that sort; it is *used* by adfs_iterate() (via adfs_read_inode()), but the only file_operations that sucker has is const struct file_operations adfs_dir_operations = { .read = generic_read_dir, .llseek = generic_file_llseek, .iterate_shared = adfs_iterate, .fsync = generic_file_fsync, }; and generic_read_dir() is "fail with -EISDIR". ceph one is real, and it's... not nice. They do have a genuine ->read() on directories, which acts as if it had been a regular file contents generated by sprintf() (at the first read(2)). Format: "entries: %20lld\n" " files: %20lld\n" " subdirs: %20lld\n" "rentries: %20lld\n" " rfiles: %20lld\n" " rsubdirs: %20lld\n" "rbytes: %20lld\n" "rctime: %10lld.%09ld\n", and yes, it does go stale. Just close and reopen... File position is an offset in that string. Shared with position used by getdents()... It gets better - if you open a directory, then fork and have both child and parent read from it, well... if (!dfi->dir_info) { dfi->dir_info = kmalloc(bufsize, GFP_KERNEL); if (!dfi->dir_info) return -ENOMEM; No locking whatsoever. No priveleges needed either... Frankly, my preference would be to remove that kludge, but short of that we would need at least some exclusion for those allocations and if we do that, we might just as well have ceph_readdir() fail if we'd ever done ceph_read_dir() on the same struct file and vice versa. They are really not mixible. As far as I can tell, ceph is the only place in mainline where we have ->iterate_shared along with non-failing ->read. Nothing mixes it with ->read_iter, ->write or ->write_iter. lseek() is there, of course, and that's enough to kill the "special fdget variant for directory syscalls" kind of approach. But read() and write() on directories are very much not something people legitimately do.