On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote: > > > 在 2024/11/11 22:39, Chuck Lever III 写道: > > > > > > > On Nov 10, 2024, at 9:36 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > > I'm in the cc list ,so I assume you saw my set, then I don't know why > > > you're ignoring my concerns. > > > 1) next_offset is 32-bit and can overflow in a long-time running > > > machine. > > > 2) Once next_offset overflows, readdir will skip the files that offset > > > is bigger. > > I'm sorry, I'm a little busy these days, so I haven't responded to this > series of emails. > > > In that case, that entry won't be visible via getdents(3) > > until the directory is re-opened or the process does an > > lseek(fd, 0, SEEK_SET). > > Yes. > > > > > That is the proper and expected behavior. I suspect you > > will see exactly that behavior with ext4 and 32-bit > > directory offsets, for example. > > Emm... > > For this case like this: > > 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2 > 2. open /tmp/dir with fd1 > 3. readdir and get /tmp/dir/file1 > 4. rm /tmp/dir/file2 > 5. touch /tmp/dir/file2 > 4. loop 4~5 for 2^32 times > 5. readdir /tmp/dir with fd1 > > For tmpfs now, we may see no /tmp/dir/file2, since the offset has been > overflow, for ext4 it is ok... So we think this will be a problem. I constructed a simple test program using the above steps: /* * 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2 * 2. open /tmp/dir with fd1 * 3. readdir and get /tmp/dir/file1 * 4. rm /tmp/dir/file2 * 5. touch /tmp/dir/file2 * 6. loop 4~5 for 2^32 times * 7. readdir /tmp/dir with fd1 */ #include <sys/types.h> #include <sys/stat.h> #include <dirent.h> #include <errno.h> #include <fcntl.h> #include <unistd.h> #include <stdbool.h> #include <stdio.h> #include <string.h> static void list_directory(DIR *dirp) { struct dirent *de; errno = 0; do { de = readdir(dirp); if (!de) break; printf("d_off: %lld\n", de->d_off); printf("d_name: %s\n", de->d_name); } while (true); if (errno) perror("readdir"); else printf("EOD\n"); } int main(int argc, char **argv) { unsigned long i; DIR *dirp; int ret; /* 1. */ ret = mkdir("/tmp/dir", 0755); if (ret < 0) { perror("mkdir"); return 1; } ret = creat("/tmp/dir/file1", 0644); if (ret < 0) { perror("creat"); return 1; } close(ret); ret = creat("/tmp/dir/file2", 0644); if (ret < 0) { perror("creat"); return 1; } close(ret); /* 2. */ errno = 0; dirp = opendir("/tmp/dir"); if (!dirp) { if (errno) perror("opendir"); else fprintf(stderr, "EOD\n"); closedir(dirp); return 1; } /* 3. */ errno = 0; do { struct dirent *de; de = readdir(dirp); if (!de) { if (errno) { perror("readdir"); closedir(dirp); return 1; } break; } if (strcmp(de->d_name, "file1") == 0) { printf("Found 'file1'\n"); break; } } while (true); /* run the test. */ for (i = 0; i < 10000; i++) { /* 4. */ ret = unlink("/tmp/dir/file2"); if (ret < 0) { perror("unlink"); closedir(dirp); return 1; } /* 5. */ ret = creat("/tmp/dir/file2", 0644); if (ret < 0) { perror("creat"); fprintf(stderr, "i = %lu\n", i); closedir(dirp); return 1; } close(ret); } /* 7. */ printf("\ndirectory after test:\n"); list_directory(dirp); /* cel. */ rewinddir(dirp); printf("\ndirectory after rewind:\n"); list_directory(dirp); closedir(dirp); return 0; } > > Does that not directly address your concern? Or do you > > mean that Erkun's patch introduces a new issue? > > Yes, to be honest, my personal feeling is a problem. But for 64bit, it may > never been trigger. I ran the test program above on this kernel: https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-testing Note that it has a patch to restrict the range of directory offset values for tmpfs to 2..4096. I did not observe any unexpected behavior after the offset values wrapped. At step 7, I can always see file2, and its offset is always 4. At step "cel" I can see all expected directory entries. I tested on v6.12-rc7 with the same range restriction but using Maple tree and 64-bit offsets. No unexpected behavior there either. So either we're still missing something, or there is no problem. My only theory is maybe it's an issue with an implicit integer sign conversion, and we should restrict the offset range to 2..S32_MAX. I can try testing with a range of (U32_MAX - 4096)..(U32_MAX). > > If there is a problem here, please construct a reproducer > > against this patch set and post it. Invitation still stands: if you have a solid reproducer, please post it. -- Chuck Lever