Hi,
在 2024/11/13 23:17, Chuck Lever 写道:
On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote:
在 2024/11/11 22:39, Chuck Lever III 写道:
On Nov 10, 2024, at 9:36 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
I'm in the cc list ,so I assume you saw my set, then I don't know why
you're ignoring my concerns.
1) next_offset is 32-bit and can overflow in a long-time running
machine.
2) Once next_offset overflows, readdir will skip the files that offset
is bigger.
I'm sorry, I'm a little busy these days, so I haven't responded to this
series of emails.
In that case, that entry won't be visible via getdents(3)
until the directory is re-opened or the process does an
lseek(fd, 0, SEEK_SET).
Yes.
That is the proper and expected behavior. I suspect you
will see exactly that behavior with ext4 and 32-bit
directory offsets, for example.
Emm...
For this case like this:
1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
4. loop 4~5 for 2^32 times
5. readdir /tmp/dir with fd1
For tmpfs now, we may see no /tmp/dir/file2, since the offset has been
overflow, for ext4 it is ok... So we think this will be a problem.
I constructed a simple test program using the above steps:
/*
* 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
* 2. open /tmp/dir with fd1
* 3. readdir and get /tmp/dir/file1
* 4. rm /tmp/dir/file2
* 5. touch /tmp/dir/file2
* 6. loop 4~5 for 2^32 times
* 7. readdir /tmp/dir with fd1
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
static void list_directory(DIR *dirp)
{
struct dirent *de;
errno = 0;
do {
de = readdir(dirp);
if (!de)
break;
printf("d_off: %lld\n", de->d_off);
printf("d_name: %s\n", de->d_name);
} while (true);
if (errno)
perror("readdir");
else
printf("EOD\n");
}
int main(int argc, char **argv)
{
unsigned long i;
DIR *dirp;
int ret;
/* 1. */
ret = mkdir("/tmp/dir", 0755);
if (ret < 0) {
perror("mkdir");
return 1;
}
ret = creat("/tmp/dir/file1", 0644);
if (ret < 0) {
perror("creat");
return 1;
}
close(ret);
ret = creat("/tmp/dir/file2", 0644);
if (ret < 0) {
perror("creat");
return 1;
}
close(ret);
/* 2. */
errno = 0;
dirp = opendir("/tmp/dir");
if (!dirp) {
if (errno)
perror("opendir");
else
fprintf(stderr, "EOD\n");
closedir(dirp);
return 1;
}
/* 3. */
errno = 0;
do {
struct dirent *de;
de = readdir(dirp);
if (!de) {
if (errno) {
perror("readdir");
closedir(dirp);
return 1;
}
break;
}
if (strcmp(de->d_name, "file1") == 0) {
printf("Found 'file1'\n");
break;
}
} while (true);
/* run the test. */
for (i = 0; i < 10000; i++) {
/* 4. */
ret = unlink("/tmp/dir/file2");
if (ret < 0) {
perror("unlink");
closedir(dirp);
return 1;
}
/* 5. */
ret = creat("/tmp/dir/file2", 0644);
if (ret < 0) {
perror("creat");
fprintf(stderr, "i = %lu\n", i);
closedir(dirp);
return 1;
}
close(ret);
}
/* 7. */
printf("\ndirectory after test:\n");
list_directory(dirp);
/* cel. */
rewinddir(dirp);
printf("\ndirectory after rewind:\n");
list_directory(dirp);
closedir(dirp);
return 0;
}
Does that not directly address your concern? Or do you
mean that Erkun's patch introduces a new issue?
Yes, to be honest, my personal feeling is a problem. But for 64bit, it may
never been trigger.
I ran the test program above on this kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-testing
Note that it has a patch to restrict the range of directory offset
values for tmpfs to 2..4096.
I did not observe any unexpected behavior after the offset values
wrapped. At step 7, I can always see file2, and its offset is always
4. At step "cel" I can see all expected directory entries.
Then, do you investigate more or not?
I tested on v6.12-rc7 with the same range restriction but using
Maple tree and 64-bit offsets. No unexpected behavior there either.
So either we're still missing something, or there is no problem. My
only theory is maybe it's an issue with an implicit integer sign
conversion, and we should restrict the offset range to 2..S32_MAX.
I can try testing with a range of (U32_MAX - 4096)..(U32_MAX).
You can try the following reproducer, it's much simpler. First, apply
following patch(on latest kernel):
diff --git a/fs/libfs.c b/fs/libfs.c
index a168ece5cc61..7c1a5982a0c8 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -291,7 +291,7 @@ int simple_offset_add(struct offset_ctx *octx,
struct dentry *dentry)
return -EBUSY;
ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry,
DIR_OFFSET_MIN,
- LONG_MAX, &octx->next_offset, GFP_KERNEL);
+ 256, &octx->next_offset, GFP_KERNEL);
if (ret < 0)
return ret;
Then, create a new tmpfs dir, inside the dir:
[root@fedora test-libfs]# for ((i=0; i<256; ++i)); do touch $i; done
touch: cannot touch '255': Device or resource busy
[root@fedora test-libfs]# ls
0 103 109 114 12 125 130 136 141 147 152 158 163 169
174 18 185 190 196 200 206 211 217 222 228 233 239 244 25
26 31 37 42 48 53 59 64 7 75 80 86 91 97
1 104 11 115 120 126 131 137 142 148 153 159 164 17
175 180 186 191 197 201 207 212 218 223 229 234 24 245
250 27 32 38 43 49 54 6 65 70 76 81 87 92 98
10 105 110 116 121 127 132 138 143 149 154 16 165 170
176 181 187 192 198 202 208 213 219 224 23 235 240 246
251 28 33 39 44 5 55 60 66 71 77 82 88 93 99
100 106 111 117 122 128 133 139 144 15 155 160 166 171
177 182 188 193 199 203 209 214 22 225 230 236 241 247
252 29 34 4 45 50 56 61 67 72 78 83 89 94
101 107 112 118 123 129 134 14 145 150 156 161 167 172
178 183 189 194 2 204 21 215 220 226 231 237 242 248
253 3 35 40 46 51 57 62 68 73 79 84 9 95
102 108 113 119 124 13 135 140 146 151 157 162 168 173
179 184 19 195 20 205 210 216 221 227 232 238 243 249
254 30 36 41 47 52 58 63 69 74 8 85 90 96
[root@fedora test-libfs]# rm -f 0
[root@fedora test-libfs]# touch 255
[root@fedora test-libfs]# ls
255
[root@fedora test-libfs]#
I don't think I have to explain why the second ls can only show the file
255...
Thanks,
Kuai
If there is a problem here, please construct a reproducer
against this patch set and post it.
Invitation still stands: if you have a solid reproducer, please post
it.