On 2016-08-09 05:37, Darrick J. Wong wrote: > On Tue, Aug 09, 2016 at 12:13:01AM +0300, Török Edwin wrote: >> On 2016-08-08 19:55, Darrick J. Wong wrote: >>> On Mon, Aug 08, 2016 at 12:08:18PM -0400, Theodore Ts'o wrote: >>>> On Sun, Aug 07, 2016 at 11:28:10PM -0700, Darrick J. Wong wrote: >>>>> >>>>> I have one lingering concern -- is it a bug that two processes could be >>>>> computing the checksum of a buffer simultaneously? I would have thought ext4 >>>>> would serialize that kind of buffer_head access... >>>> >>>> Do we know how this is happening? We've always depended on the VFS to >>>> provide this exclusion. The only way we should be modifying the >>>> buffer_head at the same time if two CPU's are trying to modify the >>>> directory at the same time, and that should _never_ be happening, even >>>> with the new directory parallism code, unless the file system has >>>> given permission and intends to do its own fine-grained locking. >>> >>> It's a combination of two things, I think. The first is that the >>> checksum calculation routine (temporarily) set the checksum field to >>> zero during the computation, which of course is a no-no. The patch >>> fixes that problem and should go in. >> >> Thanks a lot for the patch. >> I wrote a small testcase (see below) that triggers the problem quite soon on >> my box with kernel 4.7.0, and seems to have survived so far with kernel >> 4.7.0+patch. >> When it failed it printed something like "readdir: Bad message". >> >> The drop caches part is quite important for triggering the bug, and might >> explain why this bug was hard to reproduce: IIUC this race condition can >> happen only if 2+ threads/processes try to access the same directory, and the >> directory's inode is not in the cache (i.e. was never cached, or got kicked >> out of the cache). > > Could you formulate this into an xfstest, please? It would be very useful to > have this as a regression test. > > (Or attach a Signed-off-by and I'll take care of it eventually.) I've attached a signed-off-by line and a copyright header (feel free to add yourself in the copyright header too): Signed-off-by: Török Edwin <edwin@xxxxxxxxxx> >> /* >> $ gcc trigger.c -o trigger -pthread >> $ ./trigger >> */ /* * Test concurrent readdir on uncached inodes * * Copyright (C) 2016 Skylable Ltd. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ >> >> #include <stdio.h> >> #include <sys/types.h> >> #include <sys/stat.h> >> #include <dirent.h> >> #include <string.h> >> #include <stdlib.h> >> #include <errno.h> >> #include <pthread.h> >> #include <unistd.h> >> #include <fcntl.h> >> >> #define FILES 100000 >> #define THREADS 16 >> #define LOOPS 1000 >> >> static void die(const char *msg) >> { >> perror(msg); >> exit(EXIT_FAILURE); >> } >> >> static void* list(void* arg) >> { >> for(int i=0;i<LOOPS;i++) { >> DIR *d = opendir("."); >> if (!d) { >> die("opendir"); >> } >> errno = 0; >> while(readdir(d)) {} >> if (errno) { >> die("readdir"); >> } >> closedir(d); >> FILE *f = fopen("/proc/sys/vm/drop_caches", "w"); >> if (f) { >> fputs("3", f); >> fclose(f); >> } >> } >> return NULL; >> } >> >> int main() >> { >> pthread_t t[THREADS]; >> >> if(mkdir("ext4test", 0755) < 0 && errno != EEXIST) >> die("mkdir"); >> if(chdir("ext4test") < 0) >> die("chdir"); >> for (unsigned i=0;i < FILES;i++) { >> char name[16]; >> snprintf(name, sizeof(name), "%d", i); >> int fd = open(name, O_WRONLY|O_CREAT, 0600); >> if (fd < 0) >> die("open"); >> close(fd); >> } >> for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) { >> pthread_create(&t[i], NULL,list, NULL); >> } >> for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) { >> pthread_join(t[i], NULL); >> } >> return 0; >> } >> >> >> >> -- >> Edwin Török | Co-founder and Lead Developer >> >> Skylable open-source object storage: reliable, fast, secure >> http://www.skylable.com > -- Edwin Török | Co-founder and Lead Developer Skylable open-source object storage: reliable, fast, secure http://www.skylable.com -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html