On Thu, 6 Nov 2008 21:21:12 +0100 Alain Knaff <alain@xxxxxxxx> wrote: > This patch fixes a race condition in lseek. While it is expected that > unpredictable behaviour may result while repositioning the offset of a > file descriptor concurrently with reading/writing to the same file > descriptor, this should not happen when merely *reading* the file > descriptor's offset. > > Unfortunately, the only portable way in Unix to read a file > descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this > concurrently with read/write may mess up the position, as shown by the > testcase below: > > #include <sys/types.h> > #include <stdio.h> > #include <pthread.h> > #include <unistd.h> > #include <errno.h> > #include <string.h> > #include <pthread.h> > > > void *loop(void *ptr) > { > fprintf(stderr, "Starting seek thread\n"); > while(1) { > if(lseek(0, 0LL, SEEK_CUR) < 0LL) > perror("seek"); > } > } > > int main(int argc, char **argv) { > long long l=0; > int r; > char buf[4096]; > > pthread_t thread; > pthread_create(&thread, 0, loop, 0); > > for(r=0; 1 ; r++) { > int n = read(0, buf, 4096); > if(n == 0) > break; > if(n < 4096) { > fprintf(stderr, "Short read %d %s\n", n, strerror(errno)); > } > l+= n; > } > fprintf(stderr, "Read %lld bytes\n", l); > > return 0; > } > > Compile this and run it on a multi-processor machine as > ./a.out <bigFile > > where bigFile is a 1 Gigabyte file. It should print 1073741824. > However, on a buggy kernel, it usually produces a bigger number. The > problem only happens on a multiprocessor machine. This is because an > lseek(fd, 0, SEEK_CUR) running concurrently with a read() or write() > will reset the position back to what it used to be when the read() > started. > > This behavior was observed "in the wild" when using udpcast which uses > lseek to monitor progress of reading/writing the uncompressed data. > > The patch below fixes the issue by "special-casing" the lseek(fd, 0, > SEEK_CUR) pattern. > > Apparently, an attempt was already made to fix the issue by the > following code: > > if (offset != file->f_pos) { > file->f_pos = offset; > file->f_version = 0; > } > > However, this doesn't work if file->f_pos was changed (by read() or > write()) between the time offset was computed, and the time where it > considers writing it back. > > Signed-off-by: Alain Knaff <alain@xxxxxxxx> > > --- > > diff -pur kernel.orig/fs/read_write.c kernel/fs/read_write.c > --- kernel.orig/fs/read_write.c 2008-10-11 14:12:07.000000000 +0200 > +++ kernel/fs/read_write.c 2008-11-06 19:55:59.000000000 +0100 > @@ -42,6 +42,8 @@ generic_file_llseek_unlocked(struct file > offset += inode->i_size; > break; > case SEEK_CUR: > + if(offset == 0) > + return file->f_pos; > offset += file->f_pos; > } > retval = -EINVAL; OK, I think that a concurrent lseek(fd, 0, SEEK_CUR) is a sufficiently sane operation that this is worth doing. As you point out, there is no other way of userspace doing what is effectively a read-only operation - userspace would be entitled to wonder "ytf did the kernel rewrite the file offset for that?". Do the below additions look OK? From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> - fix coding-style - fix default_llseek() as well - add comments Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> Cc: Alain Knaff <alain@xxxxxxxx> Cc: Christoph Hellwig <hch@xxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- fs/read_write.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff -puN fs/read_write.c~vfs-lseekfd-0-seek_cur-race-condition-fix fs/read_write.c --- a/fs/read_write.c~vfs-lseekfd-0-seek_cur-race-condition-fix +++ a/fs/read_write.c @@ -50,6 +50,14 @@ generic_file_llseek_unlocked(struct file offset += inode->i_size; break; case SEEK_CUR: + /* + * Here we special-case the lseek(fd, 0, SEEK_CUR) + * position-querying operation. Avoid rewriting the "same" + * f_pos value back to the file because a concurrent read(), + * write() or lseek() might have altered it + */ + if (offset == 0) + return file->f_pos; offset += file->f_pos; break; } @@ -105,8 +113,14 @@ loff_t default_llseek(struct file *file, offset += i_size_read(file->f_path.dentry->d_inode); break; case SEEK_CUR: - if(offset == 0) - return file->f_pos; + /* + * See SEEK_CUR description in + * generic_file_llseek_unlocked() + */ + if (offset == 0) { + retval = file->f_pos; + goto out; + } offset += file->f_pos; } retval = -EINVAL; @@ -117,6 +131,7 @@ loff_t default_llseek(struct file *file, } retval = offset; } +out: unlock_kernel(); return retval; } _ -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html