Hi Ted, On Mon, Dec 23, 2019 at 02:53:01PM -0500, Theodore Y. Ts'o wrote: > On Mon, Dec 23, 2019 at 07:45:57PM +0100, Andrea Vai wrote: > > basically, it's: > > > > mount UUID=$uuid /mnt/pendrive > > SECONDS=0 > > cp $testfile /mnt/pendrive > > umount /mnt/pendrive > > tempo=$SECONDS > > > > and it copies one file only. Anyway, you can find the whole script > > attached. > > OK, so whether we are doing the writeback at the end of cp, or when > you do the umount, it's probably not going to make any difference. We > can get rid of the stack trace in question by changing the script to > be basically: > > mount UUID=$uuid /mnt/pendrive > SECONDS=0 > rm -f /mnt/pendrive/$testfile > cp $testfile /mnt/pendrive > umount /mnt/pendrive > tempo=$SECONDS > > I predict if you do that, you'll see that all of the time is spent in > the umount, when we are trying to write back the file. > > I really don't think then this is a file system problem at all. It's > just that USB I/O is slow, for whatever reason. We'll see a stack > trace in the writeback code waiting for the I/O to be completed, but > that doesn't mean that the root cause is in the writeback code or in > the file system which is triggering the writeback. Wrt. the slow write on this usb storage, it is caused by two writeback path, one is the writeback wq, another is from ext4_release_file() which is triggered from exit_to_usermode_loop(). When the two write path is run concurrently, the sequential write order is broken, then write performance drops much on this particular usb storage. The ext4_release_file() should be run from read() or write() syscall if Fedora 30's 'cp' is implemented correctly. IMO, it isn't expected behavior for ext4_release_file() to be run thousands of times when just running 'cp' once, see comment of ext4_release_file(): /* * Called when an inode is released. Note that this is different * from ext4_file_open: open gets called at every open, but release * gets called only when /all/ the files are closed. */ static int ext4_release_file(struct inode *inode, struct file *filp) > > I suspect the next step is use a blktrace, to see what kind of I/O is > being sent to the USB drive, and how long it takes for the I/O to > complete. You might also try to capture the output of "iostat -x 1" > while the script is running, and see what the difference might be > between a kernel version that has the problem and one that doesn't, > and see if that gives us a clue. That isn't necessary, given we have concluded that the bad write performance is caused by broken write order. > > > > And then send me > > btw, please tell me if "me" means only you or I cc: all the > > recipients, as usual > > Well, I don't think we know what the root cause is. Ming is focusing > on that stack trace, but I think it's a red herring..... And if it's > not a file system problem, then other people will be best suited to > debug the issue. So far, the reason points to the extra writeback path from exit_to_usermode_loop(). If it is not from close() syscall, the issue should be related with file reference count. If it is from close() syscall, the issue might be in 'cp''s implementation. Andrea, please collect the following log or the strace log requested by Ted, then we can confirm if the extra writeback is from close() or read/write() syscall: # pass PID of 'cp' to this script #!/bin/sh PID=$1 /usr/share/bcc/tools/trace -P $PID -t -C \ 't:block:block_rq_insert "%s %d %d", args->rwbs, args->sector, args->nr_sector' \ 't:syscalls:sys_exit_close ' \ 't:syscalls:sys_exit_read ' \ 't:syscalls:sys_exit_write ' Thanks, Ming