On Tue, Dec 10, 2019 at 09:41:37PM -0500, Theodore Y. Ts'o wrote:
> On Tue, Dec 10, 2019 at 04:05:50PM +0800, Ming Lei wrote:
> > > > The path[2] is expected behaviour. Not sure path [1] is correct,
> > > > given
> > > > ext4_release_file() is supposed to be called when this inode is
> > > > released. That means the file is closed 4358 times during 1GB file
> > > > copying to usb storage.
> > > >
> > > > [1] insert requests when returning to user mode from syscall
> > > >
> > > >   b'blk_mq_sched_request_inserted'
> > > >   b'blk_mq_sched_request_inserted'
> > > >   b'dd_insert_requests'
> > > >   b'blk_mq_sched_insert_requests'
> > > >   b'blk_mq_flush_plug_list'
> > > >   b'blk_flush_plug_list'
> > > >   b'io_schedule_prepare'
> > > >   b'io_schedule'
> > > >   b'rq_qos_wait'
> > > >   b'wbt_wait'
> > > >   b'__rq_qos_throttle'
> > > >   b'blk_mq_make_request'
> > > >   b'generic_make_request'
> > > >   b'submit_bio'
> > > >   b'ext4_io_submit'
> > > >   b'ext4_writepages'
> > > >   b'do_writepages'
> > > >   b'__filemap_fdatawrite_range'
> > > >   b'ext4_release_file'
> > > >   b'__fput'
> > > >   b'task_work_run'
> > > >   b'exit_to_usermode_loop'
> > > >   b'do_syscall_64'
> > > >   b'entry_SYSCALL_64_after_hwframe'
> > > >   4358
>
> I'm guessing that your workload is repeatedly truncating a file (or
> calling open with O_TRUNC) and then writing data to it. When you do
> this, then when the file is closed, we assume that since you were
> replacing the previous contents of a file with new contents, that you
> would be unhappy if the file contents was replaced by a zero length
> file after a crash. That's because ten years, ago there were a *huge*
> number of crappy applications that would replace a file by reading it
> into memory, truncating it, and then write out the new contents of the
> file. This could be a high score file for a game, or a KDE or GNOME
> state file, etc.
>
> So if someone does open, truncate, write, close, we still immediately
> writing out the data on the close, assuming that the programmer really
> wanted open, truncate, write, fsync, close, but was too careless to
> actually do the right thing.
>
> Some workaround[1] like this is done by all of the major file systems,
> and was fallout the agreement from the "O_PONIES"[2] controversy.
> This was discussed and agreed to at the 2009 LSF/MM workshop. (See
> the "rename, fsync, and ponies" section.)
>
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
> [2] https://blahg.josefsipek.net/?p=364
> [3] https://lwn.net/Articles/327601/
>
> So if you're seeing a call to filemap_fdatawrite_range as the result
> of a fput, that's why.
>
> In any case, this behavior has been around for a decade, and it
> appears to be incidental to your performance difficulties with your
> USB thumbdrive and block-mq.

I didn't reproduce the issue in my test environment; the following are
Andrea's test commands[1]:

	mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile
	SECONDS=0
	cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
	umount /mnt/pendrive 2>&1 |tee -a $logfile

The 'cp' command is supposed to open/close the file just once; however,
ext4_release_file() and the page writeback are observed to run 4358
times when executing the above 'cp' test.

[1] https://marc.info/?l=linux-kernel&m=157486689806734&w=2

Thanks,
Ming
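
For reference, the two syscall sequences described in the quoted
explanation (open, truncate, write, close versus open, truncate, write,
fsync, close) can be sketched from user space as below. This is only an
illustration: the function names are hypothetical, and the
flush-on-close heuristic lives in the kernel, so nothing here shows it
directly. The first function relies on the file system to start
writeback at close time; the second asks for it explicitly.

```python
import os
import tempfile


def _write_all(fd: int, data: bytes) -> None:
    """Loop until every byte is written; os.write() may write less."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def rewrite_without_fsync(path: str, data: bytes) -> None:
    """open(O_TRUNC), write, close: durability is left to the kernel."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)


def rewrite_with_fsync(path: str, data: bytes) -> None:
    """open(O_TRUNC), write, fsync, close: data is flushed before close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # explicit flush of the file's data to storage
    finally:
        os.close(fd)


def demo() -> tuple:
    """Replace an existing file with each variant and read it back."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for rewrite in (rewrite_without_fsync, rewrite_with_fsync):
            path = os.path.join(tmp, rewrite.__name__)
            with open(path, "wb") as f:
                f.write(b"old contents, longer than the replacement")
            rewrite(path, b"new contents")
            with open(path, "rb") as f:
                results.append(f.read())
    return tuple(results)


if __name__ == "__main__":
    print(demo())
```

Both variants leave the same file contents in the normal case; they
differ only in what is guaranteed to be on disk if the machine crashes
shortly after close().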