Il giorno lun, 23/12/2019 alle 21.08 +0800, Ming Lei ha scritto: > On Mon, Dec 23, 2019 at 12:22:45PM +0100, Andrea Vai wrote: > > Il giorno mer, 18/12/2019 alle 17.48 +0800, Ming Lei ha scritto: > > > On Wed, Dec 18, 2019 at 09:25:02AM +0100, Andrea Vai wrote: > > > > Il giorno gio, 12/12/2019 alle 05.33 +0800, Ming Lei ha > scritto: > > > > > On Wed, Dec 11, 2019 at 11:07:45AM -0500, Theodore Y. Ts'o > > > wrote: > > > > > > On Wed, Dec 11, 2019 at 12:00:58PM +0800, Ming Lei wrote: > > > > > > > I didn't reproduce the issue in my test environment, and > > > follows > > > > > > > Andrea's test commands[1]: > > > > > > > > > > > > > > mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile > > > > > > > SECONDS=0 > > > > > > > cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile > > > > > > > umount /mnt/pendrive 2>&1 |tee -a $logfile > > > > > > > > > > > > > > The 'cp' command supposes to open/close the file just > once, > > > > > however > > > > > > > ext4_release_file() & write pages is observed to run for > > > 4358 > > > > > times > > > > > > > when executing the above 'cp' test. > > > > > > > > > > > > Why are we sure the ext4_release_file() / _fput() is > coming > > > from > > > > > the > > > > > > cp command, as opposed to something else that might be > running > > > on > > > > > the > > > > > > system under test? _fput() is called by the kernel when > the > > > last > > > > > > > > > > Please see the log: > > > > > > > > > > > > > > https://lore.kernel.org/linux-scsi/3af3666920e7d46f8f0c6d88612f143ffabc743c.camel@xxxxxxxx/2-log_ming.zip > > > > > > > > > > Which is collected by: > > > > > > > > > > #!/bin/sh > > > > > MAJ=$1 > > > > > MIN=$2 > > > > > MAJ=$(( $MAJ << 20 )) > > > > > DEV=$(( $MAJ | $MIN )) > > > > > > > > > > /usr/share/bcc/tools/trace -t -C \ > > > > > 't:block:block_rq_issue (args->dev == '$DEV') "%s %d > %d", > > > args- > > > > > >rwbs, args->sector, args->nr_sector' \ > > > > > 't:block:block_rq_insert (args->dev == '$DEV') "%s %d > %d", > > > args- > > > > > >rwbs, args->sector, args->nr_sector' > > > > > > > > > > $MAJ:$MIN points to the USB storage disk. > > > > > > > > > > From the above IO trace, there are two write paths, one is > from > > > cp, > > > > > another is from writeback wq. > > > > > > > > > > The stackcount trace[1] is consistent with the IO trace log > > > since it > > > > > only shows two IO paths, that is why I concluded that the > write > > > done > > > > > via > > > > > ext4_release_file() is from 'cp'. > > > > > > > > > > [1] > > > > > > > > > https://lore.kernel.org/linux-scsi/320b315b9c87543d4fb919ecbdf841596c8fbcea.camel@xxxxxxxx/2-log_ming_20191129_150609.zip > > > > > > > > > > > reference to a struct file is released. (Specifically, if > you > > > > > have a > > > > > > fd which is dup'ed, it's only when the last fd > corresponding > > > to > > > > > the > > > > > > struct file is closed, and the struct file is about to be > > > > > released, > > > > > > does the file system's f_ops->release function get > called.) > > > > > > > > > > > > So the first question I'd ask is whether there is anything > > > else > > > > > going > > > > > > on the system, and whether the writes are happening to the > USB > > > > > thumb > > > > > > drive, or to some other storage device. And if there is > > > something > > > > > > else which is writing to the pendrive, maybe that's why no > one > > > > > else > > > > > > has been able to reproduce the OP's complaint.... > > > > > > > > > > OK, we can ask Andrea to confirm that via the following > trace, > > > which > > > > > will add pid/comm info in the stack trace: > > > > > > > > > > /usr/share/bcc/tools/stackcount > blk_mq_sched_request_inserted > > > > > > > > > > Andrew, could you collect the above log again when running > > > new/bad > > > > > kernel for confirming if the write done by > ext4_release_file() > > > is > > > > > from > > > > > the 'cp' process? > > > > > > > > You can find the stackcount log attached. It has been produced > by: > > > > > > > > - /usr/share/bcc/tools/stackcount > blk_mq_sched_request_inserted > > > > trace.log > > > > - wait some seconds > > > > - run the test (1 copy trial), wait for the test to finish, > wait > > > some seconds > > > > - stop the trace (ctrl+C) > > > > > > Thanks for collecting the log, looks your 'stackcount' doesn't > > > include > > > comm/pid info, seems there is difference between your bcc and > > > my bcc in fedora 30. > > > > > > Could you collect above log again via the following command? > > > > > > /usr/share/bcc/tools/stackcount -P -K t:block:block_rq_insert > > > > > > which will show the comm/pid info. > > > > ok, attached (trace_20191219.txt), the test (1 trial) took 3684 > > seconds. > > From the above trace: > > b'blk_mq_sched_request_inserted' > b'blk_mq_sched_request_inserted' > b'dd_insert_requests' > b'blk_mq_sched_insert_requests' > b'blk_mq_flush_plug_list' > b'blk_flush_plug_list' > b'io_schedule_prepare' > b'io_schedule' > b'rq_qos_wait' > b'wbt_wait' > b'__rq_qos_throttle' > b'blk_mq_make_request' > b'generic_make_request' > b'submit_bio' > b'ext4_io_submit' > b'ext4_writepages' > b'do_writepages' > b'__filemap_fdatawrite_range' > b'ext4_release_file' > b'__fput' > b'task_work_run' > b'exit_to_usermode_loop' > b'do_syscall_64' > b'entry_SYSCALL_64_after_hwframe' > b'cp' [19863] > 4400 > > So this write is clearly from 'cp' process, and it should be one > ext4 fs issue. > > Ted, can you take a look at this issue? > > > > > > > I also tried the usual test with btrfs and xfs. Btrfs behavior > > > looks > > > > "good". xfs seems sometimes better, sometimes worse, I would > say. > > > I > > > > don't know if it matters, anyway you can also find the results > of > > > the > > > > two tests (100 trials each). Basically, btrfs is always > between 68 > > > and > > > > 89 seconds, with a cyclicity (?) with "period=2 trials". xfs > looks > > > > almost always very good (63-65s), but sometimes "bad" (>300s). > > > > > > If you are interested in digging into this one, the following > trace > > > should be helpful: > > > > > > > https://lore.kernel.org/linux-scsi/f38db337cf26390f7c7488a0bc2076633737775b.camel@xxxxxxxx/T/#m5aa008626e07913172ad40e1eb8e5f2ffd560fc6 > > > > > > > Attached: > > - trace_xfs_20191223.txt (7 trials, then aborted while doing the > 8th), > > times to complete: > > 64s > > 63s > > 64s > > 833s > > 1105s > > 63s > > 64s > > oops, looks we have to collect io insert trace with the following > bcc script > on xfs for confirming if there is similar issue with ext4, could you > run > it again on xfs? And only post the trace done in case of slow 'cp'. > > > #!/bin/sh > > MAJ=$1 > MIN=$2 > MAJ=$(( $MAJ << 20 )) > DEV=$(( $MAJ | $MIN )) > > /usr/share/bcc/tools/trace -t -C \ > 't:block:block_rq_issue (args->dev == '$DEV') "%s %d %d", args- > >rwbs, args->sector, args->nr_sector' \ > 't:block:block_rq_insert (args->dev == '$DEV') "%s %d %d", args- > >rwbs, args->sector, args->nr_sector' > > here it is (1 trial, 313 seconds to finish) Thanks, Andrea
Attachment:
trace_20191223_xfs_new.zip
Description: Zip archive