Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

Andrea Vai <andrea.vai@xxxxxxxx> · Mon, 23 Dec 2019 15:02:35 +0100



Il giorno lun, 23/12/2019 alle 21.08 +0800, Ming Lei ha scritto:
> On Mon, Dec 23, 2019 at 12:22:45PM +0100, Andrea Vai wrote:
> > Il giorno mer, 18/12/2019 alle 17.48 +0800, Ming Lei ha scritto:
> > > On Wed, Dec 18, 2019 at 09:25:02AM +0100, Andrea Vai wrote:
> > > > Il giorno gio, 12/12/2019 alle 05.33 +0800, Ming Lei ha
> scritto:
> > > > > On Wed, Dec 11, 2019 at 11:07:45AM -0500, Theodore Y. Ts'o
> > > wrote:
> > > > > > On Wed, Dec 11, 2019 at 12:00:58PM +0800, Ming Lei wrote:
> > > > > > > I didn't reproduce the issue in my test environment, and
> > > follows
> > > > > > > Andrea's test commands[1]:
> > > > > > > 
> > > > > > >   mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile
> > > > > > >   SECONDS=0
> > > > > > >   cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
> > > > > > >   umount /mnt/pendrive 2>&1 |tee -a $logfile
> > > > > > > 
> > > > > > > The 'cp' command supposes to open/close the file just
> once,
> > > > > however
> > > > > > > ext4_release_file() & write pages is observed to run for
> > > 4358
> > > > > times
> > > > > > > when executing the above 'cp' test.
> > > > > > 
> > > > > > Why are we sure the ext4_release_file() / _fput() is
> coming
> > > from
> > > > > the
> > > > > > cp command, as opposed to something else that might be
> running
> > > on
> > > > > the
> > > > > > system under test?  _fput() is called by the kernel when
> the
> > > last
> > > > > 
> > > > > Please see the log:
> > > > > 
> > > > > 
> > > 
> https://lore.kernel.org/linux-scsi/3af3666920e7d46f8f0c6d88612f143ffabc743c.camel@xxxxxxxx/2-log_ming.zip
> > > > > 
> > > > > Which is collected by:
> > > > > 
> > > > > #!/bin/sh
> > > > > MAJ=$1
> > > > > MIN=$2
> > > > > MAJ=$(( $MAJ << 20 ))
> > > > > DEV=$(( $MAJ | $MIN ))
> > > > > 
> > > > > /usr/share/bcc/tools/trace -t -C \
> > > > >     't:block:block_rq_issue (args->dev == '$DEV') "%s %d
> %d",
> > > args-
> > > > > >rwbs, args->sector, args->nr_sector' \
> > > > >     't:block:block_rq_insert (args->dev == '$DEV') "%s %d
> %d",
> > > args-
> > > > > >rwbs, args->sector, args->nr_sector'
> > > > > 
> > > > > $MAJ:$MIN points to the USB storage disk.
> > > > > 
> > > > > From the above IO trace, there are two write paths, one is
> from
> > > cp,
> > > > > another is from writeback wq.
> > > > > 
> > > > > The stackcount trace[1] is consistent with the IO trace log
> > > since it
> > > > > only shows two IO paths, that is why I concluded that the
> write
> > > done
> > > > > via
> > > > > ext4_release_file() is from 'cp'.
> > > > > 
> > > > > [1] 
> > > > > 
> > > 
> https://lore.kernel.org/linux-scsi/320b315b9c87543d4fb919ecbdf841596c8fbcea.camel@xxxxxxxx/2-log_ming_20191129_150609.zip
> > > > > 
> > > > > > reference to a struct file is released.  (Specifically, if
> you
> > > > > have a
> > > > > > fd which is dup'ed, it's only when the last fd
> corresponding
> > > to
> > > > > the
> > > > > > struct file is closed, and the struct file is about to be
> > > > > released,
> > > > > > does the file system's f_ops->release function get
> called.)
> > > > > > 
> > > > > > So the first question I'd ask is whether there is anything
> > > else
> > > > > going
> > > > > > on the system, and whether the writes are happening to the
> USB
> > > > > thumb
> > > > > > drive, or to some other storage device.  And if there is
> > > something
> > > > > > else which is writing to the pendrive, maybe that's why no
> one
> > > > > else
> > > > > > has been able to reproduce the OP's complaint....
> > > > > 
> > > > > OK, we can ask Andrea to confirm that via the following
> trace,
> > > which
> > > > > will add pid/comm info in the stack trace:
> > > > > 
> > > > > /usr/share/bcc/tools/stackcount
> blk_mq_sched_request_inserted
> > > > > 
> > > > > Andrew, could you collect the above log again when running
> > > new/bad
> > > > > kernel for confirming if the write done by
> ext4_release_file()
> > > is
> > > > > from
> > > > > the 'cp' process?
> > > > 
> > > > You can find the stackcount log attached. It has been produced
> by:
> > > > 
> > > > - /usr/share/bcc/tools/stackcount
> blk_mq_sched_request_inserted >
> > > trace.log
> > > > - wait some seconds
> > > > - run the test (1 copy trial), wait for the test to finish,
> wait
> > > some seconds
> > > > - stop the trace (ctrl+C)
> > > 
> > > Thanks for collecting the log, looks your 'stackcount' doesn't
> > > include
> > > comm/pid info, seems there is difference between your bcc and
> > > my bcc in fedora 30.
> > > 
> > > Could you collect above log again via the following command?
> > > 
> > > /usr/share/bcc/tools/stackcount -P -K t:block:block_rq_insert
> > > 
> > > which will show the comm/pid info.
> > 
> > ok, attached (trace_20191219.txt), the test (1 trial) took 3684
> > seconds.
> 
> From the above trace:
> 
>   b'blk_mq_sched_request_inserted'
>   b'blk_mq_sched_request_inserted'
>   b'dd_insert_requests'
>   b'blk_mq_sched_insert_requests'
>   b'blk_mq_flush_plug_list'
>   b'blk_flush_plug_list'
>   b'io_schedule_prepare'
>   b'io_schedule'
>   b'rq_qos_wait'
>   b'wbt_wait'
>   b'__rq_qos_throttle'
>   b'blk_mq_make_request'
>   b'generic_make_request'
>   b'submit_bio'
>   b'ext4_io_submit'
>   b'ext4_writepages'
>   b'do_writepages'
>   b'__filemap_fdatawrite_range'
>   b'ext4_release_file'
>   b'__fput'
>   b'task_work_run'
>   b'exit_to_usermode_loop'
>   b'do_syscall_64'
>   b'entry_SYSCALL_64_after_hwframe'
>     b'cp' [19863]
>     4400
> 
> So this write is clearly from 'cp' process, and it should be one
> ext4 fs issue.
> 
> Ted, can you take a look at this issue?
> 
> > 
> > > > I also tried the usual test with btrfs and xfs. Btrfs behavior
> > > looks
> > > > "good". xfs seems sometimes better, sometimes worse, I would
> say.
> > > I
> > > > don't know if it matters, anyway you can also find the results
> of
> > > the
> > > > two tests (100 trials each). Basically, btrfs is always
> between 68
> > > and
> > > > 89 seconds, with a cyclicity (?) with "period=2 trials". xfs
> looks
> > > > almost always very good (63-65s), but sometimes "bad" (>300s).
> > > 
> > > If you are interested in digging into this one, the following
> trace
> > > should be helpful:
> > > 
> > > 
> https://lore.kernel.org/linux-scsi/f38db337cf26390f7c7488a0bc2076633737775b.camel@xxxxxxxx/T/#m5aa008626e07913172ad40e1eb8e5f2ffd560fc6
> > > 
> > 
> > Attached:
> > - trace_xfs_20191223.txt (7 trials, then aborted while doing the
> 8th),
> > times to complete:
> > 64s
> > 63s
> > 64s
> > 833s
> > 1105s
> > 63s
> > 64s
> 
> oops, looks we have to collect io insert trace with the following
> bcc script
> on xfs for confirming if there is similar issue with ext4, could you
> run
> it again on xfs? And only post the trace done in case of slow 'cp'.
> 
> 
> #!/bin/sh
> 
> MAJ=$1
> MIN=$2
> MAJ=$(( $MAJ << 20 ))
> DEV=$(( $MAJ | $MIN ))
> 
> /usr/share/bcc/tools/trace -t -C \
>     't:block:block_rq_issue (args->dev == '$DEV') "%s %d %d", args-
> >rwbs, args->sector, args->nr_sector' \
>     't:block:block_rq_insert (args->dev == '$DEV') "%s %d %d", args-
> >rwbs, args->sector, args->nr_sector'
> 
> 
here it is (1 trial, 313 seconds to finish)

Thanks,
Andrea
Attachment:
trace_20191223_xfs_new.zip

Description: Zip archive