Re: [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

On Mon, Jun 06, 2022 at 09:32:13AM +1000, Dave Chinner wrote:
> On Fri, Jun 03, 2022 at 12:09:06PM -0400, Chris Mason wrote:
> > On 6/3/22 11:06 AM, Johannes Weiner wrote:
> > > On Fri, Jun 03, 2022 at 03:20:47PM +1000, Dave Chinner wrote:
> > > > On Fri, Jun 03, 2022 at 01:29:40AM +0000, Chris Mason wrote:
> > > > > As you describe above, the loops are definitely coming from higher
> > > > > in the stack.  wb_writeback() will loop as long as
> > > > > __writeback_inodes_wb() returns that it’s making progress and
> > > > > we’re still globally over the bg threshold, so write_cache_pages()
> > > > > is just being called over and over again.  We’re coming from
> > > > > wb_check_background_flush(), so:
> > > > > 
> > > > >                  struct wb_writeback_work work = {
> > > > >                          .nr_pages       = LONG_MAX,
> > > > >                          .sync_mode      = WB_SYNC_NONE,
> > > > >                          .for_background = 1,
> > > > >                          .range_cyclic   = 1,
> > > > >                          .reason         = WB_REASON_BACKGROUND,
> > > > >                  };
> > > > 
> > > > Sure, but we end up in writeback_sb_inodes() which does this after
> > > > the __writeback_single_inode()->do_writepages() call that iterates
> > > > the dirty pages:
> > > > 
> > > >                 if (need_resched()) {
> > > >                          /*
> > > >                           * We're trying to balance between building up a nice
> > > >                           * long list of IOs to improve our merge rate, and
> > > >                           * getting those IOs out quickly for anyone throttling
> > > >                           * in balance_dirty_pages().  cond_resched() doesn't
> > > >                           * unplug, so get our IOs out the door before we
> > > >                           * give up the CPU.
> > > >                           */
> > > >                          blk_flush_plug(current->plug, false);
> > > >                          cond_resched();
> > > >                  }
> > > > 
> > > > So if there is a pending IO completion on this CPU on a work queue
> > > > here, we'll reschedule to it because the work queue kworkers are
> > > > bound to CPUs and they take priority over user threads.
> > > 
> > > The flusher thread is also a kworker, though. So it may hit this
> > > cond_resched(), but it doesn't yield until the timeslice expires.
> 
> 17us or 10ms, it doesn't matter. The fact is the writeback thread
> will give up the CPU long before the latency durations (seconds)
> that were reported upthread are seen. Writeback spinning can
> not explain why truncate is not making progress - everything points
> to it being a downstream symptom, not a cause.

Chris can clarify, but I don't remember second-long latencies being
mentioned. Rather sampling periods of multiple seconds during which
the spin bursts occur multiple times.

> Also important to note, as we are talking about kworker sheduling
> hold-offs, the writeback flusher work is unbound (can run on any
> CPU), whilst the IO completion workers in XFS are per-CPU and bound
> to individual CPUs. Bound kernel tasks usually take run queue
> priority on a CPU over unbound and/or user tasks that can be punted
> to a different CPU.

Is that actually true? I'm having trouble finding the corresponding
code in the scheduler.

That said, I'm not sure it matters that much. Even if you take CPU
contention out of the equation entirely, I think we agree it's not a
good idea (from a climate POV) to have CPUs busywait on IO. Even if
that IO is just an ordinary wait_on_page_writeback() on a fast drive.

So if we can get rid of the redirtying, and it sounds like we can, IMO
we should just go ahead and do so.

> > Just to underline this, the long tail latencies aren't softlockups or major
> > explosions.  It's just suboptimal enough that different metrics and
> > dashboards noticed it.
> 
> Sure, but you've brought a problem we don't understand the root
> cause of to my attention. I want to know what the root cause is so
> that I can determine that there are no other unknown underlying
> issues that are contributing to this issue.

It seems to me we're just not on the same page on what the reported
bug is. From my POV, there currently isn't a missing piece in this
puzzle. But Chris worked closer with the prod folks on this, so I'll
leave it to him :)



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux