Re: [BUG?] sync writeback regression from c4a391b5 "writeback: do not sync data dirtied after sync start"?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 18-02-14 11:23:12, Dave Chinner wrote:
> > > In this case, XFS is skipping pages because it can't get the inode
> > > metadata lock without blocking on it, and we are in non-blocking
> > > mode because we are doing WB_SYNC_NONE writeback. We could also
> > > check for wbc->tagged_writepages, but nobody else does, nor does it
> > > fix the problem of calling redirty_page_for_writepage() in
> > > WB_SYNC_ALL conditions. None of the filesystems put writeback
> > > control specific contraints on their calls to
> > > redirty_page_for_writepage() and so it seems to me like it's a
> > > generic issue, not just an XFS issue.
> > > 
> > > Digging deeper, it looks to me like our sync code simply does not
> > > handle redirty_page_for_writepage() being called in WB_SYNC_ALL
> > > properly.
> >   Well, there are two different things:
> > a) If someone calls redirty_page_for_writepage() for WB_SYNC_ALL writeback,
> > we are in trouble because definition of that writeback is that it must
> > write everything. So I would consider that a fs bug (we could put a WARN_ON
> > for this into redirty_page_for_writepage()). Arguably, we could be nice to
> > filesystems and instead of warning just retry writeback indefinitely but
> > unless someone comes with a convincing usecase I'd rather avoid that.
> 
> Right, that might be true, but almost all .writepages
> implementations unconditionally call redirty_page_for_writepage() in
> certain circumstances. e.g. xfs/ext4/btrfs do it when called from
> direct reclaim context to avoid the possibility of stack overruns.
> ext4 does it unconditionally when a memory allocation fails, etc.
> 
> So none of the filesystems behave correctly w.r.t. WB_SYNC_ALL in
> all conditions, and quite frankly I'd prefer that we fail a
> WB_SYNC_ALL writeback than risk a stack overrun. Currently we are
> stuck between a rock and a hard place with that.
  OK, I agree that returning error from sync / fsync in some rare corner
cases is better than crashing the kernel. Reclaim shouldn't be an issue as
that does only WB_SYNC_NONE writeback but out of memory conditions are real
for WB_SYNC_ALL writeback.

Just technically that means we have to return some error code from
->writepage() / ->writepages() for WB_SYNC_ALL writeback while we have to
silently continue for WB_SYNC_NONE writeback. That will require some
tweaking within filesystems.

> > b) Calling redirty_page_for_writepage() for tagged_writepages writeback is
> > a different matter. There it is clearly allowed and writeback code must
> > handle that gracefully.
> 
> *nod*
> 
> > > It looks to me like requeue_inode should never rewrite
> > > the timestamp of the inode if we skipped pages, because that means
> > > we didn't write everything we were supposed to for WB_SYNC_ALL or
> > > wbc->tagged_writepages writeback. Further, if there are skipped
> > > pages we should be pushing the inode to b_more_io, not b_dirty so as
> > > to do another pass on the inode to ensure we writeback the skipped
> > > pages in this writeback pass regardless of the WB_SYNC flags or
> > > wbc->tagged_writepages field.
> >   Resetting timestamp in requeue_inode() is one thing which causes problems
> > but even worse seems the redirty_tail() call which also updates the
> > i_dirtied_when timestamp. So any call to redirty_tail() will effectively
> > exclude the inode from running sync(2) writeback and that's wrong.
> 
> *nod*
> 
> I missed that aspect of the redirty_tail() behaviour, too. Forest,
> trees. This aspect of the problem may be more important than the
> problem with skipped pages....
  redirty_tail() behavior is a pain for a long time. But we cannot just rip
it out because we need a way to tell "try to writeback this inode later"
where later should be "significantly later" - usually writeback on that
inode is blocked by some other IO, lock, or something similar. So without
redirty_tail() we just spinned in writeback code for significant time
busywaiting for IO or a lock. I actually have patches to remove
redirty_tail() from like two years ago but btrfs was particularly inventive
in screwing up writeback back then so we didn't merge the patches in the end.
Maybe it's time to revisit this.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux