Re: [PATCH 0/2 V4] Resubmit items failed during writeback

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 20, 2017 at 05:45:50PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 19, 2017 at 02:51:11PM -0400, Brian Foster wrote:
> > On Mon, Jun 19, 2017 at 10:42:03AM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 19, 2017 at 09:51:22AM -0400, Brian Foster wrote:
> > > > On Fri, Jun 16, 2017 at 12:54:43PM +0200, Carlos Maiolino wrote:
> > > > > Hi,
> > > > > 
> > > > > there goes a new version of this patchset based on previous reviews on V3.
> > > > > 
> > > > > Changelogs added separated to each patch.
> > > > > 
> > > > 
> > > > Hi Carlos,
> > > > 
> > > > I pointed out the last things that I'm aware of that I think need to be
> > > > fixed in this series (along with a few nits here and there). That said,
> > > > it's already been pointed out that we probably want an xfstests test
> > > > case to go along with this before it would be merged. Is that something
> > > > you are still planning on?
> > > > 
> > > > I'd actually like to take this a bit farther than a regression test and
> > > > see if we can improve our ability to test the error handling code in
> > > > general. What do you (or anybody else..) think about including a patch
> > > > in this series that introduces the ability to inject metadata writeback
> > > > errors on DEBUG kernels? For example, consider something that just sets
> > > > b_error at the top of xfs_buf_iodone_callbacks() based on a random value
> > > > and configurable error frequency. This could use XFS_TEST_ERROR() or
> > > > something like a new DEBUG sysfs attribute in the error configuration
> > > > (see log_badcrc_factor for a similar example).
> > > 
> > > Sounds reasonable.
> > > 
> > > I wonder if it would be more useful to have individual knobs for each
> > > metadata object type so that you could have multiple xfstests, each of
> > > which runs the same software scenario but with different failures
> > > configured?  Then we could test what happens when, say, only inode
> > > writes fail, or bmbt writes fail, etc. rather than one big hammer that's
> > > harder to control?
> > > 
> > 
> > I suppose you could do some of that in the test just by making certain
> > modifications in isolation (e.g., test an inode modification, test a
> > dquot, buffer, etc..). It might be harder to do that from a test script
> > at the granularity of things like bmbt buffers, though that may also be
> > the case for error injection. Did you have something in mind to filter
> > the buffer types.. the blf type perhaps?
> > 
> > > For a moment I also wondered why not use the error injection mechanism
> > > that we already have, rather than inventing more sysfs knobs?  But we
> > > have limited space in xfs_error_injection (29/32 bits used) so we
> > > probably should just switch everything to sysfs knobs.
> > > 
> > 
> > Yeah, I think we discussed this before as well. IIRC, I had already
> > implemented whatever the sysfs knob was and and corresponding test, and
> > it just wasn't really important enough to reimplement the whole thing.
> 
> We probably did, back when the other error injection knobs (drop_writes,
> log/log_badcrc_factor) got added.  Unfortunately now they're all over
> sysfs, which makes for sort of a mess.
> 
> > I didn't really have a preference as to how this would be implemented.
> > On a quick look though, it looks like a downside to the existing
> > injection mechanism is that we can't control the error frequency..? If
> > that's the case, I like the idea of stepping back, perhaps audit the
> > granularity of the broader error injection framework and define a new
> > sysfs interface that also allows controlling things like the error
> > frequency dynamically.
> 
> Yes, I've heard you ask for this twice now.  I agree that being able to
> control the frequency would be a useful feature to enable longer-running
> tests so that you could, say, have the refcount_finish_one error trigger
> once every 1,000,000 updates instead of immediately afterwards.
> 

Yep. It allows modeling for longevity tests as you note above, but I'm
also thinking about the ability to turn a particular error on with 100%
frequency in order to run fast, deterministic regression tests.

> So with that in mind I coded up a quick RFC to create sysfs knobs in
> /sys/fs/xfs/$device/errortag/$tagname ; the units are the same as the
> XFS_RANDOM_ values (i.e. inverted frequency).  Now we're free of the
> limitation of only being able to inject 10 error types across all
> mounted fses, and we can individually disable injection too.
> 

Nice, that sounds very interesting.. thanks!

Brian

> > (I would still like to get something targeted towards testing this
> > series in the meantime.)
> 
> Same here.
> 
> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > This would facilitate longer running tests where iodone callback errors
> > > > occur randomly and transiently and we can thus actually exercise the
> > > > error handling and recovery code. I'd love to run some fsstress testing,
> > > > for example, as I'm hoping that would help wring out any further issues
> > > > that could be lurking (particularly with the tricky xfs_iflush_done()
> > > > logic and whatnot). If implemented generally enough, that could also be
> > > > reused for a more simple xfstests regression test for the original
> > > > problem (e.g., mount fs, set error frequency = 100%, modify an inode,
> > > > set error frequency = 0, umount), albeit it would require debug mode.
> > > > Thoughts?
> > > > 
> > > > Brian
> > > > 
> > > > > 
> > > > > -- 
> > > > > 2.9.4
> > > > > 
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux