Re: [PATCH v2] xfs: make fatal assert failures conditional in debug mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 09, 2017 at 09:14:48AM +1000, Dave Chinner wrote:
> On Mon, May 08, 2017 at 08:55:32AM -0400, Brian Foster wrote:
> > On Sat, May 06, 2017 at 09:09:43AM +1000, Dave Chinner wrote:
> > > On Fri, May 05, 2017 at 09:31:26AM -0400, Brian Foster wrote:
> > > > XFS currently supports two debug modes: XFS_WARN enables assert
> > > > failure warnings and XFS_DEBUG converts assert failures to fatal
> > > > errors (via BUG()) and enables additional runtime debug code.
> > > > 
> > > > While the behavior to BUG the kernel on assert failure is useful in
> > > > certain test scenarios, it is also useful for development/debug to
> > > > enable debug mode code without having to crash the kernel on an
> > > > assert failure.
> > > > 
> > > > To provide this additional flexibility, update XFS debug mode to not
> > > > BUG() the kernel by default and create a new XFS kernel
> > > > configuration option to enable fatal assert failures when debug mode
> > > > is enabled. To provide backwards compatibility with current
> > > > behavior, enable the fatal asserts option by default when debug mode
> > > > is enabled.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > 
> > > Just a suggestion, but why make this a compile time option? Why not
> > > a sysfs variable under /sys/fs/xfs/debug? That would be far more
> > > useful to me - a single kernel that can be configure to just warn or
> > > bug() dynamically. That will save us from having to rebuild a kernel
> > > just to enable this functionality, then rebuild again to turn it
> > > off..
> > > 
> > 
> > I hadn't really considered that approach. The obvious drawback for me is
> > that whatever option is not default has to be reset on every boot or
> > module reload,
> 
> That's what sysctl is for.
> 

I suppose that helps, but only partially. It's still an extra step on a
new system (re: think disposable test system environments). I also don't
think it helps across module reloads (?), which means in certain cases
I'm most likely just back to holding a local patch to comment out the
BUG() (which is fine too).

> > the latter of which tends to be a common action in my use
> > cases (e.g., adding debug code to do specialized debugging or for some
> > new development work and reloading xfs as a kernel module). It's kind of
> > the opposite problem for general regression testing if we were to change
> > the default from BUG() to warn, for example. The tester would have to
> > remember to (or know) to twiddle the knob if one is expecting assert
> > failures to generate a BUG() and crash report.
> 
> Default would be the same as today - BUG on assert - so this
> wouldn't matter.
> 

Yes, the above is an analogy.

> > So for me, the ability to live switch between BUG() or warn in debug
> > mode doesn't add value. In fact, it is less ideal than just being able
> > to (re)compile a kernel module and load it with expected behavior. That
> 
> Who uses kernel modules for testing? I just use monolithic kernels
> because I can boot a new kernel in less time than it takes copy in
> and reload a new module to the test machine.
> 

I use kernel modules quite a bit. For one, I tend to use test machines
that can take a minute or two to boot. Even when using a local vm,
copying a kernel module and reloading works quite well for me for
development, because I may have various contexts (i.e., screen/tmux
sessions) that I prefer not to lose and I tend to work iteratively.

But this is anecdotal... I don't think it matters much what workflow is
the one true best for testing or development. Rather, the question is
whether a workflow is common and useful to enough people to justify a
configuration option.

> > said, that's just my admittedly selfish use case. The ability to switch
> > off BUG() at all is still an improvement over the current situation, so
> > I'm open to a runtime knob if that is the more broadly useful solution.
> > Care to elaborate on how that is more useful to you?
> 
> e.g. think of the dangerous tests in xfstests that don't get run
> because they fire an assert and kill the test machine. have the test
> harness set "warn only" for the dangerous tests and now those tests
> are no longer dangerous and can be run as part of the auto group...
> 

It looks like the majority of the dangerous tests are tagged as such
because they historically (and legitimately) panic, crash or hang a
system. I don't think they are any more or less dangerous based on
assert behavior. Indeed, a quick check with the latest kernel doesn't
result in any dangerous test failures in either XFS_DEBUG or XFS_WARN
mode (though one or two were skipped).

> > A bit of a sidetrack...
> > 
> > To me, runtime live switching seems a bit more appropriate for something
> > at a higher level of enabling/disabling debug mode entirely as opposed
> > to solely assert behavior (for which it seems like overkill). A couple
> > problems with that are bloating the kernel and efficiency associated
> > with losing the ability to compile out asserts, both of which may make
> > something like that not realistic.
> > 
> > I do wonder, however, whether we could condense the current kernel
> > configuration into effectively two logical modes: production and debug.
> > The latter debug mode simply compiles in all of the debug code, but
> > supports various sub-modes: disabled, warn, debug (as today)[1]. The
> 
> We've talked about this in the past and enabling debug like
> having the allocator run random selection paths rather than optimal
> paths can lead to premature freespace fragmetnation and aging of the
> filesystem - exactly what you don't want for a production system
> you're trying to diagnose a problem on.
> 

Good point. I agree that we wouldn't want to enable things like the
random allocation algorithms, randomly doing sparse inode allocations
and such things that generally affect the structure on disk.

As mentioned previously, we can be more granular than the current binary
toggle for debug mode. E.g., we could separate diagnostic mechanisms
from test coverage mechanisms and enable the latter at a higher debug
level or with a separate option entirely, if desired. IOW, I don't think
that's a difficult problem to solve.

> Also, there's debug code that doesn't scale well (such as extent
> list checking) and that sort of thing can badly affect performance -
> these are gotchas in debug builds that production diagnositic
> kernels should not trip over....
> 

Yeah, but I think there's a host of other things one can already enable
on a debug kernel package (I'm thinking about all of the memory related
bits, etc.) that can kill performance just as much, if not worse. It's
still a separate debug package after all. However, that is a good
argument for debug modes to be disabled by default (or effectively set
to XFS_WARN as it is defined today) in such configs, because there is no
guarantee the debug kernel is installed to debug an XFS problem.

> > default debug sub-mode can be selected at kernel compile time and
> > toggled at runtime. So effectively, a debug enabled kernel has the
> > ability to support arbitrary modes and a production kernel still has all
> > of that crap compiled out. This would allow, for example, a distro debug
> > kernel package to ship and enable actual debug code at runtime rather
> > than be limited to XFS_WARN, which is at least what we (rh) do today.
> > Thoughts? Useful, overkill?
> 
> XFS_WARN was the tradeoff for getting useful assert information out
> of production machines without impacting performance, allocation,
> etc by enabling the full debug code. I don't think anything has
> changed that alters that tradeoff since we added XFS_WARN...
> 

I'm not following the logic here... I don't think there's anything
inherently wrong with XFS_WARN as it is deployed today. I was just
suggesting that if we're going to think about providing dynamic debug
options, perhaps there's more benefit to step back and expose some of
the other diagnostic mechanisms that we have available rather than
simply to toggle assert behavior. IOWs, let the kernel config control
what is compiled in or not rather than behavior and let runtime
configuration control behavior.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux