On Tue, May 09, 2017 at 09:11:31AM -0400, Brian Foster wrote: > On Tue, May 09, 2017 at 09:14:48AM +1000, Dave Chinner wrote: > > On Mon, May 08, 2017 at 08:55:32AM -0400, Brian Foster wrote: > > > On Sat, May 06, 2017 at 09:09:43AM +1000, Dave Chinner wrote: > > > > On Fri, May 05, 2017 at 09:31:26AM -0400, Brian Foster wrote: > > > > > XFS currently supports two debug modes: XFS_WARN enables assert > > > > > failure warnings and XFS_DEBUG converts assert failures to fatal > > > > > errors (via BUG()) and enables additional runtime debug code. > > > > > > > > > > While the behavior to BUG the kernel on assert failure is useful in > > > > > certain test scenarios, it is also useful for development/debug to > > > > > enable debug mode code without having to crash the kernel on an > > > > > assert failure. > > > > > > > > > > To provide this additional flexibility, update XFS debug mode to not > > > > > BUG() the kernel by default and create a new XFS kernel > > > > > configuration option to enable fatal assert failures when debug mode > > > > > is enabled. To provide backwards compatibility with current > > > > > behavior, enable the fatal asserts option by default when debug mode > > > > > is enabled. > > > > > > > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > > > > > > > Just a suggestion, but why make this a compile time option? Why not > > > > a sysfs variable under /sys/fs/xfs/debug? That would be far more > > > > useful to me - a single kernel that can be configure to just warn or > > > > bug() dynamically. That will save us from having to rebuild a kernel > > > > just to enable this functionality, then rebuild again to turn it > > > > off.. > > > > > > > > > > I hadn't really considered that approach. The obvious drawback for me is > > > that whatever option is not default has to be reset on every boot or > > > module reload, > > > > That's what sysctl is for. > > > > I suppose that helps, but only partially. It's still an extra step on a > new system (re: think disposable test system environments). I also don't > think it helps across module reloads (?), which means in certain cases > I'm most likely just back to holding a local patch to comment out the > BUG() (which is fine too). How about a runtime config variable (sysctl, module param, {sys,config,debug}fs knob...) and a Kconfig option to set the default value of the knob? That enables dangerous group xfstests to switch off the crash-on-assert behavior and allows developers to preconfigure whatever's most convenient for them. > > > the latter of which tends to be a common action in my use > > > cases (e.g., adding debug code to do specialized debugging or for some > > > new development work and reloading xfs as a kernel module). It's kind of > > > the opposite problem for general regression testing if we were to change > > > the default from BUG() to warn, for example. The tester would have to > > > remember to (or know) to twiddle the knob if one is expecting assert > > > failures to generate a BUG() and crash report. > > > > Default would be the same as today - BUG on assert - so this > > wouldn't matter. > > > > Yes, the above is an analogy. > > > > So for me, the ability to live switch between BUG() or warn in debug > > > mode doesn't add value. In fact, it is less ideal than just being able > > > to (re)compile a kernel module and load it with expected behavior. That > > > > Who uses kernel modules for testing? I just use monolithic kernels > > because I can boot a new kernel in less time than it takes copy in > > and reload a new module to the test machine. I do, because usually my code isn't so bad that it trashes the kernel state to the point of needing a reboot. Lucky me. :P (Now granted the test machine boots so slowly I go for a walk and the VMs boot so quickly I barely lose any time, so I don't really care one way or another... :)) > I use kernel modules quite a bit. For one, I tend to use test machines > that can take a minute or two to boot. Even when using a local vm, > copying a kernel module and reloading works quite well for me for > development, because I may have various contexts (i.e., screen/tmux > sessions) that I prefer not to lose and I tend to work iteratively. > > But this is anecdotal... I don't think it matters much what workflow is > the one true best for testing or development. Rather, the question is > whether a workflow is common and useful to enough people to justify a > configuration option. > > > > said, that's just my admittedly selfish use case. The ability to switch > > > off BUG() at all is still an improvement over the current situation, so > > > I'm open to a runtime knob if that is the more broadly useful solution. > > > Care to elaborate on how that is more useful to you? > > > > e.g. think of the dangerous tests in xfstests that don't get run > > because they fire an assert and kill the test machine. have the test > > harness set "warn only" for the dangerous tests and now those tests > > are no longer dangerous and can be run as part of the auto group... > > > > It looks like the majority of the dangerous tests are tagged as such > because they historically (and legitimately) panic, crash or hang a > system. I don't think they are any more or less dangerous based on > assert behavior. Indeed, a quick check with the latest kernel doesn't > result in any dangerous test failures in either XFS_DEBUG or XFS_WARN > mode (though one or two were skipped). > > > > A bit of a sidetrack... > > > > > > To me, runtime live switching seems a bit more appropriate for something > > > at a higher level of enabling/disabling debug mode entirely as opposed > > > to solely assert behavior (for which it seems like overkill). A couple > > > problems with that are bloating the kernel and efficiency associated > > > with losing the ability to compile out asserts, both of which may make > > > something like that not realistic. > > > > > > I do wonder, however, whether we could condense the current kernel > > > configuration into effectively two logical modes: production and debug. > > > The latter debug mode simply compiles in all of the debug code, but > > > supports various sub-modes: disabled, warn, debug (as today)[1]. The > > > > We've talked about this in the past and enabling debug like > > having the allocator run random selection paths rather than optimal > > paths can lead to premature freespace fragmetnation and aging of the > > filesystem - exactly what you don't want for a production system > > you're trying to diagnose a problem on. > > > > Good point. I agree that we wouldn't want to enable things like the > random allocation algorithms, randomly doing sparse inode allocations > and such things that generally affect the structure on disk. > > As mentioned previously, we can be more granular than the current binary > toggle for debug mode. E.g., we could separate diagnostic mechanisms > from test coverage mechanisms and enable the latter at a higher debug > level or with a separate option entirely, if desired. IOW, I don't think > that's a difficult problem to solve. Agreed. > > Also, there's debug code that doesn't scale well (such as extent > > list checking) and that sort of thing can badly affect performance - > > these are gotchas in debug builds that production diagnositic > > kernels should not trip over.... > > > > Yeah, but I think there's a host of other things one can already enable > on a debug kernel package (I'm thinking about all of the memory related > bits, etc.) that can kill performance just as much, if not worse. It's > still a separate debug package after all. However, that is a good > argument for debug modes to be disabled by default (or effectively set > to XFS_WARN as it is defined today) in such configs, because there is no > guarantee the debug kernel is installed to debug an XFS problem. > > > > default debug sub-mode can be selected at kernel compile time and > > > toggled at runtime. So effectively, a debug enabled kernel has the > > > ability to support arbitrary modes and a production kernel still has all > > > of that crap compiled out. This would allow, for example, a distro debug > > > kernel package to ship and enable actual debug code at runtime rather > > > than be limited to XFS_WARN, which is at least what we (rh) do today. > > > Thoughts? Useful, overkill? > > > > XFS_WARN was the tradeoff for getting useful assert information out > > of production machines without impacting performance, allocation, > > etc by enabling the full debug code. I don't think anything has > > changed that alters that tradeoff since we added XFS_WARN... > > > > I'm not following the logic here... I don't think there's anything > inherently wrong with XFS_WARN as it is deployed today. I was just > suggesting that if we're going to think about providing dynamic debug > options, perhaps there's more benefit to step back and expose some of > the other diagnostic mechanisms that we have available rather than > simply to toggle assert behavior. IOWs, let the kernel config control > what is compiled in or not rather than behavior and let runtime > configuration control behavior. Agreed here too, though I think it's acceptable to have a Kconfig option to set the default values of the knob(s). --D > Brian > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@xxxxxxxxxxxxx > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html