On Mon, Jul 24, 2017 at 03:26:38PM +1000, Dave Chinner wrote: > On Sat, Jul 22, 2017 at 08:29:44AM -0400, Brian Foster wrote: > > On Sat, Jul 22, 2017 at 09:00:58AM +1000, Dave Chinner wrote: > > > On Fri, Jul 21, 2017 at 09:54:05AM -0400, Brian Foster wrote: > > > > On Fri, Jul 21, 2017 at 08:49:34AM +1000, Dave Chinner wrote: > > > > but I'm also not against crafting a new, verifier > > > > specific one to accomplish that. Technically, it doesn't have to be > > > > shouty :), but IMO, the diagnostic/usability benefit outweighs the > > > > aesthetic cost. > > > > > > I disagree - there are so many verifier checks (and we only grow > > > more as time goes on) so whatever we do needs them to be > > > easily maintainable and not compromise the readabilty of the verifer > > > code. > > > > > > > I agree. ;) But I disagree that using a macro somehow automatically > > negatively affects the readability of verifers. Anybody who can make > > sense of this: > > > > if (!xfs_sb_version_hascrc(&mp->m_sb)) > > return __line__; > > if (dsl->sl_magic != cpu_to_be32(XFS_SYMLINK_MAGIC)) > > return __line__; > > if (!uuid_equal(&dsl->sl_uuid, &mp->m_sb.sb_meta_uuid)) > > return __line__; > > > > ... could just as easily make sense of something like this: > > > > xfs_verify_assert(xfs_sb_version_hascrc(&mp->m_sb)); > > xfs_verify_assert(dsl->sl_magic == cpu_to_be32(XFS_SYMLINK_MAGIC)); > > xfs_verify_assert(uuid_equal(&dsl->sl_uuid, &mp->m_sb.sb_meta_uuid)); > > Still needs branches to return on error so the verifier code > functions correctly. So it simply isn't as clean as your example > suggests. And, no hiding goto/return/errno variables inside a macro > is not an option - that makes for horrible code and we should not > repeat those past mistakes we inherited from the old Irix > codebase. > Clearly the example implies (and the supporting text explains) that the macro would lead to function exit. I don't really see the problem with doing that. Again, we already do that all over the place. If we don't want to alter function execution via the macro, then have it return an expression that resolves to false (or an error code or some such) and leave the function execution logic in function context. It doesn't really matter to me either way. > And we need to be really careful here - making the code more complex > for pretty errors blows out the size of the code. That means it runs > slower, even though we never execute most of the pretty error print > stuff. This is an important consideration as verifiers are a > performance critical fast path, both in the kernel and in userspace. > I guess we'd increase the size of the code for strings that represent the associated checks and whatever extra the macro code may end up duplicating. Once an error is detected, I think much of the latter could be packaged into a noinline helper. It's not immediately clear to me what we're looking at for a size increase, so I don't think "blows out" is a fair characterization without some actual data [1]. I do think this is a fair/valid concern however, we'd just need to try and experiment and see how much it costs to convert over a verifier or two, for example. And I'm not sure how you make the leap from there to an automatic performance degradation. I also agree that the verifiers are performance critical and wouldn't suggest taking an approach if I thought we'd measurably affect performance. As it is, I'd expect that additional overhead of storing error context (hell, we could always just dump it from where the problem was originally detected rather than pass it to the generic verifier report function) would essentially be nil up until a verification check fails. Would you consider such an approach if we could demonstrate a reasonable code size tradeoff and that runtime performance is not affected? If you simply hate macros then I guess there isn't much we can do but agree to disagree (and I'm not against a non-macro approach if one exists, I just don't know what it would be). But otherwise I wouldn't mind running some experiments (unless Darrick wanted to..? he brought this up after all.. :P) if the results will be fairly considered. > > There is a simple solution to this problem: just include all of the > > readily accessible state information in the error report. If the > > messages above looked something like the following: > > > > XFS (...): Metadata corruption detected at xfs_symlink_read_verify [xfs], xfs_symlink block 0x58 > > XFS (...): Read verifier failed: xfs_log_check_lsn(mp, be64_to_cpu(dsl->sl_lsn)), xfs_symlink_remote.c:114 > > > > ... we wouldn't have to decode what the error means for every possible > > kernel. The error report is self contained like most of the other > > errors/assert reports in XFS and throughout the kernel. > > Just like an assert/warning, we've still got to go match it to the > exact kernel source tree to determine the context in which the > message was emitted and what it actually means. Making it pretty > doesn't change the amount of work a developer needs to do to > classify or analyse errors in log files, all it does is change the > amount of overhead needed to generate the error message, > Yes, you're still going to want to ultimately dig down into the code and do complex root cause analysis. Unfortunately, we can't fix that with error reporting. :) I've laid out several situations above as to why I think the "pretty" format makes it significantly easier to make sense of corruption reports that may be more involved than a single instance or a single physical node. A broad assessment can be made from the actual corruption report without the need to decode tens or more cryptic messages. It potentially saves significant time spent doing unnecessary busy work. It also helps reduce unnecessary confusion with general reports that might arise from users on distro kernels, etc., where the line numbers might not be sufficient or easily misaligned. Of course, once we get into debugging an actual problem, we've kind of moved outside the scope of this discussion... > Really, verifiers are performance critical fast paths, so the code > that runs inside them needs to be implemented with this in mind. I'm > not against making messages pretty, but let's not do it based on the > on the premise that pretty error messages are more important than > runtime performance when there are no errors... > I agree. As noted above, I would not expect any more overhead than the original suggestion to return a line number in the normal runtime case. The difference is primarily what we do in the code between when a verifier check fails and we report the failure. FWIW, an approach that might be a reasonable compromise is to use the original suggestion to pass along the line number, but also wrap each verifier check in a thinner macro that simply asserts[2] on the verifier expression. The new macro could be ifdef'd out completely in production builds to eliminate code size concerns. Then if we have reports with significant enough corruption issues to warrant it, we collect a metadump and use any local debug kernel to access the fs and decode the errors for us (or ask the user to run a debug kernel). We can expand the verifier macro in the future with production build support if warranted. Thoughts? Brian [1] A simple test that we can do at the moment is to compare a production xfs.ko with one with asserts enabled. My local for-next branch builds a module of 40104000 bytes without XFS_WARN and 40540208 bytes with XFS_WARN enabled. That makes for a difference of ~425k and roughly a 1% code size increase (accounting for ~1750 asserts and supporting code). [2] We may not want to use ASSERT() here, but perhaps define a new variant to omit the stack trace and otherwise customize to a verifier report. > Cheers, > > Dave. > > -- > Dave Chinner > david@xxxxxxxxxxxxx > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html