On Wed, Nov 03, 2010 at 06:18:41PM -0700, Mike Waychison wrote:
>Matt Mackall wrote:
>>On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
>>>Mike Waychison wrote:
>>>>FWIW, another semantic difference between netconsole and netoops (that
>>>>I had missed in the last email) is filtering: we really do want to get
>>>>the whole log when a crash happens, debug messages and all.
>>>>Netconsole is subject to console filtering (which we _do_ want as
>>>>debug messages going out the uart slows the whole world down).
>>>>
>>>>netconsole and netoops _do_ have bits in common, for instance the
>>>>handling of NETDEV events and source+target configuration. I'd rather
>>>>those bits become common between the two than figure out how to jam
>>>>the semantics we need into netconsole.
>>>Hi Matt,
>>>
>>>I've been reading through the netconsole driver in response to
>>>Greg's comments on this thread, and it is definitely more robust
>>>in terms of configuration and handling of network device events
>>>than the netoops driver I proposed.
>>
>>I've been following the discussion to see if it went anywhere
>>interesting..
>>
>>>What are your thoughts on extending netconsole with the same sort
>>>of semantics that are in the netoops patchset?
>>
>>My first thought is that it's a bit unfortunate that some of the the
>>netconsole configgy bits weren't implemented in a generic way that would
>>be applicable to other netpoll clients. Some people have never gotten it
>>into their heads that netconsole isn't the only client.
>>
>>>I'd still like to have blit-dmesg-to-the-network-on-oops
>>>semantics, which seems doable by having a per-target flag for
>>>streaming of console messages (enabled by default) and a flag to
>>>emit a structured full dmesg dump (disabled by default).
>>
>>I'd actually like to see you go forward with netoops. It's clear to me
>>that it's a different beast and complexifying netconsole with a bunch of
>>weird new options doesn't really sit well.
>>If that means abstracting some of the sysfs crap from netconsole, great.
>
>I'd be happy to take a stab at this. This solves most of the ABI
>reservations that I have with this v1 patchset.
>
>Looking at netconsole, it looks to lack some locking for data
>consistency, and it appears that we will deadlock if we ever get a
>NETDEV_UNREGISTER event (due to recursively grabbing the rtnl in
>netpoll_cleanup). I have a couple patches I've been hacking on this
>afternoon that should clear those issues up.
>

You might want to look at net-next-2.6; it has some fixes from Neil.

>I'm thinking of pushing all the target handling options down into
>net/core/netpoll.c. I'll probably expose this interface as "struct
>netpoll_targets" where ->lock and ->list could be completely exposed
>to clients. netconsole would then get a lot smaller as would
>netoops.
>
>>That said, I don't think netoops is an ideal name, given how closely
>>bound oops _events_ are with their textual output. Presumably it covers
>>events other than oopsen like panics too.
>
>True. We call this code 'netdump' or 'network_dumper' internally,
>but I figured it'd be better to follow current conventions with
>ramoops and mtdoops already in the tree. I don't really care what
>it's called in the end :)
>

"netdump" was already used by a utility that does crash dumping over
the network. It is deprecated now, since we have kdump.

>>
>>Regarding rolling oopses: lots of machines regularly survive
>>oopses, so I think you ought to consider rate-limiting them (to a
>>configurable rate with a very low default) rather than suppressing all
>>but the first.
>>
>
>The trouble with Oopses is just that: We don't know whether we can
>safely survive them or not and it's a total gamble each time we do
>Oops. We can't programmatically know how crapped out the machine is,
>so historically we've erred on not allowing bad things to continue
>happening once someone notices something wrong.
>
>It's easier for us to just shoot the machine in the head
>(panic_on_oops) and move on than corrupt data or dead-lock in weird
>ways at some later point in time. This is definitely not the
>behaviour I would want nor expect from my desktop or phone, but for
>the cluster, it's just safer.

We also have pause_on_oops, or we can invent an oops_once.
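
To make the two policies in this thread concrete (Matt's rate limit with a low default, versus dumping only the first oops), here is a rough userspace sketch. It is only an illustration: every name in it (oops_dump_policy, oops_dump_allowed, and so on) is made up for this example, none of it is kernel code, and it is not taken from the netoops patchset. The caller is assumed to pass in a monotonic timestamp in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy state; these names do not exist in the kernel. */
struct oops_dump_policy {
	bool     once;          /* "oops_once" style: dump only the first oops */
	uint64_t min_gap_ms;    /* rate limit: minimum gap between two dumps */
	uint64_t last_dump_ms;  /* timestamp of the last dump that was allowed */
	bool     dumped;        /* true once at least one dump has been sent */
};

static void oops_dump_policy_init(struct oops_dump_policy *p,
				  bool once, uint64_t min_gap_ms)
{
	assert(p != NULL);
	p->once = once;
	p->min_gap_ms = min_gap_ms;
	p->last_dump_ms = 0;
	p->dumped = false;
}

/*
 * Decide whether an oops seen at now_ms should be dumped to the network.
 * The first oops is always dumped. After that, "once" mode suppresses
 * everything, and rate-limit mode suppresses anything that arrives less
 * than min_gap_ms after the last dump that went out.
 */
static bool oops_dump_allowed(struct oops_dump_policy *p, uint64_t now_ms)
{
	if (p->dumped) {
		if (p->once)
			return false;
		if (now_ms - p->last_dump_ms < p->min_gap_ms)
			return false;
	}
	p->dumped = true;
	p->last_dump_ms = now_ms;
	return true;
}
```

With once set, this behaves like the "suppress all but the first" approach; with once clear and a large min_gap_ms, it approximates the low-default rate limit. A real implementation would also have to cope with being called from oops context on several CPUs at once, which this sketch ignores.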