On Wed, Apr 18 2018, Linus Torvalds wrote:

> Ugh, that lustre code is disgusting.
>
> I thought we were getting rid of it.

Lots of people seem to get value out of it. So we're trying to polish
the code to make it less disgusting. This is just a little fall-out.

The smoking gun is

[    6.528851] LNetError: 1:0:(module.c:546:libcfs_init()) misc_register: error -16

lustre registers a misc char device with the same number as USERIO.
If they both try to register, one fails.

Until recently, lustre could only be built as a module, so when lustre
failed to register the char dev, the module load failed.
Now it can be built monolithic (makes my testing easier) and the failure
mode is different. The module that tried to register the chardev
rewinds some initialization, and a subsequent module assumes that init
was done, and explodes.

There are patches in Greg's inbox to change lustre to use a dynamically
allocated minor. And it is on my todo list to get lustre to do less
initialization at module-init time (where, in a monolithic build, it is
hard to give up if some previous module failed), and more at mount time.

So this is a known bug (maybe a new manifestation) and a fix has been
posted. There is certainly room for lots more cleanup and that is
slowly happening.

I'll make a note to look into the large stack frames you observed.

Previous report of bug was

  Subject: [staging] 184ecc5ceb: BUG:unable_to_handle_kernel
  Message-ID: <20180319091931.gt6ijdw7ahkbtvrq@inn>

Thanks,
NeilBrown

>
> Anyway, I started looking at why the stack trace is such an incredible
> mess, with lots of stale entries.
>
> The reason (well, _one_ reason) seems to be "ksocknal_startup". It has
> a 500-byte stack frame for some incomprehensible reason. I assume due
> to excessive inlining, because the function itself doesn't seem to be
> that bad.
>
> Similarly, LNetNIInit has a 300-byte stack frame. So it gets pretty deep.
>
> I'm getting the feeling that KASAN is making things worse because
> probably it's disabling all the sane stack frame stuff (ie no merging
> of stack slot entries, perhaps?).
>
> Without KASAN (but also without a lot of other things, so I might be
> blaming KASAN incorrectly), the stack usage of ksocknal_startup() is
> just under 100 bytes, so if it is KASAN, it's really a big difference.
>
> Anyway, apart from the excessive elements, the report seems fine, but
> I'm adding Neil Brown to the cc, since he's the one that has been
> making most of the lustre/lnet changes this merge window.
>
> Also adding Andrey to check about the oddly large stack usage.
>
> Not including the whole email with the attachments - Neil, it's on
> lkml and lustre-devel if you hadn't seen it.
>
> Linus
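
For readers following the minor-number clash described in the reply, here is a rough userspace sketch of why a fixed minor can fail with -16 (EBUSY on Linux) while a dynamically allocated one does not. Everything in it is an assumption made for illustration: `model_register()`, `MODEL_DYNAMIC_MINOR`, the table size, and the minor value 240 used in the usage note are hypothetical names and numbers, and this is not the kernel's `misc_register()` nor the posted lustre patch.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NUM_MINORS    256   /* size of the toy minor table (assumed) */
#define MODEL_DYNAMIC_MINOR (-1)  /* hypothetical sentinel: "pick any free minor" */

/* One slot per minor number; true once some driver has claimed it. */
static bool model_minor_in_use[MODEL_NUM_MINORS];

/*
 * Toy stand-in for char-device registration (NOT the kernel's
 * misc_register()). Returns the minor that was claimed, or a
 * negative errno value on failure.
 */
static int model_register(int requested)
{
    if (requested == MODEL_DYNAMIC_MINOR) {
        /* Dynamic case: take the first free slot, so no fixed number can clash. */
        for (int m = 0; m < MODEL_NUM_MINORS; m++) {
            if (!model_minor_in_use[m]) {
                model_minor_in_use[m] = true;
                return m;
            }
        }
        return -EBUSY;  /* table completely full */
    }

    if (requested < 0 || requested >= MODEL_NUM_MINORS)
        return -EINVAL;

    /* Fixed case: a second claimant of the same number is refused. */
    if (model_minor_in_use[requested])
        return -EBUSY;

    model_minor_in_use[requested] = true;
    return requested;
}
```

In this model, two callers that both ask for the same fixed minor (say 240) get 240 and then -EBUSY, whereas a caller that passes `MODEL_DYNAMIC_MINOR` receives whichever slot is free. The second half of the bug (a later module assuming that an earlier module's init succeeded) is not modelled here.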