On 11/20/20 12:59 PM, K.R. Foley wrote:
>
> On 2020-11-20 13:51, Jeff Moyer wrote:
>> Randy Dunlap <rdunlap@xxxxxxxxxxxxx> writes:
>>
>>> On 11/20/20 11:16 AM, K.R. Foley wrote:
>>>> I have found an issue that triggers by running lsof. The problem is
>>>> reproducible, but not consistently. I have seen this issue occur on
>>>> multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
>>>> looks like it could be a race condition or the file pointer is being
>>>> corrupted. Any pointers on how to track this down? What additional
>>>> information can I provide?
>>>
>>> Hi,
>>>
>>> 2 things in general:
>>>
>>> a) Can you test with a more recent kernel?
>>>
>>> b) Can you reproduce this without loading the proprietary & out-of-tree
>>> kernel modules? They should never have been loaded after bootup.
>>> I.e., don't just unload them -- that could leave something bad behind.
>>
>> Heh, the EIP contains part of the name of one of the modules:
>>
>>>
>>>> [ 8057.297159] BUG: unable to handle page fault for address: 31376f63
>>                                                                ^^^^^^^^

Thanks for noticing that, Jeff. I should have seen it.

>>>> [ 8057.297219] Modules linked in: ITXico7100Module(O)
>>                                         ^^^^
>
> Perhaps this is a dumb question, but how could this happen?

We don't know what is in that loadable kernel module, so we can't give
a definitive answer to your question, other than it's buggy.
Or maybe it was just written for an older kernel version.
Or a kernel with different build options/settings.

Have you contacted IT support?

It would (will) be interesting to see if you can reproduce the problem
without these modules being loaded... I kind of doubt it, but if it does
still fail, it will give us something to look at.

-- 
~Randy
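
For anyone following along, here is a minimal sketch of the check behind Jeff's observation about the oops quoted above. It assumes a 32-bit little-endian (x86) value, so the fault address 31376f63 sits in memory as the bytes 63 6f 37 31. The helper name address_to_ascii is hypothetical and is not part of any kernel tooling.

```python
import struct


def address_to_ascii(addr: int) -> str:
    """Return the four bytes of a 32-bit value, in little-endian memory order, as text."""
    raw = struct.pack("<I", addr)  # "<I" = little-endian unsigned 32-bit
    # Non-ASCII bytes are replaced rather than raising, since most addresses are not text.
    return raw.decode("ascii", errors="replace")


fault_address = 0x31376F63         # from "page fault for address: 31376f63"
module_name = "ITXico7100Module"   # from "Modules linked in:"

text = address_to_ascii(fault_address)
print(text)                 # co71
print(text in module_name)  # True
```

A pointer that decodes to printable ASCII is consistent with string data having been written over it, but on its own it does not show which code did the overwriting.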