On 2020-11-20 15:13, Randy Dunlap wrote:
On 11/20/20 12:59 PM, K.R. Foley wrote:
On 2020-11-20 13:51, Jeff Moyer wrote:
Randy Dunlap <rdunlap@xxxxxxxxxxxxx> writes:
On 11/20/20 11:16 AM, K.R. Foley wrote:
I have found an issue that is triggered by running lsof. The problem is
reproducible, but not consistently. I have seen this issue occur on
multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
looks like it could be a race condition or the file pointer is being
corrupted. Any pointers on how to track this down? What additional
information can I provide?
Hi,
2 things in general:
a) Can you test with a more recent kernel?
b) Can you reproduce this without loading the proprietary & out-of-tree
kernel modules? They should never have been loaded after bootup.
I.e., don't just unload them -- that could leave something bad
behind.
Heh, the EIP contains part of the name of one of the modules:
[ 8057.297159] BUG: unable to handle page fault for address: 31376f63
                                                             ^^^^^^^^
Thanks for noticing that, Jeff. I should have seen it.
[ 8057.297219] Modules linked in: ITXico7100Module(O)
                                      ^^^^
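The match between the faulting address and the module name can be checked by hand. The sketch below assumes a 32-bit little-endian x86 machine (which the mention of EIP suggests) and reads the address as the four bytes it would occupy in memory. The helper name address_as_ascii is made up for illustration; the address and module name are copied from the oops lines above.

```python
def address_as_ascii(hex_address: str) -> str:
    """Return the ASCII text formed by a 32-bit address's little-endian bytes."""
    value = int(hex_address, 16)
    # Assumption: 32-bit little-endian layout, so the lowest byte comes first.
    raw = value.to_bytes(4, byteorder="little")
    return raw.decode("ascii", errors="replace")


fault_address = "31376f63"        # from the "unable to handle page fault" line
module_name = "ITXico7100Module"  # from the "Modules linked in" line

text = address_as_ascii(fault_address)
print(text)                 # co71
print(text in module_name)  # True
```

An address that decodes to readable text like this is a hint that string data was used as a pointer, which fits the suspicion that something in the module is overwriting or misusing memory. It does not by itself show where the corruption happens.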
Perhaps this is a dumb question, but how could this happen?
We don't know what is in that loadable kernel module, so we can't
give a definitive answer to your question, other than it's buggy.
Or maybe it was just written for an older kernel version.
Or a kernel with different build options/settings.
I am starting to look at this now. It was written for an older kernel by
someone else. Thank you for the tips.
Have you contacted IT support?
It would (will) be interesting to see if you can reproduce the problem
without these modules being loaded...
I kind of doubt it, but if it does still fail, it will give us something
to look at.
Knowing a little more now. I doubt it will be reproducible without the
module.
--
Regards,
K.R. Foley