Re: Crash in NVME tracing on 5.10LTS

On 1/11/24 10:00 AM, John Sperbeck wrote:
> On Thu, Jan 11, 2024 at 1:46 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Tue, Jan 09, 2024 at 10:17:22AM -0800, John Sperbeck wrote:
>>> With 5.10LTS (e.g., 5.10.206), on a machine using an NVME device, the
>>> following tracing commands will trigger a crash due to a NULL pointer
>>> dereference:
>>>
>>> KDIR=/sys/kernel/debug/tracing
>>> echo 1 > $KDIR/tracing_on
>>> echo 1 > $KDIR/events/nvme/enable
>>> echo "Waiting for trace events..."
>>> cat $KDIR/trace_pipe
>>>
>>> The backtrace looks something like this:
>>>
>>> Call Trace:
>>>  <IRQ>
>>>  ? __die_body+0x6b/0xb0
>>>  ? __die+0x9e/0xb0
>>>  ? no_context+0x3eb/0x460
>>>  ? ttwu_do_activate+0xf0/0x120
>>>  ? __bad_area_nosemaphore+0x157/0x200
>>>  ? select_idle_sibling+0x2f/0x410
>>>  ? bad_area_nosemaphore+0x13/0x20
>>>  ? do_user_addr_fault+0x2ab/0x360
>>>  ? exc_page_fault+0x69/0x180
>>>  ? asm_exc_page_fault+0x1e/0x30
>>>  ? trace_event_raw_event_nvme_complete_rq+0xba/0x170
>>>  ? trace_event_raw_event_nvme_complete_rq+0xa3/0x170
>>>  nvme_complete_rq+0x168/0x170
>>>  nvme_pci_complete_rq+0x16c/0x1f0
>>>  nvme_handle_cqe+0xde/0x190
>>>  nvme_irq+0x78/0x100
>>>  __handle_irq_event_percpu+0x77/0x1e0
>>>  handle_irq_event+0x54/0xb0
>>>  handle_edge_irq+0xdf/0x230
>>>  asm_call_irq_on_stack+0xf/0x20
>>>  </IRQ>
>>>  common_interrupt+0x9e/0x150
>>>  asm_common_interrupt+0x1e/0x40
>>>
>>> It looks to me like these two upstream commits were backported to 5.10:
>>>
>>> 679c54f2de67 ("nvme: use command_id instead of req->tag in trace_nvme_complete_rq()")
>>> e7006de6c238 ("nvme: code command_id with a genctr for use-after-free validation")
>>>
>>> But they depend on this upstream commit to initialize the 'cmd' field in
>>> some cases:
>>>
>>> f4b9e6c90c57 ("nvme: use driver pdu command for passthrough")
>>>
>>> Does it sound like I'm on the right track?  5.15LTS and later seem to be okay.
>>>
>>
>> If you apply that commit, does it solve the issue for you?
>>
>> thanks,
>>
>> greg k-h
> 
> The f4b9e6c90c57 ("nvme: use driver pdu command for passthrough")
> upstream commit doesn't apply cleanly to 5.10LTS.  If I adjust it to
> fit, then the crash no longer occurs for me.
> 
> A revert of 706960d328f5 ("nvme: use command_id instead of req->tag in
> trace_nvme_complete_rq()") from 5.10LTS also prevents the crash.
> 
> My leaning would be for a revert from 5.10LTS, but I think the
> maintainers would have better insight than me.  It's also possible
> that this isn't serious enough to worry about in general.  I don't
> really know.

Either solution is fine with me, doesn't really matter. I was wondering
how this ended up in stable, and it looks like it was one of those
auto-selections... Those seem particularly dangerous the further back
you go.

-- 
Jens Axboe
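
The reproduction commands quoted in this thread could be wrapped roughly as
follows, for anyone who wants to check whether a given 5.10LTS build is
affected. This is only a sketch, not something posted by the thread
participants. The function names nvme_trace_enable and nvme_trace_disable
are hypothetical helpers. The only paths used are the tracing_on and
events/nvme/enable files from the original report. On an affected kernel,
enabling these events is expected to crash the machine once NVMe I/O
completes, so this should only be tried on a test system.

```shell
#!/bin/sh
# Sketch only: wraps the tracing commands quoted in this thread.
# WARNING: on an affected 5.10LTS kernel, enabling these events can
# crash the machine as soon as an NVMe request completes.

# Hypothetical helper: enable NVMe trace events under tracing directory $1.
nvme_trace_enable() {
    dir=$1
    # Refuse to continue if the control files are missing or not writable.
    for f in "$dir/tracing_on" "$dir/events/nvme/enable"; do
        if [ ! -w "$f" ]; then
            echo "cannot write $f (is debugfs mounted, and are you root?)" >&2
            return 1
        fi
    done
    echo 1 > "$dir/tracing_on" && echo 1 > "$dir/events/nvme/enable"
}

# Hypothetical helper: turn the NVMe trace events back off.
nvme_trace_disable() {
    echo 0 > "$1/events/nvme/enable"
}
```

A run equivalent to the original report would be
`nvme_trace_enable /sys/kernel/debug/tracing` followed by
`cat /sys/kernel/debug/tracing/trace_pipe`, with
`nvme_trace_disable /sys/kernel/debug/tracing` afterwards if the machine
survives.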