Re: Kernel oops with 6.14 when enabling TLS

On 3/4/25 08:58, Hannes Reinecke wrote:
> On 3/3/25 23:02, Vlastimil Babka wrote:
>> On 3/3/25 17:15, Vlastimil Babka wrote:
>>> On 3/3/25 16:48, Matthew Wilcox wrote:
>>>> You need to turn on the debugging options Vlastimil mentioned and try to
>>>> figure out what nvme is doing wrong.
>>>
>>> Agree, looks like some error path going wrong?
>>> Since there seems to be actual non-large kmalloc usage involved, another
>>> debug parameter that could help: CONFIG_SLUB_DEBUG=y, and boot with
>>> "slab_debug=FZPU,kmalloc-*"
>> 
>> Also make sure you have CONFIG_DEBUG_VM please.
>> 
> Here you go:
> 
> [  134.506802] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101ef8
> [  134.509253] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [  134.511594] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
> [  134.513556] page_type: f5(slab)
> [  134.513563] raw: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810 ffff8881000402f0
> [  134.513568] raw: 0000000000000000 00000000000a000a 00000000f5000000 0000000000000000
> [  134.513572] head: 0017ffffc0000040 ffff888100041b00 ffffea0004a90810 ffff8881000402f0
> [  134.513575] head: 0000000000000000 00000000000a000a 00000000f5000000 0000000000000000
> [  134.513579] head: 0017ffffc0000003 ffffea000407be01 ffffffffffffffff 0000000000000000
> [  134.513583] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
> [  134.513585] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
> [  134.513615] ------------[ cut here ]------------
> [  134.529822] kernel BUG at ./include/linux/mm.h:1455!

Yeah, just as I suspected, folio_get() says the refcount is 0.
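
For reference, the condition in the dump flags more than an exact 0. Because of the unsigned wraparound, it catches any refcount in [-127, 0], which covers both "already freed/frozen" and "underflowed a little". Below is a minimal userspace sketch of just that arithmetic. The function name is illustrative, and the refcount is modelled as a plain int rather than the kernel's atomic:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same expression as printed by VM_BUG_ON_FOLIO() in the dump:
 *   ((unsigned int) folio_ref_count(folio) + 127u <= 127u)
 * Illustrative userspace version; refcount is a plain int here.
 */
static bool ref_zero_or_close_to_overflow(int refcount)
{
	/*
	 * Adding 127u maps refcounts in [-127, 0] onto [0, 127]
	 * (negative values wrap around); every other int maps above 127.
	 */
	return (unsigned int)refcount + 127u <= 127u;
}

/* Compile-time sanity check: a refcount of exactly 0 is flagged. */
static_assert((unsigned int)0 + 127u <= 127u, "refcount 0 must be flagged");
```

So 0, -1 and -127 all trip the check, while 1 and -128 do not. The page dumped above has refcount:0.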

> [  134.529835] Oops: invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
> [  134.529843] CPU: 0 UID: 0 PID: 274 Comm: kworker/0:1H Kdump: loaded Tainted: G            E      6.14.0-rc4-default+ #309 03b131f1ef70944969b40df9d90a283ed638556f
> [  134.536577] Tainted: [E]=UNSIGNED_MODULE
> [  134.536580] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [  134.536583] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
> [  134.536595] RIP: 0010:__iov_iter_get_pages_alloc+0x676/0x710
> [  134.542810] Code: e8 4c 39 e0 49 0f 47 c4 48 01 45 08 48 29 45 18 e9 90 fa ff ff 48 83 ef 01 e9 7f fe ff ff 48 c7 c6 40 57 4f 82 e8 6a e2 ce ff <0f> 0b e8 43 b8 b1 ff eb c5 f7 c1 ff 0f 00 00 48 89 cf 0f 85 4f ff
> [  134.542816] RSP: 0018:ffffc900004579d8 EFLAGS: 00010282
> [  134.542821] RAX: 000000000000005c RBX: ffffc90000457a90 RCX: 0000000000000027
> [  134.542825] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88817f423748
> [  134.542828] RBP: ffffc90000457d60 R08: 0000000000000000 R09: 0000000000000001
> [  134.554485] R10: ffffc900004579c0 R11: ffffc90000457720 R12: 0000000000000000
> [  134.554488] R13: ffffea000407be40 R14: ffffc90000457a70 R15: ffffc90000457d60
> [  134.554495] FS:  0000000000000000(0000) GS:ffff88817f400000(0000) knlGS:0000000000000000
> [  134.554499] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  134.554502] CR2: 0000556b0675b600 CR3: 0000000106bd8000 CR4: 0000000000350ef0
> [  134.554509] Call Trace:
> [  134.554512]  <TASK>
> [  134.554516]  ? __die_body+0x1a/0x60
> [  134.554525]  ? die+0x38/0x60
> [  134.554531]  ? do_trap+0x10f/0x120
> [  134.554538]  ? __iov_iter_get_pages_alloc+0x676/0x710
> [  134.568839]  ? do_error_trap+0x64/0xa0
> [  134.568847]  ? __iov_iter_get_pages_alloc+0x676/0x710
> [  134.568855]  ? exc_invalid_op+0x53/0x60
> [  134.572489]  ? __iov_iter_get_pages_alloc+0x676/0x710
> [  134.572496]  ? asm_exc_invalid_op+0x16/0x20
> [  134.572512]  ? __iov_iter_get_pages_alloc+0x676/0x710
> [  134.576726]  ? __iov_iter_get_pages_alloc+0x676/0x710
> [  134.576733]  ? srso_return_thunk+0x5/0x5f
> [  134.576740]  ? ___slab_alloc+0x924/0xb60
> [  134.580253]  ? mempool_alloc_noprof+0x41/0x190
> [  134.580262]  ? tls_get_rec+0x3d/0x1b0 [tls 47f199c97f69357468c91efdbba24395e9dbfa77]
> [  134.580282]  iov_iter_get_pages2+0x19/0x30

Presumably that's __iov_iter_get_pages_alloc() doing get_page() either in
the " if (iov_iter_is_bvec(i)) " branch or via iter_folioq_get_pages()?

That doesn't work for a small (non-large) kmalloc() allocation from a slab
folio: after the frozen refcount conversion, slab folios no longer support
get_page().

The question is whether this is a mistake specific to this path that's easy
to fix, or whether more paths do this. At the very least, pinning the page
through a kmalloc() allocation from it is useless: the object itself has to
be kfree()'d, and that would never happen through put_page() dropping the
refcount to zero.

> [  134.580289]  sk_msg_zerocopy_from_iter+0x85/0x1d0
> [  134.580301]  ? srso_return_thunk+0x5/0x5f
> [  134.586842]  ? srso_return_thunk+0x5/0x5f
> [  134.586847]  ? __kmalloc_noprof+0x187/0x500
> [  134.586854]  ? srso_return_thunk+0x5/0x5f
> [  134.586859]  ? __sk_mem_raise_allocated+0x2ba/0x4a0
> [  134.591697]  ? srso_return_thunk+0x5/0x5f
> [  134.591703]  ? sk_page_frag_refill+0x19/0xb0
> [  134.591708]  ? srso_return_thunk+0x5/0x5f
> [  134.591712]  ? sk_msg_alloc+0x5a/0x2b0
> [  134.591722]  tls_sw_sendmsg+0x6bf/0x9b0 [tls 47f199c97f69357468c91efdbba24395e9dbfa77]
> [  134.598284]  __sock_sendmsg+0x98/0xc0
> [  134.598293]  sock_sendmsg+0x5c/0xa0
> [  134.600490]  ? srso_return_thunk+0x5/0x5f
> [  134.600495]  ? __sock_sendmsg+0x98/0xc0
> [  134.600500]  ? srso_return_thunk+0x5/0x5f
> [  134.600504]  ? sock_sendmsg+0x5c/0xa0
> [  134.600515]  nvme_tcp_try_send_data+0x13f/0x410 [nvme_tcp 71d3ffab2b48b41b11556946fd79065f8f8b0f42]
> [  134.607125]  ? __dequeue_entity+0x401/0x470
> [  134.607142]  nvme_tcp_try_send+0x299/0x330 [nvme_tcp 71d3ffab2b48b41b11556946fd79065f8f8b0f42]
> [  134.607153]  nvme_tcp_io_work+0x37/0xb0 [nvme_tcp 71d3ffab2b48b41b11556946fd79065f8f8b0f42]
> [  134.607162]  process_scheduled_works+0x97/0x400
> [  134.613657]  ? __pfx_worker_thread+0x10/0x10
> [  134.613663]  worker_thread+0x105/0x240
> [  134.613669]  ? __pfx_worker_thread+0x10/0x10
> [  134.613675]  kthread+0xec/0x200
> [  134.618136]  ? __pfx_kthread+0x10/0x10
> [  134.618144]  ret_from_fork+0x30/0x50
> [  134.618151]  ? __pfx_kthread+0x10/0x10
> [  134.618157]  ret_from_fork_asm+0x1a/0x30
> [  134.622519]  </TASK>
> [  134.622522] Modules linked in: tls(E) nvme_tcp(E) af_packet(E) 
> iscsi_ibft(E) iscsi_boot_sysfs(E) xfs(E) nls_iso8859_1(E) nls_cp437(E) 
> vfat(E) fat(E) iTCO_wdt(E) intel_rapl_msr(E) intel_pmc_bxt(E) 
> intel_rapl_common(E) iTCO_vendor_support(E) bnxt_en(E) i2c_i801(E) 
> i2c_mux(E) lpc_ich(E) i2c_smbus(E) joydev(E) mfd_core(E) 
> virtio_balloon(E) button(E) nvme_fabrics(E) nvme_keyring(E) nvme_core(E) 
> fuse(E) nvme_auth(E) efi_pstore(E) configfs(E) dmi_sysfs(E) ip_tables(E) 
> x_tables(E) hid_generic(E) usbhid(E) qxl(E) ahci(E) drm_client_lib(E) 
> libahci(E) drm_exec(E) xhci_pci(E) drm_ttm_helper(E) virtio_scsi(E) 
> libata(E) ttm(E) xhci_hcd(E) sd_mod(E) scsi_dh_emc(E) drm_kms_helper(E) 
> scsi_dh_rdac(E) ghash_clmulni_intel(E) scsi_dh_alua(E) sg(E) 
> sha512_ssse3(E) sha256_ssse3(E) drm(E) usbcore(E) scsi_mod(E) 
> sha1_ssse3(E) scsi_common(E) serio_raw(E) btrfs(E) blake2b_generic(E) 
> xor(E) raid6_pq(E) efivarfs(E) qemu_fw_cfg(E) virtio_rng(E) 
> aesni_intel(E) crypto_simd(E) cryptd(E)
> 
> Cheers,
> 
> Hannes
