Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5

On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:

> On Mon, Oct 30, 2023 at 04:56:03PM +0100, Jan Kara wrote:
> > On Mon 30-10-23 15:08:56, Mikulas Patocka wrote:
> > > On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:
> > > 
> > > > > Well, it would be possible that larger pages in a bio would trip e.g. bio
> > > > > splitting due to maximum segment size the disk supports (which can be e.g.
> > > > > 0xffff) and that upsets something somewhere. But this is pure
> > > > > speculation. We definitely need more debug data to be able to tell more.
> > > > 
> > > > I can collect more info, but I need some guidance how :) Some patch
> > > > adding extra debug messages?
> > > > Note I collect those via serial console (writing to disk doesn't work
> > > > when it freezes), and that has some limits in the amount of data I can
> > > > extract especially when printed quickly. For example sysrq-t is too much.
> > > > Or maybe there is some trick to it, like increasing log_buf_len?
> > > 
> > > If you can do more tests, I would suggest this:
> > > 
> > > We already know that it works with order 3 and doesn't work with order 4.
> > > 
> > > So, in the file include/linux/mmzone.h, change PAGE_ALLOC_COSTLY_ORDER 
> > > from 3 to 4 and in the file drivers/md/dm-crypt.c leave "unsigned int 
> > > order = PAGE_ALLOC_COSTLY_ORDER" there.
> > > 
> > > Does it deadlock or not?
> > > 
> > > That way, we can see whether the deadlock depends on 
> > > PAGE_ALLOC_COSTLY_ORDER or whether it is just a coincidence.
> > 
> > Good idea. Also if the kernel hangs, please find kcryptd processes. In what
> > state are they? If they are sleeping, please send what's in
> > /proc/<kcryptd-pid>/stack. Thanks!
> 
> Will do.
> 
> In the meantime, while testing a version with PAGE_ALLOC_COSTLY_ORDER=4
> and order=PAGE_ALLOC_COSTLY_ORDER, I'm getting a crash like this (see the
> important note below both traces):

Perhaps the kernel uses a hardcoded value "3" instead of 
PAGE_ALLOC_COSTLY_ORDER somewhere...

>     [   92.668486] BUG: unable to handle page fault for address: ffff8880c7b64098
>     [   92.668558] #PF: supervisor read access in kernel mode
>     [   92.668574] #PF: error_code(0x0000) - not-present page
>     [   92.668590] PGD 2a32067 P4D 2a32067 PUD 12868a067 PMD 0 
>     [   92.668617] Oops: 0000 [#1] PREEMPT SMP NOPTI
>     [   92.668637] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W          6.5.6-dirty #354
>     [   92.668658] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
>     [   92.668675] RIP: e030:__free_one_page+0x301/0x3e0
>     [   92.668704] Code: 02 0f 85 c1 fe ff ff 49 c1 e6 04 49 8d 4c 24 08 4a 8d 94 36 c0 00 00 00 48 8d 34 80 48 8d 04 70 4c 01 fa 48 c1 e0 03 49 01 c6 <4b> 8b b4 37 c0 00 00 00 48 89 4e 08 49 89 74 24 08 49 89 54 24 10
>     [   92.668738] RSP: e02b:ffffc90040154c60 EFLAGS: 00010006
>     [   92.668754] RAX: 0000000000000058 RBX: 0000000000000001 RCX: ffff888075f66bc0
>     [   92.668773] RDX: ffff8880c7b64098 RSI: 0000000000000005 RDI: 0000000000000000
>     [   92.668789] RBP: fffffe7a01d7d9ae R08: ffff888075f66b38 R09: fffffe7a01d7d9ac
>     [   92.668805] R10: ffffea0004b35bc8 R11: 0000000000000001 R12: ffff888075f66bb8
>     [   92.668821] R13: 0000000000000000 R14: 0000000051bfd4d8 R15: ffff888075f66b00
>     [   92.668854] FS:  0000000000000000(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
>     [   92.668874] CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
>     [   92.668889] CR2: ffff8880c7b64098 CR3: 0000000133cf8000 CR4: 0000000000050660
>     [   92.668912] Call Trace:
>     [   92.668924]  <IRQ>
>     [   92.668934]  ? __die+0x1e/0x60
>     [   92.668955]  ? page_fault_oops+0x178/0x4a0
>     [   92.668975]  ? exc_page_fault+0x14e/0x160
>     [   92.668994]  ? asm_exc_page_fault+0x26/0x30
>     [   92.669014]  ? __free_one_page+0x301/0x3e0
>     [   92.669027]  free_pcppages_bulk+0x11c/0x2b0
>     [   92.669042]  free_unref_page+0x10d/0x170
>     [   92.669058]  crypt_free_buffer_pages+0x1f4/0x250
>     [   92.669079]  crypt_endio+0x48/0x70
>     [   92.669094]  blk_mq_end_request_batch+0xd0/0x400
>     [   92.669114]  nvme_irq+0x6d/0x80
>     [   92.669132]  ? __pfx_nvme_pci_complete_batch+0x10/0x10
>     [   92.669148]  __handle_irq_event_percpu+0x42/0x1a0
>     [   92.669166]  handle_irq_event+0x33/0x70
>     [   92.669179]  handle_edge_irq+0x9e/0x240
>     [   92.669197]  handle_irq_desc+0x36/0x50
>     [   92.669215]  __evtchn_fifo_handle_events+0x1af/0x1d0
>     [   92.669236]  __xen_evtchn_do_upcall+0x70/0xd0
>     [   92.669255]  __xen_pv_evtchn_do_upcall+0x3d/0x70
>     [   92.669275]  xen_pv_evtchn_do_upcall+0xd9/0x110
>     [   92.669294]  </IRQ>
>     [   92.669302]  <TASK>
>     [   92.669310]  exc_xen_hypervisor_callback+0x8/0x20
>     [   92.669330] RIP: e030:xen_hypercall_xen_version+0xa/0x20
>     [   92.669348] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>     [   92.669380] RSP: e02b:ffffc900400c7e08 EFLAGS: 00000246
>     [   92.669394] RAX: 0000000000040011 RBX: 0000000000000000 RCX: ffffffff81fc222a
>     [   92.669411] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
>     [   92.669426] RBP: ffffc900400c7eb0 R08: 0000000000000000 R09: 00000000000001ba
>     [   92.669442] R10: 0000000000007ff0 R11: 0000000000000246 R12: ffff88818976c540
>     [   92.669458] R13: ffff888101e10000 R14: 0000000000000000 R15: 0000000000000402
>     [   92.669475]  ? xen_hypercall_xen_version+0xa/0x20
>     [   92.669493]  ? pmu_msr_read+0x3c/0xd0
>     [   92.669510]  ? xen_force_evtchn_callback+0xd/0x20
>     [   92.669526]  ? check_events+0x16/0x30
>     [   92.669540]  ? xen_irq_enable_direct+0x1d/0x30
>     [   92.669556]  ? finish_task_switch.isra.0+0x8e/0x270
>     [   92.669576]  ? __switch_to+0x165/0x3b0
>     [   92.669590]  ? __schedule+0x316/0x8b0
>     [   92.669609]  ? schedule_idle+0x25/0x40
>     [   92.669626]  ? cpu_startup_entry+0x25/0x30
>     [   92.669641]  ? cpu_bringup_and_idle+0x89/0xa0
>     [   92.669660]  ? asm_cpu_bringup_and_idle+0x9/0x10
>     [   92.669680]  </TASK>
>     [   92.669688] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep hid_multitouch snd_hda_core snd_seq snd_seq_device idma64 snd_pcm i2c_designware_platform iwlwifi i2c_designware_core snd_timer snd soundcore i2c_i801 i2c_smbus efivarfs i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
>     [   92.669901] CR2: ffff8880c7b64098
>     [   92.669915] ---[ end trace 0000000000000000 ]---
>     [   92.669930] RIP: e030:__free_one_page+0x301/0x3e0
>     [   92.669946] Code: 02 0f 85 c1 fe ff ff 49 c1 e6 04 49 8d 4c 24 08 4a 8d 94 36 c0 00 00 00 48 8d 34 80 48 8d 04 70 4c 01 fa 48 c1 e0 03 49 01 c6 <4b> 8b b4 37 c0 00 00 00 48 89 4e 08 49 89 74 24 08 49 89 54 24 10
>     [   92.669977] RSP: e02b:ffffc90040154c60 EFLAGS: 00010006
>     [   92.669992] RAX: 0000000000000058 RBX: 0000000000000001 RCX: ffff888075f66bc0
>     [   92.670008] RDX: ffff8880c7b64098 RSI: 0000000000000005 RDI: 0000000000000000
>     [   92.670024] RBP: fffffe7a01d7d9ae R08: ffff888075f66b38 R09: fffffe7a01d7d9ac
>     [   92.670040] R10: ffffea0004b35bc8 R11: 0000000000000001 R12: ffff888075f66bb8
>     [   92.670055] R13: 0000000000000000 R14: 0000000051bfd4d8 R15: ffff888075f66b00
>     [   92.670083] FS:  0000000000000000(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
>     [   92.670101] CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
>     [   92.670116] CR2: ffff8880c7b64098 CR3: 0000000133cf8000 CR4: 0000000000050660
>     [   92.670148] Kernel panic - not syncing: Fatal exception in interrupt
>     [   92.670177] ------------[ cut here ]------------
>     [   92.670188] WARNING: CPU: 5 PID: 0 at kernel/smp.c:766 smp_call_function_many_cond+0x4db/0x560
>     [   92.670219] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep hid_multitouch snd_hda_core snd_seq snd_seq_device idma64 snd_pcm i2c_designware_platform iwlwifi i2c_designware_core snd_timer snd soundcore i2c_i801 i2c_smbus efivarfs i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
>     [   92.670400] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G      D W          6.5.6-dirty #354
>     [   92.670419] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
>     [   92.670434] RIP: e030:smp_call_function_many_cond+0x4db/0x560
>     [   92.670455] Code: f0 52 1b 81 48 89 74 24 08 e8 d1 c8 f6 ff 48 8b 74 24 08 65 ff 0d 5d 70 e7 7e 0f 85 bb fd ff ff 0f 1f 44 00 00 e9 b1 fd ff ff <0f> 0b e9 69 fb ff ff 8b 7c 24 38 e8 d5 4f f7 ff 84 c0 0f 84 a8 fd
>     [   92.670486] RSP: e02b:ffffc900401549d0 EFLAGS: 00010006
>     [   92.670500] RAX: 0000000080010007 RBX: 0000000000000000 RCX: 0000000000000000
>     [   92.670516] RDX: 0000000000000000 RSI: ffffffff826f3c61 RDI: ffffffff82cbc1d0
>     [   92.670531] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000001
>     [   92.670546] R10: 00000000ffffdfff R11: ffffffff82a5ddc0 R12: 0000000000000005
>     [   92.670562] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>     [   92.670588] FS:  0000000000000000(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
>     [   92.670606] CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
>     [   92.670620] CR2: ffff8880c7b64098 CR3: 0000000133cf8000 CR4: 0000000000050660
>     [   92.670642] Call Trace:
>     [   92.670650]  <IRQ>
>     [   92.670659]  ? smp_call_function_many_cond+0x4db/0x560
>     [   92.670677]  ? __warn+0x7c/0x130
>     [   92.670693]  ? smp_call_function_many_cond+0x4db/0x560
>     [   92.670711]  ? report_bug+0x191/0x1c0
>     [   92.670726]  ? handle_bug+0x3c/0x80
>     [   92.670742]  ? exc_invalid_op+0x17/0x70
>     [   92.670758]  ? asm_exc_invalid_op+0x1a/0x20
>     [   92.670775]  ? smp_call_function_many_cond+0x4db/0x560
>     [   92.670793]  ? __pfx_stop_self+0x10/0x10
>     [   92.670811]  ? _printk+0x5f/0x80
>     [   92.670823]  ? __pfx_stop_self+0x10/0x10
>     [   92.670840]  smp_call_function+0x38/0x70
>     [   92.670857]  panic+0x19c/0x320
>     [   92.670873]  oops_end+0xd8/0xe0
>     [   92.670887]  page_fault_oops+0x19c/0x4a0
>     [   92.670905]  exc_page_fault+0x14e/0x160
>     [   92.670921]  asm_exc_page_fault+0x26/0x30
>     [   92.670937] RIP: e030:__free_one_page+0x301/0x3e0
>     [   92.670952] Code: 02 0f 85 c1 fe ff ff 49 c1 e6 04 49 8d 4c 24 08 4a 8d 94 36 c0 00 00 00 48 8d 34 80 48 8d 04 70 4c 01 fa 48 c1 e0 03 49 01 c6 <4b> 8b b4 37 c0 00 00 00 48 89 4e 08 49 89 74 24 08 49 89 54 24 10
>     [   92.670983] RSP: e02b:ffffc90040154c60 EFLAGS: 00010006
>     [   92.670997] RAX: 0000000000000058 RBX: 0000000000000001 RCX: ffff888075f66bc0
>     [   92.671013] RDX: ffff8880c7b64098 RSI: 0000000000000005 RDI: 0000000000000000
>     [   92.671028] RBP: fffffe7a01d7d9ae R08: ffff888075f66b38 R09: fffffe7a01d7d9ac
>     [   92.671044] R10: ffffea0004b35bc8 R11: 0000000000000001 R12: ffff888075f66bb8
>     [   92.671059] R13: 0000000000000000 R14: 0000000051bfd4d8 R15: ffff888075f66b00
>     [   92.671077]  free_pcppages_bulk+0x11c/0x2b0
>     [   92.671091]  free_unref_page+0x10d/0x170
>     [   92.671106]  crypt_free_buffer_pages+0x1f4/0x250
>     [   92.671122]  crypt_endio+0x48/0x70
>     [   92.671136]  blk_mq_end_request_batch+0xd0/0x400
>     [   92.671152]  nvme_irq+0x6d/0x80
>     [   92.671166]  ? __pfx_nvme_pci_complete_batch+0x10/0x10
>     [   92.671181]  __handle_irq_event_percpu+0x42/0x1a0
>     [   92.671196]  handle_irq_event+0x33/0x70
>     [   92.671208]  handle_edge_irq+0x9e/0x240
>     [   92.671224]  handle_irq_desc+0x36/0x50
>     [   92.671241]  __evtchn_fifo_handle_events+0x1af/0x1d0
>     [   92.671258]  __xen_evtchn_do_upcall+0x70/0xd0
>     [   92.671275]  __xen_pv_evtchn_do_upcall+0x3d/0x70
>     [   92.671292]  xen_pv_evtchn_do_upcall+0xd9/0x110
>     [   92.671308]  </IRQ>
>     [   92.671316]  <TASK>
>     [   92.671324]  exc_xen_hypervisor_callback+0x8/0x20
>     [   92.671342] RIP: e030:xen_hypercall_xen_version+0xa/0x20
>     [   92.671360] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>     [   92.671390] RSP: e02b:ffffc900400c7e08 EFLAGS: 00000246
>     [   92.671403] RAX: 0000000000040011 RBX: 0000000000000000 RCX: ffffffff81fc222a
>     [   92.671419] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
>     [   92.671434] RBP: ffffc900400c7eb0 R08: 0000000000000000 R09: 00000000000001ba
>     [   92.671449] R10: 0000000000007ff0 R11: 0000000000000246 R12: ffff88818976c540
>     [   92.671465] R13: ffff888101e10000 R14: 0000000000000000 R15: 0000000000000402
>     [   92.671481]  ? xen_hypercall_xen_version+0xa/0x20
>     [   92.671499]  ? pmu_msr_read+0x3c/0xd0
>     [   92.671514]  ? xen_force_evtchn_callback+0xd/0x20
>     [   92.671529]  ? check_events+0x16/0x30
>     [   92.671543]  ? xen_irq_enable_direct+0x1d/0x30
>     [   92.671560]  ? finish_task_switch.isra.0+0x8e/0x270
>     [   92.671579]  ? __switch_to+0x165/0x3b0
>     [   92.671593]  ? __schedule+0x316/0x8b0
>     [   92.671611]  ? schedule_idle+0x25/0x40
>     [   92.671631]  ? cpu_startup_entry+0x25/0x30
>     [   92.671644]  ? cpu_bringup_and_idle+0x89/0xa0
>     [   92.671663]  ? asm_cpu_bringup_and_idle+0x9/0x10
>     [   92.671683]  </TASK>
>     [   92.671691] ---[ end trace 0000000000000000 ]---
>     [   92.671782] Kernel Offset: disabled
>     (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> 
> 
> If I change order=PAGE_ALLOC_COSTLY_ORDER+1, then this:
> 
> 
>     [ 2205.112802] BUG: unable to handle page fault for address: ffffffff89630301
>     [ 2205.112866] #PF: supervisor write access in kernel mode
>     [ 2205.112882] #PF: error_code(0x0002) - not-present page
>     [ 2205.112899] PGD 2a35067 P4D 2a35067 PUD 2a36067 PMD 0 
>     [ 2205.112921] Oops: 0002 [#1] PREEMPT SMP NOPTI
>     [ 2205.112946] CPU: 0 PID: 12609 Comm: kworker/u12:9 Tainted: G        W          6.5.6-dirty #355
>     [ 2205.112979] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
>     [ 2205.112997] Workqueue: kcryptd/252:0 kcryptd_crypt
>     [ 2205.113022] RIP: e030:get_page_from_freelist+0x281/0x10c0
>     [ 2205.113044] Code: 6c 05 18 49 89 df 8b 5c 24 34 49 8b 47 18 48 39 c5 0f 84 85 02 00 00 49 8b 47 18 48 8b 48 08 48 8b 30 48 8d 50 f8 48 89 4e 08 <48> 89 31 48 b9 00 01 00 00 00 00 ad de 48 89 08 48 83 c1 22 48 89
>     [ 2205.113079] RSP: e02b:ffffc90041ea7c48 EFLAGS: 00010283
>     [ 2205.113096] RAX: ffffea00059b9248 RBX: 0000000000000001 RCX: ffffffff89630301
>     [ 2205.113117] RDX: ffffea00059b9240 RSI: ffffea00059b9201 RDI: 0000000000000001
>     [ 2205.113134] RBP: ffff888189631718 R08: 000000000000c000 R09: 000000000003cb07
>     [ 2205.113153] R10: 0000000000000001 R11: fefefefefefefeff R12: ffff888075f66b00
>     [ 2205.113171] R13: ffff888189631700 R14: 0000000000000000 R15: ffff888189631700
>     [ 2205.113208] FS:  0000000000000000(0000) GS:ffff888189600000(0000) knlGS:0000000000000000
>     [ 2205.113229] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>     [ 2205.113245] CR2: ffffffff89630301 CR3: 00000001049d0000 CR4: 0000000000050660
>     [ 2205.113272] Call Trace:
>     [ 2205.113285]  <TASK>
>     [ 2205.113295]  ? __die+0x1e/0x60
>     [ 2205.113312]  ? page_fault_oops+0x178/0x4a0
>     [ 2205.113330]  ? exc_page_fault+0x14e/0x160
>     [ 2205.113346]  ? asm_exc_page_fault+0x26/0x30
>     [ 2205.113363]  ? get_page_from_freelist+0x281/0x10c0
>     [ 2205.113379]  ? get_page_from_freelist+0x227/0x10c0
>     [ 2205.113396]  __alloc_pages+0x1dd/0x300
>     [ 2205.113412]  crypt_page_alloc+0x29/0x60
>     [ 2205.113425]  mempool_alloc+0x81/0x1b0
>     [ 2205.113443]  kcryptd_crypt+0x293/0x4b0
>     [ 2205.113458]  process_one_work+0x1e0/0x3e0
>     [ 2205.113476]  worker_thread+0x49/0x3b0
>     [ 2205.113490]  ? _raw_spin_lock_irqsave+0x22/0x50
>     [ 2205.113510]  ? __pfx_worker_thread+0x10/0x10
>     [ 2205.113527]  kthread+0xef/0x120
>     [ 2205.113541]  ? __pfx_kthread+0x10/0x10
>     [ 2205.113554]  ret_from_fork+0x2c/0x50
>     [ 2205.113568]  ? __pfx_kthread+0x10/0x10
>     [ 2205.113581]  ret_from_fork_asm+0x1b/0x30
>     [ 2205.113598]  </TASK>
>     [ 2205.113606] Modules linked in: snd_hda_codec_hdmi snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_hda_codec_generic snd_sof ledtrig_audio snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm hid_multitouch snd_timer snd i2c_designware_platform i2c_designware_core i2c_i801 soundcore iwlwifi idma64 i2c_smbus i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput efivarfs
>     [ 2205.113795] CR2: ffffffff89630301
>     [ 2205.113808] ---[ end trace 0000000000000000 ]---
>     [ 2205.113822] RIP: e030:get_page_from_freelist+0x281/0x10c0
>     [ 2205.113839] Code: 6c 05 18 49 89 df 8b 5c 24 34 49 8b 47 18 48 39 c5 0f 84 85 02 00 00 49 8b 47 18 48 8b 48 08 48 8b 30 48 8d 50 f8 48 89 4e 08 <48> 89 31 48 b9 00 01 00 00 00 00 ad de 48 89 08 48 83 c1 22 48 89
>     [ 2205.113873] RSP: e02b:ffffc90041ea7c48 EFLAGS: 00010283
>     [ 2205.113887] RAX: ffffea00059b9248 RBX: 0000000000000001 RCX: ffffffff89630301
>     [ 2205.113903] RDX: ffffea00059b9240 RSI: ffffea00059b9201 RDI: 0000000000000001
>     [ 2205.113919] RBP: ffff888189631718 R08: 000000000000c000 R09: 000000000003cb07
>     [ 2205.113935] R10: 0000000000000001 R11: fefefefefefefeff R12: ffff888075f66b00
>     [ 2205.113950] R13: ffff888189631700 R14: 0000000000000000 R15: ffff888189631700
>     [ 2205.113980] FS:  0000000000000000(0000) GS:ffff888189600000(0000) knlGS:0000000000000000
>     [ 2205.113998] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>     [ 2205.114012] CR2: ffffffff89630301 CR3: 00000001049d0000 CR4: 0000000000050660
>     [ 2205.114038] note: kworker/u12:9[12609] exited with irqs disabled
>     [ 2205.114100] note: kworker/u12:9[12609] exited with preempt_count 2
> 
> Then I retried with order=PAGE_ALLOC_COSTLY_ORDER and
> PAGE_ALLOC_COSTLY_ORDER back at 3, and also got a similar crash.

So, does it mean that even allocating with order=PAGE_ALLOC_COSTLY_ORDER 
isn't safe?

Try enabling CONFIG_DEBUG_VM (it also needs CONFIG_DEBUG_KERNEL) and try 
to provoke a similar crash. Let's see if it crashes on one of the 
VM_BUG_ON statements.
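
Before rebuilding, it may help to confirm that both options really ended up
enabled in the build's .config. The following is only a minimal sketch of such
a check; the helper name missing_options and the idea of passing the .config
contents in as a string are assumptions made for illustration, not part of any
kernel tooling.

```python
import re

# Options named in the suggestion above; CONFIG_DEBUG_VM depends on
# CONFIG_DEBUG_KERNEL, so both must be built in ("=y").
REQUIRED = ("CONFIG_DEBUG_KERNEL", "CONFIG_DEBUG_VM")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text.

    config_text is the full text of a kernel .config file. Lines such as
    '# CONFIG_DEBUG_VM is not set' do not match and so count as missing.
    """
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in required if opt not in enabled]
```

For example, missing_options(open(".config").read()) returns an empty list
when both options are enabled, and otherwise lists the ones still to turn on.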

Mikulas

> Both happened only after logging into X session (lightdm -> Xfce) and
> starting xfce4-terminal. If I use text console/ssh/serial and leave
> lightdm at login screen, then it does not crash (this is in fact how I
> did most previous tests). I have no idea if it's related to some other
> bug somewhere else (graphics driver?), or simply higher memory usage due
> to the whole Xfce session running.
> It could also be a coincidence, since the sample size is rather small...
> But it could also be some memory corruption that, depending on memory
> layout, sometimes results in a crash and sometimes in "just" a storage freeze.
> 
> Note this is all still on top of 6.5.6 with the changes we discuss here. If
> you believe it's another issue that got fixed in the meantime, I can
> switch to another version, but otherwise I'd like to limit changes.
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> 
