On Fri, 20 Dec 2019, Eric Wheeler wrote: > On Wed, 25 Sep 2019, Eric Wheeler wrote: > > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi. > > We have been using the 4.19 branch for months without issue; we just > > switched to MQ and we seem to have hit this BUG_ON. Whether or not MQ is > > related to the issue, I don't know, maybe coincidence: > > > > static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b) > > { > > int r; > > enum allocation_event ev; > > struct sm_disk *smd = container_of(sm, struct sm_disk, sm); > > > > /* FIXME: we should loop round a couple of times */ > > r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b); > > if (r) > > return r; > > > > smd->begin = *b + 1; > > r = sm_ll_inc(&smd->ll, *b, &ev); > > if (!r) { > > BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > smd->nr_allocated_this_transaction++; > > } > > Hello all, > > We hit this BUG_ON again, this time with 4.19.86 with > scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as > additionally reported by Markus Schade: > > https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html > and > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398 > > In our case, the most latest trace (below) is from a different system that > has been stable for years on Linux 4.1 with tmeta direct on the SSD. > We updated to 4.19.86 a few weeks ago and just hit this, what Mike > Snitzer explains to be an allocator race: > > On Wed, 25 Sep 2019, Mike Snitzer wrote: > > > [199391.691714] dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool] > > > [199391.692411] alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool] > > > [199391.693142] process_cell+0x2a3/0x550 [dm_thin_pool] > > > [199391.693852] ? sort+0x17b/0x270 > > > [199391.694527] ? u32_swap+0x10/0x10 > > > [199391.695192] do_worker+0x268/0x9a0 [dm_thin_pool] > > > [199391.695890] process_one_work+0x171/0x370 > > > [199391.696640] worker_thread+0x49/0x3f0 > > > [199391.697332] kthread+0xf8/0x130 > > > [199391.697988] ? max_active_store+0x80/0x80 > > > [199391.698659] ? kthread_bind+0x10/0x10 > > > [199391.699281] ret_from_fork+0x1f/0x40 > > > > The stack shows the call to sm_disk_new_block() is due to > > dm_pool_alloc_data_block(). > > > > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is > > getting called without the passed 'ev' being set to SM_ALLOC. Only > > drivers/md/persistent-dat/dm-space-map-common.c:sm_ll_mutate() sets > > SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate() > > > > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll > > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert() > > returns 0 -- which simply means success. And on success > > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate). > > > > sm_ll_mutate() decided to _not_ set SM_ALLOC because either: > > 1) ref_count wasn't set > > or > > 2) old was identified > > > > So all said: somehow a new data block was found to already be in use. > > _WHY_ that is the case isn't clear from this stack... > > > > But it does speak to the possibility of data block allocation racing > > with other operations to the same block. Which implies missing locking. > > Where would you look to add locking do you think? > > > But that's all I've got so far... I'll review past dm-thinp changes with > > all this in mind and see what turns up. But Joe Thornber (ejt) likely > > needs to have a look at this too. > > > > But could it be that bcache is the source of the data device race (same > > block used concurrently)? And DM thinp is acting as the canary in the > > coal mine? > > As Marcus has shown, this bug triggers without bcache. > > > Other questions: > > 1) Can sm_disk_new_block() fail more gracefully than BUG_ON? For example: > > + spin_lock(&lock); /* protect smd->begin */ > smd->begin = *b + 1; > r = sm_ll_inc(&smd->ll, *b, &ev); > if (!r) { > - BUG_ON(ev != SM_ALLOC); > smd->nr_allocated_this_transaction++; > } > + else { > + r = -ENOSPC; > + smd->begin = *b - 1; > + } > + spin_unlock(&lock); Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n), all other times MQ has been turned on. I'm trying the -ENOSPC hack which will flag the pool as being out of space so I can recover more gracefully than a BUG_ON. Here's a first-draft patch, maybe the spinlock will even prevent the issue. Compile tested, I'll try on a real system tomorrow. Comments welcome: diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c index 32adf6b..cb27a20 100644 --- a/drivers/md/persistent-data/dm-space-map-disk.c +++ b/drivers/md/persistent-data/dm-space-map-disk.c @@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b) return r; } +static DEFINE_SPINLOCK(smd_lock); static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b) { int r; @@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b) struct sm_disk *smd = container_of(sm, struct sm_disk, sm); /* FIXME: we should loop round a couple of times */ + spin_lock(&smd_lock); r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b); - if (r) + if (r) { + spin_unlock(&smd_lock); return r; + } smd->begin = *b + 1; r = sm_ll_inc(&smd->ll, *b, &ev); if (!r) { - BUG_ON(ev != SM_ALLOC); - smd->nr_allocated_this_transaction++; + if (ev == SM_ALLOC) + smd->nr_allocated_this_transaction++; + else { + /* Not actually out of space, this is a bug: + * https://lore.kernel.org/linux-block/20190925200138.GA20584@xxxxxxxxxx/ + */ + WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space."); + r = -ENOSPC; + smd->begin = *b - 1; + } } + spin_unlock(&smd_lock); + return r; } -- Eric Wheeler > > The lock might protect smd->begin, but I'm not sure how &smd->ll might > have been modified by sm_ll_inc(). However, since ll->save_ie() failed in > sm_ll_mutate() then perhaps this is safe. What do you think? > > Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out > of space, but I would take it over a BUG_ON. > > 2) If example #1 above returned -EAGAIN, how might alloc_data_block be > taught retry? This bug shows up weeks or months apart, even on heavily > loaded systems with ~100 live thin volumes, so retrying would be fine > IMHO. > > 3) In the thread from June, Marcus says: > "Judging from the call trace, my guess is that there is somewhere > a race condition, when a new block needs to be allocated which has > still to be discarded." > > Is this discard sitation possible? Wouldn't the bio prison prevent this? > > -- > Eric Wheeler > www.datawall.us > > > > Here is the new trace, old trace below: > > kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178! > invalid opcode: 0000 [#1] SMP NOPTI > CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1 > Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c 11/30/2012 > Workqueue: dm-thin do_worker [dm_thin_pool] > RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 > RSP: 0018:ffffc90007237c78 EFLAGS: 00010297 > RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000 > RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00 > RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98 > R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48 > R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a > FS: 0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0 > Call Trace: > dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool] > alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool] > process_cell+0x2a3/0x550 [dm_thin_pool] > ? mempool_alloc+0x6f/0x180 > ? sort+0x17b/0x270 > ? u32_swap+0x10/0x10 > process_deferred_bios+0x1af/0x870 [dm_thin_pool] > do_worker+0x94/0xe0 [dm_thin_pool] > process_one_work+0x171/0x370 > worker_thread+0x49/0x3f0 > kthread+0xf8/0x130 > ? max_active_store+0x80/0x80 > ? kthread_bind+0x10/0x10 > ret_from_fork+0x1f/0x40 > > > > > [199391.677689] ------------[ cut here ]------------ > > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178! > > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI > > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1 > > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015 > > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool] > > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 > > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297 > > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000 > > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00 > > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98 > > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58 > > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0 > > [199391.688843] FS: 0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000 > > [199391.689571] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0 > > [199391.690984] Call Trace: > > [199391.691714] dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool] > > [199391.692411] alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool] > > [199391.693142] process_cell+0x2a3/0x550 [dm_thin_pool] > > [199391.693852] ? sort+0x17b/0x270 > > [199391.694527] ? u32_swap+0x10/0x10 > > [199391.695192] do_worker+0x268/0x9a0 [dm_thin_pool] > > [199391.695890] process_one_work+0x171/0x370 > > [199391.696640] worker_thread+0x49/0x3f0 > > [199391.697332] kthread+0xf8/0x130 > > [199391.697988] ? max_active_store+0x80/0x80 > > [199391.698659] ? kthread_bind+0x10/0x10 > > [199391.699281] ret_from_fork+0x1f/0x40 > > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper > > [199391.705631] syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod > > [199391.708083] ---[ end trace c31536d98046e8ec ]--- > > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 > > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297 > > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000 > > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00 > > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98 > > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58 > > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0 > > [199391.875231] FS: 0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000 > > [199391.875941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0 > > [199391.877317] Kernel panic - not syncing: Fatal exception > > [199391.878006] Kernel Offset: disabled > > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20) > > [199392.034277] Call Trace: > > [199392.034929] <IRQ> > > [199392.035576] native_apic_msr_write+0x2e/0x40 > > [199392.036228] arch_irq_work_raise+0x28/0x40 > > [199392.036877] irq_work_queue_on+0x83/0xa0 > > [199392.037518] irq_work_run_list+0x4c/0x70 > > [199392.038149] irq_work_run+0x14/0x40 > > [199392.038771] smp_call_function_single_interrupt+0x3a/0xd0 > > [199392.039393] call_function_single_interrupt+0xf/0x20 > > [199392.040011] </IRQ> > > [199392.040624] RIP: 0010:panic+0x209/0x25c > > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de > > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04 > > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006 > > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0 > > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5 > > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c > > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951 > > [199392.046486] oops_end+0xc1/0xd0 > > [199392.047149] do_trap+0x13d/0x150 > > [199392.047795] do_error_trap+0xd5/0x130 > > [199392.048427] ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199392.049048] invalid_op+0x14/0x20 > > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 > > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297 > > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000 > > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00 > > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98 > > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58 > > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0 > > [199392.054807] ? __wake_up_common_lock+0x87/0xc0 > > [199392.055342] ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data] > > [199392.055877] dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool] > > [199392.056410] alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool] > > [199392.056947] process_cell+0x2a3/0x550 [dm_thin_pool] > > [199392.057484] ? sort+0x17b/0x270 > > [199392.058016] ? u32_swap+0x10/0x10 > > [199392.058538] do_worker+0x268/0x9a0 [dm_thin_pool] > > [199392.059060] process_one_work+0x171/0x370 > > [199392.059576] worker_thread+0x49/0x3f0 > > [199392.060083] kthread+0xf8/0x130 > > [199392.060587] ? max_active_store+0x80/0x80 > > [199392.061086] ? kthread_bind+0x10/0x10 > > [199392.061569] ret_from_fork+0x1f/0x40 > > [199392.062038] ------------[ cut here ]------------ > > [199392.062508] sched: Unexpected reschedule of offline CPU#1! > > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40 > > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper > > [199392.067463] syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod > > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G D 4.19.75 #1 > > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015 > > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool] > > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40 > > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48 > > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082 > > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006 > > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0 > > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5 > > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640 > > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001 > > [199392.076512] FS: 0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000 > > [199392.077179] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0 > > [199392.078481] Call Trace: > > [199392.079117] <IRQ> > > [199392.079745] check_preempt_curr+0x6b/0x90 > > [199392.080373] ttwu_do_wakeup+0x19/0x130 > > [199392.080999] try_to_wake_up+0x1e2/0x460 > > [199392.081623] __wake_up_common+0x8f/0x160 > > [199392.082246] ep_poll_callback+0x1af/0x300 > > [199392.082860] __wake_up_common+0x8f/0x160 > > [199392.083470] __wake_up_common_lock+0x7a/0xc0 > > [199392.084074] irq_work_run_list+0x4c/0x70 > > [199392.084675] smp_call_function_single_interrupt+0x3a/0xd0 > > [199392.085277] call_function_single_interrupt+0xf/0x20 > > [199392.085879] </IRQ> > > [199392.086477] RIP: 0010:panic+0x209/0x25c > > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de > > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04 > > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006 > > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0 > > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5 > > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c > > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951 > > [199392.092255] oops_end+0xc1/0xd0 > > [199392.092894] do_trap+0x13d/0x150 > > [199392.093516] do_error_trap+0xd5/0x130 > > [199392.094122] ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199392.094718] invalid_op+0x14/0x20 > > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 > > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297 > > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000 > > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00 > > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98 > > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58 > > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0 > > [199392.100229] ? __wake_up_common_lock+0x87/0xc0 > > [199392.100744] ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data] > > [199392.101251] dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool] > > [199392.101752] alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool] > > [199392.102252] process_cell+0x2a3/0x550 [dm_thin_pool] > > [199392.102750] ? sort+0x17b/0x270 > > [199392.103242] ? u32_swap+0x10/0x10 > > [199392.103733] do_worker+0x268/0x9a0 [dm_thin_pool] > > [199392.104228] process_one_work+0x171/0x370 > > [199392.104714] worker_thread+0x49/0x3f0 > > [199392.105193] kthread+0xf8/0x130 > > [199392.105665] ? max_active_store+0x80/0x80 > > [199392.106132] ? kthread_bind+0x10/0x10 > > [199392.106601] ret_from_fork+0x1f/0x40 > > [199392.107069] ---[ end trace c31536d98046e8ed ]--- > > [199392.107544] ------------[ cut here ]------------ > > [199392.108017] sched: Unexpected reschedule of offline CPU#7! > > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40 > > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper > > [199392.112967] syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod > > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G D W 4.19.75 #1 > > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015 > > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool] > > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40 > > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48 > > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082 > > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006 > > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0 > > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529 > > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988 > > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8 > > [199392.121938] FS: 0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000 > > [199392.122589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0 > > [199392.123867] Call Trace: > > [199392.124495] <IRQ> > > [199392.125116] update_process_times+0x40/0x50 > > [199392.125742] tick_sched_handle+0x25/0x60 > > [199392.126367] tick_sched_timer+0x37/0x70 > > [199392.126987] __hrtimer_run_queues+0xfb/0x270 > > [199392.127601] hrtimer_interrupt+0x122/0x270 > > [199392.128211] smp_apic_timer_interrupt+0x6a/0x140 > > [199392.128819] apic_timer_interrupt+0xf/0x20 > > [199392.129423] </IRQ> > > [199392.130020] RIP: 0010:panic+0x209/0x25c > > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de > > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 > > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006 > > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0 > > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5 > > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c > > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951 > > [199392.135792] oops_end+0xc1/0xd0 > > [199392.136444] do_trap+0x13d/0x150 > > [199392.137094] do_error_trap+0xd5/0x130 > > [199392.137740] ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199392.138383] invalid_op+0x14/0x20 > > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data] > > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 > > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297 > > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000 > > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00 > > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98 > > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58 > > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0 > > [199392.144301] ? __wake_up_common_lock+0x87/0xc0 > > [199392.144838] ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data] > > [199392.145374] dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool] > > [199392.145909] alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool] > > [199392.146435] process_cell+0x2a3/0x550 [dm_thin_pool] > > [199392.146949] ? sort+0x17b/0x270 > > [199392.147450] ? u32_swap+0x10/0x10 > > [199392.147944] do_worker+0x268/0x9a0 [dm_thin_pool] > > [199392.148441] process_one_work+0x171/0x370 > > [199392.148937] worker_thread+0x49/0x3f0 > > [199392.149430] kthread+0xf8/0x130 > > [199392.149922] ? max_active_store+0x80/0x80 > > [199392.150406] ? kthread_bind+0x10/0x10 > > [199392.150883] ret_from_fork+0x1f/0x40 > > [199392.151353] ---[ end trace c31536d98046e8ee ]--- > > > > > > -- > > Eric Wheeler > > > > -- > > dm-devel mailing list > > dm-devel@xxxxxxxxxx > > https://www.redhat.com/mailman/listinfo/dm-devel > > > > > -- > dm-devel mailing list > dm-devel@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/dm-devel > >