On 2/11/22 4:57 PM, Zou Mingzhe wrote:
在 2022/2/11 16:39, Coly Li 写道:
On 2/11/22 2:39 PM, mingzhe.zou@xxxxxxxxxxxx wrote:
From: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
When multiple threads to check btree nodes in parallel, the main
thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag:
wait_event_interruptible(check_state->wait,
atomic_read(&check_state->started) == 0 ||
test_bit(CACHE_SET_IO_DISABLE, &c->flags));
However, the bch_btree_node_read and bch_btree_node_read_done
maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE
will be set. If the flag already set, the main thread return
error. At the same time, maybe some threads still running and
read NULL pointer, the kernel will crash.
Hi Mingzhe,
Could you please explain a bit more about "read NULL pointer"? Which
NULL pointer might be read in the above condition?
Thanks.
Coly Li
Hi ColyLi:
This is dmesg output information:
[956549.478889] bcache: bch_cache_set_error() bcache: error on
c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8:
[956549.492265] unsupported bset version at bucket 78226, block 0,
282468165 keys
[956549.492266] bcache: bch_cache_set_error() , disabling caching
What happens here? It seems the btree node is corrupted with unexpected
bset version.
[956549.531298] bcache: conditional_stop_bcache_device()
stop_when_cache_set_failed of escache2 is "auto" and cache is dirty,
stop it to avoid potential data corruption.
[956549.557162] bcache: cached_dev_detach_finish() Caching disabled
for sdc1
[956549.570292] bcache: bcache_device_free() escache2 stopped
[956549.603434] bcache: cache_set_free() Cache set
c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 unregistered
[956583.000094] bcache: bch_cache_set_error() bcache: error on
4f45b0b1-32ea-4d83-a2b1-caa169a95b35:
[956583.013567] unsupported bset version at bucket 25452, block 0,
208758840 keys
[956583.013569] bcache: bch_cache_set_error() , disabling caching
Similar corrupted btree node here.
The corrupted btree node should be the first thing to look into.
[956583.053296] bcache: conditional_stop_bcache_device()
stop_when_cache_set_failed of escache1 is "auto" and cache is dirty,
stop it to avoid potential data corruption.
[956583.079968] bcache: cached_dev_detach_finish() Caching disabled
for sdd1
[956583.093539] bcache: bcache_device_free() escache1 stopped
[956583.126677] bcache: cache_set_free() Cache set
4f45b0b1-32ea-4d83-a2b1-caa169a95b35 unregistered
[956589.522067] sdc: sdc1
[956590.291748] bcache: register_bdev() registered backing device sdc1
[956590.307925] bcache: cache_alloc() set
c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 with btree_buckets: 256, free:
128, free_inc: 11264, heap: 22528
[956590.406700] bcache: bch_cache_set_error() bcache: error on
c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8:
[956590.411369] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE
already set
[956590.412453] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE
already set
[956590.412455] bcache: bch_cache_set_error() bcache: error on
c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8:
[956590.412456] io error reading bucket 81152
[956590.412456] bcache: bch_cache_set_error() , disabling caching
[956590.412578] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000010
[956590.412594] PGD 0 P4D 0
[956590.412624] Oops: 0000 [#1] SMP PTI
[956590.412645] CPU: 12 PID: 55477 Comm: bch_btrchk[9] Kdump: loaded
Tainted: G OE --------- - -
4.18.0-147.5.1.el8_1.5es.16.x86_64 #1
[956590.412646] Hardware name: Tsinghua Tongfang THTF Chaoqiang
Server/072T6D, BIOS 2.4.3 01/17/2017
[956590.412859] RIP: 0010:bch_btree_node_read_done+0x1e9/0x400 [escache]
[956590.412868] Code: 11 00 00 45 89 d3 0f b7 c0 41 c1 e3 09 49 89 c6
41 0f af d3 49 d3 ee 4c 89 f1 48 63 d2 49 8d 1c 10 45 39 f1 0f 83 da
00 00 00 <4d> 8b 78 10 4c 39 7b 10 0f 85 d4 01 00 00 44 8b 43 18 41 83
f8 01
[956590.412870] RSP: 0018:ffffac308ec4fd58 EFLAGS: 00010283
[956590.412882] RAX: 0000000000000200 RBX: 0000000000035600 RCX:
0000000000000200
[956590.412883] RDX: 0000000000035600 RSI: 00000000000001ab RDI:
ffff911fcac20000
[956590.412889] RBP: ffff911f090f7000 R08: 0000000000000000 R09:
00000000000001ab
[956590.412890] R10: 0000000000000001 R11: 0000000000000200 R12:
ffff910fb3238000
[956590.412891] R13: 90135c78b99e07f5 R14: 0000000000000200 R15:
ffff9119994b4620
[956590.412896] FS: 0000000000000000(0000) GS:ffff91107f980000(0000)
knlGS:0000000000000000
[956590.412898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[956590.412902] bcache: cache_set_free() Cache set
c5f7cdbf-fe07-4bc3-99f5-47119a3d2af8 unregistered
[956590.412911] CR2: 0000000000000010 CR3: 0000001534e0a002 CR4:
00000000003626e0
[956590.412925] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[956590.412926] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[956590.412928] Call Trace:
[956590.412941] bch_btree_node_read+0xe7/0x1a0 [escache]
[956590.412953] ? bch_keybuf_init+0x60/0x60 [escache]
[956590.412976] ? bch_ptr_invalid+0x10/0x10 [escache]
[956590.413002] btree_node_prefetch+0x6e/0x90 [escache]
[956590.413025] bch_btree_check_thread+0x15c/0x270 [escache]
[956590.413089] ? finish_task_switch+0xd7/0x2b0
[956590.413099] ? bch_btree_check_recurse+0x1b0/0x1b0 [escache]
[956590.413131] kthread+0x112/0x130
[956590.413139] ? kthread_flush_work_fn+0x10/0x10
[956590.413164] ret_from_fork+0x35/0x40
[956590.413191] Modules linked in: iscsi_target_mod target_core_mod
vhost_net vhost tap tun tcp_diag udp_diag inet_diag unix_diag ext4
mbcache jbd2 nbd sch_htb act_police cls_u32 sch_ingress xt_connmark
xt_CHECKSUM ip_set ip_gre ip_tunnel gre ebtable_filter ebtables
ip6table_filter nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp
nf_conntrack_proto_gre veth nf_conntrack_netlink nfnetlink
ip6table_nat ip6_tables overlay iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi vxlan ip6_udp_tunnel udp_tunnel openvswitch
nf_nat_ipv6 nf_conncount ipt_REJECT nf_reject_ipv4 iptable_filter
xt_conntrack iptable_mangle xt_statistic xt_nat ipt_MASQUERADE
xt_comment xt_addrtype xt_recent xt_mark iptable_nat nf_nat_ipv4
nf_nat iptable_raw vfat fat intel_rapl sb_edac x86_pkg_temp_thermal
mxm_wmi intel_powerclamp coretemp
[956590.413654] kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf pcspkr
mei_me mei joydev lpc_ich sg ipmi_si acpi_power_meter wmi escache nfsd
nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4
auth_rpcgss br_netfilter bridge nfs_acl lockd grace stp llc sunrpc
nf_conntrack ip_tables xfs sd_mod drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm ixgbe nvme drm ahci libahci
libata nvme_core igb mdio dca i2c_algo_bit megaraid_sas(OE) dm_mirror
dm_region_hash dm_log dm_mod rbd libceph dns_resolver ipmi_watchdog
ipmi_devintf ipmi_msghandler drbd_transport_tcp(OE) drbd(OE) libcrc32c
crc32c_intel
[956590.413942] CR2: 0000000000000010
I see some print "bcache: bch_cache_set_error() bcache: error on", but
I don't have any more information on how to go wrong.
Then the 'bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already
set' mean the CACHE_SET_IO_DISABLE flag has been set and the main
thread will return error.
At the same time, some threads is running bch_btree_check and read
NULL pointer.
As I asked before, can you tell which NULL pointer was read to trigger
the NULL pointer deference ? Then I can evaluate whether the fix is
efficient.
[snipped]
This patch change the event wait condition, the main thread must
wait for all threads to stop.
Signed-off-by: Mingzhe Zou <mingzhe.zou@xxxxxxxxxxxx>
---
drivers/md/bcache/btree.c | 6 ++++--
drivers/md/bcache/writeback.c | 6 ++++--
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 88c573eeb598..ad9f16689419 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2060,9 +2060,11 @@ int bch_btree_check(struct cache_set *c)
}
}
+ /*
+ * Must wait for all threads to stop.
+ */
wait_event_interruptible(check_state->wait,
- atomic_read(&check_state->started) == 0 ||
- test_bit(CACHE_SET_IO_DISABLE, &c->flags));
+ atomic_read(&check_state->started) == 0);
for (i = 0; i < check_state->total_threads; i++) {
if (check_state->infos[i].result) {
diff --git a/drivers/md/bcache/writeback.c
b/drivers/md/bcache/writeback.c
index c7560f66dca8..68d3dd6b4f11 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -998,9 +998,11 @@ void bch_sectors_dirty_init(struct
bcache_device *d)
}
}
+ /*
+ * Must wait for all threads to stop.
+ */
wait_event_interruptible(state->wait,
- atomic_read(&state->started) == 0 ||
- test_bit(CACHE_SET_IO_DISABLE, &c->flags));
+ atomic_read(&state->started) == 0);
out:
kfree(state);