Re: [PATCH] block: drbd: defer calling kref_put until end of drbd_delete_device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/18/20 10:52 PM, Sarah Newman wrote:
At least once I saw:

drbd resource37: ASSERTION FAILED:
   connection->current_epoch->list not empty
drbd resource37: Connection closed
drbd resource37: conn( Disconnecting -> StandAlone )
drbd resource37: receiver terminated
drbd resource37: Terminating drbd_r_resource
block drbd37: disk( UpToDate -> Failed )
block drbd37: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd37: disk( Failed -> Diskless )
general protection fault: 0000 [#1] SMP NOPTI
CPU: 0 PID: 18526 Comm: drbdsetup-84 Not tainted 5.4.46-1.el7.x86_64 #1
RIP: e030:kobject_uevent_env+0x1d/0x660
RSP: e02b:ffffc900757a7a10 EFLAGS: 00010246
RAX: 0000000000000001 RBX: fdfdfdfdfdfe023d RCX: ffff8880606f9870
RDX: 0000000000000000 RSI: 0000000000000001 RDI: fdfdfdfdfdfe023d
RBP: ffff8880606f9870 R08: 0000000000000040 R09: ffffffffc01ae500
R10: ffffc900757a7aa8 R11: ffffffffc01f0b58 R12: ffff8880606f9800
R13: ffff88800fb5dc00 R14: ffffffff824055e5 R15: ffff88800fb5dc48
FS:  00007fbbc98e4740(0000) GS:ffff888188a00000(0000)
      knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f978b44c000 CR3: 0000000007926000 CR4: 0000000000040660
Call Trace:
? error_exit+0x5/0x20
blk_integrity_del+0x1a/0x2b
del_gendisk+0x27/0x2f0
drbd_delete_device+0xcc/0x100 [drbd]
adm_del_minor+0xc5/0xe0 [drbd]
drbd_adm_down+0x13f/0x1f0 [drbd]
genl_family_rcv_msg+0x1d2/0x410
genl_rcv_msg+0x47/0x90
? __kmalloc_node_track_caller+0x217/0x2e0
? genl_family_rcv_msg+0x410/0x410
netlink_rcv_skb+0x49/0x110
genl_rcv+0x24/0x40
netlink_unicast+0x191/0x220
netlink_sendmsg+0x21d/0x3f0
sock_sendmsg+0x5b/0x60
sock_write_iter+0x97/0x100
new_sync_write+0x12d/0x1d0
vfs_write+0xa5/0x1a0
ksys_write+0x59/0xd0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fbbc93f1a00

Which I traced back to drbd_destroy_device being called early, as
drbd_destroy_device sets memory to 0xfd and one of the pointers
observed was an offset from 0xfdfdfdfdfdfdfdfd.

Make it so that the system can be recovered even if we see this bug,
and call out if we unexpectedly do not free the device at the end
of drbd_delete_device.

Signed-off-by: Sarah Newman <srn@xxxxxxxxx>
---
  drivers/block/drbd/drbd_main.c | 16 +++++++++++++---
  1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a18155cdce41..9148713e8b3b 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2935,6 +2935,7 @@ void drbd_delete_device(struct drbd_device *device)
  	struct drbd_resource *resource = device->resource;
  	struct drbd_connection *connection;
  	struct drbd_peer_device *peer_device;
+	unsigned int minor = device_to_minor(device);
/* move to free_peer_device() */
  	for_each_peer_device(peer_device, device)
@@ -2942,15 +2943,24 @@ void drbd_delete_device(struct drbd_device *device)
  	drbd_debugfs_device_cleanup(device);
  	for_each_connection(connection, resource) {
  		idr_remove(&connection->peer_devices, device->vnr);
-		kref_put(&device->kref, drbd_destroy_device);
  	}
+	/* There is a problem somewhere with the reference counting for
+	 * device->kref, such that at least once we saw the last kref_put before
+	 * the very last one actually call drbd_destroy_device. Since it should
+	 * be syntactically equivalent, move all the kref_puts to the end. We'll
+	 * then get a warning if calling kref_put underflows.
+	 */
  	idr_remove(&resource->devices, device->vnr);
-	kref_put(&device->kref, drbd_destroy_device);
  	idr_remove(&drbd_devices, device_to_minor(device));
-	kref_put(&device->kref, drbd_destroy_device);
  	del_gendisk(device->vdisk);
  	synchronize_rcu();
+	for_each_connection(connection, resource) {
+		kref_put(&device->kref, drbd_destroy_device);
+	}
+	kref_put(&device->kref, drbd_destroy_device);
  	kref_put(&device->kref, drbd_destroy_device);
+	if (!kref_put(&device->kref, drbd_destroy_device))
+		pr_err("invalid kref for device %d\n", minor);
  }
static int __init drbd_init(void)


Added linux-block as a CC. I can resend this patch if necessary.

Checking in to see if the patch is overall suitable and if so, whether any changes or additional testing is required before merging.

Thanks, Sarah



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux