Re: [PATCH] ceph: don't abort linger request in abort_request()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 06/05/2018 04:25 PM, Ilya Dryomov wrote:
On Mon, Jun 4, 2018 at 12:25 PM, Dongsheng Yang
<dongsheng.yang@xxxxxxxxxxxx> wrote:
There is a problem if we abort a linger request and then request_reinit(),
because abort_request() aborted the osd_request but did not put the lreq->r_klref;
Then we will hit this WARN_ON(kref_read(&req->r_kref) != 1);

[  609.492250] WARNING: CPU: 6 PID: 4933 at net/ceph//osd_client.c:558 request_reinit+0x116/0x130 [libceph]
[  609.501561] Modules linked in: rbd(OE) libceph(OE) dns_resolver tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg cfg80211 rfkill snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core pcbc mbcache aesni_intel snd_hwdep crypto_simd snd_seq cryptd jbd2 glue_helper snd_seq_device nfsd snd_pcm snd_timer snd virtio_balloon auth_rpcgss pcspkr soundcore i2c_piix4 nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c virtio_blk virtio_console ata_generic pata_acpi 8139too qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata crc32c_intel virtio_pci i2c_core virtio_ring 8139cp virtio mii floppy serio_raw dm_mirror dm_region_hash
[  609.518259]  dm_log dm_mod dax [last unloaded: libceph]
[  609.520015] CPU: 6 PID: 4933 Comm: kworker/6:0 Tainted: G           OE     4.17.0-rc6+ #16
[  609.522261] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  609.524135] Workqueue: events handle_timeout [libceph]
[  609.525887] RIP: 0010:request_reinit+0x116/0x130 [libceph]
[  609.527723] RSP: 0018:ffffaf90833ebd78 EFLAGS: 00010202
[  609.529502] RAX: 0000000000000002 RBX: ffff89bb140f1f00 RCX: 0000000000000000
[  609.531576] RDX: 0000000000000000 RSI: ffff89bb5f196978 RDI: ffff89bb5f196978
[  609.533643] RBP: ffff89bb52b47800 R08: 0000000000000000 R09: 0000000000000000
[  609.535737] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff89baf96632e0
[  609.537836] R13: 0000000000000001 R14: 0000000000000000 R15: ffff89bb516af768
[  609.539919] FS:  0000000000000000(0000) GS:ffff89bb5f180000(0000) knlGS:0000000000000000
[  609.542151] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  609.544009] CR2: 000055ed56747020 CR3: 000000032e00a003 CR4: 00000000000206e0
[  609.546111] Call Trace:
[  609.547482]  handle_timeout+0x4e1/0x750 [libceph]
[  609.549208]  ? pick_next_task_fair+0x273/0x630
[  609.550867]  ? __switch_to+0xa8/0x480
[  609.552422]  process_one_work+0x158/0x360
[  609.554012]  worker_thread+0x47/0x3e0
[  609.555560]  kthread+0xf8/0x130
[  609.557018]  ? max_active_store+0x80/0x80
[  609.558606]  ? kthread_bind+0x10/0x10
[  609.560142]  ret_from_fork+0x35/0x40

Signed-off-by: Dongsheng Yang <dongsheng.yang@xxxxxxxxxxxx>
---
  net/ceph/osd_client.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index ab287e5..406ecdf 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2385,6 +2385,9 @@ static void cancel_request(struct ceph_osd_request *req)

  static void abort_request(struct ceph_osd_request *req, int err)
  {
+       if (req->r_linger)
+               return;
+
         dout("%s req %p tid %llu err %d\n", __func__, req, req->r_tid, err);

         cancel_map_check(req);
I think all OSD requests should be aborted, without exception.
Otherwise abort_request() and its users become less useful...

This WARN can be triggered under (almost) normal conditions, without
abort_request() in the mix -- all it takes is a timing irregularity [1].
A proper fix is to probably get rid of request_reinit(), but it will
take some effort.
Hi Ilya,
   yes, that's not correct way to fix this problem. :)

thanx
Dongsheng


[1] http://tracker.ceph.com/issues/20996

Thanks,

                 Ilya



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux