On 2019/1/20 8:43, Yanjun Zhu wrote:
On 2019/1/20 8:38, Yanjun Zhu wrote:
On 2019/1/17 0:02, Bart Van Assche wrote:
On Wed, 2018-11-07 at 08:42 -0800, Bart Van Assche wrote:
Hello,
If I run the srp tests from the blktests test suite long enough against
kernel v4.20-rc1 then the complaint shown below appears. Has anyone else
already encountered this? This is how I run the srp tests:
(cd blktests && while ./check -q srp; do :; done)
Thanks,
Bart.
[ ... ]
This issue also occurs with kernel v5.0-rc2:
==================================================================
BUG: KASAN: use-after-free in rxe_resp_queue_pkt+0x2b/0x70 [rdma_rxe]
Read of size 1 at addr ffff88803fff7455 by task ksoftirqd/0/9
CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.0.0-rc2-dbg+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x86/0xca
print_address_description+0x71/0x239
kasan_report.cold.3+0x1b/0x3e
__asan_load1+0x47/0x50
rxe_resp_queue_pkt+0x2b/0x70 [rdma_rxe]
rxe_rcv+0x543/0xb00 [rdma_rxe]
rxe_loopback+0xe/0x10 [rdma_rxe]
rxe_requester+0x144c/0x2120 [rdma_rxe]
rxe_do_task+0xdd/0x170 [rdma_rxe]
tasklet_action_common.isra.14+0xc0/0x280
tasklet_action+0x3d/0x50
__do_softirq+0x128/0x5ae
run_ksoftirqd+0x35/0x50
smpboot_thread_fn+0x38b/0x490
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30
From this call trace, "rxe_do_task+0xdd/0x170 [rdma_rxe]" holds the
task->state_lock spinlock.
Then "rxe_resp_queue_pkt+0x2b/0x70 [rdma_rxe]" calls rxe_do_task
(through rxe_run_task), so rxe_resp_queue_pkt in turn tries to take
task->state_lock.
In the end, task->state_lock is already held, but rxe_resp_queue_pkt
still expects to acquire it.
So the following patch should fix this problem.
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index aca9f60f9b21..dc89562393e1 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -112,7 +112,9 @@ void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 	skb_queue_tail(&qp->req_pkts, skb);
 
 	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
+			(skb_queue_len(&qp->req_pkts) > 1) ||
+			((&qp->resp.task)->state == TASK_STATE_BUSY) ||
+			((&qp->resp.task)->state == TASK_STATE_ARMED);
 
 	rxe_run_task(&qp->resp.task, must_sched);
 }
Please test with the above patch.
Zhu Yanjun
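To make the intent of the first patch concrete, here is a minimal user-space model of the patched must_sched expression. It is a sketch under stated assumptions, not the driver code: the enum values and the opcode constant are hypothetical stand-ins for the rdma_rxe definitions, and the task state is passed in as a plain value instead of being read from qp->resp.task. A true result corresponds to rxe_run_task() handing the work to the responder tasklet; false corresponds to running it inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rdma_rxe task states; the real
 * definitions live in the driver headers and are not copied here. */
enum task_state { TASK_STATE_START, TASK_STATE_BUSY, TASK_STATE_ARMED };

/* Hypothetical opcode values; the model only cares whether the packet
 * is an RC RDMA READ request or not. */
enum { OPCODE_OTHER = 0, OPCODE_RC_RDMA_READ_REQUEST = 1 };

/*
 * Model of must_sched after the first patch: true means "defer to the
 * responder tasklet", false means "process the packet inline".
 */
static bool must_schedule(int opcode, size_t queue_len, enum task_state state)
{
	return opcode == OPCODE_RC_RDMA_READ_REQUEST || /* original term */
	       queue_len > 1 ||                          /* original term */
	       state == TASK_STATE_BUSY ||               /* added by the patch */
	       state == TASK_STATE_ARMED;                /* added by the patch */
}
```

For example, must_schedule(OPCODE_OTHER, 1, TASK_STATE_BUSY) is false for the original two-term expression but true with the patch, so a single queued packet is no longer processed inline while the responder task is busy.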
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index aca9f60f9b21..d3658c3de4a2 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -112,7 +112,8 @@ void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 	skb_queue_tail(&qp->req_pkts, skb);
 
 	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
+			(skb_queue_len(&qp->req_pkts) > 1) ||
+			spin_is_locked(&(&qp->resp.task)->state_lock);
 
 	rxe_run_task(&qp->resp.task, must_sched);
 }
Please test with the above patch.
Please ignore this mail.
Zhu Yanjun
Allocated by task 9:
save_stack+0x43/0xd0
__kasan_kmalloc.constprop.9+0xd0/0xe0
kasan_slab_alloc+0x16/0x20
kmem_cache_alloc_node+0xf1/0x380
__alloc_skb+0xa8/0x310
rxe_init_packet+0xc8/0x220 [rdma_rxe]
rxe_requester+0x61f/0x2120 [rdma_rxe]
rxe_do_task+0xdd/0x170 [rdma_rxe]
tasklet_action_common.isra.14+0xc0/0x280
tasklet_action+0x3d/0x50
__do_softirq+0x128/0x5ae
Freed by task 31:
save_stack+0x43/0xd0
__kasan_slab_free+0x13e/0x190
kasan_slab_free+0x13/0x20
kmem_cache_free+0xc7/0x350
kfree_skbmem+0x66/0xa0
kfree_skb+0x80/0x1b0
rxe_responder+0x6e7/0x37f0 [rdma_rxe]
rxe_do_task+0xdd/0x170 [rdma_rxe]
tasklet_action_common.isra.14+0xc0/0x280
tasklet_action+0x3d/0x50
__do_softirq+0x128/0x5ae
The buggy address belongs to the object at ffff88803fff7400
which belongs to the cache skbuff_head_cache of size 200
The buggy address is located 85 bytes inside of
200-byte region [ffff88803fff7400, ffff88803fff74c8)
The buggy address belongs to the page:
page:ffffea0000fffd80 count:1 mapcount:0 mapping:ffff88811abb9e00 index:0x0 compound_mapcount: 0
flags: 0x1fff000000010200(slab|head)
raw: 1fff000000010200 dead000000000100 dead000000000200 ffff88811abb9e00
raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88803fff7300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88803fff7380: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88803fff7400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                ^
ffff88803fff7480: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
ffff88803fff7500: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
==================================================================
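The offsets in this report can be cross-checked with a little arithmetic. The sketch below only reuses numbers printed above (the faulting address, the object start and the 200-byte skbuff_head_cache object size) and assumes generic KASAN's 8-byte shadow granule: each shadow byte in the "Memory state" dump describes 8 bytes of memory, so one 16-byte row covers 128 bytes. In the generic KASAN of this kernel era, fb marks a freed slab object and fc a slab redzone. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Numbers copied from the KASAN report. */
#define BAD_ADDR  0xffff88803fff7455ULL /* "Read of size 1 at addr" */
#define OBJ_START 0xffff88803fff7400ULL /* "belongs to the object at" */
#define OBJ_SIZE  200ULL                /* skbuff_head_cache object size */

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory and
 * 16 shadow bytes per row of the "Memory state" dump. */
#define SHADOW_GRANULE   8ULL
#define SHADOW_ROW_BYTES (16ULL * SHADOW_GRANULE)

/* Hypothetical helper: byte offset of addr inside the object. */
static uint64_t offset_in_object(uint64_t addr, uint64_t start)
{
	return addr - start;
}

/* Hypothetical helper: address label of the dump row covering addr. */
static uint64_t shadow_row_start(uint64_t addr)
{
	return addr & ~(SHADOW_ROW_BYTES - 1);
}

/* Hypothetical helper: 0-based index of addr's shadow byte in its row. */
static unsigned shadow_index_in_row(uint64_t addr)
{
	return (unsigned)((addr - shadow_row_start(addr)) / SHADOW_GRANULE);
}
```

With these numbers the read lands 85 bytes into the object (matching "located 85 bytes inside of"), the object ends at ffff88803fff74c8, and the faulting address maps to shadow byte index 10 of the ffff88803fff7400 row, i.e. inside the freed (fb) object. In the ffff88803fff7480 row the object's last 72 bytes take 9 shadow bytes, which is exactly where fb turns into fc.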