Re: [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent

"NeilBrown" <neilb@xxxxxxx> · Fri, 25 Mar 2022 12:49:05 +1100

On Tue, 22 Mar 2022, trondmy@xxxxxxxxxx wrote:
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> 
> Make sure that rpciod and xprtiod are always using the same slab
> allocation modes.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
....
>  xs_stream_prepare_request(struct rpc_rqst *req)
>  {
>  	xdr_free_bvec(&req->rq_rcv_buf);
> -	req->rq_task->tk_status = xdr_alloc_bvec(&req->rq_rcv_buf, GFP_KERNEL);
> +	req->rq_task->tk_status = xdr_alloc_bvec(
> +		&req->rq_rcv_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
>  }

I did some testing of swap-over-NFS, and got a crash quite quickly, due
to this change.
The problem is that GFP_KERNEL allocations almost never fail.
Multi-page allocations might occasionally fail, and others might fail
for a process that has been killed by the OOM killer (or maybe just has
a fatal signal pending), but in general GFP_KERNEL is more likely to
wait (and wait and wait) than to fail.
So the failure paths haven't been tested.

xs_stream_prepare_request() is called from xprt_request_prepare(), which
is called from xprt_request_enqueue_receive() which is called in
call_encode() *after* ->tk_status has been tested.
So when the above code sets ->tk_status to -ENOMEM - which is now more
likely - that fact is ignored and we get a crash

[  298.911356] Workqueue: xprtiod xs_stream_data_receive_workfn
[  298.911696] RIP: 0010:_copy_to_iter+0x1cc/0x435
..
[  298.918259]  __skb_datagram_iter+0x64/0x225
[  298.918507]  skb_copy_datagram_iter+0xe9/0xf2
[  298.918767]  tcp_recvmsg_locked+0x653/0x77e
[  298.919015]  tcp_recvmsg+0x100/0x188
[  298.919226]  inet_recvmsg+0x5d/0x86
[  298.919431]  xs_read_stream_request.constprop.0+0x247/0x378
[  298.919754]  xs_read_stream.constprop.0+0x1c2/0x39b
[  298.920038]  xs_stream_data_receive_workfn+0x50/0x160
[  298.920331]  process_one_work+0x267/0x422
[  298.920568]  worker_thread+0x193/0x234

So we really need to audit all these places where we add __GFP_NORETRY
and ensure errors are actually handled.

For call_encode(), it might be easiest to move
	/* Add task to reply queue before transmission to avoid races */
	if (rpc_reply_expected(task))
		xprt_request_enqueue_receive(task);

up before the
	/* Did the encode result in an error condition? */
	if (task->tk_status != 0) {

and change it to

	/* Add task to reply queue before transmission to avoid races */
	if (task->tk_status == 0 && rpc_reply_expected(task))
		xprt_request_enqueue_receive(task);

I'll try a bit more testing and auditing.

Thanks,
NeilBrown