Re: [PATCH v2] net: fix sk_page_frag() recursion from memory reclaim

David Miller <davem@xxxxxxxxxxxxx> · Mon, 28 Oct 2019 16:18:17 -0700 (PDT)

From: Tejun Heo <tj@xxxxxxxxxx>
Date: Thu, 24 Oct 2019 13:50:27 -0700

> sk_page_frag() optimizes skb_frag allocations by using per-task
> skb_frag cache when it knows it's the only user.  The condition is
> determined by seeing whether the socket allocation mask allows
> blocking - if the allocation may block, it obviously owns the task's
> context and ergo exclusively owns current->task_frag.
> 
> Unfortunately, this misses recursion through memory reclaim path.
> Please take a look at the following backtrace.
 ...
> In [0], tcp_send_msg_locked() was using current->page_frag when it
> called sk_wmem_schedule().  It already calculated how many bytes can
> be fit into current->page_frag.  Due to memory pressure,
> sk_wmem_schedule() called into memory reclaim path which called into
> xfs and then IO issue path.  Because the filesystem in question is
> backed by nbd, the control goes back into the tcp layer - back into
> tcp_sendmsg_locked().
> 
> nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
> sense - it's in the process of freeing memory and wants to be able to,
> e.g., drop clean pages to make forward progress.  However, this
> confused sk_page_frag() called from [2].  Because it only tests
> whether the allocation allows blocking which it does, it now thinks
> current->page_frag can be used again although it already was being
> used in [0].
> 
> After [2] used current->page_frag, the offset would be increased by
> the used amount.  When the control returns to [0],
> current->page_frag's offset is increased and the previously calculated
> number of bytes now may overrun the end of allocated memory leading to
> silent memory corruptions.
> 
> Fix it by adding gfpflags_normal_context() which tests sleepable &&
> !reclaim and use it to determine whether to use current->task_frag.
> 
> v2: Eric didn't like gfp flags being tested twice.  Introduce a new
>     helper gfpflags_normal_context() and combine the two tests.
> 
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>

Applied and queued up for -stable, thanks Tejun.