Re: [PATCH] netfs: If didn't read new data then abandon retry

Lizhi Xu <lizhi.xu@xxxxxxxxxxxxx> · Fri, 13 Dec 2024 15:26:51 +0800



On Mon, 09 Dec 2024 15:53:04 +0000, David Howells wrote:
> David
> ---
> commit d0906b4a4611709c02de610d3c34d6172aa28aaf
> Author: David Howells <dhowells@xxxxxxxxxx>
> Date:   Fri Nov 8 11:40:20 2024 +0800
> 
>     netfs: Work around recursion by abandoning retry if nothing read
>     
>     syzkaller reported recursion with a loop of three calls (netfs_rreq_assess,
>     netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack
>     during an unbuffered or direct I/O read.
>     
>     There are a number of issues:
>     
>      (1) There is no limit on the number of retries.
>     
>      (2) A subrequest is supposed to be abandoned if it does not transfer
>          anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all
>          circumstances.
>     
>      (3) The actual root cause, which is this:
>     
>             if (atomic_dec_and_test(&rreq->nr_outstanding))
>                     netfs_rreq_terminated(rreq, ...);
>     
>          When we do a retry, we bump the rreq->nr_outstanding counter to
>          prevent the final cleanup phase running before we've finished
>          dispatching the retries.  The problem is if we hit 0, we have to do
>          the cleanup phase - but we're in the cleanup phase and end up
>          repeating the retry cycle, hence the recursion.
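The cycle described in (3) can be modelled in user space. This is only a rough sketch: every name below (model_rreq, model_retry, model_terminated, model_run, depth_cap) is hypothetical and none of it is the kernel code. It assumes the retried subrequest completes synchronously, so the counter falls back to zero inside the cleanup phase. The depth_cap field exists only so the model stops before the real stack overflows.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are the kernel's. */
struct model_rreq {
	int nr_outstanding;	/* stands in for rreq->nr_outstanding */
	int depth;		/* current nesting of model_terminated() */
	int max_depth;		/* deepest nesting observed */
	int depth_cap;		/* artificial stop so the model terminates */
	bool make_progress;	/* does the subrequest transfer any data? */
};

static void model_terminated(struct model_rreq *rreq);

/* Retry: bump the counter, "issue" one subrequest that completes
 * synchronously, then drop the counter again. */
static void model_retry(struct model_rreq *rreq)
{
	rreq->nr_outstanding++;			/* hold off cleanup while dispatching */
	/* ... subrequest issued and completed here ... */
	if (--rreq->nr_outstanding == 0)	/* plays the role of atomic_dec_and_test() */
		model_terminated(rreq);		/* re-enters the cleanup phase */
}

/* Cleanup phase: assess the result and retry if nothing was transferred. */
static void model_terminated(struct model_rreq *rreq)
{
	rreq->depth++;
	if (rreq->depth > rreq->max_depth)
		rreq->max_depth = rreq->depth;
	if (!rreq->make_progress && rreq->depth < rreq->depth_cap)
		model_retry(rreq);
	rreq->depth--;
}

/* Run the model once and report how deeply the cleanup phase nested. */
static int model_run(bool make_progress, int depth_cap)
{
	struct model_rreq rreq = {
		.make_progress = make_progress,
		.depth_cap = depth_cap,
	};

	model_terminated(&rreq);
	return rreq.max_depth;
}
```

In this model, a subrequest that never makes progress nests the cleanup phase all the way to depth_cap, whereas one that does make progress nests only once.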
>     
>     Work around the problem by limiting the number of retries.  This is based
>     on Lizhi Xu's patch[1], and makes the following changes:
>     
>      (1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
>          the filesystem set it if it managed to read or write at least one byte
>          of data.  Clear this bit before issuing a subrequest.
Could this conflict when the read and write paths both use the same flag to mark progress?
>     
>      (2) Add a ->retry_count member to the subrequest and increment it any time
>          we do a retry.
>     
>      (3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
>          ->retry_count.  If the latter is non-zero, we're doing a retry.
>     
>      (4) Abandon a subrequest if retry_count is non-zero and we made no
>          progress.
>     
>      (5) Use ->retry_count in both the write-side and the read-side.
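
Changes (1), (2) and (4) can be sketched in user space roughly as follows. This is an illustration under assumptions, not the patch itself: the struct, the MODEL_ prefix and the three helper functions are hypothetical, and only the ideas of a made-progress bit and a retry_count member come from the commit message above.

```c
#include <stdbool.h>

/* Models NETFS_SREQ_MADE_PROGRESS; the bit value here is arbitrary. */
#define MODEL_SREQ_MADE_PROGRESS 0x1u

/* Hypothetical stand-in for a subrequest. */
struct model_subreq {
	unsigned int flags;
	unsigned int retry_count;	/* non-zero means we are retrying */
};

/* Change (1): clear the progress bit before issuing a subrequest. */
static void model_prepare_issue(struct model_subreq *s)
{
	s->flags &= ~MODEL_SREQ_MADE_PROGRESS;
}

/* Change (1): the filesystem sets the bit if it moved at least one byte. */
static void model_complete(struct model_subreq *s, unsigned long transferred)
{
	if (transferred > 0)
		s->flags |= MODEL_SREQ_MADE_PROGRESS;
}

/* Changes (2) and (4): abandon if this was already a retry and it made
 * no progress; otherwise count the retry and allow it. */
static bool model_should_retry(struct model_subreq *s)
{
	if (s->retry_count > 0 && !(s->flags & MODEL_SREQ_MADE_PROGRESS))
		return false;
	s->retry_count++;
	return true;
}
```

With these rules, a subrequest that transfers nothing is retried once and then abandoned, while one that keeps transferring data may keep being retried.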

BR,
Lizhi