On Mon, 09 Dec 2024 15:53:04 +0000, David Howells wrote:
> David
> ---
> commit d0906b4a4611709c02de610d3c34d6172aa28aaf
> Author: David Howells <dhowells@xxxxxxxxxx>
> Date:   Fri Nov 8 11:40:20 2024 +0800
>
> netfs: Work around recursion by abandoning retry if nothing read
>
> syzkaller reported recursion with a loop of three calls (netfs_rreq_assess,
> netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack
> during an unbuffered or direct I/O read.
>
> There are a number of issues:
>
>  (1) There is no limit on the number of retries.
>
>  (2) A subrequest is supposed to be abandoned if it does not transfer
>      anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all
>      circumstances.
>
>  (3) The actual root cause, which is this:
>
>         if (atomic_dec_and_test(&rreq->nr_outstanding))
>                 netfs_rreq_terminated(rreq, ...);
>
>      When we do a retry, we bump the rreq->nr_outstanding counter to
>      prevent the final cleanup phase running before we've finished
>      dispatching the retries.  The problem is if we hit 0, we have to do
>      the cleanup phase - but we're in the cleanup phase and end up
>      repeating the retry cycle, hence the recursion.
>
> Work around the problem by limiting the number of retries.  This is based
> on Lizhi Xu's patch[1], and makes the following changes:
>
>  (1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
>      the filesystem set it if it managed to read or write at least one byte
>      of data.  Clear this bit before issuing a subrequest.

Will there be conflicts when reading and writing use the same flag to mark?

>
>  (2) Add a ->retry_count member to the subrequest and increment it any time
>      we do a retry.
>
>  (3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
>      ->retry_count.  If the latter is non-zero, we're doing a retry.
>
>  (4) Abandon a subrequest if retry_count is non-zero and we made no
>      progress.
>
>  (5) Use ->retry_count in both the write-side and the read-size.

BR,
Lizhi
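
For reference, changes (1), (2) and (4) above can be sketched in userspace C roughly as follows. This is only an illustration, not the kernel code: struct sketch_subreq, SKETCH_MADE_PROGRESS and the sketch_* helpers are hypothetical stand-ins for the netfs subrequest, NETFS_SREQ_MADE_PROGRESS and the real issue/completion paths. It assumes each subrequest has its own flags word and is either a read or a write, never both.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NETFS_SREQ_MADE_PROGRESS (assumption). */
#define SKETCH_MADE_PROGRESS (1UL << 0)

/* Hypothetical, simplified subrequest; not the kernel structure. */
struct sketch_subreq {
	unsigned long flags;      /* per-subrequest flag bits */
	unsigned int retry_count; /* non-zero means we are retrying */
	size_t transferred;       /* bytes moved so far */
};

/* Change (1): clear the progress bit before issuing a subrequest. */
static void sketch_prepare_issue(struct sketch_subreq *s)
{
	s->flags &= ~SKETCH_MADE_PROGRESS;
}

/* Change (1): the filesystem sets the bit if at least one byte moved. */
static void sketch_complete(struct sketch_subreq *s, size_t bytes)
{
	if (bytes > 0) {
		s->transferred += bytes;
		s->flags |= SKETCH_MADE_PROGRESS;
	}
}

/*
 * Changes (2) and (4): abandon if this was already a retry and it made
 * no progress; otherwise count the retry and allow it.
 */
static bool sketch_should_retry(struct sketch_subreq *s)
{
	if (s->retry_count > 0 && !(s->flags & SKETCH_MADE_PROGRESS))
		return false;
	s->retry_count++;
	return true;
}
```

In this sketch the first failed attempt is always retried once, and the retry cycle ends as soon as a retry transfers nothing. Whether the read and write paths can collide on the flag depends on the assumption above that a subrequest serves only one direction.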