> On Jun 30, 2021, at 12:05 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > On Wed, Jun 30, 2021 at 03:55:16PM +0000, Chuck Lever III wrote: >>> On Jun 30, 2021, at 11:53 AM, Russell King (Oracle) <linux@xxxxxxxxxxxxxxx> wrote: >>> Has 5.13 been tested with nfsv3 ? >>> >>> Any ideas what's going on? >> >> Yes, fortunately! >> >> Have a look at commit 66d9282523b for a one-liner fix, it should apply >> cleanly to v5.13. > > So was this a freak event or do we need some process change? I would say that it is "Somewhat freaky". In the future we expect there to be more consumers of the bulk page allocator, so this kind of thing should become less likely over time. > Looks like the problem commit went in 3 days before 5.13. It was an mm > commit so they're probably not set up to do nfsd testing, but nfsd is > the only user of that function, right? NFSD is one of two consumers of that function. The other is __page_pool_alloc_pages_slow(). The author of __page_pool_alloc_pages_slow() was cc'd on the broken fix, but I was not. I was cc'd on earlier traffic regarding the new API because I instigated its inclusion into the mainline kernel on behalf of NFSD. Mel does indeed have the ability to test NFSD, and has been careful in the past about testing before committing. > It says it fixed a bug that could result in writing past the end of an > array, so made sense that it was urgent. No, a prior commit, b08e50dd6448 ("mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array"), fixed that particular bug. The broken patch fixed a page leak in the same use case. I tested the new API with NFSD extensively, and did not notice page leaks or array overruns (although, rq_pages has an extra element at the end for other reasons). NFSD does feed full arrays to alloc_pages_bulk_array() quite often with RDMA, so I think I would have noticed something. With TCP I think there is at least one missing page per call, so the common case wouldn't have triggered this bug at all. __page_pool_alloc_pages_slow() always passes in an empty array, so it also would never have hit this bug. The problem was discovered by static analysis, not a crash report. IMO the fix should have waited for testing and review by the NFS community. This close to a final release, the fixes for the array overrun and page leak could have gone into v5.13.1. I hope automation will take care of all of this very soon. -- Chuck Lever