Hi, David,

On Tue, Sep 24, 2024 at 10:47:30PM +0100, David Howells wrote:
> Does the attached fix the problem?

yes, as I've replied in
https://lore.kernel.org/all/ZvTD2t5s8lwQyphN@xsang-OptiPlex-9020/

thanks!

>
> David
> ---
> netfs: Fix write oops in generic/346 (9p) and maybe generic/074 (cifs)
>
> In netfslib, a buffered writeback operation has a 'write queue' of folios
> that are being written, held in a linear sequence of folio_queue structs.
> The 'issuer' adds new folio_queues on the leading edge of the queue and
> populates each one progressively; the 'collector' pops them off the
> trailing edge and discards them and the folios they point to as they are
> consumed.
>
> The queue is required to always retain at least one folio_queue structure.
> This allows the queue to be accessed without locking and with just a bit of
> barriering.
>
> When a new subrequest is prepared, its ->io_iter iterator is pointed at the
> current end of the write queue and then the iterator is extended as more
> data is added to the queue until the subrequest is committed.
>
> Now, the problem is that the folio_queue at the leading edge of the write
> queue when a subrequest is prepared might have been entirely consumed - but
> not yet removed from the queue as it is the only remaining one and is
> preventing the queue from collapsing.
>
> So, what happens is that subreq->io_iter is pointed at the spent
> folio_queue, then a new folio_queue is added, and, at that point, the
> collector is entirely at liberty to immediately delete the spent
> folio_queue.
>
> This leaves the subreq->io_iter pointing at a freed object.  If the system
> is lucky, iterate_folioq() sees ->io_iter, sees the as-yet uncorrupted
> freed object and advances to the next folio_queue in the queue.
>
> In the case seen, however, the freed object gets recycled and put back onto
> the queue at the tail and filled to the end.  This confuses
> iterate_folioq() and it tries to step ->next, which may be NULL - resulting
> in an oops.
>
> Fix this by the following means:
>
>  (1) When preparing a write subrequest, make sure there's a folio_queue
>      struct with space in it at the leading edge of the queue.  A function
>      to make space is split out of the function to append a folio so that
>      it can be called for this purpose.
>
>  (2) If the request struct iterator is pointing to a completely spent
>      folio_queue when we make space, then advance the iterator to the newly
>      allocated folio_queue.  The subrequest's iterator will then be set
>      from this.
>
> Whilst we're at it, also split out the function to allocate a folio_queue,
> initialise it and do the accounting.
>
> The oops could be triggered using the generic/346 xfstest with a filesystem
> on 9P over TCP with cache=loose.  The oops looked something like:
>
>  BUG: kernel NULL pointer dereference, address: 0000000000000008
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  ...
>  RIP: 0010:_copy_from_iter+0x2db/0x530
>  ...
>  Call Trace:
>   <TASK>
>  ...
>   p9pdu_vwritef+0x3d8/0x5d0
>   p9_client_prepare_req+0xa8/0x140
>   p9_client_rpc+0x81/0x280
>   p9_client_write+0xcf/0x1c0
>   v9fs_issue_write+0x87/0xc0
>   netfs_advance_write+0xa0/0xb0
>   netfs_write_folio.isra.0+0x42d/0x500
>   netfs_writepages+0x15a/0x1f0
>   do_writepages+0xd1/0x220
>   filemap_fdatawrite_wbc+0x5c/0x80
>   v9fs_mmap_vm_close+0x7d/0xb0
>   remove_vma+0x35/0x70
>   vms_complete_munmap_vmas+0x11a/0x170
>   do_vmi_align_munmap+0x17d/0x1c0
>   do_vmi_munmap+0x13e/0x150
>   __vm_munmap+0x92/0xd0
>   __x64_sys_munmap+0x17/0x20
>   do_syscall_64+0x80/0xe0
>   entry_SYSCALL_64_after_hwframe+0x71/0x79
>
> This may also fix a similar-looking issue with cifs and generic/074.
>
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202409180928.f20b5a08-oliver.sang@xxxxxxxxx
>
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> cc: Eric Van Hensbergen <ericvh@xxxxxxxxxx>
> cc: Latchesar Ionkov <lucho@xxxxxxxxxx>
> cc: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
> cc: Christian Schoenebeck <linux_oss@xxxxxxxxxxxxx>
> cc: Steve French <sfrench@xxxxxxxxx>
> cc: Paulo Alcantara <pc@xxxxxxxxxxxxx>
> cc: Jeff Layton <jlayton@xxxxxxxxxx>
> cc: v9fs@xxxxxxxxxxxxxxx
> cc: linux-cifs@xxxxxxxxxxxxxxx
> cc: netfs@xxxxxxxxxxxxxxx
> cc: linux-fsdevel@xxxxxxxxxxxxxxx
> ---
>  fs/netfs/internal.h    |    2 +
>  fs/netfs/misc.c        |   72 ++++++++++++++++++++++++++++++++++---------------
>  fs/netfs/objects.c     |   12 ++++++++
>  fs/netfs/write_issue.c |   12 +++++++-
>  4 files changed, 76 insertions(+), 22 deletions(-)
>
> diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
> index c7f23dd3556a..79c0ad89affb 100644
> --- a/fs/netfs/internal.h
> +++ b/fs/netfs/internal.h
> @@ -58,6 +58,7 @@ static inline void netfs_proc_del_rreq(struct netfs_io_request *rreq) {}
>  /*
>   * misc.c
>   */
> +struct folio_queue *netfs_buffer_make_space(struct netfs_io_request *rreq);
>  int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio,
>  			      bool needs_put);
>  struct folio_queue *netfs_delete_buffer_head(struct netfs_io_request *wreq);
> @@ -76,6 +77,7 @@ void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async);
>  void netfs_put_request(struct netfs_io_request *rreq, bool was_async,
>  		       enum netfs_rreq_ref_trace what);
>  struct netfs_io_subrequest *netfs_alloc_subrequest(struct netfs_io_request *rreq);
> +struct folio_queue *netfs_folioq_alloc(struct netfs_io_request *rreq, gfp_t gfp);
>
>  static inline void netfs_see_request(struct netfs_io_request *rreq,
>  				     enum netfs_rreq_ref_trace what)
> diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
> index 0ad0982ce0e2..a743e8963247 100644
> --- a/fs/netfs/misc.c
> +++ b/fs/netfs/misc.c
> @@ -9,34 +9,64 @@
>  #include "internal.h"
>
>  /*
> - * Append a folio to the rolling queue.
> + * Make sure there's space in the rolling queue.
>   */
> -int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio,
> -			      bool needs_put)
> +struct folio_queue *netfs_buffer_make_space(struct netfs_io_request *rreq)
>  {
> -	struct folio_queue *tail = rreq->buffer_tail;
> -	unsigned int slot, order = folio_order(folio);
> +	struct folio_queue *tail = rreq->buffer_tail, *prev;
> +	unsigned int prev_nr_slots = 0;
>
>  	if (WARN_ON_ONCE(!rreq->buffer && tail) ||
>  	    WARN_ON_ONCE(rreq->buffer && !tail))
> -		return -EIO;
> -
> -	if (!tail || folioq_full(tail)) {
> -		tail = kmalloc(sizeof(*tail), GFP_NOFS);
> -		if (!tail)
> -			return -ENOMEM;
> -		netfs_stat(&netfs_n_folioq);
> -		folioq_init(tail);
> -		tail->prev = rreq->buffer_tail;
> -		if (tail->prev)
> -			tail->prev->next = tail;
> -		rreq->buffer_tail = tail;
> -		if (!rreq->buffer) {
> -			rreq->buffer = tail;
> -			iov_iter_folio_queue(&rreq->io_iter, ITER_SOURCE, tail, 0, 0, 0);
> +		return ERR_PTR(-EIO);
> +
> +	prev = tail;
> +	if (prev) {
> +		if (!folioq_full(tail))
> +			return tail;
> +		prev_nr_slots = folioq_nr_slots(tail);
> +	}
> +
> +	tail = netfs_folioq_alloc(rreq, GFP_NOFS);
> +	if (!tail)
> +		return ERR_PTR(-ENOMEM);
> +	tail->prev = prev;
> +	if (prev)
> +		/* [!] NOTE: After we set prev->next, the consumer is entirely
> +		 * at liberty to delete prev.
> +		 */
> +		WRITE_ONCE(prev->next, tail);
> +
> +	rreq->buffer_tail = tail;
> +	if (!rreq->buffer) {
> +		rreq->buffer = tail;
> +		iov_iter_folio_queue(&rreq->io_iter, ITER_SOURCE, tail, 0, 0, 0);
> +	} else {
> +		/* Make sure we don't leave the master iterator pointing to a
> +		 * block that might get immediately consumed.
> +		 */
> +		if (rreq->io_iter.folioq == prev &&
> +		    rreq->io_iter.folioq_slot == prev_nr_slots) {
> +			rreq->io_iter.folioq = tail;
> +			rreq->io_iter.folioq_slot = 0;
>  		}
> -		rreq->buffer_tail_slot = 0;
>  	}
> +	rreq->buffer_tail_slot = 0;
> +	return tail;
> +}
> +
> +/*
> + * Append a folio to the rolling queue.
> + */
> +int netfs_buffer_append_folio(struct netfs_io_request *rreq, struct folio *folio,
> +			      bool needs_put)
> +{
> +	struct folio_queue *tail;
> +	unsigned int slot, order = folio_order(folio);
> +
> +	tail = netfs_buffer_make_space(rreq);
> +	if (IS_ERR(tail))
> +		return PTR_ERR(tail);
>
>  	rreq->io_iter.count += PAGE_SIZE << order;
>
> diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
> index d32964e8ca5d..dd8241bc996b 100644
> --- a/fs/netfs/objects.c
> +++ b/fs/netfs/objects.c
> @@ -250,3 +250,15 @@ void netfs_put_subrequest(struct netfs_io_subrequest *subreq, bool was_async,
>  	if (dead)
>  		netfs_free_subrequest(subreq, was_async);
>  }
> +
> +struct folio_queue *netfs_folioq_alloc(struct netfs_io_request *rreq, gfp_t gfp)
> +{
> +	struct folio_queue *fq;
> +
> +	fq = kmalloc(sizeof(*fq), gfp);
> +	if (fq) {
> +		netfs_stat(&netfs_n_folioq);
> +		folioq_init(fq);
> +	}
> +	return fq;
> +}
> diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
> index 04e66d587f77..0929d9fd4ce7 100644
> --- a/fs/netfs/write_issue.c
> +++ b/fs/netfs/write_issue.c
> @@ -153,12 +153,22 @@ static void netfs_prepare_write(struct netfs_io_request *wreq,
>  				loff_t start)
>  {
>  	struct netfs_io_subrequest *subreq;
> +	struct iov_iter *wreq_iter = &wreq->io_iter;
> +
> +	/* Make sure we don't point the iterator at a used-up folio_queue
> +	 * struct being used as a placeholder to prevent the queue from
> +	 * collapsing.  In such a case, extend the queue.
> +	 */
> +	if (iov_iter_is_folioq(wreq_iter) &&
> +	    wreq_iter->folioq_slot >= folioq_nr_slots(wreq_iter->folioq)) {
> +		netfs_buffer_make_space(wreq);
> +	}
>
>  	subreq = netfs_alloc_subrequest(wreq);
>  	subreq->source = stream->source;
>  	subreq->start = start;
>  	subreq->stream_nr = stream->stream_nr;
> -	subreq->io_iter = wreq->io_iter;
> +	subreq->io_iter = *wreq_iter;
>
>  	_enter("R=%x[%x]", wreq->debug_id, subreq->debug_index);
>
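For anyone trying to picture the race described in the changelog above, here is a
minimal userspace sketch of the idea behind fixes (1) and (2).  This is not the
kernel code: `seg`, `seg_iter`, `SEG_SLOTS` and `make_space()` are invented names,
and the model only shows the issuer side (ensure the tail segment has a free slot,
and move any iterator off a fully-spent segment before the collector may free it).

	/* Toy model of the write queue: fixed-size segments chained like
	 * folio_queue structs.  Purely illustrative.
	 */
	#include <stdlib.h>

	#define SEG_SLOTS 4

	struct seg {
		struct seg *next;
		struct seg *prev;
		int nr_used;		/* slots filled by the issuer */
	};

	struct seg_iter {
		struct seg *seg;	/* segment the iterator points at */
		int slot;		/* next slot to use in that segment */
	};

	/* Ensure the tail segment has a free slot.  If the current tail is
	 * full, link a new segment and, before the collector can free the
	 * spent one, advance any iterator still parked on it - mirroring
	 * steps (1) and (2) of the fix above.
	 */
	static struct seg *make_space(struct seg **tail, struct seg_iter *it)
	{
		struct seg *prev = *tail, *seg;

		if (prev && prev->nr_used < SEG_SLOTS)
			return prev;		/* still room in the tail */

		seg = calloc(1, sizeof(*seg));
		if (!seg)
			return NULL;
		seg->prev = prev;
		if (prev)
			prev->next = seg;	/* collector may free prev from here on */
		*tail = seg;

		/* Don't leave the iterator on the spent segment. */
		if (it->seg == prev && it->slot >= SEG_SLOTS) {
			it->seg = seg;
			it->slot = 0;
		}
		return seg;
	}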