On Wed, Nov 08, 2023 at 05:45:19PM +0530, Kanchan Joshi wrote: > On 11/7/2023 8:38 PM, Keith Busch wrote: > > On Tue, Nov 07, 2023 at 03:55:14PM +0530, Kanchan Joshi wrote: > >> On 11/6/2023 8:32 PM, Keith Busch wrote: > >>> On Mon, Nov 06, 2023 at 11:18:03AM +0530, Kanchan Joshi wrote: > >>>> On 10/27/2023 11:49 PM, Keith Busch wrote: > >>>>> + for (i = 0; i < nr_vecs; i = j) { > >>>>> + size_t size = min_t(size_t, bytes, PAGE_SIZE - offs); > >>>>> + struct folio *folio = page_folio(pages[i]); > >>>>> + > >>>>> + bytes -= size; > >>>>> + for (j = i + 1; j < nr_vecs; j++) { > >>>>> + size_t next = min_t(size_t, PAGE_SIZE, bytes); > >>>>> + > >>>>> + if (page_folio(pages[j]) != folio || > >>>>> + pages[j] != pages[j - 1] + 1) > >>>>> + break; > >>>>> + unpin_user_page(pages[j]); > >>>> > >>>> Is this unpin correct here? > >>> > >>> Should be. The pages are bound to the folio, so this doesn't really > >>> unpin the user page. It just drops a reference, and the folio holds the > >>> final reference to the contiguous pages, which is released on > >>> completion. > >> > >> But the completion is still going to see multiple pages and not one > >> (folio). The bip_for_each_vec loop is going to drop the reference again. > >> I suspect it is not folio-aware. > > > > The completion unpins once per bvec, not individual pages. The setup > > creates multipage bvecs with only one pin remaining per bvec for all of > > the bvec's pages. If a page can't be merged into the current bvec, then > > that page is not unpinned and becomes the first page of to the next > > bvec. > > > > Here is a test program [2] that creates this scenario. > Single 8KB+16b read on a 4KB+8b formatted namespace. It prepares > meta-buffer out of a huge-page in a way that it spans two regular 4K pages. > With this, I see more unpins than expected. I understand now. The bip_for_each_bvec is using single page vector iterators. Will fix it up for next time.