On 10/18/2023 8:48 PM, Keith Busch wrote:
> From: Keith Busch <kbusch@xxxxxxxxxx>
>
> User space passthrough commands that utilize metadata currently need to
> bounce the "integrity" buffer through the kernel. This adds unnecessary
> overhead and memory pressure.
>
> Add support for mapping user space directly so that we can avoid this
> costly copy. This is similar to how the bio payload utilizes user
> addresses with bio_map_user_iov().
>
> Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
> ---
>  block/bio-integrity.c | 67 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/bio.h   |  8 ++++++
>  2 files changed, 75 insertions(+)
>
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index ec8ac8cf6e1b9..08f70b837a29b 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -91,6 +91,19 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
>  }
>  EXPORT_SYMBOL(bio_integrity_alloc);
>
> +static void bio_integrity_unmap_user(struct bio_integrity_payload *bip)
> +{
> +	bool dirty = bio_data_dir(bip->bip_bio) == READ;
> +	struct bvec_iter iter;
> +	struct bio_vec bv;
> +
> +	bip_for_each_vec(bv, bip, iter) {
> +		if (dirty && !PageCompound(bv.bv_page))
> +			set_page_dirty_lock(bv.bv_page);
> +		unpin_user_page(bv.bv_page);
> +	}
> +}
> +
>  /**
>   * bio_integrity_free - Free bio integrity payload
>   * @bio:	bio containing bip to be freed
> @@ -105,6 +118,8 @@ void bio_integrity_free(struct bio *bio)
>
>  	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
>  		kfree(bvec_virt(bip->bip_vec));
> +	else if (bip->bip_flags & BIP_INTEGRITY_USER)
> +		bio_integrity_unmap_user(bip);
>
>  	__bio_integrity_free(bs, bip);
>  	bio->bi_integrity = NULL;
> @@ -160,6 +175,58 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
>  }
>  EXPORT_SYMBOL(bio_integrity_add_page);
>
> +int bio_integrity_map_user(struct bio *bio, void __user *ubuf, unsigned int len,
> +			   u32 seed, u32 maxvecs)
> +{
> +	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> +	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
> +	struct page *stack_pages[UIO_FASTIOV];
> +	size_t offset = offset_in_page(ubuf);
> +	unsigned long ptr = (uintptr_t)ubuf;
> +	struct page **pages = stack_pages;
> +	struct bio_integrity_payload *bip;
> +	int npages, ret, i;
> +
> +	if (bio_integrity(bio) || ptr & align || maxvecs > UIO_FASTIOV)
> +		return -EINVAL;
> +
> +	bip = bio_integrity_alloc(bio, GFP_KERNEL, maxvecs);
> +	if (IS_ERR(bip))
> +		return PTR_ERR(bip);
> +
> +	ret = pin_user_pages_fast(ptr, UIO_FASTIOV, FOLL_WRITE, pages);

Why not pass maxvecs here? Passing UIO_FASTIOV pins that many pages
regardless of how many 'len' actually spans, and the extra pins
eventually leak (missed unpins; see below).

> +	if (unlikely(ret < 0))
> +		goto free_bip;
> +
> +	npages = ret;
> +	for (i = 0; i < npages; i++) {
> +		u32 bytes = min_t(u32, len, PAGE_SIZE - offset);

Nit: 'bytes' can be declared outside the loop.

> +		ret = bio_integrity_add_page(bio, pages[i], bytes, offset);
> +		if (ret != bytes) {
> +			ret = -EINVAL;
> +			goto release_pages;
> +		}
> +		len -= ret;

Take the case of a single '4KB + 8b' IO: 'len' becomes 0 in the first
iteration, but the loop keeps going for all the pinned pages. Only one
page is added via bio_integrity_add_page(), and that is the only page
bio_integrity_unmap_user() will later unpin. The remaining pinned pages
are never unpinned.
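
Untested sketch of the kind of thing I mean (nr_vecs is a name I am
making up here; everything else reuses the declarations from your
patch): derive the pin count from 'len', and have the error path unpin
everything that was pinned, not just what made it into the bip:

	/*
	 * Pin exactly the pages that 'len' bytes starting at 'offset'
	 * span, rather than a fixed UIO_FASTIOV.
	 */
	int nr_vecs = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;

	if (bio_integrity(bio) || ptr & align || nr_vecs > UIO_FASTIOV)
		return -EINVAL;

	...

	ret = pin_user_pages_fast(ptr, nr_vecs, FOLL_WRITE, pages);
	if (unlikely(ret < 0))
		goto free_bip;

	npages = ret;
	for (i = 0; i < npages; i++) {
		u32 bytes = min_t(u32, len, PAGE_SIZE - offset);

		ret = bio_integrity_add_page(bio, pages[i], bytes, offset);
		if (ret != bytes) {
			ret = -EINVAL;
			goto release_pages;
		}
		len -= bytes;
		offset = 0;	/* pages after the first start at offset 0 */
	}

	return 0;

release_pages:
	/* unpin everything pinned above, added to the bip or not */
	unpin_user_pages(pages, npages);
free_bip:
	...

With nr_vecs derived from 'len', pin_user_pages_fast() cannot pin more
pages than the loop consumes, so the pin count and what
bio_integrity_unmap_user() later unpins stay in sync. The free_bip path
would also need to make sure BIP_INTEGRITY_USER is not yet set when the
bip is freed there, otherwise bio_integrity_free() would unpin a second
time.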