On 5/9/2018 1:17 PM, Christoph Hellwig wrote:
For the upcoming removal of buffer heads in XFS we need to keep track of
the number of outstanding writeback requests per page. For this we need
to know if bio_add_page merged a region with the previous bvec or not.
Instead of adding additional arguments this refactors bio_add_page to
be implemented using three lower level helpers which users like XFS can
use directly if they care about the merge decisions.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
---
block/bio.c | 87 ++++++++++++++++++++++++++++++---------------
include/linux/bio.h | 9 +++++
2 files changed, 67 insertions(+), 29 deletions(-)
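Before the questions below, to make sure I read the intent right: a
caller that cares about the merge decision would use the new helpers
roughly like this, I think. (My sketch only; add_to_ioend_bio() and
the error handling are made up, only the three helpers are from this
patch.)

    /* Sketch: add a page range and learn whether a new segment started. */
    static int add_to_ioend_bio(struct bio *bio, struct page *page,
            unsigned int len, unsigned int off)
    {
        if (__bio_try_merge_page(bio, page, len, off))
            return 0;       /* merged into the previous bvec */
        if (bio_full(bio))
            return -EAGAIN; /* caller must submit/chain a new bio */
        __bio_add_page(bio, page, len, off);
        return 1;           /* new segment started */
    }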
diff --git a/block/bio.c b/block/bio.c
index 53e0f0a1ed94..6ceba6adbf42 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -773,7 +773,7 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
return 0;
}
- if (bio->bi_vcnt >= bio->bi_max_vecs)
+ if (bio_full(bio))
return 0;
/*
@@ -820,6 +820,59 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
}
EXPORT_SYMBOL(bio_add_pc_page);
+/**
+ * __bio_try_merge_page - try adding data to an existing bvec
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the range to add
+ * @off: offset into @page
+ *
+ * Try adding the data described at @page + @off to the last bvec of @bio.
+ * Return %true on success or %false on failure. A successful merge is
+ * common for file systems with a block size smaller than the page size.
+ */
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int off)
+{
+ if (bio->bi_vcnt > 0) {
+ struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+
+ if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
+ bv->bv_len += len;
+ bio->bi_iter.bi_size += len;
+ return true;
+ }
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(__bio_try_merge_page);
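To check my understanding of the merge case: for a file system with
1k blocks, adding four consecutive blocks of the same page should
collapse into a single bvec. Something like the fragment below
(illustrative only, not from the patch):

    /* Write four consecutive 1k ranges of one page into @bio. */
    static void add_four_blocks(struct bio *bio, struct page *page)
    {
        unsigned int bsize = 1024, i;

        for (i = 0; i < 4; i++) {
            if (__bio_try_merge_page(bio, page, bsize, i * bsize))
                continue;   /* merged into the last bvec */
            if (bio_full(bio))
                break;      /* would need to chain a new bio */
            __bio_add_page(bio, page, bsize, i * bsize);
        }
        /* bi_vcnt grows by at most one; bi_iter.bi_size by up to 4k */
    }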
+
+/**
+ * __bio_add_page - add page to a bio in a new segment
+ * @bio: destination bio
+ * @page: page to add
+ * @len: length of the range to add
+ * @off: offset into @page
+ *
+ * Add the data at @page + @off to @bio as a new bvec. The caller must
+ * ensure that @bio has space for another bvec.
+ */
+void __bio_add_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int off)
+{
+ struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];
+
+ WARN_ON_ONCE(bio_full(bio));
Please correct my understanding here; I am still new to this code.

1. If bio_full() is true, that means there is no space left in
bio->bi_io_vec[], right? Then why do we proceed with only a warning,
while the original bio_add_page() used to return after checking
bio_full()? Callers can still call __bio_add_page() directly, right?
(See the snippet after question 2 for what I expected instead.)
2. Also, the bi_io_vec[] allocation will only hold up to
bio->bi_max_vecs entries, right? I could not follow the bvec_alloc()
path very well, mainly when nr_iovecs > inline_vecs: how and where is
it guaranteed that nr_iovecs entries are actually allocated from the
bvec pool?
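In other words, for question 1 I would have expected __bio_add_page()
to keep a guard like the old code, along these lines (my suggestion,
not what the patch does):

    if (WARN_ON_ONCE(bio_full(bio)))
        return;     /* refuse to write past bi_io_vec[] */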
Hmm, tricky. Please help me understand this:

1. We define slabs of different sizes in bvec_slabs[]. When an
allocation request for nr_iovecs comes in, we try to grab the
predefined (in terms of size) slab from bvec_slabs and return it; if
that slab allocation does not succeed, we fall back to
mempool_alloc(). (See my rough sketch after question 4.)
2. If the above is correct, why don't we set bio->bi_max_vecs to the
size of the slab instead of keeping it at the nr_iovecs the user
requested (in bio_alloc_bioset())?
3. Could you please help me understand why __bio_add_page() is still
allowed to work on a cloned bio? Why not WARN and return like the
original code does?
4. OK, I see that in patch 32 you first check bio_full() and then
call xfs_chain_bio(). But there, too, I think you make sure that the
new ioend->io_bio is the newly chained bio, which is not full.

Apologies if the above doesn't make sense.
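To make question 2 concrete, my rough mental model of the size
selection in bvec_alloc() is the paraphrase below. It is not the
actual kernel code, and the bucket sizes are what I believe
bvec_slabs[] contains (BIO_MAX_PAGES being 256 with 4k pages):

    /* Paraphrase: nr_iovecs is rounded up to a bvec_slabs[] bucket. */
    static unsigned int bvec_bucket_size(unsigned int nr_iovecs)
    {
        static const unsigned int sizes[] = { 1, 4, 16, 64, 128, 256 };
        unsigned int i;

        for (i = 0; i < ARRAY_SIZE(sizes); i++)
            if (nr_iovecs <= sizes[i])
                return sizes[i];
        return 0;   /* more than BIO_MAX_PAGES: unsupported */
    }

So the slab can hold more entries than the caller asked for, which is
why I wonder about bi_max_vecs above.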
+
+ bv->bv_page = page;
+ bv->bv_offset = off;
+ bv->bv_len = len;
+
+ bio->bi_iter.bi_size += len;
+ bio->bi_vcnt++;
+}
+EXPORT_SYMBOL_GPL(__bio_add_page);
+
/**
* bio_add_page - attempt to add page to bio
* @bio: destination bio
@@ -833,40 +886,16 @@ EXPORT_SYMBOL(bio_add_pc_page);
int bio_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int offset)
{
- struct bio_vec *bv;
-
/*
* cloned bio must not modify vec list
*/
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
return 0;
-
- /*
- * For filesystems with a blocksize smaller than the pagesize
- * we will often be called with the same page as last time and
- * a consecutive offset. Optimize this special case.
- */
- if (bio->bi_vcnt > 0) {
- bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
-
- if (page == bv->bv_page &&
- offset == bv->bv_offset + bv->bv_len) {
- bv->bv_len += len;
- goto done;
- }
+ if (!__bio_try_merge_page(bio, page, len, offset)) {
+ if (bio_full(bio))
+ return 0;
+ __bio_add_page(bio, page, len, offset);
}
-
- if (bio->bi_vcnt >= bio->bi_max_vecs)
- return 0;
Originally we returned here and did not proceed further. Shouldn't
__bio_add_page() have a similar check to safeguard against crossing
the bio_io_vec[] boundary? (This is the same concern as my first
question above.)
-
- bv = &bio->bi_io_vec[bio->bi_vcnt];
- bv->bv_page = page;
- bv->bv_len = len;
- bv->bv_offset = offset;
-
- bio->bi_vcnt++;
-done:
- bio->bi_iter.bi_size += len;
return len;
}
EXPORT_SYMBOL(bio_add_page);
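Also noting for myself that the return convention of bio_add_page()
is unchanged by this refactor, so a caller like this hypothetical one
still works:

    /* Illustrative caller, not from the patch. */
    if (bio_add_page(bio, page, len, off) != len) {
        /* bio was full and nothing merged: submit and start over */
        submit_bio(bio);
    }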
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ce547a25e8ae..3e73c8bc25ea 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -123,6 +123,11 @@ static inline void *bio_data(struct bio *bio)
return NULL;
}
+static inline bool bio_full(struct bio *bio)
+{
+ return bio->bi_vcnt >= bio->bi_max_vecs;
+}
+
/*
* will die
*/
@@ -470,6 +475,10 @@ void bio_chain(struct bio *, struct bio *);
extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
unsigned int, unsigned int);
+bool __bio_try_merge_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int off);
+void __bio_add_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int off);
int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
struct rq_map_data;
extern struct bio *bio_map_user_iov(struct request_queue *,
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.