On 4/15/23 05:44, Matthew Wilcox wrote:
On Fri, Apr 14, 2023 at 08:24:56PM -0700, Luis Chamberlain wrote:
I thought of that, but I saw that the loop that fills arr only adds a bh
if we don't "continue" for certain conditions, which made me believe that
we only wanted to keep in the array the buffer_heads which meet the
initial loop's criteria. If that is not accurate then yes, the
simplification is nice!
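(For reference: the stage-one loop being discussed only stores a bh when
none of its skip conditions fire. A condensed paraphrase of that loop,
with error handling elided; not the verbatim fs/buffer.c code:)

	do {
		if (buffer_uptodate(bh))
			continue;	/* already valid, nothing to read */
		if (!buffer_mapped(bh)) {
			fully_mapped = 0;
			if (iblock < lblock)
				get_block(inode, iblock, bh, 0);
			if (!buffer_mapped(bh)) {
				/* a hole: zero it and treat it as uptodate */
				folio_zero_range(folio, i * blocksize,
						 blocksize);
				set_buffer_uptodate(bh);
				continue;
			}
			if (buffer_uptodate(bh))
				continue;	/* get_block() read it synchronously */
		}
		arr[nr++] = bh;		/* only buffers that still need a read */
	} while (i++, iblock++, (bh = bh->b_this_page) != head);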
Uh, right. A little bit more carefully this time ... how does this
look?
diff --git a/fs/buffer.c b/fs/buffer.c
index 5e67e21b350a..dff671079b02 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2282,7 +2282,7 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
{
struct inode *inode = folio->mapping->host;
sector_t iblock, lblock;
- struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+ struct buffer_head *bh, *head;
unsigned int blocksize, bbits;
int nr, i;
int fully_mapped = 1;
@@ -2335,7 +2335,7 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
if (buffer_uptodate(bh))
continue;
}
- arr[nr++] = bh;
+ nr++;
} while (i++, iblock++, (bh = bh->b_this_page) != head);
if (fully_mapped)
@@ -2352,25 +2352,29 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
return 0;
}
- /* Stage two: lock the buffers */
- for (i = 0; i < nr; i++) {
- bh = arr[i];
+ /*
+ * Stage two: lock the buffers. Recheck the uptodate flag under
+ * the lock in case somebody else brought it uptodate first.
+ */
+ bh = head;
+ do {
+ if (buffer_uptodate(bh))
+ continue;
lock_buffer(bh);
+ if (buffer_uptodate(bh)) {
+ unlock_buffer(bh);
+ continue;
+ }
mark_buffer_async_read(bh);
- }
+ } while ((bh = bh->b_this_page) != head);
- /*
- * Stage 3: start the IO. Check for uptodateness
- * inside the buffer lock in case another process reading
- * the underlying blockdev brought it uptodate (the sct fix).
- */
- for (i = 0; i < nr; i++) {
- bh = arr[i];
- if (buffer_uptodate(bh))
- end_buffer_async_read(bh, 1);
- else
+ /* Stage 3: start the IO */
+ bh = head;
+ do {
+ if (buffer_async_read(bh))
submit_bh(REQ_OP_READ, bh);
- }
+ } while ((bh = bh->b_this_page) != head);
+
return 0;
}
EXPORT_SYMBOL(block_read_full_folio);
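One subtlety that makes the continue statements in the new stage-two loop
safe: in C, continue inside a do/while jumps to the controlling
expression, so the "bh = bh->b_this_page" advance still executes. A
minimal userspace demonstration of that semantic (not kernel code):

	#include <stdio.h>

	int main(void)
	{
		int i = 0;
		do {
			if (i == 1)
				continue;	/* still reaches the ++i below */
			printf("%d\n", i);
		} while (++i < 4);		/* prints 0, 2, 3 */
		return 0;
	}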
I do wonder how much it's worth doing this vs switching to non-BH methods.
I appreciate that's a lot of work still.
That's what I've been wondering, too.
I would _vastly_ prefer to switch over to iomap; however, the blasted
sb_bread() is getting in the way. Currently iomap only operates on entire
pages / folios, but a lot of (older) filesystems insist on doing 512-byte
I/O. While this seems logical (seeing that 512 bytes is the default, and,
in most cases, the only supported sector size), the question is whether
_we_ on the Linux side need to do that.
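(The sort of pattern I mean is the mount-time superblock read; a sketch
with made-up "examplefs" names, where only sb_set_blocksize(), sb_bread()
and brelse() are the real APIs:)

	static int examplefs_read_super(struct super_block *sb)
	{
		struct buffer_head *bh;

		if (!sb_set_blocksize(sb, 512))	/* insist on 512-byte blocks */
			return -EINVAL;
		bh = sb_bread(sb, EXAMPLEFS_SB_BLOCK);	/* made-up block number */
		if (!bh)
			return -EIO;
		/* ... parse bh->b_data as the on-disk superblock ... */
		brelse(bh);
		return 0;
	}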
We _could_ upgrade to always do full-page I/O; there's a good
chance we'll be using the entire page eventually anyway.
And with storage bandwidth getting ever larger we might even
see a performance boost from it.
And it would save us having to implement sub-page I/O for iomap.
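(A hypothetical shape for that upgrade, names invented just to make the
idea concrete: read the whole folio containing the requested block and
return a pointer at the right offset. read_mapping_folio() and
folio_address() are real APIs; everything "examplefs" is not:)

	static void *examplefs_read_block(struct super_block *sb,
					  sector_t block,
					  struct folio **foliop)
	{
		pgoff_t index = block >> (PAGE_SHIFT - sb->s_blocksize_bits);
		struct folio *folio;

		/* read (or find cached) the whole folio, not just 512 bytes */
		folio = read_mapping_folio(sb->s_bdev->bd_inode->i_mapping,
					   index, NULL);
		if (IS_ERR(folio))
			return NULL;
		*foliop = folio;
		/* assumes a single-page folio, for simplicity of the sketch */
		return folio_address(folio) +
			((block << sb->s_blocksize_bits) & ~PAGE_MASK);
	}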
Hmm?
Cheers,
Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                       +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers,
Andrew McDonald, Martje Boudien Moerman