Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

Pankaj Raghav <p.raghav@xxxxxxxxxxx> · Thu, 16 Mar 2023 16:29:56 +0100

Hi Keith,

On 2023-03-03 23:32, Keith Busch wrote:
>> Yes, clearly it says *yet* so that begs the question what would be
>> required?
> 
> Oh, gotcha. I'll work on a list of places it currently crashes.
>  
I started looking into why it crashes when the LBA size of a block device
is increased beyond the page size. These are my primary findings:

- Block device aops (address_space_operations) are all based on buffer
heads, which limits us to working on PAGE_SIZE chunks only.

For an 8k LBA size, the stack trace you posted ultimately fails inside
alloc_page_buffers, as the size will be > PAGE_SIZE.

struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
		bool retry)
{
	struct buffer_head *bh, *head;
	....
	head = NULL;
	offset = PAGE_SIZE;
	while ((offset -= size) >= 0) {
		// we will not go into this loop as offset will be negative
		...
		...
	}
	return head;
}
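
To make the failure concrete, here is a minimal userspace sketch of that
loop. It is not the kernel code: buffers_per_page() and SKETCH_PAGE_SIZE are
hypothetical names, and a 4 KiB page is assumed. It only counts how many
buffers the loop would carve out of one page for a given block size.

```c
#include <assert.h>

/* Assumption for this sketch: 4 KiB pages. The real PAGE_SIZE is arch-dependent. */
#define SKETCH_PAGE_SIZE 4096L

/*
 * Hypothetical userspace model of the carving loop in alloc_page_buffers().
 * Returns how many buffers of `size` bytes the loop would set up for one page.
 */
static long buffers_per_page(unsigned long size)
{
	long offset = SKETCH_PAGE_SIZE;	/* signed, so it can go negative */
	long count = 0;

	assert(size > 0);		/* a zero size would never terminate */

	/* Same shape as the kernel loop: step down from the end of the page. */
	while ((offset -= (long)size) >= 0)
		count++;

	return count;
}
```

With these assumptions, buffers_per_page(512) is 8 and buffers_per_page(4096)
is 1, while buffers_per_page(8192) is 0: the first subtraction already takes
offset to -4096, the loop body never runs, and in the real function head
stays NULL.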

- As Dave Chinner pointed out later in the thread, we allocate pages in the
page cache with order 0 instead of the block size of the device or the
filesystem. Letting filemap_get_folio(FGP_CREAT) allocate folios of LBA size
for a block device should solve that problem, I guess.
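
As a rough illustration of what "folios of LBA size" means in terms of
allocation order, the sketch below computes the smallest order whose folio
covers one logical block. min_folio_order() and SKETCH_PAGE_SHIFT are
hypothetical names for this example, not kernel APIs, and 4 KiB pages are
assumed.

```c
#include <assert.h>

/* Assumption for this sketch: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: smallest folio order such that
 * (page size << order) >= lba_size.
 */
static unsigned int min_folio_order(unsigned long lba_size)
{
	unsigned int order = 0;

	assert(lba_size > 0);

	/* Double the folio size until it covers one logical block. */
	while ((1UL << (SKETCH_PAGE_SHIFT + order)) < lba_size)
		order++;

	return order;
}
```

Under these assumptions an 8k LBA size needs order 1 and a 16k LBA size
needs order 2, while anything up to 4k is still served by the order-0 pages
we allocate today.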

Is it a crazy idea to convert the block device aops (block/fops.c) to use
iomap, which supports higher-order folios, instead of mpage and the other
functions that use buffer heads?

Let me know your thoughts.
--
Pankaj