Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

Hi Keith,

On 2023-03-03 23:32, Keith Busch wrote:
>> Yes, clearly it says *yet* so that begs the question what would be
>> required?
> 
> Oh, gotcha. I'll work on a list of places it currently crashes.
>  
I started looking into this to see why it crashes when we increase the LBA
size of a block device beyond the page size. These are my primary findings:

- Block device aops (address_space_operations) are all based on buffer
heads, which limits us to working on PAGE_SIZE chunks only.

For an 8k LBA size, the stack trace you posted ultimately fails inside
alloc_page_buffers() because the buffer size is > PAGE_SIZE:

struct buffer_head *alloc_page_buffers(struct page *page, unsigned long
		size, bool retry)
{
	struct buffer_head *bh, *head;
....
	head = NULL;
	offset = PAGE_SIZE;
	while ((offset -= size) >= 0) {
	// we never enter this loop: with size = 8192 and PAGE_SIZE = 4096,
	// offset is already negative on the first check
...
...
	}
	return head;	// still NULL here, which the caller does not expect
}

- As Dave Chinner pointed out later in the thread, we allocate pages in the
page cache with order 0 instead of in the block size of the device or the
filesystem. Letting filemap_get_folio(FGP_CREAT) allocate folios of LBA size
for a block device should solve that problem, I guess. A rough sketch of the
idea follows below.
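Something along these lines, purely as an untested sketch
(bdev_min_folio_order() is a name I made up here; how that order is then
threaded into the folio allocation behind filemap_get_folio() is the part
that still needs figuring out):

	/*
	 * Sketch only: the minimum folio order a block device needs so
	 * that a single folio covers at least one logical block.
	 */
	static inline unsigned int bdev_min_folio_order(struct block_device *bdev)
	{
		unsigned int lbs = bdev_logical_block_size(bdev);

		/* e.g. lbs = 8192 with a 4k PAGE_SIZE gives order 1 */
		return lbs > PAGE_SIZE ? get_order(lbs) : 0;
	}

The page cache would then have to allocate folios of at least that order
for the bdev mapping instead of always falling back to order-0 pages.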

Is it a crazy idea to convert the block device aops (block/fops.c) to iomap,
which supports higher-order folios, instead of mpage and the other helpers
that use buffer heads? A rough sketch of what that could look like is below.
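Roughly something like this (untested sketch; names like blkdev_iomap_begin
and blkdev_aops_iomap are made up here, and the iomap buffered-IO code
itself would of course still need to cope with BS > PAGE_SIZE folios):

	/* For a block device the mapping is trivial: disk offset == file offset. */
	static int blkdev_iomap_begin(struct inode *inode, loff_t offset,
			loff_t length, unsigned int flags,
			struct iomap *iomap, struct iomap *srcmap)
	{
		iomap->bdev = I_BDEV(inode);
		iomap->offset = offset;
		iomap->length = length;
		iomap->addr = offset;
		iomap->type = IOMAP_MAPPED;
		return 0;
	}

	static const struct iomap_ops blkdev_iomap_ops = {
		.iomap_begin	= blkdev_iomap_begin,
	};

	static int blkdev_read_folio(struct file *file, struct folio *folio)
	{
		return iomap_read_folio(folio, &blkdev_iomap_ops);
	}

	static void blkdev_readahead(struct readahead_control *rac)
	{
		iomap_readahead(rac, &blkdev_iomap_ops);
	}

	static const struct address_space_operations blkdev_aops_iomap = {
		.read_folio	= blkdev_read_folio,
		.readahead	= blkdev_readahead,
		/* write_begin/write_end/writepages left out of this sketch */
	};

The nice part is that the mapping callback for a bdev is basically a
one-liner, so most of the work would be in making the generic iomap and
page cache code handle the larger folios.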

Let me know your thoughts.
--
Pankaj


