Re: [RFC 0/4] minimum folio order support in filemap

On 6/22/23 07:51, Hannes Reinecke wrote:
On 6/22/23 00:07, Dave Chinner wrote:
On Wed, Jun 21, 2023 at 11:00:24AM +0200, Hannes Reinecke wrote:
On 6/21/23 10:38, Pankaj Raghav wrote:
There has been a lot of discussion recently about supporting devices and filesystems with bs > ps. One of the main pieces of plumbing needed for buffered IO is a minimum
order when allocating folios in the page cache.
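As a rough sketch of that plumbing (illustrative only, not the actual patch; mapping_min_folio_order() is an assumed helper for reading the per-mapping bound), any requested allocation order would be rounded up to the mapping's minimum:

static struct folio *alloc_min_order_folio(struct address_space *mapping,
					   unsigned int order)
{
	/* mapping_min_folio_order() is an assumed helper for this sketch */
	unsigned int min_order = mapping_min_folio_order(mapping);

	/* never allocate below the mapping's minimum folio order */
	if (order < min_order)
		order = min_order;

	return filemap_alloc_folio(GFP_KERNEL, order);
}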

Hannes recently sent a series[1] in which he deduces the minimum folio
order from i_blkbits in struct inode. Based on the discussion in that
thread, this series takes a different approach: the minimum and maximum
folio order can be set individually per inode.
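For reference, a minimal sketch of how the i_blkbits-based derivation could look:

static inline unsigned int min_folio_order(const struct inode *inode)
{
	/* i_blkbits = log2(block size), PAGE_SHIFT = log2(page size) */
	if (inode->i_blkbits <= PAGE_SHIFT)
		return 0;	/* a block fits within a single page */

	return inode->i_blkbits - PAGE_SHIFT;
}

For bs=16k on 4k pages this yields order 2, i.e. folios of at least 16KiB.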

This series is based on top of Christoph's patches to have iomap aops
for the block cache[2]. I rebased his remaining patches to
next-20230621. The whole tree can be found here[3].

Compiling the tree with CONFIG_BUFFER_HEAD=n, I am able to do buffered
IO on an NVMe drive with bs > ps in QEMU without any issues:

[root@archlinux ~]# cat /sys/block/nvme0n2/queue/logical_block_size
16384
[root@archlinux ~]# fio -bs=16k -iodepth=8 -rw=write -ioengine=io_uring -size=500M
            -name=io_uring_1 -filename=/dev/nvme0n2 -verify=md5
io_uring_1: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=io_uring, iodepth=8
fio-3.34
Starting 1 process
Jobs: 1 (f=1): [V(1)][100.0%][r=336MiB/s][r=21.5k IOPS][eta 00m:00s]
io_uring_1: (groupid=0, jobs=1): err= 0: pid=285: Wed Jun 21 07:58:29 2023
    read: IOPS=27.3k, BW=426MiB/s (447MB/s)(500MiB/1174msec)
    <snip>
Run status group 0 (all jobs):
     READ: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s), io=500MiB (524MB), run=1174-1174msec
    WRITE: bw=198MiB/s (207MB/s), 198MiB/s-198MiB/s (207MB/s-207MB/s), io=500MiB (524MB), run=2527-2527msec

Disk stats (read/write):
    nvme0n2: ios=35614/4297, merge=0/0, ticks=11283/1441, in_queue=12725, util=96.27%

One of the main dependencies for getting a block device working with
bs > ps is Christoph's work on converting the block device aops to use
iomap.
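To sketch the idea behind the block-cache patch in this series (illustrative only; mapping_set_folio_min_order() is an assumed helper, and the precise hook in set_init_blocksize() may differ), the bdev mapping gets pinned to a minimum folio order matching the logical block size:

static void bdev_set_min_folio_order(struct block_device *bdev)
{
	/* blksize_bits() returns log2 of the block size */
	unsigned int bits = blksize_bits(bdev_logical_block_size(bdev));

	/* only needed when the logical block size exceeds the page size */
	if (bits > PAGE_SHIFT)
		mapping_set_folio_min_order(bdev->bd_inode->i_mapping,
					    bits - PAGE_SHIFT);
}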

[1] https://lwn.net/Articles/934651/
[2] https://lwn.net/ml/linux-kernel/20230424054926.26927-1-hch@xxxxxx/
[3] https://github.com/Panky-codes/linux/tree/next-20230523-filemap-order-generic-v1

Luis Chamberlain (1):
    block: set mapping order for the block cache in set_init_blocksize

Matthew Wilcox (Oracle) (1):
    fs: Allow fine-grained control of folio sizes

Pankaj Raghav (2):
    filemap: use minimum order while allocating folios
    nvme: enable logical block size > PAGE_SIZE

   block/bdev.c             |  9 ++++++++
   drivers/nvme/host/core.c |  2 +-
   include/linux/pagemap.h  | 46 ++++++++++++++++++++++++++++++++++++----
   mm/filemap.c             |  9 +++++---
   mm/readahead.c           | 34 ++++++++++++++++++++---------
   5 files changed, 82 insertions(+), 18 deletions(-)


Hmm. Most unfortunate; I've just finished my own patchset (duplicating
much of this work) to get 'brd' running with large folios.
And it even works this time; 'fsx' from the xfstests suite runs happily
on it.

So you've converted a filesystem to use bs > ps, too? Or is the
filesystem that fsx is running on just using normal 4kB block size?
If the latter, then fsx is not actually testing the large folio page
cache support, it's mostly just doing 4kB aligned IO to brd....

I have been running fsx on an xfs filesystem with bs=16k, and it worked
like a charm.
I'll try to run the full xfstests suite once I'm finished merging
Pankaj's patches into my patchset.
Well, would've been too easy.
'fsx' bails out at test 27 (collapse), with:

XFS (ram0): Corruption detected. Unmount and run xfs_repair
XFS (ram0): Internal error isnullstartblock(got.br_startblock) at line 5787 of file fs/xfs/libxfs/xfs_bmap.c. Caller xfs_bmap_collapse_extents+0x2d9/0x320 [xfs]

Guess some more work needs to be done here.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



