Re: [PATCH 00/17] orangefs: page cache

Using the page cache seems like a game changer for the Orangefs kernel module.
Workloads with small IO suffer when they push data to a parallel filesystem
just a handful of bytes at a time. Below, vm2, running Fedora's 4.17 kernel,
has /pvfsmnt mounted from an Orangefs filesystem that is itself running
on vm2. vm1, running 4.19.0-rc2 plus the Orangefs page cache patches, likewise
has its /pvfsmnt mounted from a local Orangefs filesystem.

[vm2]$ dd if=/dev/zero of=/pvfsmnt/d.vm2/d.foo/dds.out bs=128 count=4194304
4194304+0 records in
4194304+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 662.013 s, 811 kB/s

[vm1]$ dd if=/dev/zero of=/pvfsmnt/d.vm1/d.foo/dds.out bs=128 count=4194304
4194304+0 records in
4194304+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 11.3072 s, 47.5 MB/s

Small IO collects in the page cache until a reasonable amount of
data is available for writeback.
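
For reference, the two dd runs above can be compared with a bit of
arithmetic. The sketch below only uses the numbers dd printed; the
4096-byte page size is an assumption (typical for x86) and is not
stated anywhere in this thread.

```python
# Arithmetic on the two dd runs above; only PAGE_SIZE is an assumption.
BLOCK_SIZE = 128                 # dd bs=128
COUNT = 4194304                  # dd count=4194304
TOTAL_BYTES = BLOCK_SIZE * COUNT # 536870912 bytes, as dd reported
T_VM2 = 662.013                  # seconds on vm2 (Fedora 4.17)
T_VM1 = 11.3072                  # seconds on vm1 (4.19.0-rc2 plus patches)
PAGE_SIZE = 4096                 # assumed page size, not stated in the mail

rate_vm2 = TOTAL_BYTES / T_VM2   # bytes per second on vm2
rate_vm1 = TOTAL_BYTES / T_VM1   # bytes per second on vm1
speedup = T_VM2 / T_VM1          # how many times faster vm1 finished
writes_per_page = PAGE_SIZE // BLOCK_SIZE

print(f"vm2: {rate_vm2 / 1e3:.0f} kB/s")
print(f"vm1: {rate_vm1 / 1e6:.1f} MB/s")
print(f"speedup: {speedup:.1f}x")
print(f"128-byte writes absorbed per page: {writes_per_page}")
```

Under that page-size assumption, 32 of the 128-byte writes land in a
single page before anything has to be written back.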

The trick, it seems, is to improve small IO without harming large IO.
Aligning writeback sizes, when possible, with the size of the IO buffer
that the Orangefs kernel module shares with its userspace component seems
promising on my dinky vm tests.
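
One way to picture that alignment is as splitting a dirty byte range
so that no chunk crosses a multiple of the shared buffer size. The
sketch below is only an illustration of the idea, not the kernel code:
the helper name split_on_buffer_boundaries and the 4 MiB BUF_SIZE are
hypothetical and were chosen just for the example.

```python
def split_on_buffer_boundaries(start, end, buf_size):
    """Split the byte range [start, end) into chunks that never cross a
    multiple of buf_size. Hypothetical helper, for illustration only."""
    if buf_size <= 0:
        raise ValueError("buf_size must be positive")
    chunks = []
    pos = start
    while pos < end:
        # First buffer-size boundary strictly after pos.
        boundary = (pos // buf_size + 1) * buf_size
        stop = min(boundary, end)
        chunks.append((pos, stop))
        pos = stop
    return chunks


MIB = 1024 * 1024
BUF_SIZE = 4 * MIB  # assumed value, not taken from the Orangefs source

# A dirty range from 3 MiB to 9 MiB becomes three chunks:
# an unaligned head, one full buffer, and an unaligned tail.
for chunk_start, chunk_end in split_on_buffer_boundaries(3 * MIB, 9 * MIB, BUF_SIZE):
    print(chunk_start // MIB, chunk_end // MIB)
```

Only the head and tail chunks are smaller than the buffer, so a large
sequential write still goes out mostly in full-buffer pieces.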

-Mike

On Mon, Sep 17, 2018 at 4:11 PM Martin Brandenburg <martin@xxxxxxxxxxxx> wrote:
>
> If no major issues are found in review or in our testing, we intend to
> submit this during the next merge window.
>
> The goal of all this is to significantly reduce the number of network
> requests made to the OrangeFS
>
> First the xattr cache is needed because otherwise we make a ton of
> getxattr calls from security_inode_need_killpriv.
>
> Then there's some reorganization so inode changes can be cached.
> Finally, we enable write_inode.
>
> Then remove the old readpages.  Next there's some reorganization to
> support readpage/writepage.  Finally, enable readpage/writepage which
> is fairly straightforward except for the need to separate writes from
> different uid/gid pairs due to the design of our server.
>
> Martin Brandenburg (17):
>   orangefs: implement xattr cache
>   orangefs: do not invalidate attributes on inode create
>   orangefs: simplify orangefs_inode_getattr interface
>   orangefs: update attributes rather than relying on server
>   orangefs: hold i_lock during inode_getattr
>   orangefs: set up and use backing_dev_info
>   orangefs: let setattr write to cached inode
>   orangefs: reorganize setattr functions to track attribute changes
>   orangefs: remove orangefs_readpages
>   orangefs: service ops done for writeback are not killable
>   orangefs: migrate to generic_file_read_iter
>   orangefs: implement writepage
>   orangefs: skip inode writeout if nothing to write
>   orangefs: write range tracking
>   orangefs: avoid fsync service operation on flush
>   orangefs: use kmem_cache for orangefs_write_request
>   orangefs: implement writepages
>
>  fs/orangefs/acl.c             |   4 +-
>  fs/orangefs/file.c            | 193 ++++--------
>  fs/orangefs/inode.c           | 576 +++++++++++++++++++++++++++-------
>  fs/orangefs/namei.c           |  41 ++-
>  fs/orangefs/orangefs-cache.c  |  24 +-
>  fs/orangefs/orangefs-kernel.h |  56 +++-
>  fs/orangefs/orangefs-mod.c    |  10 +-
>  fs/orangefs/orangefs-utils.c  | 181 +++++------
>  fs/orangefs/super.c           |  38 ++-
>  fs/orangefs/waitqueue.c       |  18 +-
>  fs/orangefs/xattr.c           | 104 ++++++
>  11 files changed, 839 insertions(+), 406 deletions(-)
>
> --
> 2.19.0
>


