[RFC PATCH 0/9] ceph: add asynchronous create functionality

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I recently sent a patchset that allows the client to do an asynchronous
UNLINK call to the MDS when it has the appropriate caps and dentry info.
This set adds the corresponding functionality for creates.

When the client has the appropriate caps on the parent directory and
dentry information, and a delegated inode number, it can satisfy a
request locally without contacting the server. This allows the kernel
client to return very quickly from an O_CREAT open, so it can get on
with doing other things.

These numbers are based on my personal test rig, which is a KVM client
vs a vstart cluster running on my workstation (nothing scientific here).

A simple benchmark (with the cephfs mounted at /mnt/cephfs):
-------------------8<-------------------
#!/bin/sh

TESTDIR=/mnt/cephfs/test-dirops.$$

mkdir $TESTDIR
stat $TESTDIR
echo "Creating files in $TESTDIR"
time for i in `seq 1 10000`; do
    echo "foobarbaz" > $TESTDIR/$i
done
-------------------8<-------------------

With async dirops disabled:

real	0m9.865s
user	0m0.353s
sys	0m0.888s

With async dirops enabled:

real	0m5.272s
user	0m0.104s
sys	0m0.454s

That workload is a bit synthetic though. One workload we're interested
in improving is untar. Untarring a deep directory tree (random kernel
tarball I had laying around):

Disabled:
$ time tar xf ~/linux-4.18.0-153.el8.jlayton.006.tar

real	1m35.774s
user	0m0.835s
sys	0m7.410s

Enabled:
$ time tar xf ~/linux-4.18.0-153.el8.jlayton.006.tar

real	1m32.182s
user	0m0.783s
sys	0m6.830s

Not a huge win there. I suspect at this point that synchronous mkdir
may be serializing behind the async creates.

It needs a lot more performance tuning and analysis, but it's now at the
point where it's basically usable. To enable it, turn on the
ceph.enable_async_dirops module option.

There are some places that need further work:

1) The MDS patchset to delegate inodes to the client is not yet merged:

    https://github.com/ceph/ceph/pull/31817

2) this is 64-bit arch only for the moment. I'm using an xarray to track
the delegated inode numbers, and those don't do 64-bit indexes on
32-bit machines. Is anyone using 32-bit ceph clients? We could probably
build an xarray of xarrays if needed.

3) The error handling is still pretty lame. If the create fails, it'll
set a writeback error on the parent dir and the inode itself, but the
client could end up writing a bunch before it notices, if it even
bothers to check. We probably need to do better here. I'm open to
suggestions on this bit especially.

Jeff Layton (9):
  ceph: ensure we have a new cap before continuing in fill_inode
  ceph: print name of xattr being set in set/getxattr dout message
  ceph: close some holes in struct ceph_mds_request
  ceph: make ceph_fill_inode non-static
  libceph: export ceph_file_layout_is_valid
  ceph: decode interval_sets for delegated inos
  ceph: add flag to delegate an inode number for async create
  ceph: copy layout, max_size and truncate_size on successful sync
    create
  ceph: attempt to do async create when possible

 fs/ceph/caps.c               |  31 +++++-
 fs/ceph/file.c               | 202 +++++++++++++++++++++++++++++++++--
 fs/ceph/inode.c              |  57 +++++-----
 fs/ceph/mds_client.c         | 130 ++++++++++++++++++++--
 fs/ceph/mds_client.h         |  12 ++-
 fs/ceph/super.h              |  10 ++
 fs/ceph/xattr.c              |   5 +-
 include/linux/ceph/ceph_fs.h |   8 +-
 net/ceph/ceph_fs.c           |   1 +
 9 files changed, 396 insertions(+), 60 deletions(-)

-- 
2.24.1




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux