Re: [RFC PATCH 0/9] ceph: add asynchronous create functionality

On 1/11/20 4:56 AM, Jeff Layton wrote:
I recently sent a patchset that allows the client to do an asynchronous
UNLINK call to the MDS when it has the appropriate caps and dentry info.
This set adds the corresponding functionality for creates.

When the client has the appropriate caps on the parent directory and
dentry information, and a delegated inode number, it can satisfy a
request locally without contacting the server. This allows the kernel
client to return very quickly from an O_CREAT open, so it can get on
with doing other things.
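
As a rough illustration of the conditions listed above, the choice between
the synchronous and asynchronous paths could be modelled in user space like
this. It is only a sketch: the struct, the cap bit and the function name are
hypothetical and are not taken from the patches.

```c
#include <stdbool.h>

/* Hypothetical cap bit standing in for "the appropriate caps" on the parent. */
#define TOY_DIR_CAPS_FOR_CREATE (1u << 0)

/* Hypothetical snapshot of what the client knows about the parent directory. */
struct toy_parent_state {
    unsigned int caps;           /* caps currently held on the parent dir */
    bool have_dentry_info;       /* client holds usable dentry information */
    unsigned int delegated_inos; /* inode numbers the MDS has delegated */
};

enum toy_create_path { TOY_CREATE_SYNC, TOY_CREATE_ASYNC };

/*
 * Pick the async path only when every precondition holds; otherwise fall
 * back to an ordinary synchronous create request to the MDS.
 */
static enum toy_create_path
toy_choose_create_path(const struct toy_parent_state *dir, bool async_enabled)
{
    if (!async_enabled)
        return TOY_CREATE_SYNC;
    if ((dir->caps & TOY_DIR_CAPS_FOR_CREATE) != TOY_DIR_CAPS_FOR_CREATE)
        return TOY_CREATE_SYNC;
    if (!dir->have_dentry_info)
        return TOY_CREATE_SYNC;
    if (dir->delegated_inos == 0)
        return TOY_CREATE_SYNC;
    return TOY_CREATE_ASYNC;
}
```

Any missing precondition simply degrades to the synchronous path, so in this
model the async path is purely an optimization.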

These numbers are based on my personal test rig, which is a KVM client
vs a vstart cluster running on my workstation (nothing scientific here).

A simple benchmark (with the cephfs mounted at /mnt/cephfs):
-------------------8<-------------------
#!/bin/sh

TESTDIR=/mnt/cephfs/test-dirops.$$

mkdir $TESTDIR
stat $TESTDIR
echo "Creating files in $TESTDIR"
time for i in `seq 1 10000`; do
     echo "foobarbaz" > $TESTDIR/$i
done
-------------------8<-------------------

With async dirops disabled:

real	0m9.865s
user	0m0.353s
sys	0m0.888s

With async dirops enabled:

real	0m5.272s
user	0m0.104s
sys	0m0.454s

That workload is a bit synthetic though. One workload we're interested
in improving is untar. Untarring a deep directory tree (random kernel
tarball I had lying around):

Disabled:
$ time tar xf ~/linux-4.18.0-153.el8.jlayton.006.tar

real	1m35.774s
user	0m0.835s
sys	0m7.410s

Enabled:
$ time tar xf ~/linux-4.18.0-153.el8.jlayton.006.tar

real	1m32.182s
user	0m0.783s
sys	0m6.830s

Not a huge win there. I suspect at this point that synchronous mkdir
may be serializing behind the async creates.
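
To put the two sets of wall-clock ("real") numbers above side by side, here
is a small helper that turns them into a speedup factor and a fraction of
time saved. The function names are made up for illustration; only the
timings come from the runs quoted above.

```c
#include <stdio.h>

/* Convert a "real XmY.YYYs" reading into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times faster the "after" run is than the "before" run. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Fraction of the original wall-clock time that was saved. */
static double time_saved_fraction(double before, double after)
{
    return (before - after) / before;
}

/* Print one comparison line for a labelled pair of timings. */
static void print_comparison(const char *label, double before, double after)
{
    printf("%s: %.2fx faster, %.1f%% less wall-clock time\n",
           label, speedup(before, after),
           100.0 * time_saved_fraction(before, after));
}
```

Calling print_comparison("create loop", to_seconds(0, 9.865),
to_seconds(0, 5.272)) and print_comparison("untar", to_seconds(1, 35.774),
to_seconds(1, 32.182)) gives roughly 1.87x for the create loop versus about
1.04x for the untar.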

It needs a lot more performance tuning and analysis, but it's now at the
point where it's basically usable. To enable it, turn on the
ceph.enable_async_dirops module option.

There are some places that need further work:

1) The MDS patchset to delegate inodes to the client is not yet merged:

     https://github.com/ceph/ceph/pull/31817

2) This is 64-bit arch only for the moment. I'm using an xarray to track
the delegated inode numbers, and xarray indexes are unsigned long, so
they can't hold 64-bit inode numbers on 32-bit machines. Is anyone using
32-bit ceph clients? We could probably build an xarray of xarrays if
needed.
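
For what it's worth, the "xarray of xarrays" idea amounts to splitting each
64-bit inode number into two 32-bit indexes: the high half selects an inner
container and the low half is the index within it. Below is a user-space toy
of that split. It uses small fixed arrays rather than real xarrays, and all
names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ino_key {
    uint32_t hi; /* index into the outer level */
    uint32_t lo; /* index into the inner level */
};

static struct ino_key split_ino(uint64_t ino)
{
    struct ino_key k = { (uint32_t)(ino >> 32), (uint32_t)(ino & 0xffffffffu) };
    return k;
}

static uint64_t join_ino(struct ino_key k)
{
    return ((uint64_t)k.hi << 32) | k.lo;
}

#define TOY_SLOTS 8

/* One inner container: all tracked inos that share the same high half. */
struct toy_inner {
    bool used;
    uint32_t hi;
    uint32_t lo[TOY_SLOTS];
    size_t n;
};

struct toy_outer {
    struct toy_inner inner[TOY_SLOTS];
};

/* Insert an ino; returns false when the toy's fixed capacity is exhausted. */
static bool toy_insert(struct toy_outer *o, uint64_t ino)
{
    struct ino_key k = split_ino(ino);
    struct toy_inner *slot = NULL;

    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (o->inner[i].used && o->inner[i].hi == k.hi) {
            slot = &o->inner[i]; /* existing container for this high half */
            break;
        }
        if (!o->inner[i].used && !slot)
            slot = &o->inner[i]; /* remember the first free container */
    }
    if (!slot || (slot->used && slot->n == TOY_SLOTS))
        return false;
    if (!slot->used) {
        slot->used = true;
        slot->hi = k.hi;
        slot->n = 0;
    }
    slot->lo[slot->n++] = k.lo;
    return true;
}

static bool toy_contains(const struct toy_outer *o, uint64_t ino)
{
    struct ino_key k = split_ino(ino);

    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!o->inner[i].used || o->inner[i].hi != k.hi)
            continue;
        for (size_t j = 0; j < o->inner[i].n; j++)
            if (o->inner[i].lo[j] == k.lo)
                return true;
    }
    return false;
}
```

Both halves fit in a 32-bit unsigned long, which is the point of the split.
The toy does not check for duplicates and ignores locking entirely.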

3) The error handling is still pretty lame. If the create fails, it'll
set a writeback error on the parent dir and the inode itself, but the
client could end up writing a bunch before it notices, if it even
bothers to check. We probably need to do better here. I'm open to
suggestions on this bit especially.

Jeff Layton (9):
   ceph: ensure we have a new cap before continuing in fill_inode
   ceph: print name of xattr being set in set/getxattr dout message
   ceph: close some holes in struct ceph_mds_request
   ceph: make ceph_fill_inode non-static
   libceph: export ceph_file_layout_is_valid
   ceph: decode interval_sets for delegated inos
   ceph: add flag to delegate an inode number for async create
   ceph: copy layout, max_size and truncate_size on successful sync
     create
   ceph: attempt to do async create when possible

  fs/ceph/caps.c               |  31 +++++-
  fs/ceph/file.c               | 202 +++++++++++++++++++++++++++++++++--
  fs/ceph/inode.c              |  57 +++++-----
  fs/ceph/mds_client.c         | 130 ++++++++++++++++++++--
  fs/ceph/mds_client.h         |  12 ++-
  fs/ceph/super.h              |  10 ++
  fs/ceph/xattr.c              |   5 +-
  include/linux/ceph/ceph_fs.h |   8 +-
  net/ceph/ceph_fs.c           |   1 +
  9 files changed, 396 insertions(+), 60 deletions(-)


The client should wait for the reply to the async create before sending
a cap message or any request that operates on the inode being created to
the MDS.

See commit "client: wait for async creating before sending request or cap
message" in https://github.com/ceph/ceph/pull/32576
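
A minimal single-threaded model of that ordering rule, for illustration
only: it defers messages instead of blocking, and every name in it is
hypothetical rather than taken from the kernel client or from the commit
referenced above.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MSGS 16

struct toy_inode {
    bool async_create_pending;  /* create sent, MDS reply not yet seen */
    int deferred[TOY_MAX_MSGS]; /* message ids held back until the reply */
    size_t ndeferred;
    int sent[TOY_MAX_MSGS];     /* message ids "sent" to the MDS, in order */
    size_t nsent;
};

/* Record a message as sent; returns false if the toy log is full. */
static bool toy_record_sent(struct toy_inode *in, int msg)
{
    if (in->nsent == TOY_MAX_MSGS)
        return false;
    in->sent[in->nsent++] = msg;
    return true;
}

/*
 * Send a cap message or request for this inode. While the async create is
 * still outstanding the message is held back rather than sent.
 */
static bool toy_send(struct toy_inode *in, int msg)
{
    if (in->async_create_pending) {
        if (in->ndeferred == TOY_MAX_MSGS)
            return false;
        in->deferred[in->ndeferred++] = msg;
        return true;
    }
    return toy_record_sent(in, msg);
}

/* The create reply arrived: release held-back messages in their original order. */
static void toy_handle_create_reply(struct toy_inode *in)
{
    in->async_create_pending = false;
    for (size_t i = 0; i < in->ndeferred; i++)
        toy_record_sent(in, in->deferred[i]);
    in->ndeferred = 0;
}
```

In this model the MDS never sees a message about an inode before it has
replied to the create for that inode.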