Re: [RFC PATCH v2 00/10] ceph: asynchronous file create support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2020-01-21 at 21:20 +0800, Yan, Zheng wrote:
> On 1/21/20 6:56 PM, Jeff Layton wrote:
> > On Mon, 2020-01-20 at 21:20 +0800, Yan, Zheng wrote:
> > > On 1/16/20 4:59 AM, Jeff Layton wrote:
> > > > v2:
> > > > - move cached layout to dedicated field in inode
> > > > - protect cached layout with i_ceph_lock
> > > > - wipe cached layout in __check_cap_issue
> > > > - set max_size of file to layout.stripe_unit
> > > > - set truncate_size to (u64)-1
> > > > - use dedicated CephFS feature bit instead of CEPHFS_FEATURE_OCTOPUS
> > > > - set cap_id to 1 in async created inode
> > > > - allocate inode number before submitting request
> > > > - rework the prep for an async create to be more efficient
> > > > - don't allow MDS or cap messages involving an inode until we get async
> > > >     create reply
> > > > 
> > > > A lot of changes in this set, mostly based on Zheng and Xiubo's
> > > > comments. Performance is pretty similar to the previous set:
> > > > 
> > > > Untarring a kernel tarball into a cephfs takes about 98s with
> > > > async dirops disabled. With them enabled, it takes around 78s,
> > > > which is about a 25% improvement.
> > > > 
> > > > This is not quite ready for merge. Error handling could still be
> > > > improved. With xfstest generic/531, I see some messages like this pop
> > > > up in the ring buffer:
> > > > 
> > > >       [ 7331.393110] ceph: ceph_async_create_cb: inode number mismatch! err=0 deleg_ino=0x100001232d9 target=0x100001232b9
> > > > 
> > > > Basically, we went to do an async create and got a different inode
> > > > number back than expected. That still needs investigation, but I
> > > > didn't see any test failures due to it.
> > > > 
> > > > Jeff Layton (10):
> > > >     libceph: export ceph_file_layout_is_valid
> > > >     ceph: make ceph_fill_inode non-static
> > > >     ceph: make dentry_lease_is_valid non-static
> > > >     ceph: make __take_cap_refs a public function
> > > >     ceph: decode interval_sets for delegated inos
> > > >     ceph: add flag to designate that a request is asynchronous
> > > >     ceph: add infrastructure for waiting for async create to complete
> > > >     ceph: add new MDS req field to hold delegated inode number
> > > >     ceph: cache layout in parent dir on first sync create
> > > >     ceph: attempt to do async create when possible
> > > > 
> > > >    fs/ceph/caps.c               |  34 ++++--
> > > >    fs/ceph/dir.c                |  13 ++-
> > > >    fs/ceph/file.c               | 218 +++++++++++++++++++++++++++++++++--
> > > >    fs/ceph/inode.c              |  50 ++++----
> > > >    fs/ceph/mds_client.c         | 126 ++++++++++++++++++--
> > > >    fs/ceph/mds_client.h         |   9 +-
> > > >    fs/ceph/super.h              |  16 ++-
> > > >    include/linux/ceph/ceph_fs.h |   8 +-
> > > >    net/ceph/ceph_fs.c           |   1 +
> > > >    9 files changed, 410 insertions(+), 65 deletions(-)
> > > > 
> > > 
> > > An issue of kernel async unlink/create implementation is
> > > get_caps_for_async_unlink/try_prep_async_create are called before
> > > calling ceph_mdsc_submit_request. There is a small windows that
> > > session's state may change. If session is in wrong state,
> > > ceph_mdsc_submit_request() may wait and not send request immediately.
> > > 
> > 
> > Is that a real issue (other than performance)?
> > 
> > We hold cap references, so assuming that the session can be reconnected
> > and that we keep the caps, everything should still work correctly, no?
> > 
> 
> I think it may cause more troubles for error handling (For the case that 
> client failed to reconnect session)
> 

Yeah, error handling is a real Achilles heel in this whole thing. I'm
still considering how best to deal with errors when this occurs. There
may not be any good answers for that w/o adding new interfaces.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux