On 1/25/20 1:19 AM, Jeff Layton wrote:
On Thu, 2020-01-23 at 01:04 +0800, Yan, Zheng wrote:
On Wed, Jan 22, 2020 at 3:31 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
v3:
- move some cephfs-specific code into ceph.ko
- present and track inode numbers as u64 values
- fix up dentry and cap eligibility checks
- set CEPH_O_EXCL on async creates
- attempt to handle errors better on async create (invalidate dentries
and clear dir completeness).
- ensure that fsync waits for async create to complete
v2:
- move cached layout to dedicated field in inode
- protect cached layout with i_ceph_lock
- wipe cached layout in __check_cap_issue
- set max_size of file to layout.stripe_unit (see the sketch after this list)
- set truncate_size to (u64)-1
- use dedicated CephFS feature bit instead of CEPHFS_FEATURE_OCTOPUS
- set cap_id to 1 in async created inode
- allocate inode number before submitting request
- rework the prep for an async create to be more efficient
- don't allow MDS or cap messages involving an inode until we get the
async create reply
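To make the inode-setup items above concrete, here is a minimal
illustrative sketch. ceph_init_async_inode() is a hypothetical helper
(the real patches may arrange this differently); the fields are the
ones in fs/ceph/super.h:

/*
 * Sketch only: rough shape of the async-created inode setup described
 * in the v2 notes above.
 */
static void ceph_init_async_inode(struct ceph_inode_info *ci,
				  struct ceph_file_layout *layout)
{
	/* allow writes up to one stripe unit before asking the MDS */
	ci->i_max_size = layout->stripe_unit;

	/* no truncation pending on a brand-new file */
	ci->i_truncate_size = (u64)-1;

	/* the locally-granted cap uses cap_id 1 until the MDS replies */
}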
Still not quite ready for merge, but I've cleaned up a number of warts
in the v2 set. Performance numbers still look about the same.
There is definitely still a race of some sort that causes the client to
try to asynchronously create a dentry that already exists. I'm still
working on tracking that down.
Jeff Layton (10):
ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c
ceph: make ceph_fill_inode non-static
ceph: make dentry_lease_is_valid non-static
ceph: make __take_cap_refs non-static
ceph: decode interval_sets for delegated inos
ceph: add flag to designate that a request is asynchronous
ceph: add infrastructure for waiting for async create to complete
ceph: add new MDS req field to hold delegated inode number
ceph: cache layout in parent dir on first sync create
ceph: attempt to do async create when possible
 fs/ceph/Makefile                     |   2 +-
 fs/ceph/caps.c                       |  38 +++--
 fs/ceph/dir.c                        |  13 +-
 fs/ceph/file.c                       | 240 +++++++++++++++++++++++++--
 fs/ceph/inode.c                      |  50 +++---
 fs/ceph/mds_client.c                 | 123 ++++++++++++--
 fs/ceph/mds_client.h                 |  17 +-
 fs/ceph/super.h                      |  16 +-
 net/ceph/ceph_fs.c => fs/ceph/util.c |   4 -
 include/linux/ceph/ceph_fs.h         |   8 +-
 net/ceph/Makefile                    |   2 +-
 11 files changed, 443 insertions(+), 70 deletions(-)
 rename net/ceph/ceph_fs.c => fs/ceph/util.c (94%)
--
2.24.1
I realized that there are still two issues:
- we need to clear delegated inos when the mds fails over
I'm guessing we need to do this whenever any session is reconnected,
right? I think we may have bigger problems here though:
The issue is that with this set we assign ino_t's prior to submitting
the request. We could get deep into request submission and find that
we're reconnecting the session; at that point, that ino_t is no longer
valid for the session.
Yes, that's a problem for the current kclient implementation.
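A minimal sketch of the reconnect-time cleanup being discussed.
Representing the delegated set as an s_delegated_inos xarray is an
assumption on my part, not something the posted series does yet:

/*
 * Sketch only: drop all inode numbers the MDS delegated to this
 * session, e.g. when the session is being reconnected.
 */
static void ceph_wipe_delegated_inos(struct ceph_mds_session *session)
{
	/* xa_destroy() drops every entry, leaving the xarray empty */
	xa_destroy(&session->s_delegated_inos);
}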
- we need to clear caps for async dir operations when reconnecting to
the mds (see the last commit of https://github.com/ceph/ceph/pull/32576)
I guess we can use ceph_iterate_session_caps to do this. That said,
looking at your patch:
if (in->is_dir()) {
  // remove caps for async dir operations
  cap.implemented &= (CEPH_CAP_ANY_SHARED | CEPH_CAP_ANY_EXCL);
}
We remove all but Fsx here. That seems like quite a subtle difference
from how FILE caps work.
Only file caps have more than two bits; this removes all caps other
than AsxLsxXsxFsx.
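For illustration, the masking works like this (the macro names are the
real ones from ceph_fs.h; the starting value is just an example):

/* example: start with a cap carrying Fs, Fx, Fr and Fw */
int implemented = CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
                  CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;

/* keep only the shared/excl bits of every cap class (AsxLsxXsxFsx) */
implemented &= (CEPH_CAP_ANY_SHARED | CEPH_CAP_ANY_EXCL);

/* implemented is now CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL (Fsx) */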
Given the way that we're doing the inode delegation and handling dir
caps, I think we may need to rework things such that async requests
never get queued to the s_waiting list, and instead return some sort of
distinct error that cues the caller to fall back to a synchronous
request.
That would help prevent some of the potential races here.
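A sketch of that fallback pattern; both helper names and the choice of
-EJUKEBOX are placeholders, not settled interfaces:

/*
 * Hypothetical call site: the async path fails with a distinct error
 * when it can't proceed (e.g. the session is reconnecting), and the
 * caller falls back to a normal synchronous request.
 */
err = ceph_try_async_create(dir, dentry, mode, &req);
if (err == -EJUKEBOX)	/* placeholder errno */
	err = ceph_sync_create(dir, dentry, mode, &req);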
My libcephfs implementation checks whether a request can be async just
before sending the request.
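A rough kclient equivalent of that send-time check might look like this
(helper name hypothetical):

/*
 * Sketch: decide async vs. sync only when the request is about to go
 * out, so a session that has started reconnecting forces the sync path.
 */
static bool ceph_can_send_async(struct ceph_mds_session *session)
{
	return session->s_state == CEPH_MDS_SESSION_OPEN;
}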