On 6/9/22 12:02 PM, Yan, Zheng wrote:
On Thu, Jun 9, 2022 at 11:56 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
On 6/9/22 11:29 AM, Yan, Zheng wrote:
On Thu, Jun 9, 2022 at 11:19 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
On 6/9/22 10:15 AM, Yan, Zheng wrote:
The recent series of patches that add "wait on async xxxx" at various
places do not seem correct. The correct fix should make mds avoid any
wait when handling async requests.
In this case I am wondering what will happen if the async create
request is deferred: the related cap flush request would then fail to
find the ino.
Should we wait? And if so, how do we distinguish the subtree-migration
case from the deferred-async-create case?
Caps for async ops are revoked at the freezing-tree stage of subtree
migration; see Locker::invalidate_lock_caches().
Sorry, I may not fully understand this issue.
You mean that in the migration case the MDS will revoke the caps for
the async-created files, and the kclient will then send an MClientCaps
request to the MDS, right?
If my understanding is correct, there is another case (a user-space
sketch of this sequence follows below):
1. async create a fileA
2. write a lot of data to it and then release the Fw cap ref; if we
need to report the new size to the MDS, the client will send an
MClientCaps request too
3. if the async create of fileA was deferred for some reason, won't
that MClientCaps request fail to find the ino?
Async op should not be deferred in any case.
I am still checking mdcache->path_traverse(), which it seems could
forward or requeue the request when it fails to acquire locks. The same
question applies to the case in [1].
[1] https://github.com/ceph/ceph/blob/main/src/mds/Server.cc#L4501.
On Wed, Jun 8, 2022 at 12:56 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
Currently, we'll call ceph_check_caps, but if we're still waiting on the
reply, we'll end up spinning around on the same inode in
flush_dirty_session_caps. Wait for the async create reply before
flushing caps.
Fixes: fbed7045f552 ("ceph: wait for async create reply before sending any cap messages")
URL: https://tracker.ceph.com/issues/55823
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
fs/ceph/caps.c | 1 +
1 file changed, 1 insertion(+)
I don't know if this will fix the tx queue stalls completely, but I
haven't seen one with this patch in place. I think it makes sense on its
own, either way.
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 0a48bf829671..5ecfff4b37c9 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4389,6 +4389,7 @@ static void flush_dirty_session_caps(struct ceph_mds_session *s)
ihold(inode);
dout("flush_dirty_caps %llx.%llx\n", ceph_vinop(inode));
spin_unlock(&mdsc->cap_dirty_lock);
+ ceph_wait_on_async_create(inode);
ceph_check_caps(ci, CHECK_CAPS_FLUSH, NULL);
iput(inode);
spin_lock(&mdsc->cap_dirty_lock);
--
2.36.1
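For reference, the ceph_wait_on_async_create() call added above uses an
existing helper in the kernel client (in fs/ceph/super.h, if I remember
correctly). Roughly, it just waits for the async-create flag on the
inode to be cleared, which happens when the MDS reply is handled. A
sketch of the idea, not necessarily the exact in-tree code:

static inline int ceph_wait_on_async_create(struct inode *inode)
{
	struct ceph_inode_info *ci = ceph_inode(inode);

	/*
	 * CEPH_ASYNC_CREATE_BIT is set in ci->i_ceph_flags while an async
	 * create is outstanding and cleared once the MDS reply arrives;
	 * wait (killably) for that before going on to flush caps.
	 */
	return wait_on_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT,
			   TASK_KILLABLE);
}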