Re: [PATCH 4/5] ceph: flush the mdlog before waiting on unsafe reqs

Xiubo Li <xiubli@xxxxxxxxxx> · Wed, 30 Jun 2021 09:26:39 +0800

On 6/29/21 9:25 PM, Jeff Layton wrote:
On Tue, 2021-06-29 at 12:42 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

For the client requests who will have unsafe and safe replies from
MDS daemons, in the MDS side the MDS daemons won't flush the mdlog
(journal log) immediatelly, because they think it's unnecessary.
That's true for most cases but not all, likes the fsync request.
The fsync will wait until all the unsafe replied requests to be
safely replied.

Normally if there have multiple threads or clients are running, the
whole mdlog in MDS daemons could be flushed in time if any request
will trigger the mdlog submit thread. So usually we won't experience
the normal operations will stuck for a long time. But in case there
has only one client with only thread is running, the stuck phenomenon
maybe obvious and the worst case it must wait at most 5 seconds to
wait the mdlog to be flushed by the MDS's tick thread periodically.

This patch will trigger to flush the mdlog in all the MDSes manually
just before waiting the unsafe requests to finish.

Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
  fs/ceph/caps.c | 9 +++++++++
  1 file changed, 9 insertions(+)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index c6a3352a4d52..6e80e4649c7a 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2286,6 +2286,7 @@ static int caps_are_flushed(struct inode *inode, u64 flush_tid)
   */
  static int unsafe_request_wait(struct inode *inode)
  {
+	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
  	struct ceph_inode_info *ci = ceph_inode(inode);
  	struct ceph_mds_request *req1 = NULL, *req2 = NULL;
  	int ret, err = 0;
@@ -2305,6 +2306,14 @@ static int unsafe_request_wait(struct inode *inode)
  	}
  	spin_unlock(&ci->i_unsafe_lock);
  
+	/*
+	 * Trigger to flush the journal logs in all the MDSes manually,
+	 * or in the worst case we must wait at most 5 seconds to wait
+	 * the journal logs to be flushed by the MDSes periodically.
+	 */
+	if (req1 || req2)
+		flush_mdlog(mdsc);
+
So this is called on fsync(). Do we really need to flush all of the mds
logs on every fsync? That sounds like it might have some performance
impact. Would it be possible to just flush the mdslog on the MDS that's
authoritative for this inode?

I hit one case before, the mds.0 is the auth mds, but the client just 
sent the request to mds.2, then when the mds.2 tried to gather the 
rdlocks then it was stuck for waiting for the mds.0 to flush the mdlog. 
I think it also will happen that if the mds.0 could also be stuck just 
like this even its the auth MDS.

Normally the mdlog submit thread will be triggered per MDS's tick, 
that's 5 seconds. But this is not always true mostly because any other 
client request could trigger the mdlog submit thread to run at any time. 
Since the fsync is not running all the time, so IMO the performance 
impact should be okay.



  	dout("unsafe_request_wait %p wait on tid %llu %llu\n",
  	     inode, req1 ? req1->r_tid : 0ULL, req2 ? req2->r_tid : 0ULL);
  	if (req1) {