Re: MDS Behind on Trimming...

Hi Xiubo,

Never mind, I was wrong; most of the blocked ops were 12 hours old.  Ugh.

I restarted the MDS daemon to clear them.

I just reset to having one active MDS instead of two; let's see if that makes a difference.
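
For the record, that was just the usual max_mds change, roughly like this ("cephfs" below is a stand-in for our actual filesystem name):

# Drop back to a single active MDS rank; the extra active becomes a standby.
ceph fs set cephfs max_mds 1

# Watch the ranks converge back down to one active.
ceph fs status cephfs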

I am beginning to think it may be impossible to catch the logs that matter here. Sometimes the blocked ops seem to be waiting simply because of load, and sometimes they are waiting because they are genuinely stuck, but it's really hard to tell which without waiting a while. And I can't wait that long with debug turned on, because my root disks (which are 150 GB) fill up with debug logs in 20 minutes. So unless I can somehow store many TB of debug logs, it seems we won't be able to catch this.
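
One workaround I may try (just a sketch, and the paths are made up) is to stage a bigger scratch volume for the logs before turning debug up, so the root disk doesn't fill:

# Example only: /bigdisk is a placeholder path, and this assumes the MDS
# writes its log under /var/log/ceph.  Restart the MDS afterwards so it
# reopens its log file under the new mount.
mkdir -p /bigdisk/ceph-logs
mount --bind /bigdisk/ceph-logs /var/log/ceph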

Let's see whether having one MDS helps. Or maybe I actually need something like four MDS daemons because the load is too high for only one or two; I don't know. Or maybe it's the lock issue you've been working on. I can test the lock-order fix once it's available.

-erich

On 4/19/24 7:26 AM, Erich Weiler wrote:
So I woke up this morning and checked the blocked_ops again; there were 150 of them. But their ages ranged from 500 to 4300 seconds, so it seems they are eventually being processed.
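
In case it's useful, this is roughly how I'm summarizing the ages (the daemon name is a placeholder, and I'm assuming the op dump carries the usual "age" field per op):

# Placeholder daemon name; adjust to the real MDS name from "ceph fs status".
MDS=mds.cephfs.host1.abcdef
ceph tell "$MDS" dump_blocked_ops > blocked.json
jq '.ops | length' blocked.json              # how many ops are blocked
jq '[.ops[].age] | min, max' blocked.json    # youngest / oldest age in seconds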

I wonder if we are thinking about this in the wrong way?  Maybe I should be *adding* MDS daemons because my current ones are overloaded?

Can a single server hold multiple MDS daemons? Right now I have three physical servers, each with one MDS daemon on it.
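
If adding daemons turns out to be the answer, I'm guessing something like this would do it (a sketch assuming a cephadm-managed cluster; hostnames, counts, and the filesystem name are placeholders):

# Run two MDS daemons on each of the three existing hosts.
cat > mds-spec.yaml <<'EOF'
service_type: mds
service_id: cephfs
placement:
  hosts:
    - mds-host-1
    - mds-host-2
    - mds-host-3
  count_per_host: 2
EOF
ceph orch apply -i mds-spec.yaml

# Then allow more active ranks; the remaining daemons stay as standbys.
ceph fs set cephfs max_mds 4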

I can still try reducing to one.  And I'll keep an eye on blocked ops to see if any get to a very old age (and are thus wedged).

-erich

On 4/18/24 8:55 PM, Xiubo Li wrote:
Okay, please try setting only one active MDS.


On 4/19/24 11:54, Erich Weiler wrote:
We have 2 active MDS daemons and one standby.

On 4/18/24 8:52 PM, Xiubo Li wrote:
BTW, how many active MDS daemons are you using?


On 4/19/24 10:55, Erich Weiler wrote:
OK, I'm sure I caught it in the right order this time; the logs should definitely show when the blocked/slow requests start.  Check out these logs and dumps:

http://hgwdev.gi.ucsc.edu/~weiler/

It's a 762 MB tarball but it uncompresses to 16 GB.

-erich


On 4/18/24 6:57 PM, Xiubo Li wrote:
Okay, could you try this with 18.2.0?

I suspect it was introduced by:

commit e610179a6a59c463eb3d85e87152ed3268c808ff
Author: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Date:   Mon Jul 17 16:10:59 2023 -0400

     mds: drop locks and retry when lock set changes

     An optimization was added to avoid an unnecessary gather on the inode
     filelock when the client can safely get the file size without also
     getting issued the requested caps. However, if a retry of getattr
     is necessary, this conditional inclusion of the inode filelock
     can cause lock-order violations resulting in deadlock.

     So, if we've already acquired some of the inode's locks then we must
     drop locks and retry.

     Fixes: https://tracker.ceph.com/issues/62052
     Fixes: c822b3e2573578c288d170d1031672b74e02dced
     Signed-off-by: Patrick Donnelly <pdonnell@xxxxxxxxxx>
     (cherry picked from commit b5719ac32fe6431131842d62ffaf7101c03e9bac)


On 4/19/24 09:54, Erich Weiler wrote:
I'm on 18.2.1.  I think I may have gotten the timing off on the logs and dumps, so I'll try again.  It's just really hard to capture, because I need to be looking at it in real time. Hang on, let me see if I can get another capture...

-erich

On 4/18/24 6:35 PM, Xiubo Li wrote:

BTW, which Ceph version are you using?



On 4/12/24 04:22, Erich Weiler wrote:
BTW - it just happened again. I upped the debugging settings as you instructed and got more dumps (then returned the debug settings to normal).

Attached are the new dumps.

Thanks again,
erich

On 4/9/24 9:00 PM, Xiubo Li wrote:

On 4/10/24 11:48, Erich Weiler wrote:
Does that mean it could be the lock order bug (https://tracker.ceph.com/issues/62123), as Xiubo suggested?

I have raised a PR to fix the lock order issue; if possible, please give it a try and see whether it resolves this issue.

Thank you!  Yeah, this issue is happening every couple of days now. It just happened again today and I got more MDS dumps. If it would help, let me know and I can send them!

Once this happens, it would be better if you could enable the MDS debug logs:

debug mds = 20

debug ms = 1

And then provide the debug logs together with the MDS dumps.
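
At runtime these can be set with the centralized config and reverted afterwards, for example (a ceph.conf edit plus a restart also works):

# Raise the debug levels only while reproducing the problem, then revert.
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# ...reproduce the slow/blocked requests and collect the MDS logs...
ceph config rm mds debug_mds
ceph config rm mds debug_ms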


I assume if this fix is approved and backported it will then appear in like 18.2.3 or something?

Yeah, it will be backported after being well tested.

- Xiubo

Thanks again,
erich









_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



