Hi Erich,
I raised a tracker for this: https://tracker.ceph.com/issues/65607.
I haven't yet figured out what was holding the 'dn->lock' in the
'lookup' request (or somewhere else), since there are no debug logs.
Hopefully we can get the debug logs so we can push this further.
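In case it helps, the debug level can be raised at runtime without
restarting the daemons, something like the following (the daemon name
is just a placeholder, adjust it for your cluster):

  ceph config set mds debug_mds 20
  ceph config set mds debug_ms 1
  # or for a single daemon:
  ceph tell mds.<daemon-name> config set debug_mds 20

and then revert to the defaults afterwards with
'ceph config rm mds debug_mds' and 'ceph config rm mds debug_ms'.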
Thanks
- Xiubo
On 4/19/24 23:55, Erich Weiler wrote:
Hi Xiubo,
Never mind, I was wrong; most of the blocked ops were 12 hours old. Ugh.
I restarted the MDS daemon to clear them.
I just set it back to one active MDS instead of two; let's see if
that makes a difference.
I am beginning to think it may be impossible to catch the logs that
matter here. Sometimes the blocked ops seem to be waiting just because
of load, and sometimes because they are genuinely stuck, but it's
really hard to tell which without waiting a while. And I can't wait
with debug turned on, because my root disks (which are 150 GB) fill up
with debug logs in 20 minutes. So unless I could somehow store many TB
of debug logs, it seems we won't be able to catch this.
Let's see how having one MDS helps. Or maybe I actually need something
like 4 MDSs because the load is too high for only one or two; I don't
know. Or maybe it's the lock issue you've been working on. I guess I
can test the lock-order fix once it's available.
-erich
On 4/19/24 7:26 AM, Erich Weiler wrote:
So I woke up this morning and checked the blocked_ops again; there
were 150 of them, but the age of each ranged from 500 to 4300
seconds. So it seems as if they are eventually being processed.
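(For reference, the blocked ops and their ages can be listed with
something like the following, run on the MDS host; the daemon name is
a placeholder:

  ceph daemon mds.<daemon-name> dump_blocked_ops
  ceph daemon mds.<daemon-name> dump_ops_in_flight

Each op entry in the output includes an "age" field in seconds.)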
I wonder if we are thinking about this in the wrong way? Maybe I
should be *adding* MDS daemons because my current ones are overloaded?
Can a single server hold multiple MDS daemons? Right now I have
three physical servers each with one MDS daemon on it.
I can still try reducing to one. And I'll keep an eye on blocked ops
to see if any get to a very old age (and are thus wedged).
-erich
On 4/18/24 8:55 PM, Xiubo Li wrote:
Okay, please try setting only one active MDS.
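Something like this should do it (replace the filesystem name with
yours):

  ceph fs set <fsname> max_mds 1

The extra rank should stop and its daemon return to standby; 'ceph fs
status' will show the current layout.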
On 4/19/24 11:54, Erich Weiler wrote:
We have 2 active MDS daemons and one standby.
On 4/18/24 8:52 PM, Xiubo Li wrote:
BTW, how many active MDS daemons are you using?
On 4/19/24 10:55, Erich Weiler wrote:
OK, I'm sure I caught it in the right order this time, the logs
should definitely show when the blocked/slow requests start.
Check out these logs and dumps:
http://hgwdev.gi.ucsc.edu/~weiler/
It's a 762 MB tarball but it uncompresses to 16 GB.
-erich
On 4/18/24 6:57 PM, Xiubo Li wrote:
Okay, could you try this with 18.2.0?
I suspect it was introduced by:
commit e610179a6a59c463eb3d85e87152ed3268c808ff
Author: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Date: Mon Jul 17 16:10:59 2023 -0400

    mds: drop locks and retry when lock set changes

    An optimization was added to avoid an unnecessary gather on the
    inode filelock when the client can safely get the file size
    without also getting issued the requested caps. However, if a
    retry of getattr is necessary, this conditional inclusion of the
    inode filelock can cause lock-order violations resulting in
    deadlock.

    So, if we've already acquired some of the inode's locks then we
    must drop locks and retry.

    Fixes: https://tracker.ceph.com/issues/62052
    Fixes: c822b3e2573578c288d170d1031672b74e02dced
    Signed-off-by: Patrick Donnelly <pdonnell@xxxxxxxxxx>
    (cherry picked from commit b5719ac32fe6431131842d62ffaf7101c03e9bac)
On 4/19/24 09:54, Erich Weiler wrote:
I'm on 18.2.1. I think I may have gotten the timing off on the logs
and dumps, so I'll try again. It's just really hard to capture because
I kind of need to be looking at it in real time. Hang on, lemme see if
I can get another capture...
-erich
On 4/18/24 6:35 PM, Xiubo Li wrote:
BTW, which Ceph version are you using?
On 4/12/24 04:22, Erich Weiler wrote:
BTW - it just happened again. I upped the debugging settings as you
instructed and got more dumps (then returned the debug settings to
normal).
Attached are the new dumps.
Thanks again,
erich
On 4/9/24 9:00 PM, Xiubo Li wrote:
On 4/10/24 11:48, Erich Weiler wrote:
Does that mean it could be the lock order bug
(https://tracker.ceph.com/issues/62123) as Xiubo suggested?
I have raised a PR to fix the lock order issue; if possible, please
give it a try and see whether it resolves this issue.
Thank you! Yeah, this issue is happening every couple of days now.
It just happened again today and I got more MDS dumps.
If it would help, let me know and I can send them!
Once this happens, it would be better if you could enable the MDS
debug logs:
debug mds = 20
debug ms = 1
And then provide the debug logs together with the MDS dumps.
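For the dumps, something like this on the MDS host should capture the
cache and the in-flight ops to files (the daemon name is a
placeholder; note that dumping a very large cache can take a while):

  ceph daemon mds.<daemon-name> dump cache /tmp/mds.cache
  ceph daemon mds.<daemon-name> dump_ops_in_flight > /tmp/mds.ops
  ceph daemon mds.<daemon-name> dump_blocked_ops > /tmp/mds.blocked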
I assume if this fix is approved and backported it will
then appear in like 18.2.3 or something?
Yeah, it will be backported after being well tested.
- Xiubo
Thanks again,
erich
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx