Re: Help needed, ceph fs down due to large stray dir

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

looking at the log data (see snippet at end) we suspect a classic "producer–consumer" deadlock since it seems that the same thread that is filling the purge queue at PurgeQueue.cc:L335:journaler.append_entry(bl) in function PurgeQueue::push is also responsible for processing it but the call to scan_stray_dir() at the end of MDCache::populate_mydir() is atomic and blocks processing until it completes.

I'm looking through the code right now to find what parameter is used in constructing this queue that allows adjusting its size. I need to triple it in size at least to get scan_stray_dir() to complete and see what happens then.

This would, of course, go much faster if someone familiar with the code would tell me what that parameter is.

Here the log snippet when the purge queue runs full:

2025-01-16T14:06:00.490+0100 7f25cfa31700 10 mds.2.cache.ino(0x20028c29e69) purge_stale_snap_data 
2025-01-16T14:06:00.490+0100 7f25cfa31700 10 mds.2.cache.ino(0x20028c29e69)  purging old_inode [c7d,dae]
2025-01-16T14:06:00.490+0100 7f25cfa31700 10 mds.2.cache clear_dirty_bits_for_stray [inode 0x20028c29e69 [...daf,head] ~mds2/stray7/20028c29e69/ auth v2199183125 snaprealm=0x562c9fb6e600 f(v0 m2024-12-28T20:17:59.487395+0100) n(v0 rc2024-12-28T20:17:59.487395+0100 1=0+1) (iversion lock) 0x562c9fb70b00]
2025-01-16T14:06:00.490+0100 7f25cfa31700 20 mds.2.cache.strays enqueue: purging dn: [dentry #0x102/stray7/20028c29e69 [daf,head] auth (dversion lock) pv=0 v=2199200546 ino=0x20028c29e69 state=1073741844 0x562c9fb6db80]
2025-01-16T14:06:00.490+0100 7f25cfa31700 10 mds.2.cache.dir(0x61b.000100011011*) auth_pin by 0x55e0f256ff80 on [dir 0x61b.000100011011* ~mds2/stray7/ [2,head] auth v=2199200547 cv=2199200546/2199200546 ap=1928+0 state=1610620929|complete|sticky f(v23 m2025-01-10T13:30:41.594215+0100 2568=0+2568)/f(v23 m2025-01-10T13:30:41.594215+0100 2569=1+2568) n(v2110 rc2025-01-10T13:30:41.594215+0100 2568=0+2568)/n(v2110 rc2025-01-10T13:30:41.594215+0100 b7068 2569=1+2568) hs=2568+0,ss=0+0 | child=1 sticky=1 dirty=1 waiter=0 authpin=1 0x562c98bf0880] count now 1928
2025-01-16T14:06:00.490+0100 7f25cfa31700 10 mds.2.cache.strays purge [dentry #0x102/stray7/20028c29e69 [daf,head] auth (dversion lock) pv=0 v=2199200546 ino=0x20028c29e69 state=1073741876 | purging=1 0x562c9fb6db80] [inode 0x20028c29e69 [...daf,head] ~mds2/stray7/20028c29e69/ auth v2199183125 snaprealm=0x562c9fb6e600 f(v0 m2024-12-28T20:17:59.487395+0100) n(v0 rc2024-12-28T20:17:59.487395+0100 1=0+1) (iversion lock) 0x562c9fb70b00]
2025-01-16T14:06:00.490+0100 7f25cfa31700  4 mds.2.purge_queue push: pushing inode 0x20028c29e69
2025-01-16T14:06:00.537+0100 7f25d7240700 20 -- [v2:192.168.32.85:6800/845135215,v1:192.168.32.85:6801/845135215] >> v1:192.168.48.132:0/2119497078 conn(0x55e0f3d3e400 legacy=0x55e0f3d34000 unknown :6801 s=STATE_CONNECTION_ESTABLISHED l=0).process
2025-01-16T14:06:00.537+0100 7f25d7240700 20 -- [v2:192.168.32.85:6800/845135215,v1:192.168.32.85:6801/845135215] >> v1:192.168.48.132:0/2119497078 conn(0x55e0f3d3e400 legacy=0x55e0f3d34000 unknown :6801 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=1
2025-01-16T14:06:00.537+0100 7f25d7240700 20 --1- [v2:192.168.32.85:6800/845135215,v1:192.168.32.85:6801/845135215] >> v1:192.168.48.132:0/2119497078 conn(0x55e0f3d3e400 0x55e0f3d34000 :6801 s=OPENED pgs=828965 cs=1 l=0).handle_message r=0
2025-01-16T14:06:00.537+0100 7f25d7240700 20 --1- [v2:192.168.32.85:6800/845135215,v1:192.168.32.85:6801/845135215] >> v1:192.168.48.132:0/2119497078 conn(0x55e0f3d3e400 0x55e0f3d34000 :6801 s=OPENED pgs=828965 cs=1 l=0).handle_message process tag 14

Best regards and have a good weekend.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux