Re: MDS blocked ops; kernel: Workqueue: ceph-pg-invalid ceph_invalidate_work [ceph]

> Hi, I encountered a problem with blocked MDS operations and a client
> becoming unresponsive. I dumped the MDS cache, ops, blocked ops and some
> further log information here:
>
> https://files.dtu.dk/u/peQSOY1kEja35BI5/2010-09-03-mds-blocked-ops?l
>
> A user of our HPC system was running a job that creates a somewhat
> stressful MDS load. This workload tends to lead to MDS warnings like "slow
> metadata ops" and "client does not respond to caps release", which usually
> disappear without intervention after a while.

We have an HPC cluster with 4K cores across 30+ large-ish servers (128GB
to 768GB compute nodes), and we have experienced similar issues.

This bug seems closely related:
https://tracker.ceph.com/issues/41467
(We haven't gotten a version with that patch yet.)

After upgrading to a 5.2 kernel with this commit:
3e1d0452edceebb903d23db53201013c940bf000
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e1d0452edceebb903d23db53201013c940bf000

the kernel could deadlock when memory pressure caused the MDS to reclaim
capabilities. That smells similar.



Jesper





