Quoting Frank Schilder (frans@xxxxxx):

> Dear Stefan,
>
> thanks for the fast reply. We encountered the problem again, this time
> in a much simpler situation; please see below. However, let me start
> with your questions first:
>
> What bug? -- In a single-active MDS set-up, should there ever occur an
> operation with "op_name": "fragmentdir"?

Yes, see http://docs.ceph.com/docs/mimic/cephfs/dirfrags/. If you had multiple active MDS daemons, the load could be shared among them. There are some parameters that might need to be tuned for your environment. Zheng Yan is an expert in this matter, though, so an analysis of the MDS cache dump might reveal the culprit.

> Upgrading: The problem described here is the only issue we observe.
> Unless the problem is fixed upstream, upgrading won't help us and
> would be a bit of a waste of time. If someone can confirm that this
> problem is fixed in a newer version, we will do it. Otherwise, we
> might prefer to wait until it is.

Keeping your systems up to date generally improves stability, and it might prevent you from hitting issues when your workload changes in the future. Testing new releases on a test system first is recommended, though.

> News on the problem. We encountered it again when one of our users
> executed a command in parallel with pdsh on all our ~500 client nodes.
> This command accesses the same file from all these nodes pretty much
> simultaneously. We did this quite often in the past, but this time,
> the command got stuck and we started observing the MDS health problem
> again. Symptoms:

Does that command incur writes, reads, or a combination of both on files in this directory? I wonder if you might prevent this from happening by tuning the "Activity thresholds", especially since you say it is load (number of clients) dependent.

Gr.
Stefan

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6  +31 318 648 688 / info@xxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
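[Editor's note: the "Activity thresholds" mentioned in the reply above are the dirfrag split/merge tunables described in the dirfrags documentation linked in the thread. A rough sketch of inspecting and adjusting them follows; the daemon name, filesystem name, and numeric values are illustrative assumptions, not recommendations.]

```shell
# Show the current dirfrag balancer thresholds on a running MDS
# ("mds.a" is an example daemon name; substitute your own).
ceph daemon mds.a config show | grep mds_bal

# Raise the read/write "temperature" thresholds that trigger a
# directory-fragment split, so a burst of ~500 clients hitting the
# same directory splits it less eagerly (example values only):
ceph config set mds mds_bal_split_rd 30000
ceph config set mds mds_bal_split_wr 20000

# Alternatively, allow a second active MDS so fragment load can be
# shared ("cephfs" is an example filesystem name):
ceph fs set cephfs max_mds 2
```

Whether raising the split thresholds or adding an active MDS is the better fix depends on whether the stuck operations are caused by the fragmentation itself or by the underlying client load.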