On 4/26/22 2:06 AM, Vladimir Brik wrote:
> a), max_mds > 1 ?
No, but I had tried it in the past (i.e. set max_mds to 2, and then
reverted back to 1)
> b), inline_data enabled ?
No
Okay, this is a different bug.
> c), how to reproduce it, could you provide the detailed steps ?
Sometimes, but not always, something like this will hang:
dd if=/dev/zero of=zero bs=100M count=1
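If it is useful, my understanding is that while dd is hung the stuck
request should show up on the client in the kernel client's debugfs
files (the directory name under /sys/kernel/debug/ceph/ is the cluster
fsid plus the client id), e.g.:
cat /sys/kernel/debug/ceph/*/mdsc    # in-flight MDS requests from this client
cat /sys/kernel/debug/ceph/*/osdc    # in-flight OSD requests from this client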
I am using the upstream code and have created thousands of files with
the dd command, but couldn't reproduce it.
Could you try kernel-4.18.0-376.el8, which has recently been synced
with upstream ? Maybe this bug only exists in old versions.
-- Xiubo
We use cephfs as the shared storage for our cluster, and another way to
reproduce it is to start many jobs that each execute something like
date > <path_to_some_dir>/$RANDOM
In this case nothing hangs, but all of the files in path_to_some_dir
end up empty.
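Roughly, the jobs boil down to something like the following (this is just
an illustrative local loop; in reality they are separate cluster jobs, and
the directory and job count here are placeholders):
# illustrative only: many concurrent writers, each creating one small file
DIR=<path_to_some_dir>
for i in $(seq 1 200); do
    ( date > "$DIR/$RANDOM" ) &
done
wait
# afterwards the files in $DIR exist but are 0 bytes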
> d), could you enable the kernel debug log, set debug_mds to 25 on the
> MDSes, and share the logs ?
As of this morning we began experiencing OSDs cyclically crashing with
"heartbeat_map is_healthy ... had suicide timed out", so the logs will
probably contain a lot of unrelated stuff until we fix that issue.
I'll let you know when that happens.
Vlad
On 4/24/22 23:40, Xiubo Li wrote:
Hi Vladimir,
This issue looks like the one I am working on now in [1], which is
also a bug where creating a new file and then writing to it gets
stuck indefinitely.
The issue in [1] is triggered by setting max_mds > 1, enabling
inline_data, and then creating a file and writing to it. It looks like
a deadlock between the MDS and the kernel client.
BTW, what's your setup for:
a), max_mds > 1 ?
b), inline_data enabled ?
c), how to reproduce it, could you provide the detailed steps ?
d), could you enable the kernel debug log, set debug_mds to 25 on the
MDSes, and share the logs ?
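For d), something like the following should be enough (just a sketch,
assuming the centralized config database is in use and the clients'
kernels have dynamic debug enabled; the log path may differ on your setup):
ceph config set mds debug_mds 25           # raise the MDS debug level
# ... reproduce the hang, then collect /var/log/ceph/ceph-mds.*.log ...
ceph config set mds debug_mds 1/5          # restore the default afterwards
# on a kernel client, enable debug logging for the cephfs modules
# (the messages go to dmesg / the journal):
echo "module ceph +p"    > /sys/kernel/debug/dynamic_debug/control
echo "module libceph +p" > /sys/kernel/debug/dynamic_debug/control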
[1] https://tracker.ceph.com/issues/55377
Thanks
BRs
-- Xiubo
On 4/25/22 5:25 AM, Vladimir Brik wrote:
Hello
We are experiencing an issue where, sometimes, when users write to
cephfs, an empty file is created and then the application hangs,
seemingly indefinitely. I am sometimes able to reproduce it with dd.
Does anybody know what might be going on?
Some details:
- ceph health complains about 100+ slow metadata IOs
- CPU utilization of ceph-mds is low
- We have almost 200 kernel cephfs clients
- Cephfs metadata is stored on 3 OSDs that use NVMe flash AICs
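For reference, the above comes from the usual status commands, roughly
along these lines (a rough sketch, output trimmed):
ceph health detail                 # lists the slow metadata IO warnings
ceph fs status                     # MDS state, client count, pool usage
ceph osd tree                      # overall OSD / host layout
top -p "$(pgrep -d, ceph-mds)"     # MDS CPU usage, run on the MDS host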
Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx