I am attaching an MDS log, with debug_mds set to 25, covering
a period (a few seconds' worth) during which a dd command got
stuck (it never got unstuck) and left an empty file. I am
guessing it was able to create the file but was blocked from
writing to it.
The command was "dd if=/dev/zero of=xxxxxxxx bs=100M
count=1" in /mnt/ceph1-npx/user/vbrik/ (on the client) which
corresponds to /npx/user/vbrik/ in cephfs.
The client is running CentOS 7.9 with the
3.10.0-1160.62.1.el7.x86_64 kernel. This is the highest
kernel version I can test on CentOS 7. It seems Alma 8.5 is
not affected by this issue (it has other issues, though), but
we need cephfs to work on CentOS 7.
This is Ceph 16.2.7.
Vlad
On 4/25/22 23:56, Xiubo Li wrote:
On 4/26/22 2:06 AM, Vladimir Brik wrote:
> a), max_mds > 1 ?
No, but I had tried it in the past (i.e. set max_mds to 2,
and then reverted back to 1)
> b), inline_data enabled ?
No
Okay, this is a different bug.
> c), how to reproduce it, could you provide the detail
steps ?
Sometimes, but not always, something like this will hang:
dd if=/dev/zero of=zero bs=100M count=1
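Since the hang is intermittent, a small wrapper that can be called repeatedly may make it easier to catch. The following is only a sketch: `try_dd` and the `ddtest.*` file name are hypothetical, and it assumes GNU coreutils `timeout` is available on the client. If dd is blocked in uninterruptible sleep, `timeout` may not be able to kill it and may itself not return, so treat this as a detector rather than a cure.

```shell
#!/bin/sh
# Hypothetical helper: try_dd DIR [BS] [SECONDS]
# Writes one block of zeros into DIR and reports whether dd finished
# within the time limit and left a non-empty file behind.
try_dd() {
    dir=$1
    bs=${2:-100M}
    limit=${3:-60}
    out="$dir/ddtest.$$"

    # timeout(1) exits with status 124 when the time limit is reached.
    timeout "$limit" dd if=/dev/zero of="$out" bs="$bs" count=1 2>/dev/null
    rc=$?

    if [ "$rc" -eq 124 ]; then
        echo "HUNG $out"
        return 2
    elif [ "$rc" -ne 0 ] || [ ! -s "$out" ]; then
        # dd failed, or the file exists but is empty.
        echo "EMPTY_OR_FAILED $out"
        return 1
    fi

    echo "OK $out"
    rm -f "$out"
}
```

Calling `try_dd /mnt/ceph1-npx/user/vbrik` in a loop and stopping at the first non-zero return leaves the problem file in place for inspection.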
I am using the upstream code and have created thousands of
files using the dd command, but couldn't reproduce it.
Could you try kernel-4.18.0-376.el8, which has been synced
to the upstream recently? Maybe this bug only exists in
older versions.
-- Xiubo
We use cephfs as the shared storage for our cluster, and
another way to reproduce it is to start many jobs that
execute something like
date > <path_to_some_dir>/$RANDOM
In this case there is no hanging, but all files in
path_to_some_dir are empty.
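To put a number on the empty-file symptom, something like the following could be run against the target directory after the jobs finish. This is a sketch with hypothetical helper names (`write_stamps`, `count_empty`); the writer uses a counter in the file name instead of $RANDOM so that no two writes collide, which is an assumption made for the example and not part of the original jobs.

```shell
#!/bin/sh
# Hypothetical helper: write_stamps DIR N
# Writes the current date into N separate files, like the cluster jobs do.
write_stamps() {
    dir=$1
    n=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        date > "$dir/stamp.$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: count_empty DIR
# Prints "<empty> <total>" for regular files directly inside DIR.
count_empty() {
    dir=$1
    total=$(find "$dir" -maxdepth 1 -type f | wc -l)
    empty=$(find "$dir" -maxdepth 1 -type f -size 0 | wc -l)
    # Arithmetic expansion strips any padding that wc -l may add.
    echo "$((empty)) $((total))"
}
```

On a healthy mount the first number should be 0; in the failure described above it would equal the second.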
> d), could you enable the kernel debug log and set the
> debug_mds to 25 in MDSes and share the logs ?
As of this morning we began experiencing OSDs crashing
cyclically with "heartbeat_map is_healthy ... had suicide
timed out", so the logs will probably contain a lot of
unrelated material until we fix that issue. I'll let you know
when that happens.
Vlad
On 4/24/22 23:40, Xiubo Li wrote:
Hi Vladimir,
This issue looks like the one I am working on now in [1],
which is also an indefinite hang when creating a new
file and then writing to it.
The issue in [1] was caused by setting max_mds > 1,
enabling inline_data, and then creating a file and
writing to it. It seems to be a deadlock between the MDS
and the kernel client.
BTW, what's your setup for:
a), max_mds > 1 ?
b), inline_data enabled ?
c), how to reproduce it, could you provide the detailed
steps ?
d), could you enable the kernel debug log and set the
debug_mds to 25 in MDSes and share the logs ?
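For item d), one possible way to turn the logging on is sketched below. The function names are hypothetical. The sketch assumes the cluster uses the centralized config database (`ceph config set`), and that the client kernel was built with dynamic debug support and has debugfs mounted at /sys/kernel/debug; if either assumption does not hold, other methods are needed. Level 25 is very verbose, so it is worth reverting as soon as the hang has been captured.

```shell
#!/bin/sh
# Sketch only. The ceph commands run on an admin node; the dynamic debug
# function runs as root on the affected client.

# Raise the MDS debug level for all MDS daemons (default here: 25).
enable_mds_debug() {
    ceph config set mds debug_mds "${1:-25}"
}

# Remove the override so the MDS daemons return to their default level.
disable_mds_debug() {
    ceph config rm mds debug_mds
}

# Enable kernel debug messages for the CephFS client modules.
# Output goes to the kernel log (dmesg).
enable_kclient_debug() {
    for mod in ceph libceph; do
        echo "module $mod +p" > /sys/kernel/debug/dynamic_debug/control
    done
}

# Turn the kernel client debug messages off again.
disable_kclient_debug() {
    for mod in ceph libceph; do
        echo "module $mod -p" > /sys/kernel/debug/dynamic_debug/control
    done
}
```

A typical sequence would be to call the two enable functions, reproduce the hang, collect the MDS log and dmesg output, and then call the two disable functions.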
[1] https://tracker.ceph.com/issues/55377
Thanks
BRs
-- Xiubo
On 4/25/22 5:25 AM, Vladimir Brik wrote:
Hello
We are experiencing an issue where, sometimes, when
users write to cephfs, an empty file is created and then
the application hangs, seemingly indefinitely. I am
sometimes able to reproduce it with dd.
Does anybody know what might be going on?
Some details:
- ceph health complains about 100+ slow metadata IOs
- CPU utilization of ceph-mds is low
- We have almost 200 kernel cephfs clients
- Cephfs metadata is stored on 3 OSDs that use NVMe
flash AICs
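On the cluster side, `ceph health detail` shows which daemons the slow metadata IO warnings refer to. On a kernel client, the requests still waiting on an MDS can be inspected through debugfs. The helper below is a sketch with a hypothetical name; it assumes debugfs is mounted, that it is run as root, and that each non-empty line of a client's `mdsc` file corresponds to one in-flight MDS request.

```shell
#!/bin/sh
# Hypothetical helper: pending_mds_requests [DEBUGFS_CEPH_ROOT]
# Prints "<file> <count>" for every kernel client instance found,
# where <count> is the number of non-empty lines in its mdsc file.
pending_mds_requests() {
    root=${1:-/sys/kernel/debug/ceph}
    for f in "$root"/*/mdsc; do
        [ -r "$f" ] || continue
        # grep -c exits non-zero when the count is 0; ignore that status.
        n=$(grep -c . "$f" || true)
        echo "$f $n"
    done
}
```

A count that stays above zero while an application is hung suggests the client is waiting on the MDS and not on the data OSDs.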
Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx