On 4/27/22 3:40 AM, Vladimir Brik wrote:
I am attaching an MDS log, with debug set to 25, covering a short time
period (a few seconds' worth) during which a dd command got stuck (it
never got unstuck) and left behind an empty file. I am guessing it was
able to create the file but was blocked from writing to it.
Sounds like a deadlock too.
The command was "dd if=/dev/zero of=xxxxxxxx bs=100M count=1" in
/mnt/ceph1-npx/user/vbrik/ (on the client) which corresponds to
/npx/user/vbrik/ in cephfs.
The client is running CentOS 7.9 with the 3.10.0-1160.62.1.el7.x86_64
kernel. This is the highest kernel version I can test on CentOS 7. It
seems Alma 8.5 is not affected by this issue (it has other issues,
though), but we need cephfs to work on CentOS 7.
It seems this only exists in the old kernel.
Thanks for the logs, I will check them later.
-- Xiubo
This is Ceph 16.2.7.
Vlad
On 4/25/22 23:56, Xiubo Li wrote:
On 4/26/22 2:06 AM, Vladimir Brik wrote:
> a) max_mds > 1?
No, but I tried it in the past (i.e. set max_mds to 2 and then
reverted it back to 1).
> b) inline_data enabled?
No
Okay, this is a different bug.
> c) How can it be reproduced? Could you provide detailed steps?
Sometimes, but not always, something like this will hang:
dd if=/dev/zero of=zero bs=100M count=1
I am using the upstream code and have created thousands of files
with the dd command, but couldn't reproduce it.
Could you try kernel-4.18.0-376.el8, which was recently synced with
upstream? Maybe this bug only exists in old versions.
-- Xiubo
We use cephfs as the shared storage for our cluster, and another way
to reproduce it is to start many jobs that execute something like
date > <path_to_some_dir>/$RANDOM
In this case there is no hanging, but all files in path_to_some_dir
are empty.
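A rough single-client approximation of that "many jobs" reproduction could look like the sketch below. It is only an assumption-laden helper, not something from this thread: the function name check_empty_files, the out.N file names and the default of 50 writers are all hypothetical. It starts the writers concurrently, waits for them, and counts zero-byte files. Real cluster jobs run on many clients at once, so this may not trigger the problem, and it is worth repeating the count from a second client as well. Point it at a fresh directory, since leftover out.* files from earlier runs are counted too.

```shell
#!/bin/sh
# Hypothetical helper (name, file names and defaults are illustrative only).
# check_empty_files DIR [JOBS]: start JOBS concurrent writers, each running
# `date > DIR/out.N`, wait for them, then report how many files are empty.
check_empty_files() {
  local dir="$1"
  local jobs="${2:-50}"
  local i=1
  local empty

  mkdir -p "$dir" || return 2
  while [ "$i" -le "$jobs" ]; do
    # Background each writer so the writes overlap, like independent jobs.
    # A numbered name replaces $RANDOM so two writers never share a file.
    date > "$dir/out.$i" &
    i=$((i + 1))
  done
  # Wait for all background jobs of this shell (this blocks if a write hangs).
  wait

  # Count regular files named out.* in DIR that are zero bytes long.
  empty=$(find "$dir" -maxdepth 1 -type f -name 'out.*' -empty | wc -l)
  echo "jobs=$jobs empty=$empty"
  # Return status: 0 when no file is empty, 1 otherwise.
  [ "$empty" -eq 0 ]
}
```

For example, `check_empty_files /mnt/ceph1-npx/user/vbrik/emptytest 200` would run 200 writers under the mount point mentioned above; on a healthy file system it should print `empty=0`.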
> d) Could you enable the kernel debug log, set debug_mds to 25
> in the MDSes, and share the logs?
As of this morning, OSDs began crashing cyclically with
"heartbeat_map is_healthy ... had suicide timed out", so the logs
will probably contain a lot of unrelated noise until we fix that
issue. I'll let you know once that's fixed.
Vlad
On 4/24/22 23:40, Xiubo Li wrote:
Hi Vladimir,
This issue looks like the one I am currently working on in [1],
which is also a bug where the client gets stuck indefinitely after
creating a new file and then writing something to it.
The issue in [1] is triggered by setting max_mds > 1, enabling
inline_data, and then creating a file and writing to it. It seems
to be a deadlock between the MDS and the kernel client.
BTW, what's your setup for:
a) max_mds > 1?
b) inline_data enabled?
c) How can it be reproduced? Could you provide detailed steps?
d) Could you enable the kernel debug log, set debug_mds to 25 in
the MDSes, and share the logs?
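One possible way to gather answers to a) through d) from the command line is sketched below. It assumes a Pacific (16.2.x) cluster with an admin keyring, root on the kernel client, and a client kernel built with CONFIG_DYNAMIC_DEBUG; FSNAME is a placeholder for the real file system name. Treat it as a starting point rather than a prescribed procedure. Kernel dynamic debug output is very verbose, so keep the capture window short.

```shell
# Sketch only. FSNAME is a placeholder; replace it with your file system name.
FSNAME=cephfs

# a), b): the file system map lists max_mds and whether inline data is enabled.
ceph fs get "$FSNAME" | grep -E 'max_mds|inline_data'

# d), MDS side: raise the MDS debug level, reproduce, then revert to the default.
ceph config set mds debug_mds 25
# ... reproduce the hang here ...
ceph config rm mds debug_mds

# d), kernel client side (as root): enable dynamic debug for ceph and libceph.
# Mount debugfs first if it is not already mounted.
mountpoint -q /sys/kernel/debug || mount -t debugfs none /sys/kernel/debug
echo 'module ceph +p'    > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
# ... reproduce, then collect the messages with dmesg or journalctl -k ...
echo 'module ceph -p'    > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph -p' > /sys/kernel/debug/dynamic_debug/control
```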
[1] https://tracker.ceph.com/issues/55377
Thanks
BRs
-- Xiubo
On 4/25/22 5:25 AM, Vladimir Brik wrote:
Hello
We are experiencing an issue where, sometimes, when users write to
cephfs, an empty file is created and then the application hangs,
seemingly indefinitely. I am sometimes able to reproduce it with dd.
Does anybody know what might be going on?
Some details:
- ceph health complains about 100+ slow metadata IOs
- CPU utilization of ceph-mds is low
- We have almost 200 kernel cephfs clients
- Cephfs metadata is stored on 3 OSDs that use NVMe flash AICs
Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx