Re: cephfs hangs on writes


 




On 4/27/22 3:40 AM, Vladimir Brik wrote:
I am attaching an MDS log, with debug set to 25, covering a few seconds during which a dd command got stuck (it never got unstuck) and left an empty file. I am guessing it was able to create the file but was blocked from writing to it.

Sounds like a deadlock too.


The command was "dd if=/dev/zero of=xxxxxxxx bs=100M count=1" in /mnt/ceph1-npx/user/vbrik/ (on the client) which corresponds to /npx/user/vbrik/ in cephfs.

The client is running CentOS 7.9 with the 3.10.0-1160.62.1.el7.x86_64 kernel. This is the highest kernel version I can test on CentOS 7. It seems Alma 8.5 is not affected by this issue (it has other issues, though), but we need cephfs to work on CentOS 7.

It seems to exist only in the old kernel.

Thanks for the logs, will check it later.

-- Xiubo


This is Ceph 16.2.7.


Vlad



On 4/25/22 23:56, Xiubo Li wrote:

On 4/26/22 2:06 AM, Vladimir Brik wrote:
> a) max_mds > 1?
No, but I had tried it in the past (i.e. set max_mds to 2 and then reverted to 1).

> b) inline_data enabled?
No

Okay, this is a different bug.


> c) how to reproduce it? Could you provide detailed steps?
Sometimes, but not always, something like this will hang:
dd if=/dev/zero of=zero bs=100M count=1
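Because the hang is intermittent, a single run proves little. A minimal sketch of a reproduction loop, assuming bash and GNU coreutils (`timeout`, and `dd status=none`); the function name `repro_dd` and its parameters are hypothetical, not part of any Ceph tool. It repeats the dd write from above, counts runs that fail, time out, or leave an empty file, and keeps the files from failed runs for inspection. One caveat: if dd is blocked in uninterruptible sleep inside the kernel client, the signal sent by `timeout` may never take effect, so the loop itself can stall; the last "writing" line then identifies the stuck run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repeat the dd write and flag runs that hang or leave an
# empty file. Args: directory, number of runs, per-run timeout in seconds,
# dd block size (defaults match the command reported in this thread).
repro_dd() {
    local dir=$1 runs=${2:-20} limit=${3:-60} bs=${4:-100M}
    local i f rc failures=0
    for ((i = 1; i <= runs; i++)); do
        f="$dir/dd-test-$i"
        echo "run $i: writing $f"
        timeout "$limit" dd if=/dev/zero of="$f" bs="$bs" count=1 status=none
        rc=$?
        if [ "$rc" -ne 0 ]; then
            # 124 means the timeout expired; the file is kept for inspection
            echo "run $i: dd exited with status $rc"
            failures=$((failures + 1))
            continue
        fi
        if [ ! -s "$f" ]; then
            # dd reported success but the file has no data
            echo "run $i: $f is empty after dd returned"
            failures=$((failures + 1))
            continue
        fi
        rm -f "$f"
    done
    echo "$failures of $runs runs failed"
    # non-zero exit status if any run failed
    return $((failures > 0))
}
```

For example, `repro_dd /mnt/ceph1-npx/user/vbrik 50 120` would run the reported command 50 times in the directory from this thread, allowing two minutes per run.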

I am using the upstream code and have created thousands of files with the dd command, but couldn't reproduce it.

Could you try kernel-4.18.0-376.el8, which was recently synced with upstream? Maybe this bug only exists in old versions.

-- Xiubo


We use cephfs as the shared storage for our cluster, and another way to reproduce it is to start many jobs that execute something like
date > <path_to_some_dir>/$RANDOM
In this case there is no hanging, but all files in path_to_some_dir are empty.
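The empty-file symptom can be checked in a similar way. A sketch under stated assumptions: it runs all writers as background jobs on a single client, whereas the report involves jobs spread across many clients, so it only approximates the cluster workload; it also numbers the files instead of using `$RANDOM`, which can repeat and make two jobs write to the same file. The name `parallel_date_writes` is hypothetical. It prints any file that is still empty after all writers have exited.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start many concurrent "date > file" writers in one
# directory, wait for all of them, then list files that ended up empty.
# Args: an existing directory, number of concurrent writers.
parallel_date_writes() {
    local dir=$1 jobs=${2:-50} j
    for ((j = 1; j <= jobs; j++)); do
        # each background writer creates its own file and writes one line
        date > "$dir/date-$j" &
    done
    wait    # block until every background writer has exited
    # -empty matches zero-length files; any output reproduces the symptom
    find "$dir" -maxdepth 1 -type f -name 'date-*' -empty
}
```

Usage: `parallel_date_writes <path_to_some_dir> 200`; no output means every file received its line.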

> d) could you enable the kernel debug log, set
> debug_mds to 25 on the MDSes, and share the logs?
As of this morning we have been experiencing OSDs crashing cyclically with "heartbeat_map is_healthy ... had suicide timed out", so the logs will probably contain a lot of unrelated noise until we fix that issue. I'll let you know once it is fixed.


Vlad


On 4/24/22 23:40, Xiubo Li wrote:
Hi Vladimir,

This issue looks like the one I am working on now in [1], which is also an indefinite hang when creating a new file and then writing something to it.

Issue [1] is triggered by setting max_mds > 1, enabling inline_data, and then creating a file and writing to it. It seems to be a deadlock between the MDS and the kernel client.
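Both settings can be read from the file system map. A minimal sketch, assuming `ceph fs get <fs_name>` prints one whitespace-separated key/value pair per line, including `max_mds` and `inline_data` (with the value enabled or disabled); the helper name `check_fs_flags` and the file system name `cephfs` in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ceph fs get <fs_name>" output on stdin and report
# max_mds and inline_data, warning on the combination described in [1].
check_fs_flags() {
    awk '
        $1 == "max_mds"     { mds = $2 }
        $1 == "inline_data" { inline = $2 }
        END {
            print "max_mds=" mds " inline_data=" inline
            if (mds + 0 > 1 && inline == "enabled")
                print "WARNING: max_mds > 1 with inline_data enabled"
        }'
}
# Usage (list file system names with "ceph fs ls"):
#   ceph fs get cephfs | check_fs_flags
```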


BTW, what's your setup for:

a) max_mds > 1?

b) inline_data enabled?

c) how to reproduce it? Could you provide detailed steps?

d) could you enable the kernel debug log, set debug_mds to 25 on the MDSes, and share the logs?
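A sketch of item d), assuming root on the client, a kernel built with CONFIG_DYNAMIC_DEBUG, and debugfs mounted at /sys/kernel/debug. On the kernel client, writing `module ceph +p` (and the same for `libceph`) to the dynamic_debug control file turns on the modules' debug messages, which then appear in `dmesg`; `-p` turns them off. The wrapper `ceph_kdebug` is hypothetical, and its optional second argument only exists so it can be pointed at another file. On the MDS side, `ceph config set mds debug_mds 25` raises the log level for all MDS daemons; it is very verbose, so revert it with `ceph config rm mds debug_mds` once the hang has been captured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: toggle kernel-client debug output for the ceph and
# libceph modules via dynamic debug. Usage: ceph_kdebug on|off [control-file]
ceph_kdebug() {
    local flag ctl=${2:-/sys/kernel/debug/dynamic_debug/control}
    case $1 in
        on)  flag=+p ;;
        off) flag=-p ;;
        *)   echo "usage: ceph_kdebug on|off [control-file]" >&2; return 2 ;;
    esac
    # one query per write: enable/disable printing for each module
    echo "module ceph $flag" > "$ctl" && echo "module libceph $flag" > "$ctl"
}
# Typical sequence (MDS commands run on an admin node):
#   ceph config set mds debug_mds 25
#   ceph_kdebug on
#   ... reproduce the hang, then collect dmesg and the MDS log ...
#   ceph_kdebug off
#   ceph config rm mds debug_mds
```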


[1] https://tracker.ceph.com/issues/55377

Thanks

BRs

-- Xiubo



On 4/25/22 5:25 AM, Vladimir Brik wrote:
Hello

We are experiencing an issue where, sometimes, when users write to cephfs, an empty file is created and then the application hangs, seemingly indefinitely. I am sometimes able to reproduce it with dd.

Does anybody know what might be going on?

Some details:
- ceph health complains about 100+ slow metadata IOs
- CPU utilization of ceph-mds is low
- We have almost 200 kernel cephfs clients
- Cephfs metadata is stored on 3 OSDs that use NVMe flash AICs
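When writers hang while `ceph health` reports slow metadata IOs, it helps to see which requests are outstanding on each side. `ceph health detail` names the affected MDS, and on the MDS host `ceph daemon mds.<name> dump_ops_in_flight` shows the operations it is still processing. On a kernel client, debugfs has a directory per client instance containing `mdsc` and `osdc` files, which list requests still waiting on the MDS and the OSDs. A minimal sketch that dumps those files for every mount, assuming root and debugfs mounted at /sys/kernel/debug; the helper name `ceph_pending` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print pending MDS (mdsc) and OSD (osdc) requests for
# every kernel ceph client instance found under the debugfs ceph directory.
ceph_pending() {
    local base=${1:-/sys/kernel/debug/ceph} d f
    for d in "$base"/*/; do
        # an unmatched glob stays literal, so skip anything that is not a directory
        [ -d "$d" ] || continue
        echo "== ${d%/}"
        for f in mdsc osdc; do
            if [ -r "$d$f" ]; then
                echo "-- $f"
                cat "$d$f"
            fi
        done
    done
    return 0
}
```

Running it a few times while dd is stuck shows whether the same request stays queued, which helps tell an MDS-side wait from an OSD-side one.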


Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx









