Re: Recursive delete hangs on cephfs

Oh, I misread your initial email and thought you were on hard drives.
These do seem slow for SSDs.

You could try tracking down where the time is spent: run strace and see
which calls are taking a while, and go through the op tracker on the MDS
to see if anything is obviously taking a long time.
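
For reference, something along these lines might be a starting point (the MDS
name and the directory path below are placeholders, adjust to your setup):

    # summarize where rm spends its time, per syscall
    strace -c -f rm -rf /mnt/cephfs/old-dir

    # on the active MDS, check in-flight and recent slow ops
    ceph daemon mds.<name> dump_ops_in_flight
    ceph daemon mds.<name> dump_historic_ops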
-Greg

On Wed, Nov 17, 2021 at 8:00 PM Sasha Litvak
<alexander.v.litvak@xxxxxxxxx> wrote:
>
> Gregory,
> Thank you for your reply. I do understand that a number of serialized lookups may take time.  However, if 3.25 sec is OK, 11.2 seconds sounds long, and I once removed a large subdirectory that took over 20 minutes to complete.  I tried the nowsync mount option with kernel 5.15 and it seems to hide the latency (i.e. rm returns the prompt almost immediately after a recursive directory removal).  However, I am not sure whether nowsync is safe to use with kernel >= 5.8.  I also have kernel 5.3 on one of the client clusters, where nowsync is not supported, yet all rm operations there happen reasonably fast.  So the second question is: does 5.3's libceph behave differently on recursive rm compared to 5.4 or 5.8?
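>
> For reference, the nowsync mount I am testing looks roughly like this (monitor
> address, client name, and paths are anonymized placeholders):
>
>     mount -t ceph mon1:6789:/ /mnt/cephfs \
>         -o name=client1,secretfile=/etc/ceph/client1.secret,noatime,acl,nowsync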
>
>
> On Wed, Nov 17, 2021 at 9:52 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>
>> On Sat, Nov 13, 2021 at 5:25 PM Sasha Litvak
>> <alexander.v.litvak@xxxxxxxxx> wrote:
>> >
>> > I continued looking into the issue and have no idea what hinders the
>> > performance yet. However:
>> >
>> > 1. A client running kernel 5.3.0-42 (Ubuntu 18.04) has no such
>> > problems.  I delete a directory of hashed subdirs (00 - ff), with ~707 MB
>> > of files spread across those 256 subdirs, in 3.25 s.
>>
>> Recursive rm first requires the client to get capabilities on the
>> files in question, and the MDS to read that data off disk.
>> Newly-created directories will be cached, but old ones might not be.
>>
>> So this might just be the consequence of having to do 256 serialized
>> disk lookups on hard drives. 3.25 seconds seems plausible to me.
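>>
>> (As a rough back-of-the-envelope check: 256 serialized lookups at a typical
>> 10-15 ms HDD seek each come out to roughly 2.5-4 s, which is in the same
>> ballpark as your 3.25 s.)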
>>
>> The number of bytes isn't going to have any impact on how long it
>> takes to delete from the client side — that deletion is just marking
>> it in the MDS, and then the MDS does the object removals in the
>> background.
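>>
>> (If you want to watch that background cleanup, the stray and purge-queue
>> counters in "ceph daemon mds.<name> perf dump" are one place to look; the
>> mds name is a placeholder.)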
>> -Greg
>>
>> >
>> > 2. A client running kernel 5.8.0-53 (Ubuntu 20.04) processes a similar
>> > directory, with less data (~530 MB) spread across 256 subdirs, in 11.2 s.
>> >
>> > 3. Yet another client, with kernel 5.4.156, shows similar latency removing
>> > directories as in item 2.
>> >
>> > In all scenarios, the mounts use the same options, i.e.
>> > noatime,secretfile,acl.
>> >
>> > Client 1 has luminous, client 2 has octopus, client 3 has nautilus.  While
>> > they are all on the same LAN, ceph -s on clients 2 and 3 returns in ~800 ms
>> > and on client 1 in ~300 ms.
>> >
>> > Any ideas are appreciated,
>> >
>> > On Fri, Nov 12, 2021 at 8:44 PM Sasha Litvak <alexander.v.litvak@xxxxxxxxx>
>> > wrote:
>> >
>> > > The metadata pool is on the same type of drives as the other pools; every node
>> > > uses SATA SSDs.  They are all mixed read/write DC types, Intel and Seagate.
>> > >
>> > > On Fri, Nov 12, 2021 at 8:02 PM Anthony D'Atri <anthony.datri@xxxxxxxxx>
>> > > wrote:
>> > >
>> > >> MDS RAM cache vs going to the metadata pool?  What type of drives is your
>> > >> metadata pool on?
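>> > >>
>> > >> A quick way to check might be something like this (the mds name is a
>> > >> placeholder):
>> > >>
>> > >>     ceph daemon mds.<name> cache status
>> > >>     ceph config get mds mds_cache_memory_limit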
>> > >>
>> > >> > On Nov 12, 2021, at 5:30 PM, Sasha Litvak <alexander.v.litvak@xxxxxxxxx>
>> > >> wrote:
>> > >> >
>> > >> > I am running a Pacific 16.2.4 cluster and recently noticed that rm -rf
>> > >> > <dir-name> visibly hangs on old directories.  The cluster is healthy, has a
>> > >> > light load, and any newly created directory is deleted immediately (well, rm
>> > >> > returns the command prompt immediately).  The directories in question have
>> > >> > 10-20 small text files, so nothing should be slow when removing them.
>> > >> >
>> > >> > I wonder if someone can please give me a hint on where to start
>> > >> > troubleshooting as I see no "big bad bear" yet.
>> >
>>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



