Re: Recursive delete hangs on cephfs

On Sat, Nov 13, 2021 at 5:25 PM Sasha Litvak
<alexander.v.litvak@xxxxxxxxx> wrote:
>
> I continued looking into the issue and still have no idea what is
> hindering the performance.  However:
>
> 1. A client running kernel 5.3.0-42 (Ubuntu 18.04) has no such
> problem.  I can delete a directory of hashed subdirs (00 - ff), with
> ~707 MB of files spread across those 256 subdirs, in 3.25 s.

Recursive rm first requires the client to get capabilities on the
files in question, and the MDS to read that data off disk.
Newly-created directories will be cached, but old ones might not be.
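
As a quick sanity check, you can see how many caps each client is
currently holding; the MDS name below is a placeholder:

    # <name> is your MDS daemon name; num_caps is reported per session
    ceph daemon mds.<name> session ls | grep num_caps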

So this might just be the consequence of having to do 256 serialized
disk lookups on hard drives. 3.25 seconds seems plausible to me.
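
If you want to test the cold-cache theory directly, something like
this sketch should show the difference; the paths and MDS rank are
placeholders, and I believe "cache drop" is available via "ceph tell"
from Nautilus onward:

    # warm case: stat everything first so the MDS has it all cached
    # (paths below are just examples)
    ls -lR /mnt/cephfs/dir-a > /dev/null
    time rm -rf /mnt/cephfs/dir-a

    # cold case: drop the MDS cache, then time an identical delete
    ceph tell mds.0 cache drop
    time rm -rf /mnt/cephfs/dir-b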

The number of bytes isn't going to have any impact on how long it
takes to delete from the client side — that deletion is just marking
it in the MDS, and then the MDS does the object removals in the
background.
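
You can watch that background purge happening in the MDS perf
counters; the MDS name is a placeholder, and the exact counter names
can differ by release, so check your "perf dump" output:

    # strays waiting for purge, and purge-queue activity
    ceph daemon mds.<name> perf dump | grep -E 'num_strays|pq_execut'
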
-Greg

>
> 2. A client running kernel 5.8.0-53 (Ubuntu 20.04) processes a
> similar directory, with less space used (~530 MB) spread across 256
> subdirs, in 11.2 s.
>
> 3. Yet another client, with kernel 5.4.156, shows removal latency
> similar to item 2.
>
> In all scenarios, the mounts use the same options, i.e.
> noatime,secret-file,acl.
>
> Client 1 has Luminous, client 2 has Octopus, and client 3 has
> Nautilus.  While they are all on the same LAN, ceph -s on clients 2
> and 3 returns in ~800 ms, and on client 1 in ~300 ms.
>
> Any ideas are appreciated,
>
> On Fri, Nov 12, 2021 at 8:44 PM Sasha Litvak <alexander.v.litvak@xxxxxxxxx>
> wrote:
>
> > The metadata pool is on the same type of drives as the other pools;
> > every node uses SATA SSDs.  They are all read/write-mix DC models,
> > from Intel and Seagate.
> >
> > On Fri, Nov 12, 2021 at 8:02 PM Anthony D'Atri <anthony.datri@xxxxxxxxx>
> > wrote:
> >
> >> MDS RAM cache vs going to the metadata pool?  What type of drives is your
> >> metadata pool on?
> >>
> >> > On Nov 12, 2021, at 5:30 PM, Sasha Litvak <alexander.v.litvak@xxxxxxxxx>
> >> wrote:
> >> >
> >> > I am running a Pacific 16.2.4 cluster and recently noticed that rm -rf
> >> > <dir-name> visibly hangs on old directories.  The cluster is healthy,
> >> > has a light load, and any newly created directory is deleted
> >> > immediately (well, rm returns the command prompt immediately).  The
> >> > directories in question have 10 - 20 small text files, so nothing
> >> > should be slow when removing them.
> >> >
> >> > I wonder if someone can please give me a hint on where to start
> >> > troubleshooting as I see no "big bad bear" yet.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



