On Sat, Sep 14, 2019 at 8:57 PM Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
>
> On 13/09/2019 16.25, Hector Martin wrote:
> > Is this expected for CephFS? I know data deletions are asynchronous, but
> > not being able to delete metadata/directories without an undue impact on
> > the whole filesystem's performance is somewhat problematic.
>
> I think I'm getting a feeling for who the culprit is here. I just
> noticed that listing directories in a snapshot that were subsequently
> deleted *also* performs horribly, and kills cluster performance too.
>
> We just had a partial outage due to this: a snapshot+rsync was triggered
> while a round of deletions was happening, and as far as I can tell,
> when it caught up to newly deleted files, MDS performance tanked as it
> repeatedly had to open stray dirs under the hood. In fact, the
> inode/dentry metrics (opened/closed) skyrocketed during that period,
> from the usual ~1K ops from multiple parallel rsyncs to ~15K ops.
>
> As I mentioned in a prior message to the list, we have ~570k stray files
> due to snapshots. It makes sense that deleting a directory/file means
> moving it to a stray directory (each already holding ~57k files), and
> that accessing a deleted file via a snapshot means accessing the stray
> directory. Am I right in thinking that these operations are at least
> O(n) in the number of strays, and may in fact iterate over or otherwise
> touch every single file in the stray directories? (This would explain
> the sudden 15K ops spike in inode/dentry activity.) It seems that with
> such bloated stray dirs, anything that touches them behind the scenes
> makes the MDS completely hiccup and grind away, affecting performance
> for all other clients.
>
> I guess at this point we'll have to drastically cut down the time span
> for which we keep CephFS snapshots. Maybe I'll move the snapshot
> history keeping to the backup target; at least then it won't affect
> production data.
> But since we plan on using the other cluster for production too
> eventually, that would mean we would need to use multi-FS in order to
> isolate the workloads...

When a snapshotted directory is deleted, the MDS moves the directory
into a stray directory. You have ~57k strays in each stray directory;
each time the MDS has a cache miss for a stray, it needs to load a
stray dirfrag. This is very inefficient, because a stray dirfrag
contains lots of items, and most of them are useless for that lookup.

> --
> Hector Martin (hector@xxxxxxxxxxxxxx)
> Public Key: https://mrcn.st/pub
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
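[Editorial note, not part of the original thread] The cost pattern described above can be sketched with a toy model. This is not Ceph code; the class, names, and numbers (10 stray dirs, ~57k dentries each, as mentioned in the thread) are illustrative assumptions. The point it demonstrates: a single dentry lookup that misses the cache forces a load of the entire dirfrag, so the work done is proportional to the fragment size, not to the one entry requested.

```python
# Toy model (NOT Ceph source): why a cache miss on one stray dentry is
# expensive when stray dirfrags are huge. On a miss, the whole dirfrag
# containing the dentry must be fetched, not just the one entry.

# Illustrative numbers from the thread: ~570k strays across 10 stray
# directories => ~57k dentries per stray dirfrag.
DENTRIES_PER_FRAG = 57_000

class ToyMDSCache:
    def __init__(self):
        self.cache = {}           # dentry name -> fake inode
        self.dentries_loaded = 0  # proxy for metadata-pool I/O work

    def load_dirfrag(self, frag_id):
        """Simulate reading an entire stray dirfrag from the metadata pool."""
        for i in range(DENTRIES_PER_FRAG):
            name = f"stray{frag_id}/{i:x}"
            self.cache[name] = object()  # fake inode
            self.dentries_loaded += 1

    def lookup(self, frag_id, name):
        """One dentry lookup; a miss pulls in the whole fragment."""
        if name not in self.cache:
            self.load_dirfrag(frag_id)   # O(n) in dentries per frag
        return self.cache[name]

cache = ToyMDSCache()
cache.lookup(0, "stray0/2a")  # first miss: loads all 57k dentries
print(cache.dentries_loaded)  # 57000
cache.lookup(0, "stray0/3b")  # now a hit: no additional load
print(cache.dentries_loaded)  # still 57000
```

In this toy, one cold lookup costs 57,000 units of work for a single deleted file's dentry, which matches the shape of the observed jump from ~1K to ~15K inode/dentry ops when rsync hit newly deleted files in a snapshot.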