Re: MDS internal op exportdir despite ephemeral pinning

Hi Patrick,

thanks! I did the following, but I don't know how to interpret the result. The three directories on which we have ephemeral pinning set are:

/shares
/hpc/home
/hpc/groups
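
For reference, the distributed ephemeral pin on such a directory is enabled via the documented xattr, e.g. (shown for /hpc/home, the others are analogous):

setfattr -n ceph.dir.pin.distributed -v 1 /hpc/home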

If I understand the documentation correctly, everything under /hpc/home/user should end up on the same MDS rank. Trying it out, I get (user name obscured):

# for mds in $(bin/active_mds); do
  echo -n "${mds}: "
  ceph tell mds.$mds get subtrees | grep '"/hpc/home/user' | wc -l
done 2>/dev/null
ceph-13: 14
ceph-16: 2
ceph-14: 2
ceph-08: 14
ceph-17: 0
ceph-11: 6
ceph-12: 14
ceph-10: 14

It's all over the place. Could you please help me with how to interpret this?
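
To scan the output more easily, I pull the per-subtree data with jq like this (field names as they appear in the returned JSON; I am assuming auth_first is the authoritative rank):

ceph tell mds.ceph-13 get subtrees | \
  jq -r '.[] | select(.dir.path | startswith("/hpc/home")) |
         [.dir.path, .auth_first, .ephemeral_pin] | @tsv'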

In the data returned, I can see fields like

        "export_pin": -1,
        "distributed_ephemeral_pin": false,
        "random_ephemeral_pin": false,
        "ephemeral_pin": 6,

This is on an immediate child of /hpc/home. The field distributed_ephemeral_pin says false. If I look at the field ephemeral_pin, it's all over the place for sub-trees on a single MDS as well. Picking the entry for /hpc/home itself, I get:

        "export_pin": -1,
        "distributed_ephemeral_pin": false,
        "random_ephemeral_pin": false,
        "ephemeral_pin": 7,

Again, distributed_ephemeral_pin is false. I'm quite lost here. Is this expected? How do I check that ephemeral pinning does what it should do? Just monitoring the output of commands is not enough; I also need to know what the correct output should look like. I would be grateful if you could provide this additional information.
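
For completeness, this is how I double-check that the pin xattr is actually set on one of the directories:

getfattr -n ceph.dir.pin.distributed /hpc/home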

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: 18 November 2022 16:11:44
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: MDS internal op exportdir despite ephemeral pinning

On Thu, Nov 17, 2022 at 4:45 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Patrick,
>
> thanks for your explanation. Is there a way to check which directory is exported? For example, is the inode contained in the messages somewhere? A readdir would usually happen on log-in, and the number of slow exports seems much higher than the number of people logging in (I would assume there are a lot more accesses that happen without a log-in).

You can set debugging to 4 on the MDS and you should see messages for
each export. Or you can monitor subtrees on your MDS daemons by
periodically running the `get subtrees` command on each one.
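
For example, something like this (cluster-wide here; you can also target a single daemon with `ceph tell mds.<name> config set debug_mds 4`):

# raise the MDS debug level so exports show up in the log
ceph config set mds debug_mds 4
# and/or poll the subtree map periodically
ceph tell mds.<name> get subtrees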

> Also, does an export happen for every client connection? For example, we have a 500+ node HPC cluster with kernel mounts. If a job starts on a dir that needs to be loaded to cache, would such an export happen for every client node (we do dropcaches on client nodes after job completion, so there is potential for reloading data)?

The export only happens once the directory is loaded into cache.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



