Re: MDS internal op exportdir despite ephemeral pinning

Frank Schilder <frans@xxxxxx> · Thu, 17 Nov 2022 10:21:03 +0000

Hi Patrick,

sorry for the mail flood. The reason I'm asking is that I always see these pairs of warnings:

slow request 34.592600 seconds old, received at 2022-11-17T10:44:39.650761+0100: internal op exportdir:mds.3:15730122 currently failed to wrlock, waiting
slow request 41.092127 seconds old, received at 2022-11-17T10:44:39.651173+0100: rejoin:mds.3:15730122 currently dispatched

The rejoin worries me because it indicates that an active directory fragment has been migrated (a client connection has been moved from one MDS to another). However, active fragments can only be deeper in the directory tree, which in turn should be pinned to a rank and should not move. That's why I would really like to know which directories are being moved around.
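
To check how often the exportdir and rejoin warnings really come in pairs, the log lines could be grouped by the identifier they share. The following is only a minimal sketch: the regular expression is derived solely from the two warnings quoted above, the names `SLOW_REQ`, `parse_slow_requests` and `group_by_request` are hypothetical, and reading `mds.3:15730122` as "rank 3, request number 15730122" is an assumption about the format, not something confirmed in this thread.

```python
import re
from collections import defaultdict

# Pattern derived only from the two sample warnings in this mail; other
# Ceph releases may format slow-request lines differently (assumption).
SLOW_REQ = re.compile(
    r"slow request (?P<age>\d+\.\d+) seconds old, "
    r"received at (?P<received>\S+): "
    r"(?:internal op )?(?P<op>\w+):mds\.(?P<rank>\d+):(?P<seq>\d+) "
    r"currently (?P<state>.+)"
)


def parse_slow_requests(lines):
    """Yield one dict per line that matches the slow-request pattern."""
    for line in lines:
        m = SLOW_REQ.search(line)
        if m is None:
            continue  # not a slow-request warning in the assumed format
        rec = m.groupdict()
        rec["age"] = float(rec["age"])
        rec["rank"] = int(rec["rank"])
        rec["seq"] = int(rec["seq"])
        yield rec


def group_by_request(records):
    """Group records by (rank, seq) so ops sharing an identifier end up together."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["rank"], rec["seq"])].append(rec)
    return dict(groups)


# The two warnings quoted above, verbatim.
SAMPLE = [
    "slow request 34.592600 seconds old, received at "
    "2022-11-17T10:44:39.650761+0100: internal op "
    "exportdir:mds.3:15730122 currently failed to wrlock, waiting",
    "slow request 41.092127 seconds old, received at "
    "2022-11-17T10:44:39.651173+0100: rejoin:mds.3:15730122 "
    "currently dispatched",
]

if __name__ == "__main__":
    for key, recs in group_by_request(parse_slow_requests(SAMPLE)).items():
        print(key, sorted(r["op"] for r in recs))
```

Feeding a whole MDS log through `parse_slow_requests` instead of `SAMPLE` would show whether every exportdir warning has a matching rejoin. It does not reveal which directory is being exported, since the path does not appear in these lines.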

Thanks and best regards!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 17 November 2022 10:45:20
To: Patrick Donnelly
Cc: ceph-users@xxxxxxx
Subject: Re: MDS internal op exportdir despite ephemeral pinning

Hi Patrick,

thanks for your explanation. Is there a way to check which directory is exported? For example, is the inode contained in the messages somewhere? A readdir would usually happen on log-in and the number of slow exports seems much higher than the number of people logging in (I would assume there are a lot more that go without logging).

Also, does an export happen for every client connection? For example, we have a 500+ node HPC cluster with kernel mounts. If a job starts on a dir that needs to be loaded to cache, would such an export happen for every client node (we do dropcaches on client nodes after job completion, so there is potential for reloading data)?

Thanks a lot!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: 16 November 2022 22:50:22
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: MDS internal op exportdir despite ephemeral pinning

Hello Frank,

On Wed, Nov 16, 2022 at 5:38 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> I have a question about ephemeral pinning on octopus latest. We have ephemeral pinning set on all directories that are mounted (well on all their parents), like /home etc. Every mount point of a ceph file system should, therefore, be pinned to a specific and fixed MDS rank. However, in the log I see a lot of slow ops warnings like this one:
>
> slow request 33.765074 seconds old, received at 2022-11-16T11:30:28.340294+0100: internal op exportdir:mds.0:34770855 currently failed to wrlock, waiting
>
> I don't understand why MDSes still export directories between each other. Am I misunderstanding the warning? What is happening here and why are these ops there? Does this point to a config problem?

It may be that some /home/X directory was pruned from the cache,
someone did a readdir on that directory, thereby loading it into cache,
and then the MDS authoritative for /home (probably 0?) exported that
directory to wherever it should go.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx