"failed to open ino"

Hi all,

With our Ceph Luminous CephFS, we're plagued with "failed to open ino" messages. These don't seem to affect daily business (in terms of "file access"). (There's a backup performance issue that may eventually be related, but I'll report on that in a different thread.)

Our Ceph is currently at v12.2.1 (git.1507910930.aea79b8b7a) on openSUSE Leap 42.3: three Ceph nodes, 12 HDD OSDs, two SSD OSDs, status "HEALTH_OK".

We have a single CephFS and two MDS (active/standby); the metadata pool is on the SSD OSDs, the content pool on the HDD OSDs (all FileStore). That CephFS is mounted by several clients (via kernel CephFS support, mostly kernel version 4.4.76) and via NFS (kernel nfsd on a kernel-mounted CephFS).
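For completeness, the setup is essentially the usual kernel mount plus a knfsd export, roughly like this (monitor names, mount point and export options below are generic placeholders, not our exact configuration):

--- cut here ---
# kernel CephFS mount on the NFS gateway (placeholder names/paths)
mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# /etc/exports entry re-exporting that mount via knfsd (fsid is needed for non-block filesystems)
/mnt/cephfs   *(rw,no_subtree_check,fsid=101)
--- cut here ---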

In the log of the active MDS, we currently see the following two inodes reported over and over again, about every 30 seconds:

--- cut here ---
2017-11-24 18:24:16.496397 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 18:24:16.497037 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
2017-11-24 18:24:16.500645 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 18:24:16.501218 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
2017-11-24 18:24:46.506210 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 18:24:46.506926 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
2017-11-24 18:24:46.510354 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 18:24:46.510891 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
--- cut here ---

There were other inodes reported with other errors, too ("err -5/0", for instance; I take -22 to be -EINVAL and -5 to be -EIO), but the root cause seems to be the same (see below).

Inode 0x10001e4d6a1 was first mentioned in the MDS log as follows, and then every 30 seconds as "failed to open"; 0x10001e45e1d just appeared at the same time, out of the blue:

--- cut here ---
2017-11-23 19:12:28.440107 7f3586cbd700 0 mds.0.bal replicating dir [dir 0x10001e4d6a1 /some/path/in/our/cephfs/ [2,head] auth pv=3473 v=3471 cv=2963/2963 ap=1+6+6 state=1610612738|complete f(v0 m2017-11-23 19:12:25.005299 15=4+11) n(v74 rc2017-11-23 19:12:28.429258 b137317605 5935=4875+1060)/n(v74 rc2017-11-23 19:12:28.337259 b139723963 5969=4898+1071) hs=15+23,ss=0+0 dirty=30 | child=1 dirty=1 authpin=1 0x5649e811e9c0] pop 10223.1 .. rdp 373 adj 0
2017-11-23 19:15:55.015347 7f3580cb1700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-23 19:15:55.016056 7f3580cb1700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
--- cut here ---
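I haven't dug into the on-disk objects for those inodes yet; my understanding is that the inode's backtrace could be cross-checked roughly like this (pool names are placeholders for the filesystem's actual pools, and for a directory the object should live in the metadata pool):

--- cut here ---
# does the first object for that inode still exist, and what does its backtrace say?
rados -p cephfs_metadata stat 10001e4d6a1.00000000
rados -p cephfs_metadata getxattr 10001e4d6a1.00000000 parent > /tmp/backtrace.bin
ceph-dencoder type inode_backtrace_t import /tmp/backtrace.bin decode dump_json
--- cut here ---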

According to a message from this ML, "replicating dir" is supposed to indicate that the directory is hot and being replicated to another active MDS to spread the load. But there is no other active MDS, as we have only two in total (and Ceph properly reports "mds: cephfs-1/1/1 up {0=server1=up:active}, 1 up:standby-replay").
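(For reference, something like the following should confirm that only a single active rank is configured; "cephfs" here stands for our filesystem's name:)

--- cut here ---
ceph fs status
ceph fs get cephfs | grep max_mds
--- cut here ---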

From what I can tell from the logs, all error messages relate to that same path "/some/path/in/our/cephfs/", which gets deleted and recreated every evening - thus changing ino numbers. But what could make that same path, buried far down in the CephFS hierarchy, cause these "failed to open ino" messages? It's just one of many directories, and not a particularly populated one (about 40 files and directories).
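(This is how I compare the recreated directory with the inode numbers in the log; as far as I know the kernel client exposes the CephFS inode number as st_ino, so a plain stat should do:)

--- cut here ---
# print the directory's current inode number in hex, for comparison with the MDS log
printf '0x%x\n' "$(stat -c %i /some/path/in/our/cephfs/)"
--- cut here ---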

Another possibly interesting side fact: the messages for a previous ino sometimes stop when our evening/night job ends (no more active accesses to that directory), and sometimes they continue throughout the day. I've even noticed them crossing the directory recreation time (messages for the old inode don't stop when the old directory is deleted, except after a new "replicating dir" message appears). Today I deleted the directory (including its parent) during the day, but the messages wouldn't stop. The messages also survive switching to the other MDS.

By now the next daily instance of our job has started, recreating the directories (that I had deleted during the day). The "failed to open ino" messages for the old ino numbers still appear every 30 seconds, and I've not yet seen any "replicating dir" message for that area of the CephFS tree. I have seen a few for other areas, but no other ino numbers were reported as "failed to open" - only the two from above:

--- cut here ---
2017-11-24 19:22:13.000050 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 19:22:13.000770 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
2017-11-24 19:22:13.003918 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 19:22:13.004469 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
2017-11-24 19:22:18.635372 7fa30ecfc700 0 mds.0.bal replicating dir [dir 0x1000113bcf8 /some/path/somewhere/else [2,head] auth v=101522 cv=101423/101423 ap=0+1+3 state=1610612738|complete f(v0 m2017-11-12 05:28:27.337616 95=0+95) n(v5624 rc2017-11-24 19:22:18.483203 b242650331 6165=4415+1750)/n(v5624 rc2017-11-24 19:22:18.147208 b243468738 6165=4415+1750) hs=95+0,ss=0+0 dirty=7 | child=1 dirty=1 waiter=0 authpin=0 0x55754c144000] pop 8182.78 .. rdp 102 adj 0
2017-11-24 19:22:43.008134 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 19:22:43.008768 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
2017-11-24 19:22:43.011780 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e45e1d err -22/0
2017-11-24 19:22:43.012355 7fa308cf0700 0 mds.0.cache failed to open ino 0x10001e4d6a1 err -22/-22
--- cut here ---
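In case it helps with further debugging, I assume the MDS cache could be inspected for those two inodes roughly like this ("server1" stands for the active MDS, and the commands would have to run on its host via the admin socket):

--- cut here ---
# dump the MDS cache to a file and look for the stale inodes
ceph daemon mds.server1 dump cache /tmp/mdscache.txt
grep -e 10001e45e1d -e 10001e4d6a1 /tmp/mdscache.txt

# number of stray (deleted but still referenced) inodes the MDS is holding
ceph daemon mds.server1 perf dump mds_cache | grep num_strays
--- cut here ---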

The affected directory looked fine via "ls -l", and I'd be happy if you'd share any ideas on how to handle this situation.

Regards,
Jens
