Hi Zheng,
Many thanks for your reply.

> This indicates the MDS metadata is corrupted. Did you do any unusual
> operation on the cephfs? (e.g. reset journal, create new fs using
> existing metadata pool)

No, nothing has been explicitly done to the MDS. I had a few inconsistent PGs that belonged to the (3-replica) metadata pool. The symptoms were similar to http://tracker.ceph.com/issues/17177. The PGs were eventually repaired and, as explained in the ticket, no data corruption was expected.
-Mykola
On 5 October 2016 at 05:49, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Mon, Oct 3, 2016 at 5:48 AM, Mykola Dvornik <mykola.dvornik@xxxxxxxxx> wrote:
> Hi John,
>
> Many thanks for your reply. I will try to play with the mds tunables and
> report back to you ASAP.
>
> So far I see that the mds log contains a lot of errors of the following kind:
>
> 2016-10-02 11:58:03.002769 7f8372d54700 0 mds.0.cache.dir(100056ddecd)
> _fetched badness: got (but i already had) [inode 10005729a77 [2,head]
> ~mds0/stray1/10005729a77 auth v67464942 s=196728 nl=0 n(v0 b196728 1=1+0)
> (iversion lock) 0x7f84acae82a0] mode 33204 mtime 2016-08-07 23:06:29.776298
>
> 2016-10-02 11:58:03.002789 7f8372d54700 -1 log_channel(cluster) log [ERR] :
> loaded dup inode 10005729a77 [2,head] v68621 at
> /users/mykola/mms/NCSHNO/final/120nm-uniform-h8200/j002654.out/m_xrange192-320_yrange192-320_016232.dump,
> but inode 10005729a77.head v67464942 already exists at
> ~mds0/stray1/10005729a77

This indicates the MDS metadata is corrupted. Did you do any unusual
operation on the cephfs? (e.g. reset journal, create new fs using
existing metadata pool)
>
> Those folders within mds.0.cache.dir that got badness report a size of 16EB
> on the clients. rm on them fails with 'Directory not empty'.
>
> As for the "Client failing to respond to cache pressure", I have 2 kernel
> clients on 4.4.21, 1 on 4.7.5 and 16 fuse clients always running the most
> recent release version of ceph-fuse. The funny thing is that every single
> client misbehaves from time to time. I am aware of quite some discussion about
> this issue on the ML, but cannot really follow how to debug it.
>
> Regards,
>
> -Mykola
>
> On 2 October 2016 at 22:27, John Spray <jspray@xxxxxxxxxx> wrote:
>>
>> On Sun, Oct 2, 2016 at 11:09 AM, Mykola Dvornik
>> <mykola.dvornik@xxxxxxxxx> wrote:
>> > After upgrading to 10.2.3 we frequently see messages like
>>
>> From which version did you upgrade?
>>
>> > 'rm: cannot remove '...': No space left on device
>> >
>> > The folders we are trying to delete contain approx. 50K files of 193 KB
>> > each.
>>
>> My guess would be that you are hitting the new
>> mds_bal_fragment_size_max check. This limits the number of entries
>> that the MDS will create in a single directory fragment, to avoid
>> overwhelming the OSD with oversized objects. It is 100000 by default.
>> This limit also applies to "stray" directories where unlinked files
>> are put while they wait to be purged, so you could get into this state
>> while doing lots of deletions. There are ten stray directories that
>> get a roughly even share of files, so if you have more than about one
>> million files waiting to be purged, you could see this condition.
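>>
>> If you want to check or temporarily raise that limit, something like
>> the following should work (mds.<id> is a placeholder and the value is
>> purely illustrative; raising it only buys headroom rather than fixing
>> the backlog):
>>
>>   ceph daemon mds.<id> config get mds_bal_fragment_size_max
>>   ceph daemon mds.<id> config set mds_bal_fragment_size_max 200000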
>>
>> The "Client failing to respond to cache pressure" messages may play a
>> part here -- if you have misbehaving clients then they may cause the
>> MDS to delay purging stray files, leading to a backlog. If your
>> clients are by any chance older kernel clients, you should upgrade
>> them. You can also unmount/remount them to clear this state, although
>> it will reoccur until the clients are updated (or until the bug is
>> fixed, if you're running latest clients already).
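>>
>> For a kernel client, clearing the state just means unmounting and
>> mounting again, along these lines (mountpoint, monitor address and
>> secret file are placeholders):
>>
>>   umount /mnt/cephfs
>>   mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret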
>>
>> The high level counters for strays are part of the default output of
>> "ceph daemonperf mds.<id>" when run on the MDS server (the "stry" and
>> "purg" columns). You can look at these to watch how fast the MDS is
>> clearing out strays. If your backlog is just because it's not doing
>> it fast enough, then you can look at tuning mds_max_purge_files and
>> mds_max_purge_ops to adjust the throttles on purging. Those settings
>> can be adjusted without restarting the MDS using the "injectargs"
>> command
>> (http://docs.ceph.com/docs/master/rados/operations/control/#mds-subsystem)
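>>
>> Concretely, something along these lines (mds.<id> is a placeholder and
>> the numbers are only examples, not recommendations):
>>
>>   ceph daemonperf mds.<id>    # watch the "stry" and "purg" columns
>>   ceph tell mds.<id> injectargs '--mds_max_purge_files 256 --mds_max_purge_ops 16384'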
>>
>> Let us know how you get on.
>>
>> John
>>
>>
>> > The cluster state and storage available are both OK:
>> >
>> > cluster 98d72518-6619-4b5c-b148-9a781ef13bcb
>> > health HEALTH_WARN
>> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
>> > pressure
>> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
>> > pressure
>> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
>> > pressure
>> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
>> > pressure
>> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
>> > pressure
>> > monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0}
>> > election epoch 11, quorum 0 000-s-ragnarok
>> > fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active}
>> > osdmap e20203: 16 osds: 16 up, 16 in
>> > flags sortbitwise
>> > pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects
>> > 23048 GB used, 6745 GB / 29793 GB avail
>> > 1085 active+clean
>> > 2 active+clean+scrubbing
>> > 1 active+clean+scrubbing+deep
>> >
>> >
>> > Has anybody experienced this issue so far?
>> >
>> > Regards,
>> > --
>> > Mykola
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
>
>
> --
> Mykola
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
Mykola
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com