Re: CephFS: No space left on device

Is there any way to repair pgs/cephfs gracefully?

 

-Mykola

 

From: Yan, Zheng
Sent: Thursday, 6 October 2016 04:48
To: Mykola Dvornik
Cc: John Spray; ceph-users
Subject: Re: CephFS: No space left on device

 

On Wed, Oct 5, 2016 at 2:27 PM, Mykola Dvornik <mykola.dvornik@xxxxxxxxx> wrote:

> Hi Zheng,
>
> Many thanks for your reply.
>
> This indicates the MDS metadata is corrupted. Did you do any unusual
> operation on the cephfs? (e.g reset journal, create new fs using
> existing metadata pool)
>
> No, nothing has been explicitly done to the MDS. I had a few inconsistent
> PGs that belonged to the (3 replica) metadata pool. The symptoms were
> similar to http://tracker.ceph.com/issues/17177 . The PGs were eventually
> repaired and no data corruption was expected as explained in the ticket.

I'm afraid that issue does cause corruption.

 

> BTW, when I posted this issue on the ML the number of ground-state stray
> objects was around 7.5K. Now it went up to 23K. No inconsistent PGs or any
> other problems happened to the cluster within this time scale.
>
> -Mykola
>
> On 5 October 2016 at 05:49, Yan, Zheng <ukernel@xxxxxxxxx> wrote:

>> On Mon, Oct 3, 2016 at 5:48 AM, Mykola Dvornik <mykola.dvornik@xxxxxxxxx>
>> wrote:
>> > Hi John,
>> >
>> > Many thanks for your reply. I will try to play with the mds tunables and
>> > report back to you ASAP.
>> >
>> > So far I see that the mds log contains a lot of errors of the following
>> > kind:

>> > 2016-10-02 11:58:03.002769 7f8372d54700  0 mds.0.cache.dir(100056ddecd)
>> > _fetched  badness: got (but i already had) [inode 10005729a77 [2,head]
>> > ~mds0/stray1/10005729a77 auth v67464942 s=196728 nl=0 n(v0 b196728 1=1+0)
>> > (iversion lock) 0x7f84acae82a0] mode 33204 mtime 2016-08-07 23:06:29.776298
>> >
>> > 2016-10-02 11:58:03.002789 7f8372d54700 -1 log_channel(cluster) log [ERR] :
>> > loaded dup inode 10005729a77 [2,head] v68621 at
>> > /users/mykola/mms/NCSHNO/final/120nm-uniform-h8200/j002654.out/m_xrange192-320_yrange192-320_016232.dump,
>> > but inode 10005729a77.head v67464942 already exists at
>> > ~mds0/stray1/10005729a77

>> This indicates the MDS metadata is corrupted. Did you do any unusual
>> operation on the cephfs? (e.g reset journal, create new fs using
>> existing metadata pool)

>> > Those folders within mds.0.cache.dir that got badness report a size of
>> > 16EB on the clients. rm on them fails with 'Directory not empty'.
>> >
>> > As for the "Client failing to respond to cache pressure", I have 2 kernel
>> > clients on 4.4.21, 1 on 4.7.5, and 16 fuse clients always running the most
>> > recent release version of ceph-fuse. The funny thing is that every single
>> > client misbehaves from time to time. I am aware of quite a discussion about
>> > this issue on the ML, but cannot really follow how to debug it.
>> >
>> > Regards,
>> >
>> > -Mykola
>> >
>> > On 2 October 2016 at 22:27, John Spray <jspray@xxxxxxxxxx> wrote:
>> >>
>> >> On Sun, Oct 2, 2016 at 11:09 AM, Mykola Dvornik
>> >> <mykola.dvornik@xxxxxxxxx> wrote:

>> >> > After upgrading to 10.2.3 we frequently see messages like
>> >>
>> >> From which version did you upgrade?
>> >>
>> >> > 'rm: cannot remove '...': No space left on device'
>> >> >
>> >> > The folders we are trying to delete contain approx. 50K files of 193 KB
>> >> > each.

>> >> My guess would be that you are hitting the new
>> >> mds_bal_fragment_size_max check.  This limits the number of entries
>> >> that the MDS will create in a single directory fragment, to avoid
>> >> overwhelming the OSD with oversized objects.  It is 100000 by default.
>> >> This limit also applies to "stray" directories where unlinked files
>> >> are put while they wait to be purged, so you could get into this state
>> >> while doing lots of deletions.  There are ten stray directories that
>> >> get a roughly even share of files, so if you have more than about one
>> >> million files waiting to be purged, you could see this condition.
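
A minimal sketch of how to check this on the MDS host: the daemon name below is taken from the fsmap in the status output further down, and the exact stray counter names may differ between releases.

    # show the current per-fragment entry limit
    ceph daemon mds.000-s-ragnarok config get mds_bal_fragment_size_max

    # dump the MDS perf counters and pick out the stray-related ones
    ceph daemon mds.000-s-ragnarok perf dump | grep -i stray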

>> >> The "Client failing to respond to cache pressure" messages may play a
>> >> part here -- if you have misbehaving clients then they may cause the
>> >> MDS to delay purging stray files, leading to a backlog.  If your
>> >> clients are by any chance older kernel clients, you should upgrade
>> >> them.  You can also unmount/remount them to clear this state, although
>> >> it will reoccur until the clients are updated (or until the bug is
>> >> fixed, if you're running latest clients already).
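
To see which clients are connected and what they report about themselves, listing the MDS sessions can help (a sketch; kernel clients usually expose their kernel version in the session metadata, though the exact fields vary by release):

    # list client sessions held by the active MDS, including client metadata
    ceph daemon mds.000-s-ragnarok session ls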

>> >> The high level counters for strays are part of the default output of
>> >> "ceph daemonperf mds.<id>" when run on the MDS server (the "stry" and
>> >> "purg" columns).  You can look at these to watch how fast the MDS is
>> >> clearing out strays.  If your backlog is just because it's not doing
>> >> it fast enough, then you can look at tuning mds_max_purge_files and
>> >> mds_max_purge_ops to adjust the throttles on purging.  Those settings
>> >> can be adjusted without restarting the MDS using the "injectargs" command
>> >> (http://docs.ceph.com/docs/master/rados/operations/control/#mds-subsystem)
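
As a rough sketch, that could look like the following; the numbers are placeholders to illustrate the syntax, not recommended values:

    # watch stray ("stry") and purge ("purg") activity on the MDS host
    ceph daemonperf mds.000-s-ragnarok

    # raise the purge throttles on the active MDS without a restart
    # (example values only)
    ceph tell mds.0 injectargs '--mds_max_purge_files 256 --mds_max_purge_ops 16384'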

>> >> Let us know how you get on.
>> >>
>> >> John
>> >>
>> >> > The cluster state and storage available are both OK:

>> >> >     cluster 98d72518-6619-4b5c-b148-9a781ef13bcb
>> >> >      health HEALTH_WARN
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >      monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0}
>> >> >             election epoch 11, quorum 0 000-s-ragnarok
>> >> >       fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active}
>> >> >      osdmap e20203: 16 osds: 16 up, 16 in
>> >> >             flags sortbitwise
>> >> >       pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects
>> >> >             23048 GB used, 6745 GB / 29793 GB avail
>> >> >                 1085 active+clean
>> >> >                    2 active+clean+scrubbing
>> >> >                    1 active+clean+scrubbing+deep

>> >> > Has anybody experienced this issue so far?
>> >> >
>> >> > Regards,
>> >> > --
>> >> >  Mykola

>> > --
>> >  Mykola
>
> --
>  Mykola

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
