Re: cephfs automatic data pool cleanup

Hi John,

Quoting John Spray <jspray@xxxxxxxxxx>:
> On Wed, Dec 13, 2017 at 2:11 PM, Jens-U. Mozdzen <jmozdzen@xxxxxx> wrote:
>> [...]
>> Then we had one of the nodes crash for a lack of memory (MDS was > 12 GB,
>> plus the new BlueStore OSD and probably the 12.2.1 BlueStore memory leak).
>>
>> We brought the node back online and at first had MDS report an inconsistent
>> file system, though no other errors were reported. Once we restarted the
>> other MDS (by then the active MDS on another node), that problem went away, too,
>> and we were back online. We did not restart clients, neither CephFS mounts
>> nor rbd clients.
>
> I'm curious about the "MDS report an inconsistent file system" part --
> what exactly was the error you were seeing?

My apologies; being off-site, I mixed up the messages. It wasn't about inconsistencies, but about FS_DEGRADED.

When the failed node came back online (and Ceph had recovered all degraded objects after its OSDs were brought back up), "ceph -s" reported "1 filesystem is degraded", and "ceph health detail" showed only that same error. At that time, both MDS daemons were up, and the MDS on the surviving node was the active one.
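(For anyone wanting to reproduce the checks: roughly the following commands, run on a monitor node. "ceph fs status" and "ceph mds stat" are additional views I'd expect to show the same picture, not something from my original notes; the output above is of course specific to our cluster.)

--- cut here ---
# overall cluster and health summary (this is where FS_DEGRADED showed up)
ceph -s
ceph health detail

# MDS / filesystem view: which daemon is active, which rank is degraded
ceph fs status
ceph mds stat
--- cut here ---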

Once I restarted the MDS on the surviving node, FS_DEGRADED was cleared:

--- cut here ---
2017-12-07 19:05:33.113619 mon.node01 mon.0 192.168.160.15:6789/0 243 : cluster [WRN] overall HEALTH_WARN 1 filesystem is degraded; noout flag(s) set; 1 nearfull osd(s)
2017-12-07 19:06:33.113826 mon.node01 mon.0 192.168.160.15:6789/0 298 : cluster [INF] mon.1 192.168.160.16:6789/0
2017-12-07 19:06:33.113923 mon.node01 mon.0 192.168.160.15:6789/0 299 : cluster [INF] mon.2 192.168.160.17:6789/0
2017-12-07 19:11:16.997308 mon.node01 mon.0 192.168.160.15:6789/0 541 : cluster [INF] Standby daemon mds.node01 assigned to filesystem cephfs as rank 0
2017-12-07 19:11:16.997446 mon.node01 mon.0 192.168.160.15:6789/0 542 : cluster [WRN] Health check failed: insufficient standby MDS daemons available (MDS_INSUFFICIENT_STANDBY)
2017-12-07 19:11:20.968933 mon.node01 mon.0 192.168.160.15:6789/0 553 : cluster [INF] Health check cleared: MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons available)
2017-12-07 19:11:33.113816 mon.node01 mon.0 192.168.160.15:6789/0 565 : cluster [INF] mon.1 192.168.160.16:6789/0
2017-12-07 19:11:33.114958 mon.node01 mon.0 192.168.160.15:6789/0 566 : cluster [INF] mon.2 192.168.160.17:6789/0
2017-12-07 19:12:09.889106 mon.node01 mon.0 192.168.160.15:6789/0 598 : cluster [INF] daemon mds.node01 is now active in filesystem cephfs as rank 0
2017-12-07 19:12:09.983442 mon.node01 mon.0 192.168.160.15:6789/0 599 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
--- cut here ---

No other errors or warnings were apparent. The "insufficient standby" warning at 19:11:16 was likely caused by the restart of the MDS on node2.
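In case it matters: the restart itself was a plain systemd restart of the MDS unit. The unit/instance name below is only an example of the usual naming scheme, not a verbatim transcript; adjust it to the actual MDS id on that host.

--- cut here ---
# restart the MDS on the surviving node (instance name is an example)
systemctl restart ceph-mds@node02

# then watch the failover settle and the health check clear
ceph -w
--- cut here ---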

Regards,
Jens

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


