Re: cephfs automatic data pool cleanup

Hi John,

Quoting John Spray <jspray@xxxxxxxxxx>:
> On Wed, Dec 13, 2017 at 2:11 PM, Jens-U. Mozdzen <jmozdzen@xxxxxx> wrote:
>> [...]
>> Then we had one of the nodes crash for a lack of memory (MDS was > 12 GB,
>> plus the new BlueStore OSD and probably the 12.2.1 BlueStore memory leak).
>>
>> We brought the node back online and at first had MDS report an inconsistent
>> file system, though no other errors were reported. Once we restarted the
>> other MDS (by then the active MDS on another node), that problem went away, too,
>> and we were back online. We did not restart clients, neither CephFS mounts
>> nor rbd clients.
>
> I'm curious about the "MDS report an inconsistent file system" part --
> what exactly was the error you were seeing?

My apologies; being off-site, I mixed up the messages. It wasn't about inconsistencies, but about FS_DEGRADED.

When the failed node came back online (and Ceph had recovered all degraded objects after its OSDs were brought back up), "ceph -s" reported "1 filesystem is degraded", and "ceph health detail" showed only that same error. At that time, both MDS daemons were up, and the MDS on the surviving node was the active one.
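(For anyone wanting to reproduce the checks: roughly the following commands, run on a monitor node. "ceph fs status" and "ceph mds stat" are additional views I'd expect to show the same picture, not something from my original notes; the output above is of course specific to our cluster.)

--- cut here ---
# overall cluster and health summary (this is where FS_DEGRADED showed up)
ceph -s
ceph health detail

# MDS / filesystem view: which daemon is active, which rank is degraded
ceph fs status
ceph mds stat
--- cut here ---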

Once I restarted the MDS on the surviving node, FS_DEGRADED was cleared:

--- cut here ---
2017-12-07 19:05:33.113619 mon.node01 mon.0 192.168.160.15:6789/0 243 : cluster [WRN] overall HEALTH_WARN 1 filesystem is degraded; noout flag(s) set; 1 nearfull osd(s)
2017-12-07 19:06:33.113826 mon.node01 mon.0 192.168.160.15:6789/0 298 : cluster [INF] mon.1 192.168.160.16:6789/0
2017-12-07 19:06:33.113923 mon.node01 mon.0 192.168.160.15:6789/0 299 : cluster [INF] mon.2 192.168.160.17:6789/0
2017-12-07 19:11:16.997308 mon.node01 mon.0 192.168.160.15:6789/0 541 : cluster [INF] Standby daemon mds.node01 assigned to filesystem cephfs as rank 0
2017-12-07 19:11:16.997446 mon.node01 mon.0 192.168.160.15:6789/0 542 : cluster [WRN] Health check failed: insufficient standby MDS daemons available (MDS_INSUFFICIENT_STANDBY)
2017-12-07 19:11:20.968933 mon.node01 mon.0 192.168.160.15:6789/0 553 : cluster [INF] Health check cleared: MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons available)
2017-12-07 19:11:33.113816 mon.node01 mon.0 192.168.160.15:6789/0 565 : cluster [INF] mon.1 192.168.160.16:6789/0
2017-12-07 19:11:33.114958 mon.node01 mon.0 192.168.160.15:6789/0 566 : cluster [INF] mon.2 192.168.160.17:6789/0
2017-12-07 19:12:09.889106 mon.node01 mon.0 192.168.160.15:6789/0 598 : cluster [INF] daemon mds.node01 is now active in filesystem cephfs as rank 0
2017-12-07 19:12:09.983442 mon.node01 mon.0 192.168.160.15:6789/0 599 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
--- cut here ---

No other errors or warnings were apparent. The "insufficient standby" warning at 19:11:16 was likely caused by the restart of the MDS on node2.
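In case it matters: the restart itself was a plain systemd restart of the MDS unit. The unit/instance name below is only an example of the usual naming scheme, not a verbatim transcript; adjust it to the actual MDS id on that host.

--- cut here ---
# restart the MDS on the surviving node (instance name is an example)
systemctl restart ceph-mds@node02

# then watch the failover settle and the health check clear
ceph -w
--- cut here ---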

Regards,
Jens

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


