> On 8 August 2016 at 12:49, John Spray <jspray@xxxxxxxxxx> wrote:
>
>
> On Mon, Aug 8, 2016 at 9:26 AM, Dmitriy Lysenko <tavx@xxxxxxxxxx> wrote:
> > Good day.
> >
> > My CephFS switched to read-only mode.
> > This problem previously occurred on Hammer; I recreated CephFS and upgraded to Jewel, which seemed to fix it, but it reappeared after some time.
> >
> > ceph.log:
> > 2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster [INF] HEALTH_WARN; mds0: MDS in read-only mode
> >
> > ceph-mds.log:
> > 2016-08-07 18:10:58.699731 7f9fa2ba6700 1 mds.0.cache.dir(10000000afe) commit error -22 v 1
> > 2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] : failed to commit dir 10000000afe object, errno -22
> > 2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error (22) Invalid argument, force readonly...
> > 2016-08-07 18:10:58.699773 7f9fa2ba6700 1 mds.0.cache force file system read-only
> > 2016-08-07 18:10:58.699777 7f9fa2ba6700 0 log_channel(cluster) log [WRN] : force file system read-only
>
> The MDS is going read-only because it received an error (22, aka
> EINVAL) from an OSD when trying to write a metadata object. You need
> to investigate why the error occurred. Are your OSDs running the same
> Ceph version as your MDS? Look in the OSD logs for the time at which
> the error happened to see if there is more detail about why.
>

You might want to add this to the MDS config:

debug_rados = 20

That should show you which RADOS operations the MDS is performing, and you can also figure out which one failed.

Like John said, it might be an issue with a specific OSD.

Wido

> The read-only flag will clear if you restart your MDS (but it will get
> set again if it keeps encountering errors writing to OSDs).
>
> John
>
> > I found this object:
> > $ rados --pool metadata ls | grep 10000000afe
> > 10000000afe.00000000
> >
> > and successfully got it:
> > $ rados --pool metadata get 10000000afe.00000000 obj
> > $ echo $?
> > 0
> >
> > How do I switch the MDS out of read-only mode?
> > Are there any tools to check the CephFS file system for errors?
> >
> > $ ceph -v
> > ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> >
> > $ ceph fs ls
> > name: cephfs, metadata pool: metadata, data pools: [data ]
> >
> > $ ceph mds stat
> > e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby
> >
> > $ ceph osd lspools
> > 0 data,1 metadata,6 one,
> >
> > $ ceph osd dump | grep 'replicated size'
> > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 45649 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> > pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> >
> >
> > Thank you for your help.
> >
> > --
> > Dmitry Lysenko
> > ISP Sovtest, Kursk, Russia
> > jabber: tavx@xxxxxxxxxxxxxxxxx
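
For reference, John's version check and Wido's debug suggestion can be applied at runtime roughly as follows. This is a minimal sketch, assuming the active MDS is drop-03 (per the ceph mds stat output above) and default log locations on the OSD hosts; the exact log path is an assumption, not something confirmed in the thread:

$ # Confirm every OSD reports the same release as the 10.2.2 MDS
$ ceph tell osd.* version

$ # Raise RADOS client logging on the running MDS so the failing operation appears in ceph-mds.log
$ ceph tell mds.drop-03 injectargs '--debug_rados 20'

$ # On the OSD hosts, look for errors around the time of the failed dir commit
$ grep '2016-08-07 18:10' /var/log/ceph/ceph-osd.*.log

To keep the extra logging across restarts, debug_rados = 20 can instead be placed in the [mds] section of ceph.conf on the MDS host, as Wido suggests.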
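
Once the underlying write error is understood (or to retest after fixing it), the read-only flag can be cleared by restarting the MDS, as John notes. A sketch, assuming a systemd-managed Jewel deployment; the ceph-mds@<id> unit name is the usual convention and is an assumption here:

$ # One of the standby MDS daemons may take over; the read-only flag clears, but returns if the write error recurs
$ systemctl restart ceph-mds@drop-03

$ # Confirm the file system comes back up:active
$ ceph mds stat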