> On 16 November 2017 at 14:46, Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
>
> 2017-11-16 14:43 GMT+01:00 Wido den Hollander <wido@xxxxxxxx>:
>
>> On 16 November 2017 at 14:40, Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx> wrote:
>>
>>> @Sean Redmond: No, I don't have any unfound objects. I only have
>>> "stuck unclean" with "active+degraded" status.
>>> @Caspar Smit: The cluster is scrubbing ...
>>>
>>> @All: My concern is that only one copy is left of the data that was on
>>> the failed disk.
>>
>> Let the Ceph recovery do its work. Don't do anything manually now.
>
> @Wido, I think his cluster might have stopped recovering because of
> non-optimal tunables in firefly.

Ah, darn. Yes, that's been a long time ago. Could very well be the case.

He could try to remove osd.0 from the CRUSH map and let recovery progress.

I would, however, advise him not to fiddle with the data on osd.0. Do not
try to copy the data somewhere else and try to fix the OSD.

Wido
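For reference, removing a dead OSD from the CRUSH map the way Wido suggests
normally comes down to the commands below. This is only a minimal sketch of
the usual procedure for a failed OSD (here osd.0), to be run only once you
are confident the surviving copies are healthy:

    # osd.0 is already marked out; repeating this is harmless
    ceph osd out 0
    # remove it from the CRUSH map so CRUSH stops mapping PGs to it
    ceph osd crush remove osd.0
    # delete its authentication key
    ceph auth del osd.0
    # finally remove the OSD id from the cluster
    ceph osd rm 0

Note that removing the OSD from the CRUSH map lowers the host's CRUSH
weight, so expect some additional data movement on top of the recovery that
is already running.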
>>> If I just remove the OSD.0 from the crush map, does that copy all its
>>> data from the only available copy to the rest of the unaffected disks,
>>> which will consequently end in having again two copies on two different
>>> hosts?
>>
>> Do NOT copy the data from osd.0 to another OSD. Let the Ceph recovery
>> handle this.
>>
>> It is already marked as out and within 24 hours or so recovery will have
>> finished.
>>
>> But a few things:
>>
>> - Firefly 0.80.9 is old
>> - Never, never, never run with size=2
>>
>> Not trying to scare you, but it's a reality.
>>
>> Now let Ceph handle the rebalance and wait.
>>
>> Wido
>>
>>> Best,
>>>
>>> G.
>>>
>>>> 2017-11-16 14:05 GMT+01:00 Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx>:
>>>>
>>>>> Dear cephers,
>>>>>
>>>>> I have an emergency on a rather small ceph cluster.
>>>>>
>>>>> My cluster consists of 2 OSD nodes with 10 disks x 4TB each and 3
>>>>> monitor nodes.
>>>>>
>>>>> The version of ceph running is Firefly v0.80.9
>>>>> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
>>>>>
>>>>> The cluster was originally built with "Replicated size=2" and
>>>>> "Min size=1" with the attached crush map, which in my understanding
>>>>> replicates data across hosts.
>>>>>
>>>>> The emergency comes from the violation of the golden rule: "Never use
>>>>> 2 replicas on a production cluster"
>>>>>
>>>>> Unfortunately the customers never really understood the risk well,
>>>>> and now that one disk is down I am in the middle of it and must do
>>>>> everything in my power not to lose any data, thus I am requesting
>>>>> your assistance.
>>>>>
>>>>> Here is the output of
>>>>>
>>>>> $ ceph osd tree
>>>>> # id  weight  type name          up/down  reweight
>>>>> -1    72.6    root default
>>>>> -2    36.3      host store1
>>>>> 0     3.63        osd.0          down     0        ---> DISK DOWN
>>>>> 1     3.63        osd.1          up       1
>>>>> 2     3.63        osd.2          up       1
>>>>> 3     3.63        osd.3          up       1
>>>>> 4     3.63        osd.4          up       1
>>>>> 5     3.63        osd.5          up       1
>>>>> 6     3.63        osd.6          up       1
>>>>> 7     3.63        osd.7          up       1
>>>>> 8     3.63        osd.8          up       1
>>>>> 9     3.63        osd.9          up       1
>>>>> -3    36.3      host store2
>>>>> 10    3.63        osd.10         up       1
>>>>> 11    3.63        osd.11         up       1
>>>>> 12    3.63        osd.12         up       1
>>>>> 13    3.63        osd.13         up       1
>>>>> 14    3.63        osd.14         up       1
>>>>> 15    3.63        osd.15         up       1
>>>>> 16    3.63        osd.16         up       1
>>>>> 17    3.63        osd.17         up       1
>>>>> 18    3.63        osd.18         up       1
>>>>> 19    3.63        osd.19         up       1
>>>>>
>>>>> and here is the status of the cluster
>>>>>
>>>>> # ceph health
>>>>> HEALTH_WARN 497 pgs degraded; 549 pgs stuck unclean; recovery
>>>>> 51916/2552684 objects degraded (2.034%)
>>>>>
>>>>> Although OSD.0 is shown as mounted, it cannot be started (probably a
>>>>> failed disk controller problem)
>>>>>
>>>>> # df -h
>>>>> Filesystem  Size  Used  Avail  Use%  Mounted on
>>>>> /dev/sda3   251G  4.1G  235G     2%  /
>>>>> tmpfs        24G     0   24G     0%  /dev/shm
>>>>> /dev/sda1   239M  100M  127M    44%  /boot
>>>>> /dev/sdj1   3.7T  223G  3.5T     6%  /var/lib/ceph/osd/ceph-8
>>>>> /dev/sdh1   3.7T  205G  3.5T     6%  /var/lib/ceph/osd/ceph-6
>>>>> /dev/sdg1   3.7T  199G  3.5T     6%  /var/lib/ceph/osd/ceph-5
>>>>> /dev/sde1   3.7T  180G  3.5T     5%  /var/lib/ceph/osd/ceph-3
>>>>> /dev/sdi1   3.7T  187G  3.5T     6%  /var/lib/ceph/osd/ceph-7
>>>>> /dev/sdf1   3.7T  193G  3.5T     6%  /var/lib/ceph/osd/ceph-4
>>>>> /dev/sdd1   3.7T  212G  3.5T     6%  /var/lib/ceph/osd/ceph-2
>>>>> /dev/sdk1   3.7T  210G  3.5T     6%  /var/lib/ceph/osd/ceph-9
>>>>> /dev/sdb1   3.7T  164G  3.5T     5%  /var/lib/ceph/osd/ceph-0  ---> This is the problematic OSD
>>>>> /dev/sdc1   3.7T  183G  3.5T     5%  /var/lib/ceph/osd/ceph-1
>>>>>
>>>>> # service ceph start osd.0
>>>>> find: `/var/lib/ceph/osd/ceph-0': Input/output error
>>>>> /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines
>>>>> mon.store1 osd.6 osd.9 osd.1 osd.4 osd.3 osd.2 osd.8 osd.5 osd.7
>>>>> mds.store1 mon.store3, /var/lib/ceph defines mon.store1 osd.6 osd.9
>>>>> osd.1 osd.4 osd.3 osd.2 osd.8 osd.5 osd.7 mds.store1)
>>>>>
>>>>> I have found this:
>>>>>
>>>>> http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/
>>>>>
>>>>> and I am looking for your guidance in order to properly perform all
>>>>> the actions, so that I do not lose any data and keep the remaining
>>>>> second copy intact.
>>>>
>>>> What guidance are you looking for besides the steps to replace a
>>>> failed disk (which you already found)?
>>>> If I look at your situation, there is nothing down in terms of
>>>> availability of pgs, just a failed drive which needs to be replaced.
>>>>
>>>> Is the cluster still recovering? It should reach HEALTH_OK again after
>>>> rebalancing the cluster when an OSD goes down.
>>>>
>>>> If it stopped recovering, it might have to do with the ceph tunables,
>>>> which are not set to optimal by default on firefly, and that prevents
>>>> further rebalancing.
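For reference, the tunables profile Caspar mentions can be inspected without
changing anything; a minimal sketch using read-only commands (the exact
output varies between releases):

    # show the CRUSH tunables currently in effect
    ceph osd crush show-tunables
    # the tunables also appear in the full CRUSH map dump
    ceph osd crush dump

Switching profiles (for example with "ceph osd crush tunables optimal") is a
separate, deliberate step, for the reason Caspar warns about next.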
>>>> WARNING: Don't just set tunables to optimal, because it will trigger a
>>>> massive rebalance!
>>>>
>>>> Perhaps the second golden rule is to never run a Ceph production
>>>> cluster without knowing (and testing) how to replace a failed drive.
>>>> (I'm not trying to be harsh here.)
>>>>
>>>> Kind regards,
>>>> Caspar
>>>>
>>>>> Best regards,
>>>>>
>>>>> G.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
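For completeness, a few read-only commands that help confirm whether
recovery is actually progressing and what the replication settings of each
pool are; a minimal sketch, with "rbd" used purely as an example pool name:

    # overall cluster state and recovery progress
    ceph -s
    ceph health detail
    # list the placement groups that are stuck unclean
    ceph pg dump_stuck unclean
    # replication settings per pool
    ceph osd dump | grep 'replicated size'
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

Raising a pool from size=2 to size=3 (ceph osd pool set <pool> size 3) only
helps once there is a third host, or an adjusted failure domain, for CRUSH
to place the extra replica on; with the two-host layout described above,
three copies across hosts cannot be satisfied.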