Re: Dead pool recovery - Nightmare

> On 27 October 2016 at 11:46, Ralf Zerres <hostmaster@xxxxxxxxxxx> wrote:
> 
> 
> Here we go ...
>  
> 
> > On 27 October 2016 at 11:35, Wido den Hollander <wido@xxxxxxxx> wrote:
> >
> >
> >
> > > On 27 October 2016 at 11:23, Ralf Zerres <ralf.zerres@xxxxxxxxxxx> wrote:
> > >
> > >
> > > Hello community,
> > > hello ceph developers,
> > >
> > > My name is Ralf and I work as an IT consultant. In this particular case I
> > > support a German customer running a 2-node Ceph cluster.
> > >
> > > This customer is struggling with a disastrous situation, where a full pool
> > > of rbd data (about 12 TB of valid production data) has been lost.
> > > Details follow below (A: The facts; B: Things already done).
> > >
> > > I urgently need answers to the following questions, and I am aware that any
> > > procedure (if it works out) will take time and money.
> > > We will solve this problem once we can see the right way forward. So if you
> > > could point out any path, I'd love to hear from you.
> > > For the community, I'm willing and keen to document the process for anyone
> > > unlucky enough to face a comparable situation in the future.
> > > That said:
> > >
> > > - Is there any realistic chance to reconstruct the data?
> >
> > That depends on the case, see my questions below.
> >
> > > - A filesystem data-recovery tool (here: XFS) is able to restore lost+found
> > > folders/objects from the involved OSDs.
> > > Is ceph-objectstore-tool a valid tool to export -> import these folders into
> > > a new pool?
> > > - If there is no way to get it back into a cluster as a well-defined
> > > structure, I became aware of the tool rbd_restore.
> > > http://ceph.com/planet/ceph-recover-a-rbd-image-from-a-dead-cluster/#more-6738
> > > Is this a versatile path to reconstruct an rbd image from the recovered
> > > objects (all present as filesystem objects in subdirectories of the recovery disk)?
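
To answer that part right away: as far as I know, ceph-objectstore-tool only works on PGs that still exist inside an OSD's object store; it cannot ingest loose files that a filesystem recovery tool has dropped into lost+found. For reference, a minimal sketch of the export/import workflow it is meant for looks like this (the OSD ids, the PG id and the paths are just placeholders, and the OSDs involved must be stopped first):

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --journal-path /var/lib/ceph/osd/ceph-0/journal \
    --pgid 16.a1 --op export --file /backup/16.a1.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
    --journal-path /var/lib/ceph/osd/ceph-3/journal \
    --op import --file /backup/16.a1.export
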
> > >
> > > Again, any help is appreciated very much
> > >
> > > best regards
> > > Ralf
> > >
> > > PS: I will be in IRC on #ceph (dwsadmin)
> > >
> > >
> > > A) The facts
> > >
> > > The cluster: ceph (v10.2.3), state: healthy
> > > State of the rbd pool in question: gone, all PGs have been deleted on the underlying
> >
> > What do you mean by gone? Did somebody remove the pool from the system? If
> > Ceph says HEALTH_OK, it seems that was the case.
> >
> > # ceph osd dump|grep pool
> > # ceph -s
> >
> > Can you post the output of both commands?
>  
> OK: I did stop the monitor and the relevant OSDs on xxxsrv1 (to get the blocks
> out with the XFS recovery tool).
>  
> # ceph -s
>     cluster 3d9571c0-b86c-4b6c-85b6-dc0a7aa8923b
>      health HEALTH_WARN
>             2376 pgs degraded
>             2376 pgs stuck unclean
>             2376 pgs undersized
>             recovery 1136266/2272532 objects degraded (50.000%)
>             16/29 in osds are down
>             noout,noscrub,nodeep-scrub flag(s) set
>             1 mons down, quorum 1,2 xxxsrv2,xxxsrv3
>      monmap e21: 3 mons at
> {xxxsrv1=ip:6789/0,xxxsrv2=ip:6789/0,xxxsrv3=ip:6789/0}
>             election epoch 1667830, quorum 1,2 xxxsrv2,xxxsrv3
>       fsmap e109117: 0/0/1 up
>      osdmap e107820: 29 osds: 13 up, 29 in; 2376 remapped pgs
>             flags noout,noscrub,nodeep-scrub
>       pgmap v48473784: 2376 pgs, 6 pools, 4421 GB data, 1109 kobjects
>             8855 GB used, 37888 GB / 46827 GB avail
>             1136266/2272532 objects degraded (50.000%)
>                 2376 active+undersized+degraded
>  
> # ceph osd dump
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
> pg_num 192 pgp_num 192 last_change 58521 crash_replay_interval 45
> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash
> rjenkins pg_num 192 pgp_num 192 last_change 58522 min_read_recency_for_promote 1
> min_write_recency_for_promote 1 stripe_width 0

The pool *rbd* is missing here. This has been deleted by somebody or some application, but the fact is that it is no longer there.

The simple fact now is that the data is gone, really gone. I hope you have some good backups, since Ceph no longer has your data. There is NO way to get this back.

For future reference, you can set the 'mon_allow_pool_delete' setting to 'false' in the [mon] section of ceph.conf to prevent pool deletions from happening, and/or set the nodelete flag on a pool:

# ceph osd pool set rbd nodelete true

This is an additional safeguard against removing a pool.
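
In ceph.conf that would look something like this (the mons need a restart, or the option injected at runtime, for it to take effect):

[mon]
    mon_allow_pool_delete = false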

But in your situation the pool rbd is now gone. It was removed by somebody; it was not Ceph itself deleting it by accident.

Sorry to bring you this bad news, but it's just not there anymore.

Wido

> pool 13 'archive' replicated size 2 min_size 1 crush_ruleset 4 object_hash
> rjenkins pg_num 256 pgp_num 256 last_change 92699 min_read_recency_for_promote 1
> min_write_recency_for_promote 1 stripe_width 0
> pool 16 'production' replicated size 2 min_size 1 crush_ruleset 3 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 85051 lfor 85050 flags hashpspool
> min_write_recency_for_promote 1 stripe_width 0
>  
> >
> > > OSDs
> > > Cluster-structure:
> > > - 3 server nodes (64 GB RAM, Opteron CPUs)
> > > - 2 servers acting as monitor and OSD nodes, 1 server acting as monitor only
> > > - 2 OSD nodes (15 OSDs each, spinning disks); journals partly on SSD
> > > partitions, partly on SATA partitions
> > > - used for rbd only
> > > - crushmap: takes care of storing rbd-pool data in the storage buckets (pool
> > > size: 2); storage host1 and host2 hold the replicas
> > >
> >
> > size = 2 is always a bad idea. Please never do this again; always run with
> > size = 3.
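
If a third replica is added later, that is a one-liner per pool (pool names as in the osd dump above), although with chooseleaf over hosts a third OSD host is also needed for the extra copy to be placed, and the change triggers a lot of backfill:

# ceph osd pool set production size 3
# ceph osd pool set production min_size 2
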
> >
> > > The cluster itself is in HEALTH_OK state.
> > >
> > > B) Things already done
> > >
> > > We analysed the situation and tried to make sure not to lose any bits on the
> > > underlying OSD disks.
> > >
> > > - Set cluster flags: ceph osd set noout, noscrub, nodeep-scrub;
> > > the cluster state changed, as expected, to HEALTH_WARN
> > > - Shut down all involved OSDs (as seen from the crushmap): systemctl stop
> > > ceph-osd@<osd-id>
> > > - Got and installed a professional data recovery tool that handles XFS
> > > filesystems (on this node the 3ware controller does not support JBOD, so it
> > > runs in RAID0 mode)
> > > - Dropped in new physical disks (node1: 2x 8 TB SATA) to copy out lost+found
> > > objects from the OSDs
> > > - Made a backup of all other objects in the Ceph cluster
> > >
> > > Of course, since we are talking about roughly 12 TB of data chunks, backup and
> > > recovery take an awfully long time ...
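
For the record, those cluster flags are set one at a time:

# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub

and removed again with 'ceph osd unset <flag>' once the recovery work is done.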
> > >
> > >
> > > C) References found
> > > - Incomplete PGs — OH MY! ->
> > > https://ceph.com/community/incomplete-pgs-oh-my/
> > > https://ceph.com/community/incomplete-pgs-oh-my/#comments
> > > - Recovering incomplete PGs ->
> > > http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
> > > - ceph-users: Recover unfound objects from crashed OSD's underlying
> > > filesystem
> > > ->
> > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007637.html
> > >
> > >
> > > Reference
> > > =========
> > >
> > > # lscpu
> > > Architecture: x86_64
> > > CPU op-mode(s): 32-bit, 64-bit
> > > Byte Order: Little Endian
> > > CPU(s): 16
> > > On-line CPU(s) list: 0-15
> > > Thread(s) per core: 2
> > > Core(s) per socket: 8
> > > Socket(s): 1
> > > NUMA node(s): 2
> > > Vendor ID: AuthenticAMD
> > > CPU family: 21
> > > Model: 1
> > > Model name: AMD Opteron(TM) Processor 6272
> > > Stepping: 2
> > > CPU MHz: 1400.000
> > > CPU max MHz: 2100.0000
> > > CPU min MHz: 1400.0000
> > > BogoMIPS: 4199.99
> > > Virtualization: AMD-V
> > > NUMA node0 CPU(s): 0-7
> > > NUMA node1 CPU(s): 8-15
> > >
> > > # free
> > > total used free shared buff/cache available
> > > Mem: 65956972 751600 315672 1528 64889700 64383492
> > > Swap: 16777212 0 16777212
> > >
> > > # tw-cli show
> > >
> > > Ctl Model (V)Ports Drives Units NotOpt RRate VRate BBU
> > > ------------------------------------------------------------------------
> > > c2 9750-4i 16 16 16 1 1 1 OK
> > >
> > > Enclosure Slots Drives Fans TSUnits PSUnits Alarms
> > > --------------------------------------------------------------
> > > /c2/e0 16 16 5 1 2 1
> > >
> > > # ceph --version
> > > ceph version 10.2.3-247-g0c83eb3 (0c83eb355e989fb6ed38a3b82f9705fd5d700e89)
> > >
> > > # ceph osd tree
> > > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> > > -12 0 host xxxsrv1
> > > -1 xx room server-room
> > > -2 xx rack rack-daywalker
> > > -4 29.16936 storage data
> > > -6 14.29945 host xxxsrv1-data
> > > 9 1.70000 osd.9 down 1.00000 1.00000
> > > 18 1.79990 osd.18 down 1.00000 1.00000
> > > 19 1.79990 osd.19 down 1.00000 1.00000
> > > 22 1.79990 osd.22 down 1.00000 1.00000
> > > 1 1.79990 osd.1 down 1.00000 1.00000
> > > 0 1.79990 osd.0 down 1.00000 1.00000
> > > 12 1.79999 osd.12 down 1.00000 1.00000
> > > 25 1.79999 osd.25 down 1.00000 1.00000
> > > -7 14.86990 host xxxsrv2-data
> > > 3 1.79999 osd.3 up 1.00000 1.00000
> > > 11 1.79999 osd.11 up 1.00000 1.00000
> > > 13 1.79999 osd.13 up 1.00000 1.00000
> > > 4 1.79999 osd.4 up 1.00000 1.00000
> > > 20 1.79999 osd.20 up 1.00000 1.00000
> > > 21 1.79999 osd.21 up 1.00000 1.00000
> > > 23 2.26999 osd.23 up 1.00000 1.00000
> > > 24 1.79999 osd.24 up 1.00000 1.00000
> > > -5 14.49991 storage archive
> > > -8 8.99994 host xxxsrv1-archive
> > > 7 0.89998 osd.7 down 1.00000 1.00000
> > > 8 0.89998 osd.8 down 1.00000 1.00000
> > > 10 3.59999 osd.10 down 1.00000 1.00000
> > > 26 3.59999 osd.26 down 1.00000 1.00000
> > > -9 5.49997 host xxxsrv2-archive
> > > 5 0.89999 osd.5 up 1.00000 1.00000
> > > 2 3.50000 osd.2 up 1.00000 1.00000
> > > 6 0.89998 osd.6 up 1.00000 1.00000
> > > 17 0.20000 osd.17 up 1.00000 1.00000
> > >
> > > # ceph osd crush rule dump vdi-data
> > > {
> > >     "rule_id": 3,
> > >     "rule_name": "vdi-data",
> > >     "ruleset": 3,
> > >     "type": 1,
> > >     "min_size": 1,
> > >     "max_size": 10,
> > >     "steps": [
> > >         {
> > >             "op": "take",
> > >             "item": -4,
> > >             "item_name": "data"
> > >         },
> > >         {
> > >             "op": "chooseleaf_firstn",
> > >             "num": 0,
> > >             "type": "host"
> > >         },
> > >         {
> > >             "op": "emit"
> > >         }
> > >     ]
> > > }
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



