> Wido den Hollander <wido@xxxxxxxx> wrote on 27 October 2016 at 12:37:
>
> Bringing this back to the list
>
> > On 27 October 2016 at 12:08, Ralf Zerres <ralf.zerres@xxxxxxxxxxx> wrote:
> >
> > > Wido den Hollander <wido@xxxxxxxx> wrote on 27 October 2016 at 11:51:
> > >
> > > > On 27 October 2016 at 11:46, Ralf Zerres <hostmaster@xxxxxxxxxxx> wrote:
> > > >
> > > > Here we go ...
> > > >
> > > > > Wido den Hollander <wido@xxxxxxxx> wrote on 27 October 2016 at 11:35:
> > > > >
> > > > > > On 27 October 2016 at 11:23, Ralf Zerres <ralf.zerres@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hello community,
> > > > > > hello ceph developers,
> > > > > >
> > > > > > My name is Ralf, working as an IT consultant. In this particular case I
> > > > > > support a German customer running a 2-node Ceph cluster.
> > > > > >
> > > > > > This customer is struggling with a disastrous situation: a full pool of
> > > > > > rbd data (about 12 TB of valid production data) is lost.
> > > > > > Details follow below (A: The facts; B: Things already done).
> > > > > >
> > > > > > I urgently need answers to the following questions, being aware that any
> > > > > > procedure (if it works out) will take time and money.
> > > > > > We will solve this problem once we can see the right way to go. So if you
> > > > > > can point out any path, I'd love to hear from you.
> > > > > > For the community I'm willing and keen to document it for anyone unlucky
> > > > > > enough to face a comparable situation in the future.
> > > > > > That said:
> > > > > >
> > > > > > - Is there any realistic chance to reconstruct the data?
> > > > >
> > > > > That depends on the case, see my questions below.
> > > > >
> > > > > > - A filesystem data-recovery tool (here: XFS) is able to restore lost+found
> > > > > > folders/objects from the involved OSDs.
> > > > > > Is ceph-objectstore-tool a valid tool to export -> import these folders
> > > > > > into a new pool?
> > > > > > - If there is no way to get the data back into a cluster as a well-defined
> > > > > > structure, I became aware of the tool rbd_restore.
> > > > > > http://ceph.com/planet/ceph-recover-a-rbd-image-from-a-dead-cluster/#more-6738
> > > > > > Is this a viable path to reconstruct an rbd image from the recovered
> > > > > > objects (all present as filesystem objects in subdirectories of the
> > > > > > recovery disk)?
> > > > > >
> > > > > > Again, any help is appreciated very much
> > > > > >
> > > > > > best regards
> > > > > > Ralf
> > > > > >
> > > > > > PS: I will be in IRC on #ceph (dwsadmin)
> > > > > >
> > > > > >
> > > > > > A) The facts
> > > > > >
> > > > > > The cluster: ceph (v10.2.3), state: healthy
> > > > > > State of rbd-pool in question: gone, all PGs are deleted on the underlying
> > > > >
> > > > > What do you mean by "gone"? Did somebody remove the pool from the system?
> > > > > If Ceph says HEALTH_OK it seems that was the case.
> > > > >
> > > > > # ceph osd dump|grep pool
> > > > > # ceph -s
> > > > >
> > > > > Can you post the output of both commands?
> > > >
> > > > ok: I did stop the monitor and the relevant OSDs on xxxsrv1 (because we are
> > > > getting the blocks out with XFS recovery)
> > > >
> > > > # ceph -s
> > > >     cluster 3d9571c0-b86c-4b6c-85b6-dc0a7aa8923b
> > > >      health HEALTH_WARN
> > > >             2376 pgs degraded
> > > >             2376 pgs stuck unclean
> > > >             2376 pgs undersized
> > > >             recovery 1136266/2272532 objects degraded (50.000%)
> > > >             16/29 in osds are down
> > > >             noout,noscrub,nodeep-scrub flag(s) set
> > > >             1 mons down, quorum 1,2 xxxsrv2,xxxsrv3
> > > >      monmap e21: 3 mons at {xxxsrv1=ip:6789/0,xxxsrv2=ip:6789/0,xxxsrv3=ip:6789/0}
> > > >             election epoch 1667830, quorum 1,2 dwssrv2,dwssrv3
> > > >       fsmap e109117: 0/0/1 up
> > > >      osdmap e107820: 29 osds: 13 up, 29 in; 2376 remapped pgs
> > > >             flags noout,noscrub,nodeep-scrub
> > > >       pgmap v48473784: 2376 pgs, 6 pools, 4421 GB data, 1109 kobjects
> > > >             8855 GB used, 37888 GB / 46827 GB avail
> > > >             1136266/2272532 objects degraded (50.000%)
> > > >                 2376 active+undersized+degraded
> > > >
> > > > # ceph osd dump
> > > > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 192 pgp_num 192 last_change 58521 crash_replay_interval 45 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> > > > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 192 pgp_num 192 last_change 58522 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> > >
> > > The pool *rbd* is missing here. This has been deleted by somebody or some
> > > application, but the fact is that it is no longer there.
> > >
> > > The simple fact now is that the data is gone, really gone. I hope you have
> > > some good backups, since Ceph no longer has your data. There is NO way to
> > > get this back.
> > >
> > > For future reference, you can set the 'mon_allow_pool_delete' setting to
> > > 'false' in the [mon] section in ceph.conf to prevent pool deletion from
> > > happening and/or set the nodelete flag on a pool:
> > >
> > > # ceph osd pool set rbd nodelete true
> > >
> > > This is an additional safeguard against removing a pool.
> > >
> > > But in your situation now, the pool rbd is gone. It was removed by somebody
> > > and not by accident by Ceph itself.
> > >
> > > Sorry to bring you this bad news, but it's just not there anymore.
> > >
> > > Wido
> >
> > really gone data. YES
> > And it wasn't a malfunction of Ceph. YES
>
> That wasn't clear from the first e-mail you sent.

sorry for that.
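Side note for the archive: a minimal sketch of the two safeguards Wido describes
above, as I understand them. The ceph.conf fragment belongs on the monitor nodes
(it assumes a [mon] section is present), and the loop is just one way to apply
the nodelete flag to every pool the cluster currently reports:

[mon]
mon_allow_pool_delete = false

# for pool in $(ceph osd pool ls); do ceph osd pool set "$pool" nodelete true; done

The monitors have to pick up the changed option (e.g. via a restart) before it
protects anything; the nodelete flag takes effect immediately.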
> > I don't want to and can't discuss who deleted the pool and how
> > (ceph osd pool delete <poolname> <poolname> --yes-i-really-really-mean-it).
> > The data was deleted structurally; it might still be recoverable on the
> > underlying filesystem of the involved OSDs.
> > The question is: if an XFS restore program (I found and bought
> > 'r-explerer-pro') is able to restore the PG folders (<PG-ID>.<int> ends up
> > under $LostFiles/$Group<int>/$Folder<int>), is there any way to make this
> > data valuable again?
>
> Maybe, if you find those objects you might be partially able to restore a
> block device, but the chances are slim. Even if you are missing just a few
> objects you could have a broken filesystem which will not mount anymore.
>
> Wido

ok. but slim is better than nothing.
The question is which steps are needed now.
I am already trying to restore as much data as possible, since this takes a very
long time: 1) scan, 2) restore to a filesystem on a new disk.
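For what it's worth, below is a minimal sketch of how the rbd_restore approach
referenced above could stitch one image back together from recovered object
files. Everything specific in it is hypothetical: the block-name prefix, the
output path, the assumption of format-2 objects with the default 4 MiB object
size, and the assumption that the OSD filename suffixes have already been
stripped so the files are named rbd_data.<id>.<16-hex-digit number>.

#!/bin/bash
# Hypothetical sketch: reassemble one rbd image from recovered object files.
OBJ_SIZE=$((4 * 1024 * 1024))           # default rbd object size (order 22)
PREFIX="rbd_data.123456789abcdef"       # hypothetical block_name_prefix of the image
OUT="/mnt/recovery/restored-image.raw"  # raw image to build; unwritten ranges stay sparse

for f in "${PREFIX}".*; do
    [ -e "$f" ] || continue             # skip if the glob matched nothing
    idx_hex=${f##*.}                    # 16-digit hex object number from the filename
    idx=$((16#$idx_hex))                # decimal position of the object in the image
    # place the object at its offset without truncating what is already there
    dd if="$f" of="$OUT" bs="$OBJ_SIZE" seek="$idx" conv=notrunc
done

Any object that was not recovered simply stays a hole of zeros in the raw file,
which is exactly why Wido warns that the filesystem inside may not mount even if
only a few objects are missing; the result should first be attached read-only
(losetup -r) and checked before anything relies on it.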
> > > > pool 13 'archive' replicated size 2 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 256 pgp_num 256 last_change 92699 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> > > > pool 16 'production' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 85051 lfor 85050 flags hashpspool min_write_recency_for_promote 1 stripe_width 0
> > > >
> > > > > > OSDs
> > > > > >
> > > > > > Cluster structure:
> > > > > > - 3 server nodes (64 GB RAM, Opteron CPUs)
> > > > > > - 2 servers acting as monitor and OSD node, 1 server acting as monitor only
> > > > > > - 2 OSD nodes (15 OSDs each, spinning disks), journals partly on SSD
> > > > > > partitions, partly on SATA partitions
> > > > > > - used for rbd only
> > > > > > - crushmap: takes care to store rbd-pool data in the storage buckets (pool
> > > > > > size: 2); storage host1 and host2 take the replicas
> > > > >
> > > > > size = 2 is always a bad thing, please, never do this again. Always run
> > > > > with size = 3.
> > > > >
> > > > > > The cluster itself is in HEALTH_OK state.
> > > > > >
> > > > > > B) Things already done
> > > > > >
> > > > > > We analysed the situation and tried to make sure not to lose any bits on
> > > > > > the underlying OSD disks:
> > > > > >
> > > > > > - Cluster activity: ceph osd set noout, noscrub, nodeep-scrub;
> > > > > > the cluster state then changed, as expected, to HEALTH_WARN
> > > > > > - shut down all involved OSDs (as seen from the crushmap):
> > > > > > systemctl stop ceph-osd@<osd-id>
> > > > > > - got and installed a professional data-recovery tool that handles XFS
> > > > > > filesystems (on the node the 3Ware controller does not support JBOD, so
> > > > > > the disks run in RAID0 mode)
> > > > > > - dropped in new physical disks (node1: 2x 8TB SATA) to copy out the
> > > > > > lost+found objects from the OSDs
> > > > > > - made a backup of all other objects of the Ceph cluster
> > > > > >
> > > > > > Of course, since we are talking about roughly 12 TB of data, backup and
> > > > > > recovery take an awfully long time ....
> > > > > >
> > > > > > C) References found
> > > > > > - Incomplete PGs — OH MY!
> > > > > > -> https://ceph.com/community/incomplete-pgs-oh-my/
> > > > > >    https://ceph.com/community/incomplete-pgs-oh-my/#comments
> > > > > > - Recovering incomplete PGs
> > > > > > -> http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
> > > > > > - ceph-users: Recover unfound objects from crashed OSD's underlying filesystem
> > > > > > -> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007637.html
> > > > > >
> > > > > >
> > > > > > Reference
> > > > > > =========
> > > > > >
> > > > > > # lscpu
> > > > > > Architecture:          x86_64
> > > > > > CPU op-mode(s):        32-bit, 64-bit
> > > > > > Byte Order:            Little Endian
> > > > > > CPU(s):                16
> > > > > > On-line CPU(s) list:   0-15
> > > > > > Thread(s) per core:    2
> > > > > > Core(s) per socket:    8
> > > > > > Socket(s):             1
> > > > > > NUMA node(s):          2
> > > > > > Vendor ID:             AuthenticAMD
> > > > > > CPU family:            21
> > > > > > Model:                 1
> > > > > > Model name:            AMD Opteron(TM) Processor 6272
> > > > > > Stepping:              2
> > > > > > CPU MHz:               1400.000
> > > > > > CPU max MHz:           2100.0000
> > > > > > CPU min MHz:           1400.0000
> > > > > > BogoMIPS:              4199.99
> > > > > > Virtualization:        AMD-V
> > > > > > NUMA node0 CPU(s):     0-7
> > > > > > NUMA node1 CPU(s):     8-15
> > > > > >
> > > > > > # free
> > > > > >               total      used     free   shared  buff/cache  available
> > > > > > Mem:       65956972    751600   315672     1528    64889700   64383492
> > > > > > Swap:      16777212         0 16777212
> > > > > >
> > > > > > # tw-cli show
> > > > > >
> > > > > > Ctl  Model    (V)Ports  Drives  Units  NotOpt  RRate  VRate  BBU
> > > > > > ------------------------------------------------------------------------
> > > > > > c2   9750-4i  16        16      16     1       1      1      OK
> > > > > >
> > > > > > Enclosure  Slots  Drives  Fans  TSUnits  PSUnits  Alarms
> > > > > > --------------------------------------------------------------
> > > > > > /c2/e0     16     16      5     1        2        1
> > > > > >
> > > > > > # ceph --version
> > > > > > ceph version 10.2.3-247-g0c83eb3 (0c83eb355e989fb6ed38a3b82f9705fd5d700e89)
> > > > > >
> > > > > > # ceph osd tree
> > > > > > ID  WEIGHT    TYPE NAME               UP/DOWN  REWEIGHT  PRIMARY-AFFINITY
> > > > > > -12         0 host xxxsrv1
> > > > > >  -1        xx room server-room
> > > > > >  -2        xx rack rack-daywalker
> > > > > >  -4  29.16936 storage data
> > > > > >  -6  14.29945     host xxxsrv1-data
> > > > > >   9   1.70000         osd.9              down   1.00000           1.00000
> > > > > >  18   1.79990         osd.18             down   1.00000           1.00000
> > > > > >  19   1.79990         osd.19             down   1.00000           1.00000
> > > > > >  22   1.79990         osd.22             down   1.00000           1.00000
> > > > > >   1   1.79990         osd.1              down   1.00000           1.00000
> > > > > >   0   1.79990         osd.0              down   1.00000           1.00000
> > > > > >  12   1.79999         osd.12             down   1.00000           1.00000
> > > > > >  25   1.79999         osd.25             down   1.00000           1.00000
> > > > > >  -7  14.86990     host xxxsrv2-data
> > > > > >   3   1.79999         osd.3                up   1.00000           1.00000
> > > > > >  11   1.79999         osd.11               up   1.00000           1.00000
> > > > > >  13   1.79999         osd.13               up   1.00000           1.00000
> > > > > >   4   1.79999         osd.4                up   1.00000           1.00000
> > > > > >  20   1.79999         osd.20               up   1.00000           1.00000
> > > > > >  21   1.79999         osd.21               up   1.00000           1.00000
> > > > > >  23   2.26999         osd.23               up   1.00000           1.00000
> > > > > >  24   1.79999         osd.24               up   1.00000           1.00000
> > > > > >  -5  14.49991 storage archive
> > > > > >  -8   8.99994     host xxxsrv1-archive
> > > > > >   7   0.89998         osd.7              down   1.00000           1.00000
> > > > > >   8   0.89998         osd.8              down   1.00000           1.00000
> > > > > >  10   3.59999         osd.10             down   1.00000           1.00000
> > > > > >  26   3.59999         osd.26             down   1.00000           1.00000
> > > > > >  -9   5.49997     host xxxsrv2-archive
> > > > > >   5   0.89999         osd.5                up   1.00000           1.00000
> > > > > >   2   3.50000         osd.2                up   1.00000           1.00000
> > > > > >   6   0.89998         osd.6                up   1.00000           1.00000
> > > > > >  17   0.20000         osd.17               up   1.00000           1.00000
> > > > > >
> > > > > > # ceph osd crush rule dump vdi-data
> > > > > > {
> > > > > >     "rule_id": 3,
> > > > > >     "rule_name": "vdi-data",
> > > > > >     "ruleset": 3,
> > > > > >     "type": 1,
> > > > > >     "min_size": 1,
> > > > > >     "max_size": 10,
> > > > > >     "steps": [
> > > > > >         {
> > > > > >             "op": "take",
> > > > > >             "item": -4,
> > > > > >             "item_name": "data"
> > > > > >         },
> > > > > >         {
> > > > > >             "op": "chooseleaf_firstn",
> > > > > >             "num": 0,
> > > > > >             "type": "host"
> > > > > >         },
> > > > > >         {
> > > > > >             "op": "emit"
> > > > > >         }
> > > > > >     ]
> > > > > > }
> > > > > >
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@xxxxxxxxxxxxxx
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
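Coming back to the ceph-objectstore-tool question from the top of the thread: it
can only export PGs that still exist in an OSD's object store, so once a pool's
PGs have been removed there is nothing left for it to export. For completeness,
a hedged sketch of the export/import round trip it supports on stopped filestore
OSDs; the data/journal paths are the stock defaults, <pgid> is a placeholder, and
osd.9/osd.24 are just two ids from the tree above used as examples:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --journal-path /var/lib/ceph/osd/ceph-9/journal --pgid <pgid> --op export --file /mnt/recovery/<pgid>.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-24 --journal-path /var/lib/ceph/osd/ceph-24/journal --op import --file /mnt/recovery/<pgid>.export

In the situation described here, where the PGs were already deleted, the only raw
material left is whatever the XFS recovery pulls back into lost+found, which is
why the object-level reassembly sketched further up is the more realistic route.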
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com