Are the new OSDs running 0.94.5, or did they get the latest 0.94.6 packages? Are you also using cache tiering? We ran into a problem with individual RBD objects getting corrupted when using 0.94.6 with a cache tier and min_read_recency_for_promote > 1. Our only remedy for corruption that had already happened was to restore from backup. Setting min_read_recency_for_promote to 1, or making sure the OSDs were running 0.94.5, was sufficient to prevent it from happening, though we currently do both.
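For reference, the only things we touched were the OSD package version and that one pool setting; roughly the following (the pool name "rbd-cache" below is just a placeholder for your cache pool):

  ceph tell osd.\* version                                   # confirm which version each OSD is actually running
  ceph osd pool get rbd-cache min_read_recency_for_promote   # see the current value
  ceph osd pool set rbd-cache min_read_recency_for_promote 1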
mike
On Fri, Apr 29, 2016 at 9:41 AM, Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
Hi,
Yesterday we ran into a strange bug / mysterious issue with a Hammer
0.94.5 storage cluster.
We added OSDs and the cluster started backfilling. Suddenly one of
the running VMs complained that it had lost a partition on a 2TB RBD.
After resetting the VM it could not boot any more, as the RBD had no
partition info at the start. :(
It looks like the data in the objects has been changed somehow.
How is that possible? Any ideas?
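(If it helps to check directly whether the on-disk objects really changed: one can read the image's first object straight out of RADOS and inspect it; the pool, image name and object prefix below are only placeholders:

  rbd -p rbd info vm-disk                    # note the block_name_prefix, e.g. rbd_data.1234abcd
  rados -p rbd get rbd_data.1234abcd.0000000000000000 /tmp/obj0
  hexdump -C /tmp/obj0 | head                # a healthy image would show the MBR/GPT here
)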
The VM was restored from a backup, but we would still like to know how
this happened, and perhaps recover some data that had not been backed up
before the crash.
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory information per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com