Are the new OSDs running 0.94.5, or did they get the latest 0.94.6 packages? Are you also using cache tiering? We ran into a problem with individual RBD objects getting corrupted when using 0.94.6 with a cache tier and min_read_recency_for_promote > 1. Our only remedy for corruption that had already happened was to restore from backup. Setting min_read_recency_for_promote to 1, or making sure the OSDs were running 0.94.5, was sufficient to prevent it from happening, though we currently do both.
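For reference, the only things we touched were the OSD package version and that one pool setting; roughly the following (the pool name "rbd-cache" below is just a placeholder for your cache pool):

  ceph tell osd.\* version                                   # confirm which version each OSD is actually running
  ceph osd pool get rbd-cache min_read_recency_for_promote   # see the current value
  ceph osd pool set rbd-cache min_read_recency_for_promote 1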
mike
On Fri, Apr 29, 2016 at 9:41 AM, Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
Hi,
Yesterday we ran into a strange bug / mysterious issue with a Hammer
0.94.5 storage cluster.
We added OSDs and the cluster started backfilling. Suddenly one of
the running VMs complained that it had lost a partition on a 2TB RBD.
After resetting the VM it could not boot any more, as the RBD had no
partition info at the start. :(
It looks like the data in the objects has been changed somehow.
How is that possible? Any ideas?
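(If it helps to check directly whether the on-disk objects really changed: one can read the image's first object straight out of RADOS and inspect it; the pool, image name and object prefix below are only placeholders:

  rbd -p rbd info vm-disk                    # note the block_name_prefix, e.g. rbd_data.1234abcd
  rados -p rbd get rbd_data.1234abcd.0000000000000000 /tmp/obj0
  hexdump -C /tmp/obj0 | head                # a healthy image would show the MBR/GPT here
)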
The VM was restored from a backup, but we would still like to know how
this happened, and perhaps recover some data that had not been backed up
before the crash.
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory information per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com