Re: Invalid RBD object maps of snapshots on Mimic

On Thu, Jan 10, 2019 at 4:01 AM Oliver Freyermuth
<freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>
> Dear Cephalopodians,
>
> I performed several consistency checks now:
> - Exporting an RBD snapshot before and after the object map rebuilding.
> - Exporting a backup as raw image, all backups (re)created before and after the object map rebuilding.
> - md5summing all of that for a snapshot for which the rebuilding was actually needed.
>
> The good news: I found that all checksums are the same. So the backups are (at least for those I checked) not broken.
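(For what it's worth, a minimal Python sketch of such a before/after comparison
 — the pool, image and snapshot names below are made up:)

--------------------------------------------------------------------------------------------------
import hashlib
import subprocess

def md5_of_snapshot(spec):
    """Stream `rbd export <spec> -` to stdout and md5sum it, so exports taken
    before and after an object-map rebuild can be compared without temp files."""
    proc = subprocess.Popen(["rbd", "export", spec, "-"], stdout=subprocess.PIPE)
    digest = hashlib.md5()
    for chunk in iter(lambda: proc.stdout.read(4 * 1024 * 1024), b""):
        digest.update(chunk)
    if proc.wait() != 0:
        raise RuntimeError(f"rbd export failed for {spec}")
    return digest.hexdigest()

# Made-up names; run once before and once after `rbd object-map rebuild`.
before = md5_of_snapshot("rbd/vm-disk-1@nightly-2019-01-09")
# ... rbd object-map rebuild rbd/vm-disk-1@nightly-2019-01-09 ...
after = md5_of_snapshot("rbd/vm-disk-1@nightly-2019-01-09")
assert before == after, "export differs after object-map rebuild"
--------------------------------------------------------------------------------------------------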
>
> I also checked the source and found:
> https://github.com/ceph/ceph/blob/master/src/include/rbd/object_map_types.h
> So to my understanding, the object map entries are OBJECT_EXISTS, but should be OBJECT_EXISTS_CLEAN.
> Do I understand correctly that OBJECT_EXISTS_CLEAN relates to the object being unchanged ("clean") as compared to another snapshot / the main volume?
>
> If so, this would explain why the backups, exports etc. are all okay, since the backup tools only got "too many" objects in the fast-diff and
> hence extracted too many objects from Ceph-RBD even though that was not needed. Since both Benji and Backy2 deduplicate again in their backends,
> this causes only a minor network traffic inefficiency.
>
> Is my understanding correct?
> Then the underlying issue would still be a bug, but (as it seems) a harmless one.

Yes, your understanding is correct in that it's harmless from a
data-integrity point of view.

During the creation of the snapshot, the current object map (for the
HEAD revision) is copied to a new object map for that snapshot, and
then all EXISTS entries in the HEAD revision's object map are marked
as EXISTS_CLEAN. Somehow an IO operation is causing the object map to
record an update, but apparently no object update is actually
occurring (or at least the OSD doesn't think a change occurred).
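
For reference, the numeric states in the check output ("marked as 1, but
should be 3") come from src/include/rbd/object_map_types.h. A minimal Python
sketch of those states and the snapshot-time transition described above
(an illustration only, not the librbd implementation):

--------------------------------------------------------------------------------------------------
# Object map states (values as in src/include/rbd/object_map_types.h).
OBJECT_NONEXISTENT = 0   # no backing RADOS object
OBJECT_EXISTS = 1        # object exists and was updated since the previous snapshot (dirty)
OBJECT_PENDING = 2       # object removal is in flight
OBJECT_EXISTS_CLEAN = 3  # object exists and is unchanged ("clean")

def on_snapshot_create(head_map):
    """Sketch of the transition: the snapshot gets a verbatim copy of the
    HEAD object map, and every EXISTS entry in the HEAD map is then
    downgraded to EXISTS_CLEAN, since nothing has been written to HEAD
    since the snapshot was taken."""
    snap_map = list(head_map)
    new_head_map = [OBJECT_EXISTS_CLEAN if s == OBJECT_EXISTS else s
                    for s in head_map]
    return snap_map, new_head_map
--------------------------------------------------------------------------------------------------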

> I'll let you know if it happens again to some of our snapshots, and if so, if it only happens to newly created ones...
>
> Cheers,
>         Oliver
>
> On 10.01.19 at 01:18, Oliver Freyermuth wrote:
> > Dear Cephalopodians,
> >
> > inspired by http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032092.html I did a check of the object-maps of our RBD volumes
> > and snapshots. We are running 13.2.1 on the cluster I am talking about, all hosts (OSDs, MONs, RBD client nodes) still on CentOS 7.5.
> >
> > Sadly, I found that for at least 50 % of the snapshots (only the snapshots, not the volumes themselves), I got something like:
> > --------------------------------------------------------------------------------------------------
> > 2019-01-09 23:00:06.481 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000260 marked as 1, but should be 3
> > 2019-01-09 23:00:06.563 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
> > --------------------------------------------------------------------------------------------------
> > 2019-01-09 23:00:09.166 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000480 marked as 1, but should be 3
> > 2019-01-09 23:00:09.228 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
> > --------------------------------------------------------------------------------------------------
> > It often appears to affect 1-3 entries in the map of a snapshot. The Object Map was *not* marked invalid before I ran the check.
> > After rebuilding it, the check is fine again.
> >
> > The cluster has not yet seen any Ceph update (it was installed as 13.2.1; we plan to upgrade to 13.2.4 soonish).
> > There have been no major causes for worry so far. We purged a single OSD disk, balanced PGs with upmap, modified the CRUSH topology slightly, etc.
> > The cluster was never in a prolonged unhealthy period, nor did we have to repair any PG.
> >
> > Is this a known error?
> > Is it harmful, or is this just something like reference counting being off, and objects being in the map which did not really change in the snapshot?
> >
> > Our use case, in case it helps to understand or reproduce the issue:
> > - RBDs are used as disks for qemu/kvm virtual machines.
> > - Every night:
> >    - We run an fstrim in the VM (which propagates to RBD and purges empty blocks), fsfreeze it, take a snapshot, thaw it again.
> >    - After that, we run two backups with Benji backup ( https://benji-backup.me/ ) and Backy2 backup ( http://backy2.com/docs/ )
> >      which seems to work rather well so far.
> >    - We purge some old snapshots.
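(A minimal Python sketch of that nightly sequence, with made-up names and
 assuming SSH access into the guest for fstrim/fsfreeze — the real setup may
 well use the QEMU guest agent instead:)

--------------------------------------------------------------------------------------------------
import subprocess
from datetime import date

def run(*cmd):
    subprocess.run(cmd, check=True)

def nightly_snapshot(pool, image, vm_host, mountpoint="/"):
    """Hypothetical sketch of the nightly sequence: trim unused blocks,
    freeze the guest filesystem, take the RBD snapshot, thaw again."""
    snap = f"backup-{date.today().isoformat()}"
    run("ssh", vm_host, "fstrim", mountpoint)                # propagate discards to RBD
    run("ssh", vm_host, "fsfreeze", "--freeze", mountpoint)  # quiesce the filesystem
    try:
        run("rbd", "snap", "create", f"{pool}/{image}@{snap}")
    finally:
        run("ssh", vm_host, "fsfreeze", "--unfreeze", mountpoint)
    return snap  # Benji/Backy2 then back this snapshot up; old snapshots get purged
--------------------------------------------------------------------------------------------------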
> >
> > We use the following RBD feature flags:
> > layering, exclusive-lock, object-map, fast-diff, deep-flatten
> >
> > Since Benji and Backy2 are optimized for differential RBD backups to deduplicated storage, they leverage "rbd diff" (and hence make use of fast-diff, I would think).
> > If rbd diff produces wrong output due to this issue, it would affect our backups (but it would also affect classic backups of snapshots via "rbd export"...).
> > In case the issue is known or understood, can somebody extrapolate whether this means "rbd diff" contains too many blocks or actually misses changed blocks?
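(For illustration, a minimal Python sketch of obtaining the changed extents
 via "rbd diff" with JSON output — image and snapshot names are placeholders.
 With fast-diff enabled, the extents come from the object maps, so a stale
 EXISTS entry would only add extra extents, matching the observation above
 that the diff contained "too many" objects rather than missing any:)

--------------------------------------------------------------------------------------------------
import json
import subprocess

def rbd_diff(spec, from_snap=None):
    """Return the extents `rbd diff` reports for an image/snapshot spec,
    optionally relative to an older snapshot (as a differential backup would)."""
    cmd = ["rbd", "diff", "--format", "json", spec]
    if from_snap:
        cmd += ["--from-snap", from_snap]
    # Each entry describes an extent: offset, length and an "exists" flag.
    return json.loads(subprocess.check_output(cmd))

# Placeholder names: extents changed between two nightly snapshots.
extents = rbd_diff("rbd/vm-disk-1@nightly-2019-01-10", from_snap="nightly-2019-01-09")
--------------------------------------------------------------------------------------------------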
> >
> >
> > From now on, we are running daily, full object-map checks on all volumes and backup snapshots, and automatically rebuilding any object map the check finds invalid.
> > Hopefully, this will allow us to correlate the appearance of these issues with "something" happening on the cluster.
> > I did not detect a clear pattern in the affected snapshots, though; it seemed rather random...
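(Roughly along these lines — a sketch that assumes "rbd object-map check"
 reports problems via a non-zero exit status or the "object map error" lines
 quoted above:)

--------------------------------------------------------------------------------------------------
import subprocess

def check_and_rebuild(spec):
    """Run `rbd object-map check` on an image or snapshot spec (e.g.
    "pool/image" or "pool/image@snap") and rebuild the object map if the
    check flags errors.  Sketch only; error handling kept minimal."""
    result = subprocess.run(["rbd", "object-map", "check", spec],
                            capture_output=True, text=True)
    if result.returncode != 0 or "object map error" in result.stderr:
        subprocess.run(["rbd", "object-map", "rebuild", spec], check=True)
        return True   # map was rebuilt
    return False      # map was consistent
--------------------------------------------------------------------------------------------------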
> >
> > Maybe it would also help to understand this issue if somebody else using RBD in a similar manner on Mimic could also check their object maps.
> > Since this issue does not show up until a check is performed, it had been below our radar for many months now...
> >
> > Cheers,
> >       Oliver
> >
>



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


