Re: Invalid RBD object maps of snapshots on Mimic

Dear Jason and list,

On 10.01.19 at 16:28, Jason Dillaman wrote:
On Thu, Jan 10, 2019 at 4:01 AM Oliver Freyermuth
<freyermuth@xxxxxxxxxxxxxxxxxx> wrote:

Dear Cephalopodians,

I performed several consistency checks now:
- Exporting an RBD snapshot before and after the object map rebuilding.
- Exporting a backup as raw image, all backups (re)created before and after the object map rebuilding.
- md5summing all of that for a snapshot for which the rebuilding was actually needed.

The good news: I found that all checksums are the same. So the backups are (at least for those I checked) not broken.
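
For reference, the comparison boiled down to something like the following (a minimal sketch using the python-rbd bindings; pool, image and snapshot names are just placeholders for our setup). I ran it once before and once after the object map rebuild and compared the digests:
--------------------------------------------------------------------------------------------------
# Minimal sketch of the checksum comparison via the python-rbd bindings
# (pool/image/snapshot names are placeholders for our setup).
import hashlib
import rados
import rbd

def md5_of_snapshot(pool, image_name, snap_name, chunk=4 * 1024 * 1024):
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            image = rbd.Image(ioctx, image_name, snapshot=snap_name)
            try:
                md5 = hashlib.md5()
                size = image.size()
                offset = 0
                while offset < size:
                    length = min(chunk, size - offset)
                    md5.update(image.read(offset, length))
                    offset += length
                return md5.hexdigest()
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

# Run before and after "rbd object-map rebuild"; the digests were identical.
print(md5_of_snapshot('rbd', 'vm-disk', 'nightly-2019-01-09'))
--------------------------------------------------------------------------------------------------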

I also checked the source and found:
https://github.com/ceph/ceph/blob/master/src/include/rbd/object_map_types.h
So to my understanding, the object map entries are OBJECT_EXISTS, but should be OBJECT_EXISTS_CLEAN.
Do I understand correctly that OBJECT_EXISTS_CLEAN means the object exists but is unchanged ("clean") compared to another snapshot / the main volume?
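
For reference, the state values from that header as I read them (a small lookup to decode the "marked as 1, but should be 3" messages quoted further below):
--------------------------------------------------------------------------------------------------
# Object map states as defined in src/include/rbd/object_map_types.h,
# as far as I read the header; this decodes the "marked as 1, but
# should be 3" messages from "rbd object-map check":
OBJECT_STATES = {
    0: 'OBJECT_NONEXISTENT',   # object has never been written
    1: 'OBJECT_EXISTS',        # object exists and may have changed since the snapshot
    2: 'OBJECT_PENDING',       # object is scheduled for removal
    3: 'OBJECT_EXISTS_CLEAN',  # object exists but is unchanged ("clean")
}

print(OBJECT_STATES[1], 'vs. expected', OBJECT_STATES[3])
--------------------------------------------------------------------------------------------------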

If so, this would explain why the backups, exports etc. are all okay: the backup tools merely got "too many" objects from the fast-diff and
hence extracted objects from Ceph RBD which had not actually changed. Since both Benji and Backy2 deduplicate again in their backends,
this only causes a minor network-traffic inefficiency.

Is my understanding correct?
Then the underlying issue would still be a bug, but (as it seems) a harmless one.

Yes, your understanding is correct in that it's harmless from a
data-integrity point-of-view.

During the creation of the snapshot, the current object map (for the
HEAD revision) is copied to a new object map for that snapshot, and
then all the objects in the HEAD revision are marked as
EXISTS_CLEAN (if they were EXISTS). Somehow an IO operation is causing the
object map to think there is an update, but apparently no object
update is actually occurring (or at least the OSD doesn't think a
change occurred).

thanks a lot for the clarification! Good to know my understanding is correct.
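
If I translate your description into a toy model (just to check my own understanding; this is of course not the actual librbd code):
--------------------------------------------------------------------------------------------------
# Toy model of my reading of the description above (not the actual
# librbd code): on snapshot creation, EXISTS entries are downgraded to
# EXISTS_CLEAN, and only a subsequent write should flip them back.
OBJECT_EXISTS, OBJECT_EXISTS_CLEAN = 1, 3

def on_snap_create(object_map):
    # the snapshot keeps a copy; states become "clean" going forward
    return [OBJECT_EXISTS_CLEAN if state == OBJECT_EXISTS else state
            for state in object_map]

def on_write(object_map, object_index):
    # a real write dirties the object again, so fast-diff reports it
    object_map[object_index] = OBJECT_EXISTS

# Our symptom, in this picture: entries end up as OBJECT_EXISTS (1)
# although no data changed, where the check expects OBJECT_EXISTS_CLEAN (3).
--------------------------------------------------------------------------------------------------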

I re-checked all object maps just now. Again, the most recent snapshots show this issue, but only those.
The only "special" thing we do, which probably not everybody does, is regularly running fstrim inside the machines
running from the RBDs, to conserve space.

I am not sure how exactly the DISCARD operation is handled in RBD. But since this was my guess, I just ran an fstrim inside one of the VMs
and checked the object maps again. I get:
2019-01-10 16:44:25.320 7f06f67fc700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.4f587327b23c6.0000000000000040 marked as 1, but should be 3
In this case, I got it for the volume itself and not a snapshot.

So it seems to me that sometimes a DISCARD causes objects to be marked as updated, even though they have not changed.
Sadly, lacking in-depth code knowledge and a real debug setup, I cannot track it down further :-(.

Cheers, and I hope this helps a code expert track it down (at least it is not affecting data integrity),
	Oliver


I'll let you know if it happens again to some of our snapshots, and if so, if it only happens to newly created ones...

Cheers,
         Oliver

On 10.01.19 at 01:18, Oliver Freyermuth wrote:
Dear Cephalopodians,

inspired by http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032092.html I did a check of the object-maps of our RBD volumes
and snapshots. We are running 13.2.1 on the cluster I am talking about, all hosts (OSDs, MONs, RBD client nodes) still on CentOS 7.5.

Sadly, I found that for at least 50 % of the snapshots (only the snapshots, not the volumes themselves), I got something like:
--------------------------------------------------------------------------------------------------
2019-01-09 23:00:06.481 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000260 marked as 1, but should be 3
2019-01-09 23:00:06.563 7f89aeffd700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
--------------------------------------------------------------------------------------------------
2019-01-09 23:00:09.166 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000480 marked as 1, but should be 3
2019-01-09 23:00:09.228 7fbcff7fe700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.519c46b8b4567.0000000000000840 marked as 1, but should be 3
--------------------------------------------------------------------------------------------------
It often affects only 1-3 entries in the map of a snapshot. The object map was *not* marked invalid before I ran the check.
After rebuilding it, the check passes again.

The cluster has not yet seen any Ceph update (it was installed as 13.2.1; we plan to upgrade to 13.2.4 soonish).
There have been no major causes for worry so far. We purged a single OSD disk, balanced PGs with upmap, modified the CRUSH topology slightly, etc.
The cluster was never in a prolonged unhealthy state, nor did we have to repair any PG.

Is this a known error?
Is it harmful, or is it just something like a reference count being off, with objects appearing in the map which did not really change in the snapshot?

Our use case, in case that helps to understand or reproduce the issue:
- RBDs are used as disks for qemu/kvm virtual machines.
- Every night:
    - We run an fstrim in the VM (which propagates to RBD and purges empty blocks), fsfreeze it, take a snapshot, thaw it again.
    - After that, we run two backups, one with Benji backup ( https://benji-backup.me/ ) and one with Backy2 backup ( http://backy2.com/docs/ ),
      which both seem to work rather well so far.
    - We purge some old snapshots.
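
The snapshot and purge part of that nightly cycle boils down to roughly the following (a simplified sketch using the python-rbd bindings; the fstrim/fsfreeze step goes through the qemu guest agent and is omitted here, and pool name and retention count are placeholders):
--------------------------------------------------------------------------------------------------
# Simplified sketch of the nightly snapshot rotation (python-rbd bindings;
# fstrim/fsfreeze via the qemu guest agent is omitted, pool name and
# retention count are placeholders).
import datetime
import rados
import rbd

POOL = 'rbd'
KEEP_NIGHTLY = 7

def rotate_snapshots(image_name):
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        try:
            image = rbd.Image(ioctx, image_name)
            try:
                # snapshot is taken while the filesystem in the VM is frozen
                image.create_snap('nightly-' + datetime.date.today().isoformat())
                nightly = sorted(s['name'] for s in image.list_snaps()
                                 if s['name'].startswith('nightly-'))
                for old_snap in nightly[:-KEEP_NIGHTLY]:
                    image.remove_snap(old_snap)   # purge old snapshots
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
--------------------------------------------------------------------------------------------------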

We use the following RBD feature flags:
layering, exclusive-lock, object-map, fast-diff, deep-flatten

Since Benji and Backy2 are optimized for differential RBD backups to deduplicated storage, they leverage "rbd diff" (and hence, I would think, make use of fast-diff).
If rbd diff produces wrong output due to this issue, it would affect our backups (but it would also affect classic backups of snapshots via "rbd export"...).
In case the issue is known or understood, can somebody say whether this means "rbd diff" contains too many blocks, or whether it actually misses changed blocks?
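
For reference, this is roughly how I understand the tools consume the diff (a sketch via the python-rbd bindings; pool, image and snapshot names are placeholders). The question above is whether a wrong object-map entry merely adds superfluous extents to this list, or whether it can hide changed ones:
--------------------------------------------------------------------------------------------------
# Sketch of how I understand the diff is consumed (python-rbd bindings;
# pool/image/snapshot names are placeholders).
import rados
import rbd

def changed_extents(pool, image_name, from_snap, to_snap):
    extents = []
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            image = rbd.Image(ioctx, image_name, snapshot=to_snap)
            try:
                def collect(offset, length, exists):
                    # exists=False marks a region that was discarded/zeroed
                    extents.append((offset, length, exists))
                image.diff_iterate(0, image.size(), from_snap, collect)
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
    return extents
--------------------------------------------------------------------------------------------------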


From now on, we are running daily, full object-map checks on all volumes and snapshots, and automatically rebuild any object map which the check found to be invalid.
Hopefully, this will allow us to correlate the appearance of these issues with "something" happening on the cluster.
I did not detect a clear pattern in the affected snapshots, though; it seemed rather random...
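
Roughly, the daily check is driven like this (a sketch only; I simply key off the "object map error" lines which "rbd object-map check" prints, as quoted above, and the pool name is a placeholder):
--------------------------------------------------------------------------------------------------
# Sketch of how the daily check is driven (pool name is a placeholder).
# We rebuild whenever "rbd object-map check" prints the "object map error"
# lines quoted above.
import subprocess
import rados
import rbd

POOL = 'rbd'

def check_and_rebuild_all():
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        try:
            for image_name in rbd.RBD().list(ioctx):
                image = rbd.Image(ioctx, image_name)
                try:
                    specs = ['{}/{}'.format(POOL, image_name)]
                    specs += ['{}/{}@{}'.format(POOL, image_name, s['name'])
                              for s in image.list_snaps()]
                finally:
                    image.close()
                for spec in specs:
                    check = subprocess.run(['rbd', 'object-map', 'check', spec],
                                           stdout=subprocess.PIPE,
                                           stderr=subprocess.STDOUT,
                                           universal_newlines=True)
                    if 'object map error' in check.stdout:
                        subprocess.run(['rbd', 'object-map', 'rebuild', spec],
                                       check=True)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
--------------------------------------------------------------------------------------------------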

Maybe it would also help in understanding this issue if somebody else using RBD in a similar manner on Mimic could check their object maps as well.
Since this issue does not show up until a check is performed, it stayed under our radar for many months...

Cheers,
       Oliver







--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
