Re: Understanding rbd objects, with snapshots

On 18/10/2022 01:24, Chris Dunlop wrote:
Hi,

Is there anywhere that describes exactly how rbd data (including snapshots) are stored within a pool?

I can see how an rbd broadly stores its data in rados objects in the pool, although the object map is opaque. But once an rbd snap is created and new data written to the rbd, where is the old data associated with the snap?

And/or how can I access the data from an rbd snapshot directly, e.g. using rados?

And, how can an object map be interpreted, i.e. what is the format?

I don't know if the snaps documentation here:

https://docs.ceph.com/en/latest/dev/osd_internals/snaps/

...is related to rbd snaps. Perhaps rbd snaps are "self managed snaps" requiring the use of a "SnapContext", but the rados man page doesn't mention this, so it's unclear what's going on.

Perhaps rbd snapshots simply can't be accessed directly with the current tools (other than actually mapping a snapshot)?

See below for some test explorations...

Hi Chris,

Snapshots are stored on the same OSD as the current (head) object.
rbd snapshots are self-managed rather than rados pool-managed: the rbd client takes responsibility for passing the correct snapshot context to the OSDs in i/o operations via librados. First, to create a snapshot, the rbd client requests a unique snap id from the mons. This id and the snap name are persisted in the rbd_header.xx object for the rbd image, and added to the list of previous snaps, if any.
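
For example, here is a minimal sketch with the Python rados/rbd bindings that lists the snap ids and names librbd has persisted in rbd_header.xx (the pool and image names are the ones from your transcript below; I am assuming the default ceph.conf path):

import rados
import rbd

# Connect to the cluster and open the pool from the transcript below.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('test')

# librbd returns the snapshot list it keeps in rbd_header.xx:
# one entry per snap, with its id, name and size.
with rbd.Image(ioctx, 'test1') as image:
    for snap in image.list_snaps():
        print(snap['id'], snap['name'], snap['size'])

ioctx.close()
cluster.shutdown()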

When the rbd client writes to an rbd_data.xx rados object, it passes the list of snaps. The OSD looks at the snap list and performs logic such as creating a clone of the object and copying the original data into it before writing, if it does not already have a clone for this snap, or copying the affected extent if the clone did not cover this offset/extent before, and so on. The OSD keeps track of which snapshots it is storing for the object, their blob offsets/extents within the object, and their physical locations on the OSD block device, all in its RocksDB database. The physical locations on the block device can be far apart, allocated by the allocator from free space on the device. You can use ceph-objectstore-tool on the OSD to examine a snapshot's location and get its data.


When reading, the rbd client passes the snap id to read, or the default id for head/current. I do not believe you can use the rados get command on rbd_data.xx, as you were doing, to get snapshot data, even if you specify the snapshot parameter to the command, as I think that only works with rados pool snapshots and not self-managed ones. As a user, if you want to access rbd snap data, you can rbd map the snap and read from it via the kernel rbd driver. If you want to fiddle with reading snapshots at the rados level on rbd_data.xx, you can write a librados app that first reads the snap id from rbd_header.xx based on the snap name, then passes this id in the context to the librados read function.
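
As a rough sketch of that last approach (Python bindings rather than C; the pool, image, snap and object names are the ones from your transcript below, and I am assuming the default ceph.conf path):

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('test')

# Look up the snap id that librbd recorded for snap1 in rbd_header.xx.
with rbd.Image(ioctx, 'test1') as image:
    snap_id = next(s['id'] for s in image.list_snaps()
                   if s['name'] == 'snap1')

# Tell librados to read from that snap instead of head, then read the
# rbd_data object directly; this should give back the pre-snapshot
# contents (the '1' you wrote before creating snap1) rather than the
# '2' currently in head.
ioctx.set_read(snap_id)
print(ioctx.read('rbd_data.08ceb039ff1c19.0000000000000000', 16, 0))

ioctx.close()
cluster.shutdown()

The kernel rbd route (mapping test/test1@snap1) is much simpler if you just need the data; the above is only useful for poking at the rados layer directly.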

/Maged




Cheers,

Chris

----------------------------------------------------------------------
##
## create a test rbd within a test pool
##
$ ceph osd pool create test
$ rbd create --size 10M --object-size 1M "test/test1"
$ rbd info test/test1
rbd image 'test1':
        size 10 MiB in 10 objects
        order 20 (1 MiB objects)
        snapshot_count: 0
        id: 08ceb039ff1c19
        block_name_prefix: rbd_data.08ceb039ff1c19
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Tue Oct 18 09:35:49 2022
        access_timestamp: Tue Oct 18 09:35:49 2022
        modify_timestamp: Tue Oct 18 09:35:49 2022
$ rados -p test ls --all
        rbd_directory
        rbd_info
        rbd_object_map.08ceb039ff1c19
        rbd_header.08ceb039ff1c19
        rbd_id.test1
#
# "clean" object map - but no idea what the contents mean
#
$ rados -p test get rbd_object_map.08ceb039ff1c19 - | od -t x1 > /tmp/om.clean; cat /tmp/om.clean
0000000 0e 00 00 00 01 01 08 00 00 00 0a 00 00 00 00 00
0000020 00 00 00 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
0000040 00 00 00 00 00
0000045

##
## Write to the rbd
## - confirm data appears in rbd_data.xxx object
## - rbd_object_map changes
##
$ dev=$(rbd device map "test/test1"); declare -p dev
declare -- dev="/dev/rbd0"
$ printf '1' > $dev
#
# rbd_data object appears
#
$ rados -p test ls --all | sort
        rbd_data.08ceb039ff1c19.0000000000000000
        rbd_directory
        rbd_header.08ceb039ff1c19
        rbd_id.test1
        rbd_info
        rbd_object_map.08ceb039ff1c19
#
# new rbd_data contains our written data
#
$ rados -p test get rbd_data.08ceb039ff1c19.0000000000000000 - | od -t x1
0000000 31 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0010000
#
# rbd_object_map is updated
#
$ rados -p test get rbd_object_map.08ceb039ff1c19 - | od -t x1 > /tmp/om.head.1; cat /tmp/om.head.1
0000000 0e 00 00 00 01 01 08 00 00 00 0a 00 00 00 00 00
0000020 00 00 40 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
0000040 00 d9 80 65 c7
0000045
$ diff /tmp/om.{clean,head.1}
2,3c2,3
< 0000020 00 00 00 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
< 0000040 00 00 00 00 00
---
> 0000020 00 00 40 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
> 0000040 00 d9 80 65 c7

##
## Write again, to offset 2MB (rbd is using 1MB objects)
## - new rbd_data object appears, with object-size offset appearing in the name
## - object map updated
##
$ printf 2 | dd of=${dev} bs=1M seek=2
$ rados -p test ls --all | sort
        rbd_data.08ceb039ff1c19.0000000000000000
        rbd_data.08ceb039ff1c19.0000000000000002
        rbd_directory
        rbd_header.08ceb039ff1c19
        rbd_id.test1
        rbd_info
        rbd_object_map.08ceb039ff1c19
#
# there's our data
#
$ rados -p test get rbd_data.08ceb039ff1c19.0000000000000002 - | od -t x1
0000000 32 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0010000
#
# once again the object map is updated
#
$ rados -p test get rbd_object_map.08ceb039ff1c19 - | od -t x1 > /tmp/om.head.2; cat /tmp/om.head.2
0000000 0e 00 00 00 01 01 08 00 00 00 0a 00 00 00 00 00
0000020 00 00 44 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
0000040 00 c3 24 bb 59
0000045
$ diff /tmp/om.head.{1,2}
2,3c2,3
< 0000020 00 00 40 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
< 0000040 00 d9 80 65 c7
---
> 0000020 00 00 44 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
> 0000040 00 c3 24 bb 59


##
## Create a snap
## - new rbd_object_map is created for the snap
## - the head rbd_object_map is updated
## - the snap rbd_object_map is the same as the previous head object map
##
$ rbd snap create --snap "test/test1" "test/test1@snap1"
#
# new rbd_object_map with snap id appended appears
#
$ rados -p test ls --all | sort
        rbd_data.08ceb039ff1c19.0000000000000000
        rbd_data.08ceb039ff1c19.0000000000000002
        rbd_directory
        rbd_header.08ceb039ff1c19
        rbd_id.test1
        rbd_info
        rbd_object_map.08ceb039ff1c19
        rbd_object_map.08ceb039ff1c19.0000000000000004
#
# look at the head and snap object maps
#
$ rados -p test get rbd_object_map.08ceb039ff1c19 - | od -t x1 > /tmp/om.head.3; cat /tmp/om.head.3
0000000 0e 00 00 00 01 01 08 00 00 00 0a 00 00 00 00 00
0000020 00 00 cc 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
0000040 00 45 6d cd ea
0000045
$ rados -p test get rbd_object_map.08ceb039ff1c19.0000000000000004 - | od -t x1 > /tmp/om.snap.1; cat /tmp/om.snap.1
0000000 0e 00 00 00 01 01 08 00 00 00 0a 00 00 00 00 00
0000020 00 00 44 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
0000040 00 c3 24 bb 59
0000045
#
# "head" rbd_object_map.08ceb039ff1c19 is updated
#
$ diff /tmp/om.head.{2,3}
2,3c2,3
< 0000020 00 00 44 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
< 0000040 00 c3 24 bb 59
---
> 0000020 00 00 cc 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
> 0000040 00 45 6d cd ea
#
# snap object map is the same as the previous head object map
#
$ diff /tmp/om.{head.2,snap.1} && echo 'same'
same

##
## write new data to the rbd
## - there are no new objects
## - new data appears in existing rbd_data object
## - WHERE IS THE OLD DATA???
##
$ printf '2' > $dev
#
# no new objects
#
$ rados -p test ls --all | sort
        rbd_data.08ceb039ff1c19.0000000000000000
        rbd_data.08ceb039ff1c19.0000000000000002
        rbd_directory
        rbd_header.08ceb039ff1c19
        rbd_id.test1
        rbd_info
        rbd_object_map.08ceb039ff1c19
        rbd_object_map.08ceb039ff1c19.0000000000000004
#
# existing rbd_data is updated
#
$ rados -p test get rbd_data.08ceb039ff1c19.0000000000000000 - | od -t x1
0000000 32 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0010000
#
# head object map is updated
#
$ rados -p test get rbd_object_map.08ceb039ff1c19 - | od -t x1 > /tmp/om.head.4; cat /tmp/om.head.4
0000000 0e 00 00 00 01 01 08 00 00 00 0a 00 00 00 00 00
0000020 00 00 4c 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
0000040 00 06 1a ea 61
0000045
$ diff /tmp/om.head.{3,4}
2,3c2,3
< 0000020 00 00 cc 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
< 0000040 00 45 6d cd ea
---
> 0000020 00 00 4c 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
> 0000040 00 06 1a ea 61
#
# snap object map is the same
#
$ rados -p test get rbd_object_map.08ceb039ff1c19.0000000000000004 - | od -t x1 > /tmp/om.snap.2; cat /tmp/om.snap.2
0000000 0e 00 00 00 01 01 08 00 00 00 0a 00 00 00 00 00
0000020 00 00 44 00 00 0c 00 00 00 c6 44 f4 3a 01 00 00
0000040 00 c3 24 bb 59
0000045
$ diff /tmp/om.snap.{1,2} && echo 'same'
same

##
## WHERE IS THE OLD DATA???
##
----------------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
