On 1/23/2020 2:20 AM, Igor Fedotov wrote:
Additional notes:
...
- Shared blobs created during EC overwrite seem to lack a rollback to the
non-shared state after op completion (and snapshot removal). Hence they
most probably pollute onodes and the DB (remember their persistence
mechanics) and negatively impact performance. This needs more
investigation/verification though.
Additional update on the above. Here is an object dump snippet for an
object from an EC 4+2 pool which has received 2 partial overwrites (3
writes in total), using the access pattern from my analysis.
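(Just for illustration: the object above was actually written through RBD,
but a raw librados sequence of the same shape - an initial write followed by
two partial overwrites - would look roughly like the sketch below. The pool
name and the offsets/lengths are made up and the EC pool would need
allow_ec_overwrites enabled; this is not my exact access pattern. Build
with -lrados.)

// Hedged sketch: initial 64 KiB write plus two partial overwrites via librados.
// "ecpool", "testobj" and the offsets/lengths are illustrative only.
#include <rados/librados.hpp>
#include <string>

int main() {
  librados::Rados cluster;
  cluster.init("admin");                 // connect as client.admin
  cluster.conf_read_file(nullptr);       // pick up the default ceph.conf
  cluster.connect();

  librados::IoCtx ioctx;
  cluster.ioctx_create("ecpool", ioctx); // hypothetical EC pool with overwrites enabled

  librados::bufferlist bl0;
  bl0.append(std::string(0x10000, 'a'));
  ioctx.write("testobj", bl0, bl0.length(), 0);       // initial full 64 KiB write

  librados::bufferlist bl1;
  bl1.append(std::string(0x1000, 'b'));
  ioctx.write("testobj", bl1, bl1.length(), 0x3000);  // first partial overwrite

  librados::bufferlist bl2;
  bl2.append(std::string(0xc000, 'c'));
  ioctx.write("testobj", bl2, bl2.length(), 0x4000);  // second partial overwrite

  ioctx.close();
  cluster.shutdown();
  return 0;
}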
2020-01-23T17:53:34.585+0300 7fc6026040c0 30 _dump_onode 0x5620616ca580
3#3:1b231130:::rbd_data.4.110088be0420.0000000000000000:head# nid 1652
size 0x10000 (65536)
expected_object_size 1048576 expected_write_size 1048576 in 0 shards,
0 spanning blobs
...
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0)
fsck_check_objects_shallow 0x0~3000: 0x0~3000 Blob(0x562061641f80
blob([0x1ee0000~10000] csum+shared crc32c/0x1000) use_tracker(0x10000 0x3000)
SharedBlob(0x562061640fc0 sbid 0x2804))
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0)
fsck_check_objects_shallow 0x3000~1000: 0x3000~1000
Blob(0x56206235c000 blob([0x2400000~10000]
csum+has_unused+shared crc32c/0x1000 unused=0x7) use_tracker(0x10000
0x1000) SharedBlob(0x56206235c070 sbid 0x2805))
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0)
fsck_check_objects_shallow 0x4000~c000: 0x4000~c000
Blob(0x56206235c0e0 blob([0x2410000~10000]
csum+has_unused crc32c/0x1000 unused=0xf) use_tracker(0x10000 0xc000)
SharedBlob(0x56206235c150 sbid 0x0))
2020-01-23T17:53:34.585+0300 7fc6026040c0 30 bluestore(dev/osd0)
_fsck_check_extents oid
3#3:1b231130:::rbd_data.4.110088be0420.0000000000000000:head# extents [0x2410000~10000]
...
2020-01-23T17:53:34.585+0300 7fc6026040c0 1 bluestore(dev/osd0)
_fsck_on_open checking shared_blobs
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0)
_fsck_on_open SharedBlob(0x562061640fc0 sbid 0x2804) (sbid 0x2804
ref_map(0x1ee0000~10000=1))
2020-01-23T17:53:34.585+0300 7fc6026040c0 30 bluestore(dev/osd0)
_fsck_check_extents oid
3#3:1b231130:::rbd_data.4.110088be0420.0000000000000000:head# extents
[0x1ee0000~10000]
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0)
_fsck_on_open SharedBlob(0x56206235c070 sbid 0x2805) (sbid 0x2805
ref_map(0x2400000~10000=1))
2020-01-23T17:53:34.585+0300 7fc6026040c0 30 bluestore(dev/osd0)
_fsck_check_extents oid 3#3:1b231130:::rbd_data.4.110088b
One can see 3 blobs; two of them (which relate to the initial write and
the first overwrite) are shared. I.e. each partial overwrite might
result in the appearance of a shared blob, which is quite expensive:
each shared blob has a corresponding record in RocksDB (hence an
additional lookup/update op on access), shared blobs are tracked in a
common container at the collection level, their handling is more
complicated, etc.
And the sharing makes no sense at this point of the onode's life, as the
ref_map denotes just a single reference for each blob.
Hence IMO this behavior is suboptimal and might need some improvement.
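(To make the point a bit more concrete, here is a tiny standalone sketch of
the check that would flag such blobs as candidates for demotion back to a
non-shared blob. This is not BlueStore code - the struct and names are my
own; only the ref_map values mirror the dump above.)

// Standalone illustration: a shared blob whose ref_map holds exactly one
// reference per extent is not really shared and could in principle be
// demoted back to a plain blob, dropping its extra RocksDB record.
#include <cstdint>
#include <initializer_list>
#include <iostream>
#include <map>

struct ExtentRefMap {
  std::map<uint64_t, uint32_t> refs;  // extent offset -> reference count

  bool single_referenced() const {
    if (refs.empty())
      return false;
    for (const auto& kv : refs)
      if (kv.second != 1)
        return false;
    return true;
  }
};

int main() {
  // Mirrors the dump: ref_map(0x1ee0000~10000=1) and ref_map(0x2400000~10000=1)
  ExtentRefMap sb_2804{{{0x1ee0000, 1}}};
  ExtentRefMap sb_2805{{{0x2400000, 1}}};

  for (const auto* sb : {&sb_2804, &sb_2805})
    std::cout << (sb->single_referenced()
                      ? "candidate for demotion to non-shared\n"
                      : "genuinely shared\n");
  return 0;
}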
Thanks,
Igor