Re: Root cause analysis for space overhead with erasure coded pools.

On 1/23/2020 2:20 AM, Igor Fedotov wrote:
Additional notes:


...
- Shared blobs created during EC overwrite seem to lack a rollback to the non-shared state after op completion (and snapshot removal). Hence they most probably pollute onodes and the DB (remember their persistence mechanics) and negatively impact performance. This needs more investigation/verification, though; a minimal sketch of the lifecycle in question follows below.
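
For illustration only, here is a minimal sketch of that lifecycle, assuming made-up names and types (SharedBlobRecord, Blob, clone_for_rollback and so on are stand-ins, not the actual BlueStore code). The asymmetry between the two steps is the missing rollback:

#include <cstdint>
#include <map>

struct SharedBlobRecord {
  std::map<uint64_t, uint32_t> ref_map;    // extent offset -> refcount
};

struct Blob {
  bool shared = false;
  uint64_t sbid = 0;                       // shared-blob id, key into the DB
};

std::map<uint64_t, SharedBlobRecord> g_db; // stand-in for the RocksDB records
uint64_t g_next_sbid = 1;

// Step 1: the clone taken for EC rollback marks the blob shared and
// persists a ref_map record keyed by a fresh sbid.
void clone_for_rollback(Blob& b, uint64_t extent_off) {
  if (!b.shared) {
    b.shared = true;
    b.sbid = g_next_sbid++;
    g_db[b.sbid].ref_map[extent_off] = 1;  // the head object's own ref
  }
  g_db[b.sbid].ref_map[extent_off]++;      // plus the rollback clone's ref
}

// Step 2: the rollback clone is removed once the op completes.
void drop_rollback_clone(Blob& b, uint64_t extent_off) {
  g_db[b.sbid].ref_map[extent_off]--;      // back to a single reference...
  // ...but nothing clears b.shared or erases g_db[b.sbid], which is
  // exactly the pollution described above.
}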

Additional update on the above. Here is an object dump snippet for an object from an EC 4+2 pool which has received 2 partial overwrites (3 writes in total), using the access pattern from my analysis.

2020-01-23T17:53:34.585+0300 7fc6026040c0 30 _dump_onode 0x5620616ca580 3#3:1b231130:::rbd_data.4.110088be0420.0000000000000000:head# nid 1652 size 0x10000 (65536)  expected_object_size 1048576 expected_write_size 1048576 in 0 shards, 0 spanning blobs

...

2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0) fsck_check_objects_shallow    0x0~3000: 0x0~3000 Blob(0x562061641f80 blob([0x1ee0000~10000] csum+shared crc32c/0x1000) use_tracker(0x10000 0x3000) SharedBlob(0x562061640fc0 sbid 0x2804))
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0) fsck_check_objects_shallow    0x3000~1000: 0x3000~1000 Blob(0x56206235c000 blob([0x2400000~10000] csum+has_unused+shared crc32c/0x1000 unused=0x7) use_tracker(0x10000 0x1000) SharedBlob(0x56206235c070 sbid 0x2805))
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0) fsck_check_objects_shallow    0x4000~c000: 0x4000~c000 Blob(0x56206235c0e0 blob([0x2410000~10000] csum+has_unused crc32c/0x1000 unused=0xf) use_tracker(0x10000 0xc000) SharedBlob(0x56206235c150 sbid 0x0))
2020-01-23T17:53:34.585+0300 7fc6026040c0 30 bluestore(dev/osd0) _fsck_check_extents oid 3#3:1b231130:::rbd_data.4.110088be0420.0000000000000000:head# extents [0x2410000~10000]
...

2020-01-23T17:53:34.585+0300 7fc6026040c0  1 bluestore(dev/osd0) _fsck_on_open checking shared_blobs
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0) _fsck_on_open  SharedBlob(0x562061640fc0 sbid 0x2804) (sbid 0x2804 ref_map(0x1ee0000~10000=1))
2020-01-23T17:53:34.585+0300 7fc6026040c0 30 bluestore(dev/osd0) _fsck_check_extents oid 3#3:1b231130:::rbd_data.4.110088be0420.0000000000000000:head# extents [0x1ee0000~10000]
2020-01-23T17:53:34.585+0300 7fc6026040c0 20 bluestore(dev/osd0) _fsck_on_open  SharedBlob(0x56206235c070 sbid 0x2805) (sbid 0x2805 ref_map(0x2400000~10000=1))
2020-01-23T17:53:34.585+0300 7fc6026040c0 30 bluestore(dev/osd0) _fsck_check_extents oid 3#3:1b231130:::rbd_data.4.110088b
...


One can see 3 blobs; two of them (which relate to the initial write and the first overwrite) are shared. I.e. each partial overwrite might result in the appearance of a shared blob, which is quite expensive: each shared blob has a corresponding record in RocksDB (hence an additional lookup/update op on access), shared blobs are tracked in a common container at the collection level, their handling is more complicated, etc. A rough illustration of the per-access cost follows.
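
Here is a rough illustration of that cost, again with assumed, simplified interfaces rather than the real Ceph ones:

#include <cstdint>
#include <map>

struct SharedBlobRecord {
  std::map<uint64_t, uint32_t> ref_map;    // extent offset -> refcount
};

std::map<uint64_t, SharedBlobRecord> g_db; // stand-in for RocksDB

struct Blob {
  bool shared = false;
  uint64_t sbid = 0;
};

// A plain blob is fully described inside the onode. A shared blob
// additionally needs a point lookup of its sbid record on access (and a
// matching write whenever the ref_map changes), plus membership in the
// per-collection shared-blob container.
SharedBlobRecord load_shared_state(const Blob& b) {
  if (!b.shared)
    return {};                  // no extra work for a plain blob
  return g_db.at(b.sbid);       // the extra lookup on every access
}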

And it makes no sense at this point of the onode's life: the ref_map denotes just a single reference per blob.

Hence IMO this behavior is suboptimal and might need some improvement; one possible shape of it is sketched below.
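
As a hedged sketch only (simplified stand-in types and a hypothetical cleanup hook, not a patch against the actual BlueStore code), the improvement could look like this:

#include <cstdint>
#include <map>

struct SharedBlobRecord {
  std::map<uint64_t, uint32_t> ref_map;    // extent offset -> refcount
};

struct Blob {
  bool shared = false;
  uint64_t sbid = 0;
};

using SharedBlobStore = std::map<uint64_t, SharedBlobRecord>;

static bool only_single_refs(const SharedBlobRecord& rec) {
  for (const auto& [offset, refs] : rec.ref_map)
    if (refs != 1)
      return false;             // some extent really is still shared
  return true;
}

// Hypothetical cleanup hook, run e.g. on op completion or snapshot
// removal: if every extent carries exactly one reference, the sharing
// bookkeeping buys nothing, so revert the blob and delete its record.
void maybe_rollback_to_unshared(Blob& b, SharedBlobStore& db) {
  if (!b.shared)
    return;
  auto it = db.find(b.sbid);
  if (it == db.end() || !only_single_refs(it->second))
    return;
  db.erase(it);                 // drops the per-blob RocksDB record
  b.shared = false;
  b.sbid = 0;
}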

Thanks,

Igor

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



