On Mon, 30 Jan 2017, Igor Fedotov wrote:
>
> On 30.01.2017 17:18, Sage Weil wrote:
> > On Mon, 30 Jan 2017, Igor Fedotov wrote:
> > > Hi Sage,
> > >
> > > It looks like there is a bug somewhere in
> > > BlueStore/store_test/clone_range.
> > >
> > > I'm occasionally hitting an assert on mismatched data in the read result
> > > while running the SyntheticMatrixCsumVsCompression/2 test case.
> > >
> > > --- buffer mismatch between offset 0x7400 and 0xa200, total 0x19e00
> > > --- expected:
> > > 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > > *
> > > 00006c00 39 35 31 37 32 37 31 34 34 31 38 39 31 33 37 39 |9517271441891379|
> > >
> > > <skipped>
> > >
> > > 00007400 30 31 33 32 34 39 35 35 30 38 32 37 39 32 37 31 |0132495508279271|
> > > 00007410 37 37 37 31 31 38 31 37 33 36 32 36 33 33 31 34 |7771181736263314|
> > >
> > > --- actual:
> > > 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > > *
> > > 00006c00 39 35 31 37 32 37 31 34 34 31 38 39 31 33 37 39 |9517271441891379|
> > >
> > > <skipped>
> > >
> > > 00007400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > > *
> > > 0000a200 32 35 32 32 38 33 31 34 35 38 37 36 34 35 36 33 |2522831458764563|
> > >
> > > It takes multiple runs to hit it, though...
> > >
> > > I did some analysis, and it seems there are some issues in the
> > > clone_range2 handling.
> > >
> > > First of all, do we have any limits or prerequisites on the src/dst
> > > offsets in this request? E.g., should they be aligned the same way
> > > relative to alloc unit boundaries? I recall some discussion of that a
> > > while ago.
> > >
> > > store_test doesn't enforce any, as far as I can see, e.g.
> > > (min_alloc_size = 0x10000):
> > >
> > > "ops": [
> > >     {
> > >         "op_num": 0,
> > >         "op_name": "clonerange2",
> > >         "collection": "555.0_head",
> > >         "src_oid": "#555:3b000000:::OBJ_731aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:head#",
> > >         "dst_oid": "#555:c7000000:::OBJ_738:7cfc81ab#",
> > >         "src_offset": 107520,
> > >         "len": 78336,
> > >         "dst_offset": 27648
> > >     }
> > > ]
> > >
> > > This results in potentially invalid blobs for the destination object;
> > > see the extent starting at 0x7400 below: it has blob offset 0, and
> > > hence the blob isn't aligned with min_alloc_size:
> > Oh, right.
> >
> > Well, the good news is the OSD no longer has any callers for which the src
> > and dst clone_range offsets are different, so we could simply assert that
> > they match.  That's the simplest fix.  It's partly a question of whether
> > we expect future cases where we will need to clone between offsets.
> > Perhaps we assert for now but don't clean up the interface in case
> > we need to backtrack later?
> >
> > Or we could do something more limited.  The problem below is less about
> > min_alloc_size and more that it's not block aligned, I think, right?  We
> Perhaps you're right, and it's more about block-aligned blobs/extents. But
> I'm a bit worried about having AU-unaligned blobs. IMO we don't test such
> cases much. I think it's better to produce extents/blobs the same way the
> mainstream write path does, i.e. AU-aligned. Other approaches are more
> error-prone, and their bugs are much harder to catch because they are rare.
> On the other hand, we can change the AU size on the fly, and hence we
> violate the AU-alignment requirement that way too...
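A minimal sketch of the alignment question, for illustration only: it assumes power-of-two units, the min_alloc_size of 0x10000 quoted above, and a 4 KiB (0x1000) block size, which is an assumption since the block size isn't stated in the report. The helpers `phase_in_unit`, `clone_offsets_compatible` and `require_same_offsets` are hypothetical, not BlueStore API; the last one mirrors the "simply assert that they match" option.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not BlueStore API): position of an offset inside a
// power-of-two unit such as min_alloc_size or the device block size.
inline uint64_t phase_in_unit(uint64_t off, uint64_t unit) {
  assert(unit != 0 && (unit & (unit - 1)) == 0);  // unit must be a power of two
  return off & (unit - 1);
}

// Cloned extents can stay aligned only if src and dst sit at the same
// position relative to `unit`.
inline bool clone_offsets_compatible(uint64_t src_off, uint64_t dst_off,
                                     uint64_t unit) {
  return phase_in_unit(src_off, unit) == phase_in_unit(dst_off, unit);
}

// The strict option: only allow cloning between identical offsets.
inline void require_same_offsets(uint64_t src_off, uint64_t dst_off) {
  assert(src_off == dst_off && "clone_range src/dst offsets must match");
  (void)src_off;  // silence unused-parameter warnings when NDEBUG is set
  (void)dst_off;
}
```

For the clonerange2 op above (src_offset 107520 = 0x1a400, dst_offset 27648 = 0x6c00, len 78336 = 0x13200), the phases are 0xa400 vs 0x6c00 for the 0x10000 unit and 0x400 vs 0xc00 for an assumed 0x1000 block, so neither check passes. Note also that dst_offset + len = 0x19e00, which matches the size the log reports for OBJ_738.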

I think we should either make the store_test synthetic test change
min_alloc_size randomly, or drop the ability to change it at all.  Leaning
toward the latter... let's discuss during standup.

> > could make clone_range fall back to the read/write path if the alignment
> > does not match the block device...
> Yeah, that makes sense, especially if we do R/W only for the unaligned
> head/tail.
>
> Actually, my main concern right now is the broken store_test. Should we
> force aligned-only offsets there for the moment?

Yeah, let's do that!

sage

> >
> > sage
> >
> > > 2017-01-30 03:57:17.802440 7f0036a20700 15 bluestore(bluestore.test_temp_dir) read 555.0_head #555:c7000000:::OBJ_738:7cfc81ab# 0x0~19e00
> > > 2017-01-30 03:57:17.802448 7f0036a20700 30 bluestore.OnodeSpace(0x55eb49789b78 in 0x55eb45dd0620) lookup
> > > 2017-01-30 03:57:17.802450 7f0036a20700 30 bluestore.OnodeSpace(0x55eb49789b78 in 0x55eb45dd0620) lookup #555:c7000000:::OBJ_738:7cfc81ab# hit 0x55eb49874700
> > > 2017-01-30 03:57:17.802453 7f0036a20700 20 bluestore(bluestore.test_temp_dir) _do_read 0x0~19e00 size 0x19e00 (105984)
> > > 2017-01-30 03:57:17.802455 7f0036a20700 20 bluestore.onode(0x55eb49874700) flush done
> > > 2017-01-30 03:57:17.802456 7f0036a20700 30 bluestore.extentmap(0x55eb49874850) fault_range 0x0~19e00
> > > 2017-01-30 03:57:17.802457 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _dump_onode 0x55eb49874700 #555:c7000000:::OBJ_738:7cfc81ab# nid 17377 size 0x19e00 (105984) expected_object_size 2097152 expected_write_size 4096 in 0 shards
> > > 2017-01-30 03:57:17.802461 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _dump_extent_map 0x6c00~800: 0x3800~800 Blob(0x55eb5047a460 blob([0x40190000~4000] csum+has_unused+shared crc32c/0x1000 unused=0xff) ref_map(0x3800~800=1) SharedBlob(0x55eb4c2f49f0 sbid 0x3adf loaded
> > >     shared_blob(ref_map(0x40190000~4000=2))))
> > > 2017-01-30 03:57:17.802469 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _dump_extent_map csum: [0,0,f1e4ed4a,417bbe91]
> > > 2017-01-30 03:57:17.802472 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _dump_extent_map 0x7400~12a00: 0x0~12a00 Blob(0x55eb4dc87b80 blob([0x40194000~18000] csum+shared crc32c/0x1000) ref_map(0x0~12a00=1) SharedBlob(0x55eb4c2f5180 sbid 0x3ae0 loaded shared_blob(ref_map(0x40194000~18000=3))))
> > > 2017-01-30 03:57:17.802479 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _dump_extent_map csum: [d1f849c5,fbe516b8,518379f8,b8b944c8,18b7be23,2b6562d5,51de5770,40988db7,bf7fd7f3,14744e41,eddcb459,639b3350,d038700c,80ffc21e,d7f4edb3,a7ae1a9,f123b379,dfb76444,8ac03032,c1cbff33,629e4868,12d9f0ea,5d50ca8c,b7ce671d]
> > > 2017-01-30 03:57:17.802484 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _dump_extent_map 0x0~18000 buffer(0x55eb45deb020 space 0x55eb4c2f51d8 0x0~18000 clean)
> > > 2017-01-30 03:57:17.802487 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _do_read hole 0x0~6c00
> > > 2017-01-30 03:57:17.802490 7f0036a20700 20 bluestore(bluestore.test_temp_dir) _do_read blob Blob(0x55eb5047a460 blob([0x40190000~4000] csum+has_unused+shared crc32c/0x1000 unused=0xff) ref_map(0x3800~800=1) SharedBlob(0x55eb4c2f49f0 sbid 0x3adf loaded shared_blob(ref_map(0x40190000~4000=2)))) need 0x3800~800 cache has 0x[]
> > > 2017-01-30 03:57:17.802495 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _do_read will read 0x6c00: 0x3800~800
> > > 2017-01-30 03:57:17.802509 7f0036a20700 20 bluestore(bluestore.test_temp_dir) _do_read blob Blob(0x55eb4dc87b80 blob([0x40194000~18000] csum+shared crc32c/0x1000) ref_map(0x0~12a00=1) SharedBlob(0x55eb4c2f5180 sbid 0x3ae0 loaded shared_blob(ref_map(0x40194000~18000=3))))
> > >     need 0x0~12a00 cache has 0x[0~12a00]
> > > 2017-01-30 03:57:17.802515 7f0036a20700 30 bluestore(bluestore.test_temp_dir) _do_read use cache 0x7400: 0x0~12a00
> > > 2017-01-30 03:57:17.802519 7f0036a20700 20 bluestore(bluestore.test_temp_dir) _do_read blob Blob(0x55eb5047a460 blob([0x40190000~4000] csum+has_unused+shared crc32c/0x1000 unused=0xff) ref_map(0x3800~800=1) SharedBlob(0x55eb4c2f49f0 sbid 0x3adf loaded shared_blob(ref_map(0x40190000~4000=2)))) need 0x0x6c00:3800~800
> > > 2017-01-30 03:57:17.802529 7f0036a20700 20 bluestore(bluestore.test_temp_dir) _do_read region 0x6c00: 0x3800~800 reading 0x3000~1000
> > >
> > > I haven't yet unwound all the clone_range transformations that led to
> > > this state. In the example above, the source object already has the
> > > same unaligned-extents issue.
> > >
> > > But in any case, it appears that clone_range neither checks nor asserts
> > > on unaligned input offsets...
> > >
> > > I can share a couple of logs if needed.
> > >
> > > Any comments?
> > >
> > > Thanks,
> > >
> > > Igor
> > >
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
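The fallback discussed in this thread (clone the block-aligned middle, use read/write only for the unaligned head and tail) can be sketched as a small range planner. This is a minimal illustration under stated assumptions, not BlueStore's clone_range: `plan_clone_range`, `Segment` and `Action` are hypothetical names, and `block` stands for the device block size. When src and dst sit at different positions within a block, as in the clonerange2 op in this report, no cloned extent can be aligned on both sides, so the whole range falls back to read/write.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical planner (not BlueStore code): which parts of a clone_range
// request could share extents and which must go through read/write.
enum class Action { Copy, Clone };

struct Segment {
  Action action;
  uint64_t src_off;
  uint64_t dst_off;
  uint64_t len;
};

std::vector<Segment> plan_clone_range(uint64_t src_off, uint64_t dst_off,
                                      uint64_t len, uint64_t block) {
  assert(block != 0);
  std::vector<Segment> out;
  if (len == 0) return out;

  // Different positions within a block: nothing can be cloned while keeping
  // both sides block aligned, so copy the whole range.
  if (src_off % block != dst_off % block) {
    out.push_back({Action::Copy, src_off, dst_off, len});
    return out;
  }

  const uint64_t end = src_off + len;
  const uint64_t mid_start = (src_off + block - 1) / block * block;  // round up
  const uint64_t mid_end = end / block * block;                      // round down
  if (mid_start >= mid_end) {  // no whole block inside the range
    out.push_back({Action::Copy, src_off, dst_off, len});
    return out;
  }

  // Emit source range [s, e) mapped to the matching destination offset.
  auto add = [&](Action a, uint64_t s, uint64_t e) {
    if (e > s) out.push_back({a, s, dst_off + (s - src_off), e - s});
  };
  add(Action::Copy, src_off, mid_start);   // unaligned head -> read/write
  add(Action::Clone, mid_start, mid_end);  // aligned middle -> share extents
  add(Action::Copy, mid_end, end);         // unaligned tail -> read/write
  return out;
}
```

For the op from the report (src 0x1a400, dst 0x6c00, len 0x13200, with an assumed 0x1000 block) this yields a single copy segment. For src 0x1400, dst 0x5400, len 0x3000 it yields a 0xc00 head copy, a 0x2000 clone starting at src 0x2000 (dst 0x6000), and a 0x400 tail copy at dst 0x8000. Passing min_alloc_size instead of the block size would also keep cloned blobs AU-aligned, which is the stricter property raised in the thread.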