On Mon, 28 Aug 2017, Xuehan Xu wrote:
> Hi, sage.
>
> About the PR https://github.com/ceph/ceph/pull/17017: I've tested the test case in our test environment multiple times, and it never failed. Where can I find the logs of the OSDs on which the test case failed? It might be helpful for me to find the cause of the failure.

http://pulpito.ceph.com/yuriw-2017-08-24_14:50:56-rados-wip-yuri-testing4_2017_8_24-distro-basic-smithi/ has several failures.  For example,

http://qa-proxy.ceph.com/teuthology/yuriw-2017-08-24_14:50:56-rados-wip-yuri-testing4_2017_8_24-distro-basic-smithi/1559338/

You'll find all the logs in remote/*/log

Thanks!
sage

> Thanks:-)
>
> On 24 August 2017 at 21:20, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Thu, 24 Aug 2017, Xuehan Xu wrote:
> >> Hi, Sage.
> >>
> >> It seems that, when calculating a clone object's bytes, the amount of data recorded in clone_overlap is subtracted from the clone object's size, as demonstrated in SnapSet::get_clone_bytes. This seems to be right when the underlying ObjectStore uses a COW clone, but for ObjectStores like FileStore this doesn't seem to make sense, since all of the object's data is copied. Is this right?
> >
> > Right. It was done that way when we were targeting FileStore on top of btrfs, which would do the copy-on-write. BlueStore does it now too, so we probably want to preserve the behavior.
> >
> > sage
> >
> >> Thank you:-)
> >>
> >> On 24 August 2017 at 00:54, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> > On Thu, 24 Aug 2017, Xuehan Xu wrote:
> >> >> Hi, sage.
> >> >>
> >> >> Thanks for your reply:-)
> >> >>
> >> >> Actually, our problem is as follows: we are trying to back up the rbd data in a cache-tier-enabled ceph cluster using "rbd export-diff"; however, because some clone overlap information is missing, the export-diff operation can't produce correct results.
> >> >>
> >> >> After debugging, we found two reasons for this missing overlap. One is the issue mentioned in this thread: http://tracker.ceph.com/issues/21072, which leads to the overlap recorded in the extended attrs not being "copied" to the base tier. When we tried to fix it, we found that the "_" policy is in our way: the clone overlap is stored in the "snapset" attr, whose name is not prefixed with "_", and even if we modify the code to ignore the "_" policy when getting the extended attrs, a "_" is added when storing them into the base tier, which makes the subsequent "list-snaps" operation unable to find the clone overlap.
> >> >
> >> > Okay. The fix will need to explicitly copy the overlap metadata as part of copy-get, though... not the _ attr(s). Those are managed locally in the pool and not meant to be copied across pools (you'll get all sorts of bugs if you try to do that!).
> >> >
> >> >> And the other reason that makes export-diff operate incorrectly is described in http://tracker.ceph.com/issues/20896. To fix this, we are planning to add a "cache_clone_overlap" field to SnapSet to track the cache stats while leaving the original "clone_overlap" to track modifications to the object. Is this the right way to go?
> >> >
> >> > I see. I think that would work.
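(For reference, a minimal self-contained sketch of the accounting being discussed: a clone's byte count is its recorded size minus the total length of the extents in its clone_overlap entry. This is an illustration only, not the actual SnapSet::get_clone_bytes implementation; the extent_map type below is a simplified stand-in for Ceph's interval_set.)

  // Illustrative sketch, not Ceph code: clone bytes = clone size minus
  // the total length of the extents recorded in clone_overlap.
  #include <cassert>
  #include <cstdint>
  #include <iostream>
  #include <map>

  // Simplified stand-in for interval_set<uint64_t>: offset -> length.
  using extent_map = std::map<uint64_t, uint64_t>;

  uint64_t clone_bytes(uint64_t clone_size, const extent_map &overlap)
  {
    uint64_t bytes = clone_size;
    for (const auto &ext : overlap) {
      assert(bytes >= ext.second);
      bytes -= ext.second;  // data shared with the next newer version is not charged
    }
    return bytes;
  }

  int main()
  {
    // Numbers taken from the object dump later in this thread:
    // clone size 4194303, overlap [0~100,115~4194188].
    extent_map overlap = {{0, 100}, {115, 4194188}};
    std::cout << clone_bytes(4194303, overlap) << "\n";  // prints 15
    return 0;
  }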
> >> > It might make sense to take a fresh look at what the cache stats are, though, to see if it is possible to get rid of that condition and avoid maintaining a duplicate-but-slightly-different structure just for the cache tier. I don't like the conditional behavior there... it really seems like clone_overlap should not vary between the cache and base tiers, and any adjustments for missing clones should happen only in the stats calculation, not in clone_overlap itself.
> >> >
> >> >> By the way, last month, when trying to fix the bug http://tracker.ceph.com/issues/17445, you asked for a test case in the master branch to confirm that master doesn't have the same bug. I've submitted that test case in the PR https://github.com/ceph/ceph/pull/17017 and cherry-picked it into the PR https://github.com/ceph/ceph/pull/16355. I'm very sorry for taking so long; I was very busy during the past month trying to solve the above problems. Could you help me review it when you are free, please?
> >> >
> >> > Thanks!
> >> > sage
> >> >
> >> >> Thanks very much:-)
> >> >>
> >> >> On 24 August 2017 at 00:07, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> > On Wed, 23 Aug 2017, Xuehan Xu wrote:
> >> >> >> Hi, everyone.
> >> >> >>
> >> >> >> In the ReplicatedPG::_write_copy_chunk method, I saw the following code:
> >> >> >>
> >> >> >> if (!cop->temp_cursor.attr_complete) {
> >> >> >>   t->touch(cop->results.temp_oid);
> >> >> >>   for (map<string, bufferlist>::iterator p = cop->attrs.begin();
> >> >> >>        p != cop->attrs.end(); ++p) {
> >> >> >>     cop->results.attrs[string("_") + p->first] = p->second;
> >> >> >>     t->setattr(cop->results.temp_oid, string("_") + p->first, p->second);
> >> >> >>   }
> >> >> >>   cop->attrs.clear();
> >> >> >> }
> >> >> >>
> >> >> >> It seems that user-specified attrs are prefixed with "_", but why add the "_" here, in ReplicatedPG::_write_copy_chunk? It seems that this method is used for copying objects within the RADOS cluster.
> >> >> >
> >> >> > That is by design. The non-_ attributes are managed locally.
> >> >> >
> >> >> > In this case, the clone_overlap is not preserved because there is no code to attempt to preserve it across the cache tier. (And blindly copying the _ attribute wouldn't work.)
> >> >> >
> >> >> > I'm not sure what the issues would be with passing that metadata across the promote and flush operations... I imagine it will be a bit fragile, though.
> >> >> >
> >> >> > Is this something that is really worth fixing?
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >> On 23 August 2017 at 15:40, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> >> >> >> > It seems that when calling ReplicatedPG::getattrs_maybe_cache in ReplicatedPG::fill_in_copy_get, "user_only" should be false. Is this right?
> >> >> >> >
> >> >> >> > On 23 August 2017 at 15:25, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> >> >> >> >> I submitted an issue for this: http://tracker.ceph.com/issues/21072?next_issue_id=21071
> >> >> >> >>
> >> >> >> >> On 23 August 2017 at 15:24, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> >> >> >> >>> Hi, everyone.
> >> >> >> >>>
> >> >> >> >>> Recently, we did a test as follows:
> >> >> >> >>>
> >> >> >> >>> We enabled cache tiering and added a cache pool "vms_back_cache" on top of the base pool "vms_back".
> >> >> >> >>> We first created an object, then created a snap in the base pool and wrote to that object again, which caused the object to be promoted into the cache pool. At this point, we used "ceph-objectstore-tool" to dump the object, and the result is as follows:
> >> >> >> >>>
> >> >> >> >>> {
> >> >> >> >>>     "id": {
> >> >> >> >>>         "oid": "test.obj.6",
> >> >> >> >>>         "key": "",
> >> >> >> >>>         "snapid": -2,
> >> >> >> >>>         "hash": 750422257,
> >> >> >> >>>         "max": 0,
> >> >> >> >>>         "pool": 11,
> >> >> >> >>>         "namespace": "",
> >> >> >> >>>         "max": 0
> >> >> >> >>>     },
> >> >> >> >>>     "info": {
> >> >> >> >>>         "oid": {
> >> >> >> >>>             "oid": "test.obj.6",
> >> >> >> >>>             "key": "",
> >> >> >> >>>             "snapid": -2,
> >> >> >> >>>             "hash": 750422257,
> >> >> >> >>>             "max": 0,
> >> >> >> >>>             "pool": 11,
> >> >> >> >>>             "namespace": ""
> >> >> >> >>>         },
> >> >> >> >>>         "version": "5010'5",
> >> >> >> >>>         "prior_version": "4991'3",
> >> >> >> >>>         "last_reqid": "client.175338.0:1",
> >> >> >> >>>         "user_version": 5,
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >> >> >>>         "local_mtime": "2017-08-23 15:09:03.461111",
> >> >> >> >>>         "lost": 0,
> >> >> >> >>>         "flags": 4,
> >> >> >> >>>         "snaps": [],
> >> >> >> >>>         "truncate_seq": 0,
> >> >> >> >>>         "truncate_size": 0,
> >> >> >> >>>         "data_digest": 4294967295,
> >> >> >> >>>         "omap_digest": 4294967295,
> >> >> >> >>>         "watchers": {}
> >> >> >> >>>     },
> >> >> >> >>>     "stat": {
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "blksize": 4096,
> >> >> >> >>>         "blocks": 8200,
> >> >> >> >>>         "nlink": 1
> >> >> >> >>>     },
> >> >> >> >>>     "SnapSet": {
> >> >> >> >>>         "snap_context": {
> >> >> >> >>>             "seq": 13,
> >> >> >> >>>             "snaps": [
> >> >> >> >>>                 13
> >> >> >> >>>             ]
> >> >> >> >>>         },
> >> >> >> >>>         "head_exists": 1,
> >> >> >> >>>         "clones": [
> >> >> >> >>>             {
> >> >> >> >>>                 "snap": 13,
> >> >> >> >>>                 "size": 4194303,
> >> >> >> >>>                 "overlap": "[0~100,115~4194188]"
> >> >> >> >>>             }
> >> >> >> >>>         ]
> >> >> >> >>>     }
> >> >> >> >>> }
> >> >> >> >>>
> >> >> >> >>> Then we did cache-flush and cache-evict to flush that object down to the base pool, and again used "ceph-objectstore-tool" to dump the object in the base pool:
> >> >> >> >>>
> >> >> >> >>> {
> >> >> >> >>>     "id": {
> >> >> >> >>>         "oid": "test.obj.6",
> >> >> >> >>>         "key": "",
> >> >> >> >>>         "snapid": -2,
> >> >> >> >>>         "hash": 750422257,
> >> >> >> >>>         "max": 0,
> >> >> >> >>>         "pool": 10,
> >> >> >> >>>         "namespace": "",
> >> >> >> >>>         "max": 0
> >> >> >> >>>     },
> >> >> >> >>>     "info": {
> >> >> >> >>>         "oid": {
> >> >> >> >>>             "oid": "test.obj.6",
> >> >> >> >>>             "key": "",
> >> >> >> >>>             "snapid": -2,
> >> >> >> >>>             "hash": 750422257,
> >> >> >> >>>             "max": 0,
> >> >> >> >>>             "pool": 10,
> >> >> >> >>>             "namespace": ""
> >> >> >> >>>         },
> >> >> >> >>>         "version": "5015'4",
> >> >> >> >>>         "prior_version": "4991'2",
> >> >> >> >>>         "last_reqid": "osd.34.5013:1",
> >> >> >> >>>         "user_version": 5,
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "mtime": "2017-08-23 15:09:03.459892",
> >> >> >> >>>         "local_mtime": "2017-08-23 15:10:48.122138",
> >> >> >> >>>         "lost": 0,
> >> >> >> >>>         "flags": 52,
> >> >> >> >>>         "snaps": [],
> >> >> >> >>>         "truncate_seq": 0,
> >> >> >> >>>         "truncate_size": 0,
> >> >> >> >>>         "data_digest": 163942140,
> >> >> >> >>>         "omap_digest": 4294967295,
> >> >> >> >>>         "watchers": {}
> >> >> >> >>>     },
> >> >> >> >>>     "stat": {
> >> >> >> >>>         "size": 4194303,
> >> >> >> >>>         "blksize": 4096,
>> >> >> >>> "blocks": 8200, > >> >> >> >>> "nlink": 1 > >> >> >> >>> }, > >> >> >> >>> "SnapSet": { > >> >> >> >>> "snap_context": { > >> >> >> >>> "seq": 13, > >> >> >> >>> "snaps": [ > >> >> >> >>> 13 > >> >> >> >>> ] > >> >> >> >>> }, > >> >> >> >>> "head_exists": 1, > >> >> >> >>> "clones": [ > >> >> >> >>> { > >> >> >> >>> "snap": 13, > >> >> >> >>> "size": 4194303, > >> >> >> >>> "overlap": "[]" > >> >> >> >>> } > >> >> >> >>> ] > >> >> >> >>> } > >> >> >> >>> } > >> >> >> >>> > >> >> >> >>> As is shown, the "overlap" field is empty. > >> >> >> >>> In the osd log, we found the following records: > >> >> >> >>> > >> >> >> >>> 2017-08-23 12:46:36.083014 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3( > >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14) > >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean] got > >> >> >> >>> attrs > >> >> >> >>> 2017-08-23 12:46:36.083021 7f675c704700 15 > >> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) read > >> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head# > >> >> >> >>> 0~8 > >> >> >> >>> 2017-08-23 12:46:36.083398 7f675c704700 10 > >> >> >> >>> filestore(/home/xuxuehan/github-xxh-fork/ceph/src/dev/osd0) > >> >> >> >>> FileStore::read > >> >> >> >>> 3.3_head/#3:dd4db749:test-rados-api-xxh02v.ops.corp.qihoo.net-10886-3::foo:head# > >> >> >> >>> 0~8/8 > >> >> >> >>> 2017-08-23 12:46:36.083414 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3( > >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14) > >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean] got > >> >> >> >>> data > >> >> >> >>> 2017-08-23 12:46:36.083444 7f675c704700 20 osd.0 pg_epoch: 19 pg[3.3( > >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14) > >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean] > >> >> >> >>> cursor.is_complete=0 0 attrs 8 bytes 0 omap header bytes 0 omap data > >> >> >> >>> bytes in 0 keys 0 reqids > >> >> >> >>> 2017-08-23 12:46:36.083457 7f675c704700 10 osd.0 pg_epoch: 19 pg[3.3( > >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14) > >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean] > >> >> >> >>> dropping ondisk_read_lock > >> >> >> >>> 2017-08-23 12:46:36.083467 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3( > >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14) > >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean] > >> >> >> >>> do_osd_op_effects osd.0 con 0x7f67874f0d00 > >> >> >> >>> 2017-08-23 12:46:36.083478 7f675c704700 15 osd.0 pg_epoch: 19 pg[3.3( > >> >> >> >>> v 15'2 (0'0,15'2] local-les=15 n=2 ec=14 les/c/f 15/15/0 14/14/14) > >> >> >> >>> [0,2,1] r=0 lpr=14 crt=0'0 lcod 15'1 mlcod 15'1 active+clean] > >> >> >> >>> log_op_stats osd_op(osd.0.6:2 3.92edb2bb > >> >> >> >>> test-rados-api-xxh02v.ops.corp > >> >> >> >>> > >> >> >> >>> It seems that, when doing "copy-get", no extensive attributes are > >> >> >> >>> copied. 
> >> >> >> >>> We believe that it's the following code that led to this result:
> >> >> >> >>>
> >> >> >> >>> int ReplicatedPG::getattrs_maybe_cache(ObjectContextRef obc,
> >> >> >> >>>                                        map<string, bufferlist> *out,
> >> >> >> >>>                                        bool user_only) {
> >> >> >> >>>   int r = 0;
> >> >> >> >>>   if (pool.info.require_rollback()) {
> >> >> >> >>>     if (out)
> >> >> >> >>>       *out = obc->attr_cache;
> >> >> >> >>>   } else {
> >> >> >> >>>     r = pgbackend->objects_get_attrs(obc->obs.oi.soid, out);
> >> >> >> >>>   }
> >> >> >> >>>   if (out && user_only) {
> >> >> >> >>>     map<string, bufferlist> tmp;
> >> >> >> >>>     for (map<string, bufferlist>::iterator i = out->begin();
> >> >> >> >>>          i != out->end(); ++i) {
> >> >> >> >>>       if (i->first.size() > 1 && i->first[0] == '_')
> >> >> >> >>>         tmp[i->first.substr(1, i->first.size())].claim(i->second);
> >> >> >> >>>     }
> >> >> >> >>>     tmp.swap(*out);
> >> >> >> >>>   }
> >> >> >> >>>   return r;
> >> >> >> >>> }
> >> >> >> >>>
> >> >> >> >>> It seems that when "user_only" is true, extended attributes whose names do not start with '_' are filtered out. Is it supposed to work this way?
> >> >> >> >>> We also found that there are only two places in the source code that invoke ReplicatedPG::getattrs_maybe_cache, and in both of them "user_only" is true. Why add this parameter?
> >> >> >> >>>
> >> >> >> >>> By the way, we also found that this code was added in commit 78d9c0072bfde30917aea4820a811d7fc9f10522, but we don't understand its purpose.
> >> >> >> >>>
> >> >> >> >>> Thank you:-)
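To make the "_" convention above concrete, here is a small self-contained sketch of the behavior described in this thread: user xattrs are persisted with a leading "_", and a user_only read strips that prefix and drops every unprefixed key, which is why a locally managed attr such as "snapset" (and with it the clone overlap) never shows up in a user_only result. This is an illustration only, not the actual Ceph implementation; the attr names and the user_visible helper are made up for the example.

  // Illustrative sketch, not Ceph code: mimics the user_only filtering in
  // getattrs_maybe_cache().  Keys stored with a leading '_' are user xattrs;
  // unprefixed keys (e.g. "snapset") are internal and get dropped.
  #include <iostream>
  #include <map>
  #include <string>

  using attr_map = std::map<std::string, std::string>;

  attr_map user_visible(const attr_map &stored)
  {
    attr_map out;
    for (const auto &kv : stored) {
      if (kv.first.size() > 1 && kv.first[0] == '_')
        out[kv.first.substr(1)] = kv.second;  // "_foo" -> "foo"
      // anything without the '_' prefix is filtered out
    }
    return out;
  }

  int main()
  {
    attr_map stored = {
      {"_myattr", "user data"},                              // hypothetical user xattr as stored
      {"snapset", "<encoded SnapSet incl. clone_overlap>"},  // internal, unprefixed
    };
    for (const auto &kv : user_visible(stored))
      std::cout << kv.first << " = " << kv.second << "\n";   // prints only: myattr = user data
    return 0;
  }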