Re: About the problem "export_diff relies on clone_overlap, which is lost when cache tier is enabled"

Hi, everyone.

I've found that the reason clone overlap modifications have to pass the
"is_present_clone" condition check is to keep the cache stats right, as
demonstrated by http://tracker.ceph.com/issues/7964.

So I think that if I add a similar clone overlap, say cache_clone_overlap,
that tracks the disk space usage while the object is in the cache, let the
original clone overlap track all the modifications so that export-diff
stays correct, and make the "copy_get" operation also copy the object's
extended attributes, namely "object_info" and "snapset", the problem
should be solved. Does this sound reasonable to you?
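
To make this concrete, below is a minimal, self-contained C++ sketch of the
bookkeeping I have in mind. It is not the real SnapSet/PrimaryLogPG code:
the SketchSnapSet type, the cache_clone_overlap field and the on_write()
helper are illustrative names only, and a std::map stands in for Ceph's
interval_set<uint64_t>. The idea is that the ordinary clone overlap is
always reduced by a write's modified range (so export-diff stays correct),
while the hypothetical cache_clone_overlap is only reduced when the clone
is actually present in the cache tier (so the cache stats stay correct).

  #include <cstdint>
  #include <iostream>
  #include <map>
  #include <utility>

  // Simplified stand-in for interval_set<uint64_t>: start offset -> length.
  using IntervalSet = std::map<uint64_t, uint64_t>;

  // Remove the byte range [off, off+len) from every interval in 'set'.
  static void subtract(IntervalSet &set, uint64_t off, uint64_t len) {
    IntervalSet out;
    const uint64_t end = off + len;
    for (const auto &iv : set) {
      const uint64_t start = iv.first, stop = iv.first + iv.second;
      if (stop <= off || start >= end) {          // untouched by the write
        out[start] = stop - start;
        continue;
      }
      if (start < off) out[start] = off - start;  // piece left of the write
      if (stop > end)  out[end] = stop - end;     // piece right of the write
    }
    set = std::move(out);
  }

  struct SketchSnapSet {
    IntervalSet clone_overlap;        // always tracks modifications (export-diff)
    IntervalSet cache_clone_overlap;  // hypothetical: cache-tier space stats only
  };

  // What a write to [off, off+len) on the HEAD object would do in this scheme.
  static void on_write(SketchSnapSet &ss, bool clone_present_in_cache,
                       uint64_t off, uint64_t len) {
    subtract(ss.clone_overlap, off, len);          // keep export-diff correct
    if (clone_present_in_cache)
      subtract(ss.cache_clone_overlap, off, len);  // keep cache stats correct
  }

  int main() {
    SketchSnapSet ss;
    ss.clone_overlap[0] = 4194304;                 // a fully-overlapping 4MB object
    ss.cache_clone_overlap[0] = 4194304;
    // 4KB write at offset 64KB while the clone is evicted from the cache tier.
    on_write(ss, /*clone_present_in_cache=*/false, 65536, 4096);
    for (const auto &iv : ss.clone_overlap)        // prints 0~65536 and 69632~4124672
      std::cout << iv.first << "~" << iv.second << "\n";
    // cache_clone_overlap is left untouched, matching today's stats behaviour.
    return 0;
  }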

Thank you:-)

On 10 August 2017 at 23:54, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> I cannot speak to the design principles behind this since I didn't
> write it. Whatever the solution is, the results should be consistent
> regardless of the current promotion/eviction/clean/dirty status of an
> object between the base and cache tier. For example, if the cache tier
> always performs a full object write when it writes back to the base
> tier, the base tier clone overlap would show zero overlap, so it would
> be inconsistent for the cache tier to show an accurate clone overlap
> up until the point the object is evicted.
>
> On Thu, Aug 10, 2017 at 11:11 AM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>> Yes, that's right. However, as you say, that doesn't seem to be very
>> efficient, and there are strict time window constraints on our backup
>> tasks.
>>
>> On the other hand, I still don't quite understand why we can't just
>> move the clone overlap subtraction out of the "is_present_clone"
>> condition check in the PrimaryLogPG::make_writeable method. I've
>> searched and read all the source code related to the clone overlap,
>> and it seems that the clone overlap is only used for calculating the
>> difference between clones, or the object subsets that need to be
>> recovered or backfilled. If this is true, wouldn't it be reasonable
>> to keep the overlap accurate no matter whether the clone objects are
>> in the cache tier or the base tier? Why leave it incomplete when the
>> clone objects are not in the cache tier?
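>>
>> To illustrate what I mean, here is a simplified, self-contained sketch
>> (not the actual calc_snap_set_diff or export-diff code; a std::map
>> stands in for interval_set<uint64_t>): the extents that differ between
>> the newest clone and HEAD are essentially the complement of the clone
>> overlap within the object, so a stale overlap translates directly into
>> a wrong diff.
>>
>>   #include <cstdint>
>>   #include <iostream>
>>   #include <map>
>>
>>   using IntervalSet = std::map<uint64_t, uint64_t>;  // start offset -> length
>>
>>   // Extents of [0, size) NOT covered by 'overlap', i.e. roughly what a
>>   // diff between the newest clone and HEAD would have to export.
>>   static IntervalSet diff_from_overlap(const IntervalSet &overlap, uint64_t size) {
>>     IntervalSet diff;
>>     uint64_t pos = 0;
>>     for (const auto &iv : overlap) {
>>       if (iv.first > pos)
>>         diff[pos] = iv.first - pos;  // gap before this overlap extent changed
>>       pos = iv.first + iv.second;
>>     }
>>     if (pos < size)
>>       diff[pos] = size - pos;        // tail beyond the last overlap extent
>>     return diff;
>>   }
>>
>>   static void dump(const IntervalSet &s) {
>>     for (const auto &iv : s) std::cout << iv.first << "~" << iv.second << " ";
>>     std::cout << "\n";
>>   }
>>
>>   int main() {
>>     const uint64_t size = 68;                // object size from the test below
>>     IntervalSet stale{{0, 64}};              // overlap never updated
>>     IntervalSet correct{{0, 32}, {36, 28}};  // overlap with the write at [32~4] removed
>>     dump(diff_from_overlap(stale, size));    // 64~4      -> the 4-byte write is lost
>>     dump(diff_from_overlap(correct, size));  // 32~4 64~4 -> the write is exported
>>     return 0;
>>   }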
>>
>> Could you tell us a bit more about the design principles behind this,
>> please? Maybe we can figure out a better solution if we fully understand
>> the design intentions:-)
>>
>> Thanks for helping us:-)
>>
>> On 10 August 2017 at 21:11, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>> Yes, it's quite possible that, for simplicity's sake, the cache tier OSD
>>> writes back the full object to the base tier even if only a single
>>> byte has changed. In that case, while not optimal behavior, at least
>>> if you combine the base + cache tier snapsets together, RBD should be
>>> able to export enough data to reconstruct the image.
>>>
>>> On Thu, Aug 10, 2017 at 3:06 AM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>> Hi, Jason.
>>>>
>>>> I did a test, and it turned out that, after flushing the object out of
>>>> the cache tier, the clone overlap in the base tier changed to empty, as
>>>> is shown below. I think this may be because the flush operation just
>>>> marks the whole range of the flushed object as modified, so if the
>>>> object's size hasn't changed, the overlap becomes empty. Is this
>>>> right?
>>>>
>>>> Thank you:-)
>>>>
>>>> {
>>>>     "id": {
>>>>         "oid": "test.obj",
>>>>         "key": "",
>>>>         "snapid": -2,
>>>>         "hash": 3575411564,
>>>>         "max": 0,
>>>>         "pool": 10,
>>>>         "namespace": "",
>>>>         "max": 0
>>>>     },
>>>>     "info": {
>>>>         "oid": {
>>>>             "oid": "test.obj",
>>>>             "key": "",
>>>>             "snapid": -2,
>>>>             "hash": 3575411564,
>>>>             "max": 0,
>>>>             "pool": 10,
>>>>             "namespace": ""
>>>>         },
>>>>         "version": "4876'9",
>>>>         "prior_version": "4854'8",
>>>>         "last_reqid": "osd.35.4869:1",
>>>>         "user_version": 16,
>>>>         "size": 4194303,
>>>>         "mtime": "2017-08-10 14:54:56.087387",
>>>>         "local_mtime": "2017-08-10 14:59:15.252755",
>>>>         "lost": 0,
>>>>         "flags": 52,
>>>>         "snaps": [],
>>>>         "truncate_seq": 0,
>>>>         "truncate_size": 0,
>>>>         "data_digest": 2827420887,
>>>>         "omap_digest": 4294967295,
>>>>         "watchers": {}
>>>>     },
>>>>     "stat": {
>>>>         "size": 4194303,
>>>>         "blksize": 4096,
>>>>         "blocks": 8200,
>>>>         "nlink": 1
>>>>     },
>>>>     "SnapSet": {
>>>>         "snap_context": {
>>>>             "seq": 3,
>>>>             "snaps": [
>>>>                 3
>>>>             ]
>>>>         },
>>>>         "head_exists": 1,
>>>>         "clones": [
>>>>             {
>>>>                 "snap": 3,
>>>>                 "size": 4194303,
>>>>                 "overlap": "[]"
>>>>             }
>>>>         ]
>>>>     }
>>>> }
>>>>
>>>> On 9 August 2017 at 23:26, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>>>> If you flush the object out of the cache tier so that its changes are
>>>>> recorded in the base tier, is the overlap correctly recorded in the
>>>>> base tier?
>>>>>
>>>>> On Wed, Aug 9, 2017 at 12:24 AM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>>>> By the way, according to our test, since the modified range is not
>>>>>> recorded either in the cache tier or in the base tier, I think
>>>>>> proxying the "list-snaps" op down to the base tier might not work
>>>>>> either. Is that right?
>>>>>>
>>>>>> On 9 August 2017 at 12:20, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>>>>> Sorry, I didn't "reply all"....:-)
>>>>>>>
>>>>>>>
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Xuehan Xu <xxhdx1985126@xxxxxxxxx>
>>>>>>> Date: 9 August 2017 at 12:14
>>>>>>> Subject: Re: About the problem "export_diff relies on clone_overlap,
>>>>>>> which is lost when cache tier is enabled"
>>>>>>> To: dillaman@xxxxxxxxxx
>>>>>>>
>>>>>>>
>>>>>>> Um, no, I don't think they are related.
>>>>>>>
>>>>>>> My problem is this:
>>>>>>>
>>>>>>> At first, we tried to use "rbd export-diff" to do incremental backups
>>>>>>> on a Jewel-version ceph cluster that has a cache tier enabled. However,
>>>>>>> when we compared the original rbd image and the backup rbd image, we
>>>>>>> found that they were different. After a series of debugging, we found
>>>>>>> that this is because a WRITE op's "modified_range" is not subtracted
>>>>>>> from the clone overlap of its target object's HEAD version when that
>>>>>>> object's HEAD version is in the cache tier and its most recent clone
>>>>>>> object is not, which leads to a miscalculation in the
>>>>>>> "calc_snap_set_diff" function.
>>>>>>>
>>>>>>> For example, we did such a test: we first created a snap of an rbd
>>>>>>> image "test.2.img" whose size is only 4MB, which means it only
>>>>>>> contains one object; then we sent a series of AioWrites to
>>>>>>> "test.2.img" to promote its HEAD object into the cache tier, while
>>>>>>> leaving its clone object in the base tier only; at that point, we used
>>>>>>> "ceph-objectstore-tool" to dump the object we wrote to, and the result
>>>>>>> was as follows:
>>>>>>>
>>>>>>> {
>>>>>>>     "id": {
>>>>>>>         "oid": "rbd_data.2aae62ae8944a.0000000000000000",
>>>>>>>         "key": "",
>>>>>>>         "snapid": -2,
>>>>>>>         "hash": 2375431681,
>>>>>>>         "max": 0,
>>>>>>>         "pool": 4,
>>>>>>>         "namespace": "",
>>>>>>>         "max": 0
>>>>>>>     },
>>>>>>>     "info": {
>>>>>>>         "oid": {
>>>>>>>             "oid": "rbd_data.2aae62ae8944a.0000000000000000",
>>>>>>>             "key": "",
>>>>>>>             "snapid": -2,
>>>>>>>             "hash": 2375431681,
>>>>>>>             "max": 0,
>>>>>>>             "pool": 4,
>>>>>>>             "namespace": ""
>>>>>>>         },
>>>>>>>         "version": "4536'2728",
>>>>>>>         "prior_version": "4536'2727",
>>>>>>>         "last_reqid": "client.174858.0:10",
>>>>>>>         "user_version": 14706,
>>>>>>>         "size": 68,
>>>>>>>         "mtime": "2017-08-09 11:30:18.263983",
>>>>>>>         "local_mtime": "2017-08-09 11:30:18.264310",
>>>>>>>         "lost": 0,
>>>>>>>         "flags": 4,
>>>>>>>         "snaps": [],
>>>>>>>         "truncate_seq": 0,
>>>>>>>         "truncate_size": 0,
>>>>>>>         "data_digest": 4294967295,
>>>>>>>         "omap_digest": 4294967295,
>>>>>>>         "watchers": {}
>>>>>>>     },
>>>>>>>     "stat": {
>>>>>>>         "size": 68,
>>>>>>>         "blksize": 4096,
>>>>>>>         "blocks": 16,
>>>>>>>         "nlink": 1
>>>>>>>     },
>>>>>>>     "SnapSet": {
>>>>>>>         "snap_context": {
>>>>>>>             "seq": 28,
>>>>>>>             "snaps": [
>>>>>>>                 28
>>>>>>>             ]
>>>>>>>         },
>>>>>>>         "head_exists": 1,
>>>>>>>         "clones": [
>>>>>>>             {
>>>>>>>                 "snap": 28,
>>>>>>>                 "size": 68,
>>>>>>>                 "overlap": "[0~64]"
>>>>>>>             }
>>>>>>>         ]
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> In this result, we found that the overlap for clone "28" is [0~64]. So
>>>>>>> we specifically sent an AioWrite req to this object, writing 4 bytes of
>>>>>>> random data at offset 32, after which we expected the written range
>>>>>>> [32~4] to be cut out of the overlap, leaving [0~32,36~28]. However, the
>>>>>>> result was as follows:
>>>>>>>
>>>>>>> {
>>>>>>>     "id": {
>>>>>>>         "oid": "rbd_data.2aae62ae8944a.0000000000000000",
>>>>>>>         "key": "",
>>>>>>>         "snapid": -2,
>>>>>>>         "hash": 2375431681,
>>>>>>>         "max": 0,
>>>>>>>         "pool": 4,
>>>>>>>         "namespace": "",
>>>>>>>         "max": 0
>>>>>>>     },
>>>>>>>     "info": {
>>>>>>>         "oid": {
>>>>>>>             "oid": "rbd_data.2aae62ae8944a.0000000000000000",
>>>>>>>             "key": "",
>>>>>>>             "snapid": -2,
>>>>>>>             "hash": 2375431681,
>>>>>>>             "max": 0,
>>>>>>>             "pool": 4,
>>>>>>>             "namespace": ""
>>>>>>>         },
>>>>>>>         "version": "4546'2730",
>>>>>>>         "prior_version": "4538'2729",
>>>>>>>         "last_reqid": "client.155555.0:10",
>>>>>>>         "user_version": 14708,
>>>>>>>         "size": 4096,
>>>>>>>         "mtime": "2017-08-09 11:36:20.717910",
>>>>>>>         "local_mtime": "2017-08-09 11:36:20.719039",
>>>>>>>         "lost": 0,
>>>>>>>         "flags": 4,
>>>>>>>         "snaps": [],
>>>>>>>         "truncate_seq": 0,
>>>>>>>         "truncate_size": 0,
>>>>>>>         "data_digest": 4294967295,
>>>>>>>         "omap_digest": 4294967295,
>>>>>>>         "watchers": {}
>>>>>>>     },
>>>>>>>     "stat": {
>>>>>>>         "size": 4096,
>>>>>>>         "blksize": 4096,
>>>>>>>         "blocks": 16,
>>>>>>>         "nlink": 1
>>>>>>>     },
>>>>>>>     "SnapSet": {
>>>>>>>         "snap_context": {
>>>>>>>             "seq": 28,
>>>>>>>             "snaps": [
>>>>>>>                 28
>>>>>>>             ]
>>>>>>>         },
>>>>>>>         "head_exists": 1,
>>>>>>>         "clones": [
>>>>>>>             {
>>>>>>>                 "snap": 28,
>>>>>>>                 "size": 68,
>>>>>>>                 "overlap": "[0~64]"
>>>>>>>             }
>>>>>>>         ]
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> It's obvious that it didn't change at all. If we do export-diff under
>>>>>>> this circumstance, the result will be wrong. Meanwhile, the
>>>>>>> "ceph-objectstore-tool" dump in the base tier also showed that the
>>>>>>> overlap recorded there didn't change either:
>>>>>>> {
>>>>>>>     "id": {
>>>>>>>         "oid": "rbd_data.2aae62ae8944a.0000000000000000",
>>>>>>>         "key": "",
>>>>>>>         "snapid": -2,
>>>>>>>         "hash": 2375431681,
>>>>>>>         "max": 0,
>>>>>>>         "pool": 3,
>>>>>>>         "namespace": "",
>>>>>>>         "max": 0
>>>>>>>     },
>>>>>>>     "info": {
>>>>>>>         "oid": {
>>>>>>>             "oid": "rbd_data.2aae62ae8944a.0000000000000000",
>>>>>>>             "key": "",
>>>>>>>             "snapid": -2,
>>>>>>>             "hash": 2375431681,
>>>>>>>             "max": 0,
>>>>>>>             "pool": 3,
>>>>>>>             "namespace": ""
>>>>>>>         },
>>>>>>>         "version": "4536'14459",
>>>>>>>         "prior_version": "4536'14458",
>>>>>>>         "last_reqid": "client.174834.0:10",
>>>>>>>         "user_version": 14648,
>>>>>>>         "size": 68,
>>>>>>>         "mtime": "2017-08-09 11:30:01.412634",
>>>>>>>         "local_mtime": "2017-08-09 11:30:01.413614",
>>>>>>>         "lost": 0,
>>>>>>>         "flags": 36,
>>>>>>>         "snaps": [],
>>>>>>>         "truncate_seq": 0,
>>>>>>>         "truncate_size": 0,
>>>>>>>         "data_digest": 4294967295,
>>>>>>>         "omap_digest": 4294967295,
>>>>>>>         "watchers": {}
>>>>>>>     },
>>>>>>>     "stat": {
>>>>>>>         "size": 68,
>>>>>>>         "blksize": 4096,
>>>>>>>         "blocks": 16,
>>>>>>>         "nlink": 1
>>>>>>>     },
>>>>>>>     "SnapSet": {
>>>>>>>         "snap_context": {
>>>>>>>             "seq": 28,
>>>>>>>             "snaps": [
>>>>>>>                 28
>>>>>>>             ]
>>>>>>>         },
>>>>>>>         "head_exists": 1,
>>>>>>>         "clones": [
>>>>>>>             {
>>>>>>>                 "snap": 28,
>>>>>>>                 "size": 68,
>>>>>>>                 "overlap": "[0~64]"
>>>>>>>             }
>>>>>>>         ]
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> Then we turned to the source code to find the reason for this. We
>>>>>>> found that the reason should be that, in the
>>>>>>> ReplicatedPG::make_writeable method, when determining whether the
>>>>>>> write op's modified range should be subtracted from the clone overlap,
>>>>>>> it has to pass two condition checks: "ctx->new_snapset.clones.size() > 0"
>>>>>>> and "is_present_clone(last_clone_oid)", as the code below shows.
>>>>>>>
>>>>>>>   // update most recent clone_overlap and usage stats
>>>>>>>   if (ctx->new_snapset.clones.size() > 0) {
>>>>>>>     /* we need to check whether the most recent clone exists, if it's
>>>>>>>      * been evicted, it's not included in the stats */
>>>>>>>     hobject_t last_clone_oid = soid;
>>>>>>>     last_clone_oid.snap = ctx->new_snapset.clone_overlap.rbegin()->first;
>>>>>>>     if (is_present_clone(last_clone_oid)) {
>>>>>>>       interval_set<uint64_t> &newest_overlap =
>>>>>>>         ctx->new_snapset.clone_overlap.rbegin()->second;
>>>>>>>       ctx->modified_ranges.intersection_of(newest_overlap);
>>>>>>>       // modified_ranges is still in use by the clone
>>>>>>>       add_interval_usage(ctx->modified_ranges, ctx->delta_stats);
>>>>>>>       newest_overlap.subtract(ctx->modified_ranges);
>>>>>>>     }
>>>>>>>   }
>>>>>>>
>>>>>>> We thought that the latter condition check,
>>>>>>> "is_present_clone(last_clone_oid)", might not be a reasonable basis
>>>>>>> for deciding whether to subtract the write op's modified range from
>>>>>>> the clone overlap, so we changed the code above to the following,
>>>>>>> which moves the subtraction out of the latter condition check, and
>>>>>>> submitted a PR for this:
>>>>>>> https://github.com/ceph/ceph/pull/16790:
>>>>>>>
>>>>>>>   // update most recent clone_overlap and usage stats
>>>>>>>   if (ctx->new_snapset.clones.size() > 0) {
>>>>>>>     /* we need to check whether the most recent clone exists, if it's
>>>>>>>      * been evicted, it's not included in the stats */
>>>>>>>     hobject_t last_clone_oid = soid;
>>>>>>>     last_clone_oid.snap = ctx->new_snapset.clone_overlap.rbegin()->first;
>>>>>>>     interval_set<uint64_t> &newest_overlap =
>>>>>>>       ctx->new_snapset.clone_overlap.rbegin()->second;
>>>>>>>     ctx->modified_ranges.intersection_of(newest_overlap);
>>>>>>>     newest_overlap.subtract(ctx->modified_ranges);
>>>>>>>
>>>>>>>     if (is_present_clone(last_clone_oid)) {
>>>>>>>       // modified_ranges is still in use by the clone
>>>>>>>       add_interval_usage(ctx->modified_ranges, ctx->delta_stats);
>>>>>>>     }
>>>>>>>   }
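>>>>>>>
>>>>>>> To double-check what this change does for the test above, here is a
>>>>>>> small self-contained sketch (a std::map stands in for
>>>>>>> interval_set<uint64_t>, and is_present_clone() is assumed to return
>>>>>>> false because the clone has been evicted); only the patched path
>>>>>>> updates the overlap:
>>>>>>>
>>>>>>>   #include <cstdint>
>>>>>>>   #include <iostream>
>>>>>>>   #include <map>
>>>>>>>   #include <utility>
>>>>>>>
>>>>>>>   using IntervalSet = std::map<uint64_t, uint64_t>;  // start offset -> length
>>>>>>>
>>>>>>>   // Remove [off, off+len) from 'set' (simplified interval_set::subtract).
>>>>>>>   static void subtract(IntervalSet &set, uint64_t off, uint64_t len) {
>>>>>>>     IntervalSet out;
>>>>>>>     const uint64_t end = off + len;
>>>>>>>     for (const auto &iv : set) {
>>>>>>>       const uint64_t start = iv.first, stop = iv.first + iv.second;
>>>>>>>       if (stop <= off || start >= end) { out[start] = stop - start; continue; }
>>>>>>>       if (start < off) out[start] = off - start;
>>>>>>>       if (stop > end)  out[end] = stop - end;
>>>>>>>     }
>>>>>>>     set = std::move(out);
>>>>>>>   }
>>>>>>>
>>>>>>>   static void dump(const char *tag, const IntervalSet &s) {
>>>>>>>     std::cout << tag;
>>>>>>>     for (const auto &iv : s) std::cout << " " << iv.first << "~" << iv.second;
>>>>>>>     std::cout << "\n";
>>>>>>>   }
>>>>>>>
>>>>>>>   int main() {
>>>>>>>     const bool is_present_clone = false;  // clone evicted, only HEAD in cache
>>>>>>>
>>>>>>>     // Original code path: the subtraction is skipped, the overlap stays stale.
>>>>>>>     IntervalSet original{{0, 64}};
>>>>>>>     if (is_present_clone) subtract(original, 32, 4);
>>>>>>>     dump("original:", original);  // 0~64 -> export-diff misses the write
>>>>>>>
>>>>>>>     // Patched code path: the subtraction always happens.
>>>>>>>     IntervalSet patched{{0, 64}};
>>>>>>>     subtract(patched, 32, 4);     // modified range [32~4] lies inside the overlap
>>>>>>>     dump("patched: ", patched);   // 0~32 36~28
>>>>>>>     return 0;
>>>>>>>   }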
>>>>>>>
>>>>>>> In our test, this change solved the problem successfully; however, we
>>>>>>> can't confirm that it won't cause any new problems. So here we are,
>>>>>>> discussing how to solve this problem:-)
>>>>>>>
>>>>>>> On 9 August 2017 at 09:26, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>>>>>>> Is this issue related to https://github.com/ceph/ceph/pull/10626?
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2017 at 8:07 PM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>>>>>>> OK, I'll try that. Thank you:-)
>>>>>>>>>
>>>>>>>>> On 8 August 2017 at 04:48, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>>>>>>>>> Could you just proxy the "list snaps" op from the cache tier down to
>>>>>>>>>> the base tier and combine the cache tier + base tier results? Reading
>>>>>>>>>> the associated ticket, it seems kludgy to me to attempt to work around
>>>>>>>>>> this within librbd by just refusing to provide intra-object diffs if
>>>>>>>>>> cache tiering is in use.
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 5, 2017 at 11:56 AM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>>>>>>>>> Hi, everyone.
>>>>>>>>>>>
>>>>>>>>>>> Trying to solve the issue http://tracker.ceph.com/issues/20896, I
>>>>>>>>>>> just did another test: I did some writes to an object
>>>>>>>>>>> "rbd_data.1ebc6238e1f29.0000000000000000" to promote its "HEAD"
>>>>>>>>>>> object to the cache tier, after which I specifically wrote 4 bytes
>>>>>>>>>>> of random data at its offset 0x40. Then I used
>>>>>>>>>>> "ceph-objectstore-tool" to dump its "HEAD" version in the base tier;
>>>>>>>>>>> the result is as follows (before I promoted it to the cache tier,
>>>>>>>>>>> there were three snaps, the latest of which is 26):
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>>     "id": {
>>>>>>>>>>>         "oid": "rbd_data.1ebc6238e1f29.0000000000000000",
>>>>>>>>>>>         "key": "",
>>>>>>>>>>>         "snapid": -2,
>>>>>>>>>>>         "hash": 1655893237,
>>>>>>>>>>>         "max": 0,
>>>>>>>>>>>         "pool": 3,
>>>>>>>>>>>         "namespace": "",
>>>>>>>>>>>         "max": 0
>>>>>>>>>>>     },
>>>>>>>>>>>     "info": {
>>>>>>>>>>>         "oid": {
>>>>>>>>>>>             "oid": "rbd_data.1ebc6238e1f29.0000000000000000",
>>>>>>>>>>>             "key": "",
>>>>>>>>>>>             "snapid": -2,
>>>>>>>>>>>             "hash": 1655893237,
>>>>>>>>>>>             "max": 0,
>>>>>>>>>>>             "pool": 3,
>>>>>>>>>>>             "namespace": ""
>>>>>>>>>>>         },
>>>>>>>>>>>         "version": "4219'16423",
>>>>>>>>>>>         "prior_version": "3978'16310",
>>>>>>>>>>>         "last_reqid": "osd.70.4213:2359",
>>>>>>>>>>>         "user_version": 17205,
>>>>>>>>>>>         "size": 4194304,
>>>>>>>>>>>         "mtime": "2017-08-03 22:07:34.656122",
>>>>>>>>>>>         "local_mtime": "2017-08-05 23:02:33.628734",
>>>>>>>>>>>         "lost": 0,
>>>>>>>>>>>         "flags": 52,
>>>>>>>>>>>         "snaps": [],
>>>>>>>>>>>         "truncate_seq": 0,
>>>>>>>>>>>         "truncate_size": 0,
>>>>>>>>>>>         "data_digest": 2822203961,
>>>>>>>>>>>         "omap_digest": 4294967295,
>>>>>>>>>>>         "watchers": {}
>>>>>>>>>>>     },
>>>>>>>>>>>     "stat": {
>>>>>>>>>>>         "size": 4194304,
>>>>>>>>>>>         "blksize": 4096,
>>>>>>>>>>>         "blocks": 8200,
>>>>>>>>>>>         "nlink": 1
>>>>>>>>>>>     },
>>>>>>>>>>>     "SnapSet": {
>>>>>>>>>>>         "snap_context": {
>>>>>>>>>>>             "seq": 26,
>>>>>>>>>>>             "snaps": [
>>>>>>>>>>>                 26,
>>>>>>>>>>>                 25,
>>>>>>>>>>>                 16
>>>>>>>>>>>             ]
>>>>>>>>>>>         },
>>>>>>>>>>>         "head_exists": 1,
>>>>>>>>>>>         "clones": [
>>>>>>>>>>>             {
>>>>>>>>>>>                 "snap": 16,
>>>>>>>>>>>                 "size": 4194304,
>>>>>>>>>>>                 "overlap": "[4~4194300]"
>>>>>>>>>>>             },
>>>>>>>>>>>             {
>>>>>>>>>>>                 "snap": 25,
>>>>>>>>>>>                 "size": 4194304,
>>>>>>>>>>>                 "overlap": "[]"
>>>>>>>>>>>             },
>>>>>>>>>>>             {
>>>>>>>>>>>                 "snap": 26,
>>>>>>>>>>>                 "size": 4194304,
>>>>>>>>>>>                 "overlap": "[]"
>>>>>>>>>>>             }
>>>>>>>>>>>         ]
>>>>>>>>>>>     }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> As we can see, its clone_overlap for snap 26 is empty, which,
>>>>>>>>>>> combined with the previous test described in
>>>>>>>>>>> http://tracker.ceph.com/issues/20896, means that the writes'
>>>>>>>>>>> "modified range" is recorded neither in the cache tier nor in the
>>>>>>>>>>> base tier.
>>>>>>>>>>>
>>>>>>>>>>> I think maybe we really should move the clone overlap modification
>>>>>>>>>>> out of the IF block guarded by the "is_present_clone" condition
>>>>>>>>>>> check. For now, I can't see any other way to fix this problem.
>>>>>>>>>>>
>>>>>>>>>>> Am I right about this?
>>>>>>>>>>>
>>>>>>>>>>> On 4 August 2017 at 03:14, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>>>>>>>>>> I mean, I think it's the condition check "is_present_clone" that
>>>>>>>>>>>> prevents the clone overlap from recording the client write
>>>>>>>>>>>> operations' modified range when the target "HEAD" object exists
>>>>>>>>>>>> without its most recent clone object, and, if I'm right, just
>>>>>>>>>>>> moving the clone overlap modification out of the
>>>>>>>>>>>> "is_present_clone" condition check block is enough to solve this
>>>>>>>>>>>> case, just like the PR https://github.com/ceph/ceph/pull/16790,
>>>>>>>>>>>> and this fix wouldn't cause other problems.
>>>>>>>>>>>>
>>>>>>>>>>>> In our test, this fix solved the problem successfully; however, we
>>>>>>>>>>>> can't confirm that it won't cause new problems yet.
>>>>>>>>>>>>
>>>>>>>>>>>> So if anyone sees this and knows the answer, please help us. Thank you:-)
>>>>>>>>>>>>
>>>>>>>>>>>> On 4 August 2017 at 11:41, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>>>>>>>>>>> Hi, Greg:-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I finally got what you mean in https://github.com/ceph/ceph/pull/16790.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with you that "clone overlap is supposed to be tracking
>>>>>>>>>>>>> which data is the same on disk".
>>>>>>>>>>>>>
>>>>>>>>>>>>> My thought is that "ObjectContext::new_snapset.clones" is already
>>>>>>>>>>>>> an indicator of whether there are clone objects on disk. So, in
>>>>>>>>>>>>> the cache-tier scenario, although a clone oid may not correspond
>>>>>>>>>>>>> to a "present clone" in the cache tier, as long as
>>>>>>>>>>>>> "ObjectContext::new_snapset.clones" is not empty, there must be
>>>>>>>>>>>>> such a clone object in the base tier. And, as long as
>>>>>>>>>>>>> "ObjectContext::new_snapset.clones" has a strict one-to-one
>>>>>>>>>>>>> correspondence to "ObjectContext::new_snapset.clone_overlap",
>>>>>>>>>>>>> passing the condition check "if (ctx->new_snapset.clones.size() > 0)"
>>>>>>>>>>>>> is enough to conclude that the clone object exists.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, if I'm right, passing the condition check "if
>>>>>>>>>>>>> (ctx->new_snapset.clones.size() > 0)" is already enough for us to
>>>>>>>>>>>>> do "newest_overlap.subtract(ctx->modified_ranges)"; it doesn't
>>>>>>>>>>>>> have to pass "is_present_clone".
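>>>>>>>>>>>>>
>>>>>>>>>>>>> To put the invariant I'm relying on into code (a tiny, simplified
>>>>>>>>>>>>> sketch; SketchSnapSet is not the real SnapSet): as long as clones
>>>>>>>>>>>>> and clone_overlap stay one-to-one, "clones.size() > 0" already
>>>>>>>>>>>>> guarantees there is an overlap entry to subtract from, whether or
>>>>>>>>>>>>> not the clone object is present in the cache tier.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   #include <cassert>
>>>>>>>>>>>>>   #include <cstdint>
>>>>>>>>>>>>>   #include <map>
>>>>>>>>>>>>>   #include <vector>
>>>>>>>>>>>>>
>>>>>>>>>>>>>   using snapid = uint64_t;
>>>>>>>>>>>>>   using IntervalSet = std::map<uint64_t, uint64_t>;  // start -> length
>>>>>>>>>>>>>
>>>>>>>>>>>>>   struct SketchSnapSet {
>>>>>>>>>>>>>     std::vector<snapid> clones;                 // newest clone is clones.back()
>>>>>>>>>>>>>     std::map<snapid, IntervalSet> clone_overlap;
>>>>>>>>>>>>>   };
>>>>>>>>>>>>>
>>>>>>>>>>>>>   // The assumed invariant: exactly one overlap entry per clone.
>>>>>>>>>>>>>   static void check_one_to_one(const SketchSnapSet &ss) {
>>>>>>>>>>>>>     assert(ss.clones.size() == ss.clone_overlap.size());
>>>>>>>>>>>>>     for (snapid c : ss.clones)
>>>>>>>>>>>>>       assert(ss.clone_overlap.count(c) == 1);
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>>   int main() {
>>>>>>>>>>>>>     SketchSnapSet ss;
>>>>>>>>>>>>>     ss.clones.push_back(28);
>>>>>>>>>>>>>     ss.clone_overlap[28] = {{0, 64}};
>>>>>>>>>>>>>     check_one_to_one(ss);
>>>>>>>>>>>>>     // If this holds, clones.size() > 0 alone tells us that
>>>>>>>>>>>>>     // clone_overlap.rbegin() refers to a real clone (living in the
>>>>>>>>>>>>>     // base tier if it has been evicted from the cache tier), so the
>>>>>>>>>>>>>     // subtract can safely happen regardless of is_present_clone().
>>>>>>>>>>>>>     return 0;
>>>>>>>>>>>>>   }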
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am I right about this? Or am I missing anything?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please help us, thank you:-)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jason
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jason
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jason
>>>
>>>
>>>
>>> --
>>> Jason
>
>
>
> --
> Jason


