Re: [PATCH v2] ceph: do not update snapshot context when there is no new snapshot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Xiubo Li <xiubli@xxxxxxxxxx> writes:

> On 2/18/22 10:47 AM, xiubli@xxxxxxxxxx wrote:
>> From: Xiubo Li <xiubli@xxxxxxxxxx>
>>
>> We will only track the uppest parent snapshot realm from which we
>> need to rebuild the snapshot contexts _downward_ in hierarchy. For
>> all the others having no new snapshot we will do nothing.
>>
>> This fix will avoid calling ceph_queue_cap_snap() on some inodes
>> inappropriately. For example, with the code in mainline, suppose there
>> are 2 directory hierarchies (with 6 directories total), like this:
>>
>> /dir_X1/dir_X2/dir_X3/
>> /dir_Y1/dir_Y2/dir_Y3/
>>
>> Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
>> root snapshot under /.snap/root_snap. Every time we make snapshots under
>> /dir_Y1/..., the kclient will always try to rebuild the snap context for
>> snap_X2 realm and finally will always try to queue cap snaps for dir_Y2
>> and dir_Y3, which makes no sense.
>>
>> That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when
>> creating a new snapshot under /dir_Y1/... the new seq will be 4, and
>> the mds will send the kclient a snapshot backtrace in _downward_
>> order: seqs 4, 3.
>>
>> When ceph_update_snap_trace() is called, it will always rebuild the from
>> the last realm, that's the root_snap. So later when rebuilding the snap
>> context, the current logic will always cause it to rebuild the snap_X2
>> realm and then try to queue cap snaps for all the inodes related in that
>> realm, even though it's not necessary.
>>
>> This is accompanied by a lot of these sorts of dout messages:
>>
>>      "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"
>>
>> Fix the logic to avoid this situation.
>>
>> The 'invalidate' word is not precise here, acutally it will rebuild
>> the snapshot existing contexts or just build none-existing ones,
>> rename it to 'rebuild_snapcs'.
>>
>> URL: https://tracker.ceph.com/issues/44100
>> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
>> ---
>>
>> Changed in V2:
>> - Thanks Zheng's feedback and switched to Zheng's patch.
>> - Rename invalidate to rebuild_snapcs.
>>
>>
>>
>>   fs/ceph/snap.c | 28 +++++++++++++++++++---------
>>   1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
>> index dbf34f212596..6d55b8ba79d8 100644
>> --- a/fs/ceph/snap.c
>> +++ b/fs/ceph/snap.c
>> @@ -735,7 +735,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>   	__le64 *prior_parent_snaps;        /* encoded */
>>   	struct ceph_snap_realm *realm = NULL;
>>   	struct ceph_snap_realm *first_realm = NULL;
>> -	int invalidate = 0;
>> +	struct ceph_snap_realm *realm_to_rebuild = NULL;
>> +	int rebuild_snapcs;
>>   	int err = -ENOMEM;
>>   	LIST_HEAD(dirty_realms);
>>   @@ -743,6 +744,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>     	dout("update_snap_trace deletion=%d\n", deletion);
>>   more:
>> +	rebuild_snapcs = 0;
>>   	ceph_decode_need(&p, e, sizeof(*ri), bad);
>>   	ri = p;
>>   	p += sizeof(*ri);
>> @@ -766,7 +768,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>   	err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent));
>>   	if (err < 0)
>>   		goto fail;
>> -	invalidate += err;
>> +	rebuild_snapcs += err;
>>     	if (le64_to_cpu(ri->seq) > realm->seq) {
>>   		dout("update_snap_trace updating %llx %p %lld -> %lld\n",
>> @@ -791,22 +793,30 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>   		if (realm->seq > mdsc->last_snap_seq)
>>   			mdsc->last_snap_seq = realm->seq;
>>   -		invalidate = 1;
>> +		rebuild_snapcs = 1;
>>   	} else if (!realm->cached_context) {
>>   		dout("update_snap_trace %llx %p seq %lld new\n",
>>   		     realm->ino, realm, realm->seq);
>> -		invalidate = 1;
>> +		rebuild_snapcs = 1;
>>   	} else {
>>   		dout("update_snap_trace %llx %p seq %lld unchanged\n",
>>   		     realm->ino, realm, realm->seq);
>>   	}
>>   -	dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino,
>> -	     realm, invalidate, p, e);
>> +	dout("done with %llx %p, rebuild_snapcs=%d, %p %p\n", realm->ino,
>> +	     realm, rebuild_snapcs, p, e);
>>   -	/* invalidate when we reach the _end_ (root) of the trace */
>> -	if (invalidate && p >= e)
>> -		rebuild_snap_realms(realm, &dirty_realms);
>> +	/*
>> +	 * this will always track the uppest parent realm from which
>> +	 * we need to rebuild the snapshot contexts _downward_ in
>> +	 * hierarchy.
>> +	 */
>> +	if (rebuild_snapcs)
>> +		realm_to_rebuild = realm;
>> +
>> +	/* rebuild_snapcs when we reach the _end_ (root) of the trace */
>> +	if (rebuild_snapcs && p >= e)
>
> s/rebuild_snapcs/realm_to_rebuild/
>
> This will fix the bug Luís Henriques reported.
>
> I have sent the V3 to fix it. Thanks.

Awesome, thanks Xiubo.  I've give it a try today.

Cheer,
-- 
Luís

>
> - Xiubo
>
>
>> +		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>>     	if (!first_realm)
>>   		first_realm = realm;
>




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux