Re: Lose allocation hint attribute after recovery or backfill

Option 2 would be way simpler.
-Sam

On Thu, Jun 2, 2016 at 6:21 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 2 Jun 2016, Ning Yao wrote:
>> Hi, Sage
>>
>> Currently, the allocation hint is stored in the filesystem fsxattr (e.g.
>> on XFS), or assigned directly to onode_t in BlueStore and KStore.
>> This normally works as expected.  However, when the cluster starts
>> recovering or backfilling objects, the recovered objects lose the
>> allocation hint fsxattr forever. This means that if a cluster experiences
>> OSD up/down flapping, those frequently accessed objects will no
>> longer have the allocation hint xattr.
>> As a result, those objects will not be fully written (1M/4M) and
>> will eventually become fragmented. This also happens if an object is
>> truncated (because the QEMU discard option is enabled).
>>
>> Therefore, we may want to preserve the allocation hint fsxattr during
>> recovery. There are two ways to do this.
>> 1. Issue set_alloc_hint with the current object size during recovery, like
>>     t->set_alloc_hint(cid, oid,  expected_object_size, expected_object_size)
>>     but retrieving the expected_object_size would be a problem for us.
>>     For a block image, it may be the object_size in the image metadata.
>>
>> 2. So I would like to transfer the hint fsxattr information directly
>> to the recovered object. Also, I see that a new flags parameter
>> will be used to indicate whether the data will be compressed.
>> So I propose a PR like this:
>> https://github.com/ceph/ceph/pull/9452
>>
>> I think the issue may not affect BlueStore or KStore, because
>> the allocation hint attribute can be overwritten by later
>> set_allocation_hint ops?
>
> Good catch!
>
> It seems like we have two choices...
>
> 1) Add get_alloc_hint (like in your PR) to the ObjectStore interface.
>
> 2) Store the alloc hint parameters in the object_info_t so the OSD
> remembers them independently and they can be set during recovery.
>
> I'm partial to #2 because it means we still preserve the metadata while
> allowing the ObjectStore implementations to ignore the hint if they so
> choose.  There are new hints, for example, that FileStore/XFS won't store
> at all.
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
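
For illustration only, here is a minimal sketch of how choice 2 above might
look: the hint parameters are kept alongside the per-object metadata, so the
recovery target can re-issue the hint without reading it back from the source
store. This is a toy model, not Ceph code. AllocHint, ToyObjectInfo, ToyStore,
find_hint, apply_client_hint and recover_object are all hypothetical names;
only set_alloc_hint, object_info_t, expected_object_size and flags come from
the thread, and ToyStore::set_alloc_hint does not match the real transaction
signature.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the hint parameters named in the thread.
struct AllocHint {
  uint64_t expected_object_size;
  uint64_t expected_write_size;
  uint32_t flags;  // placeholder for newer hint flags
};

// Toy per-object metadata, loosely modelled on object_info_t.
// This is NOT the real Ceph structure.
struct ToyObjectInfo {
  uint64_t size;
  bool has_alloc_hint;
  AllocHint alloc_hint;
};

// Toy backend. It records the last hint it was given; a real backend
// would be free to ignore hints it cannot use.
class ToyStore {
 public:
  void set_alloc_hint(const std::string& oid, const AllocHint& hint) {
    hints_[oid] = hint;
  }
  // Returns nullptr when the backend holds no hint for oid.
  const AllocHint* find_hint(const std::string& oid) const {
    auto it = hints_.find(oid);
    return it == hints_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, AllocHint> hints_;
};

// Client write path: remember the hint in the object info as well as
// passing it to the backend.
void apply_client_hint(const std::string& oid, const AllocHint& hint,
                       ToyObjectInfo& oi, ToyStore& store) {
  oi.has_alloc_hint = true;
  oi.alloc_hint = hint;
  store.set_alloc_hint(oid, hint);
}

// Recovery path: in this sketch the object info travels with the
// recovered object, so the target re-issues the hint from it and never
// needs a get_alloc_hint call on the source store.
void recover_object(const std::string& oid, const ToyObjectInfo& oi,
                    ToyStore& target) {
  if (oi.has_alloc_hint) {
    target.set_alloc_hint(oid, oi.alloc_hint);
  }
}
```

The point of the sketch is the design trade-off raised in the thread: because
the hint lives in the object metadata rather than only in the backend, it
survives recovery even if a given backend chooses not to store it.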