Re: Lose allocation hint attribute after recovery or backfill

On Thu, 2 Jun 2016, Ning Yao wrote:
> Hi, Sage
> 
> Currently, the allocation hint is stored in the filesystem fsxattr on XFS,
> or assigned directly to the onode_t in BlueStore and KStore.
> This normally works as expected. However, once the cluster starts
> recovering or backfilling objects, the recovered objects lose the
> allocation hint fsxattr permanently. This means that if a cluster has
> experienced OSD up/down flapping, those frequently accessed objects will
> no longer carry the allocation hint xattr.
> As a result, objects that are not fully written (1M/4M) will
> eventually become fragmented. The same thing happens if an object is
> truncated (because the QEMU discard option is enabled).
> 
> Therefore, we may want to preserve the allocation hint fsxattr during
> recovery. There are two alternatives for doing this:
> 1. Issue set_alloc_hint with the current object size during recovery, e.g.
>     t->set_alloc_hint(cid, oid, expected_object_size, expected_object_size)
>     However, it is hard for us to retrieve expected_object_size. For a
> block image, it may be the object_size recorded in the image metadata.
> 
> 2. Instead, I would like to transfer the hint fsxattr information
> directly to the recovered object. I also noticed that a new flags
> parameter will be used to indicate whether the data will be compressed.
> So I propose a PR like this:
> https://github.com/ceph/ceph/pull/9452
> 
> I think the issue may not affect BlueStore or KStore, because the
> allocation hint attribute can be overwritten by later
> set_allocation_hint ops?
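Alternative 1 can be sketched roughly as follows. This is a hypothetical illustration, not Ceph code: MockTransaction stands in for an ObjectStore transaction and only records hint ops, reissue_hint_on_recovery is an invented helper, and the expected object size is assumed to come from the RBD image's object size, which is 2^order bytes (4 MiB for the default order of 22).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one set_alloc_hint call.
struct AllocHintOp {
  std::string cid;
  std::string oid;
  uint64_t expected_object_size;
  uint64_t expected_write_size;
};

// Hypothetical stand-in for an ObjectStore transaction: it only records the
// alloc-hint ops it receives so they can be inspected.
struct MockTransaction {
  std::vector<AllocHintOp> ops;
  void set_alloc_hint(const std::string& cid, const std::string& oid,
                      uint64_t expected_object_size,
                      uint64_t expected_write_size) {
    ops.push_back({cid, oid, expected_object_size, expected_write_size});
  }
};

// Assumption: for an RBD image, the expected object size is the image's
// object size, 2^order bytes (order 22 gives 4 MiB).
uint64_t rbd_object_size(unsigned order) {
  assert(order < 64);
  return uint64_t{1} << order;
}

// Hypothetical recovery step for alternative 1: after the object's data has
// been pushed, re-issue the hint with a size the OSD had to learn elsewhere.
void reissue_hint_on_recovery(MockTransaction& t, const std::string& cid,
                              const std::string& oid,
                              uint64_t expected_object_size) {
  t.set_alloc_hint(cid, oid, expected_object_size, expected_object_size);
}
```

The weak point is the last argument of reissue_hint_on_recovery: nothing in the recovery path itself knows the image's order, which is exactly the difficulty described above.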

Good catch!

It seems like we have two choices...

1) get_alloc_hint (like in your PR) in ObjectStore interface.

2) Store the alloc hint parameters in the object_info_t so the OSD 
remembers them independently and they can be set during recovery.
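Choice 2 might look something like the sketch below, under stated assumptions: ObjectInfoSketch is a hypothetical, stripped-down stand-in for object_info_t holding only the hint fields, the byte encoding is illustrative rather than Ceph's encode/decode machinery, and MockTransaction only records hint ops. The primary records the hint parameters in the object info; because the object info travels with the object during recovery, the target can replay the hint no matter what the source backend actually stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, stripped-down stand-in for object_info_t: only the fields
// needed to remember an allocation hint are shown.
struct ObjectInfoSketch {
  uint64_t size = 0;
  uint64_t expected_object_size = 0;  // 0 means "no hint recorded"
  uint64_t expected_write_size = 0;
  uint32_t alloc_hint_flags = 0;      // e.g. room for a compression hint
};

// Illustrative little-endian encoding; not Ceph's encode()/decode().
static void put_le(std::vector<uint8_t>& bl, uint64_t v, int bytes) {
  for (int i = 0; i < bytes; ++i) bl.push_back(static_cast<uint8_t>(v >> (8 * i)));
}
static uint64_t get_le(const std::vector<uint8_t>& bl, std::size_t& off, int bytes) {
  assert(off + bytes <= bl.size());
  uint64_t v = 0;
  for (int i = 0; i < bytes; ++i) v |= uint64_t{bl[off + i]} << (8 * i);
  off += bytes;
  return v;
}

std::vector<uint8_t> encode_object_info(const ObjectInfoSketch& oi) {
  std::vector<uint8_t> bl;
  put_le(bl, oi.size, 8);
  put_le(bl, oi.expected_object_size, 8);
  put_le(bl, oi.expected_write_size, 8);
  put_le(bl, oi.alloc_hint_flags, 4);
  return bl;
}

ObjectInfoSketch decode_object_info(const std::vector<uint8_t>& bl) {
  std::size_t off = 0;
  ObjectInfoSketch oi;
  oi.size = get_le(bl, off, 8);
  oi.expected_object_size = get_le(bl, off, 8);
  oi.expected_write_size = get_le(bl, off, 8);
  oi.alloc_hint_flags = static_cast<uint32_t>(get_le(bl, off, 4));
  return oi;
}

// Hypothetical hint op and a transaction that only records hint ops.
struct HintOp {
  std::string oid;
  uint64_t expected_object_size;
  uint64_t expected_write_size;
  uint32_t flags;
};
struct MockTransaction {
  std::vector<HintOp> hint_ops;
  void set_alloc_hint(const std::string& oid, uint64_t obj_size,
                      uint64_t write_size, uint32_t flags) {
    hint_ops.push_back({oid, obj_size, write_size, flags});
  }
};

// Primary: handling a set-alloc-hint op remembers the parameters in the
// object info, whether or not the local backend honours them.
void record_hint(ObjectInfoSketch& oi, uint64_t obj_size, uint64_t write_size,
                 uint32_t flags) {
  oi.expected_object_size = obj_size;
  oi.expected_write_size = write_size;
  oi.alloc_hint_flags = flags;
}

// Recovery target: the encoded object info arrives with the object data, so
// the remembered hint can be replayed into the local store.
void replay_hint_after_recovery(const std::vector<uint8_t>& encoded_oi,
                                const std::string& oid, MockTransaction& t) {
  ObjectInfoSketch oi = decode_object_info(encoded_oi);
  if (oi.expected_object_size != 0)
    t.set_alloc_hint(oid, oi.expected_object_size, oi.expected_write_size,
                     oi.alloc_hint_flags);
}
```

Because the parameters live in the OSD's own metadata, a backend that chooses to ignore a hint loses nothing: the values survive in the object info and are replayed wherever the object lands.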

I'm partial to #2 because it means we still preserve the metadata while 
allowing the ObjectStore implementations to ignore the hint if they so 
choose.  There are new hints, for example, that FileStore/XFS won't store 
at all.

sage