RE: xattrs vs. omap with radosgw

Thanks Sage for the quick response.

It is on Firefly v0.80.4.

When putting objects with *rados* directly, the xattrs can stay inline. The problem shows up when using radosgw, since it keeps a bunch of metadata in xattrs, including:
   rgw.idtag  : 15 bytes
   rgw.manifest :  381 bytes
   rgw.acl : 121 bytes
   rgw.etag : 33 bytes

Given that background, it looks like the problem is that rgw.manifest is too large, so XFS moves it out to extents. If I understand correctly, if we port the change to Firefly, we should be able to keep the xattrs inline in the inode, since the accumulated size is still less than 2K (please correct me if I am wrong here).
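
As a rough sanity check of the arithmetic above, here is a minimal Python sketch. It uses only the value sizes listed in this message, the 255-byte inline limit from Sage's reply quoted below, and the 2K inode size from this thread. The helper names (oversized, chunks_needed) are hypothetical, and using 255 bytes as a chunk size is an assumption; the actual striping logic is in the commit Sage references. The sketch also ignores xattr names, per-entry overhead, the inode core, and the xattrs Ceph itself sets, so the real headroom is smaller than the total suggests.

```python
# Value sizes in bytes, as listed in this message.
RGW_XATTRS = {
    "rgw.idtag": 15,
    "rgw.manifest": 381,
    "rgw.acl": 121,
    "rgw.etag": 33,
}

XFS_INLINE_VALUE_MAX = 255  # per-value inline limit quoted in the reply below
INODE_SIZE = 2048           # 2K inode size mentioned in this thread


def oversized(xattrs, limit=XFS_INLINE_VALUE_MAX):
    """Return the names of xattrs whose value exceeds the per-value limit."""
    return sorted(name for name, size in xattrs.items() if size > limit)


def chunks_needed(size, limit=XFS_INLINE_VALUE_MAX):
    """Pieces needed if a value were striped into chunks of at most `limit` bytes.

    The chunk size is an assumption made for illustration only.
    """
    return -(-size // limit)  # ceiling division


total = sum(RGW_XATTRS.values())
print("total value bytes:", total)
print("fits under inode size (values only):", total < INODE_SIZE)
print("over per-value limit:", oversized(RGW_XATTRS))
print("chunks for rgw.manifest:", chunks_needed(RGW_XATTRS["rgw.manifest"]))
```

By this estimate, only rgw.manifest is over the per-value limit, while the values together are well under 2K, which is why striping looks like it should be enough.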

Thanks,
Guang


----------------------------------------
> Date: Tue, 16 Jun 2015 12:43:08 -0700
> From: sage@xxxxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: xattrs vs. omap with radosgw
>
> On Tue, 16 Jun 2015, GuangYang wrote:
>> Hi Cephers,
>> While looking at disk utilization on the OSDs, I noticed the disk was constantly busy with a large number of small writes. Further investigation showed that radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which pushed the xattrs from local (inline) to extents and incurred extra I/O.
>>
>> I would like to check if anybody has experience with offloading the metadata to omap:
>> 1> Offload everything to omap? If this is the case, should we make the inode size 512 (instead of 2k)?
>> 2> Partially offload the metadata to omap, e.g. only offloading the rgw-specific metadata to omap.
>>
>> Any sharing is deeply appreciated. Thanks!
>
> Hi Guang,
>
> Is this hammer or firefly?
>
> With hammer the size of object_info_t crossed the 255 byte boundary, which
> is the max xattr value that XFS can inline. We've since merged something
> that stripes over several small xattrs so that we can keep things inline,
> but it hasn't been backported to hammer yet. See
> c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
> seeing?
>
> I think we're still better off with larger XFS inodes and inline xattrs if
> it means we avoid leveldb at all for most objects.
>
> sage