After back-porting Sage's patch to Giant, the radosgw xattrs are inlined as expected. I haven't run extensive testing yet; I will update once I have performance data to share.
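For anyone who wants to sanity-check the same thing on their own cluster, a rough Python sketch is below. It walks an object file's xattrs and flags any value larger than the 254-byte per-value ceiling discussed in the thread; the path argument and the exact threshold are assumptions on my part for illustration, not something Ceph or XFS exposes directly.

    import os
    import sys

    # Per the discussion below, XFS cannot keep an xattr value larger
    # than ~254 bytes inline; such values spill into a separate extent.
    # The exact threshold here is an assumption for illustration.
    XFS_INLINE_VALUE_MAX = 254

    def check_xattrs(path):
        total = 0
        for name in os.listxattr(path):
            value = os.getxattr(path, name)
            total += len(name) + len(value)
            note = "  <-- likely spills to an extent" if len(value) > XFS_INLINE_VALUE_MAX else ""
            print("%-40s %5d bytes%s" % (name, len(value), note))
        print("total xattr payload: %d bytes" % total)

    if __name__ == "__main__":
        # e.g. an object file under current/ on an OSD's data partition
        check_xattrs(sys.argv[1])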
Thanks,
Guang

> Date: Tue, 16 Jun 2015 15:51:44 -0500
> From: mnelson@xxxxxxxxxx
> To: yguang11@xxxxxxxxxxx; sage@xxxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: xattrs vs. omap with radosgw
>
> On 06/16/2015 03:48 PM, GuangYang wrote:
> > Thanks Sage for the quick response.
> >
> > It is on Firefly v0.80.4.
> >
> > When putting with *rados* directly, the xattrs can be inlined. The problem comes to light with radosgw, since we keep a bunch of metadata in xattrs, including:
> > rgw.idtag : 15 bytes
> > rgw.manifest : 381 bytes
>
> Ah, that manifest will push us over the limit afaik, resulting in every
> inode getting a new extent.
>
> > rgw.acl : 121 bytes
> > rgw.etag : 33 bytes
> >
> > Given that background, it looks like the problem is that rgw.manifest is too large, so XFS moves the xattrs out to extents. If I understand correctly, once we port the change to Firefly we should be able to keep everything inline in the inode, since the accumulated size is still less than 2K (please correct me if I am wrong here).
>
> I think you are correct, so long as the patch breaks that manifest down
> into 254-byte or smaller chunks.
>
> >
> > Thanks,
> > Guang
> >
> >
> > ----------------------------------------
> >> Date: Tue, 16 Jun 2015 12:43:08 -0700
> >> From: sage@xxxxxxxxxxxx
> >> To: yguang11@xxxxxxxxxxx
> >> CC: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re: xattrs vs. omap with radosgw
> >>
> >> On Tue, 16 Jun 2015, GuangYang wrote:
> >>> Hi Cephers,
> >>> While looking at disk utilization on the OSDs, I noticed the disks were constantly busy with a large number of small writes. Further investigation showed that radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which pushes the xattrs from local (inline) storage out to extents and incurs extra I/O.
> >>>
> >>> I would like to check whether anybody has experience with offloading the metadata to omap:
> >>> 1> Offload everything to omap? If so, should we make the inode size 512 bytes (instead of 2K)?
> >>> 2> Partially offload the metadata to omap, e.g. only the rgw-specific metadata.
> >>>
> >>> Any sharing is deeply appreciated. Thanks!
> >>
> >> Hi Guang,
> >>
> >> Is this hammer or firefly?
> >>
> >> With hammer the size of object_info_t crossed the 255-byte boundary, which
> >> is the max xattr value that XFS can inline. We've since merged something
> >> that stripes over several small xattrs so that we can keep things inline,
> >> but it hasn't been backported to hammer yet. See
> >> c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
> >> seeing?
> >>
> >> I think we're still better off with larger XFS inodes and inline xattrs if
> >> it means we avoid leveldb entirely for most objects.
> >>
> >> sage
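To make the striping idea above concrete, here is a minimal Python sketch of the technique Sage describes: break one large value into <=254-byte pieces so that each stored xattr stays inline. The "name.N" suffix scheme and the helper names are mine for illustration; they are not Ceph's actual on-disk naming (see the commit referenced above for that).

    import os

    # Per-value ceiling under which XFS can still inline the xattr;
    # taken from the 254-byte figure discussed above.
    CHUNK = 254

    def set_striped_xattr(path, name, value):
        # Store the value as a series of <=254-byte xattrs so each one
        # stays inline. The "name.N" suffix scheme is illustrative only.
        for i in range(0, max(len(value), 1), CHUNK):
            suffix = "" if i == 0 else ".%d" % (i // CHUNK)
            os.setxattr(path, name + suffix, value[i:i + CHUNK])

    def get_striped_xattr(path, name):
        # Reassemble the chunks in order until a suffix is missing.
        parts = [os.getxattr(path, name)]
        n = 1
        while True:
            try:
                parts.append(os.getxattr(path, "%s.%d" % (name, n)))
            except OSError:
                break
            n += 1
        return b"".join(parts)

Calling set_striped_xattr(path, "user.rgw.manifest", manifest) with the 381-byte manifest above would then produce two inline-sized xattrs (user.rgw.manifest and user.rgw.manifest.1) instead of one oversized value that forces a new extent.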