Re: xattrs vs omap


 



<<inline

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Jan Schermer
Sent: Monday, July 13, 2015 2:32 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  xattrs vs omap

Sorry for reviving an old thread, but could I get some input on this, pretty please?

ext4 has 256-byte inodes by default (at least according to docs) but the fragment below says:
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

The default 512b is too much if the inode is just 256b, so shouldn’t that be 256b in case people use the default ext4 inode size?

Anyway, is it better to format ext4 with larger inodes (say 2048b) and set filestore_max_inline_xattr_size_other=1536, or leave it at defaults?
[Somnath] Why 1536? Why not 1024 or some other power of 2? I am not seeing any harm, though - just curious.
(As I understand it, on ext4 xattrs are limited to the space inside the inode plus at most one extra block, and anything that does not fit in the inode spills into that block - maybe someone knows better.)


[Somnath] The "_" xattr is now more than 256 bytes, so it will spill over; a bigger inode size will be good. But I would suggest doing your own benchmarks before putting it into production.
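
If you want to check what you are actually working with, tune2fs will show the inode size an existing ext4 OSD filesystem was created with (a quick sketch; /dev/sdb1 is a placeholder for your OSD data partition):

# Inode size the filesystem was formatted with (256 bytes is the ext4 default)
tune2fs -l /dev/sdb1 | grep -i 'inode size'
# Block size, for reference - xattrs that do not fit in the inode spill into a block
tune2fs -l /dev/sdb1 | grep -i 'block size'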

Is filestore_max_inline_xattr_size an absolute limit, or is the effective limit really filestore_max_inline_xattr_size * filestore_max_inline_xattrs?

[Somnath] The *_size option is a per-attribute size limit, and *inline_xattrs is the maximum number of inline attributes allowed. So if an xattr is larger than *_size it goes to omap, and likewise if the total number of xattrs exceeds *inline_xattrs, they go to omap.
If you are only using RBD, the number of inline xattrs will always be 2, so it will not cross the default maximum.
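
For ext4 (the "other" bucket), that would look something like this in ceph.conf - just a sketch using the option names quoted further down in this thread, with the per-attribute size bumped to match the larger-inode idea above:

[osd]
# per-attribute inline size limit for non-XFS/non-btrfs filestores (default 512)
filestore_max_inline_xattr_size_other = 1536
# number of xattrs kept inline before spilling to omap (default 2, enough for RBD)
filestore_max_inline_xattrs_other = 2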

Does the OSD do the sane thing if for some reason the xattrs do not fit? What are the performance implications of storing the xattrs in leveldb?

[Somnath] I don't have exact numbers, but there is significant overhead if the xattrs go to leveldb.

And lastly - what size of xattrs should I really expect if all I use is RBD for OpenStack instances? (No radosgw, no cephfs, but heavy use of rbd image and pool snapshots.) This overhead seems quite large.

[Somnath] It will be 2 xattrs: the default "_" will be a little bigger than 256 bytes, and "_snapset" is small - its size depends on the number of snaps/clones, but it is unlikely to cross the 256-byte range.
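
If you want to see the real numbers on one of your own OSDs, something like this should print them (a sketch: the filename pattern and the user.ceph.* xattr namespace are assumptions about the FileStore on-disk layout, so adjust as needed):

# Grab one RBD data object file from an OSD's FileStore data directory
OBJ=$(find /var/lib/ceph/osd/ceph-0/current -type f -name 'rbd*udata*' | head -1)
# Dump the Ceph xattrs in hex and print each one's size in bytes
getfattr -d -m 'user.ceph.' -e hex "$OBJ" |
  awk -F= '/^user.ceph/ { print $1, (length($2) - 2) / 2, "bytes" }'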

My plan so far is to format the drives like this:
mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256
(2048-byte inodes, 4096-byte block size, one inode per 512 KB of space) and then set filestore_max_inline_xattr_size_other=1536.
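
Spelled out as a sketch (the device path and mount point are placeholders, and the mount options are my assumption based on the usual ext4-for-Ceph advice rather than anything stated in this thread):

# 2048-byte inodes, 4096-byte blocks, one inode per 512 KB, stride/stripe as above
mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256 /dev/sdb1
# Mount with user xattrs enabled (harmless if that is already your kernel's default)
mount -o noatime,user_xattr /dev/sdb1 /var/lib/ceph/osd/ceph-0
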
[Somnath] Not much idea on ext4, sorry..

Does that make sense?

Thanks!

Jan



> On 02 Jul 2015, at 12:18, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>
> Does anyone have a known-good set of parameters for ext4? I want to try it as well, but I'm a bit worried about what happens if I get it wrong.
>
> Thanks
>
> Jan
>
>
>
>> On 02 Jul 2015, at 09:40, Nick Fisk <nick@xxxxxxxxxx> wrote:
>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
>>> Behalf Of Christian Balzer
>>> Sent: 02 July 2015 02:23
>>> To: Ceph Users
>>> Subject: Re:  xattrs vs omap
>>>
>>> On Thu, 2 Jul 2015 00:36:18 +0000 Somnath Roy wrote:
>>>
>>>> It is replaced with the following config option..
>>>>
>>>> // Use omap for xattrs for attrs over
>>>> // filestore_max_inline_xattr_size or
>>>> OPTION(filestore_max_inline_xattr_size, OPT_U32, 0)     //Override
>>>> OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
>>>> OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
>>>> OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)
>>>>
>>>> // for more than filestore_max_inline_xattrs attrs
>>>> OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
>>>> OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
>>>> OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
>>>> OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)
>>>>
>>>>
>>>> If these limits are crossed, the xattrs will be stored in omap.
>>>>
>>> Sounds fair.
>>>
>>> Since I only use RBD I don't think it will ever exceed this.
>>
>> Possibly, see my thread about the performance difference between new and
>> old pools. Still not quite sure what's going on, but for some reason
>> some of the objects behind RBDs have larger xattrs, which is causing
>> really poor performance.
>>
>>>
>>> Thanks,
>>>
>>> Chibi
>>>> For ext4, you can use either filestore_max*_other or
>>>> filestore_max_inline_xattrs / filestore_max_inline_xattr_size. In any
>>>> case, the latter two will override everything.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>> -----Original Message-----
>>>> From: Christian Balzer [mailto:chibi@xxxxxxx]
>>>> Sent: Wednesday, July 01, 2015 5:26 PM
>>>> To: Ceph Users
>>>> Cc: Somnath Roy
>>>> Subject: Re:  xattrs vs omap
>>>>
>>>>
>>>> Hello,
>>>>
>>>> On Wed, 1 Jul 2015 15:24:13 +0000 Somnath Roy wrote:
>>>>
>>>>> It doesn't matter; I think filestore_xattr_use_omap is a 'noop'
>>>>> and is not used in Hammer.
>>>>>
>>>> Then what was this functionality replaced with, esp. considering
>>>> EXT4-based OSDs?
>>>>
>>>> Chibi
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
>>>>> Behalf Of Adam Tygart Sent: Wednesday, July 01, 2015 8:20 AM
>>>>> To: Ceph Users
>>>>> Subject:  xattrs vs omap
>>>>>
>>>>> Hello all,
>>>>>
>>>>> I've got a coworker who put "filestore_xattr_use_omap = true" in
>>>>> the ceph.conf when we first started building the cluster. Now he
>>>>> can't remember why. He thinks it may be a holdover from our first
>>>>> Ceph cluster (running dumpling on ext4, iirc).
>>>>>
>>>>> In the newly built cluster, we are using XFS with 2048 byte
>>>>> inodes, running Ceph 0.94.2. It currently has production data in it.
>>>>>
>>>>> From my reading of other threads, it looks like this is probably
>>>>> not something you want set to true (at least on XFS), due to
>>>>> performance implications. Is this something you can change on a running cluster?
>>>>> Is it worth the hassle?
>>>>>
>>>>> Thanks,
>>>>> Adam
>>>>>
>>>>
>>>>
>>>> --
>>>> Christian Balzer        Network/Systems Engineer
>>>> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
>>>> http://www.gol.com/
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Christian Balzer        Network/Systems Engineer
>>> chibi@xxxxxxx       Global OnLine Japan/Fusion Communications
>>> http://www.gol.com/
>>
>>
>>
>>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




