Re: OSD memory consumption significantly increased with greater rgw_obj_stripe_size

On Wed, 4 Jul 2018, Aleksei Gutikov wrote:
> 
> On 07/03/2018 05:55 AM, Sage Weil wrote:
> > On Fri, 29 Jun 2018, Aleksei Gutikov wrote:
> > > Throughput is 100% the same, just sliced into bigger chunks (rados
> > > objects).
> > > And this throughput is not high: less than a single object per second.
> > > And the memory stays occupied even after writing has stopped.
> > > 
> > > Currently I'm sure that this is a side effect of sharing a buffer::raw
> > > object among different buffer::ptr objects.
> > > 
> > > Please have a look at this dump of the ObjectContext::attr_cache of one
> > > of the contexts in PrimaryLogPG::object_contexts, made after uploading a
> > > single 4M object into S3.
> > > Notice the "_user.rgw.idtag" and "_user.rgw.tail_tag" xattrs, both 44
> > > bytes long, holding a 4194304-byte buffer::raw object (nref=2).
> > 
> > That is the smoking gun!  What version is this?
> 
> This particular dump is from 12.2.2,
> but the issue was also reproducible on 12.2.5 and master.

I think this will fix it:

	https://github.com/ceph/ceph/pull/22858

Can you test?  (Patch should be a clean cherry-pick to mimic or luminous)

sage


> 
> > Thanks!
> > sage
> > 
> > 
> > > 
> > > 
> > > "_": buffer::list(len=302, buffer::ptr(0~302 0x559318e74d80 in raw
> > > 0x559318e74d80 len 488 nref 1) ),
> > > 
> > > "_user.rgw.acl": buffer::list(len=147, buffer::ptr(448~147 0x55931677c4c0
> > > in
> > > raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.content_type": buffer::list(len=25, buffer::ptr(616~25
> > > 0x55931677c568 in raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.etag": buffer::list(len=33, buffer::ptr(654~33 0x55931677c58e
> > > in
> > > raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.idtag": buffer::list(len=44, buffer::ptr(14~44 0x55931958e00e
> > > in
> > > raw 0x55931958e000 len 4194304 nref 2) ),
> > > 
> > > "_user.rgw.manifest": buffer::list(len=300, buffer::ptr(136~300
> > > 0x55931677c388
> > > in raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.pg_ver": buffer::list(len=8, buffer::ptr(0~8 0x559319124000 in
> > > raw
> > > 0x559319124000 len 4008 nref 1) ),
> > > 
> > > "_user.rgw.source_zone": buffer::list(len=4, buffer::ptr(1122~4
> > > 0x55931677c762
> > > in raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.tail_tag": buffer::list(len=44, buffer::ptr(75~44
> > > 0x55931958e04b in
> > > raw 0x55931958e000 len 4194304 nref 2) ),
> > > 
> > > "_user.rgw.x-amz-content-sha256": buffer::list(len=65, buffer::ptr(716~65
> > > 0x55931677c5cc in raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.x-amz-date": buffer::list(len=17, buffer::ptr(800~17
> > > 0x55931677c620
> > > in raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.x-amz-meta-s3cmd-attrs": buffer::list(len=173,
> > > buffer::ptr(848~173
> > > 0x55931677c650 in raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "_user.rgw.x-amz-storage-class": buffer::list(len=9, buffer::ptr(1049~9
> > > 0x55931677c719 in raw 0x55931677c300 len 1126 nref 9) ),
> > > 
> > > "snapset": buffer::list(len=35, buffer::ptr(0~35 0x559319127000 in raw
> > > 0x559319127000 len 4008 nref 1) )
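> > > 
> > > The ownership pattern behind those nref values is roughly the following
> > > (a simplified, self-contained sketch that uses std::shared_ptr in place of
> > > Ceph's intrusive nref counting; not the actual buffer.h code):
> > > 
> > >     #include <cstddef>
> > >     #include <memory>
> > >     #include <vector>
> > > 
> > >     // A large contiguous allocation, e.g. the 4M message read from the network.
> > >     struct Raw {
> > >         std::vector<char> data;
> > >         explicit Raw(size_t len) : data(len) {}
> > >     };
> > > 
> > >     // A small view into Raw: copying a Ptr only bumps the refcount (nref),
> > >     // it never copies the bytes.
> > >     struct Ptr {
> > >         std::shared_ptr<Raw> raw;   // keeps the whole Raw alive
> > >         size_t off, len;
> > >     };
> > > 
> > >     int main() {
> > >         auto raw = std::make_shared<Raw>(size_t(4) << 20);  // 4 MiB receive buffer
> > >         Ptr idtag{raw, 14, 44};       // like "_user.rgw.idtag"
> > >         Ptr tail_tag{raw, 75, 44};    // like "_user.rgw.tail_tag"
> > >         raw.reset();                  // the network layer drops its reference...
> > >         // ...but all 4194304 bytes stay allocated: idtag.raw.use_count() == 2,
> > >         // matching "len 4194304 nref 2" in the dump above.
> > >     }
> > > 
> > > buffer::ptr holds its buffer::raw in the same way, so a 44-byte xattr value
> > > cached in attr_cache is enough to pin the entire 4M receive buffer.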
> > > 
> > > 
> > > Theoretically, with 300 PGs per OSD, EC 8+3,
> > > osd_pg_object_context_cache_count=64
> > > and rgw_obj_stripe_size=4M,
> > > this cache can consume up to 300/11 * 64 * 4M = 6.9G
> > > just because of this side effect of the shared buffer::raw.
> > > We do not see such high memory usage only because rgw does not set xattrs
> > > on all the rados objects that make up a big S3 object.
> > > But with a synthetic test where all S3 objects are 4M in size it can
> > > easily be reached.
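> > > 
> > > (Spelled out, assuming only the PGs for which the OSD is primary keep
> > > these contexts: 300 PG shards / 11 EC shards ≈ 27 primary PGs, × 64 cached
> > > object contexts each, × one pinned 4M buffer::raw ≈ 6.9G that cannot be
> > > freed.)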
> > > 
> > > 
> > > Thanks,
> > > Aleksei
> > > 
> > > 
> > > On 06/29/2018 03:30 AM, Gregory Farnum wrote:
> > > > Can you talk more about how you identified this as an issue and came
> > > > up with the potential solutions you've identified?
> > > > 
> > > > Naively, if I'm told that larger objects make the OSD take up more
> > > > memory, it sounds to me like the OSD is probably providing more
> > > > throughput, and that if you want it to use up less memory you just
> > > > ought to change the amount of outstanding IO it lets in to the system.
> > > > -Greg
> > > > 
> > > > On Thu, Jun 28, 2018 at 1:29 AM, Aleksei Gutikov
> > > > <aleksey.gutikov@xxxxxxxxxx> wrote:
> > > > > 
> > > > > NOTE: rgw_max_chunk_size must be equal to rgw_obj_stripe_size, so I mean
> > > > > both when I refer to either one.
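> > > > > 
> > > > > For example, to test with 16M stripes one would set both together, e.g. in
> > > > > ceph.conf (the rgw section name below is just a placeholder for your
> > > > > instance):
> > > > > 
> > > > >     [client.rgw.gateway1]
> > > > >     rgw_obj_stripe_size = 16777216
> > > > >     rgw_max_chunk_size  = 16777216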
> > > > > 
> > > > > For example, when I changed rgw_obj_stripe_size from 4M to 16M, OSD
> > > > > memory usage increased approximately 2.5 times.
> > > > > This issue was reproduced with erasure-coded pools.
> > > > > 
> > > > > The OSD command dump_mempools shows that only the anon pool bytes increased.
> > > > > 
> > > > > Further investigation shows that the whole buffer::raw object received
> > > > > from the network (created in alloc_aligned_buffer() in
> > > > > AsyncConnection.cc:623) is kept alive: the whole 4M or 16M buffer::raw
> > > > > objects are preserved with nref>0 in PrimaryLogPG::object_contexts,
> > > > > in ObjectContext::attr_cache.
> > > > > 
> > > > > This issue was reproduced on both luminous and master branches.
> > > > > 
> > > > > 
> > > > > I see at least two types of improvement:
> > > > > 
> > > > > 1) memcpy relatively small parts of buffer::raw when creating a new
> > > > > buffer::ptr.
> > > > >     For example, with the following compile-time configuration parameters
> > > > >     (see the sketch after this list):
> > > > >       BUFFER_MIN_SIZE_COPY_FROM = 64k
> > > > >       BUFFER_MAX_SIZE_TO_COPY = 16k
> > > > >       BUFFER_MIN_RATIO_TO_COPY = 128
> > > > >     it would copy up to 512 bytes from a 64k raw object,
> > > > >     or up to 16k from a 4M raw object,
> > > > >     and would never copy from a 63k raw object.
> > > > > 
> > > > >     Pros: fixes all issues of this type (preservation of buffer::raw
> > > > > objects)
> > > > >     Cons: unknown impact, e.g. memory fragmentation
> > > > > 
> > > > > 2) Improvements related specifically to PrimaryLogPG::object_contexts
> > > > > 
> > > > >     2.1) Set osd_pg_object_context_cache_count to 1 or 0
> > > > >       Cons: the cache will not actually work
> > > > > 
> > > > >     2.2) Recreate the bufferlists of attr_cache entries while inserting
> > > > > into the cache, so the attrs are copied and the huge buffer can be freed
> > > > > later.
> > > > >       Pros: minimal impact on any other subsystems
> > > > >       Cons: improves only this particular case
> > > > > 
> > > > >     2.3) Limit object_contexts by total used memory in addition to
> > > > > osd_pg_object_context_cache_count.
> > > > >       Cons: the cache will probably not work because each entry will
> > > > > occupy a lot of memory and all entries will be skipped.
> > > > > 
> > > > >     2.4) Remove object_contexts completely and create contexts on the fly
> > > > > every time.
> > > > >       Cons: object_contexts does not look like a spare part that can be
> > > > > safely removed
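> > > > > 
> > > > > A sketch of the heuristic from 1) (just my reading of the parameters
> > > > > above, not an existing patch):
> > > > > 
> > > > >     #include <algorithm>
> > > > >     #include <cstddef>
> > > > > 
> > > > >     // Compile-time knobs from the proposal above.
> > > > >     constexpr size_t BUFFER_MIN_SIZE_COPY_FROM = 64 * 1024;
> > > > >     constexpr size_t BUFFER_MAX_SIZE_TO_COPY   = 16 * 1024;
> > > > >     constexpr size_t BUFFER_MIN_RATIO_TO_COPY  = 128;
> > > > > 
> > > > >     // Decide whether a new buffer::ptr of ptr_len bytes into a raw_len-byte
> > > > >     // buffer::raw should memcpy its slice instead of sharing (and pinning)
> > > > >     // the whole raw buffer.
> > > > >     bool should_copy(size_t raw_len, size_t ptr_len) {
> > > > >         if (raw_len < BUFFER_MIN_SIZE_COPY_FROM)
> > > > >             return false;               // raw is small, sharing is harmless
> > > > >         size_t copy_limit = std::min(BUFFER_MAX_SIZE_TO_COPY,
> > > > >                                      raw_len / BUFFER_MIN_RATIO_TO_COPY);
> > > > >         return ptr_len <= copy_limit;   // only small slices are worth copying
> > > > >     }
> > > > > 
> > > > >     // should_copy(64*1024, 512)          -> true  (<= 512 B from a 64k raw)
> > > > >     // should_copy(4*1024*1024, 16*1024)  -> true  (<= 16k from a 4M raw)
> > > > >     // should_copy(63*1024, 100)          -> false (never copy from a 63k raw)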
> > > > > 
> > > > > 
> > > > > We tested osd_pg_object_context_cache_count=1 as a hotfix
> > > > > and it improved OSD memory usage significantly, with no dependence on
> > > > > rgw_obj_stripe_size.
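> > > > > 
> > > > > For reference, the hotfix is just the config change, applied e.g. via
> > > > > ceph.conf plus injectargs for the running OSDs (exact rollout procedure up
> > > > > to you):
> > > > > 
> > > > >     [osd]
> > > > >     osd_pg_object_context_cache_count = 1
> > > > > 
> > > > >     # for already-running OSDs, without restart:
> > > > >     # ceph tell osd.* injectargs '--osd_pg_object_context_cache_count 1'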
> > > > > 
> > > > > 
> > > > > Could somebody please clarify a little the purpose of
> > > > > PrimaryLogPG::object_contexts,
> > > > > and maybe suggest something about fixing this issue?
> > > > > 
> > > > > 
> > > > > --
> > > > > 
> > > > > Best regards,
> > > > > Aleksei Gutikov
> > > > > Software Engineer | synesis.ru | Minsk. BY
> > > 
> > > -- 
> > > 
> > > Best regards,
> > > Aleksei Gutikov
> > > Software Engineer | synesis.ru | Minsk. BY
> > > 
> > > 
> 
> -- 
> 
> Best regards,
> Aleksei Gutikov
> Software Engineer | synesis.ru | Minsk. BY
> 
> 