Re: OSD memory consumption significantly increased with greater rgw_obj_stripe_size

On Fri, 29 Jun 2018, Aleksei Gutikov wrote:
> Throughput is 100% the same, just sliced into bigger chunks (rados objects).
> And this throughput is not high, less than a single object per second. And
> memory stays occupied even after writing has stopped.
> 
> Currently I'm sure that this is a side effect of sharing a buffer::raw object
> among different buffer::ptr objects.
> 
> Please have a look at this dump of ObjectContext::attr_cache from one of the
> contexts in PrimaryLogPG::object_contexts, made after uploading a single 4M
> object into S3.
> Notice the "_user.rgw.idtag" and "_user.rgw.tail_tag" xattrs, both 44 bytes
> long, holding a 4194304-byte buffer::raw object (nref=2).

That is the smoking gun!  What version is this?

Thanks!
sage


> 
> 
> "_": buffer::list(len=302, buffer::ptr(0~302 0x559318e74d80 in raw
> 0x559318e74d80 len 488 nref 1) ),
> 
> "_user.rgw.acl": buffer::list(len=147, buffer::ptr(448~147 0x55931677c4c0 in
> raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.content_type": buffer::list(len=25, buffer::ptr(616~25
> 0x55931677c568 in raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.etag": buffer::list(len=33, buffer::ptr(654~33 0x55931677c58e in
> raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.idtag": buffer::list(len=44, buffer::ptr(14~44 0x55931958e00e in
> raw 0x55931958e000 len 4194304 nref 2) ),
> 
> "_user.rgw.manifest": buffer::list(len=300, buffer::ptr(136~300 0x55931677c388
> in raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.pg_ver": buffer::list(len=8, buffer::ptr(0~8 0x559319124000 in raw
> 0x559319124000 len 4008 nref 1) ),
> 
> "_user.rgw.source_zone": buffer::list(len=4, buffer::ptr(1122~4 0x55931677c762
> in raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.tail_tag": buffer::list(len=44, buffer::ptr(75~44 0x55931958e04b in
> raw 0x55931958e000 len 4194304 nref 2) ),
> 
> "_user.rgw.x-amz-content-sha256": buffer::list(len=65, buffer::ptr(716~65
> 0x55931677c5cc in raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.x-amz-date": buffer::list(len=17, buffer::ptr(800~17 0x55931677c620
> in raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.x-amz-meta-s3cmd-attrs": buffer::list(len=173, buffer::ptr(848~173
> 0x55931677c650 in raw 0x55931677c300 len 1126 nref 9) ),
> 
> "_user.rgw.x-amz-storage-class": buffer::list(len=9, buffer::ptr(1049~9
> 0x55931677c719 in raw 0x55931677c300 len 1126 nref 9) ),
> 
> "snapset": buffer::list(len=35, buffer::ptr(0~35 0x559319127000 in raw
> 0x559319127000 len 4008 nref 1) )
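
The pattern in this dump (two 44-byte views keeping one 4194304-byte allocation alive) can be reproduced in miniature. What follows is a toy model only, not Ceph's buffer implementation: `Raw`, `Slice`, `compact` and `pinned_bytes` are hypothetical names, and `std::shared_ptr` merely stands in for the nref counter.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for buffer::raw: one contiguous allocation.
using Raw = std::vector<char>;

// Toy stand-in for buffer::ptr: an (offset, length) view that shares
// ownership of the whole raw allocation (hypothetical type).
struct Slice {
    std::shared_ptr<Raw> raw;
    std::size_t off;
    std::size_t len;
};

// Bytes kept alive by a slice, as opposed to the bytes it exposes.
inline std::size_t pinned_bytes(const Slice& s) { return s.raw->size(); }

// Copy only the viewed bytes into a right-sized allocation, so the
// returned slice no longer references the original raw buffer.
inline Slice compact(const Slice& s) {
    auto first = s.raw->begin() + static_cast<std::ptrdiff_t>(s.off);
    auto last = first + static_cast<std::ptrdiff_t>(s.len);
    return Slice{std::make_shared<Raw>(first, last), 0, s.len};
}
```

With a 4194304-byte `Raw` and slices at 14~44 and 75~44, as in the dump, each slice pins the full buffer; the large allocation is released only once both slices have been passed through `compact`.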
> 
> 
> Theoretically, with 300 PGs per OSD, EC 8+3,
> osd_pg_object_context_cache_count=64
> and rgw_obj_stripe_size=4M,
> this cache can consume up to 300/11*64*4M = 6.9G
> just because of this side effect of the shared buffer::raw.
> We do not see memory usage that high only because rgw does not set xattrs on
> every rados object that is part of a big S3 object.
> But with a synthetic test where all S3 objects are 4M in size it can easily be
> reached.
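
The estimate above can be written out as a small helper. This is a sketch under assumptions: the function name and `kGiB` are hypothetical, and the division by k+m is read as "the OSD is primary for about 1/(k+m) of its PGs", which the formula suggests but the message does not state. With 4M taken as 4 MiB the bound comes out near 6.8 GiB, close to the 6.9G quoted.

```cpp
#include <cstdint>

constexpr double kGiB = 1024.0 * 1024.0 * 1024.0;

// Upper bound on bytes pinned by cached object contexts on one OSD,
// following the estimate pgs / (k + m) * cache_count * stripe_size.
constexpr double pinned_upper_bound_bytes(double pgs_per_osd,
                                          unsigned ec_k, unsigned ec_m,
                                          unsigned cache_count,
                                          std::uint64_t stripe_bytes) {
    // Assumption: only PGs for which this OSD is primary hold a cache,
    // and that is roughly 1/(k+m) of the PGs mapped to the OSD.
    return pgs_per_osd / (ec_k + ec_m)
         * cache_count
         * static_cast<double>(stripe_bytes);
}
```

Calling `pinned_upper_bound_bytes(300, 8, 3, 64, 4ull << 20) / kGiB` gives the figure for the configuration above; because the bound is linear in the stripe size, a 16M stripe yields four times as much.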
> 
> 
> Thanks,
> Aleksei
> 
> 
> On 06/29/2018 03:30 AM, Gregory Farnum wrote:
> > Can you talk more about how you identified this as an issue and came
> > up with the potential solutions you've identified?
> > 
> > Naively, if I'm told that larger objects make the OSD take up more
> > memory, it sounds to me like the OSD is probably providing more
> > throughput, and that if you want it to use up less memory you just
> > ought to change the amount of outstanding IO it lets in to the system.
> > -Greg
> > 
> > On Thu, Jun 28, 2018 at 1:29 AM, Aleksei Gutikov
> > <aleksey.gutikov@xxxxxxxxxx> wrote:
> > > 
> > > NOTE: rgw_max_chunk_size must be equal to rgw_obj_stripe_size, so I mean
> > > both when I refer to either one.
> > > 
> > > For example, when I changed rgw_obj_stripe_size from 4M to 16M, OSD memory
> > > usage increased approximately 2.5 times.
> > > This issue was reproduced with erasure-coded pools.
> > > 
> > > The OSD command dump_mempools shows that only the anon pool bytes increased.
> > > 
> > > Further investigation shows that the whole buffer::raw object received
> > > from the network
> > > (created in alloc_aligned_buffer() in AsyncConnection.cc:623)
> > > is kept alive: the whole 4M or 16M buffer::raw objects are preserved with
> > > nref>0 in PrimaryLogPG::object_contexts,
> > > in ObjectContext::attr_cache.
> > > 
> > > This issue was reproduced on both luminous and master branches.
> > > 
> > > 
> > > I see at least two types of improvement:
> > > 
> > > 1) memcpy relatively small parts of a buffer::raw when creating a new
> > > buffer::ptr
> > >    For example, the following compile-time configuration parameters:
> > >      BUFFER_MIN_SIZE_COPY_FROM = 64k
> > >      BUFFER_MAX_SIZE_TO_COPY = 16k
> > >      BUFFER_MIN_RATIO_TO_COPY = 128
> > >    will copy up to 512 bytes from a 64k raw object,
> > >    will copy up to 16k from a 4M raw object,
> > >    and will not copy from a 63k raw object.
> > > 
> > >    Pros: will improve all issues of this type (preservation of buffer::raw
> > > objects)
> > >    Cons: unknown impact, memory fragmentation for example
> > > 
> > > 2) Improvements related specifically to PrimaryLogPG::object_contexts
> > > 
> > >    2.1) Set osd_pg_object_context_cache_count to 1 or 0
> > >      Cons: the cache will effectively not work
> > > 
> > >    2.2) Recreate the bufferlists of attr_cache entries when inserting into
> > > the cache, to copy the attrs and let the huge buffer be freed later.
> > >      Pros: minimal impact on any other subsystems
> > >      Cons: will improve only this particular case
> > > 
> > >    2.3) Limit object_contexts by total used memory in addition to
> > > osd_pg_object_context_cache_count.
> > >      Cons: the cache will probably not work, because each entry will
> > > occupy a lot of memory and all entries will be skipped.
> > > 
> > >    2.4) Remove object_contexts completely and create contexts on the fly
> > > every time.
> > >      Cons: object_contexts does not look like a spare part that can be
> > > safely removed
> > > 
> > > We tested osd_pg_object_context_cache_count=1 as a hotfix,
> > > and it reduced OSD memory usage significantly, independent of
> > > rgw_obj_stripe_size.
> > > 
> > > 
> > > Could somebody please clarify the purpose of
> > > PrimaryLogPG::object_contexts,
> > > and maybe suggest how to fix this issue?
> > > 
> > > 
> > > --
> > > 
> > > Best regards,
> > > Aleksei Gutikov
> > > Software Engineer | synesis.ru | Minsk. BY
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> 
> Best regards,
> Aleksei Gutikov
> Software Engineer | synesis.ru | Minsk. BY