Re: OSD memory consumption significantly increased with greater rgw_obj_stripe_size

On 07/03/2018 05:55 AM, Sage Weil wrote:
On Fri, 29 Jun 2018, Aleksei Gutikov wrote:
Throughput is 100% the same, just sliced into bigger chunks (rados objects).
And this throughput is not high, less than a single object per second. And
the memory stays occupied even after writing has stopped.

Currently I'm sure this is a side effect of sharing a buffer::raw object
among different buffer::ptr objects.
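
A minimal sketch of the mechanism (illustration only, using bufferlist's
public substr_of(); this is not the actual OSD code path):

    // Illustration: a tiny buffer::ptr slice pins the whole buffer::raw.
    #include "include/buffer.h"

    ceph::bufferlist make_small_slice() {
      ceph::bufferptr bp = ceph::buffer::create(4 * 1024 * 1024); // one 4 MiB raw
      ceph::bufferlist big;
      big.append(bp);                 // big references the raw

      ceph::bufferlist small;
      small.substr_of(big, 14, 44);   // zero-copy slice: shares the same raw (nref++)
      return small;                   // big goes away, but the 4 MiB raw is NOT
    }                                 // freed: the 44-byte slice still references it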

Please have a look at this dump of ObjectContext::attr_cache for one of the
contexts in PrimaryLogPG::object_contexts, made after uploading a single 4M
object via S3.
Notice the "_user.rgw.idtag" and "_user.rgw.tail_tag" xattrs: each is only 44
bytes long, yet each holds a 4194304-byte buffer::raw object (nref=2).

That is the smoking gun!  What version is this?

This particular dump is from 12.2.2,
but the issue was also reproducible on 12.2.5 and master.

Thanks!
sage




"_": buffer::list(len=302, buffer::ptr(0~302 0x559318e74d80 in raw
0x559318e74d80 len 488 nref 1) ),

"_user.rgw.acl": buffer::list(len=147, buffer::ptr(448~147 0x55931677c4c0 in
raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.content_type": buffer::list(len=25, buffer::ptr(616~25
0x55931677c568 in raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.etag": buffer::list(len=33, buffer::ptr(654~33 0x55931677c58e in
raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.idtag": buffer::list(len=44, buffer::ptr(14~44 0x55931958e00e in
raw 0x55931958e000 len 4194304 nref 2) ),

"_user.rgw.manifest": buffer::list(len=300, buffer::ptr(136~300 0x55931677c388
in raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.pg_ver": buffer::list(len=8, buffer::ptr(0~8 0x559319124000 in raw
0x559319124000 len 4008 nref 1) ),

"_user.rgw.source_zone": buffer::list(len=4, buffer::ptr(1122~4 0x55931677c762
in raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.tail_tag": buffer::list(len=44, buffer::ptr(75~44 0x55931958e04b in
raw 0x55931958e000 len 4194304 nref 2) ),

"_user.rgw.x-amz-content-sha256": buffer::list(len=65, buffer::ptr(716~65
0x55931677c5cc in raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.x-amz-date": buffer::list(len=17, buffer::ptr(800~17 0x55931677c620
in raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.x-amz-meta-s3cmd-attrs": buffer::list(len=173, buffer::ptr(848~173
0x55931677c650 in raw 0x55931677c300 len 1126 nref 9) ),

"_user.rgw.x-amz-storage-class": buffer::list(len=9, buffer::ptr(1049~9
0x55931677c719 in raw 0x55931677c300 len 1126 nref 9) ),

"snapset": buffer::list(len=35, buffer::ptr(0~35 0x559319127000 in raw
0x559319127000 len 4008 nref 1) )


Theoretically, with 300 PGs per OSD, EC 8+3,
osd_pg_object_context_cache_count=64,
and rgw_obj_stripe_size=4M,
this cache can consume up to 300/11 * 64 * 4M ≈ 6.9G
(the OSD acts as primary for roughly 1/11 of its 300 PGs),
just because of this side effect of the shared buffer::raw.
We do not see memory usage that high in practice only because rgw does not
set xattrs on all of the rados objects that make up a big S3 object.
But a synthetic test where every S3 object is 4M in size can easily hit
this bound.


Thanks,
Aleksei


On 06/29/2018 03:30 AM, Gregory Farnum wrote:
Can you talk more about how you identified this as an issue and came
up with the potential solutions you've proposed?

Naively, if I'm told that larger objects make the OSD take up more
memory, it sounds to me like the OSD is probably providing more
throughput, and that if you want it to use less memory you just
ought to reduce the amount of outstanding IO it lets into the system.
-Greg

On Thu, Jun 28, 2018 at 1:29 AM, Aleksei Gutikov
<aleksey.gutikov@xxxxxxxxxx> wrote:

NOTE: rgw_max_chunk_size must be equal to rgw_obj_stripe_size, so I mean
both when referring to either one.

For example, when I changed rgw_obj_stripe_size from 4M to 16M, OSD memory
usage increased approximately 2.5 times.
The issue was reproduced with erasure-coded pools.

The OSD command dump_mempools shows that only the anon pool's bytes increased.
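
For reference, the counters can be inspected via the admin socket (osd.0
here is just a placeholder id):

    ceph daemon osd.0 dump_mempools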

Further investigation shows that the whole buffer::raw object received from
the network
(created in alloc_aligned_buffer() in AsyncConnection.cc:623)
is kept alive: the entire 4M or 16M buffer::raw objects are preserved with
nref>0 in PrimaryLogPG::object_contexts,
inside ObjectContext::attr_cache.

This issue was reproduced on both luminous and master branches.


I see at least two types of improvement:

1) memcpy relatively small parts of a buffer::raw when creating a new
buffer::ptr.
    For example, with the following compile-time configuration parameters:
      BUFFER_MIN_SIZE_COPY_FROM = 64k
      BUFFER_MAX_SIZE_TO_COPY = 16k
      BUFFER_MIN_RATIO_TO_COPY = 128
    it would copy up to 512 bytes from a 64k raw object,
    copy up to 16k from a 4M raw object,
    and never copy from a 63k raw object
    (a sketch of this heuristic follows this item).

    Pros: would address all issues of this type (unwanted preservation of
large buffer::raw objects)
    Cons: unknown impact, e.g. memory fragmentation
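
For option 1, a minimal sketch of the proposed heuristic (the BUFFER_*
constants are the hypothetical compile-time parameters from above; this is
an illustration, not existing Ceph code):

    #include <algorithm>
    #include <cstddef>

    // Hypothetical compile-time knobs from the proposal above.
    static const size_t BUFFER_MIN_SIZE_COPY_FROM = 64 * 1024; // only consider raws >= 64k
    static const size_t BUFFER_MAX_SIZE_TO_COPY   = 16 * 1024; // never copy more than 16k
    static const size_t BUFFER_MIN_RATIO_TO_COPY  = 128;       // slice must be <= raw/128

    // Decide whether a new buffer::ptr over raw[off, off+len) should memcpy
    // its bytes into a private allocation instead of pinning the whole raw.
    bool should_copy_slice(size_t raw_len, size_t slice_len) {
      if (raw_len < BUFFER_MIN_SIZE_COPY_FROM)
        return false;  // small raw: sharing is cheap, keep zero-copy
      size_t copy_limit = std::min(BUFFER_MAX_SIZE_TO_COPY,
                                   raw_len / BUFFER_MIN_RATIO_TO_COPY);
      return slice_len <= copy_limit;  // tiny slice of a big raw: copy, don't pin
    }
    // should_copy_slice(64*1024, 512)          -> true  (64k/128 = 512)
    // should_copy_slice(4*1024*1024, 16*1024)  -> true  (min(16k, 32k) = 16k)
    // should_copy_slice(63*1024, 100)          -> false (raw below the 64k threshold)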

2) Improvements related specifically to PrimaryLogPG::object_contexts:

    2.1) Set osd_pg_object_context_cache_count to 1 or 0.
      Cons: the cache will effectively stop working.

    2.2) Recreate the bufferlists of attr_cache entries when inserting them
into the cache, so that the attrs are copied and the huge buffer can be
freed later (see the sketch after this list).
      Pros: minimal impact on any other subsystems
      Cons: improves only this particular case

    2.3) Limit object_contexts by total used memory, in addition to
osd_pg_object_context_cache_count.
      Cons: the cache will probably not work, because each entry will occupy
a lot of memory and all entries will be skipped.

    2.4) Remove object_contexts completely and create contexts on the fly
every time.
      Cons: object_contexts does not look like a spare part that can be
safely removed.
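
For option 2.2, a minimal sketch of what the copy could look like
(rebase_attrs is a hypothetical helper name; bufferlist::rebuild() performs
a similar re-allocation and could likely be used directly):

    #include <map>
    #include <string>
    #include "include/buffer.h"

    // Hypothetical helper: give every cached xattr a private, right-sized
    // copy of its bytes instead of a slice pinning the original 4M network raw.
    void rebase_attrs(std::map<std::string, ceph::bufferlist>& attr_cache) {
      for (auto& entry : attr_cache) {
        ceph::bufferlist copy;
        copy.append(entry.second.c_str(), entry.second.length()); // memcpy to a fresh raw
        entry.second.swap(copy);  // drop this entry's reference to the shared raw
      }
    }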


We tested osd_pg_object_context_cache_count=1 as a hotfix,
and it reduced OSD memory usage significantly, independent of
rgw_obj_stripe_size.
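
For reference, one way to apply such a hotfix (a ceph.conf override; an OSD
restart is assumed):

    [osd]
    osd_pg_object_context_cache_count = 1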


Could somebody please clarify the purpose of
PrimaryLogPG::object_contexts,
and maybe suggest an approach to fixing this issue?





--

Best regards,
Aleksei Gutikov
Software Engineer | synesis.ru | Minsk. BY


