mem use doubles due to buffer::create_page_aligned + bluestore obj content caching

Hi Cephers,

I've just created a ticket related to bluestore object content caching in particular and buffer::create_page_aligned in general.

But I'd like to share this information here as well, since the root cause seems to be pretty general.

Ticket URL:

http://tracker.ceph.com/issues/19198

Description:

When caching object content, BlueStore uses twice as much memory as it really needs for that amount of data.

The root cause seems to be in the buffer::create_page_aligned implementation. It results in the following call sequence:

new raw_posix_aligned()
  calling mempool::buffer_data::alloc_char.allocate_aligned(len, align);
    calling posix_memalign((void**)(void*)&ptr, align, total);

which in fact performs two allocations:

1) one for the raw_posix_aligned struct
2) one for the data itself (4096 bytes).

It looks like this sequence results in a 2 * 4096 byte allocation instead of sizeof(raw_posix_aligned) + alignment + 4096 bytes. The additional problem is that the mempool machinery is unable to account for such an overhead, and hence BlueStore cache cleanup doesn't work properly.

It's not clear to me why the allocator(s) behave that inefficiently for such a pattern, though.

The issue is reproducible under Ubuntu 16.04.1 LTS for both jemalloc and tcmalloc builds.
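
If it helps, below is a minimal standalone sketch (Linux only, not part of the ticket) that mimics the same pattern: one small heap allocation for the control struct plus one page-aligned 4 KB allocation per object, reading VmRSS from /proc/self/status before and after. If the same overhead applies, the RSS delta should come out at roughly twice the payload.

#include <cstdio>
#include <cstdlib>
#include <vector>

// mimics raw_posix_aligned: a small control struct owning one aligned page
struct holder {
  void* data = nullptr;
  holder() {
    if (::posix_memalign(&data, 0x1000, 0x1000) == 0)
      *static_cast<char*>(data) = 0;   // touch the page
  }
  ~holder() { ::free(data); }
};

// crude VmRSS reader (Linux-specific)
static long vmrss_kb() {
  FILE* f = ::fopen("/proc/self/status", "r");
  if (!f) return -1;
  char line[256];
  long kb = -1;
  while (::fgets(line, sizeof(line), f))
    if (::sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
  ::fclose(f);
  return kb;
}

int main() {
  const size_t count = 0x40000;        // 1 GB of payload; scale up as needed
  long before = vmrss_kb();
  std::vector<holder*> allocs(count);
  for (size_t i = 0; i < count; ++i)
    allocs[i] = new holder();
  long after = vmrss_kb();
  std::printf("payload: %zu kB, RSS delta: %ld kB\n",
              count * 0x1000 / 1024, after - before);
  for (auto* h : allocs) delete h;
  return 0;
}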


The ticket contains a patch to reproduce the issue inside Ceph itself; one can see that for 16 GB of content, system memory usage tends to be ~32 GB.

The patch first allocates 4K pages 0x400000 times using:

...

+  size_t alloc_count = 0x400000; // allocate 16 Gb total
+  allocs.resize(alloc_count);
+  for( auto i = 0u; i < alloc_count; ++i) {
+    bufferptr p = buffer::create_page_aligned(bsize);
+    bufferlist* bl = new bufferlist;
+    bl->append(p);
+    *(bl->c_str()) = 0; // touch the page to increment system mem use

...

then does the same, mimicking the create_page_aligned() implementation:

+  struct fake_raw_posix_aligned{
+    char stub[8];
+    void* data;
+    fake_raw_posix_aligned() {
+      ::posix_memalign(&data, 0x1000, 0x1000); // mempool::buffer_data::alloc_char.allocate_aligned(0x1000, 0x1000);
+      *((char*)data) = 0; // touch the page
+    }
+    ~fake_raw_posix_aligned() {
+      ::free(data);
+    }
+  };
+  vector <fake_raw_posix_aligned*> allocs2;

+  allocs2.resize(alloc_count);
+  for( auto i = 0u; i < alloc_count; ++i) {
+    allocs2[i] = new fake_raw_posix_aligned();
...

The output shows ~32 GB usage in both cases.

Mem before: VmRSS: 45232 kB
Mem after: VmRSS: 33599524 kB
Mem actually used: 33554292 kB
Mem pool reports: 16777216 kB
Mem before2: VmRSS: 2161412 kB
Mem after2: VmRSS: 33632268 kB
Mem actually used: 32226156544 bytes
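
To spell out the arithmetic: 0x400000 allocations * 4096 bytes = 16,777,216 kB, which is exactly what the mempool reports, while the VmRSS delta in the first run (33,599,524 - 45,232 = 33,554,292 kB) is almost exactly twice that. The second run is similar: 33,632,268 - 2,161,412 = 31,470,856 kB, i.e. the 32,226,156,544 bytes shown above.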


In general there are two issues here:
1) Doubled memory usage
2) mempool is unaware of such an overhead and miscalculates the actual memory usage.

There is probably a way to resolve 2) by forcing the use of raw_combined::create() in buffer::create_page_aligned and tuning the mempool calculation to take page alignment into account. But I'd like to get some comments/thoughts first...
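
To illustrate the general idea (this is just a rough sketch, not the actual raw_combined::create() code): the control structure lives in the tail of the same page-aligned allocation as the data, so the allocator sees a single request and the mempool can account for the full rounded size, including the alignment padding.

#include <cstddef>
#include <cstdlib>
#include <new>

// sketch only: data at the front (page aligned), control struct in the tail
struct combined {
  char* data;
  size_t len;

  static combined* create(size_t len, size_t align = 0x1000) {
    // round data + control struct up to the alignment
    size_t total = (len + sizeof(combined) + align - 1) & ~(align - 1);
    void* ptr = nullptr;
    if (::posix_memalign(&ptr, align, total) != 0)
      throw std::bad_alloc();
    char* base = static_cast<char*>(ptr);
    combined* c = new (base + total - sizeof(combined)) combined;
    c->data = base;
    c->len = len;
    return c;
  }

  static void destroy(combined* c) {
    void* base = c->data;
    c->~combined();
    ::free(base);
  }
};

With something like this, the mempool could charge 'total' for the whole object, so cache trimming would see the real footprint.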


Thanks,
Igor




