Re: radosgw: scrub causing slow requests in the md log

On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> On 06/22/2017 04:00 AM, Dan van der Ster wrote:
>>
>> I'm now running the three relevant OSDs with that patch. (Recompiled,
>> replaced /usr/lib64/rados-classes/libcls_log.so with the new version,
>> then restarted the osds).
>>
>> It's working quite well, trimming 10 entries at a time instead of
>> 1000, and no more timeouts.
>>
>> Do you think it would be worth decreasing this hardcoded value in ceph
>> proper?
>>
>> -- Dan
>
>
> I do, yeah. At least, the trim operation should be able to pass in its own
> value for that. I opened a ticket for that at
> http://tracker.ceph.com/issues/20382.
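
(For context, a minimal sketch of why a single trim op is so heavy: the
cls-side handler deletes entries one omap key at a time, up to a
hardcoded cap -- 1000 as shipped, 10 with the test patch above. This
paraphrases src/cls/log/cls_log.cc from memory, so names and signatures
may not match any given release.)

    #include <cerrno>
    #include <map>
    #include <string>
    #include "objclass/objclass.h"  // in-tree cls SDK; signatures vary by release

    static const uint64_t MAX_TRIM_ENTRIES = 1000;  // the hardcoded value at issue

    // Simplified trim handler: every trimmed entry becomes its own omap
    // key deletion, so one trim op can queue up to MAX_TRIM_ENTRIES
    // leveldb deletes while the op thread waits on them.
    static int log_trim_sketch(cls_method_context_t hctx)
    {
      std::map<std::string, ceph::bufferlist> entries;
      int r = cls_cxx_map_get_vals(hctx, "", "", MAX_TRIM_ENTRIES, &entries);
      if (r < 0)
        return r;
      if (entries.empty())
        return -ENODATA;  // nothing left to trim
      for (const auto& kv : entries) {
        r = cls_cxx_map_remove_key(hctx, kv.first);  // one delete per entry
        if (r < 0)
          return r;
      }
      return 0;
    }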
>
> I'd also like to investigate using the ObjectStore's OP_OMAP_RMKEYRANGE
> operation to trim a range of keys in a single osd op, instead of generating
> a different op for each key. I have a PR that does this at
> https://github.com/ceph/ceph/pull/15183. But it's still hard to guarantee
> that leveldb can process the entire range inside of the suicide timeout.
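
(A sketch of that range-based alternative. The helper name
cls_cxx_map_remove_range is an assumption -- the PR may plumb
OP_OMAP_RMKEYRANGE differently -- and since leveldb has no range
tombstones, the per-key deletes just move inside the ObjectStore, hence
the suicide-timeout worry.)

    #include <string>
    #include "objclass/objclass.h"

    // One osd op replaces the whole per-key loop. The assumed helper maps
    // to the ObjectStore's OP_OMAP_RMKEYRANGE; underneath, the backend
    // still walks the range and deletes each leveldb key individually.
    static int log_trim_range_sketch(cls_method_context_t hctx,
                                     const std::string& from_key,
                                     const std::string& to_key)
    {
      return cls_cxx_map_remove_range(hctx, from_key, to_key);
    }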

I wonder if that would help. Here's what I've learned today:

  * Two of the three relevant OSDs have something screwy with their
leveldb. The primary and third replica trim quickly for only a few
hundred keys before bogging down, whilst the second OSD is always fast.
  * After manually compacting the two slow OSDs, they are fast again
for just a few hundred trims. So I'm now compacting, trimming,
compacting, ... in a loop (see the sketch after this list).
  * I moved the omaps to SSDs -- it doesn't help. (iostat confirms this
is not IO bound.)
  * CPU utilization on the slow OSDs gets quite high during the slow trimming.
  * perf top output is below [1]. leveldb::Block::Iter::Prev and
leveldb::InternalKeyComparator::Compare are notable.
  * The always-fast OSD shows no leveldb functions in perf top while trimming.
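
A minimal client-side sketch of that compact/trim cycle, assuming an
on-demand OSD "compact" command reachable via librados' osd_command
(release-dependent) and the in-tree cls_log_trim wrapper from
cls/log/cls_log_client.h:

    #include <string>
    #include <rados/librados.hpp>
    #include "cls/log/cls_log_client.h"  // client wrapper for the "log" cls
    #include "include/utime.h"

    int compact_trim_loop(librados::Rados& cluster, librados::IoCtx& ioctx,
                          const std::string& oid, int osd_id,
                          const utime_t& from_time, const utime_t& to_time)
    {
      ceph::bufferlist inbl, outbl;
      std::string outs;
      while (true) {
        // compact the OSD's leveldb so the next batch of trims is fast again
        int r = cluster.osd_command(osd_id, "{\"prefix\": \"compact\"}",
                                    inbl, &outbl, &outs);
        if (r < 0)
          return r;
        // a few hundred trim ops fit in before the OSD bogs down again
        for (int i = 0; i < 200; i++) {
          librados::ObjectWriteOperation op;
          cls_log_trim(op, from_time, to_time, "", "");  // ~10 entries per op with the patch
          r = ioctx.operate(oid, &op);
          if (r == -ENODATA)
            return 0;  // log fully trimmed
          if (r < 0)
            return r;
        }
      }
    }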

I've tried bigger leveldb cache and block sizes, compression on and
off, and played with the bloom size up to 14 bits -- none of these
changes make any difference.
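
For reference, those knobs map onto leveldb's Options struct (ceph
exposes them as leveldb_* config settings). A standalone illustration
only -- the path is a placeholder, and a running OSD's omap directory
can't be opened by a second process:

    #include <leveldb/cache.h>
    #include <leveldb/db.h>
    #include <leveldb/filter_policy.h>

    int main()
    {
      leveldb::Options options;
      options.block_cache = leveldb::NewLRUCache(512 * 1048576);  // bigger cache
      options.block_size = 64 * 1024;                             // bigger blocks
      options.compression = leveldb::kSnappyCompression;          // or kNoCompression
      options.filter_policy = leveldb::NewBloomFilterPolicy(14);  // bloom bits per key
      leveldb::DB* db = nullptr;
      leveldb::Status s = leveldb::DB::Open(options, "/path/to/omap", &db);
      if (!s.ok())
        return 1;
      delete db;
      delete options.block_cache;
      delete options.filter_policy;
      return 0;
    }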

At this point I'm not confident this trimming will ever complete --
there are ~20 million records to remove, and at maybe 1 Hz that's the
better part of a year.

How about I just delete the meta.log object? Would this use a
different, perhaps quicker, code path to remove those omap keys?
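
In client terms that shortcut is a single object removal, which deletes
the object head and its omap together in one osd op. Whether the
backend can then drop ~20 million leveldb keys without tripping the
same timeouts is exactly the question -- a sketch, with a hypothetical
oid since the real object name isn't given above:

    #include <string>
    #include <rados/librados.hpp>

    int delete_mdlog_object(librados::IoCtx& ioctx, const std::string& oid)
    {
      // one osd op: the object and all of its omap keys are removed together
      return ioctx.remove(oid);
    }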

Thanks!

Dan

[1]

   4.92%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.] 0x0000000000023e8d
   4.47%  libc-2.17.so                             [.] __memcmp_sse4_1
   4.13%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.] 0x00000000000273bb
   3.81%  libleveldb.so.1.0.7                      [.] leveldb::Block::Iter::Prev
   3.07%  libc-2.17.so                             [.] __memcpy_ssse3_back
   2.84%  [kernel]                                 [k] port_inb
   2.77%  libstdc++.so.6.0.19                      [.] std::string::_M_mutate
   2.75%  libstdc++.so.6.0.19                      [.] std::string::append
   2.53%  libleveldb.so.1.0.7                      [.] leveldb::InternalKeyComparator::Compare
   1.32%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.] 0x0000000000023e77
   0.85%  [kernel]                                 [k] _raw_spin_lock
   0.80%  libleveldb.so.1.0.7                      [.] leveldb::Block::Iter::Next
   0.77%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.] 0x0000000000023a05
   0.67%  libleveldb.so.1.0.7                      [.] leveldb::MemTable::KeyComparator::operator()
   0.61%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.] 0x0000000000023a09
   0.58%  libleveldb.so.1.0.7                      [.] leveldb::MemTableIterator::Prev
   0.51%  [kernel]                                 [k] __schedule
   0.48%  libruby.so.2.1.0                         [.] ruby_yyparse