Re: Bluestore performance degradation for the latest master

IMHO one bug is at

bool BlueStore::ExtentMap::update(Onode *o, KeyValueDB::Transaction t,
                                  bool force)
{

...

      if (!force && len > g_conf->bluestore_extent_map_shard_max_size) {
>>>>        inline_bl.clear();
        return true;
      }

inline_bl should be preserved for proper functioning at

void BlueStore::ExtentMap::reshard(Onode *o, uint64_t min_alloc_size)
...

  unsigned bytes = 0;
  if (o->onode.extent_map_shards.empty()) {
>>>    bytes = inline_bl.length();

Commenting out the inline_bl.clear() call seems to be enough in this case.
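
To make the interaction concrete, here is a minimal toy model of the two code paths. It is only a sketch: ToyExtentMap, shard_max_size, clear_on_overflow and bytes_seen_by_reshard are hypothetical names invented for illustration, a std::string stands in for the bufferlist, and nothing here is the real BlueStore code. It only shows that a reshard which sizes its input from inline_bl sees 0 bytes once update() has cleared the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for BlueStore::ExtentMap; NOT the real class.
struct ToyExtentMap {
  std::string inline_bl;           // stands in for the encoded inline extent map
  std::vector<int> shards;         // stands in for onode.extent_map_shards
  std::size_t shard_max_size = 8;  // stands in for bluestore_extent_map_shard_max_size

  // Same shape as the update() snippet: returns true when a reshard is wanted.
  bool update(const std::string& encoded, bool force, bool clear_on_overflow) {
    inline_bl = encoded;
    if (!force && inline_bl.size() > shard_max_size) {
      if (clear_on_overflow) {
        inline_bl.clear();  // the line under suspicion
      }
      return true;
    }
    return false;
  }

  // Same shape as the start of reshard(): the byte count comes from inline_bl
  // when there are no shards yet.
  std::size_t reshard_input_bytes() const {
    std::size_t bytes = 0;
    if (shards.empty()) {
      bytes = inline_bl.size();
    }
    return bytes;
  }
};

// Runs update() on a 100-byte encoding and reports what reshard would see.
std::size_t bytes_seen_by_reshard(bool clear_on_overflow) {
  ToyExtentMap em;
  bool need_reshard = em.update(std::string(100, 'x'), /*force=*/false,
                                clear_on_overflow);
  return need_reshard ? em.reshard_input_bytes() : 0;
}
```

With clear_on_overflow set, the toy reshard is asked to run but is handed a zero byte count; without it, the 100 encoded bytes are still visible.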

Second suspicious place is two lines above:

bool BlueStore::ExtentMap::update(Onode *o, KeyValueDB::Transaction t,
                                  bool force)
{

...

    if (inline_bl.length() == 0) {
      unsigned n;
      if (encode_some(0, OBJECT_MAX_SIZE, inline_bl, &n)) {
        return true;
      }

Looks like encode_some might return true and leave inline_bl empty...
Still thinking how to fix that...
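
A small sketch of the hazard, under loud assumptions: ToyExtent, toy_encode_some, toy_update and run_toy_update are hypothetical, and the bail-out condition (an extent flagged as spanning a shard boundary) is invented just to trigger an early return. The real encode_some may bail out for different reasons. The point is only that "returned true" and "buffer is non-empty" are independent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; the names only echo the snippet above.
struct ToyExtent {
  std::size_t len;
  bool spans_shard_boundary;  // invented trigger for the early return
};

// Returns true to request a reshard. It can return before appending
// anything to `out`, which is the suspected behaviour.
bool toy_encode_some(const std::vector<ToyExtent>& extents, std::string& out) {
  for (const ToyExtent& e : extents) {
    if (e.spans_shard_boundary) {
      return true;  // early exit: `out` may still be empty here
    }
    out.append(e.len, 'x');  // pretend each extent encodes to e.len bytes
  }
  return false;
}

// Same shape as the caller: encode only when the buffer is empty and
// propagate the reshard request.
bool toy_update(const std::vector<ToyExtent>& extents, std::string& inline_buf) {
  if (inline_buf.empty()) {
    if (toy_encode_some(extents, inline_buf)) {
      return true;
    }
  }
  return false;
}

struct ToyResult {
  bool reshard;
  std::size_t buffered;
};

// Convenience wrapper: runs toy_update on a fresh buffer.
ToyResult run_toy_update(const std::vector<ToyExtent>& extents) {
  std::string buf;
  bool reshard = toy_update(extents, buf);
  return ToyResult{reshard, buf.size()};
}
```

If the very first extent triggers the early return, the caller is told to reshard while the buffer holds nothing, so any later code that reads the buffer length sees 0.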

Thanks,
Igor

On 21.10.2016 18:29, Somnath Roy wrote:
Hi Sage,
I am profiling yesterday's master and seeing significant performance degradation for both larger-block sequential writes and 4K RW.

See the write amplification (~5X) it introduces for 4K RW (with min_alloc_size = 4K):

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  52   8  32   6   0   2|  98M  504M| 122M   81M|   0     0 | 318k  361k
  53   8  31   6   0   2|  91M  517M| 123M   83M|   0     0 | 358k  357k
  53   8  30   6   0   2|  90M  572M| 123M   81M|   0     0 | 339k  348k
  53   8  30   6   0   2|  95M  560M| 123M   81M|   0     0 | 321k  354k
  53   8  31   6   0   2|  97M  509M| 125M   82M|   0     0 | 315k  365k
  53   8  30   6   0   2|  87M  567M| 122M   81M|   0     0 | 322k  355k
  53   9  30   6   0   2|  92M  547M| 122M   81M|   0     0 | 345k  361k
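
One rough way to put a number on the table above is to average the disk "writ" column against the network "recv" column. This is only a sketch: treating recv as a proxy for incoming write traffic is an assumption (the message does not give the client-side throughput or the baseline behind the ~5X figure), and DstatRow, dstat_rows and write_to_recv_ratio are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One dstat sample: disk write and network receive throughput, as printed
// in the "writ" and "recv" columns (values suffixed with M in the output).
struct DstatRow {
  double disk_write_m;
  double net_recv_m;
};

// The seven samples copied from the dstat output above.
std::vector<DstatRow> dstat_rows() {
  return {{504, 122}, {517, 123}, {572, 123}, {560, 123},
          {509, 125}, {567, 122}, {547, 122}};
}

// Ratio of total disk writes to total network receive over the samples.
// ASSUMPTION: recv approximates incoming write traffic on this node.
double write_to_recv_ratio(const std::vector<DstatRow>& rows) {
  double written = 0.0;
  double received = 0.0;
  for (const DstatRow& r : rows) {
    written += r.disk_write_m;
    received += r.net_recv_m;
  }
  return received > 0.0 ? written / received : 0.0;
}
```

On these seven samples the ratio comes out a little above 4, which is the same order as the amplification reported, though it is not a reproduction of that figure.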

I think the reason is that the extent sharding logic is broken. See the transaction we are writing after a 10 min run (4K RW on a 100G image after 1M preconditioning). The onode value is ~18K, and there is no shard key:

2016-10-21 08:24:35.666549 7fd3ac701700 30 submit_transaction Rocksdb transaction:
Put( Prefix = M key = 0x0000000000000490'.0000000016.00000000000000019834' Value size = 182)
Put( Prefix = M key = 0x0000000000000490'._fastinfo' Value size = 186)
Put( Prefix = O key = 0x7f800000000000000139748c93217262'd_data.10156b8b4567.000000000000324c!='0xfffffffffffffffeffffffffffffffff'o' Value size = 18616)
Merge( Prefix = b key = 0x0000004590180000 Value size = 16)
Merge( Prefix = b key = 0x00000046df380000 Value size = 16)

Thanks & Regards
Somnath
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
