Re: storing pg logs outside of rocksdb

On 06/20/2018 08:30 AM, Ilya Dryomov wrote:
> On Wed, Jun 20, 2018 at 1:58 PM Mark Nelson <mark.a.nelson@xxxxxxxxx> wrote:

>> Hi Lisa,


>> On 06/20/2018 03:19 AM, xiaoyan li wrote:
>>> Hi all,
>>> I wrote a PoC that splits the pglog out of RocksDB and stores it in
>>> standalone space on the block device.

>> Excellent!  This is very exciting!

>>> The updates are in the OSD and in BlueStore:
>>>
>>> OSD parts:
>>> 1. Split the pglog entries and pglog info out of the omaps.
>>> BlueStore:
>>> 1. Allocate 16 MB of space in the block device per PG for storing the pglog.
>>> 2. For every transaction from the OSD, combine the pglog entries and
>>> pglog info and write them into a block. The block size is set to 4 KB
>>> at the moment.
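
To make the scheme concrete, here is a minimal sketch of the layout described in that list. All of the names (PgLogRegion, append, the constants) are hypothetical illustrations and do not come from the PoC branch:

// Hypothetical sketch of the scheme above: each PG gets a 16 MB region
// on the block device, carved into fixed 4 KB blocks, and each OSD
// transaction writes its serialized pglog entries plus pglog info into
// one block. The vector is an in-memory stand-in for the on-disk region.
#include <array>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr size_t BLOCK_SIZE    = 4096;             // one block per transaction
constexpr size_t REGION_SIZE   = 16 * 1024 * 1024; // 16 MB reserved per PG
constexpr size_t BLOCKS_PER_PG = REGION_SIZE / BLOCK_SIZE;

struct PgLogRegion {
  std::vector<std::array<uint8_t, BLOCK_SIZE>> blocks{BLOCKS_PER_PG};
  size_t next = 0;  // next block to write; wraps once old entries are trimmed

  // Combine one transaction's already-serialized pglog entries and pglog
  // info and write them into a single 4 KB block (zero-padded to the end).
  void append(const std::vector<uint8_t>& entries,
              const std::vector<uint8_t>& info) {
    if (entries.size() + info.size() > BLOCK_SIZE)
      throw std::length_error("pglog record does not fit in one block");
    auto& blk = blocks[next % BLOCKS_PER_PG];
    blk.fill(0);
    std::copy(entries.begin(), entries.end(), blk.begin());
    std::copy(info.begin(), info.end(), blk.begin() + entries.size());
    ++next;
  }
};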

>>> Currently, I have only made the write workflow work.
>>> With librbd+fio on a cluster with one OSD (on an Intel Optane 370G), I
>>> got the following performance for 4 KB random writes; performance
>>> improved by 13.87%.

>>> Master:
>>>     write: IOPS=48.3k, BW=189MiB/s (198MB/s)(55.3GiB/300009msec)
>>>       slat (nsec): min=1032, max=1683.2k, avg=4345.13, stdev=3988.69
>>>       clat (msec): min=3, max=123, avg=10.60, stdev= 8.31
>>>        lat (msec): min=3, max=123, avg=10.60, stdev= 8.31
>>>
>>> Pgsplit branch:
>>>     write: IOPS=55.0k, BW=215MiB/s (225MB/s)(62.0GiB/300010msec)
>>>       slat (nsec): min=1068, max=1339.7k, avg=4360.58, stdev=3878.47
>>>       clat (msec): min=2, max=120, avg= 9.30, stdev= 6.92
>>>        lat (msec): min=2, max=120, avg= 9.31, stdev= 6.92

>> These are better numbers than I typically get!  I'll play with your
>> branch, but usually I see us pegged in this workload in the
>> kv_sync_thread.  Did you notice any significant change in CPU consumption?


>>> Here is the PoC: https://github.com/lixiaoy1/ceph/commits/pglog-split-fastinfo
>>> The problem is that for every transaction I use a 4 KB block to save
>>> the pglog entries and pglog info, which together are only 130+920 = 1050
>>> bytes. This wastes a lot of space.
>>> Any suggestions?

>> I guess 100*3000*4k = ~1.2GB?

> What is 100 for?  An estimate of the number of PGs on an OSD?

Yes, exactly. I think there's a reasonable argument, though, that 100 PGs per OSD and a log length of 3000 aren't a satisfactory long-term target.
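
Spelling out the arithmetic behind that estimate, using only numbers already in the thread (a 4 KB block per retained log entry, a log length of 3000 per PG, ~100 PGs per OSD):

    100 PGs * 3000 entries * 4096 bytes ≈ 1.2 GB per OSD

Each block carries only about 1050 useful bytes out of 4096, so roughly 74% of that footprint is padding. It also suggests where the 16 MB per-PG reservation comes from: 3000 entries * 4 KB ≈ 12 MB, rounded up with headroom.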

Mark


> Thanks,
>
>                  Ilya
