Re: storing pg logs outside of rocksdb

On Wed, Jun 20, 2018 at 1:58 PM Mark Nelson <mark.a.nelson@xxxxxxxxx> wrote:
>
> Hi Lisa,
>
>
> On 06/20/2018 03:19 AM, xiaoyan li wrote:
> >   Hi all,
> > I wrote a PoC that splits the pglog out of RocksDB and stores it in a
> > standalone space on the block device.
>
> Excellent!  This is very exciting!
>
> > The updates are done in the OSD and BlueStore:
> >
> > OSD parts:
> > 1. Split the pglog entries and pglog info out of the omaps.
> > BlueStore parts:
> > 1. Allocate a 16M region in the block device per PG for storing the pglog.
> > 2. For every transaction from the OSD, combine the pglog entries and
> > pglog info and write them into a single block (sketched below). The
> > block size is set to 4k at the moment.
> >
> > Currently, only the write workflow is implemented.
> > With librbd+fio on a single-OSD cluster (OSD on an Intel Optane 370G),
> > I got the following results for 4k random writes: a 13.87% improvement.
> >
> > Master:
> >    write: IOPS=48.3k, BW=189MiB/s (198MB/s)(55.3GiB/300009msec)
> >      slat (nsec): min=1032, max=1683.2k, avg=4345.13, stdev=3988.69
> >      clat (msec): min=3, max=123, avg=10.60, stdev= 8.31
> >       lat (msec): min=3, max=123, avg=10.60, stdev= 8.31
> >
> > Pgsplit branch:
> >    write: IOPS=55.0k, BW=215MiB/s (225MB/s)(62.0GiB/300010msec)
> >      slat (nsec): min=1068, max=1339.7k, avg=4360.58, stdev=3878.47
> >      clat (msec): min=2, max=120, avg= 9.30, stdev= 6.92
> >       lat (msec): min=2, max=120, avg= 9.31, stdev= 6.92
>
> These are better numbers than I typically get!  I'll play with your
> branch, but in this workload I usually see us pegged in the
> kv_sync_thread.  Did you notice any significant change in CPU consumption?
>
> >
> > Here is the POC: https://github.com/lixiaoy1/ceph/commits/pglog-split-fastinfo
> > The problem is that for every transaction I use a whole 4k block to
> > save the pglog entries and pglog info, which together are only
> > 130 + 920 = 1050 bytes. This wastes a lot of space.
> > Any suggestions?
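(To quantify the waste: 1050 B of payload in a 4096 B block is ~26%
utilization, so roughly 3 KiB of every block is zero padding, and a fully
written 16M region carries only ~4 MB of actual log data.)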
>
> I guess 100*3000*4k = ~1.2GB?

What does the 100 stand for?  An estimate of the number of PGs on an OSD?
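(If so, the estimate would read: 100 PGs * 3000 retained log entries per
PG * 4 KiB per entry block = ~1.2 GB of pglog space per OSD.)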

Thanks,

                Ilya


