Re: storing pg logs outside of rocksdb

On 04/01/2018 10:29 PM, xiaoyan li wrote:
Hi all,

Based on your above discussion about pglog, I have the following rough
design. Please help to give your suggestions.

There will be three partitions: a raw partition for customer IOs, Bluefs
for Rocksdb, and a pglog partition.
The first two partitions are the same as today. The pglog partition is
split into 1M blocks, and we allocate blocks to a ring buffer per pg.
We will keep the following data:

This isn't relevant for prototyping, but a partition used only for
pg logs shouldn't be another piece an admin needs to set up. Since
this optimization is only applicable to all-flash scenarios, bluestore
can hide the internal structure and allocate the separate pg log space
itself, within the data device.

Allocation bitmap (just in memory)

The pglog partition has a bitmap recording which blocks are allocated.
We can rebuild it from pg->allocated_blocks_list at startup, so there
is no need to persist it on disk. We will, however, store basic
information about the pglog partition in Rocksdb, such as block size
and number of blocks, when the objectstore is initialized.
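
As a rough illustration of the rebuild step, here is a minimal Python
sketch. It assumes the per-pg block lists have already been read back
from Rocksdb into a dict; the function name and the dict layout are
hypothetical and are not Bluestore code.

```python
def rebuild_bitmap(num_blocks, pg_blocks):
    """Rebuild the in-memory allocation bitmap from per-pg block lists.

    num_blocks: total blocks in the pglog area (assumed to be stored in Rocksdb).
    pg_blocks:  hypothetical mapping of pg id -> list of allocated block indexes.
    """
    bitmap = [False] * num_blocks
    for pg_id, blocks in pg_blocks.items():
        for block in blocks:
            if not 0 <= block < num_blocks:
                raise ValueError(f"pg {pg_id}: block {block} out of range")
            if bitmap[block]:
                raise ValueError(f"pg {pg_id}: block {block} claimed twice")
            bitmap[block] = True
    return bitmap
```

Rejecting a block that is claimed by two pgs is an extra consistency
check added for the sketch, not something the proposal specifies.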

Pg -> allocated_blocks_list

When a pg is created and IOs start, we allocate one block for the pg.
Every pglog entry is less than 300 bytes, so a 1M block can store at
least 3495 entries. When the number of pglog entries for a pg exceeds
that, we add a new block to the pg.
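
The capacity figure can be checked with a short Python sketch. It
assumes each entry occupies a fixed 300-byte slot (the upper bound
given above) and that 1M means 1 MiB; the constant and function names
are hypothetical.

```python
BLOCK_SIZE = 1 << 20          # 1 MiB per pglog block
MAX_ENTRY_SIZE = 300          # assumed fixed slot size, in bytes
ENTRIES_PER_BLOCK = BLOCK_SIZE // MAX_ENTRY_SIZE   # 1048576 // 300 = 3495


def blocks_needed(num_entries):
    """Blocks a pg needs for num_entries; every pg keeps at least one block."""
    # Ceiling division without floating point.
    return max(1, -(-num_entries // ENTRIES_PER_BLOCK))
```

With these assumptions a pg stays on one block up to 3495 entries and
needs a second block at entry 3496.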

Pg->start_position

Record the oldest valid entry per pg.

Pg->next_position

Record the position of the next entry to add per pg. This value will
be updated frequently, but Rocksdb is suitable for that io pattern, and
most of the data will be merged.
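
To make the two positions concrete, here is a minimal in-memory Python
sketch of one pg's ring buffer. It assumes a fixed number of slots and
monotonically increasing positions mapped to slots by modulo; growing
the ring with an extra block is left out. The class and method names
are hypothetical.

```python
class PgLogRing:
    """Toy ring buffer tracking start (oldest valid) and next (append) positions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.start = 0   # position of the oldest valid entry
        self.next = 0    # position where the next entry will be written

    def __len__(self):
        return self.next - self.start

    def append(self, entry):
        if len(self) == self.capacity:
            raise BufferError("ring full: trim old entries or add a block")
        self.slots[self.next % self.capacity] = entry
        self.next += 1
        return self.next   # the value that would be persisted as next_position

    def trim(self, new_start):
        if not self.start <= new_start <= self.next:
            raise ValueError("new_start outside the valid range")
        self.start = new_start

    def entries(self):
        return [self.slots[p % self.capacity] for p in range(self.start, self.next)]
```

Only start and next need to be persisted in this model; the entries
themselves live in the ring.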

Updated Bluestore write process:

When writing data to disk (before the metadata update), we can append
the pglog entry to its ring buffer in parallel.
After that, we submit the pg ring buffer changes, such as
pg->next_position, together with the other pending metadata changes
to Rocksdb.
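
The ordering described above can be sketched in Python as follows.
This is only an illustration of the sequencing, assuming three
caller-supplied callbacks (all hypothetical): one that writes the
data, one that appends the pglog entry and returns the new next
position, and one that commits a metadata batch.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_write(write_data, append_pglog, commit_kv):
    """Run the data write and pglog append in parallel, then commit metadata."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_future = pool.submit(write_data)
        log_future = pool.submit(append_pglog)
        data_future.result()                  # propagate any data write error
        next_position = log_future.result()   # new append position for the pg
    # Both writes have completed; only now is the metadata made visible.
    commit_kv({"next_position": next_position})
    return next_position
```

Committing next_position only after both writes finish means a crash
before the commit leaves the old position in place, so a partially
written entry is never treated as valid in this model.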

This sounds good to me.

Josh