Re: Adding compression support for bluestore.

Allen,

On 16.03.2016 22:02, Allen Samuels wrote:

>>>> Compression support approach:
>>>> The aim is to provide generic compression support allowing random
>>>> object read/write.
>>>> To do that, the compression engine is to be placed (logically --
>>>> the actual implementation may be discussed later) on top of
>>>> bluestore to "intercept" read-write requests and modify them as
>>>> needed.
>>> I think it is going to make the most sense to do the compression and
>>> decompression in _do_write and _do_read (or helpers), within
>>> bluestore--not in some layer that sits above it but communicates
>>> metadata down to it.
>> My original intention was to minimize the bluestore modifications
>> needed to add compression support; in particular, this avoids
>> complicating bluestore further.
>> Another argument for segregation is the potential ability to move the
>> compression engine from the store level to the pool level in the
>> future. Remember that the current approach still carries a 200% CPU
>> utilization overhead with replicated pools, since each replica is
>> compressed independently.
> One advantage of the current scheme is that you can use the same basic
> flow for EC and replicated pools. The scheme that you propose means
> that EC chunking boundaries become fluid and data-sensitive --
> destroying the "seek" capability (i.e., you no longer know which node
> has any given logical address within the object). Essentially you'll
> need an entirely different backend flow for EC pools (at this level)
> with a complicated metadata mapping scheme. That seems MUCH more
> complicated and run-time expensive to me.
I wouldn't agree with this statement. Perhaps I presented my ideas
poorly, or I am missing something... IMHO the current EC pool write
pattern is just a regular append-only mode, and the read pattern is
partially random: EC reads data in arbitrary order, but at specific
offsets only. As long as some layer is able to handle such patterns, it
should be fine for EC pools, and I don't see any reason why a
compression layer would be unable to do that, or what the difference is
compared to replicated pools.

Actually, my idea about segregation was mainly about reusing the
existing bluestore rather than modifying it. The compression engine
should somehow (e.g. by inheriting from bluestore and overriding the
_do_write/_do_read methods) intercept write/read requests and maintain
its OWN block management, independent of bluestore's. Bluestore is left
untouched and exposes its functionality (via read/write handlers) AS-IS
to the compression layer instead of to the pools.

The key point is that the compressed block map and the bluestore extent
map use the same logical offsets, i.e. if some compressed block starts
at offset X, it is written to bluestore at offset X too. But the
written block is shorter than the original one, and thus store space is
saved.

I would agree that this probably complicates metadata handling: the
compression layer's metadata has to be handled similarly to bluestore's
(proper sync, WAL, transactions, etc.). But I don't see any issues
specific to EC here...

Have I missed something?
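
To make the offset mapping concrete, here is a rough standalone sketch
(everything in it -- the class names, and the toy run-length coder
standing in for a real compressor plugin such as zlib or snappy -- is
hypothetical, just to illustrate the idea, not actual bluestore code):

#include <cstdint>
#include <map>
#include <string>

// Stand-in for the unmodified bluestore underneath; the real layer
// would call into BlueStore via its existing read/write handlers.
struct UnderlyingStore {
  std::map<uint64_t, std::string> extents;  // logical offset -> bytes
  void write(uint64_t off, const std::string& data) { extents[off] = data; }
  const std::string& read(uint64_t off) const { return extents.at(off); }
};

// Toy run-length coder standing in for a real compressor plugin.
static std::string rle_compress(const std::string& s) {
  std::string out;
  for (size_t i = 0; i < s.size();) {
    size_t j = i;
    while (j < s.size() && s[j] == s[i] && j - i < 255) ++j;
    out.push_back(static_cast<char>(j - i));  // run length
    out.push_back(s[i]);                      // run byte
    i = j;
  }
  return out;
}

static std::string rle_decompress(const std::string& s) {
  std::string out;
  for (size_t i = 0; i + 1 < s.size(); i += 2)
    out.append(static_cast<unsigned char>(s[i]), s[i + 1]);
  return out;
}

// The compression layer keeps its OWN block map, independent of
// bluestore's extent map.  Invariant: a block that starts at logical
// offset X is written to the underlying store at offset X too; it is
// merely shorter, which is where the space saving comes from.
struct CompressingLayer {
  struct BlockRef {
    uint32_t raw_length;     // original (uncompressed) block length
    uint32_t stored_length;  // compressed length actually stored
  };
  UnderlyingStore store;
  std::map<uint64_t, BlockRef> block_map;  // needs sync/WAL/transactions

  void write(uint64_t off, const std::string& raw) {
    std::string c = rle_compress(raw);
    block_map[off] = {uint32_t(raw.size()), uint32_t(c.size())};
    store.write(off, c);  // same logical offset, fewer bytes
  }

  // Random read at an arbitrary offset, the way EC pools issue them:
  // find the block covering 'off', decompress it, return the subrange
  // (assuming, for brevity, the range falls within a single block).
  std::string read(uint64_t off, uint32_t len) const {
    auto it = block_map.upper_bound(off);
    --it;  // block starting at or before 'off'; assumed to exist
    std::string raw = rle_decompress(store.read(it->first));
    return raw.substr(off - it->first, len);
  }
};

A real implementation would of course have to persist block_map with
the same sync/WAL/transaction care as the rest of the metadata, as
noted above, but neither the write nor the read path requires touching
bluestore itself.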

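For comparison, my understanding of the alternative quoted above --
doing the compression inside the store's own write path -- would be
roughly the following (again only a sketch; apart from the
_do_write/_do_read names, nothing here is real bluestore code):

// Sketch of compressing inside the store's own write path instead of
// in a layer above it.  Extent allocation and error handling elided;
// reuses the toy rle_compress() from the previous sketch.
int do_write_with_compression(uint64_t offset, const std::string& bl) {
  std::string compressed = rle_compress(bl);
  // Keep the compressed form only when it actually saves space.
  bool keep = compressed.size() < bl.size();
  const std::string& out = keep ? compressed : bl;
  // ... allocate extents for 'out' and record in the onode metadata
  // whether the extent is compressed plus its raw length, so the read
  // path (_do_read) knows to decompress ...
  (void)offset;
  (void)out;
  return 0;
}

Either way the per-block metadata looks similar; the difference is only
whether it lives inside bluestore's onode or in the layer's own map.
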
PS. This is a rather academic question, meant to better understand the
difference in our POVs. Please ignore it if you find it obtrusive or
don't have enough time for a detailed explanation. It looks like we
won't go this way in any case.

Thanks,
Igor