Re: Substriping support in ErasureCodeInterface

Hi,

On 05/06/2015 15:34, Sindre Stene wrote:
> Sending the mail again without pesky html tags.
> 
> On Fri, Jun 5, 2015 at 2:40 PM, Loic Dachary <loic@dachary.org> wrote:
>> (...)
>> Why do you think the current interface is insufficient? What would you
>> need in addition?
> 
> I am not sure whether or not the interface is sufficient. Let me try to
> explain my assumptions, so that we can clarify.
> 
> Let's say, for the sake of simplicity, that I have a system of 14 HDDs with an
> allocation unit size of 4K, and that I am using K = 10 systematic drives and
> M = 4 redundancy drives (and no spares).
> Let's say that I am using an encoding scheme with 8 substripes (for each of
> the 14 stripes), and only one object size that perfectly matches the
> scheme: 4K * 10 * 8. My raw objects then consist of 80 chunks, and I am adding
> 32 chunks of redundancy data, making each encoded object take up 4K * 8 * 14 =
> 448K. The chunks are to be physically stored with these offsets:
> HDD0: chunks 0-7
> HDD1: chunks 8-15
> (...)
> HDD13 (redundancy 3): chunks 104-111
> Assuming that the coding scheme is MDS, it would guarantee recovery from up
> to 4 lost hard drives. It would not guarantee recovery from the loss of 32
> arbitrary chunks (the same amount of data when considering a single object),
> as the lost chunks would have to be organized in adjacent groups of 8.
> Assuming the CRUSH map can be used to configure this sort of chunk
> placement, perhaps the interface is indeed sufficient?

I think so. 
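
For what it's worth, here is a tiny Python script that reproduces the layout
arithmetic you describe; the names are only for illustration and are not part
of any Ceph interface:

# Purely illustrative: reproduces the layout arithmetic above
# (14 drives, K = 10, M = 4, 8 substripes, 4K allocation unit).
K, M, SUBSTRIPES, AU = 10, 4, 8, 4096

raw_object = AU * K * SUBSTRIPES            # 327680 bytes = 320K of user data
encoded_object = AU * (K + M) * SUBSTRIPES  # 458752 bytes = 448K on disk

def chunk_range(drive):
    """First and last chunk index stored on a given drive (0..13)."""
    first = drive * SUBSTRIPES
    return first, first + SUBSTRIPES - 1

for drive in range(K + M):
    first, last = chunk_range(drive)
    print("HDD%d: chunks %d-%d" % (drive, first, last))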

> And, not specific to the interface definition, but about how Ceph uses the
> interface during operation and during tests:
> Would the interface receive decode requests for sets of chunks that are not
> organized in groups of 8?

The plugin only decodes when at least one chunk is missing; otherwise it just concatenates the chunks. When it does decode, in the case of jerasure, it expects the caller (the OSD in this case) to ask for the minimum set of chunks that are needed. If I understand correctly, the plugin would hide the 8-substripe division from the caller entirely. If the caller needs to be aware of that substripe logic, a new interface would have to be defined and documented.
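
To illustrate what I mean, here is a rough Python sketch of the caller's view;
this is not the actual C++ erasure code interface, and the names are made up:

# Sketch only: what "the plugin hides the substripes" could look like from
# the caller's side.
SUBSTRIPES = 8

def decode(want, available):
    """want: set of chunk indices the caller (OSD) asked for.
    available: dict of chunk index -> bytes read from the surviving drives."""
    if want <= available.keys():
        # Nothing is missing: no decoding at all, just hand the chunks back.
        return {i: available[i] for i in want}
    # Something is missing: internally, each available chunk would be cut into
    # its 8 substripes, the scheme-specific recovery would run on those
    # fragments, and the recovered substripes would be glued back into whole
    # chunks, so the caller never needs to know about the 8-way division.
    raise NotImplementedError("scheme-specific substripe recovery goes here")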

> Would the subpacketization (or grouping of chunks) create problems with the
> unit tests?
> Do you experts see any other implications or side-effects?
> 
> Motivation: the required read access for recovering one drive in a (14 total
> disks, 10 systematic data disks) setup using Reed-Solomon is 10 chunks. This
> can theoretically be reduced by ~40% by introducing substripes (splitting each
> of the 14 parts into many smaller parts, but fundamentally storing the first
> 10 major parts in exactly the same way on the HDD, meaning that the I/O of
> normal reads is not impacted at all). There are many trade-offs to
> consider, and so we wish to test the performance differences.

There is a pull request pending that only reads part of the chunks when the size is smaller than a stripe. This may be useful for workloads involving small objects. Is that what you're thinking about?
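
Also, just to check that I follow the arithmetic behind the ~40% figure, a
back-of-the-envelope sketch; the "5 of the 8 substripes per surviving drive"
number is made up for the example, the real fraction depends on the
construction you test:

# Hypothetical numbers, only to illustrate where a ~40% saving could come from.
K, SUBSTRIPES = 10, 8

rs_read = K * SUBSTRIPES   # plain Reed-Solomon repair: all 8 substripes from
                           # each of the 10 surviving data drives = 80
sub_read = K * 5           # e.g. a scheme needing only 5 of the 8 substripes
                           # from each survivor = 50

print("repair read reduced by %.0f%%" % ((1 - sub_read / rs_read) * 100))  # ~38%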

Cheers

> 
> Sincerely,
> Sindre B. Stene
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
