RE: controlling erasure code chunk size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Loic,
for the sizeof(int)... the reason is that JERAUSRE internally uses uses long* addresses with operations on them e.g. if you XOR two chunks of size 3 you access illegal memory a long* xor's also byte 4.

The PDF documentation says this:

int packetsize: The packet size as defined in section 1. This must be a multiple of sizeof(long).

int size: The total number of bytes per device to encode/decode. This must be a multiple of sizeof(long). If a
bit-matrix is being employed, then it must be a multiple of packetsize * w. If one desires to encode data blocks
that do not conform to these restrictions, than one must pad the data blocks with zeroes so that the restrictions
are met.

You cannot just do the modulo adjustment because this breaks the requirement that len is a multiple of packetsize*w  !!!
Imagine w=3, packetsize=4 .... a module add would adjust it to 16 and you cannot divide 16 by 12, so the smallest proper adjustment here is 48 ! So the most simple approach is to add just another " * VECTOR_WORD_SIZE" since the condition will be always fulfilled. 

Cheers Andreas.
________________________________________
From: ceph-devel-owner@xxxxxxxxxxxxxxx [ceph-devel-owner@xxxxxxxxxxxxxxx] on behalf of Loic Dachary [loic@xxxxxxxxxxx]
Sent: 04 February 2014 17:17
To: Andreas Joachim Peters
Cc: Ceph Development
Subject: Re: controlling erasure code chunk size

Hi Andreas,

> For w=(multiple of 8) we could probably skip the (*sizeof(int)) and get the chunksize factor 4 down ... Loic we should check if this is ok with the Jerasure implementation .... I wonder if we should have 'packetsize' as a plugin parameter or we should just adjust the packetsize based on the desired chunk_size to get it close.

You are correct : the packet size is best adapted to the object size (or stripe size) rather than being set once for all. However Sam wants to use a fixed stripe size and we don't need this flexibility right now.

I don't fully understand the alignment requirements of Jerasure. Since we're using Cauchy because it is the fastest, here is how I understand its alignment constraints. I copied them from the original encode/decode methods found in jerasure into the get_alignment method whithout understanding the details.

* each chunk memory address must be aligned to allow
https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/vectorop.h to be used by https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/galois.c#L748 . This is done without reading from get_alignment() because each buffer is created with https://github.com/ceph/ceph/blob/v0.76/src/common/buffer.cc#L519 buffer::create_page_aligned which calls https://github.com/ceph/ceph/blob/v0.76/src/common/buffer.cc#L235 posix_memalign with an alignment of CEPH_PAGE_SIZE which is large enough. It is implicit though and it would be better to explicitly set this constraint.

https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L288

* each chunk size must be a multiple of get_alignment() and in the case of the Cauch techniques it means:
** being a multiple of sizeof(int) (why?)
** being a multiple of LARGEST_VECTOR_WORDSIZE (because https://github.com/ceph/ceph/blob/v0.76/src/osd/ErasureCodePluginJerasure/galois.c#L748)
** being a multiple of k*w*packetsize (because each chunk contains k packets of packets size and each packet is made of words of size w)

I would be grateful if you could explain what the sizeof(int) is about. Also, I understand that k*w*packetsize should be a multiple of LARGEST_VECTOR_WORDSIZE but I don't understand why you would multiply the alignment to achieve this. Is it be enough to if (alignment % LARGEST_VECTOR_WORDSIZE) alignment += alignment % LARGEST_VECTOR_WORDSIZE ?

Thanks in advance for your patience :-)

--
Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux