Re: controlling erasure code chunk size

Hi Sam,

The argument to get_chunk_size is the stripe width. It is named object_size because the API knows nothing about stripes: striping is a concept for the caller to implement. If you have a desired chunk size in mind, you would compute:

   object_size = desired_chunk_size * get_data_chunk_count()
   actual_chunk_size = get_chunk_size(object_size)

If you have a desired stripe width / object size in mind, you would compute:

   object_size = desired_stripe_width
   chunk_size = get_chunk_size(object_size)
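
A minimal sketch of the two recipes above. ToyPlugin is a hypothetical stand-in, not the real plugin interface. It assumes that get_chunk_size pads object_size up to the stripe alignment and then divides by the number of data chunks, which is consistent with the 393216 figure Sam reports below for k=4. The helper names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for an erasure code plugin (not the real Ceph API).
struct ToyPlugin {
  unsigned k;          // number of data chunks
  unsigned alignment;  // stripe alignment in bytes (assumed value)

  unsigned get_data_chunk_count() const { return k; }

  // Assumed behaviour: pad object_size up to a multiple of the alignment,
  // then split the padded size evenly across the k data chunks.
  unsigned get_chunk_size(unsigned object_size) const {
    assert(k > 0 && alignment > 0 && alignment % k == 0);
    unsigned padded = ((object_size + alignment - 1) / alignment) * alignment;
    return padded / k;
  }
};

// Recipe 1: start from a desired chunk size.
unsigned chunk_size_from_desired_chunk(const ToyPlugin &p,
                                       unsigned desired_chunk_size) {
  unsigned object_size = desired_chunk_size * p.get_data_chunk_count();
  return p.get_chunk_size(object_size);
}

// Recipe 2: start from a desired stripe width / object size.
unsigned chunk_size_from_stripe_width(const ToyPlugin &p,
                                      unsigned desired_stripe_width) {
  return p.get_chunk_size(desired_stripe_width);
}
```

Under these assumptions, a plugin with k=4 and an alignment of 393216 bytes turns a desired chunk size of 8192 into an actual chunk size of 98304, so the actual chunk can be much larger than the one asked for.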

Following Andreas' suggestions, controlling the size of the actual chunk is a matter of tweaking the alignment constraints via the erasure code plugin parameters.
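
To illustrate how a plugin parameter drives the alignment, here is a sketch that wraps the alignment computation quoted further down in a free function. The function name stripe_alignment and the value 16 for LARGEST_VECTOR_WORDSIZE are assumptions for illustration. The thread does not state w either, so w=8 is used in the note below only because it reproduces the 393216 figure with k=4 and packetsize=3072.

```cpp
#include <cassert>

// Assumed value: the thread does not say what LARGEST_VECTOR_WORDSIZE is.
const unsigned LARGEST_VECTOR_WORDSIZE = 16;

// Same arithmetic as the quoted snippet, with k, w and packetsize passed
// as arguments instead of being read from the plugin object.
unsigned stripe_alignment(unsigned k, unsigned w, unsigned packetsize) {
  assert(k > 0 && w > 0 && packetsize > 0);
  unsigned alignment = k * w * packetsize * sizeof(int);
  // Fall back to the vector word size when w*packetsize*sizeof(int)
  // is not a multiple of it.
  if ((w * packetsize * sizeof(int)) % LARGEST_VECTOR_WORDSIZE)
    alignment = k * w * packetsize * LARGEST_VECTOR_WORDSIZE;
  return alignment;
}
```

With 4-byte ints, stripe_alignment(4, 8, 3072) gives 393216 bytes while stripe_alignment(4, 8, 8) gives 1024 bytes, so lowering packetsize is one way to shrink the minimum stripe and therefore the chunk.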

Cheers

On 02/02/2014 23:45, Samuel Just wrote:
> I assume we will use get_chunksize(desired_chunksize) *
> get_data_chunk_count() on the mon to define the stripe width (the size
> of the buffer which will be presented to the plugin for encoding) for
> the pool.  At the moment, get_chunksize(4*(2<<10)) *
> get_data_chunk_count() = 393216 using the jerasure plugin where
> get_data_chunk_count() = 4.  This seems a bit big?
> -Sam
> 
> On Sun, Feb 2, 2014 at 8:18 AM, Andreas Joachim Peters
> <Andreas.Joachim.Peters@xxxxxxx> wrote:
>> Hi Loic et al.,
>>
>> I think there is now some confusion about chunk_size, alignment, packetsize and the stripe_size to be used upstream.
>>
>> Algorithms with a bit-matrix require that the size per device is a multiple of (packetsize*w). Moreover, the size per device and packetsize itself must be a multiple of sizeof(long/int). For other algorithms you can assume the same with packetsize=1.
>>
>> packetsize and w influence performance, and too small a stripe_size on top will hurt performance because of bufferlist preparation, internal buffer checks and more loops to execute for the same amount of data. We can also do some measurements for this, but the current benchmark would probably not reflect it, since it measures the algorithmic part, not the bufferlist preparation part.
>>
>> If you want to define a stripe_size, it has to be a multiple of the value returned by get_chunksize, possibly a large multiple, but in total not larger than the processor caches. The plugin cannot define the stripe_size; it only defines the alignment to be used for stripe_size. The stripe_size is defined outside the plugin, which may complicate the understanding. We should carefully check the Jerasure alignment requirements and our current implementation once more.
>>
>> To get rid of the platform dependency, we could add a generic alignment requirement that chunksize must also be 64-byte aligned.
>>
>> Cheers Andreas.
>>
>>
>>
>>
>> ________________________________________
>> From: Loic Dachary [loic@xxxxxxxxxxx]
>> Sent: 02 February 2014 16:15
>> To: Samuel Just
>> Cc: Ceph Development; Andreas Joachim Peters
>> Subject: controlling erasure code chunk size
>>
>> [cc' ceph-devel]
>>
>> Hi Sam,
>>
>> Here is how chunks are expected to be aligned:
>>
>> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L365
>>
>>   unsigned alignment = k*w*packetsize*sizeof(int);
>>   if ( ((w*packetsize*sizeof(int))%LARGEST_VECTOR_WORDSIZE) )
>>     alignment = k*w*packetsize*LARGEST_VECTOR_WORDSIZE;
>>   return alignment;
>>
>> If you are going to encode small objects, a large packetsize may very well lead to oversized chunks. At the moment the default is 3072:
>>
>> https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/common/config_opts.h#L406
>>
>> This is a value I picked when experimenting with encoding 1MB objects ( http://dachary.org/?p=2594 ).
>>
>> I'm not entirely sure why the alignment is calculated the way it is. Andreas certainly has a better understanding of this topic.
>>
>> Cheers
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre
