RE: controlling erasure code chunk size

Andreas Joachim Peters <Andreas.Joachim.Peters@xxxxxxx> · Sun, 2 Feb 2014 16:18:53 +0000

Hi Loic et al.,

I think there is now some confusion about chunk_size, alignment, packetsize and the stripe_size to be used upstream.

Algorithms that use a bit-matrix require the size per device to be a multiple of (packetsize*w). In addition, both the size per device and packetsize itself must be a multiple of the machine word size (sizeof(long) or sizeof(int)). For other algorithms you can assume the same with packetsize=1.
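
As a rough illustration, the constraints above can be written as a small predicate. This is only a sketch: bitmatrix_size_ok is a hypothetical helper, not part of Jerasure or Ceph, and it assumes sizeof(long) is the relevant word size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Jerasure or Ceph.
// Returns true when a per-device size satisfies the bit-matrix
// constraints described above:
//   - size_per_device is a multiple of packetsize * w
//   - size_per_device and packetsize are multiples of the word size
// Assumption: sizeof(long) is used as the word size.
bool bitmatrix_size_ok(std::size_t size_per_device,
                       std::size_t w,
                       std::size_t packetsize)
{
  assert(w > 0 && packetsize > 0);
  const std::size_t word = sizeof(long);
  return size_per_device % (packetsize * w) == 0 &&
         size_per_device % word == 0 &&
         packetsize % word == 0;
}
```

For the other algorithms, the same check would be called with packetsize=1 only if the word-size condition on packetsize is dropped; the sketch does not cover that case.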

Both packetsize and w influence performance. On top of that, a stripe_size that is too small hurts performance because of the bufferlist preparation, the internal buffer checks and the larger number of loop iterations needed for the same amount of data. We could measure this as well, but the current benchmark would probably not show it, since it measures the algorithmic part and not the bufferlist preparation.

If you want to define a stripe_size, it has to be a multiple of the value returned by get_chunksize, ideally a large multiple, but in total no larger than the processor caches. The plugin cannot define the stripe_size; it only defines the alignment to be used for stripe_size. The stripe_size itself is defined outside the plugin, which may make this harder to understand. We should carefully check the Jerasure alignment requirements and our current implementation once more.
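
A minimal sketch of how code outside the plugin could derive a valid stripe_size from the alignment the plugin reports. round_up_stripe_size is a hypothetical name, and the sketch does not enforce the upper bound given by the processor caches.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a requested stripe size up to the next
// multiple of the alignment reported by the plugin.
// The cache-size upper bound mentioned above is NOT checked here.
std::size_t round_up_stripe_size(std::size_t requested, std::size_t alignment)
{
  assert(alignment > 0);
  const std::size_t remainder = requested % alignment;
  return remainder == 0 ? requested : requested + (alignment - remainder);
}
```

For example, a requested size of 4097 with an alignment of 1024 would be rounded up to 5120.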

To get rid of the platform dependency, we could add a generic requirement that chunksize also has to be 64-byte aligned.
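
One possible way to combine such a generic requirement with the algorithm-specific alignment is to take the least common multiple of the two. This is a sketch of that idea only; with_64_byte_alignment and gcd_size are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Greatest common divisor (Euclid), used to build the least common multiple.
static std::size_t gcd_size(std::size_t a, std::size_t b)
{
  while (b != 0) {
    const std::size_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Hypothetical helper: smallest value that is a multiple of both the
// algorithm-specific alignment and the generic 64-byte requirement.
std::size_t with_64_byte_alignment(std::size_t alignment)
{
  assert(alignment > 0);
  const std::size_t generic = 64;
  return alignment / gcd_size(alignment, generic) * generic;
}
```

An alignment that is already a multiple of 64 is returned unchanged, while for instance 48 becomes 192.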

Cheers Andreas.

________________________________________
From: Loic Dachary [loic@xxxxxxxxxxx]
Sent: 02 February 2014 16:15
To: Samuel Just
Cc: Ceph Development; Andreas Joachim Peters
Subject: controlling erasure code chunk size

[cc'ing ceph-devel]

Hi Sam,

Here is how chunks are expected to be aligned:

https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L365

  unsigned alignment = k*w*packetsize*sizeof(int);
  if ( ((w*packetsize*sizeof(int))%LARGEST_VECTOR_WORDSIZE) )
    alignment = k*w*packetsize*LARGEST_VECTOR_WORDSIZE;
  return alignment;

If you are going to encode small objects, a large packetsize may very well lead to oversized chunks. At the moment the default is 3072:

https://github.com/ceph/ceph/blob/4c4e1d0d470beba7690d1c0e39bfd1146a25f465/src/common/config_opts.h#L406

This is a value I picked when experimenting with encoding 1MB objects ( http://dachary.org/?p=2594 ).
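
To make the effect of a large packetsize concrete, here is a standalone sketch of the same arithmetic as the snippet quoted above, with k, w and packetsize passed in as parameters. The value 16 for LARGEST_VECTOR_WORDSIZE is an assumption (it is not quoted in this message), and jerasure_alignment is a hypothetical free function, not the actual Ceph method.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 16 bytes. The real value is defined elsewhere and is not
// quoted in this message.
const std::size_t LARGEST_VECTOR_WORDSIZE = 16;

// Hypothetical free function mirroring the arithmetic of the quoted snippet.
std::size_t jerasure_alignment(std::size_t k, std::size_t w,
                               std::size_t packetsize)
{
  assert(k > 0 && w > 0 && packetsize > 0);
  std::size_t alignment = k * w * packetsize * sizeof(int);
  // Fall back to the vector word size when w*packetsize*sizeof(int)
  // is not a multiple of it.
  if ((w * packetsize * sizeof(int)) % LARGEST_VECTOR_WORDSIZE)
    alignment = k * w * packetsize * LARGEST_VECTOR_WORDSIZE;
  return alignment;
}
```

With illustrative values k=2 and w=8 (not taken from this thread), packetsize=3072 and a 4-byte int, this gives 2*8*3072*4 = 196608 bytes, so sizes would have to be padded to a multiple of 192 KiB even for very small objects.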

I'm not entirely sure why the alignment is calculated the way it is. Andreas certainly has a better understanding of this topic.

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
