RE: [PATCH 2/3] ec: make use of added aligned buffers

"Ma, Jianpeng" <jianpeng.ma@xxxxxxxxx> · Tue, 16 Sep 2014 08:23:03 +0000

I see it .Thanks very much!

Jianpeng
> -----Original Message-----
> From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
> Sent: Tuesday, September 16, 2014 3:55 PM
> To: Ma, Jianpeng
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 2/3] ec: make use of added aligned buffers
> 
> 
> 
> On 16/09/2014 08:59, Ma, Jianpeng wrote:
> >> -----Original Message-----
> >> From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
> >> Sent: Tuesday, September 16, 2014 2:47 PM
> >> To: Ma, Jianpeng
> >> Cc: ceph-devel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH 2/3] ec: make use of added aligned buffers
> >>
> >>
> >>
> >> On 16/09/2014 02:08, Ma, Jianpeng wrote:
> >>>> -----Original Message-----
> >>>> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> >>>> Sent: Tuesday, September 16, 2014 8:02 AM
> >>>> To: Ma, Jianpeng
> >>>> Cc: Loic Dachary; Janne Grunau; ceph-devel@xxxxxxxxxxxxxxx
> >>>> Subject: RE: [PATCH 2/3] ec: make use of added aligned buffers
> >>>>
> >>>> On Mon, 15 Sep 2014, Ma, Jianpeng wrote:
> >>>>> If we modify bufferlist::c_str()  to bufferlist::c_str(bool align).
> >>>>> If (align)
> >>>>>   Posix_memalign(data, CEPH_PAGE_SIZE, len) Else
> >>>>>   Origin code.
> >>>>
> >>>> Alignment isn't really a bool, it's an int.  c_str(int align=1) ?
> >>>>
> >>> I mean if we need a align memory after bufferlist::c_str. We can set
> >>> the align = true; Now bufferlist::c_str depend on the size when
> >>> using rebuild if
> >> bufferlist have more prt.
> >>>
> >>> BTW, we also can change the rebuild() && rebuild(ptr). Merge two
> >>> func into
> >> one rebuild(bool align).
> >>> By judge the parameter align to alloc align memory or not.
> >>>
> >>> Or am I missing something?
> >>
> >> I don't think there is a need for c_str(int align). We make every
> >> effort to allocate buffers that are properly aligned. If c_str() does
> >> not return an aligned buffer, the proper fix is to align the
> >> allocated buffer at the source, not to allocate a new aligned buffer and copy
> the content of the non aligned buffer into it.
> >>
> >
> >> Do you see a reason why that would be a bad way to deal with alignment ?
> >  We only guarantee memory align after c_str and don't depend on the
> previous steps.
> 
> This is a benefit indeed: less room for error in general.
> 
> > If encode[i] have more ptr suppose they all aligned memory.
> > But how to guarantee aligned memory after c_str.
> > If bufferlist have more ptr, the aligned memory depend on the size of
> bufferlist.
> 
> The ErasureCode::encode_prepare prepares the allocated buffer so that
> 
> *) it holds enough room to contain all chunks
> *) c_str() on a chunk boundary will return a pointer that does not require
> allocating memory
> 
> The ErasureCodeJerasure::get_chunk_size function determines the size of the
> buffer allocated by encode_prepare and further guarantees that each chunk is:
> 
> *) size aligned on 16 bytes
> *) memory aligned on 16 bytes so that SIMD instructions can use it without
> moving it
> 
> In other words, if c_str() had to re-allocate the buffer because it is not aligned,
> it would mean that the allocation of the buffer is not done as it should.
> 

> Cheers
> 
> > Jianpeng
> >>
> >> Cheers
> >>
> >>>
> >>> Jianpeng
> >>>> sage
> >>>>
> >>>>>
> >>>>> I think this is simple and correctly code.
> >>>>>
> >>>>> Jianpeng
> >>>>> Thanks!
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> >>>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic
> >>>>>> Dachary
> >>>>>> Sent: Tuesday, September 16, 2014 1:20 AM
> >>>>>> To: Janne Grunau; ceph-devel@xxxxxxxxxxxxxxx
> >>>>>> Subject: Re: [PATCH 2/3] ec: make use of added aligned buffers
> >>>>>>
> >>>>>> Hi Janne,
> >>>>>>
> >>>>>> See below:
> >>>>>>
> >>>>>> On 15/09/2014 17:55, Janne Grunau wrote:
> >>>>>>> Requiring page aligned buffers and realigning the input if
> >>>>>>> necessary creates measurable oberhead.
> >> ceph_erasure_code_benchmark
> >>>>>>> is ~30% faster with this change for technique=reed_sol_van,k=2,m=1.
> >>>>>>>
> >>>>>>> Also prevents a misaligned buffer when
> >>>>>>> bufferlist::c_str(bufferlist) has to allocate a new buffer to
> >>>>>>> provide continuous one. See bug #9408
> >>>>>>>
> >>>>>>> Signed-off-by: Janne Grunau <j@xxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>  src/erasure-code/ErasureCode.cc | 46
> >>>>>>> +++++++++++++++++++++++++----------------
> >>>>>>>  src/erasure-code/ErasureCode.h  |  3 ++-
> >>>>>>>  2 files changed, 30 insertions(+), 19 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/src/erasure-code/ErasureCode.cc
> >>>>>>> b/src/erasure-code/ErasureCode.cc index 5953f49..078f60b 100644
> >>>>>>> --- a/src/erasure-code/ErasureCode.cc
> >>>>>>> +++ b/src/erasure-code/ErasureCode.cc
> >>>>>>> @@ -54,22 +54,38 @@ int
> >>>>>> ErasureCode::minimum_to_decode_with_cost(const
> >>>>>>> set<int> &want_to_read,  }
> >>>>>>>
> >>>>>>>  int ErasureCode::encode_prepare(const bufferlist &raw,
> >>>>>>> -                                bufferlist *prepared) const
> >>>>>>> +                                map<int, bufferlist>
> &encoded)
> >>>>>> const
> >>>>>>>  {
> >>>>>>>    unsigned int k = get_data_chunk_count();
> >>>>>>>    unsigned int m = get_chunk_count() - k;
> >>>>>>>    unsigned blocksize = get_chunk_size(raw.length());
> >>>>>>> -  unsigned padded_length = blocksize * k;
> >>>>>>> -  *prepared = raw;
> >>>>>>> -  if (padded_length - raw.length() > 0) {
> >>>>>>> -    bufferptr pad(padded_length - raw.length());
> >>>>>>> -    pad.zero();
> >>>>>>> -    prepared->push_back(pad);
> >>>>>>> +  unsigned pad_len = blocksize * k - raw.length();
> >>>>>>> +
> >>>>>>> +  bufferlist prepared = raw;
> >>>>>>> +
> >>>>>>> +  if (!prepared.is_aligned()) {
> >>>>>>> +    prepared.rebuild_aligned();  }
> >>>>>>> +
> >>>>>>> +  for (unsigned int i = 0; i < k - !!pad_len; i++) {
> >>>>>>> +    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] :
> i;
> >>>>>>> +    bufferlist &chunk = encoded[chunk_index];
> >>>>>>> +    chunk.substr_of(prepared, i * blocksize, blocksize);  }
> >>>>>>
> >>>>>> It is possible for more than one chunk to be padding. It's a
> >>>>>> border case but... for instance with alignment = 16, k=12 and in
> >>>>>> of length
> >>>>>> 1550 you end up with two padding chunks because the blocksize is 144.
> >>>>>>
> >>>>>>> +  if (pad_len > 0) {
> >>>>>>> +    int chunk_index = chunk_mapping.size() > 0 ?
> >>>>>>> + chunk_mapping[k
> >>>>>>> + - 1] : k -
> >>>>>> 1;
> >>>>>>> +    bufferlist &chunk = encoded[chunk_index];
> >>>>>>> +    bufferptr padded(buffer::create_aligned(blocksize));
> >>>>>>> +    raw.copy((k - 1) * blocksize, blocksize - pad_len, padded.c_str());
> >>>>>>> +    padded.zero(blocksize - pad_len, pad_len);
> >>>>>>> +    chunk.push_back(padded);
> >>>>>>>    }
> >>>>>>> -  unsigned coding_length = blocksize * m;
> >>>>>>> -  bufferptr coding(buffer::create_page_aligned(coding_length));
> >>>>>>> -  prepared->push_back(coding);
> >>>>>>> -  prepared->rebuild_page_aligned();
> >>>>>>> +  for (unsigned int i = k; i < k + m; i++) {
> >>>>>>> +    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] :
> i;
> >>>>>>> +    bufferlist &chunk = encoded[chunk_index];
> >>>>>>> +    chunk.push_back(buffer::create_aligned(blocksize));
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>>    return 0;
> >>>>>>>  }
> >>>>>>>
> >>>>>>> @@ -80,15 +96,9 @@ int ErasureCode::encode(const set<int>
> >>>>>> &want_to_encode,
> >>>>>>>    unsigned int k = get_data_chunk_count();
> >>>>>>>    unsigned int m = get_chunk_count() - k;
> >>>>>>>    bufferlist out;
> >>>>>>> -  int err = encode_prepare(in, &out);
> >>>>>>> +  int err = encode_prepare(in, *encoded);
> >>>>>>>    if (err)
> >>>>>>>      return err;
> >>>>>>> -  unsigned blocksize = get_chunk_size(in.length());
> >>>>>>> -  for (unsigned int i = 0; i < k + m; i++) {
> >>>>>>> -    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] :
> i;
> >>>>>>> -    bufferlist &chunk = (*encoded)[chunk_index];
> >>>>>>> -    chunk.substr_of(out, i * blocksize, blocksize);
> >>>>>>> -  }
> >>>>>>>    encode_chunks(want_to_encode, encoded);
> >>>>>>>    for (unsigned int i = 0; i < k + m; i++) {
> >>>>>>>      if (want_to_encode.count(i) == 0) diff --git
> >>>>>>> a/src/erasure-code/ErasureCode.h
> >>>>>>> b/src/erasure-code/ErasureCode.h index 7aaea95..62aa383 100644
> >>>>>>> --- a/src/erasure-code/ErasureCode.h
> >>>>>>> +++ b/src/erasure-code/ErasureCode.h
> >>>>>>> @@ -46,7 +46,8 @@ namespace ceph {
> >>>>>>>                                              const
> map<int,
> >>>> int>
> >>>>>> &available,
> >>>>>>>                                              set<int>
> >>>> *minimum);
> >>>>>>>
> >>>>>>> -    int encode_prepare(const bufferlist &raw, bufferlist *prepared)
> >>>> const;
> >>>>>>> +    int encode_prepare(const bufferlist &raw,
> >>>>>>> +                       map<int, bufferlist> &encoded) const;
> >>>>>>>
> >>>>>>>      virtual int encode(const set<int> &want_to_encode,
> >>>>>>>                         const bufferlist &in,
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Lo?c Dachary, Artisan Logiciel Libre
> >>>>>
> >>>>> --
> >>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More
> >>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>
> >>>>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> >>> info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>
> >> --
> >> Loïc Dachary, Artisan Logiciel Libre
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html