RE: [PATCH 2/3] ec: make use of added aligned buffers

"Ma, Jianpeng" <jianpeng.ma@xxxxxxxxx> · Tue, 16 Sep 2014 06:59:24 +0000



> -----Original Message-----
> From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
> Sent: Tuesday, September 16, 2014 2:47 PM
> To: Ma, Jianpeng
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 2/3] ec: make use of added aligned buffers
> 
> 
> 
> On 16/09/2014 02:08, Ma, Jianpeng wrote:
> >> -----Original Message-----
> >> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> >> Sent: Tuesday, September 16, 2014 8:02 AM
> >> To: Ma, Jianpeng
> >> Cc: Loic Dachary; Janne Grunau; ceph-devel@xxxxxxxxxxxxxxx
> >> Subject: RE: [PATCH 2/3] ec: make use of added aligned buffers
> >>
> >> On Mon, 15 Sep 2014, Ma, Jianpeng wrote:
> >>> If we modify bufferlist::c_str()  to bufferlist::c_str(bool align).
> >>> If (align)
> >>>   Posix_memalign(data, CEPH_PAGE_SIZE, len) Else
> >>>   Origin code.
> >>
> >> Alignment isn't really a bool, it's an int.  c_str(int align=1) ?
> >>
> > I mean if we need a align memory after bufferlist::c_str. We can set
> > the align = true; Now bufferlist::c_str depend on the size when using rebuild if
> bufferlist have more prt.
> >
> > BTW, we also can change the rebuild() && rebuild(ptr). Merge two func into
> one rebuild(bool align).
> > By judge the parameter align to alloc align memory or not.
> >
> > Or am I missing something?
> 
> I don't think there is a need for c_str(int align). We make every effort to allocate
> buffers that are properly aligned. If c_str() does not return an aligned buffer,
> the proper fix is to align the allocated buffer at the source, not to allocate a
> new aligned buffer and copy the content of the non aligned buffer into it.
> 

> Do you see a reason why that would be a bad way to deal with alignment ?
 We only guarantee memory align after c_str and don't depend on the previous steps.
If encode[i] have more ptr suppose they all aligned memory.
But how to guarantee aligned memory after c_str.
If bufferlist have more ptr, the aligned memory depend on the size of bufferlist.

Jianpeng
> 
> Cheers
> 
> >
> > Jianpeng
> >> sage
> >>
> >>>
> >>> I think this is simple and correctly code.
> >>>
> >>> Jianpeng
> >>> Thanks!
> >>>
> >>>> -----Original Message-----
> >>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> >>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary
> >>>> Sent: Tuesday, September 16, 2014 1:20 AM
> >>>> To: Janne Grunau; ceph-devel@xxxxxxxxxxxxxxx
> >>>> Subject: Re: [PATCH 2/3] ec: make use of added aligned buffers
> >>>>
> >>>> Hi Janne,
> >>>>
> >>>> See below:
> >>>>
> >>>> On 15/09/2014 17:55, Janne Grunau wrote:
> >>>>> Requiring page aligned buffers and realigning the input if
> >>>>> necessary creates measurable oberhead.
> ceph_erasure_code_benchmark
> >>>>> is ~30% faster with this change for technique=reed_sol_van,k=2,m=1.
> >>>>>
> >>>>> Also prevents a misaligned buffer when
> >>>>> bufferlist::c_str(bufferlist) has to allocate a new buffer to
> >>>>> provide continuous one. See bug #9408
> >>>>>
> >>>>> Signed-off-by: Janne Grunau <j@xxxxxxxxxx>
> >>>>> ---
> >>>>>  src/erasure-code/ErasureCode.cc | 46
> >>>>> +++++++++++++++++++++++++----------------
> >>>>>  src/erasure-code/ErasureCode.h  |  3 ++-
> >>>>>  2 files changed, 30 insertions(+), 19 deletions(-)
> >>>>>
> >>>>> diff --git a/src/erasure-code/ErasureCode.cc
> >>>>> b/src/erasure-code/ErasureCode.cc index 5953f49..078f60b 100644
> >>>>> --- a/src/erasure-code/ErasureCode.cc
> >>>>> +++ b/src/erasure-code/ErasureCode.cc
> >>>>> @@ -54,22 +54,38 @@ int
> >>>> ErasureCode::minimum_to_decode_with_cost(const
> >>>>> set<int> &want_to_read,  }
> >>>>>
> >>>>>  int ErasureCode::encode_prepare(const bufferlist &raw,
> >>>>> -                                bufferlist *prepared) const
> >>>>> +                                map<int, bufferlist> &encoded)
> >>>> const
> >>>>>  {
> >>>>>    unsigned int k = get_data_chunk_count();
> >>>>>    unsigned int m = get_chunk_count() - k;
> >>>>>    unsigned blocksize = get_chunk_size(raw.length());
> >>>>> -  unsigned padded_length = blocksize * k;
> >>>>> -  *prepared = raw;
> >>>>> -  if (padded_length - raw.length() > 0) {
> >>>>> -    bufferptr pad(padded_length - raw.length());
> >>>>> -    pad.zero();
> >>>>> -    prepared->push_back(pad);
> >>>>> +  unsigned pad_len = blocksize * k - raw.length();
> >>>>> +
> >>>>> +  bufferlist prepared = raw;
> >>>>> +
> >>>>> +  if (!prepared.is_aligned()) {
> >>>>> +    prepared.rebuild_aligned();
> >>>>> +  }
> >>>>> +
> >>>>> +  for (unsigned int i = 0; i < k - !!pad_len; i++) {
> >>>>> +    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
> >>>>> +    bufferlist &chunk = encoded[chunk_index];
> >>>>> +    chunk.substr_of(prepared, i * blocksize, blocksize);  }
> >>>>
> >>>> It is possible for more than one chunk to be padding. It's a border
> >>>> case but... for instance with alignment = 16, k=12 and in of length
> >>>> 1550 you end up with two padding chunks because the blocksize is 144.
> >>>>
> >>>>> +  if (pad_len > 0) {
> >>>>> +    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[k
> >>>>> + - 1] : k -
> >>>> 1;
> >>>>> +    bufferlist &chunk = encoded[chunk_index];
> >>>>> +    bufferptr padded(buffer::create_aligned(blocksize));
> >>>>> +    raw.copy((k - 1) * blocksize, blocksize - pad_len, padded.c_str());
> >>>>> +    padded.zero(blocksize - pad_len, pad_len);
> >>>>> +    chunk.push_back(padded);
> >>>>>    }
> >>>>> -  unsigned coding_length = blocksize * m;
> >>>>> -  bufferptr coding(buffer::create_page_aligned(coding_length));
> >>>>> -  prepared->push_back(coding);
> >>>>> -  prepared->rebuild_page_aligned();
> >>>>> +  for (unsigned int i = k; i < k + m; i++) {
> >>>>> +    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
> >>>>> +    bufferlist &chunk = encoded[chunk_index];
> >>>>> +    chunk.push_back(buffer::create_aligned(blocksize));
> >>>>> +  }
> >>>>> +
> >>>>>    return 0;
> >>>>>  }
> >>>>>
> >>>>> @@ -80,15 +96,9 @@ int ErasureCode::encode(const set<int>
> >>>> &want_to_encode,
> >>>>>    unsigned int k = get_data_chunk_count();
> >>>>>    unsigned int m = get_chunk_count() - k;
> >>>>>    bufferlist out;
> >>>>> -  int err = encode_prepare(in, &out);
> >>>>> +  int err = encode_prepare(in, *encoded);
> >>>>>    if (err)
> >>>>>      return err;
> >>>>> -  unsigned blocksize = get_chunk_size(in.length());
> >>>>> -  for (unsigned int i = 0; i < k + m; i++) {
> >>>>> -    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
> >>>>> -    bufferlist &chunk = (*encoded)[chunk_index];
> >>>>> -    chunk.substr_of(out, i * blocksize, blocksize);
> >>>>> -  }
> >>>>>    encode_chunks(want_to_encode, encoded);
> >>>>>    for (unsigned int i = 0; i < k + m; i++) {
> >>>>>      if (want_to_encode.count(i) == 0) diff --git
> >>>>> a/src/erasure-code/ErasureCode.h b/src/erasure-code/ErasureCode.h
> >>>>> index 7aaea95..62aa383 100644
> >>>>> --- a/src/erasure-code/ErasureCode.h
> >>>>> +++ b/src/erasure-code/ErasureCode.h
> >>>>> @@ -46,7 +46,8 @@ namespace ceph {
> >>>>>                                              const map<int,
> >> int>
> >>>> &available,
> >>>>>                                              set<int>
> >> *minimum);
> >>>>>
> >>>>> -    int encode_prepare(const bufferlist &raw, bufferlist *prepared)
> >> const;
> >>>>> +    int encode_prepare(const bufferlist &raw,
> >>>>> +                       map<int, bufferlist> &encoded) const;
> >>>>>
> >>>>>      virtual int encode(const set<int> &want_to_encode,
> >>>>>                         const bufferlist &in,
> >>>>>
> >>>>
> >>>> --
> >>>> Lo?c Dachary, Artisan Logiciel Libre
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> >>> info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>>
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html