Re: seastar temporary_buffer/packet and ceph::buffer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 17, 2018 at 9:13 PM, Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> On Wed, Jan 17, 2018 at 7:53 AM, kefu chai <tchaikov@xxxxxxxxx> wrote:
>>
>>  i think it could be feasible to come up with some generic adapter to
>> use these concepts, but we might need to focus on some typical use
>> cases. for example, to encode and write a MOSDOpReply:
>>
>> assuming that we want to encode it into multiple temporary buffers and
>> send them using a seastar::packet. the MOSDOpReply struct has a field
>> of vector<OSDOp>, and OSDOp in turn has "bufferlist outdata;" in it.
>> for a read op, apparently, we want to avoid memcpy. if the underlying
>> read(store) returns a bufferlist, the encoder should be able to
>> shallow copy the bufferlist and encapsulate it in a fragment with a
>> customized deleter or just a temporary_buffer. if we want to avoid the
>> conversion from buffer::list to seastar::packet, probably we should
>> generalize the appender so it
>
> Yes, we'd like to avoid such copies.
>
>>
>> - is able to manage the life cycle of the buffers in bufferlist being
>> encoded. i.e. should flush current temporary_buffer, and translate all
>> buffer::ptr to seastar::temporary_buffer, then append them to the
>> output seastar::packet
>> - just writes to the local temporary_buffer directly when encoding
>> other kinds of objects.
>>
>> so, i don't think we can have a very generic appender if we want to be
>> efficient. but we can define one which works for certain output type.
>> in this example, it's seastar::packet.
>
> That sounds right to me.
>
>>
>>>
>>> DynamicBuffer::prepare() looks like a natural way to do the append part on
>>> encode, but it returns a MutableBufferSequence so is not guaranteed to be
>>> contiguous. Can you remind me where we really depend on contiguous buffers?
>>
>> IIRC, we don't really depend on contiguous buffers. but we do expect
>> that the buffer we are manipulating is continuous, so we can have the
>> opportunity to optimize the dencoder:
>>
>> - encoding: it's a noticeable overhead to check the underlying buffer
>> boundary when writing to a buffer. that's why we introduced the
>> appender to facilitate the encoding. see [1]
>
> This is kind of surprising.

forgot to mention, preallocated buffer also avoid the possible
overhead to allocate multiple new buffers  when encoding progresses.

>
>> - decoding: for the similar reason, it's more performant to just read
>> a contiguous buffer than reading from a segmented one. in other words,
>> if we can ensure a the bufferlist being decoded has only a single raw
>> buffer, why bother checking the boundary?
>
> Is that likely?  If the decoding case is from seastar, isn't the
> incoming data likely to be in the form of several mbufs?
>
>>
>> if we want to encode object into the mutable buffers returned
>> DynamicBuffer::prepare(), we need to check x.size() when writing to
>> x.data() and move on to the next one if current buffer chunk is too
>> small.
>>
>> --
>> [1] https://www.spinics.net/lists/ceph-devel/msg32608.html
>>
>>
>>> Is it just that some types set need_contiguous=false because we haven't yet
>>> implemented logic for segmented buffers, or are there cases where
>>> implementing that logic is actually intractable?
>>
>> "need_contiguous" trait was added to the denc for the case where the
>> buffer being decoded are large objects and they could be segmented.
>> these objects are read from bluestore. but i think we would expand the
>> case to the large message if we rebuild/flatten the segmented buffer,
>> the overhead of reallocating the buffer and memcpy calls offsets the
>> performance improvement we achieve by avoiding the bound check and
>> simpler logic. before that, we always do the rebuild before decoding a
>> buffer::list::iterator.
>
> That's what I would have expected.
>
>>
>>>
>>> Thanks,
>>> Casey
>>
>>
>>
>> --
>> Regards
>> Kefu Chai
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



-- 
Regards
Kefu Chai
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux