Re: seastar temporary_buffer/packet and ceph::buffer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 17, 2018 at 7:53 AM, kefu chai <tchaikov@xxxxxxxxx> wrote:
>
>  i think it could be feasible to come up with some generic adapter to
> use these concepts, but we might need to focus on some typical use
> cases. for example, to encode and write a MOSDOpReply:
>
> assuming that we want to encode it into multiple temporary buffers and
> send them using a seastar::packet. the MOSDOpReply struct has a field
> of vector<OSDOp>, and OSDOp in turn has "bufferlist outdata;" in it.
> for a read op, apparently, we want to avoid memcpy. if the underlying
> read(store) returns a bufferlist, the encoder should be able to
> shallow copy the bufferlist and encapsulate it in a fragment with a
> customized deleter or just a temporary_buffer. if we want to avoid the
> conversion from buffer::list to seastar::packet, probably we should
> generalize the appender so it

Yes, we'd like to avoid such copies.

>
> - is able to manage the life cycle of the buffers in bufferlist being
> encoded. i.e. should flush current temporary_buffer, and translate all
> buffer::ptr to seastar::temporary_buffer, then append them to the
> output seastar::packet
> - just writes to the local temporary_buffer directly when encoding
> other kinds of objects.
>
> so, i don't think we can have a very generic appender if we want to be
> efficient. but we can define one which works for certain output type.
> in this example, it's seastar::packet.

That sounds right to me.

>
>>
>> DynamicBuffer::prepare() looks like a natural way to do the append part on
>> encode, but it returns a MutableBufferSequence so is not guaranteed to be
>> contiguous. Can you remind me where we really depend on contiguous buffers?
>
> IIRC, we don't really depend on contiguous buffers. but we do expect
> that the buffer we are manipulating is continuous, so we can have the
> opportunity to optimize the dencoder:
>
> - encoding: it's a noticeable overhead to check the underlying buffer
> boundary when writing to a buffer. that's why we introduced the
> appender to facilitate the encoding. see [1]

This is kind of surprising.

> - decoding: for the similar reason, it's more performant to just read
> a contiguous buffer than reading from a segmented one. in other words,
> if we can ensure a the bufferlist being decoded has only a single raw
> buffer, why bother checking the boundary?

Is that likely?  If the decoding case is from seastar, isn't the
incoming data likely to be in the form of several mbufs?

>
> if we want to encode object into the mutable buffers returned
> DynamicBuffer::prepare(), we need to check x.size() when writing to
> x.data() and move on to the next one if current buffer chunk is too
> small.
>
> --
> [1] https://www.spinics.net/lists/ceph-devel/msg32608.html
>
>
>> Is it just that some types set need_contiguous=false because we haven't yet
>> implemented logic for segmented buffers, or are there cases where
>> implementing that logic is actually intractable?
>
> "need_contiguous" trait was added to the denc for the case where the
> buffer being decoded are large objects and they could be segmented.
> these objects are read from bluestore. but i think we would expand the
> case to the large message if we rebuild/flatten the segmented buffer,
> the overhead of reallocating the buffer and memcpy calls offsets the
> performance improvement we achieve by avoiding the bound check and
> simpler logic. before that, we always do the rebuild before decoding a
> buffer::list::iterator.

That's what I would have expected.

>
>>
>> Thanks,
>> Casey
>
>
>
> --
> Regards
> Kefu Chai



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux