Re: [PATCHv2 0/3] rbd: header read/refresh improvements

Alex,

I think you are correct in both your understanding and your impression of the approach.

> On Apr 26, 2015, at 4:44 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> 
> On Sun, Apr 26, 2015 at 9:29 AM, Alex Elder <elder@xxxxxxxx> wrote:
>> On 04/24/2015 08:22 AM, Douglas Fuller wrote:
>>> 
>>> Support multiple class method call ops in one ceph_msg, and consolidate
>>> the rbd header read and refresh paths to use this feature, reducing the
>>> number of ceph_msgs sent in the process. Also re-check features on header
>>> refresh and begin returning -EIO if the features have changed since mapping.
>> 
>> This sounds pretty expensive.  For every class operation
>> you are copying the received data two extra times.

I’d really like a solution to this, but I don’t think one is available given the constraints.

>> Will you please correct me where I'm wrong above, and
>> maybe give a little better high-level explanation of
>> what you're trying to do?  I see in patch 1 you mention
>> a problem with short reads, and there may be a simpler
>> fix than that (to begin with).  But beyond that, if
>> you're trying to incorporate more ops in a message,
>> there may be better ways to go about that as well.
> 
> Yeah, the only objective of this was to pack more call ops into an
> osd_request in order to reduce the number of network RTs during rbd map
> and refresh.  The fundamental problem the first patch is trying to work
> around is the first ceph_msg_data item consuming all reply buffers
> instead of just its own.  We have to preallocate response buffers and
> if the preallocated response buffer for the first op is large enough
> (it always is) it will consume the entire ceph_msg, along with replies
> to other ops.
> 
> My understanding is that the first patch is supposed to be a specific
> workaround for just that - I think it's noted somewhere that it will
> break on reads combined with call ops or similar.

That’s correct. This patch only works around the short-read issue in this specific corner case. It does not address cases where call ops returning data are arbitrarily combined with reads or writes (and apparently they never have been, because that would not work). It should not introduce any new breakage, though.
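
For reference, the shape of the packing we're after is roughly the
following. This is only a sketch: the class/method names are placeholders,
oid/oloc setup, call input data, request formatting and most error handling
are omitted, and only the libceph entry points (ceph_osdc_alloc_request,
osd_req_op_cls_init, osd_req_op_cls_response_data_pages) are the real ones.

/*
 * Sketch: pack two class method calls into one OSD request so both
 * replies come back in a single ceph_msg (one network round trip).
 */
struct ceph_osd_request *req;
struct page **size_pages, **feat_pages;
int ret;

req = ceph_osdc_alloc_request(osdc, NULL, 2, false, GFP_NOIO);
if (!req)
	return -ENOMEM;

/* op 0: placeholder class method call */
osd_req_op_cls_init(req, 0, CEPH_OSD_OP_CALL, "rbd", "get_size");
size_pages = ceph_alloc_page_vector(1, GFP_NOIO);
osd_req_op_cls_response_data_pages(req, 0, size_pages, PAGE_SIZE, 0,
				   false, false);

/* op 1: placeholder class method call */
osd_req_op_cls_init(req, 1, CEPH_OSD_OP_CALL, "rbd", "get_features");
feat_pages = ceph_alloc_page_vector(1, GFP_NOIO);
osd_req_op_cls_response_data_pages(req, 1, feat_pages, PAGE_SIZE, 0,
				   false, false);

ret = ceph_osdc_start_request(osdc, req, false);
if (!ret)
	ret = ceph_osdc_wait_request(osdc, req);

/*
 * Without patch 1, op 0's preallocated PAGE_SIZE reply buffer consumes
 * the entire data portion of the reply ceph_msg, swallowing op 1's
 * payload along with its own.
 */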

I was told that the preferred way to proceed for now was to avoid changing the osd_client interface and to handle this case in osd_client rather than in the messaging layer. Given those constraints, the approach taken here is the one agreed on in the standups and on #ceph-devel.

We can’t know the actual response sizes until the replies are decoded in osd_client, so we can’t copy the data off the wire directly into the callers’ buffers; that costs one extra copy. The second copy is required because pagevec.c provides no interface for copying arbitrary data between two page vectors.
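
To make that concrete, the bounce path looks roughly like this. Again only
a sketch: ceph_copy_from_page_vector and ceph_copy_to_page_vector are the
real pagevec.c helpers, but the op_* variables are placeholders for values
we only learn while decoding the reply.

/* per-op reply offset/length are known only after decoding in osd_client */
void *bounce = kmalloc(op_len, GFP_NOIO);
if (!bounce)
	return -ENOMEM;

/* copy #1: out of the reply message's page vector */
ceph_copy_from_page_vector(msg_pages, bounce, op_off, op_len);
/* copy #2: into the op's preallocated response page vector */
ceph_copy_to_page_vector(op_resp_pages, bounce, 0, op_len);
kfree(bounce);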

> I too have my efficiency concerns and I raised them in one of the
> standups but the argument was that this is only going to be used for
> header init/refresh, not copyup, so the overhead is negligible.  I'm
> still not sold on the copying everything twice though and was going to
> try to see if there is a simple way to special case this even more and
> make the diffstat smaller.

You and I agreed in that particular standup discussion; I don’t like it, either. I would prefer a general-case solution that avoids introducing so much extra overhead, especially for such a small amount of data (really just a few bytes).

If there’s a better way to handle this, I’m all ears. I had thought of taking advantage of the fact that the sum total of all this data will never exceed a single page, but that seems to require working around even more functionality for what is, essentially, a corner case.

-Doug

