On 06/15/2015 06:52 PM, John Spray wrote:
> On 15/06/2015 17:10, Robert LeBlanc wrote:
>> John, let me see if I understand what you are saying...
>>
>> When a person runs `rbd top`, each OSD would receive a message saying
>> please capture all the performance, grouped by RBD and limit it to
>> 'X'. That way the OSD doesn't have to constantly update performance
>> for each object, but when it is requested it starts tracking it?
>
> Right, initially the OSD isn't collecting anything; it starts as soon as
> it sees a query get loaded up (published via the OSDMap or some other
> mechanism).

I like that idea very much. The OSDs are already CPU bound, and a lot of
time goes into processing a request even when it is not waiting on the
disk.

Although tracking IOps might seem like a small and cheap thing to do, it
is yet more CPU time the system spends on something other than
processing the I/O.

So I'm in favor of not always collecting, but only on demand. Go for
performance, low latency and high IOps.

Wido

> That said, in practice I can see people having some set of queries that
> they always have loaded and feeding into graphite in the background.
>
>> If so, that is an interesting idea. I wonder if that would be simpler
>> than tracking the performance of each/MRU objects in some format like
>> /proc/diskstats, where it is in memory and not necessarily consistent.
>> The benefit is that you could have "lifelong" stats that show up like
>> iostat, and it would be a simple operation.
>
> Hmm, not sure we're on the same page about this part: what I'm talking
> about is all in memory and would be lost across daemon restarts. Some
> other component would be responsible for gathering the stats across all
> the daemons in one place (that central part could persist stats if
> desired).
>
>> Each object should be able to reference back to RBD/CephFS upon
>> request, and the client could even be responsible for that load.
>> Client performance data would need stats in addition to the object
>> stats.
>
> You could extend the mechanism to clients. However, as much as possible
> it's a good thing to keep it server side, as servers are generally fewer
> (we still have to reduce these stats across N servers to present to the
> user), and we have multiple client implementations (kernel/userspace).
> What kind of thing do you want to get from clients?
>
>> My concern is that adding additional SQL-like logic to each op is
>> going to get very expensive. I guess if we could push that to another
>> thread early in the op, then it might not be too bad. I'm enjoying the
>> discussion and new ideas.
>
> Hopefully in most cases the query can be applied very cheaply, for
> operations like comparing pool IDs or grouping by client ID. However, I
> would also envisage an optional sampling number, such that e.g. only 1
> in every 100 ops would go through the query processing. That is useful
> for systems where keeping the highest throughput is paramount, and the
> numbers will still be useful if clients are doing many thousands of ops
> per second.
>
> Cheers,
> John
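To make the idea concrete, here is a rough sketch of how such an
on-demand, sampled query could sit in the op path. This is illustrative
Python only; names like PerfQuery, sample_every and handle_op are made up
for the example and are not actual Ceph code.

# Illustrative sketch only -- hypothetical names, not actual Ceph code.
# Until a query is loaded (e.g. published via the OSDMap), ops pay no
# accounting cost at all.  An optional sampling factor lets only 1 in N
# ops go through the query processing, as suggested above.

import random
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PerfQuery:
    group_by: str = "rbd_image"    # e.g. group results per RBD image
    pool_id: Optional[int] = None  # only count ops against this pool, if set
    limit: int = 10                # report only the top-X groups
    sample_every: int = 1          # 1 = every op, 100 = 1 in 100 ops
    counters: Counter = field(default_factory=Counter)

    def maybe_account(self, op) -> None:
        # Cheap filters first: pool comparison, then sampling.
        if self.pool_id is not None and op["pool_id"] != self.pool_id:
            return
        if self.sample_every > 1 and random.randrange(self.sample_every) != 0:
            return
        self.counters[op[self.group_by]] += 1

    def report(self):
        # Top-X result a central collector would pull and merge across
        # OSDs; everything here lives in memory only and is lost on
        # daemon restart.
        return self.counters.most_common(self.limit)

# No query loaded -> nothing is tracked.
active_queries = []

def handle_op(op) -> None:
    for q in active_queries:
        q.maybe_account(op)
    # ... normal op processing continues here ...

# `rbd top` (or similar) loads a query and ops start being sampled.
active_queries.append(PerfQuery(group_by="rbd_image", limit=5, sample_every=100))
for i in range(10000):
    handle_op({"pool_id": 3, "rbd_image": "vm-%d" % (i % 7)})
print(active_queries[0].report())

The per-OSD results would then be reduced in one place (the "central
part" mentioned above), which is also where persistence could be added
if someone does want lifelong stats.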
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on