Well, I think this has gone well past my ability to implement. Should
this be turned into a blueprint (BP) so we can see if someone is able
to work on it?
----------------
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1


On Tue, Jun 16, 2015 at 5:05 AM, Wido den Hollander wrote:
> On 06/15/2015 06:52 PM, John Spray wrote:
>>
>> On 15/06/2015 17:10, Robert LeBlanc wrote:
>>>
>>> John, let me see if I understand what you are saying...
>>>
>>> When a person runs `rbd top`, each OSD would receive a message
>>> saying: please capture all the performance data, grouped by RBD
>>> image, and limit it to 'X'. That way the OSD doesn't have to
>>> constantly update performance data for each object; it only starts
>>> tracking it when requested?
>>
>> Right, initially the OSD isn't collecting anything; it starts as soon
>> as it sees a query get loaded up (published via the OSDMap or some
>> other mechanism).
>
> I like that idea very much. Currently the OSDs are already CPU bound:
> a lot of time is spent processing a request even while it's not
> waiting on the disk.
>
> Although tracking IOPS might seem like a small and cheap thing to do,
> it's yet more CPU time spent by the system on something other than
> processing the I/O.
>
> So I'm in favor of not always collecting, but only collecting on
> demand.
>
> Go for performance, low latency and high IOPS.
>
> Wido
>
>> That said, in practice I can see people having some set of queries
>> that they always have loaded and feeding into graphite in the
>> background.
>>
>>> If so, that is an interesting idea. I wonder if that would be
>>> simpler than tracking the performance of each object (or the most
>>> recently used objects) in some format like /proc/diskstats, where it
>>> is in memory and not necessarily consistent. The benefit is that you
>>> could have "lifelong" stats that show up like iostat, and it would
>>> be a simple operation.
>>
>> Hmm, not sure we're on the same page about this part: what I'm
>> talking about is all in memory and would be lost across daemon
>> restarts. Some other component would be responsible for gathering the
>> stats from all the daemons in one place (that central part could
>> persist stats if desired).
>>
>>> Each object should be able to reference back to RBD/CephFS upon
>>> request, and the client could even be responsible for that load.
>>> Client performance data would need stats in addition to the object
>>> stats.
>>
>> You could extend the mechanism to clients. However, as much as
>> possible it's a good thing to keep it server side, as servers are
>> generally fewer (we still have to reduce these stats across N servers
>> to present them to the user), and we have multiple client
>> implementations (kernel/userspace). What kind of thing do you want to
>> get from clients?
>>
>>> My concern is that adding additional SQL-like logic to each op is
>>> going to get very expensive. I guess if we could push that to
>>> another thread early in the op, then it might not be too bad. I'm
>>> enjoying the discussion and the new ideas.
>>
>> Hopefully in most cases the query can be applied very cheaply, for
>> operations like comparing pool IDs or grouping by client ID. However,
>> I would also envisage an optional sampling number, such that e.g.
>> only 1 in every 100 ops would go through the query processing. That
>> is useful for systems where keeping the highest throughput is
>> paramount, and the numbers will still be useful if clients are doing
>> many thousands of ops per second.
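
Just to check that I'm reading the idea right, here is a rough sketch
of the per-op hook as I picture it: no work at all until a query is
loaded, a cheap sampling gate in front of the matching logic, and
in-memory counters grouped by client. Every name in it (PerfQuery,
PerfCounterHook, account(), ...) is made up for illustration; none of
this is existing Ceph code.

// Illustrative sketch only: the types and names below are invented
// stand-ins for the idea discussed above, not Ceph interfaces.
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <random>
#include <string>

struct OpInfo {                       // hypothetical per-op fields a query could see
  int64_t pool_id;
  std::string client_id;
  uint64_t bytes;
};

struct PerfQuery {                    // hypothetical query, pushed out to the OSDs
  std::optional<int64_t> match_pool;  // if set, only count ops in this pool
  uint32_t sample_every = 1;          // 1 = every op, 100 = roughly 1 in 100 ops
};

class PerfCounterHook {               // hypothetical per-OSD hook, called on each op
public:
  void set_query(std::optional<PerfQuery> q) { query_ = std::move(q); }

  void account(const OpInfo& op) {
    if (!query_) return;              // no query loaded: no extra work at all
    if (query_->sample_every > 1 &&
        rng_() % query_->sample_every != 0)
      return;                         // cheap sampling gate before any matching
    if (query_->match_pool && op.pool_id != *query_->match_pool)
      return;                         // query predicate: compare pool ID
    Counters& c = counters_[op.client_id];   // "group by client" aggregation
    c.ops += query_->sample_every;           // scale back up to estimate true rate
    c.bytes += op.bytes * query_->sample_every;
  }

  void report() const {               // what an `rbd top`-style consumer would read
    for (const auto& kv : counters_)
      std::cout << kv.first << ": ~" << kv.second.ops << " ops, ~"
                << kv.second.bytes << " bytes\n";
  }

private:
  struct Counters { uint64_t ops = 0; uint64_t bytes = 0; };
  std::optional<PerfQuery> query_;    // in memory only, lost on daemon restart
  std::map<std::string, Counters> counters_;
  std::minstd_rand rng_{std::random_device{}()};
};

int main() {
  PerfCounterHook hook;
  hook.account({1, "client.a", 4096});   // ignored: nothing is being collected yet

  PerfQuery q;
  q.match_pool = 1;
  q.sample_every = 100;                  // only ~1 in 100 ops pays the matching cost
  hook.set_query(q);

  for (int i = 0; i < 100000; ++i)
    hook.account({1, "client.a", 4096});
  hook.report();                         // prints an estimate close to the real totals
}

The scaling by sample_every is only there to keep the reported numbers
roughly comparable to unsampled counts; whether that estimate is good
enough presumably depends on how the stats end up being consumed.
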
>>
>> Cheers,
>> John
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on