On Thu, Sep 10, 2015 at 2:43 AM, 瞿天善 <qutianshan@xxxxxxxxx> wrote:
> Hi,
> I'm working on the topic
> <http://tracker.ceph.com/projects/ceph/wiki/Messenger_-_priorities_for_Client>
> , first step I add a new rados api set_client_priority, then I will do
> more test to get the idea of how to balance the performance.
> code is in the pr <https://github.com/ceph/ceph/pull/5602>,
> originally client read the config to get priority.
> Is the api ok?

The API looks fine at first glance, but I continue to be concerned
about exposing priorities like this. Right now they behave fairly
oddly. For instance:
1) If you have multiple message priorities on the same client, the
higher-priority ones always get sent out if there are any.
2) When receiving messages, the higher-priority ones in the queue are
always delivered ahead of lower-priority ones
2b) ...unless the ms_fast_dispatch interface is involved (as with
client<->OSD communication), in which case priority is ignored by the
receiving messenger and it's just dispatched instantly.
3) The priority is used internally by the OSD for a proper tokenized
delivery scheme.

I think for QOS like this you're most interested in the third option,
but simply setting message priority like this will also apply to the
monitors and MDSes (and do potentially strange things!), and without
guards on the results users could do stuff like set their priority to
the max value and basically crowd out everybody else in the cluster.

...come to think of it, that's probably an attack they could do
currently if they've got a RADOS key they control. But it's at least
*difficult* to do accidentally since the interfaces generally aren't
exposed. ;)

So before merging something like this I'd like to see us have a more
complete solution carved out:
0) Identify if we even want message priority and in-daemon processing
priority to be conflated.
1) What does the messenger do? Make it as consistent as possible,
probably with tokenization like the OSD has.
(This won't be difficult code-wise as we're already using
PrioritizedQueue, but we use the "strict" portion of it for all
messages right now.) Make sure stuff doesn't break.
2) Establish a way to specify different priorities for different
daemons. Just because somebody gets faster data IO doesn't mean they
get faster MDS access. Everybody should probably be equal to the
monitors. ...and this is one of the things that the proposed API would
need to be changed to support.
3) Add capabilities support for specifying priority limits, along with
a reasonable default. Check these on message receipt. (...and reject
messages outside the bounds? Move them in-bounds?)
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
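
For readers following the thread, the difference between the "strict"
handling and a tokenized scheme discussed above can be illustrated
with a small Python sketch. This is only a simplified model under
stated assumptions, not Ceph's PrioritizedQueue: the function names
strict_order and weighted_order are hypothetical, priorities are plain
positive integers, and the tokenized variant is a generic smooth
weighted round-robin chosen purely for illustration.

```python
from collections import deque


def strict_order(queues):
    """Strict priority: drain the highest priority completely first.

    `queues` maps an integer priority to a list of messages.
    (Hypothetical helper, not a Ceph API.)
    """
    out = []
    for prio in sorted(queues, reverse=True):
        out.extend(queues[prio])
    return out


def weighted_order(queues):
    """Token-style sharing: service is roughly proportional to priority.

    Each round every non-empty class earns credit equal to its
    priority; the class holding the most credit sends one message and
    pays the sum of all active priorities. Lower classes are slowed
    down but never starved. (Illustrative assumption, not Ceph's
    algorithm.)
    """
    pending = {p: deque(msgs) for p, msgs in queues.items() if msgs}
    credit = {p: 0 for p in pending}
    out = []
    while pending:
        for p in pending:
            credit[p] += p
        # Pick the class with the most credit; ties go to higher priority.
        best = max(pending, key=lambda p: (credit[p], p))
        credit[best] -= sum(pending)  # sum of the active priorities
        out.append(pending[best].popleft())
        if not pending[best]:
            del pending[best]
            del credit[best]
    return out


if __name__ == "__main__":
    demo = {3: ["H0", "H1", "H2", "H3", "H4", "H5"], 1: ["L0", "L1"]}
    print("strict:  ", strict_order(demo))
    print("weighted:", weighted_order(demo))
```

In the strict variant the low-priority messages wait until every
high-priority message has gone out, which is the crowding-out risk
raised above. In the weighted variant they are interleaved at a lower
rate instead.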