On Wed, 8 Aug 2018, John Spray wrote:
> As I string together more complex admin flows around creating
> filesystems, pools etc, I'm increasingly feeling the lag of ~1s delays
> between commands.
>
> Currently, paxos transactions either happen in paxos_min_wait (0.05)
> seconds if there has been no transaction recently, or we insert a
> delay for paxos_propose_interval between transactions otherwise.
>
> Problems with this:
> 1. we rarely get the paxos_min_wait path for commands, because there
>    has usually been a recent update to PG stats (every
>    mgr_tick_period=2).
> 2. A full second seems like a really long default delay for a
>    responsive mon cluster on SSDs.
> 3. Commands tend to be bursty (e.g. creating two pools then creating
>    a FS), which in practice will always hit the paxos_propose_interval
>    slow path
>
> Possible changes that spring to mind:
> A  let most administrative commands skip the proposal delay using
>    force_immediate propose.
> B  fancier throttling mechanism to enforce a rate limit over time
>    rather than a hard period between proposals
> C  don't count the periodic background stats updates when doing
>    throttling (helps with 1 but not 2 or 3)
> D  reduce the 1s delay to something like 0.1s: we should be able to
>    handle 10 transactions per second, right? (helps with 1,2,3 but
>    potentially still making bursty admin commands slower than they need
>    to be).
>
> I'm most attracted to A and D, has anyone else thought about this problem?

The original motivation for the throttling was to prevent the generation
of too many OSDMaps, but the expectation there was that maps would come
from OSD state changes (up/down, etc). The epoch is a monotonic 32-bit
value which is only safe if we don't have many epochs per second.

Given that, I think A is the easy path. My only concern is if some admin
thing gets in a loop and isn't throttled. Which makes me think B is
actually the right path...
and just be pretty generous with the throttle period (since there may be
a flurry of activity but, in general, long idle periods before that).

sage
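
For illustration only, option B could be approximated with a token bucket: a burst of proposals goes through without delay, while a runaway loop is held to a sustained rate. This is a rough sketch, not the mon's actual code. The class name `ProposalThrottle` and the `rate` and `burst` parameters are hypothetical and do not correspond to existing Ceph options.

```python
class ProposalThrottle:
    """Hypothetical token bucket for proposal throttling (not Ceph code).

    rate:  sustained proposals per second allowed over the long run
    burst: proposals that may go through back to back after an idle period
    """

    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def delay_for(self, now):
        """Reserve one proposal slot and return how long it should wait."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

        # Take a token. A negative balance is a debt that is repaid at
        # `rate`, so the wait is the time until the balance is back to zero.
        self.tokens -= 1.0
        if self.tokens >= 0.0:
            return 0.0
        return -self.tokens / self.rate


if __name__ == "__main__":
    throttle = ProposalThrottle(rate=1.0, burst=5)
    # Seven commands issued at the same instant: five pass immediately,
    # the rest are spaced out at the sustained rate.
    print([throttle.delay_for(0.0) for _ in range(7)])
    # After a long idle period the bucket is full again.
    print(throttle.delay_for(100.0))
```

With a generous `burst`, a short flurry of admin commands (two pools, then a FS) would see no added delay, while something stuck in a loop would still be limited to `rate` proposals per second, which keeps the epoch growth bounded.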