On Tue, 21 May 2013, Sylvain Munaut wrote:
> Hi,
>
> >> So, AFAICT, the bulk of the write would be writing out the pgmap to
> >> disk every second or so.
> >
> > It should be writing out the full map only every N commits... see 'paxos
> > stash full interval', which defaults to 25.
>
> But doesn't it also write it in full when there is a new pgmap?
>
> I have a new one about every second, and its size / period seemed to
> match the IO rate pretty well, which is why I thought it was the reason
> for the IO.

Hmm. Can you generate a log with 'debug mon = 20', 'debug paxos = 20',
'debug ms = 1' for a few minutes over which you see a high data rate and
send it my way? It sounds like there is something wrong with the
stash_full logic.

Thanks!

> >> Is it really needed to write it in full? It doesn't change all that
> >> much AFAICT, so writing incremental changes with only periodic flush
> >> might be a better option?
> >
> > Right. It works this way now only because we haven't fully transitioned
> > from the old scheme. The next step is to store the PGMap over lots of
> > leveldb keys (one per pg) so that there is no big encode/decode of the
> > entire PGMap structure...
>
> Makes sense. I'm not sure of the "per-key" overhead of leveldb though,
> in cases where there are lots (> 10k) of PGs.

Yeah, it will be larger on-disk, but the IO rate will at least be
proportional to the update rate. :)

sage
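The 'paxos stash full interval' behaviour discussed above can be modelled roughly as follows. This is a hypothetical sketch, not the monitor's code: it assumes every commit writes an incremental and every N-th commit additionally writes a full copy of the map, and the function names and sizes are invented for illustration. It shows why a full write on every commit would make the IO rate track the map's size / period.

```python
# Hypothetical model of PGMap write volume on a monitor (not Ceph code).
# Assumption: each commit writes an incremental, and every
# `stash_full_interval`-th commit also writes a full copy of the map.


def bytes_written(commits: int, full_size: int, inc_size: int,
                  stash_full_interval: int) -> int:
    """Total bytes written over `commits` consecutive commits."""
    total = 0
    for version in range(1, commits + 1):
        total += inc_size  # incremental for this version
        if version % stash_full_interval == 0:
            total += full_size  # periodic full map ("stash full")
    return total


def write_rate(full_size: int, inc_size: int, stash_full_interval: int,
               commit_period_s: float = 1.0, commits: int = 1000) -> float:
    """Average bytes/second, assuming one commit every `commit_period_s` s."""
    total = bytes_written(commits, full_size, inc_size, stash_full_interval)
    return total / (commits * commit_period_s)


if __name__ == "__main__":
    # Made-up sizes: a 5 MB full map and 20 kB incrementals, 1 commit/s.
    print(write_rate(5_000_000, 20_000, 25))  # full map every 25 commits
    print(write_rate(5_000_000, 20_000, 1))   # full map on every commit
```

With these made-up numbers, an interval of 25 averages about 220 kB/s, while writing the full map on every commit averages about 5 MB/s, roughly the full map's size divided by the one-second period.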
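The plan to spread the PGMap over one leveldb key per PG can be illustrated with a small comparison. This is a sketch under stated assumptions: a plain dict stands in for leveldb, the key names (`pgmap_full`, `pgmap_pg/<pgid>`) and the JSON encoding are invented, and counting key bytes alongside value bytes is only a crude proxy for the per-key overhead raised in the thread.

```python
import json

# Hypothetical comparison (not Ceph code): a single PGMap blob vs one key
# per PG. A dict stands in for leveldb; key names are invented.
# bytes_written counts key + value bytes as a rough per-key-overhead proxy.


class MonolithicStore:
    """Re-encodes and rewrites the whole PG map on every update."""

    def __init__(self):
        self.pgs = {}
        self.kv = {}
        self.bytes_written = 0

    def apply_update(self, changed):
        self.pgs.update(changed)
        key = "pgmap_full"
        blob = json.dumps(self.pgs, sort_keys=True).encode()
        self.kv[key] = blob
        self.bytes_written += len(key) + len(blob)


class PerPgStore:
    """Stores each PG under its own key and rewrites only changed PGs."""

    def __init__(self):
        self.kv = {}
        self.bytes_written = 0

    def apply_update(self, changed):
        for pgid, stats in changed.items():
            key = f"pgmap_pg/{pgid}"
            value = json.dumps(stats, sort_keys=True).encode()
            self.kv[key] = value
            self.bytes_written += len(key) + len(value)


def demo(num_pgs=10_000, num_changed=10):
    """Load num_pgs PGs, then measure the cost of updating num_changed."""
    initial = {f"1.{i:x}": {"num_objects": i, "state": "active+clean"}
               for i in range(num_pgs)}
    mono, per_pg = MonolithicStore(), PerPgStore()
    mono.apply_update(initial)
    per_pg.apply_update(initial)
    # Reset counters so only the small follow-up update is measured.
    mono.bytes_written = per_pg.bytes_written = 0
    changed = {f"1.{i:x}": {"num_objects": i + 1, "state": "active+clean"}
               for i in range(num_changed)}
    mono.apply_update(changed)
    per_pg.apply_update(changed)
    return mono, per_pg


if __name__ == "__main__":
    mono, per_pg = demo()
    print("monolithic:", mono.bytes_written, "per-pg:", per_pg.bytes_written)
```

In this toy model the per-PG store writes in proportion to the number of PGs that changed, while the monolithic store rewrites all 10k entries for a 10-PG update. The per-PG layout repeats a key prefix for every PG, so it is larger at rest, which mirrors the trade-off described in the reply.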