Hi,

>> So, AFAICT, the bulk of the write would be writing out the pgmap to
>> disk every second or so.
>
> It should be writing out the full map only every N commits... see 'paxos
> stash full interval', which defaults to 25.

But doesn't it also write it in full when there is a new pgmap? I have a
new one about every second, and its size / period seemed to match the IO
rate pretty well, which is why I thought it was the reason for the IO.

>> Is it really needed to write it in full ? It doesn't change all that
>> much AFAICT, so writing incremental changes with only periodic flush
>> might be a better option ?
>
> Right. It works this way now only because we haven't fully transitioned
> from the old scheme. The next step is to store the PGMap over lots of
> leveldb keys (one per pg) so that there is no big encode/decode of the
> entire PGMap structure...

Makes sense. I'm not sure about the per-key overhead of leveldb, though,
in cases where there are lots of PGs (> 10k).

Cheers,

    Sylvain

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
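
For a rough sense of the numbers in this thread, here is a minimal sketch of the write-rate arithmetic. It is not Ceph code: the function name, the map sizes, and the simple model (one full map every N commits, one small incremental on every other commit) are all assumptions made up for illustration. Only the interval of 25 comes from the 'paxos stash full interval' default quoted above.

```python
def average_write_rate(full_map_bytes, incremental_bytes,
                       commits_per_second, stash_full_interval):
    """Average bytes/s written, assuming a full map is written once every
    `stash_full_interval` commits and an incremental on all other commits.
    This is a simplified model, not the monitor's actual behaviour."""
    full_writes = commits_per_second / stash_full_interval
    incremental_writes = commits_per_second - full_writes
    return (full_writes * full_map_bytes
            + incremental_writes * incremental_bytes)


# Hypothetical sizes, chosen only to make the comparison concrete.
FULL_MAP_BYTES = 4 * 1024 * 1024   # assumed size of one encoded pgmap
INCREMENTAL_BYTES = 16 * 1024      # assumed size of one incremental

# One new pgmap per second, as described in the message.
full_every_commit = average_write_rate(FULL_MAP_BYTES, INCREMENTAL_BYTES, 1.0, 1)
full_every_25 = average_write_rate(FULL_MAP_BYTES, INCREMENTAL_BYTES, 1.0, 25)

print(f"full map on every commit : {full_every_commit:,.0f} B/s")
print(f"full map every 25 commits: {full_every_25:,.0f} B/s")
```

If the observed IO rate is close to the first figure rather than the second, that would be consistent with a full map being written on every new pgmap, which is the behaviour being asked about.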