Hi David-

On Mon, 12 Oct 2015, David Casier wrote:
> Ok,
> Great.
>
> With these settings :
> //
> newstore_max_dir_size = 4096
> newstore_sync_io = true
> newstore_sync_transaction = true
> newstore_sync_submit_transaction = true

Is this a hard disk?  Those settings probably don't make sense, since they
make every IO synchronous, blocking the submitting IO path...

> newstore_sync_wal_apply = true
> newstore_overlay_max = 0
> //
>
> And direct IO in the benchmark tool (fio)
>
> I see that the HDD is 100% busy and there is no transfer from /db to
> /fragments after stopping the benchmark : Great !
>
> But when I launch a bench with random blocks of 256k, I see random blocks
> between 32k and 256k on the HDD. Any idea ?

Random IOs have to be write-ahead logged in rocksdb, which has its own IO
pattern.  Since you made everything sync above, I think it'll depend on
how many osd threads get batched together at a time.. maybe.  Those
settings aren't something I've really tested, and probably only make
sense with very fast NVMe devices.

> Throughput to the HDD is about 8MBps when it could be higher with larger
> blocks (~30MBps)
> And 70 MBps without fsync (hard drive cache disabled).
>
> Other questions :
> newstore_sync_io -> true = fsync immediately, false = fsync later (Thread
> fsync_wq) ?

yes

> newstore_sync_transaction -> true = sync in DB ?

synchronously do the rocksdb commit too

> newstore_sync_submit_transaction -> if false then kv_queue (only if
> newstore_sync_transaction=false) ?

yeah.. there is an annoying rocksdb behavior that makes an async
transaction submit block if a sync one is in progress, so this queues them
up and explicitly batches them.

> newstore_sync_wal_apply = true -> if false then WAL later (thread wal_wq) ?

the txn commit completion threads can do the wal work synchronously.. this
is only a good idea if it's doing aio (which it generally is).

> Is it true ?
>
> Way for cache with battery (sync DB and no sync data) ?

?

s

> Thanks for everything !
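
To make the queue-and-batch idea behind newstore_sync_submit_transaction = false
a bit more concrete, here is a minimal sketch in Python (not the actual C++
code).  All of the names below (KVQueue, submit, flush, commit_fn) are
hypothetical and only illustrate the pattern: submitters append to a queue
without blocking on the database, and a single consumer drains everything
pending and commits it as one batch.

```python
from collections import deque


class KVQueue:
    """Toy model of a kv_queue: collect transactions, commit them in batches.

    commit_fn is a hypothetical callback standing in for one synchronous
    key/value store commit; it receives the whole batch as a list.
    """

    def __init__(self, commit_fn):
        self._pending = deque()
        self._commit_fn = commit_fn

    def submit(self, txn):
        # Submitters only enqueue; they never call into the store directly,
        # so they cannot get stuck behind a sync commit already in progress.
        self._pending.append(txn)

    def flush(self):
        # The single consumer takes everything queued so far and issues
        # exactly one commit for the whole batch.
        if not self._pending:
            return 0
        batch = list(self._pending)
        self._pending.clear()
        self._commit_fn(batch)
        return len(batch)


commits = []                    # records each batch handed to the "store"
q = KVQueue(commits.append)
for i in range(5):
    q.submit(f"txn-{i}")
flushed = q.flush()             # five transactions, one commit
```

In this toy model the cost of a sync commit is paid once per batch rather
than once per transaction, which is the point of queueing them up.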
>
> On 10/12/2015 03:01 PM, Sage Weil wrote:
> > On Mon, 12 Oct 2015, David Casier wrote:
> > > Hello everybody,
> > > Is a fragment stored in rocksdb before being written to "/fragments" ?
> > > I separated "/db" and "/fragments" but during the bench, everything is
> > > written to "/db"
> > > I changed the "newstore_sync_*" options without success.
> > >
> > > Is there any way to write all metadata in "/db" and all data in
> > > "/fragments" ?
> > You can set newstore_overlay_max = 0 to avoid most data landing in db/.
> > But if you are overwriting an existing object, doing write-ahead logging
> > is usually unavoidable because we need to make the update atomic (and the
> > underlying posix fs doesn't provide that).  The wip-newstore-frags branch
> > mitigates this somewhat for larger writes by limiting fragment size, but
> > for small IOs this is pretty much always going to be the case.  For small
> > IOs, though, putting things in db/ is generally better since we can
> > combine many small ios into a single (rocksdb) journal/wal write.  And
> > often leave them there (via the 'overlay' behavior).
> >
> > sage
> >
>
> --
> ________________________________________________________
>
> Regards,
>
> David CASIER
> DCConsulting SARL
>
> 4 Trait d'Union
> 77127 LIEUSAINT
>
> Direct line: 01 75 98 53 85
> Email: david.casier@aevoo.fr
> ________________________________________________________
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html