Re: Fwd: [newstore (again)] how disable double write WAL

Ok,
Great.

With these settings:
//
newstore_max_dir_size = 4096
newstore_sync_io = true
newstore_sync_transaction = true
newstore_sync_submit_transaction = true
newstore_sync_wal_apply = true
newstore_overlay_max = 0
//

and direct I/O in the benchmark tool (fio), I see that the HDD is 100% busy and that there is no transfer from /db to /fragments after the benchmark stops. Great!

But when I run a benchmark with random 256k blocks, I see random blocks of between 32k and 256k on the HDD. Any idea why?

Throughput to the HDD is about 8 MBps, when it could be higher with larger blocks (~30 MBps),
and 70 MBps without fsync (hard drive cache disabled).
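
To put these figures in perspective, here is a small back-of-the-envelope sketch in Python. It only converts the throughput and block sizes quoted above into writes per second and average time per write. It assumes "MBps" means MiB per second and that all writes have the same size, neither of which was measured here, and the helper name is made up for the illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def writes_per_second(throughput_mib_s, block_kib):
    """Writes per second needed to sustain a throughput at a fixed block size."""
    return throughput_mib_s * MIB / (block_kib * KIB)


# 8 MiB/s is the observed throughput; 32 KiB and 256 KiB are the smallest
# and largest block sizes seen on the HDD.
for block_kib in (32, 256):
    rate = writes_per_second(8, block_kib)
    print(f"{block_kib:>3} KiB blocks at 8 MiB/s: "
          f"{rate:.0f} writes/s, {1000 / rate:.1f} ms per write")
```

If the disk really sees a mix of sizes, the actual rate lies somewhere between the two printed values.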

Other questions:
newstore_sync_io -> true = fsync immediately, false = fsync later (thread fsync_wq)?
newstore_sync_transaction -> true = sync in the DB?
newstore_sync_submit_transaction -> if false, use kv_queue (only if newstore_sync_transaction = false)?
newstore_sync_wal_apply -> if false, the WAL is applied later (thread wal_wq)?

Is that correct?

Is there a way to take advantage of a battery-backed cache (sync the DB but not the data)?

Thanks for everything!

On 10/12/2015 03:01 PM, Sage Weil wrote:
On Mon, 12 Oct 2015, David Casier wrote:
Hello everybody,
Are fragments stored in rocksdb before being written to "/fragments"?
I separated "/db" and "/fragments", but during the benchmark everything is written
to "/db".
I changed the "newstore_sync_*" options without success.

Is there any way to write all metadata to "/db" and all data to "/fragments"?

You can set newstore_overlay_max = 0 to avoid most data landing in db/.
But if you are overwriting an existing object, doing write-ahead logging
is usually unavoidable because we need to make the update atomic (and the
underlying posix fs doesn't provide that).  The wip-newstore-frags branch
mitigates this somewhat for larger writes by limiting fragment size, but
for small IOs this is pretty much always going to be the case.  For small
IOs, though, putting things in db/ is generally better since we can
combine many small ios into a single (rocksdb) journal/wal write.  And
often leave them there (via the 'overlay' behavior).

sage
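
As an illustration of why an overwrite ends up being written twice, here is a minimal Python sketch of write-ahead logging on a POSIX filesystem. It is not newstore's implementation: the WAL is a plain JSON file rather than rocksdb, and every name in it (`log_intent`, `apply_wal`, `overwrite`, the `crash_before_apply` flag) is hypothetical. The intent is made durable first, then applied in place, so a crash between the two steps can be repaired by replaying the record.

```python
import json
import os
import tempfile


def fsync_write(path, data, offset=0):
    """Write data at the given offset and fsync before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)
    finally:
        os.close(fd)


def log_intent(wal_path, fragment_path, offset, data):
    """First write: record the overwrite durably before touching the fragment."""
    record = {"path": fragment_path, "offset": offset, "data": data.hex()}
    tmp_path = wal_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
    # rename() is atomic on POSIX: the record is either complete or absent.
    # (A fuller version would also fsync the containing directory.)
    os.rename(tmp_path, wal_path)


def apply_wal(wal_path):
    """Second write: apply a pending record in place. Safe to repeat."""
    if not os.path.exists(wal_path):
        return False
    with open(wal_path) as f:
        record = json.load(f)
    fsync_write(record["path"], bytes.fromhex(record["data"]), record["offset"])
    os.remove(wal_path)
    return True


def overwrite(wal_path, fragment_path, offset, data, crash_before_apply=False):
    """Overwrite part of an existing fragment, logging the intent first."""
    log_intent(wal_path, fragment_path, offset, data)
    if crash_before_apply:
        return  # simulated crash: the intent is logged, the fragment is untouched
    apply_wal(wal_path)


workdir = tempfile.mkdtemp()
fragment = os.path.join(workdir, "fragment")
wal = os.path.join(workdir, "wal")

fsync_write(fragment, b"AAAAAAAA")
overwrite(wal, fragment, 2, b"bbbb", crash_before_apply=True)
with open(fragment, "rb") as f:
    after_crash = f.read()       # still the old contents

recovered = apply_wal(wal)       # replay on "restart"
with open(fragment, "rb") as f:
    after_recovery = f.read()    # the overwrite is now fully applied
```

The data passes through storage twice (once in the log, once in the fragment), which is the double write this thread is about. A write to a brand-new object does not need this, since nothing existing can be left half-updated.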



--
________________________________________________________

Best regards,

David CASIER
DCConsulting SARL

4 Trait d'Union
77127 LIEUSAINT

Direct line: 01 75 98 53 85
Email: david.casier@aevoo.fr
________________________________________________________


