Re: Odd WAL traffic for BlueStore

On Mon, 22 Aug 2016, Igor Fedotov wrote:
> Hi All,
> 
> While testing BlueStore as standalone storage via the FIO plugin, I'm observing
> huge traffic to the WAL device.
> 
> BlueStore is configured to use two 450 GB Intel SSDs: INTEL SSDSC2BX480G4L.
> 
> The first SSD is split into two partitions (200 GB and 250 GB) for the block
> DB and block WAL.
> 
> The second is split similarly, with the first 200 GB partition allocated for
> raw block data.
> 
> RocksDB settings are set as Somnath suggested in his 'RocksDB tuning'. Not
> much difference compared to the default settings, though...
> 
> As a result, when doing a 4k sequential write (8 GB total) to a fresh store,
> I'm observing (using nmon and other disk monitoring tools) significant write
> traffic to the WAL device, and it grows over time from ~10 MB/s to ~170 MB/s.
> Raw block device traffic is pretty stable at ~30 MB/s.
> 
> Additionally, I added output of the BlueFS perf counters on umount
> (l_bluefs_bytes_written_wal & l_bluefs_bytes_written_sst).
> 
> The resulting values are very frustrating: ~28 GB and ~4 GB for
> l_bluefs_bytes_written_wal & l_bluefs_bytes_written_sst respectively.

Yeah, this doesn't seem right.  Have you generated a log to see what is 
actually happening on each write?  I don't have any bright ideas about 
what is going wrong here.
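
If it helps, the subsystems involved can be turned up in the conf file the
fio plugin loads; a minimal sketch (the levels below are only a suggestion,
and 20 on bluestore/bluefs is very verbose, so expect a large log):

[osd]
        # log every BlueStore transaction and BlueFS flush/allocation decision
        debug bluestore = 20/20
        debug bluefs = 20/20
        debug bdev = 5/5
        debug rocksdb = 4/4
        # keep the log on a device with enough free space
        log file = ${fio_dir}/log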

sage

> Doing 64k writes changes the picture dramatically:
> 
> WAL traffic is stable at 10-12 MB/s and raw block traffic is at ~400 MB/s.
> BlueFS counters are ~140 MB and ~1 KB respectively.
> 
> Surely the write completes much faster in the second case.
> 
> No WAL is reported in the logs at the BlueStore level in either case.
> 
> 
> High BlueFS WAL traffic is also observed when running a subsequent random 4k
> RW workload over a store populated this way.
> 
> I'm wondering why the WAL device is involved in the process at all (writes
> happen in min_alloc_size blocks) and why the traffic and written data volume
> are so high.
> 
> Don't we have some fault affecting 4k performance here?
> 
> 
> Here are my settings and FIO job specification:
> 
> ###########################
> 
> [global]
>         debug bluestore = 0/0
>         debug bluefs = 1/0
>         debug bdev = 0/0
>         debug rocksdb = 0/0
> 
>         # spread objects over 8 collections
>         osd pool default pg num = 32
>         log to stderr = false
> 
> [osd]
>         osd objectstore = bluestore
>         bluestore_block_create = true
>         bluestore_block_db_create = true
>         bluestore_block_wal_create = true
>         bluestore_min_alloc_size = 4096
>         #bluestore_max_alloc_size = #or 4096
>         bluestore_fsck_on_mount = false
> 
>         bluestore_block_path=/dev/sdi1
>         bluestore_block_db_path=/dev/sde1
>         bluestore_block_wal_path=/dev/sde2
> 
>         enable experimental unrecoverable data corrupting features = bluestore rocksdb memdb
> 
>         bluestore_rocksdb_options = "max_write_buffer_number=16,min_write_buffer_number_to_merge=2,recycle_log_file_num=16,compaction_threads=32,flusher_threads=8,max_background_compactions=32,max_background_flushes=8,max_bytes_for_level_base=5368709120,write_buffer_size=83886080,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=400,level0_stop_writes_trigger=800"
> 
>         rocksdb_cache_size = 4294967296
>         bluestore_csum = false
>         bluestore_csum_type = none
>         bluestore_bluefs_buffered_io = false
>         bluestore_max_ops = 30000
>         bluestore_max_bytes = 629145600
>         bluestore_buffer_cache_size = 104857600
>         bluestore_block_wal_size = 0
> 
>         # use directory= option from fio job file
>         osd data = ${fio_dir}
> 
>         # log inside fio_dir
>         log file = ${fio_dir}/log
> ####################################
> 
> # FIO jobs
> #################
> # Runs a 4k sequential write test against the ceph BlueStore.
> [global]
> ioengine=/usr/local/lib/libfio_ceph_objectstore.so # must be found in your LD_LIBRARY_PATH
> 
> conf=ceph-bluestore-somnath.conf # must point to a valid ceph configuration file
> directory=./fio-bluestore # directory for osd_data
> 
> rw=write
> iodepth=16
> size=256m
> 
> [bluestore]
> nr_files=63
> bs=4k        # or 64k
> numjobs=32
> #############
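
As an aside, the 64k comparison described above could live in the same job
file as a second stanza (selected with fio's --section option) rather than
editing bs in place; a minimal sketch, with an arbitrary section name,
reusing the same parameters:

[bluestore-64k]
nr_files=63
bs=64k
numjobs=32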