Samuel, thank you very much for this detailed explanation!

As far as I understand it, the journal acts as a ring buffer in front of the OSD. Using time as the parameter that triggers a sync might not be the best choice for a dynamic storage subsystem. Under high workload, e.g. 10/20 for min/max might be optimal for 4 nodes with 10 OSDs each, but no longer after adding 4 additional nodes.

Are there parameters that trigger the sync to the OSD based on the fill level of the journal instead? E.g. something like

    filestore [min|max] sync percent:  do not sync before the journal is min-% full; sync once it is max-% full

What would happen if I set "filestore [min|max] sync interval" to 999999?
Will the journal sync start at 100% full, or at some X%?
What is 'X' by default, and how can I set 'X'?
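To make the question more concrete, this is the kind of [osd] section I have in mind. The sync interval options are the ones you described; the 'sync percent' options are only how I imagine such knobs could look -- I could not find them anywhere, so please treat them as hypothetical:

    [osd]
        ; push the time-based trigger far out of the way
        filestore min sync interval = 999999
        filestore max sync interval = 999999

        ; hypothetical fill-level based triggers -- do these (or something similar) exist?
        ;filestore min sync percent = 10   ; do not sync before the journal is 10% full
        ;filestore max sync percent = 20   ; force a sync once the journal is 20% full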
Best Regards,
-Dieter

On Thu, Aug 30, 2012 at 12:34:43AM +0200, Samuel Just wrote:
> filestore [min|max] sync interval:
>
> Periodically, the filestore needs to quiesce writes and do a syncfs in
> order to create a consistent commit point up to which it can free journal
> entries. Syncing more frequently tends to reduce the time required to do
> the sync and reduces the amount of data that needs to remain in the
> journal. Less frequent syncs allow the backing filesystem to better
> coalesce small writes and metadata updates, hopefully resulting in more
> efficient syncs. 'filestore max sync interval' defines the maximum time
> period between syncs, 'filestore min sync interval' defines the minimum
> time period between syncs.
>
> filestore flusher:
>
> The filestore flusher forces data from large writes to be written out
> using sync_file_range before the sync in order to (hopefully) reduce the
> cost of the eventual sync. In practice, disabling 'filestore flusher'
> seems to improve performance in some cases.
>
> filestore queue max ops:
>
> 'filestore queue max ops' defines the number of in-progress ops the
> filestore will accept before blocking on queueing new ones. This mostly
> shouldn't have much of an effect on performance and should probably be
> ignored.
>
> filestore op threads:
>
> 'filestore op threads' defines the number of threads used to submit
> filesystem operations in parallel.
>
> journal dio:
>
> 'journal dio' enables using O_DIRECT for writing to the journal. This
> should usually be enabled. If possible, 'journal aio' should also be
> enabled to allow the use of libaio for asynchronous writes.
>
> osd op threads:
>
> 'osd op threads' defines the size of the thread pool used to service OSD
> operations such as client requests. Increasing this may increase the rate
> of request processing.
>
> osd disk threads:
>
> 'osd disk threads' defines the number of threads used to perform
> background disk-intensive OSD operations such as scrubbing and snap
> trimming.
>
> On Wed, Aug 29, 2012 at 12:29 PM, Dieter Kasper <d.kasper@xxxxxxxxxxxx> wrote:
> > Hi Josh,
> >
> > thanks for the hint.
> > Can you please spend a few words on the meaning of these parameters?
> > - filestore min/max sync interval = int/float ? seconds ? of what ?
> > - filestore flusher = false
> > - filestore queue max ops = 10000
> >   what is 'one op' ? queue in front of what ?
> > - filestore op threads =
> >   what are useful values here ?
> >
> > - journal dio = true/false
> > - osd op threads =
> > - osd disk threads =
> >
> > Kind Regards,
> > -Dieter
> >
> > On Wed, Aug 29, 2012 at 07:37:36PM +0200, Josh Durgin wrote:
> >> On 08/29/2012 01:50 AM, Alexandre DERUMIER wrote:
> >> > Nice results !
> >> > (can you make the same benchmark from a qemu-kvm guest with virtio driver?
> >> > I did some benchmarks a few months ago with Stephan Priebe, and we were
> >> > never able to get more than 20000 IOPS with a full-SSD 3-node cluster)
> >> >
> >> >>> How can I set the variables that control when the journal data goes to the OSD? (after X seconds and/or when Y % full)
> >> > I think you can try to tune these values:
> >> >
> >> > filestore max sync interval = 30
> >> > filestore min sync interval = 29
> >> > filestore flusher = false
> >> > filestore queue max ops = 10000
> >>
> >> Increasing filestore_op_threads might help as well.
> >>
> >> > ----- Original message -----
> >> >
> >> > From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx>
> >> > To: ceph-devel@xxxxxxxxxxxxxxx
> >> > Cc: "Dieter Kasper (KD)" <d.kasper@xxxxxxxxxxxx>
> >> > Sent: Tuesday, 28 August 2012 19:48:42
> >> > Subject: RBD performance - tuning hints
> >> >
> >> > Hi,
> >> >
> >> > on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
> >> > I can observe pretty nice rados bench performance
> >> > (see bench-rados.txt for details):
> >> >
> >> > Bandwidth (MB/sec):     961.710
> >> > Max bandwidth (MB/sec): 1040
> >> > Min bandwidth (MB/sec): 772
> >> >
> >> > The bandwidth generated with
> >> > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
> >> > ... is also acceptable, e.g.
> >> >
> >> > fio_write_4m_16        795 MB/s
> >> > fio_randwrite_8m_128   717 MB/s
> >> > fio_randwrite_8m_16    714 MB/s
> >> > fio_randwrite_2m_32    692 MB/s
> >> >
> >> > But the write IOPS seem to be limited to around 19k ...
> >> >
> >> >                           RBD 4M   64k (= optimal_io_size)
> >> > fio_randread_512_128       53286    55925
> >> > fio_randread_4k_128        51110    44382
> >> > fio_randread_8k_128        30854    29938
> >> > fio_randwrite_512_128      18888     2386
> >> > fio_randwrite_512_64       18844     2582
> >> > fio_randwrite_8k_64        17350     2445
> >> > (...)
> >> > fio_read_4k_128            10073    53151
> >> > fio_read_4k_64              9500    39757
> >> > fio_read_4k_32              9220    23650
> >> > (...)
> >> > fio_read_4k_16              9122    14322
> >> > fio_write_4k_128            2190    14306
> >> > fio_read_8k_32               706    13894
> >> > fio_write_4k_64             2197    12297
> >> > fio_write_8k_64             3563    11705
> >> > fio_write_8k_128            3444    11219
> >> >
> >> > Any hints for tuning the IOPS (read and/or write) would be appreciated.
> >> >
> >> > How can I set the variables that control when the journal data goes to the OSD? (after X seconds and/or when Y % full)
> >> >
> >> > Kind Regards,
> >> > -Dieter
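PS: To check that I understood the hints in this thread correctly, this is how I would now translate them into my test ceph.conf. The sync interval, flusher and queue values are the ones Alexandre suggested, 'journal aio' follows Samuel's recommendation, and the thread counts are only my first guesses -- please correct me where I got it wrong:

    [osd]
        filestore max sync interval = 30
        filestore min sync interval = 29
        filestore flusher = false
        filestore queue max ops = 10000
        filestore op threads = 4      ; my guess -- useful values are still unclear to me
        journal dio = true
        journal aio = true            ; if supported, as recommended
        osd op threads = 4            ; my guess
        osd disk threads = 2          ; my guess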