No. What test parameters (iodepth/file size/numjobs) would make sense for 3 nodes / 27 OSDs @ 4TB?

- Rado

-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: Thursday, November 16, 2017 10:56 AM
To: Milanov, Radoslav Nikiforov <radonm@xxxxxx>; David Turner <drakonstein@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Bluestore performance 50% of filestore

Did you happen to have a chance to try with a higher io depth?

Mark

On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:
> FYI
>
> Having a 50GB block.db made no difference to the performance.
>
> - Rado
>
> From: David Turner [mailto:drakonstein@xxxxxxxxx]
> Sent: Tuesday, November 14, 2017 6:13 PM
> To: Milanov, Radoslav Nikiforov <radonm@xxxxxx>
> Cc: Mark Nelson <mnelson@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> I'd probably say 50GB to leave some extra space over-provisioned.
> 50GB should definitely prevent any DB operations from spilling over to the HDD.
>
> On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov <radonm@xxxxxx> wrote:
>
> Thank you,
>
> These are 4TB OSDs and they might become full someday; I'll try a 60GB db partition, sized for the max OSD capacity.
>
> - Rado
>
> From: David Turner [mailto:drakonstein@xxxxxxxxx]
> Sent: Tuesday, November 14, 2017 5:38 PM
> To: Milanov, Radoslav Nikiforov <radonm@xxxxxx>
> Cc: Mark Nelson <mnelson@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> You have to configure the size of the db partition in the config file for the cluster. If your db partition is 1GB, then I can all but guarantee that you're using your HDD for your blocks.db very quickly into your testing. There have been multiple threads recently about what size the db partition should be, and it seems to be based on how many objects your OSD is likely to have on it. The recommendation has been to err on the side of bigger. If you're running 10TB OSDs and anticipate filling them up, then you probably want closer to an 80GB+ db partition. That's why I asked how full your cluster was and how large your HDDs are.
>
> Here's a link to one of the recent ML threads on this topic:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
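>
> For example, something along these lines in ceph.conf before the OSDs are created; the option name is from memory and the ~60GB value is only an illustration, so double-check against the docs for your release:
>
> [osd]
> # db partition size picked up when a new bluestore OSD is created (~60GB here, purely as an example)
> bluestore_block_db_size = 64424509440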
>
> On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov <radonm@xxxxxx> wrote:
>
> The block-db partition is the default 1GB (is there a way to modify this? Journals are 5GB in the filestore case) and usage is low:
>
> [root@kumo-ceph02 ~]# ceph df
> GLOBAL:
>     SIZE        AVAIL      RAW USED     %RAW USED
>     100602G     99146G        1455G          1.45
> POOLS:
>     NAME              ID     USED       %USED     MAX AVAIL     OBJECTS
>     kumo-vms           1     19757M      0.02        31147G        5067
>     kumo-volumes       2       214G      0.18        31147G       55248
>     kumo-images        3       203G      0.17        31147G       66486
>     kumo-vms3         11     45824M      0.04        31147G       11643
>     kumo-volumes3     13     10837M         0        31147G        2724
>     kumo-images3      15     82450M      0.09        31147G       10320
>
> - Rado
>
> From: David Turner [mailto:drakonstein@xxxxxxxxx]
> Sent: Tuesday, November 14, 2017 4:40 PM
> To: Mark Nelson <mnelson@xxxxxxxxxx>
> Cc: Milanov, Radoslav Nikiforov <radonm@xxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> How big was your blocks.db partition for each OSD and what size are your HDDs? Also, how full is your cluster? It's possible that your blocks.db partition wasn't large enough to hold the entire db and it had to spill over onto the HDD, which would definitely impact performance.
>
> On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
> How big were the writes in the Windows test and how much concurrency was there?
>
> Historically bluestore does pretty well for us with small random writes, so your write results surprise me a bit. I suspect it's the low queue depth. Sometimes bluestore does worse with reads, especially if readahead isn't enabled on the client.
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
>
> Hi Mark,
>
> Yes, RBD is in writeback, and the only thing that changed was converting the OSDs to bluestore. These are 7200 rpm drives with triple replication. I also get the same results (bluestore 2 times slower) testing continuous writes on a 40GB partition in a Windows VM, with a completely different tool.
>
> Right now I'm going back to filestore for the OSDs, so additional tests are possible if that helps.
>
> - Rado
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode? Do you have client side readahead?
>
> Both are doing better for writes than you'd expect from the native performance of the disks, assuming they are typical 7200RPM drives and you are using 3X replication (~150 IOPS * 27 / 3 = ~1350 IOPS). Given the small file size, I'd expect that you might be getting better journal coalescing in filestore.
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be curious how it would look if you used libaio with a high iodepth and a much bigger partition to do random writes over.
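>
> For instance, something along these lines; the ioengine/iodepth/size/numjobs values are purely illustrative, not a tuned recommendation for your cluster:
>
> fio --name fio_test_file --direct=1 --ioengine=libaio --iodepth=32 --rw=randwrite --bs=4k --size=10G --numjobs=4 --time_based --runtime=180 --group_reporting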
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>
>> Hi,
>>
>> We have a 3-node, 27-OSD cluster running Luminous 12.2.1.
>>
>> In the filestore configuration there are 3 SSDs used for the journals of the 9 OSDs on each host (1 SSD holds 3 journal partitions for 3 OSDs). I've converted filestore to bluestore by wiping 1 host at a time and waiting for recovery. The SSDs now contain the block-db, again with one SSD serving 3 OSDs.
>>
>> The cluster is used as storage for OpenStack.
>>
>> Running fio on a VM in that OpenStack reveals bluestore performance almost twice as slow as filestore:
>>
>> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> Filestore
>>
>> write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>> write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>> write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
>>
>> read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>> read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>> read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
>>
>> Bluestore
>>
>> write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
>> write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
>> write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
>>
>> read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
>> read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
>> read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
>>
>> - Rado

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com