On Tue, 7 Apr 2015, Mark Nelson wrote:
> On 04/07/2015 02:16 PM, Mark Nelson wrote:
> > On 04/07/2015 09:57 AM, Mark Nelson wrote:
> > > Hi Guys,
> > >
> > > I ran some quick tests on Sage's newstore branch. So far, given that
> > > this is a prototype, things are looking pretty good imho. The 4MB
> > > object rados bench read/write and small read performance look
> > > especially good. Keep in mind that this is not using the SSD journals
> > > in any way, so 640MB/s sequential writes is actually really good
> > > compared to filestore without SSD journals.
> > >
> > > Small write performance appears to be fairly bad, especially in the RBD
> > > case where it's small writes to larger objects. I'm going to sit down
> > > and see if I can figure out what's going on. It's bad enough that I
> > > suspect there's just something odd happening.
> > >
> > > Mark
> >
> > Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore, for those
> > interested:
> >
> > http://nhm.ceph.com/newstore/
> >
> > Interestingly, small object write/read performance with 4 OSDs was about
> > 1/3-1/4 the speed of the same cluster with 36 OSDs.
> >
> > Note: Thanks Dan for fixing the directory column width!
> >
> > Mark
>
> New fio/librbd results using Sage's latest code that attempts to keep small
> overwrite extents in the db. This is 4 OSDs, so not directly comparable to
> the 36 OSD tests above, but it does include seekwatcher graphs. Results in
> MB/s:
>
>              write     read    randw    randr
>     4MB      57.9     319.6     55.2    285.9
>     128KB     2.5     230.6      2.4    125.4
>     4KB       0.46     55.65     1.11     3.56

What would be very interesting would be to see the 4KB performance with the
defaults (newstore overlay max = 32) vs. overlays disabled (newstore overlay
max = 0), and see if/how much the overlay is helping.

The latest branch also has open-by-handle. It's on by default (newstore open
by handle = true). I think for most workloads it won't be very noticeable...
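For the A/B comparison Sage suggests, a ceph.conf fragment along these lines could drive the two runs (a sketch only; the option names are from this thread, but placing them in the [osd] section is an assumption):

```
[osd]
# Run 1: thread defaults -- small overwrites kept as overlay
# extents in the kv db, open-by-handle enabled.
newstore overlay max = 32
newstore open by handle = true

# Run 2: uncomment to disable overlays and repeat the 4KB
# fio/librbd write/randw tests, isolating the overlay's effect.
#newstore overlay max = 0
```

Only the 4KB write and randw columns should move between runs if the overlay is doing its job; the read paths shouldn't care.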
I think there are two questions we need to answer, though:

1) Does it have any impact on a creation workload (say, 4KB objects)? It
shouldn't, but we should confirm.

2) Does it impact small object random reads with a cold cache? I think to
see the effect we'll probably need to pile a ton of objects into the store,
drop caches, and then do random reads. In the best case the effect will be
small, but hopefully noticeable: we should go from a directory lookup (1+
seeks) + inode lookup (1+ seeks) + data read, to an inode lookup (1+ seeks)
+ data read. So, 3 -> 2 seeks in the best case? I'm not really sure what XFS
is doing under the covers here...

sage
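The cold-cache experiment in (2) could be sketched with rados bench (a hedged sketch, not a tested recipe: the pool name `testpool`, the run lengths, and the object count implied by the write phase are all assumptions, and the cache drop has to happen on every OSD host):

```
# Pile a ton of small objects into the store (4KB writes,
# --no-cleanup keeps them around for the read phase).
rados bench -p testpool 300 write -b 4096 --no-cleanup

# On each OSD host: flush dirty data, then drop the page cache
# plus dentries/inodes so directory and inode lookups must seek.
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Random reads against the cold store; compare runs with
# "newstore open by handle" set to true vs. false.
rados bench -p testpool 60 rand
```

Comparing the rand-phase latency/IOPS between the two open-by-handle settings should show whether skipping the directory lookup saves the hoped-for seek.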