Re: Poor read performance.

On Wed Apr 25 02:24:19 PDT 2018 Christian Balzer wrote:

> Hello,

> On Tue, 24 Apr 2018 12:52:55 -0400 Jonathan Proulx wrote:

> > The performance I really care about is over rbd for VMs in my
> > OpenStack, but 'rbd bench' seems to line up pretty well with 'fio' tests
> > inside VMs, so a more or less typical random write rbd bench (from a
> > monitor node with a 10G connection on the same net as the osds):
> >

> "rbd bench" does things differently than fio (lots of happy switches
> there) so to make absolutely sure you're not doing and apples and oranges
> thing I'd suggest you stick to fio in a VM.

There are some tradeoffs, yes, but I get very close results, and I figured
I'd use the ceph tools for the ceph list rather than pulling in the rest of
my working stack, since the ceph tools do show the problem.

But I do see your point.
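
Following that suggestion, a like-for-like read test inside a guest would
look something like this (a sketch: 4K random reads against a 1G test file,
matching the io-threads=16 of the rbd bench below; the file path is a
placeholder and it assumes fio with libaio is available in the VM):

---
fio --filename=/var/tmp/fiotest --size=1G --ioengine=libaio --invalidate=1 \
    --direct=1 --numjobs=1 --rw=randread --name=fiojob --blocksize=4K --iodepth=16
---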


> In comparison this fio:
> ---
> fio --size=1G --ioengine=libaio --invalidate=1 --direct=1 --numjobs=1 --rw=randwrite --name=fiojob --blocksize=4K --iodepth=32
> ---

> Will only result in this, due to the network latencies of having direct
> I/O and only having one OSD at a time being busy:
> ---
>   write: io=110864KB, bw=1667.2KB/s, iops=416, runt= 66499msec
> ---

I may simply have underestimated the impact of write caching in
libvirt; that fio command does get me write performance just about as
crappy as the reads (which would point to me just needing more IOPS from
more/faster disks, which is definitely true to a greater or lesser
extent). One way to double-check what cache mode the guests actually
run with is sketched after the numbers.

WRITE: io=1024.0MB, aggrb=5705KB/s, minb=5705KB/s, maxb=5705KB/s,
       mint=183789msec, maxt=183789msec

READ: io=1024.0MB, aggrb=4322KB/s, minb=4322KB/s, maxb=4322KB/s,
      mint=242606msec, maxt=242606msec 
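
Checking the cache mode on a hypervisor is roughly this (a sketch: the
domain name is a placeholder and the exact output depends on the
nova/libvirt configuration; the cache= attribute on the driver line is
what matters):

---
# dump the disk driver line(s) from the libvirt domain XML, e.g.:
virsh dumpxml instance-0000abcd | grep 'driver name'
    <driver name='qemu' type='raw' cache='writeback'/>
---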

> That being said, something is rather wrong here indeed, my crappy test
> cluster shouldn't be able to outperform yours.

Well, load ... the asymmetry was my main puzzlement, but that may be illusory.

> > rbd bench  --io-total=4G --io-size 4096 --io-type write \
> > --io-pattern rand --io-threads 16 mypool/myvol
> > 
> > <snip />
> > 
> > elapsed:   361  ops:  1048576  ops/sec:  2903.82  bytes/sec: 11894034.98
> > 
> > same for random read is an order of magnitude lower:
> > 
> > rbd bench  --io-total=4G --io-size 4096 --io-type read \
> > --io-pattern rand --io-threads 16  mypool/myvol
> > 
> > elapsed:  3354  ops:  1048576  ops/sec:   312.60  bytes/sec: 1280403.47
> > 
> > (sequential reads and a bigger io-size help, but not a lot)
> > 
> > ceph -s from during read bench so get a sense of comparing traffic:
> > 
> >   cluster:
> >     id:     <UUID>
> >     health: HEALTH_OK
> >  
> >   services:
> >     mon: 3 daemons, quorum ceph-mon0,ceph-mon1,ceph-mon2
> >     mgr: ceph-mon0(active), standbys: ceph-mon2, ceph-mon1
> >     osd: 174 osds: 174 up, 174 in
> >     rgw: 3 daemon active
> >  
> >   data:
> >     pools:   19 pools, 10240 pgs
> >     objects: 17342k objects, 80731 GB
> >     usage:   240 TB used, 264 TB / 505 TB avail
> >     pgs:     10240 active+clean
> >  
> >   io:
> >     client:   4296 kB/s rd, 417 MB/s wr, 1635 op/s rd, 3518 op/s wr
> > 
> > 
> > During deep-scrubs overnight I can see the disks doing >500MBps reads
> > and ~150rx/iops (each at peak), while during read bench (including all
> > traffic from ~1k VMs) individual osd data partitions peak around 25
> > rx/iops and 1.5MBps rx bandwidth so it seems like there should be
> > performance to spare.
> > 
> OK, there are a couple of things here.
> 1k VMs?!?

Actually 1.7k VMs just now, which caught me a bit by surprise when I
looked at it.  Many are idle because we don't charge per use
internally so people are sloppy, but many aren't and even the idle
ones are writing logs and such.
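
A quick way to see where that background write load lands is 'ceph osd
pool stats', which breaks the client I/O rates from 'ceph -s' down per
pool (just the command here; the per-pool numbers obviously depend on
the cluster):

---
ceph osd pool stats
---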

> One assumes that they're not idle, looking at the output above.
> And writes will compete with reads on the same spindle of course.
> "performance to spare" you say, but have you verified this with iostat or
> atop?

This assertion is mostly based on collectd stats that show a spike in
read ops and bandwidth during our scrub window and no large change in
write ops or bandwidth, so I presume the disks *could* do that much
(at least ops-wise) for client traffic as well.
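
To answer the iostat question directly, watching the OSD data disks on a
storage node during the read bench should settle it; something like the
following (a sketch: extended per-device stats at 5-second intervals,
with %util and await being the columns to watch):

---
iostat -x 5
---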

Here's a snap of a 24hr graph from one server (others are similar in
general shape):

https://snapshot.raintank.io/dashboard/snapshot/gB3FDPl7uRGWmL17NHNBCuWKGsXdiqlt

(link good for 7 days)

You can clearly see the low read line behind the higher writes jump up
during the scrub window (20:00->02:00 local time here), and a much smaller
bump around 6am from cron.daily, a.k.a. the thundering herd.

The scrubs do impact performance, which does mean I'm over capacity, since
I should be able to scrub without impacting production, but there's still
a fair amount of capacity used during scrubbing that doesn't seem to be
used outside of it.
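
For reference, the scrub window itself is confined with the osd scrub hour
settings; a rough sketch of that sort of ceph.conf stanza (values
illustrative only, not necessarily what we actually run):

---
[osd]
osd scrub begin hour = 20
osd scrub end hour = 2
# optionally throttle scrub I/O against client traffic
osd scrub sleep = 0.1
---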

But looking harder, the only answer may be "buy hardware", which is a
valid answer.

Thanks,
-Jon
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


