Re: read latency

With sequential reads you get read-ahead, which helps a
lot.
So let's say you do 4KB sequential reads with fio.
By default Ubuntu, for example, has a 128KB read-ahead size. That means when
you request 4KB of data, the kernel will actually read 128KB. Once that IO
is served and you request the next sequential 4KB, it is already in the
VM's memory, so no new read IO is necessary.
All of that 128KB will likely reside on the same OSD, depending on your Ceph
object size.
When you reach the end of that 128KB and request the next chunk, it will
again likely reside in the same RBD object as before (assuming a 4MB object
size). So, depending on internal mechanics I'm not really familiar with,
that data may already be in the host's memory, or at least in the OSD
node's memory, and no real physical IO will be necessary.
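A minimal sketch of the offset-to-object mapping behind this, assuming the default 4MB object size and default striping (stripe unit equal to object size, stripe count 1), where the object number is simply the byte offset divided by the object size. Data objects of format-2 RBD images are named `rbd_data.<image id>.<object number as 16 hex digits>`; the image id used here is hypothetical, and how objects are then placed on OSDs (CRUSH) is not modeled.

```python
# Assumptions: default 4MB RBD object size and default striping
# (stripe unit = object size, stripe count = 1). Image id is hypothetical.
OBJECT_SIZE = 4 * 1024 * 1024
WINDOW = 128 * 1024  # read-ahead window from the example above


def object_number(offset: int, object_size: int = OBJECT_SIZE) -> int:
    """Index of the RBD data object that holds byte `offset` of the image."""
    return offset // object_size


def object_name(image_id: str, offset: int, object_size: int = OBJECT_SIZE) -> str:
    """Format-2 RBD data object name: rbd_data.<image id>.<16 hex digits>."""
    return f"rbd_data.{image_id}.{object_number(offset, object_size):016x}"


# The first 32 read-ahead windows (32 * 128KB = 4MB) all land in object 0 ...
first_objects = {object_number(i * WINDOW) for i in range(32)}
# ... and the 33rd window is the first one in the next object.
next_object = object_number(32 * WINDOW)

print(first_objects, next_object)
print(object_name("hypothetical1234", 32 * WINDOW))
```

So with 4MB objects, 32 consecutive 128KB read-ahead requests go to the same object, and therefore to the same primary OSD.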
What you're thinking about is the worst-case scenario: that 128KB is split
between 2 objects residing on 2 different OSDs. Then you get 2 real physical
IOs for your 1 virtual IO, and that request will be slower, but after it
read-ahead again serves many sequential IOs.
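Whether a 128KB window straddles two objects depends on its alignment. The sketch below assumes fixed-size, back-to-back read-ahead windows (a simplification of real kernel read-ahead, which also depends on how the guest filesystem lays the file out on the volume) and counts how many windows in an 8MB sequential run touch two objects.

```python
OBJECT_SIZE = 4 * 1024 * 1024   # assumed default RBD object size
WINDOW = 128 * 1024             # read-ahead window from the example above


def objects_touched(start: int, length: int, object_size: int = OBJECT_SIZE) -> int:
    """Number of RBD objects spanned by a read of `length` bytes at `start`."""
    first = start // object_size
    last = (start + length - 1) // object_size
    return last - first + 1


def split_windows(base: int, n_windows: int, window: int = WINDOW) -> int:
    """How many of n consecutive windows starting at `base` span two objects."""
    return sum(
        1 for i in range(n_windows)
        if objects_touched(base + i * window, window) > 1
    )


# 64 windows = 8MB of sequential reading, i.e. two objects' worth.
aligned = split_windows(0, 64)      # 4MB is a multiple of 128KB, so no window straddles
shifted = split_windows(4096, 64)   # shifted by 4KB: one straddling window per boundary
print(aligned, shifted)
```

Even in the misaligned case only 1 window in 32 pays the double IO, which is why the worst case barely dents sequential throughput.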
In the end, read-ahead with sequential IOs leads to far fewer real physical
reads than random reads do, hence the IOPS difference.
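A toy page-cache model that puts rough numbers on this, under loudly simplified assumptions: every cache miss on a sequential read pulls in an aligned 128KB window, random reads trigger no read-ahead and fetch only the 4KB requested, and there is no OSD-side caching. The 10GiB file size and read count are hypothetical. It counts physical reads only and ignores that a 128KB read costs more than a 4KB one, so the ratio it prints shows the direction of the effect, not a prediction of the 20x seen in this thread.

```python
import random

BLOCK = 4 * 1024           # fio block size in the example
READ_AHEAD = 128 * 1024    # read-ahead window in the example
VOLUME = 10 * 1024 ** 3    # hypothetical 10GiB test file
N = 10_000                 # number of 4KB reads issued


def count_physical(offsets, fetch_size):
    """Count device reads when every cache miss fetches the aligned
    fetch_size chunk containing the requested offset (toy page-cache model)."""
    cached = set()
    physical = 0
    for off in offsets:
        chunk = off // fetch_size
        if chunk not in cached:
            physical += 1
            cached.add(chunk)
    return physical


sequential = [i * BLOCK for i in range(N)]
rng = random.Random(42)
random_offsets = [rng.randrange(VOLUME // BLOCK) * BLOCK for _ in range(N)]

# Sequential: each miss pulls in a whole read-ahead window.
seq_physical = count_physical(sequential, READ_AHEAD)
# Random: assume no read-ahead, so each miss fetches just the 4KB requested.
rand_physical = count_physical(random_offsets, BLOCK)

print(seq_physical, rand_physical, round(rand_physical / seq_physical, 1))
```

Here 10,000 sequential reads need about 313 physical reads, while 10,000 random reads over 10GiB almost never hit the cache.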

On Mon, 2 Nov 2020 at 06:20, Tony Liu <tonyliu0592@xxxxxxxxxxx> wrote:

> Another point of confusion: read vs. random read. My understanding is
> that when fio does a read, it reads the test file sequentially, and
> when it does a random read, it reads the test file randomly.
> A file read inside the VM becomes a volume read handled by the RBD
> client, which distributes the reads to PGs and eventually to OSDs. So a
> sequential file read inside the VM won't be a sequential read on the
> OSD disk. Is that right?
> Then what difference do sequential and random reads make on the OSD disk?
> Is it a random read on the OSD disk in both cases?
> Then how do we explain the performance difference between sequential
> and random reads inside the VM? (Sequential read IOPS is 20x that of
> random read; the Ceph cluster has 21 HDDs on 3 nodes, 7 on each.)
>
> Thanks!
> Tony
> > -----Original Message-----
> > From: Vladimir Prokofev <v@xxxxxxxxxxx>
> > Sent: Sunday, November 1, 2020 5:58 PM
> > Cc: ceph-users <ceph-users@xxxxxxx>
> > Subject:  Re: read latency
> >
> > Not exactly. You can also tune the network and software.
> > Network: go for lower-latency interfaces. If you have 10G, go to 25G or
> > 100G. 40G will not help though; AFAIK it is just 4x10G, so its latency
> > is the same as 10G.
> > Software: this is closely tied to your network card queues and CPU
> > cores. In short, tune affinity so that the packet receive queues and
> > OSD processes run on the same corresponding cores. Disabling CPU
> > power-saving features helps a lot. Also watch out for NUMA interference.
> > But overall, all these tricks will gain you less than switching from HDD
> > to SSD.
> >
> > On Mon, 2 Nov 2020 at 02:45, Tony Liu <tonyliu0592@xxxxxxxxxxx> wrote:
> >
> > > Hi,
> > >
> > > AFAIK, read latency primarily depends on HW latency; not much can
> > > be tuned in SW. Is that right?
> > >
> > > I ran a fio random read with iodepth 1 within a VM backed by Ceph with
> > > HDD OSDs, and here is what I got.
> > > =================
> > >    read: IOPS=282, BW=1130KiB/s (1157kB/s)(33.1MiB/30001msec)
> > >     slat (usec): min=4, max=181, avg=14.04, stdev=10.16
> > >     clat (usec): min=178, max=393831, avg=3521.86, stdev=5771.35
> > >      lat (usec): min=188, max=393858, avg=3536.38, stdev=5771.51
> > > =================
> > > I checked: the HDD average latency is 2.9 ms. The test result seems to
> > > make perfect sense, doesn't it?
> > >
> > > If I want lower latency (more IOPS), I will have to go for
> > > better disks, e.g. SSD. Right?
> > >
> > >
> > > Thanks!
> > > Tony
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an
> > > email to ceph-users-leave@xxxxxxx
> > >
>
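The fio figures quoted above are consistent with each other under Little's law: with iodepth 1 there is only one request in flight, so IOPS is roughly 1 / mean latency. The sketch below checks this using only numbers copied from that output; the 4KiB block size is an assumption inferred from BW / IOPS, since the test parameters were not posted.

```python
# Numbers copied from the fio output quoted above.
lat_avg_us = 3536.38     # mean total latency ("lat", roughly slat + clat)
reported_iops = 282
reported_bw_kib = 1130
data_mib = 33.1
runtime_s = 30.001
block_kib = 4            # assumption: 4KiB blocks, inferred from BW / IOPS

# Little's law with one request in flight (iodepth=1): IOPS = 1 / mean latency.
predicted_iops = 1_000_000 / lat_avg_us
# Bandwidth implied by IOPS and block size.
implied_bw_kib = reported_iops * block_kib
# Bandwidth from total data read over the runtime.
measured_bw_kib = data_mib * 1024 / runtime_s

print(f"predicted IOPS {predicted_iops:.1f} vs reported {reported_iops}")
print(f"bandwidth {implied_bw_kib} / {measured_bw_kib:.0f} KiB/s vs reported {reported_bw_kib}")
```

At iodepth 1 the only way to raise IOPS is to lower per-request latency, which is why the disk's ~2.9 ms dominates; the remaining ~0.6 ms of the ~3.5 ms measured is presumably the network and Ceph software path.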