Thanks Vladimir for the clarification!

Tony

> -----Original Message-----
> From: Vladimir Prokofev <v@xxxxxxxxxxx>
> Sent: Monday, November 2, 2020 3:46 AM
> Cc: ceph-users <ceph-users@xxxxxxx>
> Subject: Re: read latency
>
> With sequential read you get "read ahead" mechanics attached, which helps
> a lot.
> So let's say you do 4KB seq reads with fio.
> By default, Ubuntu, for example, has a 128KB read-ahead size. That means
> when you request that 4KB of data, the driver will actually request 128KB.
> When your IO is served and you request the next sequential 4KB, it is
> already in the VM's memory, so no new read IO is necessary.
> All of those 128KB will likely reside on the same OSD, depending on your
> Ceph object size.
> When you reach the end of that 128KB of data and request the next chunk,
> once again it will likely reside in the same RBD object as before,
> assuming a 4MB object size. So depending on internal mechanics which I'm
> not really familiar with, that data can be either in the host's memory,
> or at least in the OSD node's memory, so no real physical IO will be
> necessary.
> What you're thinking about is the worst-case scenario - when that 128KB
> is split between 2 objects residing on 2 different OSDs. Well, you just
> get 2 real physical IOs for your 1 virtual one, and at that moment you'll
> have a slower request, but after that you get read-ahead to help for a
> lot of seq IOs.
> In the end, read-ahead with sequential IOs leads to way, way fewer real
> physical reads than random reads, hence the IOPS difference.
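To put rough numbers on the read-ahead effect described above, here is a small back-of-envelope sketch in Python. The sizes are the ones mentioned in the thread (4KB requests, 128KB read ahead, 4MB RBD objects); the script itself is only an illustration, not something measured on this cluster.

=================
KIB = 1024
MIB = 1024 * KIB

io_size     = 4 * KIB     # guest request size (the 4KB fio reads)
read_ahead  = 128 * KIB   # Ubuntu default read_ahead_kb = 128
object_size = 4 * MIB     # default RBD object size
total_read  = 33 * MIB    # about what the random-read fio run quoted further down covered in 30s

seq_physical_ios  = total_read // read_ahead   # one physical read per read-ahead window
rand_physical_ios = total_read // io_size      # every 4KB random read is its own physical read

print(f"sequential: ~{seq_physical_ios} physical reads")    # 264
print(f"random:     ~{rand_physical_ios} physical reads")   # 8448
print(f"read-ahead saves a factor of ~{rand_physical_ios // seq_physical_ios}")  # 32

# Worst case mentioned above: a read-ahead window straddling two RBD objects
# (and so possibly two OSDs). With 128KB windows inside 4MB objects, at most
# 1 window in 32 can span an object boundary.
print(f"at most 1 in {object_size // read_ahead} windows spans two objects")     # 32
=================

The ~32x reduction in physical reads is in the same ballpark as the 20x IOPS difference reported further down.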
> On Mon, 2 Nov 2020 at 06:20, Tony Liu <tonyliu0592@xxxxxxxxxxx> wrote:
>
> > Another point of confusion about read vs. random read. My understanding
> > is that when fio does a read, it reads from the test file sequentially,
> > and when it does a random read, it reads from the test file randomly.
> > That file read inside the VM comes down to a volume read handled by the
> > RBD client, which distributes the reads to PGs and eventually to OSDs.
> > So a sequential file read inside the VM won't be a sequential read on
> > the OSD disk. Is that right?
> > Then what difference do seq. and rand. read make on the OSD disk?
> > Is it random read on the OSD disk in both cases?
> > Then how to explain the performance difference between seq. and rand.
> > read inside the VM? (Seq. read IOPS is 20x that of rand. read; Ceph has
> > 21 HDDs on 3 nodes, 7 on each.)
> >
> > Thanks!
> > Tony
>
> > > -----Original Message-----
> > > From: Vladimir Prokofev <v@xxxxxxxxxxx>
> > > Sent: Sunday, November 1, 2020 5:58 PM
> > > Cc: ceph-users <ceph-users@xxxxxxx>
> > > Subject: Re: read latency
> > >
> > > Not exactly. You can also tune network/software.
> > > Network - go for lower-latency interfaces. If you have 10G, go to 25G
> > > or 100G. 40G will not do though; AFAIK it is just 4x10G, so its
> > > latency is the same as 10G.
> > > Software - it's closely tied to your network card queues and
> > > processor cores. In short, tune affinity so that the packet receive
> > > queues and OSD processes run on the same corresponding cores.
> > > Disabling processor power-saving features helps a lot. Also watch out
> > > for NUMA interference.
> > > But overall, all these tricks will save you less than switching from
> > > HDD to SSD.
> > >
> > > On Mon, 2 Nov 2020 at 02:45, Tony Liu <tonyliu0592@xxxxxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > AFAIK, the read latency primarily depends on HW latency; not much
> > > > can be tuned in SW. Is that right?
> > > >
> > > > I ran a fio random read with iodepth 1 within a VM backed by Ceph
> > > > with HDD OSDs, and here is what I got.
> > > > =================
> > > > read: IOPS=282, BW=1130KiB/s (1157kB/s)(33.1MiB/30001msec)
> > > > slat (usec): min=4, max=181, avg=14.04, stdev=10.16
> > > > clat (usec): min=178, max=393831, avg=3521.86, stdev=5771.35
> > > >  lat (usec): min=188, max=393858, avg=3536.38, stdev=5771.51
> > > > =================
> > > > I checked that the HDD average latency is 2.9 ms. Looks like the
> > > > test result makes perfect sense, doesn't it?
> > > >
> > > > If I want to get shorter latency (more IOPS), I will have to go
> > > > for better disks, e.g. SSD. Right?
> > > >
> > > >
> > > > Thanks!
> > > > Tony
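For what it's worth, the numbers in that fio run can be sanity-checked with a little arithmetic. With iodepth=1 each IO waits for the previous one to complete, so IOPS is roughly 1 / average latency. The HDD and fio latencies below come from the quoted output; the SSD latency is only an illustrative assumption, not a measurement from this cluster.

=================
lat_avg_s = 3536.38e-6   # avg total latency from the fio output above, ~3.5 ms
hdd_lat_s = 2.9e-3       # measured HDD average latency
ssd_lat_s = 0.2e-3       # illustrative SATA SSD read latency (assumption)

print(f"expected IOPS at iodepth=1: ~{1 / lat_avg_s:.0f}")                    # ~283; fio measured 282
print(f"IOPS the HDD latency alone would allow: ~{1 / hdd_lat_s:.0f}")        # ~345
print(f"Ceph/network/software overhead per IO: ~{(lat_avg_s - hdd_lat_s) * 1e3:.2f} ms")  # ~0.64 ms
print(f"same arithmetic with a {ssd_lat_s * 1e3:.1f} ms SSD: ~{1 / ssd_lat_s:.0f} IOPS")  # ~5000
=================

That lines up with the conclusion above: most of the ~3.5 ms per IO is the HDD itself, so faster media (SSD/NVMe) is where the big win is, while network and software tuning can only shave off the remaining fraction of a millisecond.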