On Thursday, 4 August 2011, Jens Axboe wrote:
> On 2011-08-04 10:51, Martin Steigerwald wrote:
> > On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >>> On Wednesday, 3 August 2011, you wrote:
> >>>> Martin Steigerwald <Martin@xxxxxxxxxxxx> writes:
> >> [...]
> >> 
> >>> Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
> >>> 
> >>> iodepth=int
> >>> 
> >>>     Number of I/O units to keep in flight against the
> >>>     file. Note that increasing iodepth beyond 1 will
> >>>     not affect synchronous ioengines (except for small
> >>>     degrees when verify_async is in use). Even async
> >>>     engines may impose OS restrictions causing the
> >>>     desired depth not to be achieved. This may happen
> >>>     on Linux when using libaio and not setting
> >>>     direct=1, since buffered IO is not async on that
> >>>     OS. Keep an eye on the IO depth distribution in
> >>>     the fio output to verify that the achieved depth
> >>>     is as expected. Default: 1.
> >>> 
> >>> Okay, yes, it does. I am starting to get the hang of it. It's a bit
> >>> puzzling to have two concepts of synchronous I/O around:
> >>> 
> >>> 1) synchronous system call interfaces aka fio I/O engine
> >>> 
> >>> 2) synchronous I/O requests aka O_SYNC
> >> 
> >> But isn't this a case for iodepth=1 if buffered I/O on Linux is
> >> synchronous? I bet most regular applications except some databases
> >> use buffered I/O.
> > 
> > Thanks a lot for your answers, Jens, Jeff, DongJin.
> > 
> > Now what about the above one?
> > 
> > In what cases is iodepth > 1 relevant, when Linux buffered I/O is
> > synchronous? For multiple threads or processes?
> 
> iodepth controls what depth fio operates at, not the OS. You are right
> in that with iodepth=1, for buffered writes you could be seeing a much
> higher depth on the device side.
> 
> So think of iodepth as how many IO units fio can have in flight,
> nothing else.

Ah, okay. So when using iodepth=64 and ioengine=libaio, fio issues 64 I/O
requests at once before it waits for any of them to complete. And as the
block layer completes I/O requests, fio fills the queue back up to 64.
Right?

Now when I have two jobs running at once with iodepth=64, will each
process submit 64 I/O requests before waiting, giving at most 128 I/O
requests in flight? Or will each process use 32? My bet is that iodepth
is per job, per process.
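For reference, something like this minimal job file is what I have in
mind - the paths, block size and file size are just arbitrary examples I
made up for illustration:

  [global]
  ; async submission with direct I/O, per the manpage note above
  ioengine=libaio
  direct=1
  rw=randread
  bs=4k
  size=1g
  iodepth=64

  [job1]
  filename=/tmp/fio-depth-test-1

  [job2]
  filename=/tmp/fio-depth-test-2

If iodepth is indeed per job, the I/O depth distribution in the fio
output should show each of the two jobs driving up to 64 requests, so up
to 128 in flight in total.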
> > One process / thread can only submit one I/O at a time with
> > synchronous system call I/O, but the function returns when the stuff
> > is in the page cache. So first, why can't Linux use iodepth > 1 when
> > there is lots of stuff in the page cache to be written out? That
> > should help the single process case.
> 
> Since the IO unit is done when the system call returns, you can never
> have more than the one in flight for a sync engine. So iodepth > 1
> makes no sense for a sync engine.

That makes perfect sense now that I understand the iodepth option relates
to what the fio processes do.

> > In the multiple process / thread case Linux gets several I/O requests
> > from multiple processes / threads, and thus iodepth > 1 does make
> > sense?
> 
> No.

Since each fio job doing synchronous system call I/O still submits one
I/O at a time...

> > Maybe it helps getting clear where in the stack iodepth is located.
> > Is it
> > 
> > process / thread
> > system call
> > page cache
> > block layer
> > iodepth
> > device driver
> > device
> > 
> > ?
> > If so, why can't Linux make use of iodepth > 1 with synchronous
> > system call I/O? Or is it further up, on the system call level? But
> > then
> 
> Because it is sync. The very nature of the sync system calls is that
> submission and completion are one event. For libaio, you could submit a
> bunch of requests before retrieving or waiting for completion of any
> one of them.
> 
> The only example where a sync engine could drive a higher queue depth
> on the device side is buffered writes. For any other case (reads,
> direct writes), you need async submission to build up a higher queue
> depth.

Great! I think that makes it pretty clear.

So when I want to read consecutive blocks 1, 2, 3, 4, 5, 6, 7, 8, 9 and
10 from a file at once and only then wait, I need async I/O. The blocks
may be of arbitrary size.

What if I use 10 processes, each reading one of these blocks at once?
Couldn't this fill up the queue at the device level? But then different
processes usually read different files...

... my question hints at how I/O depths might accumulate at the device
level when several processes are issuing read and/or write requests at
once.

> > what sense would it make there, when using system calls that are
> > asynchronous already?
> > 
> > (Is that ordering above correct at all?)
> 
> Your ordering looks OK. Now consider where and how you end up waiting
> for issued IO, that should tell you where queue depth could build up or
> not.

So we have several levels of queue depth:

- queue depth at the system call level
- queue depth at the device level

=== sync I/O engines ===

queue depth at the system call level = 1

== reads ==

queue depth at the device level = 1

since read() returns only when the data is in RAM, it is synchronous I/O
on the lower level by nature.

The page cache will be used unless direct=1, so one might be measuring
RAM / read-ahead performance, especially when several read jobs are
running concurrently. Writes might not hit the device unless direct=1,
and thus one should use a larger-than-RAM file size.

== writes ==

queue depth at the device level = depending on the workload, up to what
the device supports

unless direct=1, because then write() is doing synchronous I/O on the
lower level and only returns when the data is at least in the drive
cache.

=== libaio ===

queue depth at the system call level = the iodepth option of fio

as long as direct=1, since libaio falls back to synchronous behaviour
with buffered writes.

queue depth at the device level = the same

fio submits as many I/Os as specified by iodepth and only then waits. As
the block layer completes I/Os, fio fills the queue back up.

Conclusion: when I want to measure higher I/O depths for reads, I need
libaio and direct=1. But then I am measuring something that has no
practical effect on processes that use synchronous system call I/O. So
for regular applications ioengine=sync + iodepth=64 gives more realistic
results - even when that then just means I/O depth 1 for reads - and for
databases that use direct I/O, ioengine=libaio makes sense and will cause
higher I/O depths on the device side if the device supports it.

Anything without direct=1 (or the slower sync=1) is potentially measuring
RAM performance. direct=1 bypasses the page cache. sync=1 basically
disables caching on the device / controller side as well.
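To make that conclusion concrete for myself, these are the two kinds of
job definitions I would compare - block size, file size and workload are
arbitrary examples, and the file size should be larger than RAM for the
buffered case:

  ; "regular application" style: sync engine, buffered I/O
  ; iodepth has no real effect here, reads run at an effective depth of 1
  [buffered-sync-read]
  ioengine=sync
  rw=read
  bs=4k
  size=8g
  iodepth=64

  ; "database" style: libaio with direct=1
  ; up to iodepth requests in flight at the device, if it supports that
  [direct-libaio-randread]
  stonewall
  ioengine=libaio
  direct=1
  rw=randread
  bs=4k
  size=8g
  iodepth=64

(stonewall just makes the second job wait until the first one has
finished, so the two runs do not interfere with each other.)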
Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7