Re: Linux OS killed fio process because fio invoked oom_killer


 



I took a look, and it's a regression introduced by this change in 2012:

commit c04e4661e4da3b6079f415897e4507cf8e610c54
Author: Daniel Ehrenberg <dehrenberg@xxxxxxxxxx>
Date:   Fri Mar 16 18:54:15 2012 +0100

    time_based: Avoid restarting main I/O loop

That patch tries to keep us in the main loop and reset as we go, which is a good thing for short jobs since it keeps the overhead low. But it breaks verification for short jobs! I've checked in this fix:

http://git.kernel.dk/cgit/fio/commit/?id=f1a32461c844c7ba9314f66dd28b5a01ca7cb69a

Please try it and see if that fixes things for you. You ran into the OOM because you had async verify enabled, yet we never got to run it. So we just kept piling on buffers to verify, but never actually verified them...
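
The failure mode described above is unbounded growth of work queued for verification. A toy Python model (hypothetical, not fio's implementation; `peak_pending` and its `backlog` parameter are invented for illustration) shows the difference: if the verify phase is never reached, the peak queue length equals the number of writes, and if the queue is drained periodically, it stays bounded.

```python
from typing import Optional


def peak_pending(total_writes: int, backlog: Optional[int]) -> int:
    """Peak length of a toy pending-verify queue.

    Hypothetical model, not fio's code: each completed write queues one entry
    to verify. backlog=None models the regression, where the time-based loop
    keeps resetting and the verify phase is never reached. An integer backlog
    drains the queue every `backlog` writes, loosely like a verify_backlog cap.
    """
    if backlog is not None and backlog < 1:
        raise ValueError("backlog must be >= 1")
    pending = peak = 0
    for _ in range(total_writes):
        pending += 1                      # one entry queued per completed write
        peak = max(peak, pending)
        if backlog is not None and pending >= backlog:
            pending = 0                   # a verify pass drains the queue
    return peak


print(peak_pending(1_000_000, None))   # 1000000
print(peak_pending(1_000_000, 4096))   # 4096
```

The model only captures the shape of the growth; real memory use depends on what fio keeps per queued I/O, which this thread does not quantify.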


On 03/24/2016 08:06 AM, Jens Axboe wrote:
I'll take a look at it. The device is only 128MB? Did you mean GB?

What version of fio are you running?

On 03/24/2016 06:50 AM, flash yan wrote:
Another thing: older versions of fio don't have this issue.

2016-03-24 7:32 GMT+08:00 Jeff Furlong <jeff.furlong@xxxxxxxx>:
I believe only the CRC is buffered in DRAM.  So whether your I/O size
(bs=X) is large or small, the buffered CRC is the same size per I/O.
But as you increase bs, IOPS decrease, and as you decrease bs, IOPS
increase.  With a fixed runtime, the total number of buffered CRCs in
DRAM grows with IOPS.  You can calculate how many I/Os times your CRC
size will fit into DRAM, then set your verify_backlog value below
that.

Regards,
Jeff

-----Original Message-----
From: flash yan [mailto:flashyan83@xxxxxxxxx]
Sent: Wednesday, March 23, 2016 3:52 PM
To: Jeff Furlong <jeff.furlong@xxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>; fio@xxxxxxxxxxxxxxx
Subject: Re: Linux OS killed fio process because fio invoked oom_killer

I will try the verify_backlog option.
I have a question: why does it happen with a 4096-byte I/O size
(bs=4096) but not with other sizes? Other sizes should have the same
problem.

2016-03-24 3:29 GMT+08:00 Jeff Furlong <jeff.furlong@xxxxxxxx>:
I believe you are seeing expected behavior.  When verify is enabled,
the written data is buffered in DRAM until the job is finished, then
compared against data read back from the device.  If the device
capacity is large, or if it is small but you set a runtime, you will
buffer many I/Os.  The oom_killer then sees the process hogging most
of the DRAM and kills it.  When verify is disabled, no buffering takes
place, so the oom_killer is not triggered.

Try the verify_backlog option.  If you have a 4KB bs and set
verify_backlog=1048576, fio will write 4GB of data, read it back and
compare it with the DRAM buffer, then start again.  Just be sure the
memory that verify_backlog requires stays below your free DRAM.
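
The numbers in that example work out as follows. A minimal Python sketch; `bytes_per_verify_pass` and `backlog_fits` are hypothetical helpers, and the fit check conservatively assumes, as the message above does, that one backlog window's worth of written data is held in DRAM.

```python
# Worked numbers for the verify_backlog example. Conservatively assumes that
# the data written in one backlog window is held in DRAM.

GiB = 2**30


def bytes_per_verify_pass(bs: int, verify_backlog: int) -> int:
    """Bytes written before fio stops to read back and verify."""
    return bs * verify_backlog


def backlog_fits(bs: int, verify_backlog: int, free_dram_bytes: int) -> bool:
    """True if one backlog window's worth of data fits in free DRAM."""
    return bytes_per_verify_pass(bs, verify_backlog) < free_dram_bytes


print(bytes_per_verify_pass(4096, 1048576) / GiB)  # 4.0
print(backlog_fits(4096, 1048576, 11 * GiB))       # True
```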

Regards,
Jeff


-----Original Message-----
From: fio-owner@xxxxxxxxxxxxxxx [mailto:fio-owner@xxxxxxxxxxxxxxx] On Behalf Of flash yan
Sent: Wednesday, March 23, 2016 8:10 AM
To: Jens Axboe <axboe@xxxxxxxxx>
Cc: fio@xxxxxxxxxxxxxxx
Subject: Re: Linux OS killed fio process because fio invoked oom_killer

I ran fio without verify and the issue didn't happen, so it must be
a verify issue.
The fio job file is below:

[global]
thread=1
invalidate=1
rw=randwrite
time_based=1
runtime=3000
rwmixread=50
ioengine=libaio
direct=1
bs=4096
iodepth=16
verify_dump=1
verify_async=10
do_verify=1
verify=meta
verify_pattern="meta"
[job0]
filename=/dev/sda
[job1]
filename=/dev/sdb
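
The job file above enables verification on a time-based run but sets no verify_backlog. One way to add that option to a job file programmatically is a minimal sketch using Python's standard configparser, assuming the job file is plain INI that configparser can round-trip (no duplicate keys; comments are not preserved). `JOB` is a shortened excerpt of the file above, `add_verify_backlog` is a hypothetical helper, and the backlog value is the one from the verify_backlog example in this thread, not a recommendation.

```python
import configparser
import io

# Shortened excerpt of the job file above (hypothetical test input).
JOB = """\
[global]
rw=randwrite
time_based=1
runtime=3000
bs=4096
verify=meta
[job0]
filename=/dev/sda
"""


def add_verify_backlog(job_text: str, backlog: int) -> str:
    """Return job_text with verify_backlog set in the [global] section."""
    # allow_no_value tolerates bare flags such as a plain "time_based" line.
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.optionxform = str          # keep option names exactly as written
    parser.read_string(job_text)
    parser["global"]["verify_backlog"] = str(backlog)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # emit key=value, no spaces
    return out.getvalue()


print(add_verify_backlog(JOB, 1048576))
```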

I think you can use a RAM disk (Ubuntu has RAM disks at /dev/ram*) to
reproduce this issue.
It happens with high-speed devices.

2016-03-23 8:42 GMT+08:00 Jens Axboe <axboe@xxxxxxxxx>:
What job did you run? When reporting a potential issue, always
include that. Hard to help or advise otherwise.

On Mar 22, 2016, at 5:12 PM, flash yan <flashyan83@xxxxxxxxx> wrote:

This issue happened after about 20 minutes. The iSCSI device is very
small, only 128MB.
As you suspected, I have verify= options enabled.
I will try a bigger iSCSI device and no verify.

Thanks

Liang Yan

2016-03-23 0:30 GMT+08:00 Jens Axboe <axboe@xxxxxxxxx>:
On 03/22/2016 08:06 AM, flash yan wrote:

Hi all,

I have run fio-2.7 to test an iSCSI device, and an unusual issue
happened.
If I set the I/O size to 4096, the queue depth to 16, rw to randwrite
and the runtime to 3000, fio invokes the oom_killer and the Linux OS
kills the fio process.
The machine has about 11 GB of memory, and I have also tried a machine
with 23GB; the issue happened there too.
I think fio has a problem with a 4KB I/O size and uses too much
memory.


When did this happen - shortly after the job is started, or long
after? How big is the iscsi device? Did you have verify= options
enabled?

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe fio" in the
body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
http://vger.kernel.org/majordomo-info.html
Western Digital Corporation (and its subsidiaries) E-mail
Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain
confidential or legally privileged information of WDC and/or its
affiliates, and are intended solely for the use of the individual or
entity to which they are addressed. If you are not the intended
recipient, any disclosure, copying, distribution or any action taken
or omitted to be taken in reliance on it, is prohibited. If you have
received this e-mail in error, please notify the sender immediately
and delete the e-mail in its entirety from your system.




--
Jens Axboe



