I have a scripted process that uses fio, and after a few tests I start seeing a lot of errors:

<sr630-5> Starting 16 processes
<sr630-6> Starting 16 processes
client <sr630-6>: timeout on SEND_ETA
client <sr630-5>: timeout on SEND_ETA
client <sr630-6>: timeout on SEND_ETA
client <sr630-5>: timeout on SEND_ETA
client <sr630-6>: timeout on SEND_ETA
client <sr630-5>: timeout on SEND_ETA
fio: client: unable to find matching tag (1e278e0)
fio: client: unable to find matching tag (1e274b0)
client <sr630-6>: timeout on SEND_ETA
client <sr630-5>: timeout on SEND_ETA
client <sr630-6>: timeout on SEND_ETA
client <sr630-5>: timeout on SEND_ETA
fio: client: unable to find matching tag (1e278e0)
fio: client: unable to find matching tag (1e274b0)
client <sr630-6>: timeout on SEND_ETA
client <sr630-5>: timeout on SEND_ETA
client <sr630-6>: timeout on SEND_ETA
fio: client sr630-6, timeout on cmd SEND_ETA
fio: client sr630-6 timed out
client <sr630-5>: timeout on SEND_ETA
fio: client sr630-5, timeout on cmd SEND_ETA
fio: client sr630-5 timed out

The jobs are intended to drive the client systems and storage as hard as possible, so perhaps I am pushing past some kind of limit. The issue doesn't occur with 5-minute job runs, but it does with 1-hour runs, which makes me think it is tied to the job duration in some way.

fio 3.10
kernel: SUSE 4.4.171-94.76-default
network is 100G; nodes have Xeon Silver 4110 CPUs with Spectre/Meltdown mitigations disabled

The jobs are intense: QD=32+, numjobs=16. I see many more failures with small I/Os, especially random ones. This is true on both spinning and SSD-based storage.

Is there anything that can be tweaked in the jobfile definition to lengthen these timeouts? Other thoughts?

David Byte
Sr. Technology Strategist
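
P.S. For reference, the jobs look roughly like the sketch below; treat it as an illustration rather than the exact file. The ioengine, block size, and target device are placeholders (the real values vary per test), but the iodepth/numjobs/runtime shape matches what I described above.

; illustrative jobfile sketch, not the exact one in use
[global]
; ioengine is an assumption; the real engine varies per test
ioengine=libaio
direct=1
time_based
; the 1-hour runs are the ones that fail, 5-minute runs are fine
runtime=3600
group_reporting

[small-randread]
; small random I/Os show the most failures
rw=randread
bs=4k
iodepth=32
numjobs=16
; placeholder target, the real devices vary
filename=/dev/nvme0n1

It is launched in client/server mode against both nodes, along the lines of:

fio --client=sr630-5 jobs.fio --client=sr630-6 jobs.fio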