Adding to David's comments: my success percentage was roughly 30%. Most of the time, 2 out of 3 runs would hang.

Thanks for your involvement here, Sitsofe.

\Rob

On 07/03/2018, 17:33, "David Knierim" <knierim@xxxxxxxxxxx> wrote:

Sitsofe,

Thanks for your interest in resolving this issue. I no longer have a system to dig into it at the moment, but I will attempt to get back to it soon. That being said, I like your questions and I am happy to answer them:

> can you check whether fio 3.5 also reproduces the problem?

When I get a system to run on, I will attempt to reproduce the issue with fio 3.5.

Yes, the example fio command lines you showed are indeed what the script is supposed to be doing. If it helps, I also have a bash script which generates the same fio commands as the python script and also reproduces the issue.

I have not attempted to reproduce the issue with files or with fewer raw disks. I can see your desire for a simplified reproducer, and when I get back to this issue I will see what I can come up with. I am sure something simpler will reproduce the issue, but I have not spent the time to find it. I will explore the number of disks/files and numjobs, and also determine whether the working set makes any difference.

In my experience, the python script never completed when run on Windows. Rob had better luck than I did: the script ran to completion for him several times, but it also showed the failure multiple times, and I don't remember what his pass percentage was. When I run the same python or bash script on Linux (just updating the path to the raw disks and the ioengine), it runs 100% reliably.

Thanks again,
David

On 3/7/18, 1:01 AM, "Sitsofe Wheeler" <sitsofe@xxxxxxxxx> wrote:

Hi Rob, David!

On 6 March 2018 at 20:01, Rebecca Cran <rebecca@xxxxxxxxxxxx> wrote:
> On 3/6/2018 9:35 AM, Sitsofe Wheeler wrote:
>>
>>
>> I tried out the python script but it seemed to be complaining about a
>> whitespace issue.
>> After fixing that up it's unclear exactly what fio
>> command lines it runs. I think for others to dig in we'd need
>> something less fiddly, like the raw fio command lines that generate the
>> hang. Is there any chance the mystery user could join the mailing list
>> so they can answer questions directly? Ideally we'd need a job that
>> works on files within a filesystem, and ideally nice small files so it
>> could go into an AppVeyor job...
>>
>
> I've CC'd Rob and David.
> You can find the FIO command lines used by running the script in verbose
> mode: it lists them all at the start, then steps through them one by one.
> Unfortunately nobody's had any luck in narrowing down a single run that
> causes the hang.
> I forgot to say that the problem has been seen on FIO 3.1 - I never got
> around to trying it on version 3.5, which is now on https://urldefense.proofpoint.com/v2/url?u=https-3A__bluestop.org_fio&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=Bg0zthdIszvkG4nFhDtYFPUGQbQFMAIndMvXqABqJjo&m=8kyD8W-AtALPg5fmmA3JgHijeRdF0kX7Jc-0r4QyrjE&s=Z_ELgdkCUioTlDh1xSF-SCddfCe2WEoOFnFqNcN-TxQ&e=
> .

Rebecca has been explaining that you've generated a reproducible hang on Windows fio. Just for reference, can you check whether fio 3.5 also reproduces the problem?

It looks like your python script generates fio lines which are based on config lines similar to the following:

config: {'bs': 4096, 'filename': '\\\\.\\PhysicalDrive1', 'numjobs': 1, 'runtime': '20', 'iodepth': 1, 'dir': 'randread', 'size': '20G'}
config: {'bs': 16384, 'filename': '\\\\.\\PhysicalDrive1:\\\\.\\PhysicalDrive2:\\\\.\\PhysicalDrive3:\\\\.\\PhysicalDrive4:\\\\.\\PhysicalDrive5:\\\\.\\PhysicalDrive6:\\\\.\\PhysicalDrive7:\\\\.\\PhysicalDrive8', 'numjobs': 240, 'runtime': '20', 'iodepth': 1, 'dir': 'randread', 'size': '20G'}
config: {'bs': 1048576, 'filename': '\\\\.\\PhysicalDrive1', 'numjobs': 30, 'runtime': '20', 'iodepth': 1, 'dir': 'randread', 'size': '20G'}

Is this correct?
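To make the step from those config dicts to raw fio command lines concrete, here is a minimal Python sketch. It is not taken from the original script: the helper name config_to_fio_args is hypothetical, and it assumes that the 'dir' key maps to fio's --rw option and that windowsaio is the ioengine used on Windows (libaio would be the usual Linux substitute).

```python
def config_to_fio_args(config, ioengine="windowsaio", name="job"):
    """Turn one config dict into an fio argument list.

    Hypothetical mapping (an assumption, not the original script's code):
    'dir' -> --rw, every other key maps to the fio option of the same name.
    Multiple devices stay ':'-separated in --filename, as fio expects.
    """
    return [
        "fio",
        f"--name={name}",                     # fio needs a job name on the command line
        f"--ioengine={ioengine}",             # windowsaio on Windows, e.g. libaio on Linux
        f"--filename={config['filename']}",
        f"--rw={config['dir']}",              # assumed meaning of the 'dir' key
        f"--bs={config['bs']}",
        f"--iodepth={config['iodepth']}",
        f"--numjobs={config['numjobs']}",
        f"--size={config['size']}",
        f"--runtime={config['runtime']}",
    ]


if __name__ == "__main__":
    cfg = {'bs': 4096, 'filename': '\\\\.\\PhysicalDrive1', 'numjobs': 1,
           'runtime': '20', 'iodepth': 1, 'dir': 'randread', 'size': '20G'}
    # Print a copy-pasteable command line for posting to the list.
    print(" ".join(config_to_fio_args(cfg)))
```

Keeping the result as a list lets it be passed straight to subprocess without shell quoting, which matters for the backslashes in \\.\PhysicalDriveN paths.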
I'm afraid I'm not in a position where I can easily debug this using the Python script on Windows (sadly I have no interactive access to Windows machines at the moment), but a few things might help to narrow down the problem:

How frequently is the script able to reproduce the problem?
Can you substitute files for disks and still reproduce the problem?
What is the minimum number of filenames involved when you've seen a hang? Does it always need more than 1?
What's the smallest size that you can use that still reproduces the problem?
What's the smallest numjobs value that reproduces the problem? Does it always need more than 1?

Ideally, if we can get down to the stage where we run, say, only two fio lines repeatedly in a bash script and make the problem happen, it will be easier for others to see the problem too...

--
Sitsofe | https://urldefense.proofpoint.com/v2/url?u=http-3A__sucs.org_-7Esits_&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=Bg0zthdIszvkG4nFhDtYFPUGQbQFMAIndMvXqABqJjo&m=8kyD8W-AtALPg5fmmA3JgHijeRdF0kX7Jc-0r4QyrjE&s=31fBjMR9Cd95PL7Ssm7hV2W8Kr5WmeHkp-ampBKgCfM&e=
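The "run a couple of fio lines repeatedly and see how often it hangs" approach could be scripted roughly as below. This is a hedged sketch, not David's bash script: count_hangs is a hypothetical helper, and it treats any run that outlives timeout_s as a hang, so the timeout should sit comfortably above the jobs' runtime (20 seconds in the configs quoted earlier).

```python
import subprocess


def count_hangs(commands, repeats, timeout_s):
    """Run every command `repeats` times and count the runs that time out.

    A run that has not exited within timeout_s seconds is counted as a hang
    (a heuristic assumption). subprocess.run kills the child on timeout, so
    one stuck run does not block the rest of the loop.
    Returns (hangs, total_runs).
    """
    hangs = 0
    total = 0
    for _ in range(repeats):
        for cmd in commands:
            total += 1
            try:
                subprocess.run(
                    cmd,
                    timeout=timeout_s,
                    check=False,                 # a non-zero exit is not a hang
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                )
            except subprocess.TimeoutExpired:
                hangs += 1
    return hangs, total
```

Here commands would be, for example, two fio argument lists; hangs / total then gives a hang rate comparable to the roughly 70% Rob reports, and shrinking numjobs, the number of filenames, or size one step at a time shows which change makes the rate drop.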