On 10/09/2015 at 07:20 AM, Andreas Hartmann wrote:
> On 10/08/2015 at 09:52 PM, Andreas Hartmann wrote:
>> On 10/08/2015 at 08:21 PM, Andreas Hartmann wrote:
>>> On 08.10.2015 at 18:39, Joerg Roedel wrote:
>>>> On Wed, Oct 07, 2015 at 06:52:58PM +0200, Andreas Hartmann wrote:
>>>>> To reproduce the error:
>>>>> First I mounted /daten2, afterwards /raid/mt, which produces the errors.
>>>>> The ssd mounts have been already active (during boot by fstab).
>>>>
>>>> Okay, I spent the day on that problem, and managed to reproduce it here
>>>> on one of my AMD IOMMU boxes. It wasn't an easy journey, as I can only
>>>> reproduce it if I set up the crypto partition and everything above that
>>>> (like mounting the lvm volumes) _after_ the system has finished booting.
>>>> If everything is set up during system boot it works fine and I don't see
>>>> any IO_PAGE_FAULTS.
>>>
>>> Thank you very much for spending so much of your time to reproduce the
>>> problem!
>>>
>>>> I also tried kernel v4.3-rc4 first, to have it tested with a
>>>> self-compiled kernel. It didn't show up there, so I built a 4.1.0, where
>>>> it showed up again. Something seems to have fixed the issue in the
>>>> latest kernels.
>>>>
>>>> So I looked a little bit around at the commits that were merged into the
>>>> respective parts involved here, and found this one:
>>>>
>>>> 586b286 dm crypt: constrain crypt device's max_segment_size to
>>>> PAGE_SIZE
>>>>
>>>> The problem fixed with this commit looks quite similar to what you have
>>>> seen (except that there was no IOMMU involved). So I cherry-picked that
>>>> commit on 4.1.0 and tested that. The problem was gone.
>>>
>>> That's true - I already knew this patch and tested it some weeks ago -
>>> unfortunately it doesn't fix the problem here.
>>>
>>> To be really sure, I just retested it now again.
>>> I couldn't see any IO_PAGE_FAULTS errors today (unfortunately I can't
>>> remember anymore if I didn't see them too a few weeks ago) - but the
>>> ata errors remain. Therefore, this patch isn't a solution for the
>>> problem I encounter here.
>>>
>>>> So it looks like it was a dm-crypt issue, the patch went into v4.3-rc3,
>>>> either this kernel or rc4 should fix the problem for you too. Can you
>>>> please verify this is fixed for you too with v4.3-rc4?
>>>
>>> As I already wrote, I couldn't even see the problem with v4.3-rc2 any
>>> more (as far as I was able to test because of the other problem). I have
>>> to do some more tests now with this kernel to be really sure.
>>
>> I now tested w/ v4.3-rc4. I couldn't see any IO_PAGE_FAULTS but the ata
>> errors remain. The ata errors can be easily triggered by copying a large
>> file (> 4 GB) from one partition on the raid to another partition on the
>> raid.
>
> Hmmm, I retested this morning w/ v4.3-rc4 and 4.1.10 (with the above
> mentioned patch applied) - and now, I didn't get any more ata errors.
>
> I'm confused now. The only difference between yesterday evening and this
> morning was that the machine was completely without power overnight (via
> socket outlet switch). Could this really be the reason? Let's wait and
> see if this is a persistent state ... .

No - it is not a persistent state. The ata errors are back again (in
4.1.10 w/ the above mentioned patch applied). It just isn't that easy
any more to trigger them. After a short intermission w/ a power off / on
cycle, the error came up again on the first test copy.

This means: there must be something more broken.

If I revert the original culprit of all of the problems (block: remove
artifical max_hw_sectors cap), it is possible to increase max_sectors_kb
to 1024 - any higher value leads to ata errors or IO_PAGE_FAULTS sooner
or later.

v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
the PCI passthrough for VMs (I need them).
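
For anyone who wants to check or change max_sectors_kb while testing,
here is a minimal Python sketch of how I would read and set it through
sysfs. The helper names (read_limit, set_max_sectors_kb) and the
sysfs_root parameter are my own and only there for illustration; the
only real interfaces used are the queue/max_sectors_kb and
queue/max_hw_sectors_kb files of a block device. Writing needs root, and
whether a given value is actually safe on a given kernel is exactly the
open question above.

```python
from pathlib import Path


def queue_dir(device, sysfs_root="/sys"):
    # e.g. /sys/block/sda/queue ("sda" is only an example device name)
    return Path(sysfs_root) / "block" / device / "queue"


def read_limit(device, name, sysfs_root="/sys"):
    """Return one integer queue limit, e.g. name='max_sectors_kb'."""
    return int((queue_dir(device, sysfs_root) / name).read_text().strip())


def set_max_sectors_kb(device, value_kb, sysfs_root="/sys"):
    """Set max_sectors_kb, clamped to the read-only max_hw_sectors_kb.

    Returns the value that was written. Needs root on a real system.
    """
    hw_limit = read_limit(device, "max_hw_sectors_kb", sysfs_root)
    target = min(value_kb, hw_limit)  # never ask for more than the hardware limit
    (queue_dir(device, sysfs_root) / "max_sectors_kb").write_text(f"{target}\n")
    return target
```

Usage would be something like set_max_sectors_kb("sda", 1024), repeated
for every disk that is part of the raid.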

Regards,
Andreas