On 08/02/2015 at 07:57 PM, Mikulas Patocka wrote:
>
>
> On Sun, 2 Aug 2015, Andreas Hartmann wrote:
>
>> On 08/01/2015 at 04:20 PM Andreas Hartmann wrote:
>>> On 07/28/2015 at 09:29 PM, Mike Snitzer wrote:
>>> [...]
>>>> Mikulas was saying to biect what is causing ATA to fail.
>>>
>>> Some good news and some bad news. The good news first:
>>>
>>> Your patchset
>>>
>>> f3396c58fd8442850e759843457d78b6ec3a9589,
>>> cf2f1abfbd0dba701f7f16ef619e4d2485de3366,
>>> 7145c241a1bf2841952c3e297c4080b357b3e52d,
>>> 94f5e0243c48aa01441c987743dc468e2d6eaca2,
>>> dc2676210c425ee8e5cb1bec5bc84d004ddf4179,
>>> 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe,
>>> b3c5fd3052492f1b8d060799d4f18be5a5438add
>>>
>>> seems to work fine w/ 3.18.19 !!
>>>
>>> Why did I test it with 3.18.x now? Because I suddenly got two ata errors
>>> (ata1 and ata2) with clean 3.19.8 (w/o the AMD-Vi IO_PAGE_FAULTs) during
>>> normal operation. This means: 3.19 must already be broken, too.
>>>
>>> Therefore, I applied your patchset to 3.18.x and it seems to work like a
>>> charme - I don't get any AMD-Vi IO_PAGE_FAULTs on boot and no ata errors
>>> (until now).
>>>
>>>
>>> Next I did: I tried to bisect between 3.18 and 3.19 with your patchset
>>> applied, because w/ this patchset applied, the problem can be seen
>>> easily and directly on boot. Unfortunately, this does work only a few
>>> git bisect rounds until I got stuck because of interferences with your
>>> extra patches applied:
>>
>> [Resolved the problems written at the last post.]
>>
>> Bisecting ended here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>>
>> block: remove artifical max_hw_sectors cap
>>
>>
>> Removing this patch on 3.19 and 4.1 make things working again. Didn't
>> test 4.0, but I think it's the same. No more AMD-Vi IO_PAGE_FAULTS with
>> that patch reverted.

After a long period of testing, I can now say that max_sectors_kb can be
set to at most 1024; higher values produce AMD-Vi IO_PAGE_FAULTS and ata
faults.

The patch "sd: Fix maximum I/O size for BLOCK_PC requests" [1], which is
part of 4.1.7, also produces ata errors and AMD-Vi IO_PAGE_FAULTS already
during boot, regardless of whether "block: remove artifical max_hw_sectors
cap" [2] has been applied or not.

Next, I tested the "dm crypt: constrain crypt device's max_segment_size to
PAGE_SIZE" patch [3] applied to an unmodified 4.1.7 kernel, without setting
max_sectors_kb to 1024. Interestingly, booting was fine, but afterwards I
saw lots of ata errors as soon as there was load on the md RAID 1 (e.g.
during a kernel compile), which is built on *rotational* disks:

[ 367.264873] ata2.00: exception Emask 0x0 SAct 0x7fbfffff SErr 0x0 action 0x6 frozen
[ 367.264883] ata2.00: failed command: WRITE FPDMA QUEUED
[ 367.264893] ata2.00: cmd 61/40:00:b0:7b:d4/05:00:06:00:00/40 tag 0 ncq 688128 out
[ 367.264893] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 367.264899] ata2.00: status: { DRDY }
...
[ 367.265332] ata2.00: failed command: WRITE FPDMA QUEUED
[ 367.265339] ata2.00: cmd 61/40:f0:30:71:d4/05:00:06:00:00/40 tag 30 ncq 688128 out
[ 367.265339] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 367.265343] ata2.00: status: { DRDY }
[ 367.265350] ata2: hard resetting link
[ 367.775330] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 367.776970] ata2.00: configured for UDMA/133
[ 367.776997] ata2.00: device reported invalid CHS sector 0
...
[ 367.777761] ata2: EH complete

IOW: using an unpatched kernel >= 3.19 means a high risk of breaking
filesystems under some as yet unknown conditions [4].

>>
>>
>> Please check why this patch triggers AMD-Vi IO_PAGE_FAULTS.
>
> I would submit this bug to maintainers of AMD-Vi. They understand the
> hardware, so they should tell why do large I/O requests result in
> IO_PAGE_FAULTs.
>
> It is probably bug either in AMD-Vi driver or in hardware.

So far, I haven't heard anything from the maintainers of AMD-Vi.


Regards,
Andreas Hartmann


[1] http://thread.gmane.org/gmane.linux.kernel.commits.head/538464
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
[3] http://news.gmane.org/find-root.php?group=gmane.linux.kernel&article=2036495
[4] http://thread.gmane.org/gmane.linux.kernel.pci/43851/focus=44011
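For readers trying to relate the failed commands in the log above to the max_sectors_kb discussion, here is a minimal Python sketch that decodes the libata "cmd" register dump. It assumes the EH printout order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and 512-byte logical sectors. For READ/WRITE FPDMA QUEUED, the transfer length sits in the FEATURE registers and the queue tag in bits 7:3 of the sector count register. The helper name decode_fpdma is hypothetical, not a kernel or libata API.

```python
import re
from dataclasses import dataclass

# Assumed layout of libata's EH "cmd" dump:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/"
    r"(?P<feat>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?P<hfeat>[0-9a-f]{2}):(?P<hnsect>[0-9a-f]{2}):"
    r"(?P<hlbal>[0-9a-f]{2}):(?P<hlbam>[0-9a-f]{2}):(?P<hlbah>[0-9a-f]{2})/"
    r"(?P<dev>[0-9a-f]{2})"
)

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


@dataclass
class FpdmaCmd:
    command: int
    tag: int
    lba: int
    sectors: int

    @property
    def nbytes(self) -> int:
        return self.sectors * SECTOR_SIZE


def decode_fpdma(line: str) -> FpdmaCmd:
    """Hypothetical helper: decode one READ/WRITE FPDMA QUEUED register dump."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata 'cmd' register dump found")
    r = {k: int(v, 16) for k, v in m.groupdict().items()}
    if r["cmd"] not in (0x60, 0x61):  # 0x60 READ / 0x61 WRITE FPDMA QUEUED
        raise ValueError(f"not an FPDMA QUEUED command: {r['cmd']:#04x}")
    # NCQ: sector count = HOB FEATURE (high byte) : FEATURE (low byte)
    sectors = (r["hfeat"] << 8) | r["feat"]
    if sectors == 0:
        sectors = 65536  # 48-bit convention: a count of 0 means 65536
    # NCQ: queue tag lives in bits 7:3 of the sector count register
    tag = r["nsect"] >> 3
    # 48-bit LBA assembled from the HOB (high) and current (low) LBA bytes
    lba = (
        (r["hlbah"] << 40) | (r["hlbam"] << 32) | (r["hlbal"] << 24)
        | (r["lbah"] << 16) | (r["lbam"] << 8) | r["lbal"]
    )
    return FpdmaCmd(r["cmd"], tag, lba, sectors)


if __name__ == "__main__":
    for line in (
        "ata2.00: cmd 61/40:00:b0:7b:d4/05:00:06:00:00/40 tag 0 ncq 688128 out",
        "ata2.00: cmd 61/40:f0:30:71:d4/05:00:06:00:00/40 tag 30 ncq 688128 out",
    ):
        c = decode_fpdma(line)
        print(f"tag {c.tag}: LBA {c.lba:#x}, {c.sectors} sectors = {c.nbytes // 1024} KiB")
    # prints:
    # tag 0: LBA 0x6d47bb0, 1344 sectors = 672 KiB
    # tag 30: LBA 0x6d47130, 1344 sectors = 672 KiB
```

Under these assumptions, both timed-out writes carry 0x0540 = 1344 sectors, matching the 688128 bytes that libata itself reports after "ncq", and the decoded tags agree with the "tag 0" and "tag 30" fields.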