On Wed, Feb 3, 2016 at 1:35 PM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> On Wed, 2016-02-03 at 13:07 +0200, Oded Gabbay wrote:
>> > Another, perhaps trivial, question.
>> > When there is an address fault, who handles it? The SVM driver, or
>> > each device driver?
>> >
>> > In other words, is the model the same as the (AMD) IOMMU's, where
>> > the amd_iommu driver is bound to the IOMMU H/W, and that driver
>> > (amd_iommu_v2) is the only one which handles the PPR events?
>> >
>> > If that is the case, then with SVM, how will the device driver be
>> > made aware of faults, if the SVM driver won't notify it about them,
>> > because it has already severed the connection between PASID and
>> > process?
>
> In the ideal case, there's no need for the device driver to get
> involved at all. When a page isn't found in the page tables, the IOMMU
> code calls handle_mm_fault() and either populates the page and sends a
> 'success' response, or sends an 'invalid fault' response back.
>
> To account for broken hardware, we *have* added a callback into the
> device driver when these faults happen. Ideally it should never be
> used, of course.
>
> In the case where the process has gone away, the PASID is still
> assigned and we still hold mm_count on the MM, just not mm_users. This
> callback into the device driver still occurs if a fault happens during
> process exit, between the exit_mm() and exit_files() stages.
>
>> And another question, if I may: aren't you afraid of "false positive"
>> prints to dmesg? I mean, I'm pretty sure page fault / PASID fault
>> errors will be logged somewhere, probably to dmesg. Aren't you
>> concerned about users seeing those errors and thinking they may have
>> a bug, while actually the errors were only caused by process
>> termination?
>
> If that's the case, it's easy enough to silence them. We are already
> explicitly testing for the 'defunct mm' case in our fault handler, to
> prevent us from faulting more pages into an obsolescent MM after its
> mm_users reaches zero and its page tables are supposed to have been
> torn down. That's the 'if (!atomic_inc_not_zero(&svm->mm->mm_users))
> goto bad_req;' part.
>
>> Or, in that case, do you say that the application is broken, because
>> if it still had something running in the H/W, it should not have
>> closed itself?
>
> That's also true, but it's still nice to avoid confusion, even if only
> to disambiguate cause and effect: we don't want people to see PASID
> faults which were caused by the process crashing, and to think that
> they might be involved in *causing* that process to crash...

Yes, that's why in our model we aim to kill all running waves *before*
the amd_iommu_v2 driver unbinds the PASID.

> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@xxxxxxxxx                              Intel Corporation

It seems you have most of your bases covered, so I'll stop harassing
you now :) But in all seriousness, it's interesting to see the
different approaches taken to handling essentially the same type of
H/W (an IOMMU).

Thanks for your patience in answering my questions.

Oded
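P.S. For anyone reading this in the archives, here is a minimal sketch
of the fault-handling flow you describe, as I understand it. Only
handle_mm_fault(), find_extend_vma() and the atomic_inc_not_zero() test
on mm_users come from your description; the struct, the function name
and the return-code convention are mine for illustration (the real
handler presumably answers through the IOMMU's page-request queue, not
a return value):

#include <linux/mm.h>
#include <linux/sched.h>

/* Illustrative per-PASID state; the real driver tracks more than this. */
struct svm_bind {
	struct mm_struct *mm;	/* mm_count held for the PASID's lifetime */
};

/* Returns 0 for a 'success' response, -EFAULT for 'invalid fault'. */
static int svm_handle_page_request(struct svm_bind *svm,
				   unsigned long address, bool write)
{
	struct vm_area_struct *vma;
	int result = -EFAULT;
	int ret;

	/*
	 * We pin mm_count, not mm_users, so the MM may be defunct: once
	 * exit_mm() drops the last mm_users reference, the page tables
	 * are being torn down and we must not fault new pages into
	 * them. This is the atomic_inc_not_zero() check quoted above.
	 */
	if (!atomic_inc_not_zero(&svm->mm->mm_users))
		goto bad_req;

	down_read(&svm->mm->mmap_sem);
	vma = find_extend_vma(svm->mm, address);
	if (!vma || address < vma->vm_start)
		goto invalid;

	/* Populate the page exactly as a CPU fault on this mm would. */
	ret = handle_mm_fault(svm->mm, vma, address,
			      write ? FAULT_FLAG_WRITE : 0);
	if (!(ret & VM_FAULT_ERROR))
		result = 0;
invalid:
	up_read(&svm->mm->mmap_sem);
	mmput(svm->mm);
bad_req:
	return result;
}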
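P.P.S. On the "broken hardware" callback: if I'm reading the interface
correctly, a device driver opts in at bind time, roughly as sketched
below. Treat the exact svm_dev_ops / intel_svm_bind_mm() signatures as
my assumption about your current code; my_fault_cb and my_bind are
hypothetical names.

#include <linux/device.h>
#include <linux/intel-svm.h>

/*
 * Ideally never called: only invoked when a fault could not be handled
 * transparently, e.g. on broken hardware, or when a fault races with
 * process exit between exit_mm() and exit_files().
 */
static void my_fault_cb(struct device *dev, int pasid, u64 address,
			u32 private, int rwxp, int response)
{
	dev_warn(dev, "unhandled fault: PASID %d addr 0x%llx resp %d\n",
		 pasid, address, response);
	/* Device-specific recovery, e.g. stopping the offending queue. */
}

static struct svm_dev_ops my_svm_ops = {
	.fault_cb = my_fault_cb,
};

static int my_bind(struct device *dev, int *pasid)
{
	/* Bind current->mm to a PASID and register the fault callback. */
	return intel_svm_bind_mm(dev, pasid, 0, &my_svm_ops);
}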