On Wed, 29 Apr 2020 16:52:46 -0700 Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > On 4/29/20 3:53 PM, Claudio Imbrenda wrote: > >> Actually, that's the problem. You've gone through all these > >> careful checks and made the page inaccessible. *After* that > >> process, how do you keep the page from being hit by an I/O device > >> before it's made accessible again? My patch just assumes that > >> *all* pages have gone through that process and passed those > >> checks. > > I don't understand what you are saying here. > > > > we start with all pages accessible, we mark pages as inaccessible > > when they are imported in the secure guest (we use the PG_arch_1 > > bit in struct page). we then try to catch all I/O on inaccessible > > pages and make them accessible so that I/O devices are happy. > > The catching mechanism is incomplete, that's all I'm saying. well, sendto in the end does a copy_from_user or a get_user_pages_fast, both are covered (once we fix the make_accessible to work on FOLL_GET too). > Without looking too hard, and not even having the hardware, I've found > two paths where the "catching" was incomplete: > > 1. sendfile(), which you've patched > 2. sendto(), which you haven't patched > > > either your quick and dirty patch was too dirty (e.g. not accounting > > for possible races between make_accessible/make_inaccessible), or > > some of the functions in the trace you provided should do > > pin_user_page instead of get_user_page (or both) > > I looked in the traces for any races. For sendto(), at least, the > make_accessible() never happened before the process exited. That's > entirely consistent with the theory that it entirely missed being > caught. I can't find any evidence that there were races. > > Go ahead and try it. You have the patch! I mean, I found a bug in > about 10 minutes in one tiny little VM. I tried your patch, but I could not reproduce the problem. I have a Debian 10 x86_64 with the latest kernel from master and your patch on top. is there anything I'm missing? which virtual devices are you using? any particular .config options? I could easily get the mm_make_accessible tracepoints, but I never manage to trigger the mm_accessible_error ones. are you using transparent hugepages by any chance? the infrastructure for inaccessible pages is meant only for small pages, since on s390x only small pages can ever be used for secure guests and therefore become inaccessible. > And, yes, you need to get rid of the FOLL_PIN check unless you want to > go change a big swath of the remaining get_user_pages*() sites.