On Thu 21-02-19 19:11:14, Aneesh Kumar K.V wrote: > On 2/21/19 5:42 PM, Jan Kara wrote: > > Hi Aneesh, > > > > On Thu 21-02-19 12:52:39, Aneesh Kumar K.V wrote: > > > We found this while testing dax with XFS, but i guess this is true for > > > other file systems too. The stack trace looks as > > > > > > [c00000000007610c] set_pte_at+0x3c/0x190 > > > LR [c000000000378628] insert_pfn+0x208/0x280 > > > Call Trace: > > > [c0000002125df980] [8000000000000104] 0x8000000000000104 (unreliable) > > > [c0000002125df9c0] [c000000000378488] insert_pfn+0x68/0x280 > > > [c0000002125dfa30] [c0000000004a5494] dax_iomap_pte_fault.isra.7+0x734/0xa40 > > > [c0000002125dfb50] [c000000000627250] __xfs_filemap_fault+0x280/0x2d0 > > > [c0000002125dfbb0] [c000000000373abc] do_wp_page+0x48c/0xa40 > > > [c0000002125dfc00] [c000000000379170] __handle_mm_fault+0x8d0/0x1fd0 > > > [c0000002125dfd00] [c00000000037a9b0] handle_mm_fault+0x140/0x250 > > > [c0000002125dfd40] [c000000000074bb0] __do_page_fault+0x300/0xd60 > > > [c0000002125dfe20] [c00000000000acf4] handle_page_fault+0x18 > > > > > > > > > Now that is WARN_ON in set_pte_at which is > > > > > > VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); > > > > > > Multiple architecture optimize set_pte_at based on the assumption that > > > we will never use set_pte_at to update a valid pte entry. This helps in > > > avoid flushing tlb etc. We should be using ptep_set_access_flags for > > > this. > > > > Hum, I didn't know about this assumption and neither did lot of other > > people reviewing DAX patches. Is this documented somewhere? Any answer here? > > > I guess iomap code doesn't handle this correctly? Or am I missing > > > some other ways we can end up flushing tlb? > > > > So for RW->RO transition we use ptep_clear_flush() in dax_entry_mkclean() > > so that one is certainly safe. Similarly for unmapping. The RO->RW > > transition does not seem to have any TLB flush so there TLB could still > > carry stale information but it's the same as with normal page faults on > > invalid PTEs or with protection faults for normal pages (see e.g. > > finish_mkwrite_fault()). > > I am not sure i understood that. RO -> RW transition can have stale TLB > entries with RO mapping in them. So architecture do flush TLB during the > fault, some may not. We do have a NestMMU issue with that transition which > requires us to do mark the pte invalid and flush TLB for that transition. > (see commit bd5050e38aec3055ff4257ade987d808ac93b582 ) OK, I see. > For invalid PTEs we should have a TLB entry at all. > > finish_mkwrite_fault do use wp_page_reuse() which does the right thing. Yes, sorry. Somehow I've misread the code. > > The only thing that's remaining is a situation > > when we replace a PTE with zero page with a PTE pointing to a real storage > > (block allocation on protection fault). However in this case we do > > unmap_mapping_pages() in dax_insert_entry() so the PTE actually gets > > cleared before we install a new correct block mapping. So this case is safe > > as well. Am I missing something? > > > > Do pfn_mkwrite callback need to insert the pfn details for a RO->RW fault > type? Can't we skip that pfn insert and let finish_mkwrite_fault handle that > pte update? Yes, pfn_mkwrite() must fully update the PTE as the PTE update must happen under a lock that is private to DAX code. Using ptep_set_access_flags() in iomap code isn't going to be simple either. I have to think whether / how that is possible. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR