On Mon 06-11-17 15:19:46, Vlastimil Babka wrote:
> On 11/06/2017 02:40 PM, Michal Hocko wrote:
> > On Mon 06-11-17 13:12:22, Michal Hocko wrote:
> >> On Mon 06-11-17 13:00:25, Peter Zijlstra wrote:
> >>> On Mon, Nov 06, 2017 at 11:43:54AM +0100, Michal Hocko wrote:
> >>>>> Yes the comment is very much accurate.
> >>>>
> >>>> Which suggests that print_vma_addr might be problematic, right?
> >>>> Shouldn't we do trylock on mmap_sem instead?
> >>>
> >>> Yes that's complete rubbish. trylock will get spurious failures to print
> >>> when the lock is contended.
> >>
> >> Yes, but I guess that it is acceptable to not print the state under
> >> that condition.
> >
> > So what do you think about this? I think this is more robust than
> > playing tricks with the explicit preempt count checks and less tedious
> > than checking to make it conditional on the context. This is on top of
> > Linus tree and if accepted it should replace the patch discussed here.
> > ---
> > From 0de6d57cbc54ee2686d1f1e4ffcc4ed490ded8aa Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@xxxxxxxx>
> > Date: Mon, 6 Nov 2017 14:31:20 +0100
> > Subject: [PATCH] mm: do not rely on preempt_count in print_vma_addr
> >
> > The preempt count check on print_vma_addr has been added by e8bff74afbdb
> > ("x86: fix "BUG: sleeping function called from invalid context" in
> > print_vma_addr()") and it relied on the elevated preempt count from
> > preempt_conditional_sti because preempt_count check doesn't work on
> > non preemptive kernels by default. The code has evolved though and
> > d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag
> > handling") has replaced preempt_conditional_sti by an explicit
> > preempt_disable which is noop on !PREEMPT so the check in print_vma_addr
> > is broken.
> >
> > Fix the issue by using trylock on mmap_sem rather than checking the
> > preempt count. The allocation we are relying on has to be GFP_NOWAIT
> > as well.
> > There is a chance that we won't dump the vma state if the lock
> > is contended or the memory short but this is an acceptable outcome and much
> > less fragile than the not working preemption check or tricks around it.
>
> If we fail to allocate the page, we could still print the addresses,
> just miss the filename? But that's an improvement, not a fix.

Agreed. Or we could have some preallocated buffer if this is a more
widespread pattern.

> > Fixes: d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag handling")
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>

Thanks!

>
> > ---
> >  mm/memory.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index a728bed16c20..1e308ac8ca0a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4457,17 +4457,15 @@ void print_vma_addr(char *prefix, unsigned long ip)
> >  	struct vm_area_struct *vma;
> >  
> >  	/*
> > -	 * Do not print if we are in atomic
> > -	 * contexts (in exception stacks, etc.):
> > +	 * we might be running from an atomic context so we cannot sleep
> >  	 */
> > -	if (preempt_count())
> > +	if (!down_read_trylock(&mm->mmap_sem))
> >  		return;
> >  
> > -	down_read(&mm->mmap_sem);
> >  	vma = find_vma(mm, ip);
> >  	if (vma && vma->vm_file) {
> >  		struct file *f = vma->vm_file;
> > -		char *buf = (char *)__get_free_page(GFP_KERNEL);
> > +		char *buf = (char *)__get_free_page(GFP_NOWAIT);
> >  		if (buf) {
> >  			char *p;
> >  
> >

-- 
Michal Hocko
SUSE Labs