Re: [PATCH] mm: add resched points to remap_pmd_range/ioremap_pmd_range

On Thu, 2015-07-23 at 14:54 -0700, Spencer Baugh wrote:
> From: Joern Engel <joern@xxxxxxxxx>
> 
> Mapping large memory spaces can be slow and prevent high-priority
> realtime threads from preempting lower-priority threads for a long time.

Yes, and one of the goals of large page ioremap support is to address this
kind of problem.

> In my case it was a 256GB mapping causing at least 950ms scheduler
> delay.  Problem detection is ratelimited and depends on interrupts
> happening at the right time, so actual delay is likely worse.

ioremap supports 1GB and 2MB mappings now.  If you create 1GB mappings, you
only need to initialize 256 pud entries, which should not take a long time.
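
To put rough numbers on that, here is a small userspace sketch (not kernel
code) that counts the page-table entries needed to cover a range at each
x86-64 mapping size.  The helper names entries_needed() and
print_entry_counts() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86-64 mapping sizes: pte = 4KB, pmd = 2MB, pud = 1GB. */
#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Hypothetical helper: entries needed to cover `size` bytes, rounding up. */
static uint64_t entries_needed(uint64_t size, uint64_t page_size)
{
    return (size + page_size - 1) / page_size;
}

/* Hypothetical helper: print the entry count at each mapping size. */
static void print_entry_counts(uint64_t size)
{
    printf("1GB (pud): %llu\n",
           (unsigned long long)entries_needed(size, SZ_1G));
    printf("2MB (pmd): %llu\n",
           (unsigned long long)entries_needed(size, SZ_2M));
    printf("4KB (pte): %llu\n",
           (unsigned long long)entries_needed(size, SZ_4K));
}
```

Calling print_entry_counts(256ULL << 30) gives 256 pud entries, 131072 pmd
entries, or 67108864 pte entries.  The 4KB case also has to allocate the pte
tables themselves (512 entries each on x86-64), which matches the
pte_alloc_one_kernel() frames in the trace.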

Is the 256GB range aligned to 1GB (or 2MB)?  From the log below, it appears
that you ended up with 4KB mappings, which is the problem.
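
As a rough illustration of why alignment matters, the sketch below models the
choice of mapping size in userspace.  It assumes a large entry is usable only
when the virtual address, the physical address, and the remaining length all
fit that size.  This is a simplified model, not the kernel's actual logic
(which has further conditions), and largest_mapping() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/*
 * Hypothetical helper: return the largest mapping size usable at this
 * point, or 0 if the addresses are not even 4KB aligned.
 */
static uint64_t largest_mapping(uint64_t virt, uint64_t phys, uint64_t len)
{
    static const uint64_t sizes[] = { SZ_1G, SZ_2M, SZ_4K };

    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        uint64_t sz = sizes[i];

        /* Both addresses must be aligned and a full entry must fit. */
        if (len >= sz && ((virt | phys) & (sz - 1)) == 0)
            return sz;
    }
    return 0;
}
```

Under this model a 256GB range whose physical base is off by 4KB from a 1GB
boundary falls all the way back to 4KB mappings, while a base that is only
2MB aligned still gets 2MB mappings.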

> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/irq.c:182 do_IRQ+0x126/0x140()
> Thread not rescheduled for 36 jiffies
> CPU: 14 PID: 6684 Comm: foo Tainted: G           O 3.10.59+
>  0000000000000009 ffff883f7fbc3ee0 ffffffff8163a12c ffff883f7fbc3f18
>  ffffffff8103f131 ffff887f48275ac0 0000000000000012 000000000000007c
>  0000000000000000 ffff887f5bc11fd8 ffff883f7fbc3f78 ffffffff8103f19c
> Call Trace:
>  <IRQ>  [<ffffffff8163a12c>] dump_stack+0x19/0x1b
>  [<ffffffff8103f131>] warn_slowpath_common+0x61/0x80
>  [<ffffffff8103f19c>] warn_slowpath_fmt+0x4c/0x50
>  [<ffffffff810bd917>] ? rcu_irq_exit+0x77/0xc0
>  [<ffffffff8164a556>] do_IRQ+0x126/0x140
>  [<ffffffff816407ef>] common_interrupt+0x6f/0x6f
>  <EOI>  [<ffffffff810fde68>] ? set_pageblock_migratetype+0x28/0x30
>  [<ffffffff8126da37>] ? clear_page_c_e+0x7/0x10
>  [<ffffffff811004b3>] ? get_page_from_freelist+0x5b3/0x880
>  [<ffffffff81100863>] __alloc_pages_nodemask+0xe3/0x810
>  [<ffffffff8126f48b>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff81138206>] alloc_pages_current+0x86/0x120
>  [<ffffffff810fc02e>] __get_free_pages+0xe/0x50
>  [<ffffffff81034e85>] pte_alloc_one_kernel+0x15/0x20
>  [<ffffffff8111b6cd>] __pte_alloc_kernel+0x1d/0xf0

This shows that you created 4KB (pte) mappings.

>  [<ffffffff8126531c>] ioremap_page_range+0x2cc/0x320
>  [<ffffffff81031619>] __ioremap_caller+0x1e9/0x2b0
>  [<ffffffff810316f7>] ioremap_nocache+0x17/0x20
>  [<ffffffff81275b45>] pci_iomap+0x55/0xb0
>  [<ffffffffa007f29a>] vfio_pci_mmap+0x1ea/0x210 [vfio_pci]
>  [<ffffffffa0025173>] vfio_device_fops_mmap+0x23/0x30 [vfio]
>  [<ffffffff81124ed8>] mmap_region+0x3d8/0x5e0
>  [<ffffffff811253e5>] do_mmap_pgoff+0x305/0x3c0
>  [<ffffffff8126f3f3>] ? call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff81111677>] vm_mmap_pgoff+0x67/0xa0
>  [<ffffffff811237e2>] SyS_mmap_pgoff+0x272/0x2e0
>  [<ffffffff810067e2>] SyS_mmap+0x22/0x30
>  [<ffffffff81648c59>] system_call_fastpath+0x16/0x1b
> ---[ end trace 6b0a8d2341444bdd ]---
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/irq.c:182 do_IRQ+0x126/0x140()
> Thread not rescheduled for 95 jiffies
> CPU: 14 PID: 6684 Comm: foo Tainted: G        W  O 3.10.59+
>  0000000000000009 ffff883f7fbc3ee0 ffffffff8163a12c ffff883f7fbc3f18
>  ffffffff8103f131 ffff887f48275ac0 000000000000002f 000000000000007c
>  0000000000000000 00007fadd1e00000 ffff883f7fbc3f78 ffffffff8103f19c
> Call Trace:
>  <IRQ>  [<ffffffff8163a12c>] dump_stack+0x19/0x1b
>  [<ffffffff8103f131>] warn_slowpath_common+0x61/0x80
>  [<ffffffff8103f19c>] warn_slowpath_fmt+0x4c/0x50
>  [<ffffffff810bd917>] ? rcu_irq_exit+0x77/0xc0
>  [<ffffffff8164a556>] do_IRQ+0x126/0x140
>  [<ffffffff816407ef>] common_interrupt+0x6f/0x6f
>  <EOI>  [<ffffffff81640483>] ? _raw_spin_lock+0x13/0x30
>  [<ffffffff8111b621>] __pte_alloc+0x31/0xc0
>  [<ffffffff8111feac>] remap_pfn_range+0x45c/0x470

remap_pfn_range() does not support large page mappings yet.  So, yes,
this can still take a long time at this point.  We can extend large page
support to this interface if necessary.

>  [<ffffffffa007f1f8>] vfio_pci_mmap+0x148/0x210 [vfio_pci]
>  [<ffffffffa0025173>] vfio_device_fops_mmap+0x23/0x30 [vfio]
>  [<ffffffff81124ed8>] mmap_region+0x3d8/0x5e0
>  [<ffffffff811253e5>] do_mmap_pgoff+0x305/0x3c0
>  [<ffffffff8126f3f3>] ? call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff81111677>] vm_mmap_pgoff+0x67/0xa0
>  [<ffffffff811237e2>] SyS_mmap_pgoff+0x272/0x2e0
>  [<ffffffff810067e2>] SyS_mmap+0x22/0x30
>  [<ffffffff81648c59>] system_call_fastpath+0x16/0x1b
> ---[ end trace 6b0a8d2341444bde ]---
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/irq.c:182 do_IRQ+0x126/0x140()
> Thread not rescheduled for 45 jiffies
> CPU: 18 PID: 21726 Comm: foo Tainted: G           O 3.10.59+
>  0000000000000009 ffff88203f203ee0 ffffffff8163a13c ffff88203f203f18
>  ffffffff8103f131 ffff881ec5f1ad60 0000000000000016 000000000000006e
>  0000000000000000 ffffc939a6dd8000 ffff88203f203f78 ffffffff8103f19c
> Call Trace:
>  <IRQ>  [<ffffffff8163a13c>] dump_stack+0x19/0x1b
>  [<ffffffff8103f131>] warn_slowpath_common+0x61/0x80
>  [<ffffffff8103f19c>] warn_slowpath_fmt+0x4c/0x50
>  [<ffffffff810bd917>] ? rcu_irq_exit+0x77/0xc0
>  [<ffffffff8164a556>] do_IRQ+0x126/0x140
>  [<ffffffff816407ef>] common_interrupt+0x6f/0x6f
>  <EOI>  [<ffffffff81640861>] ? retint_restore_args+0x13/0x13
>  [<ffffffff810346c7>] ? free_memtype+0x87/0x150
>  [<ffffffff8112bb46>] ? vunmap_page_range+0x1e6/0x2a0
>  [<ffffffff8112c5e1>] remove_vm_area+0x51/0x70
>  [<ffffffff810318a7>] iounmap+0x67/0xa0

iounmap() should be fast if you created 1GB mappings.

Thanks,
-Toshi

>  [<ffffffff812757e5>] pci_iounmap+0x35/0x40
>  [<ffffffffa00973da>] vfio_pci_release+0x9a/0x150 [vfio_pci]
>  [<ffffffffa0065cbc>] vfio_device_fops_release+0x1c/0x40 [vfio]
>  [<ffffffff8114d82b>] __fput+0xdb/0x220
>  [<ffffffff8114d97e>] ____fput+0xe/0x10
>  [<ffffffff810614ac>] task_work_run+0xbc/0xe0
>  [<ffffffff81043d0e>] do_exit+0x3ce/0xe50
>  [<ffffffff8104557f>] do_group_exit+0x3f/0xa0
>  [<ffffffff81054769>] get_signal_to_deliver+0x1a9/0x5b0
>  [<ffffffff810023f8>] do_signal+0x48/0x5e0
>  [<ffffffff81056778>] ? k_getrusage+0x368/0x3d0
>  [<ffffffff810736e2>] ? default_wake_function+0x12/0x20
>  [<ffffffff816471c0>] ? kprobe_flush_task+0xc0/0x150
>  [<ffffffff81070684>] ? finish_task_switch+0xc4/0xe0
>  [<ffffffff810029f5>] do_notify_resume+0x65/0x80
>  [<ffffffff8164098e>] retint_signal+0x4d/0x9f
> ---[ end trace 3506c05e4a0af3e5 ]---
