Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi John,

On Thu, Jun 30, 2022 at 11:52:18AM +0100, John Garry wrote:
> > > > 
> > > > >    [    4.319253] iommu: Adding device 0000:06:00.2 to group 5
> > > > >    [    4.325869] iommu: Adding device 0000:20:01.0 to group 15
> > > > >    [    4.332648] iommu: Adding device 0000:20:02.0 to group 16
> > > > >    [    4.338946] swapper/0 invoked oom-killer:
> > > > > gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null),
> > > > > order=0, oom_score_adj=0
> > > > >    [    4.350251] swapper/0 cpuset=/ mems_allowed=0
> > > > >    [    4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not
> > > > > tainted 4.19.57.mx64.282 #1
> > > > >    [    4.355612] Hardware name: Dell Inc. PowerEdge
> > > > > R7425/08V001, BIOS 1.9.3 06/25/2019
> > > > >    [    4.355612] Call Trace:
> > > > >    [    4.355612]  dump_stack+0x46/0x5b
> > > > >    [    4.355612]  dump_header+0x6b/0x289
> > > > >    [    4.355612]  out_of_memory+0x470/0x4c0
> > > > >    [    4.355612]  __alloc_pages_nodemask+0x970/0x1030
> > > > >    [    4.355612]  cache_grow_begin+0x7d/0x520
> > > > >    [    4.355612]  fallback_alloc+0x148/0x200
> > > > >    [    4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
> > > > >    [    4.355612]  init_iova_domain+0x112/0x170
> 
> Note for Feng Tang: This callchain does not exist anymore since we separated
> out the rcache init from the IOVA domain init. Indeed, not so much memory is
> wasted for unused rcaches now.
 
Thanks for the info, I also planned to remove the callstack as Robin suggested.  

> My point really is that it would be nicer to see a modern callchain - but
> don't read that as me saying that the change is this patch is bad.
> 
> > > > >    [    4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
> > > > >    [    4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
> > > > >    [    4.355612]  amd_iommu_add_device+0x13a/0x610
> > > > >    [    4.355612]  add_iommu_group+0x20/0x30
> > > > >    [    4.355612]  bus_for_each_dev+0x76/0xc0
> > > > >    [    4.355612]  bus_set_iommu+0xb6/0xf0
> > > > >    [    4.355612]  amd_iommu_init_api+0x112/0x132
> > > > >    [    4.355612]  state_next+0xfb1/0x1165
> > > > >    [    4.355612]  amd_iommu_init+0x1f/0x67
> > > > >    [    4.355612]  pci_iommu_init+0x16/0x3f
> > > > >    ...
> > > > >    [    4.670295] Unreclaimable slab info:
> > > > >    ...
> > > > >    [    4.857565] kmalloc-2048           59164KB      59164KB
> > > > > 
> > > > > Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
> > > > > 1024 bytes so that no memory will be wasted.
> > > > > 
> > > > > [1]. https://lkml.org/lkml/2019/8/12/266
> > > > > 
> > > > > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> > > > > ---
> > > > >   drivers/iommu/iova.c | 7 ++++++-
> > > > >   1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> > > > > index db77aa675145b..27634ddd9b904 100644
> > > > > --- a/drivers/iommu/iova.c
> > > > > +++ b/drivers/iommu/iova.c
> > > > > @@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
> > > > >    * dynamic size tuning described in the paper.
> > > > >    */
> > > > > -#define IOVA_MAG_SIZE 128
> > > > > +/*
> > > > > + * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
> > > > > + * assure size of 'iova_magzine' to be 1024 bytes, so that no memory
> > > > 
> > > > Typo: iova_magazine
> > > > 
> > > > > + * will be wasted.
> > > > > + */
> > > > > +#define IOVA_MAG_SIZE 127
> > > 
> > > I do wonder if we will see some strange new behaviour since
> > > IOVA_FQ_SIZE % IOVA_MAG_SIZE != 0 now...
> > 
> > I doubt it - even if a flush queue does happen to be entirely full of
> > equal-sized IOVAs, a CPU's loaded magazines also both being perfectly
> > empty when it comes to dump a full fq seem further unlikely, so in
> > practice I don't see this making any appreciable change to the
> > likelihood of spilling back to the depot or not. In fact the smaller the
> > magazines get, the less time would be spent flushing the depot back to
> > the rbtree, where your interesting workload falls off the cliff and> > never catches back up with the fq timer, so at some point it might even
> > improve (unless it's also already close to the point where smaller
> > caches would bottleneck allocation)... might be interesting to
> > experiment with a wider range of magazine sizes if you had the time and
> > inclination.
> > 
> 
> ok, what you are saying sounds reasonable. I just remember that when we
> analyzed the longterm aging issue that we concluded that the FQ size and its
> relation to the magazine size was a factor and this change makes me a little
> worried about new issues. Better the devil you know and all that...
> 
> Anyway, if I get some time I might do some testing to see if this change has
> any influence.
> 
> Another thought is if we need even store the size in the iova_magazine? mags
> in the depot are always full. As such, we only need worry about mags loaded
> in the cpu rcache and their sizes, so maybe we could have something like
> this:
> 
> struct iova_magazine {
> -       unsigned long size;
>        unsigned long pfns[IOVA_MAG_SIZE];
> };
> 
> @@ -631,6 +630,8 @@ struct iova_cpu_rcache {
>        spinlock_t lock;
>        struct iova_magazine *loaded;
>        struct iova_magazine *prev;
> +       int loaded_size;
> +       int prev_size;
> };
> 
> I haven't tried to implement it though..

I have very few knowledge of iova, so you can chose what's the better
solution. I just wanted to raise the problem and will be happy to see
it solved :)

Thanks,
Feng

> Thanks,
> John




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux