Re: About SECTION_SIZE_BITS for Sparsemem

On Mon, Jul 12, 2010 at 07:35:17PM +0900, Minchan Kim wrote:
> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim@xxxxxxxxxxx> wrote:
> >> > Russell,
> >> >
> >> > Hi,
> >> >
> >> > Kukjin Kim wrote:
> >> >> Russell wrote:
> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's fine.
> >> >> > That doesn't mean the section size is 16MB.
> >> >> >
> >> >> > As I've already said, the section size has _nothing_ whatsoever to do
> >> >> > with the size of memory, or the granularity of the size of memory.  By
> >> >> > way of illustration, it is perfectly legal to have a section size of
> >> >> > 256MB but only have 1MB in a section.  So sections do not have to be
> >> >> > completely filled.
> >> >> >

This is accurate, although there is an expectation that a section is at
least as large as MAX_ORDER_NR_PAGES.
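
To make the arithmetic concrete, here is a small userspace sketch. The values
PAGE_SHIFT=12, MAX_ORDER=11 and SECTION_SIZE_BITS=28 are assumptions chosen for
illustration, not taken from any particular board. With 256MB sections, the
0x20000000-0x25000000 range above falls entirely in section 2, which is only
partly filled; with SECTION_SIZE_BITS=24 the same range would span sections 32
through 36. The static_assert expresses the "section >= MAX_ORDER_NR_PAGES"
expectation; the kernel carries a similar compile-time check for sparsemem.

```c
#include <assert.h>

/* Assumed example values, for illustration only. */
#define PAGE_SHIFT        12   /* 4KB pages */
#define MAX_ORDER         11   /* largest buddy block is 2^(MAX_ORDER-1) pages */
#define SECTION_SIZE_BITS 28   /* 256MB sections */

#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION  (1UL << PFN_SECTION_SHIFT)
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* A section must be at least as large as the largest buddy block. */
static_assert(MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS,
              "MAX_ORDER block is larger than a sparsemem section");

/* Section number containing a physical address. */
static unsigned long section_of(unsigned long long phys)
{
    return (unsigned long)((phys >> PAGE_SHIFT) >> PFN_SECTION_SHIFT);
}
```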

> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem if a
> >> >> section has a hole when initializing sparse memory.
> >> >>
> >> >> I identified that a section doesn't need to be a contiguous area of physical
> >> >> memory when reading your comment with the fact that the mem_map of a section
> >> >> can be smaller than the size of a section.
> >> >>

This should only happen in one case, on ARM, and it breaks assumptions.
It is typically assumed that if a page is valid within a block of
MAX_ORDER_NR_PAGES, then the entire range is active. If
CONFIG_HOLES_IN_ZONE is set, then there may be holes within a
MAX_ORDER_NR_PAGES range, and there is a performance hit as a result.
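
A toy model of that assumption, as a sketch only: the 16-PFN "memory" with a
hole at PFNs 5-6, MAX_ORDER_NR_PAGES=8 and the toy_* names are all hypothetical.
The walker validates only the first PFN of each MAX_ORDER block and then uses
pfn_valid_within() per page, which, as in the kernel, is a real check only when
CONFIG_HOLES_IN_ZONE is defined and is otherwise constant true.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PFNS            16
#define MAX_ORDER_NR_PAGES 8   /* toy value */

/* Hypothetical backing: PFNs 5 and 6 are a hole. */
static const bool backed[NR_PFNS] = {
    1, 1, 1, 1, 1, 0, 0, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
};

static bool toy_pfn_valid(unsigned long pfn)
{
    return pfn < NR_PFNS && backed[pfn];
}

#ifdef CONFIG_HOLES_IN_ZONE
#define pfn_valid_within(pfn) toy_pfn_valid(pfn)   /* pay for a per-page check */
#else
#define pfn_valid_within(pfn) (true)               /* assume the block is full */
#endif

/* Count pages a walker treats as present in one MAX_ORDER block. */
static unsigned long count_present(unsigned long block_start)
{
    unsigned long n = 0;

    assert(block_start % MAX_ORDER_NR_PAGES == 0);
    if (!toy_pfn_valid(block_start))   /* one check for the whole block */
        return 0;
    for (unsigned long pfn = block_start;
         pfn < block_start + MAX_ORDER_NR_PAGES; pfn++)
        if (pfn_valid_within(pfn))
            n++;
    return n;
}
```

Built without CONFIG_HOLES_IN_ZONE, count_present(0) counts all 8 pages,
including the two in the hole; built with -DCONFIG_HOLES_IN_ZONE it counts 6,
at the cost of an extra check per page.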

There is also an assumption that a section is either fully populated or empty.
Look at the implementation of pfn_valid for sparsemem: it checks whether the
section has SECTION_HAS_MEM_MAP set, and it is the same check for any page
within that section. If there are holes in the section, the pfn_valid()
check still returns true for PFNs inside those holes.
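
A simplified userspace model of that section-granular check, as a sketch: the
toy section size (16 pages), the flag value and the toy_* names are assumptions,
and the real sparsemem code also decodes the memmap pointer from the same word.
Section 1 is marked present even though, in this model, PFNs 20-23 are a hole,
so pfn_valid() cannot tell them apart from real pages.

```c
#include <assert.h>
#include <stdbool.h>

#define PFN_SECTION_SHIFT   4        /* toy: 16 pages per section */
#define NR_SECTIONS         4
#define SECTION_HAS_MEM_MAP 0x2UL    /* illustrative flag value */

struct toy_mem_section {
    unsigned long section_mem_map;   /* flags only, in this model */
};

/* Section 1 has a memmap, but PFNs 20..23 are assumed to be a hole. */
static struct toy_mem_section sections[NR_SECTIONS] = {
    [1] = { SECTION_HAS_MEM_MAP },
};

/* One flag test per section: every PFN in the section gets the same answer. */
static bool toy_pfn_valid(unsigned long pfn)
{
    unsigned long nr = pfn >> PFN_SECTION_SHIFT;

    if (nr >= NR_SECTIONS)
        return false;
    return sections[nr].section_mem_map & SECTION_HAS_MEM_MAP;
}
```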

Check out the comment for memmap_valid_within(), which tries to get
around this problem on ARM, the only architecture punching holes in its
mem_map. As it is only relied on for the information in one proc file,
the performance hit is not a problem, but it should not be considered a
typical thing.
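
The idea behind memmap_valid_within() can be modelled roughly as follows. This
is a sketch, not the kernel code: struct toy_page stands in for struct page,
recording the PFN and zone it describes, and freed memmap is simulated by
overwriting two entries with junk. The helper rejects a page whose recorded PFN
does not match the one being looked up, or whose zone is not the expected one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
    unsigned long pfn;   /* what page_to_pfn() would yield */
    int zone_id;         /* what page_zone() would yield */
};

#define NR_PFNS 8
static struct toy_page memmap[NR_PFNS];

static void toy_init(void)
{
    for (unsigned long pfn = 0; pfn < NR_PFNS; pfn++)
        memmap[pfn] = (struct toy_page){ .pfn = pfn, .zone_id = 0 };
    /* Memmap for PFNs 3..4 was "freed" and the memory reused: junk. */
    memmap[3] = (struct toy_page){ .pfn = 0xdeadUL, .zone_id = -1 };
    memmap[4] = (struct toy_page){ .pfn = 0xbeefUL, .zone_id = -1 };
}

/* Trust the entry only if it describes this PFN and this zone. */
static bool toy_memmap_valid_within(unsigned long pfn,
                                    const struct toy_page *page, int zone_id)
{
    if (page->pfn != pfn)
        return false;
    if (page->zone_id != zone_id)
        return false;
    return true;
}
```

Every caller walking a PFN range has to remember to make this extra call, which
is why spreading it further is unattractive.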

> >> >> I found, however, that the kernel panics when modifying the min_free_kbytes
> >> >> file in the proc filesystem if a section has a hole.
> >> >>
> >> >> While processing the change of min_free_kbytes in the kernel, page
> >> >> descriptors in a hole of an online section are accessed.
> >> >
> >> > As I said, the following error happens.
> >> > Any opinions or comments would be helpful to me.
> >> >
> >>
> >> Could you test below patch?
> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
> >>
> > Yes, I did it, and no kernel panic happens :-)
> >
> > Same test...
> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2736
> > [root@Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
> > [root@Samsung ~]#
> > [root@Samsung ~]# cat /proc/sys/vm/min_free_kbytes
> > 2730
> >
> >
> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> >>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> >>                 if (!pfn_valid(pfn))
> >>                         continue;
> >> +
> >>                 page = pfn_to_page(pfn);
> >>
> >> +                /* Watch for unexpected holes punched in the memmap */
> >> +                if (!memmap_valid_within(pfn, page, zone))
> >> +                        continue;
> >> +
> >>                 /* Watch out for overlapping nodes */
> >>                 if (page_to_nid(page) != zone_to_nid(zone))
> >>                         continue;
> >>
> >>
> >>
> >
> > ...Could you please explain this issue?
> 

The issue is that ARM can create holes within a section of memory which
breaks the memory model by allowing pfn_valid() to return true for PFNs
backed by no memmap. This causes awkwardness.

> The setup_zone_migrate_reserve doesn't check memmap hole.

It doesn't. The worst-case scenario is where the hole is punched at the
beginning of a section: pfn_valid returns true, but the memmap entry for
that PFN is junk and the kernel crashes shortly afterwards. This would
require a zone to start in a hole, which should never happen because it
makes no sense. If this is the scenario being encountered, ensure that
zones do not start in holes.
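
One way to sanity-check that at boot, sketched under assumptions: struct
toy_bank and zone_starts_in_hole() are hypothetical names, and the RAM banks
are assumed to be available as a list of PFN ranges. The check simply asks
whether the zone's first PFN lies inside any bank.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RAM bank description, in PFNs: [start_pfn, end_pfn). */
struct toy_bank {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* True if zone_start_pfn is not backed by any bank, i.e. starts in a hole. */
static bool zone_starts_in_hole(const struct toy_bank *banks, size_t nr,
                                unsigned long zone_start_pfn)
{
    assert(banks != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++)
        if (zone_start_pfn >= banks[i].start_pfn &&
            zone_start_pfn < banks[i].end_pfn)
            return false;
    return true;
}
```

For example, with hypothetical banks at PFNs 0x20000-0x22000 and 0x24000-0x25000,
a zone starting at 0x20000 passes, while one starting at 0x22000 would be flagged.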

> I think compaction would have the same problem, too. I don't know
> whether there is a problem elsewhere.
> Anyway, I think calling memmap_valid_within whenever walking a whole
> pfn range isn't a good solution.

No, it's not. The rules for pfn_valid and pfn_valid_within are already poorly
understood, and we shouldn't add further rules around memmap_valid_within just
for ARM if possible. If these problems are being encountered with sparsemem on
ARM, I'd prefer to simply see holes not punched in the memmap within a section!
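
As an illustration of that policy, here is a hypothetical helper (not an
existing kernel function) that narrows a candidate memmap hole to whole
sections, so nothing is freed within a partially populated section. The
section size of 2^16 pages (256MB with 4KB pages) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_SECTION (1UL << 16)   /* assumed: 256MB sections, 4KB pages */

/*
 * Narrow the hole [*start, *end) to the whole sections it covers.
 * Returns false, leaving the range untouched, if no whole section is covered.
 */
static bool hole_to_whole_sections(unsigned long *start, unsigned long *end)
{
    assert(*start <= *end);

    unsigned long s = (*start + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1);
    unsigned long e = *end & ~(PAGES_PER_SECTION - 1);

    if (s >= e)
        return false;
    *start = s;
    *end = e;
    return true;
}
```

A small hole such as PFNs 0x22000-0x24000 covers no whole section, so its
memmap would be left in place.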

> We already have pfn_valid. Could we check this in there?

Ordinarily, yes, you would use pfn_valid or pfn_valid_within. It's only on
ARM, where the assumptions of the memory model are violated, that
memmap_valid_within is used. It's unsatisfactory even there, but as it was
only used for a proc file, it wasn't important. I'd really hate to see its
use increased.

At the time it was discussed, a "proper" fix would have consumed as much
memory as was saved by freeing portions of the memmap, so it was rejected.
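
For comparison, the per-section range idea raised below might look roughly like
this. It is a hypothetical sketch, not a proposed patch: struct toy_section and
its valid_start_pfn/valid_end_pfn fields are invented for illustration. It adds
state to every section and can describe only one contiguous range per section,
so a section with several holes would need more than this.

```c
#include <assert.h>
#include <stdbool.h>

#define PFN_SECTION_SHIFT 4   /* toy: 16 pages per section */
#define NR_SECTIONS       4

/* Hypothetical section descriptor with a recorded valid PFN range. */
struct toy_section {
    bool has_mem_map;
    unsigned long valid_start_pfn;   /* [start, end) with a real memmap */
    unsigned long valid_end_pfn;
};

/* Section 1 keeps memmap only for PFNs 16..19. */
static struct toy_section sections[NR_SECTIONS] = {
    [1] = { true, 16, 20 },
};

static bool toy_pfn_valid(unsigned long pfn)
{
    unsigned long nr = pfn >> PFN_SECTION_SHIFT;

    if (nr >= NR_SECTIONS || !sections[nr].has_mem_map)
        return false;
    /* Extra range check on every call. */
    return pfn >= sections[nr].valid_start_pfn &&
           pfn < sections[nr].valid_end_pfn;
}
```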

> For example, mem_section could hold a valid pfn range, and then
> pfn_valid could test it for a valid section.
> 
> What do you think about it?
> 
> P.S)
> I know Mel is very busy testing changes to avoid writeback in direct reclaim.

I'm also heavily distracted by internal bugs, so I'm afraid I didn't read
this thread. Hopefully the above information is useful to you.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab