Re: [00/41] Large Blocksize Support V7 (adds memmap support)

Goswin von Brederlow <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx> · Sun, 23 Sep 2007 08:49:08 +0200

mel@xxxxxxxxx (Mel Gorman) writes:

> On (17/09/07 00:38), Goswin von Brederlow didst pronounce:
>> mel@xxxxxxxxx (Mel Gorman) writes:
>> 
>> > On (15/09/07 02:31), Goswin von Brederlow didst pronounce:
>> >> Mel Gorman <mel@xxxxxxxxx> writes:
>> >> Looking at my
>> >> little test program evicting movable objects from a mixed group should
>> >> not be that expensive as it doesn't happen often.
>> >
>> > It happens regularly if the size of the block you need to keep clean is
>> > lower than min_free_kbytes. In the case of hugepages, that was always
>> > the case.
>> 
>> That assumes that the number of groups allocated for unmovable objects
>> will continiously grow and shrink.
>
> They do grow and shrink. The number of pagetables in use changes for
> example.

By numbers of groups worth? And full groups get free, unmixed and
filled by movable objects?

>> I'm assuming it will level off at
>> some size for long times (hours) under normal operations.
>
> It doesn't unless you assume the system remains in a steady state for it's
> lifetime. Things like updatedb tend to throw a spanner into the works.

Moved to cron weekly here. And even normally it is only once a day. So
what if it starts moving some pages while updatedb runs. If it isn't
too braindead it will reclaim some dentries updated has created and
left for good. It should just cause the dentry cache to be smaller at
no cost. I'm not calling that normal operations. That is a once a day
special. What I don't want is to spend 1% of cpu time copying
pages. That would be unacceptable. Copying 1000 pages per updatedb run
would be trivial on the other hand.

>> There should
>> be some buffering of a few groups to be held back in reserve when it
>> shrinks to prevent the scenario that the size is just at a group
>> boundary and always grows/shrinks by 1 group.
>> 
>
> And what size should this group be that all workloads function?

1 is enough to prevent jittering. If you don't hold a group back and
you are exactly at a group boundary then alternatingly allocating and
freeing one page would result in a group allocation and freeing every
time. With one group in reserve you only get an group allocation or
freeing when a groupd worth of change has happened.

This assumes that changing the type and/or state of a group is
expensive. Takes time or locks or some such. Otherwise just let it
jitter.

>> >> So if
>> >> you evict movable objects from mixed group when needed all the
>> >> pagetable pages would end up in the same mixed group slowly taking it
>> >> over completly. No fragmentation at all. See how essential that
>> >> feature is. :)
>> >> 
>> >
>> > To move pages, there must be enough blocks free. That is where
>> > min_free_kbytes had to come in. If you cared only about keeping 64KB
>> > chunks free, it makes sense but it didn't in the context of hugepages.
>> 
>> I'm more concerned with keeping the little unmovable things out of the
>> way. Those are the things that will fragment the memory and prevent
>> any huge pages to be available even with moving other stuff out of the
>> way.
>
> That's fair, just not cheap

That is the price you pay. To allocate 2MB of ram you have to have 2MB
of free ram or make them free. There is no way around that. Moving
pages means that you can actually get those 2MB even if the price
is high and that you have more choice deciding what to throw away or
swap out. I would rather have a 2MB malloc take some time than have it
fail because the kernel doesn't feel like it.

>> Can you tell me how? I would like to do the same.
>> 
>
> They were generated using trace_allocmap kernel module in
> http://www.csn.ul.ie/~mel/projects/vmregress/vmregress-0.81-rc2.tar.gz
> in combination with frag-display in the same package.  However, in the
> current version against current -mm's, it'll identify some movable pages
> wrong. Specifically, it will appear to be mixing movable pages with slab
> pages and it doesn't identify SLUB pages properly at all (SLUB came after
> the last revision of this tool). I need to bring an annotation patch up to
> date before it can generate the images correctly.

Thanks. I will test that out and see what I get on a few lustre
servers and clients. That is probably quite a different workload from
what you test.

MfG
        Goswin
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html