On 04/12/2010 11:28 AM, Nick Piggin wrote:
We use the "try" tactic extensively. So long as there's a
reasonable chance of success, and a reasonable fallback on failure,
it's fine.
Do you think we won't have reasonable success rates? Why?
After the memory is fragmented? It's more or less irreversible. So
success rates (to fill a specific number of huge pages) will be fine
up to a point. Then it will be a continual failure.
So we get just a part of the win, not all of it.
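
In case a concrete shape helps, here's a minimal userspace sketch of that
"try, then fall back" pattern, using MAP_HUGETLB as a stand-in. The
in-kernel THP allocation path is of course different; this is only
illustrative, and all the names here are mine:

/* Try a huge-page-backed mapping first; if that fails (no huge pages
 * reserved, flag unsupported, ...), quietly fall back to 4K pages. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000     /* not exposed by older libc headers */
#endif

static void *alloc_buffer(size_t len)
{
    /* Reasonable chance of success: ask for huge pages first. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        return p;

    /* Reasonable fallback on failure: plain 4K pages. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

int main(void)
{
    void *buf = alloc_buffer(16 * 1024 * 1024);  /* multiple of 2MB */
    printf("buffer at %p\n", buf);
    return buf ? 0 : 1;
}

If no huge pages are available the first mmap() simply fails and we take
the 4K path, which is exactly the "reasonable fallback" being argued about.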
Sure, some workloads simply won't trigger fragmentation problems.
Others will.
Some workloads benefit from readahead; some don't. In fact, readahead
has a greater potential to reduce performance than huge pages do.
Same as with many other optimizations.
Why? If you can isolate all the pointers into the dentry, you can
allocate the new dentry, make the old one point into the new one, hash
it, move the pointers, and drop the old dentry.
Difficult, yes, but insane?
Yes.
Well, I'll accept what you say since I'm nowhere near as familiar with
the code. But maybe someone insane will come along and do it.
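
For concreteness, a toy userspace model of the steps listed above
(isolate the pointers, allocate the replacement, point the old one at the
new one, hash it, move the pointers, drop the old one). Nothing here is
real dcache code; struct toy_dentry and every helper are invented:

#include <stdio.h>
#include <stdlib.h>

struct toy_dentry {
    const char        *name;
    struct toy_dentry *moved_to;     /* forwarding pointer during the move */
    struct toy_dentry *hash_next;    /* toy hash chain */
};

static struct toy_dentry *hash_table[64];

static void toy_hash(struct toy_dentry *d)
{
    unsigned h = (unsigned)d->name[0] % 64;
    d->hash_next = hash_table[h];
    hash_table[h] = d;
}

/* 'users' holds the previously isolated pointer slots into 'old'. */
static struct toy_dentry *migrate(struct toy_dentry *old,
                                  struct toy_dentry **users[], int nr_users)
{
    struct toy_dentry *new = malloc(sizeof(*new));

    *new = *old;                 /* allocate the new dentry and copy it    */
    old->moved_to = new;         /* make the old one point into the new;
                                    in the real thing racing lookups would
                                    chase this, here it's just the step    */
    toy_hash(new);               /* hash it                                */
    for (int i = 0; i < nr_users; i++)
        *users[i] = new;         /* move the pointers                      */
    free(old);                   /* drop the old dentry                    */
    return new;
}

int main(void)
{
    struct toy_dentry *a = calloc(1, sizeof(*a)), *holder = a;
    struct toy_dentry **users[] = { &holder };

    a->name = "etc";
    a = migrate(a, users, 1);
    printf("holder follows the move: %d\n", holder == a);
    return 0;
}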
Caches have statistical performance. In the long run they average
out. In the short run they can behave badly. Same thing with large
pages, except the runs are longer and the wins are smaller.
You don't understand. Caches don't suddenly or slowly stop working.
For a particular workload pattern, they statistically work pretty much
the same all the time.
Yet your effective cache size can be reduced by unhappy aliasing of
physical pages in your working set. It's unlikely but it can happen.
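
To put a number on the aliasing point, a toy sketch with made-up cache
geometry (2 MiB, 16-way, 64-byte lines; not any particular CPU). In a
physically indexed cache the set is picked by physical address bits, so
an unlucky draw of physical pages in the working set crowds some sets:

#include <stdio.h>

#define LINE_SIZE  64
#define NUM_WAYS   16
#define CACHE_SIZE (2 * 1024 * 1024)
#define NUM_SETS   (CACHE_SIZE / LINE_SIZE / NUM_WAYS)   /* 2048 sets */

static unsigned long cache_set(unsigned long phys_addr)
{
    return (phys_addr / LINE_SIZE) % NUM_SETS;
}

int main(void)
{
    /* Addresses spaced NUM_SETS * LINE_SIZE (128 KiB) apart all collide. */
    unsigned long stride = (unsigned long)NUM_SETS * LINE_SIZE;

    for (unsigned long i = 0; i < 4; i++)
        printf("physical address %#9lx -> set %lu\n",
               i * stride, cache_set(i * stride));
    return 0;
}

All four addresses land in the same set; physical pages congruent modulo
that 128 KiB stride share cache sets, so too many same-colored pages in
the working set reduce the cache you effectively get.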
For a statistical mix of workloads, huge pages will also work just
fine. Perhaps not all of them, but most (those that don't fill _all_ of
memory with dentries).
Databases are the easiest case: they allocate memory up front and
don't give it up. We'll coalesce their memory immediately and
they'll run happily ever after.
Again, you're thinking about a benchmark setup. If you've got various
admin things, backups, and scripts running, plus probably web servers,
application servers, etc., then it's not all that simple.
These are all anonymous/pagecache loads, which we deal with well.
And yes, Linux works pretty well as a multi-workload platform. You
might be thinking too much about virtualization, where you put things
in sterile little boxes and take the performance hit.
People do it for a reason.
Virtualization will fragment on overcommit, but the load is all
anonymous memory, so it's easy to defragment. Very little dcache on
the host.
If virtualization is the main worry (which it seems it is, seeing as
your TLB misses cost something like 6 times more cachelines),
(just 2x)
then complexity should be pushed into the hypervisor, not the
core kernel.
The whole point behind kvm is to reuse the Linux core. If we have to
reimplement Linux memory management and scheduling, then it's a failure.
Well, I'm not against it, but that would be a much more intrusive
change than what this thread is about. Also, you'd need 4K dentries
etc, no?
No. You'd just be defragmenting 4K worth of dentries at a time.
Dentries (and anything that doesn't care about untranslated KVA)
are trivial. Zero change for users of the code.
I see.
This is going off-topic though, I don't want to hijack the thread
with talk of nonlinear kernel.
Too bad, it's interesting.
Mostly we need a way of identifying pointers into a data structure,
like rmap (after all that's what makes transparent hugepages work).
And that involves auditing and rewriting anything that allocates
and pins kernel memory. It's not only dentries.
Not everything, just the major users that can scale with the amount of
memory in the machine.
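
Roughly, the registration side of that (the "rmap for kernel objects"
part) could look like this toy, userspace-only sketch; every name and
structure here is invented for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BACKREFS 8          /* no bounds checking in this toy */

struct movable {
    char data[64];
    void **backref[MAX_BACKREFS];   /* slots that hold pointers to us */
    int nr_backrefs;
};

/* Users register the slot where they store their long-lived pointer. */
static void rmap_add(struct movable *obj, void **slot)
{
    obj->backref[obj->nr_backrefs++] = slot;
    *slot = obj;
}

/* Relocate the object and repoint every registered user. */
static struct movable *relocate(struct movable *old)
{
    struct movable *new = malloc(sizeof(*new));

    memcpy(new, old, sizeof(*new));
    for (int i = 0; i < new->nr_backrefs; i++)
        *new->backref[i] = new;
    free(old);
    return new;
}

int main(void)
{
    void *user_a, *user_b;          /* two places that hold pointers */
    struct movable *obj = calloc(1, sizeof(*obj));

    rmap_add(obj, &user_a);
    rmap_add(obj, &user_b);
    obj = relocate(obj);
    printf("both users follow the move: %d %d\n",
           user_a == obj, user_b == obj);
    return 0;
}

The point is only that users of a movable object register the slots
holding their pointers up front, so a defragmenter can relocate the
object and fix them up, much as rmap lets us unmap and move user pages.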
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.