Thanks for the replies. I will comment on the other post later this
evening, but I'd like to clarify a bit what I'm looking for.

On Jun 24, 2008, at 2:19 PM, John Fine wrote:

> Brian Dessent wrote:
>> This is mostly how the existing 'small' memory model works, which is
>> the default. All code and data are in the lower 2GB of address space,
>> which allows the use of 32 bit relocs and 32 bit PC-relative branches
>> which saves a lot of overhead in the common cases.
>
> IIUC, that only applies to addresses that must be known at load
> time. Addresses not known until run time may go beyond 4GB even in
> a small memory model.
>
> For this discussion, there are three types of data objects:
> 1) Objects whose addresses are known at load time.
> 2) Objects allocated on the stack.
> 3) Objects allocated in the heap.
>
> If I understand the point of this thread, the main focus would be
> on (3). Small memory model only addresses (1).

I'm interested in 1, 2 and 3. I have a discrete event simulator where
the kernel has to maintain pointers to objects created by the user
(hence I have no say on whether it will be 1, 2 or 3).
The size of the pointers is important (for instance, in making the
priority queue used as a time wheel more cache-friendly).
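
To make the size argument concrete, here is a minimal sketch of two
possible queue entries. The struct names, the 32-bit timestamp and the
64-byte cache line are all assumptions for illustration, not what the
simulator actually uses. With a wider timestamp the padding works out
differently and the gain may shrink or vanish.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical queue entries; the names and the 32-bit timestamp are
// assumptions made for illustration only.
struct EntryWithPointer {
    std::uint32_t time;    // assumed 32-bit event time
    void*         target;  // full-width pointer to the user's object
};

struct EntryWithHandle {
    std::uint32_t time;    // same assumed 32-bit event time
    std::uint32_t target;  // 32-bit handle or offset instead of a pointer
};

// Assumed cache line size; 64 bytes is common on x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// How many whole entries of type T fit in one assumed cache line.
template <typename T>
constexpr std::size_t entries_per_line() {
    return kCacheLine / sizeof(T);
}
```

On a typical LP64 target the pointer version is padded to 16 bytes, so
entries_per_line<EntryWithPointer>() would be half of
entries_per_line<EntryWithHandle>(), but that depends on the ABI.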

> 4GB is big enough that you could fit the heap and stack in there as
> well for most programs. But I don't know enough about the loader
> in Linux to know if it could cooperate.

I think the normal linker and its System V scripts would be able to
do this. I'm not sure about the new gold linker, but I would expect so,
given that it has linking a Linux kernel as a goal.
But in my case it wouldn't be ideal, as I'm building a library and
forcing users to adopt non-standard linker scripts is not nice.

> For my own purposes, cramming the whole heap and stack into 4GB
> would defeat the whole purpose. If the problem is so small that
> everything fits in 4GB, it is less likely to be so non-localized
> that 64 bit pointers greatly hurt the L2 cache. I care about the
> case where an identified subset of the data could be allocated from
> a 4GB pool and there are a LOT of pointers into or within that
> subset of data.
>
> I'm not sure which case the OP cares about.

Probably not too different from yours, but in my case I get a lot of
cache thrashing because I'm running hundreds of user-level threads in
parallel. Even if individually they would be reasonably well behaved,
together they make a mess.
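
The "4GB pool" idea quoted above could be sketched roughly as follows.
This is only an illustration under stated assumptions: OffsetPool and
its members are hypothetical names, it is a bump allocator with no
per-object free, and it is not thread-safe. It also only helps for
objects the library allocates itself, not for arbitrary user objects
on the stack or in static storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical pool handing out 32-bit byte offsets instead of pointers.
class OffsetPool {
public:
    explicit OffsetPool(std::size_t bytes) : size_(bytes), used_(kAlign) {
        // Offsets must fit in 32 bits, so the pool is capped at 4GB.
        if (bytes < kAlign || static_cast<std::uint64_t>(bytes) > kMaxBytes)
            throw std::length_error("pool size must be between 8 bytes and 4GB");
        base_ = static_cast<unsigned char*>(std::malloc(bytes));
        if (!base_) throw std::bad_alloc();
    }
    ~OffsetPool() { std::free(base_); }
    OffsetPool(const OffsetPool&) = delete;
    OffsetPool& operator=(const OffsetPool&) = delete;

    // Returns a 32-bit handle, or 0 if the request cannot be satisfied.
    // Offset 0 is reserved as the null handle (the first 8 bytes are unused).
    std::uint32_t alloc(std::size_t bytes) {
        if (bytes == 0 || bytes > size_ - used_) return 0;
        const std::size_t rounded = (bytes + kAlign - 1) & ~(kAlign - 1);
        if (rounded > size_ - used_) return 0;
        const std::size_t offset = used_;
        used_ += rounded;
        return static_cast<std::uint32_t>(offset);
    }

    // Converts a handle back into a real pointer: one add per dereference.
    void* get(std::uint32_t handle) const {
        if (handle == 0) return nullptr;
        assert(handle < used_);  // handle must come from this pool
        return base_ + handle;
    }

private:
    static constexpr std::size_t kAlign = 8;  // assumed alignment granularity
    static constexpr std::uint64_t kMaxBytes = std::uint64_t{1} << 32;
    unsigned char* base_ = nullptr;
    std::size_t size_;
    std::size_t used_;
};
```

The cost is the extra add (and a load of the base address) on every
dereference, which has to be weighed against the cache savings from
halving the stored pointer size.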
Thanks again for the answers,
Maurizio