On Tue, 24 Sep 2019 20:26:36 +0900, Sahibzada Irfanullah said:

> After having a reasonable amount of log data,

If you're trying to figure out how the kernel memory manager is working, you're probably better off using 'perf' or one of the other tracing tools already in the kernel to track the memory manager. For starters, those tools can give you things like stack tracebacks, so you know who is asking for a page, who is *releasing* a page, and so on. Of course, which of these tools to use depends on what data you need to answer the question - but simply knowing what physical address was involved in a page fault is almost certainly not going to be sufficient.

> I want to perform some type of analysis at run time, e.g., no. of unique
> addresses, total no. of addresses, frequency of occurrences of each address,
> etc.

So what "some type of analysis" are you trying to do? What question(s) are you trying to answer?

The number of unique physical addresses in your system is dictated by how much RAM you have installed. The same goes for the total number of addresses - and I'm not sure why you list both, since a difference between the two would mean that some addresses are non-unique. What would that even mean?

The number of pages actually available for paging and caching depends on other things as well: the architecture of the system, how much RAM (if any) is reserved for use by your video card, the size of the kernel, the size of loaded modules, space taken up by kmalloc() allocations, page tables, whether any processes have called mlock() on a large chunk of space, whether pages are locked by the kernel because there's I/O going on, and then there's things like mmap(), and so on. The kernel provides /proc/meminfo and /proc/slabinfo - you're going to want to understand all of that before you can make sense of anything.
Simply looking at the frequency of occurrences of each address is probably not going to tell you much of anything, because you need to know things like the total working and resident set sizes of the process, plus other context. For example: you do the analysis, and find that there are 8 gigabytes of pages that are constantly being re-used. That doesn't tell you whether there are two processes thrashing against each other because each is doing heavy repeated referencing of 6 gigabytes of data, or one process wildly referencing many pages because some programmer has a multi-dimensional array and is walking across it with the indices in the wrong order:

    i_max = 4095; j_max = 4095;
    for (i = 0; i < i_max; i++)
        for (j = 0; j < j_max; j++)
            sum += foo[i][j];

If somebody is doing foo[j][i] instead, things can get ugly. And if you're mixing in Fortran code, where the semantics of array references are reversed and you *want* to use 'foo[j][i]' for efficient memory access, it's a bullet loaded in the chamber, waiting for somebody to pull the trigger.

Not that I've ever seen *that* particular error happen with a programmer processing 2 terabytes of arrays on a machine that only had 1.5 terabytes of RAM. But I did tease the person involved about it, because they *really* should have known better. :)

So again: what question(s) are you trying to get answers to?
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies