Re: Generating Log of Guest Physical Addresses from a Kernel Function and Perform Analysis at Runtime

Sahibzada Irfanullah <irfan.gomalian@xxxxxxxxx> · Wed, 25 Sep 2019 11:44:49 +0900

1 > Have you tried that today?  I doubt you need any kernel changes at all to get this information directly from the kernel to userspace.  
I also feel the same, because I have to write this information to a file, as well as read it back from the file, in the same kernel function, i.e., handle_ept_violation(). I want the file to store text (e.g., CSV) rather than raw bytes, so that I can also open it in OpenOffice etc.
Thanks. I will try ftrace and debugfs. I am not sure about ftrace, but maybe debugfs will help somewhat.

2 > For starters, you can get those tools to give you things like stack tracebacks so you know who is asking for a page, and who is *releasing* a page, and so on 
At the start, my goal is to generate a log of the physical addresses involved in page faults. Later, I will extend my program to store other information in the file, as you said: which process is requesting/releasing the page, which instruction address referred to which memory location that was not present in memory, how many times an address was involved in a page fault, etc.

3 > So what "some type of analysis" are you trying to do? What question(s) are you trying to answer?
Up till now, I want to perform the simple analysis mentioned in my answer to question 2 above. Moreover, this analysis will provide details about the instruction address responsible for the page fault, the memory reference that was not present, the application that generated the page fault, how many page faults occurred for a single address, etc.
By unique and non-unique, I meant the list of addresses in the log without duplication. For example, suppose we have the log of addresses [1,2,2,3,3,4,3,3,4,4,4,1,4]. In this list the unique addresses are 1, 2, 3, 4, and their frequencies are 2, 2, 4, 5 respectively. At this stage I want to keep things very simple by ignoring details like the size of the RAM, the size of the kernel, the size of loaded modules, etc. In brief, I want to generate a log of the guest physical addresses involved in page faults, together with the corresponding instruction address, the corresponding logical address, and the corresponding application.

In the first stage, I am trying to develop an application that provides some basic functionality of the Pin tool (i.e., instruction instrumentation), but for guest physical addresses only: it traces instruction addresses and memory references and saves them to a file. The file should not only be accessible from within the kernel, but should also open in any ordinary application, e.g., as a .csv or .txt file.
Thank you very much for the help.

On Wed, 25 Sep 2019 at 03:55, Valdis Klētnieks <valdis.kletnieks@xxxxxx> wrote:
On Tue, 24 Sep 2019 20:26:36 +0900, Sahibzada Irfanullah said:

> After having a reasonable amount  of log data,

If you're trying to figure out how the kernel memory manager is working, you're probably better off using 'perf' or one of the other tracing tools already in the kernel to track the kernel memory manager. For starters, you can get those tools to give you things like stack tracebacks so you know who is asking for a page, and who is *releasing* a page, and so on.

Of course, which of these tools to use depends on what data you need to answer the question - but simply knowing what physical address was involved in a page fault is almost certainly not going to be sufficient.

> I want to perform some type of analysis at run time, e.g., no. of unique
> addresses, total no. of addresses, frequency of occurrences of each address
> etc.

So what "some type of analysis" are you trying to do? What question(s) are you trying to answer?

The number of unique physical addresses in your system is dictated by how much RAM you have installed. Similarly for the total number of addresses, although I'm not sure why you list both - that would mean that there is some number of non-unique addresses. What would that even mean?

The number of pages actually available for paging and caching depends on other things as well - the architecture of the system, how much RAM (if any) is reserved for use by your video card, the size of the kernel, the size of loaded modules, space taken up by kmalloc allocations, page tables, whether any processes have called mlock() on a large chunk of space, whether the pages are locked by the kernel because there's I/O going on, and then there are things like mmap(), and so on.

The kernel provides /proc/meminfo and /proc/slabinfo - you're going to want to understand all that stuff before you can make sense of anything.

Simply looking at the frequency of occurrences of each address is probably not going to tell you much of anything, because you need to know things like the total working and resident set sizes for the process and other context.

For example - you do the analysis, and find that there are 8 gigabytes of pages that are constantly being re-used. But that doesn't tell you if there are two processes that are thrashing against each other because each is doing heavy repeated referencing of 6 gigabytes of data, or if one process is wildly referencing many pages because some programmer has a multi-dimensional array and is walking across the array with the indices in the wrong order. In C, the memory-friendly order is:

    i_max = 4095; j_max = 4095;
    for (i = 0; i < i_max; i++)
        for (j = 0; j < j_max; j++)
            sum += foo[i][j];

If somebody is doing foo[j][i] instead, things can get ugly. And if you're mixing with Fortran code, where the semantics of array references are reversed and you *want* to use 'foo[j][i]' for efficient memory access, it's a bullet loaded in the chamber and waiting for somebody to pull the trigger.

Not that I've ever seen *that* particular error happen with a programmer processing 2 terabytes of arrays on a machine that only had 1.5 terabytes of RAM. But I did tease the person involved about it, because they *really* should have known better. :)

So again:  What question(s) are you trying to get answers to?

-- 
Regards,
Mr. Irfanullah

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies