On 1/3/2025 6:45 AM, Lorenzo Stoakes wrote: > On Mon, Dec 23, 2024 at 10:12:42PM +0100, David Hildenbrand wrote: >> On 23.12.24 12:10, Lorenzo Stoakes wrote: >>> Peter - could you drop this patch for now until I have a chance to take a >>> look at this issue on my return on 2nd Jan? >>> >>> On Fri, Dec 20, 2024 at 10:53:14PM +0100, David Hildenbrand wrote: >>>> On 20.12.24 22:29, David Hildenbrand wrote: >>>>> On 20.12.24 20:36, Chen, Zide wrote: >>>>>> >>>>>> >>>>>> On 12/20/2024 1:56 AM, David Hildenbrand wrote: >>>>>>> On 20.12.24 10:31, Lorenzo Stoakes wrote: >>>>>>>> On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote: >>>>>>>> >>>>>>>>> With this patch, it seems perf tool has some problems in capturing the >>>>>>>>> kernel data with Intel PT. >>>>>>>>> >>>>>>>>> Running the following commands, the size of perf.data is very small, and >>>>>>>>> perf script can't find any valid records. >>>>>>>>> >>>>>>>>> perf record -e intel_pt//u -- /bin/ls >>>>>>>>> perf script --insn-trace >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I'm on leave (and should really go back to relaxing :>), returning on 2nd >>>>>>>> Jan so can't really dig into this. >>>>>>>> >>>>>>>> But I tried it on my intel box and it 'works on my machine' with and >>>>>>>> without patch with commands provided, so I'm not sure this is actually a >>>>>>>> product of this change (which shouldn't impact this). >>>>>>> >>>>>>> Zide Chen, can you try with and without this patch to see if it >>>>>>> introduces the issue? >>>>>> >>>>>> Yes, I re-did the test on a SPR server, and the result is same. Without >>>>>> the patch, it went well; But with it, "perf script --insn-trace" doesn't >>>>>> show valid records. >>>>>> >>>>>> This time I tested it on the clean 6.13-rc1 tag, base commit >>>>>> 40384c840ea1944d7c5a392e8975ed088ecf0b37 >>>>>> >>>>>> Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh: >>>>>> >>>>>> Error: >>>>>> The - data has no samples! >>>>> >>>>> I just tested it on 6.13-rc1 vs. 6.13-rc1 with this patch. >>>>> >>>>> Indeed, there is quite difference. Below are the main parts that changed, only. >>>>> >>>>> We seem to be recording data, but maybe what we record gets corrupted somehow? >>>> >>>> Huge parts of the new file are full of 0s. Either we are mapping the wrong >>>> pages, or reading from the pages (via PFNMAP) does not work as expected. >>>> >>> >>> Thanks David, and apologies Zide, appears there is an issue here clearly. >>> >>> Could you try this with sudo operations? I was doing this locally and I >>> wonder if there is now a permissioning error? >> >> I ran it under root. >> >>> >>> I'd be surprised if pfn map would cause an issue here as it should just >>> directly map the kernel memory, however if the PT code assumes there will >>> be faults there could be an issue. I did take a brief look at this last >>> week and it seems the PT stuff relies on the aux functionality, so that >>> could also be a source of problems here. >> >> I started a bit at that code, no clue yet what's happening. >> >> I was wondering if we end up mapping the wrong pages, meaning: the pages at >> mmap time end up being different to the pages later at fault time. The code >> is a bit confusing, but I thought we cannot change the effective event/pages >> while we have an active mmap. Maybe there is some corner case ... > > OK I figured it out... it's a very silly mistake on my part (oh how this is so > often the case :). > > When we map the pages, we do not offset by vma->vm_pgoff when looking up the > page, so if you map with an offset (as presumably the intel pt stuff is), it is > then retrieving the wrong pages). > > This also resolves the apparent need for sudo... > > Very silly mistake. Apologies :>) > > I will send a v4 in a second. > > Zide - could you give v4 a test when I send it out just to confirm it resolves > your issue? I will cc- you on this. > > Thanks again for your report, and apologies for the noise! I can confirm that v4 works for my Intel PT tests! -Zide >> >> Nothing else really jumped at me ... moving the mapping og pages after the >> event_mapped() callback also didn't change anything. >> >>> >>> I am on leave at the moment returning on 2nd Jan, I will look at this as a >>> priority when I return, as you can see above I've asked Peter to drop this >>> for now. >> >> Enjoy your time off an Happy Holidays! >> >> -- >> Cheers, >> >> David / dhildenb >> > > Cheers, Lorenzo