On Mon, Nov 18, 2024 at 2:11 PM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
>
> On Sat, Nov 16, 2024 at 05:59:16PM +0000, Pasha Tatashin wrote:
> > Page Detective is a new kernel debugging tool that provides detailed
> > information about the usage and mapping of physical memory pages.
> >
> > It is often known that a particular page is corrupted, but it is hard to
> > extract more information about such a page from a live system. Examples
> > are:
> >
> > - Checksum failure during live migration
> > - Filesystem journal failure
> > - dump_page warnings on the console log
> > - Unexpected segfaults
> >
> > Page Detective helps to extract more information from the kernel, so it
> > can be used by developers to root-cause the associated problem.
> >
> > It operates through the Linux debugfs interface, with two files: "virt"
> > and "phys".
> >
> > The "virt" file takes a virtual address and PID and outputs information
> > about the corresponding page.
> >
> > The "phys" file takes a physical address and outputs information about
> > that page.
> >
> > The output is presented via kernel log messages (which can be accessed with
> > dmesg), and includes information such as the page's reference count,
> > mapping, flags, and memory cgroup. It also shows whether the page is
> > mapped in the kernel page table, and if so, how many times.
>
> This looks questionable both from the security and convenience points of view.
> Given the request-response nature of the interface, the output can be
> provided using a "normal" seq-based pseudo-file.

We opted to use dmesg for output because it is the standard method for
capturing kernel information and is commonly included in bug reports.
Introducing a new file would require modifying the existing data collection
scripts used for reporting, so this approach minimizes disruption to
existing workflows.

> But I have a more generic question:
> doesn't it make sense to implement it as a set of drgn scripts instead
> of kernel code?
> This provides more flexibility, is safer (even if it's buggy,
> you won't crash the host) and should be at least in theory equally
> powerful.

Regarding your suggestion, our plan is to perform reverse lookups in all
page tables: kernel, user, IOMMU, and KVM. Currently, we only traverse the
kernel and user page tables, but we intend to extend this functionality to
the IOMMU and KVM tables in future updates. I am not sure that drgn can
provide this level of detail within a reasonable amount of time.

This approach is helpful for debugging memory corruption. Corruption is
often detected by external mechanisms, but root-cause analysis requires
kernel-level information. In our experience, invalid mappings persist in
page tables for some time after the corruption, which provides a window in
which a timely reverse lookup can identify other users of the corrupted
page.

Additionally, using crash/drgn is not feasible for us at this time: it
requires keeping external tools on our hosts, and each script needs
approval and a security review before it can be deployed in our fleet.

Thanks,
Pasha