Hello Rob,

I'm debugging a device-tree-related issue with the latest stable Linux release on my RISC-V board. After extensive debugging I traced the issue to your recent patch:

    of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw()

Specifically this change:

    @@ -300,7 +327,6 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq)
     skiplevel:
             /* Iterate again with new parent */
    -        out_irq->np = newpar;
             pr_debug(" -> new parent: %pOF\n", newpar);
             of_node_put(ipar);
             ipar = newpar;

I couldn't figure out exactly why this line was removed, but under some conditions the change means 'out_irq->np' is never updated to the correct node, which is the interrupt parent.

The problematic flow goes through the following code:

    /* No interrupt map, check for an interrupt parent */
    if (imap == NULL) {
            pr_debug(" -> no map, getting parent\n");
            newpar = of_irq_find_parent(ipar);
            goto skiplevel;
    }

After the jump to 'skiplevel', I think 'out_irq->np' is supposed to be updated to the new parent. Because 'out_irq->np' is then never updated in 'of_irq_parse_raw', and 'out_irq->np' is expected to have its reference count increased after a successful return from 'of_irq_parse_raw', the caller will later call 'of_node_put' on the wrong 'out_irq->np', corrupting that node's reference count. The reference taken on 'newpar' is leaked as well.

When the device tree is populated at boot, one particular device node on my board triggers this issue repeatedly, eventually causing the reference count of that node to underflow. The attached fragment of the kernel message log shows node 'usbdrd' with a corrupted reference count (I added some pr_debug calls to print the reference count), along with other issues caused by that. The device tree blob for my board is attached as well.
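To make the imbalance concrete, here is a minimal user-space sketch in C. It is not kernel code: 'toy_node', 'toy_irq_args', 'toy_node_get', 'toy_node_put', 'toy_parse_raw' and 'toy_parse_and_release' are hypothetical stand-ins I made up for illustration, and the sketch models only the no-map path as described above, assuming the parent lookup returns a node with its reference count raised and that the caller drops one reference on 'np' after a successful parse.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a device node: a name, a reference count, an IRQ parent. */
struct toy_node {
    const char *name;
    int refcount;
    struct toy_node *irq_parent;
};

/* Toy stand-in for the parse result; np is the node the caller must release. */
struct toy_irq_args {
    struct toy_node *np;
};

static struct toy_node *toy_node_get(struct toy_node *n)
{
    if (n)
        n->refcount++;
    return n;
}

static void toy_node_put(struct toy_node *n)
{
    if (n)
        n->refcount--;
}

/*
 * Models one step of the "no interrupt map" path.
 * update_np != 0 models the code with "out->np = newpar" at skiplevel,
 * update_np == 0 models the code with that line removed.
 * On success (return 0) a reference on the parent is still held, and the
 * caller is expected to drop it through out->np.
 */
static int toy_parse_raw(struct toy_irq_args *out, int update_np)
{
    struct toy_node *ipar, *newpar;

    assert(out->np != NULL);
    ipar = toy_node_get(out->np);              /* held while walking up */
    newpar = toy_node_get(ipar->irq_parent);   /* parent lookup takes a ref */

    /* skiplevel: iterate again with new parent */
    if (update_np)
        out->np = newpar;
    toy_node_put(ipar);
    ipar = newpar;

    return ipar ? 0 : -1;
}

/* Models a caller: parse, then release whatever out->np points at. */
static void toy_parse_and_release(struct toy_node *dev, int update_np)
{
    struct toy_irq_args args = { .np = dev };

    if (toy_parse_raw(&args, update_np) == 0)
        toy_node_put(args.np);
}
```

Starting with both nodes at a reference count of 1, one call with update_np set to 0 leaves the device node one reference short and the parent one reference over, while a call with update_np set to 1 leaves both counts unchanged. Repeating the first variant is what I believe drives the device node's count below zero.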
Attachments:
    kmsg
    jh7110-visionfive-v2.dtb