Patch "Partially revert "perf/arm-cmn: Optimise DTC counter accesses"" has been added to the 6.1-stable tree

This is a note to let you know that I've just added the patch titled

    Partially revert "perf/arm-cmn: Optimise DTC counter accesses"

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     partially-revert-perf-arm-cmn-optimise-dtc-counter-a.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit a257f5b31e7db0a6a7ae51e3d07a651b2e71653b
Author: Robin Murphy <robin.murphy@xxxxxxx>
Date:   Mon Jan 23 18:30:38 2023 +0000

    Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
    
    [ Upstream commit a428eb4b99ab80454f06ad256b25e930fe8a4954 ]
    
    It turns out the optimisation implemented by commit 4f2c3872dde5 is
    totally broken, since all the places that consume hw->dtcs_used for
    events other than cycle count are still not expecting it to be sparsely
    populated, and fail to read all the relevant DTC counters correctly if
    so.
    
    If implemented correctly, the optimisation potentially saves up to 3
    register reads per event update, which is reasonably significant for
    events targeting a single node, but still not worth a massive amount of
    additional code complexity overall. Getting it right within the current
    design looks a fair bit more involved than it was ever intended to be,
    so let's just make a functional revert which restores the old behaviour
    while still backporting easily.
    
    Fixes: 4f2c3872dde5 ("perf/arm-cmn: Optimise DTC counter accesses")
    Reported-by: Ilkka Koskinen <ilkka@xxxxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
    Link: https://lore.kernel.org/r/b41bb4ed7283c3d8400ce5cf5e6ec94915e6750f.1674498637.git.robin.murphy@xxxxxxx
    Signed-off-by: Will Deacon <will@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index b80a9b74662b..1deb61b22bc7 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1576,7 +1576,6 @@ static int arm_cmn_event_init(struct perf_event *event)
 			hw->dn++;
 			continue;
 		}
-		hw->dtcs_used |= arm_cmn_node_to_xp(cmn, dn)->dtc;
 		hw->num_dns++;
 		if (bynodeid)
 			break;
@@ -1589,6 +1588,12 @@ static int arm_cmn_event_init(struct perf_event *event)
 			nodeid, nid.x, nid.y, nid.port, nid.dev, type);
 		return -EINVAL;
 	}
+	/*
+	 * Keep assuming non-cycles events count in all DTC domains; turns out
+	 * it's hard to make a worthwhile optimisation around this, short of
+	 * going all-in with domain-local counter allocation as well.
+	 */
+	hw->dtcs_used = (1U << cmn->num_dtcs) - 1;
 
 	return arm_cmn_validate_group(cmn, event);
 }
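
For readers unfamiliar with the mask arithmetic, the following is a minimal standalone sketch, not driver code, of the difference between the two ways of populating hw->dtcs_used. The helper names (all_dtcs_mask, sparse_dtcs_mask, dtc_reads) are hypothetical, and it assumes each node contributes a one-bit DTC domain mask, as the OR in the removed line suggests. With four DTC domains, an event targeting a single node would need one counter read under the sparse scheme versus four under the restored one, which matches the "up to 3 register reads" figure in the commit message.

```c
#include <stddef.h>

/*
 * Hypothetical helper: mask with one bit set for every DTC domain.
 * Same expression the patch uses for hw->dtcs_used; assumes num_dtcs
 * is smaller than the bit width of unsigned int.
 */
static unsigned int all_dtcs_mask(unsigned int num_dtcs)
{
	return (1U << num_dtcs) - 1;
}

/*
 * Hypothetical helper: OR together per-node domain bits, which is the
 * sparse behaviour being reverted. Each entry is assumed to be a
 * one-bit mask identifying the node's DTC domain.
 */
static unsigned int sparse_dtcs_mask(const unsigned int *node_dtc_bits, size_t n)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= node_dtc_bits[i];
	return mask;
}

/* Number of DTC counters read per update: one per set bit in the mask. */
static unsigned int dtc_reads(unsigned int mask)
{
	unsigned int count = 0;

	for (; mask; mask &= mask - 1)
		count++;
	return count;
}
```

The sparse mask is only safe if every consumer iterates over the set bits rather than assuming a contiguous range starting at bit 0, which, per the commit message, is what the existing consumers of hw->dtcs_used do not do.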


