On Wed, Oct 04, 2023 at 11:56:33AM -0400, Rodrigo Vivi wrote:
> On Wed, Oct 04, 2023 at 12:13:06PM +0530, Nilawar, Badal wrote:
> > Hi Ashutosh,
> >
> > On 04-10-2023 06:22, Dixit, Ashutosh wrote:
> > > On Fri, 29 Sep 2023 14:41:22 -0700, Dixit, Ashutosh wrote:
> > > >
> > >
> > > Hi Badal,
> > >
> > > Why did you merge the hwmon patches when there is still an open discussion
> > > below on the patches? According to upstream rules (I'm not sure if you know
> > > about this) you should not merge patches, even if you have R-b's on the
> > > patches, till all review comments are resolved.
> > >
> > > Generally you are expected to either address the comments, reply to the
> > > comments, or at least inform that you are merging disregarding the
> > > comments. IMO you should at least have done one of these before merging.
> >
> > I did selective merging. I haven't merged the 5th patch yet as locking is
> > still in discussion. I am working on addressing locking and thought I would
> > address some of your comments with it.

There was still an open discussion going around the (merged) patch 1,
regarding the positioning of the mem_access get/put. So, next time hold a
bit before pushing.

But the positioning of those mem_access get/put is safe, although maybe not
ideal... (needed?!). Anyway, that can be a follow-up fix or improvement and
I'm okay with the way it currently is in the code.

> Just to ensure the split is clear to everyone and that we have CI running on
> the exact chunk that is getting merged, next time, please split the series,
> rebase and resend the ones that are ready. You might even use
> --subject-prefix=CI
>
> and as always, let's not rush things in and be sure that all questions
> and concerns are addressed.
>
> Thanks,
> Rodrigo.
>
> > Thanks,
> > Badal
> >
> > > Cc: @Vivi, Rodrigo
> > >
> > > Thanks.
> > > --
> > > Ashutosh
> > >
> > > > On Fri, 29 Sep 2023 09:48:36 -0700, Dixit, Ashutosh wrote:
> > > > > On Thu, 28 Sep 2023 23:37:35 -0700, Nilawar, Badal wrote:
> > > > > >
> > > > > > On 28-09-2023 10:25, Dixit, Ashutosh wrote:
> > > > > > > On Wed, 27 Sep 2023 01:39:46 -0700, Nilawar, Badal wrote:
> > > > > > > >
> > > > > > > > On 27-09-2023 10:23, Dixit, Ashutosh wrote:
> > > > > > > > > On Mon, 25 Sep 2023 01:18:38 -0700, Badal Nilawar wrote:
> > > > > > > > > >
> > > > > > > > > > +static umode_t
> > > > > > > > > > +xe_hwmon_is_visible(const void *drvdata, enum hwmon_sensor_types type,
> > > > > > > > > > +		    u32 attr, int channel)
> > > > > > > > > > +{
> > > > > > > > > > +	struct xe_hwmon *hwmon = (struct xe_hwmon *)drvdata;
> > > > > > > > > > +	int ret;
> > > > > > > > > > +
> > > > > > > > > > +	xe_device_mem_access_get(gt_to_xe(hwmon->gt));
> > > > > > > > >
> > > > > > > > > Maybe we do xe_device_mem_access_get/put in xe_hwmon_process_reg where it
> > > > > > > > > is needed? E.g. xe_hwmon_is_visible doesn't need to do this because it
> > > > > > > > > doesn't read/write registers.
> > > > > > > >
> > > > > > > > Agreed, but the visible function is called only once, while registering the
> > > > > > > > hwmon interface, which happens during driver probe. During driver probe the
> > > > > > > > device will be in a resumed state. So no harm in keeping
> > > > > > > > xe_device_mem_access_get/put in the visible function.
> > > > > > >
> > > > > > > To me it doesn't make any sense to keep xe_device_mem_access_get/put
> > > > > > > anywhere except in xe_hwmon_process_reg where the HW access actually
> > > > > > > happens.
> > > > > > > We can eliminate xe_device_mem_access_get/put's all over the place
> > > > > > > if we do it. Isn't it?
> > > > > >
> > > > > > Agreed, the thought process here suggests taking the rpm wakeref at the
> > > > > > lowest possible level. I already tried this in the rfc series and to some
> > > > > > extent in rev2. There is a problem with this approach. See my comments
> > > > > > below.
> > > > > >
> > > > > > > The only restriction I have heard of (though not sure why) is that
> > > > > > > xe_device_mem_access_get/put should not be called under lock. Though I am
> > > > > > > not sure if that is only for spinlocks or also for mutexes. So as we were
> > > > > > > saying, the locking will also need to move to xe_hwmon_process_reg.
> > > > > >
> > > > > > Yes, from the rev2 comments it's dangerous to take the mutex before
> > > > > > xe_device_mem_access_get/put. With the code for "PL1 disable/restore during
> > > > > > resume" I saw a deadlock. The scenario was: power1_max write -> mutex lock ->
> > > > > > rpm resume -> disable pl1 -> mutex lock (deadlock here).
> > > > >
> > > > > But this is already the wrong order as mentioned below. If we follow the
> > > > > below order do we still see the deadlock?
> > > > >
> > > > > > > So:
> > > > > > >
> > > > > > > xe_hwmon_process_reg()
> > > > > > > {
> > > > > > >     xe_device_mem_access_get
> > > > > > >     mutex_lock
> > > > > > >     ...
> > > > > > >     mutex_unlock
> > > > > > >     xe_device_mem_access_put
> > > > > > > }
> > > > > > >
> > > > > > > So once again if this is not possible for some reason let's figure out why.
> > > > > >
> > > > > > There are two problems with this approach.
> > > > > >
> > > > > > Problem 1: If you look at the implementation of xe_hwmon_power_max_write,
> > > > > > reg access happens 3 times, so there will be 3 rpm suspend/resume cycles.
> > > > > > I was observing the same with the rfc implementation. So in subsequent
> > > > > > series xe_device_mem_access_put/get was moved to the top level functions,
> > > > > > i.e. the hwmon hooks.
> > > > >
> > > > > This is not exactly correct because there is also a 1 second autosuspend
> > > > > delay which will prevent such rpm suspend/resume cycles:
> > > > >
> > > > > xe_pm_runtime_init:
> > > > >     pm_runtime_set_autosuspend_delay(dev, 1000);
> > > > >
> > > > > > Problem 2: If the locking is moved inside xe_hwmon_process_reg then between
> > > > > > two subsequent reg accesses it will open a small window during which a race
> > > > > > can happen.
> > > > > > As Anshuman suggested in the other thread, reads are sequential and
> > > > > > protected by the sysfs layer. So let's apply locking only for the RW
> > > > > > attributes.
> > > > >
> > > > > But what is the locking trying to protect? As far as I understand it is
> > > > > just the registers which have to be atomically modified/read. So it seems
> > > > > sufficient to just protect the register accesses with the lock.
> > > > >
> > > > > So I am still not convinced.
> > > >
> > > > Let's figure out the locking first depending on what needs to be protected
> > > > (just registers or other data too). And then we can see where to put the
> > > > xe_device_mem_access_get/put's (following the rule that
> > > > xe_device_mem_access_get/put's should not be called under lock).
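
To make sure we are all picturing the same shape, roughly what the ordering
quoted above would look like in code (just a sketch on my side, not taken from
the patches; the lock name, the register-lookup helper and the exact signature
are approximations of what the series has):

static void xe_hwmon_process_reg(struct xe_hwmon *hwmon, enum xe_hwmon_reg hwmon_reg,
				 enum xe_hwmon_reg_operation operation,
				 u32 *value, u32 clr, u32 set)
{
	struct xe_device *xe = gt_to_xe(hwmon->gt);
	struct xe_reg reg = xe_hwmon_get_reg(hwmon, hwmon_reg);

	if (!reg.raw)
		return;

	/* Wakeref first, so xe_device_mem_access_get() is never called under the mutex. */
	xe_device_mem_access_get(xe);
	mutex_lock(&hwmon->hwmon_lock);

	switch (operation) {
	case REG_READ:
		*value = xe_mmio_read32(hwmon->gt, reg);
		break;
	case REG_WRITE:
		xe_mmio_write32(hwmon->gt, reg, *value);
		break;
	case REG_RMW:
		*value = xe_mmio_rmw32(hwmon->gt, reg, clr, set);
		break;
	default:
		break;
	}

	/* Lock dropped before the wakeref is released. */
	mutex_unlock(&hwmon->hwmon_lock);
	xe_device_mem_access_put(xe);
}

With something like that, xe_hwmon_is_visible should not need any get/put at
all, the power1_max write path would resume the device before taking the lock
(so the PL1-disable-on-resume path should no longer block on a mutex the
writer is already holding), and the three back-to-back accesses in
xe_hwmon_power_max_write should mostly be absorbed by the 1 second
autosuspend delay.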