Re: [PATCH 5/7] thermal: intel: hfi: Enable notification interrupt

Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx> · Tue, 09 Nov 2021 04:54:08 -0800

On Tue, 2021-11-09 at 09:48 +0100, Peter Zijlstra wrote:
> On Mon, Nov 08, 2021 at 06:26:13PM -0800, Ricardo Neri wrote:
> > On Mon, Nov 08, 2021 at 10:07:40AM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 05, 2021 at 06:33:10PM -0700, Ricardo Neri wrote:
> 
> > > > +static void hfi_update_work_fn(struct work_struct *work)
> > > > +{
> > > > +       struct hfi_instance *hfi_instance;
> > > > +
> > > > +       hfi_instance = container_of(to_delayed_work(work),
> > > > struct hfi_instance,
> > > > +                                   update_work);
> > > > +       if (!hfi_instance)
> > > > +               return;
> > > > +
> > > > +       /* TODO: Consume update here. */
> > > 
> > >         // this here uses ->event_lock to serialize against the
> > >         // interrupt below changing the data...
> > 
> > Anyone reading the HFI table would need to take ->event_lock.
> 
> Right.. that implies ->event_lock can be taken while there is no
> interrupt active, which then necessitates the additional lock.
> 
Correct.
With the raw_spin_trylock() optimization, we will need additional lock.
So need another lock to protect hfi_instance->table_base.

> > > > +}
> > > > +
> > > > +void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
> > > > +{
> > > > +       struct hfi_instance *hfi_instance;
> > > > +       int cpu = smp_processor_id();
> > > > +       struct hfi_cpu_info *info;
> > > > +       unsigned long flags;
> > > > +       u64 timestamp;
> > > > +
> > > > +       if (!pkg_therm_status_msr_val)
> > > > +               return;
> > > > +
> > > > +       info = &per_cpu(hfi_cpu_info, cpu);
> > > > +       if (!info)
> > > > +               return;
> > > > 

[...]

> > > > +       memcpy(hfi_instance->table_base, hfi_instance-
> > > > >hw_table,
> > > > +              hfi_features.nr_table_pages << PAGE_SHIFT);
> 
> I think we actually need to release ->interrupt_lock here, *before*
> the
> WRMSR that ACKs the HFI update. Because I think the moment that WRMSR
> goes through we can get another interrupt, and that *must* not find
> ->interrupt_lock taken, otherwise it will not process the update
> etc..
> leading to lost interrupts.

Correct.
Once we use raw_spin_trylock() change suggested above, then we need to
release lock here.

Thanks,
Srinivas

> 

> > > > +       /*
> > > > +        * Let hardware and other CPUs know that we are done
> > > > reading the HFI
> > > > +        * table and it is free to update it again.
> > > > +        */
> > > > +       pkg_therm_status_msr_val &= THERM_STATUS_CLEAR_PKG_MASK
> > > > &
> > > > +                                  
> > > > ~PACKAGE_THERM_STATUS_HFI_UPDATED;
> > > > +       wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS,
> > > > pkg_therm_status_msr_val);
> > > > +       schedule_delayed_work(&hfi_instance->update_work,
> > > > HFI_UPDATE_INTERVAL);
> > > > +
> > > > +unlock_spinlock:
> > > > +       raw_spin_unlock_irqrestore(&hfi_instance->event_lock,
> > > > flags);
> > > 
> > >         raw_spin_unlock(&hfi_instance->interrupt_lock);
> > 
> > ... and here we release both locks.
> 
> See above.