On Tue, 2021-11-09 at 09:48 +0100, Peter Zijlstra wrote: > On Mon, Nov 08, 2021 at 06:26:13PM -0800, Ricardo Neri wrote: > > On Mon, Nov 08, 2021 at 10:07:40AM +0100, Peter Zijlstra wrote: > > > On Fri, Nov 05, 2021 at 06:33:10PM -0700, Ricardo Neri wrote: > > > > > +static void hfi_update_work_fn(struct work_struct *work) > > > > +{ > > > > + struct hfi_instance *hfi_instance; > > > > + > > > > + hfi_instance = container_of(to_delayed_work(work), > > > > struct hfi_instance, > > > > + update_work); > > > > + if (!hfi_instance) > > > > + return; > > > > + > > > > + /* TODO: Consume update here. */ > > > > > > // this here uses ->event_lock to serialize against the > > > // interrupt below changing the data... > > > > Anyone reading the HFI table would need to take ->event_lock. > > Right.. that implies ->event_lock can be taken while there is no > interrupt active, which then necessitates the additional lock. > Correct. With the raw_spin_trylock() optimization, we will need additional lock. So need another lock to protect hfi_instance->table_base. > > > > +} > > > > + > > > > +void intel_hfi_process_event(__u64 pkg_therm_status_msr_val) > > > > +{ > > > > + struct hfi_instance *hfi_instance; > > > > + int cpu = smp_processor_id(); > > > > + struct hfi_cpu_info *info; > > > > + unsigned long flags; > > > > + u64 timestamp; > > > > + > > > > + if (!pkg_therm_status_msr_val) > > > > + return; > > > > + > > > > + info = &per_cpu(hfi_cpu_info, cpu); > > > > + if (!info) > > > > + return; > > > > [...] > > > > + memcpy(hfi_instance->table_base, hfi_instance- > > > > >hw_table, > > > > + hfi_features.nr_table_pages << PAGE_SHIFT); > > I think we actually need to release ->interrupt_lock here, *before* > the > WRMSR that ACKs the HFI update. Because I think the moment that WRMSR > goes through we can get another interrupt, and that *must* not find > ->interrupt_lock taken, otherwise it will not process the update > etc.. > leading to lost interrupts. Correct. Once we use raw_spin_trylock() change suggested above, then we need to release lock here. Thanks, Srinivas > > > > > + /* > > > > + * Let hardware and other CPUs know that we are done > > > > reading the HFI > > > > + * table and it is free to update it again. > > > > + */ > > > > + pkg_therm_status_msr_val &= THERM_STATUS_CLEAR_PKG_MASK > > > > & > > > > + > > > > ~PACKAGE_THERM_STATUS_HFI_UPDATED; > > > > + wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS, > > > > pkg_therm_status_msr_val); > > > > + schedule_delayed_work(&hfi_instance->update_work, > > > > HFI_UPDATE_INTERVAL); > > > > + > > > > +unlock_spinlock: > > > > + raw_spin_unlock_irqrestore(&hfi_instance->event_lock, > > > > flags); > > > > > > raw_spin_unlock(&hfi_instance->interrupt_lock); > > > > ... and here we release both locks. > > See above.