On Mon, Nov 08, 2021 at 06:26:13PM -0800, Ricardo Neri wrote: > On Mon, Nov 08, 2021 at 10:07:40AM +0100, Peter Zijlstra wrote: > > On Fri, Nov 05, 2021 at 06:33:10PM -0700, Ricardo Neri wrote: > > > +static void hfi_update_work_fn(struct work_struct *work) > > > +{ > > > + struct hfi_instance *hfi_instance; > > > + > > > + hfi_instance = container_of(to_delayed_work(work), struct hfi_instance, > > > + update_work); > > > + if (!hfi_instance) > > > + return; > > > + > > > + /* TODO: Consume update here. */ > > > > // this here uses ->event_lock to serialize against the > > // interrupt below changing the data... > > Anyone reading the HFI table would need to take ->event_lock. Right.. that implies ->event_lock can be taken while there is no interrupt active, which then necessitates the additional lock. > > > +} > > > + > > > +void intel_hfi_process_event(__u64 pkg_therm_status_msr_val) > > > +{ > > > + struct hfi_instance *hfi_instance; > > > + int cpu = smp_processor_id(); > > > + struct hfi_cpu_info *info; > > > + unsigned long flags; > > > + u64 timestamp; > > > + > > > + if (!pkg_therm_status_msr_val) > > > + return; > > > + > > > + info = &per_cpu(hfi_cpu_info, cpu); > > > + if (!info) > > > + return; > > > + > > > + /* > > > + * It is possible that we get an HFI thermal interrupt on this CPU > > > + * before its HFI instance is initialized. This is not a problem. The > > > + * CPU that enabled the interrupt for this package will also get the > > > + * interrupt and is fully initialized. > > > + */ > > > + hfi_instance = info->hfi_instance; > > > + if (!hfi_instance) > > > + return; > > > + > > > > /* > > * If someone is already handling the interrupt, we shouldn't be > > * burning time waiting for them to then do more nothing. > > */ > > if (!raw_spin_trylock(&hfi_instance->interrupt_lock)) > > return; > > > > > > > + raw_spin_lock_irqsave(&hfi_instance->event_lock, flags); > > The CPU who acquired ->interrupt_lock successfully now will acquire > ->event_lock to serialize writes and reads to the HFI table. Right, so ->interrupt_lock is purely used to serialize interrupts, and only one interrupt gets to do the update, while the others can exit and resume with what they were doing asap, without wasting cycles spinning on ->event_lock only to then not do anything. > > > + /* > > > + * On most systems, all CPUs in the package receive a package-level > > > + * thermal interrupt when there is an HFI update. Since they all are > > > + * dealing with the same update (as indicated by the update timestamp), > > > + * it is sufficient to let a single CPU to acknowledge the update and > > > + * schedule work to process it. > > > + */ > > > + timestamp = *(u64 *)hfi_instance->hw_table; > > > + if (hfi_instance->timestamp >= timestamp) > > > + goto unlock_spinlock; > > > > This can go the way of the dodo. > > (I guess I can still check the timestamp in case buggy firmware triggers > updates with the same timestamp, right?) Sure.. > > > > > + > > > + hfi_instance->timestamp = timestamp; > > > + > > > + memcpy(hfi_instance->table_base, hfi_instance->hw_table, > > > + hfi_features.nr_table_pages << PAGE_SHIFT); I think we actually need to release ->interrupt_lock here, *before* the WRMSR that ACKs the HFI update. Because I think the moment that WRMSR goes through we can get another interrupt, and that *must* not find ->interrupt_lock taken, otherwise it will not process the update etc.. leading to lost interrupts. > > > + /* > > > + * Let hardware and other CPUs know that we are done reading the HFI > > > + * table and it is free to update it again. > > > + */ > > > + pkg_therm_status_msr_val &= THERM_STATUS_CLEAR_PKG_MASK & > > > + ~PACKAGE_THERM_STATUS_HFI_UPDATED; > > > + wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS, pkg_therm_status_msr_val); > > > + schedule_delayed_work(&hfi_instance->update_work, HFI_UPDATE_INTERVAL); > > > + > > > +unlock_spinlock: > > > + raw_spin_unlock_irqrestore(&hfi_instance->event_lock, flags); > > > > raw_spin_unlock(&hfi_instance->interrupt_lock); > > ... and here we release both locks. See above.