On Tue, Oct 25, 2022 at 12:39:30PM +0000, Kai Ye wrote:
> Update documentation describing sysfs node that could help to
> configure isolation strategy for users in the user space. And
> describing sysfs node that could read the device isolated state.
>
> Signed-off-by: Kai Ye <yekai13@xxxxxxxxxx>
> ---
>  Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
> index 08f2591138af..50737c897ba3 100644
> --- a/Documentation/ABI/testing/sysfs-driver-uacce
> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
> @@ -19,6 +19,33 @@ Contact:	linux-accelerators@xxxxxxxxxxxxxxxx
>  Description:	Available instances left of the device
>  		Return -ENODEV if uacce_ops get_available_instances is not provided
>
> +What:		/sys/class/uacce/<dev_name>/isolate_strategy
> +Date:		Oct 2022
> +KernelVersion:	6.1
> +Contact:	linux-accelerators@xxxxxxxxxxxxxxxx
> +Description:	(RW) Configure the frequency size for the hardware error
> +		isolation strategy. This unit is the number of times. Number

Number of times what?

> +		of occurrences in a period, also means threshold. If the number
> +		of device pci AER error exceeds the threshold in a time window,

What is the time window?

> +		the device is isolated. This size is a configured integer value.
> +		The default is 0. The maximum value is 65535.
> +
> +		In the hisilicon accelerator engine, first we will
> +		time-stamp every slot AER error. Then check the AER error log
> +		when the device AER error occurred. if the device slot AER error
> +		count exceeds the preset the number of times in one hour, the
> +		isolated state will be set to true. So the device will be
> +		isolated. And the AER error log that exceed one hour will be
> +		cleared.

This seems like a very hardware-specific implementation here.  And this
is supposed to be a generic class?

I feel this is getting really messy :(

thanks,

greg k-h
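
For readers trying to follow what the quoted Description is getting at, here is a rough userspace model of the scheme as I read it: time-stamp each AER error, drop entries older than one hour, and mark the device isolated once the count in the window exceeds the configured threshold. This is only a sketch, not the driver's code. The class and method names are made up for illustration, and treating a threshold of 0 as "isolation disabled" is my assumption, since the patch text only says the default is 0.

```python
from collections import deque

WINDOW_SECONDS = 3600  # the "one hour" window named in the quoted Description
MAX_THRESHOLD = 65535  # maximum value stated in the quoted Description


class IsolationTracker:
    """Hypothetical model of the described strategy; not taken from the driver."""

    def __init__(self, threshold: int = 0):
        if not 0 <= threshold <= MAX_THRESHOLD:
            raise ValueError("threshold must be in 0..65535")
        self.threshold = threshold
        self.timestamps = deque()  # times (in seconds) of recent AER errors
        self.isolated = False

    def record_error(self, now: float) -> bool:
        """Log one AER error at time `now` and return the isolated state."""
        # Clear log entries that are more than one hour old.
        while self.timestamps and now - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()
        self.timestamps.append(now)
        # Assumption: threshold 0 means the strategy is disabled.
        if self.threshold and len(self.timestamps) > self.threshold:
            self.isolated = True
        return self.isolated
```

With a threshold of 2, the third error inside one hour sets the isolated state, while errors spaced more than an hour apart never accumulate. Note that the window length is hard-coded here, which is exactly the hardware-specific detail questioned above.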