On 2022/9/9 16:27, Greg KH wrote:
> On Fri, Sep 02, 2022 at 03:13:03AM +0000, Kai Ye wrote:
>> Update documentation describing sysfs node that could help to
>> configure isolation strategy for users in the user space. And
>> describing sysfs node that could read the device isolated state.
>>
>> Signed-off-by: Kai Ye <yekai13@xxxxxxxxxx>
>> ---
>>  Documentation/ABI/testing/sysfs-driver-uacce | 26 ++++++++++++++++++++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
>> index 08f2591138af..af5bc2f326d2 100644
>> --- a/Documentation/ABI/testing/sysfs-driver-uacce
>> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
>> @@ -19,6 +19,32 @@ Contact:        linux-accelerators@xxxxxxxxxxxxxxxx
>>  Description:    Available instances left of the device
>>                  Return -ENODEV if uacce_ops get_available_instances is not provided
>>
>> +What:           /sys/class/uacce/<dev_name>/isolate_strategy
>> +Date:           Sep 2022
>> +KernelVersion:  6.0
>> +Contact:        linux-accelerators@xxxxxxxxxxxxxxxx
>> +Description:    (RW) Configure the frequency size for the hardware error
>> +                isolation strategy. This size is a configured integer value.
>> +                The default is 0. The maximum value is 65535. This value is a
>> +                threshold based on your driver strategies.
> I do not understand what the units are here.
>
> How is anyone supposed to know what they are?

The unit is a count: the number of error occurrences within a period,
which serves as the threshold. If the number of PCI AER errors on the
device exceeds the threshold within a time window, the device is
isolated.

>> +                For example, in the hisilicon accelerator engine, first we will
>> +                time-stamp every slot AER error. Then check the AER error log
>> +                when the device AER error occurred. if the device slot AER error
>> +                count exceeds the preset the number of times in one hour, the
>> +                isolated state will be set to true. So the device will be
>> +                isolated. And the AER error log that exceed one hour will be
>> +                cleared. Of course, different strategies can be defined in
>> +                different drivers.
> So this file can contain values of different units depending on the
> different driver that creates it?  How is anyone supposed to know what
> it is and what it should be?
>
> This feels very loose, please define this much better so that it can be
> understood and maintained properly.
>
> thanks,
>
> greg k-h
> .
>

Yes. We started out with the idea of not restricting the different
drivers and only specifying the input and output, because we think
different drivers require different processing strategies.
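
To make the intended semantics concrete, here is a rough userspace sketch
of the one-hour strategy described in the patch text above. It is only an
illustration, not the driver code: struct err_log, err_log_expire(),
err_log_record(), WINDOW_SECS and MAX_LOG are hypothetical names, the
timestamps are assumed to be monotonic seconds, and a threshold of 0 is
taken literally (the first error already exceeds it), which may differ
from what a real driver does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_SECS 3600u /* one hour, as in the hisilicon example */
#define MAX_LOG     64u   /* hypothetical cap on stored timestamps */

struct err_log {
	uint64_t stamp[MAX_LOG]; /* error timestamps in seconds, oldest first */
	size_t   count;          /* number of valid entries in stamp[] */
	uint32_t threshold;      /* value written to isolate_strategy */
	bool     isolated;       /* sticky once set */
};

/* Clear log entries that are one hour old or older. */
static void err_log_expire(struct err_log *log, uint64_t now)
{
	size_t keep = 0;

	for (size_t i = 0; i < log->count; i++) {
		if (now - log->stamp[i] < WINDOW_SECS)
			log->stamp[keep++] = log->stamp[i];
	}
	log->count = keep;
}

/*
 * Time-stamp one AER error and re-evaluate the isolation state.
 * Returns true once the error count inside the window exceeds the
 * threshold. In this sketch a threshold >= MAX_LOG can never trip,
 * because the log holds at most MAX_LOG entries.
 */
static bool err_log_record(struct err_log *log, uint64_t now)
{
	err_log_expire(log, now);

	if (log->count == MAX_LOG) {
		/* Log is full: drop the oldest entry to make room. */
		memmove(&log->stamp[0], &log->stamp[1],
			(MAX_LOG - 1) * sizeof(log->stamp[0]));
		log->count--;
	}
	log->stamp[log->count++] = now;

	if (log->count > log->threshold)
		log->isolated = true;

	return log->isolated;
}
```

With threshold set to 2, a third error inside the same hour sets the
isolated state, while errors spaced more than an hour apart never
accumulate because the old entries are cleared first.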