Re: [PATCH] drm/i915/gt: Continue creating engine sysfs files even after a failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 04/09/2024 15:34, Jani Nikula wrote:
On Wed, 04 Sep 2024, Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> wrote:
Hi Sima,

On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:
On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
The i915 driver generates sysfs entries for each engine of the
GPU in /sys/class/drm/cardX/engines/.

The process is straightforward: we loop over the UABI engines and
for each one, we:

  - Create the object.
  - Create basic files.
  - If the engine supports timeslicing, create timeslice duration files.
  - If the engine supports preemption, create preemption-related files.
  - Create default value files.

Currently, if any of these steps fail, the process stops, and no
further sysfs files are created.

However, it's not necessary to stop the process on failure.
Instead, we can continue creating the remaining sysfs files for
the other engines. Even if some files fail to be created, the
list of engines can still be retrieved by querying i915.

Signed-off-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>

Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
needed, and we should delete those files probably.

This is different from debugfs, where failures are consistently ignored
because that's the conscious design choice Greg made and wants supported.
Because debugfs is optional.

So please make sure we correctly fail driver load if these don't register.
Even better would be if sysfs files are registered atomically as attribute
blocks, but that's an entire different can of worms. But that would really
clean up this code and essentially put any failure handling onto core
driver model and sysfs code.

This comment came after I merged the patch. So far, we have been
keeping the driver going even if sysfs fails to create, with the
idea of "if there is something wrong let it go as far as it can
and fail on its own".

This change is just setting the behavior to what the rest of the
interfaces are doing, so that either we change them all to fail
the driver's probe or we have them behaving consistently as they
are.

Tvrtko, Chris, Rodrigo any opinion from your side? Shall we bail
out as Sima is suggesting?

Are there any causes for sysfs creation errors that would be acceptable
to ignore? I didn't see any examples. Or is this just speculative?

I think it is speculative and that the reason for "carry on on failure" was probably simply because there aren't any real world reasons any would ever fail. Either a programming error or kernel out of memory on driver load, and neither of those sounds interesting.

I suspect historically it was probably deemed simpler not to bother with any unwind or such, and that is the only reason i915_setup_sysfs() returns void.

In this context I don't see a big ROI in making someone work on implementing a driver load abort here, but also don't think it would harm.

IMO it would be fine to tie the decision with the fate of dynamic CCS engines. If that will go in then it definitely more than makes sense to propagate all errors to the entity doing the sysfs write.

Regards,

Tvrtko

IMO fail fast and loud. We get enough bug reports where there's some big
backtrace splash copy-pasted on the bug, but the root cause happened
much earlier and was ignored.

BR,
Jani.





[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux