Re: [PATCH] Revert "ALSA: memalloc: Workaround for Xen PV"

On Tue, Sep 10, 2024 at 01:17:03PM +0200, Takashi Iwai wrote:
> On Mon, 09 Sep 2024 22:02:08 +0200,
> Elliott Mitchell wrote:
> > 
> > On Sat, Sep 07, 2024 at 11:38:50AM +0100, Andrew Cooper wrote:
> > > 
> > > Individual subsystems ought not to know or care about XENPV; it's a
> > > layering violation.
> > > 
> > > If the main APIs don't behave properly, then it probably means we've got
> > > a bug at a lower level (e.g. Xen SWIOTLB is a constant source of fun)
> > > which is probably affecting other subsystems too.
> > 
> > This is a big problem.  Debian bug #988477 (https://bugs.debian.org/988477)
> > showed up in May 2021.  While some characteristics are quite different,
> > the time when it was first reported is similar to the above and it is
> > also likely a DMA bug with Xen.
> 
> Yes, some incompatible behavior has been seen on Xen wrt DMA buffer
> handling, as it seems.  But note that, in the case of above, it was
> triggered by the change in the sound driver side, hence we needed a
> quick workaround there.  The result was to move back to the old method
> for Xen in the end.
> 
> As already mentioned in another mail, the whole code was changed for
> 6.12, and the revert isn't applicable in anyway.
> 
> So I'm going to submit another patch to drop this Xen PV-specific
> workaround for 6.12.  The new code should work without the workaround
> (famous last words).  If the problem happens there, I'd rather leave
> it to Xen people ;)

I've seen that patch, but haven't seen any other activity related to
this sound problem.  I'm wondering whether the problem got fixed by
something else, whether there is activity on other lists I don't see,
or whether nothing will happen until Qubes OS discovers it is broken
again.


An overview of the other bug which may or may not be the same as this
sound card bug:

Both reproductions of the RAID1 bug have been on systems with AMD
processors.  This may indicate this is distinct, but could also mean only
people who get AMD processors are wary enough of flash to bother with
RAID1 on flash devices.  Presently I suspect it is the latter, but not
very many people are bothering with RAID1 with flash.

Only systems with IOMMUv2 (full IOMMU, not merely GART) are affected.

Samsung SATA devices are severely affected.

Crucial/Micron NVMe devices are mildly affected.

Crucial/Micron SATA devices are unaffected.


Specifications for Samsung SATA and Crucial/Micron SATA devices are
fairly similar.  Similar IOps, similar bandwidth capabilities.

Crucial/Micron NVMe devices have massively superior specifications to
the Samsung SATA devices.  Yet the Crucial/Micron NVMe devices are less
severely affected than the Samsung SATA devices.


This seems likely to be a latency issue.  It could be that when commands
are sent to the Samsung SATA devices, the devices are fast enough to
start executing them before the IOMMU is ready.

This could match the sound driver issue.  Since the sound hardware is
able to execute its first command with minimal latency, that is when
the problem occurs.  If the first command gets through, the second
command is likely executed with some delay, by which time the IOMMU is
reliably ready.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





