On 24.05.22 at 00:36, Ken Smith wrote:
Sean Young wrote:

On Tue, Apr 28, 2020 at 08:24:20PM +0200, Martin Burnicki wrote:

Hi,

On 27.04.20 at 10:59, Martin Burnicki wrote:

Sean Young wrote:

Would you mind testing this patch please?

I'm going to try it this evening. I'll have to find out how to do an out-of-tree build for a copy of the cx module that includes the patch. My own kernel driver is always and only built out-of-tree, but for the cx driver I need to see which files I need to copy to a local directory, and if there is anything else that needs to be done to build a copy of it out-of-tree.

Sorry, I haven't managed to test the patch yet. Currently I have the driver loaded with

options cx23885 dma_reset_workaround=2

but today there were 3 occurrences of the risc opcode error:

Drats.

So the workaround doesn't seem to fix the problem anyway, and the patch would just enable the workaround without the specific option, right?

Yes, that's right.

The effect of the workaround looks just like debug levels lower than 7: it just seems to reduce the probability that the bug occurs, but doesn't really fix it. So my conclusion is still that this smells like a missing memory barrier or so in the driver. Since the driver seems to work properly with older mainboards/CPU types, this doesn't sound like a problem in the CX chip, IMO.

I would agree with that. I would suspect the same issue was being papered over by the patch; now what that issue is, I don't know. Certainly some ordering or barrier issue seems likely.

Actually I suspected this all along, but the workaround is the best we have.

I think some time spent hunting down the issue would really be helpful here. Hopefully that doesn't mean too many aborted recordings.

Thanks,
Sean

Hi,

I'd like to resurrect this thread (copied below). I have a system showing this error. It's an HP ML350 server with 2x Xeon 5675 running Rocky Linux 8.5. It has a Hauppauge HVR5525 card that uses the same cx23885 kernel module as the quadHD card discussed above. The HVR5525 is a dual DVB-T2/DVB-S2 card.

In other threads I read about the dma_reset_workaround option. That option did not appear to be in the version included in the standard kernel in Rocky 8.5. I have loaded a 5.4 kernel, compiled the DVB media modules from .git source, and set dma_reset_workaround=2 in a file in modprobe.d. The built module shows version 0.0.4.

Sadly the error remains. The system runs MythTV v.31. The main symptom is occasional aborted recordings, although the card does appear to recover without requiring a reboot/cold restart.

I'd appreciate some assistance with this. What information can I provide to help trace this?
I'm also maintaining a driver which started to show problems on systems with new CPUs and chipsets quite some time ago, for example on some Ryzen CPUs. In my case it turned out that the problem occurred because my driver accessed memory locations on my PCI card directly via a pointer.
Looks like the problem occurred because the CPU/chipset "optimized" and re-ordered the execution of some machine instructions. There are "barrier" instructions that can be inserted in the source code to avoid this, but my original code didn't use them because the driver had been working on many systems for a long time.
Anyway, the low level functions provided by the kernel to access registers on a peripheral are implemented to use those barriers, so simply using those primitives (writel, readl and friends) instead of accessing the registers directly via a pointer (*p = cmd; val = *(p+1) ) fixed the problem for my driver.
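A minimal userspace sketch of the difference described above, not taken from either driver. It assumes a hypothetical two-register layout (command at index 0, status at index 1) and uses C11 fences as a rough analogy for the ordering that the kernel accessors provide. In a real driver the pointer would come from ioremap() and the accesses would be writel(cmd, p) and readl(p + 1) from <linux/io.h>; the exact barrier semantics of those are architecture specific.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
enum { REG_CMD = 0, REG_STATUS = 1, REG_COUNT = 2 };

/* Stand-in for the card's register window (plain memory in this sketch). */
uint32_t fake_regs[REG_COUNT];

/* Write through a volatile pointer, with a release fence before the store
 * so earlier memory writes are not reordered after it. */
static inline void mmio_write32(volatile uint32_t *addr, uint32_t val)
{
    atomic_thread_fence(memory_order_release);
    *addr = val;
}

/* Read through a volatile pointer, with an acquire fence after the load
 * so later memory reads are not reordered before it. */
static inline uint32_t mmio_read32(const volatile uint32_t *addr)
{
    uint32_t val = *addr;
    atomic_thread_fence(memory_order_acquire);
    return val;
}

/* Equivalent of "*p = cmd; val = *(p+1)", but going through the helpers
 * instead of dereferencing a plain pointer. */
uint32_t issue_command(volatile uint32_t *regs, uint32_t cmd)
{
    mmio_write32(&regs[REG_CMD], cmd);
    return mmio_read32(&regs[REG_STATUS]);
}
```

Calling issue_command(fake_regs, cmd) stores cmd in the command slot and returns whatever is currently in the status slot. The point of the sketch is only that every register access goes through one helper that carries the ordering guarantee, so no individual call site can forget it.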
All the symptoms described here for the cx23885 module make me assume that the problem is very similar, i.e. due to a missing barrier instruction somewhere in the source code. Unfortunately I'm not familiar with the Linux media driver stuff, so I don't know where I could start to look for a missing barrier instruction.
The only workaround that fixed the problem for me, and that I'm still using, is to load the cx23885 module with a high debug level, by putting a line

options cx23885 debug=8

into a file /etc/modprobe.d/cx23885.conf

This produces a HUGE amount of kernel log messages (dmesg), but with lower debug levels the driver still didn't work reliably.
To make this stable for a long time, I changed /var/log/ to NOT point to my SSD but to a real hard disk, and I created a cronjob file in /etc/cron.d/ with the line

1 0-23 * * * root rm -f /var/log/kern.log*

to periodically remove the huge kernel log files. This hack has worked for me ever since this was discussed on this ML years ago.

Martin