On 11/11/2014 06:46 PM, Andrey Utkin wrote:
> At Bluecherry, we have issues with servers which have 3 solo6110 cards
> (the cards have up to 16 analog video cameras connected to them, all
> being actively read).
> This is the kernel I last tested on such a server. It is based on
> linux-next of October 31, with a few patches of mine (all are in
> review for upstream):
> https://github.com/krieger-od/linux/ . The HEAD commit is
> 949e18db86ebf45acab91d188b247abd40b6e2a1 at the moment.
>
> The problem is the following: after ~1 hour of uptime with a working
> application reading the streams, one card (the same one every time)
> stops producing interrupts (its counter in /proc/interrupts freezes),
> and all threads reading from that card hang forever in
> ioctl(VIDIOC_DQBUF). The application uses the libavformat (ffmpeg) API
> to read the corresponding /dev/videoX devices of the H264 encoders.
> Restarting the application doesn't help; the interrupt counter just
> increases by 64. To recover, we need a reboot or a programmatic PCI
> device reset via
> "echo 1 > /sys/bus/pci/devices/0000\:03\:05.0/reset", which requires
> unloading the app and the driver and is obviously not a solution.
>
> We have had this issue for a long time, even before we used
> libavformat for reading from such sources.
> A few days ago, we had standalone ffmpeg processes working stably for
> several days. The kernel was 3.17; the only possibly relevant code
> change relative to the above-mentioned revision is an additional bool
> variable set in solo_enc_v4l2_isr() and checked in solo_ring_thread()
> to decide whether to run or skip solo_handle_ring(). The variable
> was guarded with spin_lock_irqsave(). I am not sure whether it makes
> any difference; I will try it again eventually.
>
> Any thoughts? Can it be a bug in the driver code causing this (please
> point out which areas of code to review/fix)? Or is it a hopeless
> hardware issue? How can we figure out for sure whether it is the
> former or the latter?
I would first try to exclude hardware issues: since you say it is always
the same card, try either replacing it or swapping it with another solo
card and see if the problem follows the card or not. If it does, then it
is likely a hardware problem. If it doesn't, then it suggests a race
condition in the interrupt handling somewhere.

Regards,

	Hans
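When swapping cards to see whether the problem follows the card, it helps to record exactly when the affected card stops interrupting, rather than noticing later that readers are stuck in VIDIOC_DQBUF. Below is a minimal userspace sketch in C (not part of the solo6x10 driver or any Bluecherry tool) that sums the per-CPU counts for one IRQ line in /proc/interrupts and reports when the total stops increasing. The function names, IRQ label, poll interval and stall threshold are hypothetical. It assumes the card's IRQ line is not shared with other busy devices (the counter is per IRQ line, not per device) and that a stall is only meaningful while the application is actively reading streams.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: if this /proc/interrupts line belongs to irq_label
 * (the text before ':'), return the sum of its per-CPU counters,
 * otherwise return -1. Counting stops at the first non-numeric column,
 * which is assumed to be the irq chip name (e.g. "IO-APIC", "PCI-MSI"). */
static long long irq_count_from_line(const char *line, const char *irq_label)
{
	const char *p = line;
	size_t len;
	long long total = 0;

	assert(line != NULL && irq_label != NULL);
	while (isspace((unsigned char)*p))
		p++;
	len = strlen(irq_label);
	if (strncmp(p, irq_label, len) != 0 || p[len] != ':')
		return -1;
	p += len + 1;

	for (;;) {
		char *end;
		long long v = strtoll(p, &end, 10); /* skips leading blanks */

		if (end == p)
			break; /* no number here: reached the chip name */
		total += v;
		p = end;
	}
	return total;
}

/* Read /proc/interrupts once; -1 if the file or the IRQ is missing.
 * Assumes each line fits in the buffer (true unless there are
 * several hundred CPUs). */
static long long read_irq_count(const char *irq_label)
{
	char line[4096];
	long long count = -1;
	FILE *f = fopen("/proc/interrupts", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		count = irq_count_from_line(line, irq_label);
		if (count >= 0)
			break;
	}
	fclose(f);
	return count;
}

/* Poll every interval_s seconds. Return the frozen count once it has not
 * changed for stall_polls consecutive polls, or -1 if the IRQ vanishes. */
static long long watch_irq(const char *irq_label, unsigned interval_s,
			   int stall_polls)
{
	long long prev = read_irq_count(irq_label);
	int unchanged = 0;

	if (prev < 0)
		return -1;
	for (;;) {
		long long now;

		sleep(interval_s);
		now = read_irq_count(irq_label);
		if (now < 0)
			return -1;
		unchanged = (now == prev) ? unchanged + 1 : 0;
		if (unchanged >= stall_polls)
			return now;
		prev = now;
	}
}
```

A small main() could call watch_irq("16", 10, 3), where "16" is only a placeholder for the IRQ number read from /sys/bus/pci/devices/0000:03:05.0/irq, and log a timestamp when it returns. Polling /proc/interrupts keeps the probe entirely outside the driver, so it cannot perturb the interrupt path being investigated.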