I should get some sleep - but can't it be that I hit the potential
deadlock mentioned here:
Please take my observation with a grain of salt (as I don't have Linux source code in front of me to check the exact locking sequence, nor can I afford spending time on this) ...
I don't see strong evidence of a deadlock in the thread backtraces (though there could be one). However, assuming the cluster worked before, you could have overloaded the e1000 driver in this case. There are suspicious page faults, but memory looks fine. So one possibility is that GFS generated too many sync requests and flooded the e1000; as a result, the cluster heartbeat missed its interval. Do you use the same Ethernet card for both AOE and cluster traffic? If so, separate them and see how it goes. And of course, if you don't have Ben's mmap patch (as you described in your post), it is probably a good idea to get it into your gfs-kmod.
But honestly, I think running GFS1 on newer kernels is a bad idea.
-- Wendy
commit 4787e11dc7831f42228b89ba7726fd6f6901a1e3
gfs-kmod: workaround for potential deadlock. Prefault user pages
The bug uncovered in 461770 does not seem fixable without a massive
change to how gfs works. There is a lock ordering mismatch between
the process address space lock and the glocks. The only good way to
avoid this in all cases is to not hold the glock for so long, which
is what gfs2 does. This is impossible without completely changing
how gfs does locking. Fortunately, this is only a problem when you
have multiple processes sharing an address space, and are doing IO
to a gfs file with a userspace buffer that's part of an mmapped gfs
file. In this case, prefaulting the buffer's pages immediately
before acquiring the glocks significantly shortens the window for
this deadlock. Closing the window any more causes a large
performance hit.
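To make the prefaulting idea in the commit message concrete, here is a minimal userspace sketch in C. It is only an illustration: prefault_buffer() is a hypothetical helper, not the gfs-kmod code, and the page size is passed in as a parameter rather than queried. It touches one byte in every page the buffer spans, so the pages are already resident by the time a lock would be taken.

```c
#include <stddef.h>

/*
 * Userspace illustration only: touch one byte in every page that the
 * buffer spans, so a later access under a lock does not page-fault.
 * prefault_buffer() is a hypothetical name, not taken from gfs-kmod.
 * Returns the number of bytes touched (0 for an empty buffer).
 */
static size_t prefault_buffer(volatile char *buf, size_t len, size_t page_size)
{
    size_t touched = 0;
    size_t off;

    if (len == 0 || page_size == 0)
        return 0;

    /*
     * Stepping by page_size from the start of the buffer lands in each
     * successive page, except possibly the last one when the buffer is
     * not page aligned.
     */
    for (off = 0; off < len; off += page_size) {
        buf[off] = buf[off];    /* read, then write back the same value */
        touched++;
    }

    /* Touch the final byte to cover a trailing partial page. */
    buf[len - 1] = buf[len - 1];
    touched++;

    return touched;
}
```

A caller would run this on the I/O buffer immediately before the step that takes the lock. As the commit message says, this only narrows the window: a page can still be evicted between the prefault and the lock acquisition. Kernels of that era offer fault_in_pages_readable() and fault_in_pages_writeable() in linux/pagemap.h for the same purpose; whether the patch uses them is not stated here.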
Mailman does mmap files...
Best regards,
Jozsef
--
E-mail : kadlec@xxxxxxxxxxxx, kadlec@xxxxxxxxxxxxxxxxx
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster