Because of PHP. We have many PHP scripts (parsers) on a VM, run by cron.

> 1) Every once in a while, some processes (PHP) accessing the filesystem get stuck in D state (uninterruptible sleep). I wonder whether this happens due to network fluctuations (both servers are connected via a simple Gigabit crosslink cable), and how to diagnose it. Why exactly does this happen in the first place? And what is the proper way to get these processes out of this situation? Why doesn't a timeout or anything else kick in? I've read about client eviction, but when I run "ceph daemon mds.node1 session ls" I only see two entries, one for each server. Obviously I don't want to evict all processes on the server, only the stuck one. So far, the only method I have found to remove the D-state process is to reboot, which is of course not a great solution. When I tried restarting only the MDS service instead of rebooting, many more processes got stuck and the load went above 500 (most probably not CPU, but processes waiting for I/O).

The machine will die; the only question is when. We usually reboot when the load average (LA) reaches 200-350. I think this happens because some main PHP PID is dead: its ioctl() never returns an answer to the child process, hence the D state.
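On the "how to diagnose this" part, a minimal sketch, assuming a Linux host with /proc mounted: it lists the processes currently in D state together with the kernel function they are sleeping in (read from /proc/<pid>/wchan). The names parse_stat and find_d_state are only illustrative, and this only shows where a process is blocked; it does not unstick anything.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    left, _, right = text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state


def read_text(path):
    """Return the stripped file contents, or '' if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def find_d_state(proc_root="/proc"):
    """Return a list of (pid, comm, wchan) for processes in state D."""
    stuck = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return stuck
    for entry in entries:
        if not entry.isdigit():
            continue  # not a PID directory
        base = os.path.join(proc_root, entry)
        stat = read_text(os.path.join(base, "stat"))
        if not stat:
            continue  # process exited while we were scanning
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(base, "wchan")) or "?"
            stuck.append((pid, comm, wchan))
    return stuck


if __name__ == "__main__":
    for pid, comm, wchan in find_d_state():
        print(f"{pid}\t{comm}\t{wchan}")
```

Run from cron or by hand while the load is climbing; if the same PIDs keep showing up with the same wchan, that at least tells you which kernel call they are waiting in. As root, /proc/<pid>/stack usually gives a fuller kernel stack for a given PID.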
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com