On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng@xxxxxxxxx> wrote:
> Hi, Terry,
>
>> I am still seeing some high load averages. Here is an example of a gfs
>> configuration. I left statfs_fast off as it would not apply to one of my
>> volumes for an unknown reason. Not sure that would have helped anyways.
>> I do, however, feel that reducing scand_secs helped a little:
>
> Sorry, I missed scand_secs (I was mindless, as my brain was mostly occupied
> by daytime work).
>
> To simplify the view, glock states include exclusive (write), shared (read),
> and not-locked (in reality there are more). An exclusive lock has to be
> demoted (after demote_secs) to shared, then to not-locked (another
> demote_secs), before it is scanned (every scand_secs) and added to the
> reclaim list, where it can be purged. During the exclusive-to-shared
> transition, the file contents need to be flushed to disk (to keep the file
> contents cluster-coherent). All of the above assumes that the file
> (protected by this glock) is not being accessed, i.e. is idle.
>
> You have hit an area where GFS normally doesn't perform well. With GFS1 in
> maintenance mode and GFS2 still seeming far away, ext3 could be a better
> answer. However, before switching, do make sure to test it thoroughly, since
> ext3 could have the very same issue as well - check out:
> http://marc.info/?l=linux-nfs&m=121362947909974&w=2
>
> Did you look at (and test) the GFS "nolock" protocol (for single-node GFS)?
> It bypasses some locking overhead and can be switched to DLM in the future.
> Just make sure you reserve enough journal space - the rule of thumb is one
> journal per node, so know how many nodes you plan to have in the future.
>
> -- Wendy

Good points. I could try the nolock feature, I suppose. I'm not quite clear
on how to reserve journal space, though.

I forgot to post the CPU time; check this out:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM      TIME+ COMMAND
 4822 root      10  -5     0    0    0 S    1  0.0    2159:15 dlm_recv
 4820 root      10  -5     0    0    0 S    1  0.0  368:09.34 dlm_astd
 4821 root      10  -5     0    0    0 S    0  0.0  153:06.80 dlm_scand
 3659 root      10  -5     0    0    0 S    0  0.0  134:40.14 scsi_wq_4
 4823 root      11  -5     0    0    0 S    1  0.0  109:33.33 dlm_send
  367 root      10  -5     0    0    0 S    0  0.0  103:33.74 kswapd0

gfs_glockd is further down the list, so I'm not as concerned with that right
now. It appears that turning on nolock would do the trick. The times aren't
entirely accurate because I have failed this cluster over between nodes while
testing.
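
To make the journal-space point concrete: GFS1 carves out its journals when
the filesystem is created, one per node that will ever mount it, so the usual
approach is to pass a large enough -j to gfs_mkfs up front, even if you start
out single-node with lock_nolock. A rough sketch - the device, mount point,
and cluster/filesystem names are only placeholders, so double-check the exact
syntax against your gfs_mkfs and gfs_tool man pages:

  # Single-node GFS with the nolock protocol, but with 4 journals reserved
  # so up to 4 nodes can mount it once it moves to DLM.
  # WARNING: this creates a new filesystem and destroys any existing data.
  gfs_mkfs -p lock_nolock -j 4 /dev/vg0/gfslv

  # To try nolock on an existing filesystem without touching the superblock,
  # the locking protocol can be overridden at mount time. Only do this while
  # no other node has the filesystem mounted.
  mount -t gfs -o lockproto=lock_nolock /dev/vg0/gfslv /mnt/gfs

  # To switch the filesystem to DLM later (unmount it everywhere first),
  # set the protocol and lock table name in the superblock:
  gfs_tool sb /dev/vg0/gfslv proto lock_dlm
  gfs_tool sb /dev/vg0/gfslv table mycluster:myfs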
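
As for the scand_secs/demote_secs knobs discussed above: they are per-mount
tunables that can be read and changed on the fly with gfs_tool, and they do
not survive a remount, so they usually end up in an init script. The mount
point and numbers below are only examples, not recommendations:

  # Show the current tunables for a mounted GFS filesystem
  gfs_tool gettune /mnt/gfs

  # Scan for idle glocks more often and demote held glocks sooner
  # (example values only; settings are lost at unmount)
  gfs_tool settune /mnt/gfs scand_secs 5
  gfs_tool settune /mnt/gfs demote_secs 60

  # statfs_fast, on volumes where it is supported, is enabled the same way
  gfs_tool settune /mnt/gfs statfs_fast 1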