On Thu, Jun 19, 2008 at 10:42 AM, Wendy Cheng <s.wendy.cheng@xxxxxxxxx> wrote:
> Wendy Cheng wrote:
>>
>> Terry wrote:
>>>
>>> On Tue, Jun 17, 2008 at 5:22 PM, Terry <td3201@xxxxxxxxx> wrote:
>>>>
>>>> On Tue, Jun 17, 2008 at 3:09 PM, Wendy Cheng <s.wendy.cheng@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi, Terry,
>>>>>
>>>>>> I am still seeing some high load averages. Here is an example of a gfs configuration. I left statfs_fast off, as it would not apply to one of my volumes for an unknown reason. Not sure that would have helped anyway. I do, however, feel that reducing scand_secs helped a little:
>>>>>
>>>>> Sorry I missed scand_secs (my brain was mostly occupied by daytime work).
>>>>>
>>>>> To simplify the view, glock states include exclusive (write), shared (read), and not-locked (in reality, there are more). An exclusive lock has to be demoted (after demote_secs) to shared, then to not-locked (another demote_secs), before it is scanned (every scand_secs) and added to the reclaim list, where it can be purged. During the exclusive-to-shared transition, the file contents need to be flushed to disk (to keep the file contents cluster-coherent). All of the above assumes the file (protected by this glock) is not being accessed (idle).
>>>>>
>>>>> You hit an area where GFS normally doesn't perform well. With GFS1 in maintenance mode and GFS2 seemingly so far away, ext3 could be a better answer. However, before switching, do make sure to test it thoroughly (since ext3 could have the very same issue as well - check out: http://marc.info/?l=linux-nfs&m=121362947909974&w=2 ).
>>>>>
>>>>> Did you look at (and test) the GFS "nolock" protocol (for single-node GFS)? It bypasses some locking overhead and can be switched to DLM in the future (just make sure you reserve enough journal space - the rule of thumb is one journal per node, so know how many nodes you plan to have in the future).
>>>>>
>>>>> -- Wendy
>>>>
>>>> Good points. I could try the nolock feature, I suppose. I'm not quite clear on how to reserve journal space. I forgot to post the CPU time; check this out:
>>>>
>>>>   PID USER  PR  NI VIRT RES SHR S %CPU %MEM     TIME+ COMMAND
>>>>  4822 root  10  -5    0   0   0 S    1  0.0   2159:15 dlm_recv
>>>>  4820 root  10  -5    0   0   0 S    1  0.0 368:09.34 dlm_astd
>>>>  4821 root  10  -5    0   0   0 S    0  0.0 153:06.80 dlm_scand
>>>>  3659 root  10  -5    0   0   0 S    0  0.0 134:40.14 scsi_wq_4
>>>>  4823 root  11  -5    0   0   0 S    1  0.0 109:33.33 dlm_send
>>>>   367 root  10  -5    0   0   0 S    0  0.0 103:33.74 kswapd0
>>>>
>>>> gfs_glockd is further down the list, so I'm not so concerned with that right now. It appears turning on nolock would do the trick. The times aren't extremely accurate because I have failed this cluster between nodes while testing.
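
To make sure I follow the advice above: if I understand it right, demote_secs and scand_secs are per-mount tunables, and the journal count is fixed when the filesystem is made, which is why journals have to be reserved up front for a later nolock-to-DLM switch. A rough, untested sketch of what I think that looks like (the mount point, device, lock table name, and values are only placeholders - please correct me if the syntax is wrong):

    # shorten the glock demote/scan intervals on an existing GFS1 mount
    gfs_tool settune /mnt/gfs demote_secs 60
    gfs_tool settune /mnt/gfs scand_secs 5

    # make a single-node filesystem with lock_nolock, but reserve 4 journals
    # so it can join a cluster later without reformatting
    gfs_mkfs -p lock_nolock -j 4 /dev/vg0/gfslv

    # later, override the lock protocol at mount time to bring it into the cluster
    mount -t gfs -o lockproto=lock_dlm,locktable=mycluster:gfslv /dev/vg0/gfslv /mnt/gfs

As far as I know the settune values do not survive a remount, so they would need to be reapplied from an init script.
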
>>>
>>> Here is some more testing information...
>>>
>>> I created a new 1 TB volume on my iSCSI SAN and formatted it as ext3. I then used dd to create a 100 GB file. This yielded roughly 900 MB/sec. I then stopped my application and did the same thing with an existing GFS volume. This gave me about 850 KB/sec. This isn't an iSCSI issue; it appears to be a load issue, tied to the number of I/Os occurring on these volumes. That said, I would have expected the changes I made to result in a major performance improvement. Since they didn't, what other points could I consider? If it's a GFS issue, ext3 is the way to go - maybe even switch to active-active on my NFS cluster. If it's a back-end disk issue, I would expect to see the throughput on my iSCSI link (bond1) fully utilized. It's not. Could I be thrashing the disks? This is an iSCSI SAN with 30 SATA disks. Just bouncing some ideas around to see if anyone has any more thoughts.
>>
>> I really need to focus on my day job - its workload has been climbing - but I can't help placing a quick comment here...
>>
>> The 900 MB/s vs. 850 KB/s difference looks like a caching issue - that is, at 900 MB/s it looks like the data was still lingering in the system cache, while in the 850 KB/s case the data might have already hit disk. A cluster filesystem normally syncs more often by its nature. In general, ext3 does perform better in a single-node environment, but the difference should not be as big as the above.
>>
>> There are certainly more tuning knobs available (such as journal size and/or network buffer size) to make the GFS-over-iSCSI "dd" run better, but it is pointless: when deploying a cluster filesystem for production use, the tuning should not be driven by such a simple-minded command. You also have to consider the support issues when deploying a filesystem. GFS1 is a little out of date, and any new development and/or significant performance improvements would likely go into GFS2, not GFS1. Research GFS2 (google to see what other people have said about it) to understand whether its direction fits your needs (so you can migrate from GFS1 to GFS2 if you hit any show-stopper in the future). If not, ext3 (with ext4 under active development) is a fine choice, if I read your configuration right from previous posts.
>
> Or... there is a known GFS1 writepage issue if most of your files are very big. The problem is fixed in RHEL kernels, though. What is your kernel version?
>
> -- Wendy

2.6.18-92.el5. The files are not all very big, though - it varies.
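
One more follow-up on the dd numbers: given the caching point, I plan to rerun both tests with the page cache taken out of the picture before drawing any conclusions - roughly something like this (untested; the mount points and sizes are only placeholders):

    # ext3 volume
    dd if=/dev/zero of=/mnt/ext3/ddtest bs=1M count=10240 oflag=direct

    # existing GFS volume
    dd if=/dev/zero of=/mnt/gfs/ddtest bs=1M count=10240 oflag=direct

(If this version of dd supports it, conv=fdatasync instead of oflag=direct would leave the cache in play but force a flush before the rate is reported.)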