Hi,

On Tue, 2011-02-15 at 21:07 +0100, Marc Grimme wrote:
> Hi Steve,
> I think I recently observed a very similar behaviour with RHEL5 and gfs2.
> It was a gfs2 filesystem with about 2 million files, totalling about 2GB,
> in one directory. When I ran du -shx . in this directory it took about
> 5 minutes (with the noatime mount option), independent of how many nodes
> took part in the cluster (in the end I only tested with one node). This
> was only the case for the first run; all later du commands were much
> faster.
> When I mounted the exact same filesystem with lockproto=lock_nolock, the
> same command took about 10-20 seconds.
>
> Next I started to analyze this with oprofile and observed the following
> result:
>
> opreport --long-file-names:
> CPU: AMD64 family10, speed 2900.11 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        symbol name
> 200569   46.7639  search_rsb_list

The resource table is 256 entries by default. Assuming that you have
enough RAM that all 4m locks (for 2m files) are in memory at the same
time, that is approx 15625 resources per hash chain, so it would make
sense that this would start to slow things down a bit. There is a config
option to increase the resource table size though, so perhaps you could
try that? (There is a rough, untested sketch of setting it further down,
after the quoted gfs2 figures.)

> 118905   27.7234  create_lkb

This reads down a hash chain in the lkb table. That table is larger by
default (1024 entries), which is probably why less cpu time is burned
here. On the other hand, the hash chain might be read more than once if
there is a collision on the lock ids. Again, this is a config option, so
it should be possible to increase the size of that table too.

> 32499    7.5773   search_bucket
> 4125     0.9618   find_lkb
> 3641     0.8489   process_send_sockets
> 3420     0.7974   dlm_scan_rsbs
> 3184     0.7424   _request_lock
> 3012     0.7023   find_rsb
> 2735     0.6377   receive_from_sock
> 2610     0.6085   _receive_message
> 2543     0.5929   dlm_allocate_rsb
> 2299     0.5360   dlm_hash2nodeid
> 2228     0.5195   _create_message
> 2180     0.5083   dlm_astd
> 2163     0.5043   dlm_find_lockspace_global
> 2109     0.4917   dlm_find_lockspace_local
> 2074     0.4836   dlm_lowcomms_get_buffer
> 2060     0.4803   dlm_lock
> 1982     0.4621   put_rsb
> ..
>
> opreport --image /gfs2
> CPU: AMD64 family10, speed 2900.11 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        symbol name
> 9310     15.5600  search_bucket

This should get better in RHEL6.1 and above, due to the new design of the
glock hash table. The patch is already upstream. The glock hash table is
much larger than the dlm hash tables, though there are still scalability
issues due to the locking and the fact that the table cannot currently be
grown.

> 6268     10.4758  do_promote

The result in do_promote is interesting, as I wouldn't really have
expected that to show up here, so I'll look into it when I have a moment
and try to figure out what is going on.

> 2704     4.5192   gfs2_glock_put
> 2289     3.8256   gfs2_glock_hold
> 2286     3.8206   gfs2_glock_schedule_for_reclaim
> 2204     3.6836   gfs2_glock_nq
> 2204     3.6836   run_queue
> 2001     3.3443   gfs2_holder_wake
> ..
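As promised above, here is a rough and untested sketch of bumping those
two dlm table sizes. On kernels of this vintage the dlm exposes its
cluster-wide tunables through configfs, so something along the following
lines, run on each node after the cluster software is up but before the
gfs2 filesystem is mounted (the tables are sized when the lockspace is
created), ought to do it. Please double-check the exact paths on your
version, and note the values below (powers of two) are only a guess aimed
at spreading a few million locks over longer tables:

  # mount configfs if it is not already mounted
  mount -t configfs none /sys/kernel/config 2>/dev/null

  # enlarge the dlm resource and lock hash tables (defaults 256 and 1024)
  echo 4096 > /sys/kernel/config/dlm/cluster/rsbtbl_size
  echo 4096 > /sys/kernel/config/dlm/cluster/lkbtbl_size

  # confirm the settings took effect before mounting the filesystem
  cat /sys/kernel/config/dlm/cluster/rsbtbl_size \
      /sys/kernel/config/dlm/cluster/lkbtbl_size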
> opreport --image /dlm
> CPU: AMD64 family10, speed 2900.11 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        symbol name
> 200569   46.7639  search_rsb_list
> 118905   27.7234  create_lkb
> 32499    7.5773   search_bucket
> 4125     0.9618   find_lkb
> 3641     0.8489   process_send_sockets
> 3420     0.7974   dlm_scan_rsbs
> 3184     0.7424   _request_lock
> 3012     0.7023   find_rsb
> 2735     0.6377   receive_from_sock
> 2610     0.6085   _receive_message
> 2543     0.5929   dlm_allocate_rsb
> 2299     0.5360   dlm_hash2nodeid
> 2228     0.5195   _create_message
> ..
>
> This very much reminded me of a similar test we did years ago with gfs
> (see http://www.open-sharedroot.org/Members/marc/blog/blog-on-dlm/red-hat-dlm-__find_lock_by_id/profile-data-with-diffrent-table-sizes).
>
> Does this not show that during the du command the kernel spends 46% of
> its time in the dlm:search_rsb_list function while looking up locks? It
> still looks like the hash table for locks in the dlm is much too small,
> so searching it is no longer constant-time.
>
> It would be really interesting to know how long the described backup
> takes when the gfs2 filesystem is mounted exclusively on one node
> without cluster locking.
> To me it looks like you are facing a similar problem with gfs2 to the
> one that was worked around in gfs by introducing the glock_purge
> functionality, which leads to a much smaller glock->dlm hash table and
> makes backups and the like much faster.
>
> I hope this helps.
>
> Thanks and regards
> Marc.

Many thanks for this information. It is really helpful to get feedback
like this, which helps identify issues in the code,

Steve.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster