On Tuesday, 8 January 2008, 10:06:33, Anand Avati wrote:
> Sascha,
> few points -
>
> 1. do you really want 4 copies of the NS with AFR? I personally think that
> is an overkill. 2 should be sufficient.

At least that would give us the freedom to take any server out without having to think about it. However, we also tried with only one namespace and no AFR, with the same bad result :-( (with 1.3.2)

> 2. as you rightly mentioned, it might be the self heal which is slowing
> down. Do you have directories with a LOT of files in the immediate level?

Not so many: we have 6 nested directory levels, and only the leaves carry files. Each directory level can have up to 16 sub-directories. A leaf holds up to 256 * X files with 0 < X <= 4, with X most likely being near 2. That is, at most 256 * 4 = 1024 files per directory.

> the self-heal is being heavily reworked to be more memory and cpu efficient
> and will be completed very soon. If you do have a LOT of files in a
> directory (not subdirs), then, it would help to recreate the NS offline and
> slip it in with the upgraded glusterfs. one half-efficient way:
>
> on each server:
> mkdir /partial-ns-tmp
> (cd /data/export/dir ; find . -type d) | (cd /partial-ns-tmp ; xargs mkdir -p)
> (cd /data/export/dir ; find . -type f) | (cd /partial-ns-tmp; xargs touch)
>
> now tar the /partial-ns-tmp on each server and extract them over each other
> in the name server. I assume you do not have special fifo and device files,
> if you do, recreate them like the mkdir too :)

Thanks for the hint. Still, in an earlier attempt we forced a self-heal and waited until it had finished after 24 h, but even then the load stayed high and the webservers were almost unresponsive (as said above, with only one namespace brick).

> the updated self-heal should handle such cases much better (assuming your
> problem is LOTS of files in the same dir and/or LOTS of such dirs).

Can't wait to test it :-))

Thanks a lot,
Sascha
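
P.S. For reference, the per-server step of the namespace recipe quoted above could be wrapped in a small shell function. This is only a sketch, not an official GlusterFS tool: the function name `build_partial_ns` is made up for illustration, and it assumes GNU findutils (`find -print0`, `xargs -0 -r`) so that file names containing spaces or newlines survive the pipe, which the plain `xargs` version does not guarantee.

```shell
#!/bin/sh
# Sketch: build an empty "skeleton" of an export directory
# (same directory tree, zero-length files) as namespace content.
# build_partial_ns is a hypothetical helper name.
# Assumes GNU findutils (find -print0, xargs -0 -r).

build_partial_ns() {
    src=$1   # export directory on this server, e.g. /data/export/dir
    dst=$2   # scratch directory, e.g. /partial-ns-tmp

    mkdir -p "$dst" || return 1

    # Recreate every directory first ...
    (cd "$src" && find . -type d -print0) |
        (cd "$dst" && xargs -0 -r mkdir -p --) || return 1

    # ... then create an empty file for every regular file.
    (cd "$src" && find . -type f -print0) |
        (cd "$dst" && xargs -0 -r touch --) || return 1
}
```

It would be called on each server as `build_partial_ns /data/export/dir /partial-ns-tmp`, after which the resulting directory is tarred and extracted on the name server as described in the quoted text. FIFOs and device files are not handled here.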