I just wanted to comment on the multiple NS volumes too. We have a
similar setup and found that when we restarted the servers in sequence
for updates, it took a few minutes for the NS to update, so with only 2
we accidentally took the whole cluster down by restarting too fast.
(The sequence being: restart the server with NS1, wait a few minutes,
then restart NS2.)
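One crude sanity check before restarting the second NS server might be
to compare the NS bricks directly on the servers (the hostnames and the
/data/ns export path below are just examples for a setup like ours):

# list all entries on each NS brick and compare
ssh ns-server1 'cd /data/ns && find . | sort' > /tmp/ns1.list
ssh ns-server2 'cd /data/ns && find . | sort' > /tmp/ns2.list
# identical listings suggest the second brick is safe to restart
diff -q /tmp/ns1.list /tmp/ns2.list && echo "NS bricks agree"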
2 questions come out of this though:
1) Is there a way (besides parsing the logs) to determine whether the NS
servers are up to date or still healing?
2) Is there a significant speed benefit to having fewer (other than for
file creation, I mean)? All our tests used dd and single files, so we
didn't notice much.
Thanks! And as always, thanks for this wonderful contribution to the IT
world!
-Mic
Sascha Ottolski wrote:
On Tuesday 08 January 2008 10:06:33, Anand Avati wrote:
Sascha,
a few points -
1. do you really want 4 copies of the NS with AFR? I personally think
that is overkill. 2 should be sufficient.
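(For illustration, a 2-way mirrored namespace in a 1.3-style client spec
would look roughly like the following; the volume names, hostnames and
remote-subvolume name are made up:)

volume ns1
  type protocol/client
  option transport-type tcp/client
  option remote-host server1
  option remote-subvolume brick-ns
end-volume

volume ns2
  type protocol/client
  option transport-type tcp/client
  option remote-host server2
  option remote-subvolume brick-ns
end-volume

# mirror the namespace across the two bricks
volume ns-afr
  type cluster/afr
  subvolumes ns1 ns2
end-volume

unify's "option namespace" would then point at ns-afr instead of at a
single brick.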
at least that would give us the freedom to take any server out without
having to think about it. However, we also tried with only one namespace
and no AFR, with the same bad result :-( (with 1.3.2)
2. as you rightly mentioned, it might be the self-heal which is slowing
things down. Do you have directories with a LOT of files at the
immediate level?
not too many: we have 6 nested directory levels, and only the leaves
carry files. Each directory level can have up to 16 sub-directories. A
leaf holds up to 255 * X files with 0 < X < 4, X most likely being near
2; that is, at most roughly 256 * 4 = 1024 files per dir.
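(A quick way to check for hot directories, purely as a sketch with an
example export path, is to count the direct entries per directory and
sort:)

# count direct (non-recursive) entries in every directory under the
# export; /data/export/dir is an example path
find /data/export/dir -type d | while read -r d; do
    printf '%d %s\n' "$(ls -A "$d" | wc -l)" "$d"
done | sort -rn | head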
the self-heal is being heavily reworked to be more memory- and
CPU-efficient and will be completed very soon. If you do have a LOT of
files in a directory (not in subdirs), then it would help to recreate
the NS offline and slip it in with the upgraded glusterfs. One
half-efficient way:
on each server:

# create a scratch tree that mirrors the data brick's layout
mkdir /partial-ns-tmp
# recreate the directory structure of the export...
(cd /data/export/dir ; find . -type d) | (cd /partial-ns-tmp ; xargs mkdir -p)
# ...then create zero-length placeholders for every regular file
(cd /data/export/dir ; find . -type f) | (cd /partial-ns-tmp ; xargs touch)
now tar up /partial-ns-tmp on each server and extract the tarballs over
each other on the name server. I assume you do not have special fifo and
device files; if you do, recreate them like the mkdirs too :)
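(Spelled out, with made-up hostnames and the NS export assumed to be
/data/ns on the name server, that step would be something like:)

# on each data server: pack the placeholder tree
(cd /partial-ns-tmp && tar cf /tmp/partial-ns.tar .)

# on the name server: extract every server's tarball into the NS
# export; each extraction simply merges over the previous ones
for h in server1 server2 server3 server4; do
    ssh "$h" cat /tmp/partial-ns.tar | tar xf - -C /data/ns
done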
thanks for the hint. Still, in an earlier attempt we forced a self-heal
and waited until it was finished after 24 h, but even then the load
stayed high and the webservers were almost unresponsive (as said above,
with only one namespace brick).
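(For reference, "forcing" a self-heal here means a recursive crawl of
the mountpoint, since self-heal is triggered by lookups; /mnt/glusterfs
is an example mount point, and this is only a sketch:)

# stat every entry so each one gets looked up (and thus healed), then
# read a byte of every regular file to trigger data self-heal as well
find /mnt/glusterfs -print0 | xargs -0 stat > /dev/null
find /mnt/glusterfs -type f -print0 | xargs -0 head -c 1 > /dev/null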
the updated self-heal should handle such cases much better (assuming your
problem is LOTS of files in the same dir and/or LOTS of such dirs).
can't wait to test it :-))
Thanks a lot,
Sascha