On Tue, Sep 20, 2011 at 7:33 PM, Pranith Kumar K <pranithk at gluster.com> wrote:
> On 09/21/2011 05:53 AM, George Georgalis wrote:
>>
>> ... We have
>> an issue with openvz gluster clients where, due to a vm environment
>> bug the supplemental groups cannot be properly verified on the servers
>> (openvz is answering host pid mapping in /proc vs the expected
>> container pid mapping, so when that broken UID/GID info is sent to the
>> server for access control, fail).
>>
>> http://bugs.gluster.com/show_bug.cgi?id=3563
>> http://bugzilla.openvz.org/show_bug.cgi?id=1992
>>
>> We are about to attempt a workaround where we manually modify the vol
>> file on each of the servers to exclude the volume stanza which
>> contains "type features/access-control", and modify the "type
>> features/access-control" block to shortcut to the "type storage/posix"
>> subvolume stanza.
>>
>> Two major questions are:
>> a) What is the cksum file and will it cause havoc with our change
>> b) Is there some way possible to modify one volume file and use a
>> builtin facility to propagate it to the servers?
>>
>> Anybody have experience to share?
>>
>> -George
>>
>>
> hi George,
>       answer for a): cksum file is checksum for something else. It won't
> cause any problems for the changes made to volfiles.
>       answer for b): Since the change is on brick volfile, it is recommended
> to stop and start the volume file. It will then load the new volfile without
> access-control, let us know if you face any problems.

Our partition is 10 drives on 10 hosts configured with 2x replication.
We changed the first three stanzas in all 10 volfiles on all 10 hosts to
read as follows:

volume myvol-posix
    type storage/posix
    option directory /data2
end-volume

#volume myvol-access-control
#    type features/access-control
#    subvolumes myvol-posix
#end-volume

volume myvol-locks
    type features/locks
#    subvolumes myvol-access-control
    subvolumes myvol-posix
end-volume

Then we issued a glusterd restart on each odd-numbered server, then on
each even-numbered server.

We had two problems. Files on the gluster partition which were open for
writing seemed to have lost their owners, i.e. they were root:root
afterwards. There were no reports of any write blocks, and users
typically had these files in directories they own, so they were able to
clean up.

More serious was one case of blocking: md5sum of a particular file would
hang, and the initial process that was holding onto this file could not
be killed with -9.

Next time we restart half of the gluster servers, we will stat all the
files in the partition before restarting the second half. Not sure if
this will work, but we should do it to trigger self-healing in any event.
Is there anything to check to know it's okay to restart the second half
of the servers?

It would be really great if we could turn on a journal of sorts that
would list files opened for writing between time point A and B. That way,
after cycling the first half of the servers, we could just stat those
files before cycling the second half and then turn off the file logger.
We have tons of files, and this could save a lot of time.

After we changed the permissions scheme (on the live filesystem), files
that were open for writing lost their ownership and became root-owned.

-George
--
George Georgalis, (415) 894-2710, http://www.galis.org/
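
The pre-restart step proposed in this message (stat every file through a client mount so replicate gets a chance to notice and heal stale copies), together with a rough stand-in for the "files written between A and B" journal, can be sketched in Python. Assumptions: the volume is mounted on a client at the path given on the command line (`/mnt/myvol` below is only an example); the idea that a lookup through the mount triggers self-heal is the reasoning in the post, not something this script verifies; and `changed_between` uses mtime as a proxy for "opened for writing", so a file opened but never written will be missed. Both helper names and `stat_tree.py` are hypothetical.

```python
import os
import sys


def stat_tree(root):
    """lstat() every entry under root, without following symlinks.

    Returns (number of entries stat'ed, list of (path, errno) failures).
    """
    count = 0
    errors = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
                count += 1
            except OSError as exc:
                errors.append((path, exc.errno))
    return count, errors


def changed_between(root, start, end):
    """Return sorted non-directory paths whose mtime lies in [start, end].

    start and end are Unix timestamps (seconds since the epoch).
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            if start <= st.st_mtime <= end:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) in (2, 4):
    # Usage: python3 stat_tree.py /mnt/myvol [START_EPOCH END_EPOCH]
    root = sys.argv[1]
    if len(sys.argv) == 4:
        for path in changed_between(root, float(sys.argv[2]), float(sys.argv[3])):
            print(path)
    else:
        n, errs = stat_tree(root)
        print(f"stat'ed {n} entries, {len(errs)} errors")
        for path, err in errs:
            print(f"  {path}: errno {err}", file=sys.stderr)
```

Note that `changed_between` still walks the whole tree, so it narrows the list of files to re-check after a restart rather than avoiding the crawl; a true "opened for writing" journal would need support on the server or client side.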