I built a patched version of 3.6.4 and the problem does seem to be fixed on a test server/client when I mounted with those flags (acl, resolve-gids, and gid-timeout). Seeing as it was a test system, I can't really provide anything meaningful as to the performance hit seen without the gid-timeout option. Thank you for implementing it so quickly, though!
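For reference, the mount invocation was along these lines (host, volume, and mountpoint are placeholders for our test names):

[root@client ~]# mount -t glusterfs -o acl,resolve-gids,gid-timeout=300 gfs01a:/testvol /mnt/testvol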
Is there any chance of getting this fix incorporated in the upcoming 3.6.5 release?
Patrick
On Thu, Jul 23, 2015 at 6:27 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
On Tue, Jul 21, 2015 at 10:30:04PM +0200, Niels de Vos wrote:
> On Wed, Jul 08, 2015 at 03:20:41PM -0400, Glomski, Patrick wrote:
> > Gluster devs,
> >
> > I'm running gluster v3.6.3 (both server and client side). Since my
> > application requires more than 32 groups, I don't mount with ACLs on the
> > client. If I mount with ACLs between the bricks and set a default ACL on
> > the server, I think I'm right in stating that the server should respect
> > that ACL whenever a new file or folder is made.
>
> I would expect that the ACL gets inherited on the brick. When a new
> file is created without the default ACL, things seem to be wrong. You
> mention that creating the file directly on the brick has the correct
> ACL, so there must be some Gluster component interfering.
>
> You reminded me on IRC about this email, and that helped a lot. It's
> very easy to get distracted when trying to investigate things from the
> mailing lists.
>
> I had a brief look, and I think we could reach a solution. An ugly patch
> for initial testing is ready. Well... it compiles. I'll try to run some
> basic tests tomorrow and see if it improves things and does not crash
> immediately.
>
> The change can be found here:
> http://review.gluster.org/11732
>
> It basically adds a "resolve-gids" mount option for the FUSE client.
> This causes the fuse daemon to call getgrouplist() and retrieve all the
> groups for the UID that accesses the mountpoint. Without this option,
> the behavior is not changed, and /proc/$PID/status is used to get up to
> 32 groups (the $PID is the process that accesses the mountpoint).
>
> You probably want to also mount with "gid-timeout=N", where N is the
> number of seconds that the group cache is valid. In the current master
> branch this is set to 300 seconds (like the sssd default), but if the
> groups of a user rarely change, this value can be increased. Previous
> versions had a lower timeout, which could cause the groups to be
> resolved on almost every network packet that arrives (a HUGE
> performance impact).
>
> When using this option, you may also need to enable server.manage-gids.
> This option allows using more than ~93 groups on the bricks. The network
> packets can only contain ~93 groups. When server.manage-gids is enabled,
> the groups are not sent in the network packets, but are resolved on the
> bricks with getgrouplist().
The patch linked above has been tested, corrected and updated. The
change works for me on a test-system.
A backport that you should be able to include in a package for 3.6 can
be found here: http://termbin.com/f3cj
Let me know if you are not familiar with rebuilding patched packages,
and I can build a test-version for you tomorrow.
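If it helps, the rough outline on an RPM-based system is something like
this (the patch filename here is just an example):

  rpm -ivh glusterfs-3.6.4-1.src.rpm
  cp resolve-gids-backport.patch ~/rpmbuild/SOURCES/
  # edit ~/rpmbuild/SPECS/glusterfs.spec to add
  #   Patch100: resolve-gids-backport.patch
  # and "%patch100 -p1" in the %prep section, then rebuild:
  rpmbuild -ba ~/rpmbuild/SPECS/glusterfs.spec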
On glusterfs-3.6, you will want to pass a gid-timeout mount option too.
The option enables caching of the resolved groups that the uid belongs
to; if caching is not enabled (or expires quickly), you will probably
notice a performance hit. Newer versions of GlusterFS set the timeout to
300 seconds (like the default timeout sssd uses).
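Putting it all together on a test setup, something like this should do
(volume and server names are placeholders):

  # on one of the servers, so the bricks resolve >93 groups themselves:
  gluster volume set testvol server.manage-gids on

  # on the client:
  mount -t glusterfs -o acl,resolve-gids,gid-timeout=300 gfs01a:/testvol /mnt/testvol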
Please test and let me know if this fixes your use case.
Thanks,
Niels
>
> Cheers,
> Niels
>
> > Maybe an example is in order:
> >
> > We first set up a test directory with setgid bit so that our new
> > subdirectories inherit the group.
> > [root@gfs01a hpc_shared]# mkdir test; cd test; chown pglomski.users .;
> > chmod 2770 .; getfacl .
> > # file: .
> > # owner: pglomski
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::rwx
> > other::---
> >
> > New subdirectories share the group, but the umask leads to them being group
> > read-only.
> > [root@gfs01a test]# mkdir a; getfacl a
> > # file: a
> > # owner: root
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::r-x
> > other::r-x
> >
> > Setting default ACLs on the server allows group write to new directories
> > made on the server.
> > [root@gfs01a test]# setfacl -m d:g::rwX ./; mkdir b; getfacl b
> > # file: b
> > # owner: root
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::rwx
> > other::---
> > default:user::rwx
> > default:group::rwx
> > default:other::---
> >
> > The respect for ACLs is (correctly) shared across bricks.
> > [root@gfs02a test]# getfacl b
> > # file: b
> > # owner: root
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::rwx
> > other::---
> > default:user::rwx
> > default:group::rwx
> > default:other::---
> >
> > [root@gfs02a test]# mkdir c; getfacl c
> > # file: c
> > # owner: root
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::rwx
> > other::---
> > default:user::rwx
> > default:group::rwx
> > default:other::---
> >
> > However, when folders are created client-side, the default ACLs appear on
> > the server, but don't seem to be correctly applied.
> > [root@client test]# mkdir d; getfacl d
> > # file: d
> > # owner: root
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::r-x
> > other::---
> >
> > [root@gfs01a test]# getfacl d
> > # file: d
> > # owner: root
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::r-x
> > other::---
> > default:user::rwx
> > default:group::rwx
> > default:other::---
> >
> > As no groups or users were specified, I shouldn't need to specify a mask
> > for the ACL and, indeed, specifying a mask doesn't help.
> >
> > If it helps diagnose the problem, the volume options are as follows:
> > Options Reconfigured:
> > performance.io-thread-count: 32
> > performance.cache-size: 128MB
> > performance.write-behind-window-size: 128MB
> > server.allow-insecure: on
> > network.ping-timeout: 10
> > storage.owner-gid: 100
> > geo-replication.indexing: off
> > geo-replication.ignore-pid-check: on
> > changelog.changelog: on
> > changelog.fsync-interval: 3
> > changelog.rollover-time: 15
> > server.manage-gids: on
> >
> > This approach to server-side ACLs worked properly with previous versions of
> > gluster. Can anyone assess the situation for me, confirm/deny that
> > something changed, and possibly suggest how I can achieve inherited groups
> > with write permission for new subdirectories in a >32-group environment?
> >
> > Thanks for your time,
> >
> > Patrick
>
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel
>