Re: Need advice re some major issues with glusterfind


 





On Thu, Oct 22, 2015 at 8:41 AM, Sincock, John [FLCPTY] <J.Sincock@xxxxxxxxx> wrote:

Pls see below

 

From: Vijaikumar Mallikarjuna [mailto:vmallika@xxxxxxxxxx]
Sent: Wednesday, 21 October 2015 6:37 PM
To: Sincock, John [FLCPTY]
Cc: gluster-devel@xxxxxxxxxxx
Subject: Re: Need advice re some major issues with glusterfind

 

 

 

On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <J.Sincock@xxxxxxxxx> wrote:

Hi Everybody,

We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying to use the new glusterfind feature but have been having some serious problems with it. Overall the glusterfind looks very promising, so I don't want to offend anyone by raising these issues.

If these issues can be resolved or worked around, glusterfind will be a great feature.  So I would really appreciate any information or advice:

1) What can be done about the vast number of tiny changelogs? We often see 5 or more small 89-byte changelog files per minute on EACH brick, and larger files when a brick is busier. We've been generating these changelogs for a few weeks and have in excess of 10,000 to 12,000 on most bricks. This makes glusterfind runs very slow, especially on a node with a lot of bricks, and looks unsustainable in the long run. Why are these files so small, why are there so many of them, and how are they supposed to be managed over time?

2) The pgfid extended attribute is wreaking havoc with our backup scheme. When gluster adds this xattr to a file it changes the file's ctime, which we were using to determine which files need to be archived. A warning should be added to the release notes and upgrade notes so people can plan for this if required.

Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the rebalance took 5 days or so to complete, which looks like a major speed improvement over the more serial rebalance algorithm, so that's good. But I was hoping that the rebalance would also have had the side-effect of triggering all files to be labelled with the pgfid attribute by the time the rebalance completed, or failing that, after creation of an mlocate database across our entire gluster (which would have accessed every file, unless it is getting the info it needs only from directory inodes). Now it looks like ctimes are still being modified, and I think this can only be caused by files still being labelled with pgfids.

How can we force gluster to complete this pgfid labelling for all files that are already on the volume? We can't have gluster continuing to add pgfids in bursts here and there, e.g. when files are read for the first time since the upgrade. For now we have turned off pgfid creation on the volume until we can force the labelling to happen in one go.
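
One rough way to gauge how many files have been hit by a metadata-only change is to list the files whose ctime is newer than a reference point while their mtime is not, since adding an xattr updates ctime but leaves mtime alone. The sketch below assumes a find that supports -cnewer (GNU and BSD find do) and a reference file whose mtime marks the last archive run; the function name and the reference file are hypothetical and not part of gluster.

```shell
#!/bin/sh
# Sketch: list regular files whose status (ctime) changed after the
# reference file's mtime but whose contents (mtime) did not.
# "ctime_only_changes" and the reference file are hypothetical names.
ctime_only_changes() {
    dir=$1   # directory tree to scan, e.g. the volume mount point
    ref=$2   # file whose mtime marks the last archive run
    find "$dir" -type f -cnewer "$ref" ! -newer "$ref" -print
}
```

For example, `ctime_only_changes /path/to/mount /path/to/last-archive-stamp | wc -l` would give a count. Note that chmod, chown and renames also update ctime, so the list is an upper bound on xattr-driven changes rather than an exact figure.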

 

 

Hi John,

 

Was quota turned on/off before/after performing the rebalance? If the pgfid is missing, it can be healed by running 'find <mount_point> | xargs stat': every file gets looked up once, which triggers the pgfid healing.
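
On a volume with millions of files, some names are likely to contain spaces or other awkward characters, which a plain 'find | xargs' pipeline splits incorrectly. A minimal sketch of a whitespace-safe variant follows, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do); the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: stat every entry under a mount point so that each file is
# looked up once. -print0 / -0 keep names with spaces or newlines intact.
# "lookup_all" is a hypothetical helper name.
lookup_all() {
    mount_point=$1   # the volume's mount point, as in the command above
    find "$mount_point" -print0 | xargs -0 stat > /dev/null
}
```

The stat output is discarded because only the lookup side effect matters here.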

Also could you please provide all the volume files under '/var/lib/glusterd/vols/<volname>/*.vol'?

 

Thanks,

Vijay

 

 

Hi Vijay

 

Quota has never been turned on in our gluster, so it can’t be any quota-related xattrs that are resetting our ctimes. I’m pretty sure it must be due to pgfids still being added.

 

Thanks for the tip about using stat. If that triggers the pgfid build on each file, I will run it when I have a chance. We’ll have to get our archiving of data back up to date, re-enable the pgfid build option, and then run the stat over a weekend, as it will take a while.

 

I’m still quite concerned about the number of changelogs being generated. Do you know if there are any plans to change the way changelogs are generated so there aren’t so many of them, and to process them more efficiently? I think this will be vital to improving the performance of glusterfind in future, as there are currently an enormous number of these small changelogs being generated on each of our gluster bricks.
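
To put numbers on this per brick, something like the sketch below could be run against each brick's changelog directory (the changelog-dir value in the brick volfile). It assumes the changelog files are named CHANGELOG.<suffix>, so adjust the pattern if your layout differs; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: report how many changelog files a directory holds and their
# combined size in bytes.
# Assumes files are named CHANGELOG.<suffix>; "changelog_summary" is a
# hypothetical helper name.
changelog_summary() {
    dir=$1   # e.g. the brick's changelog-dir from its volfile
    find "$dir" -type f -name 'CHANGELOG.*' -exec wc -c {} + |
        awk '$2 != "total" { n++; bytes += $1 }
             END { printf "%d files, %d bytes\n", n, bytes }'
}
```

Running it at intervals on the same brick would show how quickly the count grows.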

  

Below is the volfile for one brick; the others are all equivalent. We haven’t tweaked the volume options much, besides increasing the io thread count to 32 and the client/event threads to 6, since our gluster holds about 30 million files, many of which are small and some of which are large to very large:

 


Hi John,

PGFID xattrs are updated only when update-link-count-parent is enabled in the brick volume file. This option is enabled when quota is enabled on a volume.
The volume file you provided below has update-link-count-parent disabled, so I am wondering why PGFID xattrs are being updated.
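
Since only one brick volfile was posted, it may be worth confirming the setting across all of them. A minimal sketch, assuming the volfiles sit together in one directory as in the path mentioned earlier in this thread and that grep supports -H (GNU and BSD grep do); the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the update-link-count-parent setting from every volfile
# in a directory, prefixed with the file it came from.
# "show_pgfid_option" is a hypothetical helper name.
show_pgfid_option() {
    vol_dir=$1   # e.g. /var/lib/glusterd/vols/<volname>
    grep -H 'option update-link-count-parent' "$vol_dir"/*.vol
}
```

Any volfile reporting "on" would be a candidate explanation for the xattr updates. Running `getfattr -d -m . -e hex` as root against a file on the brick lists its xattrs and should show whether a pgfid attribute is already present, though the exact attribute name is worth verifying on your version.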

Thanks,
Vijay
 
 

[root@g-unit-1 sbin]# cat /var/lib/glusterd/vols/vol00/vol00.g-unit-1.mnt-glusterfs-bricks-1.vol
volume vol00-posix
    type storage/posix
    option update-link-count-parent off
    option volume-id 292b8701-d394-48ee-a224-b5a20ca7ce0f
    option directory /mnt/glusterfs/bricks/1
end-volume

volume vol00-trash
    type features/trash
    option trash-internal-op off
    option brick-path /mnt/glusterfs/bricks/1
    option trash-dir .trashcan
    subvolumes vol00-posix
end-volume

volume vol00-changetimerecorder
    type features/changetimerecorder
    option record-counters off
    option ctr-enabled off
    option record-entry on
    option ctr_inode_heal_expire_period 300
    option ctr_hardlink_heal_expire_period 300
    option ctr_link_consistency off
    option record-exit off
    option db-path /mnt/glusterfs/bricks/1/.glusterfs/
    option db-name 1.db
    option hot-brick off
    option db-type sqlite3
    subvolumes vol00-trash
end-volume

volume vol00-changelog
    type features/changelog
    option capture-del-path on
    option changelog-barrier-timeout 120
    option changelog on
    option changelog-dir /mnt/glusterfs/bricks/1/.glusterfs/changelogs
    option changelog-brick /mnt/glusterfs/bricks/1
    subvolumes vol00-changetimerecorder
end-volume

volume vol00-bitrot-stub
    type features/bitrot-stub
    option export /mnt/glusterfs/bricks/1
    subvolumes vol00-changelog
end-volume

volume vol00-access-control
    type features/access-control
    subvolumes vol00-bitrot-stub
end-volume

volume vol00-locks
    type features/locks
    subvolumes vol00-access-control
end-volume

volume vol00-upcall
    type features/upcall
    option cache-invalidation off
    subvolumes vol00-locks
end-volume

volume vol00-io-threads
    type performance/io-threads
    option thread-count 32
    subvolumes vol00-upcall
end-volume

volume vol00-marker
    type features/marker
    option inode-quota off
    option quota off
    option gsync-force-xtime off
    option xtime off
    option timestamp-file /var/lib/glusterd/vols/vol00/marker.tstamp
    option volume-uuid 292b8701-d394-48ee-a224-b5a20ca7ce0f
    subvolumes vol00-io-threads
end-volume

volume vol00-barrier
    type features/barrier
    option barrier-timeout 120
    option barrier disable
    subvolumes vol00-marker
end-volume

volume vol00-index
    type features/index
    option index-base /mnt/glusterfs/bricks/1/.glusterfs/indices
    subvolumes vol00-barrier
end-volume

volume vol00-quota
    type features/quota
    option deem-statfs off
    option timeout 0
    option server-quota off
    option volume-uuid vol00
    subvolumes vol00-index
end-volume

volume vol00-worm
    type features/worm
    option worm off
    subvolumes vol00-quota
end-volume

volume vol00-read-only
    type features/read-only
    option read-only off
    subvolumes vol00-worm
end-volume

volume /mnt/glusterfs/bricks/1
    type debug/io-stats
    option count-fop-hits off
    option latency-measurement off
    subvolumes vol00-read-only
end-volume

volume vol00-server
    type protocol/server
    option event-threads 6
    option rpc-auth-allow-insecure on
    option auth.addr./mnt/glusterfs/bricks/1.allow *
    option auth.login.dc3d05ba-40ce-47ee-8f4c-a729917784dc.password 58c2072b-8d1c-4921-9270-bf4b477c4126
    option auth.login./mnt/glusterfs/bricks/1.allow dc3d05ba-40ce-47ee-8f4c-a729917784dc
    option transport-type tcp
    subvolumes /mnt/glusterfs/bricks/1
end-volume

 

 

 

 

 

