On 08.07.2014 16:26, Pranith Kumar Karampuri wrote:
> On 07/08/2014 06:14 PM, Norman Mähler wrote:
>> On 08.07.2014 14:34, Pranith Kumar Karampuri wrote:
>>> On 07/08/2014 05:23 PM, Norman Mähler wrote:
>>>> On 08.07.2014 13:24, Pranith Kumar Karampuri wrote:
>>>>> On 07/08/2014 04:49 PM, Norman Mähler wrote:
>>>>>> On 08.07.2014 13:02, Pranith Kumar Karampuri wrote:
>>>>>>> On 07/08/2014 04:23 PM, Norman Mähler wrote:
>>>>>>>> Of course. The configuration is:
>>>>>>>>
>>>>>>>> Volume Name: gluster_dateisystem
>>>>>>>> Type: Replicate
>>>>>>>> Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
>>>>>>>> Status: Started
>>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: filecluster1:/mnt/raid
>>>>>>>> Brick2: filecluster2:/mnt/raid
>>>>>>>> Options Reconfigured:
>>>>>>>> nfs.enable-ino32: on
>>>>>>>> performance.cache-size: 512MB
>>>>>>>> diagnostics.brick-log-level: WARNING
>>>>>>>> diagnostics.client-log-level: WARNING
>>>>>>>> nfs.addr-namelookup: off
>>>>>>>> performance.cache-refresh-timeout: 60
>>>>>>>> performance.cache-max-file-size: 100MB
>>>>>>>> performance.write-behind-window-size: 10MB
>>>>>>>> performance.io-thread-count: 18
>>>>>>>> performance.stat-prefetch: off
>>>>>>>>
>>>>>>>> The file count in xattrop is
>>>>>>> Do "gluster volume set gluster_dateisystem cluster.self-heal-daemon off". This should stop all the entry self-heals and should also bring the CPU usage down. When you don't have a lot of activity you can enable it again with "gluster volume set gluster_dateisystem cluster.self-heal-daemon on". If it doesn't bring the CPU down, execute "gluster volume set gluster_dateisystem cluster.entry-self-heal off". Let me know how it goes.
>>>>>>>
>>>>>>> Pranith
>>>>>> Thanks for your help so far, but stopping the self-heal daemon and the self-heal mechanism itself did not improve the situation.
>>>>>>
>>>>>> Do you have further suggestions? Is it simply the load on the system? NFS could handle it easily before...
>>>>> Is it at least a little better, or no improvement at all?
>>>> After waiting half an hour more, the system load is falling steadily. At the moment it is around 10, which is not good but a lot better than before. There are no messages in the nfs.log and the glusterfshd.log anymore. In the brick log there are still "inode not found - anonymous fd creation failed" messages.
>>> They should go away once the heal is complete and the system is back to normal. I believe you have directories with lots of files? When can you start the healing process again (i.e. a window where there won't be a lot of activity and you can afford the high CPU usage) so that things will be back to normal?
>> We have got a window at night, but by now our admin has decided to copy the files back to an NFS system, because even with disabled self-heal our colleagues cannot do their work on such a slow system.
> This performance problem is addressed in 3.6 with a design change in the replication module in glusterfs.

Ok, this sounds good.

After that we may be able to start again with a new system. We are considering taking another network cluster system, but we are not quite sure what to do.
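For reference, the checks and toggles suggested above amount to the following small sketch. The volume name and brick path are taken from the volume info quoted above; run the commands on the gluster servers:

    # count pending self-heal entries on each brick (run on each server)
    ls /mnt/raid/.glusterfs/indices/xattrop | wc -l

    # temporarily stop background healing to relieve the CPU
    gluster volume set gluster_dateisystem cluster.self-heal-daemon off
    # if the load stays high, also disable entry self-heal triggered from clients
    gluster volume set gluster_dateisystem cluster.entry-self-heal off

    # re-enable both once there is a quiet window (e.g. at night)
    gluster volume set gluster_dateisystem cluster.entry-self-heal on
    gluster volume set gluster_dateisystem cluster.self-heal-daemon on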
> Things should be smooth again after the self-heals are complete, IMO. What is the size of the volume? How many files, approximately? It would be nice if you could give the complete logs at least later, to help in analyzing.

There are about 250 GB in approximately 650000 files on the volumes. I will send you an additional mail with links to the complete logs later.

Norman

> Pranith
>> There are a lot of small files and lock files in these directories.
>>
>> Norman
>>> Pranith
>>>> Norman
>>>>> Pranith
>>>>>> Norman
>>>>>>>> Brick 1: 2706
>>>>>>>> Brick 2: 2687
>>>>>>>>
>>>>>>>> Norman
>>>>>>>>
>>>>>>>> On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
>>>>>>>>> It seems like entry self-heal is happening. What is the volume configuration? Could you give the "ls <brick-path>/.glusterfs/indices/xattrop | wc -l" count for all the bricks?
>>>>>>>>>
>>>>>>>>> Pranith
>>>>>>>>> On 07/08/2014 03:36 PM, Norman Mähler wrote:
>>>>>>>>>> Hello Pranith,
>>>>>>>>>>
>>>>>>>>>> here are the logs. I only give you the last 3000 lines, because the nfs.log from today is already 550 MB.
>>>>>>>>>>
>>>>>>>>>> These are the standard files from a user home on the gluster system, all you normally find in a user home: config files, Firefox and Thunderbird files, etc.
>>>>>>>>>>
>>>>>>>>>> Thanks in advance
>>>>>>>>>> Norman
>>>>>>>>>>
>>>>>>>>>> On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
>>>>>>>>>>> On 07/08/2014 02:46 PM, Norman Mähler wrote:
>>>>>>>>>>>> Hello again,
>>>>>>>>>>>>
>>>>>>>>>>>> I could resolve the self-heal problems with the missing gfid files on one of the servers by deleting the gfid files on the other server.
>>>>>>>>>>>>
>>>>>>>>>>>> They had a link count of 1, which means that the file the gfid pointed to was already deleted.
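For context: on a brick, a file's gfid entry lives under .glusterfs/<first two hex digits>/<next two digits>/<full gfid>, so the link count mentioned above can be checked directly on the brick. A minimal sketch, using the brick path from the volume info above and, as an example only, one of the gfids that appears in the log messages further down:

    BRICK=/mnt/raid
    # example gfid from a log line in this thread; path layout is
    # <brick>/.glusterfs/<first 2 hex digits>/<next 2>/<full gfid>
    GFID_PATH=$BRICK/.glusterfs/b0/c4/b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc

    # %h prints the hard-link count; 1 means no real file is left behind this gfid entry
    stat -c 'links=%h  %n' "$GFID_PATH"

    # rough way to list candidate orphaned gfid files on a brick (skips the indices directory)
    find "$BRICK/.glusterfs" -path "$BRICK/.glusterfs/indices" -prune -o -type f -links 1 -print

Anything found this way should of course be double-checked before deleting it.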
>>>>>>>>>>>> We still have these errors,
>>>>>>>>>>>>
>>>>>>>>>>>> [2014-07-08 09:09:43.564488] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-gluster_dateisystem-client-0: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)
>>>>>>>>>>>>
>>>>>>>>>>>> which appear in the glusterfshd.log, and these
>>>>>>>>>>>>
>>>>>>>>>>>> [2014-07-08 09:13:31.198462] E [client-rpc-fops.c:5179:client3_3_inodelk] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8) [0x7f5d29d4e6b8] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844) [0x7f5d29d4e2e4] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99) [0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
>>>>>>>>>>>>
>>>>>>>>>>>> from the nfs.log.
>>>>>>>>>>> Could you attach the mount (nfs.log) and brick logs please. Do you have files with lots of hard links?
>>>>>>>>>>> Pranith
>>>>>>>>>>>> I think the error messages belong together, but I don't have any idea how to solve them.
>>>>>>>>>>>>
>>>>>>>>>>>> We still have a very bad performance issue. The system load on the servers is above 20 and nearly no one is able to work here on a client...
>>>>>>>>>>>>
>>>>>>>>>>>> Hoping for help
>>>>>>>>>>>> Norman
>>>>>>>>>>>>
>>>>>>>>>>>> On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>> On 07/07/2014 06:58 PM, Norman Mähler wrote:
>>>>>>>>>>>>>> Dear community,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we have got some serious problems with our Gluster installation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the setting:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have got 2 bricks (version 3.4.4) on a Debian 7.5, one of them with an NFS export. There are about 120 clients connecting to the exported NFS. These clients are thin clients reading and writing their Linux home directories from the exported NFS.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We want to change the access of these clients, one by one, to access via the gluster client.
>>>>>>>>>>>>> I did not understand what you meant by this. Are you moving to glusterfs-fuse based mounts?
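For comparison, the two access methods under discussion look roughly like this on a client. The mount point /home and the choice of filecluster1 as the NFS server are only examples:

    # current access path: the volume as exported by Gluster's built-in NFS server (NFSv3)
    mount -t nfs -o vers=3 filecluster1:/gluster_dateisystem /home

    # planned access path: the native glusterfs (FUSE) client, which connects to both bricks itself
    mount -t glusterfs filecluster1:/gluster_dateisystem /home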
>>>>>>>>>>>>>> Here are our problems:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the moment we have got two types of error messages, which come in bursts, in our glusterfshd.log:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [2014-07-07 13:10:21.572487] W [client-rpc-fops.c:1538:client3_3_inodelk_cbk] 0-gluster_dateisystem-client-1: remote operation failed: No such file or directory
>>>>>>>>>>>>>> [2014-07-07 13:10:21.573448] W [client-rpc-fops.c:471:client3_3_open_cbk] 0-gluster_dateisystem-client-1: remote operation failed: No such file or directory. Path: <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> (00000000-0000-0000-0000-000000000000)
>>>>>>>>>>>>>> [2014-07-07 13:10:21.573468] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gluster_dateisystem-replicate-0: open of <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed on child gluster_dateisystem-client-1 (No such file or directory)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This looks like a missing gfid file on one of the bricks. I looked it up, and yes, the file is missing on the second brick.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We got these messages the other way round, too (missing on client-0 and the first brick).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is it possible to repair this by copying the gfid file to the brick where it is missing? Or is there another way to repair it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The second message is
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [2014-07-07 13:06:35.948738] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-gluster_dateisystem-client-1: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and I really do not know what to do with this one...
>>>>>>>>>>>>> Did any of the bricks go offline and come back online?
>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>> I am really looking forward to your help, because this is an active system and the system load on the nfs brick is about 25 (!!).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance!
>>>>>>>>>>>>>> Norman Maehler

--
Kind regards,

Norman Mähler

Head of Division, IT-Hochschulservice
uni-assist e. V.
Geneststr. 5, Aufgang H, 3. Etage
10829 Berlin

Tel.: 030-66644382
n.maehler@xxxxxxxxxxxxx
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users