Re: "Too many levels of symbolic links"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2014-02-19 at 11:17 +0100, Donald Buczek wrote:
> Hello,
> 
> 
> On 01/31/14 11:29, Donald Buczek wrote:
> > On 01/31/14 11:10, Donald Buczek wrote:
> >
> >> I'll surly tell you, when the problem is seen again with 8.13.
> >
> > 3.13
> >
> 
> 
> 
> now we've seen the "Too many levels of symbolic links" also on latest 
> and greatest kernel 3.13.1 (and, btw, 3.10.29  as well)

Right, I'll have a look again but I couldn't see how this could possibly
happen every other time I've looked at it. It just doesn't seem possible
and I could find any broken assignments anywhere. Apart from a really
odd compiler optimization or bug I can't see how it happens.

I could work around it but I use d_mountpoint() quite a bit and there
would be much more overhead if I had to follow the mount to check if
it's mounted. Not only that, I expect such a change would not be well
received since we should find and fix the problem, .... s*!

> 
> Debug script session in 
> http://www.molgen.mpg.de/~buczek/autofs-demo/typescript.l
> System log (excerpt) in 
> http://www.molgen.mpg.de/~buczek/autofs-demo/messages.l (16MB!)
> 
> Again I've put line numbers into the files for easier reference...
> 
> What is different this time is that multiple directories in multiple 
> autofs trees exhibit the problem at the same time. Interestingly, the 
> crafted dentry nodes have a similar (but not identical) mtime:
> 
> 44 root:pate:/home/web/buczek/autofs-demo/# .]2;root@xxxxxxx -l /home /project /package /scratch
> 45 /home:
> 46 total 144
> 47 drwx--x--x 361 buczek users   32768 Feb 19 08:20 buczek  <--- working
> 48 drwxr-x--x 161 haas   abt_vin 40960 Feb 13 15:41 haas    <--- working
> 49 drwxr-xrwt 198 root   system  12288 Jan 27 20:26 web     <--- working
> 50
> 51 /package:
> 52 total 0
> 53 dr-xr-xr-x 2 root system 0 Feb 17 11:03 usr              <---- dead
> 54
> 55 /project:
> 56 total 0
> 57 dr-xr-xr-x 2 root system 0 Feb 17 11:06 gbrowse          <--- dead
> 58 dr-xr-xr-x 2 root system 0 Feb 17 11:06 ngs_haas         <--- dead
> 59 dr-xr-xr-x 2 root system 0 Feb 17 11:03 postgres         <--- dead
> 60 dr-xr-xr-x 2 root system 0 Feb 17 11:03 splicenest       <--- dead
> 61
> 62 /scratch:
> 63 total 4
> 64 drwxrwxrwt 24 root system 4096 Sep 18 14:58 ngsvin       <----- working
> 65 dr-xr-xr-x  2 root system    0 Feb 17 11:03 ngsvin2      <----- dead
> 
> dentry flags on one of theses "/project/gbrowse) are 1523840 :
> 
> 1969 (gdb) print *(struct dentry *) 0xffff88007fd06c50
> 1970 $3 = {d_flags = 1523840, d_seq = {sequence = 4}, d_hash = {next = 0xffff880214025c98, pprev = 0xffffc9000013d570}, d_parent = 0xffff8800caa66810, d_name = {{{hash = 1876415966, len = 7}, hash_len = 31941187038}, name = 0xffff88007fd06c88 "gbrowse"}, d_inode = 0xffff8800961ad250, d_iname = "gbrowse", '\000' <repeats 24 times>, d_lockref = {{lock_count = 8610971969, {
> 1971         lock = {{rlock = {raw_lock = {{head_tail = 21037377, tickets = {head = 321, tail = 321}}}}}}, count = 2}}}, d_op = 0xffffffff81c45b40, d_sb = 0xffff880222a33800, d_time = 0, d_fsdata = 0xffff8800ca443b80, d_lru = {next = 0xffff88007fd06cd0, prev = 0xffff88007fd06cd0}, d_u = {d_child = {next = 0xffff88007fc9f420, prev = 0xffff8800caa668b0}, d_rcu = {
> 1972       next = 0xffff88007fc9f420, func = 0xffff8800caa668b0}}, d_subdirs = {next = 0xffff88007fd06cf0, prev = 0xffff88007fd06cf0}, d_alias = {next = 0x0, pprev = 0xffff8800961ad360}}
> 
> 
> which is 0x174080 which is
> 
>    DCACHE_RCUACCESS+
>    DCACHE_FSNOTIFY_PARENT_WATCHED+
>    DCACHE_MOUNTED+
>    DCACHE_NEED_AUTOMOUNT+
>    DCACHE_MANAGE_TRANSIT+
>    DCACHE_DIRECTORY_TYPE
> 
> so again we have DCACHE_MOUNTED but no mount ( line 1753ff )
> 
> looking just at "grep gbrowse messages.l" the interesting part might be 
> here:
> 
> 
> 1056818 2014-02-17T11:11:09.290343+01:00 pate kernel: [518408.741228] 
> pid 15221: autofs4_expire_indirect: checking mountpoint ffff88007fd06c50 
> gbrowse
> 1056819 2014-02-17T11:11:09.290344+01:00 pate kernel: [518408.741229] 
> pid 15221: autofs4_mount_busy: dentry ffff88007fd06c50 gbrowse
> 1056830 2014-02-17T11:11:09.290364+01:00 pate kernel: [518408.741242] 
> pid 15221: autofs4_expire_indirect: checking mountpoint ffff88007fd06c50 
> gbrowse
> 1056831 2014-02-17T11:11:09.290371+01:00 pate kernel: [518408.741243] 
> pid 15221: autofs4_mount_busy: dentry ffff88007fd06c50 gbrowse
> 1056885 2014-02-17T11:12:24.290190+01:00 pate kernel: [518483.724082] 
> pid 15245: autofs4_expire_indirect: checking mountpoint ffff88007fd06c50 
> gbrowse
> 1056886 2014-02-17T11:12:24.290201+01:00 pate kernel: [518483.724084] 
> pid 15245: autofs4_mount_busy: dentry ffff88007fd06c50 gbrowse
> 1056888 2014-02-17T11:12:24.290204+01:00 pate kernel: [518483.724088] 
> pid 15245: autofs4_expire_indirect: returning ffff88007fd06c50 gbrowse
> 1056889 2014-02-17T11:12:24.290205+01:00 pate kernel: [518483.724093] 
> pid 15245: autofs4_wait: new wait id = 0x000017b5, name = gbrowse, nfy=2
> 1056890 2014-02-17T11:12:24.290206+01:00 pate kernel: [518483.724095] 
> pid 15245: autofs4_notify_daemon: wait id = 0x000017b5, name = gbrowse, 
> type=4
> 1056891 2014-02-17T11:12:24+01:00 pate automount[531]: expiring path 
> /project/gbrowse
> 1056892 2014-02-17T11:12:24.291190+01:00 pate kernel: [518483.724971] 
> pid 15247: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056893 2014-02-17T11:12:24+01:00 pate automount[531]: expired 
> /project/gbrowse
> 1056894 2014-02-17T11:12:24.311242+01:00 pate kernel: [518483.744358] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056895 2014-02-17T11:12:24.311247+01:00 pate kernel: [518483.744360] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056896 2014-02-17T11:12:24.311248+01:00 pate kernel: [518483.744364] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056897 2014-02-17T11:12:24.311249+01:00 pate kernel: [518483.744365] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056898 2014-02-17T11:12:24.311250+01:00 pate kernel: [518483.744366] 
> pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse
> 1056899 2014-02-17T11:12:24.311251+01:00 pate kernel: [518483.744367] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056900 2014-02-17T11:12:24.311252+01:00 pate kernel: [518483.744368] 
> pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse
> 1056901 2014-02-17T11:12:24.311253+01:00 pate kernel: [518483.744369] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056902 2014-02-17T11:12:24.311254+01:00 pate kernel: [518483.744370] 
> pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse
> 1056903 2014-02-17T11:12:24.311255+01:00 pate kernel: [518483.744372] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056904 2014-02-17T11:12:24.311255+01:00 pate kernel: [518483.744373] 
> pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse
> 1056905 2014-02-17T11:12:24.311256+01:00 pate kernel: [518483.744374] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056906 2014-02-17T11:12:24.311257+01:00 pate kernel: [518483.744375] 
> pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse
> 1056907 2014-02-17T11:12:24.311258+01:00 pate kernel: [518483.744376] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056908 2014-02-17T11:12:24.311259+01:00 pate kernel: [518483.744377] 
> pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse
> 1056909 2014-02-17T11:12:24.311260+01:00 pate kernel: [518483.744378] 
> pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse
> 1056910 2014-02-17T11:12:24.311261+01:00 pate kernel: [518483.744379] 
> pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse
> 
> 
> where the daemon dismounts /project/gbrowse (but it looks like 
> dentry=ffff88007fd06c50 stays in place and has DCACHE_MOUNTED even a day 
> later.
> 
> The log from the fileserver confirms the unmount at Feb 17 11:12:24 :
> 
> root@genome:~# fgrep gbrowse /var/log/messages |grep pate|tail -6
> Feb 17 10:53:42 genome mountd[3074]: authenticated mount request from 
> pate.molgen.mpg.de:759 for /amd/genome/0/project/gbrowse 
> (/amd/genome/0/project/gbrowse)
> Feb 17 10:59:54 genome mountd[3079]: authenticated unmount request from 
> pate.molgen.mpg.de:1004 for /amd/genome/0/project/gbrowse 
> (/amd/genome/0/project/gbrowse)
> Feb 17 10:59:58 genome mountd[3074]: authenticated mount request from 
> pate.molgen.mpg.de:1005 for /amd/genome/0/project/gbrowse 
> (/amd/genome/0/project/gbrowse)
> Feb 17 11:06:09 genome mountd[3079]: authenticated unmount request from 
> pate.molgen.mpg.de:786 for /amd/genome/0/project/gbrowse 
> (/amd/genome/0/project/gbrowse)
> Feb 17 11:06:14 genome mountd[3070]: authenticated mount request from 
> pate.molgen.mpg.de:985 for /amd/genome/0/project/gbrowse 
> (/amd/genome/0/project/gbrowse)
> Feb 17 11:12:24 genome mountd[3070]: authenticated unmount request from 
> pate.molgen.mpg.de:1008 for /amd/genome/0/project/gbrowse 
> (/amd/genome/0/project/gbrowse)
> 
> 
> The system log on the web server is only
> 
>   tail -n +8256 /var/log/messages |egrep "Linux 
> version|autofs4|automount" > messages
> 
> But of course I had a glimpse at the complete log around these times for 
> other messages which might be related. Nothing...
> 
> 
> Other hints:
> 
>    * Until now we only had a single failure on kernel 3.13.1 (described 
> here) and another single failure on kernel 3.10.29. I think the bug 
> frequency is lower than it used to be with 3.8
>    * The two systems who have problems now belond to the group of 
> systems which had problems on kernel 3.8.
>    * This time it does looks like a problem during unmount. Previously 
> we had some bad dentries on the very first access never received by the 
> nfs server.
>    * All systems are multicore of course
> 
> root:pate:/home/web/buczek/autofs-demo/# gunzip -c /proc/config.gz 
> |egrep -i 'preempt|smp'
> CONFIG_X86_64_SMP=y
> # CONFIG_PREEMPT_RCU is not set
> CONFIG_GENERIC_SMP_IDLE_THREAD=y
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_SMP=y
> # CONFIG_X86_VSMP is not set
> # CONFIG_MAXSMP is not set
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set
> CONFIG_PM_SLEEP_SMP=y
> CONFIG_SCSI_SAS_HOST_SMP=y
> 
> 
> 
> So, who can make head and tail of it?
> 
> 
> Regards
>    Donald
> 


--
To unsubscribe from this list: send the line "unsubscribe autofs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux Ext4]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux