"Too many levels of symbolic links"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,


On 01/31/14 11:29, Donald Buczek wrote:
On 01/31/14 11:10, Donald Buczek wrote:

I'll surly tell you, when the problem is seen again with 8.13.

3.13




now we've seen the "Too many levels of symbolic links" also on latest and greatest kernel 3.13.1 (and, btw, 3.10.29 as well)

Debug script session in http://www.molgen.mpg.de/~buczek/autofs-demo/typescript.l System log (excerpt) in http://www.molgen.mpg.de/~buczek/autofs-demo/messages.l (16MB!)

Again I've put line numbers into the files for easier reference...

What is different this time is that multiple directories in multiple autofs trees exhibit the problem at the same time. Interestingly, the crafted dentry nodes have a similar (but not identical) mtime:

44 root:pate:/home/web/buczek/autofs-demo/# .]2;root@xxxxxxx -l /home /project /package /scratch
45 /home:
46 total 144
47 drwx--x--x 361 buczek users   32768 Feb 19 08:20 buczek  <--- working
48 drwxr-x--x 161 haas   abt_vin 40960 Feb 13 15:41 haas    <--- working
49 drwxr-xrwt 198 root   system  12288 Jan 27 20:26 web     <--- working
50
51 /package:
52 total 0
53 dr-xr-xr-x 2 root system 0 Feb 17 11:03 usr              <---- dead
54
55 /project:
56 total 0
57 dr-xr-xr-x 2 root system 0 Feb 17 11:06 gbrowse          <--- dead
58 dr-xr-xr-x 2 root system 0 Feb 17 11:06 ngs_haas         <--- dead
59 dr-xr-xr-x 2 root system 0 Feb 17 11:03 postgres         <--- dead
60 dr-xr-xr-x 2 root system 0 Feb 17 11:03 splicenest       <--- dead
61
62 /scratch:
63 total 4
64 drwxrwxrwt 24 root system 4096 Sep 18 14:58 ngsvin       <----- working
65 dr-xr-xr-x  2 root system    0 Feb 17 11:03 ngsvin2      <----- dead

dentry flags on one of theses "/project/gbrowse) are 1523840 :

1969 (gdb) print *(struct dentry *) 0xffff88007fd06c50
1970 $3 = {d_flags = 1523840, d_seq = {sequence = 4}, d_hash = {next = 0xffff880214025c98, pprev = 0xffffc9000013d570}, d_parent = 0xffff8800caa66810, d_name = {{{hash = 1876415966, len = 7}, hash_len = 31941187038}, name = 0xffff88007fd06c88 "gbrowse"}, d_inode = 0xffff8800961ad250, d_iname = "gbrowse", '\000' <repeats 24 times>, d_lockref = {{lock_count = 8610971969, {
1971         lock = {{rlock = {raw_lock = {{head_tail = 21037377, tickets = {head = 321, tail = 321}}}}}}, count = 2}}}, d_op = 0xffffffff81c45b40, d_sb = 0xffff880222a33800, d_time = 0, d_fsdata = 0xffff8800ca443b80, d_lru = {next = 0xffff88007fd06cd0, prev = 0xffff88007fd06cd0}, d_u = {d_child = {next = 0xffff88007fc9f420, prev = 0xffff8800caa668b0}, d_rcu = {
1972       next = 0xffff88007fc9f420, func = 0xffff8800caa668b0}}, d_subdirs = {next = 0xffff88007fd06cf0, prev = 0xffff88007fd06cf0}, d_alias = {next = 0x0, pprev = 0xffff8800961ad360}}


which is 0x174080 which is

  DCACHE_RCUACCESS+
  DCACHE_FSNOTIFY_PARENT_WATCHED+
  DCACHE_MOUNTED+
  DCACHE_NEED_AUTOMOUNT+
  DCACHE_MANAGE_TRANSIT+
  DCACHE_DIRECTORY_TYPE

so again we have DCACHE_MOUNTED but no mount ( line 1753ff )

looking just at "grep gbrowse messages.l" the interesting part might be here:


1056818 2014-02-17T11:11:09.290343+01:00 pate kernel: [518408.741228] pid 15221: autofs4_expire_indirect: checking mountpoint ffff88007fd06c50 gbrowse 1056819 2014-02-17T11:11:09.290344+01:00 pate kernel: [518408.741229] pid 15221: autofs4_mount_busy: dentry ffff88007fd06c50 gbrowse 1056830 2014-02-17T11:11:09.290364+01:00 pate kernel: [518408.741242] pid 15221: autofs4_expire_indirect: checking mountpoint ffff88007fd06c50 gbrowse 1056831 2014-02-17T11:11:09.290371+01:00 pate kernel: [518408.741243] pid 15221: autofs4_mount_busy: dentry ffff88007fd06c50 gbrowse 1056885 2014-02-17T11:12:24.290190+01:00 pate kernel: [518483.724082] pid 15245: autofs4_expire_indirect: checking mountpoint ffff88007fd06c50 gbrowse 1056886 2014-02-17T11:12:24.290201+01:00 pate kernel: [518483.724084] pid 15245: autofs4_mount_busy: dentry ffff88007fd06c50 gbrowse 1056888 2014-02-17T11:12:24.290204+01:00 pate kernel: [518483.724088] pid 15245: autofs4_expire_indirect: returning ffff88007fd06c50 gbrowse 1056889 2014-02-17T11:12:24.290205+01:00 pate kernel: [518483.724093] pid 15245: autofs4_wait: new wait id = 0x000017b5, name = gbrowse, nfy=2 1056890 2014-02-17T11:12:24.290206+01:00 pate kernel: [518483.724095] pid 15245: autofs4_notify_daemon: wait id = 0x000017b5, name = gbrowse, type=4 1056891 2014-02-17T11:12:24+01:00 pate automount[531]: expiring path /project/gbrowse 1056892 2014-02-17T11:12:24.291190+01:00 pate kernel: [518483.724971] pid 15247: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056893 2014-02-17T11:12:24+01:00 pate automount[531]: expired /project/gbrowse 1056894 2014-02-17T11:12:24.311242+01:00 pate kernel: [518483.744358] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056895 2014-02-17T11:12:24.311247+01:00 pate kernel: [518483.744360] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056896 2014-02-17T11:12:24.311248+01:00 pate kernel: [518483.744364] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056897 2014-02-17T11:12:24.311249+01:00 pate kernel: [518483.744365] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056898 2014-02-17T11:12:24.311250+01:00 pate kernel: [518483.744366] pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse 1056899 2014-02-17T11:12:24.311251+01:00 pate kernel: [518483.744367] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056900 2014-02-17T11:12:24.311252+01:00 pate kernel: [518483.744368] pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse 1056901 2014-02-17T11:12:24.311253+01:00 pate kernel: [518483.744369] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056902 2014-02-17T11:12:24.311254+01:00 pate kernel: [518483.744370] pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse 1056903 2014-02-17T11:12:24.311255+01:00 pate kernel: [518483.744372] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056904 2014-02-17T11:12:24.311255+01:00 pate kernel: [518483.744373] pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse 1056905 2014-02-17T11:12:24.311256+01:00 pate kernel: [518483.744374] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056906 2014-02-17T11:12:24.311257+01:00 pate kernel: [518483.744375] pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse 1056907 2014-02-17T11:12:24.311258+01:00 pate kernel: [518483.744376] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056908 2014-02-17T11:12:24.311259+01:00 pate kernel: [518483.744377] pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse 1056909 2014-02-17T11:12:24.311260+01:00 pate kernel: [518483.744378] pid 15246: autofs4_d_manage: dentry=ffff88007fd06c50 gbrowse 1056910 2014-02-17T11:12:24.311261+01:00 pate kernel: [518483.744379] pid 15246: autofs4_d_automount: dentry=ffff88007fd06c50 gbrowse


where the daemon dismounts /project/gbrowse (but it looks like dentry=ffff88007fd06c50 stays in place and has DCACHE_MOUNTED even a day later.

The log from the fileserver confirms the unmount at Feb 17 11:12:24 :

root@genome:~# fgrep gbrowse /var/log/messages |grep pate|tail -6
Feb 17 10:53:42 genome mountd[3074]: authenticated mount request from pate.molgen.mpg.de:759 for /amd/genome/0/project/gbrowse (/amd/genome/0/project/gbrowse) Feb 17 10:59:54 genome mountd[3079]: authenticated unmount request from pate.molgen.mpg.de:1004 for /amd/genome/0/project/gbrowse (/amd/genome/0/project/gbrowse) Feb 17 10:59:58 genome mountd[3074]: authenticated mount request from pate.molgen.mpg.de:1005 for /amd/genome/0/project/gbrowse (/amd/genome/0/project/gbrowse) Feb 17 11:06:09 genome mountd[3079]: authenticated unmount request from pate.molgen.mpg.de:786 for /amd/genome/0/project/gbrowse (/amd/genome/0/project/gbrowse) Feb 17 11:06:14 genome mountd[3070]: authenticated mount request from pate.molgen.mpg.de:985 for /amd/genome/0/project/gbrowse (/amd/genome/0/project/gbrowse) Feb 17 11:12:24 genome mountd[3070]: authenticated unmount request from pate.molgen.mpg.de:1008 for /amd/genome/0/project/gbrowse (/amd/genome/0/project/gbrowse)


The system log on the web server is only

tail -n +8256 /var/log/messages |egrep "Linux version|autofs4|automount" > messages

But of course I had a glimpse at the complete log around these times for other messages which might be related. Nothing...


Other hints:

* Until now we only had a single failure on kernel 3.13.1 (described here) and another single failure on kernel 3.10.29. I think the bug frequency is lower than it used to be with 3.8 * The two systems who have problems now belond to the group of systems which had problems on kernel 3.8. * This time it does looks like a problem during unmount. Previously we had some bad dentries on the very first access never received by the nfs server.
  * All systems are multicore of course

root:pate:/home/web/buczek/autofs-demo/# gunzip -c /proc/config.gz |egrep -i 'preempt|smp'
CONFIG_X86_64_SMP=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_SMP=y
# CONFIG_X86_VSMP is not set
# CONFIG_MAXSMP is not set
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_SCSI_SAS_HOST_SMP=y



So, who can make head and tail of it?


Regards
  Donald

--
Donald Buczek
buczek@xxxxxxxxxxxxx
Tel: +49 30 8413 1433

--
To unsubscribe from this list: send the line "unsubscribe autofs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux Ext4]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux