Hi Bernhard, Krishna,

There were three issues that caused all these segfaults in afr. One of them was in the fuse-bridge code, where inode handling was a problem. The other two were in unify. All of these problems should be fixed in patch-469.

Bernhard, can you check with the latest tla and confirm that all these bugs are fixed?

-amar

On 8/24/07, Bernhard J. M. Grün <bernhard.gruen@xxxxxxxxxxxxxx> wrote:
>
> Hi Krishna,
>
> here is your requested information. The following is the information
> from the first mail:
> #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aaabce32cb0,
> this=<value optimized out>, loc=0x2aaaac0fe168) at afr.c:2602
> 2602 afr.c: No such file or directory.
> in afr.c
> (gdb) p *loc
> $1 = {path = 0x2aaaaf21f000
> "/imagecache/galerie/4197/thumbnail/419775.jpg",
> ino = 1744179, inode = 0x2aaab237c360}
> (gdb) p *loc->inode
> $2 = {lock = 1, table = 0x60c590, nlookup = 1, generation = 0, ref = 2,
> ino = 1744179, st_mode = 33188, fds = {next = 0x2aaab237c38c,
> prev = 0x2aaab237c38c}, ctx = 0x0, dentry = {inode_list = {
> next = 0x2aaab237c3a4, prev = 0x2aaab237c3a4}, name_hash = {
> next = 0x2aaab93809a4, prev = 0x2aaac4483bf4}, inode =
> 0x2aaab237c360,
> name = 0x2aaab1533820 "419775.jpg", parent = 0x2aaab4dc0c90},
> inode_hash = {next = 0x2aaab65afb5c, prev = 0x2aaaac1a4fdc}, list = {
> next = 0x2aaabada476c, prev = 0x60c5f0}}
>
>
> Now here is the information from two crashes of the later mail.
> The first crash from the last mail:
> Program terminated with signal 11, Segmentation fault.
> #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aaab0831c80,
> this=<value optimized out>, loc=0x2aaaed03abf8) at afr.c:2602
> 2602 afr.c: No such file or directory.
> in afr.c
> (gdb) p *loc
> $1 = {path = 0x2aaabf51f790
> "/imagecache/galerie/4482/thumbnail/448221.jpg",
> ino = 1879050, inode = 0x2aab132f86b0}
> (gdb) p *loc->inode
> $2 = {lock = 1, table = 0x60c590, nlookup = 1, generation = 0, ref = 2,
> ino = 1879050, st_mode = 33188, fds = {next = 0x2aab132f86dc,
> prev = 0x2aab132f86dc}, ctx = 0x0, dentry = {inode_list = {
> next = 0x2aab132f86f4, prev = 0x2aab132f86f4}, name_hash = {
> next = 0x2aaacb980464, prev = 0x2aaaab567dd0}, inode =
> 0x2aab132f86b0,
> name = 0x2aab04c6b030 "448221.jpg", parent = 0x2aaabfe97300},
> inode_hash = {next = 0x2aaaaca36d0c, prev = 0x2aaaab523fe0}, list = {
> next = 0x2aaad4ba01dc, prev = 0x60c5f0}}
>
> The second crash:
> Program terminated with signal 11, Segmentation fault.
> #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aab10d94830,
> this=<value optimized out>, loc=0x2aab1105bd68) at afr.c:2602
> 2602 afr.c: No such file or directory.
> in afr.c
> (gdb) p *loc
> $1 = {path = 0x2aab1208aa00 "", ino = 39422758, inode = 0x2aaae69be260}
> (gdb) p *loc->inode
> $2 = {lock = 1, table = 0x60c590, nlookup = 1, generation = 0, ref = 3,
> ino = 39422758, st_mode = 16877, fds = {next = 0x2aaae69be28c,
> prev = 0x2aaae69be28c}, ctx = 0x0, dentry = {inode_list = {
> next = 0x2aaae69be2a4, prev = 0x2aaae69be2a4}, name_hash = {
> next = 0x2aaae69be2b4, prev = 0x2aaae69be2b4}, inode =
> 0x2aaae69be260,
> name = 0x0, parent = 0x0}, inode_hash = {next = 0x15a5c7c,
> prev = 0x2aaaab51a130}, list = {next = 0x2aaac0fa2fac,
> prev = 0x2aaab7d5ad3c}}
>
> We can also meet in a chat if you like. I think this will speed up
> debugging. Just give me some time frame when we can meet and where we
> can meet.
>
> Bernhard
>
> 2007/8/24, Krishna Srinivas <krishna@xxxxxxxxxxxxx>:
> > Bernhard,
> >
> > Can you do "p *loc" and "p *loc->inode"
> >
> > Thanks
> > Krishna
> >
> > On 8/24/07, Bernhard J. M.
Grün <bernhard.gruen@xxxxxxxxxxxxxx> wrote:
> > > Hi Krishna,
> > >
> > > Unfortunately I can't give you access to our production systems. At
> > > least not at the moment.
> > > What I can do is to give you the compiled version of glusterfs, the
> > > system (Ubuntu 7.04 x86-64) and the core dumps.
> > >
> > > But I have two new back traces for you. They are from the second
> > > glusterfs client but the binaries of both clients are the same:
> > > First back trace:
> > > Core was generated by `[glusterfs]
> > > '.
> > > Program terminated with signal 11, Segmentation fault.
> > > #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aaab0831c80, this=<value
> > > optimized out>, loc=0x2aaaed03abf8) at afr.c:2602
> > > 2602 afr.c: No such file or directory.
> > > in afr.c
> > > (gdb) bt
> > > #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aaab0831c80, this=<value
> > > optimized out>, loc=0x2aaaed03abf8) at afr.c:2602
> > > #1 0x00002aaaaaece1bb in iot_stat (frame=0x2aaab10bfd50,
> > > this=0x6126d0, loc=0x2aaaed03abf8) at io-threads.c:651
> > > #2 0x00002ab5c53e1382 in default_stat (frame=0x2aaae2a881a0,
> > > this=0x612fe0, loc=0x2aaaed03abf8) at defaults.c:112
> > > #3 0x00002aaaab2db252 in wb_stat (frame=0x2aaac90c5420,
> > > this=0x613930, loc=0x2aaaed03abf8) at write-behind.c:236
> > > #4 0x0000000000405fd2 in fuse_getattr (req=<value optimized out>,
> > > ino=<value optimized out>, fi=<value optimized out>) at
> > > fuse-bridge.c:496
> > > #5 0x0000000000407139 in fuse_transport_notify (xl=<value optimized
> > > out>, event=<value optimized out>, data=<value optimized out>) at
> > > fuse-bridge.c:2067
> > > #6 0x00002ab5c53e3632 in sys_epoll_iteration (ctx=<value optimized
> > > out>) at epoll.c:53
> > > #7 0x000000000040356b in main (argc=5, argv=0x7fffe58f3348) at glusterfs.c:387
> > >
> > > Second back trace:
> > > Program terminated with signal 11, Segmentation fault.
> > > #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aab10d94830, this=<value
> > > optimized out>, loc=0x2aab1105bd68) at afr.c:2602
> > > 2602 afr.c: No such file or directory.
> > > in afr.c
> > > (gdb) bt
> > > #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aab10d94830, this=<value
> > > optimized out>, loc=0x2aab1105bd68) at afr.c:2602
> > > #1 0x00002aaaaaece1bb in iot_stat (frame=0x2aab11aee060,
> > > this=0x6126d0, loc=0x2aab1105bd68) at io-threads.c:651
> > > #2 0x00002b56305f4382 in default_stat (frame=0x2aab15cb30a0,
> > > this=0x612fe0, loc=0x2aab1105bd68) at defaults.c:112
> > > #3 0x00002aaaab2db252 in wb_stat (frame=0x2aab1807e2e0,
> > > this=0x613930, loc=0x2aab1105bd68) at write-behind.c:236
> > > #4 0x0000000000405fd2 in fuse_getattr (req=<value optimized out>,
> > > ino=<value optimized out>, fi=<value optimized out>) at
> > > fuse-bridge.c:496
> > > #5 0x0000000000407139 in fuse_transport_notify (xl=<value optimized
> > > out>, event=<value optimized out>, data=<value optimized out>) at
> > > fuse-bridge.c:2067
> > > #6 0x00002b56305f6632 in sys_epoll_iteration (ctx=<value optimized
> > > out>) at epoll.c:53
> > > #7 0x000000000040356b in main (argc=5, argv=0x7fff7a6de138) at glusterfs.c:387
> > >
> > > It seems the error is the same in all three cases.
> > >
> > > Bernhard
> > >
> > > 2007/8/22, Krishna Srinivas <krishna@xxxxxxxxxxxxx>:
> > > > Hi Bernhard,
> > > >
> > > > We are not able to figure out the bug's cause. Is it possible for
> > > > you to give us access to your machine for debugging the core?
> > > >
> > > > Thanks
> > > > Krishna
> > > >
> > > > On 8/20/07, Bernhard J. M. Grün <bernhard.gruen@xxxxxxxxxxxxxx> wrote:
> > > > > I still have the core dump of the crash I've reported. But I don't
> > > > > know if the backtrace is the same every time. The glusterfs client has
> > > > > run perfectly since 2007-08-16. So we have to wait for the next crash
> > > > > to analyse that issue further.
> > > > > Also the "print child_errno" does not output anything useful. It just
> > > > > says that there is no symbol with that name in the current context.
> > > > >
> > > > > 2007/8/20, Krishna Srinivas <krishna@xxxxxxxxxxxxx>:
> > > > > > Do you see the same backtrace every time it crashes?
> > > > > > Can you do "print child_errno" at the gdb prompt when you have the core?
> > > > > >
> > > > > > Thanks
> > > > > > Krishna
> > > > > >
> > > > > > On 8/20/07, Bernhard J. M. Grün <bernhard.gruen@xxxxxxxxxxxxxx> wrote:
> > > > > > > Hi Krishna,
> > > > > > >
> > > > > > > One or both of our glusterfs clients with that version crash
> > > > > > > every 3 to 5 days, I think. The problem is that there is a lot of
> > > > > > > throughput (about 30 MBit/s on each client, with about 99.5% file reads,
> > > > > > > the rest file writes). This makes it hard to debug.
> > > > > > > We also have a core file from that crash (if I did not delete it
> > > > > > > because it was quite big). Anyway, when the next crash occurs I'll save
> > > > > > > the core dump for sure.
> > > > > > > Do you have some idea how to work around that crash?
> > > > > > >
> > > > > > > 2007/8/20, Krishna Srinivas <krishna@xxxxxxxxxxxxx>:
> > > > > > > > Hi Bernhard,
> > > > > > > >
> > > > > > > > Sorry for the late response. We are not able to figure out
> > > > > > > > the cause for this bug. Do you have the core file?
> > > > > > > > Is the bug seen regularly?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Krishna
> > > > > > > >
> > > > > > > > On 8/16/07, Bernhard J. M. Grün <bernhard.gruen@xxxxxxxxxxxxxx> wrote:
> > > > > > > > > Hello developers,
> > > > > > > > >
> > > > > > > > > We just discovered another segfault on the client side. At the moment we
> > > > > > > > > can't give you more information than our version number, a back trace
> > > > > > > > > and our client configuration.
> > > > > > > > >
> > > > > > > > > We use version 1.3.0 with patches up to patch-449.
> > > > > > > > >
> > > > > > > > > The back trace looks as follows:
> > > > > > > > > Core was generated by `[glusterfs]
> > > > > > > > > '.
> > > > > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > > > > #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aaabce32cb0,
> > > > > > > > > this=<value optimized out>, loc=0x2aaaac0fe168) at afr.c:2602
> > > > > > > > > 2602 afr.c: No such file or directory.
> > > > > > > > > in afr.c
> > > > > > > > > (gdb) bt
> > > > > > > > > #0 0x00002aaaaacbc2bd in afr_stat (frame=0x2aaabce32cb0,
> > > > > > > > > this=<value optimized out>, loc=0x2aaaac0fe168) at afr.c:2602
> > > > > > > > > #1 0x00002aaaaaece1bb in iot_stat (frame=0x2aaabcc00860, this=0x6126d0,
> > > > > > > > > loc=0x2aaaac0fe168) at io-threads.c:651
> > > > > > > > > #2 0x00002aaaab0d2252 in wb_stat (frame=0x2aaaad05c5e0, this=0x612fe0,
> > > > > > > > > loc=0x2aaaac0fe168) at write-behind.c:236
> > > > > > > > > #3 0x0000000000405fd2 in fuse_getattr (req=<value optimized out>,
> > > > > > > > > ino=<value optimized out>, fi=<value optimized out>) at fuse-bridge.c:496
> > > > > > > > > #4 0x0000000000407139 in fuse_transport_notify (xl=<value optimized out>,
> > > > > > > > > event=<value optimized out>, data=<value optimized out>)
> > > > > > > > > at fuse-bridge.c:2067
> > > > > > > > > #5 0x00002af562b6a632 in sys_epoll_iteration (ctx=<value optimized out>)
> > > > > > > > > at epoll.c:53
> > > > > > > > > #6 0x000000000040356b in main (argc=9, argv=0x7fff48169b78) at glusterfs.c:387
> > > > > > > > >
> > > > > > > > > And here is our client configuration for that machine:
> > > > > > > > > ### Add client feature and attach to remote subvolume
> > > > > > > > > volume client1
> > > > > > > > >   type protocol/client
> > > > > > > > >   option transport-type tcp/client  # for TCP/IP transport
> > > > > > > > >   option remote-host 10.1.1.13  # IP address of the remote brick
> > > > > > > > >   option remote-port 9999  #
default server port is 6996
> > > > > > > > >   option remote-subvolume iothreads  # name of the remote volume
> > > > > > > > > end-volume
> > > > > > > > >
> > > > > > > > > ### Add client feature and attach to remote subvolume
> > > > > > > > > volume client2
> > > > > > > > >   type protocol/client
> > > > > > > > >   option transport-type tcp/client  # for TCP/IP transport
> > > > > > > > >   option remote-host 10.1.1.14  # IP address of the remote brick
> > > > > > > > >   option remote-port 9999  # default server port is 6996
> > > > > > > > >   option remote-subvolume iothreads  # name of the remote volume
> > > > > > > > > end-volume
> > > > > > > > >
> > > > > > > > > volume afrbricks
> > > > > > > > >   type cluster/afr
> > > > > > > > >   subvolumes client1 client2
> > > > > > > > >   option replicate *:2
> > > > > > > > >   option self-heal off
> > > > > > > > > end-volume
> > > > > > > > >
> > > > > > > > > volume iothreads  #iothreads can give performance a boost
> > > > > > > > >   type performance/io-threads
> > > > > > > > >   option thread-count 16
> > > > > > > > >   subvolumes afrbricks
> > > > > > > > > end-volume
> > > > > > > > >
> > > > > > > > > ### Add writeback feature
> > > > > > > > > volume bricks
> > > > > > > > >   type performance/write-behind
> > > > > > > > >   option aggregate-size 0  # unit in bytes
> > > > > > > > >   subvolumes iothreads
> > > > > > > > > end-volume
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > We hope you can easily find and fix that error. Thank you in advance
> > > > > > > > >
> > > > > > > > > Bernhard J. M.
Grün
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Gluster-devel mailing list
> > > > > > > > > Gluster-devel@xxxxxxxxxx
> > > > > > > > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
>

-- 
Amar Tumballi
Engineer - Gluster Core Team
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage!