Daniel,

Awesome! The bug was in readdir, which was not flushing its cache on seek (rewinddir). It therefore kept returning stale (already deleted) entries, which sent rm into a loop. rm -rf should work smoothly now. Good catch, and thanks :)

avati

2007/7/2, Daniel van Ham Colchete <daniel.colchete@xxxxxxxxx>:
People,

I tried this here. I'm getting the very same results, but I noticed something while strace'ing the 'rm -rf *' process. All the unlink calls go OK until the rm process calls lseek on the directory descriptor:

unlink("/proc/self/fd/5/00-INDEX") = 0
unlink("/proc/self/fd/5/Mylex.txt") = 0
getdents64(5, /* 0 entries */, 4096) = 0
lseek(5, 0, SEEK_SET) = 0
getdents64(5, /* 49 entries */, 4096) = 1760
unlink("/proc/self/fd/5/st.txt") = -1 ENOENT (No such file or directory)
open(".", O_RDONLY|O_LARGEFILE) = 3
fchdir(5) = 0
unlink("st.txt") = -1 ENOENT (No such file or directory)
fchdir(3) = 0
close(3) = 0
unlink("/proc/self/fd/5/ChangeLog.megaraid") = -1 ENOENT (No such file or directory)
open(".", O_RDONLY|O_LARGEFILE) = 3
fchdir(5) = 0
unlink("ChangeLog.megaraid") = -1 ENOENT (No such file or directory)

There is no lseek call before that. The files it tries to remove after the lseek had already been removed earlier. From this point on, 'rm -rf' loops, calling lseek and trying to remove the same files over and over again.

Best regards,
Daniel

On 7/1/07, Harris Landgarten <harrisl@xxxxxxxxxxxxx> wrote:
>
> This bug is easily reproduced by copying the Linux source tree to gluster
> and then trying to remove it with rm -rf
>
> Harris
>
> ----- Original Message -----
> From: "Majied Najjar" <majied.najjar@xxxxxxxxxxxxxxx>
> To: "Harris Landgarten" <harrisl@xxxxxxxxxxxxx>
> Cc: "gluster-devel" <gluster-devel@xxxxxxxxxx>
> Sent: Friday, June 29, 2007 4:01:09 PM (GMT-0500) America/New_York
> Subject: Re: Problem with rm -rf
>
> I have some core file outputs for the same operations:
>
> Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> Core was generated by
> `[glusterfsd] '.
> Program terminated with signal 11, Segmentation fault.
> #0 0xb7e50639 in ?? ()
> (gdb) bt
> #0 0xb7e50639 in ?? ()
>
> Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> Core was generated by
> `[glusterfsd] '.
> Program terminated with signal 11, Segmentation fault.
> #0 0xb7e92639 in ?? ()
> (gdb) bt
> #0 0xb7e92639 in ?? ()
>
> Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
> Core was generated by
> `[glusterfs] '.
> Program terminated with signal 11, Segmentation fault.
> #0 0xffffe410 in __kernel_vsyscall ()
> (gdb) bt
> #0 0xffffe410 in __kernel_vsyscall ()
> #1 0xb7f0aa8d in ?? ()
>
>
> On Fri, 29 Jun 2007 13:36:27 -0400 (EDT)
> Harris Landgarten <harrisl@xxxxxxxxxxxxx> wrote:
>
> > Avati,
> >
> > More info on rm -rf problem
> >
> > rm -rf * and find . -exec rm -rf {} \;
> >
> > both begin properly and then fall into a sequence of looking for files:
> >
> > find . -type f -exec rm {} \;
> >
> > works fast and properly
> >
> > rm -rf * then works with empty dirs.
> >
> > Harris
> >
> > ----- Original Message -----
> > From: "Harris Landgarten" <harrisl@xxxxxxxxxxxxx>
> > To: "Anand Avati" <avati@xxxxxxxxxxxxx>
> > Cc: "gluster-devel" <gluster-devel@xxxxxxxxxx>
> > Sent: Thursday, June 28, 2007 9:21:46 AM (GMT-0500) America/New_York
> > Subject: Re: Problem with rm -rf
> >
> > The rm -rf hangs. It looks like one or two unlinks are sent to the log.
> > I can Ctrl-C the client and the data is still there. The data was in the tmp
> > dir from failed backups. It is gone now. I will investigate more when I have
> > more data later today.
> >
> > Harris
> >
> > ----- Original Message -----
> > From: "Anand Avati" <avati@xxxxxxxxxxxxx>
> > To: "Harris Landgarten" <harrisl@xxxxxxxxxxxxx>
> > Cc: "gluster-devel" <gluster-devel@xxxxxxxxxx>
> > Sent: Thursday, June 28, 2007 9:17:41 AM (GMT-0500) America/New_York
> > Subject: Re: Problem with rm -rf
> >
> > Strange,
> > what is your configuration? At the time of the 'hang', is it possible for
> > you to attach gdb to glusterfs and get a backtrace (from every thread, by
> > switching with 'thr 1', 'thr 2', etc.)?
> > rm -rf seems to work fine for me; I am wondering how find . -exec rm would
> > make a difference.
> > thanks,
> > avati
> >
> > > I am trying to delete the contents of a tmp dir with 3 trees
> > > containing about 1.7G
> > > as root, from within the top level tmp dir I issue
> > >
> > > rm -rf *
> > >
> > > and the command hangs and never returns.
> > >
> > > find . -exec rm -rf {} \;
> > >
> > > works as expected.
> > >
> > >
> > > Harris
> > >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
> >
> > --
> > Anand V. Avati
-- Anand V. Avati
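
The failure mode discussed in this thread (a directory reader that keeps its entry cache across a seek to offset 0) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not GlusterFS code: `CachedDirReader`, `remove_all`, `flush_on_seek`, and `max_passes` are hypothetical names invented for the illustration, and a Python set stands in for the real directory. The loop mimics the pattern visible in the strace above: read entries, unlink them, seek back to 0, read again.

```python
class CachedDirReader:
    """Toy directory handle that buffers entries (hypothetical, not GlusterFS code)."""

    def __init__(self, backend, flush_on_seek):
        self.backend = backend              # set of names standing in for the directory
        self.flush_on_seek = flush_on_seek  # False models the reported bug
        self.cache = None                   # buffered listing, or None if not loaded
        self.pos = 0

    def readdir(self, count):
        # Load the listing once and serve later reads from the buffer.
        if self.cache is None:
            self.cache = sorted(self.backend)
        batch = self.cache[self.pos:self.pos + count]
        self.pos += len(batch)
        return batch

    def seek(self, offset):
        # A rewinddir is modelled as a seek to offset 0.
        self.pos = offset
        if self.flush_on_seek:
            self.cache = None               # drop stale entries so the next read reloads


def remove_all(backend, flush_on_seek, max_passes=5):
    """Read, unlink, rewind, repeat. Returns (finished, enoent_count)."""
    reader = CachedDirReader(backend, flush_on_seek)
    enoent = 0
    for _ in range(max_passes):
        seen = 0
        while True:
            batch = reader.readdir(2)
            if not batch:
                break
            seen += len(batch)
            for name in batch:
                if name in backend:
                    backend.remove(name)    # unlink succeeds
                else:
                    enoent += 1             # entry was stale: unlink would fail with ENOENT
        if seen == 0:
            return True, enoent             # directory really is empty
        reader.seek(0)
    return False, enoent                    # gave up: still being handed stale entries


print(remove_all({"00-INDEX", "Mylex.txt", "st.txt"}, flush_on_seek=True))
print(remove_all({"00-INDEX", "Mylex.txt", "st.txt"}, flush_on_seek=False))
```

With `flush_on_seek=True` the second pass sees an empty listing and the loop ends. With `flush_on_seek=False` every pass after the first is served the same deleted names, so the loop only stops because of the `max_passes` cap, which the real rm does not have.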