On Wed, 15 Jun 2011 19:58:58 +0900 (JST), Ryusuke Konishi wrote:
> On Wed, 15 Jun 2011 10:42:51 +0900 (JST), Ryusuke Konishi wrote:
> > On Tue, 14 Jun 2011 11:04:26 -0700, Zahid Chowdhury wrote:
> > > Hello Ryusuke,
> > > I changed the code some to:
> > > diff -u --ignore-all-space fsck0.nilfs2.c ~/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck
> > > --- fsck0.nilfs2.c 2011-06-14 11:03:49.000000000 -0700
> > > +++ /root/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck/fsck0.nilfs2.c 2011-06-14 11:01:34.000000000 -0700
> > > @@ -172,10 +172,14 @@
> > >  static void read_block(int fd, __u64 blocknr, void *buf,
> > >                         unsigned long size)
> > >  {
> > > +        int num_read;
> > >          if (lseek64(fd, blocknr * blocksize, SEEK_SET) < 0 ||
> > > -            read(fd, buf, size) < size)
> > > -                die("cannot read block (blocknr = %llu): %s",
> > > -                    (unsigned long long)blocknr, strerror(errno));
> > > +            (num_read = read(fd, buf, size) < size)) {
> > > +                fprintf(stderr, "Read size was: %d\tNum read: %d\tStrerror: %s\n",
> > > +                        size, num_read, strerror(errno));
> > > +                die("cannot read block (blocknr = %llu)",
> > > +                    (unsigned long long)blocknr);
> > > +        }
> > >  }
> > >
> > >  static inline __u64 segment_start_blocknr(unsigned long segnum)
> > >
> > > and I got this as output:
> > >
> > > ./fsck0.nilfs2 -f -v /dev/sda2
> > > Super-block:
> > > revision = 2.0
> > > blocksize = 4096
> > > write time = 2011-06-11 23:22:03
> > > indicated log: blocknr = 1648528
> > > segnum = 804, seq = 401758, cno=3250953
> > >
> > > Unclean FS.
> > > The latest log is lost. Trying rollback recovery..
> > > ......
> > > Searching the latest checkpoint.
> > > Read size was: 4096 Num read: 1 Strerror: Success
> > > fsck0.nilfs2: cannot read block (blocknr = 2696911)
>
> Ah, sorry. I noticed that the block number (= 2696911) is beyond the
> size of your block device. It is the cause of this error.
>
> I'll look into the rollback loop code of fsck0.nilfs2 to find out the
> root cause of this out-of-range access.
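A side note on the debug patch quoted above: in "num_read = read(fd, buf, size) < size", the '<' binds tighter than '=', so num_read receives the result of the comparison (0 or 1) rather than the byte count. That is consistent with the "Num read: 1" in the output, and "Strerror: Success" suggests a short read rather than an errno-setting failure. Below is a minimal sketch of a read helper that keeps the real byte count and rejects out-of-range block numbers before seeking. The name read_block_checked and the blocksize/nblocks parameters are assumptions for illustration only; they are not part of fsck0.nilfs2, and how the device size is obtained is left to the caller.

```c
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in fsck0.nilfs2): read `size` bytes from block
 * `blocknr`, after checking that the block lies inside a device of
 * `nblocks` blocks. Returns 0 on success, -1 on failure.
 */
static int read_block_checked(int fd, uint64_t blocknr, void *buf,
			      size_t size, uint64_t blocksize,
			      uint64_t nblocks)
{
	ssize_t num_read;

	/* Refuse block numbers beyond the end of the device. */
	if (blocknr >= nblocks) {
		fprintf(stderr, "blocknr %llu out of range (device has %llu blocks)\n",
			(unsigned long long)blocknr,
			(unsigned long long)nblocks);
		return -1;
	}

	if (lseek(fd, (off_t)(blocknr * blocksize), SEEK_SET) < 0) {
		fprintf(stderr, "cannot seek to block %llu: %s\n",
			(unsigned long long)blocknr, strerror(errno));
		return -1;
	}

	/* Assign first, compare afterwards, so num_read is the byte count. */
	num_read = read(fd, buf, size);
	if (num_read < 0) {
		fprintf(stderr, "read error at block %llu: %s\n",
			(unsigned long long)blocknr, strerror(errno));
		return -1;
	}
	if ((size_t)num_read < size) {
		/* errno is not meaningful for a short read, so do not print it. */
		fprintf(stderr, "short read at block %llu: got %zd of %zu bytes\n",
			(unsigned long long)blocknr, num_read, size);
		return -1;
	}
	return 0;
}
```

With a check like this, an out-of-range block number is reported as such at the point where it is passed in, instead of surfacing later as a confusing short read.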

Hmm, this bug is not trivial. Clearly this happened in the context of the
find_latest_cno_in_logical_segment() function, but I couldn't find any
suspicious call sites so far.

If you are in a hurry, please go ahead. Otherwise (if the data on the
partition is important), I need your help to narrow down this problem.
If we can get a backtrace of the error, things will become clear.

Anyway, I would like to release an updated nilfs2 kmod in a week or so
for CentOS users to minimize this sort of thing.

Regards,
Ryusuke Konishi
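
P.S. One possible way to capture the backtrace mentioned above, as a rough sketch only: print the call stack from the fatal-error path using glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. The names dump_backtrace and die_with_backtrace are hypothetical and are not the die() in fsck0.nilfs2; this also assumes a glibc-based system. Running the unmodified tool under a debugger with a breakpoint on die() would give the same information.

```c
#include <execinfo.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_FRAMES 64

/*
 * Hypothetical helper: write the current call stack to stderr and
 * return the number of frames captured.
 */
static int dump_backtrace(void)
{
	void *frames[MAX_FRAMES];
	int n = backtrace(frames, MAX_FRAMES);

	/* Writes one line per frame directly to the fd, without malloc(). */
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
	return n;
}

/*
 * Hypothetical variant of a die() routine: print the message, then the
 * call stack, then exit with a failure status.
 */
void die_with_backtrace(const char *fmt, ...)
{
	va_list args;

	fprintf(stderr, "fatal: ");
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
	fputc('\n', stderr);

	dump_backtrace();
	exit(EXIT_FAILURE);
}
```

Static functions usually appear only as raw addresses in this output; building with -g and resolving the addresses afterwards (for example with addr2line) should recover the function names.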