On Sat, 30 Jul 2011, Ted Ts'o wrote:
> On Sat, Jul 30, 2011 at 10:21:13AM -0700, Sage Weil wrote:
> >
> > We do use xattrs extensively, though; that was the last extN bug we
> > uncovered. That's where my money is.
>
> Hmm, yes. That could very well be. How big are the xattrs, and are
> there cases where you:
>
> a) start with a small xattr (where the total size is less than 128
> bytes, so it can be stored in the inode table), and then increase it
> something where it needs to be stored in an external block?
>
> b) start with enough xattrs so it's large, and then delete all or most
> of them?
>
> I could easily believe we might have some bugs as we transition from
> in-inode to external block storage, or vice versa. I'll take a look
> at the code and try to create some reproduction cases, but if you
> could give me a handle on workload patterns of ceph around xattrs,
> that would be interesting.

I would guess a, but it could also be a+b.

Fyodor, can you take some of the corrupt inos that fsck complained about
and see what files/directories they are?

    find /osd.0 -inum NNN

(I'm guessing the largest xattrs are on the collection directories, like
/osd.0/current/something_head/.) Then grep that filename out of the log
to see exactly which operations took place. The setattr log normally
includes xattr size.

> Another thing to try might be to format the disk with 128 byte inodes
> (mke2fs -t ext4 -I 128 /dev/hdXX) and see if you can reproduce the
> problem that way. The support for in-inode xattrs is a new feature
> (to ext4), and so it's a bit more likely that if there is a bug, it's
> related to our in-inode xattr handling --- and using a 128 byte inode
> would suppress that feature. I don't recommend running that way, of
> course, but it might help tell us if that's where we should be looking
> for a bug.
>
> > (BTW we'll be really happy if/when the large xattr patches from the Lustre
> > guys make it into mainline! The (4k?) limit on total xattrs is a problem
> > for us.)
>
> OK, good to know. It hadn't been high priority for the ext4 team
> (since I thought it was only the Lustre folks that really needed it),
> but I'll escalate the priority of that on our todo list.

Wonderful. Thanks, Ted!

sage
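Ted's two transition patterns, (a) growing a small xattr until it no longer fits in the inode and (b) creating many xattrs and then deleting most of them, can be exercised from user space with a short script. This is a minimal sketch, assuming a Linux host, Python 3.6+, and a filesystem that accepts user.* xattrs (ext4 with user_xattr). The script, its function names (`grow_xattr`, `shrink_xattrs`, `xattrs_supported`), and the sizes (32 and 2048 bytes, 16 x 128 bytes) are hypothetical choices, not Ceph's actual xattr layout. It only checks that values read back correctly; on-disk damage would still have to be found with fsck afterwards.

```python
import os

# Hypothetical repro helpers; sizes are illustrative, not Ceph's real values.

def xattrs_supported(path):
    """Return True if the filesystem holding `path` accepts user.* xattrs."""
    try:
        os.setxattr(path, "user.probe", b"1")
        os.removexattr(path, "user.probe")
        return True
    except (OSError, AttributeError):  # AttributeError: no os.setxattr (non-Linux)
        return False

def grow_xattr(path, name="user.repro.grow", small=32, large=2048):
    """Pattern (a): a small xattr (likely kept in-inode) is rewritten with a
    value large enough that ext4 must move it to an external xattr block."""
    os.setxattr(path, name, b"s" * small)
    if os.getxattr(path, name) != b"s" * small:
        return False
    os.setxattr(path, name, b"L" * large)
    return os.getxattr(path, name) == b"L" * large

def shrink_xattrs(path, count=16, size=128, keep=1):
    """Pattern (b): create many xattrs (spilling out of the inode), then
    delete all but `keep` of them and verify what survives."""
    names = [f"user.repro.many{i:02d}" for i in range(count)]
    for i, n in enumerate(names):
        os.setxattr(path, n, bytes([65 + i]) * size)  # 'A', 'B', ... fill bytes
    for n in names[keep:]:
        os.removexattr(path, n)
    survivors = sorted(a for a in os.listxattr(path)
                       if a.startswith("user.repro.many"))
    return survivors == names[:keep] and all(
        os.getxattr(path, n) == bytes([65 + i]) * size
        for i, n in enumerate(names[:keep]))

if __name__ == "__main__":
    import sys
    import tempfile
    # Hypothetical usage: python3 xattr_repro.py /mnt/scratch-ext4
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    for label, check in (("(a) grow", grow_xattr), ("(b) shrink", shrink_xattrs)):
        # A fresh file per pattern keeps each inode under the ~4k xattr limit.
        with tempfile.NamedTemporaryFile(dir=target) as f:
            if not xattrs_supported(f.name):
                sys.exit(f"{target}: user.* xattrs not supported")
            print(label, "OK" if check(f.name) else "MISMATCH")
```

One way to use it: run it many times against a scratch ext4 mount, then unmount and check the device with `e2fsck -fn /dev/hdXX`. Running the same loop on a filesystem made with `mke2fs -t ext4 -I 128`, as Ted suggests, would indicate whether the in-inode xattr path is involved.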