Re: Kernel 3.0.0 + ext4 + ceph == ...

On Sat, 30 Jul 2011, Ted Ts'o wrote:
> On Sat, Jul 30, 2011 at 10:21:13AM -0700, Sage Weil wrote:
> > 
> > We do use xattrs extensively, though; that was the last extN bug we 
> > uncovered.  That's where my money is.
> 
> Hmm, yes.  That could very well be.  How big are the xattrs, and are
> there cases where you:
> 
> a) start with a small xattr (where the total size is less than 128
> bytes, so it can be stored in the inode table), and then increase it
> to something where it needs to be stored in an external block?
> 
> b) start with enough xattrs so it's large, and then delete all or most
> of them?
> 
> I could easily believe we might have some bugs as we transition from
> in-inode to external block storage, or vice versa.  I'll take a look
> at the code and try to create some reproduction cases, but if you
> could give me a handle on workload patterns of ceph around xattrs,
> that would be interesting.

I would guess a, but it could also be a+b. 
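
For anyone trying to reproduce this, patterns (a) and (b) could be exercised
roughly as in the sketch below. It is only an illustration, not what ceph
actually does: the attribute name "user.repro" and the 64/2048-byte sizes are
assumptions, picked to sit on either side of the ~128-byte in-inode threshold
Ted mentions. It relies on Python's Linux-only os.setxattr, os.getxattr and
os.removexattr.

```python
import os

# Hypothetical attribute name, for illustration only.
XATTR_NAME = "user.repro"


def size_plan(small=64, large=2048):
    """Value sizes covering pattern (a) (small -> large) and part of
    pattern (b) (large -> small).  The defaults are assumptions chosen
    to fall on either side of the ~128-byte in-inode threshold."""
    return [small, large, small]


def exercise(path, sizes, name=XATTR_NAME):
    """Set one xattr on `path` to each size in turn, read it back after
    every step, then delete it.  Returns the lengths read back."""
    observed = []
    for n in sizes:
        os.setxattr(path, name, b"x" * n)
        observed.append(len(os.getxattr(path, name)))
    # Removing the attribute completes pattern (b).
    os.removexattr(path, name)
    return observed
```

Calling exercise(path, size_plan()) on a scratch file on the ext4 volume under
test, in a loop over many files, and then running fsck on the unmounted volume
might be enough to show whether the in-inode/external-block transition is
involved.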

Fyodor, can you take some of the corrupt inos that fsck complained about 
and see what files/directories they are?  find /osd.0 -inum NNN.  (I'm 
guessing the largest xattrs are on the collection directories, like 
/osd.0/current/something_head/.)  Then grep that filename out of the log 
to see exactly which operations took place.  The setattr log normally 
includes xattr size.
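
If there are many inode numbers to look up, a single tree walk may be quicker
than one find per inode.  A rough Python equivalent of repeated
"find /osd.0 -inum NNN" is sketched below; the function name is made up for
this example, and unlike plain find it only reports entries on the same
device as the starting directory, since inode numbers are per filesystem.

```python
import os


def paths_for_inodes(root, inode_numbers):
    """Walk `root` once and return {inode: [paths]} for every requested
    inode number found on the same filesystem as `root`."""
    wanted = set(inode_numbers)
    root_dev = os.lstat(root).st_dev
    found = {}

    def check(path):
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except OSError:
            return  # entry vanished or is unreadable; skip it
        if st.st_dev == root_dev and st.st_ino in wanted:
            found.setdefault(st.st_ino, []).append(path)

    check(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            check(os.path.join(dirpath, name))
    return found
```

The returned paths can then be grepped out of the osd log as described above.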

> Another thing to try might be to format the disk with 128 byte inodes
> (mke2fs -t ext4 -I 128 /dev/hdXX) and see if you can reproduce the
> problem that way.  The support for in-inode xattrs is a new feature
> (to ext4), and so it's a bit more likely that if there is a bug, it's
> related to our in-inode xattr handling --- and using a 128 byte inode
> would suppress that feature.  I don't recommend running that way, of
> course, but it might help tell us if that's where we should be looking
> for a bug.
>
> > (BTW we'll be really happy if/when the large xattr patches from the Lustre 
> > guys make it into mainline!  The (4k?) limit on total xattrs is a problem 
> > for us.)
> 
> OK, good to know.  It hadn't been high priority for the ext4 team
> (since I thought it was only the Lustre folks that really needed it),
> but I'll escalate the priority of that on our todo list.

Wonderful.

Thanks, Ted!
sage

