Re: BUG at fs/inode.c

On Monday 24 October 2011, Yehuda Sadeh Weinraub wrote:
> On Mon, Oct 24, 2011 at 3:39 AM, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
> > we have hit a kernel bug with current ceph-client master (commit
> > a2742a09568f81315e0f30021f29f14e7cd3924b), which I assume to be a Ceph
> > bug.
>
> Is it easily reproducible? What's the scenario?

It is quite easy to reproduce. We run a virtual test cluster with two nodes, 
each running OSD, MDS and MON, but using "max mon = 1".

Cephfs is mounted on both nodes so that they share the same data. Kernel is 
3.0.7 with PaX, RSBAC and ceph-client master. The intention is to have a 
scalable cluster of servers where any number of nodes may fail at any time, 
as long as there are always enough left to keep at least one copy of the data 
and restore redundancy. If it works out as expected, we want to scale to 20 
or even more nodes, depending on the needs of our customers.

> > Kernel is x86-32, Ceph is running on a two node cluster over ext4. The
> > kernel traces are attached, the system dies shortly after these messages.
> > The bug is reproducible. I have not found anything useful in ceph bug
> > tracker when searching for "fs/inode.c".
>
> How many mds servers?

Two, one per node: we run a test cluster with two nodes, each running OSD, 
MDS and MON, but using "max mon = 1".

> > Around fs/inode.c line 1375 mentioned in the trace is the iput()
> > function:
> >
> > void iput(struct inode *inode)
> > {
> >        if (inode) {
> >                BUG_ON(inode->i_state & I_CLEAR);
> >
> >                if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock))
> >                        iput_final(inode);
> >        }
> > }
> >
> > So inode->i_state seems to be incorrect when iput() is called, maybe a
> > double call to iput() or a missing iget() somewhere. Is this really a
> > Ceph bug or have I messed up our kernel code when merging patches?
>
> What patches?

See above. PaX, RSBAC and Ceph master. I have been merging the first two in 
for years now, being the RSBAC main author myself.

> Also, the client logs could help shed light on the issue. You
> should have dynamic debugging turned on (CONFIG_DYNAMIC_DEBUG), and
> something along the lines of:
>
> # mount -t debugfs none /sys/kernel/debug
> # echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
> # echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control

New kernels are building right now. I upgraded to 3.0.8, merged the new 
ceph-client master fix 8ba1683acc83aee4bcab304844f8e60330e5ef1f and enabled 
CONFIG_DYNAMIC_DEBUG. This kernel will go onto two big servers this time to 
give it some real load. Let's see whether I can reproduce the bug there, too. 
If so, I will provide the debug output as requested.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649