On Monday 24 October 2011, Yehuda Sadeh Weinraub wrote:
> On Mon, Oct 24, 2011 at 3:39 AM, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
> > we have hit a kernel bug with current ceph-client master (commit
> > a2742a09568f81315e0f30021f29f14e7cd3924b), which I assume to be a Ceph
> > bug.
>
> Is it easily reproducible? What's the scenario?

It is quite easy to reproduce. We run a virtual test cluster with two nodes,
each running OSD, MDS and MON, but using "max mon = 1". Cephfs is mounted on
both nodes so that they share the same data. The kernel is 3.0.7 with PaX,
RSBAC and ceph-client master.

The intention is to have a scalable cluster of servers where any number of
nodes may fail at any time, as long as enough remain to keep at least one
copy of the data and to restore redundancy. If it works out as expected, we
want to scale to 20 or more nodes, depending on the needs of our customers.

> > Kernel is x86-32, Ceph is running on a two-node cluster over ext4. The
> > kernel traces are attached; the system dies shortly after these messages.
> > The bug is reproducible. I have not found anything useful in the Ceph bug
> > tracker when searching for "fs/inode.c".
>
> How many mds servers?

We run a test cluster with two nodes, each running OSD, MDS and MON, but
using "max mon = 1".

> > Around fs/inode.c line 1375, mentioned in the trace, is the iput()
> > function:
> >
> > void iput(struct inode *inode)
> > {
> >         if (inode) {
> >                 BUG_ON(inode->i_state & I_CLEAR);
> >
> >                 if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock))
> >                         iput_final(inode);
> >         }
> > }
> >
> > So inode->i_state seems to be incorrect when iput() is called, maybe due
> > to a double call to iput() or a missing iget() somewhere. Is this really a
> > Ceph bug, or have I messed up our kernel code when merging patches?
>
> What patches?

See above: PaX, RSBAC and Ceph master. I have been merging the first two for
years now, as I am the main RSBAC author myself.
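To make the iput()/iget() reasoning quoted above a bit more concrete, here is a minimal user-space sketch in C. It is a hypothetical model, not kernel code: struct model_inode, model_iget(), model_iput() and I_CLEAR_MODEL are invented names, and a plain single-threaded counter stands in for the kernel's atomic_dec_and_lock() on i_count. It only illustrates how one unbalanced put (a double iput() or a missing iget()) ends up operating on an inode that has already been cleared, which is the condition the BUG_ON() in iput() checks for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's I_CLEAR flag (model only). */
#define I_CLEAR_MODEL 0x1u

/* Hypothetical user-space model of an inode; NOT the kernel's struct inode. */
struct model_inode {
    int i_count;          /* number of references currently held */
    unsigned int i_state; /* I_CLEAR_MODEL is set once the inode is torn down */
};

/* Models iget(): take one more reference. */
void model_iget(struct model_inode *inode)
{
    /* Grabbing a reference to an already-cleared inode would be a bug too. */
    assert(!(inode->i_state & I_CLEAR_MODEL));
    inode->i_count++;
}

/*
 * Models iput(): drop one reference. Returns 0 on success and -1 where the
 * kernel's BUG_ON(inode->i_state & I_CLEAR) would fire, i.e. a put on an
 * inode whose last reference is already gone. No locking: single-threaded.
 */
int model_iput(struct model_inode *inode)
{
    if (inode == NULL)
        return 0; /* mirrors the "if (inode)" check in iput() */
    if (inode->i_state & I_CLEAR_MODEL)
        return -1; /* unbalanced put: double iput() or missing iget() */
    if (--inode->i_count == 0)
        inode->i_state |= I_CLEAR_MODEL; /* models iput_final() clearing it */
    return 0;
}
```

For example, starting from i_count = 1, one model_iget() followed by three model_iput() calls succeeds twice and then returns -1, the point at which the kernel would hit the BUG_ON(). In the real kernel the inode is freed after iput_final(), so such an extra put is also a use-after-free, and the BUG_ON() may only catch it while the stale i_state still carries I_CLEAR.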
> Also, the client logs could help shed some light on the issue. You
> should have dynamic debugging turned on (CONFIG_DYNAMIC_DEBUG), and
> something along the lines of:
>
> # mount -t debugfs none /sys/kernel/debug
> # echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
> # echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control

New kernels are building right now. I upgraded to 3.0.8, applied the new
ceph-client master fix 8ba1683acc83aee4bcab304844f8e60330e5ef1f and enabled
CONFIG_DYNAMIC_DEBUG. This time the kernel will go onto two big servers to
put it under some real load. Let's see whether I can reproduce the bug there,
too. If so, I will provide debug output as requested.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946
Managing Directors: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649