> On Fri, Sep 3, 2010 at 8:02 AM, Bogdan Lobodzinski <bogdan@xxxxxxxxxxxx> wrote:
>>
>> Hello all,
>>
>> let me continue my troubles, the title can stay the same.
>> As I wrote, my ceph configuration survived my critical test
>> svn co https://root.cern.ch/svn/root/trunk root
>> and suddenly, during the night, at 5 o'clock, ceph became stuck again - without any kind of user activity and no work at all with the /ceph directory.
>> The node is running as
>> mds1, mon1, osd0
>>
>> The system log file reports (the problem starts with the entry
>> "Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale"):
>> --------
>> Sep 1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded
>> Sep 1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
>> Sep 1 13:08:25 h1farm183 kernel: [12650.255052] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep 1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp transport module.
>> Sep 1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp transport module.
>> Sep 1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1 backchannel transport module.
>> Sep 1 14:25:20 h1farm183 kernel: [17265.404967] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep 1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151
>> Sep 1 14:25:26 h1farm183 kernel: [17271.752817] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> ...
>> Sep 1 16:41:51 h1farm183 kernel: [25456.385184] device fsid 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1
>> Sep 1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid 4ea08089-acf1-b738-6f72-96c3ed029b71
>> Sep 1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0 131.169.74.116:6789 session established
>> Sep 2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
>> Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale
>> Sep 2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale
>> Sep 2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start
>> Sep 2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded: kvm]
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332]
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm: ceph-msgr/1 Tainted: P (2.6.32-24-generic-pae #42-Ubuntu) PowerEdge 1950
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[<c01ea907>] EFLAGS: 00010246 CPU: 1
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at kunmap_high+0x97/0xa0
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000 ECX: c0916848 EDX: 00000292
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000 EBP: f5fb3c6c ESP: f5fb3c64
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] c07d9280 f50b10a0 f5fb3c74 c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [<c0138307>] ? kunmap+0x57/0x60
>> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [<f9ad7d54>] ? ceph_pagelist_append+0x54/0x110 [ceph]
...
>> The node was completely stuck.
>> Do you know what could be the reason?

Maybe the following patch fixes it? I'll push a fix to the unstable branch,
let me know if it works for you.

Thanks,
Yehuda

diff --git a/fs/ceph/pagelist.c b/fs/ceph/pagelist.c
index b6859f4..46a368b 100644
--- a/fs/ceph/pagelist.c
+++ b/fs/ceph/pagelist.c
@@ -5,10 +5,18 @@
 
 #include "pagelist.h"
 
+static void ceph_pagelist_unmap_tail(struct ceph_pagelist *pl)
+{
+	struct page *page = list_entry(pl->head.prev, struct page,
+				       lru);
+	kunmap(page);
+}
+
 int ceph_pagelist_release(struct ceph_pagelist *pl)
 {
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
+
 	while (!list_empty(&pl->head)) {
 		struct page *page = list_first_entry(&pl->head, struct page,
 						     lru);
@@ -26,7 +34,7 @@ static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
 	pl->room += PAGE_SIZE;
 	list_add_tail(&page->lru, &pl->head);
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
 	pl->mapped_tail = kmap(page);
 	return 0;
 }
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html