Thanks for the info on the 510-snapshots-per-rbd kernel limit. We're also
wondering what the rbd metadata limits might be. Are the metadata key and
value size limitations listed anywhere? We're planning on using key names
with 64 characters (the same string as the snapshot name) with a JSON
string payload in the value field. So on an rbd with 100 snapshots, we
would also have 100 metadata key/value pairs. The value/data would
probably be at least 100 characters per key.

On Wed, Jul 27, 2016 at 12:26 PM, Victor Payno <vpayno@xxxxxxxxxx> wrote:
> Yes, it has 697 snapshots. The overall average snapshot count per rbd is
> 355. The lowest number is 3 and the highest number is 708.
>
> Interestingly enough, we can't map that rbd anymore.
>
> $ rbd info test-rbd/3f370dbabff91bbb7ff23ae7a96e5cb414cac3408013cefed6d4b627b5eed9c7-willnotmap
> rbd image '3f370dbabff91bbb7ff23ae7a96e5cb414cac3408013cefed6d4b627b5eed9c7-willnotmap':
>         size 9536 MB in 2385 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.11e601379f78e7
>         format: 2
>         features: layering, striping
>         flags:
>         stripe unit: 4096 kB
>         stripe count: 1
>
> When we try to map it, it hangs here:
>
> setsockopt(3, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
> open("/sys/bus/rbd/add_single_major", O_WRONLY) = 4
> write(4, "192.168.2.63:6789,192.168.3.63:6789,"..., 147
>
> Other rbds map fine.
>
> The cluster health is HEALTH_OK.
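To put rough numbers on our metadata plan above, here's a quick back-of-the-envelope sketch. The SHA-256-style snapshot names, the JSON fields, and the `estimate_metadata_bytes` helper are all illustrative assumptions for sizing, not anything rbd defines:

```python
import hashlib
import json

def estimate_metadata_bytes(num_snapshots: int) -> int:
    """Rough size of the proposed per-image metadata: one key/value
    pair per snapshot, 64-char hex key plus a small JSON value."""
    total = 0
    for i in range(num_snapshots):
        # 64-character key, same string as the (hypothetical) snapshot name.
        key = hashlib.sha256(f"snap-{i}".encode()).hexdigest()
        assert len(key) == 64
        # Illustrative JSON payload; real fields would differ, but this
        # lands a bit over the ~100-characters-per-value estimate.
        value = json.dumps({
            "snapshot": key,
            "created": "2016-07-27T12:26:00Z",
            "source": "backup-agent",
        })
        total += len(key) + len(value.encode())
    return total

print(estimate_metadata_bytes(100))  # 20500 bytes, i.e. ~20 KB per image
```

So even with 100 snapshots per image this stays in the tens-of-kilobytes range; if that holds, the pairs would presumably be set one at a time with `rbd image-meta set <image> <key> <value>`.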
>
> From Cliff:
>
> On Tue, Jul 26, 2016 at 8:59 PM, Cliff Pajaro <cpajaro@xxxxxxxxxx> wrote:
>>
>> Sifting through the osd2's log for "d960431b30e2f" I see this:
>>
>>> 2016-07-27 00:47:47.199252 7f1ac7dbb700  1 -- 10.9.228.101:6814/25892 <==
>>> client.1252427 10.9.72.23:0/273648291 13 ==== osd_op(client.1252427.0:19
>>> rbd_header.d960431b30e2f [call lock.lock] 37.a819967
>>> ondisk+write+known_if_redirected e107453) v6 ==== 210+0+51 (4053684278 0
>>> 659001106) 0x55c9ec362000 con 0x55c9f0101600
>>> 2016-07-27 00:47:47.199515 7f1ad3539700  1 -- 10.9.228.101:6814/25892 -->
>>> 10.9.72.23:0/273648291 -- osd_op_reply(19 rbd_header.d960431b30e2f [call
>>> lock.lock] v0'0 uv0 ondisk = -16 ((16) Device or resource busy)) v6 -- ?+0
>>> 0x55c9ee4b8b00 con 0x55c9f0101600
>>
>> On Tue, Jul 26, 2016 at 6:48 PM, Victor Payno <vpayno@xxxxxxxxxx> wrote:
>>>
>>> REQUESTS 1 homeless 0
>>> 322452   osd2   37.a819967    [2]/2   [2]/2   rbd_header.d960431b30e2f   0x400019   3   0'0   call
>>> LINGER REQUESTS
>>> 167      osd2   37.a819967    [2]/2   [2]/2   rbd_header.d960431b30e2f   0x24       2   WC/0
>>>
>>> ceph-osd.2.log.gz has been attached/uploaded to here:
>>> http://tracker.ceph.com/issues/16630
>>>
>>> On Tue, Jul 26, 2016 at 3:31 PM, Victor Payno <vpayno@xxxxxxxxxx> wrote:
>>> > We'll have to run the test again for the OSD log data.
>>> > Forgot to make sure that the ceph log partition wasn't full.
>>> >
>>> > client:
>>> > /sys/kernel/debug/ceph/e88d2684-47c1-5a64-a275-6e375d11b557.client1242818/osdc
>>> >
>>> > REQUESTS 1 homeless 0
>>> > 11475   osd2   37.a819967    [2]/2   [2]/2   rbd_header.d960431b30e2f   0x400019   3   0'0   call
>>> > LINGER REQUESTS
>>> > 91      osd0   37.851f81e1   [0]/0   [0]/0   rbd_header.d94f26cf2eafd   0x24       0   WC/0
>>> > 93      osd0   37.98ca7eab   [0]/0   [0]/0   rbd_header.d94b96a21ce28   0x24       0   WC/0
>>> > 106     osd0   37.9720d758   [0]/0   [0]/0   rbd_header.d94f53a7da731   0x24       0   WC/0
>>> > 104     osd1   37.de8088a1   [1]/1   [1]/1   rbd_header.d94f52e0f9b4d   0x24       0   WC/0
>>> > 105     osd1   37.db9af301   [1]/1   [1]/1   rbd_header.d94f1ed40c97    0x24       0   WC/0
>>> > 14      osd2   37.a819967    [2]/2   [2]/2   rbd_header.d960431b30e2f   0x24       2   WC/0
>>> > 96      osd2   37.8fb9befc   [2]/2   [2]/2   rbd_header.d94f03028192c   0x24       0   WC/0
>>> > 82      osd3   37.370c3798   [3]/3   [3]/3   rbd_header.d94da25e9605b   0x24       0   WC/0
>>> > 87      osd4   37.9c510a15   [4]/4   [4]/4   rbd_header.d94f079de7a55   0x24       0   WC/0
>>> > 85      osd5   37.832091ad   [5]/5   [5]/5   rbd_header.d94f22d15792a   0x24       0   WC/0
>>> > 94      osd5   37.344d5f3    [5]/5   [5]/5   rbd_header.d94f6b30f4a     0x24       0   WC/0
>>> > 103     osd5   37.4cb8bb74   [5]/5   [5]/5   rbd_header.d94ef2bac496b   0x24       0   WC/0
>>> > 77      osd6   37.7c480437   [6]/6   [6]/6   rbd_header.d94c057a9331d   0x24       1   WC/0
>>> > 88      osd7   37.58634cdc   [7]/7   [7]/7   rbd_header.d94da55fd8a34   0x24       1   WC/0
>>> > 98      osd7   37.a61c68b    [7]/7   [7]/7   rbd_header.d94ce7d11bf7f   0x24       0   WC/0
>>> >
>>> > rbd image '3f370dbabff91bbb7ff23ae7a96e5cb414cac3408013cefed6d4b627b5eed9c7':
>>> >         size 9536 MB in 2385 objects
>>> >         order 22 (4096 kB objects)
>>> >         block_name_prefix: rbd_data.d960431b30e2f
>>> >         format: 2
>>> >         features: layering, striping
>>> >         flags:
>>> >         stripe unit: 4096 kB
>>> >         stripe count: 1
>>> >
>>> > osdmaptool: osdmap file '/tmp/osdmap'
>>> > object
>>> > '3f370dbabff91bbb7ff23ae7a96e5cb414cac3408013cefed6d4b627b5eed9c7'
>>> > -> 37.c6 -> [7]
>>> >
>>> > On Tue, Jul 26, 2016 at 3:27 PM, Victor Payno <vpayno@xxxxxxxxxx> wrote:
>>> >> We'll have to run the test again for the OSD log data. Forgot to make
>>> >> sure that the ceph log partition wasn't full.
>>> >>
>>> >> On Tue, Jul 26, 2016 at 11:11 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>> >>>
>>> >>> On Tue, Jul 26, 2016 at 7:58 PM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
>>> >>> > Hi Ilya,
>>> >>> >
>>> >>> > We discovered this weekend that enabling lockdep in the kernel makes
>>> >>> > the issue go away. We are working on reproducing without lockdep, and
>>> >>> > isolating the issue in the OSD logs. We should have OSD debug logs
>>> >>> > this week.
>>> >>>
>>> >>> I'm going to need the "cat /sys/kernel/debug/ceph/*/osdc" output, the
>>> >>> osd log for the osd from that output, and the output of "echo w" and
>>> >>> "echo t" to /proc/sysrq-trigger.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>>                 Ilya

--
Victor Payno
ビクター·ペイン

Sr. Release Engineer
シニアリリースエンジニア

Gaikai, a Sony Computer Entertainment Company ∆○×□
ガイカイ、ソニー・コンピュータエンタテインメント傘下会社
65 Enterprise
Aliso Viejo, CA 92656 USA

Web: www.gaikai.com
Email: vpayno@xxxxxxxxxx
Phone: (949) 330-6850
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html