On Fri, Sep 19, 2014 at 11:22 AM, Micha Krause <micha at krausam.de> wrote:
> Hi,
>
>> I have built an NFS server based on Sebastien's blog post here:
>>
>> http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
>>
>> I'm using kernel 3.14-0.bpo.1-amd64 on Debian wheezy; the host is a VM on
>> VMware.
>>
>> Using rsync, I'm writing data via NFS from one client to this server.
>>
>> The NFS server crashes multiple times per day; I can't even log in to the
>> server then.
>> After a reset, there is no kernel log about the crash, so I guess
>> something is blocking all I/Os.
>
>
> Ok, it seems that I just can't get a shell, but I can run commands via ssh
> directly.

So does it actually crash, or is it just the blocked I/Os? If it doesn't
crash, you should be able to get everything off dmesg.

>
> I was able to get the following information:
>
> dmesg:
>
> [18102.981064] INFO: task nfsd:2769 blocked for more than 120 seconds.
> [18102.981112] Not tainted 3.14-0.bpo.1-amd64 #1
> [18102.981150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [18102.981216] nfsd D ffff88003fc14340 0 2769 2 0x00000000
> [18102.981218] ffff88003bac6e20 0000000000000046 0000000000000000 ffff88003d47ada0
> [18102.981219] 0000000000014340 ffff88003ce31fd8 0000000000014340 ffff88003bac6e20
> [18102.981221] ffff88003ce31728 ffff8800029539f0 7fffffffffffffff 7fffffffffffffff
> [18102.981223] Call Trace:
> [18102.981225] [<ffffffff814eedbd>] ? schedule_timeout+0x1ed/0x250
> [18102.981231] [<ffffffffa04b0f92>] ? _xfs_buf_find+0xd2/0x280 [xfs]
> [18102.981234] [<ffffffff8117fc2c>] ? kmem_cache_alloc+0x1bc/0x1f0
> [18102.981236] [<ffffffff814f193c>] ? __down_common+0x97/0xea
> [18102.981241] [<ffffffffa04b0faa>] ? _xfs_buf_find+0xea/0x280 [xfs]
> [18102.981243] [<ffffffff810aa697>] ? down+0x37/0x40
> [18102.981247] [<ffffffffa04b0e02>] ? xfs_buf_lock+0x32/0xf0 [xfs]
> [18102.981252] [<ffffffffa04b0faa>] ? _xfs_buf_find+0xea/0x280 [xfs]
> [18102.981257] [<ffffffffa04b1215>] ? xfs_buf_get_map+0x35/0x1a0 [xfs]
> [18102.981263] [<ffffffffa04b2153>] ? xfs_buf_read_map+0x33/0x130 [xfs]
> [18102.981269] [<ffffffffa05161da>] ? xfs_trans_read_buf_map+0x34a/0x4f0 [xfs]
> [18102.981275] [<ffffffffa05036f9>] ? xfs_imap_to_bp+0x69/0xf0 [xfs]
> [18102.981281] [<ffffffffa0503bcd>] ? xfs_iread+0x7d/0x3f0 [xfs]
> [18102.981284] [<ffffffff810e8939>] ? make_kgid+0x9/0x10
> [18102.981286] [<ffffffff811b148e>] ? inode_init_always+0x10e/0x1d0
> [18102.981292] [<ffffffffa04ba11a>] ? xfs_iget+0x2ba/0x810 [xfs]
> [18102.981298] [<ffffffffa04fd9a6>] ? xfs_ialloc+0xe6/0x740 [xfs]
> [18102.981305] [<ffffffffa04ca1ee>] ? kmem_zone_alloc+0x6e/0xf0 [xfs]
> [18102.981311] [<ffffffffa04fe083>] ? xfs_dir_ialloc+0x83/0x300 [xfs]
> [18102.981317] [<ffffffffa04c8e43>] ? xfs_trans_reserve+0x213/0x220 [xfs]
> [18102.981323] [<ffffffffa04fe87e>] ? xfs_create+0x4fe/0x720 [xfs]
> [18102.981329] [<ffffffffa04bfd02>] ? xfs_vn_mknod+0xd2/0x200 [xfs]
> [18102.981331] [<ffffffff811a6b54>] ? vfs_create+0xe4/0x160
> [18102.981335] [<ffffffffa0400d9e>] ? do_nfsd_create+0x53e/0x610 [nfsd]
> [18102.981339] [<ffffffffa0407f4d>] ? nfsd3_proc_create+0x16d/0x250 [nfsd]
> [18102.981342] [<ffffffffa03f9d74>] ? nfsd_dispatch+0xe4/0x230 [nfsd]
> [18102.981347] [<ffffffffa035dd64>] ? svc_process_common+0x354/0x690 [sunrpc]
> [18102.981349] [<ffffffff81096ab0>] ? try_to_wake_up+0x280/0x280
> [18102.981353] [<ffffffffa035e3fb>] ? svc_process+0x10b/0x160 [sunrpc]
> [18102.981359] [<ffffffffa03f96d7>] ? nfsd+0xb7/0x130 [nfsd]
> [18102.981363] [<ffffffffa03f9620>] ? nfsd_destroy+0x70/0x70 [nfsd]
> [18102.981365] [<ffffffff81086d6c>] ? kthread+0xbc/0xe0
> [18102.981367] [<ffffffff81086cb0>] ? flush_kthread_worker+0xa0/0xa0
> [18102.981369] [<ffffffff814faecc>] ? ret_from_fork+0x7c/0xb0
> [18102.981371] [<ffffffff81086cb0>] ? flush_kthread_worker+0xa0/0xa0

Is that the only hung task in dmesg?
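
A quick way to check for other hung tasks is to scan the dmesg text for the
kernel's "blocked for more than N seconds" warnings and list the distinct
tasks. The following is only a rough sketch in Python, assuming the warning
format shown in the trace above; the function name `hung_tasks` and the sample
text are illustrative, not part of any existing tool.

```python
import re

# Matches the hung-task warning line as it appears in the trace above, e.g.
# "[18102.981064] INFO: task nfsd:2769 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_text):
    """Return a sorted list of distinct (command, pid) pairs reported as hung."""
    found = set()
    for line in dmesg_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            found.add((match.group("comm"), int(match.group("pid"))))
    return sorted(found)


# Sample taken from the dmesg excerpt quoted in this thread.
SAMPLE = """\
[18102.981064] INFO: task nfsd:2769 blocked for more than 120 seconds.
[18102.981112] Not tainted 3.14-0.bpo.1-amd64 #1
"""

print(hung_tasks(SAMPLE))  # [('nfsd', 2769)]
```

Feeding it the full dmesg output instead of the sample would show whether
nfsd is the only blocked task or whether other tasks are stuck as well.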
>
> iostat:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    1.00   99.00    0.00    0.00
>
> Device:     rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm  %util
> sda           0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
> dm-0          0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
> dm-1          0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
> dm-2          0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
> dm-3          0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
> dm-4          0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
> rbd0          0.00   0.00 0.00 0.00  0.00  0.00     0.00    46.00  0.00    0.00    0.00  0.00 100.00
> rbd1          0.00   0.00 0.00 0.00  0.00  0.00     0.00    12.00  0.00    0.00    0.00  0.00 100.00
> rbd2          0.00   0.00 0.00 0.00  0.00  0.00     0.00   136.00  0.00    0.00    0.00  0.00 100.00
> rbd3          0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
> rbd4          0.00   0.00 0.00 0.00  0.00  0.00     0.00    11.00  0.00    0.00    0.00  0.00 100.00
> rbd5          0.00   0.00 0.00 0.00  0.00  0.00     0.00    57.00  0.00    0.00    0.00  0.00 100.00
> emcpowerig    0.00   0.00 0.00 0.00  0.00  0.00     0.00    32.00  0.00    0.00    0.00  0.00 100.00
> emcpowerhq    0.00   0.00 0.00 0.00  0.00  0.00     0.00    38.00  0.00    0.00    0.00  0.00 100.00
>
> (for some reason rbd6 and rbd7 are shown as emcpower in iostat, no idea why)
>
> cat /sys/kernel/debug/ceph/*/*
>
> have osdmap 39226
> want next osdmap
> epoch 10
> mon0 10.210.32.11:6789
> mon1 10.210.33.11:6789
> mon2 10.210.34.11:6789
> 582277 osd19 37.e52208ae rb.0.1676178.2ae8944a.000000005ffd write
> 582278 osd2 37.f52b9433 rb.0.1676178.2ae8944a.000000006000 write
> 582279 osd2 37.b3f0aae3 rb.0.1676178.2ae8944a.00000000641b write
> 582280 osd28 37.d8768bba rb.0.1676178.2ae8944a.00000000641c write
> 582282 osd29 37.a923b4c6 rb.0.1676178.2ae8944a.000000008032 write
> 582283 osd28 37.a2510620 rb.0.1676178.2ae8944a.000000008034 write
> 582289 osd18 37.c96bc19d rb.0.1345def.2ae8944a.0000001401cf write
> 582290 osd1 37.3edba98c rb.0.165171e.238e1f29.000000039fe3 write
> 582291 osd20 37.89f3f734 rb.0.1676160.238e1f29.0000000002ee write
> 582292 osd20 37.89f3f734 rb.0.1676160.238e1f29.0000000002ee write
> 582293 osd2 37.d34be89b rb.0.1345def.2ae8944a.00000003c961 read
> 582294 osd4 37.c611292f rb.0.1345def.2ae8944a.00000007a19a read
> 582295 osd20 37.b1ae9634 rb.0.1345def.2ae8944a.00000003def7 read
> 582296 osd19 37.3928a207 rb.0.1689cf4.238e1f29.000000034106 write
> 582297 osd15 37.1011005e rb.0.1689d69.238e1f29.000000034614 write
> 582298 osd15 37.1011005e rb.0.1689d69.238e1f29.000000034614 write
> 582299 osd24 37.80a8ba78 rb.0.167614e.2ae8944a.000000026149 write
> 582300 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582301 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582302 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582303 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582304 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582305 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582306 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582307 osd18 37.fe0a5354 rb.0.1689d69.238e1f29.0000000221bb write
> 582308 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582309 osd15 37.1011005e rb.0.1689d69.238e1f29.000000034614 write
> 582310 osd15 37.1011005e rb.0.1689d69.238e1f29.000000034614 write
> 582311 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582312 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582313 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582314 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582315 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582316 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582317 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582318 osd24 37.3c5f1cf8 rb.0.1689d69.238e1f29.0000000221b9 write
> 582319 osd1 37.36930f4c rb.0.1689d69.238e1f29.0000000221ba write
> 582320 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582321 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582322 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582323 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582324 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582325 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582326 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582327 osd2 37.d142a662 rb.0.1689d69.238e1f29.0000000221bc write
> 582328 osd27 37.886d4a41 rb.0.1689d69.238e1f29.0000000221bd write
> 582329 osd19 37.3928a207 rb.0.1689cf4.238e1f29.000000034106 write
> 582330 osd23 37.834e04c8 rb.0.1689cf4.238e1f29.00000002485f write
> 582331 osd2 37.a7fb0062 rb.0.1689cf4.238e1f29.00000002bfea write
> 582332 osd2 37.a7fb0062 rb.0.1689cf4.238e1f29.00000002bfea write
> 582333 osd29 37.e3741a18 rb.0.1689cf4.238e1f29.00000002c0f6 write
> 582334 osd19 37.dec7ad07 rb.0.1689cf4.238e1f29.000000031fe7 write
> 582335 osd19 37.dec7ad07 rb.0.1689cf4.238e1f29.000000031fe7 write
> 582336 osd26 37.3f51e7ac rb.0.1689cf4.238e1f29.0000000320e7 write
> 582337 osd18 37.536a8f6 rb.0.1689cf4.238e1f29.000000033fe6 write
> 582338 osd18 37.536a8f6 rb.0.1689cf4.238e1f29.000000033fe6 write
> 582339 osd18 37.e3d0fce6 rb.0.1689cf4.238e1f29.00000002005f write
> 582340 osd1 37.36930f4c rb.0.1689d69.238e1f29.0000000221ba write
> 582341 osd1 37.36930f4c rb.0.1689d69.238e1f29.0000000221ba write
> 582342 osd1 37.36930f4c rb.0.1689d69.238e1f29.0000000221ba write
> 582343 osd1 37.36930f4c rb.0.1689d69.238e1f29.0000000221ba write
> 582344 osd27 37.886d4a41 rb.0.1689d69.238e1f29.0000000221bd write
> 582345 osd27 37.886d4a41 rb.0.1689d69.238e1f29.0000000221bd write
> 582346 osd27 37.886d4a41 rb.0.1689d69.238e1f29.0000000221bd write
> 582347 osd16 37.9764b7df rb.0.1676178.2ae8944a.000000016392 write
> 582348 osd16 37.9764b7df rb.0.1676178.2ae8944a.000000016392 write
> 582349 osd16 37.9764b7df rb.0.1676178.2ae8944a.000000016392 write
> 582350 osd16 37.9764b7df rb.0.1676178.2ae8944a.000000016392 write
> 582351 osd16 37.9764b7df rb.0.1676178.2ae8944a.000000016392 write
> 582352 osd16 37.9764b7df rb.0.1676178.2ae8944a.000000016392 write
> 582353 osd4 37.1067846f rb.0.1676178.2ae8944a.00000003a43c write
> 582354 osd26 37.ecccc929 rb.0.1676178.2ae8944a.00000003c703 write
> 582355 osd18 37.ed91673e rb.0.1676178.2ae8944a.00000003c704 write
> 582356 osd23 37.58fdaf5a rb.0.1676178.2ae8944a.00000003dfe1 write
> 582357 osd23 37.58fdaf5a rb.0.1676178.2ae8944a.00000003dfe1 write
> 582358 osd23 37.a050d40f rb.0.1676178.2ae8944a.00000003e0a4 write
> 582359 osd1 37.c3f1f60c rb.0.1676178.2ae8944a.00000003e24f write
> 582360 osd18 37.8ad6d326 rb.0.1676178.2ae8944a.00000003e50b write
> 582361 osd28 37.cdf2c9ba rb.0.1676178.2ae8944a.00000003e50c write
> 582362 osd27 37.99cdab71 rb.0.1676178.2ae8944a.00000003e50d write
> 582363 osd27 37.99cdab71 rb.0.1676178.2ae8944a.00000003e50d write
> 582364 osd16 37.1eaec3ff rb.0.1676178.2ae8944a.00000000641a write
> 582365 osd24 37.d881c8e rb.0.1676178.2ae8944a.000000007ffc write
> 582366 osd24 37.d881c8e rb.0.1676178.2ae8944a.000000007ffc write
> 582367 osd24 37.d881c8e rb.0.1676178.2ae8944a.000000007ffc write
> 582368 osd27 37.d054a135 rb.0.1676178.2ae8944a.00000000842c write
> 582369 osd27 37.d054a135 rb.0.1676178.2ae8944a.00000000842c write
> 582370 osd18 37.876fd4d7 rb.0.1676178.2ae8944a.000000008430 write
> 582371 osd19 37.e08b5547 rb.0.1676178.2ae8944a.000000008431 write
> 582372 osd26 37.cd501c rb.0.1676178.2ae8944a.00000000843e write
> 582373 osd26 37.cd501c rb.0.1676178.2ae8944a.00000000843e write
> 582374 osd2 37.c5381d37 rb.0.1676178.2ae8944a.000000015ff5 write
> 582375 osd2 37.c5381d37 rb.0.1676178.2ae8944a.000000015ff5 write
> 582376 osd17 37.82f3d395 rb.0.1676178.2ae8944a.000000016173 write
> 582377 osd19 37.9af0f044 rb.0.1676178.2ae8944a.000000016389 write
> 582378 osd29 37.6d121a86 rb.0.1676178.2ae8944a.00000002004a write
> 582379 osd2 37.ee45629b rb.0.1689d69.238e1f29.00000000484e write
> 582380 osd28 37.b24e1c3a rb.0.1689d69.238e1f29.000000005ffd write
> 582381 osd28 37.b24e1c3a rb.0.1689d69.238e1f29.000000005ffd write
> 582382 osd27 37.56c53271 rb.0.1689d69.238e1f29.00000000618d write
> 582383 osd4 37.a5d8fbaf rb.0.1689d69.238e1f29.000000007ffc write
> 582384 osd4 37.a5d8fbaf rb.0.1689d69.238e1f29.000000007ffc write
> 582385 osd4 37.a5d8fbaf rb.0.1689d69.238e1f29.000000007ffc write
> 582386 osd1 37.947c5bcc rb.0.1689d69.238e1f29.0000000080bd write
> 582387 osd26 37.7ae1d612 rb.0.1689d69.238e1f29.000000008112 write
> 582388 osd18 37.9bf4cf3c rb.0.1689d69.238e1f29.0000000081eb write
> 582389 osd1 37.9d5cf7f0 rb.0.1689d69.238e1f29.0000000081ec write
> 582390 osd19 37.4f750fee rb.0.1689d69.238e1f29.0000000081f0 write
> 582391 osd19 37.3db853ad rb.0.1689d69.238e1f29.000000009ffb write
> 582392 osd19 37.3db853ad rb.0.1689d69.238e1f29.000000009ffb write
> 582393 osd26 37.94b385c rb.0.1689d69.238e1f29.00000000a1d1 write
> 582394 osd28 37.fd40607a rb.0.1689d69.238e1f29.000000022197 write
> 582395 osd15 37.1011005e rb.0.1689d69.238e1f29.000000034614 write
> 582396 osd27 37.c42eab1 rb.0.1689d69.238e1f29.000000036160 write
> 582397 osd2 37.2ac07662 rb.0.1689d69.238e1f29.00000001fffc write
> 582398 osd24 37.80a8ba78 rb.0.167614e.2ae8944a.000000026149 write
> 582399 osd18 37.5c0f9c25 rb.0.167614e.2ae8944a.00000002a133 write
> 582400 osd2 37.89e0f5b3 rb.0.167614e.2ae8944a.00000002ffe8 write
> 582401 osd26 37.9be1215c rb.0.167614e.2ae8944a.00000000042b write
> 582402 osd23 37.a64e7d0f rb.0.167614e.2ae8944a.00000000238f write
> 582403 osd20 37.da28f8ab rb.0.167614e.2ae8944a.000000003ffe write
> 582404 osd24 37.b7af1f40 rb.0.167614e.2ae8944a.000000004005 write
> 582405 osd20 37.cecda9b4 rb.0.167614e.2ae8944a.000000004b8a write
> 582406 osd19 37.29bcdcae rb.0.167614e.2ae8944a.000000005ffd write
> 582407 osd19 37.29bcdcae rb.0.167614e.2ae8944a.000000005ffd write
> 582408 osd27 37.f29b1b01 rb.0.167614e.2ae8944a.000000006242 write
> 582409 osd18 37.d68a6565 rb.0.167614e.2ae8944a.000000007ffc write
> 582410 osd18 37.d68a6565 rb.0.167614e.2ae8944a.000000007ffc write
> 582411 osd24 37.87cc6a8e rb.0.167614e.2ae8944a.0000000082f2 write
> 582412 osd15 37.a91eb70d rb.0.167614e.2ae8944a.000000009ffb write
> 582413 osd15 37.3b20fd05 rb.0.167614e.2ae8944a.00000000a03a write
> 582414 osd2 37.89e0f5b3 rb.0.167614e.2ae8944a.00000002ffe8 write
> 582415 osd2 37.89e0f5b3 rb.0.167614e.2ae8944a.00000002ffe8 write
> 582416 osd26 37.f0bb2a2a rb.0.167614e.2ae8944a.0000000301cc write
> 582417 osd26 37.3b4ebe12 rb.0.167614e.2ae8944a.0000000301cd write
> 582418 osd24 37.346ed74e rb.0.167614e.2ae8944a.000000031fe7 write
> 582419 osd24 37.346ed74e rb.0.167614e.2ae8944a.000000031fe7 write
> 582420 osd27 37.3f46e175 rb.0.167614e.2ae8944a.00000003215c write
> 582421 osd18 37.796dd926 rb.0.167614e.2ae8944a.00000003217a write
> 582422 osd19 37.4287ec84 rb.0.167614e.2ae8944a.000000033fe6 write
> 582423 osd19 37.4287ec84 rb.0.167614e.2ae8944a.000000033fe6 write
> 582424 osd26 37.84b7afe9 rb.0.167614e.2ae8944a.000000034112 write
> 582425 osd24 37.f9a234b8 rb.0.167614e.2ae8944a.000000034113 write
> 582426 osd18 37.6bd0b876 rb.0.167614e.2ae8944a.000000035fe5 write
> 582427 osd18 37.6bd0b876 rb.0.167614e.2ae8944a.000000035fe5 write
> 582428 osd23 37.1ae6123b rb.0.167614e.2ae8944a.00000003611a write
> 582429 osd18 37.8202a597 rb.0.167614e.2ae8944a.00000003611b write
> 582430 osd17 37.4d252e13 rb.0.167614e.2ae8944a.000000037fe4 write
> 582431 osd17 37.4d252e13 rb.0.167614e.2ae8944a.000000037fe4 write
> 582432 osd4 37.7caa0b rb.0.167614e.2ae8944a.00000003800a write
> 582433 osd4 37.5917feaf rb.0.167614e.2ae8944a.0000000380cd write
> 582434 osd29 37.abe267c6 rb.0.167614e.2ae8944a.0000000380cf write
> 582435 osd16 37.dce75fe7 rb.0.167614e.2ae8944a.000000039fe3 write
> 582436 osd16 37.dce75fe7 rb.0.167614e.2ae8944a.000000039fe3 write
> 582437 osd1 37.f04f6991 rb.0.167614e.2ae8944a.00000003a02f write
> 582438 osd1 37.f04f6991 rb.0.167614e.2ae8944a.00000003a02f write
> 582439 osd27 37.33f8b641 rb.0.167614e.2ae8944a.00000003cc91 write
> 582440 osd22 37.be13ad3d rb.0.167614e.2ae8944a.00000003e40d write
> 582441 osd18 37.bbf96366 rb.0.167614e.2ae8944a.000000020035 write
> 582442 osd18 37.bbf96366 rb.0.167614e.2ae8944a.000000020035 write
> 582443 osd1 37.3edba98c rb.0.165171e.238e1f29.000000039fe3 write
> 582444 osd1 37.3edba98c rb.0.165171e.238e1f29.000000039fe3 write
> 582445 osd1 37.3edba98c rb.0.165171e.238e1f29.000000039fe3 write
> 582446 osd16 37.4800911f rb.0.165171e.238e1f29.00000003a085 write
> 582447 osd2 37.bdec20a3 rb.0.165171e.238e1f29.00000003a12c write
> 582448 osd23 37.33d51508 rb.0.165171e.238e1f29.00000003a12d write
> 582449 osd16 37.8a8ec527 rb.0.165171e.238e1f29.00000003a138 write
> 582450 osd17 37.d7435c13 rb.0.165171e.238e1f29.00000003a139 write
> 582451 osd17 37.d7435c13 rb.0.165171e.238e1f29.00000003a139 write
> 582452 osd15 37.7e4b4f9e rb.0.165171e.238e1f29.00000003a13a write
> 582453 osd26 37.ae80f3ac rb.0.165171e.238e1f29.00000003a13b write
> 582454 osd26 37.ae80f3ac rb.0.165171e.238e1f29.00000003a13b write
> 582455 osd15 37.fc3cadcd rb.0.165171e.238e1f29.00000003a13c write
> 582456 osd15 37.fc3cadcd rb.0.165171e.238e1f29.00000003a13c write
> 582457 osd19 37.dcc1f244 rb.0.165171e.238e1f29.00000003a13d write
> 582458 osd28 37.c5ce907a rb.0.165171e.238e1f29.00000003a13e write
> 582459 osd28 37.c5ce907a rb.0.165171e.238e1f29.00000003a13e write
> 582460 osd18 37.d7371b26 rb.0.165171e.238e1f29.00000003a13f write
> 582461 osd18 37.89ec9be5 rb.0.165171e.238e1f29.00000003a140 write
> 582462 osd4 37.5032c82f rb.0.165171e.238e1f29.00000003bfe2 write
> 582463 osd4 37.5032c82f rb.0.165171e.238e1f29.00000003bfe2 write
> 582464 osd26 37.54a4fd50 rb.0.165171e.238e1f29.00000003bfe9 write
> 582465 osd23 37.2929897b rb.0.165171e.238e1f29.00000003c136 write
> 582466 osd23 37.2929897b rb.0.165171e.238e1f29.00000003c136 write
> 582467 osd20 37.b9aff419 rb.0.165171e.238e1f29.00000003dfe1 write
> 582468 osd20 37.b9aff419 rb.0.165171e.238e1f29.00000003dfe1 write
> 582469 osd24 37.685a8638 rb.0.165171e.238e1f29.00000003e08c write
> 582470 osd26 37.adfd8b12 rb.0.165171e.238e1f29.00000003e14a write
> 582471 osd2 37.a67386a2 rb.0.165171e.238e1f29.00000003e14b write
> 582472 osd18 37.688ac754 rb.0.165171e.238e1f29.00000002802c write
> 582473 osd15 37.d2bed74d rb.0.1676160.238e1f29.00000002000d write
> 582474 osd23 37.9c9a1a8f rb.0.165171e.238e1f29.000000020002 write

Have you looked at the Ceph servers? krbd is really just a client, so if
OSDs don't reply to its requests it can't do much. From a quick look this
doesn't look like a krbd bug.
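
To see which OSDs the outstanding requests are waiting on, the in-flight list
above can be tallied per OSD. This is a minimal sketch in Python, assuming the
line layout shown in this dump (tid, osdN, pgid, object name, operation); the
function name `inflight_per_osd` and the sample are illustrative only and not
part of any Ceph tooling.

```python
import re
from collections import Counter

# One in-flight request per line, in the layout seen in the dump above:
#   <tid> osd<N> <pgid> <object name> <operation>
REQ_RE = re.compile(r"^\s*(\d+)\s+osd(\d+)\s+(\S+)\s+(\S+)\s+(\w+)\s*$")


def inflight_per_osd(osdc_text):
    """Count outstanding requests per OSD id; non-matching lines are skipped."""
    counts = Counter()
    for line in osdc_text.splitlines():
        match = REQ_RE.match(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# A few requests copied from the dump in this thread.
SAMPLE = """\
582277 osd19 37.e52208ae rb.0.1676178.2ae8944a.000000005ffd write
582278 osd2 37.f52b9433 rb.0.1676178.2ae8944a.000000006000 write
582279 osd2 37.b3f0aae3 rb.0.1676178.2ae8944a.00000000641b write
582293 osd2 37.d34be89b rb.0.1345def.2ae8944a.00000003c961 read
"""

for osd, pending in inflight_per_osd(SAMPLE).most_common():
    print(f"osd{osd}: {pending} pending")
```

If the counts cluster on a few OSDs, those are the first servers to look at;
if they are spread over all of them, the problem is more likely between the
client and the cluster as a whole.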
> epoch 39226
> flags
> pg_pool 0 pg_num 256 / 255
> pg_pool 1 pg_num 128 / 127
> pg_pool 4 pg_num 32 / 31
> pg_pool 19 pg_num 512 / 511
> pg_pool 25 pg_num 8 / 7
> pg_pool 27 pg_num 1 / 0
> pg_pool 28 pg_num 1 / 0
> pg_pool 29 pg_num 1 / 0
> pg_pool 30 pg_num 1 / 0
> pg_pool 31 pg_num 1 / 0
> pg_pool 32 pg_num 1 / 0
> pg_pool 33 pg_num 1 / 0
> pg_pool 34 pg_num 1 / 0
> pg_pool 35 pg_num 2 / 1
> pg_pool 36 pg_num 1 / 0
> pg_pool 37 pg_num 64 / 63
> pg_pool 40 pg_num 2 / 1
> pg_pool 41 pg_num 1 / 0
> osd0 10.210.33.22:6815 100% (exists, up)
> osd1 10.210.33.22:6800 100% (exists, up)
> osd2 10.210.33.22:6805 100% (exists, up)
> osd3 10.210.32.22:6800 0% (doesn't exist)
> osd4 10.210.33.22:6810 100% (exists, up)
> osd5 10.210.33.22:6820 100% (doesn't exist)
> osd6 10.210.33.22:6805 100% (doesn't exist)
> osd7 10.210.33.22:6825 100% (doesn't exist)
> osd8 10.210.33.22:6830 100% (doesn't exist)
> osd9 10.210.33.22:6835 100% (doesn't exist)
> osd10 10.210.33.22:6805 100% (doesn't exist)
> osd11 10.210.33.22:6845 100% (doesn't exist)
> osd12 10.210.33.22:6850 100% (doesn't exist)
> osd13 10.210.33.22:6855 100% (doesn't exist)
> osd14 10.210.33.22:6860 100% (doesn't exist)
> osd15 10.210.32.23:6800 100% (exists, up)
> osd16 10.210.32.23:6807 100% (exists, up)
> osd17 10.210.32.23:6801 100% (exists, up)
> osd18 10.210.32.23:6816 100% (exists, up)
> osd19 10.210.32.23:6812 100% (exists, up)
> osd20 10.210.34.21:6800 100% (exists, up)
> osd21 10.210.34.21:6804 100% (exists, up)
> osd22 10.210.34.21:6809 100% (exists, up)
> osd23 10.210.34.21:6814 100% (exists, up)
> osd24 10.210.34.21:6819 100% (exists, up)
> osd25 10.210.33.21:6800 100% (exists, up)
> osd26 10.210.33.21:6805 100% (exists, up)
> osd27 10.210.33.21:6810 100% (exists, up)
> osd28 10.210.33.21:6815 100% (exists, up)
> osd29 10.210.33.21:6820 100% (exists, up)
> osd30 10.210.33.22:6865 100% (doesn't exist)
> osd31 10.210.33.22:6870 0% (doesn't exist)
> osd32 10.210.33.21:6800 0% (doesn't exist)
>
> I don't know how to interpret this. The "doesn't exist" lines are correct;
> these OSDs were removed.
> Why are they still known to the rbd client? The OSDs were removed before
> the client was booted.

What procedure did you follow to remove those OSDs?

Thanks,

Ilya