Re: bad sectors on rbd device?


 



I think you are running out of memory, or at least out of memory for the kind of allocation krbd is trying to make.
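
As a rough illustration of why the log points that way, here is a minimal Python sketch that decodes three pieces of the quoted output: the "result -12" on the rbd write lines, the "order:1" in the allocation failure, and the per-zone free-block lists. The helper names are hypothetical, and the 4 kB page size is an assumption that holds for amd64.

```python
import errno
import re

PAGE_KB = 4  # assumption: 4 kB pages, as on amd64


def decode_rbd_result(line):
    """Return the errno name for an 'rbd: ... result -N' log line, or None."""
    m = re.search(r"result (-?\d+)", line)
    if m is None:
        return None
    code = int(m.group(1))
    return errno.errorcode.get(-code) if code < 0 else None


def order_kb(order):
    """Size in kB of a physically contiguous allocation of the given order."""
    return PAGE_KB << order


def free_blocks(buddy_line):
    """Map block size in kB to free block count for a 'Node 0 X: 11*4kB ...' line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", buddy_line)}


# -12 is ENOMEM: the write failed because an allocation failed.
print(decode_rbd_result("rbd: rbd0: write 80000 at 202e777000 result -12"))

# order:1 means two contiguous pages, i.e. 8 kB.
print(order_kb(1))

# The Normal zone has free memory, but only as single 4 kB pages.
normal = free_blocks("Node 0 Normal: 1506*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB")
print(normal[4], normal[8])
```

In the quoted log the DMA32 and Normal totals (13728kB and 6024kB) equal the 4kB block counts times four, so those zones appear to hold no free block large enough for an order:1 request even though some memory is free.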
I'm not going to decode all the logs, but you can try increasing vm.min_free_kbytes as a first step. I assume this is amd64, so there should be no HIGHMEM trouble (I don't remember how to solve that).
It can happen either because the system is under memory pressure (from device drivers and other in-kernel allocations) or because it is too slow to satisfy the allocation request in time (if it's a VM, for example). It could also be caused by a bug in the rbd client, of course...
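
For the min_free_kbytes suggestion, a minimal Python sketch of checking the current value and picking a larger one might look like this. The doubling factor and the 5% of RAM cap are assumptions made for illustration only, not a recommendation from the kernel documentation, and the function names are hypothetical. The example figures come from the quoted log: 7852 kB is the sum of the per-zone "min" watermarks (28 + 3800 + 4024), used here as an approximation of the current setting, and 1028188 pages of RAM at an assumed 4 kB per page gives 4112752 kB.

```python
from pathlib import Path

# Standard procfs location of the sysctl on Linux.
MIN_FREE = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path=MIN_FREE):
    """Read the current vm.min_free_kbytes value in kB."""
    return int(Path(path).read_text().split()[0])


def suggest_min_free_kbytes(current_kb, total_ram_kb, factor=2, cap_fraction=0.05):
    """Hypothetical heuristic: scale the current value, capped at a fraction of RAM.

    Never returns less than the current value.
    """
    cap = int(total_ram_kb * cap_fraction)
    return max(current_kb, min(current_kb * factor, cap))


# Figures taken from the quoted log (see the assumptions above).
suggested = suggest_min_free_kbytes(7852, 1028188 * 4)
print("suggested vm.min_free_kbytes:", suggested)

# On a Linux host, compare against the live value. Applying a new value
# needs root, e.g. via: sysctl -w vm.min_free_kbytes=<value>
if MIN_FREE.exists():
    print("current vm.min_free_kbytes:", read_min_free_kbytes())
```

Raising the value makes the kernel keep more memory free, which leaves less for the page cache, so it is worth increasing in steps and watching whether the allocation failures stop.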

A newer kernel almost always helps with vm troubles like this :-)

Jan


> On 05 Jan 2016, at 14:55, Philipp Schwaha <philipp@xxxxxxxxxxx> wrote:
> 
> Hi List,
> 
> I have an issue with an rbd device on which I created a file system.
> When I copy files to the file system, I get errors about failed writes
> to sectors on the rbd block device.
> I see the following in the log file:
> 
> [88931.224311] rbd: rbd0: write 80000 at 202e777000 result -12
> [88931.224317] blk_update_request: I/O error, dev rbd0, sector 269958072
> [88931.224542] rbd: rbd0: write 80000 at 202e6f7000 result -12
> [88931.225908] rbd: rbd0: write 80000 at 202e677000 result -12
> [88931.226198] rbd: rbd0: write 80000 at 202e7f7000 result -12
> [88931.227501] rbd: rbd0: write 80000 at 202e877000 result -12
> [88931.247151] rbd: rbd0: write 80000 at 202eff7000 result -12
> [88931.247827] rbd: rbd0: write 80000 at 202f077000 result -12
> 
> Looking further I found the following:
> 
> [88931.181608] warn_alloc_failed: 119 callbacks suppressed
> [88931.181616] kworker/2:13: page allocation failure: order:1, mode:0x204020
> [88931.181621] CPU: 2 PID: 7300 Comm: kworker/2:13 Tainted: G W 4.3.3-ge
> [88931.181636] Workqueue: rbd rbd_queue_workfn [rbd]
> [88931.181641] ffff88013c483ae0 ffffffff813656c3 0000000000204020 ffffffff8114c438
> [88931.181645] 0000000000000000 ffff88017fff9b00 0000000000000000 0000000000000000
> [88931.181648] 0000000000000000 0000000000000f12 0000000000244220 0000000000000000
> [88931.181652] Call Trace:
> [88931.181665] [<ffffffff813656c3>] ? dump_stack+0x40/0x5d
> [88931.181670] [<ffffffff8114c438>] ? warn_alloc_failed+0xd8/0x130
> [88931.181673] [<ffffffff8114fa53>] ? __alloc_pages_nodemask+0x2b3/0x9e0
> [88931.181679] [<ffffffff8119752d>] ? kmem_getpages+0x5d/0x100
> [88931.181683] [<ffffffff81199231>] ? fallback_alloc+0x141/0x1f0
> [88931.181686] [<ffffffff8119a4c3>] ? kmem_cache_alloc+0x1e3/0x450
> [88931.181696] [<ffffffffc060df51>] ? ceph_osdc_alloc_request+0x51/0x250 [libceph]
> [88931.181700] [<ffffffffc0648851>] ? rbd_osd_req_create.isra.25+0x51/0x1a0 [rbd]
> [88931.181704] [<ffffffffc064a1f8>] ? rbd_img_request_fill+0x228/0x850 [rbd]
> [88931.181708] [<ffffffffc064b8e9>] ? rbd_queue_workfn+0x2b9/0x3b0 [rbd]
> [88931.181713] [<ffffffff81081dac>] ? process_one_work+0x14c/0x3b0
> [88931.181717] [<ffffffff810826fd>] ? worker_thread+0x4d/0x440
> [88931.181720] [<ffffffff810826b0>] ? rescuer_thread+0x2e0/0x2e0
> [88931.181724] [<ffffffff8108770d>] ? kthread+0xbd/0xe0
> [88931.181727] [<ffffffff81087650>] ? kthread_park+0x50/0x50
> [88931.181731] [<ffffffff8175160f>] ? ret_from_fork+0x3f/0x70
> [88931.181734] [<ffffffff81087650>] ? kthread_park+0x50/0x50
> [88931.181736] Mem-Info:
> [88931.181745] active_anon:57146 inactive_anon:65771 isolated_anon:0
> [88931.181745] active_file:405123 inactive_file:397563 isolated_file:0
> [88931.181745] unevictable:0 dirty:192 writeback:16100 unstable:0
> [88931.181745] slab_reclaimable:28501 slab_unreclaimable:8143
> [88931.181745] mapped:14501 shmem:24976 pagetables:1962 bounce:0
> [88931.181745] free:8824 free_pcp:816 free_cma:0
> [88931.181750] Node 0 DMA free:15436kB min:28kB low:32kB high:40kB
> active_anon:4kB in
> B inactive_file:28kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:15
> kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:48kB
> slab_unreclaima
> tables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
> free_cma:0kB writeback_
> eclaimable? no
> [88931.181758] lowmem_reserve[]: 0 1873 3856 3856
> [88931.181762] Node 0 DMA32 free:13720kB min:3800kB low:4748kB
> high:5700kB active_ano
> B active_file:806264kB inactive_file:776948kB unevictable:0kB
> isolated(anon):0kB isol
> B managed:1921632kB mlocked:0kB dirty:104kB writeback:9384kB
> mapped:35712kB shmem:514
> lab_unreclaimable:12532kB kernel_stack:2224kB pagetables:3464kB
> unstable:0kB bounce:0
> 0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [88931.181769] lowmem_reserve[]: 0 0 1982 1982
> [88931.181773] Node 0 Normal free:6140kB min:4024kB low:5028kB
> high:6036kB active_ano
> kB active_file:814224kB inactive_file:813276kB unevictable:0kB
> isolated(anon):0kB iso
> kB managed:2030320kB mlocked:0kB dirty:664kB writeback:55016kB
> mapped:22292kB shmem:4
> slab_unreclaimable:19964kB kernel_stack:2352kB pagetables:4380kB
> unstable:0kB bounce
> 100kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [88931.181780] lowmem_reserve[]: 0 0 0 0
> [88931.181784] Node 0 DMA: 11*4kB (UEM) 8*8kB (EM) 8*16kB (UEM) 3*32kB
> (UE) 2*64kB (U
> 12kB (EM) 3*1024kB (UEM) 1*2048kB (E) 2*4096kB (M) = 15436kB
> [88931.181801] Node 0 DMA32: 3432*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB
> 0*128kB 0*256k
> 096kB = 13728kB
> [88931.181811] Node 0 Normal: 1506*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB
> 0*128kB 0*256
> 4096kB = 6024kB
> [88931.181823] Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_s
> [88931.181825] 828066 total pagecache pages
> [88931.181828] 386 pages in swap cache
> [88931.181831] Swap cache stats: add 8124, delete 7738, find 374/470
> [88931.181832] Free swap = 16745848kB
> [88931.181834] Total swap = 16777212kB
> [88931.181836] 1028188 pages RAM
> [88931.181837] 0 pages HighMem/MovableOnly
> [88931.181838] 36226 pages reserved
> [88931.181840] 0 pages hwpoisoned
> [88931.181948] rbd: rbd0: write 80000 at 202d8f7000 result -12
> [88931.181952] blk_update_request: 119 callbacks suppressed
> [88931.181955] blk_update_request: I/O error, dev rbd0, sector 269928376
> [88931.182792] kworker/2:13: page allocation failure: order:1, mode:0x204020
> 
> I'm using ceph version 0.94.5. Can this be due to my misconfiguring
> something? Is this not a supported use case?
> Is there something I could do to remedy this or perhaps detect in more
> detail where this comes from?
> 
> thanks & best regards
> 	Philipp
> 
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





