Re: Trouble with rbd

Hi Fyodor,

I'm seeing problems with the latest rbd in my environment as well.  
Looking into it.

sage



On Fri, 6 May 2011, Fyodor Ustinov wrote:
> Hi!
> 
> Latest 64-bit kernel (2.6.39-rc6; the latest Ubuntu 2.6.38-8 kernel behaves the same).
> 
> It looks very much like rbd or libceph is corrupting memory.
> 
> 1. Create the rbd image.
> 
> rbd create tmt --size 102400
> 
> 
> 
> 2. Attach the image and make a filesystem on it.
> 
> modprobe rbd
> 
> echo "xx.xx.xx.xx name=admin rbd tmt" > /sys/bus/rbd/add
> 
> mkfs.ext4 -E lazy_itable_init=0 /dev/rbd0
> 
> mount /dev/rbd0 /mnt
> 
> 
> 
> 3. Start the test: run iozone + bonnie++ + rsync.
> 
> root@stb1:/mnt# iozone -a -n4g -g20g
>         Iozone: Performance Test of File I/O
>                 Version $Revision: 3.373 $
>                 Compiled for 64 bit mode.
>                 Build: linux-AMD64
> [...]
> 
>         Auto Mode
>         Using minimum file size of 4194304 kilobytes.
>         Using maximum file size of 20971520 kilobytes.
>         Command line used: iozone -a -n4g -g20g
>         Output is in Kbytes/sec
>         Time Resolution = 0.000001 seconds.
>         Processor cache size set to 1024 Kbytes.
>         Processor cache line size set to 32 bytes.
>         File stride size set to 17 * record size.
>                                                             random  random    bkwd   record   stride
>               KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
>          4194304      64  120955   66607    25500    28465   14384   89028   42840  5504993    84387   111439   124649
> 
> Error in file: Found ?170aba407000ffc? Expecting ?3a3a3a3a3a3a3a3a? addr 7fb85eb06000
> Error in file: Position 3984547840
> Record # 60799 Record size 64 kb
> where 7fb85eb06000 loop 24576
> root@stb1:/mnt#
> 
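For readers unfamiliar with the iozone message above: it reports that data read back from the file did not match the fill pattern the tool expected (here the byte 0x3a repeated). The following is a minimal sketch of that kind of write-then-verify check, not iozone's actual code. The helper names `write_pattern` and `verify_pattern` are hypothetical.

```shell
#!/bin/sh
# Illustration only: fill a file with one known byte, then count the bytes
# that differ on read-back. This is not how iozone is implemented.

# write_pattern FILE SIZE_KB  (hypothetical helper)
# Fill FILE with SIZE_KB KiB of ':' characters (byte value 0x3a).
write_pattern() {
    head -c "$(( $2 * 1024 ))" /dev/zero | tr '\0' ':' > "$1"
}

# verify_pattern FILE  (hypothetical helper)
# Delete every ':' from the data read back; any byte left over is a mismatch.
verify_pattern() {
    bad=$(( $(tr -d ':' < "$1" | wc -c) ))
    if [ "$bad" -eq 0 ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $bad unexpected byte(s) in $1"
        return 1
    fi
}
```

Usage would look like `write_pattern /mnt/testfile 65536 && verify_pattern /mnt/testfile`, where the path is a placeholder. A small file is likely to be read back from the page cache rather than from the device, so a check like this only exercises the block device when the file is larger than RAM or the caches are dropped first.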
> umount /mnt
> 
> root@stb1:~# fsck -f /dev/rbd0
> fsck from util-linux-ng 2.17.2
> e2fsck 1.41.14 (22-Dec-2010)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 1850570 is in use, but has dtime set.  Fix<y>? /dev/rbd0: e2fsck canceled.
> root@stb1:~#
> 
> 
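The steps above can be collected into one script for repeated runs. This is only a sketch built from the commands quoted in the report. The `run` wrapper, the `reproduce` function, and the `DRY_RUN`, `MON_ADDR`, `IMAGE`, `DEV` and `MNT` variables are assumptions added for illustration. Only the iozone command is included, because the report does not give the bonnie++ or rsync command lines.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report.
# DRY_RUN defaults to 1, so commands are printed, not executed.
# Set DRY_RUN=0 to run them for real (as root, on a disposable test machine).
# MON_ADDR is a placeholder: the report masks the monitor address as xx.xx.xx.xx.
DRY_RUN=${DRY_RUN:-1}
MON_ADDR=${MON_ADDR:-xx.xx.xx.xx}
IMAGE=${IMAGE:-tmt}
DEV=${DEV:-/dev/rbd0}
MNT=${MNT:-/mnt}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. Create the image.
    run rbd create "$IMAGE" --size 102400
    # 2. Attach it and make a filesystem. The sysfs write needs a shell
    #    redirection, so it is wrapped in sh -c.
    run modprobe rbd
    run sh -c "echo '$MON_ADDR name=admin rbd $IMAGE' > /sys/bus/rbd/add"
    run mkfs.ext4 -E lazy_itable_init=0 "$DEV"
    run mount "$DEV" "$MNT"
    # 3. Run the I/O test from inside the mount point.
    run sh -c "cd '$MNT' && iozone -a -n4g -g20g"
    # Afterwards, unmount and check the filesystem.
    run umount "$MNT"
    run fsck -f "$DEV"
}

reproduce
```

With the defaults the script only echoes each command prefixed by `+`, which makes it safe to review before running it with `DRY_RUN=0`.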
> 
> So:
> 
> There are no errors in the logs on stb1.
> 
> But if I format /dev/rbd0 with ocfs2 and run the same test, syslog puts on
> quite a show, with every kind of error you can imagine flying out in all
> directions. For example:
> May  5 22:36:57 stb1 kernel: [13683.907928] BUG: Bad page state in process
> iozone  pfn:5ba7e
> May  5 22:36:57 stb1 kernel: [13683.907936] page:ffffea000140cb90 count:1
> mapcount:0 mapping:          (null) index:0xa0f63
> May  5 22:36:57 stb1 kernel: [13683.907941] page flags: 0x100000000000000()
> May  5 22:36:57 stb1 kernel: [13683.907945] Pid: 2752, comm: iozone Not
> tainted 2.6.39-rc6-ufm #1
> May  5 22:36:57 stb1 kernel: [13683.907947] Call Trace:
> May  5 22:36:57 stb1 kernel: [13683.907953]  [<ffffffff8110e90b>] ?
> dump_page+0x9b/0xd0
> May  5 22:36:57 stb1 kernel: [13683.907956]  [<ffffffff8110ea09>]
> bad_page+0xc9/0x120
> May  5 22:36:57 stb1 kernel: [13683.907959]  [<ffffffff8110f258>]
> get_page_from_freelist+0x6f8/0x7a0
> May  5 22:36:57 stb1 kernel: [13683.907963]  [<ffffffff8118dd74>] ?
> __find_get_block_slow.clone.11+0x44/0x140
> May  5 22:36:57 stb1 kernel: [13683.907967]  [<ffffffff8110fb28>]
> __alloc_pages_nodemask+0x118/0x8a0
> May  5 22:36:57 stb1 kernel: [13683.907971]  [<ffffffff81084517>] ?
> bit_waitqueue+0x17/0xd0
> May  5 22:36:57 stb1 kernel: [13683.907973]  [<ffffffff8108463f>] ?
> wake_up_bit+0x2f/0x40
> May  5 22:36:57 stb1 kernel: [13683.907976]  [<ffffffff8123dfa4>] ?
> do_get_write_access+0x2e4/0x490
> May  5 22:36:57 stb1 kernel: [13683.907979]  [<ffffffff8123c807>] ?
> start_this_handle.clone.7+0x417/0x4a0
> May  5 22:36:57 stb1 kernel: [13683.908000]  [<ffffffffa02b2492>] ?
> ocfs2_inode_cache_unlock+0x12/0x20 [ocfs2]
> May  5 22:36:57 stb1 kernel: [13683.908004]  [<ffffffff81144b05>]
> alloc_pages_current+0xa5/0x110
> May  5 22:36:57 stb1 kernel: [13683.908006]  [<ffffffff81107eef>]
> __page_cache_alloc+0x8f/0xa0
> May  5 22:36:57 stb1 kernel: [13683.908009]  [<ffffffff8110895c>]
> find_or_create_page+0x4c/0xb0
> May  5 22:36:57 stb1 kernel: [13683.908019]  [<ffffffffa0291c1d>]
> ocfs2_write_begin_nolock+0x15dd/0x1bf0 [ocfs2]
> May  5 22:36:57 stb1 kernel: [13683.908032]  [<ffffffffa02b21e0>] ?
> ocfs2_find_actor+0x140/0x140 [ocfs2]
> May  5 22:36:57 stb1 kernel: [13683.908043]  [<ffffffffa0292326>]
> ocfs2_write_begin+0xf6/0x210 [ocfs2]
> May  5 22:36:57 stb1 kernel: [13683.908046]  [<ffffffff81107489>]
> generic_file_buffered_write+0x109/0x250
> May  5 22:36:57 stb1 kernel: [13683.908059]  [<ffffffffa02b1d44>]
> ocfs2_file_aio_write+0x7c4/0x810 [ocfs2]
> May  5 22:36:57 stb1 kernel: [13683.908070]  [<ffffffffa0292663>] ?
> ocfs2_write_end_nolock+0x223/0x380 [ocfs2]
> May  5 22:36:57 stb1 kernel: [13683.908073]  [<ffffffff8115fcb2>]
> do_sync_write+0xd2/0x110
> May  5 22:36:57 stb1 kernel: [13683.908077]  [<ffffffff812ab318>] ?
> apparmor_file_permission+0x18/0x20
> May  5 22:36:57 stb1 kernel: [13683.908080]  [<ffffffff812770dc>] ?
> security_file_permission+0x2c/0xb0
> May  5 22:36:57 stb1 kernel: [13683.908111]  [<ffffffff811600e1>] ?
> rw_verify_area+0x61/0xf0
> May  5 22:36:57 stb1 kernel: [13683.908115]  [<ffffffff81160426>]
> vfs_write+0xc6/0x180
> May  5 22:36:57 stb1 kernel: [13683.908117]  [<ffffffff81160741>]
> sys_write+0x51/0x90
> May  5 22:36:57 stb1 kernel: [13683.908120]  [<ffffffff815d5b8e>] ?
> common_interrupt+0xe/0x13
> May  5 22:36:57 stb1 kernel: [13683.908125]  [<ffffffff815dd902>]
> system_call_fastpath+0x16/0x1b
> 
> May  5 22:37:39 stb1 kernel: [13726.005858] BUG: Bad page state in process
> swapper  pfn:22eff
> May  5 22:37:39 stb1 kernel: [13726.005866] page:ffffea00007a47c8 count:0
> mapcount:-127 mapping:          (null) index:0xd9772
> May  5 22:37:39 stb1 kernel: [13726.005871] page flags: 0x100000000000000()
> May  5 22:37:39 stb1 kernel: [13726.005882] Pid: 0, comm: swapper Tainted: G
> B       2.6.39-rc6-ufm #1
> May  5 22:37:39 stb1 kernel: [13726.005884] Call Trace:
> May  5 22:37:39 stb1 kernel: [13726.005885] <IRQ>  [<ffffffff8110e90b>] ?
> dump_page+0x9b/0xd0
> May  5 22:37:39 stb1 kernel: [13726.005894]  [<ffffffff8110ea09>]
> bad_page+0xc9/0x120
> May  5 22:37:39 stb1 kernel: [13726.005897]  [<ffffffff8110eb4f>]
> free_pages_prepare+0xef/0x100
> May  5 22:37:39 stb1 kernel: [13726.005900]  [<ffffffff81110419>]
> free_hot_cold_page+0x49/0x470
> May  5 22:37:39 stb1 kernel: [13726.005902]  [<ffffffff81113770>]
> __put_single_page+0x20/0x30
> May  5 22:37:39 stb1 kernel: [13726.005904]  [<ffffffff811138bd>]
> put_page+0x2d/0x40
> May  5 22:37:39 stb1 kernel: [13726.005909]  [<ffffffff8152a0d1>]
> __pskb_trim_head+0xd1/0x120
> May  5 22:37:39 stb1 kernel: [13726.005911]  [<ffffffff8152ad70>]
> tcp_trim_head+0x70/0x120
> May  5 22:37:39 stb1 kernel: [13726.005914]  [<ffffffff81526b47>]
> tcp_ack+0x547/0x1de0
> May  5 22:37:39 stb1 kernel: [13726.005916]  [<ffffffff8152c100>] ?
> tcp_write_xmit+0x1d0/0x990
> May  5 22:37:39 stb1 kernel: [13726.005920]  [<ffffffff8114fd60>] ?
> kmem_cache_free+0x20/0x110
> May  5 22:37:39 stb1 kernel: [13726.005923]  [<ffffffff8152940d>]
> tcp_rcv_established+0x44d/0x830
> May  5 22:37:39 stb1 kernel: [13726.005926]  [<ffffffff81530af9>]
> tcp_v4_do_rcv+0x189/0x430
> May  5 22:37:39 stb1 kernel: [13726.005928]  [<ffffffff8153315c>] ?
> tcp_v4_rcv+0x65c/0x900
> May  5 22:37:39 stb1 kernel: [13726.005932]  [<ffffffff8107d05f>] ?
> queue_work+0x1f/0x30
> May  5 22:37:39 stb1 kernel: [13726.005935]  [<ffffffff81533149>]
> tcp_v4_rcv+0x649/0x900
> May  5 22:37:39 stb1 kernel: [13726.005939]  [<ffffffff8150f72d>]
> ip_local_deliver_finish+0xdd/0x2a0
> May  5 22:37:39 stb1 kernel: [13726.005941]  [<ffffffff8150fab8>]
> ip_local_deliver+0x88/0x90
> May  5 22:37:39 stb1 kernel: [13726.005944]  [<ffffffff8150f401>]
> ip_rcv_finish+0x141/0x390
> May  5 22:37:39 stb1 kernel: [13726.005946]  [<ffffffff8150fcdc>]
> ip_rcv+0x21c/0x2e0
> May  5 22:37:39 stb1 kernel: [13726.005950]  [<ffffffff814dbe7b>]
> __netif_receive_skb+0x55b/0x680
> May  5 22:37:39 stb1 kernel: [13726.005953]  [<ffffffff814dc308>]
> netif_receive_skb+0x58/0x80
> May  5 22:37:39 stb1 kernel: [13726.005955]  [<ffffffff814dc470>]
> napi_skb_finish+0x50/0x70
> May  5 22:37:39 stb1 kernel: [13726.005957]  [<ffffffff814dc9e5>]
> napi_gro_receive+0xb5/0xc0
> May  5 22:37:39 stb1 kernel: [13726.005976]  [<ffffffffa003e3eb>]
> e1000_receive_skb+0x5b/0x90 [e1000]
> May  5 22:37:39 stb1 kernel: [13726.005980]  [<ffffffffa0040c3e>]
> e1000_clean_rx_irq+0x25e/0x4b0 [e1000]
> May  5 22:37:39 stb1 kernel: [13726.005983]  [<ffffffffa00417a9>]
> e1000_clean+0x229/0x620 [e1000]
> May  5 22:37:39 stb1 kernel: [13726.005986]  [<ffffffff814dd41d>] ?
> dev_hard_start_xmit+0x2cd/0x6d0
> May  5 22:37:39 stb1 kernel: [13726.005990]  [<ffffffff81036e39>] ?
> default_spin_lock_flags+0x9/0x10
> May  5 22:37:39 stb1 kernel: [13726.005993]  [<ffffffff814dcbe8>]
> net_rx_action+0x128/0x270
> May  5 22:37:39 stb1 kernel: [13726.005996]  [<ffffffff81069ea8>]
> __do_softirq+0xa8/0x1c0
> May  5 22:37:39 stb1 kernel: [13726.005999]  [<ffffffff8102dc72>] ?
> ack_apic_level+0x72/0x1a0
> May  5 22:37:39 stb1 kernel: [13726.006003]  [<ffffffff815deb1c>]
> call_softirq+0x1c/0x30
> May  5 22:37:39 stb1 kernel: [13726.006006]  [<ffffffff8100d385>]
> do_softirq+0x65/0xa0
> May  5 22:37:39 stb1 kernel: [13726.006008]  [<ffffffff8106a23e>]
> irq_exit+0x8e/0xa0
> May  5 22:37:39 stb1 kernel: [13726.006010]  [<ffffffff815df376>]
> do_IRQ+0x66/0xe0
> May  5 22:37:39 stb1 kernel: [13726.006013]  [<ffffffff815d5b93>]
> common_interrupt+0x13/0x13
> May  5 22:37:39 stb1 kernel: [13726.006014] <EOI>  [<ffffffff810360cb>] ?
> native_safe_halt+0xb/0x10
> May  5 22:37:39 stb1 kernel: [13726.006020]  [<ffffffff81012f31>]
> default_idle+0x41/0xe0
> May  5 22:37:39 stb1 kernel: [13726.006023]  [<ffffffff8100a266>]
> cpu_idle+0xa6/0xf0
> May  5 22:37:39 stb1 kernel: [13726.006027]  [<ffffffff815b4dc5>]
> rest_init+0x75/0x80
> May  5 22:37:39 stb1 kernel: [13726.006031]  [<ffffffff81adfc38>]
> start_kernel+0x3e1/0x3ec
> May  5 22:37:39 stb1 kernel: [13726.006034]  [<ffffffff81adf347>]
> x86_64_start_reservations+0x132/0x136
> May  5 22:37:39 stb1 kernel: [13726.006036]  [<ffffffff81adf44c>]
> x86_64_start_kernel+0x101/0x110
> 
> May  5 22:37:39 stb1 kernel: [13726.049368] BUG: Bad page state in process
> rs:main Q:Reg  pfn:7b6e2
> May  5 22:37:39 stb1 kernel: [13726.049376] page:ffffea0001b00170 count:0
> mapcount:-127 mapping:          (null) index:0xda771
> May  5 22:37:39 stb1 kernel: [13726.049381] page flags: 0x100000000000000()
> May  5 22:37:39 stb1 kernel: [13726.049385] Pid: 2683, comm: rs:main Q:Reg
> Tainted: G    B       2.6.39-rc6-ufm #1
> May  5 22:37:39 stb1 kernel: [13726.049387] Call Trace:
> May  5 22:37:39 stb1 kernel: [13726.049388] <IRQ>  [<ffffffff8110e90b>] ?
> dump_page+0x9b/0xd0
> May  5 22:37:39 stb1 kernel: [13726.049397]  [<ffffffff8110ea09>]
> bad_page+0xc9/0x120
> May  5 22:37:39 stb1 kernel: [13726.049399]  [<ffffffff8110eb4f>]
> free_pages_prepare+0xef/0x100
> May  5 22:37:39 stb1 kernel: [13726.049403]  [<ffffffff81110419>]
> free_hot_cold_page+0x49/0x470
> May  5 22:37:39 stb1 kernel: [13726.049405]  [<ffffffff81113770>]
> __put_single_page+0x20/0x30
> May  5 22:37:39 stb1 kernel: [13726.049407]  [<ffffffff811138bd>]
> put_page+0x2d/0x40
> May  5 22:37:39 stb1 kernel: [13726.049411]  [<ffffffff814cecc4>]
> skb_release_data+0xb4/0xe0
> May  5 22:37:39 stb1 kernel: [13726.049413]  [<ffffffff814ced0e>]
> __kfree_skb+0x1e/0xa0
> May  5 22:37:39 stb1 kernel: [13726.049417]  [<ffffffff81526a44>]
> tcp_ack+0x444/0x1de0
> May  5 22:37:39 stb1 kernel: [13726.049420]  [<ffffffff8152bfa6>] ?
> tcp_write_xmit+0x76/0x990
> May  5 22:37:39 stb1 kernel: [13726.049429]  [<ffffffffa014f001>] ?
> put_osd_con+0x11/0x20 [libceph]
> May  5 22:37:39 stb1 kernel: [13726.049431]  [<ffffffff8152940d>]
> tcp_rcv_established+0x44d/0x830
> May  5 22:37:39 stb1 kernel: [13726.049435]  [<ffffffff81530af9>]
> tcp_v4_do_rcv+0x189/0x430
> May  5 22:37:39 stb1 kernel: [13726.049438]  [<ffffffff8153315c>] ?
> tcp_v4_rcv+0x65c/0x900
> May  5 22:37:39 stb1 kernel: [13726.049441]  [<ffffffff81533149>]
> tcp_v4_rcv+0x649/0x900
> May  5 22:37:39 stb1 kernel: [13726.049445]  [<ffffffff8150f72d>]
> ip_local_deliver_finish+0xdd/0x2a0
> May  5 22:37:39 stb1 kernel: [13726.049448]  [<ffffffff8150fab8>]
> ip_local_deliver+0x88/0x90
> May  5 22:37:39 stb1 kernel: [13726.049450]  [<ffffffff8150f401>]
> ip_rcv_finish+0x141/0x390
> May  5 22:37:39 stb1 kernel: [13726.049452]  [<ffffffff8150fcdc>]
> ip_rcv+0x21c/0x2e0
> May  5 22:37:39 stb1 kernel: [13726.049456]  [<ffffffff814dbe7b>]
> __netif_receive_skb+0x55b/0x680
> May  5 22:37:39 stb1 kernel: [13726.049459]  [<ffffffff814dc308>]
> netif_receive_skb+0x58/0x80
> May  5 22:37:39 stb1 kernel: [13726.049461]  [<ffffffff814dc470>]
> napi_skb_finish+0x50/0x70
> May  5 22:37:39 stb1 kernel: [13726.049463]  [<ffffffff814dc9e5>]
> napi_gro_receive+0xb5/0xc0
> May  5 22:37:39 stb1 kernel: [13726.049470]  [<ffffffffa003e3eb>]
> e1000_receive_skb+0x5b/0x90 [e1000]
> May  5 22:37:39 stb1 kernel: [13726.049474]  [<ffffffffa0040c3e>]
> e1000_clean_rx_irq+0x25e/0x4b0 [e1000]
> May  5 22:37:39 stb1 kernel: [13726.049477]  [<ffffffffa00417a9>]
> e1000_clean+0x229/0x620 [e1000]
> May  5 22:37:39 stb1 kernel: [13726.049482]  [<ffffffff81053412>] ?
> update_shares+0xc2/0x110
> May  5 22:37:39 stb1 kernel: [13726.049484]  [<ffffffff814dcbe8>]
> net_rx_action+0x128/0x270
> May  5 22:37:39 stb1 kernel: [13726.049488]  [<ffffffff81069ea8>]
> __do_softirq+0xa8/0x1c0
> May  5 22:37:39 stb1 kernel: [13726.049491]  [<ffffffff8102dc72>] ?
> ack_apic_level+0x72/0x1a0
> May  5 22:37:39 stb1 kernel: [13726.049495]  [<ffffffff815deb1c>]
> call_softirq+0x1c/0x30
> May  5 22:37:39 stb1 kernel: [13726.049497]  [<ffffffff8100d385>]
> do_softirq+0x65/0xa0
> May  5 22:37:39 stb1 kernel: [13726.049500]  [<ffffffff8106a23e>]
> irq_exit+0x8e/0xa0
> May  5 22:37:39 stb1 kernel: [13726.049502]  [<ffffffff815df376>]
> do_IRQ+0x66/0xe0
> May  5 22:37:39 stb1 kernel: [13726.049505]  [<ffffffff815d5b93>]
> common_interrupt+0x13/0x13
> May  5 22:37:39 stb1 kernel: [13726.049506] <EOI>  [<ffffffff811fdece>] ?
> ext4_mark_iloc_dirty+0xae/0x5c0
> May  5 22:37:39 stb1 kernel: [13726.049512]  [<ffffffff811fdea8>] ?
> ext4_mark_iloc_dirty+0x88/0x5c0
> May  5 22:37:39 stb1 kernel: [13726.049515]  [<ffffffff812035ed>] ?
> ext4_dirty_inode+0x3d/0x60
> May  5 22:37:39 stb1 kernel: [13726.049517]  [<ffffffff811fe501>]
> ext4_mark_inode_dirty+0x81/0x210
> May  5 22:37:39 stb1 kernel: [13726.049521]  [<ffffffff81218f59>] ?
> ext4_journal_start_sb+0x69/0x170
> May  5 22:37:39 stb1 kernel: [13726.049525]  [<ffffffff8118cefe>] ?
> __block_commit_write.clone.12+0x8e/0xd0
> May  5 22:37:39 stb1 kernel: [13726.049528]  [<ffffffff812035ed>]
> ext4_dirty_inode+0x3d/0x60
> May  5 22:37:39 stb1 kernel: [13726.049530]  [<ffffffff81186580>]
> __mark_inode_dirty+0x40/0x210
> May  5 22:37:39 stb1 kernel: [13726.049533]  [<ffffffff8118f1ab>]
> generic_write_end+0x6b/0xa0
> May  5 22:37:39 stb1 kernel: [13726.049535]  [<ffffffff8120216b>]
> ext4_da_write_end+0xdb/0x340
> May  5 22:37:39 stb1 kernel: [13726.049538]  [<ffffffff811074fb>]
> generic_file_buffered_write+0x17b/0x250
> May  5 22:37:39 stb1 kernel: [13726.049541]  [<ffffffff810685f6>] ?
> current_fs_time+0x16/0x60
> May  5 22:37:39 stb1 kernel: [13726.049544]  [<ffffffff81109159>]
> __generic_file_aio_write+0x229/0x440
> May  5 22:37:39 stb1 kernel: [13726.049546]  [<ffffffff811093df>]
> generic_file_aio_write+0x6f/0xe0
> May  5 22:37:39 stb1 kernel: [13726.049549]  [<ffffffff811f7739>]
> ext4_file_write+0x69/0x280
> May  5 22:37:39 stb1 kernel: [13726.049553]  [<ffffffff8115fcb2>]
> do_sync_write+0xd2/0x110
> [...]
> and so on.
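When a log contains many of these reports, a per-process summary can make it easier to see which tasks hit a bad page. The sketch below is one possible way to do that. The function name `bad_page_summary` is hypothetical, and it assumes each "Bad page state" header sits on a single line in the real log (the lines quoted above appear to have been wrapped in the mail).

```shell
#!/bin/sh
# Summarize "BUG: Bad page state" reports from a kernel log read on stdin.
# Prints one line per distinct (process, pfn) pair, prefixed by a count.
# Assumption: the report header is on one line in the log being read.
bad_page_summary() {
    grep 'BUG: Bad page state in process' |
        # Capture the process name (which may contain spaces) and the pfn.
        sed -E 's/.*Bad page state in process (.*[^ ]) +pfn:([0-9a-f]+).*/\1 pfn=\2/' |
        sort | uniq -c
}
```

It could be used as `bad_page_summary < /var/log/syslog`, where the log path is an assumption that depends on the distribution's syslog configuration.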
> 
> WBR,
>     Fyodor.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 