Re: rbd map command hangs for 15 minutes during system start up

The kernel is 3.5.7 with the following patches applied, in the order
listed below:

001-libceph_eliminate_connection_state_DEAD_13_days_ago.patch
002-libceph_kill_bad_proto_ceph_connection_op_13_days_ago.patch
003-libceph_rename_socket_callbacks_13_days_ago.patch
004-libceph_rename_kvec_reset_and_kvec_add_functions_13_days_ago.patch
005-libceph_embed_ceph_messenger_structure_in_ceph_client_13_days_ago.patch
006-libceph_start_separating_connection_flags_from_state_13_days_ago.patch
007-libceph_start_tracking_connection_socket_state_13_days_ago.patch
008-libceph_provide_osd_number_when_creating_osd_13_days_ago.patch
009-libceph_set_CLOSED_state_bit_in_con_init_13_days_ago.patch
010-libceph_embed_ceph_connection_structure_in_mon_client_13_days_ago.patch
011-libceph_drop_connection_refcounting_for_mon_client_13_days_ago.patch
012-libceph_init_monitor_connection_when_opening_13_days_ago.patch
013-libceph_fully_initialize_connection_in_con_init_13_days_ago.patch
014-libceph_tweak_ceph_alloc_msg_13_days_ago.patch
015-libceph_have_messages_point_to_their_connection_13_days_ago.patch
016-libceph_have_messages_take_a_connection_reference_13_days_ago.patch
017-libceph_make_ceph_con_revoke_a_msg_operation_13_days_ago.patch
018-libceph_make_ceph_con_revoke_message_a_msg_op_13_days_ago.patch
019-libceph_fix_overflow_in___decode_pool_names_13_days_ago.patch
020-libceph_fix_overflow_in_osdmap_decode_13_days_ago.patch
021-libceph_fix_overflow_in_osdmap_apply_incremental_13_days_ago.patch
022-libceph_transition_socket_state_prior_to_actual_connect_13_days_ago.patch
023-libceph_fix_NULL_dereference_in_reset_connection_13_days_ago.patch
024-libceph_use_con_get_put_methods_13_days_ago.patch
025-libceph_drop_ceph_con_get_put_helpers_and_nref_member_13_days_ago.patch
026-libceph_encapsulate_out_message_data_setup_13_days_ago.patch
027-libceph_encapsulate_advancing_msg_page_13_days_ago.patch
028-libceph_don_t_mark_footer_complete_before_it_is_13_days_ago.patch
029-libceph_move_init_bio__functions_up_13_days_ago.patch
030-libceph_move_init_of_bio_iter_13_days_ago.patch
031-libceph_don_t_use_bio_iter_as_a_flag_13_days_ago.patch
032-libceph_SOCK_CLOSED_is_a_flag_not_a_state_13_days_ago.patch
033-libceph_don_t_change_socket_state_on_sock_event_13_days_ago.patch
034-libceph_just_set_SOCK_CLOSED_when_state_changes_13_days_ago.patch
035-libceph_don_t_touch_con_state_in_con_close_socket_13_days_ago.patch
036-libceph_clear_CONNECTING_in_ceph_con_close_13_days_ago.patch
037-libceph_clear_NEGOTIATING_when_done_13_days_ago.patch
038-libceph_define_and_use_an_explicit_CONNECTED_state_13_days_ago.patch
039-libceph_separate_banner_and_connect_writes_13_days_ago.patch
040-libceph_distinguish_two_phases_of_connect_sequence_13_days_ago.patch
041-libceph_small_changes_to_messenger.c_13_days_ago.patch
042-libceph_add_some_fine_ASCII_art_13_days_ago.patch
043-libceph_set_peer_name_on_con_open_not_init_13_days_ago.patch
044-libceph_initialize_mon_client_con_only_once_13_days_ago.patch
045-libceph_allow_sock_transition_from_CONNECTING_to_CLOSED_13_days_ago.patch
046-libceph_initialize_msgpool_message_types_13_days_ago.patch
047-libceph_prevent_the_race_of_incoming_work_during_teardown_13_days_ago.patch
048-libceph_report_socket_read_write_error_message_13_days_ago.patch
049-libceph_fix_mutex_coverage_for_ceph_con_close_13_days_ago.patch
050-libceph_resubmit_linger_ops_when_pg_mapping_changes_12_days_ago.patch
051-libceph_re_initialize_bio_iter_on_start_of_message_receive_28_hours_ago.patch
052-libceph_protect_ceph_con_open_with_mutex_28_hours_ago.patch
053-libceph_reset_connection_retry_on_successfully_negotiation_28_hours_ago.patch
054-libceph_fix_fault_locking_close_socket_on_lossy_fault_28_hours_ago.patch
055-libceph_move_msgr_clear_standby_under_con_mutex_protection_28_hours_ago.patch
056-libceph_move_ceph_con_send_closed_check_under_the_con_mutex_28_hours_ago.patch
057-libceph_drop_gratuitous_socket_close_calls_in_con_work_28_hours_ago.patch
058-libceph_close_socket_directly_from_ceph_con_close_28_hours_ago.patch
059-libceph_drop_unnecessary_CLOSED_check_in_socket_state_change_callback_28_hours_ago.patch
060-libceph_replace_connection_state_bits_with_states_28_hours_ago.patch
061-libceph_clean_up_con_flags_28_hours_ago.patch
062-libceph_clear_all_flags_on_con_close_28_hours_ago.patch
063-libceph_fix_handling_of_immediate_socket_connect_failure_28_hours_ago.patch
064-libceph_revoke_mon_client_messages_on_session_restart_28_hours_ago.patch
065-libceph_verify_state_after_retaking_con_lock_after_dispatch_28_hours_ago.patch
066-libceph_avoid_dropping_con_mutex_before_fault_28_hours_ago.patch
067-libceph_change_ceph_con_in_msg_alloc_convention_to_be_less_weird_28_hours_ago.patch
068-libceph_recheck_con_state_after_allocating_incoming_message_28_hours_ago.patch
069-libceph_fix_crypto_key_null_deref_memory_leak_28_hours_ago.patch
070-libceph_delay_debugfs_initialization_until_we_learn_global_id_28_hours_ago.patch
071-libceph_avoid_truncation_due_to_racing_banners_28_hours_ago.patch
072-libceph_only_kunmap_kmapped_pages_28_hours_ago.patch
073-rbd_reset_BACKOFF_if_unable_to_re-queue_28_hours_ago.patch
074-libceph_avoid_NULL_kref_put_when_osd_reset_races_with_alloc_msg_28_hours_ago.patch
075-ceph_fix_dentry_reference_leak_in_encode_fh_28_hours_ago.patch
076-ceph_Fix_oops_when_handling_mdsmap_that_decreases_max_mds_28_hours_ago.patch
077-libceph_check_for_invalid_mapping_28_hours_ago.patch
078-ceph_avoid_32-bit_page_index_overflow_28_hours_ago.patch
079-libceph_define_ceph_extract_encoded_string_28_hours_ago.patch
080-rbd_define_some_new_format_constants_28_hours_ago.patch
081-rbd_define_rbd_dev_image_id_28_hours_ago.patch
082-rbd_kill_create_snap_sysfs_entry_28_hours_ago.patch
083-libceph_remove_osdtimeout_option_28_hours_ago.patch
084-ceph_don_t_reference_req_after_put_28_hours_ago.patch
085-libceph_avoid_using_freed_osd_in___kick_osd_requests_28_hours_ago.patch
086-libceph_register_request_before_unregister_linger_28_hours_ago.patch
087-libceph_socket_can_close_in_any_connection_state_28_hours_ago.patch
088-libceph_init_osd-_o_node_in_create_osd_28_hours_ago.patch
089-rbd_remove_linger_unconditionally_28_hours_ago.patch
090-HEAD_ceph_wip-nick-newer_libceph_reformat___reset_osd_28_hours_ago.patch
linux-3.4.4-ignoresync-hack.patch
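
Since the order matters, here is a minimal sketch of how a patch
directory like the one above could be listed in application order. The
function name list_patches_in_order and the directory layout are
hypothetical, and applying with "patch -p1" is an assumption; this is
not necessarily how this kernel was actually built.

```shell
# Hypothetical helper: print patch files in the order they would be applied.
# Numbered patches (NNN-*.patch) come first in ascending order, then any
# remaining *.patch files (e.g. the ignoresync hack).
list_patches_in_order() {
    dir="${1:-.}"
    # Globs expand in sorted order, so zero-padded prefixes sort numerically.
    for p in "$dir"/[0-9][0-9][0-9]-*.patch; do
        if [ -e "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
    for p in "$dir"/*.patch; do
        case "${p##*/}" in
            [0-9][0-9][0-9]-*)
                ;;  # numbered patch, already printed by the first loop
            *)
                if [ -e "$p" ]; then
                    printf '%s\n' "$p"
                fi
                ;;
        esac
    done
}

# Usage from the top of a kernel source tree (assumes -p1 style patches):
#   list_patches_in_order /path/to/patches |
#       while IFS= read -r p; do patch -p1 < "$p" || break; done
```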

Yes, I was only enabling debugging for libceph.  I'm adding debugging
for rbd as well.  I'll do a repro later today when a test cluster
opens up.
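
For reference, turning on dynamic debug for both modules, following the
two echo commands in the quoted message below, might look roughly like
this. The function name enable_ceph_debug and its optional path
argument are hypothetical (the argument exists only so the function can
be tried against a scratch file); it assumes debugfs is mounted at
/sys/kernel/debug and that it is run as root.

```shell
# Hypothetical helper: enable dynamic debug output for libceph and rbd.
# The control file path defaults to the standard debugfs location and can
# be overridden for a dry run against an ordinary file.
enable_ceph_debug() {
    control="${1:-/sys/kernel/debug/dynamic_debug/control}"
    for mod in libceph rbd; do
        # Each write is a separate dynamic debug query, as in the
        # individual echo commands from the quoted message.
        echo "module $mod +p" > "$control" || return 1
    done
}

# Usage (as root, before running "rbd map"):
#   enable_ceph_debug
```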


On Fri, Dec 14, 2012 at 8:46 AM, Alex Elder <elder@xxxxxxxxxxx> wrote:
> On 12/13/2012 01:00 PM, Nick Bartos wrote:
>> Here's another log with the kernel debugging enabled:
>> https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log
>>
>> Note that it hung on the 2nd try.
>
> Just to make sure I'm working with the right code base, can
> you confirm that you're using a kernel built with the equivalent
> of what's now in the "wip-nick-newer" branch (commit id 1728893)?
>
>
> Also, looking at this log I don't think I see any rbd debug output.
> Does that make sense to you?
>
> How are you activating debugging to get these messages?
> If it includes something like:
>
>     echo module libceph +p > /sys/kernel/debug/dynamic_debug/control
>
> it might be that you need to also do:
>
>     echo module rbd +p > /sys/kernel/debug/dynamic_debug/control
>
> This information would be helpful in providing some more context
> about what rbd is doing that's leading to the various messaging
> activity I see in this log.
>
> Please send me a log with that info if you are able to produce
> one.  Thanks a lot.
>
>                                         -Alex