On Wed, 05/07/2017 at 08:26 +0200, Alvaro Miranda wrote:
> Hi
>
> So access to the server is required. If there are other servers on
> the network, netconsole will send the messages that get printed to
> the console to a remote box, the same idea as remote syslog.
>
> Syslog covers the logs that go to the normal log files,
>
> and kdump will be able to generate a memory dump and some logs if
> there is a crash.
>
> With that you should be able to have all the information outside
> this server when there is a hang.
>
> Other tools that can be used are sar/sysstat: you can copy the files
> out for inspection and get an idea of performance and of what was
> doing what before a hang.
>
> Good luck
> alvaro
>
> On 5 Jul 2017, at 03:54, Alexandr <sss@xxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 04/07/2017 at 07:48 -0500, Billy Crook wrote:
> > > You can set up syslog to forward to an external system. That
> > > helps, but sometimes it can miss things if the crash is severe
> > > enough.
> > >
> > > Serial console can capture much more if there's another server
> > > nearby that you can connect and leave minicom logging. Even
> > > better if you have a nice physical "serial console server" like
> > > an OpenGear.
> > >
> > > Most server motherboards these days contain some sort of
> > > management chip implementing IPMI and additional remote access
> > > features, which often include remote KVM over IP.
> > >
> > > Even the most basic IPMI controllers will include serial over
> > > LAN, which lets you use the serial console without a server
> > > nearby to connect a physical serial cable to.
> > >
> > > If the hang is caused by a hardware problem you may find an
> > > indication of it in the IPMI system event log:
> > > 'ipmitool sel list'
> > >
> > > There's also a kernel module, netconsole, that forwards kernel
> > > messages off host in a syslog-compatible format. It implements
> > > IP and UDP in its own code, so it works independently of the
> > > system's TCP stack, which is important during crashes that might
> > > otherwise break networking before the messages get transmitted.
> > >
> > > Those are just a few ways. There are probably others. Hope it
> > > helps.
> > >
> > > > On Jul 4, 2017 07:32, "Alexandr" <sss@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Good day to all.
> > > > Please suggest methods for debugging system hangs on a
> > > > headless server.
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-admin" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > Thanks for the reply.
> > It's a self-built server on regular desktop hardware, so it has no
> > IPMI. Physical access to it is possible, so I guess a serial
> > console is my solution.
> > I also have a few questions about serial console setup. As I
> > understand it, the simplest way is the default serial COM port.
> > I have also read something about serial over USB (it looks like it
> > requires some sort of special adapter).
> >
> > netconsole looks interesting too, I will set it up right now.

I have caught another hang, and this time netconsole gave me a bit more
info. I still do not understand why it hangs.

[64051.691781] cleanupd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[64051.693087] cleanupd cpuset=/ mems_allowed=0
[64051.694287] CPU: 2 PID: 1090 Comm: cleanupd Tainted: G W 4.11.8 #5
[64051.695471] Hardware name: System manufacturer System Product Name/M4A77TD, BIOS 2104 06/28/2010
[64051.696647] Call Trace:
[64051.697802] ? dump_stack+0x46/0x6d
[64051.698885] ? dump_header+0x8f/0x1ff
[64051.700002] ? mem_cgroup_scan_tasks+0x9b/0xc0
[64051.701006] ? oom_kill_process+0x217/0x400
[64051.701972] ? out_of_memory+0xf0/0x280
[64051.702900] ? mem_cgroup_out_of_memory+0x36/0x60
[64051.703800] ? mem_cgroup_oom_synchronize+0x2e3/0x300
[64051.704665] ? __mem_cgroup_insert_exceeded+0x80/0x80
[64051.705499] ? pagefault_out_of_memory+0x1c/0x60
[64051.706301] ? do_page_fault+0x1b/0x60
[64051.707083] ? do_syscall_64+0x5e/0x180
[64051.707839] ? __context_tracking_exit+0x8/0x20
[64051.708572] ? page_fault+0x22/0x30
[64051.709334] Task in /system.slice/smbd.service killed as a result of limit of /system.slice/smbd.service
[64051.710134] memory: usage 524288kB, limit 524288kB, failcnt 186142
[64051.710937] memory+swap: usage 528164kB, limit 9007199254740988kB, failcnt 0
[64051.711753] kmem: usage 7908kB, limit 9007199254740988kB, failcnt 0
[64051.712594] Memory cgroup stats for /system.slice/smbd.service: cache:516380KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:515544KB writeback:0KB swap:3876KB inactive_anon:0KB active_anon:0KB inactive_file:258164KB active_file:258088KB unevictable:0KB
[64051.714413] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[64051.715447] [ 1088]     0  1088    62115      699     121       3      484             0 smbd
[64051.716431] [ 1089]     0  1089    60375      260     116       3      458             0 smbd-notifyd
[64051.717387] [ 1090]     0  1090    60379        0     114       3      458             0 cleanupd
[64051.718372] [25466]  1000 25466    71519      600     128       3      629             0 smbd
[64051.719351] Memory cgroup out of memory: Kill process 25466 (smbd) score 0 or sacrifice child
[64051.720330] Killed process 25466 (smbd) total-vm:286076kB, anon-rss:0kB, file-rss:2400kB, shmem-rss:0kB
[64051.726998] oom_reaper: reaped process 25466 (smbd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[64261.584682] ------------[ cut here ]------------
[64261.588370] WARNING: CPU: 0 PID: 1033 at net/ipv4/tcp_input.c:2820 tcp_fastretrans_alert+0x8db/0xac0
[64261.592190] Modules linked in: algif_skcipher af_alg netconsole veth ccm sch_pie sctp act_mirred ifb sch_ingress nf_conntrack_netlink nfnetlink cls_u32 sch_sfq sch_htb sit tunnel4 ip_tunnel nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6_tables iptable_raw ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TCPMSS xt_sctp ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_recent xt_conntrack nf_conntrack xt_multiport iptable_filter iptable_mangle ip_tables x_tables radeon ath9k led_class ath9k_common ath9k_hw i2c_algo_bit ttm drm_kms_helper mac80211 sch_fq_codel ath cfbfillrect cfg80211 syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm br_netfilter bridge snd_hda_codec_via snd_hda_codec_generic snd_hda_intel snd_hda_codec
[64261.622008] stp backlight llc rfkill r8169 xhci_pci snd_usb_audio snd_hwdep snd_usbmidi_lib snd_hda_core snd_rawmidi ohci_pci xhci_hcd snd_seq_device ohci_hcd parport_pc vhost_net snd_pcm i2c_piix4 snd_timer tun mii btrfs acpi_cpufreq button asus_atk0110 snd soundcore vhost processor tap xor kvm_amd kvm irqbypass gspca_zc3xx gspca_main v4l2_common k10temp hwmon raid6_pq uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_core i2c_core parport fbcon nfsd auth_rpcgss oid_registry nfs_acl bitblit lockd softcursor fb grace sunrpc fbdev font ipv6 autofs4
[64261.647404] CPU: 0 PID: 1033 Comm: ml1 Tainted: G W 4.11.8 #5
[64261.652695] Hardware name: System manufacturer System Product Name/M4A77TD, BIOS 2104 06/28/2010
[64261.657986] Call Trace:
[64261.663153] <IRQ>
[64261.668267] ? dump_stack+0x46/0x6d
[64261.673321] ? __warn+0xb4/0xe0
[64261.678290] ? tcp_fastretrans_alert+0x8db/0xac0
[64261.683298] ? tcp_ack+0xd58/0x1300
[64261.688302] ? tcp_rcv_established+0xf4/0x6a0
[64261.693313] ? tcp_v4_do_rcv+0x115/0x200
[64261.698314] ? tcp_v4_rcv+0xaef/0xb80
[64261.703302] ? nf_nat_ipv4_fn+0x53/0x1a0 [nf_nat_ipv4]
[64261.708324] ? ip_local_deliver_finish+0x87/0x1e0
[64261.713352] ? ip_local_deliver+0x3d/0xc0
[64261.718349] ? inet_del_offload+0x40/0x40
[64261.723318] ? ip_rcv+0x281/0x380
[64261.728270] ? ip_local_deliver_finish+0x1e0/0x1e0
[64261.733261] ? __netif_receive_skb_core+0x49b/0x9c0
[64261.738259] ? netif_receive_skb_internal+0x1a/0x80
[64261.743261] ? ifb_ri_tasklet+0x16a/0x240 [ifb]
[64261.748188] ? tasklet_action+0x8c/0xa0
[64261.753022] ? __do_softirq+0xd4/0x200
[64261.757817] ? irq_exit+0xe7/0x100
[64261.762552] ? do_IRQ+0x45/0xc0
[64261.767163] ? common_interrupt+0x89/0x89
[64261.771654] </IRQ>
[64261.776269] ---[ end trace fa5d523840b633e2 ]---
[64552.218363] systemd[1]: systemd-journald.service: State 'stop-sigabrt' timed out. Terminating.

So what happened here? If I understand correctly, samba started using
more memory than its service is allowed (the 512 MB limit on
smbd.service), and the kernel's memory cgroup OOM killer killed an smbd
process to enforce the limit (that part should be OK). But after this
we get

"[64552.218363] systemd[1]: systemd-journald.service: State 'stop-sigabrt' timed out. Terminating."

and a hung system.

"WARNING: CPU: 0 PID: 1033 at net/ipv4/tcp_input.c:2820 tcp_fastretrans_alert+0x8db/0xac0"
is, I think, an upstream regression and unrelated.

Any suggestions? Does this look like a systemd problem?
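
To see where the 512 MB went, the "Memory cgroup stats" line from the
report above can be split into its counters. This is only a minimal
Python sketch, assuming the "name:123KB" format shown in this capture;
the helper names parse_cgroup_stats and largest_counters are made up
for illustration and are not part of any existing tool.

```python
import re

# Stats line copied from the netconsole capture in this message.
STATS_LINE = (
    "Memory cgroup stats for /system.slice/smbd.service: "
    "cache:516380KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:515544KB "
    "writeback:0KB swap:3876KB inactive_anon:0KB active_anon:0KB "
    "inactive_file:258164KB active_file:258088KB unevictable:0KB"
)


def parse_cgroup_stats(line):
    """Return {counter_name: kilobytes} for every 'name:123KB' pair in line."""
    return {name: int(kb) for name, kb in re.findall(r"(\w+):(\d+)KB", line)}


def largest_counters(stats, top=3):
    """Return the `top` (name, kilobytes) pairs, largest value first."""
    return sorted(stats.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    stats = parse_cgroup_stats(STATS_LINE)
    for name, kb in largest_counters(stats):
        print(f"{name:>14} {kb:>8} kB")
```

In this capture the cache and dirty counters are both close to the
limit while rss is 0KB. That may mean the limit was filled by file data
waiting for writeback rather than by smbd's own heap, but this is a
guess from the counters, not something the log states.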