Re: workstation has become ill

If it were a failing or weak power supply, the machine would just
crash; nothing slows down gracefully when that happens.

An Nvidia GPU will usually crash the machine if it overstresses the
power supply, and it will also crash the machine if the card itself
goes bad.

Overheating, on the other hand, may cause the CPUs to throttle, and
that can make the machine feel rather sluggish. I would not expect
delays of minutes from that, though, unless the machine normally
carries a large CPU load.
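
A rough way to look for throttling is to compare each core's current
clock against its maximum. The sketch below assumes the kernel
exposes the cpufreq files scaling_cur_freq and cpuinfo_max_freq under
/sys/devices/system/cpu; the function name is made up for
illustration. A low ratio on an idle machine is just normal frequency
scaling, so this only means something while the machine is busy.

```python
from pathlib import Path


def cpu_freq_ratios(base="/sys/devices/system/cpu"):
    """Return {cpu_name: current_khz / max_khz} for CPUs exposing cpufreq.

    Hypothetical helper: 'base' is a parameter only so the function can
    be pointed at a test directory.
    """
    ratios = {}
    for cpufreq in sorted(Path(base).glob("cpu[0-9]*/cpufreq")):
        try:
            # Both files hold a single integer in kHz.
            cur = int((cpufreq / "scaling_cur_freq").read_text())
            top = int((cpufreq / "cpuinfo_max_freq").read_text())
        except (OSError, ValueError):
            continue  # file missing or unreadable on this CPU, skip it
        if top > 0:
            ratios[cpufreq.parent.name] = cur / top
    return ratios
```

Calling cpu_freq_ratios() while the machine is under load and seeing
every core well below 1.0 would be a hint, not proof, that something
is holding the clocks down.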

Install the package called perf, and the next time this happens see
if you can run "perf top". It shows which functions, including those
inside the kernel, are consuming CPU time and how much each is
taking. Note that on machines with a large number of cores, the
ondemand power saving setting that adjusts the clock speed is
expensive to run. That shows up as significant system time, and it
will be visible in perf top.

If you don't have it installed, install sar (from the sysstat
package) or a similar tool that will give you some idea of what the
system saw leading up to the problem and during it. I usually set
sar to sample every 1 minute instead of the default 10 minutes; that
change is made through systemd, and google knows how.

The other things that will crush a machine and aren't obvious are
applications creating processes at a high rate and/or applications
mapping and unmapping a lot of memory. Both also show up as system
time and leave a footprint in perf top. So, while the machine is
running well, note the ratio of user to system time: user at 5x
system or higher is normal, and a ratio that drops much below 5x
often indicates one of the issues above.

sar will show CPU usage (user/system/...), disk response times, a
lot of raw network and TCP/UDP stats, process creation rates, and
memory allocation and paging rates.
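
If sar is not set up yet, the user-to-system ratio and the process
creation rate can be approximated from two readings of /proc/stat.
This is only a sketch: the function names are made up, "user" here is
user plus nice ticks, and "system" is the plain system field (it does
not include irq/softirq time), so the numbers will not match sar
exactly.

```python
import time


def parse_proc_stat(text):
    """Return (user_ticks, system_ticks, forks) from /proc/stat contents."""
    user = system = forks = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":
            # Aggregate line: cpu user nice system idle iowait ...
            user = int(fields[1]) + int(fields[2])
            system = int(fields[3])
        elif fields[0] == "processes":
            # Number of forks since boot.
            forks = int(fields[1])
    return user, system, forks


def compare(before, after, seconds):
    """Return (user/system ratio, forks per second) between two samples."""
    d_user = after[0] - before[0]
    d_system = after[1] - before[1]
    d_forks = after[2] - before[2]
    ratio = d_user / d_system if d_system > 0 else float("inf")
    return ratio, d_forks / seconds


def measure(seconds=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, 'seconds' apart, and compare."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return compare(before, after, seconds)
```

Running measure() once while the machine is healthy gives a baseline;
a ratio well under 5 or a sudden jump in forks per second during a
slow spell would point at the causes described above.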



On Fri, Feb 5, 2021 at 6:09 AM Neal Becker <ndbecker2@xxxxxxxxx> wrote:
>
> I've been running F32 on a shiny new amd dual epyc workstation for
> about 1 year.  The system is now remote to me and not convenient to
> access.
>
> About 1 week ago the system became unresponsive.  I noticed errors
> logged about I/O errors, so I guessed it was an issue with the SSD.  I
> went there and replaced the SSD with a shiny new samsung 1tb.
> Reinstalled F33 and got my vpns going so I could access again from
> home.
>
> But things are acting very strangely.  Install was lightning fast.
> But after a while the machine becomes unusable.  Any command takes
> minutes to react.  I am unable to reboot it.  sudo reboot after a very
> long time does nothing.
> I don't see anything interesting in /var/log/messages (I installed rsyslog).
> When I can eventually get top to run, I see systemd is in D state.
> There is plenty of free memory, and the machine has 64GB.
>
> I'm going to visit again and this time yank out the nvidia gpu.  This
> is just a wild guess based on 1) it isn't critical for use right now
> 2) it places a load on the power supply just in case that's the issue
> 3) it's the only thing I can think to try.
>
> Just wondering if anyone has any thoughts on how to troubleshoot this.
> _______________________________________________
> users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx