Saturday, March 25, 2017

ESXi 6 hangs/freezes up

Troubleshooting is so fun when you have made tens of changes at once, haha. My ol' trusty ESXi 6 HP Z800 machine started to hang after a day or so of operation. The ESXi logs are not much help at first: you find what look like red alarms, but after lots of googling they turn out to be normal log entries. Skipping over to the vmkwarning log, though, I got a lot of the following (copied and pasted from another forum member's issue, as I was too lazy to SSH into the system again):
Lost access to volume 4bcce772-3bfe7a35-dceb-001b21541d90 (1_5WD_1_) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

And then:
Successfully restored access to volume 4bcce772-3bfe7a35-dceb-001b21541d90 (1_5WD_1_) following connectivity issues.
info
4/21/2010 1:16:16 PM
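
To get a feel for how often the datastore drops out, the two messages quoted above can be counted per volume. This is only a rough sketch: the function name `count_volume_events` is made up for illustration, and it assumes you have copied the relevant log off the host as plain text and that the entries contain the same wording as the quotes above (real lines may carry extra prefixes such as timestamps, which is why the patterns search rather than match from the start of the line).

```python
import re
from collections import Counter

# Patterns taken from the wording of the two quoted messages.
# The volume UUID is assumed to be the first token after "volume".
LOST = re.compile(r"Lost access to volume\s+(\S+)")
RESTORED = re.compile(r"Successfully restored access to volume\s+(\S+)")


def count_volume_events(lines):
    """Return (lost, restored) Counters keyed by volume UUID."""
    lost, restored = Counter(), Counter()
    for line in lines:
        m = LOST.search(line)
        if m:
            lost[m.group(1)] += 1
            continue
        m = RESTORED.search(line)
        if m:
            restored[m.group(1)] += 1
    return lost, restored
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines. A volume with more "lost" than "restored" events would be the one to look at first.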

According to a few forum posts, the cause might be a failing HDD, overheating, or heartbeat settings that need tweaking (IMO the latter is a band-aid over the symptoms of something that needs fixing).

In denial that it is already time to replace the WD Black 1TB that holds the datastore, I thought about what has changed on my system:
- put some buffer material along my software RAID HDDs to reduce noise, but this could be reducing their airflow too
- pulled out a Hauppauge HVR card from a defunct MythTV build
- pulled out 24 gigs of 'original' RAM and put in 48 gigs of eBay ECC RAM

Software side
- upgraded to FreeNAS 10
- Ubuntu server with Plex Media Server, with the FreeNAS 10 CIFS Movies share mounted via fstab
- new Splunk server on CentOS 7

Oh joy, plenty to look into. After lots of forum reading and poking around, I found that the Ubuntu Plex server was getting hung up, with the kswapd process in particular eating all of the resources. Typical Linux forums lead one on a chase: is it the kernel? A kind of known yet unknownish bug? Or are you, the installer, just dumb? What I settled on is that I didn't give the Ubuntu server enough RAM to run Plex effectively (though I watched two movies without issue, it spins out of control randomly at idle). Hopefully that is the fix, or else I will just have to start over with a CentOS server build (Ubuntu is showing its desktop-user bias, as it has not been good at 'services' jobs like running a Splunk server or, in this case, Plex). This thread doesn't give me much optimism:
http://serverfault.com/questions/316560/how-do-i-tell-what-process-is-causing-kswapd-to-be-in-use/316636
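
Since my working theory is simply "not enough RAM", one quick sanity check is to look at how much memory the guest thinks it has left. Below is a minimal sketch that parses `/proc/meminfo`-style text. The helper names and the 10% threshold are my own arbitrary choices for illustration, not tuned values, and it assumes a kernel new enough to report `MemAvailable`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_pressure(info, threshold=0.10):
    """Return True when MemAvailable is below `threshold` of MemTotal.

    The default threshold is an arbitrary illustration, not a recommendation.
    """
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable", 0)
    if total == 0:
        return False  # missing data: do not claim pressure
    return available / total < threshold
```

On the Plex VM this could be fed the contents of `/proc/meminfo` read at intervals. If `memory_pressure` keeps returning True while kswapd is busy, that would at least be consistent with the too-little-RAM theory, though it would not prove it.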
