Hi folks,
So, as you can see, the server is back up!
Here's the story, in case any fellow geeks are interested:
On the evening of December 17th, I noticed that the server wasn't responding. Unfortunately, it was already after hours at the colocation facility, and I was hosting an event the following day, so it was Sunday by the time I got into the server room. When I got there, the console was displaying disk errors and the operating system was hung. When I rebooted the server, two drives in the RAID-6 array were rebuilding, so I left it to rebuild. I came back a couple of hours later to find that the rest of the disks had "disappeared" from the array. I put "disappeared" in quotes because the drives would come back online for a few seconds within the RAID BIOS, then disappear, then reappear, and so on, without any discernible pattern.
I contacted the server manufacturer for hardware support the following Monday (the 20th) and outlined these issues. I had actually had very similar issues with this server back in May, which were on their records. Despite that, and despite my belief that the drives themselves weren't at fault, I had to go through the process of sending them logs, running tests, trying to "retag" the drives, and so on until late last week (around the 30th), when a technician came out to replace the server's entire SCSI chain (which didn't work) and, the following day, to replace the risers for the RAID card (which seems to have worked).
After spending a day testing the hardware configuration (drive verification tests, etc.), I reinstalled the OS and restored everything from my off-site backups. There are a few more things I need to get running, but I hope the server is working well for everyone again!
If you run into any issues, please use the contact us page and let me know the details.
Thanks, everyone, for your support, concern, and thoughts!
Best,
-- Jun