Tuesday, April 01, 2014

Bad Power Supply! >smack<

Now in the closing phase of a hardware crisis involving my server system, I feel the need to blog and record what happened. It started a couple of weeks ago, when the server started randomly powering off for no reason. The first time, I figured it for a glitch and simply powered it back on. The local power has had some brownouts lately, but my UPS is set on the sensitive side and switches to battery power as needed. My next thought was, maybe it was overheating, since we live in a dusty area and I hadn't dusted out the server case in some time. It didn't seem overly hot inside, but I dusted it out anyway. The problem recurred, so I dusted it some more. The secondary display adapter for the second monitor failed to initialize, so I removed the card and dusted it out as well. After this the system remained stable for eight days so I hoped that I had fixed the problem.

At this point I should mention my file systems and raid arrays. All of my file systems had been converted to EXT3 except for the root partition, and this had to be preened after each failure, necessitating a secondary reboot. Long ago I had decided that EXT2 worked just fine with no need to upgrade, but I re-thought that at some point since my raid partition is now quite huge, and preening it after a crash is a lengthy operation. I had upgraded all partitions except the system disk, since I wasn't sure if I could upgrade a mounted partition. Turns out I can, and so I did, so at least the reboots are fairly uneventful.

However I was not so lucky with the raid. After the first two power-offs, the array came up clean. But ever since then, it was coming up dirty, needing a three-hour re-sync process. This was starting to concern me, but in hindsight this doesn't appear to have been an issue.

Anyways, after eight days the problems resumed. Initially the power-offs were about every twenty-four hours, but now they were happening with greater frequency. Finally last night while I was at work, it failed again, and when I got home I was unable to restart it. I had already figured that the power supply was the likely culprit and had planned to replace it this morning with a newer one of the same model, borrowed from my disused Windows XP system. Instead, I went ahead and replaced the power supply last night.

This should have been the end of my troubles, except for one thing. These power supplies have both the SATA-style connectors as well as the older style. However the newer version has more SATA power connectors and fewer of the legacy kind. Support for legacy IDE drives is eroding fast, and non-existent on the newer motherboards. I really need to migrate away from legacy technology but I haven't, for reasons of a monetary nature. So at any rate I worked from top to bottom plugging in devices, and when I get to the bottom I have one hard drive with no power.

I rummage in my closet looking for power splitters. I used to have a lot of these but I can find only one, and it's not in very good shape. I use it anyway. The front intake fan that blows over the hard drives is piggy-backed off one of the drives, and it's running so I figure the power is good. Actually it's not, and the fan uses a different voltage from the drives, so this was not a good test.

I try to boot the system and it can't see the IDE drives at all. Now, when I built this system I made a poor selection in the motherboard department, and bought one that had only one IDE header along with four SATA connectors. Of course, my system has four IDE devices counting the CD and DVD drives. So I bought a JMicron adapter to provide the additional IDE support that I would need. At that time I was running an older version of Linux which could not see both IDE buses for some reason. The JMicron bus caused the one on the motherboard to be invisible, except during the installation process, when it was able to install Slackware without a problem. I subsequently upgraded Slackware and this problem went away.

My first thought is the JMicron card has failed somehow, so I unplug the IDE cable to the CD/DVD drives, and plug in the hard drive bus in its place. This results in the system hanging during the power-up sequence, during drive detection. I restore the IDE buses to their original places and boot the system. Slackware comes up, albeit without the two IDE drives, and hence is unable to start the raid array.

My memory of the sequence of events is a little muddled since it was late at night, I had just worked a long cashier shift, and I was feeling unusually tired on top of that. At any rate I attempted to boot Slackware on multiple occasions with varying degrees of success. At one point I was getting hard drive error messages, and either the boot process got hung, or it got to the login prompt but I was unable to log in. It prompted for the username but not the password, and then after a pause returned to the username prompt. Ctrl-Alt-Del shutdown would not work since it couldn't execute /sbin/shutdown for some reason.

I got past this impasse somehow, booted the system with everything except the IDE drives, and went to bed.

In the morning I worked on getting a temporary desktop operational. My normal desktop resides on the raid partition, which is mounted as /home. This partition is shared via NFS and Samba. My satellite Linux systems use NFS to mount this as server3:/home. server3 is the third in a line of server systems, starting with a 486 running Windows NT Server, then a Pentium 4 running Windows 2000 Server, and finally the current Linux implementation. I have a feeling I will be wanting to break ground on a new server soon.

Since the laptops might be away from home and away from the /home directory, I created the concept of /away folders, residing on the local drives of satellite systems. User directories are created here, and symbolic links in the /home/ directories point here. Also, in case the "real" /home directory is not mounted, the mount point in the root partition, when not in use as a mount point, contains symbolic links to the user directories in /away. This allows users to log in without the server being present. For the sake of completeness, I created a /away directory on the server as well.
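The fallback half of the /away scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works inside a scratch directory standing in for the filesystem root, the user names are made up, and the function name build_away_layout is hypothetical. It shows the idea that the unmounted mount point holds symbolic links which are simply hidden once the real /home is mounted over it.

```python
import os
import tempfile


def build_away_layout(root, users):
    """Create <root>/away/<user> plus fallback symlinks in <root>/home.

    `root` stands in for "/" so the layout can be tried safely in a
    scratch directory. The user names passed in are hypothetical.
    """
    away = os.path.join(root, "away")
    home = os.path.join(root, "home")  # the mount point, currently unmounted
    os.makedirs(away, exist_ok=True)
    os.makedirs(home, exist_ok=True)
    for user in users:
        os.makedirs(os.path.join(away, user), exist_ok=True)
        link = os.path.join(home, user)
        if not os.path.lexists(link):
            # Relative target: home/<user> -> ../away/<user>.
            # Mounting the real /home over the mount point hides these links.
            os.symlink(os.path.join("..", "away", user), link)
    return home


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        home = build_away_layout(scratch, ["alice", "bob"])
        for name in sorted(os.listdir(home)):
            print(name, "->", os.readlink(os.path.join(home, name)))
```

On a real system the same layout would be created once as root, while /home is unmounted, so that the links live in the root partition underneath the mount.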

Therefore with the raid partition not mounted, my home directory points to the /away directory and I am able to log in and bring up X-Windows. I get on to Facebook and update my zoos, drink some coffee, then get back to work. I'm able to get on the web, so I do some research. At this point I'm deciding that one of the IDE drives is bad, and it's bringing down the entire IDE bus. So I try unplugging one drive or the other at a time. No dice. I seem to have a culprit: if the lower IDE drive is on the bus, the bus is down.

At this point I toy with the notion of putting one of my IDE-to-SATA tailgates on the 'good' IDE drive and connecting it to a SATA port on the motherboard. The tailgate requires power in the form of a floppy style power connection. My power supply has one, and it's way up at the top of the case hanging off the DVD drives. Remember, I connected things from top to bottom. So I rework the power connections. The power drop with the floppy connector has two legacy connectors. These get plugged into the IDE drives, while the splitter goes onto the CD and DVD drives.

At this point things start to look a whole lot better. I seem to recall now how an unpowered IDE drive can bring down an entire bus. I go through a number of power-ons to the setup screen to check device presence, and eventually I boot up Slackware with all hard drives online.

Now we're into the endgame, trying to get the raid array back online. I try to start /dev/md0. It says started with three devices, but it's dirty and won't run. I do a force, it says I/O error and still won't run. I do an examine on each component partition. They all say that one of the drives is removed, except for the one in question which says everything's present. I must have booted Linux with that drive unplugged from the bus, the one I thought was bad, and it got removed automatically. I've had to do this on a prior occasion but it's been a while and I've forgotten. Anyway an article on the web suggests a re-add, and suddenly everything is working. I'm able to run /dev/md0, re-sync commences, I can remount /home. I restart MySQL since it resides on the raid partition, and everything is looking good.
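For future reference, a quick way to see whether an array is running short of a member is to read /proc/mdstat, where a healthy three-disk array shows [3/3] [UUU] and a degraded one shows something like [3/2] [U_U]. The Python sketch below parses that status line. The sample text is made up for illustration (it is not the actual output from this incident), and the function name degraded_arrays is hypothetical. The examine and re-add steps described above were presumably the mdadm --examine and mdadm --re-add operations, run against whichever partition had been dropped.

```python
import re

# Illustrative /proc/mdstat contents; device names and sizes are invented.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[2] sda1[0]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, flags)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status looks like "[3/2] [U_U]": expected/active, then one flag per slot.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = (expected, active, status.group(3))
    return result


if __name__ == "__main__":
    for name, (expected, active, flags) in degraded_arrays(SAMPLE_MDSTAT).items():
        print(f"{name}: {active} of {expected} members active {flags}")
```

To check a live system, the same function can be fed the contents of /proc/mdstat instead of the sample string.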

I'll still have to monitor the system over the next several weeks. I'm assuming it was the power supply, but given the intermittent nature of the problem only time will tell. I'm sort of suspecting that one of the rails in the power supply is weak, and prone to overload, causing the power supply to shut down. I may even be able to use the failed unit in a less-demanding system, maybe a desktop with a single hard drive. I document this incident to remind myself in the future of lessons learned, and hope that others can gain some insight from my experiences.

Monday, November 12, 2012

Slackware V14.0, round one.

Well, on noticing that the new version of Slackware, 14.0, is out now, I read the release notes and noticed among other things, the fact that it is now using the version 3 Linux kernels, and also that Firefox is now upgraded from version 3 to version 15. Since Facebook has been on my case for some time regarding the inferiority of my web browser, and also the fact that I have been running the 32-bit version of Slackware on a 64-bit machine all this time, I decided it was time to upgrade, and I was going to do it in style.

Okay. First off, the 64-bit version of Slackware is only available on a DVD distribution. I'm sure I could have hacked my way around this if I had to, but anyway, the first step regardless was to download the ISO-9660 image of the installation DVD. I tried a straightforward HTTP download, only to have it crash a quarter of the way along. So I went with the torrent instead. I downloaded the torrent file, brought up KTorrent, and opened the torrent file.

Nothing happened. No peers. Well, I thought, it's either very busy, or very new and nobody is sharing it much yet. Heck, the Slackware website said the DVDs weren't even available to be shipped yet. So, I waited.

Eventually it occurred to me to check on the trackers. I noticed "connection refused" in each case. Eventually I figured it out. It was the port number, 8082, a non-standard HTTP port number that I myself had cause to use in the past, but which my firewall, running Smoothwall, was blocking. I opened up that port and suddenly I'm downloading.
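A check like this can save some waiting next time. The Python sketch below simply tries a TCP connection to a given host and port, which is enough to tell a blocked port from a slow swarm. The host name tracker.example.org is a placeholder, not the real tracker, and the function name port_reachable is made up for this example.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


if __name__ == "__main__":
    # Placeholder host; substitute the tracker named in the torrent file.
    for port in (80, 8082):
        state = "open" if port_reachable("tracker.example.org", port) else "blocked or down"
        print(f"port {port}: {state}")
```

If port 80 connects and 8082 does not, the firewall's outbound rules are a reasonable first suspect.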

About 13 hours later, the download is complete, and I'm now seeding it to return the favor to the community. All well and good. Now my Linux server did not have a DVD burner installed. It does now, but let's not get ahead of ourselves. So I went to the nearest DVD burner, my Windows XP system which is amazingly clinging to life after ten years of continuous service. Original hard disk, never reformatted, only a fried power supply, and now a spiffy new case, merely awaiting new innards. But XP wouldn't burn a DVD for me.

So I go to the next nearest DVD burner, in another room, on a Windows 7 system. I have installed Windows 7 for people, but I am doing my very best not to use it personally. Linux does 99% of everything I need to do, and probably the other 1% if I could just figure out how. I have grown to dislike Windows and Microsoft. Anyway, the ISO image takes three hours to transfer over a slow WIFI connection. I really mean to upgrade my wireless access point, but that involves money and one has to prioritize. So three hours later, including restarting after an abort, I am burning the DVD with no issues.

Back to my Linux server, where I discover that my DVD drive is inoperable. It's of the same vintage as the XP system so this is understandable. At any rate, I pull the DVD burner out of the XP machine and transplant it to the server. The server has a CD burner in addition to the defunct DVD drive, so I swap the defunct drive for the DVD burner.

Next hurdle: I can't get the ribbon cable to connect to both CD and DVD. The connector on the DVD burner is offset a quarter inch, and there's not enough slack in the IDE cable to bend sideways. I try harder. I break the cable.

At this point, viewing the destroyed IDE cable, I observe first-hand the ingenious way they have grounded every other wire in the ribbon cable to eliminate cross-talk, and also realize that, while I have surgically repaired IDE and SCSI cables in the past, the fine nature of the cable, and the thin-ness of the copper sheet that interconnects each ground wire, means that this one is toast.

No matter, I have a gazillion old IDE, SCSI, and floppy cables tucked away for just such an occasion as this. I grab one of the old-fashioned kind, without the extra ground wires, but having ample distance between the connectors to connect to both drives. It works just fine.

On to the next chapter in the adventure. I boot the DVD, after changing the order of the boot devices to do so. I walk through a fairly straightforward Slackware installation, which I have done many times now. Now, in the past when I have upgraded the server, I have always left the previous version intact, choosing a different partition for the new version. I have a pair of partitions for this purpose, and I alternate. So far, so good.

The new version boots up just fine, except for the usual laundry list of things needing to be configured, like Apache and MySQL. No biggie, I've done this before. Anyway, I want to bring up X Windows since I prefer this environment on the whole, but I immediately run into snags. First off, I have a dual-monitor configuration, where each monitor has a different aspect ratio and screen resolution, and one of them is rotated 90 degrees. See, I like to do my web browsing on the rotated monitor, where I can view more of vertically-oriented web pages and the like. In the course of my Linux adventures I had discovered that Slackware did not include the proper driver for my nVidia interfaces, and apparently still doesn't.

At this point I decide that I'm going to take a break from the upgrade process, not to mention the fact that the V14 system doesn't have my custom DHCP configuration running, and everyone in the household is complaining that the internet is down. Well, it isn't, but when DHCP is down, Windows just says "no internet access" without offering further explanation. So I shut down 14.0 and reboot 13.1.

Disaster! The network is down because none of the ethernet interfaces appear, and the X server won't start either. Neither the ethernet driver nor the nVidia display driver will load. It seems somehow the V14.0 installation has done something nasty to 13.1, so I do a number of checks, but everything seems in order. Finally I found the error message buried in the logs. It's looking for the kernel modules for the version 3 kernel. A certain amount of skull sweat later, I identify LILO as the culprit.

The Slackware installation script creates a LILO config file for you. But, it points each Linux boot option towards the same kernel, the one on the system currently booted. Obviously this had happened in the past, but never noticeably. So I had to change the kernel reference from /boot/vmlinuz to /oldslack/boot/vmlinuz, and then things got much better.
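This kind of mistake is easy to check for mechanically. The Python sketch below reads a lilo.conf and reports any kernel image that more than one boot label points at. The sample file is invented for illustration (the device names and labels are not from my actual configuration), the function names are hypothetical, and the parser only understands the simple "key = value" lines shown here, not every lilo.conf feature.

```python
# Illustrative lilo.conf of the kind described above: two Linux entries,
# both pointing at the same kernel. Devices and labels are made up.
SAMPLE_LILO_CONF = """\
boot = /dev/sda
image = /boot/vmlinuz
  root = /dev/sda1
  label = Slack14
  read-only
image = /boot/vmlinuz
  root = /dev/sda2
  label = Slack13
  read-only
"""


def kernels_by_label(conf_text):
    """Map each Linux boot label to the kernel image of its stanza."""
    mapping = {}
    image = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "image":
            image = value
        elif key == "other":
            image = None  # non-Linux stanza, no kernel image
        elif key == "label" and image is not None:
            mapping[value] = image
    return mapping


def shared_kernels(conf_text):
    """Return {image: [labels]} for images used by more than one label."""
    by_image = {}
    for label, image in kernels_by_label(conf_text).items():
        by_image.setdefault(image, []).append(label)
    return {img: labels for img, labels in by_image.items() if len(labels) > 1}


if __name__ == "__main__":
    for image, labels in shared_kernels(SAMPLE_LILO_CONF).items():
        print(f"{image} is shared by: {', '.join(labels)}")
```

A shared image is not always wrong, but it deserves a second look when the two entries are supposed to be different Slackware versions. After correcting the image path, lilo has to be run again before the change takes effect.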

Still, my X server would not start. I tracked down in the kernel logs where the nVidia driver failed to allocate memory, and then I remembered. The default video driver in Slackware is the VGA driver, and it is quite capable, except for things like rotating the display. It also works fine without additional memory, but nVidia needs vmalloc=512MB added to the kernel parameters. This gives it enough to do 1280x1024. I tried setting it to 1024MB but 13.1 hung during the boot. 14.0 booted okay with 1024MB so maybe 64 bits allows this where 32 doesn't, I don't know. At any rate, once I have settled in to 14.0, a job for another day, I will see about actually displaying 1920x1080 on my landscape monitor. I am using 1280x1024, stretched, but xine is kind enough to detect the actual monitor dimensions and adjust the image accordingly, so movies play just fine.

So now I'm back on version 13.1 for now, drinking coffee, and enjoying my birthday, and the fact that I have a day off. I have cleared many hurdles thus far, and feel that it's pretty much downhill from here.

Wednesday, August 26, 2009

Evolution of a Home Server System

My concept of a home server system has evolved over time.

I got in to a home network early on, long before I had broadband internet or even a modem connection shared via a proxy server. The purpose was sharing files and printers, playing the occasional networked game. The network was peer to peer.

I originally decided that I needed a server right after I lost a hard disk on my home computer system containing irreplaceable information, and decided right then that one fatal hard disk crash was too many, and I needed to be fault-tolerant in the future. My initial implementation of a server was with the oldest, slowest system that I had, a 486 with 20 meg of ram, to which I attached a SCSI disk farm featuring three 6 gig hard drives made into a RAID-5 array. Windows NT Server 4.0 was my choice of operating system, influenced in no small way by similar systems I was working with professionally. This served me well for quite some time, until the point where I outgrew spreadsheets as a repository for data and moved into databases.

I had dabbled in Microsoft Access for a time, but all of a sudden my databases proliferated and multiplied. My choice of a file server with ample (at the time) storage capacity but minimal computing power became quickly overwhelmed, frequently maxed out, and especially at a time when I began encouraging the whole family to take advantage of fault-tolerant storage. So it became necessary to create a new server system.

I built my next one from scratch also, a barebones system with state-of-the-art CPU and disks, along with Windows 2000 Server. This one fared better at first, however my databases continued to expand.

At this time I began dabbling with Linux, Slackware to be specific. I had purchased Slackware 2.2 a number of years ago, only to set it aside when I could not get X-Windows to work correctly. I have had rather extensive experience with X-Windows, DEC Windows to be specific, and always thought it would be fun to dabble with at home. So now, having been laid off and with plenty of time on my hands, I decided to dabble with it again, having about the same amount of luck. I decided an upgrade was in order, and purchased the latest Slackware, 10.2. Bundled with this was MySQL, which I quickly discovered and began to dabble in as well. It took time, but eventually all my Access databases were ported into MySQL, and I began to explore Apache and PHP as well.

My web server was hosted on a laptop, an old Thinkpad initially, and it did quite well at first. I also ran MySQL on this laptop, this being my Slackware playground. I generated a number of testimonials as to the small footprint a LAMP stack could work within, even as I overwhelmed it eventually. My main MySQL server moved over to the Windows Server system, but even that eventually became overwhelmed as I tested the limits of the system, first with test scenarios and then hosting my ever-growing databases.

A new server system was in order, especially as another desktop computer had crashed, this time with a fried motherboard. My existing server was gutted to make into the new desktop, and I once again bought new hardware. This time I did not even use a barebones platform but built it completely from scratch. I upgraded to Slackware 12.0, used Samba to share the data to the Windows desktops, and put in a Core2 Quad CPU that runs most apps like greased lightning. I consolidated my LAMP stack on this system, and it is working very well. One of my slower pages, a PDF rendering of one of my novels that took three-plus minutes to load via the laptop Apache server, now loads in ten seconds or so. With some of my tables having over a million rows, this computing power is serving me well.

So you can see how my concept of a server has changed over time, from computing power being irrelevant to essential.

Wednesday, December 05, 2007

Further RSS issues

My saga with RSS continues. As I stated before, changes to MySQL's forum site led me to pursue using an RSS reader as my main interface to the forums. That worked quite well until recently. My only issue had been that the various replies to a post showed as separate entries to the reader, with no connection between them. Coming back from a self-imposed hiatus, I find that this problem has been fixed, at the expense of creating another one. Now, each thread shows up once only in the reader, along with an indication of how many replies have been posted. This is fine, except that the thread is not marked as updated, and so threads that get replied to are never highlighted as unread by the reader. This of course makes it extremely difficult to conduct timely and efficient interactions with someone on the board. I now have to peruse the forum web page with the thread list looking for new posts, an error-prone procedure supposedly eliminated by the reader, which had been popping up little notes on my screen when someone replied.

I have enjoyed helping out on the boards in the past, and intend to do so again, however without a foolproof automatic means of being notified of new replies, this rather puts a crimp on my activities.

Tuesday, January 16, 2007

Reaction to the iPhone concept

I find the new iPhone to be conceptually very interesting. Now, I am very new to the world of PDAs. I have only just started using one, a cast-off Palm Pilot from someone, and already I am thinking of things that it should be able to do but doesn't, or I haven't learned how yet. I suspect that very shortly I will be wondering how I got along without a PDA all these years.

Now, this new iPhone from Apple (I hope that they win the name from Cisco, as I have never ever heard of a Cisco iPhone but the name seems to naturally fit alongside iPod and other Apple products) will apparently be running a version of OSX. I find this very intriguing, because it opens up all sorts of interesting possibilities, including perhaps being able to run MySQL on it, or at least a client, perhaps a small database replicated from a fixed or not-as-mobile source.

At first glance and after hearing various reports, it sounds like a very capable device even without getting under the hood and finding out what makes it tick. Of interest will be what CPU it uses, getting stuff to compile for it, etc.

Sunday, January 14, 2007

A Fork In Progress

It looks like MySQL AB is going about the code fork in a very methodical, gradual, and thoughtful fashion. The latest Community release seems evidence of this. It was released as source-only, as promised, however it was announced that since no community contributions have been made to the community tree, the release was made from the Enterprise tree. In effect, the actual fork has not taken place yet. There is a community tree in evidence in BitKeeper, but it appears to be a placeholder only so far. The changes to date have been with regard to policy and procedure, not code.

I see this as a good thing. This gives MySQL time to evaluate reactions to these changes while implementing them in a gradual fashion. The changes appear well thought out and sound, and reflect changes in the community, the marketplace, and the customer base. It's important to balance all of these factors since they all play a part in the life of the product.

Knowing who the community user is, and who the enterprise user is, is important too. The focus should be on prioritizing catering to the needs of larger groups first, with the knowledge that some may not see their needs met directly. But this is an open-source product, so surely someone can rise to the occasion and fulfill a niche need.

Monday, January 08, 2007

New BitKeeper client, old one broke

As has been pointed out, there is a new version (2.0) of the free BitKeeper client, and furthermore the old one (version 1.1) appears not to work anymore. At first I suspected my daughter's hogging of available bandwidth as the culprit when all of my daily pulls of MySQL source trees failed, but that soon proved not to be the case. I am once again pulling source code successfully.