Preamble
In a previous post, I outlined my plan to rebuild my entire home lab to be far more robust and scalable, and to give myself the chance to learn the more advanced technologies and software used in proper larger-scale deployments. A home lab is, after all, built for learning, and while I do currently have a few of my own servers handling storage and some light containerized workloads through Docker, it’s not really a proper home lab. In fact, although my current networking structure is quite a bit more complex than typical, it too is not truly set up to be scalable or redundant, and the security rules currently in place are quite basic and what I would call the bare minimum. I have at least put some time into separating most of my devices into their own VLANs, but other than my guest network, I’ve done nothing to really isolate traffic between different services. So, today we are going to start down the road of fixing and rebuilding my entire homelab, basically from scratch, and redeploying all of my services in a far more proper and professional structure, complete with proper security, isolation, redundant systems, high availability, and secure remote access.
There is a lot to do, and this is just the first step in the process. This post is our first introduction to some of the hardware we will be using, but it plays only a small part in the larger picture. I am also doing things slightly out of the ideal order, because a lot of the hardware I will eventually be deploying is currently in use and must keep running until I am ready to fully switch over. For example, I would normally start this whole process by building up my network backbone as a new deployment, but all of my networking gear is currently busy serving up, well, my entire home network and my currently running services. So, I don’t really want to change over my network infrastructure until I have moved all of my running services over to their new home on the new hardware and inside my eventual Kubernetes deployment.
Building the virtualization server
For this purpose I am using an old second-hand Dell R410 that I bought on eBay many years ago for a different project. Today this is probably not the best server choice, since its older architecture only supports up to Intel’s Xeon X5570, a quad-core chip built on 45nm lithography and running at 2.93GHz; or you might be lucky enough to be able to upgrade to a Xeon X5670/X5675, a 6-core chip, which is actually what my servers came with, meaning I must have a rev2 model. All in all, these servers are not particularly powerful, nor energy efficient, and if you are in the market for something new (or new to you) then you are better off looking at something newer like a Dell R620/R720. Those support much newer CPUs (the R620, for example, takes Ivy Bridge Xeons like the 10-core E5-2660 v2!), which are also much more energy efficient and support more RAM. But like I said, I have this server on hand, which makes the cost… well… free. In fact, I have 2 of them! If I was going to buy something, however, I would certainly be looking at something newer. Since I will not be putting a heavy load on these, we should be fine. I will only be running a handful of VMs, and most of my services will be relying on lightweight Docker (or more specifically, Kubernetes) containers.
One thing you have surely noticed is that I have not mentioned building your own server from scratch. By all means, it is a completely viable option! It’s not like I haven’t built computers before. If you have the cash to do so, your best bet would be to build a server around a high-core-count AMD Threadripper or Epyc CPU. You could run far more powerful VMs with something like that, and it would run circles around these older servers while probably using less power! The major drawback here is simply the upfront cost. These older servers can often be had for less than $200, especially if you can do local pickup. On top of that, there are other benefits to using old enterprise gear, like remote management features. Server platforms also get priority support when it comes to bug fixes and security vulnerability patches: the R410 platform has been EOL for years, but it received an urgent BIOS update as recently as 2018 to fix security issues with these Intel CPUs. Besides, if you are looking for a learning experience, there is truly nothing like using the real thing.
Let’s talk about Dell’s remote management interface for a minute. Dell uses a system they call iDRAC, and my server has an iDRAC 6 module installed. You can get to the module’s configuration by pressing Ctrl+E during boot; you will be prompted on screen when it is time. These Dell R410s are quite old, and certainly not ideal, but like I said, I already have them.
Update: I bought new servers! Details in a future post.
Because these servers are quite old, they could use some love in the firmware update department. Basically all Dell firmware packages can be downloaded directly from Dell’s support site. If you are also looking for this server’s downloads… here ya go: https://poweredgec.dell.com/latest_poweredge-11g.html#R410%20BIOS
My server needed a lot of updates; most notably, my BIOS was stuck at version 1.4.8 while 1.14 was available, and my iDRAC was on v1.54.
I downloaded all the files and created a bootable Linux USB key. Make sure to download the .BIN files. Which flavor of Linux you use shouldn’t matter, but RHEL is the only one officially supported, which should mean that CentOS is also fine.
I copied all the files to the USB key and booted the key on the server.
You will need to make the files executable. You can do this by cd-ing into the directory holding them and running:
find . -type f -exec chmod +x {} \;
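Once the packages are executable, each one can simply be run as root. As a rough sketch of what that looks like (the filename below is just a placeholder; use whatever the BIOS package you downloaded is actually called):

cd /path/to/firmware              # wherever you copied the .BIN files
sudo ./R410_BIOS_LX_1.14.0.BIN    # placeholder filename; the installer prompts for confirmation before flashing

Each package walks through the same kind of prompt-and-reboot flow, so there is nothing special to script here.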
Updating the iDRAC firmware from the web GUI
I was having no luck booting my live disk, and I was doing everything on an extra monitor and a second keyboard plugged directly into the server, because the iDRAC firmware was too old to work with newer TLS security requirements. I figured I needed to get it updated so I could use the remote KVM functionality. Download the .exe file from Dell and open it with an archive manager to extract the firmimg.d6 file, then log into the iDRAC web GUI and go to Firmware Update.
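If you would rather stay on the command line, 7-Zip can usually open that installer as an archive. Something like this should work (the .exe name is just an example, and the exact folder the image lands in may vary):

sudo apt install p7zip-full       # provides the 7z command on Ubuntu
7z x IDRAC6_FIRMWARE.exe          # example filename; unpacks the self-extracting installer
find . -name firmimg.d6           # locate the extracted firmware image to upload to the iDRAC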
FYI, you can also set the iDRAC to have an IP address (DHCP or static) using the front LCD panel. Press the check button to show Setup, press it again to show iDRAC, use left or right to select DHCP or static, then press again to save.
Unfortunately, updating the iDRAC to the latest version doesn’t fix the vulnerabilities of the older, deprecated Java-based KVM, and newer versions of Java will not connect to it without some additional work. Because I am running Ubuntu Linux on my desktop, I installed the icedtea-netx package to open the javaws .jnlp files.
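For reference, getting icedtea-netx installed and launching the console viewer boils down to this (viewer.jnlp stands in for whatever .jnlp file the iDRAC hands you):

sudo apt install icedtea-netx     # IcedTea-Web, which provides the javaws command
javaws viewer.jnlp                # open the KVM viewer file downloaded from the iDRAC web GUI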
Update:
I later ran into too many problems with this approach. Eventually, I instead downloaded older versions of Java through Oracle’s developer site and manually installed those binaries in the correct location. I then manually created alternatives using the update-alternatives tool that ships with Ubuntu-based Linux distros, creating symbolic links to everything necessary. The command for this looked something like:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jre1.7.0_80/bin/java" 2 --slave "/usr/bin/jcontrol" "jcontrol" "/usr/lib/jvm/jre1.7.0_80/bin/jcontrol" --slave "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jre1.7.0_80/bin/javaws"
This creates symbolic links to all of the Java 1.7 binaries in your /usr/bin folder. You can then do this again for every version of Java you have installed. I ended up copying over all the files for Java 1.6, 1.7, and 1.8, and repeating the above command for each, replacing jre1.7.0_80 with jre1.6.0_45 and jre1.8.0_301 respectively.
At any point now, you can simply type
sudo update-alternatives --config java
and select which version you would like to use. Do note, any changes you make to Java settings in jcontrol will only apply to that specific version.
The update-alternatives setup can actually be done with just about any file… I might have to find some other uses for this later….
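As a throwaway example of that idea (every name and path here is hypothetical), you could register two builds of some tool and flip between them the same way:

sudo update-alternatives --install /usr/local/bin/mytool mytool /opt/mytool-1.0/bin/mytool 10
sudo update-alternatives --install /usr/local/bin/mytool mytool /opt/mytool-2.0/bin/mytool 20
sudo update-alternatives --config mytool    # choose which build /usr/local/bin/mytool points to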
I was able to use Java 6 to connect to the KVM after adding the IP address of the server to the trusted hosts in jcontrol. The remainder of this section, describing the Java settings through icedtea-web, is here only for historical reference and is not my recommended configuration.
After this, I opened the icedtea-web control panel -> went to policy settings -> clicked on advanced editor. (This actually failed to open the editor for some reason, but it gives you the full path to the file, which you can open in anything you want. If the file doesn’t exist and you are using openjdk-8, you can copy it from /etc/java-8-openjdk/security.)
After this we need to modify our java.security file. For me it was /etc/java-11-openjdk/security/java.security
The iDRAC6 KVM uses the out-of-date RC4 algorithm, so we need to allow RC4 on our client in order to use it. In java.security, find the line starting with jdk.tls.disabledAlgorithms= and remove RC4 from the list.
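For illustration only, the stock line looks something like the first entry below (the exact list varies between Java releases); the edited version simply drops RC4:

# before (example; your list will likely differ)
jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024, EC keySize < 224
# after removing RC4
jdk.tls.disabledAlgorithms=SSLv3, DES, MD5withRSA, DH keySize < 1024, EC keySize < 224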
After all of that, you should be able to go back to the iDRAC dashboard and launch the KVM. You will have to click through and accept all of the warnings about using an insecure connection (blah blah blah, don’t open this to the web), but you will finally be able to connect.
OK, so I have to admit, the built-in KVM was basically a bust. I had never actually used it before, because I could never get it to work, and I figured now would be a good time to try. It wasn’t. It’s just too old to work properly. For reference, the closest I got was getting video output in the KVM viewer, which is great, but I couldn’t get the keyboard and mouse passthrough to really do anything. I ended up trying to install Java 6, which failed, I think because of this error when launching the viewer:
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/lib/jvm/jre1.6.0_45/lib/amd64/libdeploy.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
See Above Update.
I had tried to install Java through my package manager, but the answer was to download the binaries directly from Oracle, which ended up working fine.
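If you do want to chase the stack-guard warning itself rather than sidestep it, the message already tells you the fix; on Ubuntu releases that still package the execstack tool, it would look roughly like this (the library path is the one from my error above):

sudo apt install execstack        # assumes your release still ships this package
sudo execstack -c /usr/lib/jvm/jre1.6.0_45/lib/amd64/libdeploy.so    # clear the executable-stack flag the JVM complains about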
Update: I installed Java 7 on Windows and things work a little better. Still not great, but better. It is at least good enough for console input.
The Ubuntu live disk failed to install the BIOS update, so next I tried CentOS.
I installed CentOS onto a USB disk as an actual full installation, because CentOS doesn’t really have a “try before install” environment (although I think Fedora does, not positive), and then copied the different firmware installers to the home folder. With this I was able to run the updates for the BIOS, LCC, and Diags. The BMC, Network, and Firmware files unfortunately did not apply, due to nondescript errors.
This whole process took for eeeevvvverrrrr. I went through multiple OS installs, each of which took about an hour, because installing CentOS on a USB drive is super slow (or at least it was for me). I first tried Ubuntu because I already had an Ubuntu live disk, but I couldn’t actually run the .BIN files from the live disk: the scripts attempt to create directories and extract files into them, and Ubuntu live images are read-only and don’t have any free space. So then I figured, OK, I’ll just install Ubuntu onto the free space left over on the USB. That worked fine using manual partitioning (though the first time I tried I ran into a really weird error about having too many primary partitions), and I was able to boot into that install, but actually running the .BIN scripts still failed, for a reason I really didn’t care enough to investigate further. I chalked it up to the scripts being written with RHEL in mind, because they were complaining about missing commands.
So I repeated this dance: I cleared the USB, mounted CentOS using the virtual media function on the iDRAC, and installed onto the USB using automatic partitioning. Now, not being super familiar with Red Hat OSs, I did not know that the default partitioning uses LVM even if you don’t use encryption. I learned this as soon as I plugged the drive into my main system to copy the .BIN files over and found the partition didn’t mount. I’ve never really messed with LVM, so I did some research on how to mount it, followed the instructions (which all made perfect sense), and ran into an error about not being able to find the superblock on the partition. Some more research suggested that I had done everything right but needed to run a command to repair the filesystem from corruption… or something. Well, I knew that wasn’t right, because I knew the filesystem wasn’t corrupted: the disk booted just fine when I tried it. Again, this was a wall I really didn’t want to deal with at the time, and while I would very much like to investigate it further at a later date, I knew trying to fix it then would likely take longer than just starting over.
Now that I have had time to sit down and put my thoughts together, I wonder if the error was because I was trying to mount the volumes on an Ubuntu-based system that I am not entirely sure has support for the XFS filesystem, which is what I believe the underlying partition was formatted as. The solution very well could have been as simple as installing a package, but I don’t know that for a fact.
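For anyone who hits the same wall, the usual routine for mounting a CentOS LVM disk on another machine looks roughly like this. The volume group and logical volume names are assumptions (CentOS’s automatic partitioning typically names the group centos), so check the lvs output before mounting:

sudo apt install lvm2 xfsprogs    # LVM userspace tools, plus the XFS tools just in case
sudo vgscan                       # scan for volume groups on the newly attached disk
sudo vgchange -ay                 # activate them so the logical volumes show up under /dev
sudo lvs                          # list logical volumes and note the VG/LV names
sudo mount /dev/centos/root /mnt  # assumed names; substitute whatever lvs reported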
Anyway, I reinstalled CentOS again using custom partitioning and an ext4 filesystem, moved the files over, plugged the drive back in, booted, and started updating. All in all, I spent several hours on this: the multiple OS installs took about 4 hours total, the updates took about 2.5, and then I spent quite a bit of time screwing around with Java and the iDRAC KVM from before.
The good news is, everything is now updated to the latest firmware and security patches, and the server is ready to be used for some light virtualization.
Except one thing…. We have to rinse and repeat, because I have 2 of these servers….
OK, OK, the second server didn’t take nearly as long, because of course I had already done all the hard work of getting Java and my USB disk set up. The second server took under 2 hours for the full gamut. If I had more to do, then setting up some kind of automation would be on the table, but it wasn’t necessary here.
In a later post, we will talk about deploying Proxmox on these servers as an HA cluster.