How do you end up managing large-scale operations? Teea Alarto, Senior Software Engineer, shares her story starting as a newbie Linux sysadmin. She is a self-taught programmer as well as a passionate open source and security zealot.
The Start
My introduction to working as a Linux sysadmin (alongside programming and graphic design) began at my second job in my early twenties. It was for a company that wasn’t originally into programming, but saw the rise in automation and wanted to develop programs and services as they expected the demand to rise in coming years.
I had toyed around with my personal computer and tried out several distros, configured Apache and MySQL and such, but didn’t really feel that I was all that skillful. However, most of the programming staff were focused on Microsoft-related languages and servers, so my job was to produce cheaper alternatives with Linux and various (non-MS) programming languages.
Down and Dirty
The server room was somewhat of a shock to me. “What’s the use of paying for someone else to host the servers when we can just do it ourselves?” The “servers” in this case were old desktop PCs that had been replaced for desktop use and were now otherwise obsolete. There were a dozen of them in a small room that had no ventilation. The building had a dark roof and no windows, so it was not rare for the temperature in our office to rise close to 40° C, meaning it was well beyond that in the server room. Solution? Buy fans. In retrospect it was quite an achievement of engineering that only a few of these servers failed, as they were operating well beyond designed temperatures.
I learned a lot of things through experience. I rarely had to go to the server room since I just SSH’d to my Linux boxes, but that was not the case with the Windows side of the programming force. I had a few servers crash every now and then and later learned that this was due to a Windows programmer blindly hitting ctrl-alt-del to get rid of blank screensavers and into the login prompt on their Windows NTs. Twice I also had a cleaning lady come to vacuum the server room floor and unplug a server to get power for her hoover. Still, we managed to get the programs and services to start selling, and what was initially a loss-making side of the small company ended up growing into its most profitable section.
Hacked
I read a selection of websites for vulnerabilities and stories on my quest to become a better sysadmin. I ended up installing several utilities such as darkstat and ntop, and writing custom cronjobs that checked the md5sum of the commands most commonly replaced by attackers, such as ps and bash, and e-mailed me errors from different logs, such as failed SSH logins. In the end these proved to be useful, as I noticed that one of our database servers (which was installed and supposedly maintained by our German sister company that had insisted on using SuSE as their distribution) started having traffic on port 6667, which is the default IRC server port. I checked my logs and found that I was in fact hosting an IRC server, logged in to it from my own IRC shell and found that there was a group of Polish people chatting on the server. It was taken offline, our German colleagues notified, and in the end we were no longer hosting other people’s servers in our own network. So it goes.
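For readers who want to try the checksum idea, here is a minimal sketch in Python rather than the shell cronjobs described above. It is not the original script: the function names are made up for illustration, and it assumes that a baseline of digests was recorded while the machine was still known to be clean. MD5 is used only because that is what the story mentions; a stronger hash such as SHA-256 would be the usual choice today.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the current digest of each path (hypothetical helper).

    Run this once on a system you trust and store the result somewhere
    an intruder cannot rewrite it.
    """
    return {str(path): file_md5(path) for path in paths}


def find_changed(baseline):
    """Return the paths whose file is missing or no longer matches the baseline."""
    changed = []
    for path, expected in baseline.items():
        if not Path(path).is_file() or file_md5(path) != expected:
            changed.append(path)
    return changed
```

A scheduled job could call `find_changed` on the stored baseline and send a mail whenever the returned list is not empty. Which binaries to watch (for example the locations of ps and bash) depends on the distribution, so those paths are left to the reader.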
Network Congestion
As our programming efforts started bringing in more customers we started having real servers and moved to a new office that had a proper server room with real air conditioning keeping it a tad under 20° C constantly. My Linux logging had improved over time as well, no longer focusing just on network usage but also on loads, service response times, uptimes, and all that jazz. The Windows side of things, however, was not my responsibility (nor did I want to get involved with that) and one of the older Windows programmers took on the role of Windows admin, maintaining logins, passwords, Office tools, and the like.
One day we started noticing our network was sluggish and shortly after started getting complaints from customers that our response times were slow. All my Linux servers were in the green, so together with the Windows admin we had a look at his servers, which also seemed to have plenty of resources available. After a moment of thinking we concluded that the fault must lie with our users and, lo and behold, our 100 Mbit bandwidth was mostly being used by one of our salespeople. We asked to check his computer and, reluctantly, he showed us that he was running uTorrent, downloading and seeding several movies and music albums (along with a lot of dirty stuff) to the extent that our whole bandwidth was consumed. He was berated, his pirated stuff deleted and we had a laugh with the Windows admin. The admin started setting up monitoring on users and servers shortly after. So it goes.
Brazilian Bankjob
One summer, after I had wished goodbye to my colleagues and started my summer vacation, I got a call from our newly hired PR person. She had decided that our website needed a facelift, had hired an external company to do it and now needed a root password from me to give to the website company so that they could install their WordPress site on our servers. After some time of negotiating and explaining that no, they can’t have our password, I ended up calling the company and agreeing with them that a regular user login with access to webroot was enough. I created such a login, made sure that they had no access to anything they weren’t supposed to have and gave them free rein to meddle about while I continued my vacationing.
They had a contract to maintain the WordPress website with updates and such, our high-up sales personnel were tutored on the usage of the WordPress platform and in the end when they needed to update any text on the site, I had to do it. So it goes. The WP site was still maintained in terms of patches and upgrades by the external company for a monthly fee, so I didn’t bother looking into it. Until we got a call followed up by an email from a Brazilian bank and their lawyers.
The external company had failed in maintaining the WordPress website and one of the WP addons they had used had a vulnerability in it. I wasn’t too worried since the website was isolated, Apache and PHP were updated by me and the web users didn’t have access to do anything naughty. They, of course, did have access to the website itself. Which now had a new bunch of PHP files added and these pretended to be the website of a bank in Brazil. Our WP website, among several others with the same vulnerability, was being used as a tool where unsuspecting people were invited by an email scam to enter their login credentials which were forwarded to an IP address somewhere.
I fixed the vulnerability by removing the faulty addon (Apache logs revealed which one it was and where) and while this fixed the problem on our end, our WordPress website was later hosted on one of the Windows servers in static HTML code because after this Linux was seen as “unsafe”. So it goes.
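The story doesn’t say how the logs were read, so the following is only one possible way to narrow things down, not what was actually done. This Python sketch counts POST requests per WordPress plugin directory in Apache access log lines. It assumes the common or combined log format and the standard `/wp-content/plugins/` layout; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request field, e.g. "POST /some/path HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')
# Pulls the plugin directory name out of a request path.
PLUGIN_RE = re.compile(r"/wp-content/plugins/(?P<plugin>[^/?]+)")


def count_plugin_posts(lines):
    """Count POST requests per plugin directory (hypothetical helper)."""
    hits = Counter()
    for line in lines:
        request = REQUEST_RE.search(line)
        if request is None or request.group("method") != "POST":
            continue
        plugin = PLUGIN_RE.search(request.group("path"))
        if plugin:
            hits[plugin.group("plugin")] += 1
    return hits
```

Feeding it the lines of an access log and looking at `most_common()` on the result gives a rough list of plugins that received uploads or form posts. It is a starting point for manual inspection, not proof that a given plugin is the vulnerable one.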
While this was unjust and no amount of explaining to the higher-ups helped, it did push me towards learning some admin secrets of IIS, MSSQL and, in the end, .NET and C#, which was a fun, new experience. Unfortunately it also meant running down most of the Linux side of development, even though I was keen on keeping it up with Mono, and the Linux servers ended up hosting different internal tools such as a wiki and other intranet sites.
At the time, all these different things seemed an everyday occurrence for the company, and it wasn’t until looking back that I realised that these and a lot of other stories would have made good reading at The Daily WTF. I learned a lot about preparing for threats from outside and inside, that you shouldn’t trust anyone using your services and, most of all, to do extensive logging with automated reporting on anything that looks suspicious.
So it goes.