Your boss walks in looking anxious but still composed. You look up and they ask, “Can you tell me if the (antiquated system you have never heard of) is having any issues and what sort of load it is under? (Team you have also never heard of) is complaining. While you are in (analytics tool), can you generate an Excel file so I can toy around?”
After some investigation you find an old busted server running an ancient version of Linux. You can’t connect it to your awesome clustered log analytics tool; you are in the weeds, you are alone, you are in trouble.
No worries, ancient Linux tools to the rescue!
WARNING: Some of these chained commands are absurd and probably should never be used in real life. Hopefully they act as useful introductions to various Linux tools and the types of things you can do. If you end up with a mess of chained commands, consider a scripting language.
First, time to see where this esoteric system puts its log files. We use locate as our first pass.
locate .log
What is going on?: locate checks a database (or databases) maintained by updatedb. This database can be out of date or missing data, so it isn’t as exhaustive as find and is only as good as the data it has indexed.
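If the database looks stale or incomplete, you can usually refresh it first (assuming updatedb is installed and you have the privileges to run it):

sudo updatedb
locate .log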
For a more exhaustive search, we can reach for find.
find / -name \*.log\*
What is going on?: find checks the file system in real time, so it is much slower than locate, but it will always be up to date. This search starts in / (the root of the file system) and matches all filenames (-name) that contain .log anywhere in them.
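Two handy variations, in case the plain version is too noisy: -iname makes the match case-insensitive, and sending stderr to /dev/null hides the permission-denied chatter from directories we cannot read:

find / -iname '*.log*' 2>/dev/null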
Whoa, way too many results, but I’m still not sure exactly what I am looking for, so I start filtering out stuff I absolutely don’t want.
locate .log | grep -v kern.log | grep -v other.log
What is going on?: We are piping the output of locate into grep and using the -v flag to remove lines that match the given substring. We do this twice, so the result intentionally lacks anything containing kern.log or other.log. It could be done with a single grep command, but I leave that as an exercise for the reader.
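(If you would rather check your answer than do the exercise: grep accepts multiple -e patterns, so one possibility is the following.)

locate .log | grep -v -e kern.log -e other.log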
Success! You found the esoteric system’s log files under a random home directory. It must have its own user.
/export/home/random/random.log
/export/home/random/random.log.1.gz
/export/home/random/random.log.2.gz
/export/home/random/random.log.3.gz
/export/home/random/random.log.4.gz
Now you want to quickly look at the content of a complete log. Luckily we have zcat, which lets us get plain text out of a gzipped file.
Note: you will find that a handful of utilities have z-equivalents that work directly on gzipped files: zgrep, zdiff, and more.
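For example, both of these work directly on the rotated, compressed logs we just found:

zgrep -i error /export/home/random/random.log.1.gz
zdiff /export/home/random/random.log.1.gz /export/home/random/random.log.2.gz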
So, first and foremost the system is acting ‘weird’. Time to check for recent errors. Use the power of grep!
zcat /export/home/random/random.log.2.gz | grep -i error
What is going on?: We are piping the output of zcat into grep and using the -i flag to case-insensitively match lines that include the word error.
The output is many screens full of errors, so you get a count quickly using wc.
zcat /export/home/random/random.log.2.gz |
grep -i error |
wc -l
What is going on?: We are extending our last example with wc -l to count the number of lines (rather than words) in the output.
10033
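As an aside, grep can count matching lines itself with -c, so the same number could be had without wc (just an equivalent shortcut, not a new step in our investigation):

zcat /export/home/random/random.log.2.gz | grep -ic error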
… that is a lot of errors! Time to find out how many different types of errors we are getting. First, let’s investigate the error format by looking at a few lines from the start of the file using head, which returns the first 10 lines by default, though we only need 5. We could also have used tail to grab lines from the end instead.
zcat /export/home/random/random.log.2.gz |
head -5
What is going on?: We are changing our last example to drop the grep filter and the line count, and instead just return the first 5 lines of the file using head.
Note: A common use for tail is monitoring files as they change, using tail -f file_name.
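On this system that would be something like:

tail -f /export/home/random/random.log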
The output of the above command is:
Apr 28 17:06:20 ip-172-31-11-241 random[12545]: reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net failed - POSSIBLE BREAK-IN ATTEMPT! [preauth]
Apr 28 17:06:21 ip-172-31-11-241 random[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
Apr 29 17:06:22 ip-172-31-11-241 random[12547]: Invalid user admin from 216.19.2.8 [auth]
Apr 29 17:06:23 ip-172-31-11-241 random[12547]: input_userauth_request: invalid user admin [preauth]
Apr 29 17:06:24 ip-172-31-11-241 random[12547]: questionable user request: delete everything [usage]
Hmph, that usage one is interesting; let’s see how many of those there are. But we don’t want to use grep because it might match the wrong things, so we need to isolate the specific field we want. cut to the rescue!
Note: This could also have been done with either sed or awk.
zcat /export/home/random/random.log.2.gz |
cut -d "[" -f 3 |
cut -d "]" -f 1
What is going on?: We have now added the cut command and set a delimiter to split the string on. The first thing we do is split our lines on the “[” character, which turns
Apr 28 17:06:21 ip-172-31-11-241 random[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
into
Apr 28 17:06:21 ip-172-31-11-241 random
12545]: Received disconnect from 216.19.2.8: 11: Bye Bye
preauth]
Then we take the 3rd field, “preauth]”, and split it again using cut on “]”, which leaves us with just:
preauth
Note: There are a lot of tools we could have used to trim the last “]”; sed or awk would work here too.
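For example, awk can do the same extraction in a single step by treating both brackets as field separators and grabbing the last bracketed field (this assumes every line ends with a bracketed tag, as ours do):

zcat /export/home/random/random.log.2.gz |
awk -F'[][]' '{ print $(NF-1) }'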
Perfect, for our five example lines above that gives us the following output:
preauth
preauth
auth
preauth
usage
Terrific, now we want counts of each one, so we add in a bit of sort and uniq magic.
zcat /export/home/random/random.log.2.gz |
cut -d "[" -f 3 |
cut -d "]" -f 1 |
sort |
uniq -c |
sort -bnr
What is going on?: We pipe the output of our last command to sort, giving us the output:
auth
preauth
preauth
preauth
usage
Then we use uniq to collapse the duplicate lines (uniq only removes adjacent duplicates, which is why we sorted first), and by adding -c we prefix each line with its count.
1 auth
3 preauth
1 usage
We pipe it back into sort with -bnr, which will (-b) ignore leading blanks, (-n) order by numeric value, and (-r) reverse the order, putting the highest values first.
3 preauth
1 usage
1 auth
From this we extract some interesting numbers and clues to the problem, but we really need it grouped by day as well to find patterns. This one introduces a few new things: our first real use of awk, the first appearance of xargs, and zgrep.
zcat /export/home/random/random.log.2.gz |
awk '{print $1" "$2}' |
uniq |
xargs -I '{}' sh -c '
echo "{}";
zgrep "{}" /export/home/random/random.log.2.gz |
cut -d "[" -f 3 |
cut -d "]" -f 1 |
sort |
uniq -c |
sort -bnr'
What is going on?: We start by piping the decompressed log into awk, which keeps the first two fields (the month and the day) and discards the rest. Then we remove duplicate days using uniq and are left with a series of days like “Apr 28” and “Apr 29”. We pipe those days into xargs, which runs the given command once for each line passed to it, in this case once per day. We output the day with echo, then zgrep the compressed file for that day and pipe the matches through the same series of steps we came up with before. (A single-pass awk alternative is sketched after the output below.)
That outputs:
Apr 28
2 preauth
Apr 29
1 auth
1 preauth
1 usage
...
May 5
5152 usage
4 auth
2 preauth
...
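Incidentally, if shelling out once per day via xargs feels too absurd (see the warning at the top), roughly the same per-day counts can be produced in a single awk pass. This is only a sketch, assuming every line ends with a bracketed tag like the samples above:

zcat /export/home/random/random.log.2.gz |
awk -F'[][]' '{
  split($1, f, " ")              # f[1] = month, f[2] = day
  type = $(NF-1)                 # last bracketed tag, e.g. "preauth"
  count[f[1] " " f[2] ", " type]++
}
END { for (k in count) print count[k], k }'

The output is one “count day, type” line per combination, in no particular order, so a final sort may still be wanted.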
Great, we found the troublesome day. Time to investigate that date further… but before we do that we still owe our boss an Excel file. Luckily they don’t really care whether it is an actual Excel file as long as it loads in Excel, so CSV to the rescue. Well, let’s just get ALL the data into CSV format and let them deal with it.
echo "date, time, ip, process, pid, message, type" > boss_output.csv && zcat /export/home/random/random.log.2.gz | sed -n 's/\(\w\+\) \(\w\+\) \(\w\+:\w\+:\w\+\) \(.[^ ]\+\) \(\w\+\)\[\(\w\+\)\]: \(.[^\[]*\) \[\(\w\+\).*/\1 \2, \3, \4, \5, \6, \7, \8/p' >> boss_output.csv
What is going on?: First we create a new file by echoing a header line into boss_output.csv using redirection. After that we pipe the decompressed content of the log file into sed, which uses a regular expression with capture groups to break each line apart and reassemble it as a CSV row, appended to the same file.
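If the sed expression is hard to follow, roughly the same CSV rows can be produced with awk. This is only a sketch, assuming every line follows the exact format shown earlier and that the header line has already been written by the echo above:

zcat /export/home/random/random.log.2.gz |
awk '{
  split($5, p, "[][]")           # p[1] = process name, p[2] = pid
  msg = ""
  for (i = 6; i < NF; i++) msg = msg (i > 6 ? " " : "") $i
  type = $NF
  gsub("[][]", "", type)         # strip the brackets from e.g. "[preauth]"
  printf "%s %s, %s, %s, %s, %s, %s, %s\n", $1, $2, $3, $4, p[1], p[2], msg, type
}' >> boss_output.csv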
Now the boss output looks like the following, which hopefully will let them make a pretty graph of some sort.
date, time, ip, process, pid, message, type
Apr 28, 17:06:20, ip-172-31-11-241, random, 12545, reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net failed - POSSIBLE BREAK-IN ATTEMPT!, preauth
Apr 28, 17:06:21, ip-172-31-11-241, random, 12545, Received disconnect from 216.19.2.8: 11: Bye Bye, preauth
Apr 29, 17:06:22, ip-172-31-11-241, random, 12547, Invalid user admin from 216.19.2.8, auth
Apr 29, 17:06:23, ip-172-31-11-241, random, 12547, input_userauth_request: invalid user admin, preauth
Apr 29, 17:06:24, ip-172-31-11-241, random, 12547, questionable user request: delete everything, usage
... (many more)
May the boss be satisfied.