Interview questions … Get IP address from Apache Logs
Monday, August 15th, 2011
TL;DR: Some Apache log processing one-liners that I have found handy for getting IP addresses from the access and error logs.
Log processing questions may come up during the course of an interview, so I am going to concentrate on a couple that pull the IP addresses out of the Apache log files. If you do get log processing questions, I sincerely hope you get to work at a command line, as answering off the top of your head can be difficult unless you’re good at visualizing commands.
Q: How would you get the IP addresses from the access logs?
A: This one is fairly straightforward. I would run:
cat access.log | awk '{print $1}' | sort | uniq
This will output the following (if you leave out sort and uniq you will see the same IP many times; I’ll leave that to the discretion of the reader):
127.0.0.1
127.0.0.2
127.0.0.3
The breakdown is as follows:
1. read through the contents of the file (cat access.log)
2. pipe the output to awk to print the first field ($1)
3. pipe the output to sort so that identical IPs end up on adjacent lines (uniq only collapses adjacent duplicates)
4. pipe the output to uniq to show each IP only once
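A natural follow-up to the steps above is counting how many requests each IP made. The following is a minimal sketch, not part of the original answer: the sample log lines are made up for illustration, and count_ips is a hypothetical helper name. It assumes the client IP is the first field, as in the common and combined log formats.

```shell
# Made-up sample lines standing in for a real access.log.
sample_access=$(cat <<'EOF'
127.0.0.1 - - [15/Aug/2011:10:00:00 +0000] "GET / HTTP/1.1" 200 512
127.0.0.2 - - [15/Aug/2011:10:00:01 +0000] "GET /a HTTP/1.1" 200 128
127.0.0.1 - - [15/Aug/2011:10:00:02 +0000] "GET /b HTTP/1.1" 404 64
127.0.0.3 - - [15/Aug/2011:10:00:03 +0000] "GET / HTTP/1.1" 200 512
127.0.0.1 - - [15/Aug/2011:10:00:04 +0000] "GET /c HTTP/1.1" 200 256
EOF
)

# count_ips (hypothetical helper): reads log lines on stdin and prints
# "<count> <ip>" pairs, busiest IP first.
count_ips() {
    awk '{print $1}' |   # first field is the client IP
        sort |           # group identical IPs on adjacent lines
        uniq -c |        # collapse each group and prefix it with its count
        sort -rn         # order by count, highest first
}

printf '%s\n' "$sample_access" | count_ips
```

With a real file you would feed it the log instead of the sample, for example: count_ips < access.log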
Another question might be to parse the IP addresses out of the error.log. While this is a bit more difficult, it is still fairly straightforward using readily available system tools.
Q: Can you please extract the IP addresses from the error.log?
A: Here is my solution (Thanks Malcolm!):
cat error.log | awk '{ if ($7 == "[client") {print $8} }' | sed -e 's/]$//g' | sort | uniq
This one is a bit more complex, so I will walk through each step of it:
1. First part: cat error.log - read through the log file.
[Sun Aug 14 13:27:14 2011] [error] [client 96.126.120.254] Invalid method in request \x80e\x01\x03\x01
[Sun Aug 14 13:27:14 2011] [error] [client 96.126.120.254] Invalid method in request \x80e\x01\x03\x01
[Sun Aug 14 13:27:14 2011] [error] [client 96.126.120.254] Invalid method in request \x80e\x01\x03\x01
[Sun Aug 14 13:27:14 2011] [error] [client 96.126.120.254] Invalid method in request \x80e\x01\x03\x01
[Mon Aug 15 15:16:21 2011] [error] [client 194.72.238.62] Invalid method in request \x16\x03\x01
[Mon Aug 15 15:50:27 2011] [notice] caught SIGWINCH, shutting down gracefully
[Mon Aug 15 15:50:37 2011] [notice] mod_python: Creating 8 session mutexes based on 75 max processes and 0 max threads.
[Mon Aug 15 15:50:37 2011] [notice] mod_python: using mutex_directory /tmp
PHP Warning: Module 'gd' already loaded in Unknown on line 0
PHP Warning: Module 'mysql' already loaded in Unknown on line 0
PHP Warning: Module 'mysqli' already loaded in Unknown on line 0
[Mon Aug 15 15:50:38 2011] [warn] mod_wsgi: Compiled for Python/2.5.1.
[Mon Aug 15 15:50:38 2011] [warn] mod_wsgi: Runtime using Python/2.5.2.
2. Second part: cat error.log | awk '{ if ($7 == "[client") {print $8} }' - if the 7th field is "[client" (this seems to be pretty standard, though ymmv), print out the eighth field (which should be the IP address; also notice the trailing "]" character).
96.126.120.254]
96.126.120.254]
96.126.120.254]
96.126.120.254]
194.72.238.62]
3. Third part: cat error.log | awk '{ if ($7 == "[client") {print $8} }' | sed -e 's/]$//g' - use sed to remove the trailing "]" character.
96.126.120.254
96.126.120.254
96.126.120.254
96.126.120.254
194.72.238.62
4. Fourth part: cat error.log | awk '{ if ($7 == "[client") {print $8} }' | sed -e 's/]$//g' | sort | uniq - output only the unique IPs rather than every match. The sort comes first because uniq only collapses adjacent duplicates.
194.72.238.62
96.126.120.254
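The same extraction can also be done in a single awk call, letting awk strip the trailing "]" itself. This is only a sketch of an alternative, not the solution credited above: client_ips is a hypothetical helper name, the sample lines are taken from the log excerpt in this post, and it assumes the same error log layout (7th field "[client", 8th field the IP).

```shell
# A few lines from the error.log excerpt in this post, deliberately
# ordered so that the repeated IP is not on adjacent lines.
error_sample=$(cat <<'EOF'
[Sun Aug 14 13:27:14 2011] [error] [client 96.126.120.254] Invalid method in request \x80e\x01\x03\x01
[Mon Aug 15 15:16:21 2011] [error] [client 194.72.238.62] Invalid method in request \x16\x03\x01
[Mon Aug 15 15:50:27 2011] [notice] caught SIGWINCH, shutting down gracefully
[Sun Aug 14 13:27:14 2011] [error] [client 96.126.120.254] Invalid method in request \x80e\x01\x03\x01
EOF
)

# client_ips (hypothetical helper): reads error log lines on stdin and
# prints each client IP once.
client_ips() {
    # Only lines whose 7th field is "[client" carry an IP in field 8;
    # sub() removes the trailing "]" before printing.
    awk '$7 == "[client" { sub(/\]$/, "", $8); print $8 }' |
        sort -u   # sort and de-duplicate in one step
}

printf '%s\n' "$error_sample" | client_ips
```

Against a real file this would be: client_ips < error.log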
Addendum:
If you like, you can then add another awk on the end (or pipe to any other command you feel like). For instance, the following pipes the output to the host command:
[Edit: for host you might want to specify the -W <time> flag just in case; it could try forever on some addresses unless specified.]
cat error.log | awk '{ if ($7 == "[client") {print $8} }' | sed -e 's/]$//g' | sort | uniq | awk '{ print | "host -W 3 " $1 }'
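If the trailing awk feels awkward, xargs is another way to run one command per IP. The sketch below is an assumption on my part rather than something from the original answer: lookup_ips and LOOKUP_CMD are hypothetical names, and by default it only echoes the host commands it would run, so it can be tried without touching the network.

```shell
# LOOKUP_CMD (hypothetical knob): defaults to a dry run that just echoes
# the command; set LOOKUP_CMD="host -W 3" to perform real lookups.
LOOKUP_CMD=${LOOKUP_CMD:-echo host -W 3}

# lookup_ips (hypothetical helper): reads one IP per line on stdin and
# runs the lookup command once per IP.
lookup_ips() {
    # $LOOKUP_CMD is intentionally unquoted so the shell splits it into
    # the command and its flags (assumes sh or bash word splitting).
    xargs -n 1 $LOOKUP_CMD
}

printf '%s\n' 96.126.120.254 194.72.238.62 | lookup_ips
```

In the full pipeline, lookup_ips would take the place of the final awk stage.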
I hope this helps, share and enjoy! Thanks for reading!
-Scott.
Disclaimer: I make no claims to the viability of the code/script/commands and make no guarantees that it will work on your system, use at your own risk.
