Query ALB logs via SQL

AWS Athena Serverless Interactive Query Service

With the news of AWS Athena becoming available in Sydney, instead of just sharing the announcement (which is good as well, by the way), I thought I would use the service to solve a real use case. Athena is a serverless query service that lets you analyse data stored in Amazon S3 using standard SQL statements.

One of my colleagues struggled to parse our ALB logs while troubleshooting, and we all know how important it is to see logs: massage them the way you want (count, sort, group, limit results and what not) and then use them to help with your troubleshooting!

We already had our ALB logs going to a bucket, so that's a tick! If that's not the case for you, use load-balancer-access-logs to configure this first, as we need data! 😉
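If logging isn't enabled yet, it can also be switched on from the CLI; a quick sketch (the ARN, bucket and prefix below are placeholders, and the bucket policy must already allow the regional ELB service account to write to it):

```shell
# Enable ALB access logging to an S3 bucket (placeholder values throughout).
# Prerequisite: the bucket policy grants s3:PutObject to the ELB account
# for your region, otherwise this call fails.
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:ap-southeast-2:xxxxxxxx:loadbalancer/app/alb-xxx-Misc/xxxxxx \
  --attributes Key=access_logs.s3.enabled,Value=true \
               Key=access_logs.s3.bucket,Value=xx-xx-xx-access-logs \
               Key=access_logs.s3.prefix,Value=Misc
```

A subsequent `describe-load-balancer-attributes` (as below) should then show `access_logs.s3.enabled` as `true`.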

$ aws elbv2 describe-load-balancer-attributes --load-balancer-arn arn:aws:elasticloadbalancing:ap-southeast-2:xxxxxxxx:loadbalancer/app/alb-xxx-Misc/xxxxxx --output table
--------------------------------------------------------------
|               DescribeLoadBalancerAttributes               |
+------------------------------------------------------------+
||                        Attributes                        ||
|+-------------------------------+--------------------------+|
||              Key              |          Value           ||
|+-------------------------------+--------------------------+|
||  deletion_protection.enabled  |  false                   ||
||  access_logs.s3.prefix        |  Misc                    ||
||  idle_timeout.timeout_seconds |  60                      ||
||  access_logs.s3.enabled       |  true                    ||
||  access_logs.s3.bucket        |  xx-xx-xx-access-logs    ||
|+-------------------------------+--------------------------+|


Time for Athena:

If you have never created an Athena DB before, you might be greeted with a wizard. Cancel it (or press Escape) to skip straight to a fresh Athena console.

Define your schema (xxx below refers to your S3 bucket and account ID) by pasting the statements below into the query editor, the text box from which queries can be run.

Create your DB….. (Normal SQL, yay!)

CREATE DATABASE alb_logs;

Create the Table….

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs.misc_alb
( type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code string,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) (.*)' )
LOCATION 's3://xxxxxx-access-logs/Misc/AWSLogs/xxxxxx384xx/elasticloadbalancing/ap-southeast-2/';

Pretty simple schema, eh? See the ALB access log documentation for the details of each field.

Click on Run Query; this should create a table with the schema defined above.

And that’s it… we are ready to fire SQL at it (sort, group, count)… imagination is your friend!

SELECT time,
       type,
       user_agent,
       client_port,
       request_processing_time,
       target_processing_time,
       response_processing_time,
       elb_status_code,
       target_status_code,
       received_bytes,
       sent_bytes,
       request_verb
FROM alb_logs.misc_alb
WHERE client_ip = 'xx.xx.xx.xx'
  AND time >= '2017-08-23T23:55:15'
  AND time <= '2017-08-23T23:55:28';
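And a couple of sketches of the sort/group/count variety against the same table (illustrative only; adjust names to your own setup):

```sql
-- Top 10 clients by request count
SELECT client_ip, count(*) AS requests
FROM alb_logs.misc_alb
GROUP BY client_ip
ORDER BY requests DESC
LIMIT 10;

-- Breakdown of responses by ELB status code
SELECT elb_status_code, count(*) AS hits
FROM alb_logs.misc_alb
GROUP BY elb_status_code
ORDER BY hits DESC;
```

Handy for spotting a chatty client or a spike in 5xx responses at a glance.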

You can save your query, see history and manage the catalogue. Woot!!!

It’s not limited to ALB logs either: you can query VPC Flow Logs, CloudFront logs — basically anything that writes to S3 and has a standard structure (schema) you can define. Check the sample queries here (large PDF).

If you want to drop your table, your data is still safe. Bonus! All tables in Athena are EXTERNAL, so dropping one removes only the table metadata; the underlying data in S3 remains intact.
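For the record, the drop is just plain SQL too (a sketch using the table created above; the objects under the S3 location stay put):

```sql
DROP TABLE alb_logs.misc_alb;
```

Re-running the CREATE EXTERNAL TABLE statement brings it straight back over the same data.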

