Analyze web traffic with Squid proxy & Elasticsearch/Logstash/Kibana stack

Thomas Decaux
5 min read · Aug 5, 2015

Want to know what you are sending to the Internet? This tutorial explains how to set up Squid as a forward proxy and ship its access logs to Elasticsearch; we will then use Kibana to build a cool & clean dashboard.

We will use Docker with some official images and Docker Compose to glue all services together.

First I will go over the software used, then I will show you how to install and run it.

Prepare a Docker host

You should be familiar with Docker and Docker machine:

“an open platform for distributed applications for developers and sysadmins”

“Machine lets you create Docker hosts on your computer, on cloud providers, and inside your own data center. It creates servers, installs Docker on them, then configures the Docker client to talk to them.”

https://www.docker.com/

https://docs.docker.com/machine/

https://registry.hub.docker.com/

We use Docker Machine to create and set up a Docker host in a few seconds on VirtualBox, VMware, or a cloud provider such as Amazon or Azure. We use the Docker registry to install any software just as quickly, without worrying about dependencies.

Squid

Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. Squid has extensive access controls and makes a great server accelerator.

Install Squid with Docker

Squid configuration

File squid.user.conf

visible_hostname proxy

http_port 0.0.0.0:3128

# ACL
http_access allow all
icp_access allow all

cache_dir ufs /var/spool/squid3 1000 16 256

access_log tcp://logstash:1025
cache_access_log /var/log/squid3/cache-access.log
cache_log /var/log/squid3/cache.log
cache_store_log /var/log/squid3/store.log


What is particular here is that access logs are forwarded to a TCP server reachable at the hostname logstash on port 1025.
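If you want to see what Squid actually sends before wiring up Logstash, a throwaway TCP listener is enough. Here is a minimal Python sketch; it is not part of the original setup. The names `LineHandler`, `make_sink` and `demo_roundtrip` are hypothetical, and it assumes Squid writes one newline-terminated record per request on the connection.

```python
import socket
import socketserver
import threading


class LineHandler(socketserver.StreamRequestHandler):
    """Reads newline-terminated records until the client disconnects."""

    def handle(self):
        for raw in self.rfile:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            self.server.lines.append(line)  # keep a copy for inspection
            print(line)


def make_sink(host="0.0.0.0", port=1025):
    """Build a TCP server on the port Squid is configured to log to."""
    server = socketserver.TCPServer((host, port), LineHandler)
    server.lines = []
    return server


def demo_roundtrip(record):
    """Send one record to a throwaway sink on localhost and return what it saw."""
    server = make_sink("127.0.0.1", 0)  # port 0: let the OS pick a free port
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    with socket.create_connection(server.server_address) as client:
        client.sendall(record.encode("utf-8") + b"\n")
    worker.join()  # handle_request returns once the client has disconnected
    server.server_close()
    return server.lines
```

To listen for real traffic, run `make_sink().serve_forever()` on the host that the name logstash resolves to. The server handles one connection at a time, which should be enough for a single Squid instance.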

Logstash

Logstash is a data pipeline that helps you process logs and other event data from a variety of systems. With 165 plugins and counting, Logstash can connect to a variety of sources and stream data at scale to a central analytics system.

Install Logstash with Docker

Configure Logstash as TCP consumer and forward to Elasticsearch

input {
  tcp {
    port => 1025
    type => "squid-access"
  }
}

filter {
  grok {
    match => {
      "message" => "%{POSINT:timestamp}.%{WORD:timestamp_ms}\s+%{NUMBER:response_time} %{IPORHOST:src_ip} %{WORD:squid_request_status}/%{NUMBER:http_status_code} %{NUMBER:reply_size_include_header} %{WORD:http_method} %{NOTSPACE:request_url} %{NOTSPACE:user} %{WORD:squid}/%{IP:server_ip} %{NOTSPACE:content_type}"
    }
  }
}

output {
  elasticsearch {
    protocol => "http"
    host => "elasticsearch"
    port => "9200"
    index => "squid-access"
  }
}

The important part is the Grok pattern: it parses each Squid access log entry into named fields, which are then indexed in Elasticsearch.
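To get a feel for what the Grok pattern extracts, here is a rough Python equivalent using a plain regular expression. It is only an approximation: the character classes are simpler than the real Grok definitions (for instance `server_ip` accepts any non-space token here), and the sample line is made up, assuming Squid's native log format. `SQUID_LINE`, `parse_access_line` and `SAMPLE` are hypothetical names.

```python
import re

# One named group per field captured by the Grok pattern, in the same order.
SQUID_LINE = re.compile(
    r"(?P<timestamp>\d+)\.(?P<timestamp_ms>\d+)\s+"
    r"(?P<response_time>\d+) "
    r"(?P<src_ip>\S+) "
    r"(?P<squid_request_status>\w+)/(?P<http_status_code>\d+) "
    r"(?P<reply_size_include_header>\d+) "
    r"(?P<http_method>\w+) "
    r"(?P<request_url>\S+) "
    r"(?P<user>\S+) "
    r"(?P<squid>\w+)/(?P<server_ip>\S+) "
    r"(?P<content_type>\S+)"
)

# Made-up entry in the shape of Squid's native access log format.
SAMPLE = (
    "1438765200.123    250 172.17.0.1 TCP_MISS/200 1234 GET "
    "http://example.com/ - HIER_DIRECT/93.184.216.34 text/html"
)


def parse_access_line(line):
    """Return a dict of field name to string, or None if the line does not match."""
    match = SQUID_LINE.match(line)
    return match.groupdict() if match else None


fields = parse_access_line(SAMPLE)
print(fields["http_method"], fields["request_url"], fields["http_status_code"])
```

Every value comes out as a string, just like a Grok capture without a type suffix.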

Elasticsearch

Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today — both open source and proprietary.

https://www.elastic.co

Install Elasticsearch with Docker

Configure Elasticsearch

Before sending data, we must define how Elasticsearch will interpret it: this is the mapping.

{
  "settings": {
    "number_of_replicas": "1",
    "number_of_shards": "1"
  },
  "mappings": {
    "_default_": {
      "_timestamp": {
        "enabled": true,
        "path": "time",
        "format": "date_time_no_millis"
      },
      "properties": {
        "time": {
          "type": "date",
          "format": "date_time_no_millis"
        },
        "host": {
          "type": "string",
          "index": "not_analyzed"
        },
        "src_ip": {
          "type": "string",
          "index": "not_analyzed"
        },
        "reply_size_include_header": {
          "type": "integer",
          "index": "not_analyzed"
        },
        "squid_request_status": {
          "type": "string",
          "index": "not_analyzed"
        },
        "http_status_code": {
          "type": "integer",
          "index": "not_analyzed"
        },
        "http_method": {
          "type": "string",
          "index": "not_analyzed"
        },
        "request_url": {
          "type": "string",
          "index": "not_analyzed"
        },
        "content_type": {
          "type": "string",
          "index": "not_analyzed"
        },
        "squid": {
          "type": "string",
          "index": "not_analyzed"
        },
        "server_ip": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
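The mapping expects the time field in the date_time_no_millis format, i.e. an ISO 8601 date and time with a zone offset and no milliseconds. Squid's native log carries epoch seconds instead, so a conversion is needed somewhere. Here is a small Python sketch of that conversion, with a hypothetical helper name `to_date_time_no_millis`; the tutorial itself does not show how time is populated, so take this purely as an illustration of the format.

```python
from datetime import datetime, timezone


def to_date_time_no_millis(epoch_seconds):
    """Format epoch seconds (int or numeric string) as e.g. 2015-08-05T09:00:00Z."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_date_time_no_millis(1438765200))  # -> 2015-08-05T09:00:00Z
```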

Kibana

Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices. You can easily perform advanced data analysis and visualize your data in a variety of charts, tables, and maps.

Install Kibana with Docker

Configure Kibana

Kibana is very handy: its web UI lets you explore your Elasticsearch data and build very complex dashboards.

Install & run services

Now that we know which Docker images we need, let’s build our docker-compose YAML file:

elasticsearch:
  image: elasticsearch:1.7
  ports:
    - 9200:9200
  volumes:
    - ./data/elasticsearch:/usr/share/elasticsearch/data:rw
  working_dir: /usr/share/elasticsearch/data

kibana:
  image: kibana:4.1
  links:
    - elasticsearch
  ports:
    - 5601:5601

logstash:
  image: logstash:1.5.3
  command: logstash -f /etc/logstash/conf.d
  volumes:
    - ./logstash:/etc/logstash:ro
  links:
    - elasticsearch
  ports:
    - 1025:1025

proxy:
  image: sameersbn/squid:latest
  volumes:
    - ./proxy/squid.user.conf:/etc/squid3/squid.user.conf:ro
    - ./data/proxy/squid/cache:/var/spool/squid3:rw
  links:
    - logstash
  ports:
    - 3128:3128

Files structure

Now let’s run our small infrastructure. The big problem with docker-compose is service dependencies ;-( Here, Squid needs Logstash to be ready before it starts, and Kibana needs Elasticsearch. So we are not going to use the simple

docker-compose up

But (File run.sh):

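#!/bin/bash

# Start Elasticsearch first; Kibana and Logstash both depend on it.
docker-compose up -d elasticsearch

sleep 5

# --no-deps: the linked services are already running.
docker-compose up -d --no-deps kibana logstash

sleep 5

# Squid goes last so that Logstash is already listening on port 1025.
docker-compose up -d --no-deps proxy

docker-compose logs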
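A fixed sleep is a guess: too short on a slow host, wasted time on a fast one. One alternative is to poll the port until something accepts connections. Here is a minimal Python sketch of that idea; `wait_for_port` is a hypothetical helper, not something docker-compose provides, and the default timeout and interval are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True as soon as host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or unreachable: retry until the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("{IP_DOCKER_HOST}", 9200)` could replace the first sleep, with the placeholder swapped for your Docker host address. Keep in mind that an open port only means something is listening; Elasticsearch may still need a moment before it answers requests.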

We should now be able to reach Elasticsearch and set up the mapping:

curl -XPUT http://{IP_DOCKER_HOST}:9200/squid-access -d "$(cat elasticsearch/squid_access_template.json)"

Et voilà!

Use Squid proxy

curl https://www.facebook.com -x http://{IP_DOCKER_HOST}:3128 -I

Or with Firefox
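Once some traffic has gone through the proxy, you can check that entries really land in the squid-access index before opening Kibana. Here is a hedged Python sketch that asks Elasticsearch for the most requested URLs with a terms aggregation on request_url. The function names are hypothetical, and the host argument stands for your Docker host address.

```python
import json
import urllib.request


def build_top_urls_request(docker_host, size=10):
    """Build the search request; nothing is sent until it is opened."""
    body = {
        "size": 0,  # no individual hits, only the aggregation result
        "aggs": {"top_urls": {"terms": {"field": "request_url", "size": size}}},
    }
    return urllib.request.Request(
        url="http://%s:9200/squid-access/_search" % docker_host,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_urls(docker_host, size=10):
    """Return (url, hit count) pairs, most requested first."""
    with urllib.request.urlopen(build_top_urls_request(docker_host, size)) as resp:
        payload = json.load(resp)
    buckets = payload["aggregations"]["top_urls"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]
```

Calling `top_urls("<your Docker host IP>")` should list the URLs you just requested through the proxy, provided the Logstash pipeline parsed them.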
