How to set up secure outbound web access

From LinuxMCE


This tutorial has come about as a result of a discussion on the forums. It details setting up a chain of security devices on the core which should help optimize web browsing from the internal network.

Please note, much of this is shamelessly plagiarised from others. What I have done is to try to bring their work together and provide a comprehensive end-to-end solution. If you are one of those whose work has been used, please take it as a compliment and feel free to add appropriate credits at the end!

How Browsing Works

Without wishing to state the obvious, a basic understanding of web browsing helps us to understand the setup described here. Browsing is a fairly simple process. The client (known as a browser) sends a request using a protocol called HTTP to a server. By default, this request is sent to port 80. The server responds with the requested file. This file may well contain HTML, which the browser will display and will result in the browser making further requests for graphics files, etc.
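As a rough illustration of the request described above, the sketch below builds the text a browser would send for a single page. The helper name http_get_request and the host www.example.com are placeholders of mine, not anything specific to LinuxMCE.

```shell
# Print a minimal HTTP/1.1 GET request for the given host and path.
# http_get_request is an illustrative helper, not a standard command.
http_get_request() {
    host="$1"
    path="${2:-/}"
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Show the request (carriage returns stripped for display).
http_get_request www.example.com /index.html | tr -d '\r'
```

If you have netcat installed, piping the unstripped output into something like nc www.example.com 80 should send the request to port 80 and print the server's reply.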

It is also possible to put a third entity in the middle of this chain. This entity is known as a proxy.

In this case, the requests from the browser are sent to the proxy. The proxy sends the requests to the server, which responds to the proxy. Finally, the proxy responds to the browser. Proxies are used for many reasons, often security-related.
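From the browser's side, the visible difference is small. When talking to a proxy, the request line carries the full URL rather than just the path, so the proxy knows which server to contact. A minimal sketch, with the helper name proxy_get_request and the host being illustrative assumptions:

```shell
# Print the request a browser sends to a proxy: the same as a direct
# request, except that the request line carries the full URL.
# proxy_get_request is an illustrative helper, not a standard command.
proxy_get_request() {
    host="$1"
    path="${2:-/}"
    printf 'GET http://%s%s HTTP/1.1\r\nHost: %s\r\n\r\n' "$host" "$path" "$host"
}

# Show the request (carriage returns stripped for display).
proxy_get_request www.example.com / | tr -d '\r'
```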

The system we will be setting up will consist of a chain of 3 proxies which will perform the following functions:

  • Caching. This allows the proxy to store a copy of the files requested. If a second request is received for the same file, it is already held locally and a second request does not need to be sent to the server. This reduces traffic on the external network and also improves performance overall. The Caching proxy we will be using is known as Squid.
  • Virus Scanning. As the file will be passing through the proxy, that proxy can examine its contents. In this case, viruses can be scanned for and blocked. The virus scanning proxy we will be using is called HavP and is used in conjunction with a regular scanner, in our case ClamAV.
  • Content Scanning. As well as being examined for viruses, the text of the HTML can be processed and scored. This allows inappropriate (for example sexual) content to be blocked. We will be setting up using one of the best known content scanners, Dan's Guardian.

Transparent Proxying

There was much discussion on the forum thread concerning this. It is an additional feature which is entirely optional. In order for a proxy to be used, the browser has to "know" to send its requests to the proxy rather than to the actual server. This is achieved by configuring the browser with the proxy's details. There are, however, one or two problems with this.

  1. Each browser must be individually configured. This is not much of a problem on a home network with only a few browsers, but portable devices that are also used elsewhere (for example at work) make it inconvenient, as the proxy has to be turned on and off repeatedly.
  2. It can be easy to bypass. Without additional firewall rules to prevent direct browsing of the internet, bypassing the proxy (and therefore gaining access to blocked content) is as simple as turning it off in the browser.

The solution to this is known as transparent proxying. This works by having the proxy running on the router (in our case the core) and configuring it to intercept all outbound web traffic (i.e. destined for port 80) and redirect it to the proxy. The process is transparent to the end browser / user, hence the name.

NOTE: Transparent proxying should not be seen as an alternative to setting a proxy on static machines, only as an addition. There are still ways to circumvent the system (for example, some webservers don't operate on port 80), so transparent proxying should be seen as an additional layer of security, not an alternative!
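To make the interception a little more concrete, here is a sketch of the kind of firewall rule involved. It only prints the iptables command rather than running it. The interface name eth1 and the helper name redirect_rule are assumptions of mine; the port should be whichever proxy the clients are meant to reach first (3128 is Squid's port, used here purely as an example).

```shell
# Print (do not apply) a NAT rule that redirects port-80 traffic arriving
# on the internal interface to a proxy port on this machine.
# Both arguments are assumptions: substitute your own interface and port.
redirect_rule() {
    iface="$1"
    proxy_port="$2"
    echo "iptables -t nat -A PREROUTING -i $iface -p tcp --dport 80 -j REDIRECT --to-ports $proxy_port"
}

redirect_rule eth1 3128
```

The proxy receiving the redirected traffic also has to be told to expect intercepted connections. In Squid this is an option on the http_port line (transparent in older releases, intercept in newer ones), so check the documentation for your version.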

The Software

We will be installing a series of packages. The full details of configuring each will not be discussed here. For an in-depth discussion, please visit the respective websites.


Squid

Squid is a very powerful caching web proxy server. This means that it keeps a copy of all files it is asked to retrieve (unless the various HTTP headers dictate that the file cannot be cached). When a file is requested, Squid first checks to see if it has a cached copy. If so, that copy is returned; if not, the file is fetched from the webserver. It is common for certain websites to be visited frequently, and Squid will rapidly build up cached copies of things like logo graphics, which can (in some cases) drastically reduce the requests to the internet, with resulting speed advantages. If you also pay for data usage, Squid can save money as well! Full details can be found on the Squid website. In our configuration, Squid will be the proxy that actually makes the requests to the internet and is, therefore, the "end" of the chain. You can optionally configure Squid to block ads as well.


ClamAV

ClamAV is a Linux-based virus scanning solution. It is not directly part of the proxy chain, but will be used by HavP. It will be configured to download updates on a regular basis, and anecdotal evidence is that it is often "first to press" with new virus definitions. Further information is available on the ClamAV website.


HavP

HavP (HTTP Anti Virus Proxy) scans each file being downloaded using a standard AV scanner. In our case, we are going to use ClamAV (above) to scan. HavP will be configured to use Squid as its proxy, so it sits in front of the cache and scans everything Squid returns, whether freshly fetched or cached. This order matters: with Squid in front instead, a new virus could end up in the cache (before ClamAV is updated) and then continue to be served up even after the virus database is updated. Once again, full details are on the HavP website.

Dan's Guardian

The final stage in the proxy chain is Dan's Guardian. This is the proxy that the clients will connect to and it will, in turn, pass the requests on to HavP. Dan's Guardian is a well-known content scanner that can control access to websites based not only on the URL, but also on the actual content of the files. Like Squid, it is hugely flexible and a full discussion is beyond the scope of this document, but if you need more, point your browser at the Dan's Guardian website.
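As a rough sketch of how this link might be wired up: the main Dan's Guardian configuration file has settings for the port it listens on (filterport) and for the proxy it forwards requests to (proxyip and proxyport). The helper below is my own illustration. It prints a copy of a "key = value" style file with one setting changed, leaving the original untouched. The file location and the HavP port 8081 shown in the usage comment are assumptions, not values from this tutorial.

```shell
# Print a copy of a "key = value" config file with one key set to a new
# value. The input file is not modified.
# set_dg_option is an illustrative helper, not part of Dan's Guardian.
set_dg_option() {
    conf="$1"
    key="$2"
    value="$3"
    awk -v k="$key" -v v="$value" '
        $1 == k && $2 == "=" { print k " = " v; next }
        { print }
    ' "$conf"
}

# Example (path and port are assumptions; review the result before using it):
# set_dg_option /etc/dansguardian/dansguardian.conf proxyport 8081 > /tmp/dansguardian.conf.new
```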



I am writing this as I perform the steps on my own core. So, while this line is here, it's a work in progress and as yet unfinished!!!!

We are going to work "backwards", testing each link in the chain as we go. I am going to assume you have a browser-enabled device on the "inside" network that you can easily re-configure with proxies (a PSP is NOT ideal for this!). I also accessed my core from an ssh session on my client.
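If you prefer the command line to re-configuring a browser, each link can also be tested with curl. This is only a sketch: check_hop is a name I made up, and the host, port and URL are whatever you pass in.

```shell
# Ask one proxy in the chain to fetch a URL and print the HTTP status code.
# Prints 000 if the proxy cannot be reached.
# check_hop is an illustrative helper, not a standard command.
check_hop() {
    proxy_host="$1"
    proxy_port="$2"
    url="$3"
    curl -s -o /dev/null --max-time 10 -w '%{http_code}\n' \
        -x "http://$proxy_host:$proxy_port" "$url"
}

# Example, run from a machine on the internal network
# (CORE_IP is a placeholder for your core's internal address):
# check_hop "$CORE_IP" 3128 http://www.example.com/
```

A 200 suggests the proxy fetched the page; 000 usually means nothing is listening on that port or access was refused at the network level.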


Firstly, we install Squid. There is already a wiki page on installing Squid which works pretty well. The only differences I experienced concerned the Squid configuration file, which is located at /etc/squid/squid.conf. Locate the lines that say (lines 607, 608 and 609 in mine):

acl localnet src    # RFC1918 possible internal network
acl localnet src # RFC1918 possible internal network
acl localnet src        # RFC1918 possible internal network

Comment them out (put a # at the beginning) and insert a single line which says:

acl localnet src

Then uncomment the following line (line 675 in mine) by removing the initial #:

#http_access allow localnet
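If you would rather script these edits than make them by hand, something along the following lines should work. It is a sketch only: rewrite_squid_conf is a helper name I made up, it prints the modified file to standard output instead of touching the original, and the network you pass in must be your own internal network in CIDR form.

```shell
# Print a copy of squid.conf with the RFC1918 "acl localnet" lines
# commented out, a single localnet line for your own network inserted in
# their place, and "http_access allow localnet" uncommented.
# rewrite_squid_conf is an illustrative helper, not part of Squid.
rewrite_squid_conf() {
    conf="$1"
    localnet="$2"
    awk -v net="$localnet" '
        /^acl localnet src .*RFC1918/ {
            # Insert the replacement once, before the first commented-out line.
            if (!done) { print "acl localnet src " net; done = 1 }
            print "#" $0
            next
        }
        /^#http_access allow localnet/ { sub(/^#/, "") }
        { print }
    ' "$conf"
}

# Example (LOCALNET is a placeholder for your internal network):
# rewrite_squid_conf /etc/squid/squid.conf "$LOCALNET" > /tmp/squid.conf.new
```

Compare the result with the original (for example with diff) before copying it into place. Running squid -k parse checks the configuration for syntax errors, and squid -k reconfigure tells a running Squid to reload it.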

To use Squid as an ad blocker, follow this configuration: Squid as ad blocker

Configure your browser to use the core's internal address as its proxy, on port 3128, and try to access some sites. Use tail on /var/log/squid/access.log to confirm that Squid was used. If all is OK, proceed to the next step. (Note: you may want to reverse the changes to squid.conf afterwards to prevent this proxy being used directly.)
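When several devices are browsing at once, the log can get noisy. The sketch below filters it down to a single client. It assumes Squid's default ("native") log layout, in which the client address is the third field; squid_hits_for is just a name I picked.

```shell
# Print result code, method and URL for requests made by one client.
# Assumes Squid's default access.log layout (client address in field 3).
# squid_hits_for is an illustrative helper, not part of Squid.
squid_hits_for() {
    client="$1"
    log="${2:-/var/log/squid/access.log}"
    awk -v ip="$client" '$3 == ip { print $4, $6, $7 }' "$log"
}

# Example (CLIENT_IP is a placeholder for your test machine's address):
# squid_hits_for "$CLIENT_IP"
```

Requesting the same page twice is a handy check on the cache: the first request typically shows a result code starting TCP_MISS, while repeat requests for cacheable files may show a HIT variant.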


Next, we need a virus scanner. If you want to install without AV scanning, skip this step.

ClamAV is part of the repositories, so install using:

apt-get install clamav-daemon

We want the virus definition databases to download automatically. This is performed by freshclam. Quick test: run freshclam (as superuser) with no parameters and check the output. If everything is OK, you may create the log file in /var/log (owned by clamav or whichever user freshclam will be running as):

touch /var/log/freshclam.log
chmod 600 /var/log/freshclam.log
chown clamav /var/log/freshclam.log

I have set mine up to use cron, so add the following to crontab:

12 * * * *	/usr/local/bin/freshclam --quiet

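(This will update at 12 minutes past every hour. If freshclam is installed somewhere other than /usr/local/bin on your system, adjust the path; "which freshclam" will tell you where it is.)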


Next we install a proxy that will scan using ClamAV. HavP can be obtained from the main HavP website and should be unpacked with tar. As it isn't available as a package, the installation is the usual:

# If you don't want to install under /usr/local, add --prefix=/other/path
./configure --disable-clamav
make
make install

Note: I couldn't get it to detect the ClamAV libraries, possibly due to using a packaged ClamAV, hence the --disable-clamav option. If you manage to fix this, please update here! Also, a havp user and group should be created:

groupadd havp 
useradd -g havp havp

Finally, edit /usr/local/etc/havp.conf.
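The settings that matter most for the chain are the ones telling HavP to hand its requests to Squid. In the HavP configuration these are, as far as I know, PARENTPROXY and PARENTPORT, which ship commented out. The sketch below prints a copy of the file with one option set. The helper name set_havp_option is mine, and the values in the usage comments assume Squid is running on the same machine on port 3128.

```shell
# Print a copy of a HavP-style config file ("KEY value" lines, possibly
# commented out with a leading #) with one option set to a new value.
# The input file is not modified.
# set_havp_option is an illustrative helper, not part of HavP.
set_havp_option() {
    conf="$1"
    key="$2"
    value="$3"
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^#[ \t]*/, "", line)          # ignore a leading comment marker
            n = split(line, f, /[ \t]+/)
        }
        n == 2 && f[1] == k { print k " " v; next }
        { print }
    ' "$conf"
}

# Example (two passes, one per option; review the result before using it):
# set_havp_option /usr/local/etc/havp.conf PARENTPROXY localhost > /tmp/havp.step1
# set_havp_option /tmp/havp.step1 PARENTPORT 3128 > /tmp/havp.conf.new
```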

Setting up Transparent Proxying

--Wierdbeard65 13:31, 8 September 2009 (CEST)