iptables: Making the network work for you

Posted by David on Feb 11th, 2004

Doing interesting things with the network usually requires more than what can be provided by userland tools. Packet filtering and address translation, both common tasks, need to be performed as a part of the operating system’s network protocol layers. In Linux, tasks such as this are done through netfilter.

iptables, the command-line interface to netfilter, is the outil du jour for Linux packet filtering, and it requires at least a 2.4 kernel. For distributions released within the last year or so, the availability of iptables is generally not a problem.

Whither go the packets

Before creating netfilter rules, it’s important to understand the path that a packet takes through the network code. Each point at which something can happen to a packet is called a chain, since chains of rules are attached to these points. The traversal of these chains is summarized in this lovely bit of ASCII art, stolen from the netfilter-hacking HOWTO:

    --->PRE------>[ROUTE]--->FWD---------->POST------>
       Conntrack    |       Mangle   ^    Mangle
       Mangle       |       Filter   |    NAT (Src)
       NAT (Dst)    |                |    Conntrack
       (QDisc)      |             [ROUTE]
                    v                |
                    IN Mangle       OUT Conntrack
                    |  Filter        ^  Mangle
                    |  Conntrack     |  NAT (Dst)
                    v                |  Filter

“NAT”, “Mangle”, and “Filter” are the tables, from whence iptables derives its name. These can be viewed as collections of chains. The chains are the parts in all caps: PREROUTING, INPUT, OUTPUT, FORWARD, and POSTROUTING. Packets from the outside world start at PREROUTING, a routing decision occurs, and they are either delivered to INPUT and userland processes, or, if packet forwarding has been enabled, are passed to FORWARD before going to POSTROUTING and back into the outside world. Packets generated by processes running on the system start at OUTPUT, go from there to the routing code, and from there to POSTROUTING and the outside world.
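The three paths described above can be written down as a small lookup. This is only a toy model in shell: the function name chain_path and the labels incoming, forwarded, and outgoing are made up for illustration and mean nothing to iptables itself.

```shell
# Toy model: print the chains a packet traverses, in order.
# The labels are hypothetical names for the three cases in the text.
chain_path() {
    case "$1" in
        incoming)  echo "PREROUTING INPUT" ;;               # delivered to a local process
        forwarded) echo "PREROUTING FORWARD POSTROUTING" ;; # routed through this host
        outgoing)  echo "OUTPUT POSTROUTING" ;;             # generated by a local process
        *) echo "unknown packet origin: $1" >&2; return 1 ;;
    esac
}

chain_path forwarded   # prints: PREROUTING FORWARD POSTROUTING
```

One practical consequence is that a rule in INPUT never sees forwarded traffic, and a rule in FORWARD never sees traffic addressed to the machine itself.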

Various things can happen to packets in each of the chains. The filter table is used to accept or deny packets, the nat table is used to perform address translation, and the mangle table is used to perform modifications on packets not covered by the other tables. The “Conntrack” hooks are where connection tracking occurs and don’t need to be directly manipulated, and “qdisc”, the queuing discipline used for traffic shaping and fair queuing, is sufficiently complex to fill another article.

Matching packets

Chains consist of lists of rules. When a packet enters a chain, netfilter will test it against the rules one at a time to see if it matches. If the packet does match, it is sent to the target specified by the rule, which could tell netfilter to accept the packet and move on, drop the packet, perform network address translation, or even send it to another chain for further processing. If the packet doesn’t match, the next rule is checked, and if the packet matches no rules it is sent to the default target for the chain.
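The first-match behaviour can be sketched in a few lines of shell. This is a toy model, not netfilter code: rules are reduced to hypothetical “port:TARGET” pairs, and only terminating targets such as ACCEPT and DROP are modelled.

```shell
# Toy model of chain traversal, not real netfilter code.
# Usage: first_match POLICY PORT RULE...   where each RULE is "port:TARGET".
first_match() {
    policy=$1
    port=$2
    shift 2
    for rule in "$@"; do
        if [ "${rule%%:*}" = "$port" ]; then
            echo "${rule#*:}"    # the first matching rule decides
            return 0
        fi
    done
    echo "$policy"               # no rule matched: fall back to the policy
}

# The second rule for port 80 is never reached.
first_match ACCEPT 80 22:ACCEPT 80:DROP 80:ACCEPT   # prints: DROP
# Nothing matches port 25, so the policy applies.
first_match ACCEPT 25 22:ACCEPT 80:DROP             # prints: ACCEPT
```

The order of rules matters for exactly this reason: a broad rule placed early hides any narrower rule that follows it.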

The most common targets are ACCEPT and DROP, which are fairly self-explanatory. Note that DROP drops the packet on the floor and nothing more; it does not send a RST packet or any other such negative acknowledgment, so if a connection is attempted that matches a DROP target, it will appear as if your host does not exist. Use of this target will cause nmap to report ports as “filtered”.

Packets have several properties that rules can match. They have a source and a destination address, and they have a protocol. Some protocols, such as TCP and UDP, have source and destination port numbers, which provide additional parameters that can be matched. Others, such as ICMP, have no port numbers, so port specifications are handled by the modules for TCP and UDP. For this reason, the order of the arguments given to iptables is important: a module must be specified, either with “-p” in the case of protocols or “-m” for other modules, before any arguments that the module is needed to parse.

For example, the following adds a rule to the INPUT chain to block TCP packets from hejaz.example.com with a destination of port 80. If a web server were running on our machine, this would prevent hejaz from accessing it.

iptables -t filter -A INPUT -p tcp -s hejaz.example.com --destination-port 80 -j DROP

First, since we’re filtering packets, we want to use the filter table. Specifying the filter table isn’t necessary, since filter is the default. The “-t filter” could have simply been left off. “-A INPUT” tells iptables to append the rule to the end of filter’s portion of the INPUT chain. Rules can also be inserted at arbitrary points in a chain with -I, deleted with -D, and replaced with -R. See the man page for more on how to use these options. After the choice of chain are the packet matching rules: we want to match TCP packets, a source of hejaz.example.com, and a destination port of 80. Finally comes the target, specified by -j.
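As a rough sketch of the other rule-editing options, the following inserts, replaces, and deletes a rule by its position in the chain. The function name edit_input_chain and the IPT variable are made up for this example. By default the commands are only printed, so the sketch can be tried without touching a live firewall.

```shell
# Dry run by default: the commands are printed, not executed.
# Run with IPT=iptables (as root) to apply them for real.
# $IPT is left unquoted on purpose so "echo iptables" splits into two words.
IPT="${IPT:-echo iptables}"

edit_input_chain() {
    # insert at position 1, so the rule is checked before everything else
    $IPT -I INPUT 1 -p tcp -s hejaz.example.com --destination-port 80 -j DROP
    # replace rule 1 in place, here switching the target
    $IPT -R INPUT 1 -p tcp -s hejaz.example.com --destination-port 80 -j ACCEPT
    # delete rule 1 by its position in the chain
    $IPT -D INPUT 1
    # list the chain with rule numbers to see what is left
    $IPT -L INPUT -n --line-numbers
}

edit_input_chain
```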

Source and destination addresses can also be given as network/mask pairs, or as CIDR network addresses. Suppose you wanted to allow access to UDP ports 137-139, used by the SMB protocol, only to 128.61.* (128.61.0.0/16).

iptables -A INPUT -p udp -s 128.61.0.0/16 --dport 137:139 -j ACCEPT

Here “--destination-port” is replaced by its alias --dport. They both do the same thing, but the latter is less to type. The colon in the argument to --dport specifies a range: ports 137 through 139.

This rule, however, raises another issue. Although this rule matches all packets from 128.61.*, packets from elsewhere will eventually hit the default rule for the chain, also known as the policy, which is ACCEPT. For our rule to be effective, we either need to explicitly block packets from elsewhere or change the policy of the chain to DROP. Any of the three following commands would make our SMB filtering rule more effective:

iptables -A INPUT -p udp ! -s 128.61.0.0/16 --dport 137:139 -j DROP
iptables -A INPUT -p udp --dport 137:139 -j DROP
iptables -P INPUT DROP

The first command uses ! to invert the source address, matching all packets not from 128.61.*. Most options can be given a ! to invert their meaning. The second rule leaves the address out, so it will match any packet that satisfies the port specification, and hence must occur later in the chain than our ACCEPT rule above. The last command changes the policy for the chain to DROP, denying all packets that are not explicitly allowed elsewhere in INPUT.

Packets can also be matched based on interface, which is useful to prevent address spoofing on machines with multiple interfaces. -i is used to specify the input interface, and -o the output interface. This information is not known at all points, though. You can’t use -o in the PREROUTING or INPUT chain, since the output interface is not yet known; or -i in the POSTROUTING or OUTPUT chains, since there may not be an input interface. Only the FORWARD chain is able to use both the input and output interfaces.
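A minimal anti-spoofing sketch might look like the following. The interface name if1 and the subnet 192.168.73.0/24 are assumptions borrowed from the example script at the end of this article, and the function name antispoof and the IPT variable are made up. The commands are only printed unless IPT is set to iptables.

```shell
# Dry run by default: the commands are printed, not executed.
# Run with IPT=iptables (as root) to apply them for real.
IPT="${IPT:-echo iptables}"

antispoof() {
    # $1: internet-facing interface, $2: internal subnet (both are assumptions)
    # nothing arriving from outside should claim an inside source address
    $IPT -A INPUT -i "$1" -s "$2" -j DROP
    $IPT -A FORWARD -i "$1" -s "$2" -j DROP
    # loopback addresses should never arrive on a real interface
    $IPT -A INPUT -i "$1" -s 127.0.0.0/8 -j DROP
}

antispoof if1 192.168.73.0/24
```

Only -i is used here, since these rules sit in INPUT and FORWARD and care about where a packet came from, not where it is going.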

Matching packets with connection tracking

One of the strengths of iptables over its predecessors is the addition of connection tracking. Rather than viewing each packet as an independent entity, netfilter is able to maintain information about connections and match packets based on their positions in these streams. This works best with TCP and other protocols that create connections, but can be used to some extent, through the use of timeouts, with connectionless protocols like UDP. This virtual connection state is also what allows UDP and ICMP to be used across address translation.

The possible states of a packet are NEW, ESTABLISHED, RELATED, and INVALID. NEW packets initiate a connection, ESTABLISHED packets are part of a connection that has already been created, RELATED packets start a new connection but are associated with an established connection (for example, ftp-data connections, or ICMP errors), and INVALID packets neither create a new connection nor belong to any known connection.

ESTABLISHED and RELATED packets are almost always desirable, so something like the following is useful at the start of the INPUT chain:

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

“-m state” is necessary to load the state module, since “--state” is otherwise not recognized by iptables.
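Put together with a DROP policy, the state match gives a small stateful baseline. This is a sketch under assumptions, not a recommended ruleset: the choice of ssh as the only permitted inbound service is arbitrary, and the function name stateful_input and the IPT variable are made up. The commands are only printed unless IPT is set to iptables.

```shell
# Dry run by default: the commands are printed, not executed.
# Run with IPT=iptables (as root) to apply them for real.
IPT="${IPT:-echo iptables}"

stateful_input() {
    # replies to connections we already know about
    $IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # local traffic over loopback
    $IPT -A INPUT -i lo -j ACCEPT
    # new inbound connections are allowed only to ssh
    $IPT -A INPUT -p tcp --dport 22 -m state --state NEW -j ACCEPT
    # packets that fit no known connection
    $IPT -A INPUT -m state --state INVALID -j DROP
    # everything else falls through to the policy
    $IPT -P INPUT DROP
}

stateful_input
```

The policy is changed last so that the ACCEPT rules are already in place when the default flips to DROP, which matters if the commands are being typed over the very ssh session they protect.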

Network address translation

The number of IP addresses is limited, and for this reason Network Address Translation (NAT) was created. NAT can be used in order for several computers to share the same IP address in a fashion transparent to all but the router. Suppose you have three computers: orlock, who is confined to a local subnet inaccessible by the outside world; knock, a web server somewhere on the internet; and hutter, the router for orlock’s network, and who exists both on the local network and another subnet with a globally routable address. When orlock attempts to make a connection to knock, packets will be first sent to hutter for routing. Hutter will see that a connection is being made to the outside world, replace the source address of the packet using his own IP address, and send it off to knock. Knock will see a connection coming from hutter, and send packets back to hutter. When hutter sees packets coming back on the translated connection, he will replace the destination IP address with orlock’s and send them on to orlock on the local network. Thus orlock thinks that he has a connection to knock, knock thinks that he has a connection to hutter, and only hutter sees the truth of both sides.

hutter in this case is acting as a router, which means that packet forwarding must be enabled. This is done by setting the net.ipv4.ip_forward sysctl variable to 1. Also note that these packets will traverse the FORWARD chain, so if the policy for FORWARD is DROP and no rule is added to permit NAT packets through, NAT will appear curiously broken.

sysctl -w net.ipv4.ip_forward=1
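If FORWARD does have a DROP policy, rules along these lines would let the translated traffic through. The interface names if0 (inside) and if1 (outside) are assumptions, and the function name allow_nat_forwarding and the IPT variable are made up. The commands are only printed unless IPT is set to iptables.

```shell
# Dry run by default: the commands are printed, not executed.
# Run with IPT=iptables (as root) to apply them for real.
IPT="${IPT:-echo iptables}"

allow_nat_forwarding() {
    # $1: inside interface, $2: outside interface (both are assumptions)
    # anything from the inside may start a connection to the outside
    $IPT -A FORWARD -i "$1" -o "$2" -j ACCEPT
    # from the outside, only replies to those connections are let back in
    $IPT -A FORWARD -i "$2" -o "$1" -m state --state ESTABLISHED,RELATED -j ACCEPT
}

allow_nat_forwarding if0 if1
```

Since these rules live in FORWARD, both -i and -o are available, which makes it easy to treat the two directions differently.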

Network address translation of this sort is performed in the POSTROUTING chain, in the nat table. Its simplest form is the MASQUERADE target, which maps the source of a connection to the output interface’s address. Supposing that the output interface is named “out”, this would be like the following:

iptables -t nat -A POSTROUTING -o out -j MASQUERADE

MASQUERADE requires address lookups for outgoing packets, since the nat rule may outlive the interface’s address, and all connection information is forgotten when the interface goes down, since its address is assumed to be dynamic. For static IP addresses, SNAT, which creates an explicit source translation to a particular address, is more desirable. If your address to the outside world is 1.2.3.4, the following will perform the same task as the MASQUERADE target:

iptables -t nat -A POSTROUTING -o out -j SNAT --to-source 1.2.3.4

“--to-source” can be abbreviated as “--to”. SNAT addresses are actually of the form <address>[:<port>], so it can be used to translate any outgoing packet to any desired host and port. Applications of this are left as an exercise to the reader.

One of the problems with NAT is that connections cannot be made to machines inside the translated network. A common solution to this is port forwarding: connections to some TCP or UDP port on the router are forwarded to some port on one of the internal machines. Before iptables, this was accomplished through a mechanism separate from the main firewall rules, but iptables simply views this operation as a different sort of NAT. Whereas SNAT translates a connection to any source, DNAT translates a connection to any destination.

Since changing the destination of a packet affects routing decisions, DNAT rules must occur in the PREROUTING (or OUTPUT) chain. For example, to forward the destination TCP port 1000 on hutter to port 22 on orlock:

iptables -t nat -A PREROUTING -p tcp --dport 1000 -j DNAT --to-destination orlock:22

“--to-destination” can also be abbreviated as simply “--to”. This works just like SNAT: machines will believe that they are communicating with hutter on port 1000, while orlock will believe that the outside world is connecting to him on port 22.
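A forwarded connection still has to pass through the FORWARD chain on its way to the inside host, so a port forward usually comes as a pair of rules. The sketch below assumes that orlock’s address is 192.168.73.4 and that the outside interface is if1. The function name port_forward and the IPT variable are made up. The commands are only printed unless IPT is set to iptables.

```shell
# Dry run by default: the commands are printed, not executed.
# Run with IPT=iptables (as root) to apply them for real.
IPT="${IPT:-echo iptables}"

port_forward() {
    # $1: outside interface, $2: port on the router,
    # $3: inside host, $4: port on the inside host
    $IPT -t nat -A PREROUTING -i "$1" -p tcp --dport "$2" -j DNAT --to-destination "$3:$4"
    # FORWARD sees the packet after translation, so match the new destination
    $IPT -A FORWARD -i "$1" -p tcp -d "$3" --dport "$4" -j ACCEPT
}

port_forward if1 1000 192.168.73.4 22
```

The second rule only matters when the policy for FORWARD is DROP, but in that case forgetting it produces the same curiously broken behavior as with source NAT.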

Other helpful targets

This is just an overview of some more of the options available. See the iptables documentation for a more complete description of the targets and their options.

REJECT

The REJECT target can be used to send back a negative response for a packet. This is often more useful than DROP, since it doesn’t disrupt the normal behavior of a connection failure, and it can conceal the fact that any filtering has occurred.

# reject TCP ident with reset
iptables -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset
# reject UDP DNS
iptables -A INPUT -p udp --dport 53 -j REJECT --reject-with icmp-port-unreachable

LOG

LOG is used to write some information to the kernel log about a packet. This is a non-terminating target, so when LOG rules are matched, traversal of the chain will continue with the next rule.

# log then drop attempts to connect to port 31337
iptables -A INPUT -p tcp --dport 31337 -j LOG --log-prefix "31337 connect attempt: "
iptables -A INPUT -p tcp --dport 31337 -j DROP

TOS

TOS is used to modify the type of service field of IP packets. This is an example of one of the targets usable in the mangle table. Like LOG and most other mangle targets, it is non-terminating, so packets can match more than one such rule.

Type of Service can be used to determine the priority of incoming and outgoing packets, and may affect queuing choices made by the routers between hosts. By default Linux splits traffic into three queues based on the TOS field and attempts to deliver low-latency traffic before normal traffic, and bulk traffic (minimize-cost and maximize-throughput) after normal traffic.

# set all outgoing SMTP traffic to low-priority
iptables -t mangle -A OUTPUT -p tcp --dport 25 -j TOS --set-tos Minimize-Cost

MARK

MARK, accessible in the mangle table, sets the netfilter mark value on a packet, which is just an arbitrary integer. This is usually used by the advanced routing tools, but can also be used to construct more complex netfilter behaviors. The mark value can be matched by other rules through the mark module.

# Set a mark for packets coming in on if0
iptables -t mangle -A PREROUTING -i if0 -j MARK --set-mark 47
# drop packets with a mark of 47 in the filter table, for the sake of the example
iptables -A INPUT -m mark --mark 47 -j DROP

Other helpful modules

limit

The limit module allows packets to be matched at a certain rate. This can be used for a crude sort of traffic shaping, or to prevent the LOG target from flooding the kernel log with packets.

# log new connections to port 25, max. of 3 per minute logged
iptables -A INPUT -p tcp --dport 25 -m limit --limit 3/minute -m state --state NEW -j LOG --log-prefix "new smtp connection: "

multiport

multiport offers a convenient way to match an arbitrary set of ports in a single rule. It adds an “s” suffix to --source-port, --destination-port, and their abbreviated equivalents.

# allow packets to smtp, www, and dns
iptables -A INPUT -p tcp -m multiport --dports 25,80,53 -j ACCEPT

recent

The recent module is a fairly new addition to iptables. It allows for the construction of temporary “bad-guy” lists. A host can be added to the recent list, and then future packets can be matched against this list.

# drop attempts to connect to 127.0.0.0/8 from outside, and add the sender to the recent list
iptables -A FORWARD -i eth0 -d 127.0.0.0/8 -m recent --set -j DROP
# drop any packets from recent list for 60 seconds, reset timer
iptables -A FORWARD -m recent --update --seconds 60 -j DROP

Multiple recent lists can be created by using the “--name” option. The name of the default list is “DEFAULT”. Recent lists can also be used to implement “port knocking” authentication, where a client has to send a packet to a sequence of ports before being allowed through on the eventual target port.

# only allow ssh connections after a connection attempt is made to port 79,
# followed by an ICMP echo request
# only one recent list can be used at a time, so need to use MARK to pass
# hosts between lists
# doing all recent list modification in the mangle table, setting a final
# mark to match in the filter table

# allow things with mark set to 47
iptables -A INPUT -p tcp --dport 22 -m mark --mark 47 -j ACCEPT
# block everything else to port 22
iptables -A INPUT -p tcp --dport 22 -j DROP

# add the knock to port 79 to PHASE1 recent list
iptables -t mangle -A INPUT -p tcp --dport 79 -m recent --name PHASE1 --set -j ACCEPT

# check for subsequent knock in form of ping, add to PHASE2
# can't manipulate two lists at once, using mark of 420 to pass between
# also can't use --seconds with --remove, so doing that in two steps
# setting mark the second time as a do-nothing fall through target
iptables -t mangle -A INPUT -p icmp --icmp-type echo-request -m recent --name PHASE1 --seconds 60 --rcheck -j MARK --set-mark 420
iptables -t mangle -A INPUT -m mark --mark 420 -m recent --name PHASE1 --remove -j MARK --set-mark 420
iptables -t mangle -A INPUT -m mark --mark 420 -m recent --name PHASE2 --set -j ACCEPT

# for anything that made it to PHASE2, set the mark to 47
iptables -t mangle -A INPUT -m recent --name PHASE2 --seconds 60 --update -j MARK --set-mark 47

connlimit

connlimit, formerly known as iplimit, is another new addition to iptables, and still lacks support in the mainstream kernel. A patch for kernel connlimit support exists in the patch-o-matic set available at netfilter.org.

connlimit allows matches to be made based on the number of connections currently open from a particular host or group of hosts.

# limit number of connections to httpd to 3 per host
iptables -A INPUT -p tcp --dport 80 -m connlimit --connlimit-above 3 -j REJECT --reject-with tcp-reset

Example firewall script

Although it is certainly possible to initialize your iptables rules in a simple shell script, there are also tools provided by iptables to save and restore the state of your firewall. These tools, iptables-save and iptables-restore, use a format that is essentially just the arguments that would be given to the iptables command. Using iptables-restore instead of a shell script has several advantages. You can set rules one at a time until you have what you want, then use iptables-save to spit out the state of the firewall, rather than trying to duplicate your actions in a script. Also, iptables-restore sets the firewall all at once, instead of running a separate iptables process for each rule. The state file is transaction based, so the truly paranoid can construct their firewall in such a way that attackers can’t slip a packet or two through while the firewall is being brought up. The following examples use the format used by iptables-restore.

# Simple firewall script, performs some common tasks
# local interface is if0, internet-facing interface is if1
# local subnet is 192.168.73.0/24, internet IP is 128.61.70.47

# start with the filter table
*filter
# set policies
# accept everything on INPUT and OUTPUT,
# permit nothing to be forwarded unless explicitly allowed
:INPUT ACCEPT
:FORWARD DROP
:OUTPUT ACCEPT

# if a connection is made through some other means, keep it up
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# accept everything coming in from the local subnet
-A INPUT -i if0 -j ACCEPT
# accept everything from loopback
-A INPUT -i lo -j ACCEPT

# allow TCP ftp, ssh, smtp, dns, http
-A INPUT -p tcp -m multiport --dports 21,22,25,53,80 -j ACCEPT

# allow UDP dns
-A INPUT -p udp --dport 53 -j ACCEPT

# reject everything else to ports 0-8000, since nothing should be running there
# leaving everything above 8000 open so users can run random crap
-A INPUT -p tcp --dport 0:8000 -j REJECT --reject-with tcp-reset
-A INPUT -p udp --dport 0:8000 -j REJECT --reject-with icmp-port-unreachable

# allow forwarding from the local subnet to anywhere
-A FORWARD -i if0 -j ACCEPT
# allow established connections to continue
-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
# allow a port-forward defined in the nat table
-A FORWARD -p tcp -d 192.168.73.4 --dport 6346 -j ACCEPT

# commit it
COMMIT

# nat table
*nat
:PREROUTING ACCEPT
:POSTROUTING ACCEPT
:OUTPUT ACCEPT

# fire up NAT
-A POSTROUTING -o if1 -j SNAT --to 128.61.70.47

# forward TCP port 6346 to 192.168.73.4
-A PREROUTING -p tcp --dport 6346 -j DNAT --to 192.168.73.4:6346

COMMIT

# mangle table
*mangle
:PREROUTING ACCEPT
:INPUT ACCEPT
:FORWARD ACCEPT
:OUTPUT ACCEPT
:POSTROUTING ACCEPT

# set all incoming SMTP traffic to min-cost
# packets from the world coming in
-A PREROUTING -p tcp --dport smtp -j TOS --set-tos Minimize-Cost
# packets from me going out
-A POSTROUTING -p tcp --sport smtp -j TOS --set-tos Minimize-Cost
COMMIT
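Since each table in the file only takes effect at its COMMIT line, a quick structural check before loading can catch a table that was left unclosed. The following is a rough sketch, not a validator: the function name check_ruleset and the file name firewall.rules are made up, and it only counts lines without looking at the rules themselves.

```shell
# Rough sanity check for an iptables-restore file: every "*table" line
# should be closed by its own COMMIT. The rules themselves are not checked.
check_ruleset() {
    tables=$(grep -c '^\*' "$1" 2>/dev/null || true)
    commits=$(grep -c '^COMMIT$' "$1" 2>/dev/null || true)
    [ "${tables:-0}" -gt 0 ] && [ "$tables" -eq "${commits:-0}" ]
}

# Typical use, as root (firewall.rules is a hypothetical file name):
#   check_ruleset firewall.rules && iptables-restore < firewall.rules
#   iptables-save > firewall.rules.new    # dump the live ruleset for comparison
```

iptables-restore also has a --test option that parses a ruleset without committing it, which is a more thorough check when run on the machine in question.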

Further reading