This is a very first implementation of Postfix content filtering.
A Postfix content filter receives unfiltered mail from Postfix and
either bounces the mail or re-injects filtered mail back into Postfix.

This document describes two approaches to content filtering.

Simple content filtering example
================================

The first example is simple to set up.  It uses a shell script that
receives unfiltered mail from the Postfix pipe delivery agent, and
that feeds filtered mail back into the Postfix sendmail command.
Only mail arriving via SMTP will be content filtered.

                  ..................................
                  :            Postfix             :
Unfiltered mail----->smtpd \                /local---->Filtered mail
                  :         -cleanup->queue-       :
               ---->pickup /                \smtp----->Filtered mail
               ^  :                        |       :
               |  :                         \pipe-----+
               |  ..................................  |
               |                                      |
               |                                      |
               +-Postfix sendmail<----filter script<--+

The /some/where/filter program can be a simple shell script like this:

    #!/bin/sh

    # Localize these.
    INSPECT_DIR=/var/spool/filter
    SENDMAIL="/usr/sbin/sendmail -i"

    # Exit codes from <sysexits.h>
    EX_TEMPFAIL=75
    EX_UNAVAILABLE=69

    # Clean up when done or when aborting.
    trap "rm -f in.$$" 0 1 2 3 15

    # Start processing.
    cd $INSPECT_DIR || { echo $INSPECT_DIR does not exist; exit $EX_TEMPFAIL; }

    cat >in.$$ || { echo Cannot save mail to file; exit $EX_TEMPFAIL; }

    # filter <in.$$ || { echo Message content rejected; exit $EX_UNAVAILABLE; }

    $SENDMAIL "$@" <in.$$

    exit $?

The idea is to first capture the message to file and then run the
content through a third-party content filter program.  If the
mail cannot be captured to file, mail delivery is deferred by
terminating with exit status 75 (EX_TEMPFAIL).  If the content
filter program finds a problem, the mail is bounced by terminating
the shell script with exit status 69 (EX_UNAVAILABLE).  If the
content is OK, it is given as input to Postfix sendmail, and the
exit status of the filter command is whatever exit status Postfix
sendmail produces.

I suggest that you play with this script for a while until you are
satisfied with the results. Run it with a real message (headers+body)
as input:

    % /some/where/filter -f sender recipient... <message-file

Once you're satisfied with the content filtering script:

1 - Create a dedicated local user account called "filter".  This
    user handles all potentially dangerous mail content - that is
    why it should be a separate account. Do not use "nobody", and
    most certainly do not use "root" or "postfix".  The user will
    never log in, and can be given a "*" password and non-existent
    shell and home directory.

2 - Create a directory /var/spool/filter that is accessible only
    to the "filter" user. This is where the content filtering script
    is supposed to store its temporary files.

3 - Define the content filter in the Postfix master file:

    /etc/postfix/master.cf:
      filter    unix  -       n       n       -       -       pipe
        flags=Rq user=filter argv=/somewhere/filter -f ${sender} -- ${recipient}

To turn on content filtering for mail arriving via SMTP only, append
"-o content_filter=filter:" to the master.cf entry that defines
the Postfix SMTP server:

    /etc/postfix/master.cf:
	smtp      inet     ...stuff...      smtpd
	    -o content_filter=filter:

Note the ":" at the end!!  The content_filter configuration parameter
accepts the same syntax as the right-hand side in a Postfix transport
table. Execute "postfix reload" to complete the change.

To turn off content filtering, edit the master.cf file, remove the
"-o content_filter=filter:" text from the entry that defines the
Postfix SMTP server, and execute another "postfix reload".

With the shell script as shown above you will lose a factor of four
in Postfix performance for transit mail that arrives and leaves
via SMTP. You will lose another factor in transit performance for
each additional temporary file that is created and deleted in the
process of content filtering.  The performance impact is less for
mail that is submitted or delivered locally, because such deliveries
are already slower than SMTP transit mail.

Simple content filter limitations
=================================

The problem with content filters like the one above is that they
are not very robust, because the software does not talk a well-defined
protocol with Postfix. If the filter shell script aborts because
the shell runs into some memory allocation problem, the script will
not produce a nice exit status as per /usr/include/sysexits.h and
mail will probably bounce. The same lack of robustness is possible
when the content filtering software itself runs into a resource
problem.

Advanced content filtering example
===================================

The second example is considerably more complex, but can give much
better performance, and is less likely to bounce mail when the
machine runs into a resource problem.  This approach uses content
filtering software that can receive and deliver mail via SMTP.

You can expect to lose about a factor of two in Postfix performance
for transit mail that arrives and leaves via SMTP, provided that
the content filter creates no temporary files. Each temporary file
created by the content filter adds another factor to the performance
loss.

We will set up a content filtering program that receives SMTP mail
via localhost port 10025, and that submits SMTP mail back into
Postfix via localhost port 10026.

      ..................................
      :            Postfix             :
   ----->smtpd \                /local---->
      :         -cleanup->queue-       :
   ---->pickup /    ^       |   \smtp----->
      :             |       v          :
      :           smtpd    smtp        :
      :           10026     |          :
      ......................|...........
                    ^       |
                    |       v
                ....|............
                :   |     10025 :
                :   filter      :
                :               :
                .................

To enable content filtering in this manner, specify in main.cf a
new parameter:

    /etc/postfix/main.cf:
	content_filter = scan:localhost:10025

This causes Postfix to add one extra content filtering record to
each incoming mail message, with content scan:localhost:10025.
The content filtering records are added by the smtpd and pickup
servers.

When a queue file has content filtering information, the queue
manager will deliver the mail to the specified content filter
regardless of its final destination.

In this example, "scan" is an instance of the Postfix SMTP client
with slightly different configuration parameters. This is how
one would set up the service in the Postfix master.cf file:

    /etc/postfix/master.cf:
        scan      unix  -       -       n       -       10       smtp
	    -o disable_dns_lookups=yes

Turning off DNS lookups at this point can make a great difference
in content filtering performance. It also isolates the content
filtering process from temporary outages in DNS service.

Instead of 10, use whatever limit is feasible for your machine.
Content inspection software can gobble up a lot of system resources,
so you don't want to have too much of it running at the same time.

The content filter can be set up with the Postfix spawn service,
which is the Postfix equivalent of inetd. For example, to instantiate
up to 10 content filtering processes on demand:

    /etc/postfix/master.cf:
	localhost:10025     inet  n      n      n      -      10     spawn
	    user=filter argv=/some/where/filter localhost 10026

"filter" is a dedicated local user account.  The user will never
log in, and can be given a "*" password and non-existent shell and
home directory.  This user handles all potentially dangerous mail
content - that is why it should be a separate account.

In the above example, Postfix listens on port localhost:10025.  If
you want to have your filter listening on port localhost:10025
instead of Postfix, then you must run your filter as a stand-alone
program.

Note: the localhost port 10025 SMTP server filter should announce
itself as "220 localhost...", to silence warnings in the log.

The /some/where/filter command is most likely a PERL script. PERL
has modules that make talking SMTP easy. The command-line specifies
that mail should be sent back into Postfix via localhost port 10026.

For now, it is left up to the Postfix users to come up with a
PERL/SMTP framework for Postfix content filtering. If done well,
it can be used with other mailers too, which is a nice spin-off.

The simplest content filter just copies SMTP commands and data
between its inputs and outputs. If it has a problem, all it has to
do is to reply to an input of `.' with `550 content rejected', and
to disconnect without sending `.' on the connection that injects
mail back into Postfix.

The job of the content filter is to either bounce mail with a
suitable diagnostic, or to feed the mail back into Postfix through
a dedicated listener on port localhost 10026:

    /etc/postfix/master.cf:
        localhost:10026     inet  n      -      n      -      10      smtpd
	    -o content_filter= 
	    -o local_recipient_maps=
	    -o myhostname=localhost.domain.tld
	    -o smtpd_helo_restrictions=
	    -o smtpd_client_restrictions=
	    -o smtpd_sender_restrictions=
	    -o smtpd_recipient_restrictions=permit_mynetworks,reject
	    -o mynetworks=127.0.0.0/8

This is just another SMTP server. The "-o content_filter=" requests
no content filtering for incoming mail. The server has the same
process limit as the "filter" master.cf entry.

The "-o local_recipient_maps=" is a safety in case you have specified
local_recipient_maps in the main.cf file. That could interfere with
content filtering. 

The "-o myhostname=localhost.domain.tld" avoids a possible problem
if the content filter is picky about the hostname that Postfix
sends in SMTP server replies.

The "-o smtpd_xxx_restrictions" and "-o mynetworks=127.0.0.0/8"
turn off UCE controls that would only waste time here.

Squeezing out more performance
==============================

Many refinements are possible, such as running a specially-configured
smtp delivery agent for feeding mail into the content filter, and
turning off address rewriting before or after content filtering.

As the example below shows, things quickly become very complex,
because a lot of main.cf like information gets listed in the
master.cf file. This makes the system hard to understand.

If you need to squeeze out more performance, it is probably simpler
to run multiple Postfix instances, one before and one after the
content filter. That way, each instance can have simple main.cf
and master.cf files, and the system will be easier to understand.

As before, we will set up a content filtering program that receives
SMTP mail via localhost port 10025, and that submits SMTP mail back
into Postfix via localhost port 10026.

      ...................................
      :            Postfix              :
   ----->smtpd \                        :
      :         -cleanup-\       /local---->
   ---->pickup /          -queue-       :
      :          cleanup2/   |   \smtp----->
      :             ^        v          :
      :             |        v          :
      :           smtpd     scan        :
      :           10026      |          :
      .......................|...........
                    ^        |
                    |        v
                ....|.............
                :   |      10025 :
                :   filter       :
                :                :
                ..................

To enable content filtering in this manner, specify in main.cf a
new parameter:

/etc/postfix/main.cf:
    content_filter = scan:localhost:10025

/etc/postfix/master.cf:
#
# This is the usual input "smtpd" already present in master.cf
#
smtp                inet  n      -      n      -       -      smtpd
#
# This is the cleanup daemon that handles messages in front of
# the content filter, it does header_checks and body_checks (if
# any), but does not do any address rewriting.
#
# The address rewriting happens in the second cleanup phase after
# the content filter. This gives the content_filter access to
# *largely* unmodified addresses for maximum flexibility.
#
# The trivial-rewrite daemon handles the logic of append_myorigin
# and append_dot_mydomain, turning these off requires two
# trivial-rewrite services, by which point (if you are not
# already) you are much better off with two complete Postfix
# instances one in front of and one behind the content filter.
#
# Note that some sites may specifically want to do the opposite:
# perform rewrites in front of the content_filter which would
# then see only cleaned up addresses, in this case the parameter
# settings below should be moved to the second "cleanup"
# instance.
#
cleanup             unix  n      -      n      -       0      cleanup
	-o canonical_maps=
	-o sender_canonical_maps=
	-o recipient_canonical_maps=
	-o masquerade_domains=
	-o virtual_maps=
#
# This is the delivery agent that injects mail into the content
# filter.  It is tuned for low latency and low concurrency, most
# content filters burn CPU and high concurrency causes thrashing.
# The process limit of 10 reenforces the effect of
# $default_destination_concurrency_limit, even without an
# explicit process limit, the concurrency is bounded because all
# messages heading into the content filter have the same
# destination. The "disable_dns_lookups" setting prevents the
# delivery agent from consuming precious "bandwidth" in the
# narrow deliver channel waiting for slow DNS responses. It also
# ensures that the original envelope recipient is seen by the
# content filter.
#
scan                unix  -      -      n      -      10      smtp
    -o disable_dns_lookups=yes
#
# This is the SMTP listener that receives filtered messages from
# the content filter. It *MUST* clear the content_filter
# parameter to avoid loops, and use a different hostname to avoid
# triggering the Postfix SMTP loop detection code.
#
#
# Since all recipients have been validated by the first "smtpd",
# clear local_recipient_maps, virtual_maps and
# virtual_mailbox_maps.
#
# This "smtpd" uses a separate cleanup that does no header or
# body checks, but does do the various address rewrites disabled
# in the first cleanup.
#
# The parameters from mynetworks onward disable all access
# control other than insisting on connections from one of the IP
# addresses of the host. This is typically overkill, but can
# reduce resource usage, if the default restrictions use lots of
# tables.
#
localhost:10026     inet  n      -      n      -       -      smtpd
    -o content_filter= 
    -o myhostname=localhost.domain.tld
    -o local_recipient_maps=
    -o virtual_maps=
    -o virtual_mailbox_maps=
    -o cleanup_service_name=cleanup2 
    -o mynetworks=127.0.0.0/8
    -o mynetworks_style=host
    -o smtpd_restriction_classes=
    -o smtpd_client_restrictions=
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
#
# This is the second cleanup daemon. No header or body checks.
# If preferable, enable rewrites in the first cleanup daemon, and
# disable them here.
#
cleanup2            unix  n      -      n      -       0      cleanup
    -o header_checks=
    -o mime_header_checks=
    -o nested_header_checks=
    -o body_checks=
#
# The normal "smtp" delivery agent for contrast with "scan".
# Definitely do not set "disable_dns_lookups = yes" here if you
# send mail to the Internet.
#
smtp                unix  -      -      n      -      -       smtp

This causes Postfix to add one extra content filtering record to
each incoming mail message, with content scan:localhost:10025.
You can use the same syntax as in the right-hand side of a Postfix
transport table.  The content filtering records are added by the
smtpd and pickup servers.

The "scan" transport is a dedicated instance of the "smtp" delivery
agent for injecting messages into the SMTP content filter. Using
a dedicated "smtp" transport allows one to tune it for the specific
task of delivering mail to a local content filter (low latency,
low concurrency, throughput dependent on predictably low latency).

See the previous example for setting up the content filter with
the Postfix spawn service; you can of course use any server that
can be run stand-alone outside the Postfix environment.
