$Id: README,v 1.8 2010/10/25 17:16:00 cm-msk Exp $

INTRODUCTION

This is the development area for OpenDKIM's statistics collection system.

The OpenDKIM filter has an option to collect statistics about DKIM's
operation, including the number of signatures, the percentage of signature
failures, first-party vs. third-party signatures, ADSP statistics, etc.,
and to record them in a file.  An additional tool called opendkim-stats
(in this directory) can be used to retrieve and display this information;
the file can also be mailed someplace for data aggregation, or inserted
into an SQL database using opendkim-importstats.

This is especially useful for collecting industry-wide statistics about
how DKIM is being used, the proliferation of third-party signing, etc.
You are encouraged, but certainly not required, to provide usage
information to The OpenDKIM Project.  The code that does the reporting is
open source and included in this package, so you can easily verify that
nothing private is revealed in the data you submit.

This directory contains information and software for generating, receiving
and processing such reports, including commands for creating a MySQL database
to store the reports for later query, and a program that can take reports
generated by the opendkim filter and put the data directly into that database.


INSTALLATION

This system can be made to work with any SQL-based system, but the provided
scripts and documentation presume MySQL.

1.	Compile OpenDKIM using the "--enable-stats" option:

	% ./configure --enable-stats
	% make

	If you wish to apply local extensions to the statistics reporting
	system, see the EXTENSIONS section below.

2.	Install MySQL.  Create credentials for an "opendkim" database user,
	optionally with a password.  Note that this does NOT refer to a "real"
	UNIX user and password.  Create a database called "opendkim", and grant
	the user you created SELECT and INSERT access to that database.

3.	From inside the MySQL client, source the "mkdb.mysql" script.
	This will create the tables required to store the statistics reports.

4.	Install the "opendkim-stats" program someplace.

5.	Configure OpenDKIM to enable statistics and record them to a file.
	See the opendkim(8) and opendkim.conf(5) man pages for details.

	By default, statistics are recorded in anonymized form.  If you wish
	to provide unencoded statistics, which improves the utility of the
	data at some expense of privacy, adjust your configuration
	accordingly.

6.	Restart opendkim.

7a.	To get a human-readable form of the recorded statistics, use:

		opendkim-stats /path/to/stats/file

	...using, of course, the path to the statistics file you configured
	in opendkim.conf.  You can reset the contents of that file at any
	time by simply removing it or copying /dev/null on top of it.  The
	filter will create the file, or append to it, when the next message
	is received.

7b.	To translate records in the recorded statistics file into SQL
	insert operations, use:

		opendkim-importstats /path/to/stats/file

	...again using the path you specified in opendkim.conf.  You will
	probably also need one or more of the following:

		-d dbname	database name (default "opendkim")
		-p dbpasswd	database user's password (no default)
		-s scheme	database scheme (default "mysql")
		-u dbuser	database user (default "opendkim")

	Add the "-r" option to remove the statistics file once the import
	completes.

	You can also use the provided opendkim-genstats script to generate
	useful reports from the accumulated data.  Contributions of other
	useful reports are welcome.

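7c.	To participate in The OpenDKIM Project's data collection work,
	ask for the submission address you should use, then create a script
	like the following and add it to your crontab:

		mv /path/to/stats/file /path/to/stats/file.OLD &&
		sendmail submission-address < /path/to/stats/file.OLD &&
		rm /path/to/stats/file.OLD

	Moving the file aside first lets the filter start a fresh one while
	the old data are mailed; chaining the commands with "&&" ensures the
	old file is removed only if sendmail reports success.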

	When your data are received, the OpenDKIM Project will use the
	aforementioned "opendkim-importstats" tool to import them into
	the accumulated database.  You can view regularly generated reports,
	including your data, at:

		http://www.opendkim.org/stats/report.html
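
If you automate step 7b (for example, from a cron job), a small wrapper
can assemble the opendkim-importstats command line.  The Python sketch
below is one way to do that; the function names are hypothetical, and it
uses only the -d, -p, -s, -u and -r options listed in step 7b.  It places
the options before the file argument in case the tool stops parsing options
at the first non-option argument.

```python
import subprocess
from typing import List, Optional


def build_importstats_argv(stats_path: str,
                           dbname: Optional[str] = None,
                           dbpasswd: Optional[str] = None,
                           scheme: Optional[str] = None,
                           dbuser: Optional[str] = None,
                           remove: bool = False) -> List[str]:
    """Assemble an opendkim-importstats command line.

    Options left as None are omitted so the tool's own defaults apply.
    """
    argv = ["opendkim-importstats"]
    for flag, value in (("-d", dbname), ("-p", dbpasswd),
                        ("-s", scheme), ("-u", dbuser)):
        if value is not None:
            argv += [flag, value]
    if remove:
        argv.append("-r")        # remove the stats file once imported
    argv.append(stats_path)      # file argument goes after all options
    return argv


def import_stats(stats_path: str, **options) -> None:
    # Raises subprocess.CalledProcessError if the tool exits non-zero.
    subprocess.run(build_importstats_argv(stats_path, **options), check=True)
```

For example, build_importstats_argv("/var/db/stats", dbuser="stats",
remove=True) produces the argument list for
"opendkim-importstats -u stats -r /var/db/stats".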

FILE FORMAT

The format of the file written by the opendkim filter is described here.
If there is demand, a stable API for accessing it will be provided.  Until
then, application developers are advised not to rely on this format
remaining stable between versions.

A statistics file consists of lines of ASCII data which are delimited from
each other by a single LF (ASCII 10).

Empty lines or lines beginning with other than alphabetic characters are
ignored completely.

A line in the file that begins with a capital letter identifies the type
of record it represents.  The first such line also implicitly ends the global
values section.

There are currently these record types:

	M	identifies a message
	S	identifies a signature
	X	identifies an extension data item (see EXTENSIONS below)

A message record is a tab-separated, ordered sequence of fields, as follows:

	MTA-provided job/envelope ID (string)
	reporter (string; defaults to hostname)
	first domain found in the From: header field (string)
	SMTP client IP address (string)
	anonymized (0 = no, 1 = yes)
	UNIX timestamp of message receive time
	message size, in bytes
	signature count
	ADSP record found in DNS (0 = no, 1 = yes)
	ADSP "unknown" policy found in DNS (0 = no, 1 = yes)
	ADSP "all" policy found in DNS (0 = no, 1 = yes)
	ADSP "discardable" policy found in DNS (0 = no, 1 = yes)
	ADSP failed for this message (0 = no, 1 = yes)
	appeared to come from a mailing list (0 = no, 1 = yes)
	count of Received: header fields
	overall MIME Content-Type, if any
	overall MIME Content-Transfer-Encoding, if any
	ATPS status (-1 = not checked, 0 = no, 1 = yes)
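
As an illustration of the message record layout, the Python sketch below
splits an "M" line into named fields.  The field names are hypothetical
labels chosen here, not names used by OpenDKIM.  The sketch assumes the
type letter may be followed either directly by the first field or by a
tab, and it rejects lines whose field count differs from the list above,
since the format may change between versions.

```python
from typing import Dict, List, Optional

# Hypothetical names, one per field in the list above, in order.
MESSAGE_FIELDS: List[str] = [
    "jobid", "reporter", "fromdomain", "ipaddr", "anonymized",
    "msgtime", "size", "sigcount", "adsp_found", "adsp_unknown",
    "adsp_all", "adsp_discardable", "adsp_fail", "mailing_list",
    "received_count", "content_type", "content_encoding", "atps",
]


def parse_message_record(line: str) -> Dict[str, Optional[str]]:
    """Split an "M" line into a dict keyed by MESSAGE_FIELDS."""
    if not line.startswith("M"):
        raise ValueError("not a message record")
    body = line.rstrip("\n")[1:]          # drop the record-type letter
    if body.startswith("\t"):             # tolerate a tab after the letter
        body = body[1:]
    values = body.split("\t")
    if len(values) != len(MESSAGE_FIELDS):
        raise ValueError(
            f"expected {len(MESSAGE_FIELDS)} fields, got {len(values)}")
    # "-" marks a field with no known or appropriate value.
    return {name: (None if value == "-" else value)
            for name, value in zip(MESSAGE_FIELDS, values)}
```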

A signature record implicitly references the preceding message record.
A message may have any number of signature records, including none.  As
above, a signature record is a tab-separated, ordered sequence of fields,
as follows:

	domain of the signature
	algorithm (0 = rsa-sha1, 1 = rsa-sha256)
	header canonicalization (0 = simple, 1 = relaxed)
	body canonicalization (0 = simple, 1 = relaxed)
	tagged for ignore (0 = no, 1 = yes)
	pass (0 = no, 1 = yes)
	failed due to "bh" mismatch (0 = no, 1 = yes)
	"l=" tag value (-1 = not present)
	"t=" present in key (0 = no, 1 = yes)
	"g=" present in key (0 = no, 1 = yes)
	"g=" present in key with a value other than "*" (0 = no, 1 = yes)
	key was DK-compatible (0 = no, 1 = yes)
	error code from signature
	"t=" present in signature (0 = no, 1 = yes)
	"x=" present in signature (0 = no, 1 = yes)
	"z=" present in signature (0 = no, 1 = yes)
	DNSSEC value (see DKIM_DNSSEC_* constants from dkim.h)
	colon-separated list of header fields signed
	colon-separated list of header fields known to have changed
	"i=" domain (0 = absent, 1 = matches From:, 2 = other)
	"i=" user (0 = absent, 1 = empty, 2 = matches From:, 3 = other)
	"s=" present in key (0 = absent, 1 = "*", 2 = "email", 3 = other)
	key size, in bits

Fields for which no value is known or appropriate should be represented as
"-" in the file.
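
The rules above (signature records attach to the preceding message record,
"-" means no value, and empty or non-alphabetic lines are ignored) could be
sketched in Python as follows.  The Message class and the field names are
hypothetical; the sketch assumes the type letter may be followed directly
by the first field or by a tab, and it simply skips lower-case global-value
lines and any other record types.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Hypothetical names for the signature fields, in the order listed above.
SIGNATURE_FIELDS: List[str] = [
    "domain", "algorithm", "hdr_canon", "body_canon", "ignore", "pass",
    "fail_body", "siglength", "key_t", "key_g", "key_g_name",
    "key_dk_compat", "sigerror", "sig_t", "sig_x", "sig_z", "dnssec",
    "signed_fields", "changed_fields", "sig_i_domain", "sig_i_user",
    "key_s", "key_bits",
]


@dataclass
class Message:
    values: List[Optional[str]]          # raw "M" fields, in file order
    signatures: List[Dict[str, Optional[str]]] = field(default_factory=list)


def _split_fields(line: str) -> List[Optional[str]]:
    # Drop the type letter (and a following tab, if any); map "-" to None.
    body = line[1:]
    if body.startswith("\t"):
        body = body[1:]
    return [None if value == "-" else value for value in body.split("\t")]


def group_records(lines: Iterable[str]) -> List[Message]:
    """Attach each "S" record to the "M" record preceding it."""
    messages: List[Message] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or not line[0].isalpha():
            continue                     # empty/non-alphabetic: ignored
        if line[0] == "M":
            messages.append(Message(_split_fields(line)))
        elif line[0] == "S" and messages:
            sig = dict(zip(SIGNATURE_FIELDS, _split_fields(line)))
            messages[-1].signatures.append(sig)
    return messages
```

Passing an open statistics file to group_records() yields one Message per
"M" record, each carrying its signature records.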


EXTENSIONS

To allow local experimentation, the statistics reporting can be extended
to include data about a message that is not covered by the basic schema
distributed with OpenDKIM.

Extension data are collected and stored during execution of the Lua "final"
script and then written to the statistics file along with the rest of that
message's data.  These items are written on "X" lines in the form
"name value" (i.e., name and value separated by whitespace).
opendkim-importstats will insert these into your database, using "name" as
the name of the column to be updated and "value" as the value to be placed
there.
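
A minimal Python sketch of reading an "X" line follows.  It assumes, as
with the other record types, that the "X" letter begins the line.  The
check that the name is a simple identifier is a safety measure added in
this sketch (column names cannot be bound as SQL parameters), not
something documented for opendkim-importstats.

```python
import re
from typing import Tuple

# Accept only plain identifiers, since the name becomes an SQL column name.
_COLUMN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def parse_extension_line(line: str) -> Tuple[str, str]:
    """Split an "X" line into (name, value).

    The value keeps any internal whitespace; leading and trailing
    whitespace (including the newline) is stripped.
    """
    if not line.startswith("X"):
        raise ValueError("not an extension record")
    parts = line[1:].strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected 'name value'")
    name, value = parts
    if not _COLUMN_RE.match(name):
        raise ValueError(f"unsafe column name: {name!r}")
    return name, value
```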

Support for this is enabled by passing the "--enable-statsext" flag to
configure at build time.  The mechanism for recording these extra
statistics is the odkim.stats() function, documented in the
opendkim-lua(3) man page.

In this way, you can create whatever supplementary message-specific columns
you need and populate each of them in whatever way is appropriate.

Participants who find that particular additional columns reveal interesting
correlations are encouraged to share them with the OpenDKIM community.


CONVERTING TO THE IPADDRS SCHEMA

In version 2.3.0 of OpenDKIM, the schema changed to add a new "ipaddrs"
table, moving the client IP address out of each row of the "messages" table
(where the same address was duplicated across many rows) into a table of
its own.  The opendkim-importstats tool in 2.3.0 expects the new schema and
is not backward-compatible with prior releases.

To convert your existing databases, use these steps:

1) Run the stats/mkdb.mysql script from inside your MySQL client to create
   the required new table.  Existing tables will not be altered.

2) Run the following additional MySQL commands (the letter labels are not
   part of the commands; lines are wrapped here for readability, but each
   lettered item is a single command):

	a) LOCK TABLES messages WRITE, ipaddrs WRITE;
	b) ALTER TABLE messages ADD COLUMN ip INT UNSIGNED AFTER ipaddr;
	c) INSERT INTO ipaddrs (addr, firstseen) 
		SELECT DISTINCT ipaddr, MIN(msgtime) FROM messages
		GROUP BY ipaddr;
	d) UPDATE messages
		SET ip = (SELECT id FROM ipaddrs WHERE addr = messages.ipaddr);
	e) ALTER TABLE messages MODIFY COLUMN ip INT UNSIGNED NOT NULL,
		DROP COLUMN ipaddr,
		ADD CONSTRAINT FOREIGN KEY(ip) REFERENCES ipaddrs(id)
		ON DELETE CASCADE;
	f) UNLOCK TABLES;

   The ALTER TABLE commands may cause MySQL to rebuild the "messages"
   table, which can take a long time if the table is large.
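
To illustrate what steps c) and d) accomplish, the Python sketch below runs
equivalent statements against an in-memory SQLite database rather than
MySQL.  The tables are hypothetical, cut-down stand-ins containing only the
columns these steps touch; the locking and ALTER TABLE steps are omitted
because SQLite handles them differently, and the DISTINCT from step c) is
dropped because GROUP BY already yields one row per address.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical, cut-down messages/ipaddrs tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (
            id      INTEGER PRIMARY KEY,
            ipaddr  TEXT,      -- old per-row address column
            msgtime INTEGER,
            ip      INTEGER    -- new reference column (as added in step b)
        );
        CREATE TABLE ipaddrs (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            addr      TEXT,
            firstseen INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO messages (ipaddr, msgtime) VALUES (?, ?)",
        [("192.0.2.1", 300), ("192.0.2.1", 100), ("198.51.100.7", 200)])
    return conn


def move_ipaddrs(conn: sqlite3.Connection) -> None:
    # Step c: one ipaddrs row per address, stamped with its earliest msgtime.
    conn.execute("""
        INSERT INTO ipaddrs (addr, firstseen)
        SELECT ipaddr, MIN(msgtime) FROM messages GROUP BY ipaddr
    """)
    # Step d: point each message at the id of its address's new row.
    conn.execute("""
        UPDATE messages
        SET ip = (SELECT id FROM ipaddrs WHERE addr = messages.ipaddr)
    """)
    conn.commit()


conn = build_demo_db()
move_ipaddrs(conn)
for row in conn.execute(
        "SELECT m.id, a.addr, a.firstseen FROM messages m"
        " JOIN ipaddrs a ON a.id = m.ip ORDER BY m.id"):
    print(row)
```

Both messages from 192.0.2.1 end up referencing the same ipaddrs row, whose
firstseen is the earlier of their two msgtime values.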
