/**
 * \file SCRIPT-readme
 *
 * see script_lua.c
 * see fetchnews.c::getarticle()
 * see store.c::store_stream()
 * see nntpd.c all over
 *
 * for testing the fetchnews support,
 * see script-test-store.c
 * see Makefile-script-test
 *
 * clemens fischer <ino-news@spotteswoode.dnsalias.org>
 */

= INTRODUCTION =

The scripting-extension:

- doesn't change any of the existing facilities
- caches header and body for analysis
- can replace/change header in Lua hooks
- can replace/change the cached part of the body in Lua hooks
- can reject groups or articles based on hook outcome
- can redirect incoming articles to e.g. "local.spam" based on hook outcome
- defers disk I/O until filters/hooks approve the article
- could run PCRE patterns on cached body (#ifdef'ed, incomplete)

This means, for example, that you can do keyword-searches on all
incoming articles and file away all matching articles in a local archive
group.  Of course you can also archive every article mentioning your
host or your name.  Thus you can participate in hundreds of newsgroups,
but you need only watch one single newsgroup for complete threads with
all the followups to your articles!  Since this is implemented as part
of a leafnode extension, it works with every client.  As the
implementation is scripted in your own Lua hooks, you can customize it
to your heart's desire; you can EVEN DO WITHOUT it!

and last but not least:

- people can add their own backends and use "./script-redirects.c" for this

= WHAT YOU CAN DO =

If you can make do with filtering headers, nothing will change.

Scripting is for you, if you:

- want to combine matches on several headers or the body
- want to combine matches with earlier results
- need to connect a bayes filter to keep down maintenance
- want to redirect suspect articles for later inspection
- need to connect to other services for the filtering decision
- want to edit headers or part of the body
- want to archive articles or entire threads based on keywords

One example:  I lurk in a dozen different technical newsgroups, and
on occasion on "kooky" newsgroups like NANAE.  I spend at most an
hour reading until I get bored and do something else, so at times
people answer a post of mine, and the reply could go unnoticed long
enough for the thread to trickle away.  Using the "KeywordSearches" table
implemented in "ln2_distmod.lua" from the distribution, I can have
fetchnews automatically crosspost every followup mentioning my hostname
anywhere in the headers into a local.archive newsgroup.  So when I get
to newsreading, I check this group and see what's been going on in the
meantime, without the need to remember and check every newsgroup and
post:

KeywordSearches = {
    ["."] = {
        {
            (arc_plus_threaded .. archive_ngs_prefix .. ".default"),
            { search_header, {"%pspotteswoode%p", hostname}},
            { search_body, {"%pspotteswoode%p", hostname, "clemens fischer"} },
        },
    },
    ["comp.arch.embedded"] = {
        {
            (arc_plus_threaded .. archive_ngs_prefix .. ".tronix"),
            { search_header, {"%pspotteswoode%p", hostname}},
            { search_body, {"%pspotteswoode%p", hostname, "clemens fischer"} },
        },
        {
            (arc_plus_threaded .. archive_ngs_prefix .. ".tronix"),
            { search_body, {"\n[^>]*LPC.?2148"} },
        },
    },
    ...
}

"arc_plus_threaded" in this context means: whenever fetchnews finds an
incoming article matching the search criteria, put it into the spool
as usual, crosspost to the local archiving group mentioned and store
its Message-ID someplace.  Thereafter, if this Message-ID is seen in a
References: header, even if not containing the catch-phrase, archive
the article as well.  This way I get the thread starting from the first
match.  Note the entry marked ["."], which matches every group: it
will do this magic to threads just mentioning me for every group I am
currently subscribed to if no more specific entry exists.  All matches
go to "local.archive.default".

The emphasis of the extension is sophisticated filtering in the storage
phase.  Thus only fetchnews is hooked up to the framework, not texpire
or leafnode, which might be added later depending on user interest.

= IS IT REALLY HARD TO WRITE ONE'S OWN SCRIPTS? =

Yes, but not on the part of this implementation.  If you want to use
Lua, you will be using a language looking like a combination of BASIC
and Scheme.  It is BASIC, because it is really simple.  It is Scheme,
because it is functional and all values are "first class" in the sense
that they can be named, stored and used everywhere.  You can use
closures and make objects, too, but you don't write as many parentheses
as in Lisp.  In fact, if your function call takes a single string or
table literal as its sole argument, you can omit them entirely.
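
To give a first impression, here is a tiny hedged sketch of the
language features just mentioned; none of it is leafnode specific:

  -- make_counter returns a closure keeping its own state:
  function make_counter()
      local n = 0
      return function()
          n = n + 1
          return n
      end
  end

  count = make_counter()
  print(count())   -- 1
  print(count())   -- 2

  -- functions are first class and can live in tables:
  hooks = { lower = function(s) return s:lower() end }

  -- parentheses may be omitted when the sole argument is a string
  -- or table literal:
  print "hello"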

The Lua implementation of scripting makes use of "protected calls" as
much as possible.  This means that you cannot crash a program, instead
you will receive very informative error messages citing precisely the
line and the type of an error.  If Lua finds one, your program will
always continue to run after issuing the error message, so your feature
won't work, but everything else will.

= HOWTO SET UP THE LEAFNODE PROGRAMS FOR SCRIPTING =

Configure leafnode2 for scripting by using

    ./configure --enable-lua

For actually using Lua, you need:

* the file containing lua code in /etc/leafnode/scripthooks.lua
* the environment variable LUA_PATH="/etc/leafnode/?.lua;;"

for every invocation of lua-enabled leafnode2 programs.  Currently,
there is support for "special delivery" of articles in fetchnews(8) and
for per group authorization of users in leafnode(8).

As leafnode's standard logging function is accessible from Lua, you don't
have to blindly run your scripts in the hope they work the first time.

Leafnode(8) features are especially easy to test, because as a server it
can be run interactively from the console.  You can completely isolate
it from any running instances by using non-default configuration and
script files, like:

leafnode -vvv -e -l /etc/leafnode/script-groupauth.lua \
         -F /etc/leafnode/config-authtest

The file config-authtest is a regular leafnode configuration file.  The
group-auth sample implementation "script-groupauth.lua" is contained in
the distribution; you can use it directly or supply your own.

This README, "script-groupauth.lua" and "scripthooks.lua.dist" have
details on how to do that.

Scripting in and of itself does nothing for you:  it provides the
possibility to enhance the leafnode programs.  On the other hand, it
does not take anything away.  You can even enable lua tentatively
without having a script file at all, or without specifying all the
function callouts:  the program will log a warning and continue as if
you had a plain leafnode setup.  The same goes for any errors in your
script, except that there will be more warnings telling you about lua
complaints and error locations.

= TESTING A LEAFNODE SCRIPT =

The group-auth sample implementation has support for interactive
testing.  You don't have to run leafnode(8) to check your configuration
table.  When running "script-groupauth.lua" by loading it standalone
into the Lua interpreter[1], it checks for the presence of "ln_log", the
standard leafnode logging function.  It is defined as a global function
when the script is run from a Lua-enabled leafnode(8), and it is
undefined when run standalone.  The script will then proceed to install
a substitute and set up the other variables normally provided by
leafnode.  The loglevel is set to debugging, so it will include details
about pattern matching.  You can use all the Lua facilities built into
the interpreter, register users as if they had authenticated in
a newsreader and call script functions exactly like leafnode(8) would
do.  Here is a sample session:

"cd" into the directory where your configuration is located, type "lua"
(return) and do:

  $ lua
  Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
  > c,e=loadfile("script-groupauth.lua"); print(tostring(e))
  nil
  > c()
  CRIT-ALL_STATIONS: test_logging: display three times
  CRIT-ALL_STATIONS: test_logging: display three times
  CRIT-ALL_STATIONS: test_logging: display three times
  > print_rules()
  AUTHENTICATED_USER=>_unsatisfied_<
  [1] = { c=>mode<, u=>.<, g=>.< }
  [2] = { c=>.<, u=>.<, g=>_frail_microwale< }
  [3] = { c=>.<, u=>!chillun_anny|chillun_sammy|test|_unsatisfied_|popl<, g=>.< }
  [4] = { c=>.<, u=>chillun_anny|chillun_sammy|test|_unsatisfied_<, g=>!%prockets|^alt%.|comp%p< }
  [5] = { c=>.<, u=>popl<, g=>!^gmane%.< }
  > = leafnode_init()
  INFO-ALL: leafnode_init()
  > = leafnode_enter_group("mode", "nowhere")
  DEBUG-GROUP: leafnode_enter_group (on entry): cmd=mode, group=nowhere, user=_unsatisfied_, clean_group=nowhere
  DEBUG-GROUP: leafnode_enter_group: check group_auth[1] c='mode' u='.' g='.'
  DEBUG-GROUP: genMatchN: compare alternative: >mode< against: >mode< => mode
  DEBUG-GROUP: genMatchN: compare alternative: >.< against: >_unsatisfied_< => _
  DEBUG-GROUP: genMatchN: compare alternative: >.< against: >nowhere< => n
  INFO-GROUP: leafnode_enter_group: pass group_auth[1], cmd=mode user=_unsatisfied_ group=nowhere
  => outcome is SCRIPT_NOERROR
  > = reg_user("popl")
  popl
  > = leafnode_enter_group("group", "gmane.test")
  DEBUG-GROUP: leafnode_enter_group (on entry): cmd=group, group=gmane.test, user=popl, clean_group=gmane.test
  DEBUG-GROUP: leafnode_enter_group: check group_auth[1] c='mode' u='.' g='.'
  DEBUG-GROUP: genMatchN: compare alternative: >mode< against: >group< => no match
  DEBUG-GROUP: leafnode_enter_group: check group_auth[2] c='.' u='.' g='_frail_microwale'
  DEBUG-GROUP: genMatchN: compare alternative: >.< against: >group< => g
  DEBUG-GROUP: genMatchN: compare alternative: >.< against: >popl< => p
  DEBUG-GROUP: genMatchN: compare alternative: >_frail_microwale< against: >gmane.test< => no match
  DEBUG-GROUP: leafnode_enter_group: check group_auth[3] c='.' u='!chillun_anny|chillun_sammy|test|_unsatisfied_|popl' g='.'
  DEBUG-GROUP: genMatchN: compare alternative: >.< against: >group< => g
  DEBUG-GROUP: genMatchN: compare alternative: >chillun_anny< against: >popl< => no match
  DEBUG-GROUP: genMatchN: compare alternative: >chillun_sammy< against: >popl< => no match
  DEBUG-GROUP: genMatchN: compare alternative: >test< against: >popl< => no match
  DEBUG-GROUP: genMatchN: compare alternative: >_unsatisfied_< against: >popl< => no match
  DEBUG-GROUP: genMatchN: compare alternative: >popl< against: >popl< => popl
  DEBUG-GROUP: leafnode_enter_group: check group_auth[4] c='.' u='chillun_anny|chillun_sammy|test|_unsatisfied_' g='!%prockets|^alt%.|comp%p'
  DEBUG-GROUP: genMatchN: compare alternative: >.< against: >group< => g
  DEBUG-GROUP: genMatchN: compare alternative: >chillun_anny< against: >popl< => no match
  DEBUG-GROUP: genMatchN: compare alternative: >chillun_sammy< against: >popl< => no match
  DEBUG-GROUP: genMatchN: compare alternative: >test< against: >popl< => no match
  DEBUG-GROUP: genMatchN: compare alternative: >_unsatisfied_< against: >popl< => no match
  DEBUG-GROUP: leafnode_enter_group: check group_auth[5] c='.' u='popl' g='!^gmane%.'
  DEBUG-GROUP: genMatchN: compare alternative: >popl< against: >popl< => popl
  DEBUG-GROUP: genMatchN: compare alternative: >^gmane%.< against: >gmane.test< => gmane.
  INFO-GROUP: leafnode_enter_group: deny group_auth[0], cmd=group user=popl group=gmane.test
  => outcome is SCRIPT_IGNORE_GROUP

You can change and extend the table/array group_auth[] interactively and
rerun "leafnode_enter_group" until the outcome fits your needs.  The
following configuration items and functions are implemented:

-- loglevel = LNLOG_SWARNING

loglevel: determines how much detail is printed.  LNLOG_SDEBUG prints
everything, LNLOG_SWARNING only warnings and worse.

-- check_conf_init = false
-- check_conf_init = true

Set if you want the configuration checked at leafnode_init() time.  If
false, the configuration is checked on every access.  This slows down
operation, but might let you change permissions dynamically.

-- function leafnode_init()

Called by leafnode(8) when initializing scripting.  It doesn't do much
except for checking the configuration and replacing non-string entries
with inert dummies.

-- function leafnode_enter_group(command, groupname)

Called by leafnode(8) whenever the connecting client issues any group
related NNTP command.  The function sees the leafnode internal command
and the actual name of the group.  The command is compared against the
configuration table, as is the groupname.  Authentication is NOT handled
here; instead, the global variable AUTHENTICATED_USER becomes valid after
authentication has completed.
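
To illustrate, here is a hedged sketch of such a table; each rule is a
{ command, user, group } triple of Lua patterns (the user names here
are made up), in the format also accepted by ins_rule() below:

  group_auth = {
      { "mode", ".", "." },          -- MODE commands always pass
      { ".", "anny|popl", "." },     -- these users may enter any group
      { ".", ".", "!^internal%." },  -- everybody else: no internal.* groups
  }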

Besides the items mentioned above, the (crude) interactive "debugging
facility" defines:

-- function print_rules(tab)

When called without an argument, the central table "group_auth[]" is
printed, this is how it would normally be used:

  > print_rules()
  AUTHENTICATED_USER=>_unsatisfied_<
  [1] = { c=>mode<, u=>.<, g=>.< }
  [2] = { c=>.<, u=>.<, g=>_frail_microwale< }
  [3] = { c=>.<, u=>!anny|sammy|test|_unsatisfied_|popl<, g=>.< }
  [4] = { c=>.<, u=>anny|sammy|test|_unsatisfied_<, g=>!%prockets|^alt%.|comp%p< }
  [5] = { c=>.<, u=>popl<, g=>!^gmane%.< }

-- function ins_rule(n, rule)

Use this function to insert or delete rules from "group_auth[]".  The
first argument must be a rule number starting at one.  If the second
argument is not given, rule [n] is deleted.  If you want to append
a rule after the last one, give a very large rule number, which is
interpreted as the rule after the last one.  The resulting table is
printed and you are asked for confirmation.  An answer starting with "y"
replaces group_auth with the temporary edit copy.

  > ins_rule(1)
  AUTHENTICATED_USER=>popl<
  [1] = { c=>.<, u=>.<, g=>_frail_microwale< }
  [2] = { c=>.<, u=>!anny|sammy|test|_unsatisfied_|popl<, g=>.< }
  [3] = { c=>.<, u=>anny|sammy|test|_unsatisfied_<, g=>!%prockets|^alt%.|comp%p< }
  [4] = { c=>.<, u=>popl<, g=>!^gmane%.< }
  ins_rule: is this ok? [yn]
  y
  AUTHENTICATED_USER=>popl<
  [1] = { c=>.<, u=>.<, g=>_frail_microwale< }
  [2] = { c=>.<, u=>!anny|sammy|test|_unsatisfied_|popl<, g=>.< }
  [3] = { c=>.<, u=>anny|sammy|test|_unsatisfied_<, g=>!%prockets|^alt%.|comp%p< }
  [4] = { c=>.<, u=>popl<, g=>!^gmane%.< }
  > ins_rule(1, {"mode", "testuser", "."})
  AUTHENTICATED_USER=>popl<
  [1] = { c=>mode<, u=>testuser<, g=>.< }
  [2] = { c=>.<, u=>.<, g=>_frail_microwale< }
  [3] = { c=>.<, u=>!anny|sammy|test|_unsatisfied_|popl<, g=>.< }
  [4] = { c=>.<, u=>anny|sammy|test|_unsatisfied_<, g=>!%prockets|^alt%.|comp%p< }
  [5] = { c=>.<, u=>popl<, g=>!^gmane%.< }
  ins_rule: is this ok? [yn]
  y
  AUTHENTICATED_USER=>popl<
  [1] = { c=>mode<, u=>testuser<, g=>.< }
  [2] = { c=>.<, u=>.<, g=>_frail_microwale< }
  [3] = { c=>.<, u=>!anny|sammy|test|_unsatisfied_|popl<, g=>.< }
  [4] = { c=>.<, u=>anny|sammy|test|_unsatisfied_<, g=>!%prockets|^alt%.|comp%p< }
  [5] = { c=>.<, u=>popl<, g=>!^gmane%.< }

Ok, if you need anything else, just ask your friendly maintainer.


[1] Possibly called "lua" or "lua-5.1.3" or the like in your
    distribution.

= A BIT OF HISTORY =

When I started out with the scripting extension, I was in desperate
need of filtering the group "sci.crypt", which was then under attack
by vandalism.  Though I could not find an explanation of the technique
used, one could see that the articles injected there by the thousands
were real-looking articles consisting of words from genuine material,
probably produced by a Markov chainer.  The articles came from a variety
of free and pay USENET providers and the headers were complete fakes
attributed to regular contributors.  The only property common to all
of them was that they were all thread-starters, i.e. even if they had
"References:" headers, they didn't match up to any existing thread.
Unfortunately, this is also true for the products of undisciplined
authors or users of certain USENET software.

So the only thing left to try was the endless sequence of opening
articles, reading the first few sentences and keeping or losing them
depending on whether they made sense or not.

I wanted a way to do this automatically, trying at least a bayes
filter or some other statistical means, but leafnode2 at that time
had only header matching.  Then I found an article by Matthias Andree
in "gmane.network.leafnode" talking about ideas to improve leafnode2,
probably by adding scripting.  In private emails he mentioned Lua,
which I have been hooked on for quite some time, so I offered help.
After a few weeks of doing nothing and then some of writing and
rewriting code, the C side of things was ready for testing.  Now, a few
months later, I have an implementation of decent quality in permanent
testing mode.  It does not crash fetchnews and has made reading USENET a
friendlier experience, because there is rarely spam and no HTML anymore.
Plus, should I need new features, I don't have to bug anyone, instead, I
write some lines of Lua and put them into a table.

= WHAT I DO =

USENET carries a wide variety of opinions and facts, all in text.  With
so many different writing styles, a simple error in my filters can
drop a lot of good stuff I'd never know existed, or not work at all.
So I want at least one pseudo newsgroup getting all the spam, so I
can look at it later if I see that articles are missing.  I want this
newsgroup to have a very short expiry time.  If I find posts that can be
classified "content-free", I don't want them to waste disk space.  To
be able to develop the filtering criteria, I need to keep statistics,
either by logging them with leafnode's facilities or by keeping extra
data files.

Just now a fetchnews run ("/l/sbin/fetchnews+lua -e -D 705 -vvv")
finished:

...
2008-08-24_15:08:08.93829 wrote groupinfo with 67949 lines.
2008-08-24_15:08:08.93835 fetchnews: 3522 articles and 0 headers fetched, 1213 killed, 0 posted, in 269 seconds
2008-08-24_15:08:08.98373 fetchnews_finish(0): total=4129, spam=1050/ham=3079, ucase/lcase=0.115, cite/orig=0.667

This is the end of the log. In addition to the regular fetchnews log
messages it shows the output of the last hook invoked by the scripting:
Of 4129 messages fetched, 1050 were classified as spam by a sequence of
routines implemented in Lua. A current pet project of mine is trying to
find out whether the ratio of upper case to lower case letters in an
article (ucase/lcase) correlates with its perceived "spaminess". There
are spammers sending only upper case messages, and then some people
shout a lot. The other number is the ratio between cited and original
lines (cite/orig), where citations are defined as anything consisting
only of sequences of colons or right pointed parentheses with optional
spaces in between, after a newline ("\n([:>] ?){11,}" in PCRE parlance).
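
A hedged Lua sketch of the same idea (not the author's actual code, and
using the simplification that a cited line starts with colons or ">"
characters):

  function cite_ratio(body)
      local cited, orig = 0, 0
      for line in body:gmatch("[^\n]+") do
          if line:match("^%s*[:>]") then
              cited = cited + 1
          elseif line:match("%S") then
              orig = orig + 1
          end
      end
      if orig == 0 then return 0 end
      return cited / orig
  end

  -- two cited lines, one original line: ratio 2
  print(cite_ratio("> quoted\n> more quoted\noriginal text\n"))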

In my setup, articles scoring really low are rejected without saving,
some are saved to "local.reject", some to "local.spam", the rest into
their respective newsgroups.  Every article has extra headers showing
which of the filters it went through and which added to its score.

You can customize all that in very fine detail.  You can send some
information to the logs, add to or remove from headers or even the body.
In the C-part of Lua-scripting, a test is made if the scripting engine
could be initialized, and instead of logging errors on every article,
the engine is switched off entirely and fetchnews runs without it.  You
can do the same in your code.  If certain groups cause a lot of useless
grief and logging, you might reduce the log level, for example.

In my example implementation described later, I have very simple hook
routines, which basically store article headers and especially the
current group's name for easy access in other stages of the machinery.
Then I have a number of tables indexed by regular expressions to be
matched against newsgroup names. These tables are searched by a central
selection function called "select_match", which can return either the
most specific, i.e. longest matching entry, or all matching entries
sorted by relevance. Then I have the other central function called
"run_hooks", which loops over the matches, calling the routines
specified for some group and tallying the scores they return. Despite
this simplicity, my personal "scripthooks.lua" is over 1000 lines of
Lua, but it does what I want quite well, and customizing some group or
group-of-groups is easy using the tables.

When you start using the Lua backend, you might want to keep it much
simpler, so as to avoid the complexity and the learning curve.  Since all
the scripting comes as code of a real programming language, you can
"code as you go", collecting only what's necessary for your immediate
needs.

The simplest callout implementation doing something useful might be:

  function fetchnews_headertxt_bodytxt(header_string, body_string)
      local iam = "fetchnews_headertxt_bodytxt"
      local logmess = iam .. ": "
      local decision
      if body_string:find("cash loanes") then
          local newsgroups = spam_ngs    -- e.g. "local.spam", defined elsewhere
          logmess = logmess .. "cash loanes go to " .. newsgroups
          ln_log(LNLOG_SINFO, LNLOG_CARTICLE, logmess)
          decision = {
              result = SCRIPT_NOERROR,
              newsgroups = newsgroups
          }
      else
          decision = {
              result = SCRIPT_NOERROR
          }
      end
      return decision
  end

In fact, you do not have to define any callouts at all.  If they don't
exist or have any type other than "function", nothing happens.

= How it works =

Terminology: The term "hook" is used loosely in the following, but it
is important to understand what it means in a given context.  Fetchnews
has been fitted with library calls into "liblua.so", which implements
both the bytecode compiler and the runtime of the Lua language.  These
library calls are surrounded by little code stubs of a dozen or so
lines of C embedded into the fetchnews code, which do the checking of
parameters and return results.  These stubs are what I call "hooks".
They normally check for the existence and type of certain Lua values
named by variables I have named "callouts" later on.  The callouts lead
into your Lua code.

All the Lua text is collected into a single file, which must be named
"/etc/leafnode/scripthooks.lua", because this file is looked up and run
by the initialization code in fetchnews.  If this file doesn't exist,
no problem, fetchnews runs as before.  If the file exists, but isn't
compilable, no problem either.  But if the file contains compilable
code which doesn't make sense when invoked by the callouts, you will
see many error messages in the logs, some of which you can't do
anything about.  Note that Lua is a dynamic language, where variables
are handles on Lua values, which are the only entities having "types".
Thus you can have an undefined variable at one time or one with an
inappropriate type, a situation changed by later code.

My example implementation divides the operation of the predefined
callouts into smaller chunks, which I also called "hooks",
unfortunately.  This is not entirely wrong, as all the Lua code starts
out from the C-hooks in fetchnews.  Since you will be staying in
Lua-land with your own code, these fine points may not mean anything to
you, but they are necessary for understanding the inner workings,
especially if you want to add, say, a Perl backend.

= hooks in fetchnews =

At the time of writing this - August 2008 - fetchnews has a set of hooks
influencing the storage of articles.

This is the list of hooks implemented in both fetchnews.c and "store.c".
These hooks are meant to be general in providing any script backend with
the necessary information, but there is no direct relation to the user
visible hooks. For example, the Lua backend uses script_fn_add_header()
to add to a table accumulating unfolded article headers, but there
is no Lua function called on this occasion. The same goes for
script_fn_add_body(), which is not even used for Lua, but it could be
the appropriate place for other backends in case their strings need any
conditioning. The other script_fn_* hooks have user callouts except for
script_fn_finish(), which is used to shut down Lua. Needless to say, any
backend can choose to implement callouts or replace some hook with a
dummy. The Lua backend checks for the existence of every callout before
engaging it. It is considered not to be an error if the callout doesn't
exist or is something other than a function, because users might choose
to implement only the subset of hooks they need for their filtering
needs, but you should not rely on these semantics as other scripting
backends might not have this convenience feature. The names for callouts
and their descriptions specified in this document are the ones from the
Lua backend. See the example implementation called "scripthooks.lua" for
their use and meaning.

== script_fn_init() ==

callout name: "fetchnews_init" (Lua)
arguments: none

The "*init" function checks for the availability of a user supplied
hook file - defaulting to "/etc/leafnode/scripthooks.lua" in the Lua
backend - and initializes its environment. It also sets up a number of
global constants such as access to leafnode's logging facility, its
constants for severity and context, and leafnode's "age()" function,
which takes a "Date:" header and computes the age of an article in
days. If this step fails for any reason, its cause is logged and all
scripting functions are disabled. Should this happen, all of fetchnews'
operation remains functional with the exception of scripting. Users
might set up their overall environment here, such as input/output for
external programs.
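
A hedged sketch of what a matching callout could look like; the stats
table and spam_ngs are made-up example variables, and the logging
context chosen here is arbitrary:

  function fetchnews_init()
      -- state shared by the other callouts
      spam_ngs = "local.spam"
      stats = { total = 0, spam = 0 }
      ln_log(LNLOG_SINFO, LNLOG_CARTICLE, "fetchnews_init: scripting active")
  end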

== script_fn_finish(rc) ==

callout name: "fetchnews_finish" (Lua)
arguments: integer number (recent fetchnews status)

The "*finish" function is called near the end of fetchnews. Its callout
should be scripted very carefully, because it has to deallocate the
scripting structures without disrupting fetchnews finalization. It could
be used for persistent statistical results.
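
A hedged sketch producing a summary line similar to the
fetchnews_finish(0) line in the log sample shown earlier; the stats
table is assumed to have been filled by the other callouts:

  function fetchnews_finish(rc)
      ln_log(LNLOG_SINFO, LNLOG_CARTICLE,
             string.format("fetchnews_finish(%d): total=%d, spam=%d/ham=%d",
                           rc, stats.total, stats.spam,
                           stats.total - stats.spam))
  end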

== script_fn_init_group(group_name) ==

callout name: "fetchnews_init_group" (Lua)
arguments: string containing the name of the group about to be entered.

The "*init_group" function gets as its sole argument the name of the
currently "downloaded" group as a string value. This is meant to set up
group-specific filtering.

== script_fn_finish_group(status) ==

callout name: "fetchnews_finish_group" (Lua)
arguments: integer number

"*finish_group" receives the most current condition code from fetchnews'
central group-handling function as a number. This is normally the number
of the last article in the group, but can also be some error number. For
the precise meaning, you would have to either check fetchnews.c or bug
the script maintainer.

== script_fn_init_article() ==

callout name: "fetchnews_init_article" (Lua)
arguments: none

"*init_article" does not get to see any arguments, but its callout can
be used to initialize any article specific processing.

== script_fn_finish_article(rc) ==

callout name: "fetchnews_finish_article" (Lua)
arguments: integer number

"*finish_article" is called near the end of article processing.  Its
argument is the most current status code from "store.c".  Unless
anything goes wrong or your filters reject the article, it will always
be zero.  Filter rejects result in a status of one; anything else
is strange, but still useful if you keep statistics or want to know the
exact fate of the current article.  A zero should guarantee that the
article has been stored and indexed as a file on disk, as far as the
C code can verify this.
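
A hedged sketch keeping per-run statistics; the stats table is a
made-up example assumed to have been created in an earlier callout:

  function fetchnews_finish_article(rc)
      stats.total = stats.total + 1
      if rc == 1 then
          -- filter reject, per the description above
          stats.spam = stats.spam + 1
      end
  end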

== script_fn_add_header(header) ==

callout name: none (Lua)
arguments: the last header read from the network (internal only, Lua)

This function is used on every article header within the scripting glue
code. Backend implementations might choose to provide a user callout.

== script_fn_add_body(body) ==

callout name: none (Lua)
arguments: one string comprising the cached part of an article's body
           (internal only, Lua)

The scripting machinery cannot always cache the entire body of an
article, because articles may be larger than is feasible to keep in
memory. Script backends may choose to treat the body differently
than the header and to provide a callout to users. Lua implements a
dummy function, but no callout, because the same purpose can equally
well be achieved by function script_fn_filter_header_body, callout
"fetchnews_headertxt_bodytxt", which receives the same data.

== script_fn_filter_header_table() ==

callout name: "fetchnews_headertable" (Lua)
arguments: array of (unfolded) headers (Lua)

"fetchnews_headertable" gets a Lua table of all the article's headers
in the form of an array. Note that Lua can have any data type as either
index or value of a table, but there are some builtin functions for the
case of integer indexed tables, turning them into regular arrays. The
glue code in fetchnews' "store.c" accumulates each header into one long
string and adds it to such an array. This way you can be sure to get
all the original headers except for the upstream's "Xref:" header if
traversing the table with indices starting from one up to the table's
size, each one including all its continuation lines.  As this format is
very convenient for setting up all later article processing, it is used
in the example scripthooks.lua to store header data in an article
specific structure which all later callouts rely on.

Note that the C glue code calls "script_fn_filter_header_table" without
any arguments, which have been collected by a separate function
"script_fn_add_header", whereas the roughly corresponding user callout
"fetchnews_headertable" receives the array of article headers mentioned
above.  This is because in the fetchnews code, collecting headers comes
before calling the filter function, and the headers must be complete at
that point in time.
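
A hedged sketch of a callout turning the header array into a table
keyed by lower-cased header name; the global "article" is a made-up
example structure, not part of the distribution:

  function fetchnews_headertable(headers)
      article = { headers = {} }
      for i = 1, #headers do
          local name, value = headers[i]:match("^([^:]+):%s*(.*)$")
          if name then
              article.headers[name:lower()] = value
          end
      end
      return { result = SCRIPT_NOERROR }
  end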

== script_fn_filter_header(head) ==

callout name: "fetchnews_headertxt" (Lua)
arguments: the complete article header as a string

"fetchnews_headertxt" is an option for users wishing to realize all
filtering using unstructured header data.  The example scripthooks.lua
does not use this function, because the same data is available to
"fetchnews_headertxt_bodytxt".

Note that the upstream's "Xref:" header is removed in the glue code and
the local host's "Xref:" header is not available until the filtering
decision has been digested.  Users will not see any "Xref:" headers.

== script_fn_filter_header_body(head, body) ==

callout name: "fetchnews_headertxt_bodytxt" (Lua)
arguments: both head and body of an article as strings (Lua)

In the filtering chain, "fetchnews_headertxt_bodytxt" is the last of the
user callouts receiving article data as one big string for the header
and the cached part of the body as another one. If you make the body
cache big enough, it will be easy to base a sound filtering decision on
its contents. The current default for this cache's size is 55000 bytes,
which might store roughly 687 lines of 80 bytes each. You can easily
increase this amount as long as memory is not a concern. The author's
experience shows that the default is enough to train and use a bayes
filter (bogofilter).  Many spam articles on USENET can be found even
with the limited regular expression support of Lua, and it is no
problem to reliably remove HTML parts.
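
As an illustration, a hedged sketch of HTML removal in this callout;
"local.spam" stands in for whatever group you use:

  function fetchnews_headertxt_bodytxt(header_string, body_string)
      if header_string:lower():find("content%-type:%s*text/html")
          or body_string:lower():find("<html") then
          ln_log(LNLOG_SINFO, LNLOG_CARTICLE,
                 "headertxt_bodytxt: HTML article redirected")
          return { result = SCRIPT_NOERROR, newsgroups = "local.spam" }
      end
      return { result = SCRIPT_NOERROR }
  end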

