INTRODUCTION
------------

Awka comprises a translator of the AWK programming language to ANSI-C,
and a library against which translated programs may be linked.

version 0.3b  (beta)

This product is in BETA stage, which means you cannot rely on it to work -
not that it comes with any guarantees anyway.  This release is aimed at those
who may wish to assist by finding bugs and porting to other platforms.

The more people help in this way, the sooner Awka will become stable and
complete enough for general use.

This version is a major advance over 0.2b, fixing many (many!) bugs, greatly
boosting performance and introducing some new functionality.

I am hoping this will be the final beta version.

Instructions on installing Awka may be found in the file INSTALL.TXT.


SO WHAT IS AWKA?
----------------
It is two products :-

 *  A Translator - awka - is a seriously hacked version of mawk, with 
    additional code to output ANSI-C code.  This is not a recommended way of
    producing a program, but (a) it's only a translator, and (b) time was too
    short not to use mawk as a starting point.

 *  The Library - libawka.a - is my own creation and is, I believe (hope),
    much better designed than the translator.  Most development time was
    spent on the library, ensuring that code size and execution time were
    the best I could produce.

To my mind the most important reason for a translator is that AWK programs
are limited to what the interpreter provides, whereas C code can be compiled
with other code into a larger application.  I know of at least one person
using Awka to integrate AWK programs with C++ code.

The use of C allows other products, such as Tk, to be used to provide GUI 
front-ends and the like.  In the future I intend to extend Awka to allow Tk 
calls to be defined within the AWK code, similar to PerlTk.

Many people have expressed the following requirements for an Awk-to-C
translator: it must allow distribution of an executable, and it should deliver
improved performance through removal of the interpreter.

Awka provides the method for creating an executable.  Increased performance,
however, faced four major issues:-

 (a) It is a common assumption that compiled code will automatically be 
     faster than interpreted, but this is not particularly true of AWK.  
     AWK is a relatively brief language (part of its appeal) - most things 
     translate directly into calls to 'library' functions in the (compiled)
     interpreter.  Therefore a well-implemented interpreter shouldn't have 
     that much overhead.

     Only with larger AWK programs will there be any benefit from compiling;
     even then it may not be significant.  You are also depending on how well
     your C compiler optimises code on your platform, something over which I
     have no control.  I have found most speed differences are due to the 
     implementation of one 'library' function against another.  Hence I have 
     taken considerable time in optimising Awka's library functions.

 (b) Variables are typeless in AWK, so they must also be typeless in the
     compiled code.  Thus one of the main areas where savings could be made -
     use of C native data types - is denied to a translator.  The Awka library
     uses macros and inline functions where possible to allow AWK-style
     type-casting.  I briefly considered using typed variables where possible,
     but discarded the idea as the library API would have become unworkable.

 (c) Many AWK functions have variable numbers of arguments.  An interpreter
     can create efficient internal parameter structures at parse time, but
     C code must resort to vararg calls, which are not particularly quick.

 (d) Of the current interpreters, Mawk is very fast, and Gawk is not too
     far behind.  This makes the scope for potential gain pretty narrow.
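
Points (b) and (c) can be illustrated with a small sketch.  This is not the
Awka library API: the names var_t, var_set_str, var_set_num, var_num, var_str
and min_of are hypothetical, chosen only to show why a typeless variable costs
more than a native C double, and why a builtin taking any number of arguments
ends up as a vararg call.  Number formatting is simplified to "%.6g".

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical typeless variable: may hold a number, a string, or both. */
typedef struct {
    int    has_num;   /* non-zero when num is valid */
    int    has_str;   /* non-zero when str is valid */
    double num;
    char   str[64];
} var_t;

void var_set_str(var_t *v, const char *s)
{
    snprintf(v->str, sizeof v->str, "%s", s);
    v->has_str = 1;
    v->has_num = 0;
}

void var_set_num(var_t *v, double d)
{
    v->num = d;
    v->has_num = 1;
    v->has_str = 0;
}

/* Numeric view: converted from the string form on demand, AWK-style. */
double var_num(var_t *v)
{
    if (!v->has_num) {
        v->num = v->has_str ? strtod(v->str, NULL) : 0.0;
        v->has_num = 1;
    }
    return v->num;
}

/* String view: the number is formatted on demand (empty if never set). */
const char *var_str(var_t *v)
{
    if (!v->has_str) {
        if (v->has_num)
            snprintf(v->str, sizeof v->str, "%.6g", v->num);
        else
            v->str[0] = '\0';
        v->has_str = 1;
    }
    return v->str;
}

/* Variadic builtin in the style of min(): n typeless arguments follow. */
double min_of(int n, ...)
{
    va_list ap;
    double best;
    int i;

    assert(n > 0);
    va_start(ap, n);
    best = var_num(va_arg(ap, var_t *));
    for (i = 1; i < n; i++) {
        double d = var_num(va_arg(ap, var_t *));
        if (d < best)
            best = d;
    }
    va_end(ap);
    return best;
}
```

Every read goes through a validity check and possibly a conversion, and every
call to min_of walks a va_list.  An interpreter pays the same conversion cost,
which is part of why the gap between compiled and interpreted AWK is narrow.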

Having said all this, Awka seems to be at least competitive with Mawk, and on
some (but by no means all) occasions it is faster.  Perhaps unlike the 
maintainers of other AWKs, I am actively looking for ways in which to boost
Awka's speed.

Some AWK enthusiasts may be opposed to the idea of a compiler, holding to a
philosophical view that AWK scripts should always be made available in source
form to customers.  I feel that, although this may be appropriate most of the 
time, sometimes it is not.  With Awka, the author of a script now has more 
options to decide how they want their work distributed, and I think that is a 
good thing.


HOW TO USE AWKA
---------------
First write and test your AWK program using an interpreter such as MAWK or 
GAWK.  At present I do not recommend using Awka without first testing its 
results against those produced by an interpretive AWK.

There are several ways to use Awka - see the manpage for more details.


MISSING FEATURES
----------------
Most language features are supported, with the exception of:-

  * The pseudo /dev/xxx functionality present in gawk is not included.
    /dev/stderr and /dev/stdout are present, and any real /dev/ thing will
    obviously work, but /dev/pid will not.

  * The Gawk IGNORECASE variable is not supported.

  * Awka does not yet promise to be POSIX compliant, but it is slowly
    creeping towards compliance.  At the very least there should be no
    major clangers that make your program behave differently from other
    AWK implementations.

  * Like Mawk, Awka does not allow use of 'next' or 'nextfile' inside a 
    function.  I very much doubt whether this will change, as I am yet
    to discover a way in which the feature could be logically implemented
    from within a C function call stack in a single-threaded environment.

  * Awka-compiled executables support the use of "-v var=value" command
    line arguments, but not "var=value" without the -v.  The reason is
    that these behave differently.  Variables set using -v are set to the
    value before BEGIN is executed, whereas without -v the "var=value"
    string is put on the ARGV list, and treated as a filename.  Only
    when it comes time to 'open' the file is it recognised as a variable
    assignment.  I've left this out of Awka by choice - I don't like
    the way "var=value" behaves.


AWK EXTENSIONS?
---------------
Awka introduces some new builtin functions.
These are :-

  totitle     converts to Title case
  ascii       returns ascii code of a character
  char        returns character for an ascii value
  left        returns leftmost n chars of a string
  right       returns rightmost n chars of a string
  ltrim       trims whitespace from left of a string
  rtrim       trims whitespace from right of a string
  trim        trims whitespace from left & right of a string
  and         bitwise and
  or          bitwise or
  xor         bitwise xor
  compl       bitwise complement
  lshift      shifts bits to the left by n positions
  rshift      shifts bits to the right by n positions
  min         returns lowest number in a list
  max         returns highest number in a list
  time        returns number of seconds from 1 Jan 1970, allowing
              the user to define their own year, month, day etc.
  gmtime      formatted time string set to Greenwich Mean Time
  localtime   formatted time string adjusted for local timezone
  alength     returns the number of elements in an array variable.

These functions are documented more thoroughly in the man page.
Variables or AWK functions using the above names will over-ride the
new builtins, so your program should not require modification.

In addition, the following features are supported in Awka, and also
exist in Gawk and/or Tawk :-

  nextfile    Advance to the first record of the next file in the
              input list.
  systime     Returns the current time as seconds since 1 Jan 1970.
  strftime    Formats output of time() & systime() according to a
              user-supplied format string.
  abort       Originally from TAWK, this exits a program without
              running the END rules.
  argcount    Also from TAWK, from within a function this returns
              the number of arguments that were passed to the function.
  argval      TAWK again, this returns the value of the nth argument
              passed to a function.
  SORTTYPE    A builtin-variable, this determines if & how output from 
              'for (i in j)' statements is sorted.

Awka now supports Gawk's FIELDWIDTHS variable, and in addition introduces
a SAVEWIDTHS variable.  When this is set to non-zero, and FIELDWIDTHS is
active, Awka will try to preserve column widths when rebuilding $0.
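
As a rough illustration of fixed-width splitting in the style of FIELDWIDTHS,
the sketch below cuts a record into columns of given widths.  The function
name split_fixed, its parameters and the MAX_FIELD limit are hypothetical;
this is not Awka's implementation, and SAVEWIDTHS is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELD 64   /* illustrative per-field capacity, including the NUL */

/*
 * Cut rec into columns of the given widths, copying each column into
 * fields[i].  Returns the number of fields filled; a record shorter than
 * the total width yields fewer (or shorter) fields.
 */
int split_fixed(const char *rec, const int *widths, int nwidths,
                char fields[][MAX_FIELD])
{
    size_t len, pos = 0;
    int i, count = 0;

    assert(rec != NULL && widths != NULL);
    len = strlen(rec);
    for (i = 0; i < nwidths && pos < len; i++) {
        size_t w;

        assert(widths[i] > 0);
        w = (size_t)widths[i];
        if (w > len - pos)
            w = len - pos;            /* record ran out early */
        if (w > MAX_FIELD - 1)
            w = MAX_FIELD - 1;        /* truncate to field capacity */
        memcpy(fields[i], rec + pos, w);
        fields[i][w] = '\0';
        pos += (size_t)widths[i];     /* advance by the declared width */
        count++;
    }
    return count;
}
```

With widths 3, 2 and 5, the record "abcdefghij" splits into "abc", "de" and
"fghij".  In AWK the widths would be given as a space-separated string
assigned to FIELDWIDTHS rather than as an int array.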

More details on how the above work may be found in the manpage.

You may disagree with the addition of all these extras.  Feel free to
let me know if this is the case, as I would be interested to see why
extra functions are not a good thing.  I believe that if they add to 
the capabilities of AWK, their inclusion can only be a positive step, 
furthering the use, power and effectiveness of the language.

Certainly no disrespect is intended to Brian Kernighan and others who
designed AWK and determined its limits.  If the balance of responses
I receive about the additions is negative, I will consider removing
them.


WHAT PLATFORMS ARE SUPPORTED?
-----------------------------
Awka is designed to operate under a flavour of Unix.

Specific platforms tested, either by me or others:-

- Digital Unix 3.2c and 4.0a

- SGI Irix (using gcc),
  Irix 5.3 ported by Finn Drablos (finn.drablos@unimed.sintef.no).
  Irix 6.5 ported by Eiso AB (E.Ab@chem.rug.nl).

- HP-UX 10.20 & 11.0 (using gcc),
  ported by Matthijs van Aalten (Matthijs.vanAalten@ehv.ce.philips.com)
  (using native compiler)
  ported by Mick Donna (mdonna@cincom.com)

- Linux (Redhat 5.2),
  first ported by chris proctor (bil@linuxstart.com).
  I believe Awka has also been successfully ported to Caldera.

- Linux (SuSe)
  also ported by chris proctor

- LinuxPPC 5.0, using a Powerbook G3 (Wall Street)

- SCO OpenServer 5.0.4
  ported by John H. DuBois III (spcecdt@armory.com)

- Windows NT4.0 and 95, using Cygnus B20.1.

- DJGPP
  ported by Peter J. Farley III (pjfarley@banet.net)
  with help from Nethanel Elzas (nelzas@ndsisrael.com)

For now I can only be sure it works on those platforms.  This is not to
say it won't work on your platform; hopefully it will.  If you do port to
a platform, whether successfully or not, please take the time to provide
feedback, as this benefits everyone.

You need an ANSI C compiler - GCC is always a good choice.

Neither the Awka library nor the generated C code is thread-safe.


LICENSING
---------
The Awka TRANSLATOR contains portions of code derived from Mawk, and 
therefore shares Mawk's GPL license, a copy of which is provided.

The LIBRARY is released under the Awka Library License, a copy of which
is in the file LIBLICENSE.txt.

Copies of both GPL and the Awka Library License are included in this 
distribution.  If you plan to use the Awka package you should definitely 
read these documents and become familiar with their terms.

Please note that the code generated from your awk script by the TRANSLATOR
is always owned by you - I do not consider it to be 'derived from' the
TRANSLATOR's code, so it does not come under GPL.

You should be aware that the Cygnus environment under NT and Windows 95/98
has its own GPL license that applies to any code linked to the cygwin dll
- this automatically happens to anything compiled under Cygnus.


FEEDBACK
--------
Encouragement, bug reports (patches are welcome), suggestions, inducements
and constructive criticism may be sent to andrew_sumner@bigfoot.com.  I don't
always have time to read my mail or to reply promptly, but I will do my best.

The Awka homepage is at:
   http://www.linuxstart.com/~awka

The most current source release will be posted here, and eventually binaries 
will be available.


HOW AWKA CAME TO BE
-------------------
Awka was developed in response to the many posts to comp.lang.awk asking 
whether a free awk translator/compiler existed - clearly there was a 
need for such a tool.

Responses to these queries usually pointed to TAWK, apparently a fine product;
however it is not free, and is not available for many platforms.  From what
I can gather Tawk produces assembler or machine-language output which it then
executes, thus gaining excellent performance.

I found some references to an ancient tool called CAWK, which is apparently
'Compiled AWK'; however, I could find nothing about it - certainly if it
exists it is not available for general distribution.

There is also awk2c, which creates C code to link to gawk sourcecode; however,
this was incomplete, with no development taking place.  It also appeared to
be subject to the GPL, so distribution of binaries without sourcecode may not
have been possible.

Finally there was awkcc, an AT&T product that is essentially similar to Awka 
but costs $2,000 for a sublicense (or did when I last checked).

  "Implementors of the AWK language have shown a consistent lack of 
   imagination when naming their programs."
                                              - Michael Brennan

Over time I had been creating a library of awk-like functions, which I had
called libawka (Awka stood for Awk Archive).  I extended this to cover the
language more completely, including pervasive casting of variable types.

I then tackled the task of writing a translator - something I had little time
to do properly.  In order to reduce the size of the task I decided to use Mawk
as a starting point.  I tried to remove as much execution code as possible,
leaving code necessary for parsing the AWK language.  I converted the Mawk
internal opcode structure to a format I could use for translation purposes,
then crunched out a translation module.

I preferred Mawk over Gawk as I found it easier to track variable types in 
complex statements using Mawk's reverse-polish, assembler-like opcode structure.
At the time I was not aware of various Mawk parser bugs and restrictions, but
I believe Mike Brennan is now working on these, with a new version of Mawk in
the wings.

Finally I borrowed Gawk's extensive test suite and, after much effort removing
various bugs and incorrect logic (from Awka not Gawk), I ensured that Awka now 
passes the tests.  The test suite is being extended as new bugs are discovered
and fixed.


Andrew Sumner, June 1999. 8-)
