$Id: README,v 1.2 2005/10/17 22:11:13 killabyte Exp $
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

                                                                A una mixirrica
                                                                     A la única
                                                            A la que más estimo
                                                               A qui més estimo


Contents
========

        1. Introduction to pDI-Tools
        2. Quick install
              2.1 Local installation
              2.2 System installation
        3. pDI-Tools files and directories
        4. Your first interposition
        5. License of pDI-Tools


1. Introduction to pDI-Tools
============================

  pDI-Tools is a powerful and portable API and engine that can be used to
  create dynamic instrumentation tools, performance tools, execution-driven
  simulators, reverse engineering and hacking tools, and a lot more. Its main
  features are portability, ease of use, compactness and speed. It works with
  any executable, without needing its source code or debug information.

  It implements mechanisms based on intercepting calls between dynamic shared
  objects (executable to library, library to library). These mechanisms exploit
  ELF structures and data, which makes them very efficient and portable.
  pDI-Tools currently works on GNU/Linux (i386, PowerPC, PowerPC 64), Solaris
  (SPARC, 32 and 64 bits) and Irix (MIPS, 32 and 64 bits).

  Because of the way these mechanisms are implemented, you can interpose code
  on any executable regardless of the tools used to build it or the debug
  information it contains, if it contains any.

  Before you start, we recommend that you read the files BUGS and INSTALL. If
  you are impatient and want to install pDI-Tools right now, read the section
  'Quick install' in this file.

  If this is the first time you use this program, please read the section 'Your
  first interposition'. It quickly describes how to make interpositions and
  build backends.

  If you have an idea or a suggestion, or if you think you have found a bug,
  you can contact me at

    Gerardo García Peña <gerardo@kung-foo.dhs.org>


2. Quick install
================

  There are countless ways of installing pDI-Tools, and I am sure all of them
  are fine. This section proposes only two, depending on your needs:

    - if you don't have root privileges or do not want to install this
      application at system level, you can make a local installation in your
      home directory. See subsection "2.1 Local installation".

    - if you want all users of your machine to be able to use pDI-Tools, or you
      want pDI-Tools to be part of your system, please read subsection "2.2
      System installation".

  Both ways of installing pDI-Tools follow more or less the same scheme:

        $ ./configure [configuration]
        $ make
        $ make install


2.1 Local installation
======================

  You don't need any special privileges for this kind of installation. It is
  also very easy to uninstall pDI-Tools if you follow these instructions.

  Uncompress the pDI-Tools distribution file in your home directory:

        $ cd
        $ zcat <your pditools-1.0.0.tar.gz dist file> | tar xvf -
        $ cd pditools-1.0.0

  Configure pDI-Tools, setting the installation prefix to a directory inside
  your home directory:

        $ ./configure --prefix="$HOME/pditools"

  If the default configuration does not suit you, you can tweak it. Run
  './configure --help' to get a list of the available configuration and
  installation options.

  Build the program and install it:

        $ make
        $ make install


2.2 System installation
=======================

  This is easier than a local installation, but you will need root privileges.

  Go to a temporary directory (this example uses /var/tmp) and uncompress the
  pDI-Tools distribution there:

    $ cd /var/tmp/
    $ zcat <your pditools-1.0.0.tar.gz dist file> | tar xvf -
    $ cd pditools-1.0.0

  Configure pDI-Tools with 'configure':

    $ ./configure

  If the default configuration does not suit you, you can tweak it. Run
  './configure --help' to get a list of the available configuration and
  installation options. The default configuration installs all pDI-Tools files
  under the '/usr/local' directory.

  Once configured, build the package:

    $ make

  If everything went well, you can now install pDI-Tools by running the
  following command after gaining root privileges:

    $ su
    Password: 
    # make install


3. pDI-Tools files and directories
==================================

  The pDI-Tools distribution is divided into several directories. Once
  installed you will find:

    - $(INSTDIR)/bin/
      This directory contains the pDI-Tools core object 'libpdi.so'. In the
      future it may also contain helper scripts or other utilities.

    - $(INSTDIR)/etc/
      Configuration files of pDI-Tools.

    - $(INSTDIR)/doc/
      The pDI-Tools documentation. The 'html' directory contains articles and
      manuals about pDI-Tools in HTML format.

    - $(INSTDIR)/include/
      Header files of the pDI-Tools API. They are needed to write backends and
      programs that use pDI-Tools.


4. Your first interposition
===========================

  First of all you will need a program to instrument. For this tutorial you can
  use any program that uses libc as a shared library (nearly all programs do).

  You can see which shared libraries a program depends on with the ldd(1)
  command. For instance, these are the libraries that ls(1) uses:

    gerardo@arale:~/pdi$ ldd /bin/ls
            librt.so.1 => /lib/tls/librt.so.1 (0x40025000)
            libacl.so.1 => /lib/libacl.so.1 (0x4002b000)
            libc.so.6 => /lib/tls/libc.so.6 (0x40033000)
            libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40168000)
            /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
            libattr.so.1 => /lib/libattr.so.1 (0x40177000)

  This means that ls will not run if any of these libraries, for instance
  'libacl.so.1', is missing.
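
  The same list can also be obtained from inside a running process. The
  following is a minimal sketch, assuming a GNU/Linux system with glibc, whose
  dl_iterate_phdr(3) function walks the list of loaded shared objects. The
  names count_object() and list_loaded_objects() are made up for this example
  and are not part of pDI-Tools.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: exposes dl_iterate_phdr() */
#endif
#include <link.h>
#include <stdio.h>

/* Callback invoked by dl_iterate_phdr() once per loaded shared object. */
static int count_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;

    (void)size;                 /* unused in this sketch */
    /* An empty name normally stands for the main program itself. */
    printf("%s\n", info->dlpi_name[0] ? info->dlpi_name : "(main program)");
    ++*count;
    return 0;                   /* returning 0 keeps the iteration going */
}

/* Hypothetical helper: prints every loaded object and returns how many. */
int list_loaded_objects(void)
{
    int count = 0;

    dl_iterate_phdr(count_object, &count);
    return count;
}
```

  Calling list_loaded_objects() from a dynamically linked program prints one
  line per shared object, much like the ldd(1) output above but for the
  process that is actually running.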

  Knowing which libraries a program uses is important when making code
  interpositions with pDI-Tools, because its mechanism can only intercept
  function calls between two different shared objects. A shared object is
  either a library or the program binary itself. Consider this program:

    01    #include<stdio.h>
    02    #include<stdlib.h>
    03
    04    void printHeader(void)
    05    {
    06      printf("MyFooProg 2005 (C) Gerardo García Peña\n");
    07    }
    08
    09    int main()
    10    {
    11      int c;
    12      printHeader();
    13      while((c = fgetc(stdin)) != EOF)
    14        fputc(c, stdout);
    15      exit(0);
    16    }

  This example contains five function calls. With pDI-Tools, four of them can
  technically be instrumented, but only three can be instrumented easily.

  The function calls in lines 06, 13 and 14 can be instrumented easily, as you
  will see soon. This is possible because printf(3), fgetc(3) and fputc(3) are
  implemented in '/lib/libc.so.6', a different shared object, and they usually
  follow the platform ABI correctly.

  Some functions, like exit(3) in line 15, cannot be instrumented in the normal
  way because they never return. If you try to instrument exit(3), your program
  will probably die with a segmentation fault when it tries to finish. If you
  want to instrument exit(3), take a look at the callback interposition method
  described in the documentation.

  Finally, the function call in line 12 cannot be instrumented because the
  caller and printHeader() are part of the same shared object. pDI-Tools
  implements interceptions with a technique called PLT/GOT redirection. The PLT
  and GOT are tables used to find functions in other shared objects. For
  instance, if the main program wants to call fprintf(3), it must go to its
  PLT/GOT, get the real address of fprintf(3) and then jump to it. These tables
  are needed because libraries can be loaded at arbitrary addresses, so
  programs and libraries need relocation tables to interact. pDI-Tools alters
  these tables to redirect program execution. But some function calls, like the
  one in line 12, don't need PLT/GOT relocation entries because they can be
  resolved statically (in the case of the main program) or relative to the
  Program Counter.
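
  The difference between a call that goes through a table and a call resolved
  at link time can be modelled in plain C. This is only a toy model: a real
  PLT/GOT is built by the linker and the Runtime Linker, and toy_got,
  original_twice() and wrapper_twice() are names invented for this sketch.

```c
/* Stands in for a function that lives in another shared object. */
static int original_twice(int x) { return 2 * x; }

/* Replacement that should take control after the redirection. */
static int wrapper_twice(int x) { return 2 * x + 1000; }

/* Toy "GOT": one slot holding the address of the external function. */
static int (*toy_got[1])(int) = { original_twice };

/* A call to another shared object goes through the table slot. */
int call_via_table(int x) { return toy_got[0](x); }

/* A call inside the same object is bound directly, without the table. */
int call_direct(int x) { return original_twice(x); }

/* Overwrite the slot, which is the essence of a PLT/GOT redirection. */
void redirect_slot(void) { toy_got[0] = wrapper_twice; }
```

  After redirect_slot(), call_via_table() reaches wrapper_twice(), while
  call_direct() still reaches original_twice(). The second case corresponds to
  the call to printHeader(): no table entry is involved, so there is nothing
  to redirect.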

  Now imagine you want to intercept all fputc(3) and fgetc(3) calls. Both are
  implemented in libc.so, so there is no problem intercepting them with
  pDI-Tools. The easiest way to redirect them is to write two functions like
  these, which take control in place of the original functions:

    /* real_fgetc() and real_fputc() stand for the original libc functions;
       how to reach them is discussed below. */
    int fgetc_counter = 0;
    int fputc_counter = 0;

    int fgetc_wrapper(FILE *f)
    {
      ++fgetc_counter;
      return real_fgetc(f);
    }

    int fputc_wrapper(int c, FILE *f)
    {
      ++fputc_counter;
      return real_fputc(c, f);
    }

  We now have two functions that can replace fgetc() and fputc(), but two
  problems remain:

    1) If we replace fgetc(3) and fputc(3), how can we call the original
       functions? How can we implement 'real_fgetc()' and 'real_fputc()'?

    2) And one more: how can we install these interpositions with pDI-Tools?

  The first question is easy to solve with pDI-Tools, but you will need some
  background to understand how it really works.

  When you start your application, it is loaded into memory together with
  another program: the Runtime Linker. This program, like ldd(1), examines your
  application to find out which libraries it needs. Once it has the full list
  of requirements, it loads all the needed shared objects (libraries), creates
  and initializes the PLT/GOT and relocation tables, executes the .init section
  of each shared object and finally transfers control to your application.

  There are as many PLT/GOT and relocation tables as there are shared objects.
  Each DSO (Dynamic Shared Object) has its own PLT/GOT and its own relocation
  table. These tables tell the DSO where it can find each external symbol. For
  instance, our sample program needs a PLT/GOT and a relocation table with
  pointers to printf(3), fgetc(3), fputc(3) and exit(3).

  One interesting property of this structure is that these tables are
  independent: if you modify the PLT/GOT and relocation tables of the main
  program, libc's PLT/GOT entries and relocations are not affected. So, if the
  wrapper functions live in a different DSO than the main application, and we
  redirect only the function calls made from the main program's shared object
  (MAIN), the wrappers can call the original functions directly:

    /* backend.c - my first backend */
    #include <stdio.h>
    #include <stdlib.h>

    int fgetc_counter = 0;   /* fgetc() calls made by MAIN */
    int fputc_counter = 0;   /* fputc() calls made by MAIN */

    int fgetc_wrapper(FILE *f)
    {
      ++fgetc_counter;
      return fgetc(f); /* fgetc is called directly from this DSO */
    }

    int fputc_wrapper(int c, FILE *f)
    {
      ++fputc_counter;
      return fputc(c, f); /* fputc is also called directly */
    }
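
  The reason why these wrappers do not call themselves can be modelled with
  one table slot per DSO. This is a toy model under stated assumptions, not
  pDI-Tools code: struct toy_dso, libc_read() and the other names are invented
  for the sketch, and each "DSO" has a single slot instead of a full PLT/GOT.

```c
/* Stands in for the real function implemented in libc. */
static int libc_read(void) { return 42; }

static int calls_seen = 0;              /* counter kept by the wrapper */

/* One slot per DSO: where that DSO believes the function lives. */
struct toy_dso { int (*read_slot)(void); };

static struct toy_dso main_dso    = { libc_read };
static struct toy_dso backend_dso = { libc_read };

/* Wrapper in the backend: counts, then calls through the backend's slot. */
static int read_wrapper(void)
{
    ++calls_seen;
    return backend_dso.read_slot();
}

/* Relink only MAIN's slot; the backend's slot keeps the original address. */
void relink_main(void) { main_dso.read_slot = read_wrapper; }

/* What the main program does when it calls the function. */
int main_program_read(void) { return main_dso.read_slot(); }

int wrapper_calls(void) { return calls_seen; }
```

  After relink_main(), main_program_read() still returns the value produced by
  libc_read(), but the call is counted on the way. If the backend's own slot
  were patched as well, read_wrapper() would call itself forever, which is why
  only MAIN is redirected.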

  This backend is not very useful yet: it counts how many times fgetc(3) and
  fputc(3) are called, but it never shows the result to the user. pDI-Tools
  gives the programmer a way to execute code after the instrumented program has
  finished. So we add this backend finalization routine:

    /* run by pDI-Tools after the instrumented program has finished */
    void di_fini_backend(void)
    {
      printf("fgetc_counter=%d\nfputc_counter=%d\n",
             fgetc_counter, fputc_counter);
    }

  Now you can compile this shared object with this command:

        $ gcc -Wall -shared -o backend.so backend.c

  The -shared flag tells GNU C to build a shared library. On platforms that
  require position-independent code in shared objects you may also have to add
  -fPIC.

  Now come the two big questions:

    - I have a shared object with two functions that should take control when
      my program calls fgetc and fputc, but how can I use pDI-Tools to
      instrument my program?

    - And an even better one: how does pDI-Tools know which function calls of
      the application's MAIN DSO it should redirect?

  Technically, pDI-Tools installs code interpositions by adding itself as a
  shared object of your application and taking control during library
  initialization, when it modifies the PLT/GOT (and several other) structures
  of each DSO in memory.
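
  As an illustration of code that runs during library initialization, GCC and
  compatible compilers let an object mark functions to be run when it is
  loaded and when it is unloaded. This sketch only shows the general idea. It
  is an assumption made for the example, not a description of how pDI-Tools
  hooks itself, and all the names in it are invented.

```c
#include <stdio.h>

static int initialised_before_main = 0;

/* GCC/Clang extension: runs when the object is loaded, before main(). */
__attribute__((constructor))
static void on_load(void)
{
    initialised_before_main = 1;
}

/* Runs when the object is unloaded or the process exits normally. */
__attribute__((destructor))
static void on_unload(void)
{
    fputs("object finalized\n", stderr);
}

/* Lets the rest of the program check that on_load() already ran. */
int was_initialised_before_main(void)
{
    return initialised_before_main;
}
```

  When such an object is preloaded into a program, on_load() runs before the
  program's main() gets control, which is the moment at which a tool can
  rewrite tables that the program has not used yet.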

  It achieves this with some help from you: you must launch your program
  telling the Runtime Linker to add pDI-Tools to your application at run
  time. This is easier than it sounds, but it depends on your operating
  system (all examples use the Bourne shell, see sh(1)):

    - On Linux and Solaris you must launch your application with the
      environment variable LD_PRELOAD set to the pDI-Tools library:

        $ LD_PRELOAD="[PDITOOLS_PATH]/bin/libpdi.so" <your_app> [parameters]

    - On Irix you must prepend pDI-Tools to the Runtime Linker DSO list:

        $ _RLD_LIST="[PDITOOLS_PATH]/libpdi.so:DEFAULT" <your_app> [parameters]
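
  The Linux and Solaris variant can also be done from a small launcher program
  instead of the shell. The following is a minimal sketch, assuming a POSIX
  system and the 'bin/libpdi.so' location shown above; preload_path() and
  run_instrumented() are hypothetical helpers, not part of pDI-Tools.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes "<pdi_dir>/bin/libpdi.so" into buf.
   Returns 0 on success, -1 if the result does not fit. */
int preload_path(char *buf, size_t len, const char *pdi_dir)
{
    int n = snprintf(buf, len, "%s/bin/libpdi.so", pdi_dir);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Replaces the current process with argv[0], with pDI-Tools preloaded.
   Returns only on error. */
int run_instrumented(const char *pdi_dir, char *const argv[])
{
    char lib[4096];

    if (preload_path(lib, sizeof lib, pdi_dir) != 0)
        return -1;
    if (setenv("LD_PRELOAD", lib, 1) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;                    /* execvp() returns only on failure */
}
```

  A launcher would call run_instrumented() with the installation directory and
  the argument vector of the program to instrument. On Irix the variable to
  set would be _RLD_LIST, with the value format shown in the list above.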

  If you try this, your application will be launched under the control of
  pDI-Tools, but you will probably get some errors telling you that there is no
  configuration and that there are no interpositions to install.

  pDI-Tools needs at least two files to work properly: a configuration file and
  an interposition commands file.

  The pDI-Tools configuration file is a text file that sets its basic behaviour
  and tells it which files contain the interposition commands. It is usually
  called 'pdi.cfg'. One possible 'pdi.cfg' file could be:

    # pdi.cfg - my first pDI-Tools config file
    [global]
    verbose = 2
    be_path = .
    becfg_path = .
    config = first.cfg

  pDI-Tools configuration files can be divided into sections. The 'global'
  section always exists, and it is the default section. How to use sections is
  explained in the pDI-Tools manual. You can also find a more complete example
  of a 'pdi.cfg' file in the 'etc/' directory of this distribution.

  First of all, the config file sets 'verbose' to 2. This is a high log level,
  so a lot of messages will be printed on screen. Levels higher than 1 are
  useful when debugging backends and/or pDI-Tools itself.

  The next two parameters, 'be_path' and 'becfg_path', set the search paths for
  backends (backend.so for instance) and for their interposition commands files
  (sometimes called backend config files).

  The last parameter, 'config', tells pDI-Tools which file contains the
  interposition commands. The interposition commands tell pDI-Tools which
  function calls will be redirected and how they must be redirected.
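
  To make the 'key = value' layout of 'pdi.cfg' concrete, here is a minimal
  sketch of how one line of such a file could be split. It is not the parser
  used by pDI-Tools: parse_setting() and the buffer sizes are assumptions made
  for this example, and only the syntax seen in the sample file is handled.

```c
#include <ctype.h>
#include <stdio.h>

/* Splits a "key = value" line into key and value.
   Returns 1 for a setting, 0 for blank lines, '#' comments, "[section]"
   headers and anything else that does not look like a setting. */
int parse_setting(const char *line, char key[64], char value[256])
{
    while (isspace((unsigned char)*line))
        ++line;                               /* skip leading blanks */
    if (*line == '\0' || *line == '#' || *line == '[')
        return 0;
    /* key: up to '=' or a blank; value: the rest of the line */
    if (sscanf(line, "%63[^= \t] = %255[^\n]", key, value) != 2)
        return 0;
    return 1;
}
```

  Applied to the sample file, the line 'verbose = 2' yields the key "verbose"
  and the value "2", while the '[global]' header and the comment line are
  skipped. Trailing blanks in the value are not trimmed in this sketch.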

  The interposition commands file is divided into two parts.

  The first part declares which DSOs of the application will be affected by our
  interpositions and which backends (shared objects with wrapper functions)
  will be used by the interposition commands.

  The second part contains the interposition commands. An interposition command
  has three parts: the type of interposition, the affected function call and
  the wrapper that will take control when the function is called.

  An example of an interposition commands file:

    01    ; first.cfg - my first interposition commands file
    02
    03    ; declarations
    04    #backend BACKEND        backend.so
    05
    06    ; Interposition commands
    07    #commands
    08    R MAIN fgetc BACKEND fgetc_wrapper
    09    R MAIN fputc BACKEND fputc_wrapper

  The first statement (#backend BACKEND backend.so) assigns the alias 'BACKEND'
  to the file 'backend.so'. It also declares that 'backend.so' is a backend.
  Assigning an alias to an object is optional but recommended, because it makes
  it easier to change the file later without touching the rest of the
  interposition commands.

  The interposition commands start after the #commands mark in line 07.

  Each interposition command begins with a letter that tells pDI-Tools the type
  of the command. The letter 'R' tells pDI-Tools that this command is a relink.
  Relinks are the easiest interpositions to use in pDI-Tools. They simply
  redirect calls from one DSO::function() to another DSO::function().

  In this example, we redirect all calls to fgetc(3) made from MAIN (the
  program DSO) to the function fgetc_wrapper() in 'backend.so', and likewise
  all calls to fputc(3) to fputc_wrapper().
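
  The fields of a relink command can be pictured as a small record. The sketch
  below splits a line such as 'R MAIN fgetc BACKEND fgetc_wrapper' into its
  parts. It is not the pDI-Tools parser: struct relink, parse_relink() and the
  field sizes are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* The four operands of a relink command. */
struct relink {
    char from_obj[64];   /* DSO whose calls are redirected, e.g. MAIN   */
    char symbol[64];     /* function being intercepted, e.g. fgetc      */
    char to_obj[64];     /* backend alias holding the wrapper           */
    char wrapper[64];    /* wrapper function that takes control         */
};

/* Parses "R <object> <function> <backend> <wrapper>".
   Returns 1 on success, 0 if the line is not a relink command. */
int parse_relink(const char *line, struct relink *r)
{
    char type;
    int n = sscanf(line, " %c %63s %63s %63s %63s",
                   &type, r->from_obj, r->symbol, r->to_obj, r->wrapper);

    return n == 5 && type == 'R';
}
```

  Comment lines and the '#commands' mark do not have five fields starting with
  'R', so parse_relink() rejects them.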

  If you instrument this program you will get this output:

        gerardo@arale:~/tmp/test$ LD_PRELOAD=./libpdi.so ./prog
    01  ----------------------------------------------------------------------
    02  pDI-Tools version 1.0.0, Copyright (C) 2004, 2005 Gerardo García Peña
    03  pDI-Tools comes with ABSOLUTELY NO WARRANTY; for details read the
    04  `COPYING' file that comes with this library. This is free software,
    05  and you are welcome to redistribute it under certain conditions;
    06  read the `COPYING' file for details.
    07  ----------------------------------------------------------------------
    08  init.c:initLiblink:pDI-Tools configuration have been loaded.
    09  linux/lx-init.c:_pdi_linux_init:_r_debug protocol version: '1'
    10  linux/lx-init.c:_pdi_linux_init:GNU libc version: 2.3.2 (stable)
    11  objlist.c:_pdi_ebe_initObjectList:Reserved 2240 bytes for an object poo
        l of 40 entries.
    12  linux/lx-objlist.c:_pdi_linux_initObjectList:Reserved 3680 bytes for 'a
        rchobjlist' table.
    13  init.c:initLiblink:Processing interposition commands files.
    14  init.c:initLiblink:  - Reading 'first.cfg'...
    15  init.c:initLiblink:  - Combining commands.
    16  init.c:initLiblink:Processing of interposition commands files finalized
        correctly.
    17  +-------------------------------------------------------+
    18  | Configuration: FINAL_CONFIG                           |
    19  +---------+-----------------------------------+---------+
    20  | TYPE    | OBJECT                            | ALIAS   |
    21  +---------+-----------------------------------+---------+
    22  | object  | -                                 | LIBC    |
    23  | object  | -                                 | MAIN    |
    24  | object  | -                                 | PDI     |
    25  | BACKEND | /home/gerardo/tmp/test/backend.so | BACKEND |
    26  +---------+-----------------------------------+---------+
    27  init.c:initLiblink:Installing backends and interpositions.
    28  beconfig.c:_pdi_becfg_applyBackendConfig:Loading and initializing backe
        nd '/home/gerardo/tmp/test/backend.so'.
    29  init.c:initLiblink:pDI-Tools initialized succesfully!
    30  MyFooProg 2005 (C) Gerardo García Peña
    31  hola
    32  hola
    33  fini.c:finiLiblink:Finalizing pDI-Tools.
    34  fgetc_counter=6
    35  fputc_counter=5
    36  fini.c:finiLiblink:* pDI-Tools execution finalizes here *
        gerardo@arale:~/tmp/test$ _

  As you can see, all output from line 01 to line 29 comes from pDI-Tools. It
  prints a lot of information because its verbosity level is set to 2. If you
  set the verbosity level to 1, pDI-Tools will not print any messages except
  warnings and errors.

  Lines 01 to 07 are copyright information. In lines 08 to 12 pDI-Tools begins
  its initialization. The interesting messages begin at line 13, when pDI-Tools
  has processed 'pdi.cfg' and starts to load the interposition commands files.
  When it has processed all of them, it builds a final script (line 15). The
  object list printed in lines 17-26 results from this final script, or
  configuration.

  The next action is to initialize the backends. Our backend has no
  initialization code, so in this step it is only loaded (line 28).

  Once the backends have been loaded and the interpositions installed, the main
  program (the instrumented program) starts and runs to completion (lines
  30-32).

  When the main program finishes, pDI-Tools takes control (line 33), executes
  the backend's finalization code (lines 34 and 35) and exits.
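
  The two counters differ by one because the copy loop of the sample program
  calls fgetc(3) one extra time, the call that returns EOF. The sketch below
  reproduces the loop with the counting done inline, without pDI-Tools;
  struct copy_counts and copy_and_count() are names invented for this example.

```c
#include <stdio.h>

struct copy_counts {
    int fgetc_calls;
    int fputc_calls;
};

/* Same copy loop as the sample program, counting every call itself. */
struct copy_counts copy_and_count(FILE *in, FILE *out)
{
    struct copy_counts n = { 0, 0 };
    int c;

    for (;;) {
        c = fgetc(in);
        ++n.fgetc_calls;          /* counted even when it returns EOF */
        if (c == EOF)
            break;
        fputc(c, out);
        ++n.fputc_calls;
    }
    return n;
}
```

  For the five characters typed in the session above ("hola" plus the newline)
  this gives six fgetc calls and five fputc calls, which matches lines 34 and
  35 of the output.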

  With this tutorial you have learned how to make relink interpositions with
  pDI-Tools. Relinks are the simplest interpositions you can make with
  pDI-Tools. There are also redefinitions and callbacks (although callbacks
  only work on Linux/i386). You can learn more about them in the user manual
  included in this distribution.


5. License of pDI-Tools
=======================

  All pDI-Tools files are generally governed by the LGPL license (GNU Lesser
  General Public License, see file `licenses/lgpl.txt' or `COPYING' in this
  directory), except for some files that may be under other licenses. For
  instance, the pDI-Tools documentation is distributed under the FDL license
  (GNU Free Documentation License, see file `licenses/fdl.txt' or
  `doc/COPYING').

-----

pDI-Tools - Portable ELF Dynamic Instrumentation Tools
Copyright (C) 2004, 2005 Gerardo García Peña

This file is part of pDI-Tools.

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

