uthash User Guide
=================
Troy D. Hanson <thanson@users.sourceforge.net>
v1.2, November 2006

include::sflogo.txt[]
include::topnav.txt[]

A hash in C
-----------
include::toc.txt[]

This document is geared towards C programmers of UNIX systems. Since you're
reading this, chances are that you know a hash is used for looking up items
using a key. In scripting languages like Perl, hashes are used all the time. 
In C, hashes don't exist in the language itself. This software provides a hash
for C structures.  

What can it do?
~~~~~~~~~~~~~~~~~
This software supports four basic operations on hashes.

1. add
2. find
3. delete
4. iterate/sort

Is it fast?
~~~~~~~~~~~
Add, find and delete are normally constant-time operations. This is influenced
by your key domain and the hash function. 

This hash aims to be minimalistic and efficient. It's around 500 lines of C.
It inlines automatically because it's implemented as macros. It's fast as long
as the hash function is properly chosen to suit your keys. This is easy; see
<<Appendix_A,Appendix A: Choosing the hash function>>. 

Is it a library?
~~~~~~~~~~~~~~~~
No, it's just a single header file: `uthash.h`.  All you need to do is copy
the header file into your project, and:

    #include "uthash.h"

You may then utilize the hash macros as explained below.

Supported platforms
~~~~~~~~~~~~~~~~~~~
This software has been tested on Linux, Solaris, OpenBSD, Mac OSX and Cygwin.
It probably runs on other Unix or Unix-like platforms too.

BSD licensed
~~~~~~~~~~~~
This software is made available under the BSD license. It is free and
open source. 

Obtaining uthash
~~~~~~~~~~~~~~~~
Please follow the link to download on the 
http://uthash.sourceforge.net[uthash website].

Getting help
~~~~~~~~~~~~
You can email the author at mailto:thanson@users.sourceforge.net[] if you have
questions not answered by this document.

Keys and values
---------------

A hash is comprised of structures. Each structure represents a key-value
association. One or more fields act as the key; the structure itself is the
value.

.Defining a structure that can be hashed
----------------------------------------------------------------------
#include "uthash.h"

struct my_struct {
    int id;                    /* key */
    char name[10];             
    UT_hash_handle hh;         /* makes this structure hashable */
};
----------------------------------------------------------------------

There are no restrictions on the data type or name of the key field(s).

.Data type independence
******************************************************************************
It may be a surprise that this hash does not require your structure or your
key field to have a particular datatype.  How is it possible to implement a
type-independent hash in a strongly-typed language?  In a word, macros. All
the hash code is in the form of macros, whose arguments are typeless. 
******************************************************************************

[NOTE]
Just remember that, as with any hash, the keys have to be unique.

The UT_hash_handle field
~~~~~~~~~~~~~~~~~~~~~~~~
The `UT_hash_handle` field must be present in your structure in order to hash
it.  It is used for the internal bookkeeping that makes the hash work.  It
does not require initialization. It can be named anything, but you can
simplify matters by naming it `hh`. This allows you to use the easier
"convenience" macros to add, find and delete items.


Adding, finding, deleting and iterating
---------------------------------------

This section introduces the `HASH_ADD`, `HASH_FIND`, `HASH_DELETE`, and
`HASH_SORT` macros. This section demonstrates the macros by example. For a
more succinct listing, see <<Appendix_F,Appendix F: Macro Reference>>. 

Adding an item
~~~~~~~~~~~~~~
Your hash must be declared as a `NULL`-initialized pointer to your structure.
Then allocate and initialize your structure and call `HASH_ADD`. (Here we use
the convenience macro `HASH_ADD_INT`, which is for keys of type `int`.)

[IMPORTANT]
The hash must be initialized to `NULL`!

.Adding an item to a hash
----------------------------------------------------------------------
struct my_struct *users = NULL;    /* important! initialize to NULL */

int add_user(int user_id, char *name) {
    struct my_struct *s;

    s = malloc(sizeof(struct my_struct));
    s->id = user_id;
    strcpy(s->name, name);
    HASH_ADD_INT( users, id, s );  /* id: name of key field */
}
----------------------------------------------------------------------

The second parameter to `HASH_ADD_INT` is the 'name' of the key field. Here,
this is `id`. 

[[validc]]
.Uhh.. the field 'name' is a parameter?
*******************************************************************************
Does it look strange to you that `id`, which is the 'name of a field' in the
structure, can be passed as a standalone parameter to `HASH_ADD_INT`?  Welcome
to the world of macros. Rest assured that the C preprocessor expands this
macro to valid C code. 
*******************************************************************************

Once an item has been added to the hash, do not change the value of its key.
Instead, delete the item from the hash, change the key, and then re-add it.

[IMPORTANT]
.Key uniqueness
================================================================================
Your application must enforce key uniqueness. In other words, do not add the
same key to the hash more than once. If you do, hash behavior will be undefined. 
================================================================================

You can test whether a key already exists in the hash using `HASH_FIND`.

Finding an item 
~~~~~~~~~~~~~~~

To look up an item in a hash, you must have its key.  Then call `HASH_FIND`.
(Here we use the convenience macro `HASH_FIND_INT` for keys of type `int`).

.Looking up an item in a hash by its key
----------------------------------------------------------------------

struct my_struct *find_user(int user_id) {
    struct my_struct *s;

    HASH_FIND_INT( users, &user_id, s );  /* s: output pointer */
    return s;
}
----------------------------------------------------------------------

[NOTE]
The middle argument is a 'pointer' to the key. You can't pass a literal key
value to `HASH_FIND`. Instead assign the literal value to a variable, and pass
a pointer to the variable.

Here, `s` is the output variable of `HASH_FIND_INT`. It's a pointer to the
sought item, or `NULL` if the key wasn't found in the hash.

Deleting an item
~~~~~~~~~~~~~~~~

To delete an item from a hash, you must have a pointer to the item.

.Deleting an item from a hash
----------------------------------------------------------------------

void delete_user(struct my_struct *user) {
    HASH_DEL( users, user);  /* user: pointer to deletee */
}
----------------------------------------------------------------------

[NOTE]
Deleting an item from a hash just removes it-- it does not free it. 

Iterating and sorting 
~~~~~~~~~~~~~~~~~~~~~

You can loop over the items in the hash by starting from the beginning and
following the `hh.next` pointer.

.Iterating over all the items in a hash
----------------------------------------------------------------------

void print_users() {
    struct my_struct *s;

    for(s=users; s != NULL; s=s->hh.next) {
        printf("user id %d: name %s\n", s->id, s->name);
    }
}
----------------------------------------------------------------------

There is also an `hh.prev` pointer you could use to iterate backwards through
the hash, starting from any known item.

.A hash is also a doubly-linked list
*******************************************************************************
Iterating backward and forward through the items in the hash is possible
because of the `hh.prev` and `hh.next` fields. All the items in the hash can
be reached by repeatedly following these pointers, thus the hash is also a
doubly-linked list. 
*******************************************************************************

Sorted iteration
^^^^^^^^^^^^^^^^
The items in the hash are, by default, traversed in the order they were added
("insertion order") when you follow the `hh.next` pointer. But you can sort
the items into a new order using `HASH_SORT`. E.g.,

    HASH_SORT( users, name_sort );

The second argument is a pointer to a comparison function. It must accept two
arguments which are pointers to two items to compare. Its return value should
be less than zero, zero, or greater than zero, if the first item sorts before,
equal to, or after the second item, respectively. (Just like `strcmp`).

.Sorting the items in the hash
----------------------------------------------------------------------

int name_sort(struct my_struct *a, struct my_struct *b) {
    return strcmp(a->name,b->name);
}

int id_sort(struct my_struct *a, struct my_struct *b) {
    return (a->id - b->id);
}

void sort_by_name() {
    HASH_SORT(users, name_sort);
}

void sort_by_id() {
    HASH_SORT(users, id_sort);
}

----------------------------------------------------------------------

When the items in the hash are sorted, the first item may also change
position; so in the example above, `users` may point to a different item after
calling `HASH_SORT`.

Putting it all together: A complete Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We'll repeat all the code and embellish it with a `main()` function to form a
working example. 

If this code was placed in a file called `example.c` in the same directory as
`uthash.h`, it could be compiled and run like this:

    cc -o example example.c
    ./example

Follow the prompts to try the program, and type `Ctrl-C` when done.

.A complete program
----------------------------------------------------------------------
#include <stdio.h>   /* gets */
#include <stdlib.h>  /* atoi, malloc */
#include <string.h>  /* strcpy */
#include "uthash.h"

struct my_struct {
    int id;                    /* key */
    char name[10];             
    UT_hash_handle hh;         /* makes this structure hashable */
};

struct my_struct *users = NULL;

int add_user(int user_id, char *name) {
    struct my_struct *s;

    s = malloc(sizeof(struct my_struct));
    s->id = user_id;
    strcpy(s->name, name);
    HASH_ADD_INT( users, id, s );  /* id: name of key field */
}

struct my_struct *find_user(int user_id) {
    struct my_struct *s;

    HASH_FIND_INT( users, &user_id, s );  /* s: output pointer */
    return s;
}

void delete_user(struct my_struct *user) {
    HASH_DEL( users, user);  /* user: pointer to deletee */
}

void print_users() {
    struct my_struct *s;

    for(s=users; s != NULL; s=s->hh.next) {
        printf("user id %d: name %s\n", s->id, s->name);
    }
}

int name_sort(struct my_struct *a, struct my_struct *b) {
    return strcmp(a->name,b->name);
}

int id_sort(struct my_struct *a, struct my_struct *b) {
    return (a->id - b->id);
}

void sort_by_name() {
    HASH_SORT(users, name_sort);
}

void sort_by_id() {
    HASH_SORT(users, id_sort);
}

int main(int argc, char *argv[]) {
    char in[10];
    int id=1;
    struct my_struct *s;

    while (1) {
        printf("1. add user\n");
        printf("2. find user\n");
        printf("3. delete user\n");
        printf("4. sort items by name\n");
        printf("5. sort items by id\n");
        printf("6. print users\n");
        gets(in);
        switch(atoi(in)) {
            case 1:
                printf("name?\n");
                add_user(id++, gets(in));
                break;
            case 2:
                printf("id?\n");
                s = find_user(atoi(gets(in)));
                printf("user: %s\n", s ? s->name : "unknown");
                break;
            case 3:
                printf("id?\n");
                s = find_user(atoi(gets(in)));
                if (s) delete_user(s);
                else printf("id unknown\n");
                break;
            case 4:
                sort_by_name();
                break;
            case 5:
                sort_by_id();
                break;
            case 6:
                print_users();
                break;
        }
    }
}
----------------------------------------------------------------------

This program is included in the distribution in `tests/example.c`. You can run
`make example` in that directory to compile it easily.

Arbitrary keys
--------------

An example with a string key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is almost the same as using integer keys except the macros are named
`HASH_ADD_STR` and `HASH_FIND_STR`.

[NOTE]
.char[ ] vs. char* 
================================================================================
The string is 'within' the structure in this example-- `name` is a `char[10]`
field.  If instead our structure merely 'pointed' to the key (i.e., `name` was
declared `char *`), we'd use `HASH_ADD_KEYPTR`, described in
<<Appendix_F,Appendix F>>.
================================================================================

.A string-keyed hash
----------------------------------------------------------------------
#include <string.h>  /* strcpy */
#include <stdlib.h>  /* malloc */
#include <stdio.h>   /* printf */
#include "uthash.h"

struct my_struct {
    char name[10];             /* key */
    int id;                    
    UT_hash_handle hh;         /* makes this structure hashable */
};


int main(int argc, char *argv[]) {
    char **n, *names[] = { "joe", "bob", "betty", NULL };
    struct my_struct *s, *users = NULL;
    int i=0;

    for (n = names; *n != NULL; n++) {
        s = malloc(sizeof(struct my_struct));
        strcpy(s->name, *n);
        s->id = i++;
        HASH_ADD_STR( users, name, s );  
    }

    HASH_FIND_STR( users, "betty", s);
    if (s) printf("betty's id is %d\n", s->id);
}
----------------------------------------------------------------------

This example is included in the distribution in `tests/test15.c`. It prints:

    betty's id is 2

[NOTE]
Remember, don't change an item's key after adding it to the hash. (Instead,
delete the item from the hash, change the key and then re-add it to the hash).

An example with a multi-field, binary key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your key field can have any data type, and it can even be an aggregate of
multiple contiguous fields. 

The generalized version of the macros (e.g., `HASH_ADD`) must be used. In
addition to the usual arguments, the generalized macros require the name of
the `UT_hash_handle` field, and the key length.

.Calculating the length of a multi-field key 
*******************************************************************************
To determine the key length when using a multi-field key, you must include any
intervening structure padding the compiler adds for alignment purposes.

An easy way to calculate the key length is to use the `offsetof` macro from
`<stddef.h>`.  The formula is:

     key length =   offsetof(last_key_field) 
                  + sizeof(last_key_field) 
                  - offsetof(first_key_field)

In the example below, the `keylen` variable is set using this technique.
*******************************************************************************

.A hash with a multi-field binary key
----------------------------------------------------------------------
#include <stdlib.h>    /* malloc       */
#include <stddef.h>    /* offsetof     */
#include <stdio.h>     /* printf       */
#include <string.h>    /* memset       */
#include "uthash.h"

struct my_event {
    struct timeval tv;         /* key is aggregate of this field */ 
    char event_code;           /* and this field.                */    
    int user_id;
    UT_hash_handle hh;         /* makes this structure hashable */
};


int main(int argc, char *argv[]) {
    struct my_event *e, ev, *events = NULL;
    int i, keylen;

    keylen =   offsetof(struct my_event, event_code) + sizeof(char)                         
             - offsetof(struct my_event, tv);

    for(i = 0; i < 10; i++) {
        e = malloc(sizeof(struct my_event));
        memset(e,0,sizeof(struct my_event));
        e->tv.tv_sec = i * (60*60*24*365);          /* i years (sec)*/
        e->tv.tv_usec = 0;
        e->event_code = 'a'+(i%2);                   /* meaningless */
        e->user_id = i;

        HASH_ADD( hh, events, tv, keylen, e);
    }

    /* look for one specific event */
    memset(&ev,0,sizeof(struct my_event));
    ev.tv.tv_sec = 5 * (60*60*24*365);          
    ev.tv.tv_usec = 0;
    ev.event_code = 'b';
    HASH_FIND( hh, events, &ev.tv, keylen, e );
    if (e) printf("found: user %d, unix time %ld\n", e->user_id, e->tv.tv_sec);
}
----------------------------------------------------------------------

This example is included in the distribution in `tests/test16.c`.

.Zero-fill a multi-field key before use
*******************************************************************************
Notice the use of `memset` in the example above to initialize the structure 
by zero-filling it.  This might seem unnecessary, since the next lines
explicitly set each field. However, the `memset` is needed, because it zeroes
out any 'padding' that the compiler has inserted into the structure. 

If we didn't zero-fill the structure, padding between the fields that comprise
the key would be initialized with random values. This would make it impossible
to lookup items by their key reliably. (The multi-field key comprises the
fields and their intervening padding).
*******************************************************************************


Participating in multiple hashes
--------------------------------

A structure can be in multiple hashes simultaneously.  For example, you might
want to hash a structure on both an ID field and on a unique username.

.A structure with two keys
----------------------------------------------------------------------
struct my_struct {
    int id;                    /* key 1 */
    char username[10];         /* key 2 */
    UT_hash_handle hh1,hh2;    /* makes this structure hashable */
};
----------------------------------------------------------------------

In the example above, the structure can now be added to two separate hashes;
in one hash, `id` is its key, while in the other hash, `username` is the key.
(There is no requirement that the two hashes have different key fields. They
could both use the same key, such as `id`).

[NOTE]
You need to have a separate `UT_hash_handle` for each hash that your structure
will participate in simultaneously. These are `hh1` and `hh2` in this example.

.A structure participating in two hashes
----------------------------------------------------------------------
    struct my_struct *users_by_id = NULL, *users_by_name = NULL, *s;
    int i;
    char *name;

    s = malloc(sizeof(struct my_struct));
    s->id = 1;
    strcpy(s->username, "thanson");

    HASH_ADD(hh1, users_by_id, id, sizeof(int), s);
    HASH_ADD(hh2, users_by_name, username, strlen(s->username), s);

    /* lookup user by ID */
    i=1;
    HASH_FIND(hh1, users_by_id, &i, sizeof(int), s);
    if (s) printf("found id %d: %s\n", i, s->username);

    /* lookup user by username */
    name = "thanson";
    HASH_FIND(hh2, users_by_name, name, strlen(name), s);
    if (s) printf("found user %s: %d\n", name, s->id);
----------------------------------------------------------------------

When participating in two or more hashes simultaneously, always use the same
key field with the same hash handle.  E.g., in this example, `hh1` is always
used with the `id` key, and `hh2` is always used with the `username` key.

Threaded programs
-----------------
You can use this hash in a threaded program. But, the locking is up to you.
You must protect the hash against concurrent modification. For every hash that
you create in a threaded program, you should create a lock to protect it.

Concurrent hash lookups (`HASH_FIND`) are permitted, but modifications
(`HASH_ADD`, `HASH_DEL`, and `HASH_SORT`) cannot be concurrent. Therefore, you
can use a read/write lock which provides concurrent-read/exclusive-write
semantics. Or you can use a mutex-style lock to prevent any concurrent access
to the hash, but this is less efficient.

[[Appendix_A]]
Appendix A: Choosing the hash function
--------------------------------------
Internally this software uses a particular hash function, such as Bernstein's
hash, to transform a key to a bucket number. 

.Bernstein hash function
-----------------------------------------------------------------
#define HASH_BER(key,keylen,num_bkts,bkt,i,j,k)                        \
  bkt = 0;                                                             \
  while (keylen--)  bkt = (bkt * 33) + *key++;                         \
  bkt &= (num_bkts-1);          
-----------------------------------------------------------------

Several such hash functions are built-in. The default is Jenkin's hash, but
your key domain may get better distribution with another function.  You can
use a specific hash function by compiling your program with
`-DHASH_FUNCTION=HASH_xyz` where `xyz` is one of the symbolic names listed
below. E.g., 

    cc -DHASH_FUNCTION=HASH_OAT -o program program.c

.Built-in hash functions
`---------`---------------
Symbol     Name    
--------------------------
JEN        Jenkins (default)
BER        Bernstein
SAX        Shift-Add-Xor
OAT        One-at-a-time
FNV        Fowler/Noll/Vo
JSW        Julienne Walker
--------------------------

Which hash function is best?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can easily determine the best hash function for your key domain. To do so,
you'll need to run your program once in a data-collection pass, and then run
the collected data through an included analysis utility.

First you must build the analysis utility. From the top-level directory,

    cd tests/
    make

We'll use `test14.c` to demonstrate the data-collection and analysis steps
(using `sh` syntax):

    cc -DHASH_EMIT_KEYS=3 -I../src -o test14 test14.c
    ./test14 3>test14.keys
    ./keystats test14.keys

The numeric value of `HASH_EMIT_KEYS` is a file descriptor. Any file descriptor
that your program doesn't use for its own purposes can be used instead of 3. 

[NOTE]
The data-collection mode enabled by `-DHASH_EMIT_KEYS=x` should not be used in
production code. 

The result of running the `keystats` command should look like this:

--------------------------------------------------------------------------------
fcn     hash_q     #items   #buckets      #dups      flags   add_usec  find_usec
--- ---------- ---------- ---------- ---------- ---------- ---------- ----------
BER   1.000000       1219        512          0         ok       1206        550
FNV   1.000000       1219        512          0         ok       1734        660
JEN   1.000000       1219        512          0         ok       2008        845
OAT   0.998897       1219        512          0         ok       1941        833
SAX   0.997514       1219       1024          0         ok       2217        674
JSW   0.994652       1219        512          0         ok       1369        608
--------------------------------------------------------------------------------

Usually, you should just pick the first hash function that is listed. Here, this
is `BER`.  This is the function that provides the most even distribution for
your keys. If several have the same `hash_q`, then choose the fastest one
according to the `find_usec` column.

[TIP]
Sometimes you might pick the fastest one even though its `hash_q` isn't the
best. This is ok, particularly if its substantially faster (ideally hash
functions have negligible performance overhead but in reality they can vary).

keystats column reference
~~~~~~~~~~~~~~~~~~~~~~~~~
fcn::
    symbolic name of hash function
hash_q::
    a number between 0 and 1 that measures how evenly the items are
    distributed among the buckets (1 is best)
#items::
    the number of keys that were read in from the emitted key file
#buckets::
    the number of buckets in the hash after all the keys were added
#dups::
    the number of duplicate keys encounted in the emitted key file. Duplicates
    keys are filtered out to maintain key uniqueness. (Duplicates are normal.
    For example, if the application adds an item to a hash, deletes it, then
    re-adds it, the key is written twice to the emitted file.)  
flags::
    this is either `ok`, or `noexpand` if the 'expansion inhibited' flag is
    set, described in Appendix C.  It is not recommended to use a hash
    function that has the `noexpand` flag set.
add_usec::
    the clock time in microseconds required to add all the keys to a hash
find_usec::
    the clock time in microseconds required to look up every key in the hash

Appendix B: Statistic hash_q
----------------------------
The hash_q statistic measures how evenly the keys are distributed among the
buckets.  It is a value between 0 (worst) and 1 (best).  You can access it
in this way, if you're interested in observing how well your hash keys hold up
to the ideal distribution.

.Observing the hash_q statistic
----------------------------------------------------------------------

/* using the example "users" hash from previous listings */

void print_hashq() {
    if (users) printf("hash_q: %f\n", users->hh.tbl->hash_q);
}
----------------------------------------------------------------------

The hash_q statistic is updated whenever bucket expansion occurs 
(see <<Appendix_C,Appendix C>>). 

.What is hash_q?
*****************************************************************************
The 'n' items in a hash are distributed into 'k' buckets. Ideally each bucket
would contain an equal share '(n/k)' of the items. (I.e., the maximum linear
position of any item within a bucket would be 'n/k'.) The hash_q statistic
measures the fraction of items whose linear position is less than or equal to
the maximum ideal position 'n/k'. 

Said another way, 'hash_q' is the fraction of items which can be found using
no more than the ideal number of steps, given the number of items and buckets.
*****************************************************************************

A low value of `hash_q` may lead to reduced lookup (`HASH_FIND`) performance.
You can change the hash function to one that distributes your keys more
evenly; see <<Appendix_A,Appendix A: Choosing the hash function>>. 


[[Appendix_C]]
Appendix C: Bucket expansion
----------------------------
Internally this hash manages the number of buckets, with the goal of having
enough buckets so that each one contains only small number of items.  

.Why does the number of buckets matter?
*****************************************************************************
When looking up an item by its key, this hash scans linearly through the items
in the appropriate bucket. In order for the linear scan to run in constant
time, the number of items in each bucket must be bounded. This is accomplished
by increasing the number of buckets as needed.
*****************************************************************************

This hash attempts to keep fewer than 10 items in each bucket. When an item is
added that would cause a bucket to exceed this number, the number of buckets in
the hash is doubled and the items are redistributed. 

[NOTE]
Bucket expansion occurs automatically and invisibly as needed. There is
no need for the application to know when it occurs. 

Bucket expansion hook
~~~~~~~~~~~~~~~~~~~~~
There is a hook for this event, primarily for uthash development. 

.Bucket expansion hook
----------------------------------------------------------------------------
#include "uthash.h"

#undef uthash_expand_fyi 
#define uthash_expand_fyi(tbl) printf("expanded to %d buckets\n", tbl->num_buckets)

...
----------------------------------------------------------------------------

During bucket expansion, if the number of items in a bucket exceeds the ideal
number, then the expansion threshold for that bucket will be increased
(instead of being 10, it will be a multiple of 10). This prevents excessive
bucket expansion for hash functions that tend to over-utilize a few buckets
yet have good distribution overall.

Bucket expansion inhibited hook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some key domains may not yield a good bucket distribution using the default
(or manually-specified) hash function. 

If this occurs, statistic `hash_q` reflects the uneven distribution by tending
closer to 0 than 1.  This means that items are piling up in some buckets while
other buckets are under-utilized.  Bucket expansion then takes place as this
hash tries to keep a bounded number of items in each bucket.

However, if items continue to be distributed unevenly after repeated bucket
expansion, there is a point at which this hash in effect says, '"This key
domain is really incompatible with this hash function.  I could spend all day
expanding the buckets but it isn't helping to spread out the items in the
hash, so I'm going to cut my losses and stop expanding the buckets."'

Inhibition criteria 
^^^^^^^^^^^^^^^^^^^
The decision to inhibit expansion is made by a heuristic rule: if two
consecutive bucket expansions yield `hash_q` values below 0.5, then the
'bucket expansion inhibited' flag is set for this hash. 

Because this condition would cause `HASH_FIND` lookups to exhibit worse than
constant-time performance, your application can choose to be made aware if it
occurs.  (Usually you don't need to worry about this, if you used the
`keystats` utility during development to select a good hash for your keys).

.Bucket expansion inhibited hook
----------------------------------------------------------------------------
#include "uthash.h"

#undef uthash_noexpand_fyi
#define uthash_noexpand_fyi printf("warning: bucket expansion inhibited\n");

...
----------------------------------------------------------------------------

Once set, the 'bucket expansion inhibited' flag remains in effect as long as
the hash has items in it.


Appendix D: Hooks for memory allocation
----------------------------------------
By default this hash implementation uses `malloc` and `free` to manage memory.
If your application uses its own custom allocator, this hash can use them too.


.Specifying alternate memory management functions
----------------------------------------------------------------------------
#include "uthash.h"


/* undefine the defaults */
#undef uthash_bkt_malloc
#undef uthash_bkt_free
#undef uthash_tbl_malloc
#undef uthash_tbl_free

/* re-define, specifying alternate functions */
#define uthash_bkt_malloc(sz) my_malloc(sz)  /* for UT_hash_bucket */
#define uthash_bkt_free(ptr) my_free(ptr)    
#define uthash_tbl_malloc(sz) my_malloc(sz)  /* for UT_hash_table  */
#define uthash_tbl_free(ptr) my_free(ptr)    

...
----------------------------------------------------------------------------

.Why are there two pairs of malloc/free functions?
*******************************************************************************
The first pair allocates and frees `UT_hash_bucket` structures,
while the second pair deals with `UT_hash_table` structures. While
they don't 'need' to have separate allocation/free functions (and indeed the
default is just to use `malloc` and `free` anyway), having them separate
permits easy integration with pool-type allocators. (In these allocators,
there is a pool for each data structure.)
*******************************************************************************

Out of memory handling
~~~~~~~~~~~~~~~~~~~~~~
If memory allocation fails (i.e., the malloc function returned `NULL`), the
default behavior is to terminate the process by calling `exit(-1)`. This can
be modified by re-defining the `uthash_fatal` macro.
 
    #undef uthash_fatal
    #define uthash_fatal(msg) my_fatal_function(msg);

The fatal function should terminate the process; uthash does not support
"returning a failure" if memory cannot be allocated.

Appendix E: Internal implementation notes
-----------------------------------------

Application and bucket order
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `UT_hash_handle` data structure includes `next`, `prev`, `hh_next` and
`hh_prev` fields.  The former two fields determine the "application" ordering
(that is, iteration order that corresponds to the order the items were added).
The latter two fields determine the "bucket" order.  These link the
`UT_hash_handles` together in a doubly-linked list that defines a bucket.

Internal consistency check with -DHASH_DEBUG=1 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If a program that uses this hash is compiled with `-DHASH_DEBUG=1`, a special
internal consistency-checking mode is activated.  In this mode, the integrity
of the whole hash is checked following every add or delete operation.

[NOTE]
This is for debugging the uthash software only, not for use in production code. 

In this mode, any internal errors in the hash data structure will cause a
message to be printed to `stderr` and the program to exit.

.Checks performed in `-DHASH_DEBUG=1` mode:
- the hash is walked in its entirety twice: once in 'bucket' order and a
  second time in 'application' order
- the total number of items encountered in both walks is checked against the
  stored number
- during the walk in 'bucket' order, each item's `hh_prev` pointer is compared
  for equality with the last visited item
- during the walk in 'application' order, each item's `prev` pointer is compared
  for equality with the last visited item

In the `tests/` directory, running `make debug` will run all the tests in
this mode.

[[Appendix_F]]
Appendix F: Macro reference
---------------------------

Generalized macros
~~~~~~~~~~~~~~~~~~

These macros add, find, delete and sort the items in a hash.  

.Generalized macros
`---------------`-----------------------------------------------------------
macro           arguments
----------------------------------------------------------------------------
HASH_ADD        (hh_name, head, keyfield_name, key_len, item_ptr)
HASH_ADD_KEYPTR (hh_name, head, key_ptr, key_len, item_ptr)
HASH_FIND       (hh_name, head, key_ptr, key_len, item_ptr)
HASH_DELETE     (hh_name, head, item_ptr)
HASH_SORT       (head, cmp)
----------------------------------------------------------------------------

[NOTE]
`HASH_ADD_KEYPTR` is used when the structure contains a pointer to the
key, rather than the key itself. 

Convenience macros
~~~~~~~~~~~~~~~~~~
The convenience macros do the same thing as the generalized macros, but
require fewer arguments.  They have key and naming restrictions (see below).

.Convenience macros
`---------------`-----------------------------------------------------------
macro           arguments
----------------------------------------------------------------------------
HASH_ADD_INT    (head, keyfield_name, item_ptr)
HASH_FIND_INT   (head, key_ptr, item_ptr)
HASH_ADD_STR    (head, keyfield_name, item_ptr)
HASH_FIND_STR   (head, key_ptr, item_ptr)
HASH_DEL        (head, item_ptr)
----------------------------------------------------------------------------

Key and naming restrictions for convenience macros
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to use the convenience macros, 

1. the structure's `UT_hash_handle` field must be named `hh`, and
2. the key field must be of type `int` or `char[]`

`HASH_DEL` does not require the second condition.


Argument descriptions
~~~~~~~~~~~~~~~~~~~~~
hh_name::
    name of the `UT_hash_handle` field in the structure. Conventionally called
    `hh`.
head::
    the structure pointer variable which acts as the "head" of the hash. So
    named because it points to the first item which is added to the hash.
keyfield_name::
    the name of the key field in the structure. (In the case of a multi-field
    key, this is the first field of the key). If you're new to macros, it
    might seem strange to pass the name of a field as a parameter. See
    <<validc,note>>.
key_len::
    the length of the key field in bytes. E.g. for an integer key, this is
    `sizeof(int)`, while for a string key it's `strlen(key)`. (For a
    multi-field key, see the notes in this guide on calculating key length).
key_ptr::
    for `HASH_FIND`, this is a pointer to the key to look up in the hash
    (since it's a pointer, you can't directly pass a literal value here). For
    `HASH_ADD_KEYPTR`, this is the address of the key of the item being added.
item_ptr::
    pointer to the structure being added, deleted or looked up. This is an
    input parameter for HASH_ADD and HASH_DELETE macros, and an output
    parameter for HASH_FIND.
cmp::
    pointer to comparison function which accepts two arguments (pointers to
    items to compare) and returns an int specifying whether the first item
    should sort before, equal to, or after the second item (like `strcmp`).
