
This is part of the Objectify project documentation.
Copyright (C) 2009   J. Scott Edwards
See the file README for copying conditions.


This document describes the object IDs and how they have evolved.

My first plan was for an object ID to be 8 bytes long.  Then I decided that
every object needed to have a "context", so I added an 8-byte context to go
with the ID.  Each identifier was then 16 bytes: an 8-byte context and an
8-byte identifier.  You can see these in the early C structure for
references.  I believe this was the format for the Alpha_01 and Alpha_02
releases.

Then a few months ago, in the Alpha_03 version (29-Dec-2005), I cut the
identifiers down to just 4 bytes.  The logic there was that even with the
largest drives available today you couldn't have more than 4 billion objects on
one hard drive.  At that time I was not thinking in terms of a global database.
So versions Alpha_03 through Alpha_05 used the 32-bit identifiers.

It was then that I started thinking about public data versus private data and
decided that 32 bits was woefully inadequate for public data.  The following
text was in the "TODO" file and goes through my thought process for the size
of the references:

 * Perhaps I should go back to longer IDs if there is a fixed range of system
   and public objects.  Don't want to repeat the Y2K fiasco!  One more byte
   should be enough, but 5 bytes is so uneven.  I'm sure six would be enough.
   If we just used the ones with the upper byte zero that would be more than
   a trillion public objects.  If we used everything that had the upper
   nibble zero, it would be 16 trillion, leaving 240 trillion for non-public
   uses.  Better yet, just the upper 2 bits are zero that would be 64 trillion
   for the system and 192 trillion for non-public.  Thinking more about this
   perhaps we should reverse it and have 192 trillion for system/public
   objects and 64 trillion for private uses.  This actually makes more sense
   because the system/public objects are global and have to cover everything.
   Whereas the private objects can be duplicated, i.e. each system can reuse
   them.  So going back to the upper nibble, perhaps everything with 0 in the
   upper nibble should be local to the system and everything non-zero belongs
   to the system.  Of course then void (all zeros) would still be reserved
   or we could use all 1's as the void object....  And all zeros could be the
   root object on the local system....  hmmm.  But then all zeros could still
   be void, 0000000001 the local root object and all 1's the system root
   object.  No I think I'll start counting the system/public objects at
   100000000000.  Maybe I should go back to 64 bits.  That way I can't see
   how we could ever run out of objects...  ok never mind I'm sure it will
   happen but not in my lifetime ;-)  So:

     0000 0000 0000 0000 - void
     0000 0000 0000 0001 - local system root
     1000 0000 0000 0000 - system / public root

   All system/public objects are unencrypted.  How do we know if a private/
   local object is encrypted or not?  Maybe all private objects should be
   encrypted...?

   Ok, after pondering this awhile, I have changed my mind again about the
   size of the reference.  While 240 trillion things seems like a lot, I have
   been thinking about all the bits of information there are in the world.  I
   have concluded that it is possible, almost likely, that there are or will be
   240 trillion things at some point in the future.  So I considered going to
   12 bytes per reference, which I have no doubt would be enough.  But then
   is there any reason to not just go to 16 bytes and make it an even binary
   number?  In reality most objects are going to be under 512 bytes, so we
   aren't going to lose any disk space since every object sucks up 512 bytes
   at a minimum anyway.  Now the question is whether to go back to the base32
   thing for the file names?  No it was too messy and that was 64 bits anyway.
   But 32 character file names could be unwieldy.  The best thing would be to
   abandon file storage altogether, but that is another (big) project.  For
   now what to do?  We could go to some funny scheme to use other characters
   in file names: (A-Z + a-z + 0-9) gives 26+26+10=62, and if we find another
   two innocuous characters then we have 6 bits per character, so 128/6 =
   21r2, i.e. 22 characters per name.  Then on the other
   hand many file systems can't cope with millions of files in one directory
   anyway.  Probably should go to plan B at this point, which was to break the
   file name up into directories, i.e. object 11112222333344445555666677778888
   is stored at path: /obj/1111/2222/3333/4444/5555/6666/7777/8888.  The void
   reference is all zeros.  All objects with FFFF in the upper 16 bits
   are private objects.  Eight levels of directories seem kind of unwieldy
   though.  You could use 4 levels /obj/11112222/33334444/55556666/77778888,
   but then you could easily exceed the directory limits.  In reality this
   could be adjusted on a per system basis.  For now private objects will
   be random within the lower 32 bits only:
   /obj/FFFFFFFF/FFFFFFFF/FFFFFFFF/XXXXXXXX.
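
The sizes quoted in the note above can be checked with a few lines of C.  This
is only a sketch: the function names are made up for illustration and are not
part of the Objectify code.  It assumes, as the figures in the note imply, that
a "trillion" means 2^40, so a 6-byte (48-bit) identifier space holds 256 of
them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of identifiers whose top `zero_bits` bits are
 * all zero, for an identifier that is `id_bits` wide. */
uint64_t ids_with_upper_bits_zero(unsigned id_bits, unsigned zero_bits)
{
    /* The shift below is only defined for results that fit in 64 bits. */
    assert(zero_bits <= id_bits && id_bits - zero_bits < 64);
    return UINT64_C(1) << (id_bits - zero_bits);
}

/* Hypothetical helper: express a count in units of 2^40 (assumed here to be
 * what the note calls a "trillion"). */
uint64_t in_binary_trillions(uint64_t count)
{
    return count >> 40;
}

/* Hypothetical helper: characters needed to write `bits` bits when each
 * character carries `bits_per_char` bits, rounding up. */
unsigned chars_needed(unsigned bits, unsigned bits_per_char)
{
    assert(bits_per_char > 0);
    return (bits + bits_per_char - 1) / bits_per_char;
}
```

With these, ids_with_upper_bits_zero(48, 4) comes to 16 of the 256 binary
trillions, leaving 240, and ids_with_upper_bits_zero(48, 2) comes to 64,
leaving 192.  chars_needed(128, 6) is 22, against 32 characters for plain hex.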
 
This last form, a 16-byte identifier with the upper three 32-bit words
defining the domain, was used for all of the "big bang" releases (Alpha_06
through Alpha_13).
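
The eight-level "plan B" layout from the TODO note can be sketched in C as
follows.  The type and function names here are hypothetical and are not taken
from the Objectify source.  The sketch assumes the identifier is stored most
significant byte first and that each directory level is four upper-case hex
digits, as in the /obj/1111/2222/... example.  Since the note says the number
of levels could be adjusted per system, treat the fixed eight levels as an
assumption too.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: a 16-byte identifier, most significant byte first. */
typedef struct {
    uint8_t bytes[16];
} obj_id;

/* "/obj" plus eight "/XXXX" components plus the terminating NUL. */
#define OBJ_PATH_SIZE ((size_t)(4 + 8 * 5 + 1))

/* Write the eight-level path for `id` into `buf`.
 * Returns 0 on success, -1 if `size` is smaller than OBJ_PATH_SIZE. */
int obj_id_to_path(const obj_id *id, char *buf, size_t size)
{
    assert(id != NULL && buf != NULL);
    if (size < OBJ_PATH_SIZE)
        return -1;

    int n = snprintf(buf, size, "/obj");
    /* Each directory level covers 16 bits, i.e. two bytes of the ID. */
    for (int i = 0; i < 16; i += 2)
        n += snprintf(buf + n, size - (size_t)n, "/%02X%02X",
                      (unsigned)id->bytes[i], (unsigned)id->bytes[i + 1]);
    return 0;
}

/* The void reference is all zeros. */
int obj_id_is_void(const obj_id *id)
{
    static const obj_id zero = {{0}};
    return memcmp(id->bytes, zero.bytes, sizeof zero.bytes) == 0;
}

/* Private objects have FFFF in the upper 16 bits. */
int obj_id_is_private(const obj_id *id)
{
    return id->bytes[0] == 0xFF && id->bytes[1] == 0xFF;
}
```

Calling obj_id_to_path on the identifier 11112222333344445555666677778888
yields /obj/1111/2222/3333/4444/5555/6666/7777/8888, matching the example in
the note.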

At that point I decided that it would be better to incorporate the class
into the identifier, so that one could tell from the ID what class the object
was.  I also toyed around with the idea of making the identifier variable
length.  See my handwritten notes from 25-Feb-2006 to 28-Feb-2006 (which I
hope to post on the NWOS project web page on SourceForge at some point).

I have now decided to go back to the previous plan of fixed-length 16-byte
identifiers, with the domains but not the classes in the ID.  While the
variable-length identifiers might save some space, they are so much grief in
the code that I don't think it's worth it.  This is similar to the debate over
the variable-length opcodes used in CISC processors (like the x86) and the
fixed-length opcodes in RISC processors.  I have decided to opt for the
simplest solution even if it uses more space.

Also I have decided to NOT include the class in the ID.  I decided this added
complexity as well, and it makes a real mess of the private objects that have
random identifiers (for security).

I do think, however, that I'm going to keep the directory structure idea that
I had for the variable-length IDs and use it with the fixed-length IDs.

