Random thoughts about native code generation, which will be compatible 
with the already existing (non-host-specific) dyntrans core.


How to keep track of the number of times a basic block is executed? 
(Perhaps needed, since unnecessary native code generation may slow things 
down. Only the blocks that are really common need to be natively 
translated.)

Perhaps having a small additional array per page is a solution?
	unsigned char count[NR_OF_IC_ENTRIES_PER_PAGE];
For a typical MIPS cpu, that would be 1024 bytes extra per page.
The main loop could be changed to increment count, and if count exceeds
a certain threshold, the block is natively translated. Hm.
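
A minimal sketch of the counter idea, not the actual dyntrans code. The
value of NR_OF_IC_ENTRIES_PER_PAGE, the NATIVE_THRESHOLD constant, the
page_counts struct and block_became_hot() are all assumptions made up
for illustration. The counter saturates at the threshold, so a block is
reported as hot only once and the unsigned char never wraps around.

```c
#include <assert.h>

/* Assumed value for illustration; the real constant is per-architecture. */
#define NR_OF_IC_ENTRIES_PER_PAGE	1024
/* Hypothetical threshold; it must fit in an unsigned char. */
#define NATIVE_THRESHOLD		200

/* Hypothetical container for the extra per-page array. */
struct page_counts {
	unsigned char count[NR_OF_IC_ENTRIES_PER_PAGE];
};

/*
 * Bump the counter for the block starting at ic_index.
 * Returns 1 exactly once, on the call where the counter reaches the
 * threshold, and 0 on every other call. The counter stops at the
 * threshold instead of wrapping.
 */
static int block_became_hot(struct page_counts *p, int ic_index)
{
	assert(ic_index >= 0 && ic_index < NR_OF_IC_ENTRIES_PER_PAGE);

	if (p->count[ic_index] >= NATIVE_THRESHOLD)
		return 0;	/* already reported as hot */

	return ++p->count[ic_index] == NATIVE_THRESHOLD;
}
```

The main loop would call block_became_hot() when entering a block and
start native translation when it returns 1. The cost is one load, one
compare and (usually) one store per block entry.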

Or perhaps the overhead of implementing this counter check is more than it 
is worth? After all, most of the time will be spent executing (some of) 
the translated loops.

-------------------------------------

At most one [basic] block is ever translated at any given time.
A small array can hold the INR (intermediate native representation)
entries, and a small memory area can hold a doubly-linked list of native
instruction entries.

Simple instructions:

32-bit MIPS:
	andi $5,$5,0xff00
	ori $5,$5,0x0011

Intermediate native representation:
	AND_REG32PTR_REG32PTR_IMM16 (offset to reg 5, offset to reg 5, 0xff00)
	OR_REG32PTR_REG32PTR_IMM16 (offset to reg 5, offset to reg 5, 0x0011)

Non-peephole-optimized x86[_64] code:  (esi = struct cpu *)
	mov eax, [esi + offset_to_source_reg]
	and eax, 0xff00
	mov [esi + offset_to_destination_reg], eax	(#1)
	mov eax, [esi + offset_to_source_reg]		(#2)
	or eax, 0x0011
	mov [esi + offset_to_destination_reg], eax

Peephole-optimized x86[_64] code:
(On the first pass, #2 is removed, since it loads back the value that #1
just wrote; source and destination are both reg 5 here, so the value is
already in eax.)
(On the second pass, the store at #1 is removed, since a later store
overwrites the same register before anything reads it.)
	mov eax, [esi + offset_to_source_reg]
	and eax, 0xff00
	or eax, 0x0011
	mov [esi + offset_to_destination_reg], eax
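
The two passes above can be sketched roughly as follows. This is only an
illustration under assumptions: struct native_op, the op_kind values and
all function names are hypothetical, a flat array is used instead of a
linked list to keep it short, and the passes ignore anything that could
leave the block early (exceptions etc.), which a real implementation
would have to consider.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical native pseudo-ops, just enough for the example above. */
enum op_kind { OP_LOAD, OP_STORE, OP_AND_IMM, OP_OR_IMM };

struct native_op {
	enum op_kind kind;
	int reg;	/* host register number */
	int arg;	/* struct cpu offset for LOAD/STORE, else immediate */
};

/* Delete ops[i], shifting the rest down. Returns the new count. */
static size_t remove_op(struct native_op *ops, size_t n, size_t i)
{
	assert(i < n);
	for (; i + 1 < n; i++)
		ops[i] = ops[i + 1];
	return n - 1;
}

/* Pass 1: drop a load that directly follows a store of the same
   host register to the same offset. */
static size_t remove_reloads(struct native_op *ops, size_t n)
{
	size_t i = 1;
	while (i < n) {
		if (ops[i].kind == OP_LOAD && ops[i - 1].kind == OP_STORE &&
		    ops[i].reg == ops[i - 1].reg &&
		    ops[i].arg == ops[i - 1].arg)
			n = remove_op(ops, n, i);
		else
			i++;
	}
	return n;
}

/* A store is dead if a later store hits the same offset before any
   load from that offset. */
static int store_is_dead(const struct native_op *ops, size_t n, size_t i)
{
	size_t j;
	for (j = i + 1; j < n; j++) {
		if (ops[j].kind == OP_LOAD && ops[j].arg == ops[i].arg)
			return 0;
		if (ops[j].kind == OP_STORE && ops[j].arg == ops[i].arg)
			return 1;
	}
	return 0;	/* last store to this offset: must be kept */
}

/* Pass 2: drop dead stores. */
static size_t remove_dead_stores(struct native_op *ops, size_t n)
{
	size_t i = 0;
	while (i < n) {
		if (ops[i].kind == OP_STORE && store_is_dead(ops, n, i))
			n = remove_op(ops, n, i);
		else
			i++;
	}
	return n;
}

/* Run both passes in the order described above. */
static size_t peephole(struct native_op *ops, size_t n)
{
	return remove_dead_stores(ops, remove_reloads(ops, n));
}
```

Fed the six non-optimized ops from the andi/ori example (with eax as
register 0 and any fixed offset for reg 5), peephole() leaves the four
ops shown in the optimized listing.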

Native code entry:
	(none on x86_64)

Native code exit:
	ret[q]

---------------------------

Update of nr-of-executed-instructions and the IC pointer:

	All possible return paths need to update the following:

	x) The nr-of-executed-instructions count (one less than the
	   number of instructions in the translated block, since an
	   implicit count of 1 is already included).
	x) The next_ic pointer, and also the cur_page if we have
	   switched page.
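
What each return path has to do, written out in C as a sketch. In the
generated code this would be emitted as native instructions before each
ret; struct cpu_sketch, its field names and native_block_exit() are
hypothetical stand-ins, not the real struct cpu layout.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real dyntrans structures. */
struct ic_entry {
	void (*f)(void);
};

struct cpu_sketch {
	long long n_executed;		/* nr-of-executed-instructions */
	struct ic_entry *cur_page;	/* first IC entry of current page */
	struct ic_entry *next_ic;	/* where the main loop continues */
};

/*
 * Bookkeeping for one exit from a native block.
 * instrs_in_block is the number of emulated instructions that were
 * executed on this path. One of them is already accounted for by the
 * main loop, hence the "- 1".
 */
static void native_block_exit(struct cpu_sketch *cpu, int instrs_in_block,
	struct ic_entry *page, int index_in_page)
{
	assert(instrs_in_block >= 1);

	cpu->n_executed += instrs_in_block - 1;

	if (page != cpu->cur_page)
		cpu->cur_page = page;	/* we have switched page */

	cpu->next_ic = &page[index_in_page];
}
```

Since instrs_in_block and the target are known at translation time for
most exits, the emitted code usually reduces to an add of a constant and
one or two stores.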

-----------------------------

Stages during translation:

	Stage 1:
		Emulated ISA (e.g. MIPS) to INR instructions.
		Each emulated instruction may be turned into 0 or
		more INR instructions.
		This is done in e.g. src/cpus/cpu_mips_instr.c
		using semi-magic macros.
		The INR array is a fixed size small array, pointed
		to by the cpu struct.

	Stage 2:
		INR -> native operations (e.g. x86).
		This is done in src/native/native_x86.c.
		Things to think about are round-robin use of
		temporary registers.
		native_inr_to_native_ops() takes a cpu as input
		and translates the current INR entries into native
		pseudo-opcodes.

	Stage 3:
		Optimization, native ops -> native ops.
		This is done in src/native/native_x86_optim.c,
		and is an optional step. It should be possible
		to turn this step off, for debugging.
		If e.g. a value is in a register, and it is stored
		to memory, then the same memory position does not
		have to be read back; the value is already in a
		register.

	Stage 4:
		Code generation, native ops -> native machine code.
		Done in src/native/native_x86_gen.c.

	Stage 5:
		Patch _older_ code chunks so that they can branch
		directly to the new chunk, if possible.
		An optional step.

	Stage 6:
		Enter the newly generated native code chunk into
		the physpage's ic->f.
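
A small sketch of how the stage order and the two optional steps could
be expressed. Everything here (enum stage, struct translate_options,
plan_stages()) is hypothetical; it only shows that stages 3 and 5 can be
switched off without affecting the rest of the sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the six stages listed above. */
enum stage {
	STAGE_TO_INR = 1,	/* emulated ISA -> INR */
	STAGE_TO_NATIVE_OPS,	/* INR -> native ops */
	STAGE_OPTIMIZE,		/* native ops -> native ops (optional) */
	STAGE_GENERATE,		/* native ops -> machine code */
	STAGE_PATCH_OLDER,	/* patch older chunks (optional) */
	STAGE_INSTALL		/* enter chunk into ic->f */
};

#define MAX_STAGES	6

/* Hypothetical switches, e.g. for debugging. */
struct translate_options {
	int optimize;		/* run stage 3? */
	int patch_older;	/* run stage 5? */
};

/*
 * Fill out[] with the stages to run, in order.
 * Returns the number of stages written (4 to 6).
 */
static size_t plan_stages(const struct translate_options *opt,
	enum stage out[MAX_STAGES])
{
	size_t n = 0;

	assert(opt != NULL);

	out[n++] = STAGE_TO_INR;
	out[n++] = STAGE_TO_NATIVE_OPS;
	if (opt->optimize)
		out[n++] = STAGE_OPTIMIZE;
	out[n++] = STAGE_GENERATE;
	if (opt->patch_older)
		out[n++] = STAGE_PATCH_OLDER;
	out[n++] = STAGE_INSTALL;

	return n;
}
```

With both options cleared, the plan is stages 1, 2, 4 and 6, which is the
shortest path from emulated instructions to an installed native chunk.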
