Make a good layout detector, so that every character is assigned to its own text line.
Separate more of the merged (touching) characters. (Not so easy.)
Deal with non-text elements: frames, lines, pictures, etc.
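
As an illustration of the first item, here is a minimal sketch of one possible way to put every character on its line. It is not the program's actual method. It assumes character bounding boxes are already available, and the names Box and group_into_lines are hypothetical. A box joins an existing line when its vertical center falls inside that line's vertical extent; otherwise it starts a new line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one character (hypothetical type; y grows downward)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def vcenter(self) -> float:
        return (self.top + self.bottom) / 2


def group_into_lines(boxes):
    """Assign each box to exactly one text line.

    Returns a list of lines ordered top to bottom; each line is a list
    of boxes ordered left to right.
    """
    lines = []  # each entry: [top, bottom, [boxes]]
    for box in sorted(boxes, key=lambda b: (b.top, b.left)):
        for line in lines:
            # Join the first line whose vertical extent contains the center.
            if line[0] <= box.vcenter <= line[1]:
                line[0] = min(line[0], box.top)
                line[1] = max(line[1], box.bottom)
                line[2].append(box)
                break
        else:
            # No existing line fits, so this box starts a new one.
            lines.append([box.top, box.bottom, [box]])
    lines.sort(key=lambda entry: entry[0])
    return [sorted(entry[2], key=lambda b: b.left) for entry in lines]


if __name__ == "__main__":
    sample = [
        Box(0, 10, 5, 20), Box(8, 12, 13, 20), Box(16, 5, 21, 20),
        Box(0, 40, 5, 50), Box(8, 42, 13, 50),
    ]
    for number, line in enumerate(group_into_lines(sample), start=1):
        print(number, [b.left for b in line])
```

This simple rule would likely fail on skewed pages, multi-column layouts, and small marks such as dots and accents that sit well above or below the line body, which is part of why a good layout detector is still on this list.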
