It's become clear that the plague called the 8086 architecture has sufficiently entrenched itself that it's not going to go away. For the last month or more, Mike Riddle, John Walker, Keith Marcelius, and Greg Lutz have been bashing their collective heads against it. The following is collected information on this unfortunate machine.
I think we'd be wise to diffuse our 8086 knowledge among as many people as possible. The main reference for the 8086 is a book called, imaginatively enough, The 8086 Book, published by Osborne. This is the architecture and instruction set reference, but it does not give sufficient information to write assembly code (of which, more later). However, it is the starting point for understanding the machine. AI will reimburse the cost of your buying this book, which is available at computer and electronics stores.
I have never encountered a machine so hard to understand, one where the most basic decisions in designing a program are made so unnecessarily difficult, where the memory architecture seems deliberately designed to obstruct the programmer, where the instruction set seems contrived to induce the maximum confusion, and where the assembler is so bizarre and baroque that once you've decided what bits you want in memory you can't figure out how to get the assembler to put them there. But I digress.
Mike Riddle has come up with the following programming rules for the 8086. They are presented here for comments from people with 8086 experience.
With regard to other 8086 developments, Hal Royaltey is writing a floating point package for the beast. The floating point package will be compatible with the IEEE double precision format used by the 8087. We'll set things up so that a program can be easily (maybe automatically?) configured for hardware or software floating point. This floating point package will be used for both SPL and QBASIC programs.
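For those who have not looked at the IEEE double precision layout, the following is a minimal sketch of how a 64-bit double breaks down into its sign, exponent, and fraction fields. It illustrates the format only, and is not Hal's package: the function names are invented for this example, and Python's struct module stands in for whatever bit manipulation the real 8086 code will use.

```python
import struct

# Hypothetical helper names, chosen for this sketch only.

def decode_double(x):
    """Split an IEEE double into (sign, biased exponent, fraction)."""
    # Reinterpret the 8 bytes of the double as one 64-bit unsigned integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)  # 52 bits, implicit leading 1 not stored
    return sign, exponent, fraction

def encode_double(sign, exponent, fraction):
    """Reassemble the three fields into a Python float."""
    bits = (sign << 63) | (exponent << 52) | fraction
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

if __name__ == "__main__":
    for value in (1.0, -2.0, 0.5, 1.5):
        s, e, f = decode_double(value)
        print(value, "sign", s, "exponent", e - 1023, "fraction", hex(f))
```

A software package and the 8087 that agree on this layout can exchange values in memory without conversion, which is what makes switching between hardware and software floating point plausible.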
John Walker has a version of QBASIC that generates 8086 assembly code. The compiler still runs on the 9900, where it will stay until META is running on the 8086. Soon we'll be loading the code onto the IBM to make sure it assembles properly, and to check out the segmentation structure of the code/library interface. Assuming that works, it's full steam ahead with QBASIC on the 8086. John Walker will be completing the compiler conversion and basic library routines, Dan Drake will be converting the I/O library, and we'll be integrating Hal Royaltey's floating point package and Mike Riddle's format-independent math routines.
We'll be completing the META port on the IBM here, freeing Mike Riddle's time to concentrate on the SPL compiler and runtime library.
In developing both SPL and QBASIC, we're taking the following approach to the 8086. We want to treat the thing as if it had true large memory, even though it's deliberately set up to obstruct us in doing that. We're imposing only the constraint that the static code generated by any one compilation cannot exceed 64K (a source program that large would be unwieldy anyway). Dynamically allocated strings and arrays may be anywhere in the 1 MB address space, and linked lists will use a general 32-bit segment/offset address for pointers. Any number of modules of up to 64K each may be linked together, and runtime library size will not subtract from the maximum program size. Thus, our compilers and their generated code will be limited only by the physical memory constraints of the machine and the operating system we're running under. This is a very important competitive edge: remember that most 8086 code is translated 8080 code, and such converted code cannot easily exceed 128K (or 64K if it's messy). Our programs will have no such limit.
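To make the segment/offset pointer idea concrete, here is a small sketch of the address arithmetic. It models the 8086 rule that the physical address is the 16-bit segment times 16 plus the 16-bit offset, wrapped to 20 bits. The function names and the packing of a pointer into one 32-bit integer (segment in the high half) are assumptions made for this illustration, not the representation our compilers will necessarily use.

```python
# Illustrative model of 8086 segment:offset addressing.
# All names are hypothetical, invented for this sketch.

def physical_address(segment, offset):
    """20-bit physical address: segment * 16 + offset, wrapped at 1 MB."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF

def pack_pointer(segment, offset):
    """Pack a segment/offset pair into a single 32-bit value."""
    return (segment << 16) | offset

def unpack_pointer(pointer):
    """Recover (segment, offset) from a packed 32-bit pointer."""
    return (pointer >> 16) & 0xFFFF, pointer & 0xFFFF

def normalize(segment, offset):
    """Equivalent pair with the offset reduced to the range 0..15."""
    linear = physical_address(segment, offset)
    return linear >> 4, linear & 0xF

if __name__ == "__main__":
    # Two different pairs that name the same byte of memory.
    print(hex(physical_address(0x1234, 0x0010)))
    print(hex(physical_address(0x1235, 0x0000)))
    print([hex(part) for part in normalize(0x1234, 0x0010)])
```

Because many segment/offset pairs name the same byte, comparing two pointers for equality means comparing their physical addresses (or normalized forms), not the raw 32-bit values. A normalized pointer also leaves almost a full 64K of offset range before the segment must change.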
We plan to have an "engineering test version" of QBASIC running on the IBM in about a week to verify the basic memory architecture ideas described above. Such a test is required because the IBM assembler and linker are so confusing that the manuals alone cannot tell us whether some of these ideas will work.
We also lack documentation of the Microsoft/IBM relocatable code format used on the 8086, although Mike Riddle suspects it's an extended version of the bitstream code used by Microsoft Fortran on the 8080 and adopted by Digital Research. Even if it is, we still don't know how the additional information for the 8086 was encoded. Does anybody know this, or have any leads to find out? We need to know to make our compilers salable, as we can't expect people to buy the IBM Macro Assembler just to assemble the code from QBASIC. I can think of lots of things I'd rather do than reverse-engineer somebody's bitstream relocatable format.
Editor: John Walker