FANG — Internal Documentation

John Walker


Introduction

The internal structure of FANG is both complex and unconventional, and is representative of a technology far advanced over that encountered in most EXEC-8 programs. This structure allows unprecedented efficiency in utilising the features of the hardware and software, and provides an elegant internal structure to a program that would otherwise be a nightmare. To one who understands the internal structure of FANG, and the concepts that underlie that structure, modifications and enhancements are easy to install: like the colonial administrator in Africa, the programmer will find many primitives willing to do his work, but only if he knows how to communicate with them. Conversely, the 'programmer' who attempts to add a 'feecher' by kludging an existing routine, or who attempts to learn the structure of FANG by prodding it and seeing how it responds, will be dealt a retribution every bit as swift and decisive as being made into a tasty stew after being caught with the chief's wife. The following is an attempt to convey to those qualified to modify FANG the information they will need to do so properly, and to so terrify those unqualified for the job that they will run, coding forms in hand, screaming from the temple of reentrancy that is FANG.

It is customary in the beginning of internal documentation to explain the cute acronym that makes up the name of a program. After two weeks of diligent searching, wearing out two copies of the Oxford English Dictionary as well as my index finger, I gave up and named it after my cat.

Design Philosophy

Parallelism

FANG differs from most programs in that it attempts to perform as many operations in parallel as are possible, both at the internal and the user source language level. EXEC-8 is almost alone among successful commercial operating systems in providing the capabilities needed to do parallel processing in a convenient and efficient manner, yet few programs take advantage of the capability. The primary reason for this is that few programmers understand enough about the concepts of parallel processing to write programs using them. Moreover, the fact that EXEC-8 is used as a multiprogramming system leads to the belief that optimisation of individual programs is wasted effort, since wait time in one program can be taken up by other programs in the mix. This philosophy, the 'IOW$ COPOUT', has led to many terrible implementations, notably, COPOUT using IOW$. The assumption of efficient utilisation by job-mix would be valid if there were sufficient compute-bound programs in the mix to take up the I/O time, and if these programs could be coresident with the non-parallel I/O job. The great flaw in the multiprogramming argument is that it neglects the fact that facilities are normally tied up for the duration of a job, and that core space, a critical matter on 1108's with their endemic problem of small memory and large programs, is tied up by programs whether they are computing or doing I/O.

The philosophy underlying the design of FANG is that a program coded to optimally use the hardware and software, a program which will finish most rapidly when run by itself, exhibits just those properties most desirable in the multiprogramming environment. This statement, 'Lynch's folly', is often contested with the claim that such a program will monopolise the system to the extent that other work will not get done. This can indeed happen if the operating system is poorly designed. A properly designed operating system will be able to apportion the facilities of the system to the programs sharing it so as not to let any program benefit at the expense of others. The parallel program provides the system with enough concurrent tasks that it can fill free time that would otherwise have been lost in the idle loop. Hence, if the operating system is well designed, the parallel program is its best friend, since it provides a constant source of things to do to occupy its time. Support your local guard mode light.

Reentrancy

As was mentioned before, the biggest problem in many 1108 installations is the allocation and availability of main memory space. What with the astronomical cost of memory and the 262k maximum in any case, shops with any demand load or reasonable-sized batch programs find it difficult to fit enough programs in core at a time to provide, by multiprogramming overlap, enough work to profitably utilise the central processor. The parallel design of FANG helps in several ways. First, FANG is implemented as a reentrant sharable bank. This means that any number of users may simultaneously use FANG, and only one copy of its instruction and constant data bank will be present at any time. Second, all storage allocation for variable storage within FANG is dynamic, with required core being acquired from the system as needed. This insures that only the storage needed will be used by FANG at any time. In an application characterised by large buffers, this is essential. Third, the parallelism in execution overlaps compute with I/O time, insuring that the program completes as soon as possible, thereby reducing total core usage time. Compare a FANG copy operation, which completes in the time required for the data transfer to the slowest device, with an unbuffered copy which takes the sum of the device transfer times and the compute time in the program. Fourth, the overlap of operations allows FANG to take advantage of the multiprocessor, multiprogramming, and multipath properties of a large 1100 series installation. It is true of FANG, and few other programs, that the bigger a system you run it on, the faster it runs. Again, this helps to get the job done faster. Fifth, and finally, the unique source-language overlap capability of FANG allows operations to be done in parallel that would have been restricted to serial execution in a more conventional implementation.

The chief criticism that could be raised to the above is that while buffering and parallelism do improve utilisation and efficiency, and reduce total residency time, the total core required is greater, so the gain is negated; in fact, in an installation where core is very tight, and fitting many users in at once is essential to good performance, a program like FANG is anathema. I will not stoop to reply to such a statement which disagrees with me. However, let it be noted that commands permit the concurrency of FANG to be restricted, limiting core usage to any desired level.

The Scheduler

Since FANG is implemented as a multi-activity program, some coherent means is needed to insure that the various parallel activities perform their duties in harmony and without interference to one another. Above all, there must be a mechanism to protect data referenced by different activities, and a means to conveniently transfer data between activities. EXEC-8 provides all the primitive operations needed to synchronise parallel activities (in fact, it is probably the only system to offer at least five completely independent methods), but none of them are, by themselves, suitable for our purposes. Consequently, a structure is built upon one of those methods (ACT$/DACT$) which offers the desired features.

Activity Synchrony

The element SCHEDULER is an implementation of the method of activity synchronisation conceived by Edsger W. Dijkstra, and described on page 345 of the May 1968 issue of Communications of the ACM (volume 11, number 5). The system involves two functions, designated P and V, which correspond to the acquisition and release, respectively, of a facility. The availability of a facility is indicated by the count in a special variable designated a semaphore.

The P Function

The P function is the means by which a facility is obtained or a lock is set. When an activity wishes to obtain a facility, it executes the P function on the semaphore (which for the sake of confusion is referred to as a QUEUE throughout FANG) representing that facility. The P function unconditionally decrements the count in the semaphore. If the count was greater than zero when the P function was invoked on the semaphore, the calling activity is allowed to proceed. If the count was zero or negative, the calling activity is deactivated via DACT$, and is inserted upon a queue, which is part of the semaphore variable. Remember that in any case, the count is decremented.

The V Function

The V function is the complement operation of P, and is used to clear a lock or release a facility. The parameter to V is a semaphore, and as a result of the V call, the semaphore count will be unconditionally incremented. If at the time of the call on V, the count in the semaphore was negative, that indicates that the queue attached to the semaphore has one or more activities upon it, waiting for the lock to be cleared or the facility to become available. In this case, V removes the first activity from the waiting queue (the queue is first in, first out, so this will be the first activity that was deactivated on the queue by P), and activates it with ACT$. That activity will then proceed with its return from the P function with the facility obtained, unaware that any delay has occurred.
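A minimal sketch of the P and V pair, assuming Python threads in place of EXEC-8 activities: the count is adjusted unconditionally, a FIFO queue of waiters hangs off the semaphore, and an event per waiter stands in for DACT$/ACT$. The class and names are illustrative, not FANG's.

```python
import threading
from collections import deque

class Semaphore:
    def __init__(self, count):
        self.count = count            # > 0: facility available
        self.waiters = deque()        # FIFO queue of deactivated activities
        self.lock = threading.Lock()  # protects count and waiters together

    def P(self):
        """Acquire the facility; deactivate if none is available."""
        with self.lock:
            self.count -= 1           # decremented in any case
            if self.count >= 0:
                return                # facility obtained, proceed
            event = threading.Event()
            self.waiters.append(event)  # queue behind the semaphore
        event.wait()                  # DACT$: sleep until a V wakes us

    def V(self):
        """Release the facility; activate the first waiter, if any."""
        with self.lock:
            self.count += 1           # incremented unconditionally
            if self.waiters:          # count was negative: someone is queued
                self.waiters.popleft().set()  # ACT$ the first-in activity
```

Because the count and the waiter queue are updated under one lock, a V that arrives between a waiter's enqueue and its wait still wakes that waiter; it returns from P unaware that any delay occurred.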

Queue Operations

FANG uses a consistent data and queue structure throughout the implementation. This provides a common format which aids in debugging, and reduces the number of separate sections of code that must deal with data and link operations. The basic format is a doubly linked queue, with the format of the header of a data item being identical to that of the queue. The queue contains a test-and-set lock to protect operations being performed upon its links.

Queue Insert Operation

The INSERT primitive places a data item upon a queue. The format of the queue is identical to the format of a semaphore. INSERT may be called by parallel activities referencing the same queue without any extra lock being invoked.

Queue Remove Operation

REMOVE obtains a data item from a queue, and is the complement of INSERT. Should no data item be on the queue, REMOVE returns the address of the queue itself. This feature is frequently used to detect the end of data lists.
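The queue format and the INSERT/REMOVE pair might be modelled as follows (the test-and-set link lock is elided, and the names are illustrative): a node linked to itself is an empty queue, and REMOVE hands back the queue's own address when there is nothing to remove.

```python
class Node:
    """Common header format: identical whether it heads a queue or a data item."""
    def __init__(self):
        self.next = self    # forward link: an empty queue points to itself
        self.prev = self    # backward link

def insert(queue, item):
    """Place ITEM at the tail of QUEUE (first in, first out)."""
    item.prev = queue.prev
    item.next = queue
    queue.prev.next = item
    queue.prev = item

def remove(queue):
    """Take the first item from QUEUE; return QUEUE itself if empty."""
    item = queue.next
    if item is queue:       # empty: hand back the queue's own address
        return queue
    queue.next = item.next
    item.next.prev = queue
    item.next = item.prev = item   # detached item links to itself
    return item
```

Callers detect the end of a data list simply by comparing the returned address with the queue's address, rather than by testing a separate count.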

Bounded Buffers

A bounded buffer is the most elegant means of transferring data between two parallel activities. A bounded buffer consists of a data queue and two semaphores. Of the two semaphores, one is designated the 'not empty' semaphore, and the other the 'not full' semaphore. The data queue holds items being passed between the activities. The 'not empty' queue is used to regulate the removing activity, and at any time, the count in the 'not empty' semaphore will equal the number of data items on the data queue. The 'not full' semaphore regulates activities inserting items on the buffer.

The operation of a bounded buffer can best be understood by an example. An empty bounded buffer is initialised as follows: the data queue is empty, that is, linked to itself. The 'not empty' semaphore has a count of zero, indicating no items on the buffer. The 'not full' semaphore has a count equal to the amount of buffering ahead desired between the activities.

Bounded Buffer Put

The PUT operation inserts a data item on the bounded buffer. It executes, in sequence: a P function on the 'not full' semaphore of the bounded buffer; an INSERT of the data item on the data queue of the bounded buffer; and a V function on the 'not empty' queue of the bounded buffer. This functions as follows: first, we check to see if the concurrency limit has been met by the P on 'not full'. We will be allowed to proceed only if it has not. If we are up against the concurrency limit, we will be deactivated by the P function, and awakened when the concurrency is reduced by a GET operation on the buffer (see below). Having passed the concurrency limit enforcement (with or without an intermediate deactivation), we insert the data item on the data queue, making it available to receiving activities. Finally, we indicate its availability with a V function on the 'not empty' semaphore. This wakes up any activity asleep waiting for a buffer, or insures another pass through the loop for an already active receiving activity.

Bounded Buffer Get

The GET operation inverts the PUT sequence and provides a means of obtaining data items from the bounded buffer. In a GET operation, we execute a P function on the 'not empty' queue, then remove a data item, and finally execute a V on the 'not full' semaphore. What this does, then, is to delay if necessary until a buffer becomes available to process, obtain the buffer, and release the activity submitting buffers.
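Under the assumption that Python's threading.Semaphore is an acceptable stand-in for P (acquire) and V (release), the bounded buffer and its PUT/GET pair sketch out as:

```python
import threading
from collections import deque

class BoundedBuffer:
    def __init__(self, depth):
        self.data = deque()                         # the data queue
        self.not_empty = threading.Semaphore(0)     # count == items on data queue
        self.not_full = threading.Semaphore(depth)  # buffering-ahead limit

    def put(self, item):
        self.not_full.acquire()     # P on 'not full': enforce concurrency limit
        self.data.append(item)      # INSERT on the data queue
        self.not_empty.release()    # V on 'not empty': wake a receiver

    def get(self):
        self.not_empty.acquire()    # P on 'not empty': wait for an item
        item = self.data.popleft()  # REMOVE from the data queue
        self.not_full.release()     # V on 'not full': release a submitter
        return item
```

Note that GET is exactly PUT with the two semaphores exchanged, which is the inversion described above.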

Operation Of Bounded Buffers

Let us assume for the example that we want to copy one tape to another, and that we are using a bounded buffer which starts with an empty data queue, a 'not empty' semaphore initialised to zero, and a 'not full' semaphore initialised to three. We will create two parallel activities designated 'reader' and 'writer'. 'Reader' will read a block from the input tape, put the data into a data item, and do a PUT onto the bounded buffer before reading the next block. 'Writer' does a GET from the bounded buffer, writes the block to the output tape, and loops back to get the next block. When we first activate the two activities, 'writer' will be deactivated trying to remove an item from the empty buffer. When 'reader' finally gets a block, the PUT operation, in executing a V on 'not empty', will wake up 'writer'. It can be easily seen that the 'reader' will not be permitted to get more than the initial count in 'not full' ahead of the 'writer', and that the 'writer' will be deactivated when it gets ahead of 'reader' and exhausts all available buffers. It should also be apparent that this is a very nice way to do a tape copy.
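The copy example above might be sketched as follows, with two Python threads as the 'reader' and 'writer' activities, lists standing in for the tapes, and a None block marking end of tape (all illustrative assumptions):

```python
import threading
from collections import deque

def copy_tape(input_tape, depth=3):
    data, output_tape = deque(), []
    not_empty = threading.Semaphore(0)      # initialised to zero
    not_full = threading.Semaphore(depth)   # initialised to three

    def reader():
        for block in input_tape + [None]:   # None marks end of tape
            not_full.acquire()              # P 'not full': at most depth ahead
            data.append(block)
            not_empty.release()             # V 'not empty': wake the writer

    def writer():
        while True:
            not_empty.acquire()             # P 'not empty': wait for a block
            block = data.popleft()
            not_full.release()              # V 'not full': free a buffer
            if block is None:
                return
            output_tape.append(block)       # write the block

    r = threading.Thread(target=reader)
    w = threading.Thread(target=writer)
    r.start(); w.start()
    r.join(); w.join()
    return output_tape
```

The reader can never get more than three blocks ahead of the writer, and the writer sleeps whenever it exhausts the available blocks, exactly as described above.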

Activity Creation And Deletion

The routines FORK and EXIT provide for dynamic creation and deletion of activities. An activity using the primitives described above must have a 'switch list' entry, which is at all times pointed to by user register X4, which contains the header words for a data item, and, as data, the NAME$-assigned activity name. This switch list is inserted on the semaphore queue when a P function fails, and is removed by the V function and used to obtain the activity name for activation. Additionally, the switch list is used for reentrant storage for contingency return point storage, and several other cases where activity local storage is required. The FORK routine creates a new activity with the EXEC-8 primitive FORK$. The newborn activity allocates a switch list, and names itself with NAME$. Ready to face the world, the activity then jumps to the entry address specified in the call to FORK. EXIT is the final resting place for activities that time has passed by. The routine EXIT releases the switch list buffer, and terminates the activity via EXIT$.
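The shape of FORK and EXIT, very loosely modelled: FORK$, NAME$, and EXIT$ have no direct Python analogue, so threads and thread-local storage stand in for activities and for register X4. All names here are illustrative.

```python
import threading

_x4 = threading.local()   # stands in for X4: points at the activity's switch list

def fork(entry, *args):
    """Create an activity: allocate a switch list, name it, jump to entry."""
    def newborn():
        # The newborn allocates its own switch list and records its name,
        # as FANG's newborn activity does with NAME$.
        _x4.switch_list = {"name": threading.current_thread().name}
        try:
            entry(*args)          # jump to the entry address given to FORK
        finally:
            exit_activity()       # activities that time has passed by
    t = threading.Thread(target=newborn)
    t.start()
    return t

def exit_activity():
    """Release the switch list buffer; the thread's return is EXIT$."""
    _x4.switch_list = None
```

The point of the model is the ownership discipline: the switch list is allocated by the newborn activity itself and released only on the way out, so every activity always has per-activity storage reachable through one well-known register.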

The Buffer Allocator

FANG uses the BGET routine for buffer allocation. As BGET is a standard part of LIBRARY W, and is extensively documented in the comments at the beginning of the element itself, this section will only summarise its operation and describe how it is configured for this application.

BGET allocates variable size buffers from a common pool of space. Only the requested space is allocated, with a two word control area before the user's buffer area. When buffers are released, they are recombined with any existing free contiguous areas to form larger free blocks. If all buffers are released, the pool will be one large block again.

When buffer space is exhausted, additional space is obtained from the system via MCORE$. This additional space is itself structured as a pool, and allocation continues. Should the release of buffers make it possible, free pool space extending to the end of memory will be released with LCORE$.
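A toy model of the allocation scheme described, not BGET itself: first-fit allocation from a pool, with released blocks recombined with contiguous free neighbours. Offsets stand in for addresses, and the used-block table stands in for the two-word control area.

```python
class Pool:
    def __init__(self, size):
        self.free = [(0, size)]     # (offset, length) free blocks, in address order
        self.used = {}              # offset -> length; models the control area

    def bget(self, n):
        """First fit: allocate n words, or return None if the pool is exhausted."""
        for i, (off, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (off + n, length - n)
                self.used[off] = n
                return off
        return None                 # a real allocator would MCORE$ more space here

    def brel(self, off):
        """Release a buffer and coalesce it with contiguous free blocks."""
        n = self.used.pop(off)
        self.free.append((off, n))
        self.free.sort()
        merged = []
        for b_off, b_len in self.free:
            if merged and merged[-1][0] + merged[-1][1] == b_off:
                merged[-1] = (merged[-1][0], merged[-1][1] + b_len)  # recombine
            else:
                merged.append((b_off, b_len))
        self.free = merged
```

When every buffer is released, coalescing restores the pool to one large block; a free block ending at the top of the pool is the space a real implementation could hand back with LCORE$.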

If the assembly time variable 'debug' is set to one, BGET will log all MCORE$ and LCORE$ requests as they are made. Additionally, the FANG termination routine will check for lost buffers, and append a cryptic message to the termination line if any were lost. To aid in plugging holes in buffer usage, the 'Z' option (operable only if generated with 'debug' set to one) will log all BGET and BREL calls. This option may be set and cleared on the fly.

The Dispatcher

The dispatcher is the activity which controls the execution of commands. The unit upon which the dispatcher operates is the command packet generated by the scanner. Each command packet represents one statement or substatement (one element of a command which takes a variable number of operands, like REWIND), and attached to the command packet are a list of parameters, including all files, numbers, data, etc., scanned for this command. All unexecuted commands reside on the queue CMDQUE, and it is there that the dispatcher examines them.

Read Mode And Write Mode

Command selection by the dispatcher depends heavily upon the way the file and block parameters to the command are used by the command. A command is said to use a parameter in 'read mode' if the effect of the command changes neither the contents nor the current position of the parameter. By the nature of this definition, no command can be said to use a tape in read mode, since any operation changes the position of a tape. For mass storage files, however, many commands neither change the pointer nor write, and can be said to use the file in read mode. When the dispatcher is scheduling commands, it permits simultaneous execution of multiple read mode references to a file, but is forced to serialise write mode references.

Dispatcher Operating Cycle

The dispatcher makes an examination of command status every time the queue HAPPEN is V'd. This occurs whenever the scanner submits another command for execution, and whenever a currently executing command completes. On any given examination of the command list, the dispatcher looks at each command, one at a time, in the order of submission and makes the following tests.

File Availability

Each file or block parameter to the command is examined. If the file or block is marked available, the command may be executed. If the file is not available, and this command uses the file only in read mode, and no other command is writing the file, the command will be selected. If the file is in use for writing, or this command writes the file, the command cannot be selected.

Previous Reference Conflict Check

The file availability check is then extended to commands not yet executed, but submitted prior to the command being examined. If the command in question writes the file, then the command may not be selected if any earlier command references the file. If the command in question only reads the file, then the command may be selected so long as no previous command writes the file. This prevents the case where a command referencing file 'b' is skipped because it also references busy file 'a', and a later command referencing 'b' only is started out of sequence.
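The two checks above can be sketched as a single selection predicate. Command packets and file state are modelled here as plain dictionaries; the names are illustrative, not FANG's packet formats.

```python
def can_select(cmd, pending_before, files):
    """True if CMD may start, given earlier unexecuted commands and file state."""
    # File availability: each file/block parameter is examined.
    for name, mode in cmd["params"]:
        f = files[name]
        if f["write_locked"]:
            return False            # in use for writing: never selectable
        if mode == "write" and f["readers"] > 0:
            return False            # a writer must wait for readers to drain
    # Previous reference conflict: earlier submissions keep their place in line.
    for earlier in pending_before:
        for name, mode in cmd["params"]:
            for e_name, e_mode in earlier["params"]:
                if name != e_name:
                    continue
                if mode == "write" or e_mode == "write":
                    return False    # any write involvement forces serial order
    return True
```

The second loop is what prevents a later command referencing only file 'b' from starting out of sequence ahead of a skipped command that references both 'a' and 'b'.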

Command Selection Setup

If the above sequence of checks succeeds, all files and blocks used by the command are marked 'in use', and either the read count is incremented or the write lock is set depending upon whether the command uses the file in read mode or write mode. The command is removed from CMDQUE and inserted on the in-progress queue INPROCQ. The address of the command activity is loaded, and a final check is made to verify whether any file referenced by this command has been roadblocked. If so, the address of the ROADBLOCK routine is substituted for the command routine address. Then, a command activity is created with FORK, and the dispatcher loops to the next command.

Final Deactivation

Should the dispatcher fail to find any command to execute on a pass through the queue, it will execute a P on HAPPEN, and await a change in the state of affairs. When the scanner detects an end-of-file, a special command packet with a zero handler address is submitted for execution. When this packet is the only packet remaining for execution, the dispatcher knows its job is done, and terminates.

The Command Activities

General Design Principles

A command activity is created by the Dispatcher for each statement or substatement to be executed. The activity receives control via FORK when the dispatcher selects the statement or substatement for execution, and relinquishes control to COMPLETE only when the statement has been completed. Some simple commands are performed entirely by the command activity itself, while more complex command activities make use of the various service activities to perform reading, writing, dump editing, and so forth. It is the responsibility of the command activity to create the required service activities and to insure their orderly termination before the command activity itself terminates.

The command activity receives control with X8 pointing to the command descriptor buffer. Attached to the command descriptor buffer are all parameters affecting the command. The command activity examines the parameter chain and from it decides what action is called for. Parameter types need not be checked for non-omittable parameters since the scanner verifies validity as the parameters are scanned.

REP: A Simple Command Activity

The REP command is serviced by a very simple command activity. The activity needs to perform no I/O, so no service activities are called for. Consequently, the command processing simply consists of obtaining the parameters, applying the specified replacement, and restoring the corrected block. The command then completes.

COPY: A More Complex Command Activity

COPY illustrates the command activity which uses service activities to perform operations. COPY needs an activity to read in the input file, and an activity to write into the output file. It uses the standard INPUT and OUTPUT routines for this, setting up their parameters so that they will copy as specified and terminate at the correct time. Since no intervention is required until the termination of the read/write sequence, COPY deactivates awaiting the completion signal from OUTPUT. When that is received, the I/O file control tables are released and control is passed to COMPLETE.

The Complete Routine

COMPLETE is a routine which handles termination of a synchronous command. Before a command activity goes to COMPLETE it must terminate all service activities it has created and release any buffers allocated in the process of carrying out the command.

COMPLETE first logs the command completion if requested to do so by the 'E' option. Then all files used by the command are released from the in-use state. If a ZAP instruction has been executed during command service, clearing the entry address of the command, the files will be roadblocked; otherwise, they will be marked as available. For each file or block parameter, the read count will be decremented or the write lock will be cleared, depending upon the mode in which the command used the parameter.

Parameter, patch, and mask buffers are released, and the command is unchained from the in-progress queue and released. The outstanding command count is decremented. Finally, the queue HAPPEN is V'd to cause the dispatcher to review the situation now that the command has finished.