- Section I
-
- Protocols for a Multi-site Data Collection Facility
-
- Introduction
-
The development of computer networks has provided the
- groundwork for distributed computation: one in which a job or task
-
- is comprised of components from various computer systems. In a
-
- single computer system, the unavailability or malfunction of any of
-
- the job components (e.g. program, file, device, etc.) usually
-
- necessitates job termination. With computer networks, it becomes
-
- feasible to duplicate certain job components which previously had no
-
- basis for duplication. (In a single system, it does not matter how
-
- many times a process that performs a certain function is duplicated;
-
- a system crash makes all unavailable). It is such resource
-
- duplication that enables us to utilize the network to achieve high
-
- reliability and load leveling. In order to realize the potential of
-
- resource duplication, it is necessary to have protocols which
-
- provide for the orderly use of these resources. In this document,
-
- we first discuss in general terms a problem of protocol definition
-
- for interacting with a multiply defined resource (server). The
-
- problem deals with providing a highly reliable data collection
-
- facility, by supporting it at many sites throughout the network. In
-
- the second section of this document, we describe in detail a
-
- particular implementation of the protocol which handles the problem
-
- of utilizing multiple data collector processes for collecting
-
- accounting data generated by the network TIPs. This example also
-
- illustrates the specialization of hosts to perform parts of a
-
- computation they are best equipped to handle. The large network
-
- hosts (TENEX systems) perform the accounting function for the small
-
- network access TIPs.
-
The situation to be discussed is the following: a data
- generating process needs to use a data collection service which is
-
- duplicately provided by processes on a number of network machines.
-
- A request to a server involves sending the data to be collected.
-
- An Initial Approach
-
The data generator could proceed by selecting a particular
- server and sending its request to that server. It might also take
-
- the attitude that if the message reaches the destination host (the
-
- communication subsystem will indicate this) the message will be
-
- properly processed to completion. Failure of the request message
-
- would then lead to selecting another server, until the request
-
- succeeds or all servers have been tried.
-
Such a simple strategy is a poor one. It makes sense to
- require that the servicing process send a positive acknowledgement
-
- to the requesting process. If nothing else, the reply indicates
-
- that the server process itself is still functioning. Waiting for
-
- such a reply also implies that there is a strategy for selecting
-
- another server if the reply is not forthcoming. Herein lies a
-
- problem. If the expected reply is timed out, and then a new request
-
- is sent to another server, we run the risk of receiving the
-
(delayed) original acknowledgement at a later time. This could
- result in having the data entered into the collection system twice
-
(data duplication). If the request is re-transmitted to the same
- server only, we face the possibility of not being able to access a
-
- collector (data loss). In addition, for load leveling purposes, we
-
- may wish to send new requests to some (or all) servers. We can then
-
- use their reply (or lack of reply) as an indicator of load on that
-
- particular instance of the service. Doing this without data
-
- duplication requires more than a simple request and acknowledgement
-
- protocol*.
-
- Extension of the Protocol
-
The general protocol developed to handle multiple collection
- servers involves having the data generator send the data request to
-
- some (or all) data collectors. Those willing to handle the request
-
- reply with an "I've got it" message. They then await further
-
- notification before finalizing the processing of the data. The data
-
- generator sends a "go ahead" message to one of the replying
-
- collectors, and a "discard" message to all other replying
-
- collectors. The "go ahead" message is the signal to process the
-
- data (i.e. collect permanently), while the "discard" message
-
- indicates that the data is being collected elsewhere and should not
-
- be retained.
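The exchange just described can be sketched as follows. This is only an illustrative model, not the implementation discussed in Section II: the class name `Collector`, the function `collect`, and the rule of choosing the first replying collector are assumptions made for the example, and message delivery is modelled as a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """One data collector (server) in the three-message exchange."""
    name: str
    pending: dict = field(default_factory=dict)   # generator id -> buffered data
    recorded: list = field(default_factory=list)  # permanently collected data

    def on_data(self, generator_id, data):
        # Buffer the request and answer "I've got it"; nothing is final yet.
        self.pending[generator_id] = data
        return "I've got it"

    def on_go_ahead(self, generator_id):
        # Final processing: move the buffered data into permanent storage.
        data = self.pending.pop(generator_id, None)
        if data is not None:
            self.recorded.append(data)

    def on_discard(self, generator_id):
        # The data is being collected elsewhere; drop our copy.
        self.pending.pop(generator_id, None)


def collect(generator_id, data, collectors):
    """Send data to every collector, then commit it at exactly one of them."""
    willing = [c for c in collectors
               if c.on_data(generator_id, data) == "I've got it"]
    if not willing:
        return None
    chosen, rest = willing[0], willing[1:]
    chosen.on_go_ahead(generator_id)      # exactly one "go ahead"
    for c in rest:
        c.on_discard(generator_id)        # "discard" to all other repliers
    return chosen.name
```

Whichever collector is chosen, the data ends up recorded at one site only, because every other replying collector is told to discard its buffered copy.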
-
The question now arises as to whether or not the collector
- process should acknowledge receipt of the "go ahead" message with a
-
- reply of its own, and then should the generator process acknowledge
-
- this acknowledgement, etc. We would like to send as few messages as
-
- possible to achieve reliable communication. Therefore, when a state
-
--------------------
* If the servers are independent of each other to the extent that if
- two or more servers all act on the same request, the end result is
-
- the same as having a single server act on the request, then a simple
-
- request/acknowledgement protocol is adequate. Such may be the case,
-
- for example, if we subject the totality of collected data (i.e. all
-
- data collected by all collectors for a certain period) to a
-
- duplicate detection scan. If we could store enough context in each
-
- entry to be able to determine duplicates, then having two or more
-
- servers act on the data would be functionally equivalent to
-
- processing by a single server.
-
- is reached for which further acknowledgements lead to a previously
-
- visited state, or when the cost of further acknowledgements outweighs
-
- the increase in reliability they bring, further acknowledgements
-
- become unnecessary.
-
The initial question was should the collector process
- acknowledge the "go ahead" message? Assume for the moment that it
-
- should not send such an acknowledgement. The data generator could
-
- verify, through the communication subsystem, the transmission of the
-
"go ahead" message to the host of the collector. If this message
- did not arrive correctly, the generator has the option of
-
- re-transmitting it or sending a "go ahead" to another collector
-
- which has acknowledged receipt of the data. Either strategy
-
- involves no risk of duplication. If the "go ahead" message arrives
-
- correctly, and a collector acknowledgement to the "go ahead" message
-
- is not required, then we incur a vulnerability to (collector host)
-
- system crash from the time the "go ahead" message is accepted by the
-
- host until the time the data is totally processed. Call the data
-
- processing time P. Once the data generator has selected a
-
- particular collector (on the basis of receiving its "I've got it"
-
- message), we also incur a vulnerability to malfunction of this
-
- collector process. The vulnerable period is from the time the
-
- collector sends its "I've got it" message until the time the data is
-
- processed. This amounts to two network transit times (2N) plus IMP
-
- and host overhead for message delivery (O) plus data processing time
-
(P). [Total time=2N+P+O]. A malfunction (crash) in this period can
- cause the loss of data. There is no potential for duplication.
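As a small worked example of the two periods above, the sketch below computes them from N, O and P. The function name and the sample timings are hypothetical; the host-crash window is taken to be roughly P, as the text suggests.

```python
def vulnerable_periods(n, o, p):
    """Return (host_crash_window, process_failure_window) in the units given.

    n: one network transit time (N in the text)
    o: IMP and host overhead for message delivery (O)
    p: data processing time at the collector (P)
    """
    host_crash_window = p                    # "go ahead" accepted -> data processed
    process_failure_window = 2 * n + o + p   # "I've got it" sent -> data processed
    return host_crash_window, process_failure_window


# Hypothetical timings in milliseconds, chosen only for illustration.
host_w, proc_w = vulnerable_periods(n=100, o=20, p=500)
```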
-
Now, assume that the data collector process must acknowledge
- the "go ahead" message. The question then arises as to when such an
-
- acknowledgement should be sent. The reasonable choices are either
-
- immediately before final processing of the data (i.e. before the
-
- data is permanently recorded) or immediately after final processing.
-
- It can be argued that unless another acknowledgement is required (by
-
- the generator to the collector) to this acknowledgement BEFORE the
-
- actual data update, then the best time for the collector to
-
- acknowledge the "go ahead" is after final processing. This is so
-
- because receiving the acknowledgement conveys more information if it
-
- is sent after processing, while not receiving it (timeout), in
-
- either case, leaves us in an unknown state with respect to the data
-
- update. Depending on the relative speeds of various network and
-
- system components, the data may or may not be permanently entered.
-
- Therefore if we interpret the timeout as a signal to have the data
-
- processed at another site, we run the risk of duplication of data.
-
- To avoid data duplication, the timeout strategy must only involve
-
- re-sending the "go ahead" message to the same collector. This will
-
- only help if the lack of reply is due to a lost network message.
-
- Our vulnerability intervals to system and process malfunction remain
-
- as before.
-
It is our conjecture (to be analyzed further) that any further
- acknowledgements to these acknowledgements will have virtually no
-
- effect on reducing the period of vulnerability outlined above. As
-
- such, the protocol with the fewest messages required is superior.
-
- Data Dependent Aspects of the Protocol
-
As discussed above, a main issue is which process should be the
- last to respond (send an acknowledgement). If the data generator
-
- sends the last message (i.e. "go ahead"), we can only check on its
-
- correct arrival at the destination host. We must "take on faith"
-
- the ability of the collector to correctly complete the transaction.
-
- This strategy is geared toward avoiding data duplication. If on the
-
- other hand, the protocol specifies that the collector is to send the
-
- last message, with the timeout of such a message causing the data
-
- generator to use another collector, then the protocol is geared
-
- toward the best efforts of recording the data somewhere, at the
-
- expense of possible duplication.
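The trade-off can be pictured as the data generator's reaction when a final acknowledgement expected from the collector times out. The sketch below is only an illustration: the policy labels "no_duplicates" and "best_effort" and the function name are assumptions, not terms from the protocol.

```python
def on_final_ack_timeout(policy, chosen, other_repliers):
    """Decide what the generator does when the collector's last message times out.

    policy: "no_duplicates" or "best_effort" (hypothetical labels)
    chosen: collector that was sent the "go ahead"
    other_repliers: collectors that also acknowledged receipt of the data
    """
    if policy == "no_duplicates":
        # Only the same collector may be retried, so the data can never be
        # recorded at two sites; it may be lost if that collector is down.
        return ("resend go ahead", chosen)
    if policy == "best_effort":
        # Prefer another collector; the data may end up recorded twice if
        # the first collector did in fact finish processing.
        if other_repliers:
            return ("send go ahead", other_repliers[0])
        return ("resend go ahead", chosen)
    raise ValueError(f"unknown policy: {policy!r}")
```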
-
Thus, the nature of the problem will dictate which of the
- protocols is appropriate for a given situation. The next section
-
- deals with the specifics of an implementation of a data collection
-
- protocol to handle the problem of collecting TIP accounting data by
-
- using the TENEX systems for running the collection server processes.
-
- It is shown how the general protocol is optimized for the accounting
-
- data collection.
-
- Section II
-
- Protocol for TIP-TENEX Accounting Server Information Exchange
-
- Overview of the Facility
-
When a user initially requests service from a TIP, the TIP will
- perform a broadcast ICP to find an available RSEXEC which maintains
-
- an authentication data base. The user must then complete a login
-
- sequence in order to authenticate himself. If he is successful the
-
- RSEXEC will transmit his unique ID code to the TIP. Failure will
-
- cause the RSEXEC to close the connection and the TIP to hang up on
-
- the user. After the user is authenticated, the TIP will accumulate
-
- accounting data for the user session. The data includes a count of
-
- messages sent on behalf of the user, and the connect time for the
-
- user. From time to time the TIP will transmit intermediate
-
- accounting data to Accounting Server (ACTSER) processes scattered
-
- throughout the network. These accounting servers will maintain
-
- files containing intermediate raw accounting data. The raw
-
- accounting data will periodically be collected and sorted to produce
-
- an accounting data base. Providing a number of accounting servers
-
- reduces the possibility of being unable to find a repository for the
-
- intermediate data, which otherwise would be lost due to buffering
-
- limitations in the TIPs. The multitude of accounting servers can
-
- also serve to reduce the load on the individual hosts providing this
-
- facility.
-
- The rest of this document details the protocol that has been
-
- developed to ensure delivery of TIP accounting data to one of the
-
- available accounting servers for storage in the intermediate
-
- accounting files.
-
- Adapting the Protocol
-
- The TIP to Accounting Server data exchange uses a protocol that
-
- allows the TIP to select for data transmission one, some, or all
-
- server hosts either sequentially or in parallel, yet ensures that
-
- the data that becomes part of the accounting file does not contain
-
- duplicate information. The protocol also minimizes the amount of
-
- data buffering that must be done by the limited capacity TIPs. The
-
- protocol is applicable to a wide class of data collection problems
-
- which use a number of data generators and collectors. The following
-
- describes how the protocol works for TIP accounting.
-
- Each TIP is responsible for maintaining in its memory the cells
-
- indicating the connect time and the number of messages sent for each
-
- of its current users. These cells are incremented by the TIP for
-
- every quantum of connect time and message sent, as the case may be.
-
- This is the data generation phase. Periodically, the TIP will scan
-
- all its active counters, and along with each user ID code, pack the
-
- accumulated data into one network message (i.e. less than 8K bits).
-
- The TIP then transmits this data to a set of Accounting Server
-
- processes residing throughout the network. The data transfer is
-
- over a specially designated host-host link. The accounting servers
-
- utilize the raw network message facility of TENEX 1.32 in order to
-
- directly access that link. When an ACTSER receives a data message
-
- from a TIP, it buffers the data and replies by returning the entire
-
- message to the originating TIP. The TIP responds with a positive
-
- acknowledgement ("go ahead") to the first ACTSER which returns the
-
- data, and responds with a negative acknowledgement ("discard") to
-
- all subsequent ACTSER data return messages for this series of
-
- transfers. If the TIP does not receive a reply from any ACTSER, it
-
- accumulates new data (i.e. the TIP has all the while been
-
- incrementing its local counters to reflect the increased connect
-
- time and message count; the current values will comprise new data
-
- transfers) and sends the new data to the Accounting Server
-
- processes. When an ACTSER receives a positive acknowledgement from
-
- a TIP (i.e. "go ahead"), it appends the appropriate parts of the
-
- buffered data to the locally maintained accounting information file.
-
- On receiving a negative acknowledgement from the TIP (i.e.
-
"discard"), the ACTSER discards the data buffered for this TIP. In
- addition, when the TIP responds with a "go ahead" to the first
- ACTSER which has accepted the data (acknowledged by returning the
-
- data along with the "I've got it"), the TIP decrements the connect
-
- time and message counters for each user by the amount indicated in
-
- the data returned by the ACTSER. This data will already be
-
- accounted for in the intermediate accounting files.
-
- As an aid in determining which ACTSER replies are to current
-
- requests, and which are tardy replies to old requests, the TIP
-
- maintains a sequence number indicator, and appends this number to
-
- each data message sent to an ACTSER. On receiving a reply from an
-
- ACTSER, the TIP merely checks the returned sequence number to see if
-
- this is the first reply to the current set of TIP requests. If the
-
- returned sequence number is the same as the current sequence number,
-
- then this is the first reply; a positive acknowledgement is sent
-
- off, the counters are decremented by the returned data, and the
-
- sequence number is incremented. If the returned sequence number is
-
- not the same as the current one (i.e. not the one we are now
-
- seeking a reply for) then a negative acknowledgement is sent to the
-
- replying ACTSER. After a positive acknowledgement to an ACTSER (and
-
- the implied incrementing of the sequence number), the TIP can wait
-
- for more information to accumulate, and then start transmitting
-
- again using the new sequence number.
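The TIP side of this exchange can be sketched as follows. This is a minimal model under stated assumptions: the class `Tip`, its method names, and the dictionary message layout are hypothetical, and the 16-bit sequence field is the width mentioned later in this document.

```python
SEQ_BITS = 16                 # field width mentioned later in the text
SEQ_MODULUS = 1 << SEQ_BITS


class Tip:
    """TIP-side bookkeeping for one logical transmission sequence."""

    def __init__(self):
        self.seq = 0
        self.counters = {}    # user id -> [connect_time, messages_sent]

    def charge(self, user, connect_time=0, messages=0):
        # Data generation phase: increment the per-user cells.
        cell = self.counters.setdefault(user, [0, 0])
        cell[0] += connect_time
        cell[1] += messages

    def build_message(self):
        # Snapshot of the current counters, tagged with the sequence number.
        data = {u: tuple(c) for u, c in self.counters.items()}
        return {"seq": self.seq, "data": data}

    def on_echo(self, echo):
        """Handle an ACTSER echo; return the acknowledgement to send back."""
        if echo["seq"] != self.seq:
            return "discard"              # tardy reply to an old request
        # First reply to the current request: subtract exactly what the
        # server echoed, since that amount is now accounted for remotely.
        for user, (t, m) in echo["data"].items():
            cell = self.counters.get(user)
            if cell is not None:
                cell[0] -= t
                cell[1] -= m
        self.seq = (self.seq + 1) % SEQ_MODULUS
        return "go ahead"
```

Because the counters are reduced by the echoed amounts rather than reset, two messages built at different times under the same sequence number can carry different data, and only the one echoed first is charged.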
-
- Further Clarification of the Protocol
-
- There are a number of points concerning the protocol that
-
- should be noted.
-
- 1 The data generator (TIP) can send different (i.e. updated
-
- versions) data to different data collectors (accounting servers) as
-
- part of the same logical transmission sequence. This is possible
-
- because the TIP does not account for the data sent until it receives
-
- the acknowledgement of the data echo. This strategy relieves the
-
- TIP of any buffering in conjunction with re-transmission of data
-
- which hasn't been acknowledged.
-
- 2 A new data request to an accounting server from a TIP will
-
- also serve as a negative acknowledgement concerning any data already
-
- buffered by the ACTSER for that TIP, but not yet acknowledged. The
-
- old data will be discarded, and the new data will be buffered and
-
- echoed as an acknowledgement. This allows the TIP the option of not
-
- sending a negative acknowledgement when it is not convenient to do
-
- so, without having to remember that it must be sent at a later time.
-
- There is one exception to this convention. If the new data message
-
- has the same sequence number as the old buffered message, then the
-
- new data must be discarded, and the old data kept and re-echoed.
-
- This is to prevent a slow acknowledgement to the old data from being
-
- accepted by the TIP, after the TIP has already sent the new data to
-
- the slow host. This caveat can be avoided if the TIP does not
-
- resend to a non-responding server within the time period that a
-
- message could possibly be stuck in the network, but could still be
-
- delivered. Ignoring this situation may result in some accounting
-
- data being counted twice. Because of the rule to keep old data when
-
- confronted with matching sequence numbers, on restarting after a
-
- crash, the TIP should send a "discard" message to all servers in
-
- order to clear any data which has been buffered for it prior to the
-
- crash. An alternative to this would be for the TIP to initialize
-
- its sequence number from a varying source such as time of day.
-
- 3 The accounting server similarly need not acknowledge receipt
-
- of data (by echoing) if it finds itself otherwise occupied. This
-
- will mean that the ACTSER is not buffering the data, and hence is
-
- not a candidate for entering the data into the file. However, the
-
- TIP may try this ACTSER at a later time (even with the same data),
-
- with no ill effects.
-
- 4 Because of 2 and 3 above, the protocol is robust with respect
-
- to lost or garbled transmissions of TIP data requests and accounting
-
- server echo replies. That is, in the event of loss of such a
-
- message, a re-transmission will occur as the normal procedure.
-
- 5 There is no synchronization problem with respect to the
-
- sequence number used for duplicate detection, since this number is
-
- maintained only at the TIP site. The accounting server merely
-
- echoes the sequence number it has received as part of the data.
-
- 6 There are, however, some constraints on the size of the
-
- sequence number field. It must be large enough so that ALL traces
-
- of the previous use of a given sequence number are totally removed
-
- from the network before the number is re-used by the TIP. The
-
- sequence number is modulo the size of the largest number represented
-
- by the number of bits allocated, and is cyclic. Problems generally
-
- arise when a host proceeds from a service interruption while it was
-
- holding on to a reply. If during the service interruption, we have
-
- cycled through our sequence numbers exactly N times (where N is any
-
- integer), this VERY tardy reply could be mistaken for a reply to the
-
- new data, which has the same sequence number (i.e. N revolutions of
-
- sequence numbers later). By utilizing a sufficiently large sequence
-
- number field (16 bits), and by allowing sufficient time between
-
- instances of sending new data, we can effectively reduce the
-
- probability of such an error to zero.
-
- 7 Since the data involved in this problem is the source of
-
- accounting information, care must be taken to avoid duplicate
-
- entries. This must be done at the expense of potentially losing
-
- data in certain instances. Other than the obvious TIP malfunction,
-
- there are two known ways of losing data. One is the situation where
-
- no accounting server responds to a TIP for an extended period of
-
- time causing the TIP counters to overflow (highly unlikely if there
-
- are sufficient Accounting Servers). In this case, the TIP can hold
-
- the counters at their maximum value until a server comes up, thereby
-
- keeping the lost accounting data at its minimum. The other
-
- situation results from adapting the protocol to our insistence on no
-
- duplicate data in the incremental files. We are vulnerable to data
-
- loss with no recourse from the time the server receives the "go
-
- ahead" to update the file with the buffered data (i.e. positive
-
- acknowledgement) until the time the update is completed and the file
-
- is closed. An accounting server crash during this period will cause
-
- that accounting data to be lost. In our initial implementation, we
-
- have slightly extended this period of vulnerability in order to save
-
- the TIP from having to buffer the acknowledged data for a short
-
- period of time. By updating TIP counters from the returned data in
-
- parallel with sending the "go ahead" acknowledgement, we relieve the
-
- TIP of the burden of buffering this data until the Request for Next
-
- Message (RFNM) from the accounting server IMP is received. This
-
- adds slightly to our period of vulnerability to malfunction, moving
-
- the beginning of the period from the point when the ACTSER host
-
- receives the "go ahead", back to the point when the TIP sends off
-
- the "go ahead" (i.e. a period of one network transit time plus some
-
- IMP processing time). However, loss of data in this period is
-
- detectable through the Host Dead or Incomplete Transmission return
-
- in place of the RFNM. We intend to record such occurrences with the
-
- Network Control Center. If this data loss becomes intolerable, the
-
- TIP program will be modified to await the RFNM for the positive
-
- acknowledgement before updating its counters. In such a case, if
-
- the RFNM does not come, the TIP can discard the buffered data and
-
- re-transmit new data to other servers.
-
- 8 There is adequate protection against the entry of forged data
-
- into the intermediate accounting files. This is primarily due to
-
- the system enforced limited access to Host-IMP messages and
-
- Host-Host links. In addition, messages received on such designated
-
- limited access links can be easily verified as coming from a TIP.
-
- The IMP subnet appends the signature (address) of the sending host
-
- to all of its messages, so there can be no forging. The Accounting
-
- Server is in a position to check if the source of the message is in
-
- fact a TIP data generator.
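The server-side rules in points 2 and 3 above can be sketched as follows. This is an illustrative model only: the class `Actser`, its methods, and the dictionary message layout are assumptions, and a Python list stands in for the intermediate accounting file.

```python
class Actser:
    """Accounting-server side: at most one buffered message per TIP."""

    def __init__(self):
        self.buffered = {}   # tip id -> message awaiting acknowledgement
        self.file = []       # stands in for the intermediate accounting file

    def on_data(self, tip, message):
        """Buffer a data message and return the message to echo to the TIP."""
        old = self.buffered.get(tip)
        if old is not None and old["seq"] == message["seq"]:
            # Same sequence number: keep the old data and re-echo it.
            return old
        # Otherwise the new message implicitly discards any old buffered data.
        self.buffered[tip] = message
        return message

    def on_ack(self, tip, ack):
        message = self.buffered.pop(tip, None)
        if ack == "go ahead" and message is not None:
            self.file.append((tip, message["data"]))
        # A "discard" simply drops whatever was buffered for this TIP.
```

A server that is otherwise occupied can simply not call `on_data` at all; nothing is buffered, so a later retry from the TIP is handled like any first request.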
-
- Current Parameters of the Protocol
-
- In the initial implementation, the TIP sends its accumulated
-
- accounting data about once every half hour. If it gets no positive
-
- acknowledgement, it tries to send with greater frequency (about
-
- every 5 minutes) until it finally succeeds. It can then return to
-
- the normal waiting period. (A TIP user logout introduces an
-
- exception to this behavior. In order to re-use the TIP port and its
-
- associated counters as soon as possible, a user terminating his TIP
-
- session causes the accounting data to be sent immediately).
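The timing rules of this paragraph amount to a small decision function. The sketch below uses the approximate intervals quoted above; the function and constant names are hypothetical.

```python
NORMAL_INTERVAL_MIN = 30   # "about once every half hour"
RETRY_INTERVAL_MIN = 5     # "about every 5 minutes"


def minutes_until_next_send(last_send_acknowledged, user_logged_out=False):
    """Return how long the TIP waits before its next transmission."""
    if user_logged_out:
        return 0           # send at once so the port and counters can be reused
    if last_send_acknowledged:
        return NORMAL_INTERVAL_MIN
    return RETRY_INTERVAL_MIN
```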
-
- Initially, our implementation calls for each TIP to remember a
-
"favored" accounting server. At the wait period expiration, the TIP
- will try to deposit the data at its "favored" site. If successful
-
- within a short timeout period, this site remains the favored site,
-
- and the wait interval is reset. If unsuccessful within the short
-
- timeout, the data can be sent to all servers*. The one replying
-
- first will update its file with the data and also become the
-
"favored" server for this TIP. With these parameters, a host would
- have to undergo a proceedable service interruption of more than a
-
- year in order for the potential sequence number problem outlined in
-
(6) above to occur.
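The order of magnitude of this estimate can be checked with a short calculation. The sketch assumes one sequence number increment per successful transfer at a fixed average interval; the function name and its defaults are illustrative.

```python
def years_to_wrap(bits=16, minutes_per_increment=30):
    """Time for the sequence number to cycle once, in 365-day years."""
    minutes = (1 << bits) * minutes_per_increment
    return minutes / (60 * 24 * 365)


# One increment per half hour with a 16-bit field: roughly 3.7 years.
wrap_years = years_to_wrap()
```

Transfers triggered by user logouts add increments beyond the half-hourly ones and so shorten the cycle, which is consistent with the more conservative figure of "more than a year" given above.
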
- Concluding Remarks
-
- When the implementation is complete, we will have a general
-
- data accumulation and collection system which can be used to gather
-
- a wide variety of information. The protocol as outlined is geared
-
- to gathering data which is either independent of the previously
-
- accumulated data items (e.g. recording names), or data which
-
- adheres to a commutative relationship (e.g. counting). This is a
-
- consequence of the policy of retransmission of different versions of
-
- the data to different potential collectors (to relieve TIP buffering
-
- problems).
-
- In the specified version of the protocol, care was taken to
-
- avoid duplicate data entries, at the cost of possibly losing some
-
- data through collector malfunction. Data collection problems which
-
- require avoiding such loss (at the cost of possible duplication of
-
- some data items) can easily be accommodated with a slight adjustment
-
- to the protocol. Collected data which does not adhere to the
-
- commutative relationship indicated above, can also be handled by
-
- utilizing more buffer space at the data generator sites.
-
--------------------
* The sequence number can be incremented for this new set of data
-
- messages, and the new data can also be sent to the slow host. In
-
- this way we won't be giving the tardy response from the old favored
-
- host unfair advantage in determining which server can respond most
-
- quickly. If there is no reply to this series of messages, the TIP
-
- can continue to resend the new data. However, the sequence number
-
- should not be incremented, since no reply was received, and since
-
- indiscriminate incrementing of the sequence number increases the
-
- chance of recycling during the lifetime of a message.
-