Physical Methods of Speed-Independent Module Design

Oleg Izosimov

INTEC Ltd, Room 321, 7a Myagi Street, Samara 443093, Russia

1. Introduction

Any method of logic circuit design is based on using formal models

of gates and wires. The simplest model of a gate is determined by only two

«parameters»: (a) the Boolean function to be calculated, and (b) a fixed

propagation delay. The simplest model of a wire is an ideal medium with

zero resistance and consequently, with zero delay. Such simple models allow

circuit design procedures which are a sequence of elementary steps easily

realized by a computer.

When logic circuits designed with the simplest models exhibit

unreliable operation as in the case of gate delay variations, designers

introduce less convenient but more realistic models with arbitrary but

finite delay. Using more complicated models may produce logic circuits that

are called speed-independent [1].

In speed-independent circuits transition duration can be arbitrary.

So a centralized clock cannot be used. Instead special circuitry to detect

output validity is applied. Besides, additional interface circuitry is

needed to communicate with the environment in a handshaking manner. A speed-

independent circuit can be seen as a module consisting of combinational

logic (CL) proper, CL output validity detector (OVD) and interface

circuitry (Fig.1). To enable OVD to distinguish valid output data from

invalid ones, the redundant coding scheme was proposed [2]. The main idea

of the scheme is to enumerate all possible input and output data, both

valid and invalid. The OVD must be provided with appropriate information on

data validity. To realize the idea of redundant coding some constraints on

CL design are imposed [3]:

[pic]

(i) CL must be free of delay hazards, i.e. CL output data word must not

be dependent on the relative delay of signal paths through CL.

(ii) In changing between input states, any intermediate or transient

states that are passed through must not be mapped by CL onto valid output

states.

When these constraints were formulated, the circuit designers

realised that not every Boolean description could be implemented in a speed-

independent style. Other approaches to speed-independent module design were

needed.

The design of speed-independent modules (SIMs) has two branches: logical
and physical. For a long time the physical branch was overshadowed in spite
of its competitiveness. The main properties of the physical approach to SIM
design are:

(a) Arbitrary coding scheme.

(b) Conventional procedure of operational unit design.

(c) Signal races in a SIM do not affect its proper operation.

In this paper we propose an approach based on the physical nature of

transitions in CL. We believe that each transition is actually a transfer

of energy which can be naturally detected by physical methods.

From the viewpoint of a radio engineer CL behaves like a radio

transmitter. It emits radio frequencies in the 10^8-10^10 Hz band modulated by

signals of 10^6-10^8 Hz. Obviously, the carrier wave is produced by gate

switchings during transitions in CL. The modulating wave is produced by

control schemes (OVD and interface circuitry) that detect transition

completion and inform the environment about the readiness of CL. OVD is a

kind of radio receiver that extracts the modulation envelope and enhances

the received signal. The main properties that an OVD circuit must exhibit from

a radio engineer's point of view are selectivity and high gain. Since the

useful signals can propagate through non-conducting medium, OVD circuits

can be coupled with CL indirectly.

Advances in semiconductor technology gave birth to two methods of

transition detection based on two kinds of information-carrying signal,

namely electromagnetic radiation and current consumption. Frequency of the

signal produced by switching logic gates is determined by gate delay.

For instance, a CMOS network of 1-ns gates produces a 1-GHz signal, and an ECL

array of 100-ps gates gives 10-GHz radiation. Logic circuits consisting of

10-ps gates will emit infra-red radiation. That signal could be easily

detected by photosensitive devices.

2. Background

Let us have a closer look at the structure of speed-independent

modules (SIM) as presented in Fig.1. All input data are processed in CL,

all output data are obtained from CL, too. So, CL is the only unit in SIM

which is involved in proper data processing. The result of that processing

is specified by Boolean functions. Algorithms for calculating the Boolean

function are realised by the internal structure of CL. Generally, its

structure is series-parallel, as is the algorithm it implements.

When an n-bit data word is put into the CL, n or more signal

propagation paths (SPPs) can be activated concurrently. So, one can say

that the calculation of a Boolean function by CL is of parallel nature. On

the other hand, each SPP is a gate chain which processes data in a serial

manner. So, calculation in CL is also of sequential nature.

The OVD circuit is intended for detecting transient and steady

«states» of CL. If any SPP in CL is still «active», CL is in transient

state, otherwise it is in steady state. Each gate switching results in both

logical and electromagnetic effects on its surrounding medium. The logical

effects of switching have been heavily investigated; we consider the physical

ones.

To provide speed-independence of the module the OVD and interface

circuitry must also work in a speed-independent mode. This means that any

arbitrary but finite transistor or wire delay cannot impair proper

operation of OVD and interface circuitry.

The interface circuitry is a mediator between OVD and environment of

SIM. It implements any kind of signalling convention, commonly a two- or

four-cycle one [4] based on request (Req) and acknowledgement (Ack)

signals. The interface circuitry receives the output validity (OV) signal

from the OVD circuit, a Req signal from the environment and transmits an

Ack signal to the environment (Fig.1).

Consider an algorithm of operation for interface circuitry realizing

speed-independent four-cycle signalling convention (FCSC). In accordance

with FCSC the control signals must go in the following sequence: Req+, OV-,
OV+, Ack+, Req-, Ack-, where «+» denotes a rising edge of the signal and «-»
denotes a falling edge. All signals are assumed to adhere to positive logic.

Initially the signals Req and Ack are low, the signal OV is high. If the

environment state changes, the Req signal rises and transient state of CL

occurs (OV-). Upon completion of the transitions in CL, signal OV rises and

the interface circuitry generates the Ack signal rising. After that the

environment produces a falling Req signal and then the interface circuitry

transmits the falling Ack signal to the environment. All the signals have

to be reset into the initial state.
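The four-cycle exchange described above can be sketched as a small trace checker. This is only an illustration, assuming the OVD responds to the transition (so both OV edges occur) and writing the OV+ edge explicitly; the names FCSC_CYCLE, is_valid_fcsc and final_levels are hypothetical and are not part of the original design.

```python
# Order of signal edges in one four-cycle handshake, following the prose
# description (the OV+ edge is written out explicitly; this is an assumption).
FCSC_CYCLE = ["Req+", "OV-", "OV+", "Ack+", "Req-", "Ack-"]


def is_valid_fcsc(trace):
    """Return True if trace consists of whole FCSC cycles in the right order."""
    if not trace or len(trace) % len(FCSC_CYCLE) != 0:
        return False
    return all(edge == FCSC_CYCLE[i % len(FCSC_CYCLE)]
               for i, edge in enumerate(trace))


def final_levels(trace):
    """Apply the edges to the initial levels (Req=0, OV=1, Ack=0)."""
    levels = {"Req": 0, "OV": 1, "Ack": 0}
    for edge in trace:
        name = edge[:-1]
        new_level = 1 if edge[-1] == "+" else 0
        if levels[name] == new_level:
            # A rising edge on a high signal (or falling on a low one) is illegal.
            raise ValueError(f"{edge} does not change the level of {name}")
        levels[name] = new_level
    return levels
```

After one complete cycle final_levels returns the initial levels again, which corresponds to the requirement that all signals be reset into the initial state.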

To develop the interface circuitry a circuit designer must take into

account that any OVD circuit has a finite (non-zero) turn-on delay ton. This

means that the OVD cannot respond to transitions of short duration ttr < ton.

An example of interface circuitry is shown in Fig.2. It contains a

flip-flop, a NOR-gate, an asymmetrical delay and an inverter as an output

stage [5].

[pic]

The asymmetrical delay is intended to delay the rising Req signal for a
period τ+, where τ+ > ton. The delay of the falling Req signal, denoted τ-,
is to be as short as possible. Note that speed-independent operation of the
interface circuitry is vulnerable to variation of the delay τ+. If τ+
becomes less than ton, proper operation of the SIM cannot be guaranteed. On
the other hand, if τ+ is much greater than ton, the performance of the SIM
will be significantly reduced. To match τ+ to ton exactly, a circuit
emulator can be used.

Such an emulator is either an exact copy of OVD or its functional

copy, i.e. resistive-capacitive model of OVD's critical path. In the chip

the emulator must be placed next to active OVD circuit in order to ensure

identical conditions of fabrication and operation.

In this example we use a simplified asymmetrical delay implemented

as an asymmetrical CMOS inverter chain (Fig.3). Contrary to the common

inverter an asymmetrical one has non-equal rise and fall times of output

signal.

[pic]

A time diagram for the interface circuitry is presented in Fig.4 for two
cases: (a) ttr < ton and (b) ttr ≥ ton. In case (a) the signal sequence
Req+Ack+ takes a period of (τ+ + tNOR), where τ+ is the rising-edge delay of
the asymmetrical delay element and tNOR is the NOR-gate delay. In case (b)
the same sequence takes (ttr + toff + tNOR), where toff is the turn-off
delay of the OVD circuit. When the SIM returns to the initial steady state,
the signal sequence Req-Ack- takes (τ- + tNOR), where τ- is the falling-edge
delay of the asymmetrical delay element.
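The two timing cases can be put into a short numerical sketch. The function and parameter names are hypothetical: tau_plus and tau_minus stand for the rising-edge and falling-edge delays of the asymmetrical delay element, and the numbers in the usage note are illustrative, not taken from the simulations.

```python
def req_to_ack_rise(t_tr, t_on, t_off, tau_plus, t_nor):
    """Time from Req+ to Ack+ (all arguments in the same time unit).

    Case (a): the transition is shorter than the OVD turn-on delay, so the
    OVD never responds and the asymmetrical delay sets the latency.
    Case (b): the OVD responds, so the latency follows the transition itself.
    """
    if t_tr < t_on:
        return tau_plus + t_nor
    return t_tr + t_off + t_nor


def req_to_ack_fall(tau_minus, t_nor):
    """Time from Req- to Ack- when the module returns to its steady state."""
    return tau_minus + t_nor
```

For example, with ton = 10, toff = 8, tau_plus = 12 and tNOR = 1 (arbitrary units), a transition of duration 2 is acknowledged after 13 units and a transition of duration 30 after 39 units.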

[pic]

After considering the SIM in operation it is obvious that the main

problems of the module design are in the area of CL and OVD interaction.

This includes (a) kind of signal used as a carrier of information about CL

output validity, and (b) method of OVD circuit design.

4. Current consumption detection

Using current consumption of CMOS CL for output validity detection

was proposed in 1990 [7]. Contrary to the method of EMR detection this one

is based on introducing direct coupling of source and receiver. While CL is
in steady state it consumes a current of about 10^-9 to 10^-8 A, which is
too small to switch the OVD. The interface circuitry gets information on CL
output validity and in turn informs the environment that CL is ready to
process input data. When input data arrive, CL changes its state to
«transient» and the current consumption increases to 10^-4 to 10^-2 A, which
switches the OVD, thus informing the interface circuitry about output
invalidity. The latter lets the environment know that CL is busy.

After the computations in the CL are finished, the current consumption

decreases down to the steady state value, and the OVD sends a signal of

output validity.

4.1 Information carrying signal

Current consumption by CMOS CL contains useful information on CL

state. CMOS CL is a network of CMOS gates, so the current consumed by CL is

a superposition of currents consumed by CMOS gates included in the CL.

Each CMOS gate contains PMOS transistor and NMOS transistor networks

(Fig.5). While a gate is in a steady state either the PMOS or the NMOS

network is in a conducting mode. When a gate switches the non-conducting

transistor network becomes conducting. There is usually a short period in

switching time when both networks are in a conducting mode.

[pic]

Generally, current consumed by a CMOS gate includes three

components [9,10]:

(a) leakage current Ilk passing between power supply and ground due

to finite resistance of non-conducting transistor network;

(b) short-circuit current Isc flowing while both networks are in a

conducting mode;

(c) charging current ILC of the load capacitance C_L, flowing through the
conducting PMOS network and C_L while a CMOS gate is switching from low to
high output voltage.

SPICE simulation has shown [5] that the amplitude of the current consumed
by a typical CMOS inverter depends on the load capacitance C_L and is
limited by the non-zero resistance of the conducting PMOS network (Fig.7).
The integral of the consumed current is proportional to C_L. When a gate
switches from high to low output voltage, the component ILC is negative in
direction and negligible in value (Fig.7b). Evidently, switching from high
to low output voltage occurs at the expense of the energy accumulated in
C_L during the previous switching from low to high output voltage. The
component Isc does not depend on the direction in which a gate switches.

[pic]

[pic]

The component ILC equals ILC = C_L·Vdd·f, where C_L is the load
capacitance, Vdd is the power supply voltage and f is the gate switching
frequency. Veendrick has investigated the dependence of the component Isc
on C_L and on the rise-fall time of the input signal [10]. He showed that
if both input and output signals have the same rise-fall time, the
component Isc cannot be more than 20 percent of the total current
consumption [10]. However, when the output signal rise-fall time is less
than the input one, the component Isc can be of the same order of magnitude
as ILC. In that case it must be taken into account. As to the component
Ilk, it depends entirely on CMOS process parameters, and for
state-of-the-art CMOS devices Ilk is about 10^-15 to 10^-12 A.

So, the analysis of CMOS gate current consumption allows us to

conclude that in the transient state a CMOS gate consumes a current
I = Ilk + Isc + ILC, and in the steady state it consumes only Ilk << I. The
difference between the two states in terms of current consumption is
several orders of magnitude. So, CMOS gate output validity detection is possible,

both in principle and in practice.
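The size of the gap between the two states can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions: the load capacitance (1 pF), supply voltage (5 V), switching frequency (100 MHz) and leakage (10^-12 A) in the usage note are hypothetical round numbers, and Isc is set to its upper bound for equal rise-fall times (20 percent of the total, i.e. one quarter of ILC when Ilk is negligible).

```python
def load_current(c_load, v_dd, freq):
    """Charging component I_LC = C_L * V_dd * f, in amperes."""
    return c_load * v_dd * freq


def transient_current(i_lk, i_sc, i_lc):
    """Total current of a switching gate: I = I_lk + I_sc + I_LC."""
    return i_lk + i_sc + i_lc


def transient_to_steady_ratio(i_lk, i_sc, i_lc):
    """How many times the transient current exceeds the steady-state one."""
    return transient_current(i_lk, i_sc, i_lc) / i_lk
```

With the values above, load_current(1e-12, 5.0, 1e8) gives 0.5 mA, and with Isc = 0.125 mA the ratio to a 1 pA leakage current is of the order of 10^8.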

In Section 2 we presented series-parallel model of computations in

CL. We showed that in every moment during switching current consumed by CL

is a superposition of the currents consumed on the activated signal

propagation paths (SPPs). Now, considering CL implemented by CMOS devices

we should note that while logical signal propagates through SPP the

neighbouring gates switch in opposite directions. That is why a curve of

current consumed by a ten inverter chain (Fig.8) looks like a combination

of crests and troughs. Nevertheless, in the very lowest point of the curve

the current consumed by CL in a transient state remains several orders of magnitude greater

than in a steady state.

[pic]

4.2 OVD implementation

The proposed OVD circuit, shown in Fig.9, is a threshold circuit

translating an analog current signal I into a logical signal OV.

[pic]

The OVD circuit contains a current-to-voltage converter (CVC)

consisting of the resistor R1 and the diode D1. The OVD also contains a

comparator implemented by the MOS transistors M1-M7 and resistors R2, R3.

CMOS CL consumes the current I and introduces a capacitance Cin . The

capacitance Cout represents the load caused by the interface circuitry. A

low potential output signal of OVD corresponds to CL output validity. A

high potential output signal corresponds to CL output invalidity. So, OVD

generates OV signal in negative logic manner.

The transfer characteristic of the CVC is determined by a system of

three equations:

[pic] [pic]

where I is an input current of CVC, V is a voltage drop on the CVC circuit,

Ir is a current flowing through the resistor R1, Id is a current passing

through the diode D1, I0 is a leakage current of the diode, rb is a bulk

resistance of the diode. Here [pic] stands for kT/q where k is Boltzmann's

constant, T is absolute temperature, q is charge of an electron.
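A system consistent with the symbol definitions above, and with the way Equations (1) and (2) are used later in this Section, can be sketched as follows. This is a reconstruction under assumptions: the assignment of the numbers (2) and (3) and the exact form of the diode expression are inferred, and \varphi_T denotes kT/q.

```latex
\begin{align}
I   &= I_r + I_d, \tag{1}\\
V   &= \varphi_T \ln\!\left(\frac{I_d}{I_0} + 1\right) + I_d\, r_b, \tag{2}\\
I_r &= \frac{V}{R_1}. \tag{3}
\end{align}
```

In this form the small-current region (I_d negligible) reduces to I ≈ V/R_1, while in the large-current region the term I_d r_b dominates the change in V, so that V grows almost linearly with I.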

Equations (1)-(3) determine the functional connection F between

input current I and voltage drop V: [pic]. Graphic solution of the system

is shown in Fig.10.

[pic]

CVC parameters to be calculated are R1 and rb. Initial data for

calculating R1 are the threshold voltage drop Vth and corresponding

threshold input current Ith . Value Ith is determined by minimal current

consumed by CMOS CL in transient state. Initial data for calculating rb are

maximal voltage drop Vmax and corresponding maximal input current Imax.

Value Imax is determined by the maximal number of gates in CL switching

simultaneously and their load capacitances.

The comparator chosen is the CMOS ECL receiver proposed by Chappell

et al.[11]. The circuit includes a single differential amplifier stage with

built-in compensation for parameter variations, followed by a CMOS

inverter. The comparator has 100-mV worst-case sensitivity in 1-μm

technology. Detailed static and dynamic analysis of the comparator circuit

was given in [11].

The comparator compares input voltage signal Vin with reference

voltage Vref. If Vin < Vref, the comparator output signal equals logical
«zero», which means that CL outputs are valid. Otherwise, if Vin > Vref, the
comparator output signal equals logical «one», which means that the

outputs are invalid.

As it follows from the OVD circuit configuration,

[pic] [pic]

where Vdd is a voltage of power supply.

Equations (4) and (5) allow us to calculate the threshold voltage

drop V of the CVC circuit:

since [pic], so [pic] [pic]

If 0 < V < 500 mV, the diode D1 of the CVC operates in the very small
current region, where Id ≈ 0 and Id << Ir. So the component Id in Equation
(1) can be neglected and I ≈ Ir = V/R1.

For practical values of [pic] the threshold input current of the

OVD circuit is inversely proportional to the resistance of R1: [pic].

Substituting Equation (6) yields

[pic].

The value of rb must be chosen with regard to the maximal voltage drop
Vmax.

If V > 750 mV, the diode D1 is in active mode and, provided rb << R1, the
condition Ir << Id holds. So, in the large current region I ≈ Id, and
Equation (2) determines an almost linear dependence between I and V. For
instance, if the maximal voltage drop Vmax = 900 mV and the maximal input
current Imax = 2 mA, then in accordance with Equation (2) rb ≈ 100 Ω.
Typical element values for the OVD circuit with Vth = 400 mV are given in
Table 1.

[pic]

The turn-on ton and turn-off toff delays of the OVD circuit depend

on the OVD itself and the CMOS CL as well. (Switching the OVD output from

low to high voltage is called «turning-on» and reverse switching is called

«turning-off».)

Consider a piece of CMOS CL and its interaction with OVD circuit

(Fig.11). The piece is an SPP including N logic gates. Each gate is shown

symbolically as a connection of PMOS and NMOS networks. All the

capacitances affecting ton and toff can be brought down to three

components:

(i) CLi is the load capacitance of the i-th gate;

(ii) Cpsi is the power supply bus capacitance associated with the i-th

gate;

(iii) Cin is the input capacitance of the OVD circuit.

[pic]

Let pi be the probability of the i-th gate being in the state of high

output potential. In this state the capacitance CLi is connected with power

supply bus through the low channel resistance of turned-on transistors in

PMOS network of the i-th gate. Then equivalent capacitance Ceq connected to

the OVD circuit input equals

[pic] (7)

where N is a number of gates in the considered SPP. Here the resistance of

conducting PMOS network is assumed to be negligible.

Equation (7) is also true for CL including several SPPs. In that

case summing must be carried out for all the gates belonging to CL.

Simulation shows that ton and toff are proportional to the OVD time
constant τ = R1·Ceq. It was also found that when N > 20, the summation term
in Equation (7) can be much larger than the component Cin. Due to the
voltage drop V, the effective power supply voltage is reduced and CL
performance is decreased by about 35 percent [7].
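The dependence of the OVD time constant on the size of CL can be sketched numerically. The code assumes that Equation (7) has the form Ceq = Cin + sum over i of (Cpsi + pi·CLi), which is what the definitions of the three capacitance components suggest; this form, the function names and the numbers in the usage note are assumptions rather than values from the paper.

```python
def equivalent_capacitance(c_in, gates):
    """Equivalent capacitance seen at the OVD input, in farads.

    gates is an iterable of (c_ps, p_high, c_load) tuples, one per gate:
    the supply-bus capacitance, the probability of a high output, and the
    load capacitance of that gate.
    """
    return c_in + sum(c_ps + p_high * c_load
                      for c_ps, p_high, c_load in gates)


def ovd_time_constant(r1, c_eq):
    """Time constant R1 * Ceq, in seconds."""
    return r1 * c_eq
```

For instance, 20 gates with Cps = 0.05 pF, p = 0.5 and a 0.2 pF load each contribute 3.0 pF; with Cin = 0.5 pF and R1 = 1 kΩ the time constant is 3.5 ns, most of it due to CL rather than to the OVD input.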

In order to make the SIM operate faster, special attention must be paid

to reducing the capacitance introduced by CL.

4.3 Speed-independent address bus

The simplest case of CL is a scheme degenerated into a set of wires

called a multi-bit bus. Let us develop the OVD circuit for such a CL.

Multi-bit bus consists of several lines. Each line can be

considered as a medium for signal propagating from one end of the chip

to another. Delay of signal propagation through a line depends on several

factors:

(a) output impedance and symmetry of driver circuit;

(b) initial state of the line: if driver is symmetrical, line switching

from high to low voltage lasts shorter than reverse switching;

(c) electrical properties of the line as a signal propagation medium

(resistance of conducting layer and capacitances between the line and other

wires next to it);

(d) length of the line;

(e) input impedance and sensitivity of receiving circuit.

Since different lines of the bus operate in different conditions

(a)-(e), signal propagation delays are different, too. From the standpoint

of environment the bus behaves like any other more complicated CL.

Asynchronous RAM designers have used bus transition detectors since the 1980s

[13-15]. Such a detector is usually based on double-rail address coding

and two series connected transistors for each address bit [15]. One of

the transistors receives the true address signal and the other receives

the complementary address signal of the particular address bit. For any

steady state condition one of the transistors will be turned on and one

will be turned off. There will be a finite rise and fall time during a

transition of the address bit. There is a short time during which both

transistors are conducting. The establishment of the conductive path

provides the detection of the address transition. In the first

asynchronous RAMs the output signal of the transition detector was used for

bit line precharging and for enabling/disabling sense amplifiers and

peripheral circuitry.

The self-timed RAM announced in 1983 [14] used transition detectors not only
for address transitions but also for detecting read/write completion and
address/bit line precharge completion.

The CMOS transition detector was invented in 1986 [15]. This circuit

is also based on double-rail coding and uses a pair of series-connected

NMOS transistors (Fig.12). The scheme for n-bit bus control contains n line

transition detectors (LTDs) and n AND-gates. Outputs of AND-gates are

united in node M forming wired OR. The output inverter serves as a pulse

shaper. Capacitors C1 and C2 are intended to prolong rise time of the LTD

output signal (true and complementary). This is necessary for reliable

detection.

[pic]

The main drawback of the circuit is speed dependence. One can see

that if the true and complementary address bit signals have different

propagation delays, the conducting path via NMOS transistors will never be

formed.

Using the OVD circuit proposed in Section 4.2 as LTD we can avoid

this drawback.

Note that address transmission through the address bus is

unidirectional. So to detect completion of bus transition it is enough to

recognize the bus state at the destination end. For this purpose we modify

CL to consist of n lines. The modification means introducing n LTDs, each

actually a CMOS inverter chain. Each chain contains two inverters loaded

with a capacitance (Fig.13). Input of each LTD is connected with

corresponding line of the bus at the destination end. Power supply pads of

all LTDs are connected to the current input of the same OVD circuit.

[pic]

The parameters of the input current signal for the OVD circuit are

varied by

(i) value of capacitances C1 and C2 ;

(ii) dimensions of MOS transistors M1 -M4 .

Since all transitions in CL are of the same duration and can be
lengthened to outlast the OVD turn-on time, we simplify the interface
circuitry by omitting the asymmetrical delay.

Due to short duration of normal transition in this CL we must take

into account the integral nature of the sensitivity of the OVD circuit. OVD

sensitivity depends on both amplitude and width of input current pulse.

Simulated operation region of the OVD circuit for current pulses shorter

than 30ns is shown in Fig.14. It is obvious that in this case the threshold

of the OVD circuit must be determined by threshold charge Qth value. The

OVD input charge Q equals [pic], where I is the OVD input current, t is the
moment of time when the transition occurs, and w is the width of the input
current pulse. The turn-on condition for the OVD circuit is Q=Qth.

[pic]

When the LTD circuit shown in Fig.13 is used, the charge value Q is

determined by either C1 or C2. Namely, if the line goes from low to high
voltage, Q = Vc·C2. If the line goes in the reverse direction then [pic].
Here Vc is the charging/discharging voltage, approximately equal to the
effective power supply voltage: Vc ≈ Vdd - V, where Vdd is the OVD power
supply voltage and V is the CVC voltage drop.

The OVD circuit with typical parameters (see Table 1) has a threshold
charge value Qth = 4.0×10^-12 C. When C1 = C2 = C_L, the minimal value of
C_L that keeps the OVD operable is about 1.0×10^-12 F.
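The relation between the threshold charge and the minimal LTD capacitance can be checked with a short sketch. It assumes that the charge delivered per transition is the capacitance times the effective supply voltage, and that the OVD turns on when the charge reaches or exceeds Qth; the function names are hypothetical, and the effective voltage of 5.0 V is the value quoted later in this Section.

```python
Q_TH = 4.0e-12  # threshold charge of the OVD in coulombs (value from the text)


def ltd_charge(c_load, v_eff):
    """Charge drawn when a capacitance c_load is charged to v_eff volts."""
    return c_load * v_eff


def ovd_turns_on(charge, q_th=Q_TH):
    """Assumed turn-on condition: delivered charge reaches the threshold."""
    return charge >= q_th


def min_load_capacitance(v_eff, q_th=Q_TH):
    """Smallest capacitance that still delivers the threshold charge."""
    return q_th / v_eff
```

With an effective supply voltage of 5.0 V this gives a minimum of 0.8 pF, which agrees with the figure of about 1 pF quoted above.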

Influence of transistors M1 -M4 dimensions on LTD delay d is

determined by approximation [17]:

[pic]

where ~ is a sign of proportionality, Gn and Gp are the conductances of

NMOS and PMOS transistors respectively (CL =C1 =C2.)

Since [pic] and [pic] where W and L are width and length of

transistor channels of the corresponding conduction type, the LTD delay d

is proportional to [pic].

It has been found that for [pic], [pic], C_L = 1.0 pF and Vdd - V = 5.0 V,
the LTD delay is d = 7.6 ns.

When LTD works jointly with the OVD in the speed-independent bus,

the real value of the LTD delay will increase by 30-40 percent due to

OVD's R1 effect on the effective power supply voltage.

To determine the appropriate value of R1 in the OVD circuit we must

know threshold input current Ith corresponding to threshold voltage drop

Vth recommended to be equal to 400mV.

The average input current Iav in the transient state of one line is
determined by the expression Iav = C_L·v, where v is the average rate of
increase of the output signal of an inverter included in the LTD. For
typical values v = 1.0×10^9 V/s and C_L = 1.0 pF, Iav = 1.0 mA. Taking
Ith = 0.4 mA and Imax = 2.0 mA, we obtain R1 = 1 kΩ and rb = 100 Ω.
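The arithmetic behind these values can be reproduced in a few lines. The sketch assumes the small-current approximation I ≈ V/R1 of the CVC, so that R1 = Vth/Ith; the function names are hypothetical.

```python
def average_line_current(c_load, slew_rate):
    """Iav = C_L * v, with c_load in farads and slew_rate in volts per second."""
    return c_load * slew_rate


def cvc_resistor(v_th, i_th):
    """R1 from the threshold voltage drop and threshold current (ohms)."""
    return v_th / i_th


def threshold_current(v_th, r1):
    """Threshold input current for a given R1 (amperes)."""
    return v_th / r1
```

With C_L = 1.0 pF and v = 1.0×10^9 V/s the average current is 1.0 mA, and Vth = 400 mV with Ith = 0.4 mA gives R1 = 1 kΩ, as stated above.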

Simulation has shown that in this case OVD turning-on delay can be

approximated by an empirical expression:

ton[ns]=8.1+0.1n

where n is the address bus width in bits. The total delay of recognizing an
address transition is ttot = d·g + ton, where g is the coefficient by which
the LTD delay increases because of the reduced power supply voltage. As
stated above, g ≈ 1.35. It can be seen that if n = 32, ttot ≈ 21.6 ns.
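The total detection delay can be evaluated for other bus widths with a short sketch. It uses the empirical turn-on expression and the values d = 7.6 ns and g = 1.35 given in the text; the function names are hypothetical.

```python
def ovd_turn_on_ns(n_bits):
    """Empirical OVD turn-on delay in nanoseconds: 8.1 + 0.1 * n."""
    return 8.1 + 0.1 * n_bits


def total_detection_delay_ns(n_bits, d_ns=7.6, g=1.35):
    """Total delay of recognizing an address transition: d * g + ton."""
    return d_ns * g + ovd_turn_on_ns(n_bits)
```

For n = 32 the two terms are 10.26 ns and 11.3 ns, which add up to 21.56 ns, in agreement with the rounded value of 21.6 ns.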

4.4 Speed-independent adder

The circuit we use in this Section as a CL has been a touchstone for many
speed-independent circuit designers for about four decades. We mean the
ripple carry adder (RCA), which is actually a chain of one-bit full adders
(Fig.15).

[pic]

Each full adder calculates two Boolean functions: the sum
si = ai ⊕ bi ⊕ ci and the output carry ci+1 = ai·bi + bi·ci + ai·ci, where
ai and bi are the summands, ci is the input carry and ⊕ stands for the XOR
operation.
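The two Boolean functions and the way the carry ripples through the chain can be sketched in a few lines. This is a behavioural illustration only, with hypothetical function names; it says nothing about gate-level structure or timing.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out) for bits a, b and carry c."""
    s = a ^ b ^ c                          # sum = a XOR b XOR c
    c_out = (a & b) | (b & c) | (a & c)    # carry = majority of a, b, c
    return s, c_out


def ripple_carry_add(x, y, n_bits, c_in=0):
    """Add two n_bits-wide integers with a chain of full adders.

    Returns (sum modulo 2**n_bits, carry out of the last stage).
    """
    carry = c_in
    total = 0
    for i in range(n_bits):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry feeds the next stage
        total |= s << i
    return total, carry
```

For example, ripple_carry_add(15, 1, 4) activates the carry chain through all four stages and returns (0, 1), while ripple_carry_add(5, 3, 4) returns (8, 0).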

In 1955 Gilchrist et al. proposed a speed-independent RCA with a carry

completion signal [18]. In the 1960s that circuit was carefully analyzed and

improved [19-21]. In 1980 Seitz used RCA for illustrating his concept of

equipotential region and his approach to self-timed system design [4].

Now we use RCA as a CL for illustrating our approach to SIM design.

As it was shown in Section 4.2 the turn-on and turn-off delays of

the OVD circuit are proportional to the equivalent capacitance Ceq

associated with the OVD circuit input. Capacitance Ceq depends linearly on
the number of gates N in the CMOS CL. To speed up a SIM it is necessary to
reduce N. This can be achieved by structural decomposition of the CMOS CL
into subcircuits CL1, CL2, etc. Each subcircuit CLi is connected to its own

detecting circuit OVDi or directly to the power supply if this subcircuit

transition does not affect the transition duration in CL as a whole. Each

detecting circuit OVDi generates its own OV signal which is combined with

other OVDs' output signals via a multi-input OR (NOR) element. The output

signal of that element serves as OV signal of the CMOS CL.

The computation time of a multi-bit RCA is determined by the length of the
longest activated carry chain. Many papers have been devoted to the
analysis of carry generation and carry propagation in the RCA [19-21]; many
of them contain their own methods for estimating or calculating the average
length of the longest activated carry chain. We do not intend to add
another one.

Let us have a look inside the RCA. As mentioned above, the RCA consists of
one-bit full adders, and each full adder consists of two parts: a
sum-forming part (si) and a carry-forming part (ci+1) (Fig.16).

In a multi-bit RCA the sum-forming parts do not interact with each other
and do not affect the transition duration in the RCA. Each carry-forming
part receives the ci signal from the preceding carry-forming part and sends
the ci+1 signal to the subsequent one.

To decompose the RCA we use three heuristic tricks:

(i) We connect all sum-forming parts directly to the power supply.

(ii) We divide each carry-forming part into three subcircuits, denoted in
Fig.16 by the numbers 1, 2 and 3. We connect all subcircuits 1 directly to
the power supply because they do not have the input ci and so do not
contain a carry propagation path.

(iii) We connect all subcircuits 2 to OVD1 and all subcircuits 3 to OVD2.
The outputs of OVD1 and OVD2 are connected to a two-input NOR-gate forming
the RCA OV signal in positive logic (Fig.17).

OVD1 and OVD2 input currents I1 and I2 curves for 6-bit RCA and

longest transition duration are shown in Fig.18.

Taking Vth1 = Vth2 = 400 mV, we calculated the parameters of the OVD
circuits and obtained R11 = 5 kΩ, Ith1 = 0.08 mA, R12 = 3 kΩ and
Ith2 = 0.13 mA. The dependence of the OVD1 and OVD2 delays on the number of
bits in the RCA is shown in Fig.19.

4.5 Comparison of SIMs with synchronous counterparts

Transition duration in CL is a random variable. Probability of

transition with duration D is determined by implemented Boolean function

and distribution of input logical combinations. Domain of possible values

for variable D occupies the interval [0;Dmax]. Here Dmax is a length of

critical path in CL.

Let Dme = Σ pi·Di (summed over all SPPs) be the mathematical expectation of
the transition duration in CL, where Di is the length of the i-th SPP in CL
and pi is the probability of the i-th path being the longest activated SPP.

When CL works in the synchronous mode, the cycle duration Ts is chosen
with regard to the maximal transition duration Dmax. A certain margin must
be added to Dmax to provide reliable operation of CL in the case of CL
parameter variations: Ts = k·Dmax, where k is a margin coefficient.

In a SIM the cycle duration is a random variable with expectation
Tsi = g·Dme + toff + tif, where g is the coefficient by which the CL delay
increases because of the reduced power supply voltage, toff is the turn-off
delay of the OVD circuit, and tif is the interface circuitry delay.

We define the efficiency E of the speed-independent mode of CL operation
as the relative increase in SIM performance in comparison with its
synchronous counterpart: [pic].

Generally, speed-independent mode is more efficient than synchronous

one if Ts >Tsi or, in other words, [pic].

In the case of the RCA, [pic], where tc is the delay of a carry-forming
part and n is the number of full adders in the RCA.

It has been shown [19] that in an n-bit RCA Dme ≈ tc·log2(5n/4). Then, in
the case of speed-independent operation, Tsi = g·tc·log2(5n/4) + toff + tif.

We have obtained the dependence of Ts and Tsi on the number of bits in
the RCA, shown in Fig.20. As can be seen, speed-independent operation of
the RCA is more efficient when n > 8.
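The comparison can be explored numerically with a small sketch. Two assumptions are made here that are not stated explicitly in the text: the critical path of the RCA is taken as Dmax = n·tc, and the efficiency is taken as E = Ts/Tsi - 1. The function names and all numerical values in the usage note are hypothetical and are not the parameters behind Fig.20.

```python
import math


def sync_cycle(n, t_c, k):
    """Synchronous cycle Ts = k * Dmax, assuming Dmax = n * tc."""
    return k * n * t_c


def si_cycle(n, t_c, g, t_off, t_if):
    """Expected speed-independent cycle Tsi = g*tc*log2(5n/4) + toff + tif."""
    return g * t_c * math.log2(5 * n / 4) + t_off + t_if


def efficiency(t_s, t_si):
    """Relative performance gain of the SIM (assumed form E = Ts/Tsi - 1)."""
    return t_s / t_si - 1.0


def smallest_winning_width(t_c, k, g, t_off, t_if, n_max=128):
    """Smallest n for which Ts > Tsi, or None if there is none up to n_max."""
    for n in range(1, n_max + 1):
        if sync_cycle(n, t_c, k) > si_cycle(n, t_c, g, t_off, t_if):
            return n
    return None
```

With the illustrative values tc = 2 ns, k = 1.2, g = 1.35, toff = 10 ns and tif = 2 ns, the synchronous cycle grows linearly with n while the speed-independent one grows only logarithmically, so the fixed overhead toff + tif is recovered once the adder is wide enough.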

5. Conclusion

6. Acknowledgement

I would like to thank Igor Shagurin and Vlad Tsylyov of the Moscow

Physical Engineering Institute for helpful discussions of this work. I am

also grateful to Chris Jesshope of University of Surrey and Mark Josephs of

Oxford University who kindly provided the latest material on their research

in the area of delay-insensitive circuit design.

References

[1] Miller, R.E., Switching theory (Wiley, New York, 1965),

vol.2, Chapter 10.

[2] Unger, S.H., Asynchronous Sequential Switching Circuits

(Wiley, New York, 1969).

[3] Armstrong, D.B., A.D. Friedman, and P.R. Menon, Design of

Asynchronous Circuits Assuming Unbounded Gate Delays, IEEE

Trans. on Computers C-18 (12) (1969) 1110-1120.

[4] Seitz, C.L., System timing, in: C.A. Mead and L.A. Conway,

eds., Introduction to VLSI Systems (Addison-Wesley, New

York, 1980), Chapter 7.

[5] Izosimov, O.A., I.I. Shagurin, and V.V. Tsylyov, Physical

approach to CMOS module self-timing, Electronics Letters 26 (22)

(1990) 1835-1836.

[10] Veendrick, H.J.M., Short-circuit dissipation of static CMOS
circuit and its impact on the design of buffer circuits,
IEEE J. Solid-State Circuits SC-19 (4) (1984) 468-473.

[11] Chappell, B.A., T.I. Chappell, S.E. Schuster, H.M. Segmuller,
J.W. Allan, R.L. Franch, and P.J. Restle, Fast CMOS ECL
receivers with 100-mV worst-case sensitivity, IEEE J. Solid-State
Circuits SC-23 (1) (1988) 59-67.

[13] Chu, S.T., J. Dikken, C.D. Hartgring, F.J. List, J.G.
Raemaekers, S.A. Bell, B. Walsh, and R.H.W. Salters, A 25-ns
Low-Power Full-CMOS 1-Mbit (128K×8) SRAM, IEEE J. Solid-State
Circuits SC-23 (5) (1988) 1078-1084.

[14] Frank, E.H., and R.F. Sproull, A Self-Timed Static RAM, in:
Proc. Third Caltech VLSI Conference (Springer-Verlag,
Berlin, 1983) pp.275-285.

[15] Donoghue, W.J., and G.E. Noufer, Circuit for address transition
detection, US Patent 4563599, 1986.

[17] Huang, J.S.T., and J.W. Schrankler, Switching characteristics
of scaled CMOS circuits at 77K, IEEE Trans. on Electron
Devices ED-34 (1) (1987) 101-106.

[18] Gilchrist, B., J.H. Pomerene, and S.Y. Wong, Fast Carry Logic
for Digital Computers, IRE Trans. on Electronic Computers EC-4
(4) (1955) 133-136.

[19] Hendrickson, H.C., Fast High-Accuracy Binary Parallel
Addition, IRE Trans. on Electronic Computers EC-9 (4) (1960)
465-469.

[20] Majerski, S., and M. Wiweger, NOR-Gate Binary Adder with Carry
Completion Detection, IEEE Trans. on Electronic Computers EC-16
(1) (1967) 90-92.

[21] Reitwiesner, G.W., The determination of carry propagation
length for binary addition, IRE Trans. on Electronic Computers
EC-9 (1) (1960) 35-38.

Appendix

SPICE2G.6: MOSFET model parameters

| No. | Name | Parameter | Units | PMOS value | NMOS value |
| 1 | LEVEL | Model index | - | 3 | 3 |
| 2 | VTO | Zero-bias threshold voltage | V | -1.337 | 1.161 |
| 3 | KP | Transconductance parameter | A/V^2 | 2.3×10^-5 | 4.6×10^-5 |
| 4 | GAMMA | Bulk threshold parameter | V^1/2 | 0.501 | 0.354 |
| 5 | PHI | Surface potential | V | 0.695 | 0.660 |
| 6 | RD | Drain ohmic resistance | Ohm | 333 | 85 |
| 7 | RS | Source ohmic resistance | Ohm | 333 | 85 |
| 8 | CBD | Zero-bias B-D junction capacitance | F | 1.98×10^-14 | 6.9×10^-15 |
| 9 | CBS | Zero-bias B-S junction capacitance | F | 1.98×10^-14 | 6.9×10^-15 |
| 10 | IS | Bulk junction saturation current | A | 3.47×10^-15 | 9.22×10^-15 |
| 11 | PB | Bulk junction potential | V | 0.8 | 0.8 |
| 12 | CGSO | Gate-source overlap capacitance per meter channel width | F/m | 6.70×10^-10 | 3.30×10^-10 |
| 13 | CGDO | Gate-drain overlap capacitance per meter channel width | F/m | 6.70×10^-10 | 3.30×10^-10 |
| 14 | CGBO | Gate-bulk overlap capacitance per meter channel length | F/m | 1.90×10^-9 | 2.60×10^-9 |
| 15 | RSH | Drain and source diffusion sheet resistance | Ohm/sq | 55 | 30 |
| 16 | CJ | Zero-bias bulk junction bottom capacitance per sq meter of junction area | F/m^2 | 3.53×10^-4 | 1.24×10^-4 |
| 17 | MJ | Bulk junction bottom grading coefficient | - | 0.5 | 0.5 |
| 18 | CJSW | Zero-bias bulk junction sidewall capacitance per meter of junction perimeter | F/m | 1.71×10^-10 | 3.20×10^-11 |