Papamichalis, P. “Introduction to the TMS320 Family of Digital Signal Processors”
Digital Signal Processing Handbook
Ed. Vijay K. Madisetti and Douglas B. Williams
Boca Raton: CRC Press LLC, 1999
© 1999 by CRC Press LLC
77
Introduction to the TMS320 Family
of Digital Signal Processors
Panos Papamichalis
Texas Instruments
77.1 Introduction
77.2 Fixed-Point Devices: TMS320C25 Architecture and Fundamental Features
77.3 TMS320C25 Memory Organization and Access
77.4 TMS320C25 Multiplier and ALU
77.5 Other Architectural Features of the TMS320C25
77.6 TMS320C25 Instruction Set
77.7 Input/Output Operations of the TMS320C25
77.8 Subroutines, Interrupts, and Stack on the TMS320C25
77.9 Introduction to the TMS320C30 Digital Signal Processor
77.10 TMS320C30 Memory Organization and Access
77.11 Multiplier and ALU of the TMS320C30
77.12 Other Architectural Features of the TMS320C30
77.13 TMS320C30 Instruction Set
77.14 Other Generations and Devices in the TMS320 Family
References
This article discusses the architecture and the hardware characteristics of the TMS320
family of Digital Signal Processors. The TMS320 family includes several generations
of programmable processors with several devices in each generation. Since the pro-
programmable one. Availability and quality of software and hardware development tools (such as
compilers, assemblers, linkers, simulators, hardware emulators, and development systems), application notes, third-party products and support, hot-line support, etc., play an important role in how easy it is to develop an application on the DSP processor. The TMS320 family has very extensive support of this kind, but its description goes beyond the scope of this article. The interested reader should contact the TI DSP hotline (Tel. 713-274-2320).
For the purposes of this article, two devices from the Texas Instruments TMS320 family of digital signal processors have been selected. One is the TMS320C25, a 16-bit, fixed-point DSP, and the other is the TMS320C30, a 32-bit, floating-point DSP. As shorthand, they will be called ‘C25 and ‘C30, respectively. This choice allows both fixed-point and floating-point issues to be considered.
Newer (and more sophisticated) generations have been added to the TMS320 family, but since the objective of this article is primarily tutorial, they will be discussed as extensions of the ‘C25 and the ‘C30. Examples are the other members of the ‘C2x and ‘C3x generations, as well as the TMS320C5x generation (‘C5x for short) of fixed-point devices and the TMS320C4x (‘C4x) generation of floating-point devices. Customizable and fixed-function extensions of this family of processors will also be discussed.
Texas Instruments, like all vendors of DSP devices, publishes detailed User’s Guides that explain at great length the features and operation of the devices. Each of these User’s Guides is a thick book, so it is neither possible nor desirable to repeat all of this information here. Instead, the objective of this article is to give an overview of the basic features of each device. If an application requires more detail, the reader should refer to the User’s Guides, which are readily available from Texas Instruments.
77.2 Fixed-Point Devices: TMS320C25 Architecture and Fundamental Features
The Texas Instruments TMS320C25 is a fast, 16-bit, fixed-point digital signal processor. The device runs at 10 MHz, which corresponds to a cycle time of 100 ns. Since most instructions execute in a single cycle, the figure of 100 ns also indicates how long it takes to execute one instruction. Equivalently, the device can execute 10 million instructions per second (MIPS). The actual signal from the external oscillator or crystal has a frequency four times higher (40 MHz).
have two operands, one of which is always the accumulator. The result of the operation is stored in
the accumulator.
Because of this approach, the instruction format is very simple: an instruction only needs to specify the other operand. This architectural philosophy is very popular, but it is not universal. For instance, as discussed later, the TMS320C30 takes a different approach, with several “accumulators” in what is called a register file.
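The accumulator-based style described above can be illustrated with a small, hypothetical Python sketch. It is not TI code and is not bit-exact. The method names `add` and `sub` only mirror the idea of ADD/SUB-style instructions, and memory words are assumed to already be signed integers. The point is that each instruction names a single memory operand, while the accumulator is always both the other operand and the destination.

```python
# Hypothetical accumulator-machine sketch (not TI code, not bit-exact).
# Every two-operand instruction names only ONE operand, a data memory address;
# the accumulator is always the other operand and the destination.


def wrap32(value: int) -> int:
    """Wrap an integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


class AccumulatorMachine:
    def __init__(self, data_memory: list[int]) -> None:
        self.acc = 0                 # 32-bit accumulator (the implied operand), starts at 0
        self.dmem = data_memory      # data words, assumed to be signed integers already

    def add(self, addr: int) -> None:
        """ADD-like: ACC <- ACC + dmem[addr]."""
        self.acc = wrap32(self.acc + self.dmem[addr])

    def sub(self, addr: int) -> None:
        """SUB-like: ACC <- ACC - dmem[addr]."""
        self.acc = wrap32(self.acc - self.dmem[addr])


# Compute dmem[0] + dmem[1] - dmem[2]; no instruction ever names the accumulator.
machine = AccumulatorMachine([7, 5, 2])
machine.add(0)
machine.add(1)
machine.sub(2)
print(machine.acc)  # 10
```

A register-file design such as the one mentioned for the TMS320C30 would instead need to name its destination and source registers in each instruction, which makes the instruction encoding more flexible but longer.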
The TMS320C25 CPU also contains several shifters, which facilitate data manipulation and increase the throughput of the device by performing shifting operations in parallel with other functions. The CPU further includes eight auxiliary registers that can be used as memory pointers or loop counters, two status registers, and an 8-deep hardware stack. The stack is used to store the memory address where the program will continue execution after a temporary diversion to a subroutine.
FIGURE 77.2: Key architectural features of the TMS320C25.
To communicate with external devices, the TMS320C25 has 16 input and 16 output parallel ports.
It also has a serial port that can serve the same purpose. The serial port is one of the peripherals that
have been implemented on chip. Other peripherals include the interrupt mask, the global memory
capability, and a timer. The above components of the TMS320C25 are examined in more detail
below.
The device has 68 pins, each designated to perform certain functions, such as communicating with other devices on the same board. The names of the signals and the corresponding definitions appear in Table 77.1. The first column of the table gives the pin names. A bar over a name indicates that the pin is active (asserted) when it is electrically low. For instance, if the pins take the voltage levels of 0 V and 5 V, a pin whose name has an overbar is asserted at 0 V; otherwise, assertion occurs at 5 V. The second column indicates whether the pin is an input to the device, an output from the device, or both; a Z indicates that the pin can also be placed in a high-impedance state. The third column describes the pin functionality.
Understanding the functionality of the device pins is as important as understanding the internal architecture of the device.
Signal          I/O/Z   Definition
D15-D0          I/O/Z   16-bit data bus, D15 (MSB) through D0 (LSB). Multiplexed between program, data, and I/O spaces.
A15-A0          O/Z     16-bit address bus, A15 (MSB) through A0 (LSB)
P̅S̅, D̅S̅, I̅S̅      O/Z     Program, data, and I/O space select signals
R/W̅             O/Z     Read/write signal
S̅T̅R̅B̅            O/Z     Strobe signal
R̅S̅              I       Reset input
I̅N̅T̅2̅-I̅N̅T̅0̅       I       External user interrupt inputs
MP/M̅C̅           I       Microprocessor/microcomputer mode select pin
M̅S̅C̅             O       Microstate complete signal
I̅A̅C̅K̅            O       Interrupt acknowledge signal
READY           I       Data ready input. Asserted by external logic when using slower devices to indicate that the current bus transaction is complete.
B̅R̅              O       Bus request signal. Asserted when the TMS320C25 requires access to an external global data memory space.
XF              O       External flag output (latched software-programmable signal)
H̅O̅L̅D̅
The difference between the architectures is important because it influences the programming style. In the Harvard architecture, two memory locations can have the same address, as long as one of them is in the data space and the other is in the program space. Hence, when using an address label, the programmer must be careful about which space it refers to. Another restriction of the Harvard architecture is that the data memory cannot be initialized during loading, because loading refers only to placing the program in memory (and the program memory is separate from the data memory). Data memory can be initialized only during execution, so the programmer must incorporate such initialization into the program code. As will be seen later, these restrictions have been removed in the TMS320C30, which retains the convenient feature of multiple buses.
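The following is a hypothetical Python sketch of the two consequences just described. It is not TI code, and the class names, addresses, and word values are arbitrary illustrations. In the Harvard model, the same numeric address exists once in each space, and a loader fills only program space. In the von Neumann model, there is a single space that one loader pass can fill with both code and data.

```python
# Hypothetical sketch (not TI code): separate vs. shared address spaces.


class HarvardMemory:
    """Program and data live in independent address spaces."""

    def __init__(self) -> None:
        self.program = {}
        self.data = {}

    def load(self, image: dict[int, int]) -> None:
        # Loading places words in program space only; data memory stays
        # uninitialized until the running program writes to it.
        self.program.update(image)


class VonNeumannMemory:
    """Instructions and data share one address space."""

    def __init__(self) -> None:
        self.words = {}

    def load(self, image: dict[int, int]) -> None:
        # One loader pass can place both code and initialized data.
        self.words.update(image)


harvard = HarvardMemory()
harvard.load({0x0100: 0x1234})   # arbitrary "instruction" word at address 0100h
harvard.data[0x0100] = 42        # a different word at the SAME address, in data space
print(harvard.program[0x0100], harvard.data[0x0100])  # 4660 42

shared = VonNeumannMemory()
shared.load({0x0100: 0x1234, 0x0200: 42})  # code and data loaded together
```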
FIGURE 77.3: Simplified block diagram of the Harvard architecture.
FIGURE 77.4: Simplified block diagram of the von Neumann architecture.
Figure 77.5 shows a functional block diagram of the TMS320C25 architecture. The Harvard architecture of the device is immediately apparent from the separate program and data buses. What is not apparent is that the architecture has been modified to permit communication between the two buses. Through such communication, it is possible to transfer data between the program and data memory spaces, so the program memory space can also be used to store tables. The transfer takes place through special instructions such as TBLR (Table Read), TBLW (Table Write), and BLKP (Block transfer from Program memory).
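Here is a hypothetical Python sketch of such table transfers. The functions `table_read` and `table_write` are illustrative names, loosely modeled on the direction of TBLR (program to data) and TBLW (data to program). The sketch does not model how the real instructions obtain the program memory address, and the addresses and table values are arbitrary.

```python
# Hypothetical sketch of table transfers between the two address spaces,
# loosely modeled on TBLR (program -> data) and TBLW (data -> program).
# How the real instructions obtain the program memory address is not modeled.


def table_read(program: dict[int, int], data: dict[int, int],
               prog_addr: int, data_addr: int) -> None:
    """TBLR-like: copy one word from program space into data space."""
    data[data_addr] = program[prog_addr]


def table_write(program: dict[int, int], data: dict[int, int],
                prog_addr: int, data_addr: int) -> None:
    """TBLW-like: copy one word from data space into program space."""
    program[prog_addr] = data[data_addr]


# A small coefficient table kept in program memory (addresses and values arbitrary).
TABLE_START, BUFFER_START = 0x0800, 0x0200
program = {TABLE_START + i: c for i, c in enumerate([3, -1, 4, 1])}
data = {}

for i in range(4):                       # one word per transfer
    table_read(program, data, TABLE_START + i, BUFFER_START + i)

print([data[BUFFER_START + i] for i in range(4)])  # [3, -1, 4, 1]

data[BUFFER_START] = 9                   # modify the copy in data memory...
table_write(program, data, TABLE_START, BUFFER_START)  # ...and write it back
```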
As shown in the block diagram, the program ROM is linked to the program bus, while data RAM blocks B1 and B2 are linked to the data bus. The RAM block B0 can be configured either as program or as data memory (using the instructions CNFP and CNFD), and it is multiplexed with both buses. The different blocks, such as the multiplier, the ALU, and the memories, are examined in more detail below.
77.3 TMS320C25 Memory Organization and Access
Besides the on-chip memory (RAM and ROM), the TMS320C25 can access external memory through the external bus. This bus consists of the 16 address pins A0-A15 and the 16 data pins D0-D15. The address pins carry the address to be accessed, while the data pins carry the instruction word or
microcomputer configurations of the program memory are depicted separately. The data memory
is partitioned into 512 sections, called pages, of 128 words each. The partitioning serves addressing purposes, as discussed below. Memory boundaries of the 64K memory space are shown in both decimal and hexadecimal notation (hexadecimal notation is indicated by an “h” or “H” at the end). Compare this map with the block diagram in Fig. 77.5.
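The page arithmetic is simple: 512 pages × 128 words = 65,536 words, so a 16-bit data address splits into a 9-bit page number followed by a 7-bit offset within the page. The Python sketch below shows this split. The function names are hypothetical and used only for illustration.

```python
# Sketch: a 16-bit data address viewed as a 9-bit page number followed by a
# 7-bit offset, matching the 512 pages x 128 words partitioning.

PAGE_BITS, OFFSET_BITS = 9, 7
WORDS_PER_PAGE = 1 << OFFSET_BITS        # 128
NUM_PAGES = 1 << PAGE_BITS               # 512


def split_address(addr: int) -> tuple[int, int]:
    """Return (page, offset) for a 16-bit data memory address."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return addr >> OFFSET_BITS, addr & (WORDS_PER_PAGE - 1)


def join_address(page: int, offset: int) -> int:
    """Rebuild the 16-bit address from its page and offset fields."""
    return (page << OFFSET_BITS) | offset


print(NUM_PAGES * WORDS_PER_PAGE)   # 65536, the full 64K-word space
print(split_address(0x0300))        # (6, 0)
print(split_address(0x03FF))        # (7, 127)
```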
As mentioned earlier, in two-operand operations, one of the operands resides in the accumulator, and the result is also placed in the accumulator. (The only exception is the multiplication operation, examined later.) The other operand can either reside in memory or be part of the instruction. In the latter case, the value to be combined with the accumulator is explicitly specified in the instruction, and this addressing mode is called immediate addressing mode. In the TMS320C25 assembly language, immediate addressing mode instructions are indicated by a “K” at the end of the instruction mnemonic.
For example, the instruction
ADDK 5
increments the contents of the accumulator by 5.
If the value to be operated upon resides in memory, there are two ways to access it: either by
specifying the memory address directly (direct addressing) or by using a register that holds the
address of that number (indirect addressing).
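To summarize where the second operand comes from, here is a hypothetical Python sketch. It is not the device's decoding logic. For simplicity, "direct" takes a full address, which leaves out the data-page refinement described next. The names `fetch_operand`, `ar`, and `arp` are illustrative: `ar` holds the eight auxiliary registers, and `arp` selects the current one.

```python
# Hypothetical sketch: where the second operand of an add comes from in each
# addressing mode. For simplicity, "direct" takes a full address here; the
# data-page refinement the device actually uses is omitted.


def fetch_operand(mode: str, field: int, dmem: dict[int, int],
                  ar: list[int], arp: int) -> int:
    if mode == "immediate":   # the value itself is part of the instruction
        return field
    if mode == "direct":      # the instruction holds the memory address
        return dmem[field]
    if mode == "indirect":    # the current auxiliary register holds the address
        return dmem[ar[arp]]
    raise ValueError(f"unknown addressing mode: {mode}")


dmem = {0x0200: 11, 0x0201: 22}
ar = [0] * 8                  # auxiliary registers AR0-AR7
ar[3] = 0x0201                # AR3 points at the second word
arp = 3                       # AR3 is the current auxiliary register

acc = 0
acc += fetch_operand("immediate", 5, dmem, ar, arp)      # like ADDK 5
acc += fetch_operand("direct", 0x0200, dmem, ar, arp)    # address in the instruction
acc += fetch_operand("indirect", 0, dmem, ar, arp)       # address taken from AR3
print(acc)  # 38
```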
As a general rule, it is desirable to encode an instruction as compactly as possible, so that the whole description fits in one 16-bit word. Then, when the program is executed, only one word needs to be fetched before all the information from the instruction is available for execution. This is not always possible, and there are two-word instructions as well, but the chip architects always strive for one-word instructions. In the direct addressing mode, a full memory address would require a 16-bit word by itself, because the memory space is 64K words. To reduce that requirement, the memory space is divided into 512 pages of 128 words each. An instruction using direct addressing contains 7 bits indicating which word to access within a page. The page number (9 bits) is stored in a separate register (actually, part of a register) called the Data Page pointer (DP). You store the page number in the DP by using the instructions LDP (Load Data
Notation        Operation
ADD *           No manipulation of AR or ARP
ADD *,Y         Y → ARP
ADD *+          AR(ARP) + 1 → AR(ARP)
ADD *+,Y        AR(ARP) + 1 → AR(ARP); Y → ARP
ADD *-          AR(ARP) - 1 → AR(ARP)
ADD …           … → AR(ARP)
ADD *0-,Y       AR(ARP) - AR0 → AR(ARP); Y → ARP
ADD *BR0+       AR(ARP) +rc AR0 → AR(ARP)
ADD *BR0+,Y     AR(ARP) +rc AR0 → AR(ARP); Y → ARP
Here +rc denotes addition with reverse carry propagation, used for bit-reversed addressing.
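The auxiliary register updates in the table can be modeled in a few lines of Python. This is a hypothetical sketch. It covers only the modes listed above, and the `new_arp` argument plays the role of the ",Y" suffix. Reverse-carry addition is implemented here by bit-reversing both operands, adding them, and bit-reversing the result, which propagates carries from the MSB toward the LSB. The usage example assumes an 8-point buffer starting at address 0 with AR0 set to N/2 = 4. With that setting, repeated *BR0+ accesses visit the addresses in bit-reversed order, as needed for FFT reordering.

```python
# Hypothetical sketch of the auxiliary register updates in the table above,
# including reverse-carry addition (+rc) for bit-reversed addressing.
# Only the modes listed in the table are modeled; ",Y" is the new_arp argument.
from typing import Optional

MASK16 = 0xFFFF


def bit_reverse(value: int, bits: int = 16) -> int:
    """Reverse the order of the low `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(a: int, b: int, bits: int = 16) -> int:
    """Add a and b with carries propagating from the MSB toward the LSB."""
    mask = (1 << bits) - 1
    return bit_reverse((bit_reverse(a, bits) + bit_reverse(b, bits)) & mask, bits)


def update_ar(ar: list[int], arp: int, mode: str, new_arp: Optional[int] = None) -> int:
    """Apply the post-access update for `mode`; return the (possibly new) ARP."""
    if mode == "*+":
        ar[arp] = (ar[arp] + 1) & MASK16
    elif mode == "*-":
        ar[arp] = (ar[arp] - 1) & MASK16
    elif mode == "*0-":
        ar[arp] = (ar[arp] - ar[0]) & MASK16
    elif mode == "*BR0+":
        ar[arp] = reverse_carry_add(ar[arp], ar[0])
    elif mode != "*":
        raise ValueError(f"mode not modeled: {mode}")
    return arp if new_arp is None else new_arp


# Bit-reversed order for an 8-point buffer starting at address 0: AR0 = N/2.
ar = [0] * 8
ar[0] = 4
arp = 1
visited = []
for _ in range(8):
    visited.append(ar[arp])          # the address used by this access
    arp = update_ar(ar, arp, "*BR0+")
print(visited)  # [0, 4, 2, 6, 1, 5, 3, 7]
```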
location. Again, this construct, with one implied operand residing in the T-register, permits more compact instruction words. When the multiplier and multiplicand (two 16-bit words) are multiplied together, the result is 32 bits long. In traditional microprocessors, this product would have been truncated to 16 bits and presented as the final result. In DSP applications, though, this product is only an intermediate result in a long stream of multiply-adds, and truncating it at this point would introduce too much computational noise into the final result. To preserve accuracy, the full 32-bit result is held in the P-register (product register). This configuration is shown in Fig. 77.8, which depicts the multiplier and the ALU of the TMS320C25.
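The cost of truncating each product can be seen in a small numerical sketch. It assumes Q15 fractions (16-bit two's complement, value = integer / 2^15), a common fixed-point convention that is not spelled out in the text. The coefficient and sample values are arbitrary. The sketch accumulates the same products twice: once keeping every 32-bit (Q30) product, and once truncating each product to 16 bits before adding.

```python
# Sketch: why the full 32-bit product is kept. Q15 fractions (16-bit two's
# complement, value = integer / 2**15) are multiply-accumulated two ways.
# The coefficient and sample values are arbitrary illustration values.


def to_q15(x: float) -> int:
    """Quantize a fraction in [-1, 1) to a 16-bit Q15 integer."""
    return max(-32768, min(32767, round(x * 32768)))


coeffs = [to_q15(v) for v in (0.1, 0.2, 0.3, 0.2, 0.1)]
samples = [to_q15(v) for v in (0.5, -0.25, 0.125, 0.3, -0.7)]

# Keep every 32-bit product (Q30) at full precision.
full = sum(c * s for c, s in zip(coeffs, samples))

# Truncate each product to 16 bits (Q15) before accumulating, as a
# traditional microprocessor would; Python's >> floors, like an arithmetic shift.
truncated = sum((c * s) >> 15 for c, s in zip(coeffs, samples))

error = full / 2**30 - truncated / 2**15
print(f"full: {full / 2**30:.7f}  truncated: {truncated / 2**15:.7f}  error: {error:.2e}")
```

Each truncation can lose almost one Q15 least-significant bit, so the error grows with the number of accumulated products.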
Actually, the P-register is viewed as two 16-bit registers concatenated. This viewpoint is convenient if you need to save the product using the instructions SPH (store product high) and SPL (store product low). Otherwise, the product can be combined directly with the accumulator, which is also 32 bits wide. The contents of the product register can be loaded into the accumulator, overwriting whatever was there, using the PAC (product to accumulator) instruction. They can also be added to or subtracted from the accumulator using the instructions APAC or SPAC.
FIGURE 77.8: Diagram of the TMS320C25 multiplier and ALU.
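The Python sketch below models the P-register operations just described. It is hypothetical and not bit-exact: the method names only mirror the mnemonics MPY (assumed here to multiply the T-register by a data word), SPH, SPL, PAC, APAC, and SPAC. The sketch shows the 32-bit product being split into the two 16-bit words that SPH and SPL would store, and then being loaded into, added to, or subtracted from the accumulator.

```python
# Hypothetical model (not bit-exact): the 32-bit P-register seen as two 16-bit
# halves (what SPH/SPL would store) and combined with the accumulator in the
# ways PAC, APAC, and SPAC describe. Method names mirror the mnemonics only.


def to_signed32(value: int) -> int:
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


class ProductUnit:
    def __init__(self) -> None:
        self.t = 0      # T-register: the implied multiplier operand
        self.p = 0      # P-register: full 32-bit product
        self.acc = 0    # 32-bit accumulator

    def mpy(self, word: int) -> None:
        """Multiply T by a 16-bit data word, keeping all 32 bits."""
        self.p = to_signed32(self.t * word)

    def sph(self) -> int:
        return (self.p >> 16) & 0xFFFF    # high half, as a raw 16-bit word

    def spl(self) -> int:
        return self.p & 0xFFFF            # low half, as a raw 16-bit word

    def pac(self) -> None:
        self.acc = self.p                 # product overwrites the accumulator

    def apac(self) -> None:
        self.acc = to_signed32(self.acc + self.p)

    def spac(self) -> None:
        self.acc = to_signed32(self.acc - self.p)


unit = ProductUnit()
unit.t = -3
unit.mpy(1000)
print(hex(unit.sph()), hex(unit.spl()))   # 0xffff 0xf448
unit.pac()
unit.apac()
print(unit.acc)                           # -6000
```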
When moving the contents of the P-register to the accumulator, you can shift this number using the built-in shifters. For instance, you can shift the result left by 1 or 4 locations (essentially multiplying it by 2 or 16), or you can shift it right by 6 (essentially dividing it by 64). These shifts are applied automatically, without any extra machine cycles, simply by setting the appropriate product mode with the SPM instruction. Why would you want to do such shifting? The main purpose of the left shifts is to eliminate extra sign bits that would otherwise appear in the computations. The right shift scales down the result and permits accumulation of several products before you start worrying about overflowing the accumulator.
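The product shifter can be sketched in Python as follows. The shift amounts (left 1, left 4, right 6) come from the text. The mapping from mode numbers 0 to 3 onto those shifts is an assumption of this sketch and should be checked against the User's Guide. The example assumes two Q15 fractions (0.5 × 0.5) and shows that a left shift by 1 removes the extra sign bit, so the high 16 bits hold the result directly in Q15.

```python
# Sketch of the product shifter: the P-register value is shifted on its way to
# the ALU according to a product mode. The mode numbers 0-3 below are an
# assumption of this sketch; the shift amounts come from the text.

PRODUCT_SHIFT = {0: 0, 1: 1, 2: 4, 3: -6}   # positive = left, negative = right


def to_signed32(value: int) -> int:
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def shifted_product(p: int, mode: int) -> int:
    """Return the 32-bit value the ALU would see for product p in `mode`."""
    shift = PRODUCT_SHIFT[mode]
    # Left shifts drop bits beyond 32; right shifts are arithmetic (sign-preserving).
    return to_signed32(p << shift) if shift >= 0 else p >> -shift


# Two Q15 fractions, 0.5 * 0.5: the raw product is Q30 with an extra sign bit.
p = 16384 * 16384
print(hex(p))                              # 0x10000000
print(hex(shifted_product(p, 1) >> 16))    # 0x2000, i.e. 0.25 in Q15
print(shifted_product(-6400, 3))           # -100, i.e. scaled down by 64
```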
At this point, it is appropriate to discuss the data formats supported on the TMS320C25. Like most fixed-point processors, this device uses two’s-complement notation to represent negative numbers. In two’s-complement notation, to form the negative of a given number, you complement (invert) every bit of that number and add 1. In two’s-complement notation, the most significant bit
data and the program buses to bring in the operands of the multiplication. The data coming from the data bus can be addressed in memory by an AR, using indirect addressing. The data coming from the program bus are addressed by the program counter (actually, the pre-fetch counter, PFC) and, hence, must reside in consecutive locations of program memory. To allow such data to be modified and then used in these multiply-add operations, the TMS320C25 permits reconfiguration of block B0 in the on-chip memory. B0 can be configured either as program or as data memory, as shown in Fig. 77.9, using the CNFD and CNFP instructions.
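The dual-bus multiply-add pattern is the core of an FIR filter, y[n] = Σ h[k]·x[n−k]. The Python sketch below is hypothetical. The variable `pfc` only stands in for the pre-fetch counter, stepping through consecutive "program memory" coefficients, and `ar` only stands in for an auxiliary register that walks backward through older samples in "data memory". It shows the addressing pattern, not the instruction sequence the device would use.

```python
# Hypothetical sketch of the dual-bus multiply-accumulate pattern: one operand
# stream comes from consecutive program memory locations (stepped by a counter
# standing in for the PFC), the other from data memory through a pointer
# standing in for an auxiliary register.


def fir_output(program_coeffs: list[int], data_samples: list[int], newest: int) -> int:
    """y[n] = sum_k h[k] * x[n - k], with x[n] stored at data_samples[newest]."""
    if newest + 1 < len(program_coeffs):
        raise ValueError("not enough past samples for this many coefficients")
    acc = 0
    pfc = 0            # program-side address: always advances by one
    ar = newest        # data-side address: walks backward through older samples
    for _ in range(len(program_coeffs)):
        acc += program_coeffs[pfc] * data_samples[ar]   # one multiply-add
        pfc += 1
        ar -= 1
    return acc


h = [1, 2, 3]          # coefficients, as if stored in program memory
x = [10, 20, 30]       # samples, oldest first, as if stored in data memory
print(fir_output(h, x, newest=2))   # 1*30 + 2*20 + 3*10 = 100
```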
77.5 Other Architectural Features of the TMS320C25
The TMS320C25 has many interesting features and capabilities, which are described in the User’s Guide [1]. Here, we briefly present only the most important ones.
The program counter is a 16-bit register, hidden from the user, that contains the address of the next instruction word to be fetched and executed. Occasionally, program execution may be redirected, for instance, through a subroutine call. In this case, the contents of the program counter must be saved so that program flow continues from the correct instruction after the subroutine completes. For this purpose, a hardware stack is provided to save and restore the contents of the program counter.
The hardware stack is a set of eight registers, of which only the top one is accessible to the user. Upon a subroutine call, the address of the instruction following the call is pushed onto the stack, and it is restored to the program counter when execution returns from the subroutine. The programmer has control over the stack through the PUSH, PSHD, POP, and POPD instructions. PUSH and POP push the accumulator onto the stack or pop the top of the stack into the accumulator, respectively. PSHD and POPD perform the same functions with data memory locations instead of the accumulator.
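A small Python model can make the last-in, first-out behavior of nested calls concrete. It is hypothetical. The depth of eight comes from the text. The overflow policy (pushing onto a full stack silently discards the bottom entry) and the error on popping an empty stack are assumptions of this sketch, not documented device behavior. The return addresses are arbitrary.

```python
# Hypothetical model of an 8-deep hardware stack where only the top is visible.
# Assumption of this sketch: pushing onto a full stack discards the bottom
# (oldest) entry, and popping an empty stack raises an error. Check the
# User's Guide for what the device actually does in those cases.
from collections import deque


class HardwareStack:
    DEPTH = 8

    def __init__(self) -> None:
        self._entries = deque(maxlen=self.DEPTH)   # oldest entry drops out when full

    def push(self, word: int) -> None:
        self._entries.append(word & 0xFFFF)        # 16-bit words, like the PC

    def pop(self) -> int:
        if not self._entries:
            raise IndexError("pop from empty hardware stack")
        return self._entries.pop()

    def top(self) -> int:
        return self._entries[-1]


stack = HardwareStack()
stack.push(0x0102)           # call: save the return address
stack.push(0x0456)           # nested call inside the subroutine
print(hex(stack.pop()))      # 0x456, return from the inner subroutine first
print(hex(stack.pop()))      # 0x102
```

Because the depth is fixed, nesting more than eight levels of calls (or interrupts) without saving entries elsewhere would lose return addresses in this model.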
Occasionally the program execution in a processor must be interrupted in order to take care
of urgent functions, such as receiving data from external sources. In these cases, a special signal
goes to the processor, and an interrupt occurs. The interrupts can be internal or external. During
an interrupt, the processor stops execution, wherever it may be, pushes the address of the next