Arithmetic Processing using Associative memory
Associative memory is capable of performing highly parallel arithmetic computation on large datasets in constant time.
Associative memory is a type of computer memory that is accessed by virtue of its contents, not its location. Rather than saying What is the value at location 42? it says All memory locations with the value 43 please stand up.
Why is this useful, and when is it used ?
It is used whenever a fast search must be made through a list of candidates - for a CPU cache, or a fast network switch.
But if you already know the data stored there, why are you looking it up ?
The search is usually a masked search - only some of the contents are known. Think of it like solving a crossword entry - some letters are known. You put the whole dictionary into Content-addressable memory, search for the known letters in the proper position, with "Don't Care" (usually signified by X) at the other letter locations. All word locations that match are internally flagged. Some kind of resolver in the memory allows you to pick the first one, and the entire word contents are read out. The flag is cleared, and the next match is read out. In this fashion all matching words are read.
The complexity of Content-addressable memory is such that it is rarely used, and only if speed is of primary importance, as hashing algorithms and a small program can usually accomplish tasks like the crossword problem at far lesser expense (measured in chip complexity and time).
Here we have done only the most basic operation - a search. But more can be done - depending on the result of the match, we can perform other operations on the data - like, for instance, deleting all those words, or some other operation. This does not require that we read the data out - the operations are accomplished in situ in the memory.
This may not seem useful in this example, but when doing arithmetic is is.
Regular computer memory is built as Dynamic RAM. Speed and density are the primary constraints - more speed, more bits per chip. Today's Dynamic RAM is implemented as a single transistor per bit - switching charge in and out of a tiny capacitor (charge bucket) connected to each transistor. It is a leaky bucket - it will forget the answer in less than a second, so the data must be constantly read out (a destructive read) and then written back. The memory chip usually handles this internally, but it must be given lots of little time slots every second to handle this refresh administration.
RAM can be built that does not require this refresh (Static RAM) but at the cost of extra transistors - perhaps 6 per bit. So, density (number of bits per chip) is dramatically reduced. But speed is often greater, and the RAM is more deterministic - there is no behind-the-scenes administration preventing full speed access to the data. Content-addressable memory is more complex still - requiring two data lines per column (to represent the three search terms - 0, 1, X) and comparison circuitry and match lines per row.
|0||Don't care||C=0||Clear Carry|
|1||X[i]=0, Y[i]=0, C=1||Z[i]=1, C=0||1 bit set|
|2||X[i]=0, Y[i]=1, C=0||Z[i]=1, C=0||1 bit set|
|3||X[i]=1, Y[i]=0, C=0||Z[i]=1, C=0||1 bit set|
|4||X[i]=1, Y[i]=1, C=0||Z[i]=0, C=1||2 bits set|
|5||X[i]=1, Y[i]=0, C=1||Z[i]=0, C=1||2 bits set|
|6||X[i]=0, Y[i]=1, C=1||Z[i]=0, C=1||2 bits set|
|7||X[i]=1, Y[i]=1, C=1||Z[i]=1, C=1||3 bits set|
|LOOP to Step one 8 times|
As an example of mathematics in an Associative Array, let us look at the addition of two 8 bit fields, X and Y, to be put into a 9 bit Z result field. We will use another flag field as a carry bit C.
The subscripted X, Y and Z fields are marked with i=0 in the starred field on the right. That is a total of 58 operations to do an 8 bit add - is it worth it ?
The point to bear in mind is that this addition is taking place in all locations in the Content Addressable memory - however deep that might be. It is also worth bearing in mind that none of this data exits the CAM chip - all processing is happening in situ. Much of the access delay of regular RAM chips occurs because the signals must be boosted large enough to traverse printed circuit board tracks that take it to the processor chip actually doing the work.
The example given is an 8 bit add - trivially, this can be expanded (with more steps) to any field width, any operation (like multiply, logical, or floating point operations)
We saw an extra flag field being used for Carry - typically there might be another four flag bits used for row sub-selection, or intermediate state for other operations.
There is also a resolver (mentioned earlier) to pick the first matching field, and possibly more complex hardware to set all bits between one flag and another.
Other web resources
- Content-Addressable Memory Introduction by Kostas Pagiamtzis
- Content-addressable memory (CAM) circuits and architectures: A tutorial and survey K. Pagiamtzis and A. Sheikholeslami
- Using Content-Addressable Memory for Networking Applications By Sherri Azgomi
- Altera APEX 20K Device Family Architecture Programmable logic device including CAM - optimised for lookup