Previous Next Contents

Chapter 3:   A CMAC Primer:

The Cerebellar Model Articulated Controller (CMAC) was introduced and then developed by Albus in a series of fundamental papers [Albus, 1,  2,  3 and  4]. In Chapter 2, we indicated the basic neurobiological structure that motivated the abstract notion of the N ® 100N input decoder. Let's see how the recoder is implemented mathematically. We will find that the CMAC architecture builds an input - output model for a finite collection of T input - output pairs (pi,yi), 0 £ i £ T-1 where the inputs pi come from a bounded subset of the N dimensional space ÂN with scalar outputs yi in Â. This model will depend on a vector of parameters P from ÂP and we will be able to choose the parameters in such a way that we can associate to each input pi, an output value which approximates the desired output value yi. Calling this mapping G( P), we see that

G( P): æ
è
inputs pi
in ÂN
ö
ø
®
æ
è
outputs yi
in Â
ö
ø
.

Any algorithm which uses the difference or error between the true target value yi and the actual value of the model G( P)(pi) to change the parameter vector P to minimize this error in some fashion, is called a learning algorithm. It will turn out that the CMAC learning algorithm updates the parameters P in a local fashion. The parameter vector is initialized to a starting value P0 and the desired input - output pair (p0,y0) is used to generate the error for this data pair, G( P0)(p0) - y0. The nature of the CMAC model will assign only a few parameters from P0 to the building of the output G( P0)(p0) which is to approximate the target y0. On the basis of the local error G( P0)(p0) - y0, these few parameters in P0 are updated to new values to help match the target. This generates a new parameter vector P1, which differs from P0 in only a few entries. We continue this process for pairs (pi,yi), for all possible indices i. This generates a new parameter vector Pi which is changed from the previous vector Pi-1 by only a few entries. Hence, we can call this type of learning strategy both local, because only a few parameters are altered each time a data pair is used to generate a local error and fast, because the computational requirements are very low when only a few parameters are changed at each step. It will also turn out that the model we build essentially functions as a local look up table which is generated by what is called a coarse encoding of the inputs (pi). A consequence of this fact is that the CMAC model will lack both continuity and differentiability at certain places in its domain. We now will try to explain all of this carefully.

3.1   The Raw Input Space:

We start by looking at an example where the inputs come from a bounded region I of the two dimensional input space Â2 as shown in Figure 3.1.



Figure 3.1:

3.1.1   The Zeroth Level:

Subdivide the bounded region I using a finite number of disjoint blocks Bk, where k ranges from 0 to K-1. There are of course many ways to accomplish this subdivision. In general, we note that since I is bounded, there exists two intervals [a,b] and [c,d] such that I Ì [a,b] × [c,d]. The set [a,b] × [c,d] is called a bounding box for I. It is easy to see that the choice of B is not unique. Now choose any partitions P of [a,b] and Q of [c,d] of the form
P = {s0 = a < s1 ··· sp-1 = b}.
and
Q = {t0 = c < t1 ··· tq-1 = d}.

These partitions will introduce a total of K = pq subsets of the form
Bq = ip + j = [si,si+1] × [tj,tj+1].
The indexing scheme we use here is called lexicographic. An illustration of this type of partitioning for a given B is shown in Figure 3.2. Note that each element p in B has a representation with respect to the (p0,p1) axes as pi = (pi0,pi1).



Figure 3.2:

Note that given B, if we choose any data point p from I, p will be in a unique block Bk for some k. Let's denote this unique block number by the symbol v(p). This gives us a way to assign to each input p, a unique ``address'' v(p). We can summarize this discussion as follows:

  1. The blocks BR ``cover'' I
  2. Each Bk is disjoint from all other blocks Bl
  3. p Î I Þ $! address k such that p Î Bk.
  4. The partitioning scheme of P and Q of the bounding box B determines this ADDRESS SCHEME.
Using the block scheme, we can also define sensors as follows: Define the sensor function Nk by

Nk: B ® {0,1}

by

Nk(p) = ì
í
î
1 p Î Bk
0 p ÏBk

We see Nk is the characteristic function for the set Bk and Nk locates the block that a given input p lies in because only Nv(p) = 1. It is clear that each Nk functions as an on - off sensor. You can see an illustration of a typical sensor's action in Figure 3.3.



Figure 3.3:

Let V be the 1 × 48 matrix with entries { V0, ···, V47} where each Vi is an arbitrary scalar. The vector V now plays the role of the parameter set P. We can use this notation to define a function G(V): B ® Â by

G(V)(p) =
K-1
å
k=0
Nk(p) Vk
  = Vv(p).

We clearly see that

I Ì Èk=047

and each p in I is assigned a unique address v(p). Thus, the value G(V)(p) is well-defined. Finally, the CMAC model will attempt to find a vector V* which satisfies, for all appropriate indices i

G(V
*
 
)(pi) » yi,

where the meaning of » must be discussed carefully. The addressing scheme we are using above is what is called coarse coding because all inputs p that lie in a given block Bk will be assigned the same address. Note that this means that if, say five inputs land in B32, all five of these inputs will be assigned the same address, 32. Then, we seek a set of parameters V* such that

G(V
*
 
)(pi) = V32 » yi

for all five of these outputs. Unless all five outputs yi are the same, there will be no way to find a scalar V32 which matches all five values. For such a coarse coding of the input space, it is unlikely that all of these outpus will coincide. Hence, our training scheme must be chosen to converge to some sort of average or least squares fit of these values in order for the model to have any sort of utility. The input p represents the mossy fiber input line value and the scalar Vv(p) is interpreted as the value of the parallel fiber/ Purkinje dendrite interaction. Hence, this coarse coding provides a N ® N encoder. How can we implement the full N ® 100N abstract decoder? We can set up another N or so parallel fiber/ Purkinje dendrite interactions by taking the original coarse coding and building a new coding by offsetting each original block Bk by the fixed vector (Qx, Qy). The two sets of blocks will then provide a N ® 2N encoder. Further offsetting strategies will provide additional encodings. After L offsetting strategies are implemented, we will arrive at a N ® LN encoder. The coarse encoding we have defined in this part can now be denoted the level 0 coding. To reflect this, we will now add a superscript 0 to all of the relevant symbols used. To summarize, on level 0, there are K0 blocks, each input p is associated with a unique address v0(p) and the CMAC model is defined by

G(V0)(p) = V
0
 
v0(p)
.

Looking at our specific example in Figure 3.2, we see there are 48 disjoint blocks or 48 distinct addresses. Also, by focusing on the bounding box and its partitioning alone, we can draw the abstract Figure 3.4, where the blocks are denoted by the new symbols, Bk0.



Figure 3.4:

3.2   Adding the Next Level:

We will now construct Level 1 using a (Qp, Qy) offset of level 0. In Figure 3.5, we illustrate this idea. Think of the Level 0 blocks as providing a tiling of the bounding box B. Now, pretend that this tiling can be physically lifted up and slid across B. The amount of displacement is indicated by the fixed offset distances (Qp and Qy). The original Level 0 is drawn in greyscale, while the new Level 1 is shown in a bold black. Note that dotted lines have been drawn down and across to the p0 and p1 axes. This will create an additional row and column of blocks in the new Level 1.



Figure 3.5:

A closeup of the offsetting strategy is shown in Figure 3.6 for convenience. Further, the abstraction of Level 1 can be seen in Figure 3.7.



Figure 3.6:



Figure 3.7:

Note that the blocks in Level 1 are denoted by the symbols Bk1. and that there are K1 such blocks. A given input p will now land in a unique block in Level 0, Bk10 and a unique block in Level 1, Bk21, where k1 Î [0,K0-1] and k2 Î [0,K1-1]. For our particular example, we see K1 = 48 also, but if you think about it a bit, there is no reason to believe that the number of blocks obtained by a given offsetting strategy will always match the number in the previous level. It is clear that we can now assign two unique addresses to the input p, v0(p) = k1 and v1(p) = k2. This defines an address vector

v(p) = é
ë
v0(p)
v1(p)
ù
û
¬ level 0 address
¬ level 1 address
.

Consider now two inputs p1 and p2. As shown in Figure 3.8, both v0(p1) = 26 and v0(p2) = 26. However, the same two inputs land in different blocks in Level 1, v1(p1) = 26 and v1(p2) = 32. This is indicated in Figure 3.9.



Figure 3.8:



Figure 3.9:

We summarize these results in Table 3.1:


LEVEL INPUT ADDRESS
0 p1 26
1 p1 26
 
0 p2 26
1 p2 32

Table 3.1:

The address vectors are then:

v(p1) =
é
ë
26
26
ù
û
v(p2) =
é
ë
26
32
ù
û

It is actually easier to think of the address vector as a comma delimited list of level addresses; that is
v(p) = v0(pv1(p).

In our example, that leads to

v(p1) = 26, 26

and

v(p2) = 26, 32.

Now let V1 be the 1 × 48 matrix with entries { V01, ···, V471} where each Vi1 is an arbitrary scalar. From the matrix V0 for Level 0 and this new matrix V1 for Level 1, we can construct the 2 × 48 matrix V with zeroth row V0 and first row V1. Thus, the (i,j) entry of the matrix V is given by Vi,j = Vji. The 2 × 48 matrix V will now play the role of the parameter set P. We can also define the sensor function Nk1 by

Nk1: B ® {0,1}

by

Nk1(p) = ì
í
î
1 p Î Bk1
0 p ÏBk1

We can then use this notation to define a function G(V): B ® Â by

G(V)(p) =
1
å
j=0
Kj-1
å
k=0
Nkj(p) Vkj
  =
1
å
j=0
Kj-1
å
k=0
Nkj(p) Vj,k
  =
V
0
 
v0(p)
+ V
1
 
v1(p)
  =
V
 
0,v0(p)
+ V
 
1,v1(p)
.

Note that p1 and p2 in B have same addresses if v0(p1) = v0(p2) = k and v1(p1) = v1(p2) = l . This implies that p1 and p2 lie in Bk0 Ç Bl 1. In our example, there are only 140 such nonempty intersections. You can verify this by looking at Figure 3.10 which shows the block structure obtained by combining the indicating all the nonempty intersections of Level 0 and Level 1. Hence, in our example, there 140 possible unique addresses v(p). In the matrix V, there are (48)2 possible Level 0 and Level 1 address combinations, but most of these are impossible to choose as they don't correspond to an address list that can be determined by an input p. Hence, the function G(V) can only compute 140 possible values. Since V stores only 96 scalars, we see that G(V) uses 96 values to construct 140 outputs.



Figure 3.10:

Now in the combined level structure, there are 140 disjoint blocks and 140 unique addresses. If we built a single level coarse encoding using the combined levels, we could build a CMAC style mapping as follows: the address for a given p is given by v(p) which is an integer from 0 to 139. We can pick any 1 × 140 matrix U of scalars and define a model by

G(U)(p) = Uv(p).

So we can still only fit 140 output values, but we now have 140 scalar entries in the parameter matrix U to construct the fit. If 140 input-output pairs were available, with each pair in a unique block, this model will clearly lead to a 1-1 mapping between a given input pi and its associated output yi by simply choosing Uv(pi) = yi. Thus, the combined model generates a simple lookup table. We need to store 140 values in U to construct 140 output values. By contrast, in the two level model, we use the 96 values stored in V to construct the 140 distinct outputs. Since, this can't be done using a simple lookup table approach, it is clear that our training algorithm will somehow have to generate something like a regression or least squares fit to the data. Note that each of the 96 entries in the V matrix plays the role of the value of a parallel fiber/ Purkinje cell dendrite interaction in a N ® 2N encoder.

3.3   The General Model: L Levels:

Following the coarse coding and offsetting strategies discussed already, we can construct a general model using L levels. We now assume that each input comes from the space ÂN, so that there are N components per input vector. In summary, our model will contain

  1. L levels or covers of B with
    I Ì
    K0-1
    È
    k=0
    Bk0 K0 º number of blocks on level   0
    I Ì
    K1-1
    È
    k=0
    Bk1 K1 º number of blocks on level   1
      ·
    ·
    ·
     
    I Ì
    KL-1-1
    È
    k=0
    BkL-1 KL-1 º number of blocks on level   L-1
  2. Address vectors v such that p Î I Þ $!

    v0(p) = s0   address   level 0
    v1(p) = s1   address   level 1
      ·
    ·
    ·
     
    vL-1(p) = sL-1   address   level L-1

  3. A sensor function Nkj corresponding to each block Bkj on level j.
Finally, note that we can calculate the number of blocks on Level j as follows:

Kj =
æ
ç
ç
è
subintervals
input component
0
ö
÷
÷
ø
× æ
ç
ç
è
subintervals
input component
1
ö
÷
÷
ø
  ·
·
·
 
  ×
æ
ç
ç
è
subintervals
input component
N-2
ö
÷
÷
ø
× æ
ç
ç
è
subintervals
input component
N-1
ö
÷
÷
ø
  = m0j × m1j × ··· × mN-2j × mN-1j

where mi denotes the number of subintervals created by our chosen partition scheme of the ith component axis. The parameter matrix V is now not necessarily a rectangular matrix as each row has a potentially different size. In general, we call this parameter matrix our Virtual Memory. Note that this L Level model is an implementation of the general N ® LN encoder with the usual background of neurobiological meaning. In this general case, we have

V =
é
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ë
V00 V01 ...
V
 
0,K0-1
V10 V11 ...
V
 
1,K1-1
·
·
·
·
·
·
·  
 · 
  ·
·
·
·
VL-2,0 VL-2,1 ...
V
 
L-2,KL-2-1
VL-1,0 VL-1,1 ...
V
 
L-1,KL-1-1
ù
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
û
¬ level 0 rowV0
¬ level 1 rowV1
·
·
·
¬ level L-2 rowVL-2
¬ level L-1 rowVL-1
    (3.1)

This defines an address vector

v(p) =
é
ê
ê
ê
ê
ê
ë
v0(p)
v1(p)
·
·
·
vL-1(p)
ù
ú
ú
ú
ú
ú
û

which can also be written as the comma delimited list

v(p) = { v0(p), v1(p), ··· ,vL-1(p) }

We define the CMAC model by G(V) ® Â by

G(V)(p) =
L-1
å
j=0
Kj-1
å
k=0
Nkj(p) Vkj
  =
L-1
å
j=0
Kj-1
å
k=0
Nkj(p) Vj,k
  =
L-1
å
j=0
V
 
j,vj(p)

It is clear that G(V) is a linear function and V contains K0 + K1 + ··· + KL-1 scalars. Let K denote the number of nonempty block intersections for the L levels; that is

K = number of nonempty   B
0
 
k1
 Ç  B
1
 
k2
 Ç ··· Ç  B
L-1
 
kL-1
.

Then under the best circumstances, namely K input - output pairs, each landing in a unique block intersection, G(V) can construct K elements.

3.4   A Specific Example:

Let's assume we have input data from a bounded region of Â2 and we are using the bounding box B = [a,b] × [c,d]. For level 0, we use a uniform partition for each side of the bounding box B into 5 pieces. Thus, each subinterval on the p0 axis has length Dx = b-a/5 and each subinterval on the p1 axis has length Dy = d-c/5. A CMAC model with 1 level would then use a 1 × 25 parameter matrix V, where

V = {V00, ···, V0,24 }

and the best model we could build would require 25 outputs yi with each output associated with an input pi that lies in a distinct block Bi0 from level 0. The model we build would then choose each scalar V0,i to match yi and the CMAC function would have the form

G(V)(pi) = V
 
v0(pi)
= yi,

for some index k between 0 and 24. The CMAC model would obviously function as a lookup table which can match exactly any input pi from the initial data set with its associated output yi. However, if a new input-output pair (x,z) were available, we would see that

G(V)(x) = V
 
v0(x)
= V0,i = yk,

for some indices i and k between 0 and 24. There is no reason to believe that G(V)(x) matches z. Indeed, the only way we could possibly improve the model would be to add a new block to the level 0 structure that contains x and excludes all the other inputs xi.

On the other hand, if we add a second level, level
1, by offsetting the level 0 structure by

(Qx,Qy) = (
Dx

10
,
Dy

10
),

we see that level 1 will contain 36 blocks. The new parameter matrix will then consist of the 61 scalars

V =
é
ë
V00 V01 ... V0,24
V10 V11 ... V1,35
ù
û

For this two level coding, when we add level 1 blocks to the level 0 blocks, each block in level 0 will split into 4 new blocks. Hence, the combined or flattened two level block structure is a coarse coding using one level with 25 × 4 = 100 blocks. The best model we could then hope for would be built from 100 input-output pairs (pi,yi) with each pi landing in a unique region of the form

Bj0 × Bk1

for some indices j and k. Then, we would choose scalars such that

V0,j + V1,k = yn,

for some index n between 0 and 99. This will give us 100 equations in 61 unknowns. There is clearly no unique solution, so when we search for some way to choose the scalars Vi,j to mathc the output data, we would like this training algorithm to be a lot like a standard least-squares solution. Note that the flattened approach will give us a CMAC model with 100 parameters to use to fit 100 outputs leading once again to a lookup table solution. While the lookup table solution can't in general approximate a new input-output pair never seen before, there is hope that a least squares solution approach will do better. Finally, the 2 level CMAC model uses 61 scalars to fit 100 outputs while the flattened model uses 100 scalars to fit the 100 outputs. This amounts to a storage increase of 100/61 = 1.64.

Now assume we have constructed a three level coarse for the same data using the same uniform coding for level
0 and the offset (Qx,Qy) = (Dx/15,Dy/15) for level 1 and (Qx,Qy) = (2Dx/15,2Dy/15) for level 2. In general, for an L level CMAC coding, the offset strategy at each successive level would be

(Qx,Qy) = (
Dx

5L
,
Dy

5L
).

Note that the block structure of level 1 splits each block from level 0 into approximately 4 new blocks. When we add the block structure from level 2 and the block structure from level 1 to each block in level 0, each block in level 0 undergoes a split into about 9 blocks. The flattened CMAC model thus will use 9 × 25 = 225 blocks while the parameter matrix for the 3 level CMAC function will have the form

V =
é
ê
ê
ë
V00 ··· V0,24
V10 ··· V1,35
V20 ··· V2,35
ù
ú
ú
û
.

Hence G(V) will use 97 scalars to model at most 225 distinct outputs. The combined or flattened CMAC model G(U) will use a parameter matrix with 225 scalars to construct at most 225 outputs. Hence, the combined model will function as a direct lookup table, while the three level CMAC must use fewer scalars to model the same number of outputs. The increase in parameters from the three level CMAC to the combined CMAC is 97 to 225 or a storage increase of 2.32.

If we add a fourth level using the offset strategy at each level

(Qx,Qy) = (
Dx

20
,
Dy

20
),

we the number of distinct blocks in the combined model is 16 × 25 = 400; hence K = 400. The new parameter matrix for the four level CMAC model G(V) is then

V =
é
ê
ê
ê
ë
V00 ··· V0,24
V10 ··· V1,35
V20 ··· V2,35
V30 ··· V3,35
ù
ú
ú
ú
û
.

which uses 133 scalars. The combined model G(U) will need the matrix U to contain 400 scalars. hence, there is a storage increase from 133 to 400 or 3.01.

Adding a fifth level will then use
169 scalars for the five level CMAC and 625 scalars for the combined CMAC implying a storage increase of 3.70. As the number of levels increase, we see the storage increase will grow as well. Hence, by using an L level CMAC model, we have fewer parameters to approximate a number of outputs which is substantially larger than the number of parameters. If we use a training scheme that tries to build a least squares model, we will at least have some hope of building a model which will generate a ``reasonable'' approximation to both outputs from the data set as well as unknown data pairs. A lookup table model, on the other hand, has no hope of building a good approximation to an input - output pair that was not used to build the model.

If we use a uniform
L level coding, then the number of parameters in the model will be 25 + 36L while the number of parameters in the flattened model will be 25 L2. Hence the storage increase to model 25 L2 input-output pairs where each input lands in a unique block in the flattened model is given by

25 L2

25 + 36L
£
25

36
L = .69L

Clearly, there is a potential gain in what might be called the ``generalization'' capability of the CMAC model by increasing L as that forces us to use fewer and fewer parameters to model larger and larger numbers of outputs. Of course, this brief motivational analysis is only good when we assume all of our input-output pairs are distributed in such a way that each input falls in a unique flattened block. We will call such a distribution of the data, the Best Flattened Data Distribution or the BFD distribution.

3.5   Function Approximation Using the CMAC:

As we have mentioned earlier, we would like to choose the parameters in a typical CMAC model to match input to output pairs. We assume that we are given T data pairs (pi,yi) with the inputs coming from some bounded region of ÂN and that we have chosen an appropriate bounding box B. We set up a coarse coding of the bounding box using L levels. When we attempt to match inputs to outputs we find the following T equations:

G(V)(p0) =
L-1
å
j=0
V
 
j,vj(p0)
= y0
G(V)(p1) =
L-1
å
j=0
V
 
j,vj(p1)
= y1
G(V)(p2) =
L-1
å
j=0
V
 
j,vj(p2)
= y2
  ·
·
·
 
G(V)(p0) =
L-1
å
j=0
V
 
j,vj(pT-2)
= yT-2
G(V)(p0) =
L-1
å
j=0
V
 
j,vj(pT-1)
= yT-1

There are a total of K0 + K1 + ··· + KL-1 parameters and T equations. For a general input-output data set, there is no guarantee that we have a BFD distribution, so we can expect that sometimes multiple inputs generate identical addresses. However, their associated outputs are probably different, so there is the possibility of inconsistent equations. Regardless of the data, the usual CMAC training algorithm is organized as follows:

Step I:
Choose some initial guess for the parameters V and label this initial guess as V0. Usually, V0 is set to be zero.
Step II:

  1. Choose data (p0,y0).
  2. Compute the error e0 = y0 - G(V0)(p0).
  3. There are only L parameters of V0 used to compute G(V0)(p0). These parameters are {V0,v0(p0), ···, VL-1,vL-1(p0) }. This is usually a very small fraction of the total number of parameters in V0.
  4. Update these L parameter values by distributing e0/L of the total error to each parameter as follows:

    V
    0,new
     
    j,vj(p0)
    = V
    0,old
     
    j,vj(p0)
    + l
    e0

    L
    ,

    where we perform the update for all levels from 0 to L-1 and the old and new superscripts on the initial parameter matrix refer to the original parameters before alteration by the learning algorithm and the new updated parameters. The symbol l denotes what is called the learning rate for the training algorithm. It takes on a value between 0 and 1.
  5. The above steps have created a new parameter matrix V1 which differs from V0 by at most L values.
Step III:
Now choose the next data pair (p1,y1) and repeat the steps from Step II. This will compute the next error e1 and update V1 to a new parameter matrix V2, which will differ from V1 by at most L values. We will continue to cycle through the data (pi,yi) in this fashion until all the data has been seen. At this point we will have a new parameter matrix VL-1. This is the first pass through the full data set.
We then continue passes through the full data set until the errors between the CMAC model and the desired targets are as small as desired. The algorithm as described above uses what is called a delta rule to update parameters. It is a lot like a gradient descent algorithm and it turns out that this algorithm generates a model that is quite similar to a least squares estimate. We can formalize the training algorithm a bit more by using t-1 to indicate how many passes through the full data set have already been completed. Thus, Vt,k indicates that we are in the midst of the tth pass through the data and that we have already updated the parameters for data {(p0,y0), ···, (pi,yi) }. The CMAC training algorithm can then be expressed in compact form by

Step I:
Set t = 0. Choose initial parameters Vt=0,0.
Step II: Pass t

Set the cumulative error,
Et for pass t to zero.

for(i = 0; i < T; ++i) {
   ei = yi - G(V)t,i(pi)
   Et += ei
   for( j=0; j < L, ++j ) {
     Vt,i+1(j,vj(pi)) = Vt,i(j,vj(pi)) + l ei/L }
     }
   }

Step III:
Set Vt+1,0 = Vt,T-1. Increment t to t+1 and restart Step II as long as the cumulative error Et over the tth pass is higher than desired.
In the next chapters, we will continue to analyze the CMAC model. So far, we have only considered scalar output data. If the output targets were d dimensional, we could simply define d separate CMAC models with potentially different coarse codings for the common input space and use the scalar CMAC training algorithm on each of the d components of the vector target. Note further that no matter how we choose to label the blocks Bkj in the coarse coding, the address calculation vj(pi) is not continuous with respect to the input pi. This in turn forces a lack of continuity for G(V) with respect to the inputs. However, it is very clear that learning is both fast and local because only L parameters are updated at any given step.

In general, the size of the parameter matrix
V becomes a problem as the input dimension grows. All of our two dimensional examples have used quite small parameter matrices so far. But what if the inputs came from a bounded region of a higher dimensional space? Then we have trouble! Consider the following simple example:

  1. Assume the input space is 10--dimensional.
  2. We use 5 levels, so L = 5.
  3. We use a very coarse coding scheme, only 4 sensors/ input component/ level
    1. K0 = ··· = K4 = 410 = 1,048,576.
    2. V has 5 rows, each with 1,048,576 entries. implying V stores 5,242,880 values.
    3. Thus, for each component of a vector target, if V stores floats (4 bytes), then V requires 20 MB storage.
  4. We use a slightly finer coding scheme, 5 sensors/ input component/ level.
    1. K0 = ··· = K4 = 510 = 9,765,625.
    2. V has 5 rows, each with 48,828,125 entries. implying V stores 48,828,125 values.
    3. Thus, for each component of a vector target, if V stores floats (4 bytes), then V requires 186 MB storage.
As N increases, L increases, Kj increases; it is clear that the required V storage increases exponentially.

3.6   Hashing Strategies:

The exponential increase in V storage is a serious problem for the CMAC model. One way to deal with this problem is to hash the addresses vj(pi). One way to do this is to use modular hashing with a fixed hash size Hj for each level j. The true address at level j for input pi is the extremely large integer vj(pi). The hashed address is then the potentially smaller integer vj(pi) mod Hj. The use of hashed addresses means that to each level j there are at most Hj unique addresses implying that the parameter matrix V needs only Hj values for the jth row. We call the usual CMAC parameter matrix, V, virtual memory; the new hashed parameter matrix, W, will be called working memory. We will also denote the hashed addresses by the symbol wj to distinguish them from the usual full addresses vj. W has substantially lower storage requirements than V but other problems arise due to hash collisions; i.e. inputs p1 and p2 with normally distinct addresses may be sent to same hashed address. The CMAC training algorithm will then have trouble because
G(W)(p) =
L-1
å
j=0
W
 
j,wj(p1)
  =
L-1
å
j=0
W
 
j,wj(p2)
  = G(p2)

Hence training can try to set G(W)(p1) = y1 or y2 but probably can't match both.

3.7   The Purkinje Model:

The simple Albus model of the Purkinje cell as a N ® 100N recoder can now be built from the CMAC function approximators discussed in this chapter. If we assume that each mossy fiber carries sensor data that is multiplexed in some fashion, then we have inputs p coming from ÂN for some positive integer N. For convenience, assume the data is five dimensional (N = 5). We will assume that all inputs lie in the bounded region

B = [a,b] × [a,b] × [a,b] × [a,b] × [a,b].

Choose the number of levels to be L = 100. Each MF splits into about 100 PF via the granule cell pathways. Think of each individual granule cell path as corresponding to a level. Roughly speaking, the total number of parameters in the CMAC matrix V is L × M, where M is the number of entries in each row (for convenience, we assume each row is of fixed size; for example, we can force this by chooing a uniform hash size H for all levels). The biological Purkinje cell has about 200,000 dendrites, implying that each row in V should contain about 2,000 values. If we use a uniform coarse coding strategy on all five input components, we see we want

2,000 = m5

where m is the number of blocks on each component axis. Thus, m » 4.6 and we should choose the block size for each component to be on the order of

w =
b-a

5
.

This CMAC model will use the uniform offset strategy

(Dx,Dy) = (
b-a

5 × 100
,
b-a

5 × 100
),

implying that the model can resolve inputs on the order of b-a/500 apart, but cannot distinguish between inputs that are closer.


Previous Next Contents