Previous Next Contents

Chapter 5:   CMAC Convergence:

5.1   Introduction:

We now provide a proof of the convergence of the CMAC learning algorithm under very general conditions. The convergence proof is essentially insensitive to the type of hashing used to construct the working addresses. In addition, we develop a closed form solution that sheds insight into the range of the CMAC architecture and the intelligent choice of the crucial CMAC parameters of levels, hashing strategy and sensor field width. This paper extends the convergence analyses of (Lin, 27) and (Wong, 54) to handle rigorously the explicit use of hashing. We derive criteria that suggest how to choose the number of levels, hash size and so forth so as to achieve a first--order measure of quality, thereby enhancing the discussions of (Brown, 9).

5.2   CMAC Training:

The weight update rule given by (4.37) can be cast into an iterative algorithm for training the weights of a CMAC architecture to approximate a given set of I/O data. Assume we are given a collection of S input--output data pairs, (pi,bi), 0 £ i £ S-1. We solve for the appropriate values of the weights Wj,nj(ps®) via an iterative procedure indexed by both the iteration time t and the exemplar s. The procedure works by starting with a zero weight matrix and using (4.37) repeatedly to alter the values of the weights stored in W. This procedure passes through the complete training set from s = 0 to s = S-1 in a serial fashion. Hence, each t iteration consists of S serial update calculations. So, we need to label the values stored in W to reflect the fact that they depend on four indexing variables. We will label these values as Wjkts, where t indicates the iteration count, j denotes the level, k gives the active sensor at that level and s reminds us that these values have been determined after seeing exemplars x = 0 to x =s during this iteration t. Once these values have been calculated, we can evaluate the CMAC function at any input pi; i.e., at iteration t, after we have passed through the first s inputs, we have at any input pi,

g(
®
pi
 
)
=
L-1
å
j=0
W
ts
 
j,nj (
®
pi
 
)
.

The value of g is clearly dependent on the weights stored in Wts. Hence, for a given input pi®, we will use the notation

gts(
®
pi
 
)
=
L-1
å
j=0
W
ts
 
j,nj (
®
pi
 
)
.

The basic training algorithm can then be restated as an update law for the weights at iteration t having passed through the first s inputs, Wi,jts, in terms of the weights already computed with the pass through the first s-1 items:

W
ts
 
j,nj (
®
ps
 
)
=
W
t,s-1
 
j,nj (
®
ps
 
)
+
l

L
( bs-gt,s-1(
®
ps
 
)).

We can now state a complete training algorithm. We start at t = 0 with all weights zero (W0,-1 is the zero matrix) and g0,-1 is identically zero. From examination of Table 5.1, it is clear that our convergence investigation centers on the existence of the limit limt ¾® ¥ Wj,kt,-1.


Wj,k0,-1 = 0, " j, k
 
for(t = 0; t < ¥; ++t) {
   for(s = 0; s < S; ++s) {
     for(j = 0; j < L; ++ j) {
        Wj,nj(ps®)ts = Wj,nj(ps®)t,s-1 + l/L ( bs-gt,s-1(ps® ))
        }
     }
   Wj,kt+1,-1 = Wj,kt,S-1, " j,k
   }

Table 5.1: CMAC Learning Algorithm

5.3   The Matrix Form:

Consider the usual scalar CMAC with or without hashing of size H. If there is no hashing, we can find a hash size, Hc, so large that all the virtual addresses on all the levels are smaller than this critical number Hc. Thus, we may assume without loss of generality that we are dealing with a hashed CMAC architecture with hash size H.

In order to investigate the convergence of the CMAC learning algorithm carefully, it is very useful to recast the algorithm of Table 
5.1 into standard matrix and vector notation that will show structure more clearly. First, the variables we need to determine are stored in the matrix W. We can lexographically order these variables into a column vector consisting of L subblocks of length H. Let x® denote the LH × 1 column vector whose components are

®
x
 
=
é
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ë
x0
·
·
·
xH-1
xH
·
·
·
x2H-1
·
·
·
x(L-1)H
·
·
·
xLH-1
ù
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
û
= é
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ë
W00
·
·
·
W0,H-1
W10
·
·
·
W1,H-1
·
·
·
WL-1,0
·
·
·
WL-1,H-1
ù
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
û
.
    (5.1)

Note that only one element out of each subblock of x® is accessed by the working address vector n®(p®), where

®
n
 
(
®
p
 
)
=
é
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ë
n0(
®
p
 
)
n1(
®
p
 
)
·
·
·
nL-1(
®
p
 
)
ù
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
û
.
    (5.2)

This particular structure of the weight matrix is independent of hash strategy. Hence, the CMAC output corresponding to the input p® can be written as the matrix product of the row vector, A and x® as follows:

g(
®
p
 
)
=
A
®
x
 
  =
é
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ë
0  
·
·
·
 
1
position n0(
®
p
 
)
·
·
·
 
0  
0  
·
·
·
 
1
position H + n1(
®
p
 
)
·
·
·
 
·
·
·
 
0  
·
·
·
 
1
position (L-1)H + nL-1(
®
p
 
)
·
·
·
 
0  
ù
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
û
T










































 
®
x
 
.
    (5.3)

Note that the row A consists of L subblocks that match the subblocks of x®. Let's assume that we are training on a data set of size S. Hence there are input vectors {p0,···,pS-1 }, desired outputs {b0,···,bS-1 } and associated rows {A0,···,AS-1 } with the output of the CMAC at each pi® given by g(pi®) = Ai x®. We can rewrite the weight updating algorithm as

x
new
 
jH+nj (
®
pi
 
)
=
x
old
 
jH+nj (
®
pi
 
)
+
l

L
æ
ç
ç
è
bi - Ai
®
x
 
old ö
÷
÷
ø
.
    (5.4)

The product Ai x®t should only be used to update the elements of x® accessed by the address n(pi)®. Note that AiT is a column vector whose only nonzero elements are the elements residing in the positions accessed by the working address n(pi)®. Hence, we can rewrite (5.4) as

®
x
 
new
=
®
x
 
old +
l

L
AiT( bi - Ai
®
x
 
old )
=
®
x
 
old -
l

L
AiT Ai
®
x
 
old +
l

L
AiT bi
=
(I-b AiT Ai)
®
x
 
old + b AiT bi
=
Ci
®
x
 
old + b AiT bi,
    (5.5)

where b = l/L and I denotes the LH × LH identity matrix, and we define the LH × LH square matrix Ci by Ci = I-b AiT Ai.

We can now rewrite the full algorithm presented in Table 
5.1 by eliminating the level loop (it is subsumed in the lexicographic ordering of the weight matrix). The new algorithm is shown in Table 5.2.


x®0,-1 = 0
 
for(t = 0; t < ¥; ++t) {
   for(s = 0; s < S; ++s) {
     x®t,s = (I-b AiT Ai) x®t,s-1 + b AiT bi
     }
   x®t+1,-1 = x®t,S-1
   }

Table 5.2: CMAC Vector Learning Algorithm

We are interested in establishing the convergence of the vector sequence {x®t,-1 }t=0¥ º {x®t,S-1 }t=0¥; i.e., does limt ¾® ¥ x®t,-1 exist?

5.3.1   The CMAC Training Generator Matrix:

Using Table 5.2, we can compute the iterates of x®t,-1 recursively:

  1. x®0,-1 = 0
  2. x®0,0 = C0 x®0,-1 + b A0T b0
  3. ®
    x
     
    0,1
    =
    C1
    ®
    x
     
    0,0 + b A1T b1
      =
    C1 (C0
    ®
    x
     
    0,-1 + b A0T b0) + b A1T b1
      =
    C1 C0
    ®
    x
     
    0,-1 + b (C1 A0T b0 + A1T b1)
  4. ®
    x
     
    0,2
    =
    C2
    ®
    x
     
    0,1 + b A2T b2
      =
    C2 C1 C0
    ®
    x
     
    0,-1 + b (C2 C1 A0T b0 + C2 A1T b1 + A2T b2)
      =
    D02
    ®
    x
     
    0,-1 + b (D12 A0T b0 + D22 A1T b1 + A2T b2)
      =
    D02
    ®
    x
     
    0,-1 + b æ
    ç
    ç
    è
    2
    å
    k=1
    Dk2 Ak-1T bk-1 + A2T b2 ö
    ÷
    ÷
    ø
where for p ³ q, we use the notation Dqp º Cp Cp-1 ··· Cq. An easy induction argument then shows that the general structure for x®0,p is

®
x
 
0,p
=
D0p
®
x
 
0,-1 + b æ
ç
ç
è
p
å
k=1
Dkp Ak-1T bk-1 + ApT bp ö
÷
÷
ø

and so after the first sweep through the data set is completed, we have

®
x
 
0,S-1
=
D0S-1
®
x
 
0,-1 + b æ
ç
ç
è
S-1
å
k=1
DkS-1 Ak-1T bk-1 + AS-1T bS-1 ö
÷
÷
ø
.
    (5.6)

In general, we can repeat the argument above for a given iteration and express x®t+1,S-1 in terms of x®t,-1:

®
x
 
t+1,S-1
=
D
®
x
 
t,-1 + b æ
ç
ç
è
S-1
å
k=1
DkS-1 Ak-1T bk-1 + AS-1T bS-1) ö
÷
÷
ø
  =
D
®
x
 
t,-1 + b VT
    (5.7)

where we have set D º D0S-1, and we use

VT =
S-1
å
k=1
DkS-1 Ak-1T bk-1 + AS-1T bS-1.

Applying (5.7) recursively, we can show

®
x
 
t+1,S-1
=
D
®
x
 
t,-1 + b VT
  =
D ( D
®
x
 
t-1,-1 + b VT) + b VT
  =
D 2
®
x
 
t-1,-1 + b (I + D) VT
  =
D 3
®
x
 
t-2,-1 + b (I + D + D2 ) VT
  =
D 3
®
x
 
t-2,-1 + b (
2
å
k=0
Dk) VT

where D0 º I. A standard induction argument then shows

®
x
 
t+1,S-1
=
D
l +1
 
®
x
 
t-l ,-1
 
+ b (
l
å
k=0
D)k VT
    (5.8)
=
D t+1
®
x
 
0,-1 + b (
t
å
k=0
Dk) VT.
    (5.9)

We know that we initialize the algorithm using x®0,-1 = 0; hence, (5.10) becomes

®
x
 
t+1,S-1
=
b (
t
å
k=0
Dk) VT.
    (5.10)

Thus, our original question about convergence can be rephrased using the matrix form above: convergence of the sequence is established if and only if (åt=0¥ Dk ) VT exists. Hence, we must investigate the structure of the geometric series

(I + D + ··· + Dt + ··· ) VT.

5.3.2   The Structure of D:

Note that Ai AjT is the dot product of the address vectors due to input vectors pi and pj. If these address vectors were identical --- that is, the CMAC architecture could not distinguish between these vectors --- then this dot product will have value L. If these addresses are not identical, then Ai AjT < L. In general, we will use the notation for all possible exemplar indices i and j:

mij = Ai AjT £ L
pij =
mij

L
Î [0,1]

where we refer to pij as the amount of overlap between address Ai and Aj. Consider the following matrix products:

  1. C1 C0 = I - b (A0T A0 + A1T A1) + b2 A1T A1 A0T A0
      = I - b (A0T A0 + A1T A1) + b2 A1 A0T (A1T A0)
      = I - b (A0T A0 + A1T A1) + b2 m10 (A1T A0)
  2. C2 C1 C0 =
    I - b
    2
    å
    k=0
    AkT Ak + b2 (m10 A1T A0 + m21 A2T A1 + m20 A2T A0) - b3 m21 m10 A2T A0
We can package the last result more concisely by introducing the following notation: For p £ q, let Nq = {0,1,···,q } and

Qqp = { (i0,···,iq) Î Np | i0 > ··· > iq }
| Qpq | =
æ
è
q+1
p+1
ö
ø

where (
n
m
) is the usual binomial coefficient and | S | denotes the cardinality of a set S. Then, in particular, we see

Q01 =
{ i0 Î N1 } = { (0), (1) }; | Q01 | = æ
è
2
1
ö
ø
Q11 =
{ (i0,i1) Î N1 | i0 > i1 } = { (1,0) }; | Q11 | = æ
è
2
2
ö
ø
Q02 =
{ i0 Î N2 } = { 0, , 1, 2 }; | Q02 | = æ
è
3
1
ö
ø
Q12 =
{ (i0,i1) Î N2 | i0 > i1 } = { (1,0), (2,0), (2,1) }; | Q12 | = æ
è
3
2
ö
ø
Q22 =
{ (i0,i1,i2) Î N2 | i0 > i1 > i2 } = { (2,1,0) }; | Q22 | = æ
è
3
3
ö
ø
.

Hence, we can rewrite
C2 C1 C0 as

C2 C1 C0 =
I - b
 
å
i0 Î Q02
A
T
 
i0
A
 
i0
+ b2
 
å
(i0,i1) Î Q12
m
 
i0,i1
A
T
 
i0
A
 
i1
- b3
 
å
(i0,i1,i2) Î Q22
m
 
i0,i1
m
 
i1,i0
A
T
 
i0
A
 
i2
.

A simple induction argument shows that the general form of D is given by

D = CS-1 CS-2 ··· C1 C0
  =
I - b
 
å
i0 Î Q0S-1
A
T
 
i0
A
 
i0
   
  +
S-1
å
k=1
 
å
(i0,i1,···,ik) Î QkS-1
(-1)k+1 bk+1 m
 
i0,i1
··· m
 
ik-1,ik
A
T
 
i0
A
 
ik
.
    (5.11)

We now assume that the overlap between Ai and any other address Aj is a fixed percentage mij = p. In this case, the structure of D is much simpler. We have

D =
I - b
S-1
å
i=0
AiT Ai +
S-1
å
k=1
 
å
(i0,i1,···,ik) Î QkS-1
(-1)k+1 bk+1 pk A
T
 
i0
A
 
ik
.
    (5.12)

5.3.3   The Structure of Ci:

We begin by analyzing the structure of a typical Ci, Ci = I-b AiT Ai. This LH × LH matrix has the structure shown in Table 5.3. In this table, for notational convenience, we let rji denote the row jH + nji, with nij the index of the active sensor on level j in the CMAC architecture for input pi and for ease of display, we use r± to indicate r ± 1.


  COL                    
ROW 0 ··· r0i,- r0i r0i,+ ··· rL-1i,- rL-1 rL-1i,+ ··· LH-
0 1 0 0 0 0 ··· 0 0 0 0 0
: : ddots : : : : : : : : :
r0i,- 0 0 1 0 0 : 0 0 0 0 0
r0i 0 0 0 1-b 0 : 0 -b 0 0 0
r0i,+ 0 0 0 0 1 : 0 0 0 0 0
: : : : : : ddots : : : : :
rL-1i,- 0 0 0 0 0 : 1 0 0 0 0
rL-1i 0 0 0 - b 0 : 0 1-b 0 0 0
rL-1i,+ 0 0 0 0 0 : 0 0 1 0 0
: : : : : : : : : : ddots :
LH- 0 0 0 0 0 : 0 0 0 0 1

Table 5.3: Structure of Ci

We make the following observations:

  1. The matrix Ci is symmetric.
  2. The inverse of Ci exists and is given by Ci-1 = I + b/1-b AiT Ai.
  3. Let Wi = { r0i,···,rL-1i } and let Wic denote the complement of this set in the set of all possible rows, { 0,1,···,LH-1 }. Then if j Î Wic, the row--column pair (j,j) is of identity form. That is, it has the form

      COL            
    ROW 0 ··· j-1 j j+1 ··· LH-
    0       0      
    ···       :      
    j-1       0      
    j 0 ··· 0 1 0 ··· 0
    j+1       0      
    ···       :      
    LH-       0      

    We will henceforth simply say that row j Î Wic in Ci has identity form.
Let Vi denote the span of the vectors in the set Wi and V denote the span of the vectors in the union Èi=0S-1 Vi. Then

  1. C0 has identity form for all j Î W0c
  2. C1 C0 has identity form for all j Î { W0 È W1 }c
  3. CS-1 ··· C0 has identity form j Î { W0 È W1 ··· WS-1 }c
Consider D x® for all vectors x® in V. Note that

| D
®
x
 
|
£
| | CS-1 | | ··· | | C1 | | | C0
®
x
 
|.

Hence, we must obtain an estimate for the norm of the action of C0 on x® Î V. For an arbitrary sample index i, note that the (u,v) element of Ci can be written as u = a H + b and v = c H + d, for some choice of indices a, b, c and d; hence,

(Ci)u,v = (Ci)a H + b,c H + d
  =
dca ddb - b d
a H + b
 
rai
d
c H + d
 
rbi
  =
dca ddb - b d
a H + b
 
a H + nai
d
c H + d
 
c H + nci
  =
dca ddb - b d
b
 
nai
d
d
 
nci
.

For arbitrary vector x® in ÂLH, the component a H + b of Ci x® is given by

(Ci
®
x
 
)a H + b
=
L-1
å
c=0
H-1
å
d=0
dca d db xc H + d - b
L-1
å
c = 0
H-1
å
d = 0
d
b
 
nai
d
d
 
nci
xc H + d
  =
xa H + b - b d
b
 
nai
L-1
å
c = 0
x
 
c H + nci
.

The rows aH+b which differ from some aH + nai correspond to rows in Wic. For future use, we denote the set (Èk=0S-1 Wk) by W. Hence the components of Ci x® fall into three categories:

  1. components (Ci x®)r, where r Î Wi
  2. components (Ci x®)r, where r Î W Ç Wic; these are identity rows for Ci, but not necessarily for any other Ck.
  3. components (Ci x®)r, where r Î Wc; these are identity rows for any Ck
Note that it follows from above that the elements (Ci x®)r with r Î W lie in V; hence, V is closed under the action of Ci restricted to the rows W. Hence, if x® Î V, then all components from the third category can be ignored. The components of Ci x® are then

Ci
®
x
 
=
é
ê
ê
ê
ë
xk, k Î Wic
xk - b
 
å
j Î Wi
xj,
k Î Wi
ù
ú
ú
ú
û
.

Let w® be an eigenvector of Ci corresponding to eigenvalue 1. Then Ci w® = w® implies

é
ê
ê
ê
ë
wk, k Î Wic
wk, - b
 
å
j Î Wi
wj,
k Î Wi
ù
ú
ú
ú
û
=
é
ë
wk, k Î Wic
wk, k Î Wi
ù
û
.

It follows immediately that the basis vectors ek, k Î Wic are eigenvectors of Ci with eigenvalue 1. In addition, the vector

®
w
 
=
é
ë
wk k Î Wi
0, k Î Wic
ù
û

with åj Î Wi wj = 0 is also an eigenvector for eigenvalue 1. Indeed, if åj Î Wi wj ¹ 0, such w® cannot be an eigenvector for eigenvalue 1. Hence, we can describe the set of such eigenvectors by Wi = {x® Î ÂLH : AiT · x® = 0 }.

Further, if
w® is an eigenvector of Ci corresponding to eigenvalue x ¹ 1, then Ci w® = x w® implies

é
ê
ê
ê
ë
wk, k Î Wic
wk - b
 
å
j Î Wi
wj,
k Î Wi
ù
ú
ú
ú
û
=
é
ë
x wk, k Î Wic
x wk, k Î Wi
ù
û
.

Since x ¹ 0 (Ci is invertible), we have wk = 0 for k Î Wic and wk - b åj Î Wi wj = x wk for all k in Wi. Hence,

(1-x) wk =
b
 
å
j Î Wi
wj, " k Î Wi

and summing over Wi, we have

0 =
(1-x - b L)
 
å
j Î Wi
wj.

Since the eigenvalue
x ¹ 1, we must have åj Î Wi wj ¹ 0 and hence x = 1 - b L = 1 - l.

We conclude that the eigenvalues of
Ci are either 1 or 1 - l. Hence, the vector

®
w
 
=
é
ë
1, k Î Wi
0, k Î Wic
ù
û
  = AiT

is an eigenvector corresponding to eigenvalue x = 1 - l.

Also, the square of the norm of
Ci x®, for vectors x® Î V is given by

| Ci
®
x
 
|2
=
 
å
j Î Wic
xj2 +
 
å
j Î Wi
æ
ç
ç
è
xj - b (
 
å
p Î Wi
xp) ö
÷
÷
ø
2



 
  =
 
å
j Î Wic
xj2 +
 
å
j Î Wi
xj2 + (L b2 - 2 b ) æ
ç
ç
è
 
å
j Î Wi
xj ö
÷
÷
ø
2



 
  =
N-1
å
j=0
xj2 + (L b2 - 2 b ) æ
ç
ç
è
 
å
j Î Wi
xj ö
÷
÷
ø
2



 
  =
1 - b (2 - b L) æ
ç
ç
è
 
å
j Î Wi
xj ö
÷
÷
ø
2



 
.

Since b = l/L, for 0 < l < 1 and L ³ 1 we have b < 1 and 2 - b L > 0. Thus, on V, it is clear that the norm is less than or equal to one. From our discussions above, it is immediate that on the subspace V - Wi, the norm of Ci is actually 1 - l.

5.4   Convergence Analysis:

From the above section, it follows that if we restrict our attention to the subspace of ÂLH consisting of V - W0, we have | | C0 | | = 1 - l and

| | D | | £ | | CS-1 | | ··· | | C1 | | | | C0 | |
  < 1 ··· 1 · (1 - l) < 1.

Hence, D has norm less than one and the infinite series

(I + D + ··· + Dt + ··· )

converges to a matrix P as t ® ¥ as long as VT lies out of the subspace W0. This is equivalent to satisfying the condition

0 ¹ A0 · VT.

Moreover, it is clear that on V - W0, I - D is invertible and

(I - D)-1 = P.

Thus, the solution to the CMAC learning algorithm is given in closed form by
(I - D)-1 VT, where the inverse is computed relative to the subspace V - W0. Finally, if we calculate the difference between successive sweeps of the training algorithm, we see

®
x
 
t+1,S-1 -
®
x
 
t,S-1
= b Dt VT
|
®
x
 
t+1,S-1 -
®
x
 
t,S-1 |
< b | | D | |t | V |
  <
l

L
(1 - l)t | V |.

This implies that the rate of convergence of the training algorithm is approximately 1 - l.

5.4.1   The Structure of VT:

It remains to study the condition that A0 · VT ¹ 0. Since VT is defined as

VT =
S-1
å
k=1
CS-1 CS-2 ··· Ck Ak-1T bk-1 + AS-1T bS-1

we must have

0 ¹
S-1
å
k=1
A0 CS-1 CS-2 ··· Ck Ak-1T bk-1 + A0 AS-1T bS-1
  ¹
S-1
å
k=1
A0 CS-1 CS-2 ··· Ck Ak-1T bk-1 + m0,S-1 bS-1.

Let VpT = åk=1p Cp Cp-1 ··· Ck Ak-1T bk-1 + ApT bp. Consider

  1. A0 · V1T = A0(C1A0Tb0 + A1Tb1)
      = A0(I-b A1T A1)A0Tb0 + m01b1
      = (A0 - b m01 A1) A0T b0 + m01b1
      = (L - b m012)b0 + m01b1
  2. A0 · V2T = A0(C2C1A0Tb0 + C2A1Tb1 + A2Tb2)
      = A0(I-b A2T A2)(I-b A1T A1)A0Tb0 + A0(I-b A2T A2)A1Tb1 + m02b2
      = (A0 - b m02A2)(A0T - m01 b A1T)b0 + (A0 - b m02 A2)A1Tb1 + m02b2
      = (L - b (m012 + m022) + b2 m21m02m01)b0 + (m01 - b m02m12)b1 + m02b2
  3. A3 · V2T = A3(C2C1A0Tb0 + C2A1Tb1 + A2Tb2)
      = A3(I-b A2T A2)(I-b A1T A1)A0Tb0 + A3(I-b A2T A2)A1Tb1 + m32b2
      = (A3 - b m32A2) (A0T - m01 b A1T) b0 + (A3 - b m32 A2)A1Tb1 + m32b2
      = (m03 - b (m01m31+m32m20) + b2 m32m01m21)b0 + (m31 - b m21m32)b1 + m32 b2
  4. A0 · V3T = A0(C3 V2T) + A0 A3Tb3
      = A0(I-b A3T A3)V2T + m03b3
      = A0 V2T - b m03 A3 V2T + m03 b3
      = (L - b (m012 + m022+m032) + b2 (m21m02m01 +m03m01m31+m03m32m20)
        - b3 m03m32m01m21)b0
        + (m01 - b (m02m12+m03m31) + b2(m03m21m32))b1
        + (m02 -b m03m32)b2 + m03b3
Since b = l/L, A0 · V3T can be rewritten as

1

L
A0 · V3T
= (1 - l (p012 + p022+p032) + l2 (p21p02p01 +p03p01p31+p03p32p20)
    - l3 p03p32p01p21)b0
    + (p01 - l (p02p12+p03p31) + l2(p03p21p32))b1
    + (p02 -l p03p32)b2 + p03b3.

Assume for the moment that there were just three samples, i.e., S = 3. Then the condition above for A0 · V3T ¹ 0 determines the convergence of the chosen CMAC architecture. Let us assume that the overlap between A0 and any other address Ai, for i ¹ 0, is a fixed percentage p. For example, it is reasonable to assume that p = L-2/L. Then we have

1

L
A0 · V3T
= (1 - 3 p2l + 3 p3 l2 - p4 l3)b0 + (p - 2 p2 l + p3 l2 )b1
    + (p - p2 l )b2 + p b3
  = (1 - 3 p µ + 3 p µ2 - p µ3 )b0 + p(1 - 2 µ + µ2 )b1 + p(1 - µ )b2 + p b3
  =
(1-p)b0 + p ( (1 -3 µ + 3 µ2 - µ3)b0 + (1 - 2 µ + µ2 )b1 + (1 - µ )b2 + b3 )
  =
(1-p)b0 + p
3
å
j=0
(1-µ)j b3-j

where µ = p l. The convergence criterion is therefore

0 ¹
(1-p) b0 + p
3
å
j=0
(1 - p l)j b3-j.

If we assume the uniform overlap condition that p0i = p for all addresses other than the zeroth, a standard induction argument leads to the following convergence criterion:

0 ¹
(1-p) b0 + p
S-1
å
j=0
(1 - p l)j bS-1-j.

This condition is easy to check numerically for a given hash, field width, level strategy and I/O data set; indeed, it is recommended that code to implement this sort of check be used routinely. There has never really been a problem with the CMAC training algorithm converging; the explanation is that the criterion above shows convergence is insensitive to the choice of hash size H, number of levels L, receptive field width w and I/O data set.

5.5   Quality of Solution:

The quality of the approximation is another question. Let x® denote the converged solution to the CMAC training algorithm. Then the quality of this solution is determined by the error term

E =
1

2
S-1
å
j=0
(Aj
®
x
 
- bj)2
  =
1

2
S-1
å
j=0
(b Aj (I - D )
-1
 
| V- W0
VT - bj)2.

The first order approximation to the quality of this solution is found by choosing b and p so that

E1 =
1

2
S-1
å
j=0
(b Aj (I + D) VT - bj)2
  =
S-1
å
j=0
(b Aj VT + b Aj D VT - bj)2

is minimized. Now consider terms of the form AjT VT for the same uniform overlap condition; we have

  1. Aj · V1T = Aj(C1A0Tb0 + A1Tb1)
      = Aj(I-b A1T A1)A0Tb0 + mj1b1
      = (Aj - b mj1 A1) A0T b0 + mj1b1
      = (mj0 - b mj1m10)b0 + mj1b1
      =
    p [ (1-b p)b0 + b1 ]
  2. Aj · V2T = Aj(C2C1A0Tb0 + C2A1Tb1 + A2Tb2)
      = Aj(I-b A2T A2)(I-b A1T A1)A0Tb0 + Aj(I-b A2T A2)A1Tb1 + mj2b2
      = (Aj - b mj2A2)(A0T - m01 b A1T)b0 + (Aj - b mj2 A2)A1Tb1 + mj2b2
      = (mj0 - b (mj1m01 + mj2m02) + b2 mj1m21m01)b0 + (mj1 - b mj2m12)b1 + mj2b2
      =
    p [ (1 - 2b p + b2 p2)b0 + (1 - b p )b1 + b2 ]
      =
    p
    2
    å
    j=0
    (1 - b p)j b2-j
  3. Aj · V3T = Aj(C3 V2T) + Aj A3Tb3
      = Aj(I-b A3T A3)V2T + mj3b3
      = Aj V2T - b mj3 A3 V2T + mj3 b3
      =
    (1 - p b) é
    ê
    ê
    ë
    p
    2
    å
    j=0
    (1 - b p)j b2-j ù
    ú
    ú
    û
    + p b3
      =
    p
    3
    å
    j=0
    (1 - b p)j b3-j
A standard induction argument then implies

Aj · VT =
p
S-1
å
j=0
(1 - b p)j bS-1-j.

Similary, the terms Aj D can be obtained from (5.13):

Aj D =
Aj æ
ç
ç
è
I - b
S-1
å
i=0
AiT Ai +
S-1
å
k=1
 
å
(i0,i1,···,ik) Î QkS-1
(-1)k+1 bk+1 pk A
T
 
i0
A
 
ik
ö
÷
÷
ø
  =
Aj - b
S-1
å
i=0
mji Ai +
S-1
å
k=1
(-1)k+1 bk+1 pk
 
å
(i0,i1,···,ik) Î QkS-1
m
 
j,i0
A
 
ik
  =
Aj - b p
S-1
å
i=0
Ai +
S-1
å
k=1
(-1)k+1 bk+1 pk+1
 
å
(i0,i1,···,ik) Î QkS-1
A
 
ik
.

Hence, Aj D VT has the form

Aj D VT =
Aj VT - b p
S-1
å
i=0
Ai VT +
S-1
å
k=1
(-1)k+1 bk+1 pk+1
 
å
(i0,i1,···,ik) Î QkS-1
A
 
ik
VT
  =
D æ
ç
ç
è
1- b p
S-1
å
i=0
+
S-1
å
k=1
(-1)k+1 bk+1 pk+1
 
å
(i0,i1,···,ik) Î QkS-1
1 ö
÷
÷
ø
  =
D æ
ç
ç
è
1- æ
è
S
1
ö
ø
b p +
S-1
å
k=1
æ
è
S
k+1
ö
ø
(-1)k+1 (b p)k+1 ö
÷
÷
ø
  =
D
S
å
k=0
æ
è
S
k
ö
ø
(-1)k (b p)k
  = D (1 - b p)S

where, for convenience, we set A = åk=0S-1 (1 - b p)k bS-1-k and D = p A. Thus, the first order quality condition is obtained by choosing b and p so that we minimize

E1 =
1

2
S-1
å
j=0
( b D (1+ (1 - b p)S) - bj )
2
 
 
=
1

2
S-1
å
j=0
( b p A(1+ (1 - b p)S) - bj )
2
 
 
.

Letting t = 1 - b p and B = åk=1S-1 k tk-1 bS-1-k, we see

E1 =
1

2
S-1
å
j=0
( b p A (1 + tS) - bj )
2
 
 
=
1

2
S-1
å
j=0
( (1-tS+1) A - bj )
2
 
 
E1

t
=
S-1
å
j=0
( (1-tS+1) A - bj ) æ
ç
ç
è
-(S+1)tS A + (1-tS+1)
A

t
ö
÷
÷
ø
  =
S-1
å
j=0
( (1-tS+1) A - bj ) ( -(S+1)tS A + (1-tS+1) B )
  =
S-1
å
j=0
( (1-tS+1)2 A B -(1-tS+1)(S+1) A2 + bj (S+1) tS A - bj (1+tS+1) B )
  =
æ
ç
ç
è
(1-tS+1) A -
_
b
 
ö
÷
÷
ø
( S(1-tS+1) B -S(S+1) tS A )

where b_ denotes the average of the output data vector b®. The first order extremum occurs when this derivative is zero. The simplest condition for a critical point is

0 =
æ
ç
ç
è
(1-tS+1) A -
_
b
 
ö
÷
÷
ø
  =
æ
ç
ç
è
(1-tS+1)
S-1
å
k=0
tk bS-1-k -
_
b
 
ö
÷
÷
ø
.
    (5.13)

It is easy to see that at this critical t value, E1 = .5 S s2, where s is the standard deviation of the output data.

If we let the function
g(t) = (1-tS+1) åk=0S-1 tk bS-1-k - b_, we see that g(1) = - b_ and g(0) = bS-1 b_. Under reasonable conditions, there is a change in sign at these two values and hence by the intermediate value theorem, there is a root, t*, satisfying g(t*) = 0 with 0 < t* < 1. Thus,

p b =
p
l

L
= 1 - t*

giving a design condition for the number of levels

L =
p l

1-t*

where the value of t* must be obtained numerically. Thus, it is best if the condition (5.14) is approximated numerically at the beginning of the CMAC architecture initialization. Note also that this t* value is only an approximate solution to our quality problem. We are only using the first two terms of the CMAC solution b åj=0¥ Dj VT; inevitably, this will cause error. Numerical studies can give better insight, at the cost of increased computation. It seemed a reasonable tradeoff to use this simple approximation to obtain a ``ballpark'' qualtiy condition. Note that condition (5.14) also gives information about how to choose the overlap percentage p, which is closely related to the hashing strategy H and the sensor field width. If we choose to have an overlap of 80 % at all addresses, then p = .8 and we must choose the sensor field width w to achieve this overlap. Since the sensor field width is usually taken to be b-a/L, the ratio of the interval width to the number of levels, this gives additional constraints on our choices.

Finally, note that higher order quality criteria can, in principle, be found by including additional terms of the form
Dt for t >1 in the error function approximation. While tedious, these conditions can be generated.

The analysis above looks carefully at the quality of the CMAC approximation to the original sample set
{ (pi,bi) : i = 0, S-1 }. The question of the quality of the generalization capabalities of the CMAC approximation has not been answered. Analysis similar in spirit to the arguments given above can be used to generate a variety of first order conditions for the generalization capacity of the CMAC model. We will report on these studies separately in a fuller study of local to global modelling using CMAC architectures.

5.6   Solution Magnitude and Range Estimates:

Let x denote the converged value of the weights in a particular CMAC architecture and p® be an arbitrary input vector from ÂN. Associated to this input vector is an address vector AT that consists of L blocks of length H; the values in each block are all zeroes, except a single 1 at some index within the block. As discussed in Section 5.2, the CMAC output is then given by g(p®):

g(
®
p
 
)
=
A
®
x
 
  =
é
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ê
ë
0  
·
·
·
 
1
position n0(
®
p
 
)
·
·
·
 
0  
0  
·
·
·
 
1
position H + n1(
®
p
 
)
·
·
·
 
·
·
·
 
0  
·
·
·
 
1
position (L-1)H + nL-1(
®
p
 
)
·
·
·
 
0  
ù
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
û
T










































 
®
x
 

where nj(p®) denotes the local address for level j. Then

| g(
®
p
 
)|
=
b | A
¥
å
t=0
Dt VT |
  £
b | A | æ
ç
ç
è
¥
å
t=0
| | D | |t ö
÷
÷
ø
| V |
  £
sqrt(L) b
1

1-| | D | |
| V |
  £
l

sqrt(L)
1

1-(1-l))
| | æ
ç
ç
è
S-1
å
k=1
CS-1 ··· Ck Ak-1T bk-1 + AS-1T bS-1 ö
÷
÷
ø
| |
  £
S-1
å
k=0
| bk |.

Hence, the range of the CMAC function is the ball of radius r = åk=0S-1 | bk |.

The magnitude of the converged solution to the CMAC training is also easily estimated from the argument above:

| x | =
b |
¥
å
t=0
Dt VT |
  £
1

sqrt(L)
S-1
å
k=0
| bk |.

It is therefore clear that the magnitude of the converged solution is determined by the magnitudes of the elements in the training data set.

5.7   Conclusion:

We have shown that the CMAC training algorithm will converge with rate 1 - l as long as the condition A0 · VT ¹ 0 is satisfied. For the conceptually easier case of a uniform overlap percentage p, this condition for S samples becomes

0 ¹
(1-p) b0 + p
S-1
å
j=0
(1 - p l)j bS-1-j.

We have also shown that, in the case of uniform overlap, a first order condition for a quality solution is given by

0 =
æ
ç
ç
è
(1-tS+1)
S-1
å
k=0
tk bS-1-k -
_
b
 
ö
÷
÷
ø
.

Both of these conditions are best checked numerically during the usual CMAC initialization process. Finally, we have shown that the range of the CMAC architecture lies in the ball of radius
r = åk=0S-1 | bk | and that the magnitude of the solution is bounded by r/sqrt(L) .

Note that the analysis above is essentially unchanged; we merely use the average overlap
p^ instead of assuming a uniform overlap. All first order conditions will then be phrased in terms of p^ instead of p.

The application of this methodology to
local to global modelling issues and the determination of first order conditions for the quality of generalization will be discussed in separate work. We finish by noting these techniques are also useful in the study of more complex CMAC--based architectures. These include architectures which use analog rather than binary sensors with clustering strategies, as well as hierarchical structures involving multiple CMAC architectures at different levels of resolution.


Previous Next Contents