|
Mata is a matrix language built into Stata, similar in many ways
to R, Matlab or GAUSS. It does have some unique and intriguing
features however. One is that it is a compiled language rather
than interpreted, which improves performance. It also has been
parallelized in Stata/MP (available on all the SSCC Linux servers
and Condor) which dramatically improves performance. On the other
hand Mata is fairly new and has not yet "caught on" at
the SSCC, so we don't have any real-world comparisons to offer.
Mata is not a replacement for Stata, nor is it intended to be a
stand-alone statistical package. It is a tool which is best used
as a supplement to Stata (or SAS or SPSS), for doing those things
Stata does not do well on its own. In particular, Mata does not work
in the context of a single data set, giving it additional flexibility.
But you should not try to learn Mata unless you are already
familiar with Stata or another statistical package.
Mata is a relatively "low level" language. Much of your
time in Stata (or SAS or SPSS) is spent using built-in programs,
finding just the right combination of options to get Stata
to do what you want. In Mata you will take direct control, telling
Mata what you want to do step-by-step. (The Mata optimizer, which
we will discuss at length, is a notable exception.) That means
doing simple things is usually more cumbersome in Mata than in
Stata, but Mata has fewer constraints.
This publication is primarily written for people who have significant
experience using Stata, SAS or SPSS syntax, but no other programming
languages. Thus there will be a lot of emphasis on learning how
to do useful things by manipulating matrices, and many of the
examples are designed to give experience doing so as well as
illustrating a particular concept. Matlab and GAUSS veterans
may find they can skim these sections, focusing on what is new
to them. C programmers will find that Mata imitates C whenever
it can, so they can probably skim the sections on standard programming
constructs like loops. But no matter what your background, you
will learn far more if you read this publication at the computer,
with Stata running, and actually type in the examples.
Mata runs within Stata, so in order to use Mata you'll need to
know how to run a Stata program, called a do file. If you've
never used Stata, please read the section on do files in An
Introduction to Stata. Interactive Stata (i.e. start it up and
type in commands) is a great way to learn and that's how you'll
do the examples in this publication. But for real work you'll
want to write everything in do files.
There are several example files associated with this publication.
There are links to them in the text as they are used. However,
if you're using an SSCC Linux server you may want to copy them
all ahead of time. To do so type the following at the Linux prompt:
mkdir mataclass
cd mataclass
cp /usr/global/web/sscc/pubs/files/4-26/* .
Most of the commands you'll type as you read are also contained
in mataclass.do. The two
major examples are found in ex1.do and
ex2.do.
Mata Basics
To start Mata, first start up Stata and then type
mata
in the command window. Stata will then switch to "Mata mode." It's
easy to forget which mode you're in, but if you look carefully
at the bottom of the review window, where you normally see a
period you'll now see a colon. Of course when you first start
Mata there's a big horizontal line that says but that won't
be visible for long. That bar also reminds you of another useful
command: to get out of Mata mode, type
end
While Stata is organized around commands, Mata is organized around
"statements." For example, you can simply type
2+2
and Mata will return
4
Storing the results in a variable is just as easy:
x=2+2
Note that there was no output this time. If you want to see the
value of x, simply type:
x
and you'll get 4 back.
Matrix Operators
In our earlier examples, the plus sign acted as an operator: Mata
recognized it as saying "take the thing before the plus sign
and the thing after the plus sign and sum them." Mata
also defines matrix operators which carry out matrix manipulations.
Column and Row Join
The comma is defined as the "column join" operator, or
"take the things before and after the comma and put them next
to each other." That means
1,2
is interpreted by Stata as a matrix with one row and two columns.
In fact, type it and the output will be:
1 2
+---------+
1 | 1 2 |
+---------+
The backslash (\ not /,
which is division) is the "row join" operator, or "take the thing
before the backslash and stack it on top of the thing after it."
Thus
1\2
is a matrix with two rows and one column:
1
+-----+
1 | 1 |
2 | 2 |
+-----+
The things to be operated on are not limited to scalars, so you
can construct matrices.
1,2\3,4
gives
1 2
+---------+
1 | 1 2 |
2 | 3 4 |
+---------+
Error Messages
There are limits:
1,2\3\4
gets you
You can't make a matrix whose first
row has two columns but whose second and third
rows have one column. Stata calls this as a
conformability error, a class which also includes
things like trying to multiply matrices where
the number of columns of the first matrix is different from the number
of rows of the second.
This is also a "runtime" error. When you give Mata a
statement, the first thing it does is compile the statement into
byte code. At that point Mata isn't looking at the specific quantities
to be manipulated. All it sees is "put something next to something
else, then stack that on top of two other things." That makes
sense, so the statement compiles successfully. Only when the
code runs does Mata notice that the things to be stacked have
different numbers of columns.
A compile time error is generated when the statement doesn't make
sense no matter what quantities you plug into it. For example,
1,\3\4
Gets you
Compile time errors are always error code 3000, and don't include
. Noting whether an
error is a compile time error or a runtime error can help you
in finding the problem. If it was runtime, you know Mata at least
understood your code (though there's no guarantee it understood
it to mean the same thing you understand it to mean) and it was
something about the specific quantities it was working with
that caused the problem. If it's compile time, the code doesn't
even make sense to Mata.
Parentheses
As you're putting together matrices you're welcome to include parentheses
to help clarify things. It's mostly for your benefit:
1,2\3,4
(1,2\3,4)
(1,2)\(3,4)
are all the same to Mata, but if one of them looks clearer to you
by all means use that style in all your programs. (I happen to be
a fan of the third.)
Range Operators
Another set of handy tools for creating matrices are the range
operators. The .. operator creates
a series starting from the number on the left up to the number
on the right and makes them into a row vector. The :: operator
does the same but puts them in a column vector.
1..3
1::3
It's not limited to integers, however:
1.2..3.5
Note that number on the right is not guaranteed to be part of the
resulting series.
Variables
The results of matrix statements can of course be saved in variables
just like other statements. For example:
x=3,4
y=5,6
z=(1,2)\x\y
z
Note how the definition of z looks a lot like the statement we
tried earlier that gave a runtime error. It runs now because
x and y both have two columns. That's why it was a runtime error
and not a compile time error: given the right inputs the statement
can work.
Arithmetic Operators
The standard arithmetic operators recognize when one or both of
their arguments is a matrix, and act accordingly. This includes
addition and subtraction, scalar times matrix, and full matrix
multiplication.
However, there are also "colon" operators which work element by
element. For addition and subtraction it makes no difference,
since they're element by element anyway. But it makes a great
deal of difference with multiplication. Consider the following:
x=(1,2)\(3,4)
y=(1,2)\(3,4)
x*y
x:*y
Logical Operators
Logical operators such as greater than, less than, and equal to
are also defined in both matrix and colon versions (note that
the equals operator is ==, as opposed to = for assignment). However,
Mata does not have a boolean variable type. Thus (as in Stata)
logical operators return one for true and zero for false.
If you compare two matrices using standard greater than, less than
or equals, Mata returns one if and only if the condition is true
for all the elements of the two matrices. Otherwise it returns
zero
x==y
x>y
x<y
z=(1,2)\(3,5)
x==z
x>z
x<z
x<=z
These operators can only work on matrices of the same size.
In their colon form, the result is a matrix containing all of the
element-by-element comparisons.
x:==y
x:==z
x:<z
The colon form allows more flexibility about the sizes of the matrices
to be compared. If one argument is a scalar, row vector, or
column vector, it will be duplicated as needed so as to match
the other matrix.
x:==2
x:==(1,2)
x:==(1\3)
The transpose operator is the right single quote ('). It works
on just one object, the one to its left, and returns its transpose:
x'
There is no inverse operator, however. Inversion is carried out
by functions.
Subscripts
To reference a particular element of a matrix, put the row and
column number in square brackets after the matrix name. Given
the matrix x defined above, x[1,1] is 1, x[1,2] is 2,
etc. Subscripts can be used on either side of an equals sign:
z=x[1,1]
z
x[1,1]=5
x
If a matrix is a row or column vector, you can use just one subscript.
y=1,2,3
y[2]
y=1\2\3
y[2]
"Missing" for the row or column number is taken to mean
all the rows or columns. You can use either the standard period
for missing or simply leave the number out. Thus
x[.,1]
x[1,]
You can also replace either number with a row or column vector
(and Stata doesn't care which one you use). However, this time
parentheses are required so Mata knows where the list of rows
ends and the columns begin.
a=(1,2,3)\(4,5,6)\(7,8,9)
a[(1,3),.]
You can repeat rows or columns, mix up the order, or even create
matrices bigger than the original.
a[(3,2,1),(3,2,1)]
a[(3,1,2,2,1,3),.]
Pre-defined vectors are also just fine:
b=(3,1,1)
a[.,b]
Note that using vector subscripts like this will be faster
than multiplying by a permutation matrix.
Subscripting Ranges
Mata refers to the subscripts we've used thus far as "list" subscripts.
Every element we wanted was listed explicitly (well, except for
missing meaning "all"). But you'll often want to specify
a range of rows and columns.
There's not much need for ranges with small matrices, so begin
by creating a bigger matrix:
x1=1..10
x2=(1::10)*10
x=x1[(1,1,1,1,1,1,1,1,1,1),.]+x2[.,(1,1,1,1,1,1,1,1,1,1)]
x
Take a moment to try to figure out how this worked before reading
further.
The first line defines x1 as a row
vector using the range operator. The second defines a x2 column
vector, again using the range operator but multiplying the result
by ten. The third then defines x as ten copies of x1 stacked
on top of each other plus ten copies of x2 placed next to each
other. The result is a ten by ten matrix that goes from 11 to
110, which will make it easy to tell which rows and columns we've
extracted in the next few examples. (Yes, there are easier ways
to do this using functions and/or loops, but we haven't covered
them yet.)
One method of extracting a range of rows and columns from x is
to use the range operators within list subscripts. Thus to get
rows 3-7 and columns 4-8 you could do:
x[(4..7),(3..8)]
This is exactly equivalent to typing out:
x[(4,5,6,7),(3,4,5,6,7,8)]
True range subscripts, however, are different. For one thing, they
are contained in square brackets and the pipe character (shift-backslash).
There are two elements, separated by a backslash: the row and
column of the upper left corner of the desired range, and the
lower right corner of the desired range. Thus the equivalent
to
x[(4..7),(3..8)]
is
x[|4,3\7,8|]
Think of it as replacing "rows four through seven and columns three
through eight" with "everything between row four, column three
and row seven, column eight, inclusive."
Missing can mean several things in range subscripts. If used
in specifying the upper left corner, it means either the first
row or the first column. If used in specifying the lower right
corner, it means either the last row or the last column. Thus
x[|.,.\.,.|]
is simply all of x.
There's also no rule that says the upper left and bottom right
corners can't be on the same row or column.
x[|3,3\3,.|]
This means "row three, from the third column to the end."
Getting your Data from Stata to Mata
Most of the time you won't be creating matrices by hand. Instead
you'll want to make matrices containing the data you already
have in Stata. You're also likely to want to take your Mata results
and copy them to your Stata data set.
The first question you need to answer is whether you want to make
a fresh copy of your data, or whether you want to have Mata work
with the Stata data directly. Mata can define matrices which
are actually "views" of your Stata data. Think of a view
as just giving your Stata data a name Mata can use. Views use
a trivial amount of memory even if your data are very large.
If you then change the contents of the view matrix, the Stata
data are also changed. If you make a copy of the data instead
you can make changes and keep the original intact, however you'll
use twice as much memory.
st_data
To make copies of your data you'll use the st_data family
of functions. The simplest and the fastest is _st_data.
It is used to get a single number from your data set. It takes
two arguments: an observation number and a variable number. Think
of your data set as a matrix already, and the variable number
is simply the column number of the variable you want.
To see this in action, first end Mata by typing end, load
the automobile example data set, list observation one, and go
back into Mata:
end
sysuse auto
l in 1
mata
Now get the price of the first car by typing:
_st_data(1,2)
The result will be 4099, which matches what you just listed. Try
some other variables to make sure you've got the idea. Note that
if you try to get the first column you'll get a missing value.
That's because _st_data is only for numeric data. For strings,
use _st_sdata which works in pretty much the same way.
_st_sdata(1,1)
Mata can look up the column numbers for you using the st_varindex function.
It takes the name of the variable you want and returns the index.
However, that process will slow things down slightly. Note that
you're welcome to use the output of one function as an argument
for another function:
_st_data(1,st_varindex("mpg"))
If you want more than one value, use st_data.
st_data can be used just like _st_data (though
in that case _st_data is
faster). However, st_data is more
flexible about the arguments it will accept.
The row number can be missing, in which case all observations are
returned. It can also be a column vector, in which case only
the specified observations are returned. It can also be a matrix
with two columns. Each row then represents a range of observations.
A missing in this case is interpreted as the last observation.
The column number can also be missing (all variables) or a vector
listing the variables desired. It can also accept the names of
the variables rather than their numbers, though this will be
slower. It does not allow you to specify ranges like you can
for rows.
Like _st_data, st_data cannot handle strings. But there is an equivalent
st_sdata.
Try the following:
st_data(.,.)
st_data((1,3),2)
st_data(1,(2,4))
st_data(1,("price","rep78"))
st_sdata(.,1)
Naturally all these results could be stored in matrices for future
use.
x=st_data(1,(2,4))
x
st_data can also take a third argument:
a selection variable. This can be either the index or the name
of a variable, and if it is specified then only observations
where that variable is not equal to zero are returned. Choosing
0 as the select variable has a special meaning: in that case
observations will be excluded if they have a missing value for
any variable specified.
st_sdata(.,1,"foreign")
st_view
st_view makes views onto your data
rather than copying them, but you can select rows and
variables using the exact same methods as st_data.
However, st_view gives you the results
in a different way. With
st_data, the matrix you wanted was
what the st_data function
returned. You could then store the results or simply let them
be displayed on the screen. With st_view,
the function itself returns nothing. Instead you need to pass
in a matrix which will be replaced with the view you select.
st_view(x,.,.)
x
One catch is that the matrix has to exist before you pass it in.
It doesn't matter how big it is--in fact it will be completely
replaced--but if you try to pass a matrix to st_view that hasn't
been defined somehow you'll get a runtime error. One easy way
is to simply set the matrix you want to use to zero before passing
it in. You can even do that inside your function call:
st_view(n=0,1,(2,4))
n
x and n now
look like a regular matrices, but keep in mind that they are
in fact views. If your objective in using a view is to save
memory, you have to be careful not to make copies of it.
y=x
does not create a new view. It creates a new matrix containing
the values of x; a copy. More subtly,
x[.,j]
creates a copy of the jth column of x. If you need a particular
column out of x, you can just create a new view:
st_view(xj,.,j)
There's no limit to the number of different views you can set up
for the same data.
Getting your Data from Mata to Stata
If all you want to do is change the values of existing variables,
make a view and change at will. But if you want to add new variables
you'll need a couple more functions.
st_addvar adds a new variable. It takes two arguments: the variable
type and the variable name. It returns the index of the variable,
though storing that for future use is optional.
Let's create a new variable: weight*mpg (so, pound-miles per gallon,
a unit only an engineer could love).
mpg=0
st_view(mpg,.,"mpg")
Note that the matrix called mpg is now a view of all rows of the
mpg variable.
weight=0
st_view(weight,.,"weight")
pmg=weight:*mpg
st_addvar("long","pmg")
st_store(.,"pmg",pmg)
To explore this new variable using the Stata tools you're familiar
with, type end. Of course this is
an extraordinarily clumsy replacement for
gen long pmg=mpg*weight
This leads to a general principle: use Mata for what it's
good at, and Stata for what it's good at.
There are a wide variety of other functions you can use to communicate
between Mata and Stata, including adding observations and getting
or setting local macros, e() and r() vectors, and much more.
See the manual for details.
Saving and Loading Mata Data
Most often the goal of a Mata program is to take your Stata
data, find a result, and then either display it or put it back
into your Stata data. However, you can save Mata matrices themselves
if you need to.
To save a matrix, type
mata matsave filename matrixlist
Replace filename with
the name of the file you want to create. Stata will add .mmat to
the filename automatically. Replace matrixlist with
one or more matrices you want to save (separating them with spaces)
or with an asterisk (*) to save all
the matrices currently in memory.
To load the matrices you've saved previously, type
mata matuse filename
This will load all the matrices stored in filename.
Both commands will accept the replace option.
For matsave, this allows Mata to overwrite
the existing file on disk. For matuse,
this allows Mata to overwrite existing matrices in memory with
the same names as matrices in the file.
These commands are intended for interactive use and cannot be
used in functions. If you need to save matrices within a function,
check out fopen, fputmatrix, fgetmatrix,
and fclose.
Hierarchical Data
Hierarchical data has always been a bit awkward to work with in
Stata, or any other statistical program that uses a single data matrix.
A typical example would be individuals living in households: should
each household be one observation, or each individual? Either way
there are inefficiencies. In Mata you can have it both ways: one
matrix of individuals and one matrix of households. The key is linking
them together, but subscripts make that easy.
As an example, take a look at the hh
Stata data set, which is in Stata format. It consists of six individuals
living in three households. hh is
the household ID, hhType is the type
of household, and hhInc is the household
income. age and female are
individual variables. This data is in the long form, with one
observation for each individual, which means that the household
variables must be duplicated.
end
use hh,replace
l
Now enter mata and load the hh
Mata data set.
mata
mata matuse hh
hh
ind
It contains two matrices, ind and hh. hh contains the household level variables (hhType and hhInc). ind contains the individual variables (age and female)
plus the household ID. Note however, that hh does not contain an
ID: the row number is an implicit ID.
For example, if you wanted to know what type of household person
number two lives in, you'd use:
hh[ind[2,3],1]
ind[2,3] is the household ID of
the second person, or the row in the hh matrix. Column one of
hh is the household type.
Of course a regression model may need a single matrix just like
Stata does. You can easily recreate the matrix as Stata views it
with:
x=ind,hh[ind[.,3],.]
Here ind[.,3] is a column vector
listing which row we want from hh for
each row of ind. The resulting rows
from hh are then placed next to ind to
create x.
Matrix Functions
Mata has a very large number of matrix functions built into it.
This section will hit some of the most useful.
Creating Matrices
The I function can take one or two arguments. If it is given one
argument, it will return an identity matrix of size equal to
the argument it was given. If it is given two arguments, it will
return a matrix with that number of rows and columns which is
full of zeroes except for ones along the principal diagonal.
I(3)
I(4,3)
The J function creates a matrix of constants. It takes three arguments:
the number of rows of the matrix to be created, the number of
columns, and what to put in the matrix. The last argument can
be of any type.
J(3,3,0)
J(2,3,"a")
The e function creates unit vectors:
row vectors with a one in one column and zeroes everywhere else.
It takes two arguments: the location of the one, and the size
of the row vector.
e(1,3)
Thus you could create an identity matrix by combining e's,
though the I function would of course
be much easier.
e(1,3)\e(2,3)\e(3,3)
The uniform function returns a matrix
filled with random numbers distributed uniform(0,1). The size
is specified in the same way as with the J function.
uniform(5,5)
If you're putting together a matrix which will be symmetric, the makesymmetric function
can take care of half the work for you. You put together the
lower triangle, and makesymmetric will copy it to the upper triangle
(replacing what was there before). It takes one argument, a matrix,
and returns the symmetric version.
x=(1,2)\(3,4)
y=makesymmetric(x)
y
There's a second version, _makesymmetric which returns nothing
but changes the input matrix instead:
_makesymmetric(x)
x
Sorting
The sort function returns a sorted
matrix. It takes two arguments: the matrix to sort, and the column(s)
to sort by.
x=(2,1)\(1,3)\(1,2)
sort(x,1)
Note that sort is not a "stable" sort. If there are any
ties, they will be resolved randomly. Repeat sort(x,1)
enough times and you should see some cases when the original
second row becomes the first row and others where it
stays the second row.
If the second argument is a row vector, the matrix will be sorted
by the first column listed, then ties resolved by the second
column, etc.
sort(x,(1,2))
sort does not change the matrix
it is given--it returns a copy. If you'd prefer to sort the original
matrix without making a copy, use _sort.
x
_sort(x,(1,2)
x
jumble and _jumble are
the opposite of sort and _sort.
They take just one argument, a matrix, and put its rows in a random
order.
Sizes of Matrices
rows, cols,
and length all take one argument,
a matrix. They return the number of rows, the number of columns,
and the total number of elements (rows*columns) in that matrix
respectively.
rows(x)
cols(x)
length(x)
These can be very useful in for loops where you want to loop over
your entire matrix but don't know ahead of time what size it
will be.
Of course matrices can have missing values just like ordinary Stata
data sets. If you need to know the number of missing values in
the whole matrix, a row, or a column, use missing, rowmissing or colmissing respectively.
missing returns a scalar, rowmissing
returns a column vector (one value for each row in the matrix),
and colmissing returns a row vector (one value for each column
in the matrix).
y=(1,2,.)\(4,.,6)\(7,8,.)
missing(y)
rowmissing (y)
colmissing(y)
To get the number of non-missing
values, using nonmissing, rownonmissing,
or colnonmissing.
Descriptive Statistics
Mata has a set of functions for calculating descriptive
statistics, though by design it's not as rich as Stata's. In
their simplest and most commonly used forms, all these functions
take a single matrix as their argument.
Sums may be stretching the definition of "descriptive statistic"
but they are very useful. There are three main sum functions:
sum, rowsum,
and colsum. sum adds
up the whole matrix, returning a single number. rowsum adds
up each row, returning a column vector. colsum adds
up each column, returning a row vector.
sum(x)
rowsum(x)
colsum(x)
max and min simply return the largest or smallest element of the
matrix.
max(x)
min(x)
rowmax and rowmin return a column vector containing the largest
or smallest element of each row. Likewise, colmax and colmin
return a row vector.
rowmax(x)
colmin(x)
You can get the largest and smallest elements with the minmax functions,
including minmax, rowminmax,
and colminmax. They return both
the min and the max in either two columns or, in the case of colminmax,
two rows.
minmax(x)
colminmax(x)
Sometimes you're more interested in where the min or max is than
what it is. If so, check out minindex and maxindex.
The mean function returns a row vector containing the column means
of the matrix.
mean(x)
If you need row means, you'll need to construct them yourself:
rowsum(x)/cols(x)
You can also get the variance matrix of your matrix with variance,
and the correlation matrix with correlation.
variance(x)
correlation(x)
Matrix Characteristics
Mata has functions for finding the most common matrix characteristics.
diagonal returns the principal diagonal
of a matrix as a row vector.
x=(1,2)\(3,4)
diagonal(x)
trace returns the sum of the diagonal elements.
trace(x)
det returns the determinant (with some round-off error).
det(x)
rank returns the rank of the matrix.
rank(x)
As a general rule, using rank to check that a matrix is full rank
because a subsequent calculation requires it is redundant--better
to let the subsequent calculation fail and handle the failure.
Also, when determining the rank of a matrix on a computer
a certain tolerance is required for round-off error. Different
functions can use different tolerances, so they may disagree
with the rank function.
Solvers, Inverters and Decomposition
Matrix solvers are functions designed to solve the equation AX=B
for X. Since inverting a matrix A can be done by solving the
equation AX=I, inverters and solvers are closely related. In
addition, the solvers and inverters generally work by doing some
sort of decomposition, and the decomposition methods can also
be accessed directly. Thus there are several families of three
related functions. Which family you'll choose depends on the
properties of your matrix. We'll focus on Cholesky methods, but the
others work in a very similar way.
The Cholesky Decomposition decomposes a symmetric, positive definite
matrix into a lower triangular matrix times its transpose. The
results can be used to solve matrix equations and find inverses
much more quickly than more general methods. To get an example
matrix we can work with, we can take advantage of the fact that
if x is of full rank, x'x is symmetric and positive definite--and
random matrices are all but certain to be full rank. So:
a=uniform(5,5)
a=a'*a
The cholesky function takes one argument, a matrix, and returns
a lower triangular matrix which is its Cholesky Decomposition.
cholesky(a)
To verify that it works, note that
cholesky(a)*cholesky(a)'
is the same as a.
To illustrate solving a matrix equation, we need a right-hand side.
b=uniform(5,1)
The cholsolve function takes two
arguments, both matrices. It them sets up and solves the equation
AX=B where A is the first matrix and B is the second, returning
X.
cholsolve(a,b)
To verify that it worked, note that
a*cholsolve(a,b)
gives b.
The cholinv function takes one argument, a matrix, and returns
its inverse:
cholinv(a)
Again, to verify it worked try
a*cholinv(a)
The result should be an identity matrix. You'll notice that round-off
error makes some of the zeroes not quite zeroes, but it's as
close as you can get using a computer.
If your matrix is square but not symmetric, LU Decomposition can
do similar things (though it's slower than Cholesky).
The lud function
is equivalent to cholesky,
but more complex because it has to give back three results. Thus
the matrices to store the results have to be passed in, similar
to st_view.
LU Decomposition breaks a matrix into a lower triangular matrix
L, an upper triangular matrix U, and a permutation matrix P.
But Mata, instead of using a matrix P, gives a column vector p which
can be used with subscripting to do the same thing.
a=uniform(5,5)
L=0
U=0
p=0
lud(a,L,U,p)
Take look at L,U, and p. To verify that it worked, see that
(L*U)[p,.]
is a. Note how we're using subscripts
to pull rows from a result, not an existing matrix, and yet it
works just fine.
lusolve and luinv are
much simpler--in fact the usage is identical to cholsolv and cholinv.
Try
a*lusolve(a,b)
a*luinv(a)
Similar functions exist for QR decomposition (qrd, qrsolve, qrinv).
Singular value decomposition has svd and svsolve, but the related
inverter is pinv (which returns the Moore-Penrose pseudoinverse).
invsym, for inverting symmetric
matrices, has no related decomposition or solver functions. It
is, however, what Stata suggests for linear regression.
Example: Linear Regression
You're now prepared to do the most common matrix manipulation of
all, at least at the SSCC.
Since we have the automobile data set loaded, let's regress price
on mpg, rep78, foreign and weight to see which characteristics
American consumers were willing to pay for in 1978.
One complication is that rep78 is missing for some observations,
so you'll need to drop them from your regression. Of course you
could just drop them from the data set entirely, but let's assume
that you want to keep them for later use.
The first step is to mark the observations you can actually use.
Exit Mata by typing end, then create a new variable touse which
is one if all the variables are non-missing and zero otherwise.
gen touse=(price!=. & mpg!=. & rep78!=. & weight!=. & foreign!=.)
Of course in reality the only variable that is ever missing is
rep78, but we'll pretend we didn't
know that ahead of time. If you wanted to run a regression on
a subpopulation (say, just the foreign cars) you could add a
condition to mark the subsample too.
Now go back into Mata by typing mata.
Next make x a view onto all the independant variables and y a
view onto the dependant variable. touse is
the selection variable for both views: observations will only
be included if it is equal to one.
st_view(x=0,.,("mpg","rep78","weight","foreign"), "touse")
st_view(y=0,.,("price"), "touse")
To include a constant term in the regression you need to add a
column of ones to x. Use the J function to create it, then add
it to x with the comma (column join) operator.
x=x,J(rows(x),1,1)
Now you can find the betas:
b=invsym(x'*x)*x'*y
Of course they don't mean anything without standard errors. Start
by finding the residuals:
e=y-x*b
Then the variance-covariance matrix is
v=(e'*e)/(rows(x)-cols(x))*invsym(x'*x)
The standard errors for each beta can be extracted using the diagonal function, along with sqrt, which takes the (element by element)
square root.
se=sqrt(diagonal(v))
The t-statistic is the beta divided by the standard error, but
this is an element by element division so you need to use the
colon operator.
t=b:/se
To find the p-values requires the ttail function,
which takes as arguments the degrees of freedom and the t-statistic
and returns the probability. It is a single-tailed test however,
so you need to multiply by two.
p=2*ttail(rows(x)-cols(x),abs(t))
To put your results together in a readable form, try:
b,se,t,p
Now exit Mata again, and check your results against
reg price mpg rep78 weight foreign
They should be identical.
If
In regular Stata, if is almost always used to define
which observations a command should act on. It's the equivalent
to SQL's where clause. In Mata, if is only used to control program
flow. (Though Stata can use if in this way as well--see the
appropriate section of Programming
in Stata.)
The most basic form of if is simply
if (condition) statement
If condition is
true, then statement is
executed. Otherwise it is skipped entirely. Note that the condition
must be in parentheses.
If you want to do more than one thing if the condition is true,
use the following syntax:
if (condition)
{
statements
}
You can also use else to specify things that should happen if the
condition is not true. Thus you can create fairly elaborate structures:
if (condition1)
{
statements1
}
else if (condition2)
{
statements2
}
else
{
statements3
}
Keep in mind that each condition has a single result, either true
or false. For example, If x is a matrix you can't write if
(x>5) and then list things you want done
for those elements of x which are greater than 5. You can write
if (x[1,1]>5) followed by if
(x[1,2]>5) etc. but of course the easy way to do that is
using a loop.
Loops
Mata has while, do-while,
and for loops available (plus goto for
easier conversion of FORTRAN code, but we don't want to endorse
spaghetti logic).
While
while looks very similar to if:
while (condition)
{
statements
}
The difference is that the statements will be executed over and
over as long as the condition is true. Thus your first concern
should be to make sure that at some point the condition will
become false, or the loop will run forever.
x=0
while (x<=5)
{
x
x++
}
Note that x++ is shorthand for x=x+1.
Even though we said x<=5, the final value of x is six.
That's because the loop only decides to end when x becomes
six.
You can also put a while loop with a single statement in a single
line.
while (x<=5) x++
x
This loop is perfectly legal, but it didn't do anything. The reason
is that the condition started out false, so the loop never executed
at all. Sometimes you need to make sure that a loop runs at least
once, and that's a job for do-while.
Do-While
A do-while loop starts with do and ends with the while condition.
The statements are always executed at least once. If the condition
is true, the loop starts over again from the top.
do
{
x++
}
while (x<=5)
x
Note that x is increased to seven even though it started out greater
than five.
One typical use for do-while loops is to do something until it
converges:
do
{
complex mathematical stuff
} while (abs(newvalue-oldvalue)>tolerance)
But unless you're absolutely certain your process will actually
converge, you'd better put in an escape clause that tells it
to stop after a while whether it converges or not:
do
{
complex mathematical stuff
} while (abs(newValue-oldValue)>tolerance & iteration<maxIterations)
For
Recall our first while loop:
x=0
while (x<=5)
{
x
x++
}
This is such a common structure that programmers wanted a quicker
way to construct it. The answer was for loops. The for loop equivalent
to this while loop is
for (x=0; x<=5; x++)
{
x
}
Note the three components: an initialization step, a rule for when
the loop should end, and some sort of progression towards that
ending. Strictly speaking you can skip the initialization and
the progression--just leave the semi-colons as placeholders.
for(; x<=10;) {
x
x++
}
All this really means though is that you're promising to take
care of those steps yourself. In particular x must already be
defined and you need to make sure the loop will in fact end.
By far the most common use of for loops is to loop over the elements
of a matrix.
m=J(5,5,0)
for(i=1; i<=rows(m); i++) {
for(j=1; j<=cols(m); j++) {
m[i,j]=i+(j-1)*cols(m)
}
}
m
Writing Your Own Functions
Mata allows you to write your own functions--in fact many of the
standard functions we've used are written in Mata. As you've
seen, calling a function is a matter of typing the function name
and then giving a list of arguments in parentheses. Defining a function
follows the same structure: first the word function to
tell Mata what you're doing, then a name, and then a list of arguments
in parentheses. The body of the function follows in curly brackets.
If you want your function to return a result, one of the statements
in the body needs to be return, followed
by the thing to return in parentheses.
function myfunction(x,y)
{
statements
return(z)
}
As an example, consider the following (not terribly useful) function.
function doubleAndSum(x)
{
x=x*2
return(sum(x))
}
This function takes a matrix, multiplies it by the scalar two,
and returns the sum of all the elements in the matrix. To test
it, try the following:
m=I(3)
doubleAndSum(m)
m
Note that you passed in a matrix called m even though the function
calls it x. That's fine: the doubleAndSum function refers to
whatever it is given as x. In fact the input doesn't even have
to have a name.
doubleAndSum(I(3))
doubleAndSum(1..3)
In the first case, the argument passed in is the result of the
I function. In the second, it is a row vector defined on the
spot. On the other hand, in these cases we can't see how the
input matrix was doubled as they aren't stored anywhere.
Function arguments in Mata are "passed by reference." If a function
changes one of the arguments it is given, that change persists
even after the function is completed.
Example: Loops and Functions
Our next example comes from basic physics, but the real point is
using loops and functions to get useful results. The classic
equation for the motion of a body under uniform acceleration
is simply
x(t)=x0+v0*t+1/2*a*t^2
Depending on the level of mathematics used in your last physics
course, you may or may not have worked with that as a vector
equation, where x,x0,v0 and a are vectors with one number for
each dimension. The vector component makes this a good problem
for Mata. Let's track and plot the movement of a falling object
for ten seconds.
Begin by defining a function x which takes x0,v0,a and t as arguments
and returns the position of the body at time t.
function x(x0,v0,a,t)
{
x=x0+v0*t+1/2*a*t^2
return(x)
}
This is a good point to pause and test what we've done so far before
moving on. Try:
x0=0,0,0
v0=10,0,0
a=0,0,0
This represents a body just moving at 10 meters/second (not that
Mata cares about the units) in the positive x direction, so it
will be easy to tell if your function works properly or not.
x(x0,v0,a,10)
should give 100,0,0.
But that doesn't guarantee our acceleration term is right, so try
v0=0,0,0
a=0,0,1
x(x0,v0,a,10)
This should give 0,0,50.
Now we'll set up the actual values we want to examine:
x0=0,500
v0=100,0
a=0,-9.8
This is an object starting 500 meters up, moving to the right at
100 meters per second, and falling under normal earth gravity
(about 9.8 meters per second squared). Note how we suddenly switched
from three dimensions to two. Your function won't care in the
slightest since they're all still matrices, but it's a lot easier
to plot on a two dimensional graph.
Now you need somewhere to put the results. Mata doesn't do graphs,
so you'll have to use Stata to plot them. But first we have to
think about what the results will be. Your function provides
a snapshot of the object at any given time, so we'll take a thousand
snapshots spread across the ten second span. Each snapshot will
have two numbers, an x coordinate and a y coordinate, so those
will be your variables (though to Mata they're just column one
and column two of the x row vector).
Get out of Mata, then use standard Stata commands to set up the
observations and variables you'll need.
end
clear
set obs 1000
gen x=.
gen y=.
Now get back into Mata, and set up a view of these variables. Call
it r (as in results):
mata
st_view(r=0,.,.)
Now you're ready to loop over your 1000 snapshots and get the results
of the x function for each one:
for(i=1; i<=1000; i++) {
r[i,.]=x(x0,v0,a,i/100)
}
Note how the time for each snapshot, i/100, spreads them evenly
across 10 seconds as i goes from 1 to 1000.
Finally exit Mata again, and create the graph.
end
scatter y x
It's the classic parabola of projectile motion as you no doubt
expected, but of course the real point was to practice using
functions and loops to generate and work with data.
As soon as we defined the x() function, Mata compiled it into "object
code." This is not quite the same as machine-language code that
can be run all by itself. But it is something Mata can understand
and run very quickly. If you were planning to use this function
in the future you could save the object code. Then future programs
wouldn't need to spend time compiling it. If this is something
you're interested in, see the mlib and mata
mosave commands.
This example is contained in ex1.do.
Pointers
A pointer is a variable which contains the memory address of another
variable or matrix. Thus it "points" to that other variable.
In principle a pointer is just a number, but you'll
never work with the number directly. Instead, you'll use two
operators, & and *.
Computer scientists call these the "reference" and "dereference"
operators, but I like to think of them as "the address of"
and "the thing at."
Consider the following:
x=(1..3)\(4..6)\(7..9)
p=&x
*p
(*p)[2,2]
First we define a matrix x so we
have something to work with. Then
p is defined as "the address
of x." Thus
*p or "the thing at p"
is just another name for x.
You can use subscripts with
*p just like you would with x,
but note how you have to put *p in
parenthesis first. (*p)[2,2] means
"find the thing at the address contained in p and
get the value at row two, column two. " *p[2,2] would
mean "get the value at row two, column two of p,
and then find the thing at that address" which won't work because p is
a scalar.
Note that the matrix you are pointing to doesn't need to have a
name. For example:
p=&J(5,5,0)
makes p point to a 5x5 matrix of
zeroes (since that's what J returns)
even though there's no direct name for that matrix. It's only
accessible as *p.
One use for pointers is to construct data structures Mata doesn't
handle automatically. For example, you can construct a three
dimensional matrix by making a two dimensional matrix of pointers,
each of which points to a column vector.
The first step is to define the matrix that will contain the pointers.
Matrices can't switch between containing numbers and containing
pointers, so you need to make sure that the matrix is defined
as containing pointers. On the other hand, we don't have any
actual pointers to store yet. Thus we'll set the initial values
of the matrix to NULL, a special pointer that doesn't point to
anything.
x=J(5,5,NULL)
Now loop over all the elements of x, and make each one a pointer
to a new (and unique) column vector.
for (i=1; i<=rows(x); i++)
{
for (j=1; j<=cols(x); j++)
{
x[i,j]=&J(5,1,0)
}
}
To work with an element i,j,k of your three dimensional matrix
you'd use the following:
(*(x[i,j]))[k]
Yes, the parentheses are essential. It's rather awkward though,
so if you were going to work with such matrices a lot consider
defining the following functions:
function put(val,x,i,j,k)
{
/* Usage: value to put, matrix to put it in,
i, j, k to put it at. */
(*(x[i,j]))[k,1]=val
}
function get(x,i,j,k)
{
/* Usage: matrix to get from, i, j, k of value
to get. */
return((*(x[i,j]))[k,1])
}
Then you can do things like:
put(3,x,5,5,5)
get(x,5,5,5)
The fact that you can't mix pointers and regular data within a
matrix does limit your flexibility. You can't, for example, have
a matrix of individual data including pointers to other data
relevant to the individual. You can, however, have two parallel
matrices, one containing numbers and the other containing
pointers. If row i represents the same individual in both matrices,
you can pull information from either or both as needed.
It's also possible to create a pointer to a function. Given your
existing function doubleAndSum() try:
p=&doubleAndSum()
(*p)(I(3))
Note how Mata distinguishes between x() the
function and x the
variable by the parentheses, even though you're not passing in
any arguments when you define a pointer.
The main reason you'd want to create a pointer to a function is
so that you can use that pointer as the argument of a function.
For example, Mata's optimizer functions have you pass in a pointer
to the function which is to be optimized.
The Mata Optimizer
Many statistical operations involve maximizing or minimizing some
quantity--maximum likelihood estimation being the obvious example.
Mata includes an optimizer for these operations. Mata's optimizer
is actually a family of functions which define and then solve
an optimization problem.
Evaluator functions
The first function must be written by you: the function to
be maximized or minimized. This function can do whatever you
want, it may be very complex or very simple, but it must accept
a certain set of arguments and return the proper result in order
for the optimizer to use it. You can name the arguments whatever
you like, but there must be the right number and each must have
the proper meaning.
Optimization problems are described using a variety of notations,
but if we consider the problem of maximizing
y=f(x), the Mata version of f() must take the following arguments:
- a number indicating whether the function is supposed to calculate
(0) just f(x), (1) y=f(x) and f'(x) or (2) f(x), f'(x) and
f''(x)
- x (the thing that changes as the optimizer looks for the max)
- y (where f(x) will be stored)
- A variable to store f'(x) if calculated
- A variable to store f''(x) if calculated (often H since it is
the hessian in multivariate problems)
If you can find an analytic solution for f'(x) and f''(x) and code
them in, Mata's optimizer will be much faster and more accurate.
This is often impractical though, and Mata is perfectly willing
to find numeric approximations to the derivatives it needs. Note,
however, that even if your function never calculates derivatives
it must still accept variables where they can be stored.
So let's try a very simple function: y=-x^4. If you're willing
to let Mata find all the derivatives for you, you can code this
as the following:
function f(todo,x,y,g,H)
{
y=-x^4
}
Note that todo (the number telling
the function whether to calculate derivatives or not), g,
and H are completely unused, but must
still be accepted. Note also that we could change all the names.
We could call x Fred and y George
as long as George=f(Fred).
If we're not quite so lazy we can also code the derivatives for
this function quite easily:
function g(todo,x,y,g,H)
{
y=-x^4
if (todo>=1)
{
g=-4*x^3
if (todo==2)
{
H=-12*x^2
}
}
}
In this case, both x and y were scalars, but you're not limited
to scalars. Consider y=-x1^4 - x2^4
function h(todo,x,y,g,H)
{
y=-x[1]^4-x[2]^4
}
The two x's are stored in a row vector x, so x1 is x[1] and x2
is x[2]. Making sure the optimizer sends in the right size of
x is one of the steps in setting it up.
In many cases the quantity you'll want to maximize will be the
sum of a column. For example, in maximum likelihood you will
probably create a column where each row gives an observation's
contribution to the likelihood function. Mata's optimizer will
do this automatically with the proper settings, so the previous
function could be recast as:
function i(todo,x,y,g,H)
{
y=J(2,1,.)
y[1]=-x[1]^4
y[2]=-x[2]^4
}
Note how the function had to define y as a column vector of the
proper size--otherwise it is a scalar.
In statistical applications the quantity to be maximized will depend
not just on parameters that can vary (x in our problems thus
far) but on data that do not vary. Mata's optimizer can
be set up to pass up to nine additional arguments to your evaluator
function, which can contain the data. They go after the first
two arguments (and before the final three).
This calls for a change of variable names. Consider maximizing
s=f(b). We'll now use x for the data matrix. Then
the function definition would be:
function f(todo, b, x, s, g, H)
x can contain both the independent and dependant variables, but
if it's easier to work with a separate matrix y, then the definition
becomes:
function f(todo, b, x, y, s, g, H)
We'll do an example using this kind of evaluator shortly.
Setting Up and Running an Optimization Problem
Once you've got your evaluator function defined, you're ready to
set up the optimization problem. The first step is to call the
optimize_init function. optimize_init takes
no arguments and returns a variable containing a description
of your optimization problem. You'll never look at this description,
but you will pass it in to all the other optimization functions.
s=optimize_init()
Next tell it where to find the evaluator:
optimize_init_evaluator(s,&f())
The first argument is the problem description, and the second is
a pointer to the evaluator function you already defined.
Now give it a starting value for x--remember the f() function
is in the form y=f(x).
optimize_init_params(s,-100)
Of course the correct answer is zero, but we want it to do some
work!
Now you're ready to actually run the optimizer:
optimize(s)
This returns the value of x which maximizes f(x). You may want
to store this in a variable:
xmax=optimize(s)
If we want to use the g(x) function instead, there's one additional
step. Recall that in g(x) we coded the first and second derivatives
ourselves so Mata doesn't have to approximate them. Mata refers
to this as a "d2" evaluator. A "d1" evaluator codes just the
first derivative, and a "d0" evaluator codes no derivatives at
all (the f(x) function is a d0 evaluator). Mata will assume functions
are d0 unless we say otherwise, so you need to add:
optimize_init_evaluatortype(s,"d2")
Note that the optimizer doesn't care what order all the initialization
functions are called in, as long as they're before the actual
optimize().
What about h(x1, x2)? It's identical to f(x), except that we need
to set an initial value for both variables:
optimize_init_params(s,(-100,100))
In doing so we also tell Mata that future x's must have two columns.
Finally,
i(x1,x2), where the function to be maximized is the column sum
of y, is what Mata calls a "v0" evaluator.
The "v" is
for vector, and the "0" again means that we didn't code
any derivatives. Thus we need
optimize_init_evaluatortype(s,"v0")
To see the complete code to run the optimizer with each evaluator,
see the the last parts of mataclass.do. There is one setup function
we haven't needed to call but you should know:
optimize_init_which(s,"min")
changes the problem from maximizing the function (the default)
to minimizing it.
Example: Ranking Teams
As a final, extended example, consider a problem familiar to any
sports fan: determining how good a team is based on its won/loss
record.
We'll assume that a team can be characterized by a single "strength"
variable s. If team i plays team j, we'll assume that the probability
of team i winning is given by exp(si-sj)/(1+exp(si-sj)), better
known to Stata as invlogit(si-sj).
Then, given a record of games played and who won, we can find
a set of values for s that maximizes the probability of the given
outcome. We'll do a Monte Carlo study by first creating data
which fits our assumptions, and then seeing how well the method works.
Creating the Data
The first step is to create the data. This is an exercise in matrix
manipulation, so if you want to focus on the optimization part
of the problem feel free to skip ahead. On the other hand, most
readers will benefit from some practice in this area.
First create a row vector str containing
the real strengths of each team. For simplicity, use the uniform
function, giving strengths distributed uniform(0,1). For our
example we'll make 50 teams:
str=uniform(1,50)
Note that the column number within the str vector acts as a sort
of team ID.
Next we need a way to keep track of who played who. We'll create
a two-column matrix, where each row is a game and the two columns
will contain the IDs of the two teams who played in that game.
For brevity we'll call the team in column one the "home" team
and the team in column two the "visiting" team. We'll assume
that each team plays 20 games, 10 as the home team and 10 as
the visiting team.
Begin by creating a column vector teams containing
the 50 teams using the range operator:
teams=1::50
Now create a column vector season which is just ten copies of teams stacked on top of each other:
season=teams
for(i=1;i<10;i++)
{
season=season\teams
}
This represents the home teams. Next assign the visiting teams
by taking the same vector, putting it in a random order, and
column joining it to the original:
season=season,jumble(season)
The only trouble is, it's entirely possible for a team to be randomly
assigned to play itself. This wouldn't really bother our
estimator, but it does offend any claims that this represents
the real world. More importantly, fixing it is good practice.
Create a column vector same with
the same number of rows as season which
contains a 1 if the home and visiting teams are the same and
a 0 if they are not:
same=(season[.,1]:==season[.,2])
If a game has the a team playing itself, we will swap the visiting
team with the visiting team of a randomly chosen game. Since
it's possible we might get the same team yet again, we'll keep
checking and swapping until there are no more games between the
same team. Here's the code:
while (max(same)==1)
{
for(i=1; i<=rows(season); i++)
{
if (same[i])
{
swap=trunc(uniform(1,1)*rows(season))+1
temp=season[swap,2]
season[swap,2]=season[i,2]
season[i,2]=temp
}
}
same=(season[.,1]:==season[.,2])
}
Note how max(same) will be zero
if there are no longer any games where the home and visiting
teams are the same, so that's how we know when we're done.
We then loop over the rows, stopping to change those where same[i] is
one (or true). In those cases, we pick a random row and swap
visiting teams with it, using a temp variable to store its visiting
team's ID as we do. We then recalculate same based
on the new version of
season before the while condition
is reevaluated.
Now we need to decide who won each game. We'll create a column
vector winner, which will contain a 1 if the home team won and
a zero if the visiting team won.
winner=J(rows(season),1,.)
for(i=1; i<=rows(season); i++)
{
winner[i]=uniform(1,1):<invlogit(str[season[i,1]]-str[season[i,2]])
}
Note how indexing is used to pull up the strength (str) of the
appropriate team--we'll be doing that a lot.
The Maximum Likelihood Estimator
Now we're ready to construct the (log) likelihood function to be
maximized. We'll start with a version that's easy to understand,
and try to make it efficient later.
function llf(todo,strhat,season,winner,llf,g,H)
{
llf=J(rows(season),1,.)
for(i=1;i<=rows(season);i++)
{
if (winner[i]) llf[i]=log(invlogit(strhat[season[i,1]]-strhat[season[i,2]]))
else llf[i]=log(invlogit(strhat[season[i,2]]-strhat[season[i,1]]))
}
}
This will be a v0 evaluator which takes season and winner as
additional arguments. The estimated strengths are stored in strhat,
and the column of log likelihoods (which will be summed automatically
by virtue of being v0) is stored in llf.
We have two possible outcomes, and the formula for finding the
log likelihood is different in each outcome. For now we'll handle
the two possibilities with an if/else structure, but there are
more efficient ways.
Now to set up the optimization problem:
s=optimize_init()
optimize_init_evaluator(s, &llf())
optimize_init_evaluatortype(s,"v0")
strhat0=J(1,rows(teams),.5)
optimize_init_params(s,strhat0)
optimize_init_argument(s,1,season)
optimize_init_argument(s,2,winner)
strhat1=optimize(s)
Most of these you've seen before. Note that strhat0 is
the vector of starting values for our estimate of str.
Since the actual strengths are distributed uniform(0,1) we'll
start by setting them all to 0.5.
What is new is optimize_init_arguments.
This is where you tell the optimizer to pass in season and winner to
your evaluator. As you see optimize_init_arguments takes
three arguments: the optimization problem, the number of the
argument you're setting, and what to pass in.
Run the code. It will take a while but it should work.
Efficiency
So how can we make it faster? It would be nice if we didn't have
to figure out which formula to use for the likelihood. So let's
rearrange the data a bit: instead of column one being the "home"
team and column two the "visiting" team, terms which have no
real meaning in our model, let's make column one the winner and
column two the loser.
Create a new matrix season2 with the new arrangement:
season2=J(rows(season),2,.)
for(i=1;i<=rows(season);i++)
{
if (winner[i]) season2[i,.]=season[i,.]
else season2[i,.]=season[i,(2,1)]
}
Of course this loop takes time to run, but it only runs once. It's
the evaluator that must be run over and over, so taking a bit
more time to set things up so that the evaluator runs faster
is well worth it.
One general principle when it comes to writing fast Mata code is
that matrix operations are faster than loops you write
out. There's no matrix operation that would allow you to look
in one matrix for the estimated strength of a team identified
in another matrix, but you can take calculating the log and invlogit functions
out of the loop.
Here's a second and more efficient version of the evaluator
function:
function llf2(todo,strhat,season,llf,g,H)
{
x=J(rows(season),1,.)
for(i=1;i<=rows(season);i++)
{
x[i]=strhat[season[i,1]]-strhat[season[i,2]]
}
llf=log(invlogit(x))
}
Note how it doesn't need to have the winner matrix passed in anymore--it
expects a version of season (season2) that conveys that information
by which team is in column one.
Here's the setup needed to run this version:
s=optimize_init()
optimize_init_evaluator(s, &llf2())
optimize_init_evaluatortype(s,"v0")
strhat0=J(1,rows(teams),.5)
optimize_init_params(s,strhat0)
optimize_init_argument(s,1,season2)
strhat2=optimize(s)
You'll see that this runs in about half the time of the original.
Most of the gain comes from moving the calculation of log and invlogit out
of the loop.
For those looking for the absolute best performance, consider turning
on matastrict (mata set matastrict on).
Matastrict requires that you declare the names and types of all
variables before using them rather than letting Mata choose.
Mata has more variable types than most languages, and they can
be confusing. On the other hand, declaring your variables can
help you avoid errors. More importantly, Mata's compiler can
use the additional information to create slightly more efficient
object code.
If you are interested in using matastrict, see the manuals and
especially the section on declarations.
Constraints
One characteristic of this model is that only the difference between
teams is identified. You could add 100 or -1,000,000 to all the
strengths and the probability of each outcome would remain the
same. Thus two different runs on the same data could give very
different numbers and both be right. However, if we constrain
just one strength to be a given number, then all the strengths
are identified.
Mata's optimizer accepts constraints on the parameters in the form
of two matrices C and c. The parameters p are then constrained
such that Cp=c.
Let's constrain the strength of team one to be zero. To implement
this, the C matrix needs to be a row
vector with a column for each team. It will have a one in the
first column and a zero in all other columns, which makes it
a unit vector. c will
be simply the scalar zero.
C=e(1,rows(teams))
c=0
You then pass in this constraint using the optimize_init_constraints function.
It takes two arguments: the problem s,
as usual, and then a matrix which is the row join of C and c.
I'm not sure why it doesn't just take them as two separate arguments,
but it's easy to join them.
optimize_init_constraints(s,(C,c))
Since you've made no other changes to the problem, you can simply
run it again by calling optimize.
strhat3=optimize(s)
This version will actually run significantly faster.
As an exercise, consider constructing some sort of metric
for how well your estimator does. (One easy one would be how
often it correctly identifies the best team.) Then vary the number
of games per season and see how many it takes to get reasonable
accuracy. However, don't do this if you want to continue to take
the ranking systems of actual sports seriously.
This example (with additional commands for timing each method)
is found in ex2.do.
Learning More
This publication has just scratched the surface of what's possible
in Mata. There's obviously much more to learn, and even
more to be looked up when you need it.
As usual, Stata Corp. has included most of the Mata documentation
in the online help. There is one trick though: to get help for
Mata you need to type help mata topic rather
than just help
topic. This is especially important
for functions and such that exist in both Mata and Stata. For
example, compare the results of the following:
help abs
help mata abs
A couple useful starting places:
help mata
help mata functions
The Mata manuals are available in the CDE Library and the 4218
lab. There are two books, but they should be thought of as two
volumes of the same manual.
You are also welcome to ask the
consultant for help. Mata is new
to us as well, but we'll try to figure things out together.
|