An Introduction to Mata

Mata is a matrix language built into Stata, similar in many ways to R, Matlab or GAUSS. It does have some unique and intriguing features however. One is that it is a compiled language rather than interpreted, which improves performance. It also has been parallelized in Stata/MP (available on all the SSCC Linux servers and Condor) which dramatically improves performance.

Mata is not a replacement for Stata, nor is it intended to be a stand-alone statistical package. It is a tool which is best used as a supplement to Stata, for doing those things Stata does not do well on its own. In particular, Mata does not work in the context of a single data set, giving it additional flexibility. But you should not try to learn Mata unless you are already familiar with Stata or another statistical package.

Mata is a relatively "low level" language. Much of your time in Stata (or SAS or SPSS) is spent using built-in programs, finding just the right combination of options to get Stata to do what you want. In Mata you will take direct control, telling Mata what you want to do step-by-step. (The Mata optimizer, which we will discuss at length, is a notable exception.) That means doing simple things is usually more cumbersome in Mata than in Stata, but Mata has fewer constraints.

This article is primarily written for people who have significant experience using Stata, SAS or SPSS syntax, but no other programming languages. Thus there will be a lot of emphasis on learning how to do useful things by manipulating matrices, and many of the examples are designed to give experience doing so as well as illustrating a particular concept. Matlab and GAUSS veterans may find they can skim these sections, focusing on what is new to them. C programmers will find that Mata imitates C whenever it can, so they can probably skim the sections on standard programming constructs like loops. But no matter what your background, you will learn far more if you read this article at the computer, with Stata running, and actually type in the examples.

Mata runs within Stata, so in order to use Mata you'll need to know how to run a Stata program, called a do file. If you've never used Stata, please read the section on do files in An Introduction to Stata. Interactive Stata (i.e. start it up and type in commands) is a great way to learn and that's how you'll do the examples in this article. But for real work you'll want to write everything in do files.

There are several example files associated with this article. There are links to them in the text as they are used. However, if you're using an SSCC Linux server you may want to copy them all ahead of time. To do so type the following at the Linux prompt:

mkdir mataclass
cd mataclass
cp /usr/global/web/sscc/pubs/files/4-26/* .

Most of the commands you'll type as you read are also contained in The two major examples are found in and

Mata Basics

To start Mata, first start up Stata and then type


in the command window. Stata will then switch to "Mata mode." It's easy to forget which mode you're in, but if you look carefully at the bottom of the review window, where you normally see a period you'll now see a colon. Of course when you first start Mata there's a big horizontal line that says mata but that won't be visible for long. That bar also reminds you of another useful command: to get out of Mata mode, type


While Stata is organized around commands, Mata is organized around "statements." For example, you can simply type


and Mata will return


Storing the results in a variable is just as easy:


Note that there was no output this time. If you want to see the value of x, simply type:


and you'll get 4 back.

Matrix Operators

In our earlier examples, the plus sign acted as an operator: Mata recognized it as saying "take the thing before the plus sign and the thing after the plus sign and sum them." Mata also defines matrix operators which carry out matrix manipulations.

Column and Row Join

The comma is defined as the "column join" operator, or "take the things before and after the comma and put them next to each other." That means


is interpreted by Stata as a matrix with one row and two columns. In fact, type it and the output will be:

       1   2
  1 |  1   2  |

The backslash (\ not /, which is division) is the "row join" operator, or "take the thing before the backslash and stack it on top of the thing after it." Thus


is a matrix with two rows and one column:

  1 |  1  |
  2 |  2  |

The things to be operated on are not limited to scalars, so you can construct matrices.



       1   2
  1 |  1   2  |
  2 |  3   4  |

Error Messages

There are limits:


gets you

<istmt>: 3200 conformability error

You can't make a matrix whose first row has two columns but whose second and third rows have one column. Stata calls this as a conformability error, a class which also includes things like trying to multiply matrices where the number of columns of the first matrix is different from the number of rows of the second.

This is also a "runtime" error. When you give Mata a statement, the first thing it does is compile the statement into byte code. At that point Mata isn't looking at the specific quantities to be manipulated. All it sees is "put something next to something else, then stack that on top of two other things." That makes sense, so the statement compiles successfully. Only when the code runs does Mata notice that the things to be stacked have different numbers of columns.

A compile time error is generated when the statement doesn't make sense no matter what quantities you plug into it. For example,


Gets you

invalid expression

Compile time errors are always error code 3000, and don't include <istmt>. Noting whether an error is a compile time error or a runtime error can help you in finding the problem. If it was runtime, you know Mata at least understood your code (though there's no guarantee it understood it to mean the same thing you understand it to mean) and it was something about the specific quantities it was working with that caused the problem. If it's compile time, the code doesn't even make sense to Mata.


As you're putting together matrices you're welcome to include parentheses to help clarify things. It's mostly for your benefit:


are all the same to Mata, but if one of them looks clearer to you by all means use that style in all your programs. (I happen to be a fan of the third.)

Range Operators

Another set of handy tools for creating matrices are the range operators. The .. operator creates a series starting from the number on the left up to the number on the right and makes them into a row vector. The :: operator does the same but puts them in a column vector.


It's not limited to integers, however:


Note that number on the right is not guaranteed to be part of the resulting series.


The results of matrix statements can of course be saved in variables just like other statements. For example:


Note how the definition of z looks a lot like the statement we tried earlier that gave a runtime error. It runs now because x and y both have two columns. That's why it was a runtime error and not a compile time error: given the right inputs the statement can work.

Arithmetic Operators

The standard arithmetic operators recognize when one or both of their arguments is a matrix, and act accordingly. This includes addition and subtraction, scalar times matrix, and full matrix multiplication.

However, there are also "colon" operators which work element by element. For addition and subtraction it makes no difference, since they're element by element anyway. But it makes a great deal of difference with multiplication. Consider the following:


Logical Operators

Logical operators such as greater than, less than, and equal to are also defined in both matrix and colon versions (note that the equals operator is ==, as opposed to = for assignment). However, Mata does not have a boolean variable type. Thus (as in Stata) logical operators return one for true and zero for false.

If you compare two matrices using standard greater than, less than or equals, Mata returns one if and only if the condition is true for all the elements of the two matrices. Otherwise it returns zero


These operators can only work on matrices of the same size.

In their colon form, the result is a matrix containing all of the element-by-element comparisons.


The colon form allows more flexibility about the sizes of the matrices to be compared. If one argument is a scalar, row vector, or column vector, it will be duplicated as needed so as to match the other matrix.


The transpose operator is the right single quote ('). It works on just one object, the one to its left, and returns its transpose:


There is no inverse operator, however. Inversion is carried out by functions.


To reference a particular element of a matrix, put the row and column number in square brackets after the matrix name. Given the matrix x defined above, x[1,1] is 1, x[1,2] is 2, etc. Subscripts can be used on either side of an equals sign:


If a matrix is a row or column vector, you can use just one subscript.


"Missing" for the row or column number is taken to mean all the rows or columns. You can use either the standard period for missing or simply leave the number out. Thus


You can also replace either number with a row or column vector (and Stata doesn't care which one you use). However, this time parentheses are required so Mata knows where the list of rows ends and the columns begin.


You can repeat rows or columns, mix up the order, or even create matrices bigger than the original.


Pre-defined vectors are also just fine:


Note that using vector subscripts like this will be faster than multiplying by a permutation matrix.

Subscripting Ranges

Mata refers to the subscripts we've used thus far as "list" subscripts. Every element we wanted was listed explicitly (well, except for missing meaning "all"). But you'll often want to specify a range of rows and columns.

There's not much need for ranges with small matrices, so begin by creating a bigger matrix:


Take a moment to try to figure out how this worked before reading further.

The first line defines x1 as a row vector using the range operator. The second defines a x2 column vector, again using the range operator but multiplying the result by ten. The third then defines x as ten copies of x1 stacked on top of each other plus ten copies of x2 placed next to each other. The result is a ten by ten matrix that goes from 11 to 110, which will make it easy to tell which rows and columns we've extracted in the next few examples. (Yes, there are easier ways to do this using functions and/or loops, but we haven't covered them yet.)

One method of extracting a range of rows and columns from x is to use the range operators within list subscripts. Thus to get rows 3-7 and columns 4-8 you could do:


This is exactly equivalent to typing out:


True range subscripts, however, are different. For one thing, they are contained in square brackets and the pipe character (shift-backslash). There are two elements, separated by a backslash: the row and column of the upper left corner of the desired range, and the lower right corner of the desired range. Thus the equivalent to




Think of it as replacing "rows four through seven and columns three through eight" with "everything between row four, column three and row seven, column eight, inclusive."

Missing can mean several things in range subscripts. If used in specifying the upper left corner, it means either the first row or the first column. If used in specifying the lower right corner, it means either the last row or the last column. Thus


is simply all of x. There's also no rule that says the upper left and bottom right corners can't be on the same row or column.


This means "row three, from the third column to the end."

Getting your Data from Stata to Mata

Most of the time you won't be creating matrices by hand. Instead you'll want to make matrices containing the data you already have in Stata. You're also likely to want to take your Mata results and copy them to your Stata data set.

The first question you need to answer is whether you want to make a fresh copy of your data, or whether you want to have Mata work with the Stata data directly. Mata can define matrices which are actually "views" of your Stata data. Think of a view as just giving your Stata data a name Mata can use. Views use a trivial amount of memory even if your data are very large. If you change the tables of a view matrix, the Stata data are also changed. If you make a copy of the data instead you can make changes and keep the original intact, however you'll use twice as much memory.


To make copies of your data you'll use the st_data family of functions. The simplest and the fastest is _st_data. It is used to get a single number from your data set. It takes two arguments: an observation number and a variable number. Think of your data set as a matrix already, and the variable number is simply the column number of the variable you want.

To see this in action, first end Mata by typing end, load the automobile example data set, list observation one, and go back into Mata:

sysuse auto
l in 1

Now get the price of the first car by typing:


The result will be 4099, which matches what you just listed. Try some other variables to make sure you've got the idea. Note that if you try to get the first column you'll get a missing value. That's because _st_data is only for numeric data. For strings, use _st_sdata which works in pretty much the same way.


Mata can look up the column numbers for you using the st_varindex function. It takes the name of the variable you want and returns the index. However, that process will slow things down slightly. Note that you're welcome to use the output of one function as an argument for another function:


If you want more than one value, use st_data. st_data can be used just like _st_data (though in that case _st_data is faster). However, st_data is more flexible about the arguments it will accept.

The row number can be missing, in which case all observations are returned. It can also be a column vector, in which case only the specified observations are returned. It can also be a matrix with two columns. Each row then represents a range of observations. A missing in this case is interpreted as the last observation.

The column number can also be missing (all variables) or a vector listing the variables desired. It can also accept the names of the variables rather than their numbers, though this will be slower. It does not allow you to specify ranges like you can for rows.

Like _st_data, st_data cannot handle strings. But there is an equivalent st_sdata.

Try the following:


Naturally all these results could be stored in matrices for future use.


st_data can also take a third argument: a selection variable. This can be either the index or the name of a variable, and if it is specified then only observations where that variable is not equal to zero are returned. Choosing 0 as the select variable has a special meaning: in that case observations will be excluded if they have a missing value for any variable specified.



st_view makes views onto your data rather than copying them, but you can select rows and variables using the exact same methods as st_data. However, st_view gives you the results in a different way. With st_data, the matrix you wanted was what the st_data function returned. You could then store the results or simply let them be displayed on the screen. With st_view, the function itself returns nothing. Instead you need to pass in a matrix which will be replaced with the view you select.


One catch is that the matrix has to exist before you pass it in. It doesn't matter how big it is--in fact it will be completely replaced--but if you try to pass a matrix to st_view that hasn't been defined somehow you'll get a runtime error. One easy way is to simply set the matrix you want to use to zero before passing it in. You can even do that inside your function call:


x and n now look like a regular matrices, but keep in mind that they are in fact views. If your objective in using a view is to save memory, you have to be careful not to make copies of it.


does not create a new view. It creates a new matrix containing the values of x; a copy. More subtly,


creates a copy of the jth column of x. If you need a particular column out of x, you can just create a new view:


There's no limit to the number of different views you can set up for the same data.

Getting your Data from Mata to Stata

If all you want to do is change the values of existing variables, make a view and change at will. But if you want to add new variables you'll need a couple more functions.

st_addvar adds a new variable. It takes two arguments: the variable type and the variable name. It returns the index of the variable, though storing that for future use is optional.

Let's create a new variable: weight*mpg (so, pound-miles per gallon, a unit only an engineer could love).


Note that the matrix called mpg is now a view of all rows of the mpg variable.


To explore this new variable using the Stata tools you're familiar with, type end. Of course this is an extraordinarily clumsy replacement for

gen long pmg=mpg*weight

This leads to a general principle: use Mata for what it's good at, and Stata for what it's good at.

There are a wide variety of other functions you can use to communicate between Mata and Stata, including adding observations and getting or setting local macros, e() and r() vectors, and much more. See the manual for details.

Saving and Loading Mata Data

Most often the goal of a Mata program is to take your Stata data, find a result, and then either display it or put it back into your Stata data. However, you can save Mata matrices themselves if you need to.

To save a matrix, type

mata matsave filename matrixlist

Replace filename with the name of the file you want to create. Stata will add .mmat to the filename automatically. Replace matrixlist with one or more matrices you want to save (separating them with spaces) or with an asterisk (*) to save all the matrices currently in memory.

To load the matrices you've saved previously, type

mata matuse filename

This will load all the matrices stored in filename.

Both commands will accept the replace option. For matsave, this allows Mata to overwrite the existing file on disk. For matuse, this allows Mata to overwrite existing matrices in memory with the same names as matrices in the file.

These commands are intended for interactive use and cannot be used in functions. If you need to save matrices within a function, check out fopen, fputmatrix, fgetmatrix, and fclose.

Hierarchical Data

Hierarchical data has always been a bit awkward to work with in Stata, or any other statistical program that uses a single data matrix. A typical example would be individuals living in households: should each household be one observation, or each individual? Either way there are inefficiencies. In Mata you can have it both ways: one matrix of individuals and one matrix of households. The key is linking them together, but subscripts make that easy.

As an example, take a look at the hh Stata data set, which is in Stata format. It consists of six individuals living in three households. hh is the household ID, hhType is the type of household, and hhInc is the household income. age and female are individual variables. This data is in the long form, with one observation for each individual, which means that the household variables must be duplicated.

use hh,replace

Now enter mata and load the hh Mata data set.

mata matuse hh

It contains two matrices, ind and hh. hh contains the household level variables (hhType and hhInc). ind contains the individual variables (age and female) plus the household ID. Note however, that hh does not contain an ID: the row number is an implicit ID.

For example, if you wanted to know what type of household person number two lives in, you'd use:


ind[2,3] is the household ID of the second person, or the row in the hh matrix. Column one of hh is the household type.

Of course a regression model may need a single matrix just like Stata does. You can easily recreate the matrix as Stata views it with:


Here ind[.,3] is a column vector listing which row we want from hh for each row of ind. The resulting rows from hh are then placed next to ind to create x.

Matrix Functions

Mata has a very large number of matrix functions built into it. This section will hit some of the most useful.

Creating Matrices

The I function can take one or two arguments. If it is given one argument, it will return an identity matrix of size equal to the argument it was given. If it is given two arguments, it will return a matrix with that number of rows and columns which is full of zeroes except for ones along the principal diagonal.


The J function creates a matrix of constants. It takes three arguments: the number of rows of the matrix to be created, the number of columns, and what to put in the matrix. The last argument can be of any type.


The e function creates unit vectors: row vectors with a one in one column and zeroes everywhere else. It takes two arguments: the location of the one, and the size of the row vector.


Thus you could create an identity matrix by combining e's, though the I function would of course be much easier.


The uniform function returns a matrix filled with random numbers distributed uniform(0,1). The size is specified in the same way as with the J function.


If you're putting together a matrix which will be symmetric, the makesymmetric function can take care of half the work for you. You put together the lower triangle, and makesymmetric will copy it to the upper triangle (replacing what was there before). It takes one argument, a matrix, and returns the symmetric version.


There's a second version, _makesymmetric which returns nothing but changes the input matrix instead:



The sort function returns a sorted matrix. It takes two arguments: the matrix to sort, and the column(s) to sort by.


Note that sort is not a "stable" sort. If there are any ties, they will be resolved randomly. Repeat sort(x,1) enough times and you should see some cases when the original second row becomes the first row and others where it stays the second row.

If the second argument is a row vector, the matrix will be sorted by the first column listed, then ties resolved by the second column, etc.


sort does not change the matrix it is given--it returns a copy. If you'd prefer to sort the original matrix without making a copy, use _sort.


jumble and _jumble are the opposite of sort and _sort. They take just one argument, a matrix, and put its rows in a random order.

Sizes of Matrices

rows, cols, and length all take one argument, a matrix. They return the number of rows, the number of columns, and the total number of elements (rows*columns) in that matrix respectively.


These can be very useful in for loops where you want to loop over your entire matrix but don't know ahead of time what size it will be.

Of course matrices can have missing values just like ordinary Stata data sets. If you need to know the number of missing values in the whole matrix, a row, or a column, use missing, rowmissing or colmissing respectively. missing returns a scalar, rowmissing returns a column vector (one value for each row in the matrix), and colmissing returns a row vector (one value for each column in the matrix).

rowmissing (y)

To get the number of non-missing values, using nonmissing, rownonmissing, or colnonmissing.

Descriptive Statistics

Mata has a set of functions for calculating descriptive statistics, though by design it's not as rich as Stata's. In their simplest and most commonly used forms, all these functions take a single matrix as their argument.

Sums may be stretching the definition of "descriptive statistic" but they are very useful. There are three main sum functions: sum, rowsum, and colsum. sum adds up the whole matrix, returning a single number. rowsum adds up each row, returning a column vector. colsum adds up each column, returning a row vector.


max and min simply return the largest or smallest element of the matrix.


rowmax and rowmin return a column vector containing the largest or smallest element of each row. Likewise, colmax and colmin return a row vector.


You can get the largest and smallest elements with the minmax functions, including minmax, rowminmax, and colminmax. They return both the min and the max in either two columns or, in the case of colminmax, two rows.


Sometimes you're more interested in where the min or max is than what it is. If so, check out minindex and maxindex.

The mean function returns a row vector containing the column means of the matrix.


If you need row means, you'll need to construct them yourself:


You can also get the variance matrix of your matrix with variance, and the correlation matrix with correlation.


Matrix Characteristics

Mata has functions for finding the most common matrix characteristics.

diagonal returns the principal diagonal of a matrix as a row vector.


trace returns the sum of the diagonal elements.


det returns the determinant (with some round-off error).


rank returns the rank of the matrix.


As a general rule, using rank to check that a matrix is full rank because a subsequent calculation requires it is redundant--better to let the subsequent calculation fail and handle the failure. Also, when determining the rank of a matrix on a computer a certain tolerance is required for round-off error. Different functions can use different tolerances, so they may disagree with the rank function.

Solvers, Inverters and Decomposition

Matrix solvers are functions designed to solve the equation AX=B for X. Since inverting a matrix A can be done by solving the equation AX=I, inverters and solvers are closely related. In addition, the solvers and inverters generally work by doing some sort of decomposition, and the decomposition methods can also be accessed directly. Thus there are several families of three related functions. Which family you'll choose depends on the properties of your matrix. We'll focus on Cholesky methods, but the others work in a very similar way.

The Cholesky Decomposition decomposes a symmetric, positive definite matrix into a lower triangular matrix times its transpose. The results can be used to solve matrix equations and find inverses much more quickly than more general methods. To get an example matrix we can work with, we can take advantage of the fact that if x is of full rank, x'x is symmetric and positive definite--and random matrices are all but certain to be full rank. So:


The cholesky function takes one argument, a matrix, and returns a lower triangular matrix which is its Cholesky Decomposition.


To verify that it works, note that


is the same as a.

To illustrate solving a matrix equation, we need a right-hand side.


The cholsolve function takes two arguments, both matrices. It them sets up and solves the equation AX=B where A is the first matrix and B is the second, returning X.


To verify that it worked, note that


gives b.

The cholinv function takes one argument, a matrix, and returns its inverse:


Again, to verify it worked try


The result should be an identity matrix. You'll notice that round-off error makes some of the zeroes not quite zeroes, but it's as close as you can get using a computer.

If your matrix is square but not symmetric, LU Decomposition can do similar things (though it's slower than Cholesky). The lud function is equivalent to cholesky, but more complex because it has to give back three results. Thus the matrices to store the results have to be passed in, similar to st_view.

LU Decomposition breaks a matrix into a lower triangular matrix L, an upper triangular matrix U, and a permutation matrix P. But Mata, instead of using a matrix P, gives a column vector p which can be used with subscripting to do the same thing.


Take look at L,U, and p. To verify that it worked, see that


is a. Note how we're using subscripts to pull rows from a result, not an existing matrix, and yet it works just fine.

lusolve and luinv are much simpler--in fact the usage is identical to cholsolv and cholinv. Try


Similar functions exist for QR decomposition (qrd, qrsolve, qrinv). Singular value decomposition has svd and svsolve, but the related inverter is pinv (which returns the Moore-Penrose pseudoinverse).

invsym, for inverting symmetric matrices, has no related decomposition or solver functions. It is, however, what Stata suggests for linear regression.

Example: Linear Regression

You're now prepared to do the most common matrix manipulation of all, at least at the SSCC.

Since we have the automobile data set loaded, let's regress price on mpg, rep78, foreign and weight to see which characteristics American consumers were willing to pay for in 1978.

One complication is that rep78 is missing for some observations, so you'll need to drop them from your regression. Of course you could just drop them from the data set entirely, but let's assume that you want to keep them for later use.

The first step is to mark the observations you can actually use. Exit Mata by typing end, then create a new variable touse which is one if all the variables are non-missing and zero otherwise.

gen touse=(price!=. & mpg!=. & rep78!=. & weight!=. & foreign!=.)

Of course in reality the only variable that is ever missing is rep78, but we'll pretend we didn't know that ahead of time. If you wanted to run a regression on a subpopulation (say, just the foreign cars) you could add a condition to mark the subsample too.

Now go back into Mata by typing mata. Next make x a view onto all the independant variables and y a view onto the dependant variable. touse is the selection variable for both views: observations will only be included if it is equal to one.

st_view(x=0,.,("mpg","rep78","weight","foreign"), "touse")
st_view(y=0,.,("price"), "touse")

To include a constant term in the regression you need to add a column of ones to x. Use the J function to create it, then add it to x with the comma (column join) operator.


Now you can find the betas:


Of course they don't mean anything without standard errors. Start by finding the residuals:


Then the variance-covariance matrix is


The standard errors for each beta can be extracted using the diagonal function, along with sqrt, which takes the (element by element) square root.


The t-statistic is the beta divided by the standard error, but this is an element by element division so you need to use the colon operator.


To find the p-values requires the ttail function, which takes as arguments the degrees of freedom and the t-statistic and returns the probability. It is a single-tailed test however, so you need to multiply by two.


To put your results together in a readable form, try:


Now exit Mata again, and check your results against

reg price mpg rep78 weight foreign

They should be identical.


In regular Stata, if is almost always used to define which observations a command should act on. It's the equivalent to SQL's where clause. In Mata, if is only used to control program flow. (Though Stata can use if in this way as well—see the appropriate section of Stata Programming Tools.)

The most basic form of if is simply

if (condition) statement

If condition is true, then statement is executed. Otherwise it is skipped entirely. Note that the condition must be in parentheses.

If you want to do more than one thing if the condition is true, use the following syntax:

if (condition)

You can also use else to specify things that should happen if the condition is not true. Thus you can create fairly elaborate structures:

if (condition1)
else if (condition2)

Keep in mind that each condition has a single result, either true or false. For example, If x is a matrix you can't write if (x>5) and then list things you want done for those elements of x which are greater than 5. You can write if (x[1,1]>5) followed by if (x[1,2]>5) etc. but of course the easy way to do that is using a loop.


Mata has while, do-while, and for loops available (plus goto for easier conversion of FORTRAN code, but we don't want to endorse spaghetti logic).


while looks very similar to if:

while (condition)

The difference is that the statements will be executed over and over as long as the condition is true. Thus your first concern should be to make sure that at some point the condition will become false, or the loop will run forever.

while (x<=5)

Note that x++ is shorthand for x=x+1.

Even though we said x<=5, the final value of x is six. That's because the loop only decides to end when x becomes six.

You can also put a while loop with a single statement in a single line.

while (x<=5) x++

This loop is perfectly legal, but it didn't do anything. The reason is that the condition started out false, so the loop never executed at all. Sometimes you need to make sure that a loop runs at least once, and that's a job for do-while.


A do-while loop starts with do and ends with the while condition. The statements are always executed at least once. If the condition is true, the loop starts over again from the top.

} while (x<=5)

Note that x is increased to seven even though it started out greater than five.

One typical use for do-while loops is to do something until it converges:

complex mathematical stuff
} while (abs(newvalue-oldvalue)>tolerance)

But unless you're absolutely certain your process will actually converge, you'd better put in an escape clause that tells it to stop after a while whether it converges or not:

complex mathematical stuff
} while (abs(newValue-oldValue)>tolerance & iteration<maxIterations)


Recall our first while loop:

while (x<=5)

This is such a common structure that programmers wanted a quicker way to construct it. The answer was for loops. The for loop equivalent to this while loop is

for (x=0; x<=5; x++)

Note the three components: an initialization step, a rule for when the loop should end, and some sort of progression towards that ending. Strictly speaking you can skip the initialization and the progression--just leave the semi-colons as placeholders.

for(; x<=10;) {

All this really means though is that you're promising to take care of those steps yourself. In particular x must already be defined and you need to make sure the loop will in fact end.

By far the most common use of for loops is to loop over the elements of a matrix.

for(i=1; i<=rows(m); i++) {
for(j=1; j<=cols(m); j++) {

Writing Your Own Functions

Mata allows you to write your own functions--in fact many of the standard functions we've used are written in Mata. As you've seen, calling a function is a matter of typing the function name and then giving a list of arguments in parentheses. Defining a function follows the same structure: first the word function to tell Mata what you're doing, then a name, and then a list of arguments in parentheses. The body of the function follows in curly brackets. If you want your function to return a result, one of the statements in the body needs to be return, followed by the thing to return in parentheses.

function myfunction(x,y)

As an example, consider the following (not terribly useful) function.

function doubleAndSum(x)

This function takes a matrix, multiplies it by the scalar two, and returns the sum of all the elements in the matrix. To test it, try the following:


Note that you passed in a matrix called m even though the function calls it x. That's fine: the doubleAndSum function refers to whatever it is given as x. In fact the input doesn't even have to have a name.


In the first case, the argument passed in is the result of the I function. In the second, it is a row vector defined on the spot. On the other hand, in these cases we can't see how the input matrix was doubled as they aren't stored anywhere.

Function arguments in Mata are "passed by reference." If a function changes one of the arguments it is given, that change persists even after the function is completed.

Example: Loops and Functions

Our next example comes from basic physics, but the real point is using loops and functions to get useful results. The classic equation for the motion of a body under uniform acceleration is simply


Depending on the level of mathematics used in your last physics course, you may or may not have worked with that as a vector equation, where x,x0,v0 and a are vectors with one number for each dimension. The vector component makes this a good problem for Mata. Let's track and plot the movement of a falling object for ten seconds.

Begin by defining a function x which takes x0,v0,a and t as arguments and returns the position of the body at time t.

function x(x0,v0,a,t)

This is a good point to pause and test what we've done so far before moving on. Try:


This represents a body just moving at 10 meters/second (not that Mata cares about the units) in the positive x direction, so it will be easy to tell if your function works properly or not.


should give 100,0,0.

But that doesn't guarantee our acceleration term is right, so try


This should give 0,0,50.

Now we'll set up the actual values we want to examine:


This is an object starting 500 meters up, moving to the right at 100 meters per second, and falling under normal earth gravity (about 9.8 meters per second squared). Note how we suddenly switched from three dimensions to two. Your function won't care in the slightest since they're all still matrices, but it's a lot easier to plot on a two dimensional graph.

Now you need somewhere to put the results. Mata doesn't do graphs, so you'll have to use Stata to plot them. But first we have to think about what the results will be. Your function provides a snapshot of the object at any given time, so we'll take a thousand snapshots spread across the ten second span. Each snapshot will have two numbers, an x coordinate and a y coordinate, so those will be your variables (though to Mata they're just column one and column two of the x row vector).

Get out of Mata, then use standard Stata commands to set up the observations and variables you'll need.

set obs 1000
gen x=.
gen y=.

Now get back into Mata, and set up a view of these variables. Call it r (as in results):


Now you're ready to loop over your 1000 snapshots and get the results of the x function for each one:

for(i=1; i<=1000; i++) {

Note how the time for each snapshot, i/100, spreads them evenly across 10 seconds as i goes from 1 to 1000.

Finally exit Mata again, and create the graph.

scatter y x

It's the classic parabola of projectile motion as you no doubt expected, but of course the real point was to practice using functions and loops to generate and work with data.

As soon as we defined the x() function, Mata compiled it into "object code." This is not quite the same as machine-language code that can be run all by itself. But it is something Mata can understand and run very quickly. If you were planning to use this function in the future you could save the object code. Then future programs wouldn't need to spend time compiling it. If this is something you're interested in, see the mlib and mata mosave commands.

This example is contained in


A pointer is a variable which contains the memory address of another variable or matrix. Thus it "points" to that other variable. In principle a pointer is just a number, but you'll never work with the number directly. Instead, you'll use two operators, & and *. Computer scientists call these the "reference" and "dereference" operators, but I like to think of them as "the address of" and "the thing at."

Consider the following:


First we define a matrix x so we have something to work with. Then p is defined as "the address of x." Thus *p or "the thing at p" is just another name for x. You can use subscripts with *p just like you would with x, but note how you have to put *p in parenthesis first. (*p)[2,2] means "find the thing at the address contained in p and get the value at row two, column two. " *p[2,2] would mean "get the value at row two, column two of p, and then find the thing at that address" which won't work because p is a scalar.

Note that the matrix you are pointing to doesn't need to have a name. For example:


makes p point to a 5x5 matrix of zeroes (since that's what J returns) even though there's no direct name for that matrix. It's only accessible as *p.

One use for pointers is to construct data structures Mata doesn't handle automatically. For example, you can construct a three dimensional matrix by making a two dimensional matrix of pointers, each of which points to a column vector.

The first step is to define the matrix that will contain the pointers. Matrices can't switch between containing numbers and containing pointers, so you need to make sure that the matrix is defined as containing pointers. On the other hand, we don't have any actual pointers to store yet. Thus we'll set the initial values of the matrix to NULL, a special pointer that doesn't point to anything.


Now loop over all the elements of x, and make each one a pointer to a new (and unique) column vector.

for (i=1; i<=rows(x); i++)
for (j=1; j<=cols(x); j++)

To work with an element i,j,k of your three dimensional matrix you'd use the following:


Yes, the parentheses are essential. It's rather awkward though, so if you were going to work with such matrices a lot consider defining the following functions:

function put(val,x,i,j,k)
/* Usage: value to put, matrix to put it in, i, j, k to put it at. */

function get(x,i,j,k)
/* Usage: matrix to get from, i, j, k of value to get. */

Then you can do things like:


The fact that you can't mix pointers and regular data within a matrix does limit your flexibility. You can't, for example, have a matrix of individual data including pointers to other data relevant to the individual. You can, however, have two parallel matrices, one containing numbers and the other containing pointers. If row i represents the same individual in both matrices, you can pull information from either or both as needed.

It's also possible to create a pointer to a function. Given your existing function doubleAndSum() try:


Note how Mata distinguishes between x() the function and x the variable by the parentheses, even though you're not passing in any arguments when you define a pointer.

The main reason you'd want to create a pointer to a function is so that you can use that pointer as the argument of a function. For example, Mata's optimizer functions have you pass in a pointer to the function which is to be optimized.

The Mata Optimizer

Many statistical operations involve maximizing or minimizing some quantity--maximum likelihood estimation being the obvious example. Mata includes an optimizer for these operations. Mata's optimizer is actually a family of functions which define and then solve an optimization problem.

Evaluator functions

The first function must be written by you: the function to be maximized or minimized. This function can do whatever you want, it may be very complex or very simple, but it must accept a certain set of arguments and return the proper result in order for the optimizer to use it. You can name the arguments whatever you like, but there must be the right number and each must have the proper meaning.

Optimization problems are described using a variety of notations, but if we consider the problem of maximizing y=f(x), the Mata version of f() must take the following arguments:

  • a number indicating whether the function is supposed to calculate (0) just f(x), (1) y=f(x) and f'(x) or (2) f(x), f'(x) and f''(x)
  • x (the thing that changes as the optimizer looks for the max)
  • y (where f(x) will be stored)
  • A variable to store f'(x) if calculated
  • A variable to store f''(x) if calculated (often H since it is the hessian in multivariate problems)

If you can find an analytic solution for f'(x) and f''(x) and code them in, Mata's optimizer will be much faster and more accurate. This is often impractical though, and Mata is perfectly willing to find numeric approximations to the derivatives it needs. Note, however, that even if your function never calculates derivatives it must still accept variables where they can be stored.

So let's try a very simple function: y=-x^4. If you're willing to let Mata find all the derivatives for you, you can code this as the following:

function f(todo,x,y,g,H)

Note that todo (the number telling the function whether to calculate derivatives or not), g, and H are completely unused, but must still be accepted. Note also that we could change all the names. We could call x Fred and y George as long as George=f(Fred).

If we're not quite so lazy we can also code the derivatives for this function quite easily:

function g(todo,x,y,g,H)
if (todo>=1)
if (todo==2)

In this case, both x and y were scalars, but you're not limited to scalars. Consider y=-x1^4 - x2^4

function h(todo,x,y,g,H)

The two x's are stored in a row vector x, so x1 is x[1] and x2 is x[2]. Making sure the optimizer sends in the right size of x is one of the steps in setting it up.

In many cases the quantity you'll want to maximize will be the sum of a column. For example, in maximum likelihood you will probably create a column where each row gives an observation's contribution to the likelihood function. Mata's optimizer will do this automatically with the proper settings, so the previous function could be recast as:

function i(todo,x,y,g,H)

Note how the function had to define y as a column vector of the proper size--otherwise it is a scalar.

In statistical applications the quantity to be maximized will depend not just on parameters that can vary (x in our problems thus far) but on data that do not vary. Mata's optimizer can be set up to pass up to nine additional arguments to your evaluator function, which can contain the data. They go after the first two arguments (and before the final three).

This calls for a change of variable names. Consider maximizing s=f(b). We'll now use x for the data matrix. Then the function definition would be:

function f(todo, b, x, s, g, H)

x can contain both the independent and dependant variables, but if it's easier to work with a separate matrix y, then the definition becomes:

function f(todo, b, x, y, s, g, H)

We'll do an example using this kind of evaluator shortly.

Setting Up and Running an Optimization Problem

Once you've got your evaluator function defined, you're ready to set up the optimization problem. The first step is to call the optimize_init function. optimize_init takes no arguments and returns a variable containing a description of your optimization problem. You'll never look at this description, but you will pass it in to all the other optimization functions.


Next tell it where to find the evaluator:


The first argument is the problem description, and the second is a pointer to the evaluator function you already defined.

Now give it a starting value for x--remember the f() function is in the form y=f(x).


Of course the correct answer is zero, but we want it to do some work!

Now you're ready to actually run the optimizer:


This returns the value of x which maximizes f(x). You may want to store this in a variable:


If we want to use the g(x) function instead, there's one additional step. Recall that in g(x) we coded the first and second derivatives ourselves so Mata doesn't have to approximate them. Mata refers to this as a "d2" evaluator. A "d1" evaluator codes just the first derivative, and a "d0" evaluator codes no derivatives at all (the f(x) function is a d0 evaluator). Mata will assume functions are d0 unless we say otherwise, so you need to add:


Note that the optimizer doesn't care what order all the initialization functions are called in, as long as they're before the actual optimize().

What about h(x1, x2)? It's identical to f(x), except that we need to set an initial value for both variables:


In doing so we also tell Mata that future x's must have two columns.

Finally, i(x1,x2), where the function to be maximized is the column sum of y, is what Mata calls a "v0" evaluator. The "v" is for vector, and the "0" again means that we didn't code any derivatives. Thus we need


To see the complete code to run the optimizer with each evaluator, see the the last parts of There is one setup function we haven't needed to call but you should know:


changes the problem from maximizing the function (the default) to minimizing it.

Example: Ranking Teams

As a final, extended example, consider a problem familiar to any sports fan: determining how good a team is based on its won/loss record.

We'll assume that a team can be characterized by a single "strength" variable s. If team i plays team j, we'll assume that the probability of team i winning is given by exp(si-sj)/(1+exp(si-sj)), better known to Stata as invlogit(si-sj). Then, given a record of games played and who won, we can find a set of values for s that maximizes the probability of the given outcome. We'll do a Monte Carlo study by first creating data which fits our assumptions, and then seeing how well the method works.

Creating the Data

The first step is to create the data. This is an exercise in matrix manipulation, so if you want to focus on the optimization part of the problem feel free to skip ahead. On the other hand, most readers will benefit from some practice in this area.

First create a row vector str containing the real strengths of each team. For simplicity, use the uniform function, giving strengths distributed uniform(0,1). For our example we'll make 50 teams:


Note that the column number within the str vector acts as a sort of team ID.

Next we need a way to keep track of who played who. We'll create a two-column matrix, where each row is a game and the two columns will contain the IDs of the two teams who played in that game. For brevity we'll call the team in column one the "home" team and the team in column two the "visiting" team. We'll assume that each team plays 20 games, 10 as the home team and 10 as the visiting team.

Begin by creating a column vector teams containing the 50 teams using the range operator:


Now create a column vector season which is just ten copies of teams stacked on top of each other:


This represents the home teams. Next assign the visiting teams by taking the same vector, putting it in a random order, and column joining it to the original:


The only trouble is, it's entirely possible for a team to be randomly assigned to play itself. This wouldn't really bother our estimator, but it does offend any claims that this represents the real world. More importantly, fixing it is good practice.

Create a column vector same with the same number of rows as season which contains a 1 if the home and visiting teams are the same and a 0 if they are not:


If a game has the a team playing itself, we will swap the visiting team with the visiting team of a randomly chosen game. Since it's possible we might get the same team yet again, we'll keep checking and swapping until there are no more games between the same team. Here's the code:

while (max(same)==1)
for(i=1; i<=rows(season); i++)
if (same[i])

Note how max(same) will be zero if there are no longer any games where the home and visiting teams are the same, so that's how we know when we're done.

We then loop over the rows, stopping to change those where same[i] is one (or true). In those cases, we pick a random row and swap visiting teams with it, using a temp variable to store its visiting team's ID as we do. We then recalculate same based on the new version of season before the while condition is reevaluated.

Now we need to decide who won each game. We'll create a column vector winner, which will contain a 1 if the home team won and a zero if the visiting team won.

for(i=1; i<=rows(season); i++)

Note how indexing is used to pull up the strength (str) of the appropriate team--we'll be doing that a lot.

The Maximum Likelihood Estimator

Now we're ready to construct the (log) likelihood function to be maximized. We'll start with a version that's easy to understand, and try to make it efficient later.

function llf(todo,strhat,season,winner,llf,g,H)
if (winner[i]) llf[i]=log(invlogit(strhat[season[i,1]]-strhat[season[i,2]]))
else llf[i]=log(invlogit(strhat[season[i,2]]-strhat[season[i,1]]))

This will be a v0 evaluator which takes season and winner as additional arguments. The estimated strengths are stored in strhat, and the column of log likelihoods (which will be summed automatically by virtue of being v0) is stored in llf.

We have two possible outcomes, and the formula for finding the log likelihood is different in each outcome. For now we'll handle the two possibilities with an if/else structure, but there are more efficient ways.

Now to set up the optimization problem:

optimize_init_evaluator(s, &llf())

Most of these you've seen before. Note that strhat0 is the vector of starting values for our estimate of str. Since the actual strengths are distributed uniform(0,1) we'll start by setting them all to 0.5.

What is new is optimize_init_arguments. This is where you tell the optimizer to pass in season and winner to your evaluator. As you see optimize_init_arguments takes three arguments: the optimization problem, the number of the argument you're setting, and what to pass in.

Run the code. It will take a while but it should work.


So how can we make it faster? It would be nice if we didn't have to figure out which formula to use for the likelihood. So let's rearrange the data a bit: instead of column one being the "home" team and column two the "visiting" team, terms which have no real meaning in our model, let's make column one the winner and column two the loser.

Create a new matrix season2 with the new arrangement:

if (winner[i]) season2[i,.]=season[i,.]
else season2[i,.]=season[i,(2,1)]

Of course this loop takes time to run, but it only runs once. It's the evaluator that must be run over and over, so taking a bit more time to set things up so that the evaluator runs faster is well worth it.

One general principle when it comes to writing fast Mata code is that matrix operations are faster than loops you write out. There's no matrix operation that would allow you to look in one matrix for the estimated strength of a team identified in another matrix, but you can take calculating the log and invlogit functions out of the loop.

Here's a second and more efficient version of the evaluator function:

function llf2(todo,strhat,season,llf,g,H)

Note how it doesn't need to have the winner matrix passed in anymore--it expects a version of season (season2) that conveys that information by which team is in column one.

Here's the setup needed to run this version:

optimize_init_evaluator(s, &llf2())

You'll see that this runs in about half the time of the original. Most of the gain comes from moving the calculation of log and invlogit out of the loop.

For those looking for the absolute best performance, consider turning on matastrict (mata set matastrict on). Matastrict requires that you declare the names and types of all variables before using them rather than letting Mata choose. Mata has more variable types than most languages, and they can be confusing. On the other hand, declaring your variables can help you avoid errors. More importantly, Mata's compiler can use the additional information to create slightly more efficient object code.

If you are interested in using matastrict, see the manuals and especially the section on declarations.


One characteristic of this model is that only the difference between teams is identified. You could add 100 or -1,000,000 to all the strengths and the probability of each outcome would remain the same. Thus two different runs on the same data could give very different numbers and both be right. However, if we constrain just one strength to be a given number, then all the strengths are identified.

Mata's optimizer accepts constraints on the parameters in the form of two matrices C and c. The parameters p are then constrained such that Cp=c.

Let's constrain the strength of team one to be zero. To implement this, the C matrix needs to be a row vector with a column for each team. It will have a one in the first column and a zero in all other columns, which makes it a unit vector. c will be simply the scalar zero.


You then pass in this constraint using the optimize_init_constraints function. It takes two arguments: the problem s, as usual, and then a matrix which is the row join of C and c. I'm not sure why it doesn't just take them as two separate arguments, but it's easy to join them.


Since you've made no other changes to the problem, you can simply run it again by calling optimize.


This version will actually run significantly faster.

As an exercise, consider constructing some sort of metric for how well your estimator does. (One easy one would be how often it correctly identifies the best team.) Then vary the number of games per season and see how many it takes to get reasonable accuracy. However, don't do this if you want to continue to take the ranking systems of actual sports seriously.

This example (with additional commands for timing each method) is found in

Learning More

This article has just scratched the surface of what's possible in Mata. There's obviously much more to learn, and even more to be looked up when you need it.

As usual, Stata Corp. has included most of the Mata documentation in the online help. There is one trick though: to get help for Mata you need to type help mata topic rather than just help topic. This is especially important for functions and such that exist in both Mata and Stata. For example, compare the results of the following:

help abs
help mata abs

A couple useful starting places:

help mata
help mata functions

The Mata manuals are available in the CDE Library and the 4218 lab. There are two books, but they should be thought of as two volumes of the same manual.

You are also welcome to ask the Help Desk for assistance. Mata is new to us as well, but we'll try to figure things out together.

Last Revised: 10/24/2007