Solution

The easy way to find the manufacturer is to use the word() function. It takes two arguments: the first is the string you want to extract a word from and the second is the number of the word you want. Since the manufacturer is always the first word of make, all you need is:

gen manufacturer=word(make,1)
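To see what word() returns in edge cases, here is a small sketch. It builds a hypothetical three-observation dataset with input instead of loading auto; the make values are illustrative. It also uses wordcount(), which counts the words in a string. The names model1 and nwords are made up for this example.

```stata
* Hypothetical data: three make values typed in by hand for illustration
clear
input str18 make
"AMC Concord"
"Pont. Grand Prix"
"Subaru"
end

* word(s, n) returns the nth space-separated word of s
gen manufacturer = word(make, 1)   // AMC, Pont., Subaru
gen model1 = word(make, 2)         // Concord, Grand, and "" for Subaru
gen nwords = wordcount(make)       // 2, 3, 1
list make manufacturer model1 nwords
```

The first word of a one-word string is the whole string. Asking for a word past the end gives an empty string.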

There are several other ways to extract manufacturer from make, some of which work in a wider variety of situations.

One alternative is the oddly named egen function ends(), with the head option. This will give you the first word of the string:

egen manufacturer2=ends(make), head
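The three ends() options can be compared on the same kind of hypothetical data. This sketch assumes the behavior described in Stata's help for egen. When a string contains no space, head and last return the whole string and tail returns an empty string. The new variable names are arbitrary.

```stata
* Hypothetical data for illustration
clear
input str18 make
"AMC Concord"
"Pont. Grand Prix"
"Subaru"
end

egen headword = ends(make), head   // AMC, Pont., Subaru
egen lastword = ends(make), last   // Concord, Prix, Subaru
egen tailpart = ends(make), tail   // Concord, Grand Prix, ""
list make headword lastword tailpart
```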

The last option gives you the last word, and the tail option gives you everything after the first word. The advantage of ends() over word() is that ends() has a punct() option that lets you divide strings into "words" at characters other than spaces. Thus if you had a variable fullname containing Dimond,Russell you could type:

egen firstname=ends(fullname), punct(",") last
egen lastname=ends(fullname), punct(",") head
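Here is a sketch of the punct() example using a hypothetical fullname variable. The second observation is invented and has a space after the comma. It shows why the trim option, which removes leading and trailing spaces from the result, can matter.

```stata
* Hypothetical names for illustration
clear
input str30 fullname
"Dimond,Russell"
"Smith, Jane"
end

* Split at the comma: head is what precedes it, last is what follows it
egen lastname  = ends(fullname), punct(",") head
egen firstname = ends(fullname), punct(",") last trim
list fullname firstname lastname
```

Without trim, the first name in the second observation would keep its leading space (" Jane").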

The most flexible method uses substr(), but, as usual, flexibility implies complexity. substr() takes three arguments: the string you want to extract a substring from, the position where the substring should start, and the number of characters it should contain. Since we want the first part of make, the starting position is just 1. The trick is that each manufacturer's name has a different length. However, we know we've reached the end of the manufacturer when we see a space, and we can use the strpos() function to find that space. strpos() takes two strings as arguments and returns the position of the second string within the first, or zero if the second string does not occur in the first (which can also be useful). The manufacturer ends one character before the space, so subtracting 1 keeps the space itself out of the result. Thus to find manufacturer using substr() you would type:

gen manufacturer3=substr(make,1,strpos(make," ")-1)

This gives you one missing value: the car whose make is just Subaru. While word() and ends() interpreted Subaru as the first word, it confused our substr() method. Because Subaru contains no space, strpos(make," ") returns zero and the requested length is less than one. When substr() is asked for a length less than one it returns an empty string (""), which is Stata's missing value for strings.
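If you want the substr() approach to handle one-word makes too, one option is to fall back to the whole string when strpos() finds no space. This sketch uses cond(), which returns its second argument when the first is true and its third otherwise. The names spacepos and manufacturer4 are hypothetical, as is the data.

```stata
* Hypothetical data for illustration
clear
input str18 make
"AMC Concord"
"Subaru"
end

* Position of the first space (0 if there is none)
gen spacepos = strpos(make, " ")

* Everything before the first space, or the whole make if there is no space
gen manufacturer4 = cond(spacepos > 0, substr(make, 1, spacepos - 1), make)
list make spacepos manufacturer4
```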

The moral of this story: use word() if you can, because it's so easy. But if you need to extract data from a complex piece of text (say, the HTML source code of a web page) substr() and strpos() may be your only hope.
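As an illustration of that kind of extraction, here is a sketch that pulls out the text between two markers with strpos() and substr(). The one-line HTML string and the variable names are hypothetical, and real web pages are usually much messier. strlen() is used only to avoid hard-coding the length of the opening tag.

```stata
* Hypothetical one-line HTML snippet
clear
input str60 html
"<html><title>Data Wrangling</title></html>"
end

* Start just after the opening tag, stop just before the closing tag
gen start = strpos(html, "<title>") + strlen("<title>")
gen stop  = strpos(html, "</title>")
gen title = substr(html, start, stop - start)
list title
```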

Complete do file:

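clear all
set more off
capture log close
log using data_ex3.log, replace

* Assumes auto.dta is in the working directory;
* sysuse auto loads the copy that ships with Stata
use auto

* The manufacturer is the first space-separated word of make
gen manufacturer=word(make,1)
list make manufacturer

log close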

Last Revised: 12/17/2015