Stata code for designing custom graph colors

A few readers may be interested in how I used Stata to create the color scheme for the offenses in the graphs I’ve posted recently. This is a “stats nerd” post that assumes the reader uses Stata, a statistical package. Everybody else may wish to give it a pass. Some preliminary tricks, then the code. UPDATED 8/6/2017 to include RGB values in the color palette and to give the formulas for calculating them from intensities.

The first trick I have learned is to generate self-labeling lines by creating a variable that holds the label only at the last value of the x-axis variable (year, in my case), e.g. gen xvalue15=Label if xvalue==15, and then using that variable as the marker label. Self-labeling scatterplots work the same way, with a label for every value.
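Here is a minimal sketch of this trick using made-up data. The two series and the names series, y, Label, and Label15 are hypothetical, chosen only for illustration. The label variable is filled in only at the last x value, so each line carries its own label at its right end. Because of the /// line continuations, run it from a do-file.

```stata
* Sketch of Trick 1 with made-up data; all variable names are illustrative
clear
set obs 32
gen series = cond(_n<=16, 1, 2)              // two hypothetical lines
bysort series: gen xvalue = _n - 1           // x runs 0 to 15 within each line
gen y = series*xvalue                        // arbitrary y values
gen Label = cond(series==1, "Series A", "Series B")
gen Label15 = Label if xvalue==15            // label only the last point
twoway (connected y xvalue if series==1, msymbol(i) mlabel(Label15)) ///
       (connected y xvalue if series==2, msymbol(i) mlabel(Label15)), ///
       legend(off) xscale(range(0 18))
```

The xscale(range(0 18)) option leaves some room on the right so the end labels are not clipped.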

Trick 2 is to use Stata macros to generate the lines of a plot. The general scheme is:

local plotlist ""
foreach val in [list of values] {
    local plotlist "`plotlist' (code_for_one_line)"
    }
twoway `plotlist', [options for the graph as a whole]

In this code, each line gets added to the macro plotlist. Pro tip: remember to reset the plotlist macro to "" (empty), or use a new macro name each time; otherwise a repeated graph will also contain the lines left over from the previous one.
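As a concrete illustration of the scheme, this sketch uses the auto dataset that ships with Stata and builds one scatter per value of rep78. The choice of mpg, weight, and rep78 is arbitrary; they stand in for whatever defines the separate lines in your own graph.

```stata
* Trick 2 with the auto data shipped with Stata; rep78 stands in for
* whatever variable defines the separate lines in your own graph
sysuse auto, clear
local plotlist ""                          // reset before building a new graph
levelsof rep78, local(reps)                // nonmissing values only
foreach val of local reps {
    local plotlist "`plotlist' (scatter mpg weight if rep78==`val')"
}
twoway `plotlist', legend(off)
```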

Color Swatch Generator

Although Stata can generate colors using any set of RGB values, for a variety of reasons* I found it easiest to work with the built-in named colors. Named colors can be modified with the syntax “color*##”: numbers less than 1 lighten the color and numbers greater than 1 darken it. The ado file full_palette generates a swatch of the 66 named colors in Stata, with their RGB values (you can access it by typing help full_palette and installing the ado), and the built-in palette color command will show color samples and the RGB values for two colors (type help palette color to see the syntax of the command). But I wanted to see ranges of colors using the intensity values across several different named colors. I also tested creating my own color (uwred) and saving it in a .style file. **
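For a quick look at the color*## syntax before the full program, this sketch plots the auto data (shipped with Stata) in ebblue at three intensities and then calls palette color on two of them. The grouping by rep78 is arbitrary. Run it from a do-file because of the /// continuations.

```stata
* ebblue at three intensities; the rep78 grouping is arbitrary
sysuse auto, clear
twoway (scatter mpg weight if rep78<=2, mcolor(ebblue*0.5))          ///
       (scatter mpg weight if rep78==3, mcolor(ebblue))              ///
       (scatter mpg weight if inrange(rep78,4,5), mcolor(ebblue*2)), ///
       legend(order(1 "ebblue*0.5" 2 "ebblue" 3 "ebblue*2"))
* swatches and RGB values for two colors
palette color ebblue ebblue*2
```

Using inrange() rather than rep78>=4 keeps the five observations with missing rep78 out of the darkest group, since Stata treats missing as larger than any number.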

 

* Stata program to generate color swatches
* Pamela Oliver 5/29/2017, updated 6/18/2017 and 8/6/2017 
* now locates .style file for named color and identifies RGB code
* calculates and prints RGB values as well as color names
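* uses syntax "scatter y1 y2 x" to print two-line labels
* edits code so colors now print in the order of colorlist 
     * instead of alphabetical
* uwred is a color I created and saved as color-uwred.style in my personal ado directory
* if you do not have that file, remove uwred from colorlist, or findfile will stop with an error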
version 14.2 
local colorlist "ebblue eltblue orange orange_red red uwred purple"
local intenlist ".8 1 1.3 1.5 1.8 2"
local intnumlist=subinstr("`intenlist'"," ",", ",.) // a numlist with commas
disp "`intnumlist'"
** This code gets a decent intensity range in the graph regardless of the entries
local maxint=max(`intnumlist')
local minint=min(`intnumlist')
disp `maxint' `minint'
local range=`maxint'-`minint'
local edgegap=round(`range'*.2,.01)
local labgap=`edgegap'/8
local lowint=`minint'-`edgegap'
local yrange "`lowint' (`edgegap') `maxint'"
** This code generates the data for the plots
local ncolor=wordcount("`colorlist'")
local ninten=wordcount("`intenlist'")
local ncases=`ncolor'*`ninten'
disp "ncolor `ncolor' ninten `ninten' ncases `ncases'"
set more off
clear
set obs `ncases'
gen case=_n
gen ncases=_N
gen basecolor=""
gen colornum=.
gen intenS=""
gen colorname=""
gen col_int_num=.
** fill in the strings with colors and intensities
local ii=1
forval color= 1/`ncolor' {
    forval inten= 1/`ninten' {
        replace basecolor=word("`colorlist'",`color') if case==`ii'
        replace colornum=`color' if case==`ii'
        replace intenS=word("`intenlist'",`inten') if case==`ii'
        replace col_int_num=`ii' if case==`ii'
        local ii=`ii'+1
        }
    }
replace colorname=basecolor+"*"+intenS
*** the num variables are sequential 
** uses the string variables as value labels for the numerical variables 
** labmask comes from the user-written labutil package (ssc install labutil)
labmask colornum, values(basecolor) 
labmask col_int_num, values(colorname) 
** create numeric version of intensity 
encode intenS, gen(intennum) 
gen inten=real(intenS) // this is the actual numeric value of intensity 
gen RGB_base="" 
** this code snippet taken from full_palette.ado with modifications 
** it finds and reads the color style file for each named color 
** and extracts the RGB values for that color and puts it in a variable RGB_base 
** I did not copy all the error code returns 
foreach base in `colorlist' { 
    tempname hdl`base'  // assigns a temporary name for the file handle
    findfile color-`base'.style // this command searches all ado directories 
    local colorfile=r(fn)  // findfile returns the file location 
    file open `hdl`base'' using "`colorfile'", read text 
    file read `hdl`base'' line 
    while r(eof)==0 { 
        tokenize `"`line'"' 
        if "`1'"=="set" & "`2'"=="rgb" { 
            *qui replace lab=`""`3'`basemod'""' in `i'  // the code from full_palette.ado 
            qui replace RGB_base="`3'" if basecolor=="`base'" 
            file close `hdl`base'' 
            continue, break 
            } 
        file read `hdl`base'' line 
        } 
    } 
*** parse the original RGB codes 
gen Ro=real(word(RGB_base,1)) 
gen Go=real(word(RGB_base,2)) 
gen Bo=real(word(RGB_base,3)) 
foreach CC in R G B { 
 gen `CC'x=`CC'o if inten==1 
 replace `CC'x=round(`CC'o/inten,1) if inten>1 
 replace `CC'x=`CC'o + round((1-inten)*(255-`CC'o)) if inten<1 
 } 
gen RGB_derived=string(Rx)+" "+string(Gx)+" "+string(Bx) 
** gen a value of inten just a little lower to carry the second label 
gen int2=inten-`labgap'  // slightly lower value on vertical axis 
set scheme s1color 
local plot "" 
local plot2 "" 
summ col_int_num 
local nplots=r(max) 
forval point=1/`nplots' { 
 **get the values for each point in the plot 
 qui summ col_int_num if col_int_num==`point' 
 local labelnum=r(mean) 
 local colorname: label col_int_num `labelnum' 
 qui summ colornum if col_int_num==`point' 
 local colnum=r(mean) 
 local color: label colornum `colnum' 
 qui summ intennum if col_int_num==`point' 
 local intnum=r(mean) 
 local inten: label intennum `intnum' 
** collect the plots for each point in locals named plot and plot2 
 local plot "`plot' (scatter inten colornum if col_int_num==`point', mcolor(`colorname') msize(huge) mlab(colorname) mlabc(`colorname') mlabsize(tiny) mlabpos(6))" 
 local plot2 "`plot2' (scatter inten int2 colornum if col_int_num==`point', ms(S none) mcolor(`colorname' `colorname') msize(*3 *3) mlab( RGB_derived colorname) mlabc(`colorname' `colorname') mlabsize(vsmall vsmall) mlabpos(6 6))" 
 } // end of the loop defining the plots 
*disp "`plot'" // in case you need to check whether it worked 
local xmax=`ncolor'+1 // put an extra column of padding in the graph 
local testname "test_today" // give a unique name to the output file 
** plot with just the colornames and round swatches 
twoway `plot' , legend(off) ylab(.25 (.25) 2) xlab(0 (1) `xmax', val) xtitle(color) ytitle(intensity) 
graph export `testname'_sample_color_swatch.png, replace 
**plot with two labels and square swatches 
twoway `plot2' , legend(off) ylab(`yrange') xlab(0 (1) `xmax', val) xtitle(color) ytitle(intensity) 
graph export `testname'_color_swatch_RGB2.png, replace width(800) height(600)
exit

 

 

Color Line Generator

[Image: color_lines_sample]

My application has too many values to distinguish by color alone (or so I judged), so I also used line patterns. Hence this code to generate sample lines.

version 14.2
* insert colors, intensities, patterns in the lists as desired

local colorlist "orange_red ebblue"
local intenlist ".5  1 1.75 "
local lplist "solid dash shortdash"
local ncolor=wordcount("`colorlist'")
local ninten=wordcount("`intenlist'")
local nlp = wordcount("`lplist'")
local ncases=`ncolor'*`ninten'*`nlp'
clear
set obs `ncases'
gen case=_n
gen Ncases=_N
gen hue=""
gen inten=""
gen linepat=""
set more off
set scheme s1color  // white background
*** fill in the color values, text variables
local xx=1
forval col=1/`ncolor' {
     forval int=1/`ninten' {
       forval lpat=1/`nlp' { 
          replace hue=word("`colorlist'", `col') if case==`xx' 
          replace inten=word("`intenlist'", `int') if case==`xx'
          replace linepat=word("`lplist'", `lpat') if case==`xx' 
       local xx=`xx'+1 
       } 
       } 
       } 
** CREATE 16 values for the X axis ****** 
* Duplicate observations: four doublings give each case 16 copies
expand 2, gen(copy1)
expand 2, gen(copy2)
expand 2, gen(copy3)
expand 2, gen(copy4)
gen xvalue=copy1 + 2*copy2 + 4*copy3 + 8*copy4

* generate text from other text
gen color=hue+"*"+inten
gen definition=hue+"*"+inten+" "+linepat
gen def15=definition if xvalue==15
* create numeric variables with the strings as values
encode color, gen(colornum)
encode linepat, gen(lpnum)
qui sum colornum
local ncol=r(max)
forval colnum=1/`ncol' { 
    local col`colnum' = `colnum' 
    }
forval lpnum=1/`nlp' { 
     local lp`lpnum'=`lpnum' 
    }

local plotlist ""
disp "ncases `ncases'"
forval case=1/`ncases' { 

    qui summ colornum if case==`case' 
    local cn=r(mean) 
    local color: label colornum `cn' 

    qui summ lpnum if case==`case' 
    local ln=r(mean) 
    local lpat: label lpnum `ln' 

    local plotlist "`plotlist' (connected case xvalue if case==`case', msym(i) mlab(def15) lc(`color') mlabc(`color') lp(`lpat'))" 
    }
twoway `plotlist', legend(off) xlab(0 (2) 22)
graph export color_lines_sample.png, replace

Offense line palette

This is the problem that started me on this path. I have 17 offenses for which I want to graph imprisonment over time. Letting Stata choose the colors generates an unreadable hash. And brewscheme won’t help, because I want to assign particular markers/colors to particular offenses, not create a general order of colors. After working on this problem for a while, I realized the graph could be more meaningful if similar offenses had related colors. Generating a variable-specific palette is easy using the skills developed above.

[Image: offense_lines_2017-6-1set1]

Step 1: Create a spreadsheet with the variable names and labels plus columns for variable groups, color name (hue), intensity, line type, and the order in which I want the lines to appear in my sample graph. The order column puts colors that might be difficult to distinguish next to each other in the sample. In my spreadsheet, I put different possible color schemes in different tabs. Here is one sample.

OffLab      offdetail  group     hue         intensity  line       order
Drugs       12         drugdwi   navy        2          solid      10
DWI         20         drugdwi   navy        2          dash       11
Escape_etc  21         misc      ebblue      0.5        solid      16
Family      22         misc      ebblue      0.5        shortdash  17
Larceny     8          property  ebblue      1.5        dash       12
MVTheft     9          property  ebblue      1.5        solid      13
Fraud       10         property  ebblue      1          shortdash  14
OthProp     11         property  ebblue      1          solid      15
Robbery     4          robbur    purple      1          solid      9
Burglary    7          robbur    purple      1          dash       8
Murder      1          violent   orange_red  1.75       solid      7
NegMansl    2          violent   orange_red  1.75       shortdash  6
Rape        3          violent   orange_red  1.75       dash       5
Assault     5          violent   orange_red  1          dash       4
OthViolent  6          violent   orange_red  1          solid      3
Weapon      23         violent   orange_red  0.5        solid      2
PubOrd      13         violent   orange_red  0.5        dash       1

The do file reads the spreadsheet (with a local parameter that selects the tab) and generates a sample plot.

version 14.2
local group set1
import excel "offense_colors_lines.xlsx", sheet("`group'") firstrow allstring clear
gen color=hue+"*"+intensity
encode color, gen(colornum)
encode line, gen(linenum)
destring offdetail, replace
destring order, replace

** I save this as a Stata file so I can merge it into the data file for production runs

save "offense_lines_2017-6-1`group'.dta", replace

levelsof offdetail, local(offlist) clean
foreach off in `offlist' {
    qui summ colornum if offdetail==`off'
    local cnum=r(mean)
    local col`off': label colornum `cnum'
    qui summ linenum if offdetail==`off'
    local lnum=r(mean)
    local line`off': label linenum `lnum'
    }
** creates values for an X axis
expand 2, gen(copy1)
expand 2, gen(copy2)
expand 2, gen(copy3)
expand 2, gen(copy4)
gen xvalue=copy1 + 2*copy2 + 4* copy3 + 8*copy4
gen OffLab15=OffLab if xvalue==15


local plotlist ""
forval xx=1/17 {
   qui summ offdetail if order==`xx'
   local off=r(mean)
   local plotlist "`plotlist' (connected order xvalue if offdetail==`off', ml(OffLab15) ms(i) lc(`col`off'') mlabc(`col`off'') lp(`line`off''))"
    }
disp "`plotlist'" 
twoway `plotlist', legend(off) xlab(0 (3) 20)
graph export "offense_lines_2017-6-1`group'.png", replace

Using this scheme in my production graphs involves this code:

use [data file]

merge m:1 offdetail using offense_lines_2017-6-1set1.dta

levelsof offdetail, local(offlist) clean
foreach off in `offlist' {
     qui summ colornum if offdetail==`off'
     local cnum=r(mean)
     local col`off': label colornum `cnum'
     qui summ linenum if offdetail==`off'
     local lnum=r(mean)
     local line`off': label linenum `lnum'
     }

These local macros can then be used in the production graphs with the same code logic as was used to generate the samples.
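Here is a sketch of what such a production graph might look like. So that the block runs on its own, the two offenses, the years, and the rate values are made up, and year, rate, and OffLab_last are hypothetical names. In real use, colornum and linenum come from merging offense_lines_2017-6-1set1.dta into the data file.

```stata
* Sketch only: two offenses and made-up values stand in for the real
* data file merged with the saved palette file
clear
input offdetail str6 OffLab str15 color str5 line
 1 "Murder" "orange_red*1.75" "solid"
12 "Drugs"  "navy*2"          "dash"
end
encode color, gen(colornum)           // same encoding step as the palette file
encode line, gen(linenum)
expand 11                             // 11 hypothetical years per offense
bysort offdetail: gen year = 2000 + _n - 1
gen rate = offdetail + (year-2000)/2  // arbitrary values, illustration only
gen OffLab_last = OffLab if year==2010
* look up each offense's color and line pattern, as in the code above
levelsof offdetail, local(offlist) clean
foreach off in `offlist' {
    qui summ colornum if offdetail==`off'
    local cnum=r(mean)
    local col`off': label colornum `cnum'
    qui summ linenum if offdetail==`off'
    local lnum=r(mean)
    local line`off': label linenum `lnum'
}
* build one line per offense from those locals
local plotlist ""
foreach off in `offlist' {
    local plotlist "`plotlist' (connected rate year if offdetail==`off', msym(i) mlab(OffLab_last) lc(`col`off'') mlabc(`col`off'') lp(`line`off''))"
}
twoway `plotlist', legend(off) xlab(2000(2)2012)
```

Because the colors and patterns are looked up by offdetail rather than by position, the same offense gets the same line in every graph, whatever subset of offenses is plotted.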

Notes
Originally blogged at:
Stata: roll your own color palettes

* I originally tried to use the RGB values from specific palettes I found online, but passing RGB values in a macro the way I do with my offense colors did not work. I think the problem is a subtle Stata bug/behavior in parsing quotes within quotes within quotes in macros that refer to other macros, and/or in parsing a list of numbers separated only by spaces. When I used the most straightforward syntax, Stata eliminated the spaces between the numbers (a very odd behavior!), and when I added Stata’s special compound double quotes `" and "', that problem was solved but the resulting code generated an error. However, if you use ado files you can find online to create and save new colors with names, those new colors should work fine with this routine. You create a new color by creating a file named color-COLORNAME.style in your personal ado path (I put it in a style folder that had previously been created, but anywhere in that path works); the content of this file must be

set rgb "255 255 255"

where you replace the 255’s with the RGB codes for the color you want to name. If you examine the color-NAME.style files in your system files (which you can find by typing “findfile color-red.style” in a Stata session and reading the resulting path), you will see that you can also include comments, labels, and other commands that don’t get in the way of this core command, but this is the one you need.
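Here is a sketch of creating such a file from within Stata. The color name mycolor and the RGB values are placeholders. The file goes in the current working directory, which is on Stata's default ado path, so it is found while you work there. To keep the color across sessions, put the file in your personal ado directory instead (typing sysdir shows where that is).

```stata
* Sketch: write a .style file for a hypothetical named color "mycolor"
* (the RGB values are placeholders) into the current working directory
tempname fh
file open `fh' using "color-mycolor.style", write text replace
file write `fh' `"set rgb "197 5 12""' _n    // the one line that is required
file close `fh'
findfile color-mycolor.style                 // confirm Stata can locate the file
display "`r(fn)'"
```

Once findfile can see the file, mycolor (or mycolor*1.5) should be usable wherever a built-in named color is, including in the colorlist of the swatch program.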

** I spent some time studying the code for the ado files palette.ado and full_palette.ado, trying to figure out how the RGB values were generated from the color and intensity values so I could put them in my palette as well, but I finally gave up. Both ado files read the RGB code for the base color from the color .style file, but I could not find the code in palette.ado that computes the derived RGB when there is an intensity factor. It must not look the way I’m expecting it to look.

By experimenting with values in palette color, I learned that an intensity greater than 1 consistently divides the RGB values by that number, rounded to the nearest integer (e.g. ebblue is RGB 0 139 188 and ebblue*2 is 0 70 94). Lower RGB values are darker, with black being 0 0 0. An intensity less than 1 increases all three RGB values and pulls the color toward white, which has RGB 255 255 255. So, for example, red is 255 0 0, red*.5 is 255 128 128, red*.2 is 255 204 204, ebblue is 0 139 188, ebblue*.5 is 128 197 222, teal is 110 142 132, teal*.5 is 183 199 194, and teal*.2 is 226 232 230. If the color is pure and fully saturated, the intensity factor adds (1-int)*255 to the other colors. I am sure I could empirically work out the formula for intensities less than 1 for the more complex cases if I spent more time on it, but it is not immediately obvious to me. If you know the formula and put it in the comments, I would be grateful. I’m not sure it matters except to my curiosity.

EDIT: The correct general formula for intensity<1 is orig_RGBnum + (1-intensity)(255-orig_RGBnum), rounded to the nearest integer, for each of the three original RGB numbers. I still have not found the actual code that implements these formulas in the palette.ado file, but I was able to implement the formula in my own program.
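As a check on these formulas, this sketch applies them, in the same form as in the swatch program, to the base colors and intensities quoted above. The RGB_derived column should reproduce the values I got from palette color.

```stata
* Apply the intensity formulas to the examples quoted in the text
clear
input str6 base Ro Go Bo double inten
"red"    255   0   0  .5
"red"    255   0   0  .2
"ebblue"   0 139 188  .5
"ebblue"   0 139 188  2
"teal"   110 142 132  .5
"teal"   110 142 132  .2
end
foreach CC in R G B {
    gen `CC'x = `CC'o if inten==1                                  // unchanged
    replace `CC'x = round(`CC'o/inten) if inten>1                  // darker
    replace `CC'x = `CC'o + round((1-inten)*(255-`CC'o)) if inten<1 // lighter
}
gen RGB_derived = string(Rx)+" "+string(Gx)+" "+string(Bx)
list base inten RGB_derived, noobs clean
```

Stata's round() sends halves upward, which is why 139/2 = 69.5 becomes 70 and 0 + .5*255 = 127.5 becomes 128.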
