Additional examples for: Stata tip #: Exploring model consequences by plotting different predictions

Maarten L. Buis

Table of content

Original example

. * get -lean- scheme . * net install "http://www.stata-journal.com/software/sj4-3/gr0002_3", replace . set scheme lean1

. . sysuse auto, clear (1978 Automobile Data)

. reg price c.mpg##c.weight##c.weight i.foreign

Source | SS df MS Number of obs = 74 -------------+------------------------------ F( 6, 67) = 14.35 Model | 357171591 6 59528598.5 Prob > F = 0.0000 Residual | 277893805 67 4147668.73 R-squared = 0.5624 -------------+------------------------------ Adj R-squared = 0.5232 Total | 635065396 73 8699525.97 Root MSE = 2036.6

------------------------------------------------------------------------------ price | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | 339.3676 642.7191 0.53 0.599 -943.5052 1622.24 weight | -.4190842 9.351452 -0.04 0.964 -19.08466 18.24649 | c.mpg#| c.weight | -.3354189 .4546588 -0.74 0.463 -1.242922 .5720839 | c.weight#| c.weight | .0003336 .0014293 0.23 0.816 -.0025194 .0031866 | c.mpg#| c.weight#| c.weight | .0000668 .0000766 0.87 0.386 -.0000861 .0002197 | 1.foreign | 3169.324 678.0392 4.67 0.000 1815.952 4522.696 _cons | 4005.813 14733.72 0.27 0.787 -25402.82 33414.44 ------------------------------------------------------------------------------

. . qui sum mpg if e(sample)

. local m = r(mean)

. local sd = r(sd)

. . preserve

. qui replace foreign = 0

. . forvalues i = -2/2 { 2. qui replace mpg = `m' + `i'*`sd' 3. local j = `i' + 2 4. qui predict yhat`j' 5. }

. format yhat* %8.0gc

. sort weight

. twoway line yhat0 yhat1 yhat2 yhat3 yhat4 weight, /// > ytitle("predicted price (US {c S|})") /// > lpattern(solid solid solid solid solid) /// > lcolor(gs13 gs10 gs7 gs4 gs1) /// > legend( order( - "mpg" /// > 1 "mean - 2*sd" /// > 2 "mean - 1*sd" /// > 3 "mean" /// > 4 "mean + 1*sd" /// > 5 "mean + 2*sd" ))

. restore

[do-file]

first example graph

Adding colors

In this application it is useful to choose a gradient of colors for the lines that emphasize the ordinal nature of the predictions. A useful tool that can help choosing such colors is http://colorbrewer2.org/. This will give suggestions for colors in terms of RGB values. You can add these values directly in the Stata graph, as is shown in the example below.

. * get -lean- scheme . * net install "http://www.stata-journal.com/software/sj4-3/gr0002_3", replace . set scheme lean1

. . sysuse auto, clear (1978 Automobile Data)

. reg price c.mpg##c.weight##c.weight i.foreign

Source | SS df MS Number of obs = 74 -------------+------------------------------ F( 6, 67) = 14.35 Model | 357171591 6 59528598.5 Prob > F = 0.0000 Residual | 277893805 67 4147668.73 R-squared = 0.5624 -------------+------------------------------ Adj R-squared = 0.5232 Total | 635065396 73 8699525.97 Root MSE = 2036.6

------------------------------------------------------------------------------ price | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | 339.3676 642.7191 0.53 0.599 -943.5052 1622.24 weight | -.4190842 9.351452 -0.04 0.964 -19.08466 18.24649 | c.mpg#| c.weight | -.3354189 .4546588 -0.74 0.463 -1.242922 .5720839 | c.weight#| c.weight | .0003336 .0014293 0.23 0.816 -.0025194 .0031866 | c.mpg#| c.weight#| c.weight | .0000668 .0000766 0.87 0.386 -.0000861 .0002197 | 1.foreign | 3169.324 678.0392 4.67 0.000 1815.952 4522.696 _cons | 4005.813 14733.72 0.27 0.787 -25402.82 33414.44 ------------------------------------------------------------------------------

. . qui sum mpg if e(sample)

. local m = r(mean)

. local sd = r(sd)

. . preserve

. qui replace foreign = 0

. . forvalues i = -2/2 { 2. qui replace mpg = `m' + `i'*`sd' 3. local j = `i' + 2 4. qui predict yhat`j' 5. }

. format yhat* %8.0gc

. sort weight

. twoway line yhat0 yhat1 yhat2 yhat3 yhat4 weight, /// > ytitle("predicted price (US {c S|})") /// > lpattern(solid solid solid solid solid) /// > lcolor("255 255 178" /// > "254 204 92" /// > "253 141 60" /// > "240 59 32" /// > "189 0 38") /// > legend( order( - "mpg" /// > 1 "mean - 2*sd" /// > 2 "mean - 1*sd" /// > 3 "mean" /// > 4 "mean + 1*sd" /// > 5 "mean + 2*sd" ))

. restore

[do-file]

second example graph

Displaying splines instead of quadratic curves

The idea behind this tip is not limited to quadratic curves, below is an example that applies it to linear splines.

. * get -lean- scheme . * net install "http://www.stata-journal.com/software/sj4-3/gr0002_3", replace . set scheme lean1

. . sysuse auto, clear (1978 Automobile Data)

. mkspline w1 3000 w2 = weight

. reg price c.mpg##c.w1 c.mpg#c.w2 c.w2 i.foreign

Source | SS df MS Number of obs = 74 -------------+------------------------------ F( 6, 67) = 13.75 Model | 350442579 6 58407096.5 Prob > F = 0.0000 Residual | 284622817 67 4248101.75 R-squared = 0.5518 -------------+------------------------------ Adj R-squared = 0.5117 Total | 635065396 73 8699525.97 Root MSE = 2061.1

------------------------------------------------------------------------------ price | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -95.44225 385.0294 -0.25 0.805 -863.9641 673.0796 w1 | .3707357 3.900715 0.10 0.925 -7.415124 8.156595 | c.mpg#c.w1 | .0205254 .1642997 0.12 0.901 -.3074183 .3484691 | c.mpg#c.w2 | .0372192 .1768719 0.21 0.834 -.3158184 .3902569 | w2 | 4.231955 2.964785 1.43 0.158 -1.68578 10.14969 1.foreign | 3178.442 687.4848 4.62 0.000 1806.217 4550.668 _cons | 3462.536 9888.378 0.35 0.727 -16274.75 23199.82 ------------------------------------------------------------------------------

. . qui sum mpg if e(sample)

. local m = r(mean)

. local sd = r(sd)

. . preserve

. qui replace foreign = 0

. . forvalues i = -2/2 { 2. qui replace mpg = `m' + `i'*`sd' 3. local j = `i' + 2 4. qui predict yhat`j' 5. }

. format yhat* %8.0gc

. sort weight

. twoway line yhat0 yhat1 yhat2 yhat3 yhat4 weight, /// > ytitle("predicted price (US {c S|})") /// > lpattern(solid solid solid solid solid) /// > lcolor(gs13 gs10 gs7 gs4 gs1) /// > legend( order( - "mpg" /// > 1 "mean - 2*sd" /// > 2 "mean - 1*sd" /// > 3 "mean" /// > 4 "mean + 1*sd" /// > 5 "mean + 2*sd" ))

. restore

[do-file]

third example graph

After logit instead of regress

Nor is the idea limited to linear regression or models that include interaction terms. In many non-linear models, like logit, the predictions depend on the baseline value, so that way the effect of one variable will depend on other variables even when no interaction terms was included. However, as is also illustrated below, there is often an alternative way of presenting these effects where this "pseudo-interaction" does not occur.

. * get -lean- scheme . * net install "http://www.stata-journal.com/software/sj4-3/gr0002_3", replace . * . * get -grc1leg- . * net install "http://www.stata.com/users/vwiggins/grc1leg", replace . . set scheme lean1

. . webuse lbw, clear (Hosmer & Lemeshow data)

. logit low age lwt i.race smoke ptl ht ui

Iteration 0: log likelihood = -117.336 Iteration 1: log likelihood = -101.28644 Iteration 2: log likelihood = -100.72617 Iteration 3: log likelihood = -100.724 Iteration 4: log likelihood = -100.724

Logistic regression Number of obs = 189 LR chi2(8) = 33.22 Prob > chi2 = 0.0001 Log likelihood = -100.724 Pseudo R2 = 0.1416

------------------------------------------------------------------------------ low | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | -.0271003 .0364504 -0.74 0.457 -.0985418 .0443412 lwt | -.0151508 .0069259 -2.19 0.029 -.0287253 -.0015763 | race | 2 | 1.262647 .5264101 2.40 0.016 .2309024 2.294392 3 | .8620792 .4391532 1.96 0.050 .0013548 1.722804 | smoke | .9233448 .4008266 2.30 0.021 .137739 1.708951 ptl | .5418366 .346249 1.56 0.118 -.136799 1.220472 ht | 1.832518 .6916292 2.65 0.008 .4769494 3.188086 ui | .7585135 .4593768 1.65 0.099 -.1418484 1.658875 _cons | .4612239 1.20459 0.38 0.702 -1.899729 2.822176 ------------------------------------------------------------------------------

. . qui sum lwt if e(sample)

. local m = r(mean)

. local sd = r(sd)

. . preserve

. qui replace race = 1

. qui replace smoke = 0

. qui replace ptl = 0

. qui replace ht = 0

. qui replace ui = 0

. . forvalues i = -2/2 { 2. qui replace lwt = `m' + `i'*`sd' 3. local j = `i' + 2 4. qui predict pr`j' 5. qui predict odds`j', xb 6. qui replace odds`j' = exp(odds`j') 7. }

. format pr* %8.0gc

. sort age

. twoway line pr0 pr1 pr2 pr3 pr4 age, /// > name(pr, replace) /// > ytitle("probability of low birthweight") /// > title("probability") /// > lpattern(solid solid solid solid solid) /// > lcolor(gs13 gs10 gs7 gs4 gs1) /// > legend( subtitle("weight at last menstrual period") /// > order( 1 "mean - 2*sd" /// > 2 "mean - 1*sd" /// > 3 "mean" /// > 4 "mean + 1*sd" /// > 5 "mean + 2*sd" ) cols(2) colfirst)

. . twoway line odds0 odds1 odds2 odds3 odds4 age, /// > name(odds, replace) /// > ytitle("odds of low birthweight (log scale)") /// > title("Odds") /// > yscale(log) ylab(.025 .05 .1 .2 .4) /// > lpattern(solid solid solid solid solid) /// > lcolor(gs13 gs10 gs7 gs4 gs1) /// > legend( subtitle("weight at last menstrual period") /// > order( 1 "mean - 2*sd" /// > 2 "mean - 1*sd" /// > 3 "mean" /// > 4 "mean + 1*sd" /// > 5 "mean + 2*sd" ) cols(2) colfirst)

. . grc1leg pr odds

. . restore

[do-file]

fourth example graph