```

Log-linear models
Lecture 2: raking

Maarten Buis
office F532
maarten.buis@uni.kn
office hours by appointment

```

`-------------------------------------------------------------------------------`

` index >>`
```-------------------------------------------------------------------------------
```
```                               Table of contents

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

```
```    Standardizing a table
```

```       Standardize the table for homogamy
```

```       Making all the row totals 100
```

```       Making all the column totals 100
```

```       Repeat
```

```       Iterative Proportional Fitting
```

```       Can all tables be standardized? (ancillary)
```

```       Try it yourself
```
```
```
```    Standardization to compare tables
```

```       Comparing tables
```

```       Try it yourself
```
```
```
```    Keep the margins as observed, but change the pattern
```

```       Independence revisited
```
```
```
```    Standardization to known margins in the population
```

```       Margins in our sample and margins in the population
```

```       Computing post-stratification weights for the cohort 1940-1945

```
```

-------------------------------------------------------------------------------
Supporting materials
-------------------------------------------------------------------------------

Datasets
homogamy_allbus.dta  ALLBUS 1980 - 2016; on slide slide1.smcl
place.dta            German Life History Study I; on slide slide8.smcl
margins1940.dta      Volks- und Berufszählung 1970 and 1987; on slide
slide13.smcl

Do files
slide1ex1.do         initial look at the homogamy data; on slide
slide1.smcl
slide2ex1.do         load the data in Mata; on slide slide2.smcl
slide2ex2.do         adjust the rows to sum to 100; on slide slide2.smcl
slide3ex1.do         adjust the columns to sum to 100; on slide slide3.smcl
slide4ex1.do         repeat adjusting rows and columns; on slide slide4.smcl
slide5ex1.do         write that up in a loop; on slide slide5.smcl
slide5ex2.do         odds ratio in raw data; on slide slide5.smcl
slide5ex3.do         odds ratio in adjusted data; on slide slide5.smcl
slide6ex4.do         do this standardization with stdtable; on slide
slide6.smcl
slide9ex1.do         load data by cohort in Mata; on slide slide9.smcl
slide9ex2.do         store the desired margins; on slide slide9.smcl
slide9ex3.do         standardize the 1940 table to 1960 margins; on slide
slide9.smcl
slide9ex4.do         do this with stdtable; on slide slide9.smcl
slide11ex1.do        use IPF to find the counts under independence; on
slide slide11.smcl
slide13ex1.do        load data and census margins in Stata; on slide
slide13.smcl
slide13ex2.do        load data and census margins in Mata; on slide
slide13.smcl
slide13ex3.do        adjust table to fit census margins; on slide
slide13.smcl
slide13ex4.do        use these adjusted counts to create weights; on slide
slide13.smcl

Solutions to "Try it yourself"
raking_sol1.do       solution; on slide slide8.smcl
raking_sol2.do       solution; on slide slide10.smcl

```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Standardize the table for homogamy

Consider the table of the education of the male and female partners in a
marriage or stable partnership.

```
```
. clear

. use homogamy_allbus.dta
(ALLBUS 1980 - 2016)

. tab meduc feduc, matcell(data)

male |              female education
education |       low  lower voc  medium vo  higher vo |     Total
------------+--------------------------------------------+----------
low |     1,378        600        314         87 |     2,434
lower voc. |     3,864      7,665      2,528        407 |    14,696
medium voc. |       815      1,847      4,802        809 |     8,849
higher voc. |       276        530      1,122        898 |     3,422
university |       387        729      1,828      1,115 |     6,488
------------+--------------------------------------------+----------
Total |     6,720     11,371     10,594      3,316 |    35,889

|   female
male | education
education | universit |     Total
------------+-----------+----------
low |        55 |     2,434
lower voc. |       232 |    14,696
medium voc. |       576 |     8,849
higher voc. |       596 |     3,422
university |     2,429 |     6,488
------------+-----------+----------
Total |     3,888 |    35,889

```
```    It is hard to interpret this table as it stands, because of the
differences in the margins.

For example, it appears that a female with medium vocational education
marrying a male with lower vocational education is more common than vice
versa.

This is contrary to the notion that partnerships in which the man is
better educated are more common.

But the observed pattern may be due to the fact that lower vocational
education is more common among men than women, and medium vocational
education is more common among women than men.

Can't we change the table such that the pattern of association remains
constant, but all the margins are the same, e.g. 100?

That would make it easier to see patterns.

Last week we solved this problem by looking at how independence was
defined in a chi-square test:

We imagined what the table would look like if the margins remained as we
observed them, but there was no other association between the row and
column variables.

This week we turn this around: We keep the association in the table as
observed, but change the margins.

```
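The recipe built up on the following slides can be sketched compactly. Below is a minimal illustration in Python with NumPy (the course itself works in Stata and Mata); the function name `rake` and the small example table are made up for this sketch.

```python
import numpy as np

def rake(table, row_targets, col_targets, tol=1e-8, max_iter=100):
    """Alternately rescale rows and columns until both margins hit the targets."""
    muhat = table.astype(float)
    for _ in range(max_iter):
        old = muhat.copy()
        # make the row sums equal to row_targets
        muhat *= (row_targets / muhat.sum(axis=1))[:, None]
        # make the column sums equal to col_targets
        muhat *= (col_targets / muhat.sum(axis=0))[None, :]
        if np.abs(muhat - old).max() < tol:
            break
    return muhat

# standardize a small table so that every margin is 100
std = rake(np.array([[10., 20.], [30., 40.]]),
           np.array([100., 100.]), np.array([100., 100.]))
```

The pattern of association (the odds ratios) survives these rescalings; only the margins change.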

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Making all the row totals 100

Notice that I added the option matcell(data) to the tab command. This
leaves behind the table as a Stata matrix named data, which in turn can
be read into Mata.

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: data = st_matrix("data")

: data
1      2      3      4      5
+------------------------------------+
1 |  1378    600    314     87     55  |
2 |  3864   7665   2528    407    232  |
3 |   815   1847   4802    809    576  |
4 |   276    530   1122    898    596  |
5 |   387    729   1828   1115   2429  |
+------------------------------------+

: end
-------------------------------------------------------------------------------

```
```    If we divide all cell entries by their row sum, then the new row sums
will be 1.

Multiply the new cell entries by 100, and the row sums will be 100.

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data

:
: muhat = muhat:/rowsum(muhat):*100

: muhat
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  56.61462613   24.65078061   12.90057518   3.574363188   2.259654889  |
2 |  26.29286881   52.15704954   17.20195972   2.769461078    1.57866086  |
3 |  9.210080235   20.87241496   54.26601876   9.142275963    6.50921008  |
4 |  8.065458796    15.4880187   32.78784337   26.24196376   17.41671537  |
5 |    5.9648582   11.23612824   28.17509248   17.18557337   37.43834772  |
+-----------------------------------------------------------------------+

: rowsum(muhat)
1
+-------+
1 |  100  |
2 |  100  |
3 |  100  |
4 |  100  |
5 |  100  |
+-------+

: colsum(muhat)
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  106.1478922    124.404392   145.3314895   58.91363736   65.20258892  |
+-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------

```
```
```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Making all the column totals 100

The row totals are as we want them, but the column totals are not. What
if we repeat this process for the columns?

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = muhat:/colsum(muhat):*100

: muhat
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  53.33561032   19.81504045   8.876655176   6.067123587   3.465590748  |
2 |  24.77003384   41.92540848   11.83636098   4.700882855    2.42116285  |
3 |  8.676649198   16.77787626   37.33947746   15.51809797   9.983054643  |
4 |  7.598322144   12.44973626   22.56072891   44.54310571   26.71169299  |
5 |    5.6193845   9.031938545   19.38677748   29.17078988   57.41849877  |
+-----------------------------------------------------------------------+

: rowsum(muhat)
1
+---------------+
1 |  91.56002028  |
2 |  85.65384901  |
3 |  88.29515553  |
4 |   113.863586  |
5 |  120.6273892  |
+---------------+

: colsum(muhat)
1     2     3     4     5
+-------------------------------+
1 |  100   100   100   100   100  |
+-------------------------------+

: end
-------------------------------------------------------------------------------

```
```    Now the column totals are as we want them, but the row totals are a bit
off again.

However, the row totals are closer to 100 than in the original table, so
maybe we need to repeat this process a couple of times?

```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Repeat

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = muhat:/rowsum(muhat):*100

: muhat
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  58.25207351   21.64158591   9.694903024   6.626389518   3.785048034  |
2 |  28.91876328   48.94748919    13.8188314   5.488233056   2.826683071  |
3 |  9.826868922   19.00203489   42.28938409   17.57525413   11.30645796  |
4 |  6.673180084   10.93390494   19.81382258   39.11971094   23.45938146  |
5 |   4.65846483   7.487469145   16.07162155   24.18255927    47.5998852  |
+-----------------------------------------------------------------------+

: rowsum(muhat)
1
+-------+
1 |  100  |
2 |  100  |
3 |  100  |
4 |  100  |
5 |  100  |
+-------+

: colsum(muhat)
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  108.3293506   108.0124841   101.6885626   92.99214691   88.97745573  |
+-----------------------------------------------------------------------+

:
: muhat = muhat:/colsum(muhat):*100

: muhat
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  53.77312166   20.03618942   9.533916866   7.125751731    4.25394051  |
2 |  26.69522444   45.31651096   13.58936643   5.901824227   3.176853112  |
3 |   9.07128942   17.59244318   41.58715886   18.89971865   12.70710414  |
4 |  6.160085005   10.12281593   19.48480936   42.06775759   26.36553413  |
5 |  4.300279474   6.932040503   15.80474848   26.00494781    53.4965681  |
+-----------------------------------------------------------------------+

: rowsum(muhat)
1
+---------------+
1 |  94.72292019  |
2 |  94.67977917  |
3 |  99.85771425  |
4 |   104.201002  |
5 |  106.5385844  |
+---------------+

: colsum(muhat)
1     2     3     4     5
+-------------------------------+
1 |  100   100   100   100   100  |
+-------------------------------+

: end
-------------------------------------------------------------------------------

```
```    Notice that each iteration brings us a bit closer to our goal.

```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Iterative Proportional Fitting

This algorithm is called Iterative Proportional Fitting (IPF), also
known as raking.

We can automate this repetition with a loop; in particular, we want to
continue looping until the table no longer changes.

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data

: muhat2 = 0:*data

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>         muhat2 = muhat
>         muhat = muhat:/rowsum(muhat):*100
>         muhat = muhat:/colsum(muhat):*100
>         printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mr
> eldif(muhat2,muhat))
>         i = i + 1
> }
iteration 1 relative change 196.018454
iteration 2 relative change .264736172
iteration 3 relative change .085845379
iteration 4 relative change .032427941
iteration 5 relative change .01302549
iteration 6 relative change .005266606
iteration 7 relative change .002120014
iteration 8 relative change .000852175
iteration 9 relative change .000342384
iteration 10 relative change .00013754
iteration 11 relative change .000055248
iteration 12 relative change .000022192
iteration 13 relative change 8.9140e-06
iteration 14 relative change 3.5805e-06
iteration 15 relative change 1.4382e-06
iteration 16 relative change 5.7769e-07
iteration 17 relative change 2.3204e-07
iteration 18 relative change 9.3206e-08
iteration 19 relative change 3.7438e-08
iteration 20 relative change 1.5038e-08
iteration 21 relative change 6.0404e-09

: muhat
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |   55.2594258   21.01486714   10.56463793    8.17117594   4.989893004  |
2 |  27.29231899   47.28612524   14.98125546   6.732957746   3.707342411  |
3 |  8.441187075   16.70825366   41.72879959   19.62468059   13.49707912  |
4 |  5.378131668    9.02020621   18.34353466   40.98329233   26.27483526  |
5 |  3.628936464   5.970547749   14.38177236   24.48789339   51.53085021  |
+-----------------------------------------------------------------------+

: data
1      2      3      4      5
+------------------------------------+
1 |  1378    600    314     87     55  |
2 |  3864   7665   2528    407    232  |
3 |   815   1847   4802    809    576  |
4 |   276    530   1122    898    596  |
5 |   387    729   1828   1115   2429  |
+------------------------------------+

: end
-------------------------------------------------------------------------------

```
```    In the raw data, the odds of a man with lower vocational education
marrying a woman with lower vocational rather than low education are 4.6
times the corresponding odds for a man with low education:

```
```
. di (7665 / 3864 ) / (600 / 1378 )
4.5558877

```
```    In our new table we get exactly the same odds ratio:

```
```
. di (47.28612524 / 27.29231899) / (21.01486714 / 55.2594258)
4.5558877

```
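This is no accident: raking only multiplies each row by some factor r[i] and each column by some factor c[j], and those factors cancel in any odds ratio. A quick numeric check of this cancellation, sketched in Python with NumPy (not part of the course's do files; the random table and factors are made up):

```python
import numpy as np

# an arbitrary positive table and arbitrary row/column scale factors
rng = np.random.default_rng(42)
data = rng.uniform(1, 10, size=(5, 5))
r = rng.uniform(0.5, 2, size=(5, 1))   # row factors
c = rng.uniform(0.5, 2, size=(1, 5))   # column factors
scaled = r * data * c                  # what any number of raking steps amounts to

def odds_ratio(t):
    # same cross ratio as on the slide: (t[1,1]/t[1,0]) / (t[0,1]/t[0,0])
    return (t[1, 1] / t[1, 0]) / (t[0, 1] / t[0, 0])
```

The r's and c's appear once in the numerator and once in the denominator of every cross ratio, so the scaled table has exactly the same odds ratios as the original.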
```    Standardizing a table like this is nice in a teaching setting, because
you can see what is going on. In a real analysis you would use the
stdtable package.

```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Try it yourself

Use IPF to standardize the place of residence table (place.dta) to have
margins of all 100.

```
`raking_sol1.do`
```

```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardization to compare tables
-------------------------------------------------------------------------------

Comparing tables

Say we want to compare the tables for the cohorts born in 1940-1945 (who
were 20 in 1960-1965) and born in 1960-1965 (who were 20 in 1980-1985).

What if we standardize the table from 1940-1945 to have the margins of
the table from 1960-1965?

This way we can compare across cohorts more easily, and still have
margins that are more realistic than all 100s.

```
```
. use homogamy_allbus.dta, clear
(ALLBUS 1980 - 2016)

. gen coh = cond(inrange(byr, 1960, 1965), 1, ///
>           cond(inrange(byr, 1940, 1945), 0, .))
(36,808 missing values generated)

. label define coh 0 "1940-1945" 1 "1960-1965"

. label value coh coh

. label var coh "resp. birth cohort"

. tab meduc feduc if coh==0, matcell(data1940)

male |              female education
education |       low  lower voc  medium vo  higher vo |     Total
------------+--------------------------------------------+----------
low |       146         81         36          9 |       278
lower voc. |       493      1,432        384         48 |     2,388
medium voc. |        99        306        376         52 |       887
higher voc. |        29         83        119         62 |       338
university |        75        157        312        113 |       955
------------+--------------------------------------------+----------
Total |       842      2,059      1,227        284 |     4,846

|   female
male | education
education | universit |     Total
------------+-----------+----------
low |         6 |       278
lower voc. |        31 |     2,388
medium voc. |        54 |       887
higher voc. |        45 |       338
university |       298 |       955
------------+-----------+----------
Total |       434 |     4,846

. tab meduc feduc if coh==1, matcell(data1960)

male |              female education
education |       low  lower voc  medium vo  higher vo |     Total
------------+--------------------------------------------+----------
low |       120         44         56         23 |       249
lower voc. |       181        477        365         73 |     1,115
medium voc. |        94        168      1,031        167 |     1,577
higher voc. |        44         47        217        185 |       616
university |        30         34        223        174 |       818
------------+--------------------------------------------+----------
Total |       469        770      1,892        622 |     4,375

|   female
male | education
education | universit |     Total
------------+-----------+----------
low |         6 |       249
lower voc. |        19 |     1,115
medium voc. |       117 |     1,577
higher voc. |       123 |       616
university |       357 |       818
------------+-----------+----------
Total |       622 |     4,375

. mata
------------------------------------------------- mata (type end to exit) -----
: data1940 = st_matrix("data1940")

: data1960 = st_matrix("data1960")

: end
-------------------------------------------------------------------------------

```
```    We extract the desired row and column totals

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: col = colsum(data1960)

: row = rowsum(data1960)

: end
-------------------------------------------------------------------------------

```
```    We can now apply these row and column totals instead of 100.

We can see that a large part of the apparent difference between the
cohorts is due to the change in distribution of education between the
cohorts.

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data1940

: muhat2 = 0:*data1940

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>         muhat2 = muhat
>         muhat = muhat:/rowsum(muhat):*row
>         muhat = muhat:/colsum(muhat):*col
>         printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mr
> eldif(muhat2,muhat))
>         i = i + 1
> }
iteration 1 relative change 3.35926798
iteration 2 relative change .412621342
iteration 3 relative change .096720838
iteration 4 relative change .022674838
iteration 5 relative change .005356008
iteration 6 relative change .001267613
iteration 7 relative change .000300115
iteration 8 relative change .000071057
iteration 9 relative change .000016824
iteration 10 relative change 3.9832e-06
iteration 11 relative change 9.4306e-07
iteration 12 relative change 2.2328e-07
iteration 13 relative change 5.2864e-08
iteration 14 relative change 1.2516e-08
iteration 15 relative change 2.9633e-09

: data1940
1      2      3      4      5
+------------------------------------+
1 |   146     81     36      9      6  |
2 |   493   1432    384     48     31  |
3 |    99    306    376     52     54  |
4 |    29     83    119     62     45  |
5 |    75    157    312    113    298  |
+------------------------------------+

: muhat
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  107.9401832   41.79372109    63.5105883    23.4738278   12.28167952  |
2 |  206.3512558   418.3106688   383.5347872   70.87817787   35.92510995  |
3 |  101.1983989    218.300937   917.1484403   187.5222911   152.8299328  |
4 |  25.01236059    49.9609286   244.9159022   188.6511611   107.4596476  |
5 |  28.49780156   41.63374457   282.8902821    151.474542   313.5036302  |
+-----------------------------------------------------------------------+

: data1960
1      2      3      4      5
+------------------------------------+
1 |   120     44     56     23      6  |
2 |   181    477    365     73     19  |
3 |    94    168   1031    167    117  |
4 |    44     47    217    185    123  |
5 |    30     34    223    174    357  |
+------------------------------------+

: end
-------------------------------------------------------------------------------

```
```    Alternatively, we can use stdtable

```
```
. stdtable meduc feduc, by(coh,baseline(1))

-----------------------------------------------------------------------------
resp. birth |
cohort and  |
male        |                        female education
education   |         low   lower voc.  medium voc.  higher voc.   university
------------+----------------------------------------------------------------
1940-1945   |
low |         108         41.8         63.5         23.5         12.3
lower voc. |         206          418          384         70.9         35.9
medium voc. |         101          218          917          188          153
higher voc. |          25           50          245          189          107
university |        28.5         41.6          283          151          314
|
Total |         469          770         1892          622          622
------------+----------------------------------------------------------------
1960-1965   |
low |         120           44           56           23            6
lower voc. |         181          477          365           73           19
medium voc. |          94          168         1031          167          117
higher voc. |          44           47          217          185          123
university |          30           34          223          174          357
|
Total |         469          770         1892          622          622
-----------------------------------------------------------------------------

-------------------------
resp. birth |
cohort and  |   female
male        |  education
education   |       Total
------------+------------
1940-1945   |
low |         249
lower voc. |        1115
medium voc. |        1577
higher voc. |         616
university |         818
|
Total |        4375
------------+------------
1960-1965   |
low |         249
lower voc. |        1115
medium voc. |        1577
higher voc. |         616
university |         818
|
Total |        4375
-------------------------

```
```
```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardization to compare tables
-------------------------------------------------------------------------------

Try it yourself

Standardize the tables in place.dta such that the margins for all cohorts
correspond to the margins of the 1950 cohort.

```
`raking_sol2.do`
```
```

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Keep the margins as observed, but change the pattern
-------------------------------------------------------------------------------

Independence revisited

We have until now keept the patern as observed in the data, and changed
the margins. Can we not turn that around: Keep the margins as observed in
the data, and change the pattern?

An interesting baseline pattern would be independence. We would start
with a table that satisfies independence, and change the values such that
the margins correspond to the observed margins. A table that satisfies
independence is a table with all 1s.

```
```
. mata:
------------------------------------------------- mata (type end to exit) -----
: row = rowsum(data)

: col = colsum(data)

: muhat = J(5,5,1)

: muhat2 = 0:*muhat

: muhat
[symmetric]
1   2   3   4   5
+---------------------+
1 |  1                  |
2 |  1   1              |
3 |  1   1   1          |
4 |  1   1   1   1      |
5 |  1   1   1   1   1  |
+---------------------+

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>         muhat2 = muhat
>         muhat = muhat:/rowsum(muhat):*row
>         muhat = muhat:/colsum(muhat):*col
>         printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mr
> eldif(muhat2,muhat))
>         i = i + 1
> }
iteration 1 relative change .999570562
iteration 2 relative change 3.3466e-16

: muhat
1             2             3             4             5
+-----------------------------------------------------------------------+
1 |  455.7519017    771.183761   718.4874474    224.891861   263.6850288  |
2 |  2751.737858   4656.251665   4338.081975   1357.851598   1592.076904  |
3 |  1656.922177   2803.699713   2612.118086   817.6121932   958.6478308  |
4 |   640.748976   1084.219733   1010.133133   316.1791078   370.7190504  |
5 |  1214.839087   2055.645128   1915.179359     599.46524   702.8711862  |
+-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------

. tab meduc feduc , exp nofreq

male |              female education
education |       low  lower voc  medium vo  higher vo |     Total
------------+--------------------------------------------+----------
low |     455.8      771.2      718.5      224.9 |   2,434.0
lower voc. |   2,751.7    4,656.3    4,338.1    1,357.9 |  14,696.0
medium voc. |   1,656.9    2,803.7    2,612.1      817.6 |   8,849.0
higher voc. |     640.7    1,084.2    1,010.1      316.2 |   3,422.0
university |   1,214.8    2,055.6    1,915.2      599.5 |   6,488.0
------------+--------------------------------------------+----------
Total |   6,720.0   11,371.0   10,594.0    3,316.0 |  35,889.0

|   female
male | education
education | universit |     Total
------------+-----------+----------
low |     263.7 |   2,434.0
lower voc. |   1,592.1 |  14,696.0
medium voc. |     958.6 |   8,849.0
higher voc. |     370.7 |   3,422.0
university |     702.9 |   6,488.0
------------+-----------+----------
Total |   3,888.0 |  35,889.0

```
```    Notice that the second iteration added nothing, so the IPF converged in
one iteration. Also, the estimated counts correspond to the counts we
computed last week.

This is not a coincidence:

In the first step of the first iteration, each cell becomes 1/5 of its
row total

At the beginning of the second step of the first iteration, the
column totals are therefore 1/5th of N

So we get: (rowtotal/5)/(N/5)*coltotal = (rowtotal*coltotal)/N

```
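The derivation above is easy to check numerically. A sketch in Python with NumPy (the 2x2 counts are the top-left corner of the 1940 table, but any positive counts work):

```python
import numpy as np

data = np.array([[146., 81.], [493., 1432.]])  # any table of counts will do
row = data.sum(axis=1)
col = data.sum(axis=0)
n = data.sum()

# one IPF iteration, starting from a table of all 1s
muhat = np.ones_like(data)
muhat *= (row / muhat.sum(axis=1))[:, None]   # each cell becomes rowtotal/2
muhat *= (col / muhat.sum(axis=0))[None, :]   # -> (rowtotal/2)/(n/2)*coltotal

expected = np.outer(row, col) / n             # chi-square expected counts
```

After the row step every column total is n divided by the number of columns, so the column step immediately produces rowtotal*coltotal/N: the expected counts under independence.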

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardization to known margins in the population
-------------------------------------------------------------------------------

Margins in our sample and margins in the population

Samples often deviate from the population because of:
   the way the sample was drawn
   the way the data was collected:
      some people are harder to contact
      some people are less likely to participate

What if we had the marginal distribution of our variables from the
population?

Can't we use the same trick to standardize our table to those population
margins?

This is a classic application of raking, and is often used when computing
(post-stratification) weights.

```
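The recipe worked out on the following slides can be sketched in a few lines. A hypothetical illustration in Python with NumPy (the 2x2 sample counts and population shares are made up; the course's real computation uses Stata and Mata):

```python
import numpy as np

sample = np.array([[146., 81.], [493., 1432.]])   # observed sample counts
# hypothetical population margins, rescaled to the sample size
pop_row = np.array([0.3, 0.7]) * sample.sum()
pop_col = np.array([0.4, 0.6]) * sample.sum()

# rake the sample table to the population margins
muhat = sample.copy()
for _ in range(100):
    muhat *= (pop_row / muhat.sum(axis=1))[:, None]
    muhat *= (pop_col / muhat.sum(axis=0))[None, :]

# every respondent in cell (i,j) gets the same post-stratification weight
weights = muhat / sample
```

Weighting each observation by its cell's adjusted-to-observed ratio makes the weighted margins of the sample match the population margins.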

`-------------------------------------------------------------------------------`

`<< index >>`
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
Standardization to known margins in the population
-------------------------------------------------------------------------------

Computing post-stratification weights for the cohort 1940-1945

Let's get the observed data again and take a look at the population
margins.

```
```
. use homogamy_allbus, clear
(ALLBUS 1980 - 2016)

. tab meduc feduc if inrange(byr,1940,1945), matcell(data1940)

male |              female education
education |       low  lower voc  medium vo  higher vo |     Total
------------+--------------------------------------------+----------
low |       146         81         36          9 |       278
lower voc. |       493      1,432        384         48 |     2,388
medium voc. |        99        306        376         52 |       887
higher voc. |        29         83        119         62 |       338
university |        75        157        312        113 |       955
------------+--------------------------------------------+----------
Total |       842      2,059      1,227        284 |     4,846

|   female
male | education
education | universit |     Total
------------+-----------+----------
low |         6 |       278
lower voc. |        31 |     2,388
medium voc. |        54 |       887
higher voc. |        45 |       338
university |       298 |       955
------------+-----------+----------
Total |       434 |     4,846

. use margins1940, clear

. list

     +-----------------------------------------+
     | female                   ed           p |
     |-----------------------------------------|
  1. |   male                basic   .14092565 |
  2. |   male    vocational, lower   .53389831 |
  3. |   male   vocational, middle   .12625549 |
  4. |   male   vocational, higher   .03011818 |
  5. |   male           university   .16880237 |
     |-----------------------------------------|
  6. | female                basic   .33888268 |
  7. | female    vocational, lower   .40107984 |
  8. | female   vocational, middle   .16634283 |
  9. | female   vocational, higher   .02550338 |
 10. | female           university   .06819128 |
     +-----------------------------------------+

```
```    Let's get the data and the desired margins into Mata and compare them
with the observed margins.

Notice the ' at the end of the line starting with col: the transpose
operator turns the column vector col into a row vector.

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: data1940 = st_matrix("data1940")

: row = st_data((1,5),3)

: col = st_data((6,10),3)'

: n = sum(data1940)

: row = row:*n

: col = col:*n

: row
                 1
    +---------------+
  1 |  682.9257144  |
  2 |  2587.271186  |
  3 |   611.834118  |
  4 |  145.9527007  |
  5 |  818.0162805  |
    +---------------+

: rowsum(data1940)
          1
    +--------+
  1 |   278  |
  2 |  2388  |
  3 |   887  |
  4 |   338  |
  5 |   955  |
    +--------+

: col
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  1642.225466   1943.632883   806.0973572   123.5893736   330.4549203  |
    +-----------------------------------------------------------------------+

: colsum(data1940)
          1      2      3      4      5
    +------------------------------------+
  1 |   842   2059   1227    284    434  |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

```
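The rescaling done above (row = row:*n) just turns the population shares
into target counts: share times sample size. A quick check in Python, using
the first male share and n = 4,846 from the output above:

```python
# population share of 'low' male education (from margins1940) times
# the sample size n; should reproduce the first entry of row above
n = 4846
share = 0.14092565
print(share * n)  # approximately 682.9257
```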
```    Now we can apply the same trick as before.

```
```
. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data1940

: muhat2 = 0:*data1940

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>         muhat2 = muhat
>         muhat = muhat:/rowsum(muhat):*row
>         muhat = muhat:/colsum(muhat):*col
>         printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mr
> eldif(muhat2,muhat))
>         i = i + 1
> }
iteration 1 relative change 3.15360994
iteration 2 relative change .345717331
iteration 3 relative change .06376476
iteration 4 relative change .016214917
iteration 5 relative change .004553063
iteration 6 relative change .001271767
iteration 7 relative change .000354736
iteration 8 relative change .000098909
iteration 9 relative change .000027576
iteration 10 relative change 7.6877e-06
iteration 11 relative change 2.1432e-06
iteration 12 relative change 5.9750e-07
iteration 13 relative change 1.6657e-07
iteration 14 relative change 4.6438e-08
iteration 15 relative change 1.2946e-08
iteration 16 relative change 3.6092e-09

: data1940
          1      2      3      4      5
    +------------------------------------+
  1 |   146     81     36      9      6  |
  2 |   493   1432    384     48     31  |
  3 |    99    306    376     52     54  |
  4 |    29     83    119     62     45  |
  5 |    75    157    312    113    298  |
    +------------------------------------+

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  474.5208782   143.3508106   47.95215625   8.169263599   8.932605961  |
  2 |  875.0072936   1383.950116   279.3181446   23.79271058   25.20292251  |
  3 |  131.9710065   222.1147631   205.4160385   19.35907528   32.97323446  |
  4 |  26.30712883   40.99833415   44.24106626   15.70742786   18.69874349  |
  5 |  134.4191584   153.2188596   229.1699516   56.56089627   244.6474139  |
    +-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------

```
```    So if the margins in our data corresponded with the margins in the
population, then we would expect to find 475 couples in which both
partners have low education, but in our data we found only 146 such
couples.

So a single couple in which both partners have low education stands for
475/146 = 3.3 observations in the table with the population margins.

This 3.3 is our post-stratification weight.

```
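As a quick arithmetic check (the 474.52 and 146 are the two counts just
discussed):

```python
# post-stratification weight = expected count under the population
# margins divided by the observed count
expected = 474.5208782
observed = 146
print(round(expected / observed, 1))  # 3.3
```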
```
. use homogamy_allbus
(ALLBUS 1980 - 2016)

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat:/data1940
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  3.250143001   1.769763094    1.33200434   .9076959554    1.48876766  |
  2 |  1.774862664   .9664456116   .7273910014   .4956814704   .8129975004  |
  3 |   1.33304047    .725865239   .5463192514   .3722899092    .610615453  |
  4 |  .9071423736   .4939558331   .3717736661   .2533456107   .4155276332  |
  5 |  1.792255446   .9759163033   .7345190755   .5005389051   .8209644761  |
    +-----------------------------------------------------------------------+

: st_matrix("weights", muhat:/data1940)

: end
-------------------------------------------------------------------------------

. gen weight = weights[meduc,feduc] if inrange(byr,1940,1945)
(43,748 missing values generated)

. tab meduc feduc if inrange(byr, 1940, 1945) [iweight=weight]

       male |              female education
  education |       low  lower voc  medium vo  higher vo |     Total
------------+--------------------------------------------+----------
        low | 474.52089  143.35081  47.952155  8.1692635 | 682.92572
 lower voc. |875.007285   1,383.95  279.31815 23.7927103 |2,587.2712
medium voc. | 131.97101  222.11476  205.41604  19.359075 | 611.83412
higher voc. |  26.30713  40.998333  44.241066  15.707428 |  145.9527
 university | 134.41916  153.21886  229.16995  56.560894 | 818.01627
------------+--------------------------------------------+----------
      Total | 1,642.225  1,943.633  806.09735  123.58937 |     4,846

            |   female
       male | education
  education | universit |     Total
------------+-----------+----------
        low | 8.9326057 | 682.92572
 lower voc. | 25.202923 |2,587.2712
medium voc. | 32.973233 | 611.83412
higher voc. | 18.698744 |  145.9527
 university | 244.64741 | 818.01627
------------+-----------+----------
      Total | 330.45491 |     4,846

```
```
```

`-------------------------------------------------------------------------------`

`<< index `
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------

stdtable

This method has been implemented in Stata as the stdtable command.

```
```
. stdtable meduc feduc

-----------------------------------------------------------------------------
male        |                        female education
education   |         low   lower voc.  medium voc.  higher voc.   university
------------+----------------------------------------------------------------
        low |        55.3           21         10.6         8.17         4.99
 lower voc. |        27.3         47.3           15         6.73         3.71
medium voc. |        8.44         16.7         41.7         19.6         13.5
higher voc. |        5.38         9.02         18.3           41         26.3
 university |        3.63         5.97         14.4         24.5         51.5
            |
      Total |         100          100          100          100          100
-----------------------------------------------------------------------------

-------------------------
            |   female
male        |  education
education   |       Total
------------+------------
        low |         100
 lower voc. |         100
medium voc. |         100
higher voc. |         100
 university |         100
            |
      Total |         500
-------------------------

```
```
```

`-------------------------------------------------------------------------------`

`<< index `
```-------------------------------------------------------------------------------
```
```-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Can all tables be standardized?

Consider the following table

0 0 2
1 5 2
8 7 0

In order to make the first row total 100, the top-right cell must become
100, as it is the only non-zero cell in that row.

In order to make the last column total 100, the top-right cell cannot be
100, because the cell below it is also positive.

This is an example of a table that cannot be standardized. The Mata
program we created above will stop after 30 iterations, but the relative
change mreldif(muhat2,muhat) will still be larger than 1e-8. In other
words, the algorithm has not converged. The stdtable command will give a
more explicit warning when that happens.

```
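The failure to converge can also be seen numerically. Here is a sketch in
plain Python that mirrors the Mata loop above, with targets of 100 for
every row and column margin:

```python
# Rake the pathological table toward margins of 100 everywhere and
# watch the stopping criterion: it is still far from met after 30 passes.
table = [[0.0, 0.0, 2.0], [1.0, 5.0, 2.0], [8.0, 7.0, 0.0]]
for _ in range(30):
    old = [row[:] for row in table]
    for i in range(3):                      # rows to 100
        s = sum(table[i])
        table[i] = [x / s * 100 for x in table[i]]
    for j in range(3):                      # columns to 100
        s = sum(row[j] for row in table)
        for row in table:
            row[j] *= 100 / s
    # Stata's mreldif(): max |new - old| / (|old| + 1)
    change = max(abs(a - b) / (abs(b) + 1)
                 for ro, rn in zip(old, table) for b, a in zip(ro, rn))
# the top-right cell would have to be 100 to satisfy the first row,
# but the column step always pushes it back below 100
print(table[0][2] < 100, change > 1e-8)  # prints: True True
```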

`-------------------------------------------------------------------------------`

`index`
```-------------------------------------------------------------------------------
```
`raking_sol1.do`

````use place.dta, clear`
`tab place15 place30, matcell(data)`
`mata`
`data = st_matrix("data")`
`muhat = data`
`muhat2 = 0:*data`
`i = 1`
`while(i<30 & mreldif(muhat2,muhat)>1e-8) {`
`	muhat2 = muhat`
`	muhat = muhat:/rowsum(muhat):*100`
`	muhat = muhat:/colsum(muhat):*100`
`	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))`
`	i = i + 1`
`}`
`muhat`
`data`
`end`
`stdtable place15 place30`
```

`-------------------------------------------------------------------------------`
`<<`
`-------------------------------------------------------------------------------`
`raking_sol2.do`

````use place.dta, clear`
`tab place15 place30 if coh==30, matcell(data30)`
`tab place15 place30 if coh==40, matcell(data40)`
`tab place15 place30 if coh==50, matcell(data50)`
`mata`
`data30 = st_matrix("data30")`
`data40 = st_matrix("data40")`
`data50 = st_matrix("data50")`
`row = rowsum(data50)`
`col = colsum(data50)`
`muhat = data30`
`muhat2 = 0:*data30`
`i = 1`
`while(i<30 & mreldif(muhat2,muhat)>1e-8) {`
`	muhat2 = muhat`
`	muhat = muhat:/rowsum(muhat):*row`
`	muhat = muhat:/colsum(muhat):*col`
`	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))`
`	i = i + 1`
`}`
`muhat30 = muhat`
`muhat = data40`
`muhat2 = 0:*data40`
`i = 1`
`while(i<30 & mreldif(muhat2,muhat)>1e-8) {`
`	muhat2 = muhat`
`	muhat = muhat:/rowsum(muhat):*row`
`	muhat = muhat:/colsum(muhat):*col`
`	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))`
`	i = i + 1`
`}`
`muhat40 = muhat`
`muhat30`
`muhat40`
`data50`
`end`
`stdtable place15 place30, by(coh,baseline(50))`
```

`-------------------------------------------------------------------------------`
`<<`
`-------------------------------------------------------------------------------`