Age data frequently display excess frequencies at attractive numbers, such as multiples of five. We use this “age heaping” to measure cognitive ability in quantitative reasoning, or “numeracy”. We construct a database of age-heaping-based estimates of basic numeracy with exceptional geographic and temporal coverage.


Baten, Joerg, University of Tuebingen; with the help of several coauthors, without implying any responsibility on their part for potential mistakes (Valeria Prayon, Dorothee Crayen, Dácil Juif, Ralph Hippe, Christina Mumme, and many others)

Production date



Numeracy estimates (ABCC) of both genders in percent


Numeracy, education

Time period

1500-1970. All data refer to birth-decade averages (1810 means 1810-19, etc.).

Geographical coverage


Methodologies used for data collection and processing

Reconstruction of basic numeracy by birth decade using a variety of sources. See also the text below. The ABCC is a linear transformation of the Whipple index of age heaping.

Period of collection

See references

Data collectors

Joerg Baten, Valeria Prayon, Dorothee Crayen, Dácil Juif, Ralph Hippe, Christina Mumme, and many colleagues from around the world

As good as possible, but counter-checking and improvements are welcome. Interpretations at the level of individual countries should be made only after careful checking. In the ClioInfra quality coding, none of the ABCC estimates obtains a 1 (“official governmental statistic”) or a 4 (“conjecture, guesstimate”). All ABCC estimates that are based on UN Demographic Yearbooks, or on a source whose title in the bibliography/list of references contains the word “Census”, should obtain a 2, because those ABCC values are calculated from official statistical data, but by the authors rather than by the government. All other estimates should obtain a 3.
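The quality-coding rule above is mechanical enough to express as a small function. This is an illustrative sketch only: the name `clio_quality_code` is hypothetical, it assumes the “Census” test is case-insensitive and also matches plurals such as “Censuses”, and it takes the UN Demographic Yearbook origin as an explicit flag rather than guessing it from the title.

```python
import re


def clio_quality_code(reference_title: str, from_un_demographic_yearbook: bool = False) -> int:
    """Hypothetical helper applying the ClioInfra quality rule to one ABCC value.

    Codes 1 (official governmental statistic) and 4 (conjecture) never apply to ABCC values.
    """
    # Assumption: "contains the word Census" is read case-insensitively and as a word prefix,
    # so "Census", "census" and "Censuses" all match.
    mentions_census = re.search(r"\bcensus", reference_title, flags=re.IGNORECASE) is not None
    if from_un_demographic_yearbook or mentions_census:
        return 2  # official statistical data, but the ABCC was computed by the authors
    return 3  # all other estimates
```

For example, `clio_quality_code("Census of India, 1901")` returns 2, while a source titled after parish or tax registers returns 3.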

General references

None of the Whipple estimates from the modestly sized literature entered the data set unchanged. Three general references that should be cited (because they reflect most of the literature) are:

The following is an excerpt from the papers by A’Hearn et al. (2009), on the pre-1800 estimates (especially those referring to half centuries), and Prayon/Baten (2013), on the post-1800 estimates. For the citation of A’Hearn et al., see above.

On the pre-1800 part

As signature ability can proxy for literacy, so accuracy of age reporting can proxy for numeracy, and for human capital more generally. A society in which individuals know their age only approximately is a society in which life is governed not by the calendar and the clock but by the seasonal cycle; in which birth dates are not recorded by families or authorities; in which few individuals must document their age in connection with privileges (voting, office-holding, marriage, holy orders) or obligations (military service, taxation); and in which individuals who do know their birth year struggle to calculate their age accurately from the current year. Approximation in age awareness manifests itself in the phenomenon of “heaping” in self-reported age data. Individuals lacking certain knowledge of their age rarely state this openly, but instead choose a figure they deem plausible. They do not choose randomly, but have a systematic tendency to prefer “attractive” numbers, such as those ending in 5 or 0, even numbers, or, in some societies, numbers with other specific terminal digits. Such “age heaping” can be assessed in a wide range of sources: census returns, tombstones, necrologies, muster lists, legal records, or tax data, for example. While care must be exercised in ascertaining possible biases, such data are much more widely available than signature rates and other proxies for human capital.

Age heaping is a well-known phenomenon among demographers, development economists, and anthropologists. Half a century ago, influential studies by Roberto Bachi and Robert Myers already investigated age heaping and its inverse correlation with education levels within and across countries. Myers later demonstrated a similar inverse correlation between age heaping and income at the individual level. For others, including epidemiologists, age heaping is a problem to be solved, a source of distortion in age-specific vital rates. Zelnik, for example, assessed age misreporting in the United States between the 1880 and 1950 censuses.

Meanwhile, historians have studied age heaping as a topic of interest in its own right. In their landmark study of Florentine tax records from the fourteenth and fifteenth centuries, David Herlihy and Christiane Klapisch-Zuber document marked heaping on even numbers for children and on multiples of five for adults, to a degree similar to that reported for Egyptian census data in 1947. Age heaping diminished substantially over successive tax enumerations from 1371 to 1470, and was more prevalent among women, in rural areas and small towns, and among the poor. Daniel Kaiser and Peyton Engel report similar age-heaping levels for early modern Russia. A well-known study is Richard Duncan-Jones’ analysis of grave monument inscriptions in twelve provinces of the Roman Empire. He finds age heaping on multiples of five at rates not dissimilar to those for medieval Tuscany or developing countries of the 1950s and ’60s, and higher for women than for men.

There has been little use of age heaping as an indicator of human capital in the economic history literature. Joel Mokyr tests for positive selection, or “brain drain”, in pre-famine Irish emigration by comparing age heaping among migrants with that in the population at large. Developing original measures of age heaping along the way, he finds no support for the conventional wisdom that the best and brightest emigrated. In other studies of Ireland, John Budd and Timothy Guinnane report considerable heaping on multiples of five in the 1901 and 1911 censuses among the illiterate, the poor, and the aged; Cormac O’Grada reports the same among Dublin’s immigrant Jewish population. O’Grada interprets age heaping as confirming that low Jewish literacy rates did not refer only to the English language and, consequently, that their lower mortality rate was the result of religious practices rather than education. For Britain, Jason Long has assessed age heaping in linked samples from the censuses of 1851 and 1881. A quarter of his 1851 school-aged children reported ages in 1881 that differed by two to five years from the expected 30-year increment. Countywide age heaping had a limited impact on individual labor market outcomes once other county characteristics were controlled for, but individual age discrepancies had a significant impact on socio-economic status, wages (10% higher for individuals with no discrepancy), and the probability of rural-urban migration.


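To deploy age heaping as a useful indicator of human capital, we require a measure that allows us to track its variation over time and across groups. We propose a variant of the well-known Whipple Index, which is simple, robust, and easy to interpret. The Whipple Index is the ratio of the observed frequency of ages ending in 0 or 5 to the frequency predicted by assuming a uniform distribution of terminal digits (in other words, one fifth):

$$W = 100 \times \frac{\sum_{\substack{i=a \\ i \bmod 5 = 0}}^{b} n_i}{\frac{1}{5} \sum_{i=a}^{b} n_i} \tag{1}$$

where $n_i$ is the number of individuals reporting age $i$ and the interval from $a$ to $b$ spans a multiple of ten years ($b - a + 1 = 10k$).
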
An index value of 500 would indicate perfect heaping on multiples of five; a value of 100, no heaping at all; and a value of 0, perfect “anti-heaping”. The notation in Equation 1 is meant to emphasize that W must be defined over an interval in which each terminal digit occurs an equal number of times, for example 30-39 or 23-72. The prediction of equal terminal-digit frequencies is what makes the Whipple Index easy to calculate, but it is also a source of inaccuracy. In a typical population, frequencies decrease with age; in the interval 50-54 one would expect fewer 54-year-olds than 50-year-olds, even in the absence of heaping. Restricting attention to intervals of (multiples of) ten years helps mitigate this problem. A more obvious limitation of the Whipple Index is that it can capture only heaping on multiples of five. In practice, this is the overwhelmingly dominant form of heaping observed for adults across a wide range of times and places in our data. (Among children and adolescents, even-heaping is common.)
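A minimal Python sketch of the Whipple Index as defined above, computed from a list of reported ages over a closed interval [lo, hi]. The function name and the choice to raise an error when the interval length is not a multiple of ten are illustrative assumptions, not part of the original method description.

```python
from collections import Counter
from typing import Iterable


def whipple_index(ages: Iterable[int], lo: int, hi: int) -> float:
    """Whipple Index over the closed age interval [lo, hi].

    100 = no heaping, 500 = every age ends in 0 or 5, 0 = no age ends in 0 or 5.
    """
    # Each terminal digit must occur equally often in the interval (e.g. 30-39, 23-72).
    if (hi - lo + 1) % 10 != 0:
        raise ValueError("interval length must be a multiple of ten years")
    counts = Counter(a for a in ages if lo <= a <= hi)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reported ages fall inside the interval")
    heaped = sum(n for age, n in counts.items() if age % 5 == 0)
    # Observed count on multiples of five relative to the one fifth expected without heaping.
    return 100.0 * heaped / (total / 5)
```

For example, `whipple_index(range(23, 73), 23, 72)` returns 100.0 (each age reported once, no heaping), while a sample in which everyone reports age 30 gives 500.0.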

In a separate study, we compare the statistical properties of the Whipple Index with alternatives, including measures proposed by Bachi, Myers, and Mokyr. In simulation studies, the Whipple Index demonstrates several advantages. First, its mean is not scale-dependent, meaning that W can be compared across samples of widely varying size. Second, E(W) increases linearly with heaping, again facilitating comparisons. Finally, the coefficient of variation of W across random samples is systematically lower than for the alternatives, at all sample sizes and for all degrees of heaping. This leads to greater reliability in correctly ranking samples according to the true extent of heaping in the underlying populations. In this paper we employ a simple transformation of the Whipple Index that can be interpreted as the share of individuals who correctly report their age:

$$ABCC = \begin{cases} \left(1 - \dfrac{W - 100}{400}\right) \times 100 & \text{if } W \ge 100, \\ 100 & \text{otherwise.} \end{cases}$$

Note: in later publications this index was named the ‘ABCC index’.
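The transformation can be sketched in Python together with a small Monte Carlo check. The simulation rests on a deliberately simple, hypothetical population of our own choosing: a share of respondents reports its true age, assumed uniform over 23-62, while everyone else reports a multiple of five picked at random. Under that assumption the expected Whipple Index is 100 + 400 × (share of heapers), so the expected ABCC equals the share of accurate reporters in percent, whatever the sample size. This illustrates, but does not reproduce, the simulation results mentioned above; the function names are illustrative.

```python
import random


def abcc(w: float) -> float:
    """ABCC index: (1 - (W - 100) / 400) * 100 for W >= 100, else 100."""
    return (1 - (w - 100) / 400) * 100 if w >= 100 else 100.0


def simulated_mean_abcc(share_accurate: float, n: int, reps: int, seed: int = 0) -> float:
    """Mean ABCC over `reps` samples of size `n` from a hypothetical population.

    A share `share_accurate` reports true ages (assumed uniform on 23-62); everyone else
    reports a multiple of five drawn uniformly from 25, 30, ..., 60.
    """
    rng = random.Random(seed)
    true_ages = range(23, 63)
    round_ages = range(25, 61, 5)
    total = 0.0
    for _ in range(reps):
        heaped = 0
        for _ in range(n):
            accurate = rng.random() < share_accurate
            age = rng.choice(true_ages) if accurate else rng.choice(round_ages)
            heaped += age % 5 == 0  # count reported ages ending in 0 or 5
        w = 100.0 * heaped / (n / 5)  # Whipple Index of this sample
        total += abcc(w)
    return total / reps
```

With half the population reporting accurately, `simulated_mean_abcc(0.5, 50, 400)` and `simulated_mean_abcc(0.5, 1000, 100)` should both come out close to 50, the small samples simply scattering more widely around that value.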



On the post-1800 part

Based on the assumption that basic numerical skills are acquired during the first decade of life, we calculate the ABCC index for birth cohorts. Because mortality increases with age, the younger ages within a ten-year interval are more frequent than the older ones even without heaping. With conventional age groups such as 20-29, the ages ending in 0 sit at the lower, more populous end of the interval, which inflates the share of multiples of five and leads to an underestimation of the ABCC index. To overcome this problem, we spread the terminal digits 0 and 5 more evenly across the age ranges and define the age groups 23-32, 33-42, …, 73-82. In a second step, the age groups are assigned to the corresponding birth decades. Where data overlap for one or several birth decades within a country because more than one census was available for this country, we calculated the arithmetic average of the indices. In the entire data set, the birth decades range from the 1680s to the 1970s for some countries, whereas for the majority of countries data are available only for the birth decades from the 1870s to the 1940s.
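The grouping and averaging steps can be sketched as follows. Two parts are assumptions made for illustration: counts are passed per census year as a mapping from reported age to number of people, and each age group is assigned to the birth decade containing its midpoint birth year (census year minus the group’s mid-age), whereas the text states only that groups are assigned to the corresponding birth decades. The first check, on a hypothetical population whose size declines linearly with age and shows no heaping, illustrates why the 23-32 grouping is used: a 20-29 interval yields W above 100 purely from the age structure, while 23-32 yields exactly 100.

```python
from collections import defaultdict
from statistics import mean

# Age groups used for the post-1800 estimates: 23-32, 33-42, ..., 73-82.
AGE_GROUPS = [(lo, lo + 9) for lo in range(23, 83, 10)]


def whipple(counts: dict[int, int], lo: int, hi: int) -> float:
    """Whipple Index for ages lo..hi, given {reported age: number of people}."""
    total = sum(counts.get(a, 0) for a in range(lo, hi + 1))
    heaped = sum(counts.get(a, 0) for a in range(lo, hi + 1) if a % 5 == 0)
    return 100.0 * heaped / (total / 5)


def abcc(w: float) -> float:
    return (1 - (w - 100) / 400) * 100 if w >= 100 else 100.0


def birth_decade(census_year: int, lo: int, hi: int) -> int:
    # Assumption: use the birth year implied by the group's mid-age.
    mid_birth_year = census_year - (lo + hi) / 2
    return int(mid_birth_year // 10) * 10


def abcc_by_birth_decade(censuses: dict[int, dict[int, int]]) -> dict[int, float]:
    """censuses maps census year -> {reported age: number of people}."""
    estimates: dict[int, list[float]] = defaultdict(list)
    for year, counts in censuses.items():
        for lo, hi in AGE_GROUPS:
            if sum(counts.get(a, 0) for a in range(lo, hi + 1)) == 0:
                continue  # no observations in this age group
            estimates[birth_decade(year, lo, hi)].append(abcc(whipple(counts, lo, hi)))
    # Birth decades covered by several censuses get the arithmetic mean of their indices.
    return {decade: mean(vals) for decade, vals in sorted(estimates.items())}


# Hypothetical population: no heaping, size falling linearly with age.
declining = {age: 1000 - 10 * (age - 20) for age in range(20, 90)}
print(whipple(declining, 20, 29))  # above 100: age structure alone inflates W
print(whipple(declining, 23, 32))  # 100.0
print(abcc_by_birth_decade({1900: declining, 1910: declining}))
```

Under the midpoint rule, the 23-32 group of a 1900 census and the 33-42 group of a 1910 census both fall in the 1870s birth decade, so that decade receives the average of two indices.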


A major advantage of the age-heaping method is its consistent calculation. As a result, age-heaping results might be more easily comparable across countries, whereas comparisons of literacy or enrolment rates might be misleading owing to significant measurement differences or different school systems. Further, given the usually high drop-out rates in developing countries and heterogeneous teacher quality, enrolment rates are arguably less informative for our purpose, because enrolment ratios are an input measure of human capital: even though a country might have high enrolment ratios, they do not permit conclusions about the quality of education. Age heaping, like literacy, is instead an output measure of human capital.

Recently, several studies have confirmed a positive correlation between numeracy measured by age heaping and other human capital indicators. In their global study on age heaping for the period 1880 to 1940, Crayen and Baten (2010a) identified primary school enrolment as a main determinant of age heaping: an increase in enrolment rates led to a significant decrease in the age-heaping level. A’Hearn, Baten, and Crayen (2009) used a large U.S. census sample to perform a very detailed analysis of the correlation between regional numeracy and literacy. Based on a sample of 650,000 individuals from the 1850, 1870, and 1900 IPUMS U.S. censuses, they found, for the overall sample as well as for subsamples, a positive and statistically significant relationship between these two human capital indicators. They also went further back in time and studied the relationship between signature ability as a proxy for literacy and age heaping as a proxy for numeracy in early modern Europe. Here as well they found a positive correlation between the two measures. In a study on China, Baten et al. (2010) found a strong relationship between age heaping and literacy among Chinese immigrants in the US born in the 19th century. Additionally, Hippe (2011) systematically examined the relationship between numeracy and literacy at the regional level in seven European countries in the 19th century and in ten developing countries in the 20th century. He found a high correlation between the two indicators for each country separately.

Possible objections to the age-heaping method should be addressed here. One concerns the uncertainty about what is actually being measured: is it the age awareness of the respondent during the interview or the diligence of the reporting personnel? The other possible objection relates to other forms of age heaping, i.e., patterns other than heaping on multiples of five. Concerning the first objection, Crayen and Baten (2010b) acknowledge that a potential bias always exists if more than one person is involved in the creation of a historical source. For example, if literacy is measured by analysing the share of signatures in marriage contracts, some priests might have been more interested than others in obtaining real signatures, as opposed to just crosses or other symbols (Crayen and Baten 2010b:460). They argue, however, that the empirical findings of previous age-heaping studies, namely that there is generally less numeracy among the lower social strata and that regional differences in age heaping and illiteracy are similar, support their assumption that the age awareness of the respondent is captured and that the bias from meticulous or inaccurate reporting is negligible. A study by Scott and Sabagh (1970) supports the assumption that it makes no difference whether the individual or the reporting personnel reports a rounded age if the true age is unknown. They investigated the behaviour of canvassers during the Moroccan Multi-Purpose Sample Survey of 1961-1963 and found that the canvassers did indeed report rounded ages for people who did not know their own age. The interesting feature in this context is that between 70 and 90 per cent (depending on the age group) of the interviewed people did not know their age, and for them the historical calendar method was applied. Expressed in ABCC values, this would imply an overall numeracy level somewhere between 10 and 30 ABCC points. And indeed, this fits well with the age-heaping level observed in Morocco in the census of 1960, namely an ABCC level between 20 and 40.
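The step from “70 to 90 per cent did not know their age” to “10 to 30 ABCC points” can be made explicit under a simplifying assumption of our own: everyone who does not know their age (share $p$) reports a multiple of five, while everyone else reports their true age, with terminal digits uniformly distributed. The share of reported ages ending in 0 or 5 is then $p + (1 - p)/5$, so

$$W = 100 \times \frac{p + \frac{1-p}{5}}{\frac{1}{5}} = 100 + 400\,p, \qquad ABCC = \left(1 - \frac{W - 100}{400}\right) \times 100 = 100\,(1 - p).$$

With $p$ between 0.7 and 0.9, this gives ABCC values between 30 and 10.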

To address the second objection, namely different heaping patterns, we exclude all individuals younger than 23 and older than 82 from our study to minimise possible biases due to age effects. The very old are dropped because mortality effects might distort the age-heaping indices. Among teenagers and young adults, we often find heaping on multiples of two instead of multiples of five, indicating more precise age awareness than among older age groups, which heap on multiples of five. The reason is probably that many important life events, such as marriage, military recruitment, and reaching legal age, happen during the late teens and early twenties; such occasions might increase age awareness. Further, special cultural number preferences, like the dragon year or the number eight in Chinese culture, do not seem to influence the index much, as Baten et al. (2010) found in a study on China.

Crayen and Baten (2010a) also examined whether the degree of bureaucracy in a country could account for lower age-heaping values: if the government interacts with its citizens more regularly, the age awareness of the population might be higher than in countries without well-developed institutions, independently of individual educational attainment. To test for this possible bureaucratic factor, they included two explanatory variables, one measuring ‘state antiquity’ and one counting the number of censuses performed in each country up to the period under study. In all specifications, these variables showed no significant influence on the age-heaping level of the countries, leading to the conclusion that this ‘bureaucratic factor’ does not play an important role. That countries with an early introduction of birth registers and a high number of censuses show higher age awareness can be explained by the fact that these countries also introduced schooling relatively early. Again, schooling outweighs the independent bureaucratic effect. Related to this is the question of cultural differences in age awareness. However, the analysis showed that only the East Asian region had systematically less age heaping than the other regions under study. This finding might be due to the importance of the Chinese astrological calendar in daily life, which requires greater numerical ability in the population. In conclusion, the correlation between age heaping and other human capital indicators is quite well established, and the ‘bureaucratic’ factor does not invalidate this relationship (Crayen and Baten 2010b:458).

Additionally, could it be a problem that we construct our trends from different census years? Crayen and Baten (2010a) examined the possible correlation between age and age heaping and found a systematic influence of age on heaping behaviour only for the youngest age group, 23 to 32: people of this age tend to heap their age less than the older age groups. Based on this observation, Crayen and Baten suggested an adjustment of the numeracy index for the youngest birth cohort, which we applied in this study as well.
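The adjustment for the youngest age group, stated in the note at the end of this section as (W − 100) × 0.25 + W, can be sketched as follows; the function names are illustrative. It raises the excess of W over 100 by a quarter before the conversion to ABCC, so a raw W of 180 for ages 23-32 is treated as 200, which lowers the ABCC from 80 to 75.

```python
def adjust_youngest_whipple(w: float) -> float:
    """Adjusted Whipple Index for the 23-32 age group: (W - 100) * 0.25 + W."""
    return (w - 100) * 0.25 + w


def abcc(w: float) -> float:
    """ABCC index from a Whipple Index value."""
    return (1 - (w - 100) / 400) * 100 if w >= 100 else 100.0


w_young = 180.0  # hypothetical raw Whipple Index for ages 23-32
print(abcc(w_young))                           # unadjusted
print(abcc(adjust_youngest_whipple(w_young)))  # adjusted: 75.0
```

A Whipple Index of exactly 100 is left unchanged by the adjustment, so a cohort with no heaping keeps its ABCC of 100.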

Bachi, Tendency; Myers, Instance and Accuracy.

Zelnik, Age Heaping.

For discussion of age heaping as a problem, see Vallin et al., New estimate; Crockett and Crockett, Consequences, discuss the issues for historical research. See also U.N. Statistics Division, Nonsampling


Herlihy and Klapisch-Zuber, Toscans.

Kaiser and Engel, Time.

Duncan-Jones, Structure. For a study of contemporary China, see Jowett and Li, Age Heaping.

Mokyr, Why Ireland Starved.

Budd and Guinnane, Intentional; O’Grada, Dublin.

A’Hearn et al., Quantifying.

Manzel and Baten (2009) found the same strong relationship between literacy and numeracy in a study on the regions of Argentina.










The adjusted Whipple index for the youngest age group (23-32) is $(W - 100) \times 0.25 + W$. For more details, see the Appendix in Crayen and Baten (2010a).


