Census data user guide

This section answers some of the questions raised by data users when using census data.

Is census data different from survey data?

A census differs from other sources of data (such as sample surveys) because it includes information from every person in the country and is therefore not subject to sampling error.

Compared with sample surveys, which mostly provide data at national or regional levels, census data is available for much smaller geographic areas (down to the meshblock and area unit levels – see Geographic definitions on the Statistics NZ website) and for other small population groups (eg ethnic groups) in New Zealand.

The census also provides good contextual information for individuals, families, and households.

However, because the census is self-administered, is costly, and includes a wide range of topics, its questions may lack the depth that a sample survey can collect. Sample surveys, in turn, cover only a small proportion of the population.

Users are advised to fully understand the strengths and limitations of data from individual sample surveys compared with census data before deciding which to use.

To what geographic levels can I get census data?

Census data is available in two main ways:

  • as standard published outputs available from the Statistics NZ website
  • as customised data available on request from Statistics NZ's Client Services team.

The geographic levels available from these two sources are summarised in table 3.

Table 3

Geographic levels available from census data
Geographic level                  Standard published output   Customised request
Meshblock(1)                      X                           X
Area unit                         X                           X
Ward                              X                           X
Territorial authority area        X                           X
Regional council area             X                           X
Urban area                                                    X
Statistical area                                              X
Regional council constituency                                 X
Community board                                               X
General electoral district        X                           X
Māori electoral district          X                           X
User defined(2)                                               X
1. Since the 2006 Census, the Meshblock Dataset has been freely available for selected variables from the 2006, 2001, and 1996 Censuses, rebased to 2006 Census boundaries. These counts are at the highest level of each variable's classification. The Meshblock Dataset also contains counts for area units, wards, territorial authority areas, and regional council areas.
2. Examples include district health boards, police districts, a radius from a specific point, or any combination of standard geographies.

All statistical information published by Statistics NZ is presented in a way that prevents any particulars about a person, dwelling, or household from being identified.

Other than the standard published products and customised requests, anonymised microdata from census can also be accessed by researchers from Statistics NZ’s Data Laboratory (also known as Data Lab) facilities in Statistics NZ's offices in Auckland and Wellington. Access to the Data Lab is based on a rigorous application process and follows strict eligibility criteria based on the requirements of the Statistics Act 1975. The Data Lab operates on a cost-recovery basis.

Which population count should I use?

For some output variables, data about individuals/people can be reported in two ways:

  • census usually resident population count
  • census night population count.

The census usually resident population count is used most often. This is the count of all people who usually live in an area of New Zealand and are present in New Zealand on a given census night. This count excludes visitors from overseas and residents who are temporarily overseas on census night. New Zealand residents who are away from their usual address on census night are allocated back to the area where they usually live and form part of the census usually resident population count of that area.

The census night population count is a count of all people present in a given area of New Zealand on census night. This count includes visitors from overseas who are in New Zealand on census night, but excludes New Zealand residents who are temporarily overseas on census night.
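The difference between the two counts can be sketched in a few lines of Python. This is a minimal illustration only: the Person record, its field names, and the five example people are hypothetical and are not part of any Statistics NZ product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Area of usual residence in New Zealand; None for a visitor from overseas.
    usual_area: Optional[str]
    # Area where the person was on census night; None if they were overseas.
    census_night_area: Optional[str]


def usually_resident_count(people, area):
    """People who usually live in the area and were in New Zealand on census night."""
    return sum(
        1 for p in people
        if p.usual_area == area and p.census_night_area is not None
    )


def census_night_count(people, area):
    """Everyone present in the area on census night, including overseas visitors."""
    return sum(1 for p in people if p.census_night_area == area)


# Hypothetical records for illustration.
people = [
    Person("Wellington", "Wellington"),  # resident at home
    Person("Wellington", "Auckland"),    # resident elsewhere in New Zealand
    Person("Wellington", None),          # resident temporarily overseas
    Person(None, "Wellington"),          # visitor from overseas
    Person("Auckland", "Wellington"),    # Auckland resident visiting Wellington
]

print(usually_resident_count(people, "Wellington"))  # 2
print(census_night_count(people, "Wellington"))      # 3
```

The resident who is elsewhere in New Zealand is allocated back to Wellington in the usually resident count, while the resident who is overseas is left out of both counts.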

Is census count different from population estimates and projections?

The census usually takes place every five years. Between censuses, Statistics NZ provides population estimates (see Statistics NZ's estimates and projections web page) by age and sex to show the changes since the last census. The estimates are derived from the latest census data with adjustments for net census undercount; residents temporarily overseas on census night; and births, deaths, and migration since the last census. Because of the adjustment for net census undercount and residents who are temporarily overseas at the time of the census, population estimates are usually higher than the census usually resident population count.
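The adjustments described above can be read as simple accounting. The Python sketch below is a simplified illustration and not the estimation method Statistics NZ actually uses (which works by age and sex); the function name and every figure are hypothetical.

```python
def estimated_population(census_count, net_undercount, residents_overseas,
                         births, deaths, net_migration):
    """Simplified accounting: census count plus the adjustments listed in the text."""
    base = census_count + net_undercount + residents_overseas
    return base + births - deaths + net_migration


# All figures below are invented for illustration.
estimate = estimated_population(
    census_count=4_000_000,
    net_undercount=80_000,
    residents_overseas=60_000,
    births=15_000,
    deaths=7_000,
    net_migration=2_000,
)
print(estimate)  # 4150000
```

Because the undercount and temporarily overseas adjustments are both added to the census count, the estimate in this example is higher than the census usually resident population count, as the text describes.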

Population estimates are provided at the national and subnational levels by age and sex. National population estimates are produced quarterly (reference dates at 31 March, 30 June, 30 September, and 31 December) and provisional results are available within six weeks of the reference date. Subnational population estimates are produced annually (reference date at 30 June) and published in late October each year.

For those interested in planning, Statistics NZ also provides population projections every two to three years. These projections give an indication of the future size and composition of the population. Population projections are available at the national and subnational levels by age, sex, and ethnicity. Multiple projection series are produced by Statistics NZ using different combinations of assumptions about future fertility (births), mortality (deaths), and migration. Population projections use population estimates as a starting point and are an indication of future demographic change based on assumptions about future demographic behaviour.

Further details on population estimates and projections are available on the Statistics NZ website.

Why is the subject population important?

The subject population is the individuals, families, households, or dwellings to which the variable applies.

When interpreting census data, it is important for users to know what subject population the data is based on so that any inferences drawn from that data are restricted only to that population group and not generalised outside that population group. Census variables by subject population lists the subject population(s) for each census variable.

What is the difference between a household and a dwelling?

A dwelling is any building or structure used, or intended to be used, for human habitation. Dwellings can be private (eg houses, flats, or apartments) or non-private (eg hotels, hospitals, prisons). There can be more than one dwelling within a building. For example, in an apartment building each separate apartment or unit is considered a dwelling.

A household comprises either one person who usually resides alone, or two or more people who usually reside together and share facilities (such as those for eating, cooking, bathing; the toilet and a living area) in a private dwelling. There can be more than one family in a household or no families in a household.

Data on families and households collected in the census relates to families and households in private occupied dwellings. No family and household data is collected for non-private dwellings.

How are occupied and unoccupied dwellings defined?

A dwelling is defined as occupied if it is occupied at midnight on the night of the census, or occupied at any time during the 12 hours after midnight on the night of the census, unless the occupant(s) has completed a form at another dwelling during this period.

The count of occupied dwellings includes private occupied dwellings (such as houses, flats, and apartments) and non-private occupied dwellings (such as hotels and hospitals).

A dwelling is defined as unoccupied if it is unoccupied at all times during the 12 hours after midnight on the night of the census, and suitable for habitation. 'Dwellings under construction' includes all houses, flats, groups, or blocks of flats being built.

What is an absentee?

An 'absentee' is a person who was identified on the census dwelling form as usually living in a particular dwelling, but who did not complete a census individual form at that dwelling because they were elsewhere in New Zealand or overseas at the time of the census. Absentees in the census include children away at boarding school and people away on business, on holiday, in hospital, and so on. Long-term hospital patients, and university and other tertiary students who live away from the dwelling for most of the year, are excluded. Information on absentees is used when determining the composition of a household.

Absentees who were elsewhere in New Zealand at the time of the census will have completed an individual form at the place where they were on census night. These absentees are included in the census data.

Absentees who were overseas at the time of the census will not have filled in an individual form and so are not included in the census data. These absentees are also considered when determining household composition.

What is non-response?

'Non-response' is when an individual gives no response at all to a particular census question that was relevant to them. Depending on the variable concerned, the response may either be imputed or classified as ‘not stated’.

Response rates to specific questions may vary. Where non-response rates are available, these will be provided in the relevant Information by Variable product (available from the Statistics NZ website). Unless otherwise stated, non-response rates are calculated by counting the number of responses set to ‘not stated’ (for the 1981–96 Censuses these were called ‘not specified’) as a proportion of the subject population for that question.
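As a minimal sketch of that calculation in Python (the function name and the example counts are hypothetical):

```python
def non_response_rate(not_stated, subject_population):
    """'Not stated' responses as a percentage of the subject population."""
    if subject_population <= 0:
        raise ValueError("subject population must be positive")
    return round(100 * not_stated / subject_population, 1)


# Hypothetical question: 4,200 'not stated' in a subject population of 120,000.
print(non_response_rate(4_200, 120_000))  # 3.5
```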

High non-response rates may reduce the usefulness of data, so such data may need to be interpreted with caution. While there is no standard scale for determining what constitutes a low, moderate, or high non-response rate, a scale was developed in A guide to using data from the New Zealand census: 1981–2006 (Errington, Cotterell, van Randow, & Milligan, 2008). This scale is shown in table 4, together with information about the likely effects of different non-response rates on data quality.

Table 4
Effects of different non-response rates on data quality

Non-response rate   Description of non-response level   Likely effect on data quality
<3.0%               Low                                 Little or none
3.0–4.9%            Relatively low                      Low
5.0–6.9%            Moderate                            Some reduction in data quality may have occurred
7.0–8.9%            Relatively high                     Some reduction in data quality is likely
9.0%+               High                                High – data quality will have been reduced
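The scale in table 4 can be applied mechanically. The Python sketch below assumes the rate has already been rounded to one decimal place; the function name is hypothetical, and the band boundaries are taken from the table.

```python
def describe_non_response(rate_percent):
    """Return the table 4 description for a non-response rate given in percent."""
    if rate_percent < 0:
        raise ValueError("rate cannot be negative")
    if rate_percent < 3.0:
        return "Low"
    if rate_percent < 5.0:
        return "Relatively low"
    if rate_percent < 7.0:
        return "Moderate"
    if rate_percent < 9.0:
        return "Relatively high"
    return "High"


for rate in (2.9, 3.0, 6.9, 9.0):
    print(rate, describe_non_response(rate))
```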

What is imputation?

'Imputation' involves inserting a value when an individual has not provided a valid response. Imputation is carried out for a small number of variables: age, sex, work and labour force status, and usual residence. Imputation procedures are designed so that the imputed values are as close as possible to what is expected to be true when looking at other responses. Imputation takes place during the data processing phase.

Details on imputation methods used in 2006 are available from Imputation and balancing methodologies for the 2006 Census on the Statistics NZ website.

In line with the minimal-content-change strategy adopted by Statistics NZ for the 2013 Census, there will be no change in the imputation methodology for the usual residence variable, and only small updates for age, sex, and work and labour force status variables. These changes will be:

  • age:
    • total income categories used for age imputation will be updated to reflect the new income classification that will be used in the 2013 Census
    • age distributions will be updated.
  • sex:
    • the proportion of males in non-private dwellings will be updated
    • sex distributions will be updated.
  • work and labour force status:
    • total income categories used for work and labour force status imputation will be updated to reflect the new income classification that will be used in the 2013 Census.

Why do totals for some geographic areas from previous censuses change after the latest census?

Geographic totals such as those for meshblocks, area units, territorial authority areas, or regional council areas for previous censuses can differ due to a process called rebasing. Because of population changes, boundaries can change over time, and previous census data must then be geographically rebased to maintain comparability with the latest census.

Rebasing is a process in which previous census geographic data is reallocated to current geographic boundaries. This provides geographic comparability with the latest census and allows time-series analysis of census data. Rebasing requires identifying each dwelling and individual within a meshblock or area unit that has been split since the previous census and determining which meshblock it belongs to on the new meshblock pattern.

When Statistics NZ produces data from previous census years, it does so according to the current census's geographic boundaries. This allows people and dwellings in the same area to be compared between different censuses.
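The idea of rebasing can be sketched in Python as a reallocation followed by a re-count. The dwelling identifiers, meshblock codes, and counts below are hypothetical, and the lookup stands in for the work of determining which new meshblock each dwelling belongs to.

```python
from collections import Counter

# Hypothetical lookup: old meshblock MB1 has been split into MB1A and MB1B,
# while MB2 is unchanged.
dwelling_to_new_meshblock = {
    "d1": "MB1A",
    "d2": "MB1A",
    "d3": "MB1B",
    "d4": "MB2",
}

# Hypothetical usual residents per dwelling at the previous census.
previous_census_people = {"d1": 3, "d2": 2, "d3": 4, "d4": 5}


def rebase(counts_by_dwelling, lookup):
    """Re-total previous census counts on the current meshblock pattern."""
    rebased = Counter()
    for dwelling, count in counts_by_dwelling.items():
        rebased[lookup[dwelling]] += count
    return dict(rebased)


rebased = rebase(previous_census_people, dwelling_to_new_meshblock)
print(rebased)  # {'MB1A': 5, 'MB1B': 4, 'MB2': 5}
```

The previous census total for the old MB1 (9 people) is now reported as 5 and 4 on the new boundaries, which is why published totals for an area can change after the latest census.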

How should I calculate percentages?

When you calculate percentages using census data, it is important to:

  • ensure that the data reflects the correct subject population. For example, when calculating the percentage of regular cigarette smokers, the data needs to refer to the census usually resident population count aged 15 years and over, as this is the correct subject population for this variable
  • use only the total population that excludes residual categories (‘not stated’, ‘refused to answer’, ‘don't know’, ‘response outside scope’, ‘response unidentifiable’ and ‘not elsewhere included’) as the denominator for the calculation. Where a ‘total stated’ population specifically appears in the census table, users are advised to use this ‘total stated’ population as the denominator. Where a ‘total stated’ population does not appear in the table, users are advised to subtract the residual categories from the total population mentioned in the table to get the denominator. However, there are some exceptions to this rule:
    • number of children born – 'object to answer' is a valid response and is part of the 'stated' population (it is a tick-box option on the form)
    • Māori descent – 'don't know' is a valid response and is part of the 'stated' population (it is a tick-box option on the form)
    • iwi affiliation – 'don't know' is a valid response and is part of the 'stated' population (it is a tick-box option on the form)
    • religious affiliation – 'no religion' and 'object to answer' are valid responses and are part of the 'stated' population (they are tick-box options on the form).
  • exclude 'not further defined' categories when they are used for cases where the information of interest wasn't provided. For example, if calculating the percentage of households who own the dwelling they live in with a mortgage, the calculation is:
    (owned with a mortgage / (owned with a mortgage + owned without a mortgage)) x 100

Note: The 'dwelling owned or partly owned, mortgage arrangements not further defined' category is excluded from the calculation.
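The denominator rule can be sketched in Python as follows. This is an illustration only: the function name, the lower-case category labels, and all counts are hypothetical, and the residual set must be adjusted by the user for the exceptions listed above.

```python
# Residual categories named in the text (lower-case labels are an assumption).
RESIDUAL = frozenset({
    "not stated",
    "refused to answer",
    "don't know",
    "response outside scope",
    "response unidentifiable",
    "not elsewhere included",
})


def percentage(counts, category, residual=RESIDUAL):
    """Percentage of the 'total stated' population that falls in one category."""
    total_stated = sum(n for name, n in counts.items() if name not in residual)
    return round(100 * counts[category] / total_stated, 1)


# Hypothetical smoking counts: the 150 'not stated' are left out of the denominator.
smoking = {"regular smoker": 600, "ex-smoker": 700,
           "never smoked": 1700, "not stated": 150}
print(percentage(smoking, "regular smoker"))  # 20.0

# Exception: for Māori descent, 'don't know' is a valid response, so it stays in.
descent = {"Māori descent": 300, "no Māori descent": 600,
           "don't know": 100, "not stated": 50}
print(percentage(descent, "Māori descent", residual=RESIDUAL - {"don't know"}))  # 30.0
```

Passing the residual set as a parameter keeps the exceptions explicit for each variable rather than hiding them inside the function.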

When percentages are calculated for categories within total response variables (variables for which there can be more than one valid response), they will most likely add to more than 100 percent.

When calculating percentage change over time, use the following formula:

    ((latest year figure - base year census figure) / base year census figure) x 100

Data users are also advised that in published census data, percentages are usually rounded to one decimal place.
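A minimal Python sketch of the percentage change calculation, rounded to one decimal place as in published census data (the function name and the figures are hypothetical):

```python
def percentage_change(latest, base):
    """Percentage change from the base year figure to the latest year figure."""
    if base == 0:
        raise ValueError("base year figure must be non-zero")
    return round((latest - base) / base * 100, 1)


# Hypothetical counts: 4,000 at the base census and 4,500 at the latest census.
print(percentage_change(4_500, 4_000))  # 12.5
```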

Why use income bands?

Census collects total personal income from individuals using income bands. In this way there is no need to ask detailed questions about income. Total personal income information can then be aggregated to form grouped personal income, which can be used to produce information about small geographic areas or other small population groups. Other total income variables, such as total household income and total family income, are also derived from total personal income.

How are total household and family incomes derived?

To combine income bands together, each band needs to be given a representative value. As census does not provide any details about the distribution of incomes within the bands, representative values are calculated for each income band using data from a survey which collects more detailed income data than the census. In 2006, these were calculated from Survey of Family, Income and Employment (SoFIE) data. For the 2013 Census, Household Economic Survey (HES) data will be used. The representative value for each band is the median value (half are above and half below) for those in that band in the more detailed survey. The representative value may not always be the mid-point of the band because of the uneven distribution within a band.

To derive total household or family income, the representative values for all those who are present in the household/family on census night and who stated their income are added together. If there are no people absent and everyone in the household/family answered the income question then the correct band is allocated. For example, if the sum of the representative values for the household is $73,532 then the total household income band allocated is $70,001 to $100,000.

If the amount summed is less than the top band ($100,001 or more in 2006 and $150,001 or more in 2013) and some people were either absent or did not state their income, then the household/family income is set to ‘not stated’. However, if the sum of the representative values is in the highest band, then the household/family is assigned to the top income band for household income even if some people were absent or did not respond.
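The derivation rules above can be sketched in Python. This is an illustration under stated assumptions: the function name is hypothetical, the individual representative values are invented (only their sum, $73,532, comes from the example in the text), the lowest band is a placeholder that lumps together all bands below $70,001, and the 2006 top band is used.

```python
TOP_BAND_FLOOR = 100_001  # top household income band in 2006, per the text

# (lower bound, label). Only the last two bands appear in the text;
# the first is a placeholder for every lower band.
HOUSEHOLD_BANDS = [
    (0, "$70,000 or less (placeholder)"),
    (70_001, "$70,001 to $100,000"),
    (TOP_BAND_FLOOR, "$100,001 or more"),
]


def household_income_band(member_values):
    """member_values holds one entry per household member: a representative
    value, or None if the person was absent or did not state an income."""
    total = sum(v for v in member_values if v is not None)
    incomplete = any(v is None for v in member_values)
    if incomplete and total < TOP_BAND_FLOOR:
        return "not stated"
    label = HOUSEHOLD_BANDS[0][1]
    for lower, band_label in HOUSEHOLD_BANDS:
        if total >= lower:
            label = band_label
    return label


print(household_income_band([41_200, 32_332]))  # $70,001 to $100,000
print(household_income_band([41_200, None]))    # not stated
print(household_income_band([105_000, None]))   # $100,001 or more
```

The third call shows the top-band exception: the stated incomes already place the household in the highest band, so the missing response cannot change the result.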

Median figures should be used with caution as they are not precise figures. Median income is calculated for those that stated an income. Statistics NZ only provides median income figures to the nearest $100.

What is total response data?

For several census variables, individuals have the option to provide more than one response. When more than one response is given, the individual is counted in each response they reported. Output on this basis is labelled 'total responses', for example, ethnic group (total responses). This means that the total response counts, when added together, can be greater than the subject population count for that variable, as individuals (or families, households, or dwellings) may be counted more than once.

Variables that may be output on the basis of total responses are:

  • ethnic group
  • languages spoken
  • iwi
  • religious affiliation
  • sources of personal income
  • job search methods
  • unpaid activities
  • sources of family income
  • sources of extended family income
  • sources of household income
  • fuel type used to heat dwellings
  • access to telecommunication systems.

When percentages are calculated for categories within these variables, they will most likely add to more than 100 percent.

Other than total response data, total response variables can also be output as single and combination data, so that individuals or dwellings are counted just once in the category that applies to them – according to the category or combination of categories they have reported. For example, for outputs of ethnic group, the categories may include European only, European/Māori, or Māori/Pacific peoples. This means that the total population will be equal to the usual subject population for that variable, as individuals are counted once only.

Examples of variables that can be output on the basis of single and combination categories are:

  • ethnic group
  • languages spoken
  • fuel type used to heat dwellings.
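The two output bases can be compared with a short Python sketch. The four respondents below are hypothetical, and the way combination labels are built (sorted groups joined with '/') is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical ethnic group responses from four people.
responses = [
    {"European"},
    {"European", "Māori"},
    {"Māori", "Pacific peoples"},
    {"European"},
]

# Total responses: each person is counted once in every group they reported.
total_responses = Counter(group for r in responses for group in r)


def combination_label(groups):
    """Single label for the group or combination of groups a person reported."""
    ordered = sorted(groups)
    return ordered[0] + " only" if len(ordered) == 1 else "/".join(ordered)


# Single and combination: each person is counted exactly once.
single_and_combination = Counter(combination_label(r) for r in responses)

print(sum(total_responses.values()))         # 6, more than the 4 people
print(sum(single_and_combination.values()))  # 4, equal to the subject population
print({g: 100 * n / len(responses) for g, n in total_responses.items()})
```

On the total response basis the percentages are 75, 50, and 25, which add to 150 percent; on the single and combination basis they add to 100 percent.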

How is an individual's privacy protected?

Statistics NZ employs several methods to protect the privacy and confidentiality of individuals who fill in census forms. These start from the time census forms are delivered to occupied dwellings and continue through to the publication of census data. 

The web page 'What happens to your census forms?' details how an individual’s privacy and confidentiality are maintained during the collection, transportation, processing, and storage of census forms.

Under the provisions of the Statistics Act 1975, Statistics NZ must ensure that any statistical information published prevents any particulars about a particular person or entity from being identified. To comply with these provisions, Statistics NZ confidentialises the data, balancing the need to protect the details of individuals while providing useful information to data users. Confidentiality rules are usually reviewed after each census and updated when necessary.

The confidentiality rules used from the 2006 Census onwards include:

  1. Meshblock data may only be disaggregated (broken down) by one variable.
  2. No income data can be released for any geographic area if the total unrounded subject population is 40 or fewer individuals for personal income; or 20 or fewer families, dwellings, or households for family and household income.
  3. The standard 15-category income classification cannot be used for meshblocks or user-defined combinations of these. Instead, grouped income categories must be used. The five grouped income classifications are: personal, household, family, extended family, and parental income.
  4. The mean cell size for individual geographic areas must be greater than two. Income information is still subject to rule 2. This means that the total unrounded subject population in a geographic area, divided by the number of categories (excluding totals) in that same geographic area, must be greater than two. The number of categories in the denominator is determined by the number of categories in the standard census variables.
  5. All counts for individuals, families, households, and dwellings must be randomly rounded to base 3. After each count has been rounded, all totals must then be separately randomly rounded to base 3.
  6. All derivations from counts (percentages and ratios, for example) must be derived from the randomly rounded counts. This excludes totals and subtotals, as these are independently randomly rounded. Once a derivation has been calculated, there will be no further random rounding applied to the derived data.
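Rules 4 and 5 can be sketched in Python. The text does not state the rounding probabilities; the sketch assumes a commonly described unbiased scheme in which a count that is not a multiple of 3 moves to the nearer multiple with probability 2/3 and to the other with probability 1/3. The function names are hypothetical.

```python
import random


def random_round_base3(count, rng=random):
    """Randomly round a non-negative count to a multiple of 3.

    Assumption (not stated in the text): the nearer multiple is chosen with
    probability 2/3 and the other with probability 1/3.
    """
    remainder = count % 3
    if remainder == 0:
        return count  # multiples of 3 are left unchanged
    lower = count - remainder
    upper = lower + 3
    nearer, other = (lower, upper) if remainder == 1 else (upper, lower)
    return nearer if rng.random() < 2 / 3 else other


def mean_cell_size_ok(total_unrounded, n_categories):
    """Rule 4: unrounded subject population divided by the number of
    categories (excluding totals) must be greater than two."""
    return total_unrounded / n_categories > 2


print(random_round_base3(7))      # 6 or 9
print(mean_cell_size_ok(30, 10))  # True
print(mean_cell_size_ok(20, 10))  # False, because the mean is exactly two
```

Because totals are rounded independently of their component counts, rounded components will not always add to the rounded total.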

Further details on the 2006 Census confidentiality rules are available from the Statistics NZ website.

Are there any other published sources about census data?

Statistics NZ publishes a series of metadata products that provide information on the creation, definition, interpretation, and comparability of census variables. For the 2006 Census, these products include:

  • Introduction to the Census provides an overview of the census. It places the survey in the context of census-taking within New Zealand, relates the census to the Programme of Official Social Statistics, and places the census in an international context. Strategic directions and operational elements of the census, such as collection and processing, are summarised to provide users of census data with information on changes in process, which will enable them to better understand the data.
  • Definitions and Questionnaires contains terminology used in outputs from the census. It also provides a copy of forms used in previous censuses dating back to 1906, where available.
  • Data Dictionary is a catalogue of the variable information available from the census.
  • Information by Variable is a comprehensive source of information about each variable. It gives key information about variables, such as definition, subject population, non-response rates, comparability with previous census data, and any other issues about a variable that data users should know.