12 Number of Household Maintainers
In a recent newspaper article it was reported that
New Statistics Canada data show that households with three or more people contributing to shelter costs and other expenses grew 61 per cent compared with the overall household growth in the City of Vancouver in the past five years.
Reading this we might be interested in more context.
12.1 Question
How has the number of household maintainers changed in other municipalities, and what does this mean?
12.2 Data sources
The number of household maintainers is reported in the census:
Refers to whether or not a person residing in the household is responsible for paying the rent, or the mortgage, or the taxes, or the electricity or other services or utilities. Where a number of people may contribute to the payments, more than one person in the household may be identified as a household maintainer. If no person in the household is identified as making such payments, the reference person is identified by default.
The census dictionary indicates that the ability to identify more than one household maintainer started with the 1996 census moving forward.
12.3 Data acquisition
We need the tidyverse and cancensus packages,
and choose the appropriate census variables and decide what regions we are interested in. Moreover, we need to decide how long a timeframe we are interested in, the news article only referred to the 2016-2021 timeframe, but it might be worthwhile to also consider longer timeframes. The standard census profile data does not report on this before 2011 though, to keep things simple we collect data for the censuses 2011 and onward. We start by collecting the relevant census variables, labelled by base for the base number of households, and One, Two, and Three+ for 1, 2 or 3+ household maintainers.
Selecting regions to compare Vancouver to is somewhat subjective. We go with a mix of larger cities in Metro Vancouver, as well as some from other provinces. One complication is that census geographies can change over time, so we need to be mindful of this. It is good practice to check this against the list of cities that changed 2011 to 2016 and 2016 to 2021, or explicitly inspect the geographies for changes.
We select from the list of all cities of at least 205k people (a number chosen to separate Richmond, BC from Richmond Hill, ON), and within that list narrow it down by matching by name.
Code
regions <- list_census_regions("CA21") |>
filter(level=="CSD",
pop>205000,
grepl("Vancouver|Surrey|Burnaby|Richmond|Toronto|Calgary|Edmonton|Halifax|Winnipeg|Saskatoon|Montréal|Ottawa|Laval",name))
regions
# A tibble: 13 × 8
region name level pop municipal_status CMA_UID CD_UID PR_UID
<chr> <chr> <chr> <int> <chr> <chr> <chr> <chr>
1 3520005 Toronto CSD 2794356 C 35535 3520 35
2 2466023 Montréal CSD 1762949 V 24462 2466 24
3 4806016 Calgary CSD 1306784 CY 48825 4806 48
4 3506008 Ottawa CSD 1017449 CV 505 3506 35
5 4811061 Edmonton CSD 1010899 CY 48835 4811 48
6 4611040 Winnipeg CSD 749607 CY 46602 4611 46
7 5915022 Vancouver CSD 662248 CY 59933 5915 59
8 5915004 Surrey CSD 568322 CY 59933 5915 59
9 1209034 Halifax CSD 439819 RGM 12205 1209 12
10 2465005 Laval CSD 438366 V 24462 2465 24
11 4711066 Saskatoon CSD 266141 CY 47725 4711 47
12 5915025 Burnaby CSD 249125 CY 59933 5915 59
13 5915015 Richmond CSD 209937 CY 59933 5915 59
This leaves us with 13 cities, and manual inspection shows that only Edmonton had a boundary change affecting population, resulting in a gain of 542 people 2016-2021. This should not make a noticeable difference for our analysis, so we will ignore this.
Code
maintainer_data <- bind_rows(
get_census("2011",regions=as_census_region_list(regions), vectors=vectors[["2011"]]) |>
mutate(Year="2011"),
get_census("2016",regions=as_census_region_list(regions), vectors=vectors[["2016"]]) |>
mutate(Year="2016"),
get_census("2021",regions=as_census_region_list(regions), vectors=vectors[["2021"]]) |>
mutate(Year="2021")
) |>
select(GeoUID,Year,base,One,Two,`Three+`) |>
left_join(regions |> select(GeoUID=region,Name=name),by="GeoUID")
Getting the data for the three censuses and our selection of regions is easy, we join on the region names to make sure we have a uniform way to format and spell the names.
12.4 Data preparation
We are interested in shares rather than absolute numbers, for this we change the three different maintainer categories to long form and compute shares.
Code
maintainer_levels <- c("One","Two","Three+")
plot_data <- maintainer_data |>
pivot_longer(maintainer_levels, names_to="Maintainers",values_to = "Value") |>
mutate(Share=Value/base)
12.5 Analysis and visualization
To visualize the data it can be useful to be deliberate about the order of variables. In R we preferably use factors to deal with categorical data, and the factor levels can be set to fix their order in visualizations.
With that in place we can plot the data. We are in particular interested in the change from 2016 to 2021, so we add in an arrow to emphasize this.
Code
plot_data |>
select(Name,Year,Maintainers,Share) |>
ggplot(aes(y=Name, x=Share)) +
geom_point(aes(colour=Year)) +
scale_colour_manual(values=sanzo::trios$c157) +
theme(legend.position = "bottom") +
facet_wrap(~Maintainers,scales="free_x") +
geom_segment(data=~pivot_wider(.,names_from = Year,values_from = Share),
aes(x=`2016`,xend=`2021`,yend=Name),
arrow = arrow(length = unit(0.1,"inches"))) +
scale_x_continuous(labels=scales::percent) +
labs(title="Number of household maintainers by municipality",
y=NULL,x="Share of households by number of household maintainers",
colour=NULL,
caption="StatCan Census 2021, 2016")
To add the segments we took the plot data and pivoted it wider by Year, allowing us to select the start and endpoints for our arrows indicating the movement 2016 to 2021.
Looking at the graph for 2011 to 2016 we notice a general decrease in the share of one-person maintainer households, coupled with an increase in both two and three-person maintainer households. But this trend is not uniform and is at least partially reversed for some cities, for example Saskatoon or Halifax. But for 2016 to 2021 the trends are very large and uniform. To the extend that one gets suspicious, demographic changes usually happen gradually and don’t show strong across the board changes like this. Either something big has happened, or we have some data issues.
At this point we should go back a step and try to understand what is going on here.
12.6 Analysis (revisited)
Let’s try and understand what processes could be driving differences or changes in the number of household maintainers. Household size is a large factor, one person households can have at most one household maintainer. Similarly, a two-person household can have at most two household maintainers. We can try to filter some of this effect out by looking only at two or more person households, that might give a clearer picture of what is going on.
12.7 Data acquisition (revisited)
For this we need data on the number of one-person households for the three years. The process is easily adapted from the code above.
Code
vectors2 <- list(
"2011"=c(`One person households`="v_CA11F_210"),
"2016"=c(`One person households`="v_CA16_419"),
"2021"=c(`One person households`="v_CA21_444")
)
household_size_data <- bind_rows(
get_census("CA11",regions=as_census_region_list(regions), vectors=vectors2[["2011"]]) |>
mutate(Year="2011"),
get_census("CA16",regions=as_census_region_list(regions), vectors=vectors2[["2016"]]) |>
mutate(Year="2016"),
get_census("CA11",regions=as_census_region_list(regions), vectors=vectors2[["2021"]]) |>
mutate(Year="2021")
) |>
select(GeoUID,Year,`One person households`)
Now we need to combine the data on household maintainers with the data on one-person households, which we do by joining the data frames by geographic identifier and year. Then we subtract out one-person households from the denominator and the numerator of single maintainer households, and proceed as before.
Code
plot_data2 <- maintainer_data |>
full_join(household_size_data,by=c("GeoUID","Year")) |>
mutate(base=base-`One person households`,
`One`=`One`-`One person households`) |>
pivot_longer(maintainer_levels, names_to="Maintainers",values_to = "Value") |>
mutate(Share=Value/base) %>%
mutate(Name=factor(Name,filter(.,Year=="2021",Maintainers=="One") |>
arrange(Value) |>
pull(Name))) |>
mutate(Maintainers=factor(Maintainers, levels = maintainer_levels))
12.8 Visualization (revisited)
For visualization we copy the code from before and add in a subtitle to indicate that we excluded one-person households.
Code
plot_data2 |>
select(Name,Year,Maintainers,Share) |>
ggplot(aes(y=Name, x=Share)) +
geom_point(aes(colour=Year)) +
scale_colour_manual(values=sanzo::trios$c157) +
theme(legend.position = "bottom") +
facet_wrap(~Maintainers,scales="free_x") +
geom_segment(data=~pivot_wider(.,names_from = Year,values_from = Share),
aes(x=`2016`,xend=`2021`,yend=Name),
arrow = arrow(length = unit(0.1,"inches"))) +
scale_x_continuous(labels=scales::percent) +
labs(title="Number of household maintainers by municipality",
subtitle="Excluding one-person households",
y=NULL,x="Share of households by number of household maintainers",
colour=NULL,
caption="StatCan Census 2021, 2016")
Dropping one-person households does remove some of the variation between cities, but it amplifies the effect of the drop in single maintainer households 2016-2021, while the change for 2011-2016 is still ambiguous, although in most cases also dropping. It’s hard to imagine what processes could cause such large change across all these cities. Shifting demographics, like Millennials aging into family formation years and coupling up could have some effect, but not of this magnitude. Changes in housing affordability, and people coupling or tripling up to pay for housing could also cause some shift, but these kind of shifts happen more gradually and won’t affect such a large share of households.
This is a bit of a puzzle, time to dig a little deeper at the data sources.
12.9 Data sources (revisited)
We have looked through the census dictionary and found no indication that the concept of the number of household maintainers changed over our time period, with a minor caveat that 2011 data is from the voluntary National Household Survey instead of the mandatory long form census.
But the Housing Characteristics Reference Guide shows that StatCan flagged this issue of lack of ability to compare this concept over time. They write:
In 2021, the household maintainer question was asked for every member of the household aged 15 years or older. Previously, in 2016, the question had a mark-all-that-apply format for the first five persons listed on the paper questionnaire. This alteration to the paper questionnaire brings better visual resemblance between the electronic and paper questionnaires. The wording was also modified to include a qualifying statement of “partly or entirely” when referring to the payments. The result of these changes is the capture of more household maintainers who are not the primary household maintainer.
The growth rate of the number of primary household maintainers since the 2016 Census of Population was 6.4%, the same as the growth rate of private occupied dwellings, because every private occupied dwelling has one primary maintainer. At the same time, the growth rate of other household maintainers over the same period was 34.1%, indicating more comprehensive coverage of all household maintainers. This has not affected the characteristics of the primary household maintainer, which are often used to derive statistics such as the homeownership rates of different generations. However, the more complete coverage of other maintainers will allow for future analysis of the characteristics of these other household members contributing to housing payments.
We can track this further by looking at the 2011 NHS and 2016 census questionnaires, where the information on the number of household maintainers comes from the first question on the dwelling section, Question E1 and F1, respectively.
In the 2011 NHS the question reads:
E1 Who pays the rent or mortgage, taxes, electricity, etc., for this dwelling?
1: Person 1
2: Person 2
3: Person 3
4: Person 4
5: Person 5
6: A person who is listed on another questionnaire for this dwelling
7: A person who does not live here
In the 2016 census the question reads identical, except it received an additional instruction on how to answer if more than one person contributes.
F1. Who pays the rent or mortgage, taxes, electricity, etc., for this dwelling?
If more than one person contributes to such payments, mark as many circles as apply.
1: Person 1
2: Person 2
3: Person 3
4: Person 4
5: Person 5
6: A person who is listed on another questionnaire for this dwelling
7: A person who does not live here
This change in instruction could impact how people answer this question, plausibly increasing the number of people listing multiple people as household maintainers. This taints the overall drop in single household maintainers that we observed in the data 2011-2016.
For the 2021 census the question on household maintainers has been removed from the dwelling section and added to the section that is to be separately filled out for every person in the household. It is not question 58 of part D, which reads.
58. Does this person pay, partly or entirely, the rent or mortgage, taxes, electricity, etc. for this dwelling?
Mark “Yes” if this person pays the rent or mortgage, taxes, electricity, etc. for this dwelling, even if more than one person contributes to such payments.
A dwelling is a separate set of living quarters with a private entrance from the outside or from a common hallway or stairway inside the building. This entrance should not be through someone else’s living quarters.
Do not consider payments for other dwellings such as the school residence of a child, the residence of a former spouse, or another dwelling that you may own or rent.
Yes
No
12.10 Interpretation
The number of household maintainer variable is not comparable across the 2016 to 2021 censuses, and may also be somewhat tainted for comparisons 2011 to 2016. We observe large changes in the shares of single and multiple household maintainer households 2016 to 2021 that are very likely dominated by changes to the census questionnaire.
Comparisons across regions for fixed years can still be informative, with data prior to 2021 likely being tainted by people misreading the question and selecting only one household maintainer where they should have selected more than one. Composition of households by household size also matter, removing one-person household when computing shares can remove some of the bias introduced by some municipalities having a significantly higher share of one-person households than others. This effect is particularly strong when comparing Vancouver and Surrey.
It would be worthwhile to look deeper into what drives three or more household maintainer households, either using cross tabulations or PUMF data. Surrey’s high share of multigenerational households is a likely contributor to Surrey’s high share of three or more household maintainer households.
This exercise serves as a good reminder to be suspicious of implausibly large effects in data. Most probably these arise as a result of errors in the data analysis process, but may also come about due to changes in definitions or in the data generation process, in this case the questionnaire.