Skip to contents

The goal of {canpumf} is to facilitate ingesting StatCan PUMF data in R.

Installation

You can install the current development version of canpumf from GitHub with:

remotes::install_github("mountainmath/canpumf")

Documentation

Please consult the documentation and example articles for further information.

StatCan publishes an official guide to the Labour Force Survey for different vintages of the LFS.

Cache path

PUMF data can be large and should be cached locally. Set the canpumf.cache_path option to a local directory via options(canpumf.cache_path="<your local path>") in your .Rprofile. Without this, data is stored in tempdir() for the session only.

DuckDB

On first use PUMF data is imported into DuckDB. By default a PUMF DuckDB connection will be shown in the RStudio (or Positron) Connections Pane once a connection is opened, you control the default behaviour by setting the canpumf.register_connection optin in your .Rprofile:

options("canpumf.register_connection" = TRUE)

Basic usage

Some PUMF data is available from StatCan via direct download and can be accessed directly via get_pumf(). In other cases, PUMF data must be ordered via EFT and deposited in the cache directory so get_pumf() can find it.

get_pumf() downloads (if needed), parses metadata, applies value labels automatically, and returns a lazy dplyr::tbl() backed by a local DuckDB database. Call dplyr::collect() to load into memory.

Column values are labeled automatically (e.g. province codes become factor levels like "British Columbia"). Column names remain as short coded names by default (e.g. PROV, LFSSTAT). To rename columns to human-readable variable labels, pipe through label_pumf_columns():

tbl <- get_pumf("LFS", "2022") |>
  label_pumf_columns()

When done querying, release the DuckDB connection with close_pumf(tbl).

LFS data

LFS data is organized by year, except for the current year where it is organized by month. To access data for a specific year:

lfs_2022 <- get_pumf("LFS", "2022")

This downloads the 2022 LFS PUMF data if needed, parses it, loads labeled data into a shared DuckDB database, and returns a lazy tbl filtered to 2022. To access all LFS data currently in the local database:

lfs_all_local <- get_pumf("LFS")

To ensure the local database contains all available LFS versions, use refresh = "auto". This checks StatCan for versions not yet in the database and imports them:

lfs_all <- get_pumf("LFS", refresh = "auto")

Census data

The canpumf package supports Census PUMF from 1971 through 2021. All releases from 1991 onward are available via direct download; years 1986 and earlier must be ordered through Statistics Canada’s EFT portal and placed in the cache directory.

pumf_2021 <- get_pumf("Census", "2021")

By default the package loads the individuals file. Available variants by year:

Years Variants
2021 individuals, hierarchical
2016 individuals, hierarchical
2011 individuals (NHS), hierarchical (NHS)
2006 individuals, hierarchical
2001 individuals, households, families
1996 individuals, households, families
1991 individuals, households, families
1986 individuals, households
1981 individuals, households
1976 individuals
1971 individuals, individuals PR
pumf_h_2016 <- get_pumf("Census", "2016 (hierarchical)")

Verified datasets

The following datasets have been end-to-end tested (metadata parsed, data imported, DuckDB built) without errors or warnings. Versions marked direct download can be fetched automatically by get_pumf(); others must be placed in the cache directory via Statistics Canada’s EFT portal.

Survey Series Verified versions Direct download
Labour Force Survey LFS annual and monthly files
Census of Population Census 2021 (individuals, hierarchical), 2016 (individuals, hierarchical), 2011 (individuals, hierarchical), 2006 (individuals, hierarchical), 2001 (individuals, households, families), 1996 (individuals, households, families), 1991 (individuals, households, families)
Census of Population (EFT) Census 1986 (individuals, households, families), 1981 (individuals, households), 1976 (individuals, households, families), 1971 (individuals, households, families — prov and cma variants)
General Social Survey — Caregiving GSS 1996, 2007, 2012, 2018
General Social Survey — Safety GSS Safety 1993, Safety 1999, Safety 2014, Safety 2019
General Social Survey — Family GSS Family 1995, Family 2001, Family 2011, Family 2017
General Social Survey — Social Identity GSS Social Identity 2003, Social Identity 2013, Social Identity 2020
General Social Survey — Education GSS Education 1994, Education 2002, Education 2007
General Social Survey — Time Use GSS Time Use 1998, Time Use 2010, Time Use 2015, Time Use 2022
GSS Giving, Volunteering and Participating SGVP 1997, 2000, 2004, 2007, 2010, 2013, 2018, 2023
Canadian COVID-19 Antibody and Health Survey CCAHS 1
International Travel Survey ITS 2018, 2019
Canadian Housing Survey CHS 2018, 2021, 2022
Survey of Financial Security SFS 1999, 2005, 2012, 2016, 2019, 2023
Canadian Perspectives Survey Series CPSS 2–6
Canadian Income Survey CIS 2017–2022
Survey of Household Spending SHS 2017, 2019, 2021, 2023

The cansim package is designed to retrieve and work with public Statistics Canada data tables. cansim prepares retrieved data tables as analysis-ready tidy dataframes and provides a number of convenience tools and functions to make it easier to work with Statistics Canada data. It is available on CRAN and on Github.

The cancensus package is designed to retrieve and work with public Statistics Canada census data via the CensusMapper API. It is available on CRAN and on Github.

Cite canpumf

If you wish to cite the canpumf package in your work:

von Bergmann, J. (2026), canpumf: Import StatCan PUMF data into R. v0.5.0.

A BibTeX entry for LaTeX users is

  @Manual{,
    author = {Jens {von Bergmann}},
    title = {canpumf: Import StatCan PUMF data into R},
    year = {2026},
    note = {R package version 0.5.0},
    url = {https://mountainmath.github.io/canpumf/},
  }

Statistics Canada Attribution

Subject to the Statistics Canada Open Data License Agreement, licensed products using Statistics Canada data should employ the following acknowledgement of source:

Acknowledgment of Source

(a) You shall include and maintain the following notice on all licensed rights of the Information:

  - Source: Statistics Canada, name of product, reference date. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.

(b) Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:

  - Adapted from Statistics Canada, name of product, reference date. This does not constitute an endorsement by Statistics Canada of this product.