Main entry point for the canpumf package. Downloads the survey files if needed, parses the metadata, applies English or French labels, and returns a lazy `dplyr::tbl()` backed by a DuckDB file in the cache directory. Later calls reuse the cached DuckDB database without re-downloading.
Arguments
- series
Survey series acronym, e.g. `"SFS"`, `"CHS"`, `"LFS"`, `"Census"`, `"CPSS"`. See [list_canpumf_collection()] for all supported series and versions.
- version
Version string (e.g. `"2019"`, `"2021 (individuals)"`, `"2023-06"`). For series with a single version, omit it or pass `NULL`.
- lang
`"eng"` (default) or `"fra"`. Selects which set of labels to apply. Each language is stored in a separate DuckDB table, created the first time that language is requested.
- cache_path
Root cache directory. Defaults to `getOption("canpumf.cache_path", tempdir())`. Set persistently in `.Rprofile` with `options(canpumf.cache_path = "<path>")`.
- refresh
`FALSE` (default) reuses cached data. `TRUE` clears the DuckDB table and metadata and rebuilds them from the already-extracted raw files, without re-downloading. `"auto"` is accepted for LFS only; it downloads every available version not yet in the database.
- redownload
If `TRUE`, delete the cached zip and extracted files and re-download from StatCan before rebuilding. Implies `refresh = TRUE`. Not valid with `refresh = "auto"`.
- read_only
Open the DuckDB connection in read-only mode (default `TRUE`). Pass `FALSE` to allow write access, e.g. to persist custom views or derived tables in the DuckDB file. Use [close_pumf()] to release the connection when done.
- registry
Optional custom configuration created by [pumf_registry_entry()] (or [pumf_registry()]), used to parse and build a survey that is not in the built-in registry, or to override fields of one that is. It is applied only when a build actually happens: for an already-imported survey it has no effect unless `refresh = TRUE` is also passed, and a message is emitted in that case. Not supported for LFS. A survey not in [list_canpumf_collection()] has no download URL, so first deposit its raw files under `<cache_path>/<series>/<version>/`.
- module
For multi-module surveys (several linked files in one DuckDB database, e.g. GSS cycle 16 / "Aging and Social Support" 2002, whose `MAIN`, `CG4`, `CG6` and `CR` files join on `RECID`), selects which module table to return. `NULL` (default) returns the survey's primary module; for a multi-module survey, a one-time message then lists the sibling modules and shows how to open one. Use [pumf_module()] to open a sibling module on the *same* connection so that the two tbls can be joined. Not supported for LFS.
- register_connection
If `TRUE` (default), the DuckDB connection backing the returned tbl may appear in the RStudio Connections pane (subject to RStudio/duckdb settings). Pass `FALSE` to suppress that registration. This is useful when opening and closing many connections programmatically (e.g. iterating over surveys in a notebook), where the pane would otherwise fill up with entries. Defaults to `getOption("canpumf.register_connection", TRUE)`, so you can disable it globally with `options(canpumf.register_connection = FALSE)`.
- ...
Accepts deprecated parameter names (`pumf_series`, `pumf_version`, `pumf_cache_path`, `layout_mask`, `file_mask`, `guess_numeric`, `timeout`, `refresh_layout`) with a warning.
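The interplay between `refresh`, `redownload` and the option-backed defaults can be sketched in base R. This is a minimal illustration assuming the rules are exactly as documented above; `resolve_pumf_args()` is a hypothetical helper, not canpumf's internal implementation, and it touches neither the cache nor the network.

```r
# Hypothetical sketch (not canpumf's implementation): how the documented
# defaults and refresh/redownload rules of get_pumf() fit together.
resolve_pumf_args <- function(series,
                              refresh = FALSE,
                              redownload = FALSE,
                              cache_path = getOption("canpumf.cache_path", tempdir()),
                              register_connection = getOption("canpumf.register_connection", TRUE)) {
  if (identical(refresh, "auto")) {
    # "auto" fetches every missing LFS version, so it only makes sense for LFS
    if (series != "LFS") stop('refresh = "auto" is only valid for LFS')
    if (isTRUE(redownload)) stop('redownload = TRUE is not valid with refresh = "auto"')
  } else if (!isTRUE(refresh) && !isFALSE(refresh)) {
    stop('refresh must be TRUE, FALSE or "auto"')
  }
  # Re-downloading always forces a rebuild of the DuckDB table
  if (isTRUE(redownload)) refresh <- TRUE
  list(
    refresh = refresh,
    redownload = isTRUE(redownload),
    cache_path = cache_path,
    register_connection = register_connection
  )
}

# Session-wide default (put this line in ~/.Rprofile to make it persistent)
options(canpumf.register_connection = FALSE)
str(resolve_pumf_args("SFS", redownload = TRUE))
```

In this sketch the `getOption()` defaults are evaluated when the function is called, so options set in `.Rprofile` or earlier in the session take effect without passing the arguments explicitly.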
Value
A lazy `dplyr::tbl()` backed by a DuckDB connection. Data values are pre-labeled as factors. Call `dplyr::collect()` to materialise a local tibble, [label_pumf_columns()] to rename columns to their human-readable labels, or [close_pumf()] to release the connection.
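Because the returned tbl is lazy, any result you want to keep must be collected before [close_pumf()] releases the connection. One way to guarantee both is a small wrapper like the following. `with_pumf()` and its `opener`/`closer` arguments are hypothetical (not part of canpumf); the injectable functions exist only so the pattern can be exercised without downloading data.

```r
# Hypothetical helper (not part of canpumf): open a PUMF, run `f` on the lazy
# tbl, and always release the DuckDB connection, even if `f` throws an error.
with_pumf <- function(series, version = NULL, f, ...,
                      opener = canpumf::get_pumf,
                      closer = canpumf::close_pumf) {
  tbl <- opener(series, version, ...)
  on.exit(closer(tbl), add = TRUE)
  # `f` should return collected (local) data: a lazy tbl is unusable once
  # its connection has been closed.
  f(tbl)
}

if (FALSE) {
  family_counts <- with_pumf("SFS", "2019", function(sfs) {
    dplyr::collect(dplyr::count(sfs, PEFAMID))
  })
}
```

Extra arguments such as `lang = "fra"` are passed through `...` to `opener`.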
Details
The LFS is treated specially: all versions share a single `LFS.duckdb` database. Pass `version = "YYYY"` (annual) or `"YYYY-MM"` (monthly). `refresh = "auto"` downloads every available LFS version that is not yet in the database; this is only valid for LFS.
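The two accepted LFS version formats can be checked, and a year's monthly versions generated, with a couple of base-R helpers. `is_lfs_version()` and `lfs_months()` are hypothetical, and the loop assumes every month of the chosen year has been published; it also passes `register_connection = FALSE`, the use case described for that argument.

```r
# Hypothetical helpers (not part of canpumf).

# TRUE for the documented LFS forms: "YYYY" (annual) or "YYYY-MM" (monthly)
is_lfs_version <- function(version) {
  grepl("^[0-9]{4}(-(0[1-9]|1[0-2]))?$", version)
}

# Monthly version strings for one year: "YYYY-01" ... "YYYY-12"
lfs_months <- function(year) {
  sprintf("%d-%02d", as.integer(year), 1:12)
}

if (FALSE) {
  # Open each month of 2023 in turn (all share LFS.duckdb) without adding
  # an entry to the RStudio Connections pane for each one
  for (v in lfs_months(2023)) {
    lfs <- get_pumf("LFS", v, register_connection = FALSE)
    # ... query `lfs` and collect results here ...
    close_pumf(lfs)
  }
}
```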
See also
[label_pumf_columns()], [pumf_var_labels()], [pumf_metadata()], [close_pumf()], [list_canpumf_collection()]
Examples
if (FALSE) { # \dontrun{
# Download and open the SFS 2019 as a lazy DuckDB table
sfs <- get_pumf("SFS", "2019")
dplyr::glimpse(sfs)
# Collect a local tibble after filtering
sfs_subset <- sfs |>
  dplyr::filter(PEFAMID == 1) |>
  dplyr::collect()
# French labels
sfs_fr <- get_pumf("SFS", "2019", lang = "fra")
# LFS: annual version
lfs <- get_pumf("LFS", "2022")
# Release the connections when done
close_pumf(sfs)
close_pumf(sfs_fr)
close_pumf(lfs)
} # }
