vignettes/listing_cansim_tables.Rmd
This vignette shows how to use the built-in table search functions in the cansim package, with a simple example using employment data for economic regions in British Columbia.
The list of available tables is cached in the current R session to avoid repeated downloading of data.
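The idea behind a per-session cache can be sketched with an R environment, which lives for the whole session. This is a generic memoization pattern, not the cansim package's own implementation: `get_table_list_cached()`, `fake_fetch()` and `.table_list_cache` are hypothetical names, and `fake_fetch()` stands in for the real download.

```r
# Hypothetical sketch of per-session caching; NOT cansim's own code.
# An environment persists for the whole R session, so a value stored
# in it survives between function calls.
.table_list_cache <- new.env(parent = emptyenv())

get_table_list_cached <- function(fetch) {
  if (!exists("tables", envir = .table_list_cache, inherits = FALSE)) {
    message("Downloading table list...")
    assign("tables", fetch(), envir = .table_list_cache)
  } else {
    message("Using cached table list.")
  }
  get("tables", envir = .table_list_cache, inherits = FALSE)
}

# Stand-in for a network download that counts how often it runs.
download_count <- 0
fake_fetch <- function() {
  download_count <<- download_count + 1
  data.frame(cansim_table_number = c("00-00-0001", "00-00-0002"))
}

first <- get_table_list_cached(fake_fetch)
second <- get_table_list_cached(fake_fetch)
download_count
```

The second call finds the stored value and never calls `fake_fetch()`, so `download_count` stays at 1.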
Calling `list_cansim_cubes()` returns a data frame with useful metadata for the available tables. In the output below there are 19 metadata fields for each table, including the table number, the title in English and French, start and end dates, release time, archive status, and the subject, survey, and dimension names.
```r
library(cansim)
names(list_cansim_cubes())
#> Retrieving cube information from StatCan servers...
#> [1] "cansim_table_number" "cubeTitleEn" "cubeTitleFr"
#> [4] "productId" "cansimId" "cubeStartDate"
#> [7] "cubeEndDate" "releaseTime" "archived"
#> [10] "subjectCode" "surveyCode" "frequencyCode"
#> [13] "corrections" "dimensionNameEn" "dimensionNameFr"
#> [16] "surveyEn" "surveyFr" "subjectEn"
#> [19] "subjectFr"
```
To find a particular table, filter this data frame on the relevant properties. Here we use dplyr to keep the tables whose English title mentions both labour force characteristics and economic regions.
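The same search can also be written with plain base R subsetting. A minimal sketch on a small made-up stand-in for the table list; the table numbers and titles are placeholders, not real Statistics Canada tables.

```r
# Made-up stand-in for the data frame returned by list_cansim_cubes();
# table numbers and titles are hypothetical placeholders.
cubes <- data.frame(
  cansim_table_number = c("00-00-0001", "00-00-0002", "00-00-0003"),
  cubeTitleEn = c(
    "Labour force characteristics by economic region (example)",
    "Labour force characteristics by province (example)",
    "Consumer prices by city (example)"
  ),
  stringsAsFactors = FALSE
)

# Keep rows whose English title matches BOTH search terms.
# grepl() is case-sensitive by default; ignore.case = TRUE relaxes that.
matches <- cubes[
  grepl("Labour force characteristics", cubes$cubeTitleEn) &
    grepl("economic region", cubes$cubeTitleEn),
  c("cansim_table_number", "cubeTitleEn")
]
matches
```

The dplyr version used in the rest of this vignette follows.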
```r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
list_cansim_cubes() %>%
  filter(grepl("Labour force characteristics", cubeTitleEn),
         grepl("economic region", cubeTitleEn)) %>%
  select(cansim_table_number, cubeTitleEn)
#> Retrieving cube information from temporary cache.
#> # A tibble: 2 × 2
#> cansim_table_number cubeTitleEn
#> <chr> <chr>
#> 1 14-10-0090 Labour force characteristics by province, territory and e…
#> 2 14-10-0293 Labour force characteristics by economic region, three-mo…
```
The search returns two tables. In this example we want the unemployment rate from 2015 onwards for the Lower Mainland, Vancouver Island, and Okanagan economic regions, so we use table 14-10-0293. We use the tidyr package to reshape the data from long to wide format, so that the estimate and its standard error end up in separate columns.
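To make the long-to-wide step concrete, here is a sketch on a tiny made-up long table using `tidyr::pivot_wider()`, which supersedes the older `spread()`. The dates, the region name `"Region A"` and the values are placeholders, not real survey data.

```r
library(tidyr)

# Tiny made-up long table: one row per date, region and statistic.
# Values are placeholders, not real Statistics Canada data.
long <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-01", "2015-02-01", "2015-02-01")),
  GEO = "Region A",
  Statistics = c("Estimate", "Standard error of estimate",
                 "Estimate", "Standard error of estimate"),
  val_norm = c(0.060, 0.005, 0.062, 0.006)
)

# One new column per distinct value of Statistics, filled from val_norm;
# the remaining columns (Date, GEO) identify the rows.
wide <- pivot_wider(long, names_from = Statistics, values_from = val_norm)
wide
```

Each distinct combination of `Date` and `GEO` becomes one row, and each value of `Statistics` becomes its own column.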
```r
library(tidyr)

selected_table <- "14-10-0293"

data <- get_cansim(selected_table) %>%
  # Three BC economic regions, 2015 onwards, unemployment rate only
  filter(grepl("Mainland|Vancouver Island|Okanagan", GEO),
         Date >= as.Date("2015-01-01"),
         `Labour force characteristics` == "Unemployment rate") %>%
  select(Date, GEO, Statistics, val_norm) %>%
  # One column per statistic (Estimate, Standard error of estimate)
  pivot_wider(names_from = Statistics, values_from = val_norm)
#> Accessing CANSIM NDM product 14-10-0293 from Statistics Canada
#> Parsing data
```
We can then visualize the results with ggplot2, shading a band of one standard error on either side of each estimate.
```r
library(ggplot2)

ggplot(data, aes(x = Date, group = GEO, y = Estimate)) +
  # Band of +/- one standard error; mapping fill to a constant ("")
  # creates a legend entry for the band
  geom_ribbon(aes(ymin = Estimate - `Standard error of estimate`,
                  ymax = Estimate + `Standard error of estimate`,
                  fill = ""),
              alpha = 0.8) +
  geom_line(aes(color = GEO)) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(name = "", values = "grey80", labels = "Standard error") +
  theme_bw() +
  labs(title = "Comparison of unemployment rate by economic region",
       y = "Unemployment Rate",
       x = "",
       color = "",
       caption = paste0("CANSIM ", selected_table))
```
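The ribbon in the plot spans one standard error on either side of each estimate. A minimal sketch of that arithmetic on made-up values, using a hypothetical helper `add_band()` with a multiplier `z`:

```r
# Made-up wide table with the same column names as the reshaped data;
# the numbers are placeholders, not real estimates.
example_wide <- data.frame(
  Estimate = c(0.060, 0.062),
  `Standard error of estimate` = c(0.005, 0.006),
  check.names = FALSE
)

# Hypothetical helper: lower/upper bounds at z standard errors around
# the estimate. z = 1 matches the ribbon drawn in the plot.
add_band <- function(df, z = 1) {
  se <- df[["Standard error of estimate"]]
  df$lower <- df$Estimate - z * se
  df$upper <- df$Estimate + z * se
  df
}

add_band(example_wide)
```

With `z = 1` the bounds match the plotted ribbon; `z = 1.96` gives an approximate 95% interval if the estimate is roughly normally distributed.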