vignettes/retrieving_cansim_vectors.Rmd
Many of the time-series data available from Statistics Canada have
individual vector codes. These codes consist of a lower-case “v”
followed by an identifying number. Time-series tables often bundle many
series together, resulting in large and sometimes unwieldy files. Many
users of Canadian statistical data, who are often concerned with
specific time series such as the CPI or international arrivals, know
the exact series they need. For this reason, the cansim package
provides two functions that make it easier to retrieve individual
vectors: get_cansim_vector() and get_cansim_vector_for_latest_periods().
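Because vector codes follow this fixed pattern, a quick local check can catch typos before a request is sent. The sketch below is an illustration only: `is_vector_code()` is a hypothetical helper, not a cansim function, and it assumes codes are exactly a lower-case “v” followed by digits, as described above. cansim itself may accept other spellings.

```r
# Hypothetical helper (not part of cansim): TRUE where a string looks like
# a Statistics Canada vector code, i.e. a lower-case "v" followed by digits.
is_vector_code <- function(x) {
  grepl("^v[0-9]+$", x)
}

codes <- c("v41690973", "v41692462", "V41690973", "41690973")
is_vector_code(codes)
```

Only the first two entries pass; the upper-case and prefix-less variants are flagged.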
Running search_cansim_cubes("consumer price index")
shows 33 tables as results. However, if you are tracking the Canadian
Consumer Price Index (CPI) over time, you might already know the
Statistics Canada vector code for the seasonally-unadjusted all-items
CPI value: v41690973. To retrieve just this data series on its own,
without all of the additional data available in related tables, we can
use the get_cansim_vector() function with the vector code and the date
from which we want results.
get_cansim_vector("v41690973","2015-01-01")
#> Accessing CANSIM NDM vectors from Statistics Canada
#> # A tibble: 117 × 14
#> DECIMALS VALUE REF_DATE releaseTime SYMBOL frequencyCode SCALAR_ID COORDINATE
#> <int> <dbl> <chr> <chr> <int> <int> <int> <chr>
#> 1 1 124. 2015-01… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 2 1 125. 2015-02… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 3 1 126. 2015-03… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 4 1 126. 2015-04… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 5 1 127. 2015-05… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 6 1 127. 2015-06… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 7 1 127. 2015-07… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 8 1 127. 2015-08… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 9 1 127. 2015-09… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 10 1 127. 2015-10… 2021-07-28… 0 6 0 2.2.0.0.0…
#> # ℹ 107 more rows
#> # ℹ 6 more variables: VECTOR <chr>, cansimTableNumber <chr>, val_norm <dbl>,
#> # Date <date>, GEO <fct>, `Products and product groups` <fct>
The call to get_cansim_vector takes three inputs: a string code (or
codes) for vectors, a start_time in YYYY-MM-DD format, and an optional
end_time, also in YYYY-MM-DD format. If end_time is not provided, the
call uses the current date as the default series end time. By default,
start_time and end_time are matched against Statistics Canada’s
reference periods (“REF_DATE”) to select the date range of the
retrieved vectors. If the optional parameter use_ref_date is set to
FALSE, retrieval instead filters on the release date of the vector
itself.
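To make the three date-related arguments concrete, here is a minimal sketch of a bounded request. `get_vector_window()` is a hypothetical wrapper, not part of cansim. It only checks the dates locally and then forwards them to get_cansim_vector() using the argument names described above. The final call is commented out because it needs the cansim package and network access.

```r
# Hypothetical helper (not part of cansim): validate a date window locally,
# then pass it on to get_cansim_vector() with the arguments named above.
get_vector_window <- function(vectors, start_time, end_time = Sys.Date(),
                              use_ref_date = TRUE) {
  start <- as.Date(start_time, format = "%Y-%m-%d")
  end   <- as.Date(end_time, format = "%Y-%m-%d")
  if (is.na(start) || is.na(end)) stop("Dates must be in YYYY-MM-DD format.")
  if (end < start) stop("end_time must not be earlier than start_time.")
  cansim::get_cansim_vector(vectors,
                            start_time = format(start),
                            end_time = format(end),
                            use_ref_date = use_ref_date)
}

# Reference periods from January 2015 to December 2017:
# get_vector_window("v41690973", "2015-01-01", "2017-12-01")
```

Passing `use_ref_date = FALSE` through the same wrapper would select by release date instead of reference period.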
Several vector codes can be combined into a character vector with c() in order to retrieve multiple series at the same time. For example, provincial seasonally-unadjusted CPI values have their own vector codes. The vector code for British Columbia all-items CPI is v41692462.
The code below retrieves monthly Canadian and BC CPI values from January 2017 onwards. Monthly data series are always dated to the first day of the month.
vectors <- c("v41690973","v41692462")
get_cansim_vector(vectors, "2017-01-01")
#> Accessing CANSIM NDM vectors from Statistics Canada
#> # A tibble: 186 × 14
#> DECIMALS VALUE REF_DATE releaseTime SYMBOL frequencyCode SCALAR_ID COORDINATE
#> <int> <dbl> <chr> <chr> <int> <int> <int> <chr>
#> 1 1 130. 2017-01… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 2 1 130. 2017-02… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 3 1 130. 2017-03… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 4 1 130. 2017-04… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 5 1 130. 2017-05… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 6 1 130. 2017-06… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 7 1 130. 2017-07… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 8 1 130. 2017-08… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 9 1 131. 2017-09… 2021-07-28… 0 6 0 2.2.0.0.0…
#> 10 1 131. 2017-10… 2021-07-28… 0 6 0 2.2.0.0.0…
#> # ℹ 176 more rows
#> # ℹ 6 more variables: VECTOR <chr>, cansimTableNumber <chr>, val_norm <dbl>,
#> # Date <date>, GEO <fct>, `Products and product groups` <fct>
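The result above stacks both series in long format, with the VECTOR column telling them apart. The sketch below shows one way to separate such a result using base R. It runs on a small synthetic data frame so that it works offline: the VALUE numbers are made-up placeholders, not CPI data, and only the column names are taken from the output above.

```r
# Synthetic stand-in for a two-vector result; the VALUE numbers are made-up
# placeholders, not CPI data. Column names follow the output shown above.
toy <- data.frame(
  VECTOR = rep(c("v41690973", "v41692462"), each = 3),
  Date   = rep(as.Date(c("2017-01-01", "2017-02-01", "2017-03-01")), times = 2),
  VALUE  = c(100, 101, 102, 200, 201, 202)
)

# Rows per series: each vector contributes one row per period.
table(toy$VECTOR)

# One data frame per vector code, e.g. for separate analysis.
by_vector <- split(toy, toy$VECTOR)
by_vector[["v41692462"]]$VALUE
```

The same `split()` call should work on a real result, since it only relies on the VECTOR column.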
Some vectors extend backwards for a significant number of periods
that may not be of interest. get_cansim_vector_for_latest_periods()
is a wrapper around get_cansim_vector that takes a periods input
instead of arguments for start_time and end_time, and provides data
for the selected vector(s) for the last n periods for which data is
available, irrespective of dates.
get_cansim_vector_for_latest_periods("v41690973", periods = 60)
#> Accessing CANSIM NDM vectors from Statistics Canada
#> # A tibble: 60 × 14
#> DECIMALS VALUE REF_DATE releaseTime SYMBOL frequencyCode SCALAR_ID COORDINATE
#> <int> <dbl> <chr> <chr> <int> <int> <int> <chr>
#> 1 1 137. 2019-10… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 2 1 136. 2019-11… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 3 1 136. 2019-12… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 4 1 137. 2020-01… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 5 1 137. 2020-02… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 6 1 137. 2020-03… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 7 1 136. 2020-04… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 8 1 136. 2020-05… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 9 1 137. 2020-06… 2021-09-15… 0 6 0 2.2.0.0.0…
#> 10 1 137. 2020-07… 2021-09-15… 0 6 0 2.2.0.0.0…
#> # ℹ 50 more rows
#> # ℹ 6 more variables: VECTOR <chr>, cansimTableNumber <chr>, val_norm <dbl>,
#> # Date <date>, GEO <fct>, `Products and product groups` <fct>
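As a rough sanity check on what `periods = 60` means for a monthly series, the base R sketch below lays out 60 consecutive months starting from the first reference period in the output above. It assumes the series has no gaps and that each period is dated to the first day of the month.

```r
# 60 consecutive monthly periods starting at the first REF_DATE shown above,
# assuming the series has no gaps.
periods <- seq(as.Date("2019-10-01"), by = "month", length.out = 60)

length(periods)  # number of periods requested
range(periods)   # first and last period in the window
```

Under those assumptions, 60 monthly periods cover five years, here October 2019 through September 2024. For a quarterly or annual vector the same `periods` value would reach much further back.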
In these examples, we have used v41690973 for Canada and
v41692462 for BC. These codes are hard to remember and tedious to
work with. Both vector retrieval functions in the cansim package
allow for named vector extraction: give each vector code a
user-chosen name and pass the named vector to a get_* call. This is
useful when working with table and vector codes that carry no
information in their name and are easy to lose track of.
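The naming itself is plain base R and can be tried without any network access. In this sketch, `label_for` is a hypothetical lookup added for illustration. The labels are arbitrary strings chosen by the user.

```r
# A named character vector: names are user-chosen labels, values are the codes.
vectors <- c("Canadian CPI" = "v41690973",
             "BC CPI"       = "v41692462")

names(vectors)    # the labels
unname(vectors)   # the bare vector codes

# Reverse lookup from code to label, handy when joining labels onto results.
label_for <- setNames(names(vectors), vectors)
label_for[["v41692462"]]
```

The last line returns "BC CPI", the label attached to the British Columbia vector code.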
Data retrieved as vectors also gains an additional val_norm column
with normalized values. This quick example passes a named vector of
two codes and a starting date, relies on the values being converted
(“normalized”) on the fly, and prepares a simple ggplot2 graphic.
vectors <- c("Canadian CPI" = "v41690973",
             "BC CPI" = "v41692462")
data <- get_cansim_vector(vectors, "2010-01-01")
#> Accessing CANSIM NDM vectors from Statistics Canada
library(ggplot2)
ggplot(data, aes(x = Date, y = val_norm, color = label)) +
  geom_line() +
  labs(title = "Consumer Price Index, January 2010 to September 2018",
       subtitle = "Seasonally-unadjusted, all-items (2002 = 100)",
       caption = paste0("CANSIM vectors ", paste0(vectors, collapse = ", ")),
       x = "", y = "", color = "")
To access metadata for vectors, we can use the get_cansim_vector_info
call:
get_cansim_vector_info(vectors)
#> # A tibble: 2 × 10
#> DECIMALS VECTOR table COORDINATE title_en title_fr UOM frequencyCode
#> <int> <chr> <chr> <chr> <chr> <chr> <int> <int>
#> 1 1 v41690973 18-10-0004 2.2.0.0.0… Canada;… Canada;… 17 6
#> 2 1 v41692462 18-10-0004 26.2.0.0.… British… Colombi… 17 6
#> # ℹ 2 more variables: SCALAR_ID <int>, title <chr>