R/vector_discovery.R
find_census_vectors.Rd
Query the available list of Census vectors based on their label and return
details including the vector code. The default search behaviour expects an exact match,
but keyword or semantic searches can be used instead by setting query_type = 'keyword'
or query_type = 'semantic'. Keyword search is useful for exploring
Census vectors by broad themes like "income" or "language". It splits
the query into unigrams and returns Census vectors with matching words, ranked by the
number of matches. Semantic search is designed for more precise searches while allowing
room for errors in spelling or phrasing, and for finding closely related vectors. It
splits the query into n-grams and measures string distance using a generalized
Levenshtein distance approach.
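The keyword and semantic strategies described above can be illustrated with a small base-R sketch. This is not the cancensus implementation: the helper names (`tokens`, `keyword_search`, `semantic_search`), the toy labels, the word n-gram windows sized to the query, and the `max_dist` cutoff are all assumptions made for illustration. The distance step uses `utils::adist()`, which computes a generalized Levenshtein distance.

```r
# Sketch only: hypothetical helpers, not the cancensus implementation.
# Toy labels standing in for a Census vector list (illustrative, not real data).
labels <- c(
  "Median total income",
  "Median after-tax income",
  "Number of after-tax income recipients",
  "Knowledge of official languages",
  "Median commuting duration"
)

# Lower-case a string and split it into alphanumeric words.
tokens <- function(x) {
  w <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
  w[nzchar(w)]
}

# Keyword-style search: split the query into unigrams and rank labels
# by how many of those words each label contains.
keyword_search <- function(query, labels) {
  words <- unique(tokens(query))
  hits <- vapply(labels, function(lab) sum(words %in% tokens(lab)),
                 integer(1), USE.NAMES = FALSE)
  keep <- hits > 0
  labels[keep][order(-hits[keep])]
}

# Semantic-style search: compare the query with every word n-gram of each
# label (n = number of query words) and keep labels whose closest n-gram
# is within `max_dist` edits, ordered by that distance.
semantic_search <- function(query, labels, max_dist = 2) {
  q_words <- tokens(query)
  q <- paste(q_words, collapse = " ")
  n <- length(q_words)
  best <- vapply(labels, function(lab) {
    w <- tokens(lab)
    if (length(w) <= n) {
      return(as.numeric(utils::adist(q, paste(w, collapse = " "))))
    }
    grams <- vapply(seq_len(length(w) - n + 1),
                    function(i) paste(w[i:(i + n - 1)], collapse = " "),
                    character(1))
    min(as.numeric(utils::adist(q, grams)))
  }, numeric(1), USE.NAMES = FALSE)
  keep <- best <= max_dist
  labels[keep][order(best[keep])]
}

keyword_search("after tax income", labels)
semantic_search("after tax incme", labels)
```

Comparing against word windows rather than whole labels lets a slightly misspelled query such as 'after tax incme' sit within one edit of a longer label.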
Some Census vectors return population counts segmented by Female and Male
populations, in addition to a total aggregate. By default, queries return matches
for the Total aggregation, but they can optionally return only the Female or Male
aggregations by adding type = 'female' or type = 'male' as a parameter.
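As a rough illustration of what the `type` argument does to a result table, the sketch below filters a hypothetical results data frame on its `type` column. The columns mirror the vector/type/label columns in the example output, but the vector codes are placeholders, and the helper `filter_type()` is hypothetical rather than part of cancensus.

```r
# Hypothetical results table; codes and labels are placeholders.
results <- data.frame(
  vector = c("v_X_1", "v_X_2", "v_X_3"),
  type   = factor(c("Total", "Male", "Female"),
                  levels = c("Total", "Male", "Female")),
  label  = "Example label",
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows of the requested aggregation type.
# 'all' (the first choice, used when `type` is not supplied) keeps every row.
filter_type <- function(results, type = c("all", "total", "male", "female")) {
  type <- match.arg(type)
  if (type == "all") return(results)
  results[tolower(as.character(results$type)) == type, , drop = FALSE]
}

filter_type(results, "female")
```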
find_census_vectors(query, dataset, type = "all", query_type = "exact", ...)
query: The term or phrase to search for, e.g. 'Oji-cree'.
Search queries are case insensitive.
dataset: The dataset to query for available vectors, e.g. 'CA16'.
To see a list of available datasets, use list_census_datasets().
type: One of 'all', 'total', 'male' or 'female'.
If specified, only aggregations of the specified `type` are returned. By default, only
the 'total' aggregation will be returned.
query_type: One of 'exact', 'semantic' or 'keyword'.
By default, exact string matching is used, but the alternatives may be better
options in some cases. See the description section for more details on query types.
...: Other arguments passed to internal functions.
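The documented choices for `query_type`, together with the note that queries are case insensitive, can be sketched with `match.arg()` and a case-folded comparison. The helper `exact_match()` and its toy labels are hypothetical, and treating 'exact' as a literal, case-insensitive substring match is an assumption rather than a description of how find_census_vectors() matches.

```r
# Hypothetical toy labels.
labels <- c("Oji-Cree", "Ojibway", "Cree")

# Hypothetical helper: validate `query_type` against the documented choices
# (match.arg() errors on anything else, and defaults to the first choice),
# then do a case-insensitive literal substring match for 'exact'.
exact_match <- function(query, labels,
                        query_type = c("exact", "semantic", "keyword")) {
  query_type <- match.arg(query_type)
  if (query_type != "exact") {
    stop("this sketch only implements query_type = 'exact'")
  }
  labels[grepl(tolower(query), tolower(labels), fixed = TRUE)]
}

exact_match("OJI-CREE", labels)
```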
find_census_vectors('Oji-cree', dataset = 'CA16', type = 'total', query_type = 'exact')
#> # A tibble: 4 × 4
#> vector type label details
#> <chr> <fct> <chr> <chr>
#> 1 v_CA16_626 Total Oji-Cree Language; Total - Mother tongue for the total popu…
#> 2 v_CA16_1433 Total Oji-Cree Language; Total - Language spoken most often at ho…
#> 3 v_CA16_2676 Total Oji-Cree 25% Data; Total - Knowledge of languages for the p…
#> 4 v_CA16_5930 Total Oji-Cree 25% Data; Work; Total - Language used most often a…
find_census_vectors('commuting duration', dataset = 'CA11', type = 'female', query_type = 'keyword')
#> # A tibble: 2 × 4
#> vector type label details
#> <chr> <fct> <chr> <chr>
#> 1 v_CA11N_2214 Female Total employed population aged 15 years and over … CA 201…
#> 2 v_CA11N_2217 Female Median commuting duration CA 201…
find_census_vectors('after tax income', dataset = 'CA16', type = 'total', query_type = 'semantic')
#> Multiple possible matches. Results ordered by closeness.
#> # A tibble: 56 × 4
#> vector type label details
#> <chr> <fct> <chr> <chr>
#> 1 v_CA16_2210 Total Number of after-tax income recipients aged 15 year… Income…
#> 2 v_CA16_2213 Total Median after-tax income in 2015 among recipients (… Income…
#> 3 v_CA16_2306 Total Percentage with after-tax income Income…
#> 4 v_CA16_2297 Total Total - After-tax income groups in 2015 for the po… Income…
#> 5 v_CA16_2300 Total Without after-tax income Income…
#> 6 v_CA16_2303 Total With after-tax income Income…
#> 7 v_CA16_2309 Total Under $10,000 (including loss) Income…
#> 8 v_CA16_2312 Total $10,000 to $19,999 Income…
#> 9 v_CA16_2315 Total $20,000 to $29,999 Income…
#> 10 v_CA16_2318 Total $30,000 to $39,999 Income…
#> # ℹ 46 more rows
if (FALSE) {
# This incorrect spelling will return a warning that no match was found,
# but will suggest trying semantic or keyword search.
find_census_vectors('Ojibwey', dataset = 'CA16', type = 'total')
# This will find near matches as well
find_census_vectors('Ojibwey', dataset = 'CA16', type = 'total', query_type = "semantic")
find_census_vectors('commute duration', dataset = 'CA16', type = 'female', query_type = 'keyword')
find_census_vectors('commute duration', dataset = 'CA11', type = 'all', query_type = 'keyword')
find_census_vectors('ukrainian origin', dataset = 'CA16', type = 'total', query_type = 'keyword')
}
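The commented examples above note that a misspelled exact query produces a warning suggesting semantic or keyword search. A caller could automate that retry with a small wrapper. `search_with_fallback()` and `fake_search()` are hypothetical, and the sketch assumes that an unsuccessful search returns `NULL` or a zero-row result, which is an assumption about the return value.

```r
# Hypothetical wrapper: try an exact search first and, if it returns nothing,
# retry with semantic search. `search_fn` stands in for a function taking the
# same arguments as find_census_vectors(), so the sketch runs offline.
search_with_fallback <- function(query, dataset, search_fn, ...) {
  # Silence the "no match" warning from the exact attempt.
  res <- suppressWarnings(
    search_fn(query, dataset = dataset, query_type = "exact", ...)
  )
  if (is.null(res) || NROW(res) == 0) {
    res <- search_fn(query, dataset = dataset, query_type = "semantic", ...)
  }
  res
}

# Hypothetical stand-in search function used only for illustration.
fake_search <- function(query, dataset, query_type = "exact", ...) {
  if (query_type == "exact") {
    warning("No exact match found.")
    return(data.frame(vector = character(0), label = character(0)))
  }
  data.frame(vector = "v_FAKE_1", label = "Ojibway", stringsAsFactors = FALSE)
}

search_with_fallback("Ojibwey", "CA16", fake_search)
```

With the package loaded, the real function could be passed in the same way, e.g. search_with_fallback('Ojibwey', 'CA16', find_census_vectors, type = 'total').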