Create scalar-valued summary features for a dataset from feature functions.
features(.tbl, .var, features, ...)
features_at(.tbl, .vars, features, ...)
features_all(.tbl, features, ...)
features_if(.tbl, .predicate, features, ...)
.tbl: A dataset.
.var: An expression that produces a vector from which the features are computed.
features: A list of functions (or lambda expressions) for the features to compute. feature_set() is a useful helper for building sets of features.
...: Additional arguments to be passed to each feature. An argument is only passed to features that name it in their formal arguments (base::formals()), and never through a feature's own ... argument. For example, passing na.rm = TRUE works for stats::var(), but not for base::mean(), whose formals are only x and ... . To control the inputs to each function more precisely, use lambdas in the list of features, such as ~ mean(., na.rm = TRUE).
.vars: A tidyselect compatible selection of the column(s) to compute features on.
.predicate: A predicate function (or lambda expression) to be applied to the columns, or a logical vector. The variables for which .predicate is or returns TRUE are selected.
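The rule for additional arguments can be illustrated in base R. The sketch below is not the package's implementation: call_with_matching_args() is a hypothetical helper, and x is made-up data. It forwards an extra argument only when the target function names it in its formals, which is the behaviour described for na.rm above.

```r
# Hypothetical helper (not part of fabletools): forward only those extra
# arguments that a function names explicitly in its formals.
call_with_matching_args <- function(fn, x, ...) {
  dots <- list(...)
  keep <- names(dots) %in% names(formals(fn))
  do.call(fn, c(list(x), dots[keep]))
}

x <- c(1, 2, NA, 4)

# stats::var() lists na.rm in its formals, so the argument is forwarded.
var_has_na_rm <- "na.rm" %in% names(formals(stats::var))
var_result <- call_with_matching_args(stats::var, x, na.rm = TRUE)

# base::mean() has only x and ..., so na.rm is dropped and the NA propagates.
mean_has_na_rm <- "na.rm" %in% names(formals(base::mean))
mean_result <- call_with_matching_args(base::mean, x, na.rm = TRUE)

# A wrapper that sets the argument itself does not depend on this matching.
mean_no_na <- function(v) mean(v, na.rm = TRUE)
mean_no_na_result <- mean_no_na(x)
```

Wrapping the call, as mean_no_na() does here, plays the same role as the lambda form ~ mean(., na.rm = TRUE): the argument is fixed inside the feature rather than routed through the extra arguments.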
Lists of available features can be found in the following pages:
# Provide a set of functions as a named list to features.
library(fabletools)
library(tsibble)
tourism %>%
  features(Trips, features = list(mean = mean, sd = sd))
#> # A tibble: 304 × 5
#> Region State Purpose mean sd
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Adelaide South Australia Business 156. 35.6
#> 2 Adelaide South Australia Holiday 157. 27.1
#> 3 Adelaide South Australia Other 56.6 17.3
#> 4 Adelaide South Australia Visiting 205. 32.5
#> 5 Adelaide Hills South Australia Business 2.66 4.30
#> 6 Adelaide Hills South Australia Holiday 10.5 6.37
#> 7 Adelaide Hills South Australia Other 1.40 1.65
#> 8 Adelaide Hills South Australia Visiting 14.2 10.7
#> 9 Alice Springs Northern Territory Business 14.6 7.20
#> 10 Alice Springs Northern Territory Holiday 31.9 18.1
#> # ℹ 294 more rows
# Search and use useful features with `feature_set()`.
library(feasts)
tourism %>%
  features(Trips, features = feature_set(tags = "autocorrelation"))
#> # A tibble: 304 × 14
#> Region State Purpose acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Adelaide Sout… Busine… 0.0333 0.131 -0.520 0.463 -0.676
#> 2 Adelaide Sout… Holiday 0.0456 0.372 -0.343 0.614 -0.487
#> 3 Adelaide Sout… Other 0.517 1.15 -0.409 0.383 -0.675
#> 4 Adelaide Sout… Visiti… 0.0684 0.294 -0.394 0.452 -0.518
#> 5 Adelaide Hills Sout… Busine… 0.0709 0.134 -0.580 0.415 -0.750
#> 6 Adelaide Hills Sout… Holiday 0.131 0.313 -0.536 0.500 -0.716
#> 7 Adelaide Hills Sout… Other 0.261 0.330 -0.253 0.317 -0.457
#> 8 Adelaide Hills Sout… Visiti… 0.139 0.117 -0.472 0.239 -0.626
#> 9 Alice Springs Nort… Busine… 0.217 0.367 -0.500 0.381 -0.658
#> 10 Alice Springs Nort… Holiday -0.00660 2.11 -0.153 2.11 -0.274
#> # ℹ 294 more rows
#> # ℹ 6 more variables: diff2_acf10 <dbl>, season_acf1 <dbl>, pacf5 <dbl>,
#> # diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>
# Best practice is to use anonymous functions for additional arguments.
tourism %>%
  features(Trips, list(~ quantile(., probs = seq(0, 1, by = 0.2))))
#> # A tibble: 304 × 9
#> Region State Purpose `0%` `20%` `40%` `60%` `80%` `100%`
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Adelaide South Aus… Busine… 68.7 127. 145. 160. 182. 242.
#> 2 Adelaide South Aus… Holiday 108. 133. 148. 160. 181. 224.
#> 3 Adelaide South Aus… Other 25.9 42.3 50.6 55.9 70.4 107.
#> 4 Adelaide South Aus… Visiti… 137. 177. 194. 214. 237. 270.
#> 5 Adelaide Hills South Aus… Busine… 0 0 0.763 1.79 4.56 28.6
#> 6 Adelaide Hills South Aus… Holiday 0 5.29 7.51 11.3 15.5 35.8
#> 7 Adelaide Hills South Aus… Other 0 0 0.685 1.18 2.44 8.95
#> 8 Adelaide Hills South Aus… Visiti… 0.778 8.15 10.2 14.1 19.3 81.1
#> 9 Alice Springs Northern … Busine… 1.01 8.41 11.3 15.8 21.5 34.1
#> 10 Alice Springs Northern … Holiday 2.81 14.6 24.1 36.2 46.6 76.5
#> # ℹ 294 more rows
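In the quantile example, the feature columns `0%` to `100%` appear to take their names from the named vector that quantile() returns, one element per requested probability. A minimal base R check of that vector, using a made-up x for illustration:

```r
# x is a made-up vector used only for illustration.
x <- c(3, 8, 1, 9, 4, 7)

# The same call used inside the lambda in the example above.
q <- quantile(x, probs = seq(0, 1, by = 0.2))

# One name per requested probability; these match the column names above.
names(q)
length(q)
```

A feature function that returns a named vector of length k would therefore be expected to contribute k columns, so it helps to pick return names that read well as column headers.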