Create scalar-valued summary features for a dataset from feature functions.

features(.tbl, .var, features, ...)

features_at(.tbl, .vars, features, ...)

features_all(.tbl, features, ...)

features_if(.tbl, .predicate, features, ...)

Arguments

.tbl

A dataset

.var, .vars

The variable(s) to compute features on

features

A list of functions (or lambda expressions) for the features to compute. feature_set() is a useful helper for building sets of features.

...

Additional arguments to be passed to each feature. An argument is only passed to features that name it in their formal arguments (base::formals()); it is not passed to features that would only accept it via their .... For example, passing na.rm = TRUE will work for stats::var(), but not for base::mean(), as its formals are x and .... To pass inputs to each function more precisely, use lambdas in the list of features (~ mean(., na.rm = TRUE)).
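The matching rule described for ... can be checked directly in base R. The following is a minimal sketch: accepts_arg() is a hypothetical helper written for illustration and is not part of the package, nor a claim about how the package implements the check.

```r
# Hypothetical helper (not part of the package): does a function name
# the given argument among its formal arguments?
accepts_arg <- function(fn, arg) {
  arg %in% names(formals(fn))
}

# stats::var() names na.rm in its formals, so na.rm = TRUE would reach it.
accepts_arg(stats::var, "na.rm")
#> [1] TRUE

# base::mean() has formals x and ..., so na.rm is not among them.
accepts_arg(base::mean, "na.rm")
#> [1] FALSE

# A function that sets the argument itself does not rely on formals matching.
mean_na_rm <- function(x) mean(x, na.rm = TRUE)
mean_na_rm(c(1, NA, 3))
#> [1] 2
```

A lambda such as ~ mean(., na.rm = TRUE) in the features list plays the same role as mean_na_rm() here: the argument is fixed inside the feature, so nothing depends on what is passed through ....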

.predicate

A predicate function (or lambda expression) to be applied to the columns, or a logical vector. The variables for which .predicate is or returns TRUE are selected.

Details

Lists of available features can be found in the following pages:

See also

Examples

# Provide a set of functions as a named list to features.
library(tsibble)
tourism %>%
  features(Trips, features = list(mean = mean, sd = sd))
#> # A tibble: 304 x 5
#>    Region         State              Purpose    mean    sd
#>    <chr>          <chr>              <chr>     <dbl> <dbl>
#>  1 Adelaide       South Australia    Business   156.  35.6
#>  2 Adelaide       South Australia    Holiday    157.  27.1
#>  3 Adelaide       South Australia    Other      56.6  17.3
#>  4 Adelaide       South Australia    Visiting   205.  32.5
#>  5 Adelaide Hills South Australia    Business   2.66  4.30
#>  6 Adelaide Hills South Australia    Holiday    10.5  6.37
#>  7 Adelaide Hills South Australia    Other      1.40  1.65
#>  8 Adelaide Hills South Australia    Visiting   14.2  10.7
#>  9 Alice Springs  Northern Territory Business   14.6  7.20
#> 10 Alice Springs  Northern Territory Holiday    31.9  18.1
#> # … with 294 more rows

# Search and use useful features with `feature_set()`.
if(requireNamespace("feasts")) library(feasts)
tourism %>%
  features(Trips, features = feature_set(tags = "autocorrelation"))
#> # A tibble: 304 x 14
#>    Region    State      Purpose     acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1
#>    <chr>     <chr>      <chr>      <dbl> <dbl>      <dbl>       <dbl>      <dbl>
#>  1 Adelaide  South Aus… Busine…   0.0333 0.131     -0.520       0.463     -0.676
#>  2 Adelaide  South Aus… Holiday   0.0456 0.372     -0.343       0.614     -0.487
#>  3 Adelaide  South Aus… Other      0.517  1.15     -0.409       0.383     -0.675
#>  4 Adelaide  South Aus… Visiti…   0.0684 0.294     -0.394       0.452     -0.518
#>  5 Adelaide… South Aus… Busine…   0.0709 0.134     -0.580       0.415     -0.750
#>  6 Adelaide… South Aus… Holiday    0.131 0.313     -0.536       0.500     -0.716
#>  7 Adelaide… South Aus… Other      0.261 0.330     -0.253       0.317     -0.457
#>  8 Adelaide… South Aus… Visiti…    0.139 0.117     -0.472       0.239     -0.626
#>  9 Alice Sp… Northern … Busine…    0.217 0.367     -0.500       0.381     -0.658
#> 10 Alice Sp… Northern … Holiday -0.00660  2.11     -0.153        2.11     -0.274
#> # … with 294 more rows, and 6 more variables: diff2_acf10 <dbl>,
#> #   season_acf1 <dbl>, pacf5 <dbl>, diff1_pacf5 <dbl>, diff2_pacf5 <dbl>,
#> #   season_pacf <dbl>

# Best practice is to use anonymous functions for additional arguments
tourism %>%
  features(Trips, list(~ quantile(., probs=seq(0,1,by=0.2))))
#> # A tibble: 304 x 9
#>    Region      State         Purpose  `0%` `20%` `40%` `60%` `80%` `100%`
#>    <chr>       <chr>         <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#>  1 Adelaide    South Austra… Busine…  68.7  127.  145.  160.  182.   242.
#>  2 Adelaide    South Austra… Holiday  108.  133.  148.  160.  181.   224.
#>  3 Adelaide    South Austra… Other    25.9  42.3  50.6  55.9  70.4   107.
#>  4 Adelaide    South Austra… Visiti…  137.  177.  194.  214.  237.   270.
#>  5 Adelaide H… South Austra… Busine…     0     0 0.763  1.79  4.56   28.6
#>  6 Adelaide H… South Austra… Holiday     0  5.29  7.51  11.3  15.5   35.8
#>  7 Adelaide H… South Austra… Other       0     0 0.685  1.18  2.44   8.95
#>  8 Adelaide H… South Austra… Visiti… 0.778  8.15  10.2  14.1  19.3   81.1
#>  9 Alice Spri… Northern Ter… Busine…  1.01  8.41  11.3  15.8  21.5   34.1
#> 10 Alice Spri… Northern Ter… Holiday  2.81  14.6  24.1  36.2  46.6   76.5
#> # … with 294 more rows