February 2022 list of packages of interest
- Introduction
- splitTools: Tools for Data Splitting
- matlab2r: Translation Layer from MATLAB to R
- grafify: Easy Graphs for Data Visualisation and Linear Models for ANOVA
- DiagrammeR: Graph/Network Visualization
- optedr: Calculating Optimal and D-Augmented Designs
- ReDaMoR: Relational Data Modeler
- gridpattern: 'grid' Pattern Grobs
- PairViz: Visualization using Graph Traversal
- explore: Simplifies Exploratory Data Analysis
- outForest: Multivariate Outlier Detection and Replacement
Introduction
Each month I describe the packages that I have discovered or rediscovered, along with the ones I have used most. I start with the packages I use in my work, then move on to the ones I would like to try or have not yet had time to use, whether for work or for fun.
Each card is organised as follows:
Name of the package: short description
mytags: #example tag
links
[cran package link]
[cran vignette link]
[github link]
description from the author/vignette
mynotes
splitTools: Tools for Data Splitting
mytags: #data splitting
links
[cran package link] https://cran.r-project.org/package=splitTools
[cran vignette link] https://cran.r-project.org/web/packages/splitTools/vignettes/splitTools.html
description from the author/vignette
Fast, lightweight toolkit for data splitting. Data sets can be partitioned into disjoint groups (e.g. into training, validation, and test) or into (repeated) k-folds for subsequent cross-validation. Besides basic splits, the package supports stratified, grouped as well as blocked splitting. Furthermore, cross-validation folds for time series data can be created. See e.g. Hastie et al. (2001) doi:10.1007/978-0-387-84858-7 for the basic background on data partitioning and cross-validation.
mynotes
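To make the description concrete, here is a minimal sketch of the two main use cases: a train/validation/test partition and cross-validation folds. It assumes the built-in iris data and a plain linear model, neither of which comes from the package documentation; they are only there for illustration.

```r
library(splitTools)

# Stratified 60/20/20 split on the response.
# The result is a named list of row indices.
inds <- partition(iris$Species,
                  p = c(train = 0.6, valid = 0.2, test = 0.2),
                  seed = 1)
train <- iris[inds$train, ]
valid <- iris[inds$valid, ]
test  <- iris[inds$test, ]

# Five stratified folds; each element holds the in-sample (training) rows.
folds <- create_folds(iris$Species, k = 5, seed = 1)

# Illustrative use: out-of-fold RMSE of a simple linear model.
cv_rmse <- sapply(folds, function(idx) {
  fit  <- lm(Sepal.Length ~ ., data = iris[idx, ])
  pred <- predict(fit, newdata = iris[-idx, ])
  sqrt(mean((iris$Sepal.Length[-idx] - pred)^2))
})
cv_rmse
```

The `type` argument of both functions switches between the stratified (default), basic, grouped and blocked splits mentioned in the description.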
matlab2r: Translation Layer from MATLAB to R
mytags: #R #Matlab
links
[cran package link] https://cran.r-project.org/package=matlab2r
description from the author/vignette
Allows users familiar with MATLAB to use MATLAB-named functions in R. Several basic MATLAB functions are written in this package to mimic the behavior of their original counterparts, with more to come as this package grows.
mynotes
grafify: Easy Graphs for Data Visualisation and Linear Models for ANOVA
mytags: #multivariate #Inference #tests #statistics
links
[cran package link] https://cran.r-project.org/package=energy
description from the author/vignette
E-statistics (energy) tests and statistics for multivariate and univariate inference, including distance correlation, one-sample, two-sample, and multi-sample tests for comparing multivariate distributions, are implemented. Measuring and testing multivariate independence based on distance correlation, partial distance correlation, multivariate goodness-of-fit tests, k-groups and hierarchical clustering based on energy distance, testing for multivariate normality, distance components (disco) for non-parametric analysis of structured data, and other energy statistics/methods are implemented.
mynotes
DiagrammeR: Graph/Network Visualization
mytags: #graph #networks
links
[cran package link] https://cran.r-project.org/package=DiagrammeR
description from the author/vignette
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
mynotes
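A small sketch of the stepwise workflow described above, assuming a toy graph of three nodes chained by two edges (the graph itself is made up for illustration):

```r
library(DiagrammeR)

g <- create_graph()                  # empty directed graph
g <- add_n_nodes(g, n = 3)           # nodes receive ids 1, 2, 3
g <- add_edge(g, from = 1, to = 2)   # add edges one at a time
g <- add_edge(g, from = 2, to = 3)

count_nodes(g)   # number of nodes in the graph
count_edges(g)   # number of edges in the graph
get_node_df(g)   # node table with ids, types and labels

# Rendering produces an HTML widget, so only do it in an interactive session.
if (interactive()) render_graph(g)
```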
optedr: Calculating Optimal and D-Augmented Designs
mytags: #DoE #Chemometrics #optimal-design
links
[cran package link] https://cran.r-project.org/package=optedr
description from the author/vignette
Calculates D-, Ds-, A- and I-optimal designs for non-linear models, via an implementation of the cocktail algorithm (Yu, 2011, doi:10.1007/s11222-010-9183-2). Compares designs via their efficiency, and D-augments any design with a controlled efficiency. An efficient rounding function has been provided to transform approximate designs to exact designs.
mynotes
ReDaMoR: Relational Data Modeler
mytags: #database #relational #data
links
[cran package link] https://cran.r-project.org/package=ReDaMoR
[vignette link] https://cran.r-project.org/web/packages/ReDaMoR/vignettes/ReDaMoR.html
description from the author/vignette
The aim of this package is to manipulate relational data models in R. It provides functions to create, modify and export data models in json format. It also allows importing models created with 'MySQL Workbench' (https://www.mysql.com/products/workbench/). These functions are accessible through a graphical user interface made with 'shiny'. Constraints such as types, keys, uniqueness and mandatory fields are automatically checked and corrected when editing a model. Finally, real data can be confronted to a model to check their compatibility.
gridpattern: 'grid' Pattern Grobs
mytags: #grid #patterns #graphics
links
[cran package link] https://cran.r-project.org/package=gridpattern
[vignette link] https://cran.r-project.org/web/packages/gridpattern/vignettes/developing-patterns.html,
https://cran.r-project.org/web/packages/gridpattern/vignettes/tiling.html
description from the author/vignette
Provides 'grid' grobs that fill in a user-defined area with various patterns. Includes enhanced versions of the geometric and image-based patterns originally contained in the 'ggpattern' package as well as original 'pch', 'polygon_tiling', 'regular_polygon', 'rose', 'text', 'wave', and 'weave' patterns plus support for custom user-defined patterns.
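As a quick illustration of "filling a user-defined area", the sketch below draws a striped square. The corner coordinates and colours are arbitrary choices of mine, not taken from the vignettes.

```r
library(grid)
library(gridpattern)

# Corner coordinates (npc units) of the area to fill: a centred square.
x <- c(0.1, 0.9, 0.9, 0.1)
y <- c(0.1, 0.1, 0.9, 0.9)

# Build the pattern grob without drawing it yet.
stripes <- grid.pattern_stripe(x, y,
                               angle = 45, density = 0.3, spacing = 0.1,
                               fill = c("steelblue", "gold"),
                               draw = FALSE)

# Draw it on a fresh page.
grid.newpage()
grid.draw(stripes)
```

The other patterns listed in the description have matching `grid.pattern_*()` functions that take the same `x`/`y` boundary.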
PairViz: Visualization using Graph Traversal
mytags: #graphs #visualization
links
[cran package link] https://cran.r-project.org/package=PairViz
description from the author/vignette
Improving graphics by ameliorating order effects, using Eulerian tours and Hamiltonian decompositions of graphs. References for the methods presented here are C.B. Hurley and R.W. Oldford (2010) doi:10.1198/jcgs.2010.09136 and C.B. Hurley and R.W. Oldford (2011) doi:10.1007/s00180-011-0229-5.
explore: Simplifies Exploratory Data Analysis
mytags: #graphs #visualization
links
[cran package link] https://cran.r-project.org/package=explore
[vignette link] https://cran.r-project.org/web/packages/explore/vignettes/explore.html,
https://cran.r-project.org/web/packages/explore/vignettes/explore_mtcars.html,
https://cran.r-project.org/web/packages/explore/vignettes/explore_penguins.html,
https://cran.r-project.org/web/packages/explore/vignettes/explore_titanic.html
description from the author/vignette
Interactive data exploration with one line of code or use an easy to remember set of tidy functions for exploratory data analysis. Introduces three main verbs. explore() to graphically explore a variable or table, describe() to describe a variable or table and report() to create an automated report.
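A minimal sketch of the three verbs, assuming the built-in iris data as a stand-in for a real dataset:

```r
library(explore)

# describe(): one row per variable, with type, missing values and basic statistics.
overview <- describe(iris)
overview

# explore(): plot a single variable, optionally split by a target variable.
p1 <- explore(iris, Sepal.Length)
p2 <- explore(iris, Sepal.Length, target = Species)

# Calling explore() on the whole table starts the interactive app,
# so it only makes sense in an interactive session.
if (interactive()) explore(iris)

# report(): writes an HTML report (needs rmarkdown and pandoc), for example:
# report(iris, output_dir = tempdir())
```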
outForest: Multivariate Outlier Detection and Replacement
mytags: #random forest #outliers
links
[cran package link] https://cran.r-project.org/package=outForest
[vignette link] https://cran.r-project.org/web/packages/outForest/vignettes/outForest.html
description from the author/vignette
Provides a random forest based implementation of the method described in Chapter 7.1.2 (Regression model based anomaly detection) of Chandola et al. (2009) doi:10.1145/1541880.1541882. It works as follows: Each numeric variable is regressed onto all other variables by a random forest. If the scaled absolute difference between observed value and out-of-bag prediction of the corresponding random forest is suspiciously large, then a value is considered an outlier. The package offers different options to replace such outliers, e.g. by realistic values found via predictive mean matching. Once the method is trained on a reference data, it can be applied to new data.
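The workflow can be sketched as below. It assumes the iris data with a few artificial outliers injected by the package's own helper; the proportion and seeds are arbitrary.

```r
library(outForest)

# Inject artificial outliers into about 2% of the numeric values.
iris_dirty <- generateOutliers(iris, p = 0.02, seed = 34)

# Regress each numeric variable on all others and flag suspicious values.
# By default, flagged values are replaced via predictive mean matching.
out <- outForest(iris_dirty, seed = 1, verbose = 0)

flagged <- outliers(out)   # one row per flagged cell
flagged

cleaned <- Data(out)       # data with the flagged values replaced
head(cleaned)
```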