Enhance Your Machine Learning Workflow with These 15 R Libraries

R Libraries in 2023

Introduction

In machine learning, Python has gained immense popularity. However, R, with its clean code and versatile functionality, remains an essential tool in any developer’s toolkit. R makes simple tasks easy and also handles complex work such as forecasting and modeling. Today, R is stronger than ever, with an ever-expanding list of supported libraries. In this article, we will explore 15 remarkable R libraries for machine learning that were released in 2023.

fastTopics

The fastTopics package offers fast algorithms for fitting topic models and non-negative matrix factorizations to count data. It exploits the relationship between the multinomial topic model (also known as probabilistic latent semantic indexing) and Poisson non-negative matrix factorization, and it provides tools to compare, annotate, and visualize model fits. These include ‘structure plots’ and methods for identifying the key features of each topic.
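
To make the link between the two model families concrete, here is a minimal sketch in plain Python (standard library only). It does not use or mirror the fastTopics API: the function names, the toy count matrix, and the choice of simple multiplicative updates are all assumptions made for illustration, and the package itself relies on faster optimization methods. The sketch fits a Poisson non-negative matrix factorization to a small count matrix and then rescales the factors into topic proportions and per-topic word probabilities.

```python
import random


def fit_poisson_nmf_sketch(X, k, iters=200, seed=0):
    """Approximate X by L F^T (all entries non-negative) under a Poisson model,
    using simple multiplicative updates. Illustrative only."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    L = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    F = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]

    def ratios():
        # Elementwise X / (L F^T); the tiny constant guards against division by zero.
        return [[X[i][j] / (sum(L[i][t] * F[j][t] for t in range(k)) + 1e-12)
                 for j in range(m)] for i in range(n)]

    for _ in range(iters):
        R = ratios()
        col_f = [sum(F[j][t] for j in range(m)) for t in range(k)]
        L = [[L[i][t] * sum(R[i][j] * F[j][t] for j in range(m)) / col_f[t]
              for t in range(k)] for i in range(n)]
        R = ratios()
        col_l = [sum(L[i][t] for i in range(n)) for t in range(k)]
        F = [[F[j][t] * sum(R[i][j] * L[i][t] for i in range(n)) / col_l[t]
              for t in range(k)] for j in range(m)]
    return L, F


def poisson_nmf_to_topic_model(L, F):
    """Rescale Poisson NMF factors into multinomial topic model parameters."""
    k = len(F[0])
    col_sums = [sum(row[t] for row in F) for t in range(k)]
    # Each column of F becomes a probability distribution over words.
    word_probs = [[row[t] / col_sums[t] for t in range(k)] for row in F]
    # Move the column scale into L, then normalize each document's row.
    scaled = [[row[t] * col_sums[t] for t in range(k)] for row in L]
    topic_props = [[v / sum(row) for v in row] for row in scaled]
    return topic_props, word_probs


# Toy document-by-word count matrix (made up for this example).
counts = [
    [5, 3, 0, 0],
    [4, 4, 1, 0],
    [0, 1, 6, 5],
    [0, 0, 4, 6],
]
L, F = fit_poisson_nmf_sketch(counts, k=2)
topic_props, word_probs = poisson_nmf_to_topic_model(L, F)
```

After the rescaling, each row of topic_props sums to one (a document's mix of topics) and each column of word_probs sums to one (a topic's distribution over words), which is the topic model reading of the same fit.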

Check the documentation for fastTopics here.

metrica

metrica is a comprehensive package featuring over 80 functions designed to evaluate the prediction performance of regression and classification point-forecast models. It offers a diverse toolbox of error metrics, indices, and coefficients, each capturing a different aspect of the agreement between predicted and observed values. Additionally, metrica provides basic, customizable visualization functions built on ggplot2.
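
As a rough illustration of what such error metrics measure, the following plain-Python sketch computes three common ones by hand. The function names and the sign convention for the bias (predicted minus observed) are assumptions chosen for this example and are not taken from the package.

```python
from math import sqrt


def mean_bias_error(obs, pred):
    """Average signed error; positive values mean over-prediction here."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def mean_absolute_error(obs, pred):
    """Average magnitude of the errors, ignoring their sign."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def root_mean_squared_error(obs, pred):
    """Square root of the mean squared error; penalizes large errors more."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Made-up observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(mean_bias_error(observed, predicted))          # 0.0
print(mean_absolute_error(observed, predicted))      # 0.25
print(root_mean_squared_error(observed, predicted))  # sqrt(0.125), about 0.354
```

The example shows why several metrics are useful together: the errors cancel out in the bias, while the absolute and squared metrics still register them.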

Check the documentation for metrica here.

SparseVFC (Sparse Vector Field Consensus for Vector Field Learning)

SparseVFC implements the Sparse Vector Field Consensus (SparseVFC) algorithm for robust vector field learning. The package is largely a translation of the original MATLAB functions, which makes the algorithm available to R users.

Check the documentation for SparseVFC here.

agua

Based on the h2oparsnip package, agua enables users to fit, optimize, and evaluate models via H2O using tidymodels syntax. Its features are accessed through the new parsnip computational engine ‘h2o’. During model fitting, the data is passed directly to the h2o server. During tuning, the data is passed only once, and instructions are given to the h2o.grid() function to process it.

Check the documentation for agua here.

openai

The openai package is an R wrapper for the OpenAI API endpoints. It covers the Engines, Completions, Edits, Files, Fine-tunes, and Embeddings endpoints, as well as the legacy Searches, Classifications, and Answers endpoints. To use the OpenAI API, users need to sign up and obtain an API key. The documentation provides detailed instructions for acquiring the key.

Check the documentation for openai here.

webmorphR

With a specific focus on face stimuli, webmorphR makes the construction of image stimuli more reproducible. Face stimuli used in research often cannot be shared for ethical reasons, but webmorphR allows users to share the recipes for creating them. This encourages generalizability to new faces, because other researchers can apply the same recipe to their own images.

Check the documentation for webmorphR here.

cito

The ‘cito’ package aims to simplify building and training neural networks with standard R syntax. A model can be created and trained in a single line of code, and generic R methods can be used on the resulting object. The package is based on the ‘torch’ framework, so no Python installation is needed.

Check the documentation for cito here.

etree

etree implements Energy Trees, a model for classification and regression with structured and mixed-type data. Functions and graphs are the structured covariates the package currently covers, and they can be combined with traditional covariates in the same model.

Check the documentation for etree here.

mildsvm

mildsvm provides a simple way to learn from weakly supervised, multiple-instance data by training Support Vector Machine (SVM)-based classifiers. In this setting, labels are attached to groups (bags) of instances rather than to individual instances. The package also offers useful functions for building and printing multiple-instance data frames.

Check the documentation for mildsvm here.

aorsf

The aorsf package fits accelerated oblique random survival forests, which are ensembles of decision trees. A decision tree is grown by splitting the training data into two subsets so that observations are more similar within each subset than between them, and the process is repeated on the resulting subsets until a stopping criterion is met. In an oblique tree, each split uses a linear combination of predictors rather than a single predictor.
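
The splitting step described above can be sketched in a few lines of plain Python. This is a generic illustration, not the aorsf implementation: it searches a single numeric feature for the threshold that minimizes the weighted Gini impurity of class labels, whereas aorsf works with survival outcomes and oblique (linear-combination) splits. The function names and toy data are assumptions made for this example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best split on one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, inf) if the feature has a single distinct value.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data: small feature values belong to class 0, large ones to class 1.
feature = [1, 2, 3, 10, 11, 12]
label = [0, 0, 0, 1, 1, 1]
print(best_split(feature, label))  # (6.5, 0.0)
```

A full tree would apply best_split recursively to the left and right subsets until a stopping criterion, such as a minimum node size, is reached.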

Check the documentation for aorsf here.

calibrationband

calibrationband is an R package for assessing the calibration of binary outcome predictions. Authored by Timo Dimitriadis, Alexander Henzi, and Marius Puke, it evaluates the calibration of probabilistic classifiers using confidence bands for monotonic functions. Besides testing the classical goodness-of-fit hypothesis of perfect calibration, the bands support inverted goodness-of-fit tests, whose rejection lets users conclude that a model is sufficiently well calibrated.
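
To give a feel for what a monotone calibration curve is, here is a small plain-Python sketch that estimates one with the pool-adjacent-violators algorithm (isotonic regression). It is only an illustration under stated assumptions: it does not compute the confidence bands that are the point of calibrationband, it does not pool tied predictions, and the function names and toy data are made up for this example.

```python
def pool_adjacent_violators(values):
    """Return the non-decreasing sequence closest to `values` in least squares."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def isotonic_calibration_curve(probs, outcomes):
    """Pair each predicted probability with an estimated event frequency.

    Assumes the predicted probabilities are distinct (ties are not pooled).
    """
    pairs = sorted(zip(probs, outcomes))
    fitted = pool_adjacent_violators([y for _, y in pairs])
    return [(p, f) for (p, _), f in zip(pairs, fitted)]


# Toy predictions and binary outcomes.
predicted = [0.1, 0.3, 0.4, 0.7, 0.9]
observed = [0, 1, 0, 1, 1]
for p, f in isotonic_calibration_curve(predicted, observed):
    print(p, f)
```

For a well-calibrated model the estimated frequencies stay close to the predicted probabilities; a confidence band around the curve, as provided by the package, indicates how much of any gap could be due to chance.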

Check the documentation for calibrationband here.

tidytags

tidytags is designed to make collecting Twitter data more accessible and robust. It retrieves tweet data collected by a Twitter Archiving Google Sheet (TAGS), which gathers tweets continuously based on predefined search criteria and a collection frequency, and it obtains additional metadata from Twitter using the rtweet R package. The package also provides functions that facilitate systematic yet flexible analyses of the collected data.

Check the documentation for tidytags here.

mlim

mlim offers a versatile solution for missing data imputation across various data types, including continuous, binary, multinomial, and ordinal. The package uses automated machine learning to fine-tune the imputation models, including an elastic net (ELNET) algorithm. According to its authors, mlim outperforms other available imputation software on multiple fronts.

Check the documentation for mlim here.

kernelshap

The kernelshap package implements a multidimensional refinement of the Kernel SHAP algorithm. It can calculate Kernel SHAP values exactly, by iterative sampling, or by a hybrid of the two. Whenever sampling is involved, the algorithm iterates until convergence and provides standard errors. With kernelshap, developers can see how much each feature contributes to individual model predictions.
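
For a handful of features, the quantities that Kernel SHAP targets can be computed by brute force, which helps show what the values mean. The plain-Python sketch below enumerates every feature coalition and averages over a background dataset. It is not the package's algorithm or API: the function name, the toy model, and the data are assumptions for illustration, and the cost grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one observation `x`.

    The value of a coalition S is the average prediction when the features
    in S are fixed to their values in `x` and the remaining features are
    taken from each row of `background` in turn.
    """
    p = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            z = [x[j] if j in coalition else row[j] for j in range(p)]
            total += predict(z)
        return total / len(background)

    phi = []
    for i in range(p):
        others = [j for j in range(p) if j != i]
        contribution = 0.0
        for size in range(p):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(z):
    """A hypothetical linear model used only for this example."""
    return 2 * z[0] + 3 * z[1] - z[2]


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
phi = shapley_values(toy_model, x, background)
# For a linear model, each value equals the coefficient times the distance of
# the feature from its background mean: approximately [0.0, 3.0, -2.0].
```

The values add up to the difference between the prediction for x and the average prediction over the background data, which is the property that makes them usable as an additive explanation.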

Check the documentation for kernelshap here.

survex

survex is an R package built on top of DALEX that offers model-agnostic explanations for survival models. It extends the methods described in Explanatory Model Analysis (EMA) and implemented in DALEX to models with functional output, such as predicted survival functions. With survex, users can examine which factors drive a survival model's predictions.

Check the documentation for survex here.

Conclusion

In this article, we have explored 15 remarkable R libraries for machine learning that were released in 2023. These libraries cover a wide range of functionality: topic modeling, regression and classification performance evaluation, vector field learning, H2O integration, OpenAI API access, face stimuli construction, neural network training, energy trees, SVM-based multiple-instance classifiers, oblique random survival forests, calibration assessment, Twitter data collection, missing data imputation, Kernel SHAP values, and survival model explanations.

By incorporating these libraries into their machine learning workflows, developers can enhance their productivity, streamline their processes, and gain valuable insights from their data. Each library serves a distinct purpose and comes with documentation that covers its functionality and usage.