FEDOT: The Ultimate Framework for Automated Machine Learning

If you work in data science or machine learning, you have probably noticed the rise of automated machine learning (AutoML) frameworks and libraries. One framework that has drawn attention is FEDOT. In this article, we’ll look at what FEDOT offers and walk through an example of time series forecasting with it.

What is FEDOT?

FEDOT is an open-source framework for automated machine learning. It lets us customize the modeling pipeline to our needs and uses evolutionary approaches to search for pipelines that fit a real-world problem. It covers classification, regression, clustering, and time series modeling.

Unleashing the Features of FEDOT

FEDOT offers a myriad of modules that enable end-to-end modeling. One of the standout features of this framework is its provision of modules for fundamental processes like data preprocessing, feature engineering, and model optimization. With FEDOT, we can even build comprehensive graphs that outline the procedures used to solve any problem within the framework. Here are some of the notable features offered by FEDOT:

  1. Flexible Architecture: FEDOT’s architecture is highly flexible, allowing us to create machine learning models using different types of data.
  2. Support for Popular Libraries: FEDOT integrates with popular machine learning libraries such as scikit-learn, Keras, and statsmodels, providing us with a wide range of options.
  3. Custom Model Integration: FEDOT goes beyond conventional modeling by enabling the integration of models specific to areas like ODE and PDE into pipelines.
  4. Enhanced Explainability: With FEDOT, we can leverage a variety of models, ultimately increasing the explainability of our modeling procedures.
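
To make the idea of a pipeline expressed as a graph more concrete, here is a minimal sketch in plain Python. It is not FEDOT code: the Node class and the three toy operations are hypothetical names invented for illustration. Two simple forecasting branches feed a node that averages their outputs, which is the kind of structure a graph-based pipeline can express.

```python
from statistics import mean


class Node:
    """One operation in a toy pipeline graph (hypothetical, not a FEDOT class)."""

    def __init__(self, name, func, parents=()):
        self.name = name
        self.func = func
        self.parents = list(parents)

    def run(self, data):
        # A node without parents consumes the raw input; any other node
        # consumes the list of outputs produced by its parents.
        if not self.parents:
            return self.func(data)
        return self.func([parent.run(data) for parent in self.parents])


horizon = 3

# Two primary nodes: repeat the last value, and repeat the overall mean.
naive = Node('naive', lambda xs: [xs[-1]] * horizon)
average = Node('average', lambda xs: [mean(xs)] * horizon)

# One secondary node that averages the two branch forecasts step by step.
ensemble = Node(
    'ensemble',
    lambda outs: [mean(step) for step in zip(*outs)],
    parents=[naive, average],
)

toy_forecast = ensemble.run([2.0, 4.0, 6.0, 8.0])
print(toy_forecast)  # [6.5, 6.5, 6.5]
```

In a real framework the nodes would wrap fitted models and preprocessing steps, but the principle of passing data along the edges of a graph is the same.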

Getting Started with FEDOT

Before we start, we need FEDOT installed on our system. We can install it from PyPI with pip; the leading ! runs the command from a notebook cell and should be dropped in a regular shell:

!pip install fedot

Once the installation is complete, we are ready to dive into the world of machine learning with FEDOT.
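
Because library APIs can change between releases, it can help to record which version was installed. The sketch below uses only the standard library; the helper name installed_version is our own, and it simply returns None when the package is not present.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


fedot_version = installed_version('fedot')
if fedot_version is None:
    print('fedot is not installed in this environment')
else:
    print(f'fedot version: {fedot_version}')
```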

Time Series Modeling with FEDOT

In this section, we’ll explore an example of time series analysis using the FEDOT library. To kick things off, we need to import the required data for our analysis. For this example, we’ll be utilizing traffic data, which can be found here. Let’s proceed with importing the data using the following code:

import pandas as pd
df = pd.read_csv('<data_path>', parse_dates=['datetime'])
df.head(10)

The imported data consists of two variables, namely “datetime” and “value,” which are essential for our time series modeling.
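
Before modeling, it is worth confirming that both columns are present and that the timestamps are in chronological order. The following is a small sketch using only the standard library so that it runs without the dataset; the sample rows are made up, and only the column names come from this article.

```python
import csv
import io
from datetime import datetime

# Made-up rows standing in for the real file; only the column names are real.
SAMPLE_CSV = """datetime,value
2021-01-01 00:00:00,120
2021-01-01 01:00:00,98
2021-01-01 02:00:00,105
"""


def check_series(csv_text):
    """Parse a two-column CSV and report whether timestamps strictly increase."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError('the file contains no data rows')
    missing = {'datetime', 'value'} - set(rows[0].keys())
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    stamps = [datetime.fromisoformat(row['datetime']) for row in rows]
    values = [float(row['value']) for row in rows]
    is_sorted = all(a < b for a, b in zip(stamps, stamps[1:]))
    return values, is_sorted


values, is_sorted = check_series(SAMPLE_CSV)
print(values, is_sorted)
```

With pandas, the same checks can be written against the df object loaded above, for example by inspecting df.columns and df['datetime'].is_monotonic_increasing.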

Now, let’s visualize the imported data by plotting it using the following code:

import matplotlib.pyplot as plt
from pylab import rcParams
rcParams['figure.figsize'] = 18, 7
df.plot('datetime', 'value', c='magenta')
plt.show()

The resulting plot shows how the traffic values change over time. With the data imported and visualized, we can move on to FEDOT’s modules for time series modeling.

Harnessing the Power of FEDOT Modules

To utilize FEDOT’s capabilities for time series modeling, we need to import the required modules. Let’s import the necessary modules from FEDOT using the following code:

from fedot.api.main import Fedot
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
from fedot.core.data.data import InputData
from fedot.core.data.data import train_test_data_setup
from fedot.core.repository.dataset_types import DataTypesEnum

These imports bring in the high-level Fedot API class, the task definitions (Task, TaskTypesEnum, and TsForecastingParams), the InputData container together with the train_test_data_setup splitting helper, and the DataTypesEnum enumeration of data types.

Preparing and Processing the Data

To prepare our data for FEDOT’s modules, we load it into an InputData object and split it into training and test parts. Note that the code below uses the task object, which is defined in the next section, so run that definition first:

input_data = InputData.from_csv_time_series(task, '<data_path>', target_column='value')
train_data, test_data = train_test_data_setup(input_data)

By executing the above code, we load and split our data, ensuring that it is appropriately structured for FEDOT’s modules.
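
For forecasting, a split has to respect time order: the model trains on the earlier part of the series and is evaluated on the final stretch. The sketch below illustrates that idea in plain Python. It assumes the held-out part is the last forecast_length observations, which is our reading of how such a split behaves rather than a copy of FEDOT’s implementation; holdout_split is a hypothetical helper name.

```python
def holdout_split(series, forecast_length):
    """Split a series chronologically: the last `forecast_length` points are held out."""
    if not 0 < forecast_length < len(series):
        raise ValueError('forecast_length must be between 1 and len(series) - 1')
    return series[:-forecast_length], series[-forecast_length:]


# A stand-in series with the same length as the one used in this article.
series = list(range(801))
train, test = holdout_split(series, 144)
print(len(train), len(test))  # 657 144
```

No shuffling takes place, so the test part always lies strictly after the training part.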

Defining the Task and Model

Before we proceed further, let’s check the length of our data using the following code:

print(f'Length of the time series: {len(df)}')

Our series has 801 observations. We define a time series forecasting task with a forecast horizon of 144 steps; this is the task object that the data-loading code above expects:

task = Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=144))

With our task defined, we’re now ready to initiate the modeling process.

Initiating the Modeling Process

To start the modeling process using FEDOT’s API, we’ll initiate the model by executing the following code:

model = Fedot(problem='ts_forecasting', task_params=task.task_params)
chain = model.fit(features=train_data)

The above code starts the modeling process. During fitting, the API searches for a suitable pipeline and tunes its hyperparameters, logging its progress as it goes. The fitted pipeline is returned by fit and stored in chain.
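
To give a feel for what an evolutionary search does, here is a deliberately tiny sketch. It is not FEDOT’s algorithm, which works on whole pipeline graphs; it only mutates a single parameter (the window of a moving-average forecaster) and keeps a candidate when it lowers the validation error. All function names and the sample numbers are made up for illustration.

```python
import random


def moving_average_forecast(history, window, horizon):
    """Forecast `horizon` steps by repeating the mean of the last `window` points."""
    level = sum(history[-window:]) / window
    return [level] * horizon


def validation_error(train, validation, window):
    """Mean absolute error of the moving-average forecast on the validation part."""
    predicted = moving_average_forecast(train, window, len(validation))
    return sum(abs(a - p) for a, p in zip(validation, predicted)) / len(validation)


def evolve_window(train, validation, generations=20, seed=0):
    """Toy mutate-and-select search over the window size."""
    rng = random.Random(seed)
    best = 1
    best_error = validation_error(train, validation, best)
    for _ in range(generations):
        # Mutation: nudge the window by a small random step, keeping it valid.
        candidate = min(len(train), max(1, best + rng.choice([-2, -1, 1, 2])))
        error = validation_error(train, validation, candidate)
        # Selection: keep the candidate only if it improves the error.
        if error < best_error:
            best, best_error = candidate, error
    return best, best_error


train = [10, 12, 11, 13, 12, 14, 13, 15]
validation = [13, 13]
baseline_error = validation_error(train, validation, window=1)
best_window, best_error = evolve_window(train, validation)
print(f'window={best_window}, error={best_error:.3f}, baseline={baseline_error:.3f}')
```

The search can only match or improve on its starting point, because a candidate replaces the current best only when its error is strictly lower.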

Making Predictions with FEDOT

Once the modeling process is complete, we can proceed to make predictions using FEDOT. Let’s generate predictions by executing the following code:

forecast = model.predict(features=test_data)
forecast

The resulting output contains the forecast values produced by the fitted FEDOT model. To judge how good they are, let’s plot them against the actual values.

Visualizing Predictions and Results

To visualize our predictions, we’ll define a function that plots the actual time series values and the predicted values. Additionally, we’ll calculate the mean absolute error (MAE) for our predictions using the following code:

import numpy as np
from sklearn.metrics import mean_absolute_error

traffic = np.array(df['value'])

def display_results(actual_time_series, predicted_values, len_train_data, y_name='Traffic volume'):
    plt.plot(np.arange(0, len(actual_time_series)), actual_time_series, label='Actual values', c='green')
    plt.plot(np.arange(len_train_data, len_train_data + len(predicted_values)), predicted_values, label='Predicted', c='blue')
    plt.plot([len_train_data, len_train_data], [min(actual_time_series), max(actual_time_series)], c='black', linewidth=1)
    plt.ylabel(y_name, fontsize=15)
    plt.xlabel('Time index', fontsize=15)
    plt.legend(fontsize=15, loc='upper left')
    plt.grid()
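    plt.show()

    # Compute the MAE on the held-out part only. This belongs inside the function,
    # where actual_time_series, len_train_data and predicted_values are defined.
    mae_value = mean_absolute_error(actual_time_series[len_train_data:], predicted_values)
    print(f'MAE value: {mae_value}')

# Call the function, assuming train_data.features holds the training series.
display_results(traffic, forecast, len(train_data.features))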

The function display_results takes in the actual time series values, the predicted values, the length of the training data, and the name of the target variable as inputs. It plots the actual values in green, the predicted values in blue, and adds a vertical line to separate the training and test data. It also calculates the MAE value for the predictions.

By visualizing the results, we can gain insights into the performance of our models and evaluate their accuracy.
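
An MAE value is easier to interpret next to a simple baseline. The sketch below computes MAE by hand and compares a forecast with a naive one that repeats the last training value. The numbers are made up for illustration and are not results from the traffic data; the function names are our own.

```python
def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    if len(actual) != len(predicted):
        raise ValueError('actual and predicted must have the same length')
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(train_values, horizon):
    """Baseline that repeats the last observed training value."""
    return [train_values[-1]] * horizon


# Made-up numbers, not taken from the traffic dataset.
train_values = [100, 110, 105]
actual = [108, 112, 104]
model_forecast = [107, 110, 106]

model_mae = mae(actual, model_forecast)
baseline_mae = mae(actual, naive_forecast(train_values, len(actual)))
print(f'model MAE: {model_mae:.3f}, naive baseline MAE: {baseline_mae:.3f}')
```

If the model’s MAE is not clearly below the baseline’s, the extra complexity of the pipeline is hard to justify.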

Understanding the Model Pipeline

To understand the modeling procedure followed by FEDOT, we can visualize the pipeline generated by the framework. Let’s check the pipeline using the following code:

chain.show()

print('Obtained chain:')
for node in chain.nodes:
    print(f'{node.operation}, params: {node.custom_params}')

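The above code draws the pipeline that FEDOT composed and then lists each node’s operation together with its parameters. The names chain and custom_params reflect the FEDOT release used for this example and may differ in other versions.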

Final Thoughts

In this article, we’ve looked at FEDOT, an open-source framework for automated machine learning. We covered its main features and walked through a time series forecasting example, from loading and splitting the data to fitting a model, evaluating the forecast with MAE, and inspecting the composed pipeline.

If you’re looking to unlock the potential of automated machine learning and enhance your modeling procedures, FEDOT is a framework worth exploring. Its flexibility, support for popular libraries, and ability to handle a wide range of problem types make it a valuable asset in the world of machine learning.