Building and Deploying Machine Learning Pipelines with scikit-learn | by Noor Fatima

In this article, we'll walk through the process of building and deploying machine learning pipelines using the Pipeline class from scikit-learn. We will use the dataset from the Titanic competition to illustrate the process.

A machine learning pipeline in scikit-learn is a way to chain a sequence of data processing and modeling steps into a single object. Pipelines help ensure that the same transformations are applied during both training and testing, which prevents data leakage and makes your workflow cleaner and more reproducible.
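To make the leakage point concrete, here is a minimal sketch on synthetic data (not the Titanic data). The names `demo_pipe`, `Xtr` and `Xte` are illustrative, and StandardScaler and LogisticRegression are stand-ins rather than the steps used later in this article. It shows that the preprocessing step learns its statistics from the training rows only and reuses them at prediction time.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Xtr, Xte = X[:80], X[80:]
ytr, yte = y[:80], y[80:]

demo_pipe = Pipeline([
    ('scale', StandardScaler()),       # fitted inside pipe.fit, on training rows only
    ('model', LogisticRegression()),
])
demo_pipe.fit(Xtr, ytr)

# predict() applies the already-fitted scaler; the test rows never influence it
preds = demo_pipe.predict(Xte)
scaler_mean = demo_pipe.named_steps['scale'].mean_
print(np.allclose(scaler_mean, Xtr.mean(axis=0)))
```

Fitting the scaler on the full dataset before splitting would let test-set statistics leak into training. Wrapping the steps in one pipeline avoids that by construction.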

We will use the Titanic dataset, which contains information about passengers and whether or not they survived the Titanic disaster. The goal is to build a model that predicts survival based on passenger attributes.

import pandas as pd

# Load the dataset
df = pd.read_csv('train.csv')
print(df.head())

We drop columns that won't be useful for prediction.

df.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'], inplace=True)

Split the data into training and testing sets.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=['Survived']),
    df['Survived'],
    test_size=0.2,
    random_state=42
)

Imputation Transformer

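Handle missing values. Age is filled with the column mean (the SimpleImputer default) and Embarked with its most frequent value.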

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Column positions in X_train: Pclass=0, Sex=1, Age=2, SibSp=3, Parch=4, Fare=5, Embarked=6
trf1 = ColumnTransformer([
    ('impute_age', SimpleImputer(), [2]),  # Impute Age
    ('impute_embarked', SimpleImputer(strategy='most_frequent'), [6])  # Impute Embarked
], remainder='passthrough')
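One detail worth checking at this step is the column order of the output. A ColumnTransformer places the transformed columns first, in the order the transformers are listed, and appends the passthrough columns after them. The sketch below uses a tiny made-up array (`toy` and `imputer` are illustrative names) with the same seven-column layout as the features to show the effect.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Made-up rows in the order: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked
toy = np.array([
    [3, 'male',   22.0,   1, 0, 7.25,  'S'],
    [1, 'female', np.nan, 0, 0, 71.28, np.nan],
    [2, 'male',   30.0,   0, 1, 13.0,  'S'],
], dtype=object)

imputer = ColumnTransformer([
    ('impute_age', SimpleImputer(), [2]),
    ('impute_embarked', SimpleImputer(strategy='most_frequent'), [6]),
], remainder='passthrough')

out = imputer.fit_transform(toy)

# Output order: Age, Embarked, Pclass, Sex, SibSp, Parch, Fare
print(out[1])
```

After this step Age sits at position 0, Embarked at 1 and Sex at 3, so any later step that selects columns by position has to use these new positions rather than the original ones.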

One-Hot Encoding

Convert categorical variables into numeric columns.

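from sklearn.preprocessing import OneHotEncoder

# trf1 outputs columns in the order Age, Embarked, Pclass, Sex, SibSp, Parch, Fare,
# so Embarked is now at index 1 and Sex at index 3 (index 6 would be Fare).
# sparse_output replaces the older sparse argument (scikit-learn >= 1.2).
trf2 = ColumnTransformer([
    ('ohe_sex_embarked', OneHotEncoder(sparse_output=False, handle_unknown='ignore'), [1, 3])  # One-Hot Encode Embarked and Sex
], remainder='passthrough')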
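The `handle_unknown='ignore'` setting matters at prediction time. As a small sketch with made-up port codes (`train_ports` and `encoder` are illustrative names), a category that was never seen during fitting is encoded as a row of zeros instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train_ports = np.array([['S'], ['C'], ['Q'], ['S']], dtype=object)

encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(train_ports)

# Learned categories are sorted: C, Q, S
known = encoder.transform(np.array([['Q']], dtype=object)).toarray()
unseen = encoder.transform(np.array([['X']], dtype=object)).toarray()

print(known)   # one column set to 1
print(unseen)  # all zeros for the unseen category
```

Calling `.toarray()` on the default sparse output keeps this sketch independent of the `sparse` / `sparse_output` argument name, which differs between scikit-learn versions.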

Scaling

Scale the features to a given range (MinMaxScaler uses [0, 1] by default).

from sklearn.preprocessing import MinMaxScaler

trf3 = ColumnTransformer([
    ('scale', MinMaxScaler(), slice(0, 10))  # Scale columns 0-9
])
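As a quick worked example of what the scaler computes, the sketch below uses made-up fare values and compares MinMaxScaler with the formula (x - min) / (max - min) applied by hand. The names `fares`, `manual` and `outside` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

fares = np.array([[0.0], [10.0], [40.0]])

scaler = MinMaxScaler().fit(fares)
scaled = scaler.transform(fares)

# Same result computed by hand: (x - min) / (max - min)
manual = (fares - fares.min()) / (fares.max() - fares.min())

# A value outside the range seen during fit is not clipped by default
outside = scaler.transform(np.array([[80.0]]))

print(scaled.ravel(), outside.ravel())
```

The minimum and maximum come from the data passed to `fit`, so values seen later can fall outside [0, 1].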

Feature Selection

Select the most important features.

from sklearn.feature_selection import SelectKBest, chi2

trf4 = SelectKBest(score_func=chi2, k=8)
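To see what SelectKBest does with chi2 scores, here is a minimal sketch on synthetic, non-negative data. The `informative` column is constructed to track the label and the other columns are random noise, so all names and values here are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
y = np.array([0, 1] * 50)

informative = y + rng.uniform(0, 0.1, size=100)   # closely follows the label
noise = rng.uniform(0, 1, size=(100, 3))          # unrelated to the label
X = np.column_stack([noise[:, 0], informative, noise[:, 1], noise[:, 2]])

selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
kept = selector.get_support(indices=True)   # indices of the columns that survive
X_selected = selector.transform(X)

print(kept, selector.scores_.round(2))
```

chi2 requires non-negative feature values, which is one reason it fits naturally after min-max scaling and one-hot encoding.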

Use a decision tree classifier.

from sklearn.tree import DecisionTreeClassifier

trf5 = DecisionTreeClassifier()

Combine all transformations and the model into a single pipeline.

from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ('trf1', trf1),
    ('trf2', trf2),
    ('trf3', trf3),
    ('trf4', trf4),
    ('trf5', trf5)
])

# Train the pipeline
pipe.fit(X_train, y_train)

Evaluate the model on the test data.

from sklearn.metrics import accuracy_score

y_pred = pipe.predict(X_test)
print(accuracy_score(y_test, y_pred))

Use cross-validation to check the model's robustness.

from sklearn.model_selection import cross_val_score

print(cross_val_score(pipe, X_train, y_train, cv=5, scoring='accuracy').mean())

Use grid search to find the best hyperparameters.

from sklearn.model_selection import GridSearchCV

# Parameter names follow the <step name>__<parameter> pattern
params = {
    'trf5__max_depth': [1, 2, 3, 4, 5, None]
}
grid = GridSearchCV(pipe, params, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_score_)
print(grid.best_params_)
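The `step__parameter` naming can be explored on any pipeline. The sketch below is a self-contained illustration on synthetic data from `make_classification`; the step names `scale` and `tree` and the variable `search` are assumptions chosen for this example, not the names used above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

demo_pipe = Pipeline([
    ('scale', MinMaxScaler()),
    ('tree', DecisionTreeClassifier(random_state=0)),
])

# get_params() lists every tunable name, including nested step parameters
tree_params = [name for name in demo_pipe.get_params() if name.startswith('tree__')]

search = GridSearchCV(demo_pipe, {'tree__max_depth': [1, 3, None]}, cv=3, scoring='accuracy')
search.fit(X, y)

# With the default refit=True, best_estimator_ is the whole pipeline
# refitted on all of X, y using the best parameters
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```

If the tuned model is the one you intend to deploy, `best_estimator_` is the fitted pipeline that carries the selected hyperparameters.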

Export the trained pipeline to a file for later use.

import pickle

with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

Load the pipeline and use it for predictions.

import pickle
import numpy as np

with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)

# Example user input: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked
test_input = np.array([2, 'male', 31.0, 0, 0, 10.5, 'S'], dtype=object).reshape(1, 7)
print(pipe.predict(test_input))
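A simple sanity check after exporting is to reload the file and confirm that the restored pipeline predicts the same values as the original. The sketch below does this with a tiny stand-in pipeline and a temporary file. The names `small_pipe`, `restored` and `path` are illustrative, and the data is made up.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]])
y = np.array([0, 1, 0, 1])

small_pipe = Pipeline([
    ('scale', MinMaxScaler()),
    ('tree', DecisionTreeClassifier(random_state=0)),
]).fit(X, y)

# Write to a temporary location so the sketch leaves the working directory untouched
path = os.path.join(tempfile.mkdtemp(), 'small_pipe.pkl')
with open(path, 'wb') as f:
    pickle.dump(small_pipe, f)
with open(path, 'rb') as f:
    restored = pickle.load(f)

same = np.array_equal(small_pipe.predict(X), restored.predict(X))
print(same)
```

Pickle files should only be loaded from trusted sources, and it is safest to load them with the same scikit-learn version that created them.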

Pipelines in scikit-learn provide a powerful way to manage your entire machine learning workflow, from preprocessing to model training and evaluation. By following this guide, you can build robust and reproducible pipelines for your own machine learning projects.
