Stochastic gradient descent

8/16/2023

Stochastic gradient descent (SGD) is an optimisation technique, not a machine learning model. It is a method that allows us to efficiently train a machine learning model on large amounts of data. The word 'descent' gives the purpose of SGD away: to minimise a cost (or loss) function.

For a better understanding of the underlying principle of gradient descent (GD), let's consider an example. Gradient descent seeks to find the global minimum of a function. To do that, it calculates the gradient at an initial random point and moves to another point in the direction of descending gradient, repeating until it reaches a point where the gradient is zero.

As an example, see the figure of a parabola below. The initial random point that starts the algorithm off is at (x = -2, y = 4). The gradient at this point is -3, so the algorithm steps to the right, which is the downhill direction when the gradient is negative, and calculates the gradient at the next point. Eventually, after a number of steps, the algorithm reaches a point where the gradient is 0 and stops. This point is referred to as the global optimum or global minimum. The learning rate hyperparameter determines the size of each step: too small and the algorithm takes too long to converge, too large and it may overshoot and fail to converge to the global minimum.

Some datasets have an irregular shape, and the cost function could contain local minima as well as a global minimum. It is good to keep this in mind when training a model, adjusting the learning strategy according to the current problem.

There are three types of gradient descent implementation: batch, mini-batch and stochastic. Which one you choose depends on the amount of data you have and the type of model you are fitting.

Batch Gradient Descent: the entire training dataset is used at every step. This makes the algorithm very slow for large datasets, although it scales well to large numbers of features. There is also a possibility that it gets stuck in a local minimum.

Stochastic Gradient Descent: a single, random observation in the training data is selected at each step. This algorithm is very fast, as it only needs to perform calculations on a single point at a time. However, it is erratic: it may select points from all over the place and never settle on a truly accurate solution. Instead, it approaches the minimum on average. That being said, this algorithm is much more likely to find the global minimum, because its erratic steps can carry it out of a local minimum. Some of the erratic behaviour can be tamed by using a learning schedule that slowly reduces the learning rate so that the algorithm can settle on a more accurate solution.

Mini-Batch Gradient Descent: the algorithm uses small subsets (or batches) of the training data at each step. This algorithm is faster than Batch GD but still suffers from the same drawback of potentially getting stuck in local minima.

How to Implement Stochastic Gradient Descent in Python

This project is focused on applying SGD to a classification problem. Therefore, the steps and performance measures chosen here are best suited to modelling a binary response variable. Each aspect of the project is broken down below.

Preprocessing the data for classification

These are important steps for data preparation/preprocessing:

Clean up any oddities in the data. For example, in this project the values -1 and 9 are used to represent null or unknown values, so these should be replaced with null.

Split the data into training and test sets (typically in an 80/20 ratio). To do this, the train_test_split function from sklearn.model_selection is used, making sure that shuffle = True, as SGD expects the training observations to be in random order.

Feature engineering: after some initial data exploration, new features can be created or existing features can be modified.

Standardise continuous variables using StandardScaler from sklearn.preprocessing.

Dummy code categorical variables using OneHotEncoder from sklearn.preprocessing, making sure to set the parameter drop = 'first' so that if a variable has 4 categories, for example, only 3 are retained and the 4th is inferred from all the others being 0.

This project is a classification problem, and so the SGDClassifier from sklearn.linear_model is used.
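To make the idea of "one random observation per step" more concrete, here is a minimal pure-Python sketch of SGD for a binary classifier. It assumes a logistic (log) loss and a simple learning schedule of the form eta0 / (1 + decay * step). The function names sgd_logistic and predict, the toy data and all parameter values are illustrative assumptions. This is not code from this project, and it is not how SGDClassifier is implemented internally.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(X, y, n_steps=2000, eta0=0.5, decay=0.01, seed=0):
    """Fit weights and bias of a logistic model with plain SGD.

    All parameter names and defaults here are hypothetical choices
    made for illustration only.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    for step in range(n_steps):
        # Pick a single random observation for this step.
        i = rng.randrange(len(X))
        # Learning schedule: the step size shrinks slowly over time.
        eta = eta0 / (1.0 + decay * step)
        z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
        # Gradient of the log loss with respect to z for this observation.
        error = sigmoid(z) - y[i]
        # Move against the gradient.
        w = [wj - eta * error * xj for wj, xj in zip(w, X[i])]
        b -= eta * error
    return w, b


def predict(w, b, x):
    """Return class 1 if the linear score is non-negative, else 0."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z >= 0 else 0


# Toy, already standardised data with one feature (made up for this sketch).
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = sgd_logistic(X, y)
print("weights:", w, "bias:", b)
print("predictions:", [predict(w, b, x) for x in X])
```

Because each update looks at only one observation, individual steps are noisy. The shrinking learning rate is what lets the weights settle down rather than bounce around the minimum indefinitely.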
