Techbyte-2019 | Insight

Machine Learning Explained

Faculty Mentor:
Dr. Deepti Sharma

Student Name:
Kapil Goel (MCA – I)

1.INTRODUCTION

Machine learning is about extracting knowledge from data. It is a research field at the intersection of computer science, statistics and artificial intelligence. Machine learning is enabling computers to tackle tasks that, until now, could only be carried out by people. A machine learning program uses data to detect patterns and adjusts its actions accordingly; the field focuses on developing computer programs that can teach themselves to improve when exposed to new data. Using iterative algorithms, such programs can find hidden insights in data without being explicitly programmed where to look.

2.DIFFERENCE BETWEEN MACHINE LEARNING AND TRADITIONAL PROGRAMMING

Traditional Programming: In traditional programming, the programmer writes explicit step-by-step instructions (rules) that turn the input into the desired output.
Machine Learning: In a machine learning system, the system learns the rules from example data instead. This approach is more appropriate when the rules needed to solve the problem at hand are too numerous or too complex to code by hand.

3.MACHINE LEARNING STEPS

Step 1: Collecting Data: The first step is data collection, which involves gathering all relevant data from various sources.
Step 2: Data Wrangling: The second step is data wrangling: cleaning the raw data and converting it into a format that is convenient to work with.
Step 3: Analyze Data: After the data has been cleaned and converted, it is analyzed to select and filter the data needed to prepare the model. Not all of the data is relevant to a particular model, so only certain features are selected.
Step 4: Train Algorithm: After the features are selected, the algorithm is trained on the training dataset, from which it learns the patterns and rules that govern the data.
Step 5: Test Algorithm: The model is then evaluated on a separate testing dataset (data held back from training) to measure its accuracy. If the speed and accuracy of the model are acceptable, the model can be deployed in the real system.
Step 6: Deployment: The model is deployed based on its performance. Once deployed, it is updated and improved, and if its performance dips, it is retrained.
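The steps above can be sketched end to end in a few lines of Python. This is a toy illustration rather than a real pipeline: the records (hours studied, passed) are invented, and the "algorithm" is a hypothetical one-feature threshold rule standing in for a real learner. It shows the point of Steps 4 and 5: the model is fitted on the training split and its accuracy is measured on a test split it has never seen.

```python
import random

# Step 1 (collect): hypothetical records of (hours_studied, passed);
# None marks a missing measurement.
raw = [(1.0, 0), (8.0, 1), (None, 1), (2.0, 0), (9.0, 1), (1.5, 0),
       (10.0, 1), (None, 0), (2.5, 0), (7.0, 1), (3.0, 0), (8.5, 1)]

# Step 2 (wrangle): drop incomplete rows and normalise the types.
clean = [(float(x), int(y)) for x, y in raw if x is not None]

# Step 3 would select features; this toy data has only one.
# Hold out a test set (fixed seed so the split is reproducible).
rng = random.Random(0)
rng.shuffle(clean)
cut = int(0.75 * len(clean))
train, test = clean[:cut], clean[cut:]

# Step 4 (train): a stand-in model that places a threshold halfway
# between the mean of the negative and the positive training examples.
def fit_threshold(rows):
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

threshold = fit_threshold(train)

# Step 5 (test): accuracy on examples the model never saw during training.
accuracy = sum(predict(threshold, x) == y for x, y in test) / len(test)
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")

# Step 6 (deploy): the fitted threshold is applied to new inputs.
print(predict(threshold, 6.5))
```

In a real project the same structure holds, but the threshold rule is replaced by one of the algorithms described below.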

4.TYPES OF MACHINE LEARNING

SUPERVISED LEARNING
TYPES OF PROBLEMS THAT ARE SOLVED USING SUPERVISED LEARNING
Classification: Classification deals with labels, classes or other discrete values, whereas regression deals with a continuous quantity. For example, sorting emails into spam and non-spam is a classification problem.
Regression: Regression is used to predict a continuous quantity, that is, a variable that can take any value within a range rather than one of a fixed set of values. A person's weight is an example: someone could weigh 180 pounds, 180.1 pounds or 180.11 pounds.
UNSUPERVISED LEARNING
TYPES OF PROBLEMS THAT ARE SOLVED USING UNSUPERVISED LEARNING
Association Problems: Association problems involve discovering patterns in data, mainly associations between items that frequently occur together (for example, products that are often bought together).
Clustering Problems: Clustering is widely used when we are given a list of customers with some information about each of them and need to group these customers by similarity. This makes clustering very useful for targeted marketing.
REINFORCEMENT LEARNING
TYPES OF PROBLEMS THAT ARE SOLVED USING REINFORCEMENT LEARNING

LINEAR REGRESSION: Linear regression is a supervised machine learning algorithm that performs a regression task. It models the relationship between two variables by fitting a linear equation to the observed data. One variable is treated as the independent variable and the other as the dependent variable.
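A minimal sketch of simple linear regression in plain Python, assuming the usual least-squares criterion for "fitting" (choosing the line that minimises the sum of squared vertical errors). Under that criterion the slope has a closed form, the covariance of x and y divided by the variance of x; the data points are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (x independent, y dependent)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x   # the fitted line passes through the means
    return slope, intercept

# Made-up points that lie exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)            # 2.0 1.0
print(intercept + slope * 5.0)     # prediction for x = 5: 11.0
```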
LOGISTIC REGRESSION: Logistic regression predictions are discrete values (e.g., whether a student failed or passed), whereas linear regression predictions are continuous values (e.g., height in cm). Logistic regression obtains them by applying a transformation function, the logistic (sigmoid) function, to a linear combination of the inputs; this gives a probability between 0 and 1, which is then converted into a class using a threshold such as 0.5. Logistic regression is suitable for binary classification: predicting whether an event will occur or not when there are only two possibilities, that it occurs (denoted as 1) or that it does not (denoted as 0). So, if we were predicting whether a student has passed, we would label passed students with the value 1 in our dataset.
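A minimal sketch of logistic regression for the pass/fail example, in plain Python. The transformation function is taken to be the logistic (sigmoid) function, the standard choice; the training method (batch gradient descent on the log loss), the learning rate, the number of epochs and the hours-studied data are illustrative assumptions.

```python
import math

def sigmoid(z):
    # Logistic function: squashes any real z into (0, 1). The two branches keep
    # math.exp from overflowing when |z| is large.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=10000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # gradient of the log loss w.r.t. z
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    # Convert the predicted probability into a discrete label: 1 = passed, 0 = failed.
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(hours, passed)
print([predict(w, b, h) for h in hours])
print(round(sigmoid(w * 3.0 + b), 2))  # estimated probability of passing after 3 hours
```

Unlike a linear-regression output, sigmoid(w * x + b) always lies between 0 and 1, which is why it can be read as a probability and thresholded.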
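NAÏVE BAYES: Naïve Bayes classifiers are based on Bayes' theorem, which calculates the probability that an event will occur, given that another event has already occurred. Bayes' theorem states:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where A and B are events and P(B) ≠ 0. The classifier is called "naïve" because it assumes that the features it uses (for example, the words in an email) are independent of one another once the class is known.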

Some real-world examples are:

Marking an email as spam or not spam
Classifying a news article about sports, technology or politics
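A minimal sketch of a naive Bayes spam filter in plain Python. For each class it combines the prior P(class) with P(word | class) for every word, assuming the words are independent given the class (the "naive" assumption); the denominator of Bayes' theorem, P(words), is the same for every class and can be dropped when comparing classes. Working in log-probabilities and using add-one (Laplace) smoothing are common choices assumed here, and the four training emails are invented.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label). Returns class priors, per-class word counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {label: n / len(docs) for label, n in class_docs.items()}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        # log P(class) + sum of log P(word | class): the naive independence assumption.
        score = math.log(prior)
        for word in text.lower().split():
            # Add-one smoothing so a word never seen in this class doesn't zero the product.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train_nb(training)
print(classify("claim your free prize", *model))    # spam
print(classify("team meeting on monday", *model))   # ham
```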
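K-MEANS: K-means clustering is a popular and simple unsupervised machine learning algorithm. It is used for relationship discovery and for understanding the underlying structure of data, which makes it useful as a first round of analysis on unlabeled data. The user must specify the target number of clusters, k, in advance. The algorithm then repeatedly assigns each point to the nearest cluster centre according to a distance metric (usually Euclidean distance) and moves each centre to the mean of the points assigned to it, until the assignments stop changing. It makes few assumptions about the data, although it works best when the clusters are compact and of roughly similar size.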
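A minimal sketch of k-means in plain Python, using Lloyd's algorithm (the standard assign-then-update iteration) with Euclidean distance. Initialising the centroids with k randomly chosen data points, the iteration cap and the toy points are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialise with k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break   # converged: the assignments will no longer change
        centroids = new_centroids
    return centroids, clusters

# Two hypothetical, well-separated groups of 2-D points; k is given manually.
left_group = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
right_group = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(left_group + right_group, k=2)
print(centroids)
print(clusters)
```

Because the result can depend on the initial centroids, practical implementations usually run k-means several times from different starting points and keep the best clustering.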
DECISION TREE: A decision tree takes as input an object or situation described by a set of properties and outputs a decision (in the simplest case, yes or no). It is a tree-structured classifier with two types of nodes: decision nodes and leaf nodes. A decision node specifies a test on one of the properties, and the outcome of the test decides which branch to follow. A leaf node gives the value to be returned when that leaf is reached. Decision trees are mostly used for classification, though they can also be used for regression.
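A minimal sketch of the representation just described, in plain Python: decision nodes that test one property against a threshold, and leaf nodes that hold the returned decision. The tree is built by hand for a hypothetical "should we play outside?" decision with made-up properties (`rain_mm`, `temperature_c`); learning the tree from data (choosing which property to test at each node) is not shown.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: str          # the decision returned when this leaf is reached

@dataclass
class DecisionNode:
    feature: str        # the property tested at this node
    threshold: float    # go to `left` if the property is <= threshold, else `right`
    left: "Node"
    right: "Node"

Node = Union[Leaf, DecisionNode]

def predict(node: Node, example: dict) -> str:
    # Start at the root and apply each node's test until a leaf is reached.
    while isinstance(node, DecisionNode):
        node = node.left if example[node.feature] <= node.threshold else node.right
    return node.value

# Hypothetical hand-built tree.
tree = DecisionNode(
    "rain_mm", 0.0,
    left=DecisionNode("temperature_c", 30.0, left=Leaf("yes"), right=Leaf("no")),
    right=Leaf("no"),
)
print(predict(tree, {"rain_mm": 0.0, "temperature_c": 22.0}))  # yes
print(predict(tree, {"rain_mm": 4.0, "temperature_c": 22.0}))  # no
```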
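KNN: K Nearest Neighbor (KNN) is a simple algorithm that stores all the available cases and classifies a new case based on a similarity measure (typically a distance). The intuition is clearest for k = 1: to classify a new point x, find the most similar training example x' and predict its class y'. The training examples then partition the space into regions, one around each example (a Voronoi tessellation); the boundary between two regions consists of the points at the same distance from the two training examples. The resulting classification boundary is non-linear and reflects the shape of the classes well. For larger values of k, the new case is assigned the class that is most common among its k nearest training examples.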
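A minimal k-nearest-neighbour classifier in plain Python, assuming Euclidean distance as the similarity measure and a simple majority vote among the k nearest stored cases; with k = 1 it reduces to the nearest-neighbour rule. The points and labels are made up, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label). Majority vote among the k stored cases nearest to query."""
    # Sort the stored cases by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
         ((6.0, 5.0), "blue"), ((7.0, 7.0), "blue"), ((6.5, 6.0), "blue")]
print(knn_predict(train, (1.8, 1.5)))        # red: the 3 nearest cases are all red
print(knn_predict(train, (6.0, 6.0), k=1))   # blue: class of the single nearest case
```

Note that there is no training step: all the work happens at prediction time, which is why KNN is sometimes called a lazy learner.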
SUPPORT VECTOR MACHINE (SVM): A support vector machine is a supervised learning algorithm that is mainly used to classify data into different classes. SVM uses a hyperplane as the decision boundary between the classes; what sets it apart from other linear classifiers is that it chooses the hyperplane with the largest margin, that is, the greatest distance to the nearest training points of each class (the support vectors). It is mainly used for binary classification, where data is split into two classes depending on its features. For more than two classes, SVM can build several separating hyperplanes (for example, one for each class against the rest) so that the data is divided into segments that each contain only one kind of data.
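A minimal sketch of a two-class linear SVM in plain Python. Libraries usually solve the SVM optimisation with dedicated quadratic-programming or coordinate-descent solvers; here, as a simpler stand-in, the soft-margin objective (a penalty on the size of w plus the hinge loss) is minimised by stochastic subgradient descent. The values of `lam`, `lr` and `epochs` and the data are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Linear soft-margin SVM trained by stochastic subgradient descent.

    Minimises lam/2 * ||w||^2 + average hinge loss max(0, 1 - y * (w.x + b)).
    Labels in y must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin (or misclassified): the hinge term pushes the
                # hyperplane away from this point.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with room to spare: only the regulariser acts, shrinking w,
                # which widens the margin 2 / ||w||.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    # Which side of the hyperplane w.x + b = 0 the point falls on.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable 2-D data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b)
print(predict(w, b, (4, 4)), predict(w, b, (-4, -5)))
```

The training points for which the margin stays close to 1 play the role of the support vectors: they alone keep triggering updates once the hyperplane has settled.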
NEURAL NETWORK: An artificial neural network is a network of artificial neurons, loosely modelled on biological neurons, used for solving artificial intelligence (AI) problems. The connections between biological neurons are modelled as weights: an excitatory connection is represented by a positive weight, while an inhibitory connection is represented by a negative weight. Each input is multiplied by its weight and the results are summed, an operation known as a linear combination. Finally, an activation function controls the amplitude of the output. Artificial neural networks are used for adaptive control, predictive modelling, and other applications where they can be trained on a dataset.
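A minimal sketch of the neuron model in plain Python: a weighted sum of the inputs (linear combination) plus a bias, passed through an activation function. To keep it checkable, three such neurons are wired into a tiny network that computes XOR. The weights are chosen by hand rather than learned (training, e.g. by backpropagation, is not shown), and the step activation and bias terms are simplifying assumptions; practical networks typically use smooth activations such as the sigmoid or ReLU.

```python
def step(z):
    # Activation function: the neuron "fires" (outputs 1) only when z > 0.
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias, activation=step):
    # Multiply each input by its weight and sum (a linear combination),
    # then let the activation function shape the output.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def xor_network(a, b):
    # Hidden layer: h_or fires if at least one input is on,
    # h_and fires only if both inputs are on.
    h_or = neuron([a, b], [1.0, 1.0], -0.5)
    h_and = neuron([a, b], [1.0, 1.0], -1.5)
    # Output neuron: excitatory (+1) link from h_or, inhibitory (-2) link from h_and.
    return neuron([h_or, h_and], [1.0, -2.0], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```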
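RANDOM FORESTS: A random forest (or random decision forest) is a method that constructs multiple decision trees during the training phase. Each tree is trained on a random sample of the training data drawn with replacement, and only a random subset of the features is considered when splitting, so the trees differ from one another. For classification, the random forest takes the class chosen by the majority of the trees as its final decision; for regression, it averages the trees' predictions.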

Advantages of Random Forests are:

Training is fast, and the trees can be built in parallel because each one is independent of the others.
It runs efficiently on large databases.
For large datasets, it produces highly accurate predictions.
It can maintain accuracy when a large proportion of the data is missing.
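A heavily simplified sketch of the random forest idea in plain Python. To stay short, each "tree" is a decision stump (a tree with a single split) rather than a fully grown tree, and each stump may split on only one randomly chosen feature; the bootstrap sampling and the majority vote follow the usual random forest recipe. The data set, the number of trees and the seed are illustrative assumptions.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """Fit a one-split 'tree' on one feature.

    rows: list of (features_tuple, label). Returns (feature, threshold,
    label_if_le, label_if_gt) for the threshold that classifies most rows correctly.
    """
    values = sorted({x[feature] for x, _ in rows})
    best_correct, best_stump = -1, None
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = Counter(label for x, label in rows if x[feature] <= t)
        right = Counter(label for x, label in rows if x[feature] > t)
        left_label, left_hits = left.most_common(1)[0]
        right_label, right_hits = right.most_common(1)[0]
        if left_hits + right_hits > best_correct:
            best_correct = left_hits + right_hits
            best_stump = (feature, t, left_label, right_label)
    if best_stump is None:
        # Only one distinct value in the sample: always predict the majority label.
        majority = Counter(label for _, label in rows).most_common(1)[0][0]
        best_stump = (feature, float("inf"), majority, majority)
    return best_stump

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample drawn with replacement.
        sample = [rng.choice(rows) for _ in rows]
        # Feature randomness: this stump may split only on one randomly chosen feature.
        feature = rng.randrange(n_features)
        forest.append(fit_stump(sample, feature))
    return forest

def predict_forest(forest, x):
    # Each stump votes; the majority class is the forest's final decision.
    votes = Counter(if_le if x[f] <= t else if_gt for f, t, if_le, if_gt in forest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data set with classes 0 and 1.
rows = [((1, 1), 0), ((2, 2), 0), ((3, 1), 0), ((2, 3), 0),
        ((7, 8), 1), ((8, 7), 1), ((9, 9), 1), ((8, 8), 1)]
forest = fit_forest(rows)
print(predict_forest(forest, (1.5, 1.5)), predict_forest(forest, (8.5, 8.5)))
```

Replacing `fit_stump` with a full tree learner that samples features at every split would bring the sketch closer to a standard random forest.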
APRIORI ALGORITHM: Apriori is an algorithm for association rule learning and frequent itemset mining over transactional databases. It works by identifying the frequent individual items in the database and extending them to larger and larger item sets, as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to derive association rules, which highlight general trends in the database. This algorithm has applications in domains such as market basket analysis.
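A minimal sketch of Apriori in plain Python. Support is taken to be the fraction of transactions that contain an itemset; at each level, frequent itemsets are joined into larger candidates, candidates with an infrequent subset are pruned, and the search stops when no candidate is frequent enough. The baskets and the `min_support` value are made up; the last lines derive one association rule, whose confidence is support(both) divided by support(left-hand side).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (fraction of
    transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate can only be frequent if all its (k-1)-subsets are.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Keep the candidates that appear sufficiently often; stop when none do.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
frequent = apriori(baskets, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)

# Association rule bread -> milk: confidence = support({bread, milk}) / support({bread}).
confidence = frequent[frozenset({"bread", "milk"})] / frequent[frozenset({"bread"})]
print(f"bread -> milk: confidence {confidence:.2f}")
```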
