I'm sure you all agree that machine learning is one of the hottest trends in today's market, right? Gartner predicts that by 2022 at least 40% of new application development projects in the market will require machine learning co-developers on their team.

It's expected that these projects will generate a revenue of around 3.9 trillion dollars. Isn't that huge? So, looking at the huge upcoming demand for machine learning around the world, we at Edureka have come up with and designed a well-structured machine learning full course for you.

But before we actually drill down over there, let me just introduce myself.

Hello all, I am Atul from Edureka, and today I'll be guiding you through this entire machine learning course.

Well, this course has been designed in a way that you get the most out of it. So we'll slowly and gradually start at a beginner level and then move towards the advanced topics.

So without delaying any further, let's start with the agenda. Today's session on the machine learning course has been segregated into six different modules. We'll start our first module with an introduction to machine learning. Here we'll discuss things like what exactly machine learning is, how it differs from artificial intelligence and deep learning, what its various types are, what its applications are, and finally we'll end our first module with a basic demo in Python.

Okay, our second module focuses on stats and probability. Here we'll cover things like descriptive statistics, inferential statistics, probability theory and so on. Our third module is supervised learning. Well, supervised learning is a type of machine learning which focuses mainly on regression and classification types of problems. It deals with labeled data sets, and the algorithms which are a part of it are linear regression, logistic regression, Naive Bayes, random forest, decision tree and so on.

Our fourth module is on unsupervised learning. Well, this module focuses mainly on dealing with unlabeled data sets, and the algorithms which are a part of it are the k-means algorithm and the Apriori algorithm. As a part of the fifth module, we have reinforcement learning. Here we are going to discuss reinforcement learning in depth and also the Q-learning algorithm. Finally, in the end, it's all about making you industry ready.

Okay. So here we are going to discuss three different projects, which are based on supervised learning, unsupervised learning and reinforcement learning. Finally, in the end, I'll tell you about some of the skills that you need to become a machine learning engineer. Okay, and I'll also be discussing some of the important questions that are asked in a machine learning interview. Fine, with this we come to the end of this agenda. Before you move ahead, don't forget to subscribe to Edureka and press the bell icon to never miss any update from us.

Hello everyone

This is Atul from Edureka, and welcome to today's session on what is machine learning.

As you know, we are living in a world of humans and machines. Humans have been evolving and learning from past experience for millions of years. On the other hand, the era of machines and robots has just begun. In today's world, these machines, or robots, need to be programmed before they actually follow your instructions. But what if machines started to learn on their own? This is where machine learning comes into the picture. Machine learning is at the core of many futuristic technological advancements in our world. And today you can see various examples or implementations of machine learning around us, such as Tesla's self-driving car, Apple's Siri, the Sophia AI robot and many more.

So what exactly is machine learning? Well, machine learning is a subfield of artificial intelligence that focuses on the design of systems that can learn from, make decisions on, and make predictions based on experience, which is data in the case of machines. Machine learning enables computers to act and make data-driven decisions rather than being explicitly programmed to carry out a certain task. These programs are designed to learn and improve over time when exposed to new data.

Let's move on and discuss one of the biggest confusions people have. They think that all three of them, AI, machine learning and deep learning, are the same. You know what? They are wrong. Let me clarify things for you.

Artificial intelligence is a broader concept of machines being able to carry out tasks in a smarter way. It covers anything which enables a computer to behave like humans. Think of the famous Turing test to determine whether a computer is capable of thinking like a human being or not. If you are talking to Siri on your phone and you get an answer, you're already very close to it. So this was about artificial intelligence. Now coming to the machine learning part.

So as I already said, machine learning is a subset or a current application of AI. It is based on the idea that we should be able to give machines access to data and let them learn for themselves. It's a subset of artificial intelligence that deals with the extraction of patterns from data sets. This means that the machine can not only find the rules for optimal behavior, but can also adapt to changes in the world. Many of the algorithms involved have been known for decades, centuries even. Thanks to advances in computer science and parallel computing, they can now scale up to massive data volumes.

So this was about the machine learning part. Now coming over to deep learning. Deep learning is a subset of machine learning where similar machine learning algorithms are used to train deep neural networks, so as to achieve better accuracy in those cases where the former was not performing up to the mark, right? I hope now you understand that machine learning, AI and deep learning are all three different things.

Okay, moving on ahead. Let's see in general how machine learning works.

One of the approaches is where the machine learning algorithm is trained using a labeled or unlabeled training data set to produce a model. New input data is introduced to the machine learning algorithm, and it makes a prediction based on the model. The prediction is evaluated for accuracy, and if the accuracy is acceptable, the machine learning algorithm is deployed. Now, if the accuracy is not acceptable, the machine learning algorithm is trained again and again with an augmented training data set. This was just a high-level example, as there are many more factors and other steps involved in it.

Now, let's move on and subcategorize machine learning into three different types: supervised learning, unsupervised learning and reinforcement learning. Let's see what each of them is, how they work, and how each of them is used in the fields of banking, healthcare, retail and other domains. Don't worry, I'll make sure that I use enough examples and implementations of all three of them to give you a proper understanding of them.

So, starting with supervised learning.

What is it? So let's see a mathematical definition of supervised learning. Supervised learning is where you have input variables X and an output variable Y, and you use an algorithm to learn the mapping function from the input to the output, that is, Y = f(X). The goal is to approximate the mapping function so well that whenever you have new input data X, you can predict the output variable Y for that data, right? I think this was confusing for you, so let me simplify the definition of supervised learning.

We can rephrase the mathematical definition as a machine learning method where each instance of a training data set is composed of different input attributes and an expected output. The input attributes of a training data set can be any kind of data: it can be a pixel of an image, it can be a value of a database row, or it can even be an audio frequency histogram. For each input instance, an expected output value is associated; that value can be discrete, representing a category, or it can be a real or continuous value. In either case, the algorithm learns the input pattern that generates the expected output. Now, once the algorithm is trained, it can be used to predict the correct output of a never-seen input.
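To make the idea of learning a mapping Y = f(X) concrete, here is a minimal sketch in Python using scikit-learn. The tiny fruit table and its two made-up attributes are purely illustrative and not part of the course material.

# A minimal supervised learning sketch: learn Y = f(X) from labeled examples.
# The toy data below (weight in grams, smoothness from 0 to 1) is invented for illustration.
from sklearn.tree import DecisionTreeClassifier

X = [[150, 0.90], [170, 0.80], [140, 0.95],   # apples
     [300, 0.20], [320, 0.30], [280, 0.25]]   # oranges
Y = ["apple", "apple", "apple", "orange", "orange", "orange"]

model = DecisionTreeClassifier()   # the learner being "supervised" by the labels
model.fit(X, Y)                    # training: the mapping from X to Y is learned

print(model.predict([[160, 0.85]]))   # a never-seen input; expected output: ['apple']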

You can see an image on your screen. In this image you can see that we are feeding raw input, an image of an apple, to the algorithm. As a part of the algorithm, we have a supervisor who keeps on correcting the machine, or who keeps on training the machine. The supervisor keeps on telling it that yes, it is an apple, or no, it is not an apple, things like that. So this process keeps on repeating until we get a final trained model. Once the model is ready, it can easily predict the correct output of a never-seen input. In this slide, you can see that we are giving an image of a green apple to the machine, and the machine can easily identify it: yes, it is an apple, and it is giving the correct result, right? Let me make things clearer for you.

Let's discuss another example of it. So in this slide, the image shows an example of a supervised learning process used to produce a model which is capable of recognizing the ducks in an image. The training data set is composed of labeled pictures of ducks and non-ducks. The result of the supervised learning process is a predictive model which is capable of associating a label, duck or not duck, to a new image presented to the model. Now, once trained, the resulting predictive model can be deployed to the production environment, a mobile app for example. Once deployed, it is ready to recognize new pictures.

Right, now you might be wondering why this category of machine learning is named supervised learning. Well, it is called supervised learning because the process of an algorithm learning from the training data set can be thought of as a teacher supervising the learning process, since we know the correct answers. The algorithm iteratively makes predictions on the training data and is corrected by the teacher. The learning stops when the algorithm achieves an acceptable level of performance.

Now, let's move on and see some of the popular supervised learning algorithms. So we have linear regression, random forest and support vector machines. These are just for your information; we will discuss these algorithms in our next video.

Now, let's see some of the popular use cases of supervised learning. So we have Cortana, or any other speech automation, in your mobile phone. It trains using your voice, and once trained, it starts working based on that training. This is an application of supervised learning. Suppose you say, OK Google, call Sam, or you say, Hey Siri, call Sam: you get an answer to it, an action is performed, and automatically a call goes to Sam. So these are just examples of supervised learning.

Next comes the weather app. Based on some prior knowledge, like when it is sunny the temperature is higher, or when it is cloudy the humidity is higher, it predicts the parameters for a given time. So this is also an example of supervised learning, as we are feeding the data to the machine and telling it that whenever it is sunny, the temperature should be higher, and whenever it is cloudy, the humidity should be higher. So it's an example of supervised learning.

Another example is biometric attendance, where you train the machine with a couple of inputs of your biometric identity, be it your thumb, your iris or anything else. Once trained, the machine can validate your future input and identify you. Next comes the field of the banking sector. In the banking sector, supervised learning is used to predict the creditworthiness of a credit card holder, by building a machine learning model to look for faulty attributes and providing it with data on delinquent and non-delinquent customers.

Next comes the healthcare sector. In the healthcare sector, it is used to predict patient readmission rates, by building a regression model and providing data on the patients' treatment administration and readmissions to show the variables that best correlate with readmission. Next comes the retail sector. In the retail sector, it is used to analyze the products that customers buy together. It does this by building a supervised model to identify frequent item sets and association rules from the transactional data.

Now, let's learn about the next category of machine learning, the unsupervised part. Mathematically, unsupervised learning is where you only have input data X and no corresponding output variable.

The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about the data. So let me rephrase this for you in simple terms. In the unsupervised learning approach, the data instances of a training data set do not have an expected output associated with them. Instead, the unsupervised learning algorithm detects patterns based on the innate characteristics of the input data. An example of a machine learning task that applies unsupervised learning is clustering. In this task, similar data instances are grouped together in order to identify clusters of data.

In this slide, you can see that initially we have different varieties of fruits as input. Now this set of fruits, as input X, is given to the model. Once the model is trained using an unsupervised learning algorithm, the model will create clusters on the basis of its training. It will group the similar fruits and make clusters of them.

Let me make things clearer for you. Let's take another example of it. So in this slide, the image below shows an example of the unsupervised learning process. The algorithm processes an unlabeled training data set and, based on the characteristics, groups the pictures into three different clusters of data. Despite the ability to group similar data into clusters, the algorithm is not capable of adding labels to the groups. The algorithm only knows which data instances are similar, but it cannot identify the meaning of these groups.
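As a rough sketch of how such clustering looks in code (this is not part of the original demo), k-means from scikit-learn can group unlabeled points; note that it only returns cluster numbers, never meaningful labels. The small 2-D points below are invented for illustration.

# Unsupervised learning sketch: k-means groups unlabeled points into clusters.
from sklearn.cluster import KMeans

X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],    # one natural group
     [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],    # a second group
     [9.0, 1.0], [9.2, 0.8], [8.9, 1.1]]    # a third group

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # a cluster index per point, e.g. [0 0 0 1 1 1 2 2 2]
# The indices say which points belong together, not what each cluster "means".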

So now you might be wondering why this category of machine learning is named unsupervised learning. Well, it is called unsupervised learning because, unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left on their own to discover and present the interesting structure in the data. Let's move on and see some of the popular unsupervised learning algorithms. So here we have k-means, the Apriori algorithm and hierarchical clustering.

Now, let's move on and see some examples of unsupervised learning. Suppose a friend invites you to his party, where you meet total strangers. You will classify them using unsupervised learning, as you don't have any prior knowledge about them, and this classification can be done on the basis of gender, age group, dressing, education qualification, or whatever way you might like. Now, why is this learning different from supervised learning? Since you didn't use any past or prior knowledge about the people, you kept on classifying them on the go; as they kept on coming, you kept on classifying them: this category of people belongs to this group, that category of people belongs to that group, and so on.

Okay, let's see one more example. Suppose you have never seen a football match before, and by chance you watch a video on the internet. Now, you can easily classify the players on the basis of different criteria: players wearing the same kind of jersey are in one class, players wearing a different kind of jersey are in a different class, or you can classify them on the basis of their playing style, like this guy is an attacker, so he's in one class, that guy is a defender, so he's in another class, or you can classify them whatever way you observe things. So this was also an example of unsupervised learning.

Let's move on and see how unsupervised learning is used in the sectors of banking, healthcare and retail. So, starting with the banking sector: in the banking sector, it is used to segment customers by behavioral characteristics, by surveying prospects and customers to develop multiple segments using clustering. In the healthcare sector, it is used to categorize MRI data as normal or abnormal images; it uses deep learning techniques to build a model that learns from different features of images to recognize different patterns. Next is the retail sector. In the retail sector, it is used to recommend products to customers based on their past purchases. It does this by building a collaborative filtering model based on their past purchases.

I assume you guys now have a proper idea of what unsupervised learning means. If you have the slightest doubt, don't hesitate to add your doubt to the comment section.

So let's discuss the third and last type of machine learning, that is, reinforcement learning. So what is reinforcement learning? Well, reinforcement learning is a type of machine learning which allows software agents and machines to automatically determine the ideal behavior within a specific context so as to maximize performance. Reinforcement learning is about the interaction between two elements: the environment and the learning agent. The learning agent leverages two mechanisms, namely exploration and exploitation. When the learning agent acts on a trial and error basis, it is termed exploration, and when it acts based on the knowledge gained from the environment, it is referred to as exploitation. Now, the environment rewards the agent for correct actions, which is the reinforcement signal. Leveraging the rewards obtained, the agent improves its knowledge of the environment to select the next action.

In this image, you can see that the machine is confused about whether it is an apple or not an apple. Then the machine is trained using reinforcement learning. If it makes a correct decision, it gets reward points for it, and in case it is wrong, it gets a penalty. Once the training is done, the machine can easily identify which one of them is an apple.

Let's see an example here. We can see that we have an agent who has to judge from the environment to find out which of the two is a duck. The first task it does is to observe the environment. Next, it selects some action using some policy. It seems that the machine has made a wrong decision by choosing a bunny as a duck. So the machine will get a penalty for it, for example -50 for a wrong answer. Right, now the machine will update its policy, and this will continue till the machine gets an optimal policy. From the next time, the machine will know that a bunny is not a duck.
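The reward-and-penalty loop described above is what the Q-learning update captures. Below is a minimal, hedged sketch: the two-action "duck or not duck" environment and the reward values (+10, -50) are invented purely to mirror the example, not taken from the course code.

# Minimal Q-learning sketch of the reward/penalty idea described above.
import random

state = "image_of_bunny"
actions = ["say_duck", "say_not_duck"]
Q = {(state, a): 0.0 for a in actions}   # action-value table
alpha, epsilon = 0.5, 0.2                # learning rate, exploration rate

def reward(action):
    # A correct answer earns a reward, a wrong one a penalty (as in the example: -50).
    return 10.0 if action == "say_not_duck" else -50.0

for episode in range(100):
    # Exploration vs. exploitation: sometimes try a random action, otherwise use what was learned.
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(state, act)])
    # One-step Q update (this toy episode has no next state).
    Q[(state, a)] += alpha * (reward(a) - Q[(state, a)])

print(Q)   # the learned policy now prefers "say_not_duck" for this image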

Let's see some of the use cases of reinforcement learning, but before that, let's see how Pavlov trained his dog using reinforcement learning, or how he applied the reinforcement method to train his dog. Pavlov integrated learning in four stages. Initially, Pavlov gave meat to his dog, and in response to the meat, the dog started salivating. Next, he created a sound with a bell; to this, the dog did not respond at all. In the third part, he tried to condition the dog by ringing the bell and then giving him the food. Seeing the food, the dog started salivating. Eventually, a situation came when the dog started salivating just after hearing the bell, even if the food was not given to him, as the dog was reinforced that whenever the master rang the bell, he would get food.

Now, let's move on and see how reinforcement learning is applied in the fields of banking, healthcare and retail. So, starting with the banking sector: in the banking sector, reinforcement learning is used to create a "next best offer" model for a call center, by building a predictive model that learns over time as users accept or reject offers made by the sales staff. Fine. Now, in the healthcare sector, it is used to allocate scarce resources to handle different types of ER cases, by building a Markov decision process that learns treatment strategies for each type of ER case. Next, and last, comes the retail sector. In the retail sector, it can be used to reduce excess stock with dynamic pricing, by building a dynamic pricing model that adjusts the price based on customer response to the offers.

I hope by now you have attained some understanding of what machine learning is, and you are ready to move ahead.

Welcome to today's topic of discussion on AI versus machine learning versus deep learning. These are terms which have confused a lot of people, and if you too are one among them, let me resolve it for you. Well, artificial intelligence is the broader umbrella under which machine learning and deep learning come. You can also see in the diagram that deep learning is a subset of machine learning, so you can say that AI, machine learning and deep learning are nested one inside the other. So let's move on and understand how exactly they differ from each other.

So let's start with artificial intelligence. The term artificial intelligence was first coined in the year 1956. The concept is pretty old, but it has gained its popularity recently. But why? Well, the reason is that earlier we had a very small amount of data; the data we had was not enough to predict accurate results. But now there is a tremendous increase in the amount of data. Statistics suggest that by 2020 the accumulated volume of data will increase from 4.4 zettabytes to roughly around 44 zettabytes, or 44 trillion gigabytes, of data. Along with such an enormous amount of data, we now have more advanced algorithms and high-end computing power and storage that can deal with such a large amount of data. As a result, it is expected that 70% of enterprises will implement AI over the next 12 months, which is up from 40 percent in 2016 and 51 percent in 2017.

Just for your understanding, what is AI? Well, it's nothing but a technique that enables machines to act like humans by replicating their behavior and nature. With AI, it is possible for machines to learn from experience. The machines adjust their responses based on new input, thereby performing human-like tasks. Artificial intelligence can be trained to accomplish specific tasks by processing large amounts of data and recognizing patterns in them.

You can consider that building an artificial intelligence is like building a church: the first church took generations to finish, so most of the workers who were working on it never saw the final outcome. Those working on it took pride in their craft, building bricks and chiseling stone that was going to be placed into the great structure. So as AI researchers, we should think of ourselves as humble brick makers, whose job is just to study how to build components, for example planners or learning algorithms, anything that someday, someone, somewhere will integrate into intelligent systems. Some examples of artificial intelligence from our day-to-day life are Apple's Siri, chess-playing computers, Tesla's self-driving car and many more. These examples are based on deep learning and natural language processing. Well, this was about what AI is and how it gained its hype.

So moving on ahead

Let's discuss machine learning and see what it is and why it was even introduced. Well, machine learning came into existence in the late 80s and the early 90s, but what were the issues people faced which made machine learning come into existence? Let us discuss them one by one. In the field of statistics, the problem was how to efficiently train large complex models. In the field of computer science and artificial intelligence, the problem was how to train more robust versions of AI systems, while in the case of neuroscience, the problem faced by researchers was how to design operational models of the brain. So these were some of the issues which had the largest influence and led to the existence of machine learning. Machine learning then shifted its focus from the symbolic approaches it had inherited from AI and moved towards the methods and models it had borrowed from statistics and probability theory.

So let's proceed and see what exactly machine learning is. Well, machine learning is a subset of AI which enables the computer to act and make data-driven decisions to carry out a certain task. These programs or algorithms are designed in a way that they can learn and improve over time when exposed to new data.

Let's see an example of machine learning. Let's say you want to create a system which tells the expected weight of a person based on their height. The first thing you do is collect the data. Let's say this is how your data looks. Now, each point on the graph represents one data point. To start with, we can draw a simple line to predict the weight based on the height, for example a simple line W = H - 100, where W is weight in kg and H is height in centimeters. This line can help us make predictions.

Our main goal is to reduce the difference between the estimated value and the actual value. So in order to achieve it, we try to draw a straight line that fits through all these different points and minimizes the error. So our main goal is to minimize the error and make it as small as possible. Decreasing the error, or the difference between the actual value and the estimated value, increases the performance of the model. Further, the more data points we collect, the better our model will become. We can also improve our model by adding more variables and creating different prediction lines for them. Once the line is created, from the next time, if we feed new data, for example the height of a person, to the model, it will easily predict the data for you, and it will tell you what the predicted weight could be.
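Here is a minimal sketch of that example in Python, comparing the simple W = H - 100 rule with a line fitted to minimize the error. The handful of height/weight pairs is made up for illustration.

# Regression sketch: predict weight (kg) from height (cm), as in the example above.
import numpy as np
from sklearn.linear_model import LinearRegression

heights = np.array([[150], [160], [170], [180], [190]])   # made-up heights in cm
weights = np.array([52, 58, 69, 77, 85])                   # made-up weights in kg

model = LinearRegression().fit(heights, weights)   # line fitted to minimize squared error

print("W = H - 100 guess for 175 cm:", 175 - 100)
print("fitted prediction for 175 cm:", model.predict([[175]])[0])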

I hope you got a clear understanding of machine learning.

So moving on ahead

Let's learn about deep learning now. What is deep learning? You can consider a deep learning model as a rocket engine, and its fuel is the huge amount of data that we feed to these algorithms. The concept of deep learning is not new, but recently its hype has increased and deep learning is getting more attention. This field is a particular kind of machine learning that is inspired by the functionality of our brain cells, called neurons, which led to the concept of artificial neural networks. It simply takes the data connections between all the artificial neurons and adjusts them according to the data pattern. More neurons are added as the size of the data grows. It automatically performs feature learning at multiple levels of abstraction, thereby allowing a system to learn complex function mappings without depending on any specific algorithm. You know what? No one actually knows what happens inside a neural network and why it works so well, so currently you can call it a black box. Let us discuss some examples of deep learning and understand it in a better way.

Let me start with a simple example and explain how things work at a conceptual level. Let us try and understand how you would recognize a square among other shapes. The first thing you do is check whether there are four lines associated with the figure or not, a simple concept, right? If yes, we further check whether they are connected and closed. Again, if yes, we finally check whether the lines are perpendicular and all the sides are equal. Correct? If everything is fulfilled, yes, it is a square. Well, it is nothing but a nested hierarchy of concepts. What we did here is we took a complex task, identifying a square in this case, and broke it into simpler tasks.
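That nested hierarchy of checks can be written as a tiny function. Representing the shape by its side lengths, corner angles and a closed flag is just one assumed encoding, used here only to make the idea concrete.

# The "nested hierarchy of concepts" for recognizing a square, as simple nested checks.
def is_square(sides, angles, closed=True):
    if len(sides) != 4:                         # step 1: are there four lines?
        return False
    if not closed:                              # step 2: are they connected and closed?
        return False
    if any(angle != 90 for angle in angles):    # step 3: are all corners perpendicular?
        return False
    if len(set(sides)) != 1:                    # step 4: are all sides equal?
        return False
    return True

print(is_square([5, 5, 5, 5], [90, 90, 90, 90]))   # True
print(is_square([5, 5, 5, 4], [90, 90, 90, 90]))   # False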

Now, deep learning also does the same thing, but at a larger scale. Let's take the example of a machine which recognizes animals. The task of the machine is to recognize whether a given image is of a cat or a dog. What if we were asked to resolve the same issue using the concepts of machine learning? What would we do? First, we would define the features, such as checking whether the animal has whiskers or not, checking whether the animal has pointed ears or not, or whether its tail is straight or curved. In short, we would define the facial features and let the system identify which features are more important in classifying a particular animal. Now, when it comes to deep learning, it takes this one step ahead: deep learning automatically finds out the features which are most important for classification, compared to machine learning, where we had to manually provide those features. By now, I guess you have understood that AI is the bigger picture, and machine learning and deep learning are parts of it.

So let's move on and focus our discussion on machine learning and deep learning. The easiest way to understand the difference between machine learning and deep learning is to know that deep learning is machine learning. More specifically, it is the next evolution of machine learning. Let's take a few important parameters and compare machine learning with deep learning.

So, starting with data dependencies: the most important difference between deep learning and machine learning is how performance changes as the volume of data grows. From the below graph, you can see that when the size of the data is small, a deep learning algorithm doesn't perform that well. But why? Well, this is because a deep learning algorithm needs a large amount of data to understand it perfectly. On the other hand, a machine learning algorithm can easily work with a smaller data set. Fine.

Next comes hardware dependencies. Deep learning algorithms are heavily dependent on high-end machines, while machine learning algorithms can work on low-end machines as well. This is because the requirements of deep learning algorithms include GPUs, which are an integral part of their working. Deep learning algorithms require GPUs as they do a large number of matrix multiplication operations, and these operations can only be efficiently optimized using a GPU, as it is built for this purpose.

Our third parameter will be feature engineering. Well, feature engineering is the process of putting in domain knowledge to reduce the complexity of the data and make patterns more visible to learning algorithms. This process is difficult and expensive in terms of time and expertise. In the case of machine learning, most of the features need to be identified by an expert and then hand-coded as per the domain and the data type. For example, the features can be pixel values, shapes, texture, position, orientation or anything. Fine. The performance of most machine learning algorithms depends on how accurately the features are identified and extracted, whereas in the case of deep learning, the algorithms try to learn high-level features from the data directly. This is a very distinctive part of deep learning, which puts it way ahead of traditional machine learning. Deep learning reduces the task of developing a new feature extractor for every problem. For example, in the case of a CNN algorithm, it first tries to learn the low-level features of the image, such as edges and lines, then it proceeds to the parts of faces of people, and then finally to the high-level representation of a face. I hope that things are getting clearer to you.

So let's move on ahead and see the next parameter. Our next parameter is the problem-solving approach. When we are solving a problem using a traditional machine learning algorithm, it is generally recommended that we first break the problem down into different sub-parts, solve them individually, and then finally combine them to get the desired result. This is how a machine learning algorithm handles the problem. On the other hand, a deep learning algorithm solves the problem end to end.

Let's take an example to understand this. Suppose you have a task of multiple object detection, and your task is to identify what the object is and where it is present in the image. So let's see and compare how you would tackle this issue using the concepts of machine learning and deep learning. Starting with machine learning: in a typical machine learning approach, you would first divide the problem into two steps, first object detection and then object recognition.

First of all, you would use a bounding box detection algorithm, like GrabCut for example, to scan through the image and find out all the possible objects. Now, once the objects are detected, you would use an object recognition algorithm, like SVM with HOG, to recognize the relevant objects. Finally, when you combine the results, you would be able to identify what the object is and where it is present in the image. On the other hand, in the deep learning approach, you would do the process end to end. For example, in a YOLO net, which is a type of deep learning algorithm, you would pass in an image, and it would give out the location along with the name of the object.

Now, let's move on to our fifth comparison parameter: execution time. Usually, a deep learning algorithm takes a long time to train. This is because there are so many parameters in a deep learning algorithm that the training takes longer than usual; the training might even last for two weeks or more than that if you are training completely from scratch, whereas in the case of machine learning, it relatively takes much less time to train, ranging from a few seconds to a few hours.

Now, the execution time is completely reversed when it comes to the testing of data. During testing, the deep learning algorithm takes much less time to run, whereas if you compare it with a KNN algorithm, which is a type of machine learning algorithm, the test time increases as the size of the data increases. Last but not least, we have interpretability as a factor for comparing machine learning and deep learning. This factor is the main reason why people in the industry still think ten times before using deep learning.

Let's take an example. Suppose we use deep learning to give automated scoring to essays. The performance it gives in scoring is quite excellent and is near to human performance, but there's an issue with it: it does not reveal why it has given that score. Indeed, mathematically it is possible to find out which nodes of a deep neural network were activated, but we don't know what the neurons were supposed to model and what these layers of neurons were doing collectively, so we fail to interpret the result. On the other hand, a machine learning algorithm like a decision tree gives us a crisp rule for why it chose what it chose, so it is particularly easy to interpret the reasoning behind it. Therefore, algorithms like decision trees and linear or logistic regression are primarily used in industry for interpretability.

Let me summarize things for you. Machine learning uses algorithms to parse the data, learn from the data, and make informed decisions based on what it has learned. Fine. Next, deep learning structures algorithms in layers to create an artificial neural network that can learn and make intelligent decisions on its own. Finally, deep learning is a subfield of machine learning; while both fall under the broad category of artificial intelligence, deep learning is usually what's behind the most human-like artificial intelligence.

Now, in the early days, scientists used to have a lab notebook to test progress, results and conclusions. Jupyter is a modern-day tool that allows data scientists to record their complete analysis process, much in the same way other scientists use a lab notebook.

Now, the Jupyter product was originally developed as a part of the IPython project. The IPython project was used to provide interactive online access to Python. Over time, it became useful to interact with other data analysis tools, such as R, in the same manner. With the split from Python, the tool grew into its current manifestation of Jupyter. IPython is still an active tool that's available for use. The name Jupyter itself is derived from the combination of Julia, Python and R. While Jupyter runs code in many programming languages, Python is a requirement for installing the Jupyter Notebook itself.

Now, to download Jupyter Notebook, there are a few ways. On the official website, it is strongly recommended to install Python and Jupyter using the Anaconda distribution, which includes Python, the Jupyter Notebook and other commonly used packages for scientific computing as well as data science, although one can also do so using the pip installation method. Personally, what I would suggest is downloading Anaconda Navigator, which is a desktop graphical user interface included in Anaconda. This allows you to launch applications and easily manage conda packages, environments and channels without the need to use command-line commands.

So all you need to do is go to anaconda.org, and inside, you go to Anaconda Navigator. So as you can see here, we have the conda installation code which you can use to install it on your particular PC, or alternatively you can use these installers. Once you download Anaconda Navigator, it looks something like this. As you can see here, we have JupyterLab and Jupyter Notebook; you have the Qt Console, which is an IPython console; we have Spyder, which is somewhat similar to RStudio in terms of Python; again, we have RStudio, we have Orange 3, we have Glueviz, and we have VS Code. Our focus today will be on the Jupyter Notebook itself.

Now, when you launch the Navigator, you can see there are many options available for launching Python as well as R instances. Now, by definition, a Jupyter notebook is fundamentally a JSON file with a number of annotations. It has three main parts, which are the metadata, the notebook format, and the list of cells. You should also get yourself acquainted with the environment: the Jupyter user interface has a number of components, so it's important to know which components you will be using on a daily basis and to get acquainted with them. As you can see here, our focus today will be on the Jupyter Notebook, so let me just launch the Jupyter Notebook. What it does is create an online Python instance for you to use over the web.

So let's launch it. Now, as you can see, we have Jupyter on the top left as expected, and this acts as a button to go to your home page; whenever you click on this, you get back to your particular home page, which is the dashboard. Now, there are three tabs displayed: Files, Running and Clusters. What we'll do is go through all three of these and understand their importance. The Files tab shows the list of the current files in the directory; as you can see, we have so many files here. The Running tab presents another screen of the currently running processes and notebooks; the drop-down lists for the terminals and notebooks are populated with their running numbers. So as you can see, inside, we do not have any running terminals and there are no running notebooks as of now. The Clusters tab presents another screen to display the list of clusters available. See, in the top right corner of the screen, there are three buttons, which are the Upload, New and Refresh buttons.

Let me go back so you can see here: we have the Upload, New and Refresh buttons. Now, the Upload button is used to add files to the notebook space; you may also just drag and drop, as you would when handling files. Similarly, you can drag and drop notebooks into specific folders as well. Now, the New menu at the top presents a further menu of Text File, Folder, Terminal and Python 3. The Text File option is used to add a text file to the current directory; Jupyter will open a new browser window for you, running a new text editor. The text entered is automatically saved and will be displayed in your notebook's files display. The Folder option creates a new folder with the name Untitled Folder, and remember, all the file and folder names are editable. Now, the Terminal option allows you to start an IPython session. The notebook options available will be activated when additional notebooks are available in your environment. The Python 3 option is used to begin a Python 3 session interactively in your notebook.

The interface looks like the following screenshot. What you have is full file-editing capabilities for your script, including saving it as a new file. You also have a complete IDE for your Python script. Now we come to the Refresh button. The Refresh button is used to update the display. It's not really necessary, as the display is reactive to any changes in the underlying file structure. At the top of the Files tab listing, there is a checkbox, a drop-down menu and a home button, as you can see here: we have the checkbox, the drop-down menu and the home button. The checkbox is used to toggle all the checkboxes in the item list.

So as you can see, you can select all of these and either move or delete all of the selected files, or you can select all and then deselect some of the files as you wish. Now, the drop-down menu presents a list of the choices available, which are Folders, All Notebooks, Running and Files. The Folders selection will select all the folders in the display and present a count of the folders in the small box; as you can see here, we have 18 folders. The All Notebooks selection will change the count to the number of notebooks and provide you with options, so you can see here it has selected all the given notebooks, and you get the option to either duplicate the selected notebooks, move them, view them, edit them or delete them. The Running selection will select any running scripts (as you can see here, we have zero running scripts) and update the count to the number selected. The Files selection will select all the files in the notebook display and update the count accordingly. So if you select the files here, there are seven files, as you can see: we have seven files, some datasets, CSV files and text files. Now, the home button brings you back to the home screen of the notebook.

So all you need to do is click on the Jupyter Notebook logo, and it will bring you back to the Jupyter Notebook dashboard. Now, as you can see, on the left-hand side of every item there is a checkbox, an icon and the item's name. The checkbox is used to build a set of files to operate upon, and the icon indicates the type of the item; in this case, all of the items here are folders. Coming down, we have the running notebooks, and finally we have certain files, which are the text files and the CSV files.

Now, a typical workflow in any Jupyter notebook is to first of all create a notebook for the project or your data analysis, add your analysis steps, code and output, and surround your analysis with organizational and presentation markdown to communicate an entire story. Interactive notebooks that include widgets and display modules will then be used by others, who modify parameters and data to note the effects of the changes. Now, if we talk about security, Jupyter notebooks are created in order to be shared with other users, in many cases over the internet.

However, a Jupyter notebook can execute arbitrary code and generate arbitrary code. This can be a problem if malicious aspects have been placed in the notebook. Now, the default security mechanisms for Jupyter notebooks include raw HTML, which is always sanitized and checked for malicious coding. Another aspect is that you cannot run external JavaScript. The cell contents, especially the HTML and the JavaScript, are not trusted; they require user validation to continue, and the output from any cell is not trusted. All other HTML or JavaScript is never trusted, and clearing the output will cause the notebook to become trusted when saved. Now, notebooks can also use a security digest to ensure that the correct user is modifying the contents.

For that, the digest takes into account the entire contents of the notebook and a secret which is only known by the notebook creator, and this combination ensures that malicious coding is not going to be added to the notebook. So you can add a security digest to a notebook using the command which I have given here: under the Jupyter profile you have selected, you set the security notebook secret. So what you can do is replace the notebook secret with your own secret, and that will act as a key for that particular notebook. What you need to do then is share that particular key with all your colleagues, or whoever you want to share that particular notebook with, and in that case it keeps the notebook secure and away from other malicious coders.

Another aspect of Jupyter is configuration. You can configure some of the display parameters used in presenting notebooks. These are configurable due to the use of a product known as CodeMirror to present and modify the notebook. So what is CodeMirror, basically? It is a JavaScript-based editor for use within web pages and notebooks. If you look up CodeMirror, as you can see here, CodeMirror is a versatile text editor implemented in JavaScript for the browser. So what it does is allow you to configure the display options for Jupyter.

So now let's execute some Python code and understand the notebook in a better way. Jupyter does not interact with your scripts so much as it executes your script and records the result. I think this is how Jupyter notebooks have been extended to other languages besides Python: it just takes a script, runs it against a particular language engine, and records the output from the engine, all the while not really knowing what kind of script is being executed. Now, the new window shows an empty cell for you to enter Python code. What you need to do is, under New, select Python 3, and what it will do is open a new notebook. This notebook is Untitled, so let's give the new work area a name: Python code.

So as you can see, we have renamed this particular notebook. Now, the autosave option should be right next to the title; as you can see, it says "Last Checkpoint: a few days ago (unsaved changes)". The autosave option is always on. With an accurate name, we can find this particular notebook very easily from the notebook home page. So if you select your browser's home tab and refresh, you will find this new notebook name displayed here again. If you just go to the notebook home, as you can see, I named it Python code, and under Running also you have the Python code notebook here.

So let's get back to that particular page, or the notebook. One thing to note here is that it has a notebook item icon versus a folder icon. The automatically assigned extension, as you can see here, is .ipynb, the IPython notebook extension. Since the item is open in a browser in a Jupyter environment,

it is marked as running, and there is a file by that name in this directory as well. So if you go to your directory, let me go and check it: as you can see, if you go into the Users directory, we have the in-class projects and the Python code notebook. Jupyter automatically has that particular IPython notebook created in our working environment and in the local disk space as well. If you open the .ipynb file in a text editor, you will see the basic content of a Jupyter notebook, as you can see when I open it.

The cells are empty; nothing is there, so let's type in some code here. For example, I'm going to put in name equals Edureka. Next, what I'm going to do is provide subscribers equals seven hundred K, and to run this particular cell, what you need to do is click on the Run icon, and you will see here we have [1]. So this is the first cell to be executed. In the second cell, we enter Python code that references the variables from the first cell. So as you can see here, we print the name and the subscribers as a string. Let me just run this particular cell.

So as you can see here, we now have an output that says Edureka has 700K YouTube subscribers. Now, since it's more than 700K now, to know more about Jupyter and other technologies, what you can do is subscribe to our channel and get updates on the latest trending technologies.
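The two cells typed in the video look roughly like the following; the exact values are taken from what is said on screen, and printing the subscriber count as a string is just one way to reproduce the output shown.

# Cell 1: define two variables, as typed in the video.
name = "Edureka"
subscribers = "700K"

# Cell 2: reference the variables from the first cell.
print(name + " has " + subscribers + " YouTube subscribers")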

So note that Jupyter color-codes your Python just as a decent editor would, and we have empty brackets to the left of each code block, as you can see here. If we execute the cell, the results are displayed inline. Now, it's interesting that Jupyter keeps the output last generated in the saved version of the file and in its saved checkpoints. If we were to rerun the cells using Rerun or Run All, the output would be regenerated and saved by autosave. The cell number is incremented, and as you can see, if I rerun this, you see the cell number change from one to three, and if I rerun this, the cell number will change from two to four. So what Jupyter does is keep track of the latest version of each cell.

Similarly, if you were to close the browser tab and refresh the display in the home tab, you will find the new item we created, which is the Python code notebook, saved and autosaved; as you can see here, the bracket says autosaved. So if we close this and go to the home button, you can see here we have Python code. As you can see, if we click that, it opens the same notebook; the previously displayed items will always be there, showing the output that we generated in the last run. Now that we have seen how Python works in Jupyter, including the underlying encoding, let's see how a Python data set works in Jupyter.

So let me create another new Python notebook. What I'm going to do is name this one pandas. From here, what we will do is read in a large dataset and compute some standard statistics of the data. Now, what we are interested in is seeing how to use pandas in Jupyter, how well the script performs, and what information is stored in the metadata, especially if it's a large dataset. Our Python script accesses the iris dataset here, which is built into one of the Python packages. All we are looking to do is read in a slightly large number of items and calculate some basic operations on the data set.

So first of all, what we need to do is from sklearn import datasets. sklearn is scikit-learn, and it is another library of Python. It contains a lot of data sets for machine learning and all the algorithms which are there for machine learning. So, the import was successful.

So what we're going to do now is pull in the iris data. What we're going to do is iris_dataset equals datasets.load_iris(). Now, that should do it. I'm sorry, it's datasets, not dataset. So as you can see here, the cell number is now considered three, because in the second run we encountered an error: it was datasets, not dataset. So what we're going to do is grab the first two columns of the data. Let's write x equals; if you press Tab, it automatically completes what you're going to write, iris_dataset.data, and what we're going to do is take the first two columns. Now, to run it from your keyboard, all you need to do is press Shift + Enter.

So next, what we're going to do is calculate some basic statistics. What we're going to do is x_count equals, I'm going to use the length function, and inside that we're going to use x.flat. Similarly, we're going to compute x_min, x_max and the mean, and then display our results.

What we're going to do is just display the results now. So as you can see, the count is 300, and we get the minimum value, the maximum value, and the mean, which is 5.84333.
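Put together, the pandas/iris cells described above look roughly like this; note that the exact numbers printed in the video may differ slightly depending on which slice of the data is taken.

# The iris cells reconstructed as one runnable script.
from sklearn import datasets

iris_dataset = datasets.load_iris()      # built-in iris data: 150 rows, 4 measurement columns
x = iris_dataset.data[:, :2]             # first two columns: sepal length and sepal width

x_count = len(x.flat)                    # number of values in the slice (2 columns x 150 rows = 300)
x_min = x.min()
x_max = x.max()
x_mean = x.mean()

print(x_count, x_min, x_max, x_mean)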

So let me connect you to real life and tell you all the things which you can easily do using the concepts of machine learning. You can easily get answers to questions like: which type of houses lie in this segment, what is the market value of this house, is this email spam or not spam, is there any fraud? Well, these are some of the questions you could ask the machine, but for getting an answer to these, you need some algorithm; the machine needs to train on the basis of some algorithm. Okay, but how will you decide which algorithm to choose and when? Okay, so the best option for us is to explore them one by one.

So the first is the classification algorithm, where categories are predicted using the data. If you have questions like, is this person a male or a female, or is this email spam or not spam, then these categories of questions fall under the classification algorithm. Classification is a supervised learning approach in which the computer program learns from the input given to it and then uses this learning to classify new observations. Some examples of classification problems are speech recognition, handwriting recognition, biometric identification, document classification, etc.

Next is the anomaly detection algorithm, where you identify unusual data points. So what is anomaly detection? Well, it's a technique that is used to identify unusual patterns that do not conform to expected behavior, or you can say, the outliers. It has many applications in business, like intrusion detection, that is, identifying strange patterns in the network traffic that could signal a hack, or system health monitoring, that is, spotting a deadly tumor in an MRI scan, or you can even use it for fraud detection in credit card transactions, or to deal with fault detection in operating environments.

So next comes the clustering algorithm. You can use a clustering algorithm to group the data based on some similar conditions. Now you can get an answer to which type of houses lie in this segment, or what type of customer buys this product. Clustering is the task of dividing the population, or the data points, into a number of groups such that the data points in the same group are more similar to other data points in that group than to those in the other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters. To put it another way, clustering divides the population or data points into a number of groups such that the data points in each group are more similar to the other data points in the same group than to those in the other groups; in other words, the aim is to segregate groups with similar traits and assign them into different clusters.

Let's understand this with an example. Suppose you are the head of a rental store, and you wish to understand the preferences of your customers in order to scale up your business. Is it possible for you to look at the details of each customer and design a unique business strategy for each of them? Definitely not, right? But what you can do is cluster all your customers into, say, 10 different groups based on their purchasing habits, and then use a separate strategy for the customers in each of these ten different groups. And this is what we call clustering.

Next we have the regression algorithm, where the value itself is predicted. Questions you may ask of this type of model are like: what is the market value of this house, or is it going to rain tomorrow or not? Regression is one of the most important and broadly used machine learning and statistics tools. It allows you to make predictions from data by learning the relationship between the features of your data and some observed, continuous-valued response. Regression is used in a massive number of applications; you know what, stock price prediction can be done using regression. Now you know about the different machine learning algorithms.

How will you decide which algorithm to choose and when? So let's cover this part using a demo. In this demo part, what we will do is create six different machine learning models, pick the best model, and build confidence that it has the most reliable accuracy. For our demo we will be using the iris data set. This data set is quite famous and is considered one of the best small projects to start with; you can consider this the "hello world" data set of machine learning.

This data set consists of 150 observations of iris flowers. There are four columns of measurements of the flowers in centimeters, the fifth column being the species of the flower observed. All the observed flowers belong to one of three species: Iris setosa, Iris virginica and Iris versicolor. Well, this is a good project because it is so well understood. The attributes are numeric, so you have to figure out how to load and handle the data. It is a classification problem, thereby allowing you to practice with perhaps an easier type of supervised learning algorithm. It has only four attributes and 150 rows, meaning it is very small and can easily fit into memory, and all of the numeric attributes are in the same unit and on the same scale, meaning you do not require any special scaling or transformation to get started.

So let's start coding, and as I told you earlier, for the demo I'll be using Anaconda with Python 3 installed on it. So this is how your Navigator will look when you install Anaconda. There's the home page of my Anaconda Navigator. On this, I'll be using the Jupyter Notebook, which is a web-based interactive computing notebook environment that will help me write and execute my Python code. So let's hit the Launch button and start our Jupyter Notebook. As you can see, my Jupyter Notebook is starting on localhost 8890. Okay, so there's my Jupyter Notebook. What I'll do here is select New,

book Python 3 Doesmy environment where I can write and execute allmy python codes on it? So let's startby checking the version of the libraries in orderto make this video short and more interactiveand more informative

I've already writtenthe set of code

So let me just copyand paste it down

I'll explain youthen one by one

So let's start by checking the versionof the Python libraries

Okay, so there isthe code let's just copy it copied and let's paste it

Okay first let me summarize things for youwhat we are doing here

We are just checking the versions of the different libraries. Starting with Python, we'll first check what version of Python we are working on, then we'll check what version of SciPy we are using, then NumPy, matplotlib, pandas and scikit-learn.

Okay

So let's hit the Run button and see the various versions of the libraries we are using.

So we are working on Python 3.6.4, SciPy 1.0, NumPy 1.14, matplotlib 2.1.2, pandas 0.22 and scikit-learn version 0.19.

Okay

So these are the versions I'm using. Ideally your versions should match or be more recent, but don't worry if you lag a few versions behind, as the APIs do not change that quickly; everything in this tutorial will very likely still work for you.
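For reference, the version-check cell pasted in here looks roughly like this (a sketch; the exact print formatting may differ from the instructor's notebook):

```python
# Check the versions of the libraries used in this tutorial
import sys
import scipy
import numpy
import matplotlib
import pandas
import sklearn

print('Python: {}'.format(sys.version))
print('scipy: {}'.format(scipy.__version__))
print('numpy: {}'.format(numpy.__version__))
print('matplotlib: {}'.format(matplotlib.__version__))
print('pandas: {}'.format(pandas.__version__))
print('sklearn: {}'.format(sklearn.__version__))
```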

Okay, but in case you are getting an error, stop and try to fix it. If you are unable to find a solution, feel free to reach out to Edureka, even after this class.

Let me tell you this if you are not able to runthe script properly, you will not be ableto complete this tutorial

Okay, so whenever you get a doubt, reach out to Edureka and resolve it. Now, if everything is working smoothly, it is time to load the data set.

So as I said, I'll be using the Iris flower data set for this tutorial. But before loading the data set, let's import all the modules, functions and objects which we are going to use in this tutorial. Again, I've already written the set of code.

So let's just copyand paste them

Let's load all the libraries

So these are the various libraries which we will be using in our tutorial.
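As a sketch, the import cell likely looks something like this; the exact module list is an assumption based on what is used later in the demo:

```python
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn import model_selection
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
```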

So everything should workfine without an error

If you get an error, just stop; you need to get your SciPy environment working before you continue any further.

So I guess everythingshould work fine

Let's hit the Runbutton and see

Okay, it worked

So let's now move aheadand load the data

We can load the data direct from the UCI machinelearning repository

First of all, let me tell you we are using pandas to load the data.

Okay

So let's define my URL.

This is my URL for the UCI Machine Learning Repository, from where I will be downloading the data set.

Okay

Now what I'll do, I'll specify the nameof each column when loading the data

This will help me laterto explore the data

Okay, so I'll just copyand paste it down

Okay, so I'm defining a variable names which consists of the column names: sepal length, sepal width, petal length, petal width and class.

So these are just the nameof column from the data set

Okay

Now let's define the data set

So dataset equals pandas.read_csv, and inside that we are passing the URL and names=names.

As I already said, we'll be using pandas to load the data.

Alright, so we are using pandas.read_csv, so we are reading

the CSV file, and inside that we specify where that CSV is coming from, which is the URL.

So there's my URL.

Okay, and names=names

just specifies the names of the various columns in that particular CSV file.

Okay

So let's move forwardand execute it
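Put together, the loading step looks roughly like this. The UCI URL below is the commonly used iris.data location; if you have network issues, swap it for a local file path:

```python
# A sketch of loading the Iris data with pandas
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)
```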

So now our data set is loaded.

In case you have some network issues, just go ahead and download the Iris data file into your working directory and load it using the same method, but make sure that you change the URL to the local file name, or else you might get an error.

Okay

Yeah, our data set is loaded

So let's move aheadand check out data set

Let's see how many columnsor rows we have in our data set

Okay

So let's print the number of rows and columnsin our data set

So for our data set it is dataset.shape. What this will do

is give you the total number of rows and the total

number of columns, or you can say the total number of instances and attributes in your data set. Fine.

So with print(dataset.shape) we are getting 150 and 5.

So 150 is the total number of rows in your data set and 5 is the total number of columns. Fine.

So moving on ahead

What if I want to seethe sample data set? Okay

So let me just printthe first certain instances of the data set

Okay, so print(dataset.head(30)).

What I want is the first 30 instances. Fine.

This will give me the first 30 results of my data set.

Okay

So when I hit the Run button what I am getting isthe first 30 result, okay 0 to 29

So this is how my sample data set looks: sepal length, sepal width, petal length, petal width and the class. Okay.

So this is how our dataset looks like now, let's move on and look at the summaryof each attribute

What if I want to find out the count, mean, minimum and maximum values, and some other percentiles as well?

So what should I do then? For that, print(dataset.describe()).

What did it give? Let's see.

So you can see that all the numbers are on the same scale and in a similar range, between 0 and 8 centimeters: the mean value, the standard deviation, the minimum value, the 25th percentile, 50th percentile, 75th percentile and the maximum value all lie in the range between 0 and 8 centimeters.

Okay

So what we just did iswe just took a summary of each attribute

Now, let's lookat the number of instances that belong to each class

So for that, what we'll do is print the data set

grouped by class, and I want the size of each class. Fine, let's hit Run.

Okay

So what I want to do is print out the data set.

How do I want to get it?

I want it by class.

So group by 'class'.

Okay.

Now I want the size of each class.

So dataset.groupby('class').size(), and execute the run. You can see that I have 50 instances of Iris setosa, 50 instances of Iris versicolor and 50 instances of Iris virginica.

Okay, all of them are of data type int64. Fine.
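For reference, the whole inspection step can be reproduced with a few lines like these (a sketch, reusing the load shown earlier):

```python
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

print(dataset.shape)                    # (150, 5): 150 rows, 5 columns
print(dataset.head(30))                 # the first 30 instances
print(dataset.describe())               # count, mean, std, min, percentiles, max
print(dataset.groupby('class').size())  # 50 instances of each species, dtype int64
```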

So now we have a basic idea of the data. Now let's move ahead and create some visualizations for it.

So for this we are goingto create two different types of plot first would bethe univariate plot and the next would bethe multivariate plot

So we'll be creating univariateplots to better understand about each attribute and the next will be creatingthe multivariate plot to better understand the relationshipbetween different attributes

Okay

So we start withsome univariate plot that is plotof each individual variable

So given that the inputvariables are numeric we can create boxand whiskers plot for it

Okay

So let's move ahead and create a box-and-whiskers plot. So, dataset.plot.

What kind do I want? It's a box.

Okay, do I need subplots? Yeah, I need subplots for that.

So subplots=True. What type of layout do I want? My layout structure is 2 by 2. Next, do I want to share my X and Y coordinates?

No, I don't want to share them.

So sharex=False, and sharey=False as well. Okay.

So we have dataset.plot(kind='box'),

subplots=True, layout=(2, 2), and then I want to see it, so plt.show() to show whatever I created.

Okay, execute it

This just gives us a much clearer idea about the distribution of the input attributes.

Now, what if instead of giving the layout as 2 by 2 I had given it as 4 by 4? What will it result in? Just see. Fine.

Everything would be printedin just one single row

Hold on guys, Arya has a doubt.

He's asking why we're using the sharex and sharey values.

What are these, and why have we assigned False values to them? Okay, Arya,

in order to resolve this query, I need to show you what will happen if I give True values to them.

Okay, so bear with me: sharex

equals True, and sharey equals True.

So let's seewhat result will get

You're getting it: the X and Y coordinates are just shared among all four visualizations.

Right? So you can see that sepal length and sepal width have Y values ranging from 0.0 to 7.5, which are being shared among both visualizations, and so it is with the petal length.

It has a shared range between 0.0 and 7.5.

Okay, so that is why I don't want to share the values of X and Y; it just gives us a cluttered visualization.

So, Arya, why am I doing this?

I'm just doing it because I don't want my X and Y coordinates to be shared among any visualization.

Okay.

That is why my sharex and sharey values are False.

Okay, let's execute it

So this is a pretty clear visualization, which gives a good idea about the distribution of the input attributes.
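A sketch of the box-and-whisker cell just described, reusing the load from earlier:

```python
import matplotlib.pyplot as plt
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

# One box plot per attribute, laid out 2 by 2, without shared axes
dataset.plot(kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False)
plt.show()
```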

Now if you want youcan also create a histogram of each input variableto get a clear idea of the distribution

So let's createa histogram for it

So, dataset.hist(). Okay.

I would need to see it,

so plt.show().

Let's see

So there's my histogram, and it seems that we have two input variables that have a Gaussian

distribution. This is useful to note, as we can use algorithms that can exploit this assumption.
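The histogram cell is just a couple of lines; a sketch, again reusing the earlier load:

```python
import matplotlib.pyplot as plt
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

dataset.hist()   # one histogram per numeric attribute
plt.show()
```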

Okay

So next comes the multivariate plot. Now that we have created the univariate plots to understand each attribute,

Let's move on and lookat the multivariate plot and see the interaction betweenthe different variables

So first, let's lookat the scatter plot of all the attributethis can be helpful to spot structured relationshipbetween input variables

Okay

So let's createa scatter Matrix

So for creating the scatter plot we need scatter_matrix, and we need to pass our data set into it. Okay.

And then I want to see it,

so plt.show().

So this is how my scatter matrix looks. Note the diagonal grouping of some pairs of attributes, right? This suggests a high correlation and a predictable relationship.
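A sketch of the scatter matrix cell, reusing the earlier load:

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

scatter_matrix(dataset)   # pairwise scatter plots, histograms on the diagonal
plt.show()
```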

All right

This was our multivariate plot

Now, let's move onand evaluate some algorithm that's time to createsome model of the data and estimate the accuracyon the basis of unseen data

Okay

So now we know allabout our data set, right? We know how many instances and attributes arethere in our data set

We know the summaryof each attribute

So I guess we have seen muchabout our data set

Now

Let's move onand create some algorithm and estimate their accuracybased on the Unseen data

Okay

Now what we'll do we'll createsome model of the data and estimate the accuracy basedon the some unseen data

Okay

So for that first of all, let's create avalidation data set

What is a validation data set? A validation data set is a portion of your data that you hold back from training and later use to check your trained model. Fine.

All right

So how will we create a validation data set? For creating a validation data set, what we are going to do is split our data set into two parts.

Okay

So the very first thingwe'll do is to create a validation data set

So why do we even need a validation data set? We need a validation data set to know whether the model we created is any good.

What we'll do we'll usethe statistical method to estimate the accuracyof the model that we create on the Unseen data

We also want a more concreteestimate of the accuracy of the best model on unseen data by evaluating iton the actual unseen data

Okay confused

Let me simplify this for you

What we'll do is split the loaded data into two parts: the first 80 percent of the data

we'll use to train our model, and the remaining 20% we'll hold back as the validation data set, which we'll use to verify our trained model.

Okay fine

So let's define an array

This is my array; it will consist of all the values from the data set.

So data set dot values

Okay next

Next, I'll define a variable X which will consist of the first four columns of the array, indices 0 to 4, and a variable Y which will consist of the class column of the array.

So first of all, we define the variable X that will consist of the values in the array starting from the beginning, columns 0 till 4. Okay.

So these are the columns which we'll include in the X variable, and the Y variable I'll define as the class, or the output.

So what I need is just the fourth column, that is my class column.

So I'll start from the beginning of the rows and I just want the fourth column.

Okay, now I'll define my validation size.

validation_size, I'll define it as 0.20,

and I'll also use a seed; I define seed = 6.

This seed sets the starting integer value used in generating random numbers.

Okay, so I've defined the seed value;

I'll tell you the importance of that later on. Okay.

So let me define a few variables such as X_train, X_validation, Y_train and Y_validation. Okay, so what we want to do is select a model.

Okay, so model_selection.

But before doing that, what we have to do is split our data set into two parts.

Okay, so .train_test_split; what we want to split are the values of X and Y.

Okay, and my test_size is equal to validation_size, which is 0.20,

correct, and my random_state

is equal to seed. So what is the seed doing? It's helping me keep the same randomness in the training and testing data sets. Fine.
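Here is a sketch of that 80/20 split, folding in the load from earlier; the variable names mirror the narration:

```python
import pandas as pd
from sklearn import model_selection

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

array = dataset.values
X = array[:, 0:4]            # the four measurement columns
Y = array[:, 4]              # the class column
validation_size = 0.20       # hold back 20% for validation
seed = 6                     # fixed seed so the split is reproducible

X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed)
```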

So let's execute it and seewhat is our result

It's executed next

We'll create a testharness for this

We'll use 10-foldcross-validation to estimate the accuracy

So what it will do is split our data set into 10 parts, train on nine parts and test on one part, and this will repeat for all combinations of train and test splits.

Okay

So for that, let's define again my seed, which was already defined as six, and scoring equals 'accuracy'. Fine.

So we are using the metric ofaccuracy to evaluate the model

So what is this? This is the ratio of the number of correctly predicted instances divided by the total number of instances in the data set, multiplied by a hundred, giving a percentage. For example,

It's 98% accurate or 99%accurate things like that

Okay, so we'll be using the scoring variable when we build and evaluate each model in the next step.

The next part is buildingmodel till now

We don't know which algorithmwould be good for this problem or what configuration to use

So let's begin withsix different algorithm

I'll be using logistic regression, linear discriminant analysis, k-nearest neighbors, classification and regression trees (CART), Naive Bayes

and support vector machines. Well, these algorithms I'm using are a good mixture of simple linear and non-linear algorithms. The simple linear ones

include logistic regression and linear discriminant analysis, while the non-linear part includes the KNN algorithm, the CART algorithm, Naive Bayes and support vector machines.

Okay

So we reset the random number seed before each run to ensure that the evaluation of each algorithm is performed using exactly the same data splits.

It ensures the resultare directly comparable

Okay, so, let mejust copy and paste it

Okay

So what we're doing here is building six different types of models.

We are building logistic regression, linear discriminant analysis, k-nearest neighbors, decision tree (CART), Gaussian Naive Bayes and the support vector machine.

Okay next what we'll do we'llevaluate model in each turn

Okay

So what is this? We have six different models and an accuracy estimate for each one of them. Now we need to compare the models to each other and select the most accurate of them all.
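A sketch of that spot-checking loop, folding in the load and split from the earlier sketches. The variable names and the shuffle flag on KFold are my additions, and the exact scores will vary slightly with library versions:

```python
import pandas as pd
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Load and split as in the earlier sketches
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)
array = dataset.values
X, Y = array[:, 0:4], array[:, 4]
seed = 6
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=0.20, random_state=seed)

models = [
    ('LR', LogisticRegression()),
    ('LDA', LinearDiscriminantAnalysis()),
    ('KNN', KNeighborsClassifier()),
    ('CART', DecisionTreeClassifier()),
    ('NB', GaussianNB()),
    ('SVM', SVC()),
]

scoring = 'accuracy'
for name, model in models:
    # 10-fold cross-validation on the training data, same splits for every model
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    print('%s: %f (%f)' % (name, results.mean(), results.std()))
```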

So running the script wesaw the following result so we can see someof the results on the screen

What is It is just the accuracy score usingdifferent set of algorithms

Okay, when we are usinglogistic regression, what is the accuracy rate when we are usinglinear discriminant algorithm? What is the accuracyand so-and-so? Okay

So from the output it seems that the LDA algorithm was the most accurate model that we tested. Now we want to get an idea of the accuracy of this model on our validation set, or testing data set.

So this will give usan independent final check on the accuracyof the best model

It is always valuable to keep a testing data set, just in case you made an error such as overfitting or a data leak; both will result in an overly optimistic result.
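The step described next, running the best model on that held-back data, could look roughly like this; a sketch assuming the LDA model turned out best and reusing the split variables from the earlier sketch:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Assumes X_train, Y_train, X_validation, Y_validation from the split sketch above
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, Y_train)
predictions = lda.predict(X_validation)

print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))
```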

Okay, you can run the LDA model directly on the validation set and summarize the results as a final accuracy score, a confusion matrix and a classification report. That brings us to our next module. Statistics and probability are essential because these disciplines form the basic foundation of all machine learning algorithms, deep learning,

artificial intelligence and data science. In fact, mathematics and probability are behind everything around us, from shapes, patterns and colors to the count of petals in a flower; mathematics is embedded in each and every aspect of our lives.

So I'm going to go aheadand discuss the agenda for today with youall we're going to begin the session by understanding what is data after that

We'll move on and lookat the different categories of data like quantitativeand Qualitative data, then we'll discuss whatexactly statistics is the basic terminologies instatistics and a couple of sampling techniques

Once we're done with that

We'll discuss a differenttypes of Statistics which involve descriptiveand inferential statistics

Then, in the next section, we will mainly be focusing on descriptive statistics. Here we'll understand the different measures of center, measures of spread, information gain and entropy. We'll also understand all of these measures with the help of a use case.

And finally, we'll discuss whatexactly a confusion Matrix is

Once we've covered the entire descriptivestatistics module will discuss the probability module here will understandwhat exactly probability is the differentterminologies in probability

We will also study the differentprobability distributions, then we'll discuss the typesof probability which include marginal probability jointand conditional probability

Then we move on and discuss a use casewherein we will see examples that show us how the different typesof probability work and to betterunderstand Bayes theorem

We look at a small example

Also, I forgot to mention that at the end of the descriptive statistics module we'll be running a small demo in the R language.

So for those of you who don't know much about R, I'll be explaining every line in depth, but if you want to have a more in-depth understanding of R, I'll leave a couple of blogs

and a couple of videos in the description box; you all can definitely check out that content.

Now, after we've completed the probability module, we'll discuss the inferential statistics module. We'll start this module by understanding what point estimation is. We'll discuss what a confidence interval is and how you can estimate it. We'll also discuss margin of error, and we'll understand all of these concepts by looking at a small use case.

We'll finally end the inferential statistics module by looking at what hypothesis testing is. Hypothesis

testing is a very important part of inferential statistics.

So we'll end the sessionby looking at a use case that discusses howhypothesis testing works and to sum everything up

We'll look at a demo that explains howinferential statistics works

Right? So guys, there'sa lot to cover today

So let's move ahead and takea look at our first topic which is what is data

Now, this is quite a simple question. If I ask any of you "what is data?", you'll say that it's a set of numbers or some sort of documents stored on a computer. Now, data is actually everything.

All right, look around you thereis data everywhere each click on your phone generatesmore data than you know, now this generated dataprovides insights for analysis and helps us makeBetter Business decisions

This is why data isso important to give you a formal definition data refersto facts and statistics

Collected togetherfor reference or analysis

All right

This is the definitionof data in terms of statistics and probability

So as we know datacan be collected it can be measured and analyzed it can be visualized byusing statistical models and graphs now data is dividedinto two major subcategories

Alright, so first wehave qualitative data and quantitative data

These are the two different types of data. Under qualitative data,

we have nominal and ordinal data, and under quantitative data,

We have discreteand continuous data

Now, let's focuson qualitative data

Now this type of data deals withcharacteristics and descriptors that can't be easily measured but can be observed subjectively now qualitative datais further divided into nominal and ordinal data

So nominal data isany sort of data that doesn't haveany order or ranking? Okay

An example of nominaldata is gender

Now

There is no ranking in gender

There's only male femaleor other right? There is no one two, three four or any sortof ordering in gender race is another example of nominal data

Now ordinal data is basically anordered series of information

Okay, let's saythat you went to a restaurant

Okay

Your information is storedin the form of customer ID

All right

So basically you are representedwith a customer ID

Now you would have ratedtheir service as either good or average

All right, that's what ordinal data is, and similarly they'll have a record of other customers who visit the restaurant along with their ratings.

All right

So any data which hassome sort of sequence or some sort of orderto it is known as ordinal data

All right, so guys, this is pretty simpleto understand now, let's move on and lookat quantitative data

So quantitative data basically deals with numbers and things you can measure.

Okay, you can understand that from the word quantitative itself; quantitative is basically quantity.

Right, so it deals with numbers; it deals with anything that you can measure objectively, right? So there are two types of quantitative data: there is discrete and continuous data. Now, discrete data is also known as categorical data, and it can hold a finite number of possible values. Now, the number of students in a class is a finite number.

Now, the number of studentsin a class is a finite Number

All right, you can'thave infinite number of students in a class

Let's say in your fifth grade

There were a hundred studentsin your class

All right, there weren'tinfinite number but there was a definite finite numberof students in your class

Okay, that's discrete data

Next

We have continuous data

Now this type of datacan hold infinite number of possible values

Okay

So when I say the weight of a person is an example of continuous data, what I mean to say is that my weight can be 50 kgs, or it can be 50.1 kgs, or 50.001 kgs, or 50.0001, or 50.023 and so on, right? There are an infinite number of possible values, right? So this is what I mean by continuous data.

All right

This is the difference betweendiscrete and continuous data

And also I would like to mentiona few other things over here

Now, there are a coupleof types of variables as well

We have a discrete variable and we have a continuous variable. A discrete variable is also known as a categorical variable, and it can hold values of different categories.

Let's say that you havea variable called message and there are two typesof values that this variable can hold let's say that your messagecan either be a Spam message or a non spam message

Okay, that's when you calla variable as discrete or categorical variable

All right, because itcan hold values that represent differentcategories of data now continuous variablesare basically variables that can store infinite number of values

So the weight of a personcan be denoted as a continuous variable

All right, let's say there isa variable called weight and it can store infinite numberof possible values

That's why we'll call ita continuous variable

So guys, basically a variable is anything that can store a value, right? So if you associate any sort of data with a variable, then it will become either a discrete variable or a continuous variable.

There is also dependent andindependent type of variables

Now, we won't discuss allwith that in depth because that's pretty understandable

I'm sure all of you know, what is independent variableand dependent variable right? Dependent variable isany variable whose value depends on any otherindependent variable? So guys that muchknowledge I expect or if you do have all right

So now let's move on and look at our next topic, which is: what is statistics? Now, coming to the formal definition, statistics is an area of applied mathematics which is concerned with data collection, analysis, interpretation and presentation. Now, usually when I speak about statistics, people think statistics is all about analysis, but statistics has other parts to it: data collection is also part of statistics, as are data interpretation and presentation.

All of this comes under statistics. Alright, we're going to use statistical methods to visualize data, to collect data and to interpret data.

Alright, so the areaof mathematics deals with understanding how data can be usedto solve complex problems

Okay

Now I'll give youa couple of examples that can be solvedby using statistics

Okay, let's say that your companyhas created a new drug that may cure cancer

How would you conduct a test to confirm the drug's effectiveness? Now, even though this sounds like a biology problem,

it can be solved with statistics.

All right, you will have to create a test which can confirm the effectiveness of the drug. Alright, this is a common problem that can be solved using statistics.

Let me give youanother example you and a friend are at a baseballgame and out of the blue

He offers you a bet that neither team will hita home run in that game

Should you take the bet? All right, here you'd basically assess the probability of whether you'll win or lose.

All right, thisis another problem that comes under statistics

Let's look at another example

The latest sales datahas just come in and your boss wantsyou to prepare a report for management on places where the companycould improve its business

What should you look for, and what should you not look for? Now, this problem involves a lot of data analysis. You'll have to look at the different variables that are causing your business to go down, or you have to look at the few variables

that are increasing the performance of your models and thus growing your business.

Alright, so this involvesa lot of data analysis and the basic idea behind data analysis isto use statistical techniques in order to figureout the relationship between different variables or different componentsin your business

Okay

So now let's move on and look at our next topic which is basicterminologies and statistics

Now before you dive deepinto statistics, it is important that you understandthe basic terminologies used in statistics

The two most importantterminologies in statistics are population and Sample

So throughout the statistics course, or throughout any problem that you're trying to solve with statistics,

you will come across these two words: population and sample. Now, a population is a collection or a set of individuals or objects or

events whose properties are to be analyzed.

Okay

So basically you can referto population as a subject that you're trying to analyzenow a sample is just like the word suggests

It's a subset of the population

So you have to make surethat you choose the sample in such a way that it representsthe entire population

All right

It shouldn't focus on one part of the population; instead,

It should representthe entire population

That's how your sampleshould be chosen

So a well-chosen sample will contain most of the information about a particular population parameter.

Now, you must be wonderinghow can one choose a sample that best representsthe entire population now sampling is a statistical method that deals with the selectionof individual observations within a population

So sampling is performed in order to infer statisticalknowledge about a population

All right, if youwant to understand the different statisticsof a population like the mean the median Median the modeor the standard deviation or the variance of a population

Then you're goingto perform sampling

All right, because it's not reasonable foryou to study a large population and find out the mean medianand everything else

So why is samplingperformed you might ask? What is the point of sampling? We can just studythe entire population now guys, think of a scenario where in you're askedto perform a survey about the eating habitsof teenagers in the US

So at present there areover 42 million teens in the US and this number is growing as we are speakingright now, correct

Is it possible to survey eachof these 42 million individuals about their health? Is it possible? Well, it might be possible but this will takeforever to do now

Obviously, it's not it'snot reasonable to go around knocking each door and asking for what doesyour teenage son eat and all of that right? This is not very reasonable

That's Why sampling is used? It's a method wherein a sampleof the population is studied in order to draw inferencesabout the entire population

So it's basically a shortcut to studying the entire population: instead of taking the entire population and finding out all the answers,

you're just going to take a part of the population that represents the whole, and you're going to perform all your statistical analysis, your inferential statistics, on that small sample.

All right, and that sample basically represents the entire population.

All right, so I'm surehave made this clear to you all what is sampleand what is population now? There are two main typesof sampling techniques that are discussed today

We have probability samplingand non-probability sampling now in this video will only be focusing onprobability sampling techniques because non-probability samplingis not within the scope of this video

All right will only discussthe probability part because we're focusing on Statistics andprobability correct

Now again underprobability sampling

We have three different types

We have randomsampling systematic and stratified sampling

All right, and just to mention the different types of non-probability sampling: we have snowball, quota, judgment and convenience sampling.

All right now guysin this session

I'll only befocusing on probability

So let's move on and look at the different typesof probability sampling

So what is Probability sampling

It is a sampling technique in which samplesfrom a large population are chosen by usingthe theory of probability

All right, so thereare three types of probability sampling

All right first we havethe random sampling now in this method each member of the populationhas an equal chance of being selected in the sample

All right

So each and every individualor each and every object in the populationhas an equal chance of being a A part of the sample

That's what randomsampling is all about

Okay, you are randomly goingto select any individual or any object

So this way each individual has an equal chance of being selected.

Correct? Next

We have systematic sampling now in systematic samplingevery nth record is chosen from the population to bea part of the sample

All right

Now refer to the image that I've shown over here: out of these six groups, every second group is chosen as a sample.

Okay.

So every second record is chosen here, and this is how systematic sampling works.

Okay, you're randomlyselecting the nth record and you're going to addthat to your sample

Next

We have stratified sampling

Now in this type of technique a stratumis used to form samples from a large population

So what is a stratum? A stratum is basically a subset of the population that shares at least one common

characteristic. So let's say that your population has a mix of both male and female; you can create two stratums out of this, one will have only the male subset and the other will have the female subset. Alright, this is what a stratum is: it is basically a subset of the population that shares at least one common characteristic.

All right in our example,it is gender

So after you've created the stratums, you're going to use random sampling on them and you're going to choose a final sample.

So random sampling, meaning that all of the individuals in each of the stratums will have an equal chance of being selected in the sample. Correct.

So Guys, these werethe three different types of sampling techniques
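To make them concrete, here is a small illustrative Python sketch of the three techniques; the population of 100 numbered individuals and the male/female split are made up purely for demonstration:

```python
import random

population = list(range(1, 101))   # 100 individuals, labelled 1..100
random.seed(0)

# Random sampling: every individual has an equal chance of being selected
random_sample = random.sample(population, 10)

# Systematic sampling: every nth record is chosen (here, every 10th)
systematic_sample = population[::10]

# Stratified sampling: form strata first, then sample randomly within each stratum
male_stratum = population[:50]     # pretend the first 50 are male
female_stratum = population[50:]   # and the last 50 are female
stratified_sample = random.sample(male_stratum, 5) + random.sample(female_stratum, 5)

print(random_sample)
print(systematic_sample)
print(stratified_sample)
```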

Now, let's move on and lookat our next topic which is the differenttypes of Statistics

So after this, we'll be looking at the moreadvanced concepts of Statistics, right so far we discussthe basics of Statistics, which is basically what is statisticsthe different sampling

Techniques and theterminologies and statistics

All right

Now we look at the differenttypes of Statistics

So there are two majortypes of Statistics descriptive statistics and inferential statisticsin today's session

We will be discussingboth of these types of Statistics in depth

All right, we'll also be looking at a demo, which I'll be running in the R language, in order to make you understand what exactly descriptive and inferential statistics are. So guys, when we get to the demo, don't worry if you don't have much knowledge; I'm explaining everything from the basic level.

All right, so guys descriptivestatistics is a method which is used to describeand understand the features of specific data set by givinga short summary of the data

Okay, so it is mainly focused upon thecharacteristics of data

It also provides a graphical summary of the data. Now, in order to make you understand what descriptive statistics is, let's

suppose that you want to gift all your classmates a t-shirt,

so you need to study the average shirt size of a student in the classroom.

If you were to use descriptive statistics to study the average shirt size of students in your classroom, then what you would do is record the shirt sizes of all students in the class, and then you would find out the maximum, minimum and average shirt size of the class.

Okay

So coming to inferential statistics: inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population. Okay.

So in simple words, it generalizes a large data setand it applies probability to draw a conclusion

Okay

So it allows youto infer data parameters based on a statistical modelby using sample data

So if we consider the same example of finding the average shirt size of students in a class, in inferential statistics you will take a sample

set of the class, which is basically a few people from the entire class.

All right, you would have already grouped the class into large, medium and small.

All right, in this method you basically build a statistical model and extend it to the entire population of the class.

So guys, there was a briefunderstanding of descriptive and inferential statistics

So that's the differencebetween descriptive and inferential nowin the next section, we will go in depthabout descriptive statistics

All right, so let's discuss more about descriptive statistics.

So like I mentioned earlier descriptivestatistics is a method that is used to describeand understand the features of a specific data set by givingshort summaries about the sample and measures of the data

There are two important measuresin descriptive statistics

We have measureof central tendency, which is also known as measure of center and we havemeasures of variability

This is also known as measures of spread.

So measures of center include mean, median and mode. Now, what are measures of center? Measures of center are statistical measures that represent the summary of a data set. Okay, the three main measures of center are mean, median and mode. Coming to measures of variability, or measures of spread,

we have range, interquartile range, variance and standard deviation.

All right

So now let's discuss each of these measures in a little more

depth, starting with the measures of center.

Now

I'm sure all of you know whatthe mean is mean is basically the measure of the averageof all the values in a sample

Okay, so it's basicallythe average of all the values in a sample

How do you measure the mean Ihope all of you know how the main is measured if there are 10 numbers and you want to find the meanof these 10 numbers

All you have to do is add up all the 10 numbers and divide by 10; n here represents the number of samples in your data set.

All right, since wehave 10 numbers, we're going todivide this by 10

All right, this willgive us the average or the mean so to betterunderstand the measures of central tendency

Let's look at an example

Now the data set over here isbasically the cars data set and it contains a few variables

All right, it hassomething known as cars

It has mileage per gallon, cylinder type, displacement, horsepower and rear axle ratio.

All right, all of these measuresare related to cars

Okay

So what you're goingto do is you're going to use descriptive analysis and you're going to analyzeeach of the variables in the sample data set for the mean standarddeviation median mode and so on

So let's say that you want to find out the mean, or the average horsepower, of the cars among the population of cars.

Like I mentioned earlier, what you'll do is take the average of all the values.

So in this case, we will take the sum of the

horsepower of each car and we'll divide that by the total number of cars.

Okay, that's exactly what I've done herein the calculation part

So this 110 basically represents the horsepower of the first car.

Alright, similarly,

I've just added up all the values of horsepower for each of the cars and I've divided it by 8; now, 8 is basically the number of cars in our data set.

All right, so 103.625 is what our mean, or average horsepower, is. All right.

Now, let's understandwhat median is with an example? Okay

So to define median: the median is basically the central value of the sample set.

All right, you can seethat it is a middle value

So if we want to find out the center value of the mileage per gallon among the population of cars, first what we'll do is arrange the MPG values in ascending or descending order and choose the middle value. Right, in this case, since we have eight values, we have an even count.

So whenever you have evennumber of data points or samples in your data set, then you're goingto take the average of the two middle values

If we had nine values over here

We can easily figureout the middle value and you know choosethat as a median

But since they're even numberof values we're going to take the averageof the two middle values

All right, so 22.8 and 23 are my two middle values, and taking the mean of those two I get 22.9, which is my median.

All right

Lastly let's look athow mode is calculated

So what is mode the value that is most recurrent in the sample set is known asmode or basically the value that occurs most often

Okay, that is known as mode

So let's say that we want to find out the most common type of cylinder among the population of cars; all we have to do is check the value which is repeated the most number of times.

We can see that the cylinderscome in two types

We have cylinder of Type4 and cylinder of type 6, right? So take a look at the data set

You can see that the mostrecurring value is 6 right

Counting them, we have one, two, three, four, five

sixes, and we have one, two, three

fours. Yeah, we have three 4-type and five 6-type

cylinders.

So basically we have three 4-cylinder cars and five 6-cylinder cars.

All right

So our mode is going to be 6, since 6 is more recurrent than 4. So guys, those were the measures of the center, or measures of central tendency.
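A quick sketch of computing the three measures of center in Python. The horsepower values below are hypothetical (chosen only so they average to the 103.625 quoted above), and the cylinder list just mirrors the three-4s-and-five-6s mix from the example:

```python
import statistics

horsepower = [110, 95, 102, 88, 120, 99, 105, 110]   # hypothetical values for 8 cars
cylinders = [4, 6, 6, 4, 6, 4, 6, 6]                  # three 4-cylinder and five 6-cylinder cars

print(statistics.mean(horsepower))     # 103.625
print(statistics.median(horsepower))   # average of the two middle values (even count)
print(statistics.mode(cylinders))      # most recurrent value -> 6
```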

Now, let's move on and lookat the measures of the spread

All right

Now, what is a measure of spread? A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population.

Okay, you can thinkof it as some sort of deviation in the sample

All right

So you measure this with the help of the differentmeasure of spreads

We have rangeinterquartile range variance and standard deviation

Now, range is pretty self-explanatory, right? It is a measure of how spread apart the values in a data set are. The range can be calculated as shown in this formula:

you're basically going to subtract the minimum value in your data set from the maximum value in your data set.

That's how you calculatethe range of the data

Alright, next wehave interquartile range

So before we discussinterquartile range, let's understand

what a quartile is, right?

So quartiles basically tell usabout the spread of a data set by breaking the data setinto different quarters

Okay, just like how the median breaksthe data into two parts

The quartiles will break it

into different quarters. So, to better understand how quartiles and the interquartile range are calculated,

Let's look at a small example

Now, this data set basically represents the marks of a hundred students, ordered from the lowest to the highest scores. Right.

So the quartiles lie in the following ranges: the first quartile, also known as Q1, lies between the 25th and 26th observations.

All right.

So if you look at this, I've highlighted the 25th and the 26th observations.

The way you calculate Q1, or the first quartile, is by taking the average of these two values.

Alright, since both the values are 45, when you add them up and divide by two you'll still get 45. Now, the second quartile, or Q2, is between the 50th and the 51st observations.

So you're going to take the average of 58 and 59, and you will get a value of 58.5.

Now, this is my second quartile. The third quartile, Q3,

is between the 75th and the 76th observations. Here again we'll take the average of the two values, which are the 75th value and the 76th value, right, and you'll get a value of 71.

All right, so guysthis is exactly how you calculatethe different quarters

Now, let's look atwhat is interquartile range

So IQR or the interquartilerange is a measure of variability based on dividinga data set into quartiles

Now, the interquartilerange is Calculated by subtracting the q1 from Q3

So basically Q3 minus Q1 is your IQR; your IQR is Q3 minus Q1. All right.

Now, this is how the quartiles divide the data: each quartile represents a quarter, which is 25% of the data. All right.
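As a sketch, quartiles, the interquartile range and the range can be computed with NumPy like this; the marks below are made-up values, not the 100-student data from the slide:

```python
import numpy as np

marks = np.array([12, 25, 34, 45, 45, 51, 58, 59, 62, 66, 71, 71, 80, 88, 95])

q1 = np.percentile(marks, 25)            # first quartile
q2 = np.percentile(marks, 50)            # second quartile (the median)
q3 = np.percentile(marks, 75)            # third quartile
iqr = q3 - q1                            # interquartile range
data_range = marks.max() - marks.min()   # range = maximum - minimum

print(q1, q2, q3, iqr, data_range)
```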

So guys, I hope all of you are clear on the interquartile range and what quartiles are. Now let's look at variance. Variance is basically a measure that shows how much a random variable differs from its expected value.

Okay.

It's basically the variability in a variable. Now, variance can be calculated by using this formula right here: x basically represents any data point in your data set, n is the total number of data points in your data set, and x bar is basically the mean of the data points.

All right

This is how you calculate variance: variance is basically computed from the squares of the deviations.

Okay.

That's why there is a squared term there. Now,

look at what deviation is: deviation is just the difference between each element and the mean.

Okay, so it can be calculated by using this simple formula, where xi basically represents a data point and mu is the mean of the population. Alright, this is exactly how you calculate the deviation. Now, population variance and sample variance are specific to whether you're calculating the variance of your population data set or of your sample data set. Now, the only difference between population and sample variance is in the data they are computed over.

So the formula for population varianceis pretty explanatory

So X is basicallyeach data point mu is the mean of the population n is the number of samplesin your data set

All right

Now, let's look at sample

variance. Now, sample variance is the average of the squared differences from the mean.

All right, here xi is any data point, or any sample, in your data set, and x bar is the mean of your sample.

All right.

It's not the mean of your population,

it's the mean of your sample. And if you notice, the n here is a lowercase n; it is the number of data points in your sample.

And this is basicallythe difference between sample and population variance

I hope that is clear. Coming to standard deviation: it is the measure of dispersion of a set of data from its mean.

All right, so it's basicallythe deviation from your mean

That's what standard deviationis now to better understand how the measuresof spread are calculated

Let's look at a small use case

So let's say that Daenerys has 20 dragons.

They have the numbers 9, 2, 5, 4 and so on, as shown on the screen. What you have to do is work out the standard deviation. Alright, in order to calculate the standard deviation,

you need to know the mean, right? So first you're going to find out the mean of your sample set.

So how do you calculate the mean? You add all the numbers in your data set and divide by the total number of samples, so you get a value of 7. Then you calculate the RHS of your standard deviation formula.

All right, so fromeach data point you're going to subtract the meanand you're going to square that

All right

So when you do that, you will getthe following result

You'll basically get 4, 25, 4, 9, 25 and so on. Finally, you will just find the mean of the squared differences.

All right.

So your standard deviation will come up to 2.983 once you take the square root.

So guys, this is pretty simple

It's a simplemathematic technique

All you have to do is you haveto substitute the values in the formula

All right

I hope this was clearto all of you
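A small sketch of the same calculation in Python. The 20 dragon numbers below are the ones consistent with the example (mean 7, standard deviation about 2.983), though only the first few appear explicitly in the transcript:

```python
import statistics

dragons = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

mu = statistics.mean(dragons)              # 7
variance = statistics.pvariance(dragons)   # mean of squared deviations -> 8.9
std_dev = statistics.pstdev(dragons)       # square root of the variance -> about 2.983

print(mu, variance, std_dev)
```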

Now let's move on and discuss the next topicwhich is Information Gain and entropy now

This is one of my favoritetopics in statistics

It's very interesting andthis topic is mainly involved in machine learning algorithms, like decision treesand random forest

All right, it's very important for you to know how Information Gain and entropyreally work and why they are so essential in buildingmachine learning models

We focus on the statistic partsof Information Gain and entropy and after that we'll discussAs a use case and see how Information Gain and entropy is usedin decision trees

So for those of you who don't know whata decision tree is it is basically a machinelearning algorithm

You don't have to knowanything about this

I'll explaineverything in depth

So don't worry

Now

Let's look at what exactly entropy and information gain are. So entropy is basically the measure of any sort of uncertainty that is present in the data.

All right, so it can be measured by using this formula.

So here S is the set of all instances in the data set, or all the data items in the data set, N is the number of different classes in your data set, and pi is the event probability.

Now this might seema little confusing to you all but when wego to the use case, you'll understand allof these terms even better

All right, coming

to information gain: as the words suggest, information gain indicates how much information a particular feature or a particular variable gives us about the final outcome.

Okay, it can be measuredby using this formula

So again, here H(S) is the entropy of the whole data set, Sj is the set of instances with the jth value of an attribute A, S is the total number of instances in the data set, v is the set of distinct values of attribute A, H(Sj) is the entropy of the subset of instances, and H(A, S) is the entropy of attribute A. Even though this seems confusing,

I'll clear out the confusion

All right, let's discussa small problem statement where we will understand how Information Gain and entropy is used to studythe significance of a model

So like I said Information Gain and entropy are veryimportant statistical measures that let us understand the significance ofa predictive model

Okay to get a moreclear understanding

Let's look at a use case

All right now suppose weare given a problem statement

All right, the statement isthat you have to predict whether a match can be played or Not by studyingthe weather conditions

So the predictor variables hereare outlook humidity wind day is also a predictor variable

The target variable is basically play. Alright,

the target variable is the variable that you're trying to predict.

Okay

Now the value of the targetvariable will decide whether or not a gamecan be played

All right, so that'swhy The play has two values

It has no and yes, no, meaning that the weatherconditions are not good

And therefore youcannot play the game

Yes, meaning that the weatherconditions are good and suitable for you to play the game

Alright, so that wasa problem statement

I hope the problem statementis clear to all of you now to solve such a problem

We make use of somethingknown as decision trees

So guys thinkof an inverted tree and each branch of the treedenotes some decision

All right, each branch isIs known as the branch node and at each branch node, you're going to takea decision in such a manner that you will get an outcomeat the end of the branch

All right

Now this figurehere basically shows that out of 14 observations9 observations result in a yes, meaning that out of 14 days

The match can be playedonly on nine days

Alright, so here if you see, on day 1, day 2, day 8, day 9 and day 11,

the outlook has been sunny. Alright, so basically we try to cluster the data set depending on the outlook.

So when the Outlook is sunny, this is our data setwhen the Outlook is overcast

This is what we have and when the Outlook isthe rain this is what we have

All right, so when it is sunny we have two yeses and three nos.

Okay, when the outlook is overcast,

we have all four as yes, meaning that on the four days when the outlook was overcast,

We can play the game

All right

Now, when it comes to rain, we have three yeses and two nos.

All right

So if you notice here, the decision is being made bychoosing the Outlook variable as the root node

Okay

So the root node is basically the topmost nodein a decision tree

Now, what we've done here iswe've created a decision tree that starts withthe Outlook node

All right, then you're splittingthe decision tree further depending on other parameterslike Sunny overcast and rain

All right now like we knowthat Outlook has three values

Sunny overcast and brainso let me explain this in a more in-depth manner

Okay

So what you're doinghere is you're making the decision Tree by choosingthe Outlook variable at the root node

The root note isbasically the topmost node in a decision tree

Now the Outlook node has threebranches coming out from it, which is sunnyovercast and rain

So basically Outlook can have three valueseither it can be sunny

It can be overcastor it can be rainy

Okay now these three valuesUse are assigned to the immediate Branchnodes and for each of these values the possibility of play is equalto yes is calculated

So the sunny and the rain brancheswill give you an impure output

Meaning that there is a mixof yes and no right

There are two yeses and three nos here,

and there are three yeses and two nos over here. But when it comes to the overcast value, it results in a hundred percent pure subset.

All right, this shows that the overcast value

Will result in a definiteand certain output

This is exactly what entropyis used to measure

All right, it calculatesthe impurity or the uncertainty

Alright, so the lesserthe uncertainty or the entropy of a variable moresignificant is that variable? So when it comes to overcastthere's literally no impurity in the data set

It is a hundred percent pure subset, right? So we want variables like these in order to build a model.

All right, now, we don't always get lucky, and we don't always find variables that result in pure subsets.

That's why we havethe measure entropy

So the lesser the entropy ofa particular variable the most significant that variablewill be so in a decision tree

The root node is assignedthe best attribute so that the decision treecan predict the most precise outcome meaningthat on the root note

You should have the mostsignificant variable

All right, that's why we've chosen outlook. And now some of you might ask me: why haven't you chosen overcast? Okay, overcast is not a variable;

It is a valueof the Outlook variable

All right

That's why we've chosenoutlook here because it has a hundred percent pure subsetwhich is overcast

All right

Now, the question in your head is: how do I decide which variable or attribute best splits the data? Right now, I looked at the data and told you that here we have a hundred percent pure subset, but what if it's a more complex problem and you're not able to tell which variable will best split the data? So guys, when it comes to decision trees, information gain and entropy will help you understand which variable will best split the data set,

all right, or which variable you have to assign to the root node, because whichever variable is assigned to the root node

will best split the data set, and it has to be the most significant variable.

All right

So how we can do thisis we need to use Information Gain and entropy

So from the total of the 14 instances that we saw, nine of them said yes and 5 of the instances said no, you cannot play on that particular day.

All right

So how do youcalculate the entropy? So this is the formulayou just substitute the values in the formula

So when you substitute the values in the formula, you will get a value of

0.940.

All right

This is the entropy or this is the uncertaintyof the data present in a sample
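A minimal sketch of that entropy calculation, written as a small Python function (the function name is mine):

```python
import math

def entropy(counts):
    """H(S) = -sum(p_i * log2(p_i)) over the class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))   # 9 yes, 5 no -> roughly 0.940
```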

Now in order to ensure that we choose the best variablefor the root node

Let us look at allthe possible combinations that you can useon the root node

Okay, so these are Allthe possible combinations you can either haveOutlook you can have windy humidity or temperature

Okay, these are four variablesand you can have any one of these variablesas your root node

But how do you select which variable bestfits the root node? That's what we are goingto see by using Information Gain and entropy

So guys now the task at handis to find the information gain for each of these attributes

All right

So for outlook, for windy, for humidity and for temperature, we're going to find out the information

gain. Right, now a point to remember is that the variable that results in the highest information gain must be chosen, because it will give us the most precise output information.

All right

So the information gain forattribute windy will calculate that first here

We have six instances of trueand eight instances of false

Okay

So when you substitute all the values in the formula, you will get a value of 0.048.

So we get a value of

0.048 for it.

Now

This is a very low valuefor Information Gain

All right, so the information that you're going to get fromWindy attribute is pretty low

So let's calculatethe information gain of attribute Outlook

All right, so from the total of 14 instances, we have five instances which are sunny, four instances which are overcast and five instances which are rainy.

All right, for sunny

we have two yeses and three nos, for overcast we have all four as yes, and for rainy we have three yeses and two nos.

Okay

So when you calculate the information gain of the outlook variable, you'll get a value of 0.247. Now compare this to the information gain of the windy attribute:

This value isactually pretty good

Right we have zero point 2 4 7which is a pretty good value for Information Gain

Now let's look at the information gain of the attribute humidity. Now, over here

we have seven instances which say high and seven instances which say normal.

Right, and under the high branch node

we have three instances which say yes, and the remaining four instances say no. Similarly, under the normal branch

we have six instances which say yes and one instance which says no.

All right

So when you calculate the information gain for the humidity variable, you're going to get a value of

0.151.

Now

This is alsoa pretty decent value, but when you compare itto the Information Gain, Of the attribute Outlook itis less right now

Let's look at the informationgain of attribute temperature

All right, so what values can the temperature hold?

Basically, the temperature attribute can hold hot, mild and cool.

Okay under hot

we have two instances which say yes and two instances which say no. Under mild,

we have four instances of yes and two instances of no, and under cool we have three instances of yes and one instance of no.

All right

When you calculate the information gain for this attribute, you will get a value of 0.029, which is again very low.

So what you can summarize from here is: if we look at the information gain for each of these variables, we'll see that for outlook

we have the maximum gain.

All right, we have 0.247, which is the highest information gain value, and you must always choose the variable with the highest information gain to split the data at the root node.

So that's why we assign the outlook variable at the root node.
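A sketch of the whole comparison in Python, reusing the entropy idea from above; the per-value yes/no counts follow the weather example, and the helper names are mine:

```python
import math

def entropy(counts):
    """H(S) = -sum(p_i * log2(p_i)) over the class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, splits):
    """IG(S, A) = H(S) - sum(|S_j| / |S| * H(S_j)) over the subsets S_j of attribute A."""
    total = sum(parent_counts)
    remainder = sum(sum(subset) / total * entropy(subset) for subset in splits)
    return entropy(parent_counts) - remainder

parent = [9, 5]  # 9 yes, 5 no overall
print(information_gain(parent, [[2, 3], [4, 0], [3, 2]]))  # outlook      -> about 0.247
print(information_gain(parent, [[3, 3], [6, 2]]))          # windy        -> about 0.048
print(information_gain(parent, [[3, 4], [6, 1]]))          # humidity     -> about 0.151
print(information_gain(parent, [[2, 2], [4, 2], [3, 1]]))  # temperature  -> about 0.029
```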

All right, so guys

I hope this use case was clear. If any of you have doubts,

please keep commenting those doubts. Now let's move on and look at what exactly a confusion matrix is. The confusion matrix is the last topic for descriptive statistics. Right, after this

I'll be running a short demo where I'll show you how you can calculate mean, median, mode, standard deviation, variance and all of those values by using R. Okay.

So let's talk aboutconfusion Matrix now guys

What is the confusion Matrixnow don't get confused

This is not a complex topic. Now, a confusion matrix is a matrix that is often used to describe the performance of a model.

All right, and thisis specifically used for classification models or a classifier and what it does is itwill calculate the accuracy or it will calculate theperformance of your classifier by comparing your actual resultsand Your predicted results

All right

So this is what it looks like: true positive, true negative and all of that.

Now this is a little confusing.

I'll get back to what exactly true positive, true negative and all of this stand for. For now,

Let's look at an example andlet's try and understand what exactly confusion Matrix is

So guys, I have made sure that I put examples after each and every topic, because it's important you understand the practical part of statistics.

All right, statistics has very little to do with theory alone; you need to understand how calculations are done in statistics.

Okay

So here what I've done is nowlet's look at a small use case

Okay, let's consider that you're given data about 165 patients, out of which 105 patients have a disease and the remaining 60 patients don't have a disease.

Okay

So what you're going to do is you will build a classifier that predicts by using these 165 observations.

You'll feed all of these 165 observations to your classifier, and it will predict the output every time a new patient's details are fed to the classifier. Right, now out of these 165 cases,

Let's say thatthe classifier predicted

yes 110 times and no 55 times.

Alright, so yes basically stands for yes, the person has a disease, and no stands for no, the person does not have a disease.

All right, that'spretty self-explanatory

But yeah, so it predicted 110 times that the patient has a disease and 55 times that no, the patient doesn't have a disease.

However, in reality only 105 patients in the sample have the disease and 60 patients do not have the disease, right? So how do you calculate the accuracy of your model? You basically build the confusion matrix.

All right.

This is how the matrix looks: the total basically denotes the total number of observations that you have, which is 165 in our case; actual denotes the actual values in the data set, and predicted denotes the values predicted by the classifier.

So the actual value is no here and the predictedvalue is no here

So your classifierwas correctly able to classify 50 cases as no

All right, since both of these are no, it was correctly able to classify those 50 cases. But 10 of these cases it incorrectly classified, meaning that your actual value here is no but your classifier predicted it as yes; that's why this 10 is over here. Similarly, it wrongly predicted that five patients do not have the disease whereas they actually did have the disease, and it correctly predicted 100 patients which have the disease.

All right

I know this isa little bit confusing

But if you look at these values: no-no 50 means that it correctly predicted 50 values; no-yes means that it wrongly predicted yes for values where it was supposed to predict no.

All right

Now what exactly is this true positive, true negative and all of that? I'll tell you what exactly it is.

So true positives are the cases in which we predicted a yes and they actually do have the disease; in our matrix that is the 100 cases.

Similarly, true negatives are where we predicted no and they don't have the disease, meaning that this is correct; that is the 50 cases.

False positives are where we predicted yes, but they do not actually have the disease; that is the 10 cases, and this is also known as a type 1 error. False negatives are where we predicted no, but they actually do have the disease; that is the 5 cases, also known as a type 2 error.

So guys, basically true positives and true negatives are the correct classifications.

All right

So this was confusion Matrix and I hope this conceptis clear again guys
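If you want to check these numbers yourself, here is a small Python sketch using the counts from the patient example above; the scikit-learn call at the end is optional and only an alternative way to build the same matrix from label lists.

```python
# Counts from the 165-patient example above (an illustrative example, not real data)
TP = 100   # actual: disease,    predicted: disease
TN = 50    # actual: no disease, predicted: no disease
FP = 10    # actual: no disease, predicted: disease     (type 1 error)
FN = 5     # actual: disease,    predicted: no disease  (type 2 error)

total = TP + TN + FP + FN            # 165 observations
accuracy = (TP + TN) / total         # correct predictions / all predictions
print(f"Accuracy: {accuracy:.3f}")   # about 0.909

# The same matrix computed from label lists with scikit-learn
from sklearn.metrics import confusion_matrix
actual    = ["yes"] * 105 + ["no"] * 60
predicted = ["yes"] * 100 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 50
print(confusion_matrix(actual, predicted, labels=["no", "yes"]))
```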

If you have doubts, please comment your doubtin the comment section

So guys, that was the entire descriptive statistics module, and now we will discuss probability.

Okay

So before we understandwhat exactly probability is, let me clear out a verycommon misconception people often tend to askme this question

What is the relationship betweenstatistics and probability? So probability and statisticsare related fields

All right

So probability isa mathematical method used for statistical analysis

Therefore we can say that probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events.

So they're very interconnected fields: probability makes use of statistics and statistics makes use of probability.

So that is the relationshipbetween statistics and probability

Now, let's understandwhat exactly is probability

So probability is the measure of how likely an event will occur. To be more precise, it is the ratio of desired outcomes to the total outcomes.

Now, the probabilities of all outcomes always sum up to 1, and a probability cannot go beyond one.

Okay

So your probability can be 0, or it can be 1, or it can be in the form of decimals like 0.52 or 0.55, or 0.5, 0.7, 0.9.

But its value will always stay within the range 0 to 1.

Okay, the famous example of probability is the rolling a dice example.

So when you roll a dice you get six possible outcomes, right? You get the one, two, three, four, five, six faces of the dice, and each possibility only has one outcome.

So what is the probability that on rolling a dice you will get 3? The probability is 1/6, right? Because there's only one face which has the number 3 on it out of six faces.

There's only one face which has the number three.

So the probability of getting 3 when you roll a dice is 1/6. Similarly, if you want to find the probability of getting the number 5, again the probability is going to be 1/6.

All right, so allof this will sum up to 1

All right, so guys this isexactly what probability is
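As a quick sanity check, here is a tiny simulation sketch (not part of the course code) that estimates the same probability by relative frequency:

```python
import random

rolls = [random.randint(1, 6) for _ in range(100_000)]   # simulate rolling a fair dice
print(rolls.count(3) / len(rolls))                       # relative frequency of a 3, close to 1/6
print(sum(rolls.count(face) for face in range(1, 7)) / len(rolls))  # all outcomes together sum to 1
```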

It's a very simple concept; we all learnt it from 8th standard onwards. Right, now

Let's understand thedifferent terminologies that are related to probability

Now the three terminologiesthat you often come across when We talk about probability

We have something knownas the random experiment

Okay, it's basicallyan experiment or a process for which the outcomes cannot bepredicted with certainty

All right

That's why you use probability

You're going to use probability in order to predict the outcome with some sort of certainty. The sample space is the entire possible set of outcomes of a random experiment, and an event is one or more outcomes of an experiment.

So if you consider the example of rolling a dice,

Now

Let's say that you want to find out the probability of getting a two when you roll the dice.

Okay

So finding this probabilityis the random experiment the sample space is basicallyyour entire possibility

Okay

So one, two, three, four, five, six faces are there, and out of that you need to find the probability of getting a 2, right?

So all the possible outcomes will basically representyour sample space

Okay

So 1 to 6 are all your possible outcomes; this represents the sample space. An event is one or more outcomes of an experiment.

So in this case my event is to get a two when I roll a dice, right? So my event is the probability of getting a two when I roll a dice.

So guys, this is basically whatrandom experiment sample space and event really means alright

Now, let's discussthe different types of events

There are two types of eventsthat you should know about there is disjoint and non disjointevents disjoint events

These are events that do not haveany common outcome

For example, if you draw a single cardfrom a deck of cards, it cannot be a kingand a queen correct

It can either be kingor it can be Queen

Now a non disjointevents are events that have common outcomes

For example, a student can get a hundred marks in statistics and a hundred marks in probability.

All right, and also the outcome of a ball delivered can be a no-ball and it can be a six, right?

So this is what non-disjoint events are.

These are very simpleto understand right now

Let's move on and lookat the different types of probability distribution

All right, I'll be discussing the three main probabilitydistribution functions

I'll be talkingabout probability density function normal distributionand Central limit theorem

Okay probability densityfunction also known as PDF is concerned with the relative likelihood fora continuous random variable

To take on a given value

All right

So the PDF gives the probability of a variable that liesbetween the range A and B

So basically what you're tryingto do is you're going to try and find the probabilityof a continuous random variable over a specified range

Okay

Now this graph denotes the PDFof a continuous variable

Now, this graph is also known as the bell curve, right? It's famously called the bell curve because of its shape, and there are three important properties that you need to know about a probability density function.

Now the graph of a PDFwill be continuous over a range

This is because you're finding the probability that a continuous variable lies between the ranges A and B, right? The second property is that the area bounded by the curve of a density function and the x-axis is equal to 1; basically the area below the curve is equal to 1, because it denotes probability, and the probability cannot range beyond one. It has to be between 0 and 1. Property number three is that the probability that a random variable assumes a value between A and B is equal to the area under the PDF bounded by A and B.

Okay

Now what this means is that the probability valueis denoted by the area of the graph

All right, so whatever value you get here, which is basically the area, is the probability that a random variable will lie between the range A and B.

All right, so I hope all of you have understood the probability density function; it's basically the probability of finding the value of a continuous random variable between the range A and B.

All right

Now, let's look at our next distribution, which is the normal distribution. The normal distribution, which is also known as the Gaussian distribution, is a probability distribution that denotes the symmetric property of the mean, meaning that the idea behind this function is that data near the mean occurs more frequently than data away from the mean.

So what it means to say is that the data around the meanrepresents the entire data set

Okay

So if you just takea sample of data around the mean it can representthe entire data set now similar to the probability densityfunction the normal distribution appears as a bell curve

All right

Now when it comesto normal distribution, there are two important factors

All right, we have the mean of the population and the standard deviation.

Okay, so the mean of the graph determines the location of the center of the graph, and the standard deviation determines the height and spread of the graph.

Okay

So if the standard deviationis large the curve is going to look something like this

All right, it'll beshort and wide and if the standard deviationis small the curve is tall and narrow

All right

So this was itabout normal distribution
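A minimal sketch of this shape effect, assuming SciPy and Matplotlib are available (the specific means and standard deviations are just illustrative choices, not values from the course):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 500)
for sd in (1, 2, 4):                      # small vs large standard deviation
    y = norm.pdf(x, loc=0, scale=sd)      # bell curve centred on the mean (0 here)
    print(np.trapz(y, x))                 # area under each curve is ~1
    plt.plot(x, y, label=f"mean=0, sd={sd}")

plt.legend()
plt.title("Larger sd: shorter and wider; smaller sd: taller and narrower")
plt.show()
```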

Now, let's lookat the central limit theorem

Now the centrallimit theorem states that the sampling distribution of the mean of any independentrandom variable will be normal or nearly normal if the sample sizeis large enough now, that's a little confusing

Okay

Let me break it down foryou now in simple terms if we had a large population and we divided itinto many samples

Then the mean of all the samples from the populationwill be almost equal to the mean of the entirepopulation right meaning that each of the sampleis normally distributed

Right

So if you compare the meanof each of the sample, it will almost be equalto the mean of the population

Right? So this graph basically gives a clearer picture of the central limit theorem; right, you can see each sample here, and the mean of each sample is almost along the same line, right? Okay.

So this is exactly what the central limit theoremStates now the accuracy or the resemblance to the normal distributiondepends on two main factors

Right

So the first is the numberof sample points that you consider

All right, and the second is the shape of the underlying population.

Now the shape obviously dependson the standard deviation and the meanof a sample, correct

So guys the centrallimit theorem basically states that each samplewill be normally distributed in such a way that the mean of each samplewill coincide with the mean of the actual population

All right in short terms

That's what centrallimit theorem States

Alright, and this holds true only for a large data set. It is mostly for a small data set that there are more deviations when compared to a large data set, because of the scaling factor, right? The smallest deviation in a small data set will change the value very drastically, but in a large data set a small deviation will not matter at all.
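A short simulation sketch of the idea, assuming NumPy; the exponential population and the sample size of 50 are arbitrary choices made purely to illustrate the theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # a skewed, clearly non-normal population

sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

print(population.mean())        # population mean, about 2.0
print(np.mean(sample_means))    # mean of the sample means, also about 2.0
# A histogram of sample_means looks close to a bell curve even though the
# population itself is skewed -- which is what the central limit theorem states.
```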

Now, let's move on and lookat our next topic which is the differenttypes of probability

Now, this is an important topic, because most of your problems can be solved by understanding which type of probability should be used to solve the problem, right? So we have three important types of probability.

We have marginal jointand conditional probability

So let's discuss each of these now the probability ofan event occurring unconditioned on any other event is knownas marginal probability or unconditional probability

So let's say that you wantto find the probability that a card drawn is a heart

All right

So if you want to find the probability that a card drawn is a heart, the probability will be 13/52, since there are 13 hearts in a deck and there are 52 cards in a deck in total.

So your marginal probability will be 13/52.

That's aboutmarginal probability

Now, let's understand

What is joint probability

Now joint probability is a measure of two eventshappening at the same time

Okay

Let's say that the twoevents are A and B

So the probability of events A and B occurring is the intersection of A and B.

So for example, if you want tofind the probability that a card is a four and a redthat would be joint probability

All right, becauseyou're finding a card that is 4 and the cardhas to be red in color

So for the answer, this will be 2/52, because we have one four in hearts and we have one four in diamonds, correct?

So both of these are red in color; therefore our probability is 2/52, and if you reduce it further, it is 1/26, right? So this is what joint probability is all about. Moving on,

Let's look at what exactlyconditional probability is

So if the probability of an event or an outcomeis based on the occurrence of a previous eventor an outcome, then you call it asa conditional probability

Okay

So the conditional probabilityof an event B is the probability that the event will occur given that an event a hasalready occurred, right? So if a and b aredependent events, then the expression for conditional probabilityis given by this

Now, the first term on the left-hand side, which is P(B|A), is basically the probability of event B occurring given that event A has already occurred.

All right.

So like I said, if A and B are dependent events, then this is the expression. But if A and B are independent events, then the expression for conditional probability is like this, right? So guys, P(A) and P(B) are obviously the probability of A and the probability of B.

Let's move on now in order to understand conditionalprobability joint probability and marginal probability

Let's look at a small use case

Okay, now basically we're going to take a data set which examines the salary package and the training undergone by candidates.

Okay.

Now in this there are 60 candidates without training and 45 candidates who have enrolled for Edureka's training.

Right

Now the task here is you haveto assess the training with a salary package

Okay, let's look at thisin a little more depth

So in total, we have 105 candidates, out of which 60 of them have not enrolled for Edureka's training and 45 of them have enrolled for Edureka's training. This is a small survey that was conducted, and this is the rating of the package or the salary that they got, right? So if you read through the data, you can understand there were five candidates without Edureka's training who got a very poor salary package.

Okay.

Similarly, there are 30 candidates with Edureka training who got a good package, right? So guys, basically you're comparing the salary package of a person depending on whether or not they've enrolled for Edureka's training, right? This is our data set.

Now, let's look at our problem statement: find the probability that a candidate has undergone Edureka's training. Quite simple; which type of probability is this? This is marginal probability, right? So the probability that a candidate has undergone Edureka's training is obviously 45 divided by 105, since 45 is the number of candidates with Edureka training and 105 is the total number of candidates.

So you get a value of approximately 0.43. All right, that's the probability of a candidate that has undergone Edureka's training. Next question: find the probability that a candidate has attended Edureka's training and also has a good package.

Now

This is obviously a joint probability problem, right? So how do you calculate this? Now, since our table is quite well formatted, we can directly find that the people who have got a good package along with Edureka training are 30, right? So out of 105 people, 30 people have Edureka training and a good package, right? They are specifically asking for people with Edureka training.

Remember that, right?

The question is: find the probability that a candidate has attended Edureka's training and also has a good package.

All right, so we need to consider two factors, that is, a candidate who has attended Edureka's training and who has a good package.

So clearly that number is 30; 30 divided by the total number of candidates, which is 105, right? So here you get the answer clearly. Next,

We have find the probability that a candidate hasa good package given that he has notundergone training

Okay

Now this is clearly conditional probability, because here you're defining a condition: you're saying that you want to find the probability of a candidate who has a good package given that he has not undergone any training, right? The condition is that he has not undergone any training.

All right.

So the number of people who have not undergone training is 60, and out of that, five of them have got a good package. So that's why this is 5 by 60 and not 5 by 105, because here they have clearly mentioned "has a good package given that he has not undergone training".

So you have to only consider people who have not undergone training, right? So only five people who have not undergone training have got a good package, right? So 5 divided by 60, you get a probability of around 0.08, which is pretty low, right? Okay.
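A minimal sketch of these three calculations, using only the counts from the survey above:

```python
# Counts from the survey above: 105 candidates in total
total = 105
edureka_trained = 45          # candidates with Edureka training
not_trained = 60              # candidates without training
trained_and_good = 30         # Edureka-trained candidates with a good package
untrained_and_good = 5        # untrained candidates with a good package

p_trained = edureka_trained / total                         # marginal: 45/105 ~ 0.429
p_trained_and_good = trained_and_good / total               # joint:    30/105 ~ 0.286
p_good_given_untrained = untrained_and_good / not_trained   # conditional: 5/60 ~ 0.083

print(round(p_trained, 2), round(p_trained_and_good, 2), round(p_good_given_untrained, 2))
```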

So this was all about the different types of probability. Now, let's move on and look at our last topic in probability, which is the Bayes theorem.

Now guys, the Bayes theorem is a very important concept when it comes to statistics and probability.

It is majorly used in the Naive Bayes algorithm.

For those of you who aren't aware,

Naive Bayes is a supervised learning classification algorithm and it is mainly used in Gmail spam filtering, right? A lot of you might have noticed that if you open up Gmail, you'll see that you have a folder called spam; that is carried out through machine learning, and the algorithm used there is Naive Bayes, right? So now let's discuss what exactly the Bayes theorem is and what it denotes. The Bayes theorem is used to show the relation between one conditional probability and its inverse.

All right

Basically it's nothingbut the probability of an event occurring basedon prior knowledge of conditions that might be relatedto the same event

Okay

So mathematically, the Bayes theorem is represented like this: P(A|B) = P(B|A) × P(A) / P(B).

As shown in this equation, the term on the left-hand side, P(A|B), is what is known as the posterior; it is the probability of occurrence of A given an event B, right?

The first term on the right-hand side, P(B|A), is referred to as the likelihood, which measures the probability of occurrence of B given an event A. Now, P(A) is also known as the prior, which refers to the actual probability distribution of A, and P(B) is again the probability of B, right?

This is the Bayes theorem, and in order to better understand the Bayes theorem,

Let's look at a small example

Let's say that we have three bowls: we have bowl A, bowl B and bowl C.

Okay, bowl A contains two blue balls and four red balls, bowl B contains eight blue balls and four red balls,

and bowl C

contains one blue ball and three red balls. Now, if we draw one ball from each bowl, what is the probability of drawing a blue ball from bowl A if we know that we drew exactly a total of two blue balls, right? If you didn't understand the question, please read it again; I shall pause for a second or two.

Right

So I hope all of youhave understood the question

Okay

Now what I'm going to dois I'm going to draw a blueprint for you and tell you how exactlyto solve the problem

But I want you all to giveme the solution to this problem, right? I'll draw a blueprint

I'll tell youwhat exactly the steps are but I want you to comeup with a solution on your own right the formulais also given to you

Everything is given to you

All you have to do is come upwith the final answer

Right? Let's look at how youcan solve this problem

So first of all, what we will do is this: let A be the event of picking a blue ball from bowl A, and let X be the event of picking exactly two blue balls, right? Because these are the two events that we need to calculate the probability of. Now, there are two probabilities that you need to consider here.

One is the event of picking a blue ball from bowl A, and the other is the event of picking exactly two blue balls.

Okay

So these two are represented by A and X respectively, and so what we want is the probability of occurrence of event A given X, which means: given that we're picking exactly two blue balls,

what is the probability that we are picking a blue ball from bowl A? So by the definition of conditional probability, this is exactly what our equation will look like.

Correct

This is basically the occurrence of event A given event X, this is the probability of A and X, and this is the probability of X alone, correct? In other words, P(A|X) = P(A and X) / P(X).

What we need to do is we need to find these two probabilities, which are the probability of A and X occurring together and the probability of X.

Okay

This is the entire solution

So how do you find the probability of X? This you can do in three ways: blue from A and B, blue from A and C, or blue from B and C. So first let's find the probability of X; X basically represents the event of picking exactly two blue balls.

Right

So these are the three waysin which it is possible

So in the first case, you'll pick one blue ball from bowl A and one from bowl B; in the second case,

you can pick one blue ball from bowl A and another blue ball from bowl C; and in the third case,

you can pick a blue ball from bowl B and a blue ball from bowl C.

Right? These are the three waysin which it is possible

So you need to find the probability of each of these. Step two is that you need to find the probability of A and X occurring together.

This is the sum of terms one and two.

Okay, this is because in both of these events you're picking a blue ball from bowl A, correct? So go ahead, find out this probability and let me know your answer in the comment section.

All right

We'll see if you getthe answer right? I gave you the entiresolution to this

All you have to do is substitute the values, right? If you want a second or two, I'm going to pause on the screen so that you can go through this more clearly. Remember that you need to calculate two probabilities:

the first probability that you need to calculate is the event of picking a blue ball from bowl A given that you're picking exactly two blue balls.

Okay, the second probability you need to calculate is the event of picking exactly two blue balls.

All right

These are the two probabilities you need to calculate, so remember that, and this is the solution.

All right, so guys, make sure youmention your answers in the comment section for now
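If you would like to check your own answer, here is a minimal sketch that follows the blueprint above using Python's fractions module (the bowl contents are the ones given in the example):

```python
from fractions import Fraction as F

# Probability of drawing a blue ball from each bowl (counts from the example)
pA, pB, pC = F(2, 6), F(8, 12), F(1, 4)     # bowl A, bowl B, bowl C

# P(X): exactly two blue balls across the three draws (three disjoint ways)
p_x = (pA * pB * (1 - pC)       # blue from A and B, red from C
       + pA * (1 - pB) * pC     # blue from A and C, red from B
       + (1 - pA) * pB * pC)    # blue from B and C, red from A

# P(A and X): the two ways that include a blue ball from bowl A
p_a_and_x = pA * pB * (1 - pC) + pA * (1 - pB) * pC

print(p_a_and_x / p_x)   # P(A | X) -- compare this with the answer you worked out
```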

Let's move on to our next topic, which is inferential statistics.

So guys, we just completed theprobability module right now

We will discussinferential statistics, which is the secondtype of Statistics

We discussed descriptivestatistics earlier

All right

So like I mentioned earlierinferential statistics also known as statistical inferenceis a branch of Statistics that deals with forminginferences and predictions about a population basedon a sample of data

Taken from the population

All right, and the question you should ask is: how does one form inferences or predictions from a sample? The answer is you use point estimation.

Okay, now you must be wondering what point estimation is. Point estimation is concerned with the use of the sample data to measure a single value which serves as an approximate value or the best estimate of an unknown population parameter.

That's a little confusing

Let me break it down for you. For example, in order to calculate the mean of a huge population,

what we do is we first draw out a sample of the population and then we find the sample mean. The sample mean is then used to estimate the population mean. This is basically a point estimate: you're estimating the value of one of the parameters of the population, right? Basically the mean; you're trying to estimate the value of the mean.

This is what point estimation is. There are two main terms in point estimation: there's something known as the estimator and something known as the estimate. The estimator is a function of the sample that is used to find out the estimate.

Alright in this example

It's basically the samplemean right so a function that calculates the samplemean is known as the estimator and the realized value of the estimator isthe estimate right? So I hope Pointestimation is clear

Now, how do youfind the estimates? There are four common waysin which you can do this

The first one is the method of moments. Here, what you do is you form an equation from the sample data set and then you analyze the similar equation for the population data set as well, like the population mean, population variance and so on.

So in simple terms, what you're doing is you're taking down some known facts about the population and you're extending those ideas to the sample.

Alright, once you do that, you can analyze the sampleand estimate more essential or more complexvalues right next

We have maximum likelihood

This method basically usesa model to estimate a value

All right

Now a maximum likelihoodis majorly based on probability

So there's a lot of probabilityinvolved in this method next

We have the Bayes estimator; this works by minimizing the errors or the average risk.

Okay, the Bayes estimator has a lot to do with the Bayes theorem.

All right, let'snot get into the depth of these estimation methods

Finally

We have the best unbiased estimators; in this method, there are several unbiased estimators that can be used to approximate a parameter.

Okay

So Guys these werea couple of methods that are usedto find the estimate but the most well-known methodto find the estimate is known as the interval estimation

Okay

This is one of the most importantestimation methods right? This is where confidenceinterval also comes into the picture right apartfrom interval estimation

We also have somethingknown as margin of error

So I'll be discussingall of this

In the upcoming slides

So first let's understand

What is interval estimate? Okay, an intervalor range of values, which are used to estimate apopulation parameter is known as an interval estimation, right? That's very understandable

Basically what they're trying tosee is you're going to estimate the value of a parameter

Let's say you're trying to findthe mean of a population

What you're going to do isyou're going to build a range and your value will lie inthat range or in that interval

Alright, so this way your outputis going to be more accurate because you've not predicteda point estimation instead

You have estimated an interval within which your valuemight occur, right? Okay

Now this image clearly shows how Point estimate and intervalestimate or different so guys interval estimateis obviously more accurate because you are not justfocusing on a particular value or a particular point in order to predictthe probability instead

You're saying thatthe value might be within this range betweenthe lower confidence limit and the upper confidence limit

All right, this is denotesthe range or the interval

Okay, if you're still confusedabout interval estimation, let me give you a small example if I stated that I will take30 minutes to reach the theater

This is knownas Point estimation

Okay, but if I stated that I will take between 45 minutes to an hour to reach the theater, this is an example of interval estimation.

All right

I hope it's clear

Now now interval estimationgives rise to two important statistical terminologies oneis known as confidence interval and the other is knownas margin of error

All right

So it's important that you pay attention to both of these terminologies. The confidence interval is one of the most significant measures that are used to check how good a machine learning model is.

All right

So what is confidence intervalconfidence interval is the measure of your confidence that the intervalestimated contains the population parameteror the population mean or any of those parametersright now statisticians use confidence intervalto describe the amount of uncertainty associated with the sample estimate ofa population parameter now guys, this is a lot of definition

Let me just make youunderstand confidence interval with a small example

Okay

Let's say that you perform a survey and you survey a group of cat owners

to see how many cans of cat food they purchase in one year.

Okay, you test your statistics at the 99 percent confidence level and you get a confidence interval of (100, 200). This means that you think that the cat owners buy between 100 to 200 cans in a year, and also, since the confidence level is 99%, it shows that you're very confident that the results are correct.

Okay

I hope all of youare clear with that

Alright, so your confidence interval here will be 100 to 200, and your confidence level will be 99%. Right? That's the difference between confidence interval and confidence level. So within your confidence interval your value is going to lie, and your confidence level will show how confident you are about your estimation, right? I hope that was clear.

Let's look at margin of error

Now, the margin of error for a given level of confidence is the greatest possible distance between the point estimate and the value of the parameter that it is estimating. You can say that it is a deviation from the actual point estimate, right?

Now

The margin of error can be calculated using this formula: E = z_c × σ / √n. Here z_c denotes the critical value for the confidence level, and this is multiplied by the standard deviation divided by the root of the sample size.

All right, n is basically the sample size. Now, let's understand how you can estimate the confidence intervals.

So guys the level of confidence which is denoted byC is the probability that the interval estimatecontains a population parameter

Let's say that you're tryingto estimate the mean

All right

So the level of confidenceis the probability that the intervalestimate contains the population parameter

So this intervalbetween minus Z and z or the area beneath this curveis nothing but the probability that the interval estimatecontains a population parameter

Right? It should basically contain the value that you are predicting.

Now

These are knownas critical values

This is basically your lower limit and your higher limit of the confidence level.

Also, there's somethingknown as the Z score now

This score can be calculated by using the standard normal table.

All right, if you lookit up anywhere on Google you'll find the z-score table or the standard normaltable to understand how this is done

Let's look at a small example

Okay, let's say that the level of confidence is 90%. This means that you are 90% confident that the interval contains the population mean.

Okay, so the remaining 10% out of the hundred percent

is equally distributed on these tail regions.

Okay, so you have 0.05 here and 0.05 over here, right? So on either side of C you will distribute the other leftover percentage. Now these z-scores are calculated from the table, as I mentioned before.

All right, 1.645 is calculated from the standard normal table.

Okay, so guys, that is how you estimate the level of confidence. So to sum it up,

Let me tell you the steps thatare involved in constructing a confidence interval first

You would start by identifyinga sample statistic

Okay

This is the statistic that you will use to estimatea population parameter

This can be anything, like the mean of the sample. Next, you will select a confidence level; the confidence level describes the uncertainty of the sampling method. Right after that, you'll find something known as the margin of error; we discussed margin of error earlier.

So you find this basedon the equation that I explainedin the previous slide, then you'll finally specifythe confidence interval

All right

Now, let's look at a problem statement to better understand this concept. A random sample of 32 textbook prices is taken from a local college bookstore.

The mean of the sample is so-and-so, and the sample standard deviation is this. Use a 95% confidence level and find the margin of error for the mean price of all textbooks in the bookstore.

Okay

Now, this is a verystraightforward question

If you want you can readthe question again

All you have to do is you haveto just substitute the values into the equation

All right, so guys, we know the formula for margin of error; you take the z-score from the table.

After that we have the standard deviation, that is 23.44, and n stands for the number of samples; here the number of samples is 32, basically 32 textbooks.

So approximately your margin of error is going to be around 8.12. This is a pretty simple question.

All right
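A minimal sketch of the same substitution in Python, using the 1.96 critical value for a 95% confidence level and the numbers from the example:

```python
from math import sqrt

z_c = 1.96        # critical value for a 95% confidence level
s = 23.44         # sample standard deviation from the bookstore example
n = 32            # sample size (32 textbook prices)

margin_of_error = z_c * s / sqrt(n)
print(round(margin_of_error, 2))   # about 8.12
```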

I hope all of youunderstood this now that you know, the idea behindconfidence interval

Let's move ahead to one of the most important topics in statistical inference, which is hypothesis testing, right? So basically, statisticians use hypothesis testing to formally check whether a hypothesis is accepted or rejected.

Okay, hypothesis

Testing is an inferentialstatistical technique used to determine whether there is enough evidencein a data sample to infer that a certain condition holdstrue for an entire population

So to understand the characteristicsof a general population, we take a random sample, and we analyze the propertiesof the sample right we test

Whether or not the identifiedconclusion represents the population accurately and finally we interpretthe results now whether or not to acceptthe hypothesis depends upon the percentage valuethat we get from the hypothesis

Okay, so tobetter understand this, let's look at a smallexample before that

There are a few stepsthat are followed in hypothesis testing you beginby stating the null and the alternative hypothesis

All right

I'll tell you what exactly these terms are. Then you formulate an analysis plan; after that you analyze the sample data, and finally you can interpret the results. Right, now to understand the entire hypothesis testing,

we'll look at a good example.

Okay, now consider four boys: Nick, John, Bob and Harry. These boys were caught bunking a class and they were asked to stay back at school and clean the classroom as a punishment, right? So what John did is he decided that the four of them would take turns to clean the classroom.

He came up with a plan of writing each of their names on chits and putting them in a bowl. Now every day they had to pick a name from the bowl, and that person had to clean the class, right? That sounds fair enough. Now it has been three days and everybody's name has come up except John's. Assuming that this event is completely random and free of bias, what is the probability of John not cheating, right? Or what is the probability that he's not actually cheating? This can be solved by using hypothesis testing.

Okay

So we'll begin by calculating the probability of John not being picked for a day.

Alright, so we're going to assume that the event is free of bias.

So we need to find out the probability of John not cheating. Right, first we will find the probability that John is not picked for a day; we get 3 out of 4, which is basically 75%. 75% is fairly high.

So if John is not picked for three days in a row, the probability drops down to approximately 42%. Okay.

Now, let's consider a situation where John is not picked for 12 days in a row; the probability drops down to 3.2 percent.

Okay

So the probability of John cheating becomes fairly high, right? So in order for statisticians to come to a conclusion, they define what is known as a threshold value.

Right, considering the above situation, if the threshold value is set to 5 percent,

it would indicate that if the probability lies below 5%, then John is cheating his way out of detention.

But if the probability is above the threshold value, then John is just lucky and his name isn't getting picked.

So the probability and hypothesis testing give riseto two important components of hypothesis testing, which is null hypothesisand alternative hypothesis

The null hypothesis is basically approving the assumption; the alternate hypothesis is when your result disproves the assumption. Therefore, in our example, if the probability of an event occurring is less than 5%, which it is, then the event is biased; hence it proves the alternate hypothesis.
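A minimal sketch of the numbers behind this example; the 5% threshold is the one chosen above:

```python
p_not_picked = 3 / 4          # each day, John is one of four names in the bowl
threshold = 0.05              # the 5% threshold chosen above

for days in (1, 3, 12):
    p = p_not_picked ** days  # probability of John missing `days` draws in a row
    if p < threshold:
        print(f"{days} days: p = {p:.3f} -> below threshold, the draw looks biased (alternate hypothesis)")
    else:
        print(f"{days} days: p = {p:.3f} -> above threshold, John could just be lucky (null hypothesis)")
# Prints roughly 0.750, 0.422 and 0.032 -- the 42% and 3.2% quoted above.
```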

So guys with this we cometo the end of this session

Let's go ahead and understand what exactly supervised learning is.

So supervised learning is where you have the input variable X and the output variable Y and you use an algorithm to learn the mapping function from the input to the output, as I mentioned earlier with the example of face detection.

So it is called supervised learning because the process of an algorithm learning from the training data set can be thought of as a teacher supervising the learning process.

So if we have a look at the supervised learning steps, or rather the workflow: as you can see here, we have the historic data. Then we have random sampling, and we split the data into the training data set and the testing data set.

Using the training data set, with the help of machine learning, which here is supervised machine learning, we create a statistical model. Then, after we have a model which has been generated with the help of the training data set, what we do is use the testing data set for prediction and testing.

We get the output, and finally we have the model validation outcome.

That was thetraining and testing
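A minimal sketch of this train/test workflow, assuming scikit-learn and using its built-in iris data purely as a stand-in for the "historic data":

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                 # historic (labeled) data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)         # random sampling / split

model = LogisticRegression(max_iter=1000)         # build the statistical model
model.fit(X_train, y_train)                       # learn from the training set

print(model.score(X_test, y_test))                # validate on the testing set
```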

So if we have a look at the prediction part of any particular supervised learning algorithm, the model is used for predicting the outcome of a new data set.

So whenever the performance of the model degrades, or if there are any performance issues, the model is retrained with the help of the new data. Now, when we talk about supervised learning, there's not just one but quite a few algorithms here.

So we have linear regression, logistic regression, decision tree, random forest, and the Naive Bayes classifier.

So linear regression is usedto estimate real values

For example, the cost of houses, the number of calls, or the total sales, based on the continuous variables.

So that is what linear regression is.

Now when we talk about logistic regression, it is used to estimate discrete values, for example binary values like 0 and 1, yes or no, true or false, based on a given set of independent variables.

So for example, when you are talking about something like the chances of winning, which can be either true or false, or whether it will rain today, which can be yes or no. So when the output of a particular algorithm or a particular question is either yes, no, or binary, then only do we use logistic regression. Next we have decision trees.

So now, these are used for classification problems; they work for both categorical and continuous dependent variables. And if we talk about random forest, a random forest is an ensemble of decision trees; it gives better prediction accuracy than a decision tree.

So that is another type of supervised learning algorithm.

And finally we have the Naive Bayes classifier.

It is a classification technique based on the Bayes theorem with an assumption of independence between predictors.

Linear regression is one of the easiest algorithms in machine learning.

It is a statistical model that attempts toshow the relationship between two variableswith a linear equation

But before we drill down to linear regressionalgorithm in depth, I'll give you a quick overviewof today's agenda

So we'll start the session with a quick overview of what regression is, as linear regression is one type of regression algorithm.

Once we learn about regression, its use cases and its various types, next

we'll learn about the algorithm from scratch,

where I'll teach you its mathematical implementation first. Then we'll drill down to the coding part and implement linear regression using Python. In today's session we'll deal with the linear regression algorithm using the least squares method, check its goodness of fit, or how close the data is to the fitted regression line, using the R-squared method.

And then finally, we will optimize it using the gradient descent method. In the last part, the coding session,

I'll teach you to implement linear regression using Python. The coding session will be divided into two parts: the first part will consist of linear regression using Python from scratch, where you will use the mathematical algorithm that you have learned in this session.

And in the next part of the coding session, we'll be using scikit-learn for a direct implementation of linear regression.

So let's begin our sessionwith what is regression

Well, regression analysis is a form of predictive modeling technique which investigates the relationship between a dependent and an independent variable. A regression analysis involves graphing a line over a set of data points that most closely fits the overall shape of the data. A regression shows the changes in a dependent variable on the y-axis relative to the changes in the explanatory variable on the x-axis. Fine.

Now you would ask, what are the uses of regression? Well, there are three major uses of regression analysis. The first is determining the strength of predictors: the regression might be used to identify the strength of the effect that the independent variables have on the dependent variable. Here you can ask questions

like, what is the strength of the relationship between sales and marketing spending, or what is the relationship between age and income? Second is forecasting an effect: in this, the regression can be used to forecast effects or the impact of changes.

That is, the regression analysis helps us to understand how much the dependent variable changes with a change in one or more independent variables. Fine.

For example, you can ask a question like: how much additional sales income will I get for each thousand dollars spent on marketing?

Third is trend forecasting: in this, the regression analysis predicts trends and future values.

The regression analysis can be used to get point estimates. In this you can ask questions

like, what will be the price of Bitcoin in the next six months? Right? So the next topic is linear versus logistic regression. By now,

I hope that you know,what a regression is

So let's move onand understand its type

So there are various kinds of regression, like linear regression, logistic regression, polynomial regression

and others, but for this session we'll be focusing on linear and logistic regression.

So let's move on and let me tellyou what is linear regression

And what is logistic regression then what we'll dowe'll compare both of them

All right

So starting with linear regression: in simple linear regression, we are interested in things like y = mx + c.

So what we are trying to find is the correlation between the X and Y variables. This means that every value of x has a corresponding value of y, and the output is continuous.

All right. However, in logistic regression, we are not fitting our data to a straight line like linear regression; instead, what we are doing

is mapping Y versus X to a sigmoid function. In logistic regression,

what we find out is whether y is 1 or 0 for a particular value of x, so we are essentially deciding a true or false value for a given value of x. Fine.

So as a core concept of linear regression, you can say that the data is modeled using a straight line,

but in the case of logistic regression the data is modeled using a sigmoid function.

Linear regression is used with continuous variables; on the other hand, logistic regression

is used with categorical variables. The output or prediction of a linear regression is a value of the variable; on the other hand, the output or prediction of a logistic regression is the probability of occurrence of the event.

Now, how will you check the accuracy and goodness of fit? In the case of linear regression, we have various methods, like the measure of loss, R-squared,

adjusted R-squared, etc. While in the case of logistic regression you have accuracy, precision, recall, the F1 score (which is nothing but the harmonic mean of precision and recall), next the ROC curve for determining the probability threshold for classification, or the confusion matrix, etc.

There are many. All right.

So summarizing the difference between linear andlogistic regression

You can say that the type of function you are mapping to is the main point of difference between linear and logistic regression. A linear regression model

maps a continuous X to a continuous Y; on the other hand, a logistic regression maps a continuous X to a binary Y. So we can use logistic regression to make categorical or true/false decisions from the data. Fine, so let's move on ahead. Next is linear regression selection criteria, or you can say, when will you use linear regression? So the first is classification and regression capabilities. Regression models predict a continuous variable, such as the sales made on a day, or predict the temperature of a city. Their reliance on a polynomial like a straight line to fit a data set poses a real challenge when it comes to building a classification capability.

Let's imagine that you fit a line with the training points that you have. Now imagine you add some more data points to it;

but in order to fit them, what do you have to do? You have to change your existing model, that is, maybe you have to change the threshold itself.

So this will happen with each new data point you add to the model; hence,

linear regression is not good for classification models.

Fine

Next is data quality

Each missing value removes one data point that could optimize the regression, and in simple linear regression

the outliers can significantly disrupt the outcome. For now,

just note that if you remove the outliers, your model will become very good.

All right

So this is about data quality

Next is computational complexity. A linear regression is often not computationally expensive compared to a decision tree or a clustering algorithm; the order of complexity for N training examples and X features usually falls in either O(X²) or O(XN). Next is comprehensibility and transparency: linear regressions are easily comprehensible and transparent in nature.

They can be represented by a simple mathematical notation to anyone and can be understood very easily.

So these are someof the criteria based on which you will selectthe linear regression algorithm

All right

Next is, where is linear regression used? First is evaluating trends and sales estimates.

Well, linear regression can be used in business to evaluate trends and make estimates or forecasts.

For example, if a company's sales have increased steadily every month for the past few years, then conducting a linear analysis on the sales data, with monthly sales on the y-axis and time on the x-axis,

will give you a line that predicts the upward trend in sales. After creating the trend line, the company could use the slope of the line to forecast sales in future months.

Next is analyzing

the impact of price changes. Well, linear regression can be used to analyze the effect of pricing on consumer behavior. For instance, if a company changes the price of a certain product several times, then it can record the quantity it sells for each price level and then perform a linear regression with sold quantity as the dependent variable and price as the independent variable.

This would result in a line that depicts the extent to which the customers reduce their consumption of the product as the price increases.

So this result would help us in future pricing decisions.

Next is assessment of riskin financial services and insurance domain

Linear regression can be used to analyze risk; for example, a health insurance company might conduct a linear regression analysis. How can it do this? It can do it by plotting the number of claims per customer against their age, and it might discover that older customers tend to make more health insurance claims.

Well the resultof such analysis might guide important business decisions

All right, so by now you have a rough idea of what the linear regression algorithm is: what it does, where it is used and when you should use it.

Now

Let's move on and understand the algorithm in depth. So suppose you have the independent variable on the x-axis and the dependent variable on the y-axis.

All right suppose

This is the data pointon the x axis

The independent variableis increasing on the x-axis

And so does the dependentvariable on the y-axis? So what kind of linearregression line you would get you would get a positivelinear regression line

All right as the slope wouldbe positive next is suppose

You have an independent variable on the x-axis which is increasing, and on the other hand the dependent variable on the y-axis is decreasing.

So what kind of line will you get in that case? You will get a negative regression line

in this case, as the slope of the line is negative. And this particular line, that is the line y = mx + c, is a line of linear regression which shows the relationship between the independent variable and the dependent variable, and this line is also known as the line of linear regression.

Okay

So let's add some data points to our graph. So these are some observations or data points on our graph.

So let's plot some more

Okay

Now all our data points are plotted, and our task is to create a regression line or the best fit line.

All right, now once our regression line is drawn, it's the task of prediction. Now suppose

This is our estimated valueor the predicted value and this is our actual value

Okay

So what we have to do, our main goal, is to reduce this error, that is, to reduce the distance between the estimated or predicted value and the actual value. The best fit line would be the one which has the least error, or the least difference between the estimated value and the actual value.

All right, in other words, we have to minimize the error.

This was a brief understanding of the linear regression algorithm; soon

we'll jump towards the mathematical implementation.

But before that, let me tell you this: suppose you draw a graph with speed on the x-axis and distance covered on the y-axis, with the time remaining constant.

If you plot a graphbetween the speed travel by the vehicle and the distance traveledin a fixed unit of time, then you will geta positive relationship

All right

So suppose the equation of the line is y = mx + c.

Then in this case y is the distance traveled in a fixed duration of time, x is the speed of the vehicle, m is the positive slope of the line, and c is the y-intercept of the line.

All right, suppose the distance remains constant and

you have to plot a graph between the speed of the vehicle and the time taken to travel a fixed distance.

Then in that case you will get a line with a negative relationship.

All right, the slope of the line is negative here; the equation of the line changes to y = -mx + c, where y is the time taken to travel a fixed distance, x is the speed of the vehicle, m is the negative slope of the line, and c is the y-intercept of the line.

All right

Now, let's get back to our independent and dependent variables.

So in those terms, y is our dependent variable and x is our independent variable. Now, let's move on

and see the mathematical implementation of things.

Alright, so we have x = 1, 2, 3, 4, 5; let's plot them on the x-axis,

so 0 to 6 on the x-axis. And we have y as 3, 4, 2, 4, 5.

All right.

So let's plot 1 to 5 on the y-axis. Now, let's plot our coordinates one by one. So x = 1 and y = 3; we have here x = 1 and y = 3, so this is the point (1, 3). Similarly we have (1, 3), (2, 4), (3, 2), (4, 4) and (5, 5).

Alright, so moving on ahead

Let's calculate the mean of X and Y and plot it on the graph.

All right, so the mean of X is 1 + 2 + 3 + 4 + 5 divided by 5, that is 3.

All right. Similarly, the mean of Y is 3 + 4 + 2 + 4 + 5, that is 18, so 18 divided by 5, which is nothing but 3.6.

Alright, so next what we'll do is we'll plot the mean, that is (3, 3.6), on the graph.

Okay, so this is the point (3, 3.6). See, our goal is to find or predict the best fit line using the least squares method. All right.

So in order to find that, we first need to find the equation of the line. So let's find the equation of our regression line.

Alright, so let's suppose this is our regression line, y = mx + c.

Now we have an equation of a line, so all we need to do is find the values of m and c,

where m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)². Don't get confused; let me resolve it for you.

All right

So moving on ahead, as part of the formula, what we are going to do is calculate x - x̄.

So we have x as 1 minus x̄ as 3, so 1 - 3, that is -2. Next we have x equal to 2 minus its mean 3, that is -1. Similarly we have 3 - 3 = 0, 4 - 3 = 1 and 5 - 3 = 2.

All right, so x - x̄ is nothing but the distance of all the points from the line x = 3, and y - ȳ implies the distance of all the points from the line y = 3.6. Fine.

So let's calculate the values of y - ȳ. So starting with y = 3 minus the value of ȳ, that is 3.6: 3 - 3.6 is -0.6. Next is 4 - 3.6, that is 0.4. Next, 2 - 3.6, that is -1.6. Next is 4 - 3.6, that is 0.4. And again, 5 - 3.6, that is 1.4.

Alright, so now we are done with y - ȳ. Fine, next we will calculate (x - x̄)².

So let's calculate (x - x̄)²: (-2)² is 4, (-1)² is 1, 0² is 0, 1² is 1, and 2² is 4. Fine.

So now in our table we have x - x̄, y - ȳ and (x - x̄)².

Now what we need is the product of (x - x̄) × (y - ȳ).

Alright, so let's see the products of (x - x̄) × (y - ȳ): that is -2 × -0.6, which is 1.2; -1 × 0.4, which is -0.4; 0 × -1.6, which is 0; 1 × 0.4, which is 0.4; and next, 2 × 1.4, which is 2.8.

All right.

Now almost all the parts of our formula are done.

So now what we need to do is get the summation of the last two columns.

All right, so the summation of (x - x̄)² is 10 and the summation of (x - x̄) × (y - ȳ) is 4, so the value of m will be equal to 4/10. Fine.

So let's put this value of m = 0.4 into our line y = mx + c.

So let's plug the mean point into the equation and find the value of c.

So we have y as 3.6 (remember, the mean of y), m as 0.4, which we calculated just now, and x as the mean value of x, that is 3, and we have the equation as 3.6 = 0.4 × 3 + c.

Alright, that is 3.6 = 1.2 + c, so the value of c is 3.6 - 1.2, that is 2.4.

All right, so we had m = 0.4 and c = 2.4.

And then finally when we calculate the equationof the regression line, what we get is y equalzero point four times of X plus two point four

So this is the regression line

All right, so that is how you plot your points; these are the actual points. Now, for the given m = 0.4 and c = 2.4, let's predict the value of y for x = 1, 2, 3, 4 and 5.

So when x = 1, the predicted value of y will be 0.4 × 1 + 2.4, that is 2.8. Similarly, when x = 2, the predicted value of y will be 0.4 × 2 + 2.4, which equals 3.2; for x = 3, y will be 3.6; for x = 4, y will be 4.0; and for x = 5, y will be 4.4. So let's plot them on the graph, and the line passing through all these predicted points and cutting the y-axis at 2.4 is the line of regression.

Now your task is to calculate the distance between the actual and the predicted values, and your job is to reduce that distance. In other words, you have to reduce the error between the actual and the predicted values; the line with the least error will be the line of linear regression, or the regression line, and it will also be the best fit line.

All right

So this is how things work in a computer. What it does is perform n iterations, each with a different value of m. For each value of m it calculates the equation of the line y = mx + c, so as the value of m changes, the line changes. The iterations start from one and continue for n iterations, and after every iteration it calculates the predicted values according to the line and compares the distance of the actual values to the predicted values; the value of m for which the distance between the actual and the predicted values is minimum will be selected for the best fit line.

All right
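As a quick reference, here is a minimal sketch of the same least-squares calculation in Python with NumPy, using the small example data set from above; the variable names are my own.

```python
import numpy as np

# The small example data set from above
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

x_mean, y_mean = x.mean(), y.mean()   # 3 and 3.6

# Least-squares slope and intercept
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)   # 4 / 10 = 0.4
c = y_mean - m * x_mean                                               # 3.6 - 1.2 = 2.4

y_pred = m * x + c
print(m, c)        # 0.4 2.4
print(y_pred)      # [2.8 3.2 3.6 4.  4.4]
```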

Now that we have calculatedthe best fit line now, it's time to check the goodnessof fit or to check how good a model is performing

So in order to do that, we have a methodcalled R square method

So what is this R-squared? Well, the R-squared value is a statistical measure of how close the data are to the fitted regression line. In general it is considered that a model with a high R-squared value is a good model, but you can also have a low R-squared value for a good model, or a high R-squared value for a model that does not fit at all. Alright, it is also known as the coefficient of determination, or the coefficient of multiple determination.

Let's move on and seehow a square is calculated

So these are our actual values plotted on the graph. We had calculated the predicted values of y as 2.8, 3.2, 3.6, 4.0 and 4.4; remember, we calculated them from the equation y_predicted = 0.4x + 2.4 for every x = 1, 2, 3, 4 and 5. All right, so let's plot them on the graph: these are the points, and the line passing through these points is nothing but the regression line. All right.

Now what you need to do is check and compare the distance of actual minus mean versus the distance of predicted minus mean. So basically you are calculating the distance of the actual values from the mean and the distance of the predicted values from the mean, and that is nothing but R-squared. Mathematically you can represent R-squared as the summation of (y_predicted − ȳ)² divided by the summation of (y − ȳ)², where y is the actual value, yp is the predicted value and ȳ is the mean value of y, which is nothing but 3.6.

Remember, this is our formula

So next what we'll do is calculate y − ȳ. We have y as 3 and ȳ as 3.6, so we calculate 3 − 3.6, which is −0.6; similarly for y = 4 and ȳ = 3.6 we have y − ȳ as 0.4; then 2 − 3.6 is −1.6; 4 − 3.6 is again 0.4; and 5 − 3.6 is 1.4. So we got the values of y − ȳ.

Now what we have to do is take the square. So we have (−0.6)² as 0.36, (0.4)² as 0.16, (−1.6)² as 2.56, (0.4)² as 0.16, and (1.4)² as 1.96.

Now, as a part of the formula, what we need are the yp − ȳ values. So these are the yp values, and we have to subtract the mean of y from them: 2.8 − 3.6 is −0.8; similarly 3.2 − 3.6 is −0.4; 3.6 − 3.6 is 0; 4.0 − 3.6 is 0.4; and 4.4 − 3.6 is 0.8.

So we calculated the values of yp − ȳ; now it's our turn to calculate (yp − ȳ)². We have (−0.8)² as 0.64, (−0.4)² as 0.16, 0² as 0, (0.4)² as again 0.16, and (0.8)² as 0.64.

All right

Now, as the formula suggests, we take the summation of (yp − ȳ)² and the summation of (y − ȳ)². All right, let's see. Summing (y − ȳ)², what you get is 5.2, and summing (yp − ȳ)², you get 1.6. So the value of R-squared can be calculated as 1.6 upon 5.2. Fine, and the result we get is approximately equal to 0.3.
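For reference, here is a minimal sketch of that R-squared calculation in Python, using the explained-variance form of the formula given above; the names and the rounding are mine.

```python
import numpy as np

y      = np.array([3, 4, 2, 4, 5])             # actual values
y_pred = np.array([2.8, 3.2, 3.6, 4.0, 4.4])   # predicted values from y = 0.4x + 2.4
y_mean = y.mean()                               # 3.6

ss_pred = np.sum((y_pred - y_mean) ** 2)        # 1.6
ss_tot  = np.sum((y - y_mean) ** 2)             # 5.2

r_squared = ss_pred / ss_tot
print(round(r_squared, 2))                      # ~0.31
```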

Well, this is not a good fit. All right, it suggests that the data points are far away from the regression line. So this is how your graph will look when R-squared is 0.3. When you increase the value of R-squared to 0.7, you'll see that the actual values lie closer to the regression line; when it reaches 0.9 they come even closer, and when the value is approximately equal to 1, the actual values lie on the regression line itself. For example, if you get a very low value of R-squared, suppose 0.02, then the actual values are very far away from the regression line, or you can say that there are too many outliers in your data, and you cannot conclude anything from it.

All right

So this was all about the calculation of R-squared. Now, you might have a question: are low values of R-squared always bad? Well, in some fields it is entirely expected that R-squared values will be low. For example, any field that attempts to predict human behavior, such as psychology, typically has R-squared values lower than around 50%, from which you can conclude that humans are simply harder to predict than physical processes. Furthermore, if your R-squared value is low but you have statistically significant predictors, you can still draw important conclusions about how changes in the predictor values are associated with changes in the response value. Regardless of the R-squared, the significant coefficients still represent the mean change in the response for one unit of change in the predictor while holding the other predictors in the model constant.

Obviously this type of information can beextremely valuable

All right


So this was all aboutthe theoretical concept now, let's move on to the coding part and understand thecode in depth

So for implementing linear regression using Python, I will be using Anaconda with Jupyter Notebook installed on it. Alright, so there's a Jupyter notebook and we are using Python 3 on it. Alright, so we are going to use a data set consisting of the head size and brain weight of different people.

All right

So let's import our libraries: %matplotlib inline, then we are importing numpy as np, pandas as pd, and from matplotlib we are importing pyplot as plt. Alright, next we will import our data, headbrain.csv, and store it in a dataframe. Let's hit the Run button and see the output.

So this asterisk symbol means the cell is still executing. And there's the output: our data set consists of 237 rows and 4 columns. We have the columns gender, age range, head size in cubic centimeters and brain weight in grams. Fine.

So there's our sample data set

This is how it looks it consistsof all these data set

So now that we have imported our data, as you can see there are 237 values in the training set, so we can find a linear relationship between the head size and the brain weight. So now what we'll do is collect X and Y: X will consist of the head size values and Y will consist of the brain weight values.

So collecting X and Y

Let's execute the Run

Done. Next we need to find the values of b1 and b0, or you can say m and c. For that we'll need the means of X and Y, so first of all we'll calculate them: mean_x = np.mean(X), where mean is a predefined function of NumPy, and similarly mean_y = np.mean(Y), which returns the mean of the Y values. Next we'll store the total number of values, that is the length of X.

Alright, then we'll use the formula to calculate the values of b1 and b0, or m and c. Let's hit the Run button and see the result. So as you can see here on the screen, we have got b1 as 0.263 and b0 as 325.57.

Alright, so nowthat we have a coefficient

So comparing it withthe equation y equal MX plus C

You can say that brain weight = 0.263 × head size + 325.57, so the value of m here is 0.263 and the value of c here is 325.57.

All right, so there'sour linear model now, let's plot itand see graphically

Let's execute it

So this is how our plot looks. This model is not so bad, but we need to find out how good our model is. There are many methods for that, like the root mean squared error and the coefficient of determination, or the R-squared method. In this tutorial I have told you about the R-squared method, so let's focus on that and see how good our model is. So let's calculate the R-squared value.

So let's calculatethe R square value

All right, here ss_t is the total sum of squares, ss_r is the sum of squares of the residuals, and R-squared, as per the formula, is 1 minus the sum of squares of the residuals upon the total sum of squares. Alright, next, when you execute it you will get the value of R-squared as about 0.63, which is pretty good.
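For reference, here is a minimal sketch of how that manual calculation might look in Python. The column names 'Head Size(cm^3)' and 'Brain Weight(grams)' are an assumption on my part; adjust them to match your copy of headbrain.csv.

```python
import numpy as np
import pandas as pd

# Load the data set; the exact column names here are an assumption
data = pd.read_csv('headbrain.csv')
X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values

# Least-squares estimates of slope (b1) and intercept (b0)
mean_x, mean_y = np.mean(X), np.mean(Y)
b1 = np.sum((X - mean_x) * (Y - mean_y)) / np.sum((X - mean_x) ** 2)
b0 = mean_y - b1 * mean_x

# Goodness of fit: R^2 = 1 - SS_res / SS_tot
Y_pred = b0 + b1 * X
ss_res = np.sum((Y - Y_pred) ** 2)
ss_tot = np.sum((Y - mean_y) ** 2)
r2 = 1 - ss_res / ss_tot

print(b1, b0, r2)   # roughly 0.263, 325.57 and 0.6x on this data set
```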

Now that you have implementedsimple linear regression model using least Square method, let's move on and see how will you implement the modelusing machine learning library called scikit-learn

All right

So scikit-learn is a simple machine learning library in Python; building machine learning models is very easy using scikit-learn. So suppose this is our Python code: using the scikit-learn library, the code shortens to just a few lines. Let's hit the Run button, and you will get the same R-squared score.
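Here is a hedged sketch of that shorter version, reusing the X and Y arrays from the previous sketch; LinearRegression and r2_score reproduce the slope, intercept and R-squared in a few lines.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X = X.reshape(-1, 1)

reg = LinearRegression()
reg.fit(X, Y)
Y_pred = reg.predict(X)

print(reg.coef_[0], reg.intercept_)   # same slope and intercept as the manual version
print(r2_score(Y, Y_pred))            # same R-squared score
```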

So today we'll be discussing logistic regression. Let's move forward and understand the what and why of logistic regression. Now, this algorithm is most widely used when the dependent variable, or you can say the output, is in a binary format, so here you need to predict the outcome of a categorical dependent variable. The outcome should always be discrete or categorical in nature. By discrete I mean the value should be binary, or you can say there are just two values: it can either be 0 or 1, either yes or no, either true or false, or high or low. So only these can be the outcomes; the value which you need to predict should be discrete, or you can say categorical, in nature.

Whereas in linear regression

We have the value of Y, or you can say the value to predict, in a range; that is the difference between linear regression and logistic regression. You must be having the question: why not linear regression? Now guys, in linear regression the value of Y, or the value you need to predict, is in a range, but in our case, that is logistic regression, we just have two values: it can be either 0 or it can be 1. It should not entertain values below zero or above one, but in linear regression we have the value of Y in a range. So here, in order to implement logistic regression, we need to clip this part; we don't need the values below zero or above one, since the value of Y will be only between 0 and 1. That is the main rule of logistic regression.

The linear line has to beclipped at zero and one now

Once we clip this graph itwould look somewhat like this

So here you aregetting the curve which is nothing butthree different straight lines

So here we need to makea new way to solve this problem

So this has to beformulated into equation and hence we come upwith logistic regression

So here the outcome is either 0 or 1, which is the main rule of logistic regression. But with these clipped straight lines, the resulting curve cannot be formulated into a single equation, so we need a smoother curve that still keeps the values between 0 and 1; hence we come up with logistic regression. Now, once it gets formulated into an equation, it looks somewhat like this.

So guys, this is nothing but an S curve, or you can say the sigmoid curve, a sigmoid function curve. This sigmoid function basically converts any value from minus infinity to infinity into a value between 0 and 1, which is what logistic regression wants, so that the output can then be mapped to the binary format, either 0 or 1. So if you see here, the values are either 0 or 1, and this is nothing but a transition between them. But guys, there's a catch over here.

So let's say I have a data point that is 0.8. Now, how can you decide whether your value is 0 or 1? Here you have the concept of a threshold, which basically divides your line. The threshold value basically indicates the probability of either winning or losing; by winning I mean the value equals 1, and by losing I mean the value equals 0. But how does it do that? Let's take a data point which is over here; let's say my cursor is at 0.8. So here I check whether this value is more than the threshold value or not: if it is more than the threshold value, it should give me the result as 1, and if it is less than that, it should give me the result as 0.

So here my threshold value is 0.5. I need to define that if my value, let's say 0.8, is more than 0.5, then the value should be rounded off to 1; and if it is less than 0.5, let's say I have a value of 0.2, then it should be reduced to 0. So here you can use the concept of the threshold value to find your output, and the output should be discrete: it should be either 0 or it should be 1. So I hope you got this curve of logistic regression.

So guys, this isthe sigmoid S curve

So to make this curvewe need to make an equation

So let me addressthat part as well

So let's see how an equation is formed to imitate this functionality. Over here we have the equation of a straight line: y = mx + c. In this case I have only one independent variable, but let's say we have many independent variables; then the equation becomes y = m1x1 + m2x2 + m3x3 and so on, till mnxn. Now let us put in betas for the coefficients, so the equation becomes y = b1x1 + b2x2 + b3x3 and so on, till bnxn, plus c.

So guys the equation of the straight line has a rangefrom minus infinity to Infinity

But in our case, that is the logistic equation, the value which we need to predict, or you can say the Y value, can have a range only from 0 to 1. So in that case we need to transform this equation, and to do that we take Y over 1 − Y. Now if Y equals 0, then 0 over (1 − 0) is 0 over 1, which is again 0; and if you take Y equal to 1, then 1 over (1 − 1) is 1 over 0, which is infinity. So my range is now between zero and infinity, but again, we want the range from minus infinity to infinity. For that we take the log of this equation; so let's go ahead and take the logarithm to transform it further and get the range from minus infinity to infinity. So over here we have log of Y over (1 − Y) equal to the linear part, and this is your final logistic regression equation.
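To make the sigmoid, the log-odds and the 0.5 threshold concrete, here is a minimal Python sketch; the function names are my own.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Inverse of the sigmoid: log of the odds p / (1 - p)."""
    return np.log(p / (1.0 - p))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p = sigmoid(z)
print(p.round(3))            # values squeezed between 0 and 1

# Apply a 0.5 threshold to turn probabilities into class labels 0 or 1
labels = (p >= 0.5).astype(int)
print(labels)                # e.g. a probability of 0.8 rounds up to 1

print(logit(0.8).round(3))   # log-odds of 0.8, back on the (-inf, inf) scale
```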

So guys, don't worry

You don't have to write or memorize this formula in Python; you just need to call the logistic regression function and everything will be done automatically for you. So I don't want to scare you with the maths and the formulas behind it, though it is always good to know how the formula was generated.

So I hope you guys are clear with how logistic regressioncomes into the picture next

Let us see what the major differences between linear regression and logistic regression are. First of all, in linear regression we have the value of Y as a continuous variable, or the variable we need to predict is continuous in nature, whereas in logistic regression we have a categorical variable, so the value we need to predict should be categorical in nature: it should be either 0 or 1, or have just two values to it. For example, whether it is raining or not raining, whether it is humid outside or not, whether it is going to snow or not. These are a few examples where we need to predict discrete values, or you can just predict whether something is happening or not.

Next, linear regression solves regression problems. Here you have the concept of the independent variable and the dependent variable, so you can calculate the value of Y which you need to predict using the value of X, and your Y variable, or the value you need to predict, is in a range. Whereas in logistic regression you have discrete values, so logistic regression basically solves classification problems: it can classify and just give you a result as to whether an event is happening or not. So I hope that is pretty much clear till now.

Next, in linear regression the graph you have seen is a straight line graph, so over here you can calculate the value of Y with respect to the value of X, whereas in logistic regression the graph we got was an S curve, or you can say the sigmoid curve; using the sigmoid function you can predict your Y values. Moving ahead, let us see the various use cases where logistic regression is implemented in real life.

So the very first is weather prediction. Logistic regression helps you to predict the weather; for example, it is used to predict whether it is raining or not, whether it is sunny, whether it is cloudy or not. All these things can be predicted using logistic regression, but you need to keep in mind that both linear regression and logistic regression can be used in predicting the weather. In that case, linear regression helps you to predict what the temperature will be tomorrow, whereas logistic regression will only tell you whether it is going to rain or not, whether it is cloudy or not, whether it is going to snow or not; these values are discrete. Whereas if you apply linear regression, you'd be predicting things like what the temperature is tomorrow or the day after tomorrow. So these are the slight differences between linear regression and logistic regression.

Moving ahead, we have classification problems. Logistic regression also performs multi-class classification, so it can help you tell whether it's a bird or not a bird; then you can classify different kinds of mammals, let's say whether it's a dog or not a dog; similarly you can check for a reptile, whether it's a reptile or not a reptile. So logistic regression can perform multi-class classification, and this point I've already discussed: it is used in classification problems. Next,

it also helps you to determine illness. Let me take an example: let's say a patient goes for a routine checkup in a hospital. What the doctor will do is perform various tests on the patient and check whether the patient is actually ill or not. So what will the features be? The doctor can check the sugar level and the blood pressure; then, what is the age of the patient, is it a young person or an old person; then, what is the previous medical history of the patient. All of these features will be recorded by the doctor, and finally the doctor checks the patient's data and determines the outcome of the illness and its severity. So using all the data, a doctor can identify whether a patient is ill or not.

So these arethe various use cases in which you can uselogistic regression now, I guess enough of theory part

So let's move ahead and see some practical implementations of logistic regression. Over here I'll be implementing two projects. First I have the Titanic data set, where we'll predict what factors made people more likely to survive the sinking of the Titanic. In my second project we'll do data analysis on SUV cars: over here we have data on who might purchase an SUV, and we'll see what factors made people more interested in buying one. These will be the major questions as to why you should implement logistic regression and what output you will get from it.

So let's start with the very first project, that is Titanic data analysis. Some of you might know that there was a ship called the Titanic which hit an iceberg and sank to the bottom of the ocean, and it was a big disaster at that time because it was the first voyage of the ship and it was supposed to be really strongly built and one of the best ships of that time. So it was a big disaster of that time, and of course there is a movie about this as well, so many of you might have watched it. What we have is data on the passengers, those who survived and those who did not survive in this particular tragedy. What you have to do is look at this data and analyze which factors would have contributed the most to the chances of a person's survival on the ship. Using logistic regression, we can predict whether the person survived or died. Apart from this, we'll also have a look at the various features along the way.

So first, let us explore the data set. Over here we have the index value, then the first column is passenger ID, then my next column is Survived. Over here we have two values, a 0 and a 1: 0 stands for did not survive and 1 stands for survived, so this column is categorical, where the values are discrete. Next we have the passenger class, with three values 1, 2 and 3, which basically tells you whether a passenger was travelling in first class, second class or third class. Then we have the name of the passenger; we have the sex, or you can say the gender, of the passenger, whether the passenger is male or female; then we have the age. We have SibSp, which basically means the number of siblings or spouses aboard the Titanic, so over here we have values such as 1, 0 and so on. Then we have Parch, which is basically the number of parents or children aboard the Titanic, and over here we also have some values. Then we have the ticket number, we have the fare, we have the cabin number, and we have the Embarked column. In my Embarked column we have three values, S, C and Q: S basically stands for Southampton, C stands for Cherbourg and Q stands for Queenstown. So these are the features that we'll be applying our model on; here we'll perform various steps and then we'll be implementing logistic regression.

So now these arethe various steps which are requiredto implement any algorithm

So now, in our case, we are implementing logistic regression. The very first step is to collect your data, or to import the libraries that are used for collecting and handling your data. My second step is to analyze the data: over here I can go through the various fields and analyze them. I can check whether the females or children survived better than the males, or whether the rich passengers survived more than the poor passengers; did the money matter, as in, were those who paid more to get onto the ship evacuated first? And what about the workers: what is the survival rate if you were a worker on the ship and not just a traveling passenger? So all of these are very relevant questions, and you will go through them one by one. In this stage you need to analyze and explore your data as much as you can.

Then my third step is to wrangle the data. Data wrangling basically means cleaning your data, so over here you can simply remove the unnecessary items, or if you have null values in the data set you can clean that data and then take it forward. In the fourth step you build your model using the training data set and then test it on the test data, so over here you will perform a split which basically divides your data set into training and testing sets. Finally you will check the accuracy, so as to ensure how accurate your values are. So I hope you guys got these five steps that we're going to implement in logistic regression.

So now let's go into allthese steps in detail

So number one

We have to collect your data or you can sayimport the libraries

So let me show you the implementation part as well.

So I just openmy jupyter notebook and I just Implement allof these steps side by side

So guys this ismy jupyter notebook

So first, let me just renamejupyter notebook to let's say Titanic data analysis

Now, the first step was to import all the libraries and collect the data.

So let me just import all the libraries first. First of all, I'll import pandas; pandas is used for data analysis, so I'll say import pandas as pd. Then I'll import numpy, so I'll say import numpy as np; NumPy is a library in Python which basically stands for numerical Python and is widely used to perform scientific computation. Next we will be importing Seaborn; Seaborn is a library for statistical plotting, so I'll say import seaborn as sns. I'll also import matplotlib; the matplotlib library is again for plotting, so I'll say import matplotlib.pyplot as plt, and to show the plots inline in the Jupyter notebook all I have to write is %matplotlib inline. Next I'll import one more module for basic mathematical functions, so I'll say import math.

So these are the libraries that I will be needingin this Titanic data analysis

So now let me justimport my data set

So I'll take a variable

Let's say titanic_data, and using pandas I will just read my CSV, or you can say the data set; I'll give the name of my data set, that is titanic.csv. Now, I have already shown you the data set, so over here let me just print the top 10 rows. For that I will just take the variable, titanic_data.head, and ask for the top ten rows. Now I'll just run this; to run the cell I have to press Shift + Enter, or else you can just directly run it from the toolbar. So over here

I have the index

We have the passenger ID, which is nothing but again an index starting from 1; then we have the Survived column, which has categorical, or you can say discrete, values in the form of 0 or 1.

Then we havethe passenger class

We have the name ofthe passenger sex age and so on

So this is the data set that I will be going forward with next let us printthe number of passengers which are there in this originaldata frame for that

I'll just simply type in print

I'll say a number of passengers

And using the length function, I can calculatethe total length

So I'll say length, and inside this I'll be passing this variable, titanic_data, so I'll just copy it from here, paste it and add .index; then let me just print this. So the number of passengers in the original data set is 891, so around that many people were traveling on the Titanic ship. Over here my first step is done: we have just collected the data, imported all the libraries and found out the total number of passengers on the Titanic. So now let me just go back to the presentation and see what my next step is.
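As a reference for this first step, here is a minimal sketch of the imports and data loading described above; the file name titanic.csv follows the session, so adjust the path to your own copy.

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import math
# %matplotlib inline   # only needed inside a Jupyter notebook

# Load the Titanic data set into a dataframe
titanic_data = pd.read_csv('titanic.csv')
print(titanic_data.head(10))

print('Number of passengers:', len(titanic_data.index))   # 891 in this data set
```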

So we're done withthe collecting data

Next step is to analyzeyour data so over here will be creating different plotsto check the relationship between variables as in how one variableis affecting the other so you can simply exploreyour data set by making use of various columns and then you can plota graph between them

So you can either plota correlation graph

You can plota distribution curve

It's up to you guys

So let me just go back to my jupyter notebook and letme analyze some of the data

Over here

My second part isto analyze data

So I'll just put this in as a heading; to do that I go to the cell, click on Markdown, and run it. First let us plot a count plot to compare the passengers who survived and who did not survive. For that I will be using the Seaborn library; over here I have imported seaborn as sns, so I don't have to write the whole name. I'll simply say sns.countplot, with x as Survived, and the data that I'll be using is titanic_data, or you can say the name of the variable in which you have stored your data set.

So now let me just run this. Over here, as you can see, I have the Survived column on my x-axis and the count on the y-axis. Zero basically stands for did not survive, and one stands for the passengers who did survive. You can see that around 550 of the passengers did not survive and around 350 passengers did survive, so here you can basically conclude that there are far fewer survivors than non-survivors.

So this was the very first plot. Now let us plot another plot to compare the sexes: out of all the passengers who survived and who did not survive, how many were men and how many were women? To do that I'll simply say sns.countplot, add the hue as Sex, since I want to know how many females and how many males survived, and then specify the data, so I'm using the titanic_data set. Let me just run this; I had made a small mistake over here. So over here you can see I have the Survived column on the x-axis and the count on the y-axis, where the blue color stands for the male passengers and orange stands for the female passengers. Among the passengers who did not survive (value 0), the majority were male, and if we look at the people who did survive, the majority were female.

So this basically concludesthe gender of the survival rate

So it appears on averagewomen were more than three times more likelyto survive than men next

Let us plot another plot where we have the hue as the passenger class, so over here we can see which class the passenger was traveling in, whether it was class 1, 2 or 3. For that I'll just write the same command: sns.countplot, with the x-axis as Survived, and I'll change my hue to the passenger class, so the variable name is Pclass, and the data set that I'll be using is titanic_data.

So this is my result. Over here you can see blue for first class, orange for second class and green for third class. The passengers who did not survive were mostly from the third class, or you can say the lowest class, or the cheapest class to get onto the Titanic, and the people who did survive mostly belong to the higher classes: classes 1 and 2 show higher survival than third class. So we have concluded that the passengers who did not survive were mostly from third class, or the lowest class, and the passengers traveling in first and second class tended to survive more.

Next, let me plot a graph of the age distribution. Over here I can simply use my data; we'll be using the pandas library for this. I'll take the column, that is Age, and I want a histogram, so I'll say plot.hist. You can notice over here that we have more young passengers, or you can say children between the ages 0 and 10, then we have a lot of average-aged people, and as you go to higher ages the population gets smaller. So this is the analysis on the Age column: we saw that we have more young and average-aged passengers traveling on the Titanic.
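For reference, here is a minimal sketch of the analysis plots described so far, assuming the standard Titanic column names (Survived, Sex, Pclass, Age).

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Survivors vs non-survivors
sns.countplot(x='Survived', data=titanic_data)
plt.show()

# Split the same count by gender, then by passenger class
sns.countplot(x='Survived', hue='Sex', data=titanic_data)
plt.show()
sns.countplot(x='Survived', hue='Pclass', data=titanic_data)
plt.show()

# Age distribution as a histogram
titanic_data['Age'].plot.hist()
plt.show()
```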

So next let me plota graph of fare as well

So I'll say titanic_data, take the Fare column, and again I want a histogram, so I'll say hist. Here you can see the fare size is between zero and a hundred. Now let me add the bin size to make it clearer: I'll say bins equals, let's say, 20, and I'll increase the figure size as well, so I'll say figsize and give the dimensions as 10 by 5. So this is clearer now.

Next, let us analyze the other columns as well. I'll just type in titanic_data, and I want the information as to what columns are left. So here we have PassengerId, which I guess is of no use; then we have Survived, where we saw how many passengers survived and how many did not; we also did the analysis on the gender basis, where we saw that females tend to survive more than males; then we saw the passenger class, whether the passenger was traveling in first, second or third class. Then we have the name; on the name we cannot do any analysis. We saw the sex, we saw the age; then we have SibSp, which stands for the number of siblings or spouses aboard the Titanic, so let us analyze this one as well. I'll say sns.countplot, mention x as SibSp, and I'll be using the titanic_data, so you can see the plot over here. It has the maximum value at zero, so we can conclude that most passengers had neither siblings nor a spouse on board the Titanic; the second highest value is 1, and then we have various values for 2, 3, 4 and so on.

Similarly, we can do it for Parch. So next we have Parch, the number of parents or children aboard the Titanic, and we can analyze it in the same way. Then we have the ticket number; I don't think any analysis is required for the ticket. Then we have the fare, which we have already discussed, as in the people traveling in first class would pay the highest fare; then we have the cabin number and we have Embarked. So these are the columns that we'll be doing data wrangling on. We have analyzed the data and seen quite a few graphs from which we can conclude which variables matter and what the relationships are. Now the third step is data wrangling, and data wrangling basically means cleaning your data.

So if you have a large data set, you might have some null values, or you can say NaN values.

So it's very important that you remove allthe unnecessary items that are presentin your data set

So removing this directlyaffects your accuracy

So I'll just go ahead and clean my data by removing all the NaN values and the unnecessary columns which have null values in the data set. To start the data wrangling, first of all I'll check whether my data set has null values or not. I'll say titanic_data, which is the name of my data set, and I'll say isnull. This will basically tell me which values are null and will return a Boolean result; it checks the missing data, and the result will be in Boolean format, as in True or False: False means the value is not null and True means the value is null. So let me just run this. Over here you can see the values as False or True: False is where the value is not null and True is where the value is null.

So over here you can see that in the Cabin column the very first value is null, so we have to do something about that. You can see that we have a large data set, so the output does not stop, but we can actually print the sum of it: the number of passengers who have a NaN value in each column. I'll say titanic_data.isnull() and I want the sum of it, using .sum(). This basically prints the number of passengers with NaN values in each column, and we can see that the Age column has 177 missing values, the maximum is in the Cabin column, and we have very few, that is 2, in the Embarked column.

Now, if you don't want to see these numbers, you can also plot a heat map and analyze it visually, so let me just do that as well. I'll say sns.heatmap, and set yticklabels to False; let's run this. As we have already seen, there are three columns with missing values: this one is Age, where roughly 20% of the column is missing; then we have the Cabin column, which is quite a large share; and then we have the two missing values in the Embarked column as well. Let me add a cmap for color coding; if I do this the graph becomes easier to read, and over here yellow stands for True, or you can say the values that are null. So we have seen that we have missing values in Age, a lot of missing values in the Cabin column, and a very small number, which is not even visible, in the Embarked column.

So to removethese missing values, you can either replacethe values and you can put in some dummy values to it or youcan simply drop the column

So here let us supposepick the age column

So first, let me just plot a box plot to analyze it, with one of the columns as Age. I'll say sns.boxplot, with x equal to the passenger class, that is Pclass, y equal to Age, and the data set that I'll be using is the Titanic one, so I'll say data equals titanic_data. You can see that the ages in first class and second class tend to be higher than what we have in third class. Well, that could depend on experience, on how much you earn, or any number of reasons. So we concluded that passengers traveling in class one and class two tend to be older than those in class three, and we have found that we have some missing values in Age. Now, one way is to just drop the column, or you can simply fill in some values for them; this method is called imputation. Now, to perform the data wrangling, or cleaning, let's first print the head of the data set. I'll say titanic_data.head, and let's say I just want the first five rows.

So here we have Survived, which is again categorical, so on this particular column I can apply logistic regression; this can be my Y value, or the value that we need to predict. Then we have the passenger class, we have the name, then we have the ticket number, the fare and the cabin. Over here we have seen that in Cabin we have a lot of null values, or you can say NaN values, which is quite visible as well. So first of all we'll just drop this column. For dropping it I'll just say titanic_data.drop, give the column which I need to drop, that is the Cabin column, mention axis equal to 1, and set inplace to True. Now I'll just print the head again to see whether this column has been removed from the data set or not, so I'll say titanic_data.head. As you can see here, we don't have the Cabin column anymore.

Now you can also drop the NaN values, so I'll say titanic_data.dropna, which drops all the NA values, or you can say NaN, which is not a number, and I'll say inplace equal to True.

So over here, let me again plot the heat map and check whether the values which were showing a lot of nulls have been removed or not. I'll say sns.heatmap, pass in the data set with isnull, say yticklabels equal to False, and this time I don't want the color bar, so again I set that to False. This will basically help me check whether the null values have been removed from the data set or not, and as you can see here I don't have any null values, so it's entirely black now. You can actually check the sum as well, so I'll just go above, copy that part, and use the sum function to calculate the sum. So here it tells me that the data set is clean, as in the data set does not contain any null value or NaN value.

So now we have wrangled the data, or you can say cleaned the data. Here we have done just one step in data wrangling, that is removing one column; you can do a lot more things, for example you can fill in the values with some other values, or you can calculate the mean and then fill the null values with it.
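Here is a minimal sketch of that wrangling step in code; the cmap choice is mine, everything else follows the steps described above.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# How many missing values per column?
print(titanic_data.isnull().sum())

# Visual check of the missing values
sns.heatmap(titanic_data.isnull(), yticklabels=False, cmap='viridis')
plt.show()

# Cabin has too many missing values, so drop the column entirely,
# then drop the remaining rows that still contain NaN values
titanic_data.drop('Cabin', axis=1, inplace=True)
titanic_data.dropna(inplace=True)

print(titanic_data.isnull().sum())   # should be all zeros now
```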

But now if I look at my data set, so I'll say titanic_data.head, over here I have a lot of string values, and these have to be converted into categorical variables in order to implement logistic regression. So what we will do is convert these into categorical, or dummy, variables, and this can be done using pandas, because logistic regression takes only two values. Whenever you apply machine learning you need to make sure that there are no string values present, because the model won't take those as input variables; using strings you can't predict anything. In my case I have the Survived column to tell how many people survived and how many did not, so 0 stands for did not survive and 1 stands for survived.

So now let me justconvert these variables into dummy variables

So I'll just use pandas and say pd.get_dummies; you can simply press Tab to autocomplete. I'll say titanic_data and pass the Sex column, and you can click Shift + Tab to get more information on this. So here we have the DataFrame type, and we have PassengerId, Survived and the passenger class. If I run this you'll see that, in the female column, 0 basically stands for not a female and 1 stands for female; similarly, in the male column, 0 stands for not male and 1 stands for male. Now, we don't require both these columns, because one column by itself is enough to tell us whether it's a male or, you can say, a female. Let's say I want to keep only male: if the value of male is 1, it is definitely a male and not a female, so you don't need both of these values. For that I'll just remove the first column, the female one, so I'll say drop_first, and it has given me just one column, which is male, with values 0 and 1. Let me just set this as a variable, say sex, and over here I can say sex.head; I just want to see the first five rows. So this is how my data looks now.

We have done it for sex

Then we have the numerical values in Age, we have the numerical values in SibSp, then we have the ticket number, we have the fare, and we have Embarked as well. In Embarked, the values are S, C and Q.

So here also we can applythis get dummy function

So let's say Iwill take a variable

Let's say Embark

I'll use the pandas Library

I need the column namethat is embarked

Let me just printthe head of it

So I'll say embark.head. Over here we have C, Q and S. Now here also we can drop the first column, because two values are enough: the passenger is either traveling from Q, that is Queenstown, or S, that is Southampton, and if both the values are 0 then the passenger is definitely from Cherbourg, which is the third value. So you can again drop the first value; I'll say drop_first.

Let me just run this so this is how my output lookslike now similarly

You can do it forpassenger class as well

So here also we havethree classes one two, and three so I'll justcopy the whole statement

So let's say I wantthe variable name

Let's say PCL

I'll pass in the column name, that is Pclass, and I'll just drop the first column. Here also the values are 1, 2 or 3, and after removing the first column we are left with just 2 and 3; if both the values are 0 then the passenger is definitely traveling in first class. Now we have made the values categorical; my next step will be to concatenate all these new columns into the data set.

That is, into titanic_data: using pandas we'll just concatenate all these columns. So I'll say pd.concat, and then say we have to concatenate sex, embark and pcl, and then I'll mention the axis as 1. I'll just run this and then print the head, and over here you can see that these columns have been added. We have the male column, which basically tells whether the person is male or female; then we have the Embarked columns, which are basically Q and S, so if the passenger is traveling from Queenstown the Q value will be 1, else it will be 0, and if both of these values are zero, the passenger is definitely traveling from Cherbourg.

Then we have the passengerclass as 2 and 3

So the value of both these is 0 then passengerstravelling in class one

So I hope you got this till now. Now, these are the irrelevant columns we have over here, so we can just drop them: we'll drop the Pclass column, the Embarked column and the Sex column. I'll just type in titanic_data.drop and mention the columns that I want to drop. I'll even delete the PassengerId, because it's nothing but an index value starting from one, so I'll drop that as well; then I don't want Name either, so I'll delete Name as well; then what else can we drop — we can drop Ticket too. And then I'll just mention the axis and say inplace is equal to True.

Okay, so my column names start with an uppercase letter. So these have been dropped; now let me just print my data set again. So this is my final data set, guys: we have the Survived column, which has the values zero and one, then we have the passenger class — oh, we forgot to drop this one as well.

So no worries

I'll drop this again

So now let me just run this

So over here we have Survived, we have Age, we have SibSp, we have Parch, we have Fare and male, and these last ones we have just converted. So here we have just performed data wrangling, or you can say cleaned the data, and we converted the values of gender to the male column, Embarked to Q and S, and the passenger class to 2 and 3.
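For reference, here is a minimal sketch of the dummy-variable conversion and column clean-up described above, using the standard Titanic column names.

```python
import pandas as pd

# Convert the string columns into numeric dummy variables,
# dropping the first level of each so the columns stay independent
sex    = pd.get_dummies(titanic_data['Sex'], drop_first=True)       # column: male
embark = pd.get_dummies(titanic_data['Embarked'], drop_first=True)  # columns: Q, S
pcl    = pd.get_dummies(titanic_data['Pclass'], drop_first=True)    # columns: 2, 3

# Attach the new columns and drop the ones we no longer need
titanic_data = pd.concat([titanic_data, sex, embark, pcl], axis=1)
titanic_data.drop(['Sex', 'Embarked', 'Pclass', 'PassengerId', 'Name', 'Ticket'],
                  axis=1, inplace=True)

print(titanic_data.head())
```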

So this was all about data wrangling, or just cleaning the data; my next step is training and testing the data. Here we will split the data set into a train subset and a test subset, then build a model on the train data and predict the output on the test data set. So let me just go back to Jupyter and implement this as well. Over here I need to train my data set, so I'll just put this in as heading 3. You need to define your dependent variable and independent variables: my Y is the output, or you can say the value that I need to predict, so over here I will write titanic_data

I'll take the columnwhich is survive

So basically I haveto predict this column whether the passengersurvived or not

And as you can see we havethe discrete outcome, which is in the form of 0and 1 and rest all the things we can take it as a features or youcan say independent variable

So I'll say titanic_data.drop: we just simply drop Survived, and all the other columns will be my independent variables.

So everything else arethe features which leads to the survival rate

So once we have definedthe independent variable and the dependent variablenext step is to split your data into trainingand testing subset

For that we will be using sklearn. I just type in from sklearn.cross_validation import train_test_split. Now, if you click Shift and Tab you can go to the documentation and see the examples over here; I can expand it, go to the examples and see how you can split your data. So over here you have X_train, X_test, y_train, y_test, and then, using the train_test_split function, you just pass in your independent variables and dependent variable and define a test size and a random state. So let me just copy this and paste it over here.

Over here we have the train and test parts of the independent variables, then the train and test parts of the dependent variable, and using the split function we pass in the independent and dependent variables and then set a split size. Let's say I'll put it at 0.3; this basically means that your data set is divided in a 70/30 ratio. Then I can add a random state to it; let's say I'm using 1. This is not necessary, but if you want the same results as mine you can add the same random state, since it basically takes exactly the same sample every time. Next I have to train and predict by creating a model.

So here, logistic regression is imported from the linear model module. Next I'll just type in from sklearn.linear_model import LogisticRegression. Then I'll create an instance of this logistic regression model, so I'll say logmodel equals LogisticRegression. Now I just need to fit my model, so I'll say logmodel.fit and pass in my X train and y train. It gives me all the details of the logistic regression estimator, so here it shows the class weight, dual, fit_intercept and all those parameters. Then what I need to do is make predictions, so I'll take a variable, say predictions, and use the model for it: I'll say logmodel.predict and pass in the value, that is X test. So here we have just created a model, fitted that model, and then made predictions.
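Here is a hedged sketch of that split-fit-predict step; note that on current scikit-learn versions train_test_split lives in sklearn.model_selection rather than the older sklearn.cross_validation module used in the session.

```python
from sklearn.model_selection import train_test_split   # older versions: sklearn.cross_validation
from sklearn.linear_model import LogisticRegression

X = titanic_data.drop('Survived', axis=1)   # independent variables (features)
y = titanic_data['Survived']                # dependent variable to predict

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)    # 70/30 split with a fixed random state

logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
```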

So now to evaluate how my modelhas been performing

So you can simplycalculate the accuracy or you can also calculatea classification report

So don't worry guys

I'll be showing bothof these methods

So I'll say from sklearn.metrics import classification_report, then print the classification report, and inside it I'll pass y test and the predictions. So guys, this is my classification report: over here I have the precision, I have the recall, we have the f1-score and then we have the support. Here we have precision values of about 0.75, 0.72 and 0.73, which is not that bad. Now, in order to calculate the accuracy,

You can also use the conceptof confusion Matrix

So if you want to print the confusion matrix, I will simply say from sklearn.metrics import confusion_matrix first of all, and then we'll just print it. So my function has been imported successfully; I call confusion_matrix and again pass in the same variables, which are y test and predictions.

So I hope you guys already knowthe concept of confusion Matrix

So can you guys give mea quick confirmation as to whether you guys remember this confusionMatrix concept or not? So if not, I can just quicklysummarize this as well

Okay, I can see some of you saying yes and some saying it's not clear, so I'll just tell you in brief what a confusion matrix is all about. A confusion matrix is nothing but a 2 by 2 matrix which has four outcomes; it basically tells us how accurate your values are.

So here we have the columns as predicted no and predicted yes, and the rows as actual no and actual yes.

So this is the conceptof confusion Matrix

So here let me just fill in the values which we have just calculated: we have 105, 21, 25 and 63. As you can see, we have got four outcomes. 105 is the value where the model has predicted no and in reality it was also a no, so that is predicted no and actual no; similarly we have 63 as predicted yes, where the model predicted yes and it actually was yes. So in order to calculate the accuracy, you just need to add these two values and divide the whole by the sum of all four, since these two values tell us where the model has predicted the correct output. This value is also called a true negative, this one is called a false positive, this one is called a true positive and this one is called a false negative.

Now in order tocalculate the accuracy

You don't haveto do it manually

So in Python, you can just importaccuracy score function and you can getthe results from that

So I'll just do that as well

So I'll say from sklearn.metrics import accuracy_score, and I'll simply print the accuracy, passing the same variables, that is y test and predictions. So over here it tells me the accuracy is about 0.78, which is quite good. If you want to do it manually, you have to add those two numbers, which are 105 and 63; this comes out to 168, and then you have to divide by the sum of all four numbers, so 105 plus 63 plus 21 plus 25, which gives a result of 214. Now if you divide these two numbers you'll get the same accuracy, that is about 0.78, or you can say 78%.

So that is how youcan calculate the accuracy
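And here is a minimal sketch of the evaluation step, including the manual accuracy check worked out above.

```python
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(classification_report(y_test, predictions))

cm = confusion_matrix(y_test, predictions)
print(cm)   # [[TN, FP], [FN, TP]] -- e.g. 105, 21, 25, 63 in the session

# Accuracy = (TN + TP) / total, which accuracy_score computes for us
print(accuracy_score(y_test, predictions))       # about 0.78 here
print((105 + 63) / (105 + 21 + 25 + 63))         # manual check: 168 / 214 ~ 0.785
```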

So now let me just go backto my presentation and let's see what all we havecovered till now

So here we have first split the data into train and test subsets, then we have built a model on the train data and predicted the output on the test data set, and then my fifth step is to check the accuracy. Here we have calculated an accuracy of almost seventy-eight percent, which is quite good; you cannot say that accuracy is bad. The accuracy score tells me how accurate the results are, and here we have got a good accuracy.

So now moving ahead

Let us see the second projectthat is SUV data analysis

So in this a car company hasreleased new SUV in the market and using the previous dataabout the sales of their SUV

They want to predictthe category of people who might be interestedin buying this

So using thelogistic regression, you need to find what factorsmade people more interested in buying this SUV

So for this, let us see the data set, where I have a user ID, I have gender as male and female, and then we have the age.

We have the estimated salary and then we havethe purchased column

So this is my discreet column or you can seethe categorical column

So here we just have the value that is 0 and 1 and this columnwe need to predict whether a person can actuallypurchase a SUV or Not

So based on these factors,we will be deciding whether a person canactually purchase SUV or not

So we know the salary of a person, we know the age, and using these we can predict whether a person can actually purchase an SUV or not. Let me just go to my Jupyter Notebook and implement logistic regression. So guys, I will not be going through all the details of the data cleaning and analyzing part this time.

I'll just leave it on you

So just go aheadand practice as much as you can

Alright, so the second projectis SUV predictions

Alright, so first of all I have to import all the libraries, so I say import numpy as np, and similarly I'll do the rest of it. Alright, so now let me just print the head of this data set. So as we have already seen, we have columns as user ID,

We have gender

We have the age

We have the salaryand then we have to calculate whether person can actuallypurchase a SUV or not

So now let us just simply go onto the algorithm part

So we'll directly start off with logistic regression and how you can train a model. To do all those things, we first need to define an independent variable and a dependent variable. In this case I want my X, that is my independent variable, to be dataset.iloc; here I specify a colon, meaning all the rows, and in the columns I want only two and three, with .values. This fetches all the rows and only the second and third columns, which are age and estimated salary. These are the factors which will be used to predict the dependent variable, that is Purchased; so my dependent variable is Purchased and my independent variables are age and salary. For Y I'll say dataset.iloc, take all the rows and just the one column at index four, that is my Purchased column, with .values. All right, I just forgot one square bracket over here. Alright, so over here

I have defined my independentvariable and dependent variable

So here my independent variableis age and salary and dependent variableis the column purchase

Now you must be wondering, what is this iloc function? The iloc function is basically an indexer of a pandas DataFrame and is used for integer-based indexing, or you can also say selection by position. Now let me just print these independent variables and the dependent variable. So if I print the independent variables, I have age as well as salary. Next,

Let me print the dependentvariable as well

So over here you can see Ijust have the values in 0 and 1 so 0 standsfor did not purchase next

Let me just divide my data setinto training and test subset

So I'll simply write in from sklearn.cross_validation import train_test_split; next I just press Shift and Tab, and over here I'll go to the examples and copy the same line, then move it into place. Now I want the test size to be, let's say, 0.25, so I have divided the train and test in a 75/25 ratio. Then let's say I'll take a random state of 0; the random state basically ensures the same result, or you can say the same samples are taken, whenever you run the code.

So let me just run this now

You can also scale your input values for better performance, and this can be done using StandardScaler. So let me do that as well: I'll say from sklearn.preprocessing import StandardScaler.

Why do we scale it now? If you see a data set weare dealing with large numbers

Well, although we are using a very small data set here,

whenever you're working in a production environment, you'll be working with large data sets, with thousands and hundreds of thousands of records, so scaling them down will definitely affect the performance to a large extent.

So here let me just show you how you can scale down these input values. The preprocessing module contains all the methods and functionality which are required to transform your data.

So now let us scale down the test as well as the training data set.

So first let's make an instance of it.

So I'll say StandardScaler, then I'll have X_train as sc.fit_transform,

and I'll pass in my X_train variable.

And similarly I can do it for the test, wherein I'll pass the X_test.
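A sketch of that scaling step; the usual convention is to fit the scaler on the training data and reuse it on the test data:

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)   # learn the scaling from the training data
X_test = sc.transform(X_test)         # apply the same scaling to the test data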

All right

Now my next step is to import logistic regression.

So I'll simply apply logistic regression by first importing it, so I'll say from sklearn.linear_model import LogisticRegression. Over here,

I'll be using classifier

So I'll say classifier equals LogisticRegression, so over here I just make an instance of it.

So I'll say LogisticRegression, and over here

I just pass in the random state, which is 0. Now, I simply fit the model,

and I simply pass in X_train and y_train.

So here it tells me all the details of the logistic regression.

Then I have to predict the values.

So I'll say y_pred equals classifier

dot predict, and then I just pass in X_test.
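Put together, the training and prediction step might look like this minimal sketch:

from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)       # train on the scaled training data
y_pred = classifier.predict(X_test)    # predict Purchased (0 or 1) for the test set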

So now we have created the model.

We have scaled down our input values.

Then we have applied logistic regression.

We have predicted the values, and now we want to know the accuracy.

So for the accuracy, first we need to import accuracy_score.

So I'll say from sklearn.metrics import accuracy_score, and using this function we can calculate the accuracy, or you can do it manually by creating a confusion matrix.

So I'll just pass in

my y_test and my y_pred. All right.

So over here I get the accuracy as 0.89. Now, we want to know the accuracy in percentage,

so I just have to multiply it by a hundred, and if I run this it gives me 89%. So I hope you guys are clear with whatever I have taught you today.
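That accuracy check, as a short sketch:

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print(accuracy * 100)   # roughly 89 on this split, as quoted above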

So here I have taken my independent variables as age and salary, then we have predicted how many people can purchase the SUV, and then we have evaluated our model by checking the accuracy, so over here we get the accuracy as 89%, which is great.

Alright guys, that is it for today.

So let me discuss what we have covered in today's training.

First of all, we had a quick introduction to what regression is and where regression is actually used, then we understood the types of regression and got into the details of the what and why of logistic regression, and compared linear versus logistic regression.

We've also seen the various use cases where you can implement logistic regression in real life, and then we have picked up two projects, that is Titanic data analysis and SUV prediction. So over here we have seen how you can collect your data, analyze your data, then perform

modeling on that data, train the data, test the data and then finally calculate the accuracy.

So in your SUV prediction, you can actually analyze and clean your data and you can do a lot of things, so you can just go ahead, pick up any data set and explore it as much as you can.

What is classification?

I hope every one of you must have used Gmail.

So how do you think the mail is getting classified as a spam or not-spam mail? Well, that's classification. So what is it? Well, classification is the process of dividing the data set into different categories or groups by adding labels.

In another way, you can say that it is a technique of categorizing the observations into different categories.

So basically what you are doing is you are taking the data, analyzing it, and on the basis of some condition you finally divide it into various categories.

Now, why do we classify it? Well, we classify it to perform predictive analysis on it, like when you get a mail the machine predicts it to be a spam or not-spam mail, and on the basis of that prediction it adds the irrelevant or spam mail to the respective folder. In general, this classification

algorithm handles questions

like, does this data belong to category A or category B? Like, is this a male or is this a female, something like that? Are you getting it? Okay, fine.

Now the question arises, where will you use it? Well, you can use this for fraud detection, to check whether a transaction is genuine or not. Suppose

I am using a credit card

here in India. Now due to some reason I had to fly to Dubai.

Now if I'm using the credit card over there, I will get a notification alert regarding my transaction.

They would ask me to confirm the transaction.

So this is also a kind of predictive analysis, as the machine predicts that something fishy is in the transaction, because just a few hours ago

I made the transaction using the same credit card in India and 24 hours later

the same credit card is being used for a payment in Dubai.

So the machine detects that something fishy is going on in the transaction.

So in order to confirm it, it sends you a notification alert.

All right

Well, this is one of the use cases of classification. You can even use it to classify different items like fruits, on the basis of taste, color, size or weight. A machine well trained using a classification algorithm can easily predict the class or the type of fruit whenever new data is given to it.

Not just the fruit

It can be any item

It can be a car

It can be a house

It can be a signboard

Or anything

Have you noticed that while you visit some sites, or you try to log in to some of them, you get a picture captcha, right, where you have to identify whether a given image is of a car or of a pole? You have to select it; for example, there are 10 images and you're selecting three images out of it.

So in a way you are training the machine, right? You're telling it that these three are pictures of a car and the rest are not, so who knows, you are training it for something big, right? So moving on ahead,

let's discuss the types of classification algorithms.

Well, there are several different ways to perform the same task, like in order to predict whether a given person is a male or a female, the machine has to be trained first.

All right, but there are multiple ways to train the machine and you can choose any one of them for predictive analytics.

There are many different techniques, but the most common of them all is the decision tree, which we'll cover in depth in today's session.

So as a part of classification algorithms,

we have decision tree, random forest, naive Bayes,

k-nearest neighbor, logistic regression, linear regression, support vector machines and so on, there are many.

Alright, so let me give you an idea about a few of them, starting with decision tree.

Well, a decision tree is a graphical representation of all the possible solutions to a decision, and the decisions which are made can be explained very easily.

For example, here is a task which says: should I go to a restaurant or should I buy a hamburger? You are confused about that.

So what you will do, you will create a decision tree for it. Starting with the root node, first of all you will check whether you are hungry or not.

All right, if you're not hungry, then just go back to sleep.

Right? If you are hungry and you have $25, then you will decide to go to a restaurant, and if you're hungry and you don't have $25, then you will just go and buy a hamburger.

That's it

All right

So that's about decision tree. Now moving on ahead,

Let's see

what is a random forest?

Well, random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.

All right, most of the time random forest is trained with a bagging method.

The bagging method is based on the idea that a combination of learning models increases the overall result.

If you are combining the learnings from different models and then clubbing them together, it will increase the overall result. Fine,

just one more thing.

If the size of your data set is huge,

then in that case one single decision tree would lead to an overfit model, the same way a single person might have his own perspective on the complete population, as the population is very huge.

However, if we implement a voting system and ask different individuals to interpret the data, then we would be able to cover the patterns in a much more meticulous way. Even from the diagram

you can see that in section A we have a large training data set. What we do,

we first divide our training data set into n sub-samples and we create a decision tree for each sub-sample.

Now in the B part, what we do, we take the vote out of every decision made by every decision tree.

And finally we club the votes to get the random forest decision. Fine,

Let's move on ahead

Next

We have naive Bayes.

So naive Bayes is a classification technique which is based on Bayes' theorem.

It assumes that the presence of any particular feature in a class is completely unrelated to the presence of any other feature. Naive Bayes is a simple and easy-to-implement algorithm, and due to its simplicity this algorithm might outperform more complex models when the size of the data set is not large enough.

All right, a classical use case of naive Bayes is document classification.

In that, what you do is you determine whether a given text corresponds to one or more categories. In the text case, the features used might be the presence or absence

of any keyword.

So this was about naive Bayes. From the diagram

you can see that using naive Bayes

we have to decide whether we have a disease or not.

First what we do, we check the probability of having a disease and not having the disease, right? The probability of having a disease is 0.1, while on the other hand the probability of not having a disease is 0.9.

Okay, first let's see when we have the disease and we go to the doctor.

All right, so when we visit the doctor and the test is positive, the probability of a positive test when you're having the disease is 0.80, and the probability of a negative test when you already have the disease is 0.20.

This is a false negative case, as the test is detecting negative but you still have the disease, right? So it's a false negative statement.

Now, let's move ahead to when you don't have the disease at all.

So the probability of not having the disease is 0.9.

And when you visit the doctor and the doctor is like, yes, you have the disease,

but you already know that you don't have the disease,

so it's a false positive statement.

So the probability of the test showing a disease when you actually know there is no disease is 0.1, and the probability of a negative test when you actually know there is no disease is around 0.90. Fine,

this is the case where the test shows no disease and you truly have no disease, a true negative statement.

So it is 0.9.

All right
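Just to connect those numbers back to Bayes' theorem (this particular calculation isn't worked out in the session, so treat it as an illustrative extension using the probabilities quoted above), the chance of actually having the disease given a positive test comes out like this:

# Probabilities quoted in the example above.
p_disease = 0.1
p_no_disease = 0.9
p_pos_given_disease = 0.8       # true positive rate
p_pos_given_no_disease = 0.1    # false positive rate

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_pos = p_pos_given_disease * p_disease + p_pos_given_no_disease * p_no_disease
print((p_pos_given_disease * p_disease) / p_pos)   # about 0.47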

So let's move on ahead and discuss the KNN algorithm.

So this KNN algorithm, or the k-nearest neighbor, stores all the available cases and classifies new cases based on a similarity measure. The K in the KNN algorithm is the number of nearest neighbors we wish to take a vote from. For example, if k equals 1, then the object is simply assigned to the class of that single nearest neighbor. From the diagram

you can see the difference in the result when k equals 1, k equals 3 and k equals 5, right? Well, systems are now able to use the k-nearest neighbor for visual pattern recognition, to scan and detect hidden packages in the bottom bin of a shopping cart at the checkout. If an object is detected which matches exactly an object listed in the database,

then the price of the spotted product could even automatically be added to the customer's bill. While this automated billing practice is not used extensively at this time, the technology has been developed and is available for use if you want to use it. And yeah, one more thing, k-nearest neighbor is also used in retail to detect patterns in credit card usage. Many new transaction-scrutinizing software applications use KNN algorithms to analyze register data and spot unusual patterns that indicate suspicious activity.
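If you want to try KNN yourself, scikit-learn has a ready-made classifier; here is a minimal sketch that reuses the scaled SUV split from earlier in the session as a stand-in, since no KNN data set is provided here:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)   # K, the number of neighbours that vote
knn.fit(X_train, y_train)                   # assumes the X_train/y_train split from the SUV example
print(knn.predict(X_test)[:10])             # each new case gets the majority class of its neighbours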

For example, if register data indicates that a lot of customer information is being entered manually rather than through automated scanning and swiping, then in that case

this could indicate that the employees who are using the register are

in fact stealing customers' personal information, or if register data indicates that a particular good is being returned or exchanged multiple times,

this could indicate that employees are misusing the return policy or trying to make money from doing fake returns, right? So this was about the KNN algorithm.

So starting with what is a decision tree, but first let me tell you why we chose decision tree to start with. Well, these decision trees are really very easy to read and understand. It belongs to one of the few models that are interpretable, where you can understand exactly why the classifier has made that particular decision, right? Let me tell you a fact: for a given data set

you cannot say that this algorithm performs better than that.

It's like you cannot say that decision tree is better than naive Bayes or that naive Bayes is performing better than decision tree.

It depends on the data set, right? You have to apply a hit-and-trial method with all the algorithms one by one and then compare the results; the model which gives the best result is the one which you can use for better accuracy on your data set.

All right, so let's start with what is a decision tree.

Well, a decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions.

Now, you might be wondering why this thing is called a decision tree.

Well, it is called so because it starts with a root and then branches off to a number of solutions, just like a tree, right? Even a tree starts from a root and it starts growing its branches

once it gets bigger and bigger. Similarly, in a decision tree,

it has a root which keeps on growing with an increasing number of decisions and conditions. Now, let me tell you a real-life scenario.

I won't say that all of you, but most of you must have used it.

Remember, whenever you dial the toll-free number of your credit card company, it redirects you to an intelligent computerised assistant where it asks you questions like, press 1 for English or press 2 for Hindi, press 3 for this, press 4 for that, right? Now once you select one, again it redirects you to a certain set of questions, like press 1 for this, press 1 for that, and so on, right? So this keeps on repeating until you finally get to the right person, right? You might think that you are caught in a voicemail hell, but what the company was actually doing, it was just using a decision tree to get you to the right person.

Alright,

I'd like you to focus on this particular image for a moment. On this particular slide

you can see an image where the task is:

should I accept a new job offer or not? Alright, so you have to decide that. For that, what you did, you created a decision tree starting with the base condition or the root node,

which was that the basic salary or the minimum salary should be $50,000. If it is not $50,000,

then you are not at all accepting the offer.

All right.

So if your salary is greater than $50,000, then you will further check whether the commute is more than one hour or not.

If it is more than one hour, you will just decline the offer; if it is less than one hour, then you are getting closer to accepting the job offer. Then further, what you will do,

you will check whether the company is offering

free coffee or not, right? If the company is not offering the free coffee, then you will just decline the offer, and if it is offering the free coffee,

then yeah, you will happily accept the offer, right? This is just an example of a decision tree.

Now, let's move ahead and understand a decision tree.

Well, here is a sample data set that I will be using to explain the decision tree to you.

All right, in this data set each row is an example.

And the first two columns provide features or attributes that describe the data, and the last column gives the label or the class we want to predict. If you like, you can just modify this data by adding additional features and more examples, and our program will work in exactly the same way. Fine,

now this data set is pretty straightforward except for one thing.

I hope you have noticed that it is not perfectly separable.

Let me tell you something more about that: the second and fifth examples have the same features

but different labels. Both have yellow as the colour and diameter as three, but the labels are mango and lemon, right? Let's move on and see how our decision tree handles this case.

All right, in order to build a tree we'll use a decision tree algorithm called CART. This CART algorithm stands for Classification And Regression Tree algorithm. Alright,

let's see a preview of how it works.

All right, to begin with we'll add a root node for the tree. All the nodes receive a list of rows as input, and the root will receive the entire training data set. Now each node will ask a true or false question about one of the features,

and in response to that question we'll split or partition the data set into two different subsets. These subsets then become input to the child nodes

we add to the tree, and the goal of the question is to finally unmix the labels as we proceed down, or in other words, to produce the purest possible distribution of the labels at each node.

For example, the input of this node contains only

one single type of label, so we could say that it's perfectly unmixed.

There is no uncertainty about the type of label, as it consists of only grapes. Right, on the other hand, the labels in this node are still mixed up,

so we would ask another question to further drill it down. Right, but before that we need to understand which question to ask and when, and to do that we need to quantify by how much a question helps to unmix the labels. We can quantify the amount of uncertainty at a single node using a metric called Gini impurity, and we can quantify how much a question reduces that uncertainty using a concept called information gain. We'll use these to select the best question to ask at each point.

And then what we'll do, we'll iterate the steps: we'll recursively build the tree on each of the new nodes and continue dividing the data until there are no further questions to ask, and finally we reach a leaf.

Alright, alright,so this was about decision tree

So in order to create a decision tree, first of all what you have to do, you have to identify the different set of questions that you can ask of a tree, like: is this color green? And what will these questions be? These questions will be decided by your data set, like is this color green, is the diameter greater than or equal to 3, is the color yellow? Right, the questions resemble your data set, remember that. All right.

So if my color is green, then what it will do, it will divide the data into two parts: first,

the green mango will be on the true side, while on the false side

we have the lemon and the grape. All right.

And similarly for the other questions: if the diameter is greater than or equal to 3, or if the color is yellow.

Now let's move on and understand the decision tree terminologies.

Alright, so starting with the root node: the root node is the base node of a tree, the entire tree starts from the root node.

In other words,

it is the first node of the tree. It represents the entire population or sample, and this entire population is further segregated or divided into two or more homogeneous sets.

Fine

Next is the leaf node.

Well, a leaf node is the one you reach at the end of the tree, right, that is, you cannot segregate it further down to any other level.

That is the leaf node.

Next is splitting. Splitting is dividing your root node or a node into different sub-parts on the basis of some condition.

All right, then comes the branch or the sub-tree.

Well, this branch or sub-tree gets formed when you split the tree. Suppose when you split a root node, it gets divided into two branches or two sub-trees. Right, next is

the concept of pruning.

Well, you can say that pruning is just the opposite of splitting. What we are doing here,

we are just removing the sub-nodes of a decision tree. We'll see more about pruning later in this session.

All right, let's move on ahead

Next is the parent or child node.

Well, first of all, the root node is always a parent node and all other nodes associated with it are known as child nodes.

Well, you can understand it in a way that the top node is a parent node and all the bottom nodes which are derived from that top node are child nodes. A node derived from another node is a child node, and the node which is producing

it is a parent node. Simple concept, right? Let's use the CART algorithm and design a tree manually.

So first of all, what you do, you decide which question to ask and when. So how will you do that? Let's first of all visualize the decision tree.

So this is the decision tree which we'll be creating manually. First of all, let's have a look at the data set: you have Outlook, temperature, humidity and windy. You have different attributes, and on the basis of those you have to predict whether you can play or not.

So which one among them should you pick first? Answer: determine the best attribute that classifies the training data. All right.

So how will you choose the best attribute, or how does a tree decide where to split, or how will the tree decide its root node? Well, before we move on and split the tree, there are some terminologies that you should know.

All right, the first being the Gini index.

So what is this Gini index? This Gini index is the measure of impurity or purity used in building a decision tree in the CART algorithm.

All right

Next is information gain. This information gain is the decrease in entropy after a data set is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain.

All right, so you will be selecting the node that would give you the highest information gain.

Alright, next is reduction in variance.

Reduction in variance is an algorithm which is used for continuous target variables or regression problems.

The split with lower variance is selected as the criterion to split the population. In general terms,

what do you mean by variance? Variance is how much your data is varying, right? So if your data is less impure, or is more pure, then in that case the variation would be less, as all the data is almost similar, right? So this is also a way of splitting a tree: the split with lower variance is selected as the criterion to split the population.

All right

Next is the chi-square. Chi-square

is an algorithm which is used to find out the statistical significance of the differences between sub-nodes and the parent node. Fine,

let's move ahead. Now the main question is, how will you decide the best attribute? For now, just understand that you need to calculate something known as information gain. The attribute with the highest information gain is considered the best.

Yeah

I know your next question might be, what is this information gain? But before we move on and see what exactly information gain is, let me first introduce you to a term called entropy, because this term will be used in calculating the information gain.

Well, entropy is just a metric which measures the impurity of something, or in other words,

you can say it is the first thing to calculate before you solve the problem of a decision tree. As I mentioned, it is something about impurity.

So let's move on and understand what impurity is. Suppose

you have a basket full of apples and another bowl which is full of the same label, which says apple. Now if you are asked to pick one item from the basket and one label from the bowl, then the probability of getting an apple and its correct label is 1, so in this case you can say that the impurity is zero.

All right.

Now what if there are four different fruits in the basket and four different labels in the bowl? Then the probability of matching the fruit to a label is obviously not one,

it's something less than that.

Well, it could be possible that I picked a banana from the basket, and when I randomly picked a label from the bowl,

it says cherry. Any random permutation and combination is possible.

So in this case, I'd say that the impurity is non-zero.

I hope the concept of impurity is clear.

So coming back to entropy: as I said, entropy is the measure of impurity. From the graph on your left

you can see that when the probability is zero or one, that is, when the data is all of one class or all of the other and hence completely pure, the value of entropy is zero, and when the probability is 0.5 the value of entropy

is maximum.

Well, what is impurity? Impurity is the degree of randomness, how random the data is. So if the data is completely pure, in that case the randomness equals zero, and if the data is completely homogeneous, even in that case the value of impurity will be zero. A question

like, why is it that the value of entropy is maximum at 0.5, might arise in your mind, right? So let me discuss that.

Let me derive it mathematically. As you can see here on the slide, the mathematical formula of entropy is minus the probability of yes times the log of the probability of yes, minus the probability of no times the log of the probability of no. Let's move on and see what this graph has to say mathematically. Suppose S is our total sample space and it's divided into two parts,

yes and no, like in our data set the result for playing was divided into two parts,

yes or no, which we have to predict: either we play or not.

Right? So for that particular case, you can define the formula of entropy as: entropy of the total sample space S equals negative of the probability of yes multiplied by the log of the probability of yes

with base 2, minus the probability of no multiplied by the log of the probability of no with base 2, where S is your total sample space, P(yes) is the probability of yes and P(no) is the probability of no.

Well, if the number of yes equals the number of no, that is, the probability of yes equals 0.5, since you have an equal number of yes and no, then in that case the value of entropy will be one; just put the values in there.

All right

Let me just move to the next slide and I'll show you this.

Alright, next: if it contains all yes or all no, that is, the probability in the sample space is either 1 or 0, then in that case the entropy will be equal to 0. Let's see them mathematically one by one.

So let's start with the first condition, where the probability was 0.5.

So this is our formula for entropy, right? So here's our first case, which we'll discuss, where the probability of yes equals the probability of no, that is, in our data set we have an equal number of yes and no.

All right.

So the probability of yes equals the probability of no, and that equals 0.5, or in other words, you can say that yes plus no equals the total sample space.

All right, since the probability is 0.5,

when you put the values in the formula you get something like this, and when you calculate it, you will get the entropy of the total sample space as one.

All right

Let's see the next case.

What is the next case? Either you have all yes or you have all no. So if you have all yes, let's see the formula when we have all

yes.

So you have all yes and 0 no. Fine,

so the probability of yes equals one,

and yes is the total sample space, obviously.

So in the formula, when you put that in, you get entropy of the sample space equal to negative of 1 multiplied by log of 1, and as the value of log 1 equals 0,

the total thing will result in 0. Similar is the case with no; even in that case you will get the entropy of the total sample

space as 0. So this was all about entropy.

All right
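The same two cases can be checked with a small helper, a minimal sketch of the entropy formula above:

from math import log2

def entropy(p_yes, p_no):
    # Entropy = -P(yes)*log2(P(yes)) - P(no)*log2(P(no)); a zero probability contributes nothing.
    return sum(-p * log2(p) for p in (p_yes, p_no) if p > 0)

print(entropy(0.5, 0.5))   # 1.0, the maximum
print(entropy(1.0, 0.0))   # 0.0, all yes (or all no)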

Next is, what is information gain? Well, information gain, what it does is it measures the reduction in entropy.

It decides which attribute should be selected as the decision node.

If S is our total collection, then information gain equals the entropy, which we calculated just now, minus the weighted average multiplied by the entropy of each feature.

Don't worry,

we'll just see how to calculate it with an example.

All right

So let's manually build a decision tree for our data set.

So here's our data set, which consists of 14 different instances, out of which we have nine

yes and five no. Alright, so we have the formula for entropy; just put the values in. Since there are 9 yes,

the total probability of yes equals 9 by 14 and the total probability of no equals 5 by 14, and when you put in the values and calculate the result, you will get the value

of entropy as 0.94.

All right

So this was your first step, that is, compute the entropy for the entire data set.

All right
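That first step can be reproduced directly from the formula, as a quick sketch:

from math import log2

# 9 yes and 5 no out of 14 instances.
e_total = -(9/14) * log2(9/14) - (5/14) * log2(5/14)
print(round(e_total, 2))   # 0.94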

Now you have to select, out of Outlook, temperature, humidity and windy, which of the nodes you should select as the root node. Big question, right? How will you decide that this particular node should be chosen as the base node, on the basis of which I will be creating the entire tree?

I will select that.

Let's see. So you have to do it one by one, you have to calculate the entropy and information gain for all of the different nodes, so starting with Outlook.

So Outlook has three different parameters: sunny, overcast and rainy.

So first of all, see how many yes and no there are in the case of sunny, like when it is sunny how many yes and how many no are there? So in total we have two yes and three no in the case of sunny. In the case of overcast,

we have all yes.

So if it is overcast, then we'll surely go to play.

It's like that.

Alright, and next, if it is rainy, then the total number of yes equals

three and the total number of no equals 2. Fine, next what we do, we calculate the entropy for each feature. Here

we are calculating the entropy when Outlook equals sunny.

First of all, we are assuming that Outlook is our root node, and for that we are calculating the information gain for it.

Alright.

So in order to calculate the information gain, remember the formula: it was the entropy of the total sample space minus the weighted average multiplied by the entropy of each feature.

All right

So what we are doing here, we are calculating the entropy of Outlook

when it is sunny.

So the total number of yes when it was sunny was two and the total number of no was three. Fine,

so let's put that in the formula: the probability of yes is 2 by 5 and the probability of no is 3 by 5.

So you will get something like this.

Alright, so you are getting the entropy of sunny as 0.971. Fine,

next we will calculate the entropy for overcast. When it was overcast,

remember, it was all yes, right?

So the probability of yes equals 1, and when you put that in you will get the value of entropy as 0. Fine, and when it is rainy, rainy has 3 yes and 2 no,

so the probability of yes in the case of rainy is 3 by 5 and the probability of no in the case of rainy is 2 by 5.

And when you add the values of the probability of yes and the probability of no to the formula, you get the entropy of rainy as 0.971.

Now, you have to calculate how much information you are getting from Outlook; that uses the weighted average.

All right.

So what is this weighted average? It is based on the total number of yes and the total number of no. Fine,

so the information from Outlook starts with 5 by 14. Where does this 5 come from? We are counting the total sample space within that particular Outlook value when it was sunny, right? So in the case of sunny there were two yes and three no.

All right.

So the weighted average for sunny would be equal to 5 by 14,

all right, since the formula was 5 by 14 multiplied by the entropy of each feature.

All right, so as calculated, the entropy for sunny is 0.971,

right? So what we'll do, we'll multiply 5 by 14 with 0.971.

Right? Well, this was the calculation of the information when Outlook equals sunny, but Outlook also equals overcast and rainy, so in that case,

what we'll do, again similarly we'll calculate everything for overcast and rainy. For overcast, the weighted average is 4 by 14 multiplied by its entropy,

that is 0, and for rainy it is the same 5 by 14, with 3

yes and 2 no, multiplied by its entropy, that is 0.971.

And finally we'll take the sum of all of them, which equals 0.693. Right, next

we will calculate the information gain. What we did earlier was the information taken from Outlook;

now we are calculating

what information we are gaining from Outlook, right?

Now this information gain equals the total entropy minus the information that is taken from Outlook.

All right, so the total entropy we had was 0.94,

minus the information we took from Outlook, which was 0.693.

So the value of the information gained from Outlook comes to 0.247.

All right
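The whole Outlook calculation can be reproduced with a few lines, a sketch using the counts quoted above (2 yes/3 no for sunny, 4 yes/0 no for overcast, 3 yes/2 no for rainy):

from math import log2

def entropy(p_yes, p_no):
    return sum(-p * log2(p) for p in (p_yes, p_no) if p > 0)

e_total = entropy(9/14, 5/14)                 # 0.94 for the whole data set
info_outlook = (5/14) * entropy(2/5, 3/5) + (4/14) * entropy(1.0, 0.0) + (5/14) * entropy(3/5, 2/5)
gain_outlook = e_total - info_outlook
print(round(info_outlook, 3), round(gain_outlook, 3))   # about 0.693 and 0.247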

So next, what we have to do:

let's assume that Windy is our root node.

So Windy consists of two parameters, false and true.

Let's see how many yes and how many no there are in the case of true and false.

So when Windy has false as its parameter, then in that case it has six yes and two no,

and when it has true as its parameter, it has 3 yes and 3 no.

All right.

So let's move ahead and similarly calculate the information taken from Windy, and finally calculate the information gained from Windy.

Alright, so first of all, what we'll do, we'll calculate the entropy of each feature, starting with Windy equals true.

So in the case of true we had an equal number of yes and an equal number of no. Well, remember the graph: when we had the probability as 0.5, as the total number of yes equals the total number of no,

for that case the entropy equals 1, so we can directly write that the entropy of true for Windy is one, as we had already proved that when the probability equals 0.5 the entropy is at its maximum, which equals 1.

All right

Next is the entropy of false for Windy.

All right, so similarly just put the probability of yes and no in the formula and then calculate the result. Since you have six yes and two no,

in total you'll get the probability of yes as 6 by 8 and the probability of no as 2 by 8.

All right, so when you calculate it, you will get the entropy of false as 0.811.

Alright, now let's calculate the information from Windy.

So the total information collected from Windy equals the information taken when Windy equals true plus the information taken when Windy equals false.

So we'll calculate the weighted average for each one of them and then we'll sum it up to finally get the total information taken from Windy.

So in this case, it equals 8 by 14 multiplied by 0.811, plus 6 by 14 multiplied by 1. What is this 8? It is the total number of yes and no in the case when Windy equals false, right? So when it was false, the total number of yes equals 6 and the total number of no equals 2, and that sums up to 8.

All right

So that is why the weighted average results in 8 by 14. Similarly, the information taken when Windy equals true: 3 plus 3, that is 3 yes and 3 no, equals 6, divided by the total sample space,

that is 14, multiplied by the entropy of true.

All right, so it is 8 by 14 multiplied by 0.811, plus 6 by 14 multiplied by one, which results in 0.892. This is the information taken from Windy.

All right

Now, how much information are you gaining from Windy?

So for that, what you will do: the total information gained from Windy equals the total entropy minus the information taken from Windy.

All right, that is 0.94 minus 0.892, which equals 0.048.

And so 0.048 is the information gained from Windy.

All right

Similarly, we calculate it for the rest too. All right,

so for Outlook, as you can see, the information was 0.693

and its information gain was 0.247. In the case of temperature,

the information was around 0.911 and the information gain was equal to 0.029. In the case of humidity,

the information gain was 0.152, and in the case of windy,

the information gain was 0.048.

So what we'll do, we'll select the attribute

with the maximum information gain. Fine,

now we have selected Outlook as our root node, and it is further subdivided into three different parts: sunny, overcast and rainy. So in the case of overcast we have seen that it consists of all

yes, so we can consider it as a leaf node, but in the case of sunny and rainy, it's doubtful, as it consists of both

yes and no, so you need to recalculate the things, right? Again for this node

you have to recalculate the things.

All right, you have to again select the attribute

which is having the maximum information gain.

All right, so this is how your complete tree will look.

All right.

So let's see when you can play: you can play when the Outlook is overcast;

all right, in that case

you can always play. If the Outlook is sunny,

you will further drill down to check the humidity condition.

All right, if the humidity is normal, then you will play; if the humidity is high, then you won't play. Right, when the Outlook predicts that it's rainy, then further you will check whether it's windy or not.

If it is a weak wind, then you will go and play,

but if there is a strong wind, then you won't play, right? So this is how your entire decision tree would look at the end.

Now comes the concept of pruning. Say you ask, what should I do to play? Well, you have to do pruning. Pruning will decide how you will play.

What is this pruning? Well, this pruning is nothing but cutting down the nodes in order to get the optimal solution.

All right.

So what pruning does, it reduces the complexity. All right, as you can see on the screen, it is showing only the results relevant for you,

that is, it is showing all the results which say that you can play.

All right, before we drill down to a practical session, a common question might come to your mind.

You might think, are tree-based models better than linear models, right? You can think, if I can use logistic regression for a classification problem and linear regression for a regression problem,

then why is there a need to use a tree?

Well, many of us have this question in their mind, and it's a valid question too.

Well, actually, as I said earlier, you can use any algorithm.

It depends on the type of problem

you're solving. Let's look at some key factors which will help you decide which algorithm to use and when. So the first point being: if the relationship between the dependent and independent variables is well approximated by a linear model, then linear regression will outperform a tree-based model. Second case: if there is high non-linearity and a complex relationship between the dependent and independent variables, a tree model will outperform a classical regression model. Third case:

if you need to build a model which is easy to explain to people, a decision tree model will always do better than a linear model, as decision tree models are simpler to interpret than linear regression.

All right

Now, let's move on ahead and see how you can write a decision tree classifier from scratch in Python using the CART algorithm. For this

I will be using a Jupyter Notebook with Python 3

installed on it.

Alright, so let's open Anaconda and the Jupyter Notebook.

Where is that? So this is the Anaconda Navigator and I will directly jump over to Jupyter Notebook and hit the launch button.

I guess everyone knows that Jupyter

Notebook is a web-based interactive computing notebook environment where you can run your Python code.

So my Jupyter Notebook

opens on my localhost 8891, so I will be using this Jupyter Notebook in order to write my decision tree classifier using Python. For this decision tree classifier

I have already written a

set of code.

Let me explain it to you one by one.

So we'll start with initializing our training data set.

So here's our sample data set, for which each row is an example.

The last column is the label and the first two columns are the features.

If you want, you can add some more features and examples for your practice. An interesting fact is that this data set is designed in a way that the second and fifth examples have almost the same features, but they have different labels.

All right.

So let's move on and see how the tree handles this case. As you can see here, both

of them, the second and the fifth rows, have the same features;

what is different is just their label, right? So let's move ahead.

So this is our training data set. Next, what we are doing, we are adding some column labels,

so they are used only to print the tree. Fine,

so what we'll do, we'll add headers to the columns: the first column is color, the second is diameter and the third is the label column.

Alright, next what we'll do, we'll define a function called unique_vals, to which we'll pass the rows and a column.

So this function, what it will do,

it will find the unique values for a column in the data set.

So this is an example of that.

So what we are doing here, we are passing the training data as the rows and the column number as 0, so we are finding the unique values in terms of color.

And in this one, since the rows are the training data and the column is 1, what we are doing here is finding the unique values in terms of diameter. Fine,

so this is just an example. Next, what we'll do, we'll define a function called class_counts and we'll pass rows into it.

So what it does, it counts the number of each type of example within the data set.

So in this function, what we are basically doing, we are counting the number of each type of example in the data set, or

we are counting the unique values for the label in the data set. As a sample,

you can see here

we can pass the entire training data set to this particular function class_counts. What it will do, it will find all the different types of labels within the training data set; as you can see here, the unique labels consist of mango, grape and lemon. So next, what we'll do, we'll define a function is_numeric and we'll pass a value into it.

So what it will do,

it will just test if the value is numeric or not, and it will return true if the value is an integer or a float.

For example, you can see is_numeric:

we are passing 7, so it is an integer, so it will return true, and if we are passing Red, it's not a numeric value, right? So moving on ahead, we define a class named Question.

So what does this Question do? This Question is used to partition the data set.

What this class does is it just records a column number, for example 0 for color, and a column value, for example green. Next, what we are doing, we are defining a match method which is used to compare the feature value in an example

to the feature value stored in the question.

Let's see how. First of all, what we are doing,

we're defining an init function, and inside that we are passing self, the column and the value as parameters.

So next, what we do, we define a function called match. What it does is it compares the feature value in an example to the feature value in this question. Then next we'll define a function __repr__, which is just a helper method to print the question in a readable format.

Next, what we are doing, we are defining a function partition.

Well, this function is used to partition the data set. For each row in the data set, it checks if it matches the question or not; if it does, it adds it to the true rows, and if not, then it adds it to the false rows.

All right, for example, as you can see, it partitions the training data set based on whether the rows are red or not. Here

we are calling the function Question and we are passing a value of zero and Red to it.

So what will it do? It will assign all the red rows to true_rows,

and everything else will be assigned to false_rows. Fine.
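As a rough sketch of those two ideas (not the instructor's exact code, which is in the description), a Question records a column and a value, and partition splits the rows on it:

class Question:
    def __init__(self, column, value):
        self.column = column              # e.g. 0 for color
        self.value = value                # e.g. 'Green' or 3

    def match(self, example):
        # Compare the feature value in an example to the value stored in this question.
        val = example[self.column]
        if isinstance(val, (int, float)):
            return val >= self.value      # numeric features use >=
        return val == self.value          # categorical features use equality

def partition(rows, question):
    # Rows that match the question go to true_rows, the rest go to false_rows.
    true_rows, false_rows = [], []
    for row in rows:
        (true_rows if question.match(row) else false_rows).append(row)
    return true_rows, false_rows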

Next, what we'll do, we'll define a gini impurity function, and into that we'll pass a list of rows.

So what it will do, it will just calculate the Gini impurity for that list of rows.

Next, what we are doing, we are defining a function called info_gain.

So what this information gain function does, it calculates the information gain using the uncertainty of the starting node minus the weighted impurity of the child nodes.

The next function is find_best_split.

Well, this function is used to find the best question to ask, by iterating over every feature and value and then calculating the information gain.
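A compact sketch of the Gini impurity and information gain calculations being described (again, the exact code is in the description; this is just the idea):

def class_counts(rows):
    # Count how many examples of each label are in these rows; the label is the last column.
    counts = {}
    for row in rows:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return counts

def gini(rows):
    # 1 minus the sum of squared class probabilities.
    counts = class_counts(rows)
    impurity = 1.0
    for label in counts:
        p = counts[label] / float(len(rows))
        impurity -= p ** 2
    return impurity

def info_gain(left, right, current_uncertainty):
    # Uncertainty of the parent node minus the weighted impurity of the two children.
    p = float(len(left)) / (len(left) + len(right))
    return current_uncertainty - p * gini(left) - (1 - p) * gini(right)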

For the detailed explanation of the code,

you can find the code in the description given below.

All right, next we'll define a class called Leaf for classifying the data.

It holds a dictionary of classes, like mango, with how many times each appears in the rows from the training data that reach this leaf.

Alright, next is the decision node.

So this decision node, it will ask a question.

It holds a reference to the question and to the two child nodes; on the basis of that you are deciding which node to add further to which branch.

Alright, so next

we're defining a function build_tree, and inside that we are passing our rows.

So this is the function that is used to build the tree.

So initially what we did, we defined all the various functions that we'll be using in order to build the tree.

So let's start by partitioning the data set for each unique attribute, then we'll calculate the information gain and then return the question that produces the highest gain, and on the basis of that we'll split the tree.

So what we are doing here, we are partitioning the data set and calculating the information gain,

and then this returns the question that produces the highest gain.

All right.

Now, if gain equals 0, we return a Leaf with the rows. So what will it do?

If we are getting no gain, that is, gain equals 0, then in that case, since no further question could be asked, it will return a leaf. Fine. Now true_rows and false_rows equal partition with the rows and the question,

so if we have reached this position, then you have already found a feature value which will be used to partition the data set. Then what you will do, you will recursively build the true branch and similarly recursively build the false branch.

So we return a Decision_Node, and inside that we'll be passing the question, the true branch and the false branch.

So what it will do, it will return a question node.

This question node records the best feature or value to ask about at this point. Fine.

Now that we have built the tree, next what we'll do, we'll define a print_tree function which will be used to print the tree. Fine,

so finally, what we are doing in this particular function, we are printing our tree. Next is the classify function, which we'll use to decide whether to follow the true branch or the false branch, comparing the feature values stored in the node to the example

we are considering. And lastly, what we'll do, we'll finally print the prediction at the leaf.

So let's execute it and see. Okay, so here's our testing data.

Alright, so we printed a leaf as well.

Now that we have trained our algorithm on our training data set, it's time to test it.

So here's our testing data set.

So let's finally execute it and see what the result is.

So this is the result you will get. So the first question which is asked by the algorithm is: is the diameter greater than or equal to 3? If it is true, then it will further ask if the color is yellow. Again, if it is true, then it will predict mango with one and lemon with one,

and in case it is false, then it will just predict the mango.

Now,

this was the true part. Now, coming to the case where the diameter is not greater than or equal to 3, then in that case it's false,

and what it will do, it will just predict the grape. Fine.

Okay

So this was all about the coding part. Now, let's conclude this session.

But before concluding, let me just show you one more thing.

Now,

there's a scikit-learn algorithm cheat sheet which explains which algorithm you should use and when. All right, it's built in a decision tree format.

Let's see how it works.

So the first condition it will check is whether you have 50 samples or not.

If your samples are greater than 50, then we'll move ahead; if it is less than 50, then you need to collect more data. If your sample is greater than 50, then you have to decide whether you want to predict a category or not.

If you want to predict a category, then further you will see whether you have labeled data or not.

If you have labeled data, then that would be a classification algorithm problem.

If you don't have the labeled data, then it would be a clustering problem.

Now, if you don't want to predict a category, then do you want to predict a quantity?

Well, if you want to predict a quantity, then in that case it would be a regression problem.

If you don't want to predict a quantity and you want to keep looking further, then in that case you should go for dimensionality reduction problems, and still, if you don't want to keep looking and the predicting structure is not working,

then you have tough luck for that.

I hope this session clarifies all your doubts over the decision tree algorithm.

Now, we'll try to find out the answer to this particular question as to why we need random forest. Fine,

so, human beings learn from past experiences,

but unlike human beings, a computer does not have experiences. Then how does a machine take decisions? Where does it learn from? Well, a computer system actually learns from data, which represents some past experiences of an application domain.

So now let's see how random forest helps in building up a learning model, with a very simple use case of credit risk detection.

Now, needless to say, credit card companies have a very vested interest in identifying financial transactions that are illegitimate and criminal in nature.

And also I would like to mention this point, that according to the Federal Reserve payments study, Americans used credit cards to pay for 26.2 billion purchases in 2012, and the estimated loss due to unauthorized transactions that year was

US

6.1 billion dollars. Now, in the banking industry measuring risk is very critical, because the stakes are too high.

So the overall goal isactually to figure out who all can be fraudulent before too much Financialdamage has been done

So for this, a credit card company receives thousands of applications for new cards, and each application contains information

about an applicant, right? So here, as you can see, from all those applications what we can actually figure out are the predictor variables,

like what is the marital status of the person, what is the gender of the person, what is the age of the person, and the status, which is actually whether he is a defaulter or a non-defaulter.

So default payments are basically when payments are not made in time and according to the agreement signed by the cardholder,

so that account is then said to be in default.

So you can easily figure out the history of a particular cardholder from this. Then we can also look at the timing of payments, whether he has been a regular payer or a non-regular one,

what the source of income for that particular person is, and so on and so forth.

So to minimize loss, the bank actually needs certain decision rules to predict whether to approve the loan of that particular person or not.

Now here is where the random Forestactually comes into the picture

All right

Now, let's see how randomForest can actually help us in this particular scenario

Now, we have randomly taken two parameters out of all the predictor variables that we saw previously; we have taken two predictor variables here.

The first one is the income and the second one is the age, right? And here, in parallel, two decision trees have been implemented upon those predictor variables. Let's first take the case of the income variable, right? So here we have divided the income into three categories: the first one being a person earning over $35,000, the second from 15 to 35 thousand dollars, and the third one in the range of 0 to 15 thousand dollars.

Now if a personis earning over $35,000, which is a pretty Goodincome pretty decent

So now we'll check outfor the credit history

And here the probability is that if a person is earninga good amount then there is very low risk that he won't be able to payback already earning good

So the probability is that his applicationof loan will get approved

Right? So there is actually low riskor moderate risk, but there's no real issueof higher risk as such

We can approvethe applicants request here

Now, let's move on and look at the second category, where the person is actually earning from 15 to 35 thousand dollars. Right, now here the person may or may not pay back,

so in such scenarios we'll look at the credit history, as in what his previous history has been.

Now, if his previous history has been bad, like he has been a defaulter in previous transactions, we'll definitely not consider approving his request and he will be at high risk, which is not good for the bank.

If the previous history of that particularapplicant is really good

Then, just to clear the doubt, we will consider another parameter as well, and that will be his debt.

If he is already in really high debt, then the risk again increases and there are chances that he might not repay in the future.

So here we will

not accept the request of a person having high debt. If the person is in low debt and he has been a good payer in his past history,

Then there are chances that he might be backand we can consider approving the requestof this particular applicant

Now let's look at the third category, which is a person earning from 0 to 15 thousand dollars.

Now, this is something which actually raises an eyebrow, and this person will actually lie in the category of high risk.

All right.

So the probability is that his application for a loan would probably get rejected. Now, we'll get one final outcome from this income parameter, right? Now let us look at our second variable, that is age, which will lead into the second decision tree.

Now

Let us say the person is young, right? So now we will look at whether he is a student. Now, if he is a student, then the chances are high that he won't be able to repay, because he has no earning source, right? So here the risks are too high and the probability is that his application for a loan will get rejected. Fine.

Now, if the person is young and he is not a student, then we'll probably go on and look at another variable,

That is pan balance

Now

Let's look if the bank balanceis less than 5 lakhs

So again the risk arisesand the probabilities that his applicationof loan will get rejected

Now, if the person is young, is not a student, and his bank balance is greater than 5 lakhs, that is, he has a pretty good and stable balance, then the probability is that his application will get approved. Now let us take another scenario, if he's a senior, right? So if he is a senior, we'll probably go and check out his credit history.

How well has he been in his previous transactions? What kind of a person is he, like whether he's a defaulter or a non-defaulter?

Now, if he has not been a fair kind of person in his previous transactions, then again the risk arises and the probability of his application getting rejected actually increases. Right, now if he has been an excellent person as per his transactions in his previous history,

then again here there is the least risk and the probability is that his application for a loan will get approved.

So now here these two variablesincome and age have led to two different decision trees

Right and these two differentdecision trees actually led to two different results

Now what random forest does isit will actually compile these two different resultsfrom these two different

Gentry's and then finally, it will leadto a final outcome

That is how randomForest actually works

Right? So that is actually the motiveof the random Forest

Now let us move forward and see what random forest is, right? You can get an idea of the mechanism from the name itself, random forest.

So a collection of trees is a forest, that's probably why it is called a forest, and here also the trees are actually being trained on subsets which are selected at random.

And therefore they are called random forests. So a random forest is a collection or an ensemble

of decision trees. A single decision tree is actually built using the whole data set considering all features, but in a random forest only a fraction of the number of rows is selected, and that too at random, and a particular number of features, which are selected at random, are trained upon, and that is how the decision trees are built.

Right? So similarly a number of decision trees will be grown, each decision tree will result in a certain final outcome, and random forest will do nothing but actually just compile the results of all those decision trees to bring up the final result.

As you can see in this particular figure, a particular instance has actually been run through three different decision trees, right? So tree one results in a final outcome called class A and tree two results in class B.

Similarly, tree three results in class B. So random forest will compile the results of all these decision trees and it will go by the call of the majority voting. Now, since here two decision trees have actually voted in favor of class B, that is decision trees 2 and 3,

Therefore the final outcome willbe in the favor of the Class B

And that is how randomForest actually works upon
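If you want to try this voting idea without building it by hand, scikit-learn ships a RandomForestClassifier; here is a minimal sketch, reusing the earlier SUV-style X_train/y_train split as a stand-in, since the credit data itself isn't provided in the session:

from sklearn.ensemble import RandomForestClassifier

# n_estimators is the number of decision trees whose votes get compiled.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)          # each tree sees a random subset of rows and features
print(forest.predict(X_test)[:10])    # final class = majority vote of the trees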

Now, one really beautiful thing about this particular algorithm is that it is one of those versatile algorithms which is capable of performing both regression as well as classification. Now, let's try to understand random forest further with a very beautiful example, this is my favorite one.

So let's say you want to decide if you want to watch edgeof tomorrow or not, right? So in this particular scenario, you will have two differentactions to work Bond either

You can just straight away go to your best friendasked him about

All right, whether should I go for Edgeof Tomorrow not will I like this movie or youcan ask Your friends and take their opinionconsideration and then based on the final results who can go out and watch Edgeof Tomorrow, right? So now let's just takethe first scenario

So where you goto your best friend asked about whether you should goout to watch edge of tomorrow or not

So your friend will probablyask you certain questions like the first one beinghere Jonah So so let's say your friend asks you if you really likeThe Adventurous kind of movies or not

So you say yes, definitely I would love to watchit Venture kind of movie

So the probabilities that you will like edgeof tomorrow as well

Since Age of Tomorrow isalso a movie of Adventure and sci-fi kindof Journal right? So let's say you do not likethe adventure John a movie

So then againthe probability reduces that you might reallynot like edge of Morrow right

So from here you can cometo a certain conclusion right? Let's say your best friend putsyou into another situation where he'll ask you or a do you like Emily Bluntand you see definitely I like Emily Blunt and then heputs another question to you

Do you like Emily Bluntto be in the main lead and you say yes, then again, the probability arises that you will definitelylike edge of tomorrow as well because Edge of Tomorrowis Has the Emily plant in the main lead cast so and if you say oh I do not likeEmily Blunt then again, the probability reduces that you would like Edgeof Tomorrow to write

So this is one way where you have one decision treeand your final outcome

Your final decision will bebased on your one decision tree, or you can see your finaloutcome will be based on just one friend

No, definitely notreally convinced

You want to consider the optionsof your other friends also so that you can makevery precise and crisp decision right you go out and you approach some otherbunch of friends of yours

So now let's say you goto three of your friends and you ask themthe same question whether I would like to watchit off tomorrow or not

So you go out and approach three or four friends friendone friend twin friend three

Now, you will considereach of their Sport and then you will your decisionnow will be dependent on the compiled results of allof your three friends, right? Now here, let's say you goto your first friend and you ask him whether you would liketo watch it just tomorrow not and your first friendputs you to one question

Did you like Top Gun? And you say yes, definitely I did like the movieTop Gun then the probabilities that you would likeedge of tomorrow as well because topgun is actuallya military action drama, which is also Tom Cruise

So now again the probabilityRises that yes, you will like edgeof tomorrow as well and If you say no I didn't likeTop Gun then again

The chances are that you wouldn't like Edgeof Tomorrow, right? And then another questionthat he puts you across is that do you really liketo watch action movies? And you say yes, I would love to watchthem that again

The chances are that you would liketo watch Edge of Tomorrow

So from your friend when you can cometo one conclusion now here since the ratio of liking the movieto don't like is actually 2 is to 1 so the finalresult is Actually, you would like Edge of Tomorrow

Now you go to your second friend and you ask the same question.

So now your second friend asks you, did you like Far and Away when we went out and watched it the last time? And you say no, I really didn't like Far and Away. Then he would say that you are definitely going to like Edge of Tomorrow.

Why so? Because Far and Away, since most of you might not be knowing it, is of the romance genre, and it revolves around a girl and a guy falling in love with each other and so on.

So if you had liked Far and Away, the probability is that you wouldn't like Edge of Tomorrow.

Then he asks you another question.

Did you like Oblivion, and do you really like to watch Tom Cruise? And you say yes. Again,

the probability is that you would like to watch Edge of Tomorrow.

Why? Because Oblivion again is a science fiction movie casting Tom Cruise, full of strange experiences,

and where Tom Cruise is the saviour of the masses.

Well, that is the same kind of plot in Edge of Tomorrow as well.

So here it is a clear yes that you would like to watch Edge of Tomorrow.

So you get a second decision from your second friend.

Now you go to your third friend and ask him. Probably your third friend is not really interested in having any sort of conversation with you, so he just simply asks you, did you like Godzilla? And you say no, I didn't like Godzilla, so he says definitely you wouldn't like Edge of Tomorrow. Why so? Because Godzilla is also actually a science fiction movie from the adventure genre.

So now you have got three results from three different decision trees, from three different friends.

Now you compile the results of all those friends and then you make a final call: yes, would you like to watch Edge of Tomorrow or not? So this is a very real-time and very interesting example where you can actually implement random forest into ground reality. Right, any questions so far?

So far, no? That's good, then we can move forward.

Now let us look at various domains where random forest is actually used.

So because of its diversity, random forest is actually used in various diverse domains, be it banking, be it medicine, be it land use, be it marketing; name it and random forest is there. So in banking particularly, random forest is actually being used to make out whether the applicant will be a defaulter or a non-defaulter, so that the bank can accordingly approve or reject the loan application, right? So that is how random forest is being used in banking. Talking about medicine,

random forest is widely used in the medicine field to predict beforehand

what the probability is that a person will actually have a particular disease or not, right? So it's actually used to look at the various disease trends.

Let's say you want to figure out what the probability is that a person will have diabetes or not. So what would you do? You'd probably look at the medical history of the patient and then you will see or read:

what has been the glucose concentration,

what was the BMI, what were the insulin levels in the patient in the previous three months,

what is the age of this particular person. And we'll make different decision trees based on each one of these predictor variables, and then you'll finally compile the results of all those variables, and then you'll make a final decision as to whether the person will have diabetes in the near future or not.

That is how random forest is used in the medicine sector. Now moving on,

random forest is also actually used to find out the land use.

For example, I want to set up a particular industry in a certain area.

So what would I probably look for? I'd look for what the vegetation is over there, what the urban population is over there, right, and how far it is from the nearest modes of transport, like the bus station or the railway station. And accordingly

I will split my parameters and I will make a decision on each one of these parameters, and finally I'll compile my decisions on all these parameters, and that will be my final outcome.

So that is how I am finally going to predict whether I should put my industry at this particular location or not.

Right? So these three examples have actually been majorly around classification problems, because we are trying to classify; we're actually trying to answer the question "whether or not", right? Now, let's move forward and look at how marketing is revolving around random forest.

So particularly in marketing, we try to identify the customer churn.

This is particularly a regression kind of problem. Right, now how? Let's see. So customer churn is nothing but actually the number of customers who are dropping out,

who are going out of your market.

Now you want to identify what your customer churn will be in the near future.

So most of the e-commerce industries are actually using this, like Amazon, Flipkart, etc.

They particularly look at each behaviour of yours, as to what has been your past history,

what has been your purchasing history,

what you like, based on your activity around certain things, around certain ads, around certain discounts, or around certain kinds of materials, right? If you like a particular topic, your activity will be more around that particular topic.

So that is how they track each and every particular move of yours, and then they try to predict whether you will be moving out or not.

So that is how they identify the customer churn.

So these are all the various domains where random forest is used, and this is not the only list; there are numerous other examples which are actually using random forests, and that makes it so special.

Now, let's move forward and see how random forest actually works.

Right.

So let us start with the random forest algorithm first.

Let's just see it step by step as to how the random forest algorithm works.

So the first step is to actually select certain m features from T,

where m is less than T.

Here T is the total number of predictor variables that you have in your data set, and out of those total predictor variables

you will randomly select some features. Now why are we actually selecting only a few features?

The reason is that if you select all the predictor variables, then each of your decision trees will be the same.

So the model is not actually learning something new.

It is learning the same previous thing, because all those decision trees will be similar, right? If you actually split your predictor variables and you randomly select only a few predictor variables,

let's say there are 14 total variables and out of those you randomly pick just three, right? Then every time you will get a new decision tree, so there will be variety.

Right? So the classification model will actually be much more intelligent than the previous one.

Now it has got a variety of experiences.

So definitely it will make different decisions each time.

And then when you compile all those different decisions, it will be a new, more accurate and efficient result, right? So the first important step is to select a certain number of features out of all the features. Now, let's move on to the second step.

Let's say for any node d,

the next step is to calculate the best split at that point.

So you know how decision trees are actually implemented: you pick up the most significant variable, right? And then you will split that particular node into further child nodes; that is how the split takes place, right? So you will do it for the m number of variables that you have selected.

Let's say you have selected three, so you will implement the split at all those three nodes in one particular decision tree, right? The third step is to split up the node into daughter nodes.

So now you can split your root node into as many nodes as you want, but here we will split our node into two daughter nodes, as in this or that, so it will be an answer in terms of yes or no, right? Our fourth step will be to repeat all these three steps that we've done previously, and we'll repeat all this splitting until we have reached the n number of nodes.

Right? So we need to repeat until we have reached the leaf nodes of a decision tree.

That is how we will do it. Right, now after these four steps

we will have our one decision tree.

But random forest is actually about multiple decision trees.

So here our fifth step will come into the picture, which will actually repeat all these previous steps D number of times, where D is the number of decision trees.

Let's say I want to implement five decision trees.

So my first step will be to implement all the previous steps five times.

So here the iteration is for five number of times. Right, now

once I have created these five decision trees, my task is still not complete.

Now my final task will be to compile the results of all these five different decision trees, and I will make a call on the majority voting. Right, here,

as you can see in this picture,

I had n different instances.

Then I created n different decision trees.

And finally I will compile the results of all these n different decision trees, and I will take my call on the majority voting, right?

So whatever my majority vote says, that will be my final result.

So this is basically an overview of the random forest algorithm and how it actually works.
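To make those five steps concrete, here is a minimal sketch of the procedure in Python. It assumes scikit-learn and NumPy are installed and uses DecisionTreeClassifier as the base tree; the function names, the square-root rule for picking the number of features, and the general shape of the code are my own illustration, not code shown in the session.

```python
# Minimal sketch of the five steps above: random feature subsets, random rows,
# one decision tree per subset, then a majority vote. Assumes scikit-learn and NumPy.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def build_random_forest(X, y, n_trees=5, n_features=None):
    """Grow n_trees decision trees, each on random rows and a random subset of features."""
    n_rows, n_total = X.shape
    n_features = n_features or max(1, int(np.sqrt(n_total)))        # choose m < T
    forest = []
    for _ in range(n_trees):
        rows = np.random.choice(n_rows, n_rows, replace=True)        # random rows (with replacement)
        cols = np.random.choice(n_total, n_features, replace=False)  # random features
        tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
        forest.append((tree, cols))
    return forest

def forest_predict(forest, x):
    """Compile the votes of all trees and go with the majority."""
    x = np.asarray(x)
    votes = [tree.predict(x[cols].reshape(1, -1))[0] for tree, cols in forest]
    return Counter(votes).most_common(1)[0][0]
```

Calling forest_predict on a new instance simply collects one vote per tree and returns the majority class, which is exactly the compile-and-vote step described above.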

Let's just have a look at this example to get a much better understanding of what we have learnt.

So let's say I have this data set which consists of four different attributes, right? Basically it consists of the weather information of the previous 14 days, right from D1 till D14, and Outlook, humidity and wind basically give me the weather condition of those 14 days.

And finally I have Play, which is my target variable: whether the match did take place on that particular day or not, right?

Now

my main goal is to find out whether the match will actually take place if I have the following weather conditions with me on any particular day.

Let's say the outlook is rainy that day, the humidity is high and the wind is weak.

So now I need to predict whether I will be able to play the match that day or not.

All right.

So this is our problem statement. Fine.

Now, let's see how random forest is used to sort this out.

Now here the first step is to actually split my entire data set into subsets.

Here I have split my entire 14 records into further smaller subsets. Right, now these subsets may or may not overlap; like there is a certain overlap between D1 till D3 and D3 till D6, fine.

There is an overlap at D3, so it might happen that there is overlapping; you need not really worry about the overlapping, but you have to make sure that all those subsets are actually different, right? So here I have taken three different subsets: my first subset consists of D1 till D3, my second subset consists of D3 till D6, and my third subset consists of D7 till D9.

Now I will first be focusing on my first subset. Now here, let's say that on a particular day the outlook was overcast. Fine, if yes, it was overcast, then the probability is that the match will take place.

Overcast is basically when your weather is too cloudy.

So if that is the condition, then definitely the match will take place. And let's say it wasn't overcast.

Then you will consider the second most probable option, that will be the wind, and you will make a decision based on this: whether the wind was weak or strong. If the wind was weak, then you will definitely go out and play the match, else you would not. So now the final outcome out of this decision tree will be Play, because here the ratio between play and no play is 2 to 1, so we get to a certain decision from our first decision tree.

Now, let us look at the second subset. Since the second subset has a different set of records,

this decision tree is absolutely different from what we saw in our first subset.

So let's say if it was overcast, then you will play the match; if it isn't overcast, then you would go and look out for humidity.

Now further it will get split into two: whether it was high or normal.

Now, we'll take the first case: if the humidity was high and the wind was weak, then you will play the match; else, if the humidity was high but the wind was too strong, then you would not go out and play the match. Right, now

let us look at the second daughter node of humidity: if the humidity was normal

and the wind was weak,

then you will definitely go out and play the match, else you won't go out and play the match.

So here if you look at the final result, then the ratio of play to no play is 3 to 2, so again

the final outcome is actually Play, right? So from the second subset we get the final decision of Play. Now, let us look at our third subset, which consists of D7 till D9. Here, if again the overcast is yes, then you will play the match,

else you will go and check out for humidity.

And if the humidity is really high, then you won't play the match, else

you will play the match. Again the probability of playing the match is

yes, because the ratio of play to no play is 2 to 1, right? So three different subsets, three different decision trees, three different outcomes, and one final outcome after compiling all the results from these three different decision trees. So I hope this gives a better perspective, a better understanding of random forest and how it really works.

All right

So now let's just have a look at the various features of random forest, right?

The first and foremost feature is that it is one of the most accurate learning algorithms, right? Why is that so? Because single decision trees are actually prone to having high variance or high bias, and on the contrary,

random forest actually averages the entire variance across the decision trees.

So let's say the variance is, say, X for a decision tree, but for random forest, let's say we have implemented n decision trees in parallel.

So my entire variance gets averaged out and my final variance actually becomes X upon n, and that is how the entire variance actually goes down as compared to other algorithms.
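As a rough back-of-the-envelope version of that claim (an idealized assumption, not something derived in the session): if each tree has variance X, then for the average of n trees

\[
\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} T_i\right) = \frac{X}{n} \quad \text{(uncorrelated trees)}, \qquad
\mathrm{Var} \approx \rho X + \frac{1-\rho}{n}\,X \quad \text{(trees with pairwise correlation } \rho\text{)}.
\]

The second form is the more realistic one, since trees grown on overlapping data are somewhat correlated; either way, averaging pushes the variance down.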

Now the second most important feature is that it works well for both classification and regression problems, and as far as I have come across, this is one of the only algorithms which works equally well for both of them, be it a classification kind of problem or a regression kind of problem, right? Then, it really runs efficiently on large databases.

So basically it's really scalable,

whether you work with a smaller amount of data or with a really huge volume of data, right? So that's a very good part about it.

Then the fourth most important point is that it requires almost no input preparation.

Now, why am I saying this? Because it has got certain implicit methods which actually take care of all the outliers and all the missing data, and you really don't have to take care of all that while you are in the stages of input preparation.

So random forest is all here to take care of everything. Next,

it performs implicit feature selection, right? So while we are implementing multiple decision trees, it has got an implicit method which will automatically pick up some random features out of all your parameters, and then it will go on implementing different decision trees.

So for example, if you just give one simple command that, all right, I want to implement 500 decision trees, then random forest will automatically take care and it will implement all those 500 decision trees, and all those 500 decision trees will be different from each other, and this is because it has got implicit methods which will automatically pick different parameters out of all the variables that you have, right? Then, it can be easily grown in parallel. Why is that so? Because we are actually implementing multiple decision trees, and all those decision trees are actually getting implemented in parallel.

So if you say you want a thousand trees to be implemented,

all those thousand trees are getting implemented in parallel,

and that is how the computation time reduces.

Right, and the last point is that it has got methods for balancing error in unbalanced data sets. Now, what exactly unbalanced data sets are, let me just give you an example of that.

So let's say you're working on a data set, fine, and you create a random forest model and get 90% accuracy immediately.

Fantastic, you think, right?

So now you start diving deep, you go a little deeper,

and you discover that 90% of the data actually belongs to just one class. Damn, your entire data set,

your entire decision, is actually biased towards just one particular class.

So random forest actually takes care of this thing, and it is really not biased towards any particular decision tree or any particular variable or any class.

It has got methods which look after this, and that is how it balances the errors in your data sets.

So that's pretty much about the features of random forests.
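If you want these features without writing the forest yourself, scikit-learn's RandomForestClassifier exposes most of them as parameters. A minimal sketch, with the Iris data standing in as a placeholder dataset:

```python
# Sketch: the features above map onto RandomForestClassifier parameters.
from sklearn.datasets import load_iris                      # placeholder dataset
from sklearn.ensemble import RandomForestClassifier         # RandomForestRegressor exists for regression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

model = RandomForestClassifier(
    n_estimators=500,          # "I want 500 decision trees"
    max_features="sqrt",       # implicit feature selection: random subset of features per split
    class_weight="balanced",   # helps with unbalanced data sets
    n_jobs=-1,                 # grow the trees in parallel on all CPU cores
    random_state=1,
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))                          # accuracy on held-out data
```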

What is the KNN algorithm? Well, K nearest neighbour is a simple algorithm that stores all the available cases and classifies the new data or case based on a similarity measure.

It suggests that if you are similar to your neighbours, then you are one of them, right? For example, if an apple looks more similar to a banana, orange or melon

rather than a monkey, rat or a cat, then most likely the apple belongs to the group of fruits.

All right.

Well, in general KNN is used in search applications where you are looking for similar items, that is, when your task is some form of "find items similar to this one".

Then you call this search a KNN search.

But what is this K in KNN? Well, this K denotes the number of nearest neighbours which vote for the class of the new data, or the testing data.

For example, if k equals 1, then the testing data is given the same label as the closest example in the training set. Similarly, if k equals 3, the labels of the three closest examples are checked and the most common label is assigned to the testing data.

So this is what K in the KNN algorithm means. So moving on ahead,

let's see some example scenarios where KNN is used in the industry.

So, let's see the industrial applications of the KNN algorithm, starting with recommender systems.

Well, the biggest use case of KNN search is a recommender system.

This recommender system is like an automated form of a shop counter guy: when you ask him for a product,

he not only shows you that product but also suggests or displays a relevant set of products which are related to the item you're already interested in buying.

This KNN approach applies to recommending products, like on Amazon, or for recommending media, like in the case of Netflix, or even for recommending advertisements to display to a user. If I'm not wrong, almost all of you must have used Amazon for shopping, right? So just to tell you, more than 35% of Amazon.com's revenue is generated by its recommendation engine.

So what's their strategy? Amazon uses recommendation as a targeted marketing tool in both email campaigns and on most of its website pages. Amazon will recommend many products from different categories based on what you have browsed, and it will pull those products in front of you which you are likely to buy, like the "frequently bought together" option that comes at the bottom of the product page to tempt you into buying the combo.

Well, this recommendation has just one main goal, that is, to increase average order value, or to upsell and cross-sell customers by providing product suggestions based on items in the shopping cart or on the product they are currently looking at on the site.

The next industrial application of the KNN algorithm is concept search, or searching semantically similar documents and classifying documents containing similar topics.

So as you know, the data on the internet is increasing exponentially every single second.

There are billions and billions of documents on the internet, and each document contains multiple concepts.

Now, this is a situation where the main problem is to extract concepts from a set of documents, as each page could have thousands of combinations that could be potential concepts; an average document could have millions of potential concepts. Combine that with the vast amount of data on the web,

and we are talking about an enormous amount of data and samples.

So what we need is to find the concepts from that enormous amount of data and samples, right? For this purpose, we'll be using the KNN algorithm. More advanced examples could include handwriting detection, like OCR, or image recognition, or even video recognition.

All right

So now that you know the various use cases of the KNN algorithm, let's proceed and see how it works.

So how does a KNN algorithm work? Let's start by plotting these blue and orange points on our graph.

These blue points belong to class A and the orange ones belong to class B.

Now you get a star as a new point, and your task is to predict whether this new point belongs to class A or to class B.

So to start the prediction, the very first thing that you have to do is select the value of K. Just as I told you, K in the KNN algorithm refers to the number of nearest neighbours that you want to select. For example, in this case k equals 3.

So what does it mean? It means that I am selecting the three points which are at the least distance from the new point, or you can say I am selecting the three different points which are closest to the star.

Well, at this point of time you can ask, how will you calculate the least distance? We'll come to that. Once you calculate the distances, you will get one blue and two orange points which are closest to this star. Now since in this case we have a majority of orange points, you can say that for k equal to 3 the star belongs to class B, or you can say that the star is more similar to the orange points. Moving on ahead,

well, what if k equals 6? Well, for this case, you have to look for the six different points which are closest to this star.

So in this case, after calculating the distances, we find that we have four blue points and two orange points which are closest to the star. Now, as you can see, the blue points are in the majority, so you can say that for k equal to 6 this star belongs

to class A, or the star is more similar to the blue points.

So by now, I guess you know how a KNN algorithm works

and what the significance of K in the KNN algorithm is.

So how will you choose the value of K? Keep in mind that K is the most important parameter in the KNN algorithm.

So, let's see: when you build a k nearest neighbour classifier,

how will you choose a value of K? Well, you might have a specific value of K in mind, or you could divide up your data and use something like a cross-validation technique to test several values of K in order to determine which works best for your data.

For example, if n equals 2,000 cases, then in that case the optimal value of K lies somewhere in between 1 and 19.

But yes, unless you try it you cannot be sure of it.
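One hedged way to run that cross-validation search for K with scikit-learn (the Iris data is just a stand-in here, and the range of candidate values is arbitrary):

```python
# Sketch: choose K by 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 20):                                   # candidate values of K
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()  # mean CV accuracy for this K

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```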

So, you know how the algorithm is working at a higher level.

Let's move on and see how things are predicted using the KNN algorithm.

Remember I told you the KNN algorithm uses the least distance measure in order to find its nearest neighbours.

So let's see how these distances are calculated.

Well, there are several distance measures which can be used.

So to start with, we will mainly focus on Euclidean distance and Manhattan distance in this session.

So what is this Euclidean distance? Well, the Euclidean distance is defined as the square root of the sum of the squared differences between a new point x and an existing point y. For example, here we have points P1 and P2: point P1 is (1, 1) and point P2 is (5, 4). So what is the Euclidean distance between both of them? You can say that the Euclidean distance is the direct distance between two points.

So what is the distance between the points P1 and P2? We calculate it as the square root of (5 minus 1) whole square plus (4 minus 1) whole square, which results in 5.

Next is the Manhattan distance.

Well, the Manhattan distance is used to calculate the distance between real vectors using the sum of their absolute differences. In this case,

the Manhattan distance between the points P1 and P2 is the mod of (5 minus 1) plus the mod of (4 minus 1), which results in 4 plus 3,

that is 7.

So this slide shows the difference between Euclidean and Manhattan distance from point A to point B.

The Euclidean distance is nothing but the direct or the least possible distance between A and B,

whereas the Manhattan distance is the distance between A and B measured along the axes at right angles.
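A small sketch of both measures applied to the points used above, P1 = (1, 1) and P2 = (5, 4):

```python
# Euclidean vs. Manhattan distance for the example points.
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

p1, p2 = (1, 1), (5, 4)
print(euclidean(p1, p2))   # 5.0
print(manhattan(p1, p2))   # 7
```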

Let's take an example and see how things are predicted using the KNN algorithm, or how the KNN algorithm works. Suppose

we have a data set which consists of the height, weight and T-shirt size of some customers.

Now when a new customer comes, we only have his height

and weight as the information. Our task is to predict

the T-shirt size of that particular customer. For this we will be using the KNN algorithm.

So the very first thing that we need to do is calculate the Euclidean distance.

Now we have a new data point with height 161 centimetres and weight 61 kg.

So the very first thing that we'll do is calculate the Euclidean distance, which is nothing but the square root of (161 minus 158) whole square plus (61 minus 58) whole square, and the square root of that is 4.24.

Let's drag and drop it.

So these are the various Euclidean distances to the other points.

Now, let's suppose k equals 5. Then what the algorithm does is it searches for the five customers closest to the new customer, that is, most similar to the new data in terms of its attributes.

For k equal to 5, let's find the top five minimum Euclidean distances.

So these are the distances which we are going to use: one, two, three, four and five.

So let's rank them in order: this is first,

this is second,

this is third, then this one is fourth, and this one is fifth.

So that's our order.

So for k equal to 5 we have four T-shirts which come under size M and one T-shirt which comes under size L, so obviously the best guess, or the best prediction, for the T-shirt size for height 161 centimetres and weight 61 kg is M,

or you can say that the new customer fits into size M.
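The same prediction can be sketched with scikit-learn's KNeighborsClassifier. The new customer (161 cm, 61 kg) comes from the walkthrough, but the training rows below are illustrative stand-ins for the table on the slide:

```python
# Hedged sketch of the T-shirt example; training rows are made up for illustration.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[158, 58], [160, 59], [163, 61], [165, 64], [168, 66], [170, 68]]
y_train = ["M", "M", "M", "L", "L", "L"]

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5, Euclidean distance by default
knn.fit(X_train, y_train)
print(knn.predict([[161, 61]]))             # -> ['M'], the predicted T-shirt size
```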

Well, this was all about the theoretical part.

But before we drill down to the coding part, let me just tell you why people call KNN a lazy learner.

Well, KNN for classification

is a very simple algorithm.

But that's not why it is called lazy. KNN is a lazy learner because it doesn't learn a discriminative function from the training data.

What it does is memorize the training data; there is no learning phase of the model, and all of the work happens at the time

a prediction is requested.

So as such, that's the reason why KNN is often referred to as a lazy learning algorithm.

So this was all about the theoretical session. Now, let's move on to the coding part.

So for the practical implementation of the hands-on part, I'll be using the Iris data set. This data set consists of 150 observations.

We have four features and one class label. The four features include the sepal length, sepal width, petal length and petal width, whereas the class label decides which flower belongs to which category.

So this was the description of the data set which we are using. Now, let's move on and see what the step-by-step solution to perform a KNN algorithm is.

First, we'll start by handling the data: we have to open the data set from its CSV format and split the data set into train and test parts. Next,

we'll handle the similarity, where we have to calculate the distance between two data instances.

Once we calculate the distance, next we'll look for the neighbours and select the K neighbours which have the least distance from a new point.

Now once we get our neighbours, we'll generate a response from that set of data instances.

This will decide whether the new point belongs to class A or class B.

Finally, we will create the accuracy function, and in the end

we'll tie it all together in the main function.

So let's start with our code for implementing the KNN algorithm using Python.

I'll be using a Jupyter notebook with Python 3 installed on it.

Now

let's move on and see how the KNN algorithm can be implemented using Python.

So there's my Jupyter notebook, which is a web-based interactive computing notebook environment with Python 3 installed on it.

So it's launching. There's our Jupyter notebook, and we'll be writing our Python code in it.

So the first thing that we need to do is load our file. Our data is in CSV format without a header line or any quotes. We can open the file with the open function and read the data lines using the reader function

in the csv module.

So let's write the code to load our data file.

Let's execute the Run button.

So once you execute the Run button, you can see the entire training data set as the output. Next,

we need to split the data into a training data set that KNN can use to make predictions and a test data set that we can use to evaluate the accuracy of the model.

So we first need to convert the flower measures that were loaded as strings into numbers that we can work with. Next,

we need to split the data set randomly into train and test.

A ratio of 67 to 33 for train to test is a standard ratio which is used for this purpose.

So let's define a function called loadDataset that loads a CSV with the provided filename and splits it randomly into training and test data sets using the provided split ratio.

So this is our function loadDataset, which takes the filename, split ratio, training data set and testing data set as its input.
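The code itself isn't fully readable on screen, so here is a hedged reconstruction of what such a loadDataset function typically looks like (the file name iris.data and the four numeric columns are assumptions based on the data set described above):

```python
# Hedged reconstruction of the loadDataset step.
import csv
import random

def loadDataset(filename, split, trainingSet, testSet):
    with open(filename, 'r') as csvfile:
        for row in csv.reader(csvfile):
            if not row:                      # skip blank lines
                continue
            for y in range(4):               # convert the 4 flower measures to float
                row[y] = float(row[y])
            if random.random() < split:      # randomly assign to train or test
                trainingSet.append(row)
            else:
                testSet.append(row)

trainingSet, testSet = [], []
loadDataset('iris.data', 0.66, trainingSet, testSet)
print(len(trainingSet), len(testSet))
```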

All right

So let's execute the Run button and check for any errors.

It executed with zero errors.

Let's test this function.

So there's our training set, testing set and loadDataset.

So this is our function loadDataset, and inside it we are passing our file, iris.data, with a split ratio of 0.66, along with the training data set and test data set.

Let's see what our training data set and test data set

are being divided into; it's giving a count of the training data set and the testing data set.

The total number of training instances it has been split into is 97, and the total number of test instances we have is 53.

All right.

Okay, so the function loadDataset is performing

well, so let's move on to step two, which is similarity.

In order to make predictions, we need to calculate the similarity between any two given data instances.

This is needed so that we can locate the k most similar data instances in the training data set and in turn make a prediction. Given that all four flower measurements are numeric and have the same unit,

we can directly use the Euclidean distance measure.

This is nothing but the square root of the sum of the squared differences between two arrays of numbers. Additionally, we want to control which fields to include in the distance calculation.

So specifically, we only want to include the first four attributes.

So our approach will be to limit the Euclidean distance to a fixed length.

All right.

So let's define our Euclidean function.

So this is our euclideanDistance function, which takes instance1, instance2 and length as parameters. Instance1 and instance2

are the two points between which you want to calculate the Euclidean distance, whereas the length denotes how many attributes you want to include. Okay.

So there's our Euclidean function.
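A hedged reconstruction of that euclideanDistance function, with the same test values that are used a moment later:

```python
# Hedged reconstruction of the euclideanDistance step.
import math

def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):                          # only the first `length` attributes
        distance += (instance1[x] - instance2[x]) ** 2
    return math.sqrt(distance)

data1 = [2, 2, 2, 'a']
data2 = [4, 4, 4, 'b']
print(euclideanDistance(data1, data2, 3))            # 3.464...
```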

Let's execute it

It's executing fine without any errors.

Let's test the function. Suppose data1, the first instance, consists of the data point (2, 2, 2) and it belongs to class A, and data2 consists of (4, 4, 4) and it belongs to class B.

So we calculate the Euclidean distance from data1 to data2, and we have to consider only the first three features.

All right.

So let's print the distance, as you can see here.

The distance comes out to be 3.464.

So this distance is nothing but the Euclidean distance, and it is calculated as the square root of (4 minus 2) whole square plus (4 minus 2) whole square plus (4 minus 2) whole square, that is nothing but 3 times (4 minus 2) whole square, that is 12, and the square root of 12 is nothing but 3.464. All right.

So now that we have calculated the distance, we need to look for the K nearest neighbours.

Now that we have a similarity measure, we can use it to collect the k most similar instances for a given unseen instance.

Well, this is a straightforward process of calculating the distance for all the instances and selecting a subset with the smallest distance values.

Now what we have to do is select the smallest distance values.

For that we will be defining a function called getNeighbors.

What it will do is return the K most similar neighbours from the training set for a given test instance.

All right, so this is how our getNeighbors function looks. It takes the training data set, test instance and K as its input. Here,

K is nothing but the number of nearest neighbours you want to check for.

All right.

So basically, what you'll be getting from this getNeighbors function is K different points having the least Euclidean distance from the test instance.
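A hedged reconstruction of getNeighbors (the compact distance helper is repeated here so the snippet runs on its own):

```python
# Hedged reconstruction of the getNeighbors step.
import math
import operator

def euclideanDistance(a, b, length):
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(length)))

def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1                   # attributes to compare
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))       # smallest distance first
    return [distances[x][0] for x in range(k)]       # the k closest training instances

trainSet = [[2, 2, 2, 'a'], [4, 4, 4, 'b']]
print(getNeighbors(trainSet, [5, 5, 5], 1))          # [[4, 4, 4, 'b']]
```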

All right, let's execute it

So the function executed without any errors.

So let's test our function.

Suppose the training data set includes data like (2, 2, 2), which belongs to class A, and other data like (4, 4, 4), which belongs to class B, and the test instance is (5, 5, 5). Now we have to predict whether this test instance belongs to class A or to class B.

All right, for k equal to 1 we have to find its nearest neighbour and predict whether this test instance will belong to class A or to class B. Alright,

so let's execute the Run button.

All right.

So on executing the Run button, you can see that we have the output as (4, 4, 4) and B: the new instance (5, 5, 5) is closest to (4, 4, 4),

which belongs to class B.

All right.

Now once you have located the most similar neighbours for a test instance, the next task is to predict a response based on those neighbours.

So how can we do that?

Well, we can do this by allowing each neighbour to vote for its class attribute and taking the majority vote as the prediction.

Let's see how we can do that.

So we have a function called getResponse, which takes neighbors as the input.

Well, these neighbours are nothing but the output of the getNeighbors function; the output of the getNeighbors function will be fed to getResponse.
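A hedged reconstruction of getResponse, with the same neighbours that are tested next:

```python
# Hedged reconstruction of the getResponse (majority vote) step.
import operator

def getResponse(neighbors):
    classVotes = {}
    for neighbor in neighbors:
        label = neighbor[-1]                         # class label is the last attribute
        classVotes[label] = classVotes.get(label, 0) + 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]                         # label with the most votes

neighbors = [[1, 1, 1, 'a'], [2, 2, 2, 'a'], [3, 3, 3, 'b']]
print(getResponse(neighbors))                        # 'a'
```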

All right

Let's execute the Run button

It's executed

Let's move ahead and test our function getResponse.

So we have neighbours as (1, 1, 1), which belongs to class A, (2, 2, 2), which belongs to class A, and (3, 3, 3),

which belongs to class B.

So this response variable, that's what it will do:

it will store the value of getResponse when we pass in these neighbour values.

Alright, so what we want to check is whether the final outcome for that test instance will belong to class A or class B

when the neighbours are (1, 1, 1) A, (2, 2, 2) A and (3, 3, 3) B.

So, let's check our response.

Now that we have created all the different functions which are required for a KNN algorithm,

the important main concern is how you evaluate the accuracy of the predictions. An easy way to evaluate the accuracy of the model is to calculate the ratio of the total correct predictions to all the predictions made.

For this I will be defining a function called getAccuracy, and inside it I'll be passing my test data set and the predictions. The getAccuracy function got executed without any error.
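A hedged reconstruction of getAccuracy, checked against the sample data that follows:

```python
# Hedged reconstruction of the getAccuracy step.
def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:         # actual label vs. predicted label
            correct += 1
    return correct / float(len(testSet)) * 100.0

testSet = [[1, 1, 1, 'a'], [2, 2, 2, 'a'], [3, 3, 3, 'b']]
predictions = ['a', 'a', 'a']
print(getAccuracy(testSet, predictions))             # 66.66...
```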

Let's check it for a sample data set.

So we have our test data set as (1, 1, 1), which belongs to class A, (2, 2, 2), which again belongs to class A, and (3, 3, 3), which belongs to class B, and my predictions are: for the first test instance

it predicted that it belongs to class A, which is true; for the next it predicted that it belongs to class A, which is again true; and for the next it again predicted that it belongs to class A, which is false in this case, because that test instance belongs to class B.

All right.

So in total we have two correct predictions out of three.

All right, so the ratio will be 2 by 3, which is nothing but 66.6%.

So our accuracy rate is 66.6%.

So now that we have created all the functions that are required for the KNN algorithm,

let's compile them into one single main function.

Alright, so this is our main function, and we are using the Iris data set with a split of 0.67 and a value of K of 3. Let's see

what the accuracy score of this is and check how accurate our model is. So in the training data set we have 113 values, and in the test data set

we have 37 values.

These are the predicted and the actual values of the output.

Okay.

So in total we got an accuracy of 97.29 percent,

which is really very good.
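For reference, here is a compact, self-contained sketch of the whole pipeline wired together in a main function. It uses compressed versions of the helpers sketched above and assumes an iris.data CSV sits in the working directory; the exact accuracy will vary with the random split.

```python
# Hedged end-to-end sketch of KNN on the Iris CSV (k = 3, 67/33 split).
import csv
import math
import random
from collections import Counter

def load(filename, split):
    train, test = [], []
    with open(filename) as f:
        for row in csv.reader(f):
            if not row:
                continue
            row = [float(v) for v in row[:4]] + [row[4]]     # 4 measures + class label
            (train if random.random() < split else test).append(row)
    return train, test

def dist(a, b):
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(4)))

def predict(train, instance, k):
    neighbors = sorted(train, key=lambda row: dist(row, instance))[:k]
    return Counter(row[-1] for row in neighbors).most_common(1)[0][0]

def main():
    train, test = load('iris.data', 0.67)
    predictions = [predict(train, row, k=3) for row in test]
    correct = sum(p == row[-1] for p, row in zip(predictions, test))
    print('Accuracy: %.2f%%' % (100.0 * correct / len(test)))

main()
```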

Alright, so I hope the concept of this KNN algorithm is clear. In a world full of machine learning and artificial intelligence surrounding almost everything around us, classification and prediction are among the most important aspects of machine learning.

So before moving forward, let's have a quick look at the agenda.

I'll start off this video by explaining to you guys what exactly Naive Bayes is, then what Bayes theorem is, which serves as the logic behind the Naive Bayes algorithm. Going forward,

I'll explain the steps involved in the Naive Bayes algorithm one by one, and finally I'll finish off with a demo on Naive Bayes using the scikit-learn package. Now, Naive Bayes is a simple but surprisingly powerful algorithm for predictive analysis.

It is a classification technique based on Bayes theorem with an assumption of independence among predictors.

It comprises two parts, which are "naive"

and "Bayes". In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability of whether a fruit is an apple or an orange or a banana.

So that is why it is known as naive. Now, a Naive Bayes model is easy to build and particularly useful for very large data sets.

In probability theory and statistics, Bayes theorem, which is also known as Bayes' law or Bayes' rule, describes the probability of an event based on prior knowledge of conditions that might be related to the event. Now, Bayes theorem is a way to figure out conditional probability.

Conditional probability is the probability of an event happening given that it has some relationship to one or more other events.

For example, your probability of getting a parking space is connected to the time of day you park,

where you park, and what conventions are going on at that time. Bayes theorem is slightly more nuanced. In a nutshell,

it gives you the actual probability of an event given information about tests.

Now, if you look at the definition of Bayes theorem, we can see that given a hypothesis H and the evidence E, Bayes theorem states that the relationship between the probability of the hypothesis before getting the evidence, which is P(H), and the probability of the hypothesis after getting the evidence, which is P(H given E), is defined as the probability of E given H into the probability of H, divided by the probability of E. It's rather confusing, right? So let's take an example to understand this theorem.
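Written out, the relationship just described is:

\[
P(H \mid E) \;=\; \frac{P(E \mid H)\, P(H)}{P(E)}
\]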

So suppose I have a deck of cards, and if a single card is drawn from the deck of playing cards, the probability that the card is a king is 4 by 52, since there are four kings in a standard deck of 52 cards.

Now if King is the event "this card is a king",

the probability of King is given as 4 by 52, which is equal to 1 by 13.

Now if evidence is provided, for instance someone looks at the card and sees that the single card is a face card, the probability of King given that it's a face card can be calculated using Bayes theorem with this formula.

Since every king is also a face card, the probability of face given that it's a king is equal to 1, and since there are three face cards in each suit,

that is the jack, king and queen,

the probability of a face card is equal to 12 by 52,

that is 3 by 13.

Now using Bayes theorem we can find out the probability of King given that it's a face card, and our final answer comes to 1 by 3, which is also intuitively true:

if you have a deck of cards having only face cards, there are three types of faces, which are the jack, king and queen, so the probability that it's the king is 1 by 3.
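The worked computation, using the numbers above:

\[
P(\text{King} \mid \text{Face}) \;=\; \frac{P(\text{Face} \mid \text{King})\, P(\text{King})}{P(\text{Face})}
\;=\; \frac{1 \times \tfrac{4}{52}}{\tfrac{12}{52}} \;=\; \frac{1}{3}
\]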

Now, this is a simple example of how Bayes theorem works. Now if we look at the proof, as in how this Bayes theorem comes about:

here we have the probability of A given B and the probability of B given A. Now for a joint probability distribution over the sets A and B, the conditional probability of A given B is defined as the probability of A intersection B divided by the probability of B, and similarly the probability of B given A is defined as the probability of B intersection A divided by the probability of A. Now we can equate the probability of A intersection B and the probability of B intersection A, as both are the same thing. From this, as you can see, we get our final Bayes theorem proof, which is: the probability of A given B equals the probability of B given A into the probability of A, divided by the probability of B. Now, while this is the equation that applies to any probability distribution over the events A and B,

it has a particularly nice interpretation in the case where A is represented as the hypothesis H and B is represented as some observed evidence E. In that case, the formula is: P of H given E is equal to P of E given H into the probability of H, divided by the probability of E. Now, this relates the probability of the hypothesis before getting the evidence, which is P of H, to the probability of the hypothesis after getting the evidence, which is P of H given E. For this reason, P of H is known as the prior probability, while P of H given E is known as the posterior probability, and the factor that relates the two is known as the likelihood ratio. Using these terms, Bayes theorem can be rephrased as: the posterior probability equals

the prior probability times the likelihood ratio.

So now that we know the maths involved behind Bayes theorem,

let's see how we can implement it in a real-life scenario.

So suppose we have a data set

in which we have the outlook, the humidity and the wind, and we need to find out whether we should play or not on that day.

The outlook can be sunny, overcast or rain; the humidity is high or normal; and the wind is categorized into two phases, which are the weak and the strong winds.

First of all, we will create a frequency table using each attribute of the data set.

So the frequency table for the outlook looks like this: we have sunny, overcast and rainy. The frequency table of humidity looks like this, and the frequency table of wind looks like this: we have strong and weak for wind, and high and normal ranges for humidity.

So for each frequency table, we will generate a likelihood table. Now the likelihood table contains the probabilities for a particular day. Suppose we take sunny, and we take play as yes and no. The probability of sunny given that we play yes is 3 by 10, which is 0.3.

The probability of the evidence, which is the probability of sunny, is equal to 5 by 14. Now, these are all terms which are just generated from the data which we have, and finally the probability of yes is 10 out of 14.

So if we have a look at the likelihood of yes given that it's sunny, we can see, using Bayes theorem,

it's the probability of sunny given yes, into the probability of yes, divided by the probability of sunny.

So we have all the values here calculated.

So if you put that in our Bayes theorem equation, we get the likelihood of yes

as 0.59. Similarly, the likelihood of no can also be calculated, which here is 0.40. Now similarly,

we are going to create the likelihood tables for both the humidity and the wind. So for humidity, the likelihood of yes given that the humidity is high is equal to 0.42, and the probability of playing no given that the humidity is high is 0.58.

Similarly, for the wind table, the probability of yes given that the wind is weak is 0.75, and the probability of no given that the wind is weak is 0.25. Now suppose we have a day which has rain, which has high humidity, and on which the wind is weak.

So should we play or not? For that we use Bayes theorem here again. The likelihood of yes on that day is equal to the probability of the outlook being rain given that it's a yes, into the probability of high humidity given that it's a yes, into the probability of the wind being weak given that we are playing yes, into the probability of yes, which equals 0.019. Similarly, the likelihood of no on that day is equal to 0.016.

Now, to get the probability of yes for playing that day, we just need to divide it by the likelihood sum of both yes and no. So the probability of playing tomorrow, which is yes, is 0.55, whereas the probability of not playing is equal to 0.45.
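The final normalization step can be sketched in a couple of lines. The two unnormalized likelihoods (0.019 for yes, 0.016 for no) are the rounded values quoted above, so the result only approximately reproduces the 0.55 / 0.45 split:

```python
# Normalizing the two class likelihoods from the walkthrough (rounded values).
likelihood_yes = 0.019
likelihood_no = 0.016

p_play = likelihood_yes / (likelihood_yes + likelihood_no)
p_no_play = likelihood_no / (likelihood_yes + likelihood_no)

print(round(p_play, 2), round(p_no_play, 2))   # ~0.54 and ~0.46, i.e. roughly 0.55 / 0.45
```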

Now, this is based upon the data which we already have with us.

So now that you have an idea of what exactly Naive Bayes is, how it works, and we have seen how it can be implemented on a particular data set,

let's see where it is used in the industry.

Let's start with our first industrial use case, which is news categorization, or we can use the term text classification to broaden the spectrum of this algorithm. News on the web is growing rapidly in the era of the information age, where each news site has its own different layout and categorization for grouping news.

Now, this heterogeneity of layout and categorization cannot always satisfy an individual user's needs. Removing this heterogeneity and classifying the news articles

according to the user's preference is a formidable task. Companies use web crawlers to extract useful text from the HTML pages of the news articles, and each of these news articles is then tokenized. These tokens are nothing but the categories of the news. In order to achieve better classification results,

we remove the less significant words, which are the stop words, from the documents or articles, and then we apply the Naive Bayes classifier for classifying the news content.

Now, this is by far one of the best examples of the Naive Bayes classifier, which is spam filtering.

Naive Bayes classifiers are

a popular statistical technique for email filtering.

They typically use bag-of-words features to identify spam email, an approach commonly used in text classification as well.

It works by correlating the use of tokens with spam and non-spam emails, and then Bayes theorem, which I explained earlier, is used to calculate the probability that an email is or is not spam. So Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of an individual user and give low false positive spam detection rates that are generally acceptable to users.

It is one of the oldest ways of doing spam filtering, with its roots in the 1990s. Particular words have particular probabilities of occurring in spam

and in legitimate email as well. For instance,

most email users will frequently encounter the words "lottery" or "lucky draw" in a spam email, but will seldom see them in other emails.

The filter doesn't know these probabilities in advance and must first be trained

so it can build them up. To train the filter,

the user must manually indicate whether a new email is spam or not. For all the words in each training email,

the filter will adjust the probability that each word will appear in spam or legitimate

email in its database. Now, after training, the word probabilities, also known as the likelihood functions, are used to compute the probability that an email with a particular set of words in it belongs to either category. Each word in the email contributes to the email's spam probability.

This contribution is called the posterior probability and is computed again using Bayes theorem. Then the email's spam probability is computed over all the words in the email, and if the total exceeds a certain threshold, say 95%, the filter will mark the email as spam.

Now, object detection is the process of finding instances of real-world objects such as faces, bicycles and buildings in images or videos. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category. Here again, Naive Bayes plays an important role in the categorization and classification of objects. Now, in the medical area,

there is an increasingly voluminous amount of electronic data, which is becoming more and more complicated.

The produced medical data has certain characteristics that make the analysis very challenging and attractive as well. Among all the different approaches,

Naive Bayes is used.

It is an effective and efficient classification algorithm and has been successfully applied to many medical problems. Empirical comparison of Naive Bayes versus five popular classifiers on medical data sets shows that Naive Bayes is well suited for medical applications and has high performance on most of the examined medical problems.

Now, in the past, various statistical methods have been used for modeling in the area of disease diagnosis.

These methods require prior assumptions and are less capable of dealing with massive, complicated, nonlinear and dependent data. One of the main advantages of the Naive Bayes approach which is appealing to physicians is that all the available information is used to explain the decision. This explanation seems to be natural for medical diagnosis and prognosis,

that is, it is very close to the way physicians diagnose patients. Now, weather is one of the most influential factors in our daily life, to an extent that it may affect the economy of a country that depends on occupations like agriculture.

Therefore, as a countermeasure to reduce the damage caused by uncertainty in weather behaviour, there should be an efficient way to predict the weather. Weather prediction has been a challenging problem in the meteorological department for years. Even after technological and scientific advancement, the accuracy of weather prediction has never been sufficient. Even in the current day, this domain remains a research topic in which scientists and mathematicians are working to produce a model or an algorithm that will accurately predict the weather. Now, a Bayesian-approach-based model is created, where posterior probabilities are used to calculate the likelihood of each class label for an input

data instance, and the one with the maximum likelihood is considered as the resulting output. Now, earlier

we saw a small implementation of this algorithm as well, where we predicted whether we should play or not based on the data which we had collected earlier.

Now, there is a Python library known as scikit-learn; it helps build a Naive Bayes model in Python.

There are three types of Naive Bayes models under the scikit-learn library.

The first one is the Gaussian.

It is used in classification, and it assumes that the features follow a normal distribution.

Next we have the multinomial.

It is used for discrete counts.

For example, let's say we have a text classification problem. Here we consider Bernoulli trials, which is one step further, and instead of "word occurring in the document",

we count how often the word occurs in the document. You can think of it as the number of times an outcome is observed over the given number of trials.

And finally we have the Bernoulli type

of Naive Bayes.

The Bernoulli model is useful if your feature vectors are binary, as in a bag-of-words model where the ones and the zeros indicate the words which occur in the document and the words which do not occur in the document, respectively. Based on your data set,

you can choose any of the models discussed here, which are the Gaussian, the multinomial or the Bernoulli.
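A minimal sketch of the Gaussian variant with scikit-learn, using the Iris data purely as a stand-in for numeric features:

```python
# Sketch: Gaussian Naive Bayes with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB    # MultinomialNB and BernoulliNB live in the same module

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

model = GaussianNB()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))            # accuracy on the held-out data
```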

So let's understand how this algorithm works,

and what different steps one can take to create a Bayesian model and use Naive Bayes to predict the output. So here, to understand better,

we are going to predict the onset of diabetes. This problem comprises 768 observations of medical details for Pima Indian patients.

The records describe instantaneous measurements taken from the patient, such as their age, the number of times pregnant and blood workup values. All the patients are women aged 21 and older, all the attributes are numeric, and the units vary from attribute to attribute.

Each record has a class value that indicates whether the patient suffered an onset of diabetes within five years of the measurements.

These are classified as 0 or 1.

I've broken the whole process down into the following steps.

The first step is handling the data, in which we load the data from the CSV file and split it into training and test data sets. The second step is summarizing the data,

in which we summarize the properties of the training data set so that we can calculate probabilities and make predictions.

The third step is making a particular prediction:

we use the summaries of the data set to generate a single prediction.

After that, we generate predictions given a test data set and a summarized training data set.

Finally, we evaluate the accuracy of the predictions made for the test data set as the percentage correct out of all the predictions made, and we tie it all together and form

our own model of a Naive Bayes classifier.

Now

The first thing we need to dois load our data the data is in the CSV formatwithout a header line or any codes

We can open the filewith the open function and read the data linesusing the read functions in the CSV module

Now, we also needto convert the attributes that were loaded asstrings into numbers so that we can work with them

So let me show you how this can be implemented. For that, you need to install Python on your system and use the Jupyter Notebook or the Python shell.

Here, I'm using the Anaconda Navigator, which has all the things required to do the programming in Python.

We have the JupyterLab.

We have the notebook

We have the QT console

We even have RStudio as well.

So what you need to do is just install the Anaconda Navigator; it comes with Python preinstalled as well. The moment you click launch on the Jupyter Notebook,

it will take you to the Jupyter homepage on your local system, and here you can do programming in Python.

So let me just rename it as Pima Indian diabetes.

So first, we needto load the data set

So I'm creating here a function loadCsv. Now, before that,

we need to import the csv, math and random modules.

So as you can see, I've created a loadCsv function which reads the Pima Indian diabetes CSV file using the csv.reader method, and then we convert every element of that data set into float. Originally all the elements are strings, but we need to convert them into floats for calculation purposes.
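A minimal sketch of such a load function could look like this (the file name here is just a placeholder for wherever you have saved the Pima Indian diabetes CSV):

import csv

def load_csv(filename):
    # read every row and convert each value from string to float
    with open(filename) as handle:
        rows = csv.reader(handle)
        return [[float(value) for value in row] for row in rows if row]

dataset = load_csv('pima-indians-diabetes.data.csv')  # placeholder file name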

Next, we need to split the data into a training data set that Naive Bayes can use to make predictions and a test data set that we can use to evaluate the accuracy of the model.

We need to split the data set randomly into training and testing data sets, usually in a ratio of 70:30,

but for this example, I'm going to use 67 and 33. Now, 70:30 is a common ratio for testing algorithms, so you can play around with this number.

So this is our splitDataset function.
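As a rough sketch, assuming the load_csv helper above, the split could be done like this:

import random

def split_dataset(dataset, split_ratio):
    # randomly move rows into the training set until it reaches the desired size
    train_size = int(len(dataset) * split_ratio)
    train_set, copy = [], list(dataset)
    while len(train_set) < train_size:
        train_set.append(copy.pop(random.randrange(len(copy))))
    return train_set, copy  # whatever is left over becomes the test set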

Now, the Naive Bayes model is comprised of a summary of the data in the training data set.

Now this summary is then usedwhile making predictions

Now, the summary of the training data collected involves the mean and the standard deviation of each attribute, by class value. For example, if there are two class values and seven numerical attributes, then we need a mean and a standard deviation for each of these seven attributes for each class value, which makes 14 attribute summaries. So we can break the preparation of this summary down into the following sub-tasks: separating data by class, calculating the mean, calculating the standard deviation, summarizing the data set, and summarizing attributes by class.

So the first task is to separate the training data setinstances by class value so that we can calculatestatistics for each class

We can do that by creating a map of each class valueto a list of instances that belong to the class

and sort the entire dataset of instances into the appropriate lists.

Now, the separateByClass function does just that.

So as you can see, the function assumes that the last attribute is the class value. The function returns a map of class values to lists of data instances.
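A minimal sketch of such a separateByClass function, assuming the class value is the last attribute of each row, might be:

def separate_by_class(dataset):
    # map each class value to the list of rows that belong to it
    separated = {}
    for row in dataset:
        separated.setdefault(row[-1], []).append(row)
    return separated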

Next, we need to calculate the mean of each attribute for a class value.

Now, the mean is the middle, or the central tendency, of the data, and we use it as the middle of our Gaussian distribution when calculating the probabilities.

So this is our function for the mean.

We also need to calculatethe standard deviation of each attributefor a class value

The standard deviation is calculated as the square root of the variance, and the variance is calculated as the average of the squared differences of each attribute value from the mean. Now, one thing to note here is that we are using the n-1 method, which subtracts one from the number of attribute values when calculating the variance.
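A small sketch of these two helpers, using the n-1 method just described:

import math

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    # sample standard deviation: divide by n-1, then take the square root
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1)
    return math.sqrt(variance)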

Now that we have the toolsto summarize the data for a given list of instances

We can calculate the meanand standard deviation for each attribute

Now, the zip function groups the values for each attribute across our data instances into their own lists, so that we can compute the mean and standard deviation values for each attribute.

Now next comes the summarizingattributes by class

We can pull it all together by first separating our training data set into instances grouped by class, then calculating the summaries for each attribute.
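Putting those pieces together, a hedged sketch of the two summary functions (reusing the helpers above) could be:

def summarize(dataset):
    # zip(*dataset) turns rows into columns so each attribute gets its own tuple
    summaries = [(mean(column), stdev(column)) for column in zip(*dataset)]
    del summaries[-1]  # the class column itself does not need a summary
    return summaries

def summarize_by_class(dataset):
    separated = separate_by_class(dataset)
    return {class_value: summarize(rows) for class_value, rows in separated.items()}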

Now we are ready to make predictions using the summaries prepared from our training data. Making predictions involves calculating the probability that a given data instance belongs to each class, then selecting the class with the largest probability as the prediction.

Now, we can divide this whole method into four tasks: calculating the Gaussian probability density function, calculating the class probabilities, making a prediction, and then estimating the accuracy. Now, to calculate the Gaussian probability density function,

we use the Gaussian function to estimate the probability of a given attribute value, given the known mean and standard deviation of the attribute estimated from the training data.

As you can see, the parameters are x, the mean and the standard deviation. Now, in the calculateProbability function, we calculate the exponent first and then the main division; this lets us fit the equation nicely into two lines.
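A minimal sketch of that calculateProbability function, written exactly in the two-step style described:

import math

def calculate_probability(x, mean, stdev):
    # Gaussian probability density: exponent first, then the main division
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent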

Now, the next task is calculating the class probabilities. Now that we can calculate the probability of an attribute belonging to a class,

we can combine the probabilities of all the attribute values for a data instance and come up with the probability of the entire

Our data instancebelonging to the class

So now that we have calculated the class probabilities,

it's time to finally make our first prediction. Now, we can calculate the probability of a data instance belonging to each class value, look for the largest probability and return the associated class, and for that we are going to use this predict function, which uses the summaries and the input vector, which is basically the attribute values being input for a particular label.
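A hedged sketch of these two steps, assuming the summaries produced earlier, might look like this:

def calculate_class_probabilities(summaries, input_vector):
    # multiply the per-attribute Gaussian probabilities together for each class
    probabilities = {}
    for class_value, class_summaries in summaries.items():
        probabilities[class_value] = 1.0
        for i, (attr_mean, attr_stdev) in enumerate(class_summaries):
            probabilities[class_value] *= calculate_probability(input_vector[i], attr_mean, attr_stdev)
    return probabilities

def predict(summaries, input_vector):
    # the class with the largest combined probability wins
    probabilities = calculate_class_probabilities(summaries, input_vector)
    return max(probabilities, key=probabilities.get)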

Now, finally, we can estimate the accuracy of the model by making predictions for each data instance in our test data; for that we use the getPredictions method.

Now this method is used to calculate the predictionsbased upon the test data sets and the summaryof the training data set

Now, the predictions can be compared to the class values in our test data set, and the classification accuracy can be calculated as an accuracy ratio between 0 and 100 percent.

Now, the getAccuracy method will calculate this accuracy ratio.
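Sketched out, again assuming the predict function above, these two methods are only a few lines each:

def get_predictions(summaries, test_set):
    return [predict(summaries, row) for row in test_set]

def get_accuracy(test_set, predictions):
    # percentage of test rows whose actual class matches the predicted class
    correct = sum(1 for row, label in zip(test_set, predictions) if row[-1] == label)
    return correct / float(len(test_set)) * 100.0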

Now finally to sum it all up

We define our main function, and we call all these methods which we have defined earlier, one by one, to get the accuracy of the model which we have created.

So as you can see, this is our main functionin which we have the file name

We have defined the split ratio

We have the data set

We have the trainingand test data set

We are using the splitdata set method next

We are using the summarizeByClass function, and the getPredictions and getAccuracy methods as well.
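Tied together, a rough sketch of that main function (the file name is again a placeholder) would be:

def main():
    split_ratio = 0.67
    dataset = load_csv('pima-indians-diabetes.data.csv')  # placeholder file name
    training_set, test_set = split_dataset(dataset, split_ratio)
    print('Split %d rows into train=%d and test=%d rows' % (len(dataset), len(training_set), len(test_set)))
    summaries = summarize_by_class(training_set)
    predictions = get_predictions(summaries, test_set)
    print('Accuracy: %.2f%%' % get_accuracy(test_set, predictions))

main()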

So guys, as you can see, the output of this gives us that we are splitting the 768 rows into 514 training rows and 254 test rows, and the accuracy of this model is 68%. Now, we can play with the amount of training and test data to be used, so we can change the split ratio to 70:30 or 80:20 to get a different sort of accuracy. So suppose I change the split ratio from 0.67 to 0.8.

So as you can see, we get the accuracyof 62 percent

So splitting it at 0.67 gave us a better result, which was 68 percent.

So this is how you can implement a Gaussian Naive Bayes classifier.

These are the step-by-step methods which you need to write when building the Naive Bayes classifier yourself, but don't worry.

We do not need to write this many lines of code to make a model. This is where scikit-learn comes into the picture: the scikit-learn library has a predefined method, or say a predefined function, for Naive Bayes which converts all of these lines of code into merely two or three lines of code.

So, let me just openanother jupyter notebook

So let me name it as sklearn Naive Bayes.

Now, here we are going to use the most famous data set, which is the iris data set.

Now, the iris flower data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher, and based on Fisher's linear discriminant model this data set became a typical test case for many statistical classification techniques in machine learning.

So here we are going to use the GaussianNB model, which is already available in sklearn.

As I mentioned earlier, there are three types of Naive Bayes, which are the Gaussian, the multinomial and the Bernoulli.

So here we are going to use the GaussianNB model, which is already present in the sklearn library, that is, the scikit-learn library.

So first of all, what we need to do is import the sklearn datasets and metrics, and we also need to import GaussianNB. Now, once all these libraries are loaded, we need to load the data set, which is the iris dataset.

The next thing we need to do is fit a Naive Bayes model to this data set.

So as you can see, we have so easily defined the model, which is the GaussianNB, and it contains all the programming which I just showed you earlier: all the methods which take the input, calculate the mean and the standard deviation, separate it by class, and finally make predictions,

Calculating theprediction accuracy

All of this comes under the GaussianNB method, which is already present in the sklearn library.

We just need to fit it to the data set which we have. Next, if we print the model, we see that it is the GaussianNB model.

The next what we need to dois make the predictions

So the expected output is dataset.target, and the predicted output is obtained using the predict method of the model, and the model we are using here is GaussianNB.

Now, to summarize the model which we created, we calculate the confusion matrix and the classification report.

So guys, as you can see in the classification report, we have a precision of 0.96, we have a recall of 0.96,

We have the F1 score and the support and finally ifwe print our confusion Matrix, as you can see it givesus this output
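For reference, the whole scikit-learn version described here fits in roughly these lines (a sketch using the standard sklearn API):

from sklearn import datasets, metrics
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
model = GaussianNB()
model.fit(iris.data, iris.target)          # fit the Gaussian Naive Bayes model

expected = iris.target
predicted = model.predict(iris.data)

print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))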

So as you can see, using the GaussianNB method, just putting it in the model and using any of the data,

fitting the model which you created to a particular data set and getting the desired output is so easy with the scikit-learn library. Now, Support Vector Machine is one of the most effective machine learning classifiers, and it has been used in various fields such as face recognition, cancer classification and so on. Today's session is dedicated to how SVM works, the various features of SVM and how it is used in the real world.

All right

Okay

Now let's move on and seewhat svm algorithm is all about

So guys, SVM, or Support Vector Machine, is a supervised learning algorithm which is mainly used to classify data into different classes. Now, unlike most algorithms, SVM makes use of a hyperplane which acts like a decision boundary between the various classes. In general, SVM can be used to generate multiple separating hyperplanes so that the data is divided into segments.

Okay, and each of these segments will containonly one kind of data

It's mainly used for classification purposes, wherein you want to classify your data into two different segments depending on the features of the data.

Now before moving any further, let's discuss a fewfeatures of svm

Like I mentioned earlier svm isa supervised learning algorithm

This means that SVM trains on a set of labeled data. SVM studies the labeled training data and then classifies any new input data depending on what it learned in the training phase. A main advantage of Support Vector Machine is that it can be used for both classification and regression problems.

All right

Now even though svm is mainlyknown for classification the svr which is the supportVector regressor is used for regression problems

All right, so svm can be usedboth for classification

And for regression

Now, this is one of the reasons why a lot of people prefer SVM: it's a very good classifier and, along with that, it is also used for regression.

Okay

Another feature is the SVM kernel functions. SVM can be used for classifying nonlinear data by using the kernel trick. The kernel trick basically means to transform your data into another dimension so that you can easily draw a hyperplane between the different classes of the data.

Alright, nonlinear datais basically data which cannot be separatedwith a straight line

Alright, so svm can even be usedon nonlinear data sets

You just have to use a kernel function to do this.
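As a small illustration of that idea (the toy make_circles data below is my own choice, not part of the session), scikit-learn lets you switch kernels with a single parameter:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# toy nonlinear data: one class forms a ring around the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel='linear').fit(X, y)
rbf_svm = SVC(kernel='rbf').fit(X, y)      # kernel trick: implicit mapping to a higher dimension

print('linear kernel accuracy:', linear_svm.score(X, y))
print('rbf kernel accuracy:', rbf_svm.score(X, y))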

All right

So guys, I hopeyou all are clear with the basic concepts of svm

Now, let's move on and look at how SVM works. So, in order to understand how SVM works, let's consider a small scenario. Now, for a second, pretend that you own a farm.

Okay, and let's saythat you have a problem and you want to set up a fenceto protect your rabbits from the pack of wolves

Okay, but where do you build your fence? One way to get around the problem is to build a classifier based

on the position of the rabbits and wolves in your pasture.

So what I'm telling you is, you can classify the group of rabbits as one group and draw a decision boundary between the rabbits and the wolves, correct?

So if I do that and if I tryto draw a decision boundary between the rabbitsand the Wolves, it looks something like this

Okay

Now you can clearly builda fence along this line in simple terms

This is exactly how SVM works: it draws a decision boundary, which is a hyperplane, between any two classes in order to separate them or classify them.

I know you're thinking, how do you know where to draw a hyperplane? The basic principle behind SVM is to draw a hyperplane that best separates the two classes, in our case the two classes of the rabbits and the wolves.

So you start off by drawing a random hyperplane and then you check the distance between the hyperplane and the closest data points from each class. These closest data points to the hyperplane are known as support vectors.

And that's where the name comesfrom support Vector machine

So basically thehyperplane is drawn based on these support vectors

So guys an optimalhyperplane will have a maximum distance from eachof these support vectors

All right

So basically the hyperplanewhich has the maximum distance from the support vectors isthe most optimal hyperplane and this distancebetween the hyperplane and the support vectorsis known as the margin

All right, so to sum it up svmis used to classify data

By using a hyper plane suchthat the distance between the hyperplane andthe support vectors is maximum

So basically your marginhas to be maximum

All right, that way, you know that you're actuallyseparating your classes or add because the distance betweenthe two classes is maximum

Okay

Now, let's tryto solve a problem

Okay

So let's say that I inputa new data point

Okay

This is a new data point and now I want to drawa hyper plane such that it best separatesthe two classes

Okay, so I start offby drawing a hyperplane

Like this and thenI check the distance between the hyperplaneand the support vectors

Okay, so I'm trying to check if the margin is maximumfor this hyper plane, but what if I draw a hyperplanewhich is like this? All right

Now I'm going to checkthe support vectors over here

Then I'm going to check the distance from the support vectors, and for this hyperplane it's clear that the margin is more.

When you compare the marginof the previous one to this hyperplane

It is more

So the reason why I'm choosingthis hyperplane is because the Distancebetween the support vectors and the hyperplane is maximumin this scenario

Okay

So guys, this ishow you choose a hyperplane

You basically have to make sure that the hyper planehas a maximum

Margin

All right, it has to bestseparate the two classes

All right

Okay so far it was quite easy

Our data was linearly separable which means that youcould draw a straight line to separate the two classes

All right, but what will you do if the data set is like this? You possibly can't draw a hyperplane like this on it; it doesn't separate the two classes at all.

So what do you do in such situations now earlierin the session I mentioned how a kernel can be usedto transform data into another dimension that has a clear dividing marginbetween the classes of data

Alright, so kernel functionsoffer the user this option of transforming nonlinear spacesinto linear ones

Nonlinear data set is the one that you can't separateusing a straight line

All right

In order to dealwith such data sets, you're going to transform theminto linear data sets and then use svm on them

Okay

So simple trick would beto transform the two variables X and Y into a newfeature space involving a new variable called Z

All right, so guys so farwe were plotting our data on two dimensional space

Correct? We were only using the X and the Y axis, so we had only those two variables, X and Y. Now, in order to deal with this kind of data, a simple trick would be to transform the two variables X and Y into a new feature space involving a new variable called Z.

Okay, so we're basicallyvisualizing the data on a three-dimensional space

Now when you transformthe 2D space into a 3D space you can clearly seea dividing margin between the two classesof data right now

You can go aheadand separate the two classes by drawing the besthyperplane between them

Okay, that's exactly what we discussedin the previous slides

So guys, why don't you try this yourself? Try drawing a hyperplane which is the most optimum for these two classes.

All right, so guys, I hope you havea good understanding about nonlinear svm's now

Let's look at a real-world use case of Support Vector Machines.

So guys, SVM as a classifier has been used in cancer classification since the early 2000s.

So there was an experiment heldby a group of professionals who applied svm in a coloncancer tissue classification

So the data set consisted of transmembrane protein samples, and only about 50 to 200 gene samples were input into the SVM classifier. Now, this sample which was input into the SVM classifier had both colon cancer tissue samples and normal colon tissue samples.

The main objective of this studywas to classify Gene samples based on whether theyare cancerous or not

Okay, so SVM was trained using the 50 to 200 samples in order to discriminate non-tumor from tumor specimens.

So the performance of the svm classifierwas very accurate for even a small data set

All right, we had only 50 to 200 samples, and even for that small data set SVM was pretty accurate with its results.

Not only that, its performance was compared to other classification algorithms like Naive Bayes, and in each case SVM outperformed Naive Bayes.

So after this experiment it was clear that SVM classified the data more effectively and it worked exceptionally well on small data sets.

Let's go ahead and understand what exactlyis unsupervised learning

So sometimes the given datais unstructured and unlabeled so it becomes difficultto classify the data into different categories

So unsupervised learninghelps to solve this problem

This learning is used to cluster the input data into classes on the basis of their statistical properties.

So, for example, we can cluster different bikes based upon their speed limit, their acceleration or the average mileage they are giving. So unsupervised learning is a type of machine learning algorithm used to draw inferences from data sets consisting of input data without labeled responses.

So if you have a look at the workflow or the process flow of unsupervised learning, the training data is a collection of information without any labels.

We have the machine learning algorithm and then we have the clustering models.

So what it does is distribute the data into different clusters.

And again, if you provide any unlabeled new data, it will make a prediction and find out to which cluster that particular data point belongs. So one of the most important techniques in unsupervised learning is clustering.

So let's understand exactlywhat is clustering

So a clustering basically is the processof dividing the data sets into groups consistingof similar data points

It means grouping of objects based on the information found in the data describing the objects or their relationships. So clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong. Now, this is done without the benefit of prior knowledge about the groups and their characteristics.

So and in fact, we may not even know exactlyhow many groups are there to look for

Now

These models are often referred to as unsupervised learning models, since there is no external standard by which to judge the model's classification performance.

There are no right or wrong answers for these models.

And if we talk about why clustering is used: the goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Sometimes the partitioning itself is the goal, or the purpose of the clustering algorithm is to make sense of and extract value from large sets of structured and unstructured data.

So that is why clustering is used in the industry, and if you have a look at the various use cases of clustering in the industry:

So first of all,it's being used in marketing

So, discovering distinct groups in customer databases, such as customers who make a lot of long-distance calls or customers who use the internet more than calls. It is also used in insurance companies,

for example to identify groups of insurance policy holders with a high average claim rate, or to identify which cash crops are profitable for farmers.

It is also used in seismic studies to define problem areas of oil or gas exploration based on seismic data, and it's also used in the recommendation of movies.

It is also used for grouping Flickr photos,

and it is also used by Amazon for recommending products based on which category a product lies in.

So basically if we talk about clustering there arethree types of clustering

So first of all, we have exclusive clustering, which is hard clustering. Here an item belongs exclusively to one cluster, not several clusters, and each data point belongs exclusively to one cluster.

An example of this is k-means clustering: k-means does this exclusive kind of clustering. Secondly, we have overlapping clustering, also known as soft clustering. In this, an item can belong to multiple clusters, and its degree of association with each cluster is shown; for example, we have fuzzy or C-means clustering, which is used for overlapping clustering. And finally we have hierarchical clustering: when two clusters have a parent-child relationship, or a tree-like structure, then it is known as hierarchical clustering.

So as you can see herefrom the example, we have a parent-child kind of relationship inthe cluster given here

So let's understand what exactly isK means clustering

So k-means clustering is an algorithm whose main goal is to group similar data points into a cluster, and it is the process by which objects are classified into a predefined number of groups, so that they are as dissimilar as possible from one group to another group but as similar as possible within each group. Now, if you have a look at the algorithm working here: first of all it starts with identifying the number of clusters, which is K; then we find the centroids; we find the distance of each object to each centroid; then we do the grouping based on the minimum distance; and we check whether the centroids have converged. If true, then we have our clusters; if false, we again find the centroids and repeat all of the steps again and again. So let me show you how exactly clustering works with an example here.

So first we need to decide the number of clusters to be made. Now, another important task here is how to decide the number of clusters; we'll get into that later.

So first, let's assume that the number of clusters we have decided is 3. After that, we provide the centroids for all the clusters, which is done by guessing, and the algorithm calculates the Euclidean distance of each point from each centroid and assigns the data point to the closest cluster. Now, Euclidean distance,

as all of you know, is the square root of the sum of the squared differences between the coordinates.

Next, the centroids are calculated again, and we have our new clusters for each data point.

And again the distancefrom the points to the new clustersare calculated and then again, the points are assignedto the closest cluster

And then again, we have the new centroids calculated, and these steps are repeated until the centroids stop changing, or the new centroids are very close to the previous ones.

So until the output gets repeated, or the outputs are close enough, we do not stop this process.

We keep on calculating the Euclidean distances of all the points to the centroids.

Then we calculate the new centroids, and that is basically how k-means clustering works.
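As a rough from-scratch sketch of that loop (random initial centroids, no handling of empty clusters):

import numpy as np

def kmeans(points, k, iterations=100):
    # start from k randomly chosen points as the initial centroids
    centroids = points[np.random.choice(len(points), k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        # assign every point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # recompute each centroid as the mean of the points assigned to it
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so we are done
        centroids = new_centroids
    return centroids, labels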

So an important part here is to understand how to decide the value of K, or the number of clusters, because it does not make any sense if you do not know how many clusters you are going to make. To decide the number of clusters, we have the elbow method.

So first of all, let's compute the sum of squared errors, which is the SSE, for some values of K, for example 2, 4, 6 and 8. Now, the SSE is defined as the sum of the squared distances between each member of the cluster and its centroid. Mathematically, it is given by the equation provided here.

And if you plot K against the SSE, you will see that the error decreases as K gets larger. This is because as the number of clusters increases, the clusters become smaller,

so the distortion is also smaller. Now, the idea of the elbow method is to choose the K at which the SSE decreases abruptly.

So for example here if we have a lookat the figure given here

We see that the best number of clusters is at the elbow; as you can see here, the graph changes abruptly after the number four.

So for this particular example, we're going to use four as the number of clusters.
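A short sketch of the elbow method with scikit-learn (the random placeholder data is only there to make the snippet runnable):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(300, 2)  # placeholder data

sse = {}
for k in (2, 4, 6, 8):
    model = KMeans(n_clusters=k, random_state=0).fit(X)
    sse[k] = model.inertia_  # inertia_ is the SSE: squared distances to the nearest centroid

print(sse)  # pick the k where the SSE stops dropping sharply - the elbow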

So, while working with k-means clustering there are two key points to know. First of all, be careful about where you start: choose the first center at random, choose the second center so that it is far away from the first center, and similarly choose the next center as far away as possible from the closest of all the other centers. The second idea is to do as many runs of k-means as possible, each with different random starting points, so that you get an idea of exactly how many clusters you need to make, where exactly the centroids lie,

and how the data is getting clustered. Now, k-means is not exactly a perfect method,

so let's understand the pros and cons of k-means clustering.

We know that k-means is simple and understandable; everyone gets it at the first go, and the items are automatically assigned to the clusters.

Now, if we have a look at the cons: first of all, one needs to define the number of clusters, which is a very heavy task. Suppose we have three, four or ten categories, and we do not know what the number of clusters is going to be;

it is difficult for anyone to guess the number of clusters. Also, all items are forced into clusters: whether or not they actually belong to any cluster or category, they are forced to lie in whichever category they are closest to. This again happens because of not defining, or not being able to guess, the correct number of clusters.

And most of all, it's unable to handle noisy data and outliers, even though, as machine learning engineers and data scientists, we have to clean the data.

But then again, it comes down to the analysis they are doing and the method that they are using. Typically, people do not clean the data for k-means clustering, and even if they clean it, there is sometimes noisy and outlier data which affects the whole model. So that was all for k-means clustering theory.

So what we're going to do now is use k-means clustering on a movie data set; we have to find the number of clusters and divide the data accordingly.

So the use case isthat first of all, we have a data setof five thousand movies

And what we want to do is group the movies into clusters based on their Facebook likes. So guys, let's have a look at the demo here.

So first of all, what we're going to do is import deepcopy, numpy, pandas, seaborn and the various other libraries which we're going to use, and from matplotlib we use pyplot with the ggplot style. Next, what we're going to do is import the data set and look at the shape of the data set. If we have a look at the shape of the data set, we can see that it has 5043 rows with 28 columns,

and if we have a look at the head of the data set, we can see it has 5043 data points. So what we're going to do is place the data points in a plot. We take the director Facebook likes, and we have a look at the data columns such as the face number in the poster, the cast total Facebook likes and the director Facebook likes.

So what we have done here is take the director Facebook likes and the actor 3 Facebook likes, right?

So we have 5043 rows and two columns. Now, to use k-means from sklearn, what we're going to do is import it.

First we're going to import KMeans from sklearn.cluster.

Remember guys, sklearn is a very important library in Python for machine learning.

And the number of clusters we're going to provide is five. Now again, the number of clusters depends upon the SSE, which is the sum of squared errors, or we can use the elbow method.

So I'm not going to gointo the details of that again

So we're going to fit the data using the k-means fit method, find the cluster centers for the k-means and print them. What we find is an array of five cluster centers, and we can also print the labels of the k-means clusters.

Next, what we're going to do is plot the data which we have with the new clusters which we have found, and for this we're going to use seaborn. As you can see here, we have plotted the data onto the grid, and you can see that we have five clusters.
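A condensed sketch of that demo (the file name and the exact column names are assumptions based on the description above):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

movies = pd.read_csv('movie_metadata.csv')  # assumed file name for the 5043-movie data set
X = movies[['director_facebook_likes', 'actor_3_facebook_likes']].dropna().values

kmeans = KMeans(n_clusters=5, random_state=0).fit(X)
print(kmeans.cluster_centers_)

sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=kmeans.labels_, palette='deep')
plt.show()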

So probably what I would say is that the cluster3 and the cluster zero are very very close

So they might overlap; see, that's exactly what I was going to say.

Initially, the main challenge in k-means clustering is to define the number of centers, which is the K.

So as you can see here, the third cluster and the zeroth cluster are very close to each other, so they probably could have been one single cluster, and another disadvantage is that we do not exactly know how the points are to be arranged.

So it's very difficult to force the data into any other cluster, which makes our analysis a little difficult. It works fine,

but sometimes it might be difficult to encode with k-means clustering. Now, let's understand what exactly C-means clustering is.

So fuzzy C-means is an extension of k-means, the popular simple clustering technique. Fuzzy clustering, also referred to as soft clustering, is a form of clustering in which each data point can belong to more than one cluster.

So k-means tries to find the hard clusters, where each point belongs to one cluster,

whereas fuzzy C-means discovers the soft clusters. In a soft cluster, any point can belong to more than one cluster at a time, with a certain affinity value towards each. Fuzzy C-means assigns a degree of membership, which ranges from 0 to 1, of an object to a given cluster.

So there is a stipulation that the sum of the memberships of an object to all the clusters it belongs to must be equal to 1. So the degrees of membership of this particular point to both of these clusters are 0.6 and 0.4,

and if you add them up we get 1. So that is the logic behind fuzzy C-means, and this affinity is proportional to the distance from the point to the center of the cluster.
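As a small illustration of that membership idea (this is the standard fuzzy C-means membership formula, not code from the session), closer centers get higher degrees and the degrees always sum to 1:

import numpy as np

def fuzzy_memberships(point, centers, m=2.0):
    # degree of membership of one point to each center; assumes the point is not exactly on a center
    d = np.linalg.norm(centers - point, axis=1)
    inv = (1.0 / d) ** (2.0 / (m - 1))
    return inv / inv.sum()

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
print(fuzzy_memberships(np.array([1.0, 0.0]), centers))  # roughly [0.9, 0.1], which sums to 1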

Now, again, we have the pros and cons of fuzzy C-means. First of all, it allows a data point to be in multiple clusters.

That's a pro

It's a more natural representation of the behavior of genes; genes usually are involved in multiple functions,

so it is a very good type of clustering when we're talking about genes. And again, if we talk about the cons, we have to define C, which is the number of clusters, same as K. Next,

we need to determine the membership cutoff value also, so that takes a lot of time and it's time-consuming, and the clusters are sensitive to the initial assignment of centroids.

So a slight change or deviation of the centroids is going to result in a very different kind of output from fuzzy C-means, and one of the major disadvantages of C-means clustering is that it is a non-deterministic algorithm,

so it does not always give you the same output. That's that. Now let's have a look

At the third type of clustering which isthe hierarchical clustering

So hierarchical clustering is an alternative approach which builds a hierarchy from the bottom up, or from the top to the bottom, and does not require specifying the number of clusters beforehand.

Now, the algorithm works as follows: first of all, we put each data point in its own cluster, identify the closest two clusters and combine them into one cluster, and repeat the above step till all the data points are in a single cluster.

Now, there are two types of hierarchical clustering: one is agglomerative clustering and the other one is divisive clustering.

Agglomerative clustering builds the dendrogram from the bottom level, while divisive clustering starts with all the data points in one cluster, the root cluster.
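A minimal sketch of agglomerative clustering and its dendrogram using SciPy (the random data is just a placeholder):

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.random.rand(20, 2)          # placeholder data
Z = linkage(X, method='ward')      # agglomerative: repeatedly merge the closest clusters
dendrogram(Z)                      # the tree-like parent-child structure described above
plt.show()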

Now again, hierarchical clustering also has some pros and cons. In the pros: no assumption of a particular number of clusters is required, and it may correspond to meaningful taxonomies.

Whereas if we talk about the cons: once a decision is made to combine two clusters, it cannot be undone, and one of the major disadvantages of hierarchical clustering is that it becomes very slow

if we talk about very large data sets, and nowadays every industry is using large data sets and collecting large amounts of data.

So hierarchical clustering is not always the best method to go for. So there's that. Now, when we talk about unsupervised learning, we have k-means clustering, and again, another important topic which people usually miss while talking about unsupervised learning is the very important concept of Market Basket analysis.

Now, it is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way,

it allows retailers to identify the relationships between the items that people buy. For example, people who buy bread also tend to buy butter; the marketing team at the retail stores should target customers who buy bread and butter and provide them an offer so that they buy a third item, like eggs.

So if a customer buys bread and butter and sees a discount or an offer on eggs, he will be encouraged to spend more money and buy the eggs.

Now, this is what Market Basketanalysis is all about now to find the associationbetween the two items and make predictions aboutwhat the customers will buy

There are two techniques, which are association rule mining and the Apriori algorithm.

So let's discuss each of these algorithms with an example.

First of all, if we have a look at association rule mining, it's a technique that shows how items are associated with each other. For example, customers who purchase bread have a 60 percent likelihood of also purchasing jam, and customers who purchase laptops are more likely to purchase laptop bags.

Now, if we take an example of an association rule, if we have a look at the example here, A → B, it means that if a person buys an item A then he will also buy an item B.

Now

There are three common ways to measure a particular association, because we have to find these rules on the basis of some statistics, right? So what we do is use support, confidence and lift. These are the three common measures used to look at an association rule and know exactly how good that rule is.

So first of all, we have support. Support gives the fraction of transactions which contain items A and B,

so it's basically the frequency of the item set across all the transactions.

Whereas confidence gives how often the items A and B occur together, given the number of times A occurs.

So it's the frequency of (A, B) divided by the frequency of A. Now, lift indicates the strength of the rule over the random co-occurrence of A and B.

If you have a close look at the denominator of the lift formula here, we have support(A) multiplied by support(B), and a major thing which can be noted from this is that the supports of A and B appear independently here.

So if the denominator value of the lift is more, it means that the items are selling more independently, not together,

So that in turn will decreasethe value of lift

So what happens is, suppose the value of lift is more; that implies that the rule which we get is strong and can be used for later purposes, because in that case the support(A) times support(B) value, which is the denominator of lift, will be low, which in turn means that there is a relationship between the items A and B.
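These three measures are simple enough to sketch directly (a small illustration over a list of transactions, where each transaction is a set of items):

def support(itemset, transactions):
    # fraction of transactions that contain every item in the itemset
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # how often the consequent appears, given that the antecedent appears
    return support(set(antecedent) | set(consequent), transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    # strength of the rule over the random co-occurrence of the two sides
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)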

So let's take an exampleof Association rule Mining and understand howexactly it works

So let's suppose we have a set of items A, B, C, D and E, and we have a set of transactions, T1, T2, T3, T4 and T5, and what we need to do is create some rules. For example, you can see A → D, which means that if a person buys A he buys D; if a person buys C he buys A; if a person buys A he buys C;

and for the fourth one, if a person buys B and C he will in turn buy A. Now, what we need to do is calculate the support, confidence and lift of these rules. Now, here again we talk about the Apriori algorithm.

So the Apriori algorithm and association rule mining go hand in hand.

So what the Apriori algorithm does is use frequent item sets to generate the association rules, and it is based on the concept that a subset of a frequent item set must also be a frequent item set.

So let's understand what is a frequent item set andhow all of these work together

So if we take the following transactions of items, we have transactions T1 to T5, and the items are {1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5} and {1, 3, 5}.

Now, another important thing about support which I forgot to mention is that when talking about association rule mining, there is a minimum support count which we need to define.

Now

The first step is to build a list of item sets of size 1 using this transaction data set and apply the minimum support count. Now, let's see how we do that. If we create the table C1 and have a close look at it, we have the item set {1}, which has support 3 because it appears in transactions 1,

3 and 5. Similarly, if you have a look at the item set with the single item 3,

it has a support of 4, as it appears in T1, T2, T3 and T5. But if we have a look at the item set {4}, it only appears in transaction 1, so its support value is 1. Now, the item sets with a support value less than the minimum support value, which is 2, have to be eliminated.

So the final table, which is the table F1, has 1, 2, 3

and 5; it does not contain 4. Now,

what we're going to do is create the item sets of size 2 using all the combinations of the item sets in F1 from this iteration.

So, since 4 is left behind, we just have 1, 2, 3 and 5.

So the possible item sets are {1,2}, {1,3}, {1,5}, {2,3}, {2,5} and {3,5}. Then again,

we will calculate the support. So in this case, if we have a closer look at the table C2, we see that the item set {1,2} has a support value of 1, which has to be eliminated.

So the final table F2 does not contain {1,2}. Similarly, we create the item sets of size 3 and calculate their support values, but before calculating the support, let's perform pruning on the data set.

Now, what is pruning? After all the combinations are made, we go through the table C3 item sets to check if there is any subset whose support is less than the minimum support value.

This is the Apriori property.

So in the item set {1,2,3} we can see that we have {1,2}, and in {1,2,5} again we have {1,2}, so we will discard both of these item sets, and we'll be left with {1,3,5} and {2,3,5}.

So with {1,3,5}, we have the three subsets {1,5}, {1,3} and {3,5}, which are present in table F2.

Then again

we have {2,3}, {2,5} and {3,5}, which are also present in table F2, so we remove the item sets containing {1,2} from the table C3 and create the table F3. Now, using the item sets of F3 to create the item sets of C4,

what we find is that we have the item set {1,2,3,5}, whose support value is 1, which is less than the minimum support value of 2.

So what we're goingto do is stop here and we're going to returnto the previous item set

that is, the table F3. So the final table F3 has {1,3,5} with a support value of 2 and {2,3,5} with a support value of 2. Now, what we're going to do is generate all the subsets of each frequent item set.

So let's assume that the minimum confidence value is 60%. Then, for every subset S of I, the output rule is S → (I − S), that is, S recommends I − S,

if the support of I divided by the support of S is greater than or equal to the minimum confidence value; only then will we proceed further.

So keep in mind that we have not used lift till now;

We are only workingwith support and confidence

So applying the rules to the item sets of F3, we get rule 1, which is {1,3} → {5}, and its confidence is support{1,3,5} divided by support{1,3}.

It means that if you buy 1 and 3, there's a 66% chance that you will buy item 5 also. Similarly, the rule {1,5} → {3} means that if you buy 1 and 5, there's a hundred percent chance that you will buy 3 also. Similarly, if we have a look at rules 5 and 6 here, the confidence value is less than 60 percent, which was the assumed minimum confidence value.
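Reusing the small helper functions sketched earlier, you can check those two numbers against the five transactions from the slide:

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]

print(confidence({1, 3}, {5}, transactions))  # 2/3, i.e. roughly 66%
print(confidence({1, 5}, {3}, transactions))  # 1.0, i.e. 100%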

So what we're going to do is reject these rules. Now, an important thing to note here: if you have a closer look at rule 5 and rule 3, you see they involve the same items, 1, 5 and 3, just arranged differently.

It's very confusing

So one thing to keep in mind is that the order of the item sets is also very important; that will help us create good rules and avoid any kind of confusion.

So that's that

So now let's learn how association rules are used in Market Basket analysis problems.

So what we'll dois we will be using the online transactions data of a retail store forgenerating Association rules

So first of all, what you need to do is import the pandas and MLxtend libraries, and then read the data.

So first of all, what we're going to do is read the data, and from mlxtend.frequent_patterns we're going to import apriori and association_rules.

As you can see here

We have the head of the data

You can see we have the invoice number, stock code, description, quantity, invoice date, unit price, customer ID and the country.

So in the next step, what we will do is the data cleanup, which includes removing spaces from some of the descriptions given, and what we're going to do is drop the rows that do not have invoice numbers and remove the credit transactions.

So here, what we're going to do is remove the rows which do not have any invoice number; if the invoice number string marks a credit transaction, then we're going to remove that row, as those are the credits; and we remove any kind of spaces from the descriptions.

So as you can see here, we have like five hundredand thirty-two thousand rows with eight columns

So the next thing we're going to do, after the cleanup, is consolidate the items into one transaction per row, with one column per product. For the sake of keeping the data set small, we're going to only look at the sales for France. So we're going to filter for only France and group by the invoice number and description, with the quantity summed up, which leaves us with 392 rows and 1,563 columns.

Now, there are a lot of zeros in the data, but we also need to make sure any positive values are converted to a 1 and anything less than or equal to 0 is set to 0. For that, we're going to use this code defining encode_units: if x is less than or equal to 0, return 0; if x is greater than or equal to 1, return 1.

So what we're going to do is apply this mapping to the whole data set we have here.
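A hedged sketch of that basket-building and encoding step (df, the column names and the exact filter used for credits are assumptions based on the description above and the usual online-retail data set layout):

basket = (df[df['Country'] == 'France']
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().fillna(0))

def encode_units(x):
    # anything bought at least once becomes 1, everything else becomes 0
    return 1 if x >= 1 else 0

basket_sets = basket.applymap(encode_units)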

So now that we have structured the data properly, the next step is to generate the frequent item sets that have a support of at least seven percent.

Now this number is chosen sothat you can get close enough

Now, what we're goingto do is generate the rules with the correspondingsupport confidence and lift

So we have given the minimum support as 0.07.

The metric is lift, applied to the frequent item sets, and the threshold is 1. So these are the resulting rules. Now, a few rules have a high lift value, which means that the combination occurs more frequently than would be expected given the number of transactions and product combinations, and in most of the places the confidence

Is high as well

So these are a few of the observations we get here.

If we filter the data frame using standard pandas code for a large lift of 6 and a high confidence of 0.8,

This is what the outputis going to look like
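A compact sketch of those steps with MLxtend, assuming the one-hot basket_sets frame built above:

from mlxtend.frequent_patterns import apriori, association_rules

# frequent item sets with at least 7% support, keeping the product names as labels
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)

# rules scored by lift, keeping only those with lift >= 1
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)

# the filter mentioned above: strong rules with lift >= 6 and confidence >= 0.8
print(rules[(rules['lift'] >= 6) & (rules['confidence'] >= 0.8)])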

There are 1, 2, 3, 4, 5, 6, 7, 8 of them. So as you can see here, we have these eight rules, which are the final rules given by association rule mining, and this is how the industries, largely the retailers we've talked about, get to know how their products are bought together and how exactly they should rearrange them and provide offers on the products, so that people spend more and more money and time in the shop.

So that was allabout Association rule mining

So guys, that's all for unsupervised learning.

I hope you got to know about the different algorithms and how unsupervised learning works, because, you know, we did not provide any labels to the data.

All we did was create some rules without knowing what the data is, and we did different types of clustering: k-means, C-means and hierarchical clustering.

Now, reinforcement learning is a part of machine learning where an agent is put in an environment and he learns to behave in this environment by performing certain actions.

Okay, so it basically performs actions and it either gets rewards for the actions or it gets a punishment, and it observes the reward which it gets from those actions. Reinforcement learning is all about taking an appropriate action in order to maximize the reward in a particular situation.

So guys, in supervised learning the training data comprises the input and the expected output, and so the model is trained with the expected output itself; but when it comes to reinforcement learning, there is no expected output. Here the reinforcement agent decides

What actions to take in orderto perform a given task

In the absence of a trainingdata set it is bound to learn from its experience itself

All right

So reinforcement learning is all about an agent who's put in an unknown environment, and he's going to use a trial and error method in order to figure out the environment and then come up with an outcome.

Okay

Now, let's look at reinforcement learning with an analogy.

So consider a scenariowhere in a baby is learning how to walk the scenariocan go about in two ways

Now in the first casethe baby starts walking and makes it to the candy here

The candy is basicallythe reward it's going to get so since the candy is the end goal

The baby is happy

It's positive

Okay, so the baby is happyand it gets rewarded a set of candies now another wayin which this could go is that the baby starts walking but Falls due to some hurdlein between the baby gets hurt and it doesn't get any candyand obviously the baby is sad

So this is a negative reward

Okay, or you can saythis is a setback

So just like how we humans learn from our mistakes by trial and error, reinforcement learning is also similar.

Okay, so we have an agent which is basicallythe baby and a reward which is the candy over here

Okay, and with many hurdles in between, the agent is supposed to find the best possible path to reach the reward.

So guys, I hopeyou all are clear with the reinforcement learning

Now

Let's look at thereinforcement learning process

So generally a reinforcementlearning system has two main components

All right, the first is an agent and the second one is an environment. Now, in the previous case, we saw that the agent was the baby and the environment was the living room wherein the baby was crawling.

Okay

The environment is the setting that the agent is actingon and the agent over here represents the reinforcementlearning algorithm

So guys the reinforcementlearning process starts when the environmentsends a state to the agent and then the agentwill take some actions based on the observations in turn the environmentwill send the next state and the respective rewardback to the agent

The agent will update its knowledge with the reward returned by the environment, and it uses that to evaluate its previous action.

So guys thisLoop keeps continuing until the environment sendsa terminal state which means that the agent hasaccomplished all his tasks and he finally gets the reward

Okay

This is exactly what was depictedin this scenario

So the agent keepsclimbing up ladders until he reaches his rewardto understand this better

Let's suppose that our agent islearning to play Counter-Strike

Okay, so let's break it down. Now, initially the RL agent, which is basically the player, player 1 let's say, is trying to learn how to play the game.

Okay

He collects some statefrom the environment

Okay

This could be the first stateof Counter-Strike now based on the state the agentwill take some action

Okay, and this actioncan be anything that causes a result

So if the player moves left or right it's alsoconsidered as an action

Okay

So initially the actionis going to be random because obviously the first timeyou pick up Counter-Strike, you're not goingto be a master at it

So you're going to try different actions, and you're just going to pick a random action in the beginning.

Now the environment is goingto give a new state

So after clearing that stage, the environment is now going to give a new state to the agent, or to the player.

So maybe he's crossed stage 1; now he's in stage 2.

So now the player will get a reward R1 from the environment because he cleared stage 1.

So this reward can be anything

It can be additional pointsor coins or anything like that

Okay

So basically this Loopkeeps going on until the player is deador reaches the destination

Okay, and it continuously outputs a sequence of states, actions and rewards.

So guys

This was a small example to show you how reinforcementlearning process works

So you start with an initial state, and once a player clears that state he gets a reward; after that the environment will give another state to the player, and after he clears that state he's going to get another reward, and it's going to keep happening until the player reaches his destination.

All right, so guys,I hope this is clear now, let's move on and look at the reinforcementlearning definition

So there are a few Conceptsthat you should be aware of while studyingreinforcement learning

Let's look at thosedefinitions over here

So first we have the agentnow an agent is basically the reinforcement learningalgorithm that learns from trial and error

Okay

So an agent takes actions, like for example a soldierin Counter-Strike navigating through the game

That's also an action

Okay, if he moves left rightor if he shoots at somebody that's also an action

Okay

So the agent is responsible for taking actionsin the environment

Now the environment isthe whole Counter-Strike game

Okay

It's basically the world through which the agent moves. The environment takes the agent's current state and action as input, and it returns the agent's reward and its next state as output.

Alright, next we have actionnow all the possible steps that an agent can takeare called actions

So like I said, it can be moving right leftor shooting or any of that

Alright, then we havestate now state is basically the current conditionreturned by the environment

So whatever state you are in, whether you are in state 1 or in state 2, that represents your current condition.

All right

Next we have reward a rewardis basically an instant return from the environmentto appraise Your Last Action

Okay, so it can beanything like coins or it can be additional points

So basically a rewardis given to an agent after it clears

The specific stages

Next we have policy. Policy is basically the strategy that the agent uses to find out his next action based on his current state; policy is just the strategy with which you approach the game.

Then we have value

Now, value is the expected long-term return with discount. So value and action value can be a little bit confusing for you right now,

But as we move further, you'll understand whatI'm talking about

Okay, so value is basicallythe long-term return that you get with discount

Okay discount, I'll explainin the further slides

Then we have action value now action valueis also known as Q value

Okay, it's very similar to value, except that it takes an extra parameter, which is the current action.

So basically here you'll findout the Q value depending on the particular actionthat you took

All right

So guys don't get confusedwith value and action value

We look at examples in the further slides and youwill understand this better

Okay, so guys make sure that you're familiarwith these terms because you'll be seeinga lot of these terms in the further slides

All right

Now before we move any further, I'd like to discussa few more Concepts

Okay

So first we will discussthe reward maximization

So if you haven't already realized it, the basic aim of the RL agent is to maximize the reward. Now, how does that happen? Let's try to understand this in depth.

So the agent must betrained in such a way that he takes the best actionso that the reward is maximum because the end goalof reinforcement learning is to maximize your rewardbased on a set of actions

So let me explain this with a small game. Now, in the figure you can see there is a fox, there's some meat and there's a tiger. So our agent is basically the fox, and his end goal is to eat the maximum amount of meat before being eaten by the tiger. Now, since the fox is a clever fellow, he eats the meat that is closer to him rather than the meat which is closer to the tiger.

Now this is because thecloser he is to the tiger the higher are his chancesof getting killed

So because of this the rewardswhich are near the tiger, even if they arebigger meat chunks, they will be discounted

So this is exactlywhat discounting means so our agent is not goingto eat the meat chunks which are Closer to the tigerbecause of the risk

All right now even though the meat chunksmight be larger

He does not want to takethe chances of getting killed

Okay

This is called discounting

Okay

This is where you discount: you compromise and just eat the meat which is closer to you instead of taking risks and eating the meat which is closer to your opponent.

All right

Now, the discounting of rewards works based on a value called gamma. We'll be discussing gamma in our further slides, but in short, the value of gamma is between 0 and 1.

Okay

So the smaller the gamma, the larger is the discount value.

Okay

So if the gamma value is lesser, it means that the agentis not going to explore and he's not goingto try and eat the meat chunks which are closer to the tiger

Okay, but if the gamma value is closer to 1, it means that our agent is actually going to explore and it's going to try and eat the meat chunks which are closer to the tiger.
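A tiny illustration of what gamma does to a far-away reward (the reward numbers are made up for the example):

def discounted_return(rewards, gamma):
    # sum of gamma**t * reward_t
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1, 1, 10]                    # the big chunk of meat is two steps away
print(discounted_return(rewards, 0.2))  # 1.6  - with a small gamma the distant reward barely counts
print(discounted_return(rewards, 0.9))  # 10.0 - with gamma close to 1 it still matters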

All right now, I'll be explaining thisin depth in the further slides

So don't worry if you haven't got a clear concept yet, but just understand that reward maximization is a very important step when it comes to reinforcement learning, because the agent has to collect maximum rewards by the end of the game.

All right

Now, let's lookat another concept which is called explorationand exploitation

So exploration, like the name suggests, is about exploring and capturing more information about an environment. On the other hand, exploitation is about using the already known, exploited information to reap the rewards.

So guys, consider the fox and tiger example that we discussed. Now, here the fox eats only the meat chunks which are close to him, but he does not eat the meat chunks which are closer to the tiger.

Okay, even though they might give him more rewards.

He does not eat them if the fox only focuseson the closest rewards, he will never reachthe big chunks of meat

Okay, this iswhat exploitation is about you just going to usethe currently known information and you're goingto try and get rewards based on that information

But if the fox decides to explore a bit, it can find the bigger reward, which is the big chunks of meat.

This is exactlywhat exploration is

So the agent is not goingto stick to one corner instead

He's going to explorethe entire environment and try and collect bigger rewards

All right, so guys, I hope you all are clear withexploration and exploitation
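One common way this exploration-versus-exploitation trade-off is handled in code is an epsilon-greedy rule; the sketch below is only illustrative (the Q values and the epsilon of 0.1 are assumptions, not something from this session).

```python
import random

def choose_action(q_values, epsilon=0.1):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: best known action

q_values = [1.0, 5.0, 0.5, 2.0]   # hypothetical Q values for four actions in one state
print(choose_action(q_values))    # usually 1 (the best action), occasionally a random one
```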

Now, let's look at the Markov decision process.

So guys, this is basicallya mathematical approach for mapping a solution inreinforcement learning in a way

The purpose of reinforcementlearning is to solve a Markov decision process

Okay, so there are a few parameters that are used to get to the solution.

So the parameters include the set of actions, the set of states, the rewards, the policy that you're taking to approach the problem, and the value that you get.

Okay, so to sum it up: the agent must take an action (A) to transition from the start state to the end state (S). While doing so, the agent will receive a reward (R) for each action that he takes.

So guys, a series of actions taken by the agent defines the policy, or the approach,

And the rewards that are collectedDefine the value

So the main goal here isto maximize the rewards by choosing the optimum policy

All right

Now, let's try to understandthis with the help of the shortest path problem

I'm sure a lot of you might have gone through this problem when you were in college, so guys, look at the graph over here.

So our aim here isto find the shortest path between a and dwith minimum possible cost

So the value that you seeon each of these edges basically denotes the cost

So if I want to go from A to C, it's going to cost me 15 points.

Okay

So let's look athow this is done

Now, before we move ahead and look at the problem: in this problem, the set of states is denoted by the nodes, which are A, B, C and D, and the action is to traverse from one node to the other.

So if I'm going from A to B, that's an action; similarly A to C, that's an action.

Okay, the reward isbasically the cost which is representedby each Edge over here

All right

Now, the policy is basically the path that I choose to reach the destination. So let's say I choose A-B-D; okay, that's one policy in order to get to D. Choosing A-C-D is another policy.

Okay

It's basically howI'm approaching the problem

So guys here youcan start off at node a and you can take baby stepsto your destination

Now initially you're clueless so you can just takethe next possible node, which is visible to you

So guys, if you're smart enough, you're going to choose A to C instead of A-B-C-D or A-B-D.

All right

So now, if you are at node C and you want to traverse to node D, you must again choose a wise path.

All right, you just have to calculate which path has the highest value, or which path will give you the maximum reward.

So guys, this isa simple problem

We're just trying to calculate the shortest path between A and D by traversing through these nodes.

So if I traverse from A-C-D, it gives me the maximum reward.

Okay, it gives me 65, which is more than any otherpolicy would give me

Okay

So if I go from A-B-D, it would be 40; when you compare this to A-C-D, A-C-D gives me more reward.

So obviously I'm going to go with A-C-D.

Okay, so guys, this was a simple problem in order to understand how the Markov decision process works.
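Just to make the example concrete, here is a tiny sketch that scores the two policies discussed above. The individual edge values are assumptions (only A to C, at 15, is quoted in the session); they are chosen so that the policy totals match the 65 and 40 mentioned next.

```python
# Illustrative edge values for the graph; only A-C = 15 is quoted explicitly.
edges = {("A", "C"): 15, ("C", "D"): 50, ("A", "B"): 20, ("B", "D"): 20}

def policy_value(path):
    # Sum of the edge values along a path such as ["A", "C", "D"]
    return sum(edges[(a, b)] for a, b in zip(path, path[1:]))

policies = [["A", "C", "D"], ["A", "B", "D"]]
for p in policies:
    print("-".join(p), policy_value(p))   # A-C-D -> 65, A-B-D -> 40

best = max(policies, key=policy_value)
print("Optimal policy:", "-".join(best))  # A-C-D
```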

All right, so guys,I want to ask you a question

What do you think I did here? Did I perform exploration or did I perform exploitation? Now, the policy for the above example is one of exploitation, because we didn't explore the other nodes.

Okay

We just selected three nodes and we traveled through them.

So that's why thisis called exploitation

We must always explore the different nodes so that we find a more optimal policy.

But in this case, obviously A-C-D has the highest reward and we're going with A-C-D, but generally it's not so simple.

There are a lot of nodes, hundreds of nodes to traverse, and there are like 50-60 policies.

Okay, 50-60 different policies.

So make sure you explore through all the policies and then decide on an optimum policy which will give you the maximum reward.

Now, for a robot, an environment is a place where it has been put to use. Remember, this robot is itself the agent, for example in an automobile factory where a robot is used to move materials from one place to another. Now, the tasks we just discussed have a property in common.

Now, these tasks involve an environment and expect the agent to learn from the environment.

Now, this is where traditional machine learning fails, and hence the need for reinforcement learning.

It is good to have an established overview of the problem that is to be solved using Q-learning or reinforcement learning, so it helps to define the main components of a reinforcement learning solution, that is, the agent, the environment, the actions, the rewards and the states.

So let's suppose we are to build a few autonomous robots foran automobile building Factory

Now, these robots will help the factory personnel by conveying to them the necessary parts that they would need in order to build the car.

Now

These different partsare located at nine different positionswithin the factory warehouse

The car parts include the chassis, wheels, dashboard, the engine and so on, and the factory workers have prioritized the location that contains the body or the chassis to be the topmost, but they have provided the priorities for other locations as well, which we will look into in a moment.

Now these locations within the factory looksomewhat like this

So as you can see here, we have L1, L2, L3, all of these stations. Now, one thing you might notice here is that there are little obstacles present in between the locations.

So L6 is the toppriority location that contains the chassisfor preparing the car bodies

Now the task isto enable the robots so that they can findthe shortest route from any given location toanother location on their own

Now the agents in this caseare the robots the environment is the automobileFactory warehouse

So let's talk about the states: the states are the locations in which a particular robot is present at a particular instance of time, which will denote its state. Machines understand numbers rather than letters, so let's map the location codes to numbers.

So as you can see here, we have mapped location L1 to state 0, L2 to state 1, and so on; we have L8 as state 7 and L9 as state 8.

So next what we're going to talkabout are the actions

So in our example, the action will be the direct location that a robot can go to from a particular location. Consider that a robot is at the L2 location; the direct locations to which it can move are L5, L1 and L3.

Now, the figure here may come in handy to visualize this. As you might have already guessed, the set of actions here is nothing but the set of all possible states of the robot; for each location, the set of actions that a robot can take will be different.

For example, the setof actions will change if the robot isin L1 rather than L2

So if the robot is in L1, it can only go to L4 and L2 directly. Now that we are done with the states and the actions,

Let's talk about the rewards

So the states are basically 0, 1, 2, 3, 4 and so on up to 8, and the actions are also 0, 1, 2, 3, 4 up to 8.

Now

The rewards now willbe given to a robot

If a location which is the stateis directly reachable from a particular location

So let's take an example: suppose L9 is directly reachable from L8, right? If a robot goes from L8 to L9 and vice versa, it will be rewarded by 1, and if a location is not directly reachable from a particular location, we do not give any reward (a reward of 0).

Now, the reward is just a number and nothing else; it enables the robots to make sense of the movements, helping them in deciding which locations are directly reachable and which are not. Now with this, we can construct a reward table which contains all the required values mapping between all possible states.

So as you can see herein the table the positions which are marked greenhave a positive reward

And as you can see here, we have all the possible rewardsthat a robot can get by moving in between the different states

Now comes aninteresting decision

Now remember that the factoryadministrator prioritized L6 to be the topmost

So how do we incorporatethis fact in the above table

Now, this is done by associating the topmost-priority location with a much higher reward than the usual ones.

So let's put 999 in the cell (L6, L6). Now the table of rewards, with a higher reward for the topmost location, looks something like this.
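As a quick illustration of how that priority could be added in code, here is a minimal sketch with just four locations instead of nine; the layout is made up, and the only point is marking directly reachable states with 1 and bumping the top-priority cell to a very large value.

```python
import numpy as np

# 1 where a location is directly reachable, 0 elsewhere (toy 4-location layout)
rewards = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]])

top_priority = 2                           # pretend the third location is the top priority
rewards[top_priority, top_priority] = 999  # the very high reward for the topmost location
print(rewards)
```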

Now we have formally defined all the vital components of the solution we are aiming for, for the problem discussed.

Now, we will shift gears a bit and study some of the fundamental concepts that prevail in the world of reinforcement learning and Q-learning. First of all, we'll start with the Bellman equation. Now, consider the following square of rooms, which is analogous to the actual environment from our original problem, but without the barriers.

Now suppose a robot needs to go to the room marked in green from its current position A using the specified directions. How can we enable the robot to do this programmatically? One idea would be to introduce some kind of a footprint which the robot will be able to follow. Here a constant value is specified in each of the rooms which will come along the robot's way if it follows the direction specified above. In this way, if it starts at A, it will be able to scan through these constant values and will move accordingly. But this will only work if the direction is prefixed and the robot always starts at location A. Now consider the robot starts at this other location rather than its previous one.

Now the robot sees footprints in two different directions.

It is therefore unable to decide which way to go in order to get to the destination, which is the green room.

It happens primarily because the robot does not have a way to remember the directions to proceed. So our job now is to enable the robot with a memory.

Now, this is where the Bellmanequation comes into play

So as you can see here, the main purpose of the Bellman equation is to enable the robot with a memory. That's the thing we're going to use.

So the equation goes something like this: V(s) = max_a [ R(s, a) + gamma * V(s') ], where s is a particular state (a room), a is the action of moving between the rooms, s' is the state to which the robot goes from s, and gamma is the discount factor (we'll get into it in a moment). Obviously, R(s, a) is a reward function which takes a state s and an action a and outputs the reward, and V(s) is the value of being in a particular state, which is the footprint. Now, we consider all the possible actions and take the one that yields the maximum value. There is one constraint, however, regarding the value footprint: that is, the room marked in yellow just below the green room.

It will always have the value of 1, to denote that it is one of the nearest rooms adjacent to the green room. This is also to ensure that a robot gets a reward when it goes from the yellow room to the green room.

Let's see how to makesense of the equation which we have here

So let's assume a discount factor of 0.9; remember, gamma is the discount value or the discount factor, so let's take 0.9.

Now, for the room which is just below the yellow one (the room with the asterisk mark), what will be V(s), the value of being in that particular state? For this room, V(s) would be something like max_a [ 0 + 0.9 * 1 ], where 0 is R(s, a) and 0.9 is gamma, which gives us 0.9. Now here the robot will not get any reward for going to the state marked in yellow, hence R(s, a) is 0, but the robot knows the value of being in the yellow room, hence V(s') is 1.

Following this for the other states, we should get 0.9; then again, if we put 0.9 in this equation, we get 0.81, then 0.729, and then we again reach the starting point.

So this is how the table looks with some value footprints computed from the Bellman equation. Now, a couple of things to notice here: the max function makes the robot always choose the state that gives it the maximum value of being in that state.

Now the discount Factorgamma notifies the robot about how far it isfrom the destination

This is typically specified bythe developer of the algorithm

That would be installedin the robot

Now, the other states can alsobe given their respective values in a similar way

So as you can see here, the boxes adjacent to the green one have 1, and as we move away from 1 we get 0.9, then 0.81, then 0.729, and finally we reach 0.66.

Now the robot can proceed on its way to the green room utilizing these value footprints, even if it's dropped at any arbitrary room in the given location. Now, if a robot lands up in the highlighted sky blue area, it will still find two options to choose from, but eventually either of the paths will be good enough for the robot to take, because of the value footprints laid out.
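If you would like to see those 1, 0.9, 0.81, 0.729, ... footprints fall out of the Bellman equation yourself, here is a minimal sketch on a simple corridor of rooms; the corridor layout is an assumption that just mimics rooms leading towards the green room.

```python
gamma = 0.9                      # discount factor used in the example above

num_rooms = 6                    # room 0 is the green room; each room connects to its neighbours
V = [0.0] * num_rooms
V[1] = 1.0                       # the room adjacent to the green room is pinned to 1

# Repeatedly apply V(s) = max_a [ R(s, a) + gamma * V(s') ] with R = 0 everywhere else
for _ in range(20):
    for s in range(2, num_rooms):
        neighbours = [s - 1, s + 1] if s + 1 < num_rooms else [s - 1]
        V[s] = max(gamma * V[n] for n in neighbours)

print([round(v, 3) for v in V])  # roughly [0, 1, 0.9, 0.81, 0.729, 0.656]
```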

Now one thing to note is that the Bellman equation is oneof the key equations in the world of reinforcementlearning and Q learning

So if we think realistically, our surroundings do not always work in the way we expect; there is always a bit of stochasticity involved in it.

So this appliesto robot as well

Sometimes it might so happen that the robot's machinery gets corrupted.

Sometimes the robot may come across some hindrance on its way which may not be known to it beforehand.

Right, and sometimes even if the robot knows that it needs to take the right turn, it will not. So how do we introduce this stochasticity in our case? Now here comes the Markov decision process.

So consider the robot iscurrently in the Red Room and it needs to goto the green room

Now

Let's now consider that the robot has a slight chance of dysfunctioning and might take the left or the right or the bottom turn instead of taking the upper turn in order to get to the green room from where it is now, which is the red room.

Now the question is, how do we enable the robotto handle this when it is out in the given environment right

Now, this is a situation where the decision making regarding which turn is to be taken is partly random and partly under the control of the robot. Partly random because we are not sure when exactly the robot might dysfunction, and partly under the control of the robot because it is still making the decision of taking a turn on its own

And with the helpof the program embedded into it

So a Markov decision process is a discrete timestochastic Control process

It provides a mathematicalframework for modeling decision-making in situations where the outcomesare partly random and partly under the controlof the decision maker

Now we need to give this concept a mathematical shapemost likely an equation which then can be taken further

Now, you might be surprised that we can do this with the help of the Bellman equation with a few minor tweaks.

So if we have a look at the original Bellman equation, V(s) = max_a [ R(s, a) + gamma * V(s') ], what needs to be changed in this equation so that we can introduce some amount of randomness here?

Since we are not sure when the robot might fail to take the expected turn, we are also not sure in which room it might end up, which is nothing but the room it moves to from its current room. At this point, according to the equation, we are not sure of s', the next state or room, but we do know all the probable turns the robot might take. Now, in order to incorporate each of these probabilities into the above equation,

we need to associate a probability with each of the turns, to quantify the chance of the robot taking that particular turn. If we do so, we get V(s) = max_a [ R(s, a) + gamma * sum over s' of P(s, a, s') * V(s') ]. Here P(s, a, s') is the probability of moving from room s to room s' with action a, and the summation is the expectation over the randomness that the robot incurs. Now, let's take a look at this example here.

So when we associate the probabilities to each of these turns, we essentially mean that there is an 80% chance that the robot will take the upper turn.

Now, if we put all the required values in our equation, we get V(s) = max_a [ R(s, a) + gamma * (0.8 * V(room up) + 0.1 * V(room down) + 0.03 * V(room left) + 0.03 * V(room right)) ].

Now note that the value footprints will not change, due to the fact that we are only incorporating stochasticity here.
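Here is a tiny sketch of that expectation term, using the 80/10/3/3 percentages quoted above; the V values of the neighbouring rooms are made-up placeholders, not numbers from the slide.

```python
gamma = 0.9

# P(s, a, s') for the rooms the robot can end up in when it tries to go "up"
transition_probs = {"up": 0.8, "down": 0.1, "left": 0.03, "right": 0.03}

# Hypothetical value footprints of those rooms (placeholders)
V = {"up": 1.0, "down": 0.5, "left": 0.4, "right": 0.4}

reward = 0                                                   # R(s, a) for this move
expected = sum(p * V[room] for room, p in transition_probs.items())
print(round(reward + gamma * expected, 3))                   # stochastic Bellman value for "up"
```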

But this time we will not calculate those value footprints; instead, we will let the robot figure them out.

Now, up until this point we have not considered rewarding the robot for its action of going into a particular room.

We are only rewarding the robot when it gets to the destination. Now, ideally there should be a reward for each action the robot takes, to help it better assess the quality of its actions. The rewards need not always be the same, but it is much better to have some amount of reward for the actions than to have no rewards at all.

Right, and this idea is known as the living penalty. In reality, the reward system can be very complex, and particularly modeling sparse rewards is an active area of research in the domain of reinforcement learning.

So by now we have got the equation, so what we're going to do now is transition to Q-learning.

So this equation gives us the value of going to a particular state, taking the stochasticity of the environment into account.

Now, we have also learnedvery briefly about the idea of living penalty which deals with associatingeach move of the robot with a reward

So Q-learning poses the idea of assessing the quality of an action that is taken to move to a state, rather than determining the possible value of the state being moved to.

So earlier we had 0.8 * V(s1), 0.03 * V(s2), 0.1 * V(s3) and so on. Now, if we incorporate the idea of assessing the quality of the action for moving to a certain state, the environment with the agent and the quality of the actions will look something like this.

So instead of 0.8 * V(s1), we will have Q(s1, a1); likewise we will have Q(s2, a2) and Q(s3, a3). Now the robot has four states to choose from and, along with that, there are four different actions for the current state it is in. So how do we calculate Q(s, a), that is, the cumulative quality of the possible actions the robot might take? Let's break it down.

Now, from the equation V(s) = max_a [ R(s, a) + gamma * sum over s' of P(s, a, s') * V(s') ], if we discard the max function we have R(s, a) + gamma * sum of P * V. Essentially, in the equation that produces V(s) we are considering all possible actions and all possible states from the current state that the robot is in, and then we are taking the maximum value caused by taking a certain action. The inner expression produces a value footprint for just one possible action; in fact, we can think of it as the quality of the action. So Q(s, a) = R(s, a) + gamma * sum over s' of P(s, a, s') * V(s'). Now that we have got an equation to quantify the quality of a particular action,

we are going to make a little adjustment in the equation: we can now say that V(s) is the maximum of all the possible values of Q(s, a), right?

So let's utilize this fact and replace V(s') with a function of Q.

So Q(s, a) becomes R(s, a) + gamma * sum over s' of P(s, a, s') * max over a' of Q(s', a'). So the equation of V is now turned into an equation of Q, which is the quality.

But why would we do that? Now, this is done to ease our calculations, because now we have only one function Q, which is also the core of dynamic programming.

We have only one function Q to calculate, and R(s, a) is a quantified metric which produces the reward for moving to a certain state.

Now, the qualities of the actions are called the Q values, and from now on we will refer to the value footprints as the Q values. An important piece of the puzzle is the temporal difference.

Now, temporal difference is the component that will help the robot calculate the Q values with respect to the changes in the environment over time.

So consider our robot is currently in the marked state and it wants to move to the upper state.

One thing to note here is that the robot already knows the Q value of making the action, that is, of moving to the upper state, and we know that the environment is stochastic in nature, so the reward that the robot will get after moving to the upper state might be different from an earlier observation.

So how do we capture this change? With the temporal difference: we calculate the new Q(s, a) with the same formula and subtract the previously known Q(s, a) from it.

This will in turn give us the temporal difference.

Now, the equation that we just derived gives the temporal difference in the Q values, which further helps to capture the random changes that the environment may impose. The Q(s, a) value is then updated as follows: Q_t(s, a) = Q_(t-1)(s, a) + alpha * TD_t(s, a). Here alpha is the learning rate, which controls how quickly the robot adapts to the random changes imposed by the environment; Q_t(s, a) is the current Q value and Q_(t-1)(s, a) is the previously recorded Q value.

So if we replace TD(s, a) with its full form, we get Q_t(s, a) = Q_(t-1)(s, a) + alpha * [ R(s, a) + gamma * max over a' of Q(s', a') - Q_(t-1)(s, a) ]. Now that we have all the little pieces of Q-learning together,
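Written as code, a single temporal-difference update of one Q-table entry looks roughly like this; the small Q-table, the chosen transition and the reward of 1 are illustrative assumptions.

```python
import numpy as np

alpha, gamma = 0.9, 0.75        # learning rate and discount factor (the values used later on)

Q = np.zeros((9, 9))            # Q(s, a) table for 9 states and 9 actions
s, a, s_next, r = 0, 1, 1, 1    # an illustrative transition with reward 1

# TD(s, a) = R(s, a) + gamma * max_a' Q(s', a') - Q_(t-1)(s, a)
td = r + gamma * np.max(Q[s_next]) - Q[s, a]

# Q_t(s, a) = Q_(t-1)(s, a) + alpha * TD(s, a)
Q[s, a] += alpha * td
print(Q[s, a])                  # 0.9 after this single update
```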

Let's move forwardto its implementation part

Now, this is the final equation of Q-learning, right? So let's see how we can implement this and obtain the best path for any robot to take. Now, to implement the algorithm,

We need to understandthe warehouse location and how that can be mappedto different states

So let's start by reconstructing the sample environment.

So as you can see here, we have L1, L2, L3 and so on up to L9, and as you can see, we also have certain borders.

So first of all, let's map each of the above locations in the warehouse to numbers, or states, so that it will ease our calculations, right? So what I'm going to do is create a new Python 3 file in the Jupyter notebook and I'll name it Q-Learning.

Okay.

So let's define the states

But before that what weneed to do is import numpy because we're going to use numpy for this purpose and let'sinitialize the parameters

That is, the gamma and alpha parameters. So gamma is 0.75, which is the discount factor, whereas alpha is 0.9, which is the learning rate.

Now next what we're going to dois Define the states and map it to numbers

So as I mentioned earlier, L1 is 0 and so on till L9; we have defined the states in numerical form.

Now

The next step is to define the actions, which, as mentioned above, represent the transitions to the next states.

So as you can see here, we have an arrayof actions from 0 to 8

Now, what we're going to dois Define the reward table

So as you can see, it's the same matrix that we created, the one I showed you just now. Now, if you understood it correctly, there isn't any real barrier limitation as depicted in the image; for example, a transition to L1 is allowed, but the reward will be zero, to discourage that path, or in a tougher situation we add a minus 1 there so that it gets a negative reward.

So in the above code snippet, as you can see, I took each of the states and put ones in the respective states that are directly reachable from a certain state. Now, if you refer to that reward table once again, the one we created above, our construction will be easy to understand. But one thing to note here is that we did not consider the top-priority location L6 yet.

We would also need an inverse mapping from the states back to their original locations, and it will be cleaner when we reach the further depths of the algorithm.

So for that, what we're going to do is have the inverse mapping from state to location: we will take the state and location pairs and convert them back.
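Following the description above, the setup part of the notebook could look roughly like this. The mappings and parameters are as stated in the session, but the exact reward matrix depends on the warehouse layout shown on the slide, so the matrix below (a plain 3x3 grid of L1 to L9, ignoring the barriers) is a reasonable reconstruction rather than a verbatim copy.

```python
import numpy as np

gamma = 0.75   # discount factor
alpha = 0.9    # learning rate

# Map each location to a state number
location_to_state = {'L1': 0, 'L2': 1, 'L3': 2,
                     'L4': 3, 'L5': 4, 'L6': 5,
                     'L7': 6, 'L8': 7, 'L9': 8}

actions = [0, 1, 2, 3, 4, 5, 6, 7, 8]

# Reward matrix: 1 where a location is directly reachable, 0 otherwise
# (assuming plain 3x3 grid adjacency; the real slide blocks some passages)
rewards = np.array([[0, 1, 0, 1, 0, 0, 0, 0, 0],
                    [1, 0, 1, 0, 1, 0, 0, 0, 0],
                    [0, 1, 0, 0, 0, 1, 0, 0, 0],
                    [1, 0, 0, 0, 1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1, 0, 0, 0, 1],
                    [0, 0, 0, 1, 0, 0, 0, 1, 0],
                    [0, 0, 0, 0, 1, 0, 1, 0, 1],
                    [0, 0, 0, 0, 0, 1, 0, 1, 0]])

# Inverse mapping from states back to locations
state_to_location = {state: location for location, state in location_to_state.items()}
```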

Now, what we'll do is define a function get_optimal_route, which will take a start location and an end location.

Don't worry, the code is big, but I'll explain each and every bit of it to you.

Now, the get_optimal_route function will take two arguments, the start location in the warehouse and the end location in the warehouse respectively, and it will return the optimal route for reaching the end location from the starting location in the form of an ordered list containing the letters.

So we'll start by defining the function by initializingthe Q values to be all zeros

So as you can see here, we have set the Q values to be 0, but before that, what we need to do is copy the reward matrix to a new one.

So this is rewards_new, and next, what we need to do is get the ending state corresponding to the ending location.

And with this information we will automatically set the priority of the given ending state to the highest one. We are not defining it manually; we will automatically set the priority of the given ending state as 999.

So what we're going to do is initialize the Q values to be 0, and in the Q-learning process, as you can see here, we take i in range(1000) and pick up a state randomly.

So we're going to use np.random.randint, and for traversing through the neighbouring locations in the maze, we're going to iterate through the new reward matrix and get the actions which are greater than 0. After that, we pick an action randomly from the list of playable actions, which leads us to the next state. We then compute the temporal difference, TD, which is the reward plus gamma into the maximum Q value of the next state (using np.argmax on Q of the next state) minus Q of the current state and action.

We're then going to update the Q values using the Bellman equation; as you can see here, you have the Bellman equation, and we update the Q values with it. After that, we're going to initialize the optimal route with the starting location. Now, here we do not know the next location yet.

So we initialize it with the value of the starting location. Now, we do not know the exact number of iterations needed to reach the final location; hence a while loop will be a good choice for the iteration.

So we fetch the starting state, fetch the highest Q value pertaining to the starting state, and get the index of the next state; but we need the corresponding letter, so we use that state-to-location mapping we just mentioned, and after that we update the starting location for the next iteration.

Finally, we'll return the route.
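Putting those steps together, a get_optimal_route function along the following lines is a reasonable reconstruction of what the notebook does; it reuses gamma, alpha, location_to_state, rewards and state_to_location from the setup sketch above, and details such as variable names are assumptions.

```python
import numpy as np

def get_optimal_route(start_location, end_location):
    # Copy the reward matrix and give the ending state the highest priority (999)
    rewards_new = np.copy(rewards)
    ending_state = location_to_state[end_location]
    rewards_new[ending_state, ending_state] = 999

    Q = np.zeros((9, 9))                                   # Q values start at zero

    # Q-learning process
    for i in range(1000):
        current_state = np.random.randint(0, 9)           # pick a state randomly
        playable_actions = [a for a in range(9)            # actions with reward > 0
                            if rewards_new[current_state, a] > 0]
        next_state = int(np.random.choice(playable_actions))
        # Temporal difference, then the Bellman update
        TD = (rewards_new[current_state, next_state]
              + gamma * Q[next_state, np.argmax(Q[next_state])]
              - Q[current_state, next_state])
        Q[current_state, next_state] += alpha * TD

    # Walk greedily from start to end using the learned Q values
    route = [start_location]
    next_location = start_location
    while next_location != end_location:
        starting_state = location_to_state[next_location]
        next_state = int(np.argmax(Q[starting_state]))
        next_location = state_to_location[next_state]
        route.append(next_location)
    return route

print(get_optimal_route('L9', 'L1'))   # e.g. ['L9', 'L8', 'L5', 'L2', 'L1']
```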

So let's take a starting location of L9 and an end location of L1 and see what path we actually get. So as you can see here, we get L9, L8, L5, L2 and L1.

And if you have a look at the image here, if we start from L9 and go to L1, we get L9, L8, L5, L2, L1.

That path would yield the maximum reward for the robot.

So now we have come to the end of this Q-learning session. The past year has seen a lot of great examples of machine learning, and many new high-impact applications of machine learning were discovered and brought to light, especially in healthcare, finance, speech recognition, augmented reality, and much more complex 3D and video applications.

Natural language processing was easily the most talked about domain within the community, with the likes of ULMFiT and BERT being open sourced.

So let's have a look at some of the amazingmachine learning projects which are open sourcedthe code is available for you

And those were discussed across the 2018 to 2019 spectrum.

So the first and the foremost is TensorFlow.js. Now, machine learning in the browser was a fictional thought a few years back and is a stunning reality now.

Now, a lot of us in this field are welded to our favorite IDE, but TensorFlow.js has the potential to change your habits.

It's become a very popular release since it came out earlier this year and continues to amaze with its flexibility.

Now, as the repository states, there are primarily three major features of TensorFlow.js: develop machine learning and deep learning models in your browser itself, run pre-existing TensorFlow models within the browser, and retrain or fine-tune these prediction models as well.

And if you are familiar with Keras, the high-level Layers API will seem quite familiar, and there are plenty of examples available on the GitHub repository.

So do check out those links to quicken your learning curve.

And as I mentioned earlier, I'll leave the linksto all of these open source machine learning projectsin the description below

The next one we will discuss is Detectron. It was developed by Facebook and made a huge splash when it was launched earlier in 2018. It was developed by Facebook's AI Research team, which is FAIR, and it implements state-of-the-art object detection frameworks. It is written in Python and has helped enable multiple projects, including DensePose. Now, we'll know what exactly DensePose is after this example, and this repository contains the code of over 70 pre-trained models.

So it's a very good open source project, guys.

So do check it out. Now, the moment I talked about DensePose, that's the next one I'm going to talk about. So DensePose deals with dense human pose estimation in the wild, and the code to train and evaluate your own DensePose R-CNN model is included here. I've given the link to the open source code in the description below, and there are notebooks available as well to visualize the DensePose COCO dataset.

The next on our list, we have Deep Painterly Harmonization.

Now, I want you to takea moment to just admire the above images

Can you tell which ones were done by a human and which ones by a machine? I certainly could not. Now here, the first frame is the input image, the original one, and the third frame, as you can see, has been generated by this technique. Amazing, right? The algorithm adds an external object of your choosing to any image and manages to make it look like nothing touched it. Now, make sure you check out the code and try to implement it on different sets of images yourself.

It is really really fun

But talking about images

We have Image Outpainting. Now, what if I give you an image and ask you to extend its boundaries by imagining what it would look like when the entire scene was captured?

You would understandably turnto some image editing software

But here's the awesome news

You can achieve it in a few lines of code with Image Outpainting.

Now, this project is a Keras implementation of Stanford's Image Outpainting paper, which is an incredibly cool and well-illustrated paper.

And this is how most research papers should be.

I've given the links in the description below; do check it out, guys, and see how you can implement it.

Let's talk about audio processing, which is another field where machine learning has started to make its mark.

It is not just limited to generating music. You can do tasks like audio classification, fingerprinting, segmentation, tagging and much more, and there is a lot that's still yet to be explored; who knows, perhaps you could use this project to pioneer your way to the top.

Now, what if you want to discover your own planet? That might perhaps be overstating things a bit, but the AstroNet repository will definitely get you close.

The Google Brain team discovered two new planets in 2017 by applying AstroNet.

It's a deep neural networkmeant for working with astronomical data

It goes to showthe far-ranging application of machine learning and wasa truly Monumental development

And now the team behind the technology has open sourced the entire code, so go ahead and check out your own planet, and who knows, you might even have a planet in your name. Now, I could not possibly let this section pass by without mentioning BERT.

This Google AI release has smashed records on its way to winning the hearts of NLP enthusiasts and experts alike. Following ULMFiT and ELMo, BERT really blew away the competition with its performance.

It obtained state-of-the-art results on 11 NLP tasks. Apart from the official Google repository,

there is a Python implementation of BERT which is worth checking out. Whether it marks a new era in natural language processing or not is something we will soon find out. Now, AdaNet.

I'm sure you guysmight have heard of it

It is a framework for automatically learning high-quality models without requiring programming expertise. Since it's a Google invention, the framework is based on TensorFlow, and you can build simple models using AdaNet and even extend it to train a neural network.

Now, the GitHub page contains the code, examples, the API documentation and other things to get your hands dirty. Trust me, AutoML is the next big thing in our field.

Now, if you follow a few researchers on social media, you must have come across some of the images I am showing here in a video form: a stick human running across the terrain, or trying to stand up, or some such. But that, my friends, is reinforcement learning in action. Now, here's a signature example of it: a framework to create a simulated humanoid to imitate multiple motion skills.

So let's have a look at the top 10 skills that are required to become a successful machine learning engineer.

So starting with programming languages: Python is the lingua franca of machine learning. You may have had exposure to Python even if you weren't previously in programming or in a computer science research field.

However, it is important to have a solid understanding of classes and data structures.

Sometimes Python won't be enough. Often you'll encounter projects that need to leverage hardware for speed improvements.

Now, make sure you are familiar with the basic algorithms as well as classes, memory management and linking. Now, if you want a job in machine learning, you will probably have to learn all of these languages at some point: C++ can help in speeding code up, whereas R works great for statistics and plots, and Hadoop is Java-based.

So you probably needto implement mappers and reducers in Java

Now next we have linear algebra

You need to be intimately familiar with matrices, vectors and matrix multiplication. If you have an understanding of derivatives and integrals, you should be in the clear; otherwise even simple concepts like gradient descent will elude you. Statistics is going to come up a lot, so at least make sure you are familiar with Gaussian distributions, means, standard deviation and much more. Every bit of statistical understanding beyond this helps; the theories help in learning about algorithms, and great examples are Naive Bayes, Gaussian mixture models and hidden Markov models.

You need to have a firm understanding of probability and stats to understand these models; just go nuts and study measure theory. And next we have advanced signal processing techniques.

Now, feature extraction is one of the most important parts of machine learning; different types of problems need different solutions.

You may be able to utilize really cool advanced signal processing algorithms such as wavelets, shearlets, curvelets and bandlets. You need to learn about time-frequency analysis and try to apply it in your problems.

Now, this skill will give you an edge over the other applicants while you're applying for a machine learning engineer job. Next we have applied maths. A lot of machine learning techniques out there are just fancy types of function approximation.

Now, these often get developed by theoretical mathematicians and then get applied by people who do not understand the theory at all.

Now the result is that many developersmight have a hard time finding the best techniquesfor the problem

So even a basic understanding of numerical analysis will give you a huge edge. Having a firm understanding of algorithm theory and knowing how the algorithms work, you can also discriminate models such as SVMs. Now, you will need to understand subjects such as gradient descent, convex optimization, Lagrange, quadratic programming, partial differential equations and much more. All this math might seem intimidating at first if you have been away from it for a while, but machine learning is much more math-intensive than something like front-end development.

Just like any other skill, getting better at math is a matter of focused practice. The next skill on our list is neural network architectures.

We need machine learning for tasks that are too complex for humans to code directly, that is, tasks that are so complex that it is impractical to code them. Now, neural networks are a class of models within the general machine learning literature; they are a specific set of algorithms that have revolutionized machine learning.

They're inspired bybiological neural networks, and the current so-calleddeep neural networks have proven to work quite well

Well, neural networks are themselves general function approximators, which is why they can be applied to almost any machine learning problem that is about learning a complex mapping from the input to the output space.

Of course, there are still good reasons for the surge in the popularity of neural networks: they have been by far the most accurate way of approaching many problems like translation, speech recognition and image classification. Now coming to our next point, which is natural language processing. Since it combines computer science and linguistics, there are a bunch of libraries like NLTK and gensim, and techniques such as sentiment analysis and summarization, that are unique to NLP. Now, audio and video processing has a frequent overlap with natural language processing.

However, natural language processing can be applied to non-audio data like text, while voice and audio analysis involves extracting useful information from the audio signals themselves. Being well versed in math will get you far in this one, and you should also be familiar with concepts such as the fast Fourier transform.

Now, these werethe technical skills that are required to become a successfulmachine learning engineer

So next I'm going to discusssome of the non-technical skills or the soft skills, which are required to becomea machine-learning engineer

So first of all,we have the industry knowledge

Now, the most successful machine learning projects out there are going to be those that address real pain points. Whichever industry you are working for, you should know how that industry works and what will be beneficial for the business. If a machine learning engineer does not have business acumen and the know-how of the elements that make up a successful business model, then all those technical skills cannot be channeled productively. You won't be able to discern the problems and potential challenges that need solving for the business to sustain and grow, and you won't really be able to help your organization explore new business opportunities.

So this is a must-haveskill now next we have effective communication

You'll need to explain machine learning concepts to people with little to no expertise in the field. Chances are you'll need to work with a team of engineers as well as many other teams.

So communication is going to make all of this much easier. Companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate their technical findings to a non-technical team, such as the marketing or sales department. And next on our list,

we have rapid prototyping. So iterating on ideas as quickly as possible is mandatory for finding one that works; in machine learning, this applies to everything from picking the right model to working on projects such as A/B testing. Rapid prototyping is a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer-aided design, which is CAD. So last but not the least, we have the final skill, and that is to keep updated.

You must stay up to date with any upcoming changes. Every month, new neural network models come out that outperform the previous architectures. It also means being aware of the news regarding the development of the tools, the changelogs, the conferences and much more. You need to know about the theories and algorithms.

Now, this you can achieve by reading research papers, blogs and conference videos.

And also you need to follow the online community, which changes very quickly.

So expect and cultivate this change. Now, this is not all; here we have certain bonus skills which will give you an edge over the other competitors applying for a machine learning engineer position. As the first bonus point, we have physics.

Now, you might be in a situation where you'd like to apply machine learning techniques to a system that will interact with the real world; having some knowledge of physics will take you far. Next we have reinforcement learning.

So reinforcement learning has been a driver behind many of the most exciting developments in the deep learning and AI community, from AlphaGo Zero to OpenAI's Dota 2 bot.

This will be critical to understand if you want to go into robotics, self-driving cars or other AI-related areas.

And finally, we have computer vision. Out of all the disciplines out there, there are by far the most resources available for learning computer vision.

This field appears to havethe lowest barriers to entry but of course this likely means you will faceslightly more competition

So having a good knowledge of computer vision and how it works will give you an edge over other competitors.

I hope you got acquaintedwith all the skills which are requiredto become a successful machine learning engineer

As you know, we are living in a world of humans and machines. In today's world, these machines, or the robots, have to be programmed before they start following your instructions.

But what if the machines started learning on their own from their experience, worked like us, felt like us and did things more accurately than us? Well, this is where the machine learning engineer comes into the picture, to make sure everything is working according to the procedures and the guidelines.

So in my opinion, machine learning is one of the most exciting recent technologies there is. You probably use it dozens of times every day without even knowing it.

So before we dive into the different roles, the salary trends and what should be on the resume of a machine learning engineer while applying for a job,

let's understand who exactly a machine learning engineer is. So, machine learning engineers are sophisticated programmers who develop machines and systems that can learn and apply knowledge without specific direction. Artificial intelligence is the goal of a machine learning engineer.

They are computer programmers but their focus goesbeyond specifically programming machines toperform specific tasks

They create programs that will enablemachines to take actions without being specificallydirected to perform those tasks

Now if we have a lookat the job trends of machine learning in general

So as you can see, in Seattle itself we have 2,000 jobs; in New York we have 1,100; in San Francisco we have 1,100; in Bengaluru, India, we have 1,100; and then we have Sunnyvale, California, where we have a fair number of jobs as well. So as you can see, the number of jobs in the market is very high, and with the emergence of machine learning and artificial intelligence,

This number is justgoing to get higher now

If you have a look at the job openings by salary percentage, you can see that for the $90,000 per annum bracket we have 32.7 percent, and that's the maximum.

So be assured that if you get a job asa machine-learning engineer, you'll probably getaround 90 thousand bucks a year

That's safe to say

Now, for the $110,000 per year bracket we have 25%; for $120,000 we have almost 20 percent; then we have $130,000, which is for the senior machine learning engineers, at 13.67%. And finally, we have the most senior machine learning engineers, or the data scientists here, who have a salary of $140,000 per annum, and the percentage for that one is really low.

So as you can see, there is a great opportunity for people who are trying to get into the machine learning field and get started with it. So let's have a look at the machine learning engineer salary.

So the average salary in the U.S. is around $111,490, and the average salary in India is around 7,19,646 rupees.

That's a verygood average salary for any particular profession

So moving forward, if we have a look at the salary of an entry-level machine learning engineer, the salary ranges from $76,000 or $77,000 up to $151,000 per annum.

That's a huge salary

And if we talk about the bonus here, we have around $3,000 to $25,000 depending on the work you do and the project you are working on.

Let's talk aboutthe profit sharing now

So it's aroundtwo thousand dollars to fifty thousand dollars

Now, this again depends upon the project you are working on, the company you are working for, and the percentage that they give to the engineer or the developer for that particular project.

Now, the total pay comes to around $75,000 or $76,000 up to $162,000, and this is just for the entry-level machine learning engineer.

Just imagine, if you become an experienced machine learning engineer, your salary is going to go through the roof.

So now that we have understood who exactly a machine learning engineer is, the various salary trends, and the job trends in the market and how they are rising, let's understand what skills it takes to become a machine learning engineer.

So first of all, we have programming languages. Now, programming languages are a big deal when it comes to machine learning, because you don't just need to have proficiency in one language: you might require proficiency in Python, Java, R or C++, because you might be working in a Hadoop environment where you require Java programming to do the MapReduce coding, and sometimes R is very good for visualization purposes, and Python is, you know, another favorite language when it comes to machine learning. Now, the next skill that a particular individual needs is calculus and statistics.

So a lot of machine learning algorithms are mostly maths and statistics, and a lot of statistics is required, majorly matrix multiplication and so on, so a good understanding of calculus as well as statistics is required.

Now next we have signal processing. Advanced signal processing is something that will give you an upper edge over other machine learning engineers if you are applying for a job anywhere.

Now, the next skill we have is applied maths. As I mentioned earlier, many of the machine learning algorithms are purely mathematical formulas, so a good understanding of maths and how the algorithm works will take you far ahead. The next on our list,

we have neural networks. Now, neural networks are something that has been emerging quite popularly in recent years due to their efficiency and the extent to which they can work and get results as soon as possible.

Neural networks are a must for a machine learning engineer. Now moving forward,

We have language processing

So a lot of times machine learning engineers have to deal with text data, voice data as well as video data. Now, processing any kind of language, audio or video is something that a machine learning engineer has to do on a daily basis.

So one needs to be proficient in this area also. Now, these are only a few of the skills which are absolutely necessary, I would say, for any machine learning engineer. So let's now discuss the job description or the roles and responsibilities of a particular machine learning engineer. Now, depending on their level of expertise, machine learning engineers may have to study and transform data science prototypes.

They need to designmachine Learning Systems

They also need to research andImplement appropriate machine learning algorithms and tools as it's a veryimportant part of the job

They need to develop new machine learning applications according to the industry requirements, and select the appropriate data sets and data representation methods, because if there is a slight deviation in the data set or the data representation, that's going to affect the model a lot.

They need to run machinelearning tests and experiments

They need to performstatistical analysis and fine-tuning usingthe test results

So sometimes people ask what exactly is the difference between a data analyst and a machine learning engineer.

So statistical analysis is just a small part of a machine learning engineer's job.

Whereas it is a major part or it probably covers a largepart of a data analyst job rather than a machinelearning Engineers job

So machine learning engineers might need to train and retrain the systems whenever necessary, and they also need to extend the existing machine learning libraries and frameworks to their full potential so that they can make the model work superbly. And finally, they need to keep abreast of the developments in the field; needless to say, any machine learning engineer, or any particular individual, has to stay updated on the technologies that are coming into the market, because every now and then a new technology arises which will overthrow the older one.

So you need to beup to date now coming to the resume partof a machine learning engineer

So any resume of a particular machine learning engineer should consist of a clear career objective, the skills which the individual possesses, the educational qualifications, certain certifications, the past experience if you are an experienced machine learning engineer, and the projects which you have worked on, and that's it.

So let's have a lookat the various elements that are required in a machine-learningEngineers resume

So first of all, you need to havea clear career objective

So here you need not stretch it too much; keep it as precise as possible.

So next we have the skillsrequired and these skills can be technical aswell as non technical

So let's have a look at the various Technical andnon-technical skills out here

So starting withthe technical skills

First of all, we have programming languages such as R, Java, Python and C++.

But the first and foremost requirement is to have a good grip on any programming language, preferably Python, as it is easy to learn and its applications are wider than any other language. Now, it is important to have a good understanding of topics like data structures, memory management and classes.

Although Python is a very good language, it alone cannot help you, so you will probably have to learn all these languages like C++, R, Python and Java, and also work on MapReduce at some point of time. The next on our list,

We have calculus and linearalgebra and statistics

So you'll need to beintimately familiar with matrices the vectorsand the matrix multiplication

So statistics is going to come up a lot; at least make sure you are familiar with Gaussian distributions, means, standard deviations and much more.

So you also need to have a firm understanding of probability and stats to understand the machine learning models. The next, as I mentioned earlier, is signal processing techniques.

So feature extraction is one of the most important parts of machine learning; different types of problems need different solutions.

So you may be able to utilize really cool advanced signal processing algorithms such as wavelets, shearlets, curvelets and bandlets, so try to learn about time-frequency analysis and try to apply it to your problems, as it gives you an upper edge over other machine learning engineers. So just go for it.

Next we have mathematics. A lot of machine learning techniques out there are just fancy types of function approximation. Having a firm understanding of algorithm theory and knowing how the algorithm works is really necessary, and understanding subjects like gradient descent, convex optimization, quadratic programming and partial differentiation will help a lot. Then neural networks, as I was talking about earlier.

So we need machine learning for tasks that are too complex for humans to code directly, that is, tasks that are so complex that it is impractical to code them. Neural networks are a class of models within the general machine learning literature.

They are a specific set of algorithms that have revolutionized machine learning; deep neural networks have proven to work quite well, and neural networks are themselves general function approximators, which is why they can be applied to almost any machine learning problem out there that is about learning a complex mapping from the input to the output space. Now, next we have language processing. Since natural language processing combines two major areas of work, linguistics and computer science, chances are at some point you are going to work with either text or audio or video.

So it's necessary to have control over libraries like gensim and NLTK, and techniques like word2vec, sentiment analysis and text summarization. Now, voice and audio analysis involves extracting useful information from the audio signals themselves; being well versed in maths and concepts like the Fourier transform will get you far in this one.

These were the technical skills that are required, but be assured that there are a lot of non-technical skills also that are required to land a good job in the machine learning industry.

So first of all, you need to have industry knowledge.

The most successful machine learning projects out there are going to be those that address real pain points, don't you agree? So whichever industry you are working for, you should know how that industry works and what will be beneficial for the industry.

Now, if a machine learning engineer does not have business acumen and the know-how of the elements that make up a successful business model, all those technical skills cannot be channeled productively. You won't be able to discern the problems and the potential challenges that need solving for the business to sustain and grow. The next on our list,

we have effective communication. Now, this is one of the most important parts of any job requirement.

So you'll need to explain machine learning concepts to people with little to no expertise in the field, and chances are you will need to work with a team of engineers as well as many other teams like marketing and the sales team.

So communication is going to make all of this much easier. Companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate technical findings to a non-technical team.

Rapid prototypingis another skill, which is very much required forany machine learning engineer

So iterating on ideas as quickly as possible is mandatory for finding the one that works; in machine learning, this applies to everything from picking the right model to working on projects such as A/B testing and much more. Rapid prototyping is a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer-aided design, which is CAD. Now, coming to the final skill which is required for any machine learning engineer, it is to keep updated.

So you must stay up to date with any upcoming changes; every month new neural network models come out that outperform the previous architectures.

It also means being aware of the news regarding the development of the tools, theory and algorithms, through research papers, blogs, conference videos and much more.

Now another part of any machinelearning engineer's resume is the education qualification

So a bachelor's or master's degree in computer science, IT, economics, statistics or even mathematics can help you land a job in machine learning. Plus, if you are an experienced machine learning engineer, some standard company certifications will probably help you a lot when landing a good job in machine learning. And finally, coming to the professional experience.

You need to have experience in computer science, statistics or data analysis if you are switching from any other profession into a machine learning engineer role, or if you have previous experience in machine learning, that is very good.

Now finally, if we talk about the projects: you need to have not just any project that you have worked on; you need to have worked on machine learning related projects that involve a certain level of AI and working on neural networks to a certain degree, to land a good job as a machine learning engineer.

Now, if you have a look at the companies hiring machine learning engineers, every other company is looking for machine learning engineers who can modify the existing models into something that does not need much maintenance and can self-sustain. So basically, working on artificial intelligence and new algorithms that can work on their own is what every company desires.

So we have Amazon and Facebook; we have tech giants like Microsoft and IBM; in the gaming industry, or the GPU and graphics industry, we have Nvidia; in the banking industry we have JPMorgan Chase; and again, we have LinkedIn and also Walmart.

So all of these companies require machine learning engineers at some point of time.

So be assured that if you are looking for a machine learning engineer post, every other company, be it a big-shot company or even a new startup, is looking for machine learning engineers.

So be assured you will get a job. Now, with this, we come to the end of this video.

So I hope you've got a good understanding of who exactly a machine learning engineer is, the various job trends and the salary trends, what skills are required to become a machine learning engineer, and, once you become a machine learning engineer, what the roles and responsibilities or the job description are and what appears on the resume or the job application of any machine learning engineer. And also, I hope you got to know how to prepare your resume in the correct format, and what all to keep there in the resume: the career objective, the technical and non-technical skills, previous experience, education qualifications and certain related projects.

So that's it, guys. Edureka, as you know, provides a Machine Learning Engineer master's program that is aligned in such a way that it will get you acquainted with all the skills that are required to become a machine learning engineer, and that too in the correct form.