Hi everyone, this is codebeast from codebeast, and I welcome you to this full course on artificial intelligence

In this video, I'll be covering all the domains and concepts under the umbrella of artificial intelligence, and I will also be showing you a couple of use cases and practical implementations using Python

So there's a lot to cover in this session, so let me quickly run you through today's agenda

So we're gonna begin the session by understanding the history of artificial intelligence and how it came into existence

We'll follow this by looking at why we're talking about artificial intelligence now, and why it has gotten so famous right now

Then we'll look at what exactly artificial intelligence is

We'll discuss the applications of artificial intelligence, after which we'll discuss the basics of AI, wherein we'll understand the different types of artificial intelligence

We'll follow this by understanding the different programming languages that can be used to study AI

And we'll understand why we're gonna choose Python

Alright, I'll introduce you to Python

And then we'll move on and discuss machine learning

Here we'll discuss the different types of machine learning and the different algorithms involved in machine learning, which include classification algorithms, regression algorithms, clustering, and association algorithms

To help you understand machine learning better, we'll run a couple of demos wherein we'll see how machine learning algorithms are used to solve real-world problems

After that, we'll discuss the limitations of machine learning and why deep learning is needed

I'll introduce you to the deep learning concept: what neurons, perceptrons, multilayer perceptrons, and so on are

We'll discuss the different types of neural networks, and we'll also look at what exactly backpropagation is

Apart from this, we'll be running a demo to understand deep learning in more depth

And finally we'll move on to the next module, which is natural language processing

In natural language processing, we'll try to understand what text mining is, the difference between text mining and NLP, and what the different terminologies in NLP are, and we'll end the session by looking at the practical implementation of NLP using Python, alright

So guys, there's a lot to cover in today's session

So let's move ahead and take a look at our first topic, which is the history of artificial intelligence

So guys, the concept of artificial intelligence goes back to the classical ages

In Greek mythology, the concept of machines and mechanical men was well thought of

So, an example of this is Talos

I don't know how many of you have heard of this

Talos was a giant animated bronze warrior who was programmed to guard the island of Crete

Now these are just ideas

Nobody knows if this was actually implemented, but machine learning and AI were thought of long ago

Now let's get back to the 20th century

Now 1950 is considered to be one of the most important years for the introduction of artificial intelligence

In 1950, Alan Turing published a paper in which he speculated about the possibility of creating machines that think

So he created what is known as the Turing test

This test is basically used to determine whether or not a computer can think intelligently like a human being

He noted that thinking is difficult to define and devised his famous Turing test

So, basically, if a machine can carry out a conversation that is indistinguishable from a conversation with a human being, it is reasonable to say that the machine is thinking, meaning that the machine passes the Turing test

Now, unfortunately, up to this date, we haven't found a machine that has fully cleared the Turing test

So, the Turing test was actually the first serious proposal in the philosophy of artificial intelligence

Following this was the year 1951

This was also known as the era of game AI

So in 1951, by using the Ferranti Mark 1 machine of the University of Manchester, a computer scientist known as Christopher Strachey wrote a checkers program

And at the same time, a program was written for chess as well

Now, these programs were later improved and redone, but this was the first attempt at creating programs that could play chess or compete with humans in playing chess

This is followed by the year 1956

Now, this is probably the most important year in the invention of AI

Because in 1956, for the first time, the term artificial intelligence was coined

Alright

So the term artificial intelligence was coined by John McCarthy at the Dartmouth Conference in 1956

Coming to the year 1959, the first AI laboratory was established

This period marked the research era for AI

So the first AI lab where research was performed is the MIT lab, which is still running to date

In 1961, the first industrial robot was introduced to the General Motors assembly line

In the mid-1960s, the first chatbot was invented

Now we have Siri, we have Alexa

But back then, there was a chatbot known as Eliza, which was developed at MIT between 1964 and 1966

This is followed by the famous IBM Deep Blue

In 1997, the news broke that IBM's Deep Blue had beaten the world champion, Garry Kasparov, in the game of chess

So this was kind of the first big accomplishment of AI

It was able to beat the world champion at chess

So in 2005, when the DARPA Grand Challenge was held, a robotic car named Stanley, which was built by Stanford's racing team, won the DARPA Grand Challenge

That was another big accomplishment of AI

In 2011, IBM's question answering system, Watson, defeated the two greatest Jeopardy champions, Brad Rutter and Ken Jennings

So guys, this was how AI evolved

It started off as a hypothetical situation

Right now it's the most important technology in today's world

If you look around, everything around us is run through AI, deep learning, or machine learning

So since the emergence of AI in the 1950s, we have actually seen exponential growth in its potential

So AI covers domains such as machine learning, deep learning, neural networks, natural language processing, knowledge bases, expert systems, and so on

It has also made its way into computer vision and image processing

Now the question here is if AI has been here for over half a century, why has it suddenly gained so much importance? Why are we talking about artificial intelligence now? Let me tell you the main reasons for the demand for AI

The first reason is that we have more computation power now

So, artificial intelligence requires a lot of computing power

Recently, many advances have been made and complex deep learning models are deployed

And one of the greatest technologies that made this possible is the GPU

Since we have more computational power now, it is possible for us to implement AI in our daily lives

The second most important reason is that we have a lot of data at present

We're generating data at an immeasurable pace

We are generating data through social media, through IoT devices

In every possible way, there's a lot of data

So we need to find a method or a solution that can help us process this much data, and help us derive useful insights, so that we can grow businesses with the help of data

Alright, so, that process is basically artificial intelligence

So, a useful AI agent needs to make smart decisions, like telling which item to recommend next when you shop online, or how to classify an object from an image

To do that, AI systems are trained on large data sets, and big data enables us to do this more efficiently

The next reason is that now we have better algorithms

Right now we have very effective algorithms which are based on the idea of neural networks

Neural networks are nothing but the concept behind deep learning

Since we have better algorithms which can do better computations and quicker computations with more accuracy, the demand for AI has increased

Another reason is that universities, governments, startups, and tech giants are all investing in AI

Okay, so companies like Google, Amazon, Facebook, Microsoft, all of these companies have heavily invested in artificial intelligence because they believe that AI is the future

So AI is rapidly growing both as a field of study and also as an economy

So, actually, this is the right time for you to understand what AI is and how it works

So let's move on and understand what exactly artificial intelligence is

The term artificial intelligence was first coined in the year 1956 by John McCarthy at the Dartmouth Conference

I already mentioned this before

It was the birth of AI in 1956

Now, how did he define artificial intelligence? John McCarthy defined AI as the science and engineering of making intelligent machines

In other words, artificial intelligence is the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision making, and translation between languages

So guys, in a sense, AI is a technique of getting machines to work and behave like humans

In the recent past, artificial intelligence has been able to accomplish this by creating machines and robots that have been used in a wide range of fields, including healthcare, robotics, marketing, business analytics, and many more

With this in mind, let's discuss a couple of real-world applications of AI, so that you understand how important artificial intelligence is in today's world

Now, one of the most famous applications of artificial intelligence is the Google predictive search engine

When you begin typing a search term and Google makes recommendations for you to choose from, that is artificial intelligence in action

So predictive searches are based on data that Google collects about you, such as your browser history, your location, your age, and other personal details

So by using artificial intelligence, Google attempts to guess what you might be trying to find

Now behind this, there's a lot of natural language processing, deep learning, and machine learning involved

We'll be discussing all of those concepts in the later slides

It's not very simple to create a search engine, but the logic behind the Google search engine is artificial intelligence
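Just to give you a feel for the idea of predictive search, here's a minimal sketch in Python. This is not how Google actually does it; it's a toy that assumes we only have a list of past queries and suggests the most frequent ones that start with what you've typed so far. The function name `suggest` and the query log are made up for this illustration.

```python
from collections import Counter


def suggest(prefix, query_log, limit=3):
    """Return up to `limit` past queries that start with `prefix`,
    most frequent first (ties broken alphabetically)."""
    counts = Counter(query.lower() for query in query_log)
    prefix = prefix.lower()
    matches = [(query, n) for query, n in counts.items() if query.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in matches[:limit]]


# Hypothetical search history, invented for this sketch.
query_log = [
    "machine learning", "machine learning", "machine learning course",
    "macbook price", "machine translation", "deep learning",
]

suggestions = suggest("mach", query_log)
print(suggestions)
```

A real predictive search would also use signals like your location and browsing history, as mentioned above, and learned language models rather than simple counts.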

Moving on, in the finance sector, JP Morgan Chase's Contract Intelligence platform uses machine learning, artificial intelligence, and image recognition software to analyze legal documents

Now let me tell you that manually reviewing around 12,000 agreements took around 360,000 hours

That's a lot of time

But as soon as this task was handed over to an AI machine, it was able to do this in a matter of seconds

So that's the difference between artificial intelligence and manual or human work

Even though AI cannot think and reason like humans, its computational power is very strong compared to humans, because with machine learning algorithms, deep learning concepts, and natural language processing, AI has reached a stage wherein it can compute the most complex of problems in a matter of seconds

Coming to healthcare, IBM is one of the pioneers that has developed AI software specifically for medicine

Let me tell you that more than 230 healthcare organizations use IBM AI technology, which is basically IBM Watson

In 2016, IBM Watson technology was able to cross reference 20 million oncology records quickly and correctly diagnose a rare leukemia condition in a patient

So, it basically went through 20 million records, which it probably did in a matter of seconds or minutes, at most

And then it correctly diagnosed a patient with a rare leukemia

Knowing that machines are now used in medical fields as well, it shows how important AI has become

It has reached every domain of our lives

Let me give you another example

Google's AI Eye Doctor is another initiative taken by Google, where they're working with an Indian eye care chain to develop an artificial intelligence system which can examine retinal scans and identify a condition called diabetic retinopathy, which can cause blindness

Now in social media platforms like Facebook, artificial intelligence is used for face verification, wherein you make use of machine learning and deep learning concepts in order to detect facial features and tag your friends

All the auto-tagging features that you see on Facebook, behind that there's machine learning, deep learning, neural networks

There's only AI behind it

So we're actually unaware that we use AI very regularly in our life

All the social media platforms like Instagram, Facebook, Twitter, they heavily rely on artificial intelligence

Another such example is Twitter's AI, which is being used to identify any sort of hate speech and terroristic language in tweets

So again, it makes use of machine learning, deep learning, and natural language processing in order to filter out any offensive or any reportable content

Now recently, the company discovered around 300,000 terrorist-linked accounts, and 95% of these were found by non-human, artificially intelligent machines

Coming to virtual assistants, we have virtual assistants like Siri and Alexa right now

Let me tell you about another newly released Google virtual assistant called Google Duplex, which has astonished millions of people around the world

Not only can it respond to calls and book appointments for you, it also adds a human touch

So it adds human fillers and all of that

It makes it sound very realistic

It's actually very hard to distinguish between the human and the AI speaking over the phone

Another famous application of AI is self-driving cars

So, artificial intelligence implements computer vision, image detection, and deep learning in order to build cars that can automatically detect any objects or any obstacles and drive around without human intervention

So these are fully automated self-driving cars

Also, Elon Musk talks a lot about how AI is implemented in Tesla's self-driving cars

He said that Tesla will have fully self-driving cars ready by the end of the year, and a robo-taxi version that can ferry passengers without anyone behind the wheel

So if you look at it, AI is actually used by the tech giants

A lot of tech giants like Google, Tesla, and Facebook, all of these data-driven companies, use AI

In fact, Netflix also makes use of AI

So, coming to Netflix

So with the help of artificial intelligence and machine learning, Netflix has developed a personalized movie recommendation for each of its users

So if each of you opened up Netflix and looked at the type of movies that are recommended to you, they would be different

This is because Netflix studies each user's personal details, and tries to understand what each user is interested in and what sort of movie patterns each user has, and then it recommends movies to them

So Netflix uses the watching history of other users with similar taste to recommend what you may be most interested in watching next, so that you can stay engaged and continue your monthly subscription

Also, there's a known fact that over 75% of what you watch is recommended by Netflix

So their recommendation engine is brilliant

And the logic behind their recommendation engine is machine learning and artificial intelligence
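To make the "users with similar taste" idea a bit more concrete, here's a small sketch in Python of what's usually called user-based collaborative filtering. Please treat it as an illustration only: Netflix's real engine is far more sophisticated, and the ratings, the user names, and the functions `cosine_similarity` and `recommend` are all assumptions made up for this example.

```python
import math

# Hypothetical ratings on a 1-5 scale, invented for this sketch.
ratings = {
    "you":   {"Inception": 5, "Interstellar": 4},
    "alice": {"Inception": 5, "Interstellar": 5, "The Martian": 5},
    "bob":   {"Inception": 1, "Notting Hill": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    dot = sum(a[title] * b[title] for title in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank titles the user has not rated by similarity-weighted ratings
    from all other users."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(seen, other_ratings)
        for title, rating in other_ratings.items():
            if title not in seen:
                scores[title] = scores.get(title, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


recommendations = recommend("you", ratings)
print(recommendations)
```

Here "alice" has tastes much closer to "you" than "bob" does, so the title she rated highly ends up at the top of the list.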

Apart from Netflix, Gmail also uses AI on an everyday basis

If you open up your inbox right now, you will notice that there are separate sections

For example, we have primary section, social section, and all of that

Gmail also has a separate section for spam mails

So, what Gmail does is it makes use of concepts of artificial intelligence and machine learning algorithms to classify emails as spam and non-spam

Many times certain words or phrases are frequently used in spam emails

If you notice your spam emails, they have words like lottery, earn, full refund

All of this denotes that the email is more likely to be a spam one

So such words and correlations are understood by using machine learning and natural language processing and a few other aspects of artificial intelligence
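Here's a rough sketch in Python of how word frequencies can drive a spam decision. I'm using a tiny naive Bayes classifier with Laplace smoothing as the swapped-in technique; nothing here claims to be what Gmail actually runs. The six training emails and the function names `train` and `classify` are invented for this illustration.

```python
import math
from collections import Counter

# Tiny hypothetical training set, invented for this sketch.
training = [
    ("win the lottery now", "spam"),
    ("earn money fast full refund", "spam"),
    ("lottery winner claim full refund", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project status and agenda", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log score."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_emails)  # log prior
        words_in_label = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (words_in_label + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("claim your lottery refund", model))
print(classify("monday meeting agenda", model))
```

With real email you'd need far more data and better text cleaning, but the principle is the same: words like lottery and refund push the score towards spam.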

So, guys, these were the common applications of artificial intelligence

Now let's discuss the different types of AI

So, AI is divided into three different evolutionary stages, or you can say that there are three stages of artificial intelligence

Of course, we have artificial narrow intelligence, followed by artificial general intelligence, and that is followed by artificial super intelligence

Artificial narrow intelligence, which is also known as weak AI, involves applying artificial intelligence only to specific tasks

So, many currently existing systems that claim to use artificial intelligence are actually operating as weak AI focused on a narrowly defined specific problem

Let me give you an example of artificial narrow intelligence

Alexa is a very good example of weak AI

It operates within a limited pre-defined range of functions

There's no genuine intelligence and no self-awareness, despite it being a sophisticated example of weak AI

The Google search engine, Sophia the humanoid, self-driving cars, and even the famous AlphaGo fall under the category of weak AI

So guys, right now we're at the stage of artificial narrow intelligence or weak AI

We actually haven't reached artificial general intelligence or artificial super intelligence, but let's look at what exactly it would be like if we reach artificial general intelligence

Now artificial general intelligence, which is also known as strong AI, involves machines that possess the ability to perform any intelligent task that a human being can

Now this is actually something that a lot of people don't realize

Machines don't possess human-like abilities

They have a very strong processing unit that can perform high-level computations, but they're not yet capable of doing the simple and the most reasonable things that a human being can

If you tell a machine to process like a million documents, it'll probably do that in a matter of 10 seconds, or a minute, or even 10 minutes

But if you ask a machine to walk up to your living room and switch on the TV, a machine will take forever to learn that, because machines don't have a reasonable way of thinking

They have a very strong processing unit, but they're not yet capable of thinking and reasoning like a human being

So that's exactly why we're still stuck on artificial narrow intelligence

So far we haven't developed any machine that can fully be called strong AI, even though there are examples like AlphaGo Zero, which defeated AlphaGo in the game of Go

AlphaGo Zero basically learned in a span of about 40 days

It learned on its own without any human intervention

But even then, it was not classified as a fully strong artificial intelligence, because it cannot reason like a human being

Moving on to artificial super intelligence

Now this is a term referring to the time when the capabilities of a computer will surpass that of a human being

In all actuality, it'll take a while for us to achieve artificial super intelligence

Presently, it's seen as a hypothetical situation as depicted in movies and science fiction books wherein machines have taken over the world; movies like Terminator and all of that depict artificial super intelligence

These don't exist yet, which we should be thankful for, but there are a lot of people who speculate that artificial super intelligence will take over the world by the year 2040

So guys, these were the different types or different stages of artificial intelligence

To summarize everything, like I said before, narrow intelligence is the only thing that exists for now

We have only weak AI or weak artificial intelligence

All the major AI technologies that you see are artificial narrow intelligence

We don't have any machines which are capable of thinking like human beings or reasoning like a human being

Now let's move on and discuss the different programming languages for AI

So there are actually N number of languages that can be used for artificial intelligence

I'm gonna mention a few of them

So, first, we have Python

Python is probably the most famous language for artificial intelligence

It's also known as the most effective language for AI, because a lot of developers prefer to use Python

And a lot of scientists are also comfortable with the Python language

This is partly because Python's syntax is very simple and can be learned very easily

It's considered to be one of the easiest languages to learn

And also many AI algorithms and machine learning algorithms can be easily implemented in Python, because there are a lot of libraries which have predefined functions for these algorithms

So all you have to do is you have to call that function

You don't actually have to code your algorithm

So, Python is considered the best choice for artificial intelligence

Along with Python stands R, which is a statistical programming language

Now R is one of the most effective languages and environments for analyzing and manipulating data for statistical purposes

It is a statistical programming language

So using R we can easily produce well-designed, publication-quality plots, including mathematical symbols and formulae, wherever needed

If you ask me, I think R is also one of the easiest programming languages to learn

The syntax is very similar to the English language, and it also has N number of libraries that support statistics, data science, AI, machine learning, and so on

It also has predefined functions for machine learning algorithms, natural language processing, and so on

So R is also a very good choice if you want to get started with programming languages for machine learning or AI

Apart from this, we have Java

Now Java can also be considered as a good choice for AI development

Artificial intelligence has a lot to do with search algorithms, artificial neural networks, and genetic programming, and Java provides many benefits

It's easy to use

Debugging is very easy, and it offers package services

There is simplified work with large-scale projects

There's a good user interaction, and graphical representation of data

It has something known as the Standard Widget Toolkit, which can be used for making graphs and interfaces

So, graphic visualization is actually a very important part of AI, or data science, or machine learning for that matter

Let me list out a few more languages

We also have something known as Lisp

Now shockingly, a lot of people have not heard of this language

This is actually the oldest and the most suited language for the development of artificial intelligence

It is considered to be a language which is very suited for the development of artificial intelligence

Now let me tell you that this language was invented by John McCarthy, who's also known as the father of artificial intelligence

He was the person who coined the term artificial intelligence

It has the capability of processing symbolic information

It has excellent prototyping capabilities

It is easy, and it creates dynamic objects with a lot of ease

There's automatic garbage collection and all of that

But over the years, because of advancements, many of these features have migrated into many other languages

And that's why a lot of people don't go for Lisp

There are a lot of new languages which have more effective features or which have better packages, you can say

Another language I'd like to talk about is Prolog

Prolog is frequently used in knowledge bases and expert systems

The features provided by Prolog include pattern matching, tree-based data structuring, automatic backtracking, and so on

All of these features provide a very powerful and flexible programming framework

Prolog is actually widely used in medical projects and also for designing expert AI systems

Apart from this, we also have C++, we have SAS, we have JavaScript, which can also be used for AI

We have MATLAB, we have Julia

All of these languages are actually considered pretty good languages for artificial intelligence

But for now, if you ask me which programming language you should go for, I would say Python

Python has all the possible packages, and it is very easy to understand and easy to learn

So let's look at a couple of features of Python

We can see why we should go for Python

First of all, work on Python began in the year 1989, and it was first released in 1991

It is actually a very easy programming language

That's one of the reasons why a lot of people prefer Python

It's very easy to understand

It's very easy to grasp this language

So Python is an interpreted, object-oriented, high-level programming language, and it can be very easily implemented

Now let me tell you a few features of Python

It's very simple and easy to learn

Like I mentioned, it is one of the easiest programming languages, and it's also free and open source

Apart from that, it is a high-level language

You don't have to worry about anything like memory allocation

It is portable, meaning that you can use it on any platform like Linux, Windows, Macintosh, Solaris, and so on

It supports different programming paradigms like object-oriented and procedure-oriented programming, and it is extensible, meaning that it can invoke C and C++ libraries

Apart from this, let me tell you that Python is actually gaining unbelievably huge momentum in AI

The language is used to develop data science algorithms, machine learning algorithms, and IoT projects

The other advantage of Python is the fact that you don't have to code much when it comes to Python for AI or machine learning

This is because there are ready-made packages

There are predefined packages that have all the functions and algorithms stored

For example, there is something known as PyBrain, which can be used for machine learning, NumPy, which can be used for scientific computation, Pandas, and so on

There are N number of libraries in Python

So guys, I'm not going to go into depth on Python

I'm not going to explain Python to you, since this session is about artificial intelligence

So, for those of you who don't know much about Python or who are new to Python, I will leave a couple of links in the description box

You all can get started with programming and clear up any other concepts or doubts that you have about Python

We have a lot of content around programming with Python or Python for machine learning and so on

Now let's move on and talk about one of the most important aspects of artificial intelligence, which is machine learning

Now a lot of people always ask me this question

Are machine learning and artificial intelligence the same thing? Well, they are not the same thing

The difference between AI and machine learning is that machine learning is used in artificial intelligence

Machine learning is a method through which you can feed a lot of data to a machine and make it learn

Now AI is a vast field

Under AI, we have machine learning, we have NLP, we have expert systems, we have image recognition, object detection, and so on

We have deep learning also

So, AI is sort of a process, or it's a methodology in which you make machines mimic the behavior of human beings

Machine learning is a way in which you feed a lot of data to a machine, so that it can make its own decisions

Let's get into depth about machine learning

So first, we'll understand the need for machine learning or why machine learning came into existence

Now the need for machine learning began with the technical revolution itself

So, guys, since technology became the center of everything, we've been generating an immeasurable amount of data

As per research, we generate around 2.5 quintillion bytes of data every single day

And it is estimated that by this year, 2020, 1.7 MB of data will be created every second for every person on earth

So as I'm speaking to you right now, I'm generating a lot of data

Now your watching this video on YouTube also accounts for data generation

So there's data everywhere

So with the availability of so much data, it is finally possible to build predictive models that can study and analyze complex data to find useful insights and deliver more accurate results

So, top-tier companies like Netflix and Amazon build such machine learning models by using tons of data in order to identify any profitable opportunity and avoid any unwanted risk

So guys, one thing you all need to know is that the most important thing for artificial intelligence is data

Whether it's artificial intelligence, machine learning, or deep learning, it's always data

And now that we have a lot of data, we can find a way to analyze, process, and draw useful insights from this data in order to help us grow businesses or to find solutions to some problems

Data is the solution

We just need to know how to handle the data

And the way to handle data is through machine learning, deep learning, and artificial intelligence

A few reasons why machine learning is so important is, number one, due to the increase in data generation

So due to excessive production of data, we need to find a method that can be used to structure, analyze, and draw useful insights from data, and this is where machine learning comes in

It is used to solve problems and find solutions to the most complex tasks faced by organizations

Apart from this, we also need to improve decision making

So by making use of various algorithms, machine learning can be used to make better business decisions

For example, machine learning is used to forecast sales

It is used to predict any downfalls in the stock market or identify any sort of risks and anomalies

Other reasons include that machine learning helps us uncover patterns and trends in data

So finding hidden patterns and extracting key insights from data is the most important part of machine learning

So by building predictive models and using statistical techniques, machine learning allows you to dig beneath the surface and explore the data at a minute scale

Understanding data and extracting patterns manually takes a lot of time

It'll take several days for us to extract any useful information from data

But if you use machine learning algorithms, you can perform similar computations in less than a second

Another reason is we need to solve complex problems

So from detecting the genes linked to the deadly ALS disease, to building self-driving cars, machine learning can be used to solve the most complex problems

At present, we also found a way to spot stars which are 2,400 light years away from our planet

Okay, all of this is possible through AI, machine learning, deep learning, and these techniques

So to sum it up, machine learning is very important at present because we're facing a lot of issues with data

We're generating a lot of data, and we have to handle this data in such a way that it benefits us

So that's where machine learning comes in

Moving on, what exactly is machine learning? So let me give you a short history of machine learning

So the term machine learning was first coined by Arthur Samuel in the year 1959, which is just three years after artificial intelligence was coined

So, looking back, that year was probably the most significant in terms of technological advancement, because most of the technologies today are based on the concept of machine learning

Most of the AI technologies themselves are based on the concept of machine learning and deep learning

Don't get confused about machine learning and deep learning

We'll discuss deep learning in the later slides, where we'll also see the difference between AI, machine learning, and deep learning

So coming back to what exactly machine learning is, if you browse through the internet, you'll find a lot of definitions about what exactly machine learning is

One of the definitions I found was a computer program is saidto learn from experience E with respect to some class of task T and performance measure P ifits performance at task in T, as measured by P, improveswith experience E

That's very confusing, so letme just narrow it down to you

In simple terms, machinelearning is a subset of artificial intelligence which provides machines the ability to learn automatically andimprove with experience without being explicitlyprogrammed to do so

In the sense, it is the practice of getting machines to solve problems by gaining the ability to think

But now you might be thinking how can a machine think or make decisions

Now machines are very similar to humans

Okay, if you feed a machine a good amount of data, it will learn how to interpret, process, and analyze this data by using machine learning algorithms, and it will help you solve real-world problems

So what happens here is a lot of data is fed to the machine

The machine will train on this data and it'll build a predictive model with the help of machine learning algorithms in order to predict some outcome or in order to find some solution to a problem

So it involves data

You're gonna train the machine and build a model by using machine learning algorithms in order to predict some outcome or to find a solution to a problem

So that is a simple way of understanding what exactly machine learning is

I'll be going into more depth about machine learning, so don't worry if you haven't understood everything as of now

Now let's discuss a couple of terms which are frequently used in machine learning

So, the first definition that we come across very often is an algorithm

So, basically, a machine learning algorithm is a set of rules and statistical techniques that is used to learn patterns from data and draw significant information from it

Okay

So, guys, the logic behind a machine learning model is basically the machine learning algorithm

Okay, an example of a machine learning algorithm is linear regression, or decision tree, or a random forest

All of these are machine learning algorithms

They define the logic behind a machine learning model

Now what is a machine learning model? A model is actually the main component of a machine learning process

Okay, so a model is trained by using the machine learning algorithm

The difference between an algorithm and a model is that an algorithm maps all the decisions that a model is supposed to take based on the given input in order to get the correct output

So the model will use the machine learning algorithm in order to draw useful insights from the input and give you an outcome that is very precise

That's the machine learning model

The next definition we have is predictor variable

Now a predictor variable is any feature of the data that can be used to predict the output

Okay, let me give you an example to make you understand what a predictor variable is

Let's say you're trying to predict the height of a person, depending on his weight

So here your predictor variable becomes your weight, because you're using the weight of a person to predict the person's height

So your predictor variable becomes your weight

The next definition is response variable

Now in the same example, height would be the response variable

Response variable is also known as the target variable or the output variable

This is the variable that you're trying to predict by using the predictor variables

So a response variable is the feature or the output variable that needs to be predicted by using the predictor variables
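To make the two terms concrete, here is a minimal Python sketch of the height and weight example. The records and the names `people`, `X`, and `y` are made up for illustration; they are not from the session's data.

```python
# Hypothetical records: each person has a weight (kg) and a height (cm).
people = [
    {"weight": 55.0, "height": 160.0},
    {"weight": 68.0, "height": 172.0},
    {"weight": 80.0, "height": 181.0},
]

# Predictor variable (the input feature): weight.
X = [person["weight"] for person in people]

# Response / target variable (what we want to predict): height.
y = [person["height"] for person in people]

print(X)  # [55.0, 68.0, 80.0]
print(y)  # [160.0, 172.0, 181.0]
```

Keeping the predictors and the response in separate containers like this is only a common convention, but it makes it obvious which column the model is supposed to predict.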

Next, we have something known as training data

Now training and testing data are terminologies that you'll come across very often in a machine learning process

So training data is basically the data that is used to create the machine learning model

So, basically in a machine learning process, when you feed data into the machine, it'll be divided into two parts

So splitting the data into two parts is also known as data splicing

So you'll take your input data, you'll divide it into two sections

One you'll call the training data, and the other you'll call the testing data

So then you have something known as the testing data

The training data is basically used to create the machine learning model

The training data helps the model to identify key trends and patterns which are essential to predict the output

Now coming to the testing data, after the model is trained, it must be tested in order to evaluate how accurately it can predict an outcome

Now this is done by using the testing data

So, basically, the training data is used to train the model

The testing data is used to test the efficiency of the model
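The split described above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `split_data`, the 80/20 ratio, and the fixed seed are assumptions made for the example, and in practice a library routine would usually do this job.

```python
import random

def split_data(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into a training part and a testing part."""
    shuffled = list(rows)                   # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # random order, reproducible via the seed
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))                      # stand-in for ten data records
train, test = split_data(data)
print(len(train), len(test))                # 8 2
```

Shuffling before cutting matters, because if the records are sorted in some way, a plain cut would give the model training data that does not look like the testing data.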

Now let's move on and get to our next topic, which is the machine learning process

So what is the machine learning process? Now the machine learning process involves building a predictive model that can be used to find a solution for a problem statement

Now in order to solve any problem in machine learning, there are a couple of steps that you need to follow

Let's look at the steps

The first step is you define the objective of your problem

And the second step is data gathering, which is followed by preparing your data, data exploration, building a model, model evaluation, and finally making predictions

Now, in order to understand the machine learning process, let's assume that you've been given a problem that needs to be solved by using machine learning

So the problem that you need to solve is you need to predict the occurrence of rain in your local area by using machine learning

So, basically, you need to predict the possibility of rain by studying the weather conditions

So what we did here is we basically looked at step number one, which is define the objective of the problem

Now here you need to answer questions such as what are we trying to predict

Is that output going to be a continuous variable, or is it going to be a discrete variable? These are the kinds of questions that you need to answer in the first stage, which is defining the objective of the problem, right? So yeah, what exactly are the target features

So here you need to understand which is your target variable and what are the different predictor variables that you need in order to predict this outcome

So here our target variable will be basically a variable that can tell us whether it's going to rain or not

For input data, we'll need data such as maybe the temperature on a particular day or the humidity level, the precipitation, and so on

So you need to define the objective at this stage

So basically, you have to form an idea of the problem at this stage

Another question that you need to ask yourself is what kind of problem are you solving

Is this a binary classification problem, or is this a clustering problem, or is this a regression problem? Now, a lot of you might not be familiar with the terms classification, clustering, and regression in terms of machine learning

Don't worry, I'll explain all of these terms in the upcoming slides

All you need to understand at step one is you need to define how you're going to solve the problem

You need to understand what sort of data you need to solve the problem, how you're going to approach the problem, what are you trying to predict, what variables you'll need in order to predict the outcome, and so on

Let's move on and look at step number two, which is data gathering

Now in this stage, you must be asking questions such as, what kind of data is needed to solve this problem? And is this data available? And if it is available, from where can I get this data and how can I get the data? Data gathering is one of the most time-consuming steps in the machine learning process

If you have to go manually and collect the data, it's going to take a lot of time

But lucky for us, there are a lot of resources online, which provide data sets

All you need to do is web scraping, where you just have to go ahead and download data

One of the websites I can tell you all about is Kaggle

So if you're a beginner in machine learning, don't worry about data gathering and all of that

All you have to do is go to websites such as Kaggle and just download the data set

So coming back to the problem that we are discussing, which is predicting the weather, the data needed for weather forecasting includes measures like humidity level, the temperature, the pressure, the locality, whether or not you live in a hill station, and such data has to be collected and stored for analysis

So all the data is collected during the data gathering stage

This step is followed by data preparation, also known as data cleaning

So if you're going around collecting data, it's almost never in the right format

And even if you are taking data from online resources from any website, even then, the data will require cleaning and preparation

The data is never in the right format

You have to do some sort of preparation and some sort of cleaning in order to make the data ready for analysis

So what you'll encounter while cleaning data is a lot of inconsistencies in the data set, like some missing values, redundant variables, duplicate values, and all of that

So removing such inconsistencies is very important, because they might lead to wrongful computations and predictions

Okay, so at this stage you can scan the data set for any inconsistencies, and you can fix them then and there

Now let me give you a small fact about data cleaning

So there was a survey that was run last year or so

I'm not sure

And a lot of data scientists were asked which step was the most difficult or the most annoying and time-consuming of all

And 80% of the data scientists said it was data cleaning

Data cleaning takes up 80% of their time

So it's not very easy to get rid of missing values and corrupted data

And even if you get rid of missing values, sometimes your dataset might get affected

It might get biased because maybe one variable has too many missing values, and this will affect your outcome

So you'll have to fix such issues, you'll have to deal with all of this missing data and corrupted data

So data cleaning is actually one of the hardest steps in the machine learning process
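As a rough illustration of what removing missing and duplicate values can look like, here is a small sketch in plain Python. The function name `clean` and the weather records are invented for the example, and dropping rows is only one of several possible strategies.

```python
def clean(rows):
    """Drop rows that contain a missing value, then drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue                        # skip rows with a missing value
        key = tuple(sorted(row.items()))    # hashable fingerprint of the row
        if key in seen:
            continue                        # skip rows we have already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned

raw = [
    {"temperature": 21.0, "humidity": 80},
    {"temperature": None, "humidity": 75},  # missing value
    {"temperature": 21.0, "humidity": 80},  # duplicate of the first row
    {"temperature": 30.0, "humidity": 40},
]
print(clean(raw))
```

Simply dropping rows is the easiest option, but as mentioned above, it can bias the data set when one variable has many missing values, so filling in a sensible value is sometimes preferred.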

Okay, now let's move on and look at our next step, which is exploratory data analysis

So here what you do is basically become a detective at this stage

So this stage, which is EDA or exploratory data analysis, is like the brainstorming stage of machine learning

Data exploration involves understanding the patterns and the trends in your data

So at this stage, all the useful insights are drawn and any correlations between the various variables are understood

What do I mean by trends and patterns and correlations? Now let's consider our example which is we have to predict the rainfall on a particular day

So we know that there is a strong possibility of rain if the temperature has fallen low

So we know that our output will depend on variables such as temperature, humidity, and so on

Now to what level it depends on these variables, we'll have to find that out

We'll have to find out the patterns, and we'll find out the correlations between such variables

So such patterns and trends have to be understood and mapped at this stage

So this is what exploratory data analysis is about

It's the most important part of machine learning

This is where you'll understand what exactly your data is and how you can form the solution to your problem
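One simple way to check how strongly the output depends on a variable is to compute a correlation coefficient. The sketch below uses the Pearson correlation written out by hand; the temperature and rainfall numbers are made up purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up observations: temperature (deg C) and rainfall (mm) on five days.
temperature = [30, 27, 24, 20, 18]
rainfall = [0, 2, 5, 9, 12]

r = pearson(temperature, rainfall)
print(round(r, 2))   # close to -1: lower temperature goes with more rain here
```

A value near -1 or +1 suggests a strong linear relationship, while a value near 0 suggests the variable tells you little on its own, at least in a linear sense.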

The next step in a machine learning process is building a machine learning model

So all the insights and the patterns that you derive during the data exploration are used to build a machine learning model

So this stage always begins by splitting the data set into two parts, which are training data and testing data

I've already discussed with you that the data that you use in a machine learning process is always split into two parts

We have the training data and we have the testing data

Now when you're building a model, you always use the training data

So you always make use of the training data in order to build the model

Now a lot of you might be asking what is training data

Is it different from the input data that you're feeding to the machine or is it different from the testing data? Now training data is the same input data that you're feeding to the machine

The only difference is that you're splitting the data set into two

You're randomly picking 80% of your data and you're assigning it for training purposes

And the remaining 20%, probably, you'll assign for testing purposes

So guys, always remember another thing, that the training data is always much more than your testing data, obviously because you need to train your machine

And the more data you feed the machine during the training phase, the better it will be during the testing phase

Obviously, it'll predict better outcomes if it is being trained on more data

Correct? So the model is basically using the machine learning algorithm that predicts the output by using the data fed to it

Now in the case of predicting rainfall, the output will be a categorical variable, because we'll be predicting whether it's going to rain or not

Okay, so let's say we have an output variable called rain

The two possible values that this variable can take are yes it's going to rain and no it won't rain

Correct, so that is our outcome

Our outcome is a classification or a categorical variable

So for such cases where your outcome is a categorical variable, you'll be using classification algorithms

Again, an example of a classification algorithm is logistic regression, or you can also use support vector machines, you can use K nearest neighbor, and you can also use naive Bayes, and so on

Now don't worry about these terms, I'll be discussing all these algorithms with you

But just remember that while you're building a machine learning model, you'll make use of the training data

You'll train the model by using the training data and the machine learning algorithm

Now like I said, choosing the machine learning algorithm depends on the problem statement that you're trying to solve, because there are N number of machine learning algorithms

You'll have to choose the algorithm that is the most suitable for your problem statement

So step number six is model evaluation and optimization

Now after you're done building a model by using the training data set, it is finally time to put the model to the test

The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome

So once the accuracy is calculated, any further improvements in the model can be implemented during this stage

There are various methods that can help you improve the performance of the model, like you can use parameter tuning and cross validation methods in order to improve the performance of the model

Now the main thing you need to remember during model evaluation and optimization is that model evaluation is nothing but testing how well your model can predict the outcome

So at this stage, you will be using the testing data set

In the previous stage, which is building a model, you'll be using the training data set

But in the model evaluation stage, you'll be using the testing data set

Now once you've tested your model, you need to calculate the accuracy

You need to calculate how accurately your model is predicting the outcome
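For a classification problem like the rain example, one common way to calculate accuracy is the fraction of test records the model got right. Here is a minimal sketch; the `actual` and `predicted` lists are invented values standing in for the testing data labels and the model's output.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Hypothetical labels for five test days: did it rain?
actual    = ["yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no", "yes", "yes", "no"]

print(accuracy(actual, predicted))   # 0.8
```

Four of the five predictions match, so the accuracy is 0.8. Accuracy is only one possible measure, and it can be misleading when one class is much more frequent than the other.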

After that, if you find that you need to improve your model in some way or the other, because the accuracy is not very good, then you'll use methods such as parameter tuning

Don't worry about these terms, I'll discuss all of this with you, but I'm just trying to make sure that you're understanding the concept behind each of the phases in machine learning

It's very important you understand each step

Okay, now let's move on and look at the last stage of machine learning, which is predictions

Now, once a model is evaluated and once you've improved it, it is finally used to make predictions

The final output can either be a categorical variable or a continuous variable

Now all of this depends on your problem statement

Don't get confused about continuous variables, categorical variables

I'll be discussing all of this

Now in our case, because we're predicting the occurrence of rainfall, the output will be a categorical variable

It's obvious because we're predicting whether it's going to rain or not

As a result, we understand that this is a classification problem because we have a categorical variable

So that was the entire machine learning process

Now it's time to learn about the different ways in which machines can learn

So let's move ahead and look at the types of machine learning

Now this is one of the most interesting concepts in machine learning, the three different ways in which machines learn

There is something known as supervised learning, unsupervised learning, and reinforcement learning

So we'll go through these one by one

We'll understand what supervised learning is first, and then we'll look at the other two types

So to define supervised learning, it is basically a technique in which we teach or train the machine by using data which is well labeled

Now, in order to understand supervised learning, let's consider a small example

So, as kids, we all needed guidance to solve math problems

A lot of us had trouble solving math problems

So our teachers always helped us understand what addition is and how it is done

Similarly, you can think of supervised learning as a type of machine learning that involves a guide

The labeled data set is a teacher that will train you to understand the patterns in the data

So the labeled data set is nothing but the training data set

I'll explain more about this in a while

So, to understand supervised learning better, let's look at the figure on the screen

Right here we're feeding the machine images of Tom and Jerry, and the goal is for the machine to identify and classify the images into two classes

One will contain images of Tom and the other will contain images of Jerry

Now the main thing that you need to note in supervised learning is the training data set

The training data set is going to be very well labeled

Now what do I mean when I say that the training data set is labeled

Basically, what we're doing is we're telling the machine this is how Tom looks and this is how Jerry looks

By doing this, you're training the machine by using labeled data

So the main thing that you're doing is you're labeling every input data that you're feeding to the model

So, basically, your entire training data set is labeled

Whenever you're giving an image of Tom, there's gonna be a label there saying this is Tom

And when you're giving an image of Jerry, you're saying that this is how Jerry looks

So, basically, you're guiding the machine and you're telling it, "Listen, this is how Tom looks, this is how Jerry looks, and now you need to classify them into two different classes"

That's how supervised learning works

Apart from that, it's the same old process

After getting the input data, you're gonna perform data cleaning

Then there's exploratory data analysis, followed by creating the model by using the machine learning algorithm, and then this is followed by model evaluation, and finally, your predictions

Now, one more thing to note here is that the output that you get by using supervised learning is also labeled output

So, basically, you'll get two different classes, one named Tom and one named Jerry, and you'll get them labeled

That is how supervised learning works

The most important thing in supervised learning is that you're training the model by using a labeled data set
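To show what learning from labeled data can look like in code, here is a tiny sketch that uses a nearest neighbor rule. It is not the method behind the figure in the session: the two numeric features (ear pointiness and body size) and all the values are assumptions invented for the example, standing in for whatever would be extracted from real images.

```python
import math

# Labeled training data: (ear_pointiness, body_size) -> label.
# Both features and all values are made up for illustration.
training = [
    ((0.9, 0.8), "Tom"),
    ((0.8, 0.9), "Tom"),
    ((0.2, 0.1), "Jerry"),
    ((0.3, 0.2), "Jerry"),
]

def predict(features):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(training, key=lambda item: math.dist(item[0], features))
    return nearest[1]

print(predict((0.85, 0.75)))  # Tom
print(predict((0.25, 0.15)))  # Jerry
```

Notice that the output is itself a label, Tom or Jerry, which is only possible because every training example came with a label in the first place.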

Now let's move on and look at unsupervised learning

We'll look at the same example and understand how unsupervised learning works

So what exactly is unsupervised learning? Now this involves training by using unlabeled data and allowing the model to act on that information without any guidance

Alright

Like the name itself suggests, there is no supervision here

It's unsupervised learning

So think of unsupervised learning as a smart kid that learns without any guidance

Okay, in this type of machine learning, the model is not fed with any labeled data, as in the model has no clue that this is the image of Tom and this is Jerry

It figures out patterns and the difference between Tom and Jerry on its own by taking in tons and tons of data

Now how do you think the machine identifies this as Tom, and then finally gives us the output like yes this is Tom, this is Jerry

For example, it identifies prominent features of Tom, such as pointy ears, bigger in size, and so on, to understand that this image is of type one

Similarly, it finds out features in Jerry, and knows that this image is of type two, meaning that the first image is different from the second image

So what the unsupervised learning algorithm or the model does is it'll form two different clusters

It'll form one cluster of images which are very similar, and another cluster which is very different from the first cluster

That's how unsupervised learning works

So the important things that you need to know in unsupervised learning are that you're gonna feed the machine unlabeled data

The machine has to understand the patterns and discover the output on its own

And finally, the machine will form clusters based on feature similarity
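Here is a minimal sketch of forming clusters from unlabeled data, using a very small version of k-means written in plain Python. The points are invented feature values with no labels attached, and picking two of the points as starting centroids is an assumption made to keep the example deterministic; real implementations usually choose starting points more carefully.

```python
import math

def k_means(points, centroids, iterations=10):
    """Tiny k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # New centroid = mean of the points in the cluster (keep the old one if empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Unlabeled feature vectors: nothing says which ones are Tom or Jerry.
points = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.85), (0.2, 0.1), (0.3, 0.2), (0.25, 0.15)]
clusters, centroids = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

The algorithm only reports group one and group two. It never learns the names Tom and Jerry, because no labels were given.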

Now let's move on and look at the last type of machine learning, which is reinforcement learning

Reinforcement learning is quite different when compared to supervised and unsupervised learning

What exactly is reinforcement learning? It is a part of machine learning where an agent is put in an environment, and he learns to behave in this environment by performing certain actions, and observing the rewards which he gets from those actions

To understand what reinforcement learning is, imagine that you were dropped off at an isolated island

What would you do? Panic

Yes, of course, initially, we'll all panic

But as time passes by, you will learn how to live on the island

You will explore the environment, you will understand the climate conditions, the type of food that grows there, the dangers of the island, and so on

This is exactly how reinforcement learning works

It basically involves an agent, which is you stuck on the island, that is put in an unknown environment, which is the island, where he must learn by observing and performing actions that result in rewards

So reinforcement learning is mainly used in advanced machine learning areas such as self-driving cars and AlphaGo

I'm sure a lot of you have heard of AlphaGo

So, the logic behind AlphaGo is nothing but reinforcement learning and deep learning

And in reinforcement learning, there is not really any input data given to the agent

All he has to do is he has to explore everything from scratch, it's like a newborn baby with no information about anything

He has to go around exploring the environment, and getting rewards, and performing some actions which result in either rewards or in some sort of punishment

Okay

So that sums up the types of machine learning

Before we move ahead, I'd like to discuss the difference between the three types of machine learning, just to make the concept clear to you all

So let's start by looking at the definitions of each

In supervised learning, the machine will learn by using the labeled data

In unsupervised learning, there'll be unlabeled data, and the machine has to learn without any supervision

In reinforcement learning, there'll be an agent which interacts with the environment by producing actions and discovers errors or rewards based on his actions

Now what are the types of problems that can be solved by using supervised, unsupervised, and reinforcement learning

When it comes to supervised learning, the two main types of problems that are solved are regression problems and classification problems

When it comes to unsupervised learning, it is association and clustering problems

When it comes to reinforcement learning, it's reward-based problems

I'll be discussing regression, classification, clustering, and all of this in the upcoming slides, so don't worry if you don't understand this

Now the type of data which is used in supervised learning is labeled data

In unsupervised learning, it's unlabeled

And in reinforcement learning, we have no predefined data set

The agent has to do everything from scratch

Now the type of training involved in each of these learnings

In supervised learning, there is external supervision, as in there is the labeled data set which acts as a guide for the machine to learn

In unsupervised learning, there's no supervision

Again, in reinforcement learning, there's no supervision at all

Now what is the approach to solve problems by using supervised, unsupervised, and reinforcement learning? In supervised learning, it is simple

You have to map the labeled input to the known output

The machine knows what the output looks like

So you're just mapping the input to the output

In unsupervised learning, you're going to understand the patterns and discover the output

Here you have no clue about what the input is

It's not labeled

You just have to understand the patterns and you'll have to form clusters and discover the output

In reinforcement learning, there is no clue at all

You'll have to follow the trial and error method

You'll have to go around your environment

You'll have to explore the environment, and you'll have to try some actions

And only once you perform those actions, you'll know whether this is a reward-based action or whether this is a punishment-based action

So, reinforcement learning is totally based on the concept of trial and error
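The trial and error idea can be sketched with a very small Q-learning example. Everything in it is an assumption made for illustration: the environment is a made-up line of five positions with the goal at the right end, the reward and penalty values are arbitrary, and the agent explores by choosing actions completely at random.

```python
import random

N_STATES = 5               # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)         # move left or move right
ALPHA, GAMMA = 0.5, 0.9    # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: move, stay on the line, +1 reward at the goal, small penalty otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else -0.01
    return next_state, reward

rng = random.Random(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                       # 500 episodes of trial and error
    state = 0
    while state != N_STATES - 1:
        action = rng.choice(ACTIONS)       # explore by acting randomly
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# Best known action for each non-goal position after learning.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is the correct answer. It only finds out which actions are reward-based and which are punishment-based by trying them, which is the difference from supervised learning.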

Okay

Popular algorithms under supervised learning include linear regression, logistic regression, support vector machines, K nearest neighbor, naive Bayes, and so on

Under unsupervised learning, we have the famous K-means clustering method, C-means, and all of that

Under reinforcement learning, we have the famous Q-learning algorithm

I'll be discussing these algorithms in the upcoming slides

So let's move on and look at the next topic, which is the types of problems solved using machine learning

Now this is what we were talking about earlier when I said regression, classification, and clustering problems

Okay, so let's discuss what exactly I mean by that

In machine learning, all the problems can be classified into three types

Every problem that is approached in machine learning can be put into one of these three categories

Okay, so the first type is known as regression, then we have classification and clustering

So, first, let's look at regression type of problems

So in this type of problem, the output is always a continuous quantity

For example, if you want to predict the speed of a car, given the distance, it is a regression problem

Now a lot of you might not be very aware of what exactly a continuous quantity is

A continuous quantity is any quantity that can have an infinite range of values

For example, the weight of a person is a continuous quantity, because our weight can be 50, 50.1, 50.001, 50.0021, 50.0321 and so on

It can have an infinite range of values, correct? So for the type of problem where you have to predict a continuous quantity, you make use of regression algorithms

So, regression problems can be solved by using supervised learning algorithms like linear regression
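To give a feel for what predicting a continuous quantity looks like, here is a minimal sketch of fitting a straight line with ordinary least squares in plain Python. The helper name `fit_line` and the data points are made up; the points were chosen to lie exactly on the line y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up predictor and response values that follow y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept   # predict the response for a new input x = 5
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```

The prediction can be any real number, not one of a fixed set of categories, and that is what makes this a regression problem.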

Next, we have classification

Now in this type of problem, the output is always a categorical value

Now when I say categorical value, it can be a value such as the gender of a person, which is a categorical value

Now classifying emails into two classes like spam and non-spam is a classification problem that can be solved by using supervised learning classification algorithms, like support vector machines, naive Bayes, logistic regression, K nearest neighbor, and so on

So, again, the main aim in classification is to compute the category of the data

Coming to clustering problems

This type of problem involves assigning the input into two or more clusters based on feature similarity

Thus when I read this sentence, you should understand that this is unsupervised learning, because you don't have enough data about your input, and the only option that you have is to form clusters

Categories are formed only when you know that your data is of two types

Your input data is labeled and it's of two types, so it's gonna be a classification problem

But a clustering problem happens when you don't have much information about your input, all you have to do is find patterns and understand that data points which are similar are clustered into one group, and data points which are different from the first group are clustered into another group

That's what clustering is

An example is Netflix, what happens is Netflix clusters their users into similar groups based on their interest, based on their age, geography, and so on

This can be done by using unsupervised learning algorithms like K-means

Okay

So guys, these were the three categories of problems that can be solved by using machine learning

So, basically, what I'm trying to say is all the problems will fall into one of these categories

So any problem that you give to a machine learning model, it'll fall into one of these categories

Okay

Now to make things a little more interesting, I have collected real world data sets from online resources

And what we're gonna do is we're going to try and understand if this is a regression problem, or a clustering problem, or a classification problem

Okay

Now the problem statement here is to study the house sales data set, and build a machine learning model that predicts the house pricing index

Now the most important thing you need to understand when you read a problem statement is what is your target variable, and what are the possible predictor variables that you'll need

The first thing you should look at is your target variable

If you want to understand if this is a classification, regression, or clustering problem, look at your target variable or your output variable that you're supposed to predict

Here you're supposed to predict the house pricing index

Our house pricing index is obviously a continuous quantity

So as soon as you understand that, you'll know that this is a regression problem

So for this, you can make use of the linear regression algorithm, and you can predict the house pricing index

Linear regression is a regression algorithm

It is a supervised learning algorithm

We'll discuss more about it in the further slides

Let's look at our next problem statement

Here you have to study a bank credit data set, and make a decision about whether to approve the loan of an applicant based on his profile

Now what is your output variable over here? Your output variable is to predict whether you can approve the loan of an applicant or not

So, obviously, your output is going to be categorical

It's either going to be yes or no

Yes is basically approved loan

No is reject loan

So here, you understand that this is a classification problem

Okay

So you can make use of algorithms like the KNN algorithm or you can make use of support vector machines in order to do this

So, support vector machine and KNN, which is the K nearest neighbor algorithm, are basically supervised learning algorithms

We'll talk more about that in the upcoming slides

Moving on to our next problem statement

Here the problem statement is to cluster a set of movies as either good or average based on the social media outreach

Now if you look properly, your clue is in the question itself

The first line says to cluster a set of movies as either good or average

Now guys, whenever you have a problem statement that is asking you to group the data set into different groups or to form different clusters, it's obviously a clustering problem

Right here you can make use of the K-means clustering algorithm, and you can form two clusters

One will contain the popular movies and the other will contain the non-popular movies

These are all small examples of how you can use machine learning to solve clustering, regression, and classification problems

The key is you need to identify the type of problem first
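The habit of looking at the target variable first can be written down as a rough rule of thumb. The function below is only a hypothetical heuristic for illustration, not a reliable detector: it assumes continuous targets are stored as Python floats and that a clustering problem has no target values at all.

```python
def problem_type(target_values):
    """Guess the problem type from the target variable (a rough heuristic only)."""
    if target_values is None:
        return "clustering"         # no labeled target at all
    if all(isinstance(v, float) for v in target_values):
        return "regression"         # continuous numeric target
    return "classification"         # categorical target

print(problem_type([210.5, 198.2, 305.0]))  # regression (house pricing index)
print(problem_type(["yes", "no", "yes"]))   # classification (approve the loan or not)
print(problem_type(None))                   # clustering (group the movies)
```

In real data sets the decision needs human judgment, for example a numeric column of zip codes is categorical even though it contains numbers.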

Now let's move on anddiscuss the different types of machine learning algorithms

So we're gonna start bydiscussing the different supervised learning algorithms

So to give you a quick overview, we'll be discussing the linear regression, logistic regression, and decision tree, random forest, naive Bayes classifier, support vector machines,and K nearest neighbor

We'll be discussingthese seven algorithms

So without any further delay, let's look at linear regression first

Now what exactly is a linear regression algorithm? So guys, linear regression is basically a supervised learning algorithm that is used to predict a continuous dependent variable y based on the values of the independent variable x

Okay

The important thing to note here is that the dependent variable y, the variable that you're trying to predict, is always going to be a continuous variable

But the independent variables x are basically the predictor variables; these are the variables that you'll be using to predict your output variable, which is nothing but your dependent variable

So your independent variables or your predictor variables can either be continuous or discrete

Okay, there is no such restriction over here

Okay, they can be either continuous variables or they can be discrete variables

Now, again, I'll tell you what a continuous variable is, in case you've forgotten

It is a variable that has an infinite number of possibilities

So I'll give you an example of a person's weight

It can be 160 pounds, or they can weigh 160.11 pounds, or 160.1134 pounds, and so on

So the number of possibilities for weight is limitless, and this is exactly what a continuous variable is

Now in order to understand linear regression, let's assume that you want to predict the price of a stock over a period of time

Okay

For such a problem, you can make use of linear regression by studying the relationship between the dependent variable, which is the stock price, and the independent variable, which is the time

You're trying to predict the stock price over a period of time

So basically, you're gonna check how the price of a stock varies over a period of time

So your stock price is going to be your dependent variable or your output variable, and the time is going to be your predictor variable or your independent variable

Let's not confuse it anymore

Your dependent variable is your output variable

Okay, your independent variable is your input variable or your predictor variable

So in our case, the stock price is obviously a continuous quantity, because the stock price can have an infinite number of values

Now the first step in linear regression is always to draw out a relationship between your dependent and your independent variable by using the best-fitting straight line

We make an assumption that your dependent and independent variables are linearly related to each other

We call it linear regression because both the variables vary linearly, which means that by plotting the relationship between these two variables, we'll get more of a straight line, instead of a curve

Let's discuss the math behind linear regression

So, this equation over here, it denotes the relationship between your independent variable x, which is here, and your dependent variable y

This is the variable you're trying to predict

Hopefully, we all know that the equation for a straight line in math is y equals mx plus c

I hope all of you remember math

So the equation for a straight line in math is y equals mx plus c

Similarly, the linear regression equation is represented along the same lines

Okay, y equals mx plus c

There are just a few changes, and I'll tell you what they are

Let's understand this equation properly

So y basically stands for your dependent variable that you're going to predict

B naught is the y-intercept

Now the y-intercept is nothing but this point here

Now in this graph, you're basically showing the relationship between your dependent variable y and your independent variable x

Now this is the linear relationship between these two variables

Okay, now your y-intercept is basically the point where the line meets the y-axis

This is the y-intercept, which is represented by B naught

Now B one, or beta one, is the slope of this line, and the slope can either be negative or positive, depending on the relationship between the dependent and independent variable

The next variable that we have is x

X here represents the independent variable that is used to predict our resulting output variable

Basically, x is used to predict the value of y

Okay

E here denotes the error in the computation

For example, this is the fitted line, and these dots here represent the actual values

Now the distance between these two is denoted by the error in the computation

So this is the entire equation
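
The slide itself is not reproduced in this transcript, so the exact notation is an assumption, but with the symbols described above the equation is presumably the standard simple linear regression form:

```latex
y = \beta_0 + \beta_1 x + \epsilon
```

Here y is the dependent variable, \beta_0 is the y-intercept, \beta_1 is the slope, x is the independent variable, and \epsilon is the error term. Compared with y = mx + c, \beta_1 plays the role of m and \beta_0 plays the role of c.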

It's quite simple, right? Linear regression will basically draw a relationship between your input and your output variable

That's how simple linear regression is

Now to better understand linear regression, I'll be running a demo in Python

So guys, before I get started with our practical demo, I'm assuming that most of you have a good understanding of Python, because explaining Python is going to be out of the scope of today's session

But if some of you are not familiar with the Python language, I'll leave a couple of links in the description box

Those will be related to Python programming

You can go through those links, understand Python, and then maybe try to understand the demo

But I'll be explaining the logic part of the demo in depth

So the main thing that we're going to do here is try and understand linear regression

So it's okay if you do not understand Python for now

I'll try to explain as much as I can

But if you still want to understand this in a better way, I'll leave a couple of links in the description box, and you can go to those videos

Let me just zoom in for you

I hope all of you can see the screen

Now in this linear regression demo, what we're going to do is we're going to form a linear relationship between the maximum temperature and minimum temperature on a particular date

We're just going to do weather forecasting here

So our task is to predict the maximum temperature, taking the minimum temperature as the input feature

So I'm just going to try and make you understand linear regression through this demo

Okay, we'll see how it actually works practically

Before I get started with the demo, let me tell you something about the data set

Our data set is stored in this path basically

The name of the data set is weather.csv

Okay, now, this contains data on weather conditions recorded on each day at various weather stations around the world

Okay, the information includes precipitation, snowfall, temperatures, wind speeds, and whether the day included any thunderstorms or other poor weather conditions

So our first step in any demo for that matter will be to import all the libraries that are needed

So we're gonna begin our demo by importing all the required libraries

After that, we're going to read in our data

Our data will be stored in this variable called data set, and we're going to use the read_csv function since our data set is in the CSV format

After that, I'll be showing you how the data set looks

We'll also look at the data set in depth

Now let me just show you the output first

Let's run this demo and see first

We're getting a couple of plots which I'll talk about in a while

So we can ignore this warning

It has nothing to do with

So, first of all, we're printing the shape of our data set

So, when we print the shape of our data set, this is the output that we get

So, basically, this shows that we have around 12,000 rows and 31 columns in our data set

The 31 columns basically represent the predictor variables

So you can say that we have 31 predictor variables in order to predict the weather conditions on a particular date

So guys, the main aim in this problem statement is weather forecasting

We're going to predict the weather by using a set of predictor variables

So these are the different types of predictor variables that we have

Okay, we have something known as maximum temperature

So this is what our data set looks like

Now what we're doing in this block of code is plotting our data points on a 2D graph in order to understand our data set and see if we can manually find any relationship between the variables

Here we've taken minimum temperature and maximum temperature for doing our analysis

So let's just look at this plot

Before that, let me just comment out all of these other plots, so that you see only the graph that I'm talking about

So, when you look at this graph, this is basically the graph between your minimum temperature and your maximum temperature

Maximum temperature is the dependent variable that you're going to predict

This is y

And your minimum temperature is your x

It's basically your independent variable

So if you look at this graph, you can see that there is a sort of linear relationship between the two, except there are a few outliers here and there

There are a few data points which are a little bit random

But apart from that, there is a pretty linear relationship between your minimum temperature and your maximum temperature

So from this graph, you can understand that you can easily solve this problem using linear regression, because our data is very linear

I can see a clear straight line over here

This is our first graph

Next, what I'm doing is I'm just checking the average maximum temperature that we have

I'm just looking at the average of our output variable

Okay

So guys, what we're doing here right now is just exploratory data analysis

We're trying to understand our data

We're trying to see the relationship between our input variable and our output variable

We're trying to see the mean or the average of the output variable

All of this is necessary to understand our data set

So, this is what our average maximum temperature looks like

So if we try to understand where exactly this is, our average maximum temperature is somewhere between 28 and, I would say, 30

28 and 32, somewhere there

So you can say that the average maximum temperature lies between 25 and 35

And so that is our average maximum temperature

Now that you know a little bit about the data set, you know that there is a very good linear relationship between your input variable and your output variable

Now what you're going to do is you're going to perform something known as data splicing

Let me just comment that for you

This section is nothing but data splicing

So those of you who are paying attention know that data splicing is nothing but splitting your data set into training and testing data

Now before we do that, I mentioned earlier that we'll be only using two variables, because we're trying to understand the relationship between the minimum temperature and maximum temperature

I'm doing this because I want you to understand linear regression in the simplest way possible

So guys, in order to make you understand linear regression, I have derived only two variables from the data set

Even though when we checked the structure of the data set, we had around 31 features, meaning that we had 31 variables, which include my predictor variables and my target variable

So, basically, we had 30 predictor variables and we had one target variable, which is your maximum temperature

So, what I'm doing here is I'm only considering these two variables, because I want to show you exactly how linear regression works

So, here what I'm doing is I'm basically extracting only these two variables from our data set, storing it in x and y

After that, I'm performing data splicing

So here, I'm basically splitting the data into training and testing data, and remember one point that I am assigning 20% of the data to our testing data set, and the remaining 80% is assigned for training

That's how training works

We assign the maximum share of the data for training

We do this because we want the machine learning model or the machine learning algorithm to train better on data

We want it to take as much data as possible, so that it can predict the outcome properly

So, to repeat it again for you, so here we're just splitting the data into training and testing data set

So, one more thing to note here is that we're assigning 80% of the data for training, and we're assigning the remaining 20% of the data to test data

The test size variable, this variable that you see, is what is used to specify the proportion of the test set
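
The demo code itself is not shown in this transcript, so as an illustration here is a minimal plain-Python sketch of an 80/20 split. The function name `split_train_test`, the `seed` argument, and the toy data are assumptions; in scikit-learn the same job is typically done with `train_test_split` and its `test_size` parameter.

```python
import random

def split_train_test(xs, ys, test_size=0.2, seed=0):
    """Shuffle paired samples and hold out a `test_size` fraction for testing."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_test = int(round(len(xs) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

# Toy data standing in for minimum (x) and maximum (y) temperatures.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
x_train, x_test, y_train, y_test = split_train_test(xs, ys, test_size=0.2)
print(len(x_train), len(x_test))  # 8 training samples, 2 testing samples
```

Shuffling before splitting matters when the rows are ordered, for example by date, because otherwise the test set would only cover one stretch of the data.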

Now after splitting the data into training and testing sets, finally, it is time to train our algorithm

For that, we need to import the linear regression class

We need to instantiate it and call the fit method along with the training data

This is our linear regression class, and we're just creating an instance of the linear regression class

So guys, a good thing about Python is that you have pre-defined classes for your algorithms, and you don't have to code your algorithms

Instead, all you have to do is call this linear regression class and create an instance of it

Here I'm basically creating something known as a regressor

And all you have to do is call the fit method along with your training data

So this is my training data, x train and y train contain my training data, and I'm calling fit on our linear regression instance, which is regressor, along with this data set

So here, basically, we're building the model

We're doing nothing but building the model

Now, one of the major things that the linear regression model does is it finds the best value for the intercept and the slope, which results in a line that best fits the data
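
To see what "finding the best intercept and slope" amounts to for a single input variable, here is a sketch of the ordinary least squares calculation in plain Python. This is an illustration of the idea, not the library's actual implementation, and the function name `fit_line` and the temperature values are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of co-deviations of x and y divided by sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up minimum and maximum temperatures, not taken from weather.csv.
min_temp = [10, 15, 20, 25]
max_temp = [20, 24, 30, 34]
intercept, slope = fit_line(min_temp, max_temp)
print(intercept, slope)
```

The line found this way is the one that minimizes the sum of the squared vertical distances between the data points and the line.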

I've discussed what the intercept and slope are

So if you want to see the intercept and the slope calculated by our linear regression model, we just have to run this line of code

And let's look at the output for that

So, our intercept is around 10.66, and our coefficient, these are also known as beta coefficients, is nothing but what we discussed as beta one, the slope

These are beta values

Now this will just help you understand the significance of your input variables

Now what this coefficient value means is, see, the coefficient value is around 0.92

This means that for every one unit change in your minimum temperature, the change in the maximum temperature is around 0.92

This will just show you how significant your input variable is

So, for every one unit change in your minimum temperature, the change in the maximum temperature will be around 0.92

I hope you've understood this part

Now that we've trained our algorithm, it's time to make some predictions

To do so, we'll use our test data set, and we'll see how accurately our algorithm predicts the maximum temperature

Now to make predictions, we have this line of code

Predict is basically a predefined method of our regressor

And all you're going to do is you're going to pass your testing data set to this

Now what you'll do is you'll compare the actual output values, which are basically stored in your y test

And you'll compare these to the predicted values, which are in y prediction

And you'll store these comparisons in a data frame called df

And all I'm doing here is I'm printing the data frame

So if you look at the output, this is what it looks like

These are your actual values and these are the values that you predicted by building that model

So, where your actual value is 28, you predicted around 33; here your actual value is 31, meaning that your maximum temperature is 31

And you predicted a maximum temperature of 30

Now, these values are actually pretty close

I feel like the accuracy is pretty good over here

Now in some cases, you see a lot of variance, like 23

Here it's 15

Right here it's 22

Here it's 11

But such cases are not very frequent

And the best way to improve your accuracy I would say is by training a model with more data

Alright

You can also view this comparison in the form of a plot

Let's see how that looks

So, basically, this is a bar graph that shows our actual values and our predicted values

Blue represents your actual values, and orange represents your predicted values

At places you can see that we've predicted pretty well, like the predictions are pretty close to the actual values

In some cases, the predictions are varying a little bit

So in a few places, it is actually varying, but all of this depends on your input data as well

When we saw the input data, we also saw a lot of variation

We saw a couple of outliers

So, all that also might affect your output

But then this is how you build machine learning models

Initially, you're never going to get a really good accuracy

What you should do is you have to improve your training process

That's the best way you can predict better: either you use a lot of data, train your model with a lot of data, or you use other methods like parameter tuning, or basically you try and find another predictor variable that'll help you more in predicting your output

To me, this looks pretty good

Now let me show you another plot

What we're doing is we're drawing a straight line plot

Okay, let's see how it looks

So guys, this straight line represents a linear relationship

Now let's say you get a new data point

Okay, let's say the value of x is around 20

So by using this line, you can predict that for a minimum temperature of 20, your maximum temperature would be around 25 or something like that

So, we basically drew a linear relationship between our input and output variable over here
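
Reading a value off a plot is only approximate. As a quick check, the rounded intercept (about 10.66) and slope (about 0.92) reported earlier in the demo can be plugged into the line equation. With those rounded figures, the prediction for a minimum temperature of 20 comes out closer to 29 than to 25. The function name `predict_max_temp` is hypothetical.

```python
# Rounded values quoted in the demo output; the exact fitted values are not shown.
intercept = 10.66
slope = 0.92

def predict_max_temp(min_temp):
    """Apply the fitted line: max_temp = intercept + slope * min_temp."""
    return intercept + slope * min_temp

print(round(predict_max_temp(20), 2))  # 10.66 + 0.92 * 20
```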

And the final step is to evaluate the performance of the algorithm

This step is particularly important to compare how well different algorithms perform on a particular data set

Now for regression algorithms, three evaluation metrics are used

We have something known as mean absolute error, mean squared error, and root mean squared error

Now mean absolute error is nothing but the mean of the absolute values of the errors

Your mean squared error is the mean of the squared errors

That's all

It's basically you read this and you understand what the error means

Root mean squared error is the square root of the mean of the squared errors

Okay

So these are pretty simple to understand: your mean absolute error, your mean squared error, your root mean squared error
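
To see what those three definitions amount to, here is a small plain-Python sketch. The function name `regression_errors` and the actual and predicted values are made up for illustration; scikit-learn's metrics module provides ready-made `mean_absolute_error` and `mean_squared_error` functions for the same purpose.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean of absolute errors
    mse = sum(e * e for e in errors) / len(errors)    # mean of squared errors
    rmse = math.sqrt(mse)                             # square root of the MSE
    return mae, mse, rmse

# Hypothetical maximum temperatures, not the values from the demo.
actual = [30, 25, 28, 35]
predicted = [28, 27, 28, 31]
mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)
```

Because the errors are squared before averaging, MSE and RMSE penalize a few large misses more heavily than MAE does.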

Now, luckily, we don't have to perform these calculations manually

We don't have to code each of these calculations

The scikit-learn library comes with prebuilt functions that can be used to find out these values

Okay

So, when you run this code, you will get these values for each of the errors

You'll get around 3.19 as the mean absolute error

Your mean squared error is around 17.63

Your root mean squared error is around 4.19

Now these error values basically show that our model is not very precise, but it's still able to make fairly good predictions

We can draw a good linear relationship

Now in order to improve the efficiency of the model, there are a lot of methods, like parameter tuning and all of that, or basically you can train your model with a lot more data

Apart from that, you can use other predictor variables, or maybe you can study the relationship between other predictor variables and your maximum temperature variable

There are a lot of ways to improve the efficiency of the model

But for now, I just wanted to make you understand how linear regression works, and I hope all of you have a good idea about this

I hope all of you have a good understanding of how linear regression works

This is a small demo about it

If any of you still have any doubts regarding linear regression, please leave them in the comment section

We'll try and resolve all your doubts

So, if you look at this equation, we calculated everything here

We drew a relationship between y and x, where x was our minimum temperature and y was our maximum temperature

We also calculated the slope and the intercept

And we also calculated the error in the end

We calculated the mean squared error, and we calculated the root mean squared error

We also calculated the mean absolute error

So that was everything about linear regression

This was a simple linear regression model

Now let's move on and look at our next algorithm, which is logistic regression

Now, in order to understand why we use logistic regression, let's consider a small scenario

Let's say that your little sister is trying to get into grad school and you want to predict whether she'll get admitted to her dream school or not

Okay, so based on her CGPA and the past data, you can use logistic regression to foresee the outcome

So logistic regression will allow you to analyze the set of variables and predict a categorical outcome

Since here we need to predict whether she will get into a school or not, which is a classification problem, logistic regression will be used

Now I know the first question in your head is, why are we not using linear regression in this case? The reason is that linear regression is used to predict a continuous quantity, rather than a categorical one

Here we're going to predict whether or not your sister is going to get into grad school

So that is clearly a categorical outcome

So when the resulting outcome can take only two classes of values, it is sensible to have a model that predicts the value as either zero or one, or in a probability form that ranges between zero and one

Okay

So linear regression does not have this ability

If you use linear regression to model a binary outcome, the resulting model will not predict y values in the range of zero and one, because linear regression works on continuous dependent variables, and not on categorical variables

That's why we make use of logistic regression

So understand that linear regression was used to predict continuous quantities, and logistic regression is used to predict categorical quantities

Okay, now one major confusion that everybody has is people keep asking me why logistic regression is called logistic regression when it is used for classification

The reason it is named logistic regression is because its primary technique is very similar to linear regression

There's no other reason behind the naming

It belongs to the generalized linear models

It belongs to the same class as linear regression, and there is no other reason behind the name logistic regression

Logistic regression is mainly used for classification purposes, because here you'll have to predict a dependent variable which is categorical in nature

So this is mainly used for classification

So, to define logistic regression for you, logistic regression is a method used to predict a dependent variable y, given an independent variable x, such that the dependent variable is categorical, meaning that your output is a categorical variable

So, obviously, this is a classification algorithm

So guys, again, to clear your confusion, when I say categorical variable, I mean that it can hold values like one or zero, yes or no, true or false, and so on

So, basically, in logistic regression, the outcome is always categorical

Now, how does logistic regression work? So guys, before I tell you how logistic regression works, take a look at this graph

Now I told you that the outcome in a logistic regression is categorical

Your outcome will either be zero or one, or it'll be a probability that ranges between zero and one

So, that's why we have this S curve

Now some of you might wonder why we have an S curve

We can obviously have a straight line

We have something known as a sigmoid curve, because we can have values ranging between zero and one, which will basically show the probability

So, maybe your output will be 0.7, which is a probability value

If it is 0.7, it means that your outcome is basically one

So that's why we have this sigmoid curve like this
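
Here is a small Python sketch of the S curve and of turning a probability like 0.7 into a class. The 0.5 cut-off in `predict_class` is an assumption (a common default, not something stated in the session), and both function names are chosen just for this illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range between 0 and 1 (the S curve)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (e^z + 1) to avoid overflow.
    ez = math.exp(z)
    return ez / (ez + 1.0)

def predict_class(probability, threshold=0.5):
    """Assumed rule: probabilities at or above the threshold count as class one."""
    return 1 if probability >= threshold else 0

print(sigmoid(0))          # the midpoint of the curve
print(predict_class(0.7))  # a probability of 0.7 is treated as class one
print(predict_class(0.3))  # a probability of 0.3 is treated as class zero
```

A straight line would keep growing past one and below zero, while the sigmoid flattens out at both ends, which is why it can be read as a probability.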

Okay

Now I'll explain more about this in depth in a while

Now, in order to understand how logistic regression works, first, let's take a look at the linear regression equation

This was the linear regression equation that we discussed

Y here stands for the dependent variable that needs to be predicted

Beta naught is nothing but the y-intercept

Beta one is nothing but the slope

And X here represents the independent variable that is used to predict y

That E denotes the error in the computation

So, given the fact that x is the independent variable and y is the dependent variable, how can we represent a relationship between x and y so that y ranges only between zero and one? Here this value basically denotes the probability of y equal to one, given some value of x

So here, Pr denotes probability, and this value basically denotes the probability of y equal to one, given some value of x; this is what we need to find out

Now, if you wanted to calculate the probability using the linear regression model, then the probability will look something like P of X equal to beta naught plus beta one into X

P of X will be equal to beta naught plus beta one into X, where P of X is nothing but your probability of getting y equal to one, given some value of x

So the logistic regression equation is derived from the same equation, except we need to make a few alterations, because the output is only categorical

So, logistic regression does not necessarily calculate the outcome as zero or one

I mentioned this before

Instead, it calculates the probability of a variable falling in class zero or class one

So that's how we can conclude that the resulting variable must be positive, and it should lie between zero and one, which means that it must be less than one

So to meet these conditions, we have to do two things

First, we can take the exponent of the equation, because taking an exponential of any value will make sure that you get a positive number

Correct? Secondly, you have to make sure that your output is less than one

So, a positive number divided by itself plus one will always be less than one

So that's how we get this formula

First, we take the exponent of the equation, beta naught plus beta one into x, and then we divide it by that number plus one

So this is how we get this formula

Now the next step is to calculate something known as the logit function

Now the logit function is nothing but a link function that ties the linear equation to the S curve, or sigmoid curve, that ranges between the values zero and one

It basically lets us calculate the probability of the output variable

So if you look at this equation, it's quite simple

What we have done here is we just cross multiply and take e to the power of beta naught plus beta one into x as common

The RHS denotes the linear equation for the independent variables

The LHS represents the odds ratio

So if you compute this entire thing, you'll get this final value, which is basically your logistic regression equation

Your RHS here denotes the linear equation for independent variables, and your LHS represents the log of the odds, which is also known as the logit function

So I told you that the logit function is basically the function behind the S curve that ranges between zero and one

This will make sure that our value ranges between zero and one

So in logistic regression, on increasing this X by one unit, the logit changes by beta one

It's the same thing as I showed you in linear regression

So guys, that's how you derive the logistic regression equation
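
The slides are not reproduced in this transcript, so the exact notation is an assumption, but the steps described above correspond to the standard derivation, written here with P(x) as the probability that y equals one:

```latex
\begin{aligned}
P(x) &= \frac{e^{\beta_0 + \beta_1 x}}{e^{\beta_0 + \beta_1 x} + 1} \\
\frac{P(x)}{1 - P(x)} &= e^{\beta_0 + \beta_1 x} \\
\ln\!\left(\frac{P(x)}{1 - P(x)}\right) &= \beta_0 + \beta_1 x
\end{aligned}
```

The first line is the exponent divided by itself plus one, the second comes from cross multiplying and collecting the exponential term, and the third takes the natural log of both sides, leaving the linear equation on the right-hand side and the log of the odds on the left.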

So if you have any doubts regarding these equations, please leave them in the comment section, and I'll get back to you, and I'll clear that out

So to sum it up, logistic regression is used for classification

The output variable will always be a categorical variable

We also saw how you derive the logistic regression equation

And one more important thing is that the relationship between the variables in a logistic regression is denoted as an S curve, which is also known as a sigmoid curve, and also the outcome does not necessarily have to be calculated as zero or one

It can be calculated as a probability that the output lies in class one or class zero

So your output can be a probability ranging between zero and one

That's why we have a sigmoid curve

So I hope all of you are clear with logistic regression

Now I won't be showing you the demo right away

I'll explain a couple more classification algorithms

Then I'll show you a practical demo where we'll use multiple classification algorithms to solve the same problem

Again, we'll also calculate the accuracy and see which classification algorithm is doing the best

Now the next algorithm I'm gonna talk about is decision tree

Decision tree is one of my favorite algorithms, because it's very simple to understand how a decision tree works

So guys, before this, we discussed linear regression, which was a regression algorithm

Then we discussed logistic regression, which is a classification algorithm

Remember, don't get confused just because it has the name logistic regression

Okay, it is a classification algorithm

Now we're discussing decision tree, which is again a classification algorithm

Okay

So what exactly is a decision tree? Now a decision tree is, again, a supervised machine learning algorithm which looks like an inverted tree wherein each node represents a predictor variable, the link between the nodes represents a decision, and each leaf node represents an outcome

Now I know that's a little confusing, so let me make you understand what a decision tree is with the help of an example

Let's say that you hosted a huge party, and you want to know how many of your guests are non-vegetarians

So to solve this problem, you can create a simple decision tree

Now if you look at this figure over here, I've created a decision tree that classifies a guest as either vegetarian or non-vegetarian

Our last outcome here is non-veg or veg

So here you understand that this is a classification algorithm, because here you're predicting a categorical value

Each node over here represents a predictor variable

So eat chicken is one variable, eat mutton is one variable, seafood is another variable

So each node represents a predictor variable that will help you conclude whether or not a guest is a non-vegetarian

Now as you traverse down the tree, you'll make decisions at each node until you reach the dead end

Okay, that's how it works

So, let's say we got a new data point

Now we'll pass it through the decision tree

The first variable is did the guest eat the chicken? If yes, then he's a non-vegetarian

If no, then you'll pass it to the next variable, which is did the guest eat mutton? If yes, then he's a non-vegetarian

If no, then you'll pass it to the next variable, which is seafood

If he ate seafood, then he is a non-vegetarian

If no, then he's a vegetarian

This is how a decision tree works
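
The walk through the tree described above can be written as a chain of checks. This is a hand-written sketch of that one specific tree, not a learned model, and the function and argument names are chosen just for this illustration.

```python
def classify_guest(ate_chicken, ate_mutton, ate_seafood):
    """Follow the example tree from the root node down to a leaf."""
    if ate_chicken:        # root node: did the guest eat chicken?
        return "non-vegetarian"
    if ate_mutton:         # next node: did the guest eat mutton?
        return "non-vegetarian"
    if ate_seafood:        # last node: did the guest eat seafood?
        return "non-vegetarian"
    return "vegetarian"    # leaf reached only after answering no at every node

print(classify_guest(ate_chicken=False, ate_mutton=True, ate_seafood=False))
print(classify_guest(ate_chicken=False, ate_mutton=False, ate_seafood=False))
```

Each `if` plays the role of a node, each yes or no answer is a branch, and each `return` is a leaf.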

It's a very simple algorithm that you can easily understand

It has a drawn-out structure, which is very easy to understand

Now let's understand thestructure of a decision tree

I just showed you an example of how the decision tree works

Now let me take the same example and tell you the structurefor decision tree

So, first of all, we havesomething known as the root node

Okay

The root node is the starting point of a decision tree

Here you'll perform the first split and split it into two other nodes or three other nodes, dependingon your problem statement

So the top most node isknown as your root node

Now guys, about the root node, the root node is assigned to a variable that is very significant, meaning that thatvariable is very important in predicting the output

Okay, so you assign a variable that you think is the mostsignificant at the root node

After that, we have somethingknown as internal nodes

So each internal noderepresents a decision point that eventually leads to the output

Internal nodes will haveother predictor variables

Each of these are nothingpredictor variables

I just made it into a question otherwise these are justpredictor variables

Those are internal nodes

Terminal nodes, alsoknown as the leaf node, represent the final classof the output variable, because these are basically your outcomes, non-veg and vegetarian

Branches are nothing butconnections between nodes

Okay, these connections are links between each node is known as a branch, and they're represented by arrows

So each branch will havesome response to it, either yes or no, true orfalse, one or zero, and so on

Okay

So, guys, this is the structure of a decision tree
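
Here's a small Python sketch of that structure, assuming a hypothetical `Node` class and `predict` helper that I'm introducing only for illustration. The root and internal nodes hold a question, the leaf nodes hold a class label, and the branches are the yes/no links between them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    question: Optional[str] = None   # predictor variable tested here (None on a leaf)
    label: Optional[str] = None      # output class, set only on leaf nodes
    branches: Dict[str, "Node"] = field(default_factory=dict)  # answer -> child node

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, answers: Dict[str, str]) -> str:
    """Follow the branches from the root until a leaf node is reached."""
    while not node.is_leaf():
        node = node.branches[answers[node.question]]
    return node.label


# Leaf (terminal) nodes
non_veg = Node(label="non-veg")
veg = Node(label="veg")

# Internal nodes, then the root node at the top
seafood = Node("ate seafood?", branches={"yes": non_veg, "no": veg})
mutton = Node("ate mutton?", branches={"yes": non_veg, "no": seafood})
root = Node("ate chicken?", branches={"yes": non_veg, "no": mutton})

guest = {"ate chicken?": "no", "ate mutton?": "no", "ate seafood?": "yes"}
print(predict(root, guest))  # non-veg
```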

It's pretty understandable

Now let's move on and we'll understand how the decision tree algorithm works

Now there are many ways to build a decision tree, but I'll be focusing on something known as the ID3 algorithm

Okay, this is something known as the ID3 algorithm

That is one of the ways in which you can build the decision tree

ID3 stands for Iterative Dichotomiser 3, which is one of the most effective algorithms used to build a decision tree

It uses the concepts of entropy and information gain in order to build a decision tree

Now you don't have to know every detail of the ID3 algorithm

It's just the concept behind building a decision tree

Now the ID3 algorithm has around six defined steps in order to build a decision tree

So the first step is you will select the best attribute

Now what do you mean by the best attribute? So, an attribute is nothing but a predictor variable over here

So you'll select the best predictor variable

Let's call it A

After that, you'll assign this A as the decision variable for the root node

Basically, you'll assign this predictor variable A at the root node

Next, what you'll do is for each value of A, you'll build a descendant of the node

Now let's look at these three steps with the previous example

Now here the best attribute is eat chicken

Okay, this is my best attribute variable over here

So I selected that attribute

And what is the next step? Step two was to assign that as the decision variable

So I assigned eat chicken as the decision variable at the root node

Now you might be wondering how do I know which is the best attribute

I'll explain all of that in a while

So what we did is we assigned this at the root node

After that, step number three says for each value of A, build a descendant of the node

So for each value of this variable, build a descendant node

So this variable can take two values, yes and no

So for each of these values, I build a descendant node

Step number four, assign classification labels to the leaf nodes

To my leaf nodes, I have assigned one classification as non-veg, and the other as veg

That is step number four

Step number five is if the data is correctly classified, then you stop at that

However, if it is not, then you keep iterating over the tree, and keep changing the position of the predictor variables in the tree, or you change the root node also in order to get the correct output
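
The steps above can be sketched roughly like this in Python. This is only a sketch under a few assumptions: `build_tree`, `first_listed`, and the four guest rows are hypothetical names and data I made up, and the tree is grown by repeating the same steps on each branch, which is how ID3 is usually written. The rule for picking the best attribute is passed in as `choose_best`, because that part is explained a little later.

```python
from collections import Counter


def build_tree(rows, attributes, target, choose_best):
    labels = [row[target] for row in rows]
    # Stop when every row has the same class, or nothing is left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]     # step 4: leaf label
    best = choose_best(rows, attributes)                 # steps 1 and 2
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):    # step 3: one branch per value
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = build_tree(subset, remaining, target, choose_best)
    return tree


def first_listed(rows, attributes):
    """Placeholder rule: take attributes in the order given."""
    return attributes[0]


guests = [
    {"chicken": "yes", "mutton": "no", "seafood": "no", "diet": "non-veg"},
    {"chicken": "no", "mutton": "yes", "seafood": "no", "diet": "non-veg"},
    {"chicken": "no", "mutton": "no", "seafood": "yes", "diet": "non-veg"},
    {"chicken": "no", "mutton": "no", "seafood": "no", "diet": "veg"},
]

tree = build_tree(guests, ["chicken", "mutton", "seafood"], "diet", first_listed)
print(tree)
```

With a real data set you'd swap `first_listed` for a rule based on information gain.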

So now let me answer this question

What is the best attribute? What do you mean by the best attribute or the best predictor variable? Now the best attribute is the one that separates the data into different classes most effectively, or it is basically the feature that best splits the data set

Now the next question in your head must be how do I decide which variable or which feature best splits the data

To do this, there are two important measures

There's something known as information gain and there's something known as entropy

Now guys, in order to understand information gain and entropy, we'll look at a simple problem statement

This data represents the speed of a car based on certain parameters

So our problem statement here is to study the data set and create a decision tree that classifies the speed of the car as either slow or fast

So our predictor variables here are road type, obstruction, and speed limit, and our response variable, or our output variable, is speed

So we'll be building a decision tree using these variables in order to predict the speed of the car

Now like I mentioned earlier, we must first begin by deciding a variable that best splits the data set and assign that particular variable to the root node and repeat the same thing for other nodes as well

So step one, like we discussed earlier, is to select the best attribute A

Now, how do you know which variable best separates the data? The variable with the highest information gain best divides the data into the desired output classes

First of all, we'll calculate two measures

We'll calculate the entropy and the information gain

Now this is where I'll tell you what exactly entropy is, and what exactly information gain is

Now entropy is basically used to measure the impurity or the uncertainty present in the data

It is used to decide how a decision tree can split the data

Information gain, on the other hand, is the most significant measure which is used to build a decision tree

It indicates how much information a particular variable gives us about the final outcome

So information gain is important, because it is used to choose a variable that best splits the data at each node for a decision tree

Now the variable with the highest information gain will be used to split the data at the root node

Now in our data set, there are four observations

So what we're gonna do is we'll start by calculating the entropy and information gain for each of the predictor variables

So we're gonna start by calculating the information gain and entropy for the road type variable

In our data set, you can see that there are four observations

There are four observations in the road type column, which correspond to the four labels in the speed column

So we're gonna begin by calculating the entropy of the parent node

The parent node is nothing but the speed of the car node

This is our output variable, correct? It'll be used to show whether the speed of the car is slow or fast

So to find out the entropy of the speed of the car variable, we'll go through a couple of steps

Now we know that there are four observations in this parent node

First, we have slow

Then again we have slow, fast, and fast

Now, out of these four observations, we have two classes

So two observations belong to the class slow, and two observations belong to the class fast

So that's how you calculate P slow and P fast

P slow is nothing but the fraction of slow outcomes in the parent node, and P fast is the fraction of fast outcomes in the parent node

And the formula to calculate P slow is the number of slow outcomes in the parent node divided by the total number of outcomes

So the number of slow outcomes in the parent node is two, and the total number of outcomes is four

We have four observations in total

So that's how we get P of slow as 0.5

Similarly, for P of fast, you'll calculate the number of fast outcomes divided by the total number of outcomes

So again, two by four, you'll get 0.5

The next thing you'll do is you'll calculate the entropy of this node

So to calculate the entropy, this is the formula

All you have to do is substitute the values in this formula
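
The formula on the slide isn't captured in this transcript. Assuming it's the standard entropy formula, which does give the value worked out here, it can be written as:

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
     = -P_{\text{slow}} \log_2 P_{\text{slow}} - P_{\text{fast}} \log_2 P_{\text{fast}}
```

Here $S$ is the set of observations at the node, and $p_c$ is the fraction of those observations that belong to class $c$. The symbols $H$, $S$, and $p_c$ are my own notation for this sketch.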

So P of slow we're substituting as 0.5

Similarly, P of fast as 0.5

Now when you substitute the values, you'll get an answer of one

So the entropy of your parent node is one
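
If you'd like to check that number yourself, here's a minimal Python sketch. The helper name `entropy` is my own, and it assumes the standard entropy formula with a base-2 logarithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


parent = ["slow", "slow", "fast", "fast"]

p_slow = parent.count("slow") / len(parent)
p_fast = parent.count("fast") / len(parent)

print(p_slow, p_fast)   # 0.5 0.5
print(entropy(parent))  # 1.0
```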

So after calculating the entropy of the parent node, we'll calculate the information gain of the child node

Now guys, remember that only if the information gain of the road type variable is greater than the information gain of all the other predictor variables can the root node be split by using the road type variable

So, to calculate the information gain of the road type variable, we first need to split the root node by using the road type variable

We're just doing this in order to check if the road type variable is giving us maximum information about the data

Okay, so if you notice, road type has two outcomes, it has two values, either steep or flat

Now go back to our data set

So first what we'll do is we'll check the value of speed that we get when the road type is steep

So, first observation

You see that when the road type is steep, you're getting a speed of slow

Similarly, in the second observation, when the road type is steep, you'll get a value of slow again

If the road type is flat, you'll get an observation of fast

And again, if it is steep, there is a value of fast

So for three steep values, we have slow, slow, and fast

And when the road type is flat, we'll get an output of fast

That's exactly what I've done in this decision tree

So whenever the road type is steep, you'll get slow, slow, or fast

And whenever the road type is flat, you'll get fast

Now the entropy of the right-hand side is zero

Entropy is nothing but the uncertainty

There's no uncertainty over here

Because as soon as you see that the road type is flat, your output is fast

So there's no uncertainty

But when the road type is steep, you can have any one of the following outcomes, either your speed will be slow or it can be fast

So you'll start by calculating the entropy of both RHS and LHS of the decision tree

So the entropy for the right side child node will be zero, because there's no uncertainty here

Immediately, if you see that the road type is flat, your speed of the car will be fast

Okay, so there's no uncertainty here, and therefore your entropy becomes zero

Now for the entropy of the left-hand side, we'll again have to calculate P slow and P fast

So out of three observations, in two observations we have slow

That's why we have two by three over here

Similarly for P fast, we have one fast outcome divided by the total number of observations, which is three

So out of these three, we have two slows and one fast

When you calculate P slow and P fast, you'll get these two values

And then when you substitute these values in the entropy formula, you'll get the entropy as around 0.9 for the road type variable

I hope you all are understanding this

I'll go through this again

So, basically, here we are calculating the information gain and entropy for the road type variable

Whenever you consider the road type variable, there are two values, steep and flat

And whenever the value for road type is steep, you'll get any one of these three outcomes, either you'll get slow, slow, or fast

And when the road type is flat, your outcome will be fast

Now because there is no uncertainty whenever the road type is flat, you'll always get an outcome of fast

This means that the entropy here is zero, or the uncertainty value here is zero

But here, there is a lot of uncertainty

So whenever your road type is steep, your output can either be slow or it can be fast

So, finally, you get the entropy as around 0.9
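
As a quick check of the two child nodes, here's a short Python sketch that assumes the standard entropy formula in base 2, with my own helper name `entropy`. The unrounded value for the steep side is about 0.918, which the walkthrough rounds to 0.9.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


steep = ["slow", "slow", "fast"]   # left-hand child
flat = ["fast"]                    # right-hand child

print(round(entropy(steep), 3))  # 0.918
print(entropy(flat))             # 0.0
```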

So in order to calculate the information gain of the road type variable, you need to calculate the weighted average

I'll tell you why

The information gain is the entropy of the parent, which we calculated as one, minus the weighted average of the entropy of the children

Okay

So for this formula, you need to calculate all of these values

So, first of all, you need to calculate the weighted average of the entropy of the children

Now the total number of outcomes in the parent node, as we saw, was four

The total number of outcomes in the left child node was three

And the total number of outcomes in the right child node was one

Correct? Just to verify this with you, the total number of outcomes in the parent node is four

One, two, three, and four

Coming to the child node, which is the road type, the total number of outcomes on the right-hand side of the child node is one

And the total number of outcomes on the left-hand side of the child node is three

That's exactly what I've written over here

Alright, I hope you all understood these three values

After that, all you have to do is substitute these values in this formula

So when you do that, you'll get the weighted average of the entropy of the children as around 0.675

Now just substitute the value in this formula

So if you calculate the information gain of the road type variable, you'll get a value of 0.325
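
Here's the same arithmetic as a small Python sketch, assuming the standard entropy formula and my own helper name `entropy`. The weights 3/4 and 1/4 are the child sizes divided by the four observations in the parent. Without rounding the left-hand entropy you get about 0.689 and 0.311, and with it rounded to 0.9 you get the 0.675 and 0.325 quoted above.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


parent = ["slow", "slow", "fast", "fast"]
left = ["slow", "slow", "fast"]    # road type is steep
right = ["fast"]                   # road type is flat

# Weight each child's entropy by its share of the parent's observations.
weighted_children = (
    len(left) / len(parent) * entropy(left)
    + len(right) / len(parent) * entropy(right)
)
gain = entropy(parent) - weighted_children

print(round(weighted_children, 3))  # 0.689
print(round(gain, 3))               # 0.311

# Same steps with the left-hand entropy rounded to 0.9, as in the walkthrough.
print(round(0.75 * 0.9, 3))         # 0.675
print(round(1 - 0.75 * 0.9, 3))     # 0.325
```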

Now by using the same method, you're going to calculate the information gain for each of the predictor variables, for road type, for obstruction, and for speed limit

Now when you follow the same method and you calculate the information gain, you'll get these values

Now what does this information gain for road type equal to 0.325 denote? The value 0.325 for road type denotes that we're getting very little information gain from this road type variable

And for obstruction, we literally have an information gain of zero

Similarly, the information gain for speed limit is one

This is the highest value we've got for information gain

This means that we'll have to use the speed limit variable at our root node in order to split the data set

So guys, don't get confused: whichever variable gives you the maximum information gain, that variable has to be chosen at the root node

So that's why we have the root node as speed limit

So if you've maintained the speed limit, then you're going to go slow

But if you haven't maintained the speed limit, then the speed of your car is going to be fast

Your entropy is literally zero, and your information gain is one, meaning that you can use this variable at your root node in order to split the data set, because speed limit gives you the maximum information gain
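
To see all three information gain values side by side, here's a rough Python sketch. The road type, speed limit, and speed columns follow what's described in this walkthrough, but the obstruction column isn't spelled out in the transcript, so the values I've used for it are an assumption, chosen so that its information gain comes out as zero. The helper names `entropy` and `information_gain` are also my own.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="speed"):
    """Parent entropy minus the weighted average entropy of the children."""
    parent = entropy([row[target] for row in rows])
    weighted = 0.0
    for value in {row[attribute] for row in rows}:
        child = [row[target] for row in rows if row[attribute] == value]
        weighted += len(child) / len(rows) * entropy(child)
    return parent - weighted


columns = ("road_type", "obstruction", "speed_limit", "speed")
table = [
    ("steep", "yes", "yes", "slow"),   # obstruction values are assumed
    ("steep", "no", "yes", "slow"),
    ("flat", "yes", "no", "fast"),
    ("steep", "no", "no", "fast"),
]
rows = [dict(zip(columns, values)) for values in table]

gains = {name: information_gain(rows, name) for name in columns[:3]}
for name, gain in gains.items():
    print(name, round(gain, 3))

# The variable with the highest gain goes to the root node.
print(max(gains, key=gains.get))  # speed_limit
```

Road type comes out at about 0.311 here because nothing is rounded along the way.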

So guys, I hope this use case is clear to all of you

To sum everything up, I'll just repeat the entire thing to you all once more

So basically, here you were given a problem statement to create a decision tree that classifies the speed of a car as either slow or fast

So you were given three predictor variables and this was your output variable

Information gain and entropy are basically two measures that are used to decide which variable will be assigned to the root node of a decision tree

Okay

So guys, as soon as you look at the data set, if you compare these two columns, that is speed limit and speed, you'll get an output easily

Meaning that if you're maintaining the speed limit, you're going to go slow

But if you aren't maintaining the speed limit, you're going to go fast

So here itself we can understand that the speed limit has no uncertainty

So every time you've maintained your speed limit, you will be going slow, and every time you're outside the speed limit, you will be going fast

So how did you start? So you started by calculating the entropy of the parent node

You calculated the entropy of the parent node, which came to one

Okay

After that, you calculated the information gain of each of the child nodes

In order to calculate the information gain of the child node, you start by calculating the entropy of the right-hand side and the left-hand side of the decision tree

Okay

Then you calculate the weighted average of those entropies

You substitute these values in the information gain formula, and you get the information gain for each of the predictor variables

So after you get the information gain of each of the predictor variables, you check which variable gives you the maximum information gain, and you assign that variable to your root node

It's as simple as that

So guys, that was all about decision trees

Now let's look at our next classification algorithm, which is random forest

Now first of all, what is a random forest? Random forest basically builds multiple decision trees and glues them together to get a more accurate and stable prediction

Now if we already have decision trees and random forest is nothing but a collection of decision trees, why do we have to use a random forest when we already have decision trees? There are three main reasons why random forest is used

Now even though decision trees are convenient and easily implemented, they are not as accurate as random forest

Decision trees work very effectively with the training data, but they're not flexible when it comes to classifying a new sample

Now this happens because of something known as overfitting

Now overfitting is a problem that is seen with decision trees

It's something that commonly occurs when we use decision trees

Now overfitting occurs when a model studies the training data to such an extent that it negatively influences the performance of the model on new data

Now this means that the disturbance in the training data is recorded, and it is learned as a concept by the model

If there's any disturbance or any sort of noise in the training data or any error in the training data, that is also studied by the model

The problem here is that these concepts do not apply to the testing data, and it negatively impacts the model's ability to classify new data

So to sum it up, overfitting occurs whenever your model learns the training data, along with all the disturbance in the training data

So it basically memorizes the training data

And whenever new data is given to your model, it will not predict the outcome very accurately

Now this is a problem seen in decision trees

Okay

But in random forest, there's something known as bagging

Now the basic idea behind bagging is to reduce the variation in the predictions by combining the results of multiple decision trees built on different samples of the data set

So your data set will be divided into different samples, and you'll be building a decision tree on each of these samples

This way, each decision tree will be studying one subset of your data

So this way overfitting will get reduced because one decision tree is not studying the entire data set

Now let's focus on random forest

Now in order to understand random forest, we'll look at a small example

We can consider this data set

In this data, we have four predictor variables

We have blood flow, blocked arteries, chest pain, and weight

Now these variables are used to predict whether or not a person has heart disease

So we're going to use this data set to create a random forest that predicts if a person has heart disease or not

Now the first step in creating a random forest is that you create a bootstrap data set

Now in bootstrapping, all you have to do is you have to randomly select samples from your original data set

Okay

And a point to note is that you can select the same sample more than once

So if you look at the original data set, we have abnormal, normal, normal, and abnormal

Look at the blood flow section

Now here I've randomly selected samples, normal, abnormal, and I've selected one sample twice

You can do this in a bootstrap data set

Now all I did here is I created a bootstrap data set

Bootstrapping is nothing but an estimation method used to make predictions on data by re-sampling the data

This is a bootstrap data set
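
Here's a minimal Python sketch of building a bootstrap data set by sampling with replacement. The rows and the `id` field are hypothetical stand-ins, since only the blood flow values are read out in this walkthrough, and `bootstrap_sample` is a name I've made up.

```python
import random

# Hypothetical stand-in for the original data set (blood flow column only).
original = [
    {"id": 0, "blood_flow": "abnormal"},
    {"id": 1, "blood_flow": "normal"},
    {"id": 2, "blood_flow": "normal"},
    {"id": 3, "blood_flow": "abnormal"},
]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
sample = bootstrap_sample(original, rng)

# The same id can show up more than once, and some ids may be missing.
print([row["id"] for row in sample])
```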

Now even though this seems very simple, in real world problems, you'll never get such a small data set

Okay, so bootstrapping is actually a little more complex than this

Usually in real world problems, you'll have a huge data set, and bootstrapping that data set is actually a pretty complex problem

Here I'm only helping you understand how random forest works, so that's why I've considered a small data set

Now you're going to use the bootstrap data set that you created, and you're going to build decision trees from it

Now one more thing to note in random forest is you will not be using every variable in your data set

Okay, so you'll only be using a few of the variables at each node

So, for example, we'll only consider two variables at each step

So if you begin at the root node here, we will randomly select two variables as candidates for the root node

Okay, let's say that we selected blood flow and blocked arteries

Out of these two variables we have to select the variable that best separates the samples

Okay

So for the sake of this example, let's say that blocked arteries is the most significant predictor, and that's why we'll assign it to the root node

Now our next step is to repeat the same process for each of these upcoming branch nodes

Here we'll again select two variables at random as candidates for each of these branch nodes, and then choose the variable that best separates the samples, right? So let me just repeat this entire process

So you know that you start creating a decision tree by selecting the root node

In random forest, you'll randomly select a couple of variables for each node, and then you'll calculate which variable best splits the data at that node

So for each node, we'll randomly select two or three variables

And out of those two or three variables, we'll see which variable best separates the data

Okay, so at each node, we'll be calculating information gain and entropy

Basically, that's what I mean

At every node, you'll calculate information gain and entropy of two or three variables, and you'll see which variable has the highest information gain, and you'll keep descending downwards

That's how you create a decision tree
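
The per-node choice can be sketched like this in Python. This is only an illustration: `pick_split` is a hypothetical helper, and the gain values in `made_up_gain` are invented numbers standing in for the real information gain calculation.

```python
import random


def pick_split(variables, score, k, rng):
    """Randomly keep k candidate variables, then return the best-scoring one."""
    candidates = rng.sample(variables, k)
    return max(candidates, key=score), candidates


variables = ["blood_flow", "blocked_arteries", "chest_pain", "weight"]

# Made-up information gain values, purely for illustration.
made_up_gain = {
    "blood_flow": 0.2,
    "blocked_arteries": 0.6,
    "chest_pain": 0.4,
    "weight": 0.1,
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
best, candidates = pick_split(variables, made_up_gain.get, k=2, rng=rng)
print(candidates, "->", best)
```

Because only two candidates are considered at a time, the variable with the highest gain overall isn't always the one that gets picked, and that's part of what makes the trees differ from each other.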

So we just created our first decision tree

Now what you do is you'll go back to step one, and you'll repeat the entire process

So each decision tree will predict the output class based on the predictor variables that you've assigned to each decision tree

Now let's say for this decision tree, you've assigned blood flow

Here we have blocked arteries at the root node

Here we might have blood flow at the root node and so on

So your output will depend on which predictor variable is at the root node

So each decision tree will predict the output class based on the predictor variable that you assigned in that tree

Now what you do is you'll go back to step one, you'll create a new bootstrap data set, and then again you'll build a new decision tree

And for that decision tree, you'll consider only a subset of variables, and you'll choose the best predictor variable by calculating the information gain

So you will keep repeating this process

So you just keep repeating step one and step two

Okay

And you'll keep creating multiple decision trees

Okay

So having a variety of decision trees in a random forest is what makes it more effective than an individual decision tree

So instead of having an individual decision tree, which is created using all the features, you can build a random forest that uses multiple decision trees wherein each decision tree has a random set of predictor variables

Now step number four is predicting the outcome of a new data point

So now that you've created a random forest, let's see how it can be used to predict whether a new patient has heart disease or not

Okay, now this diagram basically has data about the new patient

Okay, this is the data about the new patient

He doesn't have blocked arteries

He has chest pain, and his weight is around 185 kgs

Now all you have to do is run this data down each of the decision trees that you made

So, the first decision tree shows that yes, this person has heart disease

Similarly, you'll run the information of this new patient through every decision tree that you created

Then depending on how many votes you get for yes and no, you'll classify that patient as either having heart disease or not

All you have to do is run the information of the new patient through all the decision trees that you created in the previous step, and the final output is based on the number of votes each class is getting

Okay, let's say that three decision trees said that yes, the patient has heart disease, and one decision tree said that no, he doesn't

So this means you will obviously classify the patient as having heart disease because three of them voted for yes

It's based on majority
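
The voting step is simple enough to sketch in a few lines of Python. The helper name `majority_vote` is my own, and the four votes mirror the three yes, one no example.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class that received the most votes from the trees."""
    return Counter(votes).most_common(1)[0][0]


tree_votes = ["yes", "yes", "yes", "no"]   # one vote per decision tree
print(majority_vote(tree_votes))  # yes
```

With an even number of trees a tie is possible, and this sketch would then just return whichever class it saw first, so in practice people often use an odd number of trees.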

So guys, I hope the concept behind random forest is understandable

Now the next step is you will evaluate the efficiency of the model

Now earlier when we created the bootstrap data set, we left out one entry

This is the entry we left out, because we repeated one sample twice

If you remember, in the bootstrap data set, we repeated an entry twice, and we missed out on one of the entries

So for evaluating the model, we'll be using the data entry that we missed out on

Now in a real world problem, where there's a huge amount of data, about 1/3 of the original data set is not included in the bootstrap data set

So guys, the sample data which is not there in your bootstrap data set is known as the out-of-bag data set

Now the out-of-bag data set is used to check the accuracy of the model

Because the model was not created by using the out-of-bag data set, it will give us a good understanding of whether the model is effective or not

Now the out-of-bag data set is nothing but your testing data set

Remember, in machine learning, there's a training and a testing data set

So your out-of-bag data set acts as your testing data set, and it is used to evaluate the efficiency of your model

So eventually, you can measure the accuracy of a random forest by the proportion of out-of-bag samples that are correctly classified

So you can calculate the accuracy by checking how many samples of this out-of-bag data set the model was able to classify correctly
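
Here's a rough Python sketch of the out-of-bag idea. The helper names `out_of_bag` and `accuracy` and the predicted and actual labels are made up for illustration. The last line shows where the "about 1/3" figure comes from: the chance that one particular row is never drawn in n draws with replacement is (1 - 1/n) to the power n, which settles near 0.368 for large n.

```python
def out_of_bag(n_rows, drawn_indices):
    """Row indices of the original data set that never made it into the bootstrap sample."""
    return sorted(set(range(n_rows)) - set(drawn_indices))


def accuracy(predicted, actual):
    """Proportion of samples that were classified correctly."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Four rows, one drawn twice and one never drawn, as in the small example.
drawn = [0, 1, 1, 3]
print(out_of_bag(4, drawn))  # [2]

# Hypothetical forest predictions for five out-of-bag rows.
predicted = ["yes", "no", "yes", "yes", "no"]
actual = ["yes", "no", "no", "yes", "no"]
print(accuracy(predicted, actual))  # 0.8

# Chance that a given row is left out of a bootstrap sample of 1000 rows.
print(round((1 - 1 / 1000) ** 1000, 3))  # 0.368
```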

So guys, that was an explanation about how random forest works

To give you an overview, let me just run you through all the steps that we took

So basically, this was our data set, and all we have to do is predict whether a patient has heart disease or not

So, our first step was to create a bootstrap data set

A bootstrap data set is nothing but randomly selected observations from your original data set, and you can also have duplicate values in your bootstrap data set

Okay

The next step is you're going to create a decision tree by considering a random set of predictor variables for each decision tree

Okay

So, the third step is you'll go back to step one, create a bootstrap data set

Again, create a decision tree

So this iteration is performed hundreds of times until you have multiple decision trees

Now that you've created a random forest, you'll use this random forest to predict the outcome

So if you're given a new data point and you have to classify it into one of the two classes, you'll just run this new information through all the decision trees

And you'll just take the majority of the outputs that you're getting from the decision trees as your outcome

Now in order to evaluate the efficiency of the model, you'll use the out-of-bag sample data set

Now the out-of-bag sample is basically the sample that was not included in your bootstrap data set, but this sample is coming from your original data set, guys

This is not something that you randomly create

This data was there in your original data set, but it was just not included in your bootstrap data set

So you'll use your out-of-bag sample in order to calculate the accuracy of your random forest

So the proportion of out-of-bag samples that are correctly classified will give you the accuracy of your model

So that is all for random forest

So guys, I'll discuss other classification algorithms with you, and only then I'll show you a demo on the classification algorithms

Now our next algorithm is something known as naive Bayes

Naive Bayes is, again, a supervised classification algorithm, which is based on the Bayes Theorem

Now the Bayes Theorem basically follows a probabilistic approach

The main idea behind naive Bayes is that the predictor variables in a machine learning model are independent of each other, meaning that the outcome of a model depends on a set of independent variables that have nothing to do with each other

Now a lot of you might ask why naive Bayes is called naive

Usually, when I tell anybody about naive Bayes, they keep asking me why it's called naive

So in real world problems, predictor variables aren't always independent of each other

There is usually some correlation between the independent variables

Now because naive Bayes considers each predictor variable to be independent of any other variable in the model, it is called naive

This is the assumption that naive Bayes makes

Now let's understand the math behind the naive Bayes algorithm

So like I mentioned, the principle behind naive Bayes is the Bayes Theorem, which is also known as the Bayes Rule

The Bayes Theorem is used to calculate the conditional probability, which is nothing but the probability of an event occurring based on information about events in the past

This is the mathematical equation for the Bayes Theorem

Now, in this equation, the LHS is nothing but the conditional probability of event A occurring, given the event B

P of A is nothing but the probability of event A occurring, and P of B is the probability of event B

And P of B given A is nothing but the conditional probability of event B occurring, given the event A
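
The equation itself is on the slide and isn't captured in this transcript. Written out from the four terms just described, the Bayes Theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

The left-hand side is the probability of A given B, and the right-hand side multiplies the probability of B given A by the probability of A, then divides by the probability of B.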

Now let's try to understand how naive Bayes works

Now consider this data set of around 1,500 observations

Okay, here we have the following output classes

We have either cat, parrot, or turtle

These are our output classes, and the predictor variables are swim, wings, green color, and sharp teeth

Okay

So, basically, type is your output variable, and swim, wings, green, and sharp teeth are your predictor variables

Your output variable has three classes, cat, parrot, and turtle

Okay

Now I've summarized this in the table shown on the screen

The first thing you can see is that the class of type cat shows that out of 500 cats, 450 can swim, meaning that 90% of them can

And zero cats have wings, and zero cats are green in color, and 500 out of 500 cats have sharp teeth

Okay

Now, coming to parrot, it says 50 out of 500 parrots have a true value for swim

Now guys, obviously, this does not hold true in the real world

I don't think there are any parrots who can swim, but I've just created this data set so that we can understand naive Bayes

So, meaning that 10% of parrots have a true value for swim

Now all 500 parrots have wings, and 400 out of 500 parrots are green in color, and zero parrots have sharp teeth

Coming to the turtle class, all 500 turtles can swim

Zero turtles have wings

And out of 500, 100 turtles are green in color, meaning that 20% of the turtles are green in color

And 50 out of 500 turtles have sharp teeth

So that's what we understand from this data set

Now the problem here is we are given an observation over here, with some values for swim, wings, green, and sharp teeth

What we need to do is predict whether the animal is a cat, a parrot, or a turtle, based on these values

So the goal here is to predict whether it is a cat, a parrot, or a turtle based on all these defined parameters

Okay

Based on the values of swim, wings, green, and sharp teeth, we'll understand whether the animal is a cat, or a parrot, or a turtle

So, if you look at the observation, the variables swim and green have a value of true, and the outcome can be any one of the types

It can either be a cat, it can be a parrot, or it can be a turtle

So in order to check if the animal is a cat, all you have to do is calculate the conditional probability at each step

So here what we need to calculate is the probability that this is a cat, given that it can swim and it is green in color

First, we'll calculate the probability that it can swim, given that it's a cat

Second, we'll calculate the probability of it being green, given that it is a cat, and then we'll multiply both with the probability of it being a cat, divided by the probability of swim and green

Okay

So, guys, I know you all cancalculate the probability

It's quite simple

So once you calculatethe probability here, you'll get a direct value of zero

Okay, you'll get a value of zero, meaning that this animalis definitely not a cat

Similarly, if you do this for parrots, you calculate a conditional probability, you'll get a value of 0

0264 divided by probabilityof swim comma green

We don't know this probability

Similarly, if you checkthis for the turtle, you'll get a probability of 0

066 divided by P swim comma green

Okay

Now for these calculations, the denominator is the same

The value of the denominator is the same, and the probability of it being a turtle is greater than that of a parrot

So that's how we can correctly predict that the animal is actually a turtle
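The comparison above can be sketched in a few lines of Python. This is only an illustration that reuses the numerators quoted in the example (0 for cat, 0.0264 for parrot, 0.066 for turtle). It assumes that cat, parrot, and turtle are the only possible classes, so the shared denominator P(swim, green) is simply the sum of the three numerators. The variable names are made up for this sketch:

```python
# Numerators P(swim | class) * P(green | class) * P(class), as quoted in the example.
numerators = {"cat": 0.0, "parrot": 0.0264, "turtle": 0.066}

# Assumption: the three classes are the only possibilities, so the shared
# denominator P(swim, green) equals the sum of the numerators.
evidence = sum(numerators.values())

# Dividing every numerator by the same evidence gives the posterior probabilities.
posteriors = {animal: value / evidence for animal, value in numerators.items()}

# The denominator is identical for every class, so it never changes the winner.
prediction = max(numerators, key=numerators.get)

print(posteriors)
print("Predicted animal:", prediction)
```

Because the denominator is shared, comparing the numerators alone is enough to pick the class, which is why the unknown P(swim, green) never has to be computed in the example.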

So guys, this is how naive Bayes works

You basically calculate the conditional probability at each step

Whatever classification needs to be done, that has to be calculated through probability

There's a lot of statistics that comes into naive Bayes

And if you all want to learn more about statistics and probability, I'll leave a link in the description

You all can watch that video as well

There I've explained exactly what conditional probability is, and the Bayes Theorem is also explained very well

So you all can check out that video also

And apart from this, if you all have any doubts regarding any of the algorithms, please leave them in the comment section

Okay, I'll solve your doubts

And apart from that, I'll also leave a couple of links for each of the algorithms in the description box

Because if you want a more in-depth understanding of each of the algorithms, you can check out that content

Since this is a full course video, I have to cover all the topics, and it is hard for me to go in depth on each topic

So I'll leave a couple of links in the description box

You can watch those videos as well

Make sure you check out the probability and statistics video

So now let's move on and look at our next algorithm, which is the K nearest neighbor algorithm

Now KNN, which basically stands for K nearest neighbor, is, again, a supervised classification algorithm that classifies a new data point into the target class or the output class, depending on the features of its neighboring data points

That's why it's called K nearest neighbor

So let's try to understand KNN with a small analogy

Okay, let's say that we want a machine to distinguish between the images of cats and dogs

So to do this, we must input our data set of cat and dog images, and we have to train our model to detect the animal based on certain features

For example, features such as pointy ears can be used to identify cats

Similarly, we can identify dogs based on their long ears

So after studying the data set during the training phase, when a new image is given to the model, the KNN algorithm will classify it into either cats or dogs, depending on the similarity in their features

Okay, let's say that a new image has pointy ears, it will classify that image as a cat, because it is similar to the cat images, because it's similar to its neighbors

In this manner, the KNN algorithm classifies data points based on how similar they are to their neighboring data points

So this is a small example

We'll discuss more about it in the further slides

Now let me tell you a couple of features of the KNN algorithm

So, first of all, we know that it is a supervised learning algorithm

It uses a labeled input data set to predict the output of the data points

Then it is also one of the simplest machine learning algorithms, and it can be easily implemented for a varied set of problems

Another feature is that it is non-parametric, meaning that it does not make any assumptions about the underlying data

For example, naive Bayes is a parametric model, because it assumes that all the independent variables are in no way related to each other

It has assumptions about the model

K nearest neighbor has no such assumptions

That's why it's considered a non-parametric model

Another feature is that it is a lazy algorithm

Now, a lazy algorithm basically is any algorithm that memorizes the training set, instead of learning a discriminative function from the training data

Now, even though KNN is mainly a classification algorithm, it can also be used for regression cases

So KNN is actually both a classification and a regression algorithm

But mostly, you'll see that it'll be used for classification problems

The most important feature about a K nearest neighbor is that it's based on feature similarity with its neighboring data points

You'll understand this in the example that I'm gonna tell you

Now, in this image, we have two classes of data

We have class A, which is squares, and class B, which is triangles

Now the problem statement is to assign the new input data point to one of the two classes by using the KNN algorithm

So the first step in the KNN algorithm is to define the value of K

But what does the K in the KNN algorithm stand for? Now the K stands for the number of nearest neighbors, and that's why it's got the name K nearest neighbors

Now, in this image, I've defined the value of K as three

This means that the algorithm will consider the three neighbors that are closest to the new data point in order to decide the class of the new data point

So the closeness between data points is calculated by using measures such as Euclidean distance and Manhattan distance, which I'll be explaining in a while

So our K is equal to three

The neighbors include two squares and one triangle

So, if I were to classify the new data point based on K equal to three, then it should be assigned to class A, correct? It should be assigned to squares

But what if the K value is set to seven

Here I'm basically telling my algorithm to look for the seven nearest neighbors and classify the new data point into the class it is most similar to

So our K is equal to seven

The neighbors include three squares and four triangles

So if I were to classify the new data point based on K equal to seven, then it would be assigned to class B, since the majority of its neighbors are from class B
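To make the K equal to three versus K equal to seven comparison concrete, here is a minimal sketch of the majority vote. The coordinates are hypothetical and were picked only so that the three nearest neighbors are two squares and one triangle, and the seven nearest are three squares and four triangles, like in the slide. The function name `knn_classify` is also made up for this illustration:

```python
import math
from collections import Counter

# Hypothetical labeled points: "A" for squares, "B" for triangles.
training_data = [
    ((1, 0), "A"), ((2, 0), "A"), ((7, 0), "A"),
    ((3, 0), "B"), ((0, 4), "B"), ((0, 5), "B"), ((0, 6), "B"),
]

def knn_classify(new_point, data, k):
    """Return the majority class among the k points closest to new_point."""
    by_distance = sorted(data, key=lambda item: math.dist(new_point, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

new_point = (0, 0)
class_for_k3 = knn_classify(new_point, training_data, k=3)  # two squares, one triangle
class_for_k7 = knn_classify(new_point, training_data, k=7)  # three squares, four triangles
print(class_for_k3, class_for_k7)
```

The same new point lands in class A with K equal to three and in class B with K equal to seven, which is why the choice of K matters so much.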

Now this is where a lot of us get confused

So how do we know which K value is the most suitable for K nearest neighbor

Now there are a couple of methods used to choose the K value

One of them is known as the elbow method

We'll be discussing the elbow method in the upcoming slides

So for now let me just show you the measures that are involved behind KNN

Okay, there's very simple math behind the K nearest neighbor algorithm

So I'll be discussing the Euclidean distance with you

Now in this figure, we have to measure the distance between P one and P two by using Euclidean distance

I'm sure a lot of you already know what Euclidean distance is

It is something that we learned in eighth or 10th grade

I'm not sure

So all you're doing is subtracting x one from x two, and y one from y two

So the formula is basically x two minus x one, the whole squared, plus y two minus y one, the whole squared, and the square root of that is the Euclidean distance

It's as simple as that

So Euclidean distance is used as a measure to check the closeness of data points

So basically, KNN uses the Euclidean distance to check the closeness of a new data point with its neighbors
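Here is the Euclidean distance formula written out as a small Python sketch. The two points P one and P two are hypothetical values chosen for the illustration. The Manhattan distance, which was mentioned earlier but not spelled out in this section, is included using its standard definition (the sum of the absolute differences):

```python
import math

def euclidean_distance(p1, p2):
    """Square root of (x2 - x1)^2 + (y2 - y1)^2."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

def manhattan_distance(p1, p2):
    """Sum of the absolute differences along each axis (standard definition)."""
    (x1, y1), (x2, y2) = p1, p2
    return abs(x2 - x1) + abs(y2 - y1)

p1, p2 = (1, 2), (4, 6)  # hypothetical points
print(euclidean_distance(p1, p2))  # 5.0
print(manhattan_distance(p1, p2))  # 7
```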

So guys, it's as simple as that

KNN makes use of simple measures in order to solve very complex problems

Okay, and this is one of the reasons why KNN is such a commonly used algorithm

Coming to support vector machine

Now, this is our last algorithm under classification algorithms

Now guys, don't get paranoid because of the name

Support vector machine actually is one of the simplest algorithms in supervised learning

Okay, it is basically used to classify data into different classes

It's a classification algorithm

Now unlike most algorithms, SVM makes use of something known as a hyperplane, which acts like a decision boundary between the separate classes

Okay

Now SVM can be used to generate multiple separating hyperplanes, such that the data is divided into segments, and each segment contains only one kind of data

So, a few features of SVM include that it is a supervised learning algorithm, meaning that it's going to study labeled training data

Another feature is that it is again a regression and a classification algorithm

Even though SVM is mainly used for classification, there is something known as the support vector regressor

That is used for regression problems

Now, SVM can also be used to classify non-linear data by using kernel tricks

Non-linear data is basically data that cannot be separated by using a single straight line

I'll be talking more about this in the upcoming slides

Now let's move on and discuss how SVM works

Now again, in order to make you understand how support vector machine works, let's look at a small scenario

For a second, pretend that you own a farm and you have a problem

You need to set up a fence to protect your rabbits from a pack of wolves

Okay, now, you need to decide where you want to build your fence

So one way to solve the problem is by using support vector machines

So if I do that and if I try to draw a decision boundary between the rabbits and the wolves, it looks something like this

Now you can clearly build a fence along this line

So in simple terms, this is exactly how your support vector machines work

It draws a decision boundary, which is nothing but a hyperplane between any two classes in order to separate them or classify them

Now I know that you're thinking how do you know where to draw a hyperplane

The basic principle behind SVM is to draw a hyperplane that best separates the two classes

In our case, the two classes are the rabbits and the wolves

Now before we move any further, let's discuss the different terminologies that are there in support vector machine

So the first term is the hyperplane

It is a decision boundary that best separates the two classes

Now, support vectors, what exactly are support vectors

So when you start with the support vector machine, you start by drawing a random hyperplane

And then you check the distance between the hyperplane and the closest data point from each of the classes

These closest data points to the hyperplane are known as support vectors

Now these two data points are the closest to your hyperplane

So these are known as support vectors, and that's where the name comes from, support vector machines

Now the hyperplane is drawn based on these support vectors

An optimum hyperplane will be the one which has a maximum distance from each of the support vectors, meaning that the distance between the hyperplane and the support vectors has to be maximum

So, to sum it up, SVM is used to classify data by using a hyperplane, such that the distance between the hyperplane and the support vectors is maximum

Now this distance is nothing but the margin

Now let's try to solve a problem

Let's say that I input a new data point and I want to draw a hyperplane such that it best separates these two classes

So what do I do? I start out by drawing a hyperplane, and then I check the distance between the hyperplane and the support vectors

So, basically here, I'm trying to check if the margin is maximum for this hyperplane

But what if I drew the hyperplane like this? The margin for this hyperplane is clearly more than the previous one

So this is my optimal hyperplane

This is exactly how you understand which hyperplane needs to be chosen, because you can draw multiple hyperplanes

Now, the best hyperplane is the one that has a maximum margin

So, this is my optimal hyperplane
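The idea of drawing a hyperplane, checking its margin, and then trying a better one can be sketched like this. Keep in mind this is not how an SVM library actually finds the hyperplane (that is an optimization problem). It only compares two hand-picked candidate lines on hypothetical rabbit and wolf coordinates, and the helper names `separates` and `margin` are made up for the illustration:

```python
import math

# Hypothetical 2D positions for the two classes.
rabbits = [(1, 1), (2, 1)]
wolves = [(4, 4), (5, 4)]

def separates(w, b, negatives, positives):
    """True if the line w.x + b = 0 puts the two classes on opposite sides."""
    def side(p):
        return w[0] * p[0] + w[1] * p[1] + b
    return all(side(p) < 0 for p in negatives) and all(side(p) > 0 for p in positives)

def margin(w, b, points):
    """Distance from the line w.x + b = 0 to its closest point (the support vector)."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

# Two candidate hyperplanes, each given as (w, b).
candidates = {
    "vertical line x = 2.5": ((1, 0), -2.5),
    "diagonal line x + y = 5.5": ((1, 1), -5.5),
}

margins = {}
for name, (w, b) in candidates.items():
    if separates(w, b, rabbits, wolves):
        margins[name] = margin(w, b, rabbits + wolves)

best = max(margins, key=margins.get)
print(margins)
print("Larger margin:", best)
```

Both lines separate the rabbits from the wolves, but the diagonal one keeps a larger distance from the closest points, so between these two candidates it is the better fence.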

Now so far it was quite easy

Our data was linearly separable, which means that you could draw a straight line to separate the two classes

But what will you do if the data looks like this? You cannot possibly draw a hyperplane like this

You cannot possibly draw a hyperplane like this either

It doesn't separate the two classes

We can clearly see rabbits and wolves in both of the classes

Now this is exactly where non-linear SVM comes into the picture

Okay, this is what the kernel trick is all about

Now, a kernel is basically something that can be used to transform data into another dimension that has a clear dividing margin between classes of data

So, basically the kernel function offers the user the option of transforming non-linear spaces into linear ones

Until this point, if you notice, we were plotting our data on a two dimensional space

We had x and y-axis

A simple trick is transforming the two variables, x and y, into a new feature space, which involves a new variable z

So, basically, what we're doing is we're visualizing the data on a three dimensional space

So when you transform the 2D space into a 3D space, you can clearly see a dividing margin between the two classes of data

You can clearly draw a line in the middle that separates these two data sets
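Here is a small sketch of that 2D to 3D idea. The session doesn't say how z is computed, so this assumes z = x squared plus y squared, which is one common choice for data where one class surrounds the other. The points and the threshold are hypothetical:

```python
# Hypothetical 2D points: one class sits near the origin, the other surrounds it,
# so no single straight line in the x-y plane can separate them.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]
outer = [(2.0, 0.0), (0.0, 2.0), (-1.5, 1.5)]

def lift(point):
    """Map (x, y) to (x, y, z) with z = x^2 + y^2 (an assumed choice of z)."""
    x, y = point
    return (x, y, x * x + y * y)

inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]

# In the lifted space, the flat plane z = 1 separates the two classes.
threshold = 1.0
inner_below = all(z < threshold for z in inner_z)
outer_above = all(z > threshold for z in outer_z)
print(inner_below, outer_above)
```

After the transformation, every inner point has a small z and every outer point has a large z, so a flat boundary in the new dimension does the job that no straight line could do in 2D.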

So guys, this sums up the whole idea behind support vector machines

Support vector machines are very easy to understand

Now, this was all for our supervised learning algorithms

Now, before I move on to unsupervised learning algorithms, I'll be running a demo

We'll be running a demo in order to understand all the classification algorithms that we studied so far

Earlier in the session, we ran a demo for the regression algorithms

Now we'll run one for the classification algorithms

So, enough of theory

Let's open up Python, and let's start looking at how these classification algorithms work

Now, here what we'll be doing is we'll implement multiple classification algorithms by using scikit-learn

Okay, it's one of the most popular machine learning tools for Python

Now we'll be using a simple data set for the task of training a classifier to distinguish between the different types of fruits

The purpose of this demo is to implement multiple classification algorithms for the same problem

So as usual, you start by importing all your libraries in Python

Again, guys, if you don't know Python, check the description box, I'll leave a link there

You can go through that video as well

Next, what we're doing is we're reading the fruit data in the form of a table

We've stored it in a variable called fruits

Now if you wanna see the first few rows of the data, let's print the first few observations in our data set

So, this is our data set

These are the fruit labels

So we have four fruits in our data set

We have apple, we have mandarin, orange, and lemon

Okay

Now, fruit label is nothing but a numeric label for each fruit, and apple is labeled as one

Mandarin has two

Similarly, orange is labeled as three

And lemon is labeled as four

Then a fruit subtype is basically the family of fruit it belongs to

Mass is the mass of the fruit, width, height, and color score

These are all our predictor variables

We have to identify the type of fruit, depending on these predictor variables

So, first, we saw a couple of observations over here

Next, if you want to see the shape of your data set, this is what it looks like

There are 59 observations with seven predictor variables, which is one, two, three, four, five, six, and seven

We have seven variables in total

Sorry, not predictor variables

This seven denotes both your predictor and your target variables

Next, I'm just showing you the four fruits that we have in our data set, which is apple, mandarin, orange, and lemon

Next, I'm just grouping fruits by their names

Okay

So we have 19 apples in our data set

We have 16 lemons

We have only five mandarins, and we have 19 oranges

Even though the number of mandarin samples is low, we'll have to work with it, because right now I'm just trying to make you understand the classification algorithms

The main aim for me behind doing these demos is so that you understand how classification algorithms work

Now what you can do is you can also plot a graph in order to see the frequency of each of these fruits

Okay, I'll show you what the plot looks like

The number of apples and oranges is the same

We have 19 apples and 19 oranges

And similarly, this is the count for lemons

Okay

So this is a small visualization

Guys, visualization is actually very important when it comes to machine learning, because you can see most of the relations and correlations by plotting graphs

You can't see those correlations by just running code and all of that

Only when you plot different variables on your graph, you'll understand how they are related

One of the main tasks in machine learning is to visualize data

It ensures that you understand the correlation between data

Next, what we're gonna do is we'll graph something known as a box plot

Okay, a box plot basically helps you understand the distribution of your data

Let me run the box plot, and I'll show you what exactly I mean

So this is our box plot

So, box plot will basically give you a clearer idea of the distribution of your input variables

It is mainly used in exploratory data analysis, and it represents the distribution of the data and its variability

Now, the box plot contains the upper quartile and the lower quartile

So the box plot basically spans your interquartile range, or something known as IQR

IQR is nothing but your first quartile subtracted from your third quartile

Now again, this involves statistics and probability

So I'll be leaving a link in the description box

You can go through that video

I've explained statistics, probability, IQR, range, and all of that in there

So, one of the main reasons why box plots are used is to detect any sort of outliers in the data

Since the box plot spans the IQR, it detects the data points that lie outside the average range

So if you see in the color score, most of the data is distributed around the IQR, whereas here the data is not that well distributed

Height also is not very well distributed, but color score is pretty well distributed

This is what the box plot shows you
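To see what the box is actually measuring, here is a minimal sketch that computes the IQR and flags outliers without drawing anything. The values are hypothetical and not taken from the fruit data. The 1.5 times IQR cutoff is the usual box plot convention and is an assumption here, since the session doesn't state which rule the plot uses. Different tools also compute quartiles slightly differently, so the exact numbers can vary:

```python
import statistics

# Hypothetical values for one input variable, with one unusually large entry.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # third quartile minus first quartile

# Assumed convention: anything beyond 1.5 * IQR from the box counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Outliers:", outliers)
```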

So guys, this involves a lot of math

All of these, each and every function in machine learning involves a lot of math

So you know it's necessary to have a good understanding of statistics, probability, and all of that

Now, next, what we'll do is we'll plot a histogram

A histogram will basically show you the frequency of occurrence

Let me just plot this, and then we'll try and understand

So here you can understand a few correlations

Okay, some pairs of these attributes are correlated

For example, mass and width, they're somehow correlated along the same ranges

So this suggests a high correlation and a predictable relationship

Like if you look at the graphs, they're quite similar

So for each of the predictor variables, I've drawn a histogram

For each of that input data, we've drawn a histogram

Now guys, again, like I said, plotting graphs is very important because you understand a lot of correlations that you cannot understand by just looking at your data, or just running operations on your data

I repeat, or by just running code on your data

Okay

Now, next, what we're doing here is we're just dividing the data set into target and predictor variables

So, basically, I've created an array of feature names which has your predictor variables

It has mass, width, height, color score

And we have assigned that as X, since this is your input, and y is your output, which is your fruit label

That'll show whether it is an apple, orange, lemon, and so on

Now, the next step that we'll perform over here is pretty evident

Again, this is data splicing

So data splicing, by now, I'm sure all of you know what it is

It is splitting your data into training and testing data

So that's what we've done over here
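If you want to see what such a split does under the hood, here is a plain Python sketch. The rows, the labels, the 25% test fraction, and the function name `train_test_split_simple` are all hypothetical. The demo itself uses scikit-learn for this step, and the 15 test points out of 59 reported later are consistent with roughly a quarter of the data being held out:

```python
import random

def train_test_split_simple(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the row indices once, then cut them into a testing and a training part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical predictor rows (mass, width, height, color score) and fruit labels.
X = [(192, 8.4, 7.3, 0.55), (180, 8.0, 6.8, 0.59), (86, 6.2, 4.7, 0.80),
     (84, 6.0, 4.6, 0.79), (160, 7.5, 7.5, 0.75), (140, 7.3, 7.1, 0.87),
     (194, 7.2, 10.3, 0.70), (200, 7.3, 10.5, 0.72)]
y = [1, 1, 2, 2, 3, 3, 4, 4]

X_train, X_test, y_train, y_test = train_test_split_simple(X, y)
print(len(X_train), "training rows,", len(X_test), "testing rows")
```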

Next, we're importing something known as the MinMaxScaler

Scaling or normalizing your data is very important in machine learning

Now, I'm saying this because your raw data can be very biased

So it's very important to normalize your data

Now when I say normalize your data, so if you look at the value of mass and if you look at the value of height and color, you see that mass is ranging in hundreds and double digits, whereas height is in single digits, and color score is not even in single digits

So, if some of your variables have a very high range, you know they have a very high scale, like they're in two digits or three digits, whereas other variables are in single digits or less, then your output is going to be very biased

It's obvious that it's gonna be very biased

That's why you have to scale your data in such a way that all of these values will have a similar range

So that's exactly what the scaler function does
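Min-max scaling itself is a one-line formula: subtract the column's minimum and divide by its range, so every column ends up between zero and one. The sketch below does this by hand on hypothetical mass, height, and color score values. The function name `min_max_scale` is made up, and in the demo this work is done by scikit-learn's MinMaxScaler:

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range 0 to 1: (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # a constant column has no range to scale
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical columns on very different scales.
mass = [100, 150, 200]
height = [4.0, 7.0, 10.0]
color_score = [0.5, 0.75, 1.0]

scaled_mass = min_max_scale(mass)
scaled_height = min_max_scale(height)
scaled_color = min_max_scale(color_score)
print(scaled_mass, scaled_height, scaled_color)
```

After scaling, all three columns cover the same zero to one range, so a distance-based algorithm like KNN no longer lets mass dominate just because its raw numbers are bigger.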

Okay

Now since we have already divided our data into training and testing data, our next step is to build the model

So, first, we're gonna be using the logistic regression algorithm

I've already discussed logistic regression with you all

It's a classification algorithm, which is basically used to predict the outcome of a categorical variable

So we already have the logistic regression class in Python

All you have to do is create an instance of this class, which is logreg over here

And I'm fitting this instance with a training data set, meaning that I'm running the algorithm with the training data set

Once you do that, you can calculate the accuracy by using this function

So here I'm calculating the accuracy on the training data set and on the testing data set

Okay, so let's look at the output of this

Now guys, ignore this future warning

A warning like this doesn't stop the code from running

Now, accuracy of the logistic regression classifier on the training data set is around 70%

It was pretty good on the training data set

But when it comes to classifying on the test data set, it's only 40%, which is not that good for a classifier

Now again, this can depend on the problem statement, for which problem statement is logistic regression more suitable

Next, we'll do the same thing using the decision tree

So again, we just call the decision tree function, and we'll fit it with the training data set, and we'll calculate the accuracy of the decision tree on the training and the testing data set

So if you do that for a decision tree on the training data set, you get 100% accuracy

But on the testing data set, you have around 87% of accuracy

This is something that I discussed with you all earlier, that decision trees are very good with the training data set, because of a process known as overfitting

But when it comes to classifying the outcome on the testing data set, the accuracy reduces

Now, this is very good compared to logistic regression

For this problem statement, decision trees work better than logistic regression

Coming to the KNN classifier

Again, all you have to do is call the KNeighborsClassifier, this function

And you have to fit this with the training data set

If you calculate the accuracy for a KNN classifier, we get a good accuracy actually

On the training data set, we get an accuracy of 95%

And on the testing data set, it's 100%

That is really good, because our model actually achieved a higher accuracy on the testing data set than on the training data set

Now all of this depends on the value of K that you've chosen for KNN

Now, I mentioned that you use the elbow method to choose the K value in the K nearest neighbor

I'll be discussing the elbow method in the next section

So, don't worry if you haven't understood that yet

Now, we're also using a naive Bayes classifier

Here we're using a Gaussian naive Bayes classifier

Gaussian is basically a type of naive Bayes classifier

I'm not going to go into depth of this, because it'll just extend our session much longer

Okay

And if you want to know more about this, I'll leave a link in the description box

You can read all about the Gaussian naive Bayes classifier

Now, the math behind this is the same

It uses naive Bayes, it uses the Bayes Theorem itself

Now again, we're gonna call this class, and then we're going to run our data, training data on it

So using the naive Bayes classifier, we're getting an accuracy of 0.86 on the training data set

And on the testing data set, we're getting 67% accuracy

Okay

Now let's do the same thing with support vector machines

Importing the support vector classifier

And we are fitting the training data into the algorithm

We're getting an accuracy of around 61% on the training data set and 33% on the testing data set

Now guys, this accuracy and all depends also on the problem statement

It depends on the type of data that support vector machines get

Usually, SVM is very good on large data sets

Now since we have a very small data set over here, it's sort of obvious why the accuracy is so low

So guys, these were a couple of classification algorithms that I showed you here

Now, because our KNN classifier classified our data set more accurately, we'll look at the predictions that the KNN classifier made

Okay, now we're storing all our predicted values in the predict variable

Now in order to show you the accuracy of the KNN model, we're going to use something known as the confusion matrix

So, a confusion matrix is a table that is often used to describe the performance of a classification model

So, confusion matrix actually represents a tabular representation of actual versus predicted values

So when you draw a confusion matrix on the actual versus predicted values for the KNN classifier, this is what the confusion matrix looks like

Now, we have four rows over here

If you see, we have four rows

The first row represents apples, second, mandarin, third represents lemons, and fourth, oranges

So this value of four corresponds to zero comma zero, meaning that it was correctly able to classify all the four apples

Okay

This value of one represents one comma one, meaning that our classifier correctly classified this as mandarin

This matrix is drawn on actual values versus predicted values
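Here is a minimal sketch of how such a matrix is filled in, with rows for the actual class and columns for the predicted class. The actual and predicted labels below are hypothetical and deliberately include two mistakes, so you can see what an off-diagonal entry looks like. The numeric labels follow the fruit labels of the data set, and the function name `confusion_matrix_simple` is made up for this illustration:

```python
# Fruit labels as in the data set: 1 = apple, 2 = mandarin, 3 = orange, 4 = lemon.
labels = [1, 2, 3, 4]

# Hypothetical actual and predicted labels (not the demo's output).
y_true = [1, 1, 1, 1, 2, 3, 3, 4, 4, 4]
y_pred = [1, 1, 1, 3, 2, 3, 3, 4, 4, 1]

def confusion_matrix_simple(actual, predicted, labels):
    """Count how often each actual class (row) was predicted as each class (column)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

matrix = confusion_matrix_simple(y_true, y_pred, labels)
for row in matrix:
    print(row)
```

The diagonal holds the correct predictions. In this hypothetical run, the entry at zero comma two means one apple was predicted as an orange, and the entry at three comma zero means one lemon was predicted as an apple.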

Now, if you look at the summary of the confusion matrix, we'll get something known as precision, recall, f1-score, and support

Precision is basically the ratio of the correctly predicted positive observations to the total predicted positive observations

So the correctly predicted positive observations are four, and a total of four observations were predicted as apples

So that's where I get a precision of one

Okay

Recall on the other hand is the ratio of correctly predicted positive observations to all the observations in the class

Again, we've correctly classified four apples, and there are a total of four apples

F1-score is nothing but the harmonic mean of your precision and your recall

Okay, and your support basically denotes the number of actual data points of each class in the testing data set

So, in our KNN algorithm, since we got 100% accuracy, all our data points were correctly classified

So, 15 out of 15 were correctly classified because we have 100% accuracy

So that's how you read a confusion matrix

Okay, you have four important measures, precision, recall, f1-score, and support

F1-score is just the harmonic mean of your precision and your recall

So precision is basically the ratio of the correctly predicted positive observations to the total predicted positive observations

Recall is the ratio of the correctly predicted positive observations to all the observations in the actual class
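These four measures can be computed by hand for one class at a time, as in the sketch below. The labels are hypothetical and include a couple of mistakes on purpose, so the numbers are not all one. F1 is computed here as the harmonic mean of precision and recall, which is its usual definition, and support is the number of actual samples of the class. The function name `per_class_report` is made up for this illustration:

```python
# Hypothetical actual and predicted fruit labels (1 = apple, 2 = mandarin, 3 = orange, 4 = lemon).
y_true = [1, 1, 1, 1, 2, 3, 3, 4, 4, 4]
y_pred = [1, 1, 1, 3, 2, 3, 3, 4, 4, 1]

def per_class_report(actual, predicted, label):
    """Return (precision, recall, f1, support) for one class."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    predicted_count = sum(1 for p in predicted if p == label)  # everything predicted as this class
    support = sum(1 for a in actual if a == label)              # everything that really is this class
    precision = true_positives / predicted_count if predicted_count else 0.0
    recall = true_positives / support if support else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1, support

apple_report = per_class_report(y_true, y_pred, 1)
orange_report = per_class_report(y_true, y_pred, 3)
print("apple :", apple_report)
print("orange:", orange_report)
```

For apples, three of the four predicted apples are right and three of the four real apples are found, so precision and recall are both 0.75. For oranges, both real oranges are found (recall of one), but one apple was also called an orange, which pulls precision down to two out of three.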

So guys, that was it for the demo of classification algorithms, we discussed regression algorithms and we discussed classification algorithms

Now it's time to talk about unsupervised learning algorithms

Under unsupervised learning algorithms, we try to solve clustering problems

And the most important clustering algorithm there is, is known as K-means clustering

So we're going to discuss the K-means algorithm, and also show you a demo where we'll be executing the clustering algorithm, and you'll see how it is implemented to solve a problem

Now, the main aim of the K-means algorithm is to group similar elements or data points into a cluster

So it is basically the process by which objects are classified into a predefined number of groups, so that they are as dissimilar as possible from one group to another group, but as similar as possible within each group

Now what I mean is let's say you're trying to cluster this population into four different groups, such that each group has people within a specified range of age

Let's say group one is of people between the age 18 and 22

Similarly, group two is between 23 and 35

Group three is 36 and 39 or something like that

So let's say you're trying to cluster people into different groups based on their age

So for such problems, you can make use of the K-means clustering algorithm

One of the major applications of the clustering algorithm is seen in targeted marketing

I don't know how many of you are aware of targeted marketing

Targeted marketing is all about marketing a specific product to a specific audience

Let's say you're trying to sell fancy clothes or a fancy set of bags and all of that

And the perfect audience for such a product would be teenagers

It would be people around the age of 16 to 21 or 18

So that is what targeted marketing is all about

Your product is marketed to a specific audience that might be interested in it

That is what targeted marketing is

So K-means clustering is used majorly in targeted marketing

A lot of eCommerce websites like Amazon, Flipkart, eBay

All of these make use of clustering algorithms in order to target the right audience

Now let's see how the K-means clustering works

Now the K in K-means denotes the number of clusters

Let's say I give you a data set containing 20 points, and you want to cluster this data set into four clusters

That means your K will be equal to four

So K basically stands for the number of clusters in your data set, or the number of clusters you want to form

You start by defining the number K

Now for each of these clusters, you're going to choose a centroid

So there are four clusters in our data set

For each of these clusters, you'll randomly select one of the data points as a centroid

Now what you'll do is you'll start computing the distance from that centroid to every other point in that cluster

As you keep computing the centroid and the distance between the centroid and other data points in that cluster, your centroid keeps shifting, because you're trying to get to the average of that cluster

Whenever you're trying to get to the average of the cluster, the centroid keeps shifting, because the centroid keeps converging and it keeps shifting

Let's try to understand how K-means works

Let's say that this data set is given to us

Let's say you're given random points like these and you're asked to use the K-means algorithm on this

So your first step will be to decide the number of clusters you want to create

So let's say I wanna create three different clusters

So my K value will be equal to three

The next step will be to provide centroids for all the clusters

What you'll do is initially you'll randomly pick three data points as your centroids for your three different clusters

So basically, this red denotes the centroid for one cluster

Blue denotes a centroid for another cluster

And this green dot denotes the centroid for another cluster

Now what happens in K-means, the algorithm will calculate the Euclidean distance of the points from each centroid and assign the points to the closest cluster

Now since we had three centroids here, now what you're gonna do is you're going to calculate the distance from each and every data point to all the centroids, and you're going to check which data point is closest to which centroid

So let's say your data point A is closest to the blue centroid

So you're going to assign the data point A to the blue cluster

So based on the distance between the centroids and the data points, you're going to form three different clusters

Now again, you're going to calculate the centroids and you're going to form new clusters, which are better clusters, because you're recomputing all those centroids

Basically, your centroids represent the mean of each of your clusters

So you need to make sure that your mean is actually the centroid of each cluster

So you'll keep recomputing these centroids until the position of your centroid does not change

That means that your centroid is actually the mean or the average of that particular cluster

So that's how K-means works

It's very simple

All you have to do is you have to start by defining the K value

After that, you have to randomly pick K centroids

Then you're going to calculate the distance of each of the data points from the centroids, and you're going to assign a data point to the centroid it is closest to

That's how K-means works

It's a very simple process

All you have to do is keep iterating, and you have to recompute the centroid value until the centroid value does not change, until you get a constant centroid value

Now guys, again, in K-means, you make use of distance measures like Euclidean

I've already discussed what Euclidean is all about

So, to summarize how K-means works, you start by picking the number of clusters

Then you pick a centroid

After that, you calculate the distance of the objects to the centroid

Then you group the data points into specific clusters based on their distance

You have to keep recomputing the centroids until they stop moving and each data point is assigned to the closest cluster, so that's how K-means works
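To make those steps concrete, here is a minimal sketch of K-means in plain Python. It is not the code from the demo; the function names, the sample points, and the fixed random seed are assumptions chosen only for illustration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    """Basic K-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Randomly pick k data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its cluster.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster simply keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[c])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

On data this well separated, the first three points end up in one cluster and the last three in the other, whichever starting centroids get picked.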

Now let's look at the elbow method

The elbow method is basically used in order to find out the most optimum K value for a particular problem

So the elbow method is quite simple actually

You start off by computing the sum of squared errors for some values of K

Now sum of squared error is basically the sum of the squared distance between each member of the cluster and its centroid

So you basically calculate the sum of squared errors for different values of K

For example, you can consider K value as two, four, six, eight, 10, 12

Consider all these values, compute the sum of squared errors for each of these values

Now if you plot your K value against your sum of squared errors, you will see that the error decreases as K gets larger

This is because the number of clusters increases

If the number of clusters increases, it means that the distortion gets smaller

The distortion keeps decreasing as the number of clusters increases

That's because the more clusters you have, the closer each centroid will be to its data points

So as you keep increasing the number of clusters, your distortion will also decrease

So the idea of the elbow method is to choose the K at which the distortion decreases abruptly, the point where the curve bends like an elbow

So if you look at this graph at K equal to four, the distortion is abruptly decreasing

So this is how you find the value of K

When your distortion drops abruptly, that is the most optimal K value you should be choosing for your problem statement

So let me repeat the idea behind the elbow method

You're just going to graph the number of clusters you have versus the sum of squared errors

This graph will basically give you the distortion

Now the distortion is obviously going to decrease if you increase the number of clusters, and there is gonna be one point in this graph wherein the distortion decreases very abruptly

Now for that point, you need to find out the value of K, and that'll be your most optimal K value

That's how you choose your K-means K value and your KNN K value as well

So guys, this is how the elbow method is

It's very simple and it can be easily implemented
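As a rough sketch of the elbow method, the snippet below computes the sum of squared errors for several values of K on a small made-up data set. The helper names, the data, and the fixed seed are all assumptions for illustration; with scikit-learn you would typically read the same quantity from the `inertia_` attribute of a fitted `KMeans` model.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fit_centroids(points, k, seed=0, max_iter=100):
    """Run basic K-means and return the final centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def sum_of_squared_errors(points, centroids):
    """Squared distance from every point to its closest centroid, summed."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Hypothetical data with three obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]

sse_by_k = {k: sum_of_squared_errors(points, fit_centroids(points, k))
            for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

If you plot these numbers with K on the x-axis, the value of K at which the curve bends is the one the elbow method suggests.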

Now we're gonna look at a small demo which involves K-means

This is actually a very interesting demo

Now guys, one interesting application of clustering is in color compression with images

For example, imagine you have an image with millions of colors in it

In most images, a large number of colors will be unused, and many of the pixels in the image will have similar or even identical colors

Now having too many colors in your image makes it very hard for image processing and image analysis

So this is one area where K-means is applied very often

It's applied in image segmentation, image analysis, image compression, and so on

So what we're gonna do in this demo is we are going to use an image from the scikit-learn data set

Okay, it is a prebuilt image, and you will need to install the pillow package for this

We're going to use an image from the scikit-learn data set module

So we'll begin by importing the libraries as usual, and we'll be loading our image as china

The image is china.jpg, and we'll be loading this in a variable called china

So if you wanna look at the shape of our image, you can run this command

So we're gonna get a three-dimensional value

So we're getting 427 comma 640 comma three

Now this is basically a three-dimensional array of size height, width, and RGB

It contains red, blue, green contributions, as integers from zero to 255

So, your pixel values range between zero and 255, where zero in all three channels stands for black, and 255 in all three represents white

And basically, that's what this array shape denotes

Now one way we can view this set of pixels is as a cloud of points in a three-dimensional color space

So what we'll do is we will reshape the data and rescale the colors, so that they lie between zero and one

So the output of this will be a two-dimensional array now
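Here is a small sketch of that reshape and rescale step using plain Python lists on a made-up two by two image. The function name and the tiny image are assumptions; with NumPy the same idea is usually written as dividing the array by 255 and reshaping it to height times width rows of three values.

```python
def flatten_and_rescale(image):
    """Turn a height x width x 3 image (integers 0-255) into a flat list of
    [r, g, b] rows, with every channel scaled into the range zero to one."""
    return [[channel / 255.0 for channel in pixel]
            for row in image
            for pixel in row]

# Hypothetical 2 x 2 image: red, green, blue, and white pixels.
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
data = flatten_and_rescale(tiny_image)
print(len(data), len(data[0]))  # 4 3
```

Each row of the result is one pixel, which is exactly the "one data point per pixel" view that K-means needs.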

So basically, we can visualize these pixels in this color space

Now what we're gonna do is we're gonna try and plot our pixels

We have a really huge data set which contains around 16 million possible colors

So this denotes a very, very large data set

So, let me show you what it looks like

We have red against green and red against blue

These are our RGB values, and we can have around 16 million possible combinations of colors

The data set is way too large for us to compute

So what we'll do is we will reduce these 16 million colors to just 16 colors

We can do that by using K-means clustering, because we can cluster similar colors into similar groups

So this is exactly where we'll be importing K-means

Now, one thing to note here is because we're dealing with a very large data set, we will use MiniBatchKMeans

This operates on subsets of the data to compute the results much more quickly than the standard K-means algorithm, at the cost of slightly less exact clusters, because I told you this data set is really huge

Even though this is a single image, the number of pixel combinations can come up to 16 million, which is a lot

Now each pixel is considered as a data point when you've taken an image into consideration

When you have data points and data values, that's different

When you're studying an image for image classification or image segmentation, each and every pixel is considered

So, basically, you're building matrices of all of these pixel values

So having this many pixels, with up to 16 million possible color values, is a very huge data set

So, for that reason, we'll be using MiniBatchKMeans

It's very similar to K-means

The only difference is that it'll operate on subsets of the data

Because the data set is too huge, it'll operate on subsets

So, basically, we're making use of K-means in order to cluster these 16 million color combinations into just 16 colors

So basically, we're gonna form 16 clusters in this data set

Now, the result is the recoloring of the original pixels where every pixel is assigned the color of its closest cluster center

Let's say that there are a couple of colors which are very close to green

So we're going to cluster all of these similar colors into one cluster

We'll keep doing this until we get 16 clusters

So, obviously, to do this, we'll be using the clustering method, K-means
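The recoloring step on its own can be sketched like this. The two cluster centers and the four pixels are made up for illustration; in the demo the centers would come from the fitted MiniBatchKMeans model with 16 clusters.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def recolor(pixels, centers):
    """Replace every pixel with the color of its closest cluster center."""
    return [min(centers, key=lambda c: squared_distance(pixel, c))
            for pixel in pixels]

# Hypothetical cluster centers: one greenish, one reddish (channels in 0..1).
centers = [(0.1, 0.6, 0.1), (0.9, 0.1, 0.1)]
pixels = [(0.0, 0.7, 0.2), (0.2, 0.5, 0.0), (1.0, 0.0, 0.0), (0.8, 0.2, 0.1)]

recolored = recolor(pixels, centers)
print(recolored)
```

After this step the image contains only as many distinct colors as there are cluster centers.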

Let me show you what the output looks like

So, basically, this was the original image from the scikit data set, and this is the 16-color segmented image

Basically, we have only 16 colors here

Here we can have around 16 million colors

Here there are only 16 colors

If you count also, you can only see particular colors

Now obviously there's a lot of distortion over here, but this is how you study an image

Remove all the extra contrast that is there in an image

You try to reduce the pixels to as small a set of data as possible

The more varied pixels you have, the harder it is going to be for you to study the image for analysis

Now, obviously, there are some details which are lost in this

But overall, the image is still recognizable

So here, basically, we've compressed this with a compression factor of around one million, because we've gone from around 16 million possible colors down to just 16

Now this is an interesting application of K-means

There are actually better ways you can compress information in an image

So, basically, I showed you this example because I want you to understand the power of K-means algorithm

You can cluster a dataset that is this huge into just 16 colors

Initially, there were 16 million, and now you can cluster it to 16 colors

So guys, K-means plays a very huge role in computer vision image processing, object detection, and so on

It's a very important algorithm when it comes to detecting objects

So self-driving cars and all can make use of such algorithms

So guys, that was all about unsupervised learning and supervised learning

Now it's the last type of machine learning, which is reinforcement learning

Now this is actually a very interesting part of machine learning, and it is quite different from supervised and unsupervised

So we'll be discussing all the concepts that are involved in reinforcement learning

And also reinforcement learning is a little more advanced

When I say advanced, I mean that it's been used in applications such as self-driving cars and is also a part of a lot of deep learning applications, such as AlphaGo and so on

So, reinforcement learning has a different concept to it itself

So we'll be discussing all the concepts under it

So just to brush up your information about reinforcement learning, reinforcement learning is a part of machine learning where an agent is put in an unknown environment, and it learns how to behave in this environment by performing certain actions and observing the rewards which it gets from these actions

Reinforcement learning is all about taking an appropriate action in order to maximize the reward in a particular situation

Now let's understand reinforcement learning with an analogy

Let's consider a scenario wherein a baby is learning how to walk

This scenario can go about in two different ways

The first is the baby starts walking and it makes it to the candy

And since the candy is the end goal, the baby is very happy and it's positive

Meaning, the baby is happy and it received a positive reward

Now, the second way this can go is that the baby starts walking, but it falls due to some hurdle in between

That's really cute

So the baby gets hurt and it doesn't get to the candy

It's negative because the baby is sad and it receives a negative reward

So just like how we humans learn from our mistakes by trial and error, reinforcement learning is also similar

Here we have an agent, and in this case, the agent is the baby, and the reward is the candy with many hurdles in between

The agent is supposed to find the best possible path to reach the reward

That is the main goal of reinforcement learning

Now the reinforcement learning process has two important components

It has something known as an agent and something known as an environment

Now the environment is the setting that the agent is acting on, and the agent represents the reinforcement learning algorithm

The whole reinforcement learning is basically the agent

The environment is the setting in which you place the agent, and it is the setting wherein the agent takes various actions

The reinforcement learning process starts when the environment sends a state to the agent

Now the agent, based on the observations it makes, takes an action in response to that state

Now, in turn, the environment will send the next state and the respective reward back to the agent

Now the agent will update its knowledge with the reward returned by the environment to evaluate its last action

The loop continues until the environment sends a terminal state, which means that the agent has accomplished all of its tasks
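The loop just described can be sketched in a few lines of Python. The corridor environment below is entirely hypothetical (five positions, goal at the far end), and the agent here only acts randomly and records what happens; it does not learn anything yet.

```python
import random

class CorridorEnvironment:
    """Hypothetical environment: positions 0..4, with the goal at position 4."""

    def __init__(self):
        self.goal = 4
        self.position = 0

    def reset(self):
        """Send the initial state to the agent."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return
        (next_state, reward, done). Walls clamp the position."""
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal
        reward = 1 if done else 0
        return self.position, reward, done

def run_episode(env, rng, max_steps=1000):
    """One pass through the loop: state -> action -> next state and reward."""
    history = []
    state = env.reset()
    for _ in range(max_steps):
        action = rng.choice([-1, 1])          # no knowledge yet, so act randomly
        next_state, reward, done = env.step(action)
        history.append((state, action, reward))
        state = next_state
        if done:                              # terminal state ends the loop
            break
    return history

history = run_episode(CorridorEnvironment(), random.Random(0))
print(len(history), history[-1])
```

The list of (state, action, reward) tuples is the sequence of states, actions, and rewards that the loop produces.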

To understand this better, let's suppose that our agent is playing Counter Strike

The reinforcement learning process can be broken down into a couple of steps

The first step is the reinforcement learning agent, which is basically the player, collects a state, S naught, from the environment

So whenever you're playing Counter Strike, you start off with stage zero or stage one

You start off from the first level

Now based on this state, S naught, the reinforcement learning agent will take an action, A naught

So guys, action can be anything that causes a result

Now if the agent moves left or right in the game, that is also considered as an action

So initially, the action will be random, because the agent has no clue about the environment

Let's suppose that you're playing Counter Strike for the first time

You have no idea about how to play it, so you'll just start randomly

You'll just go with whatever, whichever action you think is right

Now the environment is in stage one

After passing stage zero, the environment will go into stage one

Once the environment updates the stage to stage one, the reinforcement learning agent will get a reward R one from the environment

This reward can be anything like additional points or you'll get additional weapons when you're playing Counter Strike

Now this reinforcement learning loop will go on until the agent is dead or reaches the destination, and it continuously outputs a sequence of states, actions, and rewards

This is exactly how reinforcement learning works

It starts with the agent being put in an environment, and the agent will randomly take some action in state zero

After taking an action, depending on his action, he'll either get a reward and move on to state number one, or he will die and go back to the same state

So this will keep happening until the agent reaches the last stage, or he dies or reaches his destination

That's exactly how reinforcement learning works

Now reinforcement learning is the logic behind a lot of games these days

It's being implemented in various games, such as Dota

A lot of you who play Dota might know this

Now let's talk about a couple of reinforcement learning definitions or terminologies

So, first, we have something known as the agent

Like I mentioned, an agent is the reinforcement learning algorithm that learns from trial and error

An agent is the one that takes actions, for example, a soldier in Counter Strike navigating through the game, going right, left, and all of that is the agent taking some action

The environment is the world through which the agent moves

Now the environment, basically, takes the agent's current state and action as input, and returns the agent's reward and its next state as the output

Next, we have something known as action

All the possible steps that an agent can take are considered as actions

Next, we have something known as state

Now the current condition returned by the environment is known as a state

Reward is an instant return from the environment to appraise the last action of the reinforcement learning agent

All of these terms are pretty understandable

Next, we have something known as policy

Now, policy is the approach that the agent uses to determine the next action based on the current state

Policy is basically the approach with which you go around in the environment

Next, we have something known as value

Now, the expected long-term return with a discount, as opposed to the short-term reward R, is known as value

Now, terms like discount and value, I'll be discussing in the upcoming slides

Action-value is also very similar to the value, except it takes an extra parameter known as the current action

Don't worry about action and Q value

We'll talk about all of this in the upcoming slides

So make yourself familiar with these terms, because we'll be seeing a whole lot of them this session

So, before we move any further, let's discuss a couple more reinforcement learning concepts

Now we have something known as reward maximization

So if you haven't realized it already, the basic aim of a reinforcement learning agent is to maximize the reward

How does this happen? Let's try to understand this in a little more detail

So, basically the agent works based on the theory of reward maximization

Now that's exactly why the agent must be trained in such a way that he takes the best action, so that the reward is maximal

Now let me explain a reward maximization with a small example

Now in this figure, you can see there is a fox, there is some meat, and there is a tiger

Our reinforcement learning agent is the fox

His end goal is to eat the maximum amount of meat before being eaten by the tiger

Now because the fox is a very clever guy, he eats the meat that is closer to him, rather than the meat which is close to the tiger, because the closer he gets to the tiger, the higher are his chances of getting killed

That's pretty obvious

Even if the rewards near the tiger are bigger meat chunks, they'll be discounted

This is exactly what discount is

We just discussed it in the previous slide

This is done because of the uncertainty factor that the tiger might actually kill the fox

Now the next thing to understand is how discounting of a reward works

Now, in order to understand discounting, we define a discount rate called gamma

The value of gamma is between zero and one

And the smaller the gamma, the larger the discount and so on
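To see what gamma does in numbers, here is a small sketch that uses the common convention of multiplying the reward received after t steps by gamma to the power t. The reward sequence is made up: two small nearby chunks of meat followed by one big chunk that is further away.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total

# Hypothetical rewards: small, small, then a big one two steps in the future.
rewards = [1, 1, 10]

print(discounted_return(rewards, 0.9))  # gamma near one: the big reward still counts
print(discounted_return(rewards, 0.1))  # gamma near zero: the big reward almost vanishes
```

With a small gamma the far-away reward contributes very little to the total, which is the "larger discount" from the fox example.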

Now don't worry about these concepts, gamma and all of that

We'll be seeing that in our practical demo today

So let's move on and discuss another concept known as exploration and exploitation trade-off

Now guys, before that, I hope all of you understood reward maximization

Basically, the main aim behind reinforcement learning is to maximize the rewards that an agent can get

Now, one of the most important concepts in reinforcement learning is the exploration and exploitation trade-off

Now, exploration, like the name suggests, is about exploring and capturing more information about an environment

On the other hand, exploitation is about using the already known exploited information to heighten your reward

Now consider the same example that we saw previously

So here the fox eats only the meat chunks which are close to him

He doesn't eat the bigger meat chunks which are at the top, even though the bigger meat chunks would get him more reward

So if the fox only focuses on the closest reward, he will never reach the big chunks of meat

This process is known as exploitation

But if the fox decides to explore a bit, it can find the bigger reward, which is the big chunk of meat

This is known as exploration

So this is the difference between exploitation and exploration

It's always best if the agent explores the environment, tries to figure out a way in which it can get the maximum number of rewards
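One common way to balance the two in code is an epsilon-greedy rule. This technique is not named in the session, so treat it as an assumption: with probability epsilon the agent explores by picking a random action, otherwise it exploits the action with the best reward estimate so far.

```python
import random

def epsilon_greedy(estimated_rewards, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: try any action, even one that currently looks worse.
        return rng.randrange(len(estimated_rewards))
    # Exploitation: take the action with the highest estimated reward.
    return max(range(len(estimated_rewards)), key=lambda a: estimated_rewards[a])

# Hypothetical estimates for three actions (say, three chunks of meat).
estimates = [1.0, 5.0, 2.0]
rng = random.Random(0)

always_exploit = epsilon_greedy(estimates, 0.0, rng)
sometimes_explore = [epsilon_greedy(estimates, 0.3, rng) for _ in range(10)]
print(always_exploit, sometimes_explore)
```

With epsilon set to zero the agent behaves like the fox that only ever goes for the reward it already knows about.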

Now let's discuss another important concept in reinforcement learning, which is known as the Markov's decision process

Basically, the mathematical approach for mapping a solution in reinforcement learning is called Markov's decision process

It's the mathematics behind reinforcement learning

Now, in a way, the purpose of reinforcement learning is to solve a Markov's decision process

Now in order to get a solution, there are a set of parameters in a Markov's decision process

There's a set of actions A, there's a set of states S, a reward R, policy pi, and value V

Also, this image represents how reinforcement learning works

There's an agent

The agent takes some action on the environment

The environment, in turn, will reward the agent, and it will give him the next state

That's how reinforcement learning works

So to sum everything up, what happens in Markov's decision process and reinforcement learning is the agent has to take an action A to transition from the start state to the end state S

While doing so, the agent will receive some reward R for each action he takes

Now the series of actions that are taken by the agent define the policy, and the rewards collected define the value

The main goal here is to maximize the rewards by choosing the optimum policy

So you're gonna choose the best possible approach in order to maximize the rewards

That's the main aim of Markov's decision process

To understand Markov's decision process, let's look at a small example

I'm sure all of you already know about the shortest path problem

We all had such problems and concepts in math to find the shortest path

Now consider this representation over here, this figure

Here, our goal is to find the shortest path between two nodes

Let's say we're trying to find the shortest path between node A and node D

Now each edge, as you can see, has a number linked with it

This number denotes the cost to traverse through that edge

So we need to choose a policy to travel from A to D in such a way that our cost is minimum

So in this problem, the set of states are denoted by the nodes A, B, C, D

The action is to traverse from one node to the other

For example, if you're going from A to C, that is an action

C to B is an action

B to D is another action

The reward is the cost represented by each edge

Policy is the path taken to reach the destination

So we need to make sure that we choose a policy in such a way that our cost is minimal

So what you can do is you can start off at node A, and you can take baby steps to reach your destination

Initially, only the next possible node is visible to you

So from A, you can either go to B or you can go to C

So you can follow the greedy approach and take the most optimum step, which is choosing A to C, instead of choosing A to B to C

Now you're at node C and you want to traverse to node D

Again, you must choose your path very wisely

So if you traverse from A to C, and C to B, and B to D, your cost is the least

But if you traverse from A to C to D, your cost will actually increase

Now you need to choose a policy that will minimize your cost over here

So let's say, for example, the agent chose A to C to D

It came to node C, and then it directly chose D

Now the policy followed by our agent in this problem is exploitation type, because we didn't explore the other nodes

We just selected three nodes and we traversed through them

And the policy we followed is not actually an optimal policy

We must always explore more to find out the optimal policy

Even if the other nodes are not giving us any more reward or are actually increasing our cost, we still have to explore and find out if those paths are actually better

That policy is actually better
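Here is a small sketch of that idea. The edge costs below are made up, because the numbers from the figure are not part of this transcript; they are only chosen so that A to C to B to D comes out cheaper than A to C to D, as described. The code explores every path from A to D and compares the costs.

```python
# Hypothetical edge costs; the real figure's numbers are not available here.
COSTS = {
    "A": {"B": 20, "C": 10},
    "B": {"C": 5, "D": 5},
    "C": {"B": 5, "D": 30},
    "D": {},
}

def all_paths(graph, start, goal, path=None):
    """Explore every path from start to goal that never revisits a node."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for neighbor in graph[start]:
        if neighbor not in path:
            paths.extend(all_paths(graph, neighbor, goal, path))
    return paths

def path_cost(graph, path):
    """Total cost of walking the path edge by edge."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

paths = all_paths(COSTS, "A", "D")
for p in paths:
    print(p, path_cost(COSTS, p))

best = min(paths, key=lambda p: path_cost(COSTS, p))
print("cheapest:", best)
```

Only by looking at all the candidate policies does it become clear that stopping at A to C to D leaves a cheaper route unexplored.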

The method that we implemented here is known as the policy-based learning

Now the aim here is to find the best policy among all the possible policies

So guys, apart from policy-based, we also have value-based approach and action-based approach

Value-based emphasizes maximizing the rewards

And in action-based, we emphasize each action taken by the agent

Now a point to note is that all of these learning approaches have a simple end goal

The end goal is to effectively guide the agent through the environment, and acquire the most number of rewards

So this was a very simple way to understand Markov's decision process, the exploitation and exploration trade-off, and we also discussed the different reinforcement learning definitions

I hope all of this was understandable

Now let's move on and understand an algorithm known as the Q-learning algorithm

So guys, Q-learning is one of the most important algorithms in reinforcement learning

And we'll discuss this algorithm with the help of a small example

We'll study this example, and then we'll implement the same example using Python, and we'll see how it works

So this is how our demonstration looks for now

Now the problem statement is to place an agent in any one of the rooms numbered zero, one, two, three, and four

And the goal is for the agent to reach outside the building, which is room number five

So, basically, this zero, one, two, three, four represents the building, and five represents a room which is outside the building

Now all these rooms are connected by doors

Now these gaps that you see between the rooms are basically doors, and each room is numbered from zero to four

The outside of the building can be thought of as a big room which is room number five

Now if you've noticed this diagram, the doors of room number one and room number four lead directly to room number five

From one, you can directly go to five, and from four, also, you can directly go to five

But if you want to go to five from room number two, then you'll first have to go to room number three, room number one, and then room number five

So these are indirect links

Direct links are from room number one and room number four

So I hope all of you are clear with the problem statement

You're basically going to have a reinforcement learning agent, and that agent has to traverse through all the rooms in such a way that he reaches room number five

To solve this problem, first, what we'll do is we'll represent the rooms on a graph

Now each room is denoted as a node, and the links that are connecting these nodes are the doors

Alright, so we have nodes zero to five, and the links between each of these nodes represent the doors

So, for example, if you look at this graph over here, you can see that there is a direct connection from one to five, meaning that you can directly go from room number one to your goal, which is room number five

So if you want to go from room number three to five, you can either go to room number one, and then go to five, or you can go from room number three to four, and then to five

So guys, remember, the end goal is to reach room number five

Now to set room number five as the goal state, what we'll do is we'll associate a reward value to each door

The doors that lead immediately to the goal will have an instant reward of 100

So, basically, one to five will have a reward of hundred, and four to five will also have a reward of hundred

Now other doors that are not directly connected to the target room will have a zero reward, because they do not directly lead us to that goal

So let's say you placed the agent in room number three

So to go from room number three to one, the agent will get a reward of zero

And to go from one to five, the agent will get a reward of hundred

Now because the doors are two-way, two arrows are assigned to each room

You can see an arrow going towards the room and one coming from the room

So each arrow contains an instant reward as shown in this figure

Now of course room number five will loop back to itself with a reward of hundred, and all other direct connections to the goal room will carry a reward of hundred

Now in Q-learning, the goal is to reach the state with the highest reward

So that if the agent arrives at the goal, it will remain there forever

So I hope all of you are clear with this diagram

Now, the terminologies in Q-learning include two terms, state and action

Okay, your room basically represents the state

So if you're in state two, it basically means that you're in room number two

Now the action is basically the movement of the agent from one room to the other room

Let's say you're going from room number two to room number three

That is basically an action

Now let's consider some more examples

Let's say you place the agent in room number two and he has to get to the goal

So your initial state will be state number two or room number two

Then from room number two, you'll go to room number three, which is state three

Then from state three, you can either go back to state two or go to state one or state four

If you go to state four, from there you can directly go to your goal room, which is five

This is how the agent is going to traverse

Now in order to depict the rewards that you're going to get, we're going to create a matrix known as the reward matrix

Okay, this is represented by R or also known as the R matrix

Now the minus one in this table represents null values

That is basically where there isn't a link between the nodes, that is represented as minus one

Now there is no link between zero and zero

That's why it's minus one

Now if you look at this diagram, there is no direct link from zero to one

That's why I've put minus one over here as well

But if you look at zero comma four, we have a value of zero over here, which means that you can traverse from zero to four, but your reward is going to be zero, because four is not your goal state

However, if you look at the matrix, look at one comma five

In one comma five, we have a reward value of hundred

This is because you can directly go from room number one to five, and five is the end goal

That's why we've assigned a reward of hundred

Similarly, for four comma five, we have a reward of hundred

And for five comma five, we have a reward of hundred

Zeroes basically represent other links, but they are zero because they do not lead to the end goal

So I hope you all understood the reward matrix

It's very simple
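The reward matrix can be built directly from the list of doors. The door list below is pieced together from the links mentioned in this walkthrough (zero to four, one to three, one to five, two to three, three to four, four to five), so treat it as an assumption about the figure; the function name is also just illustrative.

```python
GOAL = 5
# Two-way doors between rooms, as described in the walkthrough.
DOORS = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]

def build_reward_matrix(doors, goal, n_rooms=6):
    """R[state][action]: -1 = no door, 0 = door, 100 = door into the goal."""
    R = [[-1] * n_rooms for _ in range(n_rooms)]
    for a, b in doors:
        # Every door can be used in both directions.
        for src, dst in ((a, b), (b, a)):
            R[src][dst] = 100 if dst == goal else 0
    # The goal room loops back to itself with a reward of 100.
    R[goal][goal] = 100
    return R

R = build_reward_matrix(DOORS, GOAL)
for row in R:
    print(row)
```

Row numbers are the current room and column numbers are the room the agent moves to, so R one comma five is the entry in row one, column five.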

Now before we move any further, we'll be creating another matrix known as the Q-table or Q matrix

Now the Q matrix basically represents the memory of what the agent has learned through experience

The rows of the Q matrix will represent the current state of the agent

The columns will represent the next possible actions leading to the next state, and the formula to calculate the Q matrix is this formula, right? Here we have Q state comma action, which equals R state comma action, which is nothing but the reward matrix

Then we add a parameter known as the Gamma parameter, which I'll explain shortly

And we are multiplying this with the maximum of Q next state comma all actions

Now don't worry if you haven't understood this formula

I'll explain this with a small example

For now, let's understand what the Gamma parameter is

So, basically, the value of Gamma will be between zero and one

If Gamma is closer to zero, it means that the agent will tend to consider only immediate rewards

Now, if the Gamma is closer to one, it means that the agent will consider future rewards with greater weight

Now what exactly I'm trying to say is if Gamma is closer to zero, then we'll be performing something known as exploitation

I hope you all remember what exploitation and exploration trade-off is

So, if your gamma is closer to zero, it means that the agent is not going to explore the environment

Instead, it'll just choose a couple of states, and it'll just traverse through those states

But if your gamma parameter is closer to one, it means that the agent will traverse through all possible states, meaning that it'll perform exploration, not exploitation

So the closer your gamma parameter is to one, the more your agent will explore

This is exactly what the Gamma parameter is

If you want to get the best policy, it's always practical that you choose a Gamma parameter which is closer to one

We want the agent to explore the environment as much as possible so that it can get the best policy and the maximum rewards

I hope this is clear

Now let me just tell you what the Q-learning algorithm is step by step

So you begin the Q-learning algorithm by setting the Gamma parameter and the environment rewards in matrix R

Okay, so, first, you'll have to set these two values

We've already calculated the reward matrix

We need to set the Gamma parameter

Next, you'll initialize the matrix Q to zero

Now why do you do this? If you remember, I said that the Q matrix is basically the memory of the agent

Initially, obviously, the agent has no memory of the environment

It's new to the environment and you're placing it randomly anywhere

So it has zero memory

That's why you initialize the matrix Q to zero

After that, you'll select a random initial state, and you place your agent in that initial state

Then you'll set this initial state as your current state

Now from the current state, you'll select some action that will lead you to the next state

Then you'll basically get the maximum Q value for this next state, based on all the possible actions that we take

Then you'll keep computing the Q value until you reach the goal state
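Those steps can be put together in a short sketch. This is not the demo code from the session; the number of episodes, the random seed, and the helper names are assumptions, and the reward matrix is written out from the room layout described earlier. The update used is Q(state, action) = R(state, action) + Gamma times the maximum of Q(next state, all actions).

```python
import random

# Reward matrix for rooms 0..5: -1 = no door, 0 = door, 100 = door into the goal.
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]
GOAL = 5
GAMMA = 0.8

def valid_actions(state):
    """Rooms that can be reached through a door from the given room."""
    return [a for a, r in enumerate(R[state]) if r != -1]

def train(episodes=1000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * 6 for _ in range(6)]        # the agent starts with no memory
    for _ in range(episodes):
        state = rng.randrange(6)             # random initial state
        while True:
            action = rng.choice(valid_actions(state))
            # Moving through a door puts the agent in the room named by the action.
            Q[state][action] = R[state][action] + GAMMA * max(Q[action])
            state = action
            if state == GOAL:
                break
    return Q

def greedy_path(Q, start):
    """Follow the highest Q value from room to room until the goal is reached."""
    path = [start]
    while path[-1] != GOAL and len(path) < 10:
        state = path[-1]
        path.append(max(valid_actions(state), key=lambda a: Q[state][a]))
    return path

Q = train()
print(greedy_path(Q, 2))
```

After training, following the largest Q value from room two leads through room three and then through either room one or room four to room five.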

Now that might be a little bit confusing, so let's look at this entire thing with a small example

Let's say that first, you're gonna begin with setting your Gamma parameter

So I'm setting my Gamma parameter to 0.8, which is pretty close to one

This means that our agent will explore the environment as much as possible

And also, I'm setting the initial state as room one

Meaning, I'm in state one or I'm in room one

So basically, your agent is going to be in room number one

The next step is to initialize the Q matrix as a zero matrix

So this is a Q matrix

You can see that everything is set to zero, because the agent has no memory at all

He hasn't traversed to any node, so he has no memory

Now since the agent is in room one, he can either go to room number three or he can go to room number five

Let's randomly select room number five

So, from room number five, you're going to calculatethe maximum Q value for the next state basedon all possible actions

So all the possible actionsfrom room number five is one, four, and five

So, basically, the traversingfrom Q one comma five, that's why I put one comma five over here, state comma action

Your reward matrix willhave R one comma five

Now R one comma five is basically hundred

That's why I put hundred over here

Now your comma parameter is 0

8

So, guys, what I'm doing here is I'm just substituting the values in this formula

So let me just repeat this whole thing

Q state comma action

So you're in state number one, correct? And your action is you're going to room number five

So your Q state comma action is one comma five

Again, your reward matrix R one comma five is hundred

So here you're gonna put hundred, plus your Gamma parameter

Your Gamma parameter is 0.8

Then you're going to calculate the maximum Q value for the next state based on all possible actions

So let's look at the next state

From room number five, you can go to either one

You can go to four or you can go to five

So your actions are five comma one, five comma four, and five comma five

That's exactly what I mentioned over here

Q five comma one, Q five comma four, and Q five comma five

You're basically putting all the next possible actions from state number five

From here, you'll calculate the maximum Q value that you're getting for each of these

Now your Q value is zero, because, initially, your Q matrix is set to zero

So you're going to get zero for Q five comma one, five comma four, and five comma five

So that's why you'll get 0.8 into zero, and hence your Q one comma five becomes hundred

This hundred comes from R one comma five

I hope all of you understood this

So next, what you'll do is you'll update this one comma five value in your Q matrix, because you just calculated Q one comma five

So I've updated it over here

Now for the next episode, we'll start with a randomly chosen initial state

Again, let's say that we randomly chose state number three

Now from room number three, you can either go to room number one, two or four

Let's randomly select room number one

Now, from room number one, you'll calculate the maximum Q value for the next possible actions

So let's calculate the Q formula for this

So your Q state comma action becomes three comma one, because you're in state number three and your action is you're going to room number one

So your R three comma one, let's see what R three comma one is

R three comma one is zero

So you're going to put zero over here, plus your Gamma parameter, which is 0.8, and then you're going to check the next possible actions from room number one, and you're going to choose the maximum value from these two

So Q one comma three and Q one comma five denote your next possible actions from room number one

So Q one comma three is zero, but Q one comma five is hundred

So we just calculated this hundred in the previous step

So, out of zero and hundred, hundred is your maximum value, so you're going to choose hundred

Now 0.8 into hundred is nothing but 80

So again, your Q matrix gets updated

You see an 80 over here

So, basically what you're doing is as you're taking actions, you're updating your Q value, you're just calculating the Q value at every step, you're putting it in your Q matrix so that your agent remembers that, okay, when I went from room number one to room number five, I had a Q value of hundred

Similarly, three to one gave me a Q value of 80

So basically, this Q matrix represents the memory of your agent
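
The two updates from this walkthrough can be reproduced in a few lines. This is only a minimal sketch of the arithmetic, using the numbers quoted above (Gamma of 0.8, R one comma five of hundred, R three comma one of zero); the variable names are made up for illustration

```python
import numpy as np

gamma = 0.8
Q = np.zeros((6, 6))  # the agent's memory, all zeros at the start

# Episode 1: state 1, action "go to room 5", R(1, 5) = 100.
# From room 5 the possible next actions are 1, 4 and 5.
Q[1, 5] = 100 + gamma * max(Q[5, 1], Q[5, 4], Q[5, 5])

# Episode 2: state 3, action "go to room 1", R(3, 1) = 0.
# From room 1 the possible next actions are 3 and 5.
Q[3, 1] = 0 + gamma * max(Q[1, 3], Q[1, 5])

print(Q[1, 5], Q[3, 1])
```

The first update gives hundred because every Q value is still zero, and the second gives 80 because the agent now remembers the hundred stored for one comma five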

I hope all of you are clear with this

So basically, what we're gonna do is we're gonna keep iterating through this loop until we've gone through all possible states and reach the goal state, which is five

Also, our main aim here is to find the most optimum policy to get to room number five

Now let's implement the exact same thing using Python

So that was a lot of theory

Now let's understand how this is done practically

Alright, so we begin by importing your library

We're gonna be using the NumPy library over here

After that, we'll import the R matrix

We've already created the R matrix

This is the exact matrix that I showed you a couple of minutes ago

So I've created a matrix called R and I've basically stored all the rewards in it

If you want to see the R matrix, let me print it

So, basically, this is your R matrix

If you remember, node one to five, you have a reward of hundred

Node four to five, you have a reward of hundred, and five to five, you have a reward of hundred, because all of these nodes directly lead us to the reward

Correct? Next, what we're doing is we're creating a Q matrix which is basically a six into six matrix

Which represents all the states, zero to five

And this matrix is basically zero

After that, we're setting the Gamma parameter

Now guys, you can play around with this code, and you know you can change the Gamma parameter to 0.7 or 0.9 and see how much weight the agent gives to future rewards compared with immediate ones

Here I've set the Gamma parameter to 0.8, which is a pretty good number

Now what I'm doing is I'm setting the initial state as one

You can randomly choose this state according to your needs

I've set the initial state as one

Now, this function will basically give me all the available actions from my initial state

Since I've set my initial state as one, it'll give me all the possible actions

Here what I'm doing is since my initial state is one, I'm checking in my row number one, which value is equal to zero or greater than zero

Those denote my available actions

So look at our row number one

Here we have one zero and we have a hundred over here

This is one comma three and this is one comma five

So if you look at the row number one, since I've selected the initial state as one, we'll consider row number one

Okay, what I'm doing is in row number one, I have two numbers which are either equal to zero or greater than zero

These denote my possible actions

One comma three has the value of zero and one comma five has the value of hundred, which means that the agent can either go to room number three or it can go to room number five

What I'm trying to say is from room number one, you can basically go to room number three or room number five

This is exactly what I've coded over here

If you remember the reward matrix, from one you can traverse to only room number three directly and room number five directly

Okay, that's exactly what I've mentioned in my code over here

So this will basically give me the available actions from my current state

Now once I've moved to my next state, I need to check the available actions from that state

What I'm doing over here is basically this

If you remember, from room number one, we can go to three and five, correct? And from three and five, I'll randomly select the state

And from that state, I need to find out all possible actions

That's exactly what I've done over here

Okay

Now this will randomly choose an action for me from all my available actions
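
The code on screen is not part of this transcript, so here is a rough sketch of what these two helper functions could look like. The R matrix is reconstructed from the rooms described in this session, with minus one assumed for "no door" and the doors for room zero assumed as well; the names available_actions and sample_next_action are hypothetical

```python
import numpy as np

# Assumed reward matrix: -1 = no door, 0 = door, 100 = door into goal room 5.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable


def available_actions(state):
    """Return the rooms reachable from `state` (entries of row `state` >= 0)."""
    return np.where(R[state] >= 0)[0]


def sample_next_action(actions):
    """Pick one of the available actions at random."""
    return int(rng.choice(actions))


actions = available_actions(1)
print(actions)                      # rooms reachable from room 1
print(sample_next_action(actions))  # one of them, chosen at random
```

For row number one this picks out rooms three and five, which matches what we just read off the reward matrix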

Next, we need to update our Q matrix, depending on the actions that we took, if you remember

So that's exactly what this update function is for

Now guys, this entire function is for calculating the Q value

I hope all of you remember the formula, which is Q state comma action equals R state comma action plus Gamma into max value

Max value will basically give me the maximum value out of all the possible actions from the next state

I'm basically computing this formula

Now this will just update the Q matrix
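
As a hedged sketch, an update function for this formula might look like the following. The R matrix is rebuilt from the room layout described in this session (the doors for room zero and the minus one entries are assumptions), and the function name update is hypothetical

```python
import numpy as np

# Assumed reward matrix: -1 = no door, 0 = door, 100 = door into goal room 5.
R = np.full((6, 6), -1)
for a, b in [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]:
    R[a, b] = R[b, a] = 0
R[[1, 4, 5], 5] = 100

gamma = 0.8
Q = np.zeros((6, 6))


def update(state, action):
    """Q(state, action) = R(state, action) + gamma * max Q(next state, all actions)."""
    # The action is the room we move into, so it is also the next state.
    next_actions = np.where(R[action] >= 0)[0]
    max_value = Q[action, next_actions].max()
    Q[state, action] = R[state, action] + gamma * max_value


update(1, 5)  # the first episode from the walkthrough
update(3, 1)  # the second episode from the walkthrough
print(Q[1, 5], Q[3, 1])
```

Calling it for the two episodes we worked out by hand gives back the same hundred and 80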

Coming to the training phase, what we're gonna do is we are going to set a range

Here I've set a range of 10,000, meaning that my agent will perform 10,000 iterations

You can set this depending on your own needs, and 10,000 iterations is a pretty huge number

So, basically, my agent is going to go through 10,000 iterations in order to find the best policy

Now this is the exact same thing that we did earlier

We're setting the current state, and then we're choosing the available actions from the current state

Then from there, we'll choose an action at random

Here we'll calculate a Q value and we'll update the Q value in the matrix

Alright

And here I'm doing nothing, but I'm printing the trained Q matrix
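
Put together, the training phase could be sketched like this. It is not the exact code from the video, only a minimal version under the same assumptions as before: an R matrix reconstructed from the room layout, Gamma of 0.8, 10,000 iterations, and a seeded random generator so that the run is repeatable

```python
import numpy as np

# Assumed reward matrix: -1 = no door, 0 = door, 100 = door into goal room 5.
R = np.full((6, 6), -1)
for a, b in [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]:
    R[a, b] = R[b, a] = 0
R[[1, 4, 5], 5] = 100

gamma = 0.8
Q = np.zeros((6, 6))
rng = np.random.default_rng(0)

for _ in range(10_000):
    state = rng.integers(0, 6)                  # random current state
    actions = np.where(R[state] >= 0)[0]        # available actions
    action = rng.choice(actions)                # choose one at random
    next_actions = np.where(R[action] >= 0)[0]  # actions from the next state
    Q[state, action] = R[state, action] + gamma * Q[action, next_actions].max()

print(np.round(Q))  # the trained Q matrix
```

With these rewards the largest Q values settle at around 500, since a reward of hundred keeps getting added to 0.8 times itself until it stops changing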

This was the training phase

Now in the testing phase, basically, you're going to randomly choose a current state

You're gonna choose a current state, and you're going to keep looping through this entire code, until you reach the goal state, which is room number five

That's exactly what I'm doing in this whole thing

Also, in the end, I'm printing the selected path

That is basically the policy that the agent took to reach room number five

Now if I set the current state as one, it should give me the best policy to reach room number five from room number one

Alright, let's run this code, and let's see if it's giving us that

Now before that happens, I want you to check and tell me which is the best possible way to get from room number one to room number five

It's obviously directly like this

One to five is the best policy to get from room number one to room number five

So we should get an output of one comma five

That's exactly what we're getting. This is the Q matrix with all the Q values, and here we are getting the selected path

So if your current state is one, your best policy is to go from one to five

Now, if you want to change your current state, let's say we set the current state to two

And before we run the code, let's see which is the best possible way to get to room number five from room number two

From room number two, you can go to three, then you can go to one, and then you can go to five

This will give you a reward of hundred, or you can go to room number three, then go to four, and then go to five

This will also give you a reward of hundred

Our path should be something like that

Let's save it and let's run the file

So, basically, from state two, you're going to go to three, then to four, and then to five

This is our best possible path from two to room number five
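
Here is a small sketch of the testing phase. To keep it short, the Q matrix is hard-coded with the values that the update rule converges to for the assumed reward matrix and a Gamma of 0.8 (worked out from the formula, not copied from the video), and best_path is a hypothetical helper name

```python
import numpy as np

# Converged Q values for the assumed six-room layout with gamma = 0.8.
Q = np.array([
    [  0,   0,   0,   0, 400,   0],
    [  0,   0,   0, 320,   0, 500],
    [  0,   0,   0, 320,   0,   0],
    [  0, 400, 256,   0, 400,   0],
    [320,   0,   0, 320,   0, 500],
    [  0, 400,   0,   0, 400, 500],
])


def best_path(start, goal=5):
    """Follow the highest Q value from each state until the goal is reached."""
    path = [start]
    state = start
    while state != goal:
        state = int(np.argmax(Q[state]))  # np.argmax takes the first of tied values
        path.append(state)
    return path


print(best_path(1))  # [1, 5]
print(best_path(2))  # [2, 3, 1, 5]
```

From room three, going to one and going to four have the same Q value, so two, three, one, five and two, three, four, five are equally good; which one you see depends on how the tie is broken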

So, guys, this is exactly how the Q-learning algorithm works, and this was a simple implementation of the entire example that I just told you

Now if any of you still have doubts regarding Q-learning or reinforcement learning, make sure you post them in the comment section, and I'll try to answer all of your doubts

Now we're done with machine learning

We've completed the whole machine learning module

We've understood reinforcement learning, supervised learning, unsupervised learning, and so on

Before I get to deep learning, I want to clear a very common misconception

A lot of people get confused between AI, machine learning, and deep learning, because, you know, artificial intelligence, machine learning and deep learning are very common applications

For example, Siri is an application of artificial intelligence, machine learning, and deep learning

So how are these three connected? Are they the same thing, or what exactly is the relationship between artificial intelligence, machine learning, and deep learning? This is what I'll be discussing

Now artificial intelligence is basically the science of getting machines to mimic the behavior of human beings

But when it comes to machine learning, machine learning is a subset of artificial intelligence that focuses on getting machines to make decisions by feeding them data

That's exactly what machine learning is

It is a subset of artificial intelligence

Deep learning, on the other hand, is a subset of machine learning that uses the concept of neural networks to solve complex problems

So, to sum it up, artificial intelligence, machine learning, and deep learning, are interconnected fields

Machine learning and deep learning aid artificial intelligence by providing a set of algorithms and neural networks to solve data-driven problems

That's how AI, machine learning, and deep learning are related

I hope all of you have cleared your misconceptions and doubts about AI, ML, and deep learning

Now let's look at our next topic, which is limitations of machine learning

Now the first limitation is machine learning is not capable enough to handle high dimensional data

This is where the input and the output are very large

So handling and processing such type of data becomes very complex and it takes up a lot of resources

This is also sometimes known as the curse of dimensionality

So, to understand this in simpler terms, look at the image shown on this slide

Consider a line of hundred yards and let's say that you dropped a coin somewhere on the line

Now it's quite convenient for you to find the coin by simply walking along the line

This is very simple because this line is considered as a single dimensional entity

Now next, you consider that you have a square of hundred yards, and let's say you dropped a coin somewhere in between

Now it's quite evident that you're going to take more time to find the coin within that square as compared to the previous scenario

The square is, let's say, a two dimensional entity

Let's take it a step ahead and let's consider a cube

Okay, let's say there's a cube of 500 yards and you have dropped a coin somewhere in between this cube

Now it becomes even more difficult for you to find the coin this time, because this is a three dimensional entity

So, as your dimension increases, the problem becomes more complex

So if you observe, the complexity is increasing with the increase in your dimensions, and in real life, the high dimensional data that we're talking about has thousands of dimensions, which makes it very complex to handle and process

And high dimensional data can easily be found in use cases like image processing, natural language processing, image translation, and so on

Now in K-means itself, we saw that we had 16 million possible colors

That is a lot of data
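
To put rough numbers on the coin example, here is a tiny sketch that counts how many one-yard cells you would have to check as the dimension goes up. It assumes the same side of hundred yards in every case just to keep the comparison simple, and it assumes the 16 million colors come from three 8-bit color channels

```python
side = 100  # assumed side length in yards, same for line, square and cube


def cells_to_search(dims):
    """Number of one-yard cells in a line (1), square (2) or cube (3)."""
    return side ** dims


for dims in (1, 2, 3):
    print(dims, cells_to_search(dims))

# Three color channels with 256 levels each (an assumption about the K-means demo).
possible_colors = 256 ** 3
print(possible_colors)
```

Every extra dimension multiplies the search space by another hundred, which is the whole idea behind the curse of dimensionality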

So this is why machine learning is restricted

It cannot easily be used in the process of image recognition because images have a lot of pixels and they have a lot of high dimensional data

That's why machine learning becomes very restrictive when it comes to such use cases

Now the second major challenge is to tell the computer what are the features it should look for that will play an important role in predicting the outcome and in getting a good accuracy

Now this process is something known as feature extraction

Now feeding raw data to the algorithm rarely works, and this is the reason why feature extraction is a critical part of the machine learning workflow

Now the challenge for the programmer here increases because the effectiveness of the algorithm depends on how insightful the programmer is

As a programmer, you have to tell the machine that these are the features

And depending on these features, you have to predict the outcome

That's how machine learning works

So far, in all our demos, we saw that we were providing predictor variables

We were providing input variables that will help us predict the outcome

We were trying to find correlations between variables, and we were trying to find out the variable that is very important in predicting the output variable

So this becomes a challenge for the programmer

That's why it's very difficult to apply a machine learning model to complex problems like object recognition, handwriting recognition, natural language processing, and so on

Now all these problems and all these limitations in machine learning led to the introduction of deep learning

Now we're gonna discuss deep learning

Now deep learning is one of the few methods by which we can overcome the challenges of feature extraction

This is because deep learning models are capable of learning to focus on the right features by themselves, which requires very little guidance from the programmer

Basically, deep learning mimics the way our brain functions

That is it learns from experience

So in deep learning, what happens is feature extraction happens automatically

You need very little guidance by the programmer

So deep learning will learn the model, and it will understand which feature or which variable is important in predicting the outcome

Let's say you have millions of predictor variables for a particular problem statement

How are you going to sit down and understand the significance of each of these predictor variables? It's going to be almost impossible to sit down with so many features

That's why we have deep learning

Whenever there's high dimensionality data or whenever the data is really large and it has a lot of features and a lot of predictor variables, we use deep learning

Deep learning will extract features on its own and understand which features are important in predicting your output

So that's the main idea behind deep learning

Let me give you a small example also

Suppose we want to make a system that can recognize the faces of different people in an image

Okay, so, basically, we're creating a system that can identify the faces of different people in an image

If we solve this by using the typical machine learning algorithms, we'll have to define facial features like eyes, nose, ears, et cetera

Okay, and then the system will identify which features are more important for which person

Now, if you consider deep learning for the same example, deep learning will automatically find out the features which are important for classification, because it uses the concept of neural networks, whereas in machine learning we have to manually define these features on our own

That's the main difference between deep learning and machine learning

Now the next question is how does deep learning work? Now when people started coming up with deep learning, their main aim was to re-engineer the human brain

Okay, deep learning studies the basic unit of a brain called the brain cell or a neuron

All of you biology students will know what I'm talking about

So, basically, deep learning is inspired by our brain structure

Okay, in our brains, we have something known as neurons, and these neurons are replicated in deep learning as artificial neurons, which are also called perceptrons

Now, before we understand how artificial neural networks or artificial neurons work, let's understand how these biological neurons work, because I'm not sure how many of you are bio students over here

So let's understand the functionality of biological neurons and how we can mimic this functionality in a perceptron or in an artificial neuron

So, guys, if you look at this image, this is basically an image of a biological neuron

If you focus on the structure of the biological neuron, it has something known as dendrites

These dendrites are basically used to receive inputs

Now the inputs are basically summed in the cell body, and the result is passed on to the next biological neuron

So, through dendrites, you're going to receive signals from other neurons, basically, input

Then the cell body will sum up all these inputs, and the axon will transmit this input to other neurons

The axon will fire up once some threshold is reached, and the signal will get passed on to the next neuron

So similar to this, a perceptron or an artificial neuron receives multiple inputs, and applies various transformations and functions and provides us an output

These multiple inputs are nothing but your input variables or your predictor variables

You're feeding input data to an artificial neuron or to a perceptron, and this perceptron will apply various functions and transformations, and it will give you an output

Now just like our brain consists of multiple connected neurons called neural networks, we also build something known as a network of artificial neurons called artificial neural networks

So that's the basic concept behind deep learning

To sum it up, what exactly is deep learning? Now deep learning is a collection of statistical machine learning techniques used to learn feature hierarchies based on the concept of artificial neural networks

So the main idea behind deep learning is artificial neural networks, which are loosely modeled on how our brain works

Now in this diagram, you can see that there are a couple of layers

The first layer is known as the input layer

This is where you'll receive all the inputs

The last layer is known as the output layer which provides your desired output

Now, all the layers which are there between your input layer and your output layer are known as the hidden layers

Now, there can be any number of hidden layers, thanks to all the resources that we have these days

So you can have hundreds of hidden layers in between

Now, the number of hidden layers and the number of perceptrons in each of these layers will entirely depend on the problem or on the use case that you're trying to solve

So this is basically how deep learning works

So let's look at the example that we saw earlier

Here what we want to do is we want to perform image recognition using deep networks

First, what we're gonna do is we are going to pass this high dimensional data to the input layer

To match the dimensionality of the input data, the input layer will contain multiple sub layers of perceptrons so that it can consume the entire input

Okay, so you'll have multiple sub layers of perceptrons

Now, the output received from the input layer will contain patterns and will only be able to identify the edges of the images, based on the contrast levels

This output will then be fed to hidden layer number one where it'll be able to identify facial features like your eyes, nose, ears, and all of that

Now from here, the output will be fed to hidden layer number two, where it will be able to form entire faces. It'll go deeper into face recognition, and this output of the hidden layer will be sent to the output layer or any other hidden layer that is there before the output layer

Now, finally, the output layer will perform classification, based on the result that you'd get from your previous layers

So, this is exactly how deep learning works

This is a small analogy that I use to make you understand what deep learning is

Now let's understand what a single layer perceptron is

So like I said, perceptron is basically an artificial neuron

There's something known as single layer and multiple layer perceptron, and we'll first focus on single layer perceptron

Now before I explain what a perceptron really is, you should know that perceptrons are linear classifiers

A single layer perceptron is a linear binary classifier

It is used mainly in supervised learning, and it helps to classify the given input data into separate classes

So this diagram basically represents a perceptron

A perceptron has multiple inputs

It has a set of inputs labeled X one, X two, until X n

Now each of these inputs is given a specific weight

Okay, so W one represents the weight of input X one

W two represents the weight of input X two, and so on

Now how you assign these weights is a different thing altogether

But for now, you need to know that each input is assigned a particular weightage

Now what a perceptron does is it computes some functions on these weighted inputs, and it will give you the output

So, basically, these weighted inputs go through something known as summation

Okay, summation is nothing but adding up the product of each of your inputs with its respective weight

Now after the summation is done, this is passed on to the transfer function

A transfer function is nothing but an activation function

I'll be discussing more about this in a minute

The activation function

And from the activation function, you'll get the outputs Y one, Y two, and so on

So guys, you need to understand four important parts in a perceptron

So, firstly, you have the input values

You have X one, X two, X three

You have something known as weights and bias, and then you have something known as the net sum and finally the activation function

Now, all the inputs X are multiplied with the respective weights

So, X one will be multiplied with W one

This is known as the summation

After this, you'll add all the multiplied values, and we'll call this the weighted sum

This is done using the summation function

Now we'll apply the weighted sum to a correct activation function

Now, a lot of people have a confusion about activation function

Activation function is also known as the transfer function

Now, in order to understand the activation function, you should know that this term stems from the way neurons in a human brain work

The neuron becomes active only after a certain potential is reached

That threshold is known as the activation potential

Therefore, mathematically, it can be represented by a function that reaches saturation after a threshold

Okay, we have a lot of activation functions like signum, sigmoid, tanh, and so on

You can think of activation function as a function that maps the input to the respective output
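
The three activation functions just mentioned can be written in a few lines of NumPy. This is only an illustrative sketch; the function names are my own choice and the sample inputs are arbitrary

```python
import numpy as np


def signum(x):
    """Returns -1, 0 or 1 depending on the sign of x."""
    return np.sign(x)


def sigmoid(x):
    """Squashes x into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    """Squashes x into the range (-1, 1)."""
    return np.tanh(x)


x = np.array([-3.0, 0.0, 3.0])
print(signum(x))
print(sigmoid(x))
print(tanh(x))
```

Notice how sigmoid and tanh flatten out for large positive and large negative inputs, which is the saturation after a threshold that we just talked about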

And now I also spoke about weights and bias

Now why do we assign weights to each of these inputs? What weights do is they show the strength of a particular input, or how important a particular input is for predicting the final output

So, basically, the weightage of an input denotes the importance of that input

Now, the bias basically allows us to shift the activation function in order to get a precise output

So that was all about perceptrons
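
To tie the four parts together, here is a minimal sketch of a single perceptron: inputs, weights and bias, the net sum, and a step activation. The weights, the bias and the choice of a step function are made-up values for illustration, not something taken from the slides

```python
import numpy as np


def perceptron(x, w, b):
    """Weighted sum of the inputs plus bias, followed by a step activation."""
    net_sum = np.dot(w, x) + b       # summation: w1*x1 + w2*x2 + ... + b
    return 1 if net_sum >= 0 else 0  # step activation: fire or don't fire


w = np.array([0.5, -0.2, 0.1])  # hypothetical weights W one, W two, W three
b = -0.4                        # hypothetical bias

print(perceptron(np.array([1, 0, 1]), w, b))  # net sum is about 0.2, so it fires
print(perceptron(np.array([0, 1, 0]), w, b))  # net sum is about -0.6, so it doesn't
```

Changing the bias shifts the point at which the perceptron starts to fire, which is what shifting the activation function means in practice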

Now in order to make you understand perceptrons better, let's look at a small analogy

Suppose that you wanna go to a party happening near your house

Now your decision will depend on a set of factors

First is how is the weather

Second, probably, is whether your wife, or your girlfriend, or your boyfriend is going with you

And third, is there any public transport available? Let's say these are the three factors that you're going to consider before you go to a party

So, depending on these predictor variables or these features, you're going to decide whether you're going to stay at home or go and party

Now, how is the weather is going to be your first input

We'll represent this with a value X one

Is your wife going with you is another input X two

Whether any public transport is available is another input X three

Now, X one will have two values, one and zero

One represents that the weather is good

Zero represents weather is bad

Similarly, one represents that your wife is going, and zero represents that your wife is not going

And in X three, again, one represents that there is public transport, and zero represents that there is no public transport

Now your output will either be one or zero

One means you are going to the party, and zero means you will be sitting at home

Now in order to understand weightage, let's say that the most important factor for you is your weather

If the weather is good, it means that you will 100% go to the party

Now if your weather is not good, you've decided that you'll sit at home

So the maximum weightage is for your weather variable

So if your weather is really good, you will go to the party

It is a very important factor in order to understand whether you're going to sit at home or you're going to go to the party

So, basically, if X one is equal to one, your output will be one

Meaning that if your weather is good, you'll go to the party

Now let's randomly assign weights to each of our inputs

W one is the weight associated with input X one

W two is the weight with X two and W three is the weight associated with X three

Let's say that your W one is six, your W two is two, and W three is two

Now by using the activation function, you're going to set a threshold of five

Now this means that it will fire when the weather is good and won't fire if the weather is bad, irrespective of the other inputs

Now here, if you consider your first input, which has a weightage of six, that means you're 100% going to go

Let's say you're considering only the second input

This means that you're not going to go, because your weightage is two and your threshold is five

So if your weightage is below your threshold, it means that you're not going to go

Now let's consider another scenario where our threshold is three

This means that it'll fire when either X one is high or the other two inputs are high

Now W two is associated with your wife is going or not

Let's say the weather is bad and you have no public transportation, meaning that your X one and X three are zero, and only your X two is one

Now if your X two is one, your weightage is going to be two

If your weightage is two, you will not go because the threshold value is set to three

The threshold value is set in such a way that if X two and X three are combined together, only then you'll go, or only if X one is true, then you'll go

So you're assigning the threshold in such a way that you will go for sure if the weather is good

This is how you assign the threshold

This is nothing but your activation function

So guys, I hope all of you understood, the most amount of weightage is associated with the input that is very important in predicting your output

This is exactly how a perceptron works
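
The party example can be checked with a few lines of code. This sketch uses the weights six, two, and two and the thresholds five and three from above; the function name will_go is hypothetical, and I'm assuming the perceptron fires when the weighted sum is at least the threshold

```python
def will_go(weather, partner, transport, threshold):
    """Return 1 (go to the party) if the weighted sum reaches the threshold."""
    weighted_sum = 6 * weather + 2 * partner + 2 * transport
    return 1 if weighted_sum >= threshold else 0


# Threshold of five: only good weather is enough to fire.
print(will_go(1, 0, 0, threshold=5))  # 1, weather alone gives 6
print(will_go(0, 1, 1, threshold=5))  # 0, the other two together give only 4

# Threshold of three: good weather, or the other two inputs combined.
print(will_go(0, 1, 0, threshold=3))  # 0, partner alone gives only 2
print(will_go(0, 1, 1, threshold=3))  # 1, partner plus transport gives 4
```

So the weights say how much each factor matters, and the threshold decides how much total weightage you need before you leave the house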

Now let's look at the limitations of a perceptron

Now in a perceptron, there are no hidden layers

There's only an input layer, and there is an output layer

We have no hidden layers in between

And because of this, you cannot classify non-linearly separable data points

Okay, if you have data, like in this figure, how will you separate this?

You cannot use a perceptron to do this

Alright, so complex problems that involve a lot of parameters cannot be solved by a single layer perceptron
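
The figure itself is not in this transcript, so as a stand-in, here is a sketch with the XOR problem, which is a classic example of data that is not linearly separable. The code simply tries a grid of weights and biases for a single perceptron; the grid range and the step activation are my own assumptions, and a grid search is only an illustration, not a proof

```python
import itertools

import numpy as np

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]  # linearly separable
XOR = [0, 1, 1, 0]  # not linearly separable

GRID = np.arange(-2.0, 2.5, 0.5)  # candidate values for w1, w2 and the bias


def single_perceptron_can_learn(targets):
    """Try every (w1, w2, b) on the grid and see if any reproduces `targets`."""
    for w1, w2, b in itertools.product(GRID, repeat=3):
        preds = [1 if w1 * x1 + w2 * x2 + b >= 0 else 0 for x1, x2 in X]
        if preds == targets:
            return True
    return False


print(single_perceptron_can_learn(AND))  # True
print(single_perceptron_can_learn(XOR))  # False
```

A single perceptron can only draw one straight line between the two classes, and no straight line separates the XOR points, which is why we need hidden layers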

That's why we need something known as multiple layer perceptron

So now we'll discuss something known as multilayer perceptron

A multilayer perceptronhas the same structure of a single layer perceptron, but with one or more hidden layer

Okay, and that's why it'sconsider as a deep neural network

So in a single layer perceptron, we had only input layer, output layer

We didn't have any hidden layer

Now when it comes tomulti-layer perceptron, there are hidden layers in between, and then there is the output layer

It was in this similarmanner, like I said, first, you'll have theinput X one, X two, X three, and so on

And each of these inputswill be assigned some weight

W one, W two, W three, and so on

Then you'll calculatethe weighted summation of each of these inputs and their weights

After that, you'll sendthem to the transformation or the activation function, and you'll finally get the output

Now, the only thing isthat you'll have multiple hidden layers in between, one or more than one hidden layers

So, guys, this is how amultilayer perceptron works

It works on the concept offeed forward neural networks

Feed forward meansevery node at each level or each layer is connectedto every other node

So that's what feed forward networks are

Now when it comes to assigning weights, what we do is we randomly assign weights

Initially we have inputX one, X two, X three

We randomly assign someweight W one, W two, W three, and so on

Now it's always necessarythat whatever weights we assign to our input, those weights are actually correct, meaning that those weightsare company significant in predicting your output

So how a multilayer perceptron works is a set of inputs are passedto the first hidden layer

Now the activations from that layer are passed to the next layer

And from that layer, they're passed to the next hidden layer, until you reach the output layer

From the output layer, you'll form the two classes, class one and class two

Basically, you'll classify your input into one of the two classes

So that's how a multilayer perceptron works
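To make the forward pass concrete, here is a minimal NumPy sketch of inputs flowing through two hidden layers to a single output

The layer sizes, the sigmoid activation, the random weights, and the 0.5 threshold for picking class one or class two are all assumptions chosen for illustration, they are not taken from the session

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Pass input x through each (weights, bias) pair in turn."""
    activation = x
    for weights, bias in layers:
        # Weighted sum of the previous layer's activations, then the activation function
        activation = sigmoid(activation @ weights + bias)
    return activation

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
sizes = [3, 4, 4, 1]
layers = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

x = np.array([0.5, -1.2, 2.0])           # inputs X one, X two, X three
output = mlp_forward(x, layers)           # a single value between 0 and 1
predicted_class = 1 if output[0] < 0.5 else 2
print(output, predicted_class)
```

Because the weights here are random, the predicted class is meaningless until the weights are trained, which is where back propagation comes in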

A very important concept in the multilayer perceptron is back propagation

Now what is back propagation

The back propagation algorithm is a supervised learning method for multilayer perceptrons

Okay, now why do we need back propagation? So guys, when we are designing a neural network, in the beginning we initialize the weights, or any other variable for that matter, with some random values

Now, obviously, we need to make sure that these weights actually are correct, meaning that these weights show the significance of each predictor variable

These weights have to fit our model in such a way that our output is very precise

So let's say that we randomly selected some weights in the beginning, but our model output is very different from our actual output, meaning that our error value is very large

So how will you reduce this error

Basically, what we need to do is somehow explain to the model that we need to change the weights in such a way that the error becomes minimum

So the main thing is that the weight and your error are very closely related

The weightage that you give to each input will determine how much error there is in your output, because the most significant variables should have the highest weightage

And if the weightage is not correct, then your output is also not correct

Now, back propagation is a way to update your weights in such a way that your outcome is precise and your error is reduced

So, in short, back propagation is used to train a multilayer perceptron

It's basically used to update your weights in such a way that your output is more precise, and that your error is reduced

So training a neural network is all about back propagation

So the most common deep learning algorithm for supervised training of the multilayer perceptron is known as back propagation

So, after calculating the weighted sum of inputs and passing them through the activation function, we propagate backwards and update the weights to reduce the error

It's as simple as that

So in the beginning, you're going to assign some weights to each of your inputs

Now these inputs will go through the activation function and through all the hidden layers and give us an output

Now when you get the output, the output is not very precise, or it is not the desired output

So what you'll do is you'll propagate backwards, and you start updating your weights in such a way that your error is as small as possible

So, I'm going to repeat this once more

So the idea behind back propagation is to choose weights in such a way that your error gets minimized

To understand this, we'll look at a small example

Let's say that we have a dataset which has these labels

Okay, your input is zero, one, two, but your desired output is zero, two, and four

Now the output of your model when W is equal to three is zero, three, and six

Notice the difference between your model output and your desired output

So, your model output is three, but your desired output is two

Similarly, when your model output is six, your desired output is supposed to be four

Now let's calculate the error when the weight is equal to three

The error is zero over here because your desired output is zero, and your model output is also zero

Now the error in the second case is one

Basically, your model output minus your desired output

Three minus two, your error is one

Similarly, your error for the third input is two, which is six minus four

When you take the square, your errors become zero, one, and four, so the larger differences stand out even more

Now what we need to do is we need to update the weight value in such a way that our error decreases

Now here we've considered the weight as four

So when you consider the weight as four, your model output becomes zero, four, and eight

Your desired output is zero, two, and four

So your model output becomes zero, four, and eight, which is a lot

So guys, I hope you all know how to calculate the output over here

What I'm doing is I'm multiplying the input with your weightage

The weightage is four, so zero into four will give me zero

One into four will give me four, and two into four will give me eight

That's how I'm getting my model output over here

For now, this is how I'm getting the output over here

That's how you calculate the output for a given weightage

Now, here, if you see, our desired output is supposed to be zero, two, and four, but we're getting an output of zero, four, and eight

So our error is actually increasing as we increase our weight

Our errors for W equal to four have become zero, four, and 16, whereas the errors for W equal to three were zero, one, and four

I mean the square error

So if you look at this, as we increase our weightage, our error is increasing

So, obviously, we know that there is no point in increasing the value of W further

But if we decrease the value of W, our error actually decreases

Alright, if we give a weightage of two, our error decreases

So here we can find a relationship between our weight and error, basically, if you increase the weight, your error also increases

If you decrease the weight, your error also decreases
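The numbers in this example can be checked with a few lines of plain Python

This is only a sketch of the arithmetic, where the helper name squared_errors is made up for illustration and the model is assumed to be output equals W times input, as in the example

```python
inputs = [0, 1, 2]
desired = [0, 2, 4]

def squared_errors(w):
    # Model output is w * x, error is model output minus desired output
    return [(w * x - d) ** 2 for x, d in zip(inputs, desired)]

for w in (2, 3, 4):
    errs = squared_errors(w)
    print(f"W = {w}: squared errors {errs}, total {sum(errs)}")

# W = 2: squared errors [0, 0, 0], total 0
# W = 3: squared errors [0, 1, 4], total 5
# W = 4: squared errors [0, 4, 16], total 20
```

So moving W from three to four makes the total squared error grow from 5 to 20, while moving it down to two brings the error to zero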

Now what we did here is we first initialized W to some random value, and then we propagated forward

Then we noticed that there is some error

And to reduce that error, we propagated backwards and increased the value of W

After that, we noticed that the error has increased, and we came to know that we can't increase the W value

Obviously, if your error is increasing with increasing your weight, you will not increase the weight

So again, we propagated backwards, and we decreased the W value

So, after that, we noticed that the error has reduced

So what we're trying to do is get the value of the weight in such a way that the error becomes as small as possible

So we need to figure out whether we need to increase or decrease the weight value

Once we know that, we keep on updating the weight value in that direction, until the error becomes minimum

Now you might reach a point where if you further update the weight, the error will again increase

At that point, you need to stop

Okay, that point is where your final weight value is

So, basically, this graph denotes that point

Now this point is nothing but the global loss minimum

If you update the weights further, your error will also increase

Now you need to find out where your global loss minimum is, and that is where your optimum weight lies

So let me summarize the steps for you

First, you'll calculate the error

This is how far your model output is from your actual output

Then you'll check whether the error is minimized or not

After that, if the error is very huge, then you'll update the weight, and you'll check the error again

You'll repeat the process until the error becomes minimum

Now once you reach the global loss minimum, you'll stop updating the weights, and you'll finalize your weight value

This is exactly how back propagation works

Now in order to tell you mathematically what we're doing, we're using a method known as gradient descent

Okay, this method is used to adjust all the weights in the network with an aim of reducing the error at the output layer

So how the gradient descent optimizer works is the first step is you will calculate the error by considering the below equation

Here you're subtracting the summation of your actual output from your network output

Step two is based on the error you get, you will calculate the rate of change of error with respect to the change in the weight

The learning rate is something that you set in the beginning itself

Step three is based on this change in weight, you will calculate the new weight

Alright, your updated weight will be your weight plus the change in weight
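The three steps can be sketched on the same toy data, input zero, one, two and desired output zero, two, four

This is a minimal sketch, assuming a sum of squared errors as the error measure and a learning rate of 0.05, neither of which is stated in the session, and the function names are made up for illustration

```python
inputs = [0.0, 1.0, 2.0]
desired = [0.0, 2.0, 4.0]

def total_error(w):
    # Step one: how far the model output (w * x) is from the desired output
    return sum((w * x - d) ** 2 for x, d in zip(inputs, desired))

def gradient(w):
    # Step two: rate of change of the error with respect to the weight,
    # d/dw of sum((w*x - d)^2) = sum(2 * x * (w*x - d))
    return sum(2 * x * (w * x - d) for x, d in zip(inputs, desired))

def gradient_descent(w, learning_rate=0.05, steps=50):
    for _ in range(steps):
        delta_w = -learning_rate * gradient(w)  # change in weight
        w = w + delta_w                         # step three: updated weight
    return w

final_w = gradient_descent(3.0)   # start from the W equal to three case
print(final_w, total_error(final_w))
```

The change in weight is the negative of the learning rate times the gradient, so the weight moves in the direction that reduces the error, and starting from three it settles very close to two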

So guys, that was all about back propagation and weight update

Now let's look at the limitations of the feed forward network

So far, we were discussing the multiple layer perceptron, which uses the feed forward network

Let's discuss the limitations of these feed forward networks

Now let's consider an example of image classification

Okay, let's say you've trained the neural network to classify images of various animals

Now let's consider an example

Here the first output is an elephant

We have an elephant

And this output will have nothing to do with the previous output, which is a dog

This means that the output at time T is independent of the output at time T minus one

Now consider a scenario where you will require the use of previously obtained output

Okay, the concept is very similar to reading a book

As you turn every page, you need an understanding of the previous pages

If you want to make sense of the information, then you need to know what you learned before

That's exactly what you're doing right now

In order to understand deep learning, you have to understand machine learning

So, basically, with the feed forward network the new output at time T plus one has nothing to do with the output at time T, or T minus one, or T minus two

So feed forward networks cannot be used while predicting a word in a sentence, as it will have absolutely no relationship with the previous set of words

So, a feed forward network cannot be used in use cases wherein you have to predict the outcome based on your previous outcome

So, in a lot of use cases, your previous output will also determine your next output

So, for such cases, you cannot make use of a feed forward network

Now, what modification can you make so that your network can learn from your previous mistakes

For this, we have a solution

So, a solution to this is recurrent neural networks

So, basically, let's say you have an input at time T minus one, and you'll get some output when you feed it to the network

Now, some information from this input at T minus one is fed to the next input, which is input at time T

Some information from this output is fed into the next input, which is input at T plus one

So, basically, you keep feeding information from the previous input to the next input

That's how recurrent neural networks really work

So recurrent networks are a type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, spoken words, and time series data from sensors, stock markets, and government agencies

So, guys, recurrent neural networks are actually a very important part of deep learning, because recurrent neural networks have applications in a lot of domains

Okay, in time series and in stock markets, the main networks that are used are recurrent neural networks, because each of your inputs is correlated

Now to better understand recurrent neural networks, let's consider a small example

Let's say that you go to the gym regularly, and the trainer has given you a schedule for your workout

So basically, the exercises are repeated after every third day

Okay, this is what your schedule looks like

So, make a note that all these exercises are repeated in a proper order or in a sequence every week

First, let us use a feed forward network to try and predict the type of exercises that we're going to do

The inputs here are the day of the week, the month, and your health status

Okay, so, the neural network has to be trained using these inputs to provide us with the prediction of the exercise that we should do

Now let's try and understand the same thing using recurrent neural networks

In recurrent neural networks, what we'll do is we'll consider the inputs of the previous day

Okay, so if you did a shoulder workout yesterday, then you can do a bicep exercise today, and this goes on for the rest of the week

However, if you happen to miss a day at the gym, the data from the previously attended time stamps can be considered

It can be done like this

So, if a model is trained based on the data it can obtain from the previous exercise, the output of the model will be much more accurate

In such cases, where you need to know the output at T minus one in order to predict the output at T, recurrent neural networks are essential

So, basically, I'm feeding some inputs through the neural network

They'll go through a few functions, and you'll get the output

So, basically, you're predicting the output based on past information or based on your past input

So that's how recurrent neural networks work
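The idea of carrying information from one time step to the next can be sketched as a very small recurrent layer in NumPy

This is a simplified illustration, assuming a tanh activation, random weights, and made-up sizes of three input features and four hidden units, it is not the network used later in the demo

```python
import numpy as np

def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a simple recurrent layer over a sequence, one time step at a time."""
    hidden = np.zeros(w_rec.shape[0])   # no past information before the first step
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state
        hidden = np.tanh(x_t @ w_in + hidden @ w_rec + bias)
        states.append(hidden)
    return np.array(states)

rng = np.random.default_rng(1)
w_in = rng.normal(scale=0.5, size=(3, 4))    # input -> hidden weights
w_rec = rng.normal(scale=0.5, size=(4, 4))   # previous hidden -> hidden weights
bias = np.zeros(4)

sequence = rng.normal(size=(5, 3))           # 5 time steps, 3 features each
states = rnn_forward(sequence, w_in, w_rec, bias)
print(states.shape)   # (5, 4)
```

Because each state is computed from the previous one, changing the input at the first time step changes the state at every later time step, which is exactly what a feed forward network cannot do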

Now let's look at another type of neural network known as convolutional neural network

To understand why we need convolutional neural networks, let's look at an analogy

How do you think a computer reads an image? Consider this image

This is a New York skyline image

At first glance, you'll see a lot of buildings and a lot of colors

How does a computer process this image? The image is actually broken down into three color channels, which are red, green, and blue

It reads in the form of RGB values

Now each of these color channels is mapped to the image's pixels

Then the computer will recognize the value associated with each pixel, and determine the size of the image

Now for the black and white images, there is only one channel, but the concept is still the same

The thing is we cannot make use of fully connected networks when it comes to processing images

I'll tell you why

Now consider the first input image

Okay, the first image has a size of about 28 into 28 into three pixels

And if we input this to a neural network, we'll get about 2,352 weights for each neuron in the first hidden layer itself

Now consider another example

Okay, let's say we have an image of 200 into 200 into three pixels

So the number of weights for each neuron in your first hidden layer becomes around 120,000
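These two figures are just height times width times channels, which can be checked quickly

The helper name weights_per_neuron is made up for this sketch, and it counts the weights that one fully connected neuron needs, ignoring the bias

```python
def weights_per_neuron(height, width, channels):
    # A fully connected neuron needs one weight per input value
    return height * width * channels

small = weights_per_neuron(28, 28, 3)
large = weights_per_neuron(200, 200, 3)
print(small, large)   # 2352 120000
```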

Now if this is just the first hidden layer, imagine the number of neurons that you need to process an entire complex image set

This leads to something known as overfitting, because all of the hidden layers are connected

They're massively connected

There's a connection between each and every node

Because of this, we face overfitting

We have way too much data

We have to use way too many neurons, which is not practical

So that's why we have something known as convolutional neural networks

Now convolutional neural networks, like any other neural network, are made up of neurons with learnable weights and biases

So each neuron receives several inputs

It takes a weighted sum over them, passes it on through some activation function, and finally responds with an output

So, the concept in convolutional neural networks is that a neuron in a particular layer will only be connected to a small region of the layer before it

Not all the neurons will be connected in a fully connected manner, because that leads to overfitting and needs way too many neurons to solve this problem

Only the regions which are significant are connected to each other

There is no full connection in convolutional neural networks
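Here is a small sketch of what being connected to only a small region means in practice

It assumes a single channel six by six image and one three by three kernel with equal weights, all chosen only for illustration, and like most deep learning libraries it slides the kernel without flipping it

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a small kernel over the image, each output sees only one patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # the small region this neuron is connected to
            out[i, j] = np.sum(patch * kernel)  # weighted sum over that region only
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6 x 6 single-channel image
kernel = np.ones((3, 3)) / 9.0                     # 9 shared weights (an averaging filter)
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)   # (4, 4)
```

Every output value here uses the same nine weights, whereas a fully connected neuron on the same image would need 36 weights of its own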

So guys, what we did so far is we discussed what a perceptron is

We discussed the different types of neural networks that are there

We discussed a feed forward neural network

We discussed multilayer perceptrons

We discussed recurrent neural networks and convolutional neural networks

I'm not going to go too much in depth with these concepts

Now I'll be executing a demo

If you haven't understood any theoretical concept of deep learning, please let me know in the comment section

Apart from this, I'll also leave a couple of links in the description box, so that you understand the whole of deep learning in a better way

Okay, if you want a more in-depth explanation, I'll leave a couple of links in the description box

For now, what I'm gonna do is I'll be running a practical demonstration to show you what exactly deep learning does

So, basically, what we're going to do in this demo is we're going to predict stock prices

Like I said, stock price prediction is one of the very good applications of deep neural networks

You can predict the price of a particular stock for the next minute or the next day by using deep neural networks

So that's exactly what we're gonna do in this demo

Now, before I discuss the code, let me tell you a few things about our data set

The data set contains around 42,000 minutes of data ranging from April to August 2017 on 500 stocks, as well as the total S&P 500 Index price

So the index and stocks are arranged in a wide format

So, this is my data set, data_stocks

It's in the CSV format

So what I'm gonna do is I'm going to use the read_csv function in order to import this data set

This is just the path of where my data set is stored

This data set was actually cleaned and prepared, meaning that we don't have any missing stock and index prices

So the file does not contain any missing values

Now what we're gonna do first is we'll drop the date variable

We have a variable known as date, which is not really necessary in predicting our outcome over here

So that's exactly what I'm doing here

I'm just dropping the date variable

So here, I'm checking the dimensions of the data set

This is pretty understandable, using the shape attribute to do that

Now, you always make the data a NumPy array

This makes computation much easier

The next process is the data splicing

I've already discussed data splicing with you all

Here we're just preparing the training and the testing data

So the training data will contain 80% of the total data set

Okay, and also we are not shuffling the data set

We're just slicing the data set sequentially

That's why we have a test start and a test end variable

In sequence, I'll be selecting the data

There's no need to shuffle this data set

These are stock prices, so it does not make sense to shuffle this data
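A sequential 80/20 split without shuffling can be sketched like this

The array here is a small made-up stand-in for the real stock data, and the variable names follow the ones mentioned above but are otherwise assumptions, so the demo's actual code may differ in detail

```python
import numpy as np

# Stand-in data: 10 time steps (rows) and 5 columns, in time order
data = np.arange(50, dtype=float).reshape(10, 5)

n = data.shape[0]
train_start = 0
train_end = int(np.floor(0.8 * n))   # first 80% of the rows
test_start = train_end               # the test data starts where training ends
test_end = n

# Slice sequentially, no shuffling, so the test set is strictly later in time
data_train = data[train_start:train_end, :]
data_test = data[test_start:test_end, :]
print(data_train.shape, data_test.shape)   # (8, 5) (2, 5)
```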

Now the next step we're going to do is scale the data

Now, scaling data and data normalization is one of the most important steps

You cannot miss this step

I already mentioned earlier what normalization and scaling is

Now most neural networks benefit from scaling inputs

This is because of the most common activation functions of the network's neurons, such as tanh and sigmoid

Tanh and sigmoid are basically activation functions, and these are defined in the range of minus one to one and zero to one, respectively

So that's why scaling is an important thing in deep neural networks

For scaling, again, we'll use the MinMaxScaler

So we're just importing that function over here

And also one point to note is that you have to be very cautious about what part of data you're scaling and when you're doing it

A very common mistake is to scale the whole data set before the training and test splits are applied

So you shouldn't be scaling your data before data splicing itself

Now this is a mistake because scaling invokes the calculation of statistics

For example, the minimum and maximum of each variable get calculated

So when performing time series forecasting in real life, you do not have information from future observations at the time of forecasting

That's why the calculation of scaling statistics has to be conducted on the training data, and only then applied to the test data

Otherwise, you're basically using future information at the time of forecasting, which is obviously going to lead to bias

So that's why you need to make sure you do scaling very carefully
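To show the fit on train, apply to test idea without any library, here is a plain NumPy version of min-max scaling

This is a stand-in for MinMaxScaler rather than the demo's code, the helper names and the tiny arrays are made up, and the target range of zero to one is an assumption

```python
import numpy as np

def fit_min_max(train):
    # Scaling statistics are computed on the training data only
    return train.min(axis=0), train.max(axis=0)

def apply_min_max(values, col_min, col_max):
    # Map each column so that the training minimum -> 0 and the training maximum -> 1
    return (values - col_min) / (col_max - col_min)

data_train = np.array([[10.0, 200.0], [12.0, 220.0], [14.0, 210.0], [16.0, 240.0]])
data_test = np.array([[18.0, 230.0], [15.0, 250.0]])

col_min, col_max = fit_min_max(data_train)
train_scaled = apply_min_max(data_train, col_min, col_max)
test_scaled = apply_min_max(data_test, col_min, col_max)   # reuses the training statistics
print(test_scaled)
```

Notice that the test values can fall outside zero to one, because the test data was never used to compute the minimum and maximum, and that is the price of not leaking future information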

So, basically, what we're doing is the number of features in the training data is stored in a variable known as n_stocks

After this, we'll import the famous TensorFlow

So guys, TensorFlow is actually a very good piece of software, and it is currently the leading deep learning and neural network computation framework

It is based on a C++ low-level backend, but it's usually controlled through Python

So TensorFlow actually operates on a graph representation of your computations

And this is important because neural networks are actually graphs of data and mathematical operations

So that's why TensorFlow is just perfect for neural networks and deep learning

So the next thing after importing the TensorFlow library is something known as placeholders

Placeholders are used to store input and target data

We need two placeholders in order to fit our model

So basically, X will contain the network's input, which is the stock prices of all the stocks at time T

And Y will contain the network's output, which is the stock price at time T plus one

Now the shape of the X placeholder means that the inputs are a two-dimensional matrix

And the outputs are a one-dimensional vector

So guys, basically, the None argument indicates that at this point we do not yet know the number of observations that'll flow through the neural network

We just keep it as a flexible array for now

We'll later define the variable batch size that controls the number of observations in each training batch

Now, apart from this, we also have something known as initializers

Now, before I tell you what these initializers are, you need to understand that there's something known as variables that are used as flexible containers that are allowed to change during the execution

Weights and biases are represented as variables in order to adapt during training

I already discussed weights and biases with you earlier

Now weights and biases are something that you need to initialize before you train the model

That's how we discussed it even while I was explaining neural networks to you

So here, basically, we make use of something known as the variance scaling initializer, and for the bias initializer, we make use of the zeros initializer

These are some predefined functions in TensorFlow

We'll not get into the depth of those things

Now let's look at our model architecture parameters

So the next thing we have to discuss is the model architecture parameters

Now the model that we build consists of four hidden layers

For the first layer, we've assigned 1,024 neurons, which is slightly more than double the size of the inputs

The subsequent hidden layers are always half the size of the previous layer, which means that in the hidden layer number two, we'll have 512 neurons

Hidden layer three will have 256

And similarly, hidden layer number four will have 128 neurons

Now why do we keep reducing the number of neurons as we go through each hidden layer

We do this because reducing the number of neurons in each subsequent layer compresses the information that the network identifies in the previous layer

Of course there are other possible network architectures that you can apply for this problem statement, but I'm trying to keep it as simple as possible, because I'm introducing deep learning to you all

So I can't build a model architecture that's very complex and hard to explain

And of course, we have the output layer over here, which will be assigned a single neuron

Now it is very important to understand the variable dimensions between your input, hidden, and output layers

So, as a rule of thumb in multilayer perceptrons, the second dimension of the previous layer is the first dimension in the current layer

So the second dimension in my first hidden layer is going to be my first dimension in my second hidden layer

Now the reason behind this is pretty logical

It's because the output from the first hidden layer is passed on as an input to the second hidden layer

That's why the second dimension of the previous layer is the same as the first dimension of the next layer or the current layer

I hope this is understandable

Now coming to the bias dimension over here, the bias dimension is always equal to the second dimension of your current layer, meaning that you're just going to pass the number of neurons in that particular hidden layer as the dimension of your bias

So here, the number of neurons, 1,024, you're passing the same number as a parameter to your bias

Similarly, even for hidden layer number two, if you see, the second dimension here is n_neurons_2

I'm passing the same parameter over here as well

Similarly, for hidden layer three and hidden layer number four

Alright, I hope this is understandable

Now we come to the output layer

The output layer will obviously have the output from hidden layer number four

This is our output from hidden layer four that's passed as the first dimension in our output layer, and it'll finally have your n_target, which is set to one over here

This is our output

Your bias will basically have the current layer's dimension, which is n_target

You're passing that same parameter over here
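The dimension rule is easier to see when the shapes are printed out

This sketch uses plain NumPy arrays instead of TensorFlow variables, assumes 500 inputs based on the 500 stocks mentioned earlier, and uses a simple scaled random draw as a stand-in for the variance scaling initializer

```python
import numpy as np

n_stocks = 500                      # assumed number of input features
layer_sizes = [n_stocks, 1024, 512, 256, 128, 1]   # input, four hidden layers, output

rng = np.random.default_rng(0)
weights, biases = [], []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # Weight matrix: first dimension = size of previous layer, second = size of current layer
    weights.append(rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_in, n_out)))
    # Bias: one value per neuron in the current layer, initialized to zeros
    biases.append(np.zeros(n_out))

for w, b in zip(weights, biases):
    print(w.shape, b.shape)
# (500, 1024) (1024,)
# (1024, 512) (512,)
# (512, 256) (256,)
# (256, 128) (128,)
# (128, 1) (1,)
```

Reading down the printed shapes, the second dimension of each weight matrix is the first dimension of the next one, and each bias has the same length as the second dimension of its own layer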

Now after you define the required weight and the bias variables, the architecture of the network has to be specified

What you do is placeholders and variables need to be combined into a system of sequential matrix multiplications

So that's exactly what's happening over here

Apart from this, all the hidden layers need to be transformed by using the activation function

So, activation functions are important components of the network because they introduce non-linearity to the system

This means that high dimensional data can be dealt with with the help of the activation functions

Obviously, we have very high dimensional data when it comes to neural networks

We don't have a single dimension or we don't have two or three inputs

We have thousands and thousands of inputs

So, in order for a neural network to process that much high dimensional data, we need something known as activation functions

That's why we make use of activation functions

Now, there are dozens of activation functions, and one of the most common ones is the rectified linear unit

ReLU is nothing but rectified linear unit, which is what we're gonna be using in this model
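The rectified linear unit itself is a one-line function, it keeps positive values as they are and replaces negative values with zero

Here is a minimal NumPy sketch with made-up input values

```python
import numpy as np

def relu(z):
    # max(0, z) applied element by element
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(z)
print(activated)   # negative entries become 0, positive entries pass through
```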

So, after you've applied the transformation function to your hidden layers, you need to make sure that your output is transposed

This is followed by a very important function known as cost function

So the cost function of a network is used to generate a measure of deviation between the network's prediction and the actual observed training targets

So this is basically your actual output minus your model output

It basically calculates the error between your actual output and your predicted output

So, for regression problems, the mean squared error function is commonly used

I have discussed MSE, mean squared error, before

So, basically, we are just measuring the deviation over here

MSE is nothing but your deviation from your actual output

That's exactly what we're doing here
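As a quick reminder, mean squared error is the average of the squared differences between the predictions and the targets

This sketch reuses the toy numbers from the back propagation example, model output zero, three, six against desired output zero, two, four, and the function name is made up for illustration

```python
import numpy as np

def mean_squared_error(predicted, actual):
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Average of the squared deviations
    return np.mean((predicted - actual) ** 2)

mse = mean_squared_error([0, 3, 6], [0, 2, 4])
print(mse)   # (0 + 1 + 4) / 3
```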

So after you've computed your error, the next step is obviously to update your weight and your bias

So, we have something known as optimizers

They basically take care of all the necessary computations that are needed to adapt the network's weight and bias variables during the training phase

That's exactly what's happening over here

Now the main function of this optimizer is that it invokes something known as a gradient

Now if you all remember, we discussed gradient before

It basically indicates the direction in which the weights and the biases have to be changed during training in order to minimize the network's cost function or the network's error

So you need to figure out whether you need to increase the weight and the bias in order to decrease the error, or is it the other way around? You need to understand the relationship between your error and your weight variable

That's exactly what the optimizer does

It invokes the gradient

It will give you the direction in which the weights and the biases have to be changed

So now that you know what an optimizer does, in our model, we'll be using something known as the AdamOptimizer

This is one of the current default optimizers in deep learning

Adam basically stands for adaptive moment estimation, and it can be considered as a combination of two very popular optimizers called AdaGrad and RMSProp

Now let's not get into the depth of the optimizers

The main agenda here is for you to understand the logic behind deep learning

We don't have to go into the functions

I know these are predefined functions which TensorFlow takes care of

Next we have something known as initializers

Now, initializers are used to initialize the network's variables before training

We already discussed this before

I won't define the initializers here again

I've already done it earlier in this session

Initializers are already defined

So I just removed that line of code

The next step would be fitting the neural network

So after we've defined the placeholders, the variables, which are basically weights and biases, the initializers, the cost function, and the optimizer of the network, the model has to be trained

Now, this is usually done by using the mini batch training method, because we have a very huge data set

So it's always best to use the mini batch training method

Now what happens during mini batch training is random data samples of the batch size are drawn from the training data, and they are fed into the network

So the training data set gets divided into N divided by batch size batches that are sequentially fed into the network

So, one after the other, each of these batches will be fed into the network

At this point, the placeholders, which are your X and Y, come into play

They store the input and the target data and present them to the network as inputs and targets

That's the main functionality of placeholders

What they do is they store the input and the target data, and they provide this to the network as inputs and targets

That's exactly what your placeholders do

So let's say that we have a sample data batch of X

Now this data batch flows through the network until it reaches the output layer

There TensorFlow compares the model's predictions against the actual observed targets, which are stored in Y

If you all remember, we stored our actual observed targets in Y

After this, TensorFlow will conduct something known as an optimization step, and it'll update the network's parameters like the weights of the network and the biases

So after having updated your weights and biases, the next batch is sampled and the process gets repeated

So this procedure will continue until all the batches have been presented to the network

And one full sweep over all batches is known as an epoch

So I've defined this entire thing over here

So we're gonna go through 10 epochs, meaning that all the batches are going to go through training 10 times, meaning you're going to input each batch, that is X, and it'll flow through the network until it reaches the output layer

There what happens is TensorFlow will compare your predictions

That is basically what your model predicted against the actual observed targets, which are stored in Y

After this, TensorFlow will perform optimization wherein it'll update the network parameters like your weights and your biases

After you update the weight and the bias, the next batch will get sampled and the process will keep repeating

This happens until all the batches are presented to the network

So what I just told you was one epoch

We're going to repeat this 10 times

So the batch size is 256, meaning that each batch contains 256 observations

So here we're going to assign x and y, what I just spoke to you about

The mini batch training starts over here

So, basically, your first batch will start flowing through the network until it reaches the output layer

After this, TensorFlow will compare your model's prediction

This is where predictions happen

It'll compare your model's prediction to the actual observed targets, which are stored in y

Then TensorFlow will start doing optimization, and it'll update the network parameters like your weights and your biases

So after you update the weights and the biases, the next batch will get input into the network, and this process will keep repeating

This process will repeat 10 times because we've defined 10 epochs
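The epoch and batch loop can be sketched in NumPy with a much simpler model than the one in the demo

This is only an illustration of the loop structure, assuming a single linear layer, made-up noise-free data, plain gradient descent instead of Adam, and a learning rate of 0.1, while the 10 epochs and the batch size of 256 follow the values mentioned above

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training data: 1,024 observations, 3 features, exact linear targets
n, n_features = 1024, 3
X_train = rng.normal(size=(n, n_features))
true_w = np.array([1.5, -2.0, 0.5])
y_train = X_train @ true_w

w = np.zeros(n_features)                 # weights to be learned
epochs, batch_size, learning_rate = 10, 256, 0.1

for epoch in range(epochs):
    order = rng.permutation(n)           # draw the training samples in random order
    # n // batch_size full batches per epoch (4 here)
    for start in range(0, n // batch_size * batch_size, batch_size):
        idx = order[start:start + batch_size]
        batch_x, batch_y = X_train[idx], y_train[idx]
        error = batch_x @ w - batch_y                    # prediction minus target
        grad = 2 * batch_x.T @ error / batch_size        # gradient of the batch MSE
        w -= learning_rate * grad                        # optimization step

print(w)   # close to true_w after 10 epochs
```

With 1,024 observations and a batch size of 256 there are four batches per epoch, so 10 epochs give 40 weight updates in total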

Now, also during the training, we evaluate the network's predictions on the test set, which is basically data the network hasn't learned from

We do this at every fifth batch, and the result is visualized

So in our problem statement, what the network is going to do is predict the stock price one time step ahead, at time T plus one

We're feeding it data about the stock price at time T

It's going to give us an output for time T plus one
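One simple way to picture the T and T plus one setup is to pair every value in a series with the value that follows it. The sketch below does this in plain Python. The function name make_next_step_pairs and the price values are made up for illustration, and the demo's actual data preparation may differ.

```python
def make_next_step_pairs(prices):
    """Pair each price at time t with the price at time t + 1."""
    inputs = prices[:-1]   # values at time t
    targets = prices[1:]   # values at time t + 1
    return list(zip(inputs, targets))


prices = [101.0, 102.5, 101.8, 103.2]  # made-up series for illustration
pairs = make_next_step_pairs(prices)
```

Each pair is one training example: the first element is what the network sees, the second is what it should predict.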

Now let me run this code and let's see how close our predicted values are to the actual values

We're going to visualize this entire thing, and we've also exported the plots in order to combine them into a video animation

I'll show you what the video looks like

So now let's look at our visualization

We'll look at our output

So the orange line basically shows our model's prediction

So the model quickly learns the shape and the location of the time series in the test data and shows us an accurate prediction

It's pretty close to the actual values

Now as I'm explaining this to you, each batch is running here

We are at epoch two

We have 10 epochs to go over here

So you can see that the network is actually adapting to the basic shape of the time series, and it's learning finer patterns in the data

You see it keeps learning patterns and the prediction is getting closer and closer after every epoch

So let's just wait till we reach epoch 10 and we complete the entire process

So guys, I think the predictions are pretty close, like the pattern and the shape is learned very well by our neural network

It is actually mimicking the actual time series

The only deviation is in the values

Apart from that, it's learning the shape of the time series data in almost the same way

The shape is exactly the same

It looks very similar to me

Now, also remember that there are a lot of ways of improving your result

You can change the design of your layers or you can change the number of neurons

You can choose different initialization functions and activation functions

You can introduce something known as dropout layers, which basically help you reduce overfitting, and there's also something known as early stopping

Early stopping helps you decide where you must stop your training, typically once the error on held-out data stops improving

That's also another method that you can implement for improving your model
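The early stopping idea can be sketched in a few lines of plain Python. This is only an illustration, not the demo's code: the function name early_stopping_epoch, the patience parameter, and the loss values are assumptions. The rule is to stop once the validation loss has failed to improve for a given number of epochs in a row.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # new best validation loss, reset the counter
            waited = 0
        else:
            waited += 1      # no improvement in this epoch
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered, so train to the end


losses = [0.9, 0.6, 0.5, 0.55, 0.52, 0.51]  # made-up validation losses
stop_at = early_stopping_epoch(losses, patience=2)
```

Here the best loss is reached at epoch 2, the next two epochs do not beat it, so training would stop at epoch 4.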

Now there are also different types of deep learning models that you can use for this problem

Here we use the feedforward network, which basically means that the data flows in one direction, from the input layer on the left to the output layer on the right

Okay, so our 10 epochs are over

Now the final thing that's getting calculated is our error, MSE or mean squared error

So guys, don't worry about this warning

It's just a warning

So our mean squared error comes down to 0.0029, which is pretty low because the target is scaled

And this means that our accuracy is pretty good
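For reference, mean squared error is the average of the squared differences between the predictions and the targets. A minimal sketch in plain Python follows. The numbers are made up and are not the demo's predictions, and the function name is only illustrative.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Made-up scaled values for illustration
mse = mean_squared_error([0.5, 0.25, 0.75], [0.5, 0.5, 0.5])
```

Because the errors are squared, the value depends on the scale of the target, which is why an MSE is only meaningful relative to the range of the scaled data.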

So guys, like I mentioned, if you want to improve the accuracy of the model, you can use different schemes, you can use different initialization functions, or you can try out different transformation functions

You can use something known as the dropout technique and early stopping in order to make the training phase even better

So guys, that was the end of our deep learning demo

I hope all of you understood the deep learning demo

For those of you who are just learning deep learning for the first time, it might be a little confusing

So if you have any doubts regarding the demo, let me know in the comment section

I'll also leave a couple of links in the description box, so that you can understand deep learning in a little more depth

Now let's look at our final topic for today, which is natural language processing

Now before we understand what text mining is and what natural language processing is, we have to understand the need for text mining and natural language processing

So guys, the number one reason why we need text mining and natural language processing is because of the amount of data that we're generating during this time

Like I mentioned earlier, there are around 2.5 quintillion bytes of data created every day, and this number is only going to grow

With the evolution of communication through social media, we generate tons and tons of data

The numbers are on your screen

These numbers are literally for every minute

On Instagram, every minute, 1.7 million pictures are posted

Okay, 1.7 or more than 1.7 million pictures are posted

Similarly, we have tweets

We have around 347,000 tweets every minute on Twitter

This is actually lots and lots of data

So, every time we're using a phone, we're generating way too much data

Just watching a video on YouTube is generating a lot of data

When sending text messages from WhatsApp, that is also generating tons and tons of data

Now the only problem is not our data generation

The problem is that out of all the data that we're generating, only 21% of the data is structured and well-formatted

The rest of the data is unstructured, and the major sources of unstructured data include text messages from WhatsApp, Facebook likes, comments on Instagram, and bulk emails that we send out every single day

All of this accounts for the unstructured data that we have today

Now the question here is what can be done with so much data

Now the data that we generate can be used to grow businesses

By analyzing and mining the data, we can add more value to a business

This is exactly what text mining is all about

So text mining or text analytics is the analysis of data available to us in day-to-day spoken or written language

It is amazing that so much of the data that we generate can actually be used in text mining

We have data from Word documents, PowerPoints, chat messages, emails

All of this is used to add value to a business

Now the data that we get from sources like social media and IoT is mainly unstructured, and unstructured data cannot be used directly to draw useful insights to grow a business

That's exactly why we need text mining

Text mining or text analytics is the process of deriving meaningful information from natural language text

So, all the data that we generate through text messages, emails, documents, and files is written in natural language text

And we are going to use text mining and natural language processing to draw useful insights or patterns from such data

Now let's look at a few examples to show you how natural language processing and text mining are used

So now before I move any further, I want to compare text mining and NLP

A lot of you might be confused about what exactly text mining is and how it is related to natural language processing

A lot of people have also asked me why NLP and text mining are considered one and the same, and whether they are the same thing

So, basically, text mining is a vast field that makes use of natural language processing to derive high quality information from the text

So, basically, text mining is a process, and natural language processing is a method used to carry out text mining

So, in a way, you can say that text mining is a vast field which uses NLP in order to perform text analysis

So, NLP is a part of text mining

Now let's understand what exactly natural language processing is

Now, natural language processing is a component of text mining which basically helps a machine in reading the text

Obviously, machines don't actually know English or French, they interpret data in the form of zeroes and ones

So this is where natural language processing comes in

NLP is what computers and smart phones use to understand our language, both spoken and written language

Now because we use language to interact with our devices, NLP became an integral part of our life

NLP uses concepts of computer science and artificial intelligence to study the data and derive useful information from it

Now before we move any further, let's look at a few applications of NLP and text mining

Now we all spend a lot of time surfing the web

Have you ever noticed that if you start typing a word on Google, you immediately get suggestions like these

This feature is also known as auto complete

It'll basically suggest the rest of the word for you

And we also have something known as spam detection

Here is an example of how Google recognizes a misspelling of Netflix and shows results for the keyword you most likely meant

So, spam detection and spelling correction are also based on the concepts of text mining and natural language processing

Next we have predictive typing and spell checkers

Features like auto correct and email classification are all applications of text mining and NLP

Now we'll look at a couple more applications of natural language processing

We have something known as sentiment analysis

Sentiment analysis is extremely useful in social media monitoring, because it allows us to gain an overview of the wider public opinion behind certain topics

So, basically, sentiment analysis is used to understand the public's opinion or a customer's opinion on a certain product or on a certain topic

Sentiment analysis is actually a very huge part of a lot of social media platforms like Twitter, Facebook

They use sentiment analysis very frequently

Then we have something known as chatbots

Chatbots are basically the solution for all the consumer frustration regarding customer call assistance

So we have companies like Pizza Hut and Uber who have started using chatbots to provide good customer service

Apart from that, we have speech recognition

NLP has widely been used in speech recognition

We're all aware of Alexa, Siri, Google Assistant, and Cortana

These are all applications of natural language processing

Machine translation is another important application of NLP

An example of this is Google Translate, which uses NLP to process and translate one language to the other

Other applications include spell checkers, keyword search, and information extraction, and NLP can be used to get useful information from various websites, from Word documents, from files, et cetera

It can also be used in advertisement matching

This basically means a recommendation of ads based on your history

So now that you have a basic understanding of where natural language processing is used and what exactly it is, let's take a look at some important concepts

So, firstly, we're gonna discuss tokenization

Now tokenization is the most basic step in text mining

Tokenization basically means breaking down data into smaller chunks or tokens so that they can be easily analyzed

Now how tokenization works is it works by breaking a complex sentence into words

So you're breaking a huge sentence into words

You'll understand the importance of each of the words with respect to the whole sentence, after which you'll produce a description of the input sentence

So, for example, let's say we have this sentence, tokens are simple

If we apply tokenization on this sentence, what we get is this

We're just breaking a sentence into words

Then we're understanding the importance of each of these words

We'll perform NLP processing on each of these words to understand how important each word is in this entire sentence

For me, I think tokens and simple are important words, while are is basically a stop word

We'll be discussing stop words in our further slides

But for now, you need to understand that tokenization is a very simple process that involves breaking sentences into words
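As a rough illustration of tokenization, the sketch below splits the example sentence into word tokens with a regular expression from Python's standard library. The pattern and the function name tokenize are assumptions chosen for simplicity. Library tokenizers, such as the ones in NLTK, handle punctuation and edge cases more carefully.

```python
import re


def tokenize(sentence):
    """Split a sentence into word tokens (letters, digits and apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


tokens = tokenize("Tokens are simple")
```

The result is a list with one entry per word, which is the form the later steps (stemming, stop word removal, and so on) work on.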

Next, we have something known as stemming

Stemming is basically normalizing words into their base form or root form

Take a look at this example

We have words like detection, detecting, detected, and detections

Now we all know that the root word for all these words is detect

Basically, all these words mean detect

So the stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found on any word

So guys, stemming can be successful in some cases, but not always

That is why a lot of people affirm that stemming has a lot of limitations
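Here is a deliberately naive suffix-stripping stemmer in plain Python, just to show the idea and its limits. The suffix list and the function name naive_stem are made up for this sketch. Real stemmers, such as the Porter stemmer, use a much more careful set of rules.

```python
# Illustrative suffix list only, not the rule set of any real stemmer
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


stems = [naive_stem(w) for w in ["detection", "detecting", "detected", "detections"]]
caring_stem = naive_stem("caring")  # cut down to "car", which changes the meaning
went_stem = naive_stem("went")      # irregular form, left untouched
```

All four example words reduce to detect, but caring turns into car and went is never linked to go, which is the kind of limitation mentioned above.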

So, in order to overcome the limitations of stemming, we have something known as lemmatization

Now what lemmatization does is it takes into consideration the morphological analysis of the words

To do so, it is necessary to have a detailed dictionary which the algorithm can look through to link the form back to its lemma

So, basically lemmatization is also quite similar to stemming

It maps different words into one common root

Sometimes what happens in stemming is that most of the word gets cut off

Let's say we wanted to cut detection into detect

Sometimes it becomes det or it becomes tect, or something like that

So because of this, the grammar or the importance of the word goes away

You don't know what the words mean anymore

Due to the indiscriminate cutting of the word, sometimes the grammar or the understanding of the word is not there anymore

So that's why lemmatization was introduced

The output of lemmatization is always going to be a proper word

Okay, it's not going to be something that is half cut or anything like that

You're going to understand the morphological analysis and only then you're going to perform lemmatization

An example of a lemmatizer is you're going to convert gone, going, and went into go

All the three words anyway mean the same thing

So you're going to convert it into go

We are not removing the first and the last part of the word

What we're doing is we're understanding the grammar behind the word

We're understanding the English or the morphological analysis of the word, and only then we're going to perform lemmatization

That's what lemmatization is all about
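The dictionary lookup behind lemmatization can be sketched like this in plain Python. The tiny LEMMA_TABLE is hand-made for the gone, going, went example and is purely an assumption. A real lemmatizer relies on a full dictionary and on morphological analysis of each word.

```python
# Tiny hand-made dictionary for illustration only
LEMMA_TABLE = {"gone": "go", "going": "go", "went": "go", "goes": "go"}


def lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


lemmas = [lemmatize(w) for w in ["gone", "going", "went"]]
```

Unlike suffix stripping, the output is always a proper word, because it comes from the dictionary rather than from cutting characters off.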

Now stop words are basically a set of commonly used words in any language, not just English

Now the reason why stop words are critical to many applications is that if we remove the words that are very commonly used in a given language, we can finally focus on the important words

For example, in the context of a search engine, let's say you open up Google and you type how to make strawberry milkshake

What the search engine is going to do is it's going to find a lot more pages that contain the terms how to make, rather than pages which contain the recipe for your strawberry milkshake

That's why you have to disregard these terms

Then the search engine can actually focus on the strawberry milkshake recipe, instead of looking for pages that have how to and so on

So that's why you need to remove these stop words

Stop words are how to, begin, gone, various, and, the, all of these are stop words

They are not necessarily important to understand the meaning of the sentence

So you get rid of these commonly used words, so that you can focus on the actual keywords
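Stop word removal for the milkshake query could look like the sketch below. The STOP_WORDS set is a small made-up sample, not a complete list. Libraries such as NLTK ship much longer lists for many languages.

```python
# Small made-up sample of English stop words, for illustration only
STOP_WORDS = {"how", "to", "a", "an", "the", "and", "is", "are"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace and drop the stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


keywords = remove_stop_words("how to make strawberry milkshake")
```

What remains are the words that actually describe what the user is looking for.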

Another term you need to understand is document term matrix

A document term matrix is basically a matrix with documents designated by rows and words by columns

So if your document one has this sentence, this is fun, or has these words, this is fun, then you're going to get one, one, one over here

In document two, if you see, we have this and we have is, but we do not have fun, so you get one, one, zero

So that's what a document term matrix is

It is basically to understand whether your document contains each of these words

It is a frequency matrix

That is what a document term matrix is
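The two-document example can be reproduced with a few lines of plain Python. The function name document_term_matrix and the exact wording of document two are assumptions made for this sketch. Each row is a document and each column counts how often a vocabulary word appears in it.

```python
from collections import Counter


def document_term_matrix(documents, vocabulary):
    """Build one row of word counts per document, one column per term."""
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        matrix.append([counts[term] for term in vocabulary])
    return matrix


vocabulary = ["this", "is", "fun"]
documents = ["This is fun", "This is"]  # document two lacks the word fun
dtm = document_term_matrix(documents, vocabulary)
```

The first row comes out as one, one, one and the second as one, one, zero, matching the description above.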

Now let's move on and look at a natural language processing demo

So what we're gonna do is we're gonna perform sentiment analysis

Now like I said, sentiment analysis is one of the most popular applications of natural language processing

It refers to the process of determining whether a given piece of text or a given sentence is positive or negative

So, in some variations, we consider a sentence to also be neutral

That's a third option

And this technique is commonly used to discover how people feel about a particular topic or what people's opinions are about a particular topic

So this is mainly used to analyze the sentiments of users in various forms, such as in marketing campaigns, in social media, in e-commerce websites, and so on

So now we'll be performing sentiment analysis using Python

So we are going to perform natural language processing by using the NaiveBayesClassifier

That's why we are importing the NaiveBayesClassifier

So guys, Python has a library known as the Natural Language Toolkit

This library contains all the functions that are needed to perform natural language processing

Also in this library, we have a predefined data set called movie reviews

What we're gonna do is we're going to download that from NLTK, which is the Natural Language Toolkit

We're basically going to run our analysis on this movie review data set

And that's exactly what we're doing over here

Now what we're doing is we're defining a function in order to extract features

So this is our function

It's just going to extract all our words
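The transcript does not show the function itself, so the following is only a guess at its shape. NLTK's NaiveBayesClassifier is trained on pairs of a feature dictionary and a label, and a common bag-of-words choice is to map every word in a review to True. The name extract_features is an assumption.

```python
def extract_features(words):
    """Bag-of-words features: every word present in the review maps to True."""
    return {word: True for word in words}


features = extract_features(["a", "great", "movie"])
```

Repeated words collapse into a single key, so this representation records which words occur, not how often.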

Now that we've extracted the features, we need to train the model, so we'll do that by using our movie reviews data set that we just downloaded

We're going to understand the positive words and the negative words

So what we're doing here is we're just loading our positive and our negative reviews

We're loading both of them

After that, we'll separate each of these into positive features and negative features

This is pretty understandable

Next, we'll split the data into our training and testing set

Now this is something that we've been doing for all our demos

This is also known as data splicing

We've also set a threshold factor of 0.8, which basically means that 80% of your data set will belong to your training, and 20% will be for your testing

You're going to do this for both your positive and your negative reviews

After that, you're just extracting the features again, and you're just printing the number of training data points that you have

You're just printing the length of your training features and you're printing the length of your testing features

We can see the output, let's run this program

So you can see that we're getting the number of training data points as 1,600 and the number of testing data points as 400, so there's an 80 to 20 ratio over here
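The 1,600 and 400 figures follow from applying the 0.8 threshold to each class separately. The sketch below assumes 1,000 positive and 1,000 negative reviews, which is the size of NLTK's movie reviews corpus, and uses placeholder strings instead of real review features. The helper name split_by_threshold is hypothetical.

```python
def split_by_threshold(items, threshold=0.8):
    """Put the first `threshold` share of items in training, the rest in testing."""
    cut = int(threshold * len(items))
    return items[:cut], items[cut:]


# Placeholder data standing in for the extracted review features
positive = [("pos_review_%d" % i, "Positive") for i in range(1000)]
negative = [("neg_review_%d" % i, "Negative") for i in range(1000)]

pos_train, pos_test = split_by_threshold(positive)
neg_train, neg_test = split_by_threshold(negative)

train_set = pos_train + neg_train
test_set = pos_test + neg_test
```

Splitting each class on its own keeps the training and testing sets balanced between positive and negative reviews.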

After this, we'll be using the NaiveBayesClassifier, and we'll define the object for the NaiveBayesClassifier, which is basically classifier, and we'll train this using our training data set

We'll also look at the accuracy of our model

The accuracy of our classifier is around 73%, which is a really good number
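To give a feel for what a Naive Bayes text classifier does internally, here is a tiny from-scratch version in plain Python. It is not NLTK's implementation and will not reproduce the demo's numbers. The class name TinyNaiveBayes, the four training sentences, and the add-one smoothing are all choices made for this sketch.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Word-count Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, labeled_docs):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of documents
        self.vocabulary = set()
        for words, label in labeled_docs:
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Start from the log prior, then add one log likelihood per word
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences, not taken from the movie reviews corpus
training = [
    ("i loved this great movie".split(), "Positive"),
    ("great acting and a great story".split(), "Positive"),
    ("a dull and terrible movie".split(), "Negative"),
    ("the story was terrible".split(), "Negative"),
]
model = TinyNaiveBayes().fit(training)
prediction = model.predict("a great story".split())
```

Words like great push the score toward the positive class and words like terrible toward the negative class, which is the same intuition behind the most informative words that the classifier reports.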

Now this classifier object will actually contain the most informative words that are obtained during analysis

These words are basically essential in understanding which word is classified as positive and which is classified as negative

What we're doing here is we're going to review movies

We're going to see which movie review is positive or which movie review is negative

Now this classifier will basically have all the informative words that will help us decide which is a positive review or a negative review

Then we're just printing these 10 most informative words, and we have outstanding, insulting, vulnerable, ludicrous, uninvolving, avoids, fascination, and so on

These are the most important words in our text

Now what we're gonna do is we're gonna test our model

I've randomly given some reviews

If you want, let's add another review

We'll say I loved the movie

So I've added another review over here

Here we're just printing the review, and we're checking if this is a positive review or a negative review

Now let's look at our predictions

We'll save this and

I forgot to put a comma over here

Save it and let's run the file again

So these were our randomly written movie reviews

The predicted sentiment is positive

Our probability score was 0.61

It's pretty accurate here

This is a dull movie and I would never recommend it, is a negative sentiment

The cinematography is pretty great, that's a positive review

The movie is pathetic is obviously a negative review

The direction was terrible, and the story was all over the place

This is also considered as a negative review

Similarly, I love the movie is what I just inputted, and I've got a positive review on that

So our classifier actually works really well

It's giving us good accuracy and it's classifying the sentiments very accurately

So, guys, this was all about sentiment analysis

Here we basically saw if a movie review was positive or negative

So guys, that was all for our NLP demo

I hope all of you understood this

It was a simple sentiment analysis demo that we saw through Python

So again, if you have doubts, please leave them in the comment section, and I'll help you with all of the queries

So guys, that was our last module, which was on natural language processing

Now before I end today's session, I would like to discuss with you the machine learning engineer program that we have at Edureka

So we all are aware of the demand for machine learning engineers

So, at Edureka, we have a master's program that involves 200-plus hours of interactive training

So the machine learning master's program at Edureka has around nine modules and 200-plus hours of interactive learning

So let me tell you the curriculum that this course provides

So your first module will basically cover Python programming

It'll have all the basics and all your data visualization, your GUI programming, your functions, and your object-oriented concepts

The second module will cover machine learning with Python

So supervised algorithms and unsupervised algorithms, along with statistics and time series in Python, will be covered in your second module

Your third module will have graphical modeling

This is quite important when it comes to machine learning

Here you'll be taught about decision making, graph theory, inference, and Bayesian and Markov networks, and module number four will cover reinforcement learning in depth

Here you'll be understanding dynamic programming, temporal difference, Bellman equations, all the concepts of reinforcement learning in depth

All the detailed and advanced concepts of reinforcement learning

So, module number five will cover NLP with Python

You'll understand tokenization, stemming, lemmatization, syntax, tree parsing, and so on

And module number six will have artificial intelligence and deep learning with TensorFlow

This module is a very advanced version of all the machine learning and reinforcement learning that you'll learn

Deep learning will be in depth over here

You'll be using TensorFlow throughout

They'll cover all the concepts that we saw, CNN, RNN

It'll cover the various types of neural networks, like convolutional neural networks, recurrent neural networks, long short-term memory neural networks, auto encoders, and so on

The seventh module is all about PySpark

It'll show you how Spark SQL works and all the features and functions of the Spark ML library

And the last module will finally cover Python Spark using PySpark

Apart from these seven modules, you'll also get two free self-paced courses

Let's actually take a look at the course

So this is your machine learning engineer master's program

You'll have nine courses, 200-plus hours of interactive learning

This is the whole course curriculum, which we just discussed

Here there are seven modules

Apart from these seven modules, you'll be given two free self-paced courses, which I'll discuss shortly

You can also get to know the average annual salary for a machine learning engineer, which is over $134,000

And there are also a lot of job openings in the field of machine learning, AI, and data science

So the job titles that you might get are machine learning engineer, AI engineer, data scientist, data and analytics manager, NLP engineer, and data engineer

So this is basically the curriculum

Your first will be Python programming certification, then machine learning certification using Python, graphical modeling, reinforcement learning, natural language processing, AI and deep learning with TensorFlow, and Python Spark certification training using PySpark

If you want to learn more about each of these modules, you can just go and view the curriculum

They'll explain each and every concept that they'll be showing in this module

All of this is going to be covered here

This is just the first module

Now at the end of this project, you will be given a verified certificate of completion with your name on it, and these are the free elective courses that you're going to get

One is your Python scripting certification training

And the other is your Python Statistics for Data Science Course

Both of these courses explain Python in depth

The second course on statistics will explain all the concepts of statistics: probability, descriptive statistics, inferential statistics, time series, testing data, data clustering, regression modeling, and so on

So each of the modules is designed in such a way that you'll have a practical demo or a practical implementation after each and every module

So all the concepts that I theoretically taught to you will be explained through practical demos

This way you'll get a good understanding of the entire machine learning and AI concepts

So, if any of you are interested in enrolling for this program or if you want to learn more about the machine learning course offered by Edureka, please leave your email IDs in the comment section, and we'll get back to you with all the details of the course

So guys, with this, we come to the end of this AI full course session

I hope all of you have understood the basic concepts and the idea behind AI, machine learning, deep learning, and natural language processing

So if you still have doubts regarding any of these topics, mention them in the comment section, and I'll try to answer all your queries

So guys, thank you so much for joining me in this session

Have a great day

I hope you have enjoyed listening to this video

Please be kind enough to like it, and you can comment any of your doubts and queries, and we will reply to them at the earliest

Do look out for more videos in our playlist and subscribe to Edureka channel to learn more

Happy learning

So let's move ahead and take a look at our first topic, which is the history of artificial intelligence

So guys, the concept of artificial intelligence goes back to the classical ages

In Greek mythology, the concept of machines and mechanical men was well thought of

So, an example of this is Talos

I don't know how many of you have heard of this

Talos was a giant animated bronze warrior who was programmed to guard the island of Crete

Now these are just ideas

Nobody knows if this was actually implemented, but machine learning and AI were thought of long ago

Now let's get back to the 20th century

Now 1950 was speculated to be one of the most important years for the introduction of artificial intelligence

In 1950, Alan Turing published a paper in which he speculated about the possibility of creating machines that think

So he created what is known as the Turing test

This test is basically used to determine whether or not a computer can think intelligently like a human being

He noted that thinking is difficult to define and devised his famous Turing test

So, basically, if a machine can carry out a conversation that is indistinguishable from a conversation with a human being, it is reasonable to say that the machine is thinking, meaning that the machine will pass the Turing test

Now, unfortunately, up to this date, we haven't found a machine that has fully cleared the Turing test

So, the Turing test was actually the first serious proposal in the philosophy of artificial intelligence

Followed by this was the era of 1951

This was also known as the game AI

So in 1951, by using the Ferranti Mark 1 machine of the University of Manchester, a computer scientist known as Christopher Strachey wrote a checkers program

And at the same time, a program was written for chess as well

Now, these programs were later improved and redone, but this was the first attempt at creating programs that could play chess or that would compete with humans in playing chess

This is followed by the year 1956

Now, this is probably the most important year in the invention of AI

Because in 1956, for the first time, the term artificial intelligence was coined

Alright

So the term artificial intelligence was coined by John McCarthy at the Dartmouth Conference in 1956

Coming to the year 1959, the first AI laboratory was established

This period marked the research era for AI

So the first AI lab where research was performed is the MIT lab, which is still running to date

In 1961, the first industrial robot was introduced to the General Motors assembly line

In the mid-1960s, the first chatbot was invented

Now we have Siri, we have Alexa

But back in the 1960s, there was a chatbot known as ELIZA, which was introduced

This is followed by the famous IBM Deep Blue

In 1997, the news broke that IBM's Deep Blue had beaten the world champion, Garry Kasparov, in the game of chess

So this was kind of the first accomplishment of AI

It was able to beat the world champion at chess

So in 2005, when the DARPA Grand Challenge was held, a robotic car named Stanley, which was built by Stanford's racing team, won the DARPA Grand Challenge

That was another big accomplishment of AI

In 2011, IBM's question answering system, Watson, defeated the two greatest Jeopardy champions, Brad Rutter and Ken Jennings

So guys, this was how AI evolved

It started off as a hypothetical situation

Right now it's the most important technology in today's world

If you look around everywhere, everything around us is run through AI, deep learning, or machine learning

So since the emergence of AI in the 1950s, we have actually seen an exponential growth in its potential

So AI covers domains such as machine learning, deep learning, neural networks, natural language processing, knowledge bases, expert systems, and so on

It has also made its way into computer vision and image processing

Now the question hereis if AI has been here for over half a century, why has it suddenlygain so much importance? Why are we talking aboutartificial intelligence now? Let me tell you the mainreasons for the demand of AI

The first reason is that we have more computation power now

So, artificial intelligence requires a lot of computing power

Recently, many advances have been made and complex deep learning models are deployed

And one of the greatest technologies that made this possible is GPUs

Since we have more computational power now, it is possible for us to implement AI in our daily lives

The second most important reason is that we have a lot of data at present

We're generating data at an immeasurable pace

We are generating data through social media, through IoT devices

In every possible way, there's a lot of data

So we need to find a method or a solution that can help us process this much data, and help us derive useful insights, so that we can grow businesses with the help of data

Alright, so, that process is basically artificial intelligence

So, in order to have a useful AI agent that makes smart decisions, like telling which item to recommend next when you shop online or how to classify an object from an image, AI systems are trained on large data sets, and big data enables us to do this more efficiently

The next reason is that now we have better algorithms

Right now we have very effective algorithms which are based on the idea of neural networks

Neural networks are nothing but the concept behind deep learning

Since we have better algorithms which can do better computations and quicker computations with more accuracy, the demand for AI has increased

Another reason is that universities, governments, startups, and tech giants are all investing in AI

Okay, so companies like Google, Amazon, Facebook, Microsoft, all of these companies have heavily invested in artificial intelligence because they believe that AI is the future

So AI is rapidly growing both as a field of study and also as an economy

So, actually, this is the right time for you to understand what AI is and how it works

So let's move on and understand what exactly artificial intelligence is

The term artificial intelligence was first coined in the year 1956 by John McCarthy at the Dartmouth Conference

I already mentioned this before

It was the birth of AI in 1956

Now, how did he define artificial intelligence? John McCarthy defined AI as the science and engineering of making intelligent machines

In other words, artificial intelligence is the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision making, and translation between languages

So guys, in a sense, AI is a technique of getting machines to work and behave like humans

In the recent past, artificial intelligence has been able to accomplish this by creating machines and robots that have been used in a wide range of fields, including healthcare, robotics, marketing, business analytics, and many more

With this in mind, let's discuss a couple of real world applications of AI, so that you understand how important artificial intelligence is in today's world

Now, one of the most famous applications of artificial intelligence is the Google predictive search engine

When you begin typing a search term and Google makes recommendations for you to choose from, that is artificial intelligence in action

So predictive searches are based on data that Google collects about you, such as your browser history, your location, your age, and other personal details

So by using artificial intelligence, Google attempts to guess what you might be trying to find

Now behind this, there's a lot of natural language processing, deep learning, and machine learning involved

We'll be discussing all of those concepts in the further slides

It's not very simple to create a search engine, but the logic behind the Google search engine is artificial intelligence

Moving on, in the finance sector, JP Morgan Chase's Contract Intelligence Platform uses machine learning, artificial intelligence, and image recognition software to analyze legal documents

Now let me tell you that manually reviewing around 12,000 agreements took over 360,000 hours

That's a lot of time

But as soon as this task was handed over to an AI machine, it was able to do this in a matter of seconds

So that's the difference between artificial intelligence and manual or human work

Even though AI cannot think and reason like humans, its computational power is very strong compared to humans

Because of machine learning algorithms, deep learning concepts, and natural language processing, AI has reached a stage wherein it can compute the most complex of problems in a matter of seconds

Coming to healthcare, IBM is one of the pioneers that has developed AI software specifically for medicine

Let me tell you that more than 230 healthcare organizations use IBM AI technology, which is basically IBM Watson

In 2016, IBM Watson technology was able to cross reference 20 million oncology records quickly and correctly diagnose a rare leukemia condition in a patient

So, it basically went through 20 million records, which it probably did in a matter of seconds, or minutes at most

And then it correctly diagnosed a patient with a rare leukemia

Knowing that machines are now used in medical fields as well, it shows how important AI has become

It has reached every domain of our lives

Let me give you another example

Google's AI Eye Doctor is another initiative taken by Google, where they're working with an Indian eye care chain to develop an artificial intelligence system which can examine retinal scans and identify a condition called diabetic retinopathy, which can cause blindness

Now in social media platforms like Facebook, artificial intelligence is used for face verification, wherein you make use of machine learning and deep learning concepts in order to detect facial features and tag your friends

All the auto tagging features that you see on Facebook, behind them there's machine learning, deep learning, neural networks

There's only AI behind it

So we're actually unaware that we use AI very regularly in our life

All the social media platforms like Instagram, Facebook, Twitter, they heavily rely on artificial intelligence

Another such example is Twitter's AI, which is being used to identify any sort of hate speech and terrorist language in tweets

So again, it makes use of machine learning, deep learning, natural language processing in order to filter out any offensive or any reportable content

Now recently, the company discovered around 300,000 terrorist-linked accounts, and 95% of these were found by non-human, artificially intelligent machines

Coming to virtual assistants, we have virtual assistants like Siri and Alexa right now

Let me tell you about another newly released virtual assistant from Google called Google Duplex, which has astonished millions of people around the world

Not only can it respond to calls and book appointments for you, it also adds a human touch

So it adds human-like fillers and all of that

It makes it sound very realistic

It's actually very hard to distinguish between a human and the AI speaking over the phone

Another famous application of AI is self-driving cars

So, artificial intelligence implements computer vision, image detection, deep learning, in order to build cars that can automatically detect any objects or any obstacles and drive around without human intervention

So these are fully automated self-driving cars

Also, Elon Musk talks a lot about how AI is implemented in Tesla's self-driving cars

He said that Tesla will have fully self-driving cars ready by the end of the year, and a robo taxi version that can ferry passengers without anyone behind the wheel

So if you look at it, AI is actually used by the tech giants

A lot of tech giants like Google, Tesla, and Facebook, all of these data-driven companies, use it

In fact, Netflix also makes use of AI

So, coming to Netflix

So with the help of artificial intelligence and machine learning, Netflix has developed a personalized movie recommendation for each of its users

So if each of you opened up Netflix and if you look at the type of movies that are recommended to you, they are different

This is because Netflix studies each user's personal details, and tries to understand what each user is interested in and what sort of movie patterns each user has, and then it recommends movies to them

So Netflix uses the watching history of other users with similar taste to recommend what you may be most interested in watching next, so that you can stay engaged and continue your monthly subscription

Also, there's a known fact that over 75% of what you watch is recommended by Netflix

So their recommendation engine is brilliant

And the logic behind their recommendation engine is machine learning and artificial intelligence
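
To make the idea of recommending from users with similar taste a bit more concrete, here's a minimal sketch in Python. This is not Netflix's actual engine. The watch histories, the movie titles, and the function names `jaccard` and `recommend` are all made up for illustration, and set overlap is just one simple way to measure similar taste

```python
# Hypothetical watch histories (invented data): user -> set of titles watched.
watch_history = {
    "you":   {"Movie A", "Movie B", "Movie C"},
    "user1": {"Movie A", "Movie B", "Movie D"},
    "user2": {"Movie A", "Movie E"},
    "user3": {"Movie F", "Movie G"},
}


def jaccard(a, b):
    """Overlap between two sets of titles: 0 means nothing shared, 1 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=2):
    """Rank titles the target has not seen by the similarity of the users who watched them."""
    seen = histories[target]
    scores = {}
    for user, titles in histories.items():
        if user == target:
            continue
        similarity = jaccard(seen, titles)
        # Every unseen title gets credit proportional to how similar this user is.
        for title in titles - seen:
            scores[title] = scores.get(title, 0.0) + similarity
    # Highest score first; ties are broken alphabetically so the result is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, score in ranked[:top_n] if score > 0]


print(recommend("you", watch_history))
```

Here user1 shares the most titles with you, so the movie only user1 has watched is ranked first, and titles from users with no overlap are never suggested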

Apart from Netflix, Gmail also uses AI on an everyday basis

If you open up your inbox right now, you will notice that there are separate sections

For example, we have primary section, social section, and all of that

Gmail also has a separate section for spam mail

So, what Gmail does is it makes use of concepts of artificial intelligence and machine learning algorithms to classify emails as spam and non-spam

Many times certain words or phrases are frequently used in spam emails

If you notice your spam emails, they have words like lottery, earn, full refund

All of this denotes that the email is more likely to be a spam one

So such words and correlations are understood by using machine learning and natural language processing and a few other aspects of artificial intelligence
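
Here's a small sketch of how such word patterns could be learned from labeled emails. It's a naive Bayes-style word score, and I'm assuming equal numbers of spam and non-spam examples so the class prior can be ignored. The training emails and the function names `train`, `spam_score`, and `classify` are invented for illustration, and this is not how Gmail actually does it

```python
import math
from collections import Counter

# Hypothetical labeled emails, invented for illustration only.
training_emails = [
    ("win the lottery now", "spam"),
    ("earn money fast full refund", "spam"),
    ("lottery winner claim full refund", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project report attached", "ham"),
]


def train(emails):
    """Count how often each word appears in spam and in non-spam (ham) emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in emails:
        counts[label].update(text.lower().split())
    return counts


def spam_score(text, counts):
    """Sum of per-word log ratios: a positive score means the words look more like spam."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    spam_total = sum(counts["spam"].values())
    ham_total = sum(counts["ham"].values())
    score = 0.0
    for word in text.lower().split():
        # Add-one smoothing so a word never seen in one class does not give log(0).
        p_spam = (counts["spam"][word] + 1) / (spam_total + len(vocab))
        p_ham = (counts["ham"][word] + 1) / (ham_total + len(vocab))
        score += math.log(p_spam / p_ham)
    return score


def classify(text, counts):
    return "spam" if spam_score(text, counts) > 0 else "ham"


word_counts = train(training_emails)
print(classify("claim your lottery refund", word_counts))
print(classify("agenda for monday lunch", word_counts))
```

Words like lottery and refund push the score up because they show up mostly in the spam examples, which is exactly the kind of correlation I was talking about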

So, guys, these were the common applications of artificial intelligence

Now let's discuss the different types of AI

So, AI is divided into three different evolutionary stages, or you can say that there are three stages of artificial intelligence

Of course, we have artificial narrow intelligence followed by artificial general intelligence, and that is followed by artificial super intelligence

Artificial narrow intelligence, which is also known as weak AI, involves applying artificial intelligence only to specific tasks

So, many currently existing systems that claim to use artificial intelligence are actually operating as weak AI focused on a narrowly defined specific problem

Let me give you an example of artificial narrow intelligence

Alexa is a very good example of weak AI

It operates within a limited predefined range of functions

There's no genuine intelligence or self awareness, despite it being a sophisticated example of weak AI

The Google search engine, Sophia the humanoid, self-driving cars, and even the famous AlphaGo fall under the category of weak AI

So guys, right now we're at the stage of artificial narrow intelligence or weak AI

We actually haven't reached artificial general intelligence or artificial super intelligence, but let's look at what exactly it would be like if we reach artificial general intelligence

Now artificial general intelligence, which is also known as strong AI, involves machines that possess the ability to perform any intelligent task that a human being can

Now this is actually something that a lot of people don't realize

Machines don't possess human-like abilities

They have a very strong processing unit that can perform high-level computations, but they're not yet capable of doing the simple and the most reasonable things that a human being can

If you tell a machine to process like a million documents, it'll probably do that in a matter of 10 seconds, or a minute, or even 10 minutes

But if you ask a machine to walk up to your living room and switch on the TV, a machine will take forever to learn that, because machines don't have a reasonable way of thinking

They have a very strong processing unit, but they're not yet capable of thinking and reasoninglike a human being

So that's exactly why we're still stuck on artificial narrow intelligence

So far we haven't developed any machine that can fully be called strong AI, even though there are examples like AlphaGo Zero, which defeated AlphaGo in the game of Go

AlphaGo Zero basically learned in a span of four months

It learned on its own without any human intervention

But even then, it was not classified as a fully strong artificial intelligence, because it cannot reason like a human being

Moving on to artificial super intelligence

Now this is a term referring to the time when the capabilities of a computer will surpass that of a human being

In all actuality, it'll take a while for us to achieve artificial super intelligence

Presently, it's seen as a hypothetical situation as depicted in movies and science fiction books wherein machines have taken over the world

Movies like Terminator and all of that depict artificial super intelligence

These don't exist yet, which we should be thankful for, but there are a lot of people who speculate that artificial super intelligence will take over the world by the year 2040

So guys, these were the different types or different stages of artificial intelligence

To summarize everything, like I said before, narrow intelligence is the only thing that exists for now

We have only weak AI or weak artificial intelligence

All the major AI technologies that you see are artificial narrow intelligence

We don't have any machines which are capable of thinking like human beings or reasoning like a human being

Now let's move on and discuss the different programming languages for AI

So there are actually N number of languages that can be used for artificial intelligence

I'm gonna mention a few of them

So, first, we have Python

Python is probably the most famous language for artificial intelligence

It's also known as the most effective language for AI, because a lot of developers prefer to use Python

And a lot of scientists are also comfortable with the Python language

This is partly because Python's syntax is very simple and can be learned very easily

It's considered to be one of the easiest languages to learn

And also many AI algorithms and machine learning algorithms can be easily implemented in Python, because there are a lot of libraries which have predefined functions for these algorithms

So all you have to do is call that function

You don't actually have to code your algorithm

So, Python is considered the best choice for artificial intelligence

Along with Python stands R, which is a statistical programming language

Now R is one of the most effective languages and environments for analyzing and manipulating data for statistical purposes

It is a statistical programming language

So using R we can easily produce well designed publication quality plots, including mathematical symbols and formulas, wherever needed

If you ask me, I think R is also one of the easiest programming languages to learn

The syntax is very similar to the English language, and it also has N number of libraries that support statistics, data science, AI, machine learning, and so on

It also has predefined functions for machine learning algorithms, natural language processing, and so on

So R is also a very good choice if you want to get started with programming languages for machine learning or AI

Apart from this, we have Java

Now Java can also be considered as a good choice for AI development

Artificial intelligence has a lot to do with search algorithms, artificial neural networks, and genetic programming, and Java provides many benefits

It's easy to use

Debugging is very easy, and it has package services

There is simplified work with large scale projects

There's a good user interaction, and graphical representation of data

It has something known as the Standard Widget Toolkit, which can be used for making graphs and interfaces

So, graphic visualization is actually a very important part of AI, or data science, or machine learning for that matter

Let me list out a few more languages

We also have something known as Lisp

Now shockingly, a lot of people have not heard of this language

This is actually the oldest and the most suited language for the development of artificial intelligence

It is considered to be a language which is very suited for the development of artificial intelligence

Now let me tell you that this language was invented by John McCarthy, who's also known as the father of artificial intelligence

He was the person who coined the term artificial intelligence

It has the capability of processing symbolic information

It has excellent prototyping capabilities

It is easy, and it creates dynamic objects with a lot of ease

There's automatic garbage collection and all of that

But over the years, because of advancements, many of these features have migrated into many other languages

And that's why a lot of people don't go for Lisp

There are a lot of new languages which have more effective features or which have better packages, you could say

Another language I'd like to talk about is Prolog

Prolog is frequently used in knowledge bases and expert systems

The features provided by Prolog include pattern matching, tree-based data structuring, automatic backtracking, and so on

All of these features provide a very powerful and flexible programming framework

Prolog is actually widely used in medical projects and also for designing expert AI systems

Apart from this, we also have C++, we have SAS, we have JavaScript which can also be used for AI

We have MATLAB, we have Julia

All of these languages are actually considered pretty good languages for artificial intelligence

But for now, if you ask me which programming language should I go for, I would say Python

Python has all the possible packages, and it is very easy to understand and easy to learn

So let's look at a couple of features of Python

We can see why we should go for Python

First of all, Python was created in the year 1989

It is actually a very easy programming language

That's one of the reasons why a lot of people prefer Python

It's very easy to understand

It's very easy to grasp this language

So Python is an interpreted, object-oriented, high-level programming language, and it can be very easily implemented

Now let me tell you a few features of Python

It's very simple and easy to learn

Like I mentioned, it is one of the easiest programming languages, and it is also free and open source

Apart from that, it is a high-level language

You don't have to worry about anything like memory allocation

It is portable, meaning that you can use it on any platform like Linux, Windows, Macintosh, Solaris, and so on

It supports different programming paradigms like object-oriented and procedure-oriented programming, and it is extensible, meaning that it can invoke C and C++ libraries

Apart from this, let me tell you that Python is actually gaining unbelievably huge momentum in AI

The language is used to develop data science algorithms, machine learning algorithms, and IoT projects

The other advantage of Python is the fact that you don't have to code much when it comes to Python for AI or machine learning

This is because there are ready-made packages

There are predefined packages that have all the functions and algorithms stored

For example, there is something known as PyBrain, which can be used for machine learning, NumPy which can be used for scientific computation, Pandas and so on

There are N number of libraries in Python

So guys, I'm not going to go into the depths of Python

I'm not going to explain Python to you, since this session is about artificial intelligence

So, those of you who don't know much about Python or who are new to Python, I will leave a couple of links in the description box

You all can get started with programming and any other concepts or any other doubts that you have on Python

We have a lot of content around programming with Python or Python for machine learning and so on

Now let's move on and talk about one of the most important aspects of artificial intelligence, which is machine learning

Now a lot of people always ask me this question

Are machine learning and artificial intelligence the same thing? Well, they are not the same thing

The difference between AI and machine learning is that machine learning is used in artificial intelligence

Machine learning is a method through which you can feed a lot of data to a machine and make it learn

Now AI is a vast field

Under AI, we have machine learning, we have NLP, we have expert systems, we have image recognition, object detection, and so on

We have deep learning also

So, AI is sort of a process, or it's a methodology, in which you make machines mimic the behavior of human beings

Machine learning is a way in which you feed a lot of data to a machine, so that it can make its own decisions

Let's get into depth about machine learning

So first, we'll understand the need for machine learning or why machine learning came into existence

Now the need for machine learning began with the technical revolution itself

So, guys, since technology became the center of everything, we've been generating an immeasurable amount of data

As per research, we generate around 2.5 quintillion bytes of data every single day

And it is estimated that by this year, 2020, 1.7 MB of data will be created every second for every person on earth

So as I'm speaking to you right now, I'm generating a lot of data

Now your watching this video on YouTube also accounts for data generation

So there's data everywhere

So with the availability of so much data, it is finally possible to build predictive models that can study and analyze complex data to find useful insights and deliver more accurate results

So, top tier companies like Netflix and Amazon build such machine learning models by using tons of data in order to identify any profitable opportunity and avoid any unwanted risk

So guys, one thing you all need to know is that the most important thing for artificial intelligence is data

For artificial intelligence or whether it's machine learning or deep learning, it's always data

And now that we have a lot of data, we can find a way to analyze, process, and draw useful insights from this data in order to help us grow businesses or to find solutions to some problems

Data is the solution

We just need to know how to handle the data

And the way to handle data is through machine learning, deep learning, and artificial intelligence

A few reasons why machine learning is so important is, number one, due to the increase in data generation

So due to excessive production of data, we need to find a method that can be used to structure, analyze, and draw useful insights from data, and this is where machine learning comes in

It is used to solve problems and find solutions to the most complex tasks faced by organizations

Apart from this, we also need to improve decision making

So by making use of various algorithms, machine learning can be used to make better business decisions

For example, machine learning is used to forecast sales

It is used to predict any downfalls in the stock market or identify any sort of risk and anomalies

Other reasons include that machine learning helps us uncover patterns and trends in data

So finding hidden patterns and extracting key insights from data is the most important part of machine learning

So by building predictive models and using statistical techniques, machine learning allows you to dig beneath the surface and explore the data at a minute scale

Understanding data and extracting patterns manually takes a lot of time

It'll take several days for us to extract any useful information from data

But if you use machine learning algorithms, you can perform similar computations in less than a second

Another reason is we need to solve complex problems

So from detecting the genes linked to the deadly ALS disease, to building self-driving cars, machine learning can be used to solve the most complex problems

At present, we have also found a way to spot stars which are 2,400 light years away from our planet

Okay, all of this is possible through AI, machine learning, deep learning, and these techniques

So to sum it up, machine learning is very important at present because we're facing a lot of issues with data

We're generating a lot of data, and we have to handle this data in such a way that it benefits us

So that's why machine learning comes in

Moving on, what exactly is machine learning? So let me give you a short history of machine learning

So the term machine learning was first coined by Arthur Samuel in the year 1959, which is just three years from when artificial intelligence was coined

So, looking back, that year was probably the most significant in terms of technological advancement, because most of the technologies today are based on the concept of machine learning

Most of the AI technologies itself are based on the concept of machine learning and deep learning

Don't get confused about machine learning and deep learning

We'll discuss deep learning in the further slides, where we'll also see the difference between AI, machine learning, and deep learning

So coming back to what exactly machine learning is, if you browse through the internet, you'll find a lot of definitions about what exactly machine learning is

One of the definitions I found was a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E

That's very confusing, so let me just narrow it down for you

In simple terms, machine learning is a subset of artificial intelligence which provides machines the ability to learn automatically and improve with experience without being explicitly programmed to do so

In a sense, it is the practice of getting machines to solve problems by gaining the ability to think

But now you might be thinking how can a machine think or make decisions

Now machines are very similar to humans

Okay, if you feed a machine a good amount of data, it will learn how to interpret, process, and analyze this data by using machine learning algorithms, and it will help you solve real world problems

So what happens here is a lot of data is fed to the machine

The machine will train on this data and it'll build a predictive model with the help of machine learning algorithms in order to predict some outcome or in order to find some solution to a problem

So it involves data

You're gonna train the machine and build a model by using machine learning algorithms in order to predict some outcome or to find a solution to a problem

So that is a simple way of understanding what exactly machine learning is

I'll be going into more depth about machine learning, so don't worry if you haven't understood everything as of now

Now let's discuss a couple of terms which are frequently used in machine learning

So, the first definition that we come across very often is an algorithm

So, basically, a machine learning algorithm is a set of rules and statistical techniques that is used to learn patterns from data and draw significant information from it

Okay

So, guys, the logic behind a machine learning model is basically the machine learning algorithm

Okay, an example of a machine learning algorithm is linear regression, or decision tree, or a random forest

All of these are machine learning algorithms

They define the logic behind a machine learning model

Now what is a machine learning model? A model is actually the main component of a machine learning process

Okay, so a model is trained by using the machine learning algorithm

The difference between an algorithm and a model is that an algorithm maps all the decisions that a model is supposed to take based on the given input in order to get the correct output

So the model will use the machine learning algorithm in order to draw useful insights from the input and give you an outcome that is very precise

That's the machine learning model

The next definition we have is a predictor variable

Now a predictor variable is any feature of the data that can be used to predict the output

Okay, let me give you an example to make you understand what a predictor variable is

Let's say you're trying to predict the height of a person, depending on his weight

So here your predictor variable becomes the weight, because you're using the weight of a person to predict the person's height

So your predictor variable becomes the weight

The next definition is response variable

Now in the same example, height would be the response variable

Response variable is also known as the target variable or the output variable

This is the variable that you're trying to predict by using the predictor variables

So a response variable is the feature or the output variable that needs to be predicted by using the predictor variables
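
To tie the height and weight example together, here's a minimal sketch in plain Python. The numbers are invented for illustration, and fitting a straight line by least squares is just one possible choice of algorithm. The names `fit_line` and `predict_height` are hypothetical helpers, not from any library

```python
# Hypothetical measurements, invented for illustration.
weights = [50.0, 60.0, 70.0, 80.0]      # predictor variable (kg)
heights = [155.0, 165.0, 175.0, 185.0]  # response / target variable (cm)


def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The algorithm (least squares) turns the data into a model (slope and intercept).
slope, intercept = fit_line(weights, heights)


def predict_height(weight):
    """Use the fitted model to predict the response from the predictor."""
    return intercept + slope * weight


print(predict_height(65.0))
```

So the weight goes in as the predictor, and the height comes out as the response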

Next, we have something known as training data

Now training and testing data are terminologies that you'll come across very often in a machine learning process

So training data is basically the data that is used to create the machine learning model

So, basically in a machine learning process, when you feed data into the machine, it'll be divided into two parts

So splitting the data into two parts is also known as data splitting

So you'll take your input data, you'll divide it into two sections

One you'll call the training data, and the other you'll call the testing data

So then you have something known as the testing data

The training data is basically used to create the machine learning model

The training data helps the model to identify key trends and patterns which are essential to predict the output

Now coming to the testing data, after the model is trained, it must be tested in order to evaluate how accurately it can predict an outcome

Now this is done by using the testing data

So, basically, the training data is used to train the model

The testing data is used to test the efficiency of the model
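
Here's a minimal sketch of dividing the input data into a training part and a testing part. The function name `split_data`, the 25% test fraction, and the fixed seed are all assumptions I picked for illustration, and the rows here are just placeholder numbers

```python
import random


def split_data(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and cut it into (training, testing) lists."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # stand-in for 20 rows of input data
train_rows, test_rows = split_data(data)

print(len(train_rows), len(test_rows))
```

The rows are shuffled before cutting so that the testing part is not just the last few rows you happened to collect, and no row ends up in both parts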

Now let's move on and get to our next topic, which is the machine learning process

So what is the machine learning process? Now the machine learning process involves building a predictive model that can be used to find a solution for a problem statement

Now in order to solve any problem in machine learning, there are a couple of steps that you need to follow

Let's look at the steps

The first step is you define the objective of your problem

And the second step is data gathering, which is followed by preparing your data, data exploration, building a model, model evaluation, and finally making predictions

Now, in order to understand the machine learning process, let's assume that you've been given a problem that needs to be solved by using machine learning

So the problem that you need to solve is we need to predict the occurrence of rain in your local area by using machine learning

So, basically, you need to predict the possibility of rain by studying the weather conditions

So what we did here is we basically looked at step number one, which is define the objective of the problem

Now here you need to answer questions such as what are we trying to predict

Is that output going to be a continuous variable, or is it going to be a discrete variable? These are the kinds of questions that you need to answer in the first stage, which is defining the objective of the problem, right? So yeah, exactly what are the target features

So here you need to understand which is your target variable and what are the different predictor variables that you need in order to predict this outcome

So here our target variable will be basically a variable that can tell us whether it's going to rain or not

For input data, we'll need data such as maybe the temperature on a particular day or the humidity level, the precipitation, and so on
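
Here's one way you could write down that objective in Python before touching any algorithm. This is only a sketch. The field names, the units, and the three sample records are assumptions made up for illustration

```python
from dataclasses import dataclass


@dataclass
class WeatherRecord:
    temperature_c: float     # predictor
    humidity_pct: float      # predictor
    precipitation_mm: float  # predictor
    rained: bool             # target: did it rain or not


# Hypothetical observations, invented for illustration.
records = [
    WeatherRecord(30.0, 40.0, 0.0, False),
    WeatherRecord(22.0, 85.0, 5.5, True),
    WeatherRecord(25.0, 60.0, 0.2, False),
]

PREDICTORS = ("temperature_c", "humidity_pct", "precipitation_mm")
TARGET = "rained"


def to_features_and_target(rows):
    """Separate each record into predictor values (X) and the target value (y)."""
    X = [[getattr(row, name) for name in PREDICTORS] for row in rows]
    y = [int(getattr(row, TARGET)) for row in rows]  # 1 = rain, 0 = no rain
    return X, y


X, y = to_features_and_target(records)
print(X)
print(y)
```

Since the target here takes only two values, rain or no rain, the output is a discrete variable and not a continuous one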

So you need to define theobjective at this stage

So basically, you have toform an idea of the problem at this storage

Another question that you need to ask yourself is what kind of problem are you solving

Is this a binary classification problem, or is this a clustering problem, or is this a regression problem? Now, a lot of you might not be familiar with the terms classification, clustering, and regression in terms of machine learning

Don't worry, I'll explain all of these terms in the upcoming slides

All you need to understand at step one is you need to define how you're going to solve the problem

You need to understand what sort of data you need to solve the problem, how you're going to approach the problem, what are you trying to predict, what variables you'll need in order to predict the outcome, and so on

Let's move on and look at step number two, which is data gathering

Now in this stage, you must be asking questions such as, what kind of data is needed to solve this problem? And is this data available? And if it is available, from where can I get this data and how can I get the data? Data gathering is one of the most time-consuming steps in the machine learning process

If you have to go manually and collect the data, it's going to take a lot of time

But lucky for us, there are a lot of resources online, which provide data sets

All you need to do is web scraping where you just have to go ahead and download data

One of the websites I can tell you all about is Kaggle

So if you're a beginner in machine learning, don't worry about data gathering and all of that

All you have to do is go to websites such as Kaggle and just download the data set

So coming back to the problem that we are discussing, which is predicting the weather, the data needed for weather forecasting includes measures like humidity level, the temperature, the pressure, the locality, whether or not you live in a hill station; such data has to be collected and stored for analysis

So all the data is collected during the data gathering stage

This step is followed by data preparation, also known as data cleaning

So if you're going around collecting data, it's almost never in the right format

And even if you are taking data from online resources from any website, even then, the data will require cleaning and preparation

The data is never in the right format

You have to do some sort of preparation and some sort of cleaning in order to make the data ready for analysis

So what you'll encounter while cleaning data is a lot of inconsistencies in the data set, like some missing values, redundant variables, duplicate values, and all of that

So removing such inconsistencies is very important, because they might lead to wrongful computations and predictions

Okay, so at this stage you can scan the data set for any inconsistencies, and you can fix them then and there
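To make the cleaning step a little more concrete, here is a minimal sketch in plain Python. The field names (`temperature`, `humidity`), the sample rows, and the choice to fill a missing value with the column mean are all assumptions made for illustration; they are not taken from the session's data set, and mean-filling is only one of several possible strategies.

```python
def clean_rows(rows):
    """Drop exact duplicate rows, then fill missing temperatures with the mean."""
    # Remove duplicates while keeping the first occurrence of each row.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # Fill missing temperature values (None) with the mean of the known ones.
    known = [r["temperature"] for r in unique if r["temperature"] is not None]
    mean_temp = sum(known) / len(known)
    for r in unique:
        if r["temperature"] is None:
            r["temperature"] = mean_temp
    return unique


# Hypothetical raw data with one duplicate row and one missing value.
raw = [
    {"temperature": 20.0, "humidity": 60},
    {"temperature": 20.0, "humidity": 60},   # duplicate
    {"temperature": None, "humidity": 80},   # missing temperature
    {"temperature": 30.0, "humidity": 40},
]
cleaned = clean_rows(raw)
print(cleaned)
```

Filling with the mean keeps the row instead of throwing it away, but it can also distort the data when many values are missing, which is the kind of bias discussed next.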

Now let me give you a small fact about data cleaning

So there was a survey that was run last year or so

I'm not sure

And a lot of data scientists were asked which step was the most difficult or the most annoying and time-consuming of all

And 80% of the data scientists said it was data cleaning

Data cleaning takes up 80% of their time

So it's not very easy to get rid of missing values and corrupted data

And even if you get rid of missing values, sometimes your data set might get affected

It might get biased because maybe one variable has too many missing values, and this will affect your outcome

So you'll have to fix such issues, you'll have to deal with all of this missing data and corrupted data

So data cleaning is actually one of the hardest steps in the machine learning process

Okay, now let's move on and look at our next step, which is exploratory data analysis

So here what you do is basically become a detective in this stage

So this stage, which is EDA or exploratory data analysis, is like the brainstorming stage of machine learning

Data exploration involves understanding the patterns and the trends in your data

So at this stage, all the useful insights are drawn and any correlations between the various variables are understood

What do I mean by trends and patterns and correlations? Now let's consider our example, which is we have to predict the rainfall on a particular day

So we know that there is a strong possibility of rain if the temperature has fallen low

So we know that our output will depend on variables such as temperature, humidity, and so on

Now to what level it depends on these variables, we'll have to find that out

We'll have to find out the patterns, and we'll find out the correlations between such variables

So such patterns and trends have to be understood and mapped at this stage
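One common way to measure "to what level" an output depends on a variable is the Pearson correlation coefficient. The sketch below computes it by hand; the temperature and rainfall numbers are invented purely for illustration, and the session does not say which correlation measure it has in mind.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: +1 is a perfect rising line, -1 a perfect falling one."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up observations: cooler days come with more rain.
temperature = [30, 28, 25, 22, 20]
rainfall_mm = [0, 1, 4, 7, 9]

r = pearson(temperature, rainfall_mm)
print(round(r, 3))
```

A value close to -1 here would match the intuition from the example: as the temperature falls, rainfall goes up.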

So this is what exploratory data analysis is about

It's the most important part of machine learning

This is where you'll understand what exactly your data is and how you can form the solution to your problem

The next step in a machine learning process is building a machine learning model

So all the insights and the patterns that you derive during the data exploration are used to build a machine learning model

So this stage always begins by splitting the data set into two parts, which are training data and testing data

I've already discussed with you that the data that you use in a machine learning process is always split into two parts

We have the training data and we have the testing data

Now when you're building a model, you always use the training data

So you always make use of the training data in order to build the model

Now a lot of you might be asking what is training data

Is it different from the input data that you're feeding to the machine or is it different from the testing data? Now training data is the same input data that you're feeding to the machine

The only difference is that you're splitting the data set into two

You're randomly picking 80% of your data and you're assigning it for training purposes

And the remaining 20%, probably, you'll assign for testing purposes
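The random 80/20 split can be sketched in a few lines of plain Python. The function name `train_test_split`, the fixed seed, and the stand-in records are assumptions for illustration; libraries such as scikit-learn ship their own helper for this, which may be what the demos use.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows, then cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))                  # stand-in for 100 rows of data
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before cutting matters: if the rows are sorted, say by date, taking the first 80% without shuffling would give the model a training set that looks different from the testing set.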

So guys, always remember another thing: the training data is always much more than your testing data, obviously because you need to train your machine

And the more data you feed the machine during the training phase, the better it will be during the testing phase

Obviously, it'll predict better outcomes if it is being trained on more data

Correct? So the model is basically the machine learning algorithm that predicts the output by using the data fed to it

Now in the case of predicting rainfall, the output will be a categorical variable, because we'll be predicting whether it's going to rain or not

Okay, so let's say we have an output variable called rain

The two possible values that this variable can take are yes, it's going to rain, and no, it won't rain

Correct, so that is our outcome

Our outcome is a classification or a categorical variable

So for such cases where your outcome is a categorical variable, you'll be using classification algorithms

Again, an example of a classification algorithm is logistic regression, or you can also use support vector machines, you can use K nearest neighbor, and you can also use naive Bayes, and so on

Now don't worry about these terms, I'll be discussing all these algorithms with you

But just remember that while you're building a machine learning model, you'll make use of the training data

You'll train the model by using the training data and the machine learning algorithm

Now like I said, choosing the machine learning algorithm depends on the problem statement that you're trying to solve, because there are N number of machine learning algorithms

We'll have to choose the algorithm that is the most suitable for your problem statement

So step number six is model evaluation and optimization

Now after you're done building a model by using the training data set, it is finally time to put the model to the test

The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome

So once the accuracy is calculated, any further improvements in the model can be implemented during this stage

There are various methods that can help you improve the performance of the model, like you can use parameter tuning and cross validation methods in order to improve the performance of the model

Now the main thing you need to remember during model evaluation and optimization is that model evaluation is nothing but testing how well your model can predict the outcome

So at this stage, you will be using the testing data set

In the previous stage, which is building a model, you'll be using the training data set

But in the model evaluation stage, you'll be using the testing data set

Now once you've tested your model, you need to calculate the accuracy

You need to calculate how accurately your model is predicting the outcome
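For a classification problem like rain or no rain, accuracy is simply the fraction of testing rows where the prediction matches the real outcome. Here is a minimal sketch; the two lists of labels are invented for illustration and do not come from any model run in this session.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical testing-set labels and the model's predictions for them.
actual = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "yes", "no"]

score = accuracy(actual, predicted)
print(score)
```

Accuracy is only one possible measure; it is used here because it is the one named in the walkthrough.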

After that, if you find that you need to improve your model in some way or the other, because the accuracy is not very good, then you'll use methods such as parameter tuning

Don't worry about these terms, I'll discuss all of this with you, but I'm just trying to make sure that you're understanding the concept behind each of the phases in machine learning

It's very important you understand each step

Okay, now let's move on and look at the last stage of machine learning, which is predictions

Now, once a model is evaluated and once you've improved it, it is finally used to make predictions

The final output can either be a categorical variable or a continuous variable

Now all of this depends on your problem statement

Don't get confused about continuous variables, categorical variables

I'll be discussing all of this

Now in our case, because we're predicting the occurrence of rainfall, the output will be a categorical variable

It's obvious because we're predicting whether it's going to rain or not

As a result, we understand that this is a classification problem because we have a categorical variable

So that was the entire machine learning process

Now it's time to learn about the different ways in which machines can learn

So let's move ahead and look at the types of machine learning

Now this is one of the most interesting concepts in machine learning, the three different ways in which machines learn

There is something known as supervised learning, unsupervised learning, and reinforcement learning

So we'll go through this one by one

We'll understand what supervised learning is first, and then we'll look at the other two types

So to define supervised learning, it is basically a technique in which we teach or train the machine by using data which is well labeled

Now, in order to understand supervised learning, let's consider a small example

So, as kids, we all needed guidance to solve math problems

A lot of us had trouble solving math problems

So our teachers always helped us understand what addition is and how it is done

Similarly, you can think of supervised learning as a type of machine learning that involves a guide

The labeled data set is a teacher that will train you to understand the patterns in the data

So the labeled data set is nothing but the training data set

I'll explain more about this in a while

So, to understand supervised learning better, let's look at the figure on the screen

Right here we're feeding the machine images of Tom and Jerry, and the goal is for the machine to identify and classify the images into two classes

One will contain images of Tom and the other will contain images of Jerry

Now the main thing that you need to note in supervised learning is the training data set

The training data set is going to be very well labeled

Now what do I mean when I say that the training data set is labeled

Basically, what we're doing is we're telling the machine this is how Tom looks and this is how Jerry looks

By doing this, you're training the machine by using labeled data

So the main thing that you're doing is you're labeling every input data that you're feeding to the model

So, basically, your entire training data set is labeled

Whenever you're giving an image of Tom, there's gonna be a label there saying this is Tom

And when you're giving an image of Jerry, you're saying that this is how Jerry looks

So, basically, you're guiding the machine and you're telling it, "Listen, this is how Tom looks, this is how Jerry looks, and now you need to classify them into two different classes"

That's how supervised learning works

Apart from that, it's the same old process

After getting the input data, you're gonna perform data cleaning

Then there's exploratory data analysis, followed by creating the model by using the machine learning algorithm, and then this is followed by model evaluation, and finally, your predictions

Now, one more thing to note here is that the output that you get by using supervised learning is also labeled output

So, basically, you'll get two different classes, one of name Tom and one of name Jerry, and you'll get them labeled

That is how supervised learning works

The most important thing in supervised learning is that you're training the model by using a labeled data set
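As a rough sketch of learning from labeled data, the snippet below classifies a new example by copying the label of its closest labeled training example (a one-nearest-neighbor rule). The two numeric features, ear pointiness and body size, and all of their values are invented stand-ins for real image data; the session's figure works on actual images, which this toy example does not attempt.

```python
def nearest_label(training, features):
    """Return the label of the training example closest to `features`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    closest = min(training, key=lambda pair: sq_dist(pair[0], features))
    return closest[1]


# Labeled training data: (ear_pointiness, body_size) -> label (made-up numbers).
training = [
    ((0.9, 0.8), "Tom"),
    ((0.8, 0.9), "Tom"),
    ((0.2, 0.1), "Jerry"),
    ((0.3, 0.2), "Jerry"),
]

print(nearest_label(training, (0.85, 0.7)))    # close to the Tom examples
print(nearest_label(training, (0.25, 0.15)))   # close to the Jerry examples
```

The point to notice is that every training row carries its answer, and the output is one of those same labels.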

Now let's move on and look at unsupervised learning

We'll look at the same example and understand how unsupervised learning works

So what exactly is unsupervised learning? Now this involves training by using unlabeled data and allowing the model to act on that information without any guidance

Alright

Like the name itself suggests, there is no supervision here

It's unsupervised learning

So think of unsupervised learning as a smart kid that learns without any guidance

Okay, in this type of machine learning, the model is not fed with any labeled data, as in the model has no clue that this is the image of Tom and this is Jerry

It figures out patterns and the difference between Tom and Jerry on its own by taking in tons and tons of data

Now how do you think the machine identifies this as Tom, and then finally gives us the output like yes this is Tom, this is Jerry

For example, it identifies prominent features of Tom, such as pointy ears, bigger in size, and so on, to understand that this image is of type one

Similarly, it finds out features in Jerry, and knows that this image is of type two, meaning that the first image is different from the second image

So what the unsupervised learning algorithm or the model does is it'll form two different clusters

It'll form one cluster of images which are very similar, and another cluster which is very different from the first cluster

That's how unsupervised learning works

So the important thing that you need to know in unsupervised learning is that you're gonna feed the machine unlabeled data

The machine has to understand the patterns and discover the output on its own

And finally, the machine will form clusters based on feature similarity
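Forming clusters from unlabeled data can be sketched with a bare-bones version of K-means, an algorithm this session names later on. The points, the hand-picked starting centroids, and the fixed iteration count are assumptions for illustration; real implementations pick starting centroids more carefully and stop once the clusters no longer change.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Unlabeled, made-up feature vectors: no "Tom" or "Jerry" tags anywhere.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[-1]])
print(labels)
```

Notice that the output is only a cluster number per point. The algorithm can tell that two groups exist, but it has no way to name them.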

Now let's move on and look at the last type of machine learning, which is reinforcement learning

Reinforcement learning is quite different when compared to supervised and unsupervised learning

What exactly is reinforcement learning? It is a part of machine learning where an agent is put in an environment, and he learns to behave in this environment by performing certain actions and observing the rewards which he gets from those actions

To understand what reinforcement learning is, imagine that you were dropped off at an isolated island

What would you do? Panic

Yes, of course, initially, we'll all panic

But as time passes by, you will learn how to live on the island

You will explore the environment, you will understand the climate conditions, the type of food that grows there, the dangers of the island, and so on

This is exactly how reinforcement learning works

It basically involves an agent, which is you stuck on the island, that is put in an unknown environment, which is the island, where he must learn by observing and performing actions that result in rewards

So reinforcement learning is mainly used in advanced machine learning areas such as self-driving cars and AlphaGo

I'm sure a lot of you have heard of AlphaGo

So, the logic behind AlphaGo is nothing but reinforcement learning and deep learning

And in reinforcement learning, there is not really any input data given to the agent

All he has to do is explore everything from scratch; it's like a newborn baby with no information about anything

He has to go around exploring the environment, performing some actions which result in either rewards or some sort of punishment
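Here is a small sketch of that trial-and-error loop, using tabular Q-learning, an algorithm this session names later. The environment is entirely made up: a corridor of five cells where the rightmost cell gives a reward of +1, the leftmost cell gives a punishment of -1, and the agent starts in the middle. The learning rate, discount factor, episode count, and the purely random exploration policy are all assumptions chosen to keep the example short.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values for a tiny corridor by random exploration."""
    rng = random.Random(seed)
    actions = {"left": -1, "right": +1}
    # One value per (state, action) pair, all starting at zero: no prior knowledge.
    q = {s: {a: 0.0 for a in actions} for s in range(n_states)}
    goal, pit = n_states - 1, 0

    for _ in range(episodes):
        state = n_states // 2
        while state not in (goal, pit):
            action = rng.choice(list(actions))        # explore by trying things
            next_state = state + actions[action]
            if next_state == goal:
                reward = 1.0                          # reward
            elif next_state == pit:
                reward = -1.0                         # punishment
            else:
                reward = 0.0
            # Terminal states are never updated, so their values stay at zero.
            best_next = max(q[next_state].values())
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# The learned behavior: in each cell, pick the action with the highest value.
policy = {s: max(q_table[s], key=q_table[s].get) for s in (1, 2, 3)}
print(policy)
```

No data set is handed to the agent here. Everything it knows at the end comes from the rewards and punishments it ran into while wandering around.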

Okay

So that sums up the types of machine learning

Before we move ahead, I'd like to discuss the difference between the three types of machine learning, just to make the concept clear to you all

So let's start by looking at the definitions of each

In supervised learning, the machine will learn by using the labeled data

In unsupervised learning, there'll be unlabeled data, and the machine has to learn without any supervision

In reinforcement learning, there'll be an agent which interacts with the environment by producing actions and discovers errors or rewards based on his actions

Now what are the types of problems that can be solved by using supervised, unsupervised, and reinforcement learning

When it comes to supervised learning, the two main types of problems that are solved are regression problems and classification problems

When it comes to unsupervised learning, it is association and clustering problems

When it comes to reinforcement learning, it's reward-based problems

I'll be discussing regression, classification, clustering, and all of this in the upcoming slides, so don't worry if you don't understand this

Now the type of data which is used in supervised learning is labeled data

In unsupervised learning, it's unlabeled

And in reinforcement learning, we have no predefined data set

The agent has to do everything from scratch

Now the type of training involved in each of these learnings

In supervised learning, there is external supervision, as in there is the labeled data set which acts as a guide for the machine to learn

In unsupervised learning, there's no supervision

Again, in reinforcement learning, there's no supervision at all

Now what is the approach to solve problems by using supervised, unsupervised, and reinforcement learning? In supervised learning, it is simple

You have to map the labeled input to the known output

The machine knows what the output looks like

So you're just mapping the input to the output

In unsupervised learning, you're going to understand the patterns and discover the output

Here you have no clue about what the input is

It's not labeled

You just have to understand the patterns and you'll have to form clusters and discover the output

In reinforcement learning, there is no clue at all

You'll have to follow the trial and error method

You'll have to go around your environment

You'll have to explore the environment, and you'll have to try some actions

And only once you perform those actions, you'll know whether this is a reward-based action or whether this is a punishment-based action

So, reinforcement learning is totally based on the concept of trial and error

Okay

Popular algorithms under supervised learning include linear regression, logistic regression, support vector machines, K nearest neighbor, naive Bayes, and so on

Under unsupervised learning, we have the famous K-means clustering method, C-means and all of that

Under reinforcement learning, we have the famous Q-learning algorithm

I'll be discussing these algorithms in the upcoming slides

So let's move on and look at the next topic, which is the types of problems solved using machine learning

Now this is what we were talking about earlier when I said regression, classification, and clustering problems

Okay, so let's discuss what exactly I mean by that

In machine learning, all the problems can be classified into three types

Every problem that is approached in machine learning can be put into one of these three categories

Okay, so the first type is known as regression, then we have classification and clustering

So, first, let's look at regression type of problems

So in this type of problem, the output is always a continuous quantity

For example, if you want to predict the speed of a car, given the distance, it is a regression problem

Now a lot of you might not be very aware of what exactly a continuous quantity is

A continuous quantity is any quantity that can have an infinite range of values

For example, the weight of a person is a continuous quantity, because our weight can be 50, 50.1, 50.001, 50.0021, 50.0321, and so on

It can have an infinite range of values, correct? So for the type of problem where you have to predict a continuous quantity, you make use of regression algorithms

So, regression problems can be solved by using supervised learning algorithms like linear regression

Next, we have classification

Now in this type of problem, the output is always a categorical value

Now when I say categorical value, it can be a value such as the gender of a person

Now classifying emails into two classes like spam and non-spam is a classification problem that can be solved by using supervised learning classification algorithms, like support vector machines, naive Bayes, logistic regression, K nearest neighbor, and so on

So, again, the main aim in classification is to compute the category of the data

Coming to clustering problems

This type of problem involves assigning the input into two or more clusters based on feature similarity

Thus when I read this sentence, you should understand that this is unsupervised learning, because you don't have enough data about your input, and the only option that you have is to form clusters. Categories are formed only when you know that your data is of two types

Your input data is labeled and it's of two types, so it's gonna be a classification problem

But a clustering problem happens when you don't have much information about your input; all you have to do is find patterns and understand that data points which are similar are clustered into one group, and data points which are different from the first group are clustered into another group

That's what clustering is

An example is Netflix: what happens is Netflix clusters their users into similar groups based on their interests, based on their age, geography, and so on

This can be done by using unsupervised learning algorithms like K-means

Okay

So guys, these were the three categories of problems that can be solved by using machine learning

So, basically, what I'm trying to say is all the problems will fall into one of these categories

So any problem that you give to a machine learning model, it'll fall into one of these categories

Okay

Now to make things a little more interesting, I have collected real world data sets from online resources

And what we're gonna do is we're going to try and understand if this is a regression problem, or a clustering problem, or a classification problem

Okay

Now the problem statement in here is to study the house sales data set, and build a machine learning model that predicts the house pricing index

Now the most important thing you need to understand when you read a problem statement is what your target variable is and what possible predictor variables you'll need

The first thing you should look at is your target variable

If you want to understand if this is a classification, regression, or clustering problem, look at your target variable or your output variable that you're supposed to predict

Here you're supposed to predict the house pricing index

Our house pricing index is obviously a continuous quantity

So as soon as you understand that, you'll know that this is a regression problem

So for this, you can make use of the linear regression algorithm, and you can predict the house pricing index

Linear regression is a regression algorithm

It is a supervised learning algorithm

We'll discuss more about it in the further slides

Let's look at our next problem statement

Here you have to study a bank credit data set, and make a decision about whether to approve the loan of an applicant based on his profile

Now what is your output variable over here? Your output variable is to predict whether you can approve the loan of an applicant or not

So, obviously, your output is going to be categorical

It's either going to be yes or no

Yes is basically approved loan

No is reject loan

So here, you understand that this is a classification problem

Okay

So you can make use of algorithms like the KNN algorithm or you can make use of support vector machines in order to do this

So, support vector machine and KNN, which is the K nearest neighbor algorithm, are basically supervised learning algorithms

We'll talk more about that in the upcoming slides

Moving on to our next problem statement

Here the problem statement is to cluster a set of movies as either good or average based on the social media outreach

Now if you look properly, your clue is in the question itself

The first line says to cluster a set of movies as either good or average

Now guys, whenever you have a problem statement that is asking you to group the data set into different groups or to form different clusters, it's obviously a clustering problem

Right here you can make use of the K-means clustering algorithm, and you can form two clusters

One will contain the popular movies and the other will contain the non-popular movies

These are small examples of how you can use machine learning to solve clustering, regression, and classification problems

The key is you need to identify the type of problem first
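The rule of thumb from these three examples, look at the target variable first, can be written down as a tiny helper. This is only a rough heuristic sketched for illustration: the function name `problem_type` and the sample values are made up, and the check is deliberately crude (for instance, categories stored as numeric codes would be mistaken for a continuous target).

```python
def problem_type(target_values):
    """Guess the problem type from the target variable, if there is one."""
    if target_values is None:
        # No labeled target at all: the only option is to group similar rows.
        return "clustering"
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in target_values
    )
    # Numeric targets are treated as continuous, everything else as categories.
    return "regression" if numeric else "classification"


print(problem_type([251.3, 260.1, 248.7]))       # house pricing index values
print(problem_type(["approve", "reject"]))       # loan decision
print(problem_type(None))                        # movies with no labels
```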

Now let's move on and discuss the different types of machine learning algorithms

So we're gonna start by discussing the different supervised learning algorithms

So to give you a quick overview, we'll be discussing linear regression, logistic regression, decision tree, random forest, naive Bayes classifier, support vector machines, and K nearest neighbor

We'll be discussing these seven algorithms

So without any further delay, let's look at linear regression first

Now what exactly is a linear regression algorithm? So guys, linear regression is basically a supervised learning algorithm that is used to predict a continuous dependent variable y based on the values of independent variable x

Okay

The important thing to note here is that the dependent variable y, the variable that you're trying to predict, is always going to be a continuous variable

But the independent variable x, which is basically the predictor variables, these are the variables that you'll be using to predict your output variable, which is nothing but your dependent variable

So your independent variables or your predictor variables can either be continuous or discrete

Okay, there is no such restriction over here

Okay, they can be either continuous variables or they can be discrete variables

Now, again, I'll tell you what a continuous variable is, in case you've forgotten

It is a variable that has an infinite number of possibilities

So I'll give you an example of a person's weight

It can be 160 pounds, or they can weigh 160.11 pounds, or 160.1134 pounds and so on

So the number of possibilities for weight is limitless, and this is exactly what a continuous variable is

Now in order to understand linear regression, let's assume that you want to predict the price of a stock over a period of time

Okay

For such a problem, you can make use of linear regression by studying the relationship between the dependent variable, which is the stock price, and the independent variable, which is the time

You're trying to predict the stock price over a period of time

So basically, you're gonna check how the price of a stock varies over a period of time

So your stock price is going to be your dependent variable or your output variable, and the time is going to be your predictor variable or your independent variable

Let's not confuse it anymore

Your dependent variable is your output variable

Okay, your independent variable is your input variable or your predictor variable

So in our case, the stock price is obviously a continuous quantity, because the stock price can have an infinite number of values

Now the first step in linear regression is always to draw out a relationship between your dependent and your independent variable by using the best fitting line

We make an assumption that your dependent and independent variables are linearly related to each other

We call it linear regression because both the variables vary linearly, which means that by plotting the relationship between these two variables, we'll get more of a straight line, instead of a curve

Let's discuss the math behind linear regression

So, this equation over here, it denotes the relationship between your independent variable x, which is here, and your dependent variable y

This is the variable you're trying to predict

Hopefully, we all know that the equation for a straight line in math is y equals mx plus c

I hope all of you remember math

So the equation for a straight line in math is y equals mx plus c

Similarly, the linear regression equation is represented along the same lines

Okay, y equals mx plus c

There are just a few changes, which I'll tell you about: going by the terms explained next, the equation takes the form y = B0 + B1*x + e

Let's understand this equation properly

So y basically stands for your dependent variable that you're going to predict

B naught is the y intercept

Now the y intercept is nothing but this point here

Now in this graph, you're basically showing the relationship between your dependent variable y and your independent variable x

Now this is the linear relationship between these two variables

Okay, now your y intercept is basically the point at which the line crosses the y-axis

This is the y intercept, which is represented by B naught

Now B one or beta is the slope of this line; the slope can either be negative or positive, depending on the relationship between the dependent and independent variable

The next variable that we have is x

X here represents the independent variable that is used to predict our resulting output variable

Basically, x is used to predict the value of y

Okay

E here denotes the error in the computation

For example, this line gives the predicted values, and these dots here represent the actual values

Now the distance between these two is the error in the computation

So this is the entire equation

It's quite simple, right? Linear regression will basically draw a relationship between your input and your output variable

That's how simple linear regression is
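To see the pieces of the equation y = B0 + B1*x + e in action, here is a short sketch that fits a straight line with ordinary least squares, the standard way of choosing the best fitting line. The data points are invented for illustration, and the names `b0` and `b1` simply mirror B naught and B one from the explanation above; the demo later in the session may compute the fit with a library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, for a one-unit change in x.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: where the fitted line crosses the y-axis.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up observations that lie close to, but not exactly on, a straight line.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_line(xs, ys)
# e: the gap between each actual value and the value the line predicts.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(b0, 2), round(b1, 2))
```

The residuals are the e term for each point. Least squares picks the line that makes the sum of their squares as small as possible, which is why they also add up to roughly zero.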

Now to better understand linear regression, I'll be running a demo in Python

So guys, before I get started with our practical demo, I'm assuming that most of you have a good understanding of Python, because explaining Python is going to be out of the scope of today's session

But if some of you are not familiar with the Python language, I'll leave a couple of links in the description box

Those will be related to Python programming

You can go through those links, understand Python, and then maybe try to understand the demo

But I'd be explaining the logic part of the demo in depth

So the main thing that we're going to do here is try and understand linear regression

So it's okay if you do not understand Python for now

I'll try to explain as much as I can

But if you still want to understand this in a better way, I'll leave a couple of links in the description box; you can go to those videos

Let me just zoom in for you

I hope all of you can see the screen

Now in this linear regression demo, what we're going to do is we're going to form a linear relationship between the maximum temperature and minimum temperature on a particular date

We're just going to do weather forecasting here

So our task is to predict the maximum temperature, taking the minimum temperature as the input feature

So I'm just going to try and make you understand linear regression through this demo

Okay, we'll see how it actually works practically

Before I get started with the demo, let me tell you something about the data set

Our data set is stored in this path basically

The name of the data set is weather.csv

Okay, now, this contains data on weather conditions recorded on each day at various weather stations around the world

Okay, the information includes precipitation, snowfall, temperatures, wind speeds, and whether the day included any thunderstorm or other poor weather conditions

So our first step in any demo for that matter will be to import all the libraries that are needed

So we're gonna begin our demo by importing all the required libraries

After that, we're going to read in our data

Our data will be stored in this variable called data set, and we're going to use a read_csv function since our data set is in the CSV format

After that, I'll be showing you how the data set looks

We'll also look at the data set in depth
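To make the "read the data and print its shape" step concrete, here is a minimal sketch that uses only Python's built-in csv module. It is not the code from the demo, which presumably uses a data-frame library. The column names MinTemp and MaxTemp and the three rows are made up for illustration, since the real weather.csv is not reproduced here.

```python
import csv
import io

# Hypothetical stand-in for weather.csv: the column names MinTemp and MaxTemp
# and the three rows below are invented for illustration only.
SAMPLE_CSV = """MinTemp,MaxTemp
20.0,29.0
22.5,31.5
18.0,27.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)

# (number of rows, number of columns): the same idea as printing the shape
shape = (len(rows), len(rows[0]))
print(shape)
```

With the real file you would open weather.csv instead of the in-memory sample, and the shape would be the roughly 12,000 rows by 31 columns mentioned in the demo.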

Now let me just show you the output first

Let's run this demo and see first

We're getting a couple of plots which I'll talk about in a while

So we can ignore this warning

It has nothing to do with the demo

So, first of all, we're printing the shape of our data set

So, when we print the shape of our data set, this is the output that we get

So, basically, this shows that we have around 12,000 rows and 31 columns in our data set

The 31 columns basically represent our variables, the predictor variables plus the target

So you can say that we have 31 variables to work with in order to predict the weather conditions on a particular date

So guys, the main aim in this problem statement is weather forecasting

We're going to predict the weather by using a set of predictor variables

So these are the different types of predictor variables that we have

Okay, we have something known as maximum temperature

So this is what our data set looks like

Now what we're doing in this block of code is plotting our data points on a 2D graph in order to understand our data set and see if we can manually find any relationship between the variables

Here we've taken minimum temperature and maximum temperature for doing our analysis

So let's just look at this plot

Before that, let me just comment out all of these other plots, so that you see only the graph that I'm talking about

So, when you look at this graph, this is basically the graph between your minimum temperature and your maximum temperature

Maximum temperature is our dependent variable, the one you're going to predict

This is y

And your minimum temperature is your x

It's basically your independent variable

So if you look at this graph, you can see that there is a sort of linear relationship between the two, except there are a few outliers here and there

There are a few data points which are a little bit random

But apart from that, there is a pretty linear relationship between your minimum temperature and your maximum temperature

So from this graph, you can understand that you can easily solve this problem using linear regression, because our data is very linear

I can see a clear straight line over here

This is our first graph

Next, what I'm doing is I'm just checking the average maximum temperature that we have

I'm just looking at the average of our output variable

Okay

So guys, what we're doing here right now is just exploratory data analysis

We're trying to understand our data

We're trying to see the relationship between our input variable and our output variable

We're trying to see the mean or the average of the output variable

All of this is necessary to understand our data set

So, this is what our average maximum temperature looks like

So if we try to understand where exactly this is, our average maximum temperature is somewhere between 28 and 32

So you can say that the average maximum temperature lies between 25 and 35

And so that is our average maximum temperature

Now that you know a little bit about the data set, you know that there is a very good linear relationship between your input variable and your output variable

Now what you're going to do is perform something known as data splicing

Let me just comment that for you

This section is nothing but data splicing

So those of you who are paying attention know that data splicing is nothing but splitting your data set into training and testing data

Now before we do that, I mentioned earlier that we'll be using only two variables, because we're trying to understand the relationship between the minimum temperature and the maximum temperature

I'm doing this because I want you to understand linear regression in the simplest way possible

So guys, in order to make you understand linear regression, I have derived only two variables from our data set

Even though, when we checked the structure of the data set, we had around 31 features, meaning that we had 31 variables, which include my predictor variables and my target variable

So, basically, we had 30 predictor variables and we had one target variable, which is your maximum temperature

So, what I'm doing here is I'm only considering these two variables, because I want to show you exactly how linear regression works

So, here what I'm doing is I'm basically extracting only these two variables from our data set, storing them in x and y

After that, I'm performing data splicing

So here, I'm basically splitting the data into training and testing data, and remember one point: I am assigning 20% of the data to our testing data set, and the remaining 80% is assigned for training

That's how training works

We assign most of the data set to training

We do this because we want the machine learning model or the machine learning algorithm to train better on the data

We want it to take in as much data as possible, so that it can predict the outcome properly

So, to repeat it again for you, here we're just splitting the data into training and testing data sets

So, one more thing to note here is that we're assigning 80% of the data to training, and we're assigning 20% of the data to the test data

The test size variable, this variable that you see, is what is used to specify the proportion of the test set
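The 80/20 split described above can be sketched by hand in plain Python. This is only an illustration of the idea, not the demo's code: the function name split_train_test, the seed argument, and the ten temperature pairs are all assumptions made for this example. The test_size argument plays the same role as the test size variable mentioned above.

```python
import random


def split_train_test(x, y, test_size=0.2, seed=0):
    """Shuffle paired samples and hold out a test_size fraction for testing."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_test = int(round(len(x) * test_size))   # e.g. 20% of the samples
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [x[i] for i in train_idx]
    x_test = [x[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Made-up minimum/maximum temperature pairs, for illustration only
min_temps = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
max_temps = [20, 22, 23, 25, 27, 29, 31, 33, 35, 36]

x_train, x_test, y_train, y_test = split_train_test(min_temps, max_temps)
print(len(x_train), len(x_test))
```

Shuffling before splitting matters because data files are often sorted by date or station, and an unshuffled split would test the model on a different slice of conditions than it was trained on.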

Now after splitting the data into training and testing sets, finally, it's time to train our algorithm

For that, we need to import the linear regression class

We need to instantiate it and call the fit method along with the training data

This is our linear regression class, and we're just creating an instance of the linear regression class

So guys, a good thing about Python is that you have pre-defined classes for your algorithms, and you don't have to code your algorithms

Instead, all you have to do is call this linear regression class and create an instance of it

Here I'm basically creating something known as a regressor

And all you have to do is call the fit method along with your training data

So this is my training data, x train and y train contain my training data, and I'm calling fit on our linear regression instance, which is regressor, along with this data set

So here, basically, we're building the model

We're doing nothing but building the model

Now, one of the major things that a linear regression model does is it finds the best values for the intercept and the slope, which results in a line that best fits the data

I've discussed what the intercept and slope are

So if you want to see the intercept and the slope calculated by our linear regression model, we just have to run this line of code
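To show what "finding the best intercept and slope" means, here is a small sketch of the ordinary least-squares formulas for a single input variable. It is a hand-written stand-in for what the library's fit method does for you, not the demo's code, and the name fit_line and the toy data are assumptions for this example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x, so the fit should recover 1 and 2
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

On real, noisy data such as the temperature readings, the same formulas give the line that minimizes the sum of squared vertical distances between the points and the line.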

And let's look at the output for that

So, our intercept is around 10.66, and our coefficient, these are also known as beta coefficients, is nothing but what we discussed: the intercept is beta naught and the coefficient is beta one

These are beta values

Now this will just help you understand the significance of your input variables

Now what this coefficient value means is, see, the coefficient value is around 0.92

This means that for every one unit change in your minimum temperature, the change in the maximum temperature is around 0.92

This will just show you how significant your input variable is

So, for every one unit change in your minimum temperature, the change in the maximum temperature will be around 0.92

I hope you've understood this part

Now that we've trained our algorithm, it's time to make some predictions

To do so, we'll use our test data set, and we'll see how accurately our algorithm predicts the maximum temperature

Now to make predictions, we have this line of code

predict is basically a predefined method on our regressor

And all you're going to do is pass your testing data set to this

Now what you'll do is you'll compare the actual output values, which are basically stored in your y test

And you'll compare these to the predicted values, which are in y prediction

And you'll store these comparisons in a data frame called df

And all I'm doing here is I'm printing the data frame

So if you look at the output, this is what it looks like

These are your actual values and these are the values that you predicted by building that model

So, if your actual value is 28, you predicted around 33, here your actual value is 31, meaning that your maximum temperature is 31

And you predicted a maximum temperature of 30

Now, these values are actually pretty close

I feel like the accuracy is pretty good over here

Now in some cases, you see a lot of variance, like 23 against 15, or 22 against 11

But such cases are not very frequent

And the best way to improve your accuracy, I would say, is by training the model with more data

Alright

You can also view this comparison in the form of a plot

Let's see how that looks

So, basically, this is a bar graph that shows our actual values and our predicted values

Blue represents your actual values, and orange represents your predicted values

At places you can see that we've predicted pretty well, like the predictions are pretty close to the actual values

In some cases, the predictions are varying a little bit

So in a few places, it is actually varying, but all of this depends on your input data as well

When we saw the input data, we also saw a lot of variation

We saw a couple of outliers

So, all that also might affect your output

But then this is how you build machine learning models

Initially, you're never going to get a really good accuracy

What you should do is improve your training process

That's the best way you can predict better: either you use a lot of data, train your model with a lot of data, or you use other methods like parameter tuning, or basically you try and find another predictor variable that'll help you more in predicting your output

To me, this looks pretty good

Now let me show you another plot

What we're doing is we're drawing a straight line plot

Okay, let's see how it looks

So guys, this straight line represents a linear relationship

Now let's say you get a new data point

Okay, let's say the value of x is around 20

So by using this line, you can predict that for a minimum temperature of 20, your maximum temperature would be around 29 or something like that

So, we basically drew a linear relationship between our input and output variable over here
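Reading a prediction off the line is the same as plugging x into y = intercept + slope times x. The sketch below assumes the rounded intercept (10.66) and coefficient (0.92) quoted from the demo output earlier. The function name predict_max_temp is made up for this example.

```python
INTERCEPT = 10.66  # intercept quoted in the demo output (rounded)
SLOPE = 0.92       # coefficient quoted in the demo output (rounded)


def predict_max_temp(min_temp):
    """Apply y = intercept + slope * x to one minimum temperature."""
    return INTERCEPT + SLOPE * min_temp


# Prediction for a new data point with a minimum temperature of 20
print(round(predict_max_temp(20), 2))
```

With these rounded values, a minimum temperature of 20 maps to a maximum temperature of roughly 29, and each extra degree of minimum temperature adds about 0.92 to the prediction.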

And the final step is to evaluate the performance of the algorithm

This step is particularly important to compare how well different algorithms perform on a particular data set

Now for regression algorithms, three evaluation metrics are used

We have something known as mean absolute error, mean squared error, and root mean squared error

Now mean absolute error is nothing but the mean of the absolute values of the errors

Your mean squared error is the mean of the squared errors

That's all

Basically, you read the name and you understand what the error means

Root mean squared error is the square root of the mean of the squared errors

Okay

So these are pretty simple to understand: your mean absolute error, your mean squared error, your root mean squared error
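The three metrics are short enough to write out by hand, which makes the definitions concrete. This is a sketch in plain Python rather than the library calls used in the demo. The two actual/predicted pairs are the ones read out from the comparison table earlier (28 vs 33 and 31 vs 30), used here only as a tiny worked example.

```python
import math


def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error, in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Two actual/predicted pairs mentioned in the comparison table
actual = [28, 31]
predicted = [33, 30]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, mse, round(rmse, 2))
```

Squaring the errors makes the one large miss (5 degrees) dominate the mean squared error, which is why the squared metrics are more sensitive to outliers than the mean absolute error.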

Now, luckily, we don't have to perform these calculations manually

We don't have to code each of these calculations

The scikit-learn library comes with prebuilt functions that can be used to find out these values

Okay

So, when you run this code, you will get these values for each of the errors

You'll get around 3.19 as the mean absolute error

Your mean squared error is around 17.63

Your root mean squared error is around 4.19

Now these error values basically show that our model is not very precise, but it's still able to make decent predictions

We can draw a good linear relationship

Now in order to improve the efficiency, there are a lot of methods, like parameter tuning and all of that, or basically you can train your model with a lot more data

Apart from that, you can use other predictor variables, or maybe you can study the relationship between other predictor variables and your maximum temperature variable

There are a lot of ways to improve the efficiency of the model

But for now, I just wanted to make you understand how linear regression works, and I hope all of you have a good idea about this

I hope all of you have a good understanding of how linear regression works

This is a small demo about it

If any of you still have any doubts regarding linear regression, please leave them in the comment section

We'll try and solve all your doubts

So, if you look at this equation, we calculated everything here

We drew a relationship between y and x, where x was our minimum temperature and y was our maximum temperature

We also calculated the slope and the intercept

And we also calculated the error in the end

We calculated the mean squared error and the root mean squared error

We also calculated the mean absolute error

So that was everything about linear regression

This was a simple linear regression model

Now let's move on and look at our next algorithm, which is logistic regression

Now, in order to understand why we use logistic regression, let's consider a small scenario

Let's say that your little sister is trying to get into grad school and you want to predict whether she'll get admitted to her dream school or not

Okay, so based on her CGPA and the past data, you can use logistic regression to foresee the outcome

So logistic regression will allow you to analyze the set of variables and predict a categorical outcome

Since here we need to predict whether she will get into a school or not, which is a classification problem, logistic regression will be used

Now I know the first question in your head is, why are we not using linear regression in this case? The reason is that linear regression is used to predict a continuous quantity, rather than a categorical one

Here we're going to predict whether or not your sister is going to get into grad school

So that is clearly a categorical outcome

So when the resulting outcome can take only two classes of values, it is sensible to have a model that predicts the value as either zero or one, or in a probability form that ranges between zero and one

Okay

So linear regression does not have this ability

If you use linear regression to model a binary outcome, the resulting model will not restrict the predicted y values to the range of zero to one, because linear regression works on continuous dependent variables, and not on categorical variables

That's why we make use of logistic regression

So understand that linear regression is used to predict continuous quantities, and logistic regression is used to predict categorical quantities

Okay, now one major confusion that everybody has is why logistic regression is called logistic regression when it is used for classification

The reason it is named logistic regression is that its primary technique is very similar to linear regression

There's no other reason behind the naming

It belongs to the generalized linear models

It belongs to the same class as linear regression, and there is no other reason behind the name logistic regression

Logistic regression is mainly used for classification purposes, because here you'll have to predict a dependent variable which is categorical in nature

So this is mainly used for classification

So, to define logistic regression for you, logistic regression is a method used to predict a dependent variable y, given an independent variable x, such that the dependent variable is categorical, meaning that your output is a categorical variable

So, obviously, this is a classification algorithm

So guys, again, to clear your confusion, when I say categorical variable, I mean that it can hold values like one or zero, yes or no, true or false, and so on

So, basically, in logistic regression, the outcome is always categorical

Now, how does logistic regression work? So guys, before I tell you how logistic regression works, take a look at this graph

Now I told you that the outcome in logistic regression is categorical

Your outcome will either be zero or one, or it'll be a probability that ranges between zero and one

So, that's why we have this S curve

Now some of you might wonder why we have an S curve

We could obviously have a straight line

We have something known as a sigmoid curve, because we can have values ranging between zero and one, which will basically show the probability

So, maybe your output will be 0.7, which is a probability value

If it is 0.7, it means that your outcome is basically one

So that's why we have this sigmoid curve like this

Okay

Now I'll explain more about this in depth in a while

Now, in order to understand how logistic regression works, first, let's take a look at the linear regression equation

This was the linear regression equation that we discussed

Y here stands for the dependent variable that needs to be predicted, and beta naught is nothing but the y intercept

Beta one is nothing but the slope

And X here represents the independent variable that is used to predict y

The e denotes the error in the computation

So, given the fact that x is the independent variable and y is the dependent variable, how can we represent a relationship between x and y so that y ranges only between zero and one? Here this value basically denotes the probability of y equal to one, given some value of x

So here, Pr denotes probability, and this value basically denotes the probability of y equal to one, given some value of x; this is what we need to find out

Now, if you wanted to calculate the probability using the linear regression model, then the probability will look something like P of X equal to beta naught plus beta one into X

Here P of X is nothing but your probability of getting y equal to one, given some value of x

So the logistic regression equation is derived from the same equation, except we need to make a few alterations, because the output is only categorical

So, logistic regression does not necessarily calculate the outcome as zero or one

I mentioned this before

Instead, it calculates the probability of a variable falling in class zero or class one

So that's how we can conclude that the resulting variable must be positive, and it should lie between zero and one, which means that it must be less than one

So to meet these conditions, we have to do two things

First, we can take the exponent of the equation, because taking the exponential of any value will make sure that you get a positive number

Correct? Secondly, you have to make sure that your output is less than one

So, a positive number divided by itself plus one will always be less than one

So that's how we get this formula

First, we take the exponent of the equation, beta naught plus beta one into x, and then we divide it by that number plus one

So this is how we get this formula
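The two steps just described, exponentiate and then divide by that number plus one, can be checked with a few lines of Python. This is a minimal sketch: the coefficient values are hypothetical and chosen only for illustration, and the 0.5 cutoff used to turn a probability into a class is a common convention assumed here, not something stated in the session.

```python
import math


def probability(x, beta0, beta1):
    """P(X) = e^(b0 + b1*x) / (e^(b0 + b1*x) + 1), strictly between 0 and 1."""
    z = math.exp(beta0 + beta1 * x)  # step 1: exponentiate, so z is positive
    return z / (z + 1)               # step 2: divide by itself plus one


def classify(p, threshold=0.5):
    """Turn a probability into class 1 or class 0 (the 0.5 cutoff is an assumption)."""
    return 1 if p >= threshold else 0


# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -4.0, 1.0
for x in (0, 4, 8):
    p = probability(x, beta0, beta1)
    print(x, round(p, 3), classify(p))
```

Plotting probability against x for a range of x values would trace out the S curve: values near zero for small x, near one for large x, and a smooth transition in between.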

Now the next step is to calculate something known as the logit function

Now the logit function is nothing but a link function: it is the log of the odds, and it is the inverse of the S curve, or sigmoid curve, that ranges between the values zero and one

It basically ties the probability of the output variable to the linear equation

So if you look at this equation, it's quite simple

What we have done here is we just cross multiply and take e to the power beta naught plus beta one into x as common

The RHS denotes the linear equation for the independent variables

The LHS represents the log of the odds

So if you compute this entire thing, you'll get this final value, which is basically your logistic regression equation

Your RHS here denotes the linear equation for the independent variables, and your LHS represents the log of the odds, which is also known as the logit function
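The cross-multiplication described above can be written out step by step. This is a reconstruction from the spoken description, using the same symbols (beta naught, beta one, X, and P of X), since the slide itself is not part of the transcript.

```latex
\begin{align*}
P(X) &= \frac{e^{\beta_0 + \beta_1 X}}{e^{\beta_0 + \beta_1 X} + 1} \\
P(X)\left(e^{\beta_0 + \beta_1 X} + 1\right) &= e^{\beta_0 + \beta_1 X} \\
P(X) &= e^{\beta_0 + \beta_1 X}\left(1 - P(X)\right) \\
\frac{P(X)}{1 - P(X)} &= e^{\beta_0 + \beta_1 X} \\
\ln\!\left(\frac{P(X)}{1 - P(X)}\right) &= \beta_0 + \beta_1 X
\end{align*}
```

The left-hand side of the last line is the log of the odds (the logit), and the right-hand side is the familiar linear equation.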

So I told you that the logit function goes hand in hand with an S curve that ranges between zero and one

This will make sure that our value ranges between zero and one

So in logistic regression, on increasing X by one unit, the logit changes by beta one

It's the same thing as I showed you in linear regression
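A quick numerical check makes the last point concrete: a one-unit increase in X shifts the log of the odds by the slope, beta one, no matter where you start. The coefficients below are hypothetical values picked for illustration.

```python
import math


def probability(x, beta0, beta1):
    """Logistic probability e^(b0 + b1*x) / (e^(b0 + b1*x) + 1)."""
    z = math.exp(beta0 + beta1 * x)
    return z / (z + 1)


def logit(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -4.0, 0.5

# Change in the logit when x goes from 2 to 3
change = logit(probability(3, beta0, beta1)) - logit(probability(2, beta0, beta1))
print(round(change, 6))
```

The probability itself does not change by a fixed amount per unit of X, which is exactly why the relationship is stated in terms of the logit rather than the probability.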

So guys, that's how you derive the logistic regression equation

So if you have any doubts regarding these equations, please leave them in the comment section, and I'll get back to you and clear them up

So to sum it up, logistic regression is used for classification

The output variable will always be a categorical variable

We also saw how you derive the logistic regression equation

And one more important thing is that the relationship between the variables in logistic regression is denoted as an S curve, which is also known as a sigmoid curve, and also the outcome does not necessarily have to be calculated as zero or one

It can be calculated as a probability that the output lies in class one or class zero

So your output can be a probability ranging between zero and one

That's why we have a sigmoid curve

So I hope all of you are clear with logistic regression

Now I won't be showing you the demo right away

I'll explain a couple more classification algorithms

Then I'll show you a practical demo where we'll use multiple classification algorithms to solve the same problem

Again, we'll also calculate the accuracy and see which classification algorithm is doing the best

Now the next algorithm I'm gonna talk about is the decision tree

Decision tree is one of my favorite algorithms, because it's very simple to understand how a decision tree works

So guys, before this, we discussed linear regression, which was a regression algorithm

Then we discussed logistic regression, which is a classification algorithm

Remember, don't get confused just because it has the name logistic regression

Okay, it is a classification algorithm

Now we're discussing decision tree, which is again a classification algorithm

Okay

So what exactly is a decision tree? Now a decision tree is, again, a supervised machine learning algorithm which looks like an inverted tree, wherein each node represents a predictor variable, the link between the nodes represents a decision, and each leaf node represents an outcome

Now I know that's a little confusing, so let me make you understand what a decision tree is with the help of an example

Let's say that you hosted a huge party, and you want to know how many of your guests are non-vegetarians

So to solve this problem, you can create a simple decision tree

Now if you look at this figure over here, I've created a decision tree that classifies a guest as either vegetarian or non-vegetarian

Our final outcome here is non-veg or veg

So here you understand that this is a classification algorithm, because here you're predicting a categorical value

Each node over here represents a predictor variable

So eat chicken is one variable, eat mutton is one variable, seafood is another variable

So each node represents a predictor variable that will help you conclude whether or not a guest is a non-vegetarian

Now as you traverse down the tree, you'll make decisions at each node until you reach the dead end

Okay, that's how it works

So, let's say we got a new data point

Now we'll pass it through the decision tree

The first variable is did the guest eat the chicken? If yes, then he's a non-vegetarian

If no, then you'll pass it to the next variable, which is did the guest eat mutton? If yes, then he's a non-vegetarian

If no, then you'll pass it to the next variable, which is seafood

If he ate seafood, then he is a non-vegetarian

If no, then he's a vegetarian

This is how a decision tree works
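The walk through the tree above is just a chain of yes/no checks, so it can be written as a tiny function. This is a hand-coded sketch of the example tree, not a learned model. The function and argument names are made up for illustration.

```python
def classify_guest(ate_chicken, ate_mutton, ate_seafood):
    """Follow the example tree: chicken, then mutton, then seafood."""
    if ate_chicken:       # root node
        return "non-veg"
    if ate_mutton:        # first internal node
        return "non-veg"
    if ate_seafood:       # second internal node
        return "non-veg"
    return "veg"          # reached only if every answer was no


# A guest who skipped chicken and mutton but ate seafood
print(classify_guest(False, False, True))

# A guest who ate none of the three
print(classify_guest(False, False, False))
```

A real decision tree algorithm would learn which questions to ask, and in what order, from data, which is what the later discussion of entropy and information gain is about.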

It's a very simple algorithm that you can easily understand

It is drawn out as a tree, which is very easy to understand

Now let's understand the structure of a decision tree

I just showed you an example of how the decision tree works

Now let me take the same example and tell you the structure of a decision tree

So, first of all, we have something known as the root node

Okay

The root node is the starting point of a decision tree

Here you'll perform the first split and split it into two other nodes or three other nodes, depending on your problem statement

So the topmost node is known as your root node

Now guys, about the root node, the root node is assigned to a variable that is very significant, meaning that that variable is very important in predicting the output

Okay, so you assign a variable that you think is the most significant at the root node

After that, we have something known as internal nodes

So each internal node represents a decision point that eventually leads to the output

Internal nodes will have other predictor variables

Each of these is nothing but a predictor variable

I just made it into a question; otherwise these are just predictor variables

Those are internal nodes

Terminal nodes, also known as leaf nodes, represent the final class of the output variable, because these are basically your outcomes, non-veg and vegetarian

Branches are nothing but connections between nodes

Okay, these connections or links between nodes are known as branches, and they're represented by arrows

So each branch will have some response to it, either yes or no, true or false, one or zero, and so on

Okay

So, guys, this is the structure of a decision tree
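The parts just named (root node, internal nodes, leaf nodes, and yes/no branches) map naturally onto a small data structure. The sketch below is one possible representation, with a hypothetical Node class and predict function written for this example. It rebuilds the vegetarian tree from the earlier figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    question: Optional[str] = None  # predictor variable at a root or internal node
    yes: Optional["Node"] = None    # branch followed when the answer is yes
    no: Optional["Node"] = None     # branch followed when the answer is no
    label: Optional[str] = None     # set only on leaf (terminal) nodes


def predict(node, answers):
    """Walk from the root along the branches until a leaf node is reached."""
    while node.label is None:
        node = node.yes if answers[node.question] else node.no
    return node.label


# Leaf nodes hold the final classes
non_veg = Node(label="non-veg")
veg = Node(label="veg")

# Internal nodes and the root hold predictor variables
seafood = Node("eat seafood", yes=non_veg, no=veg)
mutton = Node("eat mutton", yes=non_veg, no=seafood)
root = Node("eat chicken", yes=non_veg, no=mutton)

guest = {"eat chicken": False, "eat mutton": False, "eat seafood": True}
print(predict(root, guest))
```

Keeping the tree as data rather than as hard-coded if statements is what lets an algorithm build or rearrange it, for example by choosing a different variable for the root node.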

It's pretty understandable

Now let's move on and we'll understand how the decision tree algorithm works

Now there are many ways to build a decision tree, but I'll be focusing on something known as the ID3 algorithm

Okay, this is something known as the ID3 algorithm

That is one of the ways in which you can build the decision tree

ID3 stands for Iterative Dichotomiser 3, which is one of the most effective algorithms used to build a decision tree

It uses the concepts of entropy and information gain in order to build a decision tree

Now you don't have to know what exactly the ID3 algorithm is

It's just a concept behind building a decision tree

Now the ID3 algorithm has around six defined steps in order to build a decision tree

So the first step is you will select the best attribute

Now what do you mean by the best attribute? So, attribute is nothing but the predictor variable over here

So you'll select the best predictor variable

Let's call it A

After that, you'll assign this A as a decision variable for the root node

Basically, you'll assign this predictor variable A at the root node

Next, what you'll do is for each value of A, you'll build a descendant of the node

Now these three steps, let's look at them with the previous example

Now here the best attribute is eat chicken

Okay, this is my best attribute variable over here

So I selected that attribute

And what is the next step? Step two was to assign that as a decision variable

So I assigned eat chicken as the decision variable at the root node

Now you might be wondering how do I know which is the best attribute

I'll explain all of that in a while

So what we did is we assigned this at the root node

After that, step number three says for each value of A, build a descendant of the node

So for each value of this variable, build a descendant node

So this variable can take two values, yes and no

So for each of these values, I build a descendant node

Step number four, assign classification labels to the leaf node

To the leaf nodes, I have assigned the classifications, one as non-veg and the other as veg

That is step number four

Step number five is if the data is correctly classified, then you stop at that

However, if it is not, then you keep iterating over the tree, and keep changing the position of the predictor variables in the tree, or you change the root node also, in order to get the correct output

So now let me answer this question

What is the best attribute? What do you mean by the best attribute or the best predictor variable? Now the best attribute is the one that separates the data into different classes most effectively, or it is basically the feature that best splits the data set

Now the next question in your head must be how do I decide which variable or which feature best splits the data

To do this, there are two important measures

There's something known as information gain and there's something known as entropy

Now guys, in order to understand information gain and entropy, we'll look at a simple problem statement

This data represents the speed of a car based on certain parameters

So our problem statement here is to study the data set and create a decision tree that classifies the speed of the car as either slow or fast

So our predictor variables here are road type, obstruction, and speed limit, and our response variable, or our output variable, is speed

So we'll be building a decision tree using these variables in order to predict the speed of the car

Now like I mentioned earlier, we must first begin by deciding a variable that best splits the data set and assign that particular variable to the root node and repeat the same thing for the other nodes as well

So step one, like we discussed earlier, is to select the best attribute A

Now, how do you know which variable best separates the data? The variable with the highest information gain best divides the data into the desired output classes

First of all, we'll calculate two measures

We'll calculate the entropy and the information gain

Now this is where I'll tell you what exactly entropy is, and what exactly information gain is

Now entropy is basically used to measure the impurity or the uncertainty present in the data

It is used to decide how a decision tree can split the data

Information gain, on the other hand, is the most significant measure which is used to build a decision tree

It indicates how much information a particular variable gives us about the final outcome

So information gain is important, because it is used to choose the variable that best splits the data at each node of a decision tree

Now the variable with the highest information gain will be used to split the data at the root node

Now in our data set, there are four observations

So what we're gonna do is we'll start by calculating the entropy and information gain for each of the predictor variables

So we're gonna start by calculating the information gain and entropy for the road type variable

In our data set, you can see that there are four observations

There are four observations in the road type column, which correspond to the four labels in the speed column

So we're gonna begin by calculating the entropy of the parent node

The parent node is nothing but the speed of the car node

This is our output variable, correct? It'll be used to show whether the speed of the car is slow or fast

So to find out the entropy of the speed of the car variable, we'll go through a couple of steps

Now we know that there are four observations in this parent node

First, we have slow

Then again we have slow, fast, and fast

Now, out of these four observations, we have two classes

So two observations belong to the class slow, and two observations belong to the class fast

So that's how you calculate P slow and P fast

P slow is nothing but the fraction of slow outcomes in the parent node, and P fast is the fraction of fast outcomes in the parent node

And the formula to calculate P slow is the number of slow outcomes in the parent node divided by the total number of outcomes

So the number of slow outcomes in the parent node is two, and the total number of outcomes is four

We have four observations in total

So that's how we get P of slow as 0.5

Similarly, for P of fast, you'll calculate the number of fast outcomes divided by the total number of outcomes

So again, two by four, you'll get 0.5

The next thing you'lldo is you'll calculate the entropy of this node

So to calculate the entropy,this is the formula

All you have to do is youhave to substitute the, you'll have to substitutethe value in this formula

So P of slow we're substituting as 0

5

Similarly, P of fast as 0

5

Now when you substitute the value, you'll get a answer of one

So the entropy of your parent node is one
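
The substitution above can be checked with a few lines of Python. This is only a minimal sketch, assuming the entropy formula on the slide is the usual Shannon entropy with log base two; the function name `entropy` and the list `speed` are names I've made up for illustration

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    # Sum -p * log2(p) over the fraction p of each class.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

# The four labels of the parent node: slow, slow, fast, fast.
speed = ["slow", "slow", "fast", "fast"]
parent_entropy = entropy(speed)
print(parent_entropy)  # 1.0
```

With P slow and P fast both at 0.5, each class contributes 0.5 to the sum, which is where the answer of one comes from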

So after calculating the entropy of the parent node, we'll calculate the information gain of the child node

Now guys, remember that if the information gain of the road type variable is greater than the information gain of all the other predictor variables, only then the root node can be split by using the road type variable

So, to calculate the information gain of the road type variable, we first need to split the root node by using the road type variable

We're just doing this in order to check if the road type variable is giving us the maximum information about the data

Okay, so if you notice, road type has two outcomes, it has two values, either steep or flat

Now go back to our data set

So here, what we'll do first is check the value of speed that we get whenever the road type is steep

So, first observation

You see that whenever the road type is steep, you're getting a speed of slow

Similarly, in the second observation, when the road type is steep, you'll get a value of slow again

If the road type is flat, you'll get an observation of fast

And again, if it is steep, there is a value of fast

So for three steep values, we have slow, slow, and fast

And when the road type is flat, we'll get an output of fast

That's exactly what I've done in this decision tree

So whenever the road type is steep, you'll get slow, slow or fast

And whenever the road type is flat, you'll get fast

Now the entropy of the right-hand side is zero

Entropy is nothing but the uncertainty

There's no uncertainty over here

Because as soon as you see that the road type is flat, your output is fast

So there's no uncertainty

But when the road type is steep, you can have any one of the following outcomes, either your speed will be slow or it can be fast

So you'll start by calculating the entropy of both RHS and LHS of the decision tree

So the entropy for the right side child node will be zero, because there's no uncertainty here

Immediately, if you see that the road type is flat, your speed of the car will be fast

Okay, so there's no uncertainty here, and therefore your entropy becomes zero

Now for the entropy of the left-hand side, we'll again have to calculate the fraction P slow and the fraction P fast

So out of three observations, in two observations we have slow

That's why we have two by three over here

Similarly for P fast, we have one fast outcome divided by the total number of observations, which is three

So out of these three, we have two slows and one fast

When you calculate P slow and P fast, you'll get these two values

And then when you substitute these values in the entropy formula, you'll get the entropy as 0.9 for the road type variable

I hope you all are understanding this

I'll go through this again

So, basically, here we are calculating the information gain and entropy for the road type variable

Whenever you consider the road type variable, there are two values, steep and flat

And whenever the value for road type is steep, you'll get any one of these three outcomes, either you'll get slow, slow, or fast

And when the road type is flat, your outcome will be fast

Now because there is no uncertainty whenever the road type is flat, you'll always get an outcome of fast

This means that the entropy here is zero, or the uncertainty value here is zero

But here, there is a lot of uncertainty

So whenever your road type is steep, your output can either be slow or it can be fast

So, finally, you get the entropy as 0.9

So in order to calculate the information gain of the road type variable, you need to calculate the weighted average

I'll tell you why

In order to calculate the information gain, you need to take the entropy of the parent, which we calculated as one, minus the weighted average entropy of the children

Okay

So for this formula, you need to calculate all of these values

So, first of all, you need to calculate the weighted average of the entropy of the children

Now the total number of outcomes in the parent node we saw was four

The total number of outcomes in the left child node was three

And the total number of outcomes in the right child node was one

Correct? In order to verify this with you, the total number of outcomes in the parent node is four

One, two, three, and four

Coming to the child node, which is the road type, the total number of outcomes on the right-hand side of the child node is one

And the total number of outcomes on the left-hand side of the child node is three

That's exactly what I've written over here

Alright, I hope you all understood these three values

After that, all you have to do is substitute these values in this formula

So when you do that, you'll get the weighted average entropy of the children as around 0.675

Now just substitute the value in this formula

So if you calculate the information gain of the road type variable, you'll get a value of 0.325
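
Here is a small sketch of the same information gain calculation in Python. It assumes the standard definition, parent entropy minus the size-weighted entropy of the children, and the function and variable names are my own. One thing to note: the walkthrough rounds the left child's entropy to 0.9, which is how it arrives at 0.675 and 0.325, whereas the unrounded values come out to roughly 0.689 and 0.311

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Parent entropy minus the size-weighted entropy of each child node."""
    total = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        # Labels that fall into the child node for this feature value.
        child = [lab for f, lab in zip(feature_values, labels) if f == value]
        weighted += (len(child) / total) * entropy(child)
    return entropy(labels) - weighted

# Road type and speed columns as read out in the walkthrough.
road_type = ["steep", "steep", "flat", "steep"]
speed = ["slow", "slow", "fast", "fast"]

gain = information_gain(road_type, speed)
print(round(gain, 3))  # 0.311
```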

Now by using the same method, you're going to calculate the information gain for each of the predictor variables, for road type, for obstruction, and for speed limit

Now when you follow the same method and you calculate the information gain, you'll get these values

Now what does this information gain for road type equal to 0.325 denote? Now the value 0.325 for road type denotes that we're getting very little information gain from this road type variable

And for obstruction, we literally have information gain of zero

Similarly, the information gain for speed limit is one

This is the highest value we've got for information gain

This means that we'll have to use the speed limit variable at our root node in order to split the data set

So guys, don't get confused, whichever variable gives you the maximum information gain, that variable has to be chosen at the root node

So that's why we have the root node as speed limit

So if you've maintained the speed limit, then you're going to go slow

But if you haven't maintained the speed limit, then the speed of your car is going to be fast

Your entropy is literally zero, and your information gain is one, meaning that you can use this variable at your root node in order to split the data set, because speed limit gives you the maximum information gain
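
To see the root node selection end to end, here is a rough sketch that computes the gain for all three predictors and picks the largest. The road type and speed limit columns follow what was read out in the walkthrough, but the obstruction column is not spelled out anywhere in this session, so the values below are a hypothetical stand-in chosen only because they reproduce the stated gain of zero. All function and variable names are assumptions

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    total = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        child = [lab for f, lab in zip(feature_values, labels) if f == value]
        weighted += (len(child) / total) * entropy(child)
    return entropy(labels) - weighted

speed = ["slow", "slow", "fast", "fast"]
predictors = {
    "road_type": ["steep", "steep", "flat", "steep"],
    # Hypothetical column: picked so that it carries no information about speed.
    "obstruction": ["yes", "no", "yes", "no"],
    # Speed limit maintained -> slow, not maintained -> fast.
    "speed_limit": ["yes", "yes", "no", "no"],
}

gains = {name: information_gain(values, speed) for name, values in predictors.items()}
root = max(gains, key=gains.get)
print(root)  # speed_limit
```

The variable with the highest gain ends up at the root, which here is speed limit with a gain of one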

So guys, I hope this use case is clear to all of you

To sum everything up, I'll just repeat the entire thing to you all once more

So basically, here you were given a problem statement in order to create a decision tree that classifies the speed of a car as either slow or fast

So you were given three predictor variables and this was your output variable

Information gain and entropy are basically two measures that are used to decide which variable will be assigned to the root node of a decision tree

Okay

So guys, as soon as you look at the data set, if you compare these two columns, that is speed limit and speed, you'll get an output easily

Meaning that if you're maintaining the speed limit, you're going to go slow

But if you aren't maintaining the speed limit, you're going to go fast

So here itself we can understand that speed limit has no uncertainty

So every time you've maintained your speed limit, you will be going slow, and every time you're outside the speed limit, you will be going fast

It's as simple as that

So how did you start? So you started by calculating the entropy of the parent node

You calculated the entropy of the parent node, which came down to one

Okay

After that, you calculated the information gain of each of the child nodes

In order to calculate the information gain of the child node, you start by calculating the entropy of the right-hand side and the left-hand side of the decision tree

Okay

Then you calculate the entropy along with the weighted average

You substitute these values in the information gain formula, and you get the information gain for each of the predictor variables

So after you get the information gain of each of the predictor variables, you check which variable gives you the maximum information gain, and you assign that variable to your root node

It's as simple as that

So guys, that was all about decision trees

Now let's look at our next classification algorithm, which is random forest

Now first of all, what is a random forest? Random forest basically builds multiple decision trees and glues them together to get a more accurate and stable prediction

Now if we already have decision trees, and random forest is nothing but a collection of decision trees, why do we have to use a random forest? There are three main reasons why random forest is used

Now even though decision trees are convenient and easily implemented, they are not as accurate as random forest

Decision trees work very effectively with the training data, but they're not flexible when it comes to classifying a new sample

Now this happens because of something known as overfitting

Now overfitting is a problem that is seen with decision trees

It's something that commonly occurs when we use decision trees

Now overfitting occurs when a model studies the training data to such an extent that it negatively influences the performance of the model on new data

Now this means that the disturbance in the training data is recorded, and it is learned as a concept by the model

If there's any disturbance or any sort of noise in the training data or any error in the training data, that is also studied by the model

The problem here is that these concepts do not apply to the testing data, and it negatively impacts the model's ability to classify new data

So to sum it up, overfitting occurs whenever your model learns the training data, along with all the disturbance in the training data

So it basically memorizes the training data

And whenever new data is given to your model, it will not predict the outcome very accurately

Now this is a problem seen in decision trees

Okay

But in random forest, there's something known as bagging

Now the basic idea behind bagging is to reduce the variation in the predictions by combining the results of multiple decision trees built on different samples of the data set

So your data set will be divided into different samples, and you'll be building a decision tree on each of these samples

This way, each decision tree will be studying one subset of your data

So this way overfitting will get reduced, because one decision tree is not studying the entire data set

Now let's focus on random forest

Now in order to understand random forest, we'll look at a small example

We can consider this data set

In this data, we have four predictor variables

We have blood flow, blocked arteries, chest pain, and weight

Now these variables are used to predict whether or not a person has heart disease

So we're going to use this data set to create a random forest that predicts if a person has heart disease or not

Now the first step in creating a random forest is that you create a bootstrap data set

Now in bootstrapping, all you have to do is you have to randomly select samples from your original data set

Okay

And a point to note is that you can select the same sample more than once

So if you look at the original data set, we have abnormal, normal, normal, and abnormal

Look at the blood flow section

Now here I've randomly selected samples, normal, abnormal, and I've selected one sample twice

You can do this in a bootstrap data set

Now all I did here is I created a bootstrap data set

Bootstrapping is nothing but an estimation method used to make predictions on data by re-sampling the data

This is a bootstrap data set
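
As a rough illustration of what creating a bootstrap data set means, here is a minimal Python sketch. It simply draws as many rows as the original has, at random and with replacement, so the same row can show up more than once. The row labels and the helper name `bootstrap_sample` are made up for this example

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

# Stand-ins for the four patients in the original data set.
original = ["patient_1", "patient_2", "patient_3", "patient_4"]

rng = random.Random(0)  # fixed seed so the draw is repeatable
sample = bootstrap_sample(original, rng)
print(sample)
```

Because the draw is with replacement, some patients may be repeated in `sample` and others may be missing from it altogether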

Now even though this seems very simple, in real world problems, you'll never get such a small data set

Okay, so bootstrapping is actually a little more complex than this

Usually in real world problems, you'll have a huge data set, and bootstrapping that data set is actually a pretty complex problem

Here, I'm just making you understand how random forest works, so that's why I've considered a small data set

Now you're going to use the bootstrap data set that you created, and you're going to build decision trees from it

Now one more thing to note in random forest is you will not be using your entire set of variables

Okay, so you'll only be using a few of the variables at each node

So, for example, we'll only consider two variables at each step

So if you begin at the root node here, we will randomly select two variables as candidates for the root node

Okay, let's say that we selected blood flow and blocked arteries

Out of these two variables, we have to select the variable that best separates the samples

Okay

So for the sake of this example, let's say that blocked arteries is the most significant predictor, and that's why we'll assign it to the root node

Now our next step is to repeat the same process for each of these upcoming branch nodes

Here we'll again select two variables at random as candidates for each of these branch nodes, and then choose a variable that best separates the samples, right? So let me just repeat this entire process

So you know that you start creating a decision tree by selecting the root node

In random forest, you'll randomly select a couple of variables for each node, and then you'll calculate which variable best splits the data at that node

So for each node, we'll randomly select two or three variables

And out of those two or three variables, we'll see which variable best separates the data

Okay, so at each node, we'll be calculating information gain and entropy

Basically, that's what I mean

At every node, you'll calculate information gain and entropy of two or three variables, and you'll see which variable has the highest information gain, and you'll keep descending downwards

That's how you create a decision tree
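
The per-node step described above, picking a couple of candidate variables at random and keeping the one with the highest information gain, might look something like this in Python. Treat it as a sketch only: the four rows are hypothetical because the actual table from the slide isn't part of this session, weight is replaced by a yes/no `overweight` stand-in to keep every variable categorical, and `pick_split` is a name I've invented

```python
import math
import random
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    total = len(labels)
    weighted = 0.0
    for v in set(values):
        child = [lab for x, lab in zip(values, labels) if x == v]
        weighted += (len(child) / total) * entropy(child)
    return entropy(labels) - weighted

# Hypothetical bootstrap sample (not the data from the slides).
rows = [
    {"blood_flow": "abnormal", "blocked_arteries": "yes", "chest_pain": "yes", "overweight": "yes"},
    {"blood_flow": "normal", "blocked_arteries": "no", "chest_pain": "yes", "overweight": "no"},
    {"blood_flow": "normal", "blocked_arteries": "no", "chest_pain": "no", "overweight": "yes"},
    {"blood_flow": "abnormal", "blocked_arteries": "yes", "chest_pain": "no", "overweight": "no"},
]
heart_disease = ["yes", "no", "no", "yes"]

def pick_split(rows, labels, rng, n_candidates=2):
    """Randomly choose candidate variables, then keep the one with the highest gain."""
    features = sorted(rows[0])
    candidates = rng.sample(features, n_candidates)
    best = max(candidates, key=lambda f: information_gain([r[f] for r in rows], labels))
    return best, candidates

best, candidates = pick_split(rows, heart_disease, random.Random(42))
print(candidates, "->", best)
```

The same function would be called again for every branch node on the rows that reach that node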

So we just created our first decision tree

Now what you do is you'll go back to step one, and you'll repeat the entire process

So each decision tree will predict the output class based on the predictor variables that you've assigned to each decision tree

Now let's say for this decision tree, you've assigned blood flow

Here we have blocked arteries at the root node

Here we might have blood flow at the root node and so on

So your output will depend on which predictor variable is at the root node

So each decision tree will predict the output class based on the predictor variable that you assigned in that tree

Now what you do is you'll go back to step one, you'll create a new bootstrap data set, and then again you'll build a new decision tree

And for that decision tree, you'll consider only a subset of variables, and you'll choose the best predictor variable by calculating the information gain

So you will keep repeating this process

So you just keep repeating step two and step one

Okay

And you'll keep creating multiple decision trees

Okay

So having a variety of decision trees in a random forest is what makes it more effective than an individual decision tree

So instead of having an individual decision tree, which is created using all the features, you can build a random forest that uses multiple decision trees wherein each decision tree has a random set of predictor variables

Now step number four is predicting the outcome of a new data point

So now that you've created a random forest, let's see how it can be used to predict whether a new patient has heart disease or not

Okay, now this diagram basically has data about the new patient

Okay, this is the data about the new patient

He doesn't have blocked arteries

He has chest pain, and his weight is around 185 kgs

Now all you have to do is run this data down each of the decision trees that you made

So, the first decision tree shows that yes, this person has heart disease

Similarly, you'll run the information of this new patient through every decision tree that you created

Then depending on how many votes you get for yes and no, you'll classify that patient as either having heart disease or not

All you have to do is run the information of the new patient through all the decision trees that you created in the previous step, and the final output is based on the number of votes each of the classes is getting

Okay, let's say that three decision trees said that yes, the patient has heart disease, and one decision tree said that no, he doesn't

So this means you will obviously classify the patient as having a heart disease because three of them voted for yes

It's based on majority
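
The voting step is simple enough to sketch in a couple of lines. This assumes each tree has already produced a "yes" or "no" for the new patient; the list `tree_votes` mirrors the three-against-one example above, and `majority_vote` is just an illustrative name

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

# Three trees say yes, one says no.
tree_votes = ["yes", "yes", "no", "yes"]
print(majority_vote(tree_votes))  # yes
```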

So guys, I hope the concept behind random forest is understandable

Now the next step is you will evaluate the efficiency of the model

Now earlier when we created the bootstrap data set, we left out one entry

This is the entry we left out, because we repeated one sample twice

If you'll remember, in the bootstrap data set, here we repeated an entry twice, and we missed out on one of the entries

So for evaluating the model, we'll be using the data entry that we missed out on

Now in a real world problem, where there's a huge amount of data, about 1/3 of the original data set is not included in the bootstrap data set

So guys, the sample data which is not there in your bootstrap data set is known as the out-of-bag data set, and basically this is our out-of-bag data set

Now the out-of-bag data set is used to check the accuracy of the model

Because the model was not created by using the out-of-bag data set, it will give us a good understanding of whether the model is effective or not

Now the out-of-bag data set is nothing but your testing data set

Remember, in machine learning, there's a training and a testing data set

So your out-of-bag data set is nothing but your testing data set

This is used to evaluate the efficiency of your model

So eventually, you can measure the accuracy of a random forest by the proportion of out-of-bag samples that are correctly classified, because the out-of-bag data set is used to evaluate the efficiency of your model

So you can calculate the accuracy by checking how many samples from this out-of-bag data set the model was able to classify correctly
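
Two of the claims above are easy to check with a short simulation: that roughly a third of the rows end up out-of-bag, and that accuracy is just the proportion of correct predictions. This is a sketch under my own assumptions, with a made-up data set of 10,000 row indices and hypothetical prediction lists. The fraction left out of a bootstrap sample tends towards 1/e, about 0.37, which is where the "about 1/3" figure comes from

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
indices = list(range(n))

# Bootstrap: draw n row indices with replacement.
bootstrap = [rng.choice(indices) for _ in range(n)]

# Out-of-bag rows are the ones that were never drawn.
out_of_bag = set(indices) - set(bootstrap)
oob_fraction = len(out_of_bag) / n
print(round(oob_fraction, 3))

def accuracy(predicted, actual):
    """Proportion of samples whose predicted class matches the actual class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical forest predictions vs. true labels for four out-of-bag patients.
print(accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))  # 0.75
```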

So guys, that was an explanation about how random forest works

To give you an overview, let me just run you through all the steps that we took

So basically, this was our data set, and all we have to do is predict whether a patient has heart disease or not

So, our first step was to create a bootstrap data set

A bootstrap data set is nothing but randomly selected observations from your original data set, and you can also have duplicate values in your bootstrap data set

Okay

The next step is you're going to create a decision tree by considering a random set of predictor variables for each decision tree

Okay

So, the third step is you'll go back to step one, create a bootstrap data set

Again, create a decision tree

So this iteration is performed hundreds of times until you have multiple decision trees

Now that you've created a random forest, you'll use this random forest to predict the outcome

So if you're given a new data point and you have to classify it into one of the two classes, we'll just run this new information through all the decision trees

And you'll just take the majority of the output that you're getting from the decision trees as your outcome

Now in order to evaluate the efficiency of the model, you'll use the out-of-bag sample data set

Now the out-of-bag sample is basically the sample that was not included in your bootstrap data set, but this sample is coming from your original data set, guys

This is not something that you randomly create

This data set was there in your original data set, but it was just not included in your bootstrap data set

So you'll use your out-of-bag sample in order to calculate the accuracy of your random forest

So the proportion of out-of-bag samples that are correctly classified will give you the accuracy of your model

So that is all for random forest

So guys, I'll discuss other classification algorithms with you, and only then I'll show you a demo on the classification algorithms

Now our next algorithm is something known as naive Bayes

Naive Bayes is, again, a supervised classification algorithm, which is based on the Bayes Theorem

Now the Bayes Theorem basically follows a probabilistic approach

The main idea behind naive Bayes is that the predictor variables in a machine learning model are independent of each other, meaning that the outcome of a model depends on a set of independent variables that have nothing to do with each other

Now a lot of you might ask why naive Bayes is called naive

Now usually, when I tell anybody about naive Bayes, they keep asking me why naive Bayes is called naive

So in real world problems, predictor variables aren't always independent of each other

There is always some correlation between the independent variables

Now because naive Bayes considers each predictor variable to be independent of any other variable in the model, it is called naive

This is an assumption that naive Bayes makes

Now let's understand the math behind the naive Bayes algorithm

So like I mentioned, the principle behind naive Bayes is the Bayes Theorem, which is also known as the Bayes Rule

The Bayes Theorem is used to calculate the conditional probability, which is nothing but the probability of an event occurring based on information about the events in the past

This is the mathematical equation for the Bayes Theorem

Now, in this equation, the LHS is nothing but the conditional probability of event A occurring, given the event B

P of A is nothing but the probability of event A occurring, and P of B is the probability of event B

And P of B given A is nothing but the conditional probability of event B occurring, given the event A
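
For reference, the equation being described here is the standard statement of the Bayes Theorem, written out with the same A and B as above

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

Here P(A | B) is the left-hand side, the probability of A given B, and P(B | A) is the probability of B given A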

Now let's try to understand how naive Bayes works

Now consider this data set of around 1,500 observations

Okay, here we have the following output classes

We have either cat, parrot, or turtle

These are our output classes, and the predictor variables are swim, wings, green color, and sharp teeth

Okay

So, basically, type is your output variable, and swim, wings, green, and sharp teeth are your predictor variables

Your output variable has three classes, cat, parrot, and turtle

Okay

Now I've summarized this in the table I've shown on the screen

The first thing you can see is that the class of type cat shows that out of 500 cats, 450 can swim, meaning that 90% of them can

And zero cats have wings, and zero cats are green in color, and 500 out of 500 cats have sharp teeth

Okay

Now, coming to parrot, it says 50 out of 500 parrots have a true value for swim

Now guys, obviously, this does not hold true in the real world

I don't think there are any parrots who can swim, but I've just created this data set so that we can understand naive Bayes

So, meaning that 10% of parrots have a true value for swim

Now all 500 parrots have wings, and 400 out of 500 parrots are green in color, and zero parrots have sharp teeth

Coming to the turtle class, all 500 turtles can swim

Zero number of turtles have wings

And out of 500, a hundred turtles are green in color, meaning that 20% of the turtles are green in color

And 50 out of 500 turtles have sharp teeth

So that's what we understand from this data set

Now the problem here is we are given an observation over here, with some values for swim, wings, green, and sharp teeth

What we need to do is predict whether the animal is a cat, parrot, or a turtle, based on these values

So the goal here is to predict whether it is a cat, parrot, or a turtle based on all these defined parameters

Okay

Based on the value of swim, wings, green, and sharp teeth, we'll understand whether the animal is a cat, or is it a parrot, or is it a turtle

So, if you look at the observation, the variables swim and green have a value of true, and the outcome can be any one of the types

It can either be a cat, it can be a parrot, or it can be a turtle

So in order to check if the animal is a cat, all you have to do is calculate the conditional probability at each step

So here what we're doing is we need to calculate the probability that this is a cat, given that it can swim and it is green in color

First, we'll calculate the probability that it can swim, given that it's a cat

And second, the probability of it being green, given that it is a cat, and then we'll multiply these with the probability of it being a cat, divided by the probability of swim and green

Okay

So, guys, I know you all can calculate the probability

It's quite simple

So once you calculate the probability here, you'll get a direct value of zero

Okay, you'll get a value of zero, meaning that this animal is definitely not a cat

Similarly, if you do this for parrots, you calculate the conditional probability, you'll get a value of 0.0264 divided by the probability of swim comma green

We don't know this probability

Similarly, if you check this for the turtle, you'll get a probability of 0.066 divided by P swim comma green

Okay

Now for these calculations, the denominator is the same

The value of the denominator is the same, and the probability of it being a turtle is greater than that of a parrot

So that's how we can correctly predict that the animal is actually a turtle
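
The three numerators can be reproduced with a short Python sketch using the counts from the table. The dictionary layout and variable names are my own. The figures quoted in the walkthrough, 0.0264 and 0.066, come from rounding the prior of 500 out of 1,500 down to 0.33, so the unrounded values printed here are slightly higher. The last step, dividing by the sum of the numerators, is an extra I've added, and it only stands in for P swim comma green under the naive independence assumption with these three classes

```python
# P(feature is true | class), taken from the counts in the table.
likelihoods = {
    "cat":    {"swim": 450 / 500, "green": 0 / 500},
    "parrot": {"swim": 50 / 500,  "green": 400 / 500},
    "turtle": {"swim": 500 / 500, "green": 100 / 500},
}
prior = 500 / 1500  # each class has 500 of the 1,500 observations

# Numerator of the Bayes Theorem for the observation swim = true, green = true.
scores = {
    animal: p["swim"] * p["green"] * prior
    for animal, p in likelihoods.items()
}
prediction = max(scores, key=scores.get)
print(scores)
print(prediction)  # turtle

# The denominator is shared by all classes, so normalising does not change the winner.
total = sum(scores.values())
posteriors = {animal: s / total for animal, s in scores.items()}
print(posteriors)
```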

So guys, this is how naive Bayes works

You basically calculate the conditional probability at each step

Whatever classification needs to be done, that has to be calculated through probability

There's a lot of statistics that comes into naive Bayes

And if you all want to learn more about statistics and probability, I'll leave a link in the description

You all can watch that video as well

There I've explained exactly what conditional probability is, and the Bayes Theorem is also explained very well

So you all can check out that video also

And apart from this, if you all have any doubts regarding any of the algorithms, please leave them in the comment section

Okay, I'll solve your doubts

And apart from that, I'll also leave a couple of links for each of the algorithms in the description box

Because if you want a more in-depth understanding of each of the algorithms, you can check out that content

Since this is a full course video, I have to cover all the topics, and it is hard for me to go in depth on each topic

So I'll leave a couple of links in the description box

You can watch those videos as well

Make sure you check out the probability and statistics video

So now let's move on and look at our next algorithm, which is the K nearest neighbor algorithm

Now KNN, which basically stands for K nearest neighbor, is, again, a supervised classification algorithm that classifies a new data point into the target class or the output class, depending on the features of its neighboring data points

That's why it's called K nearest neighbor

So let's try to understand KNN with a small analogy

Okay, let's say that we want a machine to distinguish between the images of cats and dogs

So to do this, we must input our data set of cat and dog images, and we have to train our model to detect the animal based on certain features

For example, features such as pointy ears can be used to identify cats

Similarly, we can identify dogs based on their long ears

So after studying the data set during the training phase, when a new image is given to the model, the KNN algorithm will classify it into either cats or dogs, depending on the similarity in their features

Okay, let's say that a new image has pointy ears, it will classify that image as cat, because it is similar to the cat images, because it's similar to its neighbors

In this manner, the KNN algorithm classifies data points based on how similar they are to their neighboring data points

So this is a small example

We'll discuss more about it in the further slides

Now let me tell you a couple of features of the KNN algorithm

So, first of all, we know that it is a supervised learning algorithm

It uses labeled input data set to predict the output of the data points

Then it is also one of the simplest machine learning algorithms, and it can be easily implemented for a varied set of problems

Another feature is that it is non-parametric, meaning that it does not make any assumptions about the underlying data

For example, naive Bayes is a parametric model, because it assumes that all the independent variables are in no way related to each other

It has assumptions about the model

K nearest neighbor has no such assumptions

That's why it's considered a non-parametric model

Another feature is that it is a lazy algorithm

Now, a lazy algorithm is basically any algorithm that memorizes the training set, instead of learning a discriminative function from the training data

Now, even though KNN is mainly a classification algorithm, it can also be used for regression cases

So KNN is actually both a classification and a regression algorithm

But mostly, you'll see that it'll be used for classification problems

The most important feature about a K nearest neighbor is that it's based on feature similarity with its neighboring data points

You'll understand this in the example that I'm gonna tell you

Now, in this image, we have two classes of data

We have class A which is squares and class B which is triangles

Now the problem statement is to assign the new input data point to one of the two classes by using the KNN algorithm

So the first step in the KNN algorithm is to define the value of K

But what does the K in the KNN algorithm stand for? Now the K stands for the number of nearest neighbors, and that's why it's got the name K nearest neighbors

Now, in this image, I've defined the value of K as three

This means that the algorithm will consider the three neighbors that are closest to the new data point in order to decide the class of the new data point

So the closeness between the data points is calculated by using measures such as Euclidean distance and Manhattan distance, which I'll be explaining in a while

So our K is equal to three

The neighbors include two squares and one triangle

So, if I were to classify the new data point based on K equal to three, then it should be assigned to class A, correct? It should be assigned to squares

But what if the K value is set to seven?

Here I'm basically telling my algorithm to look for the seven nearest neighbors and classify the new data point into the class it is most similar to

So our K is equal to seven

The neighbors include three squares and four triangles

So if I were to classify the new data point based on K equal to seven, then it would be assigned to class B, since the majority of its neighbors are from class B
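
To make the effect of K concrete, here is a minimal KNN sketch in Python. The coordinates are hypothetical, since the figure's actual points aren't given in this session, and they were picked only so that the three nearest neighbors are two squares and one triangle while the seven nearest are three squares and four triangles. It uses `math.dist`, which needs Python 3.8 or later, and `knn_classify` is an illustrative name

```python
import math
from collections import Counter

def knn_classify(points, new_point, k):
    """Label new_point by majority vote among its k nearest labeled points."""
    nearest = sorted(points, key=lambda p: math.dist(p[0], new_point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labeled points: ((x, y), class).
points = [
    ((1.0, 0.0), "square"), ((0.0, 1.2), "square"), ((3.0, 0.0), "square"),
    ((-1.5, 0.0), "triangle"), ((0.0, -2.0), "triangle"),
    ((2.5, 0.0), "triangle"), ((0.0, 2.8), "triangle"),
    ((6.0, 6.0), "square"), ((-6.0, -6.0), "triangle"),
]
new_point = (0.0, 0.0)

print(knn_classify(points, new_point, k=3))  # square
print(knn_classify(points, new_point, k=7))  # triangle
```

The same new point flips from class A to class B purely because of the choice of K, which is why picking K carefully matters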

Now this is where a lot of us get confused

So how do we know which K value is the most suitable for K nearest neighbor?

Now there are a couple of methods used to calculate the K value

One of them is known as the elbow method

We'll be discussing the elbow method in the upcoming slides

So for now let me just show you the measures that are involved behind KNN

Okay, there's very simple math behind the K nearest neighbor algorithm

So I'll be discussing theEuclidean distance with you

Now in this figure, we haveto measure the distance between P one and P two byusing Euclidean distance

I'm sure a lot of you already know what Euclidean distance is

It is something that we learnedin eighth or 10th grade

I'm not sure

So all you're doing isyou're extracting X one

So the formula isbasically x two minus x one the whole square plus y two minus y one the whole square, and the root of that isthe Euclidean distance

It's as simple as that

So Euclidean distance is used as a measure to check the closeness of data points

So basically, KNN uses the Euclidean distance to check the closeness of a new data point with its neighbors

So guys, it's as simple as that

KNN makes use of simple measures in order to solve very complex problems

Okay, and this is one of the reasons why KNN is such a commonly used algorithm

Coming to support vector machine

Now, this is our last algorithm under classification algorithms

Now guys, don't get paranoid because of the name

Support vector machine is actually one of the simplest algorithms in supervised learning

Okay, it is basically used to classify data into different classes

It's a classification algorithm

Now unlike most algorithms, SVM makes use of something known as a hyperplane, which acts like a decision boundary between the separate classes

Okay

Now SVM can be used to generate multiple separating hyperplanes, such that the data is divided into segments, and each segment contains only one kind of data

So, a few features of SVM include that it is a supervised learning algorithm, meaning that it's going to study labeled training data

Another feature is that it is both a regression and a classification algorithm

Even though SVM is mainly used for classification, there is something known as the support vector regressor

That is useful for regression problems

Now, SVM can also be used to classify non-linear data by using kernel tricks

Non-linear data is basically data that cannot be separated by using a single straight line

I'll be talking more about this in the upcoming slides

Now let's move on and discuss how SVM works

Now again, in order to make you understand how support vector machine works, let's look at a small scenario

For a second, pretend that you own a farm and you have a problem

You need to set up a fence to protect your rabbits from a pack of wolves

Okay, now, you need to decide where you want to build your fence

So one way to solve the problem is by using support vector machines

So if I do that and if I try to draw a decision boundary between the rabbits and the wolves, it looks something like this

Now you can clearly build a fence along this line

So in simple terms, this is exactly how your support vector machines work

It draws a decision boundary, which is nothing but a hyperplane, between any two classes in order to separate them or classify them

Now I know that you're thinking, how do you know where to draw a hyperplane?

The basic principle behind SVM is to draw a hyperplane that best separates the two classes

In our case, the two classes are the rabbits and the wolves

Now before we move any further, let's discuss the different terminologies that are there in support vector machine

So that is basically a hyperplane

It is a decision boundary that best separates the two classes

Now, support vectors, what exactly are support vectors?

So when you start with the support vector machine, you start by drawing a random hyperplane

And then you check the distance between the hyperplane and the closest data point from each of the classes

These closest data points to the hyperplane are known as support vectors

Now these two data points are the closest to your hyperplane

So these are known as support vectors, and that's where the name comes from, support vector machines

Now the hyperplane is drawn based on these support vectors

And the optimum hyperplane will be the one which has a maximum distance from each of the support vectors, meaning that the distance between the hyperplane and the support vectors has to be maximum

So, to sum it up, SVM is used to classify data by using a hyperplane, such that the distance between the hyperplane and the support vectors is maximum

Now this distance is nothing but the margin

Now let's try to solve a problem

Let's say that I input a new data point and I want to draw a hyperplane such that it best separates these two classes

So what do I do? I start out by drawing a hyperplane, and then I check the distance between the hyperplane and the support vectors

So, basically here, I'm trying to check if the margin is maximum for this hyperplane

But what if I drew the hyperplane like this? The margin for this hyperplane is clearly more than the previous one

So this is my optimal hyperplane

This is exactly how you understand which hyperplane needs to be chosen, because you can draw multiple hyperplanes

Now, the best hyperplane is the one that has the maximum margin

So, this is my optimal hyperplane
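
Here is a small sketch of that comparison: two candidate hyperplanes (lines, in 2D) and the margin each one leaves. The rabbit and wolf coordinates and the two candidate lines are made up, and `margin` is a hypothetical helper. Note that a real SVM finds the maximum-margin hyperplane by solving an optimization problem rather than by trying candidates one at a time, so this only illustrates what is being maximized.

```python
from math import hypot

# Made-up positions for the two classes.
rabbits = [(1.0, 1.0), (2.0, 1.0)]
wolves = [(4.0, 4.0), (5.0, 4.0)]


def signed_distance(w, b, point):
    """Signed distance from point to the line w[0]*x + w[1]*y + b = 0."""
    return (w[0] * point[0] + w[1] * point[1] + b) / hypot(w[0], w[1])


def margin(w, b):
    """Distance from the line to its closest point, or 0.0 if it fails to separate the classes."""
    rabbit_d = [signed_distance(w, b, p) for p in rabbits]
    wolf_d = [signed_distance(w, b, p) for p in wolves]
    if not (max(rabbit_d) < 0 < min(wolf_d)):
        return 0.0
    return min(abs(d) for d in rabbit_d + wolf_d)


# Each candidate is (w, b) for the line w[0]*x + w[1]*y + b = 0.
candidates = {
    "vertical line x = 3": ((1.0, 0.0), -3.0),
    "diagonal line x + y = 5": ((1.0, 1.0), -5.0),
}
for name, (w, b) in candidates.items():
    print(name, "-> margin", round(margin(w, b), 3))

best = max(candidates, key=lambda name: margin(*candidates[name]))
print("larger margin:", best)
```

The points closest to the chosen line, here (2, 1) for the rabbits, play the role of the support vectors.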

Now so far it was quite easy

Our data was linearly separable, which means that you could draw a straight line to separate the two classes

But what will you do if the data looks like this? You cannot possibly draw a hyperplane like this

You cannot possibly draw a hyperplane like this either

It doesn't separate the two classes

We can clearly see rabbits and wolves in both of the classes

Now this is exactly where non-linear SVM comes into the picture

Okay, this is what the kernel trick is all about

Now, a kernel is basically something that can be used to transform data into another dimension that has a clear dividing margin between classes of data

So, basically the kernel function offers the user the option of transforming non-linear spaces into linear ones

Until this point, if you notice, we were plotting our data on a two-dimensional space

We had the x and y axes

A simple trick is transforming the two variables, x and y, into a new feature space, which involves a new variable z

So, basically, what we're doing is we're visualizing the data on a three-dimensional space

So when you transform the 2D space into a 3D space, you can clearly see a dividing margin between the two classes of data

You can clearly draw a plane in the middle that separates these two classes
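
A minimal sketch of this idea, assuming the new variable is z = x squared plus y squared (one common choice; the slide may use a different mapping). The points are made up: one class sits near the origin and the other surrounds it, so no straight line separates them in 2D. Real SVM kernels do this kind of mapping implicitly, so treat this as an illustration of the intuition only.

```python
# Made-up 2D points that no straight line can separate.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]   # one class, near the origin
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]   # other class, surrounding it


def lift(point):
    """Map (x, y) to (x, y, z) with z = x^2 + y^2 (assumed mapping)."""
    x, y = point
    return (x, y, x ** 2 + y ** 2)


inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]

# In the lifted space, the flat plane z = 1 separates the two classes.
threshold = 1.0
print(max(inner_z) < threshold < min(outer_z))  # True
```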

So guys, this sums up the whole idea behind support vector machines

Support vector machines are very easy to understand

Now, this was all for our supervised learning algorithms

Now, before I move on to unsupervised learning algorithms, I'll be running a demo

We'll be running a demo in order to understand all the classification algorithms that we studied so far

Earlier in the session, we ran a demo for the regression algorithms

Now we'll run one for the classification algorithms

So, enough of theory

Let's open up Python, and let's start looking at how these classification algorithms work

Now, here what we'll be doing is we'll implement multiple classification algorithms by using scikit-learn

Okay, it's one of the most popular machine learning tools for Python

Now we'll be using a simple data set for the task of training a classifier to distinguish between the different types of fruits

The purpose of this demo is to implement multiple classification algorithms for the same problem

So as usual, you start by importing all your libraries in Python

Again, guys, if you don't know Python, check the description box, I'll leave a link there

You can go through that video as well

Next, what we're doing is we're reading the fruit data in the form of a table

We've stored it in a variable called fruits

Now if you wanna see the first few rows of the data, let's print the first few observations in our data set

So, this is our data set

These are the fruit labels

So we have four fruits in our data set

We have apple, we have mandarin, orange, and lemon

Okay

Now, fruit label is nothing but a numeric label for each fruit, so apple is labeled as one

Mandarin is labeled as two

Similarly, orange is labeled as three

And lemon is labeled as four

Then fruit subtype is basically the family of fruit it belongs to

Then we have the mass of the fruit, its width, height, and color score

These are all our predictor variables

We have to identify the type of fruit, depending on these predictor variables

So, first, we saw a couple of observations over here

Next, if you want to see the shape of your data set, this is what it looks like

There are 59 observations with seven predictor variables, which is one, two, three, four, five, six, and seven

We have seven variables in total

Sorry, not predictor variables

This seven denotes both your predictor and your target variables

Next, I'm just showing you the four fruits that we have in our data set, which is apple, mandarin, orange, and lemon

Next, I'm just grouping fruits by their names

Okay

So we have 19 apples in our data set

We have 16 lemons

We have only five mandarins, and we have 19 oranges

Even though the number of mandarin samples is low, we'll have to work with it, because right now I'm just trying to make you understand the classification algorithms

The main aim for me behind doing these demos is so that you understand how classification algorithms work

Now what you can do is you can also plot a graph in order to see the frequency of each of these fruits

Okay, I'll show you what the plot looks like

The number of apples and oranges is the same

We have 19 apples and 19 oranges

And similarly, this is the count for lemons

Okay

So this is a small visualization

Guys, visualization is actually very important when it comes to machine learning, because you can see most of the relations and correlations by plotting graphs

You can't see those correlations by just running code and all of that

Only when you plot different variables on your graph, you'll understand how they are related

One of the main tasks in machine learning is to visualize data

It ensures that you understand the correlations in your data

Next, what we're gonna do is we'll graph something known as a box plot

Okay, a box plot basically helps you understand the distribution of your data

Let me run the box plot, and I'll show you what exactly I mean

So this is our box plot

So, a box plot will basically give you a clearer idea of the distribution of your input variables

It is mainly used in exploratory data analysis, and it represents the distribution of the data and its variability

Now, the box plot contains the upper quartile and the lower quartile

So the box basically spans your interquartile range, or something known as IQR

IQR is nothing but your first quartile subtracted from your third quartile

Now again, this involves statistics and probability

So I'll be leaving a link in the description box

You can go through that video

I've explained statistics, probability, IQR, range, and all of that in there

So, one of the main reasons why box plots are used is to detect any sort of outliers in the data

Since the box spans the IQR, it helps you detect the data points that lie far outside the typical range
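
Here's a small sketch of how the IQR and the outliers can be computed by hand, using only Python's standard library. The mass values are made up for illustration, and the 1.5 times IQR cutoff is the usual box plot convention for the whiskers, which I'm assuming here since the demo doesn't state it.

```python
from statistics import quantiles

# Made-up fruit masses in grams, with one unusually heavy fruit.
masses = [140, 150, 152, 156, 160, 164, 170, 176, 362]

# Split the data into four equal parts: returns Q1, the median, and Q3.
q1, _median, q3 = quantiles(masses, n=4, method="inclusive")
iqr = q3 - q1  # third quartile minus first quartile

# Common convention: points more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [m for m in masses if m < lower_fence or m > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("outliers:", outliers)
```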

So if you see, for color score, most of the data is distributed around the IQR, whereas here the data are not that well distributed

Height also is not very well distributed, but color score is pretty well distributed

This is what the box plot shows you

So guys, this involves a lot of math

All of these, each and every function in machine learning, involves a lot of math

So you know it's necessary to have a good understanding of statistics, probability, and all of that

Now, next, what we'll do is we'll plot a histogram

A histogram will basically show you the frequency of occurrence of the values of each variable

Let me just plot this, and then we'll try and understand

So here you can understand a few correlations

Okay, some pairs of these attributes are correlated

For example, mass and width seem to be distributed along similar ranges

So this suggests a possible correlation and a predictable relationship

Like if you look at the graphs, they're quite similar

So for each of the predictor variables, I've drawn a histogram

For each of the input variables, we've drawn a histogram

Now guys, again, like I said, plotting graphs is very important because you understand a lot of correlations that you cannot understand by just looking at your data, or just running code on your data

Okay

Now, next, what we're doing here is we're just dividing the data set into target and predictor variables

So, basically, I've created an array of feature names which has your predictor variables

It has mass, width, height, and color score

And you have assigned that as X, since this is your input, and y is your output, which is your fruit label

That'll show whether it is an apple, orange, lemon, and so on

Now, the next step that we'll perform over here is pretty evident

Again, this is data splitting

So data splitting, by now, I'm sure all of you know what it is

It is splitting your data into training and testing data

So that's what we've done over here
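
To show what the split does under the hood, here is a minimal sketch in plain Python. I'm assuming a 25% test fraction, which is the default of scikit-learn's `train_test_split`. The helper name `train_test_split_indices` and the seed are my own, and the demo's exact split settings aren't shown, so treat the numbers as illustrative.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle the row indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 59 observations, like the fruit data set.
train_idx, test_idx = train_test_split_indices(59)
print(len(train_idx), len(test_idx))  # 44 15
```

The model only ever sees the training rows while fitting, and the held-out rows are used to check how well it generalizes.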

Next, we're importing something known as the MinMaxScaler

Scaling or normalizing your data is very important in machine learning

Now, I'm saying this because your raw data can be very biased

So it's very important to normalize your data

Now when I say normalize your data, if you look at the value of mass and if you look at the value of height and color score, you see that mass ranges in the hundreds and double digits, whereas height is in single digits, and color score is less than one

So, if some of your variables have a very high range, you know they have a very high scale, like they're in two digits or three digits, whereas other variables are single digits and lesser, then your output is going to be very biased

It's obvious that it's gonna be very biased

That's why you have to scale your data in such a way that all of these values will have a similar range

So that's exactly what the scaler function does
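
As a rough sketch of what min-max scaling does, here is the formula written out in plain Python: each value has the column minimum subtracted and is then divided by the column range, so everything ends up between zero and one. The sample values and the helper name `min_max_scale` are made up, and the sketch assumes the column has at least two distinct values, since otherwise the range would be zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


masses = [100, 200, 300]          # made-up values in the hundreds
color_scores = [0.5, 0.75, 1.0]   # made-up values below one

print(min_max_scale(masses))        # [0.0, 0.5, 1.0]
print(min_max_scale(color_scores))  # [0.0, 0.5, 1.0]
```

After scaling, both columns cover the same range, so neither one dominates a distance-based algorithm like KNN just because of its units.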

Okay

Now since we have already divided our data into training and testing data, our next step is to build the model

So, first, we're gonna be using the logistic regression algorithm

I've already discussed logistic regression with you all

It's a classification algorithm, which is basically used to predict the outcome of a categorical variable

So we already have the logistic regression class in scikit-learn

All you have to do is create an instance of this class, which is logreg over here

And I'm fitting this instance with a training data set, meaning that I'm running the algorithm with the training data set

Once you do that, you can calculate the accuracy by using this function

So here I'm calculating the accuracy on the training data set and on the testing data set

Okay, so let's look at the output of this

Now guys, ignore this future warning

A warning like this doesn't stop your code from running

Now, accuracy of the logistic regression classifier on the training data set is around 70%

It was pretty good on the training data set

But when it comes to classifying on the test data set, it's only 40%, which is not that good for a classifier

Now again, this can depend on the problem statement, that is, on which kind of problem logistic regression is more suitable for

Next, we'll do the same thing using the decision tree

So again, we just call the decision tree classifier, and we'll fit it with the training data set, and we'll calculate the accuracy of the decision tree on the training and the testing data set

So if you do that for a decision tree on the training data set, you get 100% accuracy

But on the testing data set, you have around 87% accuracy

This is something that I discussed with you all earlier, that decision trees do very well on the training data set because of something known as overfitting

But when it comes to classifying the outcome on the testing data set, the accuracy reduces

Now, this is very good compared to logistic regression

For this problem statement, decision trees work better than logistic regression

Coming to KNN classifier

Again, all you have to do is you have to call the K neighbors classifier

And you have to fit this with the training data set

If you calculate the accuracy for a KNN classifier, we get a good accuracy actually

On the training data set, we get an accuracy of 95%

And on the testing data set, it's 100%

That is really good, because our model actually achieved a higher accuracy on the testing data set than on the training data set

Now all of this depends on the value of K that you've chosen for KNN

Now, I mentioned that you use the elbow method to choose the K value in K nearest neighbors

I'll be discussing the elbow method in the next section

So, don't worry if you haven't understood that yet

Now, we're also using a naive Bayes classifier

Here we're using a Gaussian naive Bayes classifier

Gaussian naive Bayes is basically a type of naive Bayes classifier

I'm not going to go into depth of this, because it'll just make our session much longer

Okay

And if you want to know more about this, I'll leave a link in the description box

You can read all about the Gaussian naive Bayes classifier

Now, the math behind this is the same

It uses naive Bayes, it uses the Bayes theorem itself

Now again, we're gonna call this class, and then we're going to run our training data on it

So using the naive Bayes classifier, we're getting an accuracy of 0.86 on the training data set

And on the testing data set, we're getting 67% accuracy

Okay

Now let's do the same thing with support vector machines

We're importing the support vector classifier

And we are fitting the training data into the algorithm

We're getting an accuracy of around 61% on the training data set and 33% on the testing data set

Now guys, this accuracy and all depends also on the problem statement

It depends on the type of data that support vector machines get

Usually, SVM is very good on large data sets

Now since we have a very small data set over here, that's probably why the accuracy is so low

So guys, these were a couple of classification algorithms that I showed you here

Now, because our KNN classifier classified our data set most accurately, we'll look at the predictions that the KNN classifier made

Okay, now we're storing all our predicted values in the predict variable

Now, in order to show you the accuracy of the KNN model, we're going to use something known as the confusion matrix

So, a confusion matrix is a table that is often used to describe the performance of a classification model

So, a confusion matrix is actually a tabular representation of actual versus predicted values

So when you draw a confusion matrix on the actual versus predicted values for the KNN classifier, this is what the confusion matrix looks like

Now, we have four rows over here

If you see, we have four rows

The first row represents apples, second, mandarin, third represents lemons, and fourth, oranges

So this value of four is at position zero comma zero, meaning that it was correctly able to classify all the four apples

Okay

This value of one is at position one comma one, meaning that our classifier correctly classified this one as a mandarin

This matrix is drawn on actual values versus predicted values
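
Here's a minimal sketch of how such a matrix is built: rows are the actual classes, columns are the predicted classes, and every test sample adds one to the cell for its pair. The labels and predictions below are made up and are not the demo's output, and this hand-written `confusion_matrix` is only a stand-in for the library routine.

```python
labels = ["apple", "mandarin", "orange", "lemon"]

# Made-up actual and predicted labels for eight test samples.
actual    = ["apple", "apple", "mandarin", "orange", "orange", "lemon", "lemon", "lemon"]
predicted = ["apple", "apple", "mandarin", "orange", "lemon",  "lemon", "lemon", "orange"]


def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts samples whose actual class is i and predicted class is j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>9}", row)
```

The numbers on the diagonal are the correct predictions, so a perfect classifier has zeros everywhere else.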

Now, if you look at the summary of the confusion matrix, we'll get something known as precision, recall, f1-score, and support

Precision is basically the ratio of the correctly predicted positive observations to the total predicted positive observations

So the correctly predicted positive observations are four, and a total of four observations were predicted as apples

So that's where I get a precision of one

Okay

Recall, on the other hand, is the ratio of correctly predicted positive observations to all the observations actually in the class

Again, we've correctly classified four apples, and there are a total of four apples in the testing data set

F1-score is nothing but the harmonic mean of your precision and your recall

Okay, and your support basically denotes the number of actual data points of each class in the testing data set

So, in our KNN algorithm, since we got 100% accuracy, all our data points were correctly classified

So, 15 out of 15 were correctly classified because we have 100% accuracy

So that's how you read a confusion matrix

Okay, you have four important measures, precision, recall, f1-score, and support

F1-score is just the harmonic mean of your precision and your recall

So precision is basically the ratio of the correctly predicted positive observations to the total predicted positive observations

Recall is the ratio of the correctly predicted positive observations to all the observations actually in that class
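
To pin these four measures down, here is a small sketch that computes them for one class by hand. The labels are made up and `class_report` is a hypothetical helper. F1 is computed as the harmonic mean of precision and recall, which is how it is usually defined, and support is taken as the number of actual samples of the class.

```python
def class_report(actual, predicted, positive):
    """Return (precision, recall, f1, support) for one class."""
    true_pos = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    predicted_pos = sum(p == positive for p in predicted)
    support = sum(a == positive for a in actual)  # actual samples of this class

    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / support if support else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0  # harmonic mean
    return precision, recall, f1, support


# Made-up labels: one of the four apples is mistaken for a lemon.
actual    = ["apple", "apple", "apple", "apple", "lemon", "lemon", "lemon", "orange"]
predicted = ["apple", "apple", "apple", "lemon", "lemon", "lemon", "lemon", "orange"]

precision, recall, f1, support = class_report(actual, predicted, "apple")
print(precision, recall, round(f1, 3), support)
```

Here every sample predicted as apple really is an apple, so precision is one, but one apple was missed, so recall is three out of four.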

So guys, that was it for the demo of classification algorithms, so we've discussed regression algorithms and we've discussed classification algorithms

Now it's time to talk about unsupervised learning algorithms

Under unsupervised learning algorithms, we try to solve clustering problems

And the most important clustering algorithm there is is known as K-means clustering

So we're going to discuss the K-means algorithm, and also show you a demo where we'll be executing the clustering algorithm, and you'll see how it's implemented to solve a problem

Now, the main aim of the K-means algorithm is to group similar elements or data points into a cluster

So it is basically the process by which objects are classified into a predefined number of groups, so that they are as dissimilar as possible from one group to another group, but as similar as possible within each group

Now what I mean is let's say you're trying to cluster this population into four different groups, such that each group has people within a specified range of age

Let's say group one is of people between the age 18 and 22

Similarly, group two is between 23 and 35

Group three is between 36 and 39, or something like that

So let's say you're trying to cluster people into different groups based on their age

So for such problems, you can make use of the K-means clustering algorithm

One of the major applications of the clustering algorithm is seen in targeted marketing

I don't know how many of you are aware of targeted marketing

Targeted marketing is all about marketing a specific product to a specific audience

Let's say you're trying to sell fancy clothes or a fancy set of bags and all of that

And the perfect audience for such products would be teenagers

It would be people around the age of 16 to 21 or so

So that is what targeted marketing is all about

Your product is marketed to a specific audience that might be interested in it

That is what targeted marketing is

So K-means clustering is used majorly in targeted marketing

A lot of eCommerce websites like Amazon, Flipkart, eBay

All of these make use of clustering algorithms in order to target the right audience

Now let's see how K-means clustering works

Now the K in K-means denotes the number of clusters

Let's say I give you a data set containing 20 points, and you want to cluster this data set into four clusters

That means your K will be equal to four

So K basically stands for the number of clusters in your data set, or the number of clusters you want to form

You start by defining the number K

Now for each of these clusters, you're going to choose a centroid

So there are four clusters in our data set

For each of these clusters, you'll randomly select one of the data points as a centroid

Now what you'll do is you'll start computing the distance from that centroid to every other point in that cluster

As you keep computing the centroid and the distance between the centroid and other data points in that cluster, your centroid keeps shifting, because you're trying to get to the average of that cluster

Whenever you're trying to get to the average of the cluster, the centroid keeps shifting until it converges

Let's try to understand how K-means works

Let's say that this data set is given to us

Let's say you're given random points like these and you're asked to use the K-means algorithm on this

So your first step will be to decide the number of clusters you want to create

So let's say I wanna create three different clusters

So my K value will be equal to three

The next step will be to provide centroids for all the clusters

What you'll do is initially you'll randomly pick three data points as your centroids for your three different clusters

So basically, this red denotes the centroid for one cluster

Blue denotes a centroid for another cluster

And this green dot denotes the centroid for another cluster

Now what happens in K-means is the algorithm will calculate the Euclidean distance of the points from each centroid and assign the points to the closest cluster

Now since we had three centroids here, now what you're gonna do is you're going to calculate the distance from each and every data point to all the centroids, and you're going to check which data point is closest to which centroid

So let's say your data point A is closest to the blue centroid

So you're going to assign the data point A to the blue cluster

So based on the distance between the data points and the centroids, you're going to form three different clusters

Now again, you're going to recalculate the centroids and form new, better clusters, because you're recomputing all those centroids

Basically, your centroids represent the mean of each of your clusters

So you need to make sure that your mean is actually the centroid of each cluster

So you'll keep recomputing these centroids until the position of your centroids does not change

That means that your centroid is actually the mean or the average of that particular cluster

So that's how K-means works

It's very simple

All you have to do is you have to start by defining the K value

After that, you have to randomly pick K centroids

Then you're going to calculate the distance of each of the data points from the centroids, and you're going to assign each data point to the centroid it is closest to

That's how K-means works

It's a very simple process

All you have to do is you have to keep iterating, and you have to recompute the centroid value until the centroid value does not change, until you get a constant centroid value

Now guys, again, in K-means, you make use of distance measures like Euclidean

I've already discussed what Euclidean is all about

So, to summarize how K-means works, you start by picking the number of clusters

Then you pick a centroid

After that, you calculate the distance of the objects to the centroid

Then you group the data points into specific clusters based on their distance

You have to keep recomputing the centroids until no data point changes its cluster, so that's how K-means works
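
The steps above can be sketched in a few lines of plain Python. This is a bare-bones illustration, not the scikit-learn implementation: the six points are made up, `k_means` is a hypothetical helper, and I pass in the starting centroids explicitly instead of picking them at random so that the run is repeatable.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids until nothing moves."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its closest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # centroids stopped moving, so we are done
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up points forming two obvious groups, so K is two here.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[1]])
print(centroids)
print(clusters)
```

Even though both starting centroids sit in the same group, the centroids drift apart over a couple of iterations and end up at the mean of each group.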

Now let's look at the elbow method

The elbow method is basically used in order to find out the most optimum K value for a particular problem

So the elbow method is quite simple actually

You start off by computing the sum of squared errors for some values of K

Now sum of squared errors is basically the sum of the squared distance between each member of the cluster and its centroid

So you basically calculate the sum of squared errors for different values of K

For example, you can consider K value as two, four, six, eight, 10, 12

Consider all these values, compute the sum of squared errors for each of these values

Now if you plot your K value against your sum of squared errors, you will see that the error decreases as K gets larger

This is because the number of clusters increases

If the number of clusters increases, it means that the distortion gets smaller

The distortion keeps decreasing as the number of clusters increases

That's because the more clusters you have, the closer each centroid will be to its data points

So as you keep increasing the number of clusters, your distortion will also decrease

So the idea of the elbow method is to choose the K at the elbow of the curve, that is, the point where the distortion stops dropping abruptly and starts to flatten out

So if you look at this graph, at K equal to four, the distortion has dropped abruptly and then it levels off

So this is how you find the value of K

The point where that abrupt drop in distortion ends is the most optimal K value you should be choosing for your problem statement

So let me repeat the idea behind the elbow method

You're just going to graph the number of clusters you have versus the sum of squared errors

This graph will basically give you the distortion

Now the distortion is obviously going to decrease if you increase the number of clusters, and there is gonna be one point in this graph where the curve bends sharply, like an elbow

Now for that point, you need to find out the value of K, and that'll be your most optimal K value

That's how you choose your K-means K value, and for KNN you can plot the error rate against K in a similar way
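
Here's a rough sketch of reading the elbow off the numbers instead of off the graph. The sum of squared errors values are made up purely for illustration, and `find_elbow` is a hypothetical helper: it picks the K where the curve bends most sharply, meaning a big drop before it and only a small drop after it. In practice people usually just eyeball the plot, so treat this as one possible heuristic.

```python
# Made-up sum of squared errors for each K, purely for illustration.
sse_by_k = {1: 1000.0, 2: 600.0, 3: 350.0, 4: 120.0, 5: 100.0, 6: 90.0}


def find_elbow(sse_by_k):
    """Return the K with the largest gap between the drop before it and the drop after it."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev_k] - sse_by_k[k]
        drop_after = sse_by_k[k] - sse_by_k[next_k]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


print(find_elbow(sse_by_k))  # 4
```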

So guys, this is how the elbow method works

It's very simple and it can be easily implemented

Now we're gonna look at a small demo which involves K-means

This is actually a very interesting demo

Now guys, one interesting application of clustering is in color compression with images

For example, imagine you have an image with millions of colors in it

In most images, a large number of colors will be unused, and many of the pixels in the image will have similar or even identical colors

Now having too many colors in your image makes it very hard for image processing and image analysis

So this is one area where K-means is applied very often

It's applied in image segmentation, image analysis, image compression, and so on

So what we're gonna do in this demo is we are going to use an image from the scikit-learn datasets module

Okay, it is a prebuilt image, and you will need to install the pillow package for this

So we'll begin by importing the libraries as usual, and we'll be loading our image as china

The image is china.jpg, and we'll be loading this in a variable called china

So if you wanna look at the shape of our image, you can run this command

So we're gonna get a three-dimensional value

So we're getting 427 comma 640 comma three

Now this is basically a three-dimensional array of size height, width, and RGB

It contains red, green, blue contributions as integers from zero to 255

So, your pixel values range between zero and 255, where zero in all three channels stands for black, and 255 in all three represents white

And basically, that's what this array shape denotes

Now one way we can view this set of pixels is as a cloud of points in a three-dimensional color space

So what we'll do is we will reshape the data and rescale the colors, so that they lie between zero and one
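
To show what this reshape and rescale step amounts to, here is a tiny sketch on a made-up two by two image stored as nested lists. With NumPy this would typically be a division by 255 followed by a reshape to height times width rows of three values, but I'm using plain Python here so the steps are visible.

```python
# A made-up 2 x 2 RGB image: image[row][column] = [R, G, B], each from 0 to 255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
height, width = len(image), len(image[0])

# Flatten the rows and columns into one long list of pixels,
# and rescale every channel so it lies between zero and one.
pixels = [[channel / 255 for channel in pixel] for row in image for pixel in row]

print(len(pixels), len(pixels[0]))  # 4 3
print(pixels[0])                    # [1.0, 0.0, 0.0]
```

For the 427 by 640 image in the demo, the same step gives 273,280 rows of three values each.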

So the output of this will be a two-dimensional array now

So basically, we can visualize these pixels in this color space

Now what we're gonna do is we're gonna try and plot our pixels

We have a really huge color space which contains around 16 million possible colors

So this denotes a very, very large data set

So, let me show you what it looks like

We have red against green and red against blue

These are our RGB values, and we can have around 16 million possible combinations of colors

The data set is way too large for us to compute

So what we'll do is we will reduce these 16 million colors to just 16 colors

We can do that by using K-means clustering, because we can cluster similar colors into similar groups

So this is exactly where we'll be importing K-means

Now, one thing to note here is because we're dealing with a very large data set, we will use the MiniBatchKMeans

This operates on subsets of the data to compute the results much more quickly than the standard K-means algorithm, usually with only a small loss in accuracy, because I told you this data set is really huge

Even though this is a single image, the number of possible pixel colors can come up to 16 million, which is a lot

Now each pixel is considered as a data point when you take an image into consideration

When you have tabular data points and data values, that's different

When you're studying an image for image classification or image segmentation, each and every pixel is considered

So, basically, you're building matrices of all of these pixel values

So with 427 times 640 pixels, which is more than 270,000 data points, this is a very huge data set

So, for that reason, we'll be using the MiniBatchKMeans

It's very similar to K-means

The only difference is that it'll operate on subsets of the data

Because the data set is too huge, it'll operate on subsets

So, basically, we're making use of K-means in order to cluster these 16 million color combinations into just 16 colors

So basically, we're gonna form 16 clusters in this data set

Now, the result is the recoloring of the original pixels, where every pixel is assigned the color of its closest cluster center

Let's say that thereare a couple of colors which are very close to green

So we're going to clusterall of these similar colors into one cluster

We'll keep doing thisuntil we get 16 clusters

So, obviously, to do this, we'll be using the clustering method, K-means
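
Here is a rough sketch of the recoloring idea, written as plain K-means in NumPy rather than the mini-batch variant, so it is a stand-in for illustration and not the demo's code. The function names, the random stand-in pixels, and the iteration count are all assumptions:

```python
import numpy as np

def nearest_center(pixels, centers):
    """Index of the closest center for every pixel (squared Euclidean distance)."""
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def recolor_with_kmeans(pixels, n_colors=16, n_iter=10, seed=0):
    """Cluster (n_pixels, 3) colors and replace each pixel by its cluster center."""
    rng = np.random.default_rng(seed)
    # Start from n_colors pixels picked at random as the initial centers
    centers = pixels[rng.choice(len(pixels), size=n_colors, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_center(pixels, centers)
        # Move each center to the mean color of the pixels assigned to it
        for k in range(n_colors):
            members = pixels[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    labels = nearest_center(pixels, centers)
    return centers[labels], centers

# Random colors in [0, 1] stand in for the reshaped image pixels
pixels = np.random.default_rng(1).random((2000, 3))
recolored, centers = recolor_with_kmeans(pixels)
print(len(np.unique(recolored, axis=0)))  # at most 16 distinct colors remain
```

On a real photo you would normally reach for scikit-learn's `MiniBatchKMeans` class instead, since it fits on small batches of pixels and scales much better.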

Let me show you what the output looks like

So, basically, this was the original image from the scikit-learn data set, and this is the 16-color segmented image

Basically, we have only 16 colors here

Here we can have around 16 million colors

Here there are only 16 colors

If you look closely, you can only see particular colors

Now obviously there's a lot of distortion over here, but this is how you study an image

Remove all the extra contrast that is there in an image

You try to reduce the pixels to as small a set of data as possible

The more varied pixels you have, the harder it is going to be for you to study the image for analysis

Now, obviously, there are some details which are lost in this

But overall, the image is still recognizable

So here, basically, we've compressed the color palette by a factor of around one million, because around 16 million possible colors have been reduced to just 16

Now this is an interesting application of K-means

There are actually better ways you can compress information in an image

So, basically, I showed you this example because I want you to understand the power of the K-means algorithm

You can cluster a data set that is this huge into just 16 colors

Initially, there were 16 million, and now you can cluster it to 16 colors

So guys, K-means plays a very huge role in computer vision, image processing, object detection, and so on

It's a very important algorithm when it comes to detecting objects

So self-driving cars and the like can make use of such algorithms

So guys, that was all about unsupervised learning and supervised learning

Now it's the last type of machine learning, which is reinforcement learning

Now this is actually a very interesting part of machine learning, and it is quite different from supervised and unsupervised learning

So we'll be discussing all the concepts that are involved in reinforcement learning

And also reinforcement learning is a little more advanced

When I say advanced, I mean that it's been used in applications such as self-driving cars and is also a part of a lot of deep learning applications, such as AlphaGo and so on

So, reinforcement learning has a different concept to it itself

So we'll be discussing all the concepts under it

So just to brush up your information about reinforcement learning, reinforcement learning is a part of machine learning where an agent is put in an unknown environment, and it learns how to behave in this environment by performing certain actions and observing the rewards which it gets from these actions

Reinforcement learning is all about taking an appropriate action in order to maximize the reward in a particular situation

Now let's understand reinforcement learning with an analogy

Let's consider a scenario wherein a baby is learning how to walk

This scenario can go about in two different ways

The first is that the baby starts walking and it makes it to the candy

And since the candy is the end goal, the baby is very happy and it's positive

Meaning, the baby is happy and it received a positive reward

Now, the second way this can go is that the baby starts walking, but it falls due to some hurdle in between

That's really cute

So the baby gets hurt and it doesn't get to the candy

It's negative because the baby is sad and it receives a negative reward

So just like how we humans learn from our mistakes by trial and error, reinforcement learning is also similar

Here we have an agent, and in this case, the agent is the baby, and the reward is the candy with many hurdles in between

The agent is supposed to find the best possible path to reach the reward

That is the main goal of reinforcement learning

Now the reinforcement learning process has two important components

It has something known as an agent and something known as an environment

Now the environment is the setting that the agent is acting on, and the agent represents the reinforcement learning algorithm

The whole reinforcement learning algorithm is basically the agent

The environment is the setting in which you place the agent, and it is the setting wherein the agent takes various actions

The reinforcement learning process starts when the environment sends a state to the agent

Now the agent, based on the observations it makes, takes an action in response to that state

Now, in turn, the environment will send the next state and the respective reward back to the agent

Now the agent will update its knowledge with the reward returned by the environment to evaluate its last action

The loop continues until the environment sends a terminal state, which means that the agent has accomplished all of its tasks
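
The loop just described can be sketched in a few lines. The `CorridorEnv` class below is a hypothetical toy environment made up for this illustration (a four-cell corridor with the goal at the far end), and the agent simply acts at random instead of learning:

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to cell 3 along a line."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent cannot leave the corridor
        self.state = min(3, max(0, self.state + action))
        done = self.state == 3              # terminal state reached
        reward = 1 if done else 0
        return self.state, reward, done

random.seed(0)
env = CorridorEnv()
state = env.reset()                         # the environment sends the first state
history = []
done = False
while not done:
    action = random.choice([-1, 1])         # the agent responds with an action
    next_state, reward, done = env.step(action)   # next state and reward come back
    history.append((state, action, reward))       # experience a learner would use
    state = next_state

print(state, history[-1][2])                # ends in cell 3 with a reward of 1
```

A learning agent would use `history` to update its knowledge after every step; here it is only recorded.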

To understand this better, let's suppose that our agent is playing Counter Strike

The reinforcement learning process can be broken down into a couple of steps

The first step is that the reinforcement learning agent, which is basically the player, collects a state, S naught, from the environment

So whenever you're playing Counter Strike, you start off with stage zero or stage one

You start off from the first level

Now based on this state, S naught, the reinforcement learning agent will take an action, A naught

So guys, an action can be anything that causes a result

Now if the agent moves left or right in the game, that is also considered as an action

So initially, the action will be random, because the agent has no clue about the environment

Let's suppose that you're playing Counter Strike for the first time

You have no idea about how to play it, so you'll just start randomly

You'll just go with whichever action you think is right

Now the environment is in stage one

After passing stage zero, the environment will go into stage one

Once the environment updates the stage to stage one, the reinforcement learning agent will get a reward, R one, from the environment

This reward can be anything like additional points, or you'll get additional weapons when you're playing Counter Strike

Now this reinforcement learning loop will go on until the agent is dead or reaches the destination, and it continuously outputs a sequence of states, actions, and rewards

This is exactly how reinforcement learning works

It starts with the agent being put in an environment, and the agent will randomly take some action in state zero

After taking an action, depending on his action, he'll either get a reward and move on to state number one, or he will die and go back to the same state

So this will keep happening until the agent reaches the last stage, or he dies or reaches his destination

That's exactly how reinforcement learning works

Now reinforcement learning is the logic behind a lot of games these days

It's being implemented in various games, such as Dota

A lot of you who play Dota might know this

Now let's talk about a couple of reinforcement learning definitions or terminologies

So, first, we have something known as the agent

Like I mentioned, an agent is the reinforcement learning algorithm that learns from trial and error

An agent is the one that takes actions like, for example, a soldier in Counter Strike navigating through the game, going right, left, and all of that

That is the agent taking some action

The environment is the world through which the agent moves

Now the environment, basically, takes the agent's current state and action as input, and returns the agent's reward and its next state as the output

Next, we have something known as action

All the possible steps that an agent can take are considered actions

Next, we have something known as state

Now the current condition returned by the environment is known as a state

Reward is an instant return from the environment to appraise the last action of the reinforcement learning agent

All of these terms are pretty understandable

Next, we have something known as policy

Now, policy is the approach that the agent uses to determine the next action based on the current state

Policy is basically the approach with which you go around in the environment

Next, we have something known as value

Now, the expected long-term return with a discount, as opposed to the short-term reward R, is known as value

Now, terms like discount and value, I'll be discussing in the upcoming slides

Action-value is also very similar to the value, except it takes an extra parameter known as the current action

Don't worry about action and Q value

We'll talk about all of this in the upcoming slides

So make yourself familiar with these terms, because we'll be seeing a whole lot of them this session

So, before we move any further, let's discuss a couple more reinforcement learning concepts

Now we have something known as reward maximization

So if you haven't realized it already, the basic aim of the reinforcement learning agent is to maximize the reward

How does this happen? Let's try to understand this in a little more detail

So, basically the agent works based on the theory of reward maximization

Now that's exactly why the agent must be trained in such a way that he takes the best action, so that the reward is maximal

Now let me explain reward maximization with a small example

Now in this figure, you can see there is a fox, there is some meat, and there is a tiger

Our reinforcement learning agent is the fox

His end goal is to eat the maximum amount of meat before being eaten by the tiger

Now because the fox is a very clever guy, he eats the meat that is closer to him, rather than the meat which is close to the tiger, because the closer he gets to the tiger, the higher are his chances of getting killed

That's pretty obvious

Even if the rewards near the tiger are bigger meat chunks, they'll be discounted

This is exactly what discount is

We just mentioned it in the previous slide

This is done because of the uncertainty factor that the tiger might actually kill the fox

Now the next thing to understand is how discounting of a reward works

Now, in order to understand discounting, we define a discount rate called gamma

The value of gamma is between zero and one

And the smaller the gamma, the larger the discount, and vice versa
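
To make the effect of gamma concrete, here is a small sketch that weights the reward at step t by gamma to the power t. The reward list is made up for illustration (two small meat chunks followed by a big one), and `discounted_return` is just a helper name chosen here:

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1, 1, 10]   # hypothetical: two small chunks, then a big one

far_sighted = discounted_return(rewards, gamma=0.9)   # roughly 10.0
short_sighted = discounted_return(rewards, gamma=0.1) # roughly 1.2
print(far_sighted, short_sighted)
```

With gamma at 0.9 the big chunk at the end still dominates the total, while with gamma at 0.1 it is discounted almost to nothing.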

Now don't worry about these concepts, gamma and all of that

We'll be seeing that in our practical demo today

So let's move on and discuss another concept known as the exploration and exploitation trade-off

Now guys, before that, I hope all of you understood reward maximization

Basically, the main aim behind reinforcement learning is to maximize the rewards that an agent can get

Now, one of the most important concepts in reinforcement learning is the exploration and exploitation trade-off

Now, exploration, like the name suggests, is about exploring and capturing more information about an environment

On the other hand, exploitation is about using the already known information to heighten your reward

Now consider the same example that we saw previously

So here the fox eats only the meat chunks which are close to him

He doesn't eat the bigger meat chunks which are at the top, even though the bigger meat chunks would get him more reward

So if the fox only focuses on the closest reward, he will never reach the big chunks of meat

This process is known as exploitation

But if the fox decides to explore a bit, he can find the bigger reward, which is the big chunk of meat

This is known as exploration

So this is the difference between exploitation and exploration

It's always best if the agent explores the environment and tries to figure out a way in which it can get the maximum number of rewards

Now let's discuss another important concept in reinforcement learning, which is known as the Markov decision process

Basically, the mathematical approach for mapping a solution in reinforcement learning is called the Markov decision process

It's the mathematics behind reinforcement learning

Now, in a way, the purpose of reinforcement learning is to solve a Markov decision process

Now in order to get a solution, there is a set of parameters in a Markov decision process

There's a set of actions A, there's a set of states S, a reward R, policy pi, and value V

Also, this image represents how reinforcement learning works

There's an agent

The agent takes some action on the environment

The environment, in turn, will reward the agent, and it will give him the next state

That's how reinforcement learning works

So to sum everything up, what happens in a Markov decision process and reinforcement learning is the agent has to take an action A to transition from the start state to the end state S

While doing so, the agent will receive some reward R for each action he takes

Now the series of actions that are taken by the agent define the policy, and the rewards collected define the value

The main goal here is to maximize the rewards by choosing the optimum policy

So you're gonna choose the best possible approach in order to maximize the rewards

That's the main aim of the Markov decision process

To understand the Markov decision process, let's look at a small example

I'm sure all of you already know about the shortest path problem

We all had such problems and concepts in math to find the shortest path

Now consider this representation over here, this figure

Here, our goal is to find the shortest path between two nodes

Let's say we're trying to find the shortest path between node A and node D

Now each edge, as you can see, has a number linked with it

This number denotes the cost to traverse through that edge

So we need to choose a policy to travel from A to D in such a way that our cost is minimum

So in this problem, the set of states is denoted by the nodes A, B, C, D

The action is to traverse from one node to the other

For example, if you're going from A to C, that is an action

C to B is an action

B to D is another action

The reward is the cost represented by each edge

Policy is the path taken to reach the destination

So we need to make sure that we choose a policy in such a way that our cost is minimal

So what you can do is you can start off at node A, and you can take baby steps to reach your destination

Initially, only the next possible node is visible to you

So from A, you can either go to B or you can go to C

So if you follow the greedy approach, you take the most optimum step, which is choosing A to C, instead of choosing A to B to C

Now you're at node C and you want to traverse to node D

Again, you must choose your path very wisely

So if you traverse from A to C, and C to B, and B to D, your cost is the least

But if you traverse from A to C to D, your cost will actually increase
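
The edge costs are on the slide and are not spoken aloud, so the numbers below are purely hypothetical, chosen only so that they agree with what was just said (A to C to B to D is cheapest, A to C to D costs more). With that assumption, comparing every path from A to D looks like this:

```python
# Hypothetical edge costs; the real numbers are in the figure, not in the talk
costs = {
    ("A", "B"): 10, ("A", "C"): 5,
    ("C", "B"): 2, ("B", "D"): 3, ("C", "D"): 15,
}

def path_cost(path):
    """Total cost of a path given as a sequence of node names."""
    return sum(costs[(a, b)] for a, b in zip(path, path[1:]))

def all_paths(node, goal, path=()):
    """Yield every path from node to goal that never revisits a node."""
    path = path + (node,)
    if node == goal:
        yield path
        return
    for src, dst in costs:
        if src == node and dst not in path:
            yield from all_paths(dst, goal, path)

paths = list(all_paths("A", "D"))
best = min(paths, key=path_cost)
print(best, path_cost(best))             # ('A', 'C', 'B', 'D') 10
print(path_cost(("A", "C", "D")))        # 20
```

Each path plays the role of a policy here, and its total cost is what we are trying to minimize.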

Now you need to choose a policy that will minimize your cost over here

So let's say, for example, the agent chose A to C to D

It came to node C, and then it directly chose D

Now the policy followed by our agent in this problem is the exploitation type, because we didn't explore the other nodes

We just selected three nodes and we traversed through them

And the policy we followed is not actually an optimal policy

We must always explore more to find out the optimal policy

Even if the other nodes are not giving us any more reward or are actually increasing our cost, we still have to explore and find out if those paths are actually better

That policy is actually better

The method that we implemented here is known as policy-based learning

Now the aim here is to find the best policy among all the possible policies

So guys, apart from policy-based, we also have the value-based approach and the action-based approach

Value-based emphasizes maximizing the rewards

And in action-based, we emphasize each action taken by the agent

Now a point to note is that all of these learning approaches have a simple end goal

The end goal is to effectively guide the agent through the environment, and acquire the most number of rewards

So this was very simple to understand: the Markov decision process, the exploitation and exploration trade-off, and we also discussed the different reinforcement learning definitions

I hope all of this was understandable

Now let's move on and understand an algorithm known as the Q-learning algorithm

So guys, Q-learning is one of the most important algorithms in reinforcement learning

And we'll discuss this algorithm with the help of a small example

We'll study this example, and then we'll implement the same example using Python, and we'll see how it works

So this is how our demonstration looks for now

Now the problem statement is to place an agent in any one of the rooms numbered zero, one, two, three, and four

And the goal is for the agent to reach outside the building, which is room number five

So, basically, this zero, one, two, three, four represents the building, and five represents a room which is outside the building

Now all these rooms are connected by doors

Now these gaps that you see between the rooms are basically doors, and each room is numbered from zero to four

The outside of the building can be thought of as a big room which is room number five

Now if you've noticed in this diagram, the doors of room number one and room number four lead directly to room number five

From one, you can directly go to five, and from four, also, you can directly go to five

But if you want to go to five from room number two, then you'll first have to go to room number three, room number one, and then room number five

So these are indirect links

Direct links are from room number one and room number four

So I hope all of you are clear with the problem statement

You're basically going to have a reinforcement learning agent, and that agent has to traverse through the rooms in such a way that he reaches room number five

To solve this problem, first, what we'll do is we'll represent the rooms on a graph

Now each room is denoted as a node, and the links that are connecting these nodes are the doors

Alright, so we have nodes zero to five, and the links between each of these nodes represent the doors

So, for example, if you look at this graph over here, you can see that there is a direct connection from one to five, meaning that you can directly go from room number one to your goal, which is room number five

So if you want to go from room number three to five, you can either go to room number one, and then go to five, or you can go from room number three to four, and then to five

So guys, remember, the end goal is to reach room number five

Now to set room number five as the goal state, what we'll do is we'll associate a reward value to each door

The doors that lead immediately to the goal will have an instant reward of 100

So, basically, one to five will have a reward of hundred, and four to five will also have a reward of hundred

Now other doors that are not directly connected to the target room will have a zero reward, because they do not directly lead us to that goal

So let's say you placed the agent in room number three

So to go from room number three to one, the agent will get a reward of zero

And to go from one to five, the agent will get a reward of hundred

Now because the doors are two-way, two arrows are assigned to each room

You can see an arrow going towards the room and one coming from the room

So each arrow contains an instant reward as shown in this figure

Now of course room number five will loop back to itself with a reward of hundred, and all other direct connections to the goal room will carry a reward of hundred

Now in Q-learning, the goal is to reach the state with the highest reward

So that if the agent arrives at the goal, it will remain there forever

So I hope all of you are clear with this diagram

Now, the terminologies in Q-learning include two terms, state and action

Okay, your room basically represents the state

So if you're in state two, it basically means that you're in room number two

Now the action is basically the movement of the agent from one room to the other room

Let's say you're going from room number two to room number three

That is basically an action

Now let's consider one more example

Let's say you place the agent in room number two and he has to get to the goal

So your initial state will be state number two or room number two

Then from room number two, you'll go to room number three, which is state three

Then from state three, you can either go back to state two or go to state one or state four

If you go to state four, from there you can directly go to your goal room, which is five

This is how the agent is going to traverse

Now in order to depict the rewards that you're going to get, we're going to create a matrix known as the reward matrix

Okay, this is represented by R or also known as the R matrix

Now the minus one in this table represents null values

That is, basically, where there isn't a link between the nodes, that is represented as minus one

Now there is no link between zero and zero

That's why it's minus one

Now if you look at this diagram, there is no direct link from zero to one

That's why I've put minus one over here as well

But if you look at zero comma four, we have a value of zero over here, which means that you can traverse from zero to four, but your reward is going to be zero, because four is not your goal state

However, if you look at the matrix, look at one comma five

In one comma five, we have a reward value of hundred

This is because you can directly go from room number one to five, and five is the end goal

That's why we've assigned a reward of hundred

Similarly, for four comma five, we have a reward of hundred

And for five comma five, we have a reward of hundred

Zeroes basically represent other links, but they are zero because they do not lead to the end goal
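
One way to build that reward matrix from the list of doors is sketched below. The door list is read off the spoken description of the graph (the door between rooms two and three is implied rather than stated outright), and the names `doors` and `GOAL` are just for this illustration:

```python
import numpy as np

# Doors between rooms as described in the talk; room 5 is the outside.
# Assumption: room 2 connects only to room 3, as the walkthrough implies.
doors = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]
GOAL = 5

R = np.full((6, 6), -1)                  # -1 marks "no door between these rooms"
for a, b in doors:
    for src, dst in ((a, b), (b, a)):    # every door can be used in both directions
        R[src, dst] = 100 if dst == GOAL else 0
R[GOAL, GOAL] = 100                      # the goal room loops back to itself

print(R)
```

Row index is the current room and column index is the room you move to, so `R[1, 5]` is the reward for going from room one to room five.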

So I hope you all understood the reward matrix

It's very simple

Now before we move any further, we'll be creating another matrix known as the Q matrix

Now the Q matrix basically represents the memory of what the agent has learned through experience

The rows of the Q matrix will represent the current state of the agent

The columns will represent the next possible actions leading to the next state, and the formula to calculate the Q matrix is this formula, right? Here we have Q state comma action, which equals R state comma action, which is nothing but the reward matrix

Then we add a term with a parameter known as the Gamma parameter, which I'll explain shortly

And then we are multiplying this with the maximum of Q next state comma all actions
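
Written out as an equation, the formula read aloud above is:

```latex
Q(\text{state}, \text{action}) = R(\text{state}, \text{action})
  + \gamma \cdot \max\bigl[\, Q(\text{next state}, \text{all actions}) \,\bigr]
```

Here R is the reward matrix, gamma is the Gamma parameter, and the maximum is taken over every action available from the next state.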

Now don't worry if you haven't understood this formula

I'll explain this with a small example

For now, let's understand what the Gamma parameter is

So, basically, the value of Gamma will be between zero and one

If Gamma is closer to zero, it means that the agent will tend to consider only immediate rewards

Now, if Gamma is closer to one, it means that the agent will consider future rewards with greater weight

Now what exactly I'm trying to say is if Gamma is closer to zero, then we'll be performing something known as exploitation

I hope you all remember what the exploitation and exploration trade-off is

So, if your Gamma is closer to zero, it means that the agent is not going to explore the environment

Instead, it'll just choose a couple of states, and it'll just traverse through those states

But if your Gamma parameter is closer to one, it means that the agent will traverse through all possible states, meaning that it'll perform exploration, not exploitation

So the closer your Gamma parameter is to one, the more your agent will explore

This is exactly what the Gamma parameter is

If you want to get the best policy, it's always practical that you choose a Gamma parameter which is closer to one

We want the agent to explore the environment as much as possible so that it can get the best policy and the maximum rewards

I hope this is clear

Now let me just tell you what the Q-learning algorithm is step by step

So you begin the Q-learning algorithm by setting the Gamma parameter and the environment rewards in matrix R

Okay, so, first, you'll have to set these two values

We've already calculated the reward matrix

We need to set the Gamma parameter

Next, you'll initialize the matrix Q to zero

Now why do you do this? Now, if you remember, I said that the Q matrix is basically the memory of the agent

Initially, obviously, the agent has no memory of the environment

It's new to the environment and you're placing it randomly anywhere

So it has zero memory

That's why you initialize the matrix Q to zero

After that, you'll select a random initial state, and you place your agent in that initial state

Then you'll set this initial state as your current state

Now from the current state, you'll select some action that will lead you to the next state

Then you'll basically get the maximum Q value for this next state, based on all the possible actions that you can take

Then you'll keep computing the Q value until you reach the goal state

Now that might be a little bit confusing, so let's look at this entire thing with a small example

Let's say that first, you're gonna begin with setting your Gamma parameter

So I'm setting my Gamma parameter to 0.8, which is pretty close to one

This means that our agent will explore the environment as much as possible

And also, I'm setting the initial state as room one

Meaning, I'm in state one or I'm in room one

So basically, your agent is going to be in room number one

The next step is to initialize the Q matrix as a zero matrix

So this is the Q matrix

You can see that everything is set to zero, because the agent has no memory at all

He hasn't traversed to any node, so he has no memory

Now since the agent is in room one, he can either go to room number three or he can go to room number five

Let's randomly select room number five

So, from room number five, you're going to calculate the maximum Q value for the next state based on all possible actions

So all the possible actions from room number five are one, four, and five

So, basically, you're traversing from one to five, that's why I put Q one comma five over here, state comma action

Your reward matrix will have R one comma five

Now R one comma five is basically hundred

That's why I put hundred over here

Now your Gamma parameter is 0.8

So, guys, what I'm doing here is I'm just substituting the values in this formula

So let me just repeat this whole thing

Q state comma action

So you're in state number one, correct? And your action is you're going to room number five

So your Q state comma action is one comma five

Again, your reward matrix R one comma five is hundred

So here you're gonna put hundred, plus your Gamma parameter

Your Gamma parameter is 0.8

Then you're going to calculate the maximum Q value for the next state based on all possible actions

So let's look at the next state

From room number five, you can go to either one, four, or five

So your actions are five comma one, five comma four, and five comma five

That's exactly what I mentioned over here

Q five comma one, Q five comma four, and Q five comma five

You're basically putting all the next possible actions from state number five

From here, you'll calculate the maximum Q value that you're getting for each of these

Now your Q value is zero, because, initially, your Q matrix is set to zero

So you're going to get zero for Q five comma one, five comma four, and five comma five

So that's why you'll get 0.8 into zero, and hence your Q one comma five becomes hundred

This hundred comes from R one comma five
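
The substitution just walked through, with Gamma at 0.8 and the Q matrix still all zeros, can be written as:

```latex
\begin{aligned}
Q(1, 5) &= R(1, 5) + 0.8 \cdot \max\bigl[\, Q(5, 1),\ Q(5, 4),\ Q(5, 5) \,\bigr] \\
        &= 100 + 0.8 \cdot \max[\, 0,\ 0,\ 0 \,] \\
        &= 100
\end{aligned}
```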

I hope all of you understood this

So next, what you'll do is you'll update this one comma five value in your Q matrix, because you just calculated Q one comma five

So I've updated it over here

Now for the next episode, we'll start with a randomly chosen initial state

Again, let's say that we randomly chose state number three

Now from room number three, you can either go to room number one, two or four

Let's randomly select room number one

Now, from room number one, you'll calculate the maximum Q value for the next possible actions

So let's calculate the Q formula for this

So your Q state comma action becomes three comma one, because you're in state number three and your action is you're going to room number one

So your R three comma one, let's see what R three comma one is

R three comma one is zero

So you're going to put zero over here, plus your Gamma parameter, which is 0.8, and then you're going to check the next possible actions from room number one, and you're going to choose the maximum value from these two

So Q one comma three and Q one comma five denote your next possible actions from room number one

So Q one comma three is zero, but Q one comma five is hundred

So we just calculated this hundred in the previous step

So, out of zero and hundred, hundred is your maximum value, so you're going to choose hundred

Now 0.8 into hundred is nothing but 80

So again, your Q matrix gets updated

You see an 80 over here
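
The same second-episode calculation, as a small sketch in NumPy. The variable name `r_3_1` is only for illustration, and the Q matrix is set up by hand to match the state after the first episode:

```python
import numpy as np

gamma = 0.8
Q = np.zeros((6, 6))
Q[1, 5] = 100          # value learned in the first episode
r_3_1 = 0              # R(3, 1): the door from room 3 to room 1 does not reach the goal

# From room 1 the agent can move on to room 3 or room 5
Q[3, 1] = r_3_1 + gamma * max(Q[1, 3], Q[1, 5])
print(Q[3, 1])         # 80.0
```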

So, basically what you're doing is as you're taking actions, you're updating your Q value, you're just calculating the Q value at every step, you're putting it in your Q matrix so that your agent remembers that, okay, when I went from room number one to room number five, I had a Q value of hundred

Similarly, three to one gave me a Q value of 80

So basically, this Q matrix represents the memory of your agent

I hope all of you are clear with this

So basically, what we're gonna do is we're gonna keep iterating through this loop until we've gone through all possible states and reached the goal state, which is five

Also, our main aim here is to find the most optimum policy to get to room number five

Now let's implement the exact same thing using Python

So that was a lot of theory

Now let's understand how this is done practically

Alright, so we begin by importing our library

We're gonna be using the NumPy library over here

After that, we'll import the R matrix

We've already created the R matrix

This is the exact matrix that I showed you a couple of minutes ago

So I've created a matrix called R and I've basically stored all the rewards in it

If you want to see the R matrix, let me print it

So, basically, this is your R matrix

If you remember, node one to five, you have a reward of hundred

Node four to five, you have a reward of hundred, and five to five, you have a reward of hundred, because all of these nodes directly lead us to the reward

Correct? Next, what we're doing is we're creating a Q matrix which is basically a six into six matrix

Which represents all the states, zero to five

And this matrix is basically all zeros

After that, we're setting the Gamma parameter

Now guys, you can play around with this code, and you know you can change the Gamma parameter to 0.7 or 0.9 and see how much more the agent will explore or whether it performs exploitation

Here I've set the Gamma parameter to 0.8, which is a pretty good number

Now what I'm doing is I'm setting the initial state as one

You can randomly choose this state according to your needs

I've set the initial state as one

Now, this function will basically give me all the available actionsfrom my initial state

Since I've set my initial state as one, It'll give me all the possible actions

Here what I'm doing is sincemy initial state is one, I'm checking in my row number one, which value is equal tozero or greater than zero

Those denote my available actions

So look at our row number one

Here we have one zero andwe have a hundred over here

This is one comma four andthis is one comma five

So if you look at the row number one, since I've selected theinitial state as one, we'll consider row number one

Okay, what I'm doing is in row number one, I have two numbers whichare either equal to zero or greater than zero

These denote my possible actions

One comma three has the value of zero and one comma five has the value of a hundred, which means that the agent can either go to room number three or it can go to room number five

What I'm trying to say is, from room number one, you can basically go to room number three or room number five

This is exactly what I've coded over here

If you remember the reward matrix, from one you can traverse directly only to room number three and room number five

Okay, that's exactly what I've mentioned in my code over here

So this will basically give me the available actions from my current state

Now once I've moved to my next state, I need to check the available actions from that state

What I'm doing over here is basically this

If you remember, from room number one, we can go to three and five, correct? And from three and five, I'll randomly select the state

And from that state, I need to find out all possible actions

That's exactly what I've done over here

Okay

Now this will randomly choose an action for me from all my available actions

Next, we need to update our Q matrix, depending on the actions that we took, if you remember

So that's exactly what this update function is for

Now guys, this entire block is for calculating the Q value

I hope all of you remember the formula, which is Q(state, action) = R(state, action) + Gamma * max value

Max value will basically give me the maximum value out of all the possible actions

I'm basically computing this formula

Now this will just update the Q matrix
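
The two helpers described above, finding the available actions and applying the Q update, can be sketched roughly as follows. This is a minimal illustration rather than the exact code from the session: it uses plain Python lists instead of NumPy so it runs without dependencies, and the reward matrix values, along with the names `available_actions` and `update`, are assumptions chosen to match the rewards mentioned in the walkthrough (-1 for no door, 0 for a door, 100 for a door into room five).

```python
# Assumed reward matrix: rows are the current room, columns are the next room.
# -1 = no direct door, 0 = door, 100 = door leading into the goal room (5).
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]

GAMMA = 0.8  # discount factor used in the walkthrough

# Q matrix: six-by-six, initialized to zeros.
Q = [[0.0] * 6 for _ in range(6)]


def available_actions(state):
    """Return every next room whose reward in this row is zero or greater."""
    return [action for action, reward in enumerate(R[state]) if reward >= 0]


def update(state, action):
    """Apply Q(state, action) = R(state, action) + Gamma * max(Q[next state])."""
    Q[state][action] = R[state][action] + GAMMA * max(Q[action])
    return Q[state][action]


actions_from_1 = available_actions(1)  # rooms reachable from room 1
q_1_5 = update(1, 5)                   # first update for moving 1 -> 5
print(actions_from_1, q_1_5)
```

With an all-zero Q matrix, the first update for the move from room one to room five is simply the reward itself, since the max term contributes nothing yet.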

Coming to the training phase, what we're gonna do is we are going to set a range

Here I've set a range of 10,000, meaning that my agent will perform 10,000 iterations

You can set this depending on your own needs, and 10,000 iterations is a pretty huge number

So, basically, my agent is going to go through 10,000 possible iterations in order to find the best policy

Now this is the exact same thing that we did earlier

We're setting the current state, and then we're choosing the available actions from the current state

Then from there, we'll choose an action at random

Here we'll calculate a Q value and we'll update the Q value in the matrix

Alright

And here I'm doing nothing but printing the trained Q matrix

This was the training phase

Now in the testing phase, basically, you're going to randomly choose a current state

You're gonna choose a current state, and you're going to keep looping through this entire code until you reach the goal state, which is room number five

That's exactly what I'm doing in this whole thing

Also, in the end, I'm printing the selected path

That is basically the policy that the agent took to reach room number five

Now if I set the current state as one, it should give me the best policy to reach room number five from room number one

Alright, let's run this code, and let's see if it's giving us that

Now before that happens, I want you to check and tell me which is the best possible way to get from room number one to room number five

It's obviously directly like this

One to five is the best policy to get from room number one to room number five

So we should get an output of one comma five

That's exactly what we're getting

This is a Q matrix with all the Q values, and here we are getting the selected path

So if your current state is one, your best policy is to go from one to five

Now, if you want to change your current state, let's say we set the current state to two

And before we run the code, let's see which is the best possible way to get to room number five from room number two

From room number two, you can go to three, then you can go to one, and then you can go to five

This will give you a reward of hundred, or you can go to room number three, then go to four, and then go to five

This will also give you a reward of a hundred

Our path should be something like that

Let's save it and let's run the file

So, basically, from state two, you're going to go to three, then to four, and then to five

This is our best possible path from two to room number five
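
The training and testing phases described above can be put together in a small end-to-end sketch. As before, this is only an illustration under stated assumptions: it uses plain Python lists rather than NumPy, the reward matrix is an assumed one consistent with the rewards mentioned in the walkthrough, the random seed is fixed only to make runs repeatable, and the names `train` and `best_path` are hypothetical.

```python
import random

# Assumed reward matrix (-1 = no door, 0 = door, 100 = door into goal room 5).
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]
GAMMA = 0.8
GOAL = 5


def train(iterations=10_000, seed=0):
    """Training phase: pick a random state and a random available action,
    then update the matching Q value, repeated `iterations` times."""
    rng = random.Random(seed)
    Q = [[0.0] * 6 for _ in range(6)]
    for _ in range(iterations):
        state = rng.randrange(6)
        actions = [a for a, reward in enumerate(R[state]) if reward >= 0]
        action = rng.choice(actions)
        Q[state][action] = R[state][action] + GAMMA * max(Q[action])
    return Q


def best_path(Q, start, max_steps=20):
    """Testing phase: from `start`, keep taking the action with the highest
    Q value until the goal room is reached (or max_steps is exceeded)."""
    path = [start]
    state = start
    while state != GOAL and len(path) <= max_steps:
        state = max(range(6), key=lambda action: Q[state][action])
        path.append(state)
    return path


Q = train()
path_from_1 = best_path(Q, 1)
path_from_2 = best_path(Q, 2)
print("Selected path from 1:", path_from_1)
print("Selected path from 2:", path_from_2)
```

From room two, the routes through room one and through room four are equally good, so which of the two the greedy walk prints depends on how the tie is broken.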

So, guys, this is exactly how the Q learning algorithm works, and this was a simple implementation of the entire example that I just told you

Now if any of you still have doubts regarding Q learning or reinforcement learning, make sure you comment them in the comment section, and I'll try to answer all of your doubts

Now we're done with machine learning

We've completed the whole machine learning module

We've understood reinforcement learning, supervised learning, unsupervised learning, and so on

Before I get to deep learning, I want to clear a very common misconception

A lot of people get confused between AI, machine learning, and deep learning, because, you know, artificial intelligence, machine learning, and deep learning often show up together in the same applications

For example, Siri is an application of artificial intelligence, machine learning, and deep learning

So how are these three connected? Are they the same thing, or what exactly is the relationship between artificial intelligence, machine learning, and deep learning? This is what I'll be discussing

Now artificial intelligence is basically the science of getting machines to mimic the behavior of human beings

But when it comes to machine learning, machine learning is a subset of artificial intelligence that focuses on getting machines to make decisions by feeding them data

That's exactly what machine learning is

It is a subset of artificial intelligence

Deep learning, on the other hand, is a subset of machine learning that uses the concept of neural networks to solve complex problems

So, to sum it up, artificial intelligence, machine learning, and deep learning, are interconnected fields

Machine learning and deep learning aid artificial intelligence by providing a set of algorithms and neural networks to solve data-driven problems

That's how AI, machine learning, and deep learning are related

I hope all of you have cleared your misconceptions and doubts about AI, ML, and deep learning

Now let's look at our next topic, which is limitations of machine learning

Now the first limitation is that machine learning is not capable enough to handle high dimensional data

This is where the input and the output are very large

So handling and processing such type of data becomes very complex and it takes up a lot of resources

This is also sometimes known as the curse of dimensionality

So, to understand this in simpler terms, look at the image shown on this slide

Consider a line of a hundred yards and let's say that you dropped a coin somewhere on the line

Now it's quite convenient for you to find the coin by simply walking along the line

This is very simple because this line is considered a single dimensional entity

Now next, you consider that you have a square of a hundred yards, and let's say you dropped a coin somewhere in between

Now it's quite evident that you're going to take more time to find the coin within that square as compared to the previous scenario

The square is, let's say, a two dimensional entity

Let's take it a step ahead and let's consider a cube

Okay, let's say there's a cube of 500 yards and you have dropped a coin somewhere in between this cube

Now it becomes even more difficult for you to find the coin this time, because this is a three dimensional entity

So, as your dimension increases, the problem becomes more complex
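
To put rough numbers on the coin analogy, here is a small sketch that counts how many one-yard cells you would have to check in the worst case. It assumes a side of 100 yards for every shape, which is a simplification made here to keep the comparison like for like, and the function name `cells_to_search` is made up for this illustration.

```python
def cells_to_search(side_yards: int, dimensions: int) -> int:
    """Number of one-yard cells in a line, square, or cube of the given side."""
    return side_yards ** dimensions


line_cells = cells_to_search(100, 1)    # walking along a line
square_cells = cells_to_search(100, 2)  # searching a square
cube_cells = cells_to_search(100, 3)    # searching a cube

for name, cells in [("line", line_cells), ("square", square_cells), ("cube", cube_cells)]:
    print(f"{name}: {cells:,} cells")
```

Each extra dimension multiplies the search space by another factor of 100, which is why data with thousands of dimensions gets out of hand so quickly.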

So you can observe that the complexity increases with the increase in your dimensions, and in real life, the high dimensional data that we're talking about has thousands of dimensions, which makes it very complex to handle and process

High dimensional data can easily be found in use cases like image processing, natural language processing, image translation, and so on

Now in K-means itself, we saw that we had 16 million possible colors

That is a lot of data

So this is why machine learning is restricted

It cannot be used in the process of image recognition, because images have a lot of pixels and they have a lot of high dimensional data

That's why machine learning becomes very restrictive when it comes to such use cases

Now the second major challenge is to tell the computer what features it should look for that will play an important role in predicting the outcome and in getting a good accuracy

Now this process is something known as feature extraction

Now feeding raw data to the algorithm rarely works, and this is the reason why feature extraction is a critical part of the machine learning workflow

Now the challenge for the programmer here increases because the effectiveness of the algorithm depends on how insightful the programmer is

As a programmer, you have to tell the machine that these are the features

And depending on these features, you have to predict the outcome

That's how machine learning works

So far, in all our demos, we saw that we were providing predictor variables

We were providing input variables that will help us predict the outcome

We were trying to find correlations between variables, and we were trying to find out the variable that is very important in predicting the output variable

So this becomes a challenge for the programmer

That's why it's very difficult to apply machine learning models to complex problems like object recognition, handwriting recognition, natural language processing, and so on

Now all these problems and all these limitations in machine learning led to the introduction of deep learning

Now we're gonna discuss deep learning

Now deep learning is one of the only methods by which we can overcome the challenges of feature extraction

This is because deep learning models are capable of learning to focus on the right features by themselves, which requires very little guidance from the programmer

Basically, deep learning mimics the way our brain functions

That is, it learns from experience

So in deep learning, what happens is feature extraction happens automatically

You need very little guidance from the programmer

So deep learning will learn the model, and it will understand which feature or which variable is important in predicting the outcome

Let's say you have millions of predictor variables for a particular problem statement

How are you going to sit down and understand the significance of each of these predictor variables? It's going to be almost impossible to sit down with so many features

That's why we have deep learning

Whenever there's high dimensional data, or whenever the data is really large and it has a lot of features and a lot of predictor variables, we use deep learning

Deep learning will extract features on its own and understand which features are important in predicting your output

So that's the main idea behind deep learning

Let me give you a small example also

Suppose we want to make a system that can recognize the faces of different people in an image

Okay, so, basically, we're creating a system that can identify the faces of different people in an image

If we solve this by using the typical machine learning algorithms, we'll have to define facial features like eyes, nose, ears, et cetera

Okay, and then the system will identify which features are more important for which person

Now, if you consider deep learning for the same example, deep learning will automatically find out the features which are important for classification, because it uses the concept of neural networks, whereas in machine learning we have to manually define these features on our own

That's the main difference between deep learning and machine learning

Now the next question is how does deep learning work? Now when people started coming up with deep learning, their main aim was to re-engineer the human brain

Okay, deep learning studies the basic unit of a brain called the brain cell or a neuron

All of you biology students will know what I'm talking about

So, basically, deep learning is inspired by our brain structure

Okay, in our brains, we have something known as neurons, and these neurons are replicated in deep learning as artificial neurons, which are also called perceptrons

Now, before we understand how artificial neural networks or artificial neurons work, let's understand how these biological neurons work, because I'm not sure how many of you are bio students over here

So let's understand the functionality of biological neurons and how we can mimic this functionality in a perceptron or in an artificial neuron

So, guys, if you look at this image, this is basically an image of a biological neuron

If you focus on the structure of the biological neuron, it has something known as dendrites

These dendrites are basically used to receive inputs

Now the inputs are basically summed in the cell body, and the result is passed on to the next biological neuron

So, through dendrites, you're going to receive signals from other neurons, basically, input

Then the cell body will sum up all these inputs, and the axon will transmit this input to other neurons

The axon will fire once some threshold is crossed, and the signal will get passed on to the next neuron

So similar to this, a perceptron or an artificial neuron receives multiple inputs, applies various transformations and functions, and provides us an output

These multiple inputs are nothing but your input variables or your predictor variables

You're feeding input data to an artificial neuron or to a perceptron, and this perceptron will apply various functions and transformations, and it will give you an output

Now just like our brain consists of multiple connected neurons called neural networks, we also build something known as a network of artificial neurons called artificial neural networks

So that's the basic concept behind deep learning

To sum it up, what exactly is deep learning? Now deep learning is a collection of statistical machine learning techniques used to learn feature hierarchies based on the concept of artificial neural networks

So the main idea behind deep learning is artificial neural networks, which loosely mimic how our brain works

Now in this diagram, you can see that there are a couple of layers

The first layer is known as the input layer

This is where you'll receive all the inputs

The last layer is known as the output layer, which provides your desired output

Now, all the layers which are there between your input layer and your output layer are known as the hidden layers

Now, there can be any number of hidden layers, thanks to all the resources that we have these days

So you can have hundreds of hidden layers in between

Now, the number of hidden layers and the number of perceptrons in each of these layers will entirely depend on the problem or on the use case that you're trying to solve

So this is basically how deep learning works

So let's look at the example that we saw earlier

Here what we want to do is we want to perform image recognition using deep networks

First, what we're gonna do is we are going to pass this high dimensional data to the input layer

To match the dimensionality of the input data, the input layer will contain multiple sub layers of perceptrons so that it can consume the entire input

Okay, so you'll have multiple sub layers of perceptrons

Now, the output received from the input layer will contain patterns and will only be able to identify the edges of the images, based on the contrast levels

This output will then be fed to hidden layer number one, where it'll be able to identify facial features like your eyes, nose, ears, and all of that

Now from here, the output will be fed to hidden layer number two, where it will be able to form entire faces

It'll go deeper into face recognition, and this output of the hidden layer will be sent to the output layer or any other hidden layer that is there before the output layer

Now, finally, the output layer will perform classification, based on the result that you'd get from your previous layers

So, this is exactly how deep learning works

This is a small analogy that I use to make you understand what deep learning is

Now let's understand what a single layer perceptron is

So like I said, a perceptron is basically an artificial neuron

There are single layer and multiple layer perceptrons, and we'll first focus on the single layer perceptron

Now before I explain what a perceptron really is, you should know that perceptrons are linear classifiers

A single layer perceptron is a linear binary classifier

It is used mainly in supervised learning, and it helps to classify the given input data into separate classes

So this diagram basically represents a perceptron

A perceptron has multiple inputs

It has a set of inputs labeled X one, X two, until X n

Now each of these inputs is given a specific weight

Okay, so W one represents the weight of input X one

W two represents the weight of input X two, and so on

Now how you assign these weights is a different thing altogether

But for now, you need to know that each input is assigned a particular weightage

Now what a perceptron does is it computes some functions on these weighted inputs, and it will give you the output

So, basically, these weighted inputs go through something known as summation

Okay, summation is nothing but the sum of the products of each of your inputs with its respective weight

Now after the summation is done, this is passed on to the transfer function

A transfer function is nothing but an activation function

I'll be discussing more about the activation function in a minute

And from the activation function, you'll get the outputs Y one, Y two, and so on

So guys, you need to understand four important parts in a perceptron

So, firstly, you have the input values

You have X one, X two, X three

You have something known as weights and bias, and then you have something known as the net sum and finally the activation function

Now, all the inputs X are multiplied with the respective weights

So, X one will be multiplied with W one

This is known as the summation

After this, you'll add all the multiplied values, and we'll call the result the weighted sum

This is done using the summation function

Now we'll apply a suitable activation function to the weighted sum

Now, a lot of people have a confusion about activation function

Activation function is also known as the transfer function

Now, in order to understand the activation function, this word stems from the way neurons in a human brain work

The neuron becomes active only after a certain potential is reached

That threshold is known as the activation potential

Therefore, mathematically, it can be represented by a function that reaches saturation after a threshold

Okay, we have a lot of activation functions like signum, sigmoid, tanh, and so on

You can think of activation function as a function that maps the input to the respective output
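
As a rough illustration of the activation functions just mentioned, here is a short sketch of signum, sigmoid, and tanh using only Python's standard `math` module. The function names are chosen for this example; each one simply maps a weighted sum `z` to an output.

```python
import math


def signum(z: float) -> float:
    """Return -1, 0, or 1 depending on the sign of z."""
    if z > 0:
        return 1.0
    if z < 0:
        return -1.0
    return 0.0


def sigmoid(z: float) -> float:
    """Squash z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    """Squash z into the range (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(z, signum(z), round(sigmoid(z), 3), round(tanh(z), 3))
```

Sigmoid and tanh saturate for large positive or negative inputs, which matches the idea of a function that levels off once a threshold is passed.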

And now I also spoke about weights and bias

Now why do we assign weights to each of these inputs? What weights do is they show the strength of a particular input, or how important a particular input is for predicting the final output

So, basically, the weightage of an input denotes the importance of that input

Now, our bias basically allows us to shift the activation function in order to get a precise output

So that was all about perceptrons

Now in order to make you understand perceptrons better, let's look at a small analogy

Suppose that you wanna go to a party happening near your house

Now your decision will depend on a set of factors

First is how is the weather

Second, probably, is your wife, or your girlfriend, or your boyfriend going with you

And third, is there any public transport available? Let's say these are the three factors that you're going to consider before you go to a party

So, depending on these predictor variables or these features, you're going to decide whether you're going to stay at home or go and party

Now, how is the weather is going to be your first input

We'll represent this with a value X one

Is your wife going with you is another input, X two

Whether any public transport is available is another input, X three

Now, X one will have two values, one and zero

One represents that the weather is good

Zero represents that the weather is bad

Similarly, one represents that your wife is going, and zero represents that your wife is not going

And in X three, again, one represents that there is public transport, and zero represents that there is no public transport

Now your output will either be one or zero

One means you are going to the party, and zero means you will be sitting at home

Now in order to understand weightage, let's say that the most important factor for you is your weather

If the weather is good, it means that you will 100% go to the party

Now if your weather is not good, you've decided that you'll sit at home

So the maximum weightage is for your weather variable

So if your weather is really good, you will go to the party

It is a very important factor in order to understand whether you're going to sit at home or you're going to go to the party

So, basically, if X one equals one, your output will be one

Meaning that if your weather is good, you'll go to the party

Now let's randomly assign weights to each of our inputs

W one is the weight associated with input X one

W two is the weight associated with X two and W three is the weight associated with X three

Let's say that your W one is six, your W two is two, and W three is two

Now by using the activation function, you're going to set a threshold of five

Now this means that it will fire when the weather is good and won't fire if the weather is bad, irrespective of the other inputs

Now here, because your weightage is six, so, basically, if you consider your first input which has a weightage of six, that means you're 100% going to go

Let's say you're considering only the second input

This means that you're not going to go, because your weightage is two and your threshold is five

So if your weightage is below your threshold, it means that you're not going to go

Now let's consider another scenario where our threshold is three

This means that it'll fire when either X one is high or the other two inputs are high

Now W two is associated with whether your wife is going or not

Let's say the weather is bad and you have no public transportation, meaning that your X one and X three are zero, and only your X two is one

Now if your X two is one, your weightage is going to be two

If your weightage is two, you will not go because the threshold value is set to three

The threshold value is set in such a way that if X two and X three are combined together, only then you'll go, or only if X one is true, then you'll go

So you're assigning the threshold in such a way that you will go for sure if the weather is good

This is how you assign the threshold

This is nothing but your activation function

So guys, I hope all of you understood, the most amount of weightage is associated with the input that is very important in predicting your output

This is exactly how a perceptron works
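
The party example can be written out as a tiny perceptron. This is a minimal sketch under one assumption that the walkthrough does not spell out: the neuron fires when the weighted sum is greater than or equal to the threshold. The weights (6, 2, 2) and thresholds (5 and 3) come from the example; the function and variable names are made up here.

```python
def perceptron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Weights for (weather, partner going, public transport), as in the example.
WEIGHTS = (6, 2, 2)

# Threshold 5: only good weather is enough to go.
good_weather_only = perceptron((1, 0, 0), WEIGHTS, threshold=5)       # sum = 6
partner_and_transport = perceptron((0, 1, 1), WEIGHTS, threshold=5)   # sum = 4

# Threshold 3: good weather alone, or partner plus transport together.
partner_only_t3 = perceptron((0, 1, 0), WEIGHTS, threshold=3)           # sum = 2
partner_and_transport_t3 = perceptron((0, 1, 1), WEIGHTS, threshold=3)  # sum = 4

print(good_weather_only, partner_and_transport,
      partner_only_t3, partner_and_transport_t3)
```

Lowering the threshold from five to three is what lets the two smaller inputs, taken together, tip the decision.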

Now let's look at the limitations of a perceptron

Now in a perceptron, there are no hidden layers

There's only an input layer, and there is an output layer

We have no hidden layers in between

And because of this, you cannot classify non-linearly separable data points

Okay, if you have data like in this figure, how will you separate this?

You cannot use a perceptron to do this
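
One standard example of non-linearly separable data is the XOR truth table; the figure in the session may show a different dataset, so treat XOR here as an assumed stand-in. The sketch below searches a small grid of weights and biases for a single step perceptron that reproduces a truth table. It finds one for AND but none for XOR. A grid search is only a demonstration, not a proof, although for XOR it is known that no such line exists.

```python
from itertools import product


def step_perceptron(x1, x2, w1, w2, bias):
    """Single perceptron with a step activation."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def find_weights(truth_table, grid):
    """Return the first (w1, w2, bias) in the grid that reproduces the whole
    truth table, or None if no combination in the grid works."""
    for w1, w2, bias in product(grid, repeat=3):
        if all(step_perceptron(x1, x2, w1, w2, bias) == target
               for (x1, x2), target in truth_table.items()):
            return (w1, w2, bias)
    return None


AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Candidate values from -5.0 to 5.0 in steps of 0.5 (an arbitrary choice).
GRID = [i / 2 for i in range(-10, 11)]

and_solution = find_weights(AND, GRID)
xor_solution = find_weights(XOR, GRID)
print("AND:", and_solution)
print("XOR:", xor_solution)
```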

Alright, so complex problems that involve a lot of parameters cannot be solved by a single layer perceptron

That's why we need something known as multiple layer perceptron

So now we'll discuss something known as multilayer perceptron

A multilayer perceptron has the same structure as a single layer perceptron, but with one or more hidden layers

Okay, and that's why it's considered a deep neural network

So in a single layer perceptron, we had only an input layer and an output layer

We didn't have any hidden layer

Now when it comes to the multilayer perceptron, there are hidden layers in between, and then there is the output layer

It works in a similar manner: like I said, first, you'll have the inputs X one, X two, X three, and so on

And each of these inputs will be assigned some weight

W one, W two, W three, and so on

Then you'll calculate the weighted summation of each of these inputs and their weights

After that, you'll send them to the transformation or the activation function, and you'll finally get the output

Now, the only thing is that you'll have one or more hidden layers in between

So, guys, this is how a multilayer perceptron works

It works on the concept of feed forward neural networks

Feed forward means every node in each layer is connected to every node in the next layer, and the data flows only forward

So that's what feed forward networks are

Now when it comes to assigning weights, what we do is we randomly assign weights

Initially we have inputs X one, X two, X three

We randomly assign some weights W one, W two, W three, and so on

Now it's always necessary that whatever weights we assign to our inputs, those weights are actually correct, meaning that those weights are significant in predicting your output

So how a multilayer perceptron works is a set of inputs is passed to the first hidden layer

Now the activations from that layer are passed to the next layer

And from that layer, it's passed to the next hidden layer, until you reach the output layer

From the output layer, you'll form the two classes, class one and class two

Basically, you'll classify your input into one of the two classes

So that's how a multilayer perceptron works
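
To make the forward pass concrete, here is a minimal sketch of a multilayer perceptron with one hidden layer of two neurons and a step activation. The weights are hand-picked for this illustration (they are not learned, and they are not from the session) so that the network computes XOR, a classification that a single layer perceptron cannot do. The names `layer` and `mlp` are hypothetical.

```python
def step(z):
    """Step activation: fire when the weighted sum is positive."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of all
    inputs, adds its bias, and applies the activation function."""
    return [step(sum(x * w for x, w in zip(inputs, neuron_weights)) + bias)
            for neuron_weights, bias in zip(weights, biases)]


# Hand-picked weights: hidden neuron 1 acts like OR, hidden neuron 2 like AND.
HIDDEN_WEIGHTS = [[1, 1], [1, 1]]
HIDDEN_BIASES = [-0.5, -1.5]
# Output neuron fires when OR is on and AND is off, which is XOR.
OUTPUT_WEIGHTS = [[1, -1]]
OUTPUT_BIASES = [-0.5]


def mlp(x1, x2):
    """Feed the inputs forward: input -> hidden layer -> output layer."""
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


outputs = {(x1, x2): mlp(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(outputs)
```

In practice the weights would start out random and be adjusted by training rather than set by hand.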

A very important concept in the multiple layer perceptron is back propagation

Now what is back propagation?

The back propagation algorithm is a supervised learning method for multilayer perceptrons

Okay, now why do we need back propagation? So guys, when we are designing a neural network, in the beginning, we initialize weights with some random values, or any variable for that matter

Now, obviously, we need to make sure that these weights actually are correct, meaning that these weights show the significance of each predictor variable

These weights have to fit our model in such a way that our output is very precise

So let's say that we randomly selected some weights in the beginning, but our model output is very different from our actual output, meaning that our error value is very huge

So how will you reduce this error?

Basically, what you need to do is we need to somehow explain to the model that we need to change the weight in such a way that the error becomes minimum

So the main thing is the weight and your error are very closely related

The weightage that you give to each input will show how much error there is in your output, because the most significant variables will have the highest weightage

And if the weightage is not correct, then your output is also not correct

Now, back propagation is a way to update your weights in such a way that your outcome is precise and your error is reduced

So, in short, back propagation is used to train a multilayer perceptron

It's basically used to update your weights in such a way that your output is more precise, and that your error is reduced

So training a neural network is all about back propagation

So the most common deep learning algorithm for supervised training of the multilayer perceptron is known as back propagation

So, after calculating the weighted sum of inputs and passing them through the activation function, we propagate backwards and update the weights to reduce the error

It's as simple as that

So in the beginning, you're going to assign some weights to each of your inputs

Now these inputs will go through the activation function and through all the hidden layers and give us an output

Now when you get the output, the output is not very precise, or it is not the desired output

So what you'll do is you'll propagate backwards, and you start updating your weights in such a way that your error is as small as possible

So, I'm going to repeat this once more

So the idea behind back propagation is to choose weights in such a way that your error gets minimized

To understand this, we'll look at a small example

Let's say that we have a dataset which has these labels

Okay, your input is zero, one, two, and your desired output is zero, two, and four

Now the output of your model when W equals three is like this

Notice the difference between your model output and your desired output

So, your model output is three, but your desired output is two

Similarly, when your model output is six, your desired output is supposed to be four

Now let's calculate the error when the weight is equal to three

The error is zero over here because your desired output is zero, and your model output is also zero

Now the error in the second case is one

Basically, your model output minus your desired output

Three minus two, your error is one

Similarly, your error for the third input is two, which is six minus four

When you take the square, your error becomes even larger, so this is actually a very big difference

Now what we need to do is we need to update the weight value in such a way that our error decreases

Now here we've considered the weight as four

So when you consider the weight as four, your model output becomes zero, four, and eight

Your desired output is zero, two, and four

So your model output becomes zero, four, and eight, which is a lot

So guys, I hope you all know how to calculate the output over here

What I'm doing is I'm multiplying the input with your weightage

The weightage is four, so zero into four will give me zero

One into four will give me four, and two into four will give me eight

That's how I'm getting my model output over here

For now, this is how I'm getting the output over here

That's how you calculate your weightage

Now, here, you can see that our desired output is supposed to be zero, two, and four, but we're getting an output of zero, four, and eight

So our error is actually increasing as we increase our weight

Our error for W equal to four has become zero, four, and 16, whereas the error for W equal to three was zero, one, and four

I mean the squared error

So if you look at this, as we increase our weightage, our error is increasing

So, obviously, we know that there is no point in increasing the value of W further

But if we decrease the value of W, our error actually decreases

Alright, if we give a weightage of two, our error decreases

So we can find a relationship between our weight and error: basically, if you increase the weight, your error also increases

If you decrease the weight, your error also decreases
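
The numbers in this example are easy to check with a few lines of code. The sketch below assumes the model is simply output = W times input, with inputs 0, 1, 2 and desired outputs 0, 2, 4 as used in the error calculations above; the function names are made up for this illustration.

```python
INPUTS = [0, 1, 2]
DESIRED = [0, 2, 4]


def model_outputs(w):
    """Model output for every input: weight times input."""
    return [w * x for x in INPUTS]


def squared_errors(w):
    """Squared difference between model output and desired output."""
    return [(out - target) ** 2
            for out, target in zip(model_outputs(w), DESIRED)]


errors_w3 = squared_errors(3)
errors_w4 = squared_errors(4)
errors_w2 = squared_errors(2)

for w in (4, 3, 2):
    print(f"W={w}: outputs={model_outputs(w)}, squared errors={squared_errors(w)}")
```

Moving W from four to three to two drives the squared error down to zero, which is the direction back propagation is trying to discover.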

Now what we did here is we first initialized W to some random value, and then we propagated forward

Then we noticed that there is some error

And to reduce that error, we propagated backwards and increased the value of W

After that, we noticed that the error has increased, and we came to know that we can't increase the W value

Obviously, if your error is increasing with increasing your weight, you will not increase the weight

So again, we propagated backwards, and we decreased the W value

So, after that, we noticed that the error has reduced

So what we're trying to do is get the value of the weight in such a way that the error becomes as small as possible, so we need to figure out whether we need to increase or decrease the weight value

Once we know that, we keep on updating the weight value in that direction, until the error becomes minimum

Now you might reach a point where if you further update the weight, the error will again increase

At that point, you need to stop

Okay, that point is where your final weight value lies

So, basically, this graph denotes that point

Now this point is nothing but the global loss minimum

If you update the weights further, your error will also increase

Now you need to find out where your global loss minimum is, and that is where your optimum weight lies

So let me summarize the steps for you

First, you'll calculate the error

This is how far your model output is from your actual output

Then you'll check whether the error is minimized or not

After that, if the error is very huge, then you'll update the weight, and you'll check the error again

You'll repeat the process until the error becomes minimum

Now once you reach the global loss minimum, you'll stop updating the weights, and you'll finalize your weight value

This is exactly how back propagation works

Now in order to tell you mathematically what we're doing, we're using a method known as gradient descent

Okay, this method is used to adjust all the weights in the network with an aim of reducing the error at the output layer

So how the gradient descent optimizer works is the first step is you will calculate the error by considering the below equation

Here you're subtracting the summation of your actual output from your network output

Step two is based on the error you get, you will calculate the rate of change of error with respect to the change in the weight

The learning rate is something that you set in the beginning itself

Step three is based on this change in weight, you will calculate the new weight

Alright, your updated weight will be your weight plus the change in weight
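The three steps above can be sketched in a few lines of plain Python

This is only a rough sketch under some assumptions: a single weight, the model output w times x, a squared error, and toy data chosen so that W equal to three and W equal to four give the squared errors quoted earlier (zero, one, four and zero, four, 16)

The change in weight is taken as minus the learning rate times the gradient, which is the usual convention and not necessarily the exact formula on the slide

```python
# Minimal sketch: gradient descent on one weight for the model y_hat = w * x.
# The data, learning rate and step count are illustrative assumptions.
inputs = [0.0, 1.0, 2.0]
targets = [0.0, 2.0, 4.0]  # the ideal weight for this toy data is 2


def squared_error(w):
    """Step 1: total squared error between network output and actual output."""
    return sum((w * x - y) ** 2 for x, y in zip(inputs, targets))


def gradient(w):
    """Step 2: rate of change of the error with respect to the weight."""
    return sum(2 * (w * x - y) * x for x, y in zip(inputs, targets))


def gradient_descent(w, learning_rate=0.05, steps=100):
    """Step 3, repeated: new weight = old weight + change in weight."""
    for _ in range(steps):
        change_in_weight = -learning_rate * gradient(w)
        w = w + change_in_weight
    return w


final_w = gradient_descent(w=4.0)
print(final_w, squared_error(final_w))
```

Starting from W equal to four, the weight moves down towards two, which is where the error stops decreasing for this toy data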

So guys, that was all about back propagation and weight update

Now let's look at the limitations of feed forward networks

So far, we were discussing the multiple layer perceptron, which uses the feed forward network

Let's discuss the limitations of these feed forward networks

Now let's consider an example of image classification

Okay, let's say you've trained the neural network to classify images of various animals

Now let's consider an example

Here the first output is an elephant

We have an elephant

And this output will have nothing to do with the previous output, which is a dog

This means that the output at time T is independent of the output at time T minus one

Now consider this scenario where you will require the use of previously obtained output

Okay, the concept is very similar to reading a book

As you turn every page, you need an understanding of the previous pages, because if you want to make sense of the information, then you need to know what you learned before

That's exactly what you're doing right now

In order to understand deep learning, you have to understand machine learning

So, basically, with the feed forward network the new output at time T plus one has nothing to do with the output at time T, or T minus one, or T minus two

So feed forward networks cannot be used while predicting a word in a sentence, as it will have absolutely no relationship with the previous set of words

So, a feed forward network cannot be used in use cases wherein you have to predict the outcome based on your previous outcome

So, in a lot of use cases, your previous output will also determine your next output

So, for such cases, you cannot make use of a feed forward network

Now, what modification can you make so that your network can learn from your previous mistakes?

For this, we have a solution

So, a solution to this is recurrent neural networks

So, basically, let's say you have an input at time T minus one, and you'll get some output when you feed it to the network

Now, some information from this input at T minus one is fed to the next input, which is input at time T

Some information from this output is fed into the next input, which is input at T plus one

So, basically, you keep feeding information from the previous input to the next input

That's how recurrent neural networks really work
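To make this idea of carrying information forward a bit more concrete, here is a tiny sketch of a recurrent step in plain Python

It assumes a single number as the state and made-up weights w_x and w_h, so it is an illustration of the idea and not a trained network

```python
import math


def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new state mixes the current input
    with the state carried over from the previous time step."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, h0=0.0):
    """Feed a sequence through the same step, passing the state along."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h)
        states.append(h)
    return states


# Both sequences end with the same input (0.0), but their histories differ.
with_history = run_rnn([1.0, 0.0])
without_history = run_rnn([0.0, 0.0])
print(with_history[-1], without_history[-1])
```

The last input is the same in both runs, yet the last outputs differ, because the first run still carries information from time T minus one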

So recurrent networks are a type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, spoken words, and time series data coming from sensors, stock markets, and government agencies

So, guys, recurrent neural networks are actually a very important part of deep learning, because recurrent neural networks have applications in a lot of domains

Okay, in time series and in stock markets, the main networks that I use are recurrent neural networks, because each of your inputs is correlated

Now to better understand recurrent neural networks, let's consider a small example

Let's say that you go to the gym regularly, and the trainer has given you a schedule for your workout

So basically, the exercises are repeated after every third day

Okay, this is what your schedule looks like

So, make a note that all these exercises are repeated in a proper order or in a sequence every week

First, let us use a feed forward network to try and predict the type of exercises that we're going to do

The inputs here are day of the week, the month, and your health status

Okay, so, a neural network has to be trained using these inputs to provide us with the prediction of the exercise that we should do

Now let's try and understand the same thing using recurrent neural networks

In recurrent neural networks, what we'll do is we'll consider the inputs of the previous day

Okay, so if you did a shoulder workout yesterday, then you can do a bicep exercise today, and this goes on for the rest of the week

However, if you happen to miss a day at the gym, the data from the previously attended time stamps can be considered

It can be done like this

So, if a model is trained based on the data it can obtain from the previous exercise, the output of the model will be extremely accurate

In such cases, where you need to know the output at T minus one in order to predict the output at T, recurrent neural networks are essential

So, basically, I'm feeding some inputs through the neural networks

You'll go through a few functions, and you'll get the output

So, basically, you're predicting the output based on past information or based on your past input

So that's how recurrent neural networks work

Now let's look at another type of neural network known as convolutional neural network

To understand why we need convolutional neural networks, let's look at an analogy

How do you think a computer reads an image? Consider this image

This is a New York skyline image

At first glance, you'll see a lot of buildings and a lot of colors

How does a computer process this image? The image is actually broken down into three color channels, which are red, green, and blue

It reads in the form of RGB values

Now each of these color channels is mapped to the image's pixels

Then the computer will recognize the value associated with each pixel, and determine the size of the image

Now for the black and white images, there is only one channel, but the concept is still the same

The thing is we cannot make use of fully connected networks when it comes to processing images

I'll tell you why

Now consider the first input image

Okay, the first image has a size of about 28 by 28 by three pixels

And if we input this to a neural network, each neuron in the first hidden layer itself will need about 2,352 weights

Now consider another example

Okay, let's say we have an image of 200 by 200 by three pixels

So the number of weights for each neuron in your first hidden layer becomes around 120,000
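The arithmetic behind these two numbers is easy to check

In a fully connected layer, every neuron needs one weight per input value, and an image has height times width times channels input values

The hidden layer size of 1,000 neurons used below is just an assumed figure to show how quickly the total grows

```python
def weights_per_neuron(height, width, channels):
    """A fully connected neuron needs one weight for every input value."""
    return height * width * channels


small_image = weights_per_neuron(28, 28, 3)
large_image = weights_per_neuron(200, 200, 3)

# Hypothetical hidden layer size, chosen only for illustration.
assumed_hidden_neurons = 1000
total_weights_large = large_image * assumed_hidden_neurons

print(small_image, large_image, total_weights_large)
```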

Now if this is just the first hidden layer, imagine the number of neurons that you need to process an entire complex image set

This leads to something known as overfitting, because all of the hidden layers are connected

They're massively connected

There's a connection between each and every node

Because of this, we face overfitting

We have way too much data

We have to use way too many neurons, which is not practical

So that's why we have something known as convolutional neural networks

Now convolutional neural networks, like any other neural network, are made up of neurons with learnable weights and biases

So each neuron receives several inputs

It takes a weighted sum over them, and it gets passed on through some activation function, and finally responds with an output

So, the concept in convolutional neural networks is that the neuron in a particular layer will only be connected to a small region of the layer before it

Not all the neurons will be connected in a fully-connected manner, which leads to overfitting because we need way too many neurons to solve this problem

Only the regions which are significant are connected to each other

There is no full connection in convolutional neural networks
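Here is a small sketch of what being connected to only a small region looks like in code

It slides a two by two kernel over a tiny four by four image, so every output value depends on just four neighbouring pixels and the same four weights are reused everywhere

The image, the kernel values, and the helper name convolve2d are made up for illustration, and the sketch assumes a square kernel with no padding

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1).
    Each output value only looks at a kernel-sized patch of the input."""
    k = len(kernel)
    out_height = len(image) - k + 1
    out_width = len(image[0]) - k + 1
    output = []
    for i in range(out_height):
        row = []
        for j in range(out_width):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Toy kernel that responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)
```

The output is non-zero only in the middle column, where the dark to bright edge sits, and the whole layer needed just four weights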

So guys, what we did so far is we discussed what a perceptron is

We discussed the different types of neural networks that are there

We discussed a feedforward neural network

We discussed multi layer perceptrons, recurrent neural networks, and convolutional neural networks

I'm not going to go too much in depth with these concepts

Now I'll be executing a demo

If you haven't understood any theoretical concept of deep learning, please let me know in the comment section

Apart from this, I'll also leave a couple of links in the description box, so that you understand the whole of deep learning in a better way

Okay, if you want a more in-depth explanation, I'll leave a couple of links in the description box

For now, what I'm gonna do is I'll be running a practical demonstration to show you what exactly deep learning does

So, basically, what we're going to do in this demo is we're going to predict stock prices

Like I said, stock price prediction is one of the very good applications of deep neural networks

You can easily predict the stock price of a particular stock for the next minute or the next day by using deep neural networks

So that's exactly what we're gonna do in this demo

Now, before I discuss the code, let me tell you a few things about our data set

The data set contains around 42,000 minutes of data ranging from April to August 2017 on 500 stocks, as well as the total S&P 500 Index price

So the index and stocks are arranged in a wide format

So, this is my data set, data_stocks

It's in the CSV format

So what I'm gonna do is I'm going to use the read_csv function in order to import this data set

This is just the path where my data set is stored

This data set was actually cleaned and prepared, meaning that we don't have any missing stock and index prices

So the file does not contain any missing values

Now what we're gonna do first is we'll drop the date variable

We have a variable known as date, which is not really necessary in predicting our outcome over here

So that's exactly what I'm doing here

I'm just dropping the date variable

So here, I'm checking the dimensions of the data set

This is pretty understandable, using the shape function to do that

Now, you always make the data a NumPy array

This makes computation much easier

The next process is the data splicing

I've already discussed data splicing with you all

Here we're just preparing the training and the testing data

So the training data will contain 80% of the total data set

Okay, and also we are not shuffling the data set

We're just slicing the data set sequentially

That's why we have a test start and a test end variable

In sequence, I'll be selecting the data

There's no need for shuffling this data set

These are stock prices, so it does not make sense to shuffle this data
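As a rough sketch of this sequential 80/20 split, here is a version in plain Python

The function name sequential_split and the ten toy values are assumptions for illustration, and the demo itself works on a NumPy array rather than a list

```python
def sequential_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling:
    the first part is for training, the remainder for testing."""
    train_end = int(len(rows) * train_fraction)
    return rows[:train_end], rows[train_end:]


# Toy stand-in for time-ordered observations.
observations = list(range(10))
train_rows, test_rows = sequential_split(observations)
print(train_rows, test_rows)
```

Every test observation comes after every training observation in time, which is the whole point of not shuffling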

Now in the next step, what we're going to do is we're going to scale the data

Now, scaling data and data normalization is one of the most important steps

You cannot miss this step

I already mentioned earlier what normalization and scaling is

Now most neural networks benefit from scaling inputs

This is because the most common activation functions of the network's neurons are tanh and sigmoid

Tanh and sigmoid are basically activation functions, and these are defined in the range of minus one to one or zero to one, respectively
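You can check these output ranges with a quick sketch

The sample inputs here are arbitrary, and the sigmoid is written out by hand using its standard formula

```python
import math


def sigmoid(z):
    """Standard logistic sigmoid, with outputs between zero and one."""
    return 1.0 / (1.0 + math.exp(-z))


sample_inputs = [-5.0, 0.0, 5.0]
tanh_outputs = [math.tanh(z) for z in sample_inputs]
sigmoid_outputs = [sigmoid(z) for z in sample_inputs]

print(tanh_outputs)
print(sigmoid_outputs)
```

Even for inputs as far out as minus five and five, the outputs stay inside minus one to one for tanh and zero to one for sigmoid, which is why inputs on a similar scale are easier for the network to work with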

So that's why scaling is an important thing in deep neural networks

For scaling, again, we'll use the MinMaxScaler

So we're just importing that function over here

And also one point to note is that you have to be very cautious about what part of data you're scaling and when you're doing it

A very common mistake is to scale the whole data set before training and test splits are being applied

So before data splicing itself, you shouldn't be scaling your data

Now this is a mistake because scaling invokes the calculation of statistics

For example, the minimum or maximum range of the variable gets affected

So when performing time series forecasting in real life, you do not have information from future observations at the time of forecasting

That's why calculation of scaling statistics has to be conducted on training data, and only then it has to be applied to the test data

Otherwise, you're basically using the future information at the time of forecasting, which is obviously going to lead to bias

So that's why you need to make sure you do scaling very carefully
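Here is a hand-written sketch of that rule: compute the minimum and maximum on the training data only, then reuse them for the test data

It is a plain Python stand-in and not the MinMaxScaler itself, the helper names and toy prices are assumptions, and it scales to the zero to one range and assumes the training values are not all identical

```python
def fit_min_max(train_values):
    """Learn the scaling statistics from the training data only."""
    return min(train_values), max(train_values)


def apply_min_max(values, low, high):
    """Scale values with statistics that were learned elsewhere."""
    span = high - low
    return [(v - low) / span for v in values]


# Toy prices: the test period happens to reach higher than the training period.
train_prices = [10.0, 12.0, 14.0]
test_prices = [15.0, 16.0]

low, high = fit_min_max(train_prices)          # no test data involved here
train_scaled = apply_min_max(train_prices, low, high)
test_scaled = apply_min_max(test_prices, low, high)

print(train_scaled, test_scaled)
```

The test values land above one, and that is fine: it simply reflects that at forecasting time you did not yet know prices would go that high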

So, basically, what we're doing is the number of features in the training data is stored in a variable known as n_stocks

After this, we'll import the famous TensorFlow

So guys, TensorFlow is actually a very good piece of software and it is currently the leading deep learning and neural network computation framework

It is based on a C++ low-level backend, but it's usually controlled through Python

So TensorFlow actually operates as a graphical representation of your computations

And this is important because neural networks are actually graphs of data and mathematical operations

So that's why TensorFlow is just perfect for neural networks and deep learning

So the next thing after importing the TensorFlow library is something known as placeholders

Placeholders are used to store input and target data

We need two placeholders in order to fit our model

So basically, X will contain the network's input, which is the stock prices of all the stocks at time T equal to T

And y will contain the network's output, which is the stock price at time T is equal to T plus one

Now the shape of the X placeholder means that the inputs are a two-dimensional matrix

And the outputs are a one-dimensional vector

So guys, basically, the None argument indicates that at this point we do not yet know the number of observations that'll flow through the neural network

We just keep it as a flexible array for now

We'll later define the variable batch size that controls the number of observations in each training batch

Now, apart from this, we also have something known as initializers

Now, before I tell you what these initializers are, you need to understand that there's something known as variables that are used as flexible containers that are allowed to change during the execution

Weights and bias are represented as variables in order to adapt during training

I already discussed weights and bias with you earlier

Now weights and bias is something that you need to initialize before you train the model

That's how we discussed it even while I was explaining neural networks to you

So here, basically, we make use of something known as the variance scaling initializer, and for the bias initializer, we make use of the zeros initializer

These are some predefined functions in our TensorFlow model

We'll not get into the depth of those things

Now let's look at our model architecture parameters

So the next thing we have to discuss is the model architecture parameters

Now the model that we build, it consists of four hidden layers

For the first layer, we've assigned 1,024 neurons, which is slightly more than double the size of the inputs

The subsequent hidden layers are always half the size of the previous layer, which means that in the hidden layer number two, we'll have 512 neurons

Hidden layer three will have 256

And similarly, hidden layer number four will have 128 neurons

Now why do we keep reducing the number of neurons as we go through each hidden layer?

We do this because the number of neurons for each subsequent layer compresses the information that the network identifies in the previous layer

Of course there are other possible network architectures that you can apply for this problem statement, but I'm trying to keep it as simple as possible, because I'm introducing deep learning to you all

So I can't build a model architecture that's very complex and hard to explain

And of course, we have the output layer over here, which will be assigned a single neuron

Now it is very important to understand the variable dimensions between your input, hidden, and output layers

So, as a rule of thumb in multilayer perceptrons, the second dimension of the previous layer is the first dimension in the current layer

So the second dimension in my first hidden layer is going to be the first dimension in my second hidden layer

Now the reason behind this is pretty logical

It's because the output from the first hidden layer is passed on as an input to the second hidden layer

That's why the second dimension of the previous layer is the same as the first dimension of the next layer or the current layer

I hope this is understandable

Now coming to the bias dimension over here, the bias dimension is always equal to the second dimension of your current layer, meaning that you're just going to pass the number of neurons in that particular hidden layer as your dimension in your bias

So here, the number of neurons, 1,024, you're passing the same number as a parameter to your bias

Similarly, even for hidden layer number two, if you see, the second dimension here is n_neurons_2

I'm passing the same parameter over here as well

Similarly, for hidden layer three and hidden layer number four

Alright, I hope this is understandable

Now we come to the output layer

The output layer will obviously have the output from hidden layer number four

This is our output from hidden layer four that's passed as the first dimension in our output layer, and it'll finally have your n_target, which is set to one over here

This is our output

Your bias will basically have the current layer's dimension, which is n_target

You're passing that same parameter over here
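The dimension rule for weights and biases can be written down as a short sketch

It only computes the shapes and does not build a TensorFlow model, the function name layer_shapes is made up, and the input size of 500 is an assumption based on the 500 stocks in the data set

```python
def layer_shapes(n_inputs, hidden_sizes, n_target):
    """Return the weight and bias shapes for each layer of a multilayer perceptron.
    The second weight dimension of one layer is the first of the next,
    and each bias has the size of its own layer."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_target]
    shapes = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        shapes.append({"weights": (fan_in, fan_out), "bias": (fan_out,)})
    return shapes


# Assumed input size of 500; hidden sizes and target size are the ones discussed.
shapes = layer_shapes(n_inputs=500, hidden_sizes=[1024, 512, 256, 128], n_target=1)
for index, shape in enumerate(shapes, start=1):
    print(index, shape["weights"], shape["bias"])
```

Reading down the printed list, you can see each layer's second weight dimension reappear as the next layer's first dimension, ending with 128 by one for the output layer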

Now after you define the required weight and the bias variables, the architecture of the network has to be specified

What you do is placeholders and variables need to be combined into a system of sequential matrix multiplication

So that's exactly what's happening over here

Apart from this, all the hidden layers need to be transformed by using the activation function

So, activation functions are important components of the network because they introduce non-linearity to the system

This means that high dimensional data can be dealt with with the help of the activation functions

Obviously, we have very high dimensional data when it comes to neural networks

We don't have a single dimension or we don't have two or three inputs

We have thousands and thousands of inputs

So, in order for a neural network to process that much high dimensional data, we need something known as activation functions

That's why we make use of activation functions

Now, there are dozens of activation functions, and one of the most common ones is the rectified linear unit

RELU is nothing but rectified linear unit, which is what we're gonna be using in this model
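The rectified linear unit itself is a one-line function: it keeps positive values and replaces negative values with zero

This sketch applies it to a few arbitrary sample values in plain Python

```python
def relu(z):
    """Rectified linear unit: pass positive values through, clip negatives to zero."""
    return max(0.0, z)


sample_values = [-2.0, 0.0, 3.5]
activated = [relu(z) for z in sample_values]
print(activated)
```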

So, after you've applied the transformation function to your hidden layer, you need to make sure that your output is transposed

This is followed by a very important function known as cost function

So the cost function of a network is used to generate a measure of deviation between the network's prediction and the actual observed training targets

So this is basically your actual output minus your model output

It basically calculates the error between your actual output and your predicted output

So, for regression problems, the mean squared error function is commonly used

I have discussed MSE, mean squared error, before

So, basically, we are just measuring the deviation over here

MSE is nothing but your deviation from your actual output

That's exactly what we're doing here
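As a small sketch of the mean squared error, here it is computed by hand in plain Python

The actual and predicted values are made-up numbers, and the demo itself relies on TensorFlow for this step

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_differences = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_differences) / len(actual)


actual_values = [1.0, 2.0, 3.0]
predicted_values = [1.0, 2.5, 2.0]
mse = mean_squared_error(actual_values, predicted_values)
print(mse)
```

The squared differences here are zero, 0.25, and one, so the result is 1.25 divided by three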

So after you've computed your error, the next step is obviously to update your weight and your bias

So, we have something known as the optimizers

They basically take care of all the necessary computations that are needed to adapt the network's weight and bias variables during the training phase

That's exactly what's happening over here

Now the main function of this optimizer is that it invokes something known as a gradient

Now if you all remember, we discussed gradient before

It basically indicates the direction in which the weights and the bias have to be changed during the training in order to minimize the network's cost function or the network's error

So you need to figure out whether you need to increase the weight and the bias in order to decrease the error, or is it the other way around? You need to understand the relationship between your error and your weight variable

That's exactly what the optimizer does

It invokes the gradient

It will give you the direction in which the weights and the bias have to be changed

So now that you know what an optimizer does, in our model, we'll be using something known as the AdamOptimizer

This is one of the current default optimizers in deep learning

Adam basically stands for adaptive moment estimation, and it can be considered as a combination between two very popular optimizers called Adagrad and RMSprop

Now let's not get into the depth of the optimizers

The main agenda here is for you to understand the logic behind deep learning

We don't have to go into the functions

I know these are predefined functions which TensorFlow takes care of

Next we have something known as initializers

Now, initializers are used to initialize the network's variables before training

We already discussed this before

I'll define the initializer here again

I've already done it earlier in this session

Initializers are already defined

So I just removed that line of code

Next step would be fitting the neural network

So after we've defined the placeholders, the variables, which are basically weights and bias, the initializers, the cost functions, and the optimizers of the network, the model has to be trained

Now, this is usually done by using the mini batch training method, because we have a very huge data set

So it's always best to use the mini batch training method

Now what happens during mini batch training is random data samples of n equal to batch size are drawn from the training data, and they are fed into the network

So the training data set gets divided into N divided by your batch size batches that are sequentially fed into the network

So, one after the other, each of these batches will be fed into the network

At this point, the placeholders, which are your X and Y, come into play

They store the input and the target data and present them to the network as inputs and targets

That's the main functionality of placeholders

What they do is they store the input and the target data, and they provide this to the network as inputs and targets

That's exactly what your placeholders do

So let's say that we have a sample data batch of X

Now this data batch flows through the network until it reaches the output layer

There TensorFlow compares the model's predictions against the actual observed targets, which are stored in Y

If you all remember, we stored our actual observed targets in Y

After this, TensorFlow will conduct something known as an optimization step, and it'll update the network's parameters like the weight of the network and the bias

So after having updated your weight and the bias, the next batch is sampled and the process gets repeated

So this procedure will continue until all the batches have been presented to the network

And one full sweep over all batches is known as an epoch
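To make batches and epochs a little more concrete, here is a generic sketch of the bookkeeping in plain Python

It only hands out row indices and does no actual training, the sample count of 1,000 and the helper name iterate_minibatches are assumptions, and unlike some implementations it keeps the smaller leftover batch at the end

```python
import random


def iterate_minibatches(n_samples, batch_size, rng):
    """Shuffle the row indices once, then yield them batch by batch."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_samples, batch_size, n_epochs = 1000, 256, 2

batches_per_epoch = []
for epoch in range(n_epochs):
    batches = list(iterate_minibatches(n_samples, batch_size, rng))
    # In real training, each batch would be fed to the network here,
    # followed by one optimization step per batch.
    batches_per_epoch.append(len(batches))

print(batches_per_epoch)
```

With 1,000 samples and a batch size of 256, one epoch is four batches: three full ones and one smaller batch with the remaining 232 rows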

So I've defined this entire thing over here

So we're gonna go through 10 epochs, meaning that all the batches are going to go through training, meaning you're going to input each batch, that is X, and it'll flow through the network until it reaches the output layer

There what happens is TensorFlow will compare your predictions

That is basically what your model predicted against the actual observed targets which are stored in Y

After this, TensorFlow will perform optimization wherein it'll update the network parameters like your weight and your bias

After you update the weight and the bias, the next batch will get sampled and the process will keep repeating

This happens until all the batches have been presented to the network

So what I just told you was one epoch

We're going to repeat this 10 times

So our batch size is 256, meaning that each batch contains 256 observations

So here we're going to assign x and y, what I just spoke to you about

The mini batch training starts over here

So, basically, your first batch will start flowing through the network until it reaches the output layer

After this, TensorFlow will compare your model's prediction

This is where predictions happen

It'll compare your model's prediction to the actual observed targets which are stored in y

Then TensorFlow will start doing optimization, and it'll update the network parameters like your weight and your bias

So after you update the weight and the biases, the next batch will get input into the network, and this process will keep repeating

This process will repeat 10 times because we've defined 10 epochs

Now, also during the training, we evaluate the network's prediction on the test set, which is basically the data that was set aside and that the network hasn't learned from, at every fifth batch, and this is visualized

So in our problem statement, what the network is going to do is it's going to predict the stock price continuously at time T plus one

We're feeding it data about a stock price at time T

It's going to give us an output at time T plus one

Now let me run this code and let's see how close our predicted values are to the actual values

We're going to visualize this entire thing, and we've also exported this in order to combine it into a video animation

I'll show you what the video looks like

So now let's look at our visualization

We'll look at our output

So the orange basically shows our model's prediction

So the model quickly learns the shape and the location of the time series in the test data and shows us an accurate prediction

It's pretty close to the actual values

Now as I'm explaining this to you, each batch is running here

We are at epoch two

We have 10 epochs to go over here

So you can see that the network is actually adapting to the basic shape of the time series, and it's learning finer patterns in the data

You see it keeps learning patterns and the prediction is getting closer and closer after every epoch

So let's just wait till we reach epoch 10 and we complete the entire process

So guys, I think the predictions are pretty close, like the pattern and the shape is learned very well by our neural network

It is actually mimicking this network

The only deviation is in the values

Apart from that, it's learning the shape of the time series data in almost the same way

The shape is exactly the same

It looks very similar to me

Now, also remember that there are a lot of ways of improving your result

You can change the design of your layers or you can change the number of neurons

You can choose different initialization functions and activation functions

You can introduce something known as dropout layers which basically help you to get rid of overfitting, and there's also something known as early stopping

Early stopping helps you understand where you must stop your batch training

That's also another method that you can implement for improving your model
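Early stopping was not part of this demo, so here is only a generic sketch of one common way to do it

It assumes you record a validation loss after every epoch and stop once the loss has not improved for a set number of epochs, called patience here, and the loss values and function name are made up

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Walk through per-epoch validation losses and stop once the loss
    has failed to improve for `patience` epochs in a row.
    Returns the best epoch (zero-based) and its loss."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training here
    return best_epoch, best_loss


# Made-up losses: improvement stalls after the third epoch.
losses = [0.9, 0.5, 0.3, 0.35, 0.4, 0.2]
best_epoch, best_loss = early_stopping_epoch(losses)
print(best_epoch, best_loss)
```

Notice that training stops before the last value of 0.2 is ever seen, which is the trade-off with a small patience: you save time, but you can miss a later improvement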

Now there are also different types of deep learning models that you can use for this problem

Here we use the feedforward network, which basically means that the batches will flow from left to right

Okay, so our 10 epochs are over

Now the final thing that's getting calculated is our error, MSE or mean squared error

So guys, don't worry about this warning

It's just a warning

So our mean squared error comes down to 0.0029, which is pretty low because the target is scaled

And this means that our accuracy is pretty good

So guys, like I mentioned, if you want to improve the accuracy of the model, you can use different schemes, you can use different initialization functions, or you can try out different transformation functions

You can use something known as the dropout technique and early stopping in order to make the training phase even better

So guys, that was the end of our deep learning demo

I hope all of you understood the deep learning demo

For those of you who are just learning deep learning for the first time, it might be a little confusing

So if you have any doubts regarding the demo, let me know in the comment section

I'll also leave a couple of links in the description box, so that you can understand deep learning in a little more depth

Now let's look at our final topic for today, which is natural language processing

Now before we understand what text mining is and what natural language processing is, we have to understand the need for text mining and natural language processing

So guys, the number one reason why we need text mining and natural language processing is because of the amount of data that we're generating during this time

Like I mentioned earlier, there are around 2.5 quintillion bytes of data that is created every day, and this number is only going to grow

With the evolution of communication through social media, we generate tons and tons of data

The numbers are on your screen

These numbers are literally for every minute

On Instagram, every minute, 1.7 million pictures are posted

Okay, 1.7 or more than 1.7 million pictures are posted

Similarly, we have tweets

We have around 347,000 tweets every minute on Twitter

This is actually a lot of data

So, every time we're using a phone, we're generating way too much data

Just watching a video on YouTube is generating a lot of data

When sending text messages from WhatsApp, that is also generating tons and tons of data

Now the only problem is not our data generation

The problem is that out of all the data that we're generating, only 21% of the data is structured and well-formatted

The rest of the data is unstructured, and the major sources of unstructured data include text messages from WhatsApp, Facebook likes, comments on Instagram, and bulk emails that we send out every single day

All of this accounts for the unstructured data that we have today

Now the question here is what can be done with so much data

Now the data that we generate can be used to grow businesses

By analyzing and mining the data, we can add more value to a business

This is exactly what text mining is all about

So text mining or text analytics is the analysis of data available to us in a day-to-day spoken or written language

It is amazing that so much of the data that we generate can actually be used in text mining

We have data from Word documents, PowerPoints, chat messages, emails

All of this is used to add value to a business

Now the data that we get from sources like social media and IoT is mainly unstructured, and unstructured data cannot be used to draw useful insights to grow a business

That's exactly why we need text mining

Text mining or text analytics is the process of deriving meaningful information from natural language text

So, all the data that we generate through text messages, emails, documents, files, are written in natural language text

And we are going to use text mining and natural language processing to draw useful insights or patterns from such data

Now let's look at a few examples to show you how natural language processing and text mining are used

So now before I move any further, I want to compare text mining and NLP

A lot of you might be confused about what exactly text mining is and how it is related to natural language processing

A lot of people have also asked me why NLP and text mining are considered one and the same, and whether they are the same thing

So, basically, text mining is a vast field that makes use of natural language processing to derive high quality information from the text

So, basically, text mining is a process, and natural language processing is a method used to carry out text mining

So, in a way, you can say that text mining is a vast field which uses NLP in order to perform text analysis

So, NLP is a part of text mining

Now let's understand what exactly natural language processing is

Now, natural language processing is a component of text mining which basically helps a machine in reading the text

Obviously, machines don't actually know English or French, they interpret data in the form of zeroes and ones

So this is where natural language processing comes in

NLP is what computers and smartphones use to understand our language, both spoken and written

Now because we use language to interact with our devices, NLP has become an integral part of our lives

NLP uses concepts of computer science and artificial intelligence to study the data and derive useful information from it

Now before we move any further, let's look at a few applications of NLP and text mining

Now we all spend a lot of time surfing the web

Have you ever noticed that if you start typing a word on Google, you immediately get suggestions like these

This feature is also known as autocomplete

It'll basically suggest the rest of the word for you

And we also have something known as spam detection

Here is an example of how Google recognizes a misspelling of Netflix and shows results for keywords that match your misspelling

So, spam detection is also based on the concepts of text mining and natural language processing

Next we have predictive typing and spell checkers

Features like autocorrect and email classification are all applications of text mining and NLP

Now let's look at a couple more applications of natural language processing

We have something known as sentiment analysis

Sentiment analysis is extremely useful in social media monitoring, because it allows us to gain an overview of the wider public opinion behind certain topics

So, basically, sentiment analysis is used to understand the public's opinion or a customer's opinion on a certain product or on a certain topic

Sentiment analysis is actually a very big part of a lot of social media platforms like Twitter and Facebook

They use sentiment analysis very frequently

Then we have something known as chatbots

Chatbots are basically the solution to all the consumer frustration regarding customer call assistance

So we have companies like Pizza Hut and Uber who have started using chatbots to provide good customer service

Apart from that, we have speech recognition

NLP has widely been used in speech recognition

We're all aware of Alexa, Siri, Google Assistant, and Cortana

These are all applications of natural language processing

Machine translation is another important application of NLP

An example of this is Google Translate, which uses NLP to process and translate one language to the other

Other applications include spell checkers, keyword search, and information extraction, and NLP can be used to get useful information from various websites, from Word documents, from files, et cetera

It can also be used in advertisement matching

This basically means recommending ads based on your history

So now that you have a basic understanding of where natural language processing is used and what exactly it is, let's take a look at some important concepts

So, firstly, we're gonna discuss tokenization

Now tokenization is the most basic step in text mining

Tokenization basically means breaking down data into smaller chunks or tokens so that they can be easily analyzed

Now how tokenization works is it works by breaking a complex sentence into words

So you're breaking a huge sentence into words

You then work out the importance of each word with respect to the whole sentence, after which you produce a description of the input sentence

So, for example, let's say we have this sentence, "tokens are simple"

If we apply tokenization on this sentence, what we get is this

We're just breaking a sentence into words

Then we're understanding the importance of each of these words

We'll perform an NLP process on each of these words to understand how important each word is in this entire sentence

For me, I think "tokens" and "simple" are the important words, and "are" is basically just a stop word

We'll be discussing stop words in the upcoming slides

But for now, you need to understand that tokenization is a very simple process that involves breaking sentences into words
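As a minimal sketch of the tokenization step just described, here is a plain Python version. The tokenize function and its regular expression are illustrative assumptions, not the code shown on the slides; NLTK also ships its own tokenizers, such as word_tokenize, which handle many more edge cases

```python
import re

def tokenize(sentence):
    # Treat each run of letters, digits, or apostrophes as one token;
    # whitespace and punctuation act as separators.
    return re.findall(r"[A-Za-z0-9']+", sentence)

tokens = tokenize("Tokens are simple")
print(tokens)
```

Splitting on a simple pattern like this is enough for clean English sentences, but it is only a starting point for real text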

Next, we have something known as stemming

Stemming is basically normalizing words into their base form or root form

Take a look at this example

We have words like detection, detecting, detected, and detections

Now we all know that the root word for all these words is detect

Basically, all these words mean detect

So the stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found on any word

So guys, stemming can be successful in some cases, but not always

That is why a lot of people affirm that stemming has a lot of limitations
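To make the suffix-cutting idea concrete, here is a deliberately naive stemmer in plain Python. The suffix list and the naive_stem function are assumptions made up for this sketch; real stemmers, such as the Porter stemmer that NLTK provides, use a far more careful set of rules

```python
# Hypothetical, tiny suffix list; longer suffixes are tried first.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")

def naive_stem(word, min_stem=3):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip the suffix if at least min_stem characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = ["detection", "detecting", "detected", "detections"]
stems = [naive_stem(w) for w in words]
print(stems)

# The same blind rule also shows the limitation: "nation" loses its
# ending and is no longer a proper word.
print(naive_stem("nation"))
```

All four forms of detect collapse to the same stem, but the rule has no idea whether the result is a real word, which is exactly the limitation mentioned above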

So, in order to overcome the limitations of stemming, we have something known as lemmatization

Now what lemmatization does is it takes into consideration the morphological analysis of the words

To do so, it is necessary to have a detailed dictionary which the algorithm can look through to link the form back to its lemma

So, basically lemmatization is also quite similar to stemming

It maps different words into one common root

Sometimes what happens in stemming is that too much of the word gets cut off

Let's say we wanted to cut detection into detect

Sometimes it becomes det or it becomes tect, or something like that

So because of this, the grammar or the importance of the word goes away

You don't know what the words mean anymore

Due to the indiscriminate cutting of the word, sometimes the grammar or the meaning of the word is not there anymore

So that's why lemmatization was introduced

The output of lemmatization is always going to be a proper word

Okay, it's not going to be something that is half cut or anything like that

You're going to understand the morphological analysis and only then are you going to perform lemmatization

An example of a lemmatizer is you're going to convert gone, going, and went into go

All three words anyway mean the same thing

So you're going to convert it into go

We are not removing the first and the last part of the word

What we're doing is we're understanding the grammar behind the word

We're understanding the English or the morphological analysis of the word, and only then we're going to perform lemmatization

That's what lemmatization is all about
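Here is a small sketch of the dictionary lookup idea behind lemmatization. The LEMMA_DICTIONARY below is a made-up toy dictionary for this example only; a real lemmatizer, such as the WordNet-based one in NLTK, relies on a much larger lexical database and on the part of speech of each word

```python
# Hypothetical toy dictionary linking inflected forms back to their lemma.
LEMMA_DICTIONARY = {
    "gone": "go",
    "going": "go",
    "went": "go",
    "goes": "go",
}

def lemmatize(word):
    word = word.lower()
    # Look the word up; if it is unknown, return it unchanged
    # rather than cutting anything off.
    return LEMMA_DICTIONARY.get(word, word)

lemmas = [lemmatize(w) for w in ["gone", "going", "went"]]
print(lemmas)
```

Notice that went maps to go even though the two share no letters beyond the first pattern a stemmer could use, which is something suffix cutting can never achieve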

Now stop words are basically a set of commonly used words in any language, not just English

Now the reason why stop words are critical to many applications is that if we remove the words that are very commonly used in a given language, we can finally focus on the important words

For example, in the context of a search engine, let's say you open up Google and you type "how to make strawberry milkshake"

What the search engine is going to do is it's going to find a lot more pages that contain the terms how to make, rather than pages which contain the recipe for your strawberry milkshake

That's why you have to disregard these terms

The search engine can then actually focus on the strawberry milkshake recipe, instead of looking for pages that have how to and so on

So that's why you need to remove these stop words

Words like how, to, begin, gone, various, and, the, all of these are stop words

They are not necessarily important for understanding the meaning of the sentence

So you get rid of these commonly used words, so that you can focus on the actual keywords
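The milkshake example can be sketched in a few lines of Python. The STOP_WORDS set here is a hand-picked assumption for this example, chosen to match the terms mentioned above; in practice you would usually start from a standard list, such as the one in NLTK's stopwords corpus

```python
# Hypothetical, hand-picked stop word list for this example only.
STOP_WORDS = {"how", "to", "make", "a", "an", "and", "the", "is", "are"}

def remove_stop_words(text):
    # Lowercase, split on whitespace, and keep only the non-stop words.
    words = text.lower().split()
    return [w for w in words if w not in STOP_WORDS]

keywords = remove_stop_words("How to make strawberry milkshake")
print(keywords)
```

What is left after filtering is just the pair of keywords that a search engine would actually want to match on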

Another term you need to understand is document term matrix

A document term matrix is basically a matrix with documents designated by rows and words by columns

So if your document one has this sentence, this is fun, or has these words, this is fun, then you're going to get one, one, one over here

In document two, if you see, we have this and we have is, but we do not have fun, so you get one, one, zero

So that's what a document term matrix is

It is basically used to understand whether your document contains each of these words

It is a frequency matrix

That is what a document term matrix is
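The two-document example above can be reproduced with a short plain Python sketch. The document_term_matrix function is an illustrative assumption, not a library call; it simply counts how often each vocabulary word occurs in each document

```python
from collections import Counter

def document_term_matrix(documents):
    tokenized = [doc.lower().split() for doc in documents]

    # Columns: every distinct word, in order of first appearance.
    vocabulary = []
    for words in tokenized:
        for word in words:
            if word not in vocabulary:
                vocabulary.append(word)

    # Rows: one per document, holding the count of each vocabulary word.
    matrix = []
    for words in tokenized:
        counts = Counter(words)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

vocabulary, matrix = document_term_matrix(["this is fun", "this is"])
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a document and each column is a word, so the second row ends in a zero because document two does not contain fun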

Now let's move on and look at a natural language processing demo

So what we're gonna do is we're gonna perform sentiment analysis

Now like I said, sentiment analysis is one of the most popular applications of natural language processing

It refers to the process of determining whether a given piece of text or a given sentence is positive or negative

So, in some variations, we consider a sentence to also be neutral

That's a third option

And this technique is commonly used to discover how people feel about a particular topic or what people's opinions are about a particular topic

So this is mainly used to analyze the sentiments of users in various forms, such as in marketing campaigns, in social media, in e-commerce websites, and so on

So now we'll be performing sentiment analysis using Python

So we are going to perform natural language processing by using the NaiveBayesClassifier

That's why we are importing the NaiveBayesClassifier

So guys, there is a Python library known as the Natural Language Toolkit, or NLTK

This library contains all the functions that are needed to perform natural language processing

Also in this library, we have a predefined data set called movie reviews

What we're gonna do is we're going to download that from our NLTK, which is the Natural Language Toolkit

We're basically going to run our analysis on this movie review data set

And that's exactly what we're doing over here

Now what we're doing is we're defining a function in order to extract features

So this is our function

It's just going to extract all our words
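The exact function from the demo is not reproduced in this transcript, so the following is only a sketch of what such a feature extractor commonly looks like. It assumes the usual bag-of-words convention for NLTK classifiers, where each review becomes a dictionary that maps every word it contains to True; the name extract_features is an assumption

```python
def extract_features(words):
    # Bag-of-words features: every word that occurs in the review
    # becomes a feature with the value True.
    return {word: True for word in words}

features = extract_features(["a", "great", "movie"])
print(features)
```

A dictionary like this is what gets paired with a Positive or Negative label before training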

Now that we can extract the features, we need to train the model, so we'll do that by using the movie reviews data set that we just downloaded

We're going to understand the positive words and the negative words

So what we're doing here is we're just loading our positive and our negative reviews

We're loading both of them

After that, we'll separate each of these into positive features and negative features

This is pretty understandable

Next, we'll split the data into our training and testing set

Now this is something that we've been doing for all our demos

This is also known as data splicing

We've also set a threshold factor of 0.8, which basically means that 80% of your data set will belong to your training set, and 20% will be for your testing

You're going to do this for both your positive and your negative reviews

After that, you're just extracting the features again, and you're just printing the number of training data points that you have

You're just printing the length of your training features and you're printing the length of your testing features

We can see the output, let's run this program

So you can see that we're getting the number of training data points as 1,600 and the number of testing data points as 400, so there's an 80 to 20 ratio over here
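The arithmetic behind those two numbers can be checked with a short sketch. The placeholder lists below stand in for the real reviews, and the count of 1,000 positive and 1,000 negative items is an assumption chosen to be consistent with the 1,600 and 400 reported above; split_train_test is a hypothetical helper, not the demo's code

```python
THRESHOLD = 0.8  # 80% for training, 20% for testing

def split_train_test(items, threshold=THRESHOLD):
    # Everything before the cut-off index is training data,
    # everything after it is testing data.
    cut = int(threshold * len(items))
    return items[:cut], items[cut:]

# Placeholder data standing in for the labelled review features.
positive = [("pos_review_%d" % i, "Positive") for i in range(1000)]
negative = [("neg_review_%d" % i, "Negative") for i in range(1000)]

# Split each class separately so both sets stay balanced.
train_pos, test_pos = split_train_test(positive)
train_neg, test_neg = split_train_test(negative)

train_set = train_pos + train_neg
test_set = test_pos + test_neg
print("Number of training data points:", len(train_set))
print("Number of testing data points:", len(test_set))
```

Splitting the positive and negative reviews separately, rather than splitting the combined list, keeps the same class balance in both the training and the testing set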

After this, we'll be using the NaiveBayesClassifier, and we'll define the object for the NaiveBayesClassifier, which we'll simply call classifier, and we'll train this using our training data set

We'll also look at the accuracy of our model

The accuracy of our classifier is around 73%, which is a decent number for such a simple set of features

Now this classifier object will actually contain the most informative words that are obtained during analysis

These words are basically essential in understanding which word is classified as positive and which is classified as negative

What we're doing here is we're going to review movies

We're going to see which movie review is positive or which movie review is negative

Now this classifier will basically have all the informative words that will help us decide which is a positive review or a negative review

Then we're just printing these 10 most informative words, and we have outstanding, insulting, vulnerable, ludicrous, uninvolving, avoids, fascination, and so on

These are the most important words in our text
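To show the idea behind a Naive Bayes classifier without needing any downloads, here is a tiny version written in plain Python. This is not NLTK's NaiveBayesClassifier and not the demo's code: the TinyNaiveBayes class, the add-one smoothing, and the six training reviews are all assumptions made up for this sketch

```python
import math
from collections import Counter, defaultdict

def extract_features(words):
    # Bag-of-words features: each word present maps to True.
    return {word: True for word in words}

class TinyNaiveBayes:
    def train(self, labeled_features):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for features, label in labeled_features:
            self.label_counts[label] += 1
            for feature in features:
                self.feature_counts[label][feature] += 1
                self.vocabulary.add(feature)
        return self

    def prob_classify(self, features):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # prior for this label
            for feature in features:
                if feature in self.vocabulary:  # skip unseen words
                    seen = self.feature_counts[label][feature]
                    # Add-one smoothing so a missing word never gives zero.
                    score += math.log((seen + 1) / (n + 2))
            log_scores[label] = score
        # Turn the log scores into probabilities that sum to one.
        top = max(log_scores.values())
        weights = {l: math.exp(s - top) for l, s in log_scores.items()}
        norm = sum(weights.values())
        return {l: w / norm for l, w in weights.items()}

# Hypothetical toy training reviews, not the movie reviews corpus.
training_reviews = [
    ("a great and outstanding movie", "Positive"),
    ("i loved the story", "Positive"),
    ("pretty great cinematography", "Positive"),
    ("a dull and pathetic movie", "Negative"),
    ("the direction was terrible", "Negative"),
    ("i would never recommend it", "Negative"),
]
train_set = [(extract_features(text.split()), label)
             for text, label in training_reviews]
classifier = TinyNaiveBayes().train(train_set)

def predict(review):
    probs = classifier.prob_classify(extract_features(review.split()))
    label = max(probs, key=probs.get)
    return label, probs[label]

for review in ["i loved the movie", "the movie is pathetic"]:
    label, prob = predict(review)
    print(review, "->", label, round(prob, 2))
```

With only six training sentences this is purely illustrative, but it follows the same pattern as the demo: extract features, train on labelled reviews, then ask for the predicted sentiment and its probability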

Now what we're gonna do is we're gonna test our model

I've randomly given some reviews

If you want, let's add another review

We'll say I loved the movie

So I've added another review over here

Here we're just printing the review, and we're checking if this is a positive review or a negative review

Now let's look at our predictions

We'll save this and

I forgot to put a comma over here

Save it and let's run the file again

So these were our randomly written movie reviews

The predicted sentiment is positive

Our probability score was 0.61

It's pretty accurate here

This is a dull movie and I would never recommend it, is a negative sentiment

The cinematography is pretty great, that's a positive review

The movie is pathetic is obviously a negative review

The direction was terrible, and the story was all over the place

This is also considered a negative review

Similarly, I loved the movie is what I just inputted, and I've got a positive review on that

So our classifier actually works really well

It's giving us good accuracy and it's classifying the sentiments very accurately

So, guys, this was all about sentiment analysis

Here we basically saw if a movie review was positive or negative

So guys, that was all for our NLP demo

I hope all of you understood this

It was a simple sentiment analysis that we saw through Python

So again, if you have doubts, please leave them in the comment section, and I'll help you with all of the queries

So guys, that was our last module, which was on natural language processing

Now before I end today's session, I would like to discuss with you the machine learning engineer's program that we have at Edureka

So we all are aware of the demand for machine learning engineers

So, at Edureka, we have a master's program that involves 200-plus hours of interactive training

So the machine learning master's program at Edureka has around nine modules and 200-plus hours of interactive learning

So let me tell you the curriculum that this course provides

So your first module will basically cover Python programming

It'll have all the basics and all your data visualization, your GUI programming, your functions, and your object-oriented concepts

The second module will cover machine learning with Python

So supervised algorithms and unsupervised algorithms, along with statistics and time series in Python, will be covered in your second module

Your third module will have graphical modeling

This is quite important when it comes to machine learning

Here you'll be taught about decision making, graph theory, inference, and Bayesian and Markov networks

And module number four will cover reinforcement learning in depth

Here you'll understand dynamic programming, temporal difference, Bellman equations, all the concepts of reinforcement learning in depth

All the detailed and advanced concepts of reinforcement learning

So, module number five will cover NLP with Python

You'll understand tokenization, stemming, lemmatization, syntax tree parsing, and so on

And module number six will have artificial intelligence and deep learning with TensorFlow

This module is a very advanced version of all the machine learning and reinforcement learning that you'll learn

Deep learning will be in depth over here

You'll be using TensorFlow throughout

They'll cover all the concepts that we saw, CNN, RNN

It'll cover the various types of neural networks, like convolutional neural networks, recurrent neural networks, long short-term memory networks, autoencoders, and so on

The seventh module is all about PySpark

It'll show you how Spark SQL works and all the features and functions of the Spark ML library

And the last module will finally cover Python Spark using PySpark

Apart from these seven modules, you'll also get two free self-paced courses

Let's actually take a look at the course

So this is your machine learning engineer master's program

You'll have nine courses, 200-plus hours of interactive learning

This is the whole course curriculum, which we just discussed

Here there are seven modules

Apart from these seven modules, you'll be given two free self-paced courses, which I'll discuss shortly

You can also get to know the average annual salary for a machine learning engineer, which is over $134,000

And there are also a lot of job openings in the field of machine learning, AI, and data science

So the job titles that you might get are machine learning engineer, AI engineer, data scientist, data and analytics manager, NLP engineer, and data engineer

So this is basically the curriculum

Your first will be Python programming certification, then machine learning certification using Python, graphical modeling, reinforcement learning, natural language processing, AI and deep learning with TensorFlow

Python Spark certification training using PySpark

If you want to learn more about each of these modules, you can just go and view the curriculum

They'll explain each and every concept that they'll be showing in this module

All of this is going to be covered here

This is just the first module

Now at the end of this project, you will be given a verified certificate of completion with your name on it, and these are the free elective courses that you're going to get

One is your Python scripting certification training

And the other is your Python Statistics for Data Science Course

Both of these courses explain Python in depth

The second course on statistics will explain all the concepts of statistics: probability, descriptive statistics, inferential statistics, time series, testing data, data clustering, regression modeling, and so on

So each of the modules is designed in such a way that you'll have a practical demo or a practical implementation after each and every module

So all the concepts that I theoretically taught you will be explained through practical demos

This way you'll get a good understanding of the entire machine learning and AI concepts

So, if any of you are interested in enrolling for this program or if you want to learn more about the machine learning course offered by Edureka, please leave your email IDs in the comment section, and we'll get back to you with all the details of the course

So guys, with this, we come to the end of this AI full course session

I hope all of you have understood the basic concepts and the idea behind AI, machine learning, deep learning, and natural language processing

So if you still have doubts regarding any of these topics, mention them in the comment section, and I'll try to answer all your queries

So guys, thank you so much for joining me in this session

Have a great day

I hope you have enjoyed listening to this video

Please be kind enough to like it, and you can comment any of your doubts and queries, and we will reply to them at the earliest

Do look out for more videos in our playlist and subscribe to the Edureka channel to learn more

Happy learning