Naive Bayes 

Naive Bayes is a modeling approach used to solve classification problems in which the Y variable may take multiple classes. If a predictor variable is categorical, frequency counts are used to calculate the probabilities; when it is continuous, Gaussian density functions are used instead.

Naive Bayes is built upon Bayes' Theorem. Before going into the intricacies of the theorem and an in-depth explanation of how Naive Bayes operates, it helps to see a practical use of Naive Bayes, because a worked example makes its mechanics much easier to comprehend. In this instance, we'll be looking at categorical variables that are independent.

Quick Example 

We have a dataset that includes four categorical independent variables and a binary dependent variable.

This dataset contains 4 X variables. These can be referred to as symptoms, while the Y variable reveals whether the individual suffers from a disease called "Z" or not. We want Naive Bayes to tell us whether a patient who has no fever but has diabetes, high blood pressure, and vomiting suffers from the disease "Z" or not. This can be determined with the help of the formula described below. (More about the formula will be discussed in this blog.)

The formula for using Naive Bayes

P( C | X ) = P( X | C ) x P( C ) / P( X )

First, we determine the necessary inputs by performing some preliminary calculations.

Step 1 

Find out how many No and Yes values exist in the Y variable. This gives the probability of each class, P( C ), which in our case is whether a person suffers from the disease "Z" or not. The likelihood of someone suffering from disease "Z" is 9/14, and the chance of a person not suffering from disease "Z" is 5/14. Here we simply counted the number of observations for each class of the dependent variable (Yes/No) and divided by the total number of observations.
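As a minimal sketch, the prior computation might look like this in Python; the list of labels below is hypothetical, built only to reproduce the 9 "yes" / 5 "no" counts described above.

```python
# A minimal sketch of Step 1. The label list is hypothetical, constructed
# only to reproduce the 9/14 and 5/14 priors quoted in the text.
from collections import Counter

y = ["yes"] * 9 + ["no"] * 5  # 9 patients with disease "Z", 5 without

counts = Counter(y)
priors = {cls: n / len(y) for cls, n in counts.items()}
print(priors)  # {'yes': 0.642857..., 'no': 0.357142...}
```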

Step 2 

We now calculate the probability of every category of each feature. Let's take the first feature as an example. We calculate the probability of Blood Pressure = "high" given that the person suffers from disease "Z," which gives a ratio of 2/9. We also determine the probability of Blood Pressure = "high" given that the person does not suffer from disease "Z," which is 3/5. We do the same for the other categories of the x variables to obtain a table of these conditional probabilities.
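A sketch of this counting for a single feature; the records below are invented so that the two Blood Pressure ratios quoted above (2/9 and 3/5) fall out of the code.

```python
# A sketch of Step 2 for one feature. The records are hypothetical, chosen
# so that P(high | yes) = 2/9 and P(high | no) = 3/5 as in the text.
from collections import Counter, defaultdict

records = ([("high", "yes")] * 2 + [("normal", "yes")] * 7 +
           [("high", "no")] * 3 + [("normal", "no")] * 2)

per_class = defaultdict(Counter)
class_sizes = Counter()
for bp, label in records:
    per_class[label][bp] += 1
    class_sizes[label] += 1

cond = {label: {value: n / class_sizes[label] for value, n in c.items()}
        for label, c in per_class.items()}
print(cond["yes"]["high"], cond["no"]["high"])  # 0.2222... 0.6
```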

Step 3 

Now let’s consider the main question that needs to be addressed, whether a person who is not feverish but suffering from diabetes, high blood pressure, and vomiting suffers from a disease called “Z” or not. 

Using the probabilities computed above, we first determine the likelihood of someone suffering from the disease "Z" given Blood Pressure = high, Fever = no, Diabetes = yes, and Vomit = yes.

So, the chance of Blood Pressure being high given that the person suffers from disease "Z",

i.e. 

P (Blood Pressure = high | Suffering from disease "Z" = yes) = 2/9

Similarly, for the other variables:

P (Fever = no | Suffering from disease "Z" = yes) = 3/9

P (Diabetes = yes | Suffering from disease "Z" = yes) = 3/9

P (Vomit = yes | Suffering from disease "Z" = yes) = 3/9

We also calculate the likelihood of a person not suffering from the disease "Z" given the same symptoms:

P (Blood Pressure = high | Suffering from disease "Z" = no) = 3/5

P (Fever = no | Suffering from disease "Z" = no) = 1/5

P (Diabetes = yes | Suffering from disease "Z" = no) = 4/5

P (Vomit = yes | Suffering from disease "Z" = no) = 3/5

Step 4 

We now multiply together, for the class "yes," the conditional probabilities of all the independent variables found above, along with the prior probability of the class itself.

So when Z is ‘Yes,’ we perform this calculation: 

P(X | Suffering from disease "Z" = yes) x P(Suffering from disease "Z" = yes) = ((2/9) x (3/9) x (3/9) x (3/9)) x (9/14)

The above calculation gives us the probability of X in a particular class multiplied by the probability of the class, P( C ). In our example, C is the class of someone suffering from the disease "Z."

The calculation in Step 4 creates the numerator for the Naive Bayes formula. 

P( X | C ) = (2/9) x (3/9) x (3/9) x (3/9) = 0.222 x 0.333 x 0.333 x 0.333 = 0.00823 

P( C ) = 9/14 = 0.642857

P( X | C ) x P( C ) = 0.00823 x 0.642857 = 0.005291

The same calculation is done, but this time, C represents the probability of a person being free from a disease called ‘Z.’ 

P( X | C ) = (3/5) x (1/5) x (4/5) x (3/5) = 0.6 x 0.2 x 0.8 x 0.6 = 0.0576 

P( C ) = 5/14 = 0.357143 

P( X | C ) x P( C ) = 0.0576 x 0.357143 = 0.020571 
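The Step 4 arithmetic can be checked with a few lines of Python:

```python
# Checking the Step 4 arithmetic: numerator = P(X | C) * P(C) per class.
num_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
num_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
print(round(num_yes, 6), round(num_no, 6))  # 0.005291 0.020571
```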

Step 5 

This is where we take the numerator values computed above and divide them by the probability of X (the evidence) to normalize the results.

The probability of each piece of evidence can be determined by dividing its count by the total number of cases. In this case, we have Blood Pressure = high, Fever = no, Diabetes = yes, and Vomit = yes. There are 5 instances of Blood Pressure = high, 4 of Fever = no, 7 of Diabetes = yes, and 6 of Vomit = yes. We divide each of these counts by the total number of samples and then multiply the results together.

Therefore, we perform this calculation: 

P(X) = P(Blood Pressure = high) x P(Fever = no) x P(Diabetes = yes) x P(Vomit = yes)

P(X) = (5/14) x (4/14) x (7/14) x (6/14)

P(X) = 0.357143 x 0.285714 x 0.5 x 0.428571

P(X) = 0.021866 

The normalizer can also be computed, as is more common in practice, by summing the two numerator values from Step 4: 0.005291 + 0.020571 = 0.025862, which is close to (though not identical with) the value above.

Step 6 

This is the last step, in which we divide the values found in Step 4 by the value found in Step 5.

Therefore, we perform these calculations. 

P(Suffering from disease "Z" = yes | X) = 0.005291 / 0.021866 = 0.241974

P(Suffering from disease "Z" = no | X) = 0.020571 / 0.021866 = 0.940776

We then compare the two probabilities and find that the probability of not suffering from the disease "Z" is higher. So we conclude that an individual with these symptoms is not suffering from the disease "Z."
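Putting Steps 5 and 6 together in code; here the evidence is computed as the sum of the two Step 4 numerators (the more common convention mentioned in Step 5), so the posteriors add to 1, but the winning class is the same either way.

```python
# Steps 5 and 6 condensed: normalize the Step 4 numerators and take the
# larger posterior. The evidence is the sum of the numerators, so the
# posteriors sum to 1; the winning class matches the conclusion above.
num_yes = 0.005291
num_no = 0.020571

evidence = num_yes + num_no
posteriors = {"yes": num_yes / evidence, "no": num_no / evidence}
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)  # "no" wins
```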

Understanding the Steps in Relation to the Equation

Naive Bayes is a supervised classification algorithm (when visualized, its decision boundary can appear as a straight line or as an elliptical, circular, or parabolic curve) that builds a generative model to solve classification problems. Naive Bayes is a sub-type of Bayesian Classifier.

Bayesian Classifiers belong to the category of probabilistic classifiers. In this blog, we've primarily talked about modeling algorithms whose purpose is to discover a function that maps the input information (x variables) to forecasts (the y variable). In a probabilistic classifier, however, the class is predicted from probabilities computed for each class: it first computes the probability of every class, and the class with the highest probability is chosen as the prediction.

In contrast to other models, where the learned relationship looks like f(x) = y (learn a function from the input x to predict the value of y), in a probabilistic classifier we compute y = argmax over c of P(c|x) (for every class c, compute the probability of c given x, and take the class with the highest value).
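In code, a probabilistic classifier reduces to a tiny sketch like this; the posterior function here is a stand-in for whatever probability model is used.

```python
# A probabilistic classifier in miniature: score every class and take the
# argmax. 'posterior' is a stand-in for any function returning P(c | x).
def classify(x, classes, posterior):
    return max(classes, key=lambda c: posterior(c, x))
```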

In our case, a person either suffers from the illness or not, so x is the set of symptoms and c is one of two classes: yes (has disease "z") and no (does not have disease "z"). The classifier yields two figures: the probability that the person suffers from the condition "z" given these symptoms, and the probability of not having the illness given those symptoms. It then determines the class with the greatest probability and predicts that class. In order to determine the likelihood of each class, we need to compute some other probabilities first. In Bayesian classifiers, we employ this formula to do so:

P(c | x) = P(x | c) x P(c) / P(x)

Here, P(c|x) is the probability of someone suffering from the disease "z" given the symptoms. The formula tells us that the numerator is the probability of x given c multiplied by the prior probability of c. This is then divided by P(x), which is a summation over the classes of c, i.e., the probability of x for each class multiplied by the probability of that class.

So, we have three components to address: the prior probability, the class-conditional probability, and the normalizer.

Prior Probability P( c ): the probability of a particular class c; in our instance, the likelihood of someone suffering from the disease "z" regardless of the symptoms.

Class Conditional Probability P( x | c ): given that someone has the condition "z," the probability of observing specific symptoms or not.

Normalizer P( x ): a normalizing constant.

Prior Probability 

In this case, our c denotes whether a person suffers from the condition "z" or not, and P(c) is the prior. What this calculation conveys is: without checking an individual's symptoms (without looking at the x variables), what is the probability of someone suffering from the disease "z"? Suppose "z" is extremely rare. In that case, the chance of suffering from disease "z" is very low. If, in our data of 1,000 records, the number of instances with dependent variable 1 (person suffering from the disease "z") is just 10, the class prior is very low: 10/1000 = 0.01. So, even if a record exhibits all the symptoms associated with the disease "z," the posterior odds of that person suffering from the disease remain low. The prior thus tells us how confident the model can be in a class before looking at any evidence.

Class Conditional Probability 

Here we ask: given a class of c, such as a person with the illness "z," what is the likelihood of observing the symptoms we see? For instance, given that someone suffers from the disease "z," what is the likelihood that he has a high fever? This is calculated to determine how strongly the evidence supports each class.

Normalizer 

The normalizer is P(x), the denominator. It is often left out of the calculation in a Bayesian classifier because it does not affect the classification. To see this mathematically, note that the Bayesian equation for each class of c looks like this:

P (c=1|x) = P(x|c=1) x P(c=1) / P(x) 

P (c=0|x) = P(x|c=0) x P(c=0) / P(x) 

We need to compare the two equations above, and since P(x) appears in both, we can eliminate it:

P (c=1|x) ∝ P(x|c=1) x P(c=1)

P (c=0|x) ∝ P(x|c=0) x P(c=0)

We are left with the numerator, which is sufficient to classify the input. This means we can disregard the probability of observing a given set of symptoms irrespective of class (in our case, the probability of a person having a high fever whether or not they suffer from the illness "z"). Since this quantity is the same for both classes, it does not change which class has the highest probability and can safely be ignored. Therefore, the answer to our scenario could have been found from Step 4 alone.

In this way, by using all three elements, we can calculate the posterior probability P(c|x), which expresses how strongly the model believes the input belongs to each class, given the evidence and the prior information.
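A small sketch of why dropping P(x) is safe, reusing the numbers from the worked example above:

```python
# Why dropping P(x) is safe: the shared denominator cannot change which
# unnormalized score P(x|c) * P(c) is larger. Numbers from the example above.
score_yes = 0.00823 * (9/14)  # P(x | c=1) * P(c=1)
score_no = 0.0576 * (5/14)    # P(x | c=0) * P(c=0)
print("yes" if score_yes > score_no else "no")  # "no", as found in Step 6
```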

Calculation for Continuous Variables 

In our case, if the independent variables Blood Pressure and Fever were continuous rather than categorical, the calculation would have been different. We would calculate the mean and the standard deviation of each continuous variable for every class of the dependent variable. Suppose we find that the mean Blood Pressure is 73 when Y = 1 and 75 when Y = 0, and that the standard deviation of Blood Pressure is 6.2 when Y = 1 and 7.9 when Y = 0. We do the same for Fever and arrive at two values for the mean (79 when Y = 1 and 86 when Y = 0) and for the standard deviation (10.2 when Y = 1 and 9.7 when Y = 0).
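A sketch of computing these per-class statistics in Python; the raw readings below are invented so that the class means come out to 73 (Y = 1) and 75 (Y = 0) as quoted above, though the standard deviations of this toy data will not match the quoted 6.2 and 7.9.

```python
# Per-class mean and standard deviation for one continuous feature.
import statistics

bp_by_class = {
    1: [66, 70, 73, 76, 80],  # hypothetical readings, mean 73
    0: [68, 72, 75, 78, 82],  # hypothetical readings, mean 75
}
for label, values in bp_by_class.items():
    print(label, statistics.mean(values), statistics.stdev(values))
```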

Returning to the formula, we concentrate on the numerator, where P( C ) is computed in exactly the same way. However, the class-conditional probability P(x|c) is calculated differently when a variable is continuous. We employ the Gaussian probability density function, which looks like this:

f(x) = (1 / (σ√(2π))) e^( -(x - μ)² / (2σ²) )

Suppose we take a value such as a blood pressure of 66. What would be its class-conditional density, given the class of a person who suffers from the disease "z" (mean 73, standard deviation 6.2)? Applying the probability density function gives 0.0340.
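The 0.0340 figure can be verified with a few lines of Python:

```python
# Verifying the 0.0340 figure: Gaussian density of a blood pressure of 66
# under the Y=1 parameters (mean 73, standard deviation 6.2).
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(round(gaussian_pdf(66, 73, 6.2), 4))  # 0.034
```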

This is required for every class of c and every continuous variable, for the given values of the x variables. Then, we use these values to determine the numerator of the formula:

If C is "person suffers from illness z":

P( X | C ) = (0.0340) x (0.2221) x (3/9) x (3/9) = 0.0340 x 0.2221 x 0.333 x 0.333 = 0.000839

P( C ) = 9/14 = 0.642857

P( X | C ) x P( C ) = 0.000839 x 0.642857 = 0.000539

The same calculation is done, but this time, C represents the probability of a person who is not suffering from the disease “Z.” 

P( X | C ) = (0.0291) x (0.0380) x (4/5) x (3/5) = 0.0291 x 0.0380 x 0.8 x 0.6 = 0.000531 

P( C ) = 5/14 = 0.357143 

P( X | C ) x P( C ) = 0.000531 x 0.357143 = 0.00019 

We can now normalize these values by dividing each by the normalizer, which is 0.000539 + 0.00019 = 0.000729.

So, the probability of the person suffering from the disease "z" comes out to 0.000539 / 0.000729 = 0.7394, and the probability of not suffering from the disease "z" is 0.00019 / 0.000729 = 0.2606. Since the likelihood of suffering from the illness "z" is higher, we conclude that the person suffers from the disease "Z."
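The whole mixed-variable calculation, condensed into Python using the figures from this section:

```python
# The mixed-variable numerator: Gaussian densities for the two continuous
# features, frequency ratios for the two categorical ones.
num_yes = 0.0340 * 0.2221 * (3/9) * (3/9) * (9/14)
num_no = 0.0291 * 0.0380 * (4/5) * (3/5) * (5/14)
total = num_yes + num_no
print(num_yes / total, num_no / total)  # roughly 0.74 vs 0.26: "yes" wins
```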

Limitations of Naive Bayes 

What distinguishes Naive Bayes from other Bayesian Classifiers is its naive assumption that the variables are independent of each other. In practice, Naive Bayes often works even when the variables are not truly independent, but violating this assumption can make predictions incorrect. Since Naive Bayes assigns equal weight to all variables, it is crucial to ensure that there is no multicollinearity among the variables and that identical or redundant variables are not included in the calculation.

In the case of numerical variables, a normal distribution is assumed, as the probability density function (used in the Bayes theorem when the independent variables are continuous) requires the numerical variables to be normally distributed to work correctly.

Additionally, suppose a particular class of a categorical variable is not present in the training dataset but appears in the test dataset. In that case, the Naive Bayes model cannot produce sensible predictions, as the class-conditional probability for that value will be zero and will zero out the entire product. To overcome this zero-frequency problem, various smoothing techniques are employed, including Laplace estimation, in which we add one to every count as a smoothing term so that no probability is ever exactly zero.
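A minimal sketch of add-one smoothing; smoothed_prob is a hypothetical helper, and n_categories stands for the number of distinct values the feature can take.

```python
# Laplace (add-one) smoothing: adding 1 to every count keeps a category
# unseen in training from zeroing out the whole product.
def smoothed_prob(count, class_total, n_categories):
    return (count + 1) / (class_total + n_categories)

print(smoothed_prob(0, 9, 3))  # an unseen value gets 1/12 instead of 0
```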

Also, Naive Bayes is sensitive to outliers, so treating outliers is essential when using this model. Additionally, when Naive Bayes is used to classify text (Spam versus not Spam, and so on), the model can fail on phrases; in an often-cited example from Google's early results, users searching for the term "Chicago Bulls" were shown pictures of bulls and of the city of Chicago instead of pictures of the American basketball team, the Chicago Bulls.

Advantages of Naive Bayes 

Naive Bayes is extremely simple to use and a fast, high-performance algorithm capable of solving many real-world classification problems. It also does well on multiclass classification problems. Additionally, it requires less training data than, say, logistic regression, and produces a robust model even when there are many features (as many as 20,000 and beyond), which is why it is widely used for spam detection and other problems where words are the features, making the feature space very high-dimensional.
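As a hedged illustration of this text-classification use case, here is a toy spam filter using scikit-learn's MultinomialNB; the four training sentences and their labels are invented for the example.

```python
# A toy spam filter with scikit-learn's MultinomialNB. The training
# sentences and labels are made up for the illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "meeting at noon", "free money offer", "lunch at noon"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["free money"])))  # likely ['spam']
```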

Naive Bayes is an efficient technique for classification problems, with its own advantages and drawbacks. It adjusts its probabilities as more evidence becomes available and is therefore able to work well when there is a lot of data. It is always worth comparing the accuracy of Naive Bayes against more advanced classification algorithms such as SVM and ANN; it has often been observed that the accuracy of Naive Bayes is comparable to that of these more complex algorithms. So, Naive Bayes is a great choice when you are faced with a classification problem.
