Bayes' Rule

CS123, Intro to AI

Weeks
1. Overview of AI
2. History and applications of AI
3. Bayes Rule and Machine Learning
4. More Machine Learning
5. Midterm
6. Neural networks, deep learning and Generative AI
7. Prompt engineering
8. Custom chatbot creation
9. Social and ethical issues of AI
11. Final

 


What's Due This Week

Bayes' Rule

Bayes’ Rule is a theorem in probability and statistics that describes how to update the probability of a prediction when new evidence arrives. It’s named after Thomas Bayes, who gave the first mathematical formulation of the rule in the eighteenth century; his essay on it was published posthumously in 1763.

At its core, Bayes’ Rule is about learning from experience. It provides a mathematical framework for combining new evidence (expressed as a likelihood ratio) with our prior beliefs and predictions (expressed as prior odds). It is used in a wide range of disciplines, including medicine, psychology, and artificial intelligence.

Odds vs. Probabilities

These are two different ways of representing the chances of something happening.

Probability

A probability is written as a fraction or a percentage: the number of times an outcome occurs divided by the total number of occurrences (or trials). For throwing a die, the probability of rolling a 6 is 1/6, or about 16.7%.

Odds

Odds are another way of representing the same thing: we write the ratio of the number of times a thing happens to the number of times it doesn't happen. For the die example above, the odds of rolling a 6 are 1:5.
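Because odds and probabilities carry the same information, converting between them is simple arithmetic: odds of a:b correspond to a probability of a / (a + b). The Python sketch below shows both directions. The function names are illustrative (not from any library), and it uses the standard `fractions` module so the numbers stay exact.

```python
from fractions import Fraction

def odds_to_probability(odds_for: int, odds_against: int) -> Fraction:
    """Odds of for:against correspond to a probability of for / (for + against)."""
    return Fraction(odds_for, odds_for + odds_against)

def probability_to_odds(p: Fraction) -> tuple:
    """Return odds as (for, against) in lowest terms. Assumes 0 < p < 1."""
    ratio = p / (1 - p)  # chance it happens divided by chance it doesn't
    return ratio.numerator, ratio.denominator

print(odds_to_probability(1, 5))                         # 1/6
print(probability_to_odds(Fraction(1, 6)))               # (1, 5)
print(round(float(odds_to_probability(1, 5)) * 100, 1))  # 16.7
```

The prior odds of rain in Eugene used below work the same way: `odds_to_probability(146, 219)` returns `Fraction(2, 5)`, a 40% chance.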

Prior odds

Definition: An assessment of the odds before any new information is added.

Example
The odds of rain in Eugene¹ are 146:219, that is, 146 rainy days for every 219 dry days, a probability of 146/365 = 40%.

Likelihood ratio

Definition: The probability of the observation when the event of interest occurs, divided by the probability of the observation when the event does not occur:

likelihoodRatio = P(observation | event) / P(observation | no event)

The likelihood ratio is where the new information comes in. It is built from a second set of probabilities, and multiplying our prior odds by it updates the prediction to account for the new evidence.

Example
The new information is whether the morning is cloudy, and how much more often cloudy mornings occur on days when it rains later than on days when it stays dry.

(This is not real data and is different from the example in the book)

Suppose 7/9 of rainy days start out cloudy, but only 1/9 of dry days do. The likelihood ratio for a cloudy morning is then 7/9 ÷ 1/9 = 7. (I chose numbers that make the math easy; the denominators don't always have to be the same.)

Posterior odds

Definition: The end result, that is, the new odds calculated after the new information has been taken into account.

Bayes' Formula: posteriorOdds = priorOdds * likelihoodRatio

Example: 146:219 * 7 = 1022:219, a probability of 1022 / (1022 + 219) ≈ 82.4%. These are the odds of rain on days that are cloudy in the morning.
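The same calculation as a short Python sketch, using exact fractions. It treats 7/9 and 1/9 as the (made-up) chances of a cloudy morning on rainy and dry days, following the likelihood-ratio definition above; the helper names are hypothetical.

```python
from fractions import Fraction

def posterior_odds(prior_for, prior_against, likelihood_ratio):
    """Odds form of Bayes' rule: posteriorOdds = priorOdds * likelihoodRatio.

    Multiplying odds of a:b by a number r gives odds of (a * r):b.
    """
    return prior_for * likelihood_ratio, prior_against

def odds_to_probability(odds_for, odds_against):
    """Odds of a:b correspond to a probability of a / (a + b)."""
    return Fraction(odds_for) / (Fraction(odds_for) + Fraction(odds_against))

# Made-up cloud data from the example:
# P(cloudy morning | rain) = 7/9, P(cloudy morning | no rain) = 1/9
likelihood_ratio = Fraction(7, 9) / Fraction(1, 9)  # = 7

post_for, post_against = posterior_odds(146, 219, likelihood_ratio)
print(f"{post_for}:{post_against}")        # 1022:219
prob = odds_to_probability(post_for, post_against)
print(prob, round(float(prob) * 100, 1))   # 14/17 82.4
```

`Fraction` reduces 1022/1241 to 14/17 automatically; it is the same 82.4% probability.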

 

A Sentiment Analysis Example

One application of Bayesian classification is analyzing the sentiment of online blog and forum posts. For example, LCC might want to gauge public opinion about the college by checking how many blog and forum posts express an overall positive opinion and how many express a negative one.

Training

Before using the AI software system to do sentiment analysis, the system has to be trained.

Labeling

First, the system needs labeled training data. One way to get it is to scrape posts from the public internet and have a human read each one and label it as positive or negative. Let's say we end up with 50 negative posts and 100 positive posts.

Calculating Likelihood Values

Next, the number of times each word appears in each category of post would be counted (this is called word frequency). Here are a few of the words and their frequencies:

Word | Negative frequency | Positive frequency
fine | 10 | 50
disappoint | 70 | 12
fun | 15 | 60
smart | 5 | 40
frustrate | 85 | 5
Total words | 500 | 1000
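The counting step can be sketched in Python with `collections.Counter`. The three posts below are invented for illustration (they are not the training data behind the table), and the sketch simply lowercases and splits on whitespace; the table's entries such as "disappoint" look like word stems, but this sketch does no stemming.

```python
from collections import Counter

def count_words(labeled_posts):
    """Count word frequencies separately for negative and positive posts.

    Returns (counts, totals): counts[label][word] is how often the word appears
    in posts with that label; totals[label] is the total number of words.
    """
    counts = {"negative": Counter(), "positive": Counter()}
    for label, text in labeled_posts:
        # Simplest possible tokenizing: lowercase, then split on whitespace
        counts[label].update(text.lower().split())
    totals = {label: sum(c.values()) for label, c in counts.items()}
    return counts, totals

# Hypothetical labeled posts, invented for illustration
posts = [
    ("positive", "Fun class and smart teachers"),
    ("positive", "The campus is fine and fun"),
    ("negative", "Registration was frustrating and not fun"),
]
counts, totals = count_words(posts)
print(counts["positive"]["fun"], counts["negative"]["fun"])  # 2 1
print(totals["negative"], totals["positive"])                # 6 11
```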

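Next, we calculate the likelihood values for negative sentiment.

To calculate the likelihood value, we need two probabilities for each word:
- the probability of the word appearing in a negative post: negFreq / negTotal
- the probability of the word appearing in a positive post: posFreq / posTotal

The likelihood value for each word is the first probability divided by the second:

$$\text{likelihood} = \frac{\text{negFreq} / \text{negTotal}}{\text{posFreq} / \text{posTotal}} \tag{2}$$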

Here are the results:
(Reminder: we are calculating the likelihood that a particular word indicates a post with negative sentiment.)

Word | Likelihood
fine | 10/500 ÷ 50/1000 = 0.40
disappoint | 70/500 ÷ 12/1000 = 11.67
fun | 15/500 ÷ 60/1000 = 0.50
smart | 5/500 ÷ 40/1000 = 0.25
frustrate | 85/500 ÷ 5/1000 = 34.00
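Formula (2) applied to the frequency table, as a minimal Python sketch; the variable names are illustrative. Running the loop prints the same five likelihood values as the table.

```python
# Word frequencies from the training table above
neg_freq = {"fine": 10, "disappoint": 70, "fun": 15, "smart": 5, "frustrate": 85}
pos_freq = {"fine": 50, "disappoint": 12, "fun": 60, "smart": 40, "frustrate": 5}
NEG_TOTAL = 500   # total words in negative posts
POS_TOTAL = 1000  # total words in positive posts

def likelihood(word):
    """Formula (2): (negFreq / negTotal) / (posFreq / posTotal)."""
    p_word_if_negative = neg_freq[word] / NEG_TOTAL
    p_word_if_positive = pos_freq[word] / POS_TOTAL
    return p_word_if_negative / p_word_if_positive

for word in neg_freq:
    print(f"{word:<11}{likelihood(word):6.2f}")
```

A word that never appears in positive posts would make this divide by zero; practical classifiers usually avoid that with smoothing (for example, adding 1 to every count), which these notes don't cover.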

Setting a Decision Threshold

Next, we would calculate the odds of each training post having negative sentiment; in other words, we would do inference on our training data. Let's say these were the averages of the posterior probabilities for each type of post:
(We'd actually want to look at standard deviations, etc. as well. But let's keep this simple.)

We might then decide that 75 is the threshold value for deciding the sentiment of a post.

Inference

Inference is the process of making decisions about new data based on what the system learned in training. In this case, we want to make decisions based on the posterior odds (converted to a probability), which are calculated using the formula below. We assume the prior odds of a post having negative sentiment are 1:1, a probability of 0.5.

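$$\text{posteriorOdds} = \text{priorOdds} \times \text{likelihood}_1 \times \text{likelihood}_2 \times \cdots \times \text{likelihood}_n \tag{3}$$

To express the posterior odds as a probability: probability = posteriorOdds / (1 + posteriorOdds).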
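Formula (3) can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete classifier: it uses the rounded likelihood values from the table, the 1:1 prior odds assumed above, skips words that have no likelihood value, and treats the decision threshold of 75 as a probability of 0.75 (an assumption). The two posts are hypothetical, not the ones discussed in these notes.

```python
# Likelihood values for negative sentiment, copied (rounded) from the table above
LIKELIHOODS = {"fine": 0.40, "disappoint": 11.67, "fun": 0.50, "smart": 0.25, "frustrate": 34.00}

def negative_probability(words, prior_odds=1.0):
    """Posterior probability that a post is negative.

    Formula (3): posteriorOdds = priorOdds * likelihood_1 * ... * likelihood_n,
    then probability = posteriorOdds / (1 + posteriorOdds).
    Words with no likelihood value are skipped (an assumption of this sketch).
    """
    odds = prior_odds  # 1:1 prior odds by default, as assumed in the notes
    for word in words:
        if word in LIKELIHOODS:
            odds *= LIKELIHOODS[word]
    return odds / (1 + odds)

def classify(words, threshold=0.75):
    """Label a post negative if its posterior probability reaches the (assumed) threshold."""
    return "negative" if negative_probability(words) >= threshold else "positive"

# Two hypothetical posts
print(classify(["fun", "smart", "fine"]))  # positive
print(classify(["disappoint", "fine"]))    # negative
```

Multiplying one likelihood per word assumes the words are independent of each other given the sentiment, which is the "naive" part of a naive Bayes classifier. With many words the product can become too large or too small for floating point, so real implementations often add logarithms of the likelihoods instead.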

Let's say we want to do inference on two new posts. The posts have these words (as well as others) in them:

 



Intro to AI lecture notes by Brian Bird are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

MS Copilot GPT-4 was used to draft parts of these notes.