The factorial of a non-negative integer \(n\) is denoted \(n!\), and \[ n! = n(n - 1)(n - 2)\cdots 1,\] where \(0!=1\).
To calculate factorials, I cretate an R function below other than using the built-in function.
r.fct <- function(n){ ## function to calculate factorials
if(n == 0){ ## if n is 0, the formula of n = 1 is returned
return(n = 1)
}
else{ ## if n is not 0, the formula of n(n-1)(n-2)...1 is returned
return(n * r.fct(n - 1 : 1))
}
}
We are able to get the outcome by using the function above, for instance, the case of calculating \(0!\) is
r.fct(0)
## [1] 1
and \(5!\) is
r.fct(5)
## [1] 120
This is a formula which denotes choosing k elements from n \[ {n \choose k} = \frac{n!}{k!(n-k)!}\]
By using the function to calculate factorial above, I create an R function calculating the number of possible combinations that can be obtained by taking a subset of elements from a larger set.
cmbntn <- function(n, k){ ## function to calculate combination
com <- r.fct(n) / (r.fct(n - k) * r.fct(k))
return(com)
}
Then, we can actually calculate combination, for instance, in the case of \({5 \choose 2}\), the result is as below:
cmbntn(5, 2)
## [1] 10
and in the case of \({20 \choose 18}\), the result is
cmbntn(20, 18)
## [1] 190
The PMF of a random variable \(X\) can be written as \[f(x) = {n \choose x} p^{x} (1-p)^{(n-x)}~~~~~~~(x=0,1,2,\cdots, n)\] By denoting the binomial distribution as \(b(n, p)\), it is possible to say: \[X \sim b(n,p)\] where \(n\) is the number of independent bernoulli trials, and \(p\) is the probability of success for each trial.
By using the function I make at second question, I create a R function to calculate probability mass of binomial distributions as below.
Pmf <- function(x, n, p){ ## function to calculate PMF of binomial distribution
## arguments:
## x = number of successe for each trial
## n = number of bernoulli trials
## p = probability of succesful outocome
binPmf <- r.fct(n) / (r.fct(n - x) * r.fct(x)) * p^x *(1 - p)^(n - x)
return(binPmf)
}
This is an outocome as the number of success\((x)\) \(= 2\), the number of trials\((n)\) \(= 2\), and probability of succesful outcome\((p)\) \(= 0.2\).
Pmf(2, 2, 0.3)
## [1] 0.09
Moreover, in the case of \(x = 2\), \(n = 2\), and \(p = 0.6\), we get the result as below.
Pmf(2, 2, 0.6)
## [1] 0.36
To display graphically the PMF of binomial distributions for \(n = 2\), \(5\), \(10\), and \(50\), I write commands by using the function I create at third question.
First of all, we have to load the packages we will use.
library(ggplot2)
library(gridExtra)
Then, we determine the number of trials \((n)\) and probability of succesful outcome \((p)\),
n <- 2 ## setting the number of trials
p <- 0.3 ## setting the probability of succesful outcome
in addition, make a variable to save the results got from the simulations.
## making variables to save the results from the simulations
bin01 <- c(Pmf(0, n, p), rep(NA, n)) ## PMF of binomial distribution
trl <- 0:2 ## the number of trials
After that, we need preparing to use for loop to run a simulation.
for(i in 1:n){ ## loop for the simulation, i = 1, 2, ..., trials
b <- Pmf(i, n, p) ## assigning the results of calculating PMF to b
bin01[i + 1] <- b ## storing b(results) in bin01
}
Finally, we use ggplot to visualize the results.
## making a data frame to use ggplot
deta <- data.frame(bin01, trl)
## making a plot
ss50 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 2, p = 0.3")
print(ss50)
We can use same commands written above to visualize the results, however need to change the number of trials and the probability of succesful outcome so that we get another results.
n <- 5
p <- 0.3
bin01 <- c(Pmf(0, n, p), rep(NA, n))
trl <- 0:5
for(i in 1:n){
b <- Pmf(i, n, p)
bin01[i + 1] <- b
}
deta <- data.frame(bin01, trl)
ss55 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 5, p = 0.3")
print(ss55)
n <- 10
p <- 0.3
bin01 <- c(Pmf(0, n, p), rep(NA, n))
trl <- 0:10
for(i in 1:n){
b <- Pmf(i, n, p)
bin01[i + 1] <- b
}
deta <- data.frame(bin01, trl)
ss60 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 10, p = 0.3")
print(ss60)
n <- 50
p <- 0.3
bin01 <- c(Pmf(0, n, p), rep(NA, n))
trl <- 0:50
for(i in 1:n){
b <- Pmf(i, n, p)
bin01[i + 1] <- b
}
deta <- data.frame(bin01, trl)
ss65 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 50, p = 0.3")
print(ss65)
library(ggplot2)
n <- 2
p <- 0.3
bin01 <- c(Pmf(0, n, p), rep(NA, n))
trl <- 0:2
for(i in 1:n){
b <- Pmf(i, n, p)
bin01[i + 1] <- b
}
deta <- data.frame(bin01, trl)
ss70 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 2, p = 0.6")
print(ss70)
n <- 5
p <- 0.6
bin01 <- c(Pmf(0, n, p), rep(NA, n))
trl <- 0:5
for(i in 1:n){
b <- Pmf(i, n, p)
bin01[i + 1] <- b
}
deta <- data.frame(bin01, trl)
ss75 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 5, p = 0.6")
print(ss75)
n <- 10
p <- 0.5
bin01 <- c(Pmf(0, n, p), rep(NA, n))
trl <- 0:10
for(i in 1:n){
b <- Pmf(i, n, p)
bin01[i + 1] <- b
}
deta <- data.frame(bin01, trl)
ss80 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 10, p = 0.6")
print(ss80)
n <- 50
p <- 0.6
bin01 <- c(Pmf(0, n, p), rep(NA, n))
trl <- 0:50
for(i in 1:n){
b <- Pmf(i, n, p)
bin01[i + 1] <- b
}
deta <- data.frame(bin01, trl)
ss85 <- ggplot(deta, aes(x = trl, y = bin01)) +
geom_bar(stat = "identity", aes(), colour = "black", fill = "white") +
ylab("PMF") +
xlab("trial") +
ggtitle("n = 50, p = 0.6")
print(ss85)
To easily make sure the difference between each trial, we are able to combine these plots by using grid.arrange().
grid.arrange(ss50, ss55, ss60, ss65)
grid.arrange(ss70, ss75, ss80, ss85)