R has a built-in function factorial()
, but we will build our own factorial function get_factorial()
. The following codes are just samples. You do not have to write the exactly same codes.
get_factorial <- function(x) { ## function to calculate a factorial
## Argument: x = vector of non-negative integers
## Return: res = the factorials of x
res <- rep(NA, length(x)) ## vector to save the factorials
for (i in seq_along(x)) { ## loop for x's
if (x[i] < 0 | x[i] != round(x[i])) { ## NaN if x[i] is negative or decimal
res[i] <- NaN
} else if (x[i] == 0){ ## factorial of 0 equals 1
res[i] <- 1
} else { ## for a non-negative integer, calculate
res[i] <- prod(1:x[i])
}
}
return(res)
}
This function treats three conditions differently. The conditions are as follows:
In Case 1, the function simply gives NaN (Not a Number) because the factorial of a negative or decimal number is not defined. (The gamma function is an extension of the factorial. While the factorial is usually only used for non-negative integers, the gamma is for non-negative real numbers. Since the built-in factorial()
uses gamma()
for its calculation, factorial()
can handle decimal numbers, which is not necessarily desirable.)
In Case 2, we know the anwer, and we do not have to calculate anything. Thus, it simply returns the answer 1. When the function reaches return()
, it ends evaluating the code. The rest of the code (after return()
) is not evaluated.
Then, in Case 3, we need to calculate the factorial. Finally, the function returns the the vector “res”, which contains the factorials of \(x\).
Let’s try using the function.
get_factorial(0)
## [1] 1
get_factorial(5)
## [1] 120
get_factorial(c(0, 5))
## [1] 1 120
get_factorial(c(-2, 0, 5, 6.2))
## [1] NaN 1 120 NaN
R has a built-in function choose()
, but we will create our own function get_choose()
. Since it is tedious to define the function from scratch, let’s use get_factorial()
function to define get_choose()
.
get_choose <- function(n, k) { ## combinations for choosing k from n
## Only works for non-negative n
## Arguments: n = vector of the total number of elements
## k = vector of the number of elements to choose from n
## Return: res = vector of possible combinations
## NOTE: n and k must have the same number of elements
if (length(n) != length(k)) {
stop(message = "lengths of n and k must be the same")
}
res <- rep(NA, length(n)) ## vector to save the results
for (i in seq_along(n)) {
if (n[i] != round(n[i])) { ## stops if n is decimal
stop(message = paste0("n[", i, "] (" , n[i], ") must be integer"))
} else if (k[i] != round(k[i])) { ## stops if k is decimal
stop(message = paste0("k[", i, "] (" , k[i], ") must be integer"))
} else if (n[i] < 0){ ## stops if n is negative
stop(message = paste0("n[", i, "] (" , n[i], ") must be non-negative"))
} else if (n[i] < k[i] | k[i] <0){ ## 0 prob. if n < k or if k < 0
res[i] <- 0
} else { ## calculate the combinations for the other
res[i] <- get_factorial(n[i]) /
(get_factorial(k[i]) * get_factorial(n[i] - k[i]))
}
}
return(res)
}
First, this function checks if the lengths of the vectors n and k are the same. If not, it prints an error message. As you can see in the codes, we use error()
to handle anomalies. We can tell the user what is wrong by priting a message defined in message.
Then, it handles the following five cases separately. Please note that these five cases are not mutually exclusive. If one condition were met, the other cases would not be evaluated. The conditions are examined in the following order.
In Cases 1 through 3, the function prints an error message, but three cases print different one. For Cases 1 and 2, the error asks the user to use an integer. For Case 3, it asks for a non-negative number.
In Case 4, we know that there is no possible combination, and we get 0. We have to calculate the number of possible combinations only in Case 5 using our own get_factorial()
.
As this example shows, we can use a function(s) to define another function. Thinking bacward from this fact, we do not have to build a large function in a lump. Instead, we can build a lot of small functions and combine them to make a large one. It is usually desirable to separate large functions into small chunks (1) because it is easier to maintain or revise small functions than a large one, and (2) because a small function can be used for a lot of different functions (or different purposes).
Now, let’s try using the function we created. You should also try to get an error.
get_choose(10, 3)
## [1] 120
get_choose(3, 8)
## [1] 0
get_choose(c(10, 3), c(3, 8))
## [1] 120 0
R has a built-in function dbinom()
, but we would like to define our own function pmf_binom()
.
pmf_binom <- function(x, n, prob) { ## PMF of binomial dist.
## Args: x = a random variable vector
## n = num. of Bernoulli trials
## prob = success rate of each Bernoulli trial, [0, 1]
## Return: mass = a vector of PMF f(x)
mass <- rep(NA, length(x)) ## vector to save prob. mass for each x
for (i in seq_along(x)) { ## calculate f(x) for each x
if (prob < 0 | prob > 1){ ## stops if prob isn't in [0, 1]
stop(message = paste0("prob (", prob, ") must be in [0, 1]"))
} else if (n < 0) { ## stops if n is negative
stop(message = paste0("n (", n, ") must be non-negative"))
} else if (n != round(n)) { ## stops if n is a decimal number
stop(message = paste0("n (", n, ") must be integer"))
} else if (x[i] != round(x[i])) { ## decimal x[i] has 0 prob.
mass[i] <- 0
} else {
mass[i] <- get_choose(n, x[i]) * prob^x[i] * (1 - prob)^(n - x[i])
}
}
return(mass)
}
This function takes a vector of \(x\)’s as an argument and calculates the probability mass for each \(x\) using a for-loop, where it disinguishes four cases.
In Cases 1 through 3, it prints an appropriate error massage. In Case 4, it saves the value of 0 as the result because there is no possibility to get a decimal number of successes. Only in Case 5, it calculates the probability mass of \(x\).
Let’s display graphically the PMF of the binomial distribution using pmf_binom()
.
Let’s set \(p=0.5\).
p <- .5
We will make figures of a binomial distribtuion for \(n= 2, 5, 10\), and \(50\).
First, let’s create a vector of the values for \(n\).
n_vec <- c(2, 5, 10, 50)
For each \(n\), we’ll make a figure of the PMF. Since we only have to change the value of \(n\) to make different figures, a for-loop works well here.
par(mfrow = c(2,2), oma = c(0, 0, 2, 0)) ## set 2-by-2 plot area
for (i in seq_along(n_vec)){ ## a loop for n
x_vec <- 0:n_vec[i] ## possible values of x
mass_vec <- pmf_binom(x = x_vec, n = n_vec[i], prob = p)
plot(x_vec, mass_vec, ylim = c(0, max(mass_vec)),
type = 'h', lwd = 2, col = "royalblue",
xlab = "x", ylab = "f(x): probability",
main = paste("n =", n_vec[i]))
}
mtext("Bin(n, p = 0.5)", outer = TRUE, cex = 1.5)
For \(p=0.8\), we only have to change the value of \(p\) and re-run the codes above.
p <- .8
par(mfrow = c(2,2), oma = c(0, 0, 2, 0)) ## set 2-by-2 plot area
for (i in seq_along(n_vec)){ ## a figure for each n
x_vec <- 0:n_vec[i] ## possible values of x
mass_vec <- pmf_binom(x = x_vec, n = n_vec[i], prob = p)
plot(x_vec, mass_vec, ylim = c(0, max(mass_vec)),
type = 'h', lwd = 2, col = "tomato",
xlab = "x", ylab = "f(x): probability",
main = paste("n =", n_vec[i]))
}
mtext("Bin(n, p = 0.8)", outer = TRUE, cex = 1.5)
Two for-loops are used here. One is for the values of the success rate \(p\); the other is for the number of Bernoulli trials \(n\). The loop for \(p\) is nested in the loop for \(n\). We will save the probability mass in the data frame “binom_res”.
p_vec <- c(.1, .3, .5, .9) ## use 4 different p's
## determine the dimension of the data frame
n_obs <- sum((n_vec + 1) * length(p_vec))
## make an empty data frame to save the results
binom_res <- data.frame(n = rep(NA, n_obs),
p = rep(NA, n_obs),
x = rep(NA, n_obs),
mass = rep(NA, n_obs))
## index to specify a row of the data frame
index <- 1
for (n in n_vec) { ## loop for 4 n's defined above
for (p in p_vec) { ## loop for 4 p's
x_vec <- 0:n ## possible values of x
## PMF
mass_vec <- pmf_binom(x = x_vec, n = n, prob = p)
## save the results to the data frame
slicer <- index:(index + n) ## set locations to save the results
binom_res$n[slicer] <- n
binom_res$p[slicer] <- p
binom_res$x[slicer] <- x_vec
binom_res$mass[slicer] <- mass_vec
## increment the index by n + 1 (num of possible outcomes)
index <- index + n + 1
}
}
Let’s examine the content of the data frame.
head(binom_res)
## n p x mass
## 1 2 0.1 0 0.81
## 2 2 0.1 1 0.18
## 3 2 0.1 2 0.01
## 4 2 0.3 0 0.49
## 5 2 0.3 1 0.42
## 6 2 0.3 2 0.09
Now we have the probability mass for each combination of \(n\), \(x\), and \(p\).
Lastly, let’s display the result graphically.
library("ggplot2")
binom_plt <- ggplot(binom_res, aes(x = x, y = mass)) +
geom_bar(stat = "identity") +
facet_grid(p ~ n, scale = "free_x", labeller = label_both) +
labs(y = "Probability", title = "PMF of Bin(n, p)")
print(binom_plt)