Understanding Uncertainty
An Introduction to Probability and Statistics
Preface
What are these notes?
These notes were originally written to supplement the Winter 2024 offering of STAT 1793 at the University of New Brunswick on the Saint John campus. They have since grown into a more comprehensive resource, attempting to direct a path through introductory probability and statistics material, both for students in a Statistics program, as well those who require probability or statistics for their major. Owing to the wide range of possible audiences, not all sections of the notes will be relevant to all students. I have tried to clearly indicate requirements throughout the text.
Further, these notes remain a work-in-progress. There is additional material being added, old material being revised or clarified, and some restructuring going on. If at any point you have comments, questions, suggestions, or corrections to these notes, please do reach out!
In the writing of these notes, I am indebted to the numerous introductory texts that I have previously read, and the multitude of Statistics instructors that I have had the pleasure of studying with. As a result, I am certain that I have drawn inspiration from many sources for examples, exercises, or techniques of description. Where parallels with other work are explicit and intentional, those works are mentioned alongside the relevant passages. Where parallels are the result of internalized knowledge coming forth as inspiration, I am not able to draw direct connections. However, I will advise that the following books are all excellent resources, and each has played a part in my current approach to teaching statistics, either directly or indirectly. For introductory topics consider Engelhardt and Bain (1991); Navidi (2010); Mendenhall et al. (2013) among countless others. For problems, consider Bassett et al. (2000); Grimmett and Stirzaker (2001); Schiller, Alu Srinivasan, and Spiegel (2012).
One final note is that our coverage of concepts throughout reflects my personal biases and proclivities towards what is important in a sequence on probability and statistics. For instance, compared to others I may have a larger emphasis on probability. This is reflected both in the ordering of material (we start at Probability not at Statistics), and in the weight that each topic is given. It is my belief that you need to be able to fluently speak the language of probability in order to study statistics, and the notes begin with a strong focus on this. If these notes are being used in a two course sequence, it is my belief that the first term should consider almost exclusively probability, leaving statistical results for a second course.
Using These Notes
These notes are best viewed online. You can access them from anywhere at https://dylanspicker.github.io/Understanding-Uncertainty/index.html. Viewing online will allow you to take advantage of interactivity (such as through code blocks and solutions hiding), the search feature (visible in the left hand margin), as well as better overall formatting. There is a version of these notes in PDF format available. To download them, press the PDF icon button () next to the note title in the left hand bar. The PDF notes contain all the same content, and will be updated alongside the web notes. The formatting of the PDF document may be less visually appealing compared to the web notes. Though an effort will be made to ensure that everything remains legible, and well-formatted, there may be formatting differences. Please reach out if any components of the PDF are illegible.
Studying Statistics (and other quantitative subjects)
To successfully use these notes, I would recommend a several step procedure:
- When reading the material, or watching lecture videos, focus on developing your understanding. Statistics is not a subject that is predominantly memory-driven. It is not about remembering the facts, formula, or question types, it is about understanding how these all come together.
- If you get lost along the way, ask questions.
- When you get to examples in these notes, do not look at the solutions directly. Instead, you should try to solve the examples yourself. If you get stuck, look at the solution, but only enough to see remove the current confusion, and then continue.
- The process of learning statistics, and most quantitative subjects for that matter, relies on doing the work in these subjects.
- Looking at solutions will often undermine your own understanding. It is a much different process to look at a solution and understand why it is correct, than it is to determine a solution for yourself.
- Complete the practice questions.
- Note: struggling with problems is a key component of learning in any mathematical discipline. It is only by struggling to understand how the pieces fit together that you will develop a proper understanding for yourself.
- Throughout your study, make sure you are continually asking yourself how this fits together, how this confirms or contradicts what you already know, and how this may be used in your life beyond these pages.
- Drawing personal connections to the material is a very effective way of ensuring that it sticks better.
Throughout the notes there will be references to computer code in a programming language known as R. R is the most common language used for statistics by practicing statisticians, and it is a great utility for many of the types of problems we will be considering throughout these notes. It is possible to pass through the notes without writing R code, however, familiarity with it will make the process of statistics far nicer to engage with. If you are using the web notes this code can all be run directly in the browser. If you are using the PDF notes, the code and output is still provided. In either case, I would suggest getting R running on your own device, if possible. It is free to download and install. I would also suggest installing R Studio, which is a program specifically designed for writing and running R code with a lot of nice features.
Some Mathematical Background
These notes require high school mathematics to complete. Some sections of the notes require calculus (limits, derivatives, and integrals). It is possible to pass through the core of these notes skipping any sections that specifically require calculus: these are noted in text. For all sections, the assumption is that you will be comfortable manipulating mathematical expressions, solving basic equations, using a calculator, and so forth. The following are a handful of topics which will come up throughout the notes that are assumed to be known, presented here in brief as a quick reference.
Logarithms and Exponent Rules
Recall that if \(a^x = b\), then we can solve for \(x\) through the use of logarithms. Specifically, \(x = \log_a(b)\), where we read this as “the logarithm with base \(a\) of \(b\).” The expression means that “\(x\) is the exponent we put on \(a\) to get \(b\).” In these notes1 we use the notation \(\log(x)\) to represent the natural logarithm. The natural logarithm is \(\log_e(x)\), where \(e \approx 2.71828\) is Euler’s number. The exponential function is given by \(e^{x} = \exp(x)\). Which notation is used depends on the specific setting, but both are equivalent.
Recall further the following exponent and log rules:\[\begin{align*} a^b\times a^c &= a^{b + c} \\ a^b\times c^b &= (ac)^b \\ a^{-b} &= \frac{1}{a^b} \\ \left(a^{b}\right)^c &= a^{bc} \\ a^0 &= 1 \\ \log(ab) &= \log(a) + \log(b) \\ \log(a^b) &= b\log(a) \\ \log(1) &= 0 \\ \log(e) &= 1. \end{align*}\]
Summation and Product Notation
If we wish to add up a series of numbers, \(x_1,x_2,\dots,x_n\) then we can use summation notation to record this. Specifically, \[x_1 + x_2 + \cdots + x_n = \sum_{i=1}^n x_i.\] Here \(i\) is an indexing variable just used to keep track of which number we are currently working on. Note that the following quantities will hold \[\begin{align*} \sum_{i=1}^n (x_i + y_i) &= \sum_{i=1}^n x_i + \sum_{i=1}^n y_i \\ \sum_{i=1}^n (x_i - y_i) &= \sum_{i=1}^n x_i - \sum_{i=1}^n y_i \\ \sum_{i=1}^n cx_i &= c\sum_{i=1}^n x_i. \end{align*}\]
We can also use a similar shorthand to discuss repeated products. That is, if we want to multiply \(x_1,x_2,\dots,x_n\), then we can write \[x_1\times x_2\times\cdots\times x_n = \prod_{i=1}^n x_i.\] Again, \(i\) is used as an indexing variable to keep track of what value is currently being multiplied. The following results hold \[\begin{align*} \prod_{i=1}^n x &= x^n \\ \prod_{i=1}^n b^{x_i} &= b^{\sum_{i=1}^n x_i} \\ \prod_{i=1}^n cx_i &= c^n\prod_{i=1}^n x_i. \end{align*}\]
Other Important Points
The following are some additional points that are useful to know, but which do not necessarily fit together nicely.
- For any two values \(x\) and \(y\), we get \((x + y)^2 = x^2 + 2xy + y^2\).
- The integers are the set of all whole numbers, \(\{0, \pm1,\pm2,\dots\}\). We will often denote the set of integers using \(\mathbb{Z}\).
- The natural numbers are the set of all counting numbers, either \(\{0,1,2,3,\dots\}\) or else \(\{1,2,3,4,\cdots\}\) depending on the source. We often denote the set using \(\mathbb{N}\).
- The real numbers are the set of all numbers that you think of when thinking of numbers. They include the integers, and any fractions, and anything that cannot be written as a fraction (like \(\pi\) and \(e\)) as well. We denote the real numbers as \(\mathbb{R}\).
- Intervals of real numbers are denoted as \((a,b)\) or \([a,b]\) (or a mixture, giving \([a,b)\) or \((a,b]\)). A round bracket means that the endpoint is not included in the set, where a square bracket means that it is. That is, a value of \(x\) is in \((a,b)\) if \(a < x < b\), where it is in \([a,b]\) if \(a \leq x \leq b\).
- We write “\(x\) is in the interval \((a,b)\)” mathematically using the \(\in\) symbol. Specifically, \(x \in (a,b)\).
- Infinity (\(\infty\)) is not a number. Rather, it is a statement regarding the lack of a bound. When we say that something is infinity, or infinite, we simply mean that it is larger than any of the natural numbers. If you were to be given any number, then infinity will be larger than that.
- Limits: note that, while limits are from calculus, they are useful for formally understanding a few concepts in class. The limit of a function, \(f(x)\), is simply the value that the function approaches as \(x\) approaches a specific point. So, for instance, \(\lim_{x\to a} f(x)\) is the value that \(f(x)\) approaches as \(x\) approaches \(a\). If \(f(a)\) exists then it is simply \(f(a)\). However, \(f(a)\) may not exist, at which point, we look at where the function is tending to.
- Infinite Limits: Of specific importance to us are infinite limits. Specifically, \[\lim_{x\to\infty}f(x)\] captures the beahviour of the function \(f(x)\) as \(x\) gets larger and larger. In many cases, \(f(x)\) will approach some specific value, which we assign to be the limiting value of the function.
Additional Resources
The following are helpful resources which may be used to supplement the material in the class. If you have any suggested resources, please feel free to send them to me, and I will include them here.
References
And in most math books and courses.↩︎