List of All Flashcard Terms
Link to the Interactive Flashcards
Here is a list of all of the data science terms included in these flashcards:
Basic Data Science Terms
- data
- data science
- data point
- variables
- features
- quantitative data
- qualitative data
- research question
Basics of Random Experiments
- scatter plot
- histogram
- relative frequency
- random experiment
- outcome
- set
- event
- event class
- fair experiment
- probability
Basics of Hypothesis Testing and Summary Statistics
- disjoint
- set
- partition (data set)
- statistical hypothesis
- binary hypothesis test
- null hypothesis (for multiple groups)
- model-based methods
- model-free methods
- resampling
- plot legend
- outlier
- mode
- median
- average (or sample mean) of a data set
Probability Spaces and Combinatorics
- event class
- power set
- probability measure
- composite experiment
- trial (compound experiments)
- repeated experiment
- statistical regularity
- outcome (of a random experiment)
- sample space
- event
- fair (random experiment)
- combinatorics
- Cartesian product
- permutation
Statistical Studies and Null Hypothesis Significance Testing
- Type-I error
- Type-II error
- power
- exact permutation test
- bootstrap distribution
- confidence interval (CI)
- statistical study
- experimental study or experiment
- observational study
- randomized control trial (RCT)
- natural experiment
- population
- population study
- sample
- cross-sectional study
- longitudinal study
- longitudinal cohort study
- prospective study
- retrospective study
- post hoc analysis
- selection bias
- sampling distribution
- mode(s) of a distribution
- unimodal distribution
- tail probability
- right, or upper, tail
- left, or lower, tail
- two-sided tail
Conditional Probability and Statistical Independence
- conditional probability
- statistically independent (two events)
- statistically independent (any number of events)
- pairwise statistically independent
- statistically independent (s.i.) vs mutually exclusive (m.e.)
- conditional independence (events)
Bayes Rule and Optimal Decisions
- stochastic system
- likelihoods (discrete stochastic systems)
- a posteriori probability (discrete stochastic system)
- a priori probability (discrete stochastic system)
- Bayes' Rule (discrete stochastic system)
- base rate fallacy
- hidden state
- decision rule (discrete stochastic system)
- maximum likelihood (ML) rule (discrete stochastic system)
- MAP rule (discrete stochastic system)
- uninformative prior
- informative prior
- credible interval
Random Variables
- Borel sets (of $\mathbb{R}$)
- Borel field or Borel $\sigma$-algebra
- random variable
- range (of a random variable)
- discrete random variable
- probability mass function (PMF)
- cumulative distribution function (CDF)
- staircase function
- survival function (SF)
- discrete uniform random variable
- Bernoulli random variable
- Binomial random variable
- Geometric random variable
- Poisson random variable
- continuous uniform random variable
- probability density function (pdf)
- piecewise function
- (Continuous) Uniform RV
- Exponential RV
- inverse CDF
- Normal (Gaussian) RV
- Chi-squared RV
- Student's $t$ RV
Expected Values and Estimation
- expected value (discrete random variable)
- expected value (continuous random variable)
- mode (of a random variable)
- median (of a random variable)
- $n$th moment
- Law of the Unconscious Statistician (LOTUS)
- $n$th central moment
- variance (random variable)
- variance of a constant: $\operatorname{Var}[c]$
- variance when adding a constant: $\operatorname{Var}[X+c]$
- variance when multiplying by a constant: $\operatorname{Var}[cX]$
- variance of sum of independent random variables:
\begin{equation*} \operatorname{Var} \left[ \sum_{i=0}^{N-1} X_i \right] \end{equation*}
- vector
- estimate
- estimator
- estimator error
- estimator bias
- unbiased estimator
- standard error of the mean (SEM)
- sampling distribution
- effect size
Point Conditioning, Non-Bayesian and Bayesian Decision Rules with Continuous Observations
- likelihood for discrete-input, continuous-output systems
- receiver operating characteristic (ROC) curve
- area under curve (AUC)
- point conditioning
- a posteriori probability for discrete-input, continuous-output systems
- total probability for CDFs
- total probability for pdfs
- total probability for events with point conditioning
Categorical Data, Contingency Tables, and Chi-Squared Tests
- categorical data
- ordinal data
- nominal data
- contingency table
- degrees of freedom (contingency table)
- one-way table
Covariance, Correlation, and Linear Regression
- vector
- component or element (vector)
- scalar
- size (of a vector)
- zero vector
- ones vector
- standard unit vector
- vector addition
- scalar-vector multiplication
- component-wise vector multiplication (Hadamard product)
- dot product/inner product
- norm squared
- norm
- distance (vectors)
- transpose
- covariance (random variables)
- covariance (data vectors)
- correlation coefficient (random variables)
- correlation coefficient (data vectors)
- explanatory variable
- response variable
- coefficient of determination (simple linear regression)
- total variance (simple linear regression)
- explained variance (simple linear regression)
Jointly Distributed Random Variables, KLT, and PCA
- joint probability mass function (pair of random variables)
- joint cumulative distribution function (pair of random variables)
- joint probability density function (pair of random variables)
- marginal probability density function (pair of random variables)
- contour of equal probability density (pair of random variables)
- random vector
- mean vector
- covariance matrix
- correlation coefficient (random variables)
- uncorrelated
- correlation matrix
- iid
- broadcasting
- standardization
- eigenvector
- eigenvalue
- characteristic equation
- modal matrix
- eigendecomposition
- relating determinant and eigenvalues
- dimensionality reduction
- Karhunen-Loève Transform (KLT)
- principal components analysis (PCA)
- scree plot
- explained variance
- test-train split