Biostatistics - Variation


An essential part of data analysis is summarizing the data using descriptive statistics. In addition, visual display of the data enables the researcher to recognize the distribution of the variables and the relationships between different variables. In this session students will be introduced to the probability laws, variable types and frequency distributions commonly used in clinical research.



Product (multiplication) rule

  • Used to estimate the probability that two events both occur
  • The form of the rule depends on whether or not the events are assumed to be independent
  • Independence: the probability of one event is not influenced by the outcome of the other, i.e. P(A+|B+) = P(A+|B-)
  • Independence assumed
    • P(A and B)= P(A) x P(B)
  • Independence NOT assumed (used in Bayes' theorem)
    • P(A and B)= P(A) x P(B|A) = P(B) x P(A|B)

Addition rule

  • The probabilities of all possible, mutually exclusive outcomes in a situation must add up to one
  • Mutually exclusive events
    • P(A or B)= P(A) + P(B)
  • Mutually inclusive events (non-mutually exclusive)
    • Modified addition rule
      • P(A or B or both)= P(A) + P(B) - P(A and B)
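
The product and addition rules above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names and the probabilities (0.30, 0.20, 0.50) are hypothetical values chosen only to show the arithmetic.

```python
# Minimal sketch of the product and addition rules.
# All probabilities below are hypothetical and chosen only for illustration.

def product_rule_independent(p_a, p_b):
    """P(A and B) = P(A) x P(B), valid only if A and B are independent."""
    return p_a * p_b

def product_rule_general(p_a, p_b_given_a):
    """P(A and B) = P(A) x P(B|A), no independence assumption needed."""
    return p_a * p_b_given_a

def addition_rule_exclusive(p_a, p_b):
    """P(A or B) = P(A) + P(B), valid only for mutually exclusive events."""
    return p_a + p_b

def addition_rule_general(p_a, p_b, p_a_and_b):
    """P(A or B or both) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_a = 0.30          # hypothetical P(A)
p_b = 0.20          # hypothetical P(B)
p_b_given_a = 0.50  # hypothetical P(B|A), used when independence is not assumed

both_independent = product_rule_independent(p_a, p_b)
both_dependent = product_rule_general(p_a, p_b_given_a)
either_exclusive = addition_rule_exclusive(p_a, p_b)
either_inclusive = addition_rule_general(p_a, p_b, both_independent)

print(f"P(A and B), independent:      {both_independent:.2f}")
print(f"P(A and B), using P(B|A):     {both_dependent:.2f}")
print(f"P(A or B), mutually exclusive: {either_exclusive:.2f}")
print(f"P(A or B or both):            {either_inclusive:.2f}")
```

Subtracting P(A and B) in the modified addition rule prevents the overlap between A and B from being counted twice.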

Types of variables

  • Nominal
  • Dichotomous (binary)
  • Ordinal (ranked)
  • Continuous (interval)
  • Continuous (ratio)
  • Risks and proportions
  • Counts and units of observation

  • Nominal variables: The simplest scale of measurement. Used for characteristics that have no numerical values, no measurement scale and no rank order. Also called a categorical or qualitative scale, e.g. skin color.
  • Dichotomous (binary): From the Greek for “cut into two”. Qualitative variables that have exactly two categories.
  • Ordinal (ranked): Categorical (qualitative) scales of three or more levels. Used for characteristics that have an underlying order to their values, i.e. a clearly implied direction from better to worse.
  • Numerical (continuous) scales: The highest level of measurement. Used for characteristics that can be given numerical values; the difference between numbers has meaning, e.g. BMI, height.
    • Has a value on a continuum
      • Interval: arbitrary zero point
        • Ex. Celsius (centigrade) temperature scale
      • Ratio: absolute zero point
        • Ex. Kelvin temperature scale
  • Numerical (discrete): Takes integer values (counts). Units of observation: person, animal, thing, etc. Presented in frequency tables.
  • Risk: Risk is the conditional probability of an event (e.g. death) in a defined population in a defined period.
  • Frequency distribution: A table or plot of data. As a table, it displays the value of each data point (or range of data points) in one column and the frequency with which that value occurs in the other column. As a plot, it displays the value of each data point (or range of data points) on one axis and the frequency with which that value occurs on the other axis.
  • Frequency table: A table showing the number and/or the percentage of observations occurring at different values (or ranges of values) of a variable.
  • Normal distribution: A symmetric bell-shaped probability distribution with a shape that is determined by mean (µ) and standard deviation (σ)
  • Standard normal distribution: The normal distribution with mean 0 and standard deviation 1
  • T-distribution: A symmetric distribution with mean 0 and a standard deviation larger than that of the standard normal distribution. Used for small sample sizes and when the population standard deviation is unknown.
  • Binomial distribution: The probability distribution that describes the number of successes X observed in ‘n’ independent trials, each with the same probability of occurrence
  • Chi-square distribution: The distribution used to analyze counts in frequency tables. A nonsymmetrical distribution with mean (µ) and variance (σ²). Used for categorical (nominal) data.
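
Several of the definitions above (frequency table, binomial distribution, standard normal distribution) can be illustrated with a short Python sketch that uses only the standard library. The data, the helper names and the parameter values are hypothetical and invented purely for illustration.

```python
from collections import Counter
from math import comb
from statistics import NormalDist

# Hypothetical ordinal scores (e.g. a 1-5 rating), invented for illustration.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3]

def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, 100.0 * c / n) for v, c in sorted(counts.items())]

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent trials, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def z_score(x, mu, sigma):
    """Convert x from a normal(mu, sigma) scale to the standard normal scale."""
    return (x - mu) / sigma

# Frequency table: value, count, percentage of observations.
table = frequency_table(scores)
print("value\tcount\tpercent")
for value, count, percent in table:
    print(f"{value}\t{count}\t{percent:.0f}%")

# Binomial: probability of exactly 2 successes in 5 trials with p = 0.5.
p_two_of_five = binomial_pmf(2, n=5, p=0.5)
print(f"P(X = 2 | n = 5, p = 0.5) = {p_two_of_five:.4f}")

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)
z = z_score(130, mu=100, sigma=15)  # hypothetical measurement and parameters
print(f"z = {z:.1f}, P(Z <= z) = {standard_normal.cdf(z):.4f}")
```

Converting a value to a z-score places it on the standard normal scale, so the same cumulative probabilities can be used regardless of the original mean and standard deviation.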

Additional (Optional) Reading

Chapter 9: Describing Data in Variation