Biostatistics - Variation Part 2


An essential part of data analysis is summarizing the data using descriptive statistics. In addition, visual display of the data enables the researcher to recognize the distribution of the variables and the relationships between different variables. In this session, students will be introduced to methods of data summarization and to graphs commonly used in clinical research.


  • Arithmetic mean: The arithmetic average of the observations, denoted by µ in the population and by x̄ in the sample. In a sample, the mean is the sum of the X values divided by the number of observations n in the sample (ΣX/n)
  • Geometric mean: The nth root of the product of n observations
  • Median: The middle observation; i.e., the one that divides the distribution of values into halves. It is also equal to the 50th percentile
  • Mode: The value of a numerical variable that occurs most frequently
  • Range: The difference between the largest and the smallest observation
  • Standard deviation: The most common measure of spread, denoted by σ in the population and by SD or s in the sample. It can be used with the mean to describe the distribution of observations. It is the square root of the average of the squared deviations of the observations from their mean (in a sample, the sum of squared deviations is divided by n − 1 rather than n)
  • Percentile: A number that indicates the percentage of a distribution that is less than or equal to that number
  • Interquartile range: The difference between the 25th percentile (first quartile) and the 75th percentile (third quartile)
  • Frequency table: A table showing the number and/or percentage of observations occurring at different values (or ranges of values) of a variable
  • Stem-and-leaf plot: A graphical display for numerical data. It is similar to both a frequency table and a histogram
  • Histogram: A bar graph of a frequency distribution of numerical observations
  • Frequency polygon: A line graph connecting the midpoints of the tops of the columns of a histogram
  • Boxplot: A graph that summarizes the data by displaying the minimum, first quartile, median, third quartile, and maximum
  • Error bar: A graph that displays the mean and a measure of spread for one or more groups
  • Proportion: The number of observations with the characteristic of interest divided by the total number of observations
  • Ratio: A part divided by another part. It is the number of observations WITH the characteristic of interest divided by the number of observations WITHOUT the characteristic of interest
  • Rate: A proportion associated with a multiplier, called the base (e.g., 1000, 100,000) and computed over a specified period
  • Contingency table: A table used to display counts and/or frequencies for two or more nominal or quantitative variables
  • Bar chart: A graph used with nominal characteristics to display the numbers or percentages of observations with the characteristic of interest
  • Scatterplot: A two-dimensional graph displaying the relationship between two numerical characteristics or variables
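
The measures of central tendency and spread defined above can be sketched in a few lines of Python using only the standard library. The sample values are hypothetical blood pressure readings invented for illustration, and the "inclusive" quartile method is just one of several conventions; other software may report slightly different quartiles for the same data.

```python
import math
import statistics

# Hypothetical sample (e.g., systolic blood pressure in mmHg); invented for illustration
x = [110, 118, 120, 120, 126, 132, 140]
n = len(x)

mean = sum(x) / n                     # arithmetic mean: sum of X divided by n
geo_mean = math.prod(x) ** (1 / n)    # geometric mean: nth root of the product
median = statistics.median(x)         # middle observation (50th percentile)
mode = statistics.mode(x)             # most frequently occurring value
value_range = max(x) - min(x)         # largest minus smallest observation

# Population form: square root of the average squared deviation (divide by n)
sd_population = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
# Sample form (s): the sum of squared deviations is divided by n - 1
sd_sample = statistics.stdev(x)

# Quartiles; "inclusive" is an assumed convention, not the only valid one
q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr = q3 - q1                         # interquartile range

print(f"mean={mean:.2f} geometric mean={geo_mean:.2f} median={median} mode={mode}")
print(f"range={value_range} sd (population)={sd_population:.2f} sd (sample)={sd_sample:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")
```

For positive data the geometric mean is never larger than the arithmetic mean, and the sample standard deviation is always slightly larger than the population form computed from the same values.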
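
The distinction between a proportion, a ratio, and a rate is easiest to see with numbers. The following is a minimal sketch in Python; the counts and the base of 1000 are hypothetical values chosen for illustration.

```python
# Hypothetical counts over one specified period; invented for illustration
with_characteristic = 30       # observations WITH the characteristic of interest
without_characteristic = 170   # observations WITHOUT the characteristic of interest
total = with_characteristic + without_characteristic

# Proportion: a part divided by the whole
proportion = with_characteristic / total

# Ratio: a part divided by another part
ratio = with_characteristic / without_characteristic

# Rate: a proportion multiplied by a base, for the specified period
base = 1000
rate_per_base = proportion * base

print(f"proportion={proportion:.3f} ratio={ratio:.3f} rate per {base}={rate_per_base:.1f}")
```

Note that the proportion has the total in its denominator, while the ratio has only the observations without the characteristic, so the two values differ even though the numerator is the same.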
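
A frequency table and a contingency table can both be built by simple counting. The sketch below uses Python's standard library; the records and the variable names (smoking status, disease status) are hypothetical and serve only to show the layout of the two tables.

```python
from collections import Counter

# Hypothetical records of (smoking status, disease status); invented for illustration
records = [
    ("smoker", "yes"), ("smoker", "no"), ("smoker", "yes"),
    ("non-smoker", "no"), ("non-smoker", "no"), ("non-smoker", "yes"),
    ("non-smoker", "no"), ("smoker", "yes"),
]

# Frequency table for one variable: count and percentage at each value
smoking_counts = Counter(status for status, _ in records)
frequency_table = {
    value: (count, 100 * count / len(records))
    for value, count in smoking_counts.items()
}

# Contingency table: counts for each combination of two nominal variables
contingency = Counter(records)

for value, (count, percent) in frequency_table.items():
    print(f"{value}: n={count} ({percent:.1f}%)")

for row in ("smoker", "non-smoker"):
    print(row, [contingency[(row, col)] for col in ("yes", "no")])
```

The cell counts of the contingency table always sum to the total number of observations, which is a quick check that no record was dropped or counted twice.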

Additional (Optional) Reading

Chapter 9: Describing Data in Variation