What do we mean by charts and graphs? We are going to use these terms to broadly refer to graphical representations of data. You are already familiar with many types of charts and graphs such as the food pyramid, a chart used to represent how much of each food group you should eat, or the battery symbol on a screen, a chart that represents your device's battery level. In this lesson, we are going to cover some commonly used graphs and charts. We will cover histograms, bar graphs, pie charts, scatterplots, line charts, and box plots. Being able to analyze each one correctly will allow you to draw meaningful conclusions from sets of data. It’s also important to keep in mind that a single data set can usually be represented using more than one type of chart/graph.

Histograms are used to represent the **distribution** of a data set over a continuous domain. The domain is broken down into intervals, commonly referred to as **bins**. Each bin covers some range of the variable being measured. This variable could be a time, weight, price, etc. The applications of histograms are practically endless and their uses range from scientific research to everyday business and daily activity. They are a clear way of representing the distribution of a data set.

**Now let’s look at an example.**

This histogram is being used to represent the distribution of minutes that the last 66 cars spent waiting at a railroad crossing. Imagine the data set as a list made up of the specific time each car spent waiting at the crossing.

This histogram has the number of cars on the *y*-axis and minutes spent waiting on the *x*-axis. The height of each bin indicates how many cars waited for an amount of time in that range. Therefore, the values of all the bins should add up to the total of 66 cars. You can think of the edges of each bin as representing the outer limits of the bin’s range.
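To make the idea of bins concrete, here is a minimal sketch of how a list of raw waiting times could be grouped into one-minute bins. The times in `wait_times` are made up for illustration and are not the 66-car data set; the names `wait_times`, `BIN_WIDTH`, and `bin_counts` are likewise just assumptions for this example.

```python
import math
from collections import Counter

# Hypothetical waiting times in minutes (NOT the 66-car data set from the lesson).
wait_times = [0.5, 1.2, 1.8, 2.4, 2.9, 3.1, 3.3, 3.7, 4.0, 4.6, 5.5, 5.9]

BIN_WIDTH = 1  # each bin covers a one-minute range, e.g. 3 <= t < 4

# Map each time to the lower edge of its bin, then count the times in each bin.
bin_counts = Counter(math.floor(t / BIN_WIDTH) * BIN_WIDTH for t in wait_times)

for lower_edge in sorted(bin_counts):
    upper_edge = lower_edge + BIN_WIDTH
    print(f"{lower_edge}-{upper_edge} min: {bin_counts[lower_edge]} cars")

# Every car lands in exactly one bin, so the bin heights sum to the number of cars.
assert sum(bin_counts.values()) == len(wait_times)
```

The final check mirrors the point made above: the bin values always add up to the size of the data set.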

This same histogram could also be displayed as follows:

Notice that the same data is represented on this histogram, but the way the bins are labeled is slightly different. Rather than listing the intervals for each bin, this histogram only lists the endpoints of the bins.

Now let’s go over some basic analysis of this histogram. As with any data set, we can use the histogram to help us find the mode, median, and mean.

So, what would be the **mode** of this histogram? The mode is whichever value has the greatest frequency; in other words, which value shows up the most. In this case, the mode is the bin making up the 8-9 minute interval because the greatest number of cars (14) waited for 8 to 9 minutes.

How about the **median** of this histogram? Remember that the median is the middle value of a data set. Think about listing out all the times for the 66 cars. The median would be whatever value is exactly in the middle.

Notice how there are 33 cars (half the total 66) on either side of the dashed line, which is being used to represent the median waiting time. This means the median waiting time would be 7 minutes.
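The mode and median can also be read off a list of bin counts. The sketch below uses hypothetical counts (not the values from the railroad-crossing histogram) and assumes the bins are listed in order from left to right.

```python
# Hypothetical bins as ((lower edge, upper edge), count); not the lesson's data.
bins = [((0, 1), 2), ((1, 2), 4), ((2, 3), 5), ((3, 4), 7), ((4, 5), 2)]

total = sum(count for _, count in bins)

# Mode: the bin with the greatest frequency.
mode_bin, mode_count = max(bins, key=lambda item: item[1])

# Median: add up the counts from left to right until half of the data is covered.
running_total = 0
median_bin = None
for interval, count in bins:
    running_total += count
    if running_total >= total / 2:
        median_bin = interval
        break

print(f"Mode bin: {mode_bin} with {mode_count} values")
print(f"Median falls in bin: {median_bin}")
```

If the running total reaches exactly half of the data at the end of a bin, as with the 33 cars on either side of the dashed line, the median sits at that bin's upper edge.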

How about the **mean**, or average, of this histogram? This would be the average time spent waiting at the crossing. With a histogram, the mean can be estimated by multiplying the middle value of each bin by that bin's frequency, adding together the products for all the bins, and then dividing by the total number of data points. You can see how this can be done on the histogram below.

Notice how in red, the middle value of each bin’s interval is multiplied by each bin’s respective height. Once this is done for all bins, we add the values together and then divide by the total number of cars. In this case, we have 66 total cars. The following represents this complete calculation for the mean.

The answer comes out to an average time of 6.54 minutes waiting at the railroad crossing.
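The same midpoint calculation can be sketched in a few lines of code. The bin counts here are hypothetical and chosen only to keep the arithmetic easy to follow, so the result is not the 6.54 minutes found above.

```python
# Hypothetical (lower edge, upper edge, frequency) triples; not the lesson's data.
bins = [(0, 1, 2), (1, 2, 4), (2, 3, 5), (3, 4, 7), (4, 5, 2)]

total_count = sum(freq for _, _, freq in bins)

# Multiply each bin's midpoint by its frequency, then add the products together.
weighted_sum = sum(((low + high) / 2) * freq for low, high, freq in bins)

# Divide by the number of data points to get the estimated mean.
estimated_mean = weighted_sum / total_count
print(f"Estimated mean: {estimated_mean:.2f} minutes")
```

Because every value in a bin is treated as if it sat at the bin's midpoint, this gives an estimate of the mean rather than the exact mean of the raw data.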

As you now know, histograms are used for grouping data into intervals, or “bins”. The height of each bin can be used to determine that bin's frequency out of the whole data set. It is important to remember that histograms are just an approximate representation and that they are good for looking at trends and distribution for a data set, but they are not good at giving specific details on individual pieces of data from the set.

Bar graphs are a great way to break down a series of data into different categories with each category being proportionally represented by a bar. This means that the height of each bar represents the value of that category, with the units indicated on the opposite axis. Bar graphs can also be a great way to compare multiple series of data paired into the same unique categories.

**Example:**

The following bar graph shows the breakdown of Mark’s monthly expenses grouped into 8 different categories: Food, Transportation, Housing, Entertainment, Pets, Phone/Internet, Shopping, and Saving. The amount of spending in each category is represented in dollars by the *y*-axis.

**What can we determine from this bar graph?**

**What is Mark's largest monthly expense?** Housing. He spends $800 per month on it.

**How much money does he spend on food each month?** He spends $300 each month on food.

**How much is Mark's entire monthly budget?** This can be calculated by adding together the values of all the categories. Going from left to right across the graph: 300 + 250 + 800 + 100 + 75 + 100 + 200 + 175 = 2000. His entire monthly budget is $2000.
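These questions can be checked with a short sketch. It assumes that the values in the sum above appear in the same order as the categories were listed (Food first, Saving last); the variable names are only illustrative.

```python
# Mark's monthly spending in dollars. Assumption: the order of the values in the
# lesson's sum matches the order in which the categories were listed.
mark_budget = {
    "Food": 300,
    "Transportation": 250,
    "Housing": 800,
    "Entertainment": 100,
    "Pets": 75,
    "Phone/Internet": 100,
    "Shopping": 200,
    "Saving": 175,
}

# The tallest bar is the category with the largest value.
largest_category = max(mark_budget, key=mark_budget.get)

# The entire budget is the sum of all the bars.
total_budget = sum(mark_budget.values())

print(f"Largest expense: {largest_category} (${mark_budget[largest_category]})")
print(f"Total monthly budget: ${total_budget}")
```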

We can also make a bar graph showing a monthly budget breakdown for multiple people with the same categories. This allows us to compare two different data sets using different colored bars.

**Now what can we determine from this graph?**

**Whose budget has more spending categories?** Mark's. Jill does not have any expenses for pets or savings, meaning she has two fewer spending categories than Mark.

**Who has the larger monthly budget?** Mark's budget was calculated to be $2000 by adding all the blue bars. Jill’s budget comes out to be $2100 by adding all the orange bars together. Therefore, Jill has the larger budget by $100.

**In which categories does Jill spend more than Mark?** Jill spends more on food, housing, entertainment, and shopping than Mark does.

**Is there any category with equal spending between Jill and Mark?** Yes, they both spend $100 per month on phone and internet.

As you can see, bar graphs are useful for comparing different categories, as well as for comparing multiple sets of data.

Pie charts are one of the most basic and common ways to display the **proportion** of each category relative to the whole. This is done by making a circle, or “pie,” and dividing it into slices that proportionally represent each category’s contribution to the whole. The slices of a pie chart will often be labeled with the percentage of the pie each slice comprises. These charts are an intuitive way to represent the distribution of a data set into distinct groups that are easy to understand.

**Let’s look at an example.**

A class of students is given a survey about how many siblings they have. Their responses are represented on the following pie chart.

The pie chart shows us that 21% of the students replied that they have no siblings, 28% have one sibling, 24% have two siblings, 18% have three siblings, and 9% have four or more siblings. Here are some other things you might be asked about this pie chart:

**What is the mode of this data set?** The mode is one sibling because 28% of students replied that they have one sibling, which makes up the largest piece of the pie.

**What’s the least common response?** The slice representing 4 or more siblings makes up the smallest portion of the pie, only 9%. Therefore, this is the least common response.

**If you were to randomly ask someone in that class how many siblings they have, what is the probability they have at least 2 siblings?** We want to know what proportion of the pie is made up by responses for 2 or more siblings. This means the slices for two, three, and 4+ all count and should be added together. Adding the percentages for these slices together, we get 24 + 18 + 9 = 51%. This means there is a 51% chance that the person would have at least 2 siblings.

**If you are told that the survey was answered by 500 students, how many of those students have exactly 3 siblings?** We know that 18% of the students have 3 siblings. This means that 18/100, or 0.18, of the students have 3 siblings. We need to multiply the total number of students surveyed, 500, by this proportion, 0.18, to find the number of students with 3 siblings. This gives us an answer of 0.18 × 500 = 90. There are 90 students that have 3 siblings.
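The pie chart questions above can be worked through in code as a quick check. The percentages are the ones given for the survey; the dictionary keys and variable names are just labels chosen for this sketch.

```python
# Percentage of students giving each response (from the pie chart in the lesson).
survey_percentages = {"0": 21, "1": 28, "2": 24, "3": 18, "4+": 9}

# The slices of a pie chart should account for the whole pie.
assert sum(survey_percentages.values()) == 100

# Largest and smallest slices.
mode_response = max(survey_percentages, key=survey_percentages.get)
least_common_response = min(survey_percentages, key=survey_percentages.get)

# "At least 2 siblings" covers the slices for two, three, and four or more.
at_least_two = sum(survey_percentages[key] for key in ("2", "3", "4+"))

# Convert a percentage into a head count for 500 surveyed students.
students_surveyed = 500
students_with_three = survey_percentages["3"] * students_surveyed // 100

print(mode_response, least_common_response, at_least_two, students_with_three)
```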

A scatterplot is a group of points plotted on a Cartesian plane representing two different variables. You can think of this as points placed on a plane consisting of an *x*-axis and *y*-axis, with each axis labeled as one of the variables. The location of each point is defined by its coordinate pair (*x*, *y*). These plots are commonly used to analyze if there is some sort of relationship, or **correlation**, between the two variables being measured.

**Consider the following example.**

The heights and weights of 200 adults are recorded. Each person’s height and weight is represented by a point on the scatterplot where the *x*-coordinate represents the height value and the *y*-coordinate represents the weight.

**What can we learn about this data set from analyzing the scatterplot?**

**What is the heaviest weight out of the 200 people surveyed?** We would need to find whichever point has the greatest *y*-value. This appears to be roughly 160 pounds.

**What would be this person’s height?** The point with the highest weight has a corresponding *x*-value of about 71 inches. This means the heaviest person in the group weighed around 160 pounds and was around 71 inches tall.

**What is the range of heights for the people surveyed?** All of the heights are roughly between 63 and 74 inches, meaning there was a range of about 11 inches in the heights of the people surveyed.

**What is the relationship between the height and weight values?** In general, it appears there is a weak positive relationship between the two. This means that, in general, as one variable increases, the other variable increases as well, but this is not true for every point.

You will often see a squared R value, R², presented with a scatterplot. R² can be thought of as a measurement of the **strength** of the relationship between the two variables being measured on a scatterplot.

An **R² value of 1** defines a perfect relationship, meaning one variable can perfectly predict the value of the other variable. In other words, there is a direct linear relationship between the two variables. The example below shows a trend line included along with the data set, showing us that all the points lie along the line.

You would expect an R² of 1 when relating feet to inches because 1 foot always equals 12 inches. This means if you know the length of something in inches, you also know its length in feet.
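As a rough check of the feet-and-inches example, the sketch below computes R² as the square of the Pearson correlation coefficient, which is how R² works out for a straight-line fit between two variables. The function name `r_squared` and the sample lengths are assumptions made for this illustration.

```python
def r_squared(xs, ys):
    """Return the square of the Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation ** 2 / (variation_x * variation_y)


# Hypothetical lengths measured in feet, and the same lengths in inches.
lengths_in_feet = [1, 2, 3.5, 5, 6]
lengths_in_inches = [12 * feet for feet in lengths_in_feet]

print(r_squared(lengths_in_feet, lengths_in_inches))
```

Since every point lies exactly on the line inches = 12 × feet, the result is 1, up to floating-point rounding.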

An **R² value of 0** defines no relationship between the two variables. This means knowing the value of one variable gives no information on the value of the other variable. For example, consider a scatterplot showing the relationship between the number of letters in someone's name and how high they can jump. This would likely result in an R² value near 0 because you do not expect these to be related to each other. Below is an example of a plot with an R² value near 0.

A line chart, or line graph, consists of a series of data points that are connected by a line. Line charts are similar to scatterplots, but while a scatterplot maps the relationship between two values and may contain a large number of points, a line chart consists of relatively few ordered measurements that are joined by straight line segments (rather than a line of best fit). Line graphs are frequently used to show trends over time.

**Now let’s look at an example.**

This line chart represents the number of sunglasses sold (in thousands) in each month of the year.

**What can we observe from this graph?** We can see the general trend in sunglasses sales throughout the year. We can see that sunglasses sales increase from January until July, and then decline rather abruptly after September.

**In which month is the largest number of sunglasses sold?** The month with the largest number of sunglasses sold is July.

**Between which two months does the number of sunglasses sold increase the most?** The portion of the line graph that has the steepest positive slope is between April and May. This means the greatest increase in sunglasses sold is between these months.

**Between which two months does the number of sunglasses sold decrease the most?** Just like when finding the largest increase, the largest decrease on a line chart will be the steepest line segment, but with a negative slope. Looking at the chart, it could be hard to tell if the largest decrease occurs between August and September, or September and October. To determine the largest decrease, we are looking for the line segment that has the most negative slope. We can use the slope formula, (change in *y*) / (change in *x*), to find the slope of each segment (note that the *x*-axis is essentially a measure of time, so the change in *x* = 1 month). Comparing the slope from August to September with the slope from September to October shows that the September-to-October slope is the more negative of the two. Therefore, the greatest decrease occurred between September and October.
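Finding the steepest segments amounts to comparing the change between each pair of neighboring months. The sketch below uses made-up sales figures (in thousands) that follow the same general shape as the chart; they are not values read from the lesson's graph.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical sunglasses sales in thousands (NOT read from the lesson's chart).
sales = [10, 12, 15, 20, 32, 40, 45, 42, 30, 15, 10, 8]

# Slope of each segment = change in sales / change in time, where the change in
# time between neighboring points is always 1 month.
changes = [
    (months[i], months[i + 1], sales[i + 1] - sales[i])
    for i in range(len(sales) - 1)
]

largest_increase = max(changes, key=lambda change: change[2])
largest_decrease = min(changes, key=lambda change: change[2])
peak_month = months[sales.index(max(sales))]

print(f"Peak month: {peak_month}")
print(f"Largest increase: {largest_increase}")
print(f"Largest decrease: {largest_decrease}")
```

Because the change in *x* is 1 month for every segment, comparing slopes is the same as comparing the month-to-month differences.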

A box plot, also known as a **box-and-whisker plot**, is used to show the distribution of data. It is a useful visual tool because it allows us to easily see the median and range of a set of data, as well as its different **quartiles**. Quartile is a statistical word that means quarter: data can be divided into four quartiles, where the first quartile represents the lowest 25% of data points and the fourth quartile represents the highest 25% of data points. The median marks the boundary between the second and third quartiles, since 50% of the data fall below this point and 50% fall above it.

Let’s look at a simple example.

Above is a simple box-and-whisker plot with a normal distribution. This means that the data are symmetrical about the median. We can see that the line in the middle represents the **median** - in this box-and-whisker plot, the median is 15. The bottom and top “whiskers” represent the **minimum** and **maximum** data points, respectively.

The box-and-whisker plot can be divided into four quartiles, as shown above. The first quartile (Q1) is represented by the area between the bottom whisker and the bottom of the box. The second quartile (Q2) contains the data points that fall in the area between the bottom of the box and the median. The third quartile (Q3) contains points between the median and the top of the box. The fourth quartile (Q4) contains points that fall between the top of the box and the top whisker. For example, the data point 16 would fall within the third quartile.

We can also easily determine the range of a box-and-whisker plot by subtracting the minimum value from the maximum value. The range of this set of data would be 18 − 12 = 6.
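The values a box plot displays can be computed directly from a list of data. The sketch below uses a small hypothetical data set chosen so that its minimum, median, and maximum match the example above (12, 15, and 18). The individual data points are an assumption, and so is the "inclusive" quartile method, which is only one of several conventions for locating the edges of the box.

```python
import statistics

# Hypothetical data chosen to match the example's minimum, median, and maximum.
data = [12, 13, 14, 14, 15, 16, 16, 17, 18]

minimum = min(data)                # bottom whisker
maximum = max(data)                # top whisker
median = statistics.median(data)   # line in the middle of the box

# Cut points that split the data into four quarters: the bottom of the box,
# the median, and the top of the box.
box_bottom, middle, box_top = statistics.quantiles(data, n=4, method="inclusive")

# Range = maximum value minus minimum value.
data_range = maximum - minimum

print(f"min={minimum}, box bottom={box_bottom}, median={median}, "
      f"box top={box_top}, max={maximum}, range={data_range}")
```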

**Now let’s look at a more complicated example.**

The box plot below shows a class’s scores on 3 different tests.

You may notice that these boxes aren’t as symmetrical as in our first example. The shape or symmetry of the box can tell us about the distribution and variance of the data.

Let's think about the kinds of questions we can answer using a box-and-whisker plot:

**Which test had the highest median score?** Looking at the median line of each test, we can see that the median was the highest on Test 1, and the lowest on Test 3.

**Which test had the highest maximum score?** Looking at the top whisker, Test 3 had the highest maximum score, and Test 1 had the lowest maximum score.

**Which test had the largest range of scores?** We can determine the range of the data by comparing the distance between the top whisker and the bottom whisker. Test 3 has the greatest distance between the top and bottom whisker, so it has the greatest range.
