Just as there are rules of grammar in composition, there are rules of graphing that help to visualize data for your audience. A well-designed graph should not need much explanation because the graph itself should make the trends in the data visually apparent. A well-designed graph also avoids decoration that conveys no useful information, such as depth on bars in a 2-D plot. Each of the following terms carries an important meaning.
Dependent and Independent Variables
Imagine that we want to make a graph of the amount of rainfall that occurs at different times of year. Rainfall depends on time of year, but time of year does not depend on rainfall. Therefore, rainfall is the dependent variable and time of year is the independent variable. A single graph may show more than one dependent variable, but a two-dimensional graph should have only one independent variable. For example, you may overlap plots of rainfall in the desert and rainfall in the tropics against time of year, or you could graph inches of rainfall in 2005 and 2006 against time of year.
The independent variable belongs on the x-axis (horizontal line) of the graph and the dependent variable belongs on the y-axis (vertical line). The x and y axes cross at a point referred to as the origin, where the coordinates are (0,0). In graphs with only positive values for x and y, the origin is in the lower left corner.
Each axis needs a scale to show the range of the data on that axis. The low end of the scale may be zero or a round number value slightly smaller than the smallest data point. The high end of the scale is usually a round number value slightly larger than the largest data point. The scale is measured off in major and minor tick marks. Typically the scale runs from low to high in easily counted multiples like 10s, 50s, 100s, etc. When graphs are compared side-by-side, consider scaling them to the same data range to make comparisons easier.
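The rounding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `axis_scale`, its parameters, and the sample rainfall values are hypothetical, and the tick step is chosen by the person making the graph rather than computed.

```python
import math

def axis_scale(data, step, start_at_zero=False):
    """Return (low, high, ticks) for an axis whose major ticks are multiples of `step`.

    Assumes `step` is positive and, when start_at_zero is True, that the data
    are non-negative.
    """
    # Low end: zero, or the nearest multiple of `step` at or below the smallest value.
    low = 0 if start_at_zero else math.floor(min(data) / step) * step
    # High end: the nearest multiple of `step` at or above the largest value.
    high = math.ceil(max(data) / step) * step
    n_intervals = round((high - low) / step)
    ticks = [low + i * step for i in range(n_intervals + 1)]
    return low, high, ticks

# Made-up monthly rainfall totals, for illustration only.
rainfall = [12, 37, 48, 83, 91]
low, high, ticks = axis_scale(rainfall, step=10)
print(low, high)   # 10 100
```

For graphs that will be compared side by side, one option is to pass the combined data from both graphs to the same call so that they share one range.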
The Axis Labels
Each axis needs a descriptive axis label indicating which variable is represented. For example, the y-axis label might read "Total Rainfall" and the x-axis label might read "Month."
If you are measuring time, you must include the units as well as the numeric values so people will know if you are talking about seconds, minutes, hours, days, years, etc. If you are measuring rainfall, people won't know if you mean inches, millimeters, gallons, etc. unless you include the units. Units should be reported following the axis label, as in "Total Rainfall (inches)."
The Data Points
Typically, each independent measurement represents a point on the graph. If there are multiple data sets being plotted on the same graph, each set should be represented by a unique symbol.
Should you connect the data points with a line? Typically the answer is yes if the data points are part of a series of measurements of the same thing, for example measurements taken over a period of time. The implication is that the values do not drop back to zero between measurements. However, if you are graphing a scatterplot, you might draw a trendline or regression line through the points, but you would not connect each one. In some cases, the relationship may not be linear but exponential, logarithmic, or some other mathematical function, so a curve might be more appropriate than a straight line. In either case, there should be a reason why a particular line or curve is chosen.
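As an illustration of a regression line through scattered points, here is a minimal sketch of an ordinary least-squares straight-line fit in plain Python. The function name `fit_line` and the sample values are hypothetical, and a straight line is assumed to be a reasonable model for the data; the sketch does not check that assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up scatterplot data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)

# Two endpoints are enough to draw the trendline across the range of x.
trendline = [(x, slope * x + intercept) for x in (min(xs), max(xs))]
```

The individual points stay unconnected on the graph; only the fitted line is drawn through them.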
The Legend
The legend becomes important when you are graphing more than one dependent variable. For example, your legend might indicate that green lines or bars represent rainfall in the tropics while brown lines or bars represent rainfall in the desert region.
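One way to keep symbols unique across data sets is to assign them from a fixed list and record the assignment for the legend. The sketch below is an assumption-laden illustration: the name `build_legend` is hypothetical, and the marker codes are borrowed from the style used by common plotting libraries rather than taken from this article.

```python
# Assumed symbol codes (circle, square, triangle, diamond, inverted triangle, cross).
MARKERS = ["o", "s", "^", "D", "v", "x"]

def build_legend(series_names, markers=MARKERS):
    """Map each data set name to its own symbol, refusing to reuse a symbol."""
    if len(set(series_names)) != len(series_names):
        raise ValueError("data set names must be unique")
    if len(series_names) > len(markers):
        raise ValueError("not enough distinct symbols for the data sets")
    return dict(zip(series_names, markers))

legend = build_legend(["Tropics", "Desert"])
print(legend)   # {'Tropics': 'o', 'Desert': 's'}
```

Raising an error when the symbols run out is a deliberate choice here: silently reusing a symbol would make two data sets indistinguishable on the graph.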
Use of Color
Colors or patterns should be used to help convey information, but should not be used simply for decoration. In the previous example, why were green and brown chosen? If the colors were reversed, would this be better or worse? Why?
Which Visual Representation?
The type of data you are presenting may be better suited to one kind of graph than another. For example, if your measurements are periodic samples of an ongoing event, like rainfall each day, then a line with points helps to convey that message. If, on the other hand, you are first averaging across distinct units of time like months, then bars might work better. If you are trying to visually display the pieces of a whole, a pie chart might be a good choice.
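The step of averaging across distinct units of time before drawing bars can be sketched as follows. The function name `monthly_means`, the input layout of (month, amount) pairs, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def monthly_means(daily):
    """Group (month, rainfall) pairs by month and return month -> mean daily rainfall."""
    by_month = defaultdict(list)
    for month, amount in daily:
        by_month[month].append(amount)
    return {month: mean(values) for month, values in by_month.items()}

# Made-up daily rainfall readings, for illustration only.
daily = [("Jan", 2), ("Jan", 4), ("Feb", 1), ("Feb", 0), ("Feb", 5)]
bar_heights = monthly_means(daily)
print(bar_heights)   # {'Jan': 3, 'Feb': 2}
```

Each key in the result would become one bar, while the raw daily readings would be the natural input for a line with points.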
Each point on the graph might represent a single data point or the average of a collection of measurements at that point. In the latter case, it is appropriate to show the range of variation around the average, usually with error bars. Typically the variation around the mean is expressed as the standard deviation, but the standard error (the standard deviation divided by the square root of the sample size) is sometimes used instead, particularly with small sample sizes. Whichever you choose, say which one the graph shows.
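The two measures of variation can be computed with Python's standard `statistics` module. This is a minimal sketch: the function name `summarize` and the sample measurements are hypothetical, and the sample (n - 1) form of the standard deviation is assumed.

```python
import math
import statistics

def summarize(measurements):
    """Return (mean, standard deviation, standard error) for repeated measurements."""
    n = len(measurements)
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)   # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)                # standard error of the mean
    return mean, sd, se

# Made-up repeated rainfall measurements for a single point on the graph.
mean, sd, se = summarize([1, 2, 3, 4, 5])
print(mean)   # 3
```

The standard error is always smaller than the standard deviation for more than one measurement, so error bars drawn from it look tighter; that is one reason to state which measure the bars represent.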
The Title
The title should be a brief statement describing the subject of the graph, but should not describe or interpret the results.
What Not to Include
The graph should only include elements that enhance the interpretation, and there should be a minimum of visual adornment. For example, bars should not be 3-D unless the third dimension adds information.