Principles of Analytic Graphs

R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Workspace loaded from C:/Users/kk/PortableApps/Git/home/k-allika/repos/DataScienceWithR/.RData]

setwd("C:/Users/kk/PortableApps/Git/home/k-allika/repos/DataScienceWithR/04_Exploratory_Data_Analysis/week01/workspace")
library(swirl)

| Hi! I see that you have some variables saved in your workspace. To keep things running
| smoothly, I recommend you clean up before starting swirl.

| Type ls() to see a list of the variables in your workspace. Then, type rm(list=ls()) to
| clear your workspace.

| Type swirl() when you are ready to begin.

swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you did
| then. If you are new, call yourself something unique.

What shall I call you? Krishnakanth Allika

| Please choose a course, or type 0 to exit swirl.

1: Exploratory Data Analysis
2: Getting and Cleaning Data
3: R Programming
4: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Principles of Analytic Graphs
 2: Exploratory Graphs
 3: Graphics Devices in R
 4: Plotting Systems
 5: Base Plotting System
 6: Lattice Plotting System
 7: Working with Colors
 8: GGPlot2 Part1
 9: GGPlot2 Part2
10: GGPlot2 Extras
11: Hierarchical Clustering
12: K Means Clustering
13: Dimension Reduction
14: Clustering Example
15: CaseStudy

Selection: 1

| Attempting to load lesson dependencies...

| Package ‘jpeg’ loaded correctly!

| | 0%

| Principles_of_Analytic_Graphs. (Slides for this and other Data Science courses may be
| found at github https://github.com/DataScienceSpecialization/courses/. If you care to use
| them, they must be downloaded as a zip file and viewed locally. This lesson corresponds to
| 04_ExploratoryAnalysis/Principles_of_Analytic_Graphics.)

...

|== | 3%
| In this lesson, we'll discuss some basic principles of presenting data effectively. These
| will illustrate some fundamental concepts of displaying results in order to make them more
| meaningful and convincing. These principles are cribbed from Edward Tufte's great 2006
| book, Beautiful Evidence. You can read more about them at the www.edwardtufte.com website.

...

|===== | 6%
| As a warm-up, which of the following would NOT be a good use of analytic graphing?

1: To show causality, mechanism, explanation
2: To show multivariate data
3: To decide which horse to bet on at the track
4: To show comparisons

Selection: 3

| Keep up the great work!

|======= | 8%
| You're ready to start. Graphs give us a visual form of data, and the first principle of
| analytic graphs is to show some comparison. You'll hear more about this when you study
| statistical inference (another great course BTW), but evidence for a hypothesis is always
| relative to another competing or alternative hypothesis.

...

|========= | 11%
| When presented with a claim that something is good, you should always ask "Compared to
| What?" This is why in commercials you often hear the phrase "other leading brands". An
| implicit comparison, right?

...

|============ | 14%
| Consider this boxplot which shows the relationship between the use of an air cleaner and
| the number of symptom-free days of asthmatic children. (The bottom and top lines of the
| box indicate the 25th and 75th percentiles of the data, and the horizontal line in the box
| shows the median, the 50th percentile.) Since the box is above 0, the number of
| symptom-free days for children with asthma is higher when using the air cleaner. This is
| good, right?

...

[Figure: boxplot of the change in symptom-free days for children using the air cleaner]

|============== | 17%
| How many days of improvement does the median correspond to?

1: 4
2: -2
3: 1
4: 12

Selection: 3

| That's correct!
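
To make the parts of a boxplot concrete, here is a minimal sketch in base R. The data are made up for illustration (they are not the study's values), chosen so that the median is 1. `quantile()` reports the three percentiles that the box encodes.

```r
# Hypothetical data (made up): change in symptom-free days for 11 children.
change <- c(-2, -1, 0, 0, 1, 1, 1, 2, 3, 4, 12)

# The 25th, 50th and 75th percentiles are what the box and its middle line show.
q <- quantile(change, probs = c(0.25, 0.50, 0.75))
print(q)

# Draw the box; the thick line inside it marks the median.
boxplot(change,
        ylab = "Change in symptom-free days",
        main = "Air cleaner group (simulated)")
abline(h = 0, lty = 2)  # dashed reference line: no change
```

Note that `boxplot()` computes its hinges slightly differently from the default `quantile()` method, so the drawn box edges can differ a little from the printed quartiles on small samples.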

|================ | 19%
| While it's somewhat informative, it's also somewhat cryptic, since the y-axis is claiming
| to show a change in number of symptom-free days. Wouldn't it be better to show a
| comparison?

...

|================== | 22%
| Like this? Here's a graphic which shows two boxplots, the one on the left showing the
| results for a control group that doesn't use an air cleaner alongside the previously shown
| boxplot.

...

|===================== | 25%
| By showing the two boxplots side by side, you can clearly see that using the air cleaner
| increases the number of symptom-free days for most asthmatic children. The plot on the
| right (using the air cleaner) is generally higher than the one on the left (the control
| group).

...

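[Figure: side-by-side boxplots of the change in symptom-free days, control group on the left and air cleaner group on the right]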

|======================= | 28%
| What does this graph NOT show you?

1: Half the children in the control group had no improvement
2: Children in the control group had at most 3 symptom-free days
3: 75% of the children using the air cleaner had at most 3 symptom-free days
4: Using the air cleaner makes asthmatic children sicker

Selection: 4

| You're the best!
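
A side-by-side comparison like the one described above can be sketched with the formula interface of `boxplot()`. The numbers below are invented for illustration and are not the values behind the lesson's figure; the group names and column names are also assumptions.

```r
# Hypothetical data (made up for illustration, not the study's values).
control <- c(-3, -2, -1, 0, 0, 0, 0, 1, 2, 3)
cleaner <- c(-2, 0, 0, 1, 1, 1, 2, 3, 4, 12)

days <- data.frame(
  change = c(control, cleaner),
  group  = factor(rep(c("Control", "Air cleaner"), each = 10),
                  levels = c("Control", "Air cleaner"))
)

# The formula interface draws one box per level of `group` on a shared y-axis,
# which is what makes the comparison readable at a glance.
boxplot(change ~ group, data = days,
        xlab = "Group", ylab = "Change in symptom-free days")
abline(h = 0, lty = 2)  # dashed reference line: no change

# Group medians, as a numeric check of what the plot shows.
medians <- tapply(days$change, days$group, median)
print(medians)
```

Putting both groups on one axis is the design choice that matters here: two separate plots with different scales would hide the difference the comparison is meant to show.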

|========================= | 31%
| So the first principle was to show a comparison. The second principle is to show causality
| or a mechanism of how your theory of the data works. This explanation or systematic
| structure shows your causal framework for thinking about the question you're trying to
| answer.

...

|============================ | 33%
| Consider this plot which shows the dual boxplot we just showed, but next to it we have a
| corresponding plot of changes in measures of particulate matter.

...

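[Figure: the two boxplots of symptom-free days next to corresponding boxplots of changes in particulate matter]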

|============================== | 36%
| This picture tries to explain how the air cleaner increases the number of symptom-free
| days for asthmatic children. What mechanism does the graph imply?

1: That the air cleaner increases pollution
2: That the air cleaner reduces pollution
3: That the children in the control group are healthier
4: That the air in the control group is cleaner than the air in the other group

Selection: 2

| You are amazing!

|================================ | 39%
| By showing the two sets of boxplots side by side you're explaining your theory of why the
| air cleaner increases the number of symptom-free days. Onward!
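
One way to place an outcome plot next to a plot of its proposed mechanism is a two-panel layout with `par(mfrow = ...)`. This is only a sketch: the data are simulated, and the variable names (`symptom_free`, `pm_change`) are assumptions, not names from the lesson.

```r
# Hypothetical data (simulated): symptom-free days and change in particulate matter.
set.seed(1)
group <- factor(rep(c("Control", "Air cleaner"), each = 20),
                levels = c("Control", "Air cleaner"))
symptom_free <- c(rnorm(20, mean = 0, sd = 2), rnorm(20, mean = 1.5, sd = 2))
pm_change    <- c(rnorm(20, mean = 0, sd = 10), rnorm(20, mean = -20, sd = 10))

old_par <- par(mfrow = c(1, 2))  # one row, two panels
boxplot(symptom_free ~ group, xlab = "",
        ylab = "Change in symptom-free days", main = "Outcome")
boxplot(pm_change ~ group, xlab = "",
        ylab = "Change in particulate matter", main = "Proposed mechanism")
par(old_par)  # restore the previous layout
```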

...

|=================================== | 42%
| So the first principle was to show some comparison, the second was to show a mechanism, so
| what will the third principle say to show?

...

|===================================== | 44%
| Multivariate data!

...

|======================================= | 47%
| What is multivariate data you might ask? In technical (scientific) literature this term
| means more than 2 variables. Two-variable plots are what you saw in high school algebra.
| Remember those x,y plots when you were learning about slopes and intercepts and equations
| of lines? They're valuable, but usually questions are more complicated and require more
| variables.

...

|========================================== | 50%
| Sometimes, if you restrict yourself to two variables you'll be misled and draw an
| incorrect conclusion.

...

|============================================ | 53%
| Consider this plot which shows the relationship between air pollution (x-axis) and
| mortality rates among the elderly (y-axis). The blue regression line shows a surprising
| result. (You'll learn about regression lines when you take the fabulous Regression Models
| course.)

...

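[Figure: scatterplot of mortality rates among the elderly (y-axis) against air pollution (x-axis), with a blue regression line]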

|============================================== | 56%
| What does the blue regression line indicate?

1: Pollution doesn't really increase, it just gets reported more
2: As pollution increases fewer people die
3: As pollution increases the number of deaths doesn't change
4: As pollution increases more people die

Selection: 2

| Excellent job!

|================================================ | 58%
| Fewer deaths with more pollution? That's a surprise! Something's gotta be wrong, right? In
| fact, this is an example of Simpson's paradox, or the Yule–Simpson effect. Wikipedia
| (http://en.wikipedia.org/wiki/Simpson%27s_paradox) tells us that this "is a paradox in
| probability and statistics, in which a trend that appears in different groups of data
| disappears when these groups are combined."

...

|=================================================== | 61%
| Suppose we divided this mortality/pollution data into the four seasons. Would we see
| different trends?

...

|===================================================== | 64%
| Yes, we do! Plotting the same data for the 4 seasons individually we see a different
| result.

...

[Figure: the same mortality and pollution data plotted separately for each of the four seasons]

|======================================================= | 67%
| What does the new plot indicate?

1: Pollution doesn't really increase, it just gets reported more
2: As pollution increases the seasons change
3: As pollution increases more people die in all seasons
4: As pollution increases fewer people die in all seasons

Selection: 3

| That's correct!
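
The reversal described above can be reproduced with a small constructed data set. In this sketch (the numbers are made up, not the lesson's data), mortality rises with pollution inside every season, but seasons with more pollution start from a lower baseline, so a single pooled regression line slopes downward.

```r
# Hypothetical data (made up): within each season mortality rises with pollution,
# but seasons with more pollution have lower baseline mortality.
season_names   <- c("Winter", "Spring", "Fall", "Summer")
pollution_base <- c(0, 10, 20, 30)
mortality_base <- c(60, 40, 20, 0)

d <- do.call(rbind, lapply(seq_along(season_names), function(i) {
  step <- 0:9
  data.frame(season    = season_names[i],
             pollution = pollution_base[i] + step,
             mortality = mortality_base[i] + 0.5 * step)
}))

# Pooled fit: ignores season entirely.
pooled_slope <- coef(lm(mortality ~ pollution, data = d))[["pollution"]]

# One fit per season.
season_slopes <- sapply(split(d, d$season), function(g) {
  coef(lm(mortality ~ pollution, data = g))[["pollution"]]
})

print(pooled_slope)   # negative for this constructed data
print(season_slopes)  # positive in every season for this constructed data
```

Season acts as the third variable here: it is related to both pollution and mortality, which is why the two-variable view is misleading.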

|========================================================== | 69%
| The fourth principle of analytic graphing involves integrating evidence. This means not
| limiting yourself to one form of expression. You can use words, numbers, images as well as
| diagrams. Graphics should make use of many modes of data presentation. Remember, "Don't
| let the tool drive the analysis!"

...

|============================================================ | 72%
| To show you what we mean, here's an example of a figure taken from a paper published in
| the Journal of the AMA. It shows the relationship between pollution and hospitalization of
| people with heart disease. As you can see, it's a lot different from our previous plots.
| The solid circles in the center portion indicate point estimates of percentage changes in
| hospitalization rates for different levels of pollution. The lines through the circles
| indicate confidence intervals associated with these estimates. (You'll learn more about
| confidence intervals in another great course, the one on statistical inference.)

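[Figure: published figure showing point estimates and confidence intervals of percentage changes in hospitalization rates for different levels of pollution]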
...

|============================================================== | 75%
| Note that on the right side of the figure is another column of numbers, one for each of
| the point estimates given. This column shows posterior probabilities that relative risk is
| greater than 0. This, in effect, is a measure of the strength of the evidence showing the
| correlation between pollution and hospitalization. The point here is that all of this
| information is located in one picture so that the reader can see the strength of not only
| the correlations but the evidence as well.
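
A figure that integrates words, numbers, and graphics in this way can be approximated in base R. The sketch below uses invented estimates and labels (`Outcome A` and so on are placeholders, and the `prob` column is a hypothetical strength-of-evidence value); it is not a reconstruction of the published figure.

```r
# Hypothetical estimates (made up): % change in hospitalization rate with intervals.
est <- data.frame(
  label = c("Outcome A", "Outcome B", "Outcome C"),
  point = c(0.4, 0.9, 1.3),
  lower = c(-0.2, 0.3, 0.6),
  upper = c(1.0, 1.5, 2.0),
  prob  = c(0.90, 0.99, 1.00)  # hypothetical strength-of-evidence column
)
y <- rev(seq_len(nrow(est)))   # draw the first row at the top

old_par <- par(mar = c(5, 7, 2, 6))  # extra margin for text on both sides
plot(est$point, y, pch = 19,
     xlim = range(est$lower, est$upper),
     yaxt = "n", ylab = "",
     xlab = "% change in hospitalization rate")
segments(est$lower, y, est$upper, y)          # interval through each point
abline(v = 0, lty = 2)                        # dashed no-effect reference line
axis(2, at = y, labels = est$label, las = 1)  # words on the left
axis(4, at = y, labels = format(est$prob, nsmall = 2),
     las = 1, tick = FALSE)                   # numbers on the right
par(old_par)
```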

...

|================================================================= | 78%
| The fifth principle of graphing involves describing and documenting the evidence with
| sources and appropriate labels and scales. Credibility is important so the data graphics
| should tell a complete story. Also, using R, you want to preserve any code you use to
| generate your data and graphics so that the research can be replicated if necessary. This
| allows for easy verification or finding bugs in your analysis.
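
As a sketch of what documenting a graphic can look like in practice, the script below labels the axes with units, states the data source on the plot itself, fixes the random seed, and writes the figure to a file, so that rerunning the script regenerates the same output. The data are simulated and the file name is an assumption.

```r
# Hypothetical data and file name (both made up for illustration).
set.seed(42)  # fixed seed so the simulated data can be regenerated exactly
pollution <- runif(50, min = 0, max = 40)
mortality <- 20 + 0.3 * pollution + rnorm(50, sd = 3)

out_file <- file.path(tempdir(), "pollution_mortality.pdf")
pdf(out_file, width = 7, height = 5)
plot(pollution, mortality,
     main = "Mortality vs. pollution (simulated data)",
     xlab = "Pollution level (arbitrary units)",
     ylab = "Deaths per day")
mtext("Source: simulated in this script with set.seed(42)",
      side = 1, line = 4, cex = 0.8)  # source note in the bottom margin
dev.off()  # close the device so the file is written
```

Keeping the plotting code in a script, instead of typing it at the console, is what lets someone else rerun it and check the result.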

...

|=================================================================== | 81%
| The sixth and final principle of analytic graphing is maybe the most important. Content is
| king! If you don't have something interesting to report, your graphs won't save you.
| Analytical presentations ultimately stand or fall depending on the quality, relevance, and
| integrity of their content.

...

|===================================================================== | 83%
| Review time!!!

...

|======================================================================= | 86%
| Which of the following is NOT a good principle of graphing?

1: To integrate multiple modes of evidence
2: Having unreadable labels
3: To describe and document evidence
4: Content is king

Selection: 2

| You are really on a roll!

|========================================================================== | 89%
| Which of the following is NOT a good principle of graphing?

1: To prove you're always right
2: To show two competing hypotheses
3: To demonstrate a causative mechanism underlying a correlation
4: Content is king

Selection: 1

| You nailed it! Good job!

|============================================================================ | 92%
| Which of the following is NOT a good principle of graphing?

1: To integrate different types of evidence
2: To show that some fonts are better than others
3: To show good labels and scales
4: Content is king

Selection: 2

| That's the answer I was looking for.

|============================================================================== | 94%
| True or False? Color is king.

1: False
2: True

Selection: 2

| Not quite, but you're learning! Try again.

| Think of the sixth principle

1: True
2: False

Selection: 2

| You are doing so well!

|================================================================================= | 97%
| Congrats! You've concluded exploring this lesson on principles of graphing. We hope you
| found it principally principled.

...

|===================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 1

| Nice work!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: Exploratory Data Analysis
2: Getting and Cleaning Data
3: R Programming
4: Take me to the swirl course repository!

Selection: 0

| Leaving swirl now. Type swirl() to resume.

Last updated 2020-05-02 16:41:08.783416 IST
