Subsetting Vectors

Krishnakanth Allika

2020-10-01 18:19

R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Workspace loaded from C:/Users/kk/PortableApps/Git/home/k-allika/repos/DataScienceWithR/.RData]

library("swirl")

| Hi! I see that you have some variables saved in your workspace. To keep things running
| smoothly, I recommend you clean up before starting swirl.

| Type ls() to see a list of the variables in your workspace. Then, type rm(list=ls()) to
| clear your workspace.

| Type swirl() when you are ready to begin.

ls()
[1] "my_char" "my_data" "my_div" "my_na" "my_name" "my_seq" "my_sqrt"
[8] "num_vect" "old.dir" "tf" "x" "y" "z"
rm(list=ls())
swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you
| did then. If you are new, call yourself something unique.

What shall I call you? Krishnakanth Allika

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks       2: Workspace and Files      3: Sequences of Numbers
 4: Vectors                     5: Missing Values           6: Subsetting Vectors
 7: Matrices and Data Frames    8: Logic                    9: Functions
10: lapply and sapply          11: vapply and tapply       12: Looking at Data
13: Simulation                 14: Dates and Times         15: Base Graphics

Selection: 6

| | 0%

| In this lesson, we'll see how to extract elements from a vector based on some
| conditions that we specify.

...

|== | 3%
| For example, we may only be interested in the first 20 elements of a vector, or only
| the elements that are not NA, or only those that are positive or correspond to a
| specific variable of interest. By the end of this lesson, you'll know how to handle
| each of these scenarios.

...

|==== | 5%
| I've created for you a vector called x that contains a random ordering of 20 numbers
| (from a standard normal distribution) and 20 NAs. Type x now to see what it looks like.

x
[1] -0.68754438 NA NA NA NA NA NA
[8] NA -0.01654302 1.03010195 -0.40799451 -0.55849418 NA -0.07687958
[15] -0.05351510 NA NA 1.16924926 1.60452324 -0.08284351 1.66735009
[22] NA NA 2.18942224 -0.14724334 NA NA -0.99999522
[29] NA NA -0.12665386 -0.61215464 -0.58919026 NA NA
[36] 1.12894965 -1.36770314 NA -1.33061090 NA

| You are amazing!

...

|======== | 10%
| For a simple example, try x[1:10] to view the first ten elements of x.

x[1:10]
[1] -0.68754438 NA NA NA NA NA NA
[8] NA -0.01654302 1.03010195

| You got it right!

|========== | 13%
| Index vectors come in four different flavors -- logical vectors, vectors of positive
| integers, vectors of negative integers, and vectors of character strings -- each of
| which we'll cover in this lesson.

...

|============ | 15%
| Let's start by indexing with logical vectors. One common scenario when working with
| real-world data is that we want to extract all elements of a vector that are not NA
| (i.e. missing data). Recall that is.na(x) yields a vector of logical values the same
| length as x, with TRUEs corresponding to NA values in x and FALSEs corresponding to
| non-NA values in x.

...

|============== | 18%
| What do you think x[is.na(x)] will give you?

1: A vector of length 0
2: A vector of all NAs
3: A vector of TRUEs and FALSEs
4: A vector with no NAs

Selection: 2

| That's the answer I was looking for.

|================ | 21%
| Prove it to yourself by typing x[is.na(x)].

x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| You got it!
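
The same idea can be sketched on a smaller vector. The vector v and the names mask, missing_part, and present_part are made up for illustration and are not part of the lesson; the point is only that the logical vector returned by is.na() picks out the NA positions, and its negation picks out everything else.

```r
# Hypothetical five-element vector (not the lesson's x)
v <- c(1.5, NA, -0.3, NA, 2.0)

# is.na() returns a logical vector of the same length as v
mask <- is.na(v)
print(mask)                # FALSE TRUE FALSE TRUE FALSE

# TRUE positions are kept, FALSE positions are dropped
missing_part <- v[mask]    # NA NA
present_part <- v[!mask]   # 1.5 -0.3 2.0

print(missing_part)
print(present_part)
```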

|================== | 23%
| Recall that ! gives us the negation of a logical expression, so !is.na(x) can be read
| as 'is not NA'. Therefore, if we want to create a vector called y that contains all of
| the non-NA values from x, we can use y <- x[!is.na(x)]. Give it a try.

x[!is.na(x)]
[1] -0.68754438 -0.01654302 1.03010195 -0.40799451 -0.55849418 -0.07687958 -0.05351510
[8] 1.16924926 1.60452324 -0.08284351 1.66735009 2.18942224 -0.14724334 -0.99999522
[15] -0.12665386 -0.61215464 -0.58919026 1.12894965 -1.36770314 -1.33061090

| Nice try, but that's not exactly what I was hoping for. Try again. Or, type info() for
| more options.

| Type y <- x[!is.na(x)] to capture all non-missing values from x.

y<-x[!is.na(x)]

| You are doing so well!

|===================== | 26%
| Print y to the console.

y
[1] -0.68754438 -0.01654302 1.03010195 -0.40799451 -0.55849418 -0.07687958 -0.05351510
[8] 1.16924926 1.60452324 -0.08284351 1.66735009 2.18942224 -0.14724334 -0.99999522
[15] -0.12665386 -0.61215464 -0.58919026 1.12894965 -1.36770314 -1.33061090

| You are really on a roll!

|======================= | 28%
| Now that we've isolated the non-missing values of x and put them in y, we can subset y
| as we please.

...

|========================= | 31%
| Recall that the expression y > 0 will give us a vector of logical values the same
| length as y, with TRUEs corresponding to values of y that are greater than zero and
| FALSEs corresponding to values of y that are less than or equal to zero. What do you
| think y[y > 0] will give you?

1: A vector of all the negative elements of y
2: A vector of length 0
3: A vector of TRUEs and FALSEs
4: A vector of all NAs
5: A vector of all the positive elements of y

Selection: 5

| You are quite good my friend!

|=========================== | 33%
| Type y[y > 0] to see that we get all of the positive elements of y, which are also the
| positive elements of our original vector x.

y[y>0]
[1] 1.030102 1.169249 1.604523 1.667350 2.189422 1.128950

| All that practice is paying off!

|============================= | 36%
| You might wonder why we didn't just start with x[x > 0] to isolate the positive
| elements of x. Try that now to see why.

x[x>0]
[1] NA NA NA NA NA NA NA 1.030102 NA
[10] NA NA 1.169249 1.604523 1.667350 NA NA 2.189422 NA
[19] NA NA NA NA NA 1.128950 NA NA

| You are amazing!

|=============================== | 38%
| Since NA is not a value, but rather a placeholder for an unknown quantity, the
| expression NA > 0 evaluates to NA. Hence we get a bunch of NAs mixed in with our
| positive numbers when we do this.
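
A minimal sketch of why the NAs leak through, using a made-up three-element vector v (an assumption, not the lesson's data). Comparing NA with a number yields NA, and an NA inside a logical index produces an NA in the result.

```r
# Hypothetical vector with one missing value
v <- c(2, NA, -1)

# The comparison propagates the NA instead of returning FALSE
cmp <- v > 0
print(cmp)        # TRUE NA FALSE

# An NA in the logical index yields an NA element in the result
picked <- v[cmp]
print(picked)     # 2 NA
```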

...

|================================= | 41%
| Combining our knowledge of logical operators with our new knowledge of subsetting, we
| could do this -- x[!is.na(x) & x > 0]. Try it out.

x[!is.na(x)&x>0]
[1] 1.030102 1.169249 1.604523 1.667350 2.189422 1.128950

| You are really on a roll!

|=================================== | 44%
| In this case, we request only values of x that are both non-missing AND greater than
| zero.
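
The combined condition can be tried on a small made-up vector v (hypothetical, not from the lesson). As a side note that the lesson does not cover, base R's which() returns the positions where a condition is TRUE and skips NAs, so it gives the same result here; treat it as one possible alternative rather than the lesson's method.

```r
# Hypothetical vector with a missing value
v <- c(2, NA, -1, 0.5)

# Approach from the lesson: require non-missing AND positive
a <- v[!is.na(v) & v > 0]
print(a)          # 2 0.5

# Alternative (not in the lesson): which() ignores NA in the condition
b <- v[which(v > 0)]
print(b)          # 2 0.5
```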

...

|===================================== | 46%
| I've already shown you how to subset just the first ten values of x using x[1:10]. In
| this case, we're providing a vector of positive integers inside of the square brackets,
| which tells R to return only the elements of x numbered 1 through 10.

...

|======================================= | 49%
| Many programming languages use what's called 'zero-based indexing', which means that
| the first element of a vector is considered element 0. R uses 'one-based indexing',
| which (you guessed it!) means the first element of a vector is considered element 1.

...

|========================================= | 51%
| Can you figure out how we'd subset the 3rd, 5th, and 7th elements of x? Hint -- Use the
| c() function to specify the element numbers as a numeric vector.

x[c(3,5,7)]
[1] NA NA NA

| That's correct!
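
Positive integer indexing can be sketched on a short made-up vector v (an assumption for illustration). Elements come back in the order the index vector lists them, and an index may be repeated.

```r
# Hypothetical vector; positions are counted from 1
v <- c(10, 20, 30, 40, 50)

first_three <- v[1:3]         # 10 20 30
picked      <- v[c(3, 5)]     # 30 50
reordered   <- v[c(5, 1, 1)]  # 50 10 10

print(first_three)
print(picked)
print(reordered)
```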

|=========================================== | 54%
| It's important that when using integer vectors to subset our vector x, we stick with
| the set of indexes {1, 2, ..., 40} since x only has 40 elements. What happens if we ask
| for the zeroth element of x (i.e. x[0])? Give it a try.

x[0]
numeric(0)

| Nice work!

|============================================= | 56%
| As you might expect, we get nothing useful. Unfortunately, R doesn't prevent us from
| doing this. What if we ask for the 3000th element of x? Try it out.

x[3000]
[1] NA

| Nice work!

|=============================================== | 59%
| Again, nothing useful, but R doesn't prevent us from asking for it. This should be a
| cautionary tale. You should always make sure that what you are asking for is within the
| bounds of the vector you're working with.
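
One way to guard against out-of-bounds requests is to check the index before subsetting. The helper safe_get below is hypothetical (it is not part of swirl or base R) and is only a sketch of that check, applied to a made-up vector v.

```r
# Hypothetical three-element vector
v <- c(10, 20, 30)

print(length(v[0]))   # 0, index 0 selects nothing
print(v[5])           # NA, position 5 does not exist

# Hypothetical helper: stop with an error unless i is a single valid position
safe_get <- function(vec, i) {
  stopifnot(is.numeric(i), length(i) == 1, i >= 1, i <= length(vec))
  vec[i]
}

ok  <- safe_get(v, 2)   # 20
bad <- tryCatch(safe_get(v, 5), error = function(e) "out of bounds")

print(ok)
print(bad)              # "out of bounds"
```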

...

|================================================= | 62%
| What if we're interested in all elements of x EXCEPT the 2nd and 10th? It would be
| pretty tedious to construct a vector containing all numbers 1 through 40 EXCEPT 2 and
| 10.

...

|=================================================== | 64%
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd
| and 10th elements of x, x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd and
| 10th elements. Try x[c(-2, -10)] now to see this.

x[c(-2,-10)]
[1] -0.68754438 NA NA NA NA NA NA
[8] -0.01654302 -0.40799451 -0.55849418 NA -0.07687958 -0.05351510 NA
[15] NA 1.16924926 1.60452324 -0.08284351 1.66735009 NA NA
[22] 2.18942224 -0.14724334 NA NA -0.99999522 NA NA
[29] -0.12665386 -0.61215464 -0.58919026 NA NA 1.12894965 -1.36770314
[36] NA -1.33061090 NA

| That's a job well done!

|===================================================== | 67%
| A shorthand way of specifying multiple negative numbers is to put the negative sign out
| in front of the vector of positive numbers. Type x[-c(2, 10)] to get the exact same
| result.

x[-c(2,10)]
[1] -0.68754438 NA NA NA NA NA NA
[8] -0.01654302 -0.40799451 -0.55849418 NA -0.07687958 -0.05351510 NA
[15] NA 1.16924926 1.60452324 -0.08284351 1.66735009 NA NA
[22] 2.18942224 -0.14724334 NA NA -0.99999522 NA NA
[29] -0.12665386 -0.61215464 -0.58919026 NA NA 1.12894965 -1.36770314
[36] NA -1.33061090 NA

| Your dedication is inspiring!
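
The two spellings of a negative index can be compared on a short made-up vector v (hypothetical). The sketch also shows that positive and negative indices cannot be combined in a single index vector; R raises an error, which is caught here with tryCatch().

```r
# Hypothetical five-element vector
v <- c(10, 20, 30, 40, 50)

dropped_a <- v[c(-2, -4)]   # 10 30 50
dropped_b <- v[-c(2, 4)]    # same result, shorter to type

print(dropped_a)
print(identical(dropped_a, dropped_b))   # TRUE

# Mixing positive and negative subscripts is an error
mixed <- tryCatch(v[c(-1, 2)], error = function(e) "error")
print(mixed)                # "error"
```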

|======================================================= | 69%
| So far, we've covered three types of index vectors -- logical, positive integer, and
| negative integer. The only remaining type requires us to introduce the concept of
| 'named' elements.

...

|========================================================= | 72%
| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2,
| norf = NA).

vect<-c(foo=11,bar=2,norf=NA)

| You are amazing!

|=========================================================== | 74%
| When we print vect to the console, you'll see that each element has a name. Try it out.

vect
 foo  bar norf
  11    2   NA

| That's a job well done!

|============================================================== | 77%
| We can also get the names of vect by passing vect as an argument to the names()
| function. Give that a try.

names(vect)
[1] "foo" "bar" "norf"

| Keep working like that and you'll get there!

|================================================================ | 79%
| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

vect2<-c(11,2,NA)

| You are quite good my friend!

|================================================================== | 82%
| Then, we can add the names attribute to vect2 after the fact with names(vect2) <-
| c("foo", "bar", "norf"). Go ahead.

names(vect2)<-c("foo","bar","norf")

| Keep up the great work!

|==================================================================== | 85%
| Now, let's check that vect and vect2 are the same by passing them as arguments to the
| identical() function.

identical(vect,vect2)
[1] TRUE

| Keep up the great work!

|====================================================================== | 87%
| Indeed, vect and vect2 are identical named vectors.
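
The comparison can be repeated on a made-up pair of vectors (scores and scores2 are hypothetical names). It also shows that identical() takes the names attribute into account: once the names are stripped with unname(), the vectors no longer match.

```r
# Names supplied at creation time
scores <- c(alice = 90, bob = 75)

# Names attached afterwards
scores2 <- c(90, 75)
names(scores2) <- c("alice", "bob")

same <- identical(scores, scores2)
print(same)             # TRUE

# Same values, but without names the objects differ
same_unnamed <- identical(scores, unname(scores2))
print(same_unnamed)     # FALSE
```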

...

|======================================================================== | 90%
| Now, back to the matter of subsetting a vector by named elements. Which of the
| following commands do you think would give us the second element of vect?

1: vect["2"]
2: vect["bar"]
3: vect[bar]

Selection: 2

| That's a job well done!

|========================================================================== | 92%
| Now, try it out.

vect["bar"]
bar
  2

| You are doing so well!

|============================================================================ | 95%
| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it out.

vect[c("foo","bar",'norf')]
 foo  bar norf
  11    2   NA

| You're close...I can feel it! Try it again. Or, type info() for more options.

| Use vect[c("foo", "bar")] to get only the elements of vect named "foo" and "bar".

vect[c("foo","bar")]
foo bar
 11   2

| You are quite good my friend!
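
A short sketch of name-based subsetting on a made-up vector (scores is hypothetical). The result follows the order of the names in the index, and asking for a name that does not exist returns NA rather than an error, much like an out-of-bounds integer index.

```r
# Hypothetical named vector
scores <- c(alice = 90, bob = 75, carol = 82)

one <- scores["bob"]
sel <- scores[c("carol", "alice")]   # order follows the index vector

print(one)
print(sel)

# A name that is not present yields NA
absent <- scores["nobody"]
print(is.na(absent))
```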

|============================================================================== | 97%
| Now you know all four methods of subsetting data from vectors. Different approaches are
| best in different scenarios and when in doubt, try it out!

...

|================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: No
2: Yes

Selection: 2
What is your email address? xxxxxx@xxxxxxxxxxxx
What is your assignment token? xXxXxxXXxXxxXXXx
Grade submission succeeded!

| You got it right!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

Last updated 2020-10-01 18:19:32.682468 IST
