D467 DATA APPLICATIONS STUDY GUIDE, WESTERN GOVERNORS’ UNIVERSITY
Intro
Computer programming – giving instructions to a computer to perform an action or set of functions.
R Vs. Python
Programming languages
From Spreadsheets to SQL to R:
R – a programming language frequently used for statistical analysis, visualization, and other data analysis.
Programming using RStudio
The basic concepts:
• Functions – a body of reusable code used to perform specific tasks in R.
o Argument (R) – information that a function in R needs in order to run.
• Comments – begin with the # sign and describe what the code is doing; R does not run them.
• Variables – a representation of a value in R that can be stored for use later during programming.
o A variable name should start with a letter and can also contain numbers and underscores.
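• Data types – the kind of value a variable holds; common types in R include numeric (double), integer, character, and logical.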
• Vectors – a group of data elements of the same type stored in a sequence in R.
• Pipes – a tool in R for expressing a sequence of multiple operations, represented with “%>%”
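A minimal base R sketch of the basic concepts above. The function `double_it()`, the variable names, and the scores are hypothetical examples, and the native pipe `|>` assumes R 4.1 or later; with the tidyverse loaded, `%>%` works the same way in this case.

```r
# Functions: round() is built in; 3.14159 and digits = 2 are its arguments
rounded <- round(3.14159, digits = 2)

# A user-defined function (hypothetical): a reusable body of code with one argument, x
double_it <- function(x) {
  x * 2
}
doubled <- double_it(21)

# Variables: names start with a letter and may contain numbers and underscores
first_score <- 88
score_2 <- 92

# Data types: class() reports the kind of value stored
type_number <- class(first_score)   # "numeric"
type_text <- class("hello")         # "character"
type_flag <- class(TRUE)            # "logical"
type_whole <- class(5L)             # "integer" (the L suffix makes an integer)

# Vectors: c() combines elements of the same type into a sequence
scores <- c(first_score, score_2, 75)

# Pipes: each step passes its result to the next (mean first, then round)
average_score <- scores |> mean() |> round(digits = 1)
print(average_score)   # [1] 85
```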
Data frame – a collection of columns, similar to a spreadsheet or SQL table.
If you need to manually create a data frame in R, you can use the data.frame() function.
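A short sketch of `data.frame()`; the column names and values are made-up sample data. Each named argument becomes one column, and every column ends up with the same number of rows.

```r
# Each named argument becomes a column (sample data for illustration)
people <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  age = c(34, 28, 45)
)

str(people)       # structure: 3 observations of 2 variables
nrow(people)      # number of rows
names(people)     # column names
people$age        # one column, returned as a vector
```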
Assignment operators – used to assign values to variables and vectors.
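A sketch of the assignment operators in base R; the variable names are hypothetical. `<-` is the form the tidyverse style guide recommends.

```r
x <- 10                # leftward assignment
y = 20                 # = also assigns at the top level
30 -> z                # rightward assignment
ages <- c(12, 15, 18)  # assign a whole vector to one variable
```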
Arithmetic operators – used to complete math calculations.
facet_wrap() – a ggplot2 function used to create subplots (facets), one for each value of a variable.
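The main arithmetic operators in base R, applied to two hypothetical values. `%%` and `%/%` are the less obvious ones: they return the remainder and the whole-number part of a division.

```r
a <- 17
b <- 5

a + b     # addition: 22
a - b     # subtraction: 12
a * b     # multiplication: 85
a / b     # division: 3.4
a ^ 2     # exponent: 289
a %% b    # modulus (remainder): 2
a %/% b   # integer division: 3
```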
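A sketch of `facet_wrap()`, assuming the ggplot2 package is installed. It uses the built-in `mtcars` dataset, and the choice of columns (`wt`, `mpg`, `cyl`) is only for illustration: the formula `~cyl` asks for one subplot per number of cylinders.

```r
library(ggplot2)

# Scatter plot of weight vs. fuel economy, split into one panel per value of cyl
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl)
```

Typing `p` in the console (or running `print(p)`) draws three panels, one each for 4, 6, and 8 cylinders.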
Packages (R) – units of reproducible R code.
Packages include:
• Reusable R functions
• Documentation about the functions
• Sample datasets
• Tests for checking code
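A sketch of working with packages in base R. `install.packages()` is commented out because it downloads from CRAN and only needs to run once; `datasets` is a package that ships with R, and `ToothGrowth` is one of its sample datasets.

```r
# install.packages("tidyverse")   # download and install a package (run once)

library(datasets)        # load a package into the session

# Sample datasets: data bundled with the package, ready to use
head(ToothGrowth)
dim(ToothGrowth)         # 60 rows, 3 columns

# Documentation: help pages describe each function and dataset
# ?ToothGrowth
# help(package = "datasets")
```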
Welcome to Tidyverse
Tidyverse (R) – a system of packages in R with a common design philosophy for data manipulation, exploration, and visualization.
Conflicts happen when packages have functions with the same names as functions in other packages.
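A well-known conflict is `filter()`, which exists in both base R's stats package and dplyr. This sketch runs only the base R version; the dplyr line is commented out so it works without the tidyverse installed. The `package::function` form names the package explicitly, which sidesteps the conflict.

```r
# stats::filter() applies a linear filter (such as a moving average) to a series;
# dplyr::filter() keeps rows of a data frame. After library(dplyr),
# a bare filter() call refers to dplyr's version.
moving_avg <- stats::filter(1:5, rep(1/3, 3))   # 3-point moving average
as.numeric(moving_avg)                          # NA 2 3 4 NA

# With dplyr installed, this would keep only the 4-cylinder rows instead:
# dplyr::filter(mtcars, cyl == 4)
```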
8 core tidyverse packages:
• ggplot2
• tidyr
• readr
• dplyr
• tibble – works with data frames
• purrr – works with functions and vectors
• stringr – includes functions to work with strings
• forcats – provides tools to solve common problems with factors
Factors (R) – store categorical data in R, where the data values are limited and usually based on a finite group, like country or year.
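A base R sketch of factors; the country values are made-up sample data. `factor()` records the distinct values as levels, which by default are sorted in alphabetical order, and the `levels` argument sets an explicit order instead.

```r
countries <- c("Canada", "Mexico", "Canada", "USA", "Mexico", "Canada")

# Store the values as a factor: each distinct value becomes a level
country_factor <- factor(countries)

levels(country_factor)    # "Canada" "Mexico" "USA"
nlevels(country_factor)   # 3
table(country_factor)     # how many times each level appears

# Levels can also be given an explicit order
sizes <- factor(c("small", "large", "medium"), levels = c("small", "medium", "large"))
```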
Four packages that are essential to data analysts are:
ggplot2 – used for visualizations
dplyr – used for data manipulation; contains functions for common data manipulation tasks
tidyr – used for cleaning data
readr – used for importing data
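A sketch of dplyr data manipulation, assuming dplyr is installed. It uses the built-in `mtcars` dataset and the `%>%` pipe, which dplyr makes available: keep the 4-cylinder cars, keep two columns, and sort by fuel economy.

```r
library(dplyr)

# Each step passes its result to the next with %>%
efficient_cars <- mtcars %>%
  filter(cyl == 4) %>%        # keep rows where cyl equals 4
  select(mpg, wt) %>%         # keep only the mpg and wt columns
  arrange(desc(mpg))          # sort by mpg, highest first

head(efficient_cars, 3)
```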