Fan Yang - MUSA 5080

MUSA 5080 Notes #1

Week 1: Introduction to R and dplyr

Author

Fan Yang

Published

September 8, 2025


Git & GitHub

1. Git

Version control system that tracks changes in files

  • “Track changes” for code projects
  • Time machine for your work
  • Collaboration tool for teams

2. GitHub

Cloud hosting for Git repositories

  • Backup your work in the cloud
  • Share projects with others
  • Deploy websites (like our portfolios)
  • Collaborate on code projects

3. Key GitHub Concepts

  • Repository (repo): Folder containing your project files and their full change history
  • Commit: Snapshot of your work at a point in time
  • Push: Send your local commits to GitHub
  • Pull: Get the latest changes from GitHub and merge them into your local copy

Markdown Basics

1. Text Formatting

**Bold text**
*Italic text*
***Bold and italic***
`code text`
~~Strikethrough~~

2. Headers

# Main Header
## Section Header
### Subsection Header

3. Lists

## Unordered List
- Item 1
- Item 2
  - Sub-item A
  - Sub-item B

## Ordered List  
1. First item
2. Second item
3. Third item

4. Links and Images

[Link text](https://example.com)
[Link to another page](about.qmd)
![Alt text](path/to/image.png)

Basic R

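1. Why Tibbles?

library(tidyverse)  # provides as_tibble() via the tibble package

# Traditional data frame (assumes the car sales CSV used later in these notes)
data <- read.csv("data/car_sales_data.csv")
class(data)
# Convert to tibble
car_data <- as_tibble(data)
class(car_data)

When printed, a tibble:

  • Shows only the first 10 rows by default
  • Displays column names together with their types
  • Fits nicely on a screen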
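
The printing differences above can be checked without the course CSV. The following is a minimal sketch that assumes only the tibble package is installed; the `df` and `tb` objects and their values are made up for illustration.

```r
library(tibble)

# Hypothetical 25-row data frame, invented for this example
df <- data.frame(id = 1:25, score = seq(0.5, 12.5, by = 0.5))

# Convert the base data frame to a tibble
tb <- as_tibble(df)

class(df)  # a plain data.frame
class(tb)  # a tibble is still a data.frame, with extra classes added

# Printing a tibble with more than 20 rows shows only the first 10,
# with each column's type listed under its name
print(tb)

# Ask for more rows explicitly when needed
print(tb, n = 15)
```

Printing `df` instead would dump all 25 rows to the console, which is the behavior tibbles are meant to avoid.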

2. dplyr

library(tidyverse)

# Load car sales data
car_data <- read_csv("data/car_sales_data.csv")

# Basic exploration
glimpse(car_data)
names(car_data)
# The power of pipes - read as "then"
car_summary <- car_data %>%
  filter(`Year of manufacture` >= 2020) %>%      # Recent models only
  select(Manufacturer, Model, Price, Mileage) %>% # Key variables
  mutate(price_k = Price / 1000) %>%             # Convert to thousands
  filter(Mileage < 50000) %>%                    # Low mileage cars
  group_by(Manufacturer) %>%                     # Group by brand
  summarize(                                     # Calculate statistics
    avg_price = mean(price_k, na.rm = TRUE),
    count = n()
  )
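
To see what each step of the pipeline does without needing the CSV file, here is a small sketch on toy data. The `toy_cars` table and every value in it are invented for illustration; only the column names follow the ones used above.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; the values are invented for illustration only
toy_cars <- tibble(
  Manufacturer = c("Ford", "Ford", "Toyota", "Toyota", "BMW"),
  Model = c("Focus", "Fiesta", "Yaris", "Prius", "Z4"),
  `Year of manufacture` = c(2021, 2018, 2022, 2020, 2021),
  Price = c(18000, 9000, 16000, 22000, 41000),
  Mileage = c(12000, 80000, 30000, 60000, 8000)
)

toy_summary <- toy_cars %>%
  filter(`Year of manufacture` >= 2020) %>%  # drops the 2018 Fiesta
  filter(Mileage < 50000) %>%                # drops the 60,000-mile Prius
  mutate(price_k = Price / 1000) %>%         # price in thousands
  group_by(Manufacturer) %>%                 # one group per brand
  summarize(
    avg_price = mean(price_k, na.rm = TRUE),
    count = n()
  )

print(toy_summary)
```

Three cars survive both filters, one per manufacturer, so the summary has one row per brand with `count` equal to 1. Running the pipeline one line at a time (stopping before each `%>%`) is a quick way to inspect the intermediate result.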

Summary

Tip

This week I got familiar with several widely used tools that are useful well beyond the classroom (Git, GitHub, and Markdown), and I learned the basics of R and dplyr. I need to practice the dplyr functions regularly so that they stick.