MUSA 5080 Notes #1

Week 1: Introduction to R and dplyr

Author

Fan Yang

Published

September 8, 2025

Git & GitHub

1. Git

Version control system that tracks changes in files

“Track changes” for code projects
Time machine for your work
Collaboration tool for teams

2. GitHub

Cloud hosting for Git repositories

Backup your work in the cloud
Share projects with others
Deploy websites (like our portfolios)
Collaborate on code projects

3. Key GitHub Concepts

Repository (repo): Folder containing your project files

Commit: Snapshot of your work at a point in time
Push: Send your local commits to GitHub
Pull: Download the latest changes from GitHub into your local copy

Markdown Basics

1. Text Formatting

**Bold text**
*Italic text*
***Bold and italic***
`code text`
~~Strikethrough~~

2. Headers

# Main Header
## Section Header
### Subsection Header

3. Lists

## Unordered List
- Item 1
- Item 2
  - Sub-item A
  - Sub-item B

## Ordered List  
1. First item
2. Second item
3. Third item

4. Links and Images

[Link text](https://example.com)
[Link to another page](about.qmd)
![Alt text](path/to/image.png)

Basic R

1. Why Are Tibbles Better?

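library(tibble)

# Traditional data frame, read with base R (same file as in the dplyr section)
data <- read.csv("data/car_sales_data.csv", check.names = FALSE)
class(data)      # "data.frame"

# Convert to tibble
car_data <- as_tibble(data)
class(car_data)  # "tbl_df" "tbl" "data.frame"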

Prints only the first 10 rows by default
Displays each column's name together with its type
Fits the output to the width of the screen
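
To see the printing behavior described above without the course CSV, here is a minimal sketch that uses the built-in `mtcars` data set as a stand-in (an assumption for illustration, it is not the car sales data from class). The column name `model` is also made up for this example.

```r
library(tibble)

# mtcars ships with base R; it only stands in for the car sales data here
df <- mtcars

# Convert to a tibble, keeping the row names as a regular column
# ("model" is a hypothetical column name chosen for this sketch)
tb <- as_tibble(df, rownames = "model")

class(df)  # "data.frame"
class(tb)  # "tbl_df" "tbl" "data.frame"

# Printing the tibble shows only the first 10 rows plus the column types
print(tb)

# The data itself is not truncated, only the printout is
nrow(tb)   # 32
```

Printing `df` instead would dump all 32 rows to the console, which is the main practical difference for everyday exploration.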

2. dplyr

library(tidyverse)

# Load car sales data
car_data <- read_csv("data/car_sales_data.csv")

# Basic exploration
glimpse(car_data)
names(car_data)

# The power of pipes - read as "then"
car_summary <- car_data %>%
  filter(`Year of manufacture` >= 2020) %>%      # Recent models only
  select(Manufacturer, Model, Price, Mileage) %>% # Key variables
  mutate(price_k = Price / 1000) %>%             # Convert to thousands
  filter(Mileage < 50000) %>%                    # Low mileage cars
  group_by(Manufacturer) %>%                     # Group by brand
  summarize(                                     # Calculate statistics
    avg_price = mean(price_k, na.rm = TRUE),
    count = n()
  )
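
The same chain can be tried without the CSV file. The sketch below runs the pipeline on a tiny hand-made tibble; the five rows in `toy_cars` are invented purely for illustration and only reuse the column names from the code above.

```r
library(dplyr)
library(tibble)

# Made-up rows for illustration only; NOT the real car_sales_data.csv
toy_cars <- tribble(
  ~Manufacturer, ~Model,   ~`Year of manufacture`, ~Price, ~Mileage,
  "Ford",        "Fiesta", 2021,                   15000,  20000,
  "Ford",        "Focus",  2022,                   21000,  10000,
  "Ford",        "Mondeo", 2015,                    8000,  90000,
  "Toyota",      "Yaris",  2020,                   14000,  60000,
  "Toyota",      "Prius",  2023,                   30000,   5000
)

toy_summary <- toy_cars %>%
  filter(`Year of manufacture` >= 2020) %>%       # drops the 2015 Mondeo
  select(Manufacturer, Model, Price, Mileage) %>% # keep four columns
  mutate(price_k = Price / 1000) %>%              # price in thousands
  filter(Mileage < 50000) %>%                     # drops the Yaris
  group_by(Manufacturer) %>%                      # one group per brand
  summarize(
    avg_price = mean(price_k, na.rm = TRUE),
    count = n()
  )

toy_summary
# Ford:   avg_price 18, count 2
# Toyota: avg_price 30, count 1
```

Working through a small table like this by hand is a quick way to check that each step keeps the rows and columns you expect before running the chain on the full data.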

Summary

Tip

This week I mainly got familiar with several widely used tools (Git, GitHub, and Markdown) that are useful well beyond the classroom, and I learned the basics of R and dplyr. I need to practice these functions regularly so that they stick.