Session 4: Git Fundamentals
Tracking your own work
Want a PDF for note-taking? Open the slides in your browser, append ?print-pdf to the URL, and use File → Print → Save as PDF. Reveal.js handles the layout. Works in Chrome, Edge, and Firefox.
Why Version Control?
Version control is a systematic record of how your code changes. It matters in three situations every applied economist now faces.
Yourself over time
If your project folder has ever looked like this:
analysis_v1.R
analysis_v2.R
analysis_FINAL.R
analysis_FINAL_v2.R
analysis_REALLY_FINAL.R
analysis_FINAL_USE_THIS.R
…then you already know the cost. You made choices you cannot reconstruct. You changed something that worked and cannot remember what.
Git replaces this chaos with a clean history of snapshots. Each snapshot (called a commit) records exactly what changed, when, and why. You can return to any previous version at any time.
This matters especially for managing paper submissions. Every paper moves through several versions of code: the first submission, the R&R revision, the final accepted version, the posted replication package. A reviewer asks, three months after you submitted, “what exactly did you do for the coefficient in Table 2?” Without version control, answering that question means digging through emails and folders. With Git, you mark each submission with a tag (covered in Session 5) and return to it in one command, even years later.
You and AI coding tools
Increasingly you will write research code alongside AI coding assistants: Claude Code, GitHub Copilot, Cursor, Codex, and similar tools. These systems produce dozens of lines in seconds. Most of it is useful; some of it is subtly wrong in ways you will not catch on first read.
Version control is what makes this workflow safe. Every AI-assisted change becomes a commit you can review before accepting, revert cleanly when a test breaks three hours later, or roll back selectively while keeping the parts that worked. If you commit AI-generated changes separately and say so in the commit message, Git also gives you a written record of what came from a model and what you wrote yourself. That record is increasingly expected by journal disclosure policies and by AEA replication review.
What Is Git?
Git is a distributed version control system. It takes a snapshot of your entire project every time you commit. Think of it as a timeline:
v1: Load data → v2: Clean variables → v3: Run regression
Each point on the timeline is a commit. Git stores the full history locally on your computer.
Git ≠ GitHub. Git is the tool that runs on your computer. GitHub is a cloud hosting service where you can store and share your Git repositories online. We cover GitHub in Session 5.
Situations Where Git Helps
Before we set up Git, it helps to see what using it looks like in practice. These scenarios introduce the vocabulary you will meet throughout both sessions, in context rather than abstractly.
Working solo, months later. You wrote clean_data.R in February. It is now August, and a coefficient no longer matches the result you reported at a seminar in April. What did you change? With Git, every save-point is a commit. The list of commits is your log. The difference between any two versions is a diff. You find the change, understand it, and decide whether it was right.
Refactoring clunky code. Your analysis.R has grown to 600 lines and does everything: data cleaning, regressions, tables, figures. Adding anything new is painful. You want to split it into three cleaner scripts with one job each, but the refactor touches nearly every line of the project and risks introducing bugs that quietly change your results. With Git, you create a branch called restructure-scripts, do the surgery there, and verify that the refactored code produces identical output. If it does, you merge the branch back. If it breaks something you cannot diagnose, you delete the branch, and your old working code is untouched.
Working with a co-author. You are cleaning data; your co-author is writing the robustness checks. Both of you need to edit the same project simultaneously. Each of you keeps your own clone of the project. You push your changes to a shared remote on GitHub. Your co-author pulls your changes into their copy. If you both edited the same line, Git surfaces a merge conflict so one of you decides the final version. Nobody’s work is silently overwritten.
Returning to a paper revision. You submitted to a journal in March. The R&R comes back in August. You need to run new analyses but also reproduce the figures from the submitted version if a referee asks. With Git, you tag the submitted code with a name like v1.0-first-submission. A single git checkout brings it back to your screen, years later if needed.
Recovering from a mistake. You edit run_regression.R for three hours, save, close, and then realize you deleted something you needed. If the earlier version was committed, Git lets you restore the file to that commit. If you committed the deletion too, you undo the commit with reset. Git is forgiving, but only for states it knows about, which is why committing often matters.
Working with AI coding tools. You ask Claude Code or Copilot to refactor a function. It returns 40 lines of new code: some better than yours, some subtly wrong. With Git, you review the AI’s suggestion as a diff, stage only the lines that are good, and revert the commit cleanly if something breaks hours later. Without Git, AI-assisted edits are a gamble.
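Each of these terms (commit, log, diff, branch, merge, clone, push, pull, merge conflict, tag, restore, reset) is covered in detail in this session or in Session 5. For now, the point is that Git is not one workflow: it is a toolkit for different research situations.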
Setup
You should already have Git installed from the pre-class setup on the Preliminaries page. We now need to verify the installation and tell Git who you are, so it can attach your name to every change you make. We go step by step.
Step 1: Open your terminal
Every command in this session is typed into a terminal: a program that lets you send instructions to your computer as text.
- On Mac: open the Terminal app. Find it by pressing Cmd+Space to open Spotlight, typing “Terminal”, and pressing Enter.
- On Windows: open Git Bash (installed together with Git). Find it by pressing the Windows key, typing “Git Bash”, and pressing Enter.
A window opens with a few lines of text ending in a prompt. On Mac it looks like yourname@yourlaptop ~ % (the default zsh shell; some setups show $ instead). On Windows it looks like yourname@yourlaptop MINGW64 ~ followed by $. The $ or % marks where your typing goes. You write a command after it and press Enter to run it.
Keep this window open for the rest of the session.
Step 2: Confirm Git is installed
At the prompt, type the following and press Enter:
git --version
You should see output like:
git version 2.39.0
The exact version number will differ. As long as the output begins with git version, Git is installed and you can move on. If instead you see command not found or a similar error, Git is not installed on this machine. Go back to the Preliminaries page and follow the install instructions before continuing.
Step 3: Check whether your identity is already set
Git attaches a name and email to every change you make. Before setting these values, check whether they are already set from earlier coursework or another project.
At the prompt, type the following, pressing Enter after each line:
git config --global user.name
git config --global user.email
If both commands print a name and an email (one per command), Git already knows who you are. You can skip to The Three Areas of Git below.
If either command prints nothing (a blank line, or an empty prompt), continue to Step 4.
Step 4: Set your identity
Both commands use the --global flag, which means the setting is stored once per laptop and applied automatically to every Git project on this machine. You do this once; you do not repeat it for each project.
Replace "Your Name" and "you@email.com" with your actual name and email, then run each command:
git config --global user.name "Your Name"
git config --global user.email "you@email.com"
You will not see any output after either command. Silence means success. Git now knows who you are.
If you plan to push your work to public GitHub repositories, the email you set above will be visible in the commit log to anyone who views the repo. If you prefer to keep your real email private, GitHub provides a stable noreply address you can use instead. We cover the setup in Session 5.
Step 5: Verify the settings
Run the two check commands from Step 3 again:
git config --global user.name
git config --global user.email
Both should now return the name and email you just set. Setup is complete.
To see every global setting at once, run:
git config --global --list
This prints your full global configuration: your name, your email, and any other settings you have configured, such as a default editor or default branch name. It is useful later for diagnosing “why is Git acting strange”.
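To see what --global actually changes on disk, here is a minimal sketch for Terminal or Git Bash. It points HOME at a throwaway folder so your real settings stay untouched; the values Global Name and Project Name are placeholders. The --global value lands in ~/.gitconfig, while a setting made without --global inside a repository is stored in that repository's .git/config and takes precedence there.

```shell
set -e
unset XDG_CONFIG_HOME                  # make sure ~/.gitconfig is the global file
sandbox=$(mktemp -d)
export HOME="$sandbox"                 # pretend home folder, so real settings are safe

git config --global user.name "Global Name"
cat "$HOME/.gitconfig"                 # the --global setting is written here

mkdir "$HOME/project"
cd "$HOME/project"
git init -q
git config user.name "Project Name"    # no --global: stored in .git/config

git config user.name                   # inside this repo, the local value wins
git config --global user.name          # the global value is unchanged
git config --list --show-origin | grep user.name   # which file each value came from
```

This is why you set your identity once with --global: every new repository inherits it unless that repository overrides it.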
The Three Areas of Git
Every Git project has three areas. Understanding them is the key mental model:
| Area | What it is | How you interact |
|---|---|---|
| Working Directory | The files you see and edit | You edit files normally |
| Staging Area | A holding zone for the next commit | git add moves files here |
| Repository | The permanent history of commits | git commit saves a snapshot |
The workflow is: edit → stage → commit.
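A minimal sketch of edit → stage → commit in a throwaway repository (created with mktemp -d, with a repo-local name and email so your global identity is not touched). git status --porcelain prints a two-character code per file, which makes each area visible: ?? for untracked, A for a staged new file. The file notes.R is a placeholder.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

echo 'x <- 1' > notes.R              # 1. edit: the file exists only in the working directory
git status --porcelain               # ?? notes.R   (untracked)

git add notes.R                      # 2. stage: a copy goes to the staging area
git status --porcelain               # A  notes.R   (staged new file)

git commit -q -m "Add notes script"  # 3. commit: the snapshot joins the repository
git status --porcelain               # no output: working tree clean
git log --oneline                    # one commit in the history
```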
Think of Git’s workflow as online shopping.
- Working directory is browsing the store and dropping items into your cart. You can add, remove, change your mind freely.
- Staging area is the checkout page where you review your cart before ordering. You can still remove things or add more.
- Repository is your order history. Every order is timestamped and permanent.
The command mapping is straightforward:
- git add moves a file to the checkout page.
- git restore --staged removes a file from the checkout page (we cover this later).
- git commit presses Place Order. You get a confirmation number (the commit hash), and the order joins your permanent history.
- git log is your order history page.
The staging step exists for the same reason stores have a checkout review: before you make anything permanent, you want to see exactly what you are about to commit to.
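The checkout-page analogy as commands, sketched in a throwaway repository: change a committed file, stage it, take it back off the checkout page with git restore --staged, and confirm the edit itself is still on disk. This assumes Git 2.23 or later, where git restore exists; analysis.R is a placeholder file.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"     # the order history now has one order

echo 'x <- 2' > analysis.R                 # change an item in the cart
git add analysis.R                         # move it to the checkout page
git status --porcelain                     # "M  analysis.R": staged modification

git restore --staged analysis.R            # take it back off the checkout page
git status --porcelain                     # " M analysis.R": modified, not staged
cat analysis.R                             # x <- 2: the edit itself is untouched
```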
Your First Repository
Before creating your first repository, make sure you are not inside Dropbox, Box, iCloud Drive, OneDrive, or Google Drive. These services can race against Git’s internal file writes and silently corrupt your repository. A GitHub remote (covered in Session 5) already serves as your project’s cloud backup; layering another cloud sync on top creates conflicts.
A common convention is to keep all Git projects in a single dedicated folder in your home directory, for example ~/github/. Each project is a subfolder of that.
If you do not have this folder yet, create it now:
mkdir ~/github
cd ~/github
From now on, all mkdir some-new-project commands in this tutorial assume you are inside ~/github/.
Step 1: Create a project folder
At the prompt (still inside ~/github/ from the callout above), type the following two commands, pressing Enter after each:
mkdir my-research
cd my-research
- mkdir my-research creates a new empty folder called my-research.
- cd my-research changes your current location to inside that folder.
Both commands produce no output. Your prompt should now end with my-research, something like yourname@yourlaptop my-research $. That ending signals you are now inside the project folder.
Step 2: Initialize Git
At the prompt, type:
git init
You should see output similar to:
Initialized empty Git repository in /Users/yourname/github/my-research/.git/
This command created a hidden .git/ folder inside my-research. That folder is where Git stores all version history, configurations, and internal bookkeeping. You never open it or touch its contents directly.
Do not delete the .git/ folder. Deleting it erases all the history of your project. The folder is hidden by default, so you are unlikely to find it by accident, but if you ever see it in a file browser with “show hidden files” turned on, leave it alone.
Step 3: Check the status
At the prompt, type:
git status
You should see output that looks roughly like this:
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
Three things to notice:
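- You are on branch main. Every Git repository has at least one branch, and this course uses main as the default branch name. If your output says master instead, your Git installation uses the older default name; everything in this session works the same way.
- No commits yet. Git knows the folder is now a repository, but you have not saved any snapshots of your work yet.
- Nothing to commit. The folder contains no files yet, so Git has nothing to track.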
This is your starting point. In the next section, you will create a file and take the first snapshot.
Staging and Committing
Create a file
macOS TextEdit defaults to rich text (RTF), which writes invisible formatting markup into the file, and it may add an extra extension to your filename, so clean_data.R can end up as clean_data.R.rtf or clean_data.R.txt. Windows Notepad can likewise append .txt, and older versions mishandle the encodings and line endings that code files depend on.
Use RStudio (File → New File → R Script) or VS Code (File → New File) instead. Both are installed as part of the Preliminaries setup and handle plain-text code files correctly. For Terminal users, running code . inside your project folder opens the whole folder in VS Code in one step.
We create the file in two steps: first open the project in your editor, then create the file. Opening the project first is what lets your editor see the Git state live. Pick whichever editor you want to use:
Open the folder in VS Code. Launch VS Code (from Applications, Spotlight, or the Start menu). Use File → Open Folder, browse to my-research, and click Open.
VS Code opens a new window showing the contents of my-research in the file explorer on the left. The Source Control panel (the three-node fork icon in the left activity bar) now displays the Git state for this repository.
Shortcut: opening a folder with code .
Once you work with Git often, clicking through File → Open Folder each time becomes tedious. VS Code provides a shortcut: from your terminal, inside any folder, type code . and press Enter. The . means “this folder”, and VS Code opens directly into it.
Why this is useful now. You are already navigating to folders in your terminal to run Git commands. Being able to jump into the editor from that same spot, without switching to the File menu, keeps your workflow in one place. It also guarantees you open the exact folder you are currently in, so you cannot accidentally open the wrong one.
Why this pays off later. When you eventually work on a remote machine, such as Cornell’s CAC cluster or a cloud VM, to run analyses too large for your laptop, VS Code’s Remote - SSH extension lets your local VS Code window show and edit files that actually live on the remote machine, and typing code . in that window’s terminal opens remote folders the same way. The editor feels local; the files are remote. Getting used to opening projects from the terminal now means the habit is already in place the first time you need it, possibly years from now.
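One-time setup:
- Open VS Code.
- Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux) to open the Command Palette.
- Start typing Shell Command. Select Shell Command: Install ‘code’ command in PATH.
- Close and reopen your terminal window (the old one caches the PATH).
On Windows, the VS Code installer normally adds code to your PATH already, so if the Shell Command entry does not appear, try code . directly.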
Now, from any terminal, code . opens the current folder in VS Code. If you skip this setup, File → Open Folder still works fine.
Create the file. In VS Code, use File → New File (or Cmd+N / Ctrl+N). Paste the content below. Use File → Save As and name the file clean_data.R, making sure it saves inside the my-research folder.
Open the folder as an RStudio Project. RStudio needs a Project (a .Rproj file) for Git integration to appear. Go to File → New Project → Existing Directory, browse to your my-research folder, and click Create Project. RStudio restarts in project mode. The Git pane appears in the top-right, connected to your local repository.
Create the file. Use File → New File → R Script. Paste the content below. Save with File → Save, naming the file clean_data.R. RStudio saves it inside the project folder by default.
If you prefer neither RStudio nor VS Code, create the file in any other plain-text code editor (not TextEdit or Notepad; see the warning above). Paste the content below and save it as clean_data.R inside my-research.
The content for the file:
# clean_data.R
# Load and clean the wage dataset
# Uses wooldridge::wage1 (Jeffrey Wooldridge's teaching dataset; 526 obs)
wages <- wooldridge::wage1
# Remove observations with missing wages (none in wage1, but real data always needs this)
wages <- wages[!is.na(wages$wage), ]
# Log transformation
wages$log_wage <- log(wages$wage)
After saving, verify back in your Terminal (still inside my-research):
ls
You should see clean_data.R listed. Now run:
git status
You should see something like:
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
clean_data.R
nothing added to commit but untracked files present (use "git add" to track)
clean_data.R appears under Untracked files. Git has noticed the file exists but is not yet tracking it. The next step stages it.
If you have my-research open in VS Code, clean_data.R appears in the file tree marked with a green U (untracked). In RStudio, the Git pane lists it with a yellow ? icon. You do not need to do anything in the editor; it already reflects what git status just told you.
Stage the file
From here on, command blocks appear in tabs: Terminal, RStudio, and VS Code. All three do exactly the same thing. In class we walk through the Terminal tab on the projector.
Why Terminal for teaching? Four reasons worth naming up front.
The mental model is cleaner. edit → stage → commit is three explicit commands. In a GUI, staging is often just a checkbox, and students never really register what the operation is until something breaks. A rigorous researcher needs to understand the operation, not just the button.
Terminal Git works everywhere. GUIs do not. You will eventually use Cornell’s CAC cluster, FSRDC secure computing environments, AWS or Google Cloud VMs, or other remote machines, and many of these give you only a command line. A researcher who only knows the RStudio Git pane is stuck the first time they work with a dataset that does not fit on their laptop.
Documentation, error messages, and AI tools all speak command-line. Stack Overflow answers, the Pro Git book, Happy Git with R, and assistants such as Claude Code, GitHub Copilot, and Cursor overwhelmingly express Git help as commands to run. If the only Git you know is “I clicked the button in RStudio”, you cannot use those resources to get unstuck.
Git is scriptable. Researchers call system("git rev-parse --short HEAD", intern = TRUE) inside R to embed the current commit hash in an output file for reproducibility. They run git archive inside a Makefile to package the committed files. They call git tag automatically from a submission script at paper-submission time. None of this is available through a GUI.
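The same commit-hash idea, sketched in plain shell for a throwaway repository. git rev-parse --short HEAD prints the abbreviated hash of the current commit; the results/ folder and the file name table2_version.txt are hypothetical. The sketch also warns when tracked files have uncommitted edits, since the hash then does not fully describe the code that ran.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# Stamp an output file with the commit that produced it
mkdir -p results
hash=$(git rev-parse --short HEAD)
echo "Produced by commit $hash" > results/table2_version.txt

# Tracked files with uncommitted edits mean the hash is not the whole story
if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
  echo "Warning: uncommitted changes; the hash does not fully describe this run" >&2
fi
cat results/table2_version.txt
```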
The bottom line. We teach commands in class so you have the vocabulary, the portability, and the safety net when things go wrong. Once you understand what each command does, feel free to use RStudio or VS Code for daily work. The tabs on this page show the equivalent actions in each interface.
Terminal:
git add clean_data.R
RStudio: Click Commit in the Git pane (top-right of the IDE). The Review Changes dialog opens. Select clean_data.R in the top pane. The bottom pane shows your diff. Click the Stage button, or tick the checkbox next to the file. It moves from the Unstaged list to the Staged list.
Quick shortcut. If you already know what you are staging and do not need to see the diff, tick the Staged checkbox directly in the Git pane without opening the dialog.
No Git pane? The project must be an RStudio Project (.Rproj) inside a Git repository. Tools → Project Options → Git/SVN → Version control system: Git, then restart RStudio.
VS Code: Open the Source Control panel (the three-node fork icon in the left activity bar, or ⌃⇧G on Mac / Ctrl+Shift+G on Windows). Under Changes, click clean_data.R. A diff view opens in the editor showing your changes. In the Source Control panel, hover over clean_data.R and click the + icon to stage. The file moves from Changes to Staged Changes.
Quick shortcut. If you already know what you are staging, click the + directly without previewing the diff.
All three do the same thing: they tell Git “include this file in the next commit.”
Check the status again
At the prompt, run:
git status
You should see output like:
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: clean_data.R
The key line is new file: clean_data.R under “Changes to be committed”. This confirms the file is staged, that is, ready to be included in the next commit.
If you have VS Code or RStudio open on this folder right now, they are already displaying everything git status just told you. You do not need to run the command to know the state of your repo; your editor is watching.
VS Code. The Source Control panel (branch icon, left activity bar) lists all modified and staged files, updating live. The file tree shows letters next to each file: U for untracked, M for modified, A for added/staged. Open a file and the gutter next to the line numbers shows green/blue/red bars for added, modified, and deleted lines compared to the last commit.
RStudio. The Git pane (top-right) shows the same information with status icons. It refreshes when you save a file; if you run Git commands in a separate terminal and the pane looks out of date, click its refresh button.
This is one of the underrated benefits of keeping an editor open on your project: the Git state is ambient, not something you have to query. You can run Terminal commands in class and watch the editor reflect each change instantly.
Commit
At the prompt, run:
git commit -m "Add data cleaning script"
You should see output like:
[main (root-commit) a1b2c3d] Add data cleaning script
1 file changed, 10 insertions(+)
create mode 100644 clean_data.R
A few details to notice:
- The first line shows the branch (main), a note that this is the root commit (the very first commit in the repository), and a short commit hash (a1b2c3d in the example; yours will differ).
- The following lines summarize what changed: one file added with ten new lines.
- The -m flag lets you write the commit message inline without opening an editor. Every commit needs a message, and the message should explain why the change happened.
Run git status again:
git status
You should now see:
On branch main
nothing to commit, working tree clean
“Working tree clean” means every change in your project folder has been committed. This is the state you want to leave your project in at the end of each work session.
In VS Code, clean_data.R disappears from the Source Control panel. The Staged and Changes lists are both empty. In RStudio, the Git pane empties out. Empty panels mean there is nothing pending, which matches “working tree clean” in the Terminal.
Make another change and commit
The workflow repeats for every change. Let us practice once.
Open clean_data.R in your editor (RStudio or VS Code). Add this line at the end of the file:
# Keep only workers with at least a high school education
wages <- wages[wages$educ >= 12, ]
Save the file.
Back at the Terminal prompt, stage the change and commit it:
git add clean_data.R
git commit -m "Filter to workers with high school education or more"
You should see output similar to before, this time reporting one file changed and two lines added. Run git status to confirm the working tree is clean again.
You now have two commits in the project history. You can browse them visually in your editor.
RStudio. In the Git pane (top-right), click the History button (clock icon). A dialog opens listing every commit with its author, date, and message. Click any commit to see its diff in the lower pane. Both of your commits are there, newest on top.
VS Code. Click on clean_data.R in the file explorer to open it. In the explorer sidebar, expand the Timeline section at the bottom. It lists every version of this file in the commit history, with your commit messages as labels. Click any entry to open a diff between that version and the one before it. For a full repo-wide commit graph, install the free Git Graph or GitLens extension from the Extensions marketplace.
Writing good commits
A commit has two parts: what changed (the files you staged) and why (the message). Both matter.
Be specific in the message. “Fix bug” is unhelpful. “Fix off-by-one error in sample selection” is useful. Your future self, three months from now, will thank you. Your co-authors will thank you even more.
One logical change per commit. If you need the word “and” to describe what a commit does, it is probably two commits. Good messages read like a research changelog:
Add control for state fixed effects
Switch to winsorized outcome at 1%
Fix sample filter for pre-2000 observations
Mixing unrelated changes in one commit makes it impossible to revert just one of them later.
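A minimal sketch of why this matters, in a throwaway repository: two unrelated edits are staged and committed separately, so one of them can later be undone on its own. git revert creates a new commit that reverses an earlier one without rewriting the history after it. The scripts regression.R and sample.R and their one-line contents are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
printf 'controls <- "none"\n' > regression.R
printf 'cutoff <- 0\n' > sample.R
git add regression.R sample.R
git commit -q -m "Add regression and sample scripts"

# Two unrelated edits made in the same work session
printf 'controls <- "state_fe"\n' > regression.R
printf 'cutoff <- 2000\n' > sample.R

# Stage and commit each logical change on its own
git add regression.R
git commit -q -m "Add control for state fixed effects"
git add sample.R
git commit -q -m "Fix sample filter for pre-2000 observations"

# Months later: undo only the fixed-effects change
git revert --no-edit HEAD~1
cat regression.R   # back to controls <- "none"
cat sample.R       # the sample fix survives: cutoff <- 2000
```

Had both edits gone into one commit, reverting it would have thrown away the sample fix as well.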
Back to the shopping cart. Each commit is an order. If an order contains one coherent purchase (all the groceries for the week), returning it makes sense. If it contains a book, a stapler, and a bag of flour lumped together, returning just one item is painful. Before committing, ask yourself: if I had to undo just this commit six months from now, would that make sense as a unit?
Different styles exist. Researchers differ in how granular their commits are:
- Atomic commits. Every small change is a commit. Very detailed history. Easy to review and revert. Standard in open source software. Recommended by Jenny Bryan in Happy Git with R, the canonical Git guide for R users.
- Feature commits. One commit per completed task, for example “baseline regression table done”. Bigger commits but still coherent. More common in solo research.
- End-of-session commits. One commit per day, bundling everything. Tempting but loses the ability to audit what you did and where.
For PhD research, aim between atomic and feature. Commit after each logical step in your workflow: data pulled, sample selected, variables built, model estimated, figure produced. Roughly one commit per line you would write in a research diary.
When to commit. Whenever the code runs and produces the intended result after your change. Also before starting something risky (a refactor, an alternative specification), and before you stop working for the day.
Avoid these messages. "Fixed stuff", "WIP", "Lots of changes", ".". They mean you skipped thinking about what you changed. Either split the commit into pieces with meaningful messages, or think for another ten seconds about what to write.
Exercise 1: Add a Second Script and Commit It
Time: ~5 minutes. Work in your existing my-research project.
So far you have been typing along with the walkthrough. In this exercise you apply the same loop independently with a new file. Pick the interface you want to use for the rest of the course and practice the full workflow in it: create a file, stage it, commit it, check the log.
The task
Add a second R script to my-research called run_regression.R with the content below. Stage it, commit it with a meaningful message, then view the resulting history.
# run_regression.R
# Estimate a Mincer wage equation on wooldridge::wage1
wages <- wooldridge::wage1
# Basic OLS
model1 <- lm(log(wage) ~ educ + exper + tenure, data = wages)
summary(model1)
How to do it in each interface
Terminal:
- Create run_regression.R in your editor (RStudio or VS Code) and save it inside my-research with the content above.
- Back at the prompt, confirm Git sees it:
git status
You should see run_regression.R under Untracked files.
- Stage, commit, and check the log:
git add run_regression.R
git commit -m "Add baseline Mincer wage regression"
git log --oneline
You should see your new commit on top, followed by the two commits from the walkthrough: three commits total (or more, if you made extra commits along the way).
RStudio:
- With my-research open as an RStudio Project, use File → New File → R Script. Paste the content. Use File → Save and name it run_regression.R.
- The Git pane (top-right) now lists run_regression.R as untracked (yellow ?).
- Click the Commit button in the Git pane. The Review Changes dialog opens.
- Tick the Staged checkbox next to run_regression.R (or click the Stage button).
- In the message box at the top, type Add baseline Mincer wage regression.
- Click Commit. A dialog confirms the commit was made; close it.
- Click the History button (clock icon) in the Git pane. You should see your new commit on top, followed by the walkthrough commits.
VS Code:
- With my-research open as a VS Code folder, use File → New File. Paste the content. Save with File → Save As and name it run_regression.R inside the my-research folder.
- In the Source Control panel (branch icon, left activity bar), run_regression.R appears under Changes.
- Hover over run_regression.R and click the + icon to stage it. It moves to Staged Changes.
- In the message box at the top of the Source Control panel, type Add baseline Mincer wage regression.
- Press Cmd+Enter (Mac) or Ctrl+Enter (Windows/Linux), or click the checkmark icon, to commit.
- Open run_regression.R and expand the Timeline section in the file explorer. Your new commit appears at the top.
Expected result
However you did it, at the end your repository has one additional commit. Running git log --oneline in the Terminal should produce something like:
a9b8c7d Add baseline Mincer wage regression
7f3a2b1 Filter to workers with high school education or more
2c5e8a1 Add data cleaning script
Newest on top. Three commits, one per logical change, each with a meaningful message. This is the habit we want.
Viewing History
You now have three commits in your project. Git offers two commands to explore that history: git log lists commits, and git diff shows changes between them.
git log: the commit list
At the prompt, run:
git log
You should see output like:
commit a9b8c7d6e5f4a3b2c1d0e9f8a7b6c5d4e3f2a1b0 (HEAD -> main)
Author: Your Name <you@email.com>
Date:   Sat Apr 18 15:40:05 2026 -0400

    Add baseline Mincer wage regression

commit 7f3a2b1c9e5d4f6a8b2c1d3e4f5a6b7c8d9e0f1a
Author: Your Name <you@email.com>
Date:   Sat Apr 18 15:30:22 2026 -0400

    Filter to workers with high school education or more

commit 2c5e8a1b3d9f4e7a6b2c1d3e4f5a6b7c8d9e0f1a
Author: Your Name <you@email.com>
Date:   Sat Apr 18 15:25:10 2026 -0400

    Add data cleaning script
Each commit entry shows four fields: the commit hash (a 40-character unique ID), the author, the date, and the message. Your hashes will differ; they are generated from the content and metadata of each commit.
What is HEAD?
In the output above, the newest commit is annotated (HEAD -> main). HEAD is Git’s pointer to your current commit. Think of it as a bookmark: it marks where you are in the repository’s history.
- When you commit, HEAD moves forward to the new commit.
- When you undo a commit (git reset --soft HEAD~1, covered below), HEAD moves backward.
- HEAD~1 means “one commit before HEAD”; HEAD~2 means two before, and so on.
HEAD -> main means “HEAD points to the tip of the branch called main.” In the undoing-mistakes diagrams below, ← HEAD marks the commit you are currently on.
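To see HEAD and HEAD~1 as concrete commits, here is a sketch in a throwaway repository with two placeholder commits. git log -1 --format=%s prints just the message of one commit, and git symbolic-ref --short HEAD prints the branch that HEAD currently points to.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo 'step 1' > log.txt && git add log.txt && git commit -q -m "First commit"
echo 'step 2' >> log.txt && git commit -q -am "Second commit"

git log -1 --format=%s HEAD     # Second commit  (where you are now)
git log -1 --format=%s HEAD~1   # First commit   (one step back)
git symbolic-ref --short HEAD   # the branch HEAD points to (main or master)
```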
If your history fills more than one screen, Git opens the output in a pager program called less. The screen fills and the prompt disappears. Navigate with the arrow keys, Page Down, or Space. To exit and return to your prompt, press q. This is one of the most common “I am stuck in Git” moments for new users.
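For a single command, or inside a script, you can bypass the pager entirely with git --no-pager, which prints the whole output and returns you to the prompt. A sketch in a throwaway repository with three placeholder commits; -n 2 limits the listing to the newest two.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
for i in 1 2 3; do
  echo "step $i" >> steps.txt
  git add steps.txt
  git commit -q -m "Record step $i"
done

# --no-pager goes before the subcommand: no less, no need to press q
git --no-pager log
# Show only the newest two commits
git --no-pager log -n 2
```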
For a compact one-line-per-commit view:
git log --oneline
Output:
a9b8c7d Add baseline Mincer wage regression
7f3a2b1 Filter to workers with high school education or more
2c5e8a1 Add data cleaning script
The seven-character hashes are short prefixes of the full ID. They are unique enough to refer to a specific commit in a small repository and much easier to read.
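A quick check, in a throwaway repository, that a short hash is just the beginning of the full one and that Git accepts either form. git rev-parse --short may print more than seven characters if a shorter prefix would be ambiguous.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

full=$(git rev-parse HEAD)           # full 40-character ID
short=$(git rev-parse --short HEAD)  # abbreviated ID, usually 7 characters
echo "$full"
echo "$short"

# Git resolves the short form to the same commit
git log -1 --format=%s "$short"
```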
git diff: what changed
git diff shows what you have changed in the working directory since the last commit. Run it now:
git diff
You should see no output, and the prompt returns right away. Empty output means nothing has been modified since the last commit: the working tree is clean, so there is nothing to diff.
To see git diff in action, open clean_data.R in your editor and add one line at the end of the file:
cat("Rows after cleaning:", nrow(wages), "\n")
Save the file. Back in the Terminal, run git diff again:
git diff
You should now see output like:
diff --git a/clean_data.R b/clean_data.R
index 1a2b3c4..5d6e7f8 100644
--- a/clean_data.R
+++ b/clean_data.R
@@ -9,3 +9,5 @@ wages <- wages[!is.na(wages$wage), ]
# Log transformation
wages$log_wage <- log(wages$wage)
+
+cat("Rows after cleaning:", nrow(wages), "\n")
Lines starting with + are added. Lines starting with - are removed (none in this example). The @@ line gives the position of the change. The numbers after - and + are the starting line and line count of this excerpt in the old and new versions of the file. This layout is called a unified diff. Emailed patches and GitHub pull requests use the same format, and it appears in many Stack Overflow answers.
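Sometimes you only need a summary rather than the full diff. Two options help here, shown below as a sketch in a throwaway repository with placeholder contents. --stat lists each changed file with a count of changed lines. --quiet prints nothing but sets an exit status (0 for no changes, 1 for changes) that a script can test.

```shell
#!/bin/sh
# Sketch: two compact companions to git diff, in a throwaway repository.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "you@email.com"
printf '%s\n' '# Log transformation' 'wages$log_wage <- log(wages$wage)' > clean_data.R
git add clean_data.R
git commit -q -m "Add data cleaning script"

# The same edit as above: a blank line plus a line that prints the row count.
printf '%s\n' '' 'cat("Rows after cleaning:", nrow(wages), "\n")' >> clean_data.R

git diff --stat                 # one summary line per changed file
if git diff --quiet; then       # --quiet prints nothing; it reports via exit status
  state=clean                   # exit status 0: no unstaged changes
else
  state=modified                # exit status 1: something differs
fi
echo "clean_data.R is $state"
```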
git diff vs. git diff --staged
Git separates the two halves of your change: what is in the working directory (edits you have saved on disk) versus what is in the staging area (edits you have marked with git add). Each half has its own diff command.
- git diff shows the changes in the working directory that are not yet staged.
- git diff --staged shows the changes that are staged and will go into the next commit.
Run the staged-diff command now, before staging anything:
git diff --staged
You will see no output. That is expected: nothing has been staged yet. The difference between the last commit and what is currently staged is zero.
Now stage the line you added:
git add clean_data.R
Re-run both diff commands:
git diff
Empty. The working directory now matches the staging area, so there is nothing unstaged to show. Your edit has moved into the staging area.
git diff --staged
Now you see the + line that is waiting to be committed. It is the same diff you saw a moment ago. It moved from the “unstaged” view to the “staged” view as soon as you ran git add.
This is the practical value of git diff --staged. You run it right before git commit as a final review of exactly what will go into the next commit.
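You can also script the same round trip, which makes the two views easy to compare side by side. This is a sketch in a throwaway repository with placeholder contents. --name-only trims each diff down to the list of files it touches.

```shell
#!/bin/sh
# Sketch: where one edit shows up before and after git add, in a throwaway repo.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "you@email.com"
echo '# cleaning steps' > clean_data.R
git add clean_data.R
git commit -q -m "Add data cleaning script"

printf '%s\n' 'cat("Rows after cleaning:", nrow(wages), "\n")' >> clean_data.R
unstaged_before=$(git diff --name-only)          # clean_data.R: saved, not staged
staged_before=$(git diff --staged --name-only)   # empty: nothing staged yet

git add clean_data.R
unstaged_after=$(git diff --name-only)           # empty: working directory matches staging
staged_after=$(git diff --staged --name-only)    # clean_data.R: queued for next commit

echo "before add: unstaged=[$unstaged_before] staged=[$staged_before]"
echo "after add:  unstaged=[$unstaged_after] staged=[$staged_after]"
```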
Commit it
Since we like this edit, commit it:
git commit -m "Log number of rows after cleaning"
You should now have three commits. Verify:
git log --oneline
Three entries, newest on top.
The same history and diffs are available in your editor, in visual form.
RStudio. The History button in the Git pane opens the same commit list, with click-to-diff on each entry. The Review Changes dialog shows diffs side-by-side and color-coded, which is easier to read than the Terminal output for anything longer than a few lines.
VS Code. The Timeline section in the file explorer shows per-file version history. Click any entry to see the diff. Any open file also shows colored markers in the gutter to the left of the line numbers. These flag added, modified, and deleted lines relative to the last commit and update as you type.
.gitignore
Not every file belongs in version control. Large data files, sensitive credentials, and system files should be excluded. Git uses a special file called .gitignore to know which patterns to skip.
Create the file
The file must be named exactly .gitignore, starting with a dot and with no extension after it. It lives in the top level of your project folder (the same folder as clean_data.R).
From the prompt, inside my-research, create an empty .gitignore file with touch:
touch .gitignore
touch creates an empty file with the given name. You will see no output; silence means success. Confirm with:
ls -a
The -a flag shows hidden files (those starting with a dot). You should see .gitignore in the listing. Now open it in your editor to add content:
code .gitignore
If you did not set up the code command, open it manually through RStudio or VS Code’s file explorer.
In RStudio (with my-research open as a Project), use File → New File → Text File. A blank untyped document opens in the editor. Use File → Save As and enter the filename as .gitignore (starting with a dot, nothing else). Save inside my-research. RStudio may warn about files whose names begin with a dot; accept the warning and save.
In VS Code, go to the Explorer sidebar (with my-research open as the workspace) and hover over the top row showing the folder name MY-RESEARCH. A row of small icons appears to the right. Click the New File icon (a page with a +). Type .gitignore as the name and press Enter. The file is created and opens for editing.
On macOS, files whose names begin with a dot are hidden in Finder by default. Do not be alarmed if .gitignore does not appear when you browse the project folder in Finder. The Terminal (ls -a) and your code editors both show it normally.
Add the patterns
Before pasting patterns, three bits of .gitignore syntax:
- * is a wildcard that matches any characters in a filename, so *.csv means “any file ending in .csv.”
- A trailing / marks a directory; data/ ignores the folder data and everything inside it.
- A pattern with no / (other than a trailing one) matches at any depth in the project. A pattern beginning with / is anchored to the project root.
In the .gitignore file you just opened, paste the following content and save:
# Data files (too large or sensitive for Git)
*.csv
*.dta
data/
# R artifacts
.Rhistory
.RData
.Rproj.user/
# System files
.DS_Store
Thumbs.db
Each line is a pattern:
- *.csv matches any .csv file anywhere in the project.
- data/ matches any folder called data, including all its contents.
- .DS_Store matches exactly that filename (macOS folder metadata).
- Lines starting with # are comments, ignored by Git.
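To check a pattern without guessing, use git check-ignore. It reports whether Git would ignore a path and, with -v, which .gitignore line is responsible. Here is a sketch in a throwaway repository. The file names are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: ask Git which .gitignore line (if any) matches a path.
set -e
cd "$(mktemp -d)"
git init -q
printf '%s\n' '# Data files' '*.csv' '*.dta' 'data/' > .gitignore
mkdir data
touch data/wages.dta wages.csv clean_data.R    # empty placeholder files

git check-ignore -v wages.csv data/wages.dta   # prints source:line:pattern, a tab, then the path
if git check-ignore -q clean_data.R; then      # exit status 0 = ignored, 1 = not ignored
  r_file=ignored
else
  r_file=not-ignored
fi
echo "clean_data.R is $r_file"
```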
Commit the .gitignore file
Back in the Terminal (still in my-research), stage and commit:
git add .gitignore
git commit -m "Add .gitignore for data and R artifacts"
.gitignore is itself a tracked file in your repository. Unlike the files it excludes, it belongs in Git: your collaborators need to see the same ignore patterns you do.
Data files do not belong in Git. Git is designed for code (small text files), not for large datasets. If you commit a 500 MB CSV, every collaborator will have to download it with the full repository. Use a shared drive, Dropbox, or a data repository for large files.
Linking to data that lives elsewhere
You have told Git to ignore your data folder. But your code still needs to read the data. In practice, many applied economists split their project files across two places:
- Code in ~/github/my-research/ (this Git project, not on Dropbox).
- Data in a cloud-synced folder like ~/Dropbox/projects/my-research/data/ (backed up, shared across machines, not version-controlled with Git).
The cleanest bridge between the two is a symbolic link (or symlink): a tiny pointer file that looks like a folder but actually redirects to another location. You put a symlink called data inside your project that points to your Dropbox data folder. Your code reads from data/wages.csv, and the operating system transparently serves it from Dropbox.
Think of it as a signpost: data lives here (in the project folder) but actually points there (to Dropbox). R, Python, Stata, and almost every other tool follow symlinks automatically without knowing anything is unusual.
The exact command depends on your operating system. Run this once per project.
On macOS (and Linux), first navigate to your project folder from the Terminal. This is essential: ln -s creates the symlink in whatever folder you are currently in.
cd ~/github/my-research
Then create the symlink:
ln -s ~/Dropbox/projects/my-research/data data
This creates a file called data inside my-research that points to ~/Dropbox/projects/my-research/data. If my-research already contains a real folder called data, move or rename it first. Otherwise ln -s puts the new link inside that folder instead. Verify:
ls data/
You should see the contents of the Dropbox folder listed, as if they were inside my-research.
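You can try the macOS/Linux steps without touching your real folders. In the sketch below, temporary folders stand in for ~/github/my-research and the Dropbox data folder, and wages.csv is an empty placeholder file. readlink prints the target that a link points to.

```shell
#!/bin/sh
# Sketch: create and inspect a symlink. Temp folders stand in for
# ~/github/my-research and ~/Dropbox/projects/my-research/data.
set -e
sandbox=$(mktemp -d)
mkdir -p "$sandbox/Dropbox/data" "$sandbox/my-research"
touch "$sandbox/Dropbox/data/wages.csv"       # empty placeholder data file

cd "$sandbox/my-research"
ln -s "$sandbox/Dropbox/data" data

ls -l data        # a leading "l" and a "->" arrow mark a symlink
readlink data     # the target path stored in the link
ls data/          # the Dropbox folder's contents, seen through the link
```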
On Windows, the equivalent is a junction, which does not require administrator rights (unlike true symlinks on Windows). Open Command Prompt (not Git Bash) and first navigate to your project folder. Junctions are created in the current directory, so this step matters.
cd %USERPROFILE%\github\my-research
Then create the junction:
mklink /J data "%USERPROFILE%\Dropbox\projects\my-research\data"
mklink /J creates a directory junction called data inside my-research, pointing to the Dropbox folder. It behaves identically to a symlink for reading files.
Both operating systems offer menu-based alternatives: Make Alias on Mac, Create Shortcut on Windows. Do not use these for this purpose. Aliases and shortcuts are recognized only by the graphical file manager; programs like R do not follow them. Use the Terminal commands above instead.
A few notes.
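First, add data to your .gitignore if it is not there already. The symlink itself is a small file, but you do not want Git to track a pointer to a Dropbox path that will differ on every collaborator’s machine. The existing data/ line in the .gitignore above may not be enough. A trailing slash matches only real folders, and Git records a symlink as a file rather than a folder. Add a separate line reading data, with no trailing slash. It matches a file, folder, or link of that name.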
Second, each collaborator creates their own symlink with their own Dropbox path. The symlink is not shared through the repo; that is by design. You and your co-author may both have a data/ folder in your projects, but pointing to different absolute locations on your laptops.
Third, symlinks make your project portable only for the person who created them. If a stranger clones the repo, they need both the code and the data, and a way to connect them. For published replication packages, you include the data directly in the archive or link to it by a citation, not by a symlink.
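Here is one way to confirm that Git ignores the link. The sketch runs in a throwaway repository, with temporary folders standing in for the project and the Dropbox folder. On macOS and Linux, Git records a symlink as a file rather than a folder, so the sketch compares a pattern with and without the trailing slash.

```shell
#!/bin/sh
# Sketch: does a .gitignore pattern catch a symlink called data?
# Temp folders stand in for ~/github/my-research and the Dropbox data folder.
set -e
sandbox=$(mktemp -d)
mkdir "$sandbox/dropbox_data" "$sandbox/my-research"
cd "$sandbox/my-research"
git init -q
ln -s "$sandbox/dropbox_data" data

printf '%s\n' 'data/' > .gitignore          # trailing slash: matches folders only
with_slash=$(git status --porcelain -- data)

printf '%s\n' 'data' > .gitignore           # no slash: matches a file, folder, or link
without_slash=$(git status --porcelain -- data)

echo "pattern data/ -> [$with_slash]"       # the link is still listed as untracked
echo "pattern data  -> [$without_slash]"    # empty: the link is ignored
```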
Undoing Mistakes
You will make mistakes. Git has tools to recover from the common ones. Four operations cover almost every situation you will hit in the first year.
Each operation below opens with a small before/after diagram using A → B → C ← HEAD notation. Letters are commits in chronological order (A is oldest, C is newest), and each arrow points from an earlier commit to the next one. HEAD is Git’s pointer to your current commit; see the HEAD callout in Viewing History for a full explanation if you skipped it.
| Operation | When to use it | Command |
|---|---|---|
| Unstage a file | You ran git add by accident and want to pull the file back out of staging | git restore --staged <file> |
| Discard local changes | You edited a file and want to throw the edit away before committing | git restore <file> |
| Undo the last commit | You committed too early and want to un-commit but keep the work staged | git reset --soft HEAD~1 |
| Return to an earlier version | You want the version of a specific file from a previous commit | git checkout <hash> -- <file> |
The first three operate on recent state (staging area, working directory, last commit). The fourth reaches into history. Below we walk through each with Terminal, RStudio, and VS Code versions.
Unstage a file
Scenario. You ran git add on a file you did not mean to include in the next commit. The file itself is fine; you just want to pull it out of the staging area so it does not get committed.
git restore --staged clean_data.R
No output. Run git status to confirm. The file now appears under “Changes not staged for commit” (or under “Untracked files”, if it was never committed) rather than under “Changes to be committed”.
In RStudio’s Git pane (top-right), find the file in the list and uncheck the box under the Staged column. The file moves from Staged back to Modified.
Alternatively, from the Review Changes dialog (click Commit to open), select the file in the top pane and click the Unstage button.
In VS Code’s Source Control panel, find the file under Staged Changes. Hover over the row; a - icon (minus) appears to the right. Click it. The file moves back down to Changes.
Alternatively, right-click the staged file and select Unstage Changes.
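Below is a scripted version of the unstage round trip. It assumes Git 2.23 or newer (the release that introduced git restore) and runs in a throwaway repository with placeholder contents.

```shell
#!/bin/sh
# Sketch: stage an edit by accident, then pull it back out of the staging area.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "you@email.com"
echo '# cleaning steps' > clean_data.R
git add clean_data.R
git commit -q -m "Add data cleaning script"

echo '# an edit not meant for the next commit' >> clean_data.R
git add clean_data.R
git restore --staged clean_data.R

staged=$(git diff --staged --name-only)   # empty: nothing is staged any more
unstaged=$(git diff --name-only)          # clean_data.R: the edit is still on disk
git status --short                        # " M clean_data.R": modified, not staged
```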
Discard local changes
Scenario. You edited a file, decided the change is wrong, and want to revert to the last committed version.
This operation is destructive. Uncommitted edits cannot be recovered. If you might want the edit later, commit it first (even with a placeholder message like “WIP: exploring alternative spec”). Once committed, you can always roll back to a previous version.
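git restore clean_data.R
Git overwrites the working-directory copy of the file with the last committed version. Strictly, it uses the staged version, which is the same thing unless you already ran git add on this file. Your edits are gone.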
In RStudio’s Git pane, select the file. Click More → Discard in the pane’s toolbar, or right-click the file and choose Revert. RStudio asks you to confirm. Click Yes. The file returns to its last committed state.
In VS Code’s Source Control panel, find the file under Changes. Hover over it; a ↺ icon (discard) appears. Click it. Confirm in the dialog. The edits are discarded.
Alternatively, right-click the file and select Discard Changes.
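Here is the same discard in script form, as a sketch in a throwaway repository (placeholder contents, Git 2.23 or newer). git diff --quiet exits with status 0 only when no unstaged change is left, which gives a quick way to confirm the edit is gone.

```shell
#!/bin/sh
# Sketch: discard an uncommitted edit, then confirm nothing is left to diff.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "you@email.com"
echo '# cleaning steps' > clean_data.R
git add clean_data.R
git commit -q -m "Add data cleaning script"

echo '# this is a test edit' >> clean_data.R
git diff --stat                 # clean_data.R shows one inserted line
git restore clean_data.R        # destructive: the test edit is gone for good
if git diff --quiet; then
  echo "clean_data.R matches the last commit again"
fi
```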
Undo the last commit (keep the changes)
Scenario. You committed too early. You want to pull the commit back so you can edit more, then recommit with a better message or a different scope. The file changes should stay in your staging area, ready to commit again.
Run:
git reset --soft HEAD~1
HEAD~1 means “one commit before the current commit”. The --soft flag tells Git: undo the commit, but leave all the file changes exactly as they were before the commit, staged and ready.
Run git status after. You should see the changes from the undone commit listed under “Changes to be committed”. No file content is lost.
RStudio’s Git pane does not expose soft reset as a button, but RStudio includes a built-in Terminal tab where you can run the command directly.
- Open the Terminal tab: Tools → Terminal → New Terminal, or the keyboard shortcut Shift+Alt+T (Windows) / Shift+Option+T (Mac).
- The terminal opens inside your project folder. Run:
git reset --soft HEAD~1
The Git pane refreshes automatically and shows the files from the undone commit back under the Staged section.
VS Code’s Source Control panel does not expose soft reset as a button, but VS Code has a built-in integrated terminal where you can run the command.
- Open the integrated terminal: View → Terminal, or the keyboard shortcut Ctrl+` (backtick).
- The terminal opens inside your project folder. Run:
git reset --soft HEAD~1
The Source Control panel refreshes automatically and shows the files from the undone commit back under the Staged Changes section.
Alternative with GitLens extension. If you install the popular free GitLens extension, it adds an Undo Commit command to the commit context menu. Right-click any commit in the Source Control graph and choose Undo. Same result, no terminal needed.
Across all three paths, the underlying command is identical: git reset --soft HEAD~1. The only difference is where you type it.
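Here is a sketch of the whole undo-and-recommit cycle in a throwaway repository. The contents are placeholders, and the “WIP” message stands in for a commit made too early.

```shell
#!/bin/sh
# Sketch: undo the last commit but keep its changes staged, then recommit.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "you@email.com"
echo '# cleaning steps' > clean_data.R
git add clean_data.R
git commit -q -m "Add data cleaning script"
first=$(git rev-parse HEAD)

echo '# filter step' >> clean_data.R
git commit -q -am "WIP"                        # committed too early

git reset --soft HEAD~1                        # HEAD moves back; file contents untouched
head_after_reset=$(git rev-parse HEAD)         # same commit as $first
staged_after_reset=$(git diff --staged --name-only)
git status --short                             # "M  clean_data.R": modified and staged

git commit -q -m "Filter to workers with high school education or more"
git log --oneline                              # two commits; "WIP" no longer in the log
```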
Return to an earlier version of a file
Scenario. Three months ago, clean_data.R had a variable construction you now want back. Since then you have replaced it with something else and committed the replacement. You want the old version of that one file back, without rewinding the rest of the project.
This operation reaches into history and pulls out a past version of a specific file. Your other files are untouched.
First, find the commit hash where the old version of the file still existed:
git log --oneline clean_data.R
You should see something like:
a9b8c7d Switch to winsorized outcome
e4d5c6b Log number of rows after cleaning
7f3a2b1 Filter to workers with high school education or more
2c5e8a1 Add data cleaning script
Pick the hash of the commit whose version you want (say 2c5e8a1). Then restore just that file from that commit:
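git checkout 2c5e8a1 -- clean_data.R
The file in your working directory is now the version from that commit. This form of git checkout also stages the file, so git status lists it under “Changes to be committed”. Commit if you want to keep the restored version as the current state. The git add below is harmless, since the file is already staged: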
git add clean_data.R
git commit -m "Restore clean_data.R to earlier version"
-- file at the end is a filter
The part after -- tells Git which file(s) to restore. Other files in that commit, and in the rest of your repo, are not touched. If the commit 2c5e8a1 had modified clean_data.R, run_regression.R, and descriptive_stats.R, the command above restores only clean_data.R. The other two stay as they currently are.
You can target more than one file or a whole folder:
git checkout 2c5e8a1 -- clean_data.R run_regression.R # two specific files
git checkout 2c5e8a1 -- code/ # everything in code/
git checkout 2c5e8a1 -- code/*.R # all R files in code/
The double-dash -- tells Git: “everything after this is a file path, not an option.” It removes ambiguity if a filename happens to match a branch name. Safe habit: always include it.
RStudio’s Git pane does not expose a per-file history restore. Use the Terminal tab inside RStudio:
- Open the Terminal tab: Tools → Terminal → New Terminal, or the keyboard shortcut.
- Find the commit hash with git log --oneline clean_data.R.
- Run git checkout <hash> -- clean_data.R.
- The file in the Files pane (and in any open editor) updates to the restored version. The checkout has already staged it; commit through the Git pane as usual.
VS Code’s built-in Timeline lets you view a file’s full history as a diff viewer, but it does not have a one-click “Restore” option on its own. The smoothest built-in path combines Timeline browsing with the integrated terminal.
- Open clean_data.R in the editor.
- In the Explorer sidebar, scroll down and expand the Timeline section. Every commit that touched this file is listed, newest on top.
- Click any entry. A diff opens in the editor showing that version vs. the current one. Browse until you find the version you want to restore.
- Right-click that Timeline entry and select Copy Commit Hash.
- Open the integrated terminal (Ctrl+`) and paste the hash into this command:
git checkout <paste-hash-here> -- clean_data.R
The file is now the restored version. The checkout has already staged it, so the Source Control panel lists it under Staged Changes. Commit to make this the current state.
If you install the free GitLens extension, right-clicking a file in any commit view exposes a direct Restore command. No copy-paste-terminal dance. GitLens is widely used in the VS Code community and adds many other Git features beyond this one. Worth installing if you work with Git in VS Code often.
Restoring a file is a new commit forward, not a move backward. Your history grows; it does not rewind.
Before checkout: A → B → C ← HEAD
After checkout: A → B → C ← HEAD (clean_data.R now staged, history unchanged)
After commit: A → B → C → D ← HEAD (D has a new hash, contains the restored version)
The old commits still contain the old versions. Nothing is destroyed. You added a new commit (D) that happens to contain an old file state. Both the old commit and the new commit are in your log, each with its own hash, each fully reachable.
This operation restores a single file to an older state. It does not rewrite history or change other files. The newer version may turn out to have replaced the old one for a good reason you had forgotten. Nothing is lost in that case: the newer version is still in an earlier commit, and you can bring it back the same way.
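Here is a sketch of the full restore in a throwaway repository with placeholder contents. It also uses git show <hash>:<file>, which prints an old version of a file without changing anything. That lets you preview the version before you restore it.

```shell
#!/bin/sh
# Sketch: bring back an old version of one file as a new commit.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "you@email.com"

echo '# version A' > clean_data.R
echo '# regression code' > run_regression.R
git add clean_data.R run_regression.R
git commit -q -m "Add data cleaning script"
old=$(git rev-parse --short HEAD)

echo '# version B' > clean_data.R
echo '# regression code, edited' > run_regression.R
git commit -q -am "Switch to winsorized outcome"

git show "$old":clean_data.R        # preview the old file; nothing changes on disk
git checkout "$old" -- clean_data.R # restore only this file (and stage it)
git commit -q -m "Restore clean_data.R to earlier version"

git log --oneline                   # three commits: history grew, nothing was rewound
```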
Exercise 2: Practice the Safety Rails
Time: ~10 minutes. Continue in my-research.
You have the add-and-commit loop down. This exercise practices the “undo” tools: unstaging, discarding changes, and viewing the diff before committing. These are the safety rails that make committing often feel low-stakes.
Do these operations in your editor of choice. The Terminal version is shown below for concreteness, but every step has an RStudio and VS Code equivalent in the earlier sections: staging is in Staging and Committing, unstaging and discarding are in Undoing Mistakes, and the log view is in Viewing History. Pick your preferred interface and scroll back to any section if you need the clicks.
1. Modify run_regression.R
Open run_regression.R in your editor and add a second specification at the end of the file:
# Add experience squared
model2 <- lm(log(wage) ~ educ + exper + I(exper^2) + tenure, data = wages)
summary(model2)
Save the file.
2. See your changes before staging
At the Terminal prompt:
git diff
You should see your new lines marked with +. The diff is your preview of what you are about to commit.
3. Stage and commit
git add run_regression.R
git commit -m "Add quadratic experience specification"
4. Practice unstaging
Create a junk file you do not want to commit, stage it by accident, then unstage it. The shell > below redirects the output of echo into a new file, creating junk.R with one line of text:
echo "temporary scratch" > junk.R
git add junk.R
git status
You should see junk.R under “Changes to be committed”. Unstage it:
git restore --staged junk.R
git status
Now junk.R appears as untracked again. The file still exists on disk, but Git no longer plans to include it in the next commit. Delete the junk file (rm is the shell command to remove a file):
rm junk.R
5. Practice discarding an edit
Open clean_data.R in your editor. Add any line (say, a silly comment: # this is a test edit). Save the file.
Check the diff:
git diff
Your edit shows as a + line. Now discard it:
git restore clean_data.R
Open clean_data.R in your editor again. The edit is gone. This is a destructive operation: uncommitted edits cannot be recovered. Git only remembers things you committed.
6. View your full history
git log --oneline
You should see five commits, newest on top. Each one describes a single logical change. Each one is a point you can return to.
Your RStudio Git pane and VS Code Source Control panel mirrored every step of this exercise in real time: showing you which files were staged, what the diff looked like, and when the working tree returned to clean. Over time you will develop a sense for whether to consult the Terminal or the editor for any given question. Both show the same underlying Git state.
What’s Next
In Session 5 we will connect your local repository to GitHub, learn to push and pull, create branches, and collaborate with pull requests.
Before next session: create a free account at github.com.