Session 7: AI Tools II

Agentic desktop AI (Cowork): from empty folder to GitHub

Slides for this session: View the slide deck (opens in your browser; press F for fullscreen). The slides are a lean anchor to the concepts below. The walkthrough on this page is the substantive material and the reference you will come back to.

Want a PDF for note-taking? Open the slides in your browser, append ?print-pdf to the URL, and use File → Print → Save as PDF. Reveal.js handles the layout. Works in Chrome, Edge, and Firefox.

Instructor pre-flight (the weekend before class). Open the Dyson placements page in a browser and confirm it still renders the same 4-column PhD table. Run ai-tools/scrape_dyson_cowork.R from a fresh R session. Check that it produces a data/placements_dyson.csv with 80 or more rows. If the selector has broken, patch the XPath noted in the script header before class.

Pre-class checklist. One thing only: Cowork installed and signed in. We are starting from an empty folder in class. Nothing else needs to exist on your machine before you arrive. The first ten minutes of class are reserved for stragglers; if Cowork will not launch, say so immediately.

Where we are in the course

Today builds directly on Session 6. Same running exercise, same page, same verification reflex. What changes is the tool, and the workflow around the tool.

In Session 6 we worked with chat. Chat is a probabilistic pattern-completer with a copy-paste interface. It has no access to your files and cannot run your code. You carried the context across the boundary yourself, pasting snippets in and scripts out. That friction forced Mode B by accident: the artifact you walked away with was always an R script, because there was nothing else chat could give you.

Today we raise the ceiling. Cowork is the same underlying model, wrapped in an interface that can see a folder on your machine and run code in a sandboxed shell. A sandboxed shell is an isolated terminal the agent drives on its side. It can execute R, Python, and shell commands in the folder you granted, but nothing else on your computer is visible to it. Almost everything that was slow in Session 6 becomes fast. A new class of risks appears. The discipline that keeps you safe is the same discipline that kept you safe in Session 6: Mode B, plus Git, plus pushing to GitHub so your work is durable.

The comparison between chat and Cowork is the spine of this session. Keep Session 6 in mind as a reference point.

Note: These sessions assume Sessions 4 and 5

Every exercise starts from a clean Git repository and ends with a git push. Without Git, you cannot safely let an AI touch your working directory. Today that matters more than last week, because today the AI can actually touch it. We will use VS Code’s Source Control panel for all Git interactions instead of the terminal, but the underlying operations are the same ones you learned in Sessions 4 and 5.

Recap and safety brief

Mode A vs Mode B, one more time

The single most important distinction from Session 6 carries straight into today. Write it down again if you have to.

Mode A: AI as runtime. You ask the AI to do the thing. The output is data. The reasoning lives inside the model.

Mode B: AI as code author. You ask the AI to write code that does the thing. The output is a script. The reasoning lives in the script, visible and rerunnable.

Mode B is the reproducible path. Mode A almost never belongs in a paper.

Chat forced Mode B by accident. Cowork does not. Cowork will happily run code for you and show you the result table without ever saving a script. That is the central temptation of this session. Every prompt we write today has Mode B baked in explicitly.

What changes when the agent can reach your files

Three capabilities shift all at once.

First, file access. The agent sees the contents of whatever folder you grant it. It can read your data, your existing scripts, your notes, and your .gitignore. You no longer paste snippets in. You also no longer choose what it sees. Granting a folder is a trust decision.

Second, code execution. The agent can run R, Python, and shell commands in a sandboxed process. It gets real output, real errors, real data frames. Its next suggestion can be informed by the actual shape of your data rather than a plausible guess.

Third, on-disk modifications. The agent can create, edit, and delete files in the folder you granted. It will do this without asking each time, by design. If you told it to produce scrape.R and a data/ subfolder, it will produce them. If it decides midway that it also needs to edit .gitignore, it will do that too.

All three capabilities are upgrades. All three are also new surfaces for silent damage. The guardrail, as in Session 6, is Git, and Git is at its strongest when it is talking to GitHub.

Git as the safety net

Every Cowork session in this course follows the same skeleton, expressed today in VS Code’s Source Control panel rather than the terminal.

  1. Before you prompt: confirm the working tree is clean. In VS Code’s Source Control panel, the file list under “Changes” should be empty.
  2. Prompt Cowork. Watch the file tree on the left light up with new and modified files.
  3. After Cowork finishes: read every diff. In Source Control, click each changed file to see its diff. Do not stage anything you have not read.
  4. Stage, commit, push. Three clicks: the + next to the file, the checkmark to commit, the cloud-with-arrow to sync.
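The same skeleton in plain git commands, for anyone who prefers the terminal. This is a self-contained sketch that plays out in a throwaway temp directory (the file and commit message are placeholders), so it cannot touch a real project:

```shell
set -e
# Throwaway repo standing in for your real project folder
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "you@example.com" && git config user.name "You"

git status --short                   # 1. before prompting: clean tree, no output

mkdir -p code && echo 'x <- 1' > code/scrape.R   # stand-in for Cowork's edits

git status --short                   # 3. after the agent: new files appear; read them
git add code/scrape.R                # 4. stage only what you have read
git commit -qm "Cowork-drafted scraper"
# git push                           # sync to GitHub (needs a remote; omitted here)
```

The status / read / stage / commit rhythm here is exactly what the Source Control clicks perform under the hood.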

The point is not ceremony. The point is that the diff view in VS Code is the only reliable audit trail of what Cowork actually did. The chat transcript tells you what Cowork claims it did; the diff tells you what it actually did.

If the two disagree, the diff wins. This happens more often than you would expect.

Warning: Never point Cowork at a dirty working tree

If “Changes” is not empty before you start, commit or stash first. Otherwise you cannot tell which changes came from you and which came from the agent. This is the single most common mistake in the first week of using any agentic tool.

Cowork as a category

The category, not the brand

Cowork is a current example of agentic desktop AI: a chat interface attached to a model that can see files on your computer and run code in a sandbox. The brand names rotate. At the time of writing they include Cowork, ChatGPT’s desktop integration, and Gemini in Google Workspace. In two years the list will look different. The category will not.

What you should remember is the category profile, because that is what transfers.

Chat and Cowork, side by side

The clearest way to place Cowork is against the chat interface you used in Session 6. Same underlying model, different surface, different capabilities, different risks.

| Dimension | Chat (Session 6) | Cowork (today) |
|---|---|---|
| How context arrives | You paste it in manually | Automatic from the folder you grant |
| Can read your files? | No | Yes, any file in the folder |
| Can run your code? | No | Yes, in a sandboxed shell |
| Can edit your files? | No, you paste its output | Yes, it creates and edits files directly |
| Native surface | Web browser | Desktop app |
| What the model “knows” | Only what you typed | Full file contents, real data shapes, real errors |
| Audit trail | Your git commit of pasted code | Your git diff of the folder it touched |
| Typical friction | Copy-paste fatigue | Permission dialogs, trust decisions |

Read the table as a set of trades. Capability goes up. Friction goes down. Risk goes up. A chat model cannot delete your thesis; Cowork can, if you grant it the wrong folder. Git is what turns that risk from catastrophic to annoying.

The table also explains why the same research task feels so different in the two sessions. In Session 6 you spent most of your time pasting HTML snippets and comparing the model’s guess to the actual page. In Session 7 Cowork fetches the page itself. That shift is worth forty minutes on the clock. It is also worth new habits.

What Cowork is good at

Cowork is strong in five areas.

  • Bootstrapping projects. README drafts, .gitignore entries, folder layout, renv setup. Boring boilerplate that the agent does at speed. We will use it for exactly this in a few minutes.
  • Exploratory data work. “Here is a CSV I just received. Describe the columns, show me the distribution of placement, flag any rows with missing year.” The agent runs the code, sees the output, and revises. This is genuinely fast.
  • Multi-step tasks with intermediate state. Scrape, clean, reshape, save, then plot. Each step can reference the real output of the previous step, not a plausible guess at what the previous step returned.
  • Debugging with full context. You can say “run code/scrape.R and fix whatever error it throws.” The agent reads the file, runs it, sees the real error, and patches. Compare to chat, where you paste the error and hope the model guesses right.
  • Translating your intent into small refactors. Renaming a variable consistently across files, splitting a 200-line script into helpers, converting a script into an Rmd. Tasks that are tedious by hand and that you can verify with git diff afterwards.
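As a concrete sketch, the exploratory bullet above boils down to code like this, which the agent writes and runs on its side (the column names match today’s exercise; treat the snippet as illustrative):

```r
# Quick-look EDA the agent might run against a freshly received CSV
library(readr)
library(dplyr)

placements <- read_csv("data/placements_dyson.csv", show_col_types = FALSE)
glimpse(placements)              # column names and types
count(placements, year)          # distribution of placements per year
filter(placements, is.na(year))  # rows with a missing year, if any
```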

What Cowork is bad at

Cowork is a poor fit for four kinds of work.

  • Outputs you cannot sanity-check by eye. Example: “compute the elasticity of yield with respect to temperature in this panel.” A number appears. You cannot tell by looking whether it is correct. This covers most econometric estimates, which is to say, most research claims.
  • Tasks that must run unattended. Example: a nightly cron job that pulls USDA data at 3am. Cowork needs a human to grant permissions and watch for errors. It cannot run on a schedule.
  • Work whose reasoning must survive the session. Example: “classify these 200 placements as academic or non-academic” with only the output CSV saved. The rule that labelled each row is gone. You have data you cannot rerun, audit, or defend in a paper.
  • Anything where a confident-sounding hallucination would cost you. Example: Cowork cites an rvest function that does not exist, or a CSS selector it never tested. The answer looks authoritative either way. The committed code is yours; the agent is not on your author list.

Examples: when to reach for which

The good-at and bad-at lists are general. The harder skill is recognizing which list applies to the specific task in front of you. Three short examples drawn from the kind of work you actually do.

Chat

Example 1: Feedback on a paragraph in your draft

You wrote a paragraph for the introduction of a paper. You want a sharper argument, a better topic sentence, fewer hedges.

Why chat? There is no data, no code, no folder to see. The task is pure reasoning over text you can paste. Cowork’s file access adds nothing here, only attack surface and friction. Chat is faster and cleaner. The output is suggested rewrites which you accept or discard by hand. There is no artifact to commit, so the Mode B discipline does not apply: the rewrite ends up in your manuscript, not in a script.

Cowork

Example 2: Stack thirty USDA county-level CSVs into one panel

A coauthor sent you thirty CSV files, one per year, from USDA NASS. Column names drift across years. The 2003 file calls a column yield_bu_acre; by 2010 it became Yield (Bu/Acre). You need them in one long panel.

Why Cowork? The agent can list the folder, open three or four files to inspect headers, write a mapping table, build the bind, run it, and report row counts before and after. Chat would have you describing files it cannot see, which is exactly the failure mode of Session 6. The committed artifact is the cleaning script. Verify with row totals and a spot-check on three random county-year cells.
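A hedged sketch of the stacking step Cowork would write; the folder name, file pattern, and the drifting column names are hypothetical:

```r
# Stack yearly CSVs whose column names drift across years
library(readr)
library(dplyr)
library(purrr)

files <- list.files("raw", pattern = "\\.csv$", full.names = TRUE)

panel <- map_dfr(files, function(f) {
  df <- read_csv(f, show_col_types = FALSE)
  # Normalize the drifting yield column to one canonical name
  names(df)[names(df) %in% c("yield_bu_acre", "Yield (Bu/Acre)")] <- "yield"
  df
})

message(length(files), " files in, ", nrow(panel), " rows out")  # the row-count report
```

The final message is the before-and-after row count the verification step asks for.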

Cowork, verify hard

Example 3: Add a flag column to your placements CSV

You want a logical column is_us set to TRUE when the institution is in the United States. Same data we used today. The task sounds simple.

Why Cowork, with the verification reflex on? The agent can see the institution strings, propose a rule, run it, and report counts. That is fast. The trap is silent row loss if the parsing chokes on an unexpected pattern, and silent column reordering if you forgot to specify the position. Chat would force you to write the rule yourself, which is slower but harder to break. Use Cowork, then re-run the five-step checklist before committing. The speed is not free without the verification.
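A hedged sketch of the flag plus the checks that catch the two traps; the keyword rule is deliberately naive, not a correct US classifier:

```r
# Add the flag, then assert no silent row loss or column reordering
library(readr)
library(dplyr)
library(stringr)

placements <- read_csv("data/placements_dyson.csv", show_col_types = FALSE)
n_before <- nrow(placements)

placements <- placements |>
  mutate(is_us = str_detect(placement,
                            regex("University|USDA|Federal Reserve", ignore_case = TRUE)))

stopifnot(nrow(placements) == n_before)  # trap 1: silent row loss
stopifnot(identical(names(placements)[1:4],
                    c("name", "year", "placement", "source_url")))  # trap 2: reordering
count(placements, is_us)                 # eyeball the split before committing
```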

Today’s arc: GitHub repo, then Cowork

With the framing in place, here is the arc for the next 75 minutes:

  1. Create the repo on github.com (README plus R .gitignore template).
  2. Clone it locally in your terminal.
  3. Open in VS Code, grant Cowork access, and scaffold the project with a small Cowork prompt: README and .Rproj only. The code/, data/, and output/ folders are created later, by the scripts that actually fill them.
  4. Scrape. Drive Cowork to write code/scrape_dyson_cowork.R. Verify, commit, push. (This is the chat-style task at Cowork speed.)
  5. Build the pipeline. Drive Cowork to write code/pipeline_dyson.R that parses, classifies, plots, and summarizes. Verify, commit, push. (This is what Cowork lets you do that chat practically could not.)
  6. Stretch: ask Cowork to scrape a second department for a cross-school comparison.

The point is not the scraper. The point is the rhythm and the depth: GitHub-first, prompt, verify, sync. The first half is the same task as Session 6 at a different speed. The second half is a small project that chat could not produce in 75 minutes.

Create the repo on GitHub

We start where the repo will live: on GitHub. The local copy is a clone of the canonical online version. This mirrors the workflow you saw in Session 5 and the workflow you will use for almost every collaboration in your career.

If a folder named aem7010-ai already exists on your machine from earlier experimenting, rename it (aem7010-ai-old/) before you start. Beginning from a clean slate is the lesson.

Step-by-step on github.com

  1. Open https://github.com in your browser. Sign in if you are not already.
  2. Click the green New button (top-left of the page, next to your avatar) or go to https://github.com/new.
  3. Fill in the form:
    • Repository name: aem7010-ai
    • Description: Cowork and Claude Code exercises for AEM 7010 (one line, optional but useful)
    • Public or Private: either is fine. Public is the norm for course exercises; private is the norm for unpublished research. Pick one.
    • Add a README file: check this box. We want a non-empty first commit.
    • Add .gitignore: open the dropdown and select R. This gives you the community-standard ignore list for R projects.
    • Choose a license: leave as None for now. You can add one later if you publish.
  4. Click the green Create repository button at the bottom.
  5. You land on github.com/<your-handle>/aem7010-ai. Confirm two files exist: README.md and .gitignore. (There is no license file, since you left the license as None.)

The repo now exists online. Nothing exists locally yet. That asymmetry is the point.

Copy the clone URL

Still on the repo page on github.com, click the green Code button. A small panel opens with two tabs: HTTPS and SSH.

  • If you set up SSH keys in Session 5, click SSH and copy the URL. It looks like git@github.com:<your-handle>/aem7010-ai.git.
  • Otherwise, click HTTPS and copy that URL. It looks like https://github.com/<your-handle>/aem7010-ai.git.

You will paste this URL in the next step.

Clone the repo in your terminal

Open a terminal. Any terminal will do: macOS Terminal, iTerm, or VS Code’s integrated terminal (Terminal → New Terminal). Run the commands below, picking the variant that matches the URL you copied. Replace <your-handle> with your GitHub username.

With SSH keys (Session 5):

cd ~/github
git clone git@github.com:<your-handle>/aem7010-ai.git
cd aem7010-ai

With HTTPS:

cd ~/github
git clone https://github.com/<your-handle>/aem7010-ai.git
cd aem7010-ai

git clone copies the GitHub repo into a new ~/github/aem7010-ai/ folder. After cd, run a quick sanity check:

ls -la
git status
git log --oneline

You should see README.md, .gitignore, the hidden .git/ folder, “nothing to commit, working tree clean”, and one commit titled “Initial commit”. That commit is the one GitHub made for you when you ticked “Add a README”.

TipIf git clone asks for a password

GitHub turned off password authentication years ago. If the HTTPS clone prompts for a password, two options. (1) Set up SSH keys (Session 5, “Connect to GitHub”). (2) Use a personal access token as the password (GitHub Settings → Developer settings → Personal access tokens). The first is the long-term answer.

Open in VS Code, grant Cowork access

Open the cloned folder

In VS Code: File → Open Folder → ~/github/aem7010-ai. The Explorer pane on the left should show README.md and .gitignore, and the Source Control panel should report no pending changes.

If you have VS Code’s code command-line helper installed, you can also open the folder from the terminal you just used: run code . from inside the folder.

Grant Cowork access

Launch Cowork. Grant it access to ~/github/aem7010-ai (the whole folder). Confirm three things:

  1. The folder path appears in Cowork’s UI.
  2. The chat transcript sits beside the file tree.
  3. You found the stop button before you need it.

If any of these are unclear, flag it now. The rest of the session assumes all three.

Scaffold the project with Cowork

The repo already has a sensible README and .gitignore from GitHub. We need two small additions: a one-paragraph note in the README, and an RStudio project file (.Rproj). We deliberately do not pre-create a session7/ subfolder. Instead, the scripts will write into functional folders (code/, data/, output/) that describe content, not chronology. A Session 8 script can land next to a Session 7 script in code/ without any reorganization.

Tip: Why an RStudio project file

An .Rproj file is RStudio’s marker that “this folder is a project”. Two benefits matter today. First, working directory. When you double-click aem7010-ai.Rproj, RStudio opens with the working directory set to the repo root, so relative paths like "data/placements_dyson.csv" resolve consistently every time, on every machine. No setwd() calls in the script. Second, the project remembers state. Open files, command history, and the Git pane all stay scoped to this repo. You can have several projects open in separate RStudio windows without them polluting each other.

For reproducible research, the working-directory point is the load-bearing one. Anyone who clones the repo and double-clicks the .Rproj runs the same scripts in the same context. That is one of the smallest, highest-leverage habits we will pick up this term.
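A two-line illustration of the payoff (the CSV path is the one today’s scrape will create later in the session):

```r
# With aem7010-ai.Rproj open, this relative path resolves from the repo root
# on any machine that clones the repo; no setwd() call is needed.
library(readr)
placements <- read_csv("data/placements_dyson.csv", show_col_types = FALSE)
```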

Paste the following prompt into Cowork:

Note: Scaffolding prompt

I am working inside this folder, which is a fresh clone of a GitHub repo. It already has a README.md and .gitignore (R template). Please do exactly two things and then stop.

  1. Append one short paragraph to README.md describing what this repo holds: scripts and outputs for the AEM 7010 AI-tools exercises. The repo follows a functional layout: code/ for R scripts, data/ for input and processed data, output/ for generated figures and reports.

  2. Create an RStudio project file at aem7010-ai.Rproj at the repo root, using the standard RStudio defaults. This makes the repo open as a project in RStudio with the working directory set to the repo root.

Do not create any other files (no code/, no data/, no output/; the scripts will create those when they need them). Do not run any R code. Do not install any packages. Do not edit .gitignore. Stop after the two items above and report what you changed.

When Cowork stops, look at VS Code’s Source Control panel. You should see exactly two changes: a new aem7010-ai.Rproj and a modified README.md. Read each diff.

First commit and push

Two ways. Pick the one you prefer; we will use it for all later commits today.

  1. In the Source Control panel, click the + next to “Changes” to stage both files.
  2. Type the commit message: Add RStudio project and project description.
  3. Press Cmd+Enter (Mac) or Ctrl+Enter (Windows) to commit.
  4. Click Sync Changes at the bottom (cloud-with-arrows icon). This pushes to GitHub.
Or, equivalently, from the terminal at the repo root:

git add aem7010-ai.Rproj README.md
git commit -m "Add RStudio project and project description"
git push
Tip: Open the repo in RStudio

With aem7010-ai.Rproj in place, double-click that file in Finder (or open it from inside RStudio). RStudio will open the repo as a project, with its working directory set to the repo root. Keep RStudio open in parallel with VS Code: VS Code is your editor and Source Control panel; RStudio is where you run R.

Open github.com/<your-handle>/aem7010-ai and refresh. You should see two commits in the history (the original “Initial commit” and your new one), the aem7010-ai.Rproj file, and the updated README.md with your project paragraph. If not, the push did not happen. Check for errors in the bottom-left of VS Code or in the terminal output before continuing.

Guided scrape of Cornell Dyson placements

The task

We return to the Cornell Dyson PhD placements page, this time with Cowork. This is the same page the instructor demoed in Session 6, deliberately. The point is to see the same task played out at very different speeds with very different risks.

Goal output: a CSV at data/placements_dyson.csv (at the repo root) with the following columns.

| Column | Example |
|---|---|
| name | Sharan Banerjee |
| year | 2025 |
| placement | Postdoctoral Fellow at KAPSARC School of Public Policy, Riyadh |
| source_url | https://dyson.cornell.edu/programs/graduate/placements/ |

A working script, a CSV of about 90 rows spanning 2015 to 2025, and a clean Git commit pushed to GitHub. That is the deliverable.

How we work this together

We do this in lockstep, not as a demo. The instructor projects the same screen you have. Each step below happens on every laptop in the room. We pause at the checkpoints. Do not skip ahead, and do not lag silently.

The rhythm is the same one you will use in your own research:

  1. Source Control panel clean? Check.
  2. Paste the shared prompt (below).
  3. Wait for Cowork to finish, then read what it changed.
  4. Run the verification checklist.
  5. Commit. Sync. Refresh github.com on a side tab to confirm.

Watch for the moments where Cowork wants to reply with a table instead of a script. That is the Mode A temptation. The prompt below is written to prevent it, but the agent will still drift if you are not watching.

The shared prompt

This is the exact text you paste into Cowork. Do not paraphrase. Mode B enforcement is load-bearing; the third paragraph is doing most of the work.

Note: Prompt to paste into Cowork

I am working inside this folder. Please do the following.

  1. Write an R script at code/scrape_dyson_cowork.R (create the code/ folder if it does not exist) that scrapes the PhD placements table from https://dyson.cornell.edu/programs/graduate/placements/. The script should save a CSV at data/placements_dyson.csv (create the data/ folder if it does not exist) with exactly these columns, in this order: name, year, placement, source_url. The placement column is the job title joined to the institution by the word “at”, like “Assistant Professor at University of Illinois Urbana-Champaign”. The source_url column is the URL above, repeated on every row.

  2. Use rvest and readr. Anchor your selector on the heading text “Recent PhD Job Placements” so the script is robust to changes in CSS class names. Drop any row in the table where all four cells are empty.

  3. The R script is the artifact I care about. Do not paste the scraped data into this chat. I will run the script myself from a fresh R session to verify it works. Include a one-line message() at the end reporting the number of rows written.

  4. After writing the script, run it once in the sandbox so we know it works. Report the row count and stop. Do not create any other files. Do not edit my .gitignore.

Read the prompt before pasting. Note four things about it.

  • The output schema is specified exactly. Column names, column order, separator word for the placement column. If you say “a CSV with the relevant info”, you will get something you did not want.
  • The selector strategy is given. “Anchor on the heading text” is the same rule the fallback script follows. If you leave this out, Cowork picks a CSS class that may not survive the next page redesign.
  • The artifact is declared. “The R script is the artifact I care about. Do not paste the data.” This is Mode B in one sentence.
  • The scope is closed. “Do not create any other files. Do not edit my .gitignore.” Without this, Cowork sometimes adds a README, a renv.lock, or a helper script you did not ask for. Harmless most of the time, noisy in a git diff.
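For orientation, a heading-anchored selector in rvest can look roughly like this. The heading text comes from the prompt; the h2 tag and the XPath are assumptions about the page’s structure, which is exactly the detail Cowork will pin down for you:

```r
# Anchor on heading text rather than CSS classes (tag and XPath are assumed)
library(rvest)

page <- read_html("https://dyson.cornell.edu/programs/graduate/placements/")
placements_tbl <- page |>
  html_element(xpath = "//h2[contains(., 'Recent PhD Job Placements')]/following::table[1]") |>
  html_table()
```

The point of the strategy: class names change in redesigns, but the heading text is part of the page’s content and tends to survive.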

Paste and wait

Time: ~20 minutes. Cowork already has access to ~/github/aem7010-ai from the permissions step earlier. Paste the shared prompt and let it run.

When Cowork reports a row count, do not yet trust it. Move to the verification checklist.

Verification checklist

Five checks, in order. All five must pass before you commit. If any fails, fix it first.

  1. Does the script exist at code/scrape_dyson_cowork.R? If Cowork named it differently, rename it. The name is part of the contract.

  2. Does the script run cleanly from a fresh R session? Do not trust “it ran in the sandbox”. Run it yourself.

    In RStudio: with the aem7010-ai.Rproj project open, open code/scrape_dyson_cowork.R. Click Session → Restart R, then click Source (or press Cmd+Shift+S / Ctrl+Shift+S). Read the message in the console.

    Or from the terminal, at the repo root: Rscript code/scrape_dyson_cowork.R. This requires Rscript on your PATH (true on most macOS installs, sometimes missing on Windows).

  3. Does data/placements_dyson.csv have 80 or more rows, and the four expected columns in the right order? In the R console: readr::read_csv("data/placements_dyson.csv") followed by nrow() and names().

  4. Pick three random rows and verify them against the live page. Open the browser, find the row, compare name, year, and the join of position-plus-institution. If any of the three is wrong, the scraper is wrong, even if the row count looks right.

  5. Can you explain every line of the script? If a line uses a function you do not recognize, ask Cowork to explain it until you can restate what it does in your own words. Then decide whether to keep it.
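The mechanical parts of checks 1 and 3 can be run as a few lines from the repo root after the script finishes:

```r
# Checks 1 and 3 as code: file exists, row count, column names and order
library(readr)

stopifnot(file.exists("code/scrape_dyson_cowork.R"))   # check 1: the name is the contract
df <- read_csv("data/placements_dyson.csv", show_col_types = FALSE)
stopifnot(
  nrow(df) >= 80,                                                    # check 3: row count
  identical(names(df), c("name", "year", "placement", "source_url")) # check 3: schema, in order
)
```

Checks 4 and 5 stay manual by design; no script can verify rows against the live page or your own understanding for you.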

If a check fails, prompt Cowork to fix that specific failure. Do not accept a new full-rewrite response: ask for a patch.

Commit and push #1

When all five verification checks pass, commit and push from VS Code. Three clicks plus a sentence.

  1. Source Control panel → click the + next to code/scrape_dyson_cowork.R and data/placements_dyson.csv to stage them. (Or + next to “Changes” to stage everything: read each diff first.)
  2. In the message box at the top, type: Session 7: Cowork-drafted scraper for Dyson PhD placements.
  3. Click the checkmark (or Cmd+Enter / Ctrl+Enter) to commit.
  4. Click the Sync Changes button at the bottom of the VS Code window (cloud-with-arrows icon). This pushes to GitHub.

Now open github.com/<your-handle>/aem7010-ai and refresh. You should see:

  • A new commit with your message at the top of the commit list.
  • The code/scrape_dyson_cowork.R file.
  • The data/placements_dyson.csv file. Click it; GitHub will render the CSV as a table for you to spot-check.

If GitHub does not show the new commit, you have not actually pushed. Check the bottom-left of VS Code for any sync errors and resolve them before continuing.

Build the pipeline

Time: ~30 minutes. This is where Session 7 goes beyond Session 6. The scrape we just finished is something chat could also have produced, more slowly. The pipeline below is something chat practically could not finish in a class period, because each step depends on the actual shape of the data, not on a guess.

What the pipeline does

A research project is rarely just a scrape. The placements table is the input to a small analysis. We will ask Cowork to build a second script, code/pipeline_dyson.R, that takes the CSV from the scrape and produces three new artifacts in four steps.

  1. Parse the placement column into position_title and institution.
  2. Classify each row as academic, government, industry, or other using a short keyword rule.
  3. Plot placements per year, colored by category, saved to output/figures/placements_dyson_by_year.png.
  4. Summarize the result by writing output/findings_dyson.md with the totals, the category counts, the year range, and the most common institution.

The output is a small, rerunnable project. The CSV is the boundary between the scrape and the pipeline. Either side can be rerun without the other.

Shared pipeline prompt

Paste the following prompt into Cowork. As before, do not paraphrase. The structure of the prompt is what keeps the agent honest.

Note: Pipeline prompt to paste into Cowork

I am working inside this folder. The scrape from earlier produced data/placements_dyson.csv. Please build the analysis pipeline.

  1. Write an R script at code/pipeline_dyson.R that reads data/placements_dyson.csv and does steps 2 through 5 below. Use only tidyverse and base R. Do not install other packages. Run the script in the sandbox once at the end so we know it works.

  2. Parse. Split the placement column into two new columns: position_title (everything before the first " at ") and institution (everything after the first " at "). Trim whitespace on both. If a row has no " at " separator, set both to NA and report the count of failures with message().

  3. Classify. Add a category column with values academic, government, industry, or other. Apply these rules in order:

    • academic if position_title contains any of: “Professor”, “Lecturer”, “Postdoctoral”, “Postdoc”, “Faculty”, “Research Fellow” (case-insensitive).
    • government if institution contains any of: “Bureau”, “Department of”, “USDA”, “Federal Reserve”, “World Bank”, “OECD”, “IMF”, “United Nations”, “FAO”, “Ministry” (case-insensitive).
    • industry if position_title contains any of: “Analyst”, “Consultant”, “Manager”, “Director” (case-insensitive) and the row was not already classified as academic or government.
    • other otherwise.

    Report the count by category at the end with message().

  4. Plot. Make a stacked bar chart of placements per year, colored by category, using ggplot2. Use the four categories in this fixed order in the legend: academic, government, industry, other. Save the plot to output/figures/placements_dyson_by_year.png (create the output/figures/ folders if they do not exist) at 8 by 5 inches, 150 dpi.

  5. Summarize. Write a short output/findings_dyson.md markdown file containing: total rows, count by category, year range (min to max), and the most common institution. Generate every number programmatically from the data.

  6. The two scripts and the three output files (the CSV, the figure, and the markdown summary) are the artifacts. Do not paste the data or the markdown content into this chat. I will run the pipeline myself from a fresh R session to verify. Do not modify code/scrape_dyson_cowork.R. Do not edit .gitignore or README.md. Do not create any other files.

Read the prompt before pasting. Notice five things:

  • The contract is precise. Column names, classification rules in order, file paths, and dimensions for the plot are all specified.
  • The script is the artifact. The numbers in findings_dyson.md must come from the data, not be typed by the agent. That is Mode B applied to a markdown report.
  • The boundary is explicit. The pipeline reads data/placements_dyson.csv. It does not re-scrape. Scraping and analysis are different steps that fail in different ways.
  • The scope is closed. The agent is told what not to touch (the scrape, the README, the .gitignore).
  • The agent must report. The two message() calls are the parsing failure count and the category counts. These are the numbers you will reconcile against findings_dyson.md.
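One reason the ordered rules matter: in dplyr they translate naturally to a case_when() chain, which evaluates its conditions top to bottom, so a “Professor at the World Bank” row is classified academic before the government rule ever fires. A sketch only, where dat stands for the pipeline’s data frame; Cowork’s version may differ in detail:

```r
# Ordered classification as a case_when chain; the first matching rule wins
library(dplyr)
library(stringr)

dat <- dat |>
  mutate(category = case_when(
    str_detect(position_title, regex("Professor|Lecturer|Postdoc|Faculty|Research Fellow",
                                     ignore_case = TRUE)) ~ "academic",
    str_detect(institution, regex("Bureau|Department of|USDA|Federal Reserve|World Bank|OECD|IMF|United Nations|FAO|Ministry",
                                  ignore_case = TRUE)) ~ "government",
    str_detect(position_title, regex("Analyst|Consultant|Manager|Director",
                                     ignore_case = TRUE)) ~ "industry",
    TRUE ~ "other"
  ))
```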

Paste and wait, again

Time: ~20 minutes. Same rhythm as before, in lockstep with the room. Paste the pipeline prompt. Watch the file tree. When Cowork stops, you should see new entries in the Source Control panel: code/pipeline_dyson.R, output/figures/placements_dyson_by_year.png, output/findings_dyson.md, and any folders that did not yet exist (output/, output/figures/).

If you see anything else (a tests/ folder, a new renv.lock, an edit to the scrape script), read the diff before staging.

Verification checklist (pipeline)

Five checks, in order. All five must pass before the second commit.

  1. Does code/pipeline_dyson.R run cleanly from a fresh R session?

    In the RStudio project, open code/pipeline_dyson.R. Session → Restart R, then Source. Read the two message() lines that appear in the console.

    Or, from a terminal at the repo root: Rscript code/pipeline_dyson.R. Read the two message() lines in the terminal output.

  2. Do the parsing failures and category counts make sense? Total rows minus parsing failures should equal the sum of the four category counts. If they do not reconcile, the script has a silent bug.

  3. Does output/figures/placements_dyson_by_year.png exist and look right? Open it. Years on the x-axis, four categories in the legend in the right order, no obvious gaps in the bars.

  4. Does output/findings_dyson.md exist and reconcile to the data? Open it. The numbers should match the messages from step 1 and the values in the CSV. Spot-check the “most common institution” by hand.

  5. Can you explain every line of the script, including each classification rule? If a regex looks magic, ask Cowork to explain it until you can restate it. Then decide whether to keep it.
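The reconciliation in check 2 is one line of arithmetic, and it is worth running explicitly rather than eyeballing. A sketch you could append at the end of code/pipeline_dyson.R, assuming `placements` is the classified data frame and `parsing_failures` is the count the script already reports; both names are hypothetical, so substitute whatever your script calls them:

```r
# Check 2: total rows minus parsing failures must equal the sum of
# the four category counts, or the script has a silent bug
total_rows <- nrow(read.csv("data/placements_dyson.csv"))

stopifnot(total_rows - parsing_failures == sum(table(placements$category)))
```

If the stopifnot() fires, the usual culprit is a row that failed to parse but was still classified, or one that was dropped without being counted as a failure.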

If a check fails, prompt Cowork for a patch on that specific failure. Do not accept a full rewrite.

Commit and push #2

When all five pipeline checks pass, repeat the Source Control rhythm.

  1. Source Control panel → stage the new entries: code/pipeline_dyson.R, output/figures/placements_dyson_by_year.png, output/findings_dyson.md.
  2. Commit message: Session 7: pipeline (parse, classify, plot, summarize).
  3. Commit (checkmark or Cmd+Enter).
  4. Sync Changes. Refresh github.com.
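The same rhythm works from a terminal if the Source Control panel misbehaves. A sketch of the equivalent commands, run from the repo root; Sync Changes corresponds to the final push (it also pulls first):

```shell
# Terminal equivalent of the Source Control rhythm, from the repo root
git add code/pipeline_dyson.R \
        output/figures/placements_dyson_by_year.png \
        output/findings_dyson.md
git commit -m "Session 7: pipeline (parse, classify, plot, summarize)"
git push   # the Sync Changes button does this (plus a pull) for you
```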

You should now see two commits (one for the scrape, one for the pipeline) and a clean functional layout: code/scrape_dyson_cowork.R, code/pipeline_dyson.R, data/placements_dyson.csv, output/figures/placements_dyson_by_year.png, and output/findings_dyson.md. Click output/findings_dyson.md on github.com. It renders as a small report with numbers that came from the data, not from the agent’s prose.

Two commits in one session is a healthy pattern. The diff between them is the shape of an actual research workflow: from raw data to a small, reproducible analysis.

Stretch: cross-school comparison

Tip: If you finish early

Ask Cowork to scrape the Berkeley ARE PhD placements page (https://are.berkeley.edu/graduate/job-market-placement) into data/placements_berkeley.csv using the same column contract, then run the same pipeline on it, and produce an output/findings_comparison.md with a small table of category counts side by side for Cornell and Berkeley.

Two things to watch:

  • Different page, different selectors. The Berkeley page does not anchor on the same heading. Cowork will need a different selector strategy, which is itself a useful lesson.
  • Same pipeline, different inputs. The classifier and the plot helper are reusable. If you find yourself rewriting them, refactor them out of pipeline_dyson.R into a small helper file Cowork can call from both pipelines. This is a Session 8 move; doing it once here will make Monday feel familiar.
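The refactor in the second bullet can be as small as one sourced file. A sketch under hypothetical names (`code/helpers_placements.R`, `classify_placement()`), with deliberately abbreviated placeholder keyword lists; fill in the full patterns from your pipeline:

```r
# code/helpers_placements.R -- shared by both pipelines
classify_placement <- function(institution, position_title) {
  dplyr::case_when(
    # placeholder keywords: copy the full lists from pipeline_dyson.R
    grepl("University|College", institution, ignore.case = TRUE)                     ~ "academic",
    grepl("Bureau|Ministry|World Bank", institution, ignore.case = TRUE)             ~ "government",
    grepl("Analyst|Consultant|Manager|Director", position_title, ignore.case = TRUE) ~ "industry",
    TRUE                                                                             ~ "other"
  )
}
```

Then each pipeline does `source("code/helpers_placements.R")` and calls `classify_placement()` on its own data frame, so the rules live in exactly one place.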

Debrief

What we learned

The first half of class compressed a forty-minute chat-and-paste workflow into a three-minute interaction. Same CSV, same target page, much less typing. That is a real productivity gain, but it is the smaller of the two lessons.

The bigger lesson is the second half. Cowork did not just write the scraper faster. It built a small project around it: a parser, a classifier, a plot, a programmatic findings report. Five files, one CSV boundary between them. Doing the same thing in chat would have meant copy-pasting partial outputs back and forth across many turns, with the agent guessing at columns it could not see. By the time you finished, class would be over.

The verification reflex did not change. It got more work to do. You ran it twice today on different artifacts: a CSV from the scraper, and a markdown report plus a PNG from the pipeline. The questions are the same. Do the numbers reconcile? Does it run from a fresh R session? Can you explain every line?

What changes from chat, to Cowork, to Claude Code

The course module is not about three tools. It is one ladder. Each rung adds one new dimension and keeps the rest constant.

| Rung | Tool | Task scope | Pipeline depth | Workflow style |
| --- | --- | --- | --- | --- |
| Session 6 | chat | one school, one script | scrape only | manual paste |
| Session 7 (today) | Cowork | one school, small project | scrape, parse, classify, plot, summary | interactive, you watch |
| Session 8 (Monday) | Claude Code | five schools, study | same pipeline shape | delegated, you review the diff |

Session 6 to 7 added pipeline depth and held data scope constant. Session 7 to 8 will add data scope and delegation, and hold the pipeline shape constant. Your Session 7 code will be the seed of Monday’s Session 8.

What Session 6 forced, what Cowork allows

The other lesson, which carries every week. Notice when a tool’s constraints were protecting you, and replace them with discipline when those constraints fall away.

| Behavior | In Session 6 (chat) | In Session 7 (Cowork) |
| --- | --- | --- |
| Running the script | You had to, by pasting and executing | The agent can, so you have to make yourself |
| Verifying the output | You had to, the agent could not | The agent reports a row count, so you have to insist |
| Saving an artifact | Only possible as a script, pasted in | Possible as a script or as a displayed table; Mode B is your rule now |
| Editing files | You, one paste at a time | The agent, potentially many files at once, possibly silently |
| Ending the session | Clean by default | Requires git diff to know what happened |
| Sharing the work | A pasted script in your notes | A live GitHub repo with a commit history |

The more capable the tool, the more of the verification reflex you own directly.

The bottleneck is no longer typing

The bottleneck is verification, and it always will be. Faster tools do not remove verification work. They move it closer to the end of the pipeline, where it is easier to skip. The verification checklist and the GitHub round-trip are how you do not skip it.

For Monday

  1. Your aem7010-ai repo on GitHub is your Session 7 deliverable. Confirm at least two commits are visible at github.com/<your-handle>/aem7010-ai.

  2. Install Claude Code before class. Anthropic ships a native installer that bundles everything as a single binary. No Node.js required. One command in a terminal:

    # macOS / Linux
    curl -fsSL https://claude.ai/install.sh | bash
    
    # Windows (PowerShell)
    irm https://claude.ai/install.ps1 | iex

    Then verify with claude --version. You will need a Claude account (free signup at https://claude.com). The in-class exercises will not require a paid plan. Full guide at https://docs.claude.com/en/docs/claude-code/setup. Bring any install errors to the first ten minutes of class on Monday; that time is reserved for troubleshooting.

  3. Open the aem7010-ai repo in your terminal at the repo root. That is the directory Claude Code will operate in on Monday. Confirm git status is clean before you arrive.
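One quick way to confirm the clean state: `git status --porcelain` prints one line per modified or untracked file, so empty output means a clean working tree. The path below is a placeholder for wherever your clone lives:

```shell
cd path/to/aem7010-ai     # your local clone of the repo
git status --porcelain    # no output = clean working tree
```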

Note: Instructor fallback

If Cowork misbehaves on a student machine during the synchronous walkthrough, a working scraper lives at ai-tools/scrape_dyson_cowork.R in the course repo. It takes about 10 seconds to run and produces the same CSV. Use it as a reference, or as a literal drop-in if the room gets stuck on the scrape.