Session 7: AI Tools II
Agentic desktop AI (Cowork): from empty folder to GitHub
Want a PDF for note-taking? Open the slides in your browser, append ?print-pdf to the URL, and use File → Print → Save as PDF. Reveal.js handles the layout. Works in Chrome, Edge, and Firefox.
Pre-class checklist. One thing only: Cowork installed and signed in. We are starting from an empty folder in class. Nothing else needs to exist on your machine before you arrive. The first ten minutes of class are reserved for stragglers; if Cowork will not launch, say so immediately.
Where we are in the course
Today builds directly on Session 6. Same running exercise, same page, same verification reflex. What changes is the tool, and the workflow around the tool.
In Session 6 we worked with chat. Chat is a probabilistic pattern-completer with a copy-paste interface. It has no access to your files and cannot run your code. You carried the context across the boundary yourself, pasting snippets in and scripts out. That friction forced Mode B by accident: the artifact you walked away with was always an R script, because there was nothing else chat could give you.
Today we raise the ceiling. Cowork is the same underlying model, wrapped in an interface that can see a folder on your machine and run code in a sandboxed shell. A sandboxed shell is an isolated terminal the agent drives on its side. It can execute R, Python, and shell commands in the folder you granted, but nothing else on your computer is visible to it. Almost everything that was slow in Session 6 becomes fast. A new class of risks appears. The discipline that keeps you safe is the same discipline that kept you safe in Session 6: Mode B, plus Git, plus pushing to GitHub so your work is durable.
The comparison between chat and Cowork is the spine of this session. Keep Session 6 in mind as a reference point.
Every exercise starts with git init in a clean folder and ends with a git push. Without Git, you cannot safely let an AI touch your working directory. Today that matters more than last week, because today the AI can actually touch it. We will use VS Code’s Source Control panel for all Git interactions instead of the terminal, but the underlying operations are the same ones you learned in Sessions 4 and 5.
Recap and safety brief
Mode A vs Mode B, one more time
The single most important distinction from Session 6 carries straight into today. Write it down again if you have to.
Mode A: AI as runtime. You ask the AI to do the thing. The output is data. The reasoning lives inside the model.
Mode B: AI as code author. You ask the AI to write code that does the thing. The output is a script. The reasoning lives in the script, visible and rerunnable.
Mode B is the reproducible path. Mode A almost never belongs in a paper.
Chat forced Mode B by accident. Cowork does not. Cowork will happily run code for you and show you the result table without ever saving a script. That is the central temptation of this session. Every prompt we write today has Mode B baked in explicitly.
What changes when the agent can reach your files
Three capabilities shift all at once.
First, file access. The agent sees the contents of whatever folder you grant it. It can read your data, your existing scripts, your notes, and your .gitignore. You no longer paste snippets in. You also no longer choose what it sees. Granting a folder is a trust decision.
Second, code execution. The agent can run R, Python, and shell commands in a sandboxed process. It gets real output, real errors, real data frames. Its next suggestion can be informed by the actual shape of your data rather than a plausible guess.
Third, on-disk modifications. The agent can create, edit, and delete files in the folder you granted. It will do this without asking each time, by design. If you told it to produce scrape.R and a data/ subfolder, it will produce them. If it decides midway that it also needs to edit .gitignore, it will do that too.
All three capabilities are upgrades. All three are also new surfaces for silent damage. The guardrail, as in Session 6, is Git, and Git is at its strongest when it is talking to GitHub.
Git as the safety net
Every Cowork session in this course follows the same skeleton, expressed today in VS Code’s Source Control panel rather than the terminal.
- Before you prompt: confirm the working tree is clean. In VS Code’s Source Control panel, the file list under “Changes” should be empty.
- Prompt Cowork. Watch the file tree on the left light up with new and modified files.
- After Cowork finishes: read every diff. In Source Control, click each changed file to see its diff. Do not stage anything you have not read.
- Stage, commit, push. Three clicks: the `+` next to the file, the checkmark to commit, the cloud-with-arrows to sync.
The point is not ceremony. The point is that the diff view in VS Code is the only reliable audit trail of what Cowork actually did. The chat transcript will describe what it claims it did. The diff will tell you what it actually did.
If the two disagree, the diff wins. This happens more often than you would expect.
If “Changes” is not empty before you start, commit or stash first. Otherwise you cannot tell which changes came from you and which came from the agent. This is the single most common mistake in the first week of using any agentic tool.
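The same skeleton in terminal form, for anyone who prefers the command line. This is a sketch that builds a throwaway repo in a temporary folder so it is safe to run anywhere; in your real repo you would skip the setup lines and finish with a `git push`.

```shell
# Sketch of the per-prompt rhythm, demonstrated in a throwaway repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q --allow-empty -m "Initial commit"

# 1. Before prompting: confirm the working tree is clean (empty output = clean).
git status --porcelain

# 2. ...the agent runs and writes a file here...
echo 'message("hello")' > scrape.R

# 3. After: read what changed before staging anything.
git status --porcelain   # new files show as untracked
git diff --stat          # diffs of modified tracked files

# 4. Stage, commit. (In your real repo: git push as the last step.)
git add scrape.R
git -c user.email=demo@example.com -c user.name=Demo \
    commit -q -m "Cowork-drafted script, reviewed"
git log --oneline
```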
Cowork as a category
The category, not the brand
Cowork is a current example of agentic desktop AI: a chat interface attached to a model that can see files on your computer and run code in a sandbox. The brand names rotate. At the time of writing they include Cowork, ChatGPT’s desktop integration, and Gemini in Google Workspace. In two years the list will look different. The category will not.
What you should remember is the category profile, because that is what transfers.
Chat and Cowork, side by side
The clearest way to place Cowork is against the chat interface you used in Session 6. Same underlying model, different surface, different capabilities, different risks.
| Dimension | Chat (Session 6) | Cowork (today) |
|---|---|---|
| How context arrives | You paste it in manually | Automatic from the folder you grant |
| Can read your files? | No | Yes, any file in the folder |
| Can run your code? | No | Yes, in a sandboxed shell |
| Can edit your files? | No, you paste its output | Yes, it creates and edits files directly |
| Native surface | Web browser | Desktop app |
| What the model “knows” | Only what you typed | Full file contents, real data shapes, real errors |
| Audit trail | Your git commit of pasted code | Your git diff of the folder it touched |
| Typical friction | Copy-paste fatigue | Permission dialogs, trust decisions |
Read the table as a set of trades. Capability goes up. Friction goes down. Risk goes up. A chat model cannot delete your thesis; Cowork can, if you grant it the wrong folder. Git is what turns that risk from catastrophic to annoying.
The table also explains why the same research task feels so different in the two sessions. In Session 6 you spent most of your time pasting HTML snippets and comparing the model’s guess to the actual page. In Session 7 Cowork fetches the page itself. That shift is worth forty minutes on the clock. It is also worth new habits.
What Cowork is good at
Cowork is strong in five areas.
- Bootstrapping projects. README drafts, `.gitignore` entries, folder layout, `renv` setup. Boring boilerplate that the agent does at speed. We will use it for exactly this in a few minutes.
- Exploratory data work. “Here is a CSV I just received. Describe the columns, show me the distribution of `placement`, flag any rows with missing `year`.” The agent runs the code, sees the output, and revises. This is genuinely fast.
- Multi-step tasks with intermediate state. Scrape, clean, reshape, save, then plot. Each step can reference the real output of the previous step, not a plausible guess at what the previous step returned.
- Debugging with full context. You can say “run `code/scrape.R` and fix whatever error it throws.” The agent reads the file, runs it, sees the real error, and patches. Compare to chat, where you paste the error and hope the model guesses right.
- Translating your intent into small refactors. Renaming a variable consistently across files, splitting a 200-line script into helpers, converting a script into an Rmd. Tasks that are tedious by hand and that you can verify with `git diff` afterwards.
What Cowork is bad at
Cowork is a poor fit for four kinds of work.
- Outputs you cannot sanity-check by eye. Example: “compute the elasticity of yield with respect to temperature in this panel.” A number appears. You cannot tell by looking whether it is correct. This covers most econometric estimates, which is to say, most research claims.
- Tasks that must run unattended. Example: a nightly cron job that pulls USDA data at 3am. Cowork needs a human to grant permissions and watch for errors. It cannot run on a schedule.
- Work whose reasoning must survive the session. Example: “classify these 200 placements as academic or non-academic” with only the output CSV saved. The rule that labelled each row is gone. You have data you cannot rerun, audit, or defend in a paper.
- Anything where a confident-sounding hallucination would cost you. Example: Cowork cites an `rvest` function that does not exist, or a CSS selector it never tested. The answer looks authoritative either way. The committed code is yours; the agent is not on your author list.
Examples: when to reach for which
The good-at and bad-at lists are general. The harder skill is recognizing which list applies to the specific task in front of you. Three short examples drawn from the kind of work you actually do.
Chat
Example 1: Feedback on a paragraph in your draft
You wrote a paragraph for the introduction of a paper. You want a sharper argument, a better topic sentence, fewer hedges.
Why chat? There is no data, no code, no folder to see. The task is pure reasoning over text you can paste. Cowork’s file access adds nothing here, only attack surface and friction. Chat is faster and cleaner. The output is suggested rewrites which you accept or discard by hand. There is no artifact to commit, so the Mode B discipline does not apply: the rewrite ends up in your manuscript, not in a script.
Cowork
Example 2: Stack thirty USDA county-level CSVs into one panel
A coauthor sent you thirty CSV files, one per year, from USDA NASS. Column names drift across years. The 2003 file calls a column yield_bu_acre; by 2010 it became Yield (Bu/Acre). You need them in one long panel.
Why Cowork? The agent can list the folder, open three or four files to inspect headers, write a mapping table, build the bind, run it, and report row counts before and after. Chat would have you describing files it cannot see, which is exactly the failure mode of Session 6. The committed artifact is the cleaning script. Verify with row totals and a spot-check on three random county-year cells.
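A sketch of what the committed cleaning script might look like. The folder path `data/usda/` and the two historical yield-column names are illustrative; the rename map is the part you would build by actually inspecting headers, which is exactly what Cowork can do for you.

```r
# Sketch: harmonize drifting column names, then stack (hypothetical paths/names).
library(tidyverse)

# Old-to-new map: every historical header that should become `yield`.
rename_map <- c(yield = "yield_bu_acre",     # 2003-era name
                yield = "Yield (Bu/Acre)")   # later name

files <- list.files("data/usda", pattern = "\\.csv$", full.names = TRUE)

panel <- files |>
  map(\(f) read_csv(f, show_col_types = FALSE) |>
        rename(any_of(rename_map)) |>
        mutate(source_file = basename(f), .before = 1)) |>
  list_rbind()

# Sanity check before committing: row counts in vs. out.
message(nrow(panel), " rows from ", length(files), " files")
```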
Cowork, verify hard
Example 3: Add a flag column to your placements CSV
You want a logical column is_us set to TRUE when the institution is in the United States. Same data we used today. The task sounds simple.
Why Cowork, with the verification reflex on? The agent can see the institution strings, propose a rule, run it, and report counts. That is fast. The trap is silent row loss if the parsing chokes on an unexpected pattern, and silent column reordering if you forgot to specify the position. Chat would force you to write the rule yourself, which is slower but harder to break. Use Cowork, then re-run the five-step checklist before committing. The speed is not free without the verification.
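A sketch of the flag task with both guards made explicit. The toy rows and the keyword rule are placeholders; the point is the `stopifnot()` checks, which catch exactly the silent failures described above.

```r
# Sketch: add is_us without silent row loss or column reordering.
library(dplyr)
library(stringr)

placements <- tibble::tibble(
  name        = c("A. Researcher", "B. Researcher"),
  institution = c("University of Illinois Urbana-Champaign", "KAPSARC, Riyadh")
)

n_before    <- nrow(placements)
cols_before <- names(placements)

us_keywords <- c("University of Illinois", "USDA", "Federal Reserve")  # placeholder rule

placements <- placements |>
  mutate(is_us = str_detect(
    institution,
    regex(str_c(us_keywords, collapse = "|"), ignore_case = TRUE)
  ))

# Guards: same row count; original columns still first, in order.
stopifnot(nrow(placements) == n_before,
          identical(names(placements)[seq_along(cols_before)], cols_before))
```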
Today’s arc: GitHub repo, then Cowork
With the framing in place, here is the arc for the next 75 minutes:
- Create the repo on github.com (README plus R `.gitignore` template).
- Clone it locally in your terminal.
- Open in VS Code, grant Cowork access, and scaffold the project with a small Cowork prompt: README and `.Rproj` only. The `code/`, `data/`, and `output/` folders are created later, by the scripts that actually fill them.
- Scrape. Drive Cowork to write `code/scrape_dyson_cowork.R`. Verify, commit, push. (This is the chat-style task at Cowork speed.)
- Build the pipeline. Drive Cowork to write `code/pipeline_dyson.R` that parses, classifies, plots, and summarizes. Verify, commit, push. (This is what Cowork lets you do that chat practically could not.)
- Stretch: ask Cowork to scrape a second department for a cross-school comparison.
The point is not the scraper. The point is the rhythm and the depth: GitHub-first, prompt, verify, sync. The first half is the same task as Session 6 at a different speed. The second half is a small project that chat could not produce in 75 minutes.
Create the repo on GitHub
We start where the repo will live: on GitHub. The local copy is a clone of the canonical online version. This mirrors the workflow you saw in Session 5 and the workflow you will use for almost every collaboration in your career.
If a folder named aem7010-ai already exists on your machine from earlier experimenting, rename it (aem7010-ai-old/) before you start. Beginning from a clean slate is the lesson.
Step-by-step on github.com
- Open https://github.com in your browser. Sign in if you are not already.
- Click the green New button (top-left of the page, next to your avatar) or go to https://github.com/new.
- Fill in the form:
  - Repository name: `aem7010-ai`
  - Description: `Cowork and Claude Code exercises for AEM 7010` (one line, optional but useful)
  - Public or Private: either is fine. Public is the norm for course exercises; private is the norm for unpublished research. Pick one.
  - Add a README file: check this box. We want a non-empty first commit.
  - Add .gitignore: open the dropdown and select R. This gives you the community-standard ignore list for R projects.
  - Choose a license: leave as None for now. You can add one later if you publish.
- Click the green Create repository button at the bottom.
- You land on `github.com/<your-handle>/aem7010-ai`. Confirm the files GitHub created for you: `README.md` and `.gitignore` (plus a license file only if you chose a license).
The repo now exists online. Nothing exists locally yet. That asymmetry is the point.
Copy the clone URL
Still on the repo page on github.com, click the green Code button. A small panel opens with two tabs: HTTPS and SSH.
- If you set up SSH keys in Session 5, click SSH and copy the URL. It looks like `git@github.com:<your-handle>/aem7010-ai.git`.
- Otherwise, click HTTPS and copy that URL. It looks like `https://github.com/<your-handle>/aem7010-ai.git`.
You will paste this URL in the next step.
Clone the repo in your terminal
Open a terminal. Any terminal will do: macOS Terminal, iTerm, or VS Code’s integrated terminal (Terminal → New Terminal). Run the commands below, using whichever variant matches the URL you copied. Replace `<your-handle>` with your GitHub username.

If you copied the SSH URL:

```
cd ~/github
git clone git@github.com:<your-handle>/aem7010-ai.git
cd aem7010-ai
```

If you copied the HTTPS URL:

```
cd ~/github
git clone https://github.com/<your-handle>/aem7010-ai.git
cd aem7010-ai
```

`git clone` copies the GitHub repo into a new `~/github/aem7010-ai/` folder. After `cd`, run a quick sanity check:

```
ls -la
git status
git log --oneline
```

You should see `README.md`, `.gitignore`, the hidden `.git/` folder, “nothing to commit, working tree clean”, and one commit titled “Initial commit”. That commit is the one GitHub made for you when you ticked “Add a README”.
git clone asks for a password
GitHub turned off password authentication years ago. If the HTTPS clone prompts for a password, two options. (1) Set up SSH keys (Session 5, “Connect to GitHub”). (2) Use a personal access token as the password (GitHub Settings → Developer settings → Personal access tokens). The first is the long-term answer.
Open in VS Code, grant Cowork access
Open the cloned folder
In VS Code: File → Open Folder → ~/github/aem7010-ai. The Explorer pane on the left should show README.md, .gitignore, and the Source Control panel should report no pending changes.
If you have not added VS Code’s code command-line helper yet, you can also open the folder from the terminal you just used: code . (from inside the folder).
Grant Cowork access
Launch Cowork. Grant it access to ~/github/aem7010-ai (the whole folder). Confirm three things:
- The folder path appears in Cowork’s UI.
- The chat transcript sits beside the file tree.
- You found the stop button before you need it.
If any of these are unclear, flag it now. The rest of the session assumes all three.
Scaffold the project with Cowork
The repo already has a sensible README and .gitignore from GitHub. We need two small additions: a one-paragraph note in the README, and an RStudio project file (.Rproj). We deliberately do not pre-create a session7/ subfolder. Instead, the scripts will write into functional folders (code/, data/, output/) that describe content, not chronology. A Session 8 script can land next to a Session 7 script in code/ without any reorganization.
An .Rproj file is RStudio’s marker that “this folder is a project”. Two benefits matter today. First, working directory. When you double-click aem7010-ai.Rproj, RStudio opens with the working directory set to the repo root, so relative paths like "data/placements_dyson.csv" resolve consistently every time, on every machine. No setwd() calls in the script. Second, the project remembers state. Open files, command history, and the Git pane all stay scoped to this repo. You can have several projects open in separate RStudio windows without them polluting each other.
For reproducible research, the working-directory point is the load-bearing one. Anyone who clones the repo and double-clicks the .Rproj runs the same scripts in the same context. That is one of the smallest, highest-leverage habits we will pick up this term.
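For reference when you read Cowork’s diff in a moment: a default `.Rproj` file is a short plain-text config. The exact fields vary a little by RStudio version; a typical one looks roughly like this.

```
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
```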
Paste the following prompt into Cowork:
I am working inside this folder, which is a fresh clone of a GitHub repo. It already has a README.md and .gitignore (R template). Please do exactly two things and then stop.
1. Append one short paragraph to `README.md` describing what this repo holds: scripts and outputs for the AEM 7010 AI-tools exercises. The repo follows a functional layout: `code/` for R scripts, `data/` for input and processed data, `output/` for generated figures and reports.

2. Create an RStudio project file at `aem7010-ai.Rproj` at the repo root, using the standard RStudio defaults. This makes the repo open as a project in RStudio with the working directory set to the repo root.
Do not create any other files (no code/, no data/, no output/; the scripts will create those when they need them). Do not run any R code. Do not install any packages. Do not edit .gitignore. Stop after the two items above and report what you changed.
When Cowork stops, look at VS Code’s Source Control panel. You should see exactly two changes: a new aem7010-ai.Rproj and a modified README.md. Read each diff.
First commit and push
Two ways. Pick the one you prefer; we will use it for all later commits today.
- In the Source Control panel, click the `+` next to “Changes” to stage both files.
- Type the commit message: `Add RStudio project and project description`.
- Press Cmd+Enter (Mac) or Ctrl+Enter (Windows) to commit.
- Click Sync Changes at the bottom (cloud-with-arrows icon). This pushes to GitHub.

Or, in the terminal:

```
git add aem7010-ai.Rproj README.md
git commit -m "Add RStudio project and project description"
git push
```

With `aem7010-ai.Rproj` in place, double-click that file in Finder (or open it from inside RStudio). RStudio will open the repo as a project, with its working directory set to the repo root. Keep RStudio open in parallel with VS Code: VS Code is your editor and Source Control panel; RStudio is where you run R.
Open github.com/<your-handle>/aem7010-ai and refresh. You should see two commits in the history (the original “Initial commit” and your new one), the aem7010-ai.Rproj file, and the updated README.md with your project paragraph. If not, the push did not happen. Check for errors in the bottom-left of VS Code or in the terminal output before continuing.
Guided scrape of Cornell Dyson placements
The task
We return to the Cornell Dyson PhD placements page, this time with Cowork. This is the same page the instructor demoed in Session 6, deliberately. The point is to see the same task played out at very different speeds with very different risks.
Goal output: a CSV at data/placements_dyson.csv (at the repo root) with the following columns.
| Column | Example |
|---|---|
| `name` | Sharan Banerjee |
| `year` | 2025 |
| `placement` | Postdoctoral Fellow at KAPSARC School of Public Policy, Riyadh |
| `source_url` | https://dyson.cornell.edu/programs/graduate/placements/ |
A working script, a CSV of about 90 rows spanning 2015 to 2025, and a clean Git commit pushed to GitHub. That is the deliverable.
How we work this together
We do this in lockstep, not as a demo. The instructor projects the same screen you have. Each step below happens on every laptop in the room. We pause at the checkpoints. Do not skip ahead, and do not lag silently.
The rhythm is the same one you will use in your own research:
- Source Control panel clean? Check.
- Paste the shared prompt (below).
- Wait for Cowork to finish, then read what it changed.
- Run the verification checklist.
- Commit. Sync. Refresh github.com on a side tab to confirm.
Watch for the moments where Cowork wants to reply with a table instead of a script. That is the Mode A temptation. The prompt below is written to prevent it, but the agent will still drift if you are not watching.
The shared prompt
This is the exact text you paste into Cowork. Do not paraphrase. Mode B enforcement is load-bearing; the third paragraph is doing most of the work.
I am working inside this folder. Please do the following.
1. Write an R script at `code/scrape_dyson_cowork.R` (create the `code/` folder if it does not exist) that scrapes the PhD placements table from https://dyson.cornell.edu/programs/graduate/placements/. The script should save a CSV at `data/placements_dyson.csv` (create the `data/` folder if it does not exist) with exactly these columns, in this order: `name`, `year`, `placement`, `source_url`. The `placement` column is the job title joined to the institution by the word “at”, like “Assistant Professor at University of Illinois Urbana-Champaign”. The `source_url` column is the URL above, repeated on every row.

2. Use `rvest` and `readr`. Anchor your selector on the heading text “Recent PhD Job Placements” so the script is robust to changes in CSS class names. Drop any row in the table where all four cells are empty.

3. The R script is the artifact I care about. Do not paste the scraped data into this chat. I will run the script myself from a fresh R session to verify it works. Include a one-line `message()` at the end reporting the number of rows written.

4. After writing the script, run it once in the sandbox so we know it works. Report the row count and stop. Do not create any other files. Do not edit my `.gitignore`.
Read the prompt before pasting. Note four things about it.
- The output schema is specified exactly. Column names, column order, separator word for the `placement` column. If you say “a CSV with the relevant info”, you will get something you did not want.
- The selector strategy is given. “Anchor on the heading text” is the same rule the fallback script follows. If you leave this out, Cowork picks a CSS class that may not survive the next page redesign.
- The artifact is declared. “The R script is the artifact I care about. Do not paste the data.” This is Mode B in one sentence.
- The scope is closed. “Do not create any other files. Do not edit my `.gitignore`.” Without this, Cowork sometimes adds a README, a `renv.lock`, or a helper script you did not ask for. Harmless most of the time, noisy in a git diff.
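For orientation while Cowork works, here is roughly what the heading-anchored strategy looks like in `rvest`. This is a sketch, not the fallback script: the heading tags and the table’s position relative to the heading are assumptions to verify against the live page, and the block hits the network when run.

```r
# Sketch of a heading-anchored selector (page structure assumed, not verified).
library(rvest)

url  <- "https://dyson.cornell.edu/programs/graduate/placements/"
page <- read_html(url)

# Locate the heading by its text rather than by a CSS class name.
headings <- html_elements(page, "h1, h2, h3")
target   <- headings[grepl("Recent PhD Job Placements", html_text2(headings))]

# Take the first table after that heading in document order.
tbl_node       <- html_element(target[1], xpath = "following::table[1]")
placements_raw <- html_table(tbl_node)
```

The design choice to copy: anchoring on visible text survives a restyle of the page; anchoring on a generated class name usually does not.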
Paste and wait
Time: ~20 minutes. Cowork already has access to ~/github/aem7010-ai from the permissions step earlier. Paste the shared prompt and let it run.
When Cowork reports a row count, do not yet trust it. Move to the verification checklist.
Verification checklist
Five checks, in order. All five must pass before you commit. If any fails, fix it first.
1. Does the script exist at `code/scrape_dyson_cowork.R`? If Cowork named it differently, rename it. The name is part of the contract.

2. Does the script run cleanly from a fresh R session? Do not trust “it ran in the sandbox”. Run it yourself. With the `aem7010-ai.Rproj` project open in RStudio, open `code/scrape_dyson_cowork.R`, click Session → Restart R, then click Source (or press Cmd+Shift+S / Ctrl+Shift+S) and read the message in the console. Or, from the repo root: `Rscript code/scrape_dyson_cowork.R` (requires `Rscript` on your `PATH`; true on most macOS installs, sometimes missing on Windows).

3. Does `data/placements_dyson.csv` have 80 or more rows, and the four expected columns in the right order? In the R console: `readr::read_csv("data/placements_dyson.csv")` followed by `nrow()` and `names()`.

4. Pick three random rows and verify them against the live page. Open the browser, find the row, compare name, year, and the join of position-plus-institution. If any of the three is wrong, the scraper is wrong, even if the row count looks right.

5. Can you explain every line of the script? If a line uses a function you do not recognize, ask Cowork to explain it until you can restate what it does in your own words. Then decide whether to keep it.
If a check fails, prompt Cowork to fix that specific failure. Do not accept a new full-rewrite response: ask for a patch.
Commit and push #1
When all five verification checks pass, commit and push from VS Code. Three clicks plus a sentence.
- Source Control panel → click the `+` next to `code/scrape_dyson_cowork.R` and `data/placements_dyson.csv` to stage them. (Or `+` next to “Changes” to stage everything: read each diff first.)
- In the message box at the top, type: `Session 7: Cowork-drafted scraper for Dyson PhD placements`.
- Click the checkmark (or Cmd+Enter / Ctrl+Enter) to commit.
- Click the Sync Changes button at the bottom of the VS Code window (cloud-with-arrows icon). This pushes to GitHub.
Now open github.com/<your-handle>/aem7010-ai and refresh. You should see:
- A new commit with your message at the top of the commit list.
- The `code/scrape_dyson_cowork.R` file.
- The `data/placements_dyson.csv` file. Click it; GitHub will render the CSV as a table for you to spot-check.
If GitHub does not show the new commit, you have not actually pushed. Check the bottom-left of VS Code for any sync errors and resolve them before continuing.
Build the pipeline
Time: ~30 minutes. This is where Session 7 goes beyond Session 6. The scrape we just finished is something chat could also have produced, more slowly. The pipeline below is something chat practically could not finish in a class period, because each step depends on the actual shape of the data, not on a guess.
What the pipeline does
A research project is rarely just a scrape. The placements table is the input to a small analysis. We will ask Cowork to build a second script, code/pipeline_dyson.R, that takes the CSV from the scrape and produces three new artifacts in four steps.
1. Parse the `placement` column into `position_title` and `institution`.
2. Classify each row as `academic`, `government`, `industry`, or `other` using a short keyword rule.
3. Plot placements per year, colored by category, saved to `output/figures/placements_dyson_by_year.png`.
4. Summarize the result by writing `output/findings_dyson.md` with the totals, the category counts, the year range, and the most common institution.
The output is a small, rerunnable project. The CSV is the boundary between the scrape and the pipeline. Either side can be rerun without the other.
Shared pipeline prompt
Paste the following prompt into Cowork. As before, do not paraphrase. The structure of the prompt is what keeps the agent honest.
I am working inside this folder. The scrape from earlier produced data/placements_dyson.csv. Please build the analysis pipeline.
1. Write an R script at `code/pipeline_dyson.R` that reads `data/placements_dyson.csv` and does steps 2 through 5 below. Use only `tidyverse` and base R. Do not install other packages. Run the script in the sandbox once at the end so we know it works.

2. Parse. Split the `placement` column into two new columns: `position_title` (everything before the first " at ") and `institution` (everything after the first " at "). Trim whitespace on both. If a row has no " at " separator, set both to `NA` and report the count of failures with `message()`.

3. Classify. Add a `category` column with values `academic`, `government`, `industry`, or `other`. Apply these rules in order:
   - `academic` if `position_title` contains any of: “Professor”, “Lecturer”, “Postdoctoral”, “Postdoc”, “Faculty”, “Research Fellow” (case-insensitive).
   - `government` if `institution` contains any of: “Bureau”, “Department of”, “USDA”, “Federal Reserve”, “World Bank”, “OECD”, “IMF”, “United Nations”, “FAO”, “Ministry” (case-insensitive).
   - `industry` if `position_title` contains any of: “Analyst”, “Consultant”, “Manager”, “Director” (case-insensitive) and the row was not already classified as `academic` or `government`.
   - `other` otherwise.

   Report the count by category at the end with `message()`.

4. Plot. Make a stacked bar chart of placements per year, colored by `category`, using `ggplot2`. Use the four categories in this fixed order in the legend: `academic`, `government`, `industry`, `other`. Save the plot to `output/figures/placements_dyson_by_year.png` (create the `output/figures/` folders if they do not exist) at 8 by 5 inches, 150 dpi.

5. Summarize. Write a short `output/findings_dyson.md` markdown file containing: total rows, count by category, year range (min to max), and the most common institution. Generate every number programmatically from the data.

The two scripts and the three output files are the artifacts. Do not paste the data or the markdown content into this chat. I will run the pipeline myself from a fresh R session to verify. Do not modify `code/scrape_dyson_cowork.R`. Do not edit `.gitignore` or `README.md`. Do not create any other files.
Read the prompt before pasting. Notice five things:
- The contract is precise. Column names, classification rules in order, file paths, and dimensions for the plot are all specified.
- The script is the artifact. The numbers in `findings_dyson.md` must come from the data, not be typed by the agent. That is Mode B applied to a markdown report.
- The boundary is explicit. The pipeline reads `data/placements_dyson.csv`. It does not re-scrape. Scraping and analysis are different steps that fail in different ways.
- The scope is closed. The agent is told what not to touch (the scrape script, the README, the `.gitignore`).
- The agent must report. The two `message()` calls are the parsing failure count and the category counts. These are the numbers you will reconcile against `findings_dyson.md`.
Paste and wait, again
Time: ~20 minutes. Same rhythm as before, in lockstep with the room. Paste the pipeline prompt. Watch the file tree. When Cowork stops, you should see new entries in the Source Control panel: code/pipeline_dyson.R, output/figures/placements_dyson_by_year.png, output/findings_dyson.md, and any folders that did not yet exist (output/, output/figures/).
If you see anything else (a tests/ folder, a new renv.lock, an edit to the scrape script), read the diff before staging.
Verification checklist (pipeline)
Five checks, in order. All five must pass before the second commit.
1. Does `code/pipeline_dyson.R` run cleanly from a fresh R session? In the RStudio project, open `code/pipeline_dyson.R`, click Session → Restart R, then Source, and read the two `message()` lines that appear in the console. Or, from the repo root: `Rscript code/pipeline_dyson.R` and read the two `message()` lines in the terminal output.

2. Do the parsing failures and category counts make sense? Total rows minus parsing failures should equal the sum of the four category counts. If they do not reconcile, the script has a silent bug.

3. Does `output/figures/placements_dyson_by_year.png` exist and look right? Open it. Years on the x-axis, four categories in the legend in the right order, no obvious gaps in the bars.

4. Does `output/findings_dyson.md` exist and reconcile to the data? Open it. The numbers should match the messages from step 1 and the values in the CSV. Spot-check the “most common institution” by hand.

5. Can you explain every line of the script, including each classification rule? If a regex looks magic, ask Cowork to explain it until you can restate it. Then decide whether to keep it.
If a check fails, prompt Cowork for a patch on that specific failure. Do not accept a full rewrite.
Commit and push #2
When all five pipeline checks pass, repeat the Source Control rhythm.
- Source Control panel → stage the new entries: `code/pipeline_dyson.R`, `output/figures/placements_dyson_by_year.png`, `output/findings_dyson.md`.
- Commit message: `Session 7: pipeline (parse, classify, plot, summarize)`.
- Commit (checkmark or Cmd+Enter).
- Sync Changes. Refresh github.com.
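The Source Control clicks map onto the same Git commands from Sessions 4 and 5. If the panel misbehaves, this terminal sequence is the equivalent fallback (paths from the exercise above):

```shell
# Stage exactly the three pipeline artifacts -- nothing else.
git add code/pipeline_dyson.R \
        output/figures/placements_dyson_by_year.png \
        output/findings_dyson.md

# Commit with the same message as the panel workflow.
git commit -m "Session 7: pipeline (parse, classify, plot, summarize)"

# Sync Changes is push (plus a pull first if the remote is ahead).
git push
```

Staging by explicit path, rather than `git add .`, is what keeps a surprise `tests/` folder or `renv.lock` out of the commit.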
You should now see two commits, one for the scrape and one for the pipeline, and a clean functional layout: `code/scrape_dyson_cowork.R`, `code/pipeline_dyson.R`, `data/placements_dyson.csv`, `output/figures/placements_dyson_by_year.png`, and `output/findings_dyson.md`. Click `output/findings_dyson.md` on github.com. It renders as a small report whose numbers came from the data, not from the agent's prose.
Two commits in one session is a healthy pattern. The diff between them is the shape of an actual research workflow: from raw data to a small, reproducible analysis.
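One way to see that shape concretely is to ask Git for the difference between the two commits. Run from the repo root after the second push; this is the terminal view of what the commit list on github.com shows:

```shell
# File-level summary of what changed between commit #1 and commit #2.
git diff --stat HEAD~1 HEAD

# The commit history itself, newest first.
git log --oneline
```

The `--stat` output should list only the three pipeline artifacts; anything else in it is something you staged without meaning to.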
Stretch: cross-school comparison
Ask Cowork to scrape the Berkeley ARE PhD placements page (https://are.berkeley.edu/graduate/job-market-placement) into `data/placements_berkeley.csv` using the same column contract, then run the same pipeline on it, and produce an `output/findings_comparison.md` with a small table of category counts for Cornell and Berkeley side by side.
Two things to watch:
- Different page, different selectors. The Berkeley page does not anchor on the same heading. Cowork will need a different selector strategy, which is itself a useful lesson.
- Same pipeline, different inputs. The classifier and the plot helper are reusable. If you find yourself rewriting them, refactor them out of `pipeline_dyson.R` into a small helper file Cowork can call from both pipelines. This is a Session 8 move; doing it once here will make Monday feel familiar.
Debrief
What we learned
The first half of class compressed a forty-minute chat-and-paste workflow into a three-minute interaction. Same CSV, same target page, much less typing. That is a real productivity gain, but it is the smaller of the two lessons.
The bigger lesson is the second half. Cowork did not just write the scraper faster. It built a small project around it: a parser, a classifier, a plot, a programmatic findings report. Five files, one CSV boundary between them. Doing the same thing in chat would have meant copy-pasting partial outputs back and forth across many turns, with the agent guessing at columns it could not see. By the time you finished, class would be over.
The verification reflex did not change. It got more work to do. You ran it twice today on different artifacts: a CSV from the scraper, and a markdown report plus a PNG from the pipeline. The questions are the same. Do the numbers reconcile? Does it run from a fresh R session? Can you explain every line?
What changes from chat, to Cowork, to Claude Code
The course module is not about three tools. It is one ladder. Each rung adds one new dimension and keeps the rest constant.
| Rung | Tool | Task scope | Pipeline depth | Workflow style |
|---|---|---|---|---|
| Session 6 | chat | one school, one script | scrape only | manual paste |
| Session 7 (today) | Cowork | one school, small project | scrape, parse, classify, plot, summary | interactive, you watch |
| Session 8 (Monday) | Claude Code | five schools, study | same pipeline shape | delegated, you review the diff |
Session 6 to 7 added pipeline depth and held data scope constant. Session 7 to 8 will add data scope and delegation, and hold the pipeline shape constant. Your Session 7 code will be the seed of Monday’s Session 8.
What Session 6 forced, what Cowork allows
The other lesson carries across every week: notice when a tool's constraints were protecting you, and replace them with discipline when those constraints fall away.
| Behavior | In Session 6 (chat) | In Session 7 (Cowork) |
|---|---|---|
| Running the script | You had to, by pasting and executing | The agent can, so you have to make yourself |
| Verifying the output | You had to, the agent could not | The agent reports a row count, so you have to insist |
| Saving an artifact | Only possible as a script, pasted in | Possible as a script or as a displayed table; Mode B is your rule now |
| Editing files | You, one paste at a time | The agent, potentially many files at once, possibly silently |
| Ending the session | Clean by default | Requires git diff to know what happened |
| Sharing the work | A pasted script in your notes | A live GitHub repo with a commit history |
The more capable the tool, the more of the verification reflex you own directly.
The bottleneck is no longer typing
The bottleneck is verification, and it always will be. Faster tools do not remove verification work. They move it closer to the end of the pipeline, where it is easier to skip. The verification checklist and the GitHub round-trip are how you do not skip it.
For Monday
- Your `aem7010-ai` repo on GitHub is your Session 7 deliverable. Confirm at least two commits are visible at `github.com/<your-handle>/aem7010-ai`.
- Install Claude Code before class. Anthropic ships a native installer that bundles everything as a single binary. No Node.js required. One command in a terminal:

  ```shell
  # macOS / Linux
  curl -fsSL https://claude.ai/install.sh | bash

  # Windows (PowerShell)
  irm https://claude.ai/install.ps1 | iex
  ```

  Then verify with `claude --version`. You will need a Claude account (free signup at https://claude.com). The in-class exercises will not require a paid plan. Full guide at https://docs.claude.com/en/docs/claude-code/setup. Bring any install errors to the first ten minutes of class on Monday; that time is reserved for troubleshooting.
- Open the `aem7010-ai` repo in your terminal at the repo root. That is the directory Claude Code will operate in on Monday. Confirm `git status` is clean before you arrive.
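A quick way to confirm the repo is ready before Monday, run from the repo root (`--porcelain` prints nothing when there is nothing to commit):

```shell
# Empty output means a clean working tree -- nothing uncommitted.
git status --porcelain

# At least two lines here means both Session 7 commits landed.
git log --oneline
```

If `git status --porcelain` prints anything, either commit it deliberately or discard it; do not leave Claude Code to discover it on Monday.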
If Cowork misbehaves on a student machine during the synchronous walkthrough, a working scraper lives at ai-tools/scrape_dyson_cowork.R in the course repo. It takes about 10 seconds to run and produces the same CSV. Use it as a reference, or as a literal drop-in if the room gets stuck on the scrape.