Session 7: AI Tools II

Agentic desktop AI (Cowork): from empty folder to GitHub

Prof. Ariel Ortiz-Bobea

2026-04-29

Pre-Class Checklist

§ Tutorial: Pre-class checklist

One thing only:

  • Cowork installed and signed in.

We start from an empty folder today. Nothing to clone, nothing to push ahead of time.

The first ten minutes of class are for stragglers. If Cowork is not installed, say so immediately.

VS Code is our IDE today. If you have not used it for Git yet, the Source Control panel is the third icon on the left sidebar.

Quick Recap: Session 6

§ Tutorial: Where we are

Session 6: chat. Probabilistic pattern-completer, copy-paste interface. No file access. No code execution.

Mode A vs Mode B:

  • Mode A: AI does the thing. Output is data. Reasoning hidden. → Almost always wrong for research.
  • Mode B: AI writes code that does the thing. Output is a script. Reasoning visible. → The reproducible path.

Chat forced Mode B by accident. Cowork does not. Every prompt today bakes Mode B in explicitly.

What Changes When the Agent Can Reach Your Files

§ Tutorial: What changes

Three capabilities shift at once.

  • File access. The agent sees every file in the folder you grant. You no longer paste snippets. You no longer choose what it sees. Granting a folder is a trust decision.
  • Code execution. R, Python, and shell commands run in a sandbox: an isolated terminal on the agent’s side, scoped to your folder.
  • On-disk modifications. The agent creates, edits, and deletes files. Without asking each time, by design.

All three are upgrades. All three are new surfaces for silent damage.

The guardrail is Git. git status and git diff are your only reliable audit trail of what the agent actually did.
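
A minimal sketch of that audit loop, run in a throwaway repo so it works anywhere (file names are illustrative):

```shell
# Demo of the Git audit trail in a throwaway repo (file names illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "original" > notes.txt
git add notes.txt
git -c user.email=demo@example.com -c user.name=demo commit -qm "baseline"
# Simulate an agent session: one silent edit, one new file.
echo "edited by agent" > notes.txt
echo "surprise" > extra.R
git status --short   # " M notes.txt" (modified) and "?? extra.R" (untracked)
git diff --stat      # per-file summary of unstaged changes
git diff             # full line-by-line diff to review before committing
```

In the real repo, run git status and git diff from the repo root after every Cowork session, before staging anything.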

Chat and Cowork, Side by Side

§ Tutorial: Chat and Cowork, side by side

Same underlying model. Different surface, capabilities, risks.

Dimension      Chat (Session 6)           Cowork (today)
Context        Manual paste               Automatic from folder
Read files?    No                         Yes
Run code?      No                         Yes, sandboxed
Edit files?    No                         Yes, directly
Model “knows”  Only what you typed        Full files, real errors
Audit trail    Git commit of pasted code  Git diff of touched folder
Friction       Copy-paste fatigue         Permission + trust decisions

Capability goes up. Friction goes down. Risk goes up. Git turns that risk from catastrophic to annoying.

What Cowork Is Good At / Bad At

§ Tutorial: What Cowork is good at · What Cowork is bad at

Good:

  • Bootstrapping projects (README, .gitignore, folder layout, renv setup)
  • Exploratory data work (real columns, real distributions)
  • Multi-step tasks with intermediate state (scrape → clean → plot)
  • Debugging with full context (“run this, fix the error”)

Bad:

  • Outputs you cannot sanity-check by eye (an elasticity estimate appears: is it right?)
  • Tasks that must run unattended (no cron jobs)
  • Work whose reasoning must survive the session (classify 200 rows, save only the CSV → the rule is gone)
  • Anything where a confident-sounding hallucination would cost you (fake rvest functions, untested selectors)

Today’s Arc: GitHub Repo, Then Cowork

§ Tutorial: Today’s arc

With the framing in place, here is the arc for the next 75 minutes:

  1. Create the repo on github.com (README + R .gitignore template).
  2. Clone it locally in your terminal.
  3. Open in VS Code. Grant Cowork access. Scaffold the project (README + .Rproj).
  4. Scrape. Cowork writes scrape_dyson_cowork.R. Verify, commit, push.
  5. Pipeline. Cowork writes pipeline_dyson.R: parse, classify, plot, summarize. Verify, commit, push.
  6. Stretch: add Berkeley ARE for a cross-school comparison.

First half feels like Session 6 at speed. Second half is something chat could not have finished.

Step 1: Create the Repo on github.com

§ Tutorial: Create the repo on GitHub

In your browser, go to https://github.com/new. Fill in the form:

  • Repository name: aem7010-ai
  • Description: Cowork and Claude Code exercises for AEM 7010
  • Public or Private: pick one. Either is fine for this course.
  • Add a README file: ✅ check it.
  • Add .gitignore: select R from the dropdown.
  • License: None for now.

Click Create repository. You land on github.com/<your-handle>/aem7010-ai.

The repo exists online. Nothing exists locally yet. That asymmetry is the point: GitHub is the canonical source.

⟶ Switch to the tutorial: Create the repo on GitHub (~2 min).

Step 2: Clone in Your Terminal

§ Tutorial: Clone in terminal

On the repo page, click Code → copy the SSH URL (or HTTPS if you skipped Session 5’s SSH setup).

Open a terminal. Replace <your-handle> and run:

cd ~/github
git clone git@github.com:<your-handle>/aem7010-ai.git
cd aem7010-ai

Sanity check:

ls -la
git status        # nothing to commit
git log --oneline # one "Initial commit" from GitHub

HTTPS form, if SSH not set up: git clone https://github.com/<your-handle>/aem7010-ai.git

Step 3: Open in VS Code, Grant Cowork, Scaffold

§ Tutorial: Open in VS Code · Grant Cowork access

Open the cloned folder in VS Code: File → Open Folder → ~/github/aem7010-ai. Or run code . from a terminal inside the repo.

Grant Cowork access to that folder. Confirm three things:

  1. The folder path appears in Cowork’s UI.
  2. The chat transcript sits beside the file tree.
  3. You found the stop button before you need it.

Paste the scaffolding prompt (full text in the tutorial). It asks Cowork for two things: one paragraph appended to the README, and an aem7010-ai.Rproj file. No subfolders pre-created. Scripts will write into code/, data/, and output/ as they need them: functional names, not session-numbered.

Why the .Rproj? Double-click it and RStudio opens the repo with the working directory set to the repo root. Every relative path (data/..., output/...) resolves correctly without setwd(). Reproducibility, free.
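
If you want to check what Cowork produced, an .Rproj file is just a few lines of plain text; RStudio fills in more fields the first time you open it. A minimal sketch (settings illustrative):

```
Version: 1.0

RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: Default
```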

Then commit and push: stage, message “Add RStudio project and project description”, commit, sync. Refresh github.com to confirm two commits exist.

⟶ Switch to the tutorial: Open in VS Code, grant Cowork access (~5 min).

The Shared Scrape Prompt

§ Tutorial: Shared prompt handout

We all paste the same prompt. Four design choices matter:

  • Output schema specified exactly (column names, order, separator)
  • Selector strategy given (anchor on heading text, not CSS class)
  • Artifact declared (“the R script is what I care about; do not paste the data”)
  • Scope closed (“do not create other files; do not edit my .gitignore”)

Mode B enforcement is load-bearing. Do not paraphrase the prompt.

⟶ Switch to the tutorial: Shared prompt handout. Read it before pasting.

Paste, Wait, Verify, Commit (~20 min)

§ Tutorial: Paste and wait · Verification checklist

Paste the prompt. Cowork writes code/scrape_dyson_cowork.R and produces data/placements_dyson.csv. Do not trust the row count it reports.

Five-step verification (all must pass before commit):

  1. Script at code/scrape_dyson_cowork.R?
  2. Open in RStudio (project open), Restart R + Source. Runs clean?
  3. CSV at data/placements_dyson.csv has ≥ 80 rows and the 4 columns in the right order?
  4. Three random rows match the live page?
  5. You can explain every line?

If a check fails, ask Cowork for a patch, not a full rewrite.
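
Check 3 can be scripted from the repo root. A sketch against a stand-in CSV so the commands are runnable anywhere (the real file is data/placements_dyson.csv; the column names here are illustrative, not the prompt’s actual contract):

```shell
# Stand-in CSV; swap in data/placements_dyson.csv for the real check.
csv=$(mktemp)
printf 'year,name,placement,notes\n2024,A,Prof at X,-\n2023,B,Econ at Y,-\n' > "$csv"
head -1 "$csv"                        # column names, in the promised order
awk -F',' 'NR==1 {print NF}' "$csv"   # column count: expect 4
tail -n +2 "$csv" | wc -l             # data rows: expect >= 80 on the real file
```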

Commit and Push From VS Code

§ Tutorial: Commit and push #1

Source Control panel:

  1. Stage code/scrape_dyson_cowork.R and data/placements_dyson.csv.
  2. Commit message: “Session 7: Cowork-drafted scraper for Dyson PhD placements”.
  3. Commit (the checkmark, or Cmd+Enter).
  4. Sync Changes (the cloud-with-arrow icon at the bottom).

Open github.com/<your-handle>/aem7010-ai. Refresh. Commit and files visible? If not, you have not actually pushed. Stop and fix.

Build the Pipeline (~30 min)

§ Tutorial: Build the pipeline

Where Session 7 goes beyond Session 6. The pipeline below is a small project that chat could not have finished in 75 minutes.

code/pipeline_dyson.R reads the CSV from the scrape and does four things:

  1. Parse. Split placement into position_title and institution on " at ".
  2. Classify. Each row → academic | government | industry | other by keyword rules.
  3. Plot. Stacked bar of placements per year, colored by category, to output/figures/placements_dyson_by_year.png.
  4. Summarize. Auto-generate output/findings_dyson.md with totals, counts, year range, top institution.

The CSV is the boundary. Either side can be rerun without the other.

The Pipeline Prompt

§ Tutorial: Shared pipeline prompt

Same lockstep rhythm. Paste the full prompt from the tutorial. It specifies:

  • Exact file paths. code/pipeline_dyson.R, output/figures/placements_dyson_by_year.png, output/findings_dyson.md.
  • Classification rules in order. Academic first, then government, then industry, then other.
  • Two message() reports. Parsing failure count, then category counts. These are what you reconcile against output/findings_dyson.md.
  • Scope closed. Do not modify the scrape. Do not edit .gitignore or README.md.
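
The order of the rules is load-bearing: a placement like “Economist, Federal Reserve Bank” must hit the government rule before the industry rule sees “Bank”. A sketch of ordered keyword matching in shell, first match wins (the real rules live in R, and these keywords are illustrative):

```shell
# Ordered keyword rules; first match wins (keywords illustrative).
classify() {
  lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lc" in
    *university*|*college*|*professor*) echo academic ;;    # checked first
    *federal*|*usda*|*ministry*)        echo government ;;  # then government
    *bank*|*consulting*|*amazon*)       echo industry ;;    # then industry
    *)                                  echo other ;;       # fallback
  esac
}
classify "Assistant Professor, Cornell University"   # -> academic
classify "Economist, Federal Reserve Bank"           # -> government, not industry
classify "Postdoc, IFPRI"                            # -> other
```
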

⟶ Switch to the tutorial: The pipeline prompt. Read before pasting.

Pipeline Verification Checklist

§ Tutorial: Verification checklist (pipeline)

Five checks. All must pass before the second commit.

  1. Open code/pipeline_dyson.R in RStudio (project open), Restart R + Source. Runs clean?
  2. Parsing failures + sum of category counts = total rows? (The numbers reconcile.)
  3. output/figures/placements_dyson_by_year.png exists and looks right?
  4. output/findings_dyson.md exists, and its numbers match the messages and the CSV?
  5. You can explain every line, including each classification rule?

If a check fails, ask Cowork for a patch on that specific failure. Do not accept a full rewrite.
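
Check 2 is mechanical once the classified data exists. A sketch against a stand-in classified CSV (the real columns come from pipeline_dyson.R; the names here are illustrative):

```shell
# Stand-in classified output so the commands run anywhere.
csv=$(mktemp)
printf 'placement,category\nA,academic\nB,academic\nC,industry\nD,other\n' > "$csv"
tail -n +2 "$csv" | cut -d',' -f2 | sort | uniq -c   # per-category counts
tail -n +2 "$csv" | wc -l   # total rows: failures + category counts must sum to this
```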

Second Commit, Second Push

§ Tutorial: Commit and push #2

Source Control panel:

  1. Stage code/pipeline_dyson.R, output/figures/placements_dyson_by_year.png, output/findings_dyson.md.
  2. Commit message: “Session 7: pipeline (parse, classify, plot, summarize)”.
  3. Commit. Sync Changes. Refresh github.com.

Click findings_dyson.md on github.com. It renders as a small report. The numbers came from the data, not from the agent’s prose. That is the artifact.

Stretch: Cross-School Comparison

§ Tutorial: Stretch

If you finish early:

“Cowork, scrape Berkeley ARE PhD placements into data/placements_berkeley.csv using the same column contract. Run the same pipeline on it. Produce output/findings_comparison.md with category counts side by side for Cornell and Berkeley.”

Two things to watch:

  • Different page, different selectors. Berkeley does not anchor on the same heading. Cowork needs a different selector strategy.
  • Same pipeline, different inputs. The classifier and plot helper are reusable. Refactor them out into a helper file. That is the Session 8 move; doing it once today makes Monday feel familiar.

Three Sessions, One Ladder

§ Tutorial: Debrief

The module is not about three tools. It is one ladder. Each rung adds one new dimension.

Rung                Tool         Task scope                 Pipeline depth                              Workflow style
Session 6           chat         one school, one script     scrape only                                 manual paste
Session 7 (today)   Cowork       one school, small project  scrape + parse + classify + plot + summary  interactive, you watch
Session 8 (Monday)  Claude Code  five schools, study        same pipeline shape                         delegated, you review the diff

Today added pipeline depth. Monday adds scope and delegation. Your Session 7 code is the seed of Session 8.

What Session 6 Forced, What Cowork Allows

The same lesson recurs every week: notice when a tool’s constraints were protecting you, and replace them with discipline when they fall away.

Behavior              Session 6 (chat)          Session 7 (Cowork)
Running the script    You had to                The agent can, so you have to choose to
Verifying the output  You had to                The agent reports; you have to insist
Saving an artifact    Only as a pasted script   As script or displayed table; Mode B is your rule
Editing files         You, one paste at a time  The agent, many files, possibly silently
Ending the session    Clean by default          Requires git diff to know what happened

The more capable the tool, the more of the verification reflex you own directly.

What’s Next

Next Monday (Session 8): Claude Code. Terminal-native, git-aware, scales to many files. Replaces the keyword classifier with an LLM, scrapes five schools at once, produces a write-up.

For Monday:

  1. Confirm two commits visible on github.com/<your-handle>/aem7010-ai.
  2. Install Claude Code (native installer, no Node.js). Free Claude account required.
     macOS/Linux:          curl -fsSL https://claude.ai/install.sh | bash
     Windows (PowerShell): irm https://claude.ai/install.ps1 | iex
     Verify with claude --version. Full guide: https://docs.claude.com/en/docs/claude-code/setup
  3. Have the aem7010-ai repo open in a terminal at the repo root before class.

The bottleneck is no longer typing. It is verification.

Companion site: arielortizbobea.github.io/aem7010