How LLMs work, and the limits of chat
2026-04-27
§ Tutorial: Where we are in the course
Sessions 4 and 5 gave you Git. Sessions 6, 7, and 8 give you AI.
Three tools, three sessions, one framework. The discipline is the same across all three.
These sessions assume Sessions 4 and 5. Every exercise starts with git init and ends with git commit.
§ Tutorial: A mental model of LLMs
You will use these tools for the rest of your research life. You do not need the math. You need four facts.
Each fact has a consequence for research.
§ Tutorial: Fact 1
Your phone guesses the next word. A large language model does the same thing, at scale. Text in, next token out, loop until stop.
Research implication. Plausible continuations are useful for prose, common-pattern code, explanations. They are risky for numbers, citations, and statistical claims. Plausible is not the same as correct.
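Fact 1 in miniature. The sketch below is a toy, not an LLM: a hand-built table of next-token probabilities (all words and weights hypothetical, and in Python rather than the course’s R, purely for illustration). A real model computes these probabilities with a neural network over tens of thousands of tokens, but the generation loop is the same idea: sample a token, append it, repeat until a stop token.

```python
import random

random.seed(0)  # fixed seed so this toy run repeats exactly

# Hypothetical next-token table, hand-built for illustration only.
NEXT = {
    "<start>": [("the", 0.7), ("a", 0.3)],
    "the":     [("model", 0.6), ("data", 0.4)],
    "a":       [("token", 1.0)],
    "model":   [("guesses", 1.0)],
    "data":    [("arrive", 1.0)],
    "token":   [("<stop>", 1.0)],
    "guesses": [("tokens", 1.0)],
    "tokens":  [("<stop>", 1.0)],
    "arrive":  [("<stop>", 1.0)],
}

def generate():
    """Text in, next token out, loop until stop."""
    out, tok = [], "<start>"
    while tok != "<stop>":
        choices, weights = zip(*NEXT[tok])
        tok = random.choices(choices, weights=weights)[0]
        if tok != "<stop>":
            out.append(tok)
    return " ".join(out)

print(generate())
```

Remove the fixed seed and reruns diverge — which is exactly Fact 3.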
§ Tutorial: Fact 2
A chat interface is not a database. Asked for a citation, it generates a string that looks like a citation, token by token.
Rule of thumb. If an AI gives you a citation, verify it before you use it. Paste the title into Scholar. Check the DOI. Read the paper.
§ Tutorial: Fact 3
The model samples from a probability distribution. Temperature rescales that distribution: low temperature concentrates probability on the most likely tokens; high temperature spreads it out.
Where it is exposed:
Two students, same prompt, same tool, same minute → different outputs. We will see this live in class.
A chat transcript is not a replication artifact. You cannot rerun it. Even five minutes later. Even by yourself.
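Fact 3 fits in a few lines. Sampling typically draws from a softmax distribution, and temperature divides the scores before the softmax (the logits below are hypothetical, and Python stands in for the course’s R). Low temperature concentrates probability on the top token, high temperature flattens the distribution, and any temperature above zero leaves room for two students to get different outputs from the same prompt.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities. Dividing by temperature
    first sharpens (T < 1) or flattens (T > 1) the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```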
§ Tutorial: Fact 4
Everything the model “sees” lives in its context window: your prompts, its replies, any files pasted in.
Three implications. (1) Verify everything probabilistic. (2) Summarize long sessions and restart. (3) What you cannot verify, you cannot cite. The strictest rule of the module.
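Implication (2) can be made rough-and-ready. Everything in the session — prompts, replies, pasted files — competes for the same window. A crude sketch (the ~4 characters per token ratio is a common rule of thumb, not exact, and the window size here is an arbitrary example, not any particular model’s):

```python
def should_summarize(transcript, window_tokens=100_000, threshold=0.8):
    """Flag a session that is filling its context window.
    Uses a crude ~4 characters/token estimate; window_tokens is
    an arbitrary example size, not any specific model's."""
    estimated_tokens = len(transcript) // 4
    return estimated_tokens > threshold * window_tokens

print(should_summarize("short session"))  # a fresh chat: False
print(should_summarize("x" * 500_000))    # ~125k estimated tokens: True
```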
§ Tutorial: Chat as a category
Brand names rotate. Category persists: turn-based conversation, copy-paste workflow.
Good at:
Bad at:
§ Tutorial: Mode A vs Mode B
The single most important distinction in the module. Write it down.
Mode A: AI as runtime. “Do the thing.” Output: data. Reasoning: hidden.
vs
Mode B: AI as code author. “Write code that does the thing.” Output: script. Reasoning: visible.
Mode A is faster. Mode A is almost always wrong for research.
The reproducible artifact is always code. A chat transcript is not a method. An R script your co-author can run is.
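The distinction in miniature (hypothetical HTML, and Python rather than the course’s R, purely for illustration). Mode A would be pasting the page into a chat and copying out whatever table the model types back. Mode B is a script anyone can rerun against the saved page:

```python
import re

# Hypothetical saved page source; in the exercise this would be
# the department's real placements page, saved to disk.
html = """
<ul class="placements">
  <li>2024 - University of X</li>
  <li>2023 - Agency Y</li>
</ul>
"""

# Mode B: the extraction lives in code, so every step is visible
# and repeatable. (A real script would use a proper HTML parser.)
placements = re.findall(r"<li>(.*?)</li>", html)
print(placements)  # ['2024 - University of X', '2023 - Agency Y']
```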
§ Tutorial: Live demo
We do the Cornell Dyson PhD placements page on the projector. Three failure modes, any of which may surface:
The tool’s confidence is not evidence of correctness.
⟶ Switch to the tutorial for the live demo on the projector (~10 min). You will see one of these three happen in real time.
§ Tutorial: The verification reflex
Five checks, every time. The habit that recurs in every session.
.R file committed?
If any fails, do not commit until it passes.
§ Tutorial: Hands-on exercise
You each pick a different department’s placement page. Cornell Dyson is reserved for Wednesday.
Three rules:
.R script. Not pasted data.
⟶ Switch to the tutorial: Hands-on exercise (~20 min). Claim a department. Write the script. Run the verification reflex. Commit.
§ Tutorial: Gallery · Debrief
Two or three volunteers project their work (~5 min total). Focus:
The ceiling of chat: you pasted HTML in, you pasted scripts out. That friction cost time and introduced errors. The next two sessions lift that ceiling. Same discipline, more power.
Wednesday (Session 7): Cowork. Same model, new interface. The agent sees your files and runs code.
Before Wednesday, four things:
aem7010-ai — a dedicated repo for the AI module. Full setup block is in the tutorial’s “For Wednesday” section. Takes five minutes.
Full walkthrough with copy-paste commands on the companion site: arielortizbobea.github.io/aem7010