Table of Links
-
TEXTGRAD: Optimizing AI systems by backpropagating text feedback
-
Results
3.2 Solution optimization by test-time training to improve problem solving
G. Treatment Plan Optimization
D Solution Optimization
D.1 Methodology
For the CoT 0-shot prediction, we use the question template and system prompt released with GPT-4o in the simple-evals repository. In particular, to closely match their evaluations, we use the ChatGPT system prompt: You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. \n Knowledge cutoff: 2023-12 \n Current date: 2024-04-01" . Further, we use the following template:

During optimization, we provide the constraint to the optimizer that the prediction should conclude with an answer, following the simple-evals repository, through the following constraint description: The last line of your response should be of the following format: ’Answer: $LETTER’ (without quotes) where LETTER is one of ABCD..
Evaluation: Similarly, using the practice in the simple-evals repository, we perform string matching to find the final answer, which is one of the letters ABCD, and compare it to the ground truth answer. GPQA Diamond subset contains 198 questions. MMLU Machine Learning subset contains 112 questions, and College Physics subset contains 92 questions. At each iteration of optimization, we make 1 call to gpt-4o to evaluate the test time loss, 1 call to collect gradients, and 1 call to update the solution.
D.2 Prompts
The loss function for this task looks like the following:

Authors:
(1) Mert Yuksekgonul, Co-first author from Department of Computer Science, Stanford University ([email protected]);
(2) Federico Bianchi, Co-first author from Department of Computer Science, Stanford University ([email protected]);
(3) Joseph Boen, Co-first author from Department of Biomedical Data Science, Stanford University ([email protected]);
(4) Sheng Liu, Co-first author from Department of Biomedical Data Science, Stanford University ([email protected]);
(5) Zhi Huang, Co-first author from Department of Biomedical Data Science, Stanford University ([email protected]);
(6) Carlos Guestrin, Department of Computer Science, Stanford University and Chan Zuckerberg Biohub ([email protected]);
(7) James Zou, Department of Computer Science, Stanford University, Department of Biomedical Data Science, Stanford University, and Chan Zuckerberg Biohub ([email protected]).
This paper is
