Automating Prompt Engineering: Scalable Instruction Optimization with TextGrad

18 Jun 2026

Table of Links

Abstract and 1. Introduction

2. TEXTGRAD: Optimizing AI systems by backpropagating text feedback

3. Results

3.1 Code optimization

3.2 Solution optimization by test-time training to improve problem solving

3.3 Prompt optimization for reasoning

3.4 Molecule optimization

3.5 Radiotherapy treatment plan optimization

4. Related work

5. Discussion, Acknowledgements, and References

A. TEXTGRAD Details

B. Optimizer Extensions

C. Code Optimization

D. Solution Optimization

E. Prompt Optimization

F. Molecule Optimization

G. Treatment Plan Optimization

E Prompt Optimization

E.1 Tasks

Below, we provide an example query for each of the tasks in the prompt optimization section.

For word sorting and object counting, we obtain the datasets from the BBH repository and randomly split the examples into 50 (training)/100 (validation)/100 (test) samples. For GSM8k, we use the splits provided in DSPy [10], which contain 200 (training)/300 (validation)/1319 (test) samples.
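The random 50/100/100 split described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `split_examples`, the fixed `seed`, and the use of Python's `random` module are assumptions made here for reproducibility.

```python
import random


def split_examples(examples, n_train=50, n_val=100, n_test=100, seed=0):
    """Shuffle a list of examples and partition it into train/validation/test.

    The default sizes mirror the 50/100/100 split described in the text.
    The seed is a hypothetical choice; the source does not state one.
    """
    needed = n_train + n_val + n_test
    if len(examples) < needed:
        raise ValueError(f"need at least {needed} examples, got {len(examples)}")

    rng = random.Random(seed)   # local RNG so global random state is untouched
    shuffled = list(examples)   # copy, so the caller's list is not mutated
    rng.shuffle(shuffled)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:needed]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_examples(list(range(250)))
    print(len(train), len(val), len(test))
```

Using a local `random.Random` instance keeps the three subsets disjoint and repeatable across runs without affecting other code that relies on the global random state.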

Evaluation: For object counting and GSM8k, we use the string-based exact match metric, which extracts the last numerical value in the answer and compares it to the ground-truth answer. For word sorting, we prompt gpt-4o to compare the ground-truth list to the response provided in the answer, through the following prompt:
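As an illustration of the string-based exact match metric used for object counting and GSM8k (this is not the gpt-4o prompt used for word sorting), here is a minimal sketch. The regular expression, the helper names `last_number` and `exact_match`, and the handling of thousands separators are assumptions; the source only states that the last numerical value in the answer is compared to the ground truth.

```python
import re

# Assumed number format: optional sign, digits with optional thousands
# separators, optional decimal part. The source does not specify a pattern.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last numerical value found in `text`, or None if absent."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma) before converting.
    return float(matches[-1].replace(",", ""))


def exact_match(response, ground_truth):
    """True if the last number in the response equals the ground-truth number."""
    predicted = last_number(response)
    target = last_number(str(ground_truth))
    return predicted is not None and target is not None and predicted == target


if __name__ == "__main__":
    print(exact_match("First 3, then 4, so the answer is 7.", "7"))
    print(exact_match("I could not find a number.", "7"))
```

One limitation of this sketch is that comma-separated digits without spaces (for example `1,2,3`) are read as a single number, so a stricter pattern may be needed for answers that list values that way.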

Authors:

(1) Mert Yuksekgonul, Co-first author from Department of Computer Science, Stanford University;

(2) Federico Bianchi, Co-first author from Department of Computer Science, Stanford University;

(3) Joseph Boen, Co-first author from Department of Biomedical Data Science, Stanford University;

(4) Sheng Liu, Co-first author from Department of Biomedical Data Science, Stanford University;

(5) Zhi Huang, Co-first author from Department of Biomedical Data Science, Stanford University;

(6) Carlos Guestrin, Department of Computer Science, Stanford University and Chan Zuckerberg Biohub;

(7) James Zou, Department of Computer Science, Stanford University, Department of Biomedical Data Science, Stanford University, and Chan Zuckerberg Biohub.

This paper is available on arXiv under a CC BY 4.0 license.