TextGrad Framework: The Future of Compound AI Optimization

14 Jun 2026

Table of Links

Abstract and 1. Introduction

2. TEXTGRAD: Optimizing AI systems by backpropagating text feedback

3. Results

3.1 Code optimization

3.2 Solution optimization by test-time training to improve problem solving

3.3 Prompt optimization for reasoning

3.4 Molecule optimization

3.5 Radiotherapy treatment plan optimization

4. Related work

5. Discussion, Acknowledgements, and References

A. TEXTGRAD Details

B. Optimizer Extensions

C. Code Optimization

D. Solution Optimization

E. Prompt Optimization

F. Molecule Optimization

G. Treatment Plan Optimization

5 Discussion

TEXTGRAD is built on three key principles: (i) it is a general and performant framework that is not handcrafted for a specific application domain; (ii) it is easy to use, mirroring PyTorch abstractions and thus allowing knowledge transfer; and (iii) it is fully open-source. Through TEXTGRAD, we obtained state-of-the-art results in code optimization and PhD-level question answering, optimized prompts, and provided proof-of-concept results in scientific applications such as designing molecules and optimizing treatment plans.
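
The PyTorch analogy in principle (ii) can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `Variable`, `TextualSGD`, `llm_forward`, and the three rule-based stub engines stand in for real LLM calls and are not the API of the TEXTGRAD package. What carries over is the pattern. A forward pass builds a graph of text variables. An evaluator's critique plays the role of the loss. `backward()` turns that critique into textual feedback ("gradients") on upstream variables, and `step()` rewrites the parameters that require gradients.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an LLM: takes a prompt string, returns a text response.
Engine = Callable[[str], str]


@dataclass
class Variable:
    """A text-valued node in the computation graph (illustrative, not the textgrad API)."""
    value: str
    role: str
    requires_grad: bool = False
    parents: List["Variable"] = field(default_factory=list)
    grads: List[str] = field(default_factory=list)  # accumulated textual feedback

    def backward(self, engine: Engine, feedback: str = "") -> None:
        # Reverse pass: ask the engine how each parent should change, given the
        # feedback this node received (the root uses its own value, i.e. the critique).
        # Plain recursion suffices for this chain; a general DAG needs a topological order.
        for parent in self.parents:
            grad = engine(f"How should the {parent.role} change, given: {feedback or self.value}")
            if parent.requires_grad:
                parent.grads.append(grad)
            parent.backward(engine, grad)


def llm_forward(engine: Engine, inp: Variable, role: str) -> Variable:
    # Forward op: call the engine and record the input as the output's parent.
    return Variable(engine(inp.value), role=role, parents=[inp])


class TextualSGD:
    """Illustrative optimizer: rewrites each parameter in light of its textual gradients."""

    def __init__(self, params: List[Variable], engine: Engine):
        self.params, self.engine = params, engine

    def zero_grad(self) -> None:
        for p in self.params:
            p.grads.clear()

    def step(self) -> None:
        for p in self.params:
            p.value = self.engine(f"Rewrite: {p.value}\nFeedback: {' | '.join(p.grads)}")


# --- Rule-based stubs so the sketch runs without any model access ---
def forward_model(system_prompt: str) -> str:
    # Shows its reasoning only when the system prompt asks for it.
    return "Step 1: 6 * 7 = 42. Answer: 42" if "step by step" in system_prompt else "Answer: 42"


def judge(response: str) -> str:
    # Plays the role of a textual loss: a critique of the response.
    return "No issues." if "Step" in response else "The response gives no reasoning steps."


def feedback_engine(prompt: str) -> str:
    # Serves as both the backward engine and the optimizer's rewriting engine.
    if prompt.startswith("Rewrite: "):
        current, feedback = prompt[len("Rewrite: "):].split("\nFeedback: ", 1)
        return current + " Think step by step." if "step by step" in feedback else current
    if "No issues" in prompt or "No change" in prompt:
        return "No change needed."
    return "Ask the model to reason step by step."


system_prompt = Variable("You are a helpful assistant.", role="system prompt", requires_grad=True)
optimizer = TextualSGD([system_prompt], feedback_engine)

for _ in range(3):
    optimizer.zero_grad()
    response = llm_forward(forward_model, system_prompt, role="model response")
    loss = llm_forward(judge, response, role="evaluation")
    if loss.value == "No issues.":
        break
    loss.backward(feedback_engine)
    optimizer.step()

print(system_prompt.value)  # You are a helpful assistant. Think step by step.
```

Replacing the stubs with actual LLM calls would leave the training loop untouched, which is the point of reusing the familiar `zero_grad()` / `backward()` / `step()` structure.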

While this work is a first step, several limitations motivate future work toward realizing the potential of automatic differentiation frameworks powered by LLMs. First, although we demonstrated the potential of backpropagating text feedback, there are many more applications to which our framework can be extended. We hope TEXTGRAD can be used to accelerate iterative processes in scientific discovery and to increase the productivity of engineering efforts. To enable this, we hope to extend the operations in our computation graphs to cover more components used in practical LLM applications, such as tool use [83] or retrieval-augmented generation systems [93]. Second, the automatic differentiation analogy opens up a large design space for algorithms. We believe there are many fruitful connections to be drawn between numerical optimization, automatic differentiation, and TEXTGRAD. In particular, variance reduction techniques [94], adaptive gradients [95], and self-verification using LLMs [96] are promising ways to increase the stability of the optimization. Meta-learning approaches [97–99] that optimize the TEXTGRAD framework using methods such as TEXTGRAD itself are another intriguing direction for future work.
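
To make the variance-reduction idea more concrete, here is a hypothetical sketch offered only as an analogy; it is not part of TEXTGRAD. Averaging several stochastic gradient estimates reduces their variance, so one could likewise sample several textual critiques of the same output and keep only the feedback that a majority of samples agree on. `noisy_critic` is a deterministic stub that imitates sampling variability by attaching a different spurious remark to each call; `aggregate_feedback` and its `threshold` parameter are illustrative names.

```python
from collections import Counter
from typing import Callable, List, Set

# Hypothetical critic: (text, sample index) -> set of feedback items.
Critic = Callable[[str, int], Set[str]]

SPURIOUS = ["shorten the answer", "use bullet points", "add more detail", "change the tone"]


def noisy_critic(text: str, sample: int) -> Set[str]:
    # Deterministic stub (ignores `text`): one consistent signal plus a remark that
    # changes from sample to sample, imitating run-to-run variability of LLM feedback.
    return {"add intermediate reasoning steps", SPURIOUS[sample % len(SPURIOUS)]}


def aggregate_feedback(text: str, critic: Critic, k: int = 5, threshold: float = 0.5) -> List[str]:
    # Draw k critiques and keep items raised in more than `threshold` of them,
    # a discrete analogue of averaging k noisy gradient estimates.
    counts: Counter = Counter()
    for sample in range(k):
        counts.update(critic(text, sample))
    return sorted(item for item, n in counts.items() if n / k > threshold)


print(aggregate_feedback("Answer: 42", noisy_critic))
# ['add intermediate reasoning steps']
```

With `k=1` every remark survives, including the spurious one; raising `k` trades extra LLM calls for more stable feedback, much like using larger minibatches in stochastic gradient descent.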

Finally, while we conducted proof-of-concept applications of TEXTGRAD to design new molecules and treatment plans with in silico validation, the ultimate test requires experimental and clinical assessment, which is outside the scope of this paper.

As the paradigm of AI shifts from training individual models to optimizing compound systems of multiple interacting LLM components and tools, we need a new generation of automated optimizers. TEXTGRAD combines the reasoning power of LLMs with the decomposable efficiency of backpropagation to create a general framework for optimizing AI systems.

Acknowledgements

We would like to thank Duygu Yilmaz, Begum Ergun, Fatih Dinc, Yu Sun, Omar Khattab, Ian Covert, Kyle Swanson, Omer Faruk Akgun, Yusuf Efe, Kevin Y Wu, Eric Wu, Kailas Vodrahalli, Oscar Pastor Serrano, Patrick John Chia, Jacopo Tagliabue, Nitya Thakkar, Elana Simon, Pan Lu, Sabri Eyuboglu, Irena Gao, Lingjiao Chen, and members of the Zou Group for their support and comments on this work.

References

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901 (2020).
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023).
Reid, M., Savinov, N., Teplyashin, D., Lepikhin, D., Lillicrap, T., Alayrac, J.-B., Soricut, R., Lazaridou, A., Firat, O., Schrittwieser, J., et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530 (2024).
AI@Meta. Llama 3 Model Card. https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md (2024).
Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Claude-3 Model Card (2024).
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).
Trinh, T. H., Wu, Y., Le, Q. V., He, H. & Luong, T. Solving olympiad geometry without human demonstrations. Nature 625, 476–482 (2024).
Li, Y., Choi, D., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., Eccles, T., Keeling, J., Gimeno, F., Dal Lago, A., et al. Competition-level code generation with AlphaCode. Science 378, 1092–1097 (2022).
Yang, J., Jimenez, C. E., Wettig, A., Lieret, K., Yao, S., Narasimhan, K. & Press, O. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering 2024.
Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., A, S. V., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M. & Potts, C. DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=sY5N0zY5Od.
Zaharia, M., Khattab, O., Chen, L., Davis, J. Q., Miller, H., Potts, C., Zou, J., Carbin, M., Frankle, J., Rao, N. & Ghodsi, A. The Shift from Models to Compound AI Systems https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/. 2024.
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H. & Ba, J. Large Language Models are Human-Level Prompt Engineers in The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=92gvk82DE-.
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Fawzi, A., Balog, M., Huang, A., Hubert, T., Romera-Paredes, B., Barekatain, M., Novikov, A., R Ruiz, F. J., Schrittwieser, J., Swirszcz, G., et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).
Mankowitz, D. J., Michi, A., Zhernov, A., Gelmi, M., Selvi, M., Paduraru, C., Leurent, E., Iqbal, S., Lespiau, J.-B., Ahern, A., et al. Faster sorting algorithms discovered using deep reinforcement learning. Nature 618, 257–263 (2023).
Merchant, A., Batzner, S., Schoenholz, S. S., Aykol, M., Cheon, G. & Cubuk, E. D. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
Goodfellow, I., Bengio, Y. & Courville, A. Deep learning (MIT press, 2016).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S. & Darrell, T. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014).
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D. & Bengio, Y. Theano: A CPU and GPU Math Expression Compiler in Proceedings of the Python for Scientific Computing Conference (SciPy) (2010).
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. TensorFlow: A System for Large-Scale Machine Learning in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (2016), 265–283.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. PyTorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019).
Collobert, R., Bengio, S. & Mariéthoz, J. Torch: a modular machine learning software library (2002).
Pryzant, R., Iter, D., Li, J., Lee, Y., Zhu, C. & Zeng, M. Automatic Prompt Optimization with “Gradient Descent” and Beam Search in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (eds Bouamor, H., Pino, J. & Bali, K.) (Association for Computational Linguistics, Singapore, Dec. 2023), 7957–7968. https://aclanthology.org/2023.emnlp-main.494.
Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K. & Yao, S. Reflexion: language agents with verbal reinforcement learning in Advances in Neural Information Processing Systems 36 (2023). https://proceedings.neurips.cc/paper_files/paper/2023/file/1b44b878bb782e6954cd888628510e90-Paper-Conference.pdf.
Rein, D., Hou, B. L., Stickland, A. C., Petty, J., Pang, R. Y., Dirani, J., Michael, J. & Bowman, S. R. GPQA: A graduate-level Google-proof Q&A benchmark. arXiv preprint arXiv:2311.12022 (2023).
Li, X., Zhang, T., Dubois, Y., Taori, R., Gulrajani, I., Guestrin, C., Liang, P. & Hashimoto, T. B. AlpacaEval: An automatic evaluator of instruction-following models 2023.
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862 (2022).
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems 36 (2024).
Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., Radford, A., Amodei, D. & Christiano, P. F. Learning to summarize with human feedback. Advances in Neural Information Processing Systems 33, 3008–3021 (2020).
Yuan, W., Pang, R. Y., Cho, K., Sukhbaatar, S., Xu, J. & Weston, J. Self-rewarding language models. arXiv preprint arXiv:2401.10020 (2024).
Dubois, Y., Li, C. X., Taori, R., Zhang, T., Gulrajani, I., Ba, J., Guestrin, C., Liang, P. S. & Hashimoto, T. B. AlpacaFarm: A simulation framework for methods that learn from human feedback. Advances in Neural Information Processing Systems 36 (2024).
Bottou, L. Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT’2010, 177–186 (2010).
Boyd, S. & Vandenberghe, L. Convex optimization (Cambridge university press, 2004).
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35, 27730–27744 (2022).
Wei, J., Bosma, M., Zhao, V., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M. & Le, Q. V. Finetuned Language Models are Zero-Shot Learners in International Conference on Learning Representations (2022). https://openreview.net/forum?id=gEZrGCozdqR.
Yuksekgonul, M., Chandrasekaran, V., Jones, E., Gunasekar, S., Naik, R., Palangi, H., Kamar, E. & Nushi, B. Attention Satisfies: A Constraint Satisfaction Lens on Factual Errors of Language Models in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=gfFVATffPd.
Abdin, M. I., Gunasekar, S., Chandrasekaran, V., Li, J., Yuksekgonul, M., Peshawaria, R. G., Naik, R. & Nushi, B. KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=b3kDP3IytM.
Polyak, B. T. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4, 1–17 (1964).
Sutskever, I., Martens, J., Dahl, G. & Hinton, G. On the importance of initialization and momentum in deep learning in International conference on machine learning (2013), 1139–1147.
Sun, Y., Wang, X., Liu, Z., Miller, J., Efros, A. & Hardt, M. Test-Time Training with Self-Supervision for Generalization under Distribution Shifts in Proceedings of the 37th International Conference on Machine Learning (PMLR, 2020). https://proceedings.mlr.press/v119/sun20b.html.
Sun, Y., Li, X., Dalal, K., Hsu, C., Koyejo, S., Guestrin, C., Wang, X., Hashimoto, T. & Chen, X. Learning to (learn at test time). arXiv preprint arXiv:2310.13807 (2023).
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. R. & Cao, Y. ReAct: Synergizing Reasoning and Acting in Language Models in The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=WE_vluYUL-X.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. & Steinhardt, J. Measuring Massive Multitask Language Understanding in International Conference on Learning Representations (2021). https://openreview.net/forum?id=d7KBjmI3GmQ.
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. Advances in neural information processing systems 35, 22199–22213 (2022).
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35, 24824–24837 (2022).
OpenAI. Hello GPT-4o Accessed: 2024-05-18. 2024. https://openai.com/index/hello-gpt-4o/.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H. & Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys 55, 1–35 (2023).
Suzgun, M., Scales, N., Schärli, N., Gehrmann, S., Tay, Y., Chung, H. W., Chowdhery, A., Le, Q., Chi, E., Zhou, D. & Wei, J. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them in Findings of the Association for Computational Linguistics: ACL 2023 (Association for Computational Linguistics, Toronto, Canada, July 2023). https://aclanthology.org/2023.findings-acl.824.
Srivastava, A. et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research. ISSN: 2835-8856. https://openreview.net/forum?id=uyTL5Bvosj (2023).
Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C. & Schulman, J. Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:2110.14168 (2021).
Nicolaou, C. A. & Brown, N. Multi-objective optimization methods in drug design. Drug Discovery Today: Technologies 10, e427–e435 (2013).
Hoelder, S., Clarke, P. A. & Workman, P. Discovery of small molecule cancer drugs: successes, challenges and opportunities. Molecular oncology 6, 155–176 (2012).
Kontoyianni, M. Docking and virtual screening in drug discovery. Proteomics for drug discovery: Methods and protocols, 255–266 (2017).
Agarwal, S. & Mehrotra, R. An overview of molecular docking. JSM chem 4, 1024–1028 (2016).
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry 31, 455–461 (2010).
Ursu, O., Rayan, A., Goldblum, A. & Oprea, T. I. Understanding drug-likeness. Wiley Interdisciplinary Reviews: Computational Molecular Science 1, 760–781 (2011).
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nature chemistry 4, 90–98 (2012).
Bender, B. J., Gahbauer, S., Luttens, A., Lyu, J., Webb, C. M., Stein, R. M., Fink, E. A., Balius, T. E., Carlsson, J., Irwin, J. J., et al. A practical guide to large-scale docking. Nature protocols 16, 4799–4832 (2021).
García-Ortegón, M., Simm, G. N., Tripp, A. J., Hernández-Lobato, J. M., Bender, A. & Bacallado, S. DOCKSTRING: easy molecular docking yields better benchmarks for ligand design. Journal of chemical information and modeling 62, 3486–3502 (2022).
Khan, F. M., Sperduto, P. W. & Gibbons, J. P. Khan’s Treatment Planning in Radiation Oncology (Lippincott Williams & Wilkins, 2021).
Webb, S. The physical basis of IMRT and inverse planning. The British journal of radiology 76, 678–689 (2003).
Hussein, M., Heijmen, B. J. M., Verellen, D. & Nisbet, A. Automation in Intensity Modulated Radiotherapy Treatment Planning—a Review of Recent Innovations. British Journal of Radiology 91, 20180270. ISSN: 0007-1285 (Dec. 2018).
Wieser, H.-P., Cisternas, E., Wahl, N., Ulrich, S., Stadler, A., Mescher, H., Müller, L.-R., Klinge, T., Gabrys, H., Burigo, L., et al. Development of the open-source dose calculation and optimization toolkit matRad. Medical Physics 44, 2556–2568 (2017).
Nori, H., Lee, Y. T., Zhang, S., Carignan, D., Edgar, R., Fusi, N., King, N., Larson, J., Li, Y., Liu, W., et al. Can generalist foundation models outcompete special-purpose tuning? case study in medicine. arXiv preprint arXiv:2311.16452 (2023).
Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E. & Singh, S. AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics, Online, Nov. 2020), 4222–4235. https://aclanthology.org/2020.emnlp-main.346.
Jia, M., Tang, L., Chen, B.-C., Cardie, C., Belongie, S., Hariharan, B. & Lim, S.-N. Visual prompt tuning in European Conference on Computer Vision (2022), 709–727.
Li, X. L. & Liang, P. Prefix-Tuning: Optimizing Continuous Prompts for Generation in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (Association for Computational Linguistics, Online, Aug. 2021), 4582–4597. https://aclanthology.org/2021.acl-long.353.
Chen, X., Zhang, N., Xie, X., Deng, S., Yao, Y., Tan, C., Huang, F., Si, L. & Chen, H. KnowPrompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction in Proceedings of the ACM Web conference 2022 (2022), 2778–2788.
Ye, Q., Axmed, M., Pryzant, R. & Khani, F. Prompt engineering a prompt engineer. arXiv preprint arXiv:2311.05661 (2023).
Khattab, O., Santhanam, K., Li, X. L., Hall, D., Liang, P., Potts, C. & Zaharia, M. Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP. arXiv preprint arXiv:2212.14024 (2022).
Singhvi, A., Shetty, M., Tan, S., Potts, C., Sen, K., Zaharia, M. & Khattab, O. DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines. arXiv preprint arXiv:2312.13382 (2023).
Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D. & Chen, X. Large Language Models as Optimizers in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=Bb4VGOWELI.
Song, X., Tian, Y., Lange, R. T., Lee, C., Tang, Y. & Chen, Y. Position: Leverage Foundational Models for Black-Box Optimization 2024. arXiv: 2405.03547 [cs.LG].
Liu, T., Astorga, N., Seedat, N. & van der Schaar, M. Large Language Models to Enhance Bayesian Optimization in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=OOxotBmGol.
Wang, R., Zelikman, E., Poesia, G., Pu, Y., Haber, N. & Goodman, N. Hypothesis Search: Inductive Reasoning with Language Models in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=G7UtIGQmjm.
Gao, L., Dai, Z., Pasupat, P., Chen, A., Chaganty, A. T., Fan, Y., Zhao, V., Lao, N., Lee, H., Juan, D.-C. & Guu, K. RARR: Researching and Revising What Language Models Say, Using Language Models in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Association for Computational Linguistics, Toronto, Canada, July 2023), 16477–16508. https://aclanthology.org/2023.acl-long.910.
Chen, X., Lin, M., Schärli, N. & Zhou, D. Teaching Large Language Models to Self-Debug in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=KuPixIqPiq.
Shypula, A. G., Madaan, A., Zeng, Y., Alon, U., Gardner, J. R., Yang, Y., Hashemi, M., Neubig, G., Ranganathan, P., Bastani, O. & Yazdanbakhsh, A. Learning Performance-Improving Code Edits in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=ix7rLVHXyY.
Chase, H. LangChain Oct. 17, 2022. https://github.com/langchain-ai/langchain.
Liu, J. LlamaIndex https://github.com/jerryjliu/llamaindex. 2022.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N. & Scialom, T. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36 (2024).
Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N. & Lewis, M. Measuring and Narrowing the Compositionality Gap in Language Models in Findings of the Association for Computational Linguistics: EMNLP 2023 (Association for Computational Linguistics, Singapore, Dec. 2023), 5687–5711. https://aclanthology.org/2023.findings-emnlp.378.
Zelikman, E., Wu, Y., Mu, J. & Goodman, N. STaR: Bootstrapping Reasoning With Reasoning in Advances in Neural Information Processing Systems (eds Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K. & Oh, A.) 35 (Curran Associates, Inc., 2022), 15476–15488. https://proceedings.neurips.cc/paper_files/paper/2022/file/639a9a172c044fbb64175b5fad42e9a5-Paper-Conference.pdf.
Huang, J., Gu, S., Hou, L., Wu, Y., Wang, X., Yu, H. & Han, J. Large Language Models Can Self-Improve in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (eds Bouamor, H., Pino, J. & Bali, K.) (Association for Computational Linguistics, Singapore, Dec. 2023). https://aclanthology.org/2023.emnlp-main.67.
Yang, K., Tian, Y., Peng, N. & Klein, D. Re3: Generating Longer Stories With Recursive Reprompting and Revision in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics, Dec. 2022). https://aclanthology.org/2022.emnlp-main.296.
Xie, Y., Kawaguchi, K., Zhao, Y., Zhao, X., Kan, M.-Y., He, J. & Xie, Q. Self-Evaluation Guided Beam Search for Reasoning in Thirty-seventh Conference on Neural Information Processing Systems (2023). https://openreview.net/forum?id=Bw82hwg5Q3.
Paul, D., Ismayilzada, M., Peyrard, M., Borges, B., Bosselut, A., West, R. & Faltings, B. REFINER: Reasoning Feedback on Intermediate Representations in Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) (Association for Computational Linguistics, Mar. 2024). https://aclanthology.org/2024.eacl-long.67.
Zhao, A., Huang, D., Xu, Q., Lin, M., Liu, Y.-J. & Huang, G. ExpeL: LLM agents are experiential learners in Proceedings of the AAAI Conference on Artificial Intelligence 38 (2024), 19632–19642.
Le, H., Chen, H., Saha, A., Gokul, A., Sahoo, D. & Joty, S. CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=vYhglxSj8j.
Robbins, H. & Monro, S. A stochastic approximation method. The Annals of Mathematical Statistics, 400–407 (1951).
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, 9459–9474 (2020).
Robert, C. P. & Casella, G. Simulation and the Monte Carlo method. Springer Texts in Statistics, New York: Springer 2 (1999).
Kingma, D. & Ba, J. Adam: A Method for Stochastic Optimization in International Conference on Learning Representations (ICLR) (San Diego, CA, USA, 2015).
Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I. & Cobbe, K. Let’s Verify Step by Step in The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=v8L0pN6EOi.
Schmidhuber, J. Evolutionary Principles in Self-Referential Learning PhD thesis (Technical University of Munich, 1987).
Thrun, S. & Pratt, L. in Learning to Learn 3–17 (Springer, 1998).
Finn, C., Abbeel, P. & Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks in Proceedings of the 34th International Conference on Machine Learning (2017).
RDKit: Open-source cheminformatics http://www.rdkit.org.
Hughes, J. P., Rees, S., Kalindjian, S. B. & Philpott, K. L. Principles of early drug discovery. British journal of pharmacology 162, 1239–1249 (2011).
Wenlock, M. C., Austin, R. P., Barton, P., Davis, A. M. & Leeson, P. D. A comparison of physiochemical property profiles of development and marketed oral drugs. Journal of medicinal chemistry 46, 1250–1256 (2003).
Knox, C., Wilson, M., Klinger, C. M., Franklin, M., Oler, E., Wilson, A., Pon, A., Cox, J., Chin, N. E., Strawbridge, S. A., et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Research 52, D1265–D1275 (2024).
Berry, M., Fielding, B. & Gamieldien, J. Practical considerations in virtual screening and molecular docking. Emerging trends in computational biology, bioinformatics, and systems biology, 487 (2015).
Birhane, A., Kasirzadeh, A., Leslie, D. & Wachter, S. Science in the age of large language models. Nature Reviews Physics 5, 277–280 (2023).
Nikolova, N. & Jaworska, J. Approaches to measure chemical similarity–a review. QSAR & Combinatorial Science 22, 1006–1026 (2003).
Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M., Hersey, A., Light, Y., McGlinchey, S., Michalovich, D., Al-Lazikani, B., et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic acids research 40, D1100–D1107 (2012).
Swanson, K., Walther, P., Leitz, J., Mukherjee, S., Wu, J. C., Shivnaraine, R. V. & Zou, J. ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries. bioRxiv, 2023–12 (2023).
Xing, L., Li, J. G., Donaldson, S., Le, Q. T. & Boyer, A. L. Optimization of Importance Factors in Inverse Planning. Physics in Medicine & Biology 44, 2525. ISSN: 0031-9155 (Oct. 1999).

Authors:

(1) Mert Yuksekgonul, Co-first author from Department of Computer Science, Stanford University;

(2) Federico Bianchi, Co-first author from Department of Computer Science, Stanford University;

(3) Joseph Boen, Co-first author from Department of Biomedical Data Science, Stanford University;

(4) Sheng Liu, Co-first author from Department of Biomedical Data Science, Stanford University;

(5) Zhi Huang, Co-first author from Department of Biomedical Data Science, Stanford University;

(6) Carlos Guestrin, Department of Computer Science, Stanford University and Chan Zuckerberg Biohub;

(7) James Zou, Department of Computer Science, Stanford University, Department of Biomedical Data Science, Stanford University, and Chan Zuckerberg Biohub.

This paper is available on arXiv under CC BY 4.0 license.
