AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation

Abstract

Besides humans and machines, Artificial Intelligence (AI) models have emerged to be another important audience of programming languages, as we come to the era of large language models (LLMs). LLMs can now excel at coding competitions and even program like developers to address various tasks, such as math calculation. Yet, the grammar and layout of existing programs are designed for humans. Particularly, abundant grammar tokens and formatting tokens are included to make the code more readable to humans. While beneficial, such a human-centric design imposes an unnecessary computational burden on LLMs where each token, either consumed or generated, consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar, which aims to represent the code in a way that better suits the working mechanism of AI models. Code written with AI-oriented grammar discards formats and uses a minimum number of tokens to convey code semantics effectively. To demonstrate the feasibility of this concept, we explore and implement the first AI-oriented grammar for Python, named Simple Python (SimPy). SimPy is crafted by revising the original Python grammar through a series of heuristic rules. Programs written in SimPy maintain identical Abstract Syntax Tree (AST) structures to those in standard Python, allowing execution via a modified AST parser. In addition, we explore methods to enable existing LLMs to proficiently understand and use SimPy, and ensure the changes remain imperceptible for human developers. Compared with the original Python, SimPy not only reduces token usage by 13.5% and 10.4% for CodeLlama and GPT-4, respectively, but can also achieve equivalent, even improved, performance over the models trained on Python code.
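
The core observation, that formatting can be stripped without changing a program's AST, can be illustrated with a minimal sketch in standard Python. This is not SimPy and does not use its grammar or parser; the snippets `READABLE` and `COMPACT` and the helpers `same_ast` and `count_tokens` are hypothetical names introduced only for illustration. The sketch also counts Python lexer tokens as a rough stand-in, whereas the figures reported above are measured with the LLMs' own tokenizers, which additionally spend tokens on whitespace.

```python
import ast
import io
import tokenize

# A human-oriented layout: blank line, comment, four-space indentation.
READABLE = '''
def add(a, b):
    # Sum two numbers.
    total = a + b
    return total
'''

# The same program with formatting reduced to what the parser requires.
COMPACT = "def add(a,b):\n total=a+b\n return total\n"

# Lexer token types that carry layout or commentary rather than semantics.
FORMATTING = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.COMMENT,
}


def same_ast(src_a: str, src_b: str) -> bool:
    """Return True if both sources parse to structurally identical ASTs."""
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def count_tokens(src: str) -> tuple[int, int]:
    """Return (total lexer tokens, formatting tokens), excluding ENDMARKER."""
    total = layout = 0
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type == tokenize.ENDMARKER:
            continue
        total += 1
        if tok.type in FORMATTING:
            layout += 1
    return total, layout


if __name__ == "__main__":
    print("identical AST:", same_ast(READABLE, COMPACT))
    print("readable (total, formatting):", count_tokens(READABLE))
    print("compact  (total, formatting):", count_tokens(COMPACT))
```

Both versions parse to the same AST, while the readable one carries more formatting tokens. An AI-oriented grammar such as SimPy goes further than this sketch by also revising grammar tokens, which standard Python cannot drop.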
