As the adoption of Large Language Models (LLMs) has grown across applications, the challenges that come with them have grown too. One notable challenge is the need to structure LLM outputs for specific use cases. While LLMs excel at producing fluent, context-aware text, many practical applications demand more structured and predictable output formats.
The real challenge, however, is ensuring that LLMs adhere to these structured formats. While prompt engineering is one way to address this, GBNF solves the same problem through a different approach.
By defining strict structural rules, GBNF ensures consistency and predictability in responses. To understand how GBNF achieves this and its significance in the LLM ecosystem, let's delve deeper into what GBNF is and how it works.
GBNF (GGML Backus-Naur Form) is an extension of the traditional Backus-Naur Form, specifically designed for use with Large Language Models (LLMs). It provides a formal way to define grammars that constrain and structure the output of LLMs. By implementing GBNF, developers can guide LLMs to generate content in specific formats, such as JSON or custom data structures, while maintaining the model's ability to produce relevant and contextual responses. This tool bridges the gap between the free-form nature of LLM outputs and the structured data requirements of many practical applications.
Many real-world applications require LLM outputs in specific formats (e.g., JSON, XML) for seamless integration with existing systems or APIs. GBNF enables this without compromising the LLM's language understanding capabilities.
Now that we understand what GBNF is and why it's valuable, let's explore how it's implemented in practice. One notable framework that utilizes GBNF is llama.cpp, a lightweight, portable C/C++ implementation designed for efficient large language model inference across various hardware platforms, both locally and in the cloud.
To illustrate GBNF's practical application with llama.cpp, let's look at an example: structuring a person's details in JSON format. We'll start by defining a GBNF grammar that outlines the expected structure. Below is the GBNF file with its rules:
root ::= "{" ws pair (ws "," ws pair)* ws "}"
pair ::= string ws ":" ws value
string ::= "\"" ([a-zA-Z0-9_])+ "\""
value ::= string | number
number ::= [0-9]+
ws ::= [ \t\n]*
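For instance, this grammar accepts output such as { "name" : "Anna" , "age" : 42 }, where the ws rule permits the optional spaces around the braces, colon, and comma.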
While the GBNF file format is useful for defining grammars, some users have reported parsing issues when reading these files with llama.cpp. As an alternative, you can define the grammar inline as a string in your code, as shown below.
grammar_string = r"""root ::= "{" pair ("," pair)* "}"
pair ::= string ":" value
string ::= "\"" [a-zA-Z0-9_]+ "\""
value ::= string | number
number ::= [0-9]+"""
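Note that this inline version drops the ws rule, so no whitespace can appear between tokens in the generated JSON; you can see this reflected in the structured output shown later.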
If you have a JSON structure that you want to use as a template for your LLM's output, llama.cpp provides a Python script that can convert it into GBNF format. This script allows you to create GBNF grammars based on your JSON templates without manually writing the grammar rules.
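As a rough sketch of how that conversion could be wired up from Python, the snippet below shells out to the converter and feeds the result to llama-cpp-python. The script location (examples/json_schema_to_grammar.py), its command-line interface, and the schema file name are assumptions here; check your llama.cpp checkout before relying on them.
import subprocess
from llama_cpp import LlamaGrammar

# Assumed path and CLI: the converter reads a JSON Schema file and prints GBNF to stdout.
result = subprocess.run(
    ["python", "examples/json_schema_to_grammar.py", "person_schema.json"],
    capture_output=True, text=True, check=True,
)
generated_grammar = LlamaGrammar.from_string(result.stdout, verbose=True)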
The following code snippet demonstrates how to use a GBNF grammar with llama.cpp through its Python bindings (llama-cpp-python).
from llama_cpp import Llama, LlamaGrammar

# Load a GGUF model and compile the grammar string defined above
llm = Llama(model_path="./models/7B/llama-model.gguf")
my_grammar = LlamaGrammar.from_string(grammar_string, verbose=True)

# Pass the grammar to constrain generation
prompt = "Give me a person info"
output = llm(prompt, max_tokens=100, grammar=my_grammar)
output['choices'][0]['text']
About the GGUF Model Format: The code uses a GGUF (GGML Universal File Format) file, which is the model format supported by llama.cpp. GGUF files are designed for efficient storage of model weights and are optimized for inference. They support various quantization levels, allowing large language models to run efficiently on a wide range of hardware. In the code, the GGUF file is specified via the model_path parameter when initializing the Llama object.
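For completeness, here is a minimal sketch of loading a GGUF model with a couple of common llama-cpp-python options; the model path and values are placeholders, so adjust them for your own files and hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/7B/llama-model.gguf",  # any GGUF file, e.g. a quantized variant
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU if a GPU-enabled build is installed; 0 keeps everything on the CPU
)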
Without using the grammar string, the output is unstructured:
prompt="Give me a person info"
output = llm(prompt,max_tokens=100)
output['choices'][0]['text']
':\nAnna works with a Lab Centre and her age is 50 years and lives in Australia.\n Create a variable for each info.\n Use these variables to create the user.'
When we apply the GBNF grammar, the output is structured according to the defined rules: in this case, a JSON object.
prompt="Give me a person info"
output = llm(prompt,max_tokens=100,grammar=my_grammar)
output['choices'][0]['text']
'{"name":"Shayla","gender":"Female","age":"31","city":"Chicago","state":"IL"}'
Rule Format
A: The left-hand side of a production rule is always a single non-terminal.
::=: The assignment symbol separating the non-terminal from its definition.
expression: The right-hand side can be any sequence of terminals and non-terminals, including the empty string.
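For example, in root ::= "{" ws pair (ws "," ws pair)* ws "}", the single non-terminal root sits on the left, and the expression on the right mixes terminals such as "{" and "," with non-terminals such as ws and pair.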
The following diagram deconstructs the grammar defined above, breaking it into color-coded components for an at-a-glance view of its structure.
GBNF Grammar used in above script:
Here's how the grammar rules shaped this result:
root: Outlines the overall structure (e.g., { "key":"value" }).
pair: Describes key-value pairs with a colon separator.
value: Allows flexibility (string or number).
string / number: Specify valid formats for keys and values.
ws: Handles optional spaces for clean formatting.
These grammar rules effectively guided the LLM to generate a valid JSON object, maintaining the consistent key-value formatting defined in the grammar.
Rule: Defines how a part of the output is built. Example: root ::= "{" pair ("," pair)* "}"
Non-Terminal: A concept that can be expanded further, written in lowercase. Examples: root, pair, string
Terminal: Fixed characters or strings in quotes, appearing literally in the output. Examples: "{", ",", ":"
Expression: The right-hand side of a rule, explaining how to build the non-terminal. Example: in pair ::= string ":" value, the expression is string ":" value.
Alternatives (|): Offers options. Example: value ::= string | number
Repetition: * = zero or more (e.g., ("," pair)*), + = one or more, ? = optional (zero or one)
Grouping (()): Combines elements. Example: ("," pair)*
Character Sets ([]): Specifies allowed characters. Example: [a-zA-Z0-9_]
Root Rule: Defines the overall structure of the output. Example: root ::= "{" pair ("," pair)* "}"
Whitespace (ws): Allows optional spaces, tabs, or newlines. Example: ws ::= [ \t\n]*
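To see several of these constructs working together, here is a minimal, untested sketch that extends the earlier grammar: it adds a boolean alternative, an optional minus sign on numbers, and keeps the whitespace rule. The rule names are illustrative, not part of any standard library.
extended_grammar = r"""root    ::= "{" ws pair (ws "," ws pair)* ws "}"
pair    ::= string ws ":" ws value
value   ::= string | number | boolean
boolean ::= "true" | "false"
string  ::= "\"" [a-zA-Z0-9_]+ "\""
number  ::= "-"? [0-9]+
ws      ::= [ \t\n]*"""
This string can be compiled the same way as before, with LlamaGrammar.from_string(extended_grammar).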
After understanding the GBNF basics, practical implementation becomes key. For developers working with API responses, GBNF can help enforce a strict JSON structure directly at the grammar level when generating responses from Large Language Models (LLMs).
As already mentioned, traditional approaches such as prompting with "Please respond in JSON format" can be promising but often fall short, since LLMs may simply overlook such instructions. GBNF is another avenue for addressing this, one that deserves more exploration, and its integration with tools like llama.cpp showcases practical implementations of the approach.
Explore more on implementation: GitHub - llama.cpp grammars.