IIRC, there's a way to "force" LLMs to output proper JSON by adding some logic to top-token selection, i.e. in the randomness function (which OpenAI calls temperature) you'd never choose a next token that results in broken JSON. The only way it could still fail is if the output exceeds the token limit. I wonder if OpenAI is doing something like this.
Note that you don’t necessarily need to have the AI output any JSON at all: simply have it answer when asked for the value of a specific JSON key, and assemble the JSON structure in your own, hallucination-free code: https://github.com/manuelkiessling/php-ai-tool-bridge
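A minimal sketch of that per-key idea, with a stub standing in for the actual LLM call (the questions, keys, and `ask_model` helper are all made up for illustration):

```python
import json

def ask_model(question: str) -> str:
    # Stub standing in for a real LLM/completion API call;
    # returns canned answers here so the sketch is runnable.
    canned = {
        "What is the recipe name?": "Pancakes",
        "How many servings?": "4",
    }
    return canned[question]

# One natural-language question per schema key.
schema_questions = {
    "name": "What is the recipe name?",
    "servings": "How many servings?",
}

# The model only ever produces bare values; the JSON structure
# itself comes from our code and so can never be malformed.
result = {key: ask_model(q) for key, q in schema_questions.items()}
print(json.dumps(result))  # {"name": "Pancakes", "servings": "4"}
```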
Would be nice if you could avoid a separate back-and-forth interaction for each key. This approach turns into lots of requests that each reapply the entire context, and it ends up slow. I wish I could just send a Microsoft Guidance template program and have it processed in a single pass.
For various reasons, token selection may be implemented as upweighting/downweighting rather than an outright ban on invalid tokens. (Maybe it helps training?) In that case the model could still generate malformed JSON. So I think it is premature to infer from "can generate malformed JSON" that OpenAI is not using restricted token selection.
> I assume OpenAI’s implementation works conceptually similar to jsonformer, where the token selection algorithm is changed from “choose the token with the highest logit” to “choose the token with the highest logit which is valid for the schema”.
But only for the whole generation. So if you want to constrain things one token at a time (as you would to force the output to follow a grammar), you have to make fresh calls that each request a single token, which makes things more or less impractical if you want true guarantees. A few months ago I built this anyway to suss out how much more expensive it was [1].
I think the problem is that tokens are not characters. So even if you had access to a JSON parser state that could tell you whether a given character is valid next, I am not sure how you would translate that into tokens in order to apply the logit biases appropriately. There would be a great deal of computation required at each step to scan the parser state and generate the list of allowed or prohibited tokens.
But if one could pull this off, it would be super cool. Similar to how Microsoft’s guidance module uses the logit_bias parameter to force the model to choose between a set of available options.
You simply sample tokens starting with the allowed characters and truncate if needed. It’s pretty efficient; there’s an implementation here: https://github.com/1rgs/jsonformer
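A toy sketch of that prefix-match-and-truncate trick, as I understand the jsonformer approach (the vocabulary and validity check are made up; the point is that multi-character tokens are filtered by their first character, then clipped at the first character the parser would reject):

```python
# Hypothetical vocabulary of multi-character tokens.
vocab = ['123,', '12', '1} ', 'abc', '"x']

def truncate(token, legal):
    """Longest prefix of token whose characters the validator accepts."""
    out = []
    for ch in token:
        if not legal(ch):
            break
        out.append(ch)
    return ''.join(out)

# Suppose the parser state says only digits are valid right now:
legal = str.isdigit
candidates = [t for t in vocab if legal(t[0])]  # tokens we may sample from
picked = candidates[0]        # pretend '123,' had the highest probability
emitted = truncate(picked, legal)

print(candidates)  # ['123,', '12', '1} ']
print(emitted)     # 123
```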
It's not temperature, but sampling. The output of an LLM is a probability distribution over tokens. To get concrete tokens, you sample from that distribution. Unfortunately, the OpenAI API does not expose the distribution; you only get the sampled tokens.
As an example, the linked JSON schema defines the recipe ingredient unit as one of grams/ml/cups/pieces/teaspoons. The LLM may output the distribution grams (30%), cups (30%), pounds (40%). Sampling the most likely token, "pounds", would produce an invalid document. Instead, you can use the schema to filter the tokens and sample from the filtered, renormalized distribution, which is grams (50%), cups (50%).
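Redoing those numbers as a sketch: filter the raw distribution through the schema's allowed values, then renormalize so the remaining probabilities sum to one.

```python
# Numbers from the example above; the model's raw next-token distribution.
raw = {"grams": 0.30, "cups": 0.30, "pounds": 0.40}
# Values the schema permits for the ingredient unit.
allowed = {"grams", "ml", "cups", "pieces", "teaspoons"}

# Drop schema-invalid tokens, then renormalize the survivors.
filtered = {tok: p for tok, p in raw.items() if tok in allowed}
total = sum(filtered.values())
renormalized = {tok: p / total for tok, p in filtered.items()}

print(renormalized)  # {'grams': 0.5, 'cups': 0.5}
```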
Not traditional temperature, maybe the parent worded it somewhat obtusely. Anyway, to disambiguate...
I think it works something like this: you let something akin to a JSON parser run alongside the output sampler. The first token must be either '{' or '['; then if you see '[' has the highest probability, you select that and ignore all other tokens, even those with high probability.
The second token must be ... and so on and so on.
That guarantees non-broken (or at least parseable) JSON.
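A minimal sketch of that sampler, with a stub in place of the model and a hand-written state machine standing in for the JSON parser (real implementations track full parser state; everything here is made up for illustration):

```python
import json

def fake_logits(prefix):
    # Made-up scores, independent of the prefix for simplicity. Note
    # that the grammar-invalid token 'x' deliberately scores highest;
    # the mask still keeps it out of the output.
    return {'[': 1.0, ']': 0.6, ',': 0.5, '1': 0.9, '2': 0.8, 'x': 2.0}

def allowed(prefix):
    """Tokens the toy grammar permits next: '[' digit (',' digit)* ']'."""
    if prefix == '':
        return {'['}
    last = prefix[-1]
    if last in '[,':
        return {'1', '2'}   # an element must follow '[' or ','
    if last in '12':
        return {',', ']'}
    return set()            # after ']' the document is complete

def generate(max_tokens=8):
    out = ''
    for _ in range(max_tokens):
        mask = allowed(out)
        if not mask:
            break
        scores = fake_logits(out)
        # Pick the highest-scoring token *among the allowed ones* only.
        out += max(mask, key=lambda t: scores[t])
    return out

result = generate()
print(result)              # [1]
print(json.loads(result))  # [1]
```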