IIRC, there's a way to "force" LLMs to output proper JSON by adding some logic to top-token selection, i.e. in the randomness function (which OpenAI calls temperature) you'd never choose a next token that results in broken JSON. The only way it could still fail is if the output exceeds the token limit. I wonder if OpenAI is doing something like this.
Note that you don’t necessarily need to have the AI output any JSON at all: simply have it answer when asked for the value of a specific JSON key, and assemble the JSON structure in your own, hallucination-free code: https://github.com/manuelkiessling/php-ai-tool-bridge
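A minimal sketch of that per-key idea, with a stub standing in for the actual LLM call (the questions, keys, and `ask_model` helper are all made up for illustration):

```python
import json

def ask_model(question: str) -> str:
    # Stub standing in for a real LLM/completion API call;
    # returns canned answers here so the sketch is runnable.
    canned = {
        "What is the recipe name?": "Pancakes",
        "How many servings?": "4",
    }
    return canned[question]

# One natural-language question per schema key.
schema_questions = {
    "name": "What is the recipe name?",
    "servings": "How many servings?",
}

# The model only ever produces bare values; the JSON structure
# itself comes from our code and so can never be malformed.
result = {key: ask_model(q) for key, q in schema_questions.items()}
print(json.dumps(result))  # {"name": "Pancakes", "servings": "4"}
```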
Would be nice if you could avoid a separate back-and-forth interaction for each key. This approach turns into lots of requests that each reapply the entire context, and it ends up slow. I wish I could just send a Microsoft Guidance template program and have it processed in a single pass.
For various reasons, token selection may be implemented as upweighting/downweighting rather than an outright ban on invalid tokens. (Maybe it helps training?) In that case the model could still generate malformed JSON. So I think it is premature to infer from "can generate malformed JSON" that OpenAI is not using restricted token selection.
> I assume OpenAI’s implementation works conceptually similar to jsonformer, where the token selection algorithm is changed from “choose the token with the highest logit” to “choose the token with the highest logit which is valid for the schema”.
But only for the whole generation. So if you want to constrain things one token at a time (as you would to force the output to follow a grammar), you have to make fresh calls that each request a single token, which makes things more or less impractical if you want true guarantees. A few months ago I built this anyway to suss out how much more expensive it was [1].
I think the problem is that tokens are not characters. So even if you had access to a JSON parser state that could tell you whether a given character is valid next, I am not sure how you would translate that into tokens in order to apply the logit biases appropriately. There would be a great deal of computation required at each step to scan the parser state and generate the list of allowed or prohibited tokens.
But if one could pull this off, it would be super cool. Similar to how Microsoft’s guidance module uses the logit_bias parameter to force the model to choose between a set of available options.
You simply sample tokens starting with the allowed characters and truncate if needed. It’s pretty efficient; there’s an implementation here: https://github.com/1rgs/jsonformer
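A toy sketch of that prefix-match-and-truncate trick, as I understand the jsonformer approach (the vocabulary and validity check are made up; the point is that multi-character tokens are filtered by their first character, then clipped at the first character the parser would reject):

```python
# Hypothetical vocabulary of multi-character tokens.
vocab = ['123,', '12', '1} ', 'abc', '"x']

def truncate(token, legal):
    """Longest prefix of token whose characters the validator accepts."""
    out = []
    for ch in token:
        if not legal(ch):
            break
        out.append(ch)
    return ''.join(out)

# Suppose the parser state says only digits are valid right now:
legal = str.isdigit
candidates = [t for t in vocab if legal(t[0])]  # tokens we may sample from
picked = candidates[0]        # pretend '123,' had the highest probability
emitted = truncate(picked, legal)

print(candidates)  # ['123,', '12', '1} ']
print(emitted)     # 123
```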
It's not temperature, but sampling. The output of an LLM is a probability distribution over tokens. To get concrete tokens, you sample from that distribution. Unfortunately, the OpenAI API does not expose the distribution; you only get the sampled tokens.
As an example, the linked JSON schema defines the recipe ingredient unit as one of grams/ml/cups/pieces/teaspoons. The LLM may output the distribution grams (30%), cups (30%), pounds (40%). Sampling the most likely token, "pounds", would produce an invalid document. Instead, you can use the schema to filter the tokens and sample from the filtered, renormalized distribution, which is grams (50%), cups (50%).
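Redoing those numbers as a sketch: filter the raw distribution through the schema's allowed values, then renormalize so the remaining probabilities sum to one.

```python
# Numbers from the example above; the model's raw next-token distribution.
raw = {"grams": 0.30, "cups": 0.30, "pounds": 0.40}
# Values the schema permits for the ingredient unit.
allowed = {"grams", "ml", "cups", "pieces", "teaspoons"}

# Drop schema-invalid tokens, then renormalize the survivors.
filtered = {tok: p for tok, p in raw.items() if tok in allowed}
total = sum(filtered.values())
renormalized = {tok: p / total for tok, p in filtered.items()}

print(renormalized)  # {'grams': 0.5, 'cups': 0.5}
```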
Not traditional temperature, maybe the parent worded it somewhat obtusely. Anyway, to disambiguate...
I think it works something like this: you let something akin to a JSON parser run alongside the output sampler. The first token must be either '{' or '['; then if you see '[' has the highest probability, you select that and ignore all other tokens, even those with high probability.
The second token must be ... and so on and so on.
That guarantees non-broken (or at least parseable) JSON.
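A minimal sketch of that sampler, with a stub in place of the model and a hand-written state machine standing in for the JSON parser (real implementations track full parser state; everything here is made up for illustration):

```python
import json

def fake_logits(prefix):
    # Made-up scores, independent of the prefix for simplicity. Note
    # that the grammar-invalid token 'x' deliberately scores highest;
    # the mask still keeps it out of the output.
    return {'[': 1.0, ']': 0.6, ',': 0.5, '1': 0.9, '2': 0.8, 'x': 2.0}

def allowed(prefix):
    """Tokens the toy grammar permits next: '[' digit (',' digit)* ']'."""
    if prefix == '':
        return {'['}
    last = prefix[-1]
    if last in '[,':
        return {'1', '2'}   # an element must follow '[' or ','
    if last in '12':
        return {',', ']'}
    return set()            # after ']' the document is complete

def generate(max_tokens=8):
    out = ''
    for _ in range(max_tokens):
        mask = allowed(out)
        if not mask:
            break
        scores = fake_logits(out)
        # Pick the highest-scoring token *among the allowed ones* only.
        out += max(mask, key=lambda t: scores[t])
    return out

result = generate()
print(result)              # [1]
print(json.loads(result))  # [1]
```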