Guidance

code01.Introducemicrosoft-phi-cookbook

About Guidance

Guidance is a proven open-source Python library for controlling outputs of any language model (LM). With one API call, you can express (in Python) the precise programmatic constraint(s) that the model must follow and generate the structured output in JSON, Python, HTML, SQL, or any structure that the use case requires.

Guidance differs from conventional prompting techniques. It enforces constraints by steering the model token by token in the inference layer, producing higher quality outputs and reducing cost and latency by as much as 30–50% when utilizing for highly structured scenarios.

To learn more about Guidance, visit the public repository on GitHub or watch the Guidance Breakout Session at Microsoft Build.

Setup

  1. Install Guidance with pip install guidance --pre
  2. Deploy a Phi 3.5 mini endpoint in Azure by going to https://ai.azure.com/explore/models/Phi-3.5-mini-instruct/version/2/registry/azureml and clicking the "Deploy" button
  3. Store your endpoint's API key in an environment variable called AZURE_PHI3_KEY and the URL in an environment variable called AZURE_PHI3_URL
[ ]

Unconstrained generation

Text can be generated without any constraints using the gen() function. This is the same as using the model without Guidance.

Chat Formatting

Like many chat models, Phi-3 expects messages between a user and assistant in a specific format. Guidance supports Phi-3's chat template and will manage chat formatting for you. To create chat turns, put each portion of the conversation in a with user() or with assistant() block. A with system() block can be used to set the system message.

[22]

Token savings

In highly structured scenarios, Guidance can skip tokens and generate only necessary tokens, improving performance, increasing efficiency and saving API costs. Generated tokens are shown in this notebook with a highlighted background. Forced tokens are shown without highlighting and cost the same as input tokens, which are estimated at one third the cost of output tokens.

Note: The first example with unconstrained generation was not able to force any tokens because we provided no constraints.

Speaking for Phi 3

With Guidance, you can easily inject text into the model's responses. This can be helpful if you want to guide the model's output in a specific direction.

[5]

The capital of Australia is is not highlighted because that portion of the assistant's response was forced by Guidance.

Constraining with regex

In the previous example, Phi 3 responded with follow-up explanations after answering the question with Canberra. In order to constrain the model's output to exactly one word, a regex can be used.

[6]

With the regex, only the word Canberra is generated.

Selecting from multiple choices

When some possible choices are known, you can use the select() function to have the model choose from a list of options.

[23]

With select(), only the token Can was generated. Because Canberra is the only option that can possibly complete the response, the remaining tokens were forced.

Chain of Thought

Chain of thought is a technique that can help improve the quality of the model's output by encouraging it to process a problem step by step. Typically, to reach a final answer, multiple prompt turns are necessary. First, instruct the model to think step by step. Then, the prompt the model again to provide the final answer. With standard chat inference APIs, this takes 2 API calls, and the model’s generated “chain of thought” gets charged twice – once as output tokens when the model generated it, and then again as input tokens for the second call. With Guidance, the entire multi-step process is processed and charged as part of a single API call, reducing cost and latency.

[8]
Final answer: 35

JSON Generation

Guidance can be used to guarantee generation of JSON compliant with a JSON schema or pydantic model, such as the user profile schema shown here.

[16]
[19]

HTML Generation

Guidance can also be used to generate code and follow the syntactical requirements in the programming language. In this section, we will create a small Guidance program for writing very simple HTML webpages.

We will break the webpage down into smaller sections, each with its own Guidance function. These are then combined in our final function to create an HTML webpage. We will then run this function against a Guidance-enabled model in Azure AI.

Note: This is not going to be a fully-featured HTML generator; the goal is to show how you can create structured output for your individual needs

We begin by importing what we require from Guidance:

[ ]

HTML webpages are highly structured, and we will 'force' those parts of the page using Guidance. When we explicitly require text from the model, we need to ensure it doesn't include anything which could be a tag - that is, we must exclude the '<' and '>' characters:

[ ]

We can then use this function to generate text within an arbitrary HTML tag:

[ ]

Now, let us create the page header. As part of this, we need to generate a page title:

[ ]

The body of the HTML page is going to be filled with headings and paragraphs. We can define a function to do each:

[ ]

Now, the function to define the body of the HTML itself. This uses select() with recurse=True to generate multiple headings and paragraphs:

[ ]

Next, we come to the function which generates the complete HTML page. We add the HTML start tag, then generate the header, then body, and then append the ending HTML tag:

[ ]

We provide a user-friendly wrapper, which will allow us to:

  • Set the temperature of the generation
  • Capture the generated page from the Model object
[ ]

We can provide a prompt to the model, and then request a generation:

[ ]

We can then write the output to a file:

[ ]