2 Structured Outputs And Jinja Expressions


🎨 NeMo Data Designer 101: Structured Outputs and Jinja Expressions

Note: To run this notebook, you must have the NeMo Data Designer microservice deployed locally via Docker Compose. See the deployment guide for more details.


In this notebook, we will continue our exploration of Data Designer, demonstrating more advanced data generation using structured outputs and Jinja expressions.

If this is your first time using Data Designer, we recommend starting with the first notebook in this 101 series.

💾 Install dependencies

IMPORTANT 👉 If you haven't already, follow the instructions in the README to install the necessary dependencies. Note that you may need to restart your kernel after setting up the environment.


⚙️ Initialize the NeMo Data Designer Client

  • The Data Designer client is responsible for submitting generation requests to the Data Designer microservice.

  • In this notebook, we connect to the managed Data Designer service. Alternatively, you can connect to your own Data Designer instance by following the instructions in the deployment guide.

  • If you have a Data Designer instance running locally, you can connect to it as follows:

    data_designer_client = DataDesignerClient(client=NeMoMicroservices(base_url="http://localhost:8080"))
    

🏗️ Initialize the Data Designer Config Builder

  • The Data Designer config defines the dataset schema and generation process.

  • The config builder provides an intuitive interface for building this configuration.

  • You must provide a list of model configs to the builder at initialization.

  • This list contains the models you can choose from (via the model_alias argument) during the generation process.

Note: The NeMo Data Designer managed service has access to a specific set of models. Visit https://build.nvidia.com/nemo/data-designer for the latest list of available models.
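To make the alias mechanism concrete, here is a minimal, SDK-independent sketch of the idea: the builder holds a list of model configs, and each generation step names one of them by alias. `ModelEntry` and `resolve_alias` are hypothetical helpers written only for illustration; they are not part of the Data Designer SDK, and the model names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical stand-in for one model config: an alias plus the model it points to."""
    alias: str
    model: str


def resolve_alias(model_configs: list[ModelEntry], model_alias: str) -> str:
    """Return the model behind `model_alias`, failing loudly if no config declares it."""
    by_alias = {cfg.alias: cfg.model for cfg in model_configs}
    if model_alias not in by_alias:
        raise KeyError(f"unknown model_alias {model_alias!r}; available: {sorted(by_alias)}")
    return by_alias[model_alias]


# Hypothetical list of model configs supplied up front (placeholder model names)
model_configs = [
    ModelEntry(alias="text", model="example-org/example-text-model"),
    ModelEntry(alias="code", model="example-org/example-code-model"),
]

print(resolve_alias(model_configs, "text"))  # example-org/example-text-model
```

Failing fast on an unknown alias is one reasonable design for this kind of lookup: a step can only reference models that were declared when the list was built.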


🧑‍🎨 Designing our data

  • We will again create a product review dataset, but this time we will use structured outputs and Jinja expressions.

  • Structured outputs let you specify the exact schema of the data you want to generate.

  • Data Designer supports schemas specified using either JSON Schema or Pydantic data models (recommended).


We'll define our structured outputs using Pydantic data models:

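Here is a minimal sketch of what such a model can look like, assuming Pydantic v2. The `ProductReview` class and its fields are illustrative and do not reproduce the notebook's actual schema. It shows that a single Pydantic model provides both a JSON Schema (via `model_json_schema()`) and a validator for raw JSON output (via `model_validate_json()`).

```python
from pydantic import BaseModel, Field


# Illustrative schema for one generated product review (field names are hypothetical)
class ProductReview(BaseModel):
    product_name: str = Field(..., description="Name of the product being reviewed")
    rating: int = Field(..., ge=1, le=5, description="Star rating from 1 to 5")
    review: str = Field(..., description="Free-text review body")


# JSON Schema derived from the model; ge/le become "minimum"/"maximum"
schema = ProductReview.model_json_schema()
print(sorted(schema["properties"]))  # ['product_name', 'rating', 'review']

# Validate a raw JSON string (e.g. a model response) against the same definition
raw = '{"product_name": "Trail Runner X", "rating": 4, "review": "Comfortable and light."}'
review = ProductReview.model_validate_json(raw)
print(review.rating)  # 4
```

Because constraints such as `ge=1, le=5` are carried into the generated schema, the schema and the validation rules stay in sync. This is one reason a Pydantic model tends to be easier to maintain than a hand-written JSON Schema.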

Next, let's design our product review dataset using a few more techniques than in the previous notebook:

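Jinja expressions let a prompt pull in values from other columns of the same record. The sketch below uses the standalone `jinja2` package and only demonstrates the templating behavior. The `record` dict is a hypothetical stand-in for previously generated column values, and Data Designer may render its prompts differently.

```python
from jinja2 import Template

# Hypothetical record: values that earlier columns might have produced
record = {
    "product_name": "Trail Runner X",
    "rating": 5,
    "customer": {"first_name": "Ana", "age": 34},
}

# {{ ... }} inserts a value; the inline if/else chooses wording based on the rating;
# customer.first_name falls back to a key lookup inside the nested dict
prompt = Template(
    "Write a {{ 'glowing' if rating >= 4 else 'critical' }} review of "
    "{{ product_name }} written by {{ customer.first_name }}, age {{ customer.age }}."
)

print(prompt.render(**record))
# Write a glowing review of Trail Runner X written by Ana, age 34.

# Changing one column value changes the rendered prompt
print(prompt.render(**{**record, "rating": 2}))
# Write a critical review of Trail Runner X written by Ana, age 34.
```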

👀 Preview the dataset

  • Iteration is key to generating high-quality synthetic data.

  • Use the preview method to generate 10 records for inspection.

  • Setting verbose_logging=True prints logs within each task of the generation process.


⏭️ Next Steps

Check out the following notebooks to learn more about: