Notebooks
N
NVIDIA
Multi Turn Conversation

Multi Turn Conversation

gpu-accelerationretrieval-augmented-generationllm-inferencetensorrtnvidia-generative-ai-examplesself-hosted-tutorialslarge-language-modelsmicroservicetriton-inference-servermulti-turn-chatcommunity-contributionsLLMragnemoNeMo-Data-Designer

๐ŸŽจ NeMo Data Designer: Synthetic Conversational Data with Person Details

๐Ÿ“š What you'll learn

  • This notebook demonstrates how to use the NeMo Data Designer to build a synthetic data generation pipeline step-by-step.

  • We will create multi-turn user-assistant dialogues tailored for fine-tuning language models, enhanced with realistic person details.

  • These datasets could be used for developing and enhancing conversational AI applications, including customer
    support chatbots, virtual assistants, and interactive learning systems.


๐Ÿ‘‹ IMPORTANT โ€“ย Environment Setup

  • If you haven't already, follow the instructions in the README to install the necessary dependencies.

  • You may need to restart your notebook's kernel after setting up the environment.

  • In this notebook, we assume you have a self-hosted instance of Data Designer up and running.

  • For deployment instructions, see the Installation Options section of the NeMo Data Designer documentation.

๐Ÿ“ฆ Import the essentials

  • The data_designer module of nemo_microservices exposes Data Designer's high-level SDK.

  • The essentials module provides quick access to the most commonly used objects.

[ ]

โš™๏ธ Initialize the NeMo Data Designer Client

  • NeMoDataDesignerClient is responsible for submitting generation requests to the microservice.
[ ]

๐ŸŽ›๏ธ Define model configurations

  • Each ModelConfig defines a model that can be used during the generation process.

  • The "model alias" is used to reference the model in the Data Designer config (as we will see below).

  • The "model provider" is the external service that hosts the model (see the model config docs for more details).

  • By default, the microservice uses build.nvidia.com as the model provider.

[ ]

๐Ÿ—๏ธ Initialize the Data Designer Config Builder

  • The Data Designer config defines the dataset schema and generation process.

  • The config builder provides an intuitive interface for building this configuration.

  • The list of model configs is provided to the builder at initialization.

[ ]

Define Pydantic Models for Structured Outputs

You can use Pydantic to define a structure for the messages that are produced by Data Designer

[ ]

๐ŸŽฒ Adding Sampler Columns

  • Sampler columns offer non-LLM based generation of synthetic data.

  • They are particularly useful for steering the diversity of the generated data, as we demonstrate below.

[ ]

๐Ÿฆœ Adding LLM Generated columns

Now define the columns that the model will generate. These prompts instruct the LLM to produce the actual conversation:

  • a system prompt to guide how the AI assistant engages in the conversation with the user,
  • the conversation, and
  • finally, we generate a toxicity_label to assess user toxicity over the entire conversation.

๐Ÿ’ฌ๐Ÿค– AI Assistant system prompt and conversation

We generate a system prompt to base the AI assistant and then generate the entire conversation.

[ ]

๐Ÿ” LLM-as-a-Judge: Toxicity Assessment

When generating our synthetic dataset, we need to determine the quality of the generated dialogs.
We use the LLM-as-a-Judge strategy to do this.

To do so, we need to define the rubric that the LLM should use to assess generation quality along with a prompt that provides relavant instructions.

[ ]

๐Ÿ” Iteration is key โ€“ย preview the dataset!

  1. Use the preview method to generate a sample of records quickly.

  2. Inspect the results for quality and format issues.

  3. Adjust column configurations, prompts, or parameters as needed.

  4. Re-run the preview until satisfied.

[ ]
[ ]

๐Ÿ“Š Analyze the generated data

  • Data Designer automatically generates a basic statistical analysis of the generated data.

  • This analysis is available via the analysis property of generation result objects.

[ ]

๐Ÿ†™ Scale up!

  • Happy with your preview data?

  • Use the create method to submit larger Data Designer generation jobs.

[ ]
[ ]
[ ]
[ ]