Safe Synthesizer 101

🎛️ NeMo Safe Synthesizer 101: The Basics

⚠️ Warning: NeMo Safe Synthesizer is in Early Access and not recommended for production use.


In this notebook, we demonstrate how to create a synthetic version of a tabular dataset using the NeMo Microservices Python SDK. The notebook should take about 20 minutes to run.

After completing this notebook, you'll be able to:

  • Use the NeMo Microservices SDK to interact with Safe Synthesizer
  • Create novel synthetic data that follows the statistical properties of your input dataset
  • Access an evaluation report on synthetic data quality and privacy

💾 Install dependencies

IMPORTANT 👉 Ensure you have a NeMo Microservices Platform deployment available. Follow the quickstart or Helm chart instructions in your environment's setup guide. You may need to restart your kernel after installing dependencies.

⚙️ Initialize the NeMo Safe Synthesizer Client

  • The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.
  • In the quickstart, the default value for the client's base_url is http://localhost:8080.
  • If you are using a managed or remote deployment, make sure the base URLs and tokens match that deployment.
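
The setup described above can be sketched as follows. This is a minimal sketch, not the notebook's original cell: it assumes the SDK is importable as `nemo_microservices` and exposes a `NeMoMicroservices` client class, and the environment variable name `NEMO_BASE_URL` is hypothetical, used here only to show how to switch between the quickstart default and a remote deployment.

```python
import os

DEFAULT_BASE_URL = "http://localhost:8080"  # quickstart default mentioned above


def resolve_base_url(env=os.environ):
    """Return the platform base URL, preferring an override from the environment.

    NEMO_BASE_URL is a hypothetical variable name used only in this sketch.
    """
    return env.get("NEMO_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def make_client(base_url=None):
    """Build the SDK client; the import path and class name are assumptions."""
    from nemo_microservices import NeMoMicroservices  # assumed SDK entry point

    return NeMoMicroservices(base_url=base_url or resolve_base_url())


print(resolve_base_url())
```

Keeping the URL lookup separate from client construction makes it easy to point the same notebook at a different deployment without editing code.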

NeMo DataStore is launched as one of the platform's services. We use it to manage storage, so the job also needs a datastore configuration.
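
One way to express that configuration is a plain dictionary. The key names (`endpoint`, `token`), the default address, and the environment variable names below are assumptions for illustration, not values confirmed by this notebook. Check your deployment for the actual datastore address and credentials.

```python
import os

# Assumed shape of the datastore configuration handed to the builder later on.
# Key names and defaults are placeholders, not confirmed SDK values.
datastore_config = {
    "endpoint": os.environ.get("NEMO_DATASTORE_URL", "http://localhost:3000/v1/hf"),
    "token": os.environ.get("NEMO_DATASTORE_TOKEN", "placeholder"),
}

# Print only the endpoint so that a real token never ends up in notebook output.
print(datastore_config["endpoint"])
```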

📥 Load input data

Safe Synthesizer learns the patterns and correlations in your input dataset to produce synthetic data with similar properties. For this tutorial, we will use a small public sample dataset. Replace it with your own data if desired.

The sample dataset used here is a set of women's clothing reviews, including age, product category, rating, and review text. Some of the reviews contain Personally Identifiable Information (PII), such as height, weight, age, and location.
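
Before submitting data, it can help to get a rough sense of how many reviews mention such details. The sketch below runs two hand-written regular expressions over made-up review text. It is only an illustration of the kind of content involved and is not how Safe Synthesizer detects or replaces PII.

```python
import re

# Rough, illustrative patterns; real PII detection is far more involved.
PATTERNS = {
    "height": re.compile(r"\b\d\s?'\s?\d{1,2}\b|\b\d\s?(?:ft|feet)\b", re.IGNORECASE),
    "weight": re.compile(r"\b\d{2,3}\s?(?:lbs?|pounds|kg)\b", re.IGNORECASE),
}


def flag_possible_pii(texts):
    """Return {row_index: [matched pattern names]} for texts matching any pattern."""
    flagged = {}
    for i, text in enumerate(texts):
        hits = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
        if hits:
            flagged[i] = hits
    return flagged


# Made-up examples written in the style of clothing reviews.
sample_reviews = [
    "Lovely dress, fits true to size.",
    "I'm 5'4 and 125 lbs, and the medium was perfect.",
    "At 140 pounds the small felt tight in the shoulders.",
]
print(flag_possible_pii(sample_reviews))
```

With a DataFrame, the same function can be applied to the review text column after filling missing values with empty strings; the column name depends on your dataset.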

🏗️ Create a Safe Synthesizer job

The SafeSynthesizerBuilder provides a fluent interface to configure and submit jobs.

A job is created and submitted with the following chain of calls:

  • SafeSynthesizerBuilder(client): initialize with the NeMo Microservices client.
  • .with_data_source(df): set the input data source.
  • .with_datastore(datastore_config): configure model artifact storage.
  • .with_replace_pii(): enable automatic replacement of PII.
  • .synthesize(): train and generate synthetic data.
  • .create_job(): submit the job to the platform.
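
The steps above can be sketched as one chained expression. The method names come from the list above. The sketch assumes that every call before `.create_job()` returns the builder, which is what a fluent interface implies, and it takes the builder class as an argument so that no SDK import path is hard-coded. The helper name `submit_synthesis_job` is hypothetical.

```python
def submit_synthesis_job(builder_cls, client, df, datastore_config):
    """Chain the builder calls listed above and return the submitted job.

    builder_cls is expected to be SafeSynthesizerBuilder; passing it in keeps
    this sketch independent of where the SDK exposes that class.
    """
    return (
        builder_cls(client)                # initialize with the platform client
        .with_data_source(df)              # set the input data source
        .with_datastore(datastore_config)  # configure model artifact storage
        .with_replace_pii()                # enable automatic PII replacement
        .synthesize()                      # train and generate synthetic data
        .create_job()                      # submit the job to the platform
    )
```

In the notebook this would be called as `job = submit_synthesis_job(SafeSynthesizerBuilder, client, df, datastore_config)` once `SafeSynthesizerBuilder` has been imported from the SDK.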

👀 View synthetic data

After the job completes, fetch the generated synthetic dataset.
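
Fetching results only makes sense once the job has finished. The SDK's job object may already provide its own way to wait; if it does, prefer that. Otherwise, a generic polling loop along these lines can be used. The status strings and the `get_status` callable are hypothetical and must be adapted to whatever your job object actually reports.

```python
import time


def wait_until_done(get_status, done=("completed",), failed=("error", "cancelled"),
                    poll_seconds=10.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status.

    get_status is any zero-argument callable returning a status string.
    The default status names are hypothetical placeholders.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        sleep(poll_seconds)  # wait before asking again


# Simulated job that finishes on the third poll; no real waiting happens here.
_statuses = iter(["pending", "active", "completed"])
print(wait_until_done(lambda: next(_statuses), sleep=lambda seconds: None))
```

The `sleep` parameter exists so the loop can be exercised without real delays, as in the simulated run at the end.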

📊 View evaluation report

An evaluation comparing the synthetic data to the input data is performed automatically. You can:

  • Inspect key scores: overall synthetic data quality and privacy.
  • Download the full HTML report: includes charts and detailed metrics.
  • Display the report inline: useful when viewing in notebook environments.