
An Introduction to Developing Agents with NVIDIA Morpheus

Introduction

Generative AI (GenAI) and Large Language Models (LLMs) are becoming essential tools in cybersecurity in part due to their ability to enhance the efficiency of cyber threat detection and response by accelerating analyst workflows. However, prototyping and moving these accelerated workflows into production can be daunting.

Cybersecurity remains among the top three challenges impacting every industry—from the public sector to financial services, telecommunications, retail, automotive, and more. Most CEOs believe organizations with the most advanced generative AI capabilities will have a competitive advantage and are looking for ways to incorporate this into their business. While adversaries are already leveraging generative AI in their attacks, there is significant potential to harness this power for cyber defense.

This hands-on tutorial will focus on accelerating an exploitability analysis workflow to increase analyst productivity and enhance cybersecurity defenses.

Problem Statement: Common Vulnerabilities and Exposures (CVE) Impact Analysis

Determining the impact of a documented CVE on a specific project or container is a labor-intensive and manual task. This intricate process involves the collection, comprehension, and synthesis of various pieces of information to ascertain whether immediate remediation, such as patching, is necessary upon the identification of a new CVE.

Challenges

  • Information Collection: The process involves significant manual labor to collect and synthesize relevant information.
  • Decision Complexity: Decisions on whether to update a library impacted by a CVE often hinge on various considerations, including:
    • Scan False Positives: Occasionally, vulnerability scans may incorrectly flag a library as vulnerable, leading to a false alarm.
    • Mitigating Factors: In some cases, existing safeguards within the environment may reduce or negate the risk posed by a CVE.
    • Lack of Required Environments or Dependencies: For an exploit to succeed, specific conditions must be met. The absence of these necessary elements can render a vulnerability irrelevant.
  • Manual Documentation: Once an analyst has determined the library is not affected, a Vulnerability Exploitability eXchange (VEX) document must be created to standardize and distribute the results.

The efficiency of this process can be significantly enhanced through the deployment of an event-driven LLM agent pipeline.

Tutorial Goals

Our team developed a cybersecurity vulnerability analysis tool to aid in assessing the exploitability of CVEs in specific projects and containers. This tutorial will guide you step-by-step through the process of using LLMs, Retrieval-Augmented Generation (RAG), and agents to create both a toy version and a microservice running LLM-powered CVE exploitability analysis.

You'll have the chance to experiment with various modules, boosting your skills and understanding of these technologies. This experience will prepare you to later expand your use case by exploring new functionalities, enhancing the current setup, or even creating your own tailored solutions to meet specific needs or address new challenges.

Table of Contents

Note: Please continue running the notebook up to Part 1 during the introduction presentation to ensure your environment is set up correctly.

0 - Environment Setup

The following code blocks set up environment variables and imports for the rest of the notebook.

[ ]

Ensure the necessary environment variables are set. As a last resort, try to load them from a .env file.

[ ]
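
A minimal sketch of what this setup cell might contain, assuming the python-dotenv package for the .env fallback; OPENAI_API_KEY is the variable used later in this notebook:

import os

REQUIRED_VARS = ["OPENAI_API_KEY"]  # assumption: the key used later for NIM calls

def _missing():
    return [v for v in REQUIRED_VARS if not os.environ.get(v)]

if _missing():
    # Last resort: load variables from a .env file in the working directory.
    from dotenv import load_dotenv
    load_dotenv()

if _missing():
    raise RuntimeError(f"Missing required environment variables: {_missing()}")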

Import some common libraries to allow them to be used later in the notebook.

[ ]

Configure logging to allow Morpheus messages to appear in the notebook.

[ ]
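
A hedged sketch of the logging cell; configure_logging is Morpheus' logging helper, though the exact arguments may differ by release:

import logging

from morpheus.utils.logger import configure_logging

# Route Morpheus log messages through Python's standard logging so they
# appear in the notebook output.
configure_logging(log_level=logging.INFO)
logger = logging.getLogger("morpheus")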

Finally, test out the logger to ensure that it is working correctly. You should see a message printed to the console.

[ ]

Note: Please wait here until instructed to continue with running Part 1 of the notebook.

1 - Intro to Interacting with LLMs

This section will go over how to integrate LLMs into code with Python based examples. We will highlight some of the basic techniques for using and improving calls to LLMs for cybersecurity use cases.

In this lab, we will use NVIDIA NIM for LLMs as our generative AI platform. NIM is a cloud-native framework for building, customizing and deploying generative AI models with a familiar ChatGPT-like interface. Utilizing NIM (or any other generative AI service) in our pipelines allows us to offload the heavy lifting of language model inference to a dedicated service, freeing up our own resources for other tasks. All requests to the NIM microservice are made via an HTTP API, which allows us to easily integrate it into our existing codebase.

To simplify the process of interacting with the LLM, we will use a Python client library (openai) that wraps the HTTP API. This library provides a simple interface for making requests to the LLM, and handles the details of making HTTP requests and parsing the responses. This allows us to focus on the high-level logic of our application, rather than the low-level details of making HTTP requests.

Before sending requests to the LLM, we need to set up a connection object, llm_client, which is shown below. We will use the completions endpoint of this connection object to send requests to the LLM for the remainder of this section.

It's important to note here that although we store the NGC API Key under the OPENAI_API_KEY variable, we will be interacting with NVIDIA hosted LLMs and not OpenAI LLMs.

NVIDIA NIM microservices are OpenAI API compliant to maximize usability, so we will be using the openai package as a wrapper to make API calls.
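
A minimal sketch of the connection object, assuming NVIDIA's OpenAI-compatible endpoint at https://integrate.api.nvidia.com/v1 (substitute the endpoint used in your environment):

import os

from openai import OpenAI

# NVIDIA-hosted endpoint; the OPENAI_API_KEY variable holds the NGC API key.
llm_client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)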

1.1 - Python Calls to LLM API

This section demonstrates executing a call to the LLM API for cybersecurity knowledge support. This could stand alone as a potential use case where we have a cyber knowledge assistant to aid junior cyber analysts.

Query
How can one determine if a CVE is vulnerable in a specific environment?

The code snippet below utilizes the chat.completions.create() method of the connection object (llm_client) to query the LLM, detailing the potential model parameters that can be provided:

  • Temperature: Controls the creativity of the model. Higher values enable the model to generate more creative outputs.
  • Top P: Controls output diversity via nucleus sampling. The next token is selected at random from the smallest set of highest-probability tokens whose cumulative probability meets or exceeds the Top P value. Higher values allow the model to generate more creative outputs, suitable for tasks such as creative writing.
  • Seed: Affects the generation of random results by the model. It is possible to reproduce results by fixing the random seed (assuming all other hyperparameters are also fixed).
  • Presence Penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  • Frequency Penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  • Stream: If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
[ ]
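
A hedged sketch of the call described above; the model name is an assumption, so substitute any model available from your endpoint:

response = llm_client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # assumption: any available NIM model
    messages=[{
        "role": "user",
        "content": "How can one determine if a CVE is vulnerable in a specific environment?",
    }],
    temperature=0.2,
    top_p=0.7,
    seed=42,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    stream=False,
)
print(response.choices[0].message.content)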

1.1.1 - Explore On Your Own: Different Models

Try another model, such as one from https://build.nvidia.com/explore/reasoning, below.

[ ]

1.1.2 - Explore On Your Own: Model Parameters

  • How do the different models compare? Can you change the parameters (like temperature or presence_penalty) to help the smaller models improve?

  • What are some other cybersecurity questions you could ask an LLM to upskill a junior analyst?

Try a few temperature values such as [0.0, 0.5, 0.7, 0.9].

What happens to the model's output with higher creativity?

[ ]
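
One way to run the sweep, as a sketch (the model name is again an assumption):

for temperature in [0.0, 0.5, 0.7, 0.9]:
    response = llm_client.chat.completions.create(
        model="meta/llama3-70b-instruct",  # assumption
        messages=[{"role": "user", "content": "Briefly, what is a CVE?"}],
        temperature=temperature,
        seed=42,
    )
    print(f"--- temperature={temperature} ---")
    print(response.choices[0].message.content)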

1.2 - Prompt Engineering

Sometimes, a simple prompt might not deliver the results we're aiming for. That's where prompt engineering steps in.
Prompt engineering is an iterative process that focuses on crafting prompts to clearly communicate our intentions to the model, guiding it to generate the most relevant and accurate responses. This approach helps optimize the model's performance, especially in specialized fields.
(Some tips for improving performance using prompt engineering can be found here https://www.promptingguide.ai/introduction/tips.)

Implementing Personas in Prompts

An interesting approach within prompt engineering involves assigning a persona to the model. By doing this, we can guide the model to produce responses that align with a specific character, making the interaction more tailored, in-depth, and relevant to our needs. The example below demonstrates a way to achieve persona prompting.

Persona
You are a highly experienced and knowledgeable cybersecurity expert with a deep understanding of cyber threats, network defense strategies, and the latest in cybersecurity technology. Your communication is clear, concise, and authoritative, aiming to educate and inform on best practices for digital security.
Query
How can one determine if a CVE is vulnerable in a specific environment?
[ ]
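
A sketch of persona prompting via the system message, using the persona text above:

persona = (
    "You are a highly experienced and knowledgeable cybersecurity expert with a "
    "deep understanding of cyber threats, network defense strategies, and the "
    "latest in cybersecurity technology. Your communication is clear, concise, "
    "and authoritative, aiming to educate and inform on best practices for "
    "digital security."
)
response = llm_client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # assumption
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "How can one determine if a CVE is vulnerable in a specific environment?"},
    ],
)
print(response.choices[0].message.content)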

1.2.1 - Explore on your own: Different Personas

Does the persona improve performance? What happens if you change the persona or attributes such as communication style?

[ ]

1.3 - Prompt Templating

While cyber knowledge assistants are valuable, there are occasions when we need more detailed information or support on particular subjects, such as specific malware or a security vulnerability we're examining.
For instance, if we're assessing whether a known vulnerability can be exploited in our systems, how can we leverage an LLM to guide us through the process? Can the LLM provide us with clear instructions on what steps to take?

Use Case
Utilizing LLMs to Evaluate System Vulnerabilities to Specific CVEs
Query
How can I determine if my specific environment is affected by CVE-2023-47248?
[ ]

Observation
Although the LLM offered general cybersecurity guidance, it couldn't give specific details about the CVE because its knowledge is limited to its training data and it cannot access real-time information.

Potential Solution
To enhance the model's effectiveness, we can directly include specific details about the CVE in our prompts. This approach leverages the model's ability to analyze information and compensates for its inability to access real-time data.
By doing so, the model can provide more precise and helpful recommendations concerning particular issues.

CVE Intel Examples
Here are two example CVEs that we'll be using as recurring examples throughout the notebook (the information is sourced from the internet):

[ ]

Below is an example illustrating how to create a prompt template that allows us to easily insert information of any given CVE:

[ ]
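
A sketch of such a template; the field names are illustrative assumptions rather than the notebook's exact schema:

CHECKLIST_TEMPLATE = """You are a cybersecurity expert.

CVE ID: {cve_id}
Details: {cve_description}

Produce a checklist for determining whether a specific environment is affected
by this CVE. Format the output as a Python list of strings."""

prompt = CHECKLIST_TEMPLATE.format(
    cve_id="CVE-2023-47248",
    cve_description=(
        "Deserialization of untrusted data in the IPC and Parquet readers of "
        "PyArrow versions 0.14.0 to 14.0.0 allows arbitrary code execution."
    ),
)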

1.3.1 - Reflective Questions

1. Assessing the Model's Output
  • Review the checklist provided by the model. Do the steps outlined seem practical and relevant to the CVE?
  • How does the generated checklist align with your expectations?

2. Checking for Format Compliance
  • Did the model generate the output in the format we requested, specifically as a Python list of strings?
  • Consider the importance of format in data pipelines and how it affects the usability of the model's output.

3. Measuring Accuracy
  • How can we determine the accuracy of a language model's output?
  • Think about the criteria you would use to evaluate whether the checklist is accurate and relevant to the CVE details provided.

1.3.2 - Explore On Your Own: Different Models

How do alternative models perform? Do any adhere more closely to the formatting instructions?

[ ]

1.3.3 - Evaluating Model Performance through Formatting Checks

One way to assess the model's performance is by checking its ability to adhere to our formatting instructions to output a Python list.

[ ]
[ ]
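
A sketch of one such check, using ast.literal_eval to verify the output parses as a Python list of strings:

import ast

def is_valid_checklist(output: str) -> bool:
    """Return True if the model output parses as a Python list of strings."""
    try:
        parsed = ast.literal_eval(output.strip())
    except (ValueError, SyntaxError):
        return False
    return isinstance(parsed, list) and all(isinstance(item, str) for item in parsed)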

If the model's output doesn't meet our expectations, what are our next steps?

1.4 - One-Shot Learning

Our previous examples illustrated zero-shot learning, or direct prompting, where the LLM was simply given instructions and asked to follow them. Adding one example to the prompt, known as one-shot learning, can often greatly improve performance, since it is more difficult to describe the desired output than it is to show it. It's as straightforward as it sounds: add a single example of the desired output to the prompt. Let's try it.

[ ]

Zero-Shot Example

[ ]
[ ]

One-shot Example

[ ]
[ ]
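
For reference, a one-shot prompt can be as simple as the sketch below (the example content is illustrative):

ONE_SHOT_PROMPT = """Convert the checklist into a Python list of strings.

Example input:
- Check the installed package version
- Review the vendor advisory

Example output:
["Check the installed package version", "Review the vendor advisory"]

Input:
{checklist}

Output:"""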

1.4.1 - Explore on your own: Robustness

  • How robust is the one-shot example?

  • Is it effective when the unparsable list is enumerated or bulleted instead of using dashes?

[ ]

1.5 - Few-Shot Learning and Detailed Prompts

Prompts can be extended to be quite large and descriptive and to include many examples. Below is an example of a very detailed prompt that contains all the elements discussed above and more. What extra elements do you notice?

[ ]
[ ]

1.5.1 - Explore on your own

  • What feedback could an expert cyber analyst give you about this output?

  • What happens when you take a checklist item from what the model generated above and ask the model about it?

[ ]
  • How many examples do you think could fit in a prompt before the context is too large for the model?
[ ]

Do you see an error indicating that the context is too long for the model to handle?
Try different values of n to see where the boundary of context length lies.


1.6 - Evaluation Strategies

A Note On Evaluation Strategies

Evaluating model performance on objective metrics, such as whether the output is a properly formatted list, is relatively straightforward: traditional accuracy measurements (i.e., properly formatted outputs / total outputs) can be used.

For evaluating more subjective outcomes such as completeness of the checklist there are other strategies that can be explored for task-specific LLMs.

During this initial experimental stage, it makes sense to have expert humans review outputs to determine the model's performance. A common pattern that emerges when developing and evaluating cybersecurity use cases around LLMs is as follows:

  1. Experiment using a few golden examples to determine feasibility, and evaluate candidate models and prompts by hand
  2. Collect feedback on initial model outputs from experts and use this feedback to create a larger dataset
  3. Use the newly created larger dataset from experts to create use-case-specific training and benchmark datasets

Since getting these initial results into the hands of experts for evaluation is oftentimes a crucial component for obtaining a larger benchmark dataset, we will focus on quickly and easily building out the end-to-end pipeline for this use case example.


Note: Please wait here until instructed to continue with running the notebook.

2 - Prototyping

Now that we have the task generation for this workflow ready, how can we automate getting the answers for our checklist items?

2.1 - Overview

It is possible to build a language model-based system that accesses external knowledge sources to complete tasks. In Section 1.3, we added additional CVE details into the prompt by hand. While this strategy can be effective for adding additional context for very specific items like CVE Details, it requires a priori knowledge of what details to include (like those from NVD). When you would like to help your LLM with its query by adding more context in real-time, you're ready for RAG (Retrieval Augmented Generation).

When a query or checklist item is posed to an LLM equipped with RAG, the model first consults the vector database to find relevant information related to the query. This retrieved data is then combined with the original question and fed back into the LLM. With this enriched context, the LLM can generate a more accurate and informed response, potentially including evidence or reasoning based on the newly incorporated data. This approach not only improves the quality of the LLM's outputs but also gives our tool access to project- and container-specific information to determine CVE exploitability.
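
Before looking at the Morpheus implementation in the next sections, the RAG flow described above can be sketched in framework-agnostic form; vdb and embed below are hypothetical stand-ins for a vector store and an embedding function:

def rag_answer(question: str, vdb, embed, llm_client, k: int = 4) -> str:
    query_vec = embed(question)                # embed the query
    contexts = vdb.search(query_vec, top_k=k)  # retrieve semantically similar chunks
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(contexts) +
        f"\n\nQuestion: {question}\nAnswer:"
    )
    response = llm_client.chat.completions.create(
        model="meta/llama3-70b-instruct",      # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content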

2.2 - Building the Vector Database

In addition to having a query and LLM, RAG requires additional information to be stored in a vector database. One mechanism of finding the proper information from the database is to first embed the query into the same vector space and retrieve the top most similar items via a distance metric. The additional information is then presented in the prompt of the LLM. The neighboring vectors in the database are said to be "semantically similar" to the query and likely relevant.

For our demonstration purposes, we would like our LLM to be able to access the code repository of the project we're interested in checking for exploitable CVEs. The first step is transforming the specific repo into a vector database. Before that, let's pull a shallow clone of the Morpheus 24.03 branch from GitHub and use that as the codebase for this example. We'll also set up a logging directory for the Morpheus LLM Client logs.

[ ]
[ ]
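
A sketch of the clone and log-directory setup described above; the branch name branch-24.03 follows the Morpheus repository's naming convention but should be verified:

import pathlib
import subprocess

# Shallow clone of the Morpheus 24.03 branch to use as the example codebase.
subprocess.run([
    "git", "clone", "--depth", "1", "--branch", "branch-24.03",
    "https://github.com/nv-morpheus/Morpheus.git", "./morpheus_repo",
], check=True)

# Directory for the Morpheus LLM client logs.
pathlib.Path("./logs").mkdir(exist_ok=True)
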
Note: Please wait here until instructed to continue with running the notebook.

2.3 - Running a RAG Pipeline with Morpheus

Now that we have built a vector database to provide external knowledge for the LLM, we need to make a tool that can query the vector database, add the information to the prompt, and execute the LLM query. There are many tools out there that can perform this task, but in this lab, we will be using NVIDIA Morpheus.

2.3.1 - Morpheus Overview

NVIDIA Morpheus is an open AI application framework that aids cybersecurity experts in building high-performance pipelines for cybersecurity workflows. Morpheus is well suited for building a RAG pipeline due to its LLM Engine, which is specifically designed to aid in integrating LLMs into high-throughput, low-latency pipelines. A complete guide to Morpheus is beyond the scope of this notebook, but more information can be found in the NVIDIA Morpheus documentation and the nv-morpheus GitHub repository.

To start building a Morpheus pipeline, the first step is always to create a configuration object. The configuration object controls global options for the pipeline such as batch size, number of threads, logging, and more. For our needs, we can use the default values and only need to create the object which we will be passing to each pipeline.

[ ]
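
Creating the configuration object is a one-liner; a minimal sketch:

from morpheus.config import Config

# Default values are sufficient here; the object is passed to each pipeline.
config = Config()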

2.3.2 - Building a Morpheus RAG Pipeline

Below, we will build a pipeline that uses Morpheus to answer questions about the code in the repository that we created a vector database for. This works by using the LLMEngine in Morpheus with a RAGNode.

[ ]
[ ]

2.3.3 - RAG Limitations

Using the pipeline we built, we can now ask questions about the code in the repository and the LLM will be able to use the vector database to answer them. However, what happens if we need to ask questions about code that is not in the vector database? For example, what if we needed to ask questions about the dependencies that the code uses? Would the LLM be able to answer these questions? Let's try it out by re-running our RAG pipeline with a more complex question:

[ ]

It's likely that the model was not able to determine the answer to this question because it would need additional information. Depending on the model used, you might see output similar to:

Without further information about the `langchain` package or its documentation, it's difficult to determine if any specific functions or methods used in the code are deprecated.

How would we go about solving this problem?

2.4 - Running the CVE Pipeline with Morpheus

2.4.1 - Answering Complex Questions with RAG + LLM Agents

To answer a question about the existence of deprecated langchain functions, the model needs to look up the versions of the packages in our container or project. We can add an additional knowledge source such as a Software Bill of Materials (SBOM). With multiple tools/knowledge sources (an SBOM Package Checker and a Docker Container Code QA System), we need a new framework that allows our LLM to choose which tools it needs to use and to synthesize the responses. One method we can use is LangChain agents.

An agent in this sense is an LLM that has "agency" to determine what sources of information it needs to retrieve to answer questions. This can be achieved through prompting. The most simplistic prompt to use to turn an LLM into an agent with tool usage might look like this:

You are a helpful assistant. Help the user answer any questions.

You have access to the following tools:

{tools}

In order to use a tool, you can use <tool></tool> and <tool_input></tool_input> tags.
You will then get back a response in the form <observation></observation>
When you are done, respond with a final answer between <final_answer></final_answer>. 

Question: {input}

Ideally, with just one round of query -> tool -> observation -> final answer, the LLM will get the information it needs to answer simple queries such as What version of PyArrow is in the repo?

But what about more complex queries such as Does the code repo use langchain functions which are deprecated? This query would require the LLM to first find what functions are deprecated before searching the code base for them. We would prompt the LLM to use a series of steps (repeated N times): Thought, Action, and Observation. This process loop of reasoning and acting is called a ReAct Agent. In practice, it could be like this:

query: Does the morpheus code repo use langchain functions which are deprecated?
> Entering new AgentExecutor chain...
I need to check the langchain version in the container's SBOM and the deprecated source code functions.
Action: SBOM Package Checker
Action Input: langchain
Observation: 0.1.12
Thought: The langchain version in the container is 0.1.12.
Thought: I need to check the langchain source code for deprecated functions.
Action: Docker Container Code QA System
Action Input: Does the repo use the format_tool_to_openai_function or __call__ from langchain?
Observation: No, the repo does not use format_tool_to_openai_function or __call__ from langchain.
Thought: I now know the final answer.
Final Answer: The morpheus code repo does not use langchain functions which are deprecated.
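
A hedged sketch of wiring such tools into a classic LangChain zero-shot ReAct agent; sbom_lookup and code_qa are hypothetical stand-ins, and the endpoint and model name are assumptions:

from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import ChatOpenAI

def sbom_lookup(package: str) -> str:
    """Hypothetical stand-in: look up a package version in the container SBOM."""
    return {"langchain": "0.1.12"}.get(package, "not found")

def code_qa(question: str) -> str:
    """Hypothetical stand-in: answer a question about the repo's code."""
    return "No, the repo does not use those functions."

llm = ChatOpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumption: NIM endpoint
    model="meta/llama3-70b-instruct",                # assumption
)

tools = [
    Tool(
        name="SBOM Package Checker",
        func=sbom_lookup,
        description="Returns the version of a package in the container SBOM.",
    ),
    Tool(
        name="Docker Container Code QA System",
        func=code_qa,
        description="Answers questions about the container's code repository.",
    ),
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Does the morpheus code repo use langchain functions which are deprecated?")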

How can we incorporate these powerful ReAct agents and their tools into an end-to-end pipeline?

2.4.2 - The Morpheus CVE Pipeline

To convert our RAG pipeline into a CVE pipeline, all we need to do is update the LLM engine to run the CVE steps instead of a single RAG node as before.

[ ]

2.4.3 - The Engine Config

The EngineConfig object controls options for the CVE pipeline we are building. It allows us to keep all of the settings in a single object that can easily be shared across the many classes used to construct the pipeline. Below we will create the default configuration we will be using for the rest of the notebook.

[ ]
[ ]

2.4.4 - Running the Pipeline

Now that the pipeline has been defined and the configuration variables have been created, it's time to run the pipeline. The final step is to convert the PyArrow intel dictionary into a single string that our run_cve_pipeline function accepts, using a template, cve_details_template. To simplify converting intel dictionaries into strings in the rest of the notebook, we will reuse this template.

[ ]
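
A sketch of what the template-based conversion might look like; the template text and dictionary keys are illustrative assumptions:

cve_details_template = (
    "CVE ID: {cve_id}\n"
    "Description: {description}\n"
    "Affected versions: {affected_versions}"
)

pyarrow_intel = {
    "cve_id": "CVE-2023-47248",
    "description": (
        "Deserialization of untrusted data in PyArrow IPC and Parquet "
        "readers allows arbitrary code execution."
    ),
    "affected_versions": "0.14.0 through 14.0.0",
}

cve_details = cve_details_template.format(**pyarrow_intel)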

The output of the pipeline should be similar in theme to the following:

Received 1 responses:
{
  "checklist":{
    "0":[
      "Check for PyArrow: Verify if your project uses the PyArrow library, which is the affected package. If PyArrow is not a dependency in your project, then your code is not vulnerable to this CVE.",
      "Review Affected Versions: If PyArrow is used, check the version that your project depends on. According to the vulnerability details, versions before 14.0.1 are vulnerable.",
      "Review Code To Check for Deserialization of Untrusted Data: Check if the IPC and Parquet readers are used to deserialize untrusted data, which can lead to arbitrary code execution.",
      "Check for Mitigation: If upgrading to PyArrow 14.0.1 or later is not possible, check if the `pyarrow-hotfix` package is imported to disable the vulnerability on older PyArrow versions."
    ]
  },
  "response":{
    "0":[
      "Yes, the project uses the PyArrow library, which is the affected package.",
      "Yes, the Docker container is using a vulnerable version of PyArrow (11.0.0).",
      "No, the IPC and Parquet readers are not used to deserialize untrusted data.",
      "Yes, the `pyarrow-hotfix` package is imported to disable the vulnerability on older PyArrow versions."
    ]
  }
}

In the output, we can see the response from the first model, which is the generated checklist, and the output of each agent, which is the response to each checklist item. Looking at the checklist items and answers, we can see that the model has successfully determined that the project is vulnerable to the CVE.

NOTE: Depending on your choice for the Agent or Checklist model, you will see different outputs that can vary in quality quite drastically. Try a few different model choices and temperatures to explore what that looks like. You may also find some inconsistency in results even when keeping your parameters constant. This stochasticity is a natural occurrence with LLMs, and can be mitigated with prompt engineering or fine-tuning.

2.4.5 - Hitting the Limits of the LLMs

While LLMs can work well for many tasks, they are not perfect. They can fail on seemingly simple tasks, get into a loop, or not follow the output formatting correctly. These edge cases can be hard to catch and can be difficult to debug. For example, if we use the below prompt about Log4j and change the model we use for the Agent, what happens when we run the pipeline?

[ ]

When we run the pipeline with the Log4j example, it hits an exception instead of running the pipeline to completion. The error message is Error running agent: An output parsing error occurred because the agent was not able to reason through the checklist while following LangChain's formatting guidelines. If we look closer, we can see that the model generated the following output for each checklist item:

[
      "Error running agent: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse LLM output: ` I need to check if the log4j library is present in the Docker`",
      "Error running agent: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse LLM output: ` To answer this question, I need to find out which version of log4j`",
      "Error running agent: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse LLM output: ` To answer this question, I need to inspect the log4j configuration within the`",
      "Error running agent: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse LLM output: ` To answer this question, I need to check if the Docker container uses log`"
    ]

Such errors can be hard to debug, as it is not obvious why a seemingly innocuous sentence about a thought leads to a parsing error. The reason this occurs is that the LangChain Zero Shot Agent requires every response from the Agent to end with either a request for an Action or a Final Answer. We see above that the response contains neither. This occurs despite us explicitly asking the agent to follow those guidelines, as is evident in the cyber_dev_day.pipeline_utils.build_agent_executor method as follows:

 Action input must only contain the exact input, do not provide any text following that in your response. Always end your response with either an action, or a final answer.

Careful debugging of such output is critical, and some strategies for preventing such errors include few-shot prompting techniques, model fine-tuning, or changing the choice of model.

Note: Please wait here until instructed to continue with running the notebook.

3 - Beyond Prototyping

Up until now, we have been using the pipeline we built to answer questions about the code in the repository. While this works for a few hand-picked use cases, it is not suitable for deployment into a production environment for several reasons:

  1. The LLM models fail to work on some questions which can generate errors in the pipeline
    • Since the pipeline chains many LLM calls together, a single error can cause the entire pipeline to fail. For a production environment, we would need to handle these errors more gracefully or improve the model to reduce the number of errors.
  2. The pipeline is not optimized for performance
    • The pipeline is slow to run, because each model needs to be executed sequentially. For a production environment, we would need to optimize the pipeline to handle multiple requests at once.
  3. The pipeline cannot be easily integrated into other systems
    • The pipeline is a standalone script which reads from a single file and needs to be run manually. For a production environment, the pipeline would need to be integrated with other systems, such as a web server or a chatbot.

In this section, we will address some of the limitations we encountered in the previous section and discuss how we can overcome them utilizing NIM and Morpheus.

3.1 - Scaling the Pipeline

When running pipelines which utilize LLMs, it's important to understand how the LLMs are executed so that their execution can be parallelized as much as possible. LLM calls can take anywhere from a few hundred milliseconds to several seconds, and running them serially compounds those runtimes, leading to execution times that grow linearly with the number of LLM requests. A simple diagram of the execution of LLMs for our CVE pipeline is shown below:

Single CVE - Serial

In the diagram above, we can see that the LLMs are executed serially, one after the other. This is not ideal as the execution time of the pipeline is directly proportional to the number of LLMs that are executed. However, there is no dependency between the LLM calls in each of the checklist items. This means that we can parallelize the execution of the LLMs to reduce the overall execution time of the pipeline. A simple diagram of the parallel execution of LLMs for our CVE pipeline is shown below:

Single CVE - Parallel

In the diagram above, we can see that the total execution time has been reduced as the checklist agent LLMs are executed in parallel. The total execution time is now the maximum time taken to execute any of the LLM agent chains. This is a significant improvement over the serial execution of the LLMs. But what happens if we need to run the entire pipeline for multiple CVEs? A naive approach would be to run the pipeline for each CVE serially, which is shown below:

Multiple CVE - Serial

With most LLM libraries, this is the default behavior and improving on this requires more complex solutions such as multiprocessing or distributed workers. However, with Morpheus, this is trivial since Morpheus benefits from pipeline parallelism where each message is processed in an assembly line fashion. This means that we can start processing the next message before the previous one is even completed. A simple diagram of the parallel execution of the pipeline for multiple CVEs is shown below:

Multiple CVE - Parallel

[ ]

If you look at the above output, you should see a section that looks like the following:

> Entering new AgentExecutor chain...

> Entering new AgentExecutor chain...

> Entering new AgentExecutor chain...

> Entering new AgentExecutor chain...

> Entering new AgentExecutor chain...

Because each executor chain is being started before the previous one completes, they are all running in parallel. But can we verify that this is actually leading to a performance improvement? Let's run the pipeline for a single CVE and multiple CVEs and compare their execution time.

[ ]

Your actual execution time may differ, but it should look something like the following:

Executing 1 CVE(s). Total: 44263.65375518799 ms, Average: 44263.65375518799 ms
Executing 5 CVE(s). Total: 66988.19637298584 ms, Average: 13397.639274597168 ms
Executing 10 CVE(s). Total: 62277.28486061096 ms, Average: 6227.728486061096 ms

As you can see, the average execution time per CVE goes down as we increase the number of pipeline runs because they are being run in parallel. To get an idea of how well the pipeline scales, we can plot the CVE count against runtime for the pipeline.

[ ]
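
A sketch of the timing-and-plotting cell, assuming run_cve_pipeline from earlier and a list of intel strings named cve_inputs:

import time

import matplotlib.pyplot as plt

counts = [1, 5, 10]
totals_ms = []
for n in counts:
    start = time.time()
    run_cve_pipeline(cve_inputs[:n])  # assumption: list of CVE intel strings
    totals_ms.append((time.time() - start) * 1000)

plt.plot(counts, totals_ms, marker="o", label="actual")
plt.plot(counts, [totals_ms[0] * n for n in counts], "--", label="serial (projected)")
plt.xlabel("CVE count")
plt.ylabel("Total runtime (ms)")
plt.legend()
plt.show()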

We can see the actual line is much closer to the parallel line than the serial line, indicating we are running most of the LLMs in parallel.

3.2 - Event Driven Pipeline: Creating a Microservice

In a true production environment, the CVE scans would be triggered by some other event, such as a new container being uploaded into a registry or a new project being created. In this section, we will show how to create a microservice that can be triggered by an event and run the pipeline we built in the previous sections.

Previously, when our pipeline was started, it would read all inputs from a DataFrame and run the pipeline for each input. Once the pipeline was done processing the DataFrame, it would shut down. To run the pipeline as a microservice, we need to modify the pipeline to run continuously and listen for new inputs on an HTTP endpoint.

Fortunately, in Morpheus this is as easy as changing out the type of source that is used in the pipeline. The code below is identical to the previous pipeline except we have changed the source from InMemorySourceStage to HttpServerSourceStage. The HttpServerSourceStage class listens for new inputs on an HTTP endpoint and passes them to the next stage in the pipeline. It pulls the inputs from the request body and passes them to the pipeline to be processed.

Additionally, right after the HttpServerSourceStage we have added a simple custom stage, print_payload, to the pipeline. This stage simply prints the payload that was passed to the pipeline, which is useful for debugging and for logging exactly when the pipeline was triggered, since the results may take time to process and appear in the console.

[ ]
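
A hedged sketch of the source-stage swap; the import path and parameter names follow Morpheus' HttpServerSourceStage but should be verified against your Morpheus version:

from morpheus.pipeline.linear_pipeline import LinearPipeline
from morpheus.stages.input.http_server_source_stage import HttpServerSourceStage

pipeline = LinearPipeline(config)

# Listen for POST requests instead of reading from an in-memory DataFrame.
pipeline.set_source(HttpServerSourceStage(
    config,
    bind_address="0.0.0.0",
    port=26302,        # matches the curl request in the next section
    endpoint="/scan",
))
# ... the remaining stages are unchanged from the previous pipeline ...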

Finally, we can start our microservice by running the pipeline as we have in the past. While the pipeline is running, move on to the next section to see how to trigger the pipeline with an HTTP request.

Note: When executed, the following cell will run indefinitely. You will need to interrupt the kernel to stop it.
[ ]

3.2.1 - Triggering the Microservice

To trigger the microservice, we will use curl to send a request to it. Since the notebook cannot run commands while the microservice is running, we need to open a new terminal to send the request. To do that, follow the steps below:

  1. In Jupyter Lab, press Ctrl + Shift + L (Shift + ⌘ + L on Mac) to open a new Launcher tab
  2. In the Launcher tab, click on the Terminal icon to open a new terminal
  3. In the terminal, run the following command to send a request to the microservice:
curl --request POST \
  --url http://localhost:26302/scan \
  --header 'Content-Type: application/json' \
  --data '[{
      "cve_info" : "An issue was discovered in the Linux kernel through 6.0.9. drivers/media/dvb-core/dvbdev.c has a use-after-free, related to dvb_register_device dynamically allocating fops."
   }]'
  4. Once the request is sent, the microservice will process the request and return the results in the terminal
    • To see the results, switch back to the Notebook tab. You should see that the microservice received your request and started processing it:
    I20240308 16:00:56.422039 3010283 http_server.cpp:129] Received request: POST : /scan
    • It helps to have the terminal and the notebook side by side so you can see the results in the terminal as they come in. To do this, click on the terminal tab and drag it to the right side of the screen. You should then be able to see the terminal and the notebook side by side, similar to the image below: Terminal and Notebook Side by Side
  5. To stop the microservice, interrupt the kernel by pressing the stop button in the toolbar

4 - Conclusion

Throughout this notebook, we explored how GenAI and LLMs can play a transformative role in cybersecurity by automating the CVE analysis workflow. Here are the key learnings and takeaways:

Generative AI and Cybersecurity

  • The Role of GenAI and LLMs in Cybersecurity: Learned about the transformative impact of GenAI and LLMs in cybersecurity, particularly in automating and improving threat detection, analysis, and response. These technologies are crucial for mitigating the manual and time-consuming aspects of cybersecurity tasks.

CVE Impact Analysis

  • Challenges in CVE impact analysis: Challenges include the intensive effort required for gathering information, the complexity of making informed decisions, and the fact that the risk posed by vulnerabilities can vary greatly depending on the specific environment in which they are found.
  • Event-Driven LLM Agent Pipeline: Learned about the concept and implementation of an event-driven LLM agent pipeline as a solution to streamline the CVE analysis process.

Hands-On with LLMs

  • LLM Inferencing Through NVIDIA Inference Microservices (NIM): Interacted with LLMs through NIM and the openai Python client library, leveraging the cloud-native framework to simplify the process of making LLM inference requests.
  • Refining Model Outputs with Prompt Engineering: Gained insights into various prompting techniques, including persona-based prompting, prompt templating, and both one-shot and few-shot learning methods.
  • Evaluating Model Performance: Explored strategies for assessing model performance, such as conducting format checks, undergoing manual expert reviews, and creating benchmark datasets.

Utilization of Retrieval-Augmented Generation (RAG)

  • RAG Functionality: Understood how RAG can augment LLM responses by incorporating external knowledge, thus enhancing the accuracy and context-relevance of the outputs for cybersecurity applications.
  • Building RAG Pipelines with Morpheus: Learned to build RAG pipelines using NVIDIA Morpheus, focusing on its application in constructing high-performance AI-driven cybersecurity workflows.

Prototyping to Production

  • Fine-Tuning for Task-Specific Improvements: Understood the importance of fine-tuning LLMs on specific tasks to overcome limitations and improve output quality.
  • Scaling and Parallelization: Learned about Morpheus’ ability to execute multiple LLM inquiries in parallel, significantly reducing the overall runtime of LLM pipelines.
  • Event-Driven Microservice Creation: Explored the creation of a microservice that responds to real-world events, enabling the automated execution of the CVE analysis pipeline in response to triggers such as container updates.

The tutorial demonstrates how to utilize NIM and NVIDIA Morpheus to develop an LLM-powered agent that assists security analysts with CVE impact analysis. It provides practical insights into refining model outputs, integrating diverse technologies into workflows, and deploying scalable, event-driven solutions for real-world applications. This tutorial serves as a solid starting point for anyone interested in leveraging LLMs to address real-world challenges in cybersecurity and beyond.