Notebooks
P
Pinecone
Gpt 4 Langchain Docs

Gpt 4 Langchain Docs

vector-databasesemantic-searchlearnopenaiAILLMgenerationPythonjupyter-notebookpinecone-examples

Open In Colab Open nbviewer

GPT4 with Retrieval Augmentation over LangChain Docs

Open nbviewer

In this notebook we'll work through an example of using GPT-4 with retrieval augmentation to answer questions about the LangChain Python library.

[1]

🚨 Note: the above pip install is formatted for Jupyter notebooks. If running elsewhere you may need to drop the !.


In this example, we will download the LangChain docs, we can find a static version of the docs on Hugging Face datasets in jamescalam/langchain-docs-23-06-27. To download them we do:

[2]
Downloading and preparing dataset json/jamescalam--langchain-docs-23-06-27 to /root/.cache/huggingface/datasets/jamescalam___json/jamescalam--langchain-docs-23-06-27-4631410d07444b03/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...
Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]
Downloading data:   0%|          | 0.00/4.68M [00:00<?, ?B/s]
Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]
Generating train split: 0 examples [00:00, ? examples/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/jamescalam___json/jamescalam--langchain-docs-23-06-27-4631410d07444b03/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
Dataset({
,    features: ['id', 'text', 'url'],
,    num_rows: 505
,})

This leaves us with 505 doc pages. Let's take a look at the format each one contains:

[3]
'Example Selector\uf0c1\nLogic for selecting examples to include in prompts.\nclass langchain.prompts.example_selector.LengthBasedExampleSelector(*, examples, example_prompt, get_text_length=<function _get_le'

We access the plaintext page content like so:

[4]
Example Selector
Logic for selecting examples to include in prompts.
class langchain.prompts.example_selector.LengthBasedExampleSelector(*, examples, example_prompt, get_text_length=<function _get_le

We can also find the source of each document:

[5]
'https://api.python.langchain.com/en/latest/modules/example_selector.html'

Now let's see how we can process all of these. We will chunk everything into ~500 token chunks, we can do this easily with langchain and tiktoken:

[6]
'cl100k_base'
[7]
[8]

Process the docs into more chunks using this approach.

[10]
  0%|          | 0/505 [00:00<?, ?it/s]
2482

Our chunks are ready so now we move onto embedding and indexing everything.

Initialize Embedding Model

We use text-embedding-ada-002 as the embedding model. We can embed text like so:

[11]
<OpenAIObject list at 0x7fc82e74dee0> JSON: {
,  "data": [
,    {
,      "created": null,
,      "id": "whisper-1",
,      "object": "engine",
,      "owner": "openai-internal",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "babbage",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "davinci",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-davinci-edit-001",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "babbage-code-search-code",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-similarity-babbage-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-embedding-ada-002",
,      "object": "engine",
,      "owner": "openai-internal",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "code-davinci-edit-001",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-davinci-001",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "ada",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "babbage-code-search-text",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "babbage-similarity",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "code-search-babbage-text-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-curie-001",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-4-0314",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-4-0613",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "code-search-babbage-code-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-ada-001",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-similarity-ada-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "curie-instruct-beta",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-4",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "ada-code-search-code",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "ada-similarity",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "code-search-ada-text-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-ada-query-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "davinci-search-document",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "ada-code-search-text",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-ada-doc-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "davinci-instruct-beta",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-similarity-curie-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "code-search-ada-code-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "ada-search-query",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-davinci-query-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "curie-search-query",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "davinci-search-query",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "babbage-search-document",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "ada-search-document",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-curie-query-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-babbage-doc-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "curie-search-document",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-curie-doc-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "babbage-search-query",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-babbage-001",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-davinci-doc-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-search-babbage-query-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "curie-similarity",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-3.5-turbo-0613",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "curie",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-3.5-turbo-16k-0613",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-similarity-davinci-001",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-davinci-002",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-3.5-turbo-0301",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "text-davinci-003",
,      "object": "engine",
,      "owner": "openai-internal",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "davinci-similarity",
,      "object": "engine",
,      "owner": "openai-dev",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-3.5-turbo",
,      "object": "engine",
,      "owner": "openai",
,      "permissions": null,
,      "ready": true
,    },
,    {
,      "created": null,
,      "id": "gpt-3.5-turbo-16k",
,      "object": "engine",
,      "owner": "openai-internal",
,      "permissions": null,
,      "ready": true
,    }
,  ],
,  "object": "list"
,}
[12]

In the response res we will find a JSON-like object containing our new embeddings within the 'data' field.

[13]
dict_keys(['object', 'data', 'model', 'usage'])

Inside 'data' we will find two records, one for each of the two sentences we just embedded. Each vector embedding contains 1536 dimensions (the output dimensionality of the text-embedding-ada-002 model.

[14]
2
[15]
(1536, 1536)

We will apply this same embedding logic to the langchain docs dataset we've just scraped. But before doing so we must create a place to store the embeddings.

Initializing the Index

Now we need a place to store these embeddings and enable a efficient vector search through them all. To do that we use Pinecone, we can get a free API key and enter it below where we will initialize our connection to Pinecone and create a new index.

[ ]

Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all available providers and regions here.

[ ]
[17]
[ ]

We can see the index is currently empty with a total_vector_count of 0. We can begin populating it with OpenAI text-embedding-ada-002 built embeddings like so:

[19]
  0%|          | 0/25 [00:00<?, ?it/s]

Now we've added all of our langchain docs to the index. With that we can move on to retrieval and then answer generation using GPT-4.

Retrieval

To search through our documents we first need to create a query vector xq. Using xq we will retrieve the most relevant chunks from the LangChain docs, like so:

[20]
[21]
{'matches': [{'id': '35afffd0-a42a-42ee-ac6f-92b5491183fb-0',
,              'metadata': {'chunk': 0.0,
,                           'text': 'Source code for langchain.chains.llm\n'
,                                   '"""Chain that just formats a prompt and '
,                                   'calls an LLM."""\n'
,                                   'from __future__ import annotations\n'
,                                   'import warnings\n'
,                                   'from typing import Any, Dict, List, '
,                                   'Optional, Sequence, Tuple, Union\n'
,                                   'from pydantic import Extra, Field\n'
,                                   'from langchain.base_language import '
,                                   'BaseLanguageModel\n'
,                                   'from langchain.callbacks.manager import (\n'
,                                   '    AsyncCallbackManager,\n'
,                                   '    AsyncCallbackManagerForChainRun,\n'
,                                   '    CallbackManager,\n'
,                                   '    CallbackManagerForChainRun,\n'
,                                   '    Callbacks,\n'
,                                   ')\n'
,                                   'from langchain.chains.base import Chain\n'
,                                   'from langchain.input import '
,                                   'get_colored_text\n'
,                                   'from langchain.load.dump import dumpd\n'
,                                   'from langchain.prompts.base import '
,                                   'BasePromptTemplate\n'
,                                   'from langchain.prompts.prompt import '
,                                   'PromptTemplate\n'
,                                   'from langchain.schema import (\n'
,                                   '    BaseLLMOutputParser,\n'
,                                   '    LLMResult,\n'
,                                   '    NoOpOutputParser,\n'
,                                   '    PromptValue,\n'
,                                   ')\n'
,                                   '[docs]class LLMChain(Chain):\n'
,                                   '    """Chain to run queries against LLMs.\n'
,                                   '    Example:\n'
,                                   '        .. code-block:: python\n'
,                                   '            from langchain import '
,                                   'LLMChain, OpenAI, PromptTemplate\n'
,                                   '            prompt_template = "Tell me a '
,                                   '{adjective} joke"\n'
,                                   '            prompt = PromptTemplate(\n'
,                                   '                '
,                                   'input_variables=["adjective"], '
,                                   'template=prompt_template\n'
,                                   '            )\n'
,                                   '            llm = LLMChain(llm=OpenAI(), '
,                                   'prompt=prompt)\n'
,                                   '    """\n'
,                                   '    @property\n'
,                                   '    def lc_serializable(self) -> bool:\n'
,                                   '        return True\n'
,                                   '    prompt: BasePromptTemplate\n'
,                                   '    """Prompt object to use."""\n'
,                                   '    llm: BaseLanguageModel\n'
,                                   '    """Language model to call."""\n'
,                                   '    output_key: str = "text"  #: :meta '
,                                   'private:\n'
,                                   '    output_parser: BaseLLMOutputParser = '
,                                   'Field(default_factory=NoOpOutputParser)\n'
,                                   '    """Output parser to use.\n'
,                                   '    Defaults to one that takes the most '
,                                   'likely string but does not change it \n'
,                                   '    otherwise."""\n'
,                                   '    return_final_only: bool = True\n'
,                                   '    """Whether to return only the final '
,                                   'parsed result. Defaults to True.\n'
,                                   '    If false, will return a bunch of extra '
,                                   'information about the generation."""\n'
,                                   '    llm_kwargs: dict = '
,                                   'Field(default_factory=dict)\n'
,                                   '    class Config:\n'
,                                   '        """Configuration for this pydantic '
,                                   'object."""\n'
,                                   '        extra = Extra.forbid\n'
,                                   '        arbitrary_types_allowed = True',
,                           'url': 'https://api.python.langchain.com/en/latest/_modules/langchain/chains/llm.html'},
,              'score': 0.800940871,
,              'values': []},
,             {'id': '35cde68a-b909-43b6-b918-81c4eb2db5cd-82',
,              'metadata': {'chunk': 82.0,
,                           'text': 'Bases: langchain.chains.base.Chain\n'
,                                   'Chain for question-answering with '
,                                   'self-verification.\n'
,                                   'Example\n'
,                                   'from langchain import OpenAI, '
,                                   'LLMSummarizationCheckerChain\n'
,                                   'llm = OpenAI(temperature=0.0)\n'
,                                   'checker_chain = '
,                                   'LLMSummarizationCheckerChain.from_llm(llm)\n'
,                                   'Parameters\n'
,                                   'memory '
,                                   '(Optional[langchain.schema.BaseMemory]) '
,                                   '– \n'
,                                   'callbacks '
,                                   '(Optional[Union[List[langchain.callbacks.base.BaseCallbackHandler], '
,                                   'langchain.callbacks.base.BaseCallbackManager]]) '
,                                   '– \n'
,                                   'callback_manager '
,                                   '(Optional[langchain.callbacks.base.BaseCallbackManager]) '
,                                   '– \n'
,                                   'verbose (bool) – \n'
,                                   'tags (Optional[List[str]]) – \n'
,                                   'sequential_chain '
,                                   '(langchain.chains.sequential.SequentialChain) '
,                                   '– \n'
,                                   'llm '
,                                   '(Optional[langchain.base_language.BaseLanguageModel]) '
,                                   '– \n'
,                                   'create_assertions_prompt '
,                                   '(langchain.prompts.prompt.PromptTemplate) '
,                                   '– \n'
,                                   'check_assertions_prompt '
,                                   '(langchain.prompts.prompt.PromptTemplate) '
,                                   '– \n'
,                                   'revised_summary_prompt '
,                                   '(langchain.prompts.prompt.PromptTemplate) '
,                                   '– \n'
,                                   'are_all_true_prompt '
,                                   '(langchain.prompts.prompt.PromptTemplate) '
,                                   '– \n'
,                                   'input_key (str) – \n'
,                                   'output_key (str) – \n'
,                                   'max_checks (int) – \n'
,                                   'Return type\n'
,                                   'None',
,                           'url': 'https://api.python.langchain.com/en/latest/modules/chains.html'},
,              'score': 0.79580605,
,              'values': []},
,             {'id': '993db45b-4e3b-431d-a2a6-48ed5944912a-1',
,              'metadata': {'chunk': 1.0,
,                           'text': '[docs]    @classmethod\n'
,                                   '    def from_llm(\n'
,                                   '        cls,\n'
,                                   '        llm: BaseLanguageModel,\n'
,                                   '        chain: LLMChain,\n'
,                                   '        critique_prompt: '
,                                   'BasePromptTemplate = CRITIQUE_PROMPT,\n'
,                                   '        revision_prompt: '
,                                   'BasePromptTemplate = REVISION_PROMPT,\n'
,                                   '        **kwargs: Any,\n'
,                                   '    ) -> "ConstitutionalChain":\n'
,                                   '        """Create a chain from an LLM."""\n'
,                                   '        critique_chain = LLMChain(llm=llm, '
,                                   'prompt=critique_prompt)\n'
,                                   '        revision_chain = LLMChain(llm=llm, '
,                                   'prompt=revision_prompt)\n'
,                                   '        return cls(\n'
,                                   '            chain=chain,\n'
,                                   '            '
,                                   'critique_chain=critique_chain,\n'
,                                   '            '
,                                   'revision_chain=revision_chain,\n'
,                                   '            **kwargs,\n'
,                                   '        )\n'
,                                   '    @property\n'
,                                   '    def input_keys(self) -> List[str]:\n'
,                                   '        """Defines the input keys."""\n'
,                                   '        return self.chain.input_keys\n'
,                                   '    @property\n'
,                                   '    def output_keys(self) -> List[str]:\n'
,                                   '        """Defines the output keys."""\n'
,                                   '        if '
,                                   'self.return_intermediate_steps:\n'
,                                   '            return ["output", '
,                                   '"critiques_and_revisions", '
,                                   '"initial_output"]\n'
,                                   '        return ["output"]\n'
,                                   '    def _call(\n'
,                                   '        self,\n'
,                                   '        inputs: Dict[str, Any],\n'
,                                   '        run_manager: '
,                                   'Optional[CallbackManagerForChainRun] = '
,                                   'None,\n'
,                                   '    ) -> Dict[str, Any]:\n'
,                                   '        _run_manager = run_manager or '
,                                   'CallbackManagerForChainRun.get_noop_manager()\n'
,                                   '        response = self.chain.run(\n'
,                                   '            **inputs,\n'
,                                   '            '
,                                   'callbacks=_run_manager.get_child("original"),\n'
,                                   '        )\n'
,                                   '        initial_response = response\n'
,                                   '        input_prompt = '
,                                   'self.chain.prompt.format(**inputs)\n'
,                                   '        _run_manager.on_text(\n'
,                                   '            text="Initial response: " + '
,                                   'response + "\\n\\n",\n'
,                                   '            verbose=self.verbose,\n'
,                                   '            color="yellow",\n'
,                                   '        )\n'
,                                   '        critiques_and_revisions = []\n'
,                                   '        for constitutional_principle in '
,                                   'self.constitutional_principles:\n'
,                                   '            # Do critique\n'
,                                   '            raw_critique = '
,                                   'self.critique_chain.run(\n'
,                                   '                '
,                                   'input_prompt=input_prompt,\n'
,                                   '                '
,                                   'output_from_model=response,\n'
,                                   '                '
,                                   'critique_request=constitutional_principle.critique_request,\n'
,                                   '                '
,                                   'callbacks=_run_manager.get_child("critique"),\n'
,                                   '            )\n'
,                                   '            critique = '
,                                   'self._parse_critique(\n'
,                                   '                '
,                                   'output_string=raw_critique,',
,                           'url': 'https://api.python.langchain.com/en/latest/_modules/langchain/chains/constitutional_ai/base.html'},
,              'score': 0.79369247,
,              'values': []},
,             {'id': 'adea5d40-2691-4bc9-9403-3360345bc25e-0',
,              'metadata': {'chunk': 0.0,
,                           'text': 'Source code for '
,                                   'langchain.chains.conversation.base\n'
,                                   '"""Chain that carries on a conversation '
,                                   'and calls an LLM."""\n'
,                                   'from typing import Dict, List\n'
,                                   'from pydantic import Extra, Field, '
,                                   'root_validator\n'
,                                   'from langchain.chains.conversation.prompt '
,                                   'import PROMPT\n'
,                                   'from langchain.chains.llm import LLMChain\n'
,                                   'from langchain.memory.buffer import '
,                                   'ConversationBufferMemory\n'
,                                   'from langchain.prompts.base import '
,                                   'BasePromptTemplate\n'
,                                   'from langchain.schema import BaseMemory\n'
,                                   '[docs]class ConversationChain(LLMChain):\n'
,                                   '    """Chain to have a conversation and '
,                                   'load context from memory.\n'
,                                   '    Example:\n'
,                                   '        .. code-block:: python\n'
,                                   '            from langchain import '
,                                   'ConversationChain, OpenAI\n'
,                                   '            conversation = '
,                                   'ConversationChain(llm=OpenAI())\n'
,                                   '    """\n'
,                                   '    memory: BaseMemory = '
,                                   'Field(default_factory=ConversationBufferMemory)\n'
,                                   '    """Default memory store."""\n'
,                                   '    prompt: BasePromptTemplate = PROMPT\n'
,                                   '    """Default conversation prompt to '
,                                   'use."""\n'
,                                   '    input_key: str = "input"  #: :meta '
,                                   'private:\n'
,                                   '    output_key: str = "response"  #: :meta '
,                                   'private:\n'
,                                   '    class Config:\n'
,                                   '        """Configuration for this pydantic '
,                                   'object."""\n'
,                                   '        extra = Extra.forbid\n'
,                                   '        arbitrary_types_allowed = True\n'
,                                   '    @property\n'
,                                   '    def input_keys(self) -> List[str]:\n'
,                                   '        """Use this since so some prompt '
,                                   'vars come from history."""\n'
,                                   '        return [self.input_key]\n'
,                                   '    @root_validator()\n'
,                                   '    def '
,                                   'validate_prompt_input_variables(cls, '
,                                   'values: Dict) -> Dict:\n'
,                                   '        """Validate that prompt input '
,                                   'variables are consistent."""\n'
,                                   '        memory_keys = '
,                                   'values["memory"].memory_variables\n'
,                                   '        input_key = values["input_key"]\n'
,                                   '        if input_key in memory_keys:\n'
,                                   '            raise ValueError(\n'
,                                   '                f"The input key '
,                                   '{input_key} was also found in the memory '
,                                   'keys "\n'
,                                   '                f"({memory_keys}) - please '
,                                   'provide keys that don\'t overlap."\n'
,                                   '            )\n'
,                                   '        prompt_variables = '
,                                   'values["prompt"].input_variables\n'
,                                   '        expected_keys = memory_keys + '
,                                   '[input_key]\n'
,                                   '        if set(expected_keys) != '
,                                   'set(prompt_variables):\n'
,                                   '            raise ValueError(\n'
,                                   '                "Got unexpected prompt '
,                                   'input variables. The prompt expects "\n'
,                                   '                f"{prompt_variables}, but '
,                                   'got {memory_keys} as inputs from "\n'
,                                   '                f"memory, and {input_key} '
,                                   'as the normal input key."\n'
,                                   '            )\n'
,                                   '        return values',
,                           'url': 'https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversation/base.html'},
,              'score': 0.792259932,
,              'values': []},
,             {'id': '3b6f9660-0346-4992-a6f5-b9cc2977f446-5',
,              'metadata': {'chunk': 5.0,
,                           'text': 'callbacks: Callbacks = None,\n'
,                                   '        **kwargs: Any,\n'
,                                   '    ) -> '
,                                   'BaseConversationalRetrievalChain:\n'
,                                   '        """Load chain from LLM."""\n'
,                                   '        combine_docs_chain_kwargs = '
,                                   'combine_docs_chain_kwargs or {}\n'
,                                   '        doc_chain = load_qa_chain(\n'
,                                   '            llm,\n'
,                                   '            chain_type=chain_type,\n'
,                                   '            callbacks=callbacks,\n'
,                                   '            **combine_docs_chain_kwargs,\n'
,                                   '        )\n'
,                                   '        condense_question_chain = '
,                                   'LLMChain(\n'
,                                   '            llm=llm, '
,                                   'prompt=condense_question_prompt, '
,                                   'callbacks=callbacks\n'
,                                   '        )\n'
,                                   '        return cls(\n'
,                                   '            vectorstore=vectorstore,\n'
,                                   '            combine_docs_chain=doc_chain,\n'
,                                   '            '
,                                   'question_generator=condense_question_chain,\n'
,                                   '            callbacks=callbacks,\n'
,                                   '            **kwargs,\n'
,                                   '        )',
,                           'url': 'https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html'},
,              'score': 0.791279614,
,              'values': []}],
, 'namespace': ''}

With retrieval complete, we move on to feeding these into GPT-4 to produce answers.

Retrieval Augmented Generation

GPT-4 is currently accessed via the ChatCompletions endpoint of OpenAI. To add the information we retrieved into the model, we need to pass it into our user prompts alongside our original query. We can do that like so:

[22]
[27]
Source code for langchain.chains.llm
"""Chain that just formats a prompt and calls an LLM."""
from __future__ import annotations
import warnings
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union
from pydantic import Extra, Field
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import (
    AsyncCallbackManager,
    AsyncCallbackManagerForChainRun,
    CallbackManager,
    CallbackManagerForChainRun,
    Callbacks,
)
from langchain.chains.base import Chain
from langchain.input import get_colored_text
from langchain.load.dump import dumpd
from langchain.prompts.base import BasePromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import (
    BaseLLMOutputParser,
    LLMResult,
    NoOpOutputParser,
    PromptValue,
)
[docs]class LLMChain(Chain):
    """Chain to run queries against LLMs.
    Example:
        .. code-block:: python
            from langchain import LLMChain, OpenAI, PromptTemplate
            prompt_template = "Tell me a {adjective} joke"
            prompt = PromptTemplate(
                input_variables=["adjective"], template=prompt_template
            )
            llm = LLMChain(llm=OpenAI(), prompt=prompt)
    """
    @property
    def lc_serializable(self) -> bool:
        return True
    prompt: BasePromptTemplate
    """Prompt object to use."""
    llm: BaseLanguageModel
    """Language model to call."""
    output_key: str = "text"  #: :meta private:
    output_parser: BaseLLMOutputParser = Field(default_factory=NoOpOutputParser)
    """Output parser to use.
    Defaults to one that takes the most likely string but does not change it 
    otherwise."""
    return_final_only: bool = True
    """Whether to return only the final parsed result. Defaults to True.
    If false, will return a bunch of extra information about the generation."""
    llm_kwargs: dict = Field(default_factory=dict)
    class Config:
        """Configuration for this pydantic object."""
        extra = Extra.forbid
        arbitrary_types_allowed = True

---

Bases: langchain.chains.base.Chain
Chain for question-answering with self-verification.
Example
from langchain import OpenAI, LLMSummarizationCheckerChain
llm = OpenAI(temperature=0.0)
checker_chain = LLMSummarizationCheckerChain.from_llm(llm)
Parameters
memory (Optional[langchain.schema.BaseMemory]) – 
callbacks (Optional[Union[List[langchain.callbacks.base.BaseCallbackHandler], langchain.callbacks.base.BaseCallbackManager]]) – 
callback_manager (Optional[langchain.callbacks.base.BaseCallbackManager]) – 
verbose (bool) – 
tags (Optional[List[str]]) – 
sequential_chain (langchain.chains.sequential.SequentialChain) – 
llm (Optional[langchain.base_language.BaseLanguageModel]) – 
create_assertions_prompt (langchain.prompts.prompt.PromptTemplate) – 
check_assertions_prompt (langchain.prompts.prompt.PromptTemplate) – 
revised_summary_prompt (langchain.prompts.prompt.PromptTemplate) – 
are_all_true_prompt (langchain.prompts.prompt.PromptTemplate) – 
input_key (str) – 
output_key (str) – 
max_checks (int) – 
Return type
None

---

[docs]    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        chain: LLMChain,
        critique_prompt: BasePromptTemplate = CRITIQUE_PROMPT,
        revision_prompt: BasePromptTemplate = REVISION_PROMPT,
        **kwargs: Any,
    ) -> "ConstitutionalChain":
        """Create a chain from an LLM."""
        critique_chain = LLMChain(llm=llm, prompt=critique_prompt)
        revision_chain = LLMChain(llm=llm, prompt=revision_prompt)
        return cls(
            chain=chain,
            critique_chain=critique_chain,
            revision_chain=revision_chain,
            **kwargs,
        )
    @property
    def input_keys(self) -> List[str]:
        """Defines the input keys."""
        return self.chain.input_keys
    @property
    def output_keys(self) -> List[str]:
        """Defines the output keys."""
        if self.return_intermediate_steps:
            return ["output", "critiques_and_revisions", "initial_output"]
        return ["output"]
    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        response = self.chain.run(
            **inputs,
            callbacks=_run_manager.get_child("original"),
        )
        initial_response = response
        input_prompt = self.chain.prompt.format(**inputs)
        _run_manager.on_text(
            text="Initial response: " + response + "\n\n",
            verbose=self.verbose,
            color="yellow",
        )
        critiques_and_revisions = []
        for constitutional_principle in self.constitutional_principles:
            # Do critique
            raw_critique = self.critique_chain.run(
                input_prompt=input_prompt,
                output_from_model=response,
                critique_request=constitutional_principle.critique_request,
                callbacks=_run_manager.get_child("critique"),
            )
            critique = self._parse_critique(
                output_string=raw_critique,

---

Source code for langchain.chains.conversation.base
"""Chain that carries on a conversation and calls an LLM."""
from typing import Dict, List
from pydantic import Extra, Field, root_validator
from langchain.chains.conversation.prompt import PROMPT
from langchain.chains.llm import LLMChain
from langchain.memory.buffer import ConversationBufferMemory
from langchain.prompts.base import BasePromptTemplate
from langchain.schema import BaseMemory
[docs]class ConversationChain(LLMChain):
    """Chain to have a conversation and load context from memory.
    Example:
        .. code-block:: python
            from langchain import ConversationChain, OpenAI
            conversation = ConversationChain(llm=OpenAI())
    """
    memory: BaseMemory = Field(default_factory=ConversationBufferMemory)
    """Default memory store."""
    prompt: BasePromptTemplate = PROMPT
    """Default conversation prompt to use."""
    input_key: str = "input"  #: :meta private:
    output_key: str = "response"  #: :meta private:
    class Config:
        """Configuration for this pydantic object."""
        extra = Extra.forbid
        arbitrary_types_allowed = True
    @property
    def input_keys(self) -> List[str]:
        """Use this since so some prompt vars come from history."""
        return [self.input_key]
    @root_validator()
    def validate_prompt_input_variables(cls, values: Dict) -> Dict:
        """Validate that prompt input variables are consistent."""
        memory_keys = values["memory"].memory_variables
        input_key = values["input_key"]
        if input_key in memory_keys:
            raise ValueError(
                f"The input key {input_key} was also found in the memory keys "
                f"({memory_keys}) - please provide keys that don't overlap."
            )
        prompt_variables = values["prompt"].input_variables
        expected_keys = memory_keys + [input_key]
        if set(expected_keys) != set(prompt_variables):
            raise ValueError(
                "Got unexpected prompt input variables. The prompt expects "
                f"{prompt_variables}, but got {memory_keys} as inputs from "
                f"memory, and {input_key} as the normal input key."
            )
        return values

---

callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> BaseConversationalRetrievalChain:
        """Load chain from LLM."""
        combine_docs_chain_kwargs = combine_docs_chain_kwargs or {}
        doc_chain = load_qa_chain(
            llm,
            chain_type=chain_type,
            callbacks=callbacks,
            **combine_docs_chain_kwargs,
        )
        condense_question_chain = LLMChain(
            llm=llm, prompt=condense_question_prompt, callbacks=callbacks
        )
        return cls(
            vectorstore=vectorstore,
            combine_docs_chain=doc_chain,
            question_generator=condense_question_chain,
            callbacks=callbacks,
            **kwargs,
        )

-----

how do I use the LLMChain in LangChain?

Now we ask the question:

[28]

To display this response nicely, we will display it in markdown.

[29]

Let's compare this to a non-augmented query...

[30]

If we drop the "I don't know" part of the primer?

[31]

Then we see something even worse than "I don't know" — hallucinations. Clearly augmenting our queries with additional context can make a huge difference to the performance of our system.

Great, we've seen how to augment GPT-4 with semantic search to allow us to answer LangChain specific queries.

Once you're finished, we delete the index to save resources.

[32]