Getting started with Oracle Vector Store support in LangChain

In this post, I would like to show you the basics of how to use the Oracle Vector Store support in LangChain. I am using Visual Studio Code with the Python and Jupyter extensions from Microsoft installed. I will show more detailed usage in future posts!


To get started, create a new project in Visual Studio Code, then create a new Jupyter Notebook using File > New File… and choose Jupyter Notebook as the file type. Save your new file as getting_started.ipynb.

First, we need to set up the Python runtime environment. Click on the Select Kernel button (it's in the top right). Select Python Environments, then Create Python Environment. Select the option to create a Venv (virtual environment) and choose your Python interpreter; I recommend using at least Python 3.11. This will download all the necessary files and take a minute or two.
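If you prefer the terminal to the VS Code dialog, you can create the same kind of virtual environment by hand. A minimal sketch, assuming python3 is on your PATH:

```shell
# Create a virtual environment in the project folder and activate it
python3 -m venv .venv
source .venv/bin/activate
python --version
```

Either way, VS Code will detect the .venv folder and offer it as a kernel for the notebook.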

In this example, we will use OpenAI for our chat model. You'll need to get an API key from OpenAI, which you can do by logging into https://platform.openai.com/settings/organization/api-keys and creating a key. Of course, you could use a different model, including a self-hosted model, so that you don't have to send your data outside your organization. I'll cover that in future posts, so stay tuned!

In the first cell, check that the type is Python and enter this code:

%pip install -qU "langchain[openai]"

Press Shift+Enter or click on the Run icon to run this code block. This will also take a minute or so to install the LangChain library for OpenAI.

Now create a second cell and paste in this code:

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

model = init_chat_model("gpt-4o-mini", model_provider="openai")

Run this block, and when it prompts you for your key, paste it in. The key will start with something like sk-proj followed by a long string of mostly letters and numbers. This saves your key in an environment variable so that you don't have to keep entering it each time during this session.
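If you'd rather not paste the key into the notebook at all, you can export it in your shell before launching VS Code; the code above only prompts when the environment variable is not already set. A sketch with a placeholder value (replace it with your real key):

```shell
# Set the key once in your shell session; the notebook will pick it up
# and skip the getpass prompt. Replace the placeholder with your real key.
export OPENAI_API_KEY="sk-proj-REPLACE-ME"
```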

Now, we are ready to talk to the LLM! Let’s try a simple prompt. Create a new cell and enter this code:

model.invoke("Hello, world!")

Run this cell and observe the output. You should see an AIMessage response containing the model's reply.

Great, now we are ready to connect to a vector store. If you don't already have one, start up an instance of Oracle Database 23ai in a container on your machine. Run this command in a terminal window (not the notebook):

docker run -d --name db23ai \
  -p 1521:1521 \
  -e ORACLE_PWD=Welcome12345 \
  -v db23ai-volume:/opt/oracle/oradata \
  container-registry.oracle.com/database/free:latest

This will start up an Oracle Database 23ai Free instance in a container. It will have a PDB called FREEPDB1 and the password for PDBADMIN (and SYS and SYSTEM) will be Welcome12345.

Now, run the following command to create an Oracle user with appropriate permissions to create a vector store:

docker exec -i db23ai sqlplus sys/Welcome12345@localhost:1521/FREEPDB1 as sysdba <<EOF
create user vector identified by vector;
grant connect, resource, unlimited tablespace, create credential, create procedure, create any index to vector;
EOF

Let’s connect to the database! First we’ll install the oracledb library. Create a new cell and enter this code:

%pip install oracledb

Run this code block to install the library.

Now create a new code block with this code:

import oracledb

username = "vector"
password = "vector"
dsn = "localhost:1521/FREEPDB1"

try:
    connection = oracledb.connect(
        user=username,
        password=password,
        dsn=dsn)
    print("Connection successful!")
except Exception as e:
    print(f"Connection failed: {e}")

Run this code block. You should see the output “Connection successful!”
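The dsn we used is an Oracle "easy connect" string of the form host:port/service_name, which matches the container we started (port 1521, PDB FREEPDB1). A tiny sketch of how it's assembled, should you need to point at a different host or service:

```python
# Easy connect string: host:port/service_name
host = "localhost"
port = 1521
service_name = "FREEPDB1"  # the PDB inside the free container

dsn = f"{host}:{port}/{service_name}"
print(dsn)  # localhost:1521/FREEPDB1
```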

Now, let’s install the dependencies we will need to load some documents into the vector store. Create a new cell with this code and run it:

%pip install langchain-community langchain-huggingface

Now, import the things we will need by creating a new cell with this code and running it:

from langchain_community.vectorstores import oraclevs
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

We are going to need some documents to load into the vector store, so let’s define some to use for an example. In real life, you’d probably want to use your own non-public documents to load a vector store if you were building a chatbot or using retrieval augmented generation. Create and run a new cell with this code:

documents_json_list = [
    {
        "id": "moby_dick_2701_P1",
        "text": "Queequeg was a native of Rokovoko, an island far away to the West and South. It is not down in any map; true places never are.",
        "link": "https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0012",
    },
    {
        "id": "moby_dick_2701_P2",
        "text": "It was not a great while after the affair of the pipe, that one morning shortly after breakfast, Ahab, as was his wont, ascended the cabin-gangway to the deck. There most sea-captains usually walk at that hour, as country gentlemen, after the same meal, take a few turns in the garden.",
        "link": "https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0036",
    },
    {
        "id": "moby_dick_2701_P3",
        "text": "Now, from the South and West the Pequod was drawing nigh to Formosa and the Bashee Isles, between which lies one of the tropical outlets from the China waters into the Pacific. And so Starbuck found Ahab with a general chart of the oriental archipelagoes spread before him; and another separate one representing the long eastern coasts of the Japanese islands—Niphon, Matsmai, and Sikoke. ",
        "link": "https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0109",
    },
]

Now, let’s load them into a LangChain documents list with some metadata. Create and run a cell with this code:

# Create Langchain Documents

documents_langchain = []

for doc in documents_json_list:
    metadata = {"id": doc["id"], "link": doc["link"]}
    doc_langchain = Document(page_content=doc["text"], metadata=metadata)
    documents_langchain.append(doc_langchain)
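Conceptually, the loop above just pairs each passage's text with its metadata. Here is the same transformation sketched with plain dictionaries (so it runs without LangChain installed); the real code produces Document objects with the same shape:

```python
# Stand-in data in the same shape as documents_json_list above
documents_json_list = [
    {"id": "doc1", "text": "first passage", "link": "https://example.com/1"},
    {"id": "doc2", "text": "second passage", "link": "https://example.com/2"},
]

# Each entry mirrors a Document: page_content plus a metadata dict
documents_plain = [
    {"page_content": d["text"], "metadata": {"id": d["id"], "link": d["link"]}}
    for d in documents_json_list
]

print(documents_plain[0]["metadata"]["id"])  # doc1
```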

Ok, great. Now we can create a vector store and load those documents. Create and run a cell with this code:

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

vector_store = OracleVS.from_documents(
    documents_langchain,
    embeddings,
    client=connection,
    table_name="Documents_COSINE",
    distance_strategy=DistanceStrategy.COSINE,
)
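The DistanceStrategy.COSINE setting tells the vector store to rank documents by cosine distance between embedding vectors, i.e. by the angle between them rather than their magnitudes. A quick pure-Python illustration of that metric (the database computes this for you; this is just to show what "cosine distance" means):

```python
import math

# Cosine distance = 1 - cosine similarity
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

A distance of 0 means the vectors point the same way (most similar); larger distances mean less similar.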

Let’s have a look in the database and see what was created. Run this code in your terminal:

docker exec -i db23ai sqlplus vector/vector@localhost:1521/FREEPDB1 <<EOF
select table_name from user_tables;
describe documents_cosine;
column id format a20;
column text format a30;
column metadata format a30;
column embedding format a30;
set linesize 150;
select * from documents_cosine;
EOF

You should see output similar to this:

SQL>
TABLE_NAME
--------------------------------------------------------------------------------
DOCUMENTS_COSINE

SQL>  Name                                         Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL RAW(16)
 TEXT                                               CLOB
 METADATA                                           JSON
 EMBEDDING                                          VECTOR(768, FLOAT32)

SQL> SQL> SQL> SQL> SQL> SQL>
ID                   TEXT                           METADATA                       EMBEDDING
-------------------- ------------------------------ ------------------------------ ------------------------------
957B602A0B55C487     Now, from the South and West t {"id":"moby_dick_2701_P3","lin [9.29364376E-003,-5.70030287E-
                     he Pequod was drawing nigh to  k":"https://www.gutenberg.org/ 002,-4.62282933E-002,-1.599499
                     Formosa and the Bash           cache/epub/2701/pg27           58E-002,

A8A71597D56432FD     Queequeg was a native of Rokov {"id":"moby_dick_2701_P1","lin [4.28722538E-002,-8.80071707E-
                     oko, an island far away to the k":"https://www.gutenberg.org/ 003,3.56001826E-003,6.765306E-
                      West and South. It            cache/epub/2701/pg27           003,

E7675836CF07A695     It was not a great while after {"id":"moby_dick_2701_P2","lin [1.06763924E-002,3.91203648E-0
                      the affair of the pipe, that  k":"https://www.gutenberg.org/ 04,-1.01576066E-002,-3.5316135
                     one morning shortly            cache/epub/2701/pg27           7E-002,

Now, let’s do a vector similarity search. Create and run a cell with this code:

query = 'Where is Rokovoko?'
print(vector_store.similarity_search(query, 1))

query2 = 'What does Ahab like to do after breakfast?'
print(vector_store.similarity_search(query2, 1))

This will find the single nearest match in each case (the second argument is k, the number of results). You should get answers like this:

[Document(metadata={'id': 'moby_dick_2701_P1', 'link': 'https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0012'}, page_content='Queequeg was a native of Rokovoko, an island far away to the West and South. It is not down in any map; true places never are.')]

[Document(metadata={'id': 'moby_dick_2701_P2', 'link': 'https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0036'}, page_content='It was not a great while after the affair of the pipe, that one morning shortly after breakfast, Ahab, as was his wont, ascended the cabin-gangway to the deck. There most sea-captains usually walk at that hour, as country gentlemen, after the same meal, take a few turns in the garden.')]
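Each result above is a Document, and in practice you usually want just the text and the source link out of it. A sketch of unpacking the results, using a hypothetical stand-in class so it runs without a database (with LangChain you would iterate over the real similarity_search output the same way):

```python
# Stand-in for langchain_core.documents.Document, for illustration only
class Document:
    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

# Pretend this came back from vector_store.similarity_search(query, 1)
results = [
    Document(
        page_content="Queequeg was a native of Rokovoko, an island far away.",
        metadata={"id": "doc1", "link": "https://example.com/doc1"},
    )
]

# Pull out the fields you care about from each hit
for doc in results:
    print(doc.metadata["link"])
    print(doc.page_content)
```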

Well, there you go, that’s the most basic example of creating a vector store, loading some documents into it and doing a simple similarity search. Stay tuned to learn about more advanced features!

About Mark Nelson

Mark Nelson is a Developer Evangelist at Oracle, focusing on microservices and messaging. Before this role, Mark was an Architect in the Enterprise Cloud-Native Java Team and the Verrazzano Enterprise Container Platform project, worked on Wercker and WebLogic, was a senior member of the A-Team from 2010, worked in Sales Consulting at Oracle from 2006, and held various roles at IBM from 1994.
