In this post, I would like to show you the basics of how to use the Oracle Vector Store support in LangChain. I am using Visual Studio Code with the Python and Jupyter extensions from Microsoft installed. I will show more detailed usage in future posts!
To get started, create a new project in Visual Studio Code, then create a new Jupyter Notebook using File > New File…, choose Jupyter Notebook as the file type, and save your new file as getting_started.ipynb.
First, we need to set up the Python runtime environment. Click the Select Kernel button (it's in the top right). Select Python Environment, then Create Python Environment. Choose the option to create a Venv (virtual environment) and pick your Python interpreter; I recommend using at least Python 3.11. This will download all the necessary files and take a minute or two.
In this example, we will use OpenAI for our chat model. You’ll need to get an API Key from OpenAI, which you can do by logging into https://platform.openai.com/settings/organization/api-keys and creating a key. Of course you could use a different model, including a self-hosted model so that you don’t have to send your data outside your organization. I’ll cover that in future posts, stay tuned!
In the first cell, check that the type is Python and enter this code:
%pip install -qU "langchain[openai]"
Press Shift+Enter or click on the Run icon to run this code block. This will also take a minute or so to install the LangChain library for OpenAI.
Now create a second cell and paste in this code:
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain.chat_models import init_chat_model
model = init_chat_model("gpt-4o-mini", model_provider="openai")
Run this block, and when it prompts you for your key, paste it in. The key will start with something like sk-proj, followed by a long string of mostly letters and numbers. This saves your key in the environment so that you don’t have to keep entering it each time.
Now, we are ready to talk to the LLM! Let’s try a simple prompt. Create a new cell and enter this code:
model.invoke("Hello, world!")
Run this cell and observe the output. It should look something like this:
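The exact wording varies from run to run, but the result is an AIMessage object roughly along these lines (metadata trimmed for readability):

```
AIMessage(content='Hello! How can I assist you today?', additional_kwargs={...}, response_metadata={...}, id='run-...')
```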
Great, now we are ready to connect to a vector store. If you don’t already have one, start up an instance of Oracle Database 23ai in a container on your machine by running this command in a terminal window (not in the notebook):
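The original command isn’t shown here, but a typical invocation, assuming the official Oracle Database Free image and the container name and password used in the commands below, looks like this:

```shell
# Start Oracle Database 23ai Free in a container named db23ai,
# exposing the listener on port 1521 and setting the SYS/SYSTEM/PDBADMIN password.
docker run -d --name db23ai \
  -p 1521:1521 \
  -e ORACLE_PWD=Welcome12345 \
  container-registry.oracle.com/database/free:latest
```

The first startup takes a few minutes while the database initializes; you can watch progress with `docker logs -f db23ai`.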
This will start up an Oracle Database 23ai Free instance in a container. It will have a PDB called FREEPDB1 and the password for PDBADMIN (and SYS and SYSTEM) will be Welcome12345.
Now, run the following command to create an Oracle user with appropriate permissions to create a vector store:
docker exec -i db23ai sqlplus sys/Welcome12345@localhost:1521/FREEPDB1 as sysdba <<EOF
alter session set container=FREEPDB1;
create user vector identified by vector;
grant connect, resource, unlimited tablespace, create credential, create procedure, create any index to vector;
commit;
EOF
Let’s connect to the database! First, we’ll install the oracledb library. Create a new cell and enter this code:
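The original install cell isn’t shown; a cell along these lines installs python-oracledb together with the LangChain integration packages imported in the next step (the exact package list is an assumption based on those imports):

```
%pip install -qU oracledb langchain-community langchain-huggingface sentence-transformers
```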
Now, import the things we will need by creating a new cell with this code and running it:
from langchain_community.vectorstores import oraclevs
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
We are going to need some documents to load into the vector store, so let’s define some to use as an example. In real life, you’d probably want to use your own non-public documents to load a vector store if you were building a chatbot or using retrieval-augmented generation. Create and run a new cell with this code:
documents_json_list = [
    {
        "id": "moby_dick_2701_P1",
        "text": "Queequeg was a native of Rokovoko, an island far away to the West and South. It is not down in any map; true places never are.",
        "link": "https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0012",
    },
    {
        "id": "moby_dick_2701_P2",
        "text": "It was not a great while after the affair of the pipe, that one morning shortly after breakfast, Ahab, as was his wont, ascended the cabin-gangway to the deck. There most sea-captains usually walk at that hour, as country gentlemen, after the same meal, take a few turns in the garden.",
        "link": "https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0036",
    },
    {
        "id": "moby_dick_2701_P3",
        "text": "Now, from the South and West the Pequod was drawing nigh to Formosa and the Bashee Isles, between which lies one of the tropical outlets from the China waters into the Pacific. And so Starbuck found Ahab with a general chart of the oriental archipelagoes spread before him; and another separate one representing the long eastern coasts of the Japanese islands—Niphon, Matsmai, and Sikoke.",
        "link": "https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0109",
    },
]
Now, let’s load them into a LangChain documents list with some metadata. Create and run a cell with this code:
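The original cell isn’t shown, so here is a sketch of what it typically contains: it wraps each JSON entry in a Document, connects to the database as the vector user created earlier, builds a local HuggingFace embedding model, and creates the vector store table with OracleVS.from_documents. The variable names and the embedding model are assumptions; all-mpnet-base-v2 produces 768-dimension vectors, which matches the VECTOR(768, FLOAT32) column you will see in the database shortly.

```python
import oracledb
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

# Wrap each JSON entry in a LangChain Document, keeping the id and link as metadata.
documents = [
    Document(
        page_content=doc["text"],
        metadata={"id": doc["id"], "link": doc["link"]},
    )
    for doc in documents_json_list
]

# Connect to the database as the vector user created earlier.
connection = oracledb.connect(
    user="vector", password="vector", dsn="localhost:1521/FREEPDB1"
)

# A local embedding model that runs on your machine (no data leaves it).
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Embed the documents and load them into a new DOCUMENTS_COSINE table,
# using cosine distance for similarity search.
vector_store = OracleVS.from_documents(
    documents,
    embeddings,
    client=connection,
    table_name="documents_cosine",
    distance_strategy=DistanceStrategy.COSINE,
)
```

The first run will download the embedding model from Hugging Face, so expect it to take a minute or two.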
Let’s have a look in the database and see what was created. Run this code in your terminal:
docker exec -i db23ai sqlplus vector/vector@localhost:1521/FREEPDB1 <<EOF
select table_name from user_tables;
describe documents_cosine;
column id format a20;
column text format a30;
column metadata format a30;
column embedding format a30;
set linesize 150;
select * from documents_cosine;
EOF
You should see output similar to this:
SQL>
TABLE_NAME
--------------------------------------------------------------------------------
DOCUMENTS_COSINE
SQL> Name Null? Type
----------------------------------------- -------- ----------------------------
ID NOT NULL RAW(16)
TEXT CLOB
METADATA JSON
EMBEDDING VECTOR(768, FLOAT32)
SQL> SQL> SQL> SQL> SQL> SQL>
ID TEXT METADATA EMBEDDING
-------------------- ------------------------------ ------------------------------ ------------------------------
957B602A0B55C487 Now, from the South and West t {"id":"moby_dick_2701_P3","lin [9.29364376E-003,-5.70030287E-
he Pequod was drawing nigh to k":"https://www.gutenberg.org/ 002,-4.62282933E-002,-1.599499
Formosa and the Bash cache/epub/2701/pg27 58E-002,
A8A71597D56432FD Queequeg was a native of Rokov {"id":"moby_dick_2701_P1","lin [4.28722538E-002,-8.80071707E-
oko, an island far away to the k":"https://www.gutenberg.org/ 003,3.56001826E-003,6.765306E-
West and South. It cache/epub/2701/pg27 003,
E7675836CF07A695 It was not a great while after {"id":"moby_dick_2701_P2","lin [1.06763924E-002,3.91203648E-0
the affair of the pipe, that k":"https://www.gutenberg.org/ 04,-1.01576066E-002,-3.5316135
one morning shortly cache/epub/2701/pg27 7E-002,
Now, let’s do a vector similarity search. Create and run a cell with this code:
query = 'Where is Rokovoko?'
print(vector_store.similarity_search(query, 1))
query2 = 'What does Ahab like to do after breakfast?'
print(vector_store.similarity_search(query2, 1))
This will find the one (1) nearest match in each case. You should get an answer like this:
[Document(metadata={'id': 'moby_dick_2701_P1', 'link': 'https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0012'}, page_content='Queequeg was a native of Rokovoko, an island far away to the West and South. It is not down in any map; true places never are.')]
[Document(metadata={'id': 'moby_dick_2701_P2', 'link': 'https://www.gutenberg.org/cache/epub/2701/pg2701-images.html#link2HCH0036'}, page_content='It was not a great while after the affair of the pipe, that one morning shortly after breakfast, Ahab, as was his wont, ascended the cabin-gangway to the deck. There most sea-captains usually walk at that hour, as country gentlemen, after the same meal, take a few turns in the garden.')]
Well, there you go: that’s the most basic example of creating a vector store, loading some documents into it, and performing a simple similarity search. Stay tuned to learn about more advanced features!
Mark Nelson is a Developer Evangelist at Oracle, focusing on microservices and messaging. Before this role, Mark was an Architect in the Enterprise Cloud-Native Java Team, the Verrazzano Enterprise Container Platform project, worked on Wercker, WebLogic and was a senior member of the A-Team since 2010, and worked in Sales Consulting at Oracle since 2006 and various roles at IBM since 1994.
Copyright 2009-2025 Mark Nelson and other contributors. All Rights Reserved. The views expressed in this blog are our own and do not necessarily reflect the views of Oracle Corporation. All content is provided on an ‘as is’ basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose. You are solely responsible for determining the appropriateness of using or redistributing and assume any risks.