Leveraging Langchain Library to Build Interactive CLI Tools for Document Retrieval with ChatGPT

This article introduces two command-line interface tools, built with the Langchain library and OpenAI's GPT model, for interactive document retrieval. The first tool, a Python script, works with several file types, including .csv, .json, and .txt, and uses ChatGPT as an AI layer to answer questions based on the content of those files. The second tool extends these capabilities to PDF files. By parsing and indexing user documents, the tools answer queries directly from the command line with concise, pertinent information. The tools can also be aliased in the .zshrc file for easier access. Together, they aim to make document interaction and information retrieval more efficient.

Posted by Gregory Pacheco on July 17, 2023
Command-line interfaces (CLIs) have long been a staple of software development and system administration. CLI tools are known for their efficiency and the level of control they offer users. Today, we present two new CLI tools built on the Langchain library and OpenAI's GPT model, known as ChatGPT.

Our first tool is designed to interact with ChatGPT, using it as an AI layer atop your personal documents, providing you with detailed and accurate answers based on your files' content. Whether it's a .csv, .json, or a .txt file, the tool can parse through these files and provide relevant responses to your queries.
Let's delve into the Python script that makes this possible:

Link for Langchain: Langchain

Ideally, create a virtual environment and install the following dependency:

pip install langchain

Our script starts by importing the necessary modules, including the Langchain library's modules for loading documents, creating indexes, and communicating with OpenAI's models. The OpenAI API key is set using a constant defined elsewhere in your code. An argparse.ArgumentParser() instance manages the command-line arguments, allowing users to specify the directory containing the documents for analysis. If the user doesn't provide a directory, the current working directory is used as the default.
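
As a rough illustration of the argument handling and key setup described above, here is a minimal sketch that uses only the standard library. The function names parse_args and configure_api_key are hypothetical, and treating the directory as an optional positional argument is an assumption; the original script may define it differently.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command line; the directory defaults to the current working directory."""
    parser = argparse.ArgumentParser(
        description="Ask questions about the documents in a directory."
    )
    parser.add_argument(
        "directory",
        nargs="?",            # makes the positional argument optional
        default=os.getcwd(),  # fall back to the current working directory
        help="directory containing the documents to analyze",
    )
    return parser.parse_args(argv)


def configure_api_key(api_key):
    """Expose the key through the environment variable the OpenAI client reads."""
    os.environ["OPENAI_API_KEY"] = api_key
```

Calling parse_args() without an explicit list reads sys.argv, so running the script with no directory falls back to the current working directory. In a setup like the one described, the key could come from a constant kept in a separate module, for example configure_api_key(constants.APIKEY), where the module and constant names are placeholders.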

The script then prompts the user for a query and sets up an instance of the OpenAI language model. Using the DirectoryLoader, the script loads all text documents in the specified directory. These documents are then indexed using VectorstoreIndexCreator.

When the index is created, the script uses the query method to retrieve information related to the user's query based on the indexed documents. The result is then printed to the console.
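
The load, index, and query steps described above might look roughly like the following sketch. It is not the original script: the function name ask_documents, the "**/*.txt" glob pattern, and the use of ChatOpenAI as the model are assumptions. The import paths match Langchain releases from around the time of this article (mid-2023); later releases reorganized these modules, so they may need adjusting. Running it also requires an OpenAI API key in the environment and the packages that the default vector store and embeddings depend on.

```python
import os


def ask_documents(directory, question):
    """Index the .txt files under `directory` and answer `question` from them."""
    if not os.path.isdir(directory):
        raise NotADirectoryError(directory)

    # Imported lazily so that a bad path is reported before the heavy imports.
    # Import paths as of mid-2023 Langchain releases (assumption for newer ones).
    from langchain.chat_models import ChatOpenAI
    from langchain.document_loaders import DirectoryLoader, TextLoader
    from langchain.indexes import VectorstoreIndexCreator

    # Assumption: only *.txt files are loaded; the glob pattern is illustrative.
    loader = DirectoryLoader(directory, glob="**/*.txt", loader_cls=TextLoader)

    # Splits and embeds the documents, then stores the vectors in an index.
    index = VectorstoreIndexCreator().from_loaders([loader])

    # Retrieves the most relevant chunks and asks the model to answer from them.
    return index.query(question, llm=ChatOpenAI())
```

A command-line entry point would read the query with input(), pass it to ask_documents together with the chosen directory, and print the returned string.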

To further simplify the usage of this tool, an alias can be configured in your .zshrc file. You can call the script using the alias gpt and optionally pass the directory path as an argument. This configuration brings your intelligent document query system right to your fingertips, every time you open your terminal.
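
To make the alias concrete, the small helper below builds the line that would be appended to .zshrc. The helper name zsh_alias_line, the python3 interpreter, and the script path /path/to/ask_docs.py are all placeholders chosen for illustration; substitute the real location of your script.

```python
import shlex


def zsh_alias_line(name, interpreter, script_path):
    """Return an `alias` line suitable for appending to ~/.zshrc."""
    # shlex.join quotes each word if needed (e.g. a path containing spaces);
    # shlex.quote then wraps the whole command for the alias definition.
    command = shlex.join([interpreter, script_path])
    return f"alias {name}={shlex.quote(command)}"


print(zsh_alias_line("gpt", "python3", "/path/to/ask_docs.py"))
# prints: alias gpt='python3 /path/to/ask_docs.py'
```

After the line is added and the shell is reloaded, typing gpt followed by an optional directory path runs the script against that directory.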

Our second tool carries the functionality of the first tool but extends its capabilities to PDF files. Here's the Python script for the PDF processing tool:

This script functions similarly to the previous one, but it's tailored to handle PDF files specifically using the PyPDFLoader from Langchain. It starts by loading a specified PDF file. Once the file is loaded, it is indexed using the same VectorstoreIndexCreator method as before.

Upon receiving a user query, the script again uses the query method to search the indexed content of the PDF file and returns a relevant response, which is then printed to the console.
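
A minimal sketch of the PDF flow described above follows. As before, it is an approximation, not the original script: the function name ask_pdf and the choice of ChatOpenAI are assumptions, the import paths reflect mid-2023 Langchain releases, and PyPDFLoader additionally needs the pypdf package installed.

```python
import os


def ask_pdf(pdf_path, question):
    """Index a single PDF file and answer `question` from its content."""
    if not pdf_path.lower().endswith(".pdf"):
        raise ValueError(f"not a PDF file: {pdf_path}")
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(pdf_path)

    # Imported lazily so that input errors are reported before the heavy imports.
    # Import paths as of mid-2023 Langchain releases (assumption for newer ones).
    from langchain.chat_models import ChatOpenAI
    from langchain.document_loaders import PyPDFLoader
    from langchain.indexes import VectorstoreIndexCreator

    # PyPDFLoader produces one document per page of the PDF.
    loader = PyPDFLoader(pdf_path)

    # Same indexing step as for the text-file tool.
    index = VectorstoreIndexCreator().from_loaders([loader])

    # Retrieves the most relevant page chunks and asks the model to answer.
    return index.query(question, llm=ChatOpenAI())
```

Checking the extension and the file's existence up front keeps a mistyped path from triggering a slower failure later, during loading or embedding.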

These tools open up new possibilities for document interaction and information retrieval: the result is an AI-powered system that not only stores your documents but can also answer questions about their content with relevant, concise responses. This can make document browsing and data retrieval faster and easier.