save and import embeddings for faster answer generation #721
Comments
Hey @n400peanuts 👋, I'm here to help you with any issues or questions you have about PaperQA. Let's work together to solve any bugs or address any questions you have.

To save and import embeddings in PaperQA, and so avoid reloading files and recomputing embeddings each time you have a new question, you can pickle the Docs object:

import pickle
from paperqa import Docs

# Assuming you have already added documents to your Docs object
docs = Docs()
# Add documents to docs...

# Save the Docs object with embeddings
with open("my_docs.pkl", "wb") as f:
    pickle.dump(docs, f)

# Later, load the Docs object with embeddings
with open("my_docs.pkl", "rb") as f:
    docs = pickle.load(f)

By pickling the Docs object, you can reuse its stored embeddings instead of rebuilding them for every new question.
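The save-then-load pattern above can be wrapped in a small "load the cache if it exists, otherwise build it" helper. This is a minimal sketch using only the standard library; `load_or_build` and its `build` callback are hypothetical names introduced here for illustration and are not part of PaperQA.

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(cache_path: Path, build: Callable[[], T]) -> T:
    """Return the object pickled at cache_path; if absent, build and pickle it."""
    if cache_path.exists():
        # Cache hit: skip the expensive build step entirely.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: build once, then persist for later runs.
    obj = build()
    with cache_path.open("wb") as f:
        pickle.dump(obj, f)
    return obj
```

With PaperQA, `build` would presumably be a function that creates a `Docs` object, adds your papers to it, and returns it, so that only the first run pays the cost of parsing and embedding. Whether a pickled `Docs` loads cleanly across different PaperQA versions is not something this sketch guarantees.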
OK, let's take a step back, @dosu. Show me:
To add your documents to the Docs object, you can add them from a file path, from a URL, or from an open file object:

# From a local file path
from pathlib import Path
from paperqa import Docs

docs = Docs()
file_path = Path("path/to/your/document.pdf")
docs.add(file_path)

# From a URL
from paperqa import Docs

docs = Docs()
url = "http://example.com/document.pdf"
docs.add_url(url)

# From an open binary file object
from paperqa import Docs

docs = Docs()
with open("path/to/your/document.pdf", "rb") as file:
    docs.add_file(file)

Once your documents are added, embeddings are generated for them. To use these embeddings to generate an answer, you can use the query method:

query = "Your question here"
session = docs.query(query)
answer = session.formatted_answer
print(answer)

This process retrieves relevant texts using the embeddings and generates an answer based on your question [1][2].
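To picture what "retrieves relevant texts using the embeddings" means, here is a toy sketch of embedding-based retrieval: each text chunk has a vector, and the chunks whose vectors are most similar to the question's vector are returned. The function names and the two-dimensional vectors are made up for illustration; this is not PaperQA's implementation, and its actual retrieval strategy may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunks: Sequence[tuple[str, Sequence[float]]],
    k: int = 2,
) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

Because the chunk vectors do not depend on the question, they only need to be computed once per document set, which is why saving them between sessions pays off.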
@dosu, show me how to configure Docs to use the following embedding model when generating the embeddings it stores:
Hello,
Thank you for the fantastic work on PaperQA. I've been able to use it to ask questions by providing over 100 papers as input, and I've been using only local models via Ollama. Everything is working well, but I'd like to know how I can avoid reloading the same files and retraining an embedding model each time I have a new question.
Is there a way to save the vector store and load it later, so it can be used by the LLM to generate answers? I couldn't find documentation about that. I found something about caching, but it's unclear to me how to use it properly. Could you provide some help?
Best wishes
My code so far: