PaperQA's documentation provides a code snippet for creating a reusable document index:
import os

from paperqa import Settings
from paperqa.agents.main import agent_query
from paperqa.agents.models import QueryRequest
from paperqa.agents.search import get_directory_index


async def amain(folder_of_papers: str | os.PathLike) -> None:
    settings = Settings(paper_directory=folder_of_papers)

    # 1. Build the index. Note an index name is autogenerated when unspecified
    built_index = await get_directory_index(settings=settings)
    print(settings.get_index_name())  # Display the autogenerated index name
    print(await built_index.index_files)  # Display the index contents

    # 2. Use the settings as many times as you want with ask
    answer_response_1 = await agent_query(
        query=QueryRequest(
            query="What is the best way to make a vaccine?", settings=settings
        )
    )
    answer_response_2 = await agent_query(
        query=QueryRequest(
            query="What manufacturing challenges are unique to bispecific antibodies?",
            settings=settings,
        )
    )
This builds an index by adding papers automatically, but it does not seem possible to set a "citation" and a "docname" for each paper, as can be done when adding papers to a Docs() object. See the following:
for doc in tqdm(file_list):
    try:
        docs.add(str("./Papers/ToAdd/"+str(doc)),
                 citation="File " + doc,
                 docname=doc,
                 settings=settings,
                 embedding_model=hybrid_model)
        shutil.move(str("./Papers/ToAdd/"+str(doc)),str("./Papers/Added/"+str(doc)))
        added_doc += 1
    except Exception as e:
        # sometimes this happens if PDFs aren't downloaded or readable
        print("Could not read", doc, e)
        shutil.move(str("./Papers/ToAdd/"+str(doc)),str("./Papers/Failed/"+str(doc)))
        failed_doc += 1
        continue
with open("./Docs/my_docs3.pkl", "wb") as f:
    compress_pickle.dump(docs, f, compression="lzma")
So I wonder whether there is a way to add papers manually and iteratively to an empty SearchIndex() object, as can be done with Docs() objects.
Also, the documentation provides a snippet to fill a Docs() object with papers from ZoteroDB:
from paperqa import Docs
from paperqa.contrib import ZoteroDB

docs = Docs()
zotero = ZoteroDB(library_type="user")  # "group" if group library

for item in zotero.iterate(limit=20):
    if item.num_pages > 30:
        continue  # skip long papers
    docs.add(item.pdf, docname=item.key)
Is there a way to create a reusable index from ZoteroDB?
To manually add papers to a SearchIndex object in PaperQA, you can use the add_document method of the SearchIndex class. This lets you add documents together with metadata such as 'citation' and 'docname'. Here's a general approach:
1. Create an index_doc dictionary: Include metadata like 'title', 'year', 'file_location', and 'body' of the document.
2. Prepare the document: If you have a document object, prepare it for storage.
3. Use the add_document method: Add the document to the index by passing the index_doc dictionary and the document object.
Example:
await search_index.add_document(
    {
        "title": "Document Title",
        "year": "2023",
        "file_location": "/path/to/document",
        "body": "The full text of the document goes here.",
    },
    document=your_document_object,  # Optional
)
This method handles the addition of the document to the index, including writing the document to storage if needed [1].
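As a rough illustration of the iterative pattern asked about in the question, here is a minimal sketch of a loop that builds one index_doc dictionary per paper and passes it to add_document. It is not taken from PaperQA: `Paper`, `InMemoryIndex`, `to_index_doc`, and `add_papers` are hypothetical names, and `InMemoryIndex` is only a stand-in so the loop can run without PaperQA installed. The assumption is that a real index object exposes an awaitable `add_document(index_doc, document=...)` as shown in the example above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol


class IndexLike(Protocol):
    """Any object with an awaitable add_document, as in the example above."""

    async def add_document(
        self, index_doc: dict[str, str], document: Any = None
    ) -> None: ...


@dataclass
class Paper:
    """Hypothetical per-paper record (not a PaperQA type)."""

    path: str
    title: str
    year: str
    body: str


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in index that just collects the dictionaries."""

    entries: list[dict[str, str]] = field(default_factory=list)

    async def add_document(
        self, index_doc: dict[str, str], document: Any = None
    ) -> None:
        self.entries.append(index_doc)


def to_index_doc(paper: Paper) -> dict[str, str]:
    # Same four keys as the index_doc shown in the example above.
    return {
        "title": paper.title,
        "year": paper.year,
        "file_location": paper.path,
        "body": paper.body,
    }


async def add_papers(index: IndexLike, papers: list[Paper]) -> tuple[int, int]:
    """Add papers one by one; return (added, failed) counts."""
    added = failed = 0
    for paper in papers:
        try:
            await index.add_document(to_index_doc(paper))
            added += 1
        except Exception as exc:  # e.g. unreadable or missing PDF
            print("Could not index", paper.path, exc)
            failed += 1
    return added, failed
```

To try it, something like `asyncio.run(add_papers(InMemoryIndex(), papers))` should work. With a real index, per-paper values such as citation and docname would presumably travel in the object passed as `document` (for example a Docs built with `docs.add(..., citation=..., docname=...)`), which is an assumption rather than confirmed behaviour.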
For creating a reusable index from ZoteroDB, you can iterate over items in the Zotero library using the ZoteroDB class and add them to a Docs object. This process involves fetching metadata and PDFs from Zotero and adding them to the Docs object, which can then be used to build an index. Here's a snippet for adding documents from Zotero:
from paperqa import Docs
from paperqa.contrib import ZoteroDB

docs = Docs()
zotero = ZoteroDB(library_type="user")  # "group" if group library

for item in zotero.iterate(limit=20):
    if item.num_pages > 30:
        continue  # skip long papers
    docs.add(item.pdf, docname=item.key)
This method handles metadata by using the Zotero library to fetch and manage document details [2].
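To connect the Zotero loop to an index, one option is to turn each Zotero item into the index_doc dictionary format described earlier. The sketch below shows only that conversion step, under stated assumptions: `ZoteroItem` is a hypothetical stand-in with the `key`, `pdf`, and `num_pages` attributes used in the snippet above, while `title` and `year` are assumed fields that may not exist under those names on real Zotero items. `read_text` is a hypothetical callable that you would supply to extract the text of a PDF.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class ZoteroItem:
    """Hypothetical stand-in for an item yielded by zotero.iterate()."""

    key: str
    pdf: str
    num_pages: int
    title: str = ""  # assumed field
    year: str = ""  # assumed field


def build_index_docs(
    items: Iterable[ZoteroItem],
    read_text: Callable[[str], str],
    max_pages: int = 30,
) -> Iterator[dict[str, str]]:
    """Yield one index_doc dictionary per item that is short enough."""
    for item in items:
        if item.num_pages > max_pages:
            continue  # skip long papers, as in the snippet above
        yield {
            # Fall back to the Zotero key when no title is available.
            "title": item.title or item.key,
            "year": item.year,
            "file_location": str(item.pdf),
            "body": read_text(item.pdf),
        }
```

Each yielded dictionary could then be passed to `add_document` in a loop, so the index is filled from Zotero one paper at a time.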