
Gemini Rate Limit Issue #786

Open
gurugecl opened this issue Jan 3, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@gurugecl

gurugecl commented Jan 3, 2025

I am running into a rate limit issue when using Gemini 2.0 Flash. According to this thread in the LiteLLM repo (7188), I believe the model is supported. The rate limit error comes on the very first query to the docs object, and no amount of additional waiting or exponential backoff seems to resolve it, so I suspect something happening behind the scenes in either PaperQA or LiteLLM; however, the LiteLLM maintainers do not believe this is an issue on their end (7358).

Here is how I am using paperqa. Can you please confirm whether this is set up correctly to avoid rate limits, and if not, how I should configure it? Thank you!

import io
import os
import pickle

from paperqa import Docs, Settings

# pdf_path, pdf_file, relevancy, and model_input are defined elsewhere
docs = Docs()

docs.add(pdf_path, pdf_file, docname=pdf_file)

# Round-trip the Docs object through an in-memory pickle buffer
pickle_buffer = io.BytesIO()
pickle.dump(docs, pickle_buffer)
pickle_buffer.seek(0)

docs = pickle.load(pickle_buffer)

model = 'gemini/gemini-2.0-flash-exp'

settings = Settings(
    llm=model,
    summary_llm=model,
    llm_config={
        "model_list": [{
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_key": os.environ.get('GEMINI_API_KEY')
            }
        }],
        "rate_limit": {
            "gemini/gemini-2.0-flash-exp": "10 per 1 minute"
        }
    },
    summary_llm_config={
        "model_list": [{
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_key": os.environ.get('GEMINI_API_KEY')
            }
        }],
        "rate_limit": {
            "gemini/gemini-2.0-flash-exp": "10 per 1 minute"
        }
    }
)

max_choices = len(list(docs.docnames))
settings.answer.answer_max_sources = max_choices
settings.answer.evidence_k = relevancy * max_choices

model_response = docs.query(model_input, settings=settings)

Here are some of the logs/errors:

litellm.acompletion(model=gpt-4o-2024-08-06) 200 OK
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrying request to /embeddings in 0.377732 seconds
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Using in-memory rate limiter.
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

Error during paper query after 0 retries: Timeout (60.00999999999663 secs): rate limit for key: ('client', 'gemini/gemini-2.0-flash-exp')

dosubot bot commented Jan 3, 2025

Hey there, @gurugecl! I'm here to help you with any bugs, questions, or contributions you may have.

The rate limit issue you're encountering with the Gemini 2.0 Flash model in the paperqa library might be related to how the Docs object and the settings are configured. Here are a few points to consider:

  1. Concurrent Requests: The Docs class in paperqa uses asynchronous methods, which can lead to multiple concurrent requests. Ensure that the max_concurrent_requests setting is configured to a value that respects the rate limits of the Gemini API. If this setting is too high, it could cause rate limit issues even on the first query [1].

  2. Rate Limit Configuration: You have specified a rate limit of "10 per 1 minute" for the Gemini model. Verify that this matches the actual rate limit provided by the Gemini API; if the API's limit is lower, adjust your configuration accordingly (see the sketch after this list).

  3. In-Memory Rate Limiter: The logs mention using an in-memory rate limiter. Ensure that this rate limiter is correctly configured to handle the rate limits specified in your settings. If the rate limiter is not properly managing the requests, it could lead to immediate rate limit errors.

  4. Retry Logic: The error message indicates a timeout after 0 retries. Consider implementing a more robust retry mechanism with exponential backoff to handle transient rate limit errors more effectively (see the backoff sketch below).

  5. API Key and Model Configuration: Double-check that the API key and model configurations in litellm_params are correct and that the API key has the necessary permissions for the Gemini model.
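
For point 2, note that paperqa's rate_limit values are limits-style strings of the form "<count> per <interval>", as in your own config. A minimal sketch, with a placeholder count that you should replace with your actual Gemini quota:

import os

from paperqa import Settings

model = "gemini/gemini-2.0-flash-exp"
settings = Settings(
    llm=model,
    llm_config={
        "model_list": [{
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_key": os.environ.get("GEMINI_API_KEY"),
            },
        }],
        # limits-style string: "<count> per <interval>"; the count below is
        # a placeholder, replace it with your actual Gemini quota
        "rate_limit": {model: "1000000 per 1 minute"},
    },
)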

By addressing these points, you should be able to mitigate the rate limit issues you're experiencing. If the problem persists, further investigation into the specific interactions between paperqa, LiteLLM, and the Gemini API might be necessary.
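
For point 4, a generic exponential-backoff wrapper could look like the sketch below. This is not part of the paperqa API; narrow the except clause to the exception type your stack actually raises:

import time

def query_with_backoff(docs, prompt, settings, max_retries=5):
    # Retry docs.query with exponential backoff on transient errors
    for attempt in range(max_retries):
        try:
            return docs.query(prompt, settings=settings)
        except Exception as exc:  # narrow to your rate-limit exception type
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait}s")
            time.sleep(wait)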


@gurugecl
Author

gurugecl commented Jan 4, 2025

Resolved by modifying the Docs setup, even though it had seemed to work without issue initially.

@derspotter

I think your rate limit is wrong:

"gemini/gemini-2.0-flash-exp": "10 per 1 minute"

This limit is not RPM (requests per minute) but TPM (tokens per minute), so it should be 4 million.
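
In paperqa's limits-string format, that would be something like the line below (a sketch, assuming the 4 million tokens-per-minute quota mentioned above):

# a sketch: use this value in the "rate_limit" entries of both llm_config
# and summary_llm_config from the original post (4M TPM assumed)
rate_limit = {"gemini/gemini-2.0-flash-exp": "4000000 per 1 minute"}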

@Exemocaro

Hi @gurugecl, could you please share your working code? I'm having trouble setting this up with Gemini, and you seem to have made it work. I tried the code you posted here, but it's missing a lot of pieces.
