arXiv Operations Runbook

Use this runbook when arXiv lookups are slow, rate-limited, or behaving unexpectedly.

Signals to Check

  • Logs for arxiv.request_scheduled, arxiv.request_completed, arxiv.cooldown_activated, arxiv.cache_hit, arxiv.cache_miss
  • arxiv_runtime_state row for cooldown and next-allowed timestamps
  • arxiv_query_cache_entries size and expiry churn

Event Field Guide

Field                       Description
wait_seconds                Enforced pre-request delay from the global limiter
status_code                 Upstream response code from arXiv
cooldown_remaining_seconds  Remaining cooldown when blocked or after a 429
source_path                 Caller path (search or lookup_ids)

Quick SQL Checks

Runtime State

SELECT state_key, next_allowed_at, cooldown_until, updated_at
FROM arxiv_runtime_state;

Cache Status

SELECT count(*) AS cache_rows,
       min(expires_at) AS earliest_expiry,
       max(expires_at) AS latest_expiry
FROM arxiv_query_cache_entries;

Common Scenarios

1. Repeated arxiv.cooldown_activated Events

  • Confirm recent 429 statuses in arxiv.request_completed logs.
  • Reduce caller pressure: check that the title-quality and identifier guards are active.
  • Temporarily raise ARXIV_RATE_LIMIT_COOLDOWN_SECONDS if upstream remains strict.
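
To confirm how many recent completions were 429s, the status codes can be tallied straight from the logs. The sketch below is only an illustration: it assumes logs are JSON lines with an "event" key next to the documented status_code field, and count_status_codes is a hypothetical helper, not part of the service.

```python
import json
from collections import Counter


def count_status_codes(log_lines, event_name="arxiv.request_completed"):
    """Tally status_code values for one event type in JSON-lines logs.

    Assumption: each log line is a JSON object with an "event" key.
    Adjust the key names to match the actual log schema.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(record, dict):
            continue
        if record.get("event") == event_name:
            counts[record.get("status_code")] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    '{"event": "arxiv.request_completed", "status_code": 200}',
    '{"event": "arxiv.request_completed", "status_code": 429}',
    '{"event": "arxiv.cooldown_activated", "cooldown_remaining_seconds": 60.0}',
    '{"event": "arxiv.request_completed", "status_code": 429}',
]
counts = count_status_codes(sample)
print(counts[429], "of", sum(counts.values()), "completed requests returned 429")
```

A high share of 429s alongside repeated arxiv.cooldown_activated events suggests the callers, rather than the limiter settings, need attention first.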

2. High Request Latency with Few Completions

  • Inspect wait_seconds in arxiv.request_scheduled.
  • Use source_path to verify that only one caller path is repeatedly hitting arXiv.
  • Confirm the cache is enabled (ARXIV_CACHE_TTL_SECONDS > 0) and effective (arxiv.cache_hit events appear).
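
The wait_seconds and source_path checks above can be combined into one summary per caller path. This is a minimal sketch, assuming events have already been parsed into dictionaries with an "event" key; summarize_waits is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def summarize_waits(events):
    """Group arxiv.request_scheduled events by source_path and summarize wait_seconds.

    Assumption: each event is a dict with "event", "source_path" and
    "wait_seconds" keys; missing values fall back to "unknown" and 0.0.
    """
    waits_by_path = defaultdict(list)
    for event in events:
        if event.get("event") != "arxiv.request_scheduled":
            continue
        path = event.get("source_path", "unknown")
        waits_by_path[path].append(float(event.get("wait_seconds", 0.0)))
    return {
        path: {"count": len(waits), "mean_wait": mean(waits), "max_wait": max(waits)}
        for path, waits in waits_by_path.items()
    }


# Hypothetical sample events, for illustration only.
sample_events = [
    {"event": "arxiv.request_scheduled", "source_path": "search", "wait_seconds": 4.0},
    {"event": "arxiv.request_scheduled", "source_path": "search", "wait_seconds": 8.0},
    {"event": "arxiv.request_scheduled", "source_path": "lookup_ids", "wait_seconds": 0.0},
    {"event": "arxiv.cache_hit"},
]
summary = summarize_waits(sample_events)
for path, stats in sorted(summary.items()):
    print(path, stats)
```

If one path shows a much higher count and mean wait than the other, that caller is the likely source of queueing behind the global limiter.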

3. Low Cache Effectiveness

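  • Validate query normalization (equivalent queries should map to the same cache entry) and check for caller churn (callers issuing many distinct queries).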
  • Increase ARXIV_CACHE_TTL_SECONDS for stable workloads.
  • Increase ARXIV_CACHE_MAX_ENTRIES if heavy eviction is observed.
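
Before changing the TTL or entry limit, it helps to put a number on cache effectiveness. The sketch below computes a hit ratio from arxiv.cache_hit and arxiv.cache_miss events, again assuming events are dictionaries with an "event" key; cache_hit_ratio is a hypothetical helper.

```python
def cache_hit_ratio(events):
    """Return hits / (hits + misses), or None if no cache events were seen.

    Assumption: each event is a dict whose "event" key holds the event name.
    """
    hits = sum(1 for e in events if e.get("event") == "arxiv.cache_hit")
    misses = sum(1 for e in events if e.get("event") == "arxiv.cache_miss")
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Hypothetical sample events, for illustration only.
sample_events = [
    {"event": "arxiv.cache_miss"},
    {"event": "arxiv.cache_hit"},
    {"event": "arxiv.cache_miss"},
    {"event": "arxiv.cache_miss"},
    {"event": "arxiv.request_completed", "status_code": 200},
]
ratio = cache_hit_ratio(sample_events)
print("cache hit ratio:", ratio)
```

Comparing the ratio before and after a configuration change shows whether the change actually helped.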

Environment Variables

Variable                           Default  Description
ARXIV_ENABLED                      1        Enable arXiv lookups
ARXIV_TIMEOUT_SECONDS              3.0      Request timeout
ARXIV_MIN_INTERVAL_SECONDS         4.0      Min interval between requests
ARXIV_RATE_LIMIT_COOLDOWN_SECONDS  60.0     Cooldown after 429
ARXIV_DEFAULT_MAX_RESULTS          3        Max results per query
ARXIV_CACHE_TTL_SECONDS            900      Cache TTL (15 min)
ARXIV_CACHE_MAX_ENTRIES            512      Max cached queries
ARXIV_MAILTO                       (empty)  Contact email for API headers
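
As a rough illustration of how these variables and defaults fit together, the sketch below reads them into a typed settings object. ArxivSettings and load_settings are hypothetical names, and the parsing rules are assumptions (in particular, that ARXIV_ENABLED is on only when set to "1"); the real service may parse them differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ArxivSettings:
    enabled: bool
    timeout_seconds: float
    min_interval_seconds: float
    rate_limit_cooldown_seconds: float
    default_max_results: int
    cache_ttl_seconds: int
    cache_max_entries: int
    mailto: str

    @property
    def cache_enabled(self) -> bool:
        # The runbook treats the cache as enabled when the TTL is above zero.
        return self.cache_ttl_seconds > 0


def load_settings(env: Optional[Mapping[str, str]] = None) -> ArxivSettings:
    """Build settings from a mapping (os.environ by default), using the documented defaults."""
    env = os.environ if env is None else env
    return ArxivSettings(
        # Assumption: only the literal string "1" enables lookups.
        enabled=env.get("ARXIV_ENABLED", "1") == "1",
        timeout_seconds=float(env.get("ARXIV_TIMEOUT_SECONDS", "3.0")),
        min_interval_seconds=float(env.get("ARXIV_MIN_INTERVAL_SECONDS", "4.0")),
        rate_limit_cooldown_seconds=float(env.get("ARXIV_RATE_LIMIT_COOLDOWN_SECONDS", "60.0")),
        default_max_results=int(env.get("ARXIV_DEFAULT_MAX_RESULTS", "3")),
        cache_ttl_seconds=int(env.get("ARXIV_CACHE_TTL_SECONDS", "900")),
        cache_max_entries=int(env.get("ARXIV_CACHE_MAX_ENTRIES", "512")),
        mailto=env.get("ARXIV_MAILTO", ""),
    )


# An empty mapping yields the documented defaults.
defaults = load_settings({})
print(defaults)

# Example override: a longer cooldown while upstream remains strict.
strict = load_settings({"ARXIV_RATE_LIMIT_COOLDOWN_SECONDS": "120"})
print(strict.rate_limit_cooldown_seconds)
```

Passing an explicit mapping instead of reading os.environ directly makes it easy to check an intended override before applying it to a running deployment.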

Safe Recovery

  1. Pause automated ingestion if rate-limit storms persist.
  2. Let cooldown expire naturally; avoid manual burst retries.
  3. Resume and monitor event rates before restoring full load.
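
When waiting for the cooldown to expire, the remaining wait can be estimated from the next_allowed_at and cooldown_until columns of arxiv_runtime_state. The sketch below assumes both values are timezone-aware timestamps, that either may be NULL, and that requests are blocked until the later of the two has passed; these semantics are an assumption, and seconds_until_allowed is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_allowed(
    next_allowed_at: Optional[datetime],
    cooldown_until: Optional[datetime],
    now: Optional[datetime] = None,
) -> float:
    """Seconds to wait before the next request, under the assumptions above.

    Returns 0.0 when neither timestamp is set or both are already in the past.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    gates = [t for t in (next_allowed_at, cooldown_until) if t is not None]
    if not gates:
        return 0.0
    return max(0.0, (max(gates) - now).total_seconds())


# Hypothetical values, as they might be read from the arxiv_runtime_state row.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
remaining = seconds_until_allowed(
    next_allowed_at=now + timedelta(seconds=4),
    cooldown_until=now + timedelta(seconds=60),
    now=now,
)
print("seconds until requests are allowed:", remaining)
```

Resuming ingestion only after this value reaches zero avoids the manual burst retries that step 2 warns against.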