# Background Jobs

Long-running and deferrable work is offloaded to background jobs processed by [RQ (Redis Queue)](https://python-rq.org/). The Plone instance enqueues jobs into Redis; a separate worker process picks them up and runs them outside the request/response cycle.

Examples of background work in `wcs.backend` include cache invalidation (Cloudflare/CDN), Matomo statistics collection, external content synchronisation, and RAG embedding generation.

A set of REST endpoints (`@rq-queues`, `@rq-job`) exposes the live state of the queues for monitoring and operations.

## Redis Connection

The worker and the instance connect to the same Redis server, configured through the `PLONE_REDIS_DSN` environment variable. When unset, the instance falls back to `redis://localhost:6379/0`. The worker requires the variable to be set.

```bash
export PLONE_REDIS_DSN="redis://localhost:6379/0"
```

## Queue Tiers

There are three queues, named by priority:

| Queue       | Purpose |
|-------------|---------|
| `important` | High-priority and scheduled jobs -- e.g. scheduled cache invalidation. |
| `normal`    | The default queue for most jobs -- cache invalidation, Matomo stats, external sync, etc. |
| `low`       | Background, deferrable work that should not compete with the rest -- e.g. RAG embedding generation. |

The worker listens to all three in priority order (`important`, then `normal`, then `low`), so an important job is always picked up before a normal or low one when several are waiting.

```text
┌──────────────────────────────┐
│        Plone instance        │
│   enqueue / schedule jobs    │
└───────────────┬──────────────┘
                │
                ▼
      ┌────────────────────┐
      │       Redis        │
      │  important ──┐     │
      │  normal    ──┼──▶ consumed in priority order
      │  low       ──┘     │
      └─────────┬──────────┘
                │
                ▼
      ┌────────────────────┐
      │     RQ Worker      │
      │  runs the job func │
      └────────────────────┘
```

## How Jobs Are Enqueued and Processed

A job is a plain Python function plus its arguments.
The instance places it on one of the queues; the worker imports the function and executes it. Jobs may be enqueued for immediate processing or scheduled for a future time, and they can declare retry behaviour, a timeout, and how long the result is kept.

Internally the codebase obtains a queue (the default `normal` queue, the `important` queue, or the `low` queue) and enqueues or schedules a job onto it. Scheduled jobs land in the `important` queue's scheduled registry and are released to the worker at their due time -- this is why the worker runs with the scheduler enabled.

Typical job options used in this codebase include:

- **retry** -- automatic retries with a back-off interval (e.g. cache invalidation retries up to 3 times).
- **job_timeout** -- maximum run time before the job is considered failed.
- **result_ttl** -- how long the return value / job record is retained in Redis.

## Running the Worker

The worker is a standalone process that preloads the `wcs.backend` and `collective.elasticsearch` code and then consumes the three queues with the scheduler enabled. It needs `PLONE_REDIS_DSN` pointing at the same Redis the instance uses.

```bash
export PLONE_REDIS_DSN="redis://localhost:6379/0"
bin/python -m wcs.backend.worker
```

Run one or more workers depending on throughput needs. Because the worker preloads the application code, jobs can import and use anything available in the `wcs.backend` package.

## Monitoring REST API

Two read-only endpoints expose the live state of the queues. They are available on the Plone site root and are intended for operators and dashboards.

The job and registry data returned mirror RQ's own model. Each queue reports its pending `count` and a set of **registries** that track jobs by lifecycle stage:

- `started_job_registry`
- `deferred_job_registry`
- `canceled_job_registry`
- `failed_job_registry`
- `finished_job_registry`
- `scheduled_job_registry`

### GET @rq-queues

Lists all queues with their counts and registry summaries.
```http
GET /Plone/@rq-queues HTTP/1.1
Host: localhost:8080
Accept: application/json
```

Response:

```json
{
  "@id": "@rq-queues",
  "items": [
    {
      "@id": "@rq-queues/important",
      "name": "important",
      "count": 0,
      "registries": [
        {"@id": "@rq-queues/important/started_job_registry", "name": "started_job_registry", "count": 0},
        {"@id": "@rq-queues/important/deferred_job_registry", "name": "deferred_job_registry", "count": 0},
        {"@id": "@rq-queues/important/canceled_job_registry", "name": "canceled_job_registry", "count": 0},
        {"@id": "@rq-queues/important/failed_job_registry", "name": "failed_job_registry", "count": 0},
        {"@id": "@rq-queues/important/finished_job_registry", "name": "finished_job_registry", "count": 0},
        {"@id": "@rq-queues/important/scheduled_job_registry", "name": "scheduled_job_registry", "count": 0}
      ]
    }
  ]
}
```

Queues are returned sorted by name (`important`, `low`, `normal`).

### GET @rq-queues/{queue}

Returns a single queue, identified by name, in the same shape as one item of the list above.

```http
GET /Plone/@rq-queues/normal HTTP/1.1
Host: localhost:8080
Accept: application/json
```

### GET @rq-queues/{queue}/{registry}

Returns one registry of a queue, expanded to include the jobs it currently holds. Use this to inspect, for example, the failed or scheduled jobs of a queue.
```http
GET /Plone/@rq-queues/normal/failed_job_registry HTTP/1.1
Host: localhost:8080
Accept: application/json
```

Response:

```json
{
  "@id": "@rq-queues/normal/failed_job_registry",
  "name": "failed_job_registry",
  "count": 1,
  "items": [
    {
      "id": "a1b2c3...",
      "description": "wcs.backend.tasks.cache.run_async_invalidate_cache_for_url('https://...')",
      "args": ["https://example.com/page"],
      "created": "2026-06-03T08:00:00",
      "enqueued": "2026-06-03T08:00:00",
      "ended": "2026-06-03T08:00:05",
      "scheduled": null,
      "func_name": "wcs.backend.tasks.cache.run_async_invalidate_cache_for_url",
      "retries_left": 0,
      "return_value": null,
      "ttl": null,
      "result_ttl": 86400,
      "last_job_result": "Traceback (most recent call last): ..."
    }
  ]
}
```

### GET @rq-job/{queue}/{job_id}

Returns a single job by id from a named queue. Useful for polling the status of a specific job after it was enqueued.

```http
GET /Plone/@rq-job/normal/a1b2c3... HTTP/1.1
Host: localhost:8080
Accept: application/json
```

Each job serialization includes its `id`, `description`, `args`, the `created` / `enqueued` / `ended` / `scheduled` timestamps, `func_name`, `retries_left`, `return_value`, `ttl`, `result_ttl`, and `last_job_result` (the exception string of the last run, when the job failed). A missing job returns an empty object.
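The fields above are enough to derive a coarse status for a job on the client side. The following is a minimal sketch, not part of `wcs.backend`: the helper name `classify_job` and the status labels are hypothetical, and it assumes that `last_job_result` is only populated when the last run failed and that a job without an `ended` timestamp is still waiting or running.

```python
def classify_job(job):
    """Map a serialized @rq-job payload to a coarse status string.

    Hypothetical helper; the status labels are illustrative only.
    """
    if not job:
        # The endpoint returns an empty object for a missing job.
        return "missing"
    if job.get("ended"):
        # Assumption: last_job_result holds the exception string of the
        # last run only when that run failed.
        return "failed" if job.get("last_job_result") else "finished"
    if job.get("scheduled"):
        # Not yet released to the worker.
        return "scheduled"
    # Enqueued or currently running; the payload lists no start timestamp,
    # so the two cases cannot be told apart here.
    return "pending"


failed_job = {
    "id": "a1b2c3...",
    "ended": "2026-06-03T08:00:05",
    "scheduled": None,
    "retries_left": 0,
    "return_value": None,
    "last_job_result": "Traceback (most recent call last): ...",
}
print(classify_job(failed_job))
```

A failed job that still has `retries_left` may be run again, so a dashboard might want to treat that case separately; this sketch does not.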
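#### Consuming the monitoring endpoints

**JavaScript -- list queues:**

```javascript
async function listQueues(siteUrl, token) {
  const response = await fetch(`${siteUrl}/@rq-queues`, {
    headers: {
      'Accept': 'application/json',
      'Authorization': `Bearer ${token}`
    }
  });
  // Surface HTTP errors instead of trying to parse an error page as JSON.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.items;
}

// Usage
const queues = await listQueues('http://localhost:8080/Plone', token);
queues.forEach(q => console.log(`${q.name}: ${q.count} pending`));
```

**Python -- poll a job until it finishes:**

```python
import time

import requests


def wait_for_job(site_url, queue, job_id, auth, timeout=60, interval=1):
    """Poll @rq-job until the job has ended or `timeout` seconds have passed."""
    url = f'{site_url}/@rq-job/{queue}/{job_id}'
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.get(
            url,
            auth=auth,
            headers={'Accept': 'application/json'},
            timeout=10,
        )
        response.raise_for_status()
        job = response.json()
        if not job:
            # The endpoint returns an empty object for a missing job.
            raise RuntimeError('Job not found')
        if job['ended']:
            return job
        time.sleep(interval)
    raise TimeoutError(f'Job {job_id} did not finish within {timeout} seconds')


job = wait_for_job(
    'http://localhost:8080/Plone',
    'normal',
    'a1b2c3...',
    ('admin', 'password'),
)
# last_job_result holds the exception string when the job failed.
print(job['last_job_result'] or job['return_value'])
```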
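
**Python -- summarise queue counts:**

For dashboards it can help to reduce the `@rq-queues` response to a few numbers per queue. The sketch below works on an already decoded response and relies only on the `items`, `name`, `count`, and `registries` keys shown in the example response above. The function names `summarize_queues` and `queues_with_failures` are hypothetical, and the counts in the sample payload are made up for illustration.

```python
def summarize_queues(payload):
    """Return {queue name: {"pending": int, "failed": int}} for an @rq-queues response."""
    summary = {}
    for queue in payload.get("items", []):
        # Index the registry summaries by name for easy lookup.
        registries = {r["name"]: r["count"] for r in queue.get("registries", [])}
        summary[queue["name"]] = {
            "pending": queue["count"],
            "failed": registries.get("failed_job_registry", 0),
        }
    return summary


def queues_with_failures(payload):
    """Return the sorted names of queues whose failed registry is not empty."""
    summary = summarize_queues(payload)
    return sorted(name for name, counts in summary.items() if counts["failed"] > 0)


# Illustrative payload in the shape of the @rq-queues response (counts invented).
sample = {
    "@id": "@rq-queues",
    "items": [
        {
            "name": "important",
            "count": 0,
            "registries": [{"name": "failed_job_registry", "count": 0}],
        },
        {
            "name": "normal",
            "count": 4,
            "registries": [{"name": "failed_job_registry", "count": 1}],
        },
    ],
}

for name, counts in summarize_queues(sample).items():
    print(f"{name}: {counts['pending']} pending, {counts['failed']} failed")
print("Queues with failures:", queues_with_failures(sample))
```

The `items` list returned by the `listQueues` or `requests`-based calls above can be passed in as `{"items": items}`; fetching and summarising are kept separate so the summary logic can be exercised without a running site.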