Extensions allow the Collector to ingest content from sources other than local files, such as video platforms or code repositories.

YouTube Extension

Module: app/extensions/youtube.py

Fetches transcripts from YouTube videos. It tries three methods in order and falls through to the next when one fails:
  1. Primary: youtube_transcript_api (an unofficial transcript API library).
  2. Secondary: yt-dlp (extracts subtitles/captions).
  3. Fallback: playwright (browser-based automation).
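The ordered fallback described above can be sketched as a small loop over fetcher functions. This is an illustration only, not the code in app/extensions/youtube.py: the name fetch_with_fallback and the fetcher signature (video ID in, transcript text or None out) are assumptions, and the real backends are represented here by stand-in functions.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical signature: each backend takes a video ID and returns the
# transcript text, returns None if nothing was found, or raises on failure.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    video_id: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each fetcher in order; return (backend name, transcript) from the first success."""
    errors = []
    for name, fetch in fetchers:
        try:
            text = fetch(video_id)
        except Exception as exc:  # any backend failure moves on to the next one
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all transcript fetchers failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stand-ins for the three backends; the real ones would wrap
    # youtube_transcript_api, yt-dlp, and playwright respectively.
    def primary(video_id: str) -> Optional[str]:
        raise ConnectionError("request blocked")

    def secondary(video_id: str) -> Optional[str]:
        return None  # no captions found

    def fallback(video_id: str) -> Optional[str]:
        return f"transcript for {video_id}"

    backend, transcript = fetch_with_fallback(
        "VIDEO_ID",
        [("primary", primary), ("secondary", secondary), ("fallback", fallback)],
    )
    print(backend, transcript)
```

Collecting the per-backend errors makes the final exception useful for debugging, since it shows why each method was skipped.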

Cookies Configuration

For reliable extraction (especially for age-restricted or member-only content), you can provide a Netscape-format cookies file.
  • Env Variable: Set YT_COOKIES_FILE to the path of your cookies.txt file.
  • Default: If the variable is not set, the extension looks for app/cookies/youtube_cookies.txt.
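The lookup order for the cookies file might be expressed as follows. This is a minimal sketch under two assumptions that the text above does not confirm: that YT_COOKIES_FILE takes precedence over the default path, and that a missing file means extraction proceeds without cookies. The function name resolve_cookies_file is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location named in the documentation, relative to the project root.
DEFAULT_COOKIES_PATH = Path("app/cookies/youtube_cookies.txt")


def resolve_cookies_file(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the cookies file to use, or None if no usable file exists."""
    env = os.environ if env is None else env
    configured = env.get("YT_COOKIES_FILE")
    # Assumption: the environment variable wins over the default path.
    candidate = Path(configured) if configured else DEFAULT_COOKIES_PATH
    # Assumption: a missing file is not an error; the caller runs without cookies.
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = resolve_cookies_file()
    print(f"Using cookies file: {path}" if path else "No cookies file found")
```

Passing the environment as a parameter keeps the function easy to test without modifying os.environ.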

GitHub Extension

Module: app/extensions/github.py

Clones and processes repositories.
  • Endpoint: /ext/github/process
  • Auth: Requires GITHUB_TOKEN in environment variables.

GitLab Extension

Module: app/extensions/gitlab.py

Similar to GitHub, but for GitLab repositories.
  • Auth: Requires GITLAB_TOKEN.
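Because both repository extensions depend on a token in the environment, it can help to check for it before sending any request. The helper below is a hypothetical sketch (require_token and MissingTokenError are illustrative names, not part of the Collector); it only shows how a missing GITHUB_TOKEN or GITLAB_TOKEN could be reported early.

```python
import os
from typing import Mapping, Optional


class MissingTokenError(RuntimeError):
    """Raised when a required access token is absent from the environment."""


def require_token(var_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token stored in var_name, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise MissingTokenError(
            f"{var_name} is not set; the extension cannot authenticate"
        )
    return token


if __name__ == "__main__":
    for name in ("GITHUB_TOKEN", "GITLAB_TOKEN"):
        try:
            require_token(name)
            print(f"{name}: found")
        except MissingTokenError as exc:
            print(exc)
```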