Content Vault — novacaustic

Podcasts, notes, and saved posts go in. Searchable highlights, hooks, and angles come out.

The Prompt

1. Scaffold the working files first

Before you do anything else, suggest the workspace folder `~/Documents/content-vault-workflow/` and ask me to confirm or choose a different local path. After I confirm, create the folder and write these three files as your first act:

- `plan.md`: the durable runbook. Put the full workflow plan in it: folder layout, install choices, credential placeholders, scripts to create, trigger commands, output formats, review rules, and the success test. Re-read `plan.md` whenever context is compacted or a new session starts.
- `.env`: the local credential/config store. Create one placeholder per line, with no values yet:
`OPENAI_API_KEY=`
`OPENAI_VECTOR_STORE_ID=`
`READWISE_TOKEN=`
`CONTENT_VAULT_DIR=`
`OPENAI_TRANSCRIBE_MODEL=`
Tell me to open `.env` and fill the values directly, or prompt me for each value one at a time and write them into `.env` as they arrive. Never store secrets in `plan.md` or `progress.md`.
- `progress.md`: the step tracker. Append a timestamped line every time you finish a step, using this exact format: `## 2026-05-11 14:23 — completed: <step>` followed by `## next: <step>`. Re-read `progress.md` before each new action so you know what remains.
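
A minimal sketch of the `progress.md` append step, assuming the two-line entry format quoted above. The helper names `progress_entry` and `append_progress` are hypothetical, not part of the prompt.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def progress_entry(completed: str, next_step: str, now: datetime) -> str:
    """Format one tracker entry in the two-line shape the prompt specifies."""
    stamp = now.strftime("%Y-%m-%d %H:%M")
    return f"## {stamp} — completed: {completed}\n## next: {next_step}\n"


def append_progress(path: Path, completed: str, next_step: str,
                    now: Optional[datetime] = None) -> None:
    """Append (never overwrite) an entry so earlier steps stay in the log."""
    entry = progress_entry(completed, next_step, now or datetime.now())
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

Opening the file in append mode keeps the tracker additive, so a re-read before each action sees the full history.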

Also create the local vault directory from `CONTENT_VAULT_DIR`, defaulting to `<confirmed folder>/content_vault/` if I do not specify another path. Build these subfolders: `inbox/audio/`, `inbox/urls/`, `inbox/notes/`, `sources/`, `draft_angles/`, `logs/`, and `index/`.
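
The folder layout above could be scaffolded roughly like this. It is a sketch only: `resolve_vault_dir` and `scaffold_vault` are hypothetical helper names, and the fallback to `content_vault/` under the confirmed workspace follows the default described above.

```python
from pathlib import Path
from typing import List, Optional

# Subfolders named in the prompt, relative to the vault root.
VAULT_SUBFOLDERS = (
    "inbox/audio",
    "inbox/urls",
    "inbox/notes",
    "sources",
    "draft_angles",
    "logs",
    "index",
)


def resolve_vault_dir(env_value: Optional[str], workspace: Path) -> Path:
    """Use CONTENT_VAULT_DIR when set, else <workspace>/content_vault."""
    value = (env_value or "").strip()
    return Path(value).expanduser() if value else workspace / "content_vault"


def scaffold_vault(vault_dir: Path) -> List[Path]:
    """Create every subfolder; safe to re-run because existing folders are kept."""
    created = []
    for rel in VAULT_SUBFOLDERS:
        target = vault_dir / rel
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```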

2. Connect the required tools and accounts

Use a local Python script plus official APIs. Do not use MCP as the primary surface; this workflow needs file-system access, audio pre-processing, batch transcription, saved-post polling, deterministic Markdown writes, and retry logs. Do not make this CLI-only either; use direct SDK/API access from scripts because structured extraction and indexing need reliable response handling.

Set up Python inside the confirmed workspace:

```bash
python3 -m venv .venv
. .venv/bin/activate
pip install openai requests python-dotenv feedparser trafilatura beautifulsoup4 pydub
```

If this machine will process long podcasts or converted audio, check whether `ffmpeg` is installed. If it is missing, tell me to install it before long-file ingestion. Record the check and the install note in `plan.md`.

Create or update a custom skill in my current AI workspace at `<target-ai-workspace-skills>/content-vault/SKILL.md`. If my workspace has a different skill root, ask me for the configured skill root and use the same `content-vault/SKILL.md` placement under it. The skill must teach you these operating rules: treat `content_vault/` as the canonical store; never overwrite source notes without appending a revision; keep one Markdown file per source; extract only traceable highlights from the source text; label hooks and draft angles as generated; use semantic search before drafting from the vault.

OpenAI setup: ask me to create an OpenAI API project with a restricted API key where available. The key must allow only the endpoints this workflow needs: audio transcription, Responses API extraction, files, vector stores, and embeddings or file search. Store it only as `OPENAI_API_KEY` in `.env` or the workspace environment. Use `OPENAI_TRANSCRIBE_MODEL` if set; otherwise default to `gpt-4o-mini-transcribe` for normal voice notes and podcast clips. Use `gpt-4o-transcribe` when I choose it, and consider `gpt-4o-transcribe-diarize` only for multi-speaker podcasts where speaker-separated segments matter.
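
One way to express the model-selection rule above, as a sketch. The `want_diarization` flag is a hypothetical stand-in for "this is a multi-speaker podcast where speaker-separated segments matter"; the prompt does not define how that choice is signalled.

```python
import os
from typing import Mapping, Optional

DEFAULT_TRANSCRIBE_MODEL = "gpt-4o-mini-transcribe"
DIARIZE_MODEL = "gpt-4o-transcribe-diarize"


def pick_transcribe_model(env: Optional[Mapping[str, str]] = None,
                          want_diarization: bool = False) -> str:
    """An explicit OPENAI_TRANSCRIBE_MODEL wins; otherwise fall back to the defaults."""
    env = os.environ if env is None else env
    configured = (env.get("OPENAI_TRANSCRIBE_MODEL") or "").strip()
    if configured:
        return configured
    if want_diarization:
        # Hypothetical opt-in for multi-speaker audio.
        return DIARIZE_MODEL
    return DEFAULT_TRANSCRIBE_MODEL
```

Treating a blank value the same as an unset one matters here because the `.env` file starts with empty placeholders.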

Readwise setup is optional. If I already use Readwise Reader for saved posts, ask me to generate a Readwise access token and store it as `READWISE_TOKEN`. Send it as `Authorization: Token <READWISE_TOKEN>`, and only from the script. Validate it against the documented Readwise auth-check endpoint, expecting HTTP 204. Fetch saved documents through the Reader API and highlights through the Readwise highlights endpoint. If `READWISE_TOKEN` is blank, skip Reader polling and process only the local URL manifest.
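
A sketch of the token check described above, using only the standard library. The URL in `READWISE_AUTH_URL` is an assumption to confirm against the Readwise API documentation. The function names and the injectable `fetch` parameter are hypothetical; the parameter exists so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# Assumption: verify this URL against the Readwise API docs before relying on it.
READWISE_AUTH_URL = "https://readwise.io/api/v2/auth/"


def readwise_headers(token: str) -> Dict[str, str]:
    """Build the Authorization header in the `Token <value>` form."""
    return {"Authorization": f"Token {token}"}


def fetch_status(url: str, headers: Dict[str, str]) -> int:
    """GET the URL and return the HTTP status, including error statuses."""
    req = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def readwise_enabled(token: Optional[str],
                     fetch: Callable[[str, Dict[str, str]], int] = fetch_status) -> bool:
    """False for a blank token (skip Reader polling); raise if the token is rejected."""
    token = (token or "").strip()
    if not token:
        return False
    status = fetch(READWISE_AUTH_URL, readwise_headers(token))
    if status != 204:
        raise RuntimeError(f"Readwise auth check returned HTTP {status}, expected 204")
    return True
```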

Notion is optional and not part of the canonical setup. Do not add Notion unless I explicitly request it later. If I request it, keep the local Markdown vault as the source of truth and use Notion only as a display layer with its own token and shared pages.

3. Watch these trigger surfaces

The ingestion trigger is a manual or scheduled local command:

```bash
python scripts/ingest_vault.py --source all --since last_successful_sync
```

I will add podcast audio files or exported voice notes to `content_vault/inbox/audio/`, paste saved URLs into `content_vault/inbox/urls/urls.csv`, and write quick ideas as Markdown or text files under `content_vault/inbox/notes/`. If Readwise is connected, the same ingestion command also polls Reader documents and Readwise highlights since the last successful sync.
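
A possible argument parser for the ingestion command shown above. Only `--source all` and `--since last_successful_sync` appear in the prompt; the other `--source` values are assumptions that mirror the inbox folders and the optional Readwise connection.

```python
import argparse
from typing import List

# Only "all" is given in the prompt; the remaining choices are hypothetical.
SOURCE_CHOICES = ("all", "audio", "urls", "notes", "readwise")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="ingest_vault.py",
        description="Ingest new inbox items into the content vault.",
    )
    parser.add_argument("--source", choices=SOURCE_CHOICES, default="all")
    # Either the literal keyword or a timestamp the script chooses to accept.
    parser.add_argument("--since", default="last_successful_sync")
    return parser


def sources_to_scan(source: str) -> List[str]:
    """Expand "all" into each concrete source; pass single sources through."""
    if source == "all":
        return [s for s in SOURCE_CHOICES if s != "all"]
    return [source]
```

Usage would look like `args = build_parser().parse_args()` followed by a loop over `sources_to_scan(args.source)`, skipping `readwise` when no token is configured.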

The drafting trigger is separate:

```bash
python scripts/search_vault.py "AI agents for creators" --top-k 12
python scripts/draft_angles.py --query "AI agents for creators" --use-search-results
```

Do not automatically draft every time ingestion runs. Ingestion updates the vault; drafting happens only after I ask with a focused query.

4. Run the workflow step by step

Create `scripts/ingest_vault.py`, `scripts/search_vault.py`, and `scripts/draft_angles.py`. Write the scripts so they load `.env`, append to `logs/ingest.log`, and update `logs/state.json` with last-read timestamps, processed file hashes, uploaded file IDs, and vector-store file IDs.
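
A sketch of how `logs/state.json` could track processed files by content hash. The key names in `EMPTY_STATE` are assumptions chosen to match the four items listed above, and the helper names are hypothetical.

```python
import copy
import hashlib
import json
from pathlib import Path

# Hypothetical schema covering the four items the state file should hold.
EMPTY_STATE = {
    "last_read": {},
    "processed_hashes": [],
    "uploaded_file_ids": [],
    "vector_store_file_ids": [],
}


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh empty one on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return copy.deepcopy(EMPTY_STATE)


def file_sha256(path: Path) -> str:
    """Hash in 1 MiB chunks so long audio files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mark_processed(state: dict, path: Path) -> bool:
    """Record the file's hash; return False if it was already processed."""
    file_hash = file_sha256(path)
    if file_hash in state["processed_hashes"]:
        return False
    state["processed_hashes"].append(file_hash)
    return True


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file first, then swap it in, to avoid a half-written state."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)
```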

For ingestion:
1. Read `plan.md` and `progress.md`, then validate that `CONTENT_VAULT_DIR` points to the local vault and `.env` is not committed or copied into shareable notes.
2. Validate credentials with a read-only OpenAI call, such as listing models or files. If `READWISE_TOKEN` is set, validate it against the Readwise auth check, expecting HTTP 204. If any credential fails, stop and write the required fix to `progress.md`.
3. Scan `inbox/audio/`, `inbox/urls/urls.csv`, `inbox/notes/`, and Readwise/Reader if enabled. Build a manifest of new audio files, URL rows, text notes, Reader documents, highlights, source IDs, and content hashes not already present in `logs/state.json`.
4. For each audio item, confirm the format is one of `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, or `webm`. Check file size before upload. If a file exceeds 25 MB, split or convert it with `ffmpeg` before transcription. Transcribe each valid part through OpenAI speech-to-text, merge part transcripts in order, and preserve timestamps or speaker hints when available.
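
The audio checks in step 4 could be sketched as below. The byte threshold treats "25 MB" as 25 MiB, which is an assumption; a lower value is the safer choice if in doubt. `classify_audio` and `merge_transcripts` are hypothetical helpers, and the actual `ffmpeg` splitting and transcription calls are left out.

```python
from pathlib import Path
from typing import List, Optional, Tuple

ALLOWED_AUDIO_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: the 25 MB limit is read as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def classify_audio(path: Path, size_bytes: Optional[int] = None) -> str:
    """Return "unsupported", "split" (too large to upload as-is), or "upload"."""
    ext = path.suffix.lower().lstrip(".")
    if ext not in ALLOWED_AUDIO_EXTENSIONS:
        return "unsupported"
    size = path.stat().st_size if size_bytes is None else size_bytes
    return "split" if size > MAX_UPLOAD_BYTES else "upload"


def merge_transcripts(parts: List[Tuple[int, str]]) -> str:
    """Join part transcripts by part index so the merged text stays in order."""
    ordered = sorted(parts, key=lambda part: part[0])
    return "\n".join(text.strip() for _, text in ordered)
```

Checking the extension before the size avoids touching the disk for files that would be rejected anyway.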


Heads up: Run this in a local AI workspace. Browser chat can't reach your files.
