@nova.causticdrop nº 002
The Prompt
1. Scaffold the working files first

Before you do anything else, suggest the workspace folder `~/Documents/content-vault-workflow/` and ask me to confirm it or choose a different local path. After I confirm, create the folder and write these three files as your first act:

- `plan.md`: the durable runbook. Put the full workflow plan in it: folder layout, install choices, credential placeholders, scripts to create, trigger commands, output formats, review rules, and the success test. Re-read `plan.md` whenever context is compacted or a new session starts.
- `.env`: the local credential and config store. Create one placeholder per line, with no values yet: `OPENAI_API_KEY=`, `OPENAI_VECTOR_STORE_ID=`, `READWISE_TOKEN=`, `CONTENT_VAULT_DIR=`, `OPENAI_TRANSCRIBE_MODEL=`. Tell me to open `.env` and fill in the values directly, or prompt me for each value one at a time and write it into `.env` as it arrives. Never store secrets in `plan.md` or `progress.md`.
- `progress.md`: the step tracker. Every time you finish a step, append a timestamped entry in this exact format: `## 2026-05-11 14:23 — completed: <step>` followed by `## next: <step>`. Re-read `progress.md` before each new action so you know what remains.

Also create the local vault directory from `CONTENT_VAULT_DIR`, defaulting to `<confirmed folder>/content_vault/` if I do not specify another path. Build these subfolders: `inbox/audio/`, `inbox/urls/`, `inbox/notes/`, `sources/`, `draft_angles/`, `logs/`, and `index/`.

2. Connect the required tools and accounts

Use local Python scripts plus the official APIs. Do not use MCP as the primary surface: this workflow needs file-system access, audio pre-processing, batch transcription, saved-post polling, deterministic Markdown writes, and retry logs. Do not make it CLI-only either; call the SDKs and APIs directly from the scripts, because structured extraction and indexing need reliable response handling.

Set up Python inside the confirmed workspace:

```bash
python3 -m venv .venv
. .venv/bin/activate
pip install openai requests python-dotenv feedparser trafilatura beautifulsoup4 pydub
```

If this machine will process long podcasts or converted audio, check whether `ffmpeg` is installed. If it is missing, tell me to install it before ingesting long files. Record the check and the install note in `plan.md`.

Create or update a custom skill in my current AI workspace at `<target-ai-workspace-skills>/content-vault/SKILL.md`. If my workspace uses a different skill root, ask me for the configured root and place the skill at `content-vault/SKILL.md` under it. The skill must teach you these operating rules:

- Treat `content_vault/` as the canonical store.
- Never overwrite source notes without appending a revision.
- Keep one Markdown file per source.
- Extract only traceable highlights from the source text.
- Label hooks and draft angles as generated.
- Use semantic search before drafting from the vault.

OpenAI setup: ask me to create an OpenAI API project with a restricted API key where available. The key must allow only the endpoints this workflow needs: audio transcription, Responses API extraction, files, vector stores, and embeddings or file search. Store it only as `OPENAI_API_KEY` in `.env` or the workspace environment. Use `OPENAI_TRANSCRIBE_MODEL` if it is set; otherwise default to `gpt-4o-mini-transcribe` for normal voice notes and podcast clips. Use `gpt-4o-transcribe` when I choose it, and consider `gpt-4o-transcribe-diarize` only for multi-speaker podcasts where speaker-separated segments matter.

Readwise setup is optional. If I already use Readwise Reader for saved posts, ask me to generate a Readwise access token and store it as `READWISE_TOKEN`. Send it only from the scripts, as the header `Authorization: Token <READWISE_TOKEN>`. Validate it against the documented Readwise auth-check endpoint, which should return HTTP 204 for a valid token. Use the Reader API to fetch saved documents and the Readwise highlights endpoint for highlights.
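The scaffolding in step 1 and the setup checks in step 2 are mechanical enough to sketch. The following is a minimal Python sketch, assuming Python 3.8+ and the standard library, plus `requests` imported only inside the optional Readwise check. Every function name here (`scaffold_vault`, `write_env_placeholders`, `append_progress`, `ffmpeg_available`, `transcribe_model`, `readwise_token_ok`) is a hypothetical helper, not something the prompt defines, and the URL `https://readwise.io/api/v2/auth/` is an assumption to confirm against Readwise's API docs before relying on it.

```python
import os
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# Subfolders named in step 1.
VAULT_SUBFOLDERS = [
    "inbox/audio", "inbox/urls", "inbox/notes",
    "sources", "draft_angles", "logs", "index",
]
# .env keys from step 1, written as empty placeholders.
ENV_KEYS = [
    "OPENAI_API_KEY", "OPENAI_VECTOR_STORE_ID", "READWISE_TOKEN",
    "CONTENT_VAULT_DIR", "OPENAI_TRANSCRIBE_MODEL",
]


def scaffold_vault(workspace: Path, vault_dir: str = "") -> Path:
    """Create the vault subfolders; default to <workspace>/content_vault/."""
    vault = Path(vault_dir).expanduser() if vault_dir else workspace / "content_vault"
    for sub in VAULT_SUBFOLDERS:
        (vault / sub).mkdir(parents=True, exist_ok=True)
    return vault


def write_env_placeholders(env_path: Path) -> None:
    """Write one empty KEY= line per key, but never touch an existing .env."""
    if env_path.exists():
        return  # keep any values the user has already filled in
    env_path.write_text("".join(f"{key}=\n" for key in ENV_KEYS), encoding="utf-8")


def append_progress(progress_path: Path, completed: str, next_step: str,
                    now: Optional[datetime] = None) -> None:
    """Append the two-line progress entry in the format the prompt specifies."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M")
    with progress_path.open("a", encoding="utf-8") as fh:
        fh.write(f"## {stamp} — completed: {completed}\n## next: {next_step}\n")


def ffmpeg_available() -> bool:
    """True if an ffmpeg executable is on PATH (needed before long-file ingestion)."""
    return shutil.which("ffmpeg") is not None


def transcribe_model() -> str:
    """OPENAI_TRANSCRIBE_MODEL if set and non-blank, else the prompt's default."""
    return os.environ.get("OPENAI_TRANSCRIBE_MODEL", "").strip() or "gpt-4o-mini-transcribe"


def readwise_token_ok(token: str) -> bool:
    """Optional Readwise check; a valid token is expected to return HTTP 204."""
    import requests  # lazy import so the other helpers run without requests installed

    resp = requests.get(
        "https://readwise.io/api/v2/auth/",  # assumed URL; confirm in Readwise's API docs
        headers={"Authorization": f"Token {token}"},
        timeout=10,
    )
    return resp.status_code == 204
```

In a real run, `vault_dir` would come from `CONTENT_VAULT_DIR` after `python-dotenv`'s `load_dotenv()` has read `.env`, and `append_progress` would be called once per finished step. Skipping an existing `.env` is a deliberate choice, so a re-run never wipes secrets that have already been filled in.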
If `READWISE_TOKEN` is blank, skip Reader polling and take saved posts only from the local URL manifest.

Notion is optional and not part of the canonical setup. Do not add Notion unless I explicitly request it later. If I do, keep the local Markdown vault as the source of truth and use Notion only as a display layer with its own token and shared pages.

3. Watch these trigger surfaces

The ingestion trigger is a manual or scheduled local command:

```bash
python scripts/ingest_vault.py --source all --since last_successful_sync
```

I will add podcast audio files or exported voice notes to `content_vault/inbox/audio/`, paste saved URLs into `content_vault/inbox/urls/urls.csv`, and write quick ideas as Markdown or text files under `content_vault/inbox/notes/`. If Readwise is connected, the same ingestion command also polls Reader documents and Readwise highlights since the last successful sync.

The drafting trigger is separate:

```bash
python scripts/search_vault.py "AI agents for creators" --top-k 12
python scripts/draft_angles.py --query "AI agents for creators" --use-search-results
```

Do not draft automatically every time ingestion runs. Ingestion updates the vault; drafting happens only after I ask with a focused query.

4. Run the workflow step by step

Create `scripts/ingest_vault.py`, `scripts/search_vault.py`, and `scripts/draft_angles.py`. Write the scripts so they load `.env`, append to `logs/ingest.log`, and update `logs/state.json` with last-read timestamps, processed file hashes, uploaded file IDs, and vector-store file IDs.

For ingestion:

1. Read `plan.md` and `progress.md`, then validate that `CONTENT_VAULT_DIR` points to the local vault and that `.env` is not committed or copied into shareable notes.
2. Validate credentials with a harmless, read-only OpenAI call, such as listing models or files. If `READWISE_TOKEN` is set, validate it against the Readwise auth check and expect HTTP 204. If any credential fails, stop and write the needed fix to `progress.md`.
3. Scan `inbox/audio/`, `inbox/urls/urls.csv`, `inbox/notes/`, and Readwise/Reader if enabled. Build a manifest of new audio files, URL rows, text notes, Reader documents, and highlights, recording each item's source ID and content hash, and include only items not already present in `logs/state.json`.
4. For each audio item, confirm the format is one of `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, or `webm`. Check the file size before upload. If a file exceeds 25 MB, split or convert it with `ffmpeg` before transcription. Transcribe each valid part through OpenAI speech-to-text, merge the part transcripts in order, and preserve timestamps or speaker hints when available.
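The audio half of ingestion steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch under several assumptions: the helper names and the `processed_hashes` key in `logs/state.json` are hypothetical; "25 MB" is read as 25,000,000 bytes, the stricter interpretation; splitting re-encodes to 64 kbps mono MP3 in 600-second parts, which assumes an `ffmpeg` build that includes `libmp3lame`; and transcription uses the official `openai` Python SDK (v1-style `client.audio.transcriptions.create`).

```python
import hashlib
import json
import subprocess
from pathlib import Path

# Formats accepted in ingestion step 4.
AUDIO_EXTS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: read "25 MB" as 25,000,000 bytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 25_000_000


def sha256_of(path: Path) -> str:
    """Content hash used to skip files already recorded in logs/state.json."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_seen_hashes(state_path: Path) -> set:
    """Processed hashes from state.json; the 'processed_hashes' key is hypothetical."""
    if not state_path.exists():
        return set()
    state = json.loads(state_path.read_text(encoding="utf-8"))
    return set(state.get("processed_hashes", []))


def new_audio_items(audio_dir: Path, seen: set) -> list:
    """Manifest entries for audio files whose hash is not in `seen`."""
    items = []
    for path in sorted(p for p in audio_dir.iterdir() if p.is_file()):
        digest = sha256_of(path)
        if digest in seen:
            continue
        items.append({
            "path": path,
            "sha256": digest,
            "supported": path.suffix.lower().lstrip(".") in AUDIO_EXTS,
        })
    return items


def needs_split(path: Path) -> bool:
    return path.stat().st_size > MAX_UPLOAD_BYTES


def split_command(src: Path, out_dir: Path, segment_seconds: int = 600) -> list:
    """ffmpeg argv: drop video, downmix to mono 64 kbps MP3, cut fixed-length parts.

    At 64 kbps, a 600-second part is about 4.8 MB, well under the upload limit.
    """
    pattern = out_dir / f"{src.stem}_part%03d.mp3"
    return [
        "ffmpeg", "-hide_banner", "-i", str(src),
        "-vn", "-ac", "1", "-c:a", "libmp3lame", "-b:a", "64k",
        "-f", "segment", "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1", str(pattern),
    ]


def split_audio(src: Path, out_dir: Path) -> list:
    """Run ffmpeg and return the parts; zero-padded names sort in playback order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(split_command(src, out_dir), check=True)
    return sorted(out_dir.glob(f"{src.stem}_part*.mp3"))


def transcribe_parts(parts: list, model: str) -> str:
    """Transcribe each part in order with the OpenAI SDK and join the text."""
    from openai import OpenAI  # lazy import; the client reads OPENAI_API_KEY

    client = OpenAI()
    texts = []
    for part in parts:
        with open(part, "rb") as fh:
            result = client.audio.transcriptions.create(model=model, file=fh)
        texts.append(result.text.strip())
    return "\n\n".join(texts)
```

This sketch keeps only plain text per part. Preserving timestamps or speaker hints, as step 4 asks, would need a response format that returns segments, and each part's times would then need an offset of roughly `index * segment_seconds` to map back onto the original recording.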
Heads up: run this in a local AI workspace, because browser chat can't reach your files.