Nova@nova.causticdrop nº 004


The Prompt

1. Scaffold the working files first

You are setting up a repeatable AI file organizer for my local files and one optional Google Drive inbox. Before doing anything else, suggest a workspace folder path like `~/Documents/file-pile-sorter-workflow/`, ask me to confirm or edit it, then create the folder after I confirm.

Your first act after the folder exists is to create these three files:

- `plan.md`: write the complete runbook here. Include the folder taxonomy, install/connect choices, dry-run rules, active-mode rules, Google Drive behavior, model classification schema, review rules, and operating commands. Re-read `plan.md` whenever context gets compacted or a new session starts.
- `.env`: create one placeholder per line and no secret values yet. Use exactly these placeholders:
  - `OPENAI_API_KEY=`
  - `OPENAI_MODEL=`
  - `GOOGLE_DRIVE_CREDENTIALS_PATH=secrets/google_drive_credentials.json`
  - `GOOGLE_DRIVE_TOKEN_PATH=secrets/google_drive_token.json`
  - `ORGANIZER_CONFIG_PATH=organizer/config.yaml`
  - `ORGANIZER_DRY_RUN=true`
  Prompt me to paste each missing value one at a time, or tell me to open `.env` locally and fill them in directly. Never store secrets, OAuth tokens, model keys, or credential JSON contents in `plan.md` or `progress.md`.
- `progress.md`: track setup and runs here. Append a timestamped line after each completed step using this exact format: `## 2026-05-11 14:23 — completed: <step>` followed by `## next: <step>`. Re-read `progress.md` before each new action so you know what remains.
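A minimal sketch of the `progress.md` logger described above. The helper names `format_progress_entry` and `append_progress` are hypothetical, and the sketch assumes local time is acceptable for the timestamp. Adapt it rather than treating it as the required implementation:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def format_progress_entry(completed: str, next_step: str,
                          now: Optional[datetime] = None) -> str:
    """Return the two heading lines in the format this prompt specifies."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M")
    # "\u2014" is the dash character used in the prompt's exact format string.
    return f"## {stamp} \u2014 completed: {completed}\n## next: {next_step}\n"


def append_progress(path: Path, completed: str, next_step: str) -> None:
    """Append one entry; mode "a" never truncates earlier history."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(format_progress_entry(completed, next_step))
```

Appending, rather than rewriting the file, keeps earlier entries intact if a session is interrupted mid-step.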

Also create the recommended project structure after I confirm the workspace:

```text
organizer/config.yaml
organizer/run.py
organizer/review_queue/
organizer/logs/moves.jsonl
secrets/google_drive_credentials.json
secrets/google_drive_token.json
```
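One way to create that structure is sketched below. It assumes the two credential files under `secrets/` are supplied by me later, so only the directory is created for them. The `scaffold` function and the two constant names are hypothetical:

```python
from pathlib import Path

# Directories from the structure above; secrets/ is created empty on purpose.
DIRECTORIES = ["organizer/review_queue", "organizer/logs", "secrets"]
# Files that can safely start empty. Credential JSON files are NOT created here.
EMPTY_FILES = [
    "organizer/config.yaml",
    "organizer/run.py",
    "organizer/logs/moves.jsonl",
]


def scaffold(workspace: Path) -> list:
    """Create missing directories and empty files; return the files created."""
    created = []
    for rel in DIRECTORIES:
        (workspace / rel).mkdir(parents=True, exist_ok=True)
    for rel in EMPTY_FILES:
        target = workspace / rel
        if not target.exists():  # never overwrite an existing file
            target.touch()
            created.append(target)
    return created
```

Running it twice is harmless: the second call finds everything in place and returns an empty list.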

2. What to connect up-front

Use a local Python script plus direct SDK/API calls. Do not use an MCP server or CLI-only workflow for the main organizer, because this workflow must combine local folder events, image/PDF inputs, typed model output, Google Drive folder creation, download/export, remote movement, and a local audit log. The archived Google Drive MCP reference is read-oriented; this workflow needs write-capable Drive movement, so the official Python client is the better surface.

Create a small custom skill in my current AI workspace at its configured skills location. If the workspace supports skills, use a path like `<target-ai-workspace>/skills/ai-file-organizer/SKILL.md`; if the workspace uses a different skills folder, ask me for the correct folder and record the final path in `plan.md`. The skill must teach these rules: only run against configured inbox folders; never scan the whole disk; destructive actions are forbidden; moves require dry-run approval or confidence above the configured threshold; classification must return strict JSON; sensitive documents require review; normalize filenames without overwriting; escalate low confidence, unreadable files, IDs, financial documents, legal documents, and medical records.
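Two of those skill rules, the move gate and non-overwriting filename normalization, can be sketched as follows. The `Classification` fields (including the `sensitive` and `readable` flags), the function names, and the `-1`, `-2` suffix scheme are all assumptions for illustration, not a schema this prompt defines:

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Classification:
    category: str
    confidence: float      # assumed to be in the range 0.0 to 1.0
    sensitive: bool        # hypothetical flag: ID, financial, legal, medical
    readable: bool = True  # hypothetical flag: False when the file could not be read


def decide(result: Classification, threshold: float,
           dry_run_approved: bool = False) -> str:
    """Return "move" or "review" according to the skill rules."""
    if not result.readable or result.sensitive:
        return "review"  # sensitive or unreadable files always escalate
    if dry_run_approved or result.confidence > threshold:
        return "move"
    return "review"      # low confidence escalates


def normalize_name(name: str) -> str:
    """Lowercase the name and collapse runs of unusual characters to '-'."""
    stem, suffix = Path(name).stem, Path(name).suffix.lower()
    stem = re.sub(r"[^\w.-]+", "-", stem.strip().lower()).strip("-")
    return f"{stem or 'untitled'}{suffix}"


def non_clobbering_path(dest_dir: Path, name: str) -> Path:
    """Pick a destination that does not exist yet by appending -1, -2, ..."""
    clean = Path(normalize_name(name))
    candidate = dest_dir / clean.name
    counter = 1
    while candidate.exists():
        candidate = dest_dir / f"{clean.stem}-{counter}{clean.suffix}"
        counter += 1
    return candidate
```

In this sketch the sensitivity check runs before the confidence check, so a high-confidence ID scan still lands in review.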

Inside the confirmed workspace, set up Python like this and record the commands in `plan.md`:

```bash
python -m venv .venv
. .venv/bin/activate
python -m pip install -U watchdog google-api-python-client google-auth-httplib2 google-auth-oauthlib openai pyyaml
```

Connect these items before running:

- Local folders to watch. Ask me to explicitly list paths such as `~/Downloads`, `~/Desktop`, a screenshots folder, and a scanned-documents inbox. Do not scan my whole home directory.
- Local destination root folder. Ask me where organized folders should live.
- A category taxonomy. Start with `Receipts`, `Tax`, `Contracts`, `Screenshots`, `IDs`, `Manuals`, `Work`, `Personal`, `Archive`, and `Needs Review`, then ask me to approve or edit it.
- A model API key for a vision/file-capable model endpoint, stored only as `OPENAI_API_KEY` in `.env`, with the model name in `OPENAI_MODEL`.
- If Google Drive files should be organized remotely, a Google Cloud project with Google Drive API enabled, an OAuth Desktop app client for local testing, and the downloaded client JSON saved privately at `secrets/google_drive_credentials.json`.
- If Google Drive is enabled, a Drive inbox folder ID, a Drive destination root folder ID, and approved destination folder names or IDs for the taxonomy.

For Google Drive OAuth during local testing, use the installed Desktop app flow. Store the downloaded client details at `GOOGLE_DRIVE_CREDENTIALS_PATH` and the generated user token at `GOOGLE_DRIVE_TOKEN_PATH`. For dry-run inventory, the read-only Drive scope `https://www.googleapis.com/auth/drive.readonly` can list and read content but cannot move files. For active remote movement between arbitrary Drive folders, use a write-capable Drive scope such as `https://www.googleapis.com/auth/drive`. Choose the most limited scope that satisfies the mode. For personal local automation, keep the OAuth app in testing and add only the intended Google account as a test user unless production verification is completed.
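The scope choice can be sketched like this. `scopes_for_mode` and `authorize` are hypothetical helper names. The `google-auth-oauthlib` import is deferred so the scope logic can be checked without the library installed, and the flow shown assumes a browser is available on the local machine:

```python
from pathlib import Path

DRIVE_READONLY = "https://www.googleapis.com/auth/drive.readonly"
DRIVE_FULL = "https://www.googleapis.com/auth/drive"


def scopes_for_mode(dry_run: bool) -> list:
    """Dry runs only list and read; active mode needs a write-capable scope."""
    return [DRIVE_READONLY] if dry_run else [DRIVE_FULL]


def authorize(credentials_path: str, token_path: str, dry_run: bool):
    """Run the installed Desktop app flow and save the user token locally."""
    # Imported lazily: requires the google-auth-oauthlib package.
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_secrets_file(
        credentials_path, scopes_for_mode(dry_run)
    )
    creds = flow.run_local_server(port=0)  # opens a browser for consent
    Path(token_path).write_text(creds.to_json(), encoding="utf-8")
    return creds
```

A token granted for the read-only scope will not authorize moves, so switching from dry run to active mode likely means running the consent flow again with the wider scope.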

For local folders, ask me to grant the Python process read/write permission only to configured inbox and destination roots. On macOS, remind me that Terminal or my AI workspace may need Files and Folders or Full Disk Access for Desktop, Downloads, or Documents. If I use Google Drive for desktop, ask whether it is in Stream or Mirror mode, but use the Drive API for dependable remote re-parenting instead of relying only on a synced local path.

Assumption: if my AI workspace names its skills folder differently, you may rename the custom skill path, but keep the skill contents and safety rules unchanged.

3. What you watch, read, or trigger on

Support two triggers:

- Manual dry run: I run `python organizer/run.py --dry-run --once`. The script inventories the configured local inbox paths plus the configured Drive inbox and writes proposed actions without moving anything.
- Ongoing watch mode: I run `python organizer/run.py --watch`. The script starts local folder observers for the configured inbox paths and starts a periodic Google Drive polling loop.
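A possible shape for the command-line interface behind those two triggers, using only the standard library. Treating `--once` and `--watch` as mutually exclusive is an assumption, and how `--dry-run` interacts with `ORGANIZER_DRY_RUN` is left for `plan.md` to define:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for organizer/run.py (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="run.py",
        description="Sort files from configured inboxes into approved folders.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="write proposed actions without moving anything",
    )
    # Assumption: exactly one run mode must be chosen per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--once", action="store_true",
                      help="inventory the inboxes one time, then exit")
    mode.add_argument("--watch", action="store_true",
                      help="start folder observers and the Drive polling loop")
    return parser
```

For example, `build_parser().parse_args(["--dry-run", "--once"])` yields a namespace with `dry_run` and `once` set and `watch` unset.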

The input format is `organizer/config.yaml`. Build it with allowed source paths, destination roots, Drive folder IDs, category definitions, ignored extensions, maximum file size for model submission, confidence thresholds, and explicit dry-run mode. Ignore temporary or partial files with extensions such as `.crdownload`, `.part`, `.tmp`, and scanner lock files. Watch only the configured inbox folders.
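The filtering rules in that paragraph could look roughly like the sketch below. The function names are hypothetical, scanner lock-file patterns are left to the config because this prompt does not name them, and the size limit is passed in rather than read from `organizer/config.yaml`:

```python
from pathlib import Path

# Temporary or partial-download extensions named in this prompt.
DEFAULT_IGNORED = frozenset({".crdownload", ".part", ".tmp"})


def is_inside(path: Path, roots) -> bool:
    """True if path resolves to a location under one of the allowed roots."""
    resolved = path.resolve()  # follows symlinks that point elsewhere
    for root in roots:
        try:
            resolved.relative_to(Path(root).expanduser().resolve())
            return True
        except ValueError:
            continue
    return False


def is_candidate(path: Path, roots, max_bytes: int,
                 ignored=DEFAULT_IGNORED) -> bool:
    """Apply the inbox, extension, and size rules to one local file."""
    if not path.is_file():
        return False
    if path.suffix.lower() in ignored:
        return False
    if not is_inside(path, roots):
        return False
    return path.stat().st_size <= max_bytes
```

Because the path is resolved before the comparison, a symlink placed in an inbox that points outside the configured roots is rejected in this sketch.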

4. What you do step-by-step at each event

Implement `organizer/run.py` so each file is processed independently.

1. Load the config from `ORGANIZER_CONFIG_PATH` and the secrets from `.env`. Refuse to start unless `ORGANIZER_DRY_RUN` is explicitly set to `true` or `false`. Default new setups to dry run.

2. Build or refresh the destination folder map. Locally, create missing folders only under the approved destination root. In Drive, use `files.list` to find destination folders and `files.create` only for approved folder names with folder MIME type `application/vnd.google-apps.folder`.

3. Discover candidate files from configured local inboxes and the Drive inbox. Skip directories, trashed Drive files, files over the configured maximum size, temporary files, scanner lock files, and anything already recorded as moved in `organizer/logs/moves.jsonl`.

4. For local files, wait until file size and modified time are stable. Then collect path, filename, extension, MIME guess, size, timestamps, content hash when practical, and extractable text when available. For screenshots, images, and PDFs, submit the file, page images, or extracted text to the vision/file-capable model endpoint.
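Parts of steps 1 and 4 can be sketched with the standard library alone. The function names, the polling interval, and the number of unchanged checks are assumptions. The sketch treats a file as stable once its size and modified time stay the same across consecutive polls:

```python
import hashlib
import time
from pathlib import Path


def parse_dry_run(env: dict) -> bool:
    """Step 1: accept only an explicit "true" or "false"."""
    raw = env.get("ORGANIZER_DRY_RUN")
    if raw not in ("true", "false"):
        raise ValueError("ORGANIZER_DRY_RUN must be set to 'true' or 'false'")
    return raw == "true"


def wait_until_stable(path: Path, interval: float = 1.0,
                      checks: int = 3, timeout: float = 60.0) -> bool:
    """Step 4: poll until size and mtime are unchanged `checks` times in a row."""
    deadline = time.monotonic() + timeout
    last, unchanged = None, 0
    while time.monotonic() < deadline:
        try:
            st = path.stat()
        except FileNotFoundError:
            return False  # the file vanished while we were waiting
        snapshot = (st.st_size, st.st_mtime_ns)
        if snapshot == last:
            unchanged += 1
            if unchanged >= checks:
                return True
        else:
            last, unchanged = snapshot, 0
        time.sleep(interval)
    return False  # still changing when the timeout expired


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Step 4: content hash, read in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

In practice `env` would be `os.environ` after the `.env` file is loaded. Passing a plain dict keeps the check easy to exercise on its own.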


Heads up: Run this in a local AI workspace. Browser chat can't reach your files.