librenotes/.wave/pipelines/ingest.yaml
Michael Czechowski fc24f9a8ab Add Wave general-purpose pipelines
ADR, changelog, code-review, debug, doc-sync, explain, feature,
hotfix, improve, onboard, plan, prototype, refactor, security-scan,
smoke-test, speckit-flow, supervise, test-gen, and more.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 17:02:36 +01:00

kind: WavePipeline
metadata:
  name: ingest
  description: "Ingest a web article into the Zettelkasten as bibliographic and permanent notes"
  release: true
input:
  source: cli
  examples:
    - "https://simonwillison.net/2026/Feb/7/software-factory/"
    - "https://langfuse.com/blog/2025-03-observability"
    - "https://arxiv.org/abs/2401.12345"
steps:
  - id: fetch
    persona: scout
    workspace:
      mount:
        - source: ./
          target: /project
          mode: readonly
    exec:
      type: prompt
      source: |
        Fetch and extract structured content from a web article.
        URL: {{ input }}

        ## Steps
        1. Use WebFetch to retrieve the article content
        2. Extract:
           - title: article title
           - author: author name (look in byline, meta tags, about section)
           - date: publication date
           - summary: 50-3000 character summary of the article
           - key_concepts: list of key concepts with name and description
           - notable_quotes: direct quotes with context
           - author_year_key: generate AuthorYear key (e.g., Willison2026)
        3. If the author name is unclear, use the domain name as author

        ## Output
        Write the result as JSON to output/source-extract.json matching the contract schema.
    output_artifacts:
      - name: source-extract
        path: output/source-extract.json
        type: json
    handover:
      contract:
        type: json_schema
        source: output/source-extract.json
        schema_path: .wave/contracts/source-extract.schema.json
        on_failure: retry
        max_retries: 2
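    # Illustrative sketch only: the real contract lives in
    # .wave/contracts/source-extract.schema.json and is not shown in this file.
    # Assuming it mirrors the fields listed in the fetch prompt, a JSON Schema
    # for it (written here in YAML form for readability) might look roughly
    # like this. The required list and the field names under notable_quotes
    # are assumptions, not taken from the repository.
    #
    # type: object
    # required: [title, author, summary, key_concepts, author_year_key]
    # properties:
    #   title: { type: string }
    #   author: { type: string }
    #   date: { type: string }
    #   summary: { type: string, minLength: 50, maxLength: 3000 }
    #   key_concepts:
    #     type: array
    #     items:
    #       type: object
    #       required: [name, description]
    #       properties:
    #         name: { type: string }
    #         description: { type: string }
    #   notable_quotes:
    #     type: array
    #     items:
    #       type: object
    #       properties:
    #         quote: { type: string }     # assumed field name
    #         context: { type: string }   # assumed field name
    #   author_year_key: { type: string } # e.g. Willison2026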
  - id: connect
    persona: navigator
    dependencies: [fetch]
    memory:
      inject_artifacts:
        - step: fetch
          artifact: source-extract
          as: source
    workspace:
      mount:
        - source: ./
          target: /project
          mode: readonly
    exec:
      type: prompt
      source: |
        Find connections between extracted source content and existing Zettelkasten notes.
        Read the source extract: cat artifacts/source

        ## Steps
        1. For each key concept in the source, search for related notes:
           - `notesium lines --filter="concept_name"`
           - Read the most relevant matches
        2. Identify the Folgezettel neighborhood where new notes belong:
           - What section does this content fit in?
           - What would be the parent note?
           - What Folgezettel address should new notes get?
        3. Check if the index note needs updating
        4. Determine link directions (should new note link to existing, or existing link to new?)

        ## Output
        Write the result as JSON to output/connections.json matching the contract schema.
        Include:
        - source_title: title of the source being connected
        - related_notes: list of related existing notes with filename, title,
          folgezettel_address, relationship explanation, and link_direction
        - suggested_placements: where new notes should go in the Folgezettel
          with address, parent_note, section, rationale, and concept
        - index_update_needed: boolean
        - suggested_index_entries: new entries if needed
        - timestamp: current ISO 8601 timestamp
    output_artifacts:
      - name: connections
        path: output/connections.json
        type: json
    handover:
      contract:
        type: json_schema
        source: output/connections.json
        schema_path: .wave/contracts/connections.schema.json
        on_failure: retry
        max_retries: 2
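    # Illustrative sketch only: .wave/contracts/connections.schema.json is not
    # shown in this file. Assuming it follows the "Include:" list in the
    # connect prompt, the contract might be shaped roughly as below (JSON
    # Schema written in YAML form). The key name `relationship`, the item type
    # of suggested_index_entries, and the required list are assumptions.
    #
    # type: object
    # required: [source_title, related_notes, suggested_placements, index_update_needed, timestamp]
    # properties:
    #   source_title: { type: string }
    #   related_notes:
    #     type: array
    #     items:
    #       type: object
    #       properties:
    #         filename: { type: string }
    #         title: { type: string }
    #         folgezettel_address: { type: string }
    #         relationship: { type: string }     # assumed key for the explanation
    #         link_direction: { type: string }
    #   suggested_placements:
    #     type: array
    #     items:
    #       type: object
    #       properties:
    #         address: { type: string }
    #         parent_note: { type: string }
    #         section: { type: string }
    #         rationale: { type: string }
    #         concept: { type: string }
    #   index_update_needed: { type: boolean }
    #   suggested_index_entries:
    #     type: array
    #     items: { type: string }                # assumed item type
    #   timestamp: { type: string, format: date-time }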
  - id: create
    persona: scribe
    dependencies: [connect]
    memory:
      inject_artifacts:
        - step: fetch
          artifact: source-extract
          as: source
        - step: connect
          artifact: connections
          as: connections
    workspace:
      mount:
        - source: ./
          target: /project
          mode: readwrite
    exec:
      type: prompt
      source: |
        Create Zettelkasten notes from an ingested web source.
        Read the artifacts:
        cat artifacts/source
        cat artifacts/connections

        ## Steps
        1. **Create the bibliographic note**:
           - Use `notesium new` for the filename
           - Title: `# AuthorYear` using the author_year_key from the source extract
           - Content: source URL, author, date, summary, key quotes
           - One sentence per line
        2. **Create permanent notes** for key ideas that warrant standalone Zettel:
           - Use `notesium new` for each
           - Use the Folgezettel address from suggested_placements
           - Title: `# {address} {Concept-Name}`
           - Write in your own words: transform, don't copy
           - Add contextual links to related notes (explain *why* the connection exists)
           - Link back to the bibliographic note
        3. **Update existing notes** if bidirectional links are suggested:
           - Add links from existing notes to the new permanent notes
           - Include contextual explanation for each link
        4. **Update the index note** if index_update_needed is true:
           - Add new keyword → entry point mappings
        5. **Commit all changes**:
           - `git add *.md`
           - `git commit -m "ingest: {AuthorYear key in lowercase}"`
        6. **Write summary** to output/ingest-summary.md:
           - Bibliographic note created (filename, title)
           - Permanent notes created (filename, title, Folgezettel address)
           - Links added to existing notes
           - Index updates made
    output_artifacts:
      - name: ingest-summary
        path: output/ingest-summary.md
        type: markdown