# librenotes/.wave/pipelines/ingest.yaml

kind: WavePipeline
metadata:
  name: ingest
  description: "Ingest a web article into the Zettelkasten as bibliographic and permanent notes"
  release: true

input:
  source: cli
  examples:
    - "https://simonwillison.net/2026/Feb/7/software-factory/"
    - "https://langfuse.com/blog/2025-03-observability"
    - "https://arxiv.org/abs/2401.12345"

steps:
  - id: fetch
    persona: scout
    workspace:
      mount:
        - source: ./
          target: /project
          mode: readonly
    exec:
      type: prompt
      source: |
        Fetch and extract structured content from a web article.

        URL: {{ input }}

        ## Steps

        1. Use WebFetch to retrieve the article content
        2. Extract:
           - title: article title
           - author: author name (look in byline, meta tags, about section)
           - date: publication date
           - summary: summary of the article, 50 to 3000 characters long
           - key_concepts: list of key concepts with name and description
           - notable_quotes: direct quotes with context
           - author_year_key: generate AuthorYear key (e.g., Willison2026)
        3. If the author name is unclear, use the domain name as author

        ## Output

        Write the result as JSON to output/source-extract.json matching the contract schema.
    output_artifacts:
      - name: source-extract
        path: output/source-extract.json
        type: json
    handover:
      contract:
        type: json_schema
        source: output/source-extract.json
        schema_path: .wave/contracts/source-extract.schema.json
        on_failure: retry
        max_retries: 2
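    # Illustrative sketch only, not the contract itself: the authoritative shape
    # is defined in .wave/contracts/source-extract.schema.json, which is not
    # shown in this file. Assuming the top-level field names listed in the
    # prompt above, output/source-extract.json might look roughly like this.
    # All values are placeholders; the sub-field names "quote" and "context"
    # are assumptions, as are "name" and "description" as literal key names.
    #
    #   {
    #     "title": "<article title>",
    #     "author": "<author name, or the domain name if unclear>",
    #     "date": "<publication date>",
    #     "summary": "<50 to 3000 characters>",
    #     "key_concepts": [
    #       { "name": "<concept>", "description": "<what it means>" }
    #     ],
    #     "notable_quotes": [
    #       { "quote": "<direct quote>", "context": "<where it appears>" }
    #     ],
    #     "author_year_key": "Willison2026"
    #   }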

  - id: connect
    persona: navigator
    dependencies: [fetch]
    memory:
      inject_artifacts:
        - step: fetch
          artifact: source-extract
          as: source
    workspace:
      mount:
        - source: ./
          target: /project
          mode: readonly
    exec:
      type: prompt
      source: |
        Find connections between extracted source content and existing Zettelkasten notes.

        Read the source extract: cat artifacts/source

        ## Steps

        1. For each key concept in the source, search for related notes:
           - `notesium lines --filter="concept_name"`
           - Read the most relevant matches
        2. Identify the Folgezettel neighborhood where new notes belong:
           - What section does this content fit in?
           - What would be the parent note?
           - What Folgezettel address should new notes get?
        3. Check if the index note needs updating
        4. Determine the link direction for each connection (should the new note link to the existing note, or the existing note link to the new one?)

        ## Output

        Write the result as JSON to output/connections.json matching the contract schema.
        Include:
        - source_title: title of the source being connected
        - related_notes: list of related existing notes with filename, title,
          folgezettel_address, relationship explanation, and link_direction
        - suggested_placements: where new notes should go in the Folgezettel
          with address, parent_note, section, rationale, and concept
        - index_update_needed: boolean
        - suggested_index_entries: new entries if needed
        - timestamp: current ISO 8601 timestamp
    output_artifacts:
      - name: connections
        path: output/connections.json
        type: json
    handover:
      contract:
        type: json_schema
        source: output/connections.json
        schema_path: .wave/contracts/connections.schema.json
        on_failure: retry
        max_retries: 2
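    # Illustrative sketch only, not the contract itself: the authoritative shape
    # is defined in .wave/contracts/connections.schema.json, which is not shown
    # in this file. Assuming the field names listed in the prompt above,
    # output/connections.json might look roughly like this. All values are
    # placeholders; the key "relationship" and the empty array used for
    # suggested_index_entries are assumptions.
    #
    #   {
    #     "source_title": "<title of the source>",
    #     "related_notes": [
    #       {
    #         "filename": "<existing note filename>",
    #         "title": "<existing note title>",
    #         "folgezettel_address": "<address>",
    #         "relationship": "<why the notes are related>",
    #         "link_direction": "<new-to-existing or existing-to-new>"
    #       }
    #     ],
    #     "suggested_placements": [
    #       {
    #         "address": "<proposed Folgezettel address>",
    #         "parent_note": "<parent note filename>",
    #         "section": "<section name>",
    #         "rationale": "<why it belongs here>",
    #         "concept": "<concept name>"
    #       }
    #     ],
    #     "index_update_needed": false,
    #     "suggested_index_entries": [],
    #     "timestamp": "<ISO 8601 timestamp>"
    #   }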

  - id: create
    persona: scribe
    dependencies: [connect]
    memory:
      inject_artifacts:
        - step: fetch
          artifact: source-extract
          as: source
        - step: connect
          artifact: connections
          as: connections
    workspace:
      mount:
        - source: ./
          target: /project
          mode: readwrite
    exec:
      type: prompt
      source: |
        Create Zettelkasten notes from an ingested web source.

        Read the artifacts:
          cat artifacts/source
          cat artifacts/connections

        ## Steps

        1. **Create the bibliographic note**:
           - Use `notesium new` to generate the filename
           - Title: `# AuthorYear` using the author_year_key from the source extract
           - Content: source URL, author, date, summary, key quotes
           - One sentence per line

        2. **Create permanent notes** for key ideas that warrant a standalone Zettel:
           - Use `notesium new` for each
           - Use the Folgezettel address from suggested_placements
           - Title: `# {address} {Concept-Name}`
           - Write in your own words: transform the ideas, don't copy the text
           - Add contextual links to related notes (explain *why* the connection exists)
           - Link back to the bibliographic note

        3. **Update existing notes** if bidirectional links are suggested:
           - Add links from existing notes to the new permanent notes
           - Include contextual explanation for each link

        4. **Update the index note** if index_update_needed is true:
           - Add new keyword → entry point mappings

        5. **Commit all changes**:
           - `git add *.md`
           - `git commit -m "ingest: {AuthorYear key in lowercase}"`

        6. **Write summary** to output/ingest-summary.md:
           - Bibliographic note created (filename, title)
           - Permanent notes created (filename, title, Folgezettel address)
           - Links added to existing notes
           - Index updates made
    output_artifacts:
      - name: ingest-summary
        path: output/ingest-summary.md
        type: markdown