Data Generators

How to configure generator blocks to automatically fetch or scaffold input data for a rescile module.

Data generators let rescile actively fetch or scaffold the data a module processes. Instead of requiring users to hand-craft JSON or CSV inputs, [generator] blocks can fetch data from external APIs, transform formats, or execute scripts to produce inputs or assets.

The Runner Architecture

Generators operate using a multi-runner architecture. You can select the execution mode using the runner field:

  1. Native API Caller (http): Executes HTTP requests natively (e.g., fetching from REST or GraphQL endpoints). It is designed to be secure and portable.

If runner is omitted, it defaults to "http".
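The runner selection above can be pictured as a registry lookup with "http" as the fallback. This is a minimal sketch, not rescile's implementation: RUNNERS, run_http, resolve_runner, and the url key are hypothetical names, and the HTTP runner is stubbed instead of sending a real request.

```python
from typing import Callable, Dict, Optional

# Hypothetical runner signature: takes a generator config, returns raw output text.
Runner = Callable[[dict], str]


def run_http(config: dict) -> str:
    # Stub standing in for a native HTTP request (e.g. to a REST or GraphQL
    # endpoint). The "url" key is an assumption made for illustration.
    return f"fetched from {config.get('url', '<unset>')}"


# Registry of available runners; the guide lists only "http" today.
RUNNERS: Dict[str, Runner] = {"http": run_http}
DEFAULT_RUNNER = "http"


def resolve_runner(config: dict) -> Runner:
    """Return the runner named in the config, defaulting to "http" when omitted."""
    name: Optional[str] = config.get("runner")
    if name is None:
        name = DEFAULT_RUNNER
    try:
        return RUNNERS[name]
    except KeyError:
        raise ValueError(f"unsupported runner {name!r}; supported: {sorted(RUNNERS)}") from None
```

Keeping runners in a table means a future runner would be one new entry rather than a new branch in the dispatch code.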

Generic Configuration Fields

Every generator requires a target and allows you to configure its execution behavior:

  • target_input / target_asset: The explicit target file to populate. Exactly one must be specified. target_asset outputs to a CSV file matching a declared [[asset]], while target_input outputs to a JSON file matching a declared [[input]].
  • runner: The execution mode (only "http" is currently supported). Defaults to "http".
  • description: A human-readable description of what the generator does.
  • abort_on_failure: If true, the entire rescile process aborts when the generator fails or produces invalid data.
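The rules for the generic fields can be expressed as a small normalization step that rejects ambiguous targets and fills in defaults. A sketch only: normalize_generator and the derived output_format key are hypothetical, and treating an omitted abort_on_failure as false is an assumption, since the guide does not state its default.

```python
from typing import Any, Dict

SUPPORTED_RUNNERS = {"http"}  # the only runner the guide lists


def normalize_generator(block: Dict[str, Any]) -> Dict[str, Any]:
    """Validate one generator block's generic fields and fill in defaults."""
    targets = [key for key in ("target_input", "target_asset") if key in block]
    if len(targets) != 1:
        raise ValueError("exactly one of target_input or target_asset must be set")

    runner = block.get("runner", "http")  # documented default
    if runner not in SUPPORTED_RUNNERS:
        raise ValueError(f"unsupported runner: {runner!r}")

    # Assumption: an omitted abort_on_failure behaves like false.
    abort = block.get("abort_on_failure", False)
    if not isinstance(abort, bool):
        raise ValueError("abort_on_failure must be true or false")

    return {
        **block,
        "runner": runner,
        "abort_on_failure": abort,
        # Hypothetical derived key: target_asset -> CSV, target_input -> JSON.
        "output_format": "csv" if targets[0] == "target_asset" else "json",
    }
```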

Automatic Output Validation

rescile automatically intercepts and validates the data produced by your generators before importing it into the graph.

  • Asset Validation: If the target_asset corresponds to an [[asset]] defined in module.toml, rescile verifies that the generated CSV output contains every column declared with required = true and that the data types in each column precisely match the declared schema.
  • Input Validation: If the target_input corresponds to an [[input]] block, rescile parses the generated JSON and checks that the overall structure (format) and all inner objects strictly adhere to the defined fields.
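The asset-side checks can be approximated with Python's standard csv module. This sketch assumes a hypothetical schema shape (a dict mapping each column to its type, required, and allow_empty settings) and hypothetical type names (int, float, bool, string); rescile's real schema format and type system may differ, and treating allow_empty as true when unset is also an assumption. Validating a target_input against an [[input]] block would follow the same pattern over parsed JSON objects.

```python
import csv
import io
from typing import Dict, List

# Hypothetical schema shape: column -> {"type": ..., "required": ..., "allow_empty": ...}.
Schema = Dict[str, Dict[str, object]]


def matches_type(value: str, type_name: str) -> bool:
    """Return True if a non-empty CSV cell parses as the declared type."""
    if type_name == "bool":
        return value.lower() in ("true", "false")
    try:
        if type_name == "int":
            int(value)
        elif type_name == "float":
            float(value)
    except ValueError:
        return False
    return True  # "string" (and any other type name) accepts any text


def validate_asset_csv(text: str, schema: Schema) -> List[str]:
    """Collect every violation; an empty list means the CSV passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    errors = [
        f"missing required column: {col}"
        for col, spec in schema.items()
        if spec.get("required", False) and col not in header
    ]
    for row in reader:
        line = reader.line_num  # physical line of this row in the CSV text
        for col, spec in schema.items():
            if col not in header:
                continue  # already reported above if it was required
            value = (row.get(col) or "").strip()
            type_name = str(spec.get("type", "string"))
            if not value:
                # Assumption: allow_empty defaults to true when not declared.
                if not spec.get("allow_empty", True):
                    errors.append(f"line {line}: empty value in {col} (allow_empty = false)")
                continue
            if not matches_type(value, type_name):
                errors.append(f"line {line}: {col}={value!r} is not {type_name}")
    return errors
```

Collecting all violations instead of stopping at the first one mirrors the guide's promise that the error names the specific constraint that failed.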

If the generated output is invalid, the generator fails with an error that names the violated validation constraint (e.g., a missing required column, an invalid type, or an empty value where allow_empty = false). If abort_on_failure is set to true, the entire rescile process then aborts.
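The failure path above can be modeled by wrapping generation and validation together, so that a validation error and a runtime error are handled the same way. The names here (GeneratorValidationError, AbortRun, run_generator) are hypothetical, and logging the error and continuing when abort_on_failure is false is an assumed behavior that the guide implies but does not spell out.

```python
import logging
from typing import Callable, List, Optional


class GeneratorValidationError(Exception):
    """Raised when generated output violates the target's declared schema."""

    def __init__(self, target: str, violations: List[str]):
        super().__init__(f"{target}: " + "; ".join(violations))
        self.target = target
        self.violations = violations


class AbortRun(Exception):
    """Signals that the whole run must stop (abort_on_failure = true)."""


def run_generator(target: str, produce: Callable[[], str],
                  validate: Callable[[str], List[str]],
                  abort_on_failure: bool) -> Optional[str]:
    """Run one generator; return validated output, or None after a non-fatal failure."""
    try:
        output = produce()
        violations = validate(output)
        if violations:
            raise GeneratorValidationError(target, violations)
        return output
    except Exception as exc:
        if abort_on_failure:
            # Escalate so the caller stops the entire process.
            raise AbortRun(str(exc)) from exc
        # Assumption: without abort_on_failure, the failure is reported and skipped.
        logging.error("generator for %s failed: %s", target, exc)
        return None
```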

Execution Triggers & Caching

Because fetching external data can be slow, rescile provides execution triggers and caching so that generators do not re-run unnecessarily:

  • condition = "on_missing": The command runs only if the target output file does not exist. Ideal for heavy initial dataset seeding.
  • ttl = "1h" (Time-To-Live): Checks the modified time (mtime) of the target file. If the file is younger than the TTL (e.g., 15m, 1h, 2d), execution is skipped.
  • condition = "always" (Default): Runs on every execution unless a ttl is set and the target file is still younger than it.
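The trigger rules can be read as a single yes/no decision per generator. This is an assumption-laden model rather than rescile's code: parse_ttl and should_run are hypothetical, only the m, h, and d units from the examples are accepted, and on_missing is assumed to ignore ttl because the guide does not describe combining them.

```python
import os
import re
import time
from typing import Optional

# Units taken from the guide's examples (15m, 1h, 2d).
_UNITS = {"m": 60, "h": 3600, "d": 86400}


def parse_ttl(ttl: str) -> int:
    """Convert a TTL such as "15m", "1h" or "2d" into seconds."""
    match = re.fullmatch(r"(\d+)([mhd])", ttl.strip())
    if not match:
        raise ValueError(f"invalid ttl: {ttl!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def should_run(target_path: str, condition: str = "always",
               ttl: Optional[str] = None, now: Optional[float] = None) -> bool:
    """Decide whether a generator runs, following the documented triggers."""
    exists = os.path.exists(target_path)
    if condition == "on_missing":
        return not exists  # assumption: ttl plays no role here
    if condition != "always":
        raise ValueError(f"unknown condition: {condition!r}")
    if ttl is not None and exists:
        current = time.time() if now is None else now
        age = current - os.path.getmtime(target_path)  # seconds since last write
        if age < parse_ttl(ttl):
            return False  # target is younger than the TTL: skip
    return True
```

The optional now parameter exists only so the decision can be tested without waiting for the clock.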

Note: You can bypass TTL caches by running the CLI with --refresh-generators, or skip generator execution entirely with --ignore-generators.
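The precedence between the two CLI flags and the per-generator triggers can be summarized as a pure decision function. It is a sketch under assumptions: the function and parameter names are hypothetical, --ignore-generators is taken to override everything, and --refresh-generators is assumed to bypass only the ttl check (not condition = "on_missing"), since the note mentions only TTL caches.

```python
def decide(target_exists: bool, within_ttl: bool, condition: str = "always",
           refresh_generators: bool = False, ignore_generators: bool = False) -> bool:
    """Return True if the generator should execute on this run.

    within_ttl is True when a ttl is set and the target file is younger than it.
    """
    if ignore_generators:
        return False  # --ignore-generators: skip every generator
    if condition == "on_missing":
        # Assumption: --refresh-generators does not override on_missing.
        return not target_exists
    # --refresh-generators bypasses the TTL cache, so a fresh file no longer skips.
    return refresh_generators or not within_ttl
```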