Data Generators
Data generators allow rescile to actively fetch or scaffold data to be processed by a module. Instead of requiring users to manually craft JSON or CSV inputs, [generator] blocks automatically fetch data from external APIs, transform formats, or execute scripts to generate inputs or assets.
The Runner Architecture
Generators operate using a multi-runner architecture. You can select the execution mode using the runner field:
- Native API Caller (http): Executes HTTP requests natively (e.g., fetching from REST or GraphQL endpoints). Highly secure and portable.
If runner is omitted, it defaults to "http".
Generic Configuration Fields
Every generator requires a target and allows you to configure its execution behavior:
- target_input / target_asset: The explicit target file to populate. Exactly one must be specified. target_asset outputs to a CSV file matching a declared [[asset]], while target_input outputs to a JSON file matching a declared [[input]].
- runner: The execution mode (only "http" is currently supported). Defaults to "http".
- description: A human-readable description of what the generator does.
- abort_on_failure: If true, aborts the entire rescile process if the generator returns a failure or produces invalid data.
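The field rules above can be sketched as a small check. This is an illustrative Python sketch, not rescile's implementation: the helper name normalize_generator and the use of a plain dict for a generator block are assumptions made for the example.

```python
def normalize_generator(block: dict) -> dict:
    """Apply the documented rules to a generator block (hypothetical helper)."""
    # Exactly one of the two target fields must be present.
    targets = [key for key in ("target_input", "target_asset") if key in block]
    if len(targets) != 1:
        raise ValueError("exactly one of target_input or target_asset must be set")

    normalized = dict(block)
    # runner defaults to "http" when omitted; no other runner is supported.
    normalized.setdefault("runner", "http")
    if normalized["runner"] != "http":
        raise ValueError(f"unsupported runner: {normalized['runner']!r}")
    return normalized
```

A block that sets both targets, or neither, is rejected before any request is made, which mirrors the "exactly one must be specified" rule.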
Automatic Output Validation
rescile automatically intercepts and validates the data produced by your generators before importing it into the graph.
- Asset Validation: If the target_asset corresponds to an [[asset]] defined in module.toml, rescile verifies that the generated CSV output contains all required = true columns and that the data types in each column precisely match the declared schema.
- Input Validation: If the target_input corresponds to an [[input]] block, rescile parses the generated JSON and checks that the overall structure (format) and all inner objects strictly adhere to the defined fields.
If the generated output is invalid, the generator execution will fail, throwing an error indicating exactly which validation constraint was violated (e.g., missing required column, invalid type, empty value where allow_empty = false). If abort_on_failure is set to true, the entire rescile process will abort.
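To make the asset checks concrete, here is a minimal Python sketch of the three constraints named above (required column, type, empty value). It is not rescile's validator. The schema shape (a list of dicts), the type names "string", "integer" and "float", and the assumption that allow_empty is treated as true when omitted are all hypothetical.

```python
import csv
import io

# Hypothetical type names; the real schema vocabulary may differ.
PARSERS = {"string": str, "integer": int, "float": float}

def validate_csv(text: str, columns: list[dict]) -> list[str]:
    """Return a list of violations; an empty list means the CSV is valid."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    errors = []

    # Every required column must appear in the header row.
    for col in columns:
        if col.get("required") and col["name"] not in header:
            errors.append(f"missing required column: {col['name']}")

    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in columns:
            name = col["name"]
            if name not in header:
                continue
            value = row[name]
            if value is None or value == "":
                # Assumption: empty values are allowed unless allow_empty is false.
                if not col.get("allow_empty", True):
                    errors.append(f"line {line_no}: empty value in {name}")
                continue
            type_name = col.get("type", "string")
            try:
                PARSERS[type_name](value)
            except ValueError:
                errors.append(f"line {line_no}: invalid {type_name} in {name}: {value!r}")
    return errors
```

Collecting every violation in a list, rather than stopping at the first one, is a design choice of this sketch; the text above only states that the reported error identifies which constraint was violated.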
Execution Triggers & Caching
Since fetching external data can take significant time, rescile provides triggers and caching to improve the developer experience:
- condition = "on_missing": The command runs only if the target output file does not exist. Ideal for heavy initial dataset seeding.
- ttl = "1h" (Time-To-Live): Checks the modified time (mtime) of the target file. If the file is younger than the TTL (e.g., 15m, 1h, 2d), execution is skipped.
- condition = "always" (Default): Runs on every execution unless a valid TTL skips it.
Note: You can forcefully bypass TTL caches by running the CLI with --refresh-generators. You can also entirely skip generator execution using --ignore-generators.
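The interaction between condition, ttl and --refresh-generators can be sketched as a single decision function. This is a Python sketch under stated assumptions, not rescile's code: the names parse_ttl and should_run are hypothetical, only the m, h and d suffixes shown above are handled, and the refresh flag is assumed to bypass the TTL check only, leaving on_missing unaffected. --ignore-generators is not modeled, since it skips generators before any such decision is reached.

```python
import os
import re
import time

# Only the suffixes that appear in the examples above (15m, 1h, 2d).
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}

def parse_ttl(ttl: str) -> int:
    """Convert a TTL such as "15m" into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([mhd])", ttl)
    if not match:
        raise ValueError(f"unrecognized ttl: {ttl!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]

def should_run(target, condition="always", ttl=None, refresh=False, now=None) -> bool:
    """Decide whether a generator should execute for the given target file."""
    exists = os.path.exists(target)
    # on_missing: run only when the target file does not exist yet.
    if condition == "on_missing" and exists:
        return False
    # ttl: skip while the file is younger than the TTL, unless refresh is set.
    if ttl is not None and exists and not refresh:
        current = time.time() if now is None else now
        if current - os.path.getmtime(target) < parse_ttl(ttl):
            return False
    return True
```

The now parameter exists only so the TTL branch can be exercised without waiting; a caller would normally leave it unset.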