A flexible synthetic data generator in Go, designed to produce realistic logs and metrics for testing and development.
```mermaid
---
config:
  look: handDrawn
  theme: default
---
flowchart LR
subgraph Config["Config Source"]
YAML["config.yaml"]
ENV["Env Vars"]
end
subgraph Tool["Synthetic Data Generator"]
direction TB
G["<b>Generator</b><br>Logs & Metrics"]
P["<b>Processor</b><br>Throttling & Batching"]
E["<b>Exporter Engine</b>"]
end
subgraph AWS["AWS Services"]
S3["S3 Bucket"]
CW["CloudWatch Logs"]
FH["Kinesis Firehose"]
end
subgraph Azure["Azure"]
EH["Event Hub"]
end
subgraph Local["Local Storage"]
LF["Local File"]
end
G --> P
P --> E
Config --> Tool
E --> AWS & Azure & Local
style E fill:#d1e7ff,stroke:#004a99,stroke-width:2px
style Tool fill:#f9f9f9,stroke:#D50000,stroke-width:2px,color:#000000
```
- Clone the repository (alternatively, download a release binary from the releases page).
- Copy the following to `config.yaml`:

  ```yaml
  input:
    type: LOGS
    delay: 500ms
    batching: 2s
    max_batch_size: 500_000
    max_runtime: 5s
  output:
    type: DEBUG
    config:
      verbosity: detailed
  ```

- Run the data generator with one of the following commands.

  From source:

  ```sh
  go run cmd/main.go --config ./config.yaml
  ```

  Using a release binary:

  ```sh
  ./dataGenerator_darwin_arm64 --config ./config.yaml
  ```

- Observe the terminal for generated logs.
The following program flags are supported:

- `--config <file_path>`: Path to the configuration file. Default is `./config.yaml`.
- `--metrics <file_path>`: Path to write internal metrics to. If omitted, metrics are logged to the console (see supported metrics below).
- `--debug`: Enable debug mode for verbose logging.
For example, to run with a custom config file at `./myconfig.yaml`, write internal metrics to `metrics.json` on exit, and enable debug mode:

```sh
./dataGenerator_darwin_arm64 --config ./myconfig.yaml --metrics ./metrics.json --debug
```

The list below explains the supported internal metrics.

- `startTime`: Timestamp when the generator started.
- `endTime`: Timestamp when the generator ended.
- `totalBatches`: Total number of batches exported.
- `totalElements`: Total number of data elements generated.
- `totalBytes`: Total number of bytes generated.
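As an illustration only, a metrics file produced with `--metrics ./metrics.json` could look like the following. The field names come from the list above; the exact layout, timestamp format, and values shown here are assumptions.

```json
{
  "startTime": "2025-01-01T10:00:00Z",
  "endTime": "2025-01-01T10:00:05Z",
  "totalBatches": 3,
  "totalElements": 10,
  "totalBytes": 4096
}
```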
Given below are the supported configuration options. Check `config.sample.yaml` for reference.

Given below are the supported `input` properties and their related environment variable overrides:
| YAML Property | Environment Variable | Default | Description |
|---|---|---|---|
| `type` | `ENV_INPUT_TYPE` | - (required from user) | Specifies the input data type (e.g., `LOGS`, `METRICS`, `ALB`, `NLB`, `VPC`, `CLOUDTRAIL`, `WAF`). |
| `delay` | `ENV_INPUT_DELAY` | `1s` | Delay between data points. Accepts values such as `5s` (5 seconds) or `10ms` (10 milliseconds). |
| `batching` | `ENV_INPUT_BATCHING` | `0` (no batch duration) | [Batching] Time delay between data batches. Accepts a time value like `delay`. The generated batch is forwarded to the output when this target is met. |
| `max_batch_size` | `ENV_INPUT_MAX_BATCH_SIZE` | `0` (no max bytes) | [Batching] Maximum byte size for a batch. The generated batch is forwarded to the output when this target is met. |
| `max_batch_elements` | `ENV_INPUT_MAX_BATCH_ELEMENTS` | `0` (no max element count) | [Batching] Maximum element count for a batch. The generated batch is forwarded to the output when this target is met. |
| `max_data_points` | `ENV_INPUT_MAX_DATA_POINTS` | - (no limit) | [Runtime] Maximum number of data points (elements) to generate during the run. The program exits once this target is met. |
| `max_runtime` | `ENV_INPUT_MAX_RUNTIME` | - (no max runtime) | [Runtime] Duration of the full load-generation run. The program exits once this target is met. |
> [!NOTE]
> You must define at least one terminal condition, either batching ([Batching]) or runtime ([Runtime]).
> For example, define one of `max_data_points` or `max_runtime` when `max_batch_size`, `max_batch_elements`, or a `batching` duration is not set.
> Conversely, when a `batching` duration is set, you can run the load generator indefinitely without defining runtime ([Runtime]) limits. A sketch of this case follows below.
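As a minimal sketch of the indefinite case (the bucket name is a placeholder), the `batching` duration below is the only batch trigger and no runtime limit is set, so the generator keeps producing batches until it is stopped:

```yaml
input:
  type: LOGS
  delay: 1s
  batching: 30s   # flush a batch every 30 seconds; no runtime limit is set
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
```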
Given below are the supported `type` values for `input`:

| Log Type | Description |
|---|---|
| `ALB` | Generate AWS ALB formatted logs with some random content |
| `NLB` | Generate AWS NLB formatted logs with some random content |
| `VPC` | Generate AWS VPC formatted logs with randomized content |
| `CLOUDTRAIL` | Generate AWS CloudTrail formatted logs with randomized content. Data is generated for the AWS S3 data event |
| `WAF` | Generate AWS WAF formatted logs with randomized content |
| `AZURE_RESOURCE_LOGS` | Generate Azure Resource logs with randomized content |
| `LOGS` | Generate ECS (Elastic Common Schema) formatted logs based on zap |
| `METRICS` | Generate metrics similar to a CloudWatch metrics entry |
Example:

```yaml
input:
  type: LOGS             # Input type LOGS
  delay: 500ms           # 500 milliseconds between each data point
  batching: 10s          # Emit generated data batched within 10 seconds
  max_batch_size: 10000  # Limit batch size to 10,000 bytes, capping output at ~1,000 bytes/second
  max_data_points: 10000 # Exit after generating 10,000 data points
```

> [!TIP]
> When `max_batch_size` is reached, the elapsed batching time is taken into account before new data is generated.
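The same input settings can also be supplied through the environment variable overrides listed above. A hypothetical invocation, assuming the overrides take precedence over values in `config.yaml`:

```sh
ENV_INPUT_TYPE=LOGS \
ENV_INPUT_DELAY=500ms \
ENV_INPUT_BATCHING=10s \
ENV_INPUT_MAX_DATA_POINTS=10000 \
./dataGenerator_darwin_arm64 --config ./config.yaml
```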
Given below are the supported `output` properties and their related environment variable overrides:

| YAML Property | Environment Variable | Default | Description |
|---|---|---|---|
| `type` | `ENV_OUT_TYPE` | - (required from user) | The output type (see the table below). |
| `wait_for_completion` | `ENV_OUT_WAIT_FOR_COMPLETION` | `true` | Wait for output exports to complete when shutting down. |
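For example, a minimal sketch that disables the shutdown wait for a `FILE` output (whether in-flight exports are then dropped on shutdown is an assumption, not documented behavior):

```yaml
output:
  type: FILE
  wait_for_completion: false
  config:
    location: "./out"
```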
Given below are the supported output types:

| Output Type | Description |
|---|---|
| `FIREHOSE` | Export to an AWS Firehose stream |
| `CLOUDWATCH_LOG` | Export to an AWS CloudWatch log group |
| `S3` | Export to an AWS S3 bucket |
| `EVENTHUB` | Export to an Azure Event Hub |
| `FILE` | Export to a file |
The sections below provide output-specific configurations.

For the `S3` output type:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `s3_bucket` | `ENV_OUT_S3_BUCKET` | S3 bucket name (required). |
| `compression` | `ENV_OUT_COMPRESSION` | Whether to compress the output. Currently supports `gzip`. |
| `path_prefix` | `ENV_OUT_PATH_PREFIX` | Optional prefix for the bucket entry. Defaults to `logFile-`. |
Example:

```yaml
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: gzip
    path_prefix: "datagen"
```

For the `FIREHOSE` output type:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `stream_name` | `ENV_OUT_STREAM_NAME` | Firehose stream name (required). |
Example:

```yaml
output:
  type: FIREHOSE
  config:
    stream_name: "my-firehose-stream"
```

For the `CLOUDWATCH_LOG` output type:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `log_group` | `ENV_OUT_LOG_GROUP` | CloudWatch log group name. |
| `log_stream` | `ENV_OUT_LOG_STREAM` | Log group stream name. |
Example:

```yaml
output:
  type: CLOUDWATCH_LOG
  config:
    log_group: "MyGroup"
    log_stream: "data"
```

> [!NOTE]
> The CloudWatch Logs API (`PutLogEvents`) is optimized for a single log message per API call. When batching is enabled, multiple log entries are concatenated into a single message, which may not be ideal for log analysis and searching in CloudWatch. For CloudWatch destinations, consider setting `batching: 0s` (no batching) or using a small `delay` without batching. Batching is better suited to bulk-ingest endpoints such as S3 and Firehose.
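Following that advice, a minimal CloudWatch-friendly sketch (group and stream names are placeholders) that disables batching; a runtime limit is included because some terminal condition is still required:

```yaml
input:
  type: LOGS
  delay: 500ms    # one log message per PutLogEvents call
  batching: 0s    # no batching, as recommended above
  max_runtime: 1m # terminal condition, since no batching target is set
output:
  type: CLOUDWATCH_LOG
  config:
    log_group: "MyGroup"
    log_stream: "data"
```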
For the `EVENTHUB` output type:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `connection_string` | `ENV_OUT_EVENTHUB_CONNECTION_STRING` | Connection string for the Event Hub namespace. |
| `event_hub_name` | `ENV_OUT_EVENTHUB_NAME` | Event Hub entity name to export generated data to. |
| `namespace` | `ENV_OUT_EVENTHUB_NAMESPACE` | Event Hub namespace. |
Example:

Using a connection string and an Event Hub name:

```yaml
output:
  type: EVENTHUB
  config:
    connection_string: "Endpoint=sb:xxxxxx"
    event_hub_name: "<event_hub_name>"
```

Using a namespace and an Event Hub name (requires an IAM role assigned to the identity running this application):

```yaml
output:
  type: EVENTHUB
  config:
    event_hub_name: "<event_hub_name>"
    namespace: "<namespace>"
```

For the `FILE` output type:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `location` | `ENV_OUT_LOCATION` | Output file location. Defaults to `./out`. When batching, the file suffix increments numerically (e.g., `out_0`, `out_1`). |
Example:

```yaml
output:
  type: FILE
  config:
    location: "./data"
```

Given below are AWS-specific configuration options used by the AWS exporters:

| YAML Property | Environment Variable | Description |
|---|---|---|
| `region` | `AWS_REGION` | Region used by the exporters. Default is `us-east-1`. |
| `profile` | `AWS_PROFILE` | Credential profile used by the exporters. Default is `default`. |
Example:

```yaml
aws:
  region: "us-east-1"
  profile: "default"
```

Generate an ECS-formatted log every 2s, batch over 10 seconds, and forward to an S3 bucket (the default delay between data points is 1s):
```yaml
input:
  type: LOGS
  delay: 2s
  batching: 10s
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
```

Generate ALB logs:

- No delay between data points (continuous data generation).
- Batching limited to 10 seconds with a maximum batch size of 10 MB, which translates to ~1 MB/second of data load.
- S3 files are written in gzip format.
```yaml
input:
  type: ALB
  delay: 0s
  batching: 10s
  max_batch_size: 10_000_000 # 10 MB max bytes per batch
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```

Generate WAF logs:

- Delay of 100ms between data points.
- Batching limited to a maximum of 1,000 elements per batch.
- Sent to S3 in gzip format.
```yaml
input:
  type: WAF
  delay: 100ms
  max_batch_elements: 1000 # Max 1,000 elements per batch
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```

Generate VPC logs, limit the run to 200 data points, and upload them to S3 in gzip format:
```yaml
input:
  type: VPC
  delay: 1s
  max_data_points: 200 # Limit to 200 data points over the whole run
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```

Generate CLOUDTRAIL logs with a total generator runtime of 5 minutes:
```yaml
input:
  type: CLOUDTRAIL
  delay: 10us # 10 microseconds between data points
  batching: 10s
  max_runtime: 5m # 5 minutes
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```