Observability
Farm provides a built-in observability hub that aggregates metrics, traces, and logs from your infrastructure without requiring you to leave the portal. The hub is available at Observability in the main navigation and is restricted to administrators.
Metrics
The Metrics tab connects to your Prometheus instance and renders time-series charts natively inside Farm.
Pre-configured charts
Two charts are displayed by default:
| Chart | PromQL query |
|---|---|
| Request rate | rate(http_requests_total[5m]) |
| Memory usage | process_resident_memory_bytes |
Custom PromQL queries
Each chart card exposes a query input field. Type any valid PromQL expression and press Run to render the result as a line chart. The chart shows the last hour of data by default.
If Prometheus is unreachable, the card displays a "Prometheus not available" notice. The error does not propagate to other parts of the UI.
Configuration
Set the PROMETHEUS_URL environment variable on the API server (default: http://localhost:9090). No API key is required for query-only access.
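As a rough illustration of what a chart card's query might translate to, the sketch below builds a request URL for Prometheus's standard `/api/v1/query_range` endpoint covering the last hour. The function name, the 60-second step, and passing the base URL as a parameter (instead of reading `PROMETHEUS_URL` directly) are assumptions made for illustration, not Farm's actual implementation.

```typescript
// Build a Prometheus range-query URL for a trailing time window.
// baseUrl would normally come from PROMETHEUS_URL; it is a parameter here
// so the function stays pure and easy to test.
function buildRangeQueryUrl(
  promql: string,
  nowMs: number,
  baseUrl: string = "http://localhost:9090",
  windowSeconds: number = 3600, // the "last hour" default described above
  stepSeconds: number = 60 // assumed resolution, not taken from Farm
): string {
  const end = Math.floor(nowMs / 1000); // Prometheus accepts Unix seconds
  const start = end - windowSeconds;
  const params = [
    `query=${encodeURIComponent(promql)}`,
    `start=${start}`,
    `end=${end}`,
    `step=${stepSeconds}`,
  ];
  return `${baseUrl}/api/v1/query_range?${params.join("&")}`;
}

const exampleMetricsUrl = buildRangeQueryUrl(
  "rate(http_requests_total[5m])",
  Date.now()
);
console.log(exampleMetricsUrl);
```

The PromQL expression is URL-encoded because characters such as `[`, `]`, and `=` are common in queries.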
Traces
The Traces tab provides a native trace waterfall viewer compatible with Jaeger and Grafana Tempo.
Searching traces
- Select a service from the dropdown (populated from /api/services).
- Choose a time range: 15 minutes, 1 hour, 3 hours, or 24 hours.
- The trace list shows Trace ID, service, root operation, total duration, span count, and start time.
Waterfall view
Click any row to expand the trace waterfall. Each span is rendered as a horizontal bar proportional to its duration relative to the total trace duration. Services are color-coded automatically using a hash of the service name.
If Jaeger is not reachable, the list shows an "unavailable" notice.
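The bar sizing and color assignment described above can be sketched as two small pure functions. This is a minimal illustration under stated assumptions: the `Span` shape, the function names, and the specific hash (a simple multiply-by-31 string hash mapped to an HSL hue) are hypothetical and not necessarily what Farm uses.

```typescript
// Hypothetical span shape; times are in microseconds.
interface Span {
  service: string;
  startUs: number;
  durationUs: number;
}

// Position and width of a span bar as percentages of the whole trace.
function layoutSpan(span: Span, traceStartUs: number, traceDurationUs: number) {
  const total = Math.max(traceDurationUs, 1); // guard against division by zero
  return {
    leftPct: ((span.startUs - traceStartUs) / total) * 100,
    widthPct: (span.durationUs / total) * 100,
  };
}

// Deterministic color from a service name: the same name always maps
// to the same hue, so no color table has to be maintained.
function serviceColor(service: string): string {
  let hash = 0;
  for (let i = 0; i < service.length; i++) {
    hash = (hash * 31 + service.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return `hsl(${hash % 360}, 65%, 50%)`;
}

const exampleSpan: Span = { service: "farm-api", startUs: 250, durationUs: 500 };
console.log(layoutSpan(exampleSpan, 0, 1000), serviceColor(exampleSpan.service));
```

Because the color depends only on the service name, a service keeps its color across traces and page reloads.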
Configuration
Set the JAEGER_URL environment variable (default: http://localhost:16686). Farm uses the standard Jaeger HTTP API (/api/traces, /api/services).
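For orientation, a trace search against Jaeger's `/api/traces` endpoint could be assembled roughly as follows. This sketch assumes the query parameters that the Jaeger UI itself sends (`service`, `start`, `end` in microseconds, and `limit`); that API is not formally versioned, so treat the parameter names as an assumption to verify against your Jaeger release. The function name and the limit of 20 are hypothetical.

```typescript
// Time ranges offered in the Traces tab, expressed in minutes.
const TRACE_RANGES_MIN = { "15m": 15, "1h": 60, "3h": 180, "24h": 1440 } as const;
type TraceRange = keyof typeof TRACE_RANGES_MIN;

function buildTraceSearchUrl(
  service: string,
  range: TraceRange,
  nowMs: number,
  baseUrl: string = "http://localhost:16686", // JAEGER_URL default
  limit: number = 20 // assumed page size
): string {
  const endUs = nowMs * 1000; // milliseconds to microseconds
  const startUs = endUs - TRACE_RANGES_MIN[range] * 60 * 1e6;
  return (
    `${baseUrl}/api/traces?service=${encodeURIComponent(service)}` +
    `&start=${startUs}&end=${endUs}&limit=${limit}`
  );
}

console.log(buildTraceSearchUrl("farm-api", "1h", Date.now()));
```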
Logs
The Logs tab queries your Loki instance with LogQL and displays log lines with automatic level detection.
Running a query
- Enter a LogQL selector in the query input (default: {project="farm"}).
- Choose a time range: 15 minutes, 1 hour, 3 hours, or 24 hours.
- Press Run to execute.
Up to 200 log lines are shown. Press Load more to fetch additional results.
Useful LogQL selectors
| Goal | Selector |
|---|---|
| All Farm logs | {project="farm"} |
| API logs only | {container="farm-api"} |
| Error logs | {project="farm", level="error"} |
| Specific NestJS context | {container="farm-api", context="HttpException"} |
Log levels
Farm auto-detects the level from each log line's content:
| Level | Color |
|---|---|
| error | Red |
| warn | Yellow |
| info | Blue |
| debug | Gray |
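Content-based level detection of this kind is often a keyword match, checked from most to least severe. The sketch below is one plausible heuristic, not Farm's actual rule set: the keywords, the `info` fallback for unmatched lines, and all names are assumptions.

```typescript
type LogLevel = "error" | "warn" | "info" | "debug";

// Colors from the table above.
const LEVEL_COLORS: Record<LogLevel, string> = {
  error: "red",
  warn: "yellow",
  info: "blue",
  debug: "gray",
};

// Hypothetical heuristic: the first matching keyword wins,
// with the most severe levels tested first.
function detectLevel(line: string): LogLevel {
  if (/\b(error|fatal)\b/i.test(line)) return "error";
  if (/\b(warn|warning)\b/i.test(line)) return "warn";
  if (/\bdebug\b/i.test(line)) return "debug";
  return "info"; // assumed fallback when nothing matches
}

const sampleLine = 'level=error msg="upstream timed out"';
console.log(detectLevel(sampleLine), LEVEL_COLORS[detectLevel(sampleLine)]);
```

A keyword heuristic can misclassify lines that merely mention a level name (for example, an info line containing "no error found"), so structured `level` labels remain more reliable where available.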
Configuration
Set the LOKI_URL environment variable (default: http://loki:3100 when using the observability stack, http://localhost:3100 otherwise). Farm uses the Loki HTTP API (/loki/api/v1/query_range).
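A request to Loki's `/loki/api/v1/query_range` endpoint for the Logs tab might be built as in the sketch below. The `query`, `start`, `end`, `limit`, and `direction` parameters are part of Loki's HTTP API; the function name, the choice of `direction=backward`, and passing the base URL as a parameter (instead of reading `LOKI_URL`) are assumptions for illustration.

```typescript
function buildLokiQueryUrl(
  logql: string,
  rangeMinutes: number,
  nowMs: number,
  baseUrl: string = "http://localhost:3100",
  limit: number = 200 // matches the 200-line page size described above
): string {
  const startMs = nowMs - rangeMinutes * 60 * 1000;
  // Loki accepts nanosecond Unix epochs. Six zeros are appended as text
  // because the numeric value would exceed Number.MAX_SAFE_INTEGER.
  const toNs = (ms: number): string => `${Math.floor(ms)}000000`;
  const params = [
    `query=${encodeURIComponent(logql)}`,
    `start=${toNs(startMs)}`,
    `end=${toNs(nowMs)}`,
    `limit=${limit}`,
    "direction=backward", // newest lines first (assumed preference)
  ];
  return `${baseUrl}/loki/api/v1/query_range?${params.join("&")}`;
}

console.log(buildLokiQueryUrl('{project="farm"}', 60, Date.now()));
```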
Grafana Dashboards
The observability stack ships three pre-configured Grafana dashboards at http://localhost:3002:
| Dashboard | Description |
|---|---|
| Farm API Overview | Request rate, latency percentiles, error rate, traces, and business metrics |
| Farm — Application Logs | Log throughput, error/warn counts, and live log panels per container |
| Farm — Infrastructure | Host CPU, memory, disk I/O, network, and filesystem usage |
All dashboards are provisioned automatically from observability/grafana/provisioning/dashboards/. No login is required in local development.
Alerting Rules
Alerting rules let you define PromQL-based thresholds linked to catalog components or environments.
Managing rules
Navigate to Alerting in the sidebar to see all configured rules. From this page you can:
- Create a new rule using the "Create Rule" button.
- Enable / disable a rule with the inline toggle switch.
- Delete a rule via the trash icon (confirmation required).
Rule fields
| Field | Description |
|---|---|
| Name | Unique identifier for the rule |
| Description | Optional human-readable description |
| PromQL Query | Expression to evaluate (e.g., up == 0) |
| Duration | How long the condition must hold before firing (e.g., 5m, 1h) |
| Severity | critical, warning, or info |
| Component ID | Optional link to a catalog component |
| Environment ID | Optional link to an environment |
| Enabled | Whether the rule is active |
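The fields in the table above could be modeled and checked as in the following sketch. The camelCase property names, the accepted duration units (`s`, `m`, `h`, `d`), and the validation messages are assumptions; Farm's actual schema and API may differ.

```typescript
type Severity = "critical" | "warning" | "info";

// Hypothetical shape mirroring the rule fields table.
interface AlertRule {
  name: string;
  description?: string;
  query: string; // PromQL expression, e.g. "up == 0"
  duration: string; // e.g. "5m", "1h"
  severity: Severity;
  componentId?: string;
  environmentId?: string;
  enabled: boolean;
}

// Convert "5m" or "1h" to seconds; returns null for unsupported input.
function parseDuration(value: string): number | null {
  const match = /^(\d+)([smhd])$/.exec(value);
  if (!match) return null;
  const amount = Number(match[1]);
  switch (match[2]) {
    case "s": return amount;
    case "m": return amount * 60;
    case "h": return amount * 3600;
    case "d": return amount * 86400;
    default: return null;
  }
}

// Collect human-readable problems; an empty array means the rule looks valid.
function validateRule(rule: AlertRule): string[] {
  const problems: string[] = [];
  if (rule.name.trim() === "") problems.push("name is required");
  if (rule.query.trim() === "") problems.push("query is required");
  if (parseDuration(rule.duration) === null) {
    problems.push(`invalid duration: ${rule.duration}`);
  }
  return problems;
}

const exampleRule: AlertRule = {
  name: "api-down",
  query: "up == 0",
  duration: "5m",
  severity: "critical",
  enabled: true,
};
console.log(validateRule(exampleRule));
```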
Real-time notifications
Farm broadcasts events over WebSocket so you receive instant feedback without polling.
| Event | Toast type |
|---|---|
| Audit log entry created | Info |
| Pipeline run completed successfully | Success |
| Pipeline run failed | Error |
Notifications appear in the bottom-right corner and dismiss automatically after 3 seconds.
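On the client side, the event-to-toast mapping in the table above amounts to a small lookup plus an expiry time. The sketch below is illustrative only: the event names (`audit.created`, `pipeline.succeeded`, `pipeline.failed`) and the JSON message shape are hypothetical, since the wire format is not specified here.

```typescript
type ToastType = "info" | "success" | "error";

interface Toast {
  type: ToastType;
  message: string;
  expiresAt: number; // epoch milliseconds after which the toast is dismissed
}

const TOAST_TIMEOUT_MS = 3000; // 3-second auto-dismiss, as described above

// Hypothetical event names; Farm's real identifiers may differ.
function toastTypeFor(eventName: string): ToastType | null {
  switch (eventName) {
    case "audit.created": return "info";
    case "pipeline.succeeded": return "success";
    case "pipeline.failed": return "error";
    default: return null;
  }
}

// Turn a raw WebSocket text frame into a toast, or null if it should be ignored.
function toToast(rawMessage: string, nowMs: number): Toast | null {
  let payload: any;
  try {
    payload = JSON.parse(rawMessage);
  } catch {
    return null; // malformed frame
  }
  if (payload === null || typeof payload !== "object") return null;
  const type = toastTypeFor(String(payload.event));
  if (type === null) return null;
  return {
    type,
    message: String(payload.message ?? ""),
    expiresAt: nowMs + TOAST_TIMEOUT_MS,
  };
}

console.log(toToast('{"event":"pipeline.failed","message":"Build failed"}', Date.now()));
```

Unknown events and malformed frames are dropped silently, so a newer server emitting extra event types would not break an older client.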
External service availability
All observability proxies return a graceful degradation response when the upstream service is unreachable.
The UI handles these responses without displaying a global error: individual cards or tabs show a targeted "unavailable" notice, and other tabs in the Observability hub remain fully functional.
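The per-card isolation described above can be illustrated with a small rendering helper. The response shape used here (an `available` flag with optional `data`) is purely hypothetical and is not Farm's documented payload; the point of the sketch is only that an unavailable upstream affects a single card.

```typescript
// Hypothetical response shape: the `available` flag is an assumption.
type ProxyResult<T> =
  | { available: true; data: T }
  | { available: false };

// Render one card. An unavailable upstream only changes this card's output.
function renderCard<T>(
  title: string,
  result: ProxyResult<T>,
  renderData: (data: T) => string
): string {
  if (!result.available) {
    return `${title}: unavailable`;
  }
  return `${title}: ${renderData(result.data)}`;
}

const exampleCards = [
  renderCard("Metrics", { available: false }, (points: number[]) => `${points.length} points`),
  renderCard("Logs", { available: true, data: ["line 1", "line 2"] }, (lines: string[]) => `${lines.length} lines`),
];
console.log(exampleCards.join("\n"));
```

Modeling the result as a discriminated union forces every card to handle the unavailable case explicitly before it can touch the data.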