Technology

Credible systems architecture—not slideware

This page reflects the engineering reference behind VerifiedSignal: presigned uploads, worker orchestration, deterministic LLM settings, Postgres as truth, and OpenSearch as a derived plane.

Stack overview

AWS-native core with clear separation of concerns

Frontend delivery, identity, object storage, OCR, and LLM inference map to managed services so teams spend integration effort on scoring quality and review UX.

AWS Amplify

Frontend hosting and global edge delivery.

Amazon S3

Object storage with presigned URLs for direct uploads, reducing unnecessary compute on the hot path.
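A minimal sketch of minting a presigned upload URL, assuming a boto3-style S3 client; the key layout and expiry are illustrative, not the reference's exact scheme:

```python
import uuid

def object_key(tenant_id: str, filename: str) -> str:
    # Hypothetical key layout: tenant prefix plus a random id keeps uploads collision-free.
    return f"uploads/{tenant_id}/{uuid.uuid4().hex}/{filename}"

def create_upload_url(s3_client, bucket: str, key: str, expires: int = 900):
    # A presigned PUT lets the browser send bytes straight to S3,
    # so the API only mints the URL and records intake in Postgres.
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )
```

The API server never touches file bytes; it issues the URL, and the client uploads directly.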

Amazon Cognito

Authentication, including paths that align with AWS Marketplace–native identity flows.

Amazon Textract

High-accuracy OCR for tables, forms, and document geometry.

Amazon Bedrock

LLM reasoning (for example Claude 3.5 / Llama 3) for structured scoring and agentic sanity checks.

Visual overview

System architecture infographic

End-to-end view of major components, flows, and integrations.

VerifiedSignal platform architecture: services, data stores, workers, and client flows across AWS and the application stack.


Data plane

PostgreSQL is canonical; OpenSearch is derived

PostgreSQL holds all authoritative state; Amazon OpenSearch holds only derived projections of it, so search indices can be dropped and rebuilt without data loss.

System of record

PostgreSQL is the system of record: users, permissions, final scores, billing, and authoritative outcomes.

Search & analytics

Amazon OpenSearch holds derived search and analytics state: full-text search, kNN vectors, and dashboard aggregations—treated as expendable relative to Postgres.
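A minimal sketch of the write order this implies; `pg` and `search` are duck-typed stand-ins, not a specific client library:

```python
def persist_then_index(pg, search, doc: dict) -> str:
    """Postgres commit first (authoritative), search second (derived)."""
    pg.save(doc)  # must succeed; everything downstream can be rebuilt from it
    try:
        search.index(doc)
        return "indexed"
    except Exception:
        # Derived-plane failure is tolerable: queue a retry and let the
        # UI show search results as pending in the meantime.
        return "search_pending"
```

Because OpenSearch is expendable, an indexing failure degrades search, never correctness.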

Pipeline

Eight-stage ingestion and scoring pipeline

From queued intake through Bedrock scoring, canonical persistence, OpenSearch indexing, and SSE completion events.

Step 1

Intake

Client submits file or URL; API creates a queued record in Postgres.

Step 2

Acquisition

Worker fetches bytes, computes content hashes, and deduplicates.

Step 3

Extraction

Textract/worker extracts plain text and structure; progress published to Redis.

Step 4

Enrichment

Metadata, topical tags, and initial quality flags.

Step 5

LLM scoring

Worker calls Bedrock for structured analysis via the Converse API.

Step 6

Canonical persistence

Validated scores written to the authoritative PostgreSQL layer.

Step 7

Search indexing

Document indexed into OpenSearch for retrieval and analytics.

Step 8

Completion

Final status pushed to the frontend via SSE.
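The eight stages lend themselves to idempotent checkpointing. A sketch of how a worker could resume after a crash; the stage names here are illustrative shorthand for the steps above, and real checkpoints would live in Postgres:

```python
from typing import Optional

# Illustrative stage names mirroring the eight steps above.
STAGES = ["intake", "acquire", "extract", "enrich",
          "score", "persist", "index", "complete"]

def next_stage(checkpoint: Optional[str]) -> Optional[str]:
    """Return the stage to run next, given the last completed stage.

    A None checkpoint means nothing has run yet; a None result means the
    document is finished. Each stage must be safe to re-run, so a crashed
    worker simply resumes from the last saved checkpoint.
    """
    if checkpoint is None:
        return STAGES[0]
    i = STAGES.index(checkpoint)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None
```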

LLM layer

Auditor-style prompting and strict structure

Temperature at zero, schema discipline, and few-shot examples that include graceful failure reduce hallucinated fields.

  • Auditor-style prompting with strict structure (for example XML-style sections) and JSON schema enforcement—including sanity checks such as totals matching line items.
  • Few-shot sets covering success, graceful failure (explicit nulls when data is missing), and messy edge documents to reduce hallucinated fields.
  • Low temperature (0.0) for deterministic, “boring auditor” consistency.
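A hedged sketch of the scoring call, assuming a boto3 `bedrock-runtime` client; the prompt wording, token limit, and model id are placeholders, not the reference's exact values:

```python
def build_messages(document_text: str) -> list:
    # Auditor-style prompt: strict instructions plus the document in
    # XML-style tags, asking for JSON only, with nulls where data is missing.
    prompt = (
        "You are a strict auditor. Respond with JSON matching the schema; "
        "use null for any field you cannot verify.\n"
        f"<document>\n{document_text}\n</document>"
    )
    return [{"role": "user", "content": [{"text": prompt}]}]

def score_document(bedrock_runtime, model_id: str, document_text: str) -> str:
    # Converse API with temperature 0.0 for "boring auditor" determinism.
    resp = bedrock_runtime.converse(
        modelId=model_id,
        messages=build_messages(document_text),
        inferenceConfig={"temperature": 0.0, "maxTokens": 2048},
    )
    return resp["output"]["message"]["content"][0]["text"]
```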

Streaming UX

Server-Sent Events and incremental field hydration

Nginx must not buffer SSE responses: set `proxy_buffering off;` for the streaming location, or have the application send the `X-Accel-Buffering: no` response header, so events reach clients as they are emitted.
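One way to express this in nginx itself; the location path and upstream name are assumptions, and the application-side `X-Accel-Buffering: no` header achieves the same effect:

```nginx
location /api/events/ {
    proxy_pass http://app_upstream;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off;       # do not buffer SSE chunks
    proxy_cache off;
    proxy_read_timeout 1h;     # keep long-lived streams open
}
```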

Partial JSON token parsing streams fields to the UI as they complete; field state progresses through unseen → in progress → complete → validated → persisted.

{
  "event": "stage",
  "document_id": "doc_123",
  "stage": "extract_text",
  "status": "running",
  "timestamp": "2026-03-26T22:11:10Z",
  "progress": 35
}
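The field lifecycle above can be sketched as a tiny state machine; the state names come from the text, the function itself is illustrative:

```python
FIELD_STATES = ["unseen", "in progress", "complete", "validated", "persisted"]

def advance(state: str) -> str:
    """Move a field to its next lifecycle state; the final state is sticky."""
    i = FIELD_STATES.index(state)
    return FIELD_STATES[min(i + 1, len(FIELD_STATES) - 1)]
```

The UI can render each field independently as its state advances, rather than waiting for the full document.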

Marketplace & metering

Subscription plumbing and usage reporting

Patterns for AWS Marketplace tokens, customer resolution, and scheduled metering that align billable usage with Postgres-grounded counts.

  • Marketplace subscriptions arrive with an `x-amzn-marketplace-token`; the backend exchanges it for a customer identifier linked to billing.
  • Scheduled metering (for example hourly) aggregates document usage from Postgres and reports to AWS Marketplace Metering for usage-based plans.
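A sketch of both steps using a boto3 `meteringmarketplace` client; the dimension name and the shape of `counts` (customer identifier to hourly document count, read from Postgres) are assumptions:

```python
from datetime import datetime, timezone

def resolve_customer(metering_client, registration_token: str) -> str:
    # Exchange the x-amzn-marketplace-token for a stable customer identifier.
    resp = metering_client.resolve_customer(RegistrationToken=registration_token)
    return resp["CustomerIdentifier"]

def build_usage_records(counts: dict, dimension: str) -> list:
    # counts: customer identifier -> documents processed this hour.
    now = datetime.now(timezone.utc)
    return [
        {"Timestamp": now, "CustomerIdentifier": cust,
         "Dimension": dimension, "Quantity": qty}
        for cust, qty in counts.items()
        if qty > 0  # skip idle customers rather than report zeros
    ]

def report_usage(metering_client, product_code: str, records: list) -> None:
    # BatchMeterUsage accepts at most 25 records per call.
    for i in range(0, len(records), 25):
        metering_client.batch_meter_usage(
            UsageRecords=records[i : i + 25], ProductCode=product_code
        )
```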

Deployment

Targets from local to AWS

The reference explicitly supports bare-metal-style development, containerized Docker Compose environments, and Fargate/RDS/OpenSearch Service production topologies.

  • Bare metal / local: filesystem or MinIO and single-node OpenSearch for development.
  • Containers: portable Docker Compose for repeatable environments.
  • AWS: production-style topologies using ECS Fargate, RDS, and Amazon OpenSearch Service.
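An illustrative compose fragment for the containerized target; image tags and settings are assumptions, not the reference's exact manifest:

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev-only
  redis:
    image: redis:7
  opensearch:
    image: opensearchproject/opensearch:2
    environment:
      discovery.type: single-node
      DISABLE_SECURITY_PLUGIN: "true"   # development only
  minio:
    image: minio/minio                  # S3-compatible local object store
    command: server /data
```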

API-first posture

Integrations and exports

Higher tiers expose CSV/JSON export and API access in the reference packaging—ideal for analysts wiring scores into notebooks, GRC tools, or newsroom CMS hooks.

Security & governance

Explicit failure mitigations

Operational resilience is specified: malformed JSON, search outages, worker crashes, and Redis loss each have a playbook.

  • LLM returns malformed JSON: incremental parser with field-level validation; fall back to a final-pass parse.
  • Search unavailable: persist to Postgres first; retry indexing later; the UI shows search as pending.
  • Worker crash: idempotent stage checkpoints so work resumes from the last saved state.
  • Redis outage: the UI falls back to polling; event logs persist in Postgres for replay.
Drawn from the engineering reference; adapt the specifics to your own SRE standards.
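The malformed-JSON mitigation can be sketched as a field-level validator that collects problems instead of failing on the first one; the field names (`overall_score`, `line_items`, `total`) are hypothetical:

```python
def validate_scores(payload: dict) -> list:
    """Collect field-level problems in the model's JSON output."""
    errors = []
    for field in ("overall_score", "line_items", "total"):
        if field not in payload:
            errors.append(f"missing field: {field}")
    items = payload.get("line_items") or []
    total = payload.get("total")
    if isinstance(total, (int, float)) and items:
        # Sanity check from the LLM layer: the reported total
        # must match the sum of the line items.
        if abs(sum(i.get("amount", 0) for i in items) - total) > 0.01:
            errors.append("total does not match line items")
    return errors
```

Only documents with an empty error list proceed to canonical persistence; anything else is flagged for review.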

MVP scope

What the reference explicitly defers

Clarity on exclusions keeps delivery honest: no custom training, no complex multi-tenant RBAC for MVP, no native mobile shell.

  • Custom model training (stay on foundation models for MVP).
  • Complex multi-tenant roles (admin-first MVP).
  • Dedicated mobile apps (responsive web only).

Go deeper with the team

Architecture reviews, threat modeling, and integration design for your AWS estate.