How to modernize Java apps on AWS (EC2 to EKS)

May 2, 2026

Refactoring to Stateless

One of the biggest shifts was moving the application toward a stateless design. Each code path that touched the local filesystem was audited and reworked. Temporary files now use Kubernetes ephemeral volumes, sized specifically per workload, and nothing user-related persists on a pod or node once the process finishes. Sessions were restructured around stateless JWTs signed with rotating keys, removing the need for any server-side session store. Background jobs were split out into separate worker Deployments and CronJobs so they could scale independently of the API layer. This made it much easier to handle I/O-heavy file operations without slowing down user-facing traffic.
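
To make the session part concrete, here is a minimal sketch of issuing and verifying a stateless JWT with a key ID (kid) header so verifiers can pick the right key during rotation. It assumes the jjwt library; the key handling, lifetimes, and naming are illustrative, not the project’s actual code.

```java
import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
import io.jsonwebtoken.security.Keys;

import java.security.KeyPair;
import java.time.Instant;
import java.util.Date;

public class SessionTokens {
    // Illustrative only: a real deployment would load the current signing key
    // from a managed store and rotate it out of band.
    private static final KeyPair KEYS = Keys.keyPairFor(SignatureAlgorithm.RS256);

    public static String issue(String userId) {
        Instant now = Instant.now();
        return Jwts.builder()
                .setHeaderParam("kid", "2026-05-key-1") // lets verifiers select the right rotating key
                .setSubject(userId)
                .setIssuedAt(Date.from(now))
                .setExpiration(Date.from(now.plusSeconds(900))) // short-lived session: 15 minutes
                .signWith(KEYS.getPrivate())
                .compact();
    }

    public static String verifiedSubject(String token) {
        // Throws on a bad signature or expired token, keeping callers fail-closed.
        return Jwts.parserBuilder()
                .setSigningKey(KEYS.getPublic())
                .build()
                .parseClaimsJws(token)
                .getBody()
                .getSubject();
    }
}
```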

Amazon S3 as File System of Record

S3 became the backbone of file storage. Prefix structures were carefully designed to prevent hot partitions; the pattern looks like bucket/tenant/id/yyyy/mm/dd/uuid. All data is encrypted at rest using AWS KMS and access to the bucket is locked down through a VPC endpoint with Block Public Access enforced.
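
A minimal sketch of assembling such a key (the tenantId and entityId parameters are placeholders for whatever identifies the owner and object):

```java
import java.time.LocalDate;
import java.util.UUID;

public class ObjectKeys {
    // Produces keys shaped like tenant/id/yyyy/mm/dd/uuid so writes spread
    // across many prefixes instead of piling onto a single hot partition.
    public static String build(String tenantId, String entityId) {
        LocalDate today = LocalDate.now();
        return String.format("%s/%s/%04d/%02d/%02d/%s",
                tenantId, entityId,
                today.getYear(), today.getMonthValue(), today.getDayOfMonth(),
                UUID.randomUUID());
    }
}
```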

Data lifecycle policies automatically transition older files to S3 Standard-IA and, later, to Glacier Instant Retrieval based on retention requirements. For files larger than 16 MB, multipart upload is used to boost throughput and reliability. From the application’s point of view, these uploads are atomic; a file either completes fully or not at all.
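
With the AWS SDK for Java v2, the S3 Transfer Manager takes care of the multipart mechanics; a sketch under assumed names (bucket, key, and file path are placeholders):

```java
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.FileUpload;
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest;

import java.nio.file.Paths;

public class LargeFileUpload {
    public static void main(String[] args) {
        try (S3TransferManager tm = S3TransferManager.create()) {
            FileUpload upload = tm.uploadFile(UploadFileRequest.builder()
                    .putObjectRequest(PutObjectRequest.builder()
                            .bucket("example-bucket")
                            .key("tenant-a/42/2026/05/02/report.bin")
                            .build())
                    .source(Paths.get("/tmp/report.bin"))
                    .build());
            // Multipart completion is all-or-nothing: the object becomes
            // visible only after every part has landed.
            upload.completionFuture().join();
        }
    }
}
```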

Each write to S3 includes a checksum, and the consumer validates it either by comparing the ETag or verifying the digest before committing metadata into Aurora.
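
One way to wire that up is the SDK v2 additional-checksums feature, shown below as a sketch (bucket and key are placeholders; the returned digest is what the consumer re-verifies before the Aurora commit):

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Base64;

public class ChecksummedWrite {
    public static String put(S3Client s3, String bucket, String key, Path file) throws Exception {
        byte[] bytes = Files.readAllBytes(file);
        String sha256 = Base64.getEncoder().encodeToString(
                MessageDigest.getInstance("SHA-256").digest(bytes));
        s3.putObject(PutObjectRequest.builder()
                        .bucket(bucket)
                        .key(key)
                        .checksumSHA256(sha256) // S3 rejects the write if the payload doesn't match
                        .build(),
                RequestBody.fromBytes(bytes));
        return sha256; // persisted with the file's metadata so consumers can re-verify
    }
}
```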

AWS Secrets Manager and IRSA

All sensitive information, including database credentials, API keys, and signing material, is now stored securely in AWS Secrets Manager. Each pod assumes a scoped IAM role through IRSA (IAM Roles for Service Accounts), which grants access only to the specific secret ARNs required for its function.

Secrets are fetched from Secrets Manager during pod startup and injected into the application before it begins serving traffic. This ensures the app always runs with the latest approved credentials while keeping runtime access minimal. Secrets are cached in memory for performance, and retry logic is built in to handle transient retrieval issues cleanly.
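
A stripped-down version of that startup path might look like the following; the cache shape and secret ARN handling are illustrative, and the SDK’s built-in retries cover the transient-failure handling mentioned above:

```java
import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class Secrets {
    // The default credentials chain picks up the pod's IRSA role automatically.
    private static final SecretsManagerClient CLIENT = SecretsManagerClient.create();
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    private Secrets() {}

    public static String get(String secretArn) {
        // In-memory cache: each secret is fetched once at startup and reused
        // until the pod restarts, which is also how rotations are picked up.
        return CACHE.computeIfAbsent(secretArn, arn ->
                CLIENT.getSecretValue(GetSecretValueRequest.builder()
                                .secretId(arn)
                                .build())
                        .secretString());
    }
}
```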

Rotations are managed through AWS Secrets Manager. When a secret is updated, the application picks up the new value automatically after a pod restarts or redeploys. Since no credentials are ever baked into build artifacts or container images, this change effectively removed an entire class of audit and compliance risks that existed in the legacy architecture.

Okta SSO (OIDC) Integration

Since Okta was already being used as the client’s SSO platform, the goal was to make the application integrate smoothly with their existing setup rather than introducing a new identity system. The app was configured as an OIDC confidential client, using Okta as the identity provider. It validates each incoming token by checking the issuer, audience, signature, and expiry before allowing access. Group claims from Okta map directly to roles inside the app, such as ROLE_ADMIN or ROLE_USER, which keeps access control simple and predictable.
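
In a Spring Security resource-server setup, those checks can be wired roughly as below; the issuer URL, audience value, and claim name are assumptions, not the client’s actual configuration:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.oauth2.core.DelegatingOAuth2TokenValidator;
import org.springframework.security.oauth2.core.OAuth2Error;
import org.springframework.security.oauth2.core.OAuth2TokenValidator;
import org.springframework.security.oauth2.core.OAuth2TokenValidatorResult;
import org.springframework.security.oauth2.jwt.Jwt;
import org.springframework.security.oauth2.jwt.JwtDecoder;
import org.springframework.security.oauth2.jwt.JwtDecoders;
import org.springframework.security.oauth2.jwt.JwtValidators;
import org.springframework.security.oauth2.jwt.NimbusJwtDecoder;

@Configuration
public class OktaJwtConfig {

    private static final String ISSUER = "https://example.okta.com/oauth2/default"; // placeholder
    private static final String AUDIENCE = "api://my-app";                          // placeholder

    @Bean
    JwtDecoder jwtDecoder() {
        // fromIssuerLocation discovers Okta's JWKS endpoint and verifies
        // signature and expiry; the extra validators cover issuer and audience.
        NimbusJwtDecoder decoder = (NimbusJwtDecoder) JwtDecoders.fromIssuerLocation(ISSUER);
        OAuth2TokenValidator<Jwt> issuerAndExpiry = JwtValidators.createDefaultWithIssuer(ISSUER);
        OAuth2TokenValidator<Jwt> audience = jwt ->
                jwt.getAudience().contains(AUDIENCE)
                        ? OAuth2TokenValidatorResult.success()
                        : OAuth2TokenValidatorResult.failure(
                                new OAuth2Error("invalid_token", "required audience missing", null));
        decoder.setJwtValidator(new DelegatingOAuth2TokenValidator<>(issuerAndExpiry, audience));
        return decoder;
    }
}
```

From there, a JwtGrantedAuthoritiesConverter pointed at the groups claim with a ROLE_ prefix handles the group-to-role mapping.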

The login flow uses the standard authorization code grant with PKCE, which is well-suited for browser-based clients. Token lifetimes are intentionally short, and refresh tokens are tied to both the user and the client, reducing the chance of misuse.

A small local cache stores Okta’s JWKs (public keys) so that token validation stays fast. The cache also detects when keys rotate and refreshes them automatically. If a verification fails, the app blocks access immediately: a fail-closed approach that adds another layer of safety.

Amazon Aurora MySQL Migration

The move to Aurora was one of the most sensitive parts of the modernization. We chose Aurora MySQL mainly for its managed high availability, automated failover, and simpler scaling compared to a self-managed MySQL setup. The cluster was provisioned in private subnets with Multi-AZ enabled, along with automated backups and point-in-time recovery.

After evaluating tools like DMS and native replication, we decided to go with a two-step dump and restore. The first run seeded Aurora early so we could test schema compatibility and performance. The second, done on the day of cutover, captured the latest data before switching over. This approach was straightforward, low-risk, and easy to verify.

Amazon EKS Deployment and Scaling

On the deployment side, each service has its own set of resource requests and limits, including ephemeral storage for file processing. Readiness and liveness probes were added so that traffic only reaches healthy pods, and if something goes wrong, Kubernetes can automatically restart or drain them.
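
On the application side those probes just poll health endpoints. Assuming Spring Boot Actuator, a custom readiness contribution is only a few lines (the dependency check here is a stand-in):

```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Contributes to /actuator/health, which the readiness probe polls.
@Component
public class DownstreamHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        return canReachDatabase()
                ? Health.up().build()
                : Health.down().withDetail("database", "unreachable").build();
    }

    private boolean canReachDatabase() {
        // Placeholder: a real check might run SELECT 1 against the Aurora endpoint.
        return true;
    }
}
```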

Traffic comes in through an Application Load Balancer managed by the AWS Load Balancer Controller. It handles TLS termination and ties into AWS WAF for optional request filtering. This setup keeps the networking layer simple and consistent across environments.

Scaling is mostly hands-off now. The Horizontal Pod Autoscaler looks at both CPU usage and a custom metric that tracks the depth of the processing queue, so extra pods spin up only when they’re really needed. Behind the scenes, the Cluster Autoscaler keeps the node groups balanced and right-sized. Pod Disruption Budgets and topology-spread rules make sure workloads stay available across multiple Availability Zones, even during upgrades or node replacements.
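
That queue-depth signal has to originate in the workers; one plausible way to expose it is a Micrometer gauge, as sketched below (metric name and queue type are assumptions):

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueDepthMetrics {

    private final Queue<Runnable> processingQueue = new ConcurrentLinkedQueue<>();

    public QueueDepthMetrics(MeterRegistry registry) {
        // Exported as "processing.queue.depth"; a metrics adapter then
        // surfaces it to the HPA as the custom scaling signal.
        Gauge.builder("processing.queue.depth", processingQueue, Queue::size)
                .description("Files waiting to be processed")
                .register(registry);
    }
}
```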

Observability (Logs, Metrics, Traces)

Observability was built in from the start rather than added at the end. Since the application runs on the JVM, we wanted solid visibility into how it behaved under load: things like garbage collection, memory usage, and thread activity.

Logs are structured and sent to CloudWatch, tagged with correlation IDs so individual requests and background jobs can be traced across different components. This made debugging much easier when something went wrong or performance dipped unexpectedly.
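
The correlation ID tagging follows the standard SLF4J MDC pattern; a sketch, with the field name assumed:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

import java.util.UUID;

public class CorrelationIds {
    private static final Logger log = LoggerFactory.getLogger(CorrelationIds.class);

    public static void handle(String incomingId) {
        // Reuse the caller's ID when present so one request traces across components.
        String correlationId = incomingId != null ? incomingId : UUID.randomUUID().toString();
        MDC.put("correlationId", correlationId);
        try {
            log.info("processing request"); // the structured log layout includes MDC fields
        } finally {
            MDC.remove("correlationId"); // don't leak IDs across pooled threads
        }
    }
}
```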

The client already used New Relic for monitoring, so we extended that setup instead of introducing new tools. The application and worker pods publish runtime and business metrics directly to New Relic, where dashboards show latency, throughput, and error trends in near real time. JVM metrics are also collected there, giving teams a single view of system health without jumping between tools.
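
Publishing a business metric through the New Relic Java agent API is close to a one-liner; the metric name below is illustrative:

```java
import com.newrelic.api.agent.NewRelic;

public class BusinessMetrics {
    public static void recordFileProcessed(long durationMillis) {
        // Custom/... metrics appear in New Relic dashboards alongside the
        // JVM metrics the agent collects on its own.
        NewRelic.recordMetric("Custom/FileProcessing/DurationMillis", (float) durationMillis);
    }
}
```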

For distributed tracing, New Relic APM captures traces from the API, worker jobs, and S3 or Aurora interactions. These traces help visualize how requests flow through the system and where bottlenecks occur. SLOs were defined around latency, error rate, and file-processing time, and alerts are routed through New Relic to the on-call team with direct links to runbooks for quick recovery.

Security and Network Hardening

Security was treated as part of the design, not an afterthought. Most internal traffic, including calls from the ALB to the pods, now runs over TLS within the cluster. Both Aurora and S3 are accessed entirely through private networking, which keeps data paths off the public internet. Security groups were tightened so only the ALB can reach the application, and Kubernetes Network Policies limit communication between pods to just what’s necessary.

Container images are built from trusted base images and signed before they’re pushed. The registry only accepts verified images, so there’s a clear chain of provenance. IAM policies were reviewed and rewritten to follow strict least-privilege rules: each service can only reach the S3 prefixes and Secrets Manager ARNs it actually needs. KMS keys are tightly controlled, with rotation enabled and minimal grants to reduce exposure.

Operations, Cost and Reliability

Automation became one of the key themes during modernization. Most of the delivery pipeline was built around GitHub Actions and Argo CD, which made the whole process cleaner and easier to repeat. GitHub Actions handled the build workflow following a simple branching model. It built Docker images and pushed them to Amazon ECR using an IAM role connected through GitHub’s OIDC provider. This setup meant there were no long-lived credentials lying around, which was a big security win.

Once the images were in ECR, Argo CD picked them up and handled deployments to Amazon EKS. We used Helm charts to keep configurations consistent across environments and to make rollbacks quick if we ever needed them.

Most deployments were done as rolling updates, so new versions went out gradually while old pods drained gracefully. This allowed updates without downtime, and if anything misbehaved, we could stop or roll back halfway through without causing a full outage. It felt like a good middle ground between caution and agility: steady enough for production, but fast enough to keep delivery moving.

Cost management was baked in early. We used S3 lifecycle policies to move older files to cheaper storage, tuned node groups to avoid overprovisioning, and relied on VPC endpoints to avoid unnecessary NAT egress charges. Backups and restore procedures weren’t just set up, they were tested regularly. We made sure S3 version recovery and Aurora snapshot or PITR restores actually met the RTO and RPO targets, not just on paper.