SRE / SecOps Engineer (Infrastructure & Security)
Remote (LATAM) | Full-Time Contractor
The Opportunity
AceUp is pivoting to an AI-first architecture. This transition introduces new infrastructure challenges: deploying Python microservices alongside our Ruby monolith, managing Vector Database latency, and securing sensitive conversational data used in RAG pipelines.
We are looking for an SRE / SecOps Engineer to work under the guidance of our Infrastructure Lead. Together, you will own the foundation of our platform. You will be the guardian of our production environment, responsible for automating our infrastructure (IaC), securing our perimeter (DevSecOps), and ensuring “Ally” (our AI) is always available and fast.
The Tech Stack
- Cloud Provider: Google Cloud Platform (GCP).
- Infrastructure as Code: Terraform (or Pulumi).
- Compute: Cloud Run (Serverless), GKE (Kubernetes – if applicable), Cloud Functions.
- CI/CD: GitHub Actions.
- Observability: Datadog / GCP Cloud Monitoring / Sentry.
- Security: GCP IAM, Cloud Armor, Vanta (Compliance).
What You Will Do
- API Reliability & Availability: Ensure our core APIs and AI inference endpoints maintain high availability (99.9%+). You will define and monitor strict SLAs, SLIs, and SLOs, using synthetic checks and real-user monitoring to catch degradation before customers do.
- Incident Management & On-Call: Establish and lead the engineering on-call rotation. You will define incident response protocols, run blameless post-mortems to prevent recurrence, and serve as the primary escalation point during critical outages.
- Own the Cloud Architecture: Manage and evolve our GCP infrastructure using Terraform. You will ensure our environments (Staging, Prod, AI-Sandbox) are reproducible, isolated, and cost-optimized.
- Secure the AI Pipeline: Implement strict IAM policies and network controls (VPC Service Controls) to protect our proprietary datasets and Vector Stores. You will ensure that PII is redacted or encrypted before it touches our AI models.
- Automate CI/CD: Build robust deployment pipelines for both our Ruby on Rails monolith and our new Python AI microservices. You will implement “Guardrails” that prevent bad code from reaching production.
- Compliance & SecOps: Lead our security initiatives. You will manage vulnerability scanning (including container scanning), coordinate penetration tests, and automate evidence collection for SOC 2 compliance.
Who You Are
- A “DevOps” Native: You don’t click buttons in the Cloud Console; you write code to create resources. You believe in “Immutable Infrastructure.”
- Security-First: You think about “Least Privilege” by default. You know how to lock down a GCP project without stopping developers from doing their jobs.
- Pragmatic Scaler: You know when to use a simple Cloud Run service and when to spin up a GKE cluster. You avoid over-engineering.
- Incident Commander: You remain calm during an outage. You are capable of debugging a production fire, whether it’s a database lock or a failed API handshake.
Requirements
- Experience: 5+ years in DevOps / SRE roles.
- GCP Mastery: Deep experience with Google Cloud Platform (IAM, Networking, Cloud Run, Cloud Build).
- IaC Expert: Strong proficiency with Terraform.
- Containerization: Expert in Docker and container orchestration.
- Scripting: Fluent in Bash and Python (or Ruby).
- Language: Conversational English is required.
Nice to Haves
- Experience with AI/ML Infrastructure (deploying models to Vertex AI, managing GPU quotas).
- Experience with SOC 2 or ISO 27001 audits.
- Background in Rails hosting.
AceUp is proud to be an equal opportunity employer, seeking to create a welcoming and diverse environment.
All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
To apply, please send your resume and a short intro to careers@aceup.com.