
Senior Site Reliability Engineer

Description

You will be joining Azira, a global Consumer Insights platform that helps marketing and operations leaders improve their effectiveness with actionable intelligence that drives business results. Azira’s mission is to create a more relevant world where brands are empowered to reach and build relationships with their consumers.

Azira is seeking a Senior Site Reliability Engineer to perform the day-to-day work that supports the company’s data centers, software, and application platforms, which serve the entire business. This demanding role requires working with cross-functional teams and diagnosing complex issues across a variety of platforms. At Azira, an SRE is essentially a cloud infrastructure engineer focused on ensuring the reliability, scalability, and efficiency of our systems.

The ideal candidate has extensive cloud infrastructure experience, superior troubleshooting skills, and knowledge of monitoring and alerting mechanisms.

A Day in the Life

  • Manage large-scale production environments and mission-critical cloud infrastructure.
  • Own stability, automation, scalability, deployment, monitoring, alerting, and security, ensuring maximum availability of Azira’s technical infrastructure.
  • Manage distributed big data systems composed of Kafka, EMR, Spark, MongoDB, Elasticsearch, Redis/Valkey, Google App Engine, and other cloud services.
  • Work closely with the big data, data science, and software engineering teams to ensure the infrastructure can serve current and future needs, and work independently when needed.
  • Set up monitoring systems and create and maintain operational runbooks.
  • Participate in a 24×7 on-call rotation as needed.
  • Influence, create, and contribute to the automation platform.
  • Work independently and take complete ownership of assigned modules, collaborating with other teams as needed.

What You Bring to the Role

  • Bachelor’s or Master’s degree (B.Tech/M.Tech).
  • 6–8 years of experience as a Site Reliability Engineer.
  • Experience with RHEL, CentOS, or Ubuntu system administration.
  • 3+ years of strong proficiency with core Google Cloud and AWS services, including IAM, S3/storage buckets, and VPCs.
  • 2+ years of experience with automation tools such as Terraform and Ansible.
  • 4+ years of strong knowledge of DevOps principles and CI/CD tools such as GitHub Actions, Jenkins, Artifactory, Nexus, and Bitbucket.
  • 1+ years of experience with observability/monitoring tools such as Prometheus or Grafana.
  • Advanced technical experience with front-end web technologies, CDNs, and web server configuration (Apache/Nginx).
  • 2+ years of experience with containers and container orchestration (Docker, Kubernetes).
  • 2+ years of experience defining and deploying monitoring, metrics, and logging systems.
  • 4+ years of experience with Git version control and pull-request workflows.
  • 5+ years of experience scripting in Bash; Python or other languages a plus.
  • Proficiency in documenting processes and monitoring performance metrics.
  • Good interpersonal and communication skills to work effectively with other team members.
