Senior Site Reliability Engineer

Alter Domus

Hyderabad, India

Hybrid

As a Platform Engineer with an SRE focus, you will be instrumental in ensuring our infrastructure is scalable, reliable, and efficient. You will apply software engineering principles to resolve systemic problems, automate production operations, and streamline processes. Your role will balance service reliability and delivery speed while working closely with development teams to build and maintain tools for deployment, monitoring, and operations.

    Key Responsibilities:

    • Design, write, and deliver Terraform modules to improve the availability and reliability of our services on Azure.
    • Manage Kubernetes clusters, ensuring a secure, scalable, and robust environment for our applications.
    • Develop and maintain CI/CD pipelines for automated deployment and management of infrastructure and applications.
    • Monitor and ensure performance, reliability, and security across all platforms and infrastructure.
    • Implement proactive measures to prevent operational issues.
    • Collaborate with development teams to design scalable services through the lens of SRE principles.
    • Administer and troubleshoot Windows systems and Active Directory in a cloud or hybrid environment.
    • Conduct post-incident reviews, drive root-cause analyses, and implement the preventive measures they identify.
    • Continuously assess and improve system performance and reliability.
    • Develop documentation and operational standards, and mentor team members.

    Qualifications:

    • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
    • Minimum of 5 years of experience in a similar platform engineering or SRE role.
    • Strong experience with Azure cloud services, Kubernetes, and Terraform.
    • Proven background in Windows system administration and Active Directory management.
    • Familiarity with scripting and automation using PowerShell or other scripting languages.
    • Experience with implementing CI/CD pipelines.
    • Understanding of network protocols and services (DNS, HTTP, TLS, etc.).
    • Experience in managing full application stacks from the OS up through custom applications.
    • Good knowledge of best practices for IT operations in an always-up, always-available service.

    Preferred Skills:

    • Certifications such as Azure Solutions Architect, Kubernetes Administrator (CKA), Terraform Associate, or similar.
    • Experience in an Agile/Scrum environment.
    • Solid experience with Microsoft SQL Server.
    • Familiarity with monitoring tools such as Prometheus, Grafana, or equivalent.
    • Strong communication skills and the ability to work well within a team.
    • Passionate about learning new technologies and improving systems and processes.
