We are looking for a Senior Operations Engineer to join our team. In this role, you will be responsible for the reliability, performance, and availability of our systems and applications. The ideal candidate has extensive experience in operations, systems engineering, and automation, along with a strong understanding of cloud technologies and DevOps best practices. This position offers the opportunity to lead initiatives that drive efficiency and innovation within our organization.
Responsibilities:
- Design, implement, and manage robust operational systems to ensure high availability and performance of applications and services
- Monitor system performance, identify bottlenecks, and resolve issues before they affect availability
- Collaborate with development teams to build and maintain CI/CD pipelines, ensuring smooth deployment processes
- Develop and maintain automation scripts and tools for system monitoring, configuration, and deployment
- Conduct root cause analysis of incidents, develop preventive measures, and implement corrective actions to improve system reliability
- Evaluate and recommend new technologies, tools, and processes to enhance operational efficiency and scalability
- Ensure compliance with security policies and best practices in system configurations and operations
- Prepare and present operational reports to stakeholders, highlighting performance metrics and areas for improvement
- Mentor and train junior operations engineers, fostering a culture of collaboration and continuous learning
- Participate in on-call rotations to provide support for critical incidents and ensure timely resolution of operational issues
Requirements:
- Bachelor’s degree in Computer Science, Information Technology, or a related field
- 5+ years of experience in operations engineering, systems administration, or related roles
- Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes)
- Proficiency in scripting and programming languages (e.g., Python, Bash, PowerShell) for automation and tooling
- Experience with monitoring tools (e.g., Nagios, Prometheus, Grafana) and logging solutions (e.g., ELK Stack, Splunk)
- Understanding of networking concepts and protocols, and experience with infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation)
- Excellent problem-solving skills and the ability to work effectively under pressure in a fast-paced environment
- Strong communication skills, with the ability to collaborate with cross-functional teams and convey technical concepts clearly
- Familiarity with Agile methodologies and DevOps practices is a plus
- Relevant certifications (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud DevOps Engineer) are an advantage
Work Environment:
- Office-based, with remote work options subject to company policy
- Dynamic and collaborative atmosphere that encourages innovation and professional growth
- Opportunities for career advancement within the operations engineering field