SITE RELIABILITY ENGINEER - NTUC ENTERPRISE NEXUS CO-OPERATIVE LIMITED

Full Time: $5,000 - $8,000 per month
Full Time | Information Technology

Job Description

A site reliability engineer (SRE) will spend up to 50% of their time on "ops" work such as handling issues, on-call duty, and manual intervention. Since the software system that an SRE oversees is expected to be highly automated and self-healing, the SRE should spend the other 50% of their time on development tasks such as new features, scaling, or automation. The ideal candidate is either a software engineer with a good system administration background or a highly skilled system administrator with knowledge of coding and automation.

As an SRE in NE Digital, you will drive initiatives to improve the automation, scalability, and reliability of our core services such as FairPrice Online, Scan&Go, Identity, My First Skool, and many more. As a member of the NTUC Enterprise Center of Excellence, you will be exposed to the latest technologies, including AWS Cloud, Google Cloud Platform, Kubernetes, Kubeflow, ML/AI, and Big Data, in a hybrid/multi-cloud environment. We are strong believers in DevSecOps, SRE, Agile, and FinOps.

Work with release engineers to ensure that the software delivery pipeline is as efficient as possible. 

Collaborate closely with product developers to ensure that the designed solution meets non-functional requirements such as availability, performance, security, and maintainability.

Take responsibility for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, and refinement. 

Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.

Maintain services once they are live by measuring and monitoring availability, latency, and overall system health. 

Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity. 

Practice sustainable incident response and blameless postmortems.

Document "tribal" knowledge.

Job Requirements

Bachelor's degree in Computer Science or a related technical field involving systems engineering, or equivalent practical experience.

Experience in Unix/Linux and/or Windows operating systems. 

Experience in analyzing and troubleshooting systems. 

Understanding of infrastructure monitoring, logging, alerting, release management, and configuration management.

Understanding of networking (e.g. TCP/IP, routing, network topology, load balancers, DNS, NTP). 

Experience in at least one of the following: Python, Go, Perl, Ruby, or shell scripting.

Experience with public cloud platforms (AWS and/or GCP).

Experience maintaining Internet-facing production-grade applications.

Experience with software deployment and/or orchestration technologies, e.g., Puppet, Chef, Salt, Ansible, Docker, Kubernetes, Terraform. 

Experience in CI/CD (e.g., JIRA, Git, Jenkins, Nexus, ...).

Experience in standard IT security practices (e.g., encryption, certificates, key management).

Excellent communication and problem-solving skills, with strong attention to detail.

Flexibility to work non-business hours that may include weekends and/or holidays.

Self-starter who is able to identify and perform tasks with minimal supervision.

Experience with G Suite apps (Gmail, Google Sheets, Google Docs, ...).





Work Location

1 Marina Boulevard, One Marina Boulevard, Singapore 018989

