Date Added: YESTERDAY

Linux, HPC, and Kubernetes Systems Engineer

Wallingford, OX, UK

Company: WNTD

Job Title: Linux, HPC, and Kubernetes Systems Engineer

Location: Remote, with onsite work in Wallingford as required

Job Type: Contract, 3 months, inside IR35

Job Summary: We are looking for a highly skilled Linux, HPC, and Kubernetes Systems Engineer to join our growing team. The role is responsible for maintaining and troubleshooting High-Performance Computing (HPC) environments, with a focus on Lenovo and Ubiquity platforms, and for managing Kubernetes clusters. The ideal candidate will have strong experience in Linux administration, HPC systems, and Kubernetes, along with a proven ability to solve complex technical issues and optimize infrastructure performance.

Key Responsibilities:

  • Manage and maintain HPC environments with a primary focus on Lenovo and Ubiquity platforms.
  • Install, configure, and troubleshoot Kubernetes clusters in a production environment.
  • Monitor and optimize Linux-based systems, ensuring reliability and performance for HPC and containerized applications.
  • Troubleshoot complex issues in HPC clusters and Kubernetes infrastructure, including hardware, software, networking, and performance-related problems.
  • Manage resource allocation, workload scheduling, and performance tuning for HPC environments.
  • Implement and manage container orchestration using Kubernetes, ensuring scalability and high availability.
  • Automate system processes and improve operational efficiency using scripting (Bash, Python, etc.).
  • Perform system upgrades, apply patches, and monitor security vulnerabilities in Linux, HPC, and Kubernetes environments.
  • Collaborate with cross-functional teams to design, deploy, and optimize infrastructure solutions for both HPC and Kubernetes-based workloads.
  • Provide documentation, training, and technical support to end-users and internal stakeholders.
  • Ensure that backup and recovery strategies are effectively implemented for both HPC and Kubernetes environments.
  • Monitor system health and performance using appropriate tools (e.g., Prometheus, Grafana) and take proactive measures to address potential issues.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
  • Proven experience in Linux system administration (Red Hat, CentOS, or Ubuntu).
  • Strong experience managing HPC systems, particularly with Lenovo and Ubiquity platforms.
  • Extensive hands-on experience with Kubernetes cluster deployment, maintenance, and troubleshooting.
  • Deep understanding of container technologies such as Docker and Kubernetes.
  • Strong troubleshooting skills across Linux, HPC environments, and Kubernetes infrastructures.
  • Proficiency in scripting languages (Bash, Python) for automation and process improvement.
  • Knowledge of cluster management and workload scheduling software (e.g., SLURM, PBS) for HPC environments.
  • Familiarity with networking protocols, server hardware, storage solutions, and system monitoring tools.
  • Ability to work independently in a fast-paced environment, managing multiple tasks and priorities.

Preferred Skills:

  • Experience with cloud-based Kubernetes deployments (AWS, Azure, GCP).
  • Familiarity with container networking, service discovery, and load balancing (e.g., Istio, Envoy).
  • Knowledge of DevOps tools and methodologies (e.g., Ansible, Terraform).
  • Experience with virtualization and container security practices.
  • Experience working in research, academic, or enterprise-level environments.

Benefits:

  • Competitive salary and benefits package.
  • Health, dental, and vision insurance.
  • Paid time off, holidays, and professional development opportunities.
  • Opportunity to work in a cutting-edge technological environment.