University of Liverpool
Senior Research IT Platform Engineer
Salary: £45,485 - £57,968
Role overview and University context:
The IT Services Department has a key role to play in maintaining the IT infrastructure and business systems of the University to support the 54 academic departments, based in 300 buildings, across a number of sites.
The University of Liverpool's IT Services Department is recruiting a Senior Research IT Platform Engineer to join the Research IT Team (formerly Advanced Research Computing). The team currently supports a range of research IT infrastructure, including a large (more than 7000 cores) HPC cluster called Barkla (see: https://www.liverpool.ac.uk/it/advanced-research-computing/facilities/high-performance-computing/), an HTCondor High Throughput Computing pool providing around 2500 job slots at peak (see: https://condor.liv.ac.uk), various Linux servers and cloud facilities.
Based on this infrastructure, the team has delivered a range of high-quality research IT services to University researchers. We are embarking on an ambitious expansion of our Research IT capabilities and are looking to significantly grow the portfolio of services provided by the existing Research IT team, from enhanced technical infrastructure and research software engineering support to more extensive researcher engagement. Key to this ambition is building a strong technology-driven team, which includes recruiting to this post.
You will be responsible for administering and further developing a range of research computing infrastructure that supports University researchers on both local and national resources. This includes planning and designing new research computing infrastructure (such as HPC clusters, HTCondor, high-performance data storage, virtual desktop infrastructure and cloud infrastructure) and dealing with vendors and users. You will work as part of a dynamic team in developing and delivering high-quality research computing services (such as AI services and research data services) based on the infrastructure. We are looking for individuals with a sound knowledge of system administration and a solid understanding of Linux systems, networks and storage, especially in the context of high-performance computing and high-performance data storage. Working with the line manager, you will also contribute to the strategic planning of research computing services and lead associated implementation work.
This will include aspects of defining new services, ensuring service delivery, user engagement and support, hardware and software installation and support, and maintenance and upgrades of systems that include HPC services, research servers and storage infrastructure. This is a specialist role which includes provision of expert technical advice to other teams within the broader University. You will also have a broad interest in IT infrastructure and its provision, and will be encouraged and supported to undertake further training as required in line with changing service developments.
The position is a part of a vibrant and growing activity that is at the heart of research computing in the University of Liverpool and provides exciting opportunities for individuals seeking to develop their career in these areas.
You will liaise closely with the Service Desk to ensure that all research computing related requests and incidents are progressed, through call tracking, communication and escalation where appropriate. The role holder will use their expert knowledge of IT systems to troubleshoot and resolve issues in a timely manner within agreed response times.
We require enthusiastic individuals with a passion for learning new technology and skills. You must be self-motivated, capable of working unsupervised but also able to work closely with colleagues both in and outside of the team and department.
The university operates a hybrid work environment for staff in Central Professional Services. There is an opportunity to work from home and on campus.
- Plan and deploy operating system and application upgrades, security patches and other standard Linux system management tasks.
- Manage and deploy hardware upgrades for the compute, network and storage services.
- Develop future plans for new services, engaging with the research community to capture requirements.
- Work to ensure local research computing services work with and complement regional research computing infrastructure through involvement with the N8CIR consortium and with other national Tier 2 and Tier 1 services to which Liverpool users have access.
- Work with the team leader to develop long term strategies for developing and sustaining the portfolio of research IT services, both locally and regionally.
- Manage the installation and provision of services working in concert with the team leader. This involves performing and overseeing installation and deployment work in a data centre environment of compute, storage and network systems.
- Plan and carry out changes to services in communication with users and the wider University IT support community.
- Troubleshoot systems and carry out or coordinate repairs in a timely manner.
- Ensure best practices are consistently applied to research IT services, working with the team leader to define new practices as appropriate.
- Write and update documentation for research IT infrastructure.
- Investigate areas of potential benefit for developing the research IT infrastructure for the University. Proactively engage with the future Research IT Technical Advisory Group.
- Assist with procurements, asset management and other administration duties as required.
- Engage with users to ensure services delivered meet user requirements.
- Support users and other Research IT staff in diagnosing complex problems requiring HPC specialist knowledge.
- Proactively engage with users of Research IT to maximise the impact of the Research IT facilities on University research.
- Provide reports on user usage and system operation as requested.
- Provide support for deploying software applications as required.
- Develop and support user forums, open days, training events and other research computing related community meetings to ensure Research IT provides services relevant to the research community.
- Deliver research computing training as and when required to a wider audience, providing briefing notes for the courses delivered.
- Assist in supporting and deploying new and novel services (e.g. cloud based services), transitioning services into production.
- Assist in supporting other compute clusters in the University, particularly as they transition to the new data centre.
- Proactively develop links with those in similar roles within the University and at other HE sites.
- Allocate tasks to any junior members of the team, giving guidance where necessary.
- Provide coaching/training to established team members on an on-going basis, acting as mentor to junior team members.
- Continuously learn new technology & skills, and apply them to the work.
- Establish good working relationships with clients and colleagues, internal and external to the department and the University.
- Assist on courses provided as part of the IT Services training programme.
- Ensure sound time management, multi-tasking effectively in order to adapt plans, prioritise and coordinate work, and respond as necessary to changing priorities, circumstances and workload.
- Undertake other duties commensurate with the grade as required.
Skills & Knowledge:
Proven experience in six or more of the following areas is essential.
- HPC systems administration.
- Red Hat/CentOS/Ubuntu Linux systems administration.
- HPC performance and benchmarking.
- Job schedulers such as Slurm, Grid Engine and PBS.
- HTCondor management.
- Authentication methods including LDAP, Active Directory and Kerberos.
- Experience of monitoring and alerting systems such as Nagios, Ganglia and SCOM.
- Configuration management tools such as Ansible, Puppet and Chef.
- High-performance data storage management.
- Storage area network and fibre channel networking administration.
- Network file services including NFS with Kerberos-based authentication.
- Parallel file systems such as Lustre.
- CIFS/SMB and Samba administration.
- Cloud storage (AWS/Azure/AIMES/Dropbox) management.
- Cloud infrastructure and services management.
- VMware virtual machine management.
- Experience of using container tools such as Docker and Singularity.
- Experience of using virtual environment tools such as Conda, Virtualenv and Pipenv.
- Experience of installing software using various methods.
- Experience of managing software with Environment Modules.
- Experience of compiling, porting, testing, debugging, profiling and/or optimising code.
- Shell scripts (bash/csh/tcsh).
- Parallel programming experience of using libraries/tools such as MPI, OpenMP and CUDA.
- Experience of using GPUs for scientific computing.
- Research experience of using artificial intelligence, machine learning or deep learning technology.
- Experience of using deep learning frameworks such as TensorFlow, PyTorch, Keras and Caffe.
- Experience of using computationally intensive applications such as VASP, Gaussian, Molpro, GROMACS, LAMMPS, CP2K, CASTEP, Spartan, NEMO, OpenFOAM, Fluent, Abaqus and MATLAB.
- Experience of administering and configuring databases such as MySQL, MariaDB and SQLite.
- Data management in REDCap.
- Data visualisation tools such as Power BI.
- Experience of managing license servers for applications such as MATLAB, Maple, Mathematica, Abaqus, Fluent, Mathcad and Spartan.
- Experience of using cloud resources in scientific computing applications (e.g., AWS, Azure).
For full details, including a full job description, please contact our recruitment partner, Paul Hubbard of Eutopia Solutions, at email@example.com
Eutopia Solutions Ltd ("Eutopia") is acting as an Employment Agency in relation to this vacancy.
Eutopia is an equal opportunities employer and positively encourages applications from any suitably qualified and eligible candidates.