LINUX SERVER GPU ENGINEER - TS/SCI WITH SECURITY CLEARANCE
Company: Xcelerate Solutions
Location: Bethesda
Posted on: November 16, 2024
Job Description:
Linux Server/NVIDIA Admin/GPU Engineer - TS/SCI
Xcelerate Solutions is seeking a Linux Server GPU Engineer to support
the National Media Exploitation Center (NMEC). This role requires an
individual with technical experience administering NVIDIA DGX-1 and
A100 servers in both physical and virtual environments. This
individual should be detail-oriented in order to capture customer
inquiries accurately. This role is responsible for interacting with
administrators to handle service inquiries and problems. Duties
include examining customer problems and implementing appropriate
corrective action to initiate a repair or return to service. This
role analyzes recurring problems, initiates solutions to prevent
their recurrence, and analyzes existing infrastructure for
tuning/performance enhancements. The individual will provide systems
and software operations and maintenance support in a large,
multi-enclave enterprise environment. This individual will work in a
team environment to ensure that mission needs are met and that
customer capabilities remain functional. Individuals in this role may
be required to perform technical software configuration, rebooting,
and other remedial actions on customer servers. The customer uses an
Agile framework to plan and successfully complete all initiatives.
The work location is the Intelligence Community Campus in Bethesda.
Security Clearance:
TS/SCI
Location: Bethesda, MD
Responsibilities:
* GPU Architecture and Design: Collaborate with a multidisciplinary
team to define, develop, and optimize GPU architectures, ensuring
they meet stringent performance, power efficiency, and feature
requirements. Leverage industry insights to drive design decisions.
Ensure that GPU designs and integrations are not only optimized for
Linux but are also adaptable to other operating systems.
* Operating System
Integration: Work closely with operating system developers to ensure
smooth GPU integration with Linux-based systems. Optimize GPU drivers
for compatibility, performance, and reliability in a Linux
environment. Provide regular maintenance and updates to ensure
continued compatibility.
* Hardware Expertise: Contribute to the design and development of GPU
hardware, providing insights into hardware architecture to ensure
efficient interaction with software components. Maintain and update
hardware designs as needed.
* CUDA (Compute Unified Device Architecture)/OpenCL (Open Computing
Language) Programming: Develop and optimize applications using CUDA
or OpenCL, harnessing the full potential of GPU hardware for parallel
processing, high-performance computing, and machine learning on Linux
platforms. Maintain and update software for optimal performance.
* Performance Analysis: Analyze GPU
performance, identify bottlenecks, and develop strategies to enhance
performance across various applications in Linux, addressing both
hardware and software considerations. Regularly monitor and improve
performance.
* GPU Tooling: Create and maintain debugging tools, profiling
utilities, and performance analysis software tailored for Linux
systems to facilitate efficient GPU development and troubleshooting.
Keep tools up to date and functional.
* Power Efficiency: Work on power management techniques to optimize
GPU power consumption, ensuring efficient operation on both mobile
and desktop Linux platforms. Continuously assess and enhance power
efficiency strategies.
* Testing and Validation:
Design and execute tests to validate GPU performance and
functionality on Linux, including stress testing, benchmarking, and
debugging to ensure robust operation. Maintain and expand the testing
suite.
* Documentation: Maintain comprehensive technical documentation,
including architectural specifications, code documentation, and
Linux-specific best practices for GPU development. Keep documentation
up to date with changes and improvements.
* Industry Insight: Stay current on the latest trends, innovations,
and competitive landscape within the GPU industry, contributing to
research efforts and proposing Linux-specific approaches to GPU
design and optimization. Share regular updates and insights with the
team.
Minimum Requirements:
* Bachelor's or higher degree in Computer Science, Electrical
Engineering, or a related field. Additional years of experience may
be considered in lieu of a degree.
* 10+ years of relevant systems engineering experience.
* Proven experience in GPU architecture design and GPU performance
optimization.
* Expertise in operating system integration for Linux.
* Strong understanding of computer hardware architecture,
particularly as it relates to Linux systems.
* Knowledge of parallel computing, graphics algorithms, and real-time
rendering in Linux environments.
* Familiarity with GPU debugging tools and profiling software for
Linux.
* Excellent problem-solving skills and the ability to collaborate
within a team.
* Strong communication skills for conveying technical information in
a Linux context.
* Proficiency with scripting languages such as Python or Bash.
* Proficiency with automation tools such as Ansible, Puppet, Salt,
Terraform, etc.
* Candidate must, at a minimum, meet DoD 8570.01-M IAT Level II
certification requirements (currently Security+ CE, CCNA Security,
GICSP, GSEC, or SSCP, along with an appropriate computing environment
(CE) certification). An IAT Level III certification would also be
acceptable (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP).
Preferred Qualifications:
* Published research or contributions in the GPU industry, especially
related to Linux.
* Experience with machine learning and neural network frameworks on
GPUs in Linux.
* Knowledge of GPU virtualization, cloud computing, and emerging
Linux-based technologies in the field.
* Proficiency in GPU-specific programming languages.
* Experience with container technologies (Docker, Kubernetes).
* Experience with Prometheus/Grafana for monitoring.
* Knowledge of distributed resource scheduling systems [Slurm
(preferred), LSF, etc.].
* Familiarity with CUDA and managing GPU-accelerated computing
systems.
* Basic knowledge of deep learning frameworks and algorithms.
About Xcelerate Solutions: Founded in 2009 and
headquartered in McLean, VA, Xcelerate Solutions
(www.xceleratesolutions.com) is one of America's fastest-growing
companies. Xcelerate's culture is defined by our diversified
workforce of dynamic and versatile professionals, supported with
growth and development opportunities that contribute to individual
and company growth. This strong commitment to our employees has
been recognized by our inclusion on the Washington Business
Journal's 50 Best Places to Work list, as well as by our Great Place
to Work certification and a 4.6-star Glassdoor rating with 99% CEO
approval. Come find out why Xcelerate Solutions is one of the DC
Metro area's top employers! Xcelerate Solutions is an Equal
Employment Opportunity/Affirmative Action Employer. We evaluate
qualified applicants without regard to race, color, national
origin, religion, age, equal pay, disability, veteran status, sex,
sexual orientation, gender identity or expression, genetic
information, or any other protected characteristic. As part of this
commitment to the full inclusion of all qualified individuals,
Xcelerate provides reasonable accommodations if needed because of
an applicant's or an employee's disability.
Pay Transparency Notice: Xcelerate Solutions will not discharge or
in any other manner discriminate against employees or applicants
because they have inquired about, discussed, or disclosed their own
pay or the pay of another employee or applicant.