Staff Software Engineer, Capacity Engineering
Job Summary
This role involves managing and optimizing Pinterest's large-scale ML infrastructure, with a focus on efficiency and capacity management. The ideal candidate should have expertise in GPU architectures, ML software stacks, and cloud-native platforms like Kubernetes and AWS. Strong software development skills in Java, Python, and C++ are required, along with experience in distributed applications and performance engineering. The position emphasizes collaboration across engineering teams and supports flexible, inclusive work arrangements.
Job Description
About Pinterest:
Millions of people around the world come to our platform to find creative ideas, dream about new possibilities and plan for memories that will last a lifetime. At Pinterest, we’re on a mission to bring everyone the inspiration to create a life they love, and that starts with the people behind the product.
Discover a career where you ignite innovation for millions, transform passion into growth opportunities, celebrate each other’s unique experiences and embrace the flexibility to do your best work. Creating a career you love? It’s Possible.
Pinterest is seeking a Staff Software Engineer, Capacity Engineering focused on managing and optimizing the ML infrastructure. The team is responsible for efficiently managing one of the largest-scale cloud-native infrastructures in the world.
This role is highly impactful, as efficiency is an ongoing strategic priority for Pinterest. The role has direct visibility across Pinterest Engineering and with Engineering and company leadership. The team is looking for a candidate with a strong background in ML Infrastructure focusing on efficiency and optimization.
What you’ll do:
- Manage the ML hardware capacity that powers the models running at Pinterest
- Improve the efficiency of ML Infrastructure at Pinterest
- Build, develop, and mature profiling and optimization capabilities for ML Infrastructure at Pinterest scale
- Collaborate with ML Platform, Infrastructure Engineering and SRE teams in their mission to deliver highly available, resilient, secure and efficient ML foundations for Pinterest’s tech stack
What we’re looking for:
- Deep understanding of GPU Architectures, PyTorch, etc.
- Deep understanding of the supporting parts of the ML software stack, such as Scheduling, Data and Storage
- Hands-on experience with shared platforms like Kubernetes
- Strong technical and performance engineering skills to collaborate with stakeholders on complex and ambiguous technical challenges
- Experience building and managing highly available distributed applications at scale
- Proficiency in software development languages such as Java, Python and C++
- Excellent skills in communicating complex technical issues
- Understanding of ML Models, Kernels and optimization opportunities
- Hands-on experience with large, cloud-native multi-tenant platforms at Internet scale
- Experience with AWS or similar cloud environments
- Deep understanding of infrastructure capacity and performance
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
In-Office Requirement Statement:
We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
- This role will need to be in the office for in-person collaboration 1-2 times/quarter and therefore can be situated anywhere in the country.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.
At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors, including location, travel, relevant prior experience, and particular skills and expertise.
Information regarding the culture at Pinterest and benefits available for this position can be found here.
Our Commitment to Inclusion:
Applications are no longer being accepted for this job.