Senior Incident Manager for Trading Platform | Remote Role

Remote
Full-time
Are you a technical troubleshooter with exceptional analytical skills and a passion for maintaining high-availability trading systems? We're seeking a dedicated Senior Incident Manager to oversee our sophisticated trading platform operations and ensure 24/7 availability across all environments. Your expertise in identifying, resolving, and preventing incidents will be crucial for maintaining our system's reliability and performance.

Key Responsibilities:
- Monitor production reporting systems with vigilance, solving real-time problems while maintaining 99.9%+ uptime for business-critical trading applications.
- Identify and resolve incidents through comprehensive log analysis, performance metric evaluation, and service interaction assessment, then coordinate with development teams to implement permanent solutions.
- Manage the entire build, release, and configuration lifecycle for production applications using modern CI/CD pipelines and version control systems.
- Deploy, automate, and maintain AWS cloud-based infrastructure, optimizing for availability, performance, scalability, and security to support trading operations.
- Oversee development and QA environments to ensure consistency and reliability across the entire technology ecosystem.
- Analyze system metrics and application performance patterns, creating detailed reports and actionable recommendations for technology improvements.
- Collaborate with cross-functional teams to enhance system reliability and systematically reduce incident frequency through preventative measures.

Required Skills and Experience:
- 1+ years of experience designing, analyzing, troubleshooting, and resolving issues in multi-tiered application architectures.
- Demonstrated expertise supporting service-oriented and microservices architectures that demand 24/7 availability.
- Proficiency with SQL queries and advanced database troubleshooting techniques.
- Working knowledge of Oracle (PL/SQL 19c or newer) and/or PostgreSQL (version 14+) database systems.
- Linux fundamentals including command-line utilities (awk, sed, bash, cat, grep) and system monitoring tools.
- Practical experience with AWS services including VPC, EC2, ECS, Route53, S3, and related cloud infrastructure.
- Proficiency with Git version control systems and enterprise branching strategies.
- Strong understanding of networking concepts, protocols, and systematic troubleshooting methodologies.
- Exceptional analytical skills with demonstrated ability to identify root causes in complex, interconnected systems.
- Outstanding communication skills to coordinate incident response across multiple technical teams.

Nice to Have:
- Advanced Linux system administration and web server (Nginx, Tomcat) configuration experience.
- Hands-on experience with modern DevOps tools such as Docker, Kubernetes, Jenkins, GitLab-CI, and Terraform.
- Understanding of JVM configuration parameters and runtime optimization techniques.
- Knowledge of modern API technologies including RESTful services, GraphQL, and gRPC protocols.
- Background in high-load application implementation, scaling, and performance optimization.
- Software engineering experience, particularly in financial markets, Forex trading, or gaming industries.
- Proficiency with JIRA for incident tracking, workflow management, and cross-team coordination.
- Experience with the ELK stack (Elasticsearch 7.x+, Logstash, Kibana) for comprehensive log management.
- Familiarity with enterprise monitoring tools such as Zabbix, Prometheus with Grafana, or similar platforms.
- Working knowledge of message brokers including Apache Kafka, AWS SQS/SNS, and Enterprise Service Bus implementations.
- Scripting abilities in Bash, Python 3.x, or other automation languages for routine task elimination.

Why Join Our Team:
Working with us means taking ownership of mission-critical systems that power sophisticated trading operations worldwide. You'll expand your technical expertise across multiple domains while working in a flexible, remote environment with a team of dedicated professionals. We offer competitive compensation, continuous professional development opportunities, and the chance to make a significant impact on a high-performance financial technology platform that processes millions of transactions daily.