Multicloud MSP for a Major Restaurant Operator’s AWS Environment

Dedalus tools for a sustainable multicloud data architecture

About the Customer  

Our client is one of Brazil’s largest operators in the food service sector, managing hundreds of restaurants for iconic international brands across all Brazilian states. The company is renowned for its consistent growth and strong commitment to operational efficiency, quality service, and sustainability. By continuously investing in technology and processes, from ingredient selection to final delivery, the company has secured its leadership position in the market. 

Customer Challenge 

With environments distributed across multiple clouds (AWS, GCP, and Azure), the customer needed to expand its support scope to a 24×7 model that would standardize monitoring, ITSM, and incident management in a single, unified operation. The primary objectives were to integrate the corporate monitoring tools, define clear SLAs by criticality, structure crisis management, and reinforce business continuity through robust backup and disaster recovery solutions, all without interrupting business operations during the transition. The previous setup carried several risks: longer incident recovery times due to a lack of standardization, fragmented visibility across clouds, rising operational costs without proper governance, and greater exposure to critical incidents in the absence of a formal escalation and communication process.

Dedalus Solution 

Dedalus structured a 24×7 Multicloud MSP support service that delivers proactive environment monitoring along with comprehensive support and service management, using processes and tools fully integrated into the customer's environment. Our operation covers crisis management, performance management, and business continuity, and is supported by a dedicated team, defined workflows, and recurring reports.

For the AWS environment specifically, our MSP service provided management and optimization across a wide range of services: Compute and Containers (Amazon EC2, Amazon EKS), where we implemented capacity adjustments and high availability; Storage and Data (Amazon S3, Amazon EFS, Amazon RDS, Amazon DynamoDB, Amazon DocumentDB, Amazon ElastiCache), aligned with the customer's backup and continuity policies; Networking and Security (Amazon VPC, Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, AWS WAF, AWS KMS), with security best-practice reviews; and Serverless and Integration (Amazon API Gateway, AWS Lambda, Amazon SQS, Amazon SNS), with a focus on observability and performance tuning.

For monitoring and observability, the solution uses the client's corporate tools (Zabbix, Grafana, Dynatrace) as telemetry sources, enabling automatic ticket generation and supporting the creation of APM dashboards. The service process follows a clear prioritization and escalation flow with demanding SLAs, structured communication with stakeholders during crisis events, and a formal root cause analysis (RCA) process that drives continuous improvement.
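To illustrate how alerts from telemetry tools can feed a prioritization and escalation flow, here is a minimal Python sketch. It is not the customer's or Dedalus's actual configuration: the criticality tiers, their response targets, and the severity-to-priority mapping are hypothetical (the severity labels are modeled on Zabbix trigger severities, and only the 15-minute target echoes the contractual limit cited in the results), and the names `ticket_from_alert` and `needs_escalation` are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# HYPOTHETICAL response targets per criticality tier. Only the 15-minute
# figure echoes the contractual limit cited in this case; the rest are
# illustrative placeholders.
RESPONSE_SLA = {
    "critical": timedelta(minutes=15),
    "high": timedelta(minutes=30),
    "medium": timedelta(hours=2),
    "low": timedelta(hours=8),
}

# HYPOTHETICAL mapping from alert severity (labels modeled on Zabbix trigger
# severities) to ticket priority.
SEVERITY_TO_PRIORITY = {
    "disaster": "critical",
    "high": "high",
    "average": "medium",
    "warning": "low",
    "information": "low",
    "not classified": "low",
}


@dataclass
class Ticket:
    source: str           # telemetry tool that raised the alert, e.g. "zabbix"
    summary: str
    priority: str
    opened_at: datetime
    respond_by: datetime  # deadline for the first response under the SLA


def ticket_from_alert(source: str, severity: str, summary: str,
                      raised_at: datetime) -> Ticket:
    """Open a ticket from an incoming alert and compute its response deadline."""
    # Unknown severities fall back to the lowest priority.
    priority = SEVERITY_TO_PRIORITY.get(severity.strip().lower(), "low")
    return Ticket(source, summary, priority, raised_at,
                  raised_at + RESPONSE_SLA[priority])


def needs_escalation(ticket: Ticket, now: datetime, acknowledged: bool) -> bool:
    """Escalate when nobody has acknowledged the ticket by its deadline."""
    return not acknowledged and now > ticket.respond_by


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 3, 0)
    t = ticket_from_alert("zabbix", "Disaster", "Checkout API error rate high", t0)
    print(t.priority, t.respond_by)  # critical 2024-01-01 03:15:00
    print(needs_escalation(t, t0 + timedelta(minutes=8), acknowledged=False))   # False
    print(needs_escalation(t, t0 + timedelta(minutes=16), acknowledged=False))  # True
```

In practice this logic would more likely live in the ITSM tool and its alerting integrations than in standalone code; the sketch only makes the chain from alert severity to ticket priority to response deadline and escalation explicit.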

Results and Benefits 

Through the continuous Multicloud operating model, we successfully reduced the average incident response time, increased the availability of critical services, and provided greater cost predictability on AWS by implementing FinOps practices and capacity reviews. The 24×7 operation is supported by a robust escalation process via Microsoft Teams and telephone, ensuring agility and efficiency. The most significant result concerns incident response: against a contractual limit of 15 minutes, the actual average response time was 8 minutes, about 47% below the limit.
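The percentage follows from comparing the actual average response time with the contractual limit, where T_limit = 15 min and T_actual = 8 min are the two figures reported above:

```latex
\[
\frac{T_{\text{limit}} - T_{\text{actual}}}{T_{\text{limit}}}
= \frac{15\,\text{min} - 8\,\text{min}}{15\,\text{min}}
= \frac{7}{15} \approx 46.7\%
\]
```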
