Tuesday, October 29

07:30 GMT

Badge Pickup
Tuesday October 29, 2024 07:30 - 17:00 GMT
Ground Floor Foyer

08:45 GMT

Morning Coffee and Tea
Tuesday October 29, 2024 08:45 - 08:45 GMT
The Forum

08:45 GMT

Opening Remarks
Tuesday October 29, 2024 08:45 - 09:00 GMT
The Liffey

09:00 GMT

Dude, You Forgot the Feedback: How Your Open Loop Control Planes Are Causing Outages
Tuesday October 29, 2024 09:00 - 09:45 GMT
Laura de Vesine, Datadog, Inc.

It's a strong principle of good UX design that users should get feedback about the results of their actions, to help prevent errors. Experienced SREs know to build additional observability into systems so we can watch them change as we mutate them, but this observability is typically out-of-band and requires a conscious, deliberate action to observe, so getting good feedback into our actions requires constant vigilance and training of new users. What if we instead built control planes that tell us exactly what we've done, and what effect that is having?
This talk explores various patterns of "fire and forget" control planes in production systems, how each one contributes to outages, and some simple solutions to build better tools for operations.

Laura de Vesine

Datadog, Inc.
Laura de Vesine is a 20+ year software industry veteran. She has spent the last 8 years in SRE working in incident analysis and prevention, chaos engineering, and the intersection of technology and organizational culture. Laura is currently a staff engineer at Datadog, Inc. She also...
The Liffey

09:45 GMT

You Depend on Time, This Is How It Works and You Won’t Believe It
Tuesday October 29, 2024 09:45 - 10:30 GMT
Philip Rowlands, Jane Street

This is a talk about calendars, clocks, and computers. We’ll look at the metrology of the second, from candles to atoms, and consider how your phone always seems to know the right time.

If you’ve ever wondered why today is Thursday, or how the Gregorian calendar was adopted, then come and learn the mistakes to avoid the next time you are the Pope.

If you’ve ever wondered why two clocks disagree, then come and learn about the challenges of finding the elusive perfect tick, and why it’s not at the top of Mount Everest.

And if you’ve ever wondered how calendars and clocks work together in modern computer systems, then come and learn about protocols and APIs for keeping clocks reliable and accurate.

Philip Rowlands

Jane Street
Philip Rowlands has been an SRE since before he really understood what it meant. He has worked over the years on automated telephony, Google Production SRE, Mainframe Linux, and more recently for various financial firms, all of which had timekeeping challenges.
The Liffey

10:30 GMT

Coffee and Tea Break
Tuesday October 29, 2024 10:30 - 11:00 GMT
The Forum

11:00 GMT

SRE Saga: The Song of Heroes and Villains
Tuesday October 29, 2024 11:00 - 11:40 GMT
Daria Barteneva, Microsoft Azure

SRE teams require a balance of technical and soft skills, creativity, and teamwork to be successful. By drawing parallels between the roles, challenges, and dynamics of a Dungeons and Dragons party and an SRE team, we will explore the SRE journey from team inception to developing an ideal makeup in terms of tenure/seniority and skill set, and aligning it with the context the SRE team is part of.

We will share practical examples that help SRE teams build resiliency and collaborate effectively while dealing with challenges. We will also explore different mechanisms that can channel "super hero" energy to make the team stronger and nurture talent, helping the team keep a balance of distributed knowledge and accountability.

In this talk we will discuss:

  • Examples of functional SRE team setups

  • Common challenges SRE teams may encounter

  • Developing early-career SREs

  • Dealing with change and building resilience

  • Identifying red flags and avoiding long-term problems

Daria Barteneva

Microsoft Azure
Daria is a Principal Site Reliability Engineer in Observability Engineering in Azure. With a background in Applied Mathematics, Artificial Intelligence, and Music, Daria is passionate about machine learning, diversity in tech, and opera. In her current role, Daria is focused on changing...
The Liffey A

11:00 GMT

I Can OIDC You Clearly Now: How We Made Static Credentials a Thing of the Past
Tuesday October 29, 2024 11:00 - 11:40 GMT
Iain Lane and Dimitris Sotirakis, Grafana Labs

At Grafana Labs, we tackled a thorny problem: managing secrets in an open-source CI/CD pipeline. Our journey from static secrets to OIDC-based access wasn't just about better security—it was about empowering our engineers. We'll walk you through how we leveraged OIDC and GitHub Actions to create a "secretless" system for accessing cloud resources, complete with shared jobs and abstractions that make secure access simple. But it wasn't all smooth sailing. We'll share war stories, including a security hiccup that taught us valuable lessons. If you're drowning in a sea of secrets or just want to sleep better at night, come and learn how we boosted security while cutting operational headaches. You'll walk away with practical strategies for implementing OIDC-based access that'll make your engineers happy and your security team even happier.

Iain Lane

Grafana Labs
Iain is a senior software engineer at Grafana Labs. A member of the Platform team, his focus is on maintaining the infrastructure - Kubernetes clusters - which runs Grafana Cloud, and helping build tools and processes for engineers to deploy their software into this environment with...

Dimitris Sotirakis

Grafana Labs
Dimitris is a Senior Software Engineer with a background in Backend, DevOps, Release and Platform Engineering. Specialized in CI/CD architecture, he has spent most of his career tackling the challenges of delivering software, tools and frameworks with quality. Currently he’s a member...
The Liffey B

11:00 GMT

Discussion: Managing Cost
Tuesday October 29, 2024 11:00 - 12:30 GMT
John Looney, Reddit, and James Beal

This session is an opportunity for people to come together and discuss managing cost, facilitated by our knowledgeable guides. This is not a prepared talk or workshop—expect a less-formal session with plenty of opportunity to ask questions and to talk to other attendees who are interested in managing cost.

James Beal

James started playing with computers with the ZX81, learned C for his A Levels, and has degrees in computer science and parallel and distributed systems. He has been using Linux originally with MCC Interim Linux and later with other distributions. He started volunteering at the OTW...

John Looney

John is a platform engineer who helps senior engineers tune their applications to cost less, and makes Kubernetes cost less to run. Both projects required making promises to product teams - “that the compute platform will be reliable enough that they don’t need to pad out resources...
Liffey Hall 1

11:00 GMT

Workshop: Loadshedding and Isolation Using Envoy Proxy
Tuesday October 29, 2024 11:00 - 15:30 GMT
Laura Nolan; Niall Murphy, Stanza

Effective load management is a core aspect of the SRE role. In this workshop, participants will be introduced to a number of Envoy proxy features that are used for loadshedding and isolation, such as circuit breaking, adaptive concurrency, and ratelimiting. Participants will also use custom Go plugins to perform loadshedding. As part of the practical element of the workshop, participants will interact with Envoy configurations and status/control pages and endpoints, as well as Envoy’s telemetry.

Laura Nolan

Laura Nolan has contributed to several books on SRE, such as the Site Reliability Engineering book, Seeking SRE, and 97 Things Every SRE Should Know, and is currently completing her MSc in Human Factors and Systems Safety at Lund University. Laura is a member of the USENIX board...

Niall Murphy

Niall is the CEO of Stanza Systems, has occupied various engineering and leadership roles in Microsoft, Google, and Amazon, and is the instigator of the best-selling & prize-winning Site Reliability Engineering, which he hopes at some stage to live down. His most recent book is Reliable...
Liffey Hall 2

11:50 GMT

The Frontiers of Reliability Engineering
Tuesday October 29, 2024 11:50 - 12:30 GMT
Heinrich Hartmann, Zalando SE

We take the 10th anniversary of SREcon as an occasion to reflect on the past decade of advancements in Reliability Engineering and to provide an overview of the frontiers we are facing today. At Zalando, we followed major industry trends: outsourcing hardware provisioning to AWS, packaging applications into Docker images, fully automating deployments (CI/CD), and implementing Distributed Tracing for Microservice Observability. Despite these advances, many challenges remain in building reliable, observable software systems, and new areas have arisen that require new methods and tools. In the talk we present a number of conceptual views that help to map out the larger Reliability Engineering landscape, and zoom in on three specific frontiers that we are actively investing in at Zalando: (1) Data Operations and Monitoring Event-Based Systems, (2) Mobile Observability, and (3) Effective Management Practices for Reliability.

Heinrich Hartmann

Zalando SE
Heinrich Hartmann is a seasoned expert with a decade of experience in Reliability Engineering. Currently, he serves as the Senior Principal SRE at Zalando, a leading European e-commerce company, where he oversees company-wide reliability practices. Before joining Zalando, Heinrich...
The Liffey A

11:50 GMT

OMG WTF SSO: A Beginner’s Guide to Single Sign-On (Mis)configuration
Tuesday October 29, 2024 11:50 - 12:30 GMT
Adina Bogert-O'Brien

SSO protocols are just ways for an identity provider to share information about an authenticated identity with another service. Me having a way to tell my vendor “yeah, that’s Bob” doesn’t tell me what the vendor does with this information, or if the vendor always asks me who’s coming in the door. A bad SSO implementation can make you think you’re safer, while hiding all the new and fun things that have gone wrong.
To get the most out of implementing SSO, I need to know what I’m trying to accomplish and what steps I need to follow to get there. To illustrate why SSO needs to be set up carefully, for each of the things you need to do right, I’ll give you some fun examples of creative ways you and your vendor can do this wrong. We all learn from failure, right???

Adina Bogert-O'Brien

I am incessantly curious, work in renewable energy, and sometimes find vulnerabilities when I’m bored. I co-founded a hackerspace over a decade ago but have only just accepted that security is more than a hobby. At work, I’m a business architect with security leanings working...
The Liffey B

12:30 GMT

Tuesday October 29, 2024 12:30 - 14:00 GMT

Cortex helps engineering teams understand and improve their services. By aggregating data from tools like Datadog and Okta, we help teams understand their architecture at a glance – everything from ownership to runbooks. Using this data, we enable engineers to build report cards...
The Forum

14:00 GMT

Sailing the Database Seas: Applying SRE Principles at Scale
Tuesday October 29, 2024 14:00 - 14:40 GMT
Ioannis Androulidakis and Martin Alderete, Booking.com

In this talk we will demonstrate how we apply core SRE principles in the field of Database Engineering. More specifically, we will talk about the challenges of operating large-scale database systems in multiple cloud environments and how adopting best SRE practices dramatically improved our daily workflows and operations.

We will share insights and concrete use cases around the following topics: Monitoring Distributed Systems, Eliminating Toil and Postmortem Culture.

This talk will equip attendees with ideas and guidelines to better understand and efficiently operate their database systems such as choosing the right SLIs and SLOs, automating capacity planning and embracing a postmortem culture after outages.

Ioannis Androulidakis

Ioannis Androulidakis is a Site Reliability Engineer with a strong background and multiple years of experience in Operating Systems, Observability Tools and Cloud Platforms. He is passionate about OSS technologies and has contributed to multiple open-source projects over the years. Ioannis...

Martin Alderete

Martin Alderete is a Principal Site Reliability Engineer with a long track record in Engineering, Distributed Systems and System Level Programming in both academia, where after getting his degree he worked as a teaching assistant, and industry, where he led different teams building...
The Liffey A

14:00 GMT

Achieving Excellence: SLO Thresholds That Transform Service Quality
Tuesday October 29, 2024 14:00 - 14:40 GMT
Thiara Ortiz, Netflix

At Netflix, ensuring exceptional quality for our streaming platform is crucial. Every time a Netflix member sits down, reclines in their chair, and turns on their TV, it's a moment of truth. It's our opportunity to deliver a spectacular service with amazing quality of experience. Misses, errors, or high latency—whether due to ISP configuration changes, code deployment, or catastrophic fallback—impact how our service is perceived.

In this talk, I'll share methods for defining thresholds for SLOs, ranging from intuition and industry best practices to advanced techniques like A/B experimentation. At Netflix, properly defining SLOs allows us to ensure industry-leading quality of experience for our members.

Thiara Ortiz

Thiara is a Staff CDN Reliability Engineer at Netflix. Over the last four years, Thiara has been working on Open Connect, improving the resilience of the Netflix service for members around the world. Most recently, Thiara has been heavily involved with the introduction of Cloud Gaming...
The Liffey B

14:00 GMT

Discussion: eBPF
Tuesday October 29, 2024 14:00 - 15:30 GMT
Cameron Howes, Goldman Sachs, and Daniel Hodges

This session is an opportunity for people to come together and discuss eBPF, facilitated by our knowledgeable hosts. This is not a prepared talk or workshop—expect a less-formal session with plenty of opportunity to ask questions and to talk to other attendees who are interested in eBPF.

Cameron Howes

Goldman Sachs
Cameron Howes is an Analyst in the Market Data SRE team at Goldman Sachs, specialising in low-level development and performance instrumentation. When he's not ferociously avoiding a memory allocation, or reading about the latest CVEs, Cameron can be found writing black-box probers...

Daniel Hodges

Daniel Hodges is a software engineer who works at Meta on profiling and scheduling. He has worked as a site reliability engineer and production engineer, and has experience with observability, profiling, and production deployments.
Liffey Hall 1

14:45 GMT

Survivor: MySQL Island – Outwit, Outplay, Outlast Metadata Locking Challenges
Tuesday October 29, 2024 14:45 - 15:05 GMT
Julia Jablonska, Capsule CRM

Think you understand MySQL metadata locks? Join this interactive session to test your knowledge and take a deep dive into the intricacies of MySQL's locking mechanisms.

We'll explore real-world scenarios, such as creating tables with foreign key constraints and adding indexes, to see how metadata locks can impact performance and stability. Through live voting you'll gain insights into what's happening behind the scenes and learn practical tips for managing database migrations.

Julia Jablonska

Capsule CRM
As an Infrastructure Engineer at Capsule CRM, Julia is responsible for keeping Capsule secure, fast and reliable for thousands of our business customers around the globe.
The Liffey A

14:45 GMT

Selective Reliability Engineering: There Is No Single Source of Truth
Tuesday October 29, 2024 14:45 - 15:05 GMT
Elise Burke, Datadog, Inc.

As engineers we design distributed architectures, define project scopes, and ensure that we have a single "source of truth". But what, exactly, do we mean by the phrase? Do we really have only one source of truth - and for that matter, how do we decide what it is?

We'll look at some well-known ambiguities in system design and data modeling and then consider more philosophical questions about truth, the sources of truth we accept, and why this ambiguity matters.

Elise Burke

Datadog, Inc.
Elise's sixteen-year career as a software and site reliability engineer includes supporting Google's internal distributed storage systems and Datadog's organization-wide production practices. Her interests include exploring the interconnectedness of both technology and the people...
The Liffey B

15:10 GMT

Fixing Your Noisy Pager in 500 Easy Steps
Tuesday October 29, 2024 15:10 - 15:30 GMT
Chris Sinjakli, PlanetScale

You're not sure when it happened, but your pager suddenly seems noisy. You've started dreading your on-call shifts before they begin. You breathe a sigh of relief every time you sleep without interruption. Sound familiar?

Noisy on-call rotas sneak up on us one page at a time - an edge case in a new feature, an alert with too many false positives, processes that get stuck and need restarting. Each of these is easy to tolerate alone, but they quickly add up, leaving you swamped in alert noise and tired from missed sleep.

In this talk we'll explore techniques for digging ourselves out of the hole. We'll look at how to demonstrate the scale of the issue to our colleagues, what to do when the list of problems seems insurmountable, and how to get started with automated remediation in a low-risk way - I promise it's less scary than it sounds.

Chris Sinjakli

Chris enjoys working on the strange parts of computing where software and systems meet. He especially likes the challenges of databases and distributed systems. All his programs are made from organic, hand-picked, artisanal keypresses.
The Liffey A

15:10 GMT

Why You’re (Probably) Doing Service Catalogs Wrong
Tuesday October 29, 2024 15:10 - 15:30 GMT
Lisa Karlin Curtis, incident.io
Service catalogs promise a lot of things: powerful automations, insights into your technology estate.
But over the last few years, many of us have learned that setting up and maintaining a service catalog is really hard.
Building out a catalog from a standing start can take months, or even years. Too many people get stuck in a chicken-and-egg situation, where you can’t deliver value because you don’t have the data in your catalog, and you can’t convince anyone to spend time helping you because the catalog doesn’t do anything yet.
But there is another way...

Lisa Karlin Curtis

Lisa started out as a consultant working with HMRC and then smart meters, before accidentally becoming a developer. She was a founding engineer at incident.io, building tooling to help your whole organization manage incidents better. She loves building stuff, but is also really interested...
The Liffey B

15:30 GMT

Coffee and Tea Break
Tuesday October 29, 2024 15:30 - 16:00 GMT
The Forum

16:00 GMT

Exploring the Unintended Consequences of Automation in Software
Tuesday October 29, 2024 16:00 - 16:40 GMT
Courtney Nash, The VOID

Automation is ubiquitous—it is entwined in our daily lives in ways that we aren’t always aware of. It has been woven into all aspects of modern software by being presented as a utopian vision: a way of making human lives easier, doing repetitive tasks faster and with fewer errors, freeing us fallible humans up to do other ostensibly more important work. But anyone who has worked directly with automated systems knows that we are still very far from such a dreamy reality.

This talk delves into detailed research about how automation is involved in software incidents. My focus on this area stems from the growing portrayal of automation as a panacea for various software incident issues, despite its limitations in effectively addressing these challenges, such as reliable detection and resolution of software issues or analyzing and disseminating learnings from these incidents back into the organization and its products and services.

Drawn directly from public incident reports (collected in the VOID), this research revealed multiple, often competing, roles that automation can play over the course of an incident, and most importantly underscored how important humans are at understanding, troubleshooting, and recovering from automated software issues. If you're struggling to convey the reality behind the hype of automation and AI to others on your team or at your organization, this is the talk for you.

Courtney Nash

Courtney Nash is a researcher focused on system safety and failures in complex sociotechnical systems. An erstwhile cognitive neuroscientist, she has always been fascinated by how people learn, and the ways memory influences how they solve problems. Over the past two decades, she’s...
The Liffey A

16:00 GMT

SRE Stakeholders: A Spotter’s Guide
Tuesday October 29, 2024 16:00 - 16:40 GMT
Dave O'Connor

For every SRE or SRE-adjacent team in any organisation, there are many kinds of stakeholders: people who care (or don't care!) about how your team operates and the outcomes of that. They differ massively in how they view your team, and in how they, in turn, should be viewed and managed.

In a timeline that doesn't contain a canonical book setting out what SRE is here for and how it achieves that, the sad and annoying answer is that "it depends". Because of this, we need to get good (or remain good) at stakeholder management and communications about why we're here, and what we do.

This talk is primarily aimed at SRE leadership, but knowing the kinds of stakeholders you run into can help any SRE. Learn to spot the different stakeholders in your life, what they (generally) care about, and how you can help reduce misunderstandings and tension, no matter where you're sitting.

Dave O'Connor

Dave is an SRE Leadership practitioner, Advisor and Coach based in Dublin. He's been working in SRE and SRE-adjacent organisations for over 20 years, primarily as an SRE Lead at Google from 2004-2021. Since then, he has spent time leading SRE, Security and Infrastructure teams at...
The Liffey B

16:00 GMT

Enhancing Elasticsearch Performance: Innovative Reindexing Strategies Using Dedicated Nodes and KEDA Autoscalers
Tuesday October 29, 2024 16:00 - 16:40 GMT
Leila Vayghan, Shopify

This talk is about enhancing the search infrastructure of Shopify, a large-scale ecommerce platform that supports over 3 million merchants and handles more than two petabytes of data.

This talk explains how we leverage Kubernetes on Google Cloud Platform to ensure high availability and performance, crucial for maintaining our platform's robust search functionality. It will also elaborate on our innovative approach using dedicated reindexing nodes within existing clusters, which significantly improves indexing and reindexing performance while cutting infrastructure costs. We will explore the application of Kubernetes Event-Driven Autoscaling (KEDA) to dynamically manage resource allocation, enhancing operational efficiency and reducing on-call fatigue. This strategy not only supports seamless user experiences but also boosts Gross Merchandise Value (GMV) and revenue through improved system responsiveness.

This presentation is ideal for those involved in managing large-scale data systems or interested in advanced Elasticsearch optimizations.

Leila Vayghan

Leila is an engineer at Shopify, where she spends her days enabling millions of merchants to grow by making sure buyers are able to search and find their products. She does this by running a large-scale search infrastructure on Kubernetes in many regions of the world. Leila has completed...
Liffey Hall 2

16:00 GMT

Discussion: Service Level Objectives
Tuesday October 29, 2024 16:00 - 17:30 GMT
Alex Hidalgo, Nobl9, and Heinrich Hartmann, Zalando SE

This session is an opportunity for people to come together and discuss SLOs, facilitated by our knowledgeable guides. This is not a prepared talk or workshop—expect a less-formal session with plenty of opportunity to ask questions and to talk to other attendees who are interested in SLOs.

Heinrich Hartmann

Zalando SE
Heinrich Hartmann is a seasoned expert with a decade of experience in Reliability Engineering. Currently, he serves as the Senior Principal SRE at Zalando, a leading European e-commerce company, where he oversees company-wide reliability practices. Before joining Zalando, Heinrich...

Alex Hidalgo

Alex Hidalgo is the Field CTO at Nobl9 and author of "Implementing Service Level Objectives." During his career he has developed a deep love for sustainable operations, proper observability, and using SLO data to drive discussions and make decisions. Alex's previous jobs have included...
Liffey Hall 1

16:45 GMT

Rock around the Clock (Synchronization): Improve Performance with High Precision Time!
Tuesday October 29, 2024 16:45 - 17:05 GMT
Lerna Ekmekcioglu, Clockwork Systems

Is the app slow or the network lagging? When it comes to latency in distributed systems, it can be hard to identify where exactly the issue is. As businesses increasingly adopt diverse deployment environments (on-premises, cloud, or hybrid), the complexity grows, obscuring visibility into system health. Join me to hear why clock synchronization is key for identifying the true culprit when latency is due to contention in the network. I’ll demo how network contention impacts tail latencies, followed by an overview of clock synchronization protocols to date, their pros and cons, and best practices in disciplining clocks, as well as recent algorithms from Stanford Research. With high-precision clock synchronization at scale, we regain visibility into useful one-way delay metrics, which act as an early signal of network congestion and help us prevent impact to response times for our end users!

Lerna Ekmekcioglu

Clockwork Systems
Lerna is a Senior Solutions Engineer at Clockwork Systems where she helps customers meet their performance goals with software solutions built on Clockwork.io’s foundational research. Prior to this, she was a Senior Solutions Architect serving Global Financial Services customers...
The Liffey A

16:50 GMT

Panel Discussion: Is Reliability a Luxury Good?
Tuesday October 29, 2024 16:50 - 17:30 GMT
Moderator: Emil Stolarsky
Panelists: Niall Murphy, Stanza

Emil Stolarsky

Emil is an engineer at Increase where he works on building modern banking infrastructure. Before that, he was at companies such as Wave Mobile Money, DigitalOcean, and Shopify, working on everything from building data centres in Sub-Saharan Africa to caching & performance optimizations...

Niall Murphy

Niall is the CEO of Stanza Systems, has occupied various engineering and leadership roles in Microsoft, Google, and Amazon, and is the instigator of the best-selling & prize-winning Site Reliability Engineering, which he hopes at some stage to live down. His most recent book is Reliable...
The Liffey B

16:50 GMT

Multi-tier Kubernetes Cluster Auto-Scaling
Tuesday October 29, 2024 16:50 - 17:30 GMT
Moeid Heidari

This research tackles the limitations of traditional autoscaling systems, which typically operate within a single cloud provider. We propose a new Kubernetes autoscaling operator that dynamically adjusts resources across multiple cloud platforms and on-premise systems. By integrating with various provisioning systems and allowing user-defined scaling strategies, this operator addresses the inefficiencies and vendor lock-in issues of conventional solutions. Our approach not only enhances scalability and system resilience but also improves cost-efficiency, as demonstrated by a significant increase in system availability. Metrics are collected and analyzed to predict scaling needs, ensuring optimal performance and resource utilization.

Moeid Heidari

With over 16 years of experience in the IT industry, I offer a broad and deep skill set in technology. I hold a Master’s degree in Computer Science and am currently pursuing a PhD focused on cloud computing, scalability, and high availability methods. In my current role as a Cloud...
Liffey Hall 2

17:10 GMT

Mnemonic Rules for Eponymous Laws or: There’s a Law for That!
Tuesday October 29, 2024 17:10 - 17:30 GMT
Peter Burkholder, U.S. Government

As SREs, referencing named laws like Brooks’s Law, Gall’s Law, or the Jevons Paradox can help strengthen our arguments. But remembering which law applies when is challenging.

In this talk, I'll highlight the most useful tech and behavioral science laws for SRE work, offer mnemonic tips for recalling them, and share real-world examples. We'll finish with a quick quiz to ensure you're ready to apply these concepts in your role.

Peter Burkholder

U.S. Government
Geophysicist turned SRE. Jobs include: US Gov, (18f/cloud.gov), GovReady, Chef, AARP, NCBI, NCAR, Univ. of Washington. In my own time, I make pizza, sing, and play guitar (not simultaneously).
The Liffey A

17:30 GMT

Conference Reception at the Sponsor Showcase
Tuesday October 29, 2024 17:30 - 19:30 GMT
Enjoy dinner and beverages while networking with other attendees and visiting the exhibits as we close out the first day of sessions!
The Forum