Thursday, October 31
 

08:00 GMT

Morning Coffee and Tea
Thursday October 31, 2024 08:00 - 09:00 GMT
The Forum

08:00 GMT

Badge Pickup
Thursday October 31, 2024 08:00 - 12:00 GMT
Ground Floor Foyer

09:00 GMT

Monitoring Systems as a Service – Walking the Line between Giving Your Devs Good M&O and Setting All Your Money on Fire
Thursday October 31, 2024 09:00 - 09:40 GMT
Joan O'Callaghan, Udemy


Monitoring-as-a-Service products like Datadog and Honeycomb are amazing for implementing monitoring & observability with minimal effort, but like Anything-as-a-Service, they come at a cost.

We are a very normal company, with all the tech debt and orphaned code that any company over a certain age has. Like everyone else, we had staff that heard, "measure everything!" but they didn't know what the monitoring bill looked like and that "everything" included a lot of junk.

In the talk I'll discuss how we managed to reduce cost wastage, enable extra vendor features, improve M&O knowledge within the engineering organisation and keep the bill the same or lower, despite a 60% growth in infrastructure at our company.

A note on the vendor: I won't say who the vendor is, but I think our experience was universal enough that our fixes and techniques will be helpful to other companies.


https://www.usenix.org/conference/srecon24emea/presentation/ocallaghan
Speakers

Joan O'Callaghan

Udemy
Joan O'Callaghan is a Monitoring and Observability Director at Udemy. She has worked in SRE and Incident Management and M&O (in one form or another), for many, many years. She likes to host and write blameless incident reviews and take long walks on the beach where she has imaginary... Read More →
Thursday October 31, 2024 09:00 - 09:40 GMT
The Liffey A

09:00 GMT

Opening the Box: Diagnosing Operating-System Task-Scheduler Behavior on Highly Multicore Machines
Thursday October 31, 2024 09:00 - 09:40 GMT
Julia Lawall, Inria-Paris


Getting unexpectedly poor performance from your multicore application? Maybe the operating system task scheduler is at fault. The task scheduler is responsible for placing tasks on cores and for selecting which task is allowed to run, at what time, and for how long. As such, the scheduler is a critical component of any operating system and has a major impact on application performance. Still, scheduling decisions are buried deep within the operating system code, making it challenging to diagnose performance problems (or even performance improvements) to determine whether the scheduler is responsible and, if so, in what way. These challenges are compounded for highly multithreaded applications, running on large multicore machines, due to the huge amount of information available.

In this talk, we present some tools that we have developed for visualizing the behavior of the Linux kernel task scheduler, and illustrate how these tools can be used to help diagnose performance problems. The tools presented are freely available at https://gitlab.inria.fr/schedgraph/schedgraph


https://www.usenix.org/conference/srecon24emea/presentation/lawall
Speakers

Julia Lawall

Inria-Paris
Julia Lawall is a senior researcher at Inria Paris. Prior to joining Inria, she completed a PhD at Indiana University and was on the faculty at the University of Copenhagen. Her work focuses on issues around the correctness and performance of operating systems. She develops and maintains... Read More →
Thursday October 31, 2024 09:00 - 09:40 GMT
The Liffey B

09:00 GMT

Discussion: Wrangling your Management Chain
Thursday October 31, 2024 09:00 - 10:30 GMT
Dave O’Connor and Todd Underwood


This session is an opportunity for people to come together and discuss managing your management chain, facilitated by our knowledgeable guides. This is not a prepared talk or workshop—expect a less-formal session with plenty of opportunity to ask questions and to talk to other attendees who are interested in wrangling your management.


https://www.usenix.org/conference/srecon24emea/presentation/discussion-wrangling-management
Speakers

Dave O'Connor

Dave is an SRE Leadership practitioner, Advisor and Coach based in Dublin. He's been working on SRE and SRE-adjacent organisations for over 20 years, primarily as an SRE Lead at Google from 2004-2021. Since then, he has spent time leading SRE, Security and Infrastructure teams at... Read More →

Todd Underwood

Todd Underwood recently led reliability for the Research Platform at OpenAI. Previously he was a Senior Engineering Director at Google leading ML capacity engineering in the office of the CFO at Alphabet. Before that, he founded and led ML Site Reliability Engineering and was the... Read More →
Thursday October 31, 2024 09:00 - 10:30 GMT
Liffey Hall 1

09:00 GMT

Workshop: Guided Journey into the Heart of Systemd
Thursday October 31, 2024 09:00 - 12:30 GMT
Alvaro Leiva Geisse and Anita Zhang, Meta
IMPORTANT: If you are attending this workshop, please work through the Getting Started section in order to download the image and set up your environment.
systemd (with a lowercase s and d) remains, to this day, both one of the most critical pieces of a system and one of the least understood. This workshop is designed to touch upon the beginner features of systemd and explain how you can use systemd to solve common problems, including some that you didn't even know you had. What problems, you ask? You’ll have to come and see.
https://www.usenix.org/conference/srecon24emea/presentation/geisse
Speakers

Alvaro Leiva Geisse

Meta
I love Python, I grew up in a small town in Chile and one weekend, over 16 years ago, I had the flu and could not go out. I decided to learn how to code in Python and that was the beginning of the road that would move us all to Northern California so that I could join the Production... Read More →

Anita Zhang

Meta
Anita Zhang is the software engineering manager of Meta's Linux Umbrella family of teams. Her teams connect Meta's low-level infrastructure with the open source community. She is known for being a part of the systemd community and continues to support systemd at Meta as part of their... Read More →
Thursday October 31, 2024 09:00 - 12:30 GMT
Liffey Hall 2

09:50 GMT

An Exploration in Storing Telemetry in Cloud Object Storage
Thursday October 31, 2024 09:50 - 10:30 GMT
Mike Heffner and Ray Jenkins, Streamfold


Modern web application architectures require extensive telemetry data to function efficiently at scale. Traditional methods for collecting, storing, and processing this data have become increasingly expensive and challenging to maintain. Conversely, the prevalence of cloud object storage has given rise to the data lake. This has led some organizations to explore telemetry data lakes, which enable cost-efficient storage of large volumes of telemetry data.

We will explore various data storage formats used in constructing telemetry data lakes and discuss the tradeoffs associated with each approach. We will delve into common formats such as JSON, Parquet, ORC, and Apache Iceberg, examining how they can be utilized to store telemetry data like logs, metrics, and traces at scale. These formats will be empirically evaluated using real-world datasets. Additionally, we will review recent literature that highlights areas for design improvements in storage formats to better align them with modern computing hardware.


https://www.usenix.org/conference/srecon24emea/presentation/heffner
Speakers

Mike Heffner

Streamfold
Mike Heffner is co-founder of Streamfold, where they are creating the first telemetry pipeline built for developers. Prior to Streamfold, Mike was a backend engineer at Netlify helping scale their delivery network, and at Librato building one of the first monitoring SaaS products... Read More →

Ray Jenkins

Streamfold
Ray Jenkins is co-founder of Streamfold, where they are creating the first telemetry pipeline built for developers. Prior to founding Streamfold, he led software engineering efforts at Snowflake, on the observability and performance of FoundationDB and at Segment on development of... Read More →
Thursday October 31, 2024 09:50 - 10:30 GMT
The Liffey A

09:50 GMT

Granular CPU Capacity Management at Scale with eBPF
Thursday October 31, 2024 09:50 - 10:30 GMT
George Brighton and Cameron Howes, Goldman Sachs


Real-time market data is exceptionally bursty, with update rates in the busiest seconds of the day regularly exceeding 10x the average. User experience is predicated on maintaining sufficient CPU headroom to prevent full buffers and the resulting client disconnects. Sampling cumulative CPU time at a typical scrape interval hides microbursts, and sub-second polling from user space induces unacceptable overhead, so a different approach is needed.

This talk will cover how Market Data SRE at Goldman Sachs uplifted CPU monitoring of our market data distribution infrastructure in an unintrusive way, achieving 10x the granularity with 5% of the original monitoring overhead. We will cover the journey from deciding to use eBPF, through trials using bpftrace and making the leap to BPF C, to collecting and aggregating the metrics effectively. It will be most relevant to those interested in capacity management across a heterogeneous estate, and those looking to implement eBPF for the first time in their organisations.


https://www.usenix.org/conference/srecon24emea/presentation/brighton
Speakers

George Brighton

Goldman Sachs
George Brighton is a Vice President at Goldman Sachs, where he leads the Market Data SRE team. A Prometheus and OTel committer, he is responsible for uplifting observability and operational practices. George presented "Market Data: Applying SRE Techniques to Legacy Designs" at SREcon22... Read More →

Cameron Howes

Goldman Sachs
Cameron Howes is an Analyst in the Market Data SRE team at Goldman Sachs, specialising in low-level development and performance instrumentation. When he's not ferociously avoiding a memory allocation, or reading about the latest CVEs, Cameron can be found writing black-box probers... Read More →
Thursday October 31, 2024 09:50 - 10:30 GMT
The Liffey B

10:30 GMT

Coffee and Tea Break
Thursday October 31, 2024 10:30 - 11:00 GMT
The Forum

11:00 GMT

Embrace Fleet Reboots and Make Them Boring
Thursday October 31, 2024 11:00 - 11:40 GMT
Everton Didone Foscarini, Cloudflare


Server reboots bring up mixed sentiments. Some want to say “My kernel is stable, it does not crash with a thousand days uptime”; others understand that you are running a system with a thousand days of accumulated vulnerabilities.

At Cloudflare we believe that high uptimes are bad. While our reboot automation was being developed, we were hit by a kernel+BIOS bug that caused a high rate of node crashes and encouraged quick adoption of the automation. This prompted us to implement better tooling to deploy fleet changes over reboots, create multiple reboot queues for different workloads, add load-based maintenance windows, and more.

We achieved monthly reboots for our edge fleet while keeping the clusters online and serving customer-facing traffic, unlocking our ability to iterate fast on Linux Kernel versions and OS releases, ensuring we are not running outdated library versions in hosts not rebooted for a thousand days.


https://www.usenix.org/conference/srecon24emea/presentation/foscarini
Speakers

Everton Didone Foscarini

Cloudflare
Working on Internet-based services using Linux since 2003, joined Cloudflare in 2017 and helped to scale Edge location operations from 102 to 320 cities, creating tooling to manage services lifecycle and server reboots.
Thursday October 31, 2024 11:00 - 11:40 GMT
The Liffey A

11:00 GMT

Riot Games: Evolution of Observability at the Gaming Company
Thursday October 31, 2024 11:00 - 11:40 GMT
Erick Moreira and Kirill Mikhailov, Riot Games


The video game industry is growing year by year, and it is projected that the market size for video games will double in the coming 10 years. The number of people playing video games will also grow substantially. All of this creates many challenges for tech teams, who must make sure that the games are not only fun to play but also offer stable, accessible gameplay. This is even more important for online competitive games, as they demand increased stability and performance.

Our presentation is focused on a review of the Riot Games journey through observability and specifically on the latest iteration of global-scale changes we made to introduce SRE and the new observability pipeline in the company.


https://www.usenix.org/conference/srecon24emea/presentation/moreira
Speakers

Erick Moreira

Riot Games
I am Erick Moreira, a 32-year-old Brazilian from Rio, working and living for 5 years in Dublin. I grew up modding and creating simple things for games. Now, I am focused on the backend, cross-cutting concerns, and the developer experience. I still find space in my heart to build front-end... Read More →

Kirill Mikhailov

Riot Games
I started my journey as an engineer while in school, building servers for online games. I then switched to traditional software engineering, working for large tech companies. But at the end of the day, I still landed in the gaming industry, where I have worked for the LiveOps organisation... Read More →
Thursday October 31, 2024 11:00 - 11:40 GMT
The Liffey B

11:00 GMT

Discussion: Building New SRE Teams
Thursday October 31, 2024 11:00 - 12:30 GMT
Avleen Vig and Stephane Dudzinski


This session is an opportunity for people to come together and discuss building new SRE teams, facilitated by our knowledgeable guides. This is not a prepared talk or workshop—expect a less-formal session with plenty of opportunity to ask questions and to talk to other attendees who are interested in building SRE teams from scratch.


https://www.usenix.org/conference/srecon24emea/presentation/discussion-building-new-sre-teams
Speakers

Avleen Vig

Twilio
Avleen is one of Twilio’s Architects for SRE. Over his luminous 20+ year career he has shone a light on the importance of making reliability a core part of the work done by all software engineering teams. When he isn’t working on improving systems designs and reviewing code, you... Read More →

Stephane Dudzinski

Reddit
Stephane Dudzinski is a seasoned veteran with over 20 years of experience in the tech industry, specializing in observability, SRE, and systems. With a decade of leadership experience, he has managed and mentored high-performing teams, improving system reliability. Stephane currently... Read More →
Thursday October 31, 2024 11:00 - 12:30 GMT
Liffey Hall 1

11:45 GMT

A Brief History of Release Engineering
Thursday October 31, 2024 11:45 - 12:05 GMT
Dinah McNutt, MongoDB


TL;DR This talk is a humorous (hopefully) retrospective on release engineering. How did we get from building binaries using a command line to all the fancy CI/CD systems we have today?

Things we used to do seem ridiculous today. Can looking back help us move forward? What’s the evolution and career path of a release engineer? Has the role become diluted through overuse and misuse?

Please join in the fun and include your anecdotes and experiences in the Slack channel.


https://www.usenix.org/conference/srecon24emea/presentation/mcnutt
Speakers

Dinah McNutt

MongoDB
Dinah McNutt is a TPM for MongoDB and based in Dublin, Ireland. She has over 35 years of experience in systems administration, release engineering and software development. She has written for various publications over the years including the Daemons and Dragons column for UNIX Review... Read More →
Thursday October 31, 2024 11:45 - 12:05 GMT
The Liffey A

11:45 GMT

A Powerful Logs Management Solution We All Have and Use but We Underestimate: systemd-journal
Thursday October 31, 2024 11:45 - 12:05 GMT
Costa Tsaousis, Netdata


This talk aims to unearth the potent features of systemd-journal that have remained mostly underutilized and largely underappreciated within the SRE community. The focus will be on its ability to handle dynamically structured log entries, its inherent support for centralized logging, and its robust security features including log sealing.


Systemd-journal offers dynamic field management, allowing flexible log annotation and querying without predefined schemas, along with decentralized log management that enables seamless analysis across systems. Its sealing feature ensures log integrity, critical for incident response and forensics. There’s a tooling gap for converting plain logs into structured entries; however, we will show examples of how this can be achieved.



https://www.usenix.org/conference/srecon24emea/presentation/tsaousis
Speakers

Costa Tsaousis

Netdata
Costa Tsaousis is the Founder and CEO of Netdata. Since 1995, Costa has been actively working on internet related startups. He has been a co-founder and C-level executive of many successful projects, including Internet Service Providers, Cloud Hosting Providers and Fintech startups... Read More →
Thursday October 31, 2024 11:45 - 12:05 GMT
The Liffey B

12:10 GMT

Red Tide Revert
Thursday October 31, 2024 12:10 - 12:30 GMT
David Newman, Automattic


Explore the challenges of managing unexpected production errors in high-frequency deployment environments and introduce an innovative AI-driven solution for rapid error detection and resolution. The speaker will discuss how their team developed and refined an automated system that analyzes error logs, identifies problematic code commits, and streamlines the incident response process. This approach aims to reduce on-call stress, minimize user impact, and pave the way for fully automated error mitigation in complex, fast-paced development ecosystems.


https://www.usenix.org/conference/srecon24emea/presentation/newman
Speakers

David Newman

Automattic
With a diverse background in platform engineering, distributed systems, and artificial intelligence, our speaker brings a trove of experience driving innovation from startup to enterprise environments. As a technical founder in companies ranging from retail intelligence to digital... Read More →
Thursday October 31, 2024 12:10 - 12:30 GMT
The Liffey A

12:10 GMT

Blast Radius Reduction for Large-Scale Distributed Systems
Thursday October 31, 2024 12:10 - 12:30 GMT
Linhua Tang, Huawei Ireland Research Centre


The construction of large-scale distributed systems poses significant challenges due to inherent complexities and the inevitability of failures across various levels, from hardware malfunctions to software bugs. Embracing the 'design for failure' philosophy, this paper delves into advanced isolation techniques aimed at reducing the blast radius—both spatially and temporally—thereby enhancing system resilience. Spatial containment strategies, such as cell-based architecture, compartmentalize failures to localized areas, preventing cascading effects. Temporal mitigation focuses on rapid recovery and self-healing mechanisms, which aim to restore system health promptly after a failure occurs. Furthermore, the paper explores the application of formal methods in verifying the robustness of these designs, providing a rigorous approach to ensure the reliability and effectiveness of implemented solutions. This research underscores the importance of proactive architectural planning and continuous verification in maintaining the stability of complex distributed systems.


https://www.usenix.org/conference/srecon24emea/presentation/tang
Speakers

Linhua Tang

Huawei Ireland Research Centre
Linhua Tang (also known as James) is a software engineer and tech lead for global server load balancing and formal methods at Huawei Ireland Research Centre. Before that, he worked at Microsoft and Amazon on different distributed systems.
Thursday October 31, 2024 12:10 - 12:30 GMT
The Liffey B

12:30 GMT

Luncheon
Thursday October 31, 2024 12:30 - 14:00 GMT
The Forum

14:00 GMT

AppStack: An Open Source Cloud Native Platform for Running Digital Public Services
Thursday October 31, 2024 14:00 - 14:40 GMT
Dimitris Mitropoulos, National Infrastructures for Research and Technology – GRNET and University of Athens; Alex Kiousis, National Infrastructures for Research and Technology – GRNET


GRNET is Greece's National Infrastructures for Research and Technology (NREN) organisation, which acts as a network and services provider for research and education communities. Since 2019, GRNET has been responsible for the development, operation and maintenance of several governmental services, thus playing an important role in Greece's digital transformation. To address the different challenges related to this role, GRNET teams developed AppStack, a cloud-native platform, based on production-ready open source software, for running government-related services such as the gov.gr portal, the electronic issuance of documents signed by the Greek state, and gov wallet, among others.

AppStack provides an environment for integrating open-source and in-house software components, where DevOps can incorporate suitable tools to tackle scalability and security issues.

Currently, AppStack hosts workloads that serve more than 8 million Greek citizens, are able to handle more than 20K requests per second, and can generate hundreds of digital documents signed by the Greek state per second.

In this talk we will present AppStack, its numerous components, and how open source made it possible. Finally, we will describe some key experiences from production.


https://www.usenix.org/conference/srecon24emea/presentation/mitropoulos
Speakers

Alex Kiousis

National Infrastructures for Research and Technology – GRNET
Alex Kiousis is a Site Reliability Engineer in GRNET in Greece. His team handles GRNET's on-premise infrastructure and services, delivering GRNET's custom Cloud service to Greece's Research and Academic communities and several user-facing Government-related Digital Transformation... Read More →

Dimitris Mitropoulos

National Infrastructures for Research and Technology – GRNET and University of Athens
Dimitris Mitropoulos is an Assistant Professor at the National and Kapodistrian University of Athens and the Head of Reliability Engineering at the Greek National Infrastructures for Research and Technology (GRNET). Previously, he has been a postdoctoral researcher at the Computer... Read More →
Thursday October 31, 2024 14:00 - 14:40 GMT
The Liffey A

14:00 GMT

Get Your Non-SREs Oncall Ready!
Thursday October 31, 2024 14:00 - 14:40 GMT
JC van Winkel and Brad Lipinski, Google


Hands-on learning is best for adults, and we've used this principle in Google SRE since 2017. However, many oncall engineers aren't SREs and haven't gone through a full week-long SRE onboarding program. How can they learn the same skills and go oncall with confidence, but without the week-long curriculum?

We cherry-picked our SRE onboarding program to create a succinct, scalable program for this audience that includes the best of orientation: the breakage exercises. This program is called "Oncall Ready!" and is completely self-service, requiring no operational work from the SRE EDU team. In this talk we will discuss the development, the behind the scenes, and the outcomes of this project. The best comment we got from a participant: "Oh wow, this is like going through a [production] escape room without having to pay for it".


https://www.usenix.org/conference/srecon24emea/presentation/van-winkel
Speakers

JC van Winkel

Google
JC has been teaching UNIX and programming languages since 1992, working for AT Computing, a small courseware spin-off of the University of Nijmegen, the Netherlands. JC joined Google's Site Reliability Engineering team in 2010 and is both a founding member and lead educator of the... Read More →

Brad Lipinski

Google
Brad joined Google SRE in 2013 and worked on datacenter software. He's taught for SRE EDU from the beginning and contributed to many of the team's automation efforts. In 2019, he joined SRE EDU full time and is now the team's tech lead.
Thursday October 31, 2024 14:00 - 14:40 GMT
The Liffey B

14:00 GMT

Discussion: Learning from Incidents
Thursday October 31, 2024 14:00 - 15:30 GMT
Laura de Vesine, Datadog, Inc., and Cail Young, Octopus Deploy


This session is an opportunity for people to come together and discuss getting the most out of your incident review process, facilitated by our knowledgeable guides. This is not a prepared talk or workshop—expect a less-formal session with plenty of opportunity to ask questions and to talk to other attendees who are interested in learning from incidents.


https://www.usenix.org/conference/srecon24emea/presentation/discussion-learning-from-incidents
Speakers

Laura de Vesine

Datadog, Inc.
Laura de Vesine is a 20+ year software industry veteran. She has spent the last 8 years in SRE working in incident analysis and prevention, chaos engineering, and the intersection of technology and organizational culture. Laura is currently a staff engineer at Datadog, Inc. She also... Read More →

Cail Young

Octopus Deploy
Cail has spent the last couple of decades working at the intersection of people and technology: in the performing arts, in the motion picture industry, and now in the field of software operations. He is fascinated by learning from incidents - large and small - and will gladly trade... Read More →
Thursday October 31, 2024 14:00 - 15:30 GMT
Liffey Hall 1

14:00 GMT

Discussion: System Performance and Scaling
Thursday October 31, 2024 14:00 - 15:30 GMT
Leila Vayghan, Shopify, and Abbas Soltanian, OpsGuru


Join us for an interactive Q&A session on System Performance and Scaling, where our expert panel, featuring a senior infrastructure engineer and a senior cloud solutions architect, will address your most pressing questions. This session is designed to provide practical insights and real-world solutions to help you optimize your systems for performance and scalability. Whether you're dealing with cloud architecture challenges, Kubernetes orchestration, or scaling complex infrastructures, bring your questions and engage with industry experts to enhance your understanding and capabilities.


https://www.usenix.org/conference/srecon24emea/presentation/discussion-system-performance-scaling
Speakers

Leila Vayghan

Shopify
Leila is an engineer at Shopify, where she spends her days enabling millions of merchants to grow by making sure buyers are able to search and find their products. She does this by running a large-scale search infrastructure on Kubernetes in many regions of the world. Leila has completed... Read More →

Abbas Soltanian

OpsGuru
Dr. Abbas Soltanian, a Senior Cloud Solutions Architect at OpsGuru (Canada), holds a Ph.D. in Cloud Computing and has presented his work at numerous conferences. With over thirteen years of experience in both academia and industry, he helps companies migrate to the cloud and modernize... Read More →
Thursday October 31, 2024 14:00 - 15:30 GMT
Liffey Hall 1

14:50 GMT

Science Reliability Engineering for High Performance Computing
Thursday October 31, 2024 14:50 - 15:30 GMT
Nicholas Jones, LANL


High Performance Computing (HPC) as an industry has long stood on very human-facing operational workflows. These workflows exist because HPC systems are generally purpose-built machines for small sets of code bases with very specific performance metrics. This purpose-built nature has resulted in HPC having very bespoke one-off systems, with processes and infrastructure that serve a small set of code bases well but aren't resilient to generational churn. To combat the difficulty of generational churn, we've adopted an SRE mindset for our new administrative stack, OpenCHAMI. This lets us keep our figures of merit (exact reproducibility, parallel bandwidth, and compute time to solution) aligned with what benefits our customer base the most.


https://www.usenix.org/conference/srecon24emea/presentation/jones
Speakers

Nicholas Jones

LANL
Nick is a scientist at Los Alamos National Lab, where he works on system security architecture, CI/CD infrastructure, and shared computing environments and strategies across the National Nuclear Security Administration Laboratories.
Thursday October 31, 2024 14:50 - 15:30 GMT
The Liffey A

14:50 GMT

Transforming Production Readiness
Thursday October 31, 2024 14:50 - 15:30 GMT
Panagiotis Moustafellos, Elastic


Join Panagiotis Moustafellos, Distinguished Engineer at Elastic, as he shares Elastic's transformative journey of integrating development teams into on-call rotations.
This talk highlights the creation of an SLO observability product capable of monitoring hundreds of thousands of SLIs globally, amidst a significant infrastructure and software platform re-architecture.
Learn about the phased rollout of Elastic's new serverless offering and the delicate processes involved in getting all software engineers on call. Discover best practices in production readiness, incident management, self-service observability, and software release tools that empower teams to own their services. Gain valuable insights and actionable strategies to enhance production readiness and service reliability in your organization.


https://www.usenix.org/conference/srecon24emea/presentation/moustafellos
Speakers

Panagiotis Moustafellos

Elastic
Panagiotis Moustafellos is a Distinguished Engineer at Elastic, the Search AI company. He brings over 15 years of experience in diverse tech environments and specializes in systems architecture, observability, and security, with a focus on scaling software systems, infrastructure... Read More →
Thursday October 31, 2024 14:50 - 15:30 GMT
The Liffey B

15:30 GMT

Coffee and Tea Break
Thursday October 31, 2024 15:30 - 16:00 GMT
The Forum

16:00 GMT

Energy Consumption of Datacenters
Thursday October 31, 2024 16:00 - 16:45 GMT
Thomas Fricke


Let us have a look into the resource consumption of data centers and collect the current state of knowledge. There will be more questions than answers, but predictions can be made because all resources have their limits.

The increase has already been exponential for years. With the AI hype, the demand for energy, cooling, water and other resources has increased dramatically.

The existing GPU based computing paradigm cuts hard into the standard design of data centers and demands other ways of cooling.


https://www.usenix.org/conference/srecon24emea/presentation/fricke
Speakers

Thomas Fricke

Thomas's main focus is cloud and Kubernetes security. He plans private clouds and delivers applications in highly critical infrastructure. His customers deliver services for transmission grids, healthcare, traffic and the German administration. He is cofounder of two companies... Read More →
Thursday October 31, 2024 16:00 - 16:45 GMT
The Liffey

16:45 GMT

Are We Really Engineers?
Thursday October 31, 2024 16:45 - 17:30 GMT
Hillel Wayne


What makes software engineering different from “traditional” engineering? To find out, I interviewed 17 “crossovers”: people who have worked professionally as both a software and a traditional engineer. In aggregate, we learn three things: we are in fact engineers, we’re not actually that different as a field, and there’s a lot we can both teach and learn.


https://www.usenix.org/conference/srecon24emea/presentation/wayne
Speakers

Hillel Wayne

Hillel is a formal methods consultant and the author of Logic for Programmers and Practical TLA+. His other work includes Computer Things, a weekly newsletter on the history and theory of software engineering, and Let's Prove Leftpad. In his free time, he juggles and makes chocolate... Read More →
Thursday October 31, 2024 16:45 - 17:30 GMT
The Liffey

17:30 GMT

Closing Remarks
Thursday October 31, 2024 17:30 - 17:40 GMT
The Liffey
 