Cloud Tech is the largest gathering of cloud technologists & engineers in the Bay Area. Our speakers include top cloud computing entrepreneurs & experts.
Come join us Saturday, October 6th, from 9am to 6pm at the Computer History Museum in Mountain View, CA, for a full 8 hours of learning directly from great minds sharing their secrets!
Special thanks to our sponsors who made this all possible. They are: CloudStack, Scalr, VMware, Rackspace, HP, DataStax, AWS, Canonical, Puppet, and General Catalyst.
Come listen to Apolak Borthakur, the head of Amazon EC2's Bay Area office, talk about what it takes to run the world's largest cloud, grow it, and staff it to power the fastest-growing organizations on the planet.
At Airbnb, we recently released Chronos for building complex data pipelines with dependencies (http://nerds.airbnb.com/introducing-chronos). Chronos allows scheduling jobs in a fault-tolerant and distributed way. It is a Scala framework built on top of Mesos, a kernel for the cluster. Mesos is in production use at Twitter and Airbnb and runs on thousands of nodes. This talk will cover the basics of Mesos and how we built Chronos on top of it.
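To give a flavor of how Chronos is driven, here is a minimal sketch of a scheduled-job definition. The field names follow Chronos's JSON job format (jobs are submitted to its REST API), but the job name, command, and values are purely illustrative:

```python
import json

# Illustrative Chronos job definition (values are assumptions, not from the talk).
job = {
    "name": "nightly-aggregation",          # hypothetical job name
    "command": "run_pipeline.sh",           # shell command Chronos will execute
    # ISO 8601 repeating interval: repeat indefinitely, starting at the given
    # time, once every 24 hours
    "schedule": "R/2013-03-01T00:00:00Z/PT24H",
    "epsilon": "PT30M",                     # still run if within 30 min of schedule
    "retries": 2,                           # retry the job on failure
}

payload = json.dumps(job)
print(payload)
```

A dependent job would omit `schedule` and instead name its parent jobs, which is how Chronos expresses pipeline dependencies.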
A step-by-step presentation of how we transitioned Box's web-application stack from a single bottlenecked MySQL database to a fully sharded MySQL architecture, all the while serving 2 billion queries per day. The focus will be on the incremental steps and best practices that enabled the successful execution of this change, as well as the mistakes made and the lessons learned along the way.
We begin with an overview of our web application architecture both before and after sharding, and discuss our reasons for choosing sharded MySQL as our scaling solution. We then walk through the modifications we made to our ORM layer, including advanced features such as support for cross-shard queries and online moving of data between shards. Finally, we present a detailed description of the technique we developed for migrating live data to shards without downtime, which also supports table by table migration for added flexibility. Throughout the talk, the focus will be on how to make large-scale changes in an incremental fashion, without adversely affecting functionality, and most importantly without downtime.
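The abstract doesn't spell out Box's routing scheme, but the core idea behind an ORM layer that supports sharding can be sketched with one common approach, hash-based routing, plus a fan-out helper for cross-shard queries (shard count and function names here are assumptions for illustration):

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count, for illustration only

def shard_for(entity_id: int) -> int:
    """Deterministically map an entity id to a shard (hash-based routing)."""
    digest = hashlib.md5(str(entity_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def fan_out(query_fn):
    """A cross-shard query runs against every shard and merges the results."""
    return [query_fn(shard) for shard in range(NUM_SHARDS)]

print(shard_for(42))
```

Moving data between shards online then amounts to copying a key's rows to the new shard, double-writing during the copy, and atomically flipping the routing entry, which is why a lookup-table variant of `shard_for` is often preferred over pure hashing.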
This talk will cover how Facebook transformed its ETL and analytics pipeline from daily batch to incremental, near-realtime processing. It will discuss the technology that continuously moves, transforms, and loads data from distributed logs and sharded MySQL databases into the Hive data warehouse. HBase is used as the underlying storage for incrementally updated tables, while the data is exposed as external tables in Hive for read processing.
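The essence of an incrementally updated table, as opposed to a daily rebuild, is applying a stream of keyed updates onto a base snapshot. A minimal sketch of that merge logic, using plain Python dicts in place of HBase and Hive (the function and field names are illustrative, not Facebook's):

```python
# Illustrative sketch: last-write-wins merge of an update stream into a base
# table, the way an HBase-backed incremental table absorbs new data.
def apply_updates(base, updates):
    """Apply (key, row) updates to a base table; row=None marks a deletion."""
    table = dict(base)
    for key, row in updates:
        if row is None:
            table.pop(key, None)   # deletion marker removes the row
        else:
            table[key] = row       # insert or overwrite by primary key
    return table

base = {1: {"clicks": 10}}
updates = [(1, {"clicks": 12}), (2, {"clicks": 3})]
print(apply_updates(base, updates))  # {1: {'clicks': 12}, 2: {'clicks': 3}}
```

Because updates are idempotent per key, the pipeline can replay a log segment after a failure without corrupting the table, which is what makes the continuous approach safe.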
MySQL replication strategies for data consistency: a Percona XtraDB Cluster case study, covering:
1. synchronous replication
2. multi-master replication support
3. parallel applying, AKA “parallel replication”
4. automatic node provisioning
5. a primary focus on data consistency
The do's, the don'ts, and the why's.

Enterprises face a large problem in understanding the access patterns for exploring, developing, deploying, and maintaining machine learning at scale. In this talk we'll go through some common problems and an architecture to support all the phases of data science. We'll also talk about what to look out for when embarking on your first data science initiative.
Data Science has emerged as a field that combines expertise in quantitative analysis and distributed computing, generally driven by the need to apply algorithmic modeling in large-scale applications. Functional programming approaches such as Cascalog (in Clojure) and Scalding (in Scala) have gained popularity for commercial use cases due to their efficient solutions at scale and their desirable properties for software engineers. In this talk we will review typical use cases in real-world applications, as well as consider some of the historical drivers that have caused changes in the industry. We will also review an example application in Cascalog: a recommender system based on City of Palo Alto Open Data.
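The appeal of Cascalog and Scalding is composing pure functions over collections of records. As a rough Python analogue (not the talk's Cascalog code), the classic word-count pipeline looks like this:

```python
from collections import Counter

# Functional-style pipeline sketch: tokenize each record, then aggregate.
# The input lines are made up for illustration.
lines = ["to be or not to be", "to code"]

def tokenize(line):
    """Pure function: split a line of text into words."""
    return line.split()

# Flatten all tokens across lines, then count occurrences of each word.
counts = Counter(word for line in lines for word in tokenize(line))
print(counts["to"])  # → 3
```

In Cascalog or Scalding the same shape of computation runs as a Hadoop job over arbitrarily large datasets, which is why the functional style scales so cleanly.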
At eBay we are using Scala (along with Scalding and Scoobi) for much of our Hadoop-based batch processing, as well as for doing ETL on the generated data. In this talk I'll go over some of the Scala (and other) technologies we have embraced, talk about why we use the approaches that we do, and cover some of the larger lessons we learned along the way. Where applicable I'll use actual eBay case studies as illustrative examples.