238 Hits in 5.2 sec

Live Recovery of Bit Corruptions in Datacenter Storage Systems [article]

Amy Tai, Andrew Kryczka, Shobhit Kanaujia, Chris Petersen, Mikhail Antonov, Muhammad Waliji, Kyle Jamieson, Michael J. Freedman, Asaf Cidon
2018 arXiv   pre-print
To do so, we present DIRECT, a novel set of policies that leverages latent redundancy in distributed storage systems to recover from bit corruption errors with minimal performance and recovery overhead  ...  Due to its high performance and decreasing cost per bit, flash is becoming the main storage medium in datacenters for hot data.  ...  We also show how increasing the resiliency to bit-level errors can significantly reduce storage costs and improve live recovery speed in datacenter environments.  ... 
arXiv:1805.02790v2 fatcat:htzwylssjnemzlkpc7jx5m67ju

Who's Afraid of Uncorrectable Bit Errors? Online Recovery of Flash Errors with Distributed Redundancy

Amy Tai, Andrew Kryczka, Shobhit O. Kanaujia, Kyle Jamieson, Michael J. Freedman, Asaf Cidon
2019 USENIX Annual Technical Conference  
Due to its high performance and decreasing cost per bit, flash storage is the main storage medium in datacenters for hot data.  ...  By significantly increasing the availability of distributed storage systems in the face of bit errors, DIRECT helps extend flash lifetimes.  ...  We also show how increasing the resiliency to bit errors can significantly reduce storage costs and improve live recovery speed in datacenter environments.  ... 
dblp:conf/usenix/TaiKKJFC19 fatcat:4jjnw5fafzc25cz5gasrfziufe

Reliable Multi-cloud Storage Architecture Based on Erasure Code to Improve Storage Performance and Failure Recovery

Emmy Mugisha, Gongxuan Zhang
2017 International Journal of Advanced Cloud Computing and Applied Research  
To assure stability in storage costs with best practice to assure content failure or loss recovery, we applied a Maximum Distance Separable code such as Reed-Solomon to our RMCSA.  ...  The increasing popularity of cloud storage services has led companies that handle critical content to think about using these services for their daily storage needs.  ...  The authors express their appreciation to Nanjing University of Science and Technology for creating a research-fostering environment.  ... 
doi:10.23953/cloud.ijaccar.260 fatcat:cwifw7w2j5ch3p5nlx5mxms3da

Toward a high availability cloud: Techniques and challenges

Cuong Pham, Phuong Cao, Zbigniew Kalbarczyk, Ravishankar K. Iyer
2012 IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012)  
Multicore processing, virtualization, distributed storage systems and an overarching management framework that enable a Cloud, offer a plethora of possibilities to provide high availability using commodity  ...  With its geographical spread and value proposition comes the need to provide guaranteed level of availability in the infrastructure and in its services.  ...  ACKNOWLEDGMENT This work was supported in part by NSF grant CNS 10-18503 CISE, the Department of Energy under Award Number DE-OE0000097, the Air Force Office of Scientific Research, under agreement number  ... 
doi:10.1109/dsnw.2012.6264687 dblp:conf/dsn/PhamCKI12 fatcat:chins7svy5h2rihrlnqcsyhc7e

PaxosStore

Jianjun Zheng, Qian Lin, Jiatao Xu, Cheng Wei, Chuwei Zeng, Pingan Yang, Yunfan Zhang
2017 Proceedings of the VLDB Endowment  
In this paper, we present PaxosStore, a high-availability storage system developed to support the comprehensive business of WeChat.  ...  It employs a combinational design in the storage layer to engage multiple storage engines constructed for different storage models.  ...  Acknowledgments We would like to thank the anonymous reviewers and shepherd of the paper for their insightful feedback that helped improve the paper.  ... 
doi:10.14778/3137765.3137778 fatcat:rydvf7xt7rb5vmpkjzijw4tfui

Datacenter Ethernet and RDMA: Issues at Hyperscale [article]

Torsten Hoefler, Duncan Roweth, Keith Underwood, Bob Alverson, Mark Griswold, Vahid Tabatabaee, Mohan Kalkunte, Surendra Anubolu, Siyan Shen, Abdul Kabbani, Moray McLaren, Steve Scott
2023 arXiv   pre-print
Now, a decade later, we revisit RoCE's design points and conclude that several of its shortcomings must be addressed to fulfill the demands of hyperscale datacenters.  ...  We observe that emerging artificial intelligence, high-performance computing, and storage workloads pose new challenges for large-scale datacenter networking.  ...  This implies that packets can only be dropped if they are corrupted by bit errors, a very rare event.  ... 
arXiv:2302.03337v1 fatcat:kra475y64bcp3jb54frgd5j5ra

Proactive Detection and Repair of Data Corruption: Towards a Hassle-free Declarative Approach with Amulet

Nedyalko Borisov, Shivnath Babu
2011 Proceedings of the VLDB Endowment  
Occasional corruption of stored data is an unfortunate byproduct of the complexity of modern systems.  ...  The dominant practice to deal with data corruption today involves administrators writing ad hoc scripts that run data-integrity tests at the application, database, file-system, and storage levels.  ...  INTRODUCTION Data corruption, where bits of data in persistent storage differ from what they are supposed to be, is an ugly reality that database and storage administrators have to deal with occasionally  ... 
dblp:journals/pvldb/BorisovB11 fatcat:t4dwjixfjvb2tfn4e4gqx5pzgu

The RAMCloud Storage System

John Ousterhout, Mendel Rosenblum, Stephen Rumble, Ryan Stutsman, Stephen Yang, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park (+1 others)
2015 ACM Transactions on Computer Systems  
In a large datacenter with 100,000 nodes, we expect small reads to complete in less than 10μs, which is 50 to 1,000 times faster than the storage systems commonly used today.  ...  Over the past 15 years, the use of DRAM in storage systems has accelerated, driven by the needs of large-scale Web applications.  ...  -Coordinator crashes -Corruption of segments, either in DRAM or on secondary storage. Multiple failures can occur simultaneously.  ... 
doi:10.1145/2806887 fatcat:fg3r5yahbjhxhcor6m2w2q6bxy

Flat Datacenter Storage

Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen S. Hofmann, Jon Howell, Yutaka Suzue
2012 USENIX Symposium on Operating Systems Design and Implementation  
We measure recovery of 92 GB data lost to disk failure in 6.2 s and recovery from a total machine failure with 655 GB of data in 33.7 s.  ...  Flat Datacenter Storage (FDS) is a high-performance, fault-tolerant, large-scale, locality-oblivious blob store.  ...  Johnson Apacible, Rich Draves, and Reuben Olinsky were part of the sort record team. Trevor Eberl, Jamie Lee, Oleg Losinets and Lucas Williamson provided systems support.  ... 
dblp:conf/osdi/NightingaleEFHHS12 fatcat:5ulzufamjnhnhblg53d6ibwtiq

The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes of Image Files for a File-Storage Service

Daniel Reiter Horn, Ken Elkabany, Chris Lesniewski-Laas, Keith Winstein
2017 arXiv   pre-print
We report the design, implementation, and deployment of Lepton, a fault-tolerant system that losslessly compresses JPEG images to 77% of their original size on average.  ...  Lepton matches the compression efficiency of the best prior work, while decoding more than nine times faster and in a streaming manner.  ...  KW's participation was as a paid consultant and was not part of his Stanford duties or responsibilities.  ... 
arXiv:1704.06192v1 fatcat:6n2sefbsnba3fbi4kmovpeyj54

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second edition

Luiz André Barroso, Jimmy Clidaras, Urs Hölzle
2013 Synthesis Lectures on Computer Architecture  
Acknowledgments While we draw from our direct involvement in Google's infrastructure design and operation over the past several years, most of what we have learned and now report here is the result of  ...  Thanks in advance for taking the time to contribute.  ...  The Autopilot system from Microsoft [87] offers an example design for some of this functionality for Windows Live datacenters.  ... 
doi:10.2200/s00516ed2v01y201306cac024 fatcat:435o455inbcmrakl6l7jp4gope

A Taxonomy and Future Directions for Sustainable Cloud Computing: 360 Degree View [article]

Sukhpal Singh Gill, Rajkumar Buyya
2018 arXiv   pre-print
In this paper, we propose a comprehensive taxonomy of sustainable cloud computing.  ...  The usage of a large number of cloud datacenters increases cost as well as carbon footprints, which further affects the sustainability of cloud services.  ...  datacenter powering systems and local flash storage with low power CPUs.  ... 
arXiv:1712.02899v2 fatcat:t26xxbgiijesneqgzi2mqz4gta

A Survey on Security Mechanisms of Leading Cloud Service Providers

Deepak Panth, Dhananjay Mehta, Rituparna Shelgaonkar
2014 International Journal of Computer Applications  
With an unprecedented pace of developments in Cloud computing technology, there has been an exponential increase of users of these services and an equal rise of cloud service providers.  ...  Cloud Computing is a virtual pool of resources provided to users as a service through a web interface. These resources may include Software, Infrastructure, Storage, Network, Platform etc.  ...  Data Services (Storage, SQL Database, HDInsight, Cache, Backup, Recovery Manager) III.  ... 
doi:10.5120/17149-7184 fatcat:262geqqsqbet7gofec7wyrn2di

Swift

Gautam Kumar, Nandita Dukkipati, Keon Jang, Hassan M. G. Wassel, Xian Wu, Behnam Montazeri, Yaogong Wang, Kevin Springborn, Christopher Alfeld, Michael Ryan, David Wetherall, Amin Vahdat
2020 Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication  
We report on experiences with Swift congestion control in Google datacenters. Swift targets an end-to-end delay by using AIMD control, with pacing under extreme congestion.  ...  In large-scale testbed experiments, Swift delivers a tail latency of <50µs for short RPCs, with near-zero packet drops, while sustaining ∼100Gbps throughput per server.  ...  Manya Ghobadi, Emily Blem, Vinh The Lam, Philip Wells and Ashish Naik contributed to the work in the early days of Swift.  ... 
doi:10.1145/3387514.3406591 dblp:conf/sigcomm/KumarDJWWMWSARW20 fatcat:ks4dtoy7hfdo5nerw6pcfmhcky

Higher SLA satisfaction in datacenters with continuous VM placement constraints

Huynh Tu Dang, Fabien Hermenier
2013 Proceedings of the 9th Workshop on Hot Topics in Dependable Systems - HotDep '13  
In a virtualized datacenter, the Service Level Agreement for an application restricts the Virtual Machines (VMs) placement.  ...  We propose a Byzantine fault tolerant pub/sub system, on a tree-based overlay, tolerating a configurable number of failures in any part of the system, with minimal divergence from traditional pub/sub specifications  ...  We thank the developers and users of SQLite and LevelDB for helping us understand their software in detail.  ... 
doi:10.1145/2524224.2524226 dblp:conf/hotdep/DangH13 fatcat:xe5soaxhengtxcufr5oahrxyoq