
NoSQL Will Protect You From The Onslaught of Data Overload (or a bull charging down an alley)


TL;DR: As the amount of unstructured data being collected by organizations skyrockets, their existing databases come up short: they're too slow, too inflexible, and too expensive. What's needed is a DBMS that isn't constricted by the relational schema, and one that accommodates object-oriented data structures without the complexity and latency of object-relational mapping frameworks. NoSQL (a.k.a. Not Only SQL) provides the flexibility, scalability, and availability required to manage the deluge of unstructured data, albeit with some shortcomings of its own.

Data isn't what it used to be. Gone (or going) is the well-structured model of data stored neatly in tables and rows queried via the data's established relations. Along comes Google, Facebook, Twitter, Amazon, and untold other sources of unstructured data that simply doesn't fit comfortably in a conventional relational database.

That isn't to say RDBMSs are an endangered species. An August 24, 2014, article on TheServerSide.com points out that enterprises continue to prefer SQL databases, primarily for their reliability through compliance with the atomicity, consistency, isolation, and durability (ACID) model. Also, there are plenty of DBAs with relational SQL experience, but far fewer with NoSQL skills.

Still, RDBMSs don't accommodate unstructured data easily -- at least not in their current form. The future is clearly one in which the bulk of data in organizations is unstructured. As far back as June 2011 an IDC study (pdf) predicted that 90 percent of the data generated worldwide in the next decade would be unstructured. How much data are we talking? How about 8000 exabytes in 2015, which is the equivalent of 8 trillion gigabytes.

Growth in Data

The tremendous growth in the amount of data in the world -- most of which is unstructured -- requires a non-relational approach to management. Credit: IDC

As the IDC report points out, enterprises can no longer afford to "consume IT" solely as part of their internal infrastructure; increasingly, they must consume it as an external service. This is particularly true as cloud services such as the Morpheus database as a service (DBaaS) incorporate the security and reliability required for companies to ensure the safety of their data and regulatory compliance.

By supporting both MongoDB and MySQL, Morpheus offers organizations the flexibility to transition existing databases and their new data stores to the cloud. They can use a single console to monitor their queries in real time to find and remove performance bottlenecks. Connections are secured via VPN, and all data is backed up, replicated, and archived automatically. Morpheus's SSD-backed infrastructure ensures fast connections to data stores, and direct links to EC2 provide ultra-low latency. Visit the Morpheus site for pricing information or to sign up for a free account.

Addressing SQL's scalability problem head-on

A primary shortcoming of SQL is that as the number of transactions being processed goes up, performance goes down. The traditional solution is to add more RDBMS servers, but doing so is expensive, not to mention a management nightmare as optimization and troubleshooting become ever more complex.

With NoSQL, your database scales horizontally rather than vertically. The resulting distributed databases host data on thousands of servers that can be added or deleted without affecting performance. Of course, reality is rarely this simple. In a November 20, 2013, article on InformationWeek, Joe Masters Emison explains that high availability is simple to achieve on read-only distributed systems. Writing to those systems is much trickier.

As stated in the CAP theorem (or Brewer theorem, named after Eric Brewer), you can have strict availability or strict consistency, but not both. NoSQL databases lean toward the availability side, at the expense of consistency. However, distributed databases are getting better at handling timeouts, although there's no way to do so without affecting the database's performance.

Another NoSQL advantage is that it doesn't lock you into a rigid schema the way SQL does. As Jnan Dash explains in a September 18, 2013, article on ZDNet, revisions to the data model can cause performance problems, but rarely do designers know all the facts about the data model before it goes into production. The need for a dynamic data model plays into NoSQL's strength of accommodating changes in markets, changes in the organization, and even changes in technology.

The benefits of NoSQL's data-model flexibility

NoSQL data models are grouped into four general categories: key-value (K-V) stores, document stores, column-oriented stores, and graph databases. Ben Scofield has rated these NoSQL database categories in comparison with relational databases. (Note that there is considerable variation between NoSQL implementations.)

 

The four general NoSQL categories are rated by Ben Scofield in terms of performance, scalability, flexibility, complexity, and functionality. Credit: Wikipedia

The fundamental data model of K-V stores is the associative array, which is also referred to as a map or directory. Each possible key of a key-value pair appears in a collection no more than once. As one of the simplest non-trivial data models, K-V stores are often extended to more-powerful ordered models that maintain keys in lexicographic order, among other purposes.
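
To make the associative-array idea concrete, here's a minimal sketch in Python of a key-value store that also keeps its keys in lexicographic order so it can answer range queries. (This is only an illustration of the data model, not any particular product's API.)

```python
import bisect

class OrderedKVStore:
    """Toy key-value store: each key appears at most once, and keys stay sorted."""

    def __init__(self):
        self._keys = []   # keys kept in lexicographic order
        self._data = {}   # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)   # maintain sorted order on insert
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

store = OrderedKVStore()
store.put("user:1002", {"name": "Grace"})
store.put("user:1001", {"name": "Ada"})
print(store.get("user:1001"))                 # {'name': 'Ada'}
print(store.range("user:1000", "user:2000"))  # both users, in key order
```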

The documents that comprise the document store encapsulate or encode data in standard formats, typically XML, YAML, or JSON (JavaScript Object Notation), but also binary BSON, PDF, and MS-Office formats. Documents in collections are somewhat analogous to records in tables, although the documents in a collection won't necessarily share all fields the way records in a table do.
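
As an illustration of that flexibility, here's a short sketch using the pymongo driver: two documents live in the same collection even though they don't share the same fields. (The connection string, database, and collection names are placeholders.)

```python
from pymongo import MongoClient

# Placeholder connection string; substitute your own host and credentials.
client = MongoClient("mongodb://localhost:27017")
customers = client["shop"]["customers"]

# Two documents in one collection, with different fields -- no shared schema required.
customers.insert_one({"name": "Ada Lovelace", "email": "ada@example.com"})
customers.insert_one({"name": "Grace Hopper", "phone": "555-0100", "vip": True})

# Query by any field that happens to exist in some documents.
for doc in customers.find({"vip": True}):
    print(doc["name"])
```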

A NoSQL column is a key-value pair consisting of a unique name, a value, and a timestamp. The timestamp is used to distinguish valid content from stale content and thus addresses the consistency shortcomings of NoSQL. Columns in distributed databases don't need the uniformity of columns in relational databases because NoSQL "rows" aren't tied to "tables," which exist only conceptually in NoSQL.
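
A rough sketch of that idea in Python: each column is a (name, value, timestamp) triple, and a reader reconciles copies from different replicas by keeping the newest one. (Simplified for illustration; real column stores have their own formats and reconciliation rules.)

```python
import time
from collections import namedtuple

# A "column": unique name, a value, and the timestamp of the write.
Column = namedtuple("Column", ["name", "value", "timestamp"])

def newest(versions):
    """Pick the most recently written version of a column."""
    return max(versions, key=lambda col: col.timestamp)

replica_a = Column("email", "ada@example.com", time.time() - 60)  # written a minute ago
replica_b = Column("email", "ada@newmail.net", time.time())       # written just now

print(newest([replica_a, replica_b]).value)  # -> ada@newmail.net (stale copy discarded)
```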

Graph databases use nodes, edges, and properties to represent and store data without need of an index. Instead, each database element has a pointer to adjacent elements. Nodes can represent people, businesses, accounts, or other trackable items. Properties are data elements that pertain to the nodes, such as "age" for a person. Edges connect nodes to other nodes and to properties; they represent the relationships between the elements. Most of the analysis is done via the edges.
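
The sketch below shows the shape of a property graph in plain Python: nodes and edges both carry properties, and traversal simply follows adjacency rather than scanning an index. (Illustrative only; graph databases store and traverse this far more efficiently.)

```python
# Nodes carry properties such as "age".
nodes = {
    "alice": {"type": "person", "age": 34},
    "acme":  {"type": "business"},
}

# Edges connect nodes and carry their own properties: (from, relationship, to, properties).
edges = [
    ("alice", "WORKS_AT", "acme", {"since": 2012}),
]

def neighbors(node, relationship):
    """Follow edges of a given type out of a node -- the analysis happens on the edges."""
    return [dst for src, rel, dst, _props in edges if src == node and rel == relationship]

print(neighbors("alice", "WORKS_AT"))   # -> ['acme']
```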

Once you've separated the NoSQL hype from the reality, it becomes clear that there's plenty of room in the database environments of the future for NoSQL and SQL alike. Oracle, Microsoft, and other leading SQL providers have already added NoSQL extensions to their products, as InfoWorld's Eric Knorr explains in an August 25, 2014, article. And with DBaaS services such as Morpheus, you get the best of both worlds: MongoDB for your NoSQL needs, and MySQL for your RDBMS needs. It's always nice to have options!


How Is Google Analytics So Damn Fast?


TL;DR: Google Analytics stores a massive amount of statistical data from web sites across the globe. Retrieving reports quickly from such a large amount of data requires Google to use a custom solution that is easily scalable whenever more data needs to be stored.

At Google, any number of applications may need to be added to their infrastructure at any time, and each of these could potentially have extremely heavy workloads. Resource demands such as these can be difficult to meet, especially when there is a limited amount of time to get the required updates implemented.

If Google were to use the typical relational database on a single server node, they would need to upgrade their hardware each time capacity is reached. Given the number of applications being created and the amount of data being used by Google, this type of upgrade could quite possibly be necessary on a daily basis!

The load could also be shared across multiple server nodes, but once more than a few additional nodes are required, the complexity of the system becomes extremely difficult to maintain.

With these things in mind, a standard relational database setup would not be a particularly attractive option due to the difficulty of upgrading and maintaining the system on such a large scale.

Finding a Scalable Solution

In order to maintain speed and ensure that such incredibly quick hardware upgrades are not necessary, Google uses its own data storage solution called BigTable. Rather than store data relationally in tables, it stores data as a multi-dimensional sorted map.

This type of implementation falls under a broader heading for data storage, called a key/value store. This method of storage can provide some performance benefits and make the process of scaling much easier.
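
To show what "a multi-dimensional sorted map" means in practice, here's a toy Python sketch in which each value is addressed by a (row key, column, timestamp) tuple. This only illustrates the idea; it is not BigTable's actual interface.

```python
import time

# Toy multi-dimensional sorted map: (row_key, column, timestamp) -> value.
table = {}

def put(row_key, column, value):
    table[(row_key, column, time.time())] = value

def read_row(row_key):
    """Return the newest value for each column of one row."""
    newest = {}
    for (row, col, ts), value in sorted(table.items()):
        if row == row_key:
            newest[col] = value   # later timestamps overwrite earlier ones
    return newest

put("com.example/index.html", "contents", "<html>...</html>")
put("com.example/index.html", "anchor:partner.example", "Example Site")
print(read_row("com.example/index.html"))
```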

Information Storage in a Relational Database

Relational databases store each piece of information in a single location, which is typically a column within a table. For a relational database, it is important to normalize the data. This process ensures that there is no duplication of data in other tables or columns.

For example, customer last names should always be stored in a particular column in a particular table. If a customer last name is found in another column or table within the database, then it should be removed and the original column and table should be referenced to retrieve the information.
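
In code, that rule looks something like the hedged sketch below, using Python's built-in sqlite3 module: the last name lives in exactly one table, and anything else that needs it joins back to that single copy. (Table and column names are made up for illustration.)

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- The customer's last name is stored once, in one place.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customers(id),
                            total REAL);
    INSERT INTO customers VALUES (1, 'Lovelace');
    INSERT INTO orders    VALUES (100, 1, 49.99);
""")

# Reading the last name for an order means joining back to the authoritative copy.
row = db.execute("""
    SELECT c.last_name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE o.id = 100
""").fetchone()
print(row)   # -> ('Lovelace', 49.99)
```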

The downside to this structure is that the database can become quite complex internally. Even a relatively simple query can have a large number of possible paths for execution, and all of these paths must be evaluated at run time to find out which one will be the most optimal. The more complex the database becomes, the more resources will need to be devoted to determining query paths at run time.

 

Information Storage in a Key/Value Store

With a key/value store, duplicate data is acceptable. The idea is to make use of disk space, which can easily and cost-effectively be upgraded (especially when using a cloud), rather than other hardware resources that are more expensive to bring up to speed.

This data duplication is beneficial when it comes to simplifying queries, since related information can be stored together to avoid having numerous potential paths that a query could take to access the needed data.

Instead of using tables like a relational database, key/value stores use domains. A domain is a storage area where data can be placed, but does not require a predefined schema. Pieces of data within a domain are defined by keys, and these keys can have any number of attributes attached to them.

The attributes can simply be string values, but can also be something even more powerful: data types that match up with those of popular programming languages. These could include arrays, objects, integers, floats, Booleans, and other essential data types used in programming.

With key/value stores, the data integrity and logic are handled by the application code (through the use of one or more APIs) rather than by a schema within the database itself. As a result, data retrieval becomes a matter of using the correct programming logic rather than relying on the database optimizer to determine the query path from a large number of possibilities based on the relation it needs to access.
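
As a contrast to the normalized example above, here's a hedged sketch of the key/value approach: the "domain" has no schema, related data is stored together (and duplicated where convenient), and the application code enforces whatever integrity rules it needs.

```python
# Schema-less "domain": each key maps to whatever attributes the application attaches.
orders_domain = {}

def save_order(order_id, customer_last_name, total):
    # Integrity checks live in application code, not in the database.
    if total < 0:
        raise ValueError("total must be non-negative")
    orders_domain[f"order:{order_id}"] = {
        "customer": {"last_name": customer_last_name},  # duplicated per order on purpose
        "total": total,
        "items": [],
    }

save_order(100, "Lovelace", 49.99)
# Related data is stored together, so no join is needed to read it back.
print(orders_domain["order:100"]["customer"]["last_name"])
```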

Data Access

 

How data access differs between a relational database and a key/value database. Source: readwrite

Getting Results

Google needs to store and retrieve copious amounts of data for many applications, among them Google Analytics, Google Maps, Gmail, and its popular web search index. In addition, more applications and data stores could be added at any time, making the BigTable key/value store an ideal solution for scalability.

BigTable is Google’s own custom solution, so how can a business obtain a similar performance and scalability boost to give its users a better experience? The good news is that there are other key/value store options available, and some can be run as a service from a cloud. This type of service is easily scalable, since more data storage can easily be purchased as needed on the cloud.

A Key/Value Store Option

There are several options for key/value stores. One of these is MongoDB, a document-oriented database that stores information in JSON format. This format is ideal for web applications, since JSON makes it easy to pass data around in a standard format among the various parts of an application that need it.

For example, Mongo is part of the MEAN stack: Mongo, Express, AngularJS, and NodeJS -- a popular setup for programmers developing applications. Each of these pieces of the puzzle sends data to and receives data from one or more of the other pieces. Since everything, including the database, can use the JSON format, passing the data around among the various parts becomes much easier and more standardized.
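
A small illustration of that round trip, written in Python as a stand-in for the Node.js tiers named above: a Mongo-style document serializes straight to JSON for the API layer and deserializes back to a native structure on the other side.

```python
import json

# A MongoDB-style document expressed as a plain dictionary...
order = {"order_id": 100, "customer": "Lovelace", "items": ["widget"], "total": 49.99}

# ...serializes directly to JSON for the other tiers of the stack...
payload = json.dumps(order)

# ...and deserializes back to a native structure wherever it's needed next.
received = json.loads(payload)
print(received["customer"])   # -> Lovelace
```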

MySQL vs. MongoDB

How MySQL and MongoDB perform the same tasks. Source: Rick Osborne

How to Make Use of Mongo

Mongo can be installed and used on various operating systems, including Windows, Linux, and OS X. In this case, the scalability of the database would need to be maintained by adding storage space to the server on which it is installed.

Another option is to use Mongo as a service on the cloud. This allows for easy scalability, since a request can be made to the service provider to up the necessary storage space at any time. In this way, new applications or additional data storage needs can be handled quickly and efficiently.

Morpheus is a great option for this service, offering Mongo as a highly scalable service in the cloud: users of Morpheus get three shared nodes and full replica sets, and can seamlessly provision MongoDB instances. In addition, all of this runs on a high-performance, solid state drive (SSD) infrastructure, making it a very reliable data storage medium. Using Morpheus, a highly scalable database as a service can be up and running in no time!

Slowly But Surely, The Dept. of Defense Plods Towards The Modern Day


TL;DR: The Department of Defense's slow, steady migration to public and private cloud architectures may be hastened by pressures at opposite ends of the spectrum. At one end are programs such as the NSA's cloud-based distributed RDBMS that realize huge cost savings and other benefits. At the other end are the growing number of sophisticated attacks (and resulting breaches) on expensive-to-maintain legacy systems. The consensus is that the DOD's adoption of public and private cloud infrastructures is inevitable, which makes the outlook rosy for commercial cloud services of all types.

U.S. computer networks are under attack. That's not news. But what is new is the sophistication of the assaults on public and private computer systems of all sizes. The attackers are targeting specific sensitive information, the disclosure of which threatens not only business assets and individuals' private data, but also our nation's security.

In a September 29, 2014, column on the Times Herald site, U.S. Senator Carl Levin, who is chairman of the Senate Armed Services Committee, released the unclassified version of an investigation into breaches of the computer networks of defense contractors working with the U.S. Transportation Command, or TRANSCOM. The report disclosed more than 20 sophisticated intrusions by the Chinese government into TRANSCOM contractor networks in a 12-month period ending in June 2013.

In one instance, the Chinese military stole passwords, email, and source code from a contractor's network. Other attacks targeted flight information to track the movement of troops, equipment, and supplies. TRANSCOM was aware of only two of the 20-plus attacks on its contractors' networks, even though the FBI and other government agencies were aware of all of the attacks.

The report highlights the need to disclose breaches and attempted breaches. Without such disclosure, there's no way to formulate an effective response in the short run or a deterrent in the long run. Within government, as well as in the business world, the left hand doesn't know what happened to the right hand.

Lack of breach disclosures plays into the bad guys' hands

No longer are data thieves rogue hackers acting alone. Today's Internet criminals work in teams that tap the expertise of their members to attack specific targets and conceal their activities. InformationWeek's Henry Kenyon describes in a September 29, 2014, article how security officials in the public and private sectors are striving to coordinate their efforts to detect and prevent breaches by these increasingly sophisticated Internet criminals.

The Department of Homeland Security is charged with coordinating cyber-defenses, mitigating attacks, and responding to incidents of Internet espionage. Phyllis Schneck, DHS's director of cybersecurity, identifies three impediments to effective defenses against network attacks.

  • Problem 1: The criminals are talented and coordinated.
  • Problem 2: Breaches intent on espionage often appear to be theft attempts.
  • Problem 3: Firms don't report data breaches, so there's no sharing of information, which is necessary to devise a coordinated response.

DHS's Einstein system constantly scans civilian government networks, analyzing them to detect and prevent zero-day, bot-net, and other attacks. Schneck states that DHS makes it a priority to share the information it collects about attempted and successful breaches with other government agencies, the private sector, and academia.

DoD Modernization

 

Public and private cloud services will play an important role in the Department of Defense's IT Modernization program. Source: Business2Community

The problem, according to analysts, is that businesses are loath to disclose data losses and thwarted attacks on their networks. They consider their reputation for network security a competitive advantage, so anything that impairs that reputation could reduce the company's value. Sue Poremba points out in a September 24, 2014, article on Forbes that most major breaches still receive very little publicity.

However, the recent spate of major breaches at Home Depot, Dairy Queen, PF Chang's, Target, and major universities is convincing company officials of the need to coordinate their defenses. Such a coordinated approach to network protection begins and ends with sharing information.

An NSA cloud success story serves as the blueprint

Organizations don't get more secretive than the U.S. National Security Agency. You'd think the NSA would be the last agency to migrate its databases to the cloud, but that's precisely what it did -- and in the process realized improved performance, timeliness, and usability while also saving money and maintaining top security.

In a September 29, 2014, article, NetworkWorld's Dirk A.D. Smith describes the NSA's successful cloud-migration program. The agency's hundreds of relational databases needed more capacity, but throwing more servers at the problem wasn't practical: existing systems didn't scale well, and the resulting complexity would have been a nightmare to manage.

Instead, NSA CIO Lonny Anderson convinced the U.S. Cyber Command director to move the databases to a private cloud. Now analyses take less time, the databases cost less to manage, and the data they contain is safer. That's what you call win-win-win.

The goal was to create a "user-facing experience" that offered NSA analysts "one-stop shopping," according to Anderson. Budget cuts required security agencies to share data and management responsibilities: NSA and CIA took charge of cloud management; the National Geospatial Intelligence Agency (NGA) and Defense Intelligence Agency (DIA) took responsibility for desktops; and the National Reconnaissance Office (NRO) was charged with network management and engineering services.

The agencies' shared private cloud integrates open source (Apache Hadoop, Apache Accumulo, OpenStack) and government-created apps running on commercial hardware that meets the DOD's specs for reliability and security. The resulting network lets the government realize the efficiency benefits of commercial public cloud services, according to Anderson.

DoD Cloud Infrastructure

 

The DOD's Enterprise Cloud Infrastructure will transition local and remote data centers to a combination of public and private cloud apps and services. Source: Business2Community

Just as importantly, the cloud helps the defense agencies ensure compliance with the strict legal authorities and oversight their data collection and analysis activities are subject to. The private cloud distributes data across a broad geographic area and tags each data element to indicate its security and usage restrictions. The data is secured at multiple layers of the distributed architecture.

The data-element tags allow the agency to determine when and how each bit of data is accessed -- to the individual word or name -- as well as all the people who accessed, downloaded, copied, printed, forwarded, modified, or deleted the specific data element. Many of these operations weren't possible on the legacy systems the private cloud replaced, according to Anderson. He claims the new system would have prevented breaches such as the 2010 release of secure data by U.S. soldier Bradley Manning.

Overcoming analysts' reluctance to abandon their legacy systems

Anderson faced an uphill battle in convincing agency analysts to give up their legacy systems, which in many instances couldn't be ported directly to the cloud. Adoption of the cloud was encouraged through a program that prohibited analysts from using the legacy systems for one full day every two weeks. With the assistance of analysts with cloud expertise, the newbies overcame the problems they encountered as they transitioned to the agency's private cloud.

The result is a faster, more efficient system that improves security and cuts costs. These are among the benefits being realized by companies using the Morpheus database-as-a-service (DBaaS). Morpheus is based on an SSD infrastructure for peak performance, allowing you to identify and optimize data queries in real time. Backup, replication, and archiving of databases are automatic, and your data is locked down via VPN security.

Morpheus supports Elasticsearch, Redis, MySQL, and MongoDB. Visit the Morpheus site for pricing information and to create a free trial account.

Similar benefits are being realized by the first DOD agencies using commercial cloud services. Amber Corrin reports in a September 24, 2014, article on the Federal Times site that defense agencies will soon be able to contract for public cloud services directly rather than having to go through the Defense Information Systems Agency (DISA).

The change is the result of the perception that DOD agencies are too slow to adopt cloud technologies, according to DOD CIO Terry Halvorsen. However, there will still be plenty of bureaucracy. Agencies will be required to provide the DOD CIO with "detailed business case analyses" that consider services offered by the DISA, among other restrictions.

Most importantly, all cloud connection points are controlled by the DISA, and unclassified traffic has to pass through secured channels. Slowing things down even further, agencies will have to obtain an authority to operate, or ATO.

The DOD's cloud migration may be slow, but it's steady. Bob Brewin reports in a September 23, 2014, article on NextGov that the Air Force Reserves will now use Microsoft 365 for email and other purposes, which promises to save the government millions of dollars over the next few years. That's something taxpayers can cheer about!

Why They Do It: Protect Your Data by Learning What Makes Hackers Tick


TL;DR: Recent data system breaches at Target and Home Depot remind us all that the continuous threat of criminal hacking of computer security systems is not abating. Rather, it's becoming routine. Business managers, lawmakers, and computing professionals must understand the motivation behind this activity if they want to effectively protect business interests and thwart attacks. Perhaps the biggest challenge is that the hacking community is a diverse and complex universe: a large variety of skill levels and several motivators. Only by understanding the motives of criminal security hackers is it possible to profile computer crimes. With solid profiles in hand, security professionals can better predict future activity and install the appropriate safeguards.

Most security professionals are likely to spend much more time analyzing the technical and mechanical aspects of cybercrime than the social and psychological dimensions. Of course it’s critically important to dissect malware, examine hacker tools, and analyze their code. However, if we want to understand the nature of the cyber threat, then security professionals need to act more like criminal investigators. We no longer live in a world of mere glory-seekers and script kiddies. Some very serious thugs are now lurking in virtually every sector. So, it’s critically important for you to understand their motives and signatures, since these point to their targets and reveal their methods of operation.

As you consider your business context, it is important to ask yourself this question frequently: What exactly are the means, motives, and opportunities for potential criminal hackers of my business computing systems? Getting a solid answer to this question is the key to identifying your most vulnerable assets and developing a security plan.

The Home Depot Breach

In September 2014, Home Depot Inc. made an announcement that as many as 56 million cards may have been compromised in a sustained malware attack on its payment systems—an attack that had been underway for many months. This security breach was even larger than the previous holiday attack at Target Corporation. This is yet another highlight in a string of similar events at corporations around the world, and reminds us of the vulnerability of U.S. retailers to hackers that continue to aggressively target their payment systems. Home Depot has said that the company had begun a project to fully encrypt its payment terminal data this year, but was outpaced by the hackers. The Home Depot attack is the latest in a wave of high-profile hackings at big merchants in recent months, ranging from high-end retailer Neiman Marcus Group Ltd. to grocer Supervalu Inc. to Asian restaurant chain P.F. Chang's China Bistro Inc.

According to many IT and computing system analysts, the top three hacker motives are financial, corporate espionage, and political activism. In the remainder of this article, we look closely at the financial motive, and then we help you consider the best approaches to securing your cloud-computing assets with BitCan.

Financial System Hackers

You’re probably most familiar with this type of hacker, since they cause the most damage and often feature in the news. The motive here is pretty obvious: make money the easy way, by stealing it. Financial system security hackers range from lone actors to large cyber-crime organizations, often with the backing of conventional criminal organizations. Collectively, these thieves are responsible for extracting billions of dollars from consumers and businesses each year.

These threats go well beyond the hobbyist community to a very high level of sophistication. All criminal attackers immerse themselves in a complex underground economy: a vast black market in which participants buy and sell toolkits, zero-day exploit code, and malware botnet services. Vast quantities of private data and intellectual property are up for sale—highly valuable data that has been stolen from victims. A recent market trend is the sale of web-exploit kits such as Blackhole, Nuclear Pack, and Phoenix, which buyers use to automate drive-by download attacks.

Some financial system hackers are opportunistic and focus on small businesses and consumers. Larger operations go to great lengths to analyze large enterprises and specialize in one or two industry verticals. In a recent attack on the banking and credit card industry, a very organized group was able to pull off a global heist totaling $45 million from ATMs, executed with an extreme degree of synchronization. These secondary attacks were feasible because of a previously undetected breach of some bank networks and a payment processor company.

Malicious hacker attacks are quite common and often have tragic, highly disruptive outcomes. These attacks are also inevitable as more Internet users adopt cloud computing and storage, so combating the effects of hacking will become increasingly critical in the future. There is ongoing debate as to whether cloud computing is more vulnerable to hacking threats; after years of extensive industry discussion, the consensus is that it's the same problem in a different location. So, if businesses can build reliable security and recovery methods, then cloud computing can be a serious consideration. Most importantly, the freedom, accessibility, and collaboration available through cloud computing can far outweigh and mitigate the risks to your data security.

Many cloud computing users assume their data is held safe by the security measures of their cloud vendor. But, hackers use code-cracking algorithms and brute force attacks to acquire passwords, and they can also access data transmissions that lack proper encryption.

Ask yourself this question: Do you have solid infrastructure, processes, and procedures to ensure reliable, high-security backups of your sensitive and business-critical data? If you can’t answer this question with confidence, then we invite you to read on a bit further as we consider various aspects of a top-tier cloud backup service.

Your cloud backup service should process all data through encryption to ensure that it’s entirely unreadable by unauthorized users. It should only be possible to decrypt your data when you decide to retrieve it. Minimally, this means that data transmission should be done only through the SSL protocol and that strong passwords are necessary for information access and decoding.
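
One way to meet that bar is to encrypt data on the client before it ever leaves your network, so the backup provider only ever stores ciphertext. The sketch below uses AES-256-GCM from the third-party cryptography package; key management is deliberately simplified for illustration.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # keep this key somewhere other than the backup
aesgcm = AESGCM(key)
nonce = os.urandom(12)                      # must be unique for every encryption

backup_bytes = b"business-critical database dump ..."
ciphertext = aesgcm.encrypt(nonce, backup_bytes, None)   # what gets uploaded

# Only someone holding the key can recover the original data.
plaintext = aesgcm.decrypt(nonce, ciphertext, None)
assert plaintext == backup_bytes
```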

No system is hacker-proof, but the greatest benefit of a cloud backup service is a high degree of readiness for recovery from a hacking event. Companies that specialize in cloud backup services, like BitCan, reduce threats to your data by enabling full recovery of all business-critical data to its original state in just a matter of clicks. These backup companies replicate your cloud data and safeguard it in a separate cloud so that the likelihood of data loss from natural disasters and other threats remains infinitesimally small.

We recommend that you visit http://www.gobitcan.com and start your free 30-day trial. Or, you can read more below to learn how BitCan cloud backup services can help secure your backup data, support your data-recovery plan, and bring you peace of mind.

Intensive Security for Your Online Backups

 

Rock-solid facilities. With BitCan cloud backup services, you can eliminate most of your backup infrastructure headaches and also alleviate your concerns about the safety and privacy of your cloud backups. Our robust, extreme-security data centers utilize precise electronic surveillance and multi-factor access control systems. The design of all our environmental systems aims to minimize the impact of any disruptions to operations. Multiple geographic locations and extensive redundancy add up to a high degree of resiliency against virtually all failure types, including natural disasters.

Protection from the bad guys. Not only do you get super-strong physical protection for your backup data, but we lock everything down with extensive network and security monitoring systems. As you expect, our systems include essential security measures such as distributed denial of service (DDoS) protection and password brute-force detection on all BitCan accounts. Additional security measures include:

  • Secure access and data transfer – all data access and transfers go through secure HTTP access using SSL.
  • Unique users – our identity and access management features allow you to control the level of access that users have to your BitCan infrastructure services.
  • Encrypted data storage – encrypt your backup data and objects using Advanced Encryption Standard (AES) 256, a secure symmetric-key encryption standard that employs 256-bit keys.
  • Security logs – BitCan provides extensive, verbose logs of all activity for all users of your account.
  • Native support – native support for MongoDB, MySQL, and Linux/Unix/Windows files.

Start your free 30-day trial of BitCan today.

The Key to Distributed Database Performance: Scalability


TL;DR: The realities of modern corporate networks make the move to distributed database architectures inevitable. How do you leverage the stability and security of traditional relational database designs while making the transition to distributed environments? One key consideration is to ensure your cloud databases are scalable enough to deliver the technology's cost and performance benefits.

Your conventional relational DBMS works without a hitch (mostly), yet you're pressured to convert it to a distributed database that scales horizontally in the cloud. Why? Your customers and users not only expect new capabilities, they need them to do their jobs. Topping the list of requirements is scalability.

David Maitland points out in an October 7, 2014, article on Bobsguide.com that startups in particular have to be prepared to see the demands on their databases expand from hundreds of requests per day to millions -- and back again -- in a very short time. Non-relational databases have the flexibility to grow and contract almost instantaneously as traffic patterns fluctuate. The key is managing the transition to scalable architectures.

Availability defines a distributed database

A truly distributed database is more than an RDBMS with one master and multiple slave nodes. One with multiple masters, or write nodes, definitely qualifies as distributed because it's all about availability: if one master fails, the system automatically rolls over to the next and the write is recorded. InformationWeek's Joe Masters Emison explains the distinction in a November 20, 2013, article.

The Evolving Web Paradigm

 

The evolution of database technology points to a "federated" database that is document and graph based, as well as globally queryable. Source: JeffSayre.com

The CAP theorem states that you can have strict availability or strict consistency, but not both. It happens all the time: a system is instructed to write different information to the same record at the same time. You can either stop writing (no availability) or write two different records (no consistency). In the real world, everything falls between these two extremes: business processes favor high availability first and deal with inconsistencies later.

Kyle Kingsbury's Call Me Maybe project measured the ability of distributed databases such as NoSQL to handle multiple partitions in real-world conflict situations. InformationWeek's Joe Masters Emison describes the project in a September 5, 2013, article. The upshot is that distributed databases fail -- as all databases sometimes do -- but they do so less cleanly than single-node databases, so tracking and correcting the resulting data loss requires asking a new set of questions.

The Morpheus database-as-a-service (DBaaS) delivers the flexibility modern databases demand while ensuring the performance and security IT managers require. Morpheus provides the reliability of 100% bare-metal SSD hosting on a high-availability network with ultra-low latency to major peering points and cloud hosts. You can optimize queries in real time and analyze key database metrics.

Morpheus supports heterogeneous ElasticSearch, MongoDB, MySQL, and Redis databases. Visit the Morpheus site for pricing information or to sign up for a free trial account.

Securing distributed databases is also more complex, and not just because the data resides in multiple physical and virtual locations. As with most new technologies, the initial emphasis is on features rather than safety. Also, as the databases are used in production settings, unforeseen security concerns are more likely to be addressed as they arise. (The upside of this equation is that because the databases are more obscure, they present a smaller profile to the bad guys.)

The advent of the self-aware app

Databases are now designed to monitor their connections, available bandwidth, and other environmental factors. When demand surges, such as during the holiday shopping season, the database automatically puts more cloud servers online to handle the increased demand, and similarly puts them offline when demand returns to normal.

This on-demand flexibility relies on the cloud service's APIs, whether they use proprietary API calls or open-source technology such as OpenStack. Today's container-based architectures, such as Docker, encapsulate all resources required to run the app, including frameworks and libraries.
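
The control loop behind that behavior can be sketched in a few lines. The cloud client below is a hypothetical placeholder, not any provider's real SDK; the point is simply that the app watches demand and calls the provisioning API in both directions.

```python
class CloudAPI:
    """Hypothetical stand-in for a cloud provider's SDK; method names are placeholders."""
    def __init__(self):
        self._servers = ["node-1"]
    def running_servers(self):
        return list(self._servers)
    def add_server(self):
        self._servers.append(f"node-{len(self._servers) + 1}")
    def remove_server(self, server):
        self._servers.remove(server)

def autoscale(cloud, current_load, target_per_server=0.7):
    """Add capacity when servers run hot; shed it when demand returns to normal."""
    servers = cloud.running_servers()
    load_per_server = current_load / max(len(servers), 1)
    if load_per_server > target_per_server:
        cloud.add_server()                        # e.g. a holiday-season surge
    elif load_per_server < target_per_server / 2 and len(servers) > 1:
        cloud.remove_server(servers[-1])          # demand back to normal

cloud = CloudAPI()
autoscale(cloud, current_load=1.5)                # load too high -> bring up another server
print(cloud.running_servers())                    # -> ['node-1', 'node-2']
```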

Contain(er) Yourself: Separating Docker Hype from the Tech's Reality


TL;DR: Even jaded IT veterans are sitting up and taking notice of the potential benefits of Docker's microservice model of app development, deployment, and maintenance. By containerizing the entire runtime environment, Docker ensures apps will function smoothly on any platform. By separating app components at such a granular level, Docker lets you apply patches and updates seamlessly without having to shut down the entire app.

The tech industry is noted for its incredibly short "next-big-thing" cycles. After all, "hype" and "tech" go together like mashed potatoes and gravy. Every now and then, one of these over-heated breakthroughs actually lives up to all the blather.

Docker is a Linux-based development environment designed to make it easy to create distributed applications. The Docker Engine is the packaging tool that containerizes all resources comprising the app's runtime environment. The Docker Hub is a cloud service for sharing application "artifacts" via public and private repositories, and automating the build pipeline.

Because Docker lets developers ship their code in a self-contained runtime environment, their apps run on any platform without the portability glitches that often drive sysadmins crazy when a program hiccups on a platform other than the one on which it was created.

How Docker out-virtualizes VMs

It's natural to compare Docker's microservice-based containers to virtual machines. As Lucas Carlson explains in a September 30, 2014, article on VentureBeat, you can fit from 10 to 100 times as many containers on a server as VMs. More importantly, there's no need for the hypervisor intermediation layer that's required to manage VMs on the physical hardware, as Docker VP of Services James Turnbull describes in a July 9, 2014, interview with Jodi Biddle on OpenSource.com.

Virtual Machines and Docker

Docker containers are faster and more efficient than virtual machines in part because they require no guest OS or separate hypervisor management layer. Source: Docker

Because Docker offers virtualization at the operating system level, the containers run in user space atop the OS kernel, according to Turnbull, which makes them incredibly lightweight and fast. Carlson's September 14, 2014, article on JavaWorld compares Docker development to a big Lego set of other people's containers that you can combine without worrying about incompatibilities.

You get many of the same plug-and-play capabilities when you choose to host your apps with the Morpheus cloud database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch on a single dashboard. The service's SSD-based infrastructure and automatic daily backups ensure the reliability and accessibility your data requires.

Morpheus deploys all your database instances with a free full replica set, and the service's single-click DB provisioning allows you to bring up a new instance of any SQL, NoSQL, or in-memory database in seconds. Visit the Morpheus site for pricing information or to sign up for a free trial account.

Services such as Morpheus deliver the promise of burgeoning technologies such as Docker while allowing you to preserve your investment in existing database technologies. In a time of industry transition, it's great to know you can get the best of both worlds, minus the departmental upheaval.

Could Database as a Service Be What Saves Microsoft's Bacon?


TL;DR: Among the Big Name software companies, Microsoft appears to be making the smoothest transition to a cloud-centric data universe. The early reviews of the company's Azure DocumentDB database-as-a-service indicate that Microsoft is going all in -- at least in part at the expense of the company's database mainstays. In fact, DBaaS may serve as the cornerstone of Microsoft's reconstitution into a developer-services provider.

Microsoft is getting a lot of press lately -- most of it not so good. (Exhibit A: CEO Satya Nadella stated at a recent women-in-tech conference that women should count on hard work and "good karma" to earn them a raise rather than to ask for one directly. FastCompany's Lydia Dishman reports in an October 14, 2014, article that Nadella's gaffe will ultimately be a great help to women who work for Microsoft and other tech firms.)

In one area at least Microsoft is getting solid reviews: the burgeoning database-as-a-service industry. The company's Azure DocumentDB earned a passing grade from early adopter Xomni, which provides cloud services to retailers. In a September 29, 2014, article, InformationWeek's Doug Henschen describes what Xomni liked and didn't like about DocumentDB.


 

Microsoft's Azure DocumentDB uses a resource model in which resources under a database account are addressable via a logical and stable URI. Source: Microsoft

That's not to say DocumentDB doesn't have some very rough edges. As Xomni CTO Daron Yondem points out, there's no built-in search function or connection to Microsoft's new Azure Search. Another DocumentDB area in need of improvement is its software development kit, according to Yondem. While you can't expect much in the way of development tools in a preview release, Xomni relied on third-party tools to add a search function to DocumentDB.

On the plus side, Yondem points to DocumentDB's tuning feature for balancing transactional consistency and performance, as well as its support for SQL queries.

Microsoft embraces open source, icicles spotted in hell

Another sign that the Great Microsoft Makeover may be more than hype is the company's 180 on open source. Not only are Microsoft's new cloud services based on open source, the company is making it easier to change the open-source code repository via pull requests.

Readwrite's Matt Asay explains in an October 9, 2014, article that Microsoft is slowly winning over developers, who account for an increasing percentage of the technology buying in organizations. CIOs have long been convinced of the ability of Microsoft products to boost worker productivity, and now developers are warming to the company. Asay asserts that Microsoft will succeed because of its ability to keep it simple and keep it safe.

That's precisely the secret to the success of the Morpheus database-as-a-service. Morpheus lets you provision a new instance of any SQL, NoSQL, or in-memory database with a single click. Your databases are automatically backed up each day and provisioned with a free live replica for failover and fault tolerance.

Your MongoDB, MySQL, Redis, and ElasticSearch databases are protected via VPN connections and monitored from a single dashboard. You can use your choice of developer tools to connect, configure, and manage your databases. Visit the Morpheus site for pricing information or to create a free account.

Morpheus Technical FAQ


Morpheus Technical FAQ

What databases are currently supported?

The following is a list of the currently supported databases:

  • Redis-2.8.13
  • Elasticsearch-1.2.2
  • MongoDB-2.6.3
  • MySQL-5.6.19

What security measures does Morpheus take to protect instances?

Each instance has its own firewall.

How does scaling actually work?

To add more storage, set your notification level by clicking your instance’s Settings and choosing a notification storage limit. 

When this threshold is reached, we send an email letting you know the storage capacity that remains. When you receive the email, you can reduce your storage needs or go to the Dashboard’s Instance page and add an instance.

 

Future plans will allow in-place upgrades to larger plans.

How often are Morpheus instances backed up?

At this time, only Redis and MySQL are backed up: one backup per day, with the four most recent backups retained. Backups are taken just after midnight, Pacific Standard Time.

How do I access these backups?

From the dashboard, select your Redis or MySQL instance, then click Backups.

Click the download icon to download this version to your hard drive, or click the restore icon to make this backup the current cloud version.

Can I change the frequency of the backups?

No. At this time there are no configuration options for backups. However, it’s easy to connect your Morpheus DB with a comprehensive cloud backup service such as BitCan, available at http://gobitcan.com/.

Can I check my logs? 

Yes. From the Morpheus dashboard, select the database type; then, from the Instances page, select the instance. Click Logs to display log messages.

This is useful if you want a quick overview of your instance's connections and activities. If you want a more finely tuned log experience, integrate the Morpheus-created instance with a log manager such as Oohlalog, available at http://www.oohlalog.com/.

Can I check the status of my instances?

Yes. Use the Check Status button on your Instance page. 

Servers are listed by IP address. If a server is not running, you can send a restart signal from here. 

 

Can I access the live replicas created?  

Yes and no; it depends on the type of instance.

In the case of MongoDB, you cannot directly connect to the three replicated data nodes. If any one of them fails, the others continue to operate in its place.

 

MySQL has two replicas. You can connect to either running instance. To connect, view both IP addresses on the check status screen. Failover needs to be handled from the driver in the application. Check the MySQL documentation at http://dev.mysql.com/doc/ for information about how to do this.
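
For illustration, driver-level failover can be as simple as trying each replica in turn, as in this hedged sketch using the MySQL Connector/Python driver (the IP addresses, credentials, and database name are placeholders):

```python
import mysql.connector
from mysql.connector import Error

# The two replica addresses shown on the check status screen (placeholders here).
HOSTS = ["10.0.0.11", "10.0.0.12"]

def connect_with_failover():
    """Return a connection to the first reachable replica."""
    for host in HOSTS:
        try:
            return mysql.connector.connect(
                host=host, user="app", password="secret",
                database="shop", connection_timeout=5,
            )
        except Error:
            continue   # this replica is down; try the next one
    raise RuntimeError("no MySQL replica reachable")

conn = connect_with_failover()
print("connected")
```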

 

For Elasticsearch, you can connect to either of two clustered nodes, viewable from the check status screen. Failover is automatic if you are using the standard Elasticsearch node transport to connect from an application. Elasticsearch clients connect in one of two ways: over HTTP or by using a node client.

HTTP -- everything, such as searches, is done using standard URLs. The downside is that a URL can only connect to one address at a time. The node transport, by contrast, essentially runs a cluster proxy inside your application: you simply connect from your application, which knows the health of the cluster and will load-balance, routing your search to any available node in the cluster.

 

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html for information about straight HTTP.
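
For example, a straight-HTTP search is just a request against one node's URL. The sketch below uses Python's third-party requests library; the host, index name, and query are placeholders.

```python
import requests

# Search one node directly over HTTP (placeholder host and index).
resp = requests.get(
    "http://10.0.0.21:9200/products/_search",
    params={"q": "name:widget"},
    timeout=5,
)

for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"])
```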

 

Node client -- instantiating a node-based client is the simplest way to get a client that can execute operations against Elasticsearch. See http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html for more information about the ES node client.

With Redis, you can connect to either your master or slave. Currently, failover needs to be handled from the application, or you can provide your own Redis Sentinel service. In the future, we plan to add Redis Sentinel to our standard Redis offering.
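
A minimal sketch of that application-level failover using the Python redis client (host addresses are placeholders): writes go to the master, and reads fall back to the slave if the master becomes unreachable.

```python
import redis

# Master takes writes; the slave serves reads. Addresses are placeholders.
master = redis.StrictRedis(host="10.0.0.31", port=6379)
slave = redis.StrictRedis(host="10.0.0.32", port=6379)

def read_key(key):
    """Application-level failover: prefer the master, fall back to the slave."""
    try:
        return master.get(key)
    except redis.ConnectionError:
        return slave.get(key)

master.set("session:42", "active")
print(read_key("session:42"))
```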

Why use Morpheus when there’s Amazon Web Services?

Simply put, getting the same high-performance database instance that Morpheus spins up in minutes would cost a lot of time, money, effort, and expertise. Morpheus lets you manage all your databases using a simple dashboard, leaving our experts to handle the scaling/descaling, load balancing, disaster recovery, and security knob-twisting. Another benefit is that Morpheus-created instances are easy to integrate with many third-party add-ons, such as log management and more finely tuned backups.


The New Reality: Microservices Apply the Internet Model to App Development


TL;DR: As software becomes the force driving industries of all types and sizes, the nature of app development and management is changing fundamentally. Gone are the days of centralized control via complex, interdependent, hierarchical architectures. Welcome to the Internet model of software: small pieces, loosely joined via the microservice architecture. At the forefront of the new software model are business managers, who base software-design decisions on existing and future business processes.

Anyone who works in technology knows change is constant. But change is also hard -- especially the kind of transformational change presently occurring in the software business with the arrival of the microservices model of app development, deployment, and maintenance. As usual, not everybody gets it.

Considering how revolutionary the microservices approach to software design is, the misconceptions surrounding the technology are understandable. Diginomica's Phil Wainewright gets to the heart of the problem in a September 30, 2014, article. When Wainewright scanned the agenda for an upcoming conference on the software-defined enterprise, he was flabbergasted to see all the focus on activities within the data center: virtualization, containerization, and software-defined storage and networking.

As Wainewright points out, the last thing you want to do is add a layer of "efficient flexibility underneath a brittle and antiquated business infrastructure." That's the approach that doomed the service-oriented architectures of a decade ago. Instead, the data center must be perceived as merely one component of a configurable and extensible software-defined enterprise. The foundation of tomorrow's networks is simple, easily exchangeable microservices that permeate the organization rather than residing in a single, central repository.

Microservices complete the transition from tightly coupled components through SOA's loose coupling to complete decoupling to facilitate continuous delivery. Source: PricewaterhouseCoopers

To paraphrase a time-worn axiom, if you love your software, let it go. The company's business managers must drive the decisions about technology spending based on what they know of the organization's goals and assets.

Microservices: fine-grained, stateless, self-contained

Like SOA, microservices are designed to be more responsive and adaptable to business processes and needs. What doomed SOA approaches was the complexity they added to systems management by applying a middleware layer to software development and deployment. As ZDNet's Joe McKendrick explains in a September 30, 2014, article, the philosophy underlying microservices is to keep it simple.

The services are generally constructed using Node.js or other Web-oriented languages, or in functional languages such as Scala or the Clojure Lisp library, according to PricewaterhouseCoopers analysts Galen Gruman and Alan Morrison in their comprehensive microservices-architecture overview. Another defining characteristic is that microservices are a perfect fit for the APIs and RESTful services that are increasingly the basis for enterprise functions.

Microservice architectures are distinguished from service-oriented architectures in nearly every way. Source: PricewaterhouseCoopers

In the modern business world, "rapid" development simply isn't fast enough. The goal for app developers is continuous delivery of patches, updates, and enhancements. The discrete, self-contained, and loosely coupled nature of microservices allows them to be swapped out or ignored without affecting the performance of the application.
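
To make "small, self-contained, and swappable" concrete, here's a hedged sketch of a single-purpose service using only Python's standard library (the article's examples lean toward Node.js and JVM languages, but the shape is the same): it does one thing, holds no shared state, and can be redeployed without touching the rest of the application.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PriceService(BaseHTTPRequestHandler):
    """One tiny service with a single responsibility: look up a price."""

    PRICES = {"widget": 49.99, "gadget": 19.99}   # illustrative data

    def do_GET(self):
        sku = self.path.strip("/")
        body = json.dumps({"sku": sku, "price": self.PRICES.get(sku)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Swap this service out, patch it, or scale it without shutting down the whole app.
    HTTPServer(("0.0.0.0", 8080), PriceService).serve_forever()
```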

The March 25, 2014, microservices overview written by Martin Fowler and James Lewis provides perhaps the most in-depth examination of the technology. Even more important than the technical aspects of the microservices approach is the organizational changes the technology represents. In particular, development shifts from a project model, where the "team" hands off the end result and disbands, to a product model, where the people who build the app take ownership of it: "You build it, you run it."

The same development-maintenance integration is evident in the Morpheus database as a service, which allows you to provision, deploy, and host MySQL, MongoDB, Redis, and Elasticsearch databases using a single, simple console. The ability to spin up instances for elastic scalability based on the demands of a given moment, whether growing rapidly or shrinking marginally, means that your instances will be far more productive and efficient. In addition to residing on high-performance solid-state drives, your databases are provisioned with free live replicas for fault tolerance and failover. Visit the Morpheus site to create a free account.

The Technical Details of the JP Morgan Data Breach


In yet another data breach, JP Morgan lost gigabytes of customer data, including some account information. Find out the technical details of the attack that allowed it to be successful.

TL;DR: JP Morgan was recently the latest large bank to fall victim to a data breach in which customer data was lost. In spite of the company already having some very sophisticated security measures in place, the attackers were able to get into the database by exploiting a vulnerability they discovered in the JP Morgan web site. From there, writing some custom malware allowed them to obtain gigabytes of customer data over the course of roughly two months.

Security Measures Already in Place

The bank already had a strong security system in place, with very sophisticated attack detection systems. Two months before the breach, JP Morgan announced that they would begin spending approximately $250 million per year on cybersecurity and would have roughly 1,000 people working on this part of their infrastructure.

This would seemingly be a tough structure to bypass for intruders looking to gain access to the bank’s data. Unfortunately for the bank, attackers managed to find a way to do so.

The Beginning of the Breach

The breach began in early June, when the attackers discovered a flaw in one of the JP Morgan web sites. The intruders used this flaw to begin writing custom programs that could be used to attack the bank's corporate network. The malware was tailor-made for infiltrating the JP Morgan network and digging deep into its systems.

The attackers are thought to have succeeded by finding a number of zero-day vulnerabilities, which allowed them to gain control of the systems they were after using methods that were unknown prior to the attack. This meant that programmers also had zero time to create any patches that could be used to counter the infiltration.

Example of a zero-day attack. Source: FireEye

The Data Collection

With their custom malware in place, the attackers were able to slowly gather consumer data. Their advanced attack programs avoided the bank's extremely sophisticated detection alarms, which are specifically designed to determine when stolen data is being pulled from its systems, for more than two months!

To help avoid detection, the malware was designed to route through computers in a number of foreign countries, and traffic was then most often redirected to a site in Russia. During the two-month period, the attackers were able to use this redirection to obtain gigabytes of customer data from the bank undetected. When JP Morgan eventually found the breach, it was able to quickly put an end to it using its security measures.

Example of malware detection and reaction. Source: Securosis

Securing Big Data

Trying to secure large amounts of data can be a challenging task, especially if you do not have a large and sophisticated system in place like JP Morgan. One way to help with this is to find a company that offers a database as a service on the cloud.

One such service is Morpheus, which offers numerous security features to help protect important data, including online monitoring and VPN connections to databases. In addition, all databases are backed up, archived, and replicated on an SSD-backed infrastructure automatically.

With Morpheus, you can choose from several databases, including MySQL, MongoDB, and others, plus all databases are easy to install and scale based on your needs. So, visit the Morpheus site for pricing information or to try it out with a free account!

How to Keep Coders Happy - Keep Them Learning, and Leave Them Alone


Things move so quickly in the coder-sphere that to keep up you have to give developers what they need and then stand back.

TL;DR: Development cycles continue to shrink. Companies have to adapt their processes or risk being left in their competitors' dust. Ensuring the software developers in your organization have what they need to deliver apps that meet your needs requires giving coders the kid-glove treatment: Let them use whichever tool and methods they prefer, and give them plenty of room to operate in.

Managing software developers is easy. All you have to do is think like they think -- and stay out of their way.

That's a big part of the message of Adrian Cockroft, who presented at Monktoberfest 2014 in Portland, Maine. Cockroft formerly ran cloud operations at Netflix, and he is now a fellow at Battery Ventures Technology. Readwrite's Matt Asay reports on Cockroft's presentation at Monktoberfest 2014 in an October 6, 2014, article.

You have to be fast, whatever you do. Cockroft says it's most efficient to let developers use the tools they prefer and to work in the manner that best suits them. He recommends that companies let cloud services do the "heavy lifting" in lieu of buying and maintaining the traditional hardware, software, and app-development infrastructure.

Ultimately the organization adopts an amorphous structure based on constant, iterative development of apps comprised of loosely coupled microservices. Distinctions between development and operations blur because various parts of the apps are under construction at any time -- without preventing the apps from working properly.

A tight, iterative app-development process loses the traditional hierarchy of roles and becomes cyclical. Source: Adrian Cockroft

Coders never stop learning new techniques, technologies

You'd be challenged to find a profession that changes faster than software development, which means coders are always on the lookout for a better way to get their work done. Code School founder and CEO Gregg Pollack claims that companies don't always do enough to encourage developers to pursue new and different interests.

In an October 16, 2014, article on TechCrunch, Pollack describes several ways organizations can encourage continuing education of their coders. One method that has met with success is pair programming, which combines work and learning because the two developers alternate between instructor and student roles, according to Pollack.

Considering the sheer volume of software development innovations, it's impossible to be up on all the latest and greatest. In an October 18, 2014, article, TechCrunch's Jon Evans describes a condition he calls "developaralysis." This might cause a coder who is fluent in only eight programming languages to feel insecure. And for each language there are an untold number of frameworks, toolkits, and libraries to master.

One solution to developaralysis is the Morpheus database-as-a-service (DBaaS), which is unique in supporting SQL, NoSQL, and in-memory databases. Morpheus lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Access to key statistics across all databases is available via a single console.

Morpheus offers free daily backups and replication of all database instances, and VPN connections protect your data from the public Internet. Visit the Morpheus site for pricing information or to create a free account.

Developers walk the thin line between relying on what they know and searching out new, better approaches -- ones that their competitors might use to run rings around them. As Readwrite's Asay points out, developers are ultimately creators, so managers need to allow coders' creativity to flow -- often simply by making themselves scarce.

How eBay Solves the Database Scaling Problem with MongoDB


When you have a vast amount of data, scaling your database can be very difficult. See how eBay solved the potential problems involved in providing users with search suggestions by using MongoDB.

TL;DR: eBay uses MongoDB to perform a number of tasks involving large amounts of data. Such projects include search suggestions, cloud management, storage of metadata, and the categorization of merchandise. The search suggestions are a key feature of eBay's web site, and MongoDB gives the company a way to deliver these suggestions to users quickly.

What are search suggestions?

eBay's search suggestions at work. Source: AuctionBytes Blog

When you begin typing a query into eBay's search box, a list of suggested completed queries appears underneath the box. If one of these suggestions matches what you planned to type, you can immediately select it with the mouse or your arrow keys rather than having to type out the remainder of your search query.

This is a great feature to have for users, as it not only may complete the intended query, but can also bring up a similar query the user may prefer over the original one. The suggestions feature provides the user with a convenient and helpful way of searching for particular items of interest.

What has to be done

Providing such assistance requires storing a large number of possible suggestions, and these must be returned to the user extremely quickly to be even remotely useful. eBay determined that any database query for suggestions must make the round trip in less than 60-70 milliseconds!

This could be very challenging with a traditional relational database. eBay instead decided to try out a document store, MongoDB, to see if they could achieve the needed performance.

How eBay implemented Mongo

eBay made each search suggestion list a MongoDB document. These documents were then indexed by word prefix and, in addition, by certain pieces of metadata, such as product category. The multiple indexes gave eBay flexibility in looking up suggestions and also kept the queries speedy.

eBay was able to use a single replica set, which made sharding unnecessary. In addition, the data was kept in memory, which provided a further speed boost for the queries.
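The precise implementation isn't public beyond these broad strokes, but a minimal pymongo sketch of a prefix-keyed suggestion lookup might look like the following; the database name, collection name, field names, and sample data are illustrative assumptions rather than eBay's actual schema.

    # Hypothetical sketch of a prefix-indexed suggestion store in MongoDB (pymongo).
    # Collection and field names are illustrative, not eBay's actual schema.
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    suggestions = client.search.suggestions

    # One document per prefix: the completed queries to show for that prefix.
    suggestions.insert_one({
        "prefix": "ipho",
        "category": "Cell Phones & Accessories",
        "completions": ["iphone 6", "iphone 6 case", "iphone 5s unlocked"],
    })

    # Index by word prefix, plus metadata such as product category,
    # so lookups stay fast and flexible.
    suggestions.create_index([("prefix", ASCENDING)])
    suggestions.create_index([("prefix", ASCENDING), ("category", ASCENDING)])

    # As the user types, fetch the completions for the current prefix.
    doc = suggestions.find_one({"prefix": "ipho"}, {"completions": 1, "_id": 0})
    print(doc["completions"] if doc else [])

Keeping one small document per prefix is what makes the lookup a single indexed read rather than a scan, which is the property the latency budget depends on.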

Database sharding visualized. Source: Cubrid Shard

Mongo’s Performance

With all this in place, could the queries to the database still return suggestions to the user in the allotted time (less than 60-70 milliseconds)? As it turned out, MongoDB was able to make the round trip in less than 1.4 milliseconds!

Given this incredible performance, eBay was able to safely rely on MongoDB to provide speedy search suggestions to its users.

Could your business do the same?

If your business needs to query a large amount of data quickly, MongoDB may be a good choice for you. One way to easily get MongoDB working for you quickly is to use a provider that offers the database as a service.

Morpheus provides MongoDB (and several other popular databases) as a service, with easy setup and maintenance. The service is easily scalable, allowing you to add or remove space as your needs change. Additional services include online monitoring, VPN connections to databases, and excellent support.

All databases are backed up, replicated, and archived automatically on an SSD-backed infrastructure, ensuring you do not lose any of your important data. So, try out Morpheus today and get your data into a fast, secure, scalable database!

How to Handle Huge Database Tables


Design your huge database tables to ensure they can handle queries without slowing the database's performance to a crawl.

TL;DR: Get a jump on query optimization in your databases by designing tables with speed in mind. This entails choosing the best data types for table fields, choosing the correct fields to index, and knowing when and how to split your tables. It also helps to be able to distinguish table partitioning from sharding.

It's a problem as old as databases themselves: large tables slow query performance. Out of this relatively straightforward problem has sprung an industry of indexing, tuning, and optimizing methodologies. The big question is, Which approach is best for your database system?

For MySQL databases in particular, query performance starts with the design of the table itself. Justin Ellingwood explains the basics of query optimization in MySQL and MariaDB in a Digital Ocean article from November 11, 2013, and updated on May 30, 2014.

For example, data elements that will be updated frequently should be in their own table to prevent the query cache from being dumped and rebuilt repeatedly. Generally speaking, the smaller the table, the faster the updates.

Similarly, limiting data sizes up front avoids wasted storage space -- for example, by using the "enum" type rather than "varchar" when a field that takes string values has a limited number of valid entries.
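As a rough illustration of both points, the sketch below splits a frequently updated counter into its own table and constrains a small-valued string field to an ENUM. It assumes the mysql-connector-python package and a reachable MySQL server, and the table, column, and credential values are invented.

    # Sketch: keep frequently updated counters in their own table, and use ENUM
    # instead of VARCHAR for fields with a small set of valid values.
    # Table names and credentials are placeholders for illustration.
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="CHANGE_ME", database="blogdb")
    cur = conn.cursor()

    # Relatively static article data in one table...
    cur.execute("""
        CREATE TABLE IF NOT EXISTS articles (
            id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            title VARCHAR(200) NOT NULL,
            status ENUM('draft', 'published', 'archived') NOT NULL DEFAULT 'draft',
            body MEDIUMTEXT
        )
    """)

    # ...and the frequently updated view counter in its own small table, so constant
    # updates don't keep invalidating cached results for the article data.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS article_views (
            article_id INT UNSIGNED PRIMARY KEY,
            view_count BIGINT UNSIGNED NOT NULL DEFAULT 0
        )
    """)
    conn.commit()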

There's more than one way to 'split' a table

Generally speaking, the bigger the database table, the longer it takes to access and modify. Unfortunately, database performance optimization isn't as simple as dividing big tables into several smaller ones. Michael Tocker describes 10 ways to improve the speed of large MySQL tables in an October 24, 2013, post on his Master MySQL blog.

One of the 10 methods is to use partitioning to reduce the size of indexes by creating several "tables" out of one. This minimizes index lock contention. Tocker also recommends using InnoDB rather than MyISAM even though MyISAM can be faster at inserts to the end of a table. MyISAM's table locking restricts updates and deletes, and its use of a single lock to protect the key buffer when loading or removing data from disk causes contention.
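A hedged sketch of what that partitioning looks like in MySQL follows; the table, column, and credential names are invented, and it assumes the mysql-connector-python package and a MySQL server built with partitioning support.

    # Sketch: MySQL RANGE partitioning splits one logical table into smaller
    # physical pieces (and smaller indexes). Names are invented for illustration.
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="CHANGE_ME", database="salesdb")
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id BIGINT UNSIGNED NOT NULL,
            customer_id BIGINT UNSIGNED NOT NULL,
            created DATE NOT NULL,
            total DECIMAL(10,2) NOT NULL,
            PRIMARY KEY (id, created)
        )
        PARTITION BY RANGE (YEAR(created)) (
            PARTITION p2012 VALUES LESS THAN (2013),
            PARTITION p2013 VALUES LESS THAN (2014),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        )
    """)
    conn.commit()

    # Queries that filter on the partitioning column only touch the relevant partitions.
    cur.execute("SELECT COUNT(*) FROM orders WHERE created >= '2013-01-01'")
    print(cur.fetchone()[0])

Note that MySQL requires the partitioning column to be part of every unique key, which is why the primary key above includes the created column.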

Much confusion surrounds the concept of database table partitioning, particularly how partitioning is distinguished from sharding. When the question was posed on Quora, Mosaic CTO Tony Bako explained that partitioning divides logical data elements into multiple entities to improve performance, availability, and maintainability.

Conversely, sharding is a form of horizontal partitioning that creates replicas of the schema and then divides the data stored in each shard by the shard key. This requires that DBAs distribute load and space evenly across shards based on data-access patterns and space considerations.

Sharding uses horizontal partitioning to store data in physically separate databases; here a user table is sharded by values in the "s_age" field. Source: CUBRID

With the Morpheus database-as-a-service (DBaaS) you can monitor your MySQL, MongoDB, Redis, and ElasticSearch databases via a single dashboard. Morpheus lets you bring up a new instance of any SQL, NoSQL, or in-memory database with a single click. Automatic daily backups and free live replica sets for each provisioned database ensure that your data is secure.

In addition, database performance is optimized via Morpheus's SSD-backed infrastructure and direct patching into EC2 for ultra-low latency. Visit the Morpheus site for pricing information or to create a free account.

Merge Databases with Different Schema and Duplicate Entries


Removing duplicate entries from merged database tables can be anything but routine -- and the source of performance woes.

TL;DR: Combining tables frequently results in duplicate entries that can be removed in several ways. The trick is knowing which way is best for a given situation. Often the only way to determine the best approach is by testing several and comparing their effect on database performance.

It is one of the most common operations in database management: Merge two tables that use different schema while also removing duplicate entries. Yet there are as many approaches to this problem as there are types of database tables. There are also as many potential glitches.

Here's a look at three ways to address the situation in SQL and MySQL.

All the news that's fit to merge

Combining multiple tables with similar values often creates duplicate entries. Several methods are available for eliminating duplicate values in SQL, but it can be tricky to determine which is best in a given situation.

In a StackOverflow post from October 2012, a number of approaches were proposed for removing duplicates from joined tables. The first was to convert an inner query to a common table expression (CTE):

A common table expression for an inner join often has a lower impact on performance than using the DISTINCT keyword to eliminate duplicates. Source: StackOverflow

The second approach was to use the DISTINCT keyword, which one poster claims performs better in some cases. Also suggested were use of the string_agg function and the group by clause.
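Those StackOverflow snippets aren't reproduced here, but the following self-contained sketch shows the DISTINCT and GROUP BY approaches to collapsing the duplicates a join produces. It uses SQLite only so it runs without a server; the table names and sample data are invented, and the same SQL applies to most relational databases.

    # Self-contained sketch: deduplicate the rows produced by a join.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 'keyboard'), (2, 1, 'mouse'), (3, 2, 'monitor');
    """)

    # Joining customers to orders repeats each customer once per order;
    # DISTINCT collapses the repeats.
    rows = conn.execute("""
        SELECT DISTINCT c.id, c.name
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
    """).fetchall()
    print(rows)  # [(1, 'Ada'), (2, 'Grace')]

    # GROUP BY achieves the same deduplication and also lets you aggregate per group.
    rows = conn.execute("""
        SELECT c.id, c.name, COUNT(o.id) AS order_count
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """).fetchall()
    print(rows)  # [(1, 'Ada', 2), (2, 'Grace', 1)]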

Getting up close and personal with the UNION clause

One of the basic elements in the SQL toolbox is the UNION operator, which combines the results of two queries, checks for duplicates, and returns only distinct rows, letting you store data from both tables without duplicates:

Insert rows from a second table when their values don't match those of the joined table, or create a new table that doesn't affect either of the original tables. Source: StackOverflow

Alternatively, you can use the SELECT INTO command to create a new table from the contents of two separate tables in a way that removes duplicates:

The SELECT INTO command creates a new table from the content of two others and removes duplicates in the original tables. Source: StackOverflow
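As a self-contained stand-in for the screenshots above, the sketch below uses SQLite (so it runs without a server) to show UNION returning only distinct rows and then materializing the merged result as a new table. Keep in mind that SELECT INTO is SQL Server syntax; the portable equivalent used here is CREATE TABLE ... AS SELECT, and the table names are invented.

    # Sketch: merge two tables without duplicates, leaving the originals intact.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE emails_2013 (address TEXT);
        CREATE TABLE emails_2014 (address TEXT);
        INSERT INTO emails_2013 VALUES ('ada@example.com'), ('grace@example.com');
        INSERT INTO emails_2014 VALUES ('grace@example.com'), ('alan@example.com');
    """)

    # UNION (unlike UNION ALL) returns only distinct rows from the two queries.
    merged = conn.execute("""
        SELECT address FROM emails_2013
        UNION
        SELECT address FROM emails_2014
    """).fetchall()
    print(merged)

    # Materialize the deduplicated result as a new table.
    conn.execute("""
        CREATE TABLE emails_all AS
        SELECT address FROM emails_2013
        UNION
        SELECT address FROM emails_2014
    """)
    print(conn.execute("SELECT COUNT(*) FROM emails_all").fetchone()[0])  # 3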

Combining multiple gigabyte-size tables without a performance hit

It isn't unusual for database tables to become massive. Imagine merging a dozen tables with a total of nearly 10 million separate records and more than 3GB of data. The first suggestion on StackOverflow was to create a new table with a unique constraint on the set of columns that establish a row's uniqueness, then to use INSERT IGNORE INTO ... SELECT FROM to move rows from the old tables to the new one, and finally to truncate the old tables and use INSERT INTO ... SELECT FROM to return the rows to the original table.

Another proposed solution was to create a specific view that combines the results of the 12 tables, and then to filter the results by querying on the view you just created.
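A hedged sketch of the unique-constraint approach appears below. It assumes the mysql-connector-python package and a MySQL server (INSERT IGNORE is MySQL-specific syntax); the table, column, and credential names are invented, and the truncate-and-copy-back step is left out.

    # Sketch: merge large tables while silently skipping duplicate rows.
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="CHANGE_ME", database="warehouse")
    cur = conn.cursor()

    # New table with a unique key on the columns that define row uniqueness.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_merged (
            user_id BIGINT NOT NULL,
            event_time DATETIME NOT NULL,
            event_type VARCHAR(32) NOT NULL,
            UNIQUE KEY uniq_event (user_id, event_time, event_type)
        )
    """)

    # Copy rows from the old tables; duplicates violate the unique key and are
    # skipped instead of aborting the load.
    cur.execute("INSERT IGNORE INTO events_merged SELECT user_id, event_time, event_type FROM events_2013")
    cur.execute("INSERT IGNORE INTO events_merged SELECT user_id, event_time, event_type FROM events_2014")
    conn.commit()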

The Morpheus database-as-a-service (DBaaS) makes analyzing and optimizing databases much more efficient. Morpheus lets you provision, host, and deploy MySQL, MongoDB, Redis, and ElasticSearch databases via a single dashboard. It is the only DBaaS to support SQL, NoSQL, and in-memory databases.

In addition to automatic daily backups, each database instance is deployed with a full replica set for fail over and fault tolerance. Morpheus's solid state disk infrastructure, direct patches into EC2, and colocation with fast peering points ensure peak database performance.

Visit the Morpheus site for pricing information and to create a free account. Few things in the database universe are as straightforward as Morpheus's DBaaS.

Container-based Development: Separating Hype from Substance


Big-name development-tool vendors are rushing to support the container model for efficient, VM-based app creation.

TL;DR: Google, Microsoft, and other software giants have taken aim at the burgeoning market for container-focused application development by supporting the technology on their platforms. However, the true potential of the technology may be realized only by embracing the revolutionary approach to container development proposed by startups such as Terminal.

Applications are becoming atomized. No longer are they comprised of lines and lines of code painstakingly written, compiled, debugged, tested, deployed, and updated. Now programs are pieced together using microservices like Lego blocks to build unique software creations directly in your browser.

What began about a year ago with the landscape-changing introduction of the Docker open-source platform for creating, deploying, and running distributed applications has quickly become a mainstream movement involving the biggest names in the software industry.

The most-recent arrival on the container scene is the Kubernetes-based Google Container Engine, which the company announced in a November 4, 2014, blog post. As explained by InfoWorld's Sergar Yegulalp in a November 5, 2014, article, Container Engine groups virtual-machine instances into nodes that themselves are combined into clusters. The clusters run "Dockerized" versions of the apps at scale, performing all load balancing and handling communication between containers.

The introduction of Google Container Engine comes on the heels of last month's announcement by Microsoft that the next version of Windows Server will support Docker containers, as ZDNet's Mary Jo Foley reported in an October 15, 2014, article. Docker CTO Solomon Hykes sees Microsoft's support for the technology as "a strong message to the IT community" that Docker is "a mainstream part of the IT toolbox." Hykes was interviewed by ZDNet's Toby Wolpe for an October 28, 2014, story.

Native support for the Docker client in Microsoft's Windows Server 2015 allows the same standard Docker client and interface to be used on multiple development environments. Source: Microsoft

Container's impact on app dev more revolutionary than evolutionary

It's no surprise that Google, Microsoft, and other established development-tool vendors would attempt to integrate the mini-VM container approach with their existing platforms. However, the long-term impact of the technology may be in how it turns the existing development model on its head by creating a private server for each app from a thin slice of the cloud.

That's how Terminal co-founders Joseph Perna and Varun Ganapathi explain the approach their new company is taking to container-based app development. Ganapathi is quoted by TechCrunch's Kim-Mai Cutler in an October 24, 2014, post as predicting that everyone will have a cloud computer on which they run and interoperate apps securely. Because you can run multiple processes simultaneously from within a container, you can quickly stitch together dozens of virtual machines to create an app that scales instantly, without requiring a reboot.

Scalability is both one of the greatest benefits of container-based development, and one of the technology's greatest challenges. According to NetworkWorld's Pete Bartolik, the problem is that CIOs continue to overpay for capacity. In an October 16, 2014, article, Bartolik cites a recent study by Computerworld UK that found companies are using only half of the cloud server capacity they have provisioned. It seems IT managers have gotten into the habit of over-provisioning their own servers to accommodate surges in demand.

Of course, this cancels out one of the principal benefits of cloud computing: the cost savings realized by auto-scaling and elasticity. Containers are perceived as a means of instituting a true usage-based model. Yet you needn't wait for the container wave to take advantage of the efficiency cloud services deliver. The Morpheus database-as-a-service (DBaaS) ensures that you're paying for only the resources you're using.

Morpheus's straightforward interface lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch from a single dashboard. Your databases are backed up automatically and deployed with a free full replica set for fault tolerance and fail over. Visit the Morpheus site for pricing information and to sign up for a free account.


How (and Why) to Make Read-only Versions of Your SQL and MySQL Databases


TL;DR: Just a little bit of pre-lock planning ensures that a SQL or MySQL database you convert to read-only status performs as expected and is accessible by the right group of users. Doing so also helps guarantee the database can be safely unlocked when and if it should ever need to be updated or otherwise altered.

There's something about setting a database to read-only that is comforting for DBAs. It's almost as if the database is all grown up and ready to be kicked out of the house, er, I mean, sent out to make its own way in the world.

Of course, there are as many reasons to set a database to read-only -- temporarily or permanently -- as there are databases. Here's a rundown on the ways to lock the content of a SQL or MySQL database while allowing users to access its contents.

As Atif Shehzad explains on the MSSQLTips site, before you lock the database, you have to optimize it to ensure it's running at peak performance. You can't update the statistics of a read-only database, for example, nor can you create or defragment indexes. Also, you can't add extended properties to the database's objects, edit its permissions, or add/remove users.

Shehzad provides an eight-step pre-lock script to run through prior to converting a database to read-only. The checklist covers everything from creating a transaction log backup to modifying permissions and updating statistics.

An eight-step pre-lock checklist ensures your database is optimized and backed up prior to being switched to read-only. Source: MSSQLTips

Once the database is optimized and backed up, use either the ALTER DATABASE [database name] SET READ_ONLY command or the system stored procedure sp_dboption (the former is recommended because the stored procedure has been removed from recent versions of SQL Server). Alternatively, you can right-click the database in SSMS, choose Properties > Options, and set the Database Read-Only state to True. The database icon and name will change in SSMS to indicate its read-only state.
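If you prefer to script the change rather than click through SSMS, a minimal sketch follows; the pyodbc package, the "ODBC Driver 17 for SQL Server" driver, and a database named Sales are assumptions that will vary by environment.

    # Sketch: toggle a SQL Server database to read-only from Python.
    # pyodbc, the driver name, and the Sales database are assumptions.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=master;Trusted_Connection=yes",
        autocommit=True,  # ALTER DATABASE cannot run inside a multi-statement transaction
    )
    cur = conn.cursor()

    # ROLLBACK IMMEDIATE disconnects sessions that still hold open transactions.
    cur.execute("ALTER DATABASE [Sales] SET READ_ONLY WITH ROLLBACK IMMEDIATE")

    # If the database ever needs to accept writes again:
    # cur.execute("ALTER DATABASE [Sales] SET READ_WRITE WITH ROLLBACK IMMEDIATE")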

Converting a MySQL database to read-only -- and back again

A primary reason for setting a MySQL database as read-only is to ensure no updates are lost during a backup. The MySQL Documentation Library provides instructions for backing up master and slave servers in a replication setup via a global read lock and manipulation of the read_only system variable.

The instructions assume a replication setup with a master server (M1), slave server (S1), and clients (C1 connected to M1, and C2 connected to S1). The statements that put the master in a read-only state and that restore it to a normal operational state once the backup is complete are shown below. (Note that in some versions, "ON" becomes "1" and "OFF" becomes "0".)

The first statements switch the database to read-only, and the second revert it to its normal state after completing the backup. Source: MySQL Documentation Library
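Expressed through the mysql-connector-python package rather than the mysql client, that procedure looks roughly like the sketch below; the host name and credentials are placeholders, and the backup command itself is omitted.

    # Sketch: put the master (M1) into a read-only state for the duration of a
    # backup, then restore normal operation. Connection details are placeholders.
    import mysql.connector

    conn = mysql.connector.connect(host="m1.example.com", user="admin", password="CHANGE_ME")
    cur = conn.cursor()

    # Block writes and mark the server read-only before the backup starts.
    cur.execute("FLUSH TABLES WITH READ LOCK")
    cur.execute("SET GLOBAL read_only = ON")   # some versions expect 1 instead of ON

    # ... run the backup (for example, mysqldump) while this session holds the lock ...

    # Return the master to its normal operational state after the backup completes.
    cur.execute("SET GLOBAL read_only = OFF")  # some versions expect 0 instead of OFF
    cur.execute("UNLOCK TABLES")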

In its read-only state, the database can be queried but not updated. An August 23, 2013, post on StackOverflow explains how to revoke and then reinstate DML privileges for specific users, which is less likely to affect the performance of the entire database.
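That per-user alternative might look like the following sketch; the account and schema names are invented, and the same mysql-connector-python assumption applies.

    # Sketch: make a single account read-only by revoking its DML privileges,
    # then reinstate them later. The reporting account and appdb schema are invented.
    import mysql.connector

    conn = mysql.connector.connect(host="m1.example.com", user="admin", password="CHANGE_ME")
    cur = conn.cursor()

    # Temporarily limit the account to reads on the appdb schema.
    cur.execute("REVOKE INSERT, UPDATE, DELETE ON appdb.* FROM 'reporting'@'%'")

    # Restore write access when updates should be allowed again.
    cur.execute("GRANT INSERT, UPDATE, DELETE ON appdb.* TO 'reporting'@'%'")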

The Morpheus database as a service (DBaaS) lets you make these and other changes to your database as simply as pointing and clicking. Morpheus's single dashboard can be used to provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. It is the first and only DBaaS that supports SQL, NoSQL, and in-memory databases.

In addition to automatic daily backups, Morpheus provides a free live replica set for each database instance. Developers can use their choice of tools for connecting, configuring, and managing their databases. Visit the Morpheus site to create a free account. Think of all you can accomplish in the time you'll save when you no longer have to worry about backups!

Morpheus Lessons: Best Practices for Upgrading MySQL


TL;DR: Thinking about upgrading your MySQL database? When performing an upgrade, there are some factors you need to consider and some best practices that can be followed to help ensure the process goes as smoothly as possible. You will need to consider if an upgrade is necessary, whether it is a minor or major upgrade, and changes to query syntax, results, and performance.

Do You Need to Upgrade?

The need to upgrade is based on the risk versus the reward. Any upgrade carries with it the risk of losing functionality (breaks something) or data (catastrophic loss). With that in mind, you may be running into bugs that are resolved in a later release, performance problems, or growing concerns about the security of the database as the current version continues to age. Any of these factors could cause an upgrade to be necessary, so you will need to follow some best practices to help mitigate as much risk as possible.

An example MySQL setup. Source: Programming Notes

Will the Upgrade be Minor or Major?

A minor upgrade is typically one where there is a small change in the third release number. For example, upgrading version 5.1.22 to 5.1.25 would be considered a minor upgrade. As long as the difference is relatively small, the risk to upgrade will be relatively low.

A major upgrade, on the other hand, involves a change in the second or the first number. For example, upgrading version 5.1.22 to 5.3.1 or 4.1.3 to 5.1.0 would usually be considered a major upgrade. In such cases, the risk becomes higher because more changes to the system have been implemented.

Consider the Changes

Before upgrading, it is best to examine the changes that have been made between the two versions. Changes to query syntax or the results of queries can cause your application to have erroneous data, errors, or even stop working. It is important to know what changes will need to be made in your queries to ensure that your system continues to function after the upgrade takes place.

Also, an upgrade could cause either increased or decreased performance, depending on what has changed and the system on which MySQL is running. If the upgrade could cause a decrease in performance, you will certainly want to consider whether this is the right time to update.

Single-thread performance comparison. Source: Percona

Performing the Upgrade

Typically, the best practice when upgrading is to follow this procedure:

  • Dump your user grant data
  • Dump your regular data
  • Restore your regular data in the new version
  • Restore your user grant data in the new version

By doing this, you significantly reduce your risk of losing data, since you will have backup dump files. In addition, since you are using the MySQL dump and restore utilities, the restored data will use the format of the new MySQL version, which helps mitigate compatibility issues.
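The exact commands depend on the versions involved, but a rough sketch of the dump-and-restore steps might look like the following. It assumes the mysqldump and mysql client binaries are on the PATH, that credentials come from a configuration file such as ~/.my.cnf, and that the application schema is named appdb; dumping the mysql system schema is one common (if version-sensitive) way to capture grant data.

    # Sketch of the dump/restore upgrade steps; paths and schema names are placeholders.
    import subprocess

    # 1-2. Dump the grant data (the mysql system schema) and the application data.
    with open("grants.sql", "wb") as f:
        subprocess.run(["mysqldump", "--databases", "mysql"], stdout=f, check=True)
    with open("appdata.sql", "wb") as f:
        subprocess.run(["mysqldump", "--databases", "appdb"], stdout=f, check=True)

    # 3-4. After installing the new MySQL version, restore the application data
    # first, then the grants, and reload the privilege tables.
    with open("appdata.sql", "rb") as f:
        subprocess.run(["mysql"], stdin=f, check=True)
    with open("grants.sql", "rb") as f:
        subprocess.run(["mysql"], stdin=f, check=True)
    subprocess.run(["mysql", "-e", "FLUSH PRIVILEGES;"], check=True)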

Easy Upgrades

If you want to upgrade even more easily, consider using a database as a service in the cloud. Such services make it easy to provision, replicate and archive your database, and make upgrading easier via the use of available tools.

One such service is Morpheus, which offers not only MySQL, but also lets you use MongoDB, ElasticSearch, or Redis. In addition, all databases are deployed on a high performance infrastructure with Solid State Drives and are automatically backed up, replicated, and archived. So, take a look at pricing information or open a free account today to begin taking advantage of this service!

Password Encryption: Keeping Hackers from Obtaining Passwords in Your Database


TL;DR: When dealing with a user password, you want to be very careful in how this information is saved. Passwords stored in plain text within your database are a serious security risk both to you and your users, especially if your business is working with any of your users' financial or personal information. To keep from saving passwords in plain text, you can encrypt them using a salt and a hashing algorithm.

Plain Text Password Problems

While storing plain-text passwords can be handy when making prototypes and testing various systems, it can be disastrous when done in a production database. If an attacker somehow gains access to the database and its records, the hacker can instantly make use of every user account. The reason: the passwords are all right there in plain text for the taking!

Back in 2006, the web site Reddit, a discussion forum, had a backup copy of its database stolen. Unfortunately, all of the passwords were stored in plain text. The person who had the data could easily have taken over any of the accounts stored in the backup database by making use of the user names and passwords it contained.

This may not seem like a major problem for a discussion forum. If the administrator and moderator passwords were changed quickly, the intruder likely would only be able to post spam or other types of messages the user would not normally write. However, these same users may have used the same login information for other tasks, such as online banking or credit card accounts. This would indeed be a problem for the user once a hacker had access to such an account!

Plain text passwords are not a game, they are a security risk! Source: MacTrast

Salting and Hashing a Password

To avoid having plain-text passwords in your database, you need to store a value that has been altered in a way that will be very difficult to crack. The first step is to add a salt, which is a random string that is added to the password. This value can be either prepended or appended to the password, and should be long in order to provide the best security.

After the password is salted, it should then be hashed. Hashing will take the salted password and turn it into a string of characters that can be placed into the database instead of the plain-text password. There are a number of hashing algorithms, such as SHA256, SHA512, and more.

While implementing salted password hashing can be more time consuming, it could save your users from having their passwords exposed or stolen. It is definitely a good idea to do this as a safeguard for the people using your services.
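A minimal sketch of salting and hashing with the Python standard library follows; the sample password is made up, and in production a dedicated scheme such as PBKDF2 or bcrypt with many iterations is preferable to a single SHA-256 pass.

    # Sketch: salt and hash a password before storing it, then verify a login attempt.
    import hashlib
    import hmac
    import os

    def hash_password(password: str) -> tuple[bytes, bytes]:
        salt = os.urandom(16)                         # long random salt per user
        digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
        return salt, digest                           # store both in the database

    def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
        candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
        return hmac.compare_digest(candidate, stored_digest)

    salt, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest))  # True
    print(verify_password("wrong guess", salt, digest))                   # False

Storing the salt alongside the hash is expected; its job is to make precomputed lookup tables useless, not to remain secret.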

An example of password creation and verification with salting and hashing in place. Source: PacketLife

Further Protection

Another way to help protect your users is to make sure the database itself is secure. Keeping the database on site may be difficult for your business, but there are companies that offer databases as a service in the cloud.

One such company is Morpheus, which includes VPN connections to databases and online monitoring to help keep your database secure. In addition, databases are backed up, replicated, and archived automatically on an SSD-backed infrastructure. So, give Morpheus a try and get a secure, reliable database for your business!

Quick and Simple Ways to Migrate Password Hashes Without Bugging Users


TL;DR: In the past, it was standard practice to require that users follow a link in an email to migrate their password to a new system. Now administrators are hesitant to take up any more of their users' workday than necessary. Fortunately, automating the password-hash migration process is relatively easy to accomplish in all modern development environments.

DBAs generally don't like requiring that users reset their passwords. But sometimes a security upgrade or other major system change entails migration of hashed passwords.

When this happens, many admins don't hesitate to require all users to re-register on the new system via a link sent to them via email. That's the approach recommended in a four-year-old post on Webmasters Stack Exchange.

Today DBAs have many options for migrating password hashes without requiring any effort by users apart from logging in. The transparent approach relies on retaining the old password hash in the new system just long enough for each user to verify their password when they sign into the new system for the first time.

A simple example of this password-migration approach is described in a Stack Overflow post from January 2013:

  1. Verify the password with the new hash algorithm;
  2. If the password doesn't match, compare it with the old hash algorithm;
  3. If the password matches the old hash, calculate and store the new hash for the original password.
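A sketch of those three steps at login time appears below. It assumes, purely for illustration, that the old scheme was unsalted MD5 and the new scheme is salted SHA-256, and it represents the user record as a plain dictionary; real systems will differ.

    # Sketch of transparent hash migration on login, following the steps above.
    import hashlib
    import hmac
    import os

    def new_hash(password: str, salt: bytes) -> bytes:
        return hashlib.sha256(salt + password.encode()).digest()

    def old_hash(password: str) -> str:
        return hashlib.md5(password.encode()).hexdigest()

    def login(user: dict, password: str) -> bool:
        # 1. Verify with the new algorithm if the user already has a new-style hash.
        if user.get("salt") is not None:
            return hmac.compare_digest(new_hash(password, user["salt"]), user["pw_hash"])
        # 2. Otherwise compare against the legacy hash.
        if hmac.compare_digest(old_hash(password), user["legacy_hash"]):
            # 3. On success, calculate and store the new hash, then drop the old one.
            user["salt"] = os.urandom(16)
            user["pw_hash"] = new_hash(password, user["salt"])
            user["legacy_hash"] = None
            return True
        return False

Once every active user has signed in at least once, any remaining legacy hashes can be retired with a conventional password reset.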

Variations on the hash-migration theme for Linux, PHP, others

Password migration is an important concern when considering whether to adopt a new platform. For example, supporters of the Discourse open source discussion system have devised a password-migration plug-in that stores the original password hash in a custom field.

The first time the user signs in with what the new system considers an incorrect password, the original hash method is used to calculate and compare the hash with the stored value. When there's a match, the "new" password is set automatically and the original hash is cleared.

Password migration is presented as a three-step process in a September 17, 2014, post on the Ole Aass site. First, create a table called users_migrate that has three columns: id, username, and password. Next, execute a query on the server that copies the id, username, and password data from the original user tables into the new table.

Run a query on the server that copies the user values from the original tables to the new password-migration table. Source: Ole Aass

Of course, it's also possible to overthink the problem. In a post from February 2013 on Stack Exchange's Super User site, someone pointed out that if there aren't that many users, it might be fastest to copy the hashes to the new system manually, one-by-one. Someone else recommended the chpasswd tool, and a third person suggested using lastlog to generate a list of users and then grep:

To migrate password hashes in Linux, generate a list of users with lastlog, and then grep them in /etc/shadow. Source: Super User

An even simpler approach to password management and other database-migration tasks is to take advantage of the simplicity and efficiency of the Morpheus database-as-a-service (DBaaS). Morpheus lets you provision, deploy, and host MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard.

With the Morpheus DBaaS, you can invoke a new instance of any SQL, NoSQL, or in-memory database in seconds, and each instance is deployed with a free full replica set, in addition to automatic daily backups. Visit the Morpheus site to create a free account.

The Importance of Schema Design in 'Schema-less' MongoDB


TL;DR: A common misconception about the document-based MongoDB NoSQL database is that it requires no schema at all. In fact, the first step in designing in MongoDB is selecting a schema that matches the database users' needs. Choosing the right schema allows you to take full advantage of the system's performance, efficiency, and scalability benefits.

Most of the people designing MongoDB databases come from a relational-database background. Transferring from a world of tables, joins, and normalization to the document-based approach of MongoDB and other NoSQL databases can be liberating and daunting at the same time.

MongoDB is designed for speed: You can embed all sorts of data types and structures in collections of documents that are easy to query. To realize the performance potential of MongoDB, you have to design collections to match the app's most common access patterns.

In an August 2013 post on Percona's MySQL Performance Blog, Stephane Combaudon uses the example of a simple passport database. In MySQL, you would typically create a "people" table with "id" and "name" columns, and a "passport" table with "id", "people_id", "country", and "valid_until" columns. Then you would use joins between the tables to run queries.

A basic passport database in MySQL might use joins between two separate tables to query the database. Source: MySQL Performance Blog

In contrast, a MongoDB database for the same purpose could use a single collection to store all the passport information, but this makes it difficult to determine which attributes are associated with which objects.

The same passport database in MongoDB could place all data elements in a single collection. Source: MySQL Performance Blog

Alternatively, you could embed the passport information inside the people information, or vice-versa, although this could be a problem if some people don't have passports, such as "Cinderella" in the example below.

A MongoDB passport database could embed the people information inside the passport information, though this design likely doesn't optimize performance. Source: MySQL Performance Blog

In this example, you're much more likely to access people information than passport information, so having two separate collections makes sense because it keeps less data in memory. When you need the passport data, simply perform the join in the application.
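A pymongo sketch of that two-collection design with an application-level "join" follows; the field names echo the passport example above, while the database name and sample data are illustrative assumptions.

    # Sketch: people and passports in separate collections, linked by people_id.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client.passportdb

    person_id = db.people.insert_one({"name": "John"}).inserted_id
    db.passports.insert_one({
        "people_id": person_id,
        "country": "US",
        "valid_until": "2020-01-01",
    })
    db.passports.create_index("people_id")

    # Most queries only need the small person document...
    person = db.people.find_one({"name": "John"})

    # ...and when passport data is needed, the application performs the join itself.
    passport = db.passports.find_one({"people_id": person["_id"]})
    print(person["name"], passport["country"] if passport else None)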

The dangers of attempting 1:1 conversions of relational DBs

Many of the skills you learned in developing relational databases transfer smoothly to MongoDB's document-based model, but the principal exception is schema design, as InfoWorld's Andrew C. Oliver explains in a January 14, 2014, article. If you attempt a 1:1 port of an RDBMS schema to MongoDB, you're almost certain to run into performance problems.

Oliver points out that most of the complaints about MongoDB are by people whose choice of schema was all wrong for a document-focused database. A 1:1 table-to-document port is prone to cause missed joins, lost atomicity (although you can have atomic writes within a single MongoDB document), more required operations, and a failure to realize the performance benefits of parallelism.

Because MongoDB doesn't enforce a schema on a document or collection the way pre-defined schemas are required in RDBMSs, your databases are theoretically easier to develop and modify. In practice, things don't always work out this way. Among the MongoDB gotchas examined by Russell Smith in a Rainforest Blog post from November 2012 and updated on July 29, 2014, is failure to give schema design the attention it deserves.

Of course, MongoDB databases don't exist in isolation. Services such as the Morpheus database-as-a-service (DBaaS) are geared to meet the real-world needs of organizations that rely on a mix of SQL, NoSQL, and in-memory databases. In fact, Morpheus is the first and only DBaaS that lets you provision, deploy, and host heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases.

With Morpheus, you can bring up an instance of any database, monitor it, and optimize its performance in just seconds via a single dashboard. And all database instances include a free full replica set. Visit the Morpheus site to sign up for a free account.
