
Diagnose and Optimize MySQL Performance Bottlenecks


A common source of MySQL performance problems is tables with outdated, redundant, and otherwise-useless data. Slow queries can be fixed by optimizing one or all tables in your database in a way that doesn't lock users out any longer than necessary.

MySQL was originally designed to be the little database that could, yet MySQL installations keep getting bigger and more complicated: larger databases (often running in VMs), and larger and more widely disparate clusters. As database configurations increase in size and complexity, DBAs are more likely to encounter performance slowdowns. Yet the bigger and more complex the installation, the more difficult it is to diagnose and address the speed sappers.

The MySQL Reference Manual includes an overview of factors that affect database performance, as well as sections explaining how to optimize SQL statements, indexes, InnoDB tables, MyISAM tables, MEMORY tables, locking operations, and MySQL Server, among other components.

At the hardware level, the most common sources of performance hits are disk seeks, disk reading and writing, CPU cycles, and memory bandwidth. Of these, memory management generally and disk I/O in particular top the list of performance-robbing suspects. In a June 16, 2014, article, ITworld's Matthew Mombrea focuses on the likelihood of encountering disk thrashing (a.k.a. I/O thrashing) when hosting multiple virtual machines running MySQL Server, each of which contains dozens of databases.

Data is constantly being swapped between RAM and disk, and obviously it's faster to access data in system memory than data on disk. When insufficient RAM is available to MySQL, dozens or hundreds of concurrent queries to disk will result in I/O thrashing. Comparing the server's load value to its CPU utilization will confirm this: high load value and low CPU utilization indicates high disk I/O wait times.

Determining how frequently you need to optimize your tables

The key to a smooth-running database is ensuring your tables are optimized. Striking the right balance between optimizing too often and optimizing too infrequently is a challenge for any DBA working with large MySQL databases. This quandary was presented in a Stack Overflow post from February 2012.

For a statistical database having more than 2,000 tables, each of which has approximately 100 million rows, how often should the tables be optimized when only 60 percent of them are updated every day (the remainder are archives)? You need to run OPTIMIZE on the table in three situations:

  • When its datafile is fragmented on disk
  • When many of its rows are updated or change size
  • When deleting many records and not adding many others

Run CHECK TABLE when you suspect the table's data is corrupted, and then REPAIR TABLE when corruption is reported. Use ANALYZE TABLE to update index cardinality.
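
For reference, the statements themselves are simple to run from the mysql client. Below is a minimal sketch against a hypothetical table named orders (the table name is only for illustration):

    -- Defragment the table's datafile and reclaim unused space
    OPTIMIZE TABLE orders;

    -- Check for corruption, and repair only if CHECK TABLE reports a problem
    CHECK TABLE orders;
    REPAIR TABLE orders;

    -- Refresh the index cardinality estimates used by the query optimizer
    ANALYZE TABLE orders;

Keep in mind that REPAIR TABLE applies to MyISAM, ARCHIVE, and CSV tables; a corrupted InnoDB table is usually rebuilt or restored from a backup instead.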

In a separate Stack Overflow post from March 2011, the perils of optimizing too frequently are explained. Many databases use InnoDB with a single file rather than separate files per table. Optimizing in such situations can cause more disk space to be used rather than less. (Also, tables are locked during optimization, so large tables may be inaccessible for long periods.)

From the command line, you can use mysqlcheck to optimize one or all databases:

Run "mysqlcheck" from the command line to optimize one or all of your databases quickly. Source: Stack Overflow

Alternatively, you can run this PHP script to optimize all the tables in your database:

This PHP script will optimize all the tables in a database in one fell swoop. Source: Stack Overflow

Other suggestions are to implode the table names into one string so that you need only one OPTIMIZE TABLE query, and to use MySQL Administrator in the MySQL GUI Tools.
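
The imploded-string suggestion simply amounts to passing several table names to one statement, something like this sketch (the table names are hypothetical):

    -- One statement instead of three separate OPTIMIZE TABLE queries
    OPTIMIZE TABLE stats_daily, stats_weekly, stats_archive;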

Monitoring and optimizing your MySQL, MongoDB, Redis, and ElasticSearch databases is a point-and-click process in the new Morpheus Virtual Appliance. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds. You can provision your database with astounding ease, and each database instance includes a free full replica set. The service supports a range of database tools and lets you analyze all your databases from a single dashboard. Visit the Morpheus site to create a free account.


Overcoming Barriers to Adoption of Network Functions Virtualization


IT managers are equally intrigued by the promise of network functions virtualization and leery of handing over control of their critical networks to unproven software, much of which will be managed outside their data centers. Some of the questions surrounding NFV will be addressed by burgeoning standards efforts, but most organizations continue to adopt a "show me" attitude toward the technology.

Big things are predicted for software defined networks (SDN) and network functions virtualization (NFV), but as with any significant change in the global network infrastructure, the road to networking hardware independence will have its share of bumps.

For one thing, securing networks that have no physical boundary is no walk in the park. Viodi's Alan Weissberger explains in a December 29, 2014, post that replacing traditional hardware functions with software extends the potential attack space "exponentially." When you implement multiple virtual appliances on a single physical server, for example, they'll all be affected by a single breach of that server.

Even with the security concerns, the benefits of virtualization in terms of flexibility and potential cost savings are difficult for organizations of all sizes to ignore. In a December 23, 2014, article on TechWeek Europe, Ciena's Benoit de la Tour points out that virtualization allows network operators to expand or remove firewalling, load balancing, and other services and appliance functionality instantly.

Simplifying hardware management is one of NFV's principal selling points. John Fruehe writes on the Moor Insights & Strategy blog that NFV replaces some specialty networking hardware with software that runs on commercial off-the-shelf (COTS) x86 servers, or as VMs running on those servers. It also simplifies network architectures by reducing the total number of physical devices.

NFV offers organizations the potential to simplify network management by reducing the overall hardware footprint. Source: Moor Insights & Strategy

Potential NFV limitations: licensing and carrier control

The maturation of the technology underlying NFV concepts is shown in the creation of the Open Platform for NFV, a joint project of the Linux Foundation and such telecom/network companies as AT&T, Cisco, HP, NTT Docomo, and Vodafone. As ZDNet's Steven J. Vaughan-Nichols reports in a September 30, 2014, article, OPNFV is intended to create a "carrier-grade, integrated, open source NFV reference platform."

Linux Foundation Executive Director Jim Zemlin explains that the platform would be similar to Linux distributions serving a variety of needs and allowing code to be integrated upstream and downstream. Even with an open-source base, some potential NFV adopters are hesitant to cede so much control of their networks to carriers. For one thing, companies don't want to find themselves caught in the middle of feuding carriers and equipment vendors.

 

SDN and NFV have many similarities, but also some important differences, principally who hosts the bulk of the network hardware. Source: Moor Insights & Strategy

More importantly, IT managers are concerned about ensuring the reliability of their networks in such widespread virtual environments. Red Hat's Mark McLoughlin states in an October 8, 2014, post on OpenSource.com that network functions implemented as horizontal scale-out applications will address reliability the way cloud apps do: each application tier will be distributed among multiple failure domains. Scheduling of performance-sensitive applications will be in the hands of the telcos, which makes SLAs more important than ever.

Existing software licensing agreements also pose a challenge to organizations hoping to benefit from use of NFV. A November 26, 2014, article by TechTarget's Rob Lemos describes a hospital that attempted to switch from a license based on total unique users to one based on concurrent users as it implemented network virtualization. The process of renegotiating the licenses took four years.

Lemos points out that organizations often neglect to consider the implications of renegotiating software licenses when they convert to virtualized operations. Conversely, when you use the new Morpheus Virtual Appliance, you know exactly what you're getting -- and what you're paying for -- ahead of time. With the Morpheus database-as-a service, you have full control of the management of your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases via a single dashboard.

Morpheus lets you create a new instance of any SQL, NoSQL, or in-memory database in just seconds via a point-and-click interface. A free full replica set is provided for each database instance, and your MySQL and Redis databases are backed up. Visit the Morpheus site for pricing information and to create a free account.

The Fastest Way to Import Text, XML, and CSV Files into MySQL Tables


One of the best ways to improve the performance of MySQL databases is to determine the optimal approach for importing data from other sources, such as text files, XML, and CSV files. The key is to correlate the source data with the table structure.

Data is always on the move: from a Web form to an order-processing database, from a spreadsheet to an inventory database, or from a text file to customer list. One of the most common MySQL database operations is importing data from such an external source directly into a table. Data importing is also one of the tasks most likely to create a performance bottleneck.

The basic steps entailed in importing a text file to a MySQL table are covered in a Stack Overflow post from November 2012: first, use the LOAD DATA INFILE command.

The basic MySQL commands for creating a table and importing a text file into the table. Source: Stack Overflow

Note that you may need to enable the parameter "--local-infile=1" to get the command to run. You can also specify which columns the text file loads into:

This MySQL command specifies the columns into which the text file will be imported. Source: Stack Overflow

In this example, the file's text is placed into variables "@col1, @col2, @col3," so "myid" appears in column 1, "mydecimal" appears in column 3, and column 2 has a null value.

The table resulting when LOAD DATA is run with the target column specified. Source: Stack Overflow
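
As a rough reconstruction of that kind of statement (the file path and table name here are assumptions, not the exact values from the Stack Overflow post), the mapping looks like this:

    LOAD DATA LOCAL INFILE '/tmp/data.txt'
    INTO TABLE test_table
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (@col1, @col2, @col3)
    SET myid = @col1, mydecimal = @col3;

Because only @col1 and @col3 are assigned to table columns, the middle column falls back to its default value (NULL in the example above).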

The fastest way to import XML files into a MySQL table

As Database Journal's Rob Gravelle explains in a March 17, 2014, article, stored procedures would appear to be the best way to import XML data into MySQL tables, but after version 5.0.7, MySQL's LOAD XML INFILE and LOAD DATA INFILE statements can't run within a Stored Procedure. There's also no way to map XML data to table structures, among other limitations.

However, you can get around most of these limitations if you can target an XML file with a rigid, known structure per procedure. The example Gravelle presents uses an XML file whose rows are all contained within a parent element, and whose columns are represented by named attributes:

You can use a stored procedure to import XML data into a MySQL table if you specify the table structure beforehand. Source: Database Journal

The table you're importing to has an int ID and two varchars: because the ID is the primary key, it can't have nulls or duplicate values; last_name allows duplicates but not nulls; and first_name allows up to 100 characters of nearly any data type.

The MySQL table into which the XML file will be imported has the same three fields as the file. Source: Database Journal

Gravelle's approach for overcoming MySQL's import restrictions uses the "proc-friendly" Load_File() and ExtractValue() functions.

MySQL's XML-import limitations can be overcome by using the Load_file() and ExtractValue() functions. Source: Database Journal
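
A minimal sketch of the technique follows. It assumes an XML file at /tmp/users.xml whose rows are <row> elements carrying id, last_name, and first_name attributes, and a target table named users; those names are illustrative assumptions rather than Gravelle's exact code:

    SET @xml := LOAD_FILE('/tmp/users.xml');
    SET @rowcount := ExtractValue(@xml, 'count(/rows/row)');
    SET @i := 1;
    -- Inside a stored procedure, wrap the INSERT in a loop from 1 to @rowcount
    INSERT INTO users (id, last_name, first_name)
    VALUES (ExtractValue(@xml, '/rows/row[$@i]/@id'),
            ExtractValue(@xml, '/rows/row[$@i]/@last_name'),
            ExtractValue(@xml, '/rows/row[$@i]/@first_name'));

Note that LOAD_FILE() reads the file from the server host, requires the FILE privilege, and is limited by max_allowed_packet, so very large XML files may need to be split.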

Benchmarking techniques for importing CSV files to MySQL tables

When he tested various ways to import a CSV file into MySQL 5.6 and 5.7, Jaime Crespo discovered a technique that he claims improves the import time for MyISAM by 262 percent to 284 percent, and for InnoDB by 171 percent to 229 percent. The results of his tests are reported in an October 8, 2014, post on Crespo's MySQL DBA for Hire blog.

Crespo's test file was more than 3GB in size and had nearly 47 million rows. One of the fastest methods in Crespo's tests was grouping queries into a multi-insert statement, the approach used by "mysqldump". Crespo also attempted to improve LOAD DATA performance by augmenting the key_cache_size and by disabling the Performance Schema.

Crespo concludes that the fastest way to load CSV data into a MySQL table without using raw files is to use LOAD DATA syntax. Also, using parallelization for InnoDB boosts import speeds.
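
For reference, the LOAD DATA form of a CSV import looks like the sketch below; the file path and table name are hypothetical, and Crespo's specific tuning settings are not reproduced here:

    LOAD DATA INFILE '/tmp/input.csv'
    INTO TABLE csv_target
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;   -- skip a header row, if the file has one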

You won't find a more straightforward way to monitor your MySQL, MongoDB, Redis, and ElasticSearch databases than by using the dashboard interface of the Morpheus database-as-a-service (DBaaS). Morpheus is the first and only DBaaS to support SQL, NoSQL, and in-memory databases.

You can provision, deploy, and host your databases from a single dashboard. The service includes a free full replica set for each database instance, as well as automatic daily backups of MySQL and Redis databases. Visit the Morpheus site for pricing information and to create a free account.

When Do You Need ACID Compliance?



ACID compliance can be very beneficial in some situations, while others may not require it. Find out where you need ACID compliance and where you don't.

TL;DR: One decision that must be made when setting up your databases is whether or not ACID compliance will be required. In some instances, it is very important to have it implemented; however, other situations allow you to be more flexible with your implementation.

What is ACID Compliance?

A quick overview of ACID. Source: LastBuild Quality Assurance

To know when it is best to use ACID compliance, you should have a good understanding of the benefits it brings when used. ACID stands for Atomicity, Consistency, Isolation, and Durability. Each of these is described in more detail below:

Atomicity - Database transactions often have multiple parts of the transaction that need to be completed. Atomicity ensures that a transaction will not be successful unless all parts of it are successful. Since an incomplete transaction will fail, the database is less likely to get corrupted or incomplete data as a result.

Consistency - For a database to have consistency, the data within it must comply with its data validation rules. In an ACID-compliant database, transactions that do not follow those data validation rules are rolled back so that the database is in the state it was prior to the transaction. Again, this makes it less likely for the database to have faulty or corrupted data.

Isolation - When handling multiple transactions at the same time, a database that has isolation can handle each of the transactions without any of them affecting any of the others. For example, suppose two customers are each trying to purchase an item that is in stock (five still available). If one customer wants four of the items and the other wants three, isolation ensures that one transaction is performed ahead of the other, which keeps you from selling more than you have in stock!

Durability - A database with durability ensures that all completed transactions are saved, even if some sort of technology failure occurs.

Example showing how an ACID constraint can be implemented. Source: Wikipedia
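
To make the atomicity and isolation points concrete, here is a hedged sketch of the inventory scenario written as a single MySQL/InnoDB transaction (table and column names are invented for illustration):

    START TRANSACTION;
    -- Lock the row so a concurrent purchase cannot oversell the item
    SELECT quantity FROM inventory WHERE item_id = 42 FOR UPDATE;
    -- The application verifies quantity >= 4 before continuing
    UPDATE inventory SET quantity = quantity - 4 WHERE item_id = 42;
    INSERT INTO orders (item_id, quantity) VALUES (42, 4);
    COMMIT;   -- or ROLLBACK on any failure, leaving the data untouched

Either every statement takes effect or none does, and the row lock keeps the second customer's transaction waiting until the first one commits or rolls back.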

When Is ACID Compliance Beneficial?

ACID compliance is essential in some cases. For instance, financial data/transactions or personal data (such as health care information) really need to be ACID compliant for the safety and privacy of the customer. In some cases, there may even be a regulatory authority which requires ACID compliance for particular situations. In any of these cases, it is a good idea to implement ACID compliance for your database.

When is ACID Compliance Not Necessary?

There are cases where a small amount of older or less consistent data can be acceptable. For instance, blog post comments or data saved for an autocomplete feature may not need to be under the scrutiny that other types of data require, so these often do not necessitate ACID compliance.

Get Your Own Database

Whether you require ACID compliance or not, Morpheus Virtual Appliance is a tool that allows you to manage heterogeneous databases in a single dashboard. With Morpheus, you have support for SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

How to Ensure Your SSL-TLS Connections Are Secure


Encryption is becoming an essential component of nearly all applications, but managing the Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates that are at the heart of most protected Internet connections is anything but simple. A new tool from Google can help ensure your apps are protected against man-in-the-middle attacks.

In the not-too-distant past, only certain types of Internet traffic were encrypted, primarily online purchases and transmissions of sensitive business information. Now the push is on to encrypt everything -- or nearly everything -- that travels over the Internet. While some analysts question whether the current SSL/TLS encryption standards are up to the task, certificate-based encryption isn't likely to be replaced anytime soon.

The Electronic Frontier Foundation's Let's Encrypt program proposes a new certificate authority (CA) intended to make HTTPS the default on all websites. The EFF claims the current CA system for HTTPS is too complex, too costly, and too easy for the bad guys to beat.

Nearly every web user has encountered a warning or error message generated by a misconfigured certificate. The pop-ups are usually full of techno-jargon that can confuse engineers, let alone your typical site visitors. In fact, a recent study by researchers at Google and the University of Pennsylvania entitled Improving SSL Warnings: Comprehension and Adherence (pdf) found that 66 percent of people using the Chrome browser clicked right through the CA warnings.

As Threatpost's Brian Donahue reports in a February 3, 2015, article, redesigning the messages to provide better visual cues and more dire warnings convinced 62 percent of users to choose the preferred, safe response, compared to only 37 percent who did so when confronted with the old warnings. The "opinionated design" concept combines a plain-English explanation ("Your connection is not private" in red letters) with added steps required to continue despite the warning.

Researchers were able to increase the likelihood that users would make the safe choice by redesigning SSL certificate warnings from cryptic (top) to straightforward (bottom). Source: Sophos Naked Security

Best practices for developing SSL-enabled apps

SSL has become a key tool in securing IT infrastructures. Because SSL certificates are valid only for the time they specify, monitoring the certificates becomes an important part of app management. A Symantec white paper entitled SSL for Apps: Best Practices for Developers (pdf) outlines the steps required to secure your apps using SSL/TLS.

When establishing an SSL connection, the server returns one or more certificates to create a "chain of trust." The certificates may not be received in a predictable order. Also, the server may return more certificates than necessary or require that the client look for necessary certificates elsewhere. In the latter case, a certificate with a caIssuers entry in its authorityInfoAccess extension will list a protocol and location for retrieving the issuing certificate.

Once you've determined the end-entity SSL certificate, you verify that the chain from the end-entity certificate to the trusted root certificate or intermediate certificate is valid.

To help developers ensure their apps are protected against man-in-the-middle attacks resulting from corrupted SSL certificates, Google recently released a tool called nogotofail. As PC World's Lucian Constantin explains in a November 4, 2014, article, apps become vulnerable to such attacks because of bad client configurations or unpatched libraries that may override secure default settings.

Nogotofail simulates man-in-the-middle attacks using deep packet inspection to track all SSL/TLS traffic rather than monitoring only the ports usually associated with secure connections, such as port 443. The tool can be deployed as a router, VPN server, or network proxy.

Security is at the heart of the new Morpheus Virtual Appliance, which lets you seamlessly provision and manage SQL, NoSQL, and in-memory databases across hybrid clouds. Each database instance you create includes a free full replica set for built-in fault tolerance and failover. You can administer your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard via a simple point-and-click interface. 

Visit the Morpheus site to sign up for a FREE Trial!

The Benefits of Virtual Appliances Expand to Encompass Nearly All Data Center Ops


Virtual appliances deliver the potential to enhance data security and operational efficiency in IT departments of all shapes, sizes, and types. As the technology expands to encompass ever more data-center operations, it becomes nearly impossible for managers to exclude virtual appliances from their overall IT strategies.

Why have virtual appliances taken the IT world by storm? They just make sense. By packaging applications with only as much of the operating system and other resources as they need, you minimize overhead and maximize processing efficiency. You can run the appliances on standard hardware or in virtual machines.

At the risk of sounding like a late-night TV commercial, "But wait, there's more!" The Turnkey Linux site summarizes several other benefits of virtual appliances: they streamline complicated, labor-intensive processes; they make software deployment a breeze by encapsulating all the app's dependencies, thus precluding conflicts due to incompatible OSes and missing libraries; and last but not least, they enhance security by running in isolation, so a problem with or breach of one appliance doesn't affect any other network components.

In fact, the heightened focus in organizations of all sizes on data security is the impetus that will lead to a doubling of the market for virtual security appliances between 2013 and 2018, according to research firm Infonetics. The company forecasts that revenues from virtual security appliances will total $1.2 billion in 2018, as cited by Fierce IT's Fred Donovan in a November 11, 2014, article.

 

Growth in the market for virtual appliances will be spurred in large part by increased emphasis on data security in organizations. Source: Infonetics Research, via Fierce IT

In particular, virtual appliances are seen as the primary platform for implementation of software-defined networks and network functions virtualization, both of which are expected to boom starting in 2016, according to Infonetics.

The roster of top-notch virtual appliances continues to grow

There are now virtual appliances available for such core functions as ERP, CRM, content management, groupware, file serving, help desks, and domain controllers. TechRepublic's Jack Wallen lists 10 of his favorite virtual appliances, which include the Drupal appliance, LAMP stack, Zimbra appliance, Openfiler appliance, and the Opsview Core Virtual Appliance.

If you prefer the DIY approach, the TKLDev development environment for Turnkey Linux appliances claims to make building Turnkey Core from scratch as easy as running make.

The TKLDev tool lets you build Turnkey Core simply by running make. Source: Turnkey Linux

The source code for all the appliances in the Turnkey library is available on GitHub, as are all other repositories and the TKLDev documentation.

Also available are the Turnkey LXC (LinuX Containers) and Turnkey LXC appliance. Turnkey LXC is described by Turnkey Linux's Alon Swartz in a December 19, 2013, post as a "middle ground between a chroot on steroids and a full fledged virtual machine." The environment allows multiple isolated containers to be run on a single host.

The most recent addition to the virtual-appliance field is the Morpheus Virtual Appliance, which is the first and only database provisioning and management platform that supports private, public, and hybrid clouds. Morpheus offers the simplest way to provision heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases.

The Morpheus Virtual Appliance offers real-time monitoring and analysis of all your databases via a single dashboard to provide instant insight into consumption and availability of system resources. A free full replica set is provisioned for each database instance, and backups are created for your MySQL and Redis databases.

Visit the Morpheus site to create a free trial account. You'll also find out how to get started using Morpheus, which is the only database-as-a-service to support SQL, NoSQL, and in-memory databases.

When One Data Model Just Won't Do: Database Design that Supports Polyglot Persistence


The demands of modern database development mandate an approach that matches the model (structured or unstructured) to the nature of the underlying data, as well as the way the data will be used. Choice of data model is no longer an either/or proposition: now you can have your relational and key-value, too. The multimodel approach must be applied deliberately to reduce operational complexity and ensure reliability.

"When your only tool is a hammer, all your problems start looking like nails." Too often that old adage has applied to database design: When your only tool is a relational DBMS, all your data starts to look structured.

Well, today's proliferation of data types defies squeezing it all into a single model. The age of the multimodel database has arrived, and developers are responding by adopting designs that apply the most appropriate model to the various data types that comprise their diverse databases.

In a January 6, 2015, article on InfoWorld, FoundationDB's Stephen Pimentel explains that the rise in NoSQL, JSON, graphs, and other non-SQL data models is the result of today's applications needing to work with various data types and storage requirements. Rather than creating multiple distinct databases, developers are increasingly basing their databases on a single backend that supports multiple data models.

Software developer and author Martin Fowler describes polyglot persistence as the ability of applications to manage their own data using various technologies based on the characteristics and use of that data. Rather than selecting the tool first and then fitting the data to the tool, developers determine how the various data elements will be manipulated and then choose the appropriate tools for those specific purposes.

Multimodel databases apply different data models in a single database based on the characteristics of various data elements. Source: Martin Fowler

Multimodel databases are by definition more complicated than their single-model counterparts. Managing this complexity is the principal challenge of developers, primarily because each data storage mechanism requires its own interface and creates a potential performance bottleneck. However, the alternative of attempting to apply the relational model to NoSQL-type unstructured data will require a tremendous amount of development and maintenance effort.

Putting the multimodel database design into practice

John P. Wood highlights the primary shortcoming of RDBMSs in clustered environments: the way they enforce data integrity places inordinate demands on processing power and storage requirements. RDBMSs depend on fast, simple access to all data continually to prevent duplicates, enforce constraints, and otherwise maintain the database.

While you can scale out relational databases via master-slave replication, sharding, and other approaches, doing so increases the app's complexity. More importantly, a key-value store is often a better fit for that data than an RDBMS's rows and columns, even with object/relational mapping tools.

Wood describes two scenarios in which polyglot persistence improves database performance: when performing complex calculations on massive data sets; and when needing to store data that varies greatly from document to document, or that is constantly changing structure. In the first instance, data is moved from the relational to the NoSQL database and then processed by the application to maximize the benefits of clustering. In the second, structure is applied to the document on the fly to allow data inside the document to be queried.

The basic relational (SQL) model compared to the document (NoSQL) model. Source: Aaron Stannard

The trend toward supporting multiple data models in a single database is evident in the new Morpheus Virtual Appliance, which supports heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus lets you monitor and analyze all your databases using a single dashboard to provide instant insight into consumption and availability of system resources.

The Morpheus Virtual Appliance is the first and only database provisioning and management platform that works with private, public, and hybrid clouds. A free full replica set is provisioned for each database instance, and backups are created for your MySQL and Redis databases.

Visit the Morpheus site to create a free trial account!

ETL: Teaching an Old Data Cleanup Tool New Tricks


The role of the traditional data-warehouse extract, transform, load function has broadened to become the foundation for a new breed of graphical business-intelligence tools. It has also spawned a market for third-party ETL tools that support a range of data types, sources, and systems.

The data-warehousing concept of extract, transform, load (ETL) almost seems quaint in the burgeoning era of unstructured data stored in petabyte-sized containers. Some analysts have gone so far as to declare ETL all but dead. In fact, technologies as useful and pervasive as ETL rarely disappear -- they just find new roles to play.

After all, ETL is intended to improve accessibility to and analysis of data. This function becomes even more important as data stores grow and analyses become more complex. In addition, the people analyzing the data are now more likely to be business managers using point-and-click dashboards rather than statisticians using sophisticated modeling tools. The IT chestnut "garbage in, garbage out" has never been more relevant.

In a February 2, 2015, post on the Smart Data Collective, Mario Barajas asserts that the best place to ensure the quality of data is at the source: the data input layer. ETL is used at the post-input stage to aggregate data in report-ready form. The technology becomes the lingua franca that "preps" diverse data types for analysis, rather than a data validator.

Gartner analyst Rita Sallam refers to this second generation of ETL as "smart data discovery," which she expects will deliver sophisticated business-intelligence capabilities to the 70 percent of people in organizations who currently lack access to such data-analysis tools. Sallam is quoted in a January 28, 2015, article on FirstPost.

Big-data analysis without the coding

Off-the-shelf ETL products were almost unheard of when data warehouses first arrived. The function was either built in by the warehouse vendor or hand-coded by the customer. Two trends have converged to create a market for third-party ETL tools: the need to accommodate unstructured data (think Twitter streams and video feeds); and to integrate multiple platforms (primarily mobile and other external apps).

ETL has morphed from a specialty function either built into a data-warehouse system or coded by customers, to a product category that extends far beyond any single data store. Source: Data-Informed

Representing this new era of ETL are products such as ClearStory, Paxata, Tamr, and Trifacta. As Gigaom's Barb Darrow explains in a February 4, 2015, article, the tools are intended to facilitate data sharing and integration with a company's partners. The key is to be able to do so at the speed of modern business. This is where next-gen ETL differs from its slow, deliberate data-warehouse counterpart.

Running at the speed of business is one of the primary benefits of the new Morpheus Virtual Appliance. The Morpheus database-as-a-service (DBaaS) lets you provision and manage SQL, NoSQL, and in-memory databases across hybrid clouds via a simple point-and-click interface. The service supports heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases.

A full replica set is provisioned with each database instance for fault tolerance and fail over, and Morpheus ensures that the replica and master are synched in near real time. The integrated Morpheus dashboard lets you monitor read/writes, queries per second, IOPs, and other stats across all your SQL, NoSQL, and in-memory databases. Visit the Morpheus site for pricing information and to create a free account.


Best DB Storing Cached Websites



If you need to store cached web sites, it is a good idea to begin with a database that is optimized for caching. What database is built to perform this sort of task?

TL;DR: If you need to store cached data, such as cached web sites, you will want to be able to retrieve content quickly when needed. Some databases are built better for such purposes, since there will be a large amount of data stored and retrieved on a regular basis. A good option for this is an in-memory, key-value data store.

What is an In-Memory Database?

An in-memory database (often shortened to IMDB) is a database that stores data using the main memory of a machine as the primary storage facility, rather than storing the data on a disk as in a typical disk-optimized database.

Typically, the biggest feature of in-memory databases is that they offer faster performance than databases that use disk storage. This is due in part to fewer CPU instructions being run, as well as drastically improved query time, since the data is in main memory rather than needing to be read from a disk on the machine.

The differences between a disk database and an in-memory database. Source: OpenSourceForU.

What is a Key-Value Store?

A key-value store is a system in which an associative array (often called a map) is used for storing information, as opposed to the typical tabular structure associated with relational databases. This mapping results in key-value pairs that resemble the associative arrays found in most popular programming languages.

As a result, it is often easy for programmers to write queries for such databases, since these often use a JSON-like structure that is very familiar to most programmers. The figure below shows one example of how the key-value store Redis can store data and be queried for that data:

An example of a Redis query. Source: Redis.

Storing Cached Web Sites

When storing cached web sites, you will likely be storing extremely large amounts of data, since each stored web site will have files, images, media, and dependencies that will need to be stored to have a complete cache.

While a relational database could store the information, it would likely be less optimized for this purpose than a NoSQL solution, and an in-memory database is likely to be far faster at storing and retrieving items from the massive amount of available data. With this in mind, a database that is an in-memory, key-value store is likely going to be the most effective and efficient solution.

Databases such as Redis and Memcached fall into this category. If you are using Nginx as a reverse proxy server, Redis even has an Nginx module available that can serve the cached content from Redis directly. This makes Redis an excellent choice for storing cached web sites and retrieving content from that store.

Get Your Own Redis Database

Whether your database tables will be simple or complex, Morpheus Virtual Appliance is a tool that allows you to manage heterogeneous databases in a single dashboard. With Morpheus, you have support for SQL, NoSQL, and in-memory databases like Redis across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

Security Focus Shifts from Physical Servers to Cloud Services



Tying security to the data itself allows IT to defend against internal and external threats, and avoid never-ending patch cycles.

TL;DR: Do you know where your organization's critical data is? As cloud services proliferate, it has become nearly impossible to secure physical servers and data-center perimeters. Growing threats from outside and within organizations have led IT managers to focus their security efforts on the data itself rather than the hardware the data is stored on.

You can't blame IT managers for thinking all their security efforts are futile. Nor can they be faulted for believing the deck is stacked against them. Today's hackers are more numerous, more proficient, and more focused on stealing companies' most valuable assets.

Even worse, outside threats may not be data managers' biggest security problem. As IT Business Edge's Sue Marquette Poremba writes in a February 2, 2015, article, recent surveys indicate IT departments' greatest concern is often the security threat posed by insiders: privileged users and employees with high-level access to sensitive data who either cause a breach intentionally, or through carelessness or lack of proper training.

Kaspersky Labs' IT Security Risk Survey 2014 identifies the biggest internal and external threats to medium-sized businesses' data security. Source: Kaspersky Labs

Poremba cites the 2015 Insider Security Threat Report compiled by Vormetric Data Security, which found that 59 percent of the IT personnel surveyed believe insiders pose the greatest data security risk to their firms. Vormetric is one of a growing number of security services to recommend customers focus their security efforts on the data rather than on securing the perimeter.

An opportunity to get off the non-stop-patch merry-go-round

One indication of the uphill battle companies face in keeping their systems safe is the sorry state of software patches. Security software vendor Secunia reports that 48 percent of Java users lack the latest patches. CSO's Maria Korolov reports on the Secunia survey in a January 26, 2015, article.

Secunia claims that in the past year, 119 new vulnerabilities were discovered in Java, which is installed on 65 percent of all computers. That's a lot of surface area for potential breaches. And Java is far from the only possible hack target: Veracode's recent scan of Linux systems found that 41 percent of enterprise applications using the GNU C Library (glibc) are susceptible to the Ghost buffer-overflow vulnerability because the apps use the gethostbyname function. Dark Reading's Kelly Jackson Higgins reports on the finding in a February 5, 2015, article.

Many analysts are predicting that an entirely new approach to data security is beginning to take hold in organizations: one that de-emphasizes server software and focuses instead on the data itself. Information Age's Ben Rossi writes in a January 25, 2015, article that physical servers are becoming "disposable," and in their place are API-driven cloud services.

Security controls are built into cloud services, according to Rossi: virtual servers feature dedicated firewalls, role access policy, and network access rights; files stored in the cloud have simple access policies and encryption mechanisms built in; and user-specific identity policies restrict their access to data and resources.

Security is at the heart of the new Morpheus Virtual Appliance, which lets you seamlessly provision and manage SQL, NoSQL, and in-memory databases across hybrid clouds. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over. You can administer your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard via a simple point-and-click interface.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site for pricing information and to create a free account.

How to Use MySQL Performance Schema to Fine-tune Your Database



Simple queries of Performance Schema tables can boost MySQL database performance without the heavy lifting.

TL;DR: Identify indexes that are no longer being used, spot the queries that take longest to process, and diagnose other hiccups in your MySQL database's performance by querying the Performance Schema tables.

When Oracle released a development milestone of MySQL 5.7 in 2014, the company boasted that the new version handled queries at twice the speed of its predecessor, as InfoWorld's Joab Jackson reported in a March 31, 2014, article. Companies could therefore use fewer servers to run large jobs, or take less time to run complex queries on the same number of servers.

MySQL 5.7 also extends the Performance Schema used for diagnosing and repairing bottlenecks and other problems. The Performance Schema tool is designed to monitor MySQL Server execution at a low level to avoid imposing a performance hit of its own. The 52 tables in "performance_schema" can be queried to report on the performance of indexes, threads, events, queries, temp tables, and other database elements.

In his January 12, 2015, overview of MySQL Workbench Performance reports, Database Journal's Rob Gravelle points out that using Performance Schema entails a lot of "instrumentation," including server defaults for monitoring coverage, using canned Performance Reports, and retrieving the SQL statement.

The MySQL Reference Manual explains that Performance Schema monitors all server events, including mutexes and other synchronization calls, file and table I/O, and table locks. It tracks current events as well as event histories and summaries, but tables stored in the corresponding performance_schema are temporary and do not persist on disk. (Note that the complementary Information Schema -- also called the data dictionary or system catalog -- lets you access server metadata, such as database and table names, column data types, and access privileges.)

Putting MySQL's Performance Schema tool to use

Starting with MySQL version 5.6.6, Performance Schema is enabled by default. Database Journal's Gravelle explains in a December 8, 2014, article that you can enable or disable it explicitly by starting the server with the performance_schema variable set to "on" via a switch in your my.cnf file. Verify initialization by using the statement "mysql> SHOW VARIABLES LIKE 'performance_schema';".

Initialize Performance Schema via a switch in the my.cnf file, and verify initialization by running the "SHOW VARIABLES LIKE" command. Source: Database Journal

The 52 Performance Schema tables include configuration tables, object tables, current tables, history tables, and summary tables. You can query these tables to identify unused indexes, for example.

Run this query to determine which of your database's indexes are merely taking up space. Source: Database Journal
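
A representative query of that kind, shown here as a sketch rather than the exact statement from the screenshot, pulls index names that have never been used from the I/O summary table:

    SELECT object_schema, object_name, index_name
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE index_name IS NOT NULL
      AND count_star = 0
      AND object_schema NOT IN ('mysql', 'performance_schema')
    ORDER BY object_schema, object_name;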

Adding "AND OBJECT_SCHEMA = 'test'" to the WHERE clause lets you limit results to a specific schema. Another query helps you determine which long-running queries are monopolizing system resources unnecessarily.

This Performance Schema query generates a process list by the time required for the queries to run. Source: Database Journal
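
Again as a sketch rather than the article's exact query, the same idea can be expressed by joining the threads and events_statements_current tables and sorting by elapsed time:

    SELECT t.processlist_id, t.processlist_user, t.processlist_info,
           s.timer_wait / 1000000000000 AS seconds_running  -- TIMER_WAIT is in picoseconds
    FROM performance_schema.threads t
    JOIN performance_schema.events_statements_current s USING (thread_id)
    ORDER BY s.timer_wait DESC;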

While most performance_schema tables are read-only, some setup tables support the data manipulation language (DML) to allow configuration changes to be made. The Visual Guide to the MySQL Performance Schema lists the five setup tables: actors, objects, instruments, consumers, and timers.
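
For example, the setup_consumers table can be updated directly to switch additional event collection on or off:

    UPDATE performance_schema.setup_consumers
    SET enabled = 'YES'
    WHERE name LIKE 'events_statements_history%';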

There's no simpler or more-efficient way to monitor the performance of heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases than by using the new Morpheus Virtual Appliance. Morpheus lets you seamlessly provision and manage SQL, NoSQL, and in-memory databases across hybrid clouds via a single point-and-click dashboard. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site for pricing information and to create a free account.

Java vs JavaScript



The two popular development environments are expected to benefit from long-awaited new features in their upcoming releases.

TL;DR: Long seen as rivals -- and often considered outdated -- Java and JavaScript remain the most popular development environments, despite their perceived shortcomings. Many of the criticisms of previous versions in terms of modularity, portability, and security are being addressed in new releases of both systems now under development.

At one time in the not-too-distant past, both Java and JavaScript were reported to be ready to breathe their last. Java was deemed so insecure that in 2013 Carnegie Mellon University's Software Engineering Institute and the U.S. Department of Homeland Security warned that people should disable Java in their browsers unless using it was "absolutely necessary."

Meanwhile, cross-site scripting attacks and other vulnerabilities led many to question the security of JavaScript. Still, few people have come out in favor of disabling JavaScript altogether because doing so renders most websites all but unusable. Chris Hoffman explains why in a February 28, 2013, article on the How-To Geek.

Fast-forward to 2015, and the outlook for both Java and JavaScript appears much rosier. Not only do both continue to top the list of most popular development platforms (as Mashable's Todd Wasserman reports in a January 18, 2015, article), enhanced versions of each are expected to arrive (relatively) soon.

Some of the improvements are already apparent. For example, Cisco Systems' 2015 Annual Security Report finds that Java exploits decreased 34 percent in 2014 from the previous year, as ProgrammableWeb's Michael Vizard reports in a January 29, 2015, article. The drop is attributed primarily to Java's implementation of automatic updates.

The prevalence of malware targeting Java dropped considerably in 2014, due in large part to implementation of automatic updates. Source: DarkVision Hardware

While attacks targeting JavaScript have increased as more developers choose the scripting language for both client- and server-side apps, JavaScript creator Brendan Eich recently predicted that the language would ultimately supplant Java as the premier development platform for the web and elsewhere. In a February 13, 2015, article, InfoWorld's Paul Krill reports on Eich's presentation at the recent Node Summit conference in San Francisco.

Is Java development finally back on track?

When Oracle acquired Java as part of its purchase of Sun Microsystems in 2010, the product was in sorry shape, as ITworld's Andy Patrizio writes in a February 13, 2015, article. In addition to critical bugs, most of Sun's Java development team departed soon after the merger. This helps explain why so much time passed between the releases of Java SE 7 in 2011 and SE 8 in 2014.

With the impending arrival of the more modular Java 9, the tide is definitely turning. JavaWorld's Jeff Hanson explains in a February 2, 2015, article that despite Java's object-oriented nature, the current version doesn't meet developers' modularity requirements: object-level identity is unreliable, interfaces aren't versioned, and classes aren't unique at the deployment level.

The Open Service Gateway Initiative (OSGi) has been the primary alternative for modular Java, providing an execution environment, module layer, life-cycle layer, and service registry. However, OSGi lacks a formal mechanism for native package installation. Project Jigsaw is expected to be Java 9's modularity component, while JDK Enhancement Proposals (JEP) 200 and 201 will build on Jigsaw for segmenting the JDK and modularizing the JDK source code, respectively.

JavaScript's fortunes are tied to the growing popularity of HTML5, the combination of which is becoming the de facto standard for websites, enterprise apps, and mobile apps, according to Developer Tech's Carlos Goncalves in a January 23, 2015, article. The downside of this tandem is a lack of security: code is stored on both the client and server as clear text. JavaScript obfuscation is being used increasingly not only to protect systems and applications, but also to enforce copyrights and licenses.

One way to future-proof your databases is by using the new Morpheus Virtual Appliance, which provides a single dashboard for provisioning, deploying, and managing your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus offers a simple point-and-click interface for analyzing SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site for pricing information and to create a free account.

Replace DB Column with ID



Sometimes database tables grow and require some columns to be moved to a separate table. Find out when this may be a good idea!

TL;DR: As a database table grows, certain columns can become candidates for being moved into a separate table and referred to by a particular ID in the original table. To know when this is a good idea, you need to determine the purpose of the data that is in the column in question and the potential for it to expand and evolve over time.

Determine the Purpose of the Data

Suppose you had a small application using a single table, like the one shown below:

An example database table.

This table is simply used for storing user information. You have a user ID, last name, first name, and user type. All of these seem good, as long as the number of user types is small and doesn't change very often. However, what if different user types need to be added on a somewhat regular basis over time?

Since the user type is currently a string of text (character varying in database terms), the initial adding of a user or the updating of the user type requires that particular string of text to be entered. There is a chance (especially if an update is done via a manual query) that the string could be mistyped slightly. This creates a new user type, which would exclude the user from being in the originally intended group!

Using an ID Instead

Now that the user type column is becoming problematic, you need to find a way to help keep the data in the column consistent with the actual user types. You can simply change this to a number; however, this will require that anyone adding or updating records in the table knows what those numbers mean (e.g., 1 is for administrator, 2 is for premium user, and so on).

Another method would be to change it to a number and add a user type description to the table as well, but this is now adding another column to keep track of in this table, and could fall prey to the type description being entered incorrectly.

Moving a Column to a Table

Since you want to have the user type be a number, but still want to know what that number means, user type information is a good candidate to be moved to its own table with an ID as the primary key and a description column. The ID can then be referenced in the original users table. This helps ensure that both the user and user type data are kept consistent and accurate.

The resulting tables are shown below:

An example of two tables working together.

Notice that you can now simply point to the user type ID in the user_types table. This table can now be referenced if you need to know what that number means, and keeps each of those types with a consistent ID and description.
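
In MySQL terms, the two tables might be defined as in this sketch (column names and sizes are illustrative assumptions):

    CREATE TABLE user_types (
      id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      description VARCHAR(50) NOT NULL
    );

    CREATE TABLE users (
      user_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      last_name VARCHAR(50) NOT NULL,
      first_name VARCHAR(50) NOT NULL,
      user_type_id INT UNSIGNED NOT NULL,
      FOREIGN KEY (user_type_id) REFERENCES user_types (id)
    );

The foreign key ensures that every row in users points at a valid entry in user_types, so a mistyped description can no longer create a phantom user type.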

Get Your Own Database

Whether your database tables will be simple or complex, Morpheus Virtual Appliance is a tool that allows you to manage heterogeneous databases in a single dashboard. With Morpheus, you have support for SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

Pros Cons DB Normalization



To normalize or not to normalize? Find out when normalization of a database is helpful and when it is not.

TL;DR: When using a relational database, normalization can help keep the data free of errors and can also help ensure that the size of the database doesn't grow large with duplicated data. At the same time, some types of operations can be slower in a normalized environment. So, when should you normalize and when is it better to proceed without normalization?

What is Normalization?

Database normalization is the process of organizing data within a database in the most efficient manner possible. For example, you likely do not want a username stored in several different tables within your database when you could store it in a single location and point to that user via an ID instead.

By keeping the unchanging user ID in the various tables that need the user, you can always point it back to the appropriate table to get the current username, which is stored in only a single location. Any updates to the username occur only in that place, making the data more reliable.

An example of using the first normal form. Source: ChatterBox's .NET.

What Is Good about Database Normalization?

A normalized database is advantageous when operations will be write-intensive or when ACID compliance is required. Some advantages include:

  1. Updates run quickly due to no data being duplicated in multiple locations.
  2. Inserts run quickly since there is only a single insertion point for a piece of data and no duplication is required.
  3. Tables are typically smaller than the tables found in non-normalized databases. This usually allows the tables to fit into the buffer pool, thus offering faster performance.
  4. Data integrity and consistency are an absolute must if the database must be ACID compliant. A normalized database helps immensely with such an undertaking.

What Are the Drawbacks of Database Normalization?

A normalized database is not as advantageous under conditions where an application is read-intensive. Here are some of the disadvantages of normalization:

  1. Since data is not duplicated, table joins are required. This makes queries more complicated, and thus read times are slower (a join is sketched below).
  2. Since joins are required, indexing does not work as efficiently, which again makes read times slower.
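
To make the first point concrete, here is a minimal sketch that assumes a hypothetical orders table referencing users by ID; the table names and the db.query helper are assumptions for illustration, not something taken from this article:

```typescript
// Illustrative only: with usernames normalized into a users table, reading an
// order together with the username requires a join. The orders table and the
// db.query helper are assumptions for this sketch, not APIs from the article.
const readOrderWithUsername = `
  SELECT o.id, o.total, u.username
  FROM orders o
  JOIN users u ON u.id = o.user_id
  WHERE o.id = ?`;

// A denormalized orders table that duplicated the username could answer the
// same question with a single-table lookup, at the cost of keeping every copy
// of the username in sync.
// const row = await db.query(readOrderWithUsername, [orderId]);
```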

What if the Application is Read-Intensive and Write-Intensive?

In some cases, it isn't as clear that one strategy should be used over the other. Obviously, some applications really need both normalized and non-normalized data to work as efficiently as possible.

In such cases, companies will often use more than one database: a relational database such as MySQL for ACID-compliant and write-intensive operations, and a NoSQL database such as MongoDB for read-intensive operations on data where duplication is not as big of an issue.

What NoSQL and SQL databases do well. Source: SlideShare.

Get Your Own Hosted Database

Whether your database will be normalized or not, or whether you decide you need multiple databases, Morpheus Virtual Appliance is a tool that allows you to manage heterogeneous databases in a single dashboard. With Morpheus, you have support for SQL, NoSQL, and in-memory databases like Redis across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

Is Scala the Development Environment of the Future, or a Programming Dead End?

The Scala development environment seems to have as many naysayers as it has ardent supporters. Yet the language's combination of object-oriented and functional programming features is being emulated in such environments as Apple's Swift and Microsoft's C#. Despite recent forks released by Scala proponents dissatisfied with the slow pace of the language's development, the consensus is that Scala is a technology with a bright future.

When Java 8 was released in March 2014, it marked the most significant update of the venerable development environment in more than a decade. Many analysts interpreted the addition of such features as lambda expressions and functional programming as bad news for the Scala programming language, which already supported functional programming and runs in Java virtual machines.

In a September 5, 2014, interview with InfoWorld's Paul Krill, Typesafe chairman and co-founder Martin Odersky countered the argument that Scala and Java 8 are now so compatible that Scala is redundant. (Typesafe sells Scala-based middleware.) Odersky points out that the industry is moving increasingly toward functional programming, and Java 8 will spur the trend. However, Scala offers a much richer functional programming environment than Java 8, and Scala will leverage Java 8's VM improvements.

The current version of Scala, 2.11, features a streamlined compiler and support for case classes with more than 22 parameters. Three new versions of Scala are in the pipeline, according to Odersky; their releases are scheduled roughly 18 months apart.

  • Version 2.12 will emphasize integration with Java 8
  • Aida will focus on making Scala's libraries less complicated to use and will integrate Java 8's parallel collections, or streams
  • Don Giovanni will represent a major reworking of the environment with a goal of making the core simpler and compiling more efficient

Much of the promise of Scala as a key language for future development is its fusion of object-oriented and functional programming. This hybridization is also evident in Apple's new Swift language and is expected to be evident in the next version of Microsoft's C#.

 

As a "blended" language, Scala combines attributes of both object-oriented and functional programming. Source: app-genesis

In a February 20, 2014, interview with ReadWrite's Matt Asay, Odersky states that Java 8's implementation of lambdas will add new methods to the java.util.stream.Stream type that will facilitate writing high-level collection code. It will also ease the transition to Java-based functional interfaces.

Scala forks: A sign of trouble or indication of strength?

A few months after the releases of Java 8 and Scala 2.11, not one but two different versions of the Scala compiler were announced. InfoQ's Benjamin Darfler reported in a September 16, 2014, article that Shapeless library principal engineer Miles Sabin announced on the Typelevel blog a fork of the Scala compiler that is merge-compatible with the Typesafe Scala compiler.

Just three days later, another Scala compiler fork was announced by Typesafe co-founder Paul Phillips, who left Typesafe in 2013. Unlike Sabin's "conservative" fork, the Scala compiler developed by Phillips is not intended to be merged with the Typesafe version down the road. Both Sabin and Phillips believe Scala's development is proceeding too slowly (version 2.12 is scheduled for release in early 2016).

Typesafe chairman Odersky welcomed the forks, writing on the Typesafe blog that having "advanced and experimental branches" of the language and compiler in parallel with the "stable main branch" serves the needs of diverse developers. Odersky is encouraged that Typelevel's compiler will remain merge-compatible with standard Scala and believes some of that compiler's innovative features may eventually be added to the standard.

 

Typelevel's "conservative" fork of the standard Scala compiler is designed to provide a simple migration path to the Typesafe version. Source: Typelevel, via GitHub

During this time of transition in the development world, one of the key features of the new Morpheus Virtual Appliance database-as-a-service (DBaaS) is its support for a wide range of development tools for connecting to, configuring, and managing heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus lets you monitor and analyze all your databases using a single dashboard to provide instant insight into consumption and availability of system resources.

The Morpheus Virtual Appliance is the first and only database provisioning and management platform that works with private, public, and hybrid clouds. A free full replica set is provisioned automatically for each database instance, and backups are created for your MySQL and Redis databases.

Visit the Morpheus site to create a free trial account today!


How the Internet of Things Will Affect Database Management

There's no denying the rise of the Internet of Things will challenge existing database systems to adapt to accommodate huge volumes of unstructured data from diverse sources. Some analysts question whether RDBMSs have the scalability, flexibility, and connectivity required to collect, store, and categorize the disparate data types organizations will be dealing with in the future. Others warn against counting out RDBMSs prematurely, pointing out that there's plenty of life left in the old data structures.

Imagine billions of devices of every type flooding data centers with information: a secured entryway reporting on people's comings and goings; a smart shelf indicating a shortage of key production supplies; a pallet sensor reporting an oversupply of stocked items.

The Internet of Things poses unprecedented challenges for database administrators in terms of scalability, flexibility, and connectivity. How do you collect, categorize, and extract business intelligence from such disparate data sources? Can RDBMSs be extended to accommodate the coming deluge of device-collected data? Or are new, unstructured data models required?

As you can imagine, there's little consensus among experts on how organizations should prepare their information systems for these new types and sources of data. Some claim that RDBMSs such as MySQL can be extended to handle data from unconventional sources, many of which lack the schema, or preconditioning, required to establish the relations that are the foundation of standard databases. Other analysts insist that only unstructured, "schema-less" DBMSs such as NoSQL are appropriate for data collection from intelligent devices and sensors.

 

The standard application model is transformed to encompass the cloud by the need to accommodate tomorrow's diverse data sources and types. Source: Technische Universität München

In a November 28, 2014, article, ReadWrite's Matt Asay reports on a recent survey conducted by Machina Research that found NoSQL is the key to "managing more heterogeneous data generated by millions and millions of sensors, devices and gateways." Not surprisingly, the two primary reasons for the assessment are NoSQL's flexibility in handling unstructured data, and its scalability, which the researchers claim RDBMSs simply can't match.

Reports of the death of RDBMSs are slightly exaggerated

There are a couple of problems with this claim, however. First, there's an acute shortage of NoSQL developers, according to statistics compiled by research firm VisionMobile. The company pegs the current number of NoSQL developers at 300,000, but it estimates that by the year 2020 the number will jump to 4.5 million.

 

VisionMobile's research indicates that demand for NoSQL developers will explode in coming years. Source: VisionMobile

Many experts posit that the forecast demise of RDBMSs is premature because the forecasters underestimate the ability of RDBMS vendors to extend their products to enhance their scalability and their ability to accommodate unstructured data. For example, the makers of the DeepDB storage engine improved MySQL's scalability by replacing the default InnoDB storage engine with an alternative that they claim improves server performance by a factor of 100.

At this point, IT managers can be excused for thinking "Here we go again." Does the Internet of Things signal yet another sea-change for their data centers? Or is this the most recent case of the hype outpacing the substance? According to a December 2014 TechTarget article by Alan R. Earls, corporate bandwidth will be overwhelmed by the rush of small data packets coming from Internet-connected devices.

In particular, distributed data centers will be required that move data security -- and the bulk of data analysis -- to the edge of the network. This will require pushing the application layer to the router and integrating a container with logic. As Accenture Technology Lab researcher Walid Negm points out, the cloud is increasingly serving as either an extension of or replacement for the conventional network edge.

The best way to get a jump on the expansion of data networks and provision your databases is via a secure, reliable, and scalable platform for private, public, and hybrid clouds. Morpheus Virtual Appliance supports MongoDB, MySQL, Elasticsearch, and Redis with a simple point-and-click database provisioning setup. Get started on your free trial now!

New Breed of JSON Tools Closes the Gap with XML

As JSON's popularity for web app development increases, the range of tools supporting the environment expands into XML territory. The key is to maintain the simplicity and other strengths of JSON while broadening the environment's appeal for web developers.

JSON or XML? As with most programming choices, determining which approach to adopt for your web app's server calls is not a simple "either-or" proposition.

JSON, or JavaScript Object Notation, was conceived as a simple, concise data format for encoding data structures in an efficient text format. In particular, JSON is less "verbose" than XML, according to InCadence Strategic Solutions VP Michael C. Daconta in an April 16, 2014, article on GCN.com.

Despite JSON's efficiency, Daconta lists four scenarios in which XML is still preferred over JSON:

  • When you need to tag the data via markup, which JSON doesn't support
  • When you need to validate a document before you transmit it
  • When you want to extend a document's elements via substitution groups or other methods
  • When you want to take advantage of one of the many XML tools, such as XSLT or XPath

SOAP APIs and REST APIs are also not strictly either/or

One of JSON's claims to fame is that it is so simple to use it doesn't require a formal specification. George Anadiotis explains in a January 28, 2014, post on the Linked Data Orchestration site that many real-world REST-based JSON apps require schemas, albeit much different schemas than their SOAP-based XML counterparts.

 

The basic JSON-REST model decouples the client and server, which separates the app's internal data representation from the wire format. Source: Safety Net

The most obvious difference is that there are far fewer JSON tools than there are XML tools. This is understandable considering that XML has been around for decades and JSON is a relative newcomer. However, JSON tools are able to reverse-engineer schemas based on the JSON fragments you arrange on a template. The tools' output is then edited manually to complete the schema.

Anadiotis presents a five-step development plan for a JSON schema (a small example follows the list):

  1. Create sample JSON for exchanging data objects
  2. Use a JSON schema tool to generate a first-draft schema based on your sample JSON fragments
  3. Edit the schema manually until it is complete
  4. Use a visualization tool to create an overview of the schema (optional)
  5. Run a REST API metadata framework to provide the API's documentation
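
As a rough illustration of steps 1 through 3, here is a small JSON sample and a hand-finished schema draft for it; the property names are invented for this sketch rather than drawn from Anadiotis' post:

```typescript
// A sketch of steps 1-3: a sample JSON fragment and a hand-finished JSON Schema
// draft describing it. The property names here are invented for illustration.
const sample = {
  id: 42,
  name: "Ada Lovelace",
  tags: ["math", "computing"],
};

const schema = {
  $schema: "http://json-schema.org/draft-04/schema#",
  type: "object",
  required: ["id", "name"],
  properties: {
    id: { type: "integer" },
    name: { type: "string" },
    tags: { type: "array", items: { type: "string" } },
  },
};

console.log(JSON.stringify(sample), JSON.stringify(schema));
```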

Tool converts JSON to CSV for easy editing

JSON may trail XML in quantity and quality of available toolkits, but the developer community is working hard to close the gap. An example is the free JSON-to-CSV converter developed by Eric Mills of the Sunlight Foundation. The converter lets you paste JSON code into a box and then automatically reformats and recolors it as an easy-to-read table.

 

In this example, the JSON-to-CSV converter transforms JSON code into a table of data about Ohio state legislators. Source: Programmable Web

Mills' goal in creating the converter was to make JSON "as approachable as a spreadsheet," as Janet Wagner reports in a March 31, 2014, post on the Programmable Web site. While the converter is intended primarily as a teaching tool that demonstrates the potential of JSON as a driver of the modern web, Mills plans to continue supporting the converter if it is used widely.

Conversely, if you'd like to convert Excel/CSV data to HTML, JSON, XML, and other web formats, take the free Mr. Data Converter tool out for a spin.

Relational or Graph: Which Is Best for Your Database?

Choosing between the structured relational database model or the "unstructured" graph model is less and less an either-or proposition. For some organizations, the best approach is to process their graph data using standard relational operators, while others are better served by migrating their relational data to a graph model.

The conventional wisdom is that relational is relational and graph is graph, and never the twain shall meet. In fact, relational and graph databases now encounter each other all the time, and both can be better off for it.

The most common scenario in which "unstructured" graph data coexists peaceably with relational schema is placement of graph content inside relational database tables. Alekh Jindal of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) points out in a July 9, 2014, post on the Intel Science and Technology Center for Big Data blog that most graph data originates in an RDBMS.

Rather than extract the graph data from the RDBMS for import to a graph processing system, Jindal suggests applying the graph-analytics features of the relational database. When a graph is stored as a set of nodes and a set of edges in an RDBMS, built-in relational operators such as selection, projection, and join can be applied to capture node/edge access, neighborhood access, graph traversal, and other basic graph operations. Combining these basic operations makes possible more complex analytics.

Similarly, stored procedures can be used as driver programs to capture the iterative operations of graph algorithms. The down side of expressing graph analytics as SQL queries is the performance hit resulting from multiple self-joins on tables of nodes and edges. Query pipelining and other parallel-processing features of RDBMSs can be used to mitigate any resulting slowdowns.
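
As a minimal sketch of that idea, assume the graph is stored in nodes(id, label) and edges(src, dst) tables; the table names and queries below are assumptions for illustration, not taken from Jindal's post:

```typescript
// A minimal sketch of the approach described above, assuming the graph is
// stored in nodes(id, label) and edges(src, dst) tables; these queries are
// illustrative, not taken from Jindal's post.
const oneHopNeighbors = `
  SELECT n.id, n.label
  FROM edges e
  JOIN nodes n ON n.id = e.dst
  WHERE e.src = ?`;

// Each additional hop of a traversal adds another self-join on the edges
// table, which is where the performance cost described above comes from.
const twoHopNeighbors = `
  SELECT n.id, n.label
  FROM edges e1
  JOIN edges e2 ON e2.src = e1.dst
  JOIN nodes n ON n.id = e2.dst
  WHERE e1.src = ?`;

console.log(oneHopNeighbors, twoHopNeighbors);
```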

When Jindal compared the performance of a column-oriented relational database and Apache Giraph on PageRank and ShortestPath, the former outperformed the latter in two graph-analytics datasets: one from LiveJournal with 4.8 million nodes and 68 million edges; and one from Twitter with 41 million nodes and 1.4 billion edges.

 

A column-oriented RDBMS matched or exceeded the performance of a native graph database in processing two graph datasets. Source: Alekh Jindal, MIT CSAIL

When migrating data from relational to graph makes sense

While there are many instances in which extending the relational model to accommodate graph-data processing is the best option, there are others where a switch to the graph model is called for. One such case is the massive people database maintained by Whitepages, which resided for many years in siloed PostgreSQL, MySQL, and Oracle databases.

As explained in a November 12, 2014, post on Linkurious, Whitepages discovered that many of its business customers were using the directory to ask graph-like questions, primarily for fraud prevention. In particular, the businesses wanted to know whether a particular phone number was associated with a real person at a physical address, and what other phone numbers and addresses have been associated with a particular person.

The development team hired by Whitepages used the Titan scalable graph database to meet the company's need for scalability, availability, high performance (processing 30,000 vertices per second), and high ingest rate (greater than 200 updates per second). The resulting graph schema more accurately modeled the way Whitepages customers were querying the database: from location to location, and number to number.

 

The Whitepages graph schema tracks people as they change physical address and telephone number, among other attributes. Source: Linkurious

Whitepages has made its graph infrastructure available to the public via the WhitePages PRO API 2.0.

Whether you find your organization's data better suited to the graph or the relational model, the Morpheus Virtual Appliance will help you with real-time database and system operational insights. Get your MongoDB, MySQL, Elasticsearch, or Redis databases provisioned with a simple point-and-click interface, and manage SQL, NoSQL, and in-memory databases across hybrid clouds.

HTML5 Promises Simpler Embedded Videos and Better Performance

Embedding videos in web pages is simpler, and playback quality is improved, thanks to HTML5's new video specification.

TL;DR: The new HTML standard was a long time coming, but now that it has arrived, developers can put HTML5's advanced video features to good use. Use these sites to determine how compatible your browser is with HTML5, how to embed an open-source HTML5 video player in your pages, and how to make best use of all the attributes associated with HTML5's new "video" tag.

Let's face it: Adobe's Flash player is trouble. It's trouble because Flash is susceptible to so many zero-day attacks, making your systems more vulnerable to malware. It's also trouble because it's everywhere: Flash was the long-time de facto standard for streaming Web animations and other media on PCs and, to a lesser extent, mobile devices. (Emphasis on was.)

YouTube's recent decision to dump Flash in favor of HTML5 is the stake in the heart of a proprietary technology that has outstayed its welcome. In a January 28, 2015, article, CNET's Stephen Shankland describes the ascendance of HTML5 video. Flash will survive for a short time as a vestigial browser extension needed to accommodate sites that haven't converted to the new video standard. (Note that Google Chrome has used a built-in Flash player since 2010.)

However, Flash's ultimate demise may not be far off. Mozilla is developing a version of Firefox that doesn't need the Flash player, as Tech Times' Timothy Torres reports in a February 16, 2015, article. The trend is away from browser extensions generally and the Flash player in particular.

In October 2014, the W3C's final recommendation for the HTML5 specification was released. You can get a sense of how well your browser is prepared for the new web standard by visiting the HTML5 Test site. The service generates a numerical score indicating your browser's degree of support for HTML5. The top score is 555. Below the overall score are category tables showing how well your browser scored in such areas as parsing, video, audio, elements, forms, storage, 2D and 3D graphics, and user interaction.

The HTML5 Test site automatically generates an overall score of your browser's support for HTML5, and shows how well your browser does in specific categories. Source: HTML5 Test

The test doesn't attempt to cover all aspects of HTML5 and its many extensions. You can also view the aggregate scores of other desktop, tablet, and mobile browsers. On desktops, Google Chrome scores highest overall, followed by Opera, Firefox, Safari, and Internet Explorer.

The aggregate scores of browsers' support for HTML5 indicate that Chrome supports the new standard best, followed by Opera, Firefox, Safari, and IE. Source: HTML5 Test

Open-source HTML5 video player, and an HTML5 video tutorial

The site supporting the open-source Video.js HTML5 video player offers a primer on the spec's "video" tag, which works much like the "img" tag in earlier HTML versions. The goal is to allow developers to embed videos in their pages without being concerned about which players site visitors have installed. Video performance is enhanced because the browser doesn't need to call a separate plug-in or extension to play the video.

HTML5 Rocks' Pete LePage put together an extensive HTML5 video tutorial that covers everything from the spec's simple embedded playback controls to using the "source" element to specify multiple source files. For example, you'll optimize video performance by including the type attribute in the source element.

Use the type attribute in the "video" tag's source element to improve the performance of videos embedded in web pages. Source: HTML5 Rocks
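
The same multi-source pattern can also be built from script. The sketch below, with illustrative file names and MIME types, shows how the type attribute on each source lets the browser skip formats it cannot decode:

```typescript
// Illustrative sketch: build a video element with two alternative sources.
// The file names and MIME types are placeholders.
const video = document.createElement("video");
video.controls = true;
video.preload = "metadata";

const sources: Array<[string, string]> = [
  ["intro.webm", "video/webm"],
  ["intro.mp4", "video/mp4"],
];

for (const [src, type] of sources) {
  const source = document.createElement("source");
  source.src = src;
  source.type = type; // lets the browser skip formats it cannot decode
  video.appendChild(source);
}

document.body.appendChild(video);
```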

The tutorial also covers using the "track" element to add subtitles and other text to a video, and special "video" attributes, such as autoplay, preload, loop, muted, and height & width. As you might expect, the tutorial includes several videos that demonstrate the techniques it presents.

Using a browser to manage your heterogeneous databases doesn't get any simpler than by using the new Morpheus Virtual Appliance. The Morpheus database-as-a-service (DBaaS) provides a single dashboard for provisioning, deploying, and monitoring MySQL, MongoDB, Redis, and ElasticSearch databases via a simple point-and-click interface.

Morpheus lets you work with SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over. You can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync.

Visit the Morpheus site for pricing information and to create a free account.

The Many Forms of HTML5 Local Storage

New local storage options in HTML5 are sure to improve the performance of any applications that run in browsers.

TL;DR: With HTML5, you can now store as much as 5MB of data locally in the browser, and decide whether or not the data should persist when the browser session ends. Here's a rundown of the HTML5 Web Storage options for app developers, as well as a few pitfalls you'll want to avoid.

After years of fits and starts, the W3C's final recommendation for the HTML5 specification was released in October 2014. Perhaps the greatest impact HTML5 will have for app developers is local data storage. Toptal's Demir Selmanovic points out in The 5 Most Common HTML5 Mistakes that Web Storage's local data stores are not encrypted, which introduces a potential security risk.

On the plus side, Web Storage data never travels to web servers, so it is more secure than old-style cookies and Flash LSOs (local shared objects). However, HTML5's localStorage and sessionStorage values are easy for bad guys to modify, so you should avoid storing security tokens locally.

HTML5 has already gone through several iterations of local storage, the simplest of which is JavaScript variables. According to Sitepoint's Craig Buckler in HTML5 Browser Storage: The Past, Present and Future, you can store application data in a single global variable.

The simplest approach to local storage for application data in HTML5 is to use a single JavaScript global variable. Source: Sitepoint
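
A minimal sketch of that approach might look like the following; the shape of the state object is illustrative only:

```typescript
// The simplest option described above: a single global variable that holds the
// application's data for the lifetime of the page. Nothing here persists across
// a reload, and nothing is sent to the server. The shape is illustrative only.
const appState: { user?: { id: number; name: string }; notes: string[] } = {
  notes: [],
};

appState.user = { id: 7, name: "Sam" };
appState.notes.push("first draft");
```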

Alternatively, values can be stored in the page DOM as node attributes or properties. This is particularly beneficial for widget-specific values, but doing so is riskier than using JavaScript variables because you can't predict how your data will be interpreted by future browsers and other libraries.

Web Storage's window.localStorage and window.sessionStorage objects have identical APIs and are used to retain persistent data and session-only data, respectively. Name/value pairs are used to store domain-specific strings, and up to 5MB of data can be stored locally, none of which ever travels to the server.

HTML5's window.localStorage object allows local data to persist after the browser session is closed. Source: Sitepoint
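
A small sketch of the two objects in use (the keys and values are illustrative):

```typescript
// Illustrative use of the Web Storage name/value API: localStorage survives the
// browser session, while sessionStorage is cleared when the tab is closed.
localStorage.setItem("theme", "dark");
sessionStorage.setItem("draft", JSON.stringify({ title: "Untitled" }));

const theme = localStorage.getItem("theme");                       // "dark"
const draft = JSON.parse(sessionStorage.getItem("draft") ?? "{}"); // values are always strings
localStorage.removeItem("theme");

console.log(theme, draft);
```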

Web Storage supports only string values, and it's unstructured, so it doesn't allow transactions, indexing, or searching. Conversely, IndexedDB's data store is structured, transactional, and more like NoSQL in terms of performance. Its synchronous and asynchronous API makes possible more robust client-side data storage and access, although the API's size and complexity make creating an IndexedDB polyfill a challenge.

File API enhancements facilitate local file access

When users interact with files in their browsers, the many back-and-forth trips between the client and server can be frustrating -- to users and developers alike. HTML5's File API lets users access and alter files in the browser with much less interaction with the server.

In an October 29, 2014, tutorial on Scotch.io, Spencer Cooley describes how to allow browser users to select one or more image files using JavaScript, and then display the file without requiring a call to the server.

HTML5 allows you to access an image file in your browser without having to communicate with the server. Source: Scotch.io

After accessing the FileList object, you render the file in the browser by loading one of the file objects into FileReader to generate a local URL that serves as the src in an image element.

Load a file object into FileReader to create a local URL to use as the src of an image element. Source: Scotch.io
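
A minimal sketch of that flow, assuming a hypothetical file input with the id image-input, might look like this:

```typescript
// A sketch of the pattern described above: read a selected image file in the
// browser and display it without a round trip to the server. The #image-input
// element id is a placeholder, not something from the tutorial.
const input = document.querySelector<HTMLInputElement>("#image-input");

if (input) {
  input.addEventListener("change", () => {
    const file = input.files?.[0]; // first entry of the FileList
    if (!file) {
      return;
    }

    const reader = new FileReader();
    reader.onload = () => {
      const img = document.createElement("img");
      img.src = reader.result as string; // local data URL, generated in the browser
      document.body.appendChild(img);
    };
    reader.readAsDataURL(file); // produces the local URL used as the img src
  });
}
```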

The simplest way to manage your databases in a browser is by using the new Morpheus Virtual Appliance, which provides a single dashboard for provisioning, deploying, and monitoring your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus offers a simple point-and-click interface for analyzing SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site for pricing information and to create a free account.
