
Happy Apps - How to Prevent a Heroku Dyno from Idling


When one of your Heroku apps is accessed infrequently, it can take more than 20 seconds for the app to spin up from idle. Keep your apps active by automatically pinging their servers, either by using a free add-on or by running a custom function.

The app-development two step: Step one, you build and test your app; step two, you find a service to host the app so potential customers can kick its tires. No matter which tools and services you use to create your web application, the chances are good you can use the Heroku platform as a service (PaaS) to make it available to the public.

Heroku made its name by offering to host your web apps for free. You switch to the paid version of the service once you scale up as the app gains traction with customers. But Heroku's true claim to fame is that the service lets you deploy your apps with just a couple of clicks. As ReadWrite's Lauren Orsini explains in a September 23, 2014, article, hosting an app is much trickier than hosting a site.

Orsini describes using a free add-on called Heroku Scheduler to ping her apps once an hour. A primary reason for pinging an app regularly is to avoid having to spin up a new dyno each time someone accesses the app after a period of inactivity. Once a dyno has idled, it can take more than 20 seconds for the app to open, and some potential customers won't wait that long.

This is the problem addressed in a Stack Overflow post that generated 14 suggestions for preventing dynos from idling. Topping the responses was use of the free New Relic add-on, which has an availability monitor that can ping sites at a set interval. Alternatives include Kaffeine, Pingdom, and Uptimerobot.

If you prefer a solution that doesn't rely on a third-party service, you can run the KeepAlive function shown below:
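A minimal Node.js sketch of such a function (the URL and interval are placeholders; the actual snippet on Stack Overflow differs in its details):

    // Ping the app every 5 minutes so the dyno never goes idle.
    var http = require('http');

    function keepAlive() {
      http.get('http://your-app-name.herokuapp.com', function (res) {
        console.log('Keep-alive ping returned status ' + res.statusCode);
      }).on('error', function (err) {
        console.log('Keep-alive ping failed: ' + err.message);
      });
    }

    setInterval(keepAlive, 5 * 60 * 1000);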

The KeepAlive function automatically pings apps at a set interval to prevent Heroku dynos from idling. Source: Stack Overflow

Automatic deployment of code from GitHub and Dropbox

Heroku recently released GitHub Integration, which automates the process of deploying code stored on GitHub. InfoQ's Richard Seroter explains in a January 21, 2015, article that manual deploys are recommended when making changes, such as testing a new feature branch. Automatic deployments are initiated each time developers push to a designated branch, or for teams, when the continuous integration server finishes and commits successfully.

To use Heroku's Dropbox Sync option, create a "Heroku" subfolder in your Dropbox account. The source code of the applications you deploy will be copied to that folder. Deployment from Dropbox to Heroku is triggered by kicking off a manual commit from the Heroku dashboard. You can't use both Dropbox Sync and GitHub Integration in the same app, but you can use Dropbox Sync with Heroku's standard git function.

Multiple developers can work on an application simultaneously by connecting their Dropbox accounts to the app, which delivers any changes to the source code automatically. However, Heroku warns that Dropbox lacks the robust features of a true source control management tool. For example, if a developer force-pushes the Heroku git repo, the Dropbox folders are unlinked because of the difficulty of generating a differences report.

The update's many new features put Heroku in a good position against the competition in the PaaS market, according to InfoQ's Seroter. The table below compares the native code deployment capabilities of the four leading PaaS providers.

The addition of GitHub Integration, Dropbox Sync, and other new Heroku features gives the service an edge over most PaaS competitors. Source: InfoQ

The most efficient way to manage your apps, databases, IT operations, and business services in real time is by using the Happy Apps service. Happy Apps lets you set up rules so you are alerted via SMS and email whenever incidents or specific events occur. You can group and monitor multiple apps, databases, web servers, and app servers. In addition to an overall status, you can view the status of each individual group member.

Happy Apps is the only app-management service to support SSH and agent-based connectivity to all your apps on public, private, and hybrid clouds. The service provides dependency maps for determining the impact your IT systems will have on other apps. All checks performed on your apps are collected in easy-to-read reports that can be analyzed to identify repeating patterns and performance glitches over time. Visit the Happy Apps site to sign up for a free trial.


Upcoming OpenSSL Security Overhaul Is Long Overdue


The many OpenSSL vulnerabilities coming to light in recent months have motivated a thorough audit of the open system's code. But this hasn't prevented companies from implementing proprietary SSL alternatives, including application delivery controllers running a streamlined, closed SSL stack, and Google's own BoringSSL implementation.

It's only March, but it has already been a rough year for OpenSSL security. On January 8, the OpenSSL Project issued updates that addressed eight separate security holes, two of which were rated as "moderate" in severity. SC Magazine's Adam Greenberg reports on the patches in a January 8, 2015, article.

Then in the first week of March, the FREAK vulnerability was disclosed, which made one-fourth of all SSL-encrypted sites susceptible to man-in-the-middle attacks, as InformationWeek Dark Reading's Kelly Jackson Higgins explains in a March 3, 2015, article.

Now site managers are sweating yet another OpenSSL patch for a security hole that could be just as serious as FREAK. In a mailing list notice posted on March 16, 2015, the OpenSSL Project's Matt Caswell announced the March 19, 2015, release of a patch for multiple OpenSSL vulnerabilities, at least one of which is classified as "high" severity. The Register's Darren Pauli reports in a March 17, 2015, post that the updates apply to OpenSSL versions 1.0.2a, 1.0.1m, 1.0.0r, and 0.9.8zf.

Web giants finance long-overdue OpenSSL security audit

The alert about the new OpenSSL vulnerability comes just over a week after it was announced that the NCC Group security firm would be conducting an audit of OpenSSL code. The goal of the audit is to spot errors in the code before they are discovered in the wild, as ZDNet's Steven J. Vaughan-Nichols writes in a March 7, 2015, article.

NCC Group principal security engineer Thomas Ritter states that the OpenSSL codebase is now stable enough to undergo a thorough analysis and revision. The focus of the NCC Group audit will be on Transport Layer Security attacks related to protocol flow, state transitions, and memory management. Preliminary results are expected by early summer 2015, according to Ritter.

OpenSSL is only one of the many Secure Sockets Layer/Transport Layer Security implementations for encrypting web content. Source: Ale Agostini

Serious vulnerabilities discovered in the recent past, including Heartbleed and Early CCS in OpenSSL (and Shellshock in Bash), caused sites to rush to apply patches. The OpenSSL audit is the first project under the Linux Foundation's Core Infrastructure Initiative, which is funded in large part by contributions from Google, Amazon, Cisco Systems, Microsoft, and Facebook, as the Register's Pauli notes in a March 10, 2015, article.

Proprietary SSL implementations as a safer alternative to OpenSSL

The number and severity of OpenSSL security holes have caused some organizations to build their own proprietary SSL stack on application delivery controllers (ADCs), as FirstPost's Shibu Paul describes in a March 9, 2015, article. ADCs are a new type of advanced load balancer for frontend servers that runs a streamlined version of the SSL stack, one designed to be small enough to execute in the kernel.

An advantage of proprietary SSL stacks is that hackers don't have access to the code the way they do for open systems. If an organization discovers a vulnerability in its proprietary SSL stack, it can address the problem without the public being aware of it. That's why companies using ADCs weren't susceptible to Heartbleed or man-in-the-middle attacks.

Application delivery controllers are touted as a more-secure alternative to OpenSSL because they rely on a proprietary SSL stack. Source: Scope Middle East

Google's response to the many OpenSSL vulnerabilities was to create its own version of the encryption standard, called BoringSSL. As Matthew McKenna writes in a February 25, 2015, post on the TechZone 360 site, having to manage more than 70 OpenSSL patches was making it difficult for the company to maintain consistency across its multiple code bases.

Maintaining security without impacting manageability is a key precept of the new Morpheus Virtual Appliance, which lets you provision, deploy, and monitor heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single point-and-click console. With the Morpheus database-as-a-service (DBaaS) you can manage all your SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site for pricing information and to create a free account.

MySQL vs. MongoDB: The Pros and Cons When Building a Social Network


Choosing the right database for a project is an extremely important step in planning and development. Picking the wrong setup can cost quite a bit of time and money, and can leave you with numerous upset users in the process. Both MongoDB and MySQL are excellent databases when used in their expected ways, but which one is better for building a social network?

What is MongoDB?

MongoDB is a NoSQL database, which means that related data gets stored in single documents for fast retrieval. This is often a good model for when data won’t need to be duplicated in multiple documents (which can cause inconsistencies).
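As an illustrative sketch (not MongoDB's original example), a blog post and its comments can live together in one document:

    {
      "_id": ObjectId("54f1b2c3d4e5f60718293a4b"),
      "author": "jdoe",
      "title": "My first post",
      "body": "Hello, world!",
      "comments": [
        { "user": "asmith", "text": "Nice post!" },
        { "user": "bjones", "text": "Thanks for sharing." }
      ]
    }

Because the comments travel with the post, a single read retrieves everything -- no joins required.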

An example of a MongoDB document. Source: MongoDB.

MongoDB is easily scalable in such cases, so the database can have rapid horizontal growth while automatically keeping things in order. This can be especially good when you have large amounts of data and need a quick response time.

What is MySQL?

MySQL is a relational database, which means that data gets stored (preferably) in normalized tables so that there is no duplication. This is a good model when you need data to be consistent and reliable at all times (such as personal information or financial data).
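As an illustrative sketch (not MySQL's original example), personal data might be normalized into a typed table like this:

    CREATE TABLE users (
      id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
      username VARCHAR(50)  NOT NULL,
      email    VARCHAR(255) NOT NULL,
      created  DATETIME     NOT NULL,
      PRIMARY KEY (id),
      UNIQUE KEY (username)
    );

Each fact is stored exactly once, and relationships to other tables are expressed through keys rather than duplication.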


An example of a MySQL table. Source: MySQL.

While horizontal scaling can be more difficult, it does adhere to the ACID model (atomicity, consistency, isolation, durability), which means you have far fewer worries about data reliability.

How Does Social Networking Work?

Social networks offer different ways for people to connect. Whether it is through a mutual friendship, a business associate, or following a well-known person or business for updates, there are numerous methods of getting information out over social networks.

The key ingredient is in the connections: for anyone a user is connected with or following, that person will typically see the updates from all of those connections once logged in to the system.

An example social network relationship diagram. Source: SarahMei.com.

Comparison of the Databases

Given that social data is highly relational -- above all, from user to user -- it lends itself better to a relational database over time. Even though a NoSQL solution like MongoDB can seem like a great way to retrieve lots of data quickly, the relational nature of users in a social network can cause lots of duplication to occur.

Such duplication lends itself to data becoming inconsistent and/or unreliable over time, or to queries becoming much more difficult to handle if the duplication is removed (since documents will likely need to point to other documents, which is not optimal for a NoSQL type of database).
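To make that concrete, here is a minimal sketch of how a relational design models connections, building on the hypothetical users table above:

    -- Each follow relationship is stored exactly once in a join table.
    CREATE TABLE follows (
      follower_id INT UNSIGNED NOT NULL,
      followed_id INT UNSIGNED NOT NULL,
      PRIMARY KEY (follower_id, followed_id)
    );

    -- Everyone user 42 follows, with no duplicated user data.
    SELECT u.username
    FROM follows f
    JOIN users u ON u.id = f.followed_id
    WHERE f.follower_id = 42;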

As a result, MySQL would be the better recommendation, since it will have the data reliability and relational tools necessary to handle the interactions and relationships among numerous users. You may also decide to use both MySQL and MongoDB together to utilize the best features of each database.

Get MySQL or MongoDB

Whether you decide to use one or both databases, the new Morpheus Virtual Appliance seamlessly provisions and manages both SQL and NoSQL databases across private and public (or even hybrid) clouds. With its easy-to-use interface, you can have a new instance of a database up and running in seconds.

Visit the Morpheus site to create a free account.

New Compilers Streamline Optimization and Enhance Code Conversion


Researchers are developing compiler technologies that optimize and regenerate code in multiple languages and for many different platforms in only one or a handful of steps. While much of their work focuses on Java and JavaScript, their innovations will impact developers working in nearly all programming languages.

Who says you can't teach an old dog new tricks? One of the staples of any developer's code-optimization toolkit is a compiler, which checks your program's syntax, semantics, and other aspects for errors and otherwise optimizes its performance.

Infostructure Associates' Wayne Kernochan explains in an October 2014 TechTarget article that compilers are particularly adept at improving the performance of big data and business-critical online transaction processing (OLTP) applications. As recent developments in compiler technology point out, the importance of the programs goes far beyond these specialty apps.

Google is developing two new Java compilers named Jack (Java Android Compiler Kit) and Jill (Jack Intermediate Library Linker) that are part of Android SDK 21.1. I Programmer's Harry Fairhead writes in a December 12, 2014, article that Jack compiles Java source code directly to a .dex Dalvik Executable. The standard route uses the javac compiler to convert source code to Java bytecode, which is then fed through the dex compiler to produce Dalvik bytecode.

In addition to skipping the conversion to Java bytecode, Jack optimizes and applies Proguard's obfuscation in a single step. The .dex code Jack generates can be fed either to the Dalvik engine or to the new ART (Android RunTime) engine, which uses ahead-of-time compilation to improve speed.

Jill converts .jar library files into the .jack library format so they can be merged with the rest of the object code.

Google's new Jack and Jill Java compilers promise to speed up compilation by generating Dalvik bytecode without first having to convert it from Java bytecode. Source: I Programmer

In addition to streamlining compilation, Jack and Jill reduce Google's reliance on Java APIs, which are the subject of the company's ongoing lawsuit with Oracle. At present, the compilers don't support Java 8, but in terms of retaining compatibility with Java, it appears Android has become the tail wagging the dog.

Competition among open-source compiler infrastructures heats up

The latest versions of the LLVM and Gnu Compiler Collection (GCC) are in a race to see which can out-perform the other. Both open-source compiler infrastructures generate object code from any kind of source code; they support C/C++, Objective-C, Fortran, and other languages. InfoWorld's Serdar Yegulalp reports in a September 8, 2014, article that testing conducted by Phoronix of LLVM 3.5 and a pre-release version of GCC 5 found that LLVM recorded faster C/C++ compile times. However, LLVM trailed GCC when processing some encryption algorithms and other tests.

Version 3.5 of the LLVM compiler infrastructure outperformed GCC 5 in some of Phoronix's speed tests but trailed in others, including audio encoding. Source: Phoronix

The ability to share code between JavaScript and Windows applications is a key feature of the new DuoCode compiler, which supports cross-compiling of C# code into JavaScript. InfoWorld's Paul Krill describes the new compiler in a January 22, 2015, article. DuoCode uses Microsoft's Roslyn compiler for code parsing, abstract syntax tree (AST) generation, and contextual analysis. DuoCode then handles the code translation and JavaScript generation, including source maps.

Another innovative approach to JavaScript compiling is the experimental Higgs compiler created by University of Montreal researcher Maxime Chevalier-Boisvert. InfoWorld's Yegulalp describes the project in a September 19, 2014, article. Higgs differs from other just-in-time (JIT) JavaScript compilers such as Google's V8, Mozilla's SpiderMonkey, and Apple's LLVM-backed FTLJIT project in that it has a single level rather than being multitiered, and it accumulates type information as machine-level code rather than using type analysis.

When it comes to optimizing your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases, the new Morpheus Virtual Appliance makes it as easy as pointing and clicking in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with a single click, and each instance includes a free full replica set for failover and fault tolerance. Your MySQL and Redis databases are backed up and you can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

How Three Companies Improved Performance and Saved Money with MongoDB


Too often the increased efficiencies and performance improvements promised by new data technologies seem to vanish into thin air when the systems hit the production floor. Not so for these three companies that implemented MongoDB databases in very different environments, but realized very similar benefits: faster app speeds and lower overall system costs.

A hedge fund reduced its software licensing costs by a factor of 40, and its data storage by 40 percent. In addition, its quantitative analysts' modeling is now 25 times faster.

A retailer has installed in-store touch screens that give its customers an enjoyable, interactive shopping experience. The company can create and modify its online catalogs in just minutes to keep pace with ever-changing fashion trends.

A firm that provides affiliate-marketing and partner-management services for enterprises was able to expand without incurring the added expenses for hardware and services it anticipated. The company's customers realized improved performance because the new system's compression and other storage enhancements allowed more of their report requests to be processed in RAM.

All three of these success stories were made possible by converting the companies' traditional databases to MongoDB.

Hedge fund adopts a self-service model for financial analyses

In the past, adding a new data source at British hedge fund AHL Man Group was a long, drawn-out process that piled onto the IT department's busy workload. As ComputerWeekly's Brian McKenna reports in a January 21, 2015, article, AHL decided to standardize on Python in 2012, and subsequently discovered that Python interfaced very smoothly with its MongoDB databases.

By the end of 2013 the company had completed a proof-of-concept project, after which it was able to finalize its transition to MongoDB by the end of May 2014, at which time its legacy-system licenses expired. The result was a 40-fold decrease in licensing costs, and a 40 percent reduction in disk-storage requirements. In addition, the switch to a self-service model has allowed some of the firm's analysts to perform their "quant" modeling up to 25 times faster than previously.

Retailer's in-store tablets keep pace with fashion trends

Another January 21, 2015, article on the Apparel site recounts how retailer Chico's FAS developed a MongoDB-based application for its in-store touch-screen Tech Tablets that customers use as virtual catalogs. In addition to highlighting Chico's latest styles, the tablets show product videos and testimonials. The key benefit of the MongoDB application is the ability to create and adapt catalogs in minutes rather than the weeks required previously.

It took Chico's only five months to develop and implement the MongoDB-based app, which easily scaled to meet the retailer's increased demand in the holiday shopping season. More importantly, the app created an interactive, personalized shopping experience that's sure to bring its customers back for more.

MongoDB distro lets expanding marketer avoid high hardware costs

As a company grows, its data networks have to grow along with it, which often increases cost and complexity exponentially. Affiliate-marketing and partner-management firm Performance Horizon Group (PHG) faced skyrocketing hardware expenses as it grew its operations supporting enterprise clients in more than 150 countries.

By implementing Tokutek's TokuMX distribution of MongoDB, PHG reduced its need for new servers by a factor of eight, according to PHG CTO Pete Cheyne. In addition, each of the new servers required only half the RAM of its existing machines while accommodating a growing number of data sets. PHG's implementation of TokuMX is described in a December 2, 2014, Tokutek press release.

Any organization can improve the efficiency of its database-management operations by adopting the new Morpheus Virtual Appliance, which lets you manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

The Key to Selecting a Programming Language: Focus


There isn't a single best programming language. Rather than flitting from one language to the next as each comes into fashion, determine the platform you want to develop apps for -- the web, mobile, gaming, embedded systems -- and then focus on the predominant language for that area.

"Which programming languages do you use?"

In many organizations, that has become a loaded question. There is a decided trend toward open source development tools, as indicated by the results of a Forrester Research survey of 1,400 developers. ZDNet's Steven J. Vaughan-Nichols reports on the study in an October 29, 2014, article.

Conventional wisdom says open-source development tools are popular primarily because they cost less than their proprietary counterparts. That belief is turned on its head by the Forrester survey, which found performance and reliability are the main reasons why developers prefer to work with open-source tools. (Note that Windows still dominates on the desktop, while open source leads on servers, in data centers, and in the cloud.)

Then again, "open source" encompasses a universe of different development tools for various platforms: the web, mobile, gaming, embedded systems -- the list goes on. A would-be developer can waste a lot of time bouncing from Rails to Django to Node.js to Scala to Clojure to Go. As Quincy Larson explains in a November 14, 2014, post on the FreeCodeCamp blog, the key to a successful career as a programmer is to focus.

Larson recounts his seven months of self-study of a half-dozen different programming languages before landing his first job as a developer -- in which he used none of them. Instead, his team used Ruby on Rails, a relative graybeard among development environments. The benefits of focusing on a handful of tools are many: developers quickly become experts, productivity is enhanced because people can collaborate without a difference in tools getting in the way, and programmers aren't distracted by worrying about missing out on the flavor of the month.

Larson recommends choosing a single type of development (web, gaming, mobile) and sticking with it; learning only one language (JavaScript/Node.js, Rails/Ruby, or Django/Python); and following a single online curriculum (such as FreeCodeCamp.com or NodeSchool.io for JavaScript, TheOdinProject.com or TeamTreehouse.com for Ruby, and Udacity.com for Python).


A cutout from Lifehacker's "Which Programming Language?" infographic lists the benefits of languages by platform. Source: Lifehacker

Why basing your choice on potential salary is a bad idea

Just because you can make a lot of money developing in a particular language doesn't mean it's the best career choice. ReadWrite's Matt Asay points out in a November 28, 2014, article that a more rewarding criterion in the long run is which language will ensure you can find a job. Asay recommends checking RedMonk's list of popular programming languages.

Boiling the decision down to its essence, the experts quoted by Asay suggest JavaScript for the Internet, Go for the cloud, and Swift (Apple) or Java (Android) for mobile. Of course, as with most tech subjects, opinions vary widely. In terms of job growth, Ruby appears to be fading, Go and Node.js are booming, and Python is holding steady.

But don't bail on Ruby or other old-time languages just yet. According to Quartz's programmer salary survey, Ruby on Rails pays best, followed by Objective C, Python, and Java.

While Ruby's popularity may be on the wane, programmers can still make good coin if they know Ruby on Rails. Source: Quartz, via ReadWrite

Also championing old-school languages is ReadWrite's Lauren Orsini in a September 1, 2014, article. Orsini cites a study by researchers at Princeton and UC Berkeley that found inertia is the primary driver of developers' choice of language. People stick with a language because they know it, not because of any particular features of the language. Exhibits A, B, C, and D of this phenomenon are PHP, Python, Ruby, and JavaScript -- and that doesn't even include the Methuselah of languages: C.

No matter your language of choice, you'll find it combines well with the new Morpheus Virtual Appliance, which lets you monitor and manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases from a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with a single click, and each instance includes a free full replica set for failover and fault tolerance. Your MySQL and Redis databases are backed up and you can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

MongoDB Poised to Play a Key Role in Managing the Internet of Things


Rather than out-and-out replacing their relational counterparts, MongoDB and other NoSQL databases will coexist with traditional RDBMSs. However, as more -- and more varied -- data swamps companies, the scalability and data-model flexibility of NoSQL will make it the management platform of choice for many of tomorrow's data-analysis applications.

There's something comforting in the familiar. When it comes to databases, developers and users are warm and cozy with the standard, nicely structured tables-and-rows relational format. In the not-too-distant past, nearly all of the data an organization needed fit snugly in the decades-old relational model.

Well, things change. What's changing now is the nature of a business's data. Much time and effort has been spent converting today's square-peg unstructured data into the round hole of relational DBMSs. But rather than RDBMSs being modified to support the characteristics of non-textual, non-document data, companies are now finding it more effective to adapt databases designed for unstructured data to accommodate traditional data types.

Two trends are converging to make this transition possible: NoSQL databases such as MongoDB are maturing to add the data-management features businesses require; and the amount and types of data are exploding with the arrival of the Internet of Things (IoT).

Heterogeneous DBs are the wave of the future

As ReadWrite's Matt Asay reports in a November 28, 2014, article, any DBAs who haven't yet added a NoSQL database or two to their toolbelt are in danger of falling behind. Asay cites a report by Machine Research that found relational and NoSQL databases are destined to coexist in the data center: the former will continue to be used to process "structured, highly uniform data sets," while the latter will manage the unstructured data created by "millions and millions of sensors, devices, and gateways."

Relational databases worked for decades because you could predict the characteristics of the data they held. One of the distinguishing aspects of IoT data is its unpredictability: you can't be sure where it will come from, or what forms it will take. Managing this data requires a new set of skills, which has led some analysts to caution that a severe shortage of developers trained in NoSQL may impede the industry's growth.

The expected increase in NoSQL-based development in organizations could be hindered by a shortage of skilled staff. Source: VisionMobile, via ReadWrite

The ability to scale to accommodate data elements measured in the billions is a cornerstone of NoSQL databases, but Asay points out the feature that will drive NoSQL adoption is flexible data modeling. Whatever devices or services are deployed in the future, NoSQL is ready for them.

Document locking one sign of MongoDB's growing maturity

According to software consultant Andrew C. Oliver -- a self-described "unabashed fan of MongoDB" -- the highlight of last summer's MongoDB World conference was the announcement that document-level locking is now supported. Oliver gives his take on the conference happenings in a July 3, 2014, article on InfoWorld.

Oliver compares MongoDB's document-level locking to row-level locking in an RDBMS, although documents may contain much more data than a row in an RDBMS. Some conference-goers projected that multiple documents may one day be written with ACID consistency, even if done so "locally" to a single shard.

Another indication of MongoDB becoming suitable for a wider range of applications is the release of the SlamData analytics tool that works without having to export data via ETL from MongoDB to an RDBMS or Hadoop. InfoWorld's Oliver describes SlamData in a December 11, 2014, article.

In contrast to the Pentaho business-intelligence tool that also supports MongoDB, SlamData CEO Jeff Carr states that the company's product doesn't require a conversion of document databases to the RDBMS format. SlamData is designed to allow people familiar with SQL to analyze data based on queries of MongoDB document collections via a notebook-like interface.

The SlamData business-intelligence tool for MongoDB uses a notebook metaphor for charting data based on collection queries. Source: InfoWorld

There's no simpler or more-efficient way to manage heterogeneous databases than by using the point-and-click interface of the new Morpheus Virtual Appliance, which lets you monitor and analyze heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.

Troubleshoot Lost MySQL Remote Connections


There aren't many MySQL databases that don't need to support users' remote connections. While many failed remote connections can be traced to a misconfigured my.cnf file, the many nuances of MySQL remote links make troubleshooting a dropped network connection anything but straightforward.

Some Linux administrators were rattled this week to learn of the discovery by Qualys of a bug in the GNU C Library (glibc) that could render affected systems vulnerable to a remote code execution attack. In a January 27, 2015, article, The Register's Neil McAllister describes the dangers posed by Ghost to Linux and a handful of other OSes.

Ghost affects versions of glibc back to 2.2, which was released in 2000, but as threats go, this one appears to be pretty mild. For one thing, the routines involved are old and rarely used these days. Even when they are used, they aren't called in a manner the vulnerability could exploit. Still, Linux vendors Debian, Red Hat, and Ubuntu have released patches for Ghost.

As Ars Technica's Dan Goodin explains in a January 27, 2015, article, Ghost may affect MySQL servers, Secure Shell servers, form-submission apps, and mail servers in addition to the Exim server on which Qualys demonstrated the remote code execution attack. However, Qualys has confirmed that Ghost does not impact Apache, Cups, Dovecot, GnuPG, isc-dhcp, lighttpd, mariadb/mysql, nfs-utils, nginx, nodejs, openldap, openssh, postfix, proftpd, pure-ftpd, rsyslog, samba, sendmail, sysklogd, syslog-ng, tcp_wrappers, vsftpd, or xinetd.

Finding a solution to common MySQL remote-connection glitches

While protecting against Ghost may be as simple as applying a patch, managing remote connections on MySQL servers and clients can leave DBAs pounding their keyboards. ITworld's Stephen Glasskeys writes in a December 19, 2014, post about the hoops he had to jump through to find the cause of a failed remote connection on a Linux MySQL server.

After using the ps command to list running processes, Glasskeys found that the --skip-networking option was enabled, which tells MySQL not to listen for remote TCP/IP connections. Running KDE's Find Files/Folders tool showed that rc.mysqld was the only script file containing the text "--skip-networking".
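Reproducing that first check is a one-liner (the output line is abbreviated and illustrative):

    # Look for the offending flag on the running MySQL process
    ps aux | grep mysqld
    #   mysql  1234 ... /usr/libexec/mysqld --skip-networking ...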

Diagnosing the cause of failed remote connections on a MySQL server led to the file rc.mysqld. Source: ITworld

To restore remote connections, open rc.mysqld and comment out the command by placing a pound sign (#) at the beginning of the line. Then edit the MySQL configuration file /etc/my.cnf as follows, making sure bind-address is set to 0.0.0.0:
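A sketch of the relevant my.cnf lines (everything beyond bind-address is illustrative):

    # /etc/my.cnf -- listen on all interfaces instead of localhost only
    [mysqld]
    bind-address = 0.0.0.0
    port         = 3306
    # skip-networking      <- must be absent or commented out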

Edit the /etc/my.cnf file to ensure bind-address is set to 0.0.0.0. Source: ITworld

Finally, use the "iptables --list" command to make sure the Linux server accepts requests on MySQL's port 3306, and add an iptables rule to accept them if it doesn't. After you restart MySQL, you can test the remote connection using the credentials and other options as they appear in the my.cnf file on the Linux server.
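A sketch of that check and fix (rule placement depends on your existing firewall configuration):

    # Confirm port 3306 is accepted
    iptables --list | grep 3306
    # If it isn't, insert an accept rule for MySQL traffic
    iptables -I INPUT -p tcp --dport 3306 -j ACCEPT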

When MySQL's % wildcard operator leaves a remote connection hanging

A Stack Overflow post from April 2013 describes a situation in which MySQL's % wildcard operator failed to allow a user ("user@%") to connect remotely. Such remote connections require that my.cnf on each machine bind MySQL's port 3306 to the machine's IP address. Also, the user has to be created for both localhost and the % wildcard host, and granted permissions on all databases. (You may also need to open port 3306, depending on your OS.)
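A sketch of the user and grant statements (the user name, password, and database are placeholders):

    CREATE USER 'user'@'localhost' IDENTIFIED BY 'secret';
    CREATE USER 'user'@'%' IDENTIFIED BY 'secret';
    GRANT ALL PRIVILEGES ON mydb.* TO 'user'@'localhost';
    GRANT ALL PRIVILEGES ON mydb.* TO 'user'@'%';
    FLUSH PRIVILEGES;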

Enable remote connections in the MySQL my.cnf file by adding each machine's IP and creating users in localhost and the % wildcard. Source: ITworld

Diagnosing failed remote connections and other database glitches is facilitated by the point-and-click interface of the new Morpheus Virtual Appliance, which lets you manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site to create a free account.


MongoDB 3.0 First Look: Faster, More Storage Efficient, Multi-model


Document-level locking and pluggable storage APIs top the list of new features in MongoDB 3.0, but the big-picture view points to a more prominent role for NoSQL databases in companies of all types and sizes. The immediate future of databases is relational, non-relational, and everything in between -- sometimes all at once.

Version 3.0 of MongoDB, the leading NoSQL database, is being touted as the first release that is truly ready for the enterprise. The new version was announced in February and shipped in early March. At least one early tester, Adam Comerford, reports that MongoDB 3.0 is indeed more efficient at managing storage, and faster at reading compressed data.

The new feature in MongoDB 3.0 gaining the lion's share of analysts' attention is the addition of the WiredTiger storage engine and pluggable API that MongoDB acquired in December 2014. JavaWorld's Andrew C. Oliver states in a February 3, 2015, article that WiredTiger will likely boost performance over MongoDB's default MMapV1 engine in apps where reads don't greatly outnumber writes.

Oliver points out that WiredTiger's B-tree and Log Structured Merge (LSM) algorithms benefit apps with large caches (B-tree) and with data that doesn't cache well (LSM). WiredTiger also promises data compression that reduces storage needs by up to 80 percent, according to the company.
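Selecting the engine happens at server startup; a minimal sketch (the data path is illustrative):

    # Start MongoDB 3.0 with WiredTiger instead of the default MMapV1
    mongod --storageEngine wiredTiger --dbpath /var/lib/mongo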

The addition of the WiredTiger storage engine is one of the new features in MongoDB 3.0 that promises to improve performance, particularly for enterprise customers. Source: Software Development Times

Other enhancements in MongoDB 3.0 include the following:

  • Document-level locking for concurrency control via WiredTiger
  • Collection-level concurrency control and more efficient journaling in MMapV1
  • A pluggable API for integration with in-memory, encrypted, HDFS, hardware-optimized, and other environments
  • The Ops Manager graphical management console in the enterprise version

Computing's John Leonard emphasizes in a February 3, 2015, article that MongoDB 3.0's multi-model functionality via the WiredTiger API positions the database to compete with DataStax' Apache Cassandra NoSQL database and Titan graph database. Leonard also highlights the new version's improved scalability.


Putting MongoDB 3.0 to the (performance) test

MongoDB 3.0's claims of improved performance were borne out by preliminary tests conducted by Adam Comerford and reported on his Adam's R&R blog in posts on February 4, 2015, and February 5, 2015. Comerford repeated compression tests with the WiredTiger storage engine in release candidate 7 (RC7) -- expected to be the last before the final version comes out in March -- that he ran originally using RC0 several months ago. The testing was done on an Ubuntu 14.10 host with an ext4 file system.

The results showed that WiredTiger's on-disk compression reduced storage to 24 percent of non-compressed storage, and to only 16 percent of the storage space used by MMapV1. Similarly, the defaults for WiredTiger with MongoDB (the WT/snappy bar below) used 50 percent of non-compressed WiredTiger and 34.7 percent of MMapV1.

Testing WiredTiger storage (compressed and non-compressed) compared to MMapV1 storage showed a tremendous advantage for the new MongoDB storage engine. Source: Adam Comerford

Comerford's tests of the benefits of compression for reads when available I/O capacity is limited demonstrated much faster performance when reading compressed data using snappy and zlib, respectively. A relatively slow external USB 3.0 drive was used to simulate "reasonable I/O constraints." The times indicate how long it took to read the entire 16GB test dataset from the on-disk testing into memory from the same disk.

Read tests from compressed and non-compressed disks in a simulated limited-storage environment indicate faster reads with WiredTiger in all scenarios. Source: Adam Comerford


All signs point to a more prominent role in organizations of all sizes for MongoDB in particular and NoSQL in general. Running relational and non-relational databases side-by-side is becoming the rule rather than the exception. The new Morpheus Virtual Appliance puts you in good position to be ready for multi-model database environments. It supports rapid provisioning and deployment of MongoDB v3.0 across public, private and hybrid clouds. Sign Up for a Free Trial now!

Preparing Developers for a Multi-language Multi-paradigm Future


Tried-and-true languages such as Java, C++, Python, and JavaScript continue to dominate the most popular lists, but modern app development requires a multi-language approach to support diverse platforms and links to backend servers. The future will see new languages being used in conjunction with the old reliables.

Every year, new programming languages are developed. Recent examples are Apple's Swift and Carnegie Mellon University's Wyvern. Yet for more than a decade, the same handful of languages have retained their popularity with developers -- Java, JavaScript, C/C++/C#/Objective-C, Python, Ruby, PHP -- even though each is considered to have serious shortcomings for modern app development.

According to TIOBE Software's TIOBE Index for January 2015, JavaScript recorded the greatest increase in popularity in 2014, followed by PL/SQL and Perl.

The same old programming languages dominate the popularity polls, as shown by the most-recent TIOBE Index. Source: TIOBE Software

Of course, choosing the best language for any development project rarely boils down to a popularity contest. When RedMonk's Donnie Berkholz analyzed GitHub language trends in May 2014, aggregating new users, issues, and repositories, he concluded that only five languages have mattered on GitHub since 2008: JavaScript, Ruby, Java, PHP, and Python.


An analysis of language activity on GitHub between 2008 and 2013 indicates growing fragmentation. Source: RedMonk

Two important caveats to Berkholz's analysis are that GitHub focused on Ruby on Rails when it launched but has since gone more mainstream; and that Windows and iOS development barely register because both are generally open source-averse. As IT World's Phil Johnson points out in a May 7, 2014, article, while it's dangerous to draw conclusions about language popularity based on this or any other single analysis, it seems clear the industry is diverging rather than converging.

Today's apps require a multi-language, multi-paradigm approach

Even straightforward development projects require expertise in multiple languages. TechCrunch's Danny Crichton states in a July 10, 2014, article that creating an app for the web and mobile entails HTML, CSS, and JavaScript for the frontend (others as well, depending on the libraries required); Java and Objective-C (or Swift) for Android and iPhone, respectively; and for links to backend servers, Python, Ruby, or Go, as well as SQL or other database query languages.

Crichton identifies three trends driving multi-language development. The first is faster adoption of new languages: GitHub and similar sites encourage broader participation in developing libraries and tutorials; and developers are more willing to learn new languages. Second, apps have to run on multiple platforms, each with unique requirements and characteristics. And third, functional programming languages are moving out of academia and into the mainstream.

Researcher Benjamin Erb suggests that rather than functional languages replacing object-oriented languages, the future will be dominated by multi-paradigm development, in particular to address concurrency requirements. In addition to supporting objects, inheritance, and imperative code, multi-paradigm languages incorporate higher-order functions, closures, and restricted mutability.

One way to future-proof your SQL, NoSQL, and in-memory databases is by using the new Morpheus Virtual Appliance, which lets you manage heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases in a single dashboard. Morpheus is the first and only database-as-a-service (DBaaS) that supports SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds.

With Morpheus, you can invoke a new database instance with one click, and each instance includes a free full replica set for failover and fault tolerance. You can administer your databases using your choice of tools. Visit the Morpheus site for pricing information and to create a free account.

Don't Go Into Elasticsearch Production without this Checklist


Elasticsearch is a great tool to provide fast and powerful search services to your web sites or applications, but care should be taken when moving from development to production. By following the checklist below, you can avoid some issues that may arise if you use development settings in a production environment!

Configure Your Log and Data Paths

To minimize the chances of data loss in a production environment, it is highly recommended that you change your log and data paths from the default paths to something that is less likely to be accidentally overwritten.

You can make these changes in the configuration file (which uses YAML syntax) under path, as in the following example, which uses suggested production paths from the Elasticsearch team:
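A reconstruction of that example (adjust the paths to your own layout):

    # elasticsearch.yml
    path:
      logs: /var/log/elasticsearch
      data: /var/data/elasticsearch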

Suggested settings for the log and data paths. Source: Elasticsearch.

Configure Your Node and Cluster Names

When you are looking for a particular node or cluster, it is a good idea to have names that describe what you need to find and distinguish one node or cluster from another.

The default cluster name of "elasticsearch" could allow any node started with the default settings to join the cluster, even if that was not intended. Thus, it is a good idea to give the cluster a distinct identifier instead.

The default node names are chosen randomly from a set of roughly 3,000 Marvel character names. While this wouldn't be so bad for a node or two, it could get quite confusing once you add more than a few nodes. The better option is to use descriptive names from the beginning to avoid potential confusion as nodes are added later.
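A sketch of both settings in elasticsearch.yml (the names are illustrative):

    cluster.name: logging_prod      # distinct identifier, not "elasticsearch"
    node.name: search-node-01       # descriptive instead of a random Marvel name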

Configure Memory Settings

If the operating system starts swapping memory, the Elasticsearch process itself can be swapped out to disk, which is disastrous for a production deployment. The Elasticsearch team's suggested fixes include disabling swapping entirely, configuring the OS to swap only in emergency conditions, or (for Linux/Unix users) using mlockall to lock the process's address space into RAM so it can't be swapped.
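On Linux, those suggestions map to settings like this sketch, one at the OS level and one in elasticsearch.yml:

    # Tell the kernel to swap only in emergency conditions (run as root)
    sysctl -w vm.swappiness=1

    # elasticsearch.yml -- lock the process address space into RAM
    bootstrap.mlockall: true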

Configure Virtual Memory Settings

Elasticsearch indices use mmapfs/niofs, but the default mmap count on many operating systems is too low. If so, you will end up with errors such as "out of memory" exceptions. To fix this, you can raise the limit to accommodate Elasticsearch indices. The following example shows how the Elasticsearch team recommends increasing this limit on Linux systems (run the command as root):
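A sketch of the command (262144 is the commonly recommended value):

    sysctl -w vm.max_map_count=262144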

Suggested command to increase the mmap count for Linux systems. Source: Elasticsearch.

Ensure Elasticsearch Is Monitored

It is a good idea to monitor your Elasticsearch installation so that you can see the status or be alerted if or when something goes wrong. A service such as Happy Apps can provide this type of monitoring for you (and can monitor the rest of your app as well).

Get ElasticSearch in the Cloud

When you launch your application that uses Elasticsearch, you will want reliable and stable database hosting. Morpheus Virtual Appliance is a tool that allows you to manage heterogeneous databases in a single dashboard.

With Morpheus, you have support for SQL, NoSQL, and in-memory databases like Redis across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

Troubleshooting Problems with MySQL Replication


One of the most common MySQL operations is replicating databases between master and slave servers. While most such connections are straightforward to establish and maintain, on occasion something goes amiss: some master data may not replicate on the slave, or read requests may be routed to the master rather than to a slave, for example. Finding a solution to a replication failure sometimes requires a little extra detective work.

Replication is one of the most basic operations in MySQL -- and any other database: it's used to copy data from one database server (the master) to one or more others (the slaves). The process improves performance by allowing loads to be distributed among multiple slave servers for reads, and by limiting the master server to writes.

Additional benefits of replication are security via slave backups; analytics, which can be performed on the slaves without affecting the master's performance; and widespread data distribution, which is accomplished without requiring access to the master. (See the MySQL Reference Manual for more on replication.)

As with any other aspect of database management, replication doesn't always proceed as expected. The Troubleshooting Replication section of the MySQL Reference Manual instructs you to check for messages in your error log when something goes wrong with replication. If the error log doesn't point you to the solution, ensure that binary logging is enabled in the master by issuing a SHOW MASTER STATUS statement. If it's enabled, "Position" is nonzero; if it isn't, make sure the master is running with the --log-bin option.

The manual offers several other replication-troubleshooting steps (a sketch of the corresponding statements follows the list):

  • The master and slave must both start with the --server-id option, and each server must have a unique ID value;
  • Run SHOW SLAVE STATUS to ensure the Slave_IO_Running and Slave_SQL_Running values are both "yes";
  • Run SHOW PROCESSLIST and look in the State column to verify that the slave is connecting to the master;
  • If a statement succeeded on the master but failed on the slave, the nuclear option is to do a full database resynchronization, which entails deleting the slave's database and copying a new snapshot from the master. (Several less-drastic alternatives are described in the MySQL manual.)
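Run from the mysql client, those checks look something like this sketch:

    -- On the master: confirm binary logging (Position should be nonzero)
    SHOW MASTER STATUS;

    -- On the slave: Slave_IO_Running and Slave_SQL_Running should both be Yes
    SHOW SLAVE STATUS\G

    -- Verify in the State column that the slave has connected to the master
    SHOW PROCESSLIST;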

Solutions to real-world MySQL replication problems

What do you do when MySQL indicates the master-slave connection is in order, yet some data on the master isn't being copied to the slave? That's the situation described in a Stack Overflow post from March 2010.

Even though replication appears to be configured correctly, data is not being copied from the master to the slave. Source: Stack Overflow

The first step is to run "show master status" or "show master status\G" on the master database to get the correct values for the slave. The slave status above indicates the slave is connected to the master and awaiting log events. Syncing to the correct log file position should restore copying to the slave.

To ensure a good sync, stop the master, dump the database, record the master log file positions, restart the master, import the database to the slave, and start the slave in slave mode with the correct master log file position.
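One common way to script that sequence relies on mysqldump's --master-data option, which records the master's log coordinates in the dump; in this sketch the host, log file, and position are placeholders:

    # On the master: dump all databases with the binary log coordinates noted
    mysqldump --master-data=2 --all-databases > master_snapshot.sql

    # On the slave: import the snapshot, then point the slave at the recorded position
    mysql < master_snapshot.sql
    mysql -e "CHANGE MASTER TO MASTER_HOST='master.example.com', \
              MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4567; \
              START SLAVE;"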

Another Stack Overflow post from March 2014 presents a master/slave setup using JDBC drivers in which transactions marked as read-only were still pinging the master. Since the MySQL JDBC driver was managing the connections to the physical servers -- master and slave -- the connection pool and Spring transaction manager weren't aware that the database connection was linking to multiple servers.

The solution is to return control to Spring, after which the transaction on the connection will be committed. The transaction debug message will indicate that queries will be routed to the slave server so long as the connection is in read-only mode. By resetting the connection before it is returned to the pool, the read-only mode is cleared and the last log message will show that queries are now being routed to the master server.
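For reference, a minimal Java sketch of this read-only routing using MySQL Connector/J's replication driver (the URL, credentials, and query are illustrative; the post's actual Spring setup differs):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class ReadOnlyRouting {
        public static void main(String[] args) throws Exception {
            // A replication URL lists the master first, then one or more slaves
            Connection conn = DriverManager.getConnection(
                "jdbc:mysql:replication://master.example.com,slave.example.com/mydb",
                "user", "secret");

            conn.setReadOnly(true);    // queries are routed to a slave
            ResultSet rs = conn.createStatement().executeQuery("SELECT 1");

            conn.setReadOnly(false);   // statements are routed back to the master
            conn.close();
        }
    }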

The point-and-click dashboard in the new Morpheus Virtual Appliance makes it a breeze to diagnose and repair replication errors -- and other hiccups -- in your heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases. Morpheus lets you seamlessly provision, monitor, and analyze SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and fail over.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

Avoid Being Locked into Your Cloud Services


Before you sign on the dotted line for a cloud service supporting your application development or other core IT operation, make sure you have an easy, seamless exit strategy in place. Just because an infrastructure service is based on open-source software doesn't mean you won't be locked in by the service's proprietary APIs and other specialty features.

In the quest for ever-faster app design, deployment, and updating, developers increasingly turn to cloud infrastructure services. These services promise to let developers focus on their products rather than on the underlying servers and other exigencies required to support the development process.

However, when you choose cloud services to streamline development, you run the risk of being locked in, at either the code level or the architecture level. Florian Motlik, CTO of continuous-integration service Codeship, writes in a February 21, 2015, article on Gigaom that infrastructure services mask the complexities underlying cloud-based development.

Depending on the type of cloud infrastructure service you choose, the vendor may manage more or less of your data operations. Source: Crucial

Even when the services you use adhere strictly to open systems, there is always a cost associated with switching providers: transfer the data, change the DNS, and thoroughly test the new setup. Of particular concern are services such as Google App Engine that lock you in at the code level. However, Amazon Web Services Lambda, Heroku, and other infrastructure services that let you write Node.js functions and invoke them either via an API or on specific events in S3, Kinesis, or DynamoDB entail a degree of architecture lock-in as well.

To minimize lock-in, Motlik recommends using a micro-services architecture based on technology supported by many different providers, such as Rails or Node.

Cloud Computing Journal's Gregor Petri identifies four types of cloud lock-in:

  • Horizontal locks you into a specific product and prevents you from switching to a competing service;
  • Vertical limits your choices in other levels of the stack, such as database or OS;
  • Diagonal locks you into a single vendor's family of products, perhaps in exchange for reduced management and training costs, or to realize a substantial discount;
  • Generational prevents you from adopting new technologies as they become available.

Gregor Petri identifies four types of cloud lock-in: horizontal, vertical, diagonal, and generational. Source: Cloud Computing Journal

Will virtualization bring about the demise of cloud lock-in?

Many cloud services are addressing the lock-in trap by making it easier for potential customers to migrate their data and development tools/processes from other platforms to the services' own environments. Infinitely Virtual founder and CEO Adam Stern claims that virtualization has "all but eliminated" lock-in related to operating systems and open source software. Stern is quoted by Linux Insider's Jack M. Germain in an article from November 2013.

Alsbridge's Rick Sizemore points out that even with the availability of tools for migrating data between VMWare, OpenStack, and Amazon Web Services, customers may be locked in by contract terms that limit when they can remove their data. Sizemore also cautions that services may combine open source tools in a proprietary way that locks in your data.

In a February 9, 2015, article in Network World, HotLink VP Jerry McLeod points out that you can minimize the chances of becoming locked into a particular service by ensuring that you can move hybrid workloads seamlessly between disparate platforms. McLeod warns that vendors may attempt to lock in their customers by requiring that they sign long-term contracts.

Seamless workload migration and customer-focused contract terms are only two of the features that make the new Morpheus Virtual Appliance a "lock-in free" zone. With the Morpheus database-as-a-service (DBaaS), you can provision, deploy, and monitor your MongoDB, Redis, MySQL, and Elasticsearch databases from a single point-and-click console. Morpheus lets you work with SQL, NoSQL, and in-memory databases across hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

Don't Go Into Elasticsearch Production without this Checklist

Elasticsearch is a great tool for providing fast, powerful search on your web sites and applications, but take care when moving from development to production. By following the checklist below, you can avoid the issues that arise when development settings are carried into a production environment!

Configure Your Log and Data Paths

To minimize the chances of data loss in a production environment, it is highly recommended that you change your log and data paths from the defaults to locations that are less likely to be accidentally overwritten, for example during an upgrade or reinstall.

You can make these changes in the configuration file (which uses YAML syntax) under path, as in the following example, which uses suggested production paths from the Elasticsearch team:

Suggested settings for the log and data paths. Source: Elasticsearch.
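
For reference, a minimal elasticsearch.yml sketch along those lines; the directories shown are illustrative choices, not required values:

    # Illustrative elasticsearch.yml entries (YAML syntax); pick directories
    # outside the installation tree so upgrades or reinstalls can't clobber them.
    path:
      logs: /var/log/elasticsearch
      data: /var/data/elasticsearch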

Configure Your Node and Cluster Names

When you need to find a particular node or cluster, a descriptive name makes it much easier to tell one from another.

The default cluster name of "elasticsearch" means that any node running default settings could join the cluster, even if that was never intended. Thus, it is a good idea to give the cluster a distinct identifier instead.

The default node names are chosen randomly from a set of roughly 3000 Marvel character names. While this wouldn't be so bad for a node or two, this could get quite confusing as you add more than a few nodes. The better option is to use a descriptive name from the beginning to avoid potential confusion as nodes are added later.
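
A quick sketch of both settings in elasticsearch.yml; the names themselves are illustrative, and anything that identifies the environment and the node's role will do:

    # Descriptive identifiers keep stray nodes out of the cluster and make
    # monitoring output readable as the node count grows.
    cluster.name: logging_prod
    node.name: logging_prod_node_01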

Configure Memory Settings

If the operating system begins swapping memory, the Elasticsearch process itself can be swapped out to disk, which devastates performance in production. Suggestions from the Elasticsearch team to fix this include disabling swapping entirely, configuring swapping to occur only in emergency conditions, or (for Linux/Unix users) using mlockall to lock the address space of the process into RAM so it cannot be swapped.
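
The mlockall option is a one-line setting, sketched here assuming an Elasticsearch 1.x-era configuration file:

    # elasticsearch.yml: ask for the JVM's address space to be locked into RAM
    # so the operating system cannot swap the process out.
    # Note: the account running Elasticsearch also needs permission to lock
    # memory (an appropriate memlock / ulimit -l setting).
    bootstrap.mlockall: true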

Configure Virtual Memory Settings

Elasticsearch indices use mmapfs/niofs, but the default mmap count on many operating systems is too low for production indices, which eventually surfaces as "out of memory" exceptions. To fix this, raise the limit to accommodate Elasticsearch indices. The following example shows how the Elasticsearch team recommends increasing this limit on Linux systems (run the command as root):

Suggested command to increase the mmap count for Linux systems. Source: Elasticsearch.
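
That command is typically the one-liner below; 262144 is the value commonly suggested for production nodes, but verify it against your distribution's documentation:

    # Run as root. Raises the kernel's per-process memory-map limit.
    sysctl -w vm.max_map_count=262144

    # To persist the setting across reboots, also add this line to /etc/sysctl.conf:
    #   vm.max_map_count=262144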

Ensure Elasticsearch Is Monitored

It is a good idea to monitor your Elasticsearch installation so that you can see the status or be alerted if or when something goes wrong. A service such as Happy Apps can provide this type of monitoring for you (and can monitor the rest of your app as well).

Get Elasticsearch in the Cloud

When you launch an application that uses Elasticsearch, you will want reliable and stable database hosting. Morpheus Virtual Appliance is a tool that allows you to manage heterogeneous databases in a single dashboard.

With Morpheus, you have support for SQL, NoSQL, and in-memory databases like Redis across public, private, and hybrid clouds. So, visit the Morpheus site for pricing information or to create a free account today!

Three Different Approaches to Hardening MySQL Security

The new MySQL 5.7.6 developer milestone 16 features noteworthy security upgrades, but some researchers and vendors propose more radical approaches to database security. One method puts applications in charge of testing and reporting on their own security, while another removes all security responsibility from the app by placing each application in its own virtual machine.

When a database release claims to improve performance over its predecessors by a factor of two to three times, you take notice. That's what Kay Ewbank claims in a March 12, 2015, post on the iProgrammer site about MySQL 5.7.6 developer milestone 16. The new version was released on March 9, 2015, and is available for download (its source code can be downloaded from GitHub).

In a March 10, 2015, post on the MySQL Server blog, Geir Hoydalsvik lists milestone 16's many new features and fixes. (Prepare to give your mouse scroll wheel a workout: the list is long.) Ewbank points in particular to the InnoDB data engine's CREATE TABLESPACE syntax for creating general tablespaces in which you can choose your own mapping between tables and tablespaces. This allows you to group all the tables of one customer in a single tablespace, for example.
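
A minimal sketch of the new syntax, with illustrative tablespace and table names (check the milestone's release notes for the full option list):

    -- Create a general tablespace and group one customer's tables inside it.
    CREATE TABLESPACE customer_acme
        ADD DATAFILE 'customer_acme.ibd'
        ENGINE = InnoDB;

    CREATE TABLE acme_orders (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        placed_at DATETIME NOT NULL
    ) TABLESPACE customer_acme;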

(Note Hoydalsvik's warning that the milestone release is "for use at your own risk" and may require data format changes or a complete data dump.)

One of the update's security enhancements relates to the way the server checks the validity of the secure_file_priv system variable, which is intended to limit the effects of data import and export operations. In the new release, secure_file_priv can be set to null to disable all data imports and exports. Also, the default value now depends on the INSTALL_LAYOUT CMake option.

The default value of the secure_file_priv system variable is platform specific in MySQL 5.7.6 developer milestone 16. Source: MySQL Release Notes
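
A hedged sketch of checking and hardening the setting; the variable is read-only at runtime, so the hardening itself happens in the server option file:

    -- Check the effective value on a running server.
    SHOW VARIABLES LIKE 'secure_file_priv';

    -- The variable can only be set at server startup. In my.cnf / my.ini,
    -- NULL disables all data import and export operations:
    --   [mysqld]
    --   secure_file_priv = NULL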

Apps that continuously test and report on their own security

Data security generally entails scanning applications to spot problems and missing patches. In a March 5, 2015, article on InformationWeek's Dark Reading site, Jeff Williams proposes building security into the application via "instrumentation," which entails continuous testing and reporting by the app of its own security status.

Instrumentation collects security information from the apps without requiring scans because the programs test their own security and report the results back to the server. Williams provides the example of reports identifying all non-parameterized queries in an organization based on a common SQL injection defense: requiring that all queries be parameterized.

The opposite extreme: Separating security from the application

Another novel approach to database security is exemplified in Waratek AppSecurity for Java, which is reviewed by SC Magazine's Peter Stephenson in a March 2, 2015, article. The premise is that security is too important to be left to the application's developers. Instead, create a sandbox for Java, similar to a firewall but without the tendency to report false positives.

Waratek's product assigns each app the equivalent of its own virtual container, complete with a hypervisor. The container holds its own security rules, which frees developers to focus solely on their applications. Stephenson offers the example of a container that defends against a SQL injection attack on a MySQL database.

Waratek AppSecurity for Java creates a secure virtual machine that applies security rules from outside the application. Source: Waratek

Application security is at the core of the new Morpheus Virtual Appliance. With the Morpheus database-as-a-service (DBaaS), you can provision, deploy, and monitor heterogeneous MySQL, MongoDB, Redis, and Elasticsearch databases from a single point-and-click console. Morpheus lets you work with your SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds in just minutes. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

In addition, the service allows you to migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

How to Minimize Data Wrangling and Maximize Data Intelligence

It's not unusual for data analysts to spend more than half their time cleaning and converting data rather than extracting business intelligence from it. As data stores grow in size and data types proliferate, a new generation of tools is arriving that promises to put sophisticated analysis in the hands of non-data scientists.

One of the hottest job titles in technology is Data Scientist, perhaps surpassed only by the newest C-level position: Chief Data Scientist. IT's long-standing skepticism about such trends is evident in the joke cited by InfoWorld's Yves de Montcheuil that a data scientist is a business analyst who lives in California.

There's nothing funny about every company's need to translate its data into business intelligence. That's where data scientists take the lead role, but as the amount and types of data proliferate, data scientists find themselves spending the bulk of their time cleaning and converting data rather than analyzing and communicating it to business managers.

A recent survey of data scientists (registration required) conducted by IT-project crowdsourcing firm CrowdFlower found that two out of three analysts claim cleaning and organizing data is their most time-consuming task, and 52 percent report their biggest obstacle is poor quality data. While the respondents named 48 different technologies they use in their work, the most popular is Excel (55.6 percent), followed by the open source language R (43.1 percent) and the Tableau data-visualization software (26.1 percent).

Data scientists identify their greatest challenges as time spent cleaning data, poor data quality, lack of time for analysis, and ineffective data modeling. Source: CrowdFlower

What's holding data analysis back? The data scientists surveyed cite a lack of tools required to do their job effectively (54.3 percent), failure of their organizations to state goals and objectives clearly (52.3 percent), and insufficient investment in training (47.7 percent).

A dearth of tools, unclear goals, and too little training are reported as the principal impediments to data scientists' effectiveness. Source: CrowdFlower

New tools promise to 'consumerize' big data analysis

It's a common theme in technology: In the early days, only an elite few possess the knowledge and tools required to understand and use it, but over time the products improve and drop in price, businesses adapt, and the technology goes mainstream. New data-analysis tools are arriving that promise to deliver the benefits of the technology to non-scientists.

Steve Lohr profiles several of these products in an August 17, 2014, article in the New York Times. For example, ClearStory Data's software combines data from multiple sources and converts it into charts, maps, and other graphics. Taking a different approach to the data-preparation problem is Paxata, which offers software that retrieves, cleans, and blends data for analysis by various visualization tools.

The not-for-profit Open Knowledge Labs bills itself as a community of "civic hackers, data wranglers and ordinary citizens intrigued and excited by the possibilities of combining technology and information for good." The group is seeking volunteer "data curators" to maintain core data sets such as GDP and ISO-codes. OKL's Rufus Pollock describes the project in a January 3, 2015, post.

Open Knowledge Labs is seeking volunteer coders to curate core data sets as part of the Frictionless Data Project. Source: Open Knowledge Labs

There's no simpler or more straightforward way to manage your heterogeneous MySQL, MongoDB, Redis, and Elasticsearch databases than by using the new Morpheus Virtual Appliance. Morpheus lets you seamlessly provision, monitor, and analyze SQL, NoSQL, and in-memory databases across hybrid clouds via a single point-and-click dashboard. Each database instance you create includes a free full replica set for built-in fault tolerance and failover.

With the Morpheus database-as-a-service (DBaaS), you can migrate existing databases from a private cloud to the public cloud, or from public to private. A new instance of the same database type is created in the other cloud, and real-time replication keeps the two databases in sync. Visit the Morpheus site to create a free account.

Don't Fall Victim to One of These Common SQL Programming 'Gotchas'

TL;DR: Even experienced SQL programmers can sometimes be thrown for an infinite loop -- or other code failure -- by one of the many pitfalls of the popular development platform. Get the upper hand by monitoring and managing your databases in the cloud via the Morpheus database-as-a-service.

When a database application goes belly up, the cause can often be traced to sloppy coding -- but not always. Every now and then, the reason for a misbehaving app is an idiosyncrasy in the platform itself. Here's how to prevent your database from tripping over one of these common SQL errors.

In an August 11, 2014, article in the Database Journal, Rob Gravelle describes a "gotcha" (not a bug) in the way MySQL handles numeric value overflows. When a value supplied by an automated script or application is outside the range of a column data type, MySQL truncates the value to an entry within the acceptable range.

Generally, a database system will respond to an invalid value by generating an error instructing the script to proceed or abort, or it will substitute the invalid entry with its best guess as to which valid entry was intended. Of course, truncating or substituting the entered value is almost certainly going to introduce an error into the table. Gravelle explains how to override MySQL's default handling of overflow conditions to ensure it generates an error, which is the standard response to invalid entries in most other databases.
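
The usual override is MySQL's strict SQL mode. A minimal sketch with illustrative table and column names:

    -- Enable strict mode so out-of-range values raise errors instead of
    -- being silently truncated to the nearest legal value.
    SET SESSION sql_mode = 'STRICT_ALL_TABLES';

    CREATE TABLE readings (temperature TINYINT);  -- legal range: -128 to 127

    -- Under strict mode this fails with ERROR 1264 (out of range); without
    -- it, MySQL would store 127 and issue only a warning.
    INSERT INTO readings (temperature) VALUES (999);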

Don't fall victim to one of the most-common developer mistakes

According to Justin James on the Tech Republic site, the most common database programming no-no is misuse of primary keys. James insists that primary keys should have nothing at all to do with the application data in a row. Except in the "most unusual of circumstances," primary keys should be generated sequentially or randomly by the database upon row insertion and should never be changed. If primary keys aren't generated and managed by the database itself, you're likely to encounter problems when you change the underlying data or migrate it to another system.
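
A minimal sketch of that convention, with hypothetical names:

    CREATE TABLE customers (
        -- Surrogate key: generated by the database, carries no business
        -- meaning, and is never updated after insertion.
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        email VARCHAR(255) NOT NULL,
        -- Business-level uniqueness is enforced separately, so the email
        -- can change without touching the key that other rows reference.
        UNIQUE KEY uq_customers_email (email)
    );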

Another frequent cause of program problems is overuse of stored procedures, which James describes as a "maintenance disaster." There's no easy way to determine which applications are using a particular stored procedure, so you end up writing a new one when you make a significant change to an app rather than adapting an existing stored procedure. Instead, James recommends that you use advanced object-relational mappers (ORMs).

Not every developer is sold on ORMs, however. On his Experimental Thoughts blog, Jeff Davis explains why he shies away from ORMs. Because ORMs add more lines of code between the application and the data, they invite more semantic errors. Davis points out that debugging in SQL is simpler when you can query the database as if you were an application. The more lines of code between the application error and the database, the more difficult it is to find the glitch.

One of the common database errors identified by Thomas Larock on the SQL Rockstar site is playing it safe by overusing the BIGINT data type. If you're certain no value in a column will exceed 100,000, there's no need to use the 8-byte BIGINT data type when the 4-byte INT data type will suffice. You may not think a mere 4 bytes is significant, but what if the table ends up with 2 million rows? Then your app is wasting nearly 8MB of storage. Similarly, if you know you won't need calendar dates before the year 1900 or after 2079, using SMALLDATETIME will make your app much more efficient.

On the SQL Skills site, Kimberly Tripp highlights another common database-design error: use of non-sequential globally unique identifiers, or GUIDs. In addition to creating fragmentation in the base table, non-sequential GUIDs are four times wider than an INT-based identity.

Is your app's failure to launch due to poor typing skills?

Maybe the best way to start your hunt for a coding error is via a printout rather than with a debugger. That's the advice of fpweb.net's tutorial on troubleshooting SQL errors. The problem could be due to a missing single-quote mark, or misuse of double quotes inside a string. If you get the “No value given for one or more required parameters” error message, make sure your column and table names are spelled correctly.

Likewise, if the error states “Data type mismatch in criteria expression,” you may have inserted letters or symbols in a column set for numeric values only (or vice-versa). The FromDual site provides a complete list of MySQL error codes and messages, including explanations for many of the codes, possible sources for the errors, and instructions for correcting many of the errors.

The FromDual site explains how to find sources of information about MySQL error messages. 

For many of the MySQL error messages, the FromDual index provides an explanation, reasons for the error message's appearance, and potential fixes.

Cloud database service helps ensure clean, efficient code

One of the benefits of the Morpheus cloud database-as-a-service is the ability to analyze your database in real time to identify and address potential security vulnerabilities and other system errors. TechTarget's Brien Posey points to another benefit of the database-as-a-service model: By building redundancy into all levels of their infrastructure, cloud database services help organizations protect against data loss and ensure high availability.

In addition to auto backups, replication, and archiving, Morpheus's service features a solid-state-disk-backed infrastructure that increases I/O operations per second (IOPs) by 100 times. Latency is further reduced via direct connections to EC2. Databases are monitored and managed continuously by Morpheus's crack team of DBAs and by the service's sophisticated robots.

Morpheus supports MongoDB, MySQL, Redis, and Elasticsearch. Platform support includes Amazon Web Services, Rackspace, Heroku, Joyent, Cloud Foundry, and Windows Azure. Visit the Morpheus site for pricing information; free databases are available during the service's beta.

Don't Drown Yourself With Big Data: Hadoop May Be Your Lifeline

TL;DR: The tremendous growth predicted for the open-source Hadoop architecture for data analysis is driven by the mind-boggling increase in the amount of structured and unstructured data in organizations, and by the need for sophisticated, accessible tools to extract business and market intelligence from the data. New cloud services such as Morpheus let organizations of all sizes realize the potential of Big Data analysis.

The outlook is rosy for Hadoop -- the open-source framework designed to facilitate distributed processing of huge data sets. Hadoop is increasingly attractive to organizations because it delivers the benefits of Big Data while avoiding infrastructure expenses.

A recent report from Allied Market Research concludes that the Hadoop market will realize a compound annual growth rate of 58.2 percent from 2013 to 2020, to a total value of $50.2 billion in 2020, compared to $1.5 billion in 2012.

Allied Market Research forecasts a $50.2 billion global market for Hadoop services by the year 2020.

Just how "big" is Big Data? According to IBM, 2.5 quintillion bytes of data are created every day, and 90 percent of all the data in the world was created in the last two years. Realizing the value of this huge information store requires data-analysis tools that are sophisticated enough, cheap enough, and easy enough for companies of all sizes to use.

Many organizations continue to consider their proprietary data too important a resource to store and process off premises. However, cloud services now offer security and availability equivalent to that available for in-house systems. By accessing their databases in the cloud, companies also realize the benefits of affordable and scalable cloud architectures.

The Morpheus database-as-a-service offers the security, high availability, and scalability organizations require for their data-intelligence operations. Performance is maximized through Morpheus's use of 100-percent bare-metal SSD hosting. The service offers ultra-low latency to Amazon Web Services and other peering points and cloud hosting platforms.

The Nuts and Bolts of Hadoop for Big Data Analysis

The Hadoop architecture distributes both data storage and processing to all nodes on the network. By placing the small program that processes the data in the node with the much larger data sets, there's no need to stream the data to the processing module. The processor splits its logic between a map and a reduce phase. The Hadoop scheduling and resource management framework executes the map and reduce phases in a cluster environment.
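
To make the map and reduce phases concrete, here is a sketch of the canonical word-count job written against the Hadoop Java MapReduce API; the class names and paths are illustrative, and a production job would be tuned to its own data:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: runs on the node holding the data block and emits (word, 1).
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sums the counts the framework has grouped by word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // pre-aggregates map output locally
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }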

The Hadoop Distributed File System (HDFS) data storage layer uses replicas to overcome node failures and is optimized for sequential reads to support large-scale parallel processing. The market for Hadoop really took off when the framework was extended to support the Amazon Web Services S3 and other cloud-storage file systems.

Adoption of Hadoop in small and midsize organizations has been slow despite the framework's cost and scalability advantages because of the complexity of setting up and running Hadoop clusters. New services do away with much of the complexity by offering Hadoop clusters that are managed and ready to use: there's no need to configure or install any services on the cluster nodes.

Netflix data warehouse combines Hadoop and Amazon S3 for infinite scalability

For its petabyte-scale data warehouse, Netflix chose Amazon's Simple Storage Service (S3) over the Hadoop Distributed File System because of the cloud service's dynamic scalability and virtually limitless data and computational capacity. Netflix collects data from billions of streaming events generated by televisions, computers, and mobile devices.

With S3 as its data warehouse, Hadoop clusters with hundreds of nodes can be configured for various workloads, all able to access the same data. Netflix uses Amazon's Elastic MapReduce distribution of Hadoop and has developed its own Hadoop platform as a service, which it calls Genie. Genie lets users submit jobs via RESTful APIs from Hadoop, Pig, Hive, and other tools without having to provision new clusters or install new clients.

The Netflix Hadoop-S3 data warehouse offers unmatched elasticity in terms of data and computing power in a widely distributed network.

There is clearly potential in combining Hadoop and cloud services, as Wired's Marco Visibelli explains in an August 13, 2014, article. Visibelli describes how companies leverage Big Data for forecasting by starting with small projects on Amazon Web Services and scaling up as those projects succeed. For example, a European car manufacturer used Hadoop to combine several supplier databases into a single 15TB database, which saved the company $16 million in two years.

Hadoop opens the door to Big Data for organizations of all sizes. Projects that leverage the scalability, security, accessibility, and affordability of cloud services such as Morpheus's database as a service have a much greater chance of success.

The SQL Vulnerability Hackers Leverage to Steal Your IDs, Passwords, and More

TL;DR: The theft of hundreds of millions of user IDs, passwords, and email addresses was made possible by a database programming technique called dynamic SQL, which makes it easy for hackers to use SQL injection to gain unfettered access to database records. To make matters worse, the dynamic SQL vulnerability can be avoided by using one of several simple programming alternatives.

How is it possible that a simple hacking method, publicized for as long as 10 years, allowed Russian cybercriminals to amass a database of more than a billion stolen user IDs and passwords? Actually, the total take by the hackers in the SQL injection attacks revealed earlier this month by Hold Security was 1.2 billion IDs and passwords, along with 500 million email addresses, according to an article written by Nicole Perlroth and David Gelles in the August 5, 2014, New York Times.

Massive data breaches suffered by organizations of all sizes in recent years can be traced to a single easily preventable source, according to security experts. In an interview with IT World Canada's Howard Solomon, security researcher Johannes Ullrich of the SANS Institute blames an outdated SQL programming technique that continues to be used by some database developers. The shocker is that blocking such malware attacks is as easy as using two or three lines of code in place of one. Yes, according to Ullrich, it's that simple.

The source of the vulnerability is dynamic SQL, which allows developers to create dynamic database queries that include user-supplied data. The Open Web Application Security Project (OWASP) identifies SQL, OS, LDAP, and other injection flaws as the number one application security risk facing developers. An injection involves untrusted data being sent to an interpreter as part of a command or query. The attacker's data fools the interpreter into executing commands or accessing data without authentication.

According to OWASP, injections are easy for hackers to implement, difficult to discover via testing (but not by examining code), and potentially severely damaging to businesses.

The OWASP SQL Injection Prevention Cheat Sheet provides a primer on SQL injection and includes examples of unsafe and safe string queries in Java, C# .NET, and other languages.

An example of an unsafe Java string query (top) and a safe Java PreparedStatement (bottom).

Dynamic SQL lets comments be embedded in a SQL statement by setting them off with hyphens. It also allows multiple SQL statements to be strung together and executed in a batch, and it can be used to query metadata from a standard set of system tables, according to Solomon.
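
A hedged Java sketch, with hypothetical table and column names, of why concatenated dynamic SQL is so dangerous:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class UnsafeQuery {
        // Attacker-controlled input is spliced directly into the SQL text.
        static ResultSet findAccount(Connection conn, String user) throws Exception {
            String sql = "SELECT * FROM accounts WHERE username = '" + user + "'";
            // Input such as  ' OR '1'='1' --  rewrites the statement as:
            //   SELECT * FROM accounts WHERE username = '' OR '1'='1' -- '
            // The comment marker swallows the trailing quote, and the OR
            // clause matches every row in the table.
            Statement stmt = conn.createStatement();
            return stmt.executeQuery(sql);
        }
    }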

Three simple programming approaches to SQL-injection prevention

OWASP describes three techniques that prevent SQL injection attacks. The first is use of prepared statements, which are also referred to as parameterized queries. Developers must first define all the SQL code, and then pass each parameter to the query separately, according to the OWASP's prevention cheat sheet. The database is thus able to distinguish code from data regardless of the user input supplied. A would-be attacker is blocked from changing the original intent of the query by inserting their own SQL commands.
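A minimal JDBC sketch of the same hypothetical query, parameterized:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class SafeQuery {
        // The SQL text and the user value travel to the server separately,
        // so the input is always treated as data, never as code.
        static ResultSet findAccount(Connection conn, String user) throws Exception {
            PreparedStatement ps =
                conn.prepareStatement("SELECT * FROM accounts WHERE username = ?");
            ps.setString(1, user);
            return ps.executeQuery();
        }
    }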

The second prevention method is to use stored procedures. As with prepared statements, developers first define the SQL code and then pass in the parameters separately. Unlike prepared statements, stored procedures are defined and stored in the database itself, and subsequently called from the application. The only caveat to this prevention approach is that the procedures must not contain dynamic SQL, or if it can't be avoided, then input validation or another technique must be employed to ensure no SQL code can be injected into a dynamically created query.
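Called from JDBC, a stored procedure looks like the sketch below; sp_find_account is a hypothetical procedure that would be defined in the database itself:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.ResultSet;

    public class StoredProcQuery {
        static ResultSet findAccount(Connection conn, String user) throws Exception {
            // The procedure body lives in the database; the application only
            // binds parameters and never assembles SQL text.
            CallableStatement cs = conn.prepareCall("{call sp_find_account(?)}");
            cs.setString(1, user);
            return cs.executeQuery();
        }
    }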

The last of the three SQL-injection defenses described by OWASP is to escape all user-supplied input. This method is appropriate only when neither prepared statements nor stored procedures can be used, whether because doing so would break the application or render its performance unacceptable. Also, escaping all user-supplied input doesn't guarantee your application won't be vulnerable to a SQL injection attack. That's why OWASP recommends it only as a cost-effective way to retrofit legacy code. 

All databases support one or more character escaping schemes for various types of queries. You could use an appropriate escaping scheme to escape all user-supplied input. This prevents the database from mistaking the user-supplied input for the developer's SQL code, which in turn blocks any SQL injection attempt.

The belt-and-suspenders approach to SQL-injection prevention

Rather than relying on only one layer of defense against a SQL injection attack, OWASP recommends a layered approach via reduced privileges and whitelist input validation. By minimizing the privileges assigned to each database account in the environment, DBAs can reduce the potential damage incurred by a successful SQL injection breach.

Read-only accounts should be granted access only to those portions of database tables they require, by creating a specific view for that specific level of access. Database accounts rarely need create or delete access, for example. Likewise, you can restrict the stored procedures certain accounts can execute. Most importantly, according to OWASP, minimize the privileges of the operating system account the database runs under. MySQL and other popular database systems run with system or root privileges by default, which likely grants more privileges than the account requires.
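
A hedged MySQL sketch of that least-privilege pattern, with hypothetical schema, view, and account names:

    -- A purpose-built view exposes only the columns the report needs.
    CREATE VIEW webapp.order_summary AS
        SELECT order_id, order_date, total FROM webapp.orders;

    -- The reporting account can read that view and do nothing else:
    -- no INSERT, UPDATE, DELETE, CREATE, or DROP anywhere.
    CREATE USER 'report_reader'@'10.0.%' IDENTIFIED BY 'choose-a-strong-password';
    GRANT SELECT ON webapp.order_summary TO 'report_reader'@'10.0.%';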

Adopting the database-as-a-service model limits vulnerability

Organizations of all sizes are moving their databases to the cloud and relying on services such as Morpheus to ensure safe, efficient, scalable, and affordable management of their data assets. Morpheus supports MongoDB, MySQL, Redis, ElasticSearch, and other DB engines. The service's real-time monitoring lets you analyze and optimize the performance of database applications.

In addition to 24/7 monitoring of your databases, Morpheus provides automatic backup, restoration, and archiving of your data, which you can access securely via a VPN connection. The databases are stored on Morpheus's solid-state drives for peak performance and reliability.  
