Update: StackIQ Cluster Manager Now Integrated With Cloudera


Updated: 4/8/2014 (Note that these instructions are for Cloudera Enterprise 4. To use StackIQ Cluster Manager with Cloudera Enterprise 5, please contact support@stackiq.com)

StackIQ takes a “software defined infrastructure” approach to provision and manage cluster infrastructure that sits below Big Data applications like Hadoop. In this post, we’ll discuss how this is done, followed by a step-by-step guide to installing Cloudera Manager on top of StackIQ’s management system.

Components:

The hardware used for this deployment was a small cluster: 1 node (i.e. 1 server) is used for the StackIQ Cluster Manager and 4 nodes are used as backend/data nodes. Each node has 2 disks and all nodes are connected together via 1Gb Ethernet on a private network. The StackIQ Cluster Manager node is also connected to a public network using its second NIC. StackIQ Cluster Manager has been used in similar deployments ranging from 2 nodes to 4,000+ nodes.


Step 1: Install StackIQ Cluster Manager

The StackIQ Cluster Manager node is installed from bare metal (i.e. there is no prerequisite software and no operating system previously installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it (the StackIQ Cluster Core Roll can be downloaded from the Rolls section after registering). The Core Roll leads the user through a few simple forms (e.g., what is the IP address of the Cluster Manager, what is the gateway, DNS server) and then asks for a base OS DVD (for example, Red Hat Enterprise Linux 6.5; other Red Hat-like distributions such as CentOS are supported as well). The installer copies all the bits from both DVDs and automatically creates a new Red Hat distribution by blending the packages from both DVDs together.

The remainder of the Cluster Manager installation requires no further manual steps and this entire step takes between 30 and 40 minutes.

 

Step 2: Install the CDH Bridge Roll

StackIQ has developed software that “bridges” our core infrastructure management solution to Cloudera’s Hadoop distribution that we’ve named the CDH Bridge Roll. One feature of our management solution is that it records several parameters about each backend node (e.g., number of CPUs, networking configuration, disk partitions) in a local database. After StackIQ Cluster Manager is installed and booted, it is time to download and install the CDH Bridge Roll:

  • Log into the frontend as "root" and download the cdh-bridge ISO from here.

  • Then execute the following commands at the root prompt:

 # rocks add roll <path_to_iso>
 # rocks enable roll cdh-bridge
 # rocks create distro
 # rocks run roll cdh-bridge | sh

The cluster is now configured to install Cloudera packages on all nodes.
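
To confirm that the bridge roll was added and enabled before moving on, you can list the rolls (the exact output columns vary by release, but cdh-bridge should show as enabled):

 # rocks list roll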

 

Step 3: Install Cloudera Manager and Cloudera CDH4 Roll

You can download a prepackaged Cloudera Manager here and a prepackaged Cloudera CDH4 from here.

We will now install these 2 ISOs. 

 rocks add roll cloudera-cdh4/cloudera-cdh4-6.5-0.x86_64.disk1.iso
 rocks add roll cloudera-manager/cloudera-manager-6.5-0.x86_64.disk1.iso
 rocks enable roll cloudera-cdh4
 rocks enable roll cloudera-manager
 rocks create distro
 rocks run roll cloudera-cdh4 | sh
 rocks run roll cloudera-manager | sh

 

Step 4: Install the backend nodes

Before we install the backend nodes (also known as compute nodes), we want to ensure that all disks in the backend nodes are optimally configured for HDFS. During an installation of a data node, our software interacts with the disk controller to optimally configure it based on the node’s intended role. For data nodes, the disk controller will be configured in “JBOD mode” with each disk configured as a RAID 0, a single partition will be placed on each data disk and a single file system will be created on that partition. For example, if a data node has one boot disk and 4 data disks, after the node installs and boots, you’ll see the following 4 file systems on the data disks: /hadoop01, /hadoop02, /hadoop03 and /hadoop04.

For more information on this feature, see our blog post Why Automation is the Secret Ingredient for Big Data Clusters.
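
Once a data node has installed and booted, a quick way to confirm the per-disk file systems is to check the mounts over SSH from the Cluster Manager. This is a minimal sketch that assumes root SSH access to the backend nodes and the default host naming; adjust the host name for your cluster:

 # ssh compute-0-0 "df -h | grep hadoop"

Each data disk should show up mounted as /hadoop01, /hadoop02, and so on.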

Now we don’t want to reconfigure the controller and reformat disks on every installation, so we need to instruct the StackIQ Cluster Manager to perform this task the next time the backend nodes install. We do this by setting an attribute (“nukedisks”) with the rocks command line:

# rocks set appliance attr compute nukedisks true
# rocks set appliance attr cdh-manager nukedisks true

Now we are ready to install the backend nodes. First, put the StackIQ Cluster Manager into "discovery" mode using the CLI or GUI, then PXE boot all backend nodes. We will boot the first node as a cdh-manager appliance. The cdh-manager node will run the Cloudera Manager web admin console used to configure, monitor, and manage CDH.


After installation, the cdh-manager node shows up as below:

[Screenshot: the cdh-manager node after discovery]

We will install all the other nodes in the cluster as compute nodes. StackIQ Cluster Manager discovers and installs each backend node in parallel (10 to 20 minutes) - no manual steps are required.


For more information on installing and using the StackIQ Cluster Manager (a.k.a., Rocks+), please visit StackIQ Support or watch the demo video.

After all the nodes in the cluster are up and running you will be ready to install Cloudera Manager. In this example, the StackIQ Cluster Manager node was named “frontend” and the compute nodes were assigned default names of compute-0-0, compute-0-1, compute-0-2 (3 nodes in Rack 0), and compute-1-0 (1 node in Rack 1).

 

Step 5: Install Cloudera Manager 

SSH into the cdh-manager appliance and, as root, execute:

# /opt/rocks/sbin/cloudera-manager-installer.bin --skip_repo_package=1

This will install Cloudera Manager with packages from our local yum repository as opposed to fetching packages over the internet.
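
If you'd like to confirm that the frontend's package repository is reachable from the cdh-manager node before running the installer, a quick sanity check is to fetch the repository URL used later in this guide (<frontend> is the IP address of your StackIQ Cluster Manager; this is optional, not a required step):

 # curl -s http://<frontend>/install/distributions/rocks-dist/x86_64/ | head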

Step 6: Select What to Install 

Log into the cdh-manager node at http://<cdh-manager>:7180 (where '<cdh-manager>' is the FQDN of your cdh-manager node) with username admin and password admin.

Choose Cloudera Enterprise trial if you want to do a trial run.

The GUI will now prompt you to restart the Cloudera Manager server. Run the following command on the cdh-manager node.

# service cloudera-scm-server restart

After restarting the server, you will be asked to log in again, then click Continue on the next screen.

Specify the list of hosts for CDH installation, e.g., compute-0-[0-3],cdh-manager-0-0.

After all the hosts are identified, hit Continue.

Choose Use Packages and select CDH4 as the version in the screen below.

Specify a custom repository for the CDH release you want to install: use http://<frontend>/install/distributions/rocks-dist/x86_64/ as the URL of the repository, where <frontend> is the IP address of the cluster’s frontend.


In the example above, 10.1.1.1 was the IP address of the private eth0 interface on the frontend.

Choose All hosts accept same private key as the authentication method. Use Browse to upload the private key present in /root/.ssh/id_rsa on StackIQ Cluster Manager.

You will then see a screen where the progress of the installation will be indicated. After installation completes successfully, hit Continue.

You will then be directed to the following screen where all hosts will be inspected for correctness.

Choose a combination of services you want to install and hit Continue.

Review that all services were successfully installed.

Finally, your Hadoop services will be started.

Step 7: Run a Hadoop sample program

It is never enough to set up a cluster and the applications users need and then just let them have at it; that usually leads to nasty surprises for both parties. A validation check is required to make sure everything is working the way it is expected to.

Do the following to test whether the cluster is functional:

  • Log into the frontend as “root” via SSH or PuTTY.

  • On the command line, run the following map-reduce program as the “hdfs” user, which runs a simulation to estimate the value of pi based on sampling:

# sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 10000

When the job completes, it prints an estimated value of Pi.

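If you would like one more quick check that HDFS itself is writable, the following minimal sketch creates, lists, and removes a scratch directory as the “hdfs” user (the directory name is just an example):

 # sudo -u hdfs hadoop fs -mkdir /tmp/smoketest
 # sudo -u hdfs hadoop fs -put /etc/hosts /tmp/smoketest/
 # sudo -u hdfs hadoop fs -ls /tmp/smoketest
 # sudo -u hdfs hadoop fs -rm -r /tmp/smoketest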

Congratulations, you are done!

We’re certain you’ll find this the quickest way to deploy a cluster capable of running Cloudera Hadoop. Give it a shot and send us your questions!

The StackIQ Team

@StackIQ


Deploy and Manage Red Hat Enterprise Linux OpenStack Platform With StackIQ


StackIQ has officially partnered with Red Hat to simplify the process of deploying Red Hat Enterprise Linux OpenStack Platform. StackIQ Cluster Manager is ideal for deploying the hardware infrastructure and application stacks of heterogeneous data center environments. With the Red Hat Enterprise Linux OpenStack Platform offering, StackIQ Cluster Manager handles the automatic bare-metal installation of heterogeneous hardware and the correct configuration of the multiple networks required by OpenStack. StackIQ Cluster Manager also automatically deploys Red Hat Foreman, which enables web-based configuration of OpenStack software on the deployed nodes. All OpenStack services are available for management and deployment via the OpenStack Dashboard, including Nova, Neutron, Cinder, Swift, etc. StackIQ Cluster Manager enables the ongoing management, deployment, and integration of Red Hat OpenStack services on growing and multi-use clusters.

StackIQ takes a “software defined infrastructure” approach to provision and manage cluster infrastructure that sits below applications like OpenStack and Hadoop. In this post, we’ll discuss how this is done, followed by a step-by-step guide to installing Red Hat Enterprise Linux OpenStack with StackIQ’s management system.

Components:
The hardware used for this deployment was a small cluster: 1 node (i.e., 1 server) is used for the StackIQ Cluster Manager, 1 node is used as the Foreman server, and 3 nodes are used as backend/data nodes. Each node has 1 disk and all nodes are connected together via 1Gb Ethernet on a private network. The StackIQ Cluster Manager, Foreman server, and OpenStack controller nodes are also connected to a corporate public network using the second NIC. Additional networks dedicated to OpenStack services can also be used but are not depicted in this graphic or used in this example. StackIQ Cluster Manager has been used in similar deployments ranging from 2 nodes to 4,000+ nodes.
[Diagram: Red Hat Enterprise Linux OpenStack Platform architecture]
 

Step 1. Install StackIQ Cluster Manager

The StackIQ Cluster Manager node is installed from bare-metal (i.e., there is no pre-requisite software and no operating system previously installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it (the StackIQ Cluster Core Roll can be obtained from the “Rolls” section after registering at http://www.stackiq.com/download/). The Cluster Core Roll leads the user through a few simple forms (e.g., what is the IP address of StackIQ Cluster Manager, what is the gateway, DNS server, etc.) and then asks for a base OS DVD (for example, Red Hat Enterprise Linux 6.5; other Red Hat-like distributions such as CentOS are supported as well, but for Red Hat Enterprise Linux, only certified media is acceptable). The installer copies all the bits from both DVDs and automatically creates a new Red Hat distribution by blending the packages from both DVDs together.

The remainder of the StackIQ Cluster Manager installation requires no further manual steps and this entire step takes between 30 and 40 minutes.

A detailed description of StackIQ Cluster Manager can be found in section 3 of the StackIQ Users Guide. It is strongly recommended that you familiarize yourself with at least this section before proceeding. (C’mon, really, it’s not that bad. The print is large and there are a bunch of pictures, shouldn’t take long.)

https://s3.amazonaws.com/stackiq-release/stack3/roll-cluster-core-usersguide.pdf

If you have further questions, please contact support@stackiq.com for additional information.

Step 2. Install the Red Hat Enterprise Linux OpenStack Bridge

StackIQ has developed software that “bridges” our core infrastructure management solution to Red Hat’s OpenStack Platform, which we’ve named the RHEL OpenStack Bridge Roll. The RHEL OpenStack Bridge Roll is used to spin up Foreman services by installing a Foreman appliance. This allows you to leverage RHEL’s Foreman OpenStack puppet integration to deploy a fully operational OpenStack Cloud.

StackIQ Cluster Manager uses the concept of “rolls” to combine packages (RPMs) and configuration (XML files which are used to build custom kickstart files) to dynamically add and automatically configure software services and applications.

The first step is to install a StackIQ Cluster Manager as a deployment machine. This requires that you use, at a minimum, the cluster-core and RHEL 6.5 ISOs. It’s not possible to add StackIQ Cluster Manager to an already existing RHEL 6.5 machine; you must start with the installation of StackIQ Cluster Manager. The rhel-openstack-bridge roll can be added once the StackIQ Cluster Manager is up.

It is highly recommended that you check the MD5 checksums of the downloaded media.

You must burn the cluster-core roll and RHEL Server 6.5 ISOs to disk, or, if installing via virtual CD/DVD, simply mount the ISOs on the machine's virtual media via the BMC.

Then follow section 3 of https://s3.amazonaws.com/stackiq-release/stack3/roll-cluster-core-usersguide.pdf for instructions on how to install StackIQ Cluster Manager. (Yes! I mentioned it again.)

What You’ll Need:

  • After StackIQ Cluster Manager is installed and booted, add the Red Hat Enterprise Linux OpenStack Bridge, create the Red Hat Enterprise Linux OpenStack roll, and create an updated Red Hat Enterprise Linux Server distribution roll.
  • Copy the roll to a directory on the StackIQ Cluster Manager. "/export" is a good place as it should be the largest partition.

Verify the MD5 checksums:

# md5sum rhel-openstack-bridge-1.0-0.x86_64.disk1.iso

should return f7a2e2cef16d63021e5d2b7bc2b16189

Then execute the following commands on the frontend:

# rocks add roll rhel-openstack-bridge*.iso 
# rocks enable roll rhel-openstack-bridge
# rocks create distro
# rocks run roll rhel-openstack-bridge | sh

The OpenStack bridge roll will enable you to set up a Foreman appliance which will then be used to deploy OpenStack roles.

Step 3. Completing the Red Hat Enterprise Linux 6.5 and OpenStack Platform Deployment
We need to get the Red Hat Enterprise Linux OpenStack Platform and Red Hat Enterprise Linux 6.5 updates before a full deployment is possible. The latest version of Red Hat Enterprise Linux OpenStack Platform requires updates that are only available in the Red Hat Enterprise Linux 6.5 update channel. Fortunately, if you have a Red Hat subscription for these two components, creating and adding these as rolls is easy with StackIQ Cluster Manager. The only caveat is this will take some time, depending on your network. Make a pot of coffee, get some donuts, and proceed with the steps below to make the required rolls.

What You’ll Need:

  • If the StackIQ Cluster Manager has web access, enable your Red Hat Enterprise Linux 6.5 and Red Hat Enterprise Linux OpenStack Platform subscriptions with subscription-manager on the StackIQ Cluster Manager. Explaining how to do this is out of scope for this document. Please refer to the Red Hat documentation on how to do this: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Installation_Guide/s1-steps-rhnreg-x86.html (You'll have to do a "# yum -y install subscription-manager subscription-manager-gui" first.)
    • OR
  • A valid subscription via Satellite Server for your company. If the StackIQ Cluster Manager has access to your company's subscriptions via a company or subnet Satellite server, you can create the required rolls from the repository URLs available from your company's Satellite server.
  • Once the StackIQ Cluster Manager is properly subscribed, obtain the repoids needed:
    • The repoid for Red Hat Enterprise Linux OpenStack Platform: in this example, since the StackIQ CM does have web access, the repoid we will use is "rhel-6-server-openstack-4.0-rpms". (If your system is properly configured for subscription, this can be obtained by running "# yum repolist enabled" on the StackIQ CM command line.)
    • The repoid for Red Hat Enterprise Linux 6 Server: in this example, since the StackIQ CM does have web access, the repoid we will use is "rhel-6-server-rpms". (Also obtained by running "# yum repolist enabled" on the command line.)
Creating the rolls:
StackIQ Cluster Manager can create a roll that encapsulates all the required RPMS for a repository. This allows for multiple rolls and multiple distributions to be run on different sets of hardware. At this basic level, know that we are creating a roll that will allow us to have all the RPMS available to us during kickstart of the backend nodes to fully deploy the Foreman/OpenStack infrastructure.
Prior to creating the rolls, let's check our known roll state. On the StackIQ CM command line, list the current rolls:
# rocks list roll 
We have the two rolls added at install time: RHEL and rhel-openstack-bridge. Now we want to create the rolls that will allow us to add the Red Hat Enterprise Linux 6 Server updates and Red Hat Enterprise Linux OpenStack Platform.
"cd" to /export on the StackIQ CM. This is the largest partition and a good place to pull down these repositories. Then using the repoid you can obtain by running:
# cd /export
# yum repolist enabled
Do:
# rocks create mirror repoid=rhel-6-server-openstack-4.0-rpms rollname=rhel-openstack-4.0
And breathe, or drink coffee, or check email, or write a letter to your mother (Hey, she probably hasn't heard from you in a while, am I right?). This is going to take some time depending on your network.
When the command has run to completion, you'll have an ISO with all the Red Hat Enterprise Linux OpenStack Platform RPMs in a directory with the same name as the "repoid" above. The ISO in that directory will be named after the "rollname".
# ls rhel-6-server-openstack-4.0-rpms
Let's add the roll to the distribution:
# rocks add roll rhel-6-server-openstack-4.0-rpms/rhel-openstack-4.0-6.5.update1-0.x86_64.disk1.iso
And then enable it using the name listed in the first column of "# rocks list roll":
# rocks enable roll rhel-openstack-4.0
Now we'll do the same thing to get the most recent Red Hat Enterprise Linux 6 Server RPMs so we can take advantage of the full set of updates for that distribution. (Covers the OpenSSL Heartbleed bug and updates some RPMs required for the latest version of Red Hat Enterprise Linux OpenStack Platform.) 
This is the same as the preceding process, so we'll just show the commands and an ending screenshot.
# rocks create mirror repoid=rhel-6-server-rpms name=RHEL-6-Server-Updates-06122014
This will take more time than the OpenStack repository. If you didn't write your mother then, do it now. More coffee is always an option. So is a second (or third!) donut. Lunch might be in order, a longish one. 
(The "name" parameter will enable us to keep track of repository mirrors by date. This allows us to add updates on a roll basis without overwriting previous distributions. Testing new distributions becomes easy this way by assigning rolls to new distributions and machines to the distribution, providing delineation between production and test environments. If this doesn't make sense, don't worry, it's getting into cluster life-cycle management, and you'll understand it when you have to deal with it.)
Once we have the repo, a roll has been created. Let's add it:
# rocks add roll rhel-6-server-rpms/rhel-6-server-rpms-6.5.update1-0.x86_64.disk1.iso
# rocks list roll
to check the name.
Enable it:
# rocks enable roll rhel-6-server-rpms
Disable the original RHEL roll, since the Updates roll contains all that was old and all that is new.
# rocks disable roll RHEL
Check it:
# rocks list roll

Now recreate the distribution the backend nodes will install from. This creates one repository for kickstart to pull from during backend node installation or during yum updates of individual packages. 
# rocks create distro
Breathe. Sign your mother's letter and address the envelope. The distro create should be done by then because you probably have to find her address after all these years of not writing her.
Update the StackIQ Cluster Manager
Now we're going to update the StackIQ Cluster Manager before installing any backend nodes. It's good hygiene and gets us running the latest and greatest Red Hat Enterprise Linux 6.5.
# yum -y update
Then reboot when it's done. Once the machine comes back up, you can install the Foreman server and then the compute nodes. The steps to do that come next.
Step 4. Install the Foreman Appliance

StackIQ Cluster Manager contains the notion of an “appliance.” An appliance has a kickstart structure that installs a preconfigured set of RPMS and services that allow for concentrated installation of a particular application. The bridge roll provides a “Foreman” appliance that sets up the automatic installation of the Red Hat Foreman server with the required OpenStack infrastructure. It’s the fastest way to get a Foreman server up and running.

Installing Backend Nodes Using Discovery Mode in the StackIQ Cluster Manager GUI

“Discovery” mode allows the automatic installation of backend nodes without pre-populating the StackIQ Cluster Manager database with node names, IP addresses, MAC addresses, etc. The StackIQ Cluster Manager runs DHCP to answer and install any node making a PXE request on the subnet. This is ideal on networks where you a) have full control of the network and the policies on the network and b) don’t care about the naming convention of your nodes. If one of these is not true, please follow the instructions for populating the database in the “Install Your Compute Nodes Using CSV Files” section of the cluster-core roll documentation referenced above (Section 3.4.2).

“Discovery” mode is no longer turned on by default, as it may conflict with a company’s networking policy. To turn on Discovery mode, in a terminal or ssh session on StackIQ Cluster Manager do the following:

# rocks set attr discover_start true

To turn it off after installation if you wish:

# rocks set attr discover_start false

DHCP is always running but with “discover_start” set to “false,” it will not promiscuously answer PXE requests. (In the next release this will simply be a button to turn on and off "discovery.")
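
If you want to confirm the current setting, the global attributes can be listed from the command line. This is a small sketch and assumes "rocks list attr" is available on your release; adjust if your CLI differs:

# rocks list attr | grep discover_start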

With Discovery turned on, you can perform installation of backend nodes via the GUI or via the command line. To install via the GUI, go to the StackIQ Cluster Manager GUI at http://<StackIQ Cluster Manager hostname or IP>

 

Click the Login link and log in as “root” with the password set during installation for “root”.
Go to the “Discover” tab:

Click on Appliance, choose “Foreman,” and click “Start.”

 

Boot the server you are using as the Foreman server. All backend nodes should be set to PXE first on the network interface attached to the private network. This is a hard requirement.

In the GUI, you should see a server called “foreman-0-0” appear in the dialog, and in sufficient time, the Visualization area in the "Discover" tab will indicate the network traffic being used during installation.

The Foreman server appliance installation is somewhat chatty. You’ll receive status updates in the Messages box at the bottom of the page for what is happening on the node. The bare metal installation of the Foreman server is relatively short, about 20 minutes depending on the size of the disks being formatted. The installation of the Foreman application takes longer and happens after the initial boot due to RPM packaging constraints of the Foreman installer. It should be done, beginning to end, in about an hour.

When the machine is up, the indicator next to its name will be green and there will be a message in the alerts box indicating the machine has installed Foreman.

Using the command line:

If for some reason you do not have access to the front-end web GUI or access is extremely slow, or if you just happen to be a command line person, there is a command to do discovery of backend resources.

To install a Foreman appliance:

# insert-ethers

Choose “Foreman” and choose “OK”

 

Boot the machine and it should be discovered, assuming PXE first.

Once the Foreman server is installed, you can access its web interface by running Firefox on the StackIQ Cluster Manager. It should be available at the IP address listed by:

# rocks list host interface foreman-0-0

Adding an additional interface

If you want it accessible on the public or corporate network and not just on the private network, it will be necessary to add another network interface attached to the public network.

If the interface was detected during install:

# rocks set host interface ip

# rocks set host interface subnet public

If you add the interface after the fact:

# rocks add host interface help

And fill in the appropriate fields.

In either event, to make the network change live, sync the network:

# rocks sync host network foreman-0-0 restart=yes

This procedure is more clearly delineated in section 4.3 of the cluster-core roll documentation, referenced (twice!) above.

Step 5. Install the Backend Nodes

Before we install the backend nodes (also known as “compute nodes”), we want to ensure that all disks in the backend nodes are configured and controlled by the StackIQ Cluster Manager. On node reinstall, this prevents the inadvertent loss of data on disks that are not the system disk. Now, we don’t want to reconfigure the controller and reformat disks on every installation, so we need to instruct the StackIQ Cluster Manager to perform this task the next time the backend nodes install. We do this by setting an attribute (“nukedisks”):

# rocks set appliance attr compute nukedisks true

After node reinstallation, this attribute is automatically set to “false,” so the only way to reformat non-system disks is to deliberately set this attribute to “true” before node reinstall.

Now we are ready to install the backend nodes. This is the same procedure that we used to install the Foreman server. This time, however, choose “Compute” as the appliance, whether you are using the web GUI or the CLI command “insert-ethers”.

Make sure the StackIQ Cluster Manager is in "discovery" mode using the CLI or GUI and all backend nodes are PXE booted. StackIQ Cluster Manager discovers and installs each backend node in parallel, packages are installed in parallel, and disks on the node are also formatted in parallel. All this parallelism allows us to install an entire cluster, no matter the size, in about 10 to 20 minutes -- no manual steps are required. For more information on installing and using the StackIQ Cluster Manager, please visit http://www.stackiq.com/support/ or http://www.youtube.com/stackiq. Please review the above video and section 3.4 of the cluster-core roll documentation for questions.

After all the nodes in the cluster are up and running, you will be ready to deploy OpenStack via the Foreman web interface. In this example, the StackIQ Cluster Manager node was named “kaiza” and the foreman server was named “foreman-0-0.” The compute nodes were assigned default names of compute-0-0, compute-0-1, compute-0-2.

This is how it looks on the GUI when all the installs are completed.

Step 6. Configuring Foreman to Deploy OpenStack

The Foreman server as supplied by RHEL contains all the puppet manifests required to deploy machines with OpenStack roles. With the backend nodes installed and properly reporting to Foreman, we can go to the Foreman web GUI and configure the backend nodes to run OpenStack.

The example here will be for the simplest case: a Nova Network Controller using a single network, and a couple of hypervisors running the Cirros cloud image.

More complex cases (Neutron, Swift, Cinder) will follow in the next few weeks as appendices to this document. Feel free to experiment ahead of those instructions, however.

1. Go to https://<Foreman server IP or FQDN>

If the security certificate is not trusted, choose “Proceed Anyway” or, if in Firefox, accept the certificate.

You should get a login screen:


2. Log in. The default username is “admin” and the default password is “changeme.” Take the time to change the password once you log in, especially if the Foreman server is available to the outside world.

3. Add a controller node

You should see all the nodes you’ve installed listed on the Foreman Dashboard. Click on the “Hosts” tab to go to the hosts window.

Click on the machine you intend to use as a controller node. You will have to change some parameters to reflect the network you are using for OpenStack (in this example, the private one).

 

It’s highly recommended that this machine also have a connection to the external network (www or corporate internet) to simplify web access. See “Adding an additional interface” above on how to do that. Do not choose the Foreman server as a controller node. The OpenStack Dashboard overwrites httpd configuration files and will disable the ability to log into the Foreman web server. However, if you have a small cluster, you can add the Foreman server as an OpenStack Compute node, as we do in this example. You may not want to do that in a larger cluster though. Separation of services is almost always a good thing.

Click on the host; we will use “compute-0-0.” When the “compute-0-0” page comes up, click on “Edit.”
 
You should see a page called “Edit compute-0-0.local.” Set the “Host Group” tab to “Controller (Nova Network).” (An example of Neutron networking will follow in later Appendices to this document.)

Click on the “Parameters” tab. There are a lot of parameters here, but we will change the minimum to reflect our network.

Click the “Override” button next to the following parameters:

controller_priv_host

controller_pub_host

mysql_host

qpid_host

These parameters will be listed at the bottom of the page with text fields to change them. The controller_priv_host, mysql_host, and qpid_host should all be changed to the private interface IP of the controller node, i.e. the machine you are editing right now.

The controller_pub_host should be the IP address of the public interface (if you have added one) of the controller node, i.e. the machine you are editing right now.

If you don’t know the IP addresses of the controller node, run “rocks list host interface” for that host in a terminal on the StackIQ Cluster Manager. In this instance, the address used for controller_pub_host is on eth2, which we cabled to the corporate external network, and is 192.168.1.60.

This can be seen below. Once you’ve made the changes, click “Submit.”

 
Going back to the “Hosts” tab, you should see that “compute-0-0.local” now has the “Controller (Nova Network)” role.
There is a puppet agent that runs on each machine. It runs every 30 minutes. This will automatically update the machine’s configuration and make it the OpenStack Controller. If you don’t want to wait that long, start the puppet process yourself from StackIQ Cluster Manager. (Alternatively, you can ssh to compute-0-0 and manually run “puppet agent -tv”.)

Once the puppet run finishes, you can add OpenStack Computes. (The puppet run on the controller node can take a while to execute.)

Add OpenStack Compute Nodes

There isn’t much for an OpenStack Controller to do if it can’t launch instances of images, so we need a couple of hypervisors. We’ll do this a little differently than the Controller node, where we edited one individual machine, and instead, edit the “Host Group” we want the computes to run as. This allows us to make the changes once and apply them to all the machines.

Go to “More” and choose “Configuration” from the drop down, then click on “Host Groups” in the next drop down.

Click on “Compute (Nova Network)” and it will bring you to an “Edit Compute (Nova Network)” screen:

Choose the “Parameters” tab:
 

We’re going to edit a number of fields, similar to the Controller node. Click the “override” button on each of the following parameters and edit them at the bottom of the page:

controller_priv_host - set to private IP address of controller

controller_pub_host - set to public IP address of controller

mysql_host - set to private IP address of controller

qpid_host - set to private IP address of controller

nova_network_private_iface - the device of the private network interface

nova_network_public_iface - the device of the public network interface

The nova_network_*_iface parameters default to em1 and em2. These may work on the machines in your cluster, and you may not have to change them. Since the test cluster is on older hardware, eth0, eth1, and eth2 are where the networks sit, so for this test cluster the appropriate changes are as below. The test cluster needs the eth2 interface for the public network because it is using foreman-0-0 as a compute node. If your Foreman node is not part of your test cluster, you may not need to change this.

More advanced networking configurations, i.e. when using multiple networks or using Neutron, may require additional parameters.

Click Submit. Any host that is listed with the “Compute (Nova Network)” role will inherit these parameters.

Now let's add the hosts that will belong to the “Host Group” Compute (Nova Network).

Go to the “Hosts” tab once again, and choose all the hosts that will run as Nova Network Computes. In this example, since it’s such a small cluster, we’ll add the “foreman-0-0” machine as an OpenStack Compute:

Now click on “Select Action” and choose “Change Group.”

 

Click on “Select Host Group” and choose “Compute (Nova Network)” then click “Submit.”

 

The hosts should show the group they’ve been assigned to:

 

Again, you can wait for the Puppet run or spawn it yourself from the StackIQ Cluster Manager. Since we have a group of machines, we will use “rocks run host” to spawn “puppet agent -tv” on all the machines. If we had chosen only the Compute nodes for the OpenStack Compute role and not the Foreman node, we could restrict the run to just the computes by specifying their appliance type; both forms are sketched below.
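
A minimal sketch of both forms (the host and appliance selector syntax can vary by release, so check "rocks run host help" on your cluster before relying on it):

# rocks run host "puppet agent -tv"

and, restricted to the compute appliances only:

# rocks run host compute "puppet agent -tv"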

Once puppet has finished, log into the OpenStack Controller Dashboard to start using OpenStack.

Using OpenStack

To access the controller node, go to http://<controller node ip>. This is accessible on either the public IP you configured for this machine or at the private IP. If you have only configured this on the private IP, you’ll have to open a browser from the StackIQ Cluster Manager or port forward SSH to the private IP from your desktop.
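
As a sketch of the port-forwarding option, standard SSH forwarding works; substitute your own Cluster Manager hostname and the controller's private IP (8080 is just an arbitrary local port):

# ssh -L 8080:<controller private IP>:80 root@<StackIQ Cluster Manager>

Then browse to http://localhost:8080 from your desktop.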

The username is “admin” and the password was randomly generated during the Controller puppet install. To get this password, go to the Foreman web GUI, click on the “Hosts” tab and click on the host name of the Controller host: 

 

Then click “Edit” and go to the “Parameters” tab:

 

Copy the “admin_password” string:

 

 

Paste it into the password field on the OpenStack Dashboard and click “Submit.”

 

 

You should now be logged into the OpenStack Dashboard

 

 

Click on “Hypervisors”; you should see the three OpenStack compute nodes you’ve deployed.

 

As a simple example, we’ll deploy the Cirros cloud image that OpenStack uses in their documentation.

Click on “Images.”

 

Click on “Create Image” and you’ll be presented with the image configuration window.

 

Fill in the required information:

Name - we’ll just use “cirros”

Image Source - use default “Image Location”

Image Location - http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img

 

Why do we know this? Because I looked it up here: http://docs.OpenStack.org/image-guide/content/ch_obtaining_images.html

Format - QCOW2

And make it “Public”

Then click “Create Image.”

The image will show a status of “Queued.”

 

And when it’s downloaded and available to create Instances, it will be labeled as “Active.”

 

Cool! Now we can actually launch an instance and access it.

Adding an Instance:

Click on “Project” then on “Instances” in the sidebar:

 

Click on “Launch Instance.”

 

Fill out the parameters:

Availability Zone - nova, default

Instance Name - we’ll call it cirros-1

Flavor - m1.tiny, default

Instance Count - 1, default

Instance Boot Source - Select “Boot from Image”

Image Name - Select “cirros”

 

It should look like this:

 

Now click "Launch."
You should see a transient “Success” notification on the OpenStack Dashboard and then the instance should start spawning.
When the instance is ready for use, it will show as “Active” with power state “Running,” and log-in should work. (The Cirros login is “cirros” and the password is “cubswin:)” - the smiley face is part of the password.)

Logging into the Instance

In this simple example, to log into the instance, you must log into the hypervisor where the instance is running. Subsequent blog posts will deal with more transparent access for users.

 

To find out which hypervisor your instance is running on, go to the “Admin” panel from the left sidebar and click on “Instances.”


 

We can see the instance is running on compute-0-1 with a 10.0.0.2 IP. So from a terminal on the frontend, ssh into the hypervisor compute-0-1.


 

Now log into the instance as user “cirros” with password “cubswin:)” (the smiley face is part of the password).
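
From a terminal on the frontend, the two hops look roughly like this (the host name and instance IP are the ones used in this example):

# ssh compute-0-1
# ssh cirros@10.0.0.2

The first command lands you on the hypervisor as root; the second logs into the instance itself and prompts for the “cubswin:)” password.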

 

 


 

Now you can run Linux commands to prove to yourself you have a functioning instance:


 

Reinstalling

There are times when a machine needs to be reinstalled: hardware changes or repair, uncertainty about a machine’s state, etc. A reinstall generally takes care of these issues. The goal of StackIQ Cluster Manager is to have software homogeneity across heterogeneous hardware. StackIQ Cluster Manager allows you to have immediate consistency of your software stacks on first boot. One of the ways we do this is by making reinstallation of your hardware as fast as possible (reinstalling 1,000 nodes is about as fast as reinstalling 10) and correct when a machine comes back up.

One of the difficulties with the OpenStack puppet deployment is certificate management. When a machine is first installed and communicates with Foreman, a persistent puppet certificate is created. When a machine is re-installed or replaced, the key needs to be removed in order for the machine to resume its membership in the cluster. StackIQ Cluster Manager takes care of this by watching for reinstallation events and communicating with the Foreman server to remove the puppet certificate. When the machine finishes installing, the node will rejoin the cluster automatically. In the instance of a reinstall, if the OpenStack role has been set for this machine, the node will do the appropriate puppet run and rejoin OpenStack in the assigned role, and you really don’t have to do anything special for that to happen.

To reinstall a machine:

# rocks run host “/boot/kickstart/cluster-kickstart-pxe”

or

# rocks set host boot action=install

# rocks run host “reboot”

If you wish to start with a completely clean machine and don’t care about the data on it, set the “nukedisks” flag to true before doing one of the above installation commands:

# rocks set host attr nukedisks true

Multi-use clusters
StackIQ Cluster Manager has been used to run multi-use clusters with different software stacks assigned to different sets of machines. The OpenStack implementation is no different. If you want to allocate machines for another application and you’re using the RHEL OpenStack Bridge roll, you can turn off OpenStack deployment on certain machines, and they will not be set up to participate in the OpenStack environment. To do this, simply do the following:

# rocks set host attr has_openstack false

The bridge roll sets every compute node to participate in the OpenStack distribution. Setting this flag to “false” for a host means the machine will not participate in the OpenStack deployment. If the machine was first installed with OpenStack, then you will have to reinstall it after setting this attribute.
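
For example, to take one node out of the OpenStack deployment and rebuild it, something like the following works (the host name is illustrative, and the exact argument order for these commands can vary by release, so check "rocks set host attr help" and "rocks run host help" on your cluster):

# rocks set host attr compute-0-2 has_openstack false
# rocks run host compute-0-2 "/boot/kickstart/cluster-kickstart-pxe"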

Updates

Red Hat regularly provides updates to Red Hat Enterprise Linux OpenStack Platform and to Red Hat Enterprise Linux Server. StackIQ tracks these updates and will provide updated rolls for critical patches or service updates to Red Hat Enterprise Linux OpenStack Platform. Additionally, if your frontend is properly subscribed to RHN or to Subscription Manager, these updates can be easily pulled and applied with the "rocks create mirror" command. Updating and subscription management deserves a blog post of its own and will be forthcoming.

Next Steps

Admittedly, we are documenting the simplest use case - Nova Networking on a single network. This is not ideal for production systems, but by now, you should be able to see how you can use the different components, StackIQ Cluster Manager, Foreman, and OpenStack Dashboard to easily configure and deploy OpenStack. Adding complexity can be done as you explore the Red Hat Enterprise Linux OpenStack Platform ecosystem to fit your company’s needs.

In the future, we will provide further documentation on deploying Neutron, Swift, and Cinder. Additionally, layering OpenStack roles (Swift and Compute, for instance) will be topics we will be exploring and blogging about as we move forward with Red Hat Enterprise Linux OpenStack Platform. Stay tuned!

Resources

StackIQ:
Using StackIQ Cluster Manager for deploying clusters: https://s3.amazonaws.com/stackiq-release/stack3/roll-cluster-core-usersguide.pdf. Video: https://www.youtube.com/watch?v=gVPZcA-yHQY&list=UUgg-AnfqnNCp-DxpVEfJkuA

Red Hat:
Red Hat Enterprise Linux OpenStack Platform Documentation: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/

We’re happy to answer questions on installing, configuring, and deploying Red Hat Enterprise Linux OpenStack Platform with StackIQ Cluster Manager. Please send email to support@stackiq.com.
Greg Bruno, Ph.D., VP of Engineering, StackIQ
@StackIQ

How to Use Cloudera Enterprise 5 With StackIQ Cluster Manager


(Note that these instructions are for Cloudera Enterprise 5. To use StackIQ Cluster Manager with the previous Cloudera release, please see the previous blog post.)

StackIQ takes a “software defined infrastructure” approach to provision and manage cluster infrastructure that sits below Big Data applications like Hadoop. In this post, we’ll discuss how this is done, followed by a step-by-step guide to installing Cloudera Manager on top of StackIQ’s management system.

Components:

The hardware used for this deployment was a small cluster: 1 node (i.e. 1 server) is used for the StackIQ Cluster Manager and 4 nodes are used as backend/data nodes. Each node has 2 disks and all nodes are connected together via 1Gb Ethernet on a private network. The StackIQ Cluster Manager node is also connected to a public network using its second NIC. StackIQ Cluster Manager has been used in similar deployments ranging from 2 nodes to 4,000+ nodes.

Step 1: Install StackIQ Cluster Manager

The StackIQ Cluster Manager node is installed from bare metal (i.e. there is no prerequisite software and no operating system previously installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it (the StackIQ Cluster Core Roll can be downloaded from the Rolls section after registering). The Core Roll leads the user through a few simple forms (e.g., what is the IP address of the Cluster Manager, what is the gateway, DNS server) and then asks for a base OS DVD (for example, Red Hat Enterprise Linux 6.5; other Red Hat-like distributions such as CentOS are supported as well). The installer copies all the bits from both DVDs and automatically creates a new Red Hat distribution by blending the packages from both DVDs together.

The remainder of the Cluster Manager installation requires no further manual steps and this entire step takes between 30 and 40 minutes.

 

Step 2: Install the CDH Bridge Roll

StackIQ has developed software that “bridges” our core infrastructure management solution to Cloudera’s Hadoop distribution that we’ve named the CDH Bridge Roll. One feature of our management solution is that it records several parameters about each backend node (e.g., number of CPUs, networking configuration, disk partitions) in a local database. After StackIQ Cluster Manager is installed and booted, it is time to download and install the CDH Bridge Roll:

  • Log into the frontend as "root" and download the cdh-bridge ISO from here.

  • Then execute the following commands at the root prompt:

 # rocks add roll <path_to_iso>
 # rocks enable roll cdh-bridge
 # rocks create distro
 # rocks run roll cdh-bridge | sh

The cluster is now configured to install Cloudera packages on all nodes.

 

Step 3: Install Cloudera Manager and Cloudera CDH5 Roll

You can download a prepackaged Cloudera Manager here and a prepackaged Cloudera CDH5 from here.

We will now install these 2 ISOs.

 rocks add roll cloudera-cdh5-6.5-0.x86_64.disk1.iso
 rocks add roll cloudera-manager5-6.5-0.x86_64.disk1.iso
 rocks enable roll cloudera-cdh5
 rocks enable roll cloudera-manager5
 rocks create distro
 rocks run roll cloudera-cdh5 | sh
 rocks run roll cloudera-manager5 | sh
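
As with the CDH4 instructions, you can confirm that both rolls were added and are enabled before moving on (output varies by release):

 # rocks list roll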

 

Step 4: Install the backend nodes

Before we install the backend nodes (also known as compute nodes), we want to ensure that all disks in the backend nodes are optimally configured for HDFS. During an installation of a data node, our software interacts with the disk controller to optimally configure it based on the node’s intended role. For data nodes, the disk controller will be configured in “JBOD mode” with each disk configured as a RAID 0, a single partition will be placed on each data disk and a single file system will be created on that partition. For example, if a data node has one boot disk and 4 data disks, after the node installs and boots, you’ll see the following 4 file systems on the data disks: /hadoop01, /hadoop02, /hadoop03 and /hadoop04.

For more information on this feature, see our blog post Why Automation is the Secret Ingredient for Big Data Clusters.

Now we don’t want to reconfigure the controller and reformat disks on every installation, so we need to instruct the StackIQ Cluster Manager to perform this task the next time the backend nodes install. We do this by setting an attribute (“nukedisks”) with the rocks command line:

# rocks set appliance attr compute nukedisks true
# rocks set appliance attr cdh-manager nukedisks true

Now we are ready to install the backend nodes. First, put the StackIQ Cluster Manager into "discovery" mode using the CLI or GUI, then PXE boot all backend nodes. We will boot the first node as a cdh-manager appliance. The cdh-manager node will run the Cloudera Manager web admin console used to configure, monitor, and manage CDH.


After installation, the cdh-manager node shows up as below:

[Screenshot: the cdh-manager node after discovery]

We will install all the other nodes in the cluster as compute nodes. StackIQ Cluster Manager discovers and installs each backend node in parallel (10 to 20 minutes) - no manual steps are required.


For more information on installing and using the StackIQ Cluster Manager (a.k.a., Rocks+), please visit StackIQ Support or watch the demo video.

After all the nodes in the cluster are up and running you will be ready to install Cloudera Manager. In this example, the StackIQ Cluster Manager node was named “frontend” and the compute nodes were assigned default names of compute-0-0, compute-0-1, compute-0-2 (3 nodes in Rack 0), and compute-1-0 (1 node in Rack 1).

 

Step 5: Install Cloudera Manager 

SSH into the cdh-manager appliance and, as root, execute:

# /opt/rocks/sbin/cm5/cloudera-manager-installer.bin --skip_repo_package=1

This will install Cloudera Manager with packages from our local yum repository as opposed to fetching packages over the internet.

 

Step 6: Select What to Install 

Log into the cdh-manager node at http://<cdh-manager>:7180 (where '<cdh-manager>' is the FQDN of your cdh-manager node) with username admin and password admin.

Choose Cloudera Enterprise trial if you want to do a trial run.

Click Continue in the screen below.

Specify the list of hosts for CDH installation, e.g., compute-0-[0-3],cdh-manager-0-0.


After all the hosts are identified, hit Continue.


Choose Use Packages and select CDH5 as the version in the screen below.

Specify a custom repository for the CDH release you want to install: use http://<frontend>/install/distributions/rocks-dist/x86_64/ as the URL of the repository, where <frontend> is the IP address of the cluster’s frontend.


In the example above, 10.1.1.1 was the IP address of the private eth0 interface on the frontend.

Choose All hosts accept same private key as the authentication method. Use Browse to upload the private key present in /root/.ssh/id_rsa on StackIQ Cluster Manager.


You will then see a screen where the progress of the installation will be indicated. After installation completes successfully, hit Continue.

You will then be directed to the following screen where all hosts will be inspected for correctness.

Choose a combination of services you want to install and hit Continue.

Review that all services were successfully installed.


Finally, your Hadoop services will be started.

Step 7: Run a Hadoop sample program

It is never enough to set up a cluster and the applications users need and then just let them have at it; that usually leads to nasty surprises for both parties. A validation check is required to make sure everything is working the way it is expected to.

Do the following to test whether the cluster is functional:

  • Log into the cdh-manager node as “root” via SSH or PuTTY.

  • On the command line, run the following map-reduce program as the “hdfs” user, which runs a simulation to estimate the value of pi based on sampling:

# sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 10000

When the job completes, it prints an estimated value of Pi.


Congratulations, you are done!

We’re certain you’ll find this the quickest way to deploy a cluster capable of running Cloudera Hadoop. Give it a shot and send us your questions!

The StackIQ Team

@StackIQ

Is OpenStack Ready for Primetime?


Here at StackIQ we are all very excited about OpenStack and the possibilities that it creates for businesses that think about adopting the private cloud model.

OpenStack is gaining significant momentum in the enterprise data center and there is a common consensus among cloud architects that it’s now ready for prime time. Well, maybe there are a few doubters out there but what can you do about that, right?

Anyway, we wanted to share a few thoughts on what’s going on in the space and what we have been working on with regard to OpenStack.

  • First of all, why private cloud? There are ample options to get your cloud on by working with vendors like Amazon (AWS), Google, Microsoft, etc., but these are public cloud solutions. Specialty hardware or configuration requirements, regulatory requirements, network latency, and security concerns are just a few use cases where a private cloud solution is required to get the job done.
  • Data centers have been transitioning from proprietary solutions to open source software and less costly commodity hardware. We have seen this happening in Big Data, general data center applications, HPC, etc. Now the paradigm shift is starting in the cloud space as well.
  • In private cloud environments, VMware, for example, has been a dominant player, with over 50% of enterprises using VMware products for cloud virtualization projects. It supposedly works out of the box, but maintaining a VMware cloud is pricey, and up until now there were very few enterprise-grade alternatives available.
  • And here comes OpenStack. It’s a viable solution and, like other open source projects, it levels the playing field for businesses of all shapes and sizes. But there is a caveat. Like everything open source, OpenStack is a set of loosely coupled projects, and operating an OpenStack-powered cloud on your own brings its own set of challenges with deployment and management.

So, is OpenStack ready for business and can it become the standard for private cloud in the enterprise?

We say yes, OpenStack is ready for the enterprise data center and it will unlock new possibilities. But like we have seen with Big Data (Hadoop), the cornerstones of OpenStack’s success in the enterprise are Automation, Integration and Scalability.

Here are a few fundamental things to keep in mind if you have been playing around with the idea of jumping on the OpenStack bandwagon: 

  • Automate as much maintenance work as possible and free up the IT workforce to create new applications, rather than care and feeding for infrastructure.
  • The infrastructure must be compatible to integrate with a wide range of hardware and applications.
  • Easy to scale – infrastructure must remain flexible and stable enough to rapidly add additional capacity to meet business requirements

Many vendors are lining up to emerge as the leader as the OpenStack ecosystem continues to grow. Red Hat and many others back the project but we believe that Red Hat, with its experience in Linux and the footprint of RHEL in the enterprise, is clearly in a great position to take the lead. That’s why we certified our software suite with Red Hat and joined the Red Hat OpenStack Cloud Infrastructure ecosystem.

StackIQ’s holistic automation is helping to accelerate OpenStack adoption in the enterprise by reducing the resources needed for deployment and management. Just like StackIQ automates the deployment and management for Hadoop, we now offer the same capabilities for Red Hat OpenStack Platform customers.

Alcatel-Lucent already utilizes StackIQ for the deployment and management of Red Hat's OpenStack Platform for its CloudBand™ NFV (Network Functions Virtualization) Platform. Alcatel-Lucent’s technology is used by the largest telecommunications operators like T-Mobile, Telefonica and NTT. Read the announcement from earlier this year. Other heavy hitters like AT&T, Comcast, Gap and Disney have already deployed, or announced their intention to deploy OpenStack-powered clouds in the near future.

Together, StackIQ and Red Hat are committed to provide a best-of-breed solution to accelerate the adoption of OpenStack technology in the enterprise data center.

Ready to take the OpenStack plunge? Start by reading Dr. Bruno’s blog post on how to deploy and manage Red Hat OpenStack Platform with StackIQ. More questions? Talk to us. You like infographics? You just found the best OpenStack infographic on the web, and you are welcome (thanks to IDG Connect and Red Hat). 

The StackIQ Team (@StackIQ)

Automate the Way You Work With Spreadsheets


This post discusses how to track data center topology using spreadsheet applications like Microsoft Excel or Google Docs Spreadsheets. Many data center and network operators maintain the topology of services, appliances, hosts, configurations, and so on in spreadsheets. Because spreadsheets are a portable format, they make it easy to track changes and move the data around. They also give the administrator fine-grained control over the topology of their data center operations.

However, one of the challenges of maintaining data in spreadsheets is translating from spreadsheet to actual implementation. The administrator has to read the data from the spreadsheet and manually type in the commands on the console that bring the system up to the state described in the spreadsheet. Even with experienced system administrators, this is subject to error and failure. So how do you keep the advantages of the spreadsheet format and, at the same time, automate the process?

Enter StackIQ – The Automated Way to Work With Spreadsheets

If you have been following us for a while, you already know that here at StackIQ we believe that automation is the key to success in today’s enterprise data center, and if there is a way to automate it, we’ll find it. Here is how to leverage the information in a spreadsheet and eliminate the manual process involved.

If the data stored in the spreadsheet is in a compatible format (we’ll get to which formats are compatible later), StackIQ can ingest the spreadsheet directly into a running StackIQ Cluster Manager. Our software then automatically translates the data into runnable commands. This way, the data stored in the spreadsheet is no longer just a description; it is an actual representation of the desired state of the cluster.

There are two types of spreadsheets that StackIQ currently supports. One is the Hosts spreadsheet, and the other is the Attributes spreadsheet.

Hosts Spreadsheet

Let’s start with the Hosts spreadsheet. The Hosts spreadsheet, shown below, is used to add hosts to an existing StackIQ Cluster Manager.[1]

Screen_Shot_2014-07-10_at_3.39.54_PM

As the spreadsheet shows, if we know the MAC addresses of the machines in our cluster, we can assign IP addresses, hostnames, network information, rack and rank information, and appliance types to each of these machines. This gives the administrator fine-grained control over the cluster.
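To make that concrete, here is a rough, hypothetical sketch of such a file. The column names and values below are illustrative only; use the layout shown in the screenshot above (or the sample file shipped with StackIQ) as the authoritative format:

name,appliance,rack,rank,ip,mac,network
compute-0-0,compute,0,0,10.1.255.254,a4:ba:db:00:00:01,private
compute-0-1,compute,0,1,10.1.255.253,a4:ba:db:00:00:02,private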

Importing a Hosts spreadsheet is a very simple process:

1. Download the spreadsheet on to StackIQ Cluster Manager. Let's call it hosts_config.csv

2. Run the command:

# rocks load hostfile file=hosts_config.csv

When a spreadsheet is ingested using the above command, the network information and hostnames in the spreadsheet are used to configure the hosts. If the administrator decides to change the naming or networking information, the spreadsheet is updated and the process is repeated: ingest the spreadsheet and reinstall the hosts.

Attributes Spreadsheet

On a StackIQ Cluster Manager, the configuration information for the cluster and the properties of the hosts are maintained in a database as key-value pairs. These properties are called attributes. Attributes follow a simple hierarchical schema: global attributes, appliance attributes, and host attributes, with each level taking precedence over the previous one. Attributes can be manipulated using the StackIQ command line or a spreadsheet.

The Attribute spreadsheet, shown below, is used to manipulate attributes on StackIQ Cluster Manager.

Screen_Shot_2014-07-10_at_3.42.10_PM

This simple spreadsheet shows the following for this cluster (a sketch of the corresponding CSV appears after the list):

  • The discover_start attribute is set to true in the global scope.
  • The nukedisks attribute is set to true for all compute appliances.
  • compute-0-0, however, has the nukedisks attribute set to false, which overrides the appliance-level setting.
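In CSV form, one plausible layout (a sketch only; mirror the screenshot above for the authoritative header names) uses a target column plus one column per attribute:

target,discover_start,nukedisks
global,true,
compute,,true
compute-0-0,,false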

Importing the Attributes spreadsheet is as simple as the above process for the Hosts spreadsheet.

1. Download the spreadsheet on to the cluster manager. Let's call it attr_list.csv

2. Run the command:

# rocks load attrfile file=attr_list.csv

Now, we can set the hosts to install using the command:

# rocks set host boot compute ambari action=install 

Then, we power-cycle the hosts to let them install, and boot up into a running OS. And that’s it!

Try it for yourself - Download our software (free for up to 16 nodes), use the spreadsheet example from this post or create your own, and spin up a cluster.

Going forward, we plan to use spreadsheets to configure more services on the cluster. We’re working on using spreadsheets to configure disk controllers, disk partitioning, and Hadoop services through Ambari and Cloudera Manager. Stay tuned and check back for more developments.

Any questions or comments? Contact us @StackIQ

The StackIQ Team

[1] StackIQ Cluster Manager is the machine that provisions the OS on the backend nodes, and manages and monitors the installation.

The Pain Curve: The Complexity of Clusters and Why Clusters are So Different


I've been building clusters for my entire professional career and I've known that clusters are different, but never could quite articulate why. Until now. 

After having many conversations with operations team members from a broad cross-section of enterprises, I now have a handle on why clusters are so different from farms of single-purpose servers that reside in traditional data centers.

For every organization that operates a cluster with traditional data center tools, there is what I call a "Pain Curve" (see diagram below). It is difficult to quantify the number of servers required to reach the pain threshold, which is dependent on the size and quality of the operations staff. But one thing is certain – for those who don't have an automated solution that can address the cluster requirements of uniform software stacks, consistent service configurations, and total server awareness, real pain is coming and failure is inevitable.

Due to the rise of Hadoop and OpenStack, many enterprises are now deploying their first small clusters of 10 to 20 servers. At this small scale, the complexity of operating the cluster looks and feels like that of 10-20 general data center servers – that is because we are on the far left side of the operational complexity graph below. It is not until the clusters scale, as they inevitably do, that the pain caused by the exponential complexity becomes apparent. We've seen this problem occur time and time again.

dr-brunos-pain-curve-5

Consider one real-world example involving a top-tier financial services company. They were building a Hadoop cluster, and their projected production cluster was scoped to be 100 servers. They had plenty of experience running 100-server clusters before, and they felt they had the situation under control. As they did in the past, they cobbled together a home-grown project to manage their small cluster.

Soon after, they put the 100-server cluster into production and, once operational, it generated so much value for the business that demand skyrocketed and they had to scale the cluster. The cluster was expanded to 350 servers, but somewhere between 100 and 350 servers the home-grown project failed. All 350 servers went down – the cluster effectively became a multi-million dollar paperweight.

They had crossed the pain threshold, and they recognized that their home-grown project had landed them in the “Failure Zone.” 

Finally, after months of pain and inevitable failure, this company utilized an automated solution that was meticulously designed to manage clusters at scale, and the company was able to put all 350 servers back online again with a sustainably configured architecture in just 36 hours.

 

Clustered Servers See the World Differently

Why did this global financial services firm have such trouble solving this seemingly simple problem? Because the worldview of a single-purpose server in a traditional data center is that it accepts external requests, processes those requests, then responds to the requester. It is like a thoroughbred wearing blinders in the Kentucky Derby. Such a detached server has no notion of any of the other hundreds or thousands of servers that are happily churning through requests in the adjoining racks. As new servers are added to the data center, they are racked and stacked, installed and configured, then brought online -- the existing servers remain untouched.

The worldview of a server in a cluster is vastly different. By definition, each server in a cluster must be aware of every other server in the cluster. This is so the servers in the cluster can “collectively” accept external requests, process those requests, and then respond to the request – as a team.

At the absolute minimum, each server in a cluster must know about all the other servers in the cluster. Additionally, each cluster service (e.g., Hadoop services) must be configured with the awareness of the other services on all the cluster servers. And, more often than not, each service must be executing on top of the exact same software stack on each cluster server in order to produce consistent and correct results. 

In short, all cluster servers must: 1) have the exact same bits on each server; 2) have the exact same software configuration; and 3) have total awareness of each of the cluster servers. As new servers are added to a cluster, the new servers must satisfy all three of the above requirements (same bits, same configuration and total awareness). Moreover, the existing clustered servers must now be aware of the newly added servers.

Back to the single-purpose servers in the data center. Since each server is an island, as new servers are added, the burden on the operations team increases by a "linear" amount. If I'm wearing my computer science hat, the complexity of operations for general data center servers is O(N), where N = number of servers. It is linear because new servers do not require configuration changes to the existing servers.

Contrast this with the total awareness requirement for cluster servers. Newly added servers increase the burden on the operations team by an "exponential" amount because the existing servers must be reconfigured to be made aware of the new servers. In other words, the complexity of operating clusters is O(N²).
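To put rough numbers on that, compare the bookkeeping for 350 machines under each model (a back-of-the-envelope count of pairwise relationships, not a measurement of any particular product):

350 single-purpose servers: 350 independent configurations to maintain, i.e., O(N)
350 clustered servers: 350 x 349 / 2 = 61,075 pairwise relationships to keep consistent, i.e., O(N²)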

This level of coordination is what makes clusters the obvious choice for next-gen Big Data platforms like Hadoop and cloud architectures like OpenStack. Clusters deliver vastly greater speed, power, and agility, but, as we now know, that same coordination also makes clusters too complex to manage without an automated solution. 

 

 

Configure and Deploy Ambari and Hortonworks Data Platform (HDP) with StackIQ


Hortonworks Data Platform (HDP) is an enterprise-grade Hadoop distribution from Hortonworks. Architected, developed, and built completely in the open, HDP provides Hadoop designed to meet the needs of enterprise data processing.

The deployment of HDP on a cluster is a non-trivial task. With this in mind, the engineers at Hortonworks came up with a service called Ambari. Ambari provides a web interface that enables the deployment of Hadoop services across a cluster. However, this introduces an additional deployment step: while Ambari is used to deploy HDP on a cluster, Ambari itself needs to be set up on that cluster too. StackIQ Cluster Manager automates the deployment of Ambari in a few simple steps. Automation becomes increasingly valuable in preventing the pain and failure many enterprises experience as clusters scale.

Using the StackIQ Bridge Roll for HDP, the administrator can easily deploy Ambari on a cluster by following the step-by-step instructions below.

 

Step 1: Install the StackIQ Cluster Manager

Install StackIQ Cluster Manager with the StackIQ Bridge Roll for HDP as part of the installation. For instructions on how to install StackIQ Cluster Manager with the StackIQ Bridge Roll for HDP, refer to the StackIQ core installation guide and the StackIQ Bridge Roll for HDP documentation.

 

Step 2: Download HDP and Ambari Bits

Once StackIQ Cluster Manager is up and running, you are ready to download the StackIQ Bridge Rolls for HDP and Ambari.

Download the following Rolls from the given locations:

1. HDP 2.x Roll: https://s3.amazonaws.com/stackiq-release/stack3/HDP-2.x-6.5-0.x86_64.disk1.iso

2. Updates to HDP Roll: https://s3.amazonaws.com/stackiq-release/stack3/Updates-HDP-2.x-6.5-0.x86_64.disk1.iso

3. Ambari Roll: https://s3.amazonaws.com/stackiq-release/stack3/ambari-1.x-6.5-0.x86_64.disk1.iso

4. Updates to Ambari Roll: https://s3.amazonaws.com/stackiq-release/stack3/Updates-ambari-1.4.4.23-6.5-0.x86_64.disk1.iso

5. HDP Utils 1.1.0-16 Roll: https://s3.amazonaws.com/stackiq-release/stack3/HDP-UTILS-1.1.0.16-6.5-0.x86_64.disk1.iso
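
The rolls can be fetched directly onto the StackIQ Cluster Manager, for example with wget (same URLs as listed above):

# wget https://s3.amazonaws.com/stackiq-release/stack3/HDP-2.x-6.5-0.x86_64.disk1.iso
# wget https://s3.amazonaws.com/stackiq-release/stack3/Updates-HDP-2.x-6.5-0.x86_64.disk1.iso
# wget https://s3.amazonaws.com/stackiq-release/stack3/ambari-1.x-6.5-0.x86_64.disk1.iso
# wget https://s3.amazonaws.com/stackiq-release/stack3/Updates-ambari-1.4.4.23-6.5-0.x86_64.disk1.iso
# wget https://s3.amazonaws.com/stackiq-release/stack3/HDP-UTILS-1.1.0.16-6.5-0.x86_64.disk1.iso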


Depending on your inbound network connectivity and speed, the above process may take anywhere from 30 minutes to 3 hours.

Step 3: Add the HDP and Ambari bits to your distribution

Once all the rolls are downloaded, add each ISO to your distribution:

# rocks add roll <rollname.iso>
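
For example, assuming the five ISOs from Step 2 are in the current working directory, the individual adds look like this:

# rocks add roll HDP-2.x-6.5-0.x86_64.disk1.iso
# rocks add roll Updates-HDP-2.x-6.5-0.x86_64.disk1.iso
# rocks add roll ambari-1.x-6.5-0.x86_64.disk1.iso
# rocks add roll Updates-ambari-1.4.4.23-6.5-0.x86_64.disk1.iso
# rocks add roll HDP-UTILS-1.1.0.16-6.5-0.x86_64.disk1.iso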

Enable the rolls by running:

# rocks enable roll HDP Updates-HDP ambari Updates-ambari HDP-UTILS

Re-create the distribution by running:

# rocks create distro

StackIQ Cluster Manager is now fully-capable of deploying Ambari on your cluster.

 

Step 4: Deploy Ambari Appliance

Using the StackIQ web interface, or the insert-ethers utility, install an Ambari appliance. This will install and configure Ambari on the appliance.

Note: Deploy only one machine in your cluster as an Ambari appliance.


Step 5: Deploy Compute Appliances

After the Ambari appliance is deployed, you can then deploy all the other machines in your cluster as compute appliances using the same StackIQ web UI or the insert-ethers utility.


Step 6: Configuring HDP Using Ambari

The Ambari service is a web service that allows the administrator to deploy HDP services, like HDFS, MapReduce, YARN, Hive, HBase, etc.

The administrator can access the Ambari Web interface at port 8080 on the Ambari appliance. For example, if the hostname of the Ambari appliance is ambari-0-0, point your web-browser to http://ambari-0-0:8080/

Log in to Ambari as user "admin" with the password "admin".

Image_1

Enter the name that you'd like to use for the HDP cluster. Click "Next"

Image_2
Select the version of HDP that you'd like to use. By default, only the latest version is made available. For any other version, the bits will have to be downloaded.
Image_3

The repository should already be set to point to the correct location. Verify this. 

Image_4

Enter the bootstrapping information about the hosts, and the private key.

Image_5

Click "Next" to start the installation of hosts.

Image_6.1

 

After this, the wizard may warn you of inconsistencies in the system. Unless the install fails, this warning may be ignored.

IMage_7

 

Next, choose the services you want to install

Image_8

 

Assign the master/server components to the hosts 

Image_9

Assign client components 

Image_10

Customize the services. This allows you to fix mistakes in the default configuration. The services that require attention have a red tag next to them, as shown below, next to Hive, Oozie, and Nagios.

Image_11

Review the setup, and click "Deploy"

Image_12

This last screen shows the status of the installation. 

Image_13

At the end of this process, you should have a fully-functional HDP installation. Congratulations!

Image_14

We’re certain you’ll find this the quickest and easiest way to configure and deploy an HDP cluster. Give it a shot and send us your questions or comments!

The StackIQ Team

@StackIQ

Bringing Much Needed Automation to OpenStack Infrastructure


If you read our previous blog post, you already know that here at StackIQ we are excited about OpenStack and the possibilities it creates for the enterprise. We hinted at a few challenges that we see with OpenStack deployments and took it a step further with a white paper. Here is a little teaser of what's discussed. Have a read, and if you're interested in more, just download the entire white paper for free.

As more companies become fluent in cloud computing, they tend to deploy more and more types of cloud workloads. As these workloads expand with greater scope and maturity, businesses can derive even greater value. Benefits can include dramatically lower costs for data center infrastructure, while increasing scalability and availability. All of this translates to faster time to market with much greater business flexibility.

OpenStack has quickly emerged as the leading cloud architecture built on open source software for scalable private clouds in the enterprise. One notable challenge of OpenStack, similar to deploying Hadoop for Big Data, is the exponential curve of complexity in deploying and managing the clusters needed to build OpenStack infrastructure at scale, which can quickly become unwieldy for IT managers.

OpenStack does not include a comprehensive off-the-shelf platform for managing cloud infrastructure – which leads to a high degree of complexity at scale. The need for greater efficiency and automation has become all too clear in enterprise deployments of OpenStack. For this reason, StackIQ has partnered with Red Hat to simplify the process of deploying and managing Red Hat Enterprise Linux OpenStack Platform.

With StackIQ Cluster Manager, data center managers can greatly reduce the time needed for large private cloud deployments and greatly increase reliability and manageability by automating the complex configuration steps for the hardware infrastructure and application stacks of heterogeneous data center environments.

Interested to read more? Download the white paper and let us know what you think. Enjoy the read.

The StackIQ Team


Deploying a Hortonworks Data Platform Cluster With the StackIQ CLI


 

In a previous blog post, we discussed how StackIQ Cluster Manager automates the installation and configuration of an Ambari server. That post went on to illustrate the installation of a Hortonworks Data Platform (HDP) cluster through the Ambari web interface. This follow-up post walks you through how to deploy an HDP cluster using StackIQ’s command line interface (CLI).

The Ambari web interface is very user-friendly, but to help automate cluster installation, a CLI is needed. To that end, the Ambari server provides a REST API for deploying and configuring HDP services and components. This REST API uses HTTP calls - GET, POST, PUT, and DELETE - to inspect, create, configure, and remove components and services in the Ambari server.

StackIQ’s Cluster Manager provides a CLI, with the moniker “rocks”, that allows system administrators to manage large cluster installations with ease. The CLI has a complete view of the entire cluster, the hardware resources, and the software configuration.

We’ve extended the StackIQ CLI to talk to the Ambari server using the REST API. This allows us to use the familiar “rocks” commands to add HDP clusters, hosts, services, components, and configure each of the services to fit the needs of the users.

There are two advantages to using the StackIQ CLI to deploy HDP:

  • As mentioned earlier, a CLI allows for easy scripting - hence easy automation.

  • The StackIQ CLI is completely plugged into the StackIQ database, and can transfer knowledge about the cluster to the Ambari server.

The list of StackIQ commands that talk to the Ambari server are given below.

  • run ambari api [resource=string] [command=GET|PUT|POST|DELETE] [body=JSON]

This command makes a generic API call to the Ambari server. This is the basis to all the Ambari-specific commands listed below.

For example, if we need to list all the clusters in an Ambari server, run:

# rocks run ambari api resource=clusters command=GET

To create a HDP cluster called “dev”, run:

# rocks run ambari api resource=clusters/dev command=POST body='{"Clusters": {"version": "HDP-2.1"}}'
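
The same pattern works for any resource path exposed by the Ambari REST API. For example, to inspect the cluster we just created (a read-only GET, safe for experimentation):

# rocks run ambari api resource=clusters/dev command=GET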

  • add ambari host {host}

Add hosts to the Ambari server. This bootstraps the host, runs the ambari-agent on the host, and registers the host with the Ambari server.

  • add ambari cluster {cluster} [version=string]

Add a HDP cluster to the Ambari server, with the given name and version number.

  • add ambari cluster host {hosts} [cluster=string]

Add a host to the HDP cluster. The host must be added to the Ambari server using the “rocks add ambari host” command before this command is run.

  • add ambari config [body=JSON] [cluster=string] [config=string]

Send service specific configuration to the Ambari server as a JSON string.

  • add ambari service [cluster=string] [service=string]

Add a HDP service (e.g., HDFS, MapReduce or YARN) to the HDP cluster.

  • add ambari host component {host} [cluster=string] [component=string]

Add a HDP component (e.g., HDFS Namenode, MapReduce History Server, etc.) to the HDP cluster.

  • report ambari config [cluster=string]

Print the desired HDP cluster configuration.

  • start ambari cluster {cluster}

Start the HDP cluster through Ambari.

  • start ambari host component {hosts} [cluster=string] [component=string]

Start a single host component in the HDP cluster.

  • start ambari service [cluster=string] [service=string]

Start a single service in the HDP cluster.

  • start ambari service component [cluster=string] [component=string] [service=string]

Start all instances of a service component in a HDP cluster.

  • stop ambari cluster {cluster}

Stop all services in a HDP cluster.

  • stop ambari host component {hosts} [cluster=string] [component=string]

Stop a single instance of a service component running on the specified host.

  • stop ambari service [cluster=string] [service=string]

Stop a single service in the HDP cluster.

  • stop ambari service component [cluster=string] [component=string] [service=string]

Stop  all instances of the specified component in a HDP cluster.

  • sync ambari config [cluster=string]

Synchronize the configuration of an Ambari cluster to the configuration specified in the StackIQ database. This command gets the output of the “rocks report ambari config” command, and applies it to the Ambari server.

  • sync ambari hosts {cluster}

Synchronize the hosts and host components to the HDP cluster. The database maintains a mapping of hosts to service components. For example, the StackIQ database may contain data mapping the “HDFS datanode” and “YARN Resource Manager” components to “compute-0-1”.

This command adds the host to the HDP cluster, and creates those components on the host.

  • sync ambari services {cluster}

Synchronize all the services to the HDP cluster. The list of services (e.g., HDFS, MapReduce, YARN, Nagios) that the admin wants to deploy is gleaned from the StackIQ database. This command gets the list of services from the database and creates the services in the HDP cluster.

  • create ambari cluster {cluster}

This command is a meta-command that runs some or all of the commands listed above.

Deploying a HDP cluster on a Newly Installed StackIQ Cluster

The “create ambari cluster” command can be used to initially deploy HDP on a newly installed StackIQ cluster.

Simply map the necessary HDP components to the specific backend nodes that you want the components to run on. This mapping can be done using the “rocks set host attr” command.

For example, if we want to deploy a namenode on compute-0-2, we can run:

# rocks set host attr compute-0-2 attr=hdp.hdfs.namenode value=true

We repeat the above command for all the components that we want to deploy. At a minimum, HDFS, ZooKeeper, Ganglia, and Nagios service components must be mapped to compute nodes. Once the host-to-service component mapping is satisfactory, we can run the “rocks create ambari cluster” command to deploy HDP.
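
As a sketch, a minimal sequence might look like the following. The hdp.hdfs.namenode attribute is the one shown above; the datanode attribute name is our assumption, following the same hdp.<service>.<component> pattern (the ZooKeeper, Ganglia, and Nagios components would be mapped the same way), so verify the exact names against your StackIQ database before running it:

# rocks set host attr compute-0-2 attr=hdp.hdfs.namenode value=true
# rocks set host attr compute-0-0 attr=hdp.hdfs.datanode value=true
# rocks set host attr compute-0-1 attr=hdp.hdfs.datanode value=true
# rocks create ambari cluster dev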

This command gets a list of all hosts in the StackIQ cluster, checks to see which hosts are to be used in the HDP cluster, adds them, creates the required services, maps the service components to the hosts in the cluster, installs them, configures them, and then starts these services.

The command also has added support for NameNode HA. If the namenode service component is mapped to two hosts and the secondary namenode service component is not specified, the command installs HDFS in High-Availability NameNode mode. This allows the standby namenode to take over namenode operations without losing any data if the primary namenode fails.

In the next blog post, we’ll tell you more about the host-to-service mapping, and using spreadsheets to manage it all. In the meantime, give this a try and let us know what you think. Stay tuned for more.

The StackIQ Team

@StackIQ

 

 

Configure and Deploy Puppet on Top of StackIQ Cluster Manager


Puppet is a popular software suite that helps sys admins with the configuration and management of applications. In this post we will give step-by-step instructions on how to integrate Puppet on top of StackIQ Cluster Manager.

Step 1: Install StackIQ Cluster Manager

Puppet

First off, install StackIQ Cluster Manager (download it first and see the documentation for detailed instructions). StackIQ Cluster Manager will also serve as the Puppet master. The Puppet master is installed automatically but left turned off. You can verify this by executing: 

service puppetmaster status

Step 2: Install Puppet Agents

By default, every backend node (we'll call them compute appliances) will have the Puppet packages installed with the Puppet service turned off (this is controlled by the attribute ‘puppet_agent_package’ which is set to “True” for all compute appliances). Puppet agents can be turned on later as needed.

If you want Puppet packages to be installed on other appliance types, run the below command in StackIQ Cluster Manager:

rocks set appliance attr <Appliance Type> attr=puppet_agent_package value=True

Step 3: Sign Certificates from Puppet Agents

Once the compute appliances are up, execute the following in StackIQ Cluster Manager:

rocks run host compute command='service puppet start'
puppet cert --list

The above will list all Puppet agents that have contacted the Puppet master. The command "puppet cert --sign --all" will sign all outstanding certificates; if individual certificates need to be signed, execute "puppet cert --sign <hostname>".

To control whether the Puppet agent should run on startup, we have the ‘run_puppet_agent’ attribute. Setting this to “True” ensures that the Puppet agent is running even after a reinstall.

rocks set appliance attr <Appliance Type> attr=run_puppet_agent value=True

By default this attribute value is set to "False" for all compute appliances.
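
For example, to keep the agent running across reinstalls on every compute appliance, reuse the same command pattern shown above:

rocks set appliance attr compute attr=run_puppet_agent value=True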

Step 4: Test Installation

Install a sample helloworld application from puppet-forge by executing the following command on StackIQ Cluster Manager:

puppet module install ricardogarcia-helloworld

The above module creates a directory called hello-world in /tmp. Now, we need to tell Puppet to sync this module to all compute nodes. We can specify this in the /etc/puppet/manifests/site.pp file. Create the site.pp file with the contents below:

node /^compute-.*$/ {

include helloworld

}

Note that /^compute-.*$/ is a regular expression that matches all hostnames beginning with "compute". Modify this appropriately for the hostnames in your cluster.

The helloworld module will be compiled into a catalog and synced to the agents every 30 minutes (by default). If you want to see the results immediately, you can restart the Puppet agents. Alternatively, instead of restarting the Puppet agents, you can execute a Salt command:

salt 'compute-*' puppet.run

Now, when you list the contents of /tmp/ on all Puppet agents, you should see the hello-world directory:

rocks run host compute command='ls /tmp/ | grep hello-world'
compute-0-0: hello-world

compute-0-1: hello-world
compute-0-2: hello-world

Alternatively, the logs on the Puppet agents will also have the message below:

Jun 12 12:06:14 compute-0-0 puppet-agent[13089]: Reopening log files
Jun 12 12:06:14 compute-0-0 puppet-agent[13089]: Starting Puppet client version 3.6.0

Jun 12 12:06:16 compute-0-0 puppet-agent[13092]: (/Stage[main]/Helloworld/File[/tmp/hello-world]/ensure) created
Jun 12 12:06:16 compute-0-0 puppet-agent[13092]: Finished catalog run in 0.05 seconds

And that's it! We’re certain you’ll find this the quickest and easiest way to configure Puppet on top of our cluster management software. Give it a shot and send us your questions or comments!

The StackIQ Team

@StackIQ

How to Deploy IBM InfoSphere BigInsights With StackIQ Cluster Manager


INTRODUCTION:

StackIQ has partnered with IBM to simplify the process of deploying IBM InfoSphere BigInsights. StackIQ Cluster Manager is ideal for deploying the hardware infrastructure and application stacks of heterogeneous data center environments. For InfoSphere BigInsights, this includes proper configuration of disks, InfoSphere BigInsights accounts, and the passwordless SSH required for a fully functioning InfoSphere BigInsights cluster.

In this post, we’ll discuss how this is done, followed by a step-by-step guide to installing InfoSphere BigInsights with StackIQ.

Components:
The hardware used for this deployment was a small cluster: 1 node (i.e., 1 server) is used for the StackIQ Cluster Manager, 1 node serves as the BigInsights manager, and 4 nodes are used as backend or data nodes. In the simplest example, each node has 1 disk and all nodes are connected together via 1Gb Ethernet on a private network. StackIQ Cluster Manager and the InfoSphere BigInsights manager server are also connected to a corporate public network using a second NIC. Additional networks dedicated to Hadoop services can also be connected but are not used in this example. StackIQ Cluster Manager has been used in similar deployments ranging from 2 nodes to more than 4,000 nodes.

IBM-BigInsights-Architchture
Step 1: Getting Started

The StackIQ Cluster Manager node is installed from bare metal (i.e., there is no software pre-installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it (the StackIQ Cluster Core Roll can be obtained from the “Rolls” section after registering at http://www.stackiq.com/download/).

Let’s pause for a moment. For those of you unfamiliar with StackIQ, Rolls are additional software packages that allow for extending the base system through mass installation and configuration of many nodes in parallel. These Rolls are what makes our automation platform flexible and easily customizable.

The Cluster Core Roll leads the user through a few short forms (e.g., what is the IP address of StackIQ Cluster Manager, what is the gateway, DNS server, etc.) and then asks for a base OS DVD (for example, Red Hat Enterprise Linux 6.5; other Red Hat-like distributions such as CentOS are supported as well, but for Red Hat Enterprise Linux, only certified media is acceptable). The installer copies all the bits from both DVDs and automatically generates a new Red Hat distribution by blending the packages from both DVDs together.

The remainder of the StackIQ Cluster Manager installation requires no further manual steps and this entire step takes between 30 to 40 minutes.

A detailed description of StackIQ Cluster Manager can be found in section 3 of the StackIQ Users Guide. It is highly recommended that you familiarize yourself with at least this section before proceeding. (The print is large and there are plenty of pictures so it isn’t that bad.)

https://s3.amazonaws.com/stackiq-release/stack3/roll-cluster-core-usersguide.pdf

If you have further questions, please contact support@stackiq.com for additional information.

This is what you'll need:

Step 2: Install the BigInsights Bridge Roll

StackIQ has developed software that “bridges” our core infrastructure management solution to InfoSphere BigInsights named the BigInsights Bridge Roll (now there's a surprise). The BigInsights Bridge Roll is used to create the biadmin/bigsql/catalog user accounts, passwordless SSH access for these accounts, and other critical configuration steps as indicated in the InfoSphere BigInsights documentation. The BigInsights Bridge roll prepares the cluster to allow the deployment of InfoSphere BigInsights via the BigInsights installer without any further configuration from you. (We do recommend that you set site-specific passwords, and we'll show you how this is done shortly.)  This allows you to leverage the InfoSphere BigInsights manager to install a fully functioning Hadoop and Analytics cluster with minimal interaction. 

StackIQ Cluster Manager uses “Rolls” to combine packages (RPMs) and configuration (XML files which are used to build custom kickstart files) to dynamically add and automatically configure software services and applications.

The first step is to install StackIQ Cluster Manager as the deployment machine. This requires, at a minimum, the cluster-core and RHEL 6.5 ISOs. It’s not possible to add StackIQ Cluster Manager to an already existing RHEL 6.5 machine; you must begin with the installation of StackIQ Cluster Manager. The biginsights-bridge roll can be added once StackIQ Cluster Manager is up and running or during installation of the frontend.

Please be aware RHEL/CentOS 6.5 is a hard requirement for IBM InfoSphere BigInsights. As of this writing, RHEL/CentOS 6.6 is not supported by InfoSphere BigInsights. 

It is highly recommended that you check the MD5 checksums of the downloaded media.

You must burn the cluster-core roll and RHEL Server 6.5 ISOs to disk, or, if installing via virtual CD/DVD, simply mount the ISOs on the machine's virtual media via the BMC.

Then follow section 3 of https://s3.amazonaws.com/stackiq-release/stack3/roll-cluster-core-usersguide.pdf for instructions on how to install StackIQ Cluster Manager. (Yes! I mentioned it again.)

What You’ll Need:

Copy the roll to a directory on the StackIQ Cluster Manager. "/export" is a good place as it should be the largest partition.

Verify the MD5 checksums:

# md5sum biginsights-bridge-1.1-stack4.x86_64.disk1.iso
Should return: 
7f6b9e9d5008e6833d7cc9e1b1862c6b  biginsights-bridge-1.1-stack4.x86_64.disk1.iso

Then execute the following commands on the frontend:

# rocks add roll biginsights-bridge*.iso 
# rocks enable roll biginsights-bridge
# rocks create distro
# rocks run roll biginsights-bridge | bash
The BigInsights Bridge Roll will enable you to set up a BigInsights Manager node from which to install BigInsights on the rest of the cluster. 

Step 3: Install BigInsights Manager and backend nodes

The next step is to install the BigInsights Manager. Before we do this, however, it is advisable to change the default passwords that were installed for the biadmin, bigsql, and catalog users. 

StackIQ Cluster Manager drives infrastructure automation via key/value pairs called "attributes" in the cluster manager database. These attributes can be set and overridden at the global, appliance, and host levels. There are several attributes for InfoSphere BigInsights, including these user passwords. 

To see these attributes do the following:

# rocks list attr attr=biginsights.

Screen_Shot_2014-10-22_at_3.25.05_PM

(The period at the end of that is required. You can also do "rocks list attr | grep biginsights" but it's not as cool.)

At the moment we will deal with the three password values. You'll notice that these values are not shown in the output; this is because they are set to "shadow" and are only available to the root and apache users during kickstart.  

The current passwords are all set to "biadmin." You want to change this largely because everyone who reads this blog post now knows what your passwords are. (I admit this is not likely to approach millions of views, but it will be searchable so....)

Change them like this:

 Screen_Shot_2014-10-22_at_3.29.15_PM

The "rocks set attr" command will change the password as given on the command line. The SSL command will hash these passwords in the database and hide them with the "shadow=true" flag. You can use different passwords for each account or the same password. You will need to know these passwords when configuring BigInsights with the BigInsights UI installation. It is highly recommended that you clear your history after running the above commands. (Also, don't use "mynewpassword" as the password because, well, you know, millions of views and all that.)

You'll also notice two other attributes. These control partitioning schemes. For the default installation, they are fine as is. The default partitioning uses only the system disk (assuming it's large). The biginsights.data_mountpoint attribute only matters when biginsights.partitioning is set to "multidisk." In the default "singledisk" case, only the system disk is used. In the multidisk case, any disks other than the system disk will be mounted as /hadoop0X, where X is the number of the disk in the array. You can change this mountpoint by changing the attribute value. Further elucidation on more advanced configuration will come in a follow-up blog post. If you need to know how to do it now, send email to support@stackiq.com and we'll walk you through the proper changes. 

So let's install some backend nodes. 

There are two ways to do this: using "discovery" mode or using a properly formatted host CSV file. We will use discovery and leave the configuration of a host CSV file for later. The discovery mode assumes you have full control of your network and can set the frontend into promiscuous DHCP mode. If you don't have this control over the network, you'll have to add hosts via spreadsheet. Instructions for configuration with a host.csv fall under more advanced configuration and will be covered in a further blog post.

Go to the StackIQ Cluster Manager Web UI via the public hostname or the public IP. In this example the public IP is 192.168.1.50

Untitled

 

Go to the Discovery tab.

bi-8

Verify "Automatically Detect" is chose and click "Continue."

bi-7

 

Click "Enable" to start discovery mode.

bi-5

 

Click "Start"; it will ask you to log in. Log in as "root" with the root password you supplied during installation.

bi-6

On the "Appliance" drop down, choose "bi-manager" and click "Start."

Note: You only need one bi-manager appliance. Don't install more than one.

bi-9

 

Turn on the machine that will act as the bi-manager. It should be set to PXE boot first. It will be discovered and installed. Once the button turns gray and the visualization starts, the node has been kickstarted. It will look something like this:

bi-10

You can now install the rest of the backend nodes. Click "Stop."

bi-11

 

Choose "Compute" in the Appliance dropdown. 

bi-12

 

Click "Start." Now boot all the other machines which also should have been set to PXE boot first. 

bi-13

 

These will be discovered and start installing. The visualization shows the peer-to-peer installer sharing the RPM packages. This allows installation to scale; 2 or 1,000 nodes take about the same time. 

bi-15

Once the buttons next to the compute nodes turn green, click "Stop." You are ready for the next step: installing BigInsights. 

 bi-16

 

Step 4: Installing IBM InfoSphere BigInsights

This tutorial assumes the use of the BigInsights community edition. The Enterprise Edition should be similar but requires a purchased license from IBM. Our goal is to automate as much as possible while still allowing the full use of a product's capabilities. Sometimes this means automation takes a back seat so that both users (you) and the vendor (IBM) have the correct set of tools required to fully deploy and support the application, which means there are some steps that must be done by hand. Using the IBM BigInsights installer in the following manner allows for greater site customization and better support capabilities. 

Before we begin: to reach the BigInsights installer Web UI, we want an IP address we can get to. You can use the bi-manager's private subnet IP, but it may be easier to assign it a public IP (well, public to your subnet), so we'll add an interface on our public network.

Set the IP:

# rocks set host interface ip bi-manager-0-0 eth1 192.168.1.51

Set the network subnet:

# rocks set host interface subnet bi-manager-0-0 eth1 public

Verify:

# rocks list host interface bi-manager-0-0

 

Sync the network on the bi-manager.

 

# rocks sync host network bi-manager-0-0

 

 

Screen_Shot_2014-10-22_at_4.43.49_PM

This machine should now have an interface on the public subnet.

Now we need to install the BigInsights installer tar file. You should have downloaded this from IBM. Here is a link to the community edition. The Enterprise Edition must be purchased and downloaded with a license.

http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/downloads.html

Copy the tar file to either the frontend or the bi-manager machine directly.

# scp iibi3001_QuickStart_x86_64.tar.gz bi-manager-0-0:/home/biadmin

Log into the bi-manager machine and change the permissions on the tar file. 

Change to the biadmin user.

# su - biadmin

Untar it. 

# tar -xvf iibi3001_QuickStart_x86_64.tar.gz

Change to the BigInsights installer directory and start the installer.

# cd biginsights-3.0.0.1-quickstart-nonproduction-Linux-amd64-b20140918_1248

# ./start.sh

This is what the above steps look like:

Screen_Shot_2014-10-22_at_4.57.42_PM-1

When the installer is started, it will list public and private URLs for continuing the BigInsights Web UI installation. Go to any of the URLs you have access to and follow the installation prompts. 

Step 5: Installing BigInsights using the BigInsights Installer Web UI.

In our example case, the bi-manager is at 192.168.1.51 so we'll open a browser at 192.168.1.51:8300

bi-18-1

Click "Next" on the intro page. It's likely a good idea to read it.

bi-19-1

Accept the license and choose "Next."

bi-20

Since this is a "singledisk" install and the system disk is large, we can accept the default directory structure. If we had chosen "multidisk," we would reconsider this. But for now, defaults are fine. 

bi-22

 

On this next screen choose the second option: "Use the current user biadmin with passwordless sudo privileges on all nodes." Trust me on this. The biginsights-bridge roll sets up passwordless sudo access on all nodes for biadmin, catalog, and bigsql. This greases the install and cuts some time off the installation. Then click "Next."

bi-23

Let's add the nodes we've installed to the BigInsights instance. Choose "Add Nodes."

bi-24

You'll get a window in which to add nodes. Make it simple: use a regex and click "OK."

bi-25

You can never have too many buttons to push, so make sure your nodes are correct and available and then "Accept" them. They have fragile egos and need your validation.

bi-26

 

And then do the "Next" thing. 

bi-27


Now add the passwords you defined above in the biginsights.bigsql_password and biginsights.biadmin_password attributes and click "Next." 

 

bi-28

Accept the defaults on this screen and hit "Next."

bi-29

In the default install, the bi-manager is the name node. Change these as your site specifications warrant and hit "Next."

bi-30

 

Since this is a pretty basic install, we'll set up PAM with flat file authentication. If you have LDAP, please send email to support@stackiq.com for the process to make LDAP work. Then hit "Next."

bi-31

Check over everything to see if it meets your site criteria and then click "Install."

bi-32

The installation is going to take a bit, far longer than the installation of the actual machines. The more machines you have in the cluster, the longer the BigInsights installation will take because not all aspects of the installer are parallel. However, when it succeeds you'll have a bullet-proof BigInsights installation. 

bi-33

 

The log can be watched during installation from a terminal window on the bi-manager or from the Log tab in the StackIQ UI. Cut and paste the log path and you can watch it there. This will be more fully covered in a following post. 

Sooner or later it will be done:

bi-35

 

 

Click "Finish."

bi-36

Go to the BigInsights Console URL on the bi-manager at port 8080. In this case it is 192.168.1.51:8080.

bi-37

 

 

Log in as the "biadmin" user with the password you supplied for the biginsights.biadmin_password attribute. If you kept the defaults (really?) it will be "biadmin."

bi-38

This should bring you to the BigInsights Console. From here, consult the IBM BigInsights documentation for further use. 

bi-39

 

CONCLUSION:

With the help of StackIQ and IBM, you should now have a functioning IBM InfoSphere BigInsights installation on the pile of machines that has been glaring at you in your data center. StackIQ is ideal for automating some of the more tedious parts of cluster installation, allowing you to fully deploy a functioning Hadoop and analytics cluster to further your business needs.



 

Cloudera Automation With StackIQ Cluster Manager


In a previous blog post about StackIQ and Cloudera integration, we looked at how one can install a Cloudera cluster using StackIQ Cluster Manager and the Cloudera Manager GUI. In this post we will explore how to automate the installation using StackIQ CLI and spreadsheet-based role configuration.

First, install StackIQ Cluster Manager, CDH Manager and backend nodes

Bring up the StackIQ Cluster Manager, CDH Manager node and all backend nodes (Steps 1 through 4 of previous blog post). Use the links below to download ISOs:

Install the Cloudera Manager GUI using the installation script (Step 5 of the previous blog post). If the CDH Bridge Roll is added to the distribution, the compute appliance (the default backend node appliance) and the cdh-manager appliance will have the Cloudera packages installed and the Cloudera agents running by default. This can be extended to other appliance types as shown below (this needs to be set before installing the appliances):

/opt/rocks/bin/rocks set appliance attr <appliance_type> attr=cdh_agent value=True

One can verify if the cloudera-agents are up by running the below command:

rocks run host compute command='service cloudera-scm-agent status'

compute-0-0: cloudera-scm-agent (pid  3195) is running...

compute-0-1: cloudera-scm-agent (pid  3174) is running...

compute-0-2: cloudera-scm-agent (pid  3171) is running...

After this step, we are now going to install the Cloudera services using a spreadsheet and the command line.

Why Spreadsheets?

Spreadsheets are a convenient way of viewing and assigning the various roles for Cloudera. Our automation software bridges the gap between that convenient view and a configuration that works every time.


Screen Shot 2014-09-19 at 12.22.53 PM.png


See here for a sample. The column headers have the below format:

cdh.<cluster_name>.<service>.<role>

  • cluster_name is the name of the CDH cluster (cannot contain spaces for now).

  • service is the Cloudera service, e.g., hdfs, mr, etc.

  • role is the service role, e.g., tasktracker and jobtracker for MR, or namenode and datanode for HDFS.

Entries in the target column are the host names for which roles need to be assigned in the CDH cluster.

After you fill out the spreadsheet, save it as a CSV file (in the example below, we’ve saved the spreadsheet as “cdh_roles.csv”).
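
As an illustrative sketch (the linked sample above is the authoritative format), a cdh_roles.csv for a cluster named prod21 might look like this, with one column per role and a value of True for each host that should run it:

target,cdh.prod21.hdfs.namenode,cdh.prod21.hdfs.datanode,cdh.prod21.mr.jobtracker,cdh.prod21.mr.tasktracker
compute-0-0,True,,True,
compute-0-1,,True,,True
compute-0-2,,True,,True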


The Command Line

Run the two commands below on the StackIQ Cluster Manager node:

# rocks load attrfile file=cdh_roles.csv

The above command will load the column names in the csv file as host-specific attributes.

# rocks create cdh cluster name=prod21


The above command creates a Cloudera cluster, configures services and starts them.

The current cdh-bridge Roll supports automation of the services below, which are part of the ‘Core Hadoop’ package:

  • HDFS

  • MRv1

  • Hive

  • Sqoop

  • Zookeeper

  • Hue

  • Oozie

  • Cloudera Management Services - Activity Monitor, Host Monitor, Service Monitor, Events Manager, Alert Publisher

For Hive, the database is currently set to a Postgres DB on the cdh-manager node. This can be modified via the Cloudera Manager GUI if needed. You should see the output below if the cluster was created successfully.

# rocks create cdh cluster name=prod21

Cluster created!

Updated configuration and role types

Starting zookeeper...

Formatting HDFS Namenode...

Starting HDFS Service...

Create HDFS /tmp directory...

Creating Hive metastore DB...

Creating Hive metastore tables...

Creating Hive user directory...

Creating Oozie DB...

Installing Oozie shared lib...

Starting Oozie...

Creating Sqoop user directory...

Starting Sqoop2...

Starting Hue...

Starting Cloudera Management Services…

This can also be verified by visiting the Cloudera Manager GUI at http://<cdh-manager-host>:7180/cmf/



Screen Shot 2014-09-25 at 11.29.54 AM.png

By default the cluster is created with a Cloudera Express license. An Enterprise license can be added via the Administration menu as needed.

# rocks remove cdh cluster [name=string]

The above command stops all running services and deletes the CDH Cluster specified by the name attribute and the Cloudera management services. This command can be used as a rollback option if there are issues during cluster creation.

Testing

Run a sample Hadoop program (Step 7 of previous blog post) to test if the cluster is functional.

That's it! Let us know how it goes.

 

Adding Value, not Complexity.


For regular readers of the StackIQ blog, Hi! I’m Don MacVittie, the new Senior Solutions Architect here at StackIQ.

For regular readers of my aggregated blog, if you have not already, meet StackIQ – a web-scale infrastructure management vendor that will knock your socks off.

Introductions completed, let’s move on to the topic at hand, shall we?

OpenStack and Hadoop are both amazingly powerful platforms for those who need (and recognize that they need) them. We all know what private clouds and big data are, so I won’t waste your time explaining them. I will point out for those who haven’t had the pleasure of installing them that they are terribly difficult to install and manage. Not because they’re poorly designed and written, and not because Open Source doesn’t care about usability, but because they’re that complex. To illustrate this point, I like to point people to this picture from the OpenStack documentation that shows a simplified depiction of the architecture:

OpenStack.Architecture 

I call this simplified because there are optional parts that you can also install and configure, and it doesn’t show the construction of the physical environment used to support this system. Each of the boxes in the above diagram is a massive system in its own right – networking, RBAC, ISO management, virtualization engine… The list goes on. Hadoop has a similar architecture, and similar challenges.

As I learned how to get OpenStack up and running in a variety of environments – from beefy laptops for demos to massive data centers – I came to the conclusion that whoever automated this process won.

I stand by that judgment. The complexity required to serve up functional systems that interoperate with the rest of the network or to allow querying of huge amounts of data in a reasonable amount of time is massive, and even little things like single server misconfigurations or failed drives can put the entire integrated system off-line or in degraded performance mode (lose the disk on your keystone server, and you’ll see what I mean).

While I have not seen a truly comparable diagram out there for NoSQL big data deployments, Hadoop does have an even more simplified version:

 Typical_Hadoop_Cluster_WhitePaper_July_18_2012

The thing is, they both serve the business by massively improving agility – Big Data driving business intelligence agility, OpenStack driving server/app provisioning agility. But in the arena of installation and ongoing management, they themselves are not very agile, and cost IT man-hours keeping these critical systems up to date and functioning properly.

That’s why I came to StackIQ, because the products answer these vexing problems. And they do it well. My second week here at StackIQ, I sat down with the product and installed OpenStack – in about two hours. Without tutelage. If you know OpenStack, you know what that means. It can take two hours to install and configure virtual networking, let alone the entire architecture. I’m just starting a clean Hadoop install as I write this. Interestingly, I've not used full-stack automation to rapidly deploy Hadoop before, so I’m intrigued to see how well this goes. My suspicion, based on the OpenStack install, is that it will go well, even though I haven’t done it before. StackIQ makes things that easy.

As I wend my way through the various features, facets, and astounding extras of the product, I’ll be here updating you. I tend to enjoy blogging about occasional extraneous things too, so you are certainly welcome to read those as we go also.

Until then, I’ll be in the lab, on the phone, checking email… Did I mention in the lab?

In Control... At Every Layer


The idea of DevOps is appealing, particularly in highly complex environments. There are just so many places where a system can go wrong, let alone a complex interconnected multi-machine system like a cluster or a cloud hosting environment. As systems have become progressively more complex, there have been improvements in deployment, monitoring, and management capabilities to address  those changes in complexity.

I hear frequently from current and future customers that what is appealing about StackIQ is the idea that they could take deployment, monitoring, and management, and roll them into easy to use bundles, while still maintaining adaptability. This struck me as a powerful proposition.

Only a month into the job as an evangelist of the product, I thought I’d share some insight with you on how it is done, and why it is done that way. There will be more detail in the future; for now we'll start out with an overview of the power and flexibility StackIQ brings to bear.

Note, for this blog, I will be ignoring some standard ways of expressing things here at StackIQ to focus on an explanation that does not require knowledge of the StackIQ nomenclature. Of course I want to get you thinking about our product, but also, I want to spur some ideas about better ways of doing things, and that is best done in a more generic manner. Since I am a StackIQ employee though, where we have a brand name, I will use it.

The first thing that is unique about StackIQ is the depth of installation capability: from bare metal (or bare virtual metal) to a functional system in just a few minutes. That functional system can be a Red Hat-derivative Linux box, a full-on Ambari server, or even a full-on Foreman server. And that is everything; there is no prompting of the user at any point in the install.

The way that StackIQ gets from bare metal to full-on server is both simple and complex. Out of the box, the OpenStack manager disk can install StackIQ Boss (our base product), and Boss can manage an OpenStack installation. The same is true for the Big Data disk, and there are other possibilities – HPC, other flavors of big data that require additional software, and core Linux machines.

First, let me introduce what those “easy to use bundles” I mentioned above are. Imagine if you could make a tarball that held the ISO, the RPMs, all the necessary configuration information for both OS and packages, some custom software, a few scripts (even Puppet scripts, if your organization uses them), and a manifest that tells the system what order to perform each of the steps.  That is what StackIQ Pallets are. They’re a complete definition of how a machine needs to work. From RAID and specialized NIC configuration before install to final configuration of the application being installed. It knows everything that is not variable, and StackIQ has a mechanism to feed it those parts (like IP address and hostname) that vary from machine to machine.

Now imagine you had a team of trained techies who knew how to install everything perfectly, you had one per server, and they could all start at the same time. That, in effect, is what StackIQ gives you. The Pallet is the set of instructions for perfectly installing a server, and our Avalanche-based installer (a distribution network built on modified BitTorrent) allows Pallets to be installed simultaneously on as many servers as you need. I believe the record for simultaneous installs with StackIQ to date is 2,000.

And finally, what if all of those installers were versed in the bevy of tools your organization uses every day? Whether you’re running Puppet or Satellite Server, StackIQ knows how to use them to make installs conform to corporate standards. Again, this is what StackIQ does. Sometimes, as with Ambari or Foreman, our Pallet sets up the infrastructure to fulfill all of the prerequisites. Sometimes, as with a core Linux installation, our Pallet knows everything it needs without handing off to an application layer, and guarantees a finished configuration and installation of the entire application stack. Either way, the time to get the systems configured is a fraction of what it would take by hand, or even with popular scripting tools.

Similarly, for Puppet, the system installs the necessary components, but they stay disabled until you want them; enabling StackIQ Boss to play the role of a Puppet server takes a single command-line statement. Enabling Puppet agents on all of the machines defined as “compute” nodes is just as simple: because they are already configured with the agents, the functionality only has to be turned on if you need it. The point is that your tools (and, beyond the built-in support, anything that can be called or configured from a Linux command line) can still be used; there is no need for a wholesale change in procedures. If a scripting language or a bare-metal installer works for you, you can keep using it while getting the benefits of StackIQ Boss’ total application stack management.
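As a sketch of what “turning it on” might look like for the agents, assume a RHEL 6-style init system, an appliance named “compute,” and the Rocks-derived command line; the exact command names vary by release, so this is illustrative rather than authoritative.

 # Illustrative only: start the pre-installed Puppet agents on all backend nodes
 rocks run host compute command="chkconfig puppet on"
 rocks run host compute command="service puppet start"
 # Spot-check that the agents came up
 rocks run host compute command="service puppet status"

Pointing the agents at the right Puppet master remains a site-specific configuration step, handled however your organization already manages Puppet.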

And we’ve only highlighted the install process so far. What if all of those people were available every time you upgraded, patched, or modified your complex systems? Because StackIQ Boss knows what you’ve already installed and how it’s configured, changing these things is relatively easy: it’s a matter of using the provided tools to update the Pallets that require changes.
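As a hedged illustration of that flow, reusing the hypothetical “example-app” Pallet from above: the commands below follow the older Rocks-derived CLI, which says “roll” where newer releases say “pallet,” and the version numbers are made up.

 # Illustrative only: register an updated Pallet and rebuild the distribution
 rocks add roll example-app-2.0-0.x86_64.disk1.iso
 rocks enable roll example-app
 rocks create distro
 # Nodes pick up the new bits on their next (re)install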

But it doesn’t stop with installation and upgrades. There’s more! Those machines are monitored 24x7, and the infrastructure your mission-critical apps run on is watched like a hawk. Need to understand disk usage on a given partition of a given machine somewhere in the datacenter? Log in to Boss and check it. The overall performance of your cluster is there, and you can drill down to memory, CPU, and disk usage to see what individual machines are doing. Alongside that automated reporting is a performance test that tells you how each machine under management is performing. Since the report comes out as a scatter graph, it is relatively easy to spot the outlying machines (performance-wise) and discover what is different about them, so you can troubleshoot the low-performance outliers or replicate the high-performance ones.
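The dashboards are the front door for that kind of question, but the same checks can also be run ad hoc from the Boss command line. A minimal sketch, assuming an appliance named “compute” and a hypothetical node called compute-0-2; the command form follows the Rocks-derived CLI and may differ by release.

 # Illustrative only: ask every backend node for disk and memory usage at once
 rocks run host compute command="df -h"
 rocks run host compute command="free -m"
 # Or zero in on a single machine and a single filesystem
 rocks run host compute-0-2 command="df -h /var"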

At every step of the lifecycle, and at every layer of the application infrastructure, it’s like having a team of experts at your beck and call. StackIQ Boss’ many hooks into tools that each do one job very well obviate costly and frustrating forklift changes, and remove the steep learning curve that is often the less-talked-about but significant burden of “the new way to do things.”

This is what Warehouse Grade Automation is all about: full-stack, full-lifecycle automation of the infrastructure that draws on the investments IT has already made.
