How to set up a Big Data development environment?
- You need a modern laptop with a 64-bit OS (so that all the required software is supported) and at least 16 GB of RAM (for acceptable speed).
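If you are not sure whether your machine qualifies, a small script can check both requirements. The sketch below is only an illustration under stated assumptions: the set of 64-bit machine names and the 5% tolerance (a "16 GB" laptop usually reports slightly less usable memory) are my own choices, and the automatic RAM lookup relies on `os.sysconf`, which works on Linux and typically macOS but not on Windows.

```python
import os
import platform

MIN_RAM_GB = 16  # minimum RAM recommended in this post

# Assumption: platform.machine() values that indicate a 64-bit architecture.
SIXTY_FOUR_BIT = {"x86_64", "amd64", "arm64", "aarch64"}

# Assumption: allow a 5% margin, since a "16 GB" machine reports a bit less.
TOLERANCE = 0.95


def meets_requirements(machine: str, ram_bytes: int, min_ram_gb: int = MIN_RAM_GB) -> bool:
    """Return True if the architecture is 64-bit and RAM is roughly min_ram_gb or more."""
    is_64bit = machine.lower() in SIXTY_FOUR_BIT
    ram_gib = ram_bytes / 1024 ** 3
    return is_64bit and ram_gib >= min_ram_gb * TOLERANCE


if __name__ == "__main__":
    machine = platform.machine()
    try:
        # POSIX only: number of physical memory pages times the page size.
        ram = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        ram = None  # os.sysconf is missing on Windows

    if ram is None:
        print(f"{machine}: check installed RAM manually (e.g. in Task Manager)")
    else:
        status = "OK" if meets_requirements(machine, ram) else "below requirements"
        print(f"{machine}, {ram / 1024 ** 3:.1f} GiB RAM -> {status}")
```

Note that this checks the architecture reported by the OS, not whether CPU virtualization (Intel VT-x / AMD-V) is enabled in the BIOS, which VirtualBox and VMware also need.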
Most Big Data tools are open source, and there are many technologies one needs to learn to be proficient in the Big Data ecosystem, such as Hadoop, Spark, Hive, Pig, and Sqoop. This blog covers how to set up a development environment on a personal computer or laptop using a distribution such as Cloudera or Hortonworks. Both Cloudera and Hortonworks provide virtual machine images that come with the Big Data ecosystem tools pre-packaged. This blog will provide:
- A comparison of virtualization software such as VirtualBox and VMware
- Step-by-step instructions to set up virtualization software such as VirtualBox or VMware
- Guidance on choosing a Cloudera or Hortonworks image
- Step-by-step instructions to set up a VM using the chosen image
- Setup of necessary additional components such as a MySQL database and a log generation tool
- A review of HDFS, MapReduce, Sqoop, Pig, Hive, Spark, etc.
VirtualBox vs. VMware comparison:
What is Virtualization?
Virtualization is a combination of software and hardware engineering that creates Virtual Machines (VMs), an abstraction of the computer hardware that allows a single machine to act as if it were many machines.
Two popular programs used for desktop virtualization are VirtualBox and VMware. Let's compare their features.
VirtualBox
VirtualBox is best suited for virtualizing single desktop environments.
Platform: Windows, Mac, Linux
Price: Free
Features
- Easy installation of popular operating systems like Windows, Linux, and Mac OS X
- Run multiple virtualized environments simultaneously
- Run a guest OS in "seamless mode", which puts the guest's application windows directly on your host desktop
- Fast performance all around
- Supports snapshots of your virtual machines, so you can restore a VM to any saved configuration or point in its life
- 3D virtualization (graphics acceleration)
- Open virtual disk images made in VirtualBox, VMware, or Microsoft Virtual PC
Why use VirtualBox
First, because it is free to use. VirtualBox makes running other operating systems, whether Linux, other versions of Windows, or even Mac OS X, super easy on your home computer. Just insert your install disc (or point it to an ISO file on your computer), and you can install the OS in a virtual machine with as much or as little RAM, CPU, and hard drive space as you want. It integrates with your mouse pointer, so you don't even have to click on the window to start using it, and it lets you create "snapshots" of your machines that work like restore points: you can boot a VM from any point in its history and continue from there. You can even share your clipboard back and forth between the guest and host OS.
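Snapshots can also be managed from the command line with VirtualBox's `VBoxManage` tool, which is handy before risky changes such as reconfiguring Hadoop. The following is a minimal Python sketch, assuming VirtualBox is installed and `VBoxManage` is on your PATH; the VM name `cloudera-quickstart` and snapshot name `clean-install` are hypothetical placeholders, so use the name listed by `VBoxManage list vms`.

```python
import shutil
import subprocess
from typing import List

# Hypothetical names: replace with your own VM and snapshot names.
VM_NAME = "cloudera-quickstart"
SNAPSHOT_NAME = "clean-install"


def take_snapshot_cmd(vm: str, snapshot: str) -> List[str]:
    """Build the VBoxManage command that saves a snapshot of `vm`."""
    return ["VBoxManage", "snapshot", vm, "take", snapshot]


def restore_snapshot_cmd(vm: str, snapshot: str) -> List[str]:
    """Build the VBoxManage command that rolls `vm` back to `snapshot`.

    VirtualBox restores a snapshot only while the VM is not running.
    """
    return ["VBoxManage", "snapshot", vm, "restore", snapshot]


def run(cmd: List[str]) -> None:
    """Run a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if shutil.which("VBoxManage") is None:
        print("VBoxManage not found on PATH; is VirtualBox installed?")
    else:
        run(take_snapshot_cmd(VM_NAME, SNAPSHOT_NAME))
```

Building the command lists in separate functions keeps them easy to inspect or print before anything is executed; to roll back later, power off the VM and call `run(restore_snapshot_cmd(VM_NAME, SNAPSHOT_NAME))`.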
Why avoid it
VirtualBox can seem a little intimidating to most beginners, but so can any virtualization program. In addition, its "seamless" mode, while cool, isn't done quite as well as VMware's: it brings the entire toolbar of your guest OS with it, and moving windows around isn't the smoothest experience. Overall, though, it's still very feature-filled, with great documentation and a ton of users.
VMware
VMware is best suited for server virtualization.
Platform: Windows, Mac, Linux
Price: Paid
Features
- Includes the features of VirtualBox plus some advanced features
- VMware's bare-metal hypervisor (ESXi) runs directly on the hardware itself and provides all the services you need within the software package
- VMware also supports restricted virtual machines, which are useful when you want to prevent unauthorized IT personnel from tampering with configuration settings.
Why use VMware
It includes some extra features compared with VirtualBox, and it offers a more user-friendly and smoother experience.
Why avoid it
It is not free, and it provides so many options that a newbie may not understand all the complications.