Posts

 Q1 Name a few commonly used components of the Spark ecosystem.
 Answer:
 · Spark SQL (formerly Shark)
 · Spark Streaming
 · GraphX
 · MLlib
 · SparkR
 Q2 What is “Spark SQL”?
 Answer: Spark SQL is a Spark interface for working with structured and semi-structured data. It can load data from multiple structured sources, such as text files, JSON files, and Parquet files, among others. Spark SQL provides a special type of RDD called SchemaRDD (named DataFrame since Spark 1.3). A SchemaRDD is made up of row objects, where each object represents a record.
 Q3 Can we do real-time processing using Spark SQL?
 Answer: Not directly, but we can register an existing RDD as a SQL table and run SQL queries on top of it.
 Q4 Explain the major libraries that constitute the Spark ecosystem.
 Answer:
 · Spark MLlib – Machine learning library in Spark for commonly used learning algorithms such as clustering, regression, and classification.
 · Spark Streaming – This library is used to process real time
  How should I manage multiple projects in AWS?
  If you’re building multiple projects or apps on AWS, you will need an approach to keep each one separate. Whether you’re developing websites for different clients or disparate products within a single organisation, you need to protect against:
  · Traffic bursts in minor apps bringing down mission-critical ones.
  · Developers reusing security groups and deciding to open one more port to get their new app working.
  · Releases being accidentally deployed to the wrong website.
  · A large monthly bill and no idea which app has caused it.
  · Developers having unauthorised access to certain clients’ data.
  There are a few potential approaches you could take:
  1. Use a naming convention and tags
  2. Isolate projects via separate VPCs and IAM groups
  3. Use completely separate accounts for each app
  Let’s look at these in a bit more detail…
  Approach #1 – Separation by Naming Convention and Tags
  This approac
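  The tagging idea behind Approach #1 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tag keys `Project` and `Environment`, the resource names, and the cost figures are all hypothetical, and no AWS API is called. It shows how a consistent tag makes it possible to attribute the monthly bill to an app and to spot resources that break the convention.

```python
from collections import defaultdict

# Assumed convention (hypothetical): every resource carries these tag keys.
REQUIRED_TAGS = ("Project", "Environment")


def missing_tags(resource):
    """Return the required tag keys that are absent or empty on a resource."""
    tags = resource.get("tags", {})
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


def cost_by_project(resources):
    """Sum monthly cost per 'Project' tag; untagged resources are grouped together."""
    totals = defaultdict(float)
    for resource in resources:
        project = resource.get("tags", {}).get("Project") or "untagged"
        totals[project] += resource["monthly_cost"]
    return dict(totals)


# Made-up inventory for illustration only.
resources = [
    {"name": "shop-prod-web-01",
     "tags": {"Project": "shop", "Environment": "prod"},
     "monthly_cost": 120.0},
    {"name": "blog-dev-db-01",
     "tags": {"Project": "blog", "Environment": "dev"},
     "monthly_cost": 45.5},
    {"name": "temp-instance", "tags": {}, "monthly_cost": 30.0},
]

if __name__ == "__main__":
    print(cost_by_project(resources))
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            print(f"{resource['name']} is missing tags: {', '.join(gaps)}")
```

  The "untagged" bucket is the useful part of the sketch: any cost that lands there points at a resource nobody has claimed, which is exactly the "large monthly bill and no idea which app has caused it" problem listed above.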
  Set Up Your Organization for Cloud Adoption Success
  Many IT organizations share the core objective of achieving business and technical agility. A cloud center of excellence (CCOE) is a function that helps organizations balance speed and stability while they pursue this objective, and it is the best-practice approach to driving cloud-enabled transformation. To ensure cloud adoption success, organizations must have the right skills and structure in place. The optimal way to achieve this is to set up a centralized CCOE. A CCOE is a centralized governance function for the organization and acts in a consultative role for central IT, business-unit IT and cloud service consumers in the business. A CCOE is key to driving cloud-enabled IT transformation. The CCOE is an enterprise architecture function. Its responsibilities include setting cloud policy, guiding provider selection, and assisting with solution architecture and workload