Saturday, February 6, 2016

Shared Memory : Simple C programs to demonstrate setting up shared memory

Shared memory is memory that may be simultaneously accessed by multiple programs, either to communicate with each other or to avoid redundant copies. It is an efficient means of passing data between programs. Let's see how it actually works using simple C programs: we will create two processes that communicate through a shared memory segment.

//process1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>     /* sleep() */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>


int main(int argc,char *argv[]){


int shmid;
key_t key;
char *shm;      /* base address of the attached segment */
char *memloc;
key = 9876;     /* well-known key shared with the reader process */

/* create (or look up) a 100-byte shared memory segment for the key */
shmid = shmget(key, 100, IPC_CREAT | 0666);
if (shmid < 0) {
    perror("shmget: unable to get shared memory segment identifier");
    exit(1);
}


/* attach the segment to this process's address space */
shm = shmat(shmid, NULL, 0);
if (shm == (char *) -1)
{
    perror("shmat: unable to map shared memory");
    exit(1);
}


/* write a NUL-terminated message into the segment */
memcpy(shm, "this is shared content", 22);
memloc = shm;
memloc += 22;
*memloc = 0;

/* wait until the reader acknowledges by writing '!' at the start of the segment */
while (*shm != '!')
    sleep(1);

return 0;
}

Let's create another program, process2.c, which reads from the shared memory segment created by process1.

//process2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>


int main(int argc,char *argv[]){


int shmid;
key_t key;
char *shm;
char *mloc;
key = 9876;

/* look up the same 100-byte segment created by process1 */
shmid = shmget(key, 100, IPC_CREAT | 0666);
if (shmid < 0) {
    perror("shmget: unable to get shared memory segment identifier");
    exit(1);
}


/* attach the segment to this process's address space */
shm = shmat(shmid, NULL, 0);
if (shm == (char *) -1)
{
    perror("shmat: unable to map shared memory");
    exit(1);
}

/* print the message written by process1 */
for (mloc = shm; *mloc != 0; mloc++)
    printf("%c", *mloc);
printf("\n");

/* signal process1 that we are done by writing '!' at the start of the segment */
*shm = '!';
return 0;
}
Compilation:
gcc -o process1 process1.c
gcc -o process2 process2.c
Run:
./process1 -- creates and writes to the shared memory, then keeps running
From another shell:
./process2 -- prints the content of the shared memory to the console and writes '!' into it, which terminates process1.

Inspect and remove the shared memory segment afterwards:
ipcs -m
IPC status from  as of Sat Feb  6 13:24:00 IST 2016
T     ID     KEY        MODE       OWNER    GROUP
Shared Memory:
m  65536 0x00002694 --rw-rw-rw-   rsingh    staff


ipcrm -M 9876 // 9876 is the decimal shared memory key we used; ipcs shows it as 0x2694 in hex.

Saturday, January 9, 2016

BTrace : Tracing a Java Application

BTrace is a helpful tool that lets you run Java-like scripts on top of a live JVM to capture or aggregate any form of variable state without restarting the JVM or deploying new code. This enables you to do pretty powerful things like printing the stack traces of threads, writing to a specific file, printing the number of items in any queue or connection pool, and much more.

Let's trace a simple Java application to find out how much time a method takes to complete at runtime, without modifying the code.
Installation:
-Download the latest version of BTrace from https://kenai.com/projects/btrace/downloads/directory/releases and unzip it to some location.
-Add the environment variables as follows:

export BTRACE_HOME=/root/btrace-bin
PATH=$PATH:$BTRACE_HOME/bin

This is the sample code that I want to trace:
package com.rajkrrsingh.trace.test;

public class BtraceTest {

    public void execute(int i) {
        System.out.println("executing " + i + " time");
        try {
            Thread.sleep(Math.abs((int) (Math.random() * 1000)));
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    public void doExcute() {
        int i = 0;
        while (true) {
            execute(i);
            i++;
        }
    }

    public static void main(String[] args) {
        BtraceTest btraceTest = new BtraceTest();
        btraceTest.doExcute();
    }
}

BTrace requires a tracing script that contains annotated actions describing what to trace, and can add additional capabilities. The BTrace user guide explains the different annotations and what they signify. We have the following tracing script, which traces the execute method and prints its execution time. To compile this script you need to have btrace-tool.jar on the classpath.

package com.rajkrrsingh.trace.test;

import static com.sun.btrace.BTraceUtils.jstack;
import static com.sun.btrace.BTraceUtils.println;
import static com.sun.btrace.BTraceUtils.str;
import static com.sun.btrace.BTraceUtils.strcat;
import static com.sun.btrace.BTraceUtils.timeMillis;

import com.sun.btrace.annotations.BTrace;
import com.sun.btrace.annotations.Kind;
import com.sun.btrace.annotations.Location;
import com.sun.btrace.annotations.OnMethod;
import com.sun.btrace.annotations.TLS;

@BTrace
public class TraceMethodTime {

    @TLS
    static Long beginTime;

    @OnMethod(clazz="com.rajkrrsingh.trace.test.BtraceTest", method="execute")
    public static void traceExecuteBegin() {
        println("method Start!");
        beginTime = timeMillis();
    }


    @OnMethod(clazz="com.rajkrrsingh.trace.test.BtraceTest", method="execute", location=@Location(Kind.RETURN))
    public static void traceExcute() {
        println(strcat(strcat("btrace.test.MyBtraceTest.execute time is: ", str(timeMillis() - beginTime)), "ms"));
        println("method End!");
        jstack();
    }
}

Now it's time to trace the already running Java application. To find the process id of the application, use the jps command; then attach BTrace to that process id as follows and it will start tracing the application:
btrace process-id script-location
btrace 22618 com/rajkrrsingh/trace/test/TraceMethodTime.java 
BTrace will attach to the running JVM and you will see the following output on the console:
method Start!
btrace.test.MyBtraceTest.execute time is: 323ms
method End!
com.rajkrrsingh.trace.test.BtraceTest.execute(BtraceTest.java:12)
com.rajkrrsingh.trace.test.BtraceTest.doExcute(BtraceTest.java:18)
com.rajkrrsingh.trace.test.BtraceTest.main(BtraceTest.java:25)
method Start!
btrace.test.MyBtraceTest.execute time is: 655ms
method End!
com.rajkrrsingh.trace.test.BtraceTest.execute(BtraceTest.java:12)
com.rajkrrsingh.trace.test.BtraceTest.doExcute(BtraceTest.java:18)
com.rajkrrsingh.trace.test.BtraceTest.main(BtraceTest.java:25)
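
Timing is just one example. As mentioned in the intro, BTrace can also aggregate state, for instance counting how many times a method has been called. The following is a minimal sketch of such a script (not part of the original setup; the class name TraceMethodCount and the 5-second reporting interval are only illustrative), compiled and attached exactly like the script above.

package com.rajkrrsingh.trace.test;

import static com.sun.btrace.BTraceUtils.println;
import static com.sun.btrace.BTraceUtils.str;
import static com.sun.btrace.BTraceUtils.strcat;

import com.sun.btrace.annotations.BTrace;
import com.sun.btrace.annotations.OnMethod;
import com.sun.btrace.annotations.OnTimer;

@BTrace
public class TraceMethodCount {

    // number of times execute() has been entered since BTrace was attached
    private static volatile long calls;

    @OnMethod(clazz="com.rajkrrsingh.trace.test.BtraceTest", method="execute")
    public static void onExecute() {
        calls++;
    }

    // print the running count every 5 seconds
    @OnTimer(5000)
    public static void report() {
        println(strcat("execute() invocations so far: ", str(calls)));
    }
}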

Thursday, January 7, 2016

Spark Thrift Server Configuration in YARN mode

Env: MapR, Spark 1.4.1
Start the Spark Thrift Server on the MapR cluster as follows (start the thrift server as a non-root user):
/opt/mapr/spark/spark-1.4.1/sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.bind.host=`hostname` --hiveconf hive.server2.thrift.port=10000

Check for any errors or exceptions in the logs (in my case: tailf /opt/mapr/spark/spark-1.4.1/logs/spark-mapr-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ip-10-0-0-233.out).
If there are no errors or exceptions in the logs, try connecting using the beeline that ships with the Spark distribution:

/opt/mapr/spark/spark-1.4.1/bin/beeline 
Beeline version 1.4.1 by Apache Hive
beeline> !connect jdbc:hive2://10.0.0.233:10000/default;auth=noSasl
scan complete in 1ms
Connecting to jdbc:hive2://10.0.0.233:10000/default;auth=noSasl
Enter username for jdbc:hive2://10.0.0.233:10000/default;auth=noSasl: 
Enter password for jdbc:hive2://10.0.0.233:10000/default;auth=noSasl: 
Connected to: Spark SQL (version 1.4.1)
Driver: Spark Project Core (version 1.4.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://10.0.0.233:10000/default> show tables;
+------------------------------+--------------+
|          tableName           | isTemporary  |
+------------------------------+--------------+
| b1                           | false        |
| b2                           | false        |
| bob                          | false        |
| ct                           | false        |

Friday, December 4, 2015

Google protobuf installation on Mac

$wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
$tar xvf protobuf-2.5.0.tar.bz2
$cd protobuf-2.5.0
$./configure CC=clang CXX=clang++ CXXFLAGS='-std=c++11 -stdlib=libc++ -O3 -g' LDFLAGS='-stdlib=libc++' LIBS="-lc++ -lc++abi"
$make -j 4 
$sudo make install
$protoc --version

Sunday, November 29, 2015

Amazon EMR : Creating a Spark Cluster and Running a Job

Amazon Elastic MapReduce (EMR) is an Amazon Web Service (AWS) for data processing and analysis. Amazon EMR offers the expandable low-configuration service as an easier alternative to running in-house cluster computing.
In this example, let's spin up a Spark cluster and run a Spark job that crunches Apache logs and filters out only the error lines.

Prerequisites:
AWS Account
install and configure the AWS CLI tool
create default roles

Spark Job
Follow these steps to create a sample job:
mkdir SampleSparkApp
cd SampleSparkApp
mkdir -p src/main/scala
cd src/main/scala
vim SimpleApp.scala

package com.example.project

/**
 * @author rsingh
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "s3://rks-clus-data/log.txt" // input log file on S3
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val  errors = logData.filter(line => line.contains("error"))
    errors.saveAsTextFile("s3://rks-clus-data/error-log.txt") 
  }
}

cd -
vim build.sbt

name := "Spark Log Job"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.5.0","org.apache.spark" %% "spark-streaming" % "1.5.0")

Now build the project using sbt:
sbt package

After a successful build the jar will be available at target/scala-2.10/spark-log-job_2.10-1.0.jar.
Upload the job jar to the S3 bucket:
aws s3 cp target/scala-2.10/spark-log-job_2.10-1.0.jar s3://rks-clus-data/job-jars/

Upload the sample logs to your S3 bucket:
aws s3 cp log.txt s3://rks-clus-data/

Create the job step as follows:
cat step.json
[
{
"Name": "SampleSparkApp",
"Type":"CUSTOM_JAR",
"Jar":"command-runner.jar",
"Args":
[
"spark-submit",
"--deploy-mode", "cluster",
"--class", "com.example.project.SimpleApp",
"s3://rks-clus-data/job-jars/spark-count-job_2.10-1.0.jar",
"s3://rks-clus-data/log.txt",
"s3://rks-clus-data"
],
"ActionOnFailure": "TERMINATE_CLUSTER"
}
]

Now spin up an Amazon EMR cluster with the auto-terminate option:
    aws emr create-cluster \
    --name "Single Node Spark Cluster" \
    --instance-type m3.xlarge \
    --release-label emr-4.2.0 \
    --instance-count 1 \
    --use-default-roles \
    --applications Name=Spark \
    --steps file://step.json \
    --auto-terminate

The above command will spin up a Spark cluster on EMR and run the job; the cluster will terminate automatically irrespective of success or failure.

Tuesday, November 24, 2015

Apache Storm Sample Application

These are baby steps to build and run a Storm application.
Prerequisites: Java 7, Maven
Create the default Maven archetype:
$  mvn archetype:generate 
$  cd StormSampleApp/
   Update pom.xml as follows: https://github.com/rajkrrsingh/StormSampleApp/blob/master/pom.xml
   Create a driver class composed of the Spout definition, Bolt definitions and topology configuration, as in
   https://github.com/rajkrrsingh/StormSampleApp/blob/master/src/main/java/com/rajkrrsingh/storm/SampleTopology.java (a minimal sketch is shown after these steps)
$  mvn package
$  storm jar target/StormSampleApp-1.0-SNAPSHOT-jar-with-dependencies.jar com.rajkrrsingh.storm.SampleTopology
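
For reference, here is a minimal sketch of what such a driver class can look like. It is not the exact class from the linked repository: it assumes the Storm 0.9.x backtype.storm API, and the spout/bolt names (RandomWordSpout, PrinterBolt) are only illustrative.

package com.rajkrrsingh.storm;

import java.util.Map;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class SampleTopology {

    // spout that emits a random word every 100 ms
    public static class RandomWordSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] words = {"storm", "kafka", "hadoop", "spark"};

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values(words[(int) (Math.random() * words.length)]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    // bolt that prints every word it receives
    public static class PrinterBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println("received: " + input.getStringByField("word"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // this bolt emits nothing downstream
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-spout", new RandomWordSpout(), 1);
        builder.setBolt("printer-bolt", new PrinterBolt(), 1).shuffleGrouping("word-spout");

        Config conf = new Config();
        conf.setNumWorkers(1);
        StormSubmitter.submitTopology("SampleTopology", conf, builder.createTopology());
    }
}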