Wednesday, August 31, 2016

Major difference between Python 2 and Python 3



1. Print is now a function

The print statement has been replaced with a print() function. Examples:


Old: print "The answer is", 2*2
New: print("The answer is", 2*2)

Old: print x,           # Trailing comma suppresses newline
New: print(x, end=" ")  # Appends a space instead of a newline

Old: print              # Prints a newline
New: print()            # You must call the function!

Old: print >>sys.stderr, "fatal error"
New: print("fatal error", file=sys.stderr)

Old: print (x, y)       # prints repr((x, y))
New: print((x, y))      # Not the same as print(x, y)!

2. Unicode and Strings


In Python 2, you had to mark every Unicode string with a u prefix, like u'Hello'. Python 3 fixed this: all strings are Unicode by default, and byte sequences are the ones you mark, with a b prefix. Since Unicode text is by far the more common case, this saves effort in most Python 3 code.

If you need to support both versions, you can still mark strings with u in Python 3 (the prefix has been accepted again since Python 3.3).

3. Division With Integers


One of Python’s core values is to never do anything implicitly. You shouldn’t turn a number into a string unless the programmer tells you to, for example. But Python 2 took this a bit too far. Consider this problem:

5 / 2 

For most of us, our immediate answer is 2.5, which is, of course, the right answer. But Python 2 said "oh, you only gave me integers so you must want an integer back" and happily returned 2. Well, yes, I did give you integers, but I’d rather have a correct answer than an answer that matches my data types.

Again, Python 3 fixed this. Python 3 gives 2.5 as the answer to that question. In fact, the / operator always returns a float (a number with a decimal point) when you divide integers, even when the result is a whole number. If you want an integer back, use the floor division operator (//), which rounds down to the nearest whole number:

5 // 2 returns 2
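As a quick check of the two operators, here is a minimal Python 3 sketch. The variable names are only for illustration.

```python
# Python 3: "/" is true division, "//" is floor division.
true_quotient = 5 / 2       # float result
floor_quotient = 5 // 2     # rounded down, stays an int
whole_result = 4 / 2        # still a float, even though it divides evenly
negative_floor = -5 // 2    # floor division rounds toward negative infinity

print(true_quotient, floor_quotient, whole_result, negative_floor)
# prints: 2.5 2 2.0 -3
```

Note the last case: floor division rounds down, so for negative numbers it is not the same as dropping the decimal part.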


4. input() is Now Safe to Use


In Python 2, there was a raw_input() function and an input() function. raw_input() was the one you always wanted to use. input() was a great way to have your code do things you didn’t want it to. The reason for this is that input() evaluated whatever came in. So good users would send in 123 and Python, trying to be helpful, would make that into an integer instead of a string. Bad users would send in little, or not so little, bits of Python which would then be evaluated, or run. Cue your software doing things you didn’t ask it to do.

In Python 3, raw_input() is replaced with input(), which no longer evaluates the data it receives. You always get back a string.
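Because input() now always returns a string, any conversion has to be done explicitly. A minimal sketch of that step follows; parse_quantity is a hypothetical helper name, and the strings stand in for what a user might type.

```python
def parse_quantity(raw_text):
    """Convert text as returned by input() into an int, or None if it is not a number."""
    try:
        return int(raw_text.strip())
    except ValueError:
        return None


# A well-behaved reply becomes a number only because we asked for it.
print(parse_quantity("123"))
# A snippet of Python is just text; it is never evaluated.
print(parse_quantity("__import__('os').getcwd()"))
```

The first call prints 123 and the second prints None, since int() refuses anything that is not a plain number.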

5. Performance


The Python 3.0 release notes stated that, as a net result of the 3.0 generalizations, Python 3.0 ran the pystone benchmark around 10% slower than Python 2.5, most likely because of the removal of special-casing for small integers. They added that there was room for improvement, to be addressed after the 3.0 release.


Saturday, August 27, 2016

Running Python script on Unix



So far we have learned the basics of Python, including how to print and its major data types. We have also seen different types of loops and function declarations. In this article, we will write a small function, save it to a file (.py) and then execute it from the terminal.

Writing first Python script

We will write a function that accepts user input and validates it. If the input is incorrect, it keeps prompting until it gets a correct value.


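def getFloatFromUser(prompt):
    # Keep asking until the user types something float() can parse.
    while True:
        number = raw_input(prompt)
        try:
            number = float(number)
        except ValueError:
            # Catch only conversion errors, so Ctrl+C still stops the script.
            print 'That is not a float, please try again.'
            continue
        # everything OK
        return number

myFloat = getFloatFromUser('Please enter a float: ')
print myFloat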


Save the above code to a file named my_first_python_script.py.
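The script above is written for Python 2. For Python 3, the same idea could look roughly like the sketch below. The read_line parameter is an addition for illustration only: it lets the example feed scripted replies instead of waiting at the keyboard.

```python
def get_float_from_user(prompt, read_line=input):
    """Prompt until the reply parses as a float (Python 3 sketch)."""
    while True:
        reply = read_line(prompt)
        try:
            return float(reply)
        except ValueError:
            print("That is not a float, please try again.")


# Scripted replies stand in for a user typing "tt", "las" and then "90".
scripted_replies = iter(["tt", "las", "90"])
value = get_float_from_user(
    "Please enter a float: ",
    read_line=lambda prompt: next(scripted_replies),
)
print(value)  # 90.0
```

In a real script you would simply call get_float_from_user('Please enter a float: ') and let it read from the keyboard.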


Running Python script  

We saved our Python file in our tutorial folder. Now open the terminal and run it. Passing the file to the python command works without any extra step. Marking the file as executable is only needed if you want to run the script directly, which also requires a #! interpreter line at the top of the file.
 $ chmod +x /Desktop/MyTutorial/Python/my_first_python_script.py
To execute your Python script, type the following command:
 $ python /Desktop/MyTutorial/Python/my_first_python_script.py

The script starts and waits for valid input, as in the session below:


$ python /Desktop/MyTutorial/Python/my_first_python_script.py
Please enter a float: tt
That is not a float, please try again.
Please enter a float: las
That is not a float, please try again.
Please enter a float: sdf
That is not a float, please try again.
Please enter a float: 90
90.0

The script keeps prompting until the user enters valid input. Once the user enters a valid value, the script prints it and exits the loop.

That's it !!

About NoSQL and its Databases



If you have heard the word "NoSQL" from your friends or colleagues and are curious to know more about it, you have come to the right place. In this article, we will talk about NoSQL and its databases, and how they differ from commonly used relational databases such as Oracle, MySQL and SQL Server.


About NoSQL

NoSQL is sometimes read as "Not Only SQL" or "Non-Relational". It means a database may still support SQL-like queries, but it stores its data in a non-relational structure.


About NoSQL Databases

Most NoSQL databases are non-relational, distributed databases that store data as key-value pairs, wide columns, graphs or documents. These structures differ from the tables used by relational databases, which makes some operations faster in NoSQL. More than 255 NoSQL databases are currently available, of which MongoDB, CouchDB, HBase and Cassandra are among the most widely used.


NoSQL Database Categories

NoSQL databases can be grouped into 5 categories:
  1. Column : In this category, data is stored in a column-based structure in a distributed data store, where information is kept on more than one node. Each column is a tuple (a key-value pair) with three elements: a Unique Name (to reference the column), a Value (the actual data) and a Timestamp (to determine the most recent value). Well-known databases in this category are Cassandra and HBase.

      Unique Name    Unique Name    Unique Name
      Value          Value          Value
      Timestamp      Timestamp      Timestamp


  2. Document : Databases in this category are designed for storing, retrieving, and managing document-oriented information. The central concept of a document-oriented database is the notion of a document. While each implementation differs on the details of this definition, in general they all assume that documents encapsulate and encode data (or information) in some standard format or encoding. Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on). Well-known databases in this category are CouchDB and MongoDB.

      
      {
          "Website": "http://apptech-solution.blogspot.in",
          "Title": "About NoSQL and its Databases",
          "Tag": "Database, NoSQL"
      }
      


  3. Key-Value : They are designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database. The well known databases in this category are Dynamo and Redis.


      Key 1 Value 1
      Key 2 Value 2
      Key 3 Value 3
      Key 4 Value 4
      Key 5 Value 5


  4. Graph : Graph databases are based on graph theory and are built from nodes, edges and properties.
    • Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of.
    • Edges, also known as relationships, are the lines that connect nodes to other nodes; they represent the relationship between them.
    • Properties are pertinent information that relates to nodes.
    Well-known databases in this category are Neo4J and OrientDB.



  5. Multi-Model : A multi-model database is designed to support multiple data models against a single, integrated backend. Document, graph, relational, and key-value models are examples of data models that may be supported by a multi-model database. Well-known databases in this category are OrientDB and ArangoDB.
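To make the first four categories more concrete, here is a rough Python sketch that imitates each data model with plain in-memory structures. It does not use any real database API; the helper names (latest_value, related) and the sample values are made up for illustration.

```python
# Key-value: each unique key maps to one value.
key_value_store = {"Key 1": "Value 1", "Key 2": "Value 2"}

# Document: a record with named fields, mirroring the JSON example above.
document = {
    "Website": "http://apptech-solution.blogspot.in",
    "Title": "About NoSQL and its Databases",
    "Tag": "Database, NoSQL",
}

# Column: (unique name, value, timestamp). The timestamp decides which
# of two writes to the same column is the current one.
columns = [("city", "Pune", 100), ("city", "Mumbai", 200)]  # made-up sample data


def latest_value(columns, unique_name):
    candidates = [column for column in columns if column[0] == unique_name]
    return max(candidates, key=lambda column: column[2])[1]


# Graph: nodes carry properties, edges are (source, relationship, target).
nodes = {"alice": {"type": "person"}, "acme": {"type": "business"}}
edges = [("alice", "WORKS_AT", "acme")]


def related(edges, source):
    return [(relationship, target) for start, relationship, target in edges if start == source]


print(key_value_store["Key 1"])
print(document["Title"])
print(latest_value(columns, "city"))
print(related(edges, "alice"))
```

A multi-model database would expose several of these views over the same stored data.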



Friday, August 19, 2016

How to encrypt string in Java


Security is important for any business, and you need to be sure the proper safeguards are in place. Protecting user information such as phone numbers, passwords and credit card details assures users that their data is safe. Almost all modern applications need this kind of protection in one way or another.

The MD5 algorithm is a widely used hash function producing a 128-bit hash value. Strictly speaking, MD5 produces a one-way hash rather than reversible encryption, and it is no longer considered secure for storing passwords. We will create a class with a single method that returns the MD5 hash of a String.



import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class md5Hash {
    public static String getEncrypted(String password) {
        try {
            MessageDigest m = MessageDigest.getInstance("MD5");
            // Hash the UTF-8 bytes; the byte count can differ from password.length().
            byte[] digest = m.digest(password.getBytes(StandardCharsets.UTF_8));
            // Pad to 32 hex digits so leading zeros are not dropped.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException ex) {
            Logger.getLogger(md5Hash.class.getName()).log(Level.SEVERE, null, ex);
        }

        // Reached only if the MD5 algorithm is unavailable.
        return null;
    }
}

Now you can simply call this method as shown below to get the hashed string:

String encrypted = md5Hash.getEncrypted(password);
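If you want to cross-check the value the Java method returns, the same digest can be computed with Python's standard hashlib module. This is only a comparison sketch; md5_hex is a hypothetical helper name, and it assumes the text is encoded as UTF-8.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of text (encoded as UTF-8) as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_hex("hello"))
# prints: 5d41402abc4b2a76b9719d911017c592
```

An MD5 digest written in hexadecimal is always 32 characters long, which is a quick sanity check for any implementation.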

Thursday, August 18, 2016

Schedule SQL Query to execute on specific time/interval



Imagine you have built something and, to demo it, created a page that accepts dummy data from guest users and stores it in your database. Over time the database fills up with useless records, which slows it down. You then need to clean up the database manually to keep its performance at its best.

Instead of doing this manually, you would want to automate the cleanup at a regular interval. You might think of a CRON job at this point. Instead, we will use a built-in MySQL feature called EVENTS, which can run tasks inside the database on a schedule, such as executing an SQL statement or calling a stored procedure.


Also, please note that events are not like triggers. Triggers fire when data changes, whereas MySQL events are scheduled by time/interval.

First of all, enable the event scheduler:

SET GLOBAL event_scheduler = ON;

Then check the current status of the event scheduler:

SELECT @@event_scheduler;

Now, create an EVENT to execute your SQL query:


CREATE EVENT e_store_ts 
ON SCHEDULE
EVERY 1 HOUR
DO
UPDATE myschema.yourtable SET mycolumn='N'; -- update this table

List all the events created:

SHOW EVENTS;

That's it!


Schedule MySQL Database backup on CPanel or Linux




If you run a business online and use a MySQL database to keep your customer and product information, you should always keep the database backed up, so that you have a copy of the data if the database crashes.

Taking backups manually is never a good idea, as it costs time and effort. Many paid services will back up your database at regular intervals, but if you are looking for something that does the same job without paying a single penny to a third party, this article might be useful for you.

First, we create a backup folder to hold the database dump files. Then we write a shell script that creates a database backup file and places it in that folder.


#!/bin/sh
now="$(date +'%d_%m_%Y_%H_%M_%S')"
filename="db_backup_$now".gz
# path of your backup folder
backupfolder=""
fullpathbackupfile="$backupfolder/$filename"
logfile="$backupfolder/"backup_log_"$(date +'%Y_%m')".txt
echo "mysqldump started at $(date +'%d-%m-%Y %H:%M:%S')" >> "$logfile"
# supply your MySQL user, password and database name here
mysqldump --user= --password= --default-character-set=utf8  | gzip > "$fullpathbackupfile"
echo "mysqldump finished at $(date +'%d-%m-%Y %H:%M:%S')" >> "$logfile"
# the owner goes between chown and the file name
chown  "$fullpathbackupfile"
chown  "$logfile"
echo "file permission changed" >> "$logfile"
# quote the pattern so the shell does not expand it before find sees it
find "$backupfolder" -name 'db_backup_*' -mtime +2 -exec rm {} \;
echo "old files deleted" >> "$logfile"
echo "operation finished at $(date +'%d-%m-%Y %H:%M:%S')" >> "$logfile"
echo "*****************" >> "$logfile"
exit 0

The above script creates a backup file in your backup folder and removes dump files that are more than 2 days old. You can keep the script anywhere (though I recommend keeping it in the same backup folder).
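The cleanup step of the script (the find command) can be pictured with the following Python sketch. It is not part of the backup script and only approximates the rule; prune_old_backups is a hypothetical helper, and the demo runs in a temporary folder with empty files.

```python
import os
import tempfile
import time
from pathlib import Path


def prune_old_backups(backup_folder, max_age_days=2, now=None):
    """Delete db_backup_* files last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for path in Path(backup_folder).glob("db_backup_*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demo in a throwaway folder: one fresh backup and one aged by 5 days.
with tempfile.TemporaryDirectory() as folder:
    fresh = Path(folder, "db_backup_fresh.gz")
    stale = Path(folder, "db_backup_stale.gz")
    for path in (fresh, stale):
        path.write_bytes(b"")
    five_days_ago = time.time() - 5 * 24 * 60 * 60
    os.utime(stale, (five_days_ago, five_days_ago))
    removed = prune_old_backups(folder)
    fresh_survived = fresh.exists()

print(removed, fresh_survived)
```

Only the file whose modification time is older than the cutoff is removed; the fresh backup stays in place.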

Now, depending on your requirements, schedule this script to run at a specific time/interval by adding it as a CRON job. To add a new CRON job, you need to provide the schedule, the path of the shell script file (i.e. the command) and the email address where you want to receive failure/success notifications.




Once you have successfully created the CRON job, you will see an entry for it, which will be executed only at the specified time/interval.




At the specified time/interval, your backup folder will start filling with database backup and log files.

That's it!

Let me know if you run into any issues while configuring it.





Tuesday, August 16, 2016

Mutable & Immutable data types in Python



In my previous article, I explained:

  1. Installation of Python
  2. Python Syntax
  3. Printing “Hello Python”
  4. Different data types and their declarations
  5. Function, Loop and Conditional Controls 

I also mentioned that String, Number and Tuple are immutable, whereas List and Dictionary are mutable. But can we prove it? Yes, of course. Python has a function id(), which returns the identity of an object (in CPython, its memory address). We can use it to see when Python creates a new object and when it modifies an existing one.


Now we will declare a variable for each of the data types below, change its value, and check its object id.

String (Immutable)


>>> name="AppTech Solution"
>>> print name
AppTech Solution
>>> id(name)
4503631496        #object id
>>> name="Welcome to AppTech Solution"
>>> print name
Welcome to AppTech Solution
>>> id(name)
4503615728        #object id changed
>>> 

Number (Immutable)


>>> emp_id=12
>>> print emp_id
12
>>> id(emp_id)
140410018119968      #object id
>>> emp_id=20
>>> print emp_id
20
>>> id(emp_id)
140410018119776      #object id changed
>>> 

List (Mutable)

The values of a List are enclosed in square brackets []. We expect the object id to remain the same, even when we manipulate its contents.


>>> var_list=[3,5,"Ram"]
>>> print var_list
[3, 5, 'Ram']
>>> id(var_list)
4503638456           #object id
>>> var_list[0]=30   #changing value for existing index
>>> print var_list
[30, 5, 'Ram']
>>> id(var_list)
4503638456           # object id remains same
>>>

Tuple (Immutable)

Similar to List, but the values are enclosed in parentheses (). A tuple cannot be modified in place, so we assign a new tuple to the variable and expect the object id to change.



>>> var_tup=("alpha",34,"beta")
>>> print var_tup
('alpha', 34, 'beta')
>>> id(var_tup)
4503411760                                #object id
>>> var_tup=("alpha",34,"beta","gamma")   #assigning new value
>>> print var_tup
('alpha', 34, 'beta', 'gamma')
>>> id(var_tup)
4503316888                                #object id changed
>>> 

Dictionary (Mutable)

Dictionaries are similar to hash maps, and their values are enclosed in curly brackets {}. We expect the object id to remain the same, even when we manipulate its contents.


>>> laptop={}
>>> laptop["hp"]=30000
>>> laptop["dell"]=35000
>>> print laptop
{'hp': 30000, 'dell': 35000}
>>> id(laptop)
4503636800                       #object id
>>> laptop["dell"]=45000         #changing value for existing key
>>> print laptop
{'hp': 30000, 'dell': 45000}
>>> id(laptop)
4503636800                       #object id remains same
>>> laptop["acer"]=15000         #adding new value
>>> print laptop
{'acer': 15000, 'hp': 30000, 'dell': 45000}
>>> id(laptop)
4503636800                       #object id remains same
>>>



You can see that for String, Number and Tuple the object id changed when we assigned a new value, but not for List and Dictionary, which we modified in place. Assigning a new value to a name always binds it to another object. The real difference is that String, Number and Tuple offer no way to change an object in place, so every "change" creates a new object, while List and Dictionary can be modified without creating a new one.
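The difference between changing an object in place and rebinding a name can be checked directly. The sketch below is written for Python 3 and reuses the variable names from the sessions above; the flag names are only for illustration.

```python
# A list can be changed in place: same object before and after.
var_list = [3, 5, "Ram"]
list_id = id(var_list)
var_list[0] = 30
list_kept_identity = id(var_list) == list_id

# A tuple refuses in-place changes altogether.
var_tup = ("alpha", 34, "beta")
try:
    var_tup[0] = "gamma"
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# Rebinding a name creates a new object, even for a mutable type.
original_list = var_list
var_list = [30, 5, "Ram"]
rebinding_made_new_object = var_list is not original_list

print(list_kept_identity, tuple_was_modified, rebinding_made_new_object)
# prints: True False True
```

So an id that changes after assignment only shows that the name now points to another object; the TypeError is what actually demonstrates immutability.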



Monday, August 15, 2016

How to display client IP Address


Sometimes you may want to show or capture the client IP address for security reasons. Here is sample code for several languages.

PHP :

 echo $_SERVER['REMOTE_ADDR'];  


Java :


import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.logging.Level;
import java.util.logging.Logger;


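public class IPAddress{
    public static void main(String[] a) {

        try {
            // getLocalHost() returns the address of the machine running this code.
            // Inside a servlet, request.getRemoteAddr() gives the client's address.
            InetAddress thisIp = InetAddress.getLocalHost();
            System.out.print(thisIp.getHostAddress());     
        } catch (UnknownHostException ex) {
            Logger.getLogger(IPAddress.class.getName()).log(Level.SEVERE, null, ex);
        }

    }
}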

C# :


protected string GetIPAddress()
{
    System.Web.HttpContext context = System.Web.HttpContext.Current; 
    string ipAddress = context.Request.ServerVariables["HTTP_X_FORWARDED_FOR"];

    if (!string.IsNullOrEmpty(ipAddress))
    {
        string[] addresses = ipAddress.Split(',');
        if (addresses.Length != 0)
        {
            return addresses[0];
        }
    }

    return context.Request.ServerVariables["REMOTE_ADDR"];
}


VB.Net :


Public Shared Function GetIPAddress() As String
    Dim context As System.Web.HttpContext = System.Web.HttpContext.Current
    Dim sIPAddress As String = context.Request.ServerVariables("HTTP_X_FORWARDED_FOR")
    If String.IsNullOrEmpty(sIPAddress) Then
        Return context.Request.ServerVariables("REMOTE_ADDR")
    Else
        Dim ipArray As String() = sIPAddress.Split(New [Char]() {","c})
        Return ipArray(0)
    End If
End Function

Javascript :


$.getJSON('//freegeoip.net/json/?callback=?', function(data) {
  // data is already a parsed object, so the ip field can be used directly
  $('#ip').html(data.ip);
});
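Python :

The C# and VB.Net functions above share the same logic: prefer the first address in the X-Forwarded-For header, otherwise fall back to REMOTE_ADDR. A Python sketch of that logic, assuming a WSGI-style environ dictionary, could look like this. get_client_ip is a hypothetical helper and the addresses are documentation examples.

```python
def get_client_ip(environ):
    """Return the client IP from a WSGI-style environ dictionary."""
    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded_for:
        # The header may list several proxies; the first entry is the client.
        first_address = forwarded_for.split(",")[0].strip()
        if first_address:
            return first_address
    return environ.get("REMOTE_ADDR")


behind_proxy = {
    "HTTP_X_FORWARDED_FOR": "203.0.113.7, 198.51.100.2",
    "REMOTE_ADDR": "198.51.100.2",
}
direct = {"REMOTE_ADDR": "192.0.2.10"}

print(get_client_ip(behind_proxy))  # 203.0.113.7
print(get_client_ip(direct))        # 192.0.2.10
```

Keep in mind that X-Forwarded-For is sent by the client or a proxy and can be forged, so it should only be trusted when your own proxy sets it.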


Basics of Java Servlet


What is Servlet ?


A servlet is a Java programming language class used to extend the capabilities of servers that host applications accessed by means of a request-response programming model. Servlets also have access to the entire family of Java APIs, including the JDBC API to access enterprise databases.

The javax.servlet and javax.servlet.http packages provide interfaces and classes for writing servlets. All servlets must implement the Servlet interface, which defines lifecycle methods. The HttpServlet class provides methods, such as doGet and doPost, for handling HTTP-specific services.


Servlet Lifecycle


A servlet life cycle can be defined as the entire process from its creation to its destruction. A servlet goes through the following stages:


  1. Servlet initialization using the init() method
     The servlet is initialized by calling the init() method. It is used for one-time initialization and is called when the servlet is first created, not for each user request.

  2. Serving client requests using the service() method
     The servlet container calls the service() method to process a client's request. Each time the server receives a request for a servlet, it uses a separate thread to call service(). The service() method checks the HTTP request type (GET, POST etc.) and calls doGet, doPost etc. as appropriate.

  3. Releasing resources using the destroy() method
     The servlet is terminated by calling the destroy() method. It allows developers to close database connections, halt background threads, write cookie lists or hit counts to disk, and perform other such cleanup activities.

Finally, the servlet is garbage collected by the garbage collector of the JVM.





Advantages of Servlets


1. Servlets provide a way to generate dynamic documents that is both easier to write and faster to run.
2. Servlets provide all the powerful features of Java, such as exception handling and garbage collection.
3. Servlets are easily portable across web servers.
4. A servlet can communicate with other servlets and servers.
5. Since HTTP is a stateless protocol, servlets use the session API to maintain session state.


URL Mapping


When a request arrives from a client, the servlet container first decides which web application should handle it, based on the context path. The rest of the URL path is then matched against the servlet mappings.
<servlet>
    <servlet-name>AddPhotoServlet</servlet-name>           <!-- servlet name -->
    <servlet-class>upload.AddPhotoServlet</servlet-class>  <!-- servlet class -->
</servlet>
<servlet-mapping>
    <servlet-name>AddPhotoServlet</servlet-name>           <!-- servlet name -->
    <url-pattern>/AddPhotoServlet</url-pattern>            <!-- how it should appear -->
</servlet-mapping>

If you change the url-pattern of AddPhotoServlet from /AddPhotoServlet to /MyUrl, the servlet becomes accessible at /MyUrl. This is useful when you want to hide the actual class or page name behind the URL.

Rule 1 : 

The pattern /inbox/* in the server context matches as follows:

http://apptech-solution.blogger.in/inbox/synopsis               <-- Correct
http://apptech-solution.blogger.in/inbox/complete?date=today    <-- Correct
http://apptech-solution.blogger.in/inbox                        <-- Correct
http://apptech-solution.blogger.in/server1/inbox                <-- Incorrect

Rule 2 :

A context located at the path /geo matches the pattern *.map as follows:

http://apptech-solution.blogger.in/geo/US/Oregon/Portland.map    <-- Correct
http://apptech-solution.blogger.in/geo/US/server/Seattle.map     <-- Correct
http://apptech-solution.blogger.in/geo/Paris.France.map          <-- Correct
http://apptech-solution.blogger.in/geo/US/Oregon/Portland.MAP    <-- Incorrect (case-sensitive)
http://apptech-solution.blogger.in/geo/US/Oregon/Portland.mapi   <-- Incorrect

Rule 3 :

A mapping that contains the pattern / matches a request if no other pattern matches. This is the default mapping. The servlet mapped to this pattern is called the default servlet.


The default mapping is often directed to the first page of an application. Explicitly providing a default mapping also ensures that malformed URL requests are handled by the application rather than returning an error.
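The three rules can be summarised in a small Python sketch. This is a simplified illustration, not how a servlet container is implemented; matches is a hypothetical helper that takes a url-pattern and a request path relative to the context root, without the query string.

```python
def matches(url_pattern, path):
    """Check a path (relative to the context root) against a servlet url-pattern."""
    if url_pattern == "/":
        # Rule 3: default mapping; a caller should try it only after all others.
        return True
    if url_pattern.endswith("/*"):
        # Rule 1: path-prefix mapping.
        prefix = url_pattern[:-2]
        return path == prefix or path.startswith(prefix + "/")
    if url_pattern.startswith("*."):
        # Rule 2: extension mapping, compared case-sensitively.
        return path.endswith(url_pattern[1:])
    # Anything else must match exactly.
    return path == url_pattern


print(matches("/inbox/*", "/inbox/synopsis"))          # True
print(matches("/inbox/*", "/server1/inbox"))           # False
print(matches("*.map", "/US/Oregon/Portland.map"))     # True
print(matches("*.map", "/US/Oregon/Portland.MAP"))     # False
print(matches("/", "/anything/else"))                  # True
```

In the Rule 2 examples the context path /geo has already been removed before the pattern is applied.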

Friday, August 12, 2016

Python Basics for absolute Beginners




1. What is Python Programming

Python is a widely used high-level, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax lets programmers express concepts in fewer lines of code than in languages such as C++ or Java.

2. How to Use it

Download the latest version of Python from https://www.python.org/. Installation is pretty simple: follow the installer steps with the default settings.

Type "python" in your terminal to verify the installation. This prints the Python version along with your machine details.

Python 2.7.10 (default, Oct 23 2015, 19:19:21) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Now you can start coding in Python.

3. Python Syntax 

To start a block, use : instead of {, and indent every line of the block by the same number of spaces. You will see this when we go through function declarations.

4. Say Hello to Python

To print anything on the terminal, type print, a space, and your message. Example:
print "Hello Python"

>>> print "Hello Python"
Hello Python
>>> 

print became a function in Python 3.x, so there you need to use

>>> print("Hello Python")
Hello Python
>>>


5. Data Types in Python

  1. String
  2. Number
  3. List
  4. Tuple
  5. Dictionary


6. Variable Declaration and Naming Conventions

By convention, variable names are lowercase with underscores (_) between the words. A name must not start with a number. Example:

my_var = 4  <-- valid
8my_var = 5  <-- invalid

6.1. How to declare String (Immutable)

name = "AppTech Solution"
str_with_quote = "AppTech Solution's Tutorial"
str_multi_line = """ AppTech Solution
Welcomes You"""

6.2. How to declare Number (Immutable)

my_var = 4

6.3. How to declare List (Mutable)

A List holds a sequence of values, enclosed in square brackets [].

>>> a=[3,5,"Ram"]
>>> print a
[3, 5, 'Ram']
>>> 

6.4. How to declare Tuple (Immutable)

Similar to List, but the values are enclosed in parentheses ().

>>> a=("alpha",34,"beta")
>>> print a
('alpha', 34, 'beta')
>>> 

6.5. How to declare Dictionary (Mutable)

Dictionaries are similar to hash maps, and their values are enclosed in curly brackets {}.

>>> laptop={}
>>> laptop["hp"]=30000
>>> laptop["dell"]=20000
>>> laptop["acer"]=25000
>>> print laptop["dell"]
20000
>>> print laptop
{'acer': 25000, 'hp': 30000, 'dell': 20000}
>>> 


7. Functions

Function names follow the same rules: they must not start with a number and, by convention, use underscores (_) between the words. The basic syntax is

def function_name(arg):
    body with indentation
    more code here
    some more line
    return    

Example:

def addition(num):
    return num+2

my_num= addition(5)


8. Loops and Condition Controls

The syntax of a while loop in Python programming language is 

while expression :
      statement(s)

Example:

>>> a=5
>>> while(a>1): 
...    print a
...    a-=1
... 
5
4
3
2
>>>

The syntax of a for loop in Python programming language is 

for iter in sequence :
          statement(s)

Example:

>>> for letter in 'Python':     # First Example
...    print 'Current Letter :', letter
... 
Current Letter : P
Current Letter : y
Current Letter : t
Current Letter : h
Current Letter : o
Current Letter : n





Thursday, August 11, 2016

Big Data and Hadoop Overview





What is Big Data?
It is a broad term for data sets so large or complex that traditional data processing applications are insufficient to handle them.

Why do we need to know about Big Data?
An increasing number of data sources such as social media and a growing number of media-rich data types such as videos are fueling the challenges in data analysis, capture, search, sharing, storage, transfer, visualization, and information privacy.

What is Apache Hadoop?
It is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. 
Some known features:

  • Scalable: It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  • High-Availability and Robust: Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer.

What are the Hadoop components?
Hadoop has four major components, which work together:
  1. Hadoop Common: Common utilities that support the other Hadoop modules.
  2. Hadoop Distributed File System (HDFS): Provides high-throughput access to application data.
  3. Hadoop YARN: Framework for job scheduling and cluster resource management.
  4. Hadoop MapReduce: YARN-based system for parallel processing of large data sets.
Note:
  • Node: Typically a single computer.
  • Rack: A collection of nodes that are all connected to the same network.
  • Cluster: A collection of racks.

Architecture of Apache Hadoop
A Hadoop cluster has two major groups of nodes:


  1. HDFS Nodes
    • NameNode (one per cluster; manages the file system and its metadata)
    • DataNode (many per cluster; stores data blocks and serves them to clients, and periodically reports its block information to the NameNode)

  2. MapReduce Nodes
    • JobTracker (one per cluster; receives requests from clients, then schedules and monitors MapReduce jobs on the TaskTrackers)
    • TaskTracker (many per cluster; executes MapReduce operations)


Overview of Hadoop Cluster




Writing File to HDFS (Hadoop Distributed File System)

Note: File block size: 64 MB (default), 128 MB (recommended). A larger block size means fewer blocks per file and therefore fewer seeks, which improves Hadoop's performance on large files.

  1. The client submits a create request to the NameNode. The NameNode checks whether the file already exists and verifies that the client has write permission.
  2. The NameNode then determines the DataNode that will receive the first block of the file. (Note: if the client is itself running on a DataNode, the first block is written to that DataNode; otherwise a random DataNode is picked.)
  3. The same block is then replicated to at least two other DataNodes in the same cluster, which may reside in the same rack (as shown above). These DataNodes are again picked at random.
  4. To confirm that the block was written successfully, an acknowledgment is sent back from the last DataNode to the client, passing through the DataNodes in reverse order.
  5. Once the client receives the acknowledgment, the same process is repeated for the remaining blocks.
  6. When the client has written all data blocks and received the acknowledgments, it tells the NameNode that the write is complete.
  7. The NameNode then checks that the data blocks meet the minimum replication before responding.
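To see what the block size means for the steps above, the sketch below works out how a file would be cut into blocks. It is plain arithmetic, not Hadoop code; split_into_blocks is a hypothetical helper and the 200 MB file size is just an example.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the size in bytes of each block a file would be split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only what is left
    return sizes


default_blocks = split_into_blocks(200 * MB)
larger_blocks = split_into_blocks(200 * MB, block_size=128 * MB)

print([size // MB for size in default_blocks])  # [64, 64, 64, 8]
print([size // MB for size in larger_blocks])   # [128, 72]
```

With the larger block size the same file needs two blocks instead of four, so the write sequence in steps 2 to 5 is repeated fewer times.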