The architecture diagram below was created with EdrawMax, which offers free architecture symbols. As the diagram shows, the Hadoop Distributed File System (HDFS) exposes a file system namespace and allows user data to be stored in files. Internally, each file is split into one or more blocks, and these blocks are then processed, starting with preprocessing (ETL). Note that a data source is a location where data is created or where physical information is first digitized. However, even the most refined data may serve as a source. As the image suggests, a data source may be a database, a flat file, live measurements from physical devices, scraped web data, or any of the myriad static.
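To make the block-splitting step described above more concrete, here is a minimal sketch in plain Python. It is only an illustration and does not use any HDFS API: the function names `split_into_blocks` and `block_count` are hypothetical, and the tiny block size in the usage lines is chosen for readability (HDFS itself defaults to 128 MB blocks in Hadoop 2.x and later).

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into consecutive chunks of at most block_size bytes.

    Hypothetical helper for illustration; real HDFS does this internally
    and spreads the resulting blocks across DataNodes.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # The last block may be shorter than block_size.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed for a file of file_size bytes (ceiling division)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-file_size // block_size)


# Usage: a 10-byte "file" with a 4-byte block size yields three blocks.
blocks = split_into_blocks(b"abcdefghij", 4)
print(blocks)

# With the 128 MB default, a 300 MB file needs three blocks.
MB = 1024 * 1024
print(block_count(300 * MB, 128 * MB))
```

Joining the blocks in order reproduces the original bytes, which is the property that lets a file system hand out blocks independently and still reassemble the file.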