Analyze large data sets ranging from 1 TB to 10 TB.
Perform structuring of semi-structured data, cleansing of structured data, and keying of the data by extracting dimensions.
Design and build migration pipelines from traditional RDBMS systems to Hadoop.
Optimize and tune SQL queries into Hive-based queries.
Design and develop unit test cases to validate the functionality and performance of the data services, and validate data by cross-checking against the existing Oracle data warehouse.
Actively engage with client stakeholders to better understand the business and how the data serves customer needs.
Contribute to the design of the new data model in Hadoop, with data flows and flow charts of the hybridized platform.
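The cross-check against the existing Oracle data warehouse could be sketched roughly as below. This is a minimal, hypothetical illustration, not the actual validation code: the function names, the key column `id`, and the sample rows are all assumptions, and in practice the two inputs would be extracts pulled from Oracle and Hive.

```python
import hashlib

def row_fingerprint(row, columns):
    """Stable hash of the selected column values for one row."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def cross_check(oracle_rows, hive_rows, key, columns):
    """Compare two keyed extracts; return (missing, mismatched) key lists.

    `missing` holds keys present in Oracle but absent from Hive;
    `mismatched` holds keys whose compared columns differ.
    """
    oracle = {r[key]: row_fingerprint(r, columns) for r in oracle_rows}
    hive = {r[key]: row_fingerprint(r, columns) for r in hive_rows}
    missing = sorted(set(oracle) - set(hive))
    mismatched = sorted(k for k in oracle if k in hive and oracle[k] != hive[k])
    return missing, mismatched

# Hypothetical sample extracts standing in for the real warehouse pulls
oracle_rows = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
hive_rows = [{"id": 1, "amt": 10}, {"id": 2, "amt": 99}]
missing, mismatched = cross_check(oracle_rows, hive_rows, "id", ["amt"])
# missing -> [3], mismatched -> [2]
```

Hashing a pipe-joined projection of the compared columns keeps the comparison cheap even when the extracts are wide; for terabyte-scale tables the same fingerprinting would normally be pushed down into the SQL/HiveQL side rather than done row by row in Python.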
Actively contribute to design decisions of the product.
Build data flows and deploy them on Control-M with automation; deploy the .hql and .sh scripts to the production environment and automate onboarding of new incoming data sets.
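Staging the .hql and .sh scripts for a Control-M job could look roughly like the sketch below. It is a self-contained illustration under stated assumptions: the directory names, the seeded sample script, and the `stage_release` helper are all hypothetical, and a real deployment would point at the actual script repository and the directory the Control-M job folder watches.

```python
from pathlib import Path
import shutil

def stage_release(src_dir, release_dir, patterns=("*.hql", "*.sh")):
    """Copy .hql and .sh scripts into a release directory that a
    Control-M job would pick up; all paths here are assumptions."""
    src, release = Path(src_dir), Path(release_dir)
    release.mkdir(parents=True, exist_ok=True)
    staged = []
    for pattern in patterns:
        for script in sorted(src.glob(pattern)):
            shutil.copy2(script, release / script.name)
            staged.append(script.name)
    return staged

# Self-contained demo with hypothetical paths and a seeded sample script
src = Path("scripts")
src.mkdir(exist_ok=True)
(src / "load_dim.hql").write_text("-- hypothetical Hive load script\nSELECT 1;\n")
print(stage_release("scripts", "release/prod"))
```

Keeping the copy step idempotent (re-running it just overwrites the release directory) is what lets the same staging routine be reused each time a new data set is onboarded.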
Develop integration of data from Hadoop into databases such as MonetDB and MongoDB.
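One common shape for the Hadoop-to-MongoDB hop is converting a Hive TSV export into newline-delimited JSON that MongoDB's `mongoimport` tool can load. The sketch below is a hedged, stdlib-only illustration: the column names and sample rows are invented, and the real pipeline may well use a connector rather than flat-file conversion.

```python
import json

def tsv_to_ndjson(tsv_lines):
    """Convert a Hive-style TSV export (header line + data rows) into
    newline-delimited JSON documents suitable for `mongoimport`."""
    header = tsv_lines[0].rstrip("\n").split("\t")
    docs = []
    for line in tsv_lines[1:]:
        values = line.rstrip("\n").split("\t")
        # One JSON document per row, keyed by the header columns
        docs.append(json.dumps(dict(zip(header, values))))
    return "\n".join(docs)

# Hypothetical two-row export; real input would be read from HDFS output files
sample = ["cust_id\tregion\n", "101\tEMEA\n", "102\tAPAC\n"]
print(tsv_to_ndjson(sample))
```

All values come through as strings here; a production version would also apply the type coercions (ints, dates) that the target collection's documents expect.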
Deploy the data flows with big data querying tools such as Pig, Hive, and Impala.