Big Data Technologies
"Unlocking the Power of Data: Master the Tools and Techniques of the Big Data Ecosystem"
Why This Training?
In the modern data-driven era, understanding and harnessing the power of Big Data has become vital for businesses across all sectors. This training is meticulously designed to provide participants with in-depth knowledge of the Big Data ecosystem, its tools, techniques, and best practices. Unlock the potential of Big Data and gain a competitive edge in your career and business endeavors.
Duration: 30 Hours (online / virtual live session)

Who Should Attend?
Data Engineers, Data Scientists, and Data Analysts aiming to upgrade their skills.
IT Professionals and Software Developers looking to enter the Big Data domain.
Business Intelligence professionals keen on leveraging Big Data tools.
Decision-makers and managers overseeing data-driven projects.
Any individual passionate about the world of Big Data and its applications.

Course Highlights
Foundational to Advanced: Commence with the basics of Big Data and journey through to advanced tools and techniques.
Hands-on Sessions: Practical exercises with top Big Data tools, ensuring applied learning.
Expert Guidance: Learn from industry veterans with vast experience in Big Data implementations.
Real-World Use Cases: Understand the application of Big Data through relevant industry case studies.
Holistic Coverage: From data ingestion and storage to analytics and visualization, gain a comprehensive view of the Big Data landscape.
Interactive Q&A: Engage in insightful discussions and clarify your doubts.

Pre-requisites
Basic Programming Knowledge: Familiarity with any programming language.
Understanding of Databases: Prior knowledge of SQL will be beneficial.
Analytical Mindset: An inherent curiosity to decipher patterns and insights from data.
Basic Hardware Familiarity: Understanding of servers, storage, and general IT infrastructure.
Training Materials Needed by Participants
A laptop or desktop with at least 8GB RAM and a stable internet connection.
Pre-installed Big Data tools (a list will be provided prior to the training).
Relevant datasets for hands-on sessions (will be shared during the course).
Note-taking materials, whether digital or traditional.
Training Content
1. Introduction to Big Data (3 hours)
Objective: Equip participants with foundational knowledge of Big Data, its significance, and the inherent challenges.
- Overview of Big Data: What is Big Data? Why is it important?
- 3Vs of Big Data: Volume, Velocity, and Variety.
- Real-World Applications: Case studies showcasing the value of Big Data analytics.
- Challenges with Traditional Systems: Why we needed a new set of tools and methodologies.
2. Big Data Ecosystem & Components (3 hours)
Objective: Provide a comprehensive overview of the Big Data ecosystem and familiarize participants with core components.
- Introduction to the Big Data Ecosystem: A view of various tools and platforms.
- Data Ingestion Tools: Introduction to Flume and Sqoop.
- Storage Solutions: Basics of HDFS (Hadoop Distributed File System).
3. Deep Dive into Hadoop (3 hours)
Objective: Enable participants to understand Hadoop's architecture, components, and its data processing methodology.
- Hadoop Components: MapReduce, YARN.
- Architecture and Workflow: How Hadoop processes data.
- Hands-on: Running a basic MapReduce job.
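As a rough illustration of the workflow this module covers, the map, shuffle, and reduce phases can be sketched in plain Python as a word count. This is a minimal single-process sketch, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for illustration, and a real job would run distributed across a cluster under YARN.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["big data big tools", "data tools data"]
    print(word_count(sample))
```

On a cluster, the map and reduce functions would run in parallel on different nodes, and the shuffle step would move data across the network; the sketch only mirrors the logical order of the phases.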
4. Apache Spark & Its Components (3 hours)
Objective: Introduce the capabilities of Apache Spark and its significance in fast, in-memory data processing.
- Introduction to Apache Spark: Understanding its rise and significance.
- Core Components: Spark RDDs, Spark DataFrames.
- Spark Modules: Spark SQL, Spark Streaming, MLlib.
- Hands-on: Setting up a basic Spark application.
5. Data Storage - NoSQL Databases (3 hours)
Objective: Elucidate the need for NoSQL databases, their types, and practical use cases.
- Why NoSQL?: Differences between SQL and NoSQL.
- Types of NoSQL Databases: Document, Wide-Column (Column-Family), Key-Value, Graph.
- Introduction to MongoDB and Cassandra.
- Hands-on: Setting up and inserting data into a NoSQL database.
6. Big Data on Cloud Platforms (3 hours)
Objective: Highlight the advantages of leveraging cloud platforms for Big Data solutions and introduce prominent cloud-based Big Data services.
- Overview: Advantages of cloud platforms for Big Data.
- Big Data on AWS: Introduction to services like EMR, Redshift, and Kinesis.
- Big Data on Azure: Overview of HDInsight, Azure Stream Analytics.
- Hands-on: Setting up a Big Data environment on a chosen cloud platform.
7. Advanced Analytics & Data Processing (3 hours)
Objective: Dive into advanced data analytics tools and techniques, emphasizing real-time data processing.
- Introduction to Apache Kafka: Real-time event streaming and data pipelines.
- Data Analytics with Apache Hive: SQL-like querying with HiveQL.
- Data Workflow Orchestration with Apache Airflow.
- Hands-on: Setting up a data pipeline with Kafka and querying the ingested data with Hive.
8. Data Visualization & BI Tools (3 hours)
Objective: Impress upon participants the importance of effective data visualization and familiarize them with top BI tools in the Big Data realm.
- Importance of Data Visualization in Big Data.
- Introduction to Tableau and Power BI: Connecting with Big Data sources, creating dashboards.
- Open-Source Alternatives: Apache Superset.
- Hands-on: Creating a dashboard using live Big Data streams.
9. Security & Best Practices in Big Data (3 hours)
Objective: Discuss the challenges and solutions associated with Big Data security and introduce best practices for robust Big Data implementations.
- Challenges in Big Data Security.
- Tools & Techniques: Apache Ranger, Apache Knox.
- Data Governance and Cataloging with Apache Atlas.
- Hands-on: Setting up basic security rules using Ranger.
10. Recap, Emerging Trends & Q&A (3 hours)
Objective: Consolidate the course's learnings, acquaint participants with emerging Big Data trends, and address any outstanding questions.
- Review of the Big Data Landscape: What we've learned.
- Emerging Trends: New tools, platforms, and methodologies on the horizon.
- Interactive Case Study: End-to-end processing of a Big Data task.
- Q&A Session: Addressing questions, uncertainties, and curiosities.
With clear objectives set for each session, participants will have a roadmap of what to expect and the skills they will acquire by the end of the training.
WOMEN AI ACADEMY
Women AI Academy is a learning and development organization driven by gender equality and technology.
Copyright © 2023 Brought to you by Ethos ai AI Training & Consultancy GmbH
Ali Hessami is currently the Director of R&D and Innovation at Vega Systems, London, UK. He has an extensive track record in systems assurance and safety, security, sustainability, knowledge assessment/management methodologies. He has a background in the design and development of advanced control systems for business and safety-critical industrial applications.
Hessami represents the UK on the European Committee for Electrotechnical Standardization (CENELEC) & International Electrotechnical Commission (IEC) – safety systems, hardware & software standards committees. He was appointed by CENELEC as convener of several Working Groups for review of EN50128 Safety-Critical Software Standard and update and restructuring of the software, hardware, and system safety standards in CENELEC.
Ali is also a member of Cyber Security Standardisation SGA16, SG24, and WG26 Groups and started and chairs the IEEE Special Interest Group in Humanitarian Technologies and the Systems Council Chapters in the UK and Ireland Section. In 2017 Ali joined the IEEE Standards Association (SA), initially as a committee member for the new landmark IEEE 7000 standard focused on “Addressing Ethical Concerns in System Design.” He was subsequently appointed as the Technical Editor and later the Chair of P7000 working group. In November 2018, he was appointed as the VC and Process Architect of the IEEE’s global Ethics Certification Programme for Autonomous & Intelligent Systems (ECPAIS).
Trish advises and trains organisations internationally on Responsible AI (AI/data ethics, policy, governance), and Corporate Digital Responsibility.
Patricia has 20 years’ experience as a lawyer in data, technology and regulatory/government affairs and is a registered Solicitor in England and Wales, and the Republic of Ireland. She has authored and edited several works on law and regulation, policy, ethics, and AI.
She is an expert advisor on the Ethics Committee of the UK’s Digital Catapult Machine Intelligence Garage, working with AI startups; is a “Maestro” (a title held by only three people in the world) and expert advisor on the IEEE’s CertifAIEd (previously known as ECPAIS) ethical certification panel; sits on the IEEE’s P7003 (algorithmic bias), P2247.4 (adaptive instructional systems), and P7010.1 (AI and ESG/UN SDGs) standards programmes; is a ForHumanity Fellow working on Independent Audit of AI Systems; is Chair of the Society for Computers and Law; and is a non-executive director on the boards of iTechlaw and Women Leading in AI. Until 2021, Patricia was on the RSA’s online harms advisory panel, whose work contributed to the UK’s Online Safety Bill.
Trish is also a linguist and speaks English, French, and German fluently.
In 2021, Patricia was named to the 100 Brilliant Women in AI Ethics™ list and to Computer Weekly’s longlist of the Most Influential Women in UK Technology.