DemolisherAA/Fradulent_Data_Detection_Apache

Project Overview

This repository contains Jupyter Notebooks related to fraud detection, data streaming, and real-time data visualization. These notebooks cover various aspects of processing, analyzing, and modeling data to address fraudulent transactions in eCommerce and other contexts.

Files

  1. Analysing Fraudulent Transaction Data.ipynb

    • Purpose: Exploratory data analysis (EDA) of fraudulent transaction datasets.
    • Key Components:
      • Analyzing patterns in fraudulent transactions.
      • Visualizing data distributions and key features.
      • Libraries used: pandas, matplotlib, seaborn.
  2. Building Models for eCommerce Fraud Detection.ipynb

    • Purpose: Building and evaluating machine learning models for fraud detection.
    • Key Components:
      • Preprocessing data for model training.
      • Training and evaluating models such as Logistic Regression and Random Forest.
      • Libraries used: scikit-learn, numpy, pandas.
  3. Producing the Data.ipynb

    • Purpose: Simulating and producing data streams for analysis.
    • Key Components:
      • Generating mock data for fraud scenarios.
      • Producing data using streaming technologies.
      • Libraries used: faker, pandas.
  4. Consuming Data Using Kafka and Visualise.ipynb

    • Purpose: Consuming data streams and visualizing results.
    • Key Components:
      • Setting up Kafka consumers to read data streams.
      • Visualizing the processed data for insights.
      • Libraries used: kafka-python, matplotlib.
  5. Streaming Application Using Spark Structured Streaming.ipynb

    • Purpose: Building a streaming application for real-time data processing.
    • Key Components:
      • Setting up Spark Structured Streaming.
      • Processing streaming data in real-time.
      • Libraries used: pyspark.
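
As a rough illustration of the "generating mock data for fraud scenarios" step listed for notebook 3, here is a minimal sketch. To stay self-contained it uses only the Python standard library rather than faker, and the field names, categories, amount ranges, and fraud rate are all hypothetical. None of them are taken from the notebooks or the project datasets.

```python
import random
from datetime import datetime, timedelta

# Hypothetical categories, for illustration only (not from the notebooks).
CATEGORIES = ["electronics", "clothing", "groceries"]


def make_transaction(rng, tx_id, start, fraud_rate=0.05):
    """Return one mock transaction as a dict (hypothetical schema)."""
    is_fraud = rng.random() < fraud_rate
    # Assumption for illustration: fraudulent transactions skew toward larger amounts.
    amount = rng.uniform(200, 2000) if is_fraud else rng.uniform(5, 300)
    return {
        "transaction_id": tx_id,
        "timestamp": (start + timedelta(seconds=tx_id)).isoformat(),
        "category": rng.choice(CATEGORIES),
        "amount": round(amount, 2),
        "is_fraud": is_fraud,
    }


def generate_transactions(n, seed=0, fraud_rate=0.05):
    """Generate n mock transactions; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    return [make_transaction(rng, i, start, fraud_rate) for i in range(n)]


transactions = generate_transactions(1000, seed=42)
fraud_count = sum(t["is_fraud"] for t in transactions)
print(f"{fraud_count} of {len(transactions)} mock transactions are labeled as fraud")
```

Seeding the generator keeps runs repeatable, which can make it easier to compare downstream results when records like these are fed to a producer or a model.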

Getting Started

Prerequisites

  • Python 3.x
  • Jupyter Notebook or Google Colab
  • Required Python libraries:
    • pandas, numpy, matplotlib, seaborn
    • scikit-learn, faker, kafka-python, pyspark
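
Before opening the notebooks, it can help to confirm that the libraries above are importable. The snippet below is a small sketch of such a check. The helper name `missing_packages` is hypothetical, and the mapping reflects the usual import names (for example, scikit-learn is imported as `sklearn` and kafka-python as `kafka`).

```python
import importlib.util

# pip package name -> module name it provides on import
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "faker": "faker",
    "kafka-python": "kafka",
    "pyspark": "pyspark",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pkg
        for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```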

Installation

  1. Clone the repository:
    git clone <repository-url>
  2. Navigate to the project directory:
    cd <repository-folder>
  3. Install the required libraries:
    pip install -r requirements.txt

Usage

  • Open the notebooks in Jupyter or any compatible environment (e.g., Google Colab).
  • Follow the instructions within each notebook to execute the cells in sequence.

Datasets

The datasets used in this project are too large to include in the repository. Please email me at [your-email@example.com] to request access to the datasets.

License

This project is licensed under the GNU General Public License (GPL). See the LICENSE file for details.

Acknowledgments

  • Python documentation
  • Open-source libraries used in the project
  • Kafka and Spark community resources
