DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular unsupervised machine learning algorithm used for clustering spatial data points. Unlike K-Means clustering, DBSCAN does not require the number of clusters to be specified in advance and is capable of identifying clusters of arbitrary shapes and sizes. This repository provides an overview of DBSCAN clustering along with examples and implementations in Python.
DBSCAN clustering works by grouping together closely packed data points into clusters based on two key parameters: epsilon (ε) and minimum points (MinPts).
- Epsilon (ε): The radius of the neighborhood around each data point; any point within this distance is considered a neighbor of that point.
- Minimum Points (MinPts): The minimum number of points (including the point itself) that must lie within a point's ε-neighborhood for that point to be a core point, i.e., part of a dense region.
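Written as definitions, with D denoting the dataset and dist the distance metric (typically Euclidean; both symbols are introduced here for illustration), the two parameters decide which points are core points and which are border points. Because dist(p, p) = 0, every neighborhood contains the point itself, and any point that is neither core nor border is noise.

```latex
\begin{aligned}
N_\varepsilon(p) &= \{\, q \in D : \operatorname{dist}(p, q) \le \varepsilon \,\} \\
\text{core}(p) &\iff |N_\varepsilon(p)| \ge \mathit{MinPts} \\
\text{border}(q) &\iff \neg\,\text{core}(q) \,\wedge\, \exists\, p : \text{core}(p) \wedge q \in N_\varepsilon(p)
\end{aligned}
```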
Core Point Identification:
- For each data point, identify its ε-neighborhood (including the point itself).
- If the number of points in the neighborhood is greater than or equal to MinPts, the point is considered a core point.
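The core-point test can be sketched in plain Python with brute-force Euclidean distances. The function names `region_query` and `find_core_points` are illustrative and not taken from the repository's notebook; `math.dist` requires Python 3.8 or later.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def find_core_points(points, eps, min_pts):
    """Set of indices whose eps-neighborhood holds at least min_pts points."""
    return {i for i in range(len(points)) if len(region_query(points, i, eps)) >= min_pts}

points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
print(sorted(find_core_points(points, eps=1.0, min_pts=3)))  # [0, 1, 2]
```

The isolated point (5.0, 5.0) has only itself in its neighborhood, so it is not a core point.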
Cluster Expansion:
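- Start a new cluster from an unassigned core point and add every point in its ε-neighborhood to it. Whenever one of those neighbors is itself a core point, add its ε-neighborhood too, so that chains of core points lying within ε of each other end up in the same cluster.
- A non-core point within the ε-neighborhood of a core point is a border point: it joins that core point's cluster but does not extend the cluster any further.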
- Repeat this process until all data points have been assigned to clusters or marked as noise.
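Putting the steps together, here is a minimal from-scratch sketch using only the standard library. The names `region_query` and `dbscan` are illustrative, not the repository's code, and every neighborhood query scans all points with Euclidean distance. Points that no cluster ever reaches keep the label -1, the same noise convention scikit-learn uses for `DBSCAN.labels_`.

```python
import math
from collections import deque

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, 2, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is an unassigned core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned (this cluster, or border of an earlier one)
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep growing through it
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

A border point that lies within ε of core points from two different clusters stays in whichever cluster reaches it first, so its assignment can depend on input order; core points and noise do not.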
Noise Identification:
- Any data point that is not assigned to a cluster (it is neither a core point nor within ε of one) is considered a noise point.
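The three roles a point can end up with follow directly from the definitions in these steps. This standalone sketch (a hypothetical `classify_points` helper using brute-force Euclidean distances) labels each point as core, border, or noise:

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' from its eps-neighborhood."""
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighborhoods]
    kinds = []
    for i, nbrs in enumerate(neighborhoods):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in nbrs):
            kinds.append("border")  # within eps of at least one core point
        else:
            kinds.append("noise")  # neither core nor within eps of a core point
    return kinds

points = [(0, 0), (1, 0), (2, 0), (3, 0), (9, 9)]
print(classify_points(points, eps=1.0, min_pts=3))
# ['border', 'core', 'core', 'border', 'noise']
```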
Advantages:
- Does not require the number of clusters to be specified in advance.
- Capable of identifying clusters of arbitrary shapes and sizes.
- Robust to noise and outliers, which are labeled explicitly instead of being forced into a cluster.
- Does not assume clusters are spherical or convex.
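Limitations:
- Results are sensitive to the choice of ε and MinPts.
- Struggles with clusters of very different densities, because a single ε and MinPts pair cannot suit all of them at once.
- Can be more computationally expensive than K-Means on large datasets: a naive implementation computes all O(n²) pairwise distances, although spatial indexes such as k-d trees can reduce this, mainly for low-dimensional data.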
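One widely used heuristic for picking ε, assumed here rather than taken from the repository's notebook, is the k-distance plot: compute each point's distance to its k-th nearest other point with k = MinPts - 1, sort the values, and look for the sharp bend ("knee") where the distances start rising quickly; ε is chosen near that bend. A point whose k-distance is at most ε has at least MinPts points (itself included) in its ε-neighborhood, so it is a core point. A plain-Python sketch of the computation, with a hypothetical `k_distances` helper:

```python
import math

def k_distances(points, k):
    """Sorted list of each point's distance to its k-th nearest other point."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])  # k-th nearest, excluding the point itself
    return sorted(result)

min_pts = 3
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print([round(d, 2) for d in k_distances(points, k=min_pts - 1)])
# [1.0, 1.0, 1.41, 1.41, 1.41, 1.41, 55.87]
```

In this toy example the values jump from about 1.41 to 55.87, so ε around 1.5 with MinPts = 3 separates the two tight groups from the isolated point. On real data the values are usually plotted, and the knee is often less clear-cut.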
Applications:
- Image segmentation and object detection.
- Anomaly detection in cybersecurity.
- Identifying spatial clusters in geographic data.
- Customer segmentation in marketing.
- Identifying natural groupings in biological data.
This repository includes a sample dataset in CSV format, wine-clustering.csv, that can be used to practice DBSCAN clustering. Each row describes a wine sample by its numeric attributes.
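Because DBSCAN compares every distance against a single threshold ε, features measured on larger scales dominate the distance unless the data are rescaled. Assuming every column of wine-clustering.csv is numeric (check the header first; `float()` raises `ValueError` otherwise), the following standard-library sketch loads the file and z-score standardizes each column. The helper names are hypothetical, and the notebook itself may preprocess differently (for example with scikit-learn's `StandardScaler`).

```python
import csv
import statistics

def load_numeric_csv(lines):
    """Parse CSV lines (a list of strings or an open file) into a header and float rows."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows

def standardize(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has standard deviation 0; use 1.0 to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]

# Usage with the repository's file:
# with open("wine-clustering.csv", newline="") as f:
#     header, rows = load_numeric_csv(f)
# scaled = standardize(rows)
```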
└── DBSCAN_Clustering/
    ├── README.md
    ├── Wine_Dataset_DBSCAN.ipynb
    ├── requirements.txt
    ├── wine-clustering.csv
    └── wine-dataset-EDA.html
Requirements
Ensure you have the following dependencies installed on your system:
- Jupyter Notebook
Installation
- Clone the DBSCAN_Clustering repository:
git clone https://github.com/sumony2j/DBSCAN_Clustering.git
- Change to the project directory:
cd DBSCAN_Clustering
- Install the dependencies:
pip install -r requirements.txt
Usage
Use the following command to execute the DBSCAN clustering notebook:
jupyter nbconvert --to notebook --execute Wine_Dataset_DBSCAN.ipynb
Contributing
Contributions are welcome! Here are several ways you can contribute:
- Submit Pull Requests: Review open PRs and submit your own.
- Join the Discussions: Share your insights, provide feedback, or ask questions.
- Report Issues: Report bugs you find or log feature requests for DBSCAN_Clustering.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your GitHub account.
- Clone Locally: Clone the forked repository to your local machine using a Git client.
git clone https://github.com/<your-username>/DBSCAN_Clustering.git
- Create a New Branch: Always work on a new branch, giving it a descriptive name.
git checkout -b new-feature-x
- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.
git commit -m 'Implemented new feature x.'
- Push to GitHub: Push the changes to your forked repository.
git push origin new-feature-x
- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
Once your PR is reviewed and approved, it will be merged into the main branch.