CSFC: A New Centroid Based Clustering Method to Improve the Efficiency of Storing and Accessing Small Files in Hadoop: Recent Advancement

Rathidevi, R. and Parameswari, R. (2020) CSFC: A New Centroid Based Clustering Method to Improve the Efficiency of Storing and Accessing Small Files in Hadoop: Recent Advancement. In: Recent Studies in Mathematics and Computer Science Vol. 2. B P International, pp. 44-50. ISBN 978-93-90149-09-4

Abstract

Computers play a major role in day-to-day life, and as technology advances, the volume of data collected
across many fields keeps increasing. IoT sensors in particular generate data every second, producing volumes
that are not easy to process. Data at this scale is called Big Data. A large number of small files is also
considered Big Data, yet small files are not easy to store and process in Hadoop. Existing methods use merging
and clustering techniques to combine small files into larger files of up to 128 MB before sending them to
HDFS. The proposed system, CSFC (Clustering Small Files based on Centroid), clusters files without fixing the
number of clusters in advance, because a predefined count forces all files into that limited number of
clusters. Instead, clusters are generated according to the number of related files in the dataset. Relevant
files are combined in a cluster up to 128 MB. If a file is not relevant to any existing cluster, or if the
cluster size has reached 128 MB, a new cluster is generated and the file is stored in it. Related files
grouped in this way are easy to process and compare. With this method, fetching data from the data node is
more efficient than with other clustering techniques.
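
The cluster assignment rule described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how relevance between files is measured, so cosine similarity over word counts, the threshold value of 0.3, and all names (`Cluster`, `cluster_small_files`, `cosine`) are assumptions made for the example. Only the 128 MB cap and the rule of opening a new cluster when no existing one fits come from the text.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

MAX_CLUSTER_BYTES = 128 * 1024 * 1024  # 128 MB cap stated in the abstract
SIMILARITY_THRESHOLD = 0.3             # assumed value, not given in the source


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (assumed relevance measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Cluster:
    # The centroid is kept as summed term counts; cosine similarity ignores
    # scale, so the sum points in the same direction as the mean vector.
    centroid: Counter = field(default_factory=Counter)
    files: list = field(default_factory=list)
    size: int = 0

    def add(self, name: str, terms: Counter, size: int) -> None:
        self.centroid.update(terms)
        self.files.append(name)
        self.size += size


def cluster_small_files(files, threshold=SIMILARITY_THRESHOLD,
                        max_bytes=MAX_CLUSTER_BYTES):
    """Assign (name, text, size_in_bytes) tuples to clusters.

    The number of clusters is not fixed in advance: a new cluster is opened
    whenever no existing cluster is both relevant enough and has room.
    """
    clusters = []
    for name, text, size in files:
        terms = Counter(text.lower().split())
        best, best_sim = None, threshold
        for cluster in clusters:
            if cluster.size + size > max_bytes:
                continue  # adding the file would exceed the size cap
            sim = cosine(terms, cluster.centroid)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            best = Cluster()
            clusters.append(best)
        best.add(name, terms, size)
    return clusters
```

For instance, given two files that share the terms "hadoop" and "hdfs" and a third file about an unrelated topic, this sketch places the first two in one cluster and opens a second cluster for the third. Each resulting cluster would then be merged into a single file before being written to HDFS; that step is omitted here.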

Item Type: Book Section
Subjects: Open Research Librarians > Medical Science
Depositing User: Unnamed user with email support@open.researchlibrarians.com
Date Deposited: 20 Nov 2023 05:16
Last Modified: 20 Nov 2023 05:16
URI: http://stm.e4journal.com/id/eprint/2146
