In the current era of information, huge volumes of data have become available for decision making, so there is a need for tools that implement methods to mine such data and for media to store it. The rate at which data is generated keeps increasing its size every second. Such data is termed Big Data: it is not only big in size, but also highly diverse in variety and velocity, which makes it difficult to handle using traditional tools and techniques. Because of this rapid growth, solutions are required to handle such data and to extract interesting patterns from these datasets using Data Mining algorithms in order to gain knowledge. Such value can be provided by big data analytics, which is the application of advanced analytics techniques to big data. The emergence of Big Data has also given rise to many security and privacy issues that need to be handled; otherwise, Big Data Analytics will not fulfil its needs and opportunities.

Keywords: Big Data, security, privacy

1. Introduction

It is next to impossible to think of a world without data storage, social websites or banking transactions. In today’s information era we are surrounded by large and complex data stores. Social websites like Facebook and Twitter need to store all profile details, who liked your picture, who commented on it and what the comment was; all of this information has to be stored somewhere. New types of analysis methods, storage and visualization techniques are required to analyse such a sheer amount of data and to visualize the extracted patterns. John Mashey introduced the term Big Data in a Silicon Graphics (SGI) slide deck in 1998. With advancements in science and technology, the increase in the number of internet users and hence in social interactions, the digital recording of data, and data from sensors, the medical field and geostationary satellites, more data is constantly being added to this rapidly growing dataset.
But, as is said, everything comes with pros and cons. Datasets are irregular and uncertain because of their different formats and sources, and they are increasingly stored in the cloud to promote the sharing of resources. The use of a large number of software platforms in cloud infrastructure has increased the probability of attacks on the entire system. The different types of problems that have arisen with the emergence of Big Data are discussed in the following sections.

The term “Big data” refers to a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Big Data now concerns a much bigger and wider pool of organizations than a few big companies. It extends to any company or government agency that depends on such datasets and applies statistical algorithms and data mining techniques to analyse them, ultimately improving decision making and enhancing efficiency. The various sources that are adding data to these datasets are listed below:

· Media/entertainment: The media and entertainment industry is increasing the volume and variety of data in the form of text, JPEG files, Twitter posts and videos. It has started recording and delivering everything digitally, which requires modern processing tools.
· Medical: The healthcare industry records data in electronic medical records and images, which are used for health monitoring and epidemiological research programs.
· Video surveillance: Security services have improved, which is helping industries to analyse their data in a better manner. The way videos are recorded for surveillance has moved from CCTV (Closed Circuit Television) to IPTV (Internet Protocol Television) cameras.
· Logistics, retail, utilities, and telecommunications: GPS transceivers, Radio Frequency Identification (RFID) tags and cell phones are generating data through their embedded sensors. This data needs to be stored and handled so that industries can use it to optimise their business activities and enhance operational Business Intelligence.
· Data through social networks and location trackers: Social networking sites and mobile applications like Facebook and Twitter, Flipkart and other online shopping sites, and Google Maps are generating data in various formats, such as comments on and likes of posted photos, or tweets by celebrities. All these social networks provide free services to their users. Internet users share photos and videos and write blogs to keep in touch with their friends.

Fig. 1: Sources of Big Data

2. Characteristics of Big Data

Big Data are datasets that require new analytical and processing tools to extract patterns from highly scalable, diversified and distributed data. Three main features characterize big data: volume, variety, and velocity, or the three V’s.

1. Volume: Volume is always the first feature that comes to mind whenever Big Data is discussed. There is general agreement that if the volume is in the gigabytes it is probably not Big Data, but at the terabyte and petabyte level and beyond it may very well be. The volume of data is the main reason why relational database management systems cannot be used, or have failed, to analyse Big Data. Apart from sheer size, other issues include the mix of structured, unstructured and semi-structured formats, complexity, cost and reliability. Real-time data from sensors and devices (often termed IoT), tweets from Twitter, experimental data from research labs, customer orders at Domino's, transactional data from shopping sites and banks including all payment information, YouTube videos and many more sources are continuously generating data on a petabyte scale, increasing the volume at an exponential rate.
2. Variety: Variety describes the different formats in which data is generated, many of which cannot be stored in structured relational database systems. These include documents and text data in PDF, Excel or DOCX format; emails; audio and video; activity records from electronic devices; social media messages in the form of images, tweets and other text; and machine-generated data of all types from sensors, devices, RFID tags, machine logs, cell phone GPS signals, stock prices, purchase histories and much more. Storing and retrieving data of different types in a cost-efficient manner, and visualizing the extracted patterns to take decisions, is a challenge for data analysts.
3. Velocity: Velocity refers to data in motion, that is, data arriving every second at a rapid rate. Examples are the stream of web log history of page visits and clicks by each visitor to a web site, or readings taken from a sensor. This can be thought of as data coming through a pipeline that needs to be captured, stored and analysed so that it can be used for strategic decision making by top-level management. The consistency and completeness of fast-moving streams of data are one concern. Matching them to specific outcome events, a challenge also raised under Variety, is another. Timeliness, or latency, can also be regarded as a characteristic of data that helps define velocity.

The other two V’s that describe Big Data are Veracity and Value. Veracity signifies the quality of data. Since inaccurate, noisy and uncertain data is useless, veracity also refers to the trustworthiness of data. Value, on the other hand, defines the importance of data, or its business value in monetary terms, for an organisation.

Fig. 2: Characteristics of Big Data

3. Big Data Analytics Tools and Methods

Faster and more efficient methods are required to handle the multitudes of data flowing in and out of organizations daily. Traditional techniques for data management and analysis have failed to handle and mine such noisy data sets.
Therefore, there arises a need for new tools and methods specialized for big data analytics, as well as the required architectures for storing and managing such data. Accordingly, the emergence of big data has an effect on everything from the data itself and its collection, to the processing, to the final extracted decisions. The main areas where Big Data differs from normal data sets are the way it is processed, the amount of storage required and the techniques that can be used to extract patterns for decision making.
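
One processing model designed for such data is MapReduce, in which an analysis is split into a mapping step and a reducing step. The following is a minimal single-process sketch of that idea in Python, using word counting as an assumed example task. The function names (map_phase, shuffle, reduce_phase, word_count) are illustrative only and are not part of Hadoop's API; a real Hadoop job would distribute the map and reduce steps across the nodes of a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory collection."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: each string stands for one input split.
    print(word_count(["big data needs new tools", "big data has big value"]))
```

Because each call to map_phase depends only on its own document, and each call to reduce_phase only on the values of one key, both steps could in principle run in parallel on different machines, which is the property the model relies on.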
The Hadoop framework was introduced for Big Data analytics. The MapReduce algorithm is implemented on Hadoop, where the analysis takes place in two parts: mapping and reducing. Various other tools like R, MongoDB and Cloudera are available for the analysis of Big Data.

4. Issues and Challenges

1) Fault Tolerance: Fault tolerance means that the damage in case of a failure should be minimal, that is, below a threshold level, so that only a subset of the whole task needs to be redone. This can be achieved by dividing the problem into parts and assigning each part to a node, with the nodes working in parallel. Checkpoints can be inserted at regular intervals.
2) Data Quality: Veracity, discussed above, is nothing but the quality of data, and it is an important factor. Because big data is so large in volume, its quality should be examined: there is no sense in wasting storage on low quality and irrelevant data, which will result in useless patterns and conclusions.
3) Scalability: The increase in the number of Internet users, and thus in the scale of Big Data, has led to the growth of cloud computing, which allows expensive resources to be shared and large volumes of data to be processed on large clusters in a distributed manner to increase performance. Solid state devices are replacing hard disk drives, but their performance for random and sequential data transfer is not the same. So the choice of storage device is a big challenge from an analytics point of view.
4) Security and Privacy Issues: The use of social media on such a large scale has raised various security and privacy issues. The following gives an overview of the security and privacy issues involved in a Big Data environment:
· Access to social media and online content, in the form of text documents, images at shopping sites, songs as audio files, YouTube videos, online money transfers using net banking facilities and so on, is increasing at such a rapid rate that it is very difficult to ensure the privacy of personal data. The increased use of the Internet has raised the threat of cyber attacks, with an internet attack reportedly occurring every 10 minutes in our country. Various applications also keep track of our device’s location.
5) Processing Issues: New analytic algorithms and parallel processing are required for effective and rapid processing of Big Data. One of the challenges is to identify the important data points from which useful patterns, and the maximum benefit, can be extracted.
6) Storage and Transport Issues: This issue is well explained by the authors of [4] with a very good example. Each time a new storage medium is invented, the quantity of data grows further. Transferring data from the storage device to the processing point for analysis over a 1 gigabit per second link with an effective transfer rate of 80% provides a sustainable bandwidth of only about 100 megabytes per second.
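
The bandwidth figure in the storage and transport example can be reproduced with simple arithmetic. The Python sketch below assumes the link is rated at 1 gigabit per second, since that is the reading under which an 80% effective transfer rate corresponds to about 100 megabytes per second. The function names and the 1 terabyte dataset size are hypothetical and chosen only for illustration.

```python
def sustained_bandwidth_mb_per_s(link_gbit_per_s: float, efficiency: float) -> float:
    """Convert a nominal link speed in gigabits/s into usable megabytes/s."""
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    usable_bytes_per_s = usable_bits_per_s / 8  # 8 bits per byte
    return usable_bytes_per_s / 1e6             # bytes -> megabytes (decimal)


def transfer_time_hours(size_bytes: float, bandwidth_mb_per_s: float) -> float:
    """Time needed to move size_bytes at the given sustained bandwidth."""
    seconds = size_bytes / (bandwidth_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    bandwidth = sustained_bandwidth_mb_per_s(1.0, 0.80)
    # Assumed dataset size of 1 terabyte (1e12 bytes), for illustration only.
    hours = transfer_time_hours(1e12, bandwidth)
    print(f"sustained bandwidth: {bandwidth:.0f} MB/s")
    print(f"time to move 1 TB:   {hours:.2f} hours")
```

Under these assumptions a single terabyte already takes close to three hours to move, which suggests why transport, and not only storage capacity, becomes a bottleneck as datasets grow.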
Fig. 3: Processing in Big Data Environment

5. Conclusion and Future Scope

Since the advent of the personal computer, data has been the biggest thing to hit the industry. In this research, we have reviewed the innovative topic of big data, which is currently one of the most researched areas of the IT industry, as it is revealing remarkable and unusual opportunities. The increase in the number of internet users and advancements in technology leading to low storage costs have made Big Data a heavily researched topic. The PC changed the world; now the data movement is doing the same. The future of Big Data is concerned with large volumes due to the exponential growth in the number of portable and handheld devices. Various programs like Spark and Kafka have enabled users to take decisions in real time. Data Mining techniques like binning, normalization and sampling have been used to pre-process and transform data, eliminating outliers and other uncertainties.
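
The three pre-processing techniques named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, under assumptions that are not taken from the paper: normalization is taken to mean min-max scaling to [0, 1], binning is taken to mean equal-width bins smoothed by bin means, and sampling is taken to mean simple random sampling without replacement. All function names are hypothetical, and the inputs are assumed to be non-empty lists of numbers.

```python
import random
from typing import Dict, List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value the index of its equal-width bin (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def smooth_by_bin_means(values: Sequence[float], n_bins: int) -> List[float]:
    """Replace each value by the mean of its bin, which dampens local noise."""
    bins = equal_width_bins(values, n_bins)
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for value, b in zip(values, bins):
        totals[b] = totals.get(b, 0.0) + value
        counts[b] = counts.get(b, 0) + 1
    return [totals[b] / counts[b] for b in bins]


def simple_random_sample(values: Sequence[float], k: int, seed: int = 0) -> List[float]:
    """Draw k distinct items; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(values), k)


if __name__ == "__main__":
    readings = [4, 8, 15, 21, 21, 24]  # hypothetical sensor readings
    print(min_max_normalize(readings))
    print(smooth_by_bin_means(readings, 3))
    print(simple_random_sample(readings, 3))
```

On very large datasets, sampling is typically applied first so that the more expensive steps run on a manageable subset; the order shown here is only one possible arrangement.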
The growing use of the Internet poses increasing security challenges to our personal data. With the increase in embedded sensors in devices, often described as the Internet of Things, communication between devices in a network has opened a path for automated vehicles and robots, which are going to be a trend in the future. Big data has found applications in many areas like customer segmentation, transportation, biomedical, geostationary satellite data, retail purchase, telecom and manufacturing. Industries and organisations need to train their employees to work with tools and techniques that can process data in varied formats, so that decision making is improved by taking into account the hidden and unknown patterns extracted by mining voluminous datasets. Big data analytics is of great importance and, if utilized properly, it can lead to technological and scientific advances.

6. References

1. Adams, M.N.: Perspectives on Data Mining. International Journal of Market Research 52(1), 11–19 (2010)
2. Asur, S., Huberman, B.A.: Predicting the Future with Social Media. In: ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 492–499 (2010)
3. Bakshi, K.: Considerations for Big Data: Architecture and Approaches. In: Proceedings of the IEEE Aerospace Conference, pp. 1–7 (2012)
4. Cebr: Data equity, Unlocking the value of big data. In: SAS Reports, pp. 1–44 (2012)
5. Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J.M., Welton, C.: MAD Skills: New Analysis Practices for Big Data. Proceedings of the ACM VLDB Endowment 2(2), 1481–1492 (2009)
6. Cuzzocrea, A., Song, I., Davis, K.C.: Analytics over Large-Scale Multidimensional Data: The Big Data Revolution! In: Proceedings of the ACM International Workshop on Data Warehousing and OLAP, pp. 101–104 (2011)
7. Economist Intelligence Unit: The Deciding Factor: Big Data & Decision Making. In: Capgemini Reports, pp. 1–24 (2012)
8. Elgendy, N.: Big Data Analytics in Support of the Decision Making Process. MSc Thesis, German University in Cairo, p. 164 (2013)
9. EMC: Data Science and Big Data Analytics. In: EMC Education Services, pp. 1–508 (2012)
10. He, Y., Lee, R., Huai, Y., Shao, Z., Jain, N., Zhang, X., Xu, Z.: RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems. In: IEEE International Conference on Data Engineering (ICDE), pp. 1199–1208 (2011)
11. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: A Self-tuning System for Big Data Analytics. In: Proceedings of the Conference on Innovative Data Systems Research, pp.