Big Data Visualization by MapReduce for Discovering the Relationship Between Pollutant Gases



  • Yas A. Alsultanny uruk

Big data mining and air pollution are both pressing issues today. This study uses MapReduce in an innovative method for visually discovering the relationships between pollutant gases. One-dimensional, two-dimensional, and three-dimensional visualizations were applied to one year of hourly readings from an air quality monitoring station to study how pollutant gases are distributed and to show graphically the distribution of one, two, or three gases at a time. The data set comprises 8760 hourly readings for each of the five pollutant gases studied. The Pearson correlation was used to quantify the correlation between the pollutant gases, and the eta factor was used to evaluate the effect of one gas on the other pollutant gases. The visual and numerical methods revealed the same relationships between the pollutant gases. Ozone has a moderate negative correlation of -0.622 with nitrogen dioxide, weak negative correlations of -0.248 with carbon monoxide and -0.155 with carbon dioxide, and approximately no correlation (0.060) with sulphur dioxide. Carbon monoxide has a moderate positive correlation of 0.364 with carbon dioxide. The eta factor for ozone is very weak, 0.292 with nitrogen dioxide and 0.009 with sulphur dioxide, which indicates that the sources of ozone, nitrogen dioxide, and sulphur dioxide are different. The study recommends that each country analyze, both visually and numerically, the big data collected yearly from its monitoring stations in order to control gas pollution, especially near large industrial factories.
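The numerical side of the analysis can be illustrated with a minimal Python sketch of a Pearson correlation computed in a MapReduce style. It is not the implementation used in the study, which the abstract does not describe. The sketch assumes that the map step reduces each chunk of paired hourly readings to partial sums and that the reduce step adds those sums together. The function names, the chunk size of 24 readings, and the synthetic ozone and nitrogen dioxide values are all hypothetical and chosen only for illustration.

```python
from functools import reduce
from math import sqrt


def map_chunk(chunk):
    """Map step: collapse one chunk of (x, y) pairs into partial sums."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n, sx, sy, sxx, syy, sxy)


def combine(a, b):
    """Reduce step: add two tuples of partial sums element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def pearson_from_sums(sums):
    """Pearson r from the combined sums (computational formula)."""
    n, sx, sy, sxx, syy, sxy = sums
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / sqrt(var_x * var_y)


def pearson_mapreduce(pairs, chunk_size=24):
    """Split the readings into chunks, map each chunk, then reduce."""
    chunks = [pairs[i:i + chunk_size]
              for i in range(0, len(pairs), chunk_size)]
    partials = map(map_chunk, chunks)
    return pearson_from_sums(reduce(combine, partials))


# Synthetic, made-up readings (NOT data from the study): two days of
# hourly values in which the second gas falls linearly as the first rises.
ozone = [float(h % 24) for h in range(48)]
no2 = [50.0 - 2.0 * v for v in ozone]

r = pearson_mapreduce(list(zip(ozone, no2)))
print(f"Pearson r = {r:.3f}")
```

Because the partial sums are additive, the result does not depend on how the readings are split into chunks, which is what allows the work to be distributed across workers before a single final combination.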


Keywords: Air quality, data mining, gas pollution, carbon dioxide, correlation

Alsultanny, Y. A. (2021). Big Data Visualization by MapReduce for Discovering the Relationship Between Pollutant Gases. Journal Port Science Research, 4(2), 56–63.

