The choice of the optimal order of the neighborhood for separation a spatial point pattern into a cluster and noise component (by the example of analysis of location of antique settlements in the Kerch Peninsula)

DOI: 10.35595/2414-9179-2020-4-26-257-265

About the Author

Pavel A. Ukrainskiy

Belgorod State National Research University, Federal and Regional Centre for aerospace and ground monitoring of objects and natural resources,
Pobedy str., 85, 308015, Belgorod, Russia,
E-mail: pa.ukrainski@gmail.com

Abstract

When spatial clusters of point objects are delineated, noise in the data is a common problem: it prevents the clusters from having clear boundaries. One popular method for separating the cluster and noise components of a point pattern is NNCR (Nearest Neighbor Clutter Removal), proposed in 1998 by S. Byers and A.E. Raftery. The method is based on the distance from each point to its nearest neighbor of a given order. The result of NNCR depends strongly on the neighborhood order chosen by the user. This paper describes a method for selecting the optimal neighborhood order for NNCR. The method targets the implementation of NNCR in the add-on package spatstat for the R programming language. The main criterion proposed for the optimal neighborhood order is the probability that a cluster component is present in the data: at the optimal order this probability reaches its maximum. In addition, it is proposed to analyze the probability of cluster membership for all points assigned to the cluster component. To this end, the median and the interquartile range of the membership probability are plotted against the neighborhood order. As the order increases, the median membership probability increases and tends to 1.0, whereas the interquartile range decreases and tends to 0.0. The inflection point in these graphs indicates the optimal neighborhood order. A user-defined function written in R automates the comparison of NNCR results obtained at different neighborhood orders. It returns a matrix whose columns are the median of the membership probability, the interquartile range of the membership probability, and the probability that a cluster component is present in the data. The proposed method for choosing the optimal neighborhood order has been tested on a point layer of ancient settlements of the Kerch Peninsula. For these data, the third neighborhood order proved optimal.
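
The comparison across neighborhood orders described above can be illustrated with a short sketch. It is not the author's R function and does not call spatstat; it is an independent, simplified Python re-implementation that assumes the usual two-component mixture of k-th nearest-neighbor distances for a planar Poisson process, fitted by EM, without edge correction and with no duplicate points. The function names (kth_nn_distances, fit_mixture, compare_orders) and the synthetic data are hypothetical, and the "probability that a cluster component is present" is taken here to be the fitted mixing proportion p, which is an assumption.

```python
import math
import random
import statistics


def kth_nn_distances(points, k):
    """Distance from every point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def log_density(d, k, lam):
    """Log density of the k-th NN distance for a planar Poisson process."""
    return (math.log(2.0) + k * math.log(lam * math.pi)
            + (2 * k - 1) * math.log(d) - lam * math.pi * d * d
            - math.lgamma(k))


def fit_mixture(dist, k, max_iter=500, tol=1e-8):
    """EM fit of a cluster/noise mixture; returns (p, membership probabilities)."""
    n, eps = len(dist), 1e-12
    cutoff = statistics.median(dist)
    # Soft initial guess: short distances are more likely to be cluster points.
    delta = [0.9 if d <= cutoff else 0.1 for d in dist]
    p_old = None
    for _ in range(max_iter):
        # M-step: intensities and mixing proportion from current memberships.
        s1 = sum(delta)
        lam1 = k * s1 / (math.pi * sum(w * d * d for w, d in zip(delta, dist)))
        lam0 = k * (n - s1) / (math.pi * sum((1 - w) * d * d
                                             for w, d in zip(delta, dist)))
        p = s1 / n
        # E-step: posterior probability that each point is a cluster point.
        delta = []
        for d in dist:
            a = math.log(p) + log_density(d, k, lam1)
            b = math.log(1.0 - p) + log_density(d, k, lam0)
            w = 1.0 / (1.0 + math.exp(min(b - a, 700.0)))
            delta.append(min(max(w, eps), 1.0 - eps))
        if p_old is not None and abs(p - p_old) < tol:
            break
        p_old = p
    if lam1 < lam0:  # keep the denser component labelled as "cluster"
        p, delta = 1.0 - p, [1.0 - w for w in delta]
    return p, delta


def compare_orders(points, orders):
    """One row per order: (k, median, IQR of membership probability, p)."""
    rows = []
    for k in orders:
        p, delta = fit_mixture(kth_nn_distances(points, k), k)
        members = [w for w in delta if w > 0.5]  # points assigned to the cluster
        if len(members) >= 2:
            q1, med, q3 = statistics.quantiles(members, n=4)
            rows.append((k, med, q3 - q1, p))
        else:
            rows.append((k, math.nan, math.nan, p))
    return rows


random.seed(1)
# Synthetic data (hypothetical): 60 clustered points plus 40 uniform noise points.
cluster = [(random.gauss(0.5, 0.03), random.gauss(0.5, 0.03)) for _ in range(60)]
noise = [(random.random(), random.random()) for _ in range(40)]
rows = compare_orders(cluster + noise, range(1, 6))
for k, med, iqr, p in rows:
    print(f"k={k}  median={med:.3f}  IQR={iqr:.3f}  p={p:.3f}")
```

The sketch only tabulates the three quantities for each order. Choosing the order at which p peaks and at which the median and interquartile-range curves bend is left to the reader, as in the procedure the abstract describes.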

Keywords

point pattern analysis, antique settlement, spatial clustering, clutter removal

References

  1. Allard D., Fraley C. Nonparametric maximum likelihood estimation of features in spatial point processes using Voronoi tessellation. Journal of the American Statistical Association, 1997. V. 92. P. 1485–1493. DOI: 10.2307/2965419.
  2. Baddeley A., Turner R. Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software, 2005. V. 12. No 6. P. 1–42. DOI: 10.18637/jss.v012.i06.
  3. Beilin D.V., Ermolin E.L., Maslennikov A.A., Smekalov S.L. Antique settlements of European Bosporus of Hellenistic time (catalog of monuments). Antiquities of the Bosporus, 2014. V. 18. P. 35–72 (in Russian).
  4. Byers S., Raftery A.E. Nearest-neighbor clutter removal for estimating features in spatial point processes. Journal of the American Statistical Association, 1998. V. 93. P. 577–584. DOI: 10.2307/2670109.
  5. Ester M., Kriegel H.-P., Sander J., Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). Portland: AAAI Press, 1996. P. 226–231.
  6. Heidenreich N.B., Schindler A., Sperlich S. Bandwidth selection for kernel density estimation: a review of fully automatic selectors. AStA Advances in Statistical Analysis, 2013. V. 97. No 4. P. 403–433.
  7. Hennig C., Coretto P. The noise component in model-based cluster analysis. Data Analysis, Machine Learning and Applications. Berlin–Heidelberg: Springer, 2008. P. 127–138.

For citation: Ukrainskiy P.A. The choice of the optimal order of the neighborhood for separation a spatial point pattern into a cluster and noise component (by the example of analysis of location of antique settlements in the Kerch Peninsula). InterCarto. InterGIS. GI support of sustainable development of territories: Proceedings of the International conference. Moscow: Moscow University Press, 2020. V. 26. Part 4. P. 257–265. DOI: 10.35595/2414-9179-2020-4-26-257-265 (in Russian)