(PDF) A multifunctional matching algorit

Computers and Electronics in Agriculture 187 (2021) 106262 Contents lists available at ScienceDirect Computers and Electronics in Agriculture journal homepage: www.elsevier.com/locate/compag Original papers A multifunctional matching algorithm for sample design in agricultural plots N. Ohana-Levi a, *, 1, A. Derumigny b, 1, A. Peeters c, A. Ben-Gal d, I. Bahat e, f, L. Katz d, e, f, g, Y. Netzer i, j, A. Naor h, Y. Cohen e a Independent Researcher, Variability, Ashalim 85512, Israel b Department of Applied Mathematics, Delft University of Technology, Mourik Broekmanweg 6, 2628 XE Delft, the Netherlands c TerraVision Lab, Midreshet Ben-Gurion 8499000, Israel d Institute of Soil, Water and Environmental Sciences, Agricultural Research Organization, Gilat Research Center, Mobile post Negev 2, 85280, Israel e Institute of Agricultural Engineering, Agricultural Research Organization, Volcani Center, P.O. Box 15159, Rishon LeZion 7505101, Israel f The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, The Robert H. Smith Faculty of Agriculture, Food & Environment, Rehovot 76100, Israel g Department of Soil and Water Sciences, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, P.O. Box 12, Rehovot 7610001, Israel h Department of Precision Agriculture, MIGAL Galilee Research Institute, Kiryat Shmona 11016, Israel i Department of Agriculture and Oenology, Eastern R&D Center, Israel j Department of Chemical Engineering, Ariel University, Ariel 40700, Israel A R T I C L E I N F O A B S T R A C T Keywords: Collection of accurate and representative data from agricultural fields is required for efficient crop management. Partially-observed data Since growers have limited available resources, there is a need for advanced methods to select representative Representative sampling given covariates points within a field in order to best satisfy sampling or sensing objectives. The main purpose of this work was to Two-phase study develop a data-driven method for selecting locations across an agricultural field given observations of some Agricultural sampling Spatial autocorrelation covariates at every point in the field. These chosen locations should be representative of the distribution of the covariates in the entire population and represent the spatial variability in the field. They can then be used to sample an unknown target feature whose sampling is expensive and cannot be realistically done at the popu lation scale. An algorithm for determining these optimal sampling locations, namely the multifunctional matching (MFM) criterion, was based on matching of moments (functionals) between sample and population. The selected functionals in this study were standard deviation, mean, and Kendall’s tau. An additional algorithm defined the minimal number of observations that could represent the population according to a desired level of accuracy. The MFM was applied to datasets from two agricultural plots: a vineyard and a peach orchard. The data from the plots included measured values of slope, topographic wetness index, normalized difference vegetation index, and apparent soil electrical conductivity. The MFM algorithm selected the number of sampling points according to a representation accuracy of 90% and determined the optimal location of these points. The algorithm was vali dated against values of vine or tree water status measured as crop water stress index (CWSI). Algorithm per formance was then compared to two other sampling methods: the conditioned Latin hypercube sampling (cLHS) model and a uniform random sample with spatial constraints. Comparison among sampling methods was based on measures of similarity between the target variable population distribution and the distribution of the selected sample. MFM represented CWSI distribution better than the cLHS and the uniform random sampling, and the selected locations showed smaller deviations from the mean and standard deviation of the entire population. The MFM functioned better in the vineyard, where spatial variability was larger than in the orchard. In both plots, the spatial pattern of the selected samples captured the spatial variability of CWSI. MFM can be adjusted and applied * Corresponding authors. E-mail address:

[email protected]

(N. Ohana-Levi). 1 Equal contribution. https://doi.org/10.1016/j.compag.2021.106262 Received 27 December 2020; Received in revised form 7 June 2021; Accepted 8 June 2021 Available online 15 July 2021 0168-1699/© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 using other moments/functionals and may be adopted by other disciplines, particularly in cases where small sample sizes are desired. 1. Introduction proposes an approach that aims to optimize data collection efficiency by collecting a small number of observations representing the statistical Accurate information is necessary for decision-making and man distribution of target field properties while avoiding information agement in precision agriculture. Proliferation and improvement in redundancy due to spatial autocorrelation and accounting for data technological applications and data availability enable extension of values that are not independent (Haining, 2015a). This means that the precision agriculture and “smart farming” (Kamilaris et al., 2017). main purpose is to sample the population distribution and not the spatial Gaining more product with less input in agricultural plots can be ach distribution, although the two frequently coincide (López-Vicente et al., ieved using multiple applications, including field management accord 2020; Roudier et al., 2008). ing to the spatial variability of certain characteristics, namely In the field of digital soil mapping, conditioned Latin hypercube management zones (MZs), for irrigation, fertilization and pest control sampling (cLHS) was introduced by Minasny & McBratney (2006) as a (Monaghan et al., 2013; Nawar et al., 2017; Park et al., 2007). Tools for sampling strategy of an area where ancillary data are available. The spatial management include sampling soil and specific plants, directly or cLHS algorithm is a stratified random procedure that provides sample indirectly, to collect relevant data for decision-making. Such data can design based on multiple variables and their distribution, assuming that include soil properties, crop size and growth, crop physiological status, these variables represent the variability of the target variable to be and yield characteristics (De Lannoy et al., 2006; Gandah et al., 2000; sampled. The technique optimizes the N sites to be included in the Jonckheere et al., 2004; Menzel & Simpson, 1986) collected manually or sample, such that the multivariate distribution of the ancillary data is via remote or proximal sensing. In-field sensors, continuously collecting maximally stratified (Minasny & McBratney, 2006), providing full environmental, soil or crop-related data add a temporal dimension to coverage of each explanatory variable. The cLHS method has been potential decision-making processes. widely used in the field of digital soil mapping to assess variability in soil Although sampling technologies have become more affordable and, properties at regional scales, using ancillary data from a wide range of in some cases, automated, managers remain limited regarding the sources such as governmental data and remote sensing retrievals number of sampling points that may be feasibly collected within a given (Mulder et al., 2013). The method is also commonly used in modeling for field. Therefore, there is a need to develop methods to select points or selection of calibration and validation sets of the data (Richter et al., individual plants to sample that represent the entire field/population. 2016). Over time, several extensions to the cLHS were introduced. The Sampling techniques within agricultural fields vary according to the variance quadtree algorithm was designed to account for non- type of representation required. Spatial sampling is commonly needed to stationarity in the spatial sampling (Minasny et al., 2007). The flexible address strategies for precision agriculture (Oliver, 2010). Spatial rep Latin hypercube sampling was suggested for selecting alternative sites in resentations are designed to account for within-field heterogeneity and cases where sampling would only be possible in specific regions. For spatial autocorrelation (Haining, 2015b) and represent the spatial dis example, in proximity to tracks and roads or avoiding privately owned tribution. In many cases, the samples are the basis for spatial interpo land (Clifford et al., 2014). cLHS has rarely been used for agricultural lation of the collected variables, which account for differences or purposes (as shown by Adamchuk et al., 2011 & Israeli et al., 2019), and similarities between measurements (Stein & Ettema, 2003). Sampling was shown to be less favorable in cases where a low number of locations plans for interpolation purposes are frequently performed using random, is required for sampling (Werbylo & Niemann, 2014). However, the stratified random, systematic, and adaptive spatial sampling throughout notion of using ancillary data to optimize the sampling design was the agricultural field (Haining, 2015b; Stein & Ettema, 2003). In these shown to be highly effective (Zhang et al., 2017). cases, the size or density of the data collected is of major importance and Using known field properties can assist in determining the pattern is a function of the degree of spatial autocorrelation (Kerry et al., 2010). and optimal sampling intervals to avoid over- or under-sampling (Kerry An additional related topic of interest has, in recent years, focused on the et al., 2010). The representation of a target variable using a minimum deployment of sensors across the field (Ben-Gal et al., In Press). For number of samples relies on a set of predefined attributes with distribution of wireless sensors, a main concern for their location is maximum coverage in the field. These previously measured variables achieving maximum coverage of the field without obstruction (Aqeel- are assumed to be related to the target variable, thus building confidence Ur-Rehman et al., 2014), but for all sensors, as for sampling in general, that their characteristics and statistical moments (i.e., statistical pa the distance between the sensors is often based on field attributes rameters that describe the location, scale or shape of a distribution) can (Visalini et al., 2019). Commonly in agricultural practices, MZs are used serve as proxies for the unknown target variable. The use of multiple as a general frame for sampling since it is assumed that each MZ is ho variables that are efficiently and easily acquired in the field in terms of mogeneous in space (Gavioli et al., 2019). Therefore, following a labor, time, and cost, should therefore be beneficial. delineation into MZs, sample-points are selected arbitrarily within each This work proposes an algorithm to quantify a multifunctional zone (Johnson et al., 2003) to represent field properties. This approach matching (MFM) criterion to statistically define the “best” set of loca may be problematic, since the spatial distributions of MZs are subjected tions for sampling within a field. The algorithm was illustrated based on to the selected clustering algorithm used to generate them (Chang et al., two datasets, one from a peach orchard and one from a wine grape 2003). Additionally, clustering algorithms rarely generate perfect sep vineyard. The resulting selection of observations to sample (e.g. trees/ arations of groups, and tend to introduce dissimilarities among clusters vines) were chosen to be representative of the joint distribution of (i.e., misclassified individuals) (Raskutti & Leckie, 1999), thus intro several known variables, which are supposed to be related in some way ducing bias to the sample. Few studies have dealt with sampling with the to the distribution of the unknown target variable. Therefore, observa goal of representing the distribution of an unknown target variable tions of the target variables at those points can be representative of the across the field while using a set of known variables, such as the method entire (unknown) population distribution of the target variable. The proposed by Bazzi et al. (2019). While sampling for interpolation pur algorithm considers the spatial variability in the field as a cause of po poses requires estimated values for as many points in the field as tential information redundancy and is designed to avoid it by applying a possible, sampling for representation of the population distribution spatial constraint. However, as MFM is not a spatial algorithm, it was not commonly aims to reduce the number of locations/observations and to designed to calculate and account for spatial autocorrelation or spatial represent the field using the lowest possible number of points. This study stratified heterogeneity (SSH) (Wang et al., 2016) inherently. A second 2 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 algorithm was further suggested to determine the most suitable number factors (Goovaerts & Kerry, 2010; Heuvelink & van Egmond, 2010) that of observations within the field according to a predefined level of ac enable efficient determination of the appropriate set of variables used to curacy reduction from the maximum number of observations necessary, define the sample design needed to represent an unknown target vari given the spatial constraint. The benefits of the MFM algorithm may able. Fortunately, some of these factors may be acquired at the field support agronomic applications for both commercial and scientific scale with low effort and costs. purposes, such as berry sampling prior to harvest in vineyards (i.e, ◦ Brix, The main objective of this study was to develop a method for opti pH, total acidity) (Kasimatis & Vilas, 1985; Sinton et al., 1978) or fruit mizing locations to sample an unknown target feature across an agri mass/size and red skin coloration in orchards (Gasic et al., 2015; Shinya cultural field by accurately representing a multivariate distribution of a et al., 2013). population of observations. Specific secondary objectives were: (1) to The main algorithm is based on the comparison between multiple develop an algorithm that selects an approximated best set of observa moments of the population and of the sample. It finds the best sample tions; (2) to define the number of observations according to a specified whose moments are simultaneously most similar to those observed in accuracy level of population representation; and (3) to validate the al the whole population. Comparison of moments has long been common gorithm performance using actual agricultural datasets and compare the in statistics, dating from Pearson (1894) who first proposed to estimate a proposed algorithm against uniform random sampling and cLHS. parameter by matching empirical to theoretical moments. Since Pear son, the so-called “method of moments” has become popular due to its 2. Methodology generality and wide applications, especially in econometrics (Hall, 2005; Hansen, 1982), but also in other fields where complex models are 2.1. Study sites and experimental design used, such as electromagnetism (Gibson, 2014) and hydrology (Kitani dis, 1988) among many others. The main idea behind the method of The MFM was tested on two datasets, based on measurements con moments is very intuitive: given a parametric statistical model and an ducted in two agricultural experimental sites, a wine grapes vineyard observed sample, the unknown parameter is estimated as the value and a peach orchard (Fig. 1), both subjected to experiments investi implicitly defined by theoretical moments matched to empirical ones. In gating the potential for variable rate irrigation management. The vine this study, the framework is slightly atypical, even if the same kind of yard (Fig. 1b) is located in the central mountains in Israel and covers an intuition can be used. We do not assume any parametric model for the area of 2.5 ha, 2.3 of which was used for the study. The area is char dataset but wish to find a subset of the dataset whose distribution is close acterized by Mediterranean climate with hot, dry summers and rainy to the distribution of the whole population. The definition of closeness winters and is located at an elevation ranging between 675 and 900 m will be based on the supremum of the distance between several func above sea level. The commercial vineyard of Vitis vinifera L. ‘Cabernet tionals of the population distribution and of the sample distribution. Sauvignon’ was planted in 2011 with vine spacing of 1.5 m within rows Various commonly used distances between distributions, such as the and 3 m between rows, resulting in a density of 2222 vines per hectare. Kolmogorov distance, the total variation distance and the Wasserstein The number of vines within the study site was 3893. Border vines were distance, can be defined as the suprema of functionals (Gibbs & Su, removed from the analysis since sampling grapevines in border-rows 2002). may introduce a bias to the data due to margin effects (Murolo et al., To the best of our knowledge, there has not been any previous sci 2014). After this step, the population consisted of 3523 vines. The entific focus on developing a method for using moment (or functional) vineyard is surrounded by natural and planted vegetation (figs and in matching to represent a (multivariate) population of field attributes digenes vine), which include low shrubs, a pine forest and Mediterra using a limited number of observations for agricultural practices. nean trees (Ohana-Levi et al., 2019). However, much research has dealt with quantifying associations, both in The peach orchard (Fig. 1c) is located in Israel’s Upper Galilee region space and time, among different environmental, soil, plant, and terrain and covers an area of 4 ha. It is also characterized by Mediterranean Fig. 1. The two study site locations within Israel (a): the vineyard near Mevo Beitar (red) (b) and the peach orchard near Mishmar HaYarden (blue) (c). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) 3 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 climate and the elevation of this plot ranges between 168 and 187 m NDVI values were averaged for a 1 m radius circle buffer around the tree above sea level. A commercial peach orchard of Prunus persica cv. 1881 center. NDVI is a well-established measure of relative green vegetation was planted in 2007 with spacing of 2.6 m between trees and 5 m be and crop status. NDVI has been used to estimate a large number of tween rows, resulting in a density of 770 trees/ha. The number of trees vegetation properties, including leaf area (Johnson, 2003), photosyn within the study site was 2402. Border trees were removed from the thetic response, and water stress (Aguilar et al., 2012; Kim & Glenn, analysis, resulting in a population of 2151 trees. The orchard is located 2015). east of a water reservoir and adjacent to other orchards. Validation of the MFM algorithm for each plot was performed using measurements of crop water stress index (CWSI), to assess how well the 2.2. Data collection selected observations represent the entire population distribution of a target variable. Data used for testing the algorithm were acquired during the growing Crop water stress index: CWSI was computed using thermal UAV season of 2017 for both plots. The following four variables were used to imagery acquired with a FLIR SC2000 thermal camera (FLIR* Systems test the algorithm, based on the premise that these attributes may be Inc., Billerica, MA, USA), and is defined in Eq (3): easily acquired and have a strong effect on, or may serve as indicators to, Tcanopy − Twet the plant water conditions within the field: CWSI = (3) Slope: Extracted from a digital elevation model (DEM) based on an Tdry − Twet elevation survey conducted across the vineyard with a spatial resolution where Tcanopy is the temperature of a specific pixel, Twet is the lower of 1 m. In the peach orchard, the elevation was based on a DEM with a baseline calculated from the entire field and Tdry is the upper baseline. spatial resolution of 4 m, provided by Survey of Israel. The slope in The calculation was conducted on pixels consisting of pure vegetation, dicates the maximum rate of change in elevation between one cell and after masking out the soil surface and mixed pixels. Local air tempera its eight immediate neighbors. The specific cell is given a value in ture was measured using meteorological stations located in the agri percent, signifying the steepest downhill descent, meaning that lower cultural plots. Tdry was defined as air temperature +7 ◦ C in the vineyard values correspond to flatter terrain. Slope was calculated using the Slope and air temperature +5 ◦ C in the peach orchard (Rud et al., 2013). Twet tool in the Surface toolset, ArcGIS PRO (ESRI Inc., 2017). Slope has been was calculated by averaging the lowest 5% of temperature values in the established as one of the main factors affecting soil moisture and thermal image (Cohen et al., 2017; Rud et al., 2014). The value provided consequently plant physiology (Ohana-Levi et al., 2020) and plant for each observation in the vineyard was calculated as the minimum drought stress (Becker et al., 1988). CWSI value at a radius of 70 cm around each vine in the vineyard Topographic wetness index (TWI): This index simulates static soil (Ohana-Levi et al., 2019), and the mean value of the lowest 33% of CWSI moisture conditions and quantifies the distribution of accumulated area pixels present within a radius of 150 cm around each tree in the peach across downslope pixels. TWI was calculated using Eq. (1) and applied to orchard (Katz et al., In Review; Meron et al., 2010). CWSI is widely used the DEM data with the “dynatopmodel” package in R (Metcalfe et al., to quantify water status in plants, specifically in agricultural crops 2018): (Khanal et al., 2017; Meron et al., 2010), including orchards and vine TWI = ln(A/tanβ) (1) yards (Bellvert et al., 2016). It is known to be associated with terrain, soil and vegetation factors (Colaizzi et al., 2003; O’Shaughnessy et al., where A is the upstream contributing area (m2) for each pixel and β is 2011; Ohana-Levi et al., 2019) and was therefore selected as a target the local slope in the steepest downhill direction in degrees. TWI ac variable indicating water condition of the crops. quires the spatial resolution of the original DEM, in this case 1 m in the The descriptive statistics of the five variables (four inputs and vali vineyard and 4 m in the peach orchard. TWI is used to assess soil dation) are summarized in Table 1 for the vineyard and the peach or moisture and its effect on plant conditions, mostly in ecological appli chard, including mean, standard deviation (SD), coefficient of variation cations (Petroselli et al., 2013) and is known to influence vine vigor and (CV) and the range of values. Spatial representation of these five vari water status (Bahat et al., 2021). ables is illustrated in Figs. S1 and S2 (Supplementary material A). The Apparent soil electrical conductivity (ECa): A survey was conducted relationships among the input variables for each of the plots are pro in each of the plots to acquire soil ECa measurements. In the vineyard, vided in a Kendall’s tau rank correlation matrices (Fig. 2). the survey was performed on April 10, 2018 by a four-probe soil resis tance sensor (Soil EC 3100) (Veris Technologies) at a field capacity, 2.3. Spatial autocorrelation representing 0–30 cm depth. In the peach orchard, the survey was conducted on September 25, 2017 using an EM38-MK2 (Geonics Ltd., Positive spatial autocorrelation is present where adjacent observa Ontario, CA), representing 0–150 cm depth. Continuous ECa maps were created using 4037 and 35,640 points recorded for the vineyard and Table 1 peach orchard respectively, and interpolated using kernel interpolation, Descriptive statistics of the variables used as input and validation for the by way of a moving window calculating the shortest distance between multivariate moment-matching criterion algorithm. TWI, ECa, NDVI, and CWSI points (Mühlenstädt & Kuhnt, 2011). stand for topographic wetness index, apparent electrical conductivity, normal ized difference vegetation index, and crop water stress index, respectively. SD Normalized difference vegetation index (NDVI): Calculated from and CV stand for standard deviation and coefficient of variation, respectively. multispectral imagery that was acquired by an unmanned aerial vehicle (UAV) equipped with a multispectral MicaSense RedEdge camera Variable Mean SD CV (%) Range (MicaSense* Inc, Seattle, USA). NDVI uses the red and near infrared Vineyard bands of the image according to Eq. (2): Slope (◦ ) 4.33 1.72 39.67 1.72–10.28 TWI 7.23 1.49 20.67 4–11.51 ρ(NIR) − ρ(red) ECa (mS/m) 32.41 10.76 33.19 12.32–64.21 NDVI = (2) NDVI 0.5 0.08 15.81 0.16–0.74 ρ(NIR) + ρ(red) CWSI 0.29 0.14 49.65 0–0.89 Peach orchard where ρ is the reflectance value for each pixel, ranging from − 1 to 1. The Slope (◦ ) 5.74 2.11 36.82 0.37–17.37 images were acquired on September 5 and August 29, 2017, for the TWI 7.31 1.17 16.04 3.41–12.23 vineyard and orchard, respectively. The value given to each vine was the ECa (mS/m) 57.25 27.89 48.71 17.58–187.56 average NDVI value for a rectangle with a length of 1.4 m and width of NDVI 0.67 0.02 3.72 0.56–0.75 CWSI 0.19 0.11 57.89 0–0.68 80 cm covering the central area of the canopy. For the peach trees, the 4 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 Fig. 2. Kendall’s tau rank correlation matrices for the input variables of the vineyard (a) and peach orchard (b). TWI, ECa, and NDVI stand for topographic wetness index, apparent electrical conductivity, and normalized difference vegetation index, respectively. tions have similar data values. Indication of spatial autocorrelation sample, when the data lacks stationarity (Oliver & Webster, 1986). An implies information redundancy, which, in the case of representative empirical semivariogram plots half the squared difference between two sampling while using a small number of observations, should be avoi observations (i.e. semivariance) against their actual distance in space, ded. The stronger the spatial autocorrelation, the more important it is to averaged for predefined distance classes. The semivariogram consists of minimize duplication of information within the sample size that is parameters defining its structure, including the sill (average half selected (Haining, 2015a). A representation of the distribution of an squared difference of two observations), range (the maximum distance unknown variable based on a set of known variables should reduce the at which pairs of observations display spatial dependence), and nugget duplication of information to a minimum. Applying the MFM algorithm, (the variance within the sampling unit). The model semivariogram as therefore, requires definition of a minimum distance between potential sumes that there is no spatial association for distances larger than the sample observations within a specific subset of the population. We first range (Wagner, 2003). The multivariate semivariogram computation, quantified spatial autocorrelation using Moran’s I statistic (α = 0.05) to equivalent to summing univariate variograms, included the set of four confirm that all variables display spatial dependence (Moran, 1948). input variables. It was calculated using the “adespatial” package in R Then, we computed a multivariate empirical semivariogram, showing (Stéphane Dray et al., 2020), with a minimum distance value of 8 m, a how spatial structure varies with distance (Isaacs & Srivastava, 1989) to maximum distance that is equal to half the maximum length of each determine the minimum distance between two observations within a plot, and 80 classes of lag distances. A variogram model was then fitted Fig. 3. Flowchart of the multivariate moment-matching criterion for the specific case studies on the vineyard and peach orchard (Fig. 1). SD stands for stan dard deviation. 5 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 using the “gstat” package in R (Pebesma, 2004), with the Bessel and Matern (with an optimally fitted kappa) variogram models used for the The output of Algorithm 1 is not an estimator, but rather a sample. It vineyard and peach orchard, respectively. For each semivariogram, the consists of a list of locations (e.g., trees, grapevines) that are recom range values were used as the minimum distance between observations mended to be used for sampling. within a sample. It should be noted that other methods of computing the minimum distance required to avoid spatial dependence exist (Griffith, Note that several constraints should apply to the set of inputs. First, it 2004; Koenig, 1999), and the user may select any method they see fit to is better to use variables that are related to the target variable. If this determine this parameter. condition is not satisfied, then Algorithm 1 is only as good as simply selecting a sample at random among the population. Second, the mini mum distancer and the size of the sampleN should not be too large in 2.4. Multivariate functional-matching criterion both cases so that the set of samples satisfying the conditions has enough elements. Finally, the number of samplesK to be compared should not be In this study, we considered the problem of finding the optimal too small to have a good approximation of the best criterion, but it location of observations for the variable CWSI given observations from should not be too large to ensure a reasonable computation time. Should the variables TWI, NDVI, Slope and ECa. This means that the observa the user wish to express different importance levels for specific vari tions space is X = Rd withd = 4 variables and forj = 1, ⋯, 4, X(j) de ables, weights may be considered. This algorithm is available in the R notes the jth component of the random vectorX. Algorithm 1, package “MFunctMatching” (Derumigny & Ohana-Levi, 2020). represented in Fig. 3, proposes to match 14 functionals of the population distribution to the sample distribution and find the sample that has the 2.5. Criterion reduction for defining sample size closest moments. These functionals are: In some cases, the user may determine a predefined sample sizeS [ ] [ ] • the 4 different meansφ1 (P) = EP X(1) ,φ2 (P) = EP X(2) , φ3 (P) = (according to existing resources such as a specific number of sensors, [ ] EP X(3) , φ4 (P) = EP [X(4) ], limited manpower to conduct the sampling, and so on). This will require [ ] determining a set of observations that represent the population distri • the 4 different standard deviations φ5 (P) = sdP X(1) , φ6 (P) = [ ] [ ] bution as accurately as possible, given this limitation. In that case, they sdP X(2) , φ7 (P) = sdP X(3) , φ8 (P) = sdP [X(4) ], [ ] may directly apply Algorithm 1 to the data. In other cases, the choice of • the 6 different Kendall’s tau φ9 (P) = τP X(1) , X(2) , φ10 (P) = the number of observations to include in the sample is less obvious. One [ ] [ ] [ ] τP X(1) , X(3) , φ11 (P) = τP X(1) , X(4) , φ12 (P) = τP X(2) , X(3) , possibility is to determine how many observations are necessary to [ ] [ ] φ13 (P) = τP X(2) , X(4) , φ14 (P) = τP X(3) , X(4) , where τP [X, Y] de represent the population distribution within a given accuracy. We pro notes Kendall’s tau for the distribution P between variables X and Y. pose a method to choose a minimum sample size, for cases where the number of observations is not predefined. The sample size is a function Therefore, the criterion is defined asφ(P) : = ∑14 In practice, of the error that the user is willing to tolerate in representing of the j=1 φj (P). population. The algorithm for defining the sample size to select relies on there is no access to the true distribution P* of the variables, but only to different executions of Algorithm 1 forN in an interval [Nmin ,Nmax ], until a population of sizen, whose empirical distribution will be denoted reaching the maximum sample size, or until reaching convergence. In byPn . One could wonder about the difference between the true criteria order to have a higher precision, for each value ofN, Algorithm 1 was run φ(P) and its estimated population versionφ(Pn ). Asφ1 ,⋯,φ13 are regular R = 1000 times. The user may select the smallest sample size that sat functionals, we can apply the central limit theorem to them (under isfies a certain reduction of the maximum number achieved, i.e. the suitable regularity conditions), to get the asymptotic approx √̅̅̅ smallest sample size N* such that the reduction of the criteria from Nmin imationφ(P) − φ(Pn ) ≈ nG for some Gaussian variableG. If S a random sample of sizeN (among then population points) is considered, the to N* is a given fraction α, such as 90%, of the maximal reduction of the criteria fromNmin toNmax . This procedure is displayed below as Algo quantity Δn,S := (φ(Pn ) − φ(PS ) )2 is a random variable bounded below rithm 2 and is available in the R package “MFunctMatching” (Der by 0. Let us call its (essential) minimumΔn,min conditionally to Pn . By umigny & Ohana-Levi, 2020). classical results, the best sample S* of K random samples should sat isfyΔn,Sk* ≈ Δn,min + O(K− 1 ). A precise computation of the distribution of Algorithm 2. (Choice of the optimal sample size satisfying a reduction of the the best error Δn,min achievable by a sample of sizeN amongn observa criteria) tions is out of the scope of this article and left for future research. Inputs 1–2: As in Algorithm 1. Algorithm 1. (Multivariate functional-matching algorithm for selection of a Input 3: A target percentage α denoting the chosen reduction of the mean criteria. Input 4: A range of relevant sample sizes[Nmin , Nmax ]. representative sample) Input 5: A numberR of replications for the computation. Input 6 (optional): A tolerance tol for the convergence of the mean criteria Input 1: A population of n observations of the d variables and their positions in a 1. For eachN = Nmin to Nmax do coordinate system a. RunR times Algorithm 1 using the sample sizeN. This gives a mean Input 2: Tuning parameters: minimum distancer, size of the sampleN, number of criterionCmean,N . samplesK. b. If|Cmean,N − Cmean,N− 1 | < tol then affect Nmax := N and go to step 2. 1. Select a subset S of the population to be represented { 2. ComputeN* : = min N : Cmean,Nmin − Cmean,N > α(Cmean,Nmin − Cmean,Nmax ) } 2. For eachk = 1 to Kdo: Return the chosen sample sizeN* a. Choose at random one sampleSk ⊂S of sizeN satisfying the minimum distance condition. b. Compute the d means using the sampleSk . The minimum sample sizeNmin should be chosen large enough to c. Compute the d standard deviations using the sampleSk . enable reliable calculation of moments. Note that it is possible to choose d. Compute the Kendall’s tau among all pairs of d variables using the sampleSk . 3. Compute the means, standard deviations, and Kendall’s tau on S. Nmax = n and wait for the convergence of the criteria to0. The chosen 4. Compute and normalize the difference between computed on the subset S and on sample sizeN* increases with the criterion reduction fractionα, as a sample Sk . bigger reduction corresponds to a larger sample size. In a complemen 5. For eachk = 1 to Kdo: tary way, a visualization of the mappingN ↦ Cmean,N may provide in a. Compute the criterion of Crit(Sk ) by summing the normalized differences. formation on the degree of accuracy that is granted by a larger sample 6. Compute k* as the minimizer of the criterionCrit(Sk ) Return the sampleSk* and the associated criterionCrit(Sk* ). size, or that is lost by a smaller sample size. In this study, the criteria reduction percentage was chosen asα = 90%. 6 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 An empirical analysis of the sensitivity of Algorithm 2 to the mini each iteration. We ran each of the three methods (MFM, uniform random mum distance r was performed, to evaluate the effect of spatial auto sampling, both with spatial constraints and cLHS) 3000 times. Each correlation on the sample size for each of the plots. Algorithm 2 was time, the MFM chose the best sample (i.e., corresponding to the lowest tested usingK = 500, and10 ≤ r ≤ 16. In this study, tol was chosen criterion) amongst 2000 random samples drawn uniformly. For each of as min Cmean,N , however tolerance may be defined according to the these best samples, we computed statistics comparing the population N∈[Nmin ,Nmax ] and sample distributions of the target variable CWSI: differences in user’s choice. mean, differences in standard deviation, and KS. We compared the 2.6. Validation against uniform random sampling and cLHS Table 2 A qualitative comparison between the conditioned latent hypercube sampling In this study, we know the measured value of CWSI at each of the (cLHS) and the multifunctional matching (MFM) algorithms. N is population trees/vines, and therefore this information can be used to compare the size, while n stands for sample size. population distribution of CWSI with a sample distribution of CWSI. This cLHS MFM allows us to measure the success rate of the proposed algorithm and the Assumption n≪N n≪N accuracy by which the selected sample represents the target variable regarding the CWSI values for the entire population of vines/peach trees (specified in sample size Tuning parameter Simulated annealing with Straightforward: number of subsection 2.2). Fig. 4 illustrates a density estimate of CWSI for the two for the model temperature T, the rate of tested subsamples plots, using kernel smoothing. The CWSI values in the vineyard were, on temperature decrease d, average, higher than those in the peach orchard, due to the strategy of number of iterations or deficit irrigation and subsequent greater water stress. The variance of stopping criteria CWSI in the vineyard was also higher due to a stronger spatial hetero Difficulty Depends on the convergence Straightforward: in general, geneity of terrain and soil attributes. calibrating the properties of the Markov 10,000 subsamples yield tuning chain acceptable performance An analysis was conducted to assess the quality of the outcome parameter sample and compare the distribution of the population CWSI with the Type of matching Iterative process Brute-force comparison CWSI of the sample generated by Algorithm 1 after defining the optimal Moments to match Fixed: n × k quantiles + Flexible: any combination of number of locations for the sample using Algorithm 2. The comparison k(k +1)/2 correlations, moments; by default, k between the sample and the entire population was performed using the where k is the number of means + k standard numerical variables deviations + k(k − 1)/2 Kolmogorov-Smirnov Goodness-of-Fit (KS) test statistic (Marsaglia et al., Kendall’s tau 2003), SD, mean, CV, range, quantiles of the population represented by Bias/variance Small bias: the quantiles Less effective for large sample the sample and value of the criterion of the chosen sample. compromise for ensure that the whole sizes than the cLHS Finally, the MFM algorithm was compared to two reference methods, large sample size distribution of each variable concerning the accuracy of namely a uniform random sample with identical spatial constraints (e.g. in the subsample is close to other moments than the ones their population distribution used for the fit the same minimum distance as specified in Algorithm 1, Input 2), as well Bias/variance Less effective for small Small variance: the (by as to the cLHS provided in the R package “clhs” (Roudier, 2011). A compromise for sample sizes (Werbylo & default) smaller number of qualitative comparison between MFM and cLHS is also detailed in small sample Niemann, 2014) moments to match makes the Table 2. The cLHS algorithm was applied using the default parameter size estimation more stable for ization values specified by the “clhs” package. These included the small sample sizes Type of method Nonparametric (the number Parametric (fixed number of number of iterations for the simulated annealing process, initial tem of constraints increase with constraints) but could be perature (the control parameter of the annealing process) at which the n) nonparametric if the set of simulated annealing begins (between 0 and 1), and a decreasing factor moments is increased between 0 and 1, signifying the rate at which temperature decreases at Fig. 4. Crop water stress index density plots quantified by drone images acquired for the vineyard (September 5, 2017) and peach orchard (August 29, 2017). 7 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 distribution of these 3 goodness-of-fit measures for MFM against the two 3.2. Chosen sample size and location of the sample observations other methods using Welch’s t-test for statistical difference between two means, and we compared them graphically using quantile–quantile plots The results for the sensitivity analysis are presented in Table 3, (QQ-plots). The Q-Q plots enable visualization of the distribution of illustrating the relationship between spatial autocorrelation and sample differences between the sampled moments and the population moments size. In the vineyard, with a smaller areal coverage then the peach or for the MFM against the two reference methods. This procedure was chard, the spatial constraint prevented a convergence of the mean cri implemented in R (R Core Team, 2020). terion forr < 10. In the peach orchard, which consists of a larger area, r was not found to constrain the spatial distribution of the observations 2.7. Assessment of management zones forr < 16. In both cases, higher r values were associated with smaller sample size. In recent years, variable-rate in-field spatial management of various The inputs to the MFM algorithm are specified in Table 4. The results agricultural practices (irrigation, fertilization, and plant protection) has in Fig. 5 represent the outcome of Algorithm 2 using the respective been recognized by the scientific community as valuable due to the minimum distancer that were defined by the ranges of the semivario potential for increased profitability and benefits for sustainable crop gram for each of the two fields. Convergence in the vineyard occurred at production (Ben-Gal et al., In Press; Zhang et al., 2002). One of the most 69 observations and for the peach orchard, the spatial constraints efficient techniques for precision agriculture is using site-specific MZs. allowed for 89 observations. For both plots, the sample size for 90% MZs are homogeneous sub-areas within a field based on spatially criterion reduction was 22. The final location of the sample observations quantitative measures (Gavioli et al., 2019). An additional aspect of for both plots are illustrated in Fig. 6. testing the accuracy of population representation through the sample derived by the MFM algorithm is determining the number of sample observations that fall within each management zone. In an accurate 3.3. Validation representation of the partitioning of the plots, the population share of observations in each MZ should be equal to the sample share of obser After applying the MFM algorithm (Algorithm 1), resulting in an vations in each zone, where the sample has been chosen by the MFM approximate best sample of observations for each plot, analysis was algorithm. The MZs for each plot were estimated using a weighted conducted to compare the sample to the true population values of CWSI. multivariate clustering method (Ohana-Levi et al., 2020). First, the In Table 5 the sample (22 observations) and the entire population are variables were given weights according to their respective contribution compared in terms of KS test statistic, SD, mean, CV, range, quantiles of to the explained variance by applying principal component analysis the population represented by the sample and value of the criterion of (PCA) (Dunteman, 1989). This step was followed by iteratively applying the chosen sample. For both plots, there was no significant difference fuzzy c-means algorithm for a different number of clusters each time. between the distributions of the entire population of CWSI values and The best clustering result was determined by calculating an average the sample (p = 0.71 and p = 0.1 for the vineyard and orchard, silhouette width (ASW), comparing cluster tightness and separation respectively). For both plots, SD differences between CWSI population (Rousseeuw, 1987). For this, the iteration with the highest ASW was and the selected sample were small (0.04 and 0.01 for vineyard and chosen to define the number of MZs for each plot. The MZ delineation peach orchard, respectively), and the sample set was able to represent process was performed using the R packages “ppclust” (Cebeci, 2019) 99.6 and 93.3% of the range of values of the CWSI population. applying the fuzzy c-means algorithm and “cluster” (Maechler et al., The results of the validation of the MFM algorithm against uniform 2019) for the ASW analysis. random sampling with spatial constraints and cLHS are detailed in Table 6. Generally, the MFM method performed better than the uniform 3. Results Table 4 3.1. Defining minimum distance by estimating spatial autocorrelation Input variables and parameters for the multiple functional-matching criterion algorithm. TWI, NDVI and ECa stand for topographic wetness index, normalized The four input variables, as well as the target variable, showed sig difference vegetation index and apparent electrical conductivity, respectively. nificant spatial autocorrelation according to Moran’s I test at the level Vineyard Peach orchard ofα = 0.0001. The multivariate semivariogram applied to the four Variables Slope, TWI, ECa, Slope, TWI, ECa, variables provided the structural parameters. For this study, we NDVI, NDVI considered the estimated range as the minimum distance between each Population sizen 3523 2151 pair of sample observations. Minimum distance values were 13.89 and Sample sizeN 22 22 15.26 m for the vineyard and peach orchard, respectively. Minimum distancer (m) 13.9 15.3 Number K of samples 20,000 20,000 considered Table 3 Sample size estimation, maximum number of observations and sample size at convergence of the mean criterion,Cmean,N , according to different minimum distance values, withα = 90% criterion reduction, K = 500 samples for each minimum distancer. Minimum distance Vineyard Peach orchard (m) Sample Maximum number of Sample size at convergence Sample Maximum number of Sample size at convergence size observations ofCmean,N size observations ofCmean,N 10 24 127 124 24 200 194 11 23 105 – 24 171 170 12 23 94 – 23 148 147 13 22 76 – 23 112 111 14 22 68 – 22 101 98 15 21 61 – 22 92 91 16 21 52 – 22 82 – 8 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 Fig. 5. Illustration of simulated percent criterion reduction against sample size for the vineyard (a) and peach orchard (b). Fig. 6. Spatial representation of the optimized sample location according to the multivariate moment-matching criterion in the vineyard (a) and the peach or chard (b). random sampling and cLHS for both fields. This means that the moments vineyard, the differences between the MFM and the two reference of the samples selected by the MFM algorithm were closer to the pop methods were significant for mean, SD and KS statistic, while in the ulation moments compared to the moments of the uniform random peach orchard there were significant differences only for the average samples or those generated by cLHS. More precisely, the difference be differences in mean values, and between MFM and cLHS for the KS tween each MFM sample moment and the corresponding population statistic. moments were on average smaller than the differences with the uniform Differences between the methods were also visualized using QQ- random sample moment and those of cLHS. The only exception was the plots (Figs. S3, S4). In general, at lower values, there were almost no SD for the peach orchard which had a slightly larger difference with the differences between MFM and both reference sampling methods for all MFM sample than both reference methods, and the SD difference range moments (mean, SD, and KS), with an exception for MFM against cLHS was smaller for cLHS. The SD differences for the peach orchard data of in the vineyard. However, larger differences between the sampling MFM were not significantly different from the other two reference techniques were observed in the vineyard dataset (Figs. S3, S4 a, c, e). methods (p-value = 0.61 & 0.704 for the uniform random sample and The QQ-plot (dotted line) spanned further away from the diagonal cLHS, respectively). The highest dissimilarities between the MFM sam (continuous line) towards the Y-axis, representing the uniform random ple results and both the uniform random sampling and cLHS, for both sampling method/cLHS, illustrating larger differences between samples plots, were for differences between population and sampled range and population moments. These results support the t-test analysis pro values. On average, the MFM sampled observations showed lower vided in Table 6, showing that the MFM method produced sample ob ranges of differences for all moments compared to the ranges of differ servations that were significantly closer to the population CWSI in terms ences resulting from the uniform random sampling or cLHS. In the of mean, SD, and distribution (KS), compared to sample observations 9 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 Table 5 selected observations) in cluster 2. For the peach orchard, there were six Comparison between population and chosen sample statistics of crop water observations (27% of selected observations) located within cluster 1 and stress index for the vineyard and peach orchard plots, where the sample was 16 (73%) in cluster 2. chosen among K = 20, 000 uniformly drawn samples. Population Sample 4. Discussion Vineyard Kolmogorov-Smirnov goodness-of-fit p = 0.711 The main objective of this study was to construct an algorithm for Standard deviation 0.14 0.18 defining locations to sample a target feature across an agricultural field, Mean 0.29 0.30 using known explanatory variables observed over the whole population. Coefficient of variation 0.5 0.6 The proposed algorithm (Algorithm 1) requires several inputs, the first Range 0–0.89 0–0.79 Quantiles of population represented by sample (%) 0–99.6 of which (Input 1) is a set of multivariate spatially defined observations. Criterion for the chosen sample set 2.5 The variables that serve as inputs to the model should be linked to the Peach orchard target variable. Such an association may be established using expert Kolmogorov-Smirnov goodness-of-fit p = 0.098 knowledge, based on past experiments, and literature review. An input Standard deviation 0.11 0.1 set of variables independent from the target variable to be sampled will Mean 0.19 0.15 Coefficient of variation 0.58 0.65 not yield an accurate sample set of observations. In this study, some of Range 0–0.68 0.006–0.4 the variables selected (e.g. TWI, NDVI, ECa and slope) are known to be Quantiles of population represented by sample (%) 2.2–95.5 associated with soil moisture levels (Costa et al., 2014; Mohanty & Criterion for the chosen sample set 3.02 Skaggs, 2001; Nanda et al., 2019), which in turn determine water availability to the plants (Prueger et al., 2019). NDVI is an indicator of derived from the uniform random sampling or cLHS. For the peach or plant vigor, that has been found to be associated with drought stress in chard, the differences between the methods were significant only for crops (Lee & Park, 2019). The choice of the input variables was also mean differences (uniform random sampling) and for mean differences based on the degree of availability and efficiency of measurements in the and KS (cLHS). This result is also supported by Figs. S3b & S4b, f, field. TWI and slope are both products that were generated from a DEM, showing a larger distance between the QQ-plot (dotted line) and the which is commonly available, NDVI is easily derived from UAV, diagonal than for the other moments of the peach orchard (Figs. S3 d, f & airborne, or satellite platforms and is widely used in agricultural S4 d). research and practices (Kamilaris et al., 2017). Measurements of ECa, representing soil properties that affect crop productivity, including soil texture, salinity, organic matter, drainage conditions, and subsoil fea 3.4. Management zone representation tures, are reliable, quick and easy to acquire (Corwin & Lesch, 2013). It should be noted that the algorithm may also be modified and applied In Fig. 6, the location of the chosen set of observations for both plots using other types of data, such as categorical variables, circular vari is shown. After applying a weighted, multivariate fuzzy C-means cluster ables, etc. analysis to the plots’ datasets, they were each separated into two clusters Acknowledging the presence of spatial autocorrelation in (Fig. 7). The population share of vines/trees in cluster 1 and cluster 2 geographically distributed data is imperative for sample size determi was, correspondingly, 0.41 and 0.59 for the vineyard and 0.34 and 0.66 nation (Griffith, 2013), thus the final set of input variables was used to for the peach orchard. The selected observations showed a similar dis define the minimum distance between each pair of sample observations tribution, where, in the vineyard, eight observations (36% of selected (Input 2). The main purpose of this step is to avoid having a number of observations) were included in cluster 1 and 14 observations (64% of observations that provide dependent information and could be biased Table 6 Validation results for the target variable “crop water stress index”, based on 3000 replications. A comparison between averaged differences between the functionals of the samples and the population was based on multiple runs of the multivariate functional-matching algorithm, the conditioned Latin hypercube sampling (cLHS) and uniform random sampling with a spatial constraint for the vineyard and peach orchard, using a sample size of 22 (corresponding to 90% criterion reduction) for both plots. The percentages in parentheses denote the relative difference between MFM and the uniform random sample and cLHS. Average differences between normalized sample and population moments of the target variable MFM (reference) cLHS Uniform random sample Vineyard Average difference of means 0.017 0.023 (− 25.3%) 0.021 (− 18.87%) Range of mean difference 0.087 0.11 (− 18.23%) 0.11 (− 17.46%) Average difference of SD 0.017 0.019 (− 13.18%) 0.018 (− 7.67%) Range of SD difference 0.067 0.09 (− 25.3%) 0.103 (− 34.25%) Average D statistic 0.160 0.175 (− 8.38%) 0.167 (− 3.8%) Range of D statistic 0.296 0.342 (− 13.28%) 0.321 (− 7.87%) Welch’s t-test Differences between means t = − 12.08, p < 0.001 t = − 9.55, p < 0.001 Differences between SD t = − 6.05, p < 0.001 t = − 3.71, p < 0.001 Differences between KS statistics t = − 9.66, p < 0.001 t = − 4.77, p < 0.001 Peach orchard Average difference of means 0.016 0.17 (− 4.99%) 0.017 (− 4.44%) Range of mean difference 0.077 0.087 (− 11.62%) 0.094 (− 18.32%) Average difference of SD 0.017 0.016 (+0.87%) 0.016 (+1.06%) Range of SD difference 0.066 0.064 (+3.95%) 0.077 (− 13.92%) Average D statistic 0.172 0.176 (− 2.14%) 0.173 (− 0.76%) Range of D statistic 0.316 0.373 (− 15.12%) 0.345 (− 8.37%) Welch’s t-test Differences between means t = − 2.106, p = 0.035 t = − 2.08, p = 0.038 Differences between SD t = 0.38, p = 0.704 t = − 0.51, p = 0.61 Differences between KS statistics t = − 2.268, p = 0.023 t = − 0.88, p = 0.377 10 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 Fig. 7. A spatial representation of the clusters in the vineyard (a) and the peach orchard (b), along with the sampled points selected by the multifunctional matching algorithm in the vineyard (red) and peach orchard (blue). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) compared to the entire population. In general, the more heterogeneous order effects would be expected to have increasing influence when the the population, the larger the sample size needed to obtain a specific accuracy level is sufficiently high. When convergence does not occur (for level of precision (Israel, 1992). In addition, a stronger spatial auto example because of a small population size or equivalently, because of a correlation (and weaker spatial variability), with a larger model- high autocorrelation), the size of the plot may also have an effect on semivariogram range value, requires a smaller sample size (Input 2) to sample size. In other words, in the cases where the spatial constraint represent the multivariate distribution, since more observations are limits the sample size, larger plots will require more observations to similar to others (Haining, 2015b). In the case study presented in this maintain a certain level of accuracy. Similarly, another factor that af work, the characteristics of the vineyard displayed both higher vari fects the sample size is spatial autocorrelation, since a larger spatial ability (Table 1) and a higher spatial variability (Table 4) than the at variability will require a larger sample size. tributes of the peach orchard. The number of observations suggested by Validation against measured CWSI was conducted to provide an Algorithm 2 was slightly sensitive to the minimum distancer, which assessment of the performance of the suggested algorithm. In Table 5, corresponds to the spatial variability. More precisely, the estimated the sample moments are shown in comparison to the population mo number of samples for a givenα may shift with varying values of r. In ments, to assess the effectiveness of the algorithm in finding a repre some cases, a small r will not enable enough observations to be included sentative sample. The results show small differences between sample for the mean criterion to converge, thus decreasing the approximated and population mean and SD of CWSI. The range of values was not ex sample size K when r increases. While applying Algorithm 2, other pa pected to overlap between the sample values and the population, since rameters can also have an influence on the outcome, as well as the outliers should not be incorporated in the selected sample. Indeed, a structure of the data itself. The outcome of Algorithm 1 depends on the useful sample should not account for extreme values. Therefore, we chosen number of replications (Input 2). A higher number of replications chose to use only means, SDs and Kendall’s tau as functionals of interest will result in a higher probability of reducing the criterion and finding a (Algorithm 1), instead of the minimum and the maximum which would more accurate fit to the population moments, thereby introducing a tend to select outliers. Table 5 shows that the quantiles of the population better sample to represent the distribution of interest. represented by the range of the sample were quite high, 99.6% for the In this present study, the vineyard was smaller in area than the or vineyard and 93.3% for the peach orchard, indicating that the selected chard (with a ratio of 57%). However, the chosen number of observa set of observations for both plots was able to represent most of the tions for both plots, after applying a criterion reduction of 90%, was 22. population distribution, excluding extreme values. In cases where the With a larger area and lower tree-density, the peach orchard was found user wishes to represent a narrower part of the distribution (only the to have a higher range of spatial autocorrelation (Table 4) and lower inter-quantile range, for example), the algorithm should be applied on a variability (Table 1) than the vineyard. Although the vineyard consisted subset of the multivariate population data that lies within the specified of a more observations than the orchard (3523 vines vs 2151 trees), frequency levels. In this study, we chose to exclude border vines/trees, when increasing the percentage of criterion reductionα above 90% (and so the application of the algorithm was based on a subset of the entire thereby increasing the accuracy of the representation of the population), population. The MFM algorithm is randomized, therefore different ap the number of observations required to represent the population was plications may yield different outputs even with the same input. This is larger for the orchard than for the vineyard (for example, 95% criterion not problematic, as it is entirely possible that different sets of samples reduction resulted in 34 vs 31 observations for the orchard and the could each satisfactorily represent the population. vineyard, respectively). These differences in the estimated required Validating the performance of the algorithm also included a com number of observations for higher accuracy levels can be explained by parison to a uniform random sampling with the semivariogram-derived the difference between the two distributions. Indeed, the underlying minimum distance and to cLHS for each plot. Table 6 shows that overall, data-generating processes that correspond to each field are not identical; the MFM algorithm performed better than the uniform random sampling and even if we use some normalization (step 3 of Algorithm 1), higher- and cLHS techniques in the vineyard, where the dataset had greater 11 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 variability and spatial heterogeneity. In the peach orchard, improve functionals of each set of observations are compared to the population ments shown by the MFM model were more subdued. The t-test analyses functionals and result in a criterion that is determined by their differ showed that the sample set generated by the MFM algorithm accounted ences. This work shows how to apply mean, SD and Kendall’s Tau as the only for significant improvement in representing the population mean of functionals of matching; however, the method can be adjusted and CWSI against both reference methods, and improvement in representing applied using other moments/functionals that could help to represent the population distribution compared to cLHS. However, in both plots, the distribution. The algorithm can also be applied using only one input the reduction in the ranges of difference between sample set functionals variable, as long as it is associated to the target variable. Validation of and population functionals was high and significant, with an exception the model results against a specified target variable showed a higher of SD for the peach orchard, where cLHS and MFM algorithms were not accuracy, better precision, and an overall stronger performance of the found to be significantly different. This means that, on average, the MFM MFM model compared to a uniform random sampling with spatial performed better than uniform random sampling for the peach orchard constraint and to cLHS. Model performance was better for the vineyard in representing the (unknown) average value of the target but similarly that had higher variability in the data than for the peach orchard. in terms of representing the SD. In other words, on 3000 replications of We applied the MFM based on a static dataset; however, it may be the process, the MFM model showed higher levels of accuracy compared used dynamically throughout the season. Some variables in agricultural to the uniform random sampling or cLHS (smaller differences on applications are known to change during the growing season (e.g. average, Table 6). cLHS has been widely used in digital soil mapping, NDVI), and a sampling design may change along with the known vari usually applied to cases where a high number of observations should be ables for each sampling campaign. sampled within large areas of research (Brungard & Boettinger, 2010). The number of sample sites defined using Algorithm 2 was based on Defining sample designs within agricultural fields may benefit from the an arithmetic reduction of the criterion. Future work may include a MFM, which has fewer tuning parameters and a straightforward cali reduction of the criterion based on minimizing cost function from an bration process. From this study, it also seems to be more suitable for economical point of view, or the penalty of losing information that may smaller sample sizes for plots with spatial autocorrelation (Table 2). lead to inferior estimations in terms of expenses to the farmer, as Further study should be conducted to test this assumption. Additionally, opposed to the cost of sampling. This type of modeling approach will the MFM model was not designed to specifically account for SSH (Wang have to rely on quantifying the cost of error or bias to the farmer. et al., 2016), and it is currently assumed that the accuracy and efficiency Additional data collection and studies in this direction are needed in of the algorithm is unaffected by either the presence or the absence of order to establish such a modeling framework. SSH in the data. This assumption should be tested in the future followed Finally, the application of the proposed algorithm may be adopted by by possible specific modifications to the algorithm. other fields of research where small sample sizes are required and is not Finally, the MFM algorithm generated sampling sets that corre limited to agricultural practices. Sampling designs of spatially and non- sponded to the spatial patterns of the plots (Fig. 7). Using multiple spatially varying datasets for representation of their distribution may variables, two clusters were estimated for each plot, with a different yield high accuracy levels of information by utilizing the modeling number of vines/trees in each cluster. The population shares of obser framework and assisting in decision-making processes. vations in each cluster were quite similar to the selected sample shares of observations in each cluster. This means that the suggested algorithm CRediT authorship contribution statement could account for the spatial variability and succeed in representing the population’s repartition into the clusters. However, the MFM algorithm N. Ohana-Levi: Conceptualization, Formal analysis, Investigation, is a non-spatial platform for representing the distribution of a certain Methodology, Software, Validation, Visualization, Writing – original population, with a recommended choice of incorporating a minimum draft. A. Derumigny: Conceptualization, Formal analysis, Investigation, distance condition to the calculation. MZs may be used as a validation Methodology, Software, Visualization, Writing – original draft. A. Pee tool to assess the certainty of the resulting sample set, and indeed, as the ters: Conceptualization, Funding acquisition, Investigation, Methodol sample is intended to be representative of the population, it is coherent ogy, Supervision, Writing – review & editing. A. Ben-Gal: that the repartition of the observations of this chosen sample into both Conceptualization, Funding acquisition, Investigation, Methodology, clusters should be close to the repartition of the whole population. Project administration, Resources, Supervision, Writing – review & The suggested framework was designed to select the best locations to editing. I. Bahat: Data curation, Investigation, Methodology, Formal sample in order to represent an unknown population distribution of a analysis. L. Katz: Data curation, Investigation, Methodology, Formal certain field attribute. Our study case demonstrated a scenario where a analysis. Y. Netzer: Data curation, Investigation, Methodology, Re relatively low number of samples represented a large number of popu sources, Project administration. A. Naor: Data curation, Resources. Y. lation observations (0.6% and 1% of the vines/trees, respectively). Cohen: Conceptualization, Data curation, Formal analysis, Funding These approximately best locations are determined using information acquisition, Investigation, Methodology, Project administration, Re from other proxy variables. Using such a method may allow for a rela sources, Supervision, Writing – review & editing. tively small number of samples that would sufficiently represent the entire population of objects in the field, such as in the cases of sampling Declaration of Competing Interest or sensing for soil or plant water or nutrient status, plant size, yield or plant physiological parameters, provided that the input variables are The authors declare that they have no known competing financial related closely enough to the target variable for sampling. The MFM may interests or personal relationships that could have appeared to influence be applied to non-spatial data as well, when dealing with problems of the work reported in this paper. representing population distributions, such as sampling honeybee col onies for infections (Fries et al., 1984) or analysis of genotypic variation Acknowledgements for specific crops (He et al., 2017; O’Toole & Moya, 1978). Similarly, it may be used with time-series data to analyze specific intervals/timesteps This research is a part of the “Eugene Kendel” Project for Develop that may be sampled to represent the entire dataset. ment of Precision Drip Irrigation funded via the Ministry of Agriculture and Rural Development in Israel (Grant No. 20-12-0030). The project 5. Summary and conclusions has also received funding from the European Union’s Horizon 2020 research and innovation programme under Project SHui, grant agree The new MFM algorithm was designed to select the best set of sample ment No 773903. The authors wish to thank NaanDan Jain for the locations based on functional matching of a random vector. Specified elevation survey in Mevo Beitar vineyard, to Carmel Winery for the 12 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 vineyard contribution, and the grower, Avi Yehuda. The authors would Gasic, K., Reighard, G.L., Windham, J., Ognjanov, M., 2015. Relationship between fruit maturity at harvest and fruit quality in peach. Acta Hortic. 1084, 643–648. https://d also like to thank the grower, Shlomo Cohen, for collaborating and oi.org/10.17660/ActaHortic.2015.1084.86. allowing the research to be conducted in his peach orchard and to the Gavioli, A., de Souza, E.G., Bazzi, C.L., Schenatto, K., Betzek, N.M., 2019. Identification Northern R&D technical support team. of management zones in precision agriculture: an evaluation of alternative cluster analysis methods. Biosyst. Eng. 181, 86–102. https://doi.org/10.1016/j. biosystemseng.2019.02.019. Appendix. Supplementary materials A and B Gibbs, A.L., Su, F.E., 2002. On choosing and bounding probability metrics. Int. Stat. Rev. 70 (3), 419–435. https://doi.org/10.1111/j.1751-5823.2002.tb00178.x. Supplementary data to this article can be found online at https://doi. Gibson, W.C., 2014. The Method of Moments in Electromagnetics, Second Ed. Chapman and Hall/CRC. org/10.1016/j.compag.2021.106262. Goovaerts, P., Kerry, R., 2010. Using ancillary data to improve prediction of soil and crop attributes in precision agriculture. In: Oliver, M.A. (Ed.), Geostatistical Applications References for Precision Agriculture. Springer, Netherlands, pp. 167–194. https://doi.org/ 10.1007/978-90-481-9133-8_7. Griffith, D.A., 2004. Spatial autocorrelation. In: Encyclopedia of Social Measurement. Adamchuk, V.I., Viscarra Rossel, R.A., Marx, D.B., Samal, A.K., 2011. Using targeted Elsevier Inc., pp. 581–590. https://doi.org/10.1016/B0-12-369398-5/00334-0 sampling to process multivariate soil sensing data. Geoderma 163 (1–2), 63–73. Griffith, D.A., 2013. Establishing qualitative geographic sample size in the presence of https://doi.org/10.1016/j.geoderma.2011.04.004. spatial autocorrelation. Ann. Assoc. Am. Geogr. 103 (5), 1107–1122. https://doi. Aqeel-Ur-Rehman, Abbasi, A.Z., Islam, N., Shaikh, Z.A., 2014. A review of wireless org/10.1080/00045608.2013.776884. sensors and networks’ applications in agriculture. Comput. Stand. Interfaces 36 (2), Haining, R., 2015a. Spatial Autocorrelation. In: International Encyclopedia of the Social 263–270. https://doi.org/10.1016/j.csi.2011.03.004. & Behavioral Sciences, Second Edition. Elsevier Inc., pp. 105–110. https://doi.org/ Bahat, I., Netzer, Y., Grünzweig, J.M., Alchanatis, V., Peeters, A., Goldshtein, E., Ohana- 10.1016/B978-0-08-097086-8.72056-3 Levi, N., Ben-Gal, A., Cohen, Y., 2021. In-season interactions between vine vigor, Haining, R., 2015b. Spatial sampling. In: International Encyclopedia of the Social & water status and wine quality in terrain-based management-zones in a ‘cabernet Behavioral Sciences, Second Edition. Elsevier Inc., pp. 185–190. https://doi.org/ sauvignon’ vineyard. Remote Sens. 13 (9), 1636. https://doi.org/10.3390/ 10.1016/B978-0-08-097086-8.72065-4 rs13091636. Hall, A.R., 2005. Generalized Method of Moments. Oxford University Press. Bazzi, C.L., Schenatto, K., Upadhyaya, S., Rojo, F., Kizer, E., Ko-Madden, C., 2019. Hansen, L.P., 1982. Large sample properties of generalized method of moments Optimal placement of proximal sensors for precision irrigation in tree crops. Precis. estimators. Econometrica 50 (4), 1029. https://doi.org/10.2307/1912775. Agric. 20 (4), 663–674. https://doi.org/10.1007/s11119-018-9604-3. He, J., Jin, Y., Du, Y.-L., Wang, T., Turner, N.C., Yang, R.-P., Siddique, K.H.M., Li, F.-M., Becker, P., Rabenold, P.E., Idol, J.R., Smith, A.P., 1988. Water potential gradients for 2017. Genotypic variation in yield, yield components, root morphology and gaps and slopes in a Panamanian tropical moist forest’s dry season. In: Journal of architecture, in soybean in relation to water and phosphorus supply. Front. Plant Sci. Tropical Ecology, Vol. 4. Cambridge University Press, pp. 173–184. https://doi. 8, 1499. https://doi.org/10.3389/fpls.2017.01499. org/10.2307/2559656. Heuvelink, G.B.M., van Egmond, F.M., 2010. Space-time geostatistics for precision Bellvert, J., Zarco-Tejada, P.J., Marsal, J., Girona, J., González-Dugo, V., Fereres, E., agriculture: a case study of NDVI mapping for a Dutch Potato Field. In: Geostatistical 2016. Vineyard irrigation scheduling based on airborne thermal imagery and water Applications for Precision Agriculture. Springer, Netherlands, pp. 117–137. https:// potential thresholds. Aust. J. Grape Wine Res. 22 (2), 307–315. https://doi.org/ doi.org/10.1007/978-90-481-9133-8_5. 10.1111/ajgw.12173. Isaacs, E.H., Srivastava, M., 1989. An Introduction to Applied Geostatistics. Oxford Ben-Gal, A., Cohen, Y., Peeters, A., Naor, A., Netzer, Y., Ohana-Levi, N., Bahat, I., Katz, University Press. L., Shaked, B., Linker, R., Yulzary, S., & Alchanatis, V., n.d. Precision drip irrigation Israel, G.D., 1992. Determining Sample Size. for horticulture. Acta Horticulturae, In Press. Israeli, A., Litaor, M., Emmerich, M., Shir, O.M., 2019. Statistical learning in soil Brungard, C.W., Boettinger, J.L., 2010. Conditioned Latin hypercube sampling: optimal sampling design aided by Pareto optimization. In: GECCO 2019 - Proceedings of the sample size for digital soil mapping of arid rangelands in Utah, USA. In: Digital Soil 2019 Genetic and Evolutionary Computation Conference, pp. 1198–1205. https:// Mapping. Springer, Netherlands, pp. 67–75. https://doi.org/10.1007/978-90-481- doi.org/10.1145/3321707.3321809. 8863-5_6. Johnson, C.K., Mortensen, D.A., Wienhold, B.J., Shanahan, J.F., Doran, J.W., 2003. Site- Cebeci, Z., 2019. Comparison of internal validity indices for fuzzy clustering. J. Agric. specific management zones based on soil electrical conductivity in a semiarid Inf. 10 (2). https://doi.org/10.17700/jai.2019.10.2.537. cropping system. Agron. J. 95 (2), 303–315. https://doi.org/10.2134/ Chang, J., Clay, D.E., Carlson, C.G., Clay, S.A., Malo, D.D., Berg, R., Kleinjan, J., agronj2003.3030. Wiebold, W., 2003. Different techniques to identify management zones impact Johnson, L.F., 2003. Temporal stability of an NDVI-LAI relationship in a Napa Valley nitrogen and phosphorus sampling variability. Agron. J. 95 (6), 1550–1559. https:// vineyard. Aust. J. Grape Wine Res. 9 (2), 96–101. https://doi.org/10.1111/j.1755- doi.org/10.2134/agronj2003.1550. 0238.2003.tb00258.x. Clifford, D., Payne, J.E., Pringle, M.J., Searle, R., Butler, N., 2014. Pragmatic soil survey Jonckheere, I., Fleck, S., Nackaerts, K., Muys, B., Coppin, P., Weiss, M., Baret, F., 2004. design using flexible Latin hypercube sampling. Comput. Geosci. 67, 62–68. https:// Review of methods for in situ leaf area index determination: Part I. Theories, sensors doi.org/10.1016/j.cageo.2014.03.005. and hemispherical photography. Agric. For. Meteorol. 121 (1–2), 19–35. https://doi. Cohen, Y., Alchanatis, V., Saranga, Y., Rosenberg, O., Sela, E., Bosak, A., 2017. Mapping org/10.1016/J.AGRFORMET.2003.08.027. water status based on aerial thermal imagery: comparison of methodologies for Kamilaris, A., Kartakoullis, A., Prenafeta-Boldú, F.X., 2017. A review on the practice of upscaling from a single leaf to commercial fields. Precis. Agric. 18 (5), 801–822. big data analysis in agriculture. In: Computers and Electronics in Agriculture, Vol. https://doi.org/10.1007/s11119-016-9484-3. 143. Elsevier B.V, pp. 23–37. https://doi.org/10.1016/j.compag.2017.09.037. Colaizzi, P.D., Barnes, E.M., Clarke, T.R., Choi, C.Y., Waller, P.M., 2003. Estimating soil Kasimatis, A.N., Vilas, E.P., 1985. Sampling for degrees brix in vineyard plots. Am. J. moisture under low frequency surface irrigation using crop water stress index. Enol. Viticult. 36 (3). J. Irrig. Drain. Eng. 129 (1), 27–35. https://doi.org/10.1061/(ASCE)0733-9437 Katz, L., Ben-Gal, A., Litaor, M., Naor, A., Peres, M., Bahat, I., Netzer, Y., Peeters, A., (2003)129:1(27). Alchanatis, V., Cohen, Y., n.d. A methodology to evaluate the impact of variable rate Corwin, D.L., Lesch, S.M., 2013. Protocols and guidelines for field-scale measurement of application: cases from a drip-irrigated peach orchard and a wine grape vineyard. soil salinity distribution with ECa-directed soil sampling. J. Environ. Eng. Geophys. Precis. Agric. 18 (1), 1–25. https://doi.org/10.2113/jeeg18.1.1. Kerry, R., Oliver, M.A., Frogbrook, Z.L., 2010. Sampling in precision agriculture. In: Costa, M.M., de Queiroz, D.M., de Carvalho Pinto, F. de A., dos Reis, E.F., Santos, N.T., Geostatistical Applications for Precision Agriculture. Springer, Netherlands, 2014. Moisture content effect in the relationship between apparent electrical pp. 35–63. https://doi.org/10.1007/978-90-481-9133-8_2. conductivity and soil attributes. Acta Scientiarum - Agronomy, 36(4), 395–401. https Khanal, S., Fulton, J., Shearer, S., 2017. An overview of current and potential ://doi.org/10.4025/actasciagron.v36i4.18342. applications of thermal remote sensing in precision agriculture. Comput. Electron. De Lannoy, G.J.M., Verhoest, N.E.C., Houser, P.R., Gish, T.J., Van Meirvenne, M., 2006. Agric. 139, 22–32. https://doi.org/10.1016/J.COMPAG.2017.05.001. Spatial and temporal characteristics of soil moisture in an intensively monitored Kim, J.Y., Glenn, D.M., 2015. Measurement of photosynthetic response to plant water agricultural field (OPE3). J. Hydrol. 331 (3–4), 719–730. https://doi.org/10.1016/j. stress using a multi-modal sensing system. Trans. ASABE 58 (2), 233–240. https://do jhydrol.2006.06.016. i.org/10.13031/trans.58.10873. Derumigny, A., Ohana-Levi, N., 2020. R package: MFunctMatching. Kitanidis, P.K., 1988. Prediction by the method of moments of transport in a Dunteman, G.H., 1989. Principal Component Analysis. SAGE Publications Ltd. heterogeneous formation. J. Hydrol. 102 (1–4), 453–473. https://doi.org/10.1016/ ESRI Inc, 2017. ArcGIS Pro (Version 2.1.2). Environmental Systems Research Institute, 0022-1694(88)90111-4. Redlands, CA. Koenig, W.D., 1999. Spatial autocorrelation of ecological phenomena. In: Trends in Fries, I., Ekbohm, G., Villumstad, E., 1984. Nosema apis, sampling techniques and honey Ecology and Evolution, Vol. 14. Elsevier Ltd., pp. 22–26. https://doi.org/10.1016/ yield. J. Apic. Res. 23 (2), 102–105. https://doi.org/10.1080/ S0169-5347(98)01533-X. Issue 1. 00218839.1984.11100617. Lee, D.-H., Park, J.-H., 2019. Comparison between NDVI and CWSI for waxy corn growth Gandah, M., Stein, A., Brouwer, J., Bouma, J., 2000. Dynamics of spatial variability of monitoring in field soil conditions. In: Neale, C.M., Maltese, A. (Eds.), Remote millet growth and yields at three sites in Niger, West Africa and implications for Sensing for Agriculture, Ecosystems, and Hydrology XXI, Vol. 11149. SPIE, p. 58. precision agriculture research. Agric. Syst. 63 (2), 123–140. https://doi.org/ https://doi.org/10.1117/12.2533558. 10.1016/S0308-521X(99)00076-1. López-Vicente, M., Calvo-Seas, E., Álvarez, S., Cerdà, A., 2020. Effectiveness of cover crops to reduce loss of soil organic matter in a rainfed vineyard. Land 9 (7), 230. https://doi.org/10.3390/land9070230. 13 N. Ohana-Levi et al. Computers and Electronics in Agriculture 187 (2021) 106262 Maechler, M., Rousseeuw, P.J., Struyf, A., Hubert, M., Hornik, K., 2019. cluster: Cluster Park, Y.L., Krell, R.K., Carroll, M., 2007. Theory, technology, and practice of site-specific Analysis Basics and Extensions. R package version 2.1.0. (R package version 2.1.0; p. insect pest management. J. Asia-Pac. Entomol. 10 (2), 89–101. https://doi.org/ Available online; https://svn.r-project.org/R-pack). 10.1016/S1226-8615(08)60337-4. Marsaglia, G., Tsang, W.W., Wang, J., 2003. Evaluating Kolmogorov’s distribution. Pearson, K., 1894. Contributions to the mathematical theory of evolution. Philos. Trans. J. Stat. Softw. 8 (1), 1–4. https://doi.org/10.18637/jss.v008.i18. Roy. Soc. London (A.) 185, 71–110. https://doi.org/10.1098/rsta.1894.0003. Menzel, C.M., Simpson, D.R., 1986. Plant water relations in lychee: Effects of solar Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Comput. Geosci. radiation interception on leaf conductance and leaf water potential. Agric. For. 30, 683–691. https://doi.org/10.1016/j.cageo.2004.03.012. Meteorol. 37 (4), 259–266. https://doi.org/10.1016/0168-1923(86)90064-X. Petroselli, A., Vessella, F., Cavagnuolo, L., Piovesan, G., Schirone, B., 2013. Ecological Meron, M., Tsipris, J., Orlov, V., Alchanatis, V., Cohen, Y., 2010. Crop water stress behavior of Quercus suber and Quercus ilex inferred by topographic wetness index mapping for site-specific irrigation by thermal imagery and artificial reference (TWI). Trees – Struct. Funct. 27 (5), 1201–1215. https://doi.org/10.1007/s00468- surfaces. Precis. Agric. 11 (2), 148–162. https://doi.org/10.1007/s11119-009-9153- 013-0869-x. x. Prueger, J.H., Parry, C.K., Kustas, W.P., Alfieri, J.G., Alsina, M.M., Nieto, H., Wilson, T. Metcalfe, P., Beven, K., Freer, J., 2018. dynatopmodel: Implementation of the Dynamic G., Hipps, L.E., Anderson, M.C., Hatfield, J.L., Gao, F., McKee, L.G., McElrone, A., TOPMODEL Hydrological Model. R package version 1.2.1 (version 1.2.1 from CRAN; Agam, N., Los, S.A., 2019. Crop Water Stress Index of an irrigated vineyard in the p. Available online: https://CRAN.R-project.org/packa). https://rdrr.io/cran/dyna Central Valley of California. Irrig. Sci. 37 (3), 297–313. https://doi.org/10.1007/ topmodel/. s00271-018-0598-4. Minasny, B., McBratney, A.B., 2006. A conditioned Latin hypercube method for sampling R Core Team, 2020. R: A language and environment for statistical computing (p. in the presence of ancillary information. Comput. Geosci. 32 (9), 1378–1388. Available online: https://www.R-project.org/). R Foundation for Statistical https://doi.org/10.1016/j.cageo.2005.12.009. Computing, Vienna, Austria. URL http://www.R-project.org/. Minasny, B., McBratney, A.B., Walvoort, D.J.J., 2007. The variance quadtree algorithm: Raskutti, B., Leckie, C., 1999. An evaluation of criteria for measuring the quality of use for spatial sampling design. Comput. Geosci. 33 (3), 383–392. https://doi.org/ clusters. In: Proceedings of the 16th International Joint Conference on Artificial 10.1016/j.cageo.2006.08.009. Intelligence - Volume 2 (IJCAI’99), pp. 905–910. Mohanty, B.P., Skaggs, T.H., 2001. Spatio-temporal evolution and time-stable Richter, R., Reu, B., Wirth, C., Doktor, D., Vohland, M., 2016. The use of airborne characteristics of soil moisture within remote sensing footprints with varying soil, hyperspectral data for tree species classification in a species-rich Central European slope, and vegetation. Adv. Water Resour. 24 (9–10), 1051–1067. https://doi.org/ forest area. Int. J. Appl. Earth Obs. Geoinf. 52, 464–474. https://doi.org/10.1016/j. 10.1016/S0309-1708(01)00034-3. jag.2016.07.018. Monaghan, J.M., Daccache, A., Vickers, L.H., Hess, T.M., Weatherhead, E.K., Grove, I.G., Roudier, P., 2011. clhs: a R package for conditioned Latin hypercube sampling. https://g Knox, J.W., 2013. More ‘crop per drop’: constraints and opportunities for precision ithub.com/pierreroudier/clhs/. irrigation in European agriculture. J. Sci. Food Agric. 93 (5), 977–980. https://doi. Roudier, P., Tisseyre, B., Poilvé, H., Roger, J.M., 2008. Management zone delineation org/10.1002/jsfa.6051. using a modified watershed algorithm. Precis. Agric. 9 (5), 233–250. https://doi. Moran, P.A., 1948. The interpretation of statistical maps. J. Roy. Stat. Soc.: Ser. B org/10.1007/s11119-008-9067-z. (Methodol.) 10 (2), 243–251. https://doi.org/10.1111/j.2517-6161.1948.tb00012. Rousseeuw, P.J., 1987. Silhouettes: A graphical aid to the interpretation and validation x. of cluster analysis. J. Comput. Appl. Math. 20 (C), 53–65. https://doi.org/10.1016/ Mühlenstädt, T., Kuhnt, S., 2011. Kernel interpolation. Comput. Stat. Data Anal. 55 (11), 0377-0427(87)90125-7. 2962–2974. https://doi.org/10.1016/J.CSDA.2011.05.001. Rud, R, Cohen, Y., Alchanatis, V., Dar, Z., Levi, A., Brikman, R., Shenderey, C., Heuer, B., Mulder, V.L., de Bruin, S., Schaepman, M.E., 2013. Representing major soil variability at Markovits, T., Mulla, D., Rosen, C., 2013. The potential of CWSI based on thermal regional scale by constrained Latin Hypercube Sampling of remote sensing data. Int. imagery for in-season irrigation management in potato fields. 721–727. https://doi. J. Appl. Earth Obs. Geoinf. 21 (1), 301–310. https://doi.org/10.1016/j. org/10.3920/978-90-8686-778-3_89. jag.2012.07.004. Rud, Ronit, Cohen, Y., Alchanatis, V., Levi, A., Brikman, R., Shenderey, C., Heuer, B., Murolo, S., Mancini, V., Romanazzi, G., 2014. Spatial and temporal stolbur population Markovitch, T., Dar, Z., Rosen, C., Mulla, D., Nigon, T., 2014. Crop water stress index structure in a cv. Chardonnay vineyard according to vmp1 gene characterization. derived from multi-year ground and aerial thermal images as an indicator of potato Plant. Pathol. 63 (3), 700–707. https://doi.org/10.1111/ppa.12122. water status. Precis. Agric. 15 (3), 273–289. https://doi.org/10.1007/s11119-014- Nanda, A., Sen, S., McNamara, J.P., 2019. How spatiotemporal variation of soil moisture 9351-z. can explain hydrological connectivity of infiltration-excess dominated hillslope: Shinya, P., Contador, L., Predieri, S., Rubio, P., Infante, R., 2013. Peach ripening: observations from lesser Himalayan landscape. J. Hydrol. 579, 124146 https://doi. segregation at harvest and postharvest flesh softening. Postharvest Biol. Technol. 86, org/10.1016/j.jhydrol.2019.124146. 472–478. https://doi.org/10.1016/j.postharvbio.2013.07.038. Nawar, S., Corstanje, R., Halcro, G., Mulla, D., Mouazen, A.M., 2017. Delineation of soil Sinton, T.H., Ough, C.S., Kissler, J.J., Kasimatis, A.N., 1978. Grape juice indicators for management zones for variable-rate fertilization: a review. In: Advances in prediction of potential wine quality. I. Relationship between crop level, juice and Agronomy, Vol. 143. Academic Press Inc., pp. 175–245. https://doi.org/10.1016/bs. wine composition, and wine sensory ratings and scores. Am. J. Enol. Viticult. 29 (4). agron.2017.01.003 Stein, A., Ettema, C., 2003. An overview of spatial sampling procedures and experimental O’Shaughnessy, S.A., Evett, S.R., Colaizzi, P.D., Howell, T.A., 2011. Using radiation design of spatial studies for ecosystem comparisons. Agric. Ecosyst. Environ. 94 (1), thermography and thermometry to evaluate crop water stress in soybean and cotton. 31–47. https://doi.org/10.1016/S0167-8809(02)00013-0. Agric. Water Manag. 98 (10), 1523–1535. https://doi.org/10.1016/j. Stéphane Dray, A., Bauman, D., Blanchet, G., Borcard, D., Clappe, S., Guenard, G., agwat.2011.05.005. Jombart, T., Larocque, G., Legendre, P., Madi, N., Wagner, H.H., 2020. Package O’Toole, J.C., Moya, T.B., 1978. Genotypic variation in maintenance of leaf water “adespatial”: Multivariate Multiscale Spatial Analysis (R package version 0.3-8). http potential in rice1. Crop Sci. 18 (5), 873–876. https://doi.org/10.2135/ s://doi.org/10.1890/11-1183.1. cropsci1978.0011183X001800050050x. Visalini, K., Subathra, B., Srinivasan, S., Palmieri, G., Bekiroglu, K., Thiyaku, S., 2019. Ohana-Levi, N., Ben-Gal, A., Peeters, A., Termin, D., Linker, R., Baram, S., Raveh, E., Paz- Sensor placement algorithm with range constraints for precision agriculture. IEEE Kagan, T., 2020. A comparison between spatial clustering models for determining N- Aerosp. Electron. Syst. Mag. 34 (6), 4–15. https://doi.org/10.1109/ fertilization management zones in orchards. Precis. Agric. 1–25 https://doi.org/ MAES.2019.2921177. 10.1007/s11119-020-09731-5. Wagner, H.H., 2003. Spatial covariance in plant communities: Integrating ordination, Ohana-Levi, N., Paz-Kagan, T., Panov, N., Peeters, A., Tsoar, A., Karnieli, A., 2020. Time geostatistics, and variance testing. Ecology 84 (4), 1045–1057. https://doi.org/ series analysis of vegetation-cover response to environmental factors and residential 10.1890/0012-9658(2003)084[1045:SCIPCI]2.0.CO;2. development in a dryland region. GIScience and Remote Sensing 56 (3). https://doi. Wang, J.F., Zhang, T.L., Fu, B.J., 2016. A measure of spatial stratified heterogeneity. org/10.1080/15481603.2018.1519093. Ecol. Ind. 67, 250–256. https://doi.org/10.1016/j.ecolind.2016.02.052. Ohana-Levi, Noa, Bahat, I., Peeters, A., Shtein, A., Netzer, Y., Cohen, Y., Ben-Gal, A., Werbylo, K.L., Niemann, J.D., 2014. Evaluation of sampling techniques to characterize 2019. A weighted multivariate spatial clustering model to determine irrigation topographically-dependent variability for soil moisture downscaling. J. Hydrol. 516, management zones. Comput. Electron. Agric. 162, 719–731. https://doi.org/ 304–316. https://doi.org/10.1016/j.jhydrol.2014.01.030. 10.1016/J.COMPAG.2019.05.012. Zhang, G., Liu, F., Song, X. dong., 2017. Recent progress and future prospect of digital Oliver, M.A., 2010. Geostatistical applications for precision agriculture. In: Geostatistical soil mapping: A review. In: Journal of Integrative Agriculture, Vol. 16 Chinese Applications for Precision Agriculture. Springer. https://doi.org/10.1007/978-90 Academy of Agricultural Sciences, Issue 12, pp. 2871–2885. https://doi. -481-9133-8. org/10.1016/S2095-3119(17)61762-3. Oliver, M.A., Webster, R., 1986. Semi-variograms for modelling the spatial pattern of Zhang, N., Wang, M., Wang, N., 2002. Precision agriculture—a worldwide overview. landform and soil properties. Earth Surf. Proc. Land. 11 (5), 491–504. https://doi. Comput. Electron. Agric. 36 (2–3), 113–132. https://doi.org/10.1016/S0168-1699 org/10.1002/esp.3290110504. (02)00096-0. 14

(PDF) A multifunctional matching algorithm for sample design in agricultural plots