This article discusses current practice and trends in data collection on New Zealand dairy farms. We identify the issues, put forward the business opportunities they create, and explain a potential solution in the form of Scarlatti’s Data Locker tool. In realising the Data Locker concept, we argue that the surveys currently used for data collection are not properly synchronised, which compromises the effectiveness of the tool. We close with a couple of recommendations to tackle this issue.

New Zealand dairy farmers are confronted with a sharp increase in the information required by external organisations to measure and evaluate their farm performance (e.g., via benchmarking) as well as to ensure legal compliance. Two consequences of this trend, which are the focus of this article, are the emergence of various data forms that need to be distributed to and filled in by the farmers, and the increase in data collection activity. Collection is currently carried out by external organisations who, in many cases, send their agents to the farmers to assist with and ensure the collection of the required data. This increases the operational cost borne by the organisations and causes discontent among farmers, who need to provide a substantial amount of information (often redundantly). If managed poorly, this discontent will compromise the integrity of the collected data, adding further cost to validate it.

The above challenges may, at the same time, represent a new business opportunity. The case for this rests on the observation that many data forms collect some common information from the farmers. By creating a process that collects the information once and uses it multiple times, we can reduce the operational cost generated by the data collection activity and minimise farmers’ discontent by not asking for the same information twice. It also allows the external organisations to invest less in data collection and more in the analysis (strategic or operational) that uses the data.

To the author’s knowledge, few tools have been developed to take advantage of the above opportunity. The closest was AgriCloud, which was explored by NZX Agri and a couple of collaborators. Their idea was to develop a farm data management centre in the cloud, which organises the information that can be automatically collected (using electronic devices) from the farmers and shares it with other parties. The project is currently on hiatus. A couple of other tools developed in the past, for example AG-HUB and FARMIQ, also seek to aggregate information from the farmers. However, they stop short of trying to reconcile different existing tools, and focus on providing analysis for the farmers.
In what follows, we elaborate on our idea of a data locker for farmers as an attempt to address the above challenges. We then present our own preliminary studies and discuss the challenges we faced. Some recommendations follow to conclude the article.

The data locker idea is inspired by the advancement of internet technology and the success of many online social networking and communication sites. The main idea is to give the farmers “the ownership” of the data and allow them to control data sharing via the internet. An analogy is the social network site Facebook, where members of the public can enter their details and configure their “privacy” setup to manage data sharing. In the Data Locker, the farmers enter their farm details and the “privacy” setup allows them to choose which external organisations can have access to which of their data.

The data locker itself acts as an interface between farmers and the external organisations. The external organisations can extract data from, and import data into, the farmers’ profiles (with the farmers’ permission), and farmers can in turn view and modify their “profiles” at will. Figure 1 illustrates the basics of the Data Locker.
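
The “privacy” setup described above can be pictured as a per-organisation list of permitted fields. The following is only a minimal sketch under our own assumptions: the class name `FarmProfile`, its methods, the field names, and the organisation names are all hypothetical and are not taken from any existing Data Locker implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FarmProfile:
    """A farmer's data plus per-organisation sharing permissions (hypothetical)."""

    data: dict = field(default_factory=dict)
    # Maps an organisation name to the set of field names it may read.
    permissions: dict = field(default_factory=dict)

    def grant(self, organisation, *field_names):
        """Allow an organisation to read the named fields."""
        self.permissions.setdefault(organisation, set()).update(field_names)

    def revoke(self, organisation, *field_names):
        """Withdraw an organisation's access to the named fields."""
        self.permissions.get(organisation, set()).difference_update(field_names)

    def view_for(self, organisation):
        """Return only the fields the farmer has chosen to share."""
        allowed = self.permissions.get(organisation, set())
        return {k: v for k, v in self.data.items() if k in allowed}


# Illustrative values only; not real farm data.
profile = FarmProfile(data={"total_area_ha": 120.0, "herd_size": 350})
profile.grant("Organisation A", "total_area_ha")

shared_with_a = profile.view_for("Organisation A")
shared_with_b = profile.view_for("Organisation B")
```

In this sketch an organisation that has not been granted anything sees an empty profile, which mirrors the idea that sharing happens only with the farmer's permission.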




Physically, the Data Locker takes the form of an online survey that is built by aggregating the surveys used by different external organisations. Depending on who the farmers want to share their data with, the Data Locker customises the compulsory fields to be filled in by the farmers.
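
As a rough illustration of how the compulsory fields could be customised, the sketch below takes the union of the fields required by the forms a farmer opts into, so that a field shared by several forms is asked only once. The form names, field names, and function names are hypothetical placeholders, not the contents of any real survey.

```python
# Hypothetical required-field lists; real forms would be far longer.
REQUIRED_FIELDS = {
    "Form A": {"total_area_ha", "herd_size"},
    "Form B": {"herd_size", "milk_solids_kg"},
}


def compulsory_fields(selected_forms):
    """Union of the fields required by the forms the farmer opted into."""
    required = set()
    for form in selected_forms:
        required |= REQUIRED_FIELDS[form]
    return required


def fields_saved(selected_forms):
    """Number of entries avoided by asking each shared field only once."""
    total = sum(len(REQUIRED_FIELDS[form]) for form in selected_forms)
    return total - len(compulsory_fields(selected_forms))


both = compulsory_fields(["Form A", "Form B"])
saved = fields_saved(["Form A", "Form B"])
```

Here the two forms require four entries between them, but only three distinct fields need to be filled in, so one entry is saved.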
Conceptually, given proper survey aggregation and data entry, the Data Locker can reduce the data redundancy that exists between surveys and minimise the data collection cost incurred by external organisations. In practice, however, two important issues appear to inhibit direct development of the Data Locker tool: the seeming lack of commonality between forms and the variation in data quality. In this article, we focus on the former and provide only a brief comment on the latter.

One of the main cases for the Data Locker is the assumption that many data forms collect some common information from the farmers. Clearly, the higher the commonality between the data forms, the more valuable the Data Locker will be. A quick inspection of the forms, however, fails to reveal how much they have in common, mostly because their wording differs. We therefore designed a structured, pragmatic approach to analyse the commonality between forms.

Our approach to commonality analysis: Assume that we have filled in one form (the basis form). We then ask how much of another form (the mapped form) can be readily filled in based on the completed basis form. In our analysis, we use the DairyBase surveys (published by DairyNZ) as the basis form, mainly due to our familiarity with them and the fact that some DairyBase surveys are designed to complete (at least, to a large extent) a dairy Overseer file, which is the standard tool used in a significant number of analyses for New Zealand dairy farms.

We also note that varying levels of commonality exist between forms. That is, the commonality question does not have a binary answer. To reflect this, we discretise the response into three levels:

1. Level 1: Essentially the same fields, but possibly with different wording, cardinality, and (where relevant) enumerations.
2. Level 2: Fields that have the same intent but can be derived only through aggregation (or calculation), or that are asked at a different point in time.
3. Level 3: Fields that are not related or cannot be derived in an obvious way.


Clearly, the above approach leads to a directional outcome (i.e., the outcome of the analysis depends on which of the two forms is the basis and which is the mapped form). This is particularly true for Level 2 commonality. For example, consider the fields Area by type and Total area:
• Area by type --> Total area: Level 2, since Total area can be aggregated from Area by type.
• Total area --> Area by type: Level 3, since Area by type cannot be derived from Total area.
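
The directional nature of the analysis can be sketched in a few lines of code. This is an illustration only, under simplifying assumptions of our own: field names are taken to be already normalised (so Level 1 reduces to an exact name match), and Level 2 is represented by a hypothetical `DERIVATIONS` table listing which basis fields a mapped field can be computed from. The toy forms are not the actual surveys analysed in this article.

```python
from enum import IntEnum


class Level(IntEnum):
    SAME = 1        # Level 1: essentially the same field
    DERIVABLE = 2   # Level 2: can be derived from basis fields
    UNRELATED = 3   # Level 3: not related / not derivable


def classify(mapped_field, basis_fields, derivations):
    """Classify one mapped-form field against a completed basis form."""
    if mapped_field in basis_fields:
        return Level.SAME
    inputs = derivations.get(mapped_field)
    # Derivable only if every input it needs is present in the basis form.
    if inputs is not None and inputs <= basis_fields:
        return Level.DERIVABLE
    return Level.UNRELATED


def commonality(basis_fields, mapped_fields, derivations):
    """Share of the mapped form that is Level 1-2 common with the basis form."""
    if not mapped_fields:
        return 0.0
    common = sum(
        1 for f in mapped_fields
        if classify(f, basis_fields, derivations) != Level.UNRELATED
    )
    return common / len(mapped_fields)


# Hypothetical derivation rule: total area is the sum of area by type.
DERIVATIONS = {"total_area": {"area_by_type"}}
form_x = {"area_by_type", "herd_size"}
form_y = {"total_area", "herd_size"}

x_to_y = commonality(form_x, form_y, DERIVATIONS)  # form_x is the basis
y_to_x = commonality(form_y, form_x, DERIVATIONS)  # form_y is the basis
```

With these toy forms, using `form_x` as the basis fills all of `form_y` (a result of 1.0), while the reverse direction fills only half of `form_x` (0.5), because Area by type cannot be recovered from Total area.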

Figure 2 illustrates the results of the commonality analysis from DairyBase to Fonterra Dairy Diary. In total, 40% of the latter form is Level 1-2 common with the basis form. The same analysis was also applied to the Agriculture Production Census, the Horizons Sustainable Milk Plan (SMP), and the Farm Environmental Plan (FEP) by Irrigation NZ, among other forms. We found 49%, 42%, and 6% Level 1-2 commonality with DairyBase, respectively.

In addition to the above analysis, we also used the FEP as the basis form to check its commonality with the SMP. Interviews with DairyNZ personnel suggested that the two should have relatively high commonality. However, our study showed that only 26% of the SMP’s fields can be filled in by completing the FEP.
The study revealed that the commonality between forms is relatively low compared with what experts’ intuition may suggest. Even among the fields with Level 1-2 commonality, the large majority belong to Level 2 (which means that some analysis needs to be performed in order to reconcile the fields).

Through the above analysis and discussions with a few partners from different organisations, we identified two sources of discrepancies that lower the commonality between forms:

1. different terminologies are used in different forms, and
2. different sets of questionnaires are used to derive the same output.




Given the above, a standardisation exercise is recommended to realise the value of the Data Locker. At a low level, a dairy dictionary needs to be established to standardise the terminologies used by different organisations (and farmers). A preliminary study suggested that this type of standardisation may significantly improve the commonality between similar surveys (the Level 1-2 commonality from DairyBase to Fonterra Dairy Diary could improve from 40% to as much as 75%). This exercise can and should be integrated with ongoing work by Rezare Systems on establishing a Dairy Industry Network Data Standard (DINDS), which essentially acts as a database of terminologies used in the New Zealand dairy industry. At the moment, most of the DINDS content is concerned with low-level measures (weight of each cow, chemical used per application, etc.). There is potential to extend it to cover farm-level measures.
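
To illustrate how a dairy dictionary could raise commonality, the sketch below maps differently worded labels to one canonical term before two forms are compared. The synonym table, labels, and function names are hypothetical examples of our own; they are not drawn from DINDS or from any of the surveys discussed here.

```python
# Hypothetical synonym table: lower-cased label -> canonical term.
DAIRY_DICTIONARY = {
    "effective area": "effective_area_ha",
    "effective hectares": "effective_area_ha",
    "peak cows milked": "peak_cows_milked",
    "peak herd size": "peak_cows_milked",
}


def canonical(label):
    """Normalise case and whitespace, then look the label up in the dictionary."""
    key = " ".join(label.lower().split())
    # Labels not in the dictionary are kept (normalised) as they are.
    return DAIRY_DICTIONARY.get(key, key)


def shared_fields(form_a_labels, form_b_labels):
    """Fields the two forms have in common once terminology is standardised."""
    return {canonical(a) for a in form_a_labels} & {canonical(b) for b in form_b_labels}


form_a = ["Effective Area", "Peak cows milked"]
form_b = ["effective  hectares", "Stocking rate"]

raw_overlap = set(form_a) & set(form_b)
standardised_overlap = shared_fields(form_a, form_b)
```

Comparing the raw labels finds no overlap at all, whereas the standardised comparison recognises that both forms ask for the effective area.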

The other source of discrepancy was revealed by the limited commonality between the FEP and the SMP. We understand from discussions that the two use seemingly different sets of questionnaires to derive the same output report (or sub-report). The commonality analysis described in the previous section looks at two different surveys (the basis and mapped forms) and compares them field by field (question by question). Since the questionnaires appear completely different, the analysis fails to capture the commonality between the FEP and the SMP.

More importantly, the above exercise reveals that standardisation may also be required at a high level (i.e., form by form, instead of field by field). This can be achieved by working backwards from the output to the overall set of questionnaires required, and standardising the latter as a whole.
Remarks on data quality

One of the two identified issues with the concept of the Data Locker is the variation in data quality. A couple of different external organisations have acknowledged that some entries from the farmers (especially those that require technical judgement) lack the required degree of accuracy. Given this, the Data Locker may not be able to completely eliminate the cost incurred by the external organisations in sending their specialised agents to assist with data entry. On the contrary, the Data Locker should be designed to allow some degree of intervention from professional consultants.
Another implication of this issue is the need for the Data Locker to be able to manage data validity. This may be as simple as building a check into the tool that disallows grossly erroneous entries and records the individual who makes each data entry. A validation rule should then be set up to evaluate the reliability of the data based on who entered it.
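
A very simple version of such a check might look like the sketch below. It is an assumption-laden illustration rather than a specification: the plausible ranges, the roles, the reliability labels, and the names `Entry` and `record_entry` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical plausibility limits used to reject grossly erroneous entries.
PLAUSIBLE_RANGES = {
    "total_area_ha": (0.0, 50_000.0),
    "herd_size": (0, 20_000),
}

# Hypothetical rule: reliability is judged from the role of whoever entered the data.
ROLE_RELIABILITY = {"consultant": "high", "farmer": "medium"}


@dataclass
class Entry:
    field_name: str
    value: float
    entered_by: str
    role: str
    entered_at: datetime
    reliability: str


def record_entry(field_name, value, entered_by, role):
    """Reject out-of-range values; otherwise store the value with its provenance."""
    low, high = PLAUSIBLE_RANGES[field_name]
    if not (low <= value <= high):
        raise ValueError(f"{field_name}={value} is outside the range {low} to {high}")
    return Entry(
        field_name=field_name,
        value=value,
        entered_by=entered_by,
        role=role,
        entered_at=datetime.now(timezone.utc),
        # Unknown roles fall back to the lowest reliability label.
        reliability=ROLE_RELIABILITY.get(role, "unverified"),
    )


entry = record_entry("herd_size", 350, "A. Farmer", "farmer")
```

Recording who made each entry alongside the value is what allows a reliability rating to be attached afterwards, and it leaves room for a consultant to review or overwrite a farmer's entry.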



In this short article, we have explained the business case for unifying farm data fields for performance and compliance measurements. As the number of farm surveys increases, so do the cost and time incurred by many organisations and farmers. Many professionals note the similarity between existing forms, and therefore there is an opportunity to reduce the cost of data entry and collection. Scarlatti conceived the idea of a data locker to capitalise on this emerging opportunity. However, in a detailed study, we found two issues that hinder the development of the Data Locker tool. One of them, discussed extensively in this article, is the seeming lack of commonality between forms. This issue can potentially be addressed by proper standardisation. We plan to manage this process in the future on our pathway to providing a data locker for farmers.