Predicting Honeybee Health: The Healthy Colony Checklist, Hive Scale and Weather Data

Published on Mar 11, 2022
Emily Lower1,*, Brooke Hilden1, Michael Kovalchuk1, Grantham Williams1,
Richard Rogers2, Edgar Hassler4 and Joseph Cazier4

1 Walker College of Business Graduate Programs, Appalachian State University

2 Bayer Crop Science, United States

4 Computer Information Systems, Appalachian State University

* Corresponding Author: emilylower@gmail.com

Structured Abstract

Data Overview: This dataset is designed to help monitor and predict honeybee health. It includes information from hives in North Carolina and Utah, including records of colony health as measured by the standardized Healthy Colony Checklist (HCC) protocol described in Cazier et al. (2018a and 2018b). Honey bee colony inspection data was paired with hive weight data to provide additional insight.

Data Value: This data is helpful to develop a predictive model of honey bee colony health based on hive scale data. It can also help measure and explain key factors influencing hive health.

Data Description: Data includes standardized colony inspection data, hive weight data collected automatically at least hourly, local hourly weather data, approximate apiary location and hive history.

Data Application: The data described in this article is useful for analyzing and predicting the health of honey bee colonies. Combining HCC data with scale measurements and meteorological dimensions also makes it possible to investigate which factors have the greatest impact on the health of a colony. The apiaries used for data collection are located in different parts of the US (North Carolina and Utah), so the corresponding weather data provides diversity in the weather conditions experienced by the hives. This could help in identifying the main meteorological and geographical influences on honey bee colony health.

Indexing Table

Supported UN SDGs: 2 (Zero Hunger), 15 (Life on Land)

Type of Data/Article: Archival

Class of Analytics: Descriptive, Diagnostic and Predictive

Data Tables: Six tables (Apiary_Information, Hive_Information, HCC_Inspections, Scale_Data, Weather_Stations, Hourly_Weather)

Key Words: Healthy Colony Checklist, Bayer, Beekeeping, Scale Data, Weather, Ecological Modeling

Introduction

Honey bees are responsible for pollinating the majority of the most nutritious crops, such as fruits, nuts, and vegetables, and also for pollinating crops used for seed production (Bauer and Wing, 2010). Yet many beekeepers are now facing annual colony losses of more than 40% (Steinhauer et al., 2015). Beekeepers must address many challenges to keep honey bee colonies healthy. Even though beekeepers are doing their best, failures do happen and could reduce the supply of hives available for pollination. Honey production can also be negatively affected if honey bee colony health is not maintained. This data is designed to help build a predictive model for honey bee colony health. It includes the following types of data:


  • Standardized Hive Inspection Data - Trained experts performed regular hive inspections using the standardized Healthy Colony Checklist (HCC), giving quality health data. See Cazier et al. (2018a and 2018b) for more information on the HCC.


  • Hive Scale Data - Sensors recorded hive weight at least once per hour for each hive. This provides useful data that might be used to forecast hive outcomes.


  • Weather Data - Hourly weather data was collected for each hive location to provide valuable control variables.

Together this data has the potential to start identifying factors that can explain or predict colony health in advance, an important precursor to identifying management that can improve and maintain honey bee colony health. The practical determination of the health status of a honey bee colony can be achieved through careful inspection by a knowledgeable and experienced beekeeper. As a working definition of a healthy colony, we used the following definition proposed by Richard Rogers (personal communication).

A healthy honey bee colony has below threshold levels of parasites, pathogens, and predators; no deficiency of, or out of balance, beneficial microbes; and strength and health is sustainable with a reasonable amount of management by the beekeeper to provide food, shelter, and safety as needed, as for any livestock operation.

For practical use, the above definition was further expanded to identify six key assessable conditions (Cazier et al., 2018a and 2018b) that contribute to optimal colony growth potential and overall health; these conditions form the HCC. To be considered “apparently” healthy, a colony must satisfy all six of the following conditions, as seasonally appropriate:


  1. BROOD: all stages and instars of brood present


  2. ADULT BEES: sufficient adult bees and age structure to care for brood and perform all the tasks of the colony


  3. QUEEN: a young productive laying queen with no apparent deformities or behavioral issues


  4. FOOD: sufficient nutritious forage (external to hive) and food stores (internal to hive)


  5. STRESSORS: no stressors that would lead to reduced colony survival and growth potential (including environmental conditions inside and outside hive, and issues with biosecurity, as well as crowding leading to drifting and mingling)


  6. SPACE: suitable hive space for current and near-term colony needs that is sanitary and defendable, and has room for egg laying and food storage.

Appendix B presents the original inspection rubric.

A few sample hypotheses that could be tested with this data include:


  • Which HCC factors (Bees, Brood, Queen, Food, Stressors, Space) are satisfied/present most often? Which are deficient/problematic most often? Is any one factor more important to the health of the hive than the others?


  • The HCC makes it faster to perform colony assessments with minimal record keeping. However, does the HCC process provide enough inspection data to identify actions needed to improve and maintain honey bee colony health? Can beekeepers use the HCC to self-identify knowledge gaps? Can educators use the HCC to identify topics to customize training for individual needs? Does the HCC contribute to sustainability of apiculture?


  • What meteorological dimensions impact colony health the most? Do different overall climates (e.g., North Carolina vs. Utah) impact colony health?


  • What are the internal and external factors that affect honey production? Do healthy hives produce more honey?

U.N. Goals Supported

The following are United Nations Sustainable Development Goals that are potentially addressed by this paper.

Primary: Goal 15 Life on Land

“Protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation and halt biodiversity loss.” (United Nations, 2021)

Secondary: Goal 2 Zero Hunger

“End hunger, achieve food security and improved nutrition and promote sustainable agriculture.” (United Nations, 2021)

The data referenced in this article provides a standardized way to inspect a honey bee colony so that its health, at the time of inspection, can be recorded and analyzed. By assessing the health of colonies, beekeepers can intervene at critical times to prevent colony losses. Reduced colony losses can improve the viability of commercial and smallholder apiculture, thus contributing to the pollination of crops and wild plants, and to overall biodiversity. Without the pollination services provided by managed honey bee colonies, there is a risk that crop yields could be negatively affected, leading to food scarcity and insecurity. Maintaining the health of managed honey bee colonies helps ensure the viability of commercial apiculture that supports sustainable crop production and affordable food availability for all.

Value of Data

The HCC data provides a framework for analyzing colony health. From this information, beekeepers can decide what management action is needed, if any, to maintain colony health and prevent colony collapse. A widely accepted inspection method would facilitate the accumulation of shareable honey bee data that could be used in a number of settings. The scale data would provide additional insights into how colony health and hive weight changes, related to honey production or management action, are correlated.

Furthermore, the methodology set forth by the HCC is very scalable with potential for widespread adoption by small-scale to large-scale beekeeping operations. The hope is that with historical and real-time honey bee data, data scientists and beekeepers may discover new best practices that could extend the survivability of honey bees well into the future.

This data is valuable as it can aid in the creation of a standardized measure of hive health which allows for the development of predictive models of the various weather, sensor and observational inputs in this dataset. These predictive models can then be scaled to other hives and used to alert beekeepers of likely health problems so they can take management actions to mitigate the harm to the hive. Collectively this can improve colony health and survival, and pollination services which will help secure our food supply.

The HCC Framework

The six Healthy Colony Checklist items and their sub-conditions are based on honey bee biology and needs. Each is described below.

Hive Scale

The hive scale data relates to the change in hive weight over time. Since an individual bee weighs a fraction of a gram (approximately 100 mg), and the number of bees in a colony does not change dramatically on a daily basis, any major weight change in the hive can be attributed to honey production, honey harvest, a loss of bee mass from bees exiting the hive (e.g., swarming, foraging, mass mortality external to hive), or management action. A positive net change in hive weight is generally attributable to honey production, which is an indicator of colony performance, or to a management action such as adding supers or feeding. In short, a healthy honey bee colony is expected to produce more honey, thus increasing the weight of the hive.
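The reasoning above can be sketched in a few lines of Python. This is only an illustration: the function names, the (timestamp, weight) tuple layout, and the 5 lb threshold for a "major" change are assumptions made for the example, not part of the dataset or the authors' method.

```python
from collections import defaultdict
from datetime import datetime


def daily_net_change(readings):
    """Return {date: last weight minus first weight} for each calendar day.

    `readings` is an iterable of (datetime, weight_lbs) tuples (hypothetical layout).
    """
    by_day = defaultdict(list)
    for timestamp, weight in sorted(readings):
        by_day[timestamp.date()].append(weight)
    return {day: weights[-1] - weights[0] for day, weights in by_day.items()}


def flag_major_changes(changes, threshold_lbs=5.0):
    """Return the days whose absolute net change exceeds an assumed threshold."""
    return sorted(day for day, delta in changes.items() if abs(delta) > threshold_lbs)


# Made-up readings for one hive over two days.
readings = [
    (datetime(2018, 6, 1, 0, 0), 100.0),
    (datetime(2018, 6, 1, 23, 0), 101.5),  # small gain, consistent with nectar flow
    (datetime(2018, 6, 2, 0, 0), 101.5),
    (datetime(2018, 6, 2, 23, 0), 90.0),   # large drop, e.g. harvest or swarm
]
changes = daily_net_change(readings)
major = flag_major_changes(changes)
```

A flagged day only says that something large happened; deciding whether it was a harvest, a swarm, or a management action still requires the inspection records.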

Weather Data

Weather conditions can influence certain honey bee behaviors such as flight and clustering. Using temperature data, temperature based behavioral thresholds, and a flight and cluster hours model developed by Richard Rogers at Bayer, it was possible to quantify the impact of changing climate on these two behaviors. Without the right conditions, bees cannot fly to find food which will likely decrease the health of the colony. What follows next is an outline of what we might expect from a few of the weather variables relevant to bee activity:

External Temperature

Ambient air temperature was used as a variable to predict honey bee colony health and mortality rates because we expect, based on our research, that colony health is highly correlated with the climate conditions at the colony's location. Temperature data was collected from wunderground.com (Weather Underground), a public weather database that pulls airport station weather, based on the locations of hives in our data set. The ideal temperature for brood success ranges from 57°F to 100°F, with 60°F to 70°F being optimal for maximum honeybee productivity. Within this ‘sweet spot’ range bees can fly (Offord, S., 2017).
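As a simple illustration of how the quoted temperature range could be turned into a feature, the sketch below counts hourly readings that fall inside 57°F to 100°F. It is not the flight and cluster hours model developed at Bayer, whose details are not given here; the function names and sample temperatures are assumptions.

```python
FLIGHT_MIN_F = 57.0   # lower bound of the range quoted in the text
FLIGHT_MAX_F = 100.0  # upper bound of the range quoted in the text


def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def count_flight_hours(hourly_temps_f):
    """Count hourly readings inside the quoted range (bounds inclusive)."""
    return sum(1 for t in hourly_temps_f if FLIGHT_MIN_F <= t <= FLIGHT_MAX_F)


# Made-up hourly temperatures for part of a day.
hourly_temps_f = [48.0, 55.0, 57.0, 64.0, 72.0, 101.0]
flight_hours = count_flight_hours(hourly_temps_f)
```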

Internal Temperature

Inside hive temperature is also very important, and mostly independent of ambient temperature as the colonies thermoregulate in an effort to keep an ideal internal temperature for brood rearing. Colonies with a steady internal brood area hive temperature of “between 32℃ and optimally 35℃” (Arnia.co.uk, 2017) are hypothesized to be the “apparently” healthy colonies. Colonies go to great lengths to maintain an internal hive temperature within their ‘sweet spot’; “therefore, variation in the brood temperature signals one of the following scenarios: broodless state, which could be due to seasonal influences; non-laying queen; queenlessness; or, preparations for swarming,” (Arnia.co.uk). “Even small deviations (more than 0.5℃) from the optimal brood temperatures have a significant influence on the development of the brood and health” of honey bees (Arnia.co.uk, 2017).

Outside of this temperature range, brood may be present, but not necessarily at all stages. Furthermore, “stabilization of brood temperature from an unstable state can be a very reliable indication that the queen has started laying as the bees have started to regulate brood temperature,” (Arnia.co.uk). The absence of some brood stages, as well as the queen’s productivity, is reflected in the Bayer HCC as a variable; therefore, internal temperature data will be useful in predicting the checklist items related to brood stage presence and frequency of queen egg laying.
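The brood temperature thresholds quoted above could be checked against a series of internal readings roughly as follows. This is a minimal sketch: the constants come from the quoted text, while the function names and sample readings are hypothetical, and the dataset itself does not guarantee a clean internal temperature series.

```python
BROOD_MIN_C = 32.0        # lower bound quoted in the text
BROOD_OPTIMUM_C = 35.0    # optimum quoted in the text
DEVIATION_LIMIT_C = 0.5   # deviation described as significant in the text


def share_in_brood_range(temps_c):
    """Fraction of readings between the quoted lower bound and optimum."""
    if not temps_c:
        raise ValueError("at least one reading is required")
    in_range = [t for t in temps_c if BROOD_MIN_C <= t <= BROOD_OPTIMUM_C]
    return len(in_range) / len(temps_c)


def has_notable_deviation(temps_c):
    """True if any reading is more than 0.5 C away from the optimum."""
    return any(abs(t - BROOD_OPTIMUM_C) > DEVIATION_LIMIT_C for t in temps_c)


# Made-up readings: a steady brood nest and an unstable one.
stable = [34.8, 35.0, 34.9, 34.7]
unstable = [29.0, 33.5, 36.2, 31.0]
```

In this example `stable` stays entirely in range with no notable deviation, while only one of the four `unstable` readings is in range.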

Precipitation

Precipitation is also related to the vegetation at the hive location, which directly impacts the honeybees’ “foraging season, colony development and vitality” (Switanek, Crailsheim, Truhetz, Brodschneider, 2016). We expect that a lack of sufficient rain will lead to higher mortality rates among honeybee colonies: as annual precipitation increases, the likelihood that a hive will survive lower, less optimal temperature conditions also increases (Switanek, Crailsheim, Truhetz, Brodschneider, 2016).

Business Value

There is as yet no widely adopted standardized protocol and recordkeeping form that beekeepers use for status-checking honeybee colonies. Currently, beekeepers use a variety of forms with different layouts and fields that are available from beekeeping suppliers and governments or that they have developed themselves, simple non-paper methods to mark hives, or nothing at all. As such, it is very difficult to accurately predict and track the progression of honey bee colony health. Also, results are seldom comparable among beekeepers. As a result, commercial beekeepers and hobbyists alike are limited in their ability to efficiently plan and execute appropriate management practices. Without a basic set of standardized inspections and records, it is also difficult to determine the efficacy of the resources used for seasonal and pest management.

Analysis of the Healthy Colony Checklist will provide beekeepers an incentive and methodology for documenting hive health on a regular basis. The current checklist has six conditions that are assessed by trained beekeepers. A potential predictive application of this data would involve exploring each of these six checklist items individually. This will yield one model for each descriptive independent variable that will predict the likelihood of obtaining a “1”, or “Yes” score for each of the checklist components and an overall healthy hive represented by an overall score. From there the individual models could be aggregated into one combined model to predict bee colony health.

In addition to the above goal of predicting overall hive health and mortality given certain conditions, a potential predictive subgoal of the data is to understand honey production in relation to overall hive health. Phrased as a question, we ask: Do the six HCC factors directly translate to honey production?

Baseline

Target Variable: Each checklist item on the Bayer Healthy Colony Checklist and the related “0” or “1” score is a target variable. The goal would be to predict the likelihood of obtaining a “1” score for each of these checklist items based on our exploratory analysis of descriptive variables.

Each of the following target variables is determined on a beekeeper level at physical inspection of the hive:


  • Q1 - All stages of brood and instars present (eggs 1-3, larvae 1-6, pupae 1-11)? (Y/N)


  • Q2 - Sufficient adult bees and age structure to care for brood and perform all tasks of the colony (includes feeding brood, caring for queen, thermoregulation, foraging, house cleaning, undertaking, guarding)? (Y/N)


  • Q3 - A young (<1 yr. old), productive, laying queen present? (Y/N)


  • Q4 - Sufficient nutritious forage and food stores available (inside and/or outside the hive)? (Y/N)


  • Q5 - No (apparent) stressors present that would lead to reduced colony survival and growth potential? (Y/N)


  • Q6 - Suitable space (not too much or too little) for colony size/expansion that is sanitary and defendable, and room for egg laying? (Y/N)

Equation 1: The Healthy Colony Checklist Score

$$\left(Q_{1} \cdot \frac{1}{6}\right) + \left(Q_{2} \cdot \frac{1}{6}\right) + \left(Q_{3} \cdot \frac{1}{6}\right) + \left(Q_{4} \cdot \frac{1}{6}\right) + \left(Q_{5} \cdot \frac{1}{6}\right) + \left(Q_{6} \cdot \frac{1}{6}\right) = 100\%$$

Each checklist item will have a line of best fit used to predict the probability of obtaining a “100%” overall score (equation 1). The ultimate goal would be predicting whether the hive is healthy; a hive is healthy if it obtains a “1” score for all 6 variables. For purposes of predicting whether the hive is healthy overall, a “1” score would be assigned to that checklist item if it has a greater than x% probability related to that independent variable. The formula used is as follows: Brood + Bees + Queen + Food + Stressors + Space = x/6 health score.
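The scoring logic can be sketched as follows. The per-item probabilities would come from whatever models are eventually fitted; here they are made-up numbers, and the 0.5 cutoff merely stands in for the unspecified x% threshold. All function names are illustrative.

```python
HCC_ITEMS = ("Brood", "Bees", "Queen", "Food", "Stressors", "Space")


def threshold_items(probabilities, cutoff=0.5):
    """Assign 1 to each item whose predicted probability exceeds the cutoff."""
    return {item: int(probabilities[item] > cutoff) for item in HCC_ITEMS}


def hcc_score(scores):
    """Equation 1: each item scored 1 contributes 1/6; result is in percent."""
    return 100.0 * sum(scores[item] for item in HCC_ITEMS) / len(HCC_ITEMS)


def is_healthy(scores):
    """A hive counts as healthy only if all six items are scored 1."""
    return all(scores[item] == 1 for item in HCC_ITEMS)


# Made-up model outputs for one hive.
probabilities = {
    "Brood": 0.91, "Bees": 0.97, "Queen": 0.88,
    "Food": 0.76, "Stressors": 0.42, "Space": 0.83,
}
predicted = threshold_items(probabilities)
score = hcc_score(predicted)
```

Here five of six items clear the cutoff, so the hive scores 5/6 and is not classed as healthy.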

As mentioned above, the predictive model would determine the probability of obtaining a healthy index score of 6/6 (100%) based on the relative probability of obtaining an index score of “1” for each of the independent checklist items. The prediction would be based on our exploration of various independent variables that we find to have a strong correlation with that individual checklist item. Variables such as weather fluctuation within a day and honey production need to be explored to predict the probability of a “100%” score given an input of 6 different variables by a beekeeper. Additional factors, such as available flight and cluster hours during the day as a calculated field based on the existing Bayer algorithm developed by Dick Rogers, could also provide insight into the variables that contribute the most to attaining a score of 6/6 on the Healthy Colony Checklist.

Baseline: A potential baseline for this predictive model is the national average beehive mortality rate of 35-45% per year. In order to use the most accurate mortality rate as the baseline, we believe it would be useful to collect the five year average mortality rate instead of just the previous year’s rate. The baseline was retrieved from the Bee Informed Partnership of Maryland. The partnership is a national nonprofit organization “using science-based, data-driven approaches to improve the health and long-term sustainability of honeybees,” (beeinformed.org, 2021). There is a 10 point range in the existing mortality rate statistic—if the predictive model could predict mortality rates within (1) the existing range, as well as (2) a smaller range of variability, then there is reason to believe that the model would work as intended.
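The two-part baseline criterion can be written down directly. The 35-45% range is the one quoted above; the function name and the example prediction intervals are hypothetical.

```python
BASELINE_RANGE = (35.0, 45.0)  # annual mortality range quoted in the text, in percent


def meets_baseline(pred_low, pred_high, baseline=BASELINE_RANGE):
    """Check both criteria: inside the baseline range and narrower than it."""
    inside = baseline[0] <= pred_low and pred_high <= baseline[1]
    narrower = (pred_high - pred_low) < (baseline[1] - baseline[0])
    return inside and narrower


# Made-up predicted mortality intervals, in percent.
tight_prediction = meets_baseline(38.0, 42.0)   # inside and narrower
wide_prediction = meets_baseline(30.0, 42.0)    # extends below the baseline range
same_width = meets_baseline(35.0, 45.0)         # inside but not narrower
```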

Data Summary

Upon opening our dataset, the user can expect to find 6 tables in .csv format. The file names are as follows: Weather_Stations, Hourly_Weather, Apiary_Information, Hive_Information, Scale_Data, and HCC_Inspections.

Each file is in table format, with variables as listed in our Entity Relationship Diagram [Figure 2]. See the Entity Data Dictionary in Appendix A for further explanation of each variable as it relates to each entity.

Weather data has been collected from WeatherUnderground, which compiles airport weather station data.

The variable formats included in our 6 entities are abbreviated as follows:


  • varchar - a variable-length string of text characters


  • char - a fixed-length string of text characters


  • smallint - numeric integer values


  • date - values represent the date in the format of yyyymmdd


  • time - values represent the time in the local timezone in the format of hh:mm:ss


  • decimal - numeric values with a fractional part


  • boolean - values represent a logical yes-no decision indicated by a 1 or 0
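When reading the .csv files, the formats listed above map onto standard Python types. The sketch below shows one way to parse them using only the standard library; the function names and the sample values are assumptions, not taken from the dataset.

```python
from datetime import datetime
from decimal import Decimal


def parse_date(text):
    """Parse a date stored as yyyymmdd."""
    return datetime.strptime(text, "%Y%m%d").date()


def parse_time(text):
    """Parse a local time stored as hh:mm:ss."""
    return datetime.strptime(text, "%H:%M:%S").time()


def parse_boolean(text):
    """Parse a boolean stored as 1 or 0."""
    if text not in ("0", "1"):
        raise ValueError(f"expected '0' or '1', got {text!r}")
    return text == "1"


# Made-up field values, as they might appear as strings in a CSV row.
inspection_date = parse_date("20180615")
reading_time = parse_time("14:30:00")
weight = Decimal("102.45")
brood_ok = parse_boolean("1")
```

Using `Decimal` rather than `float` keeps the stored precision of decimal columns; `float` would also work if exact decimal arithmetic is not needed.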

Data Inventory

See Appendix A for a detailed listing of each variable from our data sources used. Note that the first two sections represent primary data and the last section represents secondary weather data collected.

Entity Relationship Diagram

Figure 2: Entity Relationship Diagram

Preliminary Data Validation

We began with 2 raw datasets. The first consisted of the Bayer HCC inspection reports collected at various apiary locations at non-standard intervals. The second consisted of 50 individual scale data tables, each representing the data collected for 1 hive at 1 apiary.

The Bayer HCC inspection reports, collected and organized by Dick Rogers, originally contained 37 variables: RecID, HCC_ID, Entered By, Entered, InsptDate, Flag, HiveTag_ID, Apiary, StateProv, Observer, Recorder, Brood, Bees, Queen, Food, Stressors, Space, %Met, BeeSampleCollected, StickyBoardInserted, Notes, *DoNotUse, Management, MgmtDate, MgmtBy, VarCount_ID, VarWash_ID, Attachments, Flag Notes, Hive Scale Data, Varroa Board Counts, C1-Note only used in 2016/2017/2019, C2-Note only used in 2016/2017/2019, C3-Note only used in 2016/2017/2019, C4-Note only used in 2016/2017/2019, C5-Note only used in 2016/2017/2019, and C6-Note only used in 2016/2017/2019.

We began our cleaning process by making a copy of the original HCC dataset and removing all columns except for the following: HiveTag_ID, Apiary, InsptDate, Brood, Bees, Queen, Food, Stressors, Space, and %Met. We did not keep RecID, HCC_ID, Flag, StateProv, BeeSampleCollected, StickyBoardInserted, *DoNotUse, Management, MgmtDate, MgmtBy, VarCount_ID, VarWash_ID, Attachments, Flag Notes, Hive Scale Data, Varroa Board Counts, C1-Note only used in 2016/2017/2019, C2-Note only used in 2016/2017/2019, C3-Note only used in 2016/2017/2019, C4-Note only used in 2016/2017/2019, C5-Note only used in 2016/2017/2019, and C6-Note only used in 2016/2017/2019 because each of these fields is 90-100% empty.

We removed the fields Entered By, Observer, and Recorder because these fields are not relevant to our hypotheses. The recording of an observation by a particular beekeeper may be helpful in explaining why an observation was recorded in a certain way, but we do not think it will be helpful in predicting overall beehive health. Furthermore, we removed the Notes field because of potential inconsistencies and because of the subjective nature of this field. We also determined that this information is auxiliary and that similar, clearer information is contained within each of the HCC attributes.

Our data cleaning steps for the HCC checklist data are as follows:


  1. After filtering our attributes down to the relevant, non-empty columns, we used a VLOOKUP function to match Apiary with city and state information.


  2. We renamed the %Met column to Percent_Met due to the special character and the inability of many programming languages to handle special characters.


  3. We then checked the range of each HCC attribute (Brood, Bees, Queen, Food, Stressors, and Space) to ensure that each value is either a 0 or a 1. We discovered that there were some scores of 0.5 and, after discussing this with our subject matter expert, changed each of these scores to 0 to reflect that that checklist item has not been fully met.


  4. Removed all HCC data with no matching location because we will not be able to perform weather analysis on these hives. The apiaries removed are Pepper tree (1 hive), BIJ-avenues (1 hive), Hill (1 hive), Jim (2 hives), Orchard (1 hive), and Hart (45 hives).


  5. Created an engineered field called Healthy with values stored in boolean format. A hive is assigned a score of 1 if the Percent_Met column is equal to 100. A hive is assigned a score of 0 if the Percent_Met column is less than 100.


  6. Using a program called Alteryx, ensured all variable types are stored correctly. This includes converting the InsptDate field to a yyyymmdd format. This also includes converting each of the HCC checklist items from integers to boolean values.


  7. Using Alteryx, renamed HiveTag_ID to HiveID. Also renamed InsptDate to Inspection_Date to match the names in the hive scale data discussed next.
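The authors carried out these steps with spreadsheet functions and Alteryx. Purely as an illustration, steps 2, 3, 5, 6 and 7 could be expressed in Python as below. The raw date format ("%m/%d/%Y"), the function name, and the sample record are assumptions; steps 1 and 4 (location lookup and removal of unmatched apiaries) are omitted.

```python
from datetime import datetime

HCC_ITEMS = ("Brood", "Bees", "Queen", "Food", "Stressors", "Space")


def clean_inspection(row, raw_date_format="%m/%d/%Y"):
    """Apply steps 2, 3, 5, 6 and 7 to one raw inspection record (a dict of strings)."""
    parsed_date = datetime.strptime(row["InsptDate"], raw_date_format)
    cleaned = {
        "HiveID": row["HiveTag_ID"],                      # step 7: rename
        "Apiary": row["Apiary"],
        "Inspection_Date": parsed_date.strftime("%Y%m%d"),  # steps 6 and 7
    }
    for item in HCC_ITEMS:
        value = float(row[item])
        if value not in (0.0, 0.5, 1.0):                  # step 3: range check
            raise ValueError(f"unexpected {item} score: {value}")
        # Step 3: a partial score of 0.5 counts as not met.
        # Step 6: store the result as a boolean.
        cleaned[item] = value == 1.0
    cleaned["Percent_Met"] = float(row["%Met"])           # step 2: rename
    cleaned["Healthy"] = cleaned["Percent_Met"] == 100.0  # step 5
    return cleaned


# Made-up raw record with one partial score.
raw = {
    "HiveTag_ID": "H1", "Apiary": "Example", "InsptDate": "06/15/2018",
    "Brood": "1", "Bees": "1", "Queen": "0.5", "Food": "1",
    "Stressors": "1", "Space": "1", "%Met": "91.7",
}
cleaned = clean_inspection(raw)
```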

Figure 3: HCC Indicator Frequency

Figure 3 shows the average score (between 0 and 1) for each of the 6 HCC indicators for 2016-2019. The greener the color, the more likely that indicator is to receive a score of 1, meaning the condition was met; as the color moves from green to red, the indicator is less likely to receive a score of 1. In summary, it appears that the presence of bees has the highest positive correlation with a healthy colony and the presence of stressors has the highest negative correlation with a healthy colony.
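The averages behind a chart like Figure 3 are straightforward to compute from the cleaned inspection records. The sketch below uses four made-up inspections; it does not reproduce the figure's actual values, and the function name is illustrative.

```python
HCC_ITEMS = ("Brood", "Bees", "Queen", "Food", "Stressors", "Space")


def indicator_means(inspections):
    """Average 0/1 score per HCC indicator across a list of inspection dicts."""
    count = len(inspections)
    return {item: sum(row[item] for row in inspections) / count for item in HCC_ITEMS}


# Made-up inspections, one tuple per inspection in HCC_ITEMS order.
rows = [
    (1, 1, 1, 1, 0, 1),
    (1, 1, 0, 1, 0, 1),
    (0, 1, 1, 1, 1, 0),
    (1, 1, 1, 0, 0, 1),
]
inspections = [dict(zip(HCC_ITEMS, row)) for row in rows]
means = indicator_means(inspections)
most_often_met = max(means, key=means.get)
least_often_met = min(means, key=means.get)
```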

The hive scale data collected consists of hive weight observations as compared to the previous inspection date/time. Observations are collected in 15-minute intervals from 2015 - 2019. Each hive has its own inspection log and inspection date range. There are 7 attributes present within each hive scale log: Hive, Customer Field, Date Time, Weight [lbs], Weight Variation [lbs], Temperature [F], and Information. We removed the Information attribute because this field is empty for almost all of the data. The dataset also contains an attribute called Temperature, but we discovered that some of the values reflect internal hive temperature while others reflect external atmospheric temperature with no way to tell which is which. As such, we removed this attribute from our combined hive scale dataset with the expectation that we could collect more accurate weather data after figuring out which dates and locations we needed.

Data Management

The data was anonymized during data collection. We intentionally did not keep address or latitude/longitude records for the apiaries to protect their privacy. Instead we used the nearest town or city to serve as the location for record keeping and weather data purposes.

Data Application and Conclusion

The potential information gained by this dataset could have a revolutionary impact on the standardization of beehive inspections. If used correctly, data scientists can gather insight on the various factors which influence beehive health and could subsequently help beekeepers predict hive behaviors and activities. Even though we are still focused on descriptive aspects of this data, potentially the most important value gained from this data is the ability to predict how current environmental factors are going to impact hive health. Armed with that information, beekeepers gain the information needed for correct management action that will prolong the life of hives.

Not only could this dataset provide insight on hive health, it could also help determine the overall validity and effectiveness of the Bayer Healthy Colony Checklist. It is important to determine whether or not this process of collecting hive data is effective in providing enough information for beekeepers (management) to use and base decisions on. Furthermore, while the inspection process should provide sufficient data, it should also be conducted in a timely manner such that the resulting actions do not cause hive distress. Data scientists may be able to observe changes in hive behavior after an inspection has been conducted and identify any indications of a negative impact. This could help indicate whether or not the inspection process is being carried out effectively.

Although the main focus of this dataset is beehive data, some conclusions could be drawn using the indirect data in our dataset. With weather specifically, it may be possible to determine which regions around the country have seen a change in beehive patterns over time and which have remained constant; these kinds of observations would allow beekeepers and researchers in those areas to draw conclusions about why these changes have occurred.

Despite the many potential benefits this dataset can provide, there are a few limitations that are worth mentioning. The main limitation concerns the collected Hive Scale data and the HCC data: the original Hive Scale dataset contains significantly more observations than the HCC data. As a result, these two datasets could only be combined based on the dates when inspections occurred in the HCC data. Overall, this removes a considerable number of scale observations that could provide insight regarding hive health in relation to honey production. Without a defined interval at which hive inspections take place (e.g., every week or every two weeks), it is hard to draw consistent patterns in the data when matching the scale and checklist datasets together. If further research and data collection is to be done, it would be beneficial to establish an official interval at which inspections take place.
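The effect of this limitation can be seen in a small sketch of the date-based match. The tuple layouts, the choice of the daily mean as the summary, and the sample values are assumptions for illustration; the point is how many scale readings go unused when only inspection dates are kept.

```python
from datetime import datetime


def match_scale_to_inspections(inspections, readings):
    """Match scale readings to inspections on hive and calendar date.

    inspections: list of (hive_id, 'yyyymmdd') tuples
    readings:    list of (hive_id, datetime, weight_lbs) tuples
    Returns ({(hive_id, date): mean weight that day}, number of unused readings).
    """
    by_key = {}
    for hive_id, timestamp, weight in readings:
        key = (hive_id, timestamp.strftime("%Y%m%d"))
        by_key.setdefault(key, []).append(weight)

    matched = {}
    for hive_id, date in inspections:
        weights = by_key.get((hive_id, date), [])
        if weights:
            matched[(hive_id, date)] = sum(weights) / len(weights)

    used = sum(len(by_key[key]) for key in matched)
    return matched, len(readings) - used


# Made-up data: one inspection, four scale readings over three days.
inspections = [("H1", "20180615")]
readings = [
    ("H1", datetime(2018, 6, 14, 12, 0), 100.0),
    ("H1", datetime(2018, 6, 15, 8, 0), 101.0),
    ("H1", datetime(2018, 6, 15, 20, 0), 103.0),
    ("H1", datetime(2018, 6, 16, 12, 0), 104.0),
]
matched, unused = match_scale_to_inspections(inspections, readings)
```

In this toy case half of the readings are discarded; with 15-minute scale data and irregular inspections the discarded share would be far larger.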

Acknowledgements

The authors wish to thank former teammate Cory Dalton for his contributions. Finally, we wish to thank Nika Davenport and Preston McDonald for their guidance and advice throughout the project.

References

Bauer, D. M. and Wing, I. S. (2010) “Economic Consequences of Pollinator Declines: A Synthesis,” Agricultural and Resource Economics Review, vol. 39, no. 3, pp. 368–383, doi: 10.1017/s1068280500007371.

Cazier, Joseph A., Rogers, R., Hassler, E. E. and Wilkes, James T. (2018a) “The Healthy Colony Checklist (HCC) Part I: A Framework for Aggregating Hive Inspection Data”, Bee Culture, July 2018 Issue. Pages 29-32.

Cazier, Joseph A., Rogers, R., Hassler, E. E. and Wilkes, James T. (2018b) “The Healthy Colony Checklist (HCC) Part II: Validating and Using the HCC for Hive Inspections”, Bee Culture, August 2018 Issue. Pages 27-31.

Evans, Hugh (2021) “Temperature and Thermoregulation in the Beehive.” arnia.co.uk. (n.d.). Accessed April 15, 2021 at https://www.arnia.co.uk/temperature-and-thermoregulation-in-the-beehive/.

Fauvel, A. M. (2021). Bee Informed Partnership. Accessed April 8, 2021 at https://beeinformed.org/.

Braga, A. R., D. G. Gomes, B. M. Freitas, and J. A. Cazier, (2020a), “A cluster-classification method for accurate mining of seasonal honey bee patterns” Ecological Informatics, v. 59, p. 101107.

Braga, A. R., D. G. Gomes, R. Rogers, E. E. Hassler, B. M. Freitas, and J. A. Cazier, (2020b) “A method for mining combined data from in-hive sensors, weather and apiary inspections to forecast the health status of honey bee colonies” Computers and Electronics in Agriculture, v. 169, p. 105161.

Hadley, D., (2019) “How Honey Bees Keep Warm in Winter, Thermoregulation in Winter Honey Bee Hives”, accessed April 10, 2021 at https://www.thoughtco.com/how-honey-bees-keep-warm-winter-1968101.

Offord, S., (2017) How Honey Bees Survive Winter by Regulating Their Temperature in a Cluster, https://www.beepods.com/honey-bees-survive-winter-regulating-temperature-cluster/.

Steinhauer, N., K. Kulhanek, K. Antúnez, H. Human, P. Chantawannakul, and M.-P. Chauzat (2018) Drivers of colony losses: Current opinion in insect science, v. 26, p. 142-148.

Steinhauer, N., et al. (2015, May 13). Colony Loss 2014-2015: Preliminary Results. Retrieved February 24, 2016, from https://beeinformed.org/results/colony-loss-2014-2015-preliminary-results/

Steinhauer, N., and C. Saegerman, (2021) “Prioritizing changes in management practices associated with reduced winter honey bee colony losses for US beekeepers” Science of The Total Environment, v. 753, p. 141629.

Switanek, M., K. Crailsheim, H. Truhetz, and R. Brodschneider, (2016) “Modelling seasonal effects of temperature and precipitation on honey bee winter mortality in a temperate climate” Science of the Total Environment, v. 579, p. 1581-1587.

United Nations (2021). “The 17 goals”, United Nations Department of Economic and Social Affairs. Retrieved April, 2021, from https://sdgs.un.org/goals/goal8

USDA (2021). “Animals and animal products: bees and honey”. Retrieved March 12, 2021, from https://usda.library.cornell.edu/concern/publications/rn301137d?locale=en

Appendix A - Entity Data Dictionary

Apiary_Information.

| Attribute Name | Format | Description |
| --- | --- | --- |
| ApiaryID | VARCHAR(2) | PK for the Apiary_Information table. This variable assigns a unique apiary identifier based on the apiary name, city, and state. |
| Apiary | VARCHAR(15) | Name of apiary. |
| City | VARCHAR(25) | City of the related apiary. Also city of the related weather station. |
| State | CHAR(2) | State of the city in which the apiary is located. |

Hive_Information.

| Attribute Name | Format | Description |
| --- | --- | --- |
| HiveID | VARCHAR(3) | PK for the Hive_Information table. This variable assigns a unique hive identifier based on the Hive_Tag and ApiaryID. |
| Hive_Tag | VARCHAR(10) | The hive name given in the original HCC raw dataset. |
| ApiaryID | VARCHAR(2) | FK referencing the PK of the Apiary_Information table, a unique apiary identifier based on the apiary name, city, and state. |
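The relationship between Apiary_Information and Hive_Information can be illustrated with a small in-memory database. This is a minimal sketch rather than the authors' actual schema: the column names and formats come from the dictionary, while the foreign-key constraint, the helper names `build_demo_db` and `hives_with_location`, and the sample rows are assumptions made for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create the two tables in memory and load one hypothetical apiary and hive."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Apiary_Information (
            ApiaryID VARCHAR(2) PRIMARY KEY,
            Apiary   VARCHAR(15),
            City     VARCHAR(25),
            State    CHAR(2)
        );
        CREATE TABLE Hive_Information (
            HiveID   VARCHAR(3) PRIMARY KEY,
            Hive_Tag VARCHAR(10),
            -- Assumed constraint: each hive belongs to one known apiary.
            ApiaryID VARCHAR(2) REFERENCES Apiary_Information(ApiaryID)
        );
        """
    )
    # Hypothetical rows, not taken from the dataset.
    conn.execute(
        "INSERT INTO Apiary_Information VALUES ('A1', 'Demo Apiary', 'Sampletown', 'NC')"
    )
    conn.execute("INSERT INTO Hive_Information VALUES ('H1', 'Tag-1', 'A1')")
    conn.commit()
    return conn


def hives_with_location(conn: sqlite3.Connection) -> list:
    """Join each hive to the city and state of its apiary."""
    query = """
        SELECT h.HiveID, a.City, a.State
        FROM Hive_Information AS h
        JOIN Apiary_Information AS a ON a.ApiaryID = h.ApiaryID
        ORDER BY h.HiveID
    """
    return conn.execute(query).fetchall()
```

Calling `hives_with_location(build_demo_db())` returns one row per hive with its apiary's city and state, which is the lookup needed before matching a hive to a weather station by City.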

HCC_Inspections.

| Attribute Name | Format | Description |
| --- | --- | --- |
| InpsectionID | VARCHAR(3) | PK for the HCC_Inspections table. This variable assigns a unique inspection identifier based on the HiveID and HCC inspection date. |
| HiveID | VARCHAR(3) | FK referencing the PK of the Hive_Information table, a unique hive identifier based on the Hive_Tag and ApiaryID. |
| InsptDate | DATETIME | Date the specific hive was inspected using the HCC checklist, in the format yyyymmdd. |
| Brood | BOOLEAN | C1 - All stages of brood and instars present (eggs 1-3, larvae 1-6, pupae 1-11)? |
| Bees | BOOLEAN | C2 - Sufficient adult bees and age structure to care for brood and perform all tasks of the colony? |
| Queen | BOOLEAN | C3 - A young (<1 yr old), productive, laying queen present? |
| Food | BOOLEAN | C4 - Sufficient nutritious forage and food stores available (inside and/or outside the hive)? |
| Stressors | BOOLEAN | C5 - No (apparent) stressors present that would lead to reduced colony survival and growth potential? |
| Space | BOOLEAN | C6 - Suitable space (not too much or too little) for colony size/expansion that is sanitary and defendable, and room for egg laying? |
| Percent_Met | SMALLINT | The amount of HCC checklist items that obtained a "1" value out of a possible 6. This value was given in our dataset, so no calculation is needed to obtain it. |
| Healthy | BOOLEAN | Derived from the Percent_Met variable. Assigned a 'Yes' value if 100% of checklist items have been met and a 'No' value if anything less than 100% of the checklist items have been met. |
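The link between the six checklist items, Percent_Met, and Healthy can be sketched in a few lines. The dataset already ships Percent_Met, so this is only an illustration of the rule described above. It assumes Percent_Met is stored as a whole-number percentage from 0 to 100, which the dictionary does not state explicitly. The function names and the sample inspection are hypothetical.

```python
# The six HCC checklist columns (C1 to C6) from the HCC_Inspections table.
CRITERIA = ("Brood", "Bees", "Queen", "Food", "Stressors", "Space")


def percent_met(inspection: dict) -> int:
    """Percentage of the six criteria recorded as met (1), rounded to an integer.

    Assumption: Percent_Met is a 0-100 percentage; the source only says it
    reflects how many of the 6 items obtained a "1" value.
    """
    met = sum(1 for name in CRITERIA if inspection[name])
    return round(100 * met / len(CRITERIA))


def healthy(inspection: dict) -> str:
    """'Yes' only when every checklist item is met, otherwise 'No'."""
    return "Yes" if percent_met(inspection) == 100 else "No"


# Hypothetical inspection: every criterion met except Queen.
example = {"Brood": 1, "Bees": 1, "Queen": 0, "Food": 1, "Stressors": 1, "Space": 1}
```

Under these assumptions a colony that misses a single item is labelled 'No', which matches the all-or-nothing definition of Healthy in the table.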

Scale_Data.

| Attribute Name | Format | Description |
| --- | --- | --- |
| ScaleID | VARCHAR(7) | PK for the Scale_Data table. This variable assigns a unique scale inspection identifier based on HiveID, Date, and Time. |
| HiveID | VARCHAR(3) | FK referencing the PK of the Hive_Information table, a unique hive identifier based on the Hive_Tag and ApiaryID. |
| Date | DATETIME | Represents the weather observation date, scale inspection date, and HCC inspection date, because the scale inspection dates were matched with the HCC inspection dates and used to pull weather data. Stored in the format yyyymmdd. |
| Time | DATETIME | Represents the time, in 15 minute increments, at which the hive scale reading was collected by the hive sensor. Stored in the format hhmmss. |
| Customer_Field | VARCHAR(20) | Represents the entity submitting the hive scale data. |
| Original_Weight | SMALLINT | Represents the hive scale weight recorded for a HiveID at a specific date and time. |
| Weight_Variation_Lbs | SMALLINT | Represents the hive scale weight variation in pounds from the last recorded inspection time for that specific HiveID. |
| Scale_Temperature | SMALLINT | Represents the temperature recorded for a hive at a specific time and date. Some of the values are the internal temperature and some are the external temperature. |
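One way to read Weight_Variation_Lbs is as the difference between a reading and the previous reading for the same hive. The sketch below shows that calculation under this assumption; the source wording ("last recorded inspection time") could also be read differently, and the function name, tuple layout, and sample values are hypothetical.

```python
def weight_variation(readings):
    """Add the change in weight since the previous reading of the same hive.

    readings: iterable of (hive_id, date 'yyyymmdd', time 'hhmmss', weight_lbs).
    Returns tuples of (hive_id, date, time, weight_lbs, variation), ordered by
    hive and timestamp. The first reading of each hive has variation None.
    """
    last_weight = {}
    result = []
    # Fixed-width yyyymmdd and hhmmss strings sort chronologically as text.
    for hive_id, date, time, weight in sorted(readings, key=lambda r: (r[0], r[1], r[2])):
        previous = last_weight.get(hive_id)
        variation = None if previous is None else weight - previous
        result.append((hive_id, date, time, weight, variation))
        last_weight[hive_id] = weight
    return result


# Hypothetical readings, deliberately out of order.
sample = [
    ("H1", "20200601", "001500", 102),
    ("H1", "20200601", "000000", 100),
    ("H2", "20200601", "000000", 80),
    ("H1", "20200601", "003000", 101),
]
```

Sorting before differencing matters because sensor exports are not guaranteed to arrive in time order, and keeping a separate running weight per HiveID prevents one hive's reading from being compared with another's.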

Weather_Stations.

| Attribute Name | Format | Description |
| --- | --- | --- |
| StationID | CHAR(1) | PK for the Weather_Stations table. Assigns a unique weather station identifier based on City and Station. Includes all stations for which there is a matching InsptDate, scale weight Date, and apiary location. |
| Station | CHAR(4) | Name of the weather station from which weather data was collected. |
| City | VARCHAR(25) | City of the related apiary. Also city of the related weather station. |

Hourly_Weather.

| Attribute Name | Format | Description |
| --- | --- | --- |
| WeatherID | VARCHAR(4) | PK for the Hourly_Weather table. Assigns a unique weather identifier based on ObsID, StationID, Temperature, Humidity, Dew_Point, Wind_Direction, Wind_Speed, Wind_Gust, Pressure, Precip, Condition, Sunrise, Sunset, and Daylight_Hours. |
| Date | DATETIME | Represents the weather observation date, scale inspection date, and HCC inspection date, because the scale inspection dates were matched with the HCC inspection dates and used to pull weather data. Stored in the format yyyymmdd. |
| Obs_Time | DATETIME | Represents the time, in 1 hour increments, at which weather data was collected for a specific location and date. Stored in the format hhmmss. |
| Obs_Hour | CHAR(2) | Represents the first two digits of the Obs_Time variable. |
| StationID | CHAR(1) | FK referencing the PK of the Weather_Stations table, a unique weather station identifier based on City and Station. Includes all stations for which there is a matching InsptDate, scale weight Date, and apiary location. |
| Temperature | SMALLINT | Represents the temperature identified for a specific ObsID and StationID. |
| Humidity | SMALLINT | Represents the humidity level identified for a specific ObsID and StationID. |
| Dew_Point | SMALLINT | Represents the dew point identified for a specific ObsID and StationID. |
| Wind_Direction | VARCHAR(4) | Represents the wind direction identified for a specific ObsID and StationID. Possible values include CALM, E, ENE, ESE, N, NE, NNE, NNW, NW, S, SE, SSE, SSW, SW, VAR, W, WNW, WSW. |
| Wind_Speed | SMALLINT | Represents wind speed in miles per hour (MPH) for a specific ObsID and StationID. |
| Wind_Gust | SMALLINT | Represents wind gust strength for a specific ObsID and StationID. |
| Pressure | DECIMAL(2,2) | Represents the barometric air pressure at a point in time for a specific ObsID and StationID. |
| Precip | DECIMAL(1,1) | Represents the precipitation present, in inches, at a point in time for a specific ObsID and StationID. |
| Condition | VARCHAR(25) | Represents the sky conditions at a point in time for a specific ObsID and StationID. Possible values include Cloudy, Cloudy / Windy, Fair, Fog, Heavy T-Storm, Light Drizzle, Light Rain, Light Rain / Windy, Light Rain with Thunder, Mostly Cloudy, Mostly Cloudy / Windy, Partly Cloudy, Partly Cloudy / Windy, Patches of Fog, Rain, Thunder, Thunder / Windy, Thunder in the Vicinity, T-Storm. |
| Sunrise | DATETIME | Represents the sunrise time for the ObsID day, stored in the format hh:mm:ss. |
| Sunset | DATETIME | Represents the sunset time for the ObsID day, stored in the format hh:mm:ss. |
| Daylight_Hours | DATETIME | Represents the number of daylight hours. Stored in the format hh:mm:ss. |
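Two of the Hourly_Weather fields can be derived from others in the same row, and a short sketch makes the formats concrete. Obs_Hour is described as the first two digits of Obs_Time. Treating Daylight_Hours as the span between Sunrise and Sunset is an assumption, since the dictionary does not say how it was obtained. The function names and sample values are hypothetical.

```python
from datetime import datetime


def obs_hour(obs_time: str) -> str:
    """Return the first two digits of an hhmmss observation time."""
    return obs_time[:2]


def daylight_hours(sunrise: str, sunset: str) -> str:
    """Span between sunrise and sunset, both given as hh:mm:ss, as hh:mm:ss.

    Assumption: Daylight_Hours equals sunset minus sunrise on the same day.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(sunset, fmt) - datetime.strptime(sunrise, fmt)
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, `obs_hour("135400")` gives `"13"`, and `daylight_hours("06:30:00", "19:45:30")` gives `"13:15:30"`. Keeping Obs_Hour as a two-character string preserves leading zeros, in line with its CHAR(2) format.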

Appendix B - Healthy Colony Checklist Inspection Form
