I have been thinking about the kinds of competencies or skills that could make up what we call digital scholarship. I’m pretty sure there must be something on this somewhere, but I couldn’t find it. Here is a snapshot of the list I have been keeping in my notes.
To be clear: this will never be finished and this blog is, of course, not the best place to publish such a thing. I would like suggestions for related work and collaboration.
It was inspired by playing with Moodle’s Competency Framework features, my work (obviously, though this has been a private side project) and the DCMI Linked Data Competency Index.
Fundamentals of digital scholarship
Definitions
- Define digital scholarship in terms of activities and objects
- Understand how DS is different from traditional scholarship
- Know that DS is not just ‘adding computers to scholarship’
- Explain the relationships between Digital Scholarship and Open Science, RDM and Data Science
Project planning and management
- Explain how digital scholarship activities fit the definition of a project
- Identify example digital scholarship projects
- Understand the difference between project, product and service in the context of DS
- Find (technical) resources to get the project going
- Plan for the end of the project
- Understand that projects end at some point and what that means for project outcomes
Random
Persistent identification
- understand link rot and content drift in web pages
- determine whether a persistent identifier system supports persistent identification
- explain what makes an identifier ‘persistent’ or ‘persistable’
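To make content drift a bit more concrete: one way to notice it is to compare a fingerprint of a page as it was cited with a fingerprint of what the same URL returns later. This is only a minimal sketch, assuming both snapshots are already available as bytes. The names `fingerprint` and `has_drifted` and the sample snapshots are made up for this example, and a plain hash cannot tell a meaningful change from a trivial one (a new timestamp, a different advertisement).

```python
import hashlib


def fingerprint(snapshot: bytes) -> str:
    """Return a SHA-256 hex digest of a stored page snapshot."""
    return hashlib.sha256(snapshot).hexdigest()


def has_drifted(cited: bytes, current: bytes) -> bool:
    """True if the content behind a URL no longer matches what was cited."""
    return fingerprint(cited) != fingerprint(current)


# Hypothetical snapshots of the same URL at two points in time.
cited_snapshot = b"<p>The survey had 120 respondents.</p>"
current_snapshot = b"<p>The survey had 125 respondents.</p>"

print(has_drifted(cited_snapshot, current_snapshot))
print(has_drifted(cited_snapshot, cited_snapshot))
```

Link rot is the simpler case: the URL stops resolving at all, so there is no current snapshot to compare.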
Research Data basics
- translate research questions into data needs and potential data sources
- recognise/know the stages of the data life cycle (and research life cycle?)
- understand how a character encoding relates bytes in files to characters from a certain set
- identify that Unicode is a character set and that e.g. UTF-8 is a character encoding
- identify that the Unicode Consortium determines what characters (and emoji) are accepted into Unicode
- understand the difference between text and fonts
- file formats and their characteristics
- plain text
- HTML
- JSON
- Word documents
- JPEG, PNG, SVG
- Understand the difference between markup and layout
- Understand the differences between static and dynamic web pages
- data in files versus data mediated through interfaces (e.g. DBMS)
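The character set versus character encoding items above can be illustrated in a few lines. This is a small sketch, assuming Python's built-in codecs; the sample word is just an example. The same four Unicode characters become a different number of bytes depending on the encoding, and reading the bytes with the wrong encoding silently produces garbage rather than an error.

```python
text = "caf\u00e9"  # 'café': four Unicode characters

utf8_bytes = text.encode("utf-8")      # é becomes two bytes: C3 A9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: E9

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Reading UTF-8 bytes with the wrong encoding yields mojibake, not an error.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Decoding with the encoding that was used to write restores the text.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == text)
```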
Obtaining data
- Identify data sources
- Understand license terms
- Understand effects of copying data or linking to data
- get data
- download or request from archive
- use provided API
- scrape from web pages
- understand data provenance
- identify common ways to record provenance
- file metadata
- commonly provided/used files in a folder
- understand the influence of file templates on correctness/completeness of provenance
- people do not always care to provide metadata
- determine data completeness
- identify measures for data completeness
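One possible measure for data completeness is the share of records that have a value for each field. The sketch below assumes the data is a list of dictionaries and that `None` and the empty string count as missing; the function name `field_completeness` and the sample records are hypothetical. Other measures (coverage of a time period, comparison against an external inventory) need knowledge this simple count does not have.

```python
from typing import Any


def field_completeness(
    records: list[dict[str, Any]], fields: list[str]
) -> dict[str, float]:
    """Share of records with a non-empty value, per field."""
    if not records:
        return {field: 0.0 for field in fields}
    result = {}
    for field in fields:
        # A value counts as present unless it is absent, None or "".
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / len(records)
    return result


# Hypothetical metadata records for a small letter collection.
records = [
    {"title": "Letter 1", "date": "1672-03-01", "sender": "A"},
    {"title": "Letter 2", "date": "", "sender": "B"},
    {"title": "Letter 3", "date": None},
    {"title": "Letter 4", "date": "1673-01-15", "sender": ""},
]

completeness = field_completeness(records, ["title", "date", "sender"])
print(completeness)
```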
Data management
- understand the goals and content of a data management plan
- storage
- access control
- cloud storage basics
- encryption basics
Data criticism
- Identify data as anything that serves as evidence in research
- perform data modelling
- understand how data relates to ‘real-world’ entities
- perform data quality assurance
- understand how data quality can be assured
- apply data quality checking
- understand data ethics
- identify potential privacy issues
- understand legal rights: copyright, database rights
- identify basic rules: what can and cannot be copyrighted?
- acknowledge that experts are needed to determine copyright applicability
- determine whether data is fit for (your) purpose
Data Science
Methods
- list common data science methods
- Text and data mining
- Network analysis
- Visualisation
- Annotation
- GIS
- software development
- Machine learning
Data transformation
- understand that you likely need to prepare (preprocess) data before use
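As an example of what such preparation can look like for text values: the same name can arrive with different Unicode forms, stray whitespace and inconsistent capitalisation, and then fail to match itself. This is a minimal sketch, assuming these three steps are appropriate for the data at hand (they are not always: case can carry meaning). The function name `preprocess` is made up for this example.

```python
import unicodedata


def preprocess(raw: str) -> str:
    """Normalise Unicode form, whitespace and case of one text value."""
    text = unicodedata.normalize("NFC", raw)  # one canonical form per character
    text = " ".join(text.split())             # collapse runs of whitespace
    return text.casefold()                    # form for case-insensitive matching


# Two spellings of the same value: 'e' + combining accent vs precomposed 'É'.
first = preprocess("Cafe\u0301  Society\n")
second = preprocess("  CAF\u00c9 society")
print(first == second)
```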
Tool criticism
- distinguishing between hosted tools and locally running tools
- understand that hosted tools need data in their control
- identify tools that are fit for purpose
- find a description of the tool in its manual or in reviews
- identifying combinations of tools that solve problems
- know the four rights (freedoms) users have with FLOSS
- understand the difference between open-source and proprietary software
- influence of algorithms on research
Attitudes
- understand the trade-off between efforts needed for automation and repeating a task manually
- understand that not every new shiny tool is necessarily better
- learn from code forums like StackOverflow
- support folks who ask good questions
- when asking a question, explain your thought process and show your efforts
- help people ask good questions
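The automation trade-off from the first item in this list comes down to simple arithmetic. The sketch below assumes time is the only cost that matters, which ignores maintenance, errors and what you learn along the way. The function name `break_even_runs` and the numbers are invented for illustration.

```python
def break_even_runs(
    automation_minutes: float,
    manual_minutes: float,
    automated_run_minutes: float = 0.0,
) -> float:
    """Repetitions after which automating has cost less time than manual work."""
    saved_per_run = manual_minutes - automated_run_minutes
    if saved_per_run <= 0:
        return float("inf")  # automation never pays off in time
    return automation_minutes / saved_per_run


# Hypothetical task: 4 hours to script, 10 minutes by hand, 2 minutes scripted.
runs = break_even_runs(automation_minutes=240, manual_minutes=10,
                       automated_run_minutes=2)
print(runs)
```

With these numbers the script only starts saving time after the thirtieth repetition.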
Machine learning
- know that supervised machine learning is based on models, training using data and predicting results for unseen data
- understand that (especially in deep learning) how ML models predict is often not easily explainable
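The three ingredients of supervised learning (a model, training on labelled data, predicting for unseen data) can be shown with a deliberately tiny example. This sketch uses a one-nearest-neighbour classifier written in plain Python, chosen because it fits in a few lines, not because it is what anyone would use in practice. The feature values and labels are made up. Unlike most deep learning models, its predictions are easy to explain: the answer is the label of the closest training example.

```python
import math

Point = tuple[float, float]


def train(examples: list[tuple[Point, str]]) -> list[tuple[Point, str]]:
    """'Training' a nearest-neighbour model just stores the labelled examples."""
    return list(examples)


def predict(model: list[tuple[Point, str]], point: Point) -> str:
    """Return the label of the stored example closest to the new point."""
    _, label = min(model, key=lambda example: math.dist(example[0], point))
    return label


# Hypothetical training data: two numeric features per document, plus a label.
training_data = [
    ((1.0, 1.0), "poem"),
    ((1.5, 2.0), "poem"),
    ((8.0, 9.0), "letter"),
    ((9.0, 8.5), "letter"),
]

model = train(training_data)
print(predict(model, (2.0, 1.5)))  # a point the model has not seen before
print(predict(model, (7.5, 8.0)))
```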
Collaborating
- collaborative data maintenance
- preventing data conflicts
- resolving data conflicts
- agree on using a system for exchanging information in a project
- keep confidential data safe
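To illustrate what a data conflict is: two collaborators start from the same record, both edit it, and their edits disagree. The sketch below assumes records are flat dictionaries and that a shared base version is available, which is roughly how version control systems reason about merges. The function name `find_conflicts` and the sample record are hypothetical, and a missing field is treated as `None`.

```python
def find_conflicts(base: dict, mine: dict, theirs: dict) -> dict:
    """Fields that both collaborators changed, in different ways, since base."""
    conflicts = {}
    for key in sorted(set(base) | set(mine) | set(theirs)):
        original = base.get(key)
        a, b = mine.get(key), theirs.get(key)
        # Only a conflict if both sides changed the field and they disagree.
        if a != original and b != original and a != b:
            conflicts[key] = (a, b)
    return conflicts


# Hypothetical person record edited by two people at the same time.
base = {"name": "J. Smith", "born": "1850", "place": "Leiden"}
mine = {"name": "John Smith", "born": "1851", "place": "Leiden"}
theirs = {"name": "J. Smith", "born": "1852", "place": "Leyden"}

conflicts = find_conflicts(base, mine, theirs)
print(conflicts)
```

Only the birth year needs a human decision here; the name and place were each changed by one person only and can be merged without discussion.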
Sharing results and credits
- Understand that (technical) support is part of the research and deserves credit too
- Identify the legal owner(s) of an artifact
Publishing
- know that publishing is making something public
- know that some disciplines have narrower/other definitions of publishing
- understand the paradox that it is hard to remove things from the internet and that it is hard to keep things available online for the long term
- identify various ways of publishing DS
- web
- data and publication repositories
- code repositories
- (binary) package repositories
- preprint
- traditional publisher
- data paper
- identify publishing activities that cost money
- identify who is paying for DS activities and why
FAIR
- know that FAIR does not imply Open (or vice versa)
- explain how produced research objects relate
- describe the provenance of produced results
- list terms commonly used in research to describe research objects
- understand the meaning of generic data terms
- dataset, file, metadata
Accessibility
- acknowledge that research should be accessible and understandable to anyone regardless of (dis)abilities (FIXME)
- apply basic guidelines for making text content accessible
- use headings
- provide relevant textual descriptions of images, videos and audio elements
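The guideline about textual descriptions for images can be partly checked by a script. This is a rough sketch using Python's standard `html.parser` module; the class name `MissingAltFinder` and the sample page are made up. It also flags images with an empty `alt`, which is legitimate for purely decorative images, so the output is a list for a person to review, not a verdict. Whether a description is relevant cannot be checked this way at all.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no non-empty alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A valueless or absent alt attribute shows up as None.
        if not (attributes.get("alt") or "").strip():
            self.missing.append(attributes.get("src") or "(no src)")


# Hypothetical page fragment with one described and two undescribed images.
page = (
    '<h1>Results</h1>'
    '<img src="chart.png" alt="Bar chart of letters per year">'
    '<img src="map.png">'
    '<img src="logo.png" alt="">'
)

finder = MissingAltFinder()
finder.feed(page)
print(finder.missing)
```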