Draft: Digital Scholarship competency list

I have been thinking about the kinds of competencies or skills that could make up what we call digital scholarship. I’m pretty sure something on this must already exist somewhere, but I couldn’t find it. Here is a snapshot of the list I have been keeping in my notes.

To be clear: this will never be finished and this blog is, of course, not the best place to publish such a thing. I would like suggestions for related work and collaboration.

It was inspired by playing with Moodle’s Competency Framework features, my work (obviously, though this has been a private side project) and the DCMI Linked Data Competency Index.


Fundamentals of digital scholarship

Definitions

  • Define digital scholarship in terms of activities and objects
    • Understand how DS is different from traditional scholarship
    • Know that DS is not just ‘adding computers to scholarship’
  • Explain the relationships between Digital Scholarship and Open Science, RDM and Data Science

Project planning and management

  • Explain how digital scholarship activities fit the definition of a project
    • Identify example digital scholarship projects
    • Understand the difference between project, product and service in the context of DS
  • Find (technical) resources to get the project going
  • Plan for the end of the project
    • Understand that projects end at some point and what that means for project outcomes

Random

Persistent identification

  • understand link rot and content drift in web pages
  • determine whether a persistent identifier system supports persistent identification
  • explain what makes an identifier ‘persistent’ or ‘persistable’
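
One way to make the idea of a ‘persistent’ identifier concrete is a toy resolver: the identifier stays the same while the location behind it can be updated. This is only a sketch; the class `ToyResolver`, the identifier `id:123` and the example.org URLs are made up for illustration and do not describe any real identifier system.

```python
class ToyResolver:
    """Hypothetical resolver: maps stable identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # Registering an existing identifier again updates its location.
        self._locations[identifier] = url

    def resolve(self, identifier):
        # Raises KeyError if the identifier was never registered.
        return self._locations[identifier]


resolver = ToyResolver()
resolver.register("id:123", "https://old.example.org/report.pdf")

# The content moves; the identifier that people cite does not change.
resolver.register("id:123", "https://new.example.org/archive/report.pdf")
current_location = resolver.resolve("id:123")
```

The point of the sketch is that persistence comes from someone keeping the mapping up to date, not from the identifier string itself.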

Research Data basics

  • translate research questions into data needs and potential data sources
  • recognise/know the stages of the data life cycle (and research life cycle?)
  • understand how a character encoding relates bytes in files to characters from a certain set
    • identify that Unicode is a character set and that e.g. UTF-8 is a character encoding
    • identify that the Unicode Consortium determines what characters (and emoji) are accepted into Unicode
  • understand the difference between text and fonts
  • file formats and their characteristics
    • plain text
    • HTML
    • JSON
    • Word documents
    • JPEG, PNG, SVG
    • PDF
  • Understand the difference between markup and layout
  • Understand the differences between static and dynamic web pages
  • data in files versus data mediated through interfaces (e.g. DBMS)
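
The relation between characters and bytes in the character encoding items above can be tried out in a few lines. This is a minimal sketch using only Python’s built-in string methods; the sample text is an arbitrary choice.

```python
# "café" followed by a space and a hot beverage symbol, written with
# escapes so the example does not depend on this file's own encoding.
text = "caf\u00e9 \u2615"

# Encoding turns characters (Unicode code points) into bytes.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# The same characters need a different number of bytes per encoding.
char_count = len(text)
utf8_count = len(utf8_bytes)
utf16_count = len(utf16_bytes)

# Decoding with the right encoding gives the original text back.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong encoding gives garbled text (mojibake).
misread = utf8_bytes.decode("latin-1")

print(char_count, utf8_count, utf16_count)
print(round_trip == text)
print(repr(misread))
```

Reading the bytes as Latin-1 does not raise an error, which is why a wrong encoding often goes unnoticed until someone looks at the text.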

Obtaining data

  • Identify data sources
  • Understand license terms
  • Understand effects of copying data or linking to data
  • get data
    • download or request from archive
    • use provided API
    • scrape from web pages
  • understand data provenance
    • identify common ways to record provenance
      • file metadata
      • commonly provided/used files in a folder
    • understand the influence of file templates on correctness/completeness of provenance
      • people do not always care to provide metadata
  • determine data completeness
    • identify measures for data completeness
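
One possible measure for data completeness is the share of non-empty values per column. The sketch below assumes tabular data in CSV form; the sample records and the function name `completeness_per_column` are invented for illustration, and other measures (for example coverage of a time period) may suit a project better.

```python
import csv
import io

# Made-up sample data, only for illustration.
SAMPLE_CSV = """id,title,year,creator
1,Letters,1890,A. Jansen
2,Diary,,
3,,1901,B. de Vries
4,Map,1910,
"""


def completeness_per_column(csv_text):
    """Return the fraction of non-empty values for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return {}
    result = {}
    for field in reader.fieldnames:
        # A value counts as filled if it has non-whitespace content.
        filled = sum(1 for row in rows if (row[field] or "").strip())
        result[field] = filled / len(rows)
    return result


scores = completeness_per_column(SAMPLE_CSV)
for column, score in scores.items():
    print(f"{column}: {score:.0%}")
```

A score like this only says whether a value is present, not whether it is correct; that belongs to data quality checking.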

Data management

  • understand the goals and content of a data management plan
  • storage
  • access control
    • cloud storage basics
    • encryption basics

Data criticism

  • Identify data as anything that serves as evidence in research
  • perform data modelling
    • understand how data relates to ‘real-world’ entities
  • perform data quality assurance
    • understand how data quality can be assured
    • apply data quality checking
  • understand data ethics
    • identify potential privacy issues
  • understand legal rights: copyright, database rights
    • identify basic rules: what can and cannot be copyrighted?
    • acknowledge that experts are needed to determine copyright applicability
  • determine whether data is fit for (your) purpose

Data Science

Methods

  • list common data science methods
    • Text and data mining
    • Network analysis
    • Visualisation
    • Annotation
    • GIS
    • software development
    • Machine learning

Data transformation

  • understand that you likely need to prepare (preprocess) data before use
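
As an illustration of what such preparation can look like, here is a small sketch that cleans a list of text values: it normalises Unicode, collapses whitespace, drops empty values and removes duplicates that differ only in case. The function name `preprocess` and the chosen steps are assumptions; real projects will need their own steps.

```python
import unicodedata


def preprocess(values):
    """Clean text values and drop empty or duplicate entries."""
    cleaned = []
    seen = set()
    for value in values:
        # Use one Unicode form, so that 'e' + combining accent equals 'é'.
        normalised = unicodedata.normalize("NFC", value)
        # Collapse runs of whitespace and trim both ends.
        normalised = " ".join(normalised.split())
        if not normalised:
            continue
        # Compare case-insensitively, keep the first spelling seen.
        key = normalised.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalised)
    return cleaned


raw = ["  Caf\u00e9 ", "Cafe\u0301", "", "caf\u00e9", "Tea   house"]
print(preprocess(raw))
```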

Tool criticism

  • distinguish between hosted tools and locally running tools
    • understand that hosted tools need data in their control
  • identify tools that are fit for purpose
    • find a description of the tool in its manual or in reviews
  • identify combinations of tools that solve problems
  • know the four rights users have with FLOSS
  • understand the difference between open-source and proprietary software
  • influence of algorithms on research

Attitudes

  • understand the trade-off between efforts needed for automation and repeating a task manually
  • understand that not every new shiny tool is necessarily better
  • learn from code forums like StackOverflow
  • support folks who ask good questions
    • when asking a question, explain your thought process and show your efforts
    • help people ask good questions

Machine learning

  • know that supervised machine learning is based on models, training using data and predicting results for unseen data
  • understand that (especially in deep learning) how ML models arrive at their predictions is often not easily explainable
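
The model, training and prediction steps in supervised learning can be shown with a very small example. The sketch below uses a 1-nearest-neighbour classifier, chosen here only because it fits in a few lines; the data points and labels are made up, and `train` and `predict` are illustrative names rather than a library API.

```python
import math


def train(examples):
    """'Train' a 1-nearest-neighbour model: it simply stores the examples."""
    return list(examples)


def predict(model, point):
    """Predict the label of the stored example closest to the point."""
    nearest = min(model, key=lambda example: math.dist(example[0], point))
    return nearest[1]


# Made-up labelled training data: (features, label).
training_data = [
    ((1.0, 1.0), "short"),
    ((1.2, 0.8), "short"),
    ((5.0, 5.5), "long"),
    ((6.0, 5.0), "long"),
]

model = train(training_data)

# Predictions for points that were not in the training data.
print(predict(model, (1.1, 1.0)))
print(predict(model, (5.5, 5.2)))
```

This particular model is easy to explain (it points at the most similar known example), which is exactly what is often lost with larger models.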

Collaborating

  • collaborative data maintenance
  • preventing data conflicts
  • resolving data conflicts
  • agree on using a system for exchanging information in a project
  • keep confidential data safe

Sharing results and credits

  • Understand that (technical) support is part of the research and deserves credit too
  • Identify the legal owner(s) of an artifact

Publishing

  • know that publishing is making something public
    • know that some disciplines have narrower/other definitions of publishing
    • understand the paradox that it is hard to remove things from the internet and that it is hard to keep things available online for the long term
  • identify various ways of publishing DS
    • web
    • data and publication repositories
    • code repositories
    • (binary) package repositories
    • preprint
    • traditional publisher
    • data paper
  • identify publishing activities that cost money
  • identify who is paying for DS activities and why

FAIR

  • know that FAIR does not imply Open (or vice versa)
  • explain how produced research objects relate
  • describe the provenance of produced results
  • list terms commonly used in research to describe research objects
  • understand the meaning of generic data terms
    • dataset, file, metadata

Accessibility

  • acknowledge that research should be accessible and understandable to anyone regardless of (dis)abilities (FIXME)
  • apply basic guidelines for making text content accessible
    • use headings
    • provide relevant textual descriptions of images, videos and audio elements
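
Part of the guideline on textual descriptions can be checked automatically. The sketch below lists images in an HTML fragment that have no (or an empty) `alt` attribute, using Python’s standard `html.parser` module. The names `MissingAltFinder` and `images_missing_alt` are made up for this example. Note that an empty `alt` can be a deliberate choice for purely decorative images, so the result is a list to review, not a list of errors.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> without a non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


page = (
    '<h1>Results</h1>'
    '<img src="a.png" alt="Bar chart of letters per year">'
    '<img src="b.png">'
    '<img src="c.png" alt="">'
)
print(images_missing_alt(page))
```

Whether a description is relevant still needs a human reader; the check only finds images where a description is absent.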