
Domino Data Lab FAQs

Domino Data Lab's Frequently Asked Questions page is a central hub where customers can go with their most common questions. These are the 187 most popular questions Domino Data Lab receives.

Frequently Asked Questions About Domino Data Lab

  • Overview

    Workspace sessions are interactive sessions hosted by a Domino executor where you can interact with code notebooks like Jupyter and RStudio. The software tools and associated configurations available to you are called Workspaces, and they are defined in the pluggable notebooks section of your Domino environment.

    How to launch a Workspace session in Domino

    To start a Workspace session, click Workspaces from the Project menu to open the Workspaces dashboard. The list of Workspaces available to be launched is determined by the Domino environment selected in the Compute Environment menu.

    Supply an optional name for your session, choose a hardware tier, then click Launch Workspace to start your session. This will open a loading screen for your session that will automatically redirect you to the hosted Workspace when it's ready.


    How to re-open running Workspace sessions

    If you close your browser or navigate away from your Workspace session, you can re-open it from the Workspaces dashboard by locating its row on the Running tab and clicking Open.

    How to view and manage Workspace sessions from the Workspaces dashboard

    The Workspaces dashboard supports all of the same organizational, management, and comparison features as the Jobs dashboard.

  • Overview

    Each run and workspace in Domino operates in its own Docker container. These Docker containers are defined by Domino compute environments. Environments can be shared and customized, and they are automatically versioned by Domino.

    New installations of Domino come with a standard set of environments and associated Docker images. Periodically, Domino publishes a new set of standard environments with updated libraries and packages. This article describes how to adopt the latest Domino standard environments.

    Note

    These environments are only applicable to Domino versions 4+. See here for environments available to Domino version <4. Specifically, these environments are compatible with our move to nvidia-docker for GPU workspaces and also drop the need for a wild-card certificate for subdomains.

    Standard Environments

    Domino maintains two standard compute environments:

    Domino Analytics Distribution (DAD) for Python 3.6 and R 3.6.1

    Domino Analytics Distribution (DAD) for Python 3.7 and R 3.6.1

    A compute environment with Py2.7 is available; however, note that Py2.7 support will be deprecated after Jan 1, 2020.

    The Domino Analytics Distributions are designed to handle most of what a typical data science workflow needs out of the box, and differ only on which version of Python they have installed.

    These images are hosted on Quay.io, a third-party Docker registry. By default, your Domino deployment comes configured with keys to access Domino’s images in this service.

    The URLs for the latest versions of Domino Standard Environments are:

    DAD Python 3.6: quay.io/domino/base:Ubuntu18_DAD_Py3.6_R3.6_20190918

    DAD Python 3.7: quay.io/domino/base:Ubuntu18_DAD_Py3.7_R3.6_20190918

    Python 2.7 support will be deprecated after Jan 1, 2020. The URL for the Py2.7 base image is:

    DAD Python 2.7: quay.io/domino/base:Ubuntu18_DAD_Py2.7_R3.6_20190809

    Contents of the Base Environments:

    Ubuntu 18.04

    Anaconda Python 3.6, 3.7

    R 3.6.1

    Jupyter Lab 0.35.4

    Jupyter 1.0

    Rstudio 1.2.1335

    VSCode

    Keras 2.2.5, Tensorflow-gpu* 1.14, CUDA 10.0.130 (Py3.6 environment only)

    Open JDK 8

    Scala 2.12.8

    Julia 1.1

    Node 8.10

    Various common utilities (curl, vim, AWS CLI, telnet, etc.)

    See here for a full list of Python and R packages:

    Ubuntu18_DAD_Py2.7_R3.6-20190916

    *Note: to use Tensorflow on a CPU machine, you must install Tensorflow's CPU version (e.g., pip install tensorflow).
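
    For instance, a minimal sketch of checking whether the GPU-enabled TensorFlow build can actually see a GPU from inside a Workspace or Job (the check shown uses the TensorFlow 1.x API that ships in these images):

        import tensorflow as tf

        # On a CPU-only hardware tier, tensorflow-gpu finds no devices and this
        # prints False; install the CPU build (pip install tensorflow) instead.
        print(tf.test.is_gpu_available())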

    Identifying standard environments in your deployment

    To determine which version of Domino standard environments your deployment is using, open Domino and click Environments in the top navigation bar. You can use the search bar to find environments with Domino Analytics in the name. However, previous administrators or installers may have added these environments under a different name. To confirm that an environment is using a Domino standard, click on it to open the Overview.

    Compute Environment Management

    Note:

    Your installation of Domino may be set up with a previous version of our standard base image. Examples of historical base images:

    quay.io/domino/base:Ubuntu18_DAD_Py2.7_R3.5-20190501

    quay.io/domino/base:Ubuntu18_DAD_Py3.6_R3.5-20190501

    quay.io/domino/base:Ubuntu16_DAD_Py2.7_R3.4-20180727

    quay.io/domino/base:Ubuntu16_DAD_Py3.6_R3.4-20180727

    quay.io/domino/base:DAD_py2.7_R3.4_23052018

    quay.io/domino/base:DAD_py3.6_R3.4_23052018

    quay.io/domino/base:DED_py2.7_R3.4_23052018

    quay.io/domino/base:DED_py3.6_R3.4_23052018

    quay.io/domino/base:2016-12-07_1239_flat2

    You may also have a custom base image with your company's name in the URL.

    If you are using Domino version <3, you may need to upgrade your system before you can use the Ubuntu 16.04 or 18.04 images first released in August 2018.

    Adding standard environments to your deployment

    Adding a Domino standard environment to your deployment involves two high-level steps.

    Create a new environment with the correct defaults for the standard you want to add.

    See the environment management documentation for an overview of how to create and manage environments.

    See the table of environment details below to populate the Base Image, Dockerfile Instructions, and Pluggable Workspace Tools fields. These fields must be filled out correctly for the environment to work as intended.

    Critical Note: To use the latest version of the base image, you must have the Pluggable Workspace Tools feature flag turned on by your administrator.

    Copy over any additional customization from your existing default compute environment.

    If you added additional packages and libraries to your previous default environment, and you intend to use the new standard environment as your deployment default, you should copy over those additions to the new environment you created. These can be added to the end of the Dockerfile Instructions in the new environment.

    You should not copy over any instructions that update Python, R, RStudio, or other core components of the standard environments if the new standard already includes the version you want.

    Be sure to copy over any setup scripts and run scripts attached to your previous default, as your users may depend on them.

    Detailed information about standard environments

    This section contains detailed specifications about the two Domino Standard Environments, plus the configuration settings needed when adding them to your deployment.

    Domino Analytics Distribution (DAD)

    Title:

    Domino Analytics Distribution py3.6 R3.6

    OR

    Domino Analytics Distribution py3.7 R3.6

    Description:

    Ubuntu 18.04

    Anaconda Python 3.6

    R 3.6

    Jupyter Lab 0.35.4, Jupyter 1.0, Rstudio 1.2, VSCode

    Keras 2.2.4, Tensorflow-gpu 1.14.0, CUDA 10.0.130

    For python packages run: !pip freeze

    For R packages run: installed.packages()

    For further detail, please ask Domino Support for the full Dockerfile

    OR

    Ubuntu 18.04

    Anaconda Python 3.7

    R 3.6

    Jupyter Lab 0.35.4, Jupyter 1.0, Rstudio 1.2, VSCode

    Keras 2.2.4, Tensorflow-gpu 1.14.0, CUDA 10.0.130

    For python packages run: !pip freeze

    For R packages run: installed.packages()

    For further detail, please ask Domino Support for the full Dockerfile

    Base Image URL:

    quay.io/domino/base:Ubuntu18_DAD_Py3.6_R3.6_20190918

    OR

    quay.io/domino/base:Ubuntu18_DAD_Py3.7_R3.6_20190918

    Pluggable Workspace Tools

    jupyter:
      title: "Jupyter (Python, R, Julia)"
      iconUrl: "/assets/images/workspace-logos/Jupyter.svg"
      start: [ "/var/opt/workspaces/jupyter/start" ]
      httpProxy:
        port: 8888
        rewrite: false
        internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
        requireSubdomain: false
      supportedFileExtensions: [ ".ipynb" ]

    jupyterlab:
      title: "JupyterLab"
      iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
      start: [ /var/opt/workspaces/Jupyterlab/start.sh ]
      httpProxy:
        internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
        port: 8888
        rewrite: false
        requireSubdomain: false

    vscode:
      title: "vscode"
      iconUrl: "/assets/images/workspace-logos/vscode.svg"
      start: [ "/var/opt/workspaces/vscode/start" ]
      httpProxy:
        port: 8888
        requireSubdomain: false

    rstudio:
      title: "RStudio"
      iconUrl: "/assets/images/workspace-logos/Rstudio.svg"
      start: [ "/var/opt/workspaces/rstudio/start" ]
      httpProxy:
        port: 8888
        requireSubdomain: false

  • Overview

    Domino makes it easy to collaborate on projects and share project outputs.

    There are two things that affect who has access to your project:

    the project's visibility settings

    the project's collaborators

    Visibility settings

    You can change your project's visibility by going to the Access & Sharing tab of the project's settings page.


    There are three different visibility options:

    Public

    Anyone can view your files and runs, even if they don't have a Domino account.

    If file exports are enabled, anyone can import your project files.

    Only explicitly added collaborators can modify files, start runs, or import environment variables, unless you check the allow runs by anonymous users box described below.

    Searchable

    Anyone will be able to see that this project exists, and see its name and description in search results, but only explicitly added collaborators can see the project's contents.

    Private

    Only collaborators can view this project or discover its existence through search results.

    If your project is publicly visible, there is an additional option to allow runs by anonymous users. This will allow users to start runs even if they don't have a Domino account. Runs started by anonymous users will show up as being started by the project owner.

    Warning

    Allowing anyone to run your code can be dangerous. Be careful granting this level of access, and make sure to think through any information you may be revealing, such as environment variables you have set in your project that contain bearer tokens, API keys, or passwords.

    Managing collaborators

    To grant other users access to a project, you can add them as collaborators. To add collaborators, you must be a Contributor to the project, or the project Owner.

    Click Settings from the project menu, then click the Access & Sharing tab and scroll down to the Collaborators and permissions panel. You can add new collaborators by their username or email address. If you supply an email address belonging to a Domino user, that user will be invited to join the project as a collaborator. If you supply an email address that is not associated with an existing Domino user, an email will be sent to that address inviting them to join Domino.

    The collaborators tab is also where you specify how each collaborator should be notified when runs complete. This can be a powerful tool to keep your collaborators in sync on the work that each person is doing.

    Access levels

    The owner of a project can set different access levels for collaborators from the Collaborators and permissions panel. The basic capabilities of the various types of project collaborators are as follows:

    Contributors

    Can read and write project files, and start runs. On the Settings page, Contributors can read and write project environment variables, and they can invite new collaborators. Contributors cannot change hardware tiers, compute environments, or the access levels of collaborators.

    Results Consumers

    Can only read files and access published apps.

    Launcher Users

    Can only view and run Launchers, and see the results of Launcher runs. They cannot view project files.

    Project Importers

    Can import the project, but otherwise cannot access it.

    Owners

    Are the only users who can archive a project, change the owner, change collaborator types, set automatic workspace shutdown times, or change the project default hardware tier or environment.

    For complete, itemized project permissions, consult the lists below. Each permission is followed by the collaborator types that hold it:

    Files permissions

    Read files: Results Consumer, Contributor, Owner

    Write files: Contributor, Owner

    Add external Git repository: Contributor, Owner

    Runs permissions

    Start Run: Contributor, Owner

    Start Workspace: Contributor, Owner

    Schedule Run: Contributor, Owner

    Publishing permissions

    Run Launcher: Launcher User, Contributor, Owner

    View App: Launcher User, Results Consumer, Contributor, Owner

    Publish App: Contributor, Owner

    Unpublish App: Contributor, Owner

    Invite users to App: Contributor, Owner

    Change App hardware tier: Contributor, Owner

    Publish Model API: Contributor, Owner

    Create Launcher: Contributor, Owner

    Settings permissions

    Set environment variable: Contributor, Owner

    Invite collaborator: Contributor, Owner

    Change project stage: Contributor, Owner

    Raise a blocker: Contributor, Owner

    Set project status as complete: Owner

    Manage collaborator permissions: Owner

    Change visibility setting: Owner

    Change default environment: Owner

    Change default hardware tier: Owner

    Change project name: Owner

    Handle merge request: Contributor, Owner

    Transfer project ownership: Owner

    Archive project: Owner

    Dataset permissions

    Mount Dataset from project for read-only use: Results Consumer, Contributor, Owner

    Write new Snapshot to Dataset in project: Contributor, Owner

    Import permissions

    Import project: Project Importer, Results Consumer, Contributor, Owner

  • 3.6.13 (October 1, 2019)

    Changes

    Fixed an issue where attempting to reopen a recently closed tab in Edge v17+ that contained a Domino workspace session would result in a 404. The reopened tab will now connect to the workspace session successfully.

    3.6.12 (September 26, 2019)

    Changes

    Improved performance of loading model instance logs in the UI.

    Removed checkboxes on the rows in some tables that did not support any bulk actions.

    Fixed an issue where changes to timeout settings for Model APIs were sometimes not taking effect.

    Fixed an issue where changes to the visibility setting of a Model API would sometimes not persist.

    4.0.0 (September 24, 2019)

    Welcome to Domino 4!

    In addition to helpful new features for data scientists and project leaders, Domino 4 introduces a new architecture with all components running on Kubernetes. This change makes Domino easier to install, configure, monitor, and administer, and allows Domino to run in more environments than ever before. Visit admin.dominodatalab.com to learn about the technical design of Domino 4 and read guides for configuration and administration.

    Breaking changes

    Domino 4.0 sunsets support for V1 environments. Convert any existing V1 environments to V2 environments to continue to use them.

    Domino 4.0 sunsets support for legacy API endpoints. Only Model APIs are supported.

    Many previous interfaces and options for managing Domino executors have been replaced with the introduction of the new Kubernetes compute grid. There are new dashboards for viewing Kubernetes infrastructure and active execution pods, and new options for configuring Hardware Tiers.

    Click to read more about Managing the compute grid in Domino 4.

    Domino 4.0 removes support for SSH access to a Run container.

    Domino 4.0 removes support for arbitrary Docker arguments for things like custom volume mounts.

    Domino 4.0 removes support for connecting to VPNs from Run containers.

    In Domino 4.0, user logins must use the new Keycloak authentication service. Any existing legacy LDAP integrations will need to have their configurations migrated to Keycloak.

    Domino 4.0 ships with a new collection of Domino 4.0 standard environments. Users who want to use NVIDIA GPUs in Domino 4.0 should base the environments they want to use for those workflows on the new standards, as they are built with NVIDIA Docker. Note that these new standard environments do not support working with GPUs from Python 2.

    New features

    Domino 4.0 adds a new Assets Portfolio that allows users to quickly discover and see key information about the data products they have access to in Domino, including Model APIs, Apps, Launchers, and Scheduled Jobs.

    A new Project Manager admin role is available. This role grants a user contributor access to projects owned by other users who are members of the same organization as the Project Manager. This allows the Project Manager to view those projects in the Projects Portfolio, discover their published assets in the Assets Portfolio, and view the projects’ contents as a contributor.

    Domino 4.0 introduces Project Goals. Goals represent outcomes or subtasks within projects. Project contributors can link files, Workspace sessions, Jobs, Apps, and Model APIs to goals, which show up on the goal card in the project overview. This provides a way to track all work related to a specific goal in the project, and can make navigating large and busy projects easier.

    New options are available in the Notifications and Workspace Settings sections of user Account Settings that allow for opt-in to email notifications or auto-termination for long-running Workspace sessions with a configurable duration.

    Admins also now have additional options for defining which Workspace sessions to treat as long-running, enforcing notification requirements for users, and sending additional global notifications about long-running sessions to admins.

    Additional changes

    Visual styling and design for tables, buttons, links, accordion headers, breadcrumbs, and tab navigation have all been improved and made consistent across the Domino application.

    3.6.11 (September 11, 2019)

    Changes

    Removed the checkboxes on table rows when there are no bulk actions available on objects in the table.

    Clicking the button to duplicate an environment now redirects the user to the duplicate environment description page. Previously, taking this action would reload the original environment's description page.

    3.6.10 (September 4, 2019)

    Changes

    Fixed an issue where stopping a Workspace from the Workspaces dashboard would erroneously indicate to the user that they were discarding changes even when there were no uncommitted changes in the session.

    3.6.9 (August 30, 2019)

    Changes

    Improved performance of loading model instance logs in the UI.

    Improved garbage collection for temporary folders and volumes on executors.

    Fixed an issue where stopping and committing from the Workspace UI would not execute the post-run scripts from the user's environment.

    Added a modelManager.modelContainer.restartCountLimit option which defines how many times a Model API can fail to launch before being descheduled and not restarting further.

    3.6.8 (August 26, 2019)

    Changes

    Restyled the support button so that it no longer obscures other interactive UI components.

    3.6.7 (August 20, 2019)

    Changes

    Fixed an issue with event telemetry reporting that could cause some actions in the UI to hang indefinitely.

    Fixed an issue where logs would sometimes erroneously indicate a lack of executor capacity.

    3.6.6 (August 13, 2019)

    Changes

    Added a modelmanager.requestBufferSize option to override the size of the uWSGI request buffer for published models. This allows creating models that take larger requests without causing invalid HTTP request size errors.

    Fixed an issue that could cause opening a Workspace to 404 for users with uppercase characters in their LDAP federated usernames.

    3.6.5 (August 6, 2019)

    Changes

    Fixed an issue with CLI login.

    3.6.4 (August 2, 2019)

    Changes

    Fixed an issue where the support widget could cover up interactions in the application.

    3.6.3 (July 30, 2019)

    Changes

    Added a button to the dispatcher admin UI to download the files from the working directory of an active Run.

    3.6.2 (July 22, 2019)

    Changes

    Fixed an issue where if a user had made changes in a workspace, then reopened that workspace in a new tab or window, attempting to stop and commit changes would not correctly commit the changes.

    3.6.1 (July 19, 2019)

    Changes

    Fixed some issues with rendering custom support buttons.

    Admin users who stop their own workspaces will no longer receive a notification that an admin stopped their workspace.

    Fixed a rendering issue with the help link in the workspace logs panel.

    3.6.0 (July 17, 2019)

    Breaking changes

    The legacy Runs dashboard is no longer available in Domino 3.6. After upgrading to 3.6, the new Jobs dashboard interface will be enabled.

    Changes

    Domino 3.6 introduces Datasets scratch spaces. These are mutable filesystem directories for temporary data storage and exploration. They are a complement to the core Datasets functionality. Read more about scratch spaces here.

    New API endpoints for working with Datasets are available.

    The following types of events have been added to project activity feeds:

    Publishing a Model API

    Publishing an App

    Publishing or modifying a scheduled Job

    Creating, editing, or deleting files from the Domino UI

    Creating, editing, or deleting files as the result of a Workspace sync

    A new timeline component has been added to the Jobs dashboard. This component shows a time series of dominostats.json values being tracked across experiments. Read more here.
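
    As a rough sketch of how values reach that timeline, a Job can write numeric metrics to a dominostats.json file in its working directory; the metric names below are made up for illustration:

        import json

        # Hypothetical metrics for this experiment; Domino reads the keys in
        # dominostats.json and tracks them across Jobs.
        stats = {"r_squared": 0.92, "rmse": 3.4}

        with open("dominostats.json", "w") as f:
            json.dump(stats, f)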

    If you are a collaborator on a project, and an administrator or project owner stops one of your Workspace sessions, you will now get an email notification with details.

    For new deployments, Domino 3.6 introduces a new authentication service that supports additional protocols and SSO providers.

    Tables throughout the UI have been switched over to a new component type for unified and improved styling.

    Previously, when an executor in Maintenance Mode was stopped, starting the executor from the Dispatcher UI would take the executor out of Maintenance Mode automatically. Now, the executor will start but remain in Maintenance Mode. Executors will only exit Maintenance Mode when an administrator manually toggles Maintenance Mode.

    Fixed some issues with model tester connectivity to published models.

    Fixed formatting issues with UI text on the environment definition page.

    Fixed an issue with port assignment for Spark that could impact connectivity.

    3.5.4 (July 2, 2019)

    Changes

    Previously, when an executor in Maintenance Mode was stopped, starting the executor from the Dispatcher UI would take the executor out of Maintenance Mode automatically. Now, the executor will start but remain in Maintenance Mode. Executorswill only exit Maintenance Mode when an administrator manually toggles Maintenance Mode.

    3.5.3 (June 26, 2019)

    Changes

    Added support site URL to the Projects Portfolio help link.

    Fixed an issue where members of an organization were not inheriting the correct permissions on published Apps in projects owned by the organization.

    3.4.11 (June 25, 2019)

    Changes

    Fixed an issue where members of an organization were not inheriting the correct permissions on published Apps in projects owned by the organization.

    3.5.2 (June 20, 2019)

    Changes

    Fixed an issue that could lead to long load times for the Workspaces dashboard.

    Changed implementation of filtering out default quick-start projects from the Projects Portfolio to allow for filtering on custom default projects.

    Fixed an issue where older activity could incorrectly be listed as recent in user activity reports.

    3.4.10 (June 20, 2019)

    Changes

    Fixed an issue that could lead to long load times for the Workspaces dashboard.

    Fixed an issue where cloning a hardware tier could lead to intermittent Error getting hardware tier messages in project settings.

    3.5.1 (June 13, 2019)

    Changes

    Renamed the Projects filter in the Activity Feed to Project Events.

    Fixed an issue where some older Runs were incorrectly showing up as recent workloads in user activity reports.

    Fixed an issue with Domino 3.5.0 where the button to SSH into a Run container would not display in the Jobs dashboard details panel.

    The default quick-start project created for all users no longer appears in the Projects Portfolio.

    Fixed a logging issue with the Domino frontend application that could cause host instability.

    Fixed an issue where tagging a project did not reliably reindex the project for search with the new tag.

    3.4.9 (June 11, 2019)

    Changes

    Fixed a logging issue with the Domino frontend application that could cause host instability.

    Fixed an issue where tagging a project did not reliably reindex the project for search with the new tag.

    3.5.0 (June 7, 2019)

    New features

    Domino 3.5 introduces project stages. Stages are a customizable set of labels that can be used to track the progress of your project through your team’s data science life cycle. The set of available stages can be configured by Domino administrators in a new interface accessible through the admin portal.

    The stage at the top of the Project Stage Configuration will be the default stage for new projects. Projects can be moved to any other available stage by the project’s owner or contributors.

    Project stage is managed from a new menu that users can open by clicking the project title in the project menu. Stage changes are recorded in the project activity feed.

    Project owners and contributors can also use the project stage menu to flag a project as blocked with a description of the blocker, or mark a project as complete with a description of the project conclusion.

    Blocker and completion events also appear in the project activity feed with an attached discussion thread for comments. The same project status menu used to raise blockers and mark projects complete can be used to resolve blockers and reopen completed projects. Note that projects marked as blocked or complete are still fully functional Domino projects. These statuses are labels used by team leaders to track project status.

    Domino admins have access to a new Projects Portfolio interface, designed as a dashboard for data science leaders to organize and understand projects worked on by their teams. The portfolio can filter projects by stage and status, and includes customizable columns of project information.

    The Projects Portfolio is accessible from the Control Center main menu.

    A new option is available for creating Dataset snapshots via browser upload.

    Browser uploads to Datasets can support up to 50GB or 50,000 files per snapshot. Uploads can be paused and resumed to allow for network interruptions to laptops or workstations. Note that paused or interrupted uploads that are not resumed within 24 hours are discarded and will need to be started again.

    The upload-dataset command in the Domino CLI has new functionality similar to the new browser upload feature. It now supports up to 50GB and 50,000 files per upload, and can be resumed if interrupted. Install Domino CLI 3.5+ and run domino help upload-dataset for usage instructions.

    For licensing purposes, Domino now distinguishes between users who have signed up only to browse and consume results, and users who run practitioner workflows. The former type of user will not be treated as taking up a Domino practitioner license until they launch a Run, Workspace, or publish a data product like an App or Model. Starting Runs via Launchers published by other users does not count as a practitioner workflow.

    The Users page of the Domino admin portal includes new information to track license usage and practitioner status. Admins can easily identify users who are taking up a practitioner license, see when those users were last active in Domino, and deactivate accounts as desired.

    All of the information in the Users page can also be downloaded as CSV via User Activity Reports.

    Columns in the Jobs dashboard will now automatically resize to accommodate the sizes of the displayed values.

    When generating a new Dataset snapshot by running a Job, the script file selector will now autocomplete with the names of files in the current project.

    Added a publicProjectsEnabled option which, when disabled, removes Public as an option for project visibility.

    Issues resolved

    Project collaborators with Launcher User permissions can no longer access the Workspaces dashboard.

    Project collaborators with Launcher User permissions will no longer see Unauthorized errors when trying to open the Jobs dashboard to view the results of Jobs they started with Launchers.

    Fixed an issue where console logs were sometimes not displaying in the Logs tab on the Jobs and Workspaces dashboards.

    Fixed an issue where Jobs started from the Domino CLI would not appear in the Jobs dashboard.

    Public projects that allow anonymous execution will no longer require a sign-in to access the Workspaces dashboard.

    Previously, a limit of 1000 Domino environments could be listed on the Environments overview. This limit has been removed.

    Fixed an issue where cloning a hardware tier could lead to intermittent Error getting hardware tier messages in project settings.

    The hardware tier dropdown menu in the project settings will now load correctly even when admins are actively creating many additional hardware tiers.

    Fixed an issue where attempting to stop a model version in the middle of startup would stop the underlying model host instance, but not show the model as stopped in the UI, leaving it stuck in an Instance not started state.

    3.4.8 (May 28, 2019)

    Changes

    Fixed an issue where Domino with elastic executor scaling could sometimes create new executor hosts when there was still available capacity on stopped hosts. Domino will now more reliably restart existing available hosts instead of launching new ones.

    Fixed an issue where transferring a project to an Organization would cause the Create an app.sh file button on the app publishing page to point to a non-existent URI, producing 404 errors.

    Previously, a limit of 1000 Domino environments could be listed on the Environments overview. This limit has been removed.

    Fixed issue with bulk downloading files from the project files page. Using checkboxes next to files should now reliably allow users to bulk download the selected files.

    3.4.7 (May 21, 2019)

    Changes

    Project collaborators with Launcher User permissions can no longer access the Workspaces dashboard.

    Fixed several issues related to excessive Dispatcher logging.

    3.4.6 (May 13, 2019)

    Changes

    Fixed an issue where Spark connectivity ports listed in Spark configuration files and ports open in Run containers could be different, resulting in no connectivity to Spark.

    Fixed an issue where Run results would fail to render when certain special characters appeared in filenames.

    3.4.5 (May 6, 2019)

    New features

    It is now possible for Domino administrators to set up custom default projects that all new Domino users will own a copy of. Setting up custom default projects replaces the Domino-standard quick-start project.

    Learn more in Change the default project for new users.

    3.4.4 (April 29, 2019)

    Changes

    Changed the Start time field in the Jobs dashboard to show the time the Job entered the Queued state. Previously, this field showed when the Job entered the Running state, which could be a noticeable time after the user initiated the Job depending on startup times.

    Fixed an issue where Control Center data would fail to load into the UI due to a malformed API call.

    Fixed an issue where changing the time constraints on Model instance logs did not correctly filter the logs to only those in the indicated times.

    3.4.3 (April 23, 2019)

    Welcome to Domino 3.4!

    New features

    Domino 3.4 introduces a new activity feed that pulls together records of important events across your project. The activity feed shows Jobs started, Workspace sessions launched, and comments left on both Runs and files.

    It also has an option to quickly open a Run comparison.

    There are new API endpoints available for adding, removing, and configuring external Git repositories.

    Click to learn more about project-level endpoints and repository-level endpoints.

    Domino now records and displays in the UI which branch of an external repository was checked out for the start and end commit of a Run.

    Domino now supports OpenSSH formatted SSH keys as credentials for accessing external Git repositories.

    Domino now supports adding Bitbucket App Passwords as credentials for accessing external Git repositories.

    Domino now supports adding Git repositories stored in Azure Repos.

    Error messages related to errors in the domino.yaml configuration file for Datasets advanced mode have been improved. Learn more in About domino.yaml.

    The Workspace session UI now has more consistent options for controlling which repositories are committed to when performing a manual sync or stopping the session.

    Apps can now be made aware of the Domino username of the user viewing them, by accessing a new HTTP header. Learn more by reading How to get the Domino username of an App viewer.
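
    As a minimal sketch, reading that header from a Flask-based App might look like the following; the header name domino-username is an assumption here, so confirm the exact name in the linked article:

        from flask import Flask, request

        app = Flask(__name__)

        @app.route("/")
        def index():
            # Assumed header name -- check the Domino docs for your version.
            viewer = request.headers.get("domino-username", "anonymous")
            return "Hello, " + viewer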

    New configurable health checks have been added that can mark Domino executors as unhealthy when they are low on disk space. Learn more about configurable health checks.

    Substantial changes have been made to the way Domino handles unhealthy and unresponsive executors.

    Domino no longer automatically puts machines in Maintenance Mode. Instead, Maintenance Mode is reserved as a state Domino admins can place a machine in for actual maintenance. Unhealthy or unresponsive executors will be automatically moved by Domino into a new Unusable state.

    Read more about these changes in Executor maintenance in Domino 3.4+.

    Kubernetes deployment logs are now available for Domino models, from the same interface that hosts instance and build logs. If you experience persistent issues with model deployment in Domino, you may need to retrieve and send these logs to the Domino support team. Only Domino admins can download these logs.

    A new Project column has been added to the Datasets administration UI to help disambiguate cases of identical Dataset names.

    It is now possible to specify a hardware tier when starting a run from the python-domino library.
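
    For example, with python-domino this might look roughly like the sketch below; the project name, command, and the "Large" hardware tier name are placeholders, and the tier argument shown is an assumption to illustrate the idea:

        from domino import Domino

        # Authenticates using an API key, commonly supplied via the
        # DOMINO_USER_API_KEY environment variable.
        d = Domino("my_username/my_project")

        # "Large" stands in for a hardware tier name defined by your admins.
        d.runs_start(["main.py", "--epochs", "10"], tier="Large")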

    Apps can now run on a Kubernetes cluster, bringing greater reliability to your apps via autorestart. This feature is in limited availability for this release. Please contact [email protected] to learn more.

    Issues resolved

    Copying a project now also copies the configurations for attached external Git repositories.

    Fixed issue where the Jobs dashboard would spawn an error modal window that could not be closed by the user.

    Improved long load times for some pages in the Model API publishing UI.

    Fixed issue with logs not displaying correctly on the Apps dashboard.

    Fixed issue where in some cases the status of an App on the Overview tab would not update during publishing.

    Fixed issue where publishing an App could change its permissions from Anyone with a Domino account to Invited users only (others may request).

    Fixed some issues with filtering on the Files page for a project.

    2.11.18 (April 3, 2019)

    Changes

    Turned off Apache HTTP logging in Domino CLI.

    Fixed issue with kubeadm access to S3 for backups in AWS deployments.

    3.3.8 (April 3, 2019)

    Changes

    Turned off Apache HTTP logging in Domino CLI.

    3.3.7 (March 27, 2019)

    Issues resolved

    Fix issue where Jobs submitted through the Domino CLI would sometimes not appear in the Jobs dashboard.

    3.3.6 (March 20, 2019)

    Issues resolved

    Fix issue where public projects that enabled Runs by anonymous users would get 500 errors when loading the Jobs dashboard or Workspaces dashboard if anonymously started Runs were present.

    3.3.5 (March 18, 2019)

    This release of Domino 3.3.5 includes cumulative changes of several development releases since the release of Domino 3.3.1.

    New features

    A Relaunch button has been added to the Workspaces dashboard. You can use this button to launch a new Workspace session based on the hardware tier, environment, and optionally the project revision of a past Workspace session.

    Clicking Delete all marked Snapshots from the Dataset administration interface now shows a confirmation dialog that itemizes the Snapshots to be deleted.

    Issues resolved

    Fix issue with heap usage and garbage collection that could cause frontend outages.

    Stop frontend from trying to reconnect to stopped executors, which was causing spurious timeout errors.

    Improve unclear error messages when users attempt to open a Workspace session they do not have permission to view.

    Fix issue where in some cases the Domino project menu would fail to render or disappear in response to user action.

    Fix issue where changing the project default hardware tier in the project menu would not affect Runs started from the Quick Action menu until the page was refreshed. Changing the default hardware tier from the project menu now takes effect immediately.

    Clicking the Stop button for a Workspace session on the Workspaces dashboard will no longer simultaneously open the confirmation dialog and the session details pane. Only the confirmation dialog will open.

    Project collaborators with access to the Workspaces dashboard, including Owners, will no longer be able to open Workspace sessions started by other users. Only the user that launched a Workspace session can open it.

    Fixed an issue where starting an App in Domino would sometimes switch the App permissions toInvited users only.

    Copying a project now correctly copies over the project’s added Git repositories and their configurations.

    Resolved inconsistencies in the UI text for Datasets versus data sets and data set collections.

    Fix issue where re-running a Job that produced a Dataset Snapshot did not enforce Snapshot limits, allowing the target Dataset to exceed the limit.

    Fix issue where the SSH command shown for connecting to an App-hosting executor would sometimes be too long to display in the text field that showed it. Those text fields have been made scrollable for these cases.

    When users change the project default environment or the environment from the Workspaces dashboard, the available Workspace cards will now populate correctly with the Workspaces available in the chosen environment.

    Fix issue where hanging symlinks in the Dataset output directory could cause Snapshot creation to fail.

    3.3.1 (February 20, 2019)

    Welcome to Domino 3.3!

    We’ve introduced some powerful new features for working with big data and managing your experiments.

    New features

    Domino 3.3 introduces Domino Datasets, a new feature for high-performance storage of big data in Domino.

    Datasets get attached to your Runs in Domino as network directories, which greatly improves startup and sync times. Unlike project files, there is no limit to the individual file size, number of files, or total file volume you can store in Datasets.

    Note

    Datasets are available for use immediately for Domino deployments running in the cloud on version 3.3.1+. If you want to use Datasets with your on-premises deployment of Domino, contact your Customer Success Manager.

    To learn more about Datasets, read:

    Datasets overview

    Datasets best practices

    Datasets advanced mode tutorial

    Converting legacy Data Sets to Domino Datasets

    Datasets administration

    Domino 3.3 also introduces new tools for managing experiments in Domino.

    The previous Runs dashboard has been replaced by a new Jobs dashboard.

    This new dashboard offers several enhancements, including:

    Customizable columns of sortable data about your Jobs, including keys from dominostats.json

    More powerful tagging features for quick and effective organization

    Bulk actions to tag, archive, or stop Jobs

    Additionally, the Jobs dashboard only shows information about Jobs. Similar separate dashboards have been added where you can find the same information about Workspace sessions and Hosted Apps.

    Note

    While the new Jobs dashboard is enabled by default on new deployments of Domino 3.3.1+, if you have upgraded to 3.3.1 from a previous version of Domino, you may need to manually enable it.

    To enable the new dashboard, a system administrator must set the ShortLived.JobsDashboardEnabled feature flag to true.

    Contact [email protected] if you need assistance.

    Domino CLI now auto-detects proxy configurations and should work out-of-the-box in more cases.

    Manual configuration of CLI proxy settings is still available if necessary.

    Issues resolved

    Fixed an issue where Apps would sometimes render incorrectly due to the Domino toolbar not loading.

    Resolved an issue where the full name of the default hardware tier was not displaying on the Start Run dialog.

    Improved the performance of opening the Start Run dialog.

    Fixed a bug where transferring ownership of an environment to another user would fail.

    The /v4/organizations API POST endpoint will now use the real name parameter it receives for the organization name. Previously, this endpoint was always using an incorrect hardcoded value.

    Previously, users could POST to /v4/projects/projectToCopyId/copy multiple times with the same projectToCopyId, resulting in multiple identically named copies. Now, attempting to make a copy that would produce an identically named project fails.

    Fixed an issue where executors in on-premises deployments of Domino were sometimes shut down when they shouldn’t be.

    Fixed a bug where the edit option for environment name wasn’t showing up on the environment overview page.

    3.2.9 (January 31, 2019)

    If you’re new to Domino 3, check out the welcome guide for information about the new UI design, and features like Launchpad and Control Center.

    Notes

    The Create Model form will now only list projects on which the user is an Owner or Contributor.

    Improved loading times of the Create Model form.

    Improved loading times of the Model Versions page.

    3.2.7 (January 11, 2019)

    If you’re new to Domino 3, check out the welcome guide for information about the new UI design, and features like Launchpad and Control Center.

    Breaking changes

    There is no longer a feature flag to disable the Domino 3.0 UI. Versions of Domino 3.2.6+ no longer support using the old Domino 2.X UI.

    Previously, all models published in Domino would log model inputs and outputs by capturing the request and response body in the model instance logs. Request and response logging is now configurable with a checkbox when publishing a model version, and it defaults to OFF.

    New features

    A new Quick Action button is available. Click Quick Action at the top of the project menu, or press 'A' with a project open to access quick actions.

    You can use quick actions to select and launch a workspace:

    Or start a run:

    The default landing page after signing in to Domino (https://<your-domino-url>/overview) has been changed to a new activity overview. On this page you can see your recent Runs, which includes Runs that are in-progress or finished in the last hour:

    And your scheduled runs:

    Click the name of a run to open it in the Runs dashboard.

    Previously, the default landing page was the Projects page. You can access this page by clicking Projects from the main menu. If you want to return to the activity overview, click Domino at the top of the main menu.

    You can now click the hardware tier and compute environment tab from the project menu to quickly change those settings in a fly-out menu:

    When scheduling a Run in a project that belongs to an organization, admins can now set the Run as field to any member of the organization.

    A new configuration point is available to change which LDAP field the Domino user fullName attribute is derived from. Set com.cerebro.domino.ldap.fullNameAttribute to the name of the LDAP field you want to use for fullName. This defaults to cn.

    New deployments of Domino 3.2.7+ will use MongoDB 3.4 for the central database.

    Issues resolved

    To improve security, authentication tokens have been removed from Kubernetes pods where they were not required.

    Previously, after archiving a project, the project would continue to be listed in the Projects tab for the default environment used by the project. Clicking the project name in that table would result in a 404 error. Archived projects no longer appear in the Projects tab for their default environment.

    Fixed an issue where it was possible for two different users to create compute environments with identical names, then make both of them global. They would then not be distinguishable in the environment select dropdown. Attempting to make an environment global now checks for name uniqueness, and reports an error if the check fails:

    Previously, it was possible to archive an environment that was in use as the default environment by one or more projects. Those projects retained access to that environment until they were switched to a new default. Now, in order to archive an environment, it must not be in use by any projects. Attempting to archive an environment in use will result in an error.

    An issue that could cause an error when loading the Workspaces page was caused by incompatible versions of Apollo and some other Graphql dependencies. These have been updated to working versions.

    Fixed an issue that could lead to a race condition in the Domino object database.

    Fixed an issue where setting an EBS Volume Size Override on a hardware tier would sometimes not successfully change the volume size on executors.

    Previously, using Domino with kubeadm and autoscaling relied on some external dependencies that were not available entirely in Domino mirrors. These dependencies have been set up in mirrors.domino.tech and Domino is configured to use them.

    Known issues

    A performance issue where the versions tab of a model can take a long time to render when there are many model versions.

    A performance issue where the models overview can take a long time to render when there are many models.

    An issue where Domino Frontend hosts can use excessive disk space for /tmp.

    The boilerplate comment in a new project’s .dominoresult file is inaccurate. Instead of:

    List the filenames here that you want to show up in the results view

    It should say:

    List the filenames here that you want to exclude from the results view

    A bug where trying to edit the description of a Run from the Runs dashboard may fail.

    Trying to select a V1 environment from the new compute environment fly-out menu is not supported. Attempting to do so will produce an Environment not set error toast.

    3.1.10 (January 9, 2019)

    If you’re new to Domino 3, check out the welcome guide for information about the new UI design, and features like Launchpad and Control Center.

    Notes

    Domino services have been updated to run on Java 1.8.0_181.

    Fixed an issue where setting custom certificates with both domino.trusted_ca_cert_file and domino.ldap.trusted_cert_file could cause a race condition and fail to update the Java keystore.

    Fixed an issue where the /v4/project/<id>/copy endpoint could produce projects with identical names. This endpoint now enforces unique project names.

    Fixed a bug with the API endpoint to retrieve projects by owner that caused it to return an empty set even when provided with a valid user ID.

    Fixed an issue with the datasets V2 beta feature where attempting to write to a dataset from a workspace would cause the workspace to not shut down correctly, and the dataset write to fail.

    2.11.14 (January 9, 2019)

    Notes

    Fixed an issue where setting custom certificates with both domino.trusted_ca_cert_file and domino.ldap.trusted_cert_file could cause a race condition and fail to update the Java keystore.

    Fixed a bug with the API endpoint to retrieve projects by owner that caused it to return an empty set even when provided with a valid user ID.

    2.11.13 (January 2, 2019)

    Notes

    Amazon Elastic Container Registry images are supported as a base image for compute environments.

    Fixed an issue where the/v4/project/<id>/copyendpoint could produce projects with identical names. This endpoint now enforces unique project names.

    2.11.12 (December 13, 2018)

    Notes

    Fixed an issue where the Domino UI running in Internet Explorer would sometimes make duplicate API calls.

    2.11.11 (December 7, 2018)

    Notes

    To improve security, authentication tokens have been removed from Kubernetes pods where they were not required.

    2.11.10 (December 6, 2018)

    Notes

    Fixed an issue where on some upgrade paths a version incompatibility between several dependencies could cause the Workspaces page to fail to load.

    3.1.6 (December 5, 2018)

    If you’re new to Domino 3, check out the welcome guide for information about the new UI design, and features like Launchpad and Control Center.

    Notes

    Fixed an issue where on some upgrade paths a version incompatibility between several dependencies could cause the Workspaces page to fail to load.

    3.1.5 (November 15, 2018)

    If you’re new to Domino 3, check out the welcome guide for information about the new UI design, and features like Launchpad and Control Center.

    Notes

    This release focused on code refactors and small implementation changes to improve executor stability and central database performance.

    3.1.4 (November 8, 2018)

    If you’re new to Domino 3, check out the welcome guide for information about the new UI design, and features like Launchpad and Control Center.

    Notes

    Fixed an issue that could cause users with Launcher user permissions on a project to hit a 404 error when trying to access the project's runs.

    2.11.9 (November 8, 2018)

    Notes

    The default read and connect timeouts for Domino executor replication have been increased to 120 seconds. These timeouts are now configurable with the following keys:

    httpService.connectTimeoutSec

    httpService.readTimeoutSec

    3.1.2 (October 29, 2018)

    If you’re new to Domino 3, check out the welcome guide for information about the new UI design, and features like Launchpad and Control Center.

    Notes

    Fixed an issue where enabling global two-factor authentication could cause some executors to fail to start.

    Previously, users opening Domino workspaces could sometimes run into a CORS issue when running notebooks with a subdomain, resulting in a 404 error. This issue has been fixed.

    A non-thread-safe implementation of some libraries in Domino was previously responsible for some cases of unusually high CPU usage. This has been fixed by changing to a thread-safe implementation.

    2.11.8 (October 24, 2018)

    This release of 2.11.8 follows the release of Domino 2.11.6.

    Notes

    Domino now supports specifying a custom field from your LDAP server for specifying users’ full names. The previous standard value of 'cn' continues to be the default. Contact your Domino Account Manager to discuss making this change to your existing deployment.

    Previously, users opening Domino workspaces could sometimes run into a CORS issue when running notebooks with a subdomain, resulting in a 404 error. This issue has been fixed.

    A non-thread-safe implementation of some libraries in Domino was previously responsible for some cases of unusually high CPU usage. This has been fixed by changing to a thread-safe implementation.

    2.11.6 (October 16, 2018)

    This release of 2.11.6 follows the release of Domino 2.11.3.

    Notes

    A new configuration option is available to set how many lines of logs to return when viewing run logs. The option logs.run.fetchSize controls this, and defaults to 10,000.

    Fixed an issue where enabling global two-factor authentication could cause some executors to fail to start.

    A new configuration option is available to disable user access to the Domino API and CLI. Set com.cerebro.domino.api.isEnabled to false to disable all programmatic access. Users will only be able to interact with Domino through the web application if this is set.

    The stdout panel in the Runs dashboard now polls for fresh stdout data from the active executor once every 3 seconds. This is a reduction in frequency from the previous value of 2 seconds, and will reduce backend load.

    2.11.3 (September 11, 2018)

    This release of 2.11.3 follows the release of Domino 2.11.1.

    Issues resolved

    A metadata encoding issue that was causing some scheduled runs to fail when a deployment upgraded to Domino 2.11 has been fixed.

    Resolved an issue that caused model builds to fail when launched from a project with an attached external Git repository.

    Fixed an issue where under some conditions runs could get stuck in the Preparing state due to a failure to pull an external Git repository.

    Domino Executors will no longer be able to resolve symlinks in project files or external Git repositories. If you have workflows that depend on symlinks, you will need to refactor them before upgrading to Domino 2.11.3.

    2.11.1 (September 11, 2018)

    Issues resolved

    Fixed an issue present in 2.11.0 that caused runs and model builds to fail if an external Git repository had been added to the project. These builds will now succeed.

    Security

    This patch changes the way Domino issues cookies to users of the Domino web application. Cookies are now always sent over SSL.

    Options have been added to disable some client-side monitoring features.

    2.6.7 / 2.8.3 (September 1, 2018)

    The following security patches are available in Domino 2.6.7 and Domino 2.8.3.

    Security

    This patch changes the way Domino issues cookies to users of the Domino web application. Cookies are now always sent over SSL.

    To improve security, Domino Executors will no longer be able to resolve symlinks in project files or external Git repositories. If you have workflows that depend on symlinks, you will need to refactor them before upgrading to Domino 2.8.3.

    2.11.0 (August 31, 2018)

    This release of Domino 2.11.0 directly follows Domino 2.9.0.

    Features

    Added new monitoring to track run status and executor health.

    Improved the layout and scaling of the Dispatcher Admin UI.

    Issues resolved

    Fixed various issues with v1 API endpoints that were present in Domino 2.9.0. Deployments using v1 API endpoints can now safely upgrade from 2.8.0 to 2.11.0.

    Resolved a compatibility issue between v1 API endpoints running R code and the latest Domino Standard Environments.

    Fixed an issue where attempting to schedule a run could result in an error if the project had a previously scheduled run from past versions of Domino.

    Infrastructure

    Connections to the Dispatcher Admin UI are now routed through the Domino Frontend. Previously, connections to this UI bypassed the Frontend.

    2.9.0 (August 22, 2018)

    Breaking changes

    Previously, when Domino hosted your model as an API endpoint, your project files were loaded onto the host machine at /project. Now, project files will be loaded at /mnt/<username>/<project_name> to be consistent with the behavior of Domino runs. An environment variable named DOMINO_WORKING_DIR is now set on model hosts, and will always contain the path to your project files. If you have models that depend on absolute paths to /project, you should update them before launching them in Domino 2.9.0.

    Known issues

    If you stop a currently running App, switch it to run on a different hardware tier, then launch it, the App status panel will incorrectly show it running on the old hardware tier. Refreshing the page will update the panel, and show the correct hardware tier.

    Features

    Running the Domino CLI with --version now also reports the version of Java that the CLI is using.

    The Domino CLI now supports Java 9.

    When creating a new app.sh file in a project, it will now contain commented code examples demonstrating how to launch applications for some common frameworks like Flask, Shiny, and Dash.
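
    As a rough sketch, an app.sh for a Flask application might look like the lines below. The filename app.py and the 0.0.0.0:8888 binding are assumptions for illustration, not text from the generated file; consult the commented examples Domino generates for the exact form.

    export FLASK_APP=app.py
    python -m flask run --host=0.0.0.0 --port=8888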

    When choosing an environment for a project from the project settings page, the list of environments in the menu is now alphabetized instead of chronological, to make it easier to find a specific environment.

    Model health checks can now be configured to start a specified duration after the model is launched, to avoid an infinite restart loop if your model requires additional time before it can serve requests. Use the Initial delay field to set the desired duration.

    Timeout durations for models are now configurable to be higher than the previous default of 60 seconds. For AWS deployments, when changing this timeout you must also edit the idleTimeout setting on the Elastic Load Balancer (ELB) that serves model requests, as it also defaults to 60 seconds. See the AWS documentation to learn more.
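
    For reference, the idle timeout on a Classic ELB can be adjusted with the AWS CLI; the load balancer name and the 300-second value below are placeholders, not values from this release note:

    aws elb modify-load-balancer-attributes \
      --load-balancer-name <your-model-elb-name> \
      --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":300}}"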

    Issues resolved

    Fixed a bug that caused the Domino CLI to always throw an error the first time a user ran update-check.

    Fixed an issue that could cause on-premise deployments of Domino to hang while pulling external Git repositories.

    Fixed an issue where copying a project you did not own would produce a copy with incorrect permissions.

    2.8.1 (August 1, 2018)

    Issues resolved

    Fixed an issue where an overflowing message buffer could crash the dispatcher.

    Improved validation on some dispatcher parsing operations.

    2.6.5 (July 27, 2018)

    Issues resolved

    Fixed an issue where an overflowing message buffer could crash the dispatcher.

    2.8.0 (July 24, 2018)

    Issues resolved

    Fixed an issue where the # character in filenames was producing some invalid URLs in the Domino application.

    Domino now correctly handles SSH connections to Git servers on nonstandard ports.

    Requests to the API that result in a 400 Bad Request error will now return application/json instead of HTML.

    Beta endpoints

    Domino 2.8.0 introduces some v4 beta API endpoints for interacting with users, organizations, and projects. Visit the redesigned API documentation site to learn more.

    2.6.4 (July 23, 2018)

    Features

    Model health checks can now be configured to start a specified duration after the model is launched, to avoid an infinite restart loop if your model requires additional time before it can serve requests. Use the Initial delay field to set the desired duration.

    Timeout durations for models are now configurable to be higher than the previous default of 60 seconds. For AWS deployments, when changing this timeout you must also edit the idleTimeout setting on the Elastic Load Balancer (ELB) that serves model requests, as it also defaults to 60 seconds. See the AWS documentation to learn more.

    2.7.1 (July 2018)

    Issues resolved

    Fixed compatibility issues between Domino 2.6 and Jupyter Notebooks running versions of Jupyter older than 4.2.

    Resolved an issue where executors were sometimes being put into Maintenance Mode while stopped.

    2.6.3 (July 2018)

    NOTE

    This version patches issues in prior versions of Domino 2.6, but is not the latest release of Domino. New deployments should use Domino 2.7+.

    Issues resolved

    Fixed compatibility issues between Domino 2.6 and Jupyter Notebooks running versions of Jupyter older than 4.2.

    Added support for connecting to Git servers via SSH on ports other than the default (port 22).

    2.7.0 (June 2018)

    New features

    The Account Settings page for user accounts now includes several new fields:

    Added Project Environment Default. Users can use this field to set an environment to be the default for new projects they create, instead of using the deployment default.

    Added Workspace Settings. Users can now configure their workspaces to shut down automatically after a specified runtime. New projects will inherit this setting from the user that creates them.

    The Domino search field now supports searches for exact matches on quoted text.

    When downloading the Domino Command Line Interface (CLI) from a deployment, the URL needed to log in to that deployment is now displayed under the download button.

    Issues resolved

    Error messages related to cloning and fetching external Git repositories have been improved.

    Duplicating an environment now creates a copy of the original’s latest active revision, and preserves all metadata and logs.

    Models being deployed will now report availability sooner.

    Fixed a bug where capital letters in the middle of a word in a search query would split the query into two terms.

    Git repositories that have been cloned onto an executor but have not been used by a run for more than 1 week will now be automatically cleaned up to save disk space.

    Fixed an issue where runs that produce many or large file changes could take a long time to finish when shutting down.

    Domino now correctly handles SSH connections to Git servers on nonstandard ports.

    Infrastructure changes

    Executors now have a health check that will mark them as unhealthy if they are unable to reach the fluentd logging service for more than 5 minutes. The duration associated with this health check is configurable.

    2.6.2 (June 2018)

    New features

    If you have added an external Git repository to your project, you can now publish a model using code in that repository. When specifying which file and function to use for the model, supply a full path to the file as it would appear in a Domino workspace, like: /repos/repo-name/filename.py.

    Issues resolved

    Fixed an issue where the date picker UI for date parameters on launchers was not applying the selected value.

    Environment variable values can no longer overflow their field. Long values will be truncated with an ellipsis. Users can click to expand the field to multiple lines and see the full value.

    Fixed an issue where inviting users to collaborate on a project in deployments using SSO or LDAP would email a link to the sign-up page instead of the correct authentication portal.

    Performance of the Model Overview interface has been significantly improved.

    Resolved an issue where the select all checkbox in the header of the Domino file browser would sometimes not work correctly.

    Fixed an issue where the title of a run in the run detail pane was sometimes displaying with incorrect padding.

    2.6.1 (May 2018)

    New features

    The workspace UI now shows a full path to modified files when syncing external repositories.

    Updated the sign-up page for GDPR compliance.

    Issues resolved

    Resolved an issue where runs could take longer than expected to stop.

    Fixed an issue where UI elements on some pages were not rendering correctly in Internet Explorer 11.

    Previously, syncing a workspace with symlinks to non-existent directories or unmounted directories could cause the sync to fail. Syncs now skip these unresolvable symlinks.

    Fixed an issue where adding environment variables to a compute environment could cause the build to fail due to incorrect decoding of certain characters.

    Fixed an issue with Domino CLI 2.6.0 where incorrect default logging settings could lead to very verbose logs.

    2.6.0 (May 2018)

    New features

    When pushing to external repositories from the Workspace UI, all Git operations performed by Domino are logged and displayed in the Logs tab of the Session Overview.

    When loading a new Workspace on an existing executor with cached clones of external repositories, Domino now automatically performs a git fetch on those repositories to ensure newly created branches in remote are accessible in the Workspace.

    Applying the tag release to a run will prevent the run from being archived. This can be used to prevent archival of project states that other projects import and depend on.

    Issues resolved

    In version 2.5.2, a feature for automatically stopping instances that had been in Maintenance Mode for a specified time was added to Domino. However, Domino would only stop such instances if they responded to status requests. Domino will now stop these instances even if they do not respond to status requests.

    Infrastructure changes

    Following the introduction in Domino 2.5.2 of automatic shutdown of executors in Maintenance Mode, we’ve removed the ability to set a maximum number of Maintenance Mode executors.

    2.5.2 (May 2018)

    This release directly follows Domino 2.5.0.

    New Git features available in Workspace UI

    Previously, users who wanted to push to their remote Git repositories had to do so manually through the workspace command line. In version 2.5.2+, Domino is now able to commit and push to remote Git repositories automatically through the improved workspace UI. See all the details in Git repositories in Domino.

    The Session Overview panel now shows pending changes in Git repositories file-by-file, and each repository has a checkbox that controls whether changes are committed when the user performs a Full Sync:

    When committing, the modal window now lists all repositories with changes set to commit, and Domino applies the commit message entered by the user to both the project revision and the commit it makes to each repository. If no commit message is entered, Domino will default to Committed from Domino:

    Attempting to stop a workspace while there are uncommitted changes shows a UI similar to the Session Overview, with a list of all modified files in changed repositories. Users can click the checkboxes next to the listed repositories to control which are committed to:

    The notification that appears when Domino encounters a conflict upon pushing a new commit to a Git repo has been changed from a small toast to a modal window with clearer explanation of what occurred:

    The description text in the Git Credentials panel of the Account Settings page has been expanded to describe when to generate a Personal Access Token versus an SSH key. This description text will always display. It is no longer hidden once a credential has been added.

    Previously, Domino would not clone submodules of any added repositories during a run or workspace session. Domino will now clone those modules, but it will not commit changes to them when pushing changes to the containing repository.

    The files tab in Jupyter workspaces will now include a ../ link so that users can open the parent of the current directory. When clicked from /mnt, this will open the executor’s filesystem root and allow the user to access Git repositories in /repos.

    Other new features

    Adding a collaborator to a project that includes Git repositories now displays a message notifying the user that the new collaborator will need access to those repositories in the Git service before they can work with the project.

    By default, when adding collaborators to a project that is part of an organization, Domino will show autocomplete results for users that are not members of the organization. It is now possible to prevent this by setting com.cerebro.domino.frontend.restrictCollaboratorsToOrganizations to true in the Central Config. With that setting enabled, only users who are members of the organization will show up as autocomplete results when adding collaborators to projects in that organization.
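
    For example, enabling this restriction would be an entry like the following, shown in the same Namespace/Key/Value form used elsewhere in this document (the common namespace is an assumption):

    Namespace: common

    Key: com.cerebro.domino.frontend.restrictCollaboratorsToOrganizations

    Value: true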

    Previously, if a user had set the active revision of an environment to something other than the latest, there was no way to get the environment to return to the previous behavior of always using the latest revision. If a revision has been manually set, there will now be a button on the Revisions tab to Set Active Revision to Latest that restores the original behavior.

    Email notifications for new comments in Domino now include the name of the user who wrote the comment.

    The interface for scheduling a run now displays the timezone that Domino uses for executing scheduled runs.

    Domino CLI 2.5.2+ supports logging in with your Domino API key, to support deployments that use Single Sign-On. When SSO is enabled, the CLI will prompt users to enter their Domino API key, which is visible in Account Settings.
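
    For example, logging in from a terminal might look like the following, after which the CLI prompts for the API key on SSO deployments (the URL is a placeholder for your deployment, and the exact prompts may vary by CLI version):

    domino login https://domino.example.com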

    Issues resolved

    An issue that was preventing comments from being archived under some conditions has been fixed. Comments should now be reliably archived.

    The “check out our product tour video” link in the quick-start project created for new users has been fixed to point to the correct URL.

    Fixed an issue where launchers could sometimes fail to delete.

    The web application now checks to see if cached dependencies are of the correct version. Cached dependencies are cleared and the correct version is downloaded where necessary, preventing users from seeing a stale interface after upgrading Domino.

    Hardware tier cost displays have had their precision increased from two decimal places to four decimal places. Previously, very low running costs like $0.0015 could show up as $0.00.

    Infrastructure changes

    Improvements to the Kubernetes system drivers Domino uses when building environment images will improve build times significantly. The new drivers will be used automatically on new deployments of Domino 2.5.2+, but older deployments will require manual intervention to switch to the new drivers on upgrade.

    Instances that are placed in Maintenance Mode will now be automatically stopped after 2 hours of idle time by default. This value is configurable by admins to be longer or shorter.

    The executor template for new deployments of 2.5.2+ will run Ubuntu 14.04 and CUDA 8.0 by default. Support for Ubuntu 16.04 is available.

    Amazon Machine Images used for Domino hosts and executors now support Enhanced Networking with the Elastic Network Adapter.

    2.5.0 (April 2018)

    API

    New API endpoints are available in this release related to model publishing. It is now possible to publish models and get information about existing models through the 2.5.0 API and CLI.

    Check out the API docs to learn more, or install the Domino CLI from a deployment on version 2.5+ to try these new commands:

    publish-model: Creates and publishes a model for a project

    project-models: Get all m

    View Article
  • Overview

    Domino administrators have four important responsibilities when managing Domino Datasets:

    periodically check the Datasets administration interface

    monitor and track storage consumption

    set limits on usage per-Dataset

    handle deletion of Dataset snapshots

    Prerequisites

    Domino 3.3+ with Datasets enabled

    Accessing the Datasets administration interface

    To access the Datasets administration interface, click Admin from the Domino main menu to open the Admin home, then click Advanced > Datasets.


    Monitoring Datasets usage

    The Datasets administration page shows important information about Datasets usage in your deployment. At the top of the interface is a display that shows:

    total storage size used by all stored Snapshots

    the size of all storage used by Snapshots marked for deletion

    Below that display is a table of all Snapshots from the history of the deployment. This table can be sorted by Snapshot status, size, and the name of the containing Dataset.

    Setting limits on Datasets usage

    There are two important central configuration options administrators can use to limit the growth of storage consumption by Datasets.

    Namespace: common

    Key: com.cerebro.domino.dataset.quota.maxActiveSnapshotsPerDataSet

    Value: number

    Default: 20

    This option controls the maximum number of active Snapshots that may be stored in a Dataset. Snapshots marked for deletion are not active and do not count against this limit.

    Namespace: common

    Key: com.cerebro.domino.dataset.quota.maxStoredSnapshotsPerDataSet

    Value: number

    Default: 20

    This option controls the total number of Snapshots of any status that may be stored in a Dataset.

    If a Dataset reaches one of these limits, attempting to start a run with a Dataset configuration that could output a new Snapshot will result in an error message. Before additional Snapshots can be written, you will need to delete old snapshots or increase the limit.
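
    For example, to allow up to 50 active Snapshots per Dataset (the value 50 is illustrative), you would update the first option above:

    Namespace: common

    Key: com.cerebro.domino.dataset.quota.maxActiveSnapshotsPerDataSet

    Value: 50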

    Administrators can authorize individual projects to ignore these limits with an option in the Hardware & environment tab of the project settings.

    Deleting Snapshots from Datasets

    Administrators can delete individual Snapshots at any time with the Delete button at the end of the row representing the Snapshot in the Datasets administration UI. Clicking this button will open a confirmation dialog, and if you choose to confirm, the Snapshot will be permanently deleted.

    To avoid losing user data, Domino recommends following a two-step process for Snapshot deletion, where the user who owns the Dataset first marks the Snapshot for deletion, and then an administrator takes action to permanently delete the Snapshot if appropriate. Non-administrator users can never permanently delete Snapshots on their own.

    From the Datasets administration UI, you'll find a button you can click to Delete all marked snapshots, and you can also sort the table of Snapshots by status to find and examine all Snapshots that have been marked for deletion.

    View Article
  • Overview

    Domino supports connecting to a Cloudera CDH5 cluster through the addition of cluster-specific binaries and configuration files to your Domino environment.

    At a high level, the process is as follows:

    Connect to your CDH5 edge or gateway node and gather the required binaries and configuration files, then download them to your local machine.

    Upload the gathered files into a Domino project to allow access by the Domino environment builder.

    Create a new Domino environment that uses the uploaded files to enable connections to your cluster.

    Enable YARN integration for the Domino projects that you want to use with the CDH5 cluster.

    Note

    Domino supports the following types of connections to a CDH5 cluster:

    FS shell

    spark2-shell

    spark2-submit

    pyspark

    YARN shell

    Gathering the required binaries and configuration files

    You will find most of the necessary files for setting up your Domino environment on your CDH5 edge or gateway node. To get started, connect to the edge node via SSH, then follow the steps below.

    Create a directory named hadoop-binaries-configs at /tmp.

    mkdir /tmp/hadoop-binaries-configs

    Create the following subdirectories inside /tmp/hadoop-binaries-configs/.

    mkdir /tmp/hadoop-binaries-configs/configs

    mkdir /tmp/hadoop-binaries-configs/parcels

    (Optional) If your cluster uses Kerberos authentication, create the following subdirectory in /tmp/hadoop-binaries-configs/.

    mkdir /tmp/hadoop-binaries-configs/kerberos

    Then, copy the krb5.conf Kerberos configuration file from /etc/ to /tmp/hadoop-binaries-configs/kerberos.

    cp /etc/krb5.conf /tmp/hadoop-binaries-configs/kerberos/

    Copy the CDH and SPARK2 directories from /opt/cloudera/parcels/ to /tmp/hadoop-binaries-configs/parcels/. These directories will have a version number appended to their names, so complete the appropriate directory name in the commands shown below.

    cp -R /opt/cloudera/parcels/CDH-<version>/ /tmp/hadoop-binaries-configs/parcels/

    cp -R /opt/cloudera/parcels/SPARK2-<version>/ /tmp/hadoop-binaries-configs/parcels/

    Copy the hadoop, hive, spark, and spark2 directories from /etc/ to /tmp/hadoop-binaries-configs/configs/.

    cp -R /etc/hadoop /tmp/hadoop-binaries-configs/configs/

    cp -R /etc/hive /tmp/hadoop-binaries-configs/configs/

    cp -R /etc/spark2 /tmp/hadoop-binaries-configs/configs/

    cp -R /etc/spark /tmp/hadoop-binaries-configs/configs/

    On the edge node, run the following command to identify the version of Java running on the cluster.

    java -version

    You should then download a JDK .tar file from the Oracle downloads page that matches that version. The filename will have a pattern like the following.

    jdk-8u211-linux-x64.tar.gz

    Keep this JDK handy on your local machine for use in a future step.

    Compress the /tmp/hadoop-binaries-configs/ directory to a gzip archive.

    cd /tmp

    tar -zcf hadoop-binaries-configs.tar.gz hadoop-binaries-configs

    When finished, use SCP to download the archive to your local machine.
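
    For example, from your local machine (the username, key path, and edge node hostname below are placeholders):

    scp -i ~/my-private-key <user>@<edge-node-host>:/tmp/hadoop-binaries-configs.tar.gz .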

    Next, you'll need to extract the archive on your local machine, add a java subdirectory, then add the JDK .tar file you downloaded earlier to the java subdirectory.

    tar xzf hadoop-binaries-configs.tar.gz

    mkdir hadoop-binaries-configs/java

    cp jdk-8u211-linux-x64.tar.gz hadoop-binaries-configs/java/

    When finished, your hadoop-binaries-configs directory should have the following structure.

    hadoop-binaries-configs/

    configs/

    hadoop/

    hive/

    spark/

    spark2/

    java/

    jdk-8u211-linux-x64.tar.gz

    parcels/

    CDH-version/

    SPARK2-version/

    kerberos/ # optional

    krb5.conf

    If your directory contains all the required files, you can now compress it to a gzip archive again in preparation for uploading to Domino in the next step.

    tar -zcf hadoop-binaries-configs.tar.gz hadoop-binaries-configs

    Uploading the binaries and configuration files to Domino

    Use the following procedure to upload the archive you created in the previous step to a public Domino project. This will make the file available to the Domino environment builder.

    Log in to Domino, then create a new public project.


    Open the Files page for the new project, then click to browse for files and select the archive you created in the previous section. Then click Upload.

    Once the archive has been uploaded, click the gear menu next to it on the Files page, then right click Download and click Copy Link Address. Save the copied URL in your notes, as you will need it in the next step.

    Once you have recorded the download URL of the archive, you're ready to build a Domino environment for connecting to your CDH5 cluster.

    Creating a Domino environment for connecting to CDH5

    Click Environments from the Domino main menu, then click Create Environment.

    Give the environment an informative name, then choose a base environment that includes the version of Python that is installed on the nodes of your CDH5 cluster. Most Linux distributions ship with Python 2.7 by default, so you will see the Domino Analytics Distribution for Python 2.7 used as the base image in the following examples. Click Create when finished.

    After creating the environment, click Edit Definition. Copy the below example into your Dockerfile Instructions, then be sure to edit it wherever necessary with values specific to your deployment and cluster.

    Note

    In this Dockerfile, wherever you see a hyphenated instruction enclosed in carets like <paste-your-domino-download-url-here>, be sure to replace it with the corresponding value you recorded in previous steps.

    You may also need to edit commands that follow to match downloaded filenames.

    USER root

    # Give user ubuntu ability to sudo as any user including root

    RUN echo "ubuntu ALL=(ALL:ALL) NOPASSWD: ALL" >> /etc/sudoers

    # Set up directories

    RUN mkdir -p /opt/cloudera/parcels && \

    mkdir /tmp/domino-hadoop-downloads && \

    mkdir /usr/java

    # Download the binaries and configs gzip you uploaded to Domino.

    # This downloaded gzip file should have the following

    # - CDH and Spark2 parcel directories in a 'parcels' sub-directory.

    # - java installation tar file in 'java' sub-directory

    # - krb5.conf in 'kerberos' sub-directory

    # - hadoop, hive, spark2 and spark config directories in a 'configs' sub-directory

    RUN wget --no-check-certificate <paste-your-domino-download-url-here> -O /tmp/domino-hadoop-downloads/hadoop-binaries-configs.tar.gz && \

    tar xzf /tmp/domino-hadoop-downloads/hadoop-binaries-configs.tar.gz -C /tmp/domino-hadoop-downloads/

    # Install kerberos client and update the kerberos configuration file

    RUN apt-get -y install krb5-user telnet && \

    cp /tmp/domino-hadoop-downloads/hadoop-binaries-configs/kerberos/krb5.conf /etc/krb5.conf

    # Install version of Java that matches hadoop cluster and update environment variables

    # Note that your JDK may have a different filename depending on your cluster's version of Java

    RUN tar xvf /tmp/domino-hadoop-downloads/hadoop-binaries-configs/java/jdk-8u162-linux-x64.tar -C /usr/java

    ENV JAVA_HOME=/usr/java/jdk1.8.0_162

    RUN echo "export JAVA_HOME=/usr/java/jdk1.8.0_162" >> /home/ubuntu/.domino-defaults && \

    echo "export PATH=$JAVA_HOME/bin:$PATH" >> /home/ubuntu/.domino-defaults

    # Install CDH hadoop-client binaries from cloudera ubuntu trusty repository.

    # This example shows client binaries for CDH version 5.15 here.

    # Update these commands with the CDH version that matches your cluster.

    RUN echo "deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh trusty-cdh5.15.0 contrib" >> /etc/apt/sources.list.d/cloudera.list && \

    echo "deb-src http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh trusty-cdh5.15.0 contrib" >> /etc/apt/sources.list.d/cloudera.list && \

    wget http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/archive.key -O /tmp/domino-hadoop-downloads/archive.key && \

    apt-key add /tmp/domino-hadoop-downloads/archive.key && \

    apt-get update && \

    apt-get -y -t trusty-cdh5.15.0 install zookeeper && \

    apt-get -y -t trusty-cdh5.15.0 install hadoop-client

    # Copy CDH and Spark2 parcels to correct directories and update symlinks

    # Note that the version strings attached to your directory names may be different than the below examples.

    RUN mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21 /opt/cloudera/parcels/ && \

    mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809 /opt/cloudera/parcels/ && \

    ln -s /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21 /opt/cloudera/parcels/CDH && \

    ln -s /opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809 /opt/cloudera/parcels/SPARK2

    # Copy hadoop, hive and spark2 configurations

    RUN mv /etc/hadoop /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/hadoop-etc-local.backup && \

    mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/hadoop /etc/hadoop && \

    mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/hive /etc/hive && \

    mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/spark2 /etc/spark2 && \

    mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/spark /etc/spark

    # Create alternatives for hadoop configurations. Update the extensions with the same strings as found in your edge node

    # Example: In the command 'update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.cloudera.yarn 55'

    # make sure that /etc/hadoop/conf.cloudera.yarn is named the same as the corresponding file on your edge node.

    # Sometimes in the CDH5 edgenode, that is named something like /etc/hadoop/conf.cloudera.yarn_

    RUN update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.cloudera.yarn 55 && \

    update-alternatives --install /etc/hive/conf hive-conf /etc/hive/conf.cloudera.hive 55 && \

    update-alternatives --install /etc/spark2/conf spark2-conf /etc/spark2/conf.cloudera.spark2_on_yarn 55 && \

    update-alternatives --install /etc/spark/conf spark-conf /etc/spark/conf.cloudera.spark_on_yarn 55

    # These instructions are for Spark2

    # Creating alternatives for Spark2 binaries, also create symlink for pyspark pointing to pyspark2

    RUN update-alternatives --install /usr/bin/spark2-shell spark2-shell /opt/cloudera/parcels/SPARK2/bin/spark2-shell 55 && \

    update-alternatives --install /usr/bin/spark2-submit spark2-submit /opt/cloudera/parcels/SPARK2/bin/spark2-submit 55 && \

    update-alternatives --install /usr/bin/pyspark2 pyspark2 /opt/cloudera/parcels/SPARK2/bin/pyspark2 55 && \

    ln -s /usr/bin/pyspark2 /usr/bin/pyspark

    # Update SPARK and HADOOP environment variables. Make sure py4j file name is correct per your edgenode

    ENV SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2

    RUN echo "export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop" >> /home/ubuntu/.domino-defaults && \

    echo "export HADOOP_CONF_DIR=/etc/hadoop/conf" >> /home/ubuntu/.domino-defaults && \

    echo "export YARN_CONF_DIR=/etc/hadoop/conf" >> /home/ubuntu/.domino-defaults && \

    echo "export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2" >> /home/ubuntu/.domino-defaults && \

    echo "export SPARK_CONF_DIR=/etc/spark2/conf" >> /home/ubuntu/.domino-defaults && \

    echo "export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip" >> /home/ubuntu/.domino-defaults

    # Change spark-defaults.conf file permission

    RUN mv /etc/spark2/conf/spark-defaults.conf /etc/spark2/ && \

    chmod 777 /etc/spark2/conf.cloudera.spark2_on_yarn

    # Copy hive-site.xml to /etc/spark2/conf to access hive tables from Spark2.

    RUN cp /etc/spark2/conf/yarn-conf/hive-site.xml /etc/spark2/conf/

    Scroll down to the Pre Run Script field and add the following lines.

    cat /etc/spark2/spark-defaults.conf >> /etc/spark2/conf/spark-defaults.conf

    sed -i.bak '/spark.ui.port\=0/d' /etc/spark2/conf/spark-defaults.conf

    Scroll down and click Advanced to expand additional fields. Add the following line to the Post Setup Script field.

    echo "export YARN_CONF_DIR=/etc/hadoop/conf" >> /home/ubuntu/.bashrc

    Click Build when finished editing the Dockerfile instructions. If the build completes successfully, you are ready to try using the environment.

    Configure a Domino project for use with a CDH5 cluster

    This procedure assumes that an environment with the necessary client software has been created according to the instructions above. Ask your Domino admin for access to such an environment.

    Open the Domino project you want to use with your CDH5 cluster, then click Settings from the project menu.

    On the Integrations tab, click to select YARN integration from the Apache Spark panel, then click Save. You do not need to edit any of the fields in this section.

    If your cluster uses Kerberos authentication, you can configure Kerberos credentials at the user level or project level. Do so before attempting to use the environment. Note that if you followed the instructions above on creating your environment, your Kerberos configuration file has already been added to it.

    On the Hardware & Environment tab, change the project default environment to the one with the cluster's binaries and configurations files installed.

    You are now ready to start Runs from this project that interact with your CDH5 cluster.

    View Article
  • Overview

    In Domino 4.0+ you can add goals to projects. Goals represent outcomes or subtasks within the project. Contributors to the project can link files, Workspace sessions, Jobs, and Apps to goals, which show up on the goal card in the project overview.

    This provides a way to track all work related to a specific goal in the project, and can make navigating large and busy projects easier.

    Creating goals

    From the project overview, click the Goals tab, then click + Add Goals. Provide a title and description, then click Save.

    Managing goals

    You can always see the current status of a goal by returning to the Goals tab on the project overview. Click the menu button in the top right of a goal card to see options for editing, deleting, or changing the completion status of a goal.

    Linking work to goals

    You can link Workspace sessions or Jobs to a goal by checking the desired entries in their respective dashboards and clicking the Link to goal button.

    You can link an App to a goal by clicking Link to goal on the App settings tab.

    You can link Model APIs to goals by clicking Link to Goal in the Actions column of the Model APIs overview.

    You can also link files to goals by clicking Link to Goal in the upper right of the file view.

    View Article
  • Prerequisites

    Domino 4.0+

    Overview

    Users in Domino can author and publish several types of data products and assets, including:

    Model APIs

    Apps

    Launchers

    Scheduled Jobs

    These are valuable, lasting resources that are meant to deliver results for consumers and colleagues. To make it easy to discover assets, users in Domino 4.0+ can find all of the above types of assets they have access to in the Assets Portfolio in the Control Center.

    Opening the Assets Portfolio

    To open the Assets Portfolio, use the Switch to menu to open the Control Center, then click Assets in the Domino main menu.


    Navigating the Assets Portfolio

    The Assets Portfolio will show assets in projects that you own or have been added to as a collaborator. Additionally, Domino System Administrators will see assets in all projects across the Domino instance, and Domino Project Managers will see assets in all projects owned by users in their organizations.

    When first opened, the Assets Portfolio will show Model APIs by default. Click one of the four asset type buttons at the top of the Assets Portfolio to view that type of asset.

    For each asset type, the table of information will include unique columns that provide useful metrics and information specific to that asset. You can select which columns you want to view with the column-picker above the table, and you can filter the table by typing a query in the text box next to it.

    View Article
  • Overview

    Each run and workspace in Domino operates in its own Docker container. These Docker containers are defined by Domino environments. Environments can be shared and customized, and they are automatically versioned by Domino.

    New installations of Domino come with a standard set of environments and associated Docker images. Periodically, Domino publishes a new set of standard environments with updated libraries and packages. This article describes how to adopt the latest Domino standard environments.

    Standard Environments

    Domino maintains three standard compute environments:

    Domino Analytics Distribution (DAD) for Python 2.7

    Domino Analytics Distribution (DAD) for Python 3.6

    Domino Analytics Distribution (DAD) for Python 3.7

    The Domino Analytics Distributions are designed to handle most of what a typical data science workflow needs out of the box, and differ only on which version of Python they have installed.

    These images are hosted on Quay.io, a third-party Docker registry. By default, your Domino deployment comes configured with keys to access Domino’s images in this service.

    The URLs for the latest versions of Domino Standard Environments are:

    DAD Python 2.7: quay.io/domino/base:Ubuntu18_DAD_Py2.7_R3.5-20190501

    DAD Python 3.6: quay.io/domino/base:Ubuntu18_DAD_Py3.6_R3.5-20190501

    DAD Python 3.7: quay.io/domino/base:Ubuntu18_DAD_Py3.7_R3.5-20190501

    Contents of the Base Environments:

    Ubuntu 18.04

    Anaconda Python 2.7, 3.6 or 3.7

    R 3.5.3

    Jupyter Lab 0.35.4

    Jupyter 1.0

    RStudio 1.2

    VSCode

    Keras 2.2.4 and Tensorflow-gpu* 1.13.1

    CUDA 10.0.130 with Nvidia driver 410.104

    Open JDK 8

    Scala 2.12.6

    Julia 1.1

    Node 8.10

    Various common utilities (curl, vim, AWS CLI, telnet, etc.)

    See here for a full list of python and R packages:

    Ubuntu18_DAD_Py2.7_R3.5-20190501

    Ubuntu18_DAD_Py3.6_R3.5-20190501

    Ubuntu18_DAD_Py3.7_R3.5-20190501

    *Note: to use Tensorflow on a CPU machine you must install Tensorflow's CPU version (e.g. pip install tensorflow).

    Identifying standard environments in your deployment

    To determine which version of Domino standard environments your deployment is using, open Domino and click Environments in the top navigation bar. You can use the search bar to find environments with Domino Analytics in the name. However, previous administrators or installers may have added these environments under a different name. To confirm that an environment is using a Domino standard, click on it to open the Overview.

    Creating a new AMI

    Note:

    Your installation of Domino may be set up with a previous version of our standard base image. Examples of historical base images:

    quay.io/domino/base:Ubuntu16_DAD_Py2.7_R3.4-20180727

    quay.io/domino/base:Ubuntu16_DAD_Py3.6_R3.4-20180727

    quay.io/domino/base:DAD_py2.7_R3.4_23052018

    quay.io/domino/base:DAD_py3.6_R3.4_23052018

    quay.io/domino/base:DED_py2.7_R3.4_23052018

    quay.io/domino/base:DED_py3.6_R3.4_23052018

    quay.io/domino/base:2016-12-07_1239_flat2

    You may also have a custom base image with your company's name in the URL.

    If you are using one of these older images, you may require an upgrade to your system before you are able to utilize the Ubuntu 16.04 or 18.04 images first released in August 2018.

    Adding standard environments to your deployment

    Adding a Domino standard environment to your deployment involves four high-level steps.

    Create a new environment with the correct defaults for the standard you want to add.

    See Compute Environment Management for an overview of how to create and manage environments.

    See the table of environment details below to populate the Base Image, Dockerfile Instructions, and Pluggable Workspace Tools fields. These fields must be filled out correctly for the environment to work as intended.

    Critical Note: To use the latest version of the base image, you must have the Pluggable Workspace Tools feature turned on.

    Copy over any additional customization from your existing default compute environment.

    If you added additional packages and libraries to your previous default environment, and you intend to use the new standard environment as your deployment default, you should copy over those additions to the new environment you created. These can be added to the end of the Dockerfile Instructions in the new environment.

    You should not copy over any instructions that update Python, R, RStudio, or other core components of the standard environments if the new standard already includes the version you want.

    Be sure to copy over any setup scripts and run scripts attached to your previous default, as your users may depend on them.

    Update the Nvidia drivers on your executor template (AWS customers using GPUs only) or update the Nvidia drivers on your GPU executor (on-premise customers using GPUs only).

    Unlike most other changes you make to a compute environment, packages optimized for GPUs (e.g. Tensorflow-gpu and CUDA) create a dependency between the host machine and the container. In this case the Nvidia drivers on the host and in the container must match.

    Please reach out to [email protected]

    Create a new machine image for your executor template that includes the new environment. (AWS customers only)

    See Creating a new AMI for instructions.

    Once this is done, you can set the new environment as the deployment default.

    You may also choose to archive the previous default environment. Any existing projects using an archived environment will be able to continue using it, but new projects will not have access to it.

    Detailed information about standard environments

    This section contains detailed specifications about the three Domino Standard Environments, plus the configuration settings needed when adding them to your deployment.

    Domino Analytics Distribution (DAD) for Python 2

    Title:

    Domino Analytics Distribution py2.7 R3.5

    Description:

    Ubuntu 18.04

    Anaconda Python 2.7

    R 3.5.3

    Jupyter Lab 0.35.4, Jupyter 1.0, RStudio 1.2, VSCode

    Keras 2.2.4 and Tensorflow-gpu* 1.13.1

    CUDA 10.0.130 with Nvidia driver 410.104

    For python packages run: !pip freeze

    For R packages run: installed.packages()

    For further detail, please ask Domino Support for the full Dockerfile

    Base Image URL:

    quay.io/domino/base:Ubuntu18_DAD_Py2.7_R3.5-20190501

    Pluggable Workspace Tools

    jupyterlab:
      title: "JupyterLab (Beta)"
      iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
      start: [ /var/opt/workspaces/Jupyterlab/start.sh ]
      httpProxy:
        internalPath: "/{{#if pathToOpen}}/lab/tree/{{pathToOpen}}{{/if}}"
        port: 8888
        rewrite: false
    rstudio:
      title: "RStudio"
      iconUrl: "/assets/images/workspace-logos/Rstudio.svg"
      start: [ "/var/opt/workspaces/rstudio/start" ]
      httpProxy:
        port: 8888
    vscode:
      title: "vscode"
      iconUrl: "/assets/images/workspace-logos/vscode.svg"
      start: [ "/var/opt/workspaces/vscode/start" ]
      httpProxy:
        port: 8888
    jupyter:
      title: "Jupyter (Python, R, Julia)"
      iconUrl: "/assets/images/workspace-logos/Jupyter.svg"
      start: [ "/var/opt/workspaces/jupyter/start" ]
      httpProxy:
        port: 8888
        rewrite: false
        internalPath: "/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
      supportedFileExtensions: [ ".ipynb" ]

    Domino Analytics Distribution (DAD) for Python 3.6

    Title:

    Domino Analytics Distribution py3.6 R3.5

    Description:

    Ubuntu 18.04

    Anaconda Python 3.6

    R 3.5.3

    Jupyter Lab 0.35.4, Jupyter 1.0, RStudio 1.2, VSCode

    Keras 2.2.4 and Tensorflow-gpu* 1.13.1

    CUDA 10.0.130 with Nvidia driver 410.104

    For python packages run: !pip freeze

    For R packages run: installed.packages()

    For further detail, please ask Domino Support for the full Dockerfile

    Base Image URL:

    quay.io/domino/base:Ubuntu18_DAD_Py3.6_R3.5-20190501

    Pluggable Workspace Tools

    jupyterlab:
      title: "JupyterLab (Beta)"
      iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
      start: [ /var/opt/workspaces/Jupyterlab/start.sh ]
      httpProxy:
        internalPath: "/{{#if pathToOpen}}/lab/tree/{{pathToOpen}}{{/if}}"
        port: 8888
        rewrite: false
    rstudio:
      title: "RStudio"
      iconUrl: "/assets/images/workspace-logos/Rstudio.svg"
      start: [ "/var/opt/workspaces/rstudio/start" ]
      httpProxy:
        port: 8888
    vscode:
      title: "vscode"
      iconUrl: "/assets/images/workspace-logos/vscode.svg"
      start: [ "/var/opt/workspaces/vscode/start" ]
      httpProxy:
        port: 8888
    jupyter:
      title: "Jupyter (Python, R, Julia)"
      iconUrl: "/assets/images/workspace-logos/Jupyter.svg"
      start: [ "/var/opt/workspaces/jupyter/start" ]
      httpProxy:
        port: 8888
        rewrite: false
        internalPath: "/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
      supportedFileExtensions: [ ".ipynb" ]

    Domino Analytics Distribution (DAD) for Python 3.7

    Title:

    Domino Analytics Distribution py3.7 R3.5

    Description:

    Ubuntu 18.04

    Anaconda Python 3.7

    R 3.5.3

    Jupyter Lab 0.35.4, Jupyter 1.0, RStudio 1.2, VSCode

    Keras 2.2.4 and Tensorflow-gpu* 1.13.1

    CUDA 10.0.130 with Nvidia driver 410.104

    For python packages run: !pip freeze

    For R packages run: installed.packages()

    For further detail, please ask Domino Support for the full Dockerfile

    Base Image URL:

    quay.io/domino/base:Ubuntu18_DAD_Py3.7_R3.5-20190501

    Pluggable Workspace Tools

    jupyterlab:
      title: "JupyterLab (Beta)"
      iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
      start: [ /var/opt/workspaces/Jupyterlab/start.sh ]
      httpProxy:
        internalPath: "/{{#if pathToOpen}}/lab/tree/{{pathToOpen}}{{/if}}"
        port: 8888
        rewrite: false
    rstudio:
      title: "RStudio"
      iconUrl: "/assets/images/workspace-logos/Rstudio.svg"
      start: [ "/var/opt/workspaces/rstudio/start" ]
      httpProxy:
        port: 8888
    vscode:
      title: "vscode"
      iconUrl: "/assets/images/workspace-logos/vscode.svg"
      start: [ "/var/opt/workspaces/vscode/start" ]
      httpProxy:
        port: 8888
    jupyter:
      title: "Jupyter (Python, R, Julia)"
      iconUrl: "/assets/images/workspace-logos/Jupyter.svg"
      start: [ "/var/opt/workspaces/jupyter/start" ]
      httpProxy:
        port: 8888
        rewrite: false
        internalPath: "/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
      supportedFileExtensions: [ ".ipynb" ]

    View Article
  • WARNING

    This is an advanced procedure. If done improperly, it could leave your deployment in an inoperable state. This document is intended for experienced administrators only. Please reach out to [email protected] if you have questions.

    This procedure only covers deployments using elastic compute resources in Amazon Web Services (AWS).

    Overview

    Domino’s cluster of executors dynamically scales up and down based on the current demand for compute resources, driven by the number of active runs. When Domino needs to create a new machine to add to the cluster, Domino uses a hardware tier definition connected to an AWS Amazon Machine Image (AMI) template to define the starting state of the machine.

    Each run or workspace that a user starts will run in a Docker container with an associated Docker image on a machine in the executor cluster. Domino pulls the required Docker image from an internal or external Docker registry.

    In order to minimize the time it takes to pull the Docker image onto a new machine, we suggest that you add your base image and most common environment images to your executor template, and create a new AMI for future executors. This way, the Docker layers do not need to be downloaded from the registry onto each new executor, and instead are available immediately when the machine is spun up. See Run States to learn more about the life cycle of a run.

    This article describes the process and best practices for creating a new AMI. The process involves use of the executor template, which is an idle executor machine that is not used for runs, but exists only to be a fresh template. You will need access to the AWS console for the account where your deployment is running to find this machine and perform the necessary steps.

    Procedure

    Log in to the Executor Template

    Log in to the AWS console where your Domino deployment is installed, open the EC2 service, click Instances in the sidebar, and find the executor template instance. This instance should be tagged with a name that includes the string executor-template. Start the template machine if it is stopped.

    Using its IP or AWS DNS address, SSH into the executor template machine using your deployment’s private key. Example:

    ssh ec2-xx-xx-xxx-xxx.us-west-2.compute.amazonaws.com -i ~/my-private-key

    This key should be supplied by Domino engineers following your Domino installation. If you do not have the key, reach out to [email protected].

    Pull your desired Docker images

    Run docker images as the root user to see what images are cached.

    Run docker pull followed by an image URL to cache the specified image on the executor. If an image was built within Domino, you can find the URL on the Revisions tab for the environment in the Domino UI. Example:

    docker pull quay.io/domino/base:DED_py3.6_R3.4_23052018

    Snap the AMI

    Run salt-call state.highstate. This applies all necessary software and system updates.

    Select the executor template machine in the AWS EC2 console, then click Actions -> Images -> Create Image. Name the new AMI domino-<deployment-name>-executor-YYYYMMDD-HHMM. Use the default storage volumes, but be sure to check Delete on Termination for all volumes.

    From the sidebar, click AMIs. Wait for the new AMI to have a status of available. You may need to refresh to see the table update. Once it’s ready, record the AMI ID. You will need the ID to set up the AMI for use in Domino.

    Test the new AMI

    In the Domino application, open an existing unused hardware tier, or create a new hardware tier for testing.

    Edit the hardware tier, and set the AMI ID to the one you recorded in the previous section.

    Set up a Domino project to use the hardware tier you just edited, and use an environment that you cached an image for earlier. When you start a workspace in this project, you should see it progress through a queued state as it starts up the new machine, but spend zero (or minimal) time in a pulling state.

    Apply the new AMI to other hardware tiers

    NOTE

    Be sure to alert users of incoming changes to their hardware tiers, or conduct these steps during a maintenance window.

    Make note of the current AMI IDs used by existing hardware tiers. You can use these notes to revert later if needed.

    Before updating all hardware tiers, make sure you don’t have any hardware tiers that use special AMIs. For instance, some GPU workloads may use a special hardware tier with a customized AMI running Ubuntu 16.04. Do not change such tiers to use the new AMI.

    You can update hardware tiers individually to use the new AMI by editing them in the Domino application and entering the new ID. Alternatively, you can update all hardware tiers to use the new AMI for all new machines by connecting to the Domino central server via SSH, and running the following MongoDB command:

    db.executor_group_configuration.update({},{$set:{"executorImage":"NEW_AMI_ID"}},{multi:true})

    Currently running executors will not automatically switch to the new AMI. You can place such machines in Maintenance Mode, preventing new runs from starting on that machine, and manually terminate the machine when live runs have concluded. They will be replaced by executors created with the new AMI when compute demand triggers a new machine spin-up.

    FAQ

    How often should I snap a new AMI?

    We recommend that administrators review their AMIs and compute environments quarterly, or if you’ve noticed that users have custom compute environments that take a long time to pull when starting runs. You can refactor those environments by removing common custom instructions and adding them to a base image. You can then add this new base image to your AMI, and those common instructions will be cached.

    Which Docker images should I add to the AMI?

    Docker operates in layers. For example, consider two images with layers ABC and ABCDE respectively. These images share their first three layers, each layer being the state generated by a line in the Dockerfile. If an image with layers ABC is already cached on a machine, then only layers D and E need to be downloaded when you want to use an image with layers ABCDE. We recommend that you build most of your environments on top of a small number (<5) of base images, and that you add those images to your AMI. There’s no hard limit to the number of images you can cache, but adding more images requires more disk space on executors.

    Should I remove old images from the AMI?

    This is not required. You may want to keep them to maintain backwards compatibility, or you may choose this as an opportunity to encourage users to start working from the latest image. The only consequence of removing an older image from the AMI is longer pulling times for users who start runs with environments that depend on that image.

    View Article
  • You can assign admin access roles to certain users.

    The available roles are:

    System Admin

    Support Staff

    System Librarian

    Project Manager (Domino 4.0+)

    Permission

    System Admin

    Support Staff

    System Librarian

    Project Manager (Organization)

    View Dispatcher

    X

    X

    Stop Runs

    X

    X

    Launch/Stop/Start Executors

    X

    X

    Edit Hardware Tier/Executor Configuration

    X

    X

    View Logs

    X

    X

    View Usage

    X

    X

    Upgrade/Stop/Restart Server

    X

    X

    Manage API Endpoints

    X

    X

    Manage Project Tags

    X

    X

    X

    List All Projects

    X

    X

    X

    Preview Projects

    X

    X

    X

    Curate Projects

    X

    X

    X

    Edit Global Compute Environment

    X

    Edit Configuration Details

    X

    Manage Organizations

    X

    View Control Center

    X

    Global Project Access

    X

    Global Model APIs Access

    X

    Run MongoDB Commands

    X

    Manage User Roles

    X

    Manage Datasets

    X

    View Assets

    X

    X

    A System Admin user can grant access roles to other users. To do so, visit the admin page and click Users from the top menu. Locate the user you want to grant permissions to, click Edit next to the username, then select the desired role.

    About the Project Manager Role

    In Domino 4.0+ users may be assigned a role of Project Manager. When Project Managers are members of organizations, their role grants them contributor-level access to all projects that are owned by other members of the organizations. This allows the Project Manager to see these projects and their assets in the Projects Portfolio and Assets Portfolio.

    Note that the Project Manager may also have the ability to add users to these organizations, thereby gaining contributor access to those users' projects. For this reason, Project Manager should be treated as a highly privileged role, similar to System Administrator.

    View Article
  • Overview

    Domino maintains a pool of machines called executors, organized into hardware tiers for use in Domino Runs. Domino system administrators can monitor and take actions on executors from the Admin interface, and this document will describe how to monitor and work with the fleet of executors in your Domino deployment.

    Contents

    Monitoring and working with executors

    Executor state

    Taking actions on executors

    Health checks

    Health check failures

    Unresponsive executors

    Executors dead on arrival

    Maintenance Mode

    Configurable timeout settings

    Monitoring and working with executors

    Domino system administrators can click Dispatcher from the Admin home to view and manage executors in the deployment. This interface features live updating, and shows all currently configured hardware tiers and executors, with information about usage and current state for each.


    Executor state

    In the Dispatcher interface, you can see information about your executors. You will see flags for the following states on executors.

    Available

    Available executors are ready for use, and may be assigned Runs, unless they have been manually put in Maintenance Mode.

    Unusable

    Domino will not assign Runs to executors in an Unusable state, and executors in an Unusable state do count against the total number of allowed executors in their hardware tiers. This means that a large number of executors in an Unusable state can fill the capacity of a hardware tier and limit its availability. If this occurs, system administrators should either take action to manually terminate them, wait for the executors to be automatically terminated, or put the executors into Maintenance Mode to attempt to repair them.

    Maintenance Mode

    You will see a flag indicating executors that have been put into Maintenance Mode, plus an optional comment from the admin who toggled Maintenance Mode on for that executor.

    Taking actions on executors

    The rightmost column for each executor in the Dispatcher interface is an Actions link that opens an interface where Domino system administrators can take the following actions.

    Toggle Maintenance Mode on the executor

    Start or stop the executor

    Terminate the executor

    Restart the executor

    Health checks

    Domino executors are subject to several periodic health checks. There are two checks that test if the Dispatcher is able to connect to vital services running on the executor.

    However, there is also a configurable health check for disk space. If com.cerebro.domino.executor.minUsableSpaceInGB is set to a non-zero value, disk space health checks will run and executors will fail the health check if their available disk space is lower than the minimum of the following two configuration options.

    Namespace: common

    Key: com.cerebro.domino.executor.diskSpaceRunsGarbageCollectorFreeSpaceLimit

    Value: number of bytes

    Default: 50000000000 (this is ~50GB in bytes)

    Namespace: common

    Key: com.cerebro.domino.executor.minUsableSpaceInGB

    Value: number of gigabytes

    Default: 0

    If this option is set to its default value of 0, disk space health checks will be disabled and will not run or impact your executors.

    Note: These settings can be configured in the Central Config settings
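    As a concrete illustration, here is a minimal Python sketch (not Domino source code) of the threshold behavior described above; the GB-to-bytes conversion for minUsableSpaceInGB is an assumption for illustration only.

    # Minimal sketch of the disk space health check logic described above.
    GC_FREE_SPACE_LIMIT_BYTES = 50_000_000_000  # diskSpaceRunsGarbageCollectorFreeSpaceLimit default
    MIN_USABLE_SPACE_GB = 0                     # minUsableSpaceInGB default; 0 disables the check

    def fails_disk_space_check(available_bytes):
        if MIN_USABLE_SPACE_GB == 0:
            return False  # disk space health checks are disabled
        # Executor fails if free space is below the minimum of the two settings.
        threshold_bytes = min(GC_FREE_SPACE_LIMIT_BYTES, MIN_USABLE_SPACE_GB * 10**9)  # conversion assumed
        return available_bytes < threshold_bytes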

    Health check failures

    If an executor fails a health check, the following process occurs.

    On the next Dispatcher tick, the Dispatcher will stop scheduling Runs on the executor.

    The executor goes idle once any existing Runs finish.

    After 5 minutes (configurable) of idle, the executor enters an Unusable state visible in its Executor State column on the Dispatcher interface.

    After 15 minutes (configurable) in Unusable state, the executor is stopped.

    48 hours (configurable) after stopping, executors in an Unusable state are terminated.

    Unresponsive executors

    When previously functional executors become completely unresponsive, Domino executes the following process.

    After 15 minutes (configurable) of being unresponsive, the executor is placed in an Unusable state and stopped immediately.

    48 hours (configurable) after stopping, the instance is terminated.

    Executors dead on arrival

    When Domino attempts to start a new executor that never becomes responsive, the following process occurs.

    After 15 minutes (configurable) of being unresponsive, the executor is placed in an Unusable state and stopped immediately.

    48 hours (configurable) after stopping, the instance is terminated.

    Maintenance Mode

    From the Actions interface for an individual executor, Domino system administrators can enable Maintenance Mode on the executor. This does the following things.

    Executors in Maintenance Mode will not be assigned new Runs by the Dispatcher.

    Executors in Maintenance Mode will not be automatically terminated by Unusable state timeouts.

    Executors in Maintenance Mode do not count against the total executor limits for their hardware tiers.

    An executor that is responsive and has been in Maintenance Mode for 120 minutes (configurable) will be stopped.

    An executor that is unresponsive and has been in Maintenance Mode for 15 minutes (configurable) will be stopped.

    An executor that is passing its health checks while in Maintenance Mode will attempt to rejoin the pool of Available executors in its hardware tier when Maintenance Mode is toggled off.

    Domino system administrators should consider putting executors in an Unusable state into Maintenance Mode if they believe the executor can be fixed and restored to healthy operation, or if they want to attempt to recover data from the executor and thus want to exempt it from automatic termination.

    Configurable timeout settings

    Namespace: common

    Key: com.cerebro.domino.executor.maxIdleMaintenanceModeTimeInMinutes

    Value: number of minutes

    Default: 120

    This is the time a machine can be responsive, running, and idle in Maintenance Mode before it will be automatically stopped.

    Namespace: common

    Key: com.cerebro.domino.dispatcher.clusterHealthMonitoring.unhealthyExecutorMMTimeout

    Value: JODA duration

    Default: 15m

    This is the duration before an unresponsive executor in Maintenance Mode will be stopped, and the duration before an unresponsive executor will be set to an Unusable state.

    Namespace: common

    Key: com.cerebro.domino.executor.healthCheckTimeoutInMinutes

    Value: number of minutes

    Default: 5

    This is the duration before an Available executor that is failing health checks will be put into an Unusable state.

    Namespace: common

    Key: com.cerebro.domino.dispatcher.clusterHealthMonitoring.unusable2StoppedExecutorTimeoutMin

    Value: number of minutes

    Default: 5

    This is the duration a machine in an Unusable state can remain idle before being stopped.

    Namespace: common

    Key: com.cerebro.domino.dispatcher.clusterHealthMonitoring.unusable2TerminatedExecutorTimeoutMin

    Value: number of minutes

    Default: 2880

    This is the duration a machine in an Unusable state can remain stopped before being terminated.

    Note: These settings can be configured in the Central Config settings

    View Article
  • Domino lets you create "organizations" with groups of users, so you can permission your project to many users at once and more easily add/remove collaborators from multiple projects.

    To create an Organization, navigate to your Account Settings page (the button with your username in the upper right), and click the "New Organization" button.

    An Organization is associated with one specific user account that owns and manages the organization (the "Admin"). If you are logged in under this account, you can manage organization membership on your Account page.

    From there, you can add/remove users from your Organization.

    Permissions of Organization Members

    All members of your Organization have "owner-level" access to any projects under the Organization's account. To make that concrete...

    Let's say you have an Organization account with username your_org; it has members nick, chris, and matthew. That means nick, chris, and matthew will all have owner-level access to any of your_org's projects, e.g., your_org/quick-start, your_org/project1.

    They cannot, however, change the Organization's membership -- only you can do that.

    Transferring projects to an Organization

    Organization members can transfer ownership of a project to the Organization by clicking Settings from the project menu, then on the Archive Project or Transfer Ownership tab, clicking Change Project Owner. Enter the current owner and project name, then for the New owner username use the name of the Organization.

    Organization Collaborators

    If you add an organization as a collaborator to one of your projects, all members of the organization will be granted collaborator-level access.

    Typical Uses for Organizations

    Simple project access management

    Project access management is the most common use for organizations in Domino.

    Domino sees organizations and users as near-equivalents; the only real difference is that an organization contains multiple users. Because of this, organizations are a handy tool for simplifying project access for groups of users.

    For example, inviting a team to join a project you are working on is as simple as creating an organization with all the members of that team, then adding that organization to the project. This also makes it easy to remove the team from the project later. You can also break an existing organization into smaller groups, then un-invite the original team while retaining the smaller, more relevant organization.

    You can also use organizations to provide restricted access to your projects, instead of full access. For example, to give a team read-only access to a project, simply add the organization to your project and set its role to Results Consumer. If there's a group that should only be able to use Launchers, add the organization and give them Launcher User access.

    Organization admins have the power to add or remove people from their organization. Adding users to (or removing them from) organizations instead of each individual project that the organization owns can be a real time-saver. Simply add or remove them once, at the organization level, and Domino will take care of updating their access to each project for you.

    Projects in production

    Organizations are also a useful way to manage projects that have reached a “production grade”-level of quality. Users can set up organizations that are specifically intended to be homes for projects that are ready to be promoted to production, or that have already been promoted. Users working with projects owned by this organization will understand that these projects should be carefully documented and kept free of clutter. Any changes that are needed will require a fork and merge request process. Runs executed in these projects end up being very deliberate, with the purpose of updating data or models.

    The process of using organizations to manage projects promoted to production can unfold in one of two ways. The first occurs when a project starts out as an individual user's project. That user invites others to this project, either individually or by using the Organizations feature, and over time other users come to find this project particularly useful. After a certain point, a decision is made that access to the project should be easier, and that it should be owned by the organization instead of the individual user.

    The other path involves initial ownership of the project by an organization. This usually happens when the project is deemed to be important from the very beginning, and the project can best be built up by a group. As soon as this project is created, all users have access to all its resources, and project development proceeds mostly by forking and merging. Execution of code is still done in the organization-owned project as a way to check in your analysis or results.

    Some Domino customers do this even for models that are not actually in production, as a way of submitting their work to be counted.

    Sharing compute environments

    One of the advantages of organizations is that they streamline and simplify the practice of sharing compute environments. Project owners who are members of an organization can use all of that organization’s compute environments for their own projects.

    For example, if a user is a member of a corporate organization, whatever environments that organization’s admins have created will be available to the user for their own projects. Domino does not constrain users to membership in just one organization. If a user is a member of two organizations - a ‘corporate organization’ with production environments and an ‘R&D organization’ with more cutting edge environments - that user has access to the compute environments of both organizations, as well as any global environments and environments the user already owns directly.

    What would happen if this user were to be removed from an organization he was a member of, even though several of his projects rely on that organization’s environments? In that case, the now-unavailable environments would be reset to the user’s default environment. Domino sends a notification to any affected users whenever this happens.

    View Article
  • Prerequisites

    Domino 3.0+

    Access to the Domino Analytics Distribution Py3.6 R3.4 standard environment

    Overview

    This article will show you how to publish a Python App with Dash in Domino 3.0+

    In this tutorial you will:

    configure a Domino environment with the necessary dependencies to publish a Dash App

    create a project and set it up for App publishing

    publish an App to the Domino launchpad

    observe how other users in Domino can use the App

    You'll be working with the second example from Basic Dash Callbacks, part of the Dash User Guide. In this example, the application serves an interactive scatter plot of countries by GDP per capita and life expectancy.

    It will take approximately 15 minutes to get this example running in Domino.

    Set up environment

    The first step is to create a Domino environment capable of running your App.

    From the Lab, click Environments.

    From the Environments Overview, click Create Environment.

    Give your environment a descriptive name and description, and select Domino Analytics Distribution Py3.6 R3.4 from the Environment dropdown under Base Image. This selection means that the setup instructions we provide for this environment will be applied on top of a base image with Python 3.6 and some analytics modules already installed. Read Domino standard environments to learn more about the contents of this base image.

    Click Create Environment when finished.

    You will be directed to the Overview tab for your new environment. Click Edit Dockerfile.

    In the Dockerfile Instructions field, paste in the following instructions:

    # Install the libraries we want to use in our app
    RUN pip install dash==0.22.0 && \
        pip install dash-renderer==0.13.0 && \
        pip install dash-html-components==0.11.0 && \
        pip install dash-core-components==0.26.0 && \
        pip install plotly --upgrade

    Click Build when finished.

    You will be directed to the Revisions tab for your environment. Here you'll be able to monitor the build process for your new version of the environment. If the build succeeds, you're ready to use this environment for App publishing.

    Set up project

    The next step is creating a project with the settings and content you need to publish your App.

    From the Lab, click Projects.

    Click New Project.

    Give your project an informative name, then click Create Project.

    Click Settings in the project sidebar, then set the Compute environment to the one you created in the previous step.

    Click Files in the project sidebar, then click Add File.

    Name the file app.py in the title field above the editor.

    In the body of the file, paste the following example App code.

    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    import pandas as pd
    import plotly.graph_objs as go

    df = pd.read_csv(
        'https://raw.githubusercontent.com/plotly/'
        'datasets/master/gapminderDataFiveYear.csv')

    app = dash.Dash()
    app.config.update({
        'requests_pathname_prefix': ''
    })

    app.layout = html.Div(style={'paddingLeft': '40px', 'paddingRight': '40px'}, children=[
        dcc.Graph(id='graph-with-slider'),
        dcc.Slider(
            id='year-slider',
            min=df['year'].min(),
            max=df['year'].max(),
            value=df['year'].min(),
            step=None,
            marks={str(year): str(year) for year in df['year'].unique()}
        )
    ])

    @app.callback(
        dash.dependencies.Output('graph-with-slider', 'figure'),
        [dash.dependencies.Input('year-slider', 'value')])
    def update_figure(selected_year):
        filtered_df = df[df.year == selected_year]
        traces = []
        for i in filtered_df.continent.unique():
            df_by_continent = filtered_df[filtered_df['continent'] == i]
            traces.append(go.Scatter(
                x=df_by_continent['gdpPercap'],
                y=df_by_continent['lifeExp'],
                text=df_by_continent['country'],
                mode='markers',
                opacity=0.7,
                marker={
                    'size': 15,
                    'line': {'width': 0.5, 'color': 'white'}
                },
                name=i
            ))

        return {
            'data': traces,
            'layout': go.Layout(
                xaxis={'type': 'log', 'title': 'GDP Per Capita'},
                yaxis={'title': 'Life Expectancy', 'range': [20, 90]},
                margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
                legend={'x': 0, 'y': 1},
                hovermode='closest'
            )
        }

    if __name__ == '__main__':
        app.run_server(port=8888, host='0.0.0.0', debug=True)

    Relative pathname config

    Make special note of the app.config.update call near the top of the file. When serving a Dash application from Domino, you must configure Dash to serve from a relative path instead of the default root path. Include these lines immediately after initializing your app:

    app.config.update({
        'requests_pathname_prefix': ''
    })

    Make note of two important variables in the final line of the file. Domino-hosted applications must run on a host of 0.0.0.0 and listen on port 8888. These are the settings Domino will expect when directing users to your App.

    Click Save when finished.

    The last thing to do before publishing your App is to create an app.sh file. This is a script that Domino runs after initializing the host that will serve your App. It should contain all commands required to launch your App. In this example, the only command you need is python app.py. Create this file the same way you did for app.py, then save it.

    Publish

    Now you're ready to publish your App.

    Click Publish from the project sidebar.

    Give your App an informative title and description, and set Permissions to Anyone can access. This will allow anyone with a network connection to your Domino deployment to access the App if they have the URL.

    Click Publish.

    Once the App status says Running, click View App to load your App. You should see the interactive scatterplot with a Domino toolbar above it showing the project it's published from, plus buttons to email the App owner and open the description panel.

    Share and consume

    Now that your App is published, if you set the permissions to Anyone can access, you can easily share it with colleagues who have access to your instance of Domino. You can try this out yourself by opening a private or incognito browser, or logging out of Domino, and navigating to the App URL.

    View Article
  • This article will guide you through your first steps in Domino. You'll be working with some sample data from the Global Power Plant Database. You'll see examples of Jupyter, Dash, pandas, and NumPy used in Domino.

    In Part 1, you'll learn how to:

    create a project

    launch a workspace session

    retrieve data for use in Domino

    In Part 2, you'll learn how to:

    create new files in your project

    publish an App

    share your work with others

    Part 1

    The first thing you'll see after logging into Domino is the Projects page, displaying your projects.

    Every new user will own an automatically-created quick-start project. This project contains an informative README with tips and instructions for working in Domino. It's a useful reference, but for this tutorial you're going to want a fresh project.

    Click New Project to get started.

    Give your project an informative name, set its visibility to Private, then click Create Project.

    Usually, after you create a project you'll want to apply some settings appropriate for the work you plan to do. The software environment your code will run in is controlled by the Domino Environment your project is configured to use. For this tutorial, any of the prepackaged Domino Standard Environments will work, so you can leave the project settings at their defaults.

    Click Workspaces from the project menu to continue.

    Select Jupyter, give the workspace an informative name, then click Launch Jupyter Workspace.

    When you launch a workspace in Domino, a new containerized session is created on a machine in the required hardware tier. The workspace tool you requested is launched in that container, and your browser is automatically redirected to the workspace's UI when it's ready.

    Once your workspace is up and running, you will see a fresh Jupyter interface. If you're brand new to Jupyter, you might find the Jupyter documentation helpful.

    You can see from the file path that you're in /mnt. By default, this is considered the root of your Domino project. When your project files are loaded onto an executor machine like this one, they'll be placed here. If you add or modify files in /mnt, you can save them back to your project when you stop or sync the workspace.

    Your next step is to download some data to the executor.

    Use the New menu to open a Jupyter terminal.

    Once your terminal is open, run the following command to fetch some data exported from the Global Power Plant Database:

    wget https://s3-us-west-2.amazonaws.com/dominodatalab-gppd/global_power_plant_database.csv

    Click the Jupyter logo at the top to return to the files browser. You should see a file named global_power_plant_database.csv in /mnt. Your next step is to do some basic manipulation of the data.

    Use the New menu to create a Python notebook.

    In Jupyter, you enter Python code in cells and hit Shift+Enter to execute the focused cell. Each time you execute a cell, you step forward to the next program state, as though you were typing each cell into the Python interpreter.

    In your first cell, enter these lines to import some necessary packages, then hit Shift+Enter to execute:

    import pandas as pd

    import numpy as np

    In the cells after that, you should read the file you downloaded into a pandas dataframe, then display the data:

    df = pd.read_csv('global_power_plant_database.csv')

    df

    You can now see a truncated table of the data. Let's answer a simple question:

    For power plants commissioned between 1990 and 2000, how many gigawatt hours are produced by each fuel source today?

    To answer, you'll want to clean up the data a bit. Filter down to only those plants that have an estimated future production:

    df = df[df.estimated_generation_gwh > 0]

    After that, you should also filter down to plants with a commissioning_year in the target range, discard any decimal places on those dates, and then display the data again.

    df = df[df.commissioning_year > 1990]

    df = df[df.commissioning_year < 2000]

    df.commissioning_year = df['commissioning_year'].astype(int)

    df

    Now that you've got the data you're interested in, you can group the plants by the fuel1 column, and then aggregate estimated_generation_gwh and display the results:

    fuels = df.groupby('fuel1')

    totals = fuels['estimated_generation_gwh'].agg(np.sum)

    totals
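    If you want to double-check the ordering before interpreting it, you can optionally sort the aggregated totals in the next cell:

    totals.sort_values(ascending=False)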

    You can see from the data that, among power plants built between 1990 and 2000, the world today draws the most gigawatt hours from coal, followed by natural gas and hydroelectric power. This is an interesting start, so you should save your work.

    Click Stop from the top menu.

    Domino will show you which files have changed during your workspace session, and prompt you to commit them back to your project files. Enter an informative commit message, then click Stop and Commit. Once the workspace shuts down you can close it. If you return to your project in Domino and look at the Files page, you'll see the raw data and the notebook file have been saved in the latest revision. If you start a new workspace, those files will be loaded into /mnt and you can resume where you left off.

    Part 2

    Now that you've found some interesting data, you can use Domino to share it in a way that's easier to browse and consume than a Python notebook. Domino can host Apps built with popular web application frameworks. This lets you power interactive visualizations with your Domino data, and quickly share insights.

    Return to your project from Part 1, and click Files from the project menu.

    Click the Add File button. Name the new file app.py, and paste in the following code for a Dash application. This code builds on the aggregation-by-fuel idea from Part 1, and shows how much capacity was commissioned by fuel type in each year.

    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    import pandas as pd
    import numpy as np
    import plotly.graph_objs as go

    app = dash.Dash()
    app.config.update({
        'requests_pathname_prefix': ''
    })

    df = pd.read_csv('global_power_plant_database.csv')
    df = df[df.estimated_generation_gwh > 0]
    df = df[df.commissioning_year >= 1990]
    df = df[df.commissioning_year <= 2000]
    df.commissioning_year = df['commissioning_year'].astype(int)

    app.layout = html.Div(style={'paddingLeft': '40px', 'paddingRight': '40px'}, children=[
        dcc.Graph(id='graph-with-slider'),
        dcc.Slider(
            id='year-slider',
            min=df['commissioning_year'].min(),
            max=df['commissioning_year'].max(),
            value=df['commissioning_year'].min(),
            step=None,
            marks={str(year): str(year) for year in df['commissioning_year'].unique()}
        )
    ])

    @app.callback(
        dash.dependencies.Output('graph-with-slider', 'figure'),
        [dash.dependencies.Input('year-slider', 'value')])
    def update_figure(selected_year):
        filtered_df = df[df.commissioning_year == selected_year]
        print(filtered_df)
        grouped_df = filtered_df.groupby('fuel1')
        averages = grouped_df['estimated_generation_gwh'].agg(np.sum)

        fuel_values = []
        production_values = []
        for fuel, average in averages.items():
            fuel_values.append(fuel)
            production_values.append(average)

        bars = [go.Bar(x=fuel_values, y=production_values)]

        return {
            'data': bars,
            'layout': go.Layout(
                xaxis={'type': 'category', 'title': 'Fuel type'},
                yaxis={'title': 'GWH commissioned'},
                margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
            )
        }

    if __name__ == '__main__':
        app.run_server(port=8888, host='0.0.0.0', debug=True)

    Click Save when finished. The last thing you need to do before publishing this App is set up an app.sh file. When you publish an App from a project, Domino looks in the project files for a file named app.sh with the necessary commands to launch the application.

    Return to the Files page and click Add File again. Name this file app.sh and paste in the following single line, then click Save:

    python app.py

    You're now ready to publish your App.

    Click Publish from the project menu.

    Title and describe your App, then click Publish.

    If your app starts successfully, you can click View App to open it.

    All that's left is to share your data with a colleague. Return to the project and click Publish again from the project menu. Under App Permissions you'll find a field you can use to send email invites to view your App.

    View Article
    Domino can connect to and query any common database, including Google BigQuery. In this article, we outline the steps to create a Google service account, authenticate to Google, and use the BigQuery API to query a public table.

    Obtaining credentials

    Go to Google's Service Accounts page. Select a previous project or create a new project.

    New Project Screen:

    Create a Service account for your project.

    Define the access that the Service account should have to BigQuery. See Google's Access Control documentation for more information.

    Confirm that your Service account has been created.

    On the Service Accounts page, create a new key.

    Download the JSON key and keep it in a safe place. You will use this key later to programmatically authenticate to Google.

    Enable the BigQuery API

    Click on the Google APIs logo in the top left of the screen.

    In the Library page, select the Big Query API.

    If it is not enabled, click Enable.

    Activating your credentials from Domino

    Google Cloud activates your credentials using the Google Cloud SDK, which is already installed in the Domino Default environment. You will need to execute the following bash command:

    /home/ubuntu/google-cloud-sdk/bin/gcloud auth activate-service-account <service account name> --key-file <key file path>

    Example:

    /home/ubuntu/google-cloud-sdk/bin/gcloud auth activate-service-account big-query-example@example-big-query-170823.iam.gserviceaccount.com --key-file key.json

    You may want to use a custom Domino compute environment and enter this command in a Domino pre-setup script to activate the credentials before each run. Otherwise, you can execute it in workspace sessions. Make sure to read more on how to store your credentials securely.

    Authenticate and query using Python

    You will require two Python packages: gcloud and oauth2client==1.4.12. You can install them using

    pip install --user gcloud oauth2client==1.4.12

    in either your custom Domino compute environment or in your workspace session.

    Use the following code snippet to authenticate your Google credentials and query a public Big Query table:

    from oauth2client.client import GoogleCredentials
    from googleapiclient.discovery import build

    # Grab the application's default credentials from the environment.
    credentials = GoogleCredentials.get_application_default()

    # Construct the service object for interacting with the BigQuery API.
    bigquery_service = build('bigquery', 'v2', credentials=credentials)

    query_request = bigquery_service.jobs()
    query_data = {
        'query': (
            'SELECT TOP(corpus, 10) as title, '
            'COUNT(*) as unique_words '
            'FROM [publicdata:samples.shakespeare];')
    }

    query_response = query_request.query(
        projectId='example-big-query-170823',  # Substitute your ProjectId
        body=query_data).execute()

    print('Query Results:')
    for row in query_response['rows']:
        print('\t'.join(field['v'] for field in row['f']))

    View Article
  • Any user can look up the version of Domino that they are currently running by going to <your domino url>/version. For example, users on try.domino.tech can see the version at try.domino.tech/version.

    A deployment running 3.6.10 will display as the following:

    To learn more about the features in your release, you can consult the Domino release notes.
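    If you prefer to check the version programmatically, a minimal sketch using the Python requests library is shown below; depending on how your deployment is configured, this endpoint may require you to be authenticated.

    import requests

    # Substitute your own Domino URL; try.domino.tech is just the example from above.
    response = requests.get('https://try.domino.tech/version')
    print(response.text)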

    View Article
  • Overview

    For security reasons, Domino Workspace sessions are only accessible on one port. For example, Jupyter typically uses port 8888. When you launch a Jupyter Workspace session, a Domino executor starts the Jupyter server in a Run, and opens port 8888 to serve the Jupyter application to your browser. If you were to attempt to use the Jupyter terminal to start another application on a different port, it would not be accessible.

    However, in some cases you may want to run multiple interactive applications in the same Workspace session. These cases include:

    Editing and debugging Dash or Flask apps live

    Using Tensorboard to view progress of a live training job

    Domino 3.5+ supports this with Jupyter Server Proxy and JupyterLab.

    Environment Prerequisites

    Python 3+

    Jupyter Server Proxy

    Jupyter Server Proxy is installed by default in the latest Domino Standard Environments. To install it in one of your existing environments, see the instructions below.

    Installing Jupyter Server Proxy in your environment

    If you are not on a recent version of the Domino Standard Environments, you can install Jupyter Server Proxy in your Domino environment by following these steps. If you are on a recent Environment, you can skip these steps.

    1. Add the following lines to your environment's Dockerfile Instructions.

    # Install NodeJS
    # You can omit this step if your environment already has NodeJS 6+ installed
    RUN curl -sL https://deb.nodesource.com/setup_8.x | bash - && \
        apt-get install nodejs -y && \
        rm -rf /var/lib/apt/lists/*

    # Switch to the latest JupyterLab start script
    RUN rm -rf /var/opt/workspaces/Jupyterlab/start.sh && \
        cd /var/opt/workspaces/Jupyterlab/ && \
        wget https://raw.githubusercontent.com/dominodatalab/workspace-configs/2019q2-v1.3/Jupyterlab/start.sh && \
        chmod 777 /var/opt/workspaces/Jupyterlab/start.sh

    # Install and enable jupyter-server-proxy
    RUN pip install --upgrade jupyterlab==0.35.4 && \
        pip install nbserverproxy jupyter-server-proxy && \
        jupyter serverextension enable --py --sys-prefix nbserverproxy && \
        jupyter labextension install jupyterlab-server-proxy

    2. Update the JupyterLab definition in the Pluggable Workspace Tools section of your environment.

    jupyterlab:
      title: "JupyterLab (Beta)"
      iconUrl: "https://raw.githubusercontent.com/dominodatalab/workspace-configs/develop/workspace-logos/jupyterlab.svg?sanitize=true"
      start: [ /var/opt/workspaces/Jupyterlab/start.sh ]
      httpProxy:
        internalPath: "/{{#if pathToOpen}}/lab/tree/{{pathToOpen}}{{/if}}"
        port: 8888
        rewrite: false

    Using Jupyter Server Proxy

    If you launch a JupyterLab Workspace session in an environment with Jupyter Server Proxy installed, you can start and serve additional applications as long as they are served on a different port than JupyterLab itself.

    Once an additional application is started, you can access it at the following URI:

    <workspace id>-workspace.<host name>/proxy/<port>

    Suppose your JupyterLab session is served at:

    https://app.dominodatalab.com/workspace?owner=chuckhead&projectName=demo&runId=5cef31de46e0fb00083f9708

    You can then use the JupyterLab terminal to start a Dash app on port 8887 for debugging. You can then open the Dash app at:

    https://5cef31de46e0fb00083f9708-workspace.app.dominodatalab.com/proxy/8887/

    The slash at the end of the URL is necessary for the app to load properly.
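    As a minimal sketch, the proxied URL follows the pattern described above and can be assembled from the workspace id (the runId in the workspace URL), the Domino host name, and the port; the values here are the examples from this article.

    workspace_id = '5cef31de46e0fb00083f9708'
    host = 'app.dominodatalab.com'
    port = 8887

    # Build the proxied URL for an app started on another port in the same Workspace.
    proxied_url = 'https://{}-workspace.{}/proxy/{}/'.format(workspace_id, host, port)
    print(proxied_url)
    # https://5cef31de46e0fb00083f9708-workspace.app.dominodatalab.com/proxy/8887/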

    Once your app is running, if you edit its source files in JupyterLab, the edits will take effect when you restart the app in your browser.

    VSCode

    For environments that have VSCode installed within JupyterLab, it's possible to start a session from JupyterLab, and then start an App from VSCode. This will allow you to debug using VSCode.

    Note that any new App or process you start and open in a separate tab will not have the Domino Workspace UI, with options to stop, sync, commit, or manage your project files. To access this UI and manage your changes, you must open the main JupyterLab tab for your Workspace session.

    View Article
  • Overview

    Data science projects often require multiple steps to go from raw data to useful data products. These steps tend to be sequential, and involve things like:

    sourcing data

    cleaning data

    processing data

    training models

    Once you understand the steps necessary to deliver results from your work, it's useful to automate them as a repeatable pipeline. Domino has the ability to schedule Jobs, but for more complex pipelines you can pair Domino with an external scheduling system like Apache Airflow.

    This article will describe how to integrate Airflow with Domino by using the python-domino package.

    Contents

    Domino and Airflow webinar

    Getting started with Airflow

    Installing python-domino on your Airflow workers

    How Airflow tasks map to Domino Jobs

    Example pipeline

    Domino and Airflow webinar

    This video is a recording of a webinar held on February 21st, 2019. This webinar walks through a detailed example of integrating Airflow with Domino.

    Click here to see the slides from this presentation.

    Getting started with Airflow

    Airflow is an open source platform to author, schedule, and monitor pipelines of programmatic tasks. As a user, you can define pipelines with code and configure the Airflow scheduler to execute the underlying tasks. The Airflow UI can be used to visualize, monitor, and troubleshoot pipelines.

    If you are new to Airflow, read the Airflow Quick Start to set up your own Airflow server.

    There are many options for configuring your Airflow server. For pipelines that run tasks in parallel, you will need to use Airflow's LocalExecutor mode, which lets Airflow run tasks in parallel and execute multiple dependencies at the same time. Airflow uses a database to keep records of all the tasks it executes and schedules, so you will need to install and configure a SQL database for LocalExecutor mode.

    Read the following guide to learn more about setting up LocalExecutor mode:

    A Guide On How To Build An Airflow Server/Cluster

    For more information about scheduling and triggers, notifications, and pipeline monitoring, read the Airflow documentation.

    Installing python-domino on your Airflow workers

    To create Airflow tasks that work with Domino, you need to install python-domino on your Airflow workers. This library will enable you to add tasks in your pipeline code that interact with the Domino API to start Jobs.

    Connect to your Airflow workers, and follow these steps to install and configure python-domino:

    Install from pip

    pip install git+https://github.com/dominodatalab/python-domino.git

    Set up an Airflow variable to point to the Domino host. This is the URL where you load the Domino application in your browser.

    Key: DOMINO_API_HOST

    Value: <your-domino-url>

    Set up an Airflow variable to store the user API key you want to use with Airflow. This is the user Airflow will authenticate to Domino as for the purpose of starting Jobs.

    Key: DOMINO_API_KEY

    Value: <your-api-key>

    How Airflow tasks map to Domino Jobs

    Airflow pipelines are defined with Python code. This fits in well with Domino’s code-first philosophy. You can use python-domino in your pipeline definitions to create tasks that start Jobs in Domino.

    Architecturally, Airflow has its own server and worker nodes, and Airflow will operate as an independent service that sits outside of your Domino deployment. Airflow will need network connectivity to Domino so its workers can access the Domino API to start Jobs in your Domino project. All the code that performs the actual work in each step of the pipeline -- code that fetches data, cleans data, and trains data science models -- is maintained and versioned in your Domino project. This way you have Domino’s Reproducibility engine working together with Airflow’s scheduler.

    Example pipeline

    The following example assumes you have an Airflow server where you want to set up a pipeline of tasks that fetches data, cleans and processes data, performs an analysis, then generates a report. It also assumes you have all the code required to complete those tasks stored as scripts in a Domino project.

    The example graph shown above is written using Airflow and python-domino, and executes all the dependencies in Domino using the Airflow scheduler. It trains a model using multiple datasets, and generates a final report.

    See the commented script below for an example of how to configure an Airflow DAG to execute such a pipeline with Domino Jobs.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator
    from domino import Domino
    from airflow.models import Variable

    # Initialize Domino API object with the api_key and host
    api_key = Variable.get("DOMINO_API_KEY")
    host = Variable.get("DOMINO_API_HOST")
    domino = Domino("sujaym/airflow-pipeline", api_key, host)

    # Parameters to DAG object
    default_args = {
        'owner': 'domino',
        'depends_on_past': False,
        'start_date': datetime(2019, 2, 7),
        'retries': 1,
        'retry_delay': timedelta(minutes=.5),
        'end_date': datetime(2019, 2, 10),
    }

    # Instantiate a DAG
    dag = DAG('domino_pipeline', description='Execute Airflow DAG in Domino',
              default_args=default_args, schedule_interval=timedelta(days=1))

    # Define Task instances in Airflow to kick off Jobs in Domino
    t1 = PythonOperator(task_id='get_dataset_1', python_callable=domino.runs_start_blocking,
                        dag=dag, op_kwargs={"command": ["src/data/get_dataset_1.py"]})
    t2 = PythonOperator(task_id='get_dataset_2', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/data/get_dataset_2.py"]}, dag=dag)
    t3 = PythonOperator(task_id='get_dataset_3', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/models/get_dataset_3.sh"]}, dag=dag)
    t4 = PythonOperator(task_id='clean_data', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/data/cleaning_data.py"]}, dag=dag)
    t5 = PythonOperator(task_id='generate_features_1', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/features/word2vec_features.py"]}, dag=dag)
    t6 = PythonOperator(task_id='run_model_1', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/models/run_model_1.py"]}, dag=dag)
    t7 = PythonOperator(task_id='do_feature_engg', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/features/feature_eng.py"]}, dag=dag)
    t8 = PythonOperator(task_id='run_model_2', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/models/run_model_2.py"]}, dag=dag)
    t9 = PythonOperator(task_id='run_model_3', python_callable=domino.runs_start_blocking,
                        op_kwargs={"command": ["src/models/run_model_3.py"]}, dag=dag)
    t10 = PythonOperator(task_id='run_final_report', python_callable=domino.runs_start_blocking,
                         op_kwargs={"command": ["src/report/report.sh"]}, dag=dag)

    # Define your dependencies
    t2.set_upstream(t1)
    t3.set_upstream(t1)
    t4.set_upstream(t2)
    t5.set_upstream(t3)
    t6.set_upstream([t4, t5])
    t7.set_upstream(t4)
    t8.set_upstream(t7)
    t9.set_upstream(t7)
    t10.set_upstream([t6, t8, t9])
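    If you prefer Airflow's bitshift syntax for wiring dependencies (supported in recent Airflow versions), the same graph can be expressed equivalently as:

    t1 >> [t2, t3]
    t2 >> t4
    t3 >> t5
    [t4, t5] >> t6
    t4 >> t7
    t7 >> [t8, t9]
    [t6, t8, t9] >> t10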

    View Article
    If you have signed up for an account on our support site, you will need to verify your email before you can log in. Check the inbox of the email account you signed up with for a message from [email protected] with the subject line "Verify your email". If you didn't receive it, you can reach out to [email protected] for assistance.

    Once signed in, you will be able to submit tickets and see past tickets in our system. You can also use your account to access our community at community.dominodatalab.com

    View Article
  • Users interested in self-guided training can find videos on this page that cover both introductory topics and advanced features.

    Domino Fundamentals

    Domino 201

    Domino Continuing Education Webinar Series

    Domino Fundamentals

    This video series teaches the basics of working in Domino. It explains the behavior of Domino Runs, including Jobs and interactive Workspace sessions. It also introduces Domino environments and explains the benefits of the Domino Reproducibility Engine.

    Introduction to Domino Fundamentals

    Domino Architecture

    Introduction to Jobs

    Executing Jobs

    Introduction to Workspaces

    Using Workspaces

    Introduction to Environments

    Working with Environments

    Domino 201

    This video series covers more advanced Domino features, like environment variables, Git integration, Model publishing, App publishing, Launchers, scheduled Jobs, and the Domino CLI. The series follows a practical example of a data science project through each of these feature areas.

    Before beginning this series, you should already know the basics of working in Domino, as covered in the Domino Fundamentals series.

    Introduction to Domino 201

    GitHub Integration

    Environment Variables

    Feature Engineering

    Model Training and Diagnostic Statistics

    Model APIs

    Web Applications

    Launchers and Scheduled Jobs

    Domino CLI

    Domino Continuing Education Webinar Series

    The webinar recordings below cover advanced and specialized topics related to Domino.

    Best Practices for App Publishing in Domino

    Using Apache Airflow with Domino

    Introduction to Domino Datasets

    Using Apache Spark with Domino

    View Article
  • Overview

    Domino supports connecting to an Amazon EMR cluster through the addition of cluster-specific binaries and configuration files to your Domino environment.

    At a high level, the process is as follows:

    Connect to the EMR Master Node and gather the required binaries and configuration files, then download them to your local machine.

    Upload the gathered files into a Domino project to allow access by the Domino environment builder.

    Create a new Domino environment that uses the uploaded files to enable connections to your cluster.

    Enable YARN integration for the Domino projects that you want to use with the EMR cluster.

    Note

    Domino supports the following types of connections to an EMR cluster:

    FS shell

    spark-shell

    spark-submit

    pyspark

    Gathering the required binaries and configuration files

    You will find the necessary files for setting up your Domino environment on the EMR Master Node. To get started, connect to your Master Node via SSH, then follow the steps below.

    Create a directory named hadoop-binaries-configs at /tmp.

    mkdir /tmp/hadoop-binaries-configs

    Create a subdirectory named configs in /tmp/hadoop-binaries-configs and create the listed subdirectories inside it.

    mkdir /tmp/hadoop-binaries-configs/configs

    mkdir -p /tmp/hadoop-binaries-configs/configs/hadoop

    mkdir -p /tmp/hadoop-binaries-configs/configs/hive

    mkdir -p /tmp/hadoop-binaries-configs/configs/spark

    Copy the contents of the hive, spark, and hadoop directories from /etc to /tmp/hadoop-binaries-configs/configs.

    cp -rL /etc/hadoop/conf /tmp/hadoop-binaries-configs/configs/hadoop

    cp -rL /etc/hive/conf /tmp/hadoop-binaries-configs/configs/hive

    cp -rL /etc/spark/conf /tmp/hadoop-binaries-configs/configs/spark

    Copy the following additional directories to /tmp/hadoop-binaries-configs.

    cp -r /usr/lib/hadoop /tmp/hadoop-binaries-configs

    cp -r /usr/lib/hadoop-lzo /tmp/hadoop-binaries-configs

    cp -r /usr/lib/spark /tmp/hadoop-binaries-configs

    cp -r /usr/share/aws /tmp/hadoop-binaries-configs

    cp -r /usr/share/java /tmp/hadoop-binaries-configs

    Add the following lines to the end of the newly copied file at /tmp/hadoop-binaries-configs/configs/hadoop/conf/hdfs-site.xml. This is necessary since the Domino executor will not be able to connect to HDFS on the same private IPs that the Master Node uses.

    cd /tmp/hadoop-binaries-configs/configs/hadoop/conf/

    echo "<property>" >> hdfs-site.xml

    echo "<name>dfs.client.use.datanode.hostname</name>" >> hdfs-site.xml

    echo "<value>true</value>" >> hdfs-site.xml

    echo "</property>" >> hdfs-site.xml

    (Optional) If your EMR cluster uses Kerberos authentication, create a subdirectory named kerberos at /tmp/hadoop-binaries-configs.

    mkdir /tmp/hadoop-binaries-configs/kerberos

    Then copy the Kerberos configuration file krb5.conf from /etc to /tmp/hadoop-binaries-configs/kerberos.

    cp /etc/krb5.conf /tmp/hadoop-binaries-configs/kerberos/

    Once you've copied and edited all of the above files into /tmp/hadoop-binaries-configs, zip up the directory for transfer to your local machine.

    cd /tmp

    tar -zcf hadoop-binaries-configs.tar.gz hadoop-binaries-configs

    Then use SCP from your local machine to download the zipped archive. Refer back to the AWS documentation on connecting to a Master Node via SSH for credentialing and address information.

    Uploading the binaries and configuration files to Domino

    Use the following procedure to upload the files you retrieved in the previous step to a public Domino project. This will make the files available to the Domino environment builder.

    Log in to Domino, then create a new public project.

    Open the Files page for the new project, then click to browse for files and select the archive of binaries and configuration files you downloaded from the EMR Master Node. Then click Upload.

    After your upload has completed, click the gear menu next to the uploaded file, then right click Download and click Copy Link Address. Save this URL in your notes, as you will need it in the next step.

    Once you have recorded the download URL of the binaries and configuration files archive, you're ready to build a Domino environment for connecting to EMR.

    Creating a Domino environment for connecting to EMR

    First, you need to visit the Spark downloads page to copy a download URL for the Spark binaries. Use the dropdown menus to select the correct version of the binaries for your EMR cluster, then right click the download link and click Copy Link Address. Record the copied URL for use in a later step.

    Click Environments from the Domino main menu, then click Create Environment.

    Give the environment an informative name, then choose a base environment that includes the version of Python that is installed on the nodes of your EMR cluster. Most Linux distributions ship with Python 2.7 by default, so you will see the Domino Analytics Distribution for Python 2.7 used as the base image in the following examples. Click Create when finished.

    After creating the environment, click Edit Definition. Copy the below example into your Dockerfile Instructions, then be sure to edit it wherever necessary with values specific to your deployment and cluster.

    Note

    In this Dockerfile, wherever you see a hyphenated instruction enclosed in carets like <paste-your-domino-download-url-here>, be sure to replace it with the corresponding value you recorded in previous steps.

    You may also need to edit commands that follow to match downloaded filenames.

    USER root

    # Give ubuntu user ability to sudo as any user including root
    RUN echo "ubuntu ALL=(ALL:ALL) NOPASSWD: ALL" >> /etc/sudoers

    # Set up directories.
    RUN mkdir /tmp/domino-hadoop-downloads

    # Download the binaries and configs gzip from Domino project.
    #
    # This downloaded gzip archive should contain a configs directory with
    # hadoop, hive, and spark subdirectories.
    #
    # Make sure the URL is edited to reflect where you uploaded your configs.
    # You should have this saved from previous steps.
    RUN wget --no-check-certificate <paste-your-domino-file-download-url-here> -O /tmp/domino-hadoop-downloads/hadoop-binaries-configs.tar.gz && \
        tar xzf /tmp/domino-hadoop-downloads/hadoop-binaries-configs.tar.gz -C /tmp/domino-hadoop-downloads/

    ### Copy hadoop, hive, and spark configurations
    RUN cp -r /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/hadoop /etc/hadoop && \
        cp -r /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/hive /etc/hive && \
        cp -r /tmp/domino-hadoop-downloads/hadoop-binaries-configs/configs/spark /etc/spark

    ### Set correct hadoop and spark config directory names. Sometimes the directories in the EMR cluster are named conf.dist. Check your EMR cluster for the right names for each of the configs
    RUN rm /etc/hadoop/conf && \
        rm /etc/spark/conf && \
        rm /etc/hive/conf && \
        mv /etc/hadoop/conf.empty /etc/hadoop/conf && \
        mv /etc/spark/conf.empty /etc/spark/conf && \
        mv /etc/hive/conf.empty /etc/hive/conf

    ### Copy emr jars to the right locations
    RUN mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/aws /usr/share/aws
    RUN mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/hadoop /usr/lib/hadoop
    RUN mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/hadoop-lzo /usr/lib/hadoop-lzo
    RUN mv /tmp/domino-hadoop-downloads/hadoop-binaries-configs/spark /usr/lib/spark
    RUN cp -r /tmp/domino-hadoop-downloads/hadoop-binaries-configs/java/* /usr/share/java/

    ### Update SPARK and HADOOP environment variables. Make sure the py4j file name is correct as per your edge node
    RUN \
        echo 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle/' >> /home/ubuntu/.domino-defaults && \
        echo 'export HADOOP_HOME=/usr/lib/hadoop' >> /home/ubuntu/.domino-defaults && \
        echo 'export SPARK_HOME=/usr/lib/spark' >> /home/ubuntu/.domino-defaults && \
        echo 'export PYTHONPATH=${PYTHONPATH:-}:${SPARK_HOME:-}/python/' >> /home/ubuntu/.domino-defaults && \
        echo 'export PYTHONPATH=${PYTHONPATH:-}:${SPARK_HOME:-}/python/lib/py4j-0.10.7-src.zip' >> /home/ubuntu/.domino-defaults && \
        echo 'export PATH=${PATH:-}:${SPARK_HOME:-}/bin' >> /home/ubuntu/.domino-defaults && \
        echo 'export PATH=${PATH:-}:${HADOOP_HOME:-}/bin' >> /home/ubuntu/.domino-defaults && \
        echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf' >> /home/ubuntu/.domino-defaults && \
        echo 'export YARN_CONF_DIR=/etc/hadoop/conf' >> /home/ubuntu/.domino-defaults && \
        echo 'export SPARK_CONF_DIR=/etc/spark/conf' >> /home/ubuntu/.domino-defaults

    Click Build when finished editing the Dockerfile instructions. If the build completes successfully, you are ready to try using the environment.

    Configure a Domino project for use with an EMR cluster

    This procedure assumes that an environment with the necessary client software has been created according to the instructions above. Ask your Domino admin for access to such an environment. Note that you may need to provide Domino with additional options when setting up your project. Your Domino or AWS administrators should be able to provide you with the correct values for these options.

    Open the Domino project you want to use with your EMR cluster, then click Settings from the project menu.

    On the Integrations tab, click to select YARN integration from the Apache Spark panel.

    Use root as the Hadoop user name.

    If your EMR cluster is in the same AWS VPC as your Domino deployment, you do not need to list the hosts in the Custom /etc/hosts entries field. If your Domino deployment is in a separate network from the EMR cluster, list the hostnames of the nodes in your cluster.

    Note

    If your work with the cluster generates many warnings about missing Java packages, you can suppress these by adding the following to Spark Configuration Options.

    Key: spark.hadoop.yarn.timeline-service.enabled

    Value: false

    After inputting your YARN configuration, click Save.

    On the Hardware & Environment tab, change the project default environment to the one you built earlier with the binaries and configuration files.

    You are now ready to start Runs from this project that interact with your EMR cluster.
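    As a quick smoke test, a minimal PySpark sketch like the following (assuming the environment and YARN integration configured above) can confirm that Runs in this project can reach the cluster:

    from pyspark.sql import SparkSession

    # Start a Spark session against the EMR cluster via YARN.
    spark = SparkSession.builder \
        .appName('domino-emr-smoke-test') \
        .master('yarn') \
        .getOrCreate()

    # Trivial distributed computation; expect 4950.
    print(spark.sparkContext.parallelize(range(100)).sum())
    spark.stop()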

    View Article
  • Contents

    Overview

    Finding my Scratch Space

    Seeing the contents of my Datasets Scratch Space

    Promoting Scratch Space contents to a Dataset Snapshot

    Risk notifications

    Administration

    Non-empty length of time

    FAQ

    Overview

    A Datasets Scratch Space is a scalable, mutable (i.e. read-writeable) filesystem directory for temporary data storage and exploration. Scratch Spaces are a complement to the core Datasets functionality. They provide a space to keep intermediate data results or candidates for Dataset Snapshots as you explore your data. These spaces are designed for when you don’t know what you want just yet (i.e. you don’t know what you don’t know). They are automatically mounted in Workspace sessions for each User for every Project (i.e. they are Per User Per Project). Here are some key properties:

    They do not preserve reproducibility. Files placed in a Datasets Scratch Space are not versioned or tracked. A Datasets Scratch Space is simply a long-lived directory with the scalable properties of Datasets (i.e. large file sizes and many individual files).

    They are only available for Workspaces.

    You get a unique Datasets Scratch Space per User per Project.

    If you shut down and relaunch Workspaces, the Datasets Scratch Space is exactly as you left it. All the contents remain unless you promote them to a Dataset Snapshot.

    If you spin up multiple, concurrent Workspaces in a Project, all those Workspaces will see the same Datasets Scratch Space. Remember, only you can “see” your Scratch Space, so any potential file locks are created by you.

    When no Workspaces are running, the contents of the Datasets Scratch Space can be promoted to a Snapshot of a Dataset within the Project.

    Finding my Scratch Space

    For a given project (with name project-name), your Datasets Scratch Space for that Project will be at: /domino/datasets/{username}/{project-name}/scratch, where username is your Domino username. Remember, you only get Scratch Spaces with Workspaces.

    Examples

    Let’s say for these examples, my username is dara-data.

    Project: big-data, Owner: alex-algo

    My Scratch Space is at: /domino/datasets/dara-data/big-data/scratch

Notice that it doesn’t matter that the owner is alex-algo. The scratch path still uses my username. Also, it is assumed I have the appropriate permissions on the Project.

    Project: big-data, Owner: dara-data

    My Scratch Space is at: /domino/datasets/dara-data/big-data/scratch

    Notice that it doesn’t matter that there are other Projects with the name big-data. They won’t conflict because the scope of the path is only for Workspaces in that Project.
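If you build this path in code, a minimal Python sketch looks like the following (the username and project name are the hypothetical values from the examples above):

import os

# hypothetical values taken from the examples above
username = "dara-data"
project_name = "big-data"

# Scratch Spaces follow the pattern /domino/datasets/{username}/{project-name}/scratch
scratch_dir = os.path.join("/domino/datasets", username, project_name, "scratch")
print(scratch_dir)  # /domino/datasets/dara-data/big-data/scratch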

    Seeing the contents of my Datasets Scratch Space

    You can always view the contents of your Datasets Scratch Space by launching a Workspace and navigating to your Scratch Space. It is simply another directory (with all the high performance properties of Datasets).

    File Browser.

    If you’d like to get an idea of the contents of the Scratch Space without using a Workspace, you can navigate to the Datasets Project Level Page.


    There you will find a file browser that displays the contents of your Scratch Space. If you have lots of files, you can paginate through. You can also drill down into any folders that are in your Scratch Space. As you modify the contents of your Scratch Space (e.g. add/delete/edit files, add/delete/edit folders), refreshing the file browser will reflect those changes.

    Calculated Size.

    The used Scratch Space size is calculated anytime a Workspace is stopped. The timestamp of the calculation is also provided (see FAQ ).

    In this example, I have two Workspaces open, and my current calculation is 10 GB. This actually doesn’t reflect the 12 GB in the transactions folder that I created since the last time a Workspace was closed.

    When I close one of the Workspaces, the used Scratch Space size is updated to 22 GB.

    Promoting Scratch Space Contents to a Dataset Snapshot

    The contents of your Scratch Space can be made into a Dataset Snapshot.

No Workspaces can be running when you create a Dataset Snapshot from the contents of your Scratch Space. The Create a Dataset Snapshot from Scratch Space button is only enabled when no Workspaces are running.

    You will only be able to promote to a Dataset within your Project (see FAQ ).

Once the contents of the Scratch Space are made into a Dataset Snapshot, the contents of the Scratch Space will be deleted (i.e. the Scratch Space is cleared).

    Workflow

    Click Create a Dataset Snapshot from Scratch Space

    Using the input box in the modal, select a Dataset in your Project.

    WARNING: When you create a Dataset Snapshot from the contents of the Datasets Scratch Space, those contents will no longer be in the Datasets Scratch Space (i.e. The contents of the Datasets Scratch Space will be deleted upon promotion to a Dataset Snapshot).

    Click Create Snapshot

    To confirm, you can navigate to the details page of your newly created Snapshot.

    Risk Notifications

Using a Datasets Scratch Space for indefinite storage is discouraged. Not only can this lead to wasteful storage consumption and costs, it may also unnecessarily compromise reproducibility. Finally, while a Datasets Scratch Space is reliable persistent storage, loss could still occur if a User accidentally deletes contents from the Scratch Space. Creating a Snapshot of valuable work from the Scratch Space prevents accidental loss.

To mitigate the risk of using Scratch Spaces for indefinite storage, a user interface indicator provides a Scratch Space risk notification that alerts the user to the risk associated with the contents of the Scratch Space. Specifically, the notification reflects how long potentially new work has gone without being made into a Snapshot. Here, the length of time is a proxy for the risk profile (see Non-Empty Length of Time ).

    By default, the three ranges are:

    Low Risk: Less than or equal to five days.

    Medium Risk: Greater than five days and less than or equal to ten days.

    High Risk: Greater than ten days.

    There are three risk ranges, separated by two thresholds; hence, the thresholds define the ranges. The risk ranges are in terms of days and have a lower bound of zero days and no upper bound. The thresholds are configurable via two central configuration values.
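As a small illustration of how the two thresholds carve out the three ranges, here is a Python sketch of the mapping using the default values of 5 and 10 days (this is only an illustration, not Domino's implementation):

def risk_range(non_empty_days, threshold_one=5, threshold_two=10):
    # the thresholds correspond to the two central configuration values described below
    if non_empty_days <= threshold_one:
        return "Low Risk"
    if non_empty_days <= threshold_two:
        return "Medium Risk"
    return "High Risk"

print(risk_range(3))   # Low Risk
print(risk_range(7))   # Medium Risk
print(risk_range(12))  # High Risk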

    Namespace: common

Key: com.cerebro.domino.dataset.scratch.riskThresholdOneInDays

    Value: number

    Default: 5

    This option controls the first Datasets Scratch Space risk threshold in days.

    Namespace: common

Key: com.cerebro.domino.dataset.scratch.riskThresholdTwoInDays

    Value: number

    Default: 10

    This option controls the second Datasets Scratch Space risk threshold in days.

    Administration

The administration page for Dataset Scratch Spaces is the same as the Datasets administration page. Once in the Administration area, you can navigate to it by selecting Datasets under the Advanced menu option.

    A Dataset Scratch Spaces administration section is below the Datasets administration section. A table of all Dataset Scratch Spaces is shown (see FAQ ).

    Project: Unique identifier which is a concatenation of {owner_username}/{project_name}

    User: Full name of user.

    Size: Used Scratch Space Size (see FAQ )

    Last Time Size Calculated: Timestamp of when the value in the Size column was calculated.

Last Time Snapshot: The last time the contents of that Scratch Space were promoted into a Dataset Snapshot. No value means the contents of the Scratch Space have never been made into a Dataset Snapshot.

    Filtering

    A filtering input box is provided at the top of the administration table. The filter will do a case insensitive string match on all the columns in the administration table and return a table with rows with matching elements (i.e. rows that contain the filtering string).

    Deleting Scratch Space Contents

    Administrators can delete the contents of a Datasets Scratch Space. Only Scratch Spaces with files are eligible to be deleted; this is the case even if the files are zero bytes. The Delete Contents button for Scratch Spaces that contain no files will be disabled.

    Workflow

    Click Delete Contents for the Datasets Scratch Space you want to delete the contents of.

    This will bring up a confirmation modal. If you are sure you want to delete the contents, press Delete Contents. Deleting the contents of a Scratch Space cannot be undone.

    The administration table will be refreshed and you can confirm that the contents of your selected Datasets Scratch Space are deleted.

    Non-Empty Length of Time

    The length of time used to notify the user is specifically the time the Scratch Space has been non-empty. Recall the following:

The storage size of the Datasets Scratch Space contents is computed whenever a Workspace is stopped.

There are three cases in which the Scratch Space becomes empty:

    Initial state

    User clears it (e.g. rm -rf *)

Promotion to a Snapshot

Consider the following figure, which illustrates the state of a Datasets Scratch Space over time. The horizontal axis is time. Portions where the Datasets Scratch Space is non-empty are shown in orange. A non-empty Scratch Space is one in which at least one file (of any size) exists. At times t_2 and t_4, the Scratch Space is cleared (emptied) by the User and through a promotion to a Snapshot, respectively.

Assume we are at the moment in time marked “Now” and we’ve just closed a Workspace, so the Scratch Space storage size is computed. The non-empty time we report runs from that moment back to t_5, the most recent point at which the Scratch Space storage was calculated to be non-empty coming from the empty state. This non-empty time is denoted T_NON-EMPTY.

    FAQ

I tried to write to a Datasets Scratch Space in a Job and it failed. Why?

    A Datasets Scratch Space is only available in Workspace sessions.

    Why is the stated “Used Scratch Space Size” in the Datasets Scratch Space file browser different from what I expected based on the files that are currently in my Datasets Scratch Space?

The “Used Scratch Space Size” (size calculation) is not updated in real time. Instead, it is calculated every time a Workspace is closed. Notice that every stated “Used Scratch Space Size” is accompanied by a note indicating when the value was last calculated. See "Why is the size calculation in the file browser only updated when a Workspace is closed?".

    Why is the size calculation in the file browser only updated when a Workspace is closed?

Having a process constantly or periodically calculate the filesystem size for open Workspaces can be taxing on the system. Remember, the Scratch Space is designed to have all the performance properties of Datasets: it can handle large data (i.e. TBs) and a large number of individual files (i.e. millions of files). Also, if Workspaces are already open, Users can inspect the filesystem and sizes directly. Finally, the convenience of the file browser really comes into play when no Workspaces are running, at which point everything is static and up to date.

Why is the “Create a Dataset Snapshot from Scratch Space” button sometimes disabled?

    You can only snapshot a Datasets Scratch Space when no Workspaces are running.

    Why are the contents of the Datasets Scratch Space deleted upon promotion to a Dataset Snapshot?

This is for performance reasons. Like Datasets, Scratch Spaces can potentially contain very large files and a large number of individual files. To create a Dataset Snapshot while preserving the contents of the Scratch Space, we would need to perform an expensive copy operation (i.e. a long time and significant compute resources). By allowing the Scratch Space to be cleared upon promoting its contents to a Dataset Snapshot, we are able to perform the operation nearly instantly. Both the Scratch Space and the newly created Dataset Snapshot are available for use after promotion.

    I want to promote my Scratch Space to a Dataset that doesn’t exist yet. How do I create a Dataset?

    You can create an empty Dataset as described in the Creating Datasets Support article.

    What if I want to take the contents of my Dataset Scratch Space and make it into a Snapshot of a Dataset not in my current Project?

You would have to do this manually and treat the Scratch Space like a directory you’d like to copy or move into the new Snapshot directory.

First, you must have access to create a new Snapshot in that Project (see Sharing and Collaboration ).

Assuming you have permission, you would need to create a configuration in your domino.yaml file where you mount an output directory for your desired Dataset. Because the Dataset is not in your Project, you will have to refer to the Dataset in a fully qualified way: {project_owner_username}/{project_name}/{dataset_name}.

    So, for example, if I wanted to mount an output directory so that I could create a new Snapshot to a Dataset called iris in a Project called datascience owned by john_smith, I should have a YAML entry that looks something like:

datasetConfigurations:
- name: "new iris snapshot"
  outputs:
  - path: "new_iris"
    dataset: "john_smith/datascience/iris"

Here, I called the configuration new iris snapshot and chose a mount point of new_iris; both of these are up to you.

    Once you’ve properly mounted the output directory for your new Snapshot, you can simply launch a Run and copy or move the contents of your Scratch Space to the output directory.
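As a rough Python sketch of that copy step (the Scratch Space path uses the dara-data/big-data example from earlier, and new_iris is the hypothetical output mount defined above):

import os
import shutil

# Scratch Space path from the earlier examples (username dara-data, project big-data)
scratch_dir = "/domino/datasets/dara-data/big-data/scratch"

# output mount defined by the "new iris snapshot" configuration above
output_dir = "/domino/datasets/new_iris"

# copy every top-level entry from the Scratch Space into the output mount;
# whatever is in the output mount becomes the new Snapshot when the session ends
for name in os.listdir(scratch_dir):
    src = os.path.join(scratch_dir, name)
    dst = os.path.join(output_dir, name)
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)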

    What Dataset Scratch Spaces are shown in the administration table?

All Dataset Scratch Spaces are shown in the administration table for Dataset Scratch Spaces. A Dataset Scratch Space becomes “active” any time a User starts a Workspace in a Project. Hence, there is a Dataset Scratch Space for any Project in which a User has started a Workspace.

    Why am I allowed to delete the contents of a Datasets Scratch Space that uses zero bytes?

    Scratch Spaces with any files are eligible to be deleted; this is the case even if the files are zero bytes.

    View Article
  • Overview

    This article will guide you through a hands-on tutorial on using Domino Datasets in advanced mode. Advanced mode allows you to set up multiple customized dataset configurations, and swap between them from one Run to the next.

    If you want to understand the basic features of Domino Datasets, read the Datasets overview.

In this tutorial, you'll be working with data from the Climate Analysis Indicators Tool (CAIT) via the World Resources Institute. If you follow along, you'll learn how to write, read, append, revert, and manage Domino Datasets.

    This tutorial will take you approximately 30 minutes to complete.

    Prerequisites

    Domino 3.3+

    Domino executors must have access to the Internet for this tutorial

    Contents

    Creating Datasets

    Setting up an initial Dataset configuration

    Writing to an output Dataset

    Reading from an input Dataset

    Appending to a Dataset

    Reverting a Dataset to a previous snapshot

    Sharing a Dataset

    domino.yaml schema

    Creating Datasets

    Domino Datasets belong to Domino projects. Permission to read and write from a Dataset is granted to project contributors, just like the behavior of project files. For your first step in this tutorial, you should create a new project. This project will be used to ingest, process, and store data.

    Navigate to your Domino application, then click + New Project from the landing page. Give the project an informative name, set its visibility to Private, then click + Create Project.

You'll be automatically redirected to the Files page for the new project. From the project menu, click Datasets. This page will show information about the Datasets in this project, plus any shared Datasets that are mounted. Since this is a new project, there are no Datasets to display here. Click Create New Dataset to get started.

For the purpose of this tutorial, you'll want to create two new Datasets: one to store raw imported data, and one to store some derived processed data. Name the first Dataset cait-raw, give it an informative description like the one shown above, then click Upload Contents.

    The Dataset will be created, and you will then be directed to the upload page. You can ignore this page for now, since this tutorial focuses on writing to Datasets with advanced mode. Click Datasets from the project menu again, and you'll see the Dataset you just created listed with zero active Snapshots.

Click Create New Dataset again, and follow the same process to create an emissions-trend Dataset.

    Now that your Datasets are created, the next step is to configure your project to use them in advanced mode.

    Setting up an initial Dataset configuration

    When you want to interact with a Dataset from your Domino project in advanced mode, you must mount the Dataset.

    There are two ways to do so:

Mounting a Dataset as an input Dataset makes the contents of a specific snapshot (the most recent, by default) available in a directory at the specified mount point in your Workspace or Run. A Dataset mounted only for input cannot be modified.

    Mounting a Dataset as an output Dataset creates an empty directory at the specified mount point, and when your Run or Workspace is stopped a new snapshot is written to the Dataset with the contents of that directory. Note that the new snapshot will only contain exactly those files that are in the mounted output directory. Snapshots do not append by default.

It's important to note that the same Dataset can be mounted for input and output simultaneously at different mount points, which will be important when we perform append and revert operations later. For now, we'll set up our project to mount the cait-raw Dataset for output, so we can populate it with some data.

Dataset configurations for Domino projects are controlled by a file named domino.yaml. If Domino sees this file in the root of your project, it will attempt to read it and make available the Dataset configurations specified within.

The domino.yaml file doesn't exist by default, so from the Files page of your project, click the Add File button.

Name the file exactly domino.yaml. You can read the full YAML schema for this configuration file here, but for now you can just copy and paste the following markup to create a configuration that mounts cait-raw for output.

datasetConfigurations:
- name: "PopulateRaw"
  outputs:
  - path: "raw-output"
    dataset: "cait-raw"

    The three important values in this configuration are:

    "PopulateRaw" is the name you give this configuration so you can identify and select it when starting a Workspace or Run.

    "raw-output" is the directory name that will be mounted to receive new output at /domino/datasets.

    "cait-raw" is the name of the Dataset you want to mount.

    Once you've filled in the filename and contents, click Save.

    You're now set up to write data to a Dataset.

    Writing to an output Dataset

If you've created a valid domino.yaml file in the root of a project, you'll see an option to select the configurations defined within when launching a Run or Workspace. To populate some initial data into cait-raw, you should start up a Jupyter Workspace with the PopulateRaw configuration selected.

From the project menu, click Workspaces. Click Jupyter, give the workspace session an informative name, then click the Advanced tab in the Datasets panel. Select the PopulateRaw configuration in the dropdown menu, then click Launch Jupyter Workspace.

    If you get an error saying that a valid Dataset configuration file was not found, double-check that your file is correct YAML, uses spaces instead of tabs for indentation, and is named exactly correctly with no spaces before or after the filename.

    In your new Jupyter Workspace, click New > Terminal to access the executor shell.

    Domino has made some of the CAIT data available in a public bucket on Amazon S3. Run the following commands to fetch two files containing data on CO2 emissions by country for the years 2010 and 2011.

    wget https://s3.amazonaws.com/dominodatalab-cait/country-emissions-2010.csv

    wget https://s3.amazonaws.com/dominodatalab-cait/country-emissions-2011.csv

    When finished, you should have the two downloaded files in your /mnt working directory.

To write these files to your output Dataset, you need to copy them to the mount path set in your Dataset configuration. In the PopulateRaw configuration, the output Dataset was mounted at raw-output. That directory path gets appended to a base path of /domino/datasets.

    To queue the files you downloaded for writing to the Dataset, use these commands to move them to the output mount.

    mv country-emissions-2010.csv /domino/datasets/raw-output/

    mv country-emissions-2011.csv /domino/datasets/raw-output/

    Now, the output mount directory contains the two files you want to write to the next snapshot for the output Dataset.
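If you prefer to stage the files from a notebook cell rather than the terminal, a rough Python equivalent of the commands above (fetching the same files straight into the output mount) is:

import urllib.request

output_mount = "/domino/datasets/raw-output/"

# same public CAIT files fetched by the wget commands above
for year in (2010, 2011):
    filename = "country-emissions-%d.csv" % year
    url = "https://s3.amazonaws.com/dominodatalab-cait/" + filename
    # download straight into the output mount so the files land in the next snapshot
    urllib.request.urlretrieve(url, output_mount + filename)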

    To write the snapshot, all you need to do is stop your Workspace session. Click Stop from the top menu, then Stop and Commit in the prompt. Domino will detect that there is data in the output mount, and will write a new snapshot.

Back in Domino, open the Datasets page. Click the name of the output Dataset (cait-raw) to view details on it, and you'll see the snapshot you just wrote to the Dataset.

    You now have a populated Dataset that you and other contributors to your project can use to access data for analysis and transformation.

    Reading from an input Dataset

    In this step, you'll read in the data from the raw Dataset, transform it, and write it to a new Dataset. The first step is to create a new Dataset configuration.

From the Files page of your project, click the filename of domino.yaml to open the file, then click Edit. Add the following new Dataset configuration at the end of the file, then click Save.

    - name: "WriteTrend"

    inputs:

    - path: "raw-input"

    dataset: "cait-raw"

    outputs:

    - path: "trend-output"

    dataset: "emissions-trend"

When finished, your domino.yaml file will describe two configurations:

    the PopulateRaw configuration you used in the previous step

the new WriteTrend configuration that mounts the cait-raw Dataset for input, and the emissions-trend Dataset for output

    Using the new WriteTrend configuration, you can write code that reads from the input Dataset, performs some processing or analysis operations on the data within, then writes to the different output Dataset.

    For this step you will write and execute a Python script as a Domino Run, rather than using a Domino Workspace. The same operations could be done in a Workspace if desired, but using a batch Run will create a repeatable step in a data pipeline, which you could run every time the input Dataset changes to get an updated output Dataset.

    From the Files page of your project, click Add File. Name the file calculate-trend.py, paste in the Python script below, then click Save.

from __future__ import division
import glob
import pandas as pd

# list all files from the input dataset mount, sorted so the year-on-year pairs line up
data_dir = "/domino/datasets/raw-input/"
data_files_list = sorted(glob.glob(data_dir + '*'))

# function takes two raw MtCO2 dataframes and returns the percentage change as formatted strings
def percentage_change(year1_df, year2_df):
    year1_data = year1_df["CO2 emissions (Mt)"]
    year2_data = year2_df["CO2 emissions (Mt)"]
    change = (year2_data - year1_data) / year1_data * 100
    change_formatted = change.round(2).astype(str) + "%"
    return change_formatted

# create output dataframe
emissions = pd.DataFrame()
emissions['Country'] = pd.read_csv(data_files_list[0])['Country']

# write a new column for each year-on-year pair
for i in range(len(data_files_list)):
    if i == 0:
        continue
    df_year_1 = pd.read_csv(data_files_list[i - 1])
    df_year_2 = pd.read_csv(data_files_list[i])
    year1 = str(df_year_1.loc[0, 'Year'])
    year2 = str(df_year_2.loc[0, 'Year'])
    column_name = "Emissions change " + year1 + " to " + year2
    emissions[column_name] = percentage_change(df_year_1, df_year_2)

# send data to the output dataset mount
output_file = "/domino/datasets/trend-output/emissions-trend.csv"
emissions.to_csv(output_file)

To run this script with the WriteTrend Dataset configuration, click Jobs from the project menu, then click Run at the top of the Runs list. Enter calculate-trend.py as the file you want to run, then below that click the Advanced Datasets configuration tab and choose WriteTrend from the dropdown menu.

Click Start Run to execute your code. When finished, you'll see a new snapshot written to the emissions-trend Dataset, containing the transformed data from cait-raw. Every time you run calculate-trend.py with this Dataset configuration, it will read and process the latest snapshot of cait-raw and write a new snapshot of emissions-trend.

In the next step, you'll learn how to append to a Dataset by adding a new file to cait-raw.

    Appending to a Dataset

    Appending to a Dataset requires use of Advanced Mode.

    Suppose you receive a fresh batch of raw data, in this case a new file with data from 2012 that you want to store alongside the data from 2010 and 2011. The logical operation you want to do is append that file to the existing content of the last snapshot. The procedure for appending to a Domino Dataset involves mounting it for both input and output simultaneously.

    Remember that by default, mounting a Dataset for input makes available the files in the most recent snapshot, and mounting a Dataset for output provides an empty directory, the contents of which will become the next snapshot at the end of the Run or Workspace session.

    The high-level steps to an append are:

    Start a Run or Workspace session with the Dataset mounted for both input and output

    Copy the contents of the input mount to the output mount

    Add the data you want to append to the Dataset to the output mount

To continue this tutorial example, you first need to write a new Dataset configuration. From the Files page of your project, click the filename of domino.yaml to open the file, then click Edit. Paste the following new Dataset configuration at the end of the file, then click Save.

    - name: "AppendRaw"

    inputs:

    - path: "raw-input"

    dataset: "cait-raw"

    outputs:

    - path: "raw-output"

    dataset: "cait-raw"

This new AppendRaw configuration mounts the cait-raw Dataset for both input and output. The input mount will be at /domino/datasets/raw-input and the output mount will be at /domino/datasets/raw-output.

Now you can perform your append operation by starting up a Domino Workspace with the AppendRaw configuration.

    From the project menu click Workspaces.

    Select Jupyter.

    Give the Workspace an informative name.

Click to open the Advanced tab in the Datasets panel.

Choose AppendRaw from the dropdown menu.

    Click Launch Jupyter Workspace.

In your Jupyter Workspace, click New > Terminal to access the executor shell. Follow these steps to complete your append operation:

    Fetch the new file with 2012 data.

    wget https://s3.amazonaws.com/dominodatalab-cait/country-emissions-2012.csv

    Copy the previous snapshot of the Dataset from the input mount to the output mount.

    cp /domino/datasets/raw-input/* /domino/datasets/raw-output/

    Move the new file to the output mount.

    mv country-emissions-2012.csv /domino/datasets/raw-output/

    Click Stop then Stop and Commit to end this session and write the new snapshot of the Dataset.
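If you'd rather perform the same copy from a notebook cell instead of the terminal, a rough Python sketch follows. It uses the mount paths from the AppendRaw configuration and assumes the input snapshot contains only files and that the newly downloaded 2012 file is in the working directory:

import os
import shutil

input_mount = "/domino/datasets/raw-input"    # latest snapshot of cait-raw, read-only
output_mount = "/domino/datasets/raw-output"  # empty directory that becomes the next snapshot

# carry the previous snapshot's files forward
for name in os.listdir(input_mount):
    shutil.copy2(os.path.join(input_mount, name), os.path.join(output_mount, name))

# add the newly downloaded 2012 file alongside them
shutil.move("country-emissions-2012.csv", output_mount)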

If you examine the cait-raw Dataset from the Datasets page of your project, you'll see a new snapshot with the 2012 file appended to the contents of the previous snapshot. Now, if you want to, you can start a fresh Run of calculate-trend.py with the WriteTrend configuration to update the emissions-trend Dataset with the 2012 data.

    Reverting a Dataset to a previous snapshot

Reverting a Dataset is similar to appending, in that you'll mount the Dataset for input and output simultaneously. However, instead of mounting the latest snapshot for input, you'll mount a specified previous snapshot that you want to revert to. Suppose there was an issue with the 2012 file you added in the previous section, and you want to go back to the initial snapshot of cait-raw.

To identify a specific snapshot in your Dataset configuration, you need to tag the snapshot. From the Files page of your project, click the Datasets tab. Next, click the name of the cait-raw Dataset.

    From the dropdown menu, choose the snapshot you want to tag, in this case Snapshot 0.

    Click the + Add Tag button below the Dataset name in the upper left, then fill in the string you want to tag the snapshot with. In the below example the snapshot is tagged with "good."

When finished, you'll see a blue tag with the string you entered appear next to the + Add Tag button. You'll now need to write a Dataset configuration that mounts the tagged snapshot of the Dataset. From the Files page of your project, click the filename of domino.yaml to open the file, then click Edit. Paste the following new Dataset configuration at the end of the file, then click Save.

    - name: "RevertRaw"

    inputs:

    - path: "raw-input"

    dataset: "cait-raw:good"

    outputs:

    - path: "raw-output"

    dataset: "cait-raw"

This is similar to the append configuration, but note that the input Dataset name is entered as cait-raw:good. This is how to set up a Dataset configuration to mount a tagged input snapshot: a colon is appended to the Dataset name, followed by the tag string.

Now you can perform your revert operation by starting up a Domino Workspace with the RevertRaw configuration.

    From the project menu click Workspaces.

    Select Jupyter.

    Give the Workspace an informative name.

Click to open the Advanced tab in the Datasets panel.

Choose RevertRaw from the dropdown menu.

    Click Launch Jupyter Workspace.

In your Jupyter Workspace, click New > Terminal to access the executor shell. Follow these steps to complete your revert operation:

    Copy the tagged snapshot contents from the input mount to the output mount.

    cp /domino/datasets/raw-input/* /domino/datasets/raw-output/

    Click Stop then Stop and Commit to end this session and write the new snapshot of the Dataset.

If you examine the cait-raw Dataset from the Datasets tab on the Files page of your project, you'll see a new Snapshot 2 with only the original two files from Snapshot 0 in it.

    Sharing a Dataset in advanced mode

    To give another user access to the Datasets in your project, you need to add them to the project as a Contributor, Results Consumer, or Project Importer. Once your colleague has been granted one of those permissions on the project, he or she can refer to your Datasets in a domino.yaml file with the scheme:

    <your-username>/<your-project-name>/<dataset-name>

For example, to mount the emissions-trend Dataset from the above examples as an input Dataset, your colleague would use a configuration like the following, noting that documentation in this path is the username of the project owner:

datasetConfigurations:
- name: "import"
  inputs:
  - path: "cait-input"
    dataset: "documentation/cait-data/emissions-trend"

    domino.yaml schema

The domino.yaml file respects the following schema.

datasetConfigurations: # contains an array of configurations
- name: string # identifier for this configuration
  inputs: # contains an array of datasets to mount for input
  - path: string # path appended to /domino/datasets
    dataset: string # name of the dataset to mount as input
  outputs: # contains an array of datasets to mount for output
  - path: string # path appended to /domino/datasets
    dataset: string # name of the dataset to mount for output

    View Article
  • Contents

    Overview

    Writing to a local Dataset

    CLI Upload

    Upload by Running Script

    Upload by Launching Workspace

    Reading from a shared Dataset

    Managing Datasets

    Overview

    Domino Datasets provide high-performance, versioned and structured filesystem storage in Domino. With Domino Datasets, you can build multiple curated pipelines of data in one project, and share them with your fellow contributors across their projects.

    A Domino Dataset is a series of Snapshots. Each Snapshot is a completely independent state of the Dataset, and represents the contents of a filesystem directory from the time when the snapshot was written. There are two key ways to interact with a Domino Dataset:

    you can write a new snapshot to one of your project's local Datasets

    you can read from an available snapshot of a shared Dataset you have mounted

    Prerequisites

    Domino 3.3.1+ on AWS in an EFS-enabled region

    If you're running Domino on-premises and want to use Datasets, contact your Customer Success Manager for more information.

    Writing to a local Dataset

    Domino Datasets belong to Domino projects. Permission to read and write from a dataset is granted to project contributors, just like the behavior of project files. A Dataset that belongs to a project is considered to be local to that project. To create a new Dataset in your project, click Datasets from the project menu, then click Create New Dataset.

    Supply a name and optional description, then click Upload Contents. The upload page provides four ways to write to your local dataset.

    Browser Upload

    In Domino 3.5+ you can use the Upload Files section to queue up to 50GB or 50,000 individual files for upload through your browser. You can pause this upload and resume within 24 hours. You can upload directories and subdirectories to preserve your filesystem structure.

    CLI Upload

    After installing and configuring the Domino CLI, you can copy and paste the displayed command to upload a directory of files from your local machine to the Dataset. Note that all contents of the directory you specify are written to the Dataset.

    For the example shown above, if the files you want to write to the Dataset are in /Users/myUser/data, you would run the following command:

domino upload-dataset njablonski/simple-model/main /Users/myUser/data

    When finished, click Complete. You will then be taken to the Dataset overview where you should see a new Snapshot has been written. The new Snapshot will contain exactly those files that were in the folder you uploaded from your local machine.

    Upload by Running Script

Before using this method, you need a script in your project files that is configured to write to the target Dataset. Supply the name of a Bash, Python, or R script and click Start to launch a Job. During the Job, an empty folder will be available at the path shown in Output Directory. At the conclusion of the Job, any files that your script has written to the output directory will be written to your Dataset as a new Snapshot.

    In the example above, the Output Directory is /domino/datasets/main/output. For the simplest possible example, you could run the below script for a situation where there is a file named data.csv in your project.

    write-dataset.sh

cp $DOMINO_WORKING_DIR/data.csv /domino/datasets/main/output/

    When the script runs, it will copy the data file to the Dataset output directory. Then, when the Job is finished, Domino will write a new snapshot to the Dataset. The new Snapshot will contain the exact contents of the output directory, which in this case is just the data.csv file.

    Upload by Launching a Workspace

This method works similarly to uploading by running a script. You will have all of the usual options available from your Domino environment for launching a Workspace. When the Workspace is launched, an empty folder will be available at the path shown in Output Directory. When you stop and sync the Workspace, any files that you have written to the output directory will be written to your Dataset as a new Snapshot.

    Note

    There is a configurable limit to the number of snapshots a Dataset may contain.

    This limit defaults to 20 Snapshots.

    If your Dataset is at this limit, attempting to start an upload with any of the above methods will result in an error message. Before you can write additional Snapshots, you will need to delete old Snapshots or increase the limit. Talk to your local administrator for more information.

    Note

    You can create a maximum of 5 local Datasets per project.

If you are setting up a pipeline that requires more than 5 Datasets, use a separate project for each logical task, importing Datasets from the project that precedes it in the pipeline.

    Reading from a shared Dataset

    To access the contents of an existing Dataset snapshot, you must mount the target Dataset in your project. To mount a Dataset, click Datasets from the project menu, then click Mount Shared Dataset.

    Click the Dataset to Mount field to see an autocomplete dropdown of Datasets you have access to. To access a Dataset, you must be an Owner, Contributor, Project Importer, or Results Consumer on the project that contains the Dataset.

    There are three different settings under Update Behavior that control which Snapshot of the target Dataset your project will mount. You can mount the latest Snapshot, a tagged Snapshot, or a fixed Snapshot number. When finished, click Mount.

    Now, on the Datasets page for your project you will see the Dataset you mounted listed under Shared Datasets.

    The Path shown for the Dataset points to a directory where you will find the file contents of the mounted Snapshot in your project's Runs and Workspaces. When mounted this way, the Dataset is read-only.
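For example, code in a Run or Workspace can read from the mounted Snapshot like any other directory. A minimal Python sketch follows; the mount path and filename are placeholders you would replace with the Path value shown on the Datasets page and a real file from the Snapshot:

import pandas as pd

# placeholder path: copy the actual Path value shown for the mounted Dataset
mount_path = "/domino/datasets/example-mount"
df = pd.read_csv(mount_path + "/data.csv")  # read-only access
print(df.head())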

    Managing Datasets

    From the Datasets page of your project, click the name of a local or shared Dataset to open its overview page. At the top of the overview page you will see the Dataset name and description, plus buttons to upload to, rename, or archive the Dataset.

    Below the description is a panel with Dataset details. Use the dropdown menu at the top of the panel to select a Snapshot, then the panel will display a list of the files it contains, plus some metadata about the Snapshot.

    There are two important actions you can take on a Dataset snapshot:

    Add Tag

    Above the Snapshot selection dropdown you will find a list of tags applied to the Snapshot, followed by a + Add Tag button. Tags can be used to identify a Snapshot when mounting a shared Dataset for input. This allows the Dataset owner to tag a Snapshot for production use, and move the tag to whichever Snapshot is in the desired state as the Dataset changes over time.

    Mark for Deletion

    Clicking the Mark for Deletion button in the lower right of the panel will mark the currently selected Snapshot for deletion, changing its status. Such Snapshots can no longer be mounted in Runs by consuming projects. The Snapshot will be flagged to a Domino administrator as ready for deletion, but will not be fully deleted until the administrator takes an additional action to delete it, at which point the data is erased and no longer recoverable.

    View Article
  • Overview

    This article describes how to convert legacy Data Sets workflows to use Domino Datasets. This is a two-step process that involves moving your data into a new Domino Dataset, and then updating all projects and artifacts that consume the data to retrieve it from the new location.

    Prerequisites

    Domino 3.3+

    Contents

    Migrating data from a legacy Data Set into a Domino Dataset

    Updating data consumers to use the new Domino Dataset

    Migrating data from a legacy Data Set into a Domino Dataset

    Legacy Data Sets are semantically similar to Domino Projects. If your deployment is running a version of Domino with the new Domino Datasets feature, you can create Domino Datasets inside legacy Data Sets. This will allow for a very simple migration path for a legacy Data Set, where all of the existing data is added to a single Domino Dataset owned by the legacy Data Set, and the entire file structure is preserved.

    The long term deprecation plan for legacy Data Sets is to transform them into ordinary Domino Projects, which will continue to contain and share any Domino Datasets you created in them.

    To get started, you need to add a script to the contents of your legacy Data Set that can transfer all of your data into a Domino Dataset output mount. From the Files page of your legacy Data Set, click Add File:

    Name the file migrate.sh, and paste in the example command provided below.

    cp -R $DOMINO_WORKING_DIR/. /domino/datasets/output/main

This example migration script copies the contents of $DOMINO_WORKING_DIR, which is a default Domino environment variable that always points to the root of your project, to a Domino Dataset output mount path. The directory named main in the path above is derived from the name of the Domino Dataset that will be created to store the files from this legacy Data Set.

    Click Save when finished. Your script should look like this:


    Next, click Datasets from the project menu, then click Create New Dataset.

Be sure to name this Dataset to match the path to the output mount in the migration script. If you copied the command above and added it to your script without modification, you should name this Dataset main. You can supply an optional description, then click Upload Contents. On the upload page, click to expand the Create by Running Script section.

    Double-check to make sure the listed Output Directory matches the path from your migration script, then enter the name of your script and click Start. A Job will be launched that mounts the new Dataset for output and executes your script. If the Job finishes successfully, you can return to the Datasets page from the project menu and click the name of your new Dataset to see its contents.

    You now have all of the data from your legacy Data Set loaded into a Domino Dataset. This method preserves the file structure of the legacy Data Set, which is useful for the next step: updating consumers to use the new Dataset.

    Updating data consumers to use the new Domino Dataset

    Potential consumers of your legacy Data Set are those users to whom you granted Project Importer, Results Consumer, or Contributor permissions. As the project Owner, you also may have other projects consuming the contents of your legacy Data Set. This same set of permissions will grant access to your new Domino Dataset.

A project consuming data from your legacy Data Set imports it like any other project, and it will be visible on the Other Projects tab of the Files page.

    In the example above, the global-power project imports the data-quick-start legacy Data Set. The contents of data-quick-start are then available in global-power Runs and Workspaces at the path shown in the Location column. Anywhere your code for batch Runs, scheduled Runs, or Apps refers to that path will need to be updated to point to the new Domino Dataset.

To determine the new path and set up access to the Domino Dataset, you need to mount the Dataset. With the consuming project open, click Datasets from the project menu, then click Mount Shared Dataset. The Dataset to Mount field is a dropdown menu that will show shared Datasets you have access to. In the above example, the main Dataset from the data-quick-start project will be mounted at the latest snapshot. Select the Dataset that you migrated your data into earlier, then click Mount.

    When finished, you will see the Dataset you added listed under Shared Datasets. The Path column shows the path where the contents of the Dataset will be mounted in this project's Runs and Workspaces.

    Remember that if you used the migration script shown earlier, the file structure at that path will be identical to the file structure of the imported legacy Data Set location. All you need to do to access the same data is change the path to this new Domino Dataset mount.

    Be sure to contact other users who are consuming your legacy Data Set and provide them with information about the new Domino Dataset.

    View Article
  • Overview

    This article describes how to use Domino Datasets to solve problems, improve collaboration, and open new workflow possibilities in Domino.

    Prerequisites

    Domino 3.3+

    You can use Datasets to...

    Store more files, bigger files, and access them faster

    Build multiple curated collections of shareable data in your project

    Track production and testing states of your data

    Simplify working with Domino locally

    Automatically pipe data from external sources into Domino

    Store more files, bigger files, and access them faster

When you start a Run or launch a Workspace, Domino copies your project files to an executor. When working with large volumes of data, this presents three potential problems:

    The number of files that can be stored in Domino project files may exceed the configurable limit. By default, the limit is 10,000 files.

    There is a limit to the size of any individual file that can be transferred to and from your Domino project files. By default, this limit is 8GB.

    The time required to transfer data to and from the executor is proportional to the size of the data. It can take a long time if the size of the data is very large, leading to long startup and shutdown times for Runs and Workspaces.

    You can solve these problems with Domino Datasets:

    There is no limit to the number of files that can be stored in a Domino Dataset.

    There is no limit to the size of any individual file stored in a Domino Dataset.

    Domino Datasets are attached to executors as networked filesystems, removing the need to transfer their contents to the executor when starting a Run or Workspace.

    Build multiple curated collections of shareable data in your project

    If you use project imports and exports to share data with other members of your team, the consumers of your project will receive the entire contents of your project files in their Runs and Workspaces. That works well if your project is small, simple, and narrowly scoped.

    However, for large projects that produce many data products, you may want to expose them to your consumers in smaller, curated subsets. You can do this with Domino Datasets.

    Consider the project shown below.


This project has a small folder full of code, plus nine folders full of various kinds of output data. Each data folder is larger than 10GB, and the whole project is 100GB. It would be impractical to ask your data consumers to import this project, but you also don't want to separate the data from the code that produced it by moving the data to a different project.

    The solution is to organize the data into Datasets, with one Dataset for each type of data your consumers are interested in. In this example, suppose you have two colleagues who want to consume your data. One of them is only interested in the data from the experiment1 folder, and the other is only interested in the data from experiment9.

You can follow the instructions on the Dataset overview to create and write to two Datasets with scripts like the following, where it's assumed you have named the Datasets experiment1-data and experiment9-data.

    experiment1-populate-dataset.sh

    cp -R $DOMINO_WORKING_DIR/experiment1/. /domino/datasets/experiment1-data/output

    experiment9-populate-dataset.sh

    cp -R $DOMINO_WORKING_DIR/experiment9/. /domino/datasets/experiment9-data/output

    Your consumers can then mount only the Datasets they are interested in.

    Note

    If you are working with data at this scale, you should write it to Datasets whenever you produce it, instead of storing it in your project files.

    You can execute your experiment code from the Datasets upload interface, and make the Dataset output directory the destination for your data products. If you want to write to multiple Datasets in the same Run, check out the Datasets advanced mode.

    Track production and testing states of your data

    If you have a Dataset that is being used by downstream consumers for critical work, tagging allows you to continue to improve, process, and experiment with new Snapshots without impacting those consumers. When you have improved data ready for use, you can switch which Snapshot is tagged, and your tag consumers will automatically start getting your new data.

    Consider the Dataset shown below.

The Dataset has three active Snapshots. If you decide that you want consumers of this Dataset to work from Snapshot 1, since Snapshot 2 represents an experimental modification of the data that you are not yet confident in, you can apply a tag like prod to Snapshot 1.

When your consumers mount the Dataset in their projects, they have the option to mount whichever Snapshot is marked with a given tag. When they choose the Pin your snapshot at a certain tag update behavior, they will see a dropdown menu of available tags.

    When you are confident that your experimentation has produced a new Snapshot that is ready for production use, you can remove the prod tag from Snapshot 1, and apply it to the new Snapshot. Your consumers will then automatically see the newly tagged Snapshot mounted in their Runs and Workspaces. Note that trying to apply the tag again without first removing it from the previously tagged Snapshot will result in an error.

    Simplify working with Domino locally

If you use the Domino CLI to work with projects on your local machine, you may find that storing large data files slows down your download and sync operations and fills up a lot of your local disk storage. You can prevent this by storing data in a Domino Dataset, and reserving your project files for the scripts and documents you want to work with locally.

    Follow these steps to simplify your local workflow:

    Create a Dataset in your project, and write your large data files to it.

    Once the files have been written to the Dataset, you can remove them from your project files.

    Fetch a fresh, slimmed-down copy of your project.

Update your code to reference your data files in their new location (see the sketch after these steps), at:

    /domino/datasets/local/<dataset-name>/

    When everything is working smoothly, you can delete any copies of the project from your local machine that have the large data files in them.
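As a minimal sketch of that code update (the Dataset and file names here are hypothetical; the /domino/datasets/local/<dataset-name>/ pattern comes from the step above):

import pandas as pd

# hypothetical Dataset named my-large-data containing training.csv
df = pd.read_csv("/domino/datasets/local/my-large-data/training.csv")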

    Automatically pipe data from external sources into Domino

    If you have data in an external source that you want to periodically fetch and load into Domino, you can do so with scheduled Runs set up to write to Datasets with advanced mode.

    Suppose you have data stored in an external data source that is periodically updated. If you wanted to fetch the latest state of that file once per week and load it into a Domino Dataset, you could use the following process to set up a scheduled Run.

    Create a Dataset to store the data from the external source.

    Write a script that fetches the data and writes it to the Dataset.

    Set up an advanced mode configuration to bridge between your script and your Dataset.

    Create a scheduled job to run your script with the new Dataset configuration.

    Below is a detailed example showing how to fetch a large, dynamic data file from a private S3 bucket with a scheduled Run once per week.

    First, create a Dataset to hold the file. This example shows the Dataset being named fetched-from-S3.

    After clicking Upload Contents, the Dataset will be created. However, instead of using one of the UI options to perform an upload, you should instead click Files from the project menu, then click Add File to start creating the script for your scheduled Run.

    For this example, assume the S3 bucket is named my_bucket and the file you want is named some_data.csv. In that case, you can set up your script like this:

    fetch-data.py

import boto3

# create a new S3 client
client = boto3.client('s3')

# download some_data.csv from my_bucket and write it to the latest-S3 output mount
client.download_file('my_bucket',
                     'some_data.csv',
                     '/domino/datasets/latest-S3/some_data.csv')

    It's important to note that the latest-S3 part of the path in the last line of the script is a folder you need to set up as part of your Datasets advanced mode configuration. To set that up, create another new file in your project, and name it domino.yaml.

    To match the script shown above, its contents should be the following:

    domino.yaml

datasetConfigurations:
- name: "pipe-in"
  outputs:
  - path: "latest-S3"
    dataset: "fetched-from-S3"

That configuration mounts the fetched-from-S3 Dataset created earlier for output at the latest-S3 path used by the fetch-data.py script. The last step is to set up a scheduled Run that executes this script once per week with the correct Dataset configuration.

    View Article
  • Datasets best practices

    To learn more about Datasets, read:

    Datasets overview

    Datasets advanced mode tutorial

    View Article
  • Overview

domino.yaml is a file that defines Dataset configurations.

    A Dataset configuration controls:

    Existing Dataset Snapshots and how those Snapshots are mounted for input.

    New directories that can become Snapshots and how those directories are mounted for output.

    Schema

    The domino.yaml file respects the following schema. Spaces matter.

datasetConfigurations: # contains an array of configurations
- name: string # identifier for this configuration
  inputs: # contains an array of datasets to mount for input
  - path: string # path appended to /domino/datasets
    dataset: string # name of the dataset to mount as input
  outputs: # contains an array of datasets to mount for output
  - path: string # path appended to /domino/datasets
    dataset: string # name of the dataset to mount for output

    Valid fields in the YAML object are:

    datasetConfigurations

This is a required field. It must be the very first field on the first line. Only one of these fields can exist in the YAML file. It contains an array of individual Dataset configurations.

    name

    Identifier for a specific configuration.

    path

    Desired mount path for Dataset Snapshot or new Snapshot directory.

    dataset

    Name of dataset. If configured as input, the latest Snapshot of the Dataset will be mounted by default. A different tagged Snapshot can be specified using a colon, like {dataset-name}:{tag}. For example: iris:test

    inputs

    Contains array of one or more [path, dataset] specifications to be mounted for input.

    outputs

Contains array of one or more [path, dataset] specifications to be mounted for output.

    Error Handling

    If you attempt to use an invalid domino.yaml, you may see one of these categories of error.

    Invalid field that Domino does not recognize in a particular position

    The error indicates the field found and shows valid field options for that position.

    Example

    Found invalid field in domino.yaml: “output”.

    Valid field options: “inputs”, “outputs”, “name”

    Valid field that Domino recognizes in a particular position, but there is an error.

    An example of this is two outputs fields in one name block.

    Example

    There is an error in domino.yaml encountered while processing field “outputs”.

    Please check all your “outputs” fields.

    Valid field that Domino recognizes in an incorrect position.

    An example of this is a valid field with the wrong indentation.

    Example

    There is a formatting error in domino.yaml encountered while processing field

    “dataset”. Please check all your “dataset” fields.

    Syntax error

    An example of this is a missing quote. In some cases, we can identify the region the error occurs.

    Example

    There is a formatting error in domino.yaml in the block near 12.

    Catch all

    We are having trouble parsing domino.yaml.

    Please see the support article linked above. If you still cannot identify

    the problem, please email [email protected] about your problem and include your domino.yaml.
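Before reaching out, one quick local sanity check is to confirm that the file parses as YAML at all. A minimal Python sketch, assuming the PyYAML package is available in your environment (this checks syntax only, not Domino-specific fields):

import yaml  # requires the PyYAML package

# parse the file locally to catch basic syntax problems such as missing quotes or bad indentation
with open("domino.yaml") as f:
    config = yaml.safe_load(f)

# list the configuration names Domino should see
for configuration in config.get("datasetConfigurations", []):
    print(configuration.get("name"))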

    View Article
You can configure projects to import files and environment variables from other projects. This allows you to use your team’s existing work as reusable building blocks, and avoid unnecessary repetition.

    For example:

    A canonical dataset used by multiple projects can be managed in a single place.

    Your code can reside in a separate project from your data. No need to duplicate large datasets within multiple projects.

    An external data source requiring login credentials (such as a database) may be securely represented via environment variables in a single project, and then used by many projects.

Results from one model (such as trained model files, or R workspaces) can be imported and used by multiple different downstream projects.

    If a project's files are organized as an R or Python package, then you can configure other projects to automatically install them at runtime.

    How it works

    The first step is to configure the exporting project. Projects can export files and environment variables.

Other projects can import content from projects that are configured for export. After you’ve set up import, the content from the exporting projects is accessible when you run code in the importing project.

    During runs with imported files, each project directory is located at /mnt/<username>/<project name>, where <username> is the owner of that particular project. Imported directories are read-only.

    Note

The path of your main project will also change from /mnt to /mnt/<username>/<project name>. If you have hardcoded any paths in your projects to /mnt, we recommend replacing the hardcoded paths with the $DOMINO_WORKING_DIR environment variable. This will ensure the correct path regardless of whether other projects are imported. See the support article on Domino Environment Variables for more information.
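For instance, a short Python sketch of building a path from the environment variable instead of a hardcoded /mnt (the data filename here is hypothetical):

import os

# resolves to /mnt or /mnt/<username>/<project name> depending on whether other projects are imported
working_dir = os.environ["DOMINO_WORKING_DIR"]

# build paths relative to the working directory rather than hardcoding /mnt
data_path = os.path.join(working_dir, "data", "input.csv")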

    Setup

    First, you need to set up the projects you want to export from. You need to have Owner, Collaborator, or Project Importer access to the projects to set them up for export.

    To set up export, open the project in Domino and click Settings from the project menu, then open the Exports tab. In the panel on that tab, click the checkbox for files or environment variables to make those types of content available to other projects. To export as a package, select the appropriate language from the Code Package dropdown.

    Next, open the project into which you want to import content. Click Files from the project menu, then open the Other Projects tab. Add projects by filling in the project name field, then clicking Import. You'll see projects currently being imported listed in a table below.

    Note

    Only the files from the directly imported project will be viewable when you import. For example, if project A is imported into project B, and then your project imports B, only the contents of B will be accessible to your project.

    Troubleshooting

    Running Python scripts from an imported project

    When running a Python script from an imported project you may encounter the error message:

    FileNotFoundError: [Errno 2] No such file or directory:

    When a Python script runs an import, it executes that code relative to the current working directory, so if you have a relative path in the imported file, it will try to find the file in the current folder and fail. In this case, you can update your imported script to build an absolute path from the location of the imported file using os.path, e.g.:

    import os
    file_name = os.path.join(os.path.dirname(__file__), 'your_referenced_file.dat')

    View Article
  • Using Search

    Domino’s search feature is a comprehensive tool for locating specific files or bits of text across your entire deployment.

    To use the search feature, click Search in the left navigation bar. Type in the term you're looking for. The search panel will automatically update with results organized by source, including project names, files, runs, comments, and more.

    When Domino searches files, it will return results found in both filenames and file contents. However, Domino only indexes the latest revisions of files, so the search results will not contain occurrences of your search term from previous versions.

    Security

    Domino search respects the collaboration and privacy settings for projects. If you do not have read access to a project, then that project will never appear in your search results.

    Advanced Search Options

    In addition to the four search tabs, you can use the following commands in your queries to target what you search:

    project.tag=

    project.tag.approved=

    project.description=

    project.name=

    project.user=

    For example:
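    project.tag=churn

    Here churn is a hypothetical tag name; a query like this restricts your results to projects carrying that tag. The other commands filter on their respective project fields in the same way.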

    View Article
  • Forking a project copies all of its files into a new, separate project, allowing for independent development and experimentation. Changes in the forked project can later be reviewed and merged back into the original project.

    Forking

    To fork a project, open the project overview and click Fork. Enter a name for the fork when prompted. You must be the project owner, or have access to the project as a contributor or results consumer in order to fork it. You can learn more about project roles and access control here.

    These things are copied to the newly forked project:

    All files

    Revision history of all files

    Environment variables

    These things are not copied to the newly forked project:

    Run history

    Project settings, including collaborators and compute environment

    Launchers

    Discussion history

    Projects that have been forked, or were created by forking another project, will link to related forks on the project overview page.

    Merging

    Once you've made some changes to the new fork, you can initiate a merge by clicking Request Review on the project overview page. You must be the project owner, or have access to the project as a contributor in order to request a merge review.

    You will be prompted to submit a review request, in which you can review the changes and describe their effects with a message. Once submitted, contributors to the main-line project are notified. The merge will occur when a contributor accepts the review, and a new revision of the main-line project will be written with the forked changes merged in.

    To view a history of Review Requests, including the status of current requests, select "Reviews" from the left-hand menu.

    View Article
  • Project tags are an easy way to add freeform metadata to a project. Tags help colleagues and consumers organize and find projects that interest them in Domino. Tags can be used to describe the subject matter explored by a project, the packages and libraries it uses, or the source of the data within.

    Tagging a Project

    Tags can be added, deleted and modified from a project's overview page by clicking the `+` button above the description.


    While you can create a tag with whatever content you'd like, tags indicated in green have been marked as approved by your Domino admin or librarian to help reduce duplicate tags.

    Managing Tags

    Domino admins and librarians can manage the tags in a Domino deployment. From the left navigation menu you can click Tags to open the tags interface.

    From this screen you can add, delete, and edit existing tags. You can also merge tags. You can mark a tag as approved which will make it appear green to all of your users, and signal that its use is encouraged.

    View Article
  • Overview

    Domino allows you to easily revert entire projects or individual files back to previous versions at any time. This is intended to encourage experimentation and creative approaches to problem-solving by eliminating the overhead cost of recovering earlier, functional versions if experiments don’t pan out.

    Reverting a Project

    To revert to an earlier version of a project, open the project in Domino, then click Files from the project menu. Above the table of files you'll see a dropdown menu that controls which revision of the project files is shown. The current revision is at the top, and older revisions appear below it chronologically. Click on an older revision to view it.

    While you're viewing an older revision, the files page will have a yellow background, and a callout will appear below the dropdown menu, alerting you to the fact that this is an older state of the project, and providing a link to switch back to the latest.

    You'll also see a Revert Project button next to the dropdown menu. Clicking this button will revert your project to the state shown in the revision you're currently viewing. This will result in a new revision being applied on top of the existing project history. The revision you reverted from is not lost.

    Reverting a File

    Domino allows you to revert individual files as well. This can be useful in cases where you wish to revert to an earlier version of a file while still preserving changes made to other files in the project.

    To revert to an earlier version of a file, open the project containing the file in Domino, then click Files from the project menu. Click on the name of the file you want to revert to view the contents of the file. You'll see a dropdown menu above the file contents that can be used to select which revision of the file you want to view.

    While viewing an older revision of the file, you'll see a Revert File button next to the dropdown. Clicking this button will write a new revision of the project that changes this file to match the state you want to revert to. Just like with reverting a whole project, the revision you're reverting from is not lost.

    View Article
  • Domino can show you a rich comparison of the differences between two revisions of a file in your Domino project. To view this comparison, open the file from the Files page and click Compare Revisions.

    Domino will open the file comparison tool and display two menus above its contents. The left menu, labeled Base, sets the starting version of the file for your comparison. The other menu, labeled Target, lets you select another version of the file to compare to the base version.

    Note

    When you are viewing the Files page for your project, you can select between all revisions for the project. When you are viewing an individual file, the revisions dropdown is limited to only those revisions where a change was made to that file.

    View Article
  • Overview

    Domino supports adding Git repositories to projects. Repositories that have been added to a project are available to Runs started in that project, allowing you to access the contents of those repositories just as you would your Domino files. This article explains how you can add a Git repository to a project, access the added repository from within a Workspace, and commit any changes back to the repository.

    Domino supports connecting to Git servers via HTTPS and SSH, and both public and private repositories.

    Contents

    Step 1: Create credentials

    Option 1: SSH key

    Option 2: Personal Access Token

    Step 2: Add your credential to Domino

    Option 1: SSH key

    Option 2: Personal Access Token

    Step 3: Add a repository to a project

    Working with a Git repository in Domino

    Committing back to Git repositories

    Git interaction from the workspace command line

    Tracking changes to repositories made in Domino Runs

    Troubleshooting

    Step 1: Create credentials

    If you are adding a private repository, want to write commits to remote, or are using SSH, you will need to add Git credentials to your Domino account. Domino will use these credentials to authenticate with the service hosting your repository when you start a Run.

    Domino supports storing two types of credentials:

    Personal Access Tokens

    SSH private keys

    If you already have the credential you need, you can skip to step 2.

    Option 1: SSH key creation

    To connect with SSH, you'll need a private SSH key that corresponds to a public key that you've added to your Git service. Check out the GitHub documentation for thorough instructions on creating and adding keys.

    Option 2: Personal Access Token creation

    You will need a Personal Access Token to access a private repository via HTTPS. You will need a Personal Access Token if the URI you want to use to interact with a repository is formatted as:

    https://<domain>/<user>/<repository>.git

    Personal Access Tokens are supported by the following Git services.

    GitHub Personal Access Tokens

    GitLab Personal Access Tokens

    Note

    To connect to Bitbucket repositories via HTTPS from Domino, you must add a Bitbucket App Password credential to your Domino account.

    Note

    If your GitHub organization requires SSO then you will need to authorize the PAT or SSH key in order to access private repos via Domino.

    Read the GitHub documentation for instructions on authorizing keys for SSO on Github.

    Step 2: Add your credential to Domino

    Option 1: SSH private key

    You will need an SSH Private Key to access a repository via SSH. You will need an SSH private key if the URI you want to use to interact with a repository is formatted as:

    <user>@<domain>:<username>/<repository>.git

    SSH access is supported by the following Git services.

    GitHub SSH Access

    GitLab SSH Access

    Bitbucket SSH Access

    After setting up SSH access with your Git service, you should have both a public key that you provided to the Git service, and a private key. Use these steps to add the private key to Domino:

    In Domino, click your username at the bottom of the main menu, then click Account Settings.


    Scroll down to the panel labeled Git Credentials, then click Add a New Credential.

    In the Domain field, enter the exact domain of the service hosting your repository, such as github.com, bitbucket.com, or your-internal-gitlab-url.com.

    For Authentication Credential Type, click to select Private SSH Key.

    Paste in your private key. This will be the contents of the private key file that matches the public key you provided to your Git service.

    If you set up your SSH keys to require a passphrase when used, enter it in the Passphrase field, then click Add Credential.

    You should now see your credential listed in the Git Credentials panel. You can also delete it from this panel if desired.

    Option 2: Personal Access Token

    The Personal Access Token you generate needs to have read and write access to your private repositories. After generating a Personal Access Token in your Git service, use these steps to add it to Domino:

    In Domino, click your username in the upper right, then click Account Settings.

    Scroll down to the panel labeled Git Credentials, then click Add a New Credential.

    In the Domain field, enter the exact domain of the service hosting your repository, such as github.com, bitbucket.com, or your-internal-gitlab-url.com.

    For Authentication Credential Type, click to select Personal Access Token.

    Enter your Personal Access Token, then click Add Credential.

    You should now see your credential listed in the Git Credentials panel. You can also delete it from this panel if desired.

    Step 3: Add a repository to a project

    Open the project you want to add a repository to, then click Files from the left navigation bar.

    Click to open the Git Repositories tab, then click Add a New Repository.

    Enter an optional directory name and the HTTPS or SSH URI of the repository you want to add. The directory name will be the directory in /repos that this repository clones into. It defaults to the name of the repository.

    Use the dropdown menu to choose which branch of the repository you want Domino to check out when it clones this repository into a run or workspace. If you leave this setting at Use default branch, Domino will check out the branch specified as default by your Git service, typically master. You can also specify a different branch name, tag name, commit ID, or supply a custom Git ref.

    Click Add Repository.

    Working with a Git repository in Domino

    When you start a run or workspace in a project, any repositories added to the project are cloned into /repos and will have the branch or commit you specified checked out.

    Remember that your Domino working directory is in /mnt, which is a sibling of /repos. Both directories are in the filesystem root (/). Scripts you have added as Domino files can interact with the contents of these repositories by specifying an absolute path to /repos/<repo-name>/<file>.
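    For example, a Python script stored in your Domino project files could read data from an added repository like this (the repository and file names here are hypothetical):

    # Repositories are cloned into /repos; adjust the path to match your repository.
    with open("/repos/my-analysis-repo/data/example.csv") as f:
        raw_data = f.read()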

    Committing back to Git Repositories

    When you start a Workspace session in a project that has added Git repositories, you will see those repositories listed in the Session Overview under Git repos. If you make changes to the contents of those repositories while running the workspace, those changes will be itemized file-by-file under each repository.

    If you want to commit those changes back to the repository, click the checkbox next to the repository name and then click Full Sync.

    You will be prompted to supply a commit message. This commit message will be attached to commits to the selected Git repositories, and to a new revision of the Domino project if there are changes to Domino files. Git commits will be pushed to the default branch you specified when adding the repository.

    If you attempt to stop your workspace while there are uncommitted changes to your Git repositories, you will be prompted to commit those changes. This works the same as the Session Overview interface. Click the checkbox next to the repositories you want to commit to, supply a commit message, and click Stop and Commit.

    Note

    If you try to commit when there are conflicts between your local changes and the state of the default branch in remote, Domino will create a new branch from its local state. Domino will then push that new branch to remote.

    After this happens, you will need to resolve those conflicts outside of Domino, or use the command line in your Workspace session to resolve them. The next time you launch a Workspace session, Domino will check out the default branch from remote, not the new branch it pushed.

    Git interaction from the workspace command line

    Both Jupyter and RStudio workspaces have command line tools. You can use these to interact with your repositories with conventional Git commands. Navigate to /repos in your command line to find your project's repositories. Visit the official Git documentation to learn more about using Git on the command line.

    To open the RStudio command line, click Tools -> Shell

    To open the Jupyter command line, from the Files tab click New -> Terminal

    Tracking changes to repositories made in Domino Runs

    When viewing the Details tab of a Domino Run, at the bottom you will find a Repositories panel. You can expand this panel to see details of how the repository changed during the Run. Domino records the checked out commit at the start of the Run and the end of the Run.

    Troubleshooting

    Run Error:

    Errors occurred while processing dependencies. Please contact [email protected]:

    Credentials are required for your repository: project-name (ssh://[email protected]/your-org/projectname.git)

    Solution:

    Your Git Credential added to Domino may have the incorrect Domain. Double-check the domain field in your Git credential to ensure it matches your exact Git repository URL, like:

    github.com

    bitbucket.com

    your-internal-gitlab-url.com

    Run Error:

    Errors occurred while processing dependencies. Please contact [email protected]:

    Authentication is required for your repository:

    The repository provided requires credentials but none were found.

    Please add SSH or PAT authentication to your Domino account.

    Solution:

    There are a couple of steps to check when encountering this error. First, ensure your private SSH key or PAT has been added to the Git Credentials section of your Domino Account Settings page. Second, if your organization's Git repo requires SSO access, you may need to authorize the key you have added. Take a look at the following instructions on authorizing keys for SSO on GitHub.

    Run Error:

    Errors occurred while processing dependencies. Please contact [email protected]:

    remote: Invalid username or password.

    fatal: Authentication failed for 'https://github.com/<your account>/<your repo>/'

    Solution:

    If your organization's Git repo requires SSO access, you may need to authorize the key you have added. Take a look at the instructions on authorizing keys for SSO on GitHub.

    Keywords: Git, GitHub, GitLab, Bitbucket, credentials, integration, connect

    View Article
  • Contents

    Overview

    Project files

    Special files

    Project settings

    Hardware tier

    Compute environment

    Environment variables

    Access and sharing

    Stage and status

    Activity Feed

    Export and import

    Setting up export

    Setting up import

    Overview

    Work in Domino happens in projects. Projects contain data, code, and environment settings, and the entire project is tracked and revisioned automatically. A new commit is written to a project each time its files are changed by user action, or by the execution of code in the project. Users in Domino can create their own new projects, invite other users to collaborate on them, and export data or results for consumption by other projects.

    Project files

    Domino manages a collection of files for every project. Files can be added to a project in the following ways:

    uploaded from the Domino web application

    uploaded from the Domino Command Line Interface

    uploaded via the Domino API

    created and edited in the Domino web application

    generated by the execution of code in a workspace or run

    Each of these ways of modifying a project's files creates a new revision of the project. Whenever you start a Run from a Domino project, the files in that project are loaded onto the machine hosting the Run. This machine is known as the executor. The project files are mounted in the /mnt directory, which is in the filesystem root of the executor. Domino keeps track of changes to this directory. When a Run completes, Domino will record changes to /mnt as a new revision of the project. To learn more about how files are loaded into and changed by Runs, read about the Domino service filesystem.

    It is also possible to add external Git repositories to projects. Doing so makes the contents of those repositories available in runs and workspaces in the /repos directory of the executor. To learn more, read about Git repositories in Domino.

    Special files

    There are several special filenames reserved by Domino. These files control the revisioning behavior, results display, and run comparison features of a project.

    .dominoignore

    By default, all projects include a .dominoignore file in the project root folder. This file functions similarly to a .gitignore file, and can be used to exclude certain file patterns from being written to future revisions of the project files. Domino will ignore files that match the specified patterns whenever a new revision is created. This includes revisions created by syncing from your local machine using the CLI, as well as new revisions created by a run or workspace session.

    To ignore a file pattern, add it to .dominoignore. Patterns can be filenames, folder names, or UNIX shell regular expressions. Adding a folder will ignore it along with all of its contents. Note that the * symbol in UNIX shell regular expressions is a wildcard, and will match any sequence of characters. All paths must be relative to the project root. Take a look at the contents of the default .dominoignore in one of your projects to see commented examples of excluded patterns.
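    For example, adding the following hypothetical patterns to .dominoignore would exclude a scratch/ folder and all CSV files from future revisions of the project:

    scratch/

    *.csv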

    Note

    A .git/ directory is always ignored by Domino sync operations, even if that pattern is not listed in .dominoignore.

    .dominoresults

    Domino projects include a special file named .dominoresults. This file controls which files appear in the results dashboard for this project's runs. It is constructed similarly to .dominoignore, but lists file patterns to include instead of exclude. If no patterns are listed in this file, all files changed by a run will be included in the results dashboard. If any patterns are listed in this file, only files which match those patterns will be included in the results dashboard for this project's runs.

    For example, a .dominoresults file that contains the following lines will only display the two specified files in the results dashboard.

    histogram.pdf

    output.txt

    A .dominoresults file that contains the following lines will display all PDF files in the project, plus any PNG files that are in the results/ folder.

    *.pdf

    results/*.png

    dominostats.json

    Domino's run comparison feature checks for a file named dominostats.json to compare key measurables from individual runs. This file is automatically deleted at the beginning of a run, and will only exist in the project revision produced by a run if a fresh version is written during execution. Read more about runs to learn the details of how this feature works.

    Project settings

    There are several important settings attached to every Domino project. To access project settings, open a project overview and click Settings from the left sidebar.

    Hardware tier

    Hardware tiers describe the compute hardware that will be used for project executors. Executors can either be virtual instances from a cloud services provider, or a machine running in your deployment's on-premise data center. Local administrators will configure the hardware tiers available for your deployment. Use the Hardware & Environment tab of the project settings to set a specific hardware tier for your project.

    You should choose a hardware tier that will provide the performance your workflow needs, bearing in mind the cost of the hardware in cloud deployments, and the impact of your tenancy on local hardware in on-premise deployments. Domino will use this hardware tier for all runs started in the project. When the hardware tier is changed, it will be the default for future runs in the project, although the default can be overridden when starting a run that requires alternate hardware.

    Compute environment

    Compute environments are specifications for containers in which Domino projects will run. Users can create new environments and access public environments shared in their deployment or organization. Whenever a new executor is launched or provisioned for use with a project, Domino loads the compute environment specified in the Hardware & Environment tab of the project settings.

    Click to read more about how environments work.

    Environment variables

    Domino pulls environment variables from three sources whenever it loads a run or workspace:

    User, project, and hardware information. These are stored in variables set by Domino automatically.

    Environment variables defined in the user profile of the user starting a run.

    Environment variables defined in the Hardware & Environment tab of the project settings.

    These environment variables can be used to securely store keys and credentials needed by the project. The names of these variables must start with a letter, and contain only alphanumeric characters and underscores.
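    For example, a credential stored as a project environment variable (here given the hypothetical name DB_PASSWORD) can be read from your code at runtime rather than hardcoded:

    import os

    db_password = os.environ["DB_PASSWORD"]  # hypothetical variable name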

    Access and sharing

    Read Sharing and collaboration for details on how to grant various types of access to your projects.

    Project stage and status

    In Domino 3.5+, projects can be labeled with configurable stages that track their progress through a data science life cycle. If your Domino administrator has configured project stages, you will see the current stage of your project displayed in brackets below the project name in the project menu. If you click the project name, a panel will open with a dropdown menu that you can use to change the project stage if you are an owner or contributor on the project.

    Tracking your project through the stages used by your team will help your colleagues understand what kind of work is happening in the project and how they can contribute. Changing the stage of a project is an event that will appear in the project's activity feed.

    Projects also have a status, which is indicated by the colored pip next to the project name.

    A project's status can be:

    Green

    Marks an active and progressing project.

    Red

    Marks an active project that is blocked.

    Grey

    Marks a completed project.

    By default, new projects are set to a green, active status. To modify a project's status, click the project name at the top of the project menu to open a panel with options to mark a project as blocked or complete. When you do so, you'll be given the option to supply a message describing the blocker or end result of the project. Changing the status of a project is an event that will appear in the project's activity feed with an attached comment thread, so project collaborators can discuss blockers or project conclusions.

    Marking a project as blocked

    Project owners and contributors can mark a project as blocked. You should mark a project as blocked when you need assistance from colleagues or administrators to make progress. Domino administrators and your project collaborators will receive an email notification when you mark a project as blocked. Some common cases where raising a blocker can help are:

    You need assistance setting up additional tools in a Domino environment

    You need access to new data sources

    You need hardware capabilities not covered by your current hardware tier

    The same menu used to mark a project as blocked can be used to unblock the project, which returns it to a green, active status.

    Marking a project as complete

    Project owners and contributors can mark a project as complete. This allows for final conclusions and products to be recorded in the project's activity feed, and filters the project out of active project views. On your projects overview, you will find a Show completed projects checkbox that you can use to find projects that have been marked as complete.

    The same menu used to mark a project as complete can be used to reopen the project, which returns it to a green, active status. Note that a project marked as complete is still a fully functional Domino project. You can modify its files and start Runs in it, but before doing so you may want to consider reopening the project to indicate that work is continuing.

    Project activity

    Click Activity in the project menu to open the project's Activity Feed. On this page you will see the history of activity in the project, including:

    Jobs started

    Workspaces started

    Comments left on Jobs or Workspaces

    Comments left on files

    Project stage changes

    Blockers raised or resolved

    Files created, edited, or deleted in the Domino UI

    Files modified in Workspace sessions

    Models or Apps published

    Scheduled Jobs published or edited

    You can use the dropdown menu at the top of the feed to filter out comments, Jobs, or Workspaces. If you check two successfully completed Runs in the feed, you can use the comparison button next to the filter menu to open a Run comparison.

    Export and Import

    It is possible to import content from one Domino project into another. The importing project may have access to the exporting project's files, environment variables, or both, depending on configuration. During runs with imported files, each project directory is located at /mnt/<username>/<project name>. When a run or workspace is started, these files are pulled in alongside the current project's files. Imported directories are read-only.

    Note

    The path of your project will also change from /mnt to /mnt/<username>/<project name> when you have imported projects. If you have hardcoded any paths in your project code to /mnt, you should replace them with paths that use the $DOMINO_WORKING_DIR environment variable.

    Setting up export

    From the project overview page, in the left sidebar click Exports under Settings. From this interface you can enable exports for the project's files and environment variables separately, or export the project files as a Python or R package. If none of these are enabled, other projects will not be able to import anything from this project.

    By default, projects will make their latest revision available for export when configured. You can also make revisions produced by specific runs available for import by tagging those runs with release. From the runs page of a project, select the runs you want to export by clicking the checkbox next to them, then click the tag button at the top of the list. Enter the exact string release to mark the revision created by the selected runs as available for export.

    Setting up import

    From the Files page of the project you want to import into, click the Other Projects tab. Enter the path to the project you want to import, with the format <username>/<project-name>. The following conditions must be true for you to successfully import a project:

    You must have Project Importer, Contributor, or Owner permissions on the project.

    The project must be configured for export.

    After adding a project that exports files, you can choose which revision of the project files you want to import with the Release menu.

    View Article
  • Prerequisites

    Domino 3.5.5+

    Projects Portfolio

    The Projects Portfolio allows you to track the status of projects you have access to, including:

    projects you own

    projects you have been added to as a collaborator

    or, if you are a system administrator, all projects across Domino

    To open the Projects Portfolio, click Control Center in the Switch To menu.


    You will find the Projects Portfolio as an option in the Control Center main menu.

    The Control Center interface shows the Projects Portfolio with the following important elements:

    This is the main menu option that opens the Projects Portfolio

    This button filters the table of projects to show only projects that are marked by users as blocked

    These tabs filter the table of projects by active or complete status

    These buttons filter the table of projects by project stage

    This menu selects which columns of data to display in the projects table

    This interface allows you to quickly digest the state of work in your projects. To maximize the usefulness of this tool, be sure to understand how administrators can configure meaningful project stages for their teams, and read about how users set project stages, raise and resolve blockers, and change project status.

    View Article
  • Prerequisites

    Domino 3.5+

    Logged in as a user with a system admin role

    Project stage configuration

    As a data science leader, you have the ability to define a set of custom project stages that users in Domino can use to label their projects, creating useful views in the Projects Portfolio. These stages can be used to mark a project's progress through the workflow and life cycle your team uses. To learn more about how users interact with and set project stages, read about stage and status in the projects overview.

    To set up the stages that will be available to users in your Domino platform, open the Admin interface, then click Advanced > Project Stage Configuration.


    On the project stage configuration interface, you can click Add Record to create a new stage label that will be available for Domino users to set on their projects. The record at the top of the list is the default stage all new projects created in Domino will have, and projects can be changed to any other available stage.

    These stages are a custom set of labels that allow your Domino users to communicate progress in a project to their colleagues and to leadership. It's up to you as a data science leader to determine the stages that you want available, and to communicate to your team how they should be used.

    Domino recommends setting up a custom default project for new users, with information in the README about your team's practices, available environments, and how users should use project stages.

    View Article
  • Overview

    When experiments in Domino yield interesting results that you want to share with your colleagues, you can easily do so with a Domino App. Domino Apps host web applications and dashboards with the same elastic infrastructure that powers Jobs and Workspace sessions.

    Domino supports hosting Apps built with many popular frameworks, including Flask, Shiny, and Dash. Apps are first-class objects in Domino, and Domino includes features that allow for easy sharing, monitoring, collaborating, and iterating on Apps.

    Contents

    How do I publish an App?

    How do I view my App?

    Who can see my App?

    Where do I find Apps in Domino?

    How do I publish an App?

    Apps are published from Domino Projects. To publish an App, you need to:

    Have all of your application code in the project files for the Project you want to publish from, or loaded into your Project from an external git repository.

    Configure your application to serve from a host address of 0.0.0.0 on port 8888. This is the host and port Domino will use when directing users to your application server (see the sketch after this list).

    Have an app.sh file in the project. Domino will look for and execute a shell script named app.sh after creating the hardware that will host your App. Put all commands required to launch your application in app.sh.
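    As a minimal sketch of the host and port requirement above (the module and route here are hypothetical, and app.sh would simply run python app.py), a Flask app.py could end like this:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "Hello from Domino"

    if __name__ == "__main__":
        # Domino directs users to 0.0.0.0:8888, so the app must listen there.
        app.run(host="0.0.0.0", port=8888)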

    When your application code is ready for publishing, and you've set up app.sh, click Publish from the project menu, then click App. Give your App an informative name and description, choose a permissions setting, and toggle the Show in Launchpad checkbox to control whether your App appears in the Domino Launchpad.

    Click Publish when you're finished.

    Once your App is published, the App link from the project menu will navigate you to the App settings page, where you can click View App to open your App or copy a link to it for sharing with your colleagues.

    For complete end-to-end examples of App publishing, check out these tutorials:

    Getting started with Dash (Python App framework)

    Getting started with Shiny (R App framework)

    Publishing a Flask web app in Domino

    Remember these key facts about App publishing

    Your App will run on the same Domino execution hardware your project uses normally. Make sure your environment has all the dependencies your application requires.

    Your application must be configured to serve from a host address of 0.0.0.0 on port 8888.

    The performance of your App will depend on the design of the underlying application. Read more about designing applications for performance.

    How do I view my App?

    The App settings page has a View App button that can be used to open your App while it is running. You can also copy a persistent URL that can be used to access and share your App.

    You can also see all of your own Apps from the Launchpad.

    A dashboard showing the history of Runs hosting your App can be seen on the App Versions tab of the App settings page.

    Who can see my App?

    Project owners, contributors, and results consumers automatically have access to an App. To control more general access to an App, use the Permissions tab on the App settings page.

    Under Access Permissions are the following four options:

    Anyone, including anonymous users

    Any request to the App URL will be served the App. This setting means that anyone with network access to Domino can view your App. This is useful for sharing Apps with people on your network who do not have Domino accounts.

    Anyone with an account

    All users who have Domino accounts and are logged in to Domino can view your App.

    Invited users (other users may request access)

    All users who have Domino accounts and are logged in to Domino can request access to your App if it appears in the Launchpad, but cannot view the App until the owner grants the request.

    Invited users only

    Only Domino users who are added by the App owner via the Invite People field can view the App. Users cannot request access.

    Use the Invite People field to send email invites. Domino users who receive an invite will be able to access the App.

    Users who request access will appear provisionally in the Who has access table, and their requests can be granted or denied from controls in the Status column.

    Where do I find Apps in Domino?

    All Apps that are configured to Show in Launchpad will appear in the Domino Launchpad.

    This is the primary interface for Domino users who want to consume Apps. You can click on an App in this list to see its description, settings, recent usage, and a link to either view the app or request access.

    When viewing an App in Domino, App consumers have access to a toolbar with controls to view the App description, and to contact the App owner.

    View Article
  • Overview

    Launching an App in Domino works the same as any other Domino run. Domino assigns hosting of your App to an executor machine in the hardware tier your App is configured to use. That executor then retrieves the default Domino environment configured for your project, and creates a container based on that image. Domino then loads your project files onto the machine and executes the app.sh file that you have authored and placed in the project root, at which point your application will be running in its container.

    Depending on which hardware tier you select, the container running your application may share a host machine with other containers or run on a dedicated host. Your selection of hardware tier allows you to specify available memory and CPU. However, Domino Apps do not automatically scale horizontally to multiple hosts. Your App can scale vertically to use all available resources on the executor host machine by correctly configuring the underlying application.

    The following sections describe configuring web applications to optimize scalability and performance for several popular frameworks.

    Flask and Dash (Python)

    By default, Flask and Dash will run single-threaded on a single process. The authors of Flask do not recommend this configuration if you are going to serve more than 10 users concurrently, or for any externally consumed applications. The Flask documentation provides many ways to serve the application in a more scalable way.

    For example, you can serve a Flask application through gunicorn. To do this in Domino, change the project's app.sh file from:

    python app.py

    to

    gunicorn -w 4 -b 0.0.0.0:8888 myproject:app

    This will start serving the Flask application with 4 worker processes.

    The performance and scalability of your App will depend on the compute demands of your application, and the compute resources available on the host machine. If there is a command in your application that will use 100MB RAM and 20% of a standard VM CPU, then an executor host machine with 1 core and 1 GB RAM could handle 5 concurrent users running that command without suffering reduced performance. A 6th user attempting to run the command would cause the App's performance to suffer. There would be RAM available, but not enough CPU cycles.

    Shiny (R)

    Like Python Apps, the performance of your Shiny Apps will depend on the design of the underlying application. While multiple users can view Shiny applications in independent sessions, R is a single-process language. This means that multiple users can view and interact with the App in their own isolated session, but only one can do any processing at a time, regardless of the memory or CPU of the machine.

    Shiny Apps typically cannot scale to more than a handful of concurrent users.

    Learn more

    Make Shiny fast by doing as little work as possible

    View Article
  • The video below is a recording of a webinar held on March 28, 2019.

    The webinar provides an overview of App publishing in Domino, and walks through some example demonstrations.


    Click here to view and download the slides from the webinar presentation.

    For more information about Apps, read:

    App publishing overview

    Getting started with Dash

    Getting started with Shiny

    View Article
  • Overview

    This article describes how to configure, administer, and troubleshoot Domino Apps running on Kubernetes.

    Prerequisites

    Running Domino 3.4+ in AWS with Apps on Kubernetes configured.

    Contact [email protected] if you're interested in setting up Apps on Kubernetes.

    Enabling Apps on Kubernetes

    The following feature flags are accessible to Domino system administrators from the Admin interface at Advanced > Feature Flags.

    ShortLived.RunAppsOnKubernetes

    Default: false

    Description: When set to true, Domino will run Apps on Kubernetes.

    ShortLived.ComputeGridExecutionEventHistoryEnabled

    Default: false

    Description: When set to true, Domino will log Kubernetes deployment events.

    Domino recommends setting these flags to the same value. If you enable Apps on Kubernetes, you will likely also want to enable Kubernetes deployment logs.

    Configuring Kubernetes hardware tiers

    When launching an App on Kubernetes, users will not have access to the general hardware tiers available to Domino Runs. Before running Apps in Kubernetes, Domino system administrators must create Kubernetes hardware tiers.

    These are created in the same part of the Admin interface as general hardware tiers, at Advanced > Hardware Tiers.


    When creating a new hardware tier from this interface, you will find that the first option is to select a Cluster Type for the hardware tier. You must choose Kubernetes to create a hardware tier that can be used by Apps on Kubernetes. You will find that after setting the Cluster Type to Kubernetes, some of the other fields for configuring the hardware tier will disappear, as they represent settings for classic hardware tiers that are automatically managed by Kubernetes.

    This includes:

    Running in a specific subnet

    Overriding data volume mounts

    Publishing Apps on Kubernetes

    The process for authoring, configuring, and publishing Apps on Kubernetes is the same as publishing classic Domino Apps. Read the App publishing overview, App authoring guides, or watch the video introduction to Apps in Domino to learn the basics.

    With Apps on Kubernetes enabled, you will find when publishing your App that only hardware tiers with a Kubernetes Cluster Type are available for use with Apps.

    Note

    Due to differences in networking configuration between classic hardware tiers and Apps on Kubernetes, the following environment level integrations are not currently supported for Apps on Kubernetes.

    Kerberos authentication

    Connecting to VPN

    EFS mounts in Docker arguments

    SSH to the App environment

    About file sync

    Apps hosted on classic hardware tiers were functionally similar to Domino Runs, and could sync back file changes to the Domino Service Filesystem.

    Apps on Kubernetes are different, and do not support Run-style file sync. However, you can still write files back to your Domino project from your App with the Domino API.

    Troubleshooting Apps on Kubernetes

    Apps on Kubernetes are designed to gracefully handle failures. If your App experiences a user-induced crash or intermittent failure, Domino will automatically restart the App to maintain its availability. You can see the standard console output of your App process by clicking View Execution Details on the App settings tab.

    If your App code has a bug that causes persistent failure, your App can enter a loop where Domino repeatedly tries to restart it and it repeatedly crashes. If this is happening, you will see your App remain in a Running state while the User Output shows a recurring sequence of Starting server... followed by the same error, as shown in the example below.

    When you see this, you should click Stop on the App settings tab to manually halt the crash loop, and use the information from the User Output log to fix the error in your App code or app.sh script.

    If instead of looping through server starts in a Running state, your App proceeds to a Failed state, this could be caused by cluster-level deployment failure or misconfiguration. Domino will not attempt to automatically recover from such failures. If you encounter this kind of failure persistently, a Domino system administrator can click Download deployment logs and send the downloaded file to Domino support for assistance.

    Admins can also download deployment logs from past launches of your App on the App Versions tab.

    View Article
  • Understanding the performance of your experiments can involve analyzing many outputs and results. It's often useful to see key metrics at a glance across all your runs, to allow you to quickly identify which experiments are worth investigating further.

    Domino’s diagnostic statistics functionality allows you to do just that.

    To use this feature, write a file named dominostats.json to the root of your project directory. Use keys in this JSON file to identify the outputs you're interested in, and then add the corresponding values.

    Here is some example R code that writes three key/value pairs to dominostats.json:

    diagnostics = list("R^2" = 0.99, "p-value" = 0.05, "sse" = 10.49)
    library(jsonlite)
    fileConn <- file("dominostats.json")
    writeLines(toJSON(diagnostics), fileConn)
    close(fileConn)

    Here is the same data being written to dominostats.json by Python code:

    import json

    with open('dominostats.json', 'w') as f:
        f.write(json.dumps({"R^2": 0.99, "p-value": 0.05, "sse": 10.49}))

    The resulting dominostats.json file from these code examples looks like this:

    {
        "sse": 10.49,
        "R^2": 0.99,
        "p-value": 0.05
    }

    The dominostats.json file is deleted before each run automatically by Domino. Therefore, past dominostats.json files will not pollute new Jobs on your Jobs dashboard. If Domino detects that this file has been written to the project root by a Job, it will parse the values out and show them as columns on the Jobs dashboard. You can see the keys represented as available columns in the dashboard, and each row contains the corresponding value from a Job.


    You can also click Jobs Timeline at the top of the dashboard to expand a line chart of dominostats.json values over time. This chart shows all Jobs displayed in the current dashboard. To filter to a specific set of related Jobs, use tagging to create a separate dashboard view.

    The x-axis ticks on the timeline represent individual Jobs, and the y-axis represents the values for the statistics in those jobs. Hover along the chart to see individual data points as tooltips, and click at a point to open the details for the Job that produced that value. You can also click and drag on the chart to zoom in, and you can click on each stat in the legend at upper right to toggle its line on and off.

    View Article
  • These configuration options control various aspects of your Domino deployment. They are accessible from the admin page, via Advanced > Central Config. In order for changes to take effect, you’ll need to restart the Domino frontend server.

    Below are examples/descriptions of some of the options, grouped by functional area.

    Access Controls

    common com.cerebro.domino.defaultProjectVisibility Public

    defaultProjectVisibility controls the default level of access that users will have to new projects. The Sharing and Collaboration page describes the different options for this "Visibility" setting.

    Executor Timeouts

    role frontend com.cerebro.domino.dispatcher.executorStopTimeout 30min

    role frontend com.cerebro.domino.dispatcher.executorTerminateTimeout 7day

    How long until an idle executor is stopped or terminated (VPC only)

    Environment Editing

    common com.cerebro.domino.environments.allowAnyUserToEditEnvironments true

    Email Alerts

    common com.cerebro.domino.supportAlerter.enableEmailSupportAlert false

    common smtp.from no-reply@mydomain

    common smtp.host myemailhost

    common smtp.password myemailpassword

    common smtp.port 25

    common smtp.ssl false

    common smtp.user domino-smtp

    common com.cerebro.domino.email.notificationFromAddress no-reply@mydomain

    common com.cerebro.domino.email.transportType smtp

    common notification.maxFilesToAttach 10

    Run Directory GC

    common com.cerebro.domino.executor.runsGarbageCollectorExpirationsOverride { "Succeeded": "7days", "Failed": "7days", "Stopped": "7days", "Error": "2hours" }

    The runsGarbageCollectorExpirationsOverride controls how long a run working directory will remain cached on an executor. Leaving cached data may speed up subsequent runs on that executor, especially if the project is large (since the data copy step will be skipped). However, this requires ample disk space to maintain the cache.

    HTML file rendering

    common com.cerebro.domino.frontend.allowHtmlFilesToRender true

    Allows HTML files to render when opened so they display as they would in a regular web browser. Setting this to false means only the raw file contents will be shown.

    If you need assistance with a configuration parameter not shown here, please contact us for support.

    View Article
  • As part of our effort to make data science teams more productive by providing a secure, central system of record, Domino strives to deliver a consistent, equal user experience across whichever platforms our users prefer.

    To that end, we officially support the following desktop browsers - while Domino may work in other browsers, you'll likely receive a warning message encouraging you to switch to one we support.

    For Domino 1.54+, the supported browsers and versions are:

    Google Chrome: Latest regular release

    Mozilla Firefox: Latest regular release

    Microsoft Internet Explorer: 11

    Apple Safari: 10, 9* (OS X only)

    *Older releases of commonly used and supported browsers will maintain functionality but may result in an inconsistent user experience.

    View Article
  • By default, anyone who can access the Domino application can create a user account. It is therefore sometimes necessary to strongly restrict networking access to Domino to prevent unwanted account creation.

    If such restrictions are not enough, it is possible to specify a whitelisted domain or set of email addresses and only allow signups when the provided email is on the whitelist. To set a whitelist:

    Log in to the target deployment as an admin.

    Click your username in the upper right, then click Admin.

    Under Advanced, click Central Config.

    Click Add Record and provide the following:

    namespace: common

    name: leave empty

    key: com.cerebro.domino.frontend.signupEmailWhitelistOrDomain

    value: the value can be either a domain (e.g. dominodatalab.com) or a comma-separated list of email addresses

    Click Create, then restart the Domino front-end for your changes to take effect. Users will then only be able to create accounts if the email they provide is on the whitelist or matches the whitelisted domain.

    Note: Domino does not do email verification when a user signs up. Anyone who enters a whitelisted email can create an account, even if they do not own that email address. This technique relies on security through obscurity, and should not be relied upon long term.

    View Article
  • Overview

    In Domino 3.5+, administrators can use configurable thresholds to track user behavior across the platform for the purposes of identifying users who are taking up a Domino license. Users who access Domino only to consume data science products, view results, and run Launchers are not counted as taking up a practitioner license.

    Once a user performs a data science workflow like starting a Run or publishing a Model, the user will be considered a practitioner for the purposes of licensing.

    Tracking user license types

    To view user information and identify users who are taking up a license, open the Admin interface by clicking Admin at the bottom of the main menu, then click Users.

    From this interface, admins can see:

    What license type a user is assigned

    How many data science practitioner workloads a user has run

    The most recent activity for a user

    This allows admins to identify inactive users who are taking up a practitioner license, and there is an option in this interface to free up the license by deactivating the user.

    Generating user activity reports

    The same data on license types, practitioner workloads, and recent activity that is shown in the Users table is available as a downloadable CSV report. To generate a report manually, from the Admin interface click Advanced > User Activity Report.

    Admins can specify the following parameters for the report:

    A date range they want data from

    How far back to set the threshold for including actions in the recent activity section

    A specific project or organization to get data about

    Email addresses to receive copies of the report

    It's also possible to configure Domino to send User Activity Reports on a regular cadence. To set this up, click Advanced > Central Config from the Admin interface, then set the following options.

    Namespace: common

    Key: com.cerebro.domino.Usage.ReportRecipients

    Value: comma-separated list of email addresses to receive automated reports

    Default: empty

    Namespace: common

    Key: com.cerebro.domino.Usage.RecentUsageDays

    Value: number of days back to set as the threshold for recent activity

    Default: 30

    Namespace: common

    Key: com.cerebro.domino.Usage.ReportFrequency

    Value: cron string for how often to send usage reports

    Default: 0 2 * * * (daily at 02:00)

    View Article
  • Launchers let you turn your analyses into self-service web forms that less technical colleagues can interact with. They are great for creating "templatized" reports and analyses, so stakeholders can answer questions without bothering data scientists.

    This guide describes the basic concepts of how they work, and provides complete working examples in R and Python.

    You could build a Launcher that lets your users upload data and specify parameters, and then runs whatever code you specify under the hood, returning the output to them.

    Basic concepts

    A Launcher is essentially a web form on top of any script you could run through Domino, i.e., anything you could run at a command line. Any arguments (i.e., parameters) your script or executable expects can be exposed as UI elements in the web form.

    The Command

    When you create a Launcher, you specify a command that runs under the hood. This command serves as a template: when an end-user runs the Launcher through the web form, parameters in your command template will be replaced with the user's input values, and the resulting command will run.
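
    For example, a hypothetical Launcher wrapping a reporting script might use a command template like run_report.py ${region} ${start_date}; when a user submits the form, Domino substitutes the values they entered for ${region} and ${start_date} and runs the resulting command.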

    Outputs

    When your code runs, it runs just like anything else in Domino. Namely, Domino will detect any new files your code produces (images, CSVs, PDFs, HTML, etc.) and treat those as the "results". Whoever runs your Launcher will get a link to view those results on the web, and they'll get an email when the results are ready. So your code can produce rich images, even interactive HTML dashboards, dynamically based on a user's input.
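
    As a minimal sketch (assuming a file named input.csv exists in the project), a script could write an HTML summary that Domino then surfaces as a result:

    import pandas as pd

    # Any new file written by the run, like summary.html here, is picked up as a result
    summary = pd.read_csv("input.csv").describe().to_html()
    with open("summary.html", "w") as f:
        f.write(summary)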

    Parameters

    Like any other script you run through Domino, your command can take parameters/arguments. Anything in your command of the form ${param_name} will be treated as a parameter in the Launcher -- i.e., it will be presented to the end user for input.

    Your parameters can be of the following types:

    Text: normal text field

    Select: drop down where you can select one value from a list

    File: button to select and upload files

    Multiselect: list where you can select multiple values

    An end user would see those parameters rendered as the corresponding UI elements in the final web form.

    Writing your code to process parameter values

    When an end-user runs your Launcher through the web form, the user's input values will be passed into your command in place of the corresponding placeholders you specified, and that final command will be run as though it were a command-line executable. That means your underlying code can access parameters using any standard method for reading command-line inputs. The most common techniques would be:

    Python

    Using sys.argv, e.g.,

    import sys

    p1 = sys.argv[1]
    p2 = sys.argv[2]

    # a file upload parameter
    with open(sys.argv[3], 'r') as f:
        print(f.readline())

    # a multi-select parameter
    for part in sys.argv[4].split(","):
        print(part)
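
    If you prefer named variables over positional indexing, the same values can also be read with argparse; this is just an alternative sketch, and the parameter names used here are illustrative:

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("p1")              # a text parameter
    parser.add_argument("p2")              # another text parameter
    parser.add_argument("uploaded_file")   # a file parameter arrives as a path
    parser.add_argument("choices")         # a multi-select parameter arrives comma-separated
    args = parser.parse_args()

    with open(args.uploaded_file) as f:
        print(f.readline())

    for part in args.choices.split(","):
        print(part)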

    R

    Using the commandArgs function, e.g.,

    args <- commandArgs(trailingOnly=TRUE)

    p1 <- args[1]
    p2 <- args[2]

    # a file upload parameter
    print(readLines(args[3], n = 1))

    # a multi-select parameter
    for (each in strsplit(args[4], ",")) {
        cat(each, sep="\n")
    }

    Note that file parameters will be passed to your script as the path to the uploaded file. Multi-select parameters will be passed in as a comma-separated list of all selected choices.
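
    For example, if a hypothetical command template were python analyze.py ${year} ${data_file} ${colors}, and a user typed 2019, uploaded results.csv, and selected red and green, the command Domino actually runs would look roughly like python analyze.py 2019 /path/to/results.csv red,green (the upload path shown is illustrative; Domino substitutes the actual location of the uploaded file).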

    Full examples in R and Python

    R

    Our simple example lets users input two numbers, and behind the scenes, we run some R code that adds them and prints the sum (obviously a trivial example). Our script is in a file called launcher.R, and we could tell Domino to run launcher.R 10 20 (or any other two numbers), so our Launcher's command will be launcher.R ${A} ${B}.

    You can view the "R example" Launcher in our sample project to see how this works. launcher.R reads inputs from the command line, using the commandArgs() function.

    args <- commandArgs(trailingOnly = TRUE)

    a <- as.integer(args[1])
    b <- as.integer(args[2])

    if (is.na(a)) {
        print("A is not a number")
    } else if (is.na(b)) {
        print("B is not a number")
    } else {
        print(paste("The sum of", a, "and", b, "is:", a + b))
    }

    The R example Launcher is defined to take both A and B as parameters.

    When operational, the user will see a web form for entering A and B.
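
    For instance, if a user enters 10 for A and 20 for B, Domino runs launcher.R 10 20 and the result is the message that the sum of 10 and 20 is 30.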

    Python

    The "Scatter plotting your data" Launcher in the sample project shows a more sophisticated example in Python.

    The idea is to have a script that creates an interactive scatter plot, using Bokeh, from a CSV file that anyone can upload using a web form. The user provides (a) a file, (b) which columns to put on the X and Y axes, and (c) which column to use for coloring the data points; our Python script then generates an interactive Bokeh scatter plot.

    Writing the script

    This is the complete code of main.py.

    from bokeh.plotting import show, output_file
    from bokeh.charts import Scatter
    import pandas as pd
    import sys

    output_file("scatter.html")

    data = pd.read_csv(sys.argv[1])

    scatter = Scatter(data, x = sys.argv[2], y = sys.argv[3], color = sys.argv[4], legend = "top_left")

    show(scatter)
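
    Note that the bokeh.charts API used above has been removed in later Bokeh releases. If your environment ships a newer Bokeh, a roughly equivalent plot can be built with bokeh.plotting; the following is only a sketch, making the same assumptions about the command-line arguments:

    from bokeh.plotting import figure, output_file, show
    from bokeh.models import ColumnDataSource
    from bokeh.transform import factor_cmap
    from bokeh.palettes import Category10
    import pandas as pd
    import sys

    output_file("scatter.html")

    data = pd.read_csv(sys.argv[1])
    x_col, y_col, color_col = sys.argv[2], sys.argv[3], sys.argv[4]

    # Treat the color column as categorical and map each category to a palette color
    data[color_col] = data[color_col].astype(str)
    categories = sorted(data[color_col].unique())
    source = ColumnDataSource(data)

    p = figure(title="Scatter plot")
    p.scatter(x=x_col, y=y_col, source=source,
              color=factor_cmap(color_col, palette=Category10[10], factors=categories),
              legend_field=color_col)
    p.legend.location = "top_left"
    show(p)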

    Building the Launcher

    Select the “Launchers” button and click “New launcher”. You will see a setup screen for building the web form.

    As we determined when writing the script, we need four parameters, so click “Add Parameter” four times. You can rename them by selecting the text.

    Their exact names aren't important to the script; they are shown only to the users of the Launcher. What does matter is the order of the parameters, which maps directly to the script's positional arguments. Rename them to File, X, Y, Color.

    After renaming the parameters, you should change their types. In the overview of the parameters, click on “File” and use the dropdown next to “Type” to select “Upload File”. X, Y and Color can remain type “Text”.

    Finally, give the Launcher a clear name and, if you like, a Description.

    Using the Launcher

    Now it is time to put this to the test! Select the “Launchers” button and click “Run” on the launcher that you have just created.

    The form you created earlier appears, where you fill in the different values. As a dataset, I use scanvote.csv, which contains the percentage of the population voting “Yes” per district in Finland, Sweden and Norway.

    With this data I would like to create a scatter plot with population (Pop) as X, the “Yes” vote percentage (Yes) as Y, and the points colored by country (Country).

    Enter these values in the Launcher form, hit “Run”, and wait for the result!

    When the Launcher finishes, the user will get an interactive HTML page as the result.

    View Article
  • Overview

    Some Domino Standard Environments support launching Visual Studio Code (VSCode) in interactive Workspaces. VSCode is an open-source multi-language editor maintained by Microsoft. Domino can serve the VSCode application to your browser with the power of code-server from Coder.com.

    Prerequisites

    VSCode support is available in the latest versions of the following Domino Standard Environments:

    Domino Analytics Distribution for Python 2.7

    quay.io/domino/base:Ubuntu18_DAD_Py2.7_R3.5-20190501

    Domino Analytics Distribution for Python 3.6

    quay.io/domino/base:Ubuntu18_DAD_Py3.6_R3.5-20190501

    Domino Analytics Distribution for Python 3.7

    quay.io/domino/base:Ubuntu18_DAD_Py3.7_R3.5-20190501

    See below for instructions on adding VSCode to older environments.

    Contents

    Launching a VSCode Workspace

    Option 1: Launching VSCode directly

    Option 2: Launching VSCode from JupyterLab

    Installing VSCode extensions

    Installing VSCode into older environments

    Launching a VSCode Workspace

    When using a VSCode-equipped Domino environment, there are two ways to launch the VSCode application.

    Option 1: Launch VSCode directly

    You can launch VSCode directly from the Workspaces dashboard or Quick Action menu, the same way you would launch RStudio or Jupyter.


    If launched this way, your Workspace will open with the Domino controls around a VSCode editor. You can work with your project files in VSCode, and commit and sync with the Domino Workspace UI as normal.

    Option 2: Launch VSCode from JupyterLab

    In VSCode-equipped environments, you will also find VS Code IDE as a notebook option in JupyterLab.

    If launched this way, JupyterLab will open a new tab that will serve the VSCode application. This editor is running in the same Domino Run container as your JupyterLab application. However, the VSCode tab will not show the Domino Workspace controls. If you want to sync, commit, or stop your Workspace after working in VSCode, you must do so from the JupyterLab tab.

    Installing VSCode extensions

    You can use the extensions manager in VSCode to install extensions from the marketplace as you would usually. However, note that these extensions are installed only in the current Workspace session, and will not persist once the session is shut down.

    To install persistent extensions that will be available in every new VSCode Workspace, you must build them into your environment. Use the following steps to set up such an environment.

    Find the extension you want to install in the Visual Studio Marketplace. In this example, we'll install the scala-lang extension.

    Microsoft obscures the download URL for the extension by default, so you will need to first open your browser's development tools, then click the Download extension link.

    You can retrieve the download URL for the extension by looking at the request details in your browser's development tools. It should end with /vspackage. Copy this URL for use in your custom environment.

    In Domino, create a new environment. As the base image, you must use one of the VSCode-equipped Domino Standard Environments, listed in the prerequisites at the beginning of this article.

    Add the following instructions to your new environment's Dockerfile, replacing the folder names and example /vspackage URL with the extension URL you retrieved earlier. These commands download the extension, extract the required files, and add them to the appropriate folder.

    RUN apt-get update -y && apt-get install -y bsdtar

    RUN mkdir -p /home/ubuntu/.local/share/code-server/extensions/ms-python.python-2019.3.6558 && \
        cd /home/ubuntu/.local/share/code-server/extensions/ms-python.python-2019.3.6558 && \
        curl -JL https://marketplace.visualstudio.com/_apis/public/gallery/publishers/ms-python/vsextensions/python/2019.3.6558/vspackage | bsdtar -xvf - extension && \
        cd /home/ubuntu/.local/share/code-server/extensions/ms-python.python-2019.3.6558/extension/ && mv * ../ && \
        chown ubuntu:ubuntu /home/ubuntu/.local/share/code-server/

    When finished, click Build. Following a successful build you can use this new environment to launch VSCode Workspace sessions with your desired extensions already installed.

    Installing VSCode into older environments

    VSCode can be added to some older environments. The base environment must be 2018-05-23 or newer, and you must have the Pluggable Tools feature enabled.

    Add the following to your compute environment's Dockerfile instructions:

    # note: Make sure you are using the latest release if you'd like the latest version of the workspaces
    # https://github.com/dominodatalab/workspace-configs/releases
    RUN cd /tmp && wget https://github.com/dominodatalab/workspace-configs/archive/2019q2-v1.3.zip && \
        unzip 2019q2-v1.3.zip && cp -Rf workspace-configs-2019q2-v1.3/vscode /var/opt/workspaces/vscode && \
        rm -rf /var/opt/workspaces/workspace-logos && \
        rm -rf /tmp/workspace-configs-2019q2-v1.3

    RUN \
        chmod +x /var/opt/workspaces/vscode/install && sleep 2 && \
        /var/opt/workspaces/vscode/install

    and add the following to your compute environment's "Pluggable Workspace Tools":

    vscode:
      title: "vscode"
      iconUrl: "https://raw.github.com/dominodatalab/workspace-configs/develop/workspace-logos/vscode.svg?sanitize=true"
      start: [ "/var/opt/workspaces/vscode/start" ]
      httpProxy:
        port: 8888

    View Article
