DataHub is an enterprise data integration platform that enables organizations to maximize business value and productivity from their content.
It connects disparate storage platforms and business applications together, allowing organizations to move, copy, synchronize, gather and organize files as well as their related data across any system.
DataHub empowers your users with unified access to the most relevant, complete and up-to-date content – no matter where it resides.
DataHub delivers a user-friendly web-based experience that is optimized for PC, tablet and mobile phone interfaces—so you can monitor and control your file transfers anywhere, from any device.
DataHub’s true bi-directional hybrid/sync capabilities enable organizations to leverage and preserve content across on-premises systems and any cloud service. Seamless to users, new files/file changes from either system are automatically reflected in the other.
How does DataHub Work?
Cloud storage and collaboration platforms continue to be the driving force of digital transformation within the enterprise. However, users need to readily access the content that resides within your existing network file systems, ECM, and other storage platforms – enabling them to be productive wherever they are. DataHub is purpose-built to provide boundless enterprise content integration possibilities: the DataHub Platform is 100% open and provides a highly scalable architecture that enables enterprises to easily meet evolving technology and user demands—no matter how complex.
The DataHub platform provides:
- A low-risk approach to moving content to the cloud while maintaining on-premises systems
- No impact to users, IT staff, business operations or existing storage integrations
- Ability to extend cloud storage anywhere/any-device capabilities to locally-stored content
- Easy integration of newly acquired business storage platforms into existing infrastructures
The Engine
DataHub’s bi-directional synchronization engine enables your enterprise to fully integrate and synchronize your existing on-premises platforms with any cloud service.
It empowers your users to freely access the content they need while IT staff maintains full governance and control. DataHub integrates with each system's published Application Programming Interface (API) at the deepest level—optimizing transfer speeds and preserving all file attributes.
Security
DataHub’s 100 percent security-neutral model does not incorporate or use any type of proxy cloud service or other intermediary presence point. All content and related data is streamed directly via HTTPS (256-bit encryption) from the origin to the destination system(s). Additionally, DataHub works with native database encryption.
Analyzer - Simulation Mode
The DataHub analyzer is a powerful enterprise file transfer simulation that eliminates the guesswork. You will gain granular insight into your entire content landscape including its structure, the use of your files, how old and what type they are, what the metadata contains and more, no matter where the files are located—whether in local storage, remote offices or on user desktops.
Simulation mode allows you to create a job with all desired configuration options set and execute it as a dry-run. In this mode, no data will actually transfer, no permissions will be set, no changes will be made to either the source or the destination. This can be useful in answering several questions about your content prior to actually running any jobs against your content.
Features and Functionality
The DataHub Platform gives you complete integration and control over:
- User accounts
- User networked home drives
- User and group permissions
- Document types, notes, and file attributes
- Timestamps
- Versions
- Departmental, project, and team folders
- Defined and custom metadata
Architecture and Performance
The DataHub platform is built upon a pluggable, content-streaming architecture that enables highly automated file/data transfer and synchronization capabilities for up to billions of files. File bytes stream in from one connection endpoint (defined by the administrator), across a customer-owned and -operated network and/or cloud service, then stream out to a second connection endpoint. Content can also flow bidirectionally across two endpoints, rather than solely from an "origin" to a "destination".
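As a rough illustration of this streaming model, consider a chunked copy between two opaque endpoints. This is a minimal sketch, not DataHub's actual implementation; the chunk size and file names are assumptions:

```python
# Minimal sketch of endpoint-to-endpoint byte streaming (illustrative only,
# not DataHub's actual implementation). Bytes read from the origin are
# written straight to the destination in chunks, so the transfer engine
# never stores an intermediate copy of the file.
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB; a tuning assumption, not a documented value


def stream_copy(source: BinaryIO, destination: BinaryIO) -> int:
    """Stream bytes from one endpoint to another; returns bytes copied."""
    total = 0
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:  # end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# Usage: any pair of file-like objects works, e.g. a local file opened by a
# network file share connector and an HTTPS upload stream opened by a cloud
# connector. (File names here are placeholders.)
if __name__ == "__main__":
    with open("origin.bin", "rb") as src, open("destination.bin", "wb") as dst:
        print(f"copied {stream_copy(src, dst)} bytes")
```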
Supported Features
The DataHub Platform Comparison tool allows you to compare platform features and technical details to determine which are supported for your transfer scenario.
Viewing the Platform Comparison results for your integration will display a list of features of each platform and provide insight early in the integration planning process on what details may need further investigation.
The Platform Comparison tool is available via the Connections > Platforms menu options.
Connection Setup
DataHub is built on a concept of connections. A connection is made to the source platform and then another connection is made to the destination platform. A job is created to tie the two platforms together.
When DataHub connects to a content platform, it does so by using the publicly available Application Programming Interface (API) for the specific platform. This ensures that DataHub is “playing by the rules” for each platform.
Connections “connect” to a platform as a specific user account. The user account requires the proper permissions to the platform to read/write/update/delete the content, according to what actions the DataHub job is to perform.
The connection user account should also be set up so that the password does not expire; otherwise, the connection will lose access to the platform until it is refreshed with the new password.
Most connections require a specific user account and its corresponding password. The user account is typically an email address.
Authenticated Connections
Authenticated Connections are accounts that have been verified with the cloud-based or network-based platform when created. The connection can be user/password-based or established through an OAuth2 flow, where a token is generated when the user grants DataHub authorization through a login. This authorization allows DataHub access to the user's drive information (files and folders) on the platform. These connections are used as the source or the destination authentication to transfer your content.
OAuth2 Interactive (Web) Flow
- Connectors such as Box, Google Drive and Dropbox use the OAuth2 interactive (or web) flow
OAuth2 Client Credentials Flow
- Connections such as Syncplicity and GSuite use the OAuth2 client credentials flow
SharePoint
- SharePoint (all versions, CSOM) uses a custom username/password authentication model
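To make the client credentials flow above concrete, here is a generic OAuth2 sketch. This is illustrative only; the token URL and credentials are placeholders, not DataHub or platform values:

```python
# Generic OAuth2 client credentials token request (illustrative sketch; the
# token URL and credentials below are placeholders, not DataHub or platform
# values).
import requests

TOKEN_URL = "https://platform.example.com/oauth2/token"  # hypothetical endpoint


def fetch_token(client_id: str, client_secret: str) -> str:
    """Exchange client credentials for a bearer access token."""
    response = requests.post(
        TOKEN_URL,
        data={"grant_type": "client_credentials"},
        auth=(client_id, client_secret),  # HTTP Basic auth, per RFC 6749
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["access_token"]


# The returned token is then sent with each API request, e.g.:
#   requests.get(api_url, headers={"Authorization": f"Bearer {token}"})
```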
OAuth2 Interactive (Web) Flow
You will need the following information when creating a connection to Network File System, Box, Dropbox and Dropbox for Business:
- A name for the connection
- The account User ID, such as jsmith@company.com
- The password for the User ID
Create Connections
Creating a Connection
Creating a connection in the DataHub Platform user-interface is easy! Simply add a connection, select your platform and enter the requested information. DataHub will securely validate your credentials and connect to your source content.
Create a Dropbox for Business Connection
The Dropbox connector in DataHub allows you to analyze, migrate, copy, and synchronize files to your Dropbox account from cloud storage repositories and on-premises network file shares. The first step is to create the Dropbox connection by providing the connection information required for DataHub to connect to the server. The connector can be created using a standard Dropbox account or a Dropbox for Business account. However, when creating a connection using a Dropbox for Business account, you must use an Administrator account. Standard accounts will return an error from the Dropbox for Business platform.
Known Issue
- Connection-based Impersonation should be used due to a caching issue that may cause incorrect shared folder detection.
- Connection-based Impersonation is shown in the user-interface as the "Run As..." option on the Locations step when creating a new job.
- Path-based impersonation, where the first 'folder' in the path is a user name (for example, "path": "/user@company.com/"), should not be used.
Create Connection - DataHub Application User-Interface
- Select Connections > Add connection.
- Select Dropbox for Business as the platform on the Add connection modal.
- Enter the connection information. Reference the table below for details about each field.
- Select Sign in with Dropbox for Business.
- On the Dropbox API Request Authorization modal, enter the Email and Password required to log in to the account and select Sign In. Only an Administrator account can be used for Dropbox for Business. Standard accounts will return an error from the Dropbox for Business platform.
- You will see a "Connection test succeeded" message on the Add connection modal. (If you don't see this message, repeat the sign-in and authorization steps above.)
- Select Done to finish creating the connection.
| Field | Description | Required |
| --- | --- | --- |
| Display as | Enter the display name for the connection. If you will be creating multiple connections, ensure the name readily identifies the connection. The name displays in the application, and you can use it to search for the connection and filter lists. If you do not add a display name, the connection will automatically be named "Dropbox for Business." DataHub recommends that you edit the name to more readily identify the connection. | Optional |
| Platform API client credentials | | Required |
| Use the system default client credentials | Select this option to use the default DataHub client application. | |
| Use custom client credentials | Select this option to use custom client credentials provided by your administrator. When selected, two additional fields will be available to enter the credentials. | |
| Client ID | This field displays only when you select Use custom client credentials. This value will be provided by your administrator. | Optional |
| Client Secret | This field displays only when you select Use custom client credentials. This value will be provided by your administrator. | Optional |
Features and Limitations
Platforms all have unique features and limitations. DataHub’s transfer engine manages these differences between platforms and allows you to configure actions based on Job Policies and Behaviors. Utilize the Platform Comparison tool to see how your integration platforms may interact regarding features and limitations.
Files and Folders
Below is a list of Dropbox's supported and unsupported features, as well as additional file/folder restrictions.
| Supported | Unsupported | Other Features/Limitations |
| --- | --- | --- |
| Versioning (see "Versioning" below for additional details) | Invalid characters: < > \ / : ? * \| ~ | |
| | | File size maximum: 50 GB |
| | | Segment path length: 255 |
| | | Maximum number of files per folder: 10,000 |
| | | No trailing spaces after file extensions |
Versioning
The Dropbox platform treats content version history differently than other cloud platforms, with each native move and rename tracked as a new version. To transfer only true version history, enable the Scripting option on step 6 during job creation and add the version-filtering JSON to the scripting box.
For the JSON snippet and more information about how to use Advanced Scripting, refer to the Advanced Scripting | Additional Configuration Options page.
Version Limits
While Dropbox allows you to store unlimited versions, the platform restricts version downloads to 100. This means only the most recent 100 versions for a given file will be transferred to the destination. Refer to the Dropbox API Documentation for more information. You can also find additional information in the Dropbox Forum.
Author Preservation
The Dropbox connector uses "per-request impersonation." This means DataHub makes requests to the platform on behalf of the account owner, not the administrator. Therefore, files created by other users will fail to upload unless the account owner mounts the shared parent folder into their Dropbox drive before uploading the file. (See https://help.dropbox.com/files-folders/add-shared-folder for more information.) Failed uploads will be logged with a message similar to the following example: "[PermissionFailure] The account 'Joe Smith' is not authorized to perform the requested operation."
Business Team Folders
Team folders transfer automatically with any transfer job; however, Team and Shared folders must first be mounted in Dropbox (the Dropbox for Business API only surfaces mounted Team and Shared folders). If you include Job Filters | Filter Shared Content, Team Folders will not be transferred. The Dropbox for Business connector creates shared user folders by default. To create shared folders in a Team Folder, the root Team Folder must be created in the Dropbox UI and selected as the root of the job. Shared folders that are created in a Team Folder support permission disinheritance. Shared folders created in a user folder do not support disinheritance and will log a warning.
Create a Syncplicity Connection
The Syncplicity connector in DataHub allows you to analyze, migrate, copy, and synchronize files from your Syncplicity service to cloud storage repositories and on-premises network file shares. DataHub connections to Syncplicity require OAuth 2.0 access. In order to create a connection from DataHub to Syncplicity, you will need to complete configuration on the Syncplicity side, and you will need to provide several pieces of authentication information.
Create a Syncplicity Connection
- Select Connections > Add connection.
- Select Syncplicity as the platform on the Add connection modal.
- Enter the connection information. Reference the table below for details about each field.
- Test the connection to ensure DataHub can connect using the information entered.
- Select Done.
| Field | Value | Description | Optional/Required |
| --- | --- | --- | --- |
| Display as | User-defined text field | Enter the display name for the connection. If you will be creating multiple connections, ensure the name readily identifies the connection. The name displays in the application, and you can use it to search for the connection and filter lists. | Required |
| Application token | Provided by your Syncplicity administrator | Each user can provision a personal application token, which may be used to authenticate in UI-less scenarios via API. This is especially useful for account administration tasks that run in a headless session. If provisioned, an application token is the only information required to log in a user using the OAuth 2.0 resource owner credentials grant. You should protect this token. | Required |
| App key | Provided by your Syncplicity administrator | Identifier of the third-party application as defined by OAuth 2.0. | Required |
| App secret | Provided by your Syncplicity administrator | The secret (password) of the third-party application as defined by OAuth 2.0. Used with an application key to authenticate a third-party application. | Required |
| New SyncPoint type | Syncpoint type choice | This option instructs DataHub as to what type of folder should be created when a top-level folder is created through a DataHub process. | Optional |
Features and Limitations
Platforms all have unique features and limitations. DataHub’s transfer engine manages these differences between platforms and allows you to configure actions based on Job Policies and Behaviors. Utilize the Platform Comparison tool to see how your integration platforms may interact regarding features and limitations.
| Supported Features | Unsupported Features | Other Features/Limitations |
| --- | --- | --- |
| | | Segment path length: 260 |
| | | No leading spaces in file/folder names |
| | | No trailing spaces before or after file extensions |
| | | No non-printable ASCII characters |
| | | Invalid characters: \ / < > |
| | | Only syncpoints can be shared with other users and have permissions persist. |
| | | Users with a large number of syncpoints are not supported by Syncplicity. If you are creating a new impersonation job with a Syncplicity connection and the source or destination location is empty, the user you are impersonating has too many syncpoints. You will need to delete the syncpoints before you can create the job. |
Connection Pooling
When transferring data between a source and destination, there are a number of factors that can limit the transfer speed. Most cloud providers have rate limitations that reduce the transfer rate, but if those limits are account-based and the platform supports impersonation, DataHub can create a pool of accounts that issues commands in a round-robin format across all of the accounts connected to the pool. Any modifications to the connection pool will be used on the next job run.
For example, if a connection pool has two accounts, all commands will be alternated between them. If a third account is added to the pool, the next run of the job will use all three accounts.
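A minimal sketch of the round-robin idea follows. This is illustrative only; DataHub's internal scheduling is not published, and the account names are placeholders:

```python
# Sketch of round-robin command distribution across a pool of accounts
# (illustrative only; DataHub's internal scheduling is not published, and
# the account names are placeholders).
from itertools import cycle


class ConnectionPool:
    def __init__(self, accounts: list[str]) -> None:
        self._rotation = cycle(accounts)  # endless round-robin iterator

    def next_account(self) -> str:
        """Return the account that should issue the next platform command."""
        return next(self._rotation)


pool = ConnectionPool(["svc-a@company.com", "svc-b@company.com"])
for command in ("list /x", "download /x/1.docx", "download /x/2.docx", "list /y"):
    # Each command goes to the next account in the rotation, spreading
    # per-account rate limits across the whole pool.
    print(pool.next_account(), "->", command)
```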
Not Supported:
- "My Computer" and Network File Share (NFS) connections are not supported with Connection Pooling.
User & Group Maps
A user account or group map provides the ability to explicitly associate users and groups for the purposes of setting ownership and permissions on items transferred. These mappings can happen automatically using rules or explicitly using an exception. Accounts or groups can be excluded by specifying an exclusion, and unmapped users can be defaulted to a known user.
Here are a few things to consider when creating an account or group map:
- A source and destination connection are required and need to match the source and destination of the job that will be referencing the user or group map.
- A map can be created before or during the creation of the job.
- A map can be used across multiple jobs.
- If a map is updated, the updates will not be reapplied to content that has already been transferred.
User & Group Map Import Templates
Please see Account Map / Group Map | CSV File Guidelines for map templates and sample downloads.
User & Group Map Exceptions
A user or group map exception provides the ability to explicitly map a specific user from one platform to another. These are exceptions to the automatic account or group mapping policies specified. User account or group map exceptions can be defined during the creation of the map or can be imported from a comma-separated values (CSV) file.
User & Group Map Exclusions
A user or group map exclusion provides the ability to explicitly exclude an account or group from owner or permissions preservation. User account or group map exclusions can be defined during the creation of the map or can be imported from a comma-separated values (CSV) file.
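The mapping logic described above can be pictured as follows. This is a hypothetical sketch: the CSV column names ("source", "destination") and the domain-rewrite rule are assumptions for illustration, not DataHub's actual map format:

```python
# Hypothetical sketch of user-map resolution combining automatic rules,
# explicit exceptions, exclusions, and a default user, as described above.
# The CSV column names and the domain-rewrite rule are assumptions, not
# DataHub's actual map format.
import csv
from typing import Optional


def load_exceptions(path: str) -> dict[str, str]:
    """Load explicit source-to-destination account mappings from a CSV file."""
    with open(path, newline="") as f:
        return {row["source"]: row["destination"] for row in csv.DictReader(f)}


def map_user(
    source_user: str,
    exceptions: dict[str, str],
    exclusions: set[str],
    default_user: str,
) -> Optional[str]:
    if source_user in exclusions:
        return None  # excluded from owner/permission preservation
    if source_user in exceptions:
        return exceptions[source_user]  # explicit exception wins over rules
    # Automatic rule (assumed): keep the local part, swap the domain.
    local, _, domain = source_user.partition("@")
    if domain == "old-company.com":
        return f"{local}@new-company.com"
    return default_user  # unmapped users fall back to a known user
```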
Transfer Planner
At the start of a project, it is common to begin planning with questions like "How long should I expect this to take?"
Transfer Planner allows you to outline the basic assumptions of any integration, primarily around the initial content copy at the beginning of a migration or first synchronization. It uses basic assumptions to begin visualization of the process, without requiring any setup of connections or jobs.
The tool estimates and graphs a time line to complete the transfer based on the information entered in the Assumptions area. The time line assumes a start date of today and uses the values in the Assumptions section to model the content transfer.
The Transfer Planner automatically recalculates the predicted time line if you change any of the values, making simple “what if?” scenario evaluations easy. Press Reset to restore the default values for the Transfer Planner tool.
The window displays projected Total Transfer in dark blue and Daily Transfer Rate in light blue. Hovering the mouse pointer over the graph displays estimated transfer details for that day.
You can see the impact on the project timeline by changing the values in the Assumptions area. The graph will redraw to reflect your new values.
Note that the Transfer Planner is primarily driven by the amount of data needing to be processed. DataHub has various tools for transferring versions of files (if the platform supports this feature), which can increase the size of your data set. It also has the ability to filter out specific files by their type or by other rules you set. At this stage, a rough estimate of total size is recommended, as it can be refined later using Simulation Mode.
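The underlying arithmetic is straightforward; a back-of-the-envelope version of the estimate (with illustrative values, not the tool's exact model) looks like this:

```python
# Back-of-the-envelope timeline estimate in the spirit of the Transfer
# Planner (illustrative values and formula, not the tool's exact model).
import math

total_tb = 12.0       # assumed total content size, in TB
daily_rate_tb = 0.75  # assumed sustained daily transfer rate, in TB/day

days = math.ceil(total_tb / daily_rate_tb)
print(f"~{days} days to complete the initial copy")  # ~16 days
```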
Simulation Mode
Simulation mode allows you to create a job with all desired configuration options set and execute it as a dry run. In this mode, no data will actually transfer, no permissions will be set, and no changes will be made to either the source or the destination.
This can be useful in answering several questions about your content prior to actually running any jobs against your content.
How much content do I have?
- An important first step in any migration is to determine how much content you actually have, as this can help in determining how long a migration will take.
What kinds of content do I have?
- Another important step in any migration is to determine what kinds of content you actually have.
- Many organizations have accumulated a lot of content, and some of that may not be useful on the desired destination platform.
- The results of a simulation mode job can help you determine if you should introduce any filter rules to narrow the scope of the job.
- An example would be excluding executable files (.exe or .bat files) or excluding files older than 3 years; a sketch of such rules follows this list.
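Here is a sketch of such filter rules expressed as a predicate. This is illustrative only; in DataHub these rules are configured through Job Filters, not written as code:

```python
# Sketch of filter rules as a predicate (illustrative only; in DataHub these
# rules are configured through Job Filters, not written as code).
from datetime import datetime, timedelta
from pathlib import PurePath

EXCLUDED_EXTENSIONS = {".exe", ".bat"}   # executable types to skip
MAX_AGE = timedelta(days=3 * 365)        # roughly three years


def should_transfer(path: str, modified: datetime) -> bool:
    """Return True if an item passes the exclusion rules."""
    if PurePath(path).suffix.lower() in EXCLUDED_EXTENSIONS:
        return False
    if datetime.now() - modified > MAX_AGE:
        return False
    return True


print(should_transfer("tools/setup.exe", datetime.now()))  # False
print(should_transfer("docs/plan.docx", datetime.now()))   # True
```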
What kinds of issues should I expect to run into?
- During the course of a migration, there are many things to consider and unknown issues that can arise, many of which will only present themselves once you start doing something with the source and destination.
- Running a job in simulation mode can help you identify some of those issues before you actually start transferring content.
Examples can include:
- Are my user mappings configured correctly?
- Does the scope of the job capture everything that I expected it to capture?
- Do I have files that are too large for the destination platform?
- Do I have permissions that are incompatible with the destination platform (i.e., ACL vs. waterfall)?
- Do I have file or folder paths that are too long, or names that contain invalid characters the destination platform will not accept?
Create a Simulation Job
During the job creation workflow, the last stage before creating the job includes an option to enable simulation mode.
When a job is in simulation mode, it can be run and scheduled like any other job, but no data will be transferred.
Transition a Simulation Job to Transfer Content
After review, a simulation job can be transitioned to a live job that will begin to transfer your content to the destination platform.
Create a Job
DataHub uses jobs to perform specific actions between the source and destination platforms. The most common job types are copy and sync; please see Create New Job | Transfer Direction for more information.
All jobs can be configured to run manually or on a defined schedule. This option will be presented as the last configuration step.
To create a job, select the Jobs option from the left menu and click on Create Job. DataHub will lead you through a wizard to select all the applicable options for your scenario.
The main job creation steps include:
- Selecting a Job Type
- Configuring Locations
- Defining Transfer Policies
- Defining Job Transfer Behaviors
- Advanced Options
- Summary | Review, Create Job, and Schedule
Job Type
Job type defines the kind of job and the actions the job will perform with the content. There are two main job types available: basic transfer and folder mapping.
Basic Transfer - Transfer items between one connection and another
This will copy all content (files, folders) from the source to the destination. Each job run will detect any new content on the source and copy it to the destination.
For more information, please see Create New Job | Transfer Direction.
Define Source & Destination Locations
All platform connections made in the DataHub Platform application will be available in the locations drop-down lists when creating a job.
- If your connections were created with Administrative privileges, you may also have the ability to impersonate another user within your organization.
- Source defines the location of the current content you wish to transfer.
- Destination defines the location where you would like your content to go.
Configuring Your Locations - Impersonation
Impersonation allows a site admin access to all the folders on the site, including those that belong to other users. With DataHub, a job can be set up using the username and password of the site admin to sync/migrate/copy files to or from a different user's account without ever having the username or password of that user.
Enable Run as user...
Choose Source User
Job Category
The category function allows for the logical grouping of jobs for reporting and filtering purposes. The category is optional and does not alter the job function in any way.
DataHub comes with two default job categories:
Maintenance: DataHub maintenance jobs only. This category allows you to view the report of background maintenance jobs and is not intended for newly created transfer jobs.
Default: When a category is not defined during job creation, it will automatically be given the default category. This option allows you to create a report for all jobs that a custom category was not assigned.
Create Job Category
Enable the feature and select from existing job categories, or create a new category.
From the jobs grid, you can filter by category.
Job Policies
Define what should happen once items have been successfully transferred and set up rules around how to deal with content as it is updated on your resources while the job is running.
- DataHub works on the concept of “deltas”, where the transfer engine only transfers files after they have been updated.
- File version conflicts occur when the same file on the source and destination platforms has been updated between job executions.
- Policies define how DataHub handles file version conflicts and whether or not it persists a detected file deletion.
- Each job has its own policies defined, and the settings are NOT global across all jobs.
Conflict Policy - File Version Conflicts
When a conflict is detected on either the source or the destination, Conflict Policy determines how DataHub will behave.
For more information, please see Conflict Policy.
Delete Policy - Deleted Items
When a delete is detected on either the Source or the Destination, Delete Policy determines how DataHub will behave.
For more information, please see Delete Policy.
Behaviors
Behaviors determine how this job should execute and what course of action to take in different scenarios. All behaviors are enabled by default as recommended settings to ensure content is transferred successfully to the destination.
Zip Unsupported Files / Restricted Content
Enabling this behavior allows DataHub to compress any file that is not supported on the destination into a .zip archive before transferring it. This is done instead of flagging the item for manual remediation and halting the transfer of the file.
For example, if you attempt to transfer the file "db123.cmd" from a Network File Share to SharePoint, DataHub will compress the file to "db123.zip" before transferring it over, avoiding an error message.
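Conceptually, the behavior resembles the following sketch (illustrative only; DataHub performs the compression internally during transfer):

```python
# Sketch of the zip-before-transfer behavior (illustrative only; DataHub
# performs this compression internally during transfer).
import zipfile
from pathlib import Path


def zip_unsupported(path: str) -> str:
    """Compress a file the destination rejects into a sibling .zip archive."""
    source = Path(path)
    archive = source.with_suffix(".zip")  # e.g. db123.cmd -> db123.zip
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname=source.name)  # keep the original name inside
    return str(archive)


print(zip_unsupported("db123.cmd"))  # db123.zip is what gets transferred
```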
Allow unsupported file names to be changed
The Segment Transformation policy controls whether DataHub can change folder and file names to comply with the destination platform's restrictions.
Enabling this behavior allows DataHub to change the names of folders and files that contain characters that are not supported by the destination before transferring the file. This will be done instead of flagging the file for manual remediation and preventing it from being transferred.
When this occurs, each unsupported character will be transformed into an underscore.
For example, if you attempted to transfer the file "Congrats!.txt" from Box to NFS, it would be transformed to "Congrats_.txt" and appear that way on the destination.
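A minimal sketch of this transformation follows; the character set shown is an assumption for illustration, not any platform's exact restriction list:

```python
# Sketch of segment transformation: characters the destination will not
# accept are replaced with underscores (the character set here is an
# assumption for illustration, not any platform's exact restriction list).
import re

INVALID_CHARS = re.compile(r'[<>:"/\\|?*!]')  # assumed destination restrictions


def transform_segment(name: str) -> str:
    """Replace unsupported characters in a file or folder name with '_'."""
    return INVALID_CHARS.sub("_", name)


print(transform_segment("Congrats!.txt"))  # Congrats_.txt
```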
Preserve file versioning between locations
DataHub will preserve and transfer all versions of a file on supported platforms.
Advanced
These optional job configurations determine what features you want to preserve, filter or add during your content transfer.
Filtering
Filtering defines rules for determining which items are included or excluded during transfer. For more information, please see Job Filters.
Job Filters | Filter By Name Pattern
Job Filters | Filter By Extensions or Type
Job Filters | Filter By Date Range or Age
Job Filters | Filter by Metadata
Job Filters | Metadata Conjunctions
Permission Preservation
This setting enables DataHub to determine how permissions are transferred across platforms.
Permissions | Author / Owner Preservation
Permissions | Permissions Preservation
Permissions | Permissions Import
Permissions | Preserve Shared Links
Metadata Mapping
Metadata mapping allows you to document your source metadata and map how you want it applied to the destination in CSV format. Enabling this feature will offer the ability to import the CSV file and apply it during job creation.
For more information, please see Metadata Import.
Scripting
Some DataHub features are not yet available in the user-interface. The scripting feature allows the advanced DataHub user to enable advanced transfer features by inserting JSON-formatted job controls.
Enabling this option will allow you to leverage these features and apply them during job creation.
Job Summary - Review your job configuration
Before you create your job, review all your configurations and adjust as needed. Modifying your job after creation is not supported; however, the option to duplicate your current job will allow you to make any adjustments without starting from the beginning.
- The Edit option will take you directly to the configuration to make changes.
Define Job Schedule
During job creation, the final step is to define when the job will run and what criteria will define when it stops.
- Save job will launch the job scheduler.
- Save job and run it right now will trigger the job to start immediately. It will then run every 15 minutes after the last execution completes.
Schedule Stop Policies
Stop policies determine when a job should stop running. If none of the stop policies are enabled, a scheduled job will continue to run until it is manually stopped or removed.
The options for the stop policy are:
Stop after a number of total runs
The number of total executions before the job will move to "complete" status.
Stop after a number of runs with no changes
The job has run and detected no further changes; all content has transferred successfully. If new content is added to the source and the job runs again, that run will not increment your stop policy count. However, job executions that detect no changes do not need to be consecutive to increment your stop policy count.
Stop after a number of failures
Most failures are resolved through automatic retries. If the retries fail to resolve the failures, then manual intervention is required. This policy takes the job out of rotation so that the issue can be investigated. Job executions that detect failures do not need to be consecutive to increment your stop policy count.
Stop after a specific date
The job will "complete" on the date defined.
Reports
Reporting is paramount with the DataHub Platform. Whether you choose to utilize the DataHub manager application, CLI, or ReST API, reporting options are available to help you manage and surface data about your content in real time.
Out-of-the-box reports include:
- Dashboard: Provides an overview of what is happening across all your content
- Job Overview: Detailed job information including source, destination, schedule and current status
- Flagged Items: Content that did not transfer and requires attention
- Content Insights: Breakdown of your transferred data
- Sharing Insights: Breakdown of all permissions associated with your source content
- User Mappings: The permission associations of your content
- Item Report: Information on each item that transferred
- Validation: At any time, you may run a validation run, which will trigger a full inspection of all content relating to the option you select for the next run only.
Job Overview Report
This report provides detailed transfer information for the individual job.
Schedule: Provides information on how many times the job has executed, when the job will run again, and progress towards meeting the job stop policy defined.
Transfer Details | Identified Chart: Reflects content identified on the source platform and the status summary for items.
Transfer Details | Revised Chart: Reflects content that DataHub revised during transfer to meet destination requirements and user-defined job configurations.
Transfer Details | Flagged Chart: Reflects content that DataHub could not transfer; manual remediation is required.
Run Breakdown Report: Provides job history information for each execution of the given job.
- Note: Last Activity in the Run Breakdown will only appear during the job execution.
In some circumstances, bytes on the destination can be higher than listed on the source. This discrepancy is caused by property promotion on Word documents. For more information, see Report Values | Potential Differences due to Post Processing.
Values in the run breakdown may differ from values presented in the charts. This is because the run breakdown tracks each individual occurrence, whereas an item can only exist in a single chart category.
Example: When an item is both truncated AND ignored, it would not show up in the "Revised" chart but would show up in the "Revised" run breakdown.
The run breakdown also shows both file and folder values together; the charts display file and folder values separately, with the "Transfer Details" dropdown available to switch between display values.
Job Content Insights Report
This report provides detailed content information for the individual job.
Use the drop-down options to change the chart views.
Job Sharing Insights Report
This report offers a breakdown of all permissions associated to your content. The values presented are based on the source content.
On the Shared Insights tab for a job, the value "Not Shared" represents both items that have no permissions as well as content shared by inheritance from the parent folder. At this time, DataHub only tracks permissions applied during transfer, not permissions that result from inheritance within the hierarchy.
Job User Mappings Report
The User Mappings report for a given job presents the permission breakdown of your content.
If any of the following features are enabled, the User Mapping report will populate:
Job Validation Report
Control the level of tracking and reporting for content that exists on both the source and destination platform, including content that has been configured to be excluded from transfer and content that existed on the destination prior to the initial transfer.
Items that have been ignored / skipped by policy or not shown because they already existed on the destination can now be seen on reports with the defined categories.
The default validation option is inspect none. This option does not need to be configured in the application user-interface or through the ReST API; it is the system default.
This configuration will not track all items but offers additional tracking with performance in mind. Inspect none will track all items on the source at all levels of the hierarchy, excluding those configured to be ignored/skipped through policy. For the destination, all content in the root (files and folders) that existed prior to the initial transfer will be tracked as destination-only items and reported as ignored/skipped.
This option has the following features:
Source: All content (files and folders) at all levels in the hierarchy, but not including those configured to be ignored/skipped through policy. If the connection does not have access to a given folder in the hierarchy, DataHub cannot track and report those items.
Destination: All content in the root (files and folders) that existed prior to the initial transfer will be tracked as destination-only items and reported as ignored/skipped.
Destination: All content (files and folders) at lower depths of the directory (sub-folders) that existed prior to the initial transfer will not be tracked.
Job Reports - Validation tab: At any time, you may run a validation run, which will trigger a full inspection of all content relating to the option you select for the next run only.
- For more information, see Job Validation | Item Inspection Policy.
Generate Job Reports
DataHub Reports provide several options to combine many jobs into a single report for review. Reports are generated by category, individually selected jobs or by convention job parent (user account mapping, network home drive mapping or folder mapping job types).
Reports are separated by two tabs so you can clearly distinguish between jobs that are actively transferring content and simulation jobs that imitate transfer.
If no category is defined during job creation, it will be assigned to the default job category.
Generate Report
Select Report Type
Define what the report will contain:
- Category: Defined during job creation
- Parent jobs: Relating to convention jobs such as user account mapping, network home drive mapping or folder mapping job types
- Manually select jobs: Choose each job individually that you want in your report
Remediation
Items that were unable to be transferred by the DataHub Platform will be flagged for manual remediation. Items can be flagged for many reasons, and in some cases, still transferred to the destination platform. Each item is a package, consisting of the media itself, version history, author, sharing and any other metadata. DataHub ensures all pieces of the item package are transferred to the destination to preserve data integrity. When an item is flagged, DataHub is indicating that all or some portion of this failed to migrate.
All migrations require some amount of manual intervention by the client to move content that fails to transfer automatically.
- Note that one of the uses of simulation mode is to get an understanding, prior to a live transfer, of how many files might fail to transfer and the reasons why.
- This can be used to adjust the job parameters to achieve a higher number of automatic remediation successes.
General Reasons Content does not Transfer
Errors from Source & Destination Platforms
This is a broad error category that indicates DataHub was prevented from reading, downloading, uploading or writing content during content transfer by either the source or destination platform provider. Each situation is dependent on the storage provider rejection reason and will require manual investigation to resolve.
Insufficient Permissions
Many platforms may require additional permissions in order to perform certain functions, even for site administrator accounts. These permissions typically require a special request from the storage provider. For example, content that has been locked, hidden or has been flagged to disable download may require this special permission request from your storage provider.
Scenario-Specific Configuration
Content on your source storage platform is diverse, and users across your business will structure their data in a wide variety of different ways. A single one-size-fits-all project configuration may not be suitable and can result in some content not transferring to the destination platform. DataHub will assist in assessing these situations to help provide custom, scenario-specific configuration that may work around the issue that is preventing the transfer.
Disparate Platform Features
Each platform provider has a given set of features that are generally shared concepts in the storage industry. However, within each storage platform, there can be behavioral or rule differences within these features, and aligning these discrepancies can be challenging. Features such as permission levels (edit, view, view+upload, etc.) may not align as an exact match to the destination platform, and file sizes or file names may need to be altered to conform to the destination platform's policies. DataHub will attempt to accommodate these restrictions through configurations in the system; however, not all scenarios can be covered in a diverse data set.
Interruption in Service
DataHub must maintain connection to the database at all times during the transfer process. If there is an interruption in service, DataHub will fail the transfer as it is unable to track / write to the database.
How do I validate my content transferred successfully?
Verify the destination
DataHub will report all content that has transferred to the destination. Log into your destination platform and verify the content is located as expected.
DataHub is reporting items in "pending" or "retrying" status, what are my next steps?
Run the job again
DataHub defaults to retrying the job 3 times to reconcile items that are in pending/retry status. Depending on your job configuration, this may occur with the defined schedule or you can start the job manually.
Review the log message
DataHub logs a reason why the item is in pending/retry status. On the job "Overview" tab, click on the Transfer Details breakdown status "retrying". This will direct you to the filtered "Items" list. Select the item then click the "View item history" link on the right toolbox.
DataHub is reporting items in "Flagged" status, what are my next steps?
When an item is in "flagged" status, this means DataHub has made all attempts to transfer the file without success, and it requires manual remediation.
Review the log message
DataHub logs a reason why the item has been flagged. On the job "Overview" tab, click on the Transfer Details breakdown status "flagged". This will direct you to the filtered "Items" list. Select the item and click the "View item history" link on the right toolbox.
Review the message and determine if you can resolve on the source platform.
Review all flagged items
These are the recommended ways to view all flagged items: export the flagged item report or review the "Flagged Items" page.
Export report:
- Job Report → Items tab → Filter by Status: Flagged
- Click "Export this report" → Save CSV file for review
Review "Flagged Items" page:
- Retry or Ignore individual items
- View Item History for individual items
- Link back to the job the flagged items are associated with
- Export all Flagged Items report
Introduction
DataHub is an enterprise data integration platform that enables organizations to maximize business value and productivity from their content.
It connects disparate storage platforms and business applications together, allowing organizations to move, copy, synchronize, gather and organize files as well as their related data across any system.
DataHub empowers your users with unified access to the most relevant, complete and up-to-date content – no matter where it resides.
DataHub delivers a user-friendly web-based experience that is optimized for PC, tablet and mobile phone interfaces—so you can monitor and control your file transfers anywhere, from any device.
DataHub’s true bi-directional hybrid/sync capabilities enable organizations to leverage and preserve content across on-premises systems and any cloud service. Seamless to users, new files/file changes from either system are automatically reflected in the other.
How does DataHub Work?
Cloud storage and collaboration platforms continue to be the driving force of digital transformation within the enterprise. However, users need to readily access the content that resides within your existing network file systems, ECM, and other storage platforms – enabling them to be productive, wherever they are. DataHub is purpose-built to provide boundless enterprise content integration possibilities, the DataHub Platform is 100% open and provides a highly-scalable architecture that enables enterprises to easily meet evolving technology and user demands—no matter how complex.
The DataHub platform provides:
-
A low risk approach to moving content to the cloud while maintaining on-premises systems
-
No impact to users, IT staff, business operations or existing storage integrations
-
Ability to extend cloud storage anywhere/any device capabilities to locally-stored content
-
Easy integration of newly acquired business storage platforms into existing infrastructures
The Engine
DataHub’s bi-directional synchronization engine enables your enterprise to fully-integrate and synchronize your existing on-premises platforms with any cloud service.
It empowers your users to freely access the content they need while IT staff maintains full governance and control. DataHub integrates with each system's published Application Program Interface (API) at the deepest level—optimizing transfer speeds and preserving all file attributes.
Security
DataHub’s 100 percent security-neutral model does not incorporate or use any type of proxy cloud service or other intermediary presence point. All content and related data is streamed directly via HTTPS [256-bit encryption] from the origin to the destination system(s). Additionally, DataHub works with native database encryption.
Analyzer - Simulation Mode
The DataHub analyzer is a powerful enterprise file transfer simulation that eliminates the guesswork. You will gain granular insight into your entire content landscape including its structure, the use of your files, how old and what type they are, what the metadata contains and more, no matter where the files are located—whether in local storage, remote offices or on user desktops.
Simulation mode allows you to create a job with all desired configuration options set and execute it as a dry-run. In this mode, no data will actually transfer, no permissions will be set, no changes will be made to either the source or the destination. This can be useful in answering several questions about your content prior to actually running any jobs against your content.
Features and Functionality
The DataHub Platform enables you with complete integration and control over:
-
User accounts
-
User networked home drives
-
User and group permissions
-
Document types, notes, and file attributes
-
Timestamps
-
Versions
-
Departmental, project, and team folders
-
Defined and custom metadata
Architecture and Performance
The DataHub platform is built upon a pluggable, content–streaming architecture that enables highly–automated file/data transfer and synchronization capabilities for up to billions of files. File bytes stream in from one connection endpoint (defined by the administrator), across a customer owned and operated network and/or cloud service, then streamed out to a second connection endpoint. Content can also flow bidirectionally across two endpoints, rather than solely from an "origin" to a "destination".
Supported Features
The DataHub Platform Comparison tool allows you to compare platform features and technical details to determine which are supported for your transfer scenario.
Viewing the Platform Comparison results for your integration will display a list of features of each platform and provide insight early in the integration planning process on what details may need further investigation.
The Platform Comparison tool is available via the Connection, Platforms menu options.
Connection Setup
DataHub is built on a concept of connections. A connection is made to the source platform and then another connection is made to the destination platform. A job is created to tie the two platforms together.
When DataHub connects to a content platform, it does so by using the publicly available Application Programming Interface (API) for the specific platform. This ensures that DataHub is “playing by the rules” for each platform.
Connections “connect” to a platform as a specific user account. The user account requires the proper permissions to the platform to read/write/update/delete the content, according to what actions the DataHub job is to perform.
The connection user account should also be set up so that the password does not expire, otherwise the connection will no longer be able to access the platform until the connection has been refreshed with the new password.
Most connections require a specific user account and its corresponding password. The user account is typically an email address.
Authenticated Connections
Authenticated Connections are accounts that have been verified with the cloud-based or network-based platform when created. The connection can be user/password-based or done through OAuth2 flow, where a token is generated based on the granting authorization to DataHub through a user login. This authorization allows DataHub access to the user's drive information (files and folder) on the platform. These connections are used as the source or the destination authentication to transfer your content.
OAuth2 Interactive (Web) Flow
-
Connectors such as Box, Google Drive and Dropbox use the OAuth2 interactive (or web) flow
OAuth2 Client Credentials Flow
-
Connections such as Syncplicity and GSuite uses the OAuth2 client credentials flow
SharePoint
-
SharePoint (all versions, CSOM) uses a custom username/password authentication model
OAuth2 Interactive (Web) Flow
You will need the following information when creating a connection to Network File System, Box, Dropbox and Dropbox for Business:
-
A name for the connection
-
The account User ID, such as jsmith@company.com
-
The password for the User ID
Create Connections
DataHub is built on a concept of connections. A connection is made to the source platform and then another connection is made to the destination platform. Next a job is created to tie the two platforms together.
When DataHub connects to a content platform, it does so by using the publicly available Application Programming Interface (API) for the specific platform. This ensures that DataHub is “playing by the rules” for each platform.
Connections “connect” to a platform as a specific user account. The user account requires the proper permissions to the platform to read/write/update/delete the content, according to what actions the DataHub job is to perform.
The connector user account should also be set up so that the password does not expire, otherwise, the connection will no longer be able to access the platform until the connection has been refreshed with the new password.
Most connections require a specific user account and its corresponding password. The user account is typically an email address.
Creating a Connection
Creating a connection in the DataHub Platform user-interface is easy! Simply add a connection, select your platform and enter the requested information. DataHub will securely validate your credentials and connect to your source content.
Create a Dropbox for Business Connection
The Dropbox connector in DataHub allows you to analyze, migrate, copy, and synchronize files to your Dropbox account from cloud storage repositories and on-premise network file shares. The first step is to create the Dropbox connection by providing the connection information required for DataHub to connect to the server. The connector can be created using a standard Dropbox account or a Dropbox for Business account. However, when creating a connection using a Dropbox for Business account, you must use an Administrator account. Standard accounts will return an error from the Dropbox for Business platform.
Known Issue
-
Connection-based Impersonation should be used due to a caching issue which may incorrect shared folder detection
-
Connection-based Impersonation is shown in the user-interface as "Run As...." option on the Locations step when creating a new job
-
Path-based impersonation should not be used, first 'folder' is a user name such as "path": "/user@company.com/"
Create Connection - DataHub Application User-Interface
-
Select Connections > Add connection.
-
Select Dropbox for Business as the platform on the Add connection modal.
-
Enter the connection information. Reference the table below for details about each field.
-
Select Sign in with Dropbox for Business.
-
On the Dropbox API Request Authorization modal, enter the Email and Password required to log in to the account and select Sign In. Only an Administrator account can be used for Dropbox for Business. Standard accounts will return an error from the Dropbox for Business platform.
-
You will see a "Connection test succeeded" message on the Add connection modal. (If you don't see this message, repeat the sign in and authorization steps above.)
-
Select Done to finish creating the connection.
Field |
Description |
Required |
---|---|---|
Display as |
Enter the display name for the connection. If you will be creating multiple connections, ensure the name readily identifies the connection. The name displays in the application, and you can use it to search for the connection and filter lists. If you do not add a display name, the connection will automatically be named, "Dropbox for Business." DataHub recommends that you edit the name to more readily identify the connection. |
Optional |
Platform API client credentials |
Required |
|
Use the system default client credentials |
Select this option to use the default DataHub client application. |
|
Use custom client credentials |
Select this option to use custom client credentials provided by your administrator. When selected, two additional fields will be available to enter the credentials. |
|
Client ID |
This field displays only when you select Use custom client credentials. This value will be provided by your administrator. |
Optional |
Client Secret |
This field displays only when you select Use custom client credentials. This value will be provided by your administrator. |
Optional |
Features and Limitations
Platforms all have unique features and limitations. DataHub’s transfer engine manages these differences between platforms and allows you to configure actions based on Job Policies and Behaviors. Utilize the Platform Comparison tool to see how your integration platforms may interact regarding features and limitations.
Files and Folders
Below is a list of Dropbox's supported and unsupported features, as well as additional file/folder restrictions.
Supported | Unsupported | Other Features/Limitations
---|---|---
Versioning (see "Versioning" below for additional details) | Invalid characters: < > \\ / : ? * \ \| ~ | File size maximum: 50 GB
 | | Segment path length: 255
 | | Maximum number of files per folder: 10,000
 | | No trailing spaces after file extensions
Versioning
The Dropbox platform treats content version history differently than other cloud platforms: each native move and rename is tracked as a new version. To transfer only true version history, enable the Scripting option on step 6 during job creation and add the version-filtering JSON snippet to the scripting box.
For the snippet itself and more information about how to use Advanced Scripting, refer to the Advanced Scripting | Additional Configuration Options page.
Version Limits
While Dropbox allows you to store unlimited versions, the platform restricts version downloads to 100. This means only the most recent 100 versions for a given file will be transferred to the destination. Refer to the Dropbox API Documentation for more information. You can also find additional information in the Dropbox Forum.
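The cap is visible directly in the Dropbox API. Below is a minimal sketch using the official Dropbox Python SDK, assuming a placeholder access token and file path; it requests the most revisions the API will return for a single file.

```python
# Sketch: list the retrievable revisions for one file.
# ACCESS_TOKEN and FILE_PATH are placeholders, not values from this guide.
import dropbox

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"      # placeholder token
FILE_PATH = "/reports/q1-summary.docx"  # placeholder path

dbx = dropbox.Dropbox(ACCESS_TOKEN)

# The API caps `limit` at 100, which is why only the most recent
# 100 versions of a file can be transferred to the destination.
revisions = dbx.files_list_revisions(FILE_PATH, limit=100)
for entry in revisions.entries:
    print(entry.rev, entry.server_modified)
```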
Author Preservation
The Dropbox connector uses "per-request impersonation." This means DataHub makes requests to the platform on behalf of the account owner, not the administrator. Therefore, files created by other users will fail to upload unless the account owner mounts the shared parent folder into their Dropbox drive before uploading the file. (See https://help.dropbox.com/files-folders/add-shared-folder for more information.) Failed uploads will be logged with a message similar to the following example: "[PermissionFailure] The account 'Joe Smith' is not authorized to perform the requested operation."
Business Team Folders
Team folders transfer automatically with any transfer job; however, Team and Shared folders must first be mounted in Dropbox (the Dropbox for Business API only surfaces mounted Team and Shared folders). If you include Job Filters | Filter Shared Content, Team Folders will not be transferred. The Dropbox for Business connector creates shared user folders by default. To create shared folders in a Team Folder, the root Team Folder must be created in the Dropbox UI and selected as the root of the job. Shared folders created in a Team Folder support permission disinheritance. Shared folders created in a user folder do not support disinheritance and will log a warning.
Create a Syncplicity Connection
The Syncplicity connector in DataHub allows you to analyze, migrate, copy, and synchronize files from your Syncplicity service to cloud storage repositories and on-premises network file shares. DataHub connections to Syncplicity require OAuth 2.0 access. In order to create a connection from DataHub to Syncplicity, you will need to complete configuration on the Syncplicity side, and you will need to provide several pieces of authentication information.
Create a Syncplicity Connection
- Select Connections > Add connection.
- Select Syncplicity as the platform on the Add connection modal.
- Enter the connection information. Reference the table below for details about each field.
- Test the connection to ensure DataHub can connect using the information entered.
- Select Done.
Field | Value | Description | Optional/Required
---|---|---|---
Display as | User-defined text field | Enter the display name for the connection. If you will be creating multiple connections, ensure the name readily identifies the connection. The name displays in the application, and you can use it to search for the connection and filter lists. | Required
Application token | Provided by your Syncplicity administrator | Each user can provision a personal application token, which may be used to authenticate in UI-less scenarios via API. This is especially useful for account administration tasks that run in a headless session. If provisioned, an application token is the only information required to log in a user using the OAuth 2.0 resource owner credentials grant. You should protect this token. | Required
App key | Provided by your Syncplicity administrator | Identifier of the third-party application as defined by OAuth 2.0. | Required
App secret | Provided by your Syncplicity administrator | The secret (password) of the third-party application as defined by OAuth 2.0. Used with an application key to authenticate a third-party application. | Required
New SyncPoint type | Syncpoint type choice | This option instructs DataHub as to what type of folder should be created when a top-level folder is created through a DataHub process. | Optional
Features and Limitations
Platforms all have unique features and limitations. DataHub’s transfer engine manages these differences between platforms and allows you to configure actions based on Job Policies and Behaviors. Utilize the Platform Comparison tool to see how your integration platforms may interact regarding features and limitations.
Supported Features | Unsupported Features | Other Features/Limitations
---|---|---
 | | Segment path length: 260
 | | No leading spaces in file or folder names
 | | No trailing spaces before or after file extensions
 | | No non-printable ASCII characters
 | | Invalid characters: \ / < >
 | | Only syncpoints can be shared with other users and have permissions persist.
 | | Users with a large number of syncpoints are not supported by Syncplicity. If you are creating a new impersonation job with a Syncplicity connection and the source or destination location is empty, the user you are impersonating has too many syncpoints; you will need to delete syncpoints before you can create the job.
Connection Pooling
When transferring data between a source and destination, there are a number of factors that can limit the transfer speed. Most cloud providers have rate limitations that reduce the transfer rate, but if those limits are account-based and the platform supports impersonation, DataHub can create a pool of accounts and issue commands in a round-robin format across all of the accounts connected to the pool. Any modifications to the connection pool will be used on the next job run.
For example, if a connection pool has two accounts, all commands will be alternated between them. If a third account is added to the pool, the next run of the job will use all three accounts.
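As a rough illustration of that round-robin behavior, here is a minimal sketch; the account names and the `issue_command` helper are hypothetical, not part of the DataHub API.

```python
# Sketch: round-robin dispatch of commands across a pool of accounts.
from itertools import cycle

class ConnectionPool:
    def __init__(self, accounts):
        # Rebuilding the cycle at the start of each job run is what
        # lets a newly added account participate in the next run.
        self._accounts = cycle(accounts)

    def issue_command(self, command):
        account = next(self._accounts)  # alternate accounts per command
        print(f"Issuing {command!r} as {account}")

pool = ConnectionPool(["svc-1@example.com", "svc-2@example.com"])
for cmd in ["upload a.txt", "upload b.txt", "upload c.txt", "upload d.txt"]:
    pool.issue_command(cmd)
# Commands alternate: svc-1, svc-2, svc-1, svc-2.
```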
Not Supported:
- "My Computer" and Network File Share (NFS) connections are not supported with Connection Pooling.
User & Group Maps
A user account or group map provides the ability to explicitly associate users and groups for the purposes of setting ownership and permissions on items transferred. These mappings can happen automatically using rules or explicitly using an exception. Accounts or groups can be excluded by specifying an exclusion, and unmapped users can be defaulted to a known user.
Here are a few things to consider when creating an account or group map:
- A source and destination connection are required and need to match the source and destination of the job that will be referencing the user or group map.
- A map can be created before or during the creation of the job.
- A map can be used across multiple jobs.
- Once a map is updated, the changes will not be reapplied to content that has already been transferred.
User & Group Map Import Templates
Please see Account Map / Group Map | CSV File Guidelines for map templates and sample downloads.
User & Group Map Exceptions
A user or group map exception provides the ability to explicitly map a specific user from one platform to another. These are exceptions to the automatic account or group mapping policies specified. User account or group map exceptions can be defined during the creation of the map or can be imported from a comma-separated values (CSV) file.
User & Group Map Exclusions
A user or group map exclusion provides the ability to explicitly exclude an account or group from owner or permissions preservation. User account or group map exclusions can be defined during the creation of the map or can be imported from a comma-separated values (CSV) file.
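As an illustration of how such a map might resolve, here is a minimal sketch assuming a resolution order of exclusions, then exceptions, then an automatic rule, then the default user. The domain-based rule and all names are hypothetical, not DataHub's actual matching logic.

```python
# Sketch: resolve a source account to a destination account.
DESTINATION_USERS = {"jsmith@destination.example.com"}  # illustrative directory

def is_known_destination_user(user):
    return user in DESTINATION_USERS

def resolve_user(source_user, exceptions, exclusions, default_user):
    if source_user in exclusions:
        return None                     # exclusion: skip owner/permission preservation
    if source_user in exceptions:
        return exceptions[source_user]  # exception: explicit one-to-one mapping
    local_part = source_user.split("@")[0]
    candidate = f"{local_part}@destination.example.com"  # automatic rule
    if is_known_destination_user(candidate):
        return candidate
    return default_user                 # unmapped users default to a known user

print(resolve_user("jsmith@source.example.com", {}, set(),
                   "fallback@destination.example.com"))
# -> jsmith@destination.example.com
```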
Transfer Planner
At the start of a project, it is common to begin planning with questions like "How long should I expect this to take?"
Transfer Planner allows you to outline the basic assumptions of any integration, primarily around the initial content copy at the beginning of a migration or first synchronization. It uses basic assumptions to begin visualization of the process, without requiring any setup of connections or jobs.
The tool estimates and graphs a timeline to complete the transfer based on the information entered in the Assumptions area. The timeline assumes a start date of today and uses the values in the Assumptions section to model the content transfer.
The Transfer Planner automatically recalculates the predicted timeline if you change any of the values, enabling simple “what if?” scenario evaluations. Press Reset to restore the tool's default values.
The window displays projected Total Transfer in dark blue and Daily Transfer Rate in light blue. Hovering the mouse pointer over the graph displays estimated transfer details for that day.
You can see the impact on the project timeline by changing the values in the Assumptions area. The graph will redraw to reflect your new values.
Note that the Transfer Planner is primarily driven by the amount of data needing to be processed. DataHub has various tools for transferring versions of files (if the platform supports this feature), which can increase the size of your data set. It also has the ability to filter out specific files by their type or by other rules you set. At this stage, a rough estimate of total size is recommended, as it can be refined later using Simulation Mode.
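As a back-of-the-envelope illustration of what the planner models, the sketch below derives a daily rate and a projected duration from a handful of assumptions; every number shown is a placeholder to tune, not a DataHub default.

```python
# Rough model of the Transfer Planner's estimate:
# projected days = total data / effective daily throughput.
total_gb = 5_000            # rough estimate of content to transfer
mbps = 200                  # sustained bandwidth, megabits per second
transfer_hours_per_day = 8  # daily window in which jobs may run

gb_per_day = mbps / 8 / 1024 * 3600 * transfer_hours_per_day
print(f"~{gb_per_day:.0f} GB/day -> ~{total_gb / gb_per_day:.0f} days")
# ~703 GB/day -> ~7 days
```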
Simulation Mode
Simulation mode allows you to create a job with all desired configuration options set and execute it as a dry run. In this mode, no data will actually transfer, no permissions will be set, and no changes will be made to either the source or the destination.
This can be useful in answering several questions about your content prior to actually running any jobs against your content.
How much content do I have?
- An important first step in any migration is to determine how much content you actually have, as this can help in determining how long a migration will take.
What kinds of content do I have?
- Another important step in any migration is to determine what kinds of content you actually have.
- Many organizations have accumulated a lot of content, and some of it may not be useful on the desired destination platform.
- The results of a simulation mode job can help you determine if you should introduce any filter rules to narrow the scope of the job.
- For example, you might exclude executable files (.exe or .bat files) or exclude files older than three years.
What kinds of issues should I expect to run into?
- During the course of a migration, there are many things to consider and unknown issues that can arise, many of which will only present themselves once you start doing something with the source and destination.
- Running a job in simulation mode can help you identify some of those issues before you actually start transferring content.
Examples can include:
- Are my user mappings configured correctly?
- Does the scope of the job capture everything that I expected it to capture?
- Do I have files that are too large for the destination platform?
- Do I have permissions that are incompatible with the destination platform (e.g., ACL vs. waterfall)?
- Do I have files or folders whose paths are too long or contain invalid characters that the destination platform will not accept?
Create a Simulation Job
During the job creation workflow, the last stage before creating the job includes an option to enable simulation mode.
When a job is in simulation mode, it can be run and scheduled like any other job, but no data will be transferred.
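Conceptually, a simulation job is a guarded write path: the job computes and reports its full plan either way, but only a live job writes to the destination. The sketch below illustrates the idea only and is not DataHub's implementation; `transfer` is a hypothetical stand-in for the real write path.

```python
# Sketch: a dry-run flag that plans everything but writes nothing.
def transfer(item):
    print(f"transferring {item}")  # stand-in for the real write path

def run_job(items, simulate=True):
    planned = []
    for item in items:
        planned.append(item)   # reported in both modes
        if not simulate:
            transfer(item)     # only a live job touches the destination
    return planned

# A simulation run reports the full plan without writing anything:
print(run_job(["/docs/a.txt", "/docs/b.txt"]))
```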
Transition a Simulation Job to Transfer Content
After review, a simulation job can be transitioned to a live job that will begin to transfer your content to the destination platform.
Create a Job
DataHub uses jobs to perform specific actions between the source and destination platforms. The most common type of jobs are copy and sync; please see Create New Job | Transfer Direction for more information.
All jobs can be configured to run manually or on a defined schedule. This option will be presented as the last configuration step.
To create a job, select the Jobs option from the left menu and click on Create Job. DataHub will lead you through a wizard to select all the applicable options for your scenario.
The main job creation steps include:
- Selecting a Job Type
- Configuring Locations
- Defining Transfer Policies
- Defining Job Transfer Behaviors
- Advanced Options
- Summary | Review, Create Job, and Schedule
Job Type
Job type defines the kind of job and the actions the job will perform with the content. There are two main job types available: Basic Transfer and Folder Mapping.
Basic Transfer - Transfer items between one connection and another
This will copy all content (files and folders) from the source to the destination. Each job run will detect any new content on the source and copy it to the destination.
For more information, please see Create New Job | Transfer Direction.
Define Source & Destination Locations
All platform connections made in the DataHub Platform application will be available in the locations drop-down lists when creating a job.
- If your connections were created with Administrative privileges, you may also have the ability to impersonate another user within your organization.
- Source defines the location of the current content you wish to transfer.
- Destination defines the location where you would like your content to go.
Configuring Your Locations - Impersonation
Impersonation allows a site admin access to all the folders on the site, including those that belong to other users. With DataHub, a job can be set up using the username and password of the site admin to sync/migrate/copy files to or from a different user's account without ever having the username or password of that user.
- Enable Run as user...
- Choose Source User
Job Category
The category function allows for the logical grouping of jobs for reporting and filtering purposes. The category is optional and does not alter the job function in any way.
DataHub comes with two default job categories:
Maintenance: DataHub maintenance jobs only. This category allows you to view the report of background maintenance jobs and is not intended for newly created transfer jobs.
Default: When a category is not defined during job creation, it will automatically be given the default category. This option allows you to create a report for all jobs that a custom category was not assigned.
Create Job Category
- Enable the feature and select from existing job categories, or create a new category.
- From the jobs grid, filter by category.
Job Policies
Define what should happen once items have been successfully transferred and set up rules around how to deal with content as it is updated on your resources while the job is running.
- DataHub works on the concept of “deltas,” where the transfer engine only transfers files after they have been updated (see the sketch after this list).
- File version conflicts occur when the same file on the source and destination platforms has been updated in between job executions.
- Policies define how DataHub handles file version conflicts and whether or not it persists a detected file deletion.
- Each job has its own policies defined, and the settings are NOT global across all jobs.
Conflict Policy - File Version Conflicts
When a conflict is detected on either the source or the destination, Conflict Policy determines how DataHub will behave.
For more information, please see Conflict Policy.
Delete Policy - Deleted Items
When a delete is detected on either the Source or the Destination, Delete Policy determines how DataHub will behave.
For more information, please see Delete Policy.
Behaviors
Behaviors determine how this job should execute and what course of action to take in different scenarios. All behaviors are enabled by default as recommended settings to ensure content is transferred successfully to the destination.
Zip Unsupported Files / Restricted Content
Enabling this behavior allows DataHub to compress any file that is not supported on the destination into a .zip format before being transferred. This will be done instead of flagging the item for manual remediation and halting the transfer of the file.
For example, if you attempt to transfer the file "db123.cmd" from a Network File Share to SharePoint, DataHub will compress the file to "db123.zip" before transferring it over, avoiding an error message.
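As a minimal sketch of that compress-then-transfer idea, assuming Python's standard zipfile module; the file names are illustrative.

```python
# Sketch: compress an unsupported file into a sibling .zip before transfer.
import os
import zipfile

def zip_for_transfer(path):
    zip_path = os.path.splitext(path)[0] + ".zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(path, arcname=os.path.basename(path))
    return zip_path

# e.g. zip_for_transfer("db123.cmd") -> "db123.zip"
```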
Allow unsupported file names to be changed
The Segment Transformation policy controls whether DataHub can change folder and file names to comply with the destination platform's restrictions.
Enabling this behavior allows DataHub to change the names of folders and files that contain characters not supported by the destination before transferring the file. This will be done instead of flagging the file for manual remediation and preventing it from being transferred.
When this occurs, the unsupported character is transformed into an underscore.
For example, if you attempt to transfer the file "Congrats!.txt" from Box to NFS, it would be transformed to "Congrats_.txt" and appear that way on the destination.
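A minimal sketch of that transformation, assuming an illustrative invalid-character set (each destination platform defines its own):

```python
# Sketch: replace destination-invalid characters with underscores.
import re

INVALID = re.compile(r'[<>:"/\\|?*!]')  # example set; includes '!'

def transform_segment(name):
    return INVALID.sub("_", name)

print(transform_segment("Congrats!.txt"))  # -> Congrats_.txt
```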
Preserve file versioning between locations
DataHub will preserve and transfer all versions of a file on supported platforms.
Advanced
These optional job configurations determine what features you want to preserve, filter or add during your content transfer.
Filtering
Filtering defines rules for determining which items are included or excluded during transfer. For more information, please see Job Filters. A combined sketch follows the list of filter pages below.
Job Filters | Filter By Name Pattern
Job Filters | Filter By Extensions or Type
Job Filters | Filter By Date Range or Age
Job Filters | Filter by Metadata
Job Filters | Metadata Conjunctions
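As a combined illustration of the name-pattern, extension, and age filters listed above, here is a minimal sketch; the patterns, extensions, and age threshold are assumptions, not DataHub defaults.

```python
# Sketch: decide whether an item passes a set of exclusion filters.
import fnmatch
import time

def included(name, modified_ts,
             exclude_patterns=("~$*",),       # filter by name pattern
             exclude_exts=(".exe", ".bat"),   # filter by extension or type
             max_age_days=3 * 365):           # filter by date range or age
    if any(fnmatch.fnmatch(name, p) for p in exclude_patterns):
        return False
    if name.lower().endswith(exclude_exts):
        return False
    age_days = (time.time() - modified_ts) / 86400
    return age_days <= max_age_days

print(included("report.docx", time.time() - 86400))  # True: recent, allowed type
print(included("setup.exe", time.time()))            # False: excluded extension
```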
Permission Preservation
This setting determines how DataHub transfers permissions across platforms.
Permissions | Author / Owner Preservation
Permissions | Permissions Preservation
Permissions | Permissions Import
Permissions | Preserve Shared Links
Metadata Mapping
Metadata mapping allows you to document your source metadata in CSV format and map how you want it applied to the destination. Enabling this feature lets you import the CSV file and apply it during job creation.
For more information, please see Metadata Import.
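For illustration only, the sketch below assumes a hypothetical two-column CSV layout (source_field, destination_field); refer to the Metadata Import page for the actual template.

```python
# Sketch: load a metadata map and apply it to one item's metadata.
import csv

def load_metadata_map(csv_path):
    # Build {source field -> destination field} from the mapping file.
    with open(csv_path, newline="") as f:
        return {row["source_field"]: row["destination_field"]
                for row in csv.DictReader(f)}

def remap_metadata(item_metadata, field_map):
    # Rename mapped fields; pass unmapped fields through unchanged.
    return {field_map.get(k, k): v for k, v in item_metadata.items()}
```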
Scripting
Some DataHub features are not yet available in the user-interface. The scripting feature allows advanced DataHub users to enable additional transfer features by inserting JSON-formatted job controls.
Enabling this option will allow you to leverage these features and apply them during job creation.
Job Summary - Review your job configuration
Before you create your job, review all your configurations and adjust as needed. Modifying your job after creation is not supported; however, the option to duplicate your current job will allow you to make any adjustments without starting from the beginning.
- The Edit option will take you directly to the configuration to make changes.
Define Job Schedule
During job creation, the final step is to define when the job will run and what criteria will define when it stops.
- Save job will launch the job scheduler.
- Save job and run it right now will trigger the job to start immediately. It will run every 15 minutes after the last execution completes.
Schedule Stop Policies
Stop policies determine when a job should stop running. If none of the stop policies are enabled, a scheduled job will continue to run until it is manually stopped or removed.
The options for the stop policy are:
Stop after a number of total runs
The number of total executions before the job moves to "complete" status.
Stop after a number of runs with no changes
The job has run and detected no further changes; all content has transferred successfully. Runs that detect changes (for example, when new content is added to the source) do not increment this count, and runs that detect no changes do not need to be consecutive to increment it.
Stop after a number of failures
Most failures are resolved through automatic retries. If the retries fail to resolve the failures, manual intervention is required. This policy takes the job out of rotation so that the issue can be investigated. Job executions that detect failures do not need to be consecutive to increment the count.
Stop after a specific date
The job will "complete" on the date defined.
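Taken together, the stop policies amount to a small predicate evaluated after each run. The sketch below is illustrative; the parameter names and structure are assumptions, not DataHub configuration keys.

```python
# Sketch: evaluate the stop policies after a run. Counters are
# cumulative, since qualifying runs need not be consecutive.
from datetime import date

def should_stop(total_runs, no_change_runs, failed_runs, today,
                max_runs=None, max_no_change=None,
                max_failures=None, stop_date=None):
    if max_runs is not None and total_runs >= max_runs:
        return True
    if max_no_change is not None and no_change_runs >= max_no_change:
        return True
    if max_failures is not None and failed_runs >= max_failures:
        return True
    if stop_date is not None and today >= stop_date:
        return True
    return False  # no policy enabled or met: keep running until stopped manually

print(should_stop(12, 3, 0, date.today(), max_no_change=3))  # True
```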
Reports
Reporting is paramount with the DataHub Platform. Whether you choose to utilize the DataHub manager application, CLI, or ReST API, reporting options are available to help you manage and surface data about your content in real-time.
Out-of-the-box reports include:
- Dashboard: Provides an overview of what is happening across all your content
- Job Overview: Detailed job information including source, destination, schedule and current status
- Flagged Items: Content that did not transfer and requires attention
- Content Insights: Breakdown of your transferred data
- Sharing Insights: Breakdown of all permissions associated with your source content
- User Mappings: The permission associations of your content
- Item Report: Information on each item that transferred
- Validation: At any time, you may trigger a validation run, which performs a full inspection of all content relating to the option you select, for the next run only
Job Overview Report
This report provides detailed transfer information for the individual job.
Schedule: Provides information on how many times the job has executed, when the job will run again and progress towards meeting the job stop policy defined
Transfer Details | Identified Chart: Reflects content identified on the source platform and the status summary for items
Transfer Details | Revised Chart: Reflects content that DataHub revised during transfer to meet destination requirements and user-defined job configurations
Transfer Details | Flagged Chart: Reflects content that DataHub could not transfer; manual remediation is required
Run Breakdown Report: Provides job history information for each execution for the given job
- Note: Last Activity in the Run Breakdown will only appear during the job execution.
In some circumstances, bytes on the destination can be higher than listed on the source. This discrepancy is caused by property promotion on Word documents. For more information, see Report Values | Potential Differences due to Post Processing.
Values in the run breakdown may differ from values presented in the charts. This is because the run breakdown tracks each individual occurrence, whereas an item can only exist in a single chart category.
Example: When an item is both truncated AND ignored, it would not show up in the "Revised" chart but would show up in the "Revised" run breakdown
The run breakdown also shows combined file and folder values, while the charts display file and folder values separately; the "Transfer Details" dropdown is available to switch between display values.
Job Content Insights Report
This report provides detailed content information for the individual job.
Use the drop-down options to change the chart views.
Job Sharing Insights Report
This report offers a breakdown of all permissions associated with your content. The values presented are based on the source content.
On the Shared Insights tab for a job, the value "Not Shared" represents both items that have no permissions as well as content shared by inheritance from the parent folder. At this time, DataHub only tracks permissions applied during transfer, not permissions that result from inheritance within the hierarchy.
Job User Mappings Report
The User Mappings report for a given job presents the permission breakdown of your content.
The User Mappings report will populate only if permission preservation features are enabled for the job.
Job Validation Report
Control the level of tracking and reporting for content that exists on both the source and destination platform, including content that has been configured to be excluded from transfer and content that existed on the destination prior to the initial transfer.
Items that have been ignored / skipped by policy or not shown because they already existed on the destination can now be seen on reports with the defined categories.
The default validation option is inspect none. This option does not need to be configured in the application user-interface or through the ReST API; it is the system default.
This configuration does not track every item; it offers additional tracking with performance in mind. Inspect none tracks all items on the source at all levels of the hierarchy, excluding those configured to be ignored/skipped through policy. On the destination, all content in the root (files and folders) that existed prior to the initial transfer is tracked as destination-only items and reported as ignored/skipped.
This option has the following features:
- Source: All content (files and folders) at all levels in the hierarchy, excluding items configured to be ignored/skipped through policy.
- Destination: All content in the root (files and folders) that existed prior to the initial transfer will be tracked as destination-only items and reported as ignored/skipped.
- Destination: Content (files and folders) at lower depths of the directory tree (sub-folders) that existed prior to the initial transfer will not be tracked.
- If the connection does not have access to a given folder in the hierarchy, DataHub cannot track and report those items.
Job Reports - Validation tab: At any time, you may trigger a validation run, which performs a full inspection of all content relating to the option you select, for the next run only.
- For more information, see Job Validation | Item Inspection Policy.
Generate Job Reports
DataHub Reports provide several options to combine many jobs into a single report for review. Reports can be generated by category, by individually selected jobs, or by convention job parent (user account mapping, network home drive mapping or folder mapping job types).
Reports are separated by two tabs so you can clearly distinguish between jobs that are actively transferring content and simulation jobs that imitate transfer.
If no category is defined during job creation, it will be assigned to the default job category.
Generate Report
Select Report Type
Define what the report will contain
- Category: Defined during job creation
- Parent jobs: Relating to convention jobs such as user account mapping, network home drive mapping or folder mapping job types
- Manually select jobs: Choose each job individually that you want in your report
Remediation
Items that were unable to be transferred by the DataHub Platform will be flagged for manual remediation. Items can be flagged for many reasons, and in some cases, still transferred to the destination platform. Each item is a package, consisting of the media itself, version history, author, sharing and any other metadata. DataHub ensures all pieces of the item package are transferred to the destination to preserve data integrity. When an item is flagged, DataHub is indicating that all or some portion of this failed to migrate.
All migrations require some amount of manual intervention by the client to move content that fails to transfer automatically.
- Note that one of the uses of simulation mode is to get an understanding, prior to a live transfer, of how many files might fail to transfer and the reasons why.
- This can be used to adjust the job parameters to achieve a higher number of automatic remediation successes.
General Reasons Content does not Transfer
Errors from Source & Destination Platforms
This is a broad error category that indicates DataHub was prevented from reading, downloading, uploading or writing content during content transfer by either the source or destination platform provider. Each situation is dependent on the storage provider rejection reason and will require manual investigation to resolve.
Insufficient Permissions
Many platforms may require additional permissions in order to perform certain functions, even for site administrator accounts. These permissions typically require a special request from the storage provider. For example, content that has been locked, hidden or has been flagged to disable download may require this special permission request from your storage provider.
Scenario-Specific Configuration
Content on your source storage platform is diverse, and users across your business will structure their data in a wide variety of different ways. A single one-size-fits-all project configuration may not be suitable and can result in some content not transferring to the destination platform. DataHub will assist in assessing these situations to help provide custom, scenario-specific configuration that may work around the issue that is preventing the transfer.
Disparate Platform Features
Each platform provider offers a set of features that are generally shared concepts in the storage industry. However, within each storage platform there can be behavioral or rule differences within these features, and aligning these discrepancies can be challenging. Features such as permission levels (edit, view, view+upload, etc.) may not align as an exact match on the destination platform, and file size restrictions or file names may need to be altered to meet the destination platform's policies. DataHub will attempt to accommodate these restrictions through configurations in the system; however, not all scenarios can be covered in a diverse data set.
Interruption in Service
DataHub must maintain connection to the database at all times during the transfer process. If there is an interruption in service, DataHub will fail the transfer as it is unable to track / write to the database.
How do I validate my content transferred successfully?
Verify the destination
DataHub will report all content that has transferred to the destination. Log into your destination platform and verify the content is located as expected.
DataHub is reporting items in "pending" or "retrying" status, what are my next steps?
Run the job again
DataHub defaults to retrying the job 3 times to reconcile items that are in pending/retry status. Depending on your job configuration, this may occur with the defined schedule or you can start the job manually.
Review the log message
DataHub logs a reason why the item is in pending/retry status. On the job "Overview" tab, click on the Transfer Details breakdown status "retrying". This will direct you to the filtered "Items" list. Select the item then click the "View item history" link on the right toolbox.
DataHub is reporting items in "Flagged" status, what are my next steps?
When an item is in "flagged" status, this means DataHub has made all attempts to transfer the file without success, and it requires manual remediation.
Review the log message
DataHub logs a reason why the item has been flagged. On the job "Overview" tab, click on the Transfer Details breakdown status "flagged". This will direct you to the filtered "Items" list. Select the item and click the "View item history" link on the right toolbox.
Review the message and determine if you can resolve on the source platform.
Review all flagged items
These are the recommended ways to view all flagged items: export the flagged item report or review the "Flagged Items" page.
Export report:
- Job Report → Items tab → Filter by Status: Flagged
- Click "Export this report" → Save CSV file for review
Review "Flagged Items" page:
- Retry or Ignore individual items
- View Item History for individual items
- Link back to the job the flagged item is associated with
- Export all Flagged Items report