Convention jobs such as folder mapping are ideal for migrations where you wish to control the transfer at a granular level without the effort of creating individual jobs. DataHub will automatically create a unique job for each folder in your hierarchy, inheriting configurations from the parent/master job.
Folder mapping jobs are a type of convention or master job that create child jobs a single level deep in the folder hierarchy at the path you specify. The folder mapping job may be controlled and manipulated just like a transfer job, but when executed, it will not transfer data. Instead, each execution creates, modifies, or deletes its child jobs, which are then responsible for the transfer of data. For example, if your source contains three sub-folders, a folder mapping job would create a child job for each of those three folders in the specified path. As new folders are created in the source, additional child jobs are created for them automatically. Data will be transferred when the child jobs are run.
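The child-job creation described above can be pictured with a small conceptual sketch. It is not DataHub's implementation. The function name and the folder names are hypothetical, and it only models the creation of child jobs for newly discovered folders, not modification or deletion.

```python
def folders_needing_child_jobs(source_folders, existing_child_jobs):
    """Return the first-level source folders that do not yet have a child job.

    Conceptual model only: each name in source_folders is a folder one level
    below the mapped path, and existing_child_jobs holds the folder names
    that already have a child job from an earlier parent-job run.
    """
    existing = set(existing_child_jobs)
    return [folder for folder in source_folders if folder not in existing]


# First run of the parent job: three sub-folders, no child jobs yet.
first_run = folders_needing_child_jobs(["Finance", "HR", "Legal"], [])

# Later run: a new "Sales" folder has appeared in the source.
later_run = folders_needing_child_jobs(
    ["Finance", "HR", "Legal", "Sales"], ["Finance", "HR", "Legal"]
)

print(first_run)
print(later_run)
```

In this model the parent job never moves data itself. It only decides which child jobs should exist, and the child jobs perform the transfer when they run.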
The parent folder mapping job itself has minimal impact on performance. However, the child jobs it creates are subject to the same performance considerations as any other job type available in DataHub.
If DataHub is installed on a single instance, Parallel Writes will be a limiting factor, regardless of how many child jobs are created. In a multi-node scenario, more jobs running concurrently may be configured without impacting performance.
User Interface - Creating a Folder Mapping Job
Folder Mapping supports the following:
All job features defined while creating the parent job will be applied to the child jobs it creates
Defining your Source and Destination Paths
If you are an administrator using Impersonation, enable Run as user... and choose the user whose content you wish to access.
Source / Destination Path: If you wish to transfer all content, leave the source path blank. A child job will be created for every top-level folder. If a folder is selected for the source path, a child job will be created for every sub-folder within that folder.
Child Job Source / Destination Path: This directory within each folder will be used as the source.
Target the root of each folder: A child job will be created for each first-level folder relative to the source path.
Target a specific directory within each folder: If a folder with the same name exists in every directory, you can define it with this option.
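The two child job path options can be illustrated with a short sketch of how each child job's source path might be derived. This is an illustration under stated assumptions, not DataHub's actual path logic. The function name, the `subdirectory` parameter, and the folder names are hypothetical.

```python
from posixpath import join


def child_source_paths(source_path, first_level_folders, subdirectory=None):
    """Map each first-level folder to the source path its child job would use.

    subdirectory=None models "Target the root of each folder".
    A subdirectory name models "Target a specific directory within each folder".
    """
    paths = {}
    for folder in first_level_folders:
        # A blank source_path means the mapping starts at the top level.
        root = join("/", source_path, folder)
        paths[folder] = join(root, subdirectory) if subdirectory else root
    return paths


# Blank source path, targeting the root of each top-level folder.
root_targets = child_source_paths("", ["Alice", "Bob"])

# Source path "Users", targeting a "Documents" directory inside each folder.
directory_targets = child_source_paths("Users", ["Alice", "Bob"], "Documents")

print(root_targets)
print(directory_targets)
```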
Create the Folder Mapping Job
After configuring policies, behaviors, and advanced features, you will be prompted to schedule the job. This schedule will be applied to the child jobs.
- The parent job will run immediately to create the child jobs.
- After the child jobs are created, the default schedule is set to run every 6 hours so it can review the source for any new content.
- The parent and child job schedules can be changed at any time.
DataHub API - Creating a Folder Mapping Job
In general, creating a folder mapping job is not much different from creating a transfer job.
The parameter to note here is "kind". It must be set to "folder_mapping".
To set the schedule for the child jobs, set the schedule parameter within the transfer block. The schedule parameter outside the transfer block is the schedule for the parent job only.
In the example below,
- The parent job is set to run automatically after creation and every 15 minutes thereafter.
- The child jobs are set to a manual schedule.
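A request body with this shape might be assembled as in the sketch below. Only the `kind` value `folder_mapping` and the placement of the two schedules (inside and outside the transfer block) come from the text above. The key name `transfer`, the function name, and every other field name and value (`name`, `source`, `destination`, `path`, `mode`, `interval_minutes`) are hypothetical placeholders, so check the DataHub API reference for the real schema before using it.

```python
import json


def build_folder_mapping_payload(name, source_path, destination_path):
    """Assemble an illustrative request body for a folder mapping job."""
    return {
        "name": name,  # hypothetical field
        "kind": "folder_mapping",  # from the text: must be "folder_mapping"
        "transfer": {
            # Hypothetical source/destination fields.
            "source": {"path": source_path},
            "destination": {"path": destination_path},
            # Schedule inside the transfer block applies to the child jobs.
            "schedule": {"mode": "manual"},  # hypothetical shape
        },
        # Schedule outside the transfer block applies to the parent job only.
        "schedule": {"mode": "auto", "interval_minutes": 15},  # hypothetical shape
    }


payload = build_folder_mapping_payload("Home folders", "/Users", "/Users")
print(json.dumps(payload, indent=2))
```

Keeping the two schedules in separate places mirrors the rule described above: the parent job's schedule controls how often new folders are discovered, while the child schedule controls when data is actually transferred.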
DataHub API - Creating a Folder Mapping Job by Including Specific Folders
DataHub API - Creating a Folder Mapping Job by Excluding Specific Folders