Hadoop Integration
H-Syndicate lets users access datasets stored on Syndicate Volumes from Hadoop.
Because it plugs into the Hadoop FileSystem abstraction, Hadoop users can access
Syndicate Volumes the same way they work with the native Hadoop file system (HDFS).
Prerequisites
You must first Install H-Syndicate and Mount Syndicate Gateways.
Once the Syndicate Gateways are mounted, each mount is issued a session_name
and a session_key, as you requested.
Path (URI)
To access H-Syndicate, use the H-Syndicate URI scheme when passing a path.
We recommend hsyn://, but the scheme may differ between environments depending
on policy. Hadoop redirects an access to H-Syndicate when the requested file
path (URI) has the corresponding scheme.
An H-Syndicate path has the following form:
<scheme>://<Syndicate_REST_host>/<session_name>/path/to/file
In most cases, users can omit the host part. The local Syndicate REST service
is then used.
The following example shows the typical form:
hsyn:///<session_name>/path/to/file
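The path layout above can be sketched as a small shell helper. This is only an illustration: hsyn_uri is a hypothetical function, not part of H-Syndicate, and the host name in the usage comment is a placeholder. It assumes the hsyn scheme unless another one is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with H-Syndicate): build an H-Syndicate URI.
# Usage: hsyn_uri <session_name> [path] [host] [scheme]
hsyn_uri() {
  local session="$1"
  local path="${2:-}"
  local host="${3:-}"         # empty host: the local Syndicate REST service
  local scheme="${4:-hsyn}"   # recommended scheme; may differ by site policy
  path="${path#/}"            # drop one leading slash to avoid a double slash
  printf '%s://%s/%s/%s\n' "$scheme" "$host" "$session" "$path"
}

# Example calls:
#   hsyn_uri test_session path/to/file
#   hsyn_uri test_session path/to/file rest.example.org
```

Leaving the host argument empty produces the three-slash form shown above.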
Session Key
For security, the session_key is not passed directly in the path. Instead, it is
passed to H-Syndicate securely through the Hadoop Credential Provider.
The Credential Provider creates a secure database on the Hadoop file system
(HDFS) that stores all of your session_name and session_key pairs. H-Syndicate
then looks up the session_key that corresponds to a given session_name in that database.
To store a session_key in the database:
hadoop credential create \
fs.hsyndicate.session.<session_name>.key \
-v <session_key> \
-provider jceks://hdfs/user/<hadoop_user>/.syndicate/hsyndicate.jceks
You can use a different path for the database file, but we recommend placing it under your home directory for security.
To use the database, you must specify its path. One way is to set the
hadoop.security.credential.provider.path configuration parameter on the command line:
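The credential alias and the recommended database location both follow a fixed pattern, so they can be derived from the session name and the Hadoop user. The sketch below assumes exactly the naming used in the command above. hsyn_alias and hsyn_provider are hypothetical helper names, and alice is a placeholder user.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with H-Syndicate).

# Credential alias under which the session_key for a session is stored.
hsyn_alias() {
  printf 'fs.hsyndicate.session.%s.key\n' "$1"
}

# Recommended credential database location for a given Hadoop user.
hsyn_provider() {
  printf 'jceks://hdfs/user/%s/.syndicate/hsyndicate.jceks\n' "$1"
}

# Example use (needs a working Hadoop installation):
#   hadoop credential create "$(hsyn_alias test_session)" \
#     -provider "$(hsyn_provider alice)"
#   hadoop credential list -provider "$(hsyn_provider alice)"
```

When -v is omitted, hadoop credential create should prompt for the value instead, which keeps the session_key out of the shell history. The list subcommand shows which aliases the database already holds.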
-D hadoop.security.credential.provider.path=jceks://hdfs/user/<hadoop_user>/.syndicate/hsyndicate.jceks
For example, to list the root directory of a session named test_session:
hadoop fs -D hadoop.security.credential.provider.path=jceks://hdfs/user/<hadoop_user>/.syndicate/hsyndicate.jceks -ls hsyn:///test_session/
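Typing the provider path on every command is error-prone, so a small wrapper can fill it in. This is a sketch under two assumptions: it uses hadoop fs, the generic Hadoop filesystem shell, and it takes the Hadoop user to be the local $USER unless the HSYN_PROVIDER variable is set. Both hsyn_fs and HSYN_PROVIDER are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with H-Syndicate): run a Hadoop filesystem
# command with the credential provider path already supplied.
# Set HSYN_PROVIDER to override the default database location.
hsyn_fs() {
  local provider="${HSYN_PROVIDER:-jceks://hdfs/user/${USER}/.syndicate/hsyndicate.jceks}"
  # Generic options such as -D must come before the subcommand arguments.
  hadoop fs -D "hadoop.security.credential.provider.path=${provider}" "$@"
}

# Example use (needs a working Hadoop installation):
#   hsyn_fs -ls hsyn:///test_session/
```

The wrapper passes all of its arguments through unchanged, so any filesystem subcommand can be used in place of -ls.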
For more information about the Hadoop Credential Provider, check out the following links: