Hadoop Integration

H-Syndicate allows users to access datasets on Syndicate Volumes from Hadoop. Plugged into Hadoop FileSystem Abstraction, Hadoop users can access Syndicate Volumes in the same way as working with Hadoop Native Filesystem (HDFS).

Prerequisites

You must first Install H-Syndicate and Mount Syndicate Gateways. Once Syndicate Gateways are mounted, they will issue – as you requested – session_name and session_key for each mount.

Path (URI)

To access H-Syndicate, give URI scheme of H-Syndicate when passing a path. We recommend hsyn:// but this can differ from environments by policy. Hadoop will redirect accesses to H-Syndicate when requested file path (URI) has the corresponding scheme.

H-Syndicate’s path is defined as following:

<scheme>://<Syndicate_REST_host>/<session_name>/path/to/file

Most cases, users can omit host part. In this case, local Syndicate REST service will be accessed.

Following example is a typical form:

hsyn:///<session_name>/path/to/file

Session Key

session_key is not passed directly via path for security. session_key is passed to H-Syndicate securely via Hadoop Credential Provider.

It creates a secure database storing your all session_name and session_key pairs on Hadoop File System (HDFS). Then, H-Syndicate finds corresponding session_key to a given session_name from the database.

To store session_key to the database:

hadoop credential create \
            fs.hsyndicate.session.<session_name>.key \
            -v <session_key> \
            -provider jceks://hdfs/user/<hadoop_user>/.syndicate/hsyndicate.jceks

You can use different paths for the database file. But we recommend place them under your home directory for security.

To use the database, the database path should be given. One way is defining hadoop.security.credential.provider.path configuration parameter from command-line.

-D hadoop.security.credential.provider.path=jceks://hdfs/user/<hadoop_user>/.syndicate/hsyndicate.jceks

Thus, listing a root directory for a session, test_session, is:

hadoop dfs -D hadoop.security.credential.provider.path=jceks://hdfs/user/<hadoop_user>/.syndicate/hsyndicate.jceks -ls hsyn:///test_session/

For more information about Hadoop Credential Provider, checkout following links: