Datameer FAQs | Comparably
Datameer is a leading provider of data management software for analytics that gives analysts universal access to the data they need when they need it for faster analytics. Datameer Spotlight enables business teams to discover, access, collaborate, and analyze more data without complex data replication and movement, while Datameer Spectrum is a cloud-native, fully-featured ETL++ platform that turns raw data into analytics-ready datasets in an easy, code-free manner. Datameer is a trusted platform at leading enterprises globally, including Citibank, Royal Bank of Canada, British Telecom, Aetna, Optum, National Instruments, Vivint and more. To learn more, please visit www.datameer.com.

Datameer FAQs

Datameer's Frequently Asked Questions page is a central hub where its customers can go with their most common questions. These are the 468 most popular questions Datameer receives.

Frequently Asked Questions About Datameer

  • Problem

    During a cluster job execution, some tasks fail with the following message seen in the YARN application log.

    {"entity":"attempt_111111111_222222_1_01_000001_0","entitytype":"TEZ_TASK_ATTEMPT_ID",

    "events":[{"ts":1566540243508,"eventtype":"TASK_ATTEMPT_FINISHED"}],

    "otherinfo":{"creationTime":1566540195458,"allocationTime":1566540197777,"startTime":1566540230285,"endTime":1566540243508,"timeTaken":13223,

    "status":"FAILED","taskAttemptErrorEnum":"CONTAINER_EXITED","taskFailureType":"NON_FATAL","diagnostics":"Container container_111111111_222222_1_01_000001 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]",

    "counters":{"counterGroups":"[{counterGroupName=org.apache.tez.common.counters.DAGCounter, counters=[{counterName=RACK_LOCAL_TASKS, counterValue=1}]}]"},"lastDataEvents":{"lastDataEvents":"[{TEZ_TASK_ATTEMPT_ID=, ts=1566540195231}]"},"nodeHttpAddress":"DataNodeHostName:port"}}

    Occasionally this can cause a job failure, but usually it only affects performance, because the failed task has to be rerun.

    Cause

    This failure is a known issue with YARN (YARN-8671) that may occur if a node is overly busy (e.g., another container is using too much CPU or the NodeManager is too busy to respond). The failure is indicative of a busy cluster or of nodes that are having issues for some other reason.

    Solution

    As this exception points to a cluster services issue, it is recommended to review the cluster's configuration and performance and to perform a general health check.
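
    When reviewing many task attempts, it can help to filter for this specific failure. The following is a minimal sketch, assuming the task-attempt entries have already been parsed into dictionaries shaped like the log excerpt above; the function name and the sample entry are illustrative only and are not part of Datameer or YARN.

```python
import json

# Marker text taken from the diagnostics message shown in the log excerpt.
LOST_NODE_MARKER = "Container released on a *lost* node"


def is_lost_node_failure(entry: dict) -> bool:
    """Return True if a task-attempt entry failed because its node was lost."""
    info = entry.get("otherinfo", {})
    return (
        info.get("status") == "FAILED"
        and LOST_NODE_MARKER in info.get("diagnostics", "")
    )


# Shortened, illustrative entry modelled on the excerpt above.
sample = json.loads(
    '{"entity": "attempt_1", "entitytype": "TEZ_TASK_ATTEMPT_ID", '
    '"otherinfo": {"status": "FAILED", '
    '"taskAttemptErrorEnum": "CONTAINER_EXITED", '
    '"diagnostics": "Container failed, exitCode=-100. '
    'Container released on a *lost* node"}}'
)

print(is_lost_node_failure(sample))
```

    A high number of such entries for the same node would suggest starting the health check there.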

  • Goal

    How can the file size of Datameer Export Jobs be limited?

    Learn

    For file-based Export Jobs, Datameer allows limiting the size of exported files. This threshold can be set in the Maximum file size (MB) field at: Export Job Configuration -> Data Details -> Advanced Settings.

    Keep in mind that the limit is applied to uncompressed data at the moment when it is being initially written to the target location. Compression (if configured) happens after the uncompressed files are written for every exported file individually.

    For example, when data is exported as compressed CSV files and a file size limit of 50 MB is set, Datameer:

    a) Exports uncompressed data, respecting the maximum allowed size for each individual file.

    b) Compresses every file.

    This can produce a result that looks surprising but is expected: even with a 50 MB per-file size limit, an Export Job may end up writing ten 5 MB files. In that case, ten 50 MB files were each compressed at a ratio of 10:1.
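
    The arithmetic behind this example can be sketched as follows. This is a simplified illustration, assuming every file is filled up to the limit and that all files compress at the same ratio; the function name and the 500 MB input are assumptions made for the example.

```python
import math


def export_file_sizes(total_uncompressed_mb, max_file_size_mb, compression_ratio):
    """Estimate the file count and the per-file size after compression.

    The limit applies to uncompressed data, so the file count depends only
    on the uncompressed volume; compression shrinks each file afterwards.
    """
    file_count = math.ceil(total_uncompressed_mb / max_file_size_mb)
    compressed_file_mb = max_file_size_mb / compression_ratio
    return file_count, compressed_file_mb


# 500 MB of uncompressed data, 50 MB limit, 10:1 compression ratio.
count, size_mb = export_file_sizes(500, 50, 10)
print(count, size_mb)
```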

  • Problem

    On Sheet1 column A, I’ve created a record using the T function: T("09302019"). I then converted this string value into a date using the ASDATE(#A;"MMddyyyy") function.

    However, trying the query:

    SELECT CAST(Sheet1.A as date 'mmddyyyy') from Sheet1

    Throws the following exception:

    WARN [2019-09-30 17:35:29.806] [qtp1942406066-77] (SqlSheetModel.java:199) - Something went wrong while parsing SQL query: SELECT CAST(Sheet1.A as date 'mmddyyyy') from Sheet1, Encountered "\'mmddyyyy\'" at line 1, column 30.

    Was expecting one of:

    ")" ...

    "(" ...

    "CHARACTER" ...

    "MULTISET" ...

    Cause

    Under the hood, SQL Worksheets are converted into a set of traditional Datameer functions and operations. The library used to cast a string into a date has a hardcoded pattern, which is yyyy-MM-dd. As a result, any other pattern is not recognized. This is why Datameer throws an exception for a query like: SELECT CAST(receipt_date as DATE; "MMddyyyy")....

    Solution

    To work around this limitation, transform the source value from MMddyyyy to yyyy-MM-dd format before applying the SQL query. Here is one approach that can be used to do so.

    Sheet1 Column A - initial value 09302019.

    Transform it into 2019-09-30 using the formula RIGHT(#A;4)+"-"+LEFT(#A;2)+"-"+RIGHT(LEFT(#A;4);2).

    Create a SQL Sheet and introduce the desired SQL query.
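
    To illustrate what the transformation formula does outside of Datameer, here is a small Python sketch that mirrors each part of it with string slicing. The function name is hypothetical; only the slicing logic corresponds to the formula above.

```python
def mmddyyyy_to_iso(value: str) -> str:
    """Rearrange an MMddyyyy string into yyyy-MM-dd, mirroring the formula."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected 8 digits in MMddyyyy form, got {value!r}")
    year = value[-4:]      # RIGHT(#A;4)
    month = value[:2]      # LEFT(#A;2)
    day = value[:4][-2:]   # RIGHT(LEFT(#A;4);2)
    return f"{year}-{month}-{day}"


print(mmddyyyy_to_iso("09302019"))  # 2019-09-30
```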

  • Goal

    I want to replace the default jar files used for the Teradata Database Driver in Datameer.

    Learn

    In order to successfully replace JDBC jar files for an existing Datameer Database Driver, ALL existing jar files for the specific driver must first be removed.

    Navigate to the Admin Tab -> Database Drivers and then click on the gear icon for the existing Teradata Driver to access its configuration.

    Remove all jar files associated with this driver.

    Note: These files can only be removed one at a time, even if all of them disappear once you click on the Trash icon. It is necessary to re-open the Driver configuration page and remove each file individually.

    In the Database Drivers section, ensure that no files remain in the File column for the existing Teradata Driver.

    Once this is confirmed, upload the new jar files and save the configuration.

  • Problem

    After the upgrade to version 7.5.x, Datameer fails to start with the following exception.

    stderrout.log

    2020-01-05 11:23:43.622:WARN:oejw.WebAppContext:main: Failed startup of context o.e.j.w.WebAppContext@7225790e{/,file:/datameer/Datameer-7.5.4-hdp-2.6.0/webapps/conductor/,STARTING}{/conductor}

    java.lang.IllegalArgumentException: Unknown name value [VARIABLES_READ] for enum class [datameer.dap.sdk.usermanagement.Capability]

    at org.hibernate.type.EnumType$NamedEnumValueMapper.fromName(EnumType.java:467)

    at org.hibernate.type.EnumType$NamedEnumValueMapper.getValue(EnumType.java:452)

    at org.hibernate.type.EnumType.nullSafeGet(EnumType.java:107)

    at org.hibernate.type.CustomType.nullSafeGet(CustomType.java:127)

    at org.hibernate.persister.collection.AbstractCollectionPersister.readElement(AbstractCollectionPersister.java:811)

    conductor.log (may or may not occur)

    [anonymous] WARN [2020-01-05 11:23:43.622] [ldap-cache-update-operation] - role with role_id 2, capability VARIABLES_READ can not be converted to Capability object, skipping it.

    Cause

    This is related to the updates made to the 7.5 SDK. The role capability VARIABLES_READ has been removed. Please refer to the Important API and SDK Changes for Developers section of the Datameer documentation for more details.

    The upgrade script upgrade-7.5.0-DAP_38522.sql should be executed as part of the Datameer database schema upgrade. It deletes all occurrences of the role capability VARIABLES_READ.

    Solution

    If the mentioned upgrade script hasn't been executed for some reason, it is possible to manually remove the role capability VARIABLES_READ.

    Execute the following query against the Datameer database. If the script has run, the query should not return any record. In this case, please get in touch with Datameer support for further investigation.

    SELECT * FROM role_capability WHERE capability = 'VARIABLES_READ';

    If the query returns any records:

    Stop Datameer.

    Take a database dump.

    Execute the following query.

    DELETE FROM role_capability WHERE capability = 'VARIABLES_READ';

    Start Datameer.
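
    The check-then-delete logic of the two queries above can be sketched as follows. This is only an illustration that runs against an in-memory SQLite table; the real Datameer database is a separate system, and the two-column table layout and the second capability name are assumptions made for the demo. Stopping Datameer and taking a database dump still come first.

```python
import sqlite3


def remove_capability(conn, capability="VARIABLES_READ"):
    """Delete rows for the given capability and return how many were found."""
    found = conn.execute(
        "SELECT COUNT(*) FROM role_capability WHERE capability = ?",
        (capability,),
    ).fetchone()[0]
    if found:
        conn.execute(
            "DELETE FROM role_capability WHERE capability = ?", (capability,)
        )
        conn.commit()
    return found


# Demo table: the column layout and 'SOME_OTHER_CAPABILITY' are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE role_capability (role_id INTEGER, capability TEXT)")
conn.executemany(
    "INSERT INTO role_capability VALUES (?, ?)",
    [(1, "SOME_OTHER_CAPABILITY"), (2, "VARIABLES_READ")],
)

removed = remove_capability(conn)
print(removed)
```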

  • Problem

    Suddenly, the Datameer Git plug-in stops committing activities to the configured Git Repository. Within the <INSTALLDIR>/logs/conductor.log file, the following exception is observed:

    [anonymous] ERROR [2018-01-01 00:00:00.000] [datameer-event-bus-1] (GitVersioningRecorder.java:207) - Exception caught during execution of add command

    org.eclipse.jgit.api.errors.JGitInternalException: Exception caught during execution of add command

    at org.eclipse.jgit.api.AddCommand.call(AddCommand.java:211)

    at datameer.plugin.versioning.git.GitVersioning$2.apply(GitVersioning.java:97)

    at datameer.plugin.versioning.git.GitVersioning$2.apply(GitVersioning.java:90)

    at datameer.dap.sdk.util.Success.flatMap(Success.java:43)

    at datameer.plugin.versioning.git.GitVersioning.writeWorkbookToWorkTree(GitVersioning.java:629)

    at datameer.plugin.versioning.git.GitVersioning.commitWorkbookChanges(GitVersioning.java:191)

    at datameer.plugin.versioning.git.GitVersioningRecorder.recordWorkbookChanges(GitVersioningRecorder.java:242)

    at datameer.plugin.versioning.git.GitVersioningRecorder$6.apply(GitVersioningRecorder.java:174)

    at datameer.plugin.versioning.git.GitVersioningRecorder$6.apply(GitVersioningRecorder.java:171)

    at datameer.dap.sdk.util.Success.flatMap(Success.java:43)

    at datameer.plugin.versioning.git.GitVersioningRecorder.record(GitVersioningRecorder.java:674)

    at sun.reflect.GeneratedMethodAccessor456.invoke(Unknown Source)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:606)

    at datameer.com.google.common.eventbus.EventSubscriber.handleEvent(EventSubscriber.java:74)

    at datameer.com.google.common.eventbus.SynchronizedEventSubscriber.handleEvent(SynchronizedEventSubscriber.java:47)

    at datameer.com.google.common.eventbus.EventBus.dispatch(EventBus.java:322)

    at datameer.com.google.common.eventbus.AsyncEventBus.access$001(AsyncEventBus.java:34)

    at datameer.com.google.common.eventbus.AsyncEventBus$1.run(AsyncEventBus.java:117)

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

    at java.lang.Thread.run(Thread.java:745)

    Caused by: org.eclipse.jgit.errors.LockFailedException: Cannot lock /opt/datameer/current/versioning/.git/index

    at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:224)

    at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:301)

    at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:267)

    at org.eclipse.jgit.lib.Repository.lockDirCache(Repository.java:1053)

    at org.eclipse.jgit.api.AddCommand.call(AddCommand.java:142)

    ... 21 more

    Cause

    The Git repository on the Datameer Server is locked. Specifically, the <REPOSITORY>/.git/index.lock file is stale. This file is locking the Git repository from any further edits.

    During normal operation, the <REPOSITORY>/.git/index.lock file should be created before an edit is made and then deleted immediately following the edit. If this file exists for more than 1 minute, it is likely that the lock was not released as expected.

    Solution

    To work around this issue, release the Git repository lock by removing the <REPOSITORY>/.git/index.lock file from the local file system. Future commits to the Git repository will then resume as expected.

    If this issue occurs, it is recommended to capture the <INSTALLDIR>/logs/conductor.log* files from the environment and to contact Datameer Support for further information.
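
    The "older than 1 minute" rule of thumb from the Cause section can be checked with a short script. The following is a minimal sketch that only reports whether the lock file looks stale; it does not delete anything. The function name and the temporary demo repository are assumptions for illustration.

```python
import os
import tempfile
import time
from pathlib import Path


def is_stale_lock(repository: Path, max_age_seconds: float = 60.0) -> bool:
    """Return True if <repository>/.git/index.lock exists and is older than the limit."""
    lock = repository / ".git" / "index.lock"
    if not lock.exists():
        return False
    return time.time() - lock.stat().st_mtime > max_age_seconds


# Demo on a throwaway directory standing in for the versioning repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / ".git").mkdir()
    lock_file = repo / ".git" / "index.lock"
    lock_file.touch()
    fresh = is_stale_lock(repo)          # lock was just created

    five_minutes_ago = time.time() - 300
    os.utime(lock_file, (five_minutes_ago, five_minutes_ago))
    stale = is_stale_lock(repo)          # lock now looks five minutes old

print(fresh, stale)
```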

  • Goal

    I want to collect the YARN application logs.

    Learn

    There are times when the Datameer job trace logs might not provide enough information for effective troubleshooting of an issue. When this happens, you may be asked to provide the YARN application logs from the Hadoop cluster.

    To do this, you must first determine the application_id of the job in question. It can be found in the logs section of the Job History for that particular job ID. First, navigate to the job run details for the job ID in question:


    Once there, scroll to the bottom to the Job Log section and look for the line Submitted Application <application_id>:

    Once the application_id is obtained, you can execute the following command from the command line on the Resource Manager to obtain the application logs:

    yarn logs -applicationId <application_id>

    Continuing with the above example, the following command would be executed:

    yarn logs -applicationId application_1432041223735_0001 > appID_1432041223735_0001.log
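
    Finding the application_id and assembling the command can also be scripted. The sketch below assumes the Job Log contains a line of the form "Submitted Application <application_id>", as described above; the function name and the sample log text are illustrative, and the script only builds the command without running it.

```python
import re

# Matches the "Submitted Application <application_id>" line from the Job Log.
APP_ID_PATTERN = re.compile(
    r"Submitted Application\s+(application_\d+_\d+)", re.IGNORECASE
)


def yarn_logs_command(job_log_text):
    """Return the `yarn logs` argument list for the first application found."""
    match = APP_ID_PATTERN.search(job_log_text)
    if match is None:
        return None
    return ["yarn", "logs", "-applicationId", match.group(1)]


sample_log = "INFO - Submitted Application application_1432041223735_0001"
command = yarn_logs_command(sample_log)
print(" ".join(command))
```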

    Please note that using the `yarn logs -applicationId <application_id>` method is preferred but it does require log aggregation to be enabled first. If log aggregation is not enabled, the following steps may be followed to manually collect the YARN Application logs:

  • Problem

    Attempting to create a connection to a MySQL Database fails with the error below:

    java.lang.RuntimeException: could not create jdbc connection to jdbc:mysql://host:3306/database_name

    Caused by: com.mysql.cj.core.exceptions.UnableToConnectException: CLIENT_PLUGIN_AUTH is required

    Cause

    The com.mysql.cj.core.exceptions.UnableToConnectException mentioned above most likely comes from version 6 of MySQL Connector/J when you try to connect to a relatively old MySQL instance. (Please refer to Supported Data Sources to check whether the version of the MySQL instance you are trying to connect to is supported.)

    The name of the class that implements java.sql.Driver in MySQL Connector/J changed from com.mysql.jdbc.Driver to com.mysql.cj.jdbc.Driver in version 6. Please refer to Changes in the Connector/J API.

    When you use mysql-connector-java-6.0.6.jar as the default MySQL JDBC driver (stored at etc/custom-jar/) and would like to set up a connection to a relatively old MySQL instance, it might fail with the error message mentioned above.

    You can try to work around the problem by setting up a custom database driver using one of the previous versions of MySQL Connector/J (e.g. 5.1.44), but this still might not work. When a custom Database Driver is created with another version of the mysql-connector-java jar file, any connection created afterwards has both jars (default and custom) in its classpath. If the default mysql-connector-java-6.0.6.jar from etc/custom-jar/ is picked up first, it is used instead of the custom driver.

    Datameer recommends using generally available versions of MySQL Connector/J.

    Solution

    Here are the steps to replace mysql-connector-java-6.0.6.jar if you use it as the default one.

    Remove all custom MySQL drivers you might have created to fix this problem and keep only the embedded one.

    Stop Datameer.

    Ensure that the service has really been stopped and no Datameer processes are running.

    Clean up the /<Datameer installation folder>/temp and /<Datameer installation folder>/tmp folders.

    Replace the /<Datameer installation folder>/etc/custom-jars/mysql-connector-java-6.0.6.jar file with mysql-connector-java-5.1.44.jar or any other recent GA version of MySQL Connector/J.

    Start Datameer.

  • Problem

    After accepting a self-signed certificate, the browser complains that scripts are being served in mixed mode:

    Mixed Content: The page at 'https://<host>/browser' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://<host>/home'. This request has been blocked; the content must be served over HTTPS.

    Cause

    This error might occur when you use Apache mod_proxy in your environment and the external connection is secured, but the internal one is not.

    In this case the embedded Jetty webservice doesn't recognize that all external connections should be served in secured mode (over HTTPS) and keeps responding over HTTP.

    Environment schema

    User > HTTPS > Apache mod_proxy > HTTP > Datameer's Jetty

    Solution

    In order to fix the issue, adjust Apache, Jetty, and Datameer settings.

    Apache

    Add the following line to the Apache config for the Datameer VirtualHost section:

    RequestHeader set X-Forwarded-Proto "https" env=HTTPS

    Jetty

    In <datameer-install-path>/etc/jetty.xml uncomment the following:

    <Call name="addCustomizer">

    <Arg><New class="org.eclipse.jetty.server.ForwardedRequestCustomizer"/></Arg>

    </Call>

    Datameer

    Make sure that the correct hostname and protocol are set in <datameer-install-path>/etc/live.properties for system.property.server.address:

    # Define the address and port used to connect to DATAMEER.

    system.property.server.address=<host>:<port>

    Restart Datameer and Apache to apply changes.

  • Problem

    During creation of a Connection to HiveServer2, an error message is received.

    [admin] INFO [<timestamp>] [<thread>] (SetJsonOutputCommand.java:29) - Triggering HQL:SET hive.ddl.output.format=json

    [admin] WARN [<timestamp>] [<thread>] (DataStore.java:196) - connection fails: java.lang.RuntimeException: Error while processing statement: Cannot modify hive.ddl.output.format at runtime. It is not in list of params that are allowed to be modified at runtime

    datameer.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Error while processing statement: Cannot modify hive.ddl.output.format at runtime. It is not in list of params that are allowed to be modified at runtime

    Cause

    The issue seems to be caused mainly by the permissions set through hive.security.authorization.sqlstd.confwhitelist.append in hive-site.xml.

    The data format used for DDL output (e.g. DESCRIBE table) is set to either 'text' (human-readable text) or 'json' (a JSON object). In this case, the format defaults to text, whereas the expected data format is JSON. (As of Hive 0.9.0.)

    Solution

    Whitelist the variable hive.ddl.output.format as per Hive Configuration Variables.

  • Environment

    DM: 5.x, DIST: HDP 2.1, OS: Linux, COM: -

    Problem

    Setting up a connection to an Amazon S3 bucket fails with the following error message:

    AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: <id>, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: <id>

    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:686)

    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:350)

    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:202)

    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3066)

    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3037)

    at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:533)

    at datameer.dap.hadoop.filesystem.DatameerS3FileSystem$ListingIterator.computeNext(DatameerS3FileSystem.java:617)

    at datameer.dap.hadoop.filesystem.DatameerS3FileSystem$ListingIterator.computeNext(DatameerS3FileSystem.java:605)

    at datameer.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)

    at datameer.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)

    at datameer.dap.hadoop.filesystem.DatameerS3FileSystem.listStatus(DatameerS3FileSystem.java:282)

    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1483)

    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523)

    at datameer.dap.sdk.cluster.filesystem.HadoopFileSystem.listStatus(HadoopFileSystem.java:124)

    at datameer.dap.sdk.util.DatameerFsClient.listStatus(DatameerFsClient.java:53)

    at datameer.dap.sdk.util.DatameerFsClient.listStatus(DatameerFsClient.java:46)

    at datameer.dap.sdk.datastore.FileDataStoreModel.testConnect(FileDataStoreModel.java:56)

    at datameer.dap.sdk.entity.DataStore.validate(DataStore.java:186)

    ...

    Cause

    Server Side Encryption (SSE) is required in order to write. The job is attempting to do a test and is getting denied without SSE.

    The ability to implement AES 256 encryption in Hadoop was not added until the 2.5.0 distribution of Hadoop. Refer to Add S3 Server Side Encryption for background information.

    Apache Hadoop 2.6 release is supported in HDP 2.2 and beyond.

    Solution

    Set the following value as either a Custom Property in Datameer or in the core-site.xml file in your Hadoop cluster:

    fs.s3n.server-side-encryption-algorithm=AES256

    Workaround

    Since this parameter must be set at the Apache Hadoop level, it is necessary to upgrade to HDP 2.2. As a workaround prior to the HDP 2.2 release, disable Server Side Encryption (SSE) on the specific S3 buckets that need to be accessed.

  • Environment

    DM:4.4.1, OS: -, DIST: -, COM: HDFS

    Symptoms

    After upgrading Datameer to 4.4.1, the existing datalinks fail to resolve the configured logical name. Datalink jobs start running fine, but eventually they fail with an error like this:

    INFO [2014-10-19 23:41:25.797] [JobScheduler worker1-thread-991] (MrPlanRunner.java:250) - Completed postprocessing: [0 sec], progress at 100

    INFO [2014-10-19 23:41:25.797] [JobScheduler worker1-thread-991] (MrPlanRunner.java:251) - -------------------------------------------

    INFO [2014-10-19 23:41:25.798] [JobScheduler worker1-thread-991] (MrPlanRunner.java:157) - Completed execution plan with SUCCESS and 1 completed MR jobs. (hdfs://nameservice1/user/datameer/importlinks/7199/34922)

    INFO [2014-10-19 23:41:25.814] [JobScheduler worker1-thread-991] (JobArtifactFileAccessTool.java:62) - Configuring job result artifacts from [hdfs://nameservice1/user/datameer/importlinks/7199]

    INFO [2014-10-19 23:46:24.327] [JobScheduler worker1-thread-991] (JobArtifactFileAccessTool.java:62) - Configuring job result artifacts from [hdfs://nameservice1/user/datameer/joblogs/34922]

    ERROR [2014-10-19 23:46:24.410] [JobScheduler worker1-thread-991] (DasJobCallable.java:135) - Job failed! Execution plan: digraph G {

    1 [label = "MrInputNode{datalink-sample-input} - 0 Bytes"];

    2 [label = "MrMapNode{datameer.dap.common.job.sample.WritePartitionedPreviewMapper@216634b4}"];

    3 [label = "MrOutputNode{datalink-sample} - 0 Bytes"];

    2 -> 3 [label = "PRODUCED_BY_MAPPER"];

    1 -> 2 [label = "REQUIRED_AS_MAPPER_INPUT"];

    }

    datameer.dap.sdk.util.ExceptionUtil$WrappedThreadException: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1

    at datameer.dap.sdk.util.ExceptionUtil.wrapInThreadException(ExceptionUtil.java:271)

    at datameer.dap.sdk.util.HadoopUtil.executeTimeRestrictedCall(HadoopUtil.java:165)

    at datameer.dap.sdk.util.HadoopUtil.getFileSystem(HadoopUtil.java:88)

    at datameer.dap.sdk.util.HadoopUtil.getFileSystem(HadoopUtil.java:71)

    at datameer.dap.sdk.cluster.filesystem.ClusterFileSystem.open(ClusterFileSystem.java:242)

    at datameer.dap.sdk.cluster.filesystem.ClusterFileSystemProvider$1.open(ClusterFileSystemProvider.java:15)

    at datameer.dap.sdk.datastore.FileDataStoreModel.openFileSystem(FileDataStoreModel.java:120)

    ...

    Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1

    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)

    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)

    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)

    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:569)

    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:512)

    ...

    All the correct HA configuration details are present in the Custom Properties field of the Administration -> Hadoop Cluster page, but the jobs are still failing to resolve the nameservice1 logical name.

    Cause/Resolution

    Copying the same HA Hadoop configuration details from the Administration -> Hadoop Cluster page to the custom properties field of the HDFS Connection (that datalinks use to connect to the cluster) helps to run Datalinks successfully:

    dfs.nameservices=nameservice1

    dfs.ha.namenodes.nameservice1=namenode1,namenode2

    dfs.namenode.rpc-address.nameservice1.namenode1=hostname1.company.com:8020

    dfs.namenode.rpc-address.nameservice1.namenode2=hostname2.company.com:8020

    dfs.client.failover.proxy.provider.nameservice1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

    Even having the same configuration details in the Hadoop Custom Properties field of datalinks doesn't help - the configuration needs to be present in the HDFS Connection.

    To solve the issue globally, set the HDFS Name Node to hdfs://nameservice.* instead of hdfs://hostname:8080.
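
    Because the HA property names all embed the nameservice and NameNode IDs, it is easy to mistype one of them when copying between fields. The sketch below generates the five properties shown above from a nameservice name and a mapping of NameNode IDs to hosts. The function name is hypothetical, and the default port 8020 is taken from the example values above.

```python
def ha_properties(nameservice, namenodes, port=8020):
    """Build HDFS HA client properties as a dict of property name -> value.

    `namenodes` maps a NameNode ID (e.g. "namenode1") to its host name.
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
    }
    for namenode_id, host in namenodes.items():
        key = f"dfs.namenode.rpc-address.{nameservice}.{namenode_id}"
        props[key] = f"{host}:{port}"
    props[f"dfs.client.failover.proxy.provider.{nameservice}"] = (
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    )
    return props


props = ha_properties(
    "nameservice1",
    {"namenode1": "hostname1.company.com", "namenode2": "hostname2.company.com"},
)
print("\n".join(f"{key}={value}" for key, value in props.items()))
```

    The printed lines can then be pasted into the custom properties field of the HDFS Connection.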

    Further Information

    Information regarding how to "Configure High Availability on a Hadoop Cluster" and "High Availability and Yarn" can be requested from the Datameer service team.

  • Goal

    I have partitioned data stored in HDFS, with a partition type of string. For example, a Hive table partitioned by country name. I would like to be able to choose certain partitions for ingestion.

    Learn

    To achieve this, specify the path to files with wildcards within the File Or Folder field in the ImportJob/DataLink configuration wizard. Regular expressions are not supported for folder names, but wildcards are allowed.

    For example, considering a 2 character country code where the path is as follows:

    /warehouse/../country={<country 1>,<country 2>,...}/

    If we want to select only the US and Japan:

    /warehouse/../country={us,jp}/

    If we want to do broader pattern matching:

    /warehouse/../country=*/

    /warehouse/../country=a*/

    /warehouse/../country=*s/

    For more information see our documentation: File Path and File Name Patterns.
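
    To preview which partition folders a pattern would select, the matching can be approximated locally. The sketch below handles {a,b} alternatives and * wildcards using Python's fnmatch module; it is an approximation for illustration and not the matcher Datameer or Hadoop actually uses. The function names and folder names are assumptions.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into a list of plain wildcard patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def select_partitions(partitions, pattern):
    """Return the partition folder names matched by the pattern."""
    alternatives = expand_braces(pattern)
    return [
        name
        for name in partitions
        if any(fnmatch.fnmatchcase(name, alt) for alt in alternatives)
    ]


folders = ["country=us", "country=jp", "country=de", "country=au", "country=at"]
print(select_partitions(folders, "country={us,jp}"))
print(select_partitions(folders, "country=a*"))
print(select_partitions(folders, "country=*s"))
```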

  • Problem

    Attempting to use a Datameer integration link with Microsoft Power BI Desktop doesn't allow data to be retrieved. After defining the integration URL, the Web View shows the Datameer login page but it can't interact with it. This makes it impossible to retrieve the data as authentication can't be performed.

    Cause

    This is a bug/regression in Microsoft Power BI Desktop. The Web View was previously used to authenticate, and the data would then be ingested. Now, even after the permissions of the query are updated, Power BI Desktop incorrectly attempts to parse the login page as HTML data rather than re-running the query and retrieving the data.

    Workaround

    The credentials passed with the query can be updated after building the initial query, the initial query deleted, and a new query built. At this point, the new query will properly pass credentials through the HTTP header with the query, and data will be retrieved.

    In Datameer, right-click on your workbook and select Show Results.

    Click Copy Integration Link in the Download dialog.

    Open Power BI Desktop and select Get Data -> Other -> Web

    Paste in the integration link you copied above and click OK.

    Note, in the Table View it shows an HTML document.

    Note, in the Web View you can see this HTML document is the Datameer Login screen correctly challenging for authentication.

    Click on the Menu Button above the Power BI Desktop Ribbon and select Options and Settings -> Data source settings.

    Click the URL shown in the Edit Data Source window and then select Edit Permissions...

    Under Type: Anonymous click Edit... and select Basic Authentication from the list on the left.

    Enter your Datameer credentials, and click Save.

    On the Ribbon Home Tab click Edit Queries.

    In the Query Editor right click on the Document in the Queries pane and Delete the query, then Close & Apply.

    On the Ribbon Home Tab click Get Data -> Other -> Web and re-add your Integration URL.

    Since credentials have been pre-set for this source, they will be passed through to Datameer in the request header and the login page will be bypassed.
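
    For context, "passed through in the request header" refers to standard HTTP Basic authentication. The sketch below shows how such a header is formed; it builds a request object without sending it. The URL, user name, and password are placeholders, and nothing here is specific to Power BI or to Datameer's API.

```python
import base64
import urllib.request


def build_request(integration_link, username, password):
    """Create a request carrying an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(integration_link)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values; replace with a real integration link and credentials.
request = build_request(
    "https://datameer.example.com/hypothetical-integration-link",
    "analyst",
    "secret",
)
print(request.get_header("Authorization"))
```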

  • Goal

    Create a KeyStore for implementing signed requests for SAML authentication.

    Learn

    Prerequisites

    There should be a Public Certificate available from the Identity Provider server. Common file formats for this are .cer and .crt.

    Identify the following variables for usage in the environment:

    SERVICE_PROVIDER_ALIAS (e.g. datameersaml)

    IDENTITY_PROVIDER_ALIAS (e.g. externalsaml)

    KEYSTORE_FILENAME (e.g. datameersaml.keystore)

    Step-by-step guide

    1) Generate a new KeyStore and private key on the Datameer server by running this command:

    keytool -genkey -alias <SERVICE_PROVIDER_ALIAS> -keyalg RSA -keystore <KEYSTORE_FILENAME>

    This command will prompt for the following values:

    A password/passphrase for the new KeyStore file.

    Re-enter the same password to confirm.

    Private Key identifying attributes such as Company name, Organization name, etc.

    2) Verify that the <KEYSTORE_FILENAME> is successfully created on the file system.

    3) Import the ID Provider Public Certificate into the KeyStore that was created.

    keytool -import -alias <IDENTITY_PROVIDER_ALIAS> -file <IDENTITY_PROVIDER_CERTIFICATE_FILE> -keystore <KEYSTORE_FILENAME>

    4) Copy the <KEYSTORE_FILENAME> file to a known location on the Datameer server and ensure that the Linux file permissions allow the Datameer user to read the file.

    5) Log in to the Datameer GUI and edit the SAML configuration.

    Input the KeyStore information including these values:

    KeyStore Path (path to the <KEYSTORE_FILENAME>)

    KeyStore Password (this was input during the first keytool command)

    Service Provider Alias Name (<SERVICE_PROVIDER_ALIAS>)

    Service Provider Passphrase (this was input during the first keytool command)
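
    The two keytool invocations from the guide can be assembled programmatically, which helps keep the alias and file names consistent between steps. The sketch below only builds the argument lists and does not run keytool, since the commands prompt interactively for passwords. The function name and the certificate file name are assumptions; the other example values are the ones listed under Prerequisites.

```python
def keytool_commands(service_alias, idp_alias, idp_certificate, keystore):
    """Return the argument lists for the generate and import steps."""
    generate = [
        "keytool", "-genkey",
        "-alias", service_alias,
        "-keyalg", "RSA",
        "-keystore", keystore,
    ]
    import_certificate = [
        "keytool", "-import",
        "-alias", idp_alias,
        "-file", idp_certificate,
        "-keystore", keystore,
    ]
    return generate, import_certificate


# "idp-public.cer" is a placeholder certificate file name.
generate, import_certificate = keytool_commands(
    "datameersaml", "externalsaml", "idp-public.cer", "datameersaml.keystore"
)
print(" ".join(generate))
print(" ".join(import_certificate))
```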

  • Goal

    If I want to calculate the distance between two positions, how can I achieve this?

    Learn

    If you have two geohashes, you first need to decode them into latitude and longitude:

    GEOHASH_DEC_LAT(#GeoHash)

    GEOHASH_DEC_LONG(#GeoHash)

    After you have your coordinates you can calculate the distance:

    IF(! ISBLANK(#Timestamp); (ACOS(COS(RADIANS(90-#LatitudeFrom)) *COS(RADIANS(90-#LatitudeTo)) +SIN(RADIANS(90-#LatitudeFrom)) *SIN(RADIANS(90-#LatitudeTo)) *COS(RADIANS(#LongitudeFrom-#LongitudeTo))) *3958.756); null)

    Here the multiplier is the Earth's radius r: r = 3958.756 gives the distance in miles, and r = 6371 gives it in kilometers.
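
    To sanity-check results, the same spherical law of cosines calculation can be reproduced outside of Datameer. The sketch below follows the formula term by term; the function name is hypothetical, and the clamping step is an addition that guards against floating-point rounding pushing the value slightly outside the valid range of acos.

```python
import math


def distance(lat_from, lon_from, lat_to, lon_to, radius=3958.756):
    """Great-circle distance using the same terms as the Datameer formula.

    radius=3958.756 returns miles; radius=6371 returns kilometers.
    """
    colat_from = math.radians(90 - lat_from)
    colat_to = math.radians(90 - lat_to)
    cos_angle = (
        math.cos(colat_from) * math.cos(colat_to)
        + math.sin(colat_from) * math.sin(colat_to)
        * math.cos(math.radians(lon_from - lon_to))
    )
    # Guard against rounding errors such as 1.0000000000000002.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * radius


# A quarter of the way around the equator, in kilometers.
print(distance(0, 0, 0, 90, radius=6371))
```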

  • Log Types

    Conductor Log

    This is the application log for Datameer. If you are experiencing overall issues with Datameer, this would be the first log to check. It contains general information and errors regarding Datameer.

    Download this log under the Administration tab under System Dashboard. Please note, you will need to have administrative access to this tab.

    Job Log

    This log follows specific job artifacts such as Import Jobs, Workbooks, etc. If you are experiencing issues with a specific job, this would be a good place to start. This log usually points in the direction of any errors.

    Download the job log by right-clicking on your Datameer artifact and selecting "Show Details".

    Next, select the job run under History. Click on the job ID number to get more details.

    To download the job log, click Download Logfile under Job Log.

    Job Trace

    The Job Trace is a collection of files with data on a specific Datameer artifact (i.e. Workbooks, File Uploads, Export Jobs, etc.). The Job Trace includes the Job Log, along with other configuration files for the job. Additionally, if a job is failing, extra logs for the attempts may be stored here. The Job Trace will include the following:

    job-conf.xml

    job-definition.json

    job-plan-original.dot

    job-plan-compiled.dot

    job.log

    To download the Job Trace, follow the same directions as for the Job Log, but select Download Job Trace under Job Log instead.

    Log Details

    Here are different log levels you may encounter while reading your logs from Datameer.

    INFO

    These are system or services messages indicating what Datameer is doing, or tasks it is performing.

    WARN

    Possible abnormalities during activity or something different than what Datameer expects. Most of the time, it leads to the next line in the log (potentially errors). You may also see warnings when records are dropped during a job run.

    ERROR

    Logged when the system is no longer able to perform an intended task. There is never one overarching reason for an error; many things can cause one. After an error, you will usually get a stack trace that provides more insight into what actually caused it.
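    When triaging a downloaded log, a quick count of entries per level can show where to look first. The following is a minimal sketch; it assumes the level appears as a standalone token near the start of each entry, as in the sample log lines quoted elsewhere in this FAQ, and the function name is hypothetical.

```python
import re
from collections import Counter

# Levels described above; the exact log layout is an assumption.
LEVEL_PATTERN = re.compile(r"\b(INFO|WARN|ERROR)\b")


def count_levels(lines):
    """Count log entries per level, looking only at the start of each line."""
    counts = Counter()
    for line in lines:
        # Restrict the search to the line prefix so message text and
        # stack-trace frames further right are not miscounted.
        match = LEVEL_PATTERN.search(line[:40])
        if match:
            counts[match.group(1)] += 1
    return counts
```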

  • Goal

    I want Datameer's embedded Jetty to serve only TLS 1.2 requests, reject all weaker TLS versions, and disable weak cipher suites.

    Learn

    By default, Jetty's SSL module is configured to serve data via any supported SSL/TLS version except SSLv3, as described in the Configuring SSL/TLS section of the Jetty documentation.

    Below is an example configuration block that adds more protocols to the exclusion list:

    <Set name="ExcludeProtocols">

    <Array type="String">

    <Item>SSL</Item>

    <Item>SSLv2</Item>

    <Item>SSLv2Hello</Item>

    <Item>SSLv3</Item>

    <Item>TLSv1</Item>

    <Item>TLSv1.1</Item>

    </Array>

    </Set>

    A second option is to change the configuration of the JVM used by Datameer and disable unwanted TLS algorithms. To do so, set the jdk.tls.disabledAlgorithms property within the $JAVA_HOME/jre/lib/security/java.security file and restart Datameer to apply the changes.

    For more details, see the following documentation: How to force java server to accept only tls 1.2 and reject tls 1.0 and tls 1.1 connections

    To allow or forbid Jetty to use a certain cipher suite, edit the appropriate properties within the jetty-ssl.xml configuration file, per the Jetty/Howto/CipherSuites section of Jetty's documentation.

    Please note that a Datameer restart is required to apply any changes to the Jetty configuration.

  • Problem

    Error Message:

    Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration

    Cause

    Removal or corruption of underlying Tez JAR files contained in Datameer's temp locations.

    Solution

    Stop the Datameer service and completely purge the following temp paths:

    <datameer>/temp

    <datameer>/tmp

    hdfs://<datameer private folder>/temp

    hdfs://<datameer private folder>/jobjars

    Start the service and new temp data will be built to replace the purged files.

  • Goal

    In some company environments, cURL might not be approved software while Wget is. This article shows how to work with the REST API using Wget.

    Learn

    The following approach works in test environments for downloading the JSON definition of artifacts.

    ENC="$(echo -n '<user>:<pass>' | openssl base64 -base64)" && wget --no-check-certificate -O artifact.json --header "Accept:application/json" --header "Authorization:Basic ${ENC}" "https://<host>:8443/rest/<artifactType>/<configID>"
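    To make clear what the ENC variable holds, here is a minimal Python sketch that builds the same Basic authentication header and request without sending it. The function names and the example host are hypothetical; the URL layout and port are taken from the Wget command above.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_artifact_request(host, artifact_type, config_id, user, password):
    """Build (but do not send) a request for an artifact's JSON definition."""
    url = f"https://{host}:8443/rest/{artifact_type}/{config_id}"
    request = urllib.request.Request(url)
    request.add_header("Accept", "application/json")
    request.add_header("Authorization", basic_auth_header(user, password))
    return request
```

    The encoded token is only base64, not encryption, so the credentials are readable by anyone who sees the header; this is one reason to prefer verified HTTPS over the --no-check-certificate shortcut outside test environments.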

    Limitations

    Note that, depending on the version you are using, there might be limitations. For example, Downloading Data Without an Export Job using a REST API call might not work:

    ENC="$(echo -n '<user>:<pass>' | openssl base64 -base64)" && wget -t 1 --no-check-certificate -O content.csv -c --header "Accept:text/plain" --header "Authorization:Basic ${ENC}" "https://<host>:8443/rest/data/workbook/<configID>/<WorksheetName>/download"

    Further Information

    How to Upload the JSON Definition of Artifacts

  • Goal

    In a workbook, you might have a column called Names with about 20 names. You can transpose this column into a row to create a list of all 20 names.

    Solution

    Create a SourceSheet called Names with a column Names that holds the n names.

    Create a FormulaSheet called NamesPrepareToGroup with one column, Names, holding the n names and a second column, dummyGroupNo, holding the constant 1.

    Create a FormulaSheet called NamesGrouped with the column NamesGrouped and the formula

    GROUPBY(#NamesPrepareToGroup!dummyGroupNo)

    and the column NamesConcat with the formula

    GROUPCONCAT(#NamesPrepareToGroup!Names)
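    The idea behind these sheets, grouping every row under the same dummy key and then concatenating the grouped values, can be sketched in plain Python. This is an illustration of the technique only, not Datameer's implementation; the function name and the separator are assumptions.

```python
def transpose_names(names, separator=", "):
    """Group all names under one constant key, then join them into one value."""
    # Equivalent of NamesPrepareToGroup: every row gets dummyGroupNo = 1.
    rows = [{"Names": name, "dummyGroupNo": 1} for name in names]

    # Equivalent of GROUPBY on dummyGroupNo: collect the names per group key.
    groups = {}
    for row in rows:
        groups.setdefault(row["dummyGroupNo"], []).append(row["Names"])

    # Equivalent of GROUPCONCAT: one concatenated value per group.
    return {key: separator.join(values) for key, values in groups.items()}
```

    Because every row shares the same group key, the result has exactly one entry that lists all names in their original order.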

  • Goal

    A command line tool such as cURL or Wget is required to manipulate Datameer using the REST API.

    On a Windows machine, you can use Cygwin and the cURL package (see Installing cURL on Cygwin on Windows), but this method still requires command-line work.

    While developing a REST service, you may want to submit commands and data manually and see the response. Use this article to learn about GUIs for REST services and manual testing.

    Learn

    You can check the available GUI front ends, clients, and browser plug-ins mentioned in

    http://stackoverflow.com/questions/7746448/is-there-a-handy-gui-for-rest-manual-services-testing

    http://stackoverflow.com/questions/603170/gui-frontend-for-curl-for-testing-an-api

  • Problem

    Datameer doesn't work once Kerberos is installed.

    The following error message appears:

    ... 85 more

    Caused by: KrbException: Integrity check on decrypted field failed (31) - PREAUTH_FAILED

    at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:82)

    at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)

    at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)

    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)

    ... 98 more

    Caused by: KrbException: Identifier doesn't match expected value (906)

    at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)

    at sun.security.krb5.internal.ASRep.init(ASRep.java:64)

    at sun.security.krb5.internal.ASRep.<init>(ASRep.java:59)

    at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:60)

    ... 101 more

    Cause

    Some operating systems use AES-256 for Kerberos principals. If the matching Java Cryptography Extension (JCE) policy files are missing, the client fails to authenticate to Kerberos.

    Solution

    Download the unlimited-strength JCE package from the Oracle Java website and follow the README included in the package.

  • Problem

    An Import Job or Export Job fails in Datameer and the following error message is generated:

    Caused by: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset

    Here is a fuller example stacktrace from an Export Job, as it appears in the job log:

    java.lang.RuntimeException: java.lang.RuntimeException: Failed to generate file for 'RecordStream[sheetName=export,description=datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor@3415aedf]'

    at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:56)

    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

    at java.util.concurrent.FutureTask.run(FutureTask.java:262)

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

    at java.lang.Thread.run(Thread.java:745)

    Caused by: java.lang.RuntimeException: Failed to generate file for 'RecordStream[sheetName=export,description=datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor@3415aedf]'

    at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:240)

    at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:53)

    ... 5 more

    Caused by: java.lang.RuntimeException: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset

    at datameer.com.google.common.base.Throwables.propagate(Throwables.java:160)

    at datameer.dap.common.graphv2.hadoop.ServerSideContext.finalizeExport(ServerSideContext.java:62)

    at datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor.afterMrJobEnd(ExportJob.java:180)

    at datameer.dap.common.graphv2.RecordStream.afterJobEnd(RecordStream.java:76)

    at datameer.dap.common.graphv2.BaseMrClusterJob.afterJobEnd(BaseMrClusterJob.java:142)

    at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:138)

    at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:227)

    ... 6 more

    Caused by: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset

    at datameer.awstasks.ssh.JschRunner.run(JschRunner.java:215)

    at datameer.awstasks.ssh.JschRunner.execute(JschRunner.java:222)

    at datameer.awstasks.exec.ShellExecutor.execute(ShellExecutor.java:25)

    at datameer.dap.hadoop.filesystem.LinuxShellCommandExecutor.deletePath(LinuxShellCommandExecutor.java:52)

    at datameer.dap.hadoop.filesystem.ScpFileSystem.delete(ScpFileSystem.java:134)

    at datameer.dap.sdk.util.FileOutputAdapter.finalizeExport(FileOutputAdapter.java:342)

    at datameer.dap.common.graphv2.hadoop.ServerSideContext.finalizeExport(ServerSideContext.java:60)

    ... 11 more

    Caused by: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset

    at awstasks.com.jcraft.jsch.Session.connect(Session.java:558)

    at awstasks.com.jcraft.jsch.Session.connect(Session.java:183)

    at datameer.awstasks.ssh.JschRunner.createFreshSession(JschRunner.java:369)

    at datameer.awstasks.ssh.JschRunner.openSession(JschRunner.java:289)

    at datameer.awstasks.ssh.JschRunner.run(JschRunner.java:207)

    ... 17 more

    Cause

    The cause of this issue is that the network connection between a Hadoop Data Node and the Import/Export SSH or SFTP server was dropped by the SSH or SFTP server.

    To identify the root cause of the dropped connection, the SSH or SFTP daemon logs may need to be investigated on the Import/Export target server.

    A common cause is that there is a limit on the number of concurrent SSH/SFTP connections to target server and this limit was temporarily exhausted.

    Resolution

    If the root cause was temporary (e.g. an exhausted connection limit or a network outage), it may be possible to work around this issue by simply re-running the job.

    It may also help to reduce the concurrency of the Import Job or Export Job. This may decrease the performance of the job, since fewer concurrent network connections are established. To limit the maximum concurrency, set the following parameter in the Custom Properties of the affected job (the example value is 1):

    das.splitting.max-split-count=1

  • Problem

    When you attempt to execute a workbook containing partitioned data, you notice that a few partitions cause the job to fail.

    Upon closer inspection, you see the following:

    Caused by: java.lang.RuntimeException: hdfs://<datameer_private_folder>/importjobs/<artifact_id>/<execution_id>/rewrite/data/<partition_date>/<exported_parquet_file>_0.parquet is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [-76, -93, 1, 0]

    If the broken partition is excluded, the workbook will complete successfully.

    Cause

    The error can be traced back to a Compaction Job that ran against the source data. If such a job fails, it currently only attempts to clean up the data from the previous attempt. If this fails and a new attempt is started without the clean up, you are left with an additional corrupted file in your output path.

    Workaround

    Once you have identified the source directories that have been impacted, the corrupted files can be removed to repair the partition.

    There are two ways to identify the broken file:

    Its name contains an "_<attempt_value>" that is lower than that of the other file in place.

    It is typically smaller than the intact file.
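    The error message above points to a third check: a Parquet file must end with the magic bytes [80, 65, 82, 49], which spell "PAR1". The sketch below tests a file's tail for that marker. It assumes the suspect file has first been copied to a local path, and the function name is hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"  # the bytes [80, 65, 82, 49] from the error message


def has_parquet_tail_magic(path):
    """Return True if the file ends with the Parquet magic number."""
    if os.path.getsize(path) < len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as handle:
        # Read only the last four bytes instead of loading the whole file.
        handle.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return handle.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC
```

    A file that fails this check is not a complete Parquet file; a file that passes it may still be damaged elsewhere, so the result is a quick filter rather than a full validation.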

    Solution

    We have identified a solution and will be releasing updated code in the form of a maintenance patch.

    For further inquiries, please reach out to Support and provide "DAP-37174" as a reference.

  • Problem

    The Datameer GUI is inaccessible. When reviewing the conductor.log file on the Datameer server, the following stack trace is visible:

    [system] ERROR [2014-01-01 00:00:00.000] [JobScheduler thread-1] (JDBCTransaction.java:198) - JDBC rollback failed

    com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed.

    at sun.reflect.GeneratedConstructorAccessor123.newInstance(Unknown Source)

    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

    at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)

    at com.mysql.jdbc.Util.getInstance(Util.java:383)

    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1023)

    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:997)

    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:983)

    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:928)

    at com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1323)

    at com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1315)

    at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5057)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

    at java.lang.reflect.Method.invoke(Method.java:597)

    at com.jamonapi.proxy.MonProxy.invoke(MonProxy.java:127)

    at com.jamonapi.proxy.JDBCMonProxy.invoke(JDBCMonProxy.java:100)

    at com.sun.proxy.$Proxy58.rollback(Unknown Source)

    at com.mchange.v2.c3p0.impl.NewProxyConnection.rollback(NewProxyConnection.java:855)

    at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java:213)

    at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:192)

    at org.hibernate.ejb.TransactionImpl.rollback(TransactionImpl.java:107)

    at datameer.dap.conductor.persistence.PersistenceService.rollbackTransaction(PersistenceService.java:139)

    at datameer.dap.conductor.persistence.TransactionHandler.execute(TransactionHandler.java:119)

    at datameer.dap.conductor.persistence.TransactionHandler.executeInNewTransaction(TransactionHandler.java:92)

    at datameer.dap.conductor.job.SingleThreadedTransactionController.execute(SingleThreadedTransactionController.java:32)

    at datameer.dap.conductor.job.SingleThreadedController.executeAndLogMetrics(SingleThreadedController.java:141)

    at datameer.dap.conductor.job.SingleThreadedController.loop(SingleThreadedController.java:117)

    at datameer.dap.conductor.job.SingleThreadedController$2$1.run(SingleThreadedController.java:89)

    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)

    at datameer.dap.conductor.webapp.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:124)

    at datameer.dap.conductor.webapp.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:139)

    at datameer.dap.conductor.job.SingleThreadedController$2.run(SingleThreadedController.java:85)

    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

    at java.lang.Thread.run(Thread.java:662)

    Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

    The last packet successfully received from the server was 60,621 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago.

    at sun.reflect.GeneratedConstructorAccessor124.newInstance(Unknown Source)

    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

    at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)

    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)

    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)

    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)

    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)

    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)

    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)

    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)

    at com.mysql.jdbc.ConnectionImpl.setAutoCommit(ConnectionImpl.java:5368)

    at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

    at java.lang.reflect.Method.invoke(Method.java:597)

    at com.jamonapi.proxy.MonProxy.invoke(MonProxy.java:127)

    at com.jamonapi.proxy.JDBCMonProxy.invoke(JDBCMonProxy.java:100)

    at com.sun.proxy.$Proxy58.setAutoCommit(Unknown Source)

    at com.mchange.v2.c3p0.impl.NewProxyConnection.setAutoCommit(NewProxyConnection.java:881)

    at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:87)

    at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1473)

    at org.hibernate.ejb.TransactionImpl.begin(TransactionImpl.java:60)

    at datameer.dap.conductor.persistence.PersistenceService.beginTransaction(PersistenceService.java:81)

    at datameer.dap.conductor.persistence.TransactionHandler.execute(TransactionHandler.java:107)

    ... 12 more

    Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.

    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)

    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)

    ... 30 more

    Cause

    This is a MySQL settings issue. The response time of the MySQL server (60,621 ms in the example above) exceeds the MySQL server's configured wait_timeout value.

    Solution

    To resolve this issue, work with the MySQL database administrator to increase the value of the wait_timeout parameter.

    This setting is configurable in the my.cnf file. By default, MySQL sets this value to 28800 seconds. If this value has been modified from the default, consider reverting it to the default to restore connectivity to the Datameer server.
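    Note that the stack trace reports the idle time in milliseconds, while wait_timeout is expressed in seconds. The sketch below extracts the figure from the log line and compares it with a given timeout; the function names are hypothetical, and the timeout to compare against has to come from the actual server configuration.

```python
import re

# Matches the idle-time sentence that appears in the stack trace above.
IDLE_PATTERN = re.compile(
    r"last packet successfully received from the server was ([\d,]+) milliseconds ago"
)


def idle_seconds(log_line):
    """Return the reported idle time in seconds, or None if the line does not match."""
    match = IDLE_PATTERN.search(log_line)
    if match is None:
        return None
    return int(match.group(1).replace(",", "")) / 1000.0


def exceeds_wait_timeout(log_line, wait_timeout_seconds):
    """True if the reported idle time is longer than wait_timeout (in seconds)."""
    idle = idle_seconds(log_line)
    return idle is not None and idle > wait_timeout_seconds
```

    For the example above, 60,621 ms is about 60.6 seconds, which would exceed a wait_timeout of 60 seconds but is far below the default of 28800.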

  • Description

    According to the Apache Hadoop documentation, history files are written by MapReduce jobs (in HDFS) to the .../history/done_intermediate/ directory. This location is configured in mapred-site.xml via the property mapreduce.jobhistory.intermediate-done-dir.

    After a MapReduce job completes, logs are written to HDFS under this directory. The history server continuously scans the intermediate directory and moves any newly available logs to the directory specified by the mapreduce.jobhistory.done-dir parameter in mapred-site.xml. From this location, the history server picks up the logs and displays them on the history server UI.

    The MapReduce Job History retention policy is controlled by the properties below.

    mapreduce.jobhistory.cleaner.enable - True / False. Default value is True.

    mapreduce.jobhistory.cleaner.interval-ms - How often the job history cleaner checks for files to delete, in milliseconds. Defaults to 86400000 (one day). Files are only deleted if they are older than mapreduce.jobhistory.max-age-ms.

    mapreduce.jobhistory.max-age-ms - Job history files older than this many milliseconds will be deleted when the history cleaner runs. Defaults to 604800000 (1 week).
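    Because these values are given in milliseconds, a small helper makes them easier to reason about. The sketch below converts the defaults to days and models the deletion rule described above; the function and constant names are hypothetical, and the rule is a simplified reading of the property descriptions rather than Hadoop's actual code.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

DEFAULT_CLEANER_INTERVAL_MS = 86400000  # mapreduce.jobhistory.cleaner.interval-ms
DEFAULT_MAX_AGE_MS = 604800000          # mapreduce.jobhistory.max-age-ms


def ms_to_days(milliseconds):
    """Convert a millisecond setting into days."""
    return milliseconds / MS_PER_DAY


def is_eligible_for_deletion(file_age_ms, max_age_ms=DEFAULT_MAX_AGE_MS,
                             cleaner_enabled=True):
    """A history file is removed on a cleaner run only if the cleaner is
    enabled and the file is older than the maximum age."""
    return cleaner_enabled and file_age_ms > max_age_ms
```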

  • Problem

    After creating a data link to Hive, it is not possible to import data. The log files show the following error:

    Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/io/NonSyncByteArrayOutputStream

    ...

    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream

    ...

    Background

    When loading classes, Datameer gives priority to the etc/custom-jars directory.

    If custom Hive SerDes are used, Datameer expects the classes to reside in the Hive plugin. If they were first picked up from custom-jars, they are skipped when the Hive plugin is loaded because they are already available.

    Troubleshooting Steps

    Review the current Hive plugin and check if all classes are in place.

    Check MD5.

    Run lsof against the Datameer process ID (PID).

    Note if there are classes pulled from /etc/custom-jars.
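    To see which jar files in a directory bundle a given class, the jar contents can be listed directly, since a jar is a zip archive. This is a minimal sketch with a hypothetical function name; the directory to scan (for example the custom-jars folder of your installation) is passed in by the caller.

```python
import zipfile
from pathlib import Path


def jars_containing(directory, class_name):
    """Return the names of jar files in `directory` that contain `class_name`."""
    # A class a.b.C is stored inside the jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar_path in sorted(Path(directory).glob("*.jar")):
        with zipfile.ZipFile(jar_path) as jar:
            if entry in jar.namelist():
                matches.append(jar_path.name)
    return matches
```

    Running this for org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream against both the custom-jars directory and the Hive plugin directory shows whether the same class is provided in more than one place.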

    Solution

    Ensure that custom SerDe jar files are not included in <datameer-install-path>/etc/custom-jars. If extra custom SerDe jar files are found in the custom-jars path, they need to be removed. It will be necessary to restart the Datameer service to make the change active.

  • Problem

    A Datameer job fails and in the job log, the following stacktrace is displayed:

    ERROR [2015-01-01 00:00:00.000] [ConcurrentJobExecutor-4] (ClusterSession.java:186) - Failed to run cluster job 'Workbook job (12345): MyWorkbook with MyJob#Joined(Disconnected record stream)' [1 hrs, 18 mins, 24 sec] java.lang.RuntimeException: Job job_1447373200318_0080 failed! Failure info: Task failed task_1447373200318_0080_r_000071 Job failed as tasks failed. failedMaps:0 failedReduces:1

    at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)

    at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:31)

    at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:228)

    at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:128)

    at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:181)

    at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:48)

    at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:135)

    at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:129)

    at java.util.concurrent.FutureTask.run(FutureTask.java:262)

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

    at java.lang.Thread.run(Thread.java:745)

    Caused by: java.io.IOException: Job job_1447373200318_0080 failed! Failure info: Task failed task_1447373200318_0080_r_000071 Job failed as tasks failed. failedMaps:0 failedReduces:1

    at datameer.dap.common.job.mr.DefaultMrJobClient.waitUntilJobCompletion(DefaultMrJobClient.java:234)

    at datameer.dap.common.job.mr.DefaultMrJobClient.runJobImpl(DefaultMrJobClient.java:91)

    at datameer.dap.common.job.mr.MrJobClient.runJob(MrJobClient.java:34)

    at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:216)

    ... 9 more

    Caused by: java.lang.RuntimeException: Task: AttemptID:attempt_1447373200318_0080_r_000071_3 Timed out after 600 secs

    Cause

    The timeout occurs when a task does not report progress on the cluster side within the specified time frame. This problem might occur due to the priorities of other tasks on that node at that time. Ultimately, the task was terminated by Hadoop because it exceeded the timeout value (in milliseconds) of the following property:

    mapreduce.task.timeout

    Solution

    Increase the timeout to 6,000,000 milliseconds (100 minutes) by setting

    mapreduce.task.timeout=6000000

    for this job, and then re-run it. A Datameer administrator can implement this recommendation.

    If that doesn't resolve the issue, contact Datameer Support for further assistance.

    Further Information

    This issue is described in the Apache Hadoop documentation of mapred-default.xml.

    "The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. A value of 0 disables the timeout."

  • Goal

    Using the JDBC driver provided by Microsoft, connections from a Linux client may not use Windows Authentication to connect to an MSSQL instance.

    I want to configure MSSQL connections using Windows authentication.

    Learn

    To work around this limitation, it may be possible to configure Kerberos authentication and to continue to use the JDBC driver provided by Microsoft. Alternatively, there is an available open-source driver named JTDS which can be used to configure Linux clients to connect to an MSSQL instance using Windows Authentication (without Kerberos).

    For Java 1.6, the suitable JTDS driver is version 1.2.8, which is available for download. For further information and documentation about this driver, please consult the project page.

    To add this driver to Datameer, follow these steps:

    1) Extract the included jtds-1.2.8.jar file from the jtds-1.2.8-dist.zip file.

    2) Navigate to the Datameer Administration page and select the Database Drivers category from the left pane.

    3) Click New to add the JTDS driver.

    4) Provide a name such as "JTDS".

    5) Upload the extracted jar file from step 1.

    6) Select "MsSql" for the Database Driver Template.

    7) For the driver class, input the following value:

    net.sourceforge.jtds.jdbc.Driver

    8) For the connection pattern, use the following template:

    jdbc:jtds:sqlserver://%hostName%:%port%/%database%;instance=%instance%;domain=%domain%;

    9) Click Save to add the JTDS driver.

    With the driver added to Datameer, navigate to the Browser section.

    Create a new Connection. As the type, select "JTDS" from the Databases section and click Next.

    Fill in the connection details using the template provided. Here is an example of a completed connection string:

    jdbc:jtds:sqlserver://mymssqlserver.corp.company.com:1433/mydatabase;instance=myinstance;domain=corp.company.com;

    Specify the user and password. Click Next to test the connectivity.

    Assuming the connection was successful, save the new connection.

    The newly added JTDS driver and connection are now ready for use.
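    For reference, this is how the placeholders in the connection pattern map onto the completed example string. The helper below is a hypothetical illustration in Python, not something Datameer requires; Datameer fills in the pattern itself.

```python
def build_jtds_url(host_name, port, database, instance, domain):
    """Fill the JTDS connection pattern with concrete values."""
    return (f"jdbc:jtds:sqlserver://{host_name}:{port}/{database};"
            f"instance={instance};domain={domain};")


# Values taken from the example connection string in this article.
example_url = build_jtds_url(
    host_name="mymssqlserver.corp.company.com",
    port=1433,
    database="mydatabase",
    instance="myinstance",
    domain="corp.company.com",
)
```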

  • Goal

    Import EDI 837 health care files, also known as X12-837 or ANSI-837 into Datameer.

    Learn

    At the moment, Datameer doesn't have native instruments to parse EDI 837 files, but you can ingest them as plain text. You can then work with the data using the functionality available in a workbook.

    For example, ingest the following sample data from EDI 837 Health Care Claim:

    ISA*00* *00* *ZZ*99999999999 *ZZ*888888888888 *111219*1340*^*00501*000001377*0*T*>

    GS*HC*99999999999*888888888888*20111219*1340*1377*X*005010X222

    ST*837*0001*005010X222

    BHT*0019*00*565743*20110523*154959*CH

    NM1*41*2*SAMPLE INC*****46*496103

    PER*IC*EDI DEPT*EM*[email protected]*TE*3305551212

    NM1*40*2*PPO BLUE*****46*54771

    HL*1**20*1

    PRV*BI*PXC*333600000X

    NM1*85*2*EDI SPECIALTY SAMPLE*****XX*123456789

    N3*1212 DEPOT DRIVE

    N4*CHICAGO*IL*606930159

    REF*EI*300123456

    HL*2*1*22*1

    Steps

    Create new import job or file upload.

    Choose file type CSV/TSV.

    Set the appropriate delimiter, for example *.

    Execute the job and link the data to a new workbook.
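    To preview what a delimiter-based import of such a file yields, the same split can be sketched with Python's csv module. This only illustrates the effect of choosing * as the delimiter; it is not how Datameer parses the file, and the function name is hypothetical.

```python
import csv
import io


def parse_segments(text, delimiter="*"):
    """Split each EDI segment (one per line) into its elements."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip empty lines; keep empty elements, which EDI uses as placeholders.
    return [row for row in reader if row]


sample = "ST*837*0001*005010X222\nHL*1**20*1\n"
segments = parse_segments(sample)
```

    Each segment becomes one row, with the segment identifier (ST, HL, and so on) in the first column. Segments have different numbers of elements, so rows differ in length.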

    For the example file snippet, the data appears in the workbook split at the chosen delimiter. You can then use Datameer functions to transform the data set according to your requirements and start your analysis.

    Refer to the Importing Data section of our documentation for more details on CSV file ingestion (escape characters, custom schema, etc).

    Further Proceeding

    If the instruments available in Datameer can't help you get the required results, you can leverage our plug-in SDK and create a custom connection for your particular use case. Our professional services (PS) team might help you with custom function engineering, if required.

  • Datameer EOM Policy

    Major releases are maintained for two (2) years minimum after the GA of that release or six (6) months after the GA date of the following major release (whichever is longer).

    Minor releases are maintained for nine (9) months minimum after the GA of that release.

    Standard Extensions

    In the event that a minor release GA date is less than nine (9) months before the EOM date for the associated major release, maintenance for the associated major release will be extended until nine (9) months after the GA date of this minor release.

    In the event that the latest minor release GA date is more than nine (9) months ago and the associated major release GA date is less than two (2) years ago, the minor release will still be maintained until the end of the major release or the next minor release, whichever comes first.
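    The basic rule for major releases (two years after GA, or six months after the GA of the following major release, whichever is longer) can be sketched as date arithmetic. The function names and the dates in the example are hypothetical, and the sketch computes only the minimum EOM date before any of the extensions described above.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def major_release_eom(ga, next_major_ga=None):
    """Minimum EOM date for a major release under the basic policy rule."""
    eom = add_months(ga, 24)  # two years after GA
    if next_major_ga is not None:
        # Six months after the next major release, if that is later.
        eom = max(eom, add_months(next_major_ga, 6))
    return eom


# Hypothetical dates for illustration only.
example_eom = major_release_eom(date(2020, 1, 15), next_major_ga=date(2021, 10, 1))
```

    In the hypothetical example, two years after GA would be 2022-01-15, but six months after the next major release is 2022-04-01, so the later date applies.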

    The table below shows end-of-maintenance dates up to twelve (12) months ago and into the future.

    Definition of Maintenance

    Active customers may open tickets with Datameer's Technical Support team for support. Datameer's Technical Support will also assist active customers with upgrading to a maintained release.

    Maintenance also includes software updates for bug fixes and security vulnerability resolutions. Bug fixes will be made to maintained Minor Releases only. Security vulnerability resolutions will be made available in all maintained Minor Releases.

    Security vulnerabilities and severe bugs will be repaired in a Maintenance Release. All other bugs will be repaired in future Major or Minor releases.

    Technical support for out-of-maintenance releases may be requested in the Datameer Community.

    Definition of Terms

    Major Release - Version number X.0.0, where X changes.

    Minor Release - Version number x.Y.0, where Y changes.

    Maintenance Release - Version number x.y.Z, where Z changes.

    End of Maintenance (EOM) - The last date a particular release will be maintained. This includes bug fixes and security patches.

    General Availability (GA) - The date that a product was released to all customers.

    Active Customers - Customers with valid and active Maintenance and Support contracts with Datameer.

    Severe Bug - A software defect, with no workaround available, that severely limits operation within a production environment.

    Security Vulnerability Bug - A software defect that produces a weakness that could allow an attacker to compromise the integrity, availability or confidentiality of Datameer.

    Maintenance Schedule

    Product                   EOM Date

    Datameer X (Major)        2021-11-13
    Datameer 10.0             2020-08-13
    Datameer 7 (Major)        2020-05-02
    Datameer 7.5 (Minor)      2020-05-02
    Datameer 7.4 (Minor)      2020-02-11
    Datameer 7.2 (Minor)      2019-05-17
    Datameer 7.1 (Minor)      2018-12-13
    Datameer 6 (Major)        2018-09-12

    Versions not listed above are no longer maintained by Datameer.

  • Problem

    Datameer can't schedule any new jobs. The following exception is shown in the logs:

    Error Message

    Failed to submit application_<id> to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 10000 applications, cannot accept submission of application: application_<id>

    Cause

    This error comes from the YARN Capacity Scheduler. According to the document Configuring YARN Capacity Scheduler Ambari, the yarn.scheduler.capacity.maximum-applications limit, which is set to 10,000 by default, has been reached.

    Solution

    Check the Hadoop cluster queue and clear out any unnecessary jobs.

    Increase the limit in the YARN configuration to allow for more concurrent applications to be submitted.
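
    Before clearing out jobs or raising the limit, it can help to see how many applications each queue currently holds. The following is a minimal sketch, assuming the application list has been fetched as JSON from the ResourceManager REST API (GET /ws/v1/cluster/apps). The sample payload, the function name, and the set of states counted as active are illustrative assumptions.

```python
import json
from collections import Counter

# States treated here as "pending or running" (an assumption; adjust as needed).
ACTIVE_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED", "RUNNING"}


def active_apps_per_queue(payload: dict) -> Counter:
    """Count active applications per queue in a /ws/v1/cluster/apps response."""
    # "apps" is null in the response when no application matches.
    apps = (payload.get("apps") or {}).get("app") or []
    return Counter(app["queue"] for app in apps if app.get("state") in ACTIVE_STATES)


# Hypothetical sample response, trimmed to the fields used above.
SAMPLE_RESPONSE = """
{"apps": {"app": [
  {"id": "application_1_0001", "queue": "default", "state": "RUNNING"},
  {"id": "application_1_0002", "queue": "default", "state": "ACCEPTED"},
  {"id": "application_1_0003", "queue": "default", "state": "FINISHED"},
  {"id": "application_1_0004", "queue": "etl", "state": "RUNNING"}
]}}
"""

counts = active_apps_per_queue(json.loads(SAMPLE_RESPONSE))
for queue, n in sorted(counts.items()):
    print(f"{queue}: {n} active application(s)")
```

    A queue whose count approaches the configured maximum is the one to clean up or resize.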

  • Problem

    When a user tries to log in, Datameer returns the error "Login failed: User '<Username>' could not be authenticated." This happens even though the group exists in Datameer and the user account is a member of that group.

    Cause

    One possible cause is a group that contains more than 1500 members. In that case, the LDAP search fails to retrieve group information for any of those users, so Datameer is not able to sync the user into its cache.

    This seems to come from the LDAP policy value MaxValRange:

    "MaxValRange controls the number of values that are returned on a single attribute on a single object. Default: 1500. Hard Limit: 5000."

    Solution

    Increase MaxValRange to a value larger than the number of users within the specified group. More about the parameter can be found on the LDAP Wiki page on MaxValRange.

  • Goal

    This article describes how custom parameters may be added to a JDBC connection in Datameer. For example, this technique could be used to enable the "tinybit" property in MySQL or to define username and password for MSSQL.

    Learn

    In general, custom properties for JDBC may be added using the following designation

    ?property=value

    Example 1

    If one wanted to activate the tinybit as true in MySQL, the property to add would be as follows

    ?tinybit=1

    Based on this property and the default MySQL JDBC connection pattern, here is the full connection pattern to use

    jdbc:mysql://%host%:%port%/%database%?tinybit=1

    Example 2

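    If one wanted to define the username and password for MSSQL, the property to add would be as follows. Note that the MSSQL connection string separates properties with semicolons rather than with a leading question mark.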

    username=<user>;password=<pass>;

    Based on this property and the default MSSQL JDBC connection pattern, here is the full connection pattern to use

    jdbc:sqlserver://<host>:<port>;instance=MSSQLSERVER;DatabaseName=<database>;username=<user>;password=<pass>;
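
    The two examples differ only in how properties are attached to the URL. The sketch below shows that difference in code. The helper name add_jdbc_properties and the MySQL host, port and database values are hypothetical, and property values are appended as-is without any escaping.

```python
def add_jdbc_properties(url: str, props: dict) -> str:
    """Append custom properties to a JDBC URL using the driver's separator style."""
    if not props:
        return url
    if url.startswith("jdbc:sqlserver:"):
        # SQL Server style: ;name=value;name=value;
        base = url if url.endswith(";") else url + ";"
        return base + "".join(f"{k}={v};" for k, v in props.items())
    # Query-string style (as in the MySQL example): ?name=value&name=value
    sep = "&" if "?" in url else "?"
    return url + sep + "&".join(f"{k}={v}" for k, v in props.items())


mysql_url = add_jdbc_properties("jdbc:mysql://dbhost:3306/sales", {"tinybit": "1"})
mssql_url = add_jdbc_properties(
    "jdbc:sqlserver://<host>:<port>;instance=MSSQLSERVER;DatabaseName=<database>",
    {"username": "<user>", "password": "<pass>"},
)
print(mysql_url)
print(mssql_url)
```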

  • Problem

    Functions produce duplicate records on some workbook sheets. Specifically, source Parquet files smaller than a threshold may be read twice. Any downstream calculations will include the duplicated data. No errors are displayed or logged to indicate an issue.

    For example, a grouping Workbook is expected to produce 15 results, but actually produces 20 results.

    Cause

    This is a software defect in Datameer, known internally as DAP-36752.

    The problem only occurs if the job processes at least 1 small file (smaller than the threshold) and at least 1 large file (bigger than the threshold). The exact threshold depends on the system settings for the following values:

    minSplitSize: Value of the property mapreduce.input.fileinputformat.split.minsize (default 128MB)

    maxSplitSize: Value of the property mapreduce.input.fileinputformat.split.maxsize (default 512MB)

    parquetMaxBlockSize: Value of the property das.parquet-storage.max-parquet-block-size (default 256MB)

    The threshold is calculated as the maximum of minSplitSize and the smaller of parquetMaxBlockSize and maxSplitSize. Using the default values, the smaller of parquetMaxBlockSize (256MB) and maxSplitSize (512MB) is 256MB. That value is then compared to minSplitSize (128MB), and the larger of the two, 256MB, is the threshold.
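
    The threshold calculation can be written out as a short sketch. The function names are hypothetical, and the risk check simply restates the condition described above (at least one file smaller and at least one file bigger than the threshold); it is not Datameer's internal code.

```python
MB = 1024 * 1024


def duplicate_read_threshold(min_split: int, max_split: int, parquet_max_block: int) -> int:
    """max(minSplitSize, min(parquetMaxBlockSize, maxSplitSize)), in bytes."""
    return max(min_split, min(parquet_max_block, max_split))


def job_is_at_risk(file_sizes: list, threshold: int) -> bool:
    """True if the job mixes at least one smaller and one bigger file."""
    has_small = any(size < threshold for size in file_sizes)
    has_large = any(size > threshold for size in file_sizes)
    return has_small and has_large


# Default values listed above.
threshold = duplicate_read_threshold(
    min_split=128 * MB, max_split=512 * MB, parquet_max_block=256 * MB
)
print(threshold // MB, "MB")
print(job_is_at_risk([10 * MB, 300 * MB], threshold))
```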

    Versions Affected

    7.1.3, 7.1.4 and 7.1.5

    6.4.7, 6.4.8 and 6.4.9

    6.3.9 and 6.3.10

    Workaround

    To work-around this issue, splitting can be disabled by adding the following Custom Property to the Hadoop Cluster's Custom Properties configuration:

    das.splitting.disable-individual-file-splitting=true

    Adding this property will negatively affect the job's performance so it is advised to install the maintenance release as soon as possible.

    If the work-around is applied, it should be deactivated after updating to a fixed release.

    Solution

    Apply the latest Datameer maintenance release to resolve this issue. A fix for this issue is included in 6.4.10 and 7.1.6 and higher releases.

  • Goal

    Troubleshoot problematic Import Jobs and Export Jobs using an sFTP connection by implementing additional logging to debug and to determine what is preventing a successful job run.

    Learn

    First, occasionally there is a caching problem depending on the version of SSH/sFTP running on the host. Try re-running the job with the following Custom Property.

    fs.sftp.enable.session-cache=false

    If this does not resolve the issue, remove the above parameter.

    To begin debug troubleshooting, configure the artifact in question for a specific execution framework. Then implement the enhanced logging.

    For Tez, the following should additionally be set within the import-specific Custom Properties:

    das.execution-framework=Tez

    fs.sftp.enable.debug=true

    tez.task.log.level=DEBUG

    tez.am.log.level=DEBUG

    Set the Default log severity to:

    TRACE

    And a Logging Customization of:

    log4j.category.datameer=TRACE

    log4j.category.datameer.awstasks=DEBUG

    log4j.category.awstasks.com.jcraft=DEBUG

    log4j.category.org.apache.hadoop=DEBUG

    Further Information

    Include Hadoop Task Logs

    Often, comparing the sshd_config and ssh_config files from the Datameer Host, Data Nodes, and sFTP Host can be a quick path to resolving sFTP issues. Notably, supported authentication and encryption mechanisms should be identical on all machines.
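
    The comparison of SSH configuration files can be partly automated. The sketch below is a simplified illustration: it only looks at explicit Ciphers, MACs and KexAlgorithms lines, ignores Host/Match blocks and +/- list modifiers, and treats a missing keyword as "defaults apply" by leaving it out of the result. The function names and sample configurations are hypothetical.

```python
ALGO_KEYWORDS = ("ciphers", "macs", "kexalgorithms")


def parse_algorithms(config_text: str) -> dict:
    """Return {keyword: set of algorithm names} for explicitly configured lists."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Keywords are case-insensitive and may use "Keyword value" or "Keyword=value".
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) == 2 and parts[0].lower() in ALGO_KEYWORDS:
            names = {name for name in parts[1].replace(" ", "").split(",") if name}
            found.setdefault(parts[0].lower(), names)  # keep the first occurrence
    return found


def common_algorithms(client_cfg: str, server_cfg: str) -> dict:
    """Intersect the lists configured on both sides; an empty set means no overlap."""
    client, server = parse_algorithms(client_cfg), parse_algorithms(server_cfg)
    return {key: client[key] & server[key] for key in client.keys() & server.keys()}


client_cfg = "Ciphers aes128-ctr,aes256-ctr\nMACs hmac-sha2-256\n"
server_cfg = "# sFTP host\nciphers aes256-ctr,chacha20-poly1305@openssh.com\nMACs hmac-sha1\n"
overlap = common_algorithms(client_cfg, server_cfg)
for keyword, names in sorted(overlap.items()):
    print(keyword, sorted(names) or "NO COMMON ALGORITHM")
```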

  • Problem

    If you have a file-based data source with many partitions and a significant amount of data, poor performance can be observed when running a data link. Upon investigation of the YARN application logs, it can be seen that five splits are always being used instead of the optimal calculated number of splits.

    ---------- Split Settings ----------

    min/max split size: 16.0 MB (8.0 MB) / 5.0 GB

    min/max split count: 0 / 5

    total input size: 599.4 GB

    slot count: 6980

    number of desired tasks: 5

    optimal split size: 119.9 GB

    optimal split count: 5

    -----------------------------------

    Regardless of changes to the min/max split size, min/max split count, and wave count, the 'optimal' split count always remains 5, with only 5 tasks.

    This results in poor performance when the data link is run to generate the sample. In this instance there were almost 20,000 partitions, which means there would be 100,000 sample records generated five tasks at a time.

    Cause

    This was an intentional design decision made by the engineering team. The rationale is that data link samples aren't doing analytical work, so they should have fewer resources allocated to them on the cluster. This way, jobs doing analytics run faster and have proportionally more resources allocated to them.

    Solution

    This behavior can be overridden by explicitly setting the number of splits for data link sample generation. The property that controls this behavior is:

    das.splitting.datalink.sample-split-count

    This property has a default value of 5, which explains the behavior described in the problem section. Add this property to the Custom Properties of the data link job, and increase the value beyond the default of 5 for added parallelism and more splits.
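
    The effect of the split count on the amount of data each task handles is simple arithmetic. The sketch below reproduces the "optimal split size" figure from the log excerpt and shows what a larger value would imply; the value 200 is purely illustrative, and the even division is an assumption about how the input is spread across splits.

```python
def split_size_gb(total_input_gb: float, split_count: int) -> float:
    """Average input per split when the input is divided evenly."""
    if split_count <= 0:
        raise ValueError("split_count must be positive")
    return total_input_gb / split_count


TOTAL_INPUT_GB = 599.4  # "total input size" from the log excerpt

for count in (5, 200):  # 5 is the default; 200 is a hypothetical override
    print(f"{count} splits -> {split_size_gb(TOTAL_INPUT_GB, count):.1f} GB per split")
```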

  • Problem

    Attempting to run a job against a cluster using Isilon as the storage backend fails with the following exceptions:

    Diagnostics: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS];

    ...

    [system] WARN [2018-09-24 14:21:15.833] [ClusterMetadataUpdater thread-1] (Client.java:711) - Couldn't setup connection for [email protected] to <hostname>.com/<IP Address:Port> javax.security.sasl.SaslException: No common protection layer between client and server

    at com.sun.security.sasl.gsskerb.GssKrb5Client.doFinalHandshake(GssKrb5Client.java:251)

    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:186)

    Cause

    Isilon HDFS clusters require use_ip for tokens to be set to false for the whole cluster. When use_ip is set to false, all delegation tokens are represented by hostnames rather than IPs. This is a requirement of the architecture of Isilon itself, since the Isilon name node is "rolling" among a few servers.

    However, due to a bug reported in MAPREDUCE-6565, in HDP environments, execution frameworks always take the use_ip setting from the core-site.xml in their local mr-framework/hadoop/etc/hadoop directory on the distributed cache. In HDP's original distribution, core-site.xml is left empty, so the Application Master uses the default value (true) for use_ip (hadoop.security.token.service.use_ip). When a job is submitted from a client with use_ip=false but the Application Master uses use_ip=true, the AM is not able to initialize the SASL client with the name node.

    Solution

    Update the hadoop-site.xml file within the Datameer Tez plugin to ensure that the use_ip setting will be set to false.

    1. Shut down the Datameer conductor. (./conductor.sh stop)

    2. Navigate to <Datameer Home>/plugins and copy the plugin-tez-<version>.zip file to a temporary location.

    3. Unzip the plugin file and edit /classes/hadoop-site.xml

    4. Within the <configuration> tags add the following:

    <property>

    <name>hadoop.security.token.service.use_ip</name>

    <value>false</value>

    <description>Value for Isilon</description>

    </property>

    5. Save the xml file and then re-zip the plugin contents to create a new plugin-tez-<version>.zip.

    6. Replace the original plugin zip with the new modified copy.

    7. Restart the Datameer conductor.
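
    Steps 3 to 5 can also be scripted. The following is a minimal sketch that works on the bytes of a copy of the plugin zip, assuming the entry is named classes/hadoop-site.xml as in step 3. The function names are hypothetical, and XML comments in the original file are not preserved by this approach.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PROP_NAME = "hadoop.security.token.service.use_ip"


def add_use_ip_property(site_xml: bytes) -> bytes:
    """Set use_ip to false in a Hadoop <configuration> document, adding it if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == PROP_NAME:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = "false"
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROP_NAME
        ET.SubElement(prop, "value").text = "false"
        ET.SubElement(prop, "description").text = "Value for Isilon"
    return ET.tostring(root, encoding="utf-8")


def patch_plugin_zip(src: bytes, entry: str = "classes/hadoop-site.xml") -> bytes:
    """Return a copy of the zip with the given entry patched; other entries are copied."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == entry:
                data = add_use_ip_property(data)
            zout.writestr(item, data)
    return out.getvalue()
```

    To use it, read the bytes of the temporary copy from step 2, pass them to patch_plugin_zip, and write the result to a new zip file. Stopping and restarting the conductor and replacing the original plugin remain manual steps.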

  • Goal

    I want to collect network HAR file logs from a developer tools session.

    Learn

    To troubleshoot a support issue effectively, you may at times be asked to provide a network HAR file from the Developer Tools section of Google Chrome or your preferred browser.

    In this article, we will be reviewing how to capture a trace from Google Chrome.

    In order to capture a network trace, begin by first opening the Developer Tools section of Google Chrome. This can be found by selecting the drop down in the upper right of the browser window (next to Settings) and navigating to More Tools > Developer Tools:


    Once opened, navigate to the Network tab and reproduce the action that prompted the opening of the support case (e.g., open the troublesome workbook). This will populate the network section of Developer Tools with the information necessary for support:

    Once the issue is reproduced, right click anywhere in the Developer Tools pane and select Save As HAR with Content:

    Please be sure to include this HAR file as an attachment to the support ticket.

    Further Information

    Something similar is possible with Internet Explorer (IE). You can use the included developer tools to export a debug session into a file called NetworkData in XML format.
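
    Before attaching a HAR file, it can be useful to confirm that it actually captured the failing requests. A HAR file is JSON with the requests stored under log.entries. The sketch below lists entries with an error status; the function name and the sample data are hypothetical, and treating status 0 as a failure is an assumption about how the browser records requests that received no response.

```python
import json


def failed_requests(har_text: str, min_status: int = 400) -> list:
    """Return (status, method, url) for HAR entries that look like failures."""
    har = json.loads(har_text)
    failures = []
    for entry in har["log"]["entries"]:
        status = entry["response"]["status"]
        if status >= min_status or status == 0:
            request = entry["request"]
            failures.append((status, request["method"], request["url"]))
    return failures


# Hypothetical, heavily trimmed HAR content for illustration.
SAMPLE_HAR = json.dumps({"log": {"version": "1.2", "entries": [
    {"request": {"method": "GET", "url": "https://example.invalid/workbook/1"},
     "response": {"status": 200}},
    {"request": {"method": "POST", "url": "https://example.invalid/rest/data"},
     "response": {"status": 500}},
]}})

for status, method, url in failed_requests(SAMPLE_HAR):
    print(status, method, url)
```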

  • Goal

    This article describes a manual method for collecting YARN Application logs if log aggregation is not enabled. The automated and recommended method is outlined in this article: How to Collect the YARN Application Logs

    Learn

    Follow the steps in the above article to identify the Application ID for the affected job. Once the application ID is known, follow these steps:

    1. Navigate to the Resource Manager UI then find the application ID and click on the link.

    2. Click on the Logs button for the Application attempt.

    3. For each of the log files displayed, open the full log and then save the file. Ensure that the syslog, syslog_dag, stdout, and stderr files are captured at a minimum.

    This concludes the steps to collect the logs for the Application Master. In addition, there might be other containers that were created to execute this particular application. The same logs might be required to be collected from any failed or suspect containers as well.

  • We've found the following setup working

    Hadoop Cluster

    Since your Hadoop cluster needs a running Timeline Server, check that the following properties are enabled in yarn-site.xml

    ...

    <property>

    <name>yarn.timeline-service.enabled</name>

    <value>true</value>

    </property>

    ...

    <property>

    <name>yarn.timeline-service.hostname</name>

    <value>localhost</value><!-- configure a proper hostname that is available for all nodes -->

    </property>

    ...

    <property>

    <name>yarn.timeline-service.http-cross-origin.enabled</name>

    <value>true</value>

    </property>

    ...

    <property>

    <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>

    <value>true</value>

    </property>

    ...

    By default, the Timeline Server runs on port 8188.

    If it is not running, start it with ./sbin/yarn-daemon.sh start timelineserver
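
    The yarn-site.xml settings listed above can be checked with a short script. This is a minimal sketch that only inspects the four properties shown in this article. The function name is hypothetical, and flagging localhost follows the comment in the snippet above about configuring a hostname that is available for all nodes.

```python
import xml.etree.ElementTree as ET

REQUIRED_TRUE = (
    "yarn.timeline-service.enabled",
    "yarn.timeline-service.http-cross-origin.enabled",
    "yarn.resourcemanager.system-metrics-publisher.enabled",
)
HOSTNAME_PROP = "yarn.timeline-service.hostname"


def check_yarn_site(xml_text: str) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    props = {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in root.iter("property")
    }
    problems = []
    for name in REQUIRED_TRUE:
        if props.get(name, "").lower() != "true":
            problems.append(f"{name} is not set to true")
    if props.get(HOSTNAME_PROP, "") in ("", "localhost"):
        problems.append(f"{HOSTNAME_PROP} should be a hostname reachable from all nodes")
    return problems


SAMPLE = """<configuration>
  <property><name>yarn.timeline-service.enabled</name><value>true</value></property>
  <property><name>yarn.timeline-service.hostname</name><value>localhost</value></property>
</configuration>"""

for problem in check_yarn_site(SAMPLE):
    print("-", problem)
```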

    On Datameer Application Server

    Download and unpack Apache Tomcat 8.5.20

    Configure the HTTP port in conf/server.xml e.g. 8280

    Download and unzip Apache Tez 0.7.1 binaries

    Delete everything under TOMCAT_HOME/webapps/*

    Unzip the tez-ui-0.7.1.war into TOMCAT_HOME/webapps/ROOT

    Edit TOMCAT_HOME/webapps/ROOT/scripts/configs.js

    timelineBaseUrl must point to the Timeline Server (ATS)

    RMWebUrl must point to the Resource Manager (RM)

    Double check that JAVA_HOME is set. (It should be set, as it is a prerequisite for Datameer.)

    Start the server with TOMCAT_HOME/bin/catalina.sh start

    Configure a test job with Hadoop Custom Properties and Debug Logging Implemented

    das.execution-framework=Tez

    das.debug.tasks.logs.collect.force=true

    tez.task.log.level=DEBUG

    tez.am.log.level=DEBUG

    tez.allow.disabled.timeline-domains=true

    tez.am.history.logging.enabled=true

    tez.dag.history.logging.enabled=true

    tez.history.logging.service.class=org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService

    tez.tez-ui.history-url.base=http://<thisHost>:8080/#/main/view/TEZ/tez_cluster_instance

    yarn.timeline-service.enabled=true

    yarn.timeline-service.hostname=<ATS>

    If you have a Kerberos-secured cluster, configure the cluster in yarn-site.xml as described in the section Security Configuration.

    [email protected]

    yarn.timeline-service.keytab=/home/datameer/datameer.keytab

    Test

    Execute test job

    Gather and review logs, yarn-site.xml, Tomcat/catalina.log, full job trace

    Check that everything is working properly

  • This document outlines the Datameer Support Holiday schedule.

    Datameer Support holidays are observed regionally and indicate limited support coverage for business hours on the given day. Service level agreements with 24x7x365 support are not affected by holidays; only "business hours" service level agreements are affected by holidays. If you have any questions regarding your organization's support subscription, please reach out to your Account Executive.

    Here are the business hours for each region that will be closed for a regional holiday:

    Region    Start Time    End Time    Timezone(s)
    AMER      0600          1800        Pacific Time (GMT-8; GMT-7)
    EMEA      0900          1800        Berlin Time (GMT+1; GMT+2)

    2019 Holiday Schedule

    Holiday Title                  Date Observed         Region(s) Observed
    New Year's Day                 January 1, 2019       Worldwide
    Epiphany                       January 6, 2019       EMEA
    Martin Luther King Jr. Day     January 21, 2019      AMER
    President's Day                February 18, 2019     AMER
    Good Friday                    April 19, 2019        Worldwide
    Easter Monday                  April 22, 2019        EMEA
    Labor Day (Germany)            May 1, 2019           EMEA
    Memorial Day                   May 27, 2019          AMER
    Ascension                      May 30, 2019          EMEA
    Whit Monday                    June 10, 2019         EMEA
    US Independence Day            July 4, 2019          AMER
    Labor Day (US)                 September 2, 2019     AMER
    Day of German Unity            October 3, 2019       EMEA
    Thanksgiving (US)              November 28, 2019     AMER
    Day After Thanksgiving (US)    November 29, 2019     AMER
    Christmas                      December 25, 2019     Worldwide
    Day After Christmas            December 26, 2019     EMEA

    2020 Holiday Schedule

    Holiday Title                  Date Observed         Region(s) Observed
    New Year's Day                 January 1, 2020       Worldwide
    Epiphany                       January 6, 2020       EMEA
    President's Day                February 17, 2020     AMER
    Spring Friday                  April 10, 2020        Worldwide
    Easter Monday                  April 13, 2020        EMEA
    Labor Day (Germany)            May 1, 2020           EMEA
    Ascension                      May 21, 2020          EMEA
    Memorial Day                   May 25, 2020          AMER
    Whit Monday                    June 1, 2020          EMEA
    US Independence Day            July 3, 2020          AMER
    Labor Day (US)                 September 7, 2020     AMER
    Thanksgiving (US)              November 26, 2020     AMER
    Day After Thanksgiving (US)    November 27, 2020     AMER
    Christmas Eve                  December 24, 2020     AMER
    Christmas                      December 25, 2020     Worldwide

  • Business Impact Descriptions

    Severe: Datameer is entirely unusable for all users. The situation completely halts your business operations and no workaround exists.

    High: Datameer functions partially. The situation is causing a significant impact to your business operations and no workaround exists.

    Moderate: A problem that involves partial, non-critical loss of use of the software.

    Low: A general usage question, reporting of a documentation error, or recommendation for a future product enhancement or modification. There is low-to-no impact on your business or the performance or functionality of your system.

    Examples

    Example 1: Datameer is offline and no user can log in. The Datameer administrator has attempted to restart Datameer, but it remains inaccessible for all users.

    This example is a Severe impact.

    Example 2: When configuring a new artifact (e.g., an Import Job), the setup or first execution of this artifact fails.

    This example is a Moderate or Low impact, depending on the use case that is being created.

    Example 3: An existing artifact (e.g., a Workbook) that has been running successfully in the past is now failing to execute. Successful execution of this artifact is crucial to the business.

    This example is a High impact.

  • This document describes how the severity is assigned to support tickets.

    The severity of a ticket is defined by two factors, the environment type and the business impact. The following table outlines the severity for each possible scenario:

                                Production Environment    Non-Production Environment
    Severe Business Impact      Severity 1                Severity 3
    High Business Impact        Severity 2                Severity 3
    Moderate Business Impact    Severity 3                Severity 4
    Low Business Impact         Severity 4                Severity 4

    For more detailed information about the business impacts, please review the descriptions and examples.
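
    The severity matrix can be expressed as a simple lookup. The sketch below is for illustration only; the function name and the lowercase keys are hypothetical, and the values are taken directly from the table above.

```python
SEVERITY_MATRIX = {
    ("production", "severe"): 1,
    ("production", "high"): 2,
    ("production", "moderate"): 3,
    ("production", "low"): 4,
    ("non-production", "severe"): 3,
    ("non-production", "high"): 3,
    ("non-production", "moderate"): 4,
    ("non-production", "low"): 4,
}


def ticket_severity(environment: str, business_impact: str) -> int:
    """Look up the ticket severity for an environment type and business impact."""
    key = (environment.strip().lower(), business_impact.strip().lower())
    if key not in SEVERITY_MATRIX:
        raise ValueError(f"unknown environment/impact combination: {key}")
    return SEVERITY_MATRIX[key]


print(ticket_severity("Production", "High"))
print(ticket_severity("Non-Production", "Severe"))
```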

  • This Maintenance and Support Policy (“Policy”) describes the current practices of Datameer with regard to its provision of technical support and maintenance services to entities that have entered into an agreement for Datameer’s Software (each such entity, a “Customer”). Datameer will not modify the terms of your Policy during the initial term of your license; however, if you renew your license, then the version of this Policy that is current at the time of renewal will apply for your renewal term.

    1) Definitions

    “Business Day” means Monday through Friday, Coordinated Universal Time (UTC), excluding holidays observed by Datameer. Datameer holidays are published here:

    https://www.datameer.com/supportholidays

    “Business Hours” means 12:00 a.m. to 11:59 p.m., UTC on Business Days.

    “Active Customer” means a Customer with a valid and unexpired Enterprise Support Services subscription with Datameer.

    “Support Contact” means a representative from an Active Customer that has successfully completed Datameer Administration certification.

    “Supported Instance” means a server belonging to an Active Customer running Datameer software.

    2) Technical Support - Technical support is available with a subscription to a Datameer Support Service package, which defines service level agreements. These guidelines define the following areas:

    a) Support Tickets - Every Support Contact is required to create an account in the Customer Support Center prior to opening a ticket. Once a Support Contact has created an account and has logged into the Customer Support Center, the contact may manage open tickets, review previously solved tickets and may submit new tickets. When submitting tickets, a Support Contact must provide the following information: (a) a description of the issue; (b) the step-by-step process to reproduce the issue; (c) the error messages associated with the issue; (d) any additional data available, or required as determined by Datameer, including but not limited to stack traces, configuration settings, and related information; and (e) information necessary to classify the severity of the issue.

    b) Support Hotline - Support Contacts may phone in new tickets or may request continued assistance with existing tickets at +1-800-874-0569.

    c) Service Levels - Service Levels are defined per Datameer Support Service package as set forth below:

    i) Enterprise Standard - Bundled with Datameer Enterprise on AWS Marketplace (Hourly Package)

    Severity    Active Hours      Initial Response          Update Frequency
    1           Business Hours    Within 3 hours            Updated every 1 business day
    2           Business Hours    Within 6 hours            Updated every 1 business day
    3           Business Hours    Within 24 hours           Updated every 2 business days
    4           Business Hours    Within 3 business days    Updated every 5 business days

    ii) Enterprise Silver

    Severity    Active Hours      Initial Response          Update Frequency
    1           Business Hours    Within 2 hours            Updated every 2 hours
    2           Business Hours    Within 4 hours            Updated every 1 business day
    3           Business Hours    Within 12 hours           Updated every 2 business days
    4           Business Hours    Within 1 business day     Updated every 5 business days

    iii) Enterprise Gold

    Severity    Active Hours      Initial Response          Update Frequency
    1           24x7x365          Within 1 hour             Updated every 1 hour
    2           24x7x365          Within 2 hours            Updated every 1 business day
    3           Business Hours    Within 4 hours            Updated every 2 business days
    4           Business Hours    Within 8 hours            Updated every 3 business days

    If Datameer provides a work-around that corrects an issue, but the Support Contact does not consider the work-around to be a reasonable solution, the priority level of the ticket will be updated to Severity 3.

    Initial response is satisfied with either an inbound Customer phone call answered, a phone call placed to the Customer or a public comment to the ticket where the Support Contact is also notified in writing, with an action plan on the initial steps required to begin the problem resolution process. Given the heightened urgency around Severity 1 and 2 tickets, initial response may include an invitation to participate in a screen share session to shorten time to problem isolation.

    d) Severity Level Definitions

    Severity 1: A problem that severely impacts Customer’s use of Datameer in a production environment (i.e. loss of production data or a production system is not functioning). Additionally, the situation halts routine business operations and no work-around exists.

    Severity 2: A problem where Datameer is functioning but Customer’s use in a production environment is severely reduced (i.e., a job-failure of a business critical job). The situation is causing a high impact to your business operations and no workaround exists.

    Severity 3: A problem that involves partial, non-critical loss of use of the software in a production environment or development environment. For production environments, there is a medium-to-low impact on Customer’s business, but Customer’s business continues to function, including by using a workaround. For development environments, Customer’s usage of Datameer is severely reduced.

    Severity 4: A general usage question, reporting of a documentation error, or recommendation for a future product enhancement or modification. For production environments, there is low-to-no impact on Customer’s business or the performance or functionality of Customer’s system. For development environments, there is a medium-to-low impact on Customer’s business, but Customer’s business continues to function, including by using a workaround.

    e) Scope of Enterprise Technical Support Services - An Active Customer may contact Datameer Enterprise Technical Support by opening a ticket via the Customer Support Center to request information regarding the use, configuration or operation of the Datameer software running on any Supported Instance. Enterprise Technical Support services include responses to submitted tickets pertaining to questions and resolving technical problems in the following scope:

    Best practices for setting up and configuring a Supported Instance for running Datameer software, including:

    System requirements and 3rd party software compatibility

    Installation, deployment, migration and upgrading

    Using supported and available functions and features

    Operational support for a Supported Instance running Datameer software, including:

    Best practices for using Datameer software, functions and features

    Identifying, diagnosing and fixing errors in Datameer software

    Preventing and recovering from failures and troubleshooting

    Problem diagnosis and resolution, including:

    Problem isolation and diagnosis of errors in Datameer software

    Patches and workarounds to fix bugs in Datameer software

    Product Enhancement Requests

    Providing feedback to the Datameer product team

    Submitting product enhancement request to the Datameer product team

    f) Additional Services (Available Upon Request) - Additional services are available by request for an added cost. The following services may be available for Active Customers; contact your sales representative for pricing:

    Installation, Deployment, Migration and Upgrading

    Technical Support includes providing documentation and clarifying requirements.

    Active Customers may request a consultant to perform these tasks on or off site.

    Datameer Systems Integration

    Enterprise Technical Support includes fixing Datameer software errors and best practice recommendations for integrating with other systems.

    Active Customers may request a consultant to design, optimize and deploy an integration between Datameer and other systems.

    Use Case Development

    Enterprise Technical Support includes best practice recommendations for functions and features.

    Active Customers may request a consultant to scope, design, build and deploy a use case, or help guide use case requirements gathering and assessment.

    Product Training

    Enterprise Technical Support includes providing documentation for the Datameer product.

    Active Customers may request Product Training.

    Non-Recurring Engineering

    Enterprise Technical Support includes fixing Datameer software errors and providing documentation for the Datameer Software Development Kit (SDK).

    Active Customers may request Non-Recurring Engineering Services to scope, design, build and deploy a custom plugin using the SDK.

    3) Customers without a Support Contract - Small private offer deals on the Amazon Marketplace do not include a Datameer Support Service package. These customers may submit questions on the Datameer Community, which is free to all customers.

    4) Success Management - When you purchase a subscription to the Datameer Support Service package, Datameer will assign a Success Manager to your account. The service level will vary depending on your subscription, which defines the level of engagement. Please see below guidelines for the service levels under each subscription.

    Deliverables - Success Manager deliverables per Datameer Support Service package are set forth below:

    i) Enterprise Silver

    Remote Customer Checkpoint with Business and IT Leadership - Up to 1 per month

    ii) Enterprise Gold

    Remote Customer Checkpoint with Active Customer - Up to 4 per month

    On-site Customer Checkpoint with Active Customer - Up to 1 per month

    Business Review - Up to 1 per quarter

    Datameer Health Dashboard Report and Review - Up to 1 per quarter

    Datameer Product Roadmap Review - Up to 1 per quarter

    Coordinate Beta Program Participation for New Features - As needed

    5) Maintenance Policy - The Datameer maintenance policy is published here: https://www.datameer.com/maintenancepolicy

    6) Changes to Policy - Datameer reserves the right, at its discretion, to change the Policy and the policies within it at any time based on prevailing market practices and the development of Datameer's software products.

    Legacy Maintenance and Support Services are also published for customers with older multi-year subscriptions.

  • Please note that this document describes Legacy Maintenance and Support Services from Datameer. These are no longer offered and are published for reference only. The current offerings are available at Enterprise Maintenance and Support Services.

    1) Definitions

    “Business Day” means Monday through Friday, Coordinated Universal Time (UTC), excluding holidays observed by Datameer. Datameer holidays are published here:

    https://datameer.zendesk.com/hc/en-us/articles/211483666-Datameer-Support-Holiday-Schedule

    “Business Hours” means 12:00 a.m. to 11:59 p.m., UTC on Business Days.

    “Active Customer” means a customer with a valid and unexpired Enterprise Support Services subscription with Datameer.

    “Standard Support Services” means the basic support services provided at no cost to Customer.

    “Support Contact” means a representative from an Active Customer.

    “Supported Instance” means a server belonging to an Active Customer running Datameer software.

    2) Technical Support Technical support is available with a subscription to a Datameer Support Service package which defines service level agreements. These guidelines define

    a) Support Tickets - Every Support Contact is required to create an account in the Customer Support Center prior to opening a ticket. Once a Support Contact has created an account and has logged into the Customer Support Center, the contact may manage open tickets, review previously solved tickets, and submit new tickets. When submitting tickets, a Support Contact must provide the following information: (a) a description of the issue; (b) the step-by-step process to reproduce the issue; (c) the error messages associated with the issue; (d) any additional data available, or required as determined by Datameer, including but not limited to stack traces, configuration settings, and related information; and (e) information necessary to classify the severity of the issue.

    b) Support Hotline - Support Contacts may phone in new tickets or may request continued assistance with existing tickets at +1-800-874-0569.

    c) Service Level Agreement

    Severity | Active Hours   | Initial Response       | Update Frequency
    1        | Business Hours | Within 3 hours         | Continuous effort until relief provided
    2        | Business Hours | Within 6 hours         | Updated every 1 business day
    3        | Business Hours | Within 1 business day  | Updated every 2 business days
    4        | Business Hours | Within 3 business days | Updated every 5 business days

    If Datameer provides a work-around that corrects an issue, but the Support Contact does not consider the work-around to be a reasonable solution, the priority level of the ticket will be updated to Severity 3.

    Initial response is satisfied with either an inbound customer phone call answered, a phone call placed to the customer or a public comment to the ticket where the Support Contact is also notified in writing, with an action plan on the initial steps required to begin the problem resolution process. Given the heightened urgency around Severity 1 and 2 tickets, initial response may include an invitation to participate in a screen share session to shorten time to problem isolation.

    d) Severity Level Definitions

    Severity 1: A problem that severely impacts customer’s use of Datameer in a production environment (i.e. loss of production data or a production system is not functioning). Additionally, the situation halts routine business operations and no work-around exists.

    Severity 2: A problem where Datameer is functioning but customer’s use in a production environment is severely reduced (i.e., a job failure of a business-critical job). The situation is causing a high impact to customer’s business operations and no workaround exists.

    Severity 3: A problem that involves partial, non-critical loss of use of the software in a production environment or development environment. For production environments, there is a medium-to-low impact on customer’s business, but customer’s business continues to function, including by using a workaround. For development environments, customer’s usage of Datameer is severely reduced.

    Severity 4: A general usage question, reporting of a documentation error, or recommendation for a future product enhancement or modification. For production environments, there is low-to-no impact on customer’s business or the performance or functionality of customer’s system. For development environments, there is a medium-to-low impact on customer’s business, but customer’s business continues to function, including by using a workaround.

    e) Scope of Enterprise Technical Support Services - An Active Customer may contact Datameer Enterprise Technical Support by opening a ticket via the Customer Support Center to request information regarding the use, configuration or operation of the Datameer software running on any Supported Instance. Enterprise Technical Support services include responses to submitted tickets pertaining to questions and resolving technical problems in the following scope:

    Best practices for setting up and configuring a Supported Instance for running Datameer software, including:

    System requirements and 3rd party software compatibility

    Installation, deployment, migration and upgrading

    Using supported and available functions and features

    Operational support for a Supported Instance running Datameer software, including:

    Best practices for using Datameer software, functions and features

    Identifying, diagnosing and fixing errors in Datameer software

    Preventing and recovering from failures and troubleshooting

    Problem diagnosis and resolution, including:

    Problem isolation and diagnosis of errors in Datameer software

    Patches and workarounds to fix bugs in Datameer software

    Product Enhancement Requests

    Providing feedback to the Datameer product team

    Submitting product enhancement requests to the Datameer product team

    f) Additional Services (Available Upon Request) - Additional services are available by request for an added cost. The following services may be available for Active Customers; contact your sales representative for pricing:

    Installation, Deployment, Migration and Upgrading

    Technical Support includes providing documentation and clarifying requirements.

    Active Customers may request a consultant to perform these tasks on or off site.

    Datameer Systems Integration

    Enterprise Technical Support includes fixing Datameer software errors and best practice recommendations for integrating with other systems.

    Active Customers may request a consultant to design, optimize and deploy an integration between Datameer and other systems.

    Use Case Development

    Enterprise Technical Support includes best practice recommendations for functions and features.

    Active Customers may request a consultant to scope, design, build and deploy a use case, or help guide use case requirements gathering and assessment.

    Product Training

    Enterprise Technical Support includes providing documentation for the Datameer product.

    Active Customers may request Product Training.

    Non-Recurring Engineering

    Enterprise Technical Support includes fixing Datameer software errors and providing documentation for the Datameer Software Development Kit (SDK).

    Active Customers may request Non-Recurring Engineering Services to scope, design, build and deploy a custom plugin using the SDK.

    3) Premium Support Service Plans - Customers may upgrade their Standard Support Service by purchasing Datameer’s optional add-on support service packages, Enhanced Level Support Services or Elite Level Support Services, which provide for additional service level hours or support. Enhanced and Elite Level Support Services are available by request for an added cost.

    a) Enhanced Level Support Service Levels

    Severity | Active Hours   | Initial Response      | Update Frequency
    1        | 24x7x365       | Within 1 hour         | Continuous effort until relief provided
    2        | Business Hours | Within 4 hours        | Updated every 1 business day
    3        | Business Hours | Within 12 hours       | Updated every 2 business days
    4        | Business Hours | Within 1 business day | Updated every 5 business days

    b) Elite Level Support Service Levels

    Severity | Active Hours   | Initial Response      | Update Frequency
    1        | 24x7x365       | Within 1 hour         | Continuous effort until relief provided
    2        | 24x7x365       | Within 2 hours        | Updated every 4 hours
    3        | Business Hours | Within 8 hours        | Updated every 1 business day
    4        | Business Hours | Within 1 business day | Updated every 3 business days

    4) Maintenance Policy - The Datameer maintenance policy is published here: https://datameer.zendesk.com/hc/en-us/articles/207475003-Datameer-Maintenance-Policy-and-Schedule

  • Problem

    Tez jobs fail with encrypted shuffle enabled.

    Cause

    The property

    mapreduce.shuffle.ssl.enabled=true

    is set on your cluster and marked as final.

    Solution

    Set the following property on the Hadoop Cluster page under Custom Properties:

    tez.runtime.shuffle.ssl.enable=true
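
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `tez_shuffle_override` and the dictionary input are hypothetical and not part of Datameer or Hadoop. Only the two property names come from this article.

```python
def tez_shuffle_override(cluster_props):
    """Return the custom property line to add when encrypted shuffle is on.

    cluster_props is a hypothetical dict of Hadoop property names to string
    values, for example parsed from the cluster's configuration files.
    """
    enabled = cluster_props.get("mapreduce.shuffle.ssl.enabled", "false")
    if enabled.strip().lower() == "true":
        # The Tez property ends in "enable", the MapReduce one in "enabled".
        return "tez.runtime.shuffle.ssl.enable=true"
    return None  # encrypted shuffle is off, so no override is needed


print(tez_shuffle_override({"mapreduce.shuffle.ssl.enabled": "true"}))
```

The returned line is what would be pasted into Custom Properties. When the MapReduce property is absent or false, the sketch returns `None`.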

  • Goal

    Restore a lost/forgotten password or change the password for either a Datameer user or administrator.

    Learn

    A user needs to restore a password:

    No automated process is available for a user to reset a password. The user needs to contact a Datameer administrator to request a new password.

    The Datameer administrator opens the User settings under the Administration tab, selects the user, and updates the user's password. A box can be checked to send the new password to the user via the email address listed for the account.

    An administrator needs to restore a password:

    If the administrator needs to update a password, they can do so using the steps listed above.

    If an administrator is unable to log into Datameer in order to update the password, the following steps can be taken to reset the password through property files.

    To reset the admin user password in the property files:

    Open the Datameer file: das-env.sh

    Remove the comment tag on: # export ADMIN_PASSWORD_RESET=true

    Restart Datameer using the property --resetPassword

    When Datameer has restarted, the administrator user's password will be reset to the default as written in the default.properties file.

    Enter this default password to log into Datameer.

    The admin user's password may then be changed to a new unique password in the user account settings as described above.

    When complete, go back to the das-env.sh file and comment the line out again so the password will not revert to the default upon restarting Datameer.
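
The two edits to das-env.sh (removing the comment tag before the reset, restoring it afterwards) can be sketched as a small Python helper. This is a minimal illustration under assumptions: the function name `set_reset_flag` is hypothetical, and it works on the file's text rather than a path because the location of das-env.sh is not given here.

```python
import re

RESET_LINE = "export ADMIN_PASSWORD_RESET=true"


def set_reset_flag(env_text, enabled):
    """Uncomment (enabled=True) or comment out (enabled=False) the reset line."""
    # Matches the line whether or not it currently carries a leading "#".
    pattern = re.compile(r"^\s*#?\s*" + re.escape(RESET_LINE) + r"\s*$")
    out = []
    for line in env_text.splitlines():
        if pattern.match(line):
            line = RESET_LINE if enabled else "# " + RESET_LINE
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file content, used only for demonstration.
sample = "# Datameer environment\n# export ADMIN_PASSWORD_RESET=true\n"
print(set_reset_flag(sample, True))
```

Calling the helper with `enabled=False` after the password has been changed restores the commented line, so a later restart does not reset the password again.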

  • Goal

    Set up Jetty 9 to redirect all HTTP requests to HTTPS instead of disabling the HTTP connector after enabling SSL in version 5.2 and later.

    Learn

    Open

    <datameer-install-path>/etc/webdefault.xml

    in an editor and add

    <security-constraint>
      <web-resource-collection>
        <web-resource-name>Everything</web-resource-name>
        <url-pattern>/*</url-pattern>
      </web-resource-collection>
      <user-data-constraint>
        <transport-guarantee>CONFIDENTIAL</transport-guarantee>
      </user-data-constraint>
    </security-constraint>

    in the appropriate section.

    Test redirection using

    curl --verbose 'http://localhost:8080'
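
In the verbose curl output, the things to look for are a 3xx status line and a Location header that points to an https:// URL. The Python sketch below captures that check. The function name `is_https_redirect` and the port 8443 in the example are assumptions for illustration; the actual HTTPS port depends on the SSL configuration.

```python
from urllib.parse import urlsplit


def is_https_redirect(status, location):
    """Return True if an HTTP response redirects the client to an https:// URL."""
    if status not in (301, 302, 303, 307, 308):
        return False  # not a redirect at all
    return urlsplit(location or "").scheme.lower() == "https"


# Hypothetical values as they might be read from the curl output.
print(is_https_redirect(302, "https://localhost:8443/"))
```

A response with status 200, or a redirect whose Location is still an http:// URL, suggests the security constraint was not picked up.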
