Datameer Claimed Company

Datameer is a leading provider of data management software for analytics that gives analysts universal access to the data they need when they need it for faster analytics. Datameer Spotlight enables business teams to discover, access, collaborate, and analyze more data without complex data replication and movement, while Datameer Spectrum is a cloud-native, fully-featured ETL++ platform that turns raw data into analytics-ready datasets in an easy, code-free manner. Datameer is a trusted platform at leading enterprises globally, including Citibank, Royal Bank of Canada, British Telecom, Aetna, Optum, National Instruments, Vivint and more. To learn more, please visit www.datameer.com.

EMPLOYEE PARTICIPANTS: 14

TOTAL RATINGS: 138

CEO: Christian Rodatus

Datameer FAQs

Datameer's Frequently Asked Questions page is a central hub where customers can find answers to their most common questions. These are the 468 most popular questions Datameer receives.

Frequently Asked Questions About Datameer

Task attempt fails with Container released on a *lost* node

Problem
During a cluster job execution, some tasks fail with the following message seen in the YARN application log.
{"entity":"attempt_111111111_222222_1_01_000001_0","entitytype":"TEZ_TASK_ATTEMPT_ID",
"events":[{"ts":1566540243508,"eventtype":"TASK_ATTEMPT_FINISHED"}],
"otherinfo":{"creationTime":1566540195458,"allocationTime":1566540197777,"startTime":1566540230285,"endTime":1566540243508,"timeTaken":13223,
"status":"FAILED","taskAttemptErrorEnum":"CONTAINER_EXITED","taskFailureType":"NON_FATAL","diagnostics":"Container container_111111111_222222_1_01_000001 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]",
"counters":{"counterGroups":"[{counterGroupName=org.apache.tez.common.counters.DAGCounter, counters=[{counterName=RACK_LOCAL_TASKS, counterValue=1}]}]"},"lastDataEvents":{"lastDataEvents":"[{TEZ_TASK_ATTEMPT_ID=, ts=1566540195231}]"},"nodeHttpAddress":"DataNodeHostName:port"}}
Occasionally this causes the job to fail, but usually it only affects performance, because a failed task should be rerun.
Cause
This failure is a known issue with YARN (YARN-8671) that may occur if a node is overly busy (e.g., some other container is using too much CPU, or the NodeManager is too busy to respond). The failure is indicative of a busy cluster or of nodes that are having issues for some other reason.
Solution
As this exception points to an issue with cluster services, it is recommended to review the cluster's configuration and performance and to perform a general health check.
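As a starting point for such a health check, it can help to see which nodes the lost containers ran on. The following is a minimal sketch, assuming the timeline entries shown above have been saved with one JSON object per line. The function name and that input layout are assumptions for illustration; only the field names (otherinfo, status, diagnostics, nodeHttpAddress) are taken from the log excerpt.

```python
import json
from collections import Counter

# Diagnostic text taken from the log excerpt above.
LOST_NODE_MARKER = "Container released on a *lost* node"


def count_lost_node_failures(lines):
    """Count failed task attempts per node whose diagnostics mention a lost node.

    `lines` is any iterable of strings, each expected to hold one JSON entry
    (an assumed layout, not a documented Datameer or YARN format).
    """
    per_node = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue
        info = entry.get("otherinfo") or {}
        diagnostics = info.get("diagnostics") or ""
        if info.get("status") == "FAILED" and LOST_NODE_MARKER in diagnostics:
            per_node[info.get("nodeHttpAddress", "unknown")] += 1
    return per_node
```

A node that appears repeatedly in the result is a reasonable first candidate for a closer look at CPU load and NodeManager responsiveness.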
Limit the size of a file created by an ExportJob

Goal
How can the file size of Datameer Export Jobs be limited?
Learn
For file-based Export Jobs, Datameer allows limiting the size of exported files. It is possible to set this threshold at: Export Job Configuration -> Data Details -> Advanced Settings -> Maximum file size (MB) field.
Keep in mind that the limit is applied to uncompressed data at the moment when it is being initially written to the target location. Compression (if configured) happens after the uncompressed files are written for every exported file individually.
For example, when data is exported as compressed CSV files and a file size limit of 50 MB is set, Datameer:
a) Exports uncompressed data, respecting the maximum allowed size for each individual file.
b) Compresses every file.
As a result, it is expected that even with a 50 MB per-file size limit, an Export Job may end up writing ten 5 MB files. This means that ten 50 MB files were each compressed with a compression ratio of 10:1.
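The arithmetic behind this example can be sketched as follows. This is only an illustration of the reasoning above, not Datameer code; the function name, the 500 MB total, and the assumption of a uniform compression ratio are hypothetical.

```python
import math


def exported_file_sizes(total_uncompressed_mb, max_file_size_mb, compression_ratio):
    """Return (number of files, approximate compressed size in MB of a full file).

    The limit is applied to uncompressed data first; each file is then
    compressed individually, so the on-disk size shrinks by the ratio.
    """
    file_count = math.ceil(total_uncompressed_mb / max_file_size_mb)
    compressed_mb = max_file_size_mb / compression_ratio
    return file_count, compressed_mb


# 500 MB of uncompressed data, a 50 MB limit, and a 10:1 ratio
# give ten files of about 5 MB each.
files, size_mb = exported_file_sizes(500, 50, 10)
print(files, size_mb)
```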
SQL Worksheet: a query like SELECT CAST(Sheet1.A as DATE; "MMddyyyy") FROM Sheet1 returns an error

Problem
On Sheet1 column A, I’ve created a record using the T function: T("09302019"). I then converted this string value into a date using the ASDATE(#A;"MMddyyyy") function.
However, trying the query:
SELECT CAST(Sheet1.A as date 'mmddyyyy') from Sheet1
Throws the following exception:
WARN [2019-09-30 17:35:29.806] [qtp1942406066-77] (SqlSheetModel.java:199) - Something went wrong while parsing SQL query: SELECT CAST(Sheet1.A as date 'mmddyyyy') from Sheet1, Encountered "\'mmddyyyy\'" at line 1, column 30.
Was expecting one of:
")" ...
"(" ...
"CHARACTER" ...
"MULTISET" ...
Cause
Under the hood, SQL Worksheets are converted into a set of traditional Datameer functions and operations. The library it uses to cast a string into a date has a hardcoded pattern, which is yyyy-MM-dd. As a result, any other pattern will not be recognized. This is why Datameer throws an exception for a query like: SELECT CAST(receipt_date as DATE; "MMddyyyy")....
Solution
To work around this limitation, transform the source value from MMddyyyy to yyyy-MM-dd format before applying the SQL query. Here is one approach that can be used to do so.
Sheet1 ColumnA - initial value 09302019.
Transform it into 2019-09-30 using the formula RIGHT(#A;4)+"-"+LEFT(#A;2)+"-"+RIGHT(LEFT(#A;4);2).
Create a SQL Sheet and introduce the desired SQL query.
Example.
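To see what the formula in the steps above does to the string, the same slicing can be mirrored in Python. This is a sketch for checking the transformation only; the function name is hypothetical and the input validation is an added assumption, not part of the Datameer formula.

```python
def mmddyyyy_to_iso(value):
    """Mirror RIGHT(#A;4)+"-"+LEFT(#A;2)+"-"+RIGHT(LEFT(#A;4);2) on a string."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError("expected an 8-digit MMddyyyy string, got %r" % (value,))
    year = value[-4:]        # RIGHT(#A;4)
    month = value[:2]        # LEFT(#A;2)
    day = value[:4][-2:]     # RIGHT(LEFT(#A;4);2)
    return year + "-" + month + "-" + day


print(mmddyyyy_to_iso("09302019"))  # 2019-09-30
```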
Update Teradata Database Driver

Goal
I want to replace the default jar files used for the Teradata Database Driver in Datameer.
Learn
In order to successfully replace JDBC jar files for an existing Datameer Database Driver, ALL existing jar files for the specific driver must first be removed.
Navigate to the Admin Tab -> Database Drivers and then click on the gear icon for the existing Teradata Driver to access its configuration.
Remove all jar files associated with this driver.
Note: These files can only be removed one at a time, even if all of them disappear once you click on the Trash icon. It is necessary to re-open the Driver configuration page again and remove each file individually.
In the Database Drivers section, ensure that no files remain within the File column for the existing Teradata Driver.
Once this is confirmed, upload the new jar files and save the configuration.
DRAFT: Datameer 7.5 failed to start because of Unknown name value [VARIABLES_READ] exception

Problem
After the upgrade to version 7.5.x, Datameer fails to start with the following exception.
stderrout.log
2020-01-05 11:23:43.622:WARN:oejw.WebAppContext:main: Failed startup of context o.e.j.w.WebAppContext@7225790e{/,file:/datameer/Datameer-7.5.4-hdp-2.6.0/webapps/conductor/,STARTING}{/conductor}
java.lang.IllegalArgumentException: Unknown name value [VARIABLES_READ] for enum class [datameer.dap.sdk.usermanagement.Capability]
at org.hibernate.type.EnumType$NamedEnumValueMapper.fromName(EnumType.java:467)
at org.hibernate.type.EnumType$NamedEnumValueMapper.getValue(EnumType.java:452)
at org.hibernate.type.EnumType.nullSafeGet(EnumType.java:107)
at org.hibernate.type.CustomType.nullSafeGet(CustomType.java:127)
at org.hibernate.persister.collection.AbstractCollectionPersister.readElement(AbstractCollectionPersister.java:811)
conductor.log (may or may not occur)
[anonymous] WARN [2020-01-05 11:23:43.622] [ldap-cache-update-operation] - role with role_id 2, capability VARIABLES_READ can not be converted to Capability object, skipping it.
Cause
This is related to the updates made to the 7.5 SDK. The role capability VARIABLES_READ has been removed. Please refer to the Important API and SDK Changes for Developers section of the Datameer documentation for more details.
The upgrade script upgrade-7.5.0-DAP_38522.sql, which should be executed as part of the Datameer database schema upgrade, deletes all occurrences of the role capability VARIABLES_READ.
Solution
If the mentioned upgrade script hasn't been executed for some reason, it is possible to manually remove the role capability VARIABLES_READ.
Execute the following query against the Datameer database. If the script has run, the query should not return any records. In this case, please get in touch with Datameer support for further investigation.
SELECT * FROM role_capability WHERE capability = 'VARIABLES_READ';
If the query returns any records:
Stop Datameer.
Take a database dump.
Execute the following query.
DELETE FROM role_capability WHERE capability = 'VARIABLES_READ';
Start Datameer.
Git Plugin org.eclipse.jgit.errors.LockFailedException - No Updates Logged to Repository

Problem
Suddenly, the Datameer Git plug-in stops committing activities to the configured Git Repository. Within the <INSTALLDIR>/logs/conductor.log file, the following exception is observed:
[anonymous] ERROR [2018-01-01 00:00:00.000] [datameer-event-bus-1] (GitVersioningRecorder.java:207) - Exception caught during execution of add command
org.eclipse.jgit.api.errors.JGitInternalException: Exception caught during execution of add command
at org.eclipse.jgit.api.AddCommand.call(AddCommand.java:211)
at datameer.plugin.versioning.git.GitVersioning$2.apply(GitVersioning.java:97)
at datameer.plugin.versioning.git.GitVersioning$2.apply(GitVersioning.java:90)
at datameer.dap.sdk.util.Success.flatMap(Success.java:43)
at datameer.plugin.versioning.git.GitVersioning.writeWorkbookToWorkTree(GitVersioning.java:629)
at datameer.plugin.versioning.git.GitVersioning.commitWorkbookChanges(GitVersioning.java:191)
at datameer.plugin.versioning.git.GitVersioningRecorder.recordWorkbookChanges(GitVersioningRecorder.java:242)
at datameer.plugin.versioning.git.GitVersioningRecorder$6.apply(GitVersioningRecorder.java:174)
at datameer.plugin.versioning.git.GitVersioningRecorder$6.apply(GitVersioningRecorder.java:171)
at datameer.dap.sdk.util.Success.flatMap(Success.java:43)
at datameer.plugin.versioning.git.GitVersioningRecorder.record(GitVersioningRecorder.java:674)
at sun.reflect.GeneratedMethodAccessor456.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at datameer.com.google.common.eventbus.EventSubscriber.handleEvent(EventSubscriber.java:74)
at datameer.com.google.common.eventbus.SynchronizedEventSubscriber.handleEvent(SynchronizedEventSubscriber.java:47)
at datameer.com.google.common.eventbus.EventBus.dispatch(EventBus.java:322)
at datameer.com.google.common.eventbus.AsyncEventBus.access$001(AsyncEventBus.java:34)
at datameer.com.google.common.eventbus.AsyncEventBus$1.run(AsyncEventBus.java:117)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jgit.errors.LockFailedException: Cannot lock /opt/datameer/current/versioning/.git/index
at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:224)
at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:301)
at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:267)
at org.eclipse.jgit.lib.Repository.lockDirCache(Repository.java:1053)
at org.eclipse.jgit.api.AddCommand.call(AddCommand.java:142)
... 21 more
Cause
The Git repository on the Datameer Server is locked. Specifically, the <REPOSITORY>/.git/index.lock file is stale. This file is locking the Git repository from any further edits.
During normal operation, the <REPOSITORY>/.git/index.lock file should be created before an edit is made and then deleted immediately following the edit. If this file exists for more than 1 minute, it is likely that the lock was not released as expected.
Solution
To work around this issue, release the Git repository lock by removing the <REPOSITORY>/.git/index.lock file from the local file system. Future commits to the Git repository will then resume as expected.
If this issue occurs, it is recommended to capture the <INSTALLDIR>/logs/conductor.log* files from the environment and to contact Datameer Support for further information.
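Before removing the lock file, it may be worth confirming that it really is stale. The following is a minimal sketch based on the one-minute rule of thumb given above; the function name and its parameters are hypothetical, and it only reports the state of the file without deleting anything.

```python
import os
import time


def is_stale_lock(lock_path, max_age_seconds=60, now=None):
    """Return True if the lock file exists and is older than max_age_seconds.

    The 60-second default follows the rule of thumb in the text; it is not a
    value read from Datameer or JGit configuration.
    """
    if not os.path.exists(lock_path):
        return False
    current = time.time() if now is None else now
    return current - os.path.getmtime(lock_path) > max_age_seconds


# Example call (substitute the actual repository location):
# is_stale_lock("<REPOSITORY>/.git/index.lock")
```

Keeping the check separate from the removal leaves the decision to delete the file with the administrator.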
How to Collect the YARN Application Logs

Goal
I want to collect the YARN application logs.
Learn
There are times when the Datameer job trace logs might not provide enough information for effective troubleshooting of an issue. When this happens, you may be asked to provide the YARN application logs from the Hadoop cluster.
To do this, you must first discern the application_id of the job in question. This can be found from the logs section of the Job History for that particular job id. First you must navigate to the job run details for the job id # in question:
How to Collect the YARN Application Logs - Manual Method
Once there, scroll to the bottom to the Job Log section and look for the line Submitted Application <application_id>:
Once the application_id is obtained, you can execute the following command from the command line on the Resource Manager to obtain the application logs:
yarn logs -applicationId <application_id>
Continuing with the above example, the following command would be executed:
yarn logs -applicationId application_1432041223735_0001 > appID_1432041223735_0001.log
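When a job log is long, the lookup described above can be scripted. The sketch below assumes the job log contains a line of the form "Submitted Application <application_id>" as stated earlier; the exact wording and capitalization of that line, and both function names, are assumptions for illustration.

```python
import re

# Matches e.g. "Submitted Application application_1432041223735_0001".
APP_ID_PATTERN = re.compile(
    r"Submitted Application\s+(application_\d+_\d+)", re.IGNORECASE
)


def find_application_ids(log_text):
    """Return the distinct application IDs found in the log, in order of appearance."""
    found = []
    for match in APP_ID_PATTERN.finditer(log_text):
        app_id = match.group(1)
        if app_id not in found:
            found.append(app_id)
    return found


def yarn_logs_command(app_id):
    """Build the command shown above, redirecting output to an appID_*.log file."""
    output_file = app_id.replace("application_", "appID_", 1) + ".log"
    return "yarn logs -applicationId %s > %s" % (app_id, output_file)
```

The returned string is meant to be reviewed and then run on the Resource Manager host; the sketch does not execute anything itself.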
Please note that the `yarn logs -applicationId <application_id>` method is preferred, but it requires log aggregation to be enabled first. If log aggregation is not enabled, the YARN application logs must be collected manually (see How to Collect the YARN Application Logs - Manual Method).
Can't Set Up MySQL Connection: CLIENT_PLUGIN_AUTH is required

Problem
Attempting to create a connection to a MySQL Database fails with the error below:
java.lang.RuntimeException: could not create jdbc connection to jdbc:mysql://host:3306/database_name
Caused by: com.mysql.cj.core.exceptions.UnableToConnectException: CLIENT_PLUGIN_AUTH is required
Cause
The com.mysql.cj.core.exceptions.UnableToConnectException mentioned above most likely comes from version 6 of MySQL Connector/J when you try to connect to a relatively old MySQL instance. (Please refer to Supported Data Sources to check whether the version of the MySQL instance you are trying to connect to is supported.)
The name of the class that implements java.sql.Driver in MySQL Connector/J changed from com.mysql.jdbc.Driver to com.mysql.cj.jdbc.Driver in version 6. Please refer to Changes in the Connector/J API.
When you use mysql-connector-java-6.0.6.jar as the default MySQL JDBC driver (stored at etc/custom-jars/) and would like to set up a connection to a relatively old MySQL instance, it might fail with the error message mentioned above.
You can try to work around the problem by setting up a custom database driver using one of the previous versions of MySQL Connector/J (e.g. 5.1.44), but this still might not work. When a custom Database Driver is created and another version of the mysql-connector-java jar file is uploaded, any connection created afterwards has both jars (default and custom) in its classpath. If the default mysql-connector-java-6.0.6.jar from etc/custom-jars/ is picked up first, it is used instead of the custom driver.
Datameer recommends using generally available versions of MySQL Connector/J.
Solution
Here are the steps to replace mysql-connector-java-6.0.6.jar if you use it as the default driver.
Remove all custom MySQL drivers you might have created to fix this problem and keep only the embedded one.
Stop Datameer.
Ensure that the service has actually stopped and no Datameer processes are running.
Clean up the /<Datameer installation folder>/temp and /<Datameer installation folder>/tmp folders.
Replace the /<Datameer installation folder>/etc/custom-jars/mysql-connector-java-6.0.6.jar file with mysql-connector-java-5.1.44.jar or any other recent GA version of MySQL Connector/J.
Start Datameer.
Mixed Content. This request has been blocked; the content must be served over HTTPS.

Problem
After accepting a self-signed certificate, the browser complains that scripts are being served in mixed mode:
Mixed Content: The page at 'https://<host>/browser' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://<host>/home'. This request has been blocked; the content must be served over HTTPS.
Cause
This error might occur when you use Apache mod_proxy in your environment and the external connection is secured, but the internal one is not.
In this case the embedded Jetty webservice doesn't recognize that all external connections should be served in secured mode (over HTTPS) and keeps responding over HTTP.
Environment schema
User > HTTPS > Apache mod_proxy > HTTP > Datameer's Jetty
Solution
In order to fix the issue, adjust Apache, Jetty, and Datameer settings.
Apache
Add the following line to the Apache config for the Datameer VirtualHost section:
RequestHeader set X-Forwarded-Proto "https" env=HTTPS
Jetty
In <datameer-install-path>/etc/jetty.xml uncomment the following:
<Call name="addCustomizer">
<Arg><New class="org.eclipse.jetty.server.ForwardedRequestCustomizer"/></Arg>
</Call>
Datameer
Make sure that the correct hostname and protocol are set in <datameer-install-path>/etc/live.properties for system.property.server.address:
# Define the address and port used to connect to DATAMEER.
system.property.server.address=<host>:<port>
Restart Datameer and Apache to apply changes.
HiveServer2 Connection - Not in list of params that are allowed to be modified

Problem
During creation of a Connection to HiveServer2, an error message is received.
[admin] INFO [<timestamp>] [<thread>] (SetJsonOutputCommand.java:29) - Triggering HQL:SET hive.ddl.output.format=json
[admin] WARN [<timestamp>] [<thread>] (DataStore.java:196) - connection fails: java.lang.RuntimeException: Error while processing statement: Cannot modify hive.ddl.output.format at runtime. It is not in list of params that are allowed to be modified at runtime
datameer.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Error while processing statement: Cannot modify hive.ddl.output.format at runtime. It is not in list of params that are allowed to be modified at runtime
Hive Configuration Variables
Cause
The issue seems to be mainly caused by permissions set at hive.security.authorization.sqlstd.confwhitelist.append in hive-site.xml.
The data format to use for DDL output (e.g. DESCRIBE table) is set to either 'text' (for human-readable text) or 'json' (for a JSON object). In this case, the format is set to text by default, whereas the expected data format is JSON. (As of Hive 0.9.0.)
Solution
Whitelist the variable hive.ddl.output.format as per Hive Configuration Variables.
AWS Access Denied Error with Server Side Encryption (SSE) Enabled

Environment
DM: 5.x, DIST: HDP 2.1, OS: Linux, COM: -
Problem
Setting up a connection to an Amazon S3 bucket fails with the following error message:
AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: <id>, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: <id>
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:350)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:202)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3066)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3037)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:533)
at datameer.dap.hadoop.filesystem.DatameerS3FileSystem$ListingIterator.computeNext(DatameerS3FileSystem.java:617)
at datameer.dap.hadoop.filesystem.DatameerS3FileSystem$ListingIterator.computeNext(DatameerS3FileSystem.java:605)
at datameer.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at datameer.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at datameer.dap.hadoop.filesystem.DatameerS3FileSystem.listStatus(DatameerS3FileSystem.java:282)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1483)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523)
at datameer.dap.sdk.cluster.filesystem.HadoopFileSystem.listStatus(HadoopFileSystem.java:124)
at datameer.dap.sdk.util.DatameerFsClient.listStatus(DatameerFsClient.java:53)
at datameer.dap.sdk.util.DatameerFsClient.listStatus(DatameerFsClient.java:46)
at datameer.dap.sdk.datastore.FileDataStoreModel.testConnect(FileDataStoreModel.java:56)
at datameer.dap.sdk.entity.DataStore.validate(DataStore.java:186)
...
Cause
Server Side Encryption (SSE) is required to write. The job is attempting to do a test and is getting denied without SSE.
The ability to implement AES 256 encryption in Hadoop was not added until the 2.5.0 distribution of Hadoop. Refer to Add S3 Server Side Encryption for background information.
Apache Hadoop 2.6 release is supported in HDP 2.2 and beyond.
Solution
Set the following value as either a Custom Property in Datameer or in the core-site.xml file in your Hadoop cluster:
fs.s3n.server-side-encryption-algorithm=AES256
Workaround
Since this parameter must be set at the Apache Hadoop level, it is necessary to upgrade to HDP 2.2. As a workaround prior to the HDP 2.2 release, disable Server Side Encryption (SSE) on the specific S3 buckets that need to be accessed.
Datalink fails to resolve logical name in HA setup - UnknownHostException: nameservice1

Environment
DM:4.4.1, OS: -, DIST: -, COM: HDFS
Symptoms
After upgrading Datameer to 4.4.1, the existing datalinks fail to resolve the configured logical name. Datalink jobs start running fine, but eventually they fail with an error like this:

INFO [2014-10-19 23:41:25.797] [JobScheduler worker1-thread-991] (MrPlanRunner.java:250) - Completed postprocessing: [0 sec], progress at 100
INFO [2014-10-19 23:41:25.797] [JobScheduler worker1-thread-991] (MrPlanRunner.java:251) - -------------------------------------------
INFO [2014-10-19 23:41:25.798] [JobScheduler worker1-thread-991] (MrPlanRunner.java:157) - Completed execution plan with SUCCESS and 1 completed MR jobs. (hdfs://nameservice1/user/datameer/importlinks/7199/34922)
INFO [2014-10-19 23:41:25.814] [JobScheduler worker1-thread-991] (JobArtifactFileAccessTool.java:62) - Configuring job result artifacts from [hdfs://nameservice1/user/datameer/importlinks/7199]
INFO [2014-10-19 23:46:24.327] [JobScheduler worker1-thread-991] (JobArtifactFileAccessTool.java:62) - Configuring job result artifacts from [hdfs://nameservice1/user/datameer/joblogs/34922]
ERROR [2014-10-19 23:46:24.410] [JobScheduler worker1-thread-991] (DasJobCallable.java:135) - Job failed! Execution plan: digraph G {
1 [label = "MrInputNode{datalink-sample-input} - 0 Bytes"];
2 [label = "MrMapNode{datameer.dap.common.job.sample.WritePartitionedPreviewMapper@216634b4}"];
3 [label = "MrOutputNode{datalink-sample} - 0 Bytes"];
2 -> 3 [label = "PRODUCED_BY_MAPPER"];
1 -> 2 [label = "REQUIRED_AS_MAPPER_INPUT"];
}
datameer.dap.sdk.util.ExceptionUtil$WrappedThreadException: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
at datameer.dap.sdk.util.ExceptionUtil.wrapInThreadException(ExceptionUtil.java:271)
at datameer.dap.sdk.util.HadoopUtil.executeTimeRestrictedCall(HadoopUtil.java:165)
at datameer.dap.sdk.util.HadoopUtil.getFileSystem(HadoopUtil.java:88)
at datameer.dap.sdk.util.HadoopUtil.getFileSystem(HadoopUtil.java:71)
at datameer.dap.sdk.cluster.filesystem.ClusterFileSystem.open(ClusterFileSystem.java:242)
at datameer.dap.sdk.cluster.filesystem.ClusterFileSystemProvider$1.open(ClusterFileSystemProvider.java:15)
at datameer.dap.sdk.datastore.FileDataStoreModel.openFileSystem(FileDataStoreModel.java:120)
...
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:569)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:512)
...

All the correct HA configuration details are present in the Custom Properties field of the Administration -> Hadoop Cluster page, but the jobs are still failing to resolve the nameservice1 logical name.
Cause/Resolution
Copying the same HA Hadoop configuration details from the Administration -> Hadoop Cluster page to the custom properties field of the HDFS Connection (that datalinks use to connect to the cluster) helps to run Datalinks successfully:

dfs.nameservices=nameservice1
dfs.ha.namenodes.nameservice1=namenode1,namenode2
dfs.namenode.rpc-address.nameservice1.namenode1=hostname1.company.com:8020
dfs.namenode.rpc-address.nameservice1.namenode2=hostname2.company.com:8020
dfs.client.failover.proxy.provider.nameservice1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

Even having the same configuration details in the Hadoop Custom Properties field of datalinks doesn't help - the configuration needs to be present in the HDFS Connection.
To solve the issue globally, it is necessary to set the HDFS Name Node to hdfs://nameservice.* instead of hdfs://hostname:8080.
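When copying the HA properties between the Hadoop Cluster page and the HDFS Connection, it is easy to miss one. The following minimal sketch checks a block of key=value properties for the entries listed above. The property names are the ones shown in this article; the function names and the idea of validating the text before pasting it are assumptions for illustration, not a Datameer feature.

```python
def parse_properties(text):
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_ha_keys(props):
    """Return the HA property keys that are expected but absent."""
    services = [s.strip() for s in props.get("dfs.nameservices", "").split(",") if s.strip()]
    if not services:
        return ["dfs.nameservices"]
    missing = []
    for ns in services:
        nn_key = "dfs.ha.namenodes." + ns
        namenodes = [n.strip() for n in props.get(nn_key, "").split(",") if n.strip()]
        if not namenodes:
            missing.append(nn_key)
        for nn in namenodes:
            rpc_key = "dfs.namenode.rpc-address.%s.%s" % (ns, nn)
            if rpc_key not in props:
                missing.append(rpc_key)
        provider_key = "dfs.client.failover.proxy.provider." + ns
        if provider_key not in props:
            missing.append(provider_key)
    return missing
```

An empty list from missing_ha_keys(parse_properties(text)) means all five kinds of entries from the example are present for every configured nameservice.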
Further Information
Information on how to "Configure High Availability on a Hadoop Cluster" and "High Availability and Yarn" can be requested from the Datameer service team.
HDFS file path wildcards

Goal
I have partitioned data stored in HDFS, with a partition type of string. For example, a Hive table partitioned by country name. I would like to be able to choose certain partitions for ingestion.
Learn
To achieve this, specify the path to files with wildcards within the File Or Folder field in the ImportJob/DataLink configuration wizard. Regular expressions are not supported for folder names, but wildcards are allowed.
For example, consider a two-character country code where the path is as follows:
/warehouse/../country={<country 1>,<country 2>,...}/
If we want to select just the US and Japan:
/warehouse/../country={us,jp}/
If we want to do broader pattern matching:
/warehouse/../country=*/
/warehouse/../country=a*/
/warehouse/../country=*s/
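To get a feel for which partition folders each of the patterns above selects, the matching can be approximated in Python. This is a rough sketch, not Datameer's or Hadoop's implementation: it expands {a,b} groups by hand and uses the standard fnmatch module for *, and it is only meant to be applied to a single folder name. The function names and the sample partition list are hypothetical.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into a list of plain wildcard patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches(pattern, folder_name):
    """Return True if the folder name matches any expansion of the pattern."""
    return any(fnmatch.fnmatchcase(folder_name, p) for p in expand_braces(pattern))


partitions = ["country=us", "country=jp", "country=de", "country=at"]
for pattern in ["country={us,jp}", "country=*", "country=a*", "country=*s"]:
    print(pattern, [p for p in partitions if matches(pattern, p)])
```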
For more information see our documentation: File Path and File Name Patterns.
Power BI Desktop Attempting to Parse Login Page Instead of Data After Supplying the Integration Link.

Problem
Attempting to use a Datameer integration link with Microsoft Power BI Desktop doesn't allow data to be retrieved. After defining the integration URL, the Web View shows the Datameer login page but it can't interact with it. This makes it impossible to retrieve the data as authentication can't be performed.
Cause
This is a bug/regression in Microsoft Power BI Desktop. The Web View was previously used to authenticate, and the data would then be ingested. Even after the permissions of the query are updated, Power BI Desktop incorrectly attempts to parse the login page as HTML data rather than re-running the query and retrieving the data.
Workaround
The credentials passed with the query can be updated after building the initial query, the initial query deleted, and a new query built. At this point, the new query will properly pass credentials through the HTTP header with the query, and data will be retrieved.
In Datameer, right-click on your workbook and select Show Results.
Click Copy Integration Link in the Download dialog.
Open Power BI Desktop and select Get Data -> Other -> Web
Paste in the integration link you copied earlier and click OK.
Note, in the Table View it shows an HTML document.
Note, in the Web View you can see this HTML document is the Datameer Login screen correctly challenging for authentication.
Click on the Menu Button above the Power BI Desktop Ribbon and select Options and Settings -> Data source settings.
Click the URL shown in the Edit Data Source window and then select Edit Permissions...
Under Type: Anonymous click Edit... and select Basic Authentication from the list on the left.
Enter your Datameer credentials, and click Save.
On the Ribbon Home Tab click Edit Queries.
In the Query Editor right click on the Document in the Queries pane and Delete the query, then Close & Apply.
On the Ribbon Home Tab click Get Data -> Other -> Web and re-add your Integration URL.
Since credentials have been pre-set for this source, they will be passed through to Datameer in the request header and the login page will be bypassed.
How to Setup a Java KeyStore for a SAML Configuration

Goal
Create a KeyStore for implementing signed requests for SAML authentication.
Learn
Prerequisites
There should be a Public Certificate available from the Identity Provider server. Common file formats for this are .cer and .crt.
Identify the following variables for usage in the environment:

SERVICE_PROVIDER_ALIAS (e.g. datameersaml)
IDENTITY_PROVIDER_ALIAS (e.g. externalsaml)
KEYSTORE_FILENAME (e.g. datameersaml.keystore)

Step-by-step guide
1) Generate a new KeyStore and private key on the Datameer server by running this command:
keytool -genkey -alias <SERVICE_PROVIDER_ALIAS> -keyalg RSA -keystore <KEYSTORE_FILENAME>
This command will prompt for the following values:

A password/passphrase for the new KeyStore file.
Re-enter the same password to confirm.
Private Key identifying attributes such as Company name, Organization name, etc.

2) Verify that the <KEYSTORE_FILENAME> is successfully created on the file system.
3) Import the ID Provider Public Certificate into the KeyStore that was created.
keytool -import -alias <IDENTITY_PROVIDER_ALIAS> -file <IDENTITY_PROVIDER_CERTIFICATE_FILE> -keystore <KEYSTORE_FILENAME>
4) Copy the <KEYSTORE_FILENAME> file to a known location on the Datameer server and ensure that the Linux file permissions allow the Datameer user to read the file.
5) Log in to the Datameer GUI and edit the SAML configuration.
Input the KeyStore information including these values:

KeyStore Path (path to the <KEYSTORE_FILENAME>)
KeyStore Password (this was input during the first keytool command)
Service Provider Alias Name (<SERVICE_PROVIDER_ALIAS>)
Service Provider Passphrase (this was input during the first keytool command)
How to Calculate the Distance Between Two Geohashes or Locations

Goal
If I want to calculate the distance between two positions, how can I achieve this?
Learn
If you have two geohashes, you first need to decode them into latitude and longitude:
GEOHASH_DEC_LAT(#GeoHash)
GEOHASH_DEC_LONG(#GeoHash)
After you have your coordinates you can calculate the distance:
IF(! ISBLANK(#Timestamp); (ACOS(COS(RADIANS(90-#LatitudeFrom)) *COS(RADIANS(90-#LatitudeTo)) +SIN(RADIANS(90-#LatitudeFrom)) *SIN(RADIANS(90-#LatitudeTo)) *COS(RADIANS(#LongitudeFrom-#LongitudeTo))) *3958.756); null)
where r = 3958.756 gives the distance in miles and r = 6371 gives the distance in kilometers.
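The formula above is the spherical law of cosines written with colatitudes (90 minus latitude). For checking results outside of Datameer, the same calculation can be sketched in Python. The function and constant names are hypothetical, and the clamping step is an added safeguard against floating-point rounding that the worksheet formula does not include.

```python
import math

EARTH_RADIUS_MILES = 3958.756
EARTH_RADIUS_KM = 6371.0


def distance(lat_from, lon_from, lat_to, lon_to, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in decimal degrees."""
    colat_from = math.radians(90 - lat_from)
    colat_to = math.radians(90 - lat_to)
    delta_lon = math.radians(lon_from - lon_to)
    cos_angle = (
        math.cos(colat_from) * math.cos(colat_to)
        + math.sin(colat_from) * math.sin(colat_to) * math.cos(delta_lon)
    )
    # Rounding can push the value slightly outside [-1, 1], which would
    # make acos raise a ValueError for identical or antipodal points.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * radius


# A quarter of the way around the equator, in kilometers and in miles.
print(distance(0, 0, 0, 90))
print(distance(0, 0, 0, 90, radius=EARTH_RADIUS_MILES))
```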
Datameer Log Basics

Log Types
Conductor Log
This is the application log for Datameer. If you are experiencing overall issues with Datameer, this would be the first log to check. It contains general information and errors regarding Datameer.
Download this log under the Administration tab under System Dashboard. Please note, you will need to have administrative access to this tab.
Job Log
This log follows specific job artifacts such as Import Jobs, Workbooks, etc. If you are experiencing issues with a specific job, this would be a good place to start. This log usually points in the direction of any errors.
Download the job log by right-clicking on your Datameer artifact and selecting "Show Details".
Next, you will want to select the job run under History. Click on the job ID number to get more details.
To download the job log, click Download Logfile under Job Log.
Job Trace
The Job Trace is a collection of files with data on a specific Datameer artifact (e.g. Workbooks, File Uploads, Export Jobs, etc.). The Job Trace includes the Job Log, along with other configuration files for the job. Additionally, if a job is failing, extra logs for the attempts may be stored here. The Job Trace will include the following:
job-conf.xml
job-definition.json
job-plan-original.dot
job-plan-compiled.dot
job.log
To download the Job Trace, follow the same directions as for the Job Log, but select Download Job Trace under Job Log instead.
Log Details
Here are different log levels you may encounter while reading your logs from Datameer.
INFO
These are system or services messages indicating what Datameer is doing, or tasks it is performing.
WARN
Warnings indicate possible abnormalities during activity, or something different from what Datameer expects. Most of the time, a warning leads to the next lines in the log, which may contain errors. You may also see warnings when records are dropped during a job run.
ERROR
Errors are logged when the system is no longer able to perform an intended task. There is never one overarching reason for an error; many things can cause one. After an error, you will usually get a stack trace that provides more insight into what actually caused it.
Disable weak TLS algorithms and CipherSuites

Goal
I want Datameer's embedded Jetty to serve only TLS 1.2 requests, reject all weaker TLS versions, and disable weak CipherSuites.
Learn
By default, Jetty's SSL module is configured to serve data via any supported SSL/TLS version except SSLv3, as described in the Configuring SSL/TLS section of the Jetty documentation.
Below is an example configuration block that adds more protocols to the exclusion list:
<Set name="ExcludeProtocols">
<Array type="String">
<Item>SSL</Item>
<Item>SSLv2</Item>
<Item>SSLv2Hello</Item>
<Item>SSLv3</Item>
<Item>TLSv1</Item>
<Item>TLSv1.1</Item>
</Array>
</Set>
A second option is to change the configuration of the JVM used by Datameer and disable unwanted TLS algorithms. To do so, set the jdk.tls.disabledAlgorithms property within the $JAVA_HOME/jre/lib/security/java.security file and restart Datameer to apply the changes.
For more details, see the following documentation: How to force java server to accept only tls 1.2 and reject tls 1.0 and tls 1.1 connections
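Before restarting, it can help to confirm which entries the jdk.tls.disabledAlgorithms property actually contains. The following is a minimal sketch that reads the text of a java.security file and lists the disabled entries; the function names are illustrative, the sample excerpt is hypothetical, and the list of protocols to check is an assumption based on the goal stated above.

```python
def disabled_tls_algorithms(java_security_text):
    """Return the comma-separated entries of jdk.tls.disabledAlgorithms."""
    logical_lines = []
    buffer = ""
    for raw in java_security_text.splitlines():
        line = raw.strip()
        if not buffer and (not line or line.startswith("#")):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            buffer += line[:-1]  # a trailing backslash continues the property
            continue
        logical_lines.append(buffer + line)
        buffer = ""
    if buffer:
        logical_lines.append(buffer)
    for entry in logical_lines:
        key, sep, value = entry.partition("=")
        if sep and key.strip() == "jdk.tls.disabledAlgorithms":
            return [item.strip() for item in value.split(",") if item.strip()]
    return []


def missing_protocols(entries, required=("SSLv3", "TLSv1", "TLSv1.1")):
    """List the protocols from `required` that are not disabled yet."""
    return [name for name in required if name not in entries]


if __name__ == "__main__":
    # Hypothetical excerpt; a real file contains many more properties.
    sample = "jdk.tls.disabledAlgorithms=SSLv3, RC4, \\\n    DES\n"
    entries = disabled_tls_algorithms(sample)
    print(entries, missing_protocols(entries))
```

In practice the text would come from reading the java.security file of the JVM that runs Datameer; the path depends on the Java installation.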
To allow or forbid Jetty to use a certain CipherSuite, edit the appropriate properties within the jetty-ssl.xml configuration file, per the Jetty/Howto/CipherSuites section of Jetty's documentation.
Please note that a Datameer restart is required to apply any changes to the Jetty configuration.
Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration

Problem
Error Message:
Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
Cause
Removal or corruption of underlying Tez JAR files contained in Datameer's temp locations.
Solution
Stop the Datameer service and completely purge the following temp paths:
<datameer>/temp
<datameer>/tmp
hdfs://<datameer private folder>/temp
hdfs://<datameer private folder>/jobjars
Start the service and new temp data will be built to replace the purged files.
How to Use Wget for REST API Access

Goal
In some company environments, cURL might not be approved software while Wget is. Learn to work with the REST API using Wget.
Learn
The following approach works in test environments for downloading the JSON definition of artifacts.
ENC="$(echo -n '<user>:<pass>' | openssl base64 -base64)" && wget --no-check-certificate -O artifact.json --header "Accept:application/json" --header "Authorization:Basic ${ENC}" "https://<host>:8443/rest/<artifactType>/<configID>"
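The command above builds an HTTP Basic Authorization header by Base64-encoding the string user:password. As a rough illustration of what is being sent, the same header and request can be constructed in Python; the function names, host, artifact type, and ID below are placeholders, and the sketch only builds the request without sending it.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Base64-encode 'user:password' as HTTP Basic authentication expects."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_artifact_request(host, artifact_type, config_id, user, password):
    """Prepare (but do not send) a GET request for an artifact's JSON definition."""
    url = f"https://{host}:8443/rest/{artifact_type}/{config_id}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
    )


if __name__ == "__main__":
    # All values are placeholders for illustration.
    request = build_artifact_request("datameer.example.com", "workbook", 42, "user", "pass")
    print(request.full_url)
    print(request.get_header("Authorization"))
```

Sending the request with urllib.request.urlopen would additionally require a decision about certificate verification, which the Wget call sidesteps with --no-check-certificate.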
Limitations
Note that, depending on the version you are using, there might be limitations. For example, Downloading Data Without an Export Job using a REST API call might not work:
ENC="$(echo -n '<user>:<pass>' | openssl base64 -base64)" && wget -t 1 --no-check-certificate -O content.csv -c --header "Accept:text/plain" --header "Authorization:Basic ${ENC}" "https://<host>:8443/rest/data/workbook/<configID>/<WorksheetName>/download"
Further Information
How to Upload the JSON Definition of Artifacts
How to Transpose Many Rows into One Column

Goal
In a workbook, you might have a column called Names with about 20 names. You can transpose this column into a row to create a list of all 20 names.
Solution

Create a SourceSheet called Names with two columns: Names and n.
Create a FormulaSheet called NamesPrepareToGroup with a column Names that holds the n names, and a column dummyGroupNo that holds the constant value 1.

Create a FormulaSheet called NamesGrouped with the column NamesGrouped and the formula
GROUPBY(#NamesPrepareToGroup!dummyGroupNo)
and the column NamesConcat with the formula
GROUPCONCAT(#NamesPrepareToGroup!Names)
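The idea behind the steps above is that a constant dummy key places every row in the same group, so a group-wise concatenation collapses the whole column into one value. A minimal Python sketch of that logic follows; the function name and the comma separator are assumptions for illustration, and the exact output type of GROUPCONCAT in Datameer may differ.

```python
from collections import defaultdict


def group_concat(rows, key_column, value_column, separator=","):
    """Group rows by key_column and join the values of value_column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(str(row[value_column]))
    return {key: separator.join(values) for key, values in groups.items()}


if __name__ == "__main__":
    names = ["Ada", "Grace", "Linus"]
    # Equivalent of NamesPrepareToGroup: every row gets the same dummy key.
    prepared = [{"Names": name, "dummyGroupNo": 1} for name in names]
    print(group_concat(prepared, "dummyGroupNo", "Names"))
```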

How to Upload the JSON Definition of Artifacts

Goal
A command line tool such as cURL or Wget is required to manipulate Datameer using the REST API.
For a Windows machine, you can use Cygwin and the cURL package (see Installing cURL on Cygwin on Windows), but this method still requires command line manipulation.
While developing a REST service, you want to submit commands and data manually and see the response. Use this article to learn about GUIs for REST services and manual testing.
Learn
You can check the available GUI front ends, clients, and browser plug-ins mentioned in

http://stackoverflow.com/questions/7746448/is-there-a-handy-gui-for-rest-manual-services-testing
http://stackoverflow.com/questions/603170/gui-frontend-for-curl-for-testing-an-api
Datameer Fails with Kerberos Installation

Problem
Datameer doesn't work once Kerberos is installed.
The following error message appears:

... 85 more
Caused by: KrbException: Integrity check on decrypted field failed (31) - PREAUTH_FAILED
at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:82)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
... 98 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
at sun.security.krb5.internal.ASRep.init(ASRep.java:64)
at sun.security.krb5.internal.ASRep.<init>(ASRep.java:59)
at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:60)
... 101 more

Cause
Some operating systems use AES-256 encryption for Kerberos principals. If the Java Cryptography Extension (JCE) unlimited strength policy files are missing, the client fails to authenticate to Kerberos.
Solution
Download the unlimited strength JCE package from the Oracle Java website and follow the README included in the package.
Import Job or Export Job Failure - awstasks.com.jcraft.jsch.JSchException - Connection reset

Problem
An Import Job or Export Job fails in Datameer and the following error message is generated:
Caused by: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset
For more detail, here is an example stack trace from an Export Job. This stack trace is generated in the job log:
java.lang.RuntimeException: java.lang.RuntimeException: Failed to generate file for 'RecordStream[sheetName=export,description=datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor@3415aedf]'
at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:56)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to generate file for 'RecordStream[sheetName=export,description=datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor@3415aedf]'
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:240)
at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:53)
... 5 more
Caused by: java.lang.RuntimeException: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset
at datameer.com.google.common.base.Throwables.propagate(Throwables.java:160)
at datameer.dap.common.graphv2.hadoop.ServerSideContext.finalizeExport(ServerSideContext.java:62)
at datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor.afterMrJobEnd(ExportJob.java:180)
at datameer.dap.common.graphv2.RecordStream.afterJobEnd(RecordStream.java:76)
at datameer.dap.common.graphv2.BaseMrClusterJob.afterJobEnd(BaseMrClusterJob.java:142)
at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:138)
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:227)
... 6 more
Caused by: java.io.IOException: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset
at datameer.awstasks.ssh.JschRunner.run(JschRunner.java:215)
at datameer.awstasks.ssh.JschRunner.execute(JschRunner.java:222)
at datameer.awstasks.exec.ShellExecutor.execute(ShellExecutor.java:25)
at datameer.dap.hadoop.filesystem.LinuxShellCommandExecutor.deletePath(LinuxShellCommandExecutor.java:52)
at datameer.dap.hadoop.filesystem.ScpFileSystem.delete(ScpFileSystem.java:134)
at datameer.dap.sdk.util.FileOutputAdapter.finalizeExport(FileOutputAdapter.java:342)
at datameer.dap.common.graphv2.hadoop.ServerSideContext.finalizeExport(ServerSideContext.java:60)
... 11 more
Caused by: awstasks.com.jcraft.jsch.JSchException: Session.connect: java.net.SocketException: Connection reset
at awstasks.com.jcraft.jsch.Session.connect(Session.java:558)
at awstasks.com.jcraft.jsch.Session.connect(Session.java:183)
at datameer.awstasks.ssh.JschRunner.createFreshSession(JschRunner.java:369)
at datameer.awstasks.ssh.JschRunner.openSession(JschRunner.java:289)
at datameer.awstasks.ssh.JschRunner.run(JschRunner.java:207)
... 17 more

Cause
The cause of this issue is that the network connection between a Hadoop Data Node and the Import/Export SSH or SFTP server was dropped by the SSH or SFTP server.
To identify the root cause of the dropped connection, the SSH or SFTP daemon logs may need to be investigated on the Import/Export target server.
A common cause is a limit on the number of concurrent SSH/SFTP connections to the target server that was temporarily exhausted.

Resolution
If the root cause was temporary (e.g. an exhausted connection limit or a network outage), it may be possible to work around the issue by simply re-running the job.
It may also help to reduce the concurrency of the Import Job or Export Job. This may decrease the performance of the job, since fewer concurrent network connections are established. To limit the maximum concurrency, set the following parameter in the Custom Properties of the affected job (the example value is 1):
das.splitting.max-split-count=1
Workbook Fails: <path_to_file> Is Not a Parquet File. Expected Magic Number at Tail

Problem
When you attempt to execute your workbook containing partitioned data, you notice that a few select partitions cause the job to fail.
Upon closer inspection, you see the following:
Caused by: java.lang.RuntimeException: hdfs://<datameer_private_folder>/importjobs/<artifact_id>/<execution_id>/rewrite/data/<partition_date>/<exported_parquet_file>_0.parquet is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [-76, -93, 1, 0]
If the broken partition is excluded, the workbook will complete successfully.
Cause
The error can be traced back to a Compaction Job that ran against the source data. If such a job fails, it currently only attempts to clean up the data from the previous attempt. If this cleanup fails and a new attempt is started anyway, you are left with an additional corrupted file in your output path.
Workaround
If you can identify the source directories that have been impacted, you can remove the corrupted files to repair the partition.
There are two ways to identify the broken file:
Its name contains an "_<attempt_value>" suffix that is lower than that of the other file in the same location.
The corrupted file will typically be sized smaller than the intact file.
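A third check follows directly from the error message: the expected tail bytes [80, 65, 82, 49] are the ASCII characters PAR1. The sketch below tests whether a file ends with that magic number. It assumes the suspect file has first been copied from HDFS to the local file system, and the function name is illustrative.

```python
import os

PARQUET_MAGIC = bytes([80, 65, 82, 49])  # b"PAR1", as listed in the error message


def has_parquet_tail_magic(path):
    """Return True if the last four bytes of the file are the Parquet magic number."""
    if os.path.getsize(path) < len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return handle.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC
```

A file that fails this check matches the symptom in the stack trace above, although passing it does not prove that the rest of the file is intact.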
Solution
We have identified a solution and will be releasing updated code in the form of a maintenance patch.
For further inquiries, please reach out to Support and provide "DAP-37174" as a reference.
MySQLNonTransientConnectionException: No operations allowed after connection closed

Problem
The Datameer GUI is inaccessible. When reviewing the conductor.log file on the Datameer server, the following stack trace is visible:

[system] ERROR [2014-01-01 00:00:00.000] [JobScheduler thread-1] (JDBCTransaction.java:198) - JDBC rollback failed
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed.
at sun.reflect.GeneratedConstructorAccessor123.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.Util.getInstance(Util.java:383)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1023)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:997)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:983)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:928)
at com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1323)
at com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1315)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5057)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.jamonapi.proxy.MonProxy.invoke(MonProxy.java:127)
at com.jamonapi.proxy.JDBCMonProxy.invoke(JDBCMonProxy.java:100)
at com.sun.proxy.$Proxy58.rollback(Unknown Source)
at com.mchange.v2.c3p0.impl.NewProxyConnection.rollback(NewProxyConnection.java:855)
at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java:213)
at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:192)
at org.hibernate.ejb.TransactionImpl.rollback(TransactionImpl.java:107)
at datameer.dap.conductor.persistence.PersistenceService.rollbackTransaction(PersistenceService.java:139)
at datameer.dap.conductor.persistence.TransactionHandler.execute(TransactionHandler.java:119)
at datameer.dap.conductor.persistence.TransactionHandler.executeInNewTransaction(TransactionHandler.java:92)
at datameer.dap.conductor.job.SingleThreadedTransactionController.execute(SingleThreadedTransactionController.java:32)
at datameer.dap.conductor.job.SingleThreadedController.executeAndLogMetrics(SingleThreadedController.java:141)
at datameer.dap.conductor.job.SingleThreadedController.loop(SingleThreadedController.java:117)
at datameer.dap.conductor.job.SingleThreadedController$2$1.run(SingleThreadedController.java:89)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at datameer.dap.conductor.webapp.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:124)
at datameer.dap.conductor.webapp.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:139)
at datameer.dap.conductor.job.SingleThreadedController$2.run(SingleThreadedController.java:85)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 60,621 milliseconds ago. The last packet sent successfully to the server was 1 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor124.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.ConnectionImpl.setAutoCommit(ConnectionImpl.java:5368)
at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.jamonapi.proxy.MonProxy.invoke(MonProxy.java:127)
at com.jamonapi.proxy.JDBCMonProxy.invoke(JDBCMonProxy.java:100)
at com.sun.proxy.$Proxy58.setAutoCommit(Unknown Source)
at com.mchange.v2.c3p0.impl.NewProxyConnection.setAutoCommit(NewProxyConnection.java:881)
at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:87)
at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1473)
at org.hibernate.ejb.TransactionImpl.begin(TransactionImpl.java:60)
at datameer.dap.conductor.persistence.PersistenceService.beginTransaction(PersistenceService.java:81)
at datameer.dap.conductor.persistence.TransactionHandler.execute(TransactionHandler.java:107)
... 12 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
... 30 more

Cause
This is a MySQL settings issue. The time since the last packet was received from the MySQL server (60,621 ms in the example above) exceeds the MySQL server's configured wait_timeout value, so the server has already closed the connection.
Solution
To resolve this issue, work with the MySQL database administrator to increase the value of the wait_timeout parameter.
This setting is configurable in the my.cnf file. By default, MySQL sets this value to 28800 seconds (8 hours). If this value has been modified from the default, consider reverting it to the default to restore connectivity to the Datameer server.
yarn.app.mapreduce.am.staging-dir property and its usage

Description
According to the Apache Hadoop documentation, history files are written by MapReduce jobs (in HDFS) to the .../history/done_intermediate/ directory. This location is configured in mapred-site.xml via the property mapreduce.jobhistory.intermediate-done-dir.
After a MapReduce job completes, logs are written to HDFS under this directory. The history server continuously scans the intermediate directory and moves any newly available logs to the directory specified by the mapreduce.jobhistory.done-dir parameter in mapred-site.xml. From this location, the history server picks up the logs and displays them on the history server UI.
The MapReduce Job History retention policy is controlled by the properties below.
mapreduce.jobhistory.cleaner.enable - True / False. Default value is True.
mapreduce.jobhistory.cleaner.interval-ms - How often the job history cleaner checks for files to delete, in milliseconds. Defaults to 86400000 (one day). Files are only deleted if they are older than mapreduce.jobhistory.max-age-ms.
mapreduce.jobhistory.max-age-ms - Job history files older than this many milliseconds will be deleted when the history cleaner runs. Defaults to 604800000 (1 week).
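Because both retention settings are given in milliseconds, a quick conversion helps when choosing other values. The sketch below converts the documented defaults into readable durations and shows the age comparison described above; the helper names are illustrative.

```python
from datetime import timedelta

# Default values quoted above, in milliseconds.
JOBHISTORY_DEFAULTS_MS = {
    "mapreduce.jobhistory.cleaner.interval-ms": 86_400_000,
    "mapreduce.jobhistory.max-age-ms": 604_800_000,
}


def as_duration(value_ms):
    """Convert a millisecond setting into a timedelta for easier reading."""
    return timedelta(milliseconds=value_ms)


def is_eligible_for_cleanup(file_age_ms, max_age_ms=604_800_000):
    """A history file is only deleted once it is older than max-age-ms."""
    return file_age_ms > max_age_ms


if __name__ == "__main__":
    for name, value in JOBHISTORY_DEFAULTS_MS.items():
        print(name, "=", as_duration(value))
```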
ClassNotFoundException: org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream

Problem
After creating a data link to Hive, it is not possible to import data. The log files show the following error:
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/io/NonSyncByteArrayOutputStream
...
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream
...
Background
When loading classes, Datameer gives priority to the etc/custom-jars directory.
If custom Hive SerDes are used, Datameer expects the classes to reside in the Hive plugin. If they were first picked up from custom-jars, they are skipped when the Hive plugin is loaded, because they are already available.
Troubleshooting Steps

Review the current Hive plugin and check if all classes are in place.
Check the MD5 checksums.
Run lsof against the Datameer process ID (PID).
Note if there are classes pulled from /etc/custom-jars.
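The last two steps can be partly automated by filtering the lsof output for JAR files opened from the custom-jars directory. The sketch below works on captured lsof text; it assumes the file path is the last whitespace-separated column (so paths containing spaces are not handled), and the function name and sample line are hypothetical.

```python
def jars_loaded_from(lsof_output, directory="/etc/custom-jars/"):
    """Return the distinct JAR paths under `directory` found in lsof output."""
    jars = []
    for line in lsof_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[-1]  # assumption: the NAME column comes last
        if directory in name and name.endswith(".jar") and name not in jars:
            jars.append(name)
    return jars


if __name__ == "__main__":
    # Hypothetical lsof line for illustration only.
    sample = "java 1234 datameer mem REG 8,1 123456 789 /opt/datameer/etc/custom-jars/my-serde.jar"
    print(jars_loaded_from(sample))
```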

Solution
Ensure that custom SerDe jar files are not included in <datameer-install-path>/etc/custom-jars. If extra custom SerDe jar files are found in the custom-jars path, they need to be removed. A restart of the Datameer service is necessary to make the change active.
Hadoop Task Failed - Timed out After 600 secs

Problem
A Datameer job fails and in the job log, the following stacktrace is displayed:
ERROR [2015-01-01 00:00:00.000] [ConcurrentJobExecutor-4] (ClusterSession.java:186) - Failed to run cluster job 'Workbook job (12345): MyWorkbook with MyJob#Joined(Disconnected record stream)' [1 hrs, 18 mins, 24 sec]
java.lang.RuntimeException: Job job_1447373200318_0080 failed! Failure info: Task failed task_1447373200318_0080_r_000071 Job failed as tasks failed. failedMaps:0 failedReduces:1
at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)
at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:31)
at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:228)
at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:128)
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:181)
at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:48)
at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:135)
at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:129)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Job job_1447373200318_0080 failed! Failure info: Task failed task_1447373200318_0080_r_000071 Job failed as tasks failed. failedMaps:0 failedReduces:1
at datameer.dap.common.job.mr.DefaultMrJobClient.waitUntilJobCompletion(DefaultMrJobClient.java:234)
at datameer.dap.common.job.mr.DefaultMrJobClient.runJobImpl(DefaultMrJobClient.java:91)
at datameer.dap.common.job.mr.MrJobClient.runJob(MrJobClient.java:34)
at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:216)
... 9 more
Caused by: java.lang.RuntimeException: Task: AttemptID:attempt_1447373200318_0080_r_000071_3 Timed out after 600 secs
Cause
The timeout occurs when a task doesn't report progress on the cluster side within the specified time frame. This problem might occur due to the priorities of other tasks on that node at that time. Ultimately, the task was terminated by Hadoop because it exceeded the timeout value (in milliseconds) set by the following property:
mapreduce.task.timeout
Solution
To be more flexible, increase the timeout, for example to 6,000,000 milliseconds (100 minutes), by setting
mapreduce.task.timeout=6000000
for this job and re-running it. A Datameer administrator can implement this recommendation.
If that doesn't resolve the issue, contact Datameer Support for further assistance.
Further Information
This issue is described in the Apache Hadoop documentation of mapred-default.xml.
"The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. A value of 0 disables the timeout."
How to Configure MSSQL Connections Using Windows Authentication in Datameer

Goal
Using the JDBC driver provided by Microsoft, connections from a Linux client may not use Windows Authentication to connect to an MSSQL instance.
I want to configure MSSQL connections using Windows authentication.
Learn
To work around this limitation, it may be possible to configure Kerberos authentication and to continue to use the JDBC driver provided by Microsoft. Alternatively, there is an available open-source driver named JTDS which can be used to configure Linux clients to connect to an MSSQL instance using Windows Authentication (without Kerberos).
For Java 1.6, the suitable JTDS driver is version 1.2.8, which is available for download. For further information and documentation about this driver, please consult the project page.
To add this driver to Datameer, follow these steps:
Extract the included jtds-1.2.8.jar file from the jtds-1.2.8-dist.zip file.
Navigate to the Datameer Administration page and select the Database Drivers category from the left pane.
Click New to add the JTDS driver.
Provide a name such as "JTDS".
Upload the extracted jar file from step 1.
Select "MsSql" for the Database Driver Template.
For the driver class, input the following value:
net.sourceforge.jtds.jdbc.Driver
For the connection pattern, please use the following template:
jdbc:jtds:sqlserver://%hostName%:%port%/%database%;instance=%instance%;domain=%domain%;
Click Save to add the JTDS driver.
With the driver added to Datameer, navigate to the Browser section.
Create a new Connection. As the type, select "JTDS" from the Databases section and click Next.
Fill in the connection details using the template provided. Here is an example of a completed connection string:
jdbc:jtds:sqlserver://mymssqlserver.corp.company.com:1433/mydatabase;instance=myinstance;domain=corp.company.com;
Specify the user and password. Click Next to test the connectivity.
Assuming the connection was successful, save the new connection.
The newly added JTDS driver and connection are now ready for use.
How to Read File Type 837 (X12-837 or ANSI-837)

Goal
Import EDI 837 health care files, also known as X12-837 or ANSI-837 into Datameer.
Learn
At the moment, Datameer doesn't have native tools to parse EDI 837 files, but you can ingest them as plain text. You can then work with the data using the functionality available in a workbook.
For example, ingest the following data, taken from EDI 837 Health Care Claim:
ISA*00* *00* *ZZ*99999999999 *ZZ*888888888888 *111219*1340*^*00501*000001377*0*T*>
GS*HC*99999999999*888888888888*20111219*1340*1377*X*005010X222
ST*837*0001*005010X222
BHT*0019*00*565743*20110523*154959*CH
NM1*41*2*SAMPLE INC*****46*496103
PER*IC*EDI DEPT*EM*[email protected]*TE*3305551212
NM1*40*2*PPO BLUE*****46*54771
HL*1**20*1
PRV*BI*PXC*333600000X
NM1*85*2*EDI SPECIALTY SAMPLE*****XX*123456789
N3*1212 DEPOT DRIVE
N4*CHICAGO*IL*606930159
REF*EI*300123456
HL*2*1*22*1
Steps
Create new import job or file upload.
Choose file type CSV/TSV.
Set the appropriate delimiter, for example *.
Execute the job and link the data to a new workbook.
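Choosing * as the delimiter works because each line of the sample is one segment whose elements are separated by asterisks. The following sketch shows the same split in Python to make the resulting columns easier to picture; the function name and the returned field names are illustrative and not part of Datameer.

```python
def parse_segment(line, element_separator="*"):
    """Split one EDI segment line into its segment ID and remaining elements."""
    elements = line.rstrip("\n").split(element_separator)
    return {"segment_id": elements[0], "elements": elements[1:]}


if __name__ == "__main__":
    for raw in ("ST*837*0001*005010X222", "NM1*41*2*SAMPLE INC*****46*496103"):
        print(parse_segment(raw))
```

Consecutive asterisks produce empty elements, which correspond to empty cells in the imported sheet.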
For the example file snippet, the data then appears in the workbook. You can use Datameer functions to transform the data set according to your requirements and start your analysis.
Refer to the Importing Data section of our documentation for more details on CSV file ingestion (escape characters, custom schema, etc.).
Further Proceeding
If the tools available in Datameer can't produce the required results, you can leverage our plug-in SDK and create a custom connection for your particular use case. Our professional services (PS) team can help with custom function engineering, if required.
Datameer Maintenance Policy and Schedule

Datameer EOM Policy
Major releases are maintained for two (2) years minimum after the GA of that release or six (6) months after the GA date of the following major release (whichever is longer).
Minor releases are maintained for nine (9) months minimum after the GA of that release.
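The two baseline rules can be expressed as simple date arithmetic. The sketch below computes the minimum end-of-maintenance date for a major and a minor release; it does not cover the extension rules, the function names are illustrative, and the dates in the usage example are hypothetical rather than actual Datameer release dates.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def major_release_eom(ga, next_major_ga=None):
    """Later of: GA + 2 years, or GA of the following major release + 6 months."""
    two_years = add_months(ga, 24)
    if next_major_ga is None:
        return two_years
    return max(two_years, add_months(next_major_ga, 6))


def minor_release_eom(ga):
    """Minimum maintenance window for a minor release: GA + 9 months."""
    return add_months(ga, 9)


if __name__ == "__main__":
    # Hypothetical GA dates for illustration.
    print(major_release_eom(date(2018, 1, 15), next_major_ga=date(2019, 10, 1)))
    print(minor_release_eom(date(2019, 3, 31)))
```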
Standard Extensions
In the event that a minor release GA date is less than nine (9) months before the EOM date for the associated major release, maintenance for the associated major release will be extended until nine (9) months after the GA date of this minor release.
In the event that the latest minor release GA date is more than nine (9) months ago and the associated major release GA date is less than two (2) years ago, the minor release will still be maintained until the end of the major release or the next minor release, whichever comes first.
The table below shows end-of-maintenance dates up to twelve (12) months ago and into the future.
Definition of Maintenance
Active customers may open tickets with Datameer's Technical Support team for support. Datameer's Technical Support will also assist active customers with upgrading to a maintained release.
Maintenance also includes software updates for bug fixes and security vulnerability resolutions. Bug fixes will be made to maintained Minor Releases only. Security vulnerability resolutions will be made available in all maintained Minor Releases.
Security vulnerabilities and severe bugs will be repaired in a Maintenance Release. All other bugs will be repaired in future Major or Minor releases.
Technical support for out-of-maintenance releases may be requested in the Datameer Community.
Definition of Terms
Major Release - Version number X.0.0, where X changes.
Minor Release - Version number x.Y.0, where Y changes.
Maintenance Release - Version number x.y.Z, where Z changes.
End of Maintenance (EOM) - The last date a particular release will be maintained. This includes bug fixes and security patches.
General Availability (GA) - The date that a product was released to all customers.
Active Customers - These are customers with valid and active Maintenance and Support contracts with Datameer.
Severe Bug - A software defect, for which no workaround is available, that severely limits operation within a production environment.
Security Vulnerability Bug - A software defect that produces a weakness that could allow an attacker to compromise the integrity, availability or confidentiality of Datameer.
Maintenance Schedule
Product                 EOM Date
Datameer X (Major)      2021-11-13
Datameer 10.0           2020-08-13
Datameer 7 (Major)      2020-05-02
Datameer 7.5 (Minor)    2020-05-02
Datameer 7.4 (Minor)    2020-02-11
Datameer 7.2 (Minor)    2019-05-17
Datameer 7.1 (Minor)    2018-12-13
Datameer 6 (Major)      2018-09-12
Versions not listed above are no longer maintained by Datameer.
AccessControlException: Queue root.default already has 10000

Problem
Datameer can't schedule any new jobs. The logs show the following exception:
Error Message
Failed to submit application_<id> to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 10000 applications, cannot accept submission of application: application_<id>
Cause
This error comes from the YARN Capacity Scheduler. Based on the document Configuring YARN Capacity Scheduler Ambari, it appears that the yarn.scheduler.capacity.maximum-applications limit, which is set to 10,000 by default, has been reached.
Solution

Check the Hadoop cluster queue and clear out any unnecessary jobs.
Increase the limit in the YARN configuration to allow more concurrent applications to be submitted.
User sync will fail while importing users from LDAP groups with more than 1500 members

Problem
When a user tries to log in, Datameer reports the error "Login failed: User '<Username>' could not be authenticated." This happens even though the group exists in Datameer and the user account is part of that group.
Cause
One possible cause is a group that contains more than 1500 members: the LDAP search then fails to retrieve group information for any of those users. As a result, Datameer is not able to sync those users into its cache.
This seems to come from the LDAP policy value MaxValRange:
"MaxValRange controls the number of values that are returned on a single attribute on a single object. Default: 1500. Hard limit: 5000."
Solution
Increase MaxValRange to a value larger than the number of users within the specified group. More about the parameter can be found on the LDAP Wiki page on MaxValRange.
How to Add Custom Parameters to a JDBC String

Goal
This article describes how custom parameters may be added to a JDBC connection in Datameer. For example, this technique could be used to enable the "tinybit" property in MySQL or to define username and password for MSSQL.
Learn
In general, custom properties for JDBC may be added using the following designation
?property=value
Example 1
To set the tinybit property to true in MySQL, add the following property:
?tinybit=1
Based on this property and the default MySQL JDBC connection pattern, the full connection pattern to use is:
jdbc:mysql://%host%:%port%/%database%?tinybit=1
Example 2
To define the username and password for MSSQL, add the following properties:
username=<user>;password=<pass>;
Based on this property and the default MSSQL JDBC connection pattern, here is the full connection pattern to use
jdbc:sqlserver://<host>:<port>;instance=MSSQLSERVER;DatabaseName=<database>;username=<user>;password=<pass>;
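The two examples differ only in the separator: MySQL-style patterns take properties after a "?" joined by "&", while SQL Server-style patterns take "key=value;" pairs. The Python sketch below illustrates that difference. It is not a Datameer utility; the helper names are hypothetical, and values are appended as-is without any escaping.

```python
def add_mysql_params(pattern, params):
    """Append ?key=value&key=value to a MySQL-style JDBC connection pattern."""
    if not params:
        return pattern
    # Use "&" if the pattern already carries a query string.
    separator = "&" if "?" in pattern else "?"
    return pattern + separator + "&".join(f"{k}={v}" for k, v in params.items())


def add_mssql_params(pattern, params):
    """Append key=value; pairs to a SQL Server-style JDBC connection pattern."""
    base = pattern if pattern.endswith(";") else pattern + ";"
    return base + "".join(f"{k}={v};" for k, v in params.items())


mysql_pattern = add_mysql_params(
    "jdbc:mysql://%host%:%port%/%database%", {"tinybit": "1"}
)
mssql_pattern = add_mssql_params(
    "jdbc:sqlserver://<host>:<port>;instance=MSSQLSERVER;DatabaseName=<database>",
    {"username": "<user>", "password": "<pass>"},
)
print(mysql_pattern)
print(mssql_pattern)
```

The printed patterns match the full connection patterns shown in Example 1 and Example 2.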
Duplicate Records Processed in Workbook Functions

Problem
Functions produce duplicate records on some workbook sheets. Specifically, source Parquet files smaller than a size threshold may be read twice. Any downstream calculations will include the duplicated data. No errors indicating an issue are displayed or logged.
For example, a grouping Workbook is expected to produce 15 results, but actually produces 20 results.
Cause
This is a software defect in Datameer, known internally as DAP-36752.
The problem only occurs if the job processes at least 1 small file (smaller than the threshold) and at least 1 large file (bigger than the threshold). The exact threshold depends on the system settings for the following values:
minSplitSize: Value of the property mapreduce.input.fileinputformat.split.minsize (default 128MB)
maxSplitSize: Value of the property mapreduce.input.fileinputformat.split.maxsize (default 512MB)
parquetMaxBlockSize: Value of the property das.parquet-storage.max-parquet-block-size (default 256MB)
The threshold is calculated as the maximum of minSplitSize and the smaller of parquetMaxBlockSize and maxSplitSize. With the default values, the smaller of parquetMaxBlockSize (256MB) and maxSplitSize (512MB) is 256MB. Comparing that result with minSplitSize (128MB) and taking the maximum gives a threshold of 256MB.
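The calculation can be written out in a few lines of Python. This is a sketch of the arithmetic described above, not Datameer's internal code; the function names are hypothetical, and the affected-job check simply restates the condition of at least one file below and one file above the threshold.

```python
MB = 1024 * 1024


def duplicate_read_threshold(
    min_split_size=128 * MB,
    max_split_size=512 * MB,
    parquet_max_block_size=256 * MB,
):
    """max(minSplitSize, min(parquetMaxBlockSize, maxSplitSize)), in bytes."""
    return max(min_split_size, min(parquet_max_block_size, max_split_size))


def job_may_be_affected(file_sizes, threshold):
    """True if the input mixes at least one smaller and one larger file."""
    has_small = any(size < threshold for size in file_sizes)
    has_large = any(size > threshold for size in file_sizes)
    return has_small and has_large


threshold = duplicate_read_threshold()
print(threshold // MB)  # 256 with the default values
print(job_may_be_affected([10 * MB, 300 * MB], threshold))  # mixed sizes
print(job_may_be_affected([10 * MB, 20 * MB], threshold))   # only small files
```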
Versions Affected
7.1.3, 7.1.4 and 7.1.5
6.4.7, 6.4.8 and 6.4.9
6.3.9 and 6.3.10
Workaround
To work-around this issue, splitting can be disabled by adding the following Custom Property to the Hadoop Cluster's Custom Properties configuration:
das.splitting.disable-individual-file-splitting=true
Adding this property will negatively affect the job's performance so it is advised to install the maintenance release as soon as possible.
If the work-around is applied, it should be deactivated after updating to a fixed release.
Solution
Apply the latest Datameer maintenance release to resolve this issue. A fix for this issue is included in 6.4.10 and 7.1.6 and higher releases.
How to Implement Debug Logging for sFTP Imports and Exports

Goal
Troubleshoot problematic Import Jobs and Export Jobs using an sFTP connection by implementing additional logging to debug and to determine what is preventing a successful job run.
Learn
First, depending on the version of SSH/sFTP running on the host, there is occasionally a caching problem. Try re-running the job with the following Custom Property.
fs.sftp.enable.session-cache=false
If this does not resolve the issue, remove the above parameter.
To begin debug troubleshooting, configure the artifact in question for a specific execution framework. Then implement the enhanced logging.
For Tez, the following should additionally be set within the import-specific Custom Properties of the related jobs:
das.execution-framework=Tez
fs.sftp.enable.debug=true
tez.task.log.level=DEBUG
tez.am.log.level=DEBUG
Set the Default log severity to:
TRACE
And Logging Customization of:
log4j.category.datameer=TRACE
log4j.category.datameer.awstasks=DEBUG
log4j.category.awstasks.com.jcraft=DEBUG
log4j.category.org.apache.hadoop=DEBUG
Further Information
Include Hadoop Task Logs
Often, comparing the sshd_config and ssh_config files from the Datameer Host, Data Nodes, and sFTP Host can be a quick path to resolving sFTP issues. Notably, supported authentication and encryption mechanisms should be identical on all machines.
Data Link Always Uses Five Splits.

Problem
If you have a file-based data source with many partitions and a significant amount of data, poor performance can be observed when running a data link. Upon investigation of the YARN application logs, it can be seen that five splits are always being used instead of the optimal calculated number of splits.
---------- Split Settings ----------
min/max split size: 16.0 MB (8.0 MB) / 5.0 GB
min/max split count: 0 / 5
total input size: 599.4 GB
slot count: 6980
number of desired tasks: 5
optimal split size: 119.9 GB
optimal split count: 5
-----------------------------------
Regardless of changing the min/max split size, min/max split count, and wave count - the 'optimal' split count always remains 5 with only 5 tasks.
This results in poor performance when the data link is run to generate the sample. In this instance there were almost 20,000 partitions, which means there would be 100,000 sample records generated five tasks at a time.
Cause
This was an intentional design decision made by the engineering team. The rationale is that data link samples aren't doing analytical work, so they should have fewer resources allocated to them on the cluster. This way, jobs doing analytics run faster and receive proportionally more resources.
Solution
This behavior can be overridden by explicitly setting the number of splits for data link sample generation. The property that controls this behavior is:
das.splitting.datalink.sample-split-count
This property has a default value of 5, which explains the behavior described in the problem section. Add this property to the Custom Properties of the data link job, and increase the value beyond the default of 5 for added parallelism and more splits.
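As a rough illustration of what a higher value means, the split size reported in the log above is simply the total input size divided by the split count. The Python sketch below reproduces that arithmetic and formats the Custom Property line. The helper names and the example value of 500 are hypothetical; the right value depends on the cluster and the data.

```python
def split_size_gb(total_input_gb, split_count):
    """Approximate size of each split when the input is divided evenly."""
    return total_input_gb / split_count


def sample_split_property(split_count):
    """Format the Custom Property line for the data link job."""
    return f"das.splitting.datalink.sample-split-count={split_count}"


total_input_gb = 599.4  # "total input size" from the log excerpt above

default_size = round(split_size_gb(total_input_gb, 5), 1)
larger_count_size = round(split_size_gb(total_input_gb, 500), 1)

print(default_size)        # matches the 119.9 GB "optimal split size" in the log
print(larger_count_size)   # split size with a hypothetical value of 500
print(sample_split_property(500))
```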
Kerberos token failure against cluster using Isilon for storage.

Problem
Attempting to run a job against a cluster using Isilon as the storage backend fails with the following exceptions:
Diagnostics: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS];
...
[system] WARN [2018-09-24 14:21:15.833] [ClusterMetadataUpdater thread-1] (Client.java:711) - Couldn't setup connection for [email protected] to <hostname>.com/<IP Address:Port> javax.security.sasl.SaslException: No common protection layer between client and server
at com.sun.security.sasl.gsskerb.GssKrb5Client.doFinalHandshake(GssKrb5Client.java:251)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:186)
Cause
Isilon HDFS clusters require use_ip for tokens to be set to false for the whole cluster. When use_ip is set to false, all delegation tokens are represented by hostnames rather than IPs. This is a requirement of the Isilon architecture itself, since the Isilon name node is "rolling" among a few servers.
However, due to a bug reported in MAPREDUCE-6565, in HDP environments, execution frameworks will always take the use_ip setting from the core-site.xml in the local mr-framework/hadoop/etc/hadoop directory on the distributed cache. In HDP's original distribution, core-site is left empty, so the Application Master will use the default value (true) for use_ip (hadoop.security.token.service.use_ip). When a job is submitted from a client with use_ip=false but the Application Master uses use_ip=true, the AM will not be able to initialize the SASL client with the name node.
Solution
Update the hadoop-site.xml file within the Datameer Tez plugin to ensure that the use_ip setting will be set to false.
1. Shut down the Datameer conductor. (./conductor.sh stop)
2. Navigate to <Datameer Home>/plugins and copy the plugin-tez-<version>.zip file to a temporary location.
3. Unzip the plugin file and edit /classes/hadoop-site.xml
4. Within the <configuration> tags add the following:
<property>
<name>hadoop.security.token.service.use_ip</name>
<value>false</value>
<description>Value for Isilon</description>
</property>
5. Save the XML file and then re-zip the plugin contents to create a new plugin-tez-<version>.zip
6. Replace the original plugin zip with the new modified copy.
7. Restart the Datameer conductor.
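Step 4 can also be scripted instead of edited by hand. The Python sketch below adds or updates a property in a Hadoop-style configuration document using only the standard library. It is a minimal illustration, not a Datameer tool: the function name is hypothetical, it works on the XML text rather than on the zip file, and it does not preserve comments or the XML declaration of the original file.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(configuration_xml, name, value, description=None):
    """Return the configuration XML with the given property added or updated."""
    root = ET.fromstring(configuration_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already exists: only overwrite its value.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property is missing: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        if description:
            ET.SubElement(prop, "description").text = description
    return ET.tostring(root, encoding="unicode")


original = "<configuration></configuration>"
updated = set_hadoop_property(
    original,
    "hadoop.security.token.service.use_ip",
    "false",
    "Value for Isilon",
)
print(updated)
```

The steps for stopping the conductor, unzipping, re-zipping, and replacing the plugin remain the same as listed above.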
How to Collect the Network HAR File Logs from a Developer Tools Session

Goal
I want to collect network HAR file logs from a developer tools session.
Learn
To troubleshoot support issues effectively, you may at times be asked to provide a network HAR file from the Developer Tools section of Google Chrome or your preferred browser.
In this article, we will be reviewing how to capture a trace from Google Chrome.
In order to capture a network trace, begin by first opening the Developer Tools section of Google Chrome. This can be found by selecting the drop down in the upper right of the browser window (next to Settings) and navigating to More Tools > Developer Tools:
Once opened, navigate to the Network tab and perform the reproduction action that prompted the opening of the support case (ex. open the troublesome workbook). This will populate the network section of Developer Tools with the information necessary for support:

Once the issue is reproduced, right click anywhere in the Developer Tools pane and select Save As HAR with Content:

Please be sure to include this HAR file as an attachment to the support ticket.
Further Information
Something similar is possible with Internet Explorer (IE). You can use the included developer tools to export a debug session into a file called NetworkData in XML format.
How to Collect the YARN Application Logs - Manual Method

Goal
This article describes a manual method for collecting YARN Application logs if log aggregation is not enabled. The automated and recommended method is outlined in this article: How to Collect the YARN Application Logs
Learn
Follow the steps in the above article to identify the Application ID for the affected job. Once the application ID is known, follow these steps:
1. Navigate to the Resource Manager UI then find the application ID and click on the link.
2. Click on the Logs button for the Application attempt.
3. For each of the log files displayed, open the full log and then save the file. Ensure that the syslog, syslog_dag, stdout, and stderr files are captured at a minimum.
This concludes the steps to collect the logs for the Application Master. In addition, there might be other containers that were created to execute this particular application. The same logs might be required to be collected from any failed or suspect containers as well.
How to Enable Tez History/UI for Hadoop?

We have found the following setup to work.
Hadoop Cluster
Since your Hadoop cluster needs a running Timeline Server, check that the following properties are enabled in yarn-site.xml
...
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
...
<property>
<name>yarn.timeline-service.hostname</name>
<value>localhost</value>
</property>
...
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
...
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
...
By default, the Timeline Server runs on port 8188.
If it is not running, start it with ./sbin/yarn-daemon.sh start timelineserver
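Before moving on, it can help to confirm that yarn-site.xml really contains the four properties listed above. The Python sketch below parses the file content and reports anything that is missing or not set to true. It is a hedged helper for illustration only; the function names and the sample XML are hypothetical, and it assumes the usual Hadoop <configuration>/<property> layout.

```python
import xml.etree.ElementTree as ET

# Properties from the list above that must be set to "true".
REQUIRED_TRUE = (
    "yarn.timeline-service.enabled",
    "yarn.timeline-service.http-cross-origin.enabled",
    "yarn.resourcemanager.system-metrics-publisher.enabled",
)


def read_properties(xml_text):
    """Return a name -> value dict from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): (prop.findtext("value") or "").strip()
        for prop in root.findall("property")
    }


def missing_timeline_settings(xml_text):
    """List the Timeline Server settings that are absent or not enabled."""
    props = read_properties(xml_text)
    missing = [n for n in REQUIRED_TRUE if props.get(n, "").lower() != "true"]
    if not props.get("yarn.timeline-service.hostname"):
        missing.append("yarn.timeline-service.hostname")
    return missing


# Hypothetical, incomplete yarn-site.xml content.
sample_yarn_site = """
<configuration>
  <property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.hostname</name>
    <value>localhost</value>
  </property>
</configuration>
"""

print(missing_timeline_settings(sample_yarn_site))
```

An empty list means all four settings from this section are present.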
On Datameer Application Server
Download and unpack Apache Tomcat 8.5.20
Configure the HTTP port in conf/server.xml e.g. 8280
Download and unzip Apache Tez 0.7.1 binaries
Delete everything under TOMCAT_HOME/webapps/*
Unzip the tez-ui-0.7.1.war into TOMCAT_HOME/webapps/ROOT
Edit TOMCAT_HOME/webapps/ROOT/scripts/configs.js
timelineBaseUrl must point to the Timeline Server (ATS)
RMWebUrl must point to the Resource Manager (RM)
Double-check that JAVA_HOME is set. (It should be, as it is a prerequisite for Datameer.)
Start the server with TOMCAT_HOME/bin/catalina.sh start
Configure a test job with Hadoop Custom Properties and Debug Logging Implemented
das.execution-framework=Tez
das.debug.tasks.logs.collect.force=true
tez.task.log.level=DEBUG
tez.am.log.level=DEBUG
tez.allow.disabled.timeline-domains=true
tez.am.history.logging.enabled=true
tez.dag.history.logging.enabled=true
tez.history.logging.service.class=org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService
tez.tez-ui.history-url.base=http://<thisHost>:8080/#/main/view/TEZ/tez_cluster_instance
yarn.timeline-service.enabled=true
yarn.timeline-service.hostname=<ATS>
If you have a Kerberos Secured cluster you should configure the cluster in the yarn-site.xml as described in section Security Configuration
[email protected]
yarn.timeline-service.keytab=/home/datameer/datameer.keytab
Test
Execute test job
Gather and review logs, yarn-site.xml, Tomcat/catalina.log, full job trace
Check that everything is working properly
Datameer Support Holiday Schedule

This document outlines the Datameer Support Holiday schedule.
Datameer Support holidays are observed regionally and indicate limited support coverage for business hours on the given day. Service level agreements with 24x7x365 support are not affected by holidays; only "business hours" service level agreements are affected by holidays. If you have any questions regarding your organization's support subscription, please reach out to your Account Executive.
Here are the business hours for each region that will be closed for a regional holiday:
Region | Start Time | End Time | Timezone(s)
AMER | 0600 | 1800 | Pacific Time (GMT-8; GMT-7)
EMEA | 0900 | 1800 | Berlin Time (GMT+1; GMT+2)
2019 Holiday Schedule
Holiday Title | Date Observed | Region(s) Observed
New Year's Day | January 1, 2019 | Worldwide
Epiphany | January 6, 2019 | EMEA
Martin Luther King Jr. Day | January 21, 2019 | AMER
President's Day | February 18, 2019 | AMER
Good Friday | April 19, 2019 | Worldwide
Easter Monday | April 22, 2019 | EMEA
Labor Day (Germany) | May 1, 2019 | EMEA
Memorial Day | May 27, 2019 | AMER
Ascension | May 30, 2019 | EMEA
Whit Monday | June 10, 2019 | EMEA
US Independence Day | July 4, 2019 | AMER
Labor Day (US) | September 2, 2019 | AMER
Day of German Unity | October 3, 2019 | EMEA
Thanksgiving (US) | November 28, 2019 | AMER
Day After Thanksgiving (US) | November 29, 2019 | AMER
Christmas | December 25, 2019 | Worldwide
Day After Christmas | December 26, 2019 | EMEA
2020 Holiday Schedule
Holiday Title | Date Observed | Region(s) Observed
New Year's Day | January 1, 2020 | Worldwide
Epiphany | January 6, 2020 | EMEA
President's Day | February 17, 2020 | AMER
Spring Friday | April 10, 2020 | Worldwide
Easter Monday | April 13, 2020 | EMEA
Labor Day (Germany) | May 1, 2020 | EMEA
Ascension | May 21, 2020 | EMEA
Memorial Day | May 25, 2020 | AMER
Whit Monday | June 1, 2020 | EMEA
US Independence Day | July 3, 2020 | AMER
Labor Day (US) | September 7, 2020 | AMER
Thanksgiving (US) | November 26, 2020 | AMER
Day After Thanksgiving (US) | November 27, 2020 | AMER
Christmas Eve | December 24, 2020 | AMER
Christmas | December 25, 2020 | Worldwide
Business Impact Descriptions and Examples

Business Impact Descriptions
Severe: Datameer is entirely unusable for all users. The situation completely halts your business operations and no workaround exists.
High: Datameer functions partially. The situation is causing a significant impact to your business operations and no workaround exists.
Moderate: A problem that involves partial, non-critical loss of use of the software.
Low: A general usage question, reporting of a documentation error, or recommendation for a future product enhancement or modification. There is low-to-no impact on your business or the performance or functionality of your system.
Examples
Example 1: Datameer is offline and no user can login. The Datameer administrator has attempted to restart Datameer, but it remains inaccessible for all users.
This example is a Severe impact.
Example 2: When configuring a new artifact (e.g., an Import Job), the setup or first execution of this artifact fails.
This example is a Moderate or Low impact, depending on the use case that is being created.
Example 3: An existing artifact (e.g., a Workbook) that has been running successfully in the past is now failing to execute. Successful execution of this artifact is crucial to the business.
This example is a High impact.
Support Ticket Severity Definition Table

This document describes how the severity is assigned to support tickets.
The severity of a ticket is defined by two factors, the environment type and the business impact. The following table outlines the severity for each possible scenario:
Business Impact | Production Environment | Non-Production Environment
Severe Business Impact | Severity 1 | Severity 3
High Business Impact | Severity 2 | Severity 3
Moderate Business Impact | Severity 3 | Severity 4
Low Business Impact | Severity 4 | Severity 4
For more detailed information about the business impacts, please review the descriptions and examples.
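The table amounts to a lookup keyed by business impact and environment type. The Python sketch below encodes it that way, which can be handy when pre-classifying tickets in an internal tool. The names are hypothetical, and the mapping simply restates the table above.

```python
# (business impact, is production environment) -> severity, per the table above.
SEVERITY_TABLE = {
    ("Severe", True): 1,
    ("Severe", False): 3,
    ("High", True): 2,
    ("High", False): 3,
    ("Moderate", True): 3,
    ("Moderate", False): 4,
    ("Low", True): 4,
    ("Low", False): 4,
}


def ticket_severity(business_impact, production):
    """Look up the ticket severity for a business impact and environment type."""
    key = (business_impact.strip().capitalize(), bool(production))
    if key not in SEVERITY_TABLE:
        raise ValueError(f"Unknown business impact: {business_impact!r}")
    return SEVERITY_TABLE[key]


print(ticket_severity("Severe", production=True))
print(ticket_severity("severe", production=False))
print(ticket_severity("Moderate", production=False))
```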
Datameer Maintenance and Support Policy

This Maintenance and Support Policy (“Policy”) describes the current practices of Datameer with regard to its provision of technical support and maintenance services to entities that have entered into an agreement for Datameer’s Software (each such entity, a “Customer”). Datameer will not modify the terms of your Policy during the initial term of your license; however, if you renew your license, then the version of this Policy that is current at the time of renewal will apply for your renewal term.
1) Definitions
“Business Day” means Monday through Friday, Coordinated Universal Time (UTC), excluding holidays observed by Datameer. Datameer holidays are published here:
https://www.datameer.com/supportholidays
“Business Hours” means 12:00 a.m. to 11:59 p.m., UTC on Business Days.
“Active Customer” means a Customer with a valid and unexpired Enterprise Support Services subscription with Datameer.
“Support Contact” means a representative from an Active Customer that has successfully completed Datameer Administration certification.
“Supported Instance” means a server belonging to an Active Customer running Datameer software.
2) Technical Support Technical support is available with a subscription to a Datameer Support Service package which defines service level agreements. These guidelines define the following areas:
a) Support Tickets Every Support Contact is required to create an account in the Customer Support Center prior to opening a ticket. Once a Support Contact has created an account and has logged into the Customer Support Center, the contact may manage open tickets, review previously solved tickets and may submit new tickets. When submitting tickets, a Support Contact must provide the following information: (a) a description of the issue; (b) the step-by-step process to reproduce the issue; (c) the error messages associated with the issue; (d) any additional data available, or required as determined by Datameer, including but not limited to stack traces, configuration settings, and related information; and (e) information necessary to classify the severity of the issue.
b) Support Hotline Support Contacts may phone in new tickets or may request continued assistance with existing tickets at +1-800-874-0569.
c) Service Levels - Service Levels are defined per Datameer support service packages set forth below:
i) Enterprise Standard - Bundled with Datameer Enterprise on AWS Marketplace (Hourly Package)
Severity | Active Hours | Initial Response | Update Frequency
1 | Business Hours | Within 3 hours | Updated every 1 business day
2 | Business Hours | Within 6 hours | Updated every 1 business day
3 | Business Hours | Within 24 hours | Updated every 2 business days
4 | Business Hours | Within 3 business days | Updated every 5 business days
ii) Enterprise Silver
Severity | Active Hours | Initial Response | Update Frequency
1 | Business Hours | Within 2 hours | Updated every 2 hours
2 | Business Hours | Within 4 hours | Updated every 1 business day
3 | Business Hours | Within 12 hours | Updated every 2 business days
4 | Business Hours | Within 1 business day | Updated every 5 business days
iii) Enterprise Gold
Severity | Active Hours | Initial Response | Update Frequency
1 | 24x7x365 | Within 1 hour | Updated every 1 hour
2 | 24x7x365 | Within 2 hours | Updated every 1 business day
3 | Business Hours | Within 4 hours | Updated every 2 business days
4 | Business Hours | Within 8 hours | Updated every 3 business days
If Datameer provides a work-around that corrects an issue, but the Support Contact does not consider the work-around to be a reasonable solution, the priority level of the ticket will be updated to Severity 3.
Initial response is satisfied with either an inbound Customer phone call answered, a phone call placed to the Customer or a public comment to the ticket where the Support Contact is also notified in writing, with an action plan on the initial steps required to begin the problem resolution process. Given the heightened urgency around Severity 1 and 2 tickets, initial response may include an invitation to participate in a screen share session to shorten time to problem isolation.
d) Severity Level Definitions
Severity 1: A problem that severely impacts Customer’s use of Datameer in a production environment (i.e. loss of production data or a production system is not functioning). Additionally, the situation halts routine business operations and no work-around exists.
Severity 2: A problem where Datameer is functioning but Customer’s use in a production environment is severely reduced (i.e., a job-failure of a business critical job). The situation is causing a high impact to your business operations and no workaround exists.
Severity 3: A problem that involves partial, non-critical loss of use of the software in a production environment or development environment. For production environments, there is a medium-to-low impact on Customer’s business, but Customer’s business continues to function, including by using a workaround. For development environments, Customer’s usage of Datameer is severely reduced.
Severity 4: A general usage question, reporting of a documentation error, or recommendation for a future product enhancement or modification. For production environments, there is low-to-no impact on Customer’s business or the performance or functionality of Customer’s system. For development environments, there is a medium-to-low impact on Customer’s business, but Customer’s business continues to function, including by using a workaround.
e) Scope of Enterprise Technical Support Services An Active Customer may contact Datameer Enterprise Technical Support by opening a ticket via the Customer Support Center to request information regarding the use, configuration or operation of the Datameer software running on any Supported Instance. Enterprise Technical Support services include responses to submitted tickets pertaining to questions and resolving technical problems in the following scope:
Best practices for setting up and configuring a Supported Instance for running Datameer software, including:
System requirements and 3rd party software compatibility
Installation, deployment, migration and upgrading
Using supported and available functions and features
Operational support for a Supported Instance running Datameer software, including:
Best practices for using Datameer software, functions and features
Identifying, diagnosing and fixing errors in Datameer software
Preventing and recovering from failures and troubleshooting
Problem diagnosis and resolution, including:
Problem isolation and diagnosis of errors in Datameer software
Patches and workarounds to fix bugs in Datameer software
Product Enhancement Requests
Providing feedback to the Datameer product team
Submitting product enhancement request to the Datameer product team
f) Additional Services (Available Upon Request) Additional services are available by request for an added cost. The following services may be available for Active Customers, contact your sales representative for pricing:
Installation, Deployment, Migration and Upgrading
Technical Support includes providing documentation and clarifying requirements.
Active Customers may request a consultant to perform these tasks on or off site.
Datameer Systems Integration
Enterprise Technical Support includes fixing Datameer software errors and best practice recommendations for integrating with other systems.
Active Customers may request a consultant to design, optimize and deploy an integration between Datameer and other systems.
Use Case Development
Enterprise Technical Support includes best practice recommendations for functions and features.
Active Customers may request a consultant to scope, design, build and deploy a use case, or help guide use case requirements gathering and assessment.
Product Training
Enterprise Technical Support includes providing documentation for the Datameer product.
Active Customers may request Product Training.
Non-Recurring Engineering
Enterprise Technical Support includes fixing Datameer software errors and providing documentation for the Datameer Software Development Kit (SDK).
Active Customers may request Non-Recurring Engineering Services to scope, design, build and deploy a custom plugin using the SDK.
3) Customers without a Support Contract - Small private offer deals on the Amazon Marketplace do not include a Datameer Support Service package. These customers may submit questions on the Datameer Community, which is free to all customers.
4) Success Management - When you purchase a subscription to the Datameer Support Service package, Datameer will assign a Success Manager to your account. The service level will vary depending on your subscription, which defines the level of engagement. Please see below guidelines for the service levels under each subscription.
Deliverables - Success Manager deliverables per Datameer Support Service package are set forth below:
i) Enterprise Silver
Remote Customer Checkpoint with Business and IT Leadership - Up to 1 per month
ii) Enterprise Gold
Remote Customer Checkpoint with Active Customer - Up to 4 per month
On-site Customer Checkpoint with Active Customer - Up to 1 per month
Business Review - Up to 1 per quarter
Datameer Health Dashboard Report and Review - Up to 1 per quarter
Datameer Product Roadmap Review - Up to 1 per quarter
Coordinate Beta Program Participation for New Features - As needed
5) Maintenance Policy - The Datameer maintenance policy is published here: https://www.datameer.com/maintenancepolicy
6) Changes to Policy - Datameer reserves the right, at its discretion, to change the Policy and the policies within it at any time based on prevailing market practices and the development of Datameer's software products.
Legacy Maintenance and Support Services are also published for customers with older multi-year subscriptions.
Legacy Maintenance and Support Services

Please note that this document describes Legacy Maintenance and Support Services from Datameer. These are no longer offered and are published for reference only. The current offerings are available at Enterprise Maintenance and Support Services.
1) Definitions
“Business Day” means Monday through Friday, Coordinated Universal Time (UTC), excluding holidays observed by Datameer. Datameer holidays are published here:
https://datameer.zendesk.com/hc/en-us/articles/211483666-Datameer-Support-Holiday-Schedule
“Business Hours” means 12:00 a.m. to 11:59 p.m., UTC on Business Days.
“Active Customer” means a customer with a valid and unexpired Enterprise Support Services subscription with Datameer.
“Standard Support Services” means the basic support services provided at no cost to Customer.
“Support Contact” means a representative from an Active Customer.
“Supported Instance” means a server belonging to an Active Customer running Datameer software.
2) Technical Support Technical support is available with a subscription to a Datameer Support Service package which defines service level agreements. These guidelines define the following areas:
a) Support Tickets Every Support Contact is required to create an account in the Customer Support Center prior to opening a ticket. Once a Support Contact has created an account and has logged into the Customer Support Center, the contact may manage open tickets, review previously solved tickets and may submit new tickets. When submitting tickets, a Support Contact must provide the following information: (a) a description of the issue; (b) the step-by-step process to reproduce the issue; (c) the error messages associated with the issue; (d) any additional data available, or required as determined by Datameer, including but not limited to stack traces, configuration settings, and related information; and (e) information necessary to classify the severity of the issue.
b) Support Hotline Support Contacts may phone in new tickets or may request continued assistance with existing tickets at +1-800-874-0569.
c) Service Level Agreement
Severity | Active Hours | Initial Response | Update Frequency
1 | Business Hours | Within 3 hours | Continuous effort until relief provided
2 | Business Hours | Within 6 hours | Updated every 1 business day
3 | Business Hours | Within 1 business day | Updated every 2 business days
4 | Business Hours | Within 3 business days | Updated every 5 business days
If Datameer provides a work-around that corrects an issue, but the Support Contact does not consider the work-around to be a reasonable solution, the priority level of the ticket will be updated to Severity 3.
Initial response is satisfied with either an inbound customer phone call answered, a phone call placed to the customer or a public comment to the ticket where the Support Contact is also notified in writing, with an action plan on the initial steps required to begin the problem resolution process. Given the heightened urgency around Severity 1 and 2 tickets, initial response may include an invitation to participate in a screen share session to shorten time to problem isolation.
d) Severity Level Definitions
Severity 1: A problem that severely impacts customer’s use of Datameer in a production environment (i.e. loss of production data or a production system is not functioning). Additionally, the situation halts routine business operations and no work-around exists.
Severity 2: A problem where Datameer is functioning but customer’s use in a production environment is severely reduced (i.e., a job-failure of a business critical job). The situation is causing a high impact to your business operations and no workaround exists.
Severity 3: A problem that involves partial, non-critical loss of use of the software in a production environment or development environment. For production environments, there is a medium-to-low impact on customer’s business, but customer’s business continues to function, including by using a workaround. For development environments, customer’s usage of Datameer is severely reduced.
Severity 4: A general usage question, reporting of a documentation error, or recommendation for a future product enhancement or modification. For production environments, there is low-to-no impact on customer’s business or the performance or functionality of customer’s system. For development environments, there is a medium-to-low impact on customer’s business, but customer’s business continues to function, including by using a workaround.
e) Scope of Enterprise Technical Support Services
An Active Customer may contact Datameer Enterprise Technical Support by opening a ticket via the Customer Support Center to request information regarding the use, configuration or operation of the Datameer software running on any Supported Instance. Enterprise Technical Support services include responses to submitted tickets pertaining to questions and resolving technical problems in the following scope:
Best practices for setting up and configuring a Supported Instance for running Datameer software, including:
System requirements and 3rd party software compatibility
Installation, deployment, migration and upgrading
Using supported and available functions and features
Operational support for a Supported Instance running Datameer software, including:
Best practices for using Datameer software, functions and features
Identifying, diagnosing and fixing errors in Datameer software
Preventing and recovering from failures and troubleshooting
Problem diagnosis and resolution, including:
Problem isolation and diagnosis of errors in Datameer software
Patches and workarounds to fix bugs in Datameer software
Product Enhancement Requests
Providing feedback to the Datameer product team
Submitting product enhancement requests to the Datameer product team
f) Additional Services (Available Upon Request)
Additional services are available by request for an added cost. The following services may be available for Active Customers; contact your sales representative for pricing:
Installation, Deployment, Migration and Upgrading
Technical Support includes providing documentation and clarifying requirements.
Active Customers may request a consultant to perform these tasks on or off site.
Datameer Systems Integration
Enterprise Technical Support includes fixing Datameer software errors and best practice recommendations for integrating with other systems.
Active Customers may request a consultant to design, optimize and deploy an integration between Datameer and other systems.
Use Case Development
Enterprise Technical Support includes best practice recommendations for functions and features.
Active Customers may request a consultant to scope, design, build and deploy a use case, or help guide use case requirements gathering and assessment.
Product Training
Enterprise Technical Support includes providing documentation for the Datameer product.
Active Customers may request Product Training.
Non-Recurring Engineering
Enterprise Technical Support includes fixing Datameer software errors and providing documentation for the Datameer Software Development Kit (SDK).
Active Customers may request Non-Recurring Engineering Services to scope, design, build and deploy a custom plugin using the SDK.
3) Premium Support Service Plans - Customers may upgrade their Standard Support Service by purchasing Datameer’s optional add-on support service packages, Enhanced Level Support Services or Elite Level Support Services, which provide for additional service level hours or support. Enhanced and Elite Level Support Services are available by request for an added cost.
a) Enhanced Level Support Service Levels
Severity | Active Hours | Initial Response | Update Frequency
1 | 24x7x365 | Within 1 hour | Continuous effort until relief provided
2 | Business Hours | Within 4 hours | Updated every 1 business day
3 | Business Hours | Within 12 hours | Updated every 2 business days
4 | Business Hours | Within 1 business day | Updated every 5 business days
b) Elite Level Support Service Levels
Severity | Active Hours | Initial Response | Update Frequency
1 | 24x7x365 | Within 1 hour | Continuous effort until relief provided
2 | 24x7x365 | Within 2 hours | Updated every 4 hours
3 | Business Hours | Within 8 hours | Updated every 1 business day
4 | Business Hours | Within 1 business day | Updated every 3 business days
4) Maintenance Policy - The Datameer maintenance policy is published here: https://datameer.zendesk.com/hc/en-us/articles/207475003-Datameer-Maintenance-Policy-and-Schedule
Tez Jobs Fail with Encrypted Shuffle Enabled

Problem
Tez jobs fail with encrypted shuffle enabled.
Cause
The property
mapreduce.shuffle.ssl.enabled=true
is set on your cluster and marked as final.
Solution
Set the following property on the Hadoop Cluster page under Custom Properties:
tez.runtime.shuffle.ssl.enable=true
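
To confirm that a cluster matches the cause described above, a minimal sketch along the following lines could scan a Hadoop configuration file for the property and its final flag. The sample XML and the function names `property_state` and `needs_tez_ssl_property` are illustrative assumptions, not part of Datameer; the `<property>`, `<name>`, `<value>` and `<final>` elements follow the standard Hadoop configuration file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a Hadoop *-site.xml file; real files live on the cluster.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>mapreduce.shuffle.ssl.enabled</name>
    <value>true</value>
    <final>true</final>
  </property>
</configuration>
"""


def property_state(site_xml, name):
    """Return (value, is_final) for a property, or None if it is not set."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = (prop.findtext("value") or "").strip()
            is_final = (prop.findtext("final") or "").strip().lower() == "true"
            return value, is_final
    return None


def needs_tez_ssl_property(site_xml):
    """True when encrypted shuffle is enabled and marked final, as in the Cause."""
    state = property_state(site_xml, "mapreduce.shuffle.ssl.enabled")
    return state is not None and state[0].lower() == "true" and state[1]
```

If `needs_tez_ssl_property` returns True for the cluster's configuration, the custom property from the Solution applies.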
How to Recover or Change a Password

Goal
Restore a lost/forgotten password or change the password for either a Datameer user or administrator.

Learn
A user needs to restore a password:
No self-service password recovery is available to users. The user must contact a Datameer administrator to have the password reset.
The Datameer administrator opens the User settings under the Administration tab, selects the user, and updates the user's password. A box can be checked to send the new password to the user via the email address listed for the account.

An administrator needs to restore a password:
If the administrator needs to update a password, they can do so using the steps listed above.
If an administrator is unable to log into Datameer in order to update the password, the following steps can be taken to reset the password through property files.
To reset the admin user password in the property files:

Open the Datameer file: das-env.sh

Remove the comment tag on: # export ADMIN_PASSWORD_RESET=true

Restart Datameer using the property --resetPassword

When Datameer has restarted, the administrator user's password will be reset to the default as written in the default.properties file.
Enter this default password to log into Datameer.
The admin user's password may then be changed to a new unique password in the user account settings as described above.
When complete, go back to the das-env.sh file and comment the line out again so the password will not revert to the default upon restarting Datameer.
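
The two edits to das-env.sh (removing the comment tag before the restart and restoring it afterwards) can be sketched as plain text transformations. This is only an illustration that assumes the line appears exactly as quoted in the steps above; the function names `enable_reset` and `disable_reset` are hypothetical, and the sketch does not restart Datameer.

```python
import re

# The line as quoted in the steps above (assumed to match the file exactly).
RESET_LINE = "export ADMIN_PASSWORD_RESET=true"


def enable_reset(env_text):
    """Remove the leading '#' from the ADMIN_PASSWORD_RESET line."""
    pattern = re.compile(
        r"^[ \t]*#[ \t]*(" + re.escape(RESET_LINE) + r")[ \t]*$", re.MULTILINE
    )
    return pattern.sub(r"\1", env_text)


def disable_reset(env_text):
    """Comment the ADMIN_PASSWORD_RESET line out again after the reset."""
    pattern = re.compile(
        r"^[ \t]*(" + re.escape(RESET_LINE) + r")[ \t]*$", re.MULTILINE
    )
    return pattern.sub(r"# \1", env_text)
```

Both functions take and return the file contents as a string, so the caller decides where das-env.sh lives and should keep a backup copy before writing any change.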
How to Redirect HTTP to HTTPS in Jetty 9

Goal
Set up Jetty 9 to redirect all HTTP requests to HTTPS, instead of disabling the HTTP connector, after enabling SSL in version 5.2 and later.
Learn
Open
<datameer-install-path>/etc/webdefault.xml
in an editor and add
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Everything</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <user-data-constraint>
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
  </user-data-constraint>
</security-constraint>
in the appropriate section.
Test redirection using
curl --verbose 'http://localhost:8080'
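
The same check can be scripted. The sketch below is an assumption-laden illustration rather than a Datameer tool: it sends one plain HTTP request without following redirects and reports whether the answer is a 3xx response whose Location header points to an https URL. The host and port defaults mirror the curl example; the function names are hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def is_https_redirect(status, location):
    """True if the response is a 3xx redirect whose Location uses https."""
    return 300 <= status < 400 and urlsplit(location or "").scheme == "https"


def check_redirect(host="localhost", port=8080, path="/"):
    """Request the path over plain HTTP; http.client does not follow redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return is_https_redirect(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Calling `check_redirect()` against a running instance should return True once the security constraint is active; `is_https_redirect` can also be applied to the status and Location values shown in the verbose curl output.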