
Instaclustr's Frequently Asked Questions page is a central hub where customers can find answers to their most common questions. These are the 61 most popular questions Instaclustr receives.
The provisioning API lets you carry out most of the provisioning actions available in the console (GUI) through a REST API (e.g. create, view, and delete clusters).
This is most commonly used as part of an automated testing process where you want to provision a complete environment from scratch.
For increased security, the provisioning API is not enabled until you have generated a key for your account (see below).
Pricing
Clusters created via the provisioning API are charged at our standard on-demand rates, listed on the Pricing page in our console.
Free trial clusters can be created under the same terms as clusters created via the GUI console.
Authentication
All requests to the API must use Basic Authentication and contain a valid username and provisioning API key.
API keys are created per user, per account.
Accessing your API keys
API keys can be retrieved at any time from the secure page in the console by going to Account > API Keys.
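As a minimal sketch, the Python below shows how a Basic Authentication header is formed from a username and provisioning API key, using only the standard library. The helper names `basic_auth_header` and `build_request`, the placeholder credentials, and the `Content-Type: application/json` header are illustrative assumptions. The request is only built here; it would be sent with `urllib.request.urlopen`.

```python
import base64
import json
import urllib.request
from typing import Any, Optional

API_BASE = "https://api.instaclustr.com/provisioning/v1"


def basic_auth_header(username: str, api_key: str) -> str:
    """Return an HTTP Basic Authorization header value for username:api_key."""
    raw = f"{username}:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(method: str, path: str, username: str, api_key: str,
                  body: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated provisioning API request."""
    headers = {"Authorization": basic_auth_header(username, api_key)}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed content type
    return urllib.request.Request(API_BASE + path, data=data,
                                  headers=headers, method=method)


# Placeholder credentials; this request would list all clusters in the account.
req = build_request("GET", "/", "myuser", "my-provisioning-api-key")
```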
Request Rate
Note that requests to the provisioning API are rate limited: each IP address may make at most 1 request per 2.5 seconds. Requests that exceed this rate receive a 429 error code until that IP's request rate falls back below the limit.
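One way to stay under the documented limit of 1 request per 2.5 seconds is to space calls on the client side and retry if the API still answers 429. The sketch below is a hypothetical helper, not part of any Instaclustr SDK: `send` stands in for whatever function performs the HTTP call and returns a `(status, body)` pair, and the cap of 5 attempts is an arbitrary assumption. The clock and sleep functions are injectable so the timing logic can be tested without waiting.

```python
import time
from typing import Callable, Optional, Tuple

MIN_INTERVAL_S = 2.5  # documented limit: 1 request per 2.5 seconds per IP


class Throttle:
    """Make successive wait() calls at least min_interval seconds apart."""

    def __init__(self, min_interval: float = MIN_INTERVAL_S,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_retry(send: Callable[[], Tuple[int, str]], throttle: Throttle,
                    max_attempts: int = 5) -> Tuple[int, str]:
    """Call send() through the throttle, retrying while the API returns 429."""
    status, body = 429, ""
    for _ in range(max_attempts):
        throttle.wait()
        status, body = send()
        if status != 429:
            break
    return status, body
```

Sharing one `Throttle` instance across all calls from a process keeps the whole client under the per-IP limit, not just a single loop.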
Create Cluster
To provision a new cluster, make a POST request to https://api.instaclustr.com/provisioning/v1/ with the JSON body:
{
  "clusterName":"CassandraCluster",
  "provider":"AWS_VPC",
  "account": "YourAccount",
  "version":"apache-cassandra-2.1.13",
  "size":"t2.small",
  "region": {
    "dataCentre":"US_WEST_2",
    "authnAuthz":"true",
    "clientEncryption": "false",
    "usePrivateBroadcastRPCAddress":"true",
    "cdcNetwork":"",
    "firewallRules":[],
    "rackAllocation":[
      {"name":"us-west-2a", "nodeCount":"2"},
      {"name":"us-west-2b", "nodeCount":"2"},
      {"name":"us-west-2c", "nodeCount":"2"}
    ]
  },
  "tags":{"tag1name":"tag1value", "tag2name":"tag2value"}
}
Data-at-rest encryption is also supported for AWS_VPC clusters. To enable it, add diskEncryptionKey to the region object of the create cluster request:
{
  "clusterName":"CassandraCluster",
  "provider":"AWS_VPC",
  ...
  "region": {
    ...
    "firewallRules":[],
    "diskEncryptionKey":"UUID returned after adding a KMS key",
    "rackAllocation":[...]
  }
}
Note: Refer to our support article on the encryption keys API to retrieve the diskEncryptionKey UUID.
If the JSON is valid (see allowed values below), the API will respond with 202 Accepted and a JSON body containing the cluster ID.
Depending on the underlying infrastructure provider, it can take up to 10 minutes for the cluster to reach a usable state. You can check the status of provisioning using the cluster status endpoint, or via the console.
VPC peering must be set up after the cluster is created. See below for the VPC peering APIs.
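The create request body can also be assembled programmatically. This sketch mirrors the AWS_VPC sample above, including its string-valued booleans and node counts. The function name and its defaults are assumptions for illustration, and the optional fields account, cdcNetwork and diskEncryptionKey are left out.

```python
import json
from typing import Dict, List, Optional


def create_cluster_body(cluster_name: str, data_centre: str, size: str,
                        racks: Dict[str, int],
                        version: str = "apache-cassandra-2.1.13",
                        firewall_rules: Optional[List[str]] = None,
                        tags: Optional[Dict[str, str]] = None) -> dict:
    """Assemble an AWS_VPC create-cluster body shaped like the sample request."""
    body = {
        "clusterName": cluster_name,
        "provider": "AWS_VPC",
        "version": version,
        "size": size,
        "region": {
            "dataCentre": data_centre,
            # The sample sends booleans and counts as strings; this mirrors it.
            "authnAuthz": "true",
            "clientEncryption": "false",
            "usePrivateBroadcastRPCAddress": "true",
            "firewallRules": firewall_rules or [],
            "rackAllocation": [{"name": rack, "nodeCount": str(count)}
                               for rack, count in racks.items()],
        },
    }
    if tags:
        body["tags"] = tags
    return body


body = create_cluster_body("CassandraCluster", "US_WEST_2", "t2.small",
                           {"us-west-2a": 2, "us-west-2b": 2, "us-west-2c": 2})
payload = json.dumps(body)  # POST to https://api.instaclustr.com/provisioning/v1/
```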
Allowed Values
Field: Allowed values
clusterName: May contain a combination of letters, numbers and underscores, with a maximum length of 32 characters.
provider: AWS_VPC, AZURE, SOFTLAYER_BARE_METAL, GCP. (Multi-DC provisioning is not currently supported through the API.)
account: [New Feature] Optional for customers running in their own account. Your provider account can be found on the 'Account' tab of the console, or in the "Provider Account" property of any existing cluster. For customers running in the Instaclustr account, this property may be omitted.
version: The Cassandra version to deploy. Refer to the Create Cluster page in the console for the list of Cassandra versions available to you; they take the form apache-cassandra-x.x.xx. Email [email protected] for other supported versions.
size: See the 'Data centres and node sizes' reference table below.
region.dataCentre: See the 'Data centres and node sizes' reference table below.
region.clientEncryption: MUST be false on developer (t2) nodes.
region.defaultNetwork (Deprecated; use region.cdcNetwork instead): This field is optional. The private network address block for the cluster, specified in CIDR notation. The network must have a prefix length between /12 and /22 and must be part of a private address space. Defaults to "10.224.0.0/12". Note: this value is ignored if region.cdcNetwork is specified.
region.cdcNetwork: This field is optional. The private network address block for the data centre, specified in CIDR notation. The network must have a prefix length between /16 and /26 and must be part of a private address space. If not specified, defaults to 10.224.0.0/16.
region.firewallRules: Array of CIDR addresses permitted to connect to this cluster. Additional addresses may be added later in Cluster Settings (console).
region.rackAllocation: See the 'Racks' reference table below.
tags: This field is optional. If specified, the value is a map from tag key to tag value. For restrictions, refer to the AWS User Guide. Tags are defined per cluster and are applied to every instance in the cluster. Tags are currently an AWS-only feature and are ignored for other providers.
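Some of these constraints can be checked before a request is sent. The sketch below assumes that "private address space" means the RFC 1918 ranges and that an empty clusterName is invalid. The function names are hypothetical, and the server-side rules remain authoritative.

```python
import ipaddress
import re

CLUSTER_NAME = re.compile(r"[A-Za-z0-9_]{1,32}")
# Assumption: "private address space" means the RFC 1918 ranges.
PRIVATE_RANGES = [ipaddress.IPv4Network(n) for n in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def valid_cluster_name(name: str) -> bool:
    """Letters, numbers and underscores only; 1 to 32 characters."""
    return CLUSTER_NAME.fullmatch(name) is not None


def valid_private_cidr(cidr: str, min_prefix: int, max_prefix: int) -> bool:
    """True if cidr is a private IPv4 network with a prefix length in range."""
    try:
        net = ipaddress.IPv4Network(cidr)  # strict parsing: host bits must be zero
    except ValueError:
        return False
    in_private = any(net.subnet_of(r) for r in PRIVATE_RANGES)
    return in_private and min_prefix <= net.prefixlen <= max_prefix


def valid_cdc_network(cidr: str) -> bool:
    return valid_private_cidr(cidr, 16, 26)


def valid_default_network(cidr: str) -> bool:
    return valid_private_cidr(cidr, 12, 22)


def client_encryption_allowed(size: str) -> bool:
    """clientEncryption must be false on developer (t2) nodes."""
    return not size.startswith("t2.")
```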
Cluster Status
To retrieve status of a cluster, make a GET request to https://api.instaclustr.com/provisioning/v1/<clusterId>
The API will respond with a JSON body containing the following information:
{
  "id":"77b5a4e1-c422-4a78-b551-d8fa5c42ad95",
  "clusterName":"MyInstaclustr",
  "clusterNetwork":"{\"network\":\"10.224.0.0\",\"prefixLength\":12}",
  "clusterStatus":"RUNNING",
  "cassandraVersion":"apache-cassandra-2.1.11",
  "username":"iccassandra",
  "instaclustrUserPassword":"supersecretpassword",
  "clusterCertificateDownload":"disabled",
  "dataCentres":[
    {
      "id":"f0bdb45c-f83c-4298-aa38-4d5a779ba816",
      "name":"US_EAST_1",
      "provider":"AWS_VPC",
      "clientEncryption":false,
      "passwordAuthentication":true,
      "userAuthorization":true,
      "usePrivateBroadcastRPCAddress":true,
      "cdcNetwork":{"network":"10.224.0.0","prefixLength":16},
      "bundles":["SPARK"],
      "nodes":[
        {
          "id":"cb986e08-f6be-4d08-8de2-4352c2cfaf1f",
          "size":"t2.small",
          "rack":"us-east-1a",
          "publicAddress":"111.111.111.111",
          "privateAddress":"111.111.111.111",
          "nodeStatus":"RUNNING",
          "sparkMaster":true,
          "sparkJobserver":true,
          "zeppelin":false
        },
        {
          "id":"f1809b07-ed42-4c40-83e0-e7cf8358a9cf",
          "size":"t2.small",
          "rack":"us-east-1e",
          "publicAddress":"111.111.111.111",
          "privateAddress":"111.111.111.111",
          "nodeStatus":"RUNNING",
          "sparkMaster":true,
          "sparkJobserver":false,
          "zeppelin":false
        },
        {
          "id":"c8c29c26-91b4-4878-b11a-953a0b70c422",
          "size":"t2.small",
          "rack":"us-east-1d",
          "publicAddress":"111.111.111.111",
          "privateAddress":"111.111.111.111",
          "nodeStatus":"RUNNING",
          "sparkMaster":true,
          "sparkJobserver":false,
          "zeppelin":false
        }
      ],
      "nodeCount":3
    }
  ]
}
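A client typically polls this endpoint after a create request until the cluster is usable. The sketch assumes RUNNING is the state to wait for, as in the sample above. `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON; the 30-second poll interval and 15-minute timeout are arbitrary choices that stay well within the rate limit.

```python
import json
import time
from typing import Callable, Dict


def node_status_counts(status: dict) -> Dict[str, int]:
    """Count nodes by nodeStatus across every data centre in a status response."""
    counts: Dict[str, int] = {}
    for dc in status.get("dataCentres", []):
        for node in dc.get("nodes", []):
            key = node.get("nodeStatus", "UNKNOWN")
            counts[key] = counts.get(key, 0) + 1
    return counts


def wait_until_running(fetch_status: Callable[[], dict],
                       timeout_s: float = 900, poll_s: float = 30,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until clusterStatus is RUNNING; raise TimeoutError after timeout_s."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("clusterStatus") == "RUNNING":
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"cluster not RUNNING after {waited:.0f} s")
        sleep(poll_s)
        waited += poll_s


sample = json.loads("""{"clusterStatus": "RUNNING", "dataCentres": [
  {"name": "US_EAST_1", "nodes": [{"nodeStatus": "RUNNING"},
   {"nodeStatus": "RUNNING"}, {"nodeStatus": "RUNNING"}]}]}""")
print(node_status_counts(sample))  # {'RUNNING': 3}
```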
Delete Cluster
To delete a cluster, make a DELETE request to https://api.instaclustr.com/provisioning/v1/<clusterId>
The API will respond with 202 Accepted and a JSON body containing the message "Cluster has been marked for deletion."
List all clusters
You can retrieve a list of all active clusters in your account by making a GET request to https://api.instaclustr.com/provisioning/v1/.
The response will contain an array of clusters:
[
{
"id":"77b5a4e1-c422-4a78-b551-d8fa5c42ad95",
"name":"myInstaclustr",
"cassandraVersion":"apache-cassandra-2.1.11",
"nodeCount":4,
"runningNodeCount":3,
"derivedStatus":"RUNNING"
}
]
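In the sample above, derivedStatus reads RUNNING even though only 3 of 4 nodes are running, so comparing the two counts can be a useful extra check. A minimal sketch with a hypothetical function name:

```python
import json
from typing import List


def degraded_clusters(clusters: List[dict]) -> List[str]:
    """Names of clusters with fewer running nodes than provisioned nodes."""
    return [c["name"] for c in clusters
            if c.get("runningNodeCount", 0) < c.get("nodeCount", 0)]


sample = json.loads("""[{"id": "77b5a4e1-c422-4a78-b551-d8fa5c42ad95",
  "name": "myInstaclustr", "cassandraVersion": "apache-cassandra-2.1.11",
  "nodeCount": 4, "runningNodeCount": 3, "derivedStatus": "RUNNING"}]""")
print(degraded_clusters(sample))  # ['myInstaclustr']
```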
Firewall rules
Create firewall rule
To provision a new firewall rule, make a POST request to https://api.instaclustr.com/provisioning/v1/<clusterId>/firewallRules with the JSON body:
{
"network":"10.0.0.0/16",
"rules":[
{
"type":"CASSANDRA"
},
{
"type":"SPARK"
},
{
"type":"SPARK_JOBSERVER"
}
]
}
If the JSON is valid (see allowed values below), the API will respond with 202 Accepted.
It can take up to 10 minutes for the firewall rule to reach a usable state. You may check the status of provisioning using the list firewall rules endpoint, or via the console. If you have recently deleted a firewall rule, you may need to wait up to 10 minutes before provisioning a replacement using the same network.
Allowed Values
Field: Allowed values
network: Must be a valid IPv4 CIDR.
type: CASSANDRA, SPARK, SPARK_JOBSERVER
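The same body shape is used to create and to delete a rule, so a small builder can validate it once. This sketch uses Python's `ipaddress` module for the CIDR check. Treating addresses with host bits set (for example 10.0.0.5/16) as invalid is an assumption, since the API's exact parsing rules are not stated here, and the function name is hypothetical.

```python
import ipaddress
from typing import Iterable

RULE_TYPES = {"CASSANDRA", "SPARK", "SPARK_JOBSERVER"}


def firewall_rule_body(network: str, types: Iterable[str]) -> dict:
    """Body for creating (POST) or deleting (DELETE) rules on /<clusterId>/firewallRules."""
    # Raises ValueError if network is not a valid IPv4 CIDR (host bits must be zero).
    ipaddress.IPv4Network(network)
    rules = []
    for rule_type in types:
        if rule_type not in RULE_TYPES:
            raise ValueError(f"unsupported rule type: {rule_type}")
        rules.append({"type": rule_type})
    return {"network": network, "rules": rules}


body = firewall_rule_body("10.0.0.0/16", ["CASSANDRA", "SPARK", "SPARK_JOBSERVER"])
```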
Delete firewall rule
To delete an existing firewall rule, make a DELETE request to https://api.instaclustr.com/provisioning/v1/<clusterId>/firewallRules with the JSON body:
{
"network":"10.0.0.0/16",
"rules":[
{
"type":"CASSANDRA"
},
{
"type":"SPARK"
},
{
"type":"SPARK_JOBSERVER"
}
]
}
If the JSON is valid (see allowed values below), the API will respond with 202 Accepted.
It can take up to 10 minutes for the firewall rule to be deleted. You can check the status of the deletion using the list firewall rules endpoint, or via the console. After deleting a firewall rule, you may need to wait up to 10 minutes before provisioning a replacement using the same network.
Allowed Values
Field: Allowed values
network: Must be a valid IPv4 CIDR.
type: CASSANDRA, SPARK, SPARK_JOBSERVER
List firewall rules
You can obtain a list of all firewall rules for a cluster by making a GET request to https://api.instaclustr.com/provisioning/v1/<clusterId>/firewallRules
Sample response:
[
  {
    "network":"10.0.0.0/16",
    "rules":[
      {"type":"CASSANDRA", "status":"RUNNING"},
      {"type":"SPARK", "status":"RUNNING"},
      {"type":"SPARK_JOBSERVER", "status":"RUNNING"}
    ]
  },
  {
    "network":"192.168.0.0/24",
    "rules":[
      {"type":"CASSANDRA", "status":"RUNNING"}
    ]
  }
]
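To check whether newly created rules have taken effect, the list response can be scanned for rules whose status is not yet RUNNING. Only RUNNING appears in the sample, so this sketch makes no assumption about the names of other states; the function name is hypothetical.

```python
import json
from typing import List, Tuple


def rules_not_running(response: list) -> List[Tuple[str, str, str]]:
    """(network, type, status) for every rule whose status is not RUNNING."""
    return [(entry["network"], rule["type"], rule.get("status"))
            for entry in response
            for rule in entry.get("rules", [])
            if rule.get("status") != "RUNNING"]


sample = json.loads("""[
  {"network": "10.0.0.0/16", "rules": [{"type": "CASSANDRA", "status": "RUNNING"},
    {"type": "SPARK", "status": "RUNNING"}]},
  {"network": "192.168.0.0/24", "rules": [{"type": "CASSANDRA", "status": "RUNNING"}]}]""")
print(rules_not_running(sample))  # []
```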
VPC Peering Connections
List VPC Peering Connections
To list the details of all VPC peering connections for a given cluster data centre, make a GET request to https://api.instaclustr.com/provisioning/v1/vpc-peering/<clusterDataCentreId>
The response will contain an array of VPC Peering Connections:
[
{
"id": "068c447e-8475-49b2-974b-ca1c91701232",
"aws_vpc_connection_id": "pcx-a667dbcf",
"clusterDataCentre": "a008665c-8916-40ae-978c-90d49a3a1364",
"vpcId": "vpc-002512aa",
"peerVpcId": "vpc-a5bacd45",
"peerAccountId": "123777123999",
"peerSubnet": {
"network": "10.7.0.0",
"prefixLength": 16
},
"statusCode": "active"
},
{
"id": "561f463e-522e-4a5f-968e-e65961b6d9aa",
"aws_vpc_connection_id": "pcx-8667dbef",
"clusterDataCentre": "a008665c-8916-40ae-978c-90d49a3a1364",
"vpcId": "vpc-00250e64",
"peerVpcId": "vpc-c2f6d4a6",
"peerAccountId": "123555127866",
"peerSubnet": {
"network": "10.99.0.0",
"prefixLength": 16
},
"statusCode": "pending-acceptance"
}
]
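Connections in the pending-acceptance state still need to be accepted from the peer AWS account. A minimal sketch that picks them out of the list response (the function name is hypothetical):

```python
from typing import List


def awaiting_acceptance(connections: List[dict]) -> List[str]:
    """AWS peering connection IDs whose statusCode is pending-acceptance."""
    return [c["aws_vpc_connection_id"] for c in connections
            if c.get("statusCode") == "pending-acceptance"]


sample = [
    {"aws_vpc_connection_id": "pcx-a667dbcf", "statusCode": "active"},
    {"aws_vpc_connection_id": "pcx-8667dbef", "statusCode": "pending-acceptance"},
]
print(awaiting_acceptance(sample))  # ['pcx-8667dbef']
```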
List VPC Peering Connection
To list the details of a given VPC peering connection, make a GET request to https://api.instaclustr.com/provisioning/v1/vpc-peering/<clusterDataCentreId>/<vpcPeeringConnectionId>
Here is an example response:
{
"id": "068c447e-9999-49b2-a74b-ca1c91702999",
"aws_vpc_connection_id": "pcx-ff12abc1",
"clusterDataCentre": "aabc665c-8916-40ae-978c-96d39a3a1364",
"vpcId": "vpc-aa770abc",
"peerVpcId": "vpc-aabb1122",
"peerAccountId": "123888456789",
"peerSubnet": {
"network": "10.7.0.0",
"prefixLength": 16
},
"statusCode": "active"
}
Create VPC Peering Request
To create a new VPC peering connection request, make a POST request to https://api.instaclustr.com/provisioning/v1/vpc-peering/<clusterDataCentreId> with the JSON body:
{
"peerVpcId" : "vpc-aaaa1234",
"peerAccountId" : "123770124789",
"peerSubnet" : "10.7.0.0/16"
}
If successful, the call returns 202 Accepted and the ID of the new connection:
{"id": "068c447e-8475-49b2-974b-ca1c91702ed4"}
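AWS does not allow peering between VPCs whose CIDR blocks overlap, so it can be worth checking peerSubnet against the cluster data centre's network (cdcNetwork, 10.224.0.0/16 by default) before sending the request. A sketch with a hypothetical function name:

```python
import ipaddress


def vpc_peering_body(peer_vpc_id: str, peer_account_id: str,
                     peer_subnet: str, cluster_network: str) -> dict:
    """Body for POST /vpc-peering/<clusterDataCentreId>, rejecting overlapping CIDRs."""
    peer = ipaddress.IPv4Network(peer_subnet)
    cluster = ipaddress.IPv4Network(cluster_network)
    if peer.overlaps(cluster):
        raise ValueError(f"peer subnet {peer} overlaps cluster network {cluster}")
    return {"peerVpcId": peer_vpc_id,
            "peerAccountId": peer_account_id,
            "peerSubnet": str(peer)}


body = vpc_peering_body("vpc-aaaa1234", "123770124789", "10.7.0.0/16", "10.224.0.0/16")
```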
Delete VPC Peering Request/Connection
To delete a VPC peering connection, make a DELETE request to https://api.instaclustr.com/provisioning/v1/vpc-peering/<clusterDataCentreId>/<vpcPeeringConnectionId>. If successful, the call returns 202 Accepted.
Reference Data: Data centres and node sizes
provider | dataCentre | size | Plan | Data Centre Name
AWS_VPC | AP_NORTHEAST_1 | m4l-250 | EBS: tiny | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | m4xl-400 | EBS: small | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | m4xl-800 | EBS: balanced | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | m4xl-1600 | EBS: bulk storage | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | m3.xlarge | Small | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | c3.2xlarge | Medium | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | i2.2xlarge | Extra Large | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | t2.small | Starter | Asia Pacific (Tokyo)
AWS_VPC | AP_NORTHEAST_1 | t2.medium | Professional | Asia Pacific (Tokyo)
AWS_VPC | AP_SOUTHEAST_1 | m4l-250 | EBS: tiny | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | m4xl-400 | EBS: small | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | m4xl-800 | EBS: balanced | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | m4xl-1600 | EBS: bulk storage | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | m3.xlarge | Small | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | c3.2xlarge | Medium | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | i2.2xlarge | Extra Large | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | t2.small | Starter | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_1 | t2.medium | Professional | Asia Pacific (Singapore)
AWS_VPC | AP_SOUTHEAST_2 | m4l-250 | EBS: tiny | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | m4xl-400 | EBS: small | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | m4xl-800 | EBS: balanced | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | m4xl-1600 | EBS: bulk storage | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | m3.xlarge | Small | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | c3.2xlarge | Medium | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | i2.2xlarge | Extra Large | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | t2.small | Starter | Asia Pacific (Sydney)
AWS_VPC | AP_SOUTHEAST_2 | t2.medium | Professional | Asia Pacific (Sydney)
AWS_VPC | EU_CENTRAL_1 | m4l-250 | EBS: tiny | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | t2.small | Starter | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | t2.medium | Professional | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | m3.xlarge | Small | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | c3.2xlarge | Medium | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | i2.2xlarge | Extra Large | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | m4xl-400 | EBS: small | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | m4xl-800 | EBS: balanced | EU Central (Frankfurt)
AWS_VPC | EU_CENTRAL_1 | m4xl-1600 | EBS: bulk storage | EU Central (Frankfurt)
AWS_VPC | EU_WEST_1 | m4l-250 | EBS: tiny | EU West (Ireland)
AWS_VPC | EU_WEST_1 | m4xl-400 | EBS: small | EU West (Ireland)
AWS_VPC | EU_WEST_1 | m4xl-800 | EBS: balanced | EU West (Ireland)
AWS_VPC | EU_WEST_1 | m4xl-1600 | EBS: bulk storage | EU West (Ireland)
AWS_VPC | EU_WEST_1 | m3.xlarge | Small | EU West (Ireland)
AWS_VPC | EU_WEST_1 | c3.2xlarge | Medium | EU West (Ireland)
AWS_VPC | EU_WEST_1 | i2.2xlarge | Extra Large | EU West (Ireland)
AWS_VPC | EU_WEST_1 | t2.small | Starter | EU West (Ireland)
AWS_VPC | EU_WEST_1 | t2.medium | Professional | EU West (Ireland)
AWS_VPC | EU_WEST_1 | m42xl-1600 | Special Size: m4.2xlarge | EU West (Ireland)
AWS_VPC | SA_EAST_1 | m3.xlarge | Small | South America (São Paulo)
AWS_VPC | SA_EAST_1 | c3.2xlarge | Medium | South America (São Paulo)
AWS_VPC | SA_EAST_1 | t2.small | Starter | South America (São Paulo)
AWS_VPC | SA_EAST_1 | t2.medium | Professional | South America (São Paulo)
AWS_VPC | US_EAST_1 | m4l-250 | EBS: tiny | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | m4xl-400 | EBS: small | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | m4xl-800 | EBS: balanced | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | m4xl-1600 | EBS: bulk storage | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | m3.xlarge | Small | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | c3.2xlarge | Medium | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | i2.2xlarge | Extra Large | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | t2.small | Starter | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | t2.medium | Professional | US East (Northern Virginia)
AWS_VPC | US_EAST_1 | m42xl-1600 | Special Size: m4.2xlarge | US East (Northern Virginia)
AWS_VPC | US_WEST_1 | m4l-250 | EBS: tiny | US West (Northern California)
AWS_VPC | US_WEST_1 | m4xl-400 | EBS: small | US West (Northern California)
AWS_VPC | US_WEST_1 | m4xl-800 | EBS: balanced | US West (Northern California)
AWS_VPC | US_WEST_1 | m4xl-1600 | EBS: bulk storage | US West (Northern California)
AWS_VPC | US_WEST_1 | m3.xlarge | Small | US West (Northern California)
AWS_VPC | US_WEST_1 | c3.2xlarge | Medium | US West (Northern California)
AWS_VPC | US_WEST_1 | i2.2xlarge | Extra Large | US West (Northern California)
AWS_VPC | US_WEST_1 | t2.small | Starter | US West (Northern California)
AWS_VPC | US_WEST_1 | t2.medium | Professional | US West (Northern California)
AWS_VPC | US_WEST_2 | m4l-250 | EBS: tiny | US West (Oregon)
AWS_VPC | US_WEST_2 | m4xl-400 | EBS: small | US West (Oregon)
AWS_VPC | US_WEST_2 | m4xl-800 | EBS: balanced | US West (Oregon)
AWS_VPC | US_WEST_2 | m4xl-1600 | EBS: bulk storage | US West (Oregon)
AWS_VPC | US_WEST_2 | m3.xlarge | Small | US West (Oregon)
AWS_VPC | US_WEST_2 | c3.2xlarge | Medium | US West (Oregon)
AWS_VPC | US_WEST_2 | i2.2xlarge | Extra Large | US West (Oregon)
AWS_VPC | US_WEST_2 | t2.small | Starter | US West (Oregon)
AWS_VPC | US_WEST_2 | t2.medium | Professional | US West (Oregon)
AZURE | CANADA_CENTRAL | Standard_DS12_v2-512 | Premium: small | Canada Central (Toronto)
AZURE | CANADA_CENTRAL | Standard_DS12_v2-1023 | Premium: balanced | Canada Central (Toronto)
AZURE | CANADA_CENTRAL | Standard_DS12_v2-2046 | Premium: bulk storage | Canada Central (Toronto)
AZURE | CANADA_CENTRAL | Standard_DS2_v2-256 | Premium: tiny | Canada Central (Toronto)
AZURE | CANADA_EAST | Standard_DS12_v2-512 | Premium: small | Canada East (Quebec City)
AZURE | CANADA_EAST | Standard_DS12_v2-1023 | Premium: balanced | Canada East (Quebec City)
AZURE | CANADA_EAST | Standard_DS12_v2-2046 | Premium: bulk storage | Canada East (Quebec City)
AZURE | CANADA_EAST | Standard_DS2_v2-256 | Premium: tiny | Canada East (Quebec City)
AZURE | CENTRAL_US | Standard_DS12_v2-512 | Premium: small | Central US (Iowa)
AZURE | CENTRAL_US | Standard_DS12_v2-1023 | Premium: balanced | Central US (Iowa)
AZURE | CENTRAL_US | Standard_DS12_v2-2046 | Premium: bulk storage | Central US (Iowa)
AZURE | CENTRAL_US | Standard_DS2_v2-256 | Premium: tiny | Central US (Iowa)
AZURE | CENTRAL_US | Standard_DS13_v2-2046 | Premium: extra large | Central US (Iowa)
AZURE | EAST_ASIA | Standard_DS12-512 | Premium: small | East Asia (Hong Kong)
AZURE | EAST_ASIA | Standard_DS12-1023 | Premium: balanced | East Asia (Hong Kong)
AZURE | EAST_ASIA | Standard_DS12-2046 | Premium: bulk storage | East Asia (Hong Kong)
AZURE | EAST_ASIA | Standard_DS2-256 | Premium: tiny | East Asia (Hong Kong)
AZURE | EAST_US | Standard_DS12_v2-512 | Premium: small | East US (Virginia)
AZURE | EAST_US | Standard_DS12_v2-1023 | Premium: balanced | East US (Virginia)
AZURE | EAST_US | Standard_DS12_v2-2046 | Premium: bulk storage | East US (Virginia)
AZURE | EAST_US | Standard_DS2_v2-256 | Premium: tiny | East US (Virginia)
AZURE | EAST_US_2 | Standard_DS12_v2-512 | Premium: small | East US 2 (Virginia)
AZURE | EAST_US_2 | Standard_DS12_v2-1023 | Premium: balanced | East US 2 (Virginia)
AZURE | EAST_US_2 | Standard_DS12_v2-2046 | Premium: bulk storage | East US 2 (Virginia)
AZURE | EAST_US_2 | Standard_DS2_v2-256 | Premium: tiny | East US 2 (Virginia)
AZURE | JAPAN_WEST | Standard_DS12_v2-512 | Premium: small | Japan West (Osaka Prefecture)
AZURE | JAPAN_WEST | Standard_DS12_v2-1023 | Premium: balanced | Japan West (Osaka Prefecture)
AZURE | JAPAN_WEST | Standard_DS12_v2-2046 | Premium: bulk storage | Japan West (Osaka Prefecture)
AZURE | JAPAN_WEST | Standard_DS2_v2-256 | Premium: tiny | Japan West (Osaka Prefecture)
AZURE | NORTH_EUROPE | Standard_DS12_v2-512 | Premium: small | North Europe (Ireland)
AZURE | NORTH_EUROPE | Standard_DS12_v2-1023 | Premium: balanced | North Europe (Ireland)
AZURE | NORTH_EUROPE | Standard_DS12_v2-2046 | Premium: bulk storage | North Europe (Ireland)
AZURE | NORTH_EUROPE | Standard_DS2_v2-256 | Premium: tiny | North Europe (Ireland)
AZURE | SOUTH_CENTRAL_US | Standard_DS12_v2-512 | Premium: small | South Central US (Texas)
AZURE | SOUTH_CENTRAL_US | Standard_DS12_v2-1023 | Premium: balanced | South Central US (Texas)
AZURE | SOUTH_CENTRAL_US | Standard_DS12_v2-2046 | Premium: bulk storage | South Central US (Texas)
AZURE | SOUTH_CENTRAL_US | Standard_DS2_v2-256 | Premium: tiny | South Central US (Texas)
AZURE | SOUTHEAST_ASIA | Standard_DS12_v2-512 | Premium: small | Southeast Asia (Singapore)
AZURE | SOUTHEAST_ASIA | Standard_DS12_v2-1023 | Premium: balanced | Southeast Asia (Singapore)
AZURE | SOUTHEAST_ASIA | Standard_DS12_v2-2046 | Premium: bulk storage | Southeast Asia (Singapore)
AZURE | SOUTHEAST_ASIA | Standard_DS2_v2-256 | Premium: tiny | Southeast Asia (Singapore)
AZURE | SOUTHEAST_ASIA | Standard_DS13_v2-2046 | Premium: extra large | Southeast Asia (Singapore)
AZURE | WEST_EUROPE | Standard_DS12_v2-512 | Premium: small | West Europe (Netherlands)
AZURE | WEST_EUROPE | Standard_DS12_v2-1023 | Premium: balanced | West Europe (Netherlands)
AZURE | WEST_EUROPE | Standard_DS12_v2-2046 | Premium: bulk storage | West Europe (Netherlands)
AZURE | WEST_EUROPE | Standard_DS2_v2-256 | Premium: tiny | West Europe (Netherlands)
AZURE | WEST_US | Standard_DS12_v2-512 | Premium: small | West US (California)
AZURE | WEST_US | Standard_DS12_v2-1023 | Premium: balanced | West US (California)
AZURE | WEST_US | Standard_DS12_v2-2046 | Premium: bulk storage | West US (California)
AZURE | WEST_US | Standard_DS2_v2-256 | Premium: tiny | West US (California)
SOFTLAYER_BARE_METAL | AMS01 | Xeon_1270 | Medium | Western Europe (Amsterdam 01)
SOFTLAYER_BARE_METAL | AMS01 | Xeon_2690 | Large | Western Europe (Amsterdam 01)
SOFTLAYER_BARE_METAL | AMS01 | Xeon_2690_x2 | Extra Large | Western Europe (Amsterdam 01)
SOFTLAYER_BARE_METAL | AMS03 | Xeon_1270 | Medium | Western Europe (Amsterdam 03)
SOFTLAYER_BARE_METAL | AMS03 | Xeon_2690 | Large | Western Europe (Amsterdam 03)
SOFTLAYER_BARE_METAL | AMS03 | Xeon_2690_x2 | Extra Large | Western Europe (Amsterdam 03)
SOFTLAYER_BARE_METAL | DAL01 | Xeon_1270 | Medium | Central US (Dallas 01)
SOFTLAYER_BARE_METAL | DAL01 | Xeon_2690 | Large | Central US (Dallas 01)
SOFTLAYER_BARE_METAL | DAL01 | Xeon_2690_x2 | Extra Large | Central US (Dallas 01)
SOFTLAYER_BARE_METAL | DAL05 | Xeon_1270 | Medium | Central US (Dallas 05)
SOFTLAYER_BARE_METAL | DAL05 | Xeon_2690 | Large | Central US (Dallas 05)
SOFTLAYER_BARE_METAL | DAL05 | Xeon_2690_x2 | Extra Large | Central US (Dallas 05)
SOFTLAYER_BARE_METAL | DAL06 | Xeon_1270 | Medium | Central US (Dallas 06)
SOFTLAYER_BARE_METAL | DAL06 | Xeon_2690 | Large | Central US (Dallas 06)
SOFTLAYER_BARE_METAL | DAL06 | Xeon_2690_x2 | Extra Large | Central US (Dallas 06)
SOFTLAYER_BARE_METAL | DAL07 | Xeon_1270 | Medium | Central US (Dallas 07)
SOFTLAYER_BARE_METAL | DAL07 | Xeon_2690 | Large | Central US (Dallas 07)
SOFTLAYER_BARE_METAL | DAL07 | Xeon_2690_x2 | Extra Large | Central US (Dallas 07)
SOFTLAYER_BARE_METAL | DAL09 | Xeon_1270 | Medium | Central US (Dallas 09)
SOFTLAYER_BARE_METAL | DAL09 | Xeon_2690 | Large | Central US (Dallas 09)
SOFTLAYER_BARE_METAL | DAL09 | Xeon_2690_x2 | Extra Large | Central US (Dallas 09)
SOFTLAYER_BARE_METAL | FRA02 | Xeon_1270 | Medium | Western Europe (Frankfurt 02)
SOFTLAYER_BARE_METAL | FRA02 | Xeon_2690 | Large | Western Europe (Frankfurt 02)
SOFTLAYER_BARE_METAL | FRA02 | Xeon_2690_x2 | Extra Large | Western Europe (Frankfurt 02)
SOFTLAYER_BARE_METAL | HKG02 | Xeon_1270 | Medium | Asia (Hong Kong 02)
SOFTLAYER_BARE_METAL | HKG02 | Xeon_2690 | Large | Asia (Hong Kong 02)
SOFTLAYER_BARE_METAL | HKG02 | Xeon_2690_x2 | Extra Large | Asia (Hong Kong 02)
SOFTLAYER_BARE_METAL | HOU02 | Xeon_1270 | Medium | Central US (Houston 02)
SOFTLAYER_BARE_METAL | HOU02 | Xeon_2690 | Large | Central US (Houston 02)
SOFTLAYER_BARE_METAL | HOU02 | Xeon_2690_x2 | Extra Large | Central US (Houston 02)
SOFTLAYER_BARE_METAL | LON02 | Xeon_1270 | Medium | Western Europe (London 02)
SOFTLAYER_BARE_METAL | LON02 | Xeon_2690 | Large | Western Europe (London 02)
SOFTLAYER_BARE_METAL | LON02 | Xeon_2690_x2 | Extra Large | Western Europe (London 02)
SOFTLAYER_BARE_METAL | MEL01 | Xeon_1270 | Medium | Australia (Melbourne 01)
SOFTLAYER_BARE_METAL | MEL01 | Xeon_2690 | Large | Australia (Melbourne 01)
SOFTLAYER_BARE_METAL | MEL01 | Xeon_2690_x2 | Extra Large | Australia (Melbourne 01)
SOFTLAYER_BARE_METAL | MEX01 | Xeon_1270 | Medium | Mexico (Queretaro 01)
SOFTLAYER_BARE_METAL | MEX01 | Xeon_2690 | Large | Mexico (Queretaro 01)
SOFTLAYER_BARE_METAL | MEX01 | Xeon_2690_x2 | Extra Large | Mexico (Queretaro 01)
SOFTLAYER_BARE_METAL | MON01 | Xeon_1270 | Medium | Canada (Montreal 01)
SOFTLAYER_BARE_METAL | MON01 | Xeon_2690 | Large | Canada (Montreal 01)
SOFTLAYER_BARE_METAL | MON01 | Xeon_2690_x2 | Extra Large | Canada (Montreal 01)
SOFTLAYER_BARE_METAL | PAR01 | Xeon_1270 | Medium | Western Europe (Paris 01)
SOFTLAYER_BARE_METAL | PAR01 | Xeon_2690 | Large | Western Europe (Paris 01)
SOFTLAYER_BARE_METAL | PAR01 | Xeon_2690_x2 | Extra Large | Western Europe (Paris 01)
SOFTLAYER_BARE_METAL | SEA01 | Xeon_1270 | Medium | West Coast US (Seattle 01)
SOFTLAYER_BARE_METAL | SEA01 | Xeon_2690 | Large | West Coast US (Seattle 01)
SOFTLAYER_BARE_METAL | SEA01 | Xeon_2690_x2 | Extra Large | West Coast US (Seattle 01)
SOFTLAYER_BARE_METAL | SJC01 | Xeon_1270 | Medium | West Coast US (San Jose 01)
SOFTLAYER_BARE_METAL | SJC01 | Xeon_2690 | Large | West Coast US (San Jose 01)
SOFTLAYER_BARE_METAL | SJC01 | Xeon_2690_x2 | Extra Large | West Coast US (San Jose 01)
SOFTLAYER_BARE_METAL | SNG01 | Xeon_1270 | Medium | Southeast Asia (Singapore 01)
SOFTLAYER_BARE_METAL | SNG01 | Xeon_2690 | Large | Southeast Asia (Singapore 01)
SOFTLAYER_BARE_METAL | SNG01 | Xeon_2690_x2 | Extra Large | Southeast Asia (Singapore 01)
SOFTLAYER_BARE_METAL | SYD01 | Xeon_1270 | Medium | Australia (Sydney 01)
SOFTLAYER_BARE_METAL | SYD01 | Xeon_2690 | Large | Australia (Sydney 01)
SOFTLAYER_BARE_METAL | SYD01 | Xeon_2690_x2 | Extra Large | Australia (Sydney 01)
SOFTLAYER_BARE_METAL | TOK02 | Xeon_1270 | Medium | Japan (Tokyo 02)
SOFTLAYER_BARE_METAL | TOK02 | Xeon_2690 | Large | Japan (Tokyo 02)
SOFTLAYER_BARE_METAL | TOK02 | Xeon_2690_x2 | Extra Large | Japan (Tokyo 02)
SOFTLAYER_BARE_METAL | TOR01 | Xeon_1270 | Medium | Canada (Toronto 01)
SOFTLAYER_BARE_METAL | TOR01 | Xeon_2690 | Large | Canada (Toronto 01)
SOFTLAYER_BARE_METAL | TOR01 | Xeon_2690_x2 | Extra Large | Canada (Toronto 01)
SOFTLAYER_BARE_METAL | WDC01 | Xeon_1270 | Medium | East Coast US (Washington, DC 01)
SOFTLAYER_BARE_METAL | WDC01 | Xeon_2690 | Large | East Coast US (Washington, DC 01)
SOFTLAYER_BARE_METAL | WDC01 | Xeon_2690_x2 | Extra Large | East Coast US (Washington, DC 01)
GCP | asia-east1 | n1-highmem-4-1600 | High Memory: Bulk | Eastern Asia-Pacific (Taiwan)
GCP | asia-east1 | n1-highmem-4-400 | High Memory: Small | Eastern Asia-Pacific (Taiwan)
GCP | asia-east1 | n1-highmem-4-800 | High Memory: Balanced | Eastern Asia-Pacific (Taiwan)
GCP | asia-east1 | n1-standard-1 | Professional | Eastern Asia-Pacific (Taiwan)
GCP | asia-east1 | n1-standard-2 | Tiny | Eastern Asia-Pacific (Taiwan)
GCP | asia-east1 | n1-standard-4-1600 | Bulk | Eastern Asia-Pacific (Taiwan)
GCP | asia-east1 | n1-standard-4-400 | Small | Eastern Asia-Pacific (Taiwan)
GCP | asia-east1 | n1-standard-4-800 | Balanced | Eastern Asia-Pacific (Taiwan)
GCP | asia-northeast1 | n1-highmem-4-1600 | High Memory: Bulk | Northeastern Asia-Pacific (Japan)
GCP | asia-northeast1 | n1-highmem-4-400 | High Memory: Small | Northeastern Asia-Pacific (Japan)
GCP | asia-northeast1 | n1-highmem-4-800 | High Memory: Balanced | Northeastern Asia-Pacific (Japan)
GCP | asia-northeast1 | n1-standard-1 | Professional | Northeastern Asia-Pacific (Japan)
GCP | asia-northeast1 | n1-standard-2 | Tiny | Northeastern Asia-Pacific (Japan)
GCP | asia-northeast1 | n1-standard-4-1600 | Bulk | Northeastern Asia-Pacific (Japan)
GCP | asia-northeast1 | n1-standard-4-400 | Small | Northeastern Asia-Pacific (Japan)
GCP | asia-northeast1 | n1-standard-4-800 | Balanced | Northeastern Asia-Pacific (Japan)
GCP | europe-west1 | n1-highmem-4-1600 | High Memory: Bulk | Western Europe (Belgium)
GCP | europe-west1 | n1-highmem-4-400 | High Memory: Small | Western Europe (Belgium)
GCP | europe-west1 | n1-highmem-4-800 | High Memory: Balanced | Western Europe (Belgium)
GCP | europe-west1 | n1-standard-1 | Professional | Western Europe (Belgium)
GCP | europe-west1 | n1-standard-2 | Tiny | Western Europe (Belgium)
GCP | europe-west1 | n1-standard-4-1600 | Bulk | Western Europe (Belgium)
GCP | europe-west1 | n1-standard-4-400 | Small | Western Europe (Belgium)
GCP | europe-west1 | n1-standard-4-800 | Balanced | Western Europe (Belgium)
GCP | us-central1 | n1-highmem-4-1600 | High Memory: Bulk | Central US (Iowa)
GCP | us-central1 | n1-highmem-4-400 | High Memory: Small | Central US (Iowa)
GCP | us-central1 | n1-highmem-4-800 | High Memory: Balanced | Central US (Iowa)
GCP | us-central1 | n1-standard-1 | Professional | Central US (Iowa)
GCP | us-central1 | n1-standard-2 | Tiny | Central US (Iowa)
GCP | us-central1 | n1-standard-4-1600 | Bulk | Central US (Iowa)
GCP | us-central1 | n1-standard-4-400 | Small | Central US (Iowa)
GCP | us-central1 | n1-standard-4-800 | Balanced | Central US (Iowa)
GCP | us-east1 | n1-highmem-4-1600 | High Memory: Bulk | Eastern US (South Carolina)
GCP | us-east1 | n1-highmem-4-400 | High Memory: Small | Eastern US (South Carolina)
GCP | us-east1 | n1-highmem-4-800 | High Memory: Balanced | Eastern US (South Carolina)
GCP | us-east1 | n1-standard-1 | Professional | Eastern US (South Carolina)
GCP | us-east1 | n1-standard-2 | Tiny | Eastern US (South Carolina)
GCP | us-east1 | n1-standard-4-1600 | Bulk | Eastern US (South Carolina)
GCP | us-east1 | n1-standard-4-400 | Small | Eastern US (South Carolina)
GCP | us-east1 | n1-standard-4-800 | Balanced | Eastern US (South Carolina)
GCP | us-west1 | n1-highmem-4-1600 | High Memory: Bulk | Western US (Oregon)
GCP | us-west1 | n1-highmem-4-400 | High Memory: Small | Western US (Oregon)
GCP | us-west1 | n1-highmem-4-800 | High Memory: Balanced | Western US (Oregon)
GCP | us-west1 | n1-standard-1 | Professional | Western US (Oregon)
GCP | us-west1 | n1-standard-2 | Tiny | Western US (Oregon)
GCP | us-west1 | n1-standard-4-1600 | Bulk | Western US (Oregon)
GCP | us-west1 | n1-standard-4-400 | Small | Western US (Oregon)
GCP | us-west1 | n1-standard-4-800 | Balanced | Western US (Oregon)
Note: These values can change over time. If in doubt, see the Create Cluster page in the console or contact support.
Reference Data - Racks
We recommend distributing nodes evenly across racks where possible, to ensure consistent performance and optimal data distribution and redundancy.
provider | dataCentre | rackAllocation.name
AWS_VPC | AP_NORTHEAST_1 | ap-northeast-1a
AWS_VPC | AP_NORTHEAST_1 | ap-northeast-1c
AWS_VPC | AP_SOUTHEAST_1 | ap-southeast-1a
AWS_VPC | AP_SOUTHEAST_1 | ap-southeast-1b
AWS_VPC | AP_SOUTHEAST_2 | ap-southeast-2a
AWS_VPC | AP_SOUTHEAST_2 | ap-southeast-2b
AWS_VPC | EU_CENTRAL_1 | eu-central-1a
AWS_VPC | EU_CENTRAL_1 | eu-central-1b
AWS_VPC | EU_WEST_1 | eu-west-1a
AWS_VPC | EU_WEST_1 | eu-west-1b
AWS_VPC | EU_WEST_1 | eu-west-1c
AWS_VPC | SA_EAST_1 | sa-east-1a
AWS_VPC | SA_EAST_1 | sa-east-1c
AWS_VPC | US_EAST_1 | us-east-1a
AWS_VPC | US_EAST_1 | us-east-1b
AWS_VPC | US_EAST_1 | us-east-1c
AWS_VPC | US_EAST_1 | us-east-1d
AWS_VPC | US_EAST_1 | us-east-1e
AWS_VPC | US_WEST_1 | us-west-1b
AWS_VPC | US_WEST_1 | us-west-1c
AWS_VPC | US_WEST_2 | us-west-2a
AWS_VPC | US_WEST_2 | us-west-2b
AWS_VPC | US_WEST_2 | us-west-2c
AZURE | AUSTRALIA_SOUTHEAST | fault-domain-0
AZURE | AUSTRALIA_SOUTHEAST | fault-domain-1
AZURE | AUSTRALIA_SOUTHEAST | fault-domain-2
AZURE | CANADA_CENTRAL | fault-domain-0
AZURE | CANADA_CENTRAL | fault-domain-1
AZURE | CANADA_CENTRAL | fault-domain-2
AZURE | CANADA_EAST | fault-domain-0
AZURE | CANADA_EAST | fault-domain-1
AZURE | CANADA_EAST | fault-domain-2
AZURE | CENTRAL_US | fault-domain-0
AZURE | CENTRAL_US | fault-domain-1
AZURE | CENTRAL_US | fault-domain-2
AZURE | EAST_ASIA | fault-domain-0
AZURE | EAST_ASIA | fault-domain-1
AZURE | EAST_ASIA | fault-domain-2
AZURE | EAST_US | fault-domain-0
AZURE | EAST_US | fault-domain-1
AZURE | EAST_US | fault-domain-2
AZURE | EAST_US_2 | fault-domain-0
AZURE | EAST_US_2 | fault-domain-1
AZURE | EAST_US_2 | fault-domain-2
AZURE | JAPAN_EAST | fault-domain-0
AZURE | JAPAN_EAST | fault-domain-1
AZURE | JAPAN_EAST | fault-domain-2
AZURE | JAPAN_WEST | fault-domain-0
AZURE | JAPAN_WEST | fault-domain-1
AZURE | JAPAN_WEST | fault-domain-2
AZURE | NORTH_EUROPE | fault-domain-0
AZURE | NORTH_EUROPE | fault-domain-1
AZURE | NORTH_EUROPE | fault-domain-2
AZURE | SOUTH_CENTRAL_US | fault-domain-0
AZURE | SOUTH_CENTRAL_US | fault-domain-1
AZURE | SOUTH_CENTRAL_US | fault-domain-2
AZURE | SOUTHEAST_ASIA | fault-domain-0
AZURE | SOUTHEAST_ASIA | fault-domain-1
AZURE | SOUTHEAST_ASIA | fault-domain-2
AZURE | WEST_EUROPE | fault-domain-0
AZURE | WEST_EUROPE | fault-domain-1
AZURE | WEST_EUROPE | fault-domain-2
AZURE | WEST_US | fault-domain-0
AZURE | WEST_US | fault-domain-1
AZURE | WEST_US | fault-domain-2
SOFTLAYER_BARE_METAL | AMS01 | AMS01
SOFTLAYER_BARE_METAL | AMS03 | AMS03
SOFTLAYER_BARE_METAL | DAL01 | DAL01
SOFTLAYER_BARE_METAL | DAL05 | DAL05
SOFTLAYER_BARE_METAL | DAL06 | DAL06
SOFTLAYER_BARE_METAL | DAL07 | DAL07
SOFTLAYER_BARE_METAL | DAL09 | DAL09
SOFTLAYER_BARE_METAL | FRA02 | FRA02
SOFTLAYER_BARE_METAL | HKG02 | HKG02
SOFTLAYER_BARE_METAL | HOU02 | HOU02
SOFTLAYER_BARE_METAL | LON02 | LON02
SOFTLAYER_BARE_METAL | MEL01 | MEL01
SOFTLAYER_BARE_METAL | MEX01 | MEX01
SOFTLAYER_BARE_METAL | MON01 | MON01
SOFTLAYER_BARE_METAL | PAR01 | PAR01
SOFTLAYER_BARE_METAL | SEA01 | SEA01
SOFTLAYER_BARE_METAL | SJC01 | SJC01
SOFTLAYER_BARE_METAL | SNG01 | SNG01
SOFTLAYER_BARE_METAL | SYD01 | SYD01
SOFTLAYER_BARE_METAL | TOK02 | TOK02
SOFTLAYER_BARE_METAL | TOR01 | TOR01
SOFTLAYER_BARE_METAL | WDC01 | WDC01
GCP | us-west1 | us-west1-a
GCP | us-west1 | us-west1-b
GCP | us-central1 | us-central1-a
GCP | us-central1 | us-central1-b
GCP | us-central1 | us-central1-c
GCP | us-central1 | us-central1-f
GCP | us-east1 | us-east1-b
GCP | us-east1 | us-east1-c
GCP | us-east1 | us-east1-d
GCP | europe-west1 | europe-west1-b
GCP | europe-west1 | europe-west1-c
GCP | europe-west1 | europe-west1-d
GCP | asia-east1 | asia-east1-a
GCP | asia-east1 | asia-east1-b
GCP | asia-east1 | asia-east1-c
GCP | asia-northeast1 | asia-northeast1-a
GCP | asia-northeast1 | asia-northeast1-b
GCP | asia-northeast1 | asia-northeast1-c
Customers running in their own accounts may have other racks available for provisioning.
Errors
Unsuccessful calls will return the following responses, depending upon the issue. Where possible we will provide meaningful detail on the failure:
400 Bad Request: Returned when the expected node or cluster ID is not a valid UUID.
404 Not Found: Returned when accessing an incorrect URL or trying to access a cluster/node not owned by the authenticated user.
429 Too Many Requests: Returned when the request rate exceeds 1 request / 2.5 seconds.
500 Server Error: All other errors.
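Of these errors, only 429 is resolved by waiting. A script that drives the provisioning API can therefore space its own calls to the documented 1 request per 2.5 seconds and retry only on 429. The following is a minimal Python sketch, not an Instaclustr client library. `send` is a hypothetical callable that you supply: it performs one HTTP request (with Basic Authentication, as described above) and returns `(status, body)`. The names `Throttle`, `call_with_retry` and `describe_error` are illustrative.

```python
import time

# Documented limit: each IP may make at most 1 request per 2.5 seconds.
MIN_INTERVAL_S = 2.5

# Short meanings of the documented error codes, for logging.
ERROR_MEANINGS = {
    400: "Bad Request: node or cluster ID is not a valid UUID",
    404: "Not Found: wrong URL, or cluster/node not owned by the authenticated user",
    429: "Too Many Requests: more than 1 request per 2.5 seconds",
    500: "Server Error: all other errors",
}


def describe_error(status):
    """Return a short explanation for a documented error status code."""
    return ERROR_MEANINGS.get(status, "Unexpected status %d" % status)


class Throttle:
    """Client-side spacing so consecutive calls are at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL_S, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def call_with_retry(send, throttle, max_attempts=4):
    """Call send() -> (status, body), retrying only when the API answers 429.

    Other errors (400, 404, 500) are returned immediately, because waiting
    does not fix an invalid UUID, a wrong URL or a server fault.
    """
    status, body = None, None
    for _ in range(max_attempts):
        throttle.wait()
        status, body = send()
        if status != 429:
            break
    return status, body
```

Injecting `clock` and `sleep` keeps the throttle testable without real delays. In normal use, `Throttle()` with the defaults is enough.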
(Read or write) request latency is the time taken to complete an operation, as reported by the coordinator node. It is the time from the client's read or write request being received, through contacting the right replicas across the network, to returning the result to the client.
Note: p95 is a 5 minute decaying statistic. Average is calculated over the 20 seconds between metrics collections and will be more volatile.
There are many potential causes of high request latency, such as cluster overload; a node falling behind on compactions, so that Cassandra must read many SSTables for a single read; high levels of tombstones; or overly large partitions.
If you are concerned about high request latencies on your cluster, feel free to contact Instaclustr Support.
Repair is a Cassandra operation which ensures that data consistency is eventually attained across the ring. It is I/O intensive but required to ensure the well-being of the cluster.
The graphs display two metrics related to repairs: active repairs are the number of repair-related messages (validation/streaming) currently being executed in a thread in Cassandra, while pending repairs are the number of repair messages queued up waiting for a thread.
Repairs are a scheduled operation. If you believe repairs may be causing issues with your cluster then please contact [email protected] and we will investigate.
Cassandra performs compactions to remove duplicate entries and merge SSTables to keep read throughput high and disk usage low. Compactions are an ongoing background task executed by Cassandra. Pending compactions is the number of compaction tasks that are waiting for a thread to become available.
Having a significant (>20) and/or increasing number of pending compactions usually indicates that the cluster has insufficient processing capacity for the load it is receiving. High pending compactions can be expected during bulk loads or while data is streamed to add or replace a node, and may be acceptable during peak periods, such as when running a batch update.
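If you collect the pending compactions metric for your own alerting, the rule of thumb above can be expressed as a simple check. This is a minimal Python sketch, not part of the Instaclustr console. It assumes you already have a list of recent samples for one node. It also makes two assumptions of our own: "increasing" means strictly rising across the whole window, and `bulk_load_in_progress` is a hypothetical flag that you set yourself during bulk loads or node streaming.

```python
# Rule of thumb from the text: more than 20 pending compactions, or a count
# that keeps increasing, usually means the cluster lacks processing capacity.
PENDING_THRESHOLD = 20


def is_increasing(samples):
    """True if every sample is strictly greater than the one before it.

    Strict growth across the window is an assumption of this sketch;
    a real check might fit a trend line instead.
    """
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def compaction_warning(samples, bulk_load_in_progress=False):
    """Return a warning string for a series of pending-compaction samples, or None."""
    if not samples or bulk_load_in_progress:
        # High values are expected during bulk loads or node streaming.
        return None
    latest = samples[-1]
    if latest > PENDING_THRESHOLD:
        return "pending compactions above %d (latest: %d)" % (PENDING_THRESHOLD, latest)
    if is_increasing(samples):
        return "pending compactions increasing (%d -> %d)" % (samples[0], latest)
    return None
```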
List
The Tombstones Per Read option displays the latest metrics for Tombstones and Live Cells scanned per read, for each column family, averaged across the cluster.
The Tombstone Mean column represents, on average, how many tombstones are read per read request.
The Tombstone Max column represents the highest number of tombstones read in a read request.
The Live Cells Mean column represents, on average, how many live cells are read per read request.
The Live Cells Max column represents the highest number of live cells read in a read request.
All values are calculated over roughly the last 5 minutes of activity.
High ratios of tombstones to live cells (greater than 5x as a starting guide) can substantially reduce read performance for a column family and suggest that the data model needs to change. Contact [email protected] if you need further assistance in dealing with tombstone issues.
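To apply the 5x starting guide across many column families at once, you can compute the ratio from the Tombstone Mean and Live Cells Mean columns. Below is a minimal Python sketch. The table names and numbers in the test are hypothetical. A column family that returns tombstones but no live cells is treated as having an infinite ratio, which is our own choice.

```python
# Starting guide from the text: a tombstone-to-live-cell ratio above 5x
# can substantially reduce read performance for a column family.
RATIO_GUIDE = 5.0


def tombstone_ratio(tombstone_mean, live_cells_mean):
    """Ratio of mean tombstones to mean live cells scanned per read."""
    if live_cells_mean == 0:
        return float("inf") if tombstone_mean > 0 else 0.0
    return tombstone_mean / live_cells_mean


def flag_tables(rows, guide=RATIO_GUIDE):
    """rows: iterable of (table, tombstone_mean, live_cells_mean).

    Returns the names of tables whose ratio exceeds the guide.
    """
    return [table for table, tomb, live in rows if tombstone_ratio(tomb, live) > guide]
```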
Graph
The Tombstones Per Read option shows the mean and maximum tombstones and live cells read per read request.
List
The SSTables Per Read option displays the latest mean and maximum metrics for each column family, averaged across the cluster.
The Mean column represents, on average (across all nodes over roughly the last five minutes), how many SSTables must be involved per read request.
The Maximum column represents the highest number of SSTables (across all nodes over roughly the last five minutes) that were involved in a read request.
High numbers of SSTables per read (typically more than 3 or 4, as a guide) can reduce read performance. If read performance is below the desired level, you may need to change the compaction strategy for the affected column family.
Graph
The SSTables Per Read option shows the mean and maximum SSTables involved per read request per node, over the last hour.
List
The Read Latency and Write Latency options display:
Reads/Writes: the average number of local read or write requests processed per second by each node in the cluster
95th Percentile: the latest mean (Avg. Node Mean) and 95th percentile (Avg. Node 95th % Latency) latency metrics
Distribution: the percentile distribution of latency metrics
for each column family in the cluster.
The Avg. Node Mean column is the mean read or write latency for the column family on each node in the cluster (over roughly the last five minutes), averaged across the cluster.
The Avg. Node 95th % Latency column is the read or write latency below which 95% of sampled values fall, on each node in the cluster (over roughly the last five minutes), averaged across the cluster.
The 50th percentile, 75th percentile, 95th percentile and 99th percentile columns show the respective latency values below which 50%, 75%, 95% and 99% of latency measurements fall (over roughly the last five minutes), averaged across the cluster.
High latency values may indicate a cluster at the edge of its processing capacity, issues with the data model such as poor choice of partition key or high levels of tombstones or issues with the underlying infrastructure. If you require assistance diagnosing the source of latency issues then please contact [email protected].
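The following sketch makes the "per node, then averaged across the cluster" wording concrete. It computes a nearest-rank percentile on each node's latency samples and then averages the per-node values. It illustrates the definition only. Cassandra derives its percentiles from bucketed latency histograms rather than from raw sample lists, and the node names and latencies in the test are made up.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer arithmetic first avoids float error (e.g. 0.7 * 10 != 7.0 exactly).
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def avg_node_percentile(latencies_by_node, p):
    """Compute the p-th percentile on each node, then average across the cluster."""
    return mean(percentile(s, p) for s in latencies_by_node.values())
```

For example, `avg_node_percentile({"node-a": a_ms, "node-b": b_ms}, 95)` corresponds in spirit to the Avg. Node 95th % Latency column.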
Graph
The Read Latency and Write Latency options show:
Reads/Writes: the number of local read or write requests per second for the column family, per node, over the most recent period of time
95th Percentile: the mean and 95th percentile latency metrics per node, over the last hour
Distribution: the 50th, 75th, 95th and 99th percentile latency metrics per node, over the last hour
The Cassandra Service Status provides an indicator of the current state of the Cassandra service on the node. We report two statuses:
Up indicates that the Cassandra service appears to be running normally;
Support Alerted indicates that the service is not in a normal running state or that it has lost contact with our monitoring system. Instaclustr Support will be aware of this situation and working to restore the node to a normal state as soon as possible.
It is important to understand that Cassandra is designed to operate normally when one or more nodes in the cluster are unavailable. Instaclustr Support will occasionally restart individual nodes for planned maintenance, such as upgrading Cassandra or patching the operating system. Planned maintenance will normally result in only a single node in the cluster being unavailable at any point in time.
Should you have any concerns or queries regarding the Cassandra Service Status of a node in your cluster then please contact Instaclustr Support.
The reads and writes per second metrics show the number of reads and writes per second completed by a node, averaged over 10-second intervals. Note that client requests are those for which the node serves as the coordinator for the operation, as opposed to local requests, for which the node actually reads or writes the data.
The number of reads and writes per second is a good indicator of the load that a node is sustaining. Increases in load above the available capacity of the node may result in increased latency of operations and, eventually, timeout errors.
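Suppose you collect cumulative request counts yourself, for example by sampling a node's request counters at regular intervals. The per-second rate over each interval is then the change in count divided by the elapsed time. This Python sketch assumes cumulative, monotonically increasing counts. It does not handle counter resets after a node restart, and the sample values in the test are invented.

```python
def per_second_rates(samples):
    """samples: list of (timestamp_s, cumulative_count), sorted by time.

    Returns the average requests per second over each interval between
    consecutive samples (for example, 10-second collection intervals).
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) / (t1 - t0))
    return rates
```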
Should you have any queries regarding the processing capacity of your cluster please contact Instaclustr Support.
The CPU usage metric shows the percentage of CPU utilised on your node. The percentage shown is a percentage of the total CPU available (i.e. the maximum possible is 100%, no matter how many CPU cores the node has).
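On a Linux host, you can derive a whole-machine CPU percentage with the same 0-100% range from the aggregate `cpu` line of `/proc/stat`, which sums time across all cores. The sketch below shows that calculation. It is not necessarily how Instaclustr's monitoring computes this metric. Guest time is skipped because the kernel already counts it in the user and nice fields, and iowait is treated as idle. The sample lines in the test are invented.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of Linux /proc/stat into (idle, total) jiffies."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal (guest is already in user/nice)
    values = [int(v) for v in fields[1:9]]
    values += [0] * (8 - len(values))  # older kernels report fewer fields
    idle, iowait = values[3], values[4]
    return idle + iowait, sum(values)


def cpu_percent(before, after):
    """Busy share of all cores between two samples: 0-100 regardless of core count."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    total = total1 - total0
    if total <= 0:
        return 0.0
    return 100.0 * (1 - (idle1 - idle0) / total)
```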
High CPU usage is one indicator of a node reaching the limits of its processing capacity.
If you are experiencing consistently high CPU usage and not reaching the desired throughput on your Cassandra cluster you may need to tune your data model or add nodes to your cluster to increase processing capacity.
Please contact Instaclustr Support should you have any questions regarding this metric or the processing capacity of your cluster.
The disk usage metric shows the percentage of space used on the data partition of a node. This includes the main Cassandra files containing your data as well as working files such as snapshots for backups and temporary copies of files used for compactions.
We recommend that disk usage is kept to less than 70% during normal running to allow temporary working space for Cassandra operations. If your cluster is regularly exceeding 70% then you should consider adding capacity or removing data.
Instaclustr Support monitors disk use for all managed clusters and will notify you if your cluster is exceeding recommended levels of disk usage.
1. To create an Elassandra cluster, log in to the Instaclustr console. If you do not have an Instaclustr account, refer to our support article to sign up.
2. Once you are logged in, click the Create Cassandra Cluster button.
3. On the Create Cassandra Cluster page, enter an appropriate name and network address block for your cluster. Refer to our support article on Network Address Allocation to understand how we divide up the specified network range to determine the node IP addresses. Under the Applications section, select Elassandra, then select any add-ons you require.
4. Under the Data Centre section, select your Infrastructure Provider, Region, Custom Name (a logical name for the data centre within Cassandra; a default Custom Name is provided for you), Node Size and EBS Encryption option.
5. Under the Cassandra Options section, select your Network and Security settings.
6. Under the Elassandra Options section, select the security checkbox if you want to add your current IP address to the firewall rules for Elassandra API calls. Do the same under the Kibana Options section. The Summary section displays a brief summary of your cluster configuration and pricing details. Click the Terms and Conditions link to open the Instaclustr Terms and Conditions and other policy documents. After reading the documents, select the checkbox to accept the Terms and Conditions. Once you are happy with the cluster configuration and have accepted the terms and conditions, click the Create Cluster button to start creating the cluster.
7. Once the cluster has been provisioned, it will be listed on your Cluster Overview page.
8. Now that you have an Elassandra cluster up and running, refer to the following support articles to configure and connect to your cluster:
Updating Firewall Rules
Connecting to the Elassandra Cluster
Once the cluster is provisioned, you can update firewall rules to specify individual hosts and networks that can connect to the cluster.
1. On the Cluster Overview page, click Cluster Settings from the Manage Cluster menu of your cluster.
2. On the Settings page, add or update IP addresses in Cassandra Allowed Addresses, Elassandra REST API Allowed Addresses and Kibana Allowed Addresses. Addresses listed in these sections will be allowed to connect to the respective application ports. Once all the required IP addresses are added, click the Save Cluster Settings button.
Before connecting to the cluster, take note of the node addresses and the various login credentials.
Node Addresses and Login Credentials
The Connection Info page contains a list of node addresses and login credentials for Cassandra, the Elassandra REST API and Kibana.
1. Click Connection Details from the Manage Cluster menu to access the Connection Info page.
2. On the Connection Info page, you can view node addresses under the Cassandra Node Addresses section.
3. Login credentials can be found under the Default Credentials for Password Authentication section.
4. Elassandra REST API credentials and certificates can be found under the Elassandra section.
5. Kibana credentials can be found under the Kibana section.
Connecting to the cluster using cqlsh
To connect to the cluster using cqlsh, refer to our support article Connecting to Clusters Using CQLSH. Connecting and running CQL commands work the same way for Elassandra clusters.
Elassandra REST API
You can also communicate with the cluster through the Elassandra REST API. The following examples show how to interact with the REST API using the curl command. Replace the keywords in <> with the corresponding cluster configuration (see the Node Addresses and Login Credentials section at the start of this article).
1. To view the state of the cluster:
curl -XGET https://<elassandra_REST_API_URL>:9201/_cluster/state -u <elassandra_username>:<elassandra_password>
Result:
{"cluster_name":"Elassandra_Cluster","version":11,"state_uuid":"9ESEjhiJTQCaaD1-kus7Gw","master_node":"75375ebd-4692-4af4-8cda-6d2bbd48e0b3","blocks":{},"nodes":{"75375ebd-4692-4af4-8cda-6d2bbd48e0b3":{"name":"34.208.26.143","status":"ALIVE","transport_address":"34.208.26.143:9300","attributes":{"rack":"us-west-2c","data":"true","data_center":"AWS_VPC_US_WEST_2","master":"true"}},"04c0eeb2-c76c-4e1b-b4e2-4e296cf4d935":{"name":"35.165.13.99","status":"ALIVE","transport_address":"35.165.13.99:9300","attributes":{"rack":"us-west-2a","data":"true","data_center":"AWS_VPC_US_WEST_2","master":"true"}},"7471bfc1-fad8-421f-b67a-b061627f18a3":{"name":"52.34.201.185","status":"ALIVE",
2. To add an index:
curl -XPUT "https://<elassandra_REST_API_URL>:9201/samples" -d '{
"settings" : {
"index" : {
"number_of_replicas" : 2,
"index.search_strategy_class" : "RandomSearchStrategy",
"index.token_ranges_bitset_cache" : true
}
}
}' -u <elassandra_username>:<elassandra_password>
Result:
{"acknowledged":true}
The command creates a keyspace called samples. View the keyspace using cqlsh:
iccassandra@cqlsh> DESCRIBE samples;
Result:
CREATE KEYSPACE samples WITH replication = {'class': 'NetworkTopologyStrategy', 'AWS_VPC_US_WEST_2': '3'} AND durable_writes = true;
3. Add a mapping: this creates a table and the mapping from CQL types to the Elasticsearch types used in the secondary index. It uses the Elasticsearch mapping API with some CQL-specific additions.
curl -XPUT "https://<elassandra_REST_API_URL>:9201/samples/_mapping/earthquakes" -d '{
"earthquakes": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed",
"cql_collection": "singleton"
},
"location": {
"type": "geo_point"
},
"mag": {
"type": "double",
"cql_collection": "singleton",
"cql_partition_key": false,
"cql_primary_key_order": 1
},
"place": {
"type": "string",
"index": "analyzed",
"cql_collection": "singleton"
},
"time": {
"type": "date",
"cql_collection": "singleton",
"format": "strict_date_optional_time||epoch_millis||EEE MMM dd HH:mm:ss zzz yyyy",
"cql_partition_key": true,
"cql_primary_key_order": 0
},
"url": {
"type": "string",
"index": "not_analyzed",
"cql_collection": "singleton"
},
"felt": {
"type": "boolean",
"cql_collection": "singleton"
},
"cdi": {
"type": "double",
"cql_collection": "singleton"
},
"tsunami": {
"type": "boolean",
"cql_collection": "singleton"
},
"sig": {
"type": "integer",
"cql_collection": "singleton"
},
"type": {
"type": "string",
"index": "not_analyzed",
"cql_collection": "singleton"
},
"title": {
"type": "string",
"index": "analyzed",
"cql_collection": "singleton"
}
}
}
}' -u <elassandra_username>:<elassandra_password> -w "\n"
Result:
{"acknowledged":true}
The table earthquakes is created in the samples keyspace. View the table using cqlsh:
iccassandra@cqlsh> USE samples;
iccassandra@cqlsh:samples> DESCRIBE earthquakes;
Result:
CREATE TABLE samples.earthquakes (
time timestamp,
mag double,
cdi double,
felt boolean,
id text,
location list<frozen<geo_point>>,
place text,
sig int,
title text,
tsunami boolean,
type text,
url text,
PRIMARY KEY (time, mag)
) WITH CLUSTERING ORDER BY (mag ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'Auto-created by Elassandra'
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE CUSTOM INDEX elastic_earthquakes_felt_idx ON samples.earthquakes (felt) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_type_idx ON samples.earthquakes (type) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_sig_idx ON samples.earthquakes (sig) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_id_idx ON samples.earthquakes (id) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_url_idx ON samples.earthquakes (url) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_title_idx ON samples.earthquakes (title) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_cdi_idx ON samples.earthquakes (cdi) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_tsunami_idx ON samples.earthquakes (tsunami) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_place_idx ON samples.earthquakes (place) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_mag_idx ON samples.earthquakes (mag) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
CREATE CUSTOM INDEX elastic_earthquakes_location_idx ON samples.earthquakes (values(location)) USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
4. Indexing: rows (documents) can be indexed through CQL or via the REST API:
curl -XPUT "https://<elassandra_REST_API_URL>:9201/samples/earthquakes/1469922907000" -d '{
"id": "pr16212011",
"mag": 2.0,
"cdi": 0,
"felt": false,
"location": [{"lat": 18.290100, "lon": -67.235600}],
"place": "Mona Passage, Puerto Rico",
"sig": 62,
"title": "M 2.0 - Mona Passage, Puerto Rico",
"tsunami": false,
"type": "earthquake",
"url": "https://earthquake.usgs.gov/earthquakes/eventpage/pr16212011"
}' -u <elassandra_username>:<elassandra_password>
Result:
{"_index":"samples","_type":"earthquakes","_id":"1469922907000","_version":1,"_shards":{"total":3,"successful":1,"failed":0},"created":true}
5. Data or documents in a CSV file can be indexed using CQLSH from the command line.
$ cat <csv_filename> | cqlsh <cluster_ip> 9042 -u <cassandra_username> -p <cassandra_password> -e "COPY <keyspace_name>.<table_name> FROM stdin"
6. To retrieve a document:
curl -XGET "https://<elassandra_REST_API_URL>:9201/samples/earthquakes/1469922907000" -u <elassandra_username>:<elassandra_password>
Result:
{"_index":"samples","_type":"earthquakes","_id":"1469922907000","_version":1,"found":true,"_source":{"time":"2016-07-30T23:55:07.000Z"}}
This is equivalent to the following CQL command:
iccassandra@cqlsh> SELECT * FROM samples.earthquakes WHERE time = '1469922907000';
7. Searching: the following are some examples of search commands using the REST API.
Find rows with Japan in the title:
curl -XGET "https://<elassandra_REST_API_URL>:9201/samples/earthquakes/_search?q=title:japan" -u <elassandra_username>:<elassandra_password>
Find rows with a specific magnitude:
curl -XGET "https://<elassandra_REST_API_URL>:9201/samples/earthquakes/_search?q=mag:6.3" -u <elassandra_username>:<elassandra_password>
Multi-word query:
curl -XGET "https://<elassandra_REST_API_URL>:9201/samples/earthquakes/_search?q=title:Raoul%20New%20Zealand" -u <elassandra_username>:<elassandra_password>
Find rows that contain all of the searched words:
curl -XGET "https://<elassandra_REST_API_URL>:9201/samples/earthquakes/_search?pretty=true" -d '{ "query" : { "match" : { "place" : {"query" : "Raoul New Zealand", "operator" : "and"} }}}' -u <elassandra_username>:<elassandra_password>
8. Aggregation: here is an aggregate query that groups and counts earthquakes by magnitude.
curl -XGET "https://<elassandra_REST_API_URL>:9201/samples/earthquakes/_search?pretty=true" -d '{ "size" : 0,
"aggs" : {
"magnitudes" : {
"histogram" : {
"field" : "mag",
"interval" : 1
}
}
}
}' -u <elassandra_username>:<elassandra_password>
Result:
{
"took" : 268,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 75062,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"magnitudes" : {
"buckets" : [ {
"key" : 1,
"doc_count" : 44067
}, {
"key" : 2,
"doc_count" : 14433
}, {
"key" : 3,
"doc_count" : 3490
}, {
"key" : 4,
"doc_count" : 11413
}, {
"key" : 5,
"doc_count" : 1545
}, {
"key" : 6,
"doc_count" : 103
}, {
"key" : 7,
"doc_count" : 11
} ]
}
}
}
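For scripts, the same REST calls can be made without curl. The sketch below uses only Python's standard library. The base URL and credentials are placeholders taken from the Connection Info page. `build_request`, `send` and `MAGNITUDE_HISTOGRAM` are illustrative names, not part of Elassandra. Request bodies are sent as JSON. For search bodies, use POST, which Elasticsearch also accepts for `_search`.

```python
import base64
import json
import urllib.request

# Placeholders: take these values from the Connection Info page of your cluster.
ELASSANDRA_URL = "https://<elassandra_REST_API_URL>:9201"
USERNAME = "<elassandra_username>"
PASSWORD = "<elassandra_password>"


def build_request(method, path, body=None, base=ELASSANDRA_URL,
                  username=USERNAME, password=PASSWORD):
    """Build a Basic-Auth request equivalent to the curl examples in this article."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode()).decode()
    headers = {"Authorization": "Basic " + token}
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base + path, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode())


# The magnitude histogram aggregation from step 8, as a Python dict.
MAGNITUDE_HISTOGRAM = {
    "size": 0,
    "aggs": {"magnitudes": {"histogram": {"field": "mag", "interval": 1}}},
}
```

Once the placeholders are filled in, `send(build_request("GET", "/_cluster/state"))` mirrors step 1. `send(build_request("POST", "/samples/earthquakes/_search", MAGNITUDE_HISTOGRAM))["aggregations"]["magnitudes"]["buckets"]` returns the bucket list shown in the result above.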
Elassandra (Elasticsearch + Cassandra) is a fork of Elasticsearch modified to run on top of Apache Cassandra to provide advanced search features on Cassandra tables. In this tutorial we will walk you through the basic steps of setting up an Instaclustr Elassandra cluster with Zeppelin on Amazon Web Services (AWS), and show how to query and visualize Elassandra indexes using the Elasticsearch interpreter. The high-level steps are:
Provision a cluster with Elassandra and Zeppelin
Create a Zeppelin notebook based on Elasticsearch interpreter
Add data to Elassandra using Zeppelin Elasticsearch interpreter
Query and search data via Zeppelin notebook
1. Provision a cluster with Elassandra and Zeppelin
a) If you haven't already signed up for an Instaclustr account, refer to our support article to sign up and create an account.
b) Once you have signed up for Instaclustr and verified your email, log in to the Instaclustr console and click the Create Cassandra Cluster button.
c) On the Create Cassandra Cluster page, enter an appropriate name and network address block for your cluster. Refer to our support article on Network Address Allocation to understand how we divide up the specified network range to determine the node IP addresses. Under the Applications section, select:
Elassandra 2.4.2.13 (Cassandra 3.0.10) (preview)
Apache Zeppelin as an Add-on
d) Under the Data Centre section, select:
Amazon Web Services as the Infrastructure Provider
A minimum node size of t2.medium
e) Leave the other options as default. Accept the terms and conditions and click the Create Cluster button.
The cluster will automatically provision and will be available for use once all nodes are in the running state.
2. Create a notebook based on Elasticsearch interpreter
a) Once all nodes in the cluster are in the running state, click the Zeppelin tab to get to its dashboard.
b) You will be asked to provide Zeppelin account credentials, which can be found on the Connection Info page.
c) On the Zeppelin Dashboard, click Create new note. In the Create New Note dialog box, choose a name for the notebook, select elasticsearch as the Default Interpreter and click the Create Note button.
d) The notebook is already preconfigured to use the Elasticsearch interpreter. Click the gear button at the top right of the notebook to see the enabled interpreters, in particular Elasticsearch.
e) Make sure the Elasticsearch interpreter is at the top of the list and the Cassandra interpreter is enabled. Click the Save button to save the settings.
3. Add data to Elassandra using Zeppelin Elasticsearch interpreter
To start off, let's index some data into Elassandra by running the commands below, one per paragraph. Note: if Elasticsearch is not your default interpreter, put %elasticsearch at the top of each paragraph to get it to run.
index twitter/user/kimchy { "name" : "Shay Banon" }
Index some more data by running the following commands on the notebook:
index twitter/tweet/1 {
"postDate": "2009-11-15T13:12:00",
"message": "Trying out Zeppelin Elasticsearch interpreter, so far so good?"}
index twitter/tweet/2 {
"postDate": "2009-11-15T14:12:12",
"message": "Another tweet, will it be indexed?"}
index twitter/tweet/3 {
"postDate": "2009-11-15T15:12:12",
"message": "Give me my index and no query gets hurt!"}
index twitter/tweet/4 {
"postDate": "2009-11-16T15:12:12",
"message": "Index it before search it!"}
4. Query Elassandra data
Once the data is in Elassandra, we can search using Zeppelin, for example:
get twitter/user/kimchy
count twitter/tweet
search twitter/tweet
The result of a search query can also be viewed graphically (histograms, pie charts, etc.) or downloaded as a CSV (comma-separated values) or TSV (tab-separated values) file using the display and download buttons on the result paragraph.
We can also search for specific words or strings:
search twitter/tweet { "query": { "query_string": { "query": "good" } } }
search twitter/tweet { "query": { "query_string": { "query": "it" } } }
Finally, to get the list of available commands, run:
help
5. Conclusion
In this tutorial you have learned how to:
Provision a cluster with Elassandra and Zeppelin
Create a Zeppelin notebook based on Elasticsearch interpreter
Add, query and search data via Zeppelin notebook
For more information, refer to the following resources:
http://www.elassandra.io/
Elassandra (Elasticsearch + Cassandra) is a fork of Elasticsearch modified to run on top of Apache Cassandra to provide advanced search features on Cassandra tables. In this tutorial we will walk you through the basic steps of setting up an Instaclustr Elassandra cluster with Spark on Amazon Web Services (AWS) and how to write and query Elassandra from Spark. The high-level steps are:
Provision a cluster with Elassandra and Spark
Set up a Spark client to communicate with Elassandra via the Elasticsearch REST API
Configure network access
Run basic queries to read data from Elassandra using Spark Shell
Submit a Spark job to write to Elassandra index
This tutorial assumes that you are familiar with launching and connecting to servers in AWS. While this tutorial uses AWS, we also support Spark on Azure, IBM SoftLayer, and Google Cloud Platform. You can follow a similar approach to set up on those platforms or contact [email protected] if you need more detailed instructions.
1. Provision a cluster with Elassandra and Spark
a) If you haven't already signed up for an Instaclustr account, refer to our support article to sign up and create an account.
b) Once you have signed up for Instaclustr and verified your email, log in to the Instaclustr console and click the Create Cassandra Cluster button.
c) On the Create Cassandra Cluster page, enter an appropriate name and network address block for your cluster. Refer to our support article on Network Address Allocation to understand how we divide up the specified network range to determine the node IP addresses. Under the Applications section, select:
Elassandra 2.4.2.13 (Cassandra 3.0.10) (preview)
Apache Spark as an Add-on
d) Under the Data Centre section, select:
Amazon Web Services as the Infrastructure Provider
A minimum node size of m4l-250
e) Leave the other options as default. Accept the terms and conditions and click the Create Cluster button.
The cluster will automatically provision and will be available for use once all nodes are in the running state.
2. Set Up a Spark Client
To use your newly created Spark cluster, you will need to set up a client machine to submit jobs. Use the following steps to set up a client in AWS:
a) Provision a new AWS server with the following configuration:
Region: same as your newly created Elassandra and Spark cluster
VPC: if possible, use a VPC with DNS resolution and DNS hostname enabled (otherwise, see step g below). The VPC network range should not overlap with the network range of your Instaclustr cluster.
AMI: Ubuntu Server 14.04 LTS (HVM), SSD Volume Type
Size: t2.small or t2.micro is sufficient for this tutorial and for many ongoing use cases
b) ssh to the newly launched server with ubuntu as the username.
c) Download the Spark version that matches the cluster created in Step 1. In this case, Spark 1.6.0:
wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
d) Extract the Spark files:
tar -xvf spark-1.6.0-bin-hadoop2.6.tgz
e) Download the Spark Elassandra assembly JAR (a fat JAR built by Instaclustr that includes all required dependencies, for use with the Spark shell). The latest version available for your Spark version should be accessible via the Connection Info page of the Instaclustr console.
wget https://static.instaclustr.com/spark/spark-elassandra-connector-assembly-1.6.2.jar
f) Install the Java Development Kit:
sudo apt-get update
sudo apt-get install default-jdk
g) If you are not using a VPC with DNS resolution and DNS hostname enabled, you will need to change the hostname of the client to its IP address so that it resolves when used by Spark. This is a bit of a hack; the right way is to edit /etc/hosts, but this is quicker:
sudo hostname spark_client_private_ip
h) Install curl to run some test REST calls from the command line:
sudo apt-get install curl
i) To build the final Java example, install Maven:
sudo apt-get install maven
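Step a) requires that the client VPC range does not overlap the cluster's network range, which is shown on the cluster details page. Before creating the VPC, you can check two candidate CIDR blocks with Python's standard `ipaddress` module. This is a minimal sketch; `ranges_overlap` is an illustrative name, and the ranges in the test are made-up examples.

```python
import ipaddress


def ranges_overlap(client_vpc_cidr, cluster_cidr):
    """True if the two CIDR blocks share any addresses, so the VPCs cannot be peered cleanly."""
    client = ipaddress.ip_network(client_vpc_cidr)
    cluster = ipaddress.ip_network(cluster_cidr)
    return client.overlaps(cluster)
```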
3. Configure Client Network Access
As Spark has minimal security, we recommend that you access Spark from a peered VPC in AWS to increase the security of network-based access rules. To set up the peered VPC and allow connections from your VPC to the cluster, follow our support article on Using VPC Peering AWS.
Note: When following the VPC Peering instructions, you must add your VPC network range to the Spark Allowed Addresses and the Cassandra Allowed Addresses. The Spark driver on your client machine needs to be able to connect to Cassandra as well as the Spark workers (to establish partition ranges).
To add your VPC network range to Cassandra and Spark allowed addresses:
a) On the Cluster Overview page, click Cluster Settings from the Manage Cluster menu of your cluster.
b) On the Settings page, add your VPC network range to Cassandra Allowed Addresses, Elassandra REST API Allowed Addresses, Spark Masters Allowed Addresses and Spark Jobserver Allowed Addresses. After adding the VPC network range, click the Save Cluster Settings button.
In addition to connections from the Spark Client to the cluster, the architecture of Spark means that the Spark Cluster needs to be able to connect to the clients. Enable this in AWS by editing the security group associated with your Spark Client to add an Inbound rule with the following values:
Type: Custom TCP Rule
Protocol: TCP
Port Range: 1024-65535
Source: Custom IP, <your cluster network range> (viewable from the cluster details page in the Instaclustr console)
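The sketch below restates the inbound rule above as data and checks which connections it would admit: TCP only, ports 1024-65535, and a source address inside the cluster network range. It is an illustration, not an AWS API call. `rule_allows` is a hypothetical name, and the addresses in the test are invented.

```python
import ipaddress

# Inbound rule from the text: TCP, ports 1024-65535, source = cluster network range.
RULE_PORTS = range(1024, 65536)  # range() excludes the end, so 65535 is included


def rule_allows(cluster_cidr, source_ip, port, protocol="tcp"):
    """Would the described security-group rule admit this inbound connection?"""
    return (
        protocol == "tcp"
        and port in RULE_PORTS
        and ipaddress.ip_address(source_ip) in ipaddress.ip_network(cluster_cidr)
    )
```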
4. Open Elassandra REST Port in the Cluster
By default, Elasticsearch listens for REST API calls on port 9200 and uses port 9300 for internal communication between nodes. Elassandra uses the same ports. However, these ports are not open in the cluster for security reasons. Instead, the client machine can talk to Elassandra on port 9201, which is secured by a CA-signed certificate in the cluster. To allow the client to connect to Elassandra's REST port (9201) in the cluster:
Log in to the Instaclustr console and click the Settings tab inside your cluster panel.
In the Elassandra REST API Allowed Addresses box, add the global IP address of your client.
Finally, save the changes (see steps 3a and 3b above).
Having these settings in place allows the client to make REST calls to Elassandra on port 9201.
5. Run a Cluster Health Check
Before running any Spark job, either via Spark Shell or the spark-submit command, it is worth checking that the cluster has been properly set up and configured. You can do this by running some basic curl commands to index and read Twitter-like information (a demo from Elasticsearch).
Note: Before running the commands, find the required authentication and URL information on the Connection Info page in the console and substitute it into the commands. To find the authentication and URL information:
a) On the Cluster Overview page, click Connection Details from the Manage Cluster menu of your cluster.
b) On the Connection Info page, you can view the Elassandra REST API username, password, port number and URL.
Once you have the authentication and URL information, log in to your Spark Client and run the following commands to create a Twitter user and two tweets in Elassandra (replace the placeholders in <> with the corresponding authentication, IP and URL information):
curl -XPUT -H 'Content-Type: application/json' 'https://<elassandra_username>:<elassandra_password>@<elassandra_REST_API_URL>:9201/twitter/user/kimchy' -d '{ "name" : "Shay Banon" }'
curl -XPUT -H 'Content-Type: application/json' 'https://<elassandra_username>:<elassandra_password>@<elassandra_REST_API_URL>:9201/twitter/tweet/1' -d '{ "user": "kimchy", "postDate": "2009-11-15T13:12:00", "message": "Trying out Elassandra, so far so good?"}'
curl -XPUT -H 'Content-Type: application/json' 'https://<elassandra_username>:<elassandra_password>@<elassandra_REST_API_URL>:9201/twitter/tweet/2' -d '{ "user": "kimchy", "postDate": "2009-11-15T14:12:12", "message": "Another tweet, will it be indexed?"}'
Each command should return a message similar to the one below, indicating that the document was indexed successfully.
{"_index":"twitter","_type":"user","_id":"3","_version":1,"_shards":{"total":1,"successful":1,"failed":0},"created":true}
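If curl is not available on the client, the same indexing request can be built with Python's standard library. This is a minimal sketch, not an Instaclustr tool: the helper name build_index_request is hypothetical, and it assumes the host placeholder is the bare Elassandra REST API hostname (no https:// prefix). It only builds the request so you can inspect it before sending it with urllib.request.urlopen.

```python
import base64
import json
import urllib.request


def build_index_request(host, user, password, path, document, port=9201):
    """Build (but do not send) a PUT request that indexes `document` at `path`.

    Hypothetical helper mirroring the curl commands above: Basic Authentication
    over HTTPS on the Elassandra REST port (9201 by default).
    """
    url = f"https://{host}:{port}/{path.lstrip('/')}"
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Usage (replace the placeholders first, then send it):
# req = build_index_request("<elassandra_REST_API_URL>", "<elassandra_username>",
#                           "<elassandra_password>", "twitter/user/kimchy",
#                           {"name": "Shay Banon"})
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```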
6. Run basic queries against Elassandra from Spark Shell
We will now connect to the Spark cluster using the Spark Shell and run some queries.
a) Note the IP addresses of the three Spark Masters in your cluster. These are shown on the Spark tab for your cluster in the Instaclustr console.
b) Log in to your Spark Client and run the following commands:
$ cd ~/spark-1.6.0-bin-hadoop2.6/bin
$ ./spark-shell --master spark://<spark_master_IP1>:7077,<spark_master_IP2>:7077,<spark_master_IP3>:7077 --conf spark.es.nodes=<elassandra_Public_IP_Address> --conf spark.es.port=9202 --conf spark.es.net.ssl.cert.allow.self.signed=true --conf spark.es.net.ssl=true --conf spark.es.net.http.auth.user=<elassandra_username> --conf spark.es.net.http.auth.pass=<elassandra_password> --jars ~/spark-elassandra-connector-assembly-1.6.2.jar
c) Spark Shell should start without any errors, although it prints a lot of log messages. Once it has fully started, you will see a scala> prompt.
d) Some imports are necessary. For our simple job, enter the following at the prompt:
import org.elasticsearch.spark._
e) Now run the following code to read the Elassandra index and create a Spark RDD. By default, the documents are returned as a Tuple2 with the document ID as the first element and the document itself as the second element.
val tweet_rdd = sc.esRDD("twitter/tweet")
tweet_rdd.take(5).foreach(println)
f) You should now see the following output:
(1,Map(postDate -> Sun Nov 15 13:12:00 UTC 2009, message -> Trying out Elassandra, so far so good?, user -> kimchy))
(2,Map(postDate -> Sun Nov 15 14:12:12 UTC 2009, message -> Another tweet, will it be indexed?, user -> kimchy))
7. Submit a Spark job to read from an Elassandra index
In this step of the tutorial we will demonstrate how to build and submit a Spark job, written in Java, that reads from an Elassandra index and counts its documents.
a) Create the following folder structure on the client.
Spark_job/src/main/java/SimpleApp.java
Spark_job/pom.xml
b) Add the following code to the maven pom file (pom.xml):
<project>
  <groupId>com.instaclustr</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop</artifactId>
      <version>2.2.0</version>
    </dependency>
  </dependencies>
</project>
c) Create a file called SimpleApp.java in the java directory and add the code below. Replace the placeholders in <> with the corresponding cluster settings, which are shown on the Connection Info page of the console (refer to steps 5a, 5b and 6a above). Keep the quotes around each replaced value:
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class SimpleApp {
    public static void main(String[] args) {
        // Connect to the three Spark masters (private IPs from the Spark tab)
        SparkConf conf = new SparkConf()
            .setAppName("Simple Application")
            .setMaster("spark://<Spark_Private_IP1>:7077,<Spark_Private_IP2>:7077,<Spark_Private_IP3>:7077");
        // elasticsearch-hadoop connection settings; values must be string literals
        conf.set("es.nodes", "<elassandra_Public_IP_Address>");
        conf.set("es.port", "9202");
        conf.set("es.net.ssl", "true");
        conf.set("es.net.ssl.cert.allow.self.signed", "true");
        conf.set("es.net.http.auth.user", "<elassandra_username>");
        conf.set("es.net.http.auth.pass", "<elassandra_password>");

        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            // Read twitter/user as (document ID, document fields) pairs and count them
            JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, "twitter/user");
            System.out.println("Records Count:" + esRDD.count());
        } finally {
            jsc.stop();
        }
    }
}
d) Build the job (from the Spark_job directory):
mvn package
e) Submit the job (from the Spark_job directory):
$ ~/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --jars ~/spark-elassandra-connector-assembly-1.6.2.jar --class "SimpleApp" target/simple-project-1.0.jar
You should see a lot of log messages, with the "Records Count:" line near the end.
8. Conclusion
In this tutorial you have learned how to:
Provision a cluster with Elassandra and Spark
Set up and configure a Spark client
Run basic curl commands from the Spark client
Run basic queries against Elassandra from Spark Shell
Submit a Spark job to read from an Elassandra index
For more information, refer to the following resource:
http://www.elassandra.io/
Instaclustr's Cluster Health page exposes a number of indicators to help you understand your cluster's long-term performance. There are three potential states for each indicator:
Green represents a healthy state;
Amber represents a warning state; and
Red represents a failed state.
Disk Usage Indicator
The Disk Usage indicator checks the percentage of disk space used on each node. Disk usage above 75%-80% in the last hour indicates that the node is filling up and is very likely unable to provide enough working space for normal Cassandra operations. Please refer to Disk Usage for more details.
Suggested fix for non-healthy states:
Remove excess data from the cluster
Add more nodes to the cluster
Partition Size Indicator
The Partition Size indicator checks the size of the largest partition in each table. We recommend limiting the maximum partition size to 10MB for optimal performance, with 100MB as an upper limit for ongoing stability. Large partitions may significantly impact the performance of Cassandra operations. Please refer to Partition Size for more details.
Suggested fix for non-healthy states:
Remove the problem partition
Re-assess the data model, as data may be unevenly distributed or bunched into too few partitions
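As a rough self-check against the thresholds above, a table's largest partition size (for example, the max sub-type of the partitionSize table metric from the monitoring API, which is reported in KB) can be mapped to the three indicator colours. This is a minimal sketch: treating sizes between 10MB and 100MB as Amber, and assuming 1MB = 1024KB, are our own assumptions rather than the exact rule the Cluster Health page applies. The function name partition_size_state is hypothetical.

```python
RECOMMENDED_MAX_MB = 10   # recommended maximum partition size (from the text above)
UPPER_LIMIT_MB = 100      # upper limit for ongoing stability (from the text above)
KB_PER_MB = 1024          # assumption: binary megabytes


def partition_size_state(max_partition_kb: float) -> str:
    """Classify a table's largest partition as green, amber or red.

    Hypothetical mapping: within the 10MB recommendation is green, between
    10MB and the 100MB upper limit is amber, anything larger is red.
    """
    size_mb = max_partition_kb / KB_PER_MB
    if size_mb <= RECOMMENDED_MAX_MB:
        return "green"
    if size_mb <= UPPER_LIMIT_MB:
        return "amber"
    return "red"
```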
Tombstones to Live Cells Indicator
The Tombstones to Live Cells indicator checks the average ratio of tombstones to live cells per read in each table. High ratios of tombstones to live cells (greater than 5x as a starting guide) can substantially reduce read performance for a table. Please refer to the related support article for more details.
Suggested fix for non-healthy states:
Tune the compaction strategy to more aggressively remove tombstones
Re-assess the data model
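The ratio itself is simple arithmetic on two per-read averages, for example the average sub-types of the tombstonesPerRead and liveCellsPerRead table metrics described in the monitoring API reference. The sketch below assumes the 5x starting guide above as its threshold; the function names are hypothetical, and the way a read with zero live cells is treated is our own choice.

```python
def tombstone_ratio(tombstones_per_read: float, live_cells_per_read: float) -> float:
    """Average tombstones read per live cell read."""
    if live_cells_per_read <= 0:
        # Assumption: any tombstones with no live cells is treated as unbounded.
        return float("inf") if tombstones_per_read > 0 else 0.0
    return tombstones_per_read / live_cells_per_read


def tombstone_state(tombstones_per_read: float, live_cells_per_read: float,
                    threshold: float = 5.0) -> str:
    """Return "warn" when the ratio exceeds the 5x starting guide, else "ok"."""
    ratio = tombstone_ratio(tombstones_per_read, live_cells_per_read)
    return "warn" if ratio > threshold else "ok"
```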
Replication Strategy Indicator
The Replication Strategy indicator checks the replication class used for each keyspace. NetworkTopologyStrategy is highly recommended to ensure data is replicated to minimise impact of likely failures in your infrastructure (e.g. replicate across AWS availability zones) and to enable additional data centers to be added to the cluster without table rebuilds.
Suggested fix for non-healthy states:
Change the replication class to NetworkTopologyStrategy for the problem keyspaces
Replication Factor Indicator
The Replication Factor indicator checks the number of replicas set for each datacenter. A replication factor of at least 3 is required for Instaclustr SLAs to apply and highly recommended for data protection and high availability.
Suggested fix for non-healthy states:
Set the replication factor to 3 or more for the problem datacenters (note: increasing the replication factor requires repairs to be run after the change to ensure data is correctly distributed. Contact Instaclustr Support for assistance with this operation.)
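Both replication fixes above come down to a single CQL ALTER KEYSPACE statement. The sketch below only generates that statement as a string for you to review and run in cqlsh; the helper name is hypothetical, and the data centre name in the usage example is an assumption: use the names your cluster actually reports (for example, in nodetool status). Keyspace names that need quoting (mixed case) are not handled. Remember to run repairs after the change, as noted above.

```python
def alter_replication_cql(keyspace: str, rf_by_dc: dict) -> str:
    """Return an ALTER KEYSPACE statement switching `keyspace` to
    NetworkTopologyStrategy with the given replication factor per data centre."""
    if not rf_by_dc:
        raise ValueError("at least one data centre is required")
    for dc, rf in rf_by_dc.items():
        if rf < 3:
            raise ValueError(f"replication factor {rf} for {dc} is below 3")
    replicas = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {replicas}}};"
    )


# Usage (data centre name is an assumption):
# print(alter_replication_cql("testkeyspace", {"AWS_VPC_US_EAST_1": 3}))
```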
Datacenters running on Amazon's EBS infrastructure can be encrypted with an AWS KMS key. See Setting Up a Datacenter with EBS Encryption for more information on sharing a KMS key with Instaclustr.
List available keys
To get a list of encryption keys previously added to this account, make a GET request to https://api.instaclustr.com/provisioning/v1/encryption-keys
The response will contain an array of key IDs that may be used to provision encrypted clusters:
[
{
"id":"ff4fccf3-2ac0-494b-9f40-e95288dd752d",
"arn":"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789abc",
"alias":"virginia 1"
}
]
Add a KMS key
To add an encryption key, make a POST request to https://api.instaclustr.com/provisioning/v1/encryption-keys with the JSON body:
{
"alias":"virginia key",
"arn":"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789abd"
}
If validation succeeds, we will respond with 202 Accepted and a JSON body containing the key ID that may be used to provision encrypted clusters.
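The list and add calls above can be scripted with Python's standard library. This sketch builds the POST request (without sending it) and turns a list response into an alias-to-ID map. The helper names are hypothetical, and the ARN prefix check is our own assumption that only covers the standard aws partition; authentication follows the Basic Authentication scheme with your username and provisioning API key described for the provisioning API.

```python
import base64
import json
import urllib.request

ENCRYPTION_KEYS_URL = "https://api.instaclustr.com/provisioning/v1/encryption-keys"


def build_add_key_request(user, provisioning_api_key, alias, arn):
    """Build (but do not send) the POST that registers a KMS key."""
    if not arn.startswith("arn:aws:kms:"):
        raise ValueError("expected an AWS KMS key ARN (arn:aws:kms:...)")
    credentials = f"{user}:{provisioning_api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        ENCRYPTION_KEYS_URL,
        data=json.dumps({"alias": alias, "arn": arn}).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )


def key_ids_by_alias(list_response_body):
    """Map alias -> key ID from the JSON array returned by a GET on the same URL."""
    return {key["alias"]: key["id"] for key in json.loads(list_response_body)}
```

Send the request with urllib.request.urlopen(req); a 202 Accepted response carries the new key ID.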
Remove a KMS key
Make a DELETE request to
If successful, the API will respond with 202 Accepted.
If the key is in use by a running cluster, the API will respond with 400 Bad Request and a JSON body with the message "Encryption key in use. Data centres using this key need to be deleted first."
The monitoring API currently provides the following monitoring information:
Long term cluster health indicators
Metrics for:
Cassandra status
read and write operations per second
CPU utilization
disk utilization
pending compactions and active repairs
Metrics information is provided either for an individual node or for all nodes in a cluster or cluster data centre.
The API also provides key statistics for each table in the cluster (similar to what is available through "nodetool tablehistograms"):
read & write counts (mean, distribution)
read & write latency (mean, distribution)
live cells & tombstones per read (mean, max)
number of sstables read for each read operation (mean, max)
The set of available metrics will expand as we build out this API. Descriptions of each of the metrics can be found in the monitoring section of this support site: https://support.instaclustr.com/hc/en-us/sections/200689300-Monitoring-Information
Authentication
All requests to the API must use Basic Authentication and contain a valid username and the monitoring API key. API keys are created per user account and can be retrieved from the Account > API Keys tab of the Instaclustr Console.
https://support.instaclustr.com/hc/en-us/articles/226437447-Cluster-Health-Check
All available metrics are updated every 20 seconds (i.e. requesting the same metric twice within the same 20-second window will return the same response).
Cluster Health Indicator
Cluster Health Indicator API provides a summary of indicators on the long term health of your cluster and is retrieved by making a GET request to
The API will respond with status 200 OK and a JSON packet containing the following information:
[
{
"type": "REPLICATION_STRATEGY",
"stateDetails": {
"PASS": [
{
"message": "",
"keyspace": "testkeyspace"
}
]
}
},
{
"type": "REPLICATION_FACTOR",
"stateDetails": {
"PASS": [
{
"message": "",
"keyspace": "testkeyspace"
}
]
}
},
{
"type": "DISK_USAGE",
"stateDetails": {
"PASS": [
{
"message": "",
"publicIp": "52.5.37.217",
"privateIp": "10.224.145.126"
},
{
"message": "",
"publicIp": "34.232.115.13",
"privateIp": "10.224.80.183"
},
{
"message": "",
"publicIp": "34.233.151.239",
"privateIp": "10.224.9.122"
}
]
}
},
{
"type": "PARTITION_SIZE",
"stateDetails": {
"PASS": [
{
"message": "",
"keyspace": "testkeyspace",
"table": "units"
},
{
"message": "",
"keyspace": "testkeyspace",
"table": "students"
}
]
}
},
{
"type": "TOMBSTONE_LIVECELL",
"stateDetails": {
"UNKNOWN": [
{
"message": "No tobmstone/liveCell information found.",
"keyspace": "testkeyspace",
"table": "units"
},
{
"message": "No tobmstone/liveCell information found.",
"keyspace": "testkeyspace",
"table": "students"
}
]
}
}
]
Example: Response packet showing cluster health
The output JSON consists of:
type: The name of the indicator being returned. The API returns five indicator types: REPLICATION_STRATEGY and REPLICATION_FACTOR for each keyspace, DISK_USAGE for each node, and PARTITION_SIZE and TOMBSTONE_LIVECELL for each table.
stateDetails: The results for the indicator, grouped by state. The possible states are PASS, WARN, FAIL and UNKNOWN, and each entry includes a message with further details.
A detailed description of the cluster health indicators can be found in this support article: https://support.instaclustr.com/hc/en-us/articles/226437447-Cluster-Health-Check
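Because stateDetails groups entries under their state, picking out everything that needs attention is a short loop over the response. This is a minimal sketch assuming the response shape shown in the example above; the function name failing_indicators is hypothetical.

```python
import json


def failing_indicators(body: str) -> list:
    """Return (indicator type, state, entry) for every entry that is not PASS.

    `body` is the JSON text returned by the Cluster Health Indicator API.
    """
    problems = []
    for indicator in json.loads(body):
        for state, entries in indicator.get("stateDetails", {}).items():
            if state == "PASS":
                continue
            for entry in entries:
                problems.append((indicator["type"], state, entry))
    return problems
```

For the example response above, this would report the two UNKNOWN TOMBSTONE_LIVECELL entries and nothing else.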
Metrics
Metrics are requested by constructing a GET request, consisting of:
type: Either 'clusters', 'datacentres' or 'nodes'. Specifying 'clusters' returns the metrics for each node in the cluster. Specifying 'datacentres' returns the metrics for each node belonging to the data centre. Specifying 'nodes' returns the metrics for a specific node.
UUID or public IP: If the type is set to 'clusters' or 'datacentres', the UUID of the cluster or data centre must be specified. If the type is set to 'nodes', either the node's UUID or its public IP may be specified.
metrics: The metrics to return, specified as a comma-delimited query-string parameter. Up to 20 metrics may be specified.
reportNaN: (true|false) If a metric value is NaN or null, reportNaN determines whether the API reports it as NaN. The default is false, in which case NaN and null values are reported as 0. Setting reportNaN=true returns NaN values in the API response.
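The four request components above map directly onto a URL. This sketch, with a hypothetical helper name, enforces the documented constraints (the three target types and at most 20 metrics) and leaves ':' and ',' unescaped so the result matches the example endpoints.

```python
from urllib.parse import urlencode

MONITORING_BASE_URL = "https://api.instaclustr.com/monitoring/v1"
TARGET_TYPES = ("clusters", "datacentres", "nodes")
MAX_METRICS = 20


def metrics_url(target_type, target_id, metrics, report_nan=None):
    """Build a monitoring API metrics URL (hypothetical helper)."""
    if target_type not in TARGET_TYPES:
        raise ValueError(f"type must be one of {TARGET_TYPES}")
    if not target_id:
        raise ValueError("a UUID (or, for nodes, a public IP) is required")
    if not 1 <= len(metrics) <= MAX_METRICS:
        raise ValueError(f"request between 1 and {MAX_METRICS} metrics")
    params = {"metrics": ",".join(metrics)}
    if report_nan is not None:
        params["reportNaN"] = "true" if report_nan else "false"
    # Keep ':' and ',' unescaped so the URL matches the documented examples.
    return f"{MONITORING_BASE_URL}/{target_type}/{target_id}?{urlencode(params, safe=':,')}"
```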
https://api.instaclustr.com/monitoring/v1/clusters/e7342f08-d32f-41af-95be-cfaa0a433a26?metrics=n::cpuUtilization,n::diskUtilization
Example: Endpoint to return the CPU and disk utilization for each node in the cluster with a UUID of e7342f08-d32f-41af-95be-cfaa0a433a26
https://api.instaclustr.com/monitoring/v1/datacentres/001224dc-989c-4ad0-8b37-1ce345065b8f?metrics=n::cassandraReads,n::cassandraWrites
Example: Endpoint to return the reads and writes per second by Cassandra for each node belonging to the datacentre with a UUID of 001224dc-989c-4ad0-8b37-1ce345065b8f
https://api.instaclustr.com/monitoring/v1/nodes/52.70.191.97?metrics=cf::tk1::tcf1::readlatencydistribution
Example: Endpoint to return the read latency distribution for the 'tcf1' table in the 'tk1' keyspace, for just the 52.70.191.97 node.
For a complete list of available metrics, refer to the Reference section.
Successfully processed metric API requests will return a 200 status code and accompanying JSON packet. JSON packets follow the same basic structure as listed in the following example:
[
{
"id":"be456b5e-e81a-4ea3-99f1-23905942d1d9",
"payload":[
{
"metric":"cpuUtilization",
"type":"percentage",
"unit":"1",
"values":[
{
"time":"2017-01-04T03:53:32.000Z",
"value":"7.401636"
}
]
}
],
"publicIp":"123.123.123.123",
"privateIp":"10.0.0.1",
"rack":{
"name":"us-east-1a",
"dataCentre":{
"name":"US_EAST_1",
"provider":"AWS_VPC",
"customDCName":"AWS_VPC_US_EAST_1"
},
"providerAccount":{
"name":"INSTACLUSTR",
"provider":"AWS_VPC"
}
}
}
]
Example: Response with CPU Utilization for a single node
Each payload item represents an individual metric and will consist of:
metric: The name of the metric being returned
type: The sub-type of the metric being measured (e.g. for the diskUsed metric, the available types are livediskspaceused and totaldiskspaceused)
unit: The unit of measurement. The following unit abbreviations are used:
GB: Gigabyte
MB: Megabyte
B: Byte
s: Second
ms: Millisecond
us: Microsecond
1: Non-standard unit (e.g. percentage)
us/1: Microseconds per non-standard unit (e.g. latency per read operation)
1/s: Non-standard unit per second (e.g. write operations per second)
values: An array of time/value maps containing the measurement as recorded by Instaclustr
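Since every value arrives as a string inside a per-node payload, a small parser helps when feeding the results elsewhere. This sketch, with a hypothetical function name, keeps the newest value of each payload item, keyed by node public IP, metric and type. It assumes the ISO-8601 timestamps share one format (so they sort as strings), and converts "NaN" to a float NaN.

```python
import json


def latest_values(body: str) -> dict:
    """Return {(publicIp, metric, type): float} with the newest value of each item."""
    result = {}
    for node in json.loads(body):
        for item in node["payload"]:
            if not item["values"]:
                continue
            # Same-format ISO-8601 timestamps compare correctly as strings.
            newest = max(item["values"], key=lambda point: point["time"])
            key = (node["publicIp"], item["metric"], item["type"])
            result[key] = float(newest["value"])
    return result
```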
If multiple metrics are requested, the response will include multiple payload entries:
[
{
"id": "ce456b5e-c81a-4ea3-99f1-13805942d1d9",
"payload": [
{
"metric": "diskUtilization",
"type": "percentage",
"unit": "1",
"values": [
{
"time": "2017-01-04T03:59:14.000Z",
"value": "47.104115"
}
]
},
{
"metric": "cpuUtilization",
"type": "percentage",
"unit": "1",
"values": [
{
"time": "2017-01-04T03:59:14.000Z",
"value": "7.545443"
}
]
}
],
"publicIp": "123.123.123.123",
"privateIp": "10.0.0.1",
"rack": {
"name": "us-east-1a",
"dataCentre": {
"name": "US_EAST_1",
"provider": "AWS_VPC",
"customDCName": "AWS_VPC_US_EAST_1"
},
"providerAccount": {
"name": "INSTACLUSTR",
"provider": "AWS_VPC"
}
}
}
]
Example: Get CPU Utilization and Disk Utilization for a single node
Unsuccessful calls will return the following responses, depending upon the issue:
400 Bad Request: Returned when the expected node or cluster ID is not a valid UUID or an incorrect metric name has been supplied.
401 Unauthorized: Returned when the username and/or API key are missing or incorrect.
404 Not Found: Returned when accessing an incorrect URL or trying to access a cluster/node not owned by the authenticated user.
429 Too Many Requests: Returned when more than 70 requests per second are being received by your user.
500 Server Error: All other errors
> GET /monitoring/v1/nodes/0aa675db-fe5a-4c54-80e7-e6be9dd60f35/badendpoint HTTP/1.1
> Authorization: Basic 12345678==
> User-Agent: curl/7.40.0
> Host: api.instaclustr.com
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Server: nginx/1.9.4
< Date: Thu, 03 Sep 2015 02:10:57 GMT
< Content-Type: application/json
< Content-Length: 68
< Connection: keep-alive
< Set-Cookie: rememberMe=deleteMe; Path=/; Max-Age=0; Expires=Wed, 02-Sep-2015 02:10:57 GMT
<
* Connection #0 to host api.instaclustr.com left intact
{"name":"Endpoint not found","message":"Please check the URL path."}
Example: Error response
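A client can treat 429 as retryable and the other codes above as errors to surface. The sketch below is one way to do that with the standard library; the exponential backoff (1s, 2s, 4s...), the retry count and the MonitoringAPIError class are our own assumptions, not documented API behaviour. The opener and sleep parameters exist only so the function can be exercised without network access.

```python
import base64
import json
import time
import urllib.error
import urllib.request


class MonitoringAPIError(Exception):
    """Hypothetical wrapper for a non-retryable error response."""

    def __init__(self, status, body):
        super().__init__(f"HTTP {status}: {body}")
        self.status = status
        self.body = body


def get_json(url, user, api_key, opener=urllib.request.urlopen,
             retries=3, base_delay=1.0, sleep=time.sleep):
    """GET a monitoring API URL, backing off on 429 Too Many Requests."""
    token = base64.b64encode(f"{user}:{api_key}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    for attempt in range(retries + 1):
        try:
            with opener(request) as response:
                return json.loads(response.read())
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt < retries:
                sleep(base_delay * 2 ** attempt)  # back off, then retry
                continue
            # 400/401/404/500 (or 429 after the last retry): surface the JSON message.
            raise MonitoringAPIError(err.code, err.read().decode("utf-8", "replace")) from err
```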
Reference
Nodes
General Metrics
Non-table metrics follow the format n::{metricName}.
Each metric type will contain the latest available measurement.
n::nodeStatus: Whether Cassandra is available on the node. Returns a "warn" value if no check-in has been logged in the last 30 seconds.
n::cpuUtilization: Current CPU utilisation as a percentage of total available. Maximum value is 100%, regardless of the number of cores on the node.
n::osload: Current OS load. Generally, a node is overloaded if the OS load is greater than or equal to the number of cores on the node.
n::diskUtilization: Total disk space utilisation, by Cassandra, as a percentage of total available.
n::cassandraReads: Reads per second by Cassandra. (Deprecated, please use n::reads)
n::cassandraWrites: Writes per second by Cassandra. (Deprecated, please use n::writes)
n::compactions: Number of pending compactions.
n::repairs: Number of active and pending repair tasks.
n::clientRequestRead: 95th percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
n::clientRequestWrite: 95th percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
Note: All deprecated metrics and endpoints will be removed in the future.
Table Metrics
Table metric names follow the format cf::{keyspace}::{table}::{metricType}. Optionally, a 'sub-type' may be specified to return a specific part of the metric. For example,
cf::tk1::tcf1::readlatencydistribution
will return the various distributions of the read latency metric.
cf::tk1::tcf1::readlatencydistribution::50thPercentile
will only return the 50th percentile distribution of the read latency metric.
Each metric type will contain the latest available measurement.
cf::{keyspace}::{table}::readLatencyDistribution: Measurement of local read latency for the table, on the individual node. Available sub-types:
50thPercentile: 50th percentile distribution of read latency
75thPercentile: 75th percentile distribution of read latency
95thPercentile: 95th percentile distribution of read latency
99thPercentile: 99th percentile distribution of read latency
cf::{keyspace}::{table}::reads: General measurements of local reads for the table, on the individual node. Available sub-types:
latency_per_operation: Average local read latency per read operation
count_per_second: Reads per second of the table performed on the individual node
cf::{keyspace}::{table}::writeLatencyDistribution: Metrics for local write latency for the table, on the individual node. Available sub-types:
50thPercentile: 50th percentile distribution of write latency
75thPercentile: 75th percentile distribution of write latency
95thPercentile: 95th percentile distribution of write latency
99thPercentile: 99th percentile distribution of write latency
cf::{keyspace}::{table}::writes: General measurements of local writes for the table, on the individual node. Available sub-types:
latency_per_operation: Average local write latency per write operation
count_per_second: Writes per second to the table performed on the individual node
cf::{keyspace}::{table}::sstablesPerRead: SSTables accessed per read of the table on the individual node. Available sub-types:
average: Average SSTables accessed per read
max: Maximum SSTables accessed per read
cf::{keyspace}::{table}::tombstonesPerRead: Tombstoned cells accessed per read of the table on the individual node. Available sub-types:
average: Average tombstones accessed per read
max: Maximum tombstones accessed per read
cf::{keyspace}::{table}::liveCellsPerRead: Live cells accessed per read of the table on the individual node. Available sub-types:
average: Average live cells accessed per read
max: Maximum live cells accessed per read
cf::{keyspace}::{table}::partitionSize: The size of partitions in the specified table, in KB. Available sub-types:
average: Average partition size
max: Maximum partition size
cf::{keyspace}::{table}::diskUsed: Live and total disk used by the table. Available sub-types:
livediskspaceused: Disk used by live cells
totaldiskspaceused: Disk used by both live cells and tombstones
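The naming rules above (n::{metricName} for node metrics, cf::{keyspace}::{table}::{metricType} with an optional sub-type for table metrics) can be captured in two one-line helpers. The function names are hypothetical.

```python
def node_metric(name: str) -> str:
    """Node-level metric name, e.g. n::cpuUtilization."""
    return f"n::{name}"


def table_metric(keyspace: str, table: str, metric_type: str, sub_type: str = "") -> str:
    """Table metric name, optionally narrowed to a single sub-type."""
    parts = ["cf", keyspace, table, metric_type]
    if sub_type:
        parts.append(sub_type)
    return "::".join(parts)
```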
Listing Monitored Tables
A list of monitored tables, grouped by keyspace, can be generated by making a GET request to:
https://api.instaclustr.com/monitoring/v1/clusters/{cluster-id}/columnFamilies
The API will respond with the following packet:
{
"keyspace1": [
"standard1",
"counter1",
"Counter3"
],
"keyspace2": [
"table2",
"table1"
]
}
Example: Response packet listing monitored tables
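The columnFamilies response can drive table-metric requests directly. This sketch, with a hypothetical function name, turns it into metric names for one metric type and splits them into batches of at most 20, the per-request limit stated earlier. Keyspaces are sorted for a stable order; tables keep the order the API returned.

```python
import json


def table_metric_batches(column_families_body: str, metric_type: str,
                         batch_size: int = 20) -> list:
    """Build cf::{keyspace}::{table}::{metric_type} names in batches of <= batch_size."""
    if not 1 <= batch_size <= 20:
        raise ValueError("the monitoring API accepts at most 20 metrics per request")
    tables_by_keyspace = json.loads(column_families_body)
    names = [
        f"cf::{keyspace}::{table}::{metric_type}"
        for keyspace in sorted(tables_by_keyspace)
        for table in tables_by_keyspace[keyspace]
    ]
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]
```

Each batch can then be joined with commas and passed as the metrics parameter of a 'clusters' request.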
Clusters
Requesting 'cluster' metrics returns the requested measurements for each provisioned node in the cluster and follows the same format as the 'nodes' endpoint. All node metrics are available for use.
For example, this request:
https://api.instaclustr.com/monitoring/v1/clusters/37af4800-5166-3d3c-cb9a-c9a4b960196e?metrics=n::cpuUtilization,cf::tk1::tcf1::sstablesPerRead
would return the following response packet:
[
{
"id": "694294d9-ea82-49c2-9f71-aacac81f0325",
"payload": [
{
"metric": "cpuUtilization",
"type": "percentage",
"unit": "1",
"values": [
{
"time": "2017-01-04T04:19:28.000Z",
"value": "7.639166"
}
]
},
{
"metric": "reads",
"type": "count_per_second",
"unit": "1/s",
"values": [
{
"time": "2017-01-04T04:19:28.000Z",
"value": "3.80952380952381"
}
]
}
],
"publicIp": "123.123.123.123",
"privateIp": "10.0.0.1",
"rack": {
"name": "us-east-1c",
"dataCentre": {
"name": "US_EAST_1",
"provider": "AWS_VPC",
"customDCName": "AWS_VPC_US_EAST_1"
},
"providerAccount": {
"name": "INSTACLUSTR",
"provider": "AWS_VPC"
}
}
},
{
"id": "4d848f48-5e24-41d6-81f2-44c2f578895f",
"payload": [
{
"metric": "cpuUtilization",
"type": "percentage",
"unit": "1",
"values": [
{
"time": "2017-01-04T04:19:30.000Z",
"value": "7.915636"
}
]
},
{
"metric": "reads",
"type": "count_per_second",
"unit": "1/s",
"values": [
{
"time": "2017-01-04T04:19:30.000Z",
"value": "5.571428571428571"
}
]
}
],
"publicIp": "123.123.123.124",
"privateIp": "10.0.0.2",
"rack": {
"name": "us-east-1a",
"dataCentre": {
"name": "US_EAST_1",
"provider": "AWS_VPC",
"customDCName": "AWS_VPC_US_EAST_1"
},
"providerAccount": {
"name": "INSTACLUSTR",
"provider": "AWS_VPC"
}
}
}
]
Instaclustr's monitoring API is designed to allow you to integrate the monitoring information from your Instaclustr managed cluster with the monitoring tool used for your entire application. DataDog (datadoghq.com) is a popular platform for monitoring a range of applications. This help article walks you through how to use the Instaclustr Monitoring API with DataDog.
At a high-level, the approach we will take in this article is to install a script on a server you manage that has the DataDog agent installed. This script calls the Instaclustr Monitoring API at regular intervals and passes the information returned to the DataDog agent which reports it to the central DataDog system.
One of our awesome customers has also developed, and uses, an alternative approach based on AWS Lambda. You can find details here: https://github.com/manheim/InstaCluster-to-Datadog-Lambda
Prepare Your Environment
Follow these steps to set up your environment:
Set up a cluster with Instaclustr (see https://support.instaclustr.com/hc/en-us/articles/203738340-Creating-a-Cluster).
Set up a DataDog account (datadoghq.com).
Install the DataDog agent on the machine you will use to run the integration script (installation instructions are available in the DataDog console).
Install Python on the machine (https://www.python.org/downloads/).
Install the pip Python package manager on the machine (https://pip.pypa.io/en/stable/installing/).
Install the DataDog DogStatsD API package (pip install datadog).
Set Up The Script
We have created a sample script that calls the Instaclustr API and forwards the data to DataDog. The script is available on GitHub here: https://github.com/instaclustr/ICAPI-DataDog. You can download the ZIP file or clone the repository.
The script (ic2datadog.py) is fairly straightforward. It retrieves a specified list of metrics for all nodes in the cluster and requires a configuration file (called configuration.json) in the format shown below:
{
"cluster_id":"[instaclustr cluster id]",
"metrics_list":"n::cpuutilization,n::cassandraReads,n::cassandraWrites,n::nodeStatus",
"dd_options":
{
"api_key":"[datadog API key]",
"app_key":"[datadog app key]"
},
"ic_options":
{
"user_name":"[instaclustr user name]",
"api_key":"[instaclustr Monitoring API Key]"
}
}
The settings to be added to the configuration file are:
cluster_id: The Instaclustr cluster ID. Available from the cluster details page on the Instaclustr console
metrics_list: A comma-separated list of metrics to retrieve and pass to DataDog. For a full list of available metrics, see the Instaclustr monitoring API documentation (https://support.instaclustr.com/hc/en-us/articles/209695488-Monitoring-API).
dd_options: Your DataDog API key and Application key. Available from Integrations/APIs in the DataDog console (you may need to create a new app key).
ic_options: Your Instaclustr user name and API key. These can be viewed under the Account/API Keys tab of the Instaclustr console.
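Before starting the script it can help to check configuration.json for missing settings. This is a minimal sketch of such a check, not part of ic2datadog.py: the load_config name is hypothetical, and rejecting more than 20 metrics is our own assumption based on the monitoring API's per-request limit (the real script may handle this differently).

```python
import json

REQUIRED_SECTIONS = {
    "dd_options": ("api_key", "app_key"),
    "ic_options": ("user_name", "api_key"),
}


def load_config(text: str) -> dict:
    """Parse configuration.json text and fail early on missing settings."""
    config = json.loads(text)
    for key in ("cluster_id", "metrics_list", *REQUIRED_SECTIONS):
        if not config.get(key):
            raise ValueError(f"missing setting: {key}")
    for section, keys in REQUIRED_SECTIONS.items():
        for key in keys:
            if not config[section].get(key):
                raise ValueError(f"missing setting: {section}.{key}")
    metrics = [m.strip() for m in config["metrics_list"].split(",") if m.strip()]
    if len(metrics) > 20:
        raise ValueError("the monitoring API accepts at most 20 metrics per request")
    config["metrics"] = metrics
    return config


# Usage:
# with open("configuration.json") as f:
#     config = load_config(f.read())
```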
Run The Script and View The Results
Running the script is a simple matter of running python ic2datadog.py. The script will then run until interrupted. After a minute or so, the metrics will be visible in DataDog.
You can see the results by logging into the DataDog console:
To view the gauge metrics (e.g. CPU):
Go to Metrics/Explorer
In the Graph text box, start typing instaclustr. You should see a list of available metrics appear in the format Instaclustr.[Node IP].[metric name]. Choose the metrics you want and DataDog will draw you a graph.
To view the node status information:
Go to Monitors/Check Summary
You should see the Instaclustr node status checks in the list (filter for Instaclustr if necessary).
The Instaclustr metrics are now available to use wherever else you would use them in DataDog (dashboards, monitors, etc).
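To see how a metric ends up named Instaclustr.[Node IP].[metric name], here is a minimal sketch of the conversion step, assuming the monitoring API response shape documented in the Monitoring API article. It is not the code in ic2datadog.py: the to_gauges name and the lowercase "instaclustr" prefix are assumptions. The commented forwarding loop uses statsd.gauge from the datadog package installed earlier.

```python
import json


def to_gauges(body: str, prefix: str = "instaclustr") -> list:
    """Turn a monitoring API response into (gauge name, value) pairs named
    <prefix>.<node public IP>.<metric>, keeping only the newest value."""
    gauges = []
    for node in json.loads(body):
        for item in node["payload"]:
            if not item["values"]:
                continue
            newest = max(item["values"], key=lambda point: point["time"])
            name = f"{prefix}.{node['publicIp']}.{item['metric']}"
            gauges.append((name, float(newest["value"])))
    return gauges


# Forwarding through the local DataDog agent:
# from datadog import statsd
# for name, value in to_gauges(body):
#     statsd.gauge(name, value)
```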
Kong is an open-source management layer for APIs, delivering high performance and reliability (see http://getkong.org). Kong requires Cassandra as its data store, so Instaclustr is a great way to run Cassandra for your Kong deployment. This article describes how to set up Instaclustr for use with Kong.
Firstly, for a production installation or one where you will be doing any kind of performance/load testing, we highly recommend that your server running Kong is in the same region as your Cassandra cluster. We support all AWS regions, most Azure data centers, Google Cloud Platform, and SoftLayer.
These are the steps to configure Kong to work with Instaclustr:
Provision a new cluster on Instaclustr by logging into the dashboard and choosing Create Cassandra Cluster.
Choose the same region as your Kong server.
Kong versions before 0.4.2 do not support user authentication or client connection encryption, so ensure that you turn these options off. You will need to rely on the firewall rules to limit access to the cluster to your server only. (Alternatively, on AWS you could use VPC peering.)
https://getkong.org/docs/0.11.x/getting-started/quickstart/
Kong v0.4.2 added support for authentication and encryption, so you can now enable these options in your cluster if desired. The username, password and certificate file will then have to be set in your Kong configuration file.
All other settings can be left at default.
Ensure the server from which you are running Kong is whitelisted in your cluster firewall rules.
From the cluster page in the Instaclustr dashboard, choose Manage Cluster / Firewall Rules.
In the Allowed Addresses box, enter the IP of your Kong host and click on "Save Cluster Settings".
Navigate to the Cluster Details page for your cluster and note the Public IPs of the nodes that have been provisioned.
Log in to a shell on your Kong host and, if necessary, install Kong according to the instructions on getkong.org.
Optionally, install cqlsh on your Kong host and check the connection.
Run:
pip install cqlsh
This is the easiest way to install cqlsh if you have pip installed.
Run:
cqlsh x.x.x.x
where x.x.x.x is the public IP of one of the nodes in your cluster (from step 3).
cqlsh should connect and you should get a cqlsh> prompt. You can run
describe keyspaces;
to check that the connection is working. Then run exit to quit cqlsh.
Configure Kong:
Go to the Kong documentation and select the Kong version you have installed.
Click Configuration file and follow the instructions.
If necessary, copy the default kong.yml file to /etc/kong (you may need to create the directory first):
cp /usr/local/lib/luarocks/rocks/kong/0.3.1-1/conf/kong.yml /etc/kong
Open /etc/kong/kong.yml in your favourite text editor.
Find the hosts: property under cassandra: and change it to look like the following:
hosts:
- "1.1.1.1"
- "2.2.2.2"
For Kong 0.4.2 and later this needs to look like:
hosts:
- "1.1.1.1:9042"
- "2.2.2.2:9042"
Start Kong and test:
Run:
kong start
Configure an API and run some transactions following the instructions in the Kong docs quick start (https://getkong.org/docs/0.11.x/getting-started/quickstart/).
Optionally, connect via cqlsh and view some of the data created:
Run:
cqlsh x.x.x.x
where x.x.x.x is the public IP of one of the nodes in your cluster (from step 3).
From the cqlsh> prompt run:
use kong;
to use the kong keyspace.
Run:
select * from apis;
Our Heroku add-on is now in general release and provides the simplest method to get started with Cassandra from Heroku. Find it in the Heroku Elements marketplace and see our documentation on the Heroku Dev Center: https://devcenter.heroku.com/articles/instaclustr
However, for more advanced use cases (such as connecting from both Heroku and another environment) it may be useful to connect from Heroku to a cluster provisioned through the standard Instaclustr environment. The process for doing that is described below.
The tutorial describes the steps required to extend the Heroku python tutorial app to query a Cassandra cluster provisioned by Instaclustr. If you are interested in using a different language, the Instaclustr configuration steps will be the same for all languages. The basic connection settings and driver API reference for other languages can be found on the cluster Connection Details page on the Instaclustr dashboard.
Complete the Heroku Getting Started with Python on Heroku tutorial. If you are already familiar with Heroku or just impatient to get going with Cassandra, you only need to complete it up to the Push local changes step to follow this tutorial.
Provision a new cluster on Instaclustr by logging in to the dashboard and choosing Create Cassandra Cluster.
Choose the same AWS region as your Heroku app. You can verify this by running:
heroku info
The Heroku us region corresponds to the AWS US East (Northern Virginia) data centre, and the Heroku eu region corresponds to AWS EU West (Ireland).
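If you script cluster creation, the region correspondence above can be kept in a lookup table. This is a sketch under stated assumptions: US_EAST_1 is the data centre name used as local_dc in the hello/views.py code later in this tutorial, while EU_WEST_1 for the Heroku eu region is assumed from AWS's eu-west-1 (Ireland) naming and should be checked against the data centre list in the Instaclustr console.

```python
# Heroku region -> Instaclustr data centre name.
HEROKU_TO_INSTACLUSTR_DC = {
    "us": "US_EAST_1",  # AWS US East (Northern Virginia), as used in hello/views.py
    "eu": "EU_WEST_1",  # AWS EU West (Ireland); assumed name, verify in the console
}


def instaclustr_dc_for(heroku_region):
    """Return the data centre to choose for an app in the given Heroku region."""
    try:
        return HEROKU_TO_INSTACLUSTR_DC[heroku_region.strip().lower()]
    except KeyError:
        raise ValueError(f"no mapping for Heroku region {heroku_region!r}") from None
```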
For Heroku, we highly recommend enabling password authentication/authorization and client-to-cluster encryption, because Heroku apps have dynamic IPs and the firewall must therefore be opened to all source addresses. At present these options are only available for production-level Instaclustr node sizes (not developer nodes).
All other settings can be left at default.
Once your cluster finishes provisioning, configure the firewall to allow connections from Heroku:
Navigate to the Settings page for the cluster.
Change the value of allowed addresses in the Firewall rules section to 0.0.0.0/0 and click Save Cluster Settings. This will allow any source IP to connect to the cluster.
Download the SSL certificate for your newly created cluster:
Navigate to the Connection Details page for your cluster.
Click on the Download Cluster CA X.509 Certificates button. This will download a zip file with the Certificate Authority certificate for your cluster in a variety of formats.
Unzip the downloaded zip file and copy the cluster-ca-certificate.pem file to your python-getting-started folder.
Add the Cassandra driver to your Python virtual environment:
Edit requirements.txt in your python-getting-started folder to add the line cassandra-driver==3.11.0 at the end.
Install cassandra-driver to your env:
pip install -r requirements.txt --allow-all-external
Edit the file hello/views.py to include the required code to connect to Cassandra and retrieve some data from a table. Replace the existing contents of the file with the following:
from django.http import HttpResponse
from django.shortcuts import render
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra.auth import PlainTextAuthProvider

from .models import Greeting

# Create your views here.
def index(request):
    cluster = Cluster(
        contact_points=[
            "<CLUSTER_IP_ADDRESS>"  # US_EAST_1 (Amazon Web Services (VPC))
        ],
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='US_EAST_1'),  # your local data centre
        port=9042,
        ssl_options={
            'ca_certs': 'cluster-ca-certificate.pem'
        },
        auth_provider=PlainTextAuthProvider(username='<USER_NAME>',
                                            password='<PASSWORD>'))
    session = cluster.connect()
    html = 'Connected to cluster %s<br>' % cluster.metadata.cluster_name
    for host in cluster.metadata.all_hosts():
        html += 'Datacenter: %s; Host: %s; Rack: %s<br>' % (host.datacenter, host.address, host.rack)
    html += "<br>Keyspaces:<br>"
    rows = session.execute("select keyspace_name from system_schema.keyspaces;")
    for row in rows:
        html += row[0] + "<br>"
    cluster.shutdown()
    return HttpResponse(html)


def db(request):
    greeting = Greeting()
    greeting.save()
    greetings = Greeting.objects.all()
    return render(request, 'db.html', {'greetings': greetings})
Note: the contact_points, local_dc, username and password values in the call to Cluster() will need to be updated with the appropriate values for your cluster. These can be copied from the Connection Details page for your cluster.
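Because hello/views.py is committed to git in the steps that follow, hard-coding the password there puts it in your repository. Heroku exposes config vars to the app as environment variables, so one alternative is to read these values at runtime. A minimal sketch; the variable names are hypothetical and would be set with heroku config:set (for example, heroku config:set CASSANDRA_PASSWORD=...).

```python
import os

# Hypothetical config var names; set each one with `heroku config:set NAME=value`.
REQUIRED_VARS = (
    "CASSANDRA_CONTACT_POINTS",  # comma-separated node IPs
    "CASSANDRA_LOCAL_DC",        # e.g. US_EAST_1
    "CASSANDRA_USERNAME",
    "CASSANDRA_PASSWORD",
)


def cassandra_settings(env=None):
    """Read the values passed to Cluster() from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing config vars: " + ", ".join(missing))
    return {
        "contact_points": [ip.strip() for ip in env["CASSANDRA_CONTACT_POINTS"].split(",") if ip.strip()],
        "local_dc": env["CASSANDRA_LOCAL_DC"],
        "username": env["CASSANDRA_USERNAME"],
        "password": env["CASSANDRA_PASSWORD"],
    }
```

In index(), the returned values would then replace the literals: contact_points=settings["contact_points"], DCAwareRoundRobinPolicy(local_dc=settings["local_dc"]) and PlainTextAuthProvider(username=settings["username"], password=settings["password"]).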
Test the changes from your local computer:
Start the application locally by running:
heroku local web
Browse to http://localhost:5000 and you should see the result of connecting to your Cassandra cluster.
Upload the changes to Heroku and run the application:
Add changed and new files to git change set:
git add cluster-ca-certificate.pem
git add requirements.txt
git add hello/views.py
Commit changes to git:
git commit -m "Cassandra!"
Push your changes to heroku:
git push heroku master
Open the application running on Heroku and see the results:
heroku open
This is, of course, a basic tutorial, and production-ready apps are likely to require additional Cassandra driver features such as retry policies and async execution. However, those considerations are not Heroku-specific and can be designed using the general driver documentation and other sources of information.
This tutorial walks you through the options for submitting jobs to your Spark cluster. If you choose to provision your cluster with Zeppelin, you will also be able to quickly write Spark jobs via the Zeppelin interface, available through the Instaclustr console. You will find more details in this article: https://support.instaclustr.com/hc/en-us/articles/214940967
The high-level steps in this tutorial are:
Provision a cluster with Cassandra and Spark
Set up a Spark Client
Configure Client Network Access
Basic Interaction with Spark Shell
Using Spark SQL from Spark Shell
Creating and Submitting a Scala Job
This tutorial assumes that you are familiar with launching and connecting to servers in AWS.
While this tutorial is specific to AWS, we also support Spark on Azure and IBM SoftLayer. You can follow a similar approach to set up on those platforms or contact [email protected] if you need more detailed instructions.
1. Provision a cluster with Cassandra and Spark
a) If you haven't already signed up for an Instaclustr account, refer to our support article to sign up and create an account.
b) Once you have signed up for Instaclustr and verified your email, log in to the Instaclustr console and click the Create Cassandra Cluster button.
c) On the Create Cassandra Cluster page, enter an appropriate name and network address block for your cluster. Refer to our support article on Network Address Allocation to understand how we divide up the specified network range to determine the node IP addresses. Under the Applications section, select:
Apache Cassandra 3.11
Apache Spark as an Add-on (Apache Spark 2.1.1 - Hadoop 2.6)
d) Under the Data Centre section, select:
Amazon Web Services as the Infrastructure Provider
A minimum node size of t2.medium
Do not enable client encryption for Cassandra (see https://support.instaclustr.com/hc/en-us/articles/218913487-Instaclustr-Spark-with-ssl-configured-Cassandra-Cluster if you want to use Spark with Cassandra client to server encryption)
e) Under the Cassandra Options section, select:
Use Private IP Addresses for node discovery
f) Leave the other options as default. Accept the terms and conditions and click the Create Cluster button. The cluster will automatically provision and will be available for use once all nodes are in the running state.
2. Set Up a Spark Client
To use your Spark cluster, you will need a client machine set up to submit jobs. Use the following steps to set up a client in AWS:
a) Provision a new AWS server with the following configuration:
Region: same as your newly created Cassandra and Spark cluster
VPC: if possible, use a VPC with DNS resolution and DNS hostnames enabled (otherwise, see step g below). The VPC network range should not overlap with the network range of your Instaclustr cluster.
AMI: Ubuntu Server 14.04 LTS (HVM), SSD Volume Type
Size: t2.small is sufficient for this tutorial and sufficient for many use-cases ongoing
b) ssh to the newly launched server with ubuntu as username.
c) Download the Spark version matching the Spark version of your Instaclustr cluster. In this case, Spark 2.1.1:
wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.6.tgz
d) Extract the Spark files:
tar -xvf spark-2.1.1-bin-hadoop2.6.tgz
e) Download the Spark Cassandra connector assembly JAR (a fat JAR built by Instaclustr to include all required dependencies, for use with the Spark shell). The latest version available for your Spark version should be listed on the Connection Info page of the Instaclustr console.
wget https://static.instaclustr.com/spark/spark-cassandra-connector-assembly-2.0.2.jar
f) Install the Java Development Kit:
sudo apt-get update
sudo apt-get install default-jdk
g) If you are not using a VPC with DNS resolution and DNS hostnames enabled, you will need to change the hostname of the client to its IP so that it resolves when used by Spark (this is a bit of a hack; the right way is to edit /etc/hosts, but this is quicker):
sudo hostname <private IP of your Spark client>
h) If you will be building the final Scala example, install sbt:
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
sudo apt-get update
sudo apt-get install sbt
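Step a) asks for a client VPC whose network range does not overlap the cluster's range; AWS VPC peering does not support VPCs with overlapping CIDR blocks. You can check two ranges with Python's standard ipaddress module before launching the client, as in this minimal sketch (the function name is illustrative):

```python
import ipaddress


def ranges_overlap(client_vpc_cidr, cluster_cidr):
    """Return True if the two CIDR blocks share at least one address."""
    client = ipaddress.ip_network(client_vpc_cidr)
    cluster = ipaddress.ip_network(cluster_cidr)
    return client.overlaps(cluster)
```

With example values only, ranges_overlap("10.0.0.0/16", "10.0.4.0/22") returns True, so that pair could not be peered, while ranges_overlap("172.31.0.0/16", "10.224.0.0/16") returns False.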
3. Configure Client Network Access
As Spark has minimal security, we recommend that you access Spark from a peered VPC in AWS to increase the security of network-based access rules. To set up the peered VPC and allow connections from your VPC to the cluster, follow our support article on Using VPC Peering AWS.
Note: When following the VPC Peering instructions, you must add your VPC network range to the Spark Allowed Addresses and the Cassandra Allowed Addresses. The Spark driver on your client machine needs to be able to connect to Cassandra as well as the Spark workers (to establish partition ranges).
To add your VPC network range to Cassandra and Spark allowed addresses:
a) On the Cluster Overview page, click Cluster Settings from the Manage Cluster menu of your cluster.
b) On the Settings page, add your VPC network range to Cassandra Allowed Addresses, Spark Masters Allowed Addresses and Spark Jobserver Allowed Addresses. After adding the VPC network range, click the Save Cluster Settings button.
In addition to connections from the Spark Client to the cluster, the architecture of Spark means that the Spark Cluster needs to be able to connect to the clients. Enable this in AWS by editing the security group associated with your Spark Client to add an Inbound rule with the following values:
Type: Custom TCP Rule
Protocol: TCP
Port Range: 1024-65535
Source: Custom IP, <your cluster network range> (viewable from the cluster details page in the Instaclustr console)
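The rule above admits TCP connections to any client port from 1024 to 65535, but only from addresses inside the cluster's network range. The sketch below expresses that check in Python with the standard ipaddress module. It illustrates the rule's logic only; it is not an AWS API call, and the function name and example addresses are hypothetical.

```python
import ipaddress

# Port range from the inbound rule above; range() excludes its end value,
# so 65536 is used to include port 65535.
CLIENT_PORTS = range(1024, 65536)


def inbound_allowed(source_ip, dest_port, cluster_cidr):
    """Return True if a TCP connection from source_ip to dest_port on the
    Spark client matches the Custom TCP rule (source = cluster network range)."""
    from_cluster = ipaddress.ip_address(source_ip) in ipaddress.ip_network(cluster_cidr)
    return from_cluster and dest_port in CLIENT_PORTS
```

For a cluster range of 10.224.0.0/16, a Spark worker at 10.224.3.7 connecting back to port 40000 on the client is allowed, while the same connection from an address outside the range is not.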
4. Basic Interaction with Spark Shell
We will now connect to the Spark cluster using the Spark Shell and run an analytic job. (Note: sometimes the log messages from Spark shell overwrite the shell prompt. If processing seems to have stopped with no prompt then hit the enter key to get a prompt.)
a) Note the IPs of the three Spark masters in your cluster; these are shown on the Spark tab for your cluster in the Instaclustr console.
b) Log in to your Spark client and run the following command (replace the placeholders in <> with your Spark master IPs, the private IP of one of your Cassandra nodes, and the iccassandra password if you enabled authentication):
cd ~/spark-2.1.1-bin-hadoop2.6/bin
./spark-shell --master spark://<spark_master_IP1>:7077,<spark_master_IP2>:7077,<spark_master_IP3>:7077 --conf spark.cassandra.connection.host=<cassandra_private_IP> --conf spark.cassandra.auth.username=iccassandra --conf spark.cassandra.auth.password=<iccassandra password> --jars ~/spark-cassandra-connector-assembly-2.0.2.jar
c) spark-shell should start without any errors. There will be a lot of log messages. Once it has fully started you will see the prompt: scala>.
d) Some imports are necessary. For this simple job, enter the following at the prompt:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
e) Now we can create an RDD and execute an action on it. Only the action (rdd.count) triggers the calculation. In this case, we use the system_schema keyspace, which Cassandra uses to keep track of internals such as the list of keyspaces.
val rdd = sc.cassandraTable("system_schema","keyspaces")
println("Row count: " + rdd.count)
f) You should see a lot of log messages followed by the row count message.
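The spark-shell command in step b) is easy to mistype with three master addresses. The sketch below assembles the same arguments in Python; shlex.join (Python 3.8+) quotes values such as a password containing spaces or shell metacharacters, so the printed command can be pasted into a shell. The function names are hypothetical, and the default paths are assumptions based on the download locations used in section 2.

```python
import os
import shlex


def spark_master_url(master_ips, port=7077):
    """Join the Spark master IPs into a single spark:// URL, as in step b)."""
    return "spark://" + ",".join(f"{ip}:{port}" for ip in master_ips)


def spark_shell_command(master_ips, cassandra_ip, password, username="iccassandra",
                        spark_home=None, connector_jar=None):
    """Return the spark-shell invocation from step b) as an argument list."""
    # expanduser replaces "~" here, because a quoted "~" would not be expanded by the shell.
    spark_home = spark_home or os.path.expanduser("~/spark-2.1.1-bin-hadoop2.6")
    connector_jar = connector_jar or os.path.expanduser(
        "~/spark-cassandra-connector-assembly-2.0.2.jar")
    return [
        os.path.join(spark_home, "bin", "spark-shell"),
        "--master", spark_master_url(master_ips),
        "--conf", f"spark.cassandra.connection.host={cassandra_ip}",
        "--conf", f"spark.cassandra.auth.username={username}",
        "--conf", f"spark.cassandra.auth.password={password}",
        "--jars", connector_jar,
    ]
```

print(shlex.join(spark_shell_command([...master IPs...], cassandra_ip, password))) prints a ready-to-run command line; the list can also be passed directly to subprocess.run.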
5. Using Spark SQL from Spark Shell
Spark SQL allows you to run complex SQL queries against Cassandra data. The following steps demonstrate how to execute a Spark SQL query against Cassandra using the Spark SQL connector. Execute these steps in the same Spark shell session where you executed the previous example:
a) Import the required libraries:
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql
b) Create a temporary view to access datasets using Spark SQL:
val createDDL = """CREATE TEMPORARY VIEW keyspaces USING org.apache.spark.sql.cassandra OPTIONS ( table "keyspaces", keyspace "system_schema", pushdown "true")"""
spark.sql(createDDL)
c) Run queries on the temporary view:
spark.sql("SELECT * FROM keyspaces").show
val rdd1 = spark.sql("SELECT count(*) from keyspaces")
println("Row count: " + rdd1.first()(0))
6. Creating and Submitting a Scala Job
In this step of the tutorial we will demonstrate how to build and submit a Scala job. This is useful where you wish to create a job and submit it multiple times.
a) Log in to your Spark client machine.
b) Create required directories for your project:
mkdir ~/cassandra-count
cd cassandra-count
mkdir -p src/main/scala
mkdir project
c) Create a file called build.sbt in the cassandra-count directory with the following contents (note: the blank lines are important):
name := "cassandra-count"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1" % "provided"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.2"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.1" % "provided"

assemblyMergeStrategy in assembly ~= { (old) =>
  {
    case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.last
    case x => old(x)
  }
}
d) Create a file called assembly.sbt in the cassandra-count/project directory with the following contents (this will include required dependencies in the output jars):
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
e) Create a file called cassandra-count.scala in the cassandra-count/src/main/scala directory with the following contents:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import com.datastax.spark.connector._

object cassandraCount {
  def main(args: Array[String]) {
    // 1. Create a conf for the Spark context.
    // In this example, the Spark master and Cassandra node details are provided in a separate cassandra-count.conf file.
    val conf = new SparkConf().setAppName("Counting row of a cassandra table")
    // 2. Create a Spark context
    val sc = new SparkContext(conf)
    // 3. Create an RDD that connects to the Cassandra table "keyspaces" of the keyspace "system_schema"
    val rdd = sc.cassandraTable("system_schema", "keyspaces")
    // 4. Count the number of rows
    val num_row = rdd.count()
    println("\n\n Number of rows in system_schema.keyspaces: " + num_row + "\n\n")
    // 5. Stop the Spark context
    sc.stop
  }
}
f) Create a file called cassandra-count.conf in the cassandra-count directory. This file contains the configuration that will be used when we submit the job (replace the <> placeholders with your cluster values, as in step 4(b)):
spark.master spark://<spark_master_private_IP1>:7077,<spark_master_private_IP2>:7077,<spark_master_private_IP3>:7077
spark.executor.memory 1g
spark.cassandra.connection.host <private ip of cassandra>
spark.cassandra.auth.username iccassandra
spark.cassandra.auth.password <iccassandra password>
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled true
spark.eventLog.dir .
g) Build the job (from cassandra-count directory):
sbt assembly
h) Submit the job (from cassandra-count directory):
~/spark-2.1.1-bin-hadoop2.6/bin/spark-submit --properties-file cassandra-count.conf --class cassandraCount target/scala-2.11/cassandra-count-assembly-1.0.jar
i) You should see a lot of log messages with the row count message about 15 messages from the end.
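cassandra-count.conf is passed to spark-submit with --properties-file, and each line holds a property name and its value separated by whitespace. If you generate this file from a script, a sketch like the following can help. It assumes Spark reads the file with Java's java.util.Properties rules, under which a backslash in a value (for example in a password) acts as an escape character and must be doubled; the function name is hypothetical.

```python
def spark_properties(settings):
    """Render a dict as Spark properties-file text: one "name value" pair per line."""
    lines = []
    for name, value in settings.items():
        # In this format a property name ends at the first whitespace, "=" or ":".
        if any(ch.isspace() or ch in "=:" for ch in name):
            raise ValueError(f"unsupported character in property name: {name!r}")
        # Backslash is an escape character, so double it to keep it literal.
        escaped = str(value).replace("\\", "\\\\")
        lines.append(f"{name} {escaped}")
    return "\n".join(lines) + "\n"
```

The result can be written with pathlib.Path("cassandra-count.conf").write_text(spark_properties(conf)), where conf is a dict holding the settings from step f).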
7. Conclusion
In this tutorial you have learned how to:
Provision a cluster with Cassandra and Spark
Set up and configure a Spark client
Use Spark SQL from the Spark shell
Create and submit a Scala job
For more information, refer to the following resources:
https://spark.apache.org/
https://www.scala-lang.org/
https://github.com/datastax/spark-cassandra-connector
Spark Jobserver is an open source project available on GitHub (originally created by Ooyala). You can submit jobs, contexts and JARs to the Jobserver using a RESTful interface. This tutorial demonstrates how to use the Jobserver to submit jobs to an Instaclustr Cassandra+Spark cluster. We will interact with the Jobserver using curl.
Spark Jobserver provides a simple, secure method of submitting jobs to Spark without many of the complex set up requirements of connecting to the Spark master directly.
If you've enabled encryption, you will need to download the cluster's CA certificate, available in the zip file on the Connection Details page (this is the same CA certificate that is used for connecting to Cassandra).
If you've enabled authentication, you will need to supply a username and password when making HTTP requests to the Jobserver. You can do this in curl with the flag -u username:password. The credentials are also available on the Connection Details page of your cluster in the console.
The high level steps to follow are:
Setup your environment.
Build the sample.
Run the sample.
1. Setup your environment
First, if you don't already have one, create a Cassandra cluster with Spark enabled. All Instaclustr clusters with Spark enabled include Jobserver, so you can use whatever settings make the most sense for your scenario.
Secondly, ensure that you have the necessary software installed on your machine to build the Spark jobs.
If you have already done one of our other Spark tutorials, the Spark client machine that you set up for those tutorials can be used for this tutorial. However, one of the advantages of using Jobserver is that less setup and network configuration is required to use it.
The software that you will need installed is:
A Java 1.8 JDK
sbt
git (to retrieve the samples)
These are readily available and easily installed for most systems. Some examples of how to install are:
Ubuntu:
sudo apt-get install default-jdk
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
sudo apt-get update
sudo apt-get install sbt
sudo apt-get install git
Mac:
Install brew if you don't already have it (http://brew.sh/)
brew update
brew tap caskroom/cask
brew install Caskroom/cask/java
brew install sbt
brew install git
Finally, you will need to ensure the machine you will be working on has access to Spark Jobserver through the firewall. Log in to the Instaclustr console, navigate to the cluster settings page and add the IP address of your workstation to the Spark Jobserver Allowed Addresses. (If you're unsure of the IP address of your workstation, search Google for "what is my ip".)
2. Build the sample
We have uploaded a sample project including the build, source and configuration files to GitHub. To build it:
Clone the repository:
git clone https://github.com/instaclustr/sample-SparkJobserverCassandra
Build the project:
cd sample-SparkJobserverCassandra
sbt assembly
The repository contains 3 source files:
build.sbt: the project file that specifies dependencies.
src/main/scala/cassandraCount.scala: the scala file with the actual application. The code is brief (~10 lines) and heavily commented to explain what is going on.
project/assembly.sbt: sbt plugin config to package dependencies in the target jar.
When executed, the application will use the Cassandra connector to create an RDD based on a Cassandra table, count the number of rows in the RDD and return the result.
3. Run the sample
Upload the jar
As the first step, we will upload the JAR to the Jobserver, which will allow us to make future calls to it. The examples below include the --cacert and -u options, which assume that authentication and SSL are enabled. If they aren't enabled, just proceed without those flags (and use http rather than https).
curl --cacert cluster-ca-certificate.pem -u icspark:<password> --data-binary @target/scala-2.10/cassandra-count-assembly-1.0.jar https://<sparkJobServerIP>:8090/jars/cassandra-count
curl will return with OK to indicate success. You can also make a GET request to the Jobserver to verify that the JAR has indeed been uploaded:
curl --cacert cluster-ca-certificate.pem -u icspark:<password> https://<sparkJobServerIP>:8090/jars
{
"cassandra-count": "2015-11-16T22:44:31.775Z"
}
Uploading Contexts to the Jobserver
You create contexts on the Jobserver and then tell each job which of the available contexts to use. The Jobserver manages the context on your behalf, so you don't initialize a new context in the main function of the job. By default, the Jobserver uses a context with the following settings:
num-cpu-cores = 2
memory-per-node = 512m
We will create a new context, as the cassandra-count job needs to connect to Cassandra:
curl --cacert cluster-ca-certificate.pem -d "" -u icspark:<password> 'https://<sparkJobserverIP>:8090/contexts/test-context?spark.cassandra.auth.username=iccassandra&spark.cassandra.auth.password=<password>&spark.cassandra.connection.host=<PRIVATE_IP_OF_CASSANDRA_NODE>'
curl will return with OK upon success of uploading the context. We can now use this context when running the job. You can specify other context parameters and they will override the default context if applicable. The name of the context is unique - there can only be one context with the name test-context. You can always delete the existing context by making a DELETE request to it:
curl --cacert cluster-ca-certificate.pem -u icspark:<password> --request DELETE 'https://<sparkJobserverIP>:8090/contexts/test-context'
This will stop all jobs running in that context, so be careful!
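The context is created with the Cassandra credentials in the URL's query string, so any character in the password that has a meaning in URLs (such as &, # or +) has to be percent-encoded, or the Jobserver may receive a different value from the one you intended. A minimal Python sketch that builds the same URL as the curl command above, with urllib.parse doing the encoding; the function name is hypothetical, and the defaults (iccassandra, port 8090, https) come from the examples in this article.

```python
from urllib.parse import quote, urlencode


def context_url(jobserver_ip, context_name, cassandra_host, cassandra_password,
                cassandra_user="iccassandra", port=8090, scheme="https"):
    """Build the POST /contexts/<name> URL used to create a Jobserver context.

    urlencode() percent-encodes each value, so characters such as & or # in
    the password do not break the query string.
    """
    params = {
        "spark.cassandra.auth.username": cassandra_user,
        "spark.cassandra.auth.password": cassandra_password,
        "spark.cassandra.connection.host": cassandra_host,
    }
    return (f"{scheme}://{jobserver_ip}:{port}/contexts/"
            f"{quote(context_name, safe='')}?{urlencode(params)}")
```

With real values substituted, the resulting URL can be quoted and passed to curl in place of the hand-written one.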
Running our cassandra-count job
We are now ready to run the job. We do so by making a POST request to the Jobserver's /jobs endpoint, specifying the uploaded JAR (appName), the job class (classPath) and the context to use:
curl --cacert cluster-ca-certificate.pem -u icspark:<password> -d "" 'https://<sparkJobserverIP>:8090/jobs?appName=cassandra-count&classPath=cassandraCount&context=test-context'
Here we've told the Jobserver to run our job using the context we specified earlier. Jobserver will return with something that looks like this:
{
"status": "STARTED",
"result": {
"jobId": "6d6350d6-7c67-4cd7-8129-c36d4985ca80",
"context": "test-context"
}
}
Alternatively, you can always add &sync=true at the end for small jobs, which will cause curl to wait for the result of the job. We can query for the result or status of the job by making a GET request to /jobs/<uuid>:
curl --cacert cluster-ca-certificate.pem -u icspark:<password> https://<sparkJobserverIP>:8090/jobs/6d6350d6-7c67-4cd7-8129-c36d4985ca80
If the job has finished, you will see the following result:
{
"status": "FINISHED",
"result": 5
}
This tells us that the job completed successfully and that the number of tables in the system keyspace is 5.
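When a job is submitted without &sync=true, a script has to poll /jobs/<uuid> until the status changes. The Python sketch below does this with the standard library: the -u flag becomes a Basic Authorization header and --cacert becomes an SSL context that trusts the cluster CA certificate. Function names are hypothetical, and treating ERROR and KILLED as terminal failure statuses is an assumption to check against the Jobserver documentation for your version.

```python
import base64
import json
import ssl
import time
import urllib.request


def make_status_fetcher(base_url, username, password, cafile=None):
    """Return a function that GETs /jobs/<job_id> and decodes the JSON body."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    context = ssl.create_default_context(cafile=cafile) if cafile else None

    def fetch(job_id):
        request = urllib.request.Request(
            f"{base_url}/jobs/{job_id}",
            headers={"Authorization": "Basic " + token},  # same as curl -u
        )
        with urllib.request.urlopen(request, context=context, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))

    return fetch


def wait_for_job(fetch_status, job_id, poll_seconds=2.0, timeout_seconds=300.0,
                 sleep=time.sleep):
    """Poll until the job reports FINISHED and return its "result" field."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch_status(job_id)
        status = body.get("status")
        if status == "FINISHED":
            return body.get("result")
        # Assumed terminal failure statuses; adjust for your Jobserver version.
        if status in ("ERROR", "KILLED"):
            raise RuntimeError(f"job {job_id} ended as {status}: {body.get('result')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

For example, wait_for_job(make_status_fetcher("https://<sparkJobserverIP>:8090", "icspark", "<password>", cafile="cluster-ca-certificate.pem"), "<job uuid>") returns the job's result field (5 in the example above). The fetch function is passed in rather than hard-wired so the polling loop can be exercised without a live Jobserver.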
The Spark Jobserver UI
The Jobserver UI is available on port 8090 of your Spark Jobserver instance's IP. It shows all your currently uploaded JARs, contexts, and failed/running/successful jobs.
Further Reading
The Spark Jobserver GitHub page contains a lot of useful information about using the Spark Jobserver.
Zeppelin is a web-based notebook that facilitates interactive data analysis using Spark. Instaclustr now supports Apache Zeppelin as an add-on component to our managed clusters. In this tutorial, we will walk you through the basic steps of using Apache Zeppelin with Instaclustr Spark and Cassandra.
1. Provision a cluster with Cassandra, Spark and Zeppelin
(1) If you haven't already signed up for an Instaclustr account, refer to our support article to sign up and create an account.
(2) Once you have signed up for Instaclustr and verified your email, log into the Instaclustr console and click the Create Cassandra Cluster button.
(3) On the Create Cassandra Cluster page, enter an appropriate name and network address block for your cluster. Refer to our support article on Network Address Allocation to understand how we divide up the specified network range to determine the node IP addresses. Under the Applications section, select:
Apache Cassandra 3.11
Apache Spark as an Add-on (Apache Spark 2.1.1 - Hadoop 2.6)
Apache Zeppelin as an Add-on (Apache Zeppelin 0.7.1 with Scala 2.11/Spark 2.1.1)
(4) Under the Data Centre section, select:
Amazon Web Services as the Infrastructure Provider
A minimum node size of t2.medium
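Do not enable client encryption for Cassandra (see https://support.instaclustr.com/hc/en-us/articles/218913487-Instaclustr-Spark-with-ssl-configured-Cassandra-Cluster if you want to use Spark with Cassandra client to server encryption)
(5) Under the Cassandra Options section, select: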
Use Private IP Addresses for node discovery
(6) Leave the other options as default. Accept the terms and conditions and click the Create Cluster button. The cluster will automatically provision and will be available for use once all nodes are in the running state.
2. Getting Started with Zeppelin
(1) Once all nodes in the cluster are in the running state, click on the Zeppelin tab on the cluster's page.
(2) Go to the listed URL and enter the given credentials to access Zeppelin.
(3) You should then see the Zeppelin home page.
3. Basic Interaction with Zeppelin Notebook
(1) Create a new notebook by clicking on the Create new note link. Give your note a name, leave Spark as the default interpreter and click the Create Note button.
(2) The notebook has already been preconfigured to use the Spark interpreter. Click the gear button on the top right of the notebook to see the enabled interpreters.
(3) Make sure the Spark interpreter is at the top of the list and the Cassandra interpreter is enabled. Click the Save button to save the settings.
(4) Load the dependencies using the following code:
%dep
z.load("/opt/zeppelin/interpreter/spark/spark-cassandra-connector-assembly-2.0.2.jar")
Then you will see the following output:
Make sure you get the same output as shown in the picture above. If it throws an error, click the gear button on the top right, go to the Interpreter menu and restart the Spark interpreter. Then go back to the notebook and re-run the code.
(5) Run the following code:
%spark
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

val rdd = sc.cassandraTable("system_schema","keyspaces")
println("Row count:" + rdd.count)
You should then get a result like the following:
4. Using Spark SQL from Zeppelin Notebook
(1) In the same Notebook, add a new paragraph, write and run the following code.
%spark
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql

val createDDL = """CREATE TEMPORARY VIEW keyspaces USING org.apache.spark.sql.cassandra OPTIONS ( table "keyspaces", keyspace "system_schema", pushdown "true")"""
spark.sql(createDDL)
spark.sql("SELECT * FROM keyspaces").show
val rdd1 = spark.sql("SELECT count(*) from keyspaces")
println("Row count: " + rdd1.first()(0))
You should then get a result like the following:
If you try to run the above code in a new Notebook, you have to load the dependencies in the new Notebook first.
5. Using CQL from Zeppelin Notebook
Zeppelin can also be used to connect directly to Cassandra to execute CQL commands.
(1) Create a new Notebook.
(2) Put the following code into your Notebook and run the code.
%cassandra
USE "system_schema";
SELECT * FROM keyspaces;
You should then get a result like the following:
View ArticleThis tutorial describes how you can use Apache Spark and Zeppelin as part of an Instaclustr-managed cluster to extract and sample data from one cluster and write to another cluster.
1. Prerequisites
(1) At least two clusters running in Instaclustr. In this tutorial, the cluster from which we read data is called source cluster and the cluster to which we write the data is called target cluster.
(2) The target cluster is provisioned with Zeppelin and Spark.
(3) The keyspace of the target table must be identical to that of the source table (table names can be different).
2. Configure Network Access
As Spark in your target cluster needs to connect to your source cluster to read data, the public IP addresses of the nodes in your target cluster need to be added to the Cassandra Allowed Addresses of your source cluster. The detailed steps are as follows:
Open your source cluster dashboard page.
Click Settings panel.
Add the public IP addresses of your target cluster nodes to Cassandra Allowed Addresses.
Click Save Cluster Settings.
3. Create Table Definition on Source Cassandra Cluster and Target Cassandra Cluster
(1) Check the public IP address of your source cluster node.
(2) Open a terminal.
(3) Make sure cqlsh is installed on your system.
(4) Execute:
cqlsh <public IP address of source cluster>
(5) Change to instaclustr keyspace:
use instaclustr;
(6) Create a table called users:
CREATE TABLE users ( userid text Primary Key, first_name text, last_name text, emails set<text>, top_scores list<int>, todo map<timestamp, text> );
(7) Insert test data:
INSERT INTO users(userid, first_name, last_name) VALUES ('1', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('2', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('3', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('4', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('5', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('6', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('7', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('8', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('9', 'f_name_src', 'l_name_src');
INSERT INTO users(userid, first_name, last_name) VALUES ('10', 'f_name_src', 'l_name_src');
(8) Execute quit to exit the Cassandra environment.
(9) Check the public IP addresses of your target cluster nodes.
(10) Execute:
cqlsh <public IP address of target cluster>
(11) Change to instaclustr keyspace:
use instaclustr;
(12) Create a table called users:
CREATE TABLE users ( userid text PRIMARY KEY );
(13) Execute the select CQL command:
SELECT * FROM users;
The result should be empty.
4. Sample and Load Data
(1) Open the dashboard page of your target cluster.
(2) Open Details panel and click Zeppelin button, then you will see the Zeppelin webpage opened through your web browser.
(3) Create a new notebook by clicking the Notebook button on the home page of Zeppelin.
(4) Put the following code in the first paragraph to load dependencies.
%dep
z.load("/opt/zeppelin/interpreter/spark/spark-cassandra-connector-assembly-2.0.2.jar")
(5) Use the following Spark code in the next paragraph to sample data from the source cluster and save it to the target cluster:
%spark
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import org.apache.spark.SparkContext

val sourceCluster = CassandraConnector(
  sc.getConf
    .set("spark.cassandra.connection.host", "<public IP of nodes in source cluster>")
    .set("spark.cassandra.auth.username", "<user name of source cluster>")
    .set("spark.cassandra.auth.password", "<password of source cluster>"))

val rddFromSourceCluster = {
  implicit val c = sourceCluster // connect to the source cluster in this code block
  sc.cassandraTable("<source keyspace>", "<source table>")
    .select("<PK column>")
    .sample(false, 0.1) // sample 10% of the partition keys from the source table
}

// Outside the block above, the default connector (the local, target cluster) is used
rddFromSourceCluster.saveToCassandra("<target keyspace>", "<target table>")
For a large dataset, extracting the whole dataset into Spark and then sampling it there is very time-consuming. To be more efficient, the example above selects only the partition key column and samples those keys. The sampled keys can then be joined with the source table to fetch complete rows for just the sampled partitions, which avoids pulling the complete data set into Spark.
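The idea of sampling keys first and fetching full rows only for those keys can be illustrated without a cluster. The following is a minimal plain-Scala sketch, assuming an in-memory Map stands in for the source table (keyed by partition key). `KeySampling`, `sampleKeys` and `joinBack` are hypothetical names, and the Bernoulli sampling here only mimics the behaviour of `sample(false, 0.1)`; it is not Spark's implementation.

```scala
import scala.util.Random

// Plain-Scala analogue of "sample the partition keys, then fetch full rows
// only for the sampled keys". A Map stands in for a Cassandra table keyed by
// partition key; no Spark or Cassandra APIs are used.
object KeySampling {

  // Bernoulli sampling without replacement: each key is kept independently
  // with probability `fraction`, so the sample size varies from run to run.
  def sampleKeys[K](keys: Seq[K], fraction: Double, seed: Long): Seq[K] = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val rng = new Random(seed)
    keys.filter(_ => rng.nextDouble() < fraction)
  }

  // Look up each sampled key in the table, returning only matching rows.
  def joinBack[K, V](sampled: Seq[K], table: Map[K, V]): Seq[(K, V)] =
    sampled.flatMap(k => table.get(k).map(v => k -> v))
}
```

For example, `KeySampling.sampleKeys(userIds, 0.1, seed = 42L)` keeps roughly 10% of the keys, and `joinBack` then returns only the rows for those keys. Because each key is kept independently, a 10% sample of only ten source rows, as in this walkthrough, may contain zero, one or a few rows.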
(6) Check the result on the target Cassandra cluster.
(7) Go back to the terminal environment.
(8) Execute the SELECT CQL command again:
SELECT * FROM users;
The result should now contain a sample of the userid values from the source table.
5. SSL Connection
If encryption is enabled on your source cluster, you will need to contact our support team to load the truststore file of the source cluster onto your target cluster. The connector for the source cluster should then be configured using the following code:
val sourceCluster = CassandraConnector(
  sc.getConf.set("spark.cassandra.connection.host", "<Source IP>")
    .set("spark.cassandra.auth.username", "<user name of source cluster>")
    .set("spark.cassandra.auth.password", "<password of source cluster>")
    .set("spark.cassandra.connection.ssl.enabled", "true")
    .set("spark.cassandra.connection.ssl.trustStore.type", "jks")
    .set("spark.cassandra.connection.ssl.trustStore.path", "/opt/spark/conf/source_truststore.jks")
    .set("spark.cassandra.connection.ssl.trustStore.password", "instaclustr"))
These references for Spark and the Spark Cassandra Connector should help you learn to create Spark applications with Cassandra:
Spark Cassandra Connector 1.6.0-M1 Docs:
https://github.com/datastax/spark-cassandra-connector/blob/v1.6.0-M1/doc/0_quick_start.md
Spark Cassandra Connector 2.0.2 Docs:
https://github.com/datastax/spark-cassandra-connector/blob/v2.0.2/doc/0_quick_start.md
Spark 1.6.0 Docs:
https://spark.apache.org/docs/1.6.0/
Spark 2.1.1 Docs:
https://spark.apache.org/docs/2.1.1/
Spark Screencasts
http://spark.apache.org/screencasts/
Overview
Instaclustr have developed a number of useful tools to assist with diagnosing issues in a cluster. For users of Instaclustr's Managed Service, our Technical Operations team will run these as needed when working with you to help diagnose issues. The tools are available on a supported basis for our enterprise support customers and on an unsupported basis for the general community (although we'll probably answer questions on the C* user email list).
These tools supplement the information available from the nodetool utility that is part of core Apache Cassandra. Whereas nodetool tends to report summary statistics maintained as Cassandra services operate, ic-tools directly read Cassandra's data files when executed, to report more detailed and accurate statistics.
As such, executing the tools can result in a large amount of data being read, which can potentially impact the performance of the node where they are executed. The two most data-heavy tools (ic-cfstats and ic-purge) provide rate limiting functions to reduce the impact. However, users are advised to exercise care when using these tools in a live cluster.
These tools are version-specific, and you must use the ic-tools version that corresponds to your Cassandra version. We have provided pre-built jars for all versions of Cassandra at the bottom of this page.
The source code is published on github.
Command | Description
ic-summary | Summary information about all column families, including how much of the data is repaired
ic-sstables | Print out metadata for sstables that belong to a column family
ic-pstats | Partition size statistics for a column family
ic-cfstats | Detailed statistics about cells in a column family
ic-purge | Statistics about reclaimable data for a column family
(We've generally used the old-school C* term 'column family'. It is synonymous with 'table' in modern C* versions.)
ic-summary
Provides summary information about all column families. Useful for finding the largest column families and how much data has been repaired by incremental repairs.
Usage
ic-summary
Output
Column | Description
Keyspace | Keyspace the column family belongs to
Column Family | Name of column family
SSTables | Number of sstables on this node for the column family
Disk Size | Compressed size on disk for this node
Data Size | Uncompressed size of the data for this node
Last Repaired | Time of the last incremental repair
Repair % | Percentage of data marked as repaired by incremental repair
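As a rough illustration of how the Repair % and Last Repaired columns relate to sstable metadata, the sketch below computes a size-weighted percentage of data held in sstables whose repairedAt value is non-zero (Cassandra uses 0 to mean "unrepaired"). This is an assumption about the calculation, not ic-summary's actual source; `SSTableInfo` and `RepairSummary` are hypothetical names.

```scala
// Hypothetical per-sstable metadata: size in bytes and the repairedAt
// timestamp recorded by incremental repair (0 means unrepaired).
final case class SSTableInfo(sizeBytes: Long, repairedAt: Long)

object RepairSummary {

  // Size-weighted percentage of data held in sstables marked as repaired.
  def repairPercent(sstables: Seq[SSTableInfo]): Double = {
    val total = sstables.map(_.sizeBytes).sum
    if (total == 0L) 0.0
    else {
      val repaired = sstables.filter(_.repairedAt > 0L).map(_.sizeBytes).sum
      100.0 * repaired / total
    }
  }

  // Most recent repairedAt value, or None if nothing has been repaired.
  def lastRepaired(sstables: Seq[SSTableInfo]): Option[Long] = {
    val times = sstables.map(_.repairedAt).filter(_ > 0L)
    if (times.isEmpty) None else Some(times.max)
  }
}
```

For instance, one repaired sstable of 300 bytes plus one unrepaired sstable of 100 bytes gives a Repair % of 75.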
ic-sstables
Print out sstable metadata for a column family. Useful in helping to tune compaction settings.
Usage
ic-sstables <keyspace> <column-family>
Output
Column | Description
SSTable | Data.db filename of sstable
Disk Size | Size of sstable on disk
Total Size | Uncompressed size of data contained in the sstable
Min Timestamp | Minimum cell timestamp contained in the sstable
Max Timestamp | Maximum cell timestamp contained in the sstable
Duration | The time span between minimum and maximum cell timestamps
Level | Leveled Compaction Strategy (LCS) level of the sstable
Keys | Number of partition keys
Avg Partition Size | Average partition size
Max Partition Size | Maximum partition size
Avg Column Count | Average number of columns in a partition
Max Column Count | Maximum number of columns in a partition
Droppable | Estimated droppable tombstones
Repaired At | Time when marked as repaired by incremental repair
ic-pstats
Tool for finding the largest partitions. It reads the Index.db files, so it is relatively quick.
Usage
ic-pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
-h | Display help
-b | Batch mode. Uses a progress indicator that is friendly for running in batch jobs.
-n <num> | Number of partitions to display in leaders lists
-t <name> | Snapshot to analyse (snapshot name from nodetool listsnapshots). A snapshot is created if none is specified.
-f <files> | Comma-separated list of Data.db sstables to filter on
Output
Summary: Summary statistics about partitions
Column | Description
Count (Size) | Number of partition keys on this node
Total (Size) | Total uncompressed size of all partitions on this node
Total (SSTable) | Number of sstables on this node
Minimum (Size) | Minimum uncompressed partition size
Minimum (SSTable) | Minimum number of sstables a partition belongs to
Maximum (Size) | Maximum uncompressed partition size
Maximum (SSTable) | Maximum number of sstables a partition belongs to
Average (Size) | Average (mean) uncompressed partition size
Average (SSTable) | Average (mean) number of sstables a partition belongs to
Largest partitions: The top N largest partitions
Column | Description
Key | The partition key
Size | Total uncompressed size of the partition
SSTable Count | Number of sstables that contain the partition
SSTable Leaders: The top N partitions that belong to the most sstables
Column | Description
Key | The partition key
SSTable Count | Number of sstables that contain the partition
Size | Total uncompressed size of the partition
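The summary and leader lists above are straightforward aggregations over per-partition statistics. Here is a minimal sketch, assuming each partition has already been reduced to a key, an uncompressed size and the number of sstables that contain it (reading the Index.db files is out of scope). `PartitionStat`, `PartitionSummary` and `PartitionStats` are hypothetical names, not part of ic-tools. Total (SSTable) is left out because it cannot be derived from per-partition records alone.

```scala
// Hypothetical per-partition record: key, uncompressed size in bytes and the
// number of sstables that contain the partition.
final case class PartitionStat(key: String, sizeBytes: Long, sstableCount: Int)

final case class PartitionSummary(
    count: Int, totalSize: Long, minSize: Long, maxSize: Long, avgSize: Double,
    minSSTables: Int, maxSSTables: Int, avgSSTables: Double)

object PartitionStats {

  // Summary statistics; None when there are no partitions.
  def summarise(ps: Seq[PartitionStat]): Option[PartitionSummary] =
    if (ps.isEmpty) None
    else {
      val sizes = ps.map(_.sizeBytes)
      val counts = ps.map(_.sstableCount)
      Some(PartitionSummary(
        count = ps.size,
        totalSize = sizes.sum,
        minSize = sizes.min,
        maxSize = sizes.max,
        avgSize = sizes.sum.toDouble / ps.size,
        minSSTables = counts.min,
        maxSSTables = counts.max,
        avgSSTables = counts.sum.toDouble / ps.size))
    }

  // "Largest partitions": the n partitions with the largest uncompressed size.
  def largest(ps: Seq[PartitionStat], n: Int): Seq[PartitionStat] =
    ps.sortBy(_.sizeBytes)(Ordering[Long].reverse).take(n)

  // "SSTable leaders": the n partitions found in the most sstables.
  def sstableLeaders(ps: Seq[PartitionStat], n: Int): Seq[PartitionStat] =
    ps.sortBy(_.sstableCount)(Ordering[Int].reverse).take(n)
}
```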
SSTables: Metadata about sstables as it relates to partitions.
Column | Description
SSTable | Data.db filename of SSTable
Size | Uncompressed size
Min Timestamp | Minimum cell timestamp in the sstable
Max Timestamp | Maximum cell timestamp in the sstable
Level | Leveled Compaction Strategy (LCS) level of the sstable
Partitions | Number of partition keys in the sstable
Avg Partition Size | Average uncompressed partition size in sstable
Max Partition Size | Maximum uncompressed partition size in sstable
ic-cfstats
Tool for getting detailed cell statistics that can help identify issues with the data model.
Usage
ic-cfstats [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
-h | Display help
-b | Batch mode. Uses a progress indicator that is friendly for running in batch jobs.
-r <limit> | Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit it)
-n <num> | Number of partitions to display in leaders lists
-t <name> | Snapshot to analyse (snapshot name from nodetool listsnapshots). A snapshot is created if none is specified.
-f <files> | Comma-separated list of Data.db sstables to filter on
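Rate limiting a scan like this generally means pausing between reads so that the average throughput stays at or below the limit. The sketch below shows that arithmetic under stated assumptions: 1 MB is taken as 1,048,576 bytes, and `ReadThrottle`, `delayMillis` and `throttle` are hypothetical names. It is not how ic-cfstats or ic-purge implement `-r`.

```scala
// Hypothetical sketch of read-throughput limiting for an option such as
// "-r <limit>" (MB/s). Assumes 1 MB = 1,048,576 bytes.
object ReadThrottle {
  private val BytesPerMB: Double = 1024.0 * 1024.0

  // Milliseconds to pause so that `bytesRead` bytes over `elapsedMillis`
  // does not exceed `limitMBps`; 0 when already within the limit.
  def delayMillis(bytesRead: Long, elapsedMillis: Long, limitMBps: Double): Long = {
    require(limitMBps > 0.0, "limit must be positive")
    val minimumMillis = bytesRead / (limitMBps * BytesPerMB) * 1000.0
    math.max(0L, math.ceil(minimumMillis - elapsedMillis).toLong)
  }

  // Sleeps if reading has run ahead of the limit since `startNanos`.
  def throttle(bytesRead: Long, startNanos: Long, limitMBps: Double): Unit = {
    val elapsedMillis = (System.nanoTime() - startNanos) / 1000000L
    val pause = delayMillis(bytesRead, elapsedMillis, limitMBps)
    if (pause > 0L) Thread.sleep(pause)
  }
}
```

With a 16 MB/s limit, reading 32 MB should take at least 2 seconds, so if only 500 ms have elapsed, `delayMillis` asks for a 1500 ms pause.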
Output
Summary: Summary statistics about partitions
Column | Description
Count (Size) | Number of partition keys on this node
Total (Size) | Total uncompressed size of all partitions on this node
Total (SSTable) | Number of sstables on this node
Minimum (Size) | Minimum uncompressed partition size
Minimum (SSTable) | Minimum number of sstables a partition belongs to
Maximum (Size) | Maximum uncompressed partition size
Maximum (SSTable) | Maximum number of sstables a partition belongs to
Average (Size) | Average (mean) uncompressed partition size
Average (SSTable) | Average (mean) number of sstables a partition belongs to
Largest partitions: Partitions with largest uncompressed size
Column | Description
Key | The partition key
Size | Total uncompressed size of the partition
Tombstones | Number of cell or range tombstones
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds
Cells | Number of cells in the partition
SSTable Count | Number of sstables that contain the partition
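The (droppable) columns follow Cassandra's gc_grace_seconds rule: a tombstone becomes eligible for purging by compaction once gc_grace_seconds have elapsed since its local deletion time. A minimal sketch of that check is below, assuming deletion times in seconds since the epoch; `Tombstones`, `isDroppable` and `countDroppable` are hypothetical names. Whether compaction actually removes a droppable tombstone also depends on other factors, such as overlapping sstables, which this sketch ignores.

```scala
// Hypothetical check of the gc_grace_seconds rule. Times are in seconds since
// the epoch; Cassandra's default gc_grace_seconds is 864000 (10 days).
object Tombstones {

  // Droppable once gc_grace_seconds have passed since the local deletion time.
  def isDroppable(localDeletionTime: Long, gcGraceSeconds: Long, nowSeconds: Long): Boolean =
    localDeletionTime < nowSeconds - gcGraceSeconds

  // Number of tombstones (given by deletion time) that are droppable now.
  def countDroppable(deletionTimes: Seq[Long], gcGraceSeconds: Long, nowSeconds: Long): Int =
    deletionTimes.count(t => isDroppable(t, gcGraceSeconds, nowSeconds))
}
```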
Widest partitions: Partitions with the most cells
Column | Description
Key | The partition key
Cells | Number of cells in the partition
Tombstones | Number of cell or range tombstones
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds
Size | Total uncompressed size of the partition
SSTable Count | Number of sstables that contain the partition
Tombstone Leaders: Partitions with the most tombstones
Column | Description
Key | The partition key
Tombstones | Number of cell or range tombstones
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds
Cells | Number of cells in the partition
Size | Total uncompressed size of the partition
SSTable Count | Number of sstables that contain the partition
SSTable Leaders: Partitions that are in the most sstables
Column | Description
Key | The partition key
SSTable Count | Number of sstables that contain the partition
Size | Total uncompressed size of the partition
Cells | Number of cells in the partition
Tombstones | Number of cell or range tombstones
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds
SSTables: Metadata about sstables as it relates to partitions.
Column | Description
SSTable | Data.db filename of SSTable
Size | Uncompressed size
Min Timestamp | Minimum cell timestamp in the sstable
Max Timestamp | Maximum cell timestamp in the sstable
Partitions | Number of partitions
(deleted) | Number of row-level partition deletions
(avg size) | Average uncompressed partition size in sstable
(max size) | Maximum uncompressed partition size in sstable
Cells | Number of cells in the SSTable
Tombstones | Number of cell or range tombstones in the SSTable
(droppable) | Number of tombstones that are droppable according to gc_grace_seconds
(range) | Number of range tombstones
Cell Liveness | Percentage of live cells, that is, non-tombstone cells as a percentage of all cells. It does not take into account cells that are shadowed by tombstones or by later updates.
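Cell Liveness, as described above, is a simple ratio. The sketch below assumes that the Cells count includes cell tombstones, so live cells = cells minus tombstone cells; how range tombstones factor in is not modelled. `CellLiveness` is a hypothetical name. For example, 200 cells of which 50 are tombstones gives 75% liveness.

```scala
// Hypothetical helper: percentage of non-tombstone cells, assuming the cell
// count includes tombstone cells. Shadowed cells still count as live.
object CellLiveness {
  def percent(totalCells: Long, tombstoneCells: Long): Option[Double] = {
    require(tombstoneCells >= 0L && tombstoneCells <= totalCells, "invalid counts")
    if (totalCells == 0L) None // undefined when there are no cells
    else Some(100.0 * (totalCells - tombstoneCells) / totalCells)
  }
}
```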
ic-purge
Usage
ic-purge [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
-h | Display help
-b | Batch mode. Uses a progress indicator that is friendly for running in batch jobs.
-r <limit> | Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit it)
-n <num> | Number of partitions to display in leaders lists
-t <name> | Snapshot to analyse. A snapshot is created if none is specified.
Output
Largest reclaimable partitions: Partitions with the largest amount of reclaimable data
Column | Description
Key | The partition key
Size | Total uncompressed size of the partition
Reclaim | Reclaimable uncompressed size
Generations | SSTable generations the partition belongs to
This tutorial builds on our basic Getting Started with Instaclustr Spark and Cassandra tutorial to demonstrate how to set up Apache Kafka and use it to send data to Spark Streaming where it is summarised before being saved in Cassandra.
The high-level steps to be followed are:
Set up your environment.
Build the sample.
Run the sample.
1. Set Up Your Environment
To set up your environment, first follow the steps in sections 1 (Provision a cluster with Cassandra and Spark) and 2 (Set up a Spark client) of the tutorial here: https://www.instaclustr.com/support/documentation/apache-spark/getting-started-with-instaclustr-spark-cassandra/
(The only configuration change is to select Ubuntu Server 16.04 LTS (HVM), SSD Volume Type as the AMI.)
Once this is complete, open three tabs (tab 1, tab 2 and tab 3) in your terminal, then install and start Kafka:
Download Kafka (in tab 1):
cd ~
wget http://www-us.apache.org/dist/kafka/0.9.0.1/kafka_2.10-0.9.0.1.tgz
Unpack the files (in tab 1):
tar xvf kafka_2.10-0.9.0.1.tgz
Build Kafka (in tab 1):
cd kafka_2.10-0.9.0.1
sbt update
sbt package
Note 1: If you get a warning that the sbt version is not set, create a directory called project in /home/ubuntu/kafka_2.10-0.9.0.1. In this directory, create a file called build.properties containing this line: sbt.version=1.0.2
Note 2: If you get a Java version mismatch error such as "bc: command not found The java installation you have is not up to date requires at least version 1.6+, you have version 1.8", install bc with: sudo apt-get install bc
Start Kafka (in tab 1):
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
Run the Kafka producer test harness to send some test messages (in tab 2):
cd kafka_2.10-0.9.0.1
bin/kafka-console-producer.sh --topic test --broker-list localhost:9092
With the test harness running, type some random messages, followed by Ctrl-D to finish. You will see a lot of log output when you type the first entry (related to the test harness setting up the channel). After that, you should see no errors for subsequent entries (in tab 2).
Run the consumer test harness to retrieve the messages (in tab 3):
cd kafka_2.10-0.9.0.1
bin/kafka-console-consumer.sh --topic test --zookeeper localhost:2181 --from-beginning
You should see the messages you typed earlier played back to you (in tab 3).
2. Build the sample
We have uploaded a sample project, including the build, source and configuration files, to GitHub. To build it:
Clone the repository:
cd ~
git clone https://github.com/instaclustr/sample-KafkaSparkCassandra.git
The repository contains 4 active files:
sbt: the project file that specifies dependencies.
cassandra-count.conf: configuration file with IPs, username, password.
src/main/scala/KafkaSparkCassandra.scala: the scala file with the actual application. The code is heavily commented to explain what is going on.
project/assembly.sbt: sbt plugin config to package dependencies in the target jar.
When executed, the application will:
Connect directly from the Spark driver to Cassandra, create a keyspace and table to store results if required.
Start a Spark streaming session connected to Kafka. Summarise messages received in each 5 second period by counting words. Save the summary result in Cassandra.
Stop the streaming session after 30 seconds.
Use Spark SQL to connect to Cassandra and extract the summary results table data that has been saved.
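The summarising step (counting words in each 5 second batch) can be modelled with plain Scala collections to make the logic concrete. This sketch is not the code in KafkaSparkCassandra.scala, which uses Spark Streaming. It assumes each message carries an arrival timestamp in milliseconds and that words are separated by whitespace; `Message` and `BatchWordCount` are hypothetical names.

```scala
// Hypothetical message model: arrival time in epoch milliseconds plus text.
final case class Message(epochMillis: Long, text: String)

object BatchWordCount {
  val BatchMillis: Long = 5000L

  // Start of the 5-second batch that a timestamp falls into.
  def batchStart(epochMillis: Long): Long =
    epochMillis - Math.floorMod(epochMillis, BatchMillis)

  // Word counts per batch, keyed by batch start time.
  def countWords(messages: Seq[Message]): Map[Long, Map[String, Int]] =
    messages
      .groupBy(m => batchStart(m.epochMillis))
      .map { case (start, batch) =>
        val words = batch.flatMap(_.text.split("\\s+").toSeq).filter(_.nonEmpty)
        start -> words.groupBy(identity).map { case (w, ws) => w -> ws.size }
      }
}
```

In the sample application, each per-batch map would correspond to one summary row set saved to Cassandra; here the result is simply returned.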
Build the project:
cd sample-KafkaSparkCassandra
sbt assembly
Set your local configuration settings by either overwriting the cassandra-count.conf with the one you created in the previous tutorial or editing the template from the repository to replace the values in <> brackets.
3. Run the sample
At this stage, Kafka should still be running from the first step. We need to run both the Kafka producer test harness and the Spark sample app at the same time, so it's easiest to have two console windows open. Once you have the two windows open and are logged in, do the following:
In your first console window, start the Kafka producer test harness:
cd ~/kafka_2.10-0.9.0.1/
bin/kafka-console-producer.sh --topic test --broker-list localhost:9092
In the second console window, submit your Spark job:
cd ~/sample-KafkaSparkCassandra/
~/spark-2.1.1-bin-hadoop2.6/bin/spark-submit --properties-file cassandra-count.conf --class KafkaSparkCassandra target/scala-2.11/cassandra-kafka-streaming-assembly-1.0.jar
Switch back to the Kafka producer console window and enter some test messages for 20 seconds or so.
Switch back to the Spark console window. Amid the stream of log messages, you should see something like the following, which is the summary from a single Spark streaming batch:
After 30 seconds of streaming have passed, you should see output like the following, which is a dump of the Cassandra table:
A common setup for Cassandra clusters is to enable client encryption. To use Spark with these clusters, additional steps must be taken when submitting jobs to configure the Spark Cassandra Connector to use SSL. In this guide, we go through these steps and clarify the configuration properties used.
As a prerequisite for this guide, you should have provisioned and configured a cluster with both Cassandra and Spark. Details on how to do this are in the following article: Getting Started with Instaclustr Spark & Cassandra
Download Truststore File
You will need to download the certificates for the cluster from the Connection Info page for your cluster. The download button is on the top right-hand side of the page.
In the downloaded zip you will find a Java Key Store file called truststore.jks. This file needs to be included as a resource in the assembled jar in a later step.
Creating and Submitting a Scala Job with SSL Cassandra Connection
In this step of the tutorial we will demonstrate how to build and submit a Scala job. This is useful where you wish to create a job and submit it multiple times.
Log in to your Spark client machine
Create required directories for your project:
mkdir ~/cassandra-count
cd cassandra-count
mkdir -p src/main/scala
mkdir project
mkdir -p src/main/java
mkdir -p src/main/resources
Create a file called build.sbt in the cassandra-count directory with the following contents (note: the blank lines are important):
name := "cassandra-count"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided"

assemblyMergeStrategy in assembly <<= (assemblyMergeStrategy in assembly) {
  (old) => {
    case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.last
    case x => old(x)
  }
}
Create a file called assembly.sbt in the cassandra-count/project directory with the following contents (this will include required dependencies in the output jars):
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.0")
Create a file called cassandra-count.scala in the cassandra-count/src/main/scala directory with the following contents:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import com.datastax.spark.connector._

object cassandraCount {
  def main(args: Array[String]): Unit = {
    // 1. Create a conf for the spark context
    // In this example, the spark master and cassandra node details are provided in a separate cassandra-count.conf file.
    val conf = new SparkConf().setAppName("Counting row of a cassandra table")

    // 2. Create a spark context
    val sc = new SparkContext(conf)

    // 3. Create an rdd that connects to the cassandra table "schema_keyspaces" of the keyspace "system"
    val rdd = sc.cassandraTable("system", "schema_keyspaces")

    // 4. Count the number of rows
    val num_row = rdd.count()
    println("\n\n Number of rows in system.schema_keyspaces: " + num_row + "\n\n")

    // 5. Stop the spark context
    sc.stop
  }
}
For Spark to connect to Cassandra using SSL, an appropriate SSL context must be created on the Spark driver and on all the executors. This is done by passing SSL-specific properties to the Spark Cassandra Connector. With the default connection factory, the truststore path must be a valid file path on the driver and on every executor, which can be restrictive. An alternative is to create a custom connection factory. Next, we create a custom Cassandra connection factory class that treats the truststore path property as a resource path rather than a file path, so the truststore can be read from a resource inside the assembled jar. Create a file called CustomCassandraConnectionFactory.java in the cassandra-count/src/main/java directory with the following contents:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.JdkSSLOptions;
import com.datastax.driver.core.SSLOptions;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.ExponentialReconnectionPolicy;
import com.datastax.spark.connector.cql.CassandraConnectionFactory;
import com.datastax.spark.connector.cql.CassandraConnectorConf;
import com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy;
import com.datastax.spark.connector.cql.MultipleRetryPolicy;
import scala.collection.immutable.HashSet;
import scala.collection.immutable.Set;
import scala.reflect.ClassTag;

import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;
import java.io.IOException;
import java.io.InputStream;
import java.net.Inet4Address;
import java.security.*;
import java.security.cert.CertificateException;
import java.util.ArrayList;
import java.util.List;

public class CustomCassandraConnectionFactory implements CassandraConnectionFactory {

    @Override
    public Cluster createCluster(CassandraConnectorConf conf) {
        try {
            return clusterBuilder(conf).build();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Set<String> properties() {
        try {
            return new HashSet<String>();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private Cluster.Builder clusterBuilder(CassandraConnectorConf conf)
            throws CertificateException, NoSuchAlgorithmException, KeyStoreException,
                   KeyManagementException, IOException {
        SocketOptions socketOptions = new SocketOptions();
        socketOptions.setConnectTimeoutMillis(conf.connectTimeoutMillis());
        socketOptions.setReadTimeoutMillis(conf.readTimeoutMillis());

        // Convert the connector's Scala set of hosts into Java contact points
        List<Inet4Address> hosts = new ArrayList<Inet4Address>();
        scala.collection.Iterator iter = conf.hosts().toIterator();
        while (iter.hasNext()) {
            Inet4Address a = (Inet4Address) iter.next();
            hosts.add(a);
        }

        Cluster.Builder builder = Cluster.builder()
                .addContactPoints(hosts.toArray(new Inet4Address[0]))
                .withPort(conf.port())
                .withRetryPolicy(
                        new MultipleRetryPolicy(conf.queryRetryCount(), conf.queryRetryDelay()))
                .withReconnectionPolicy(
                        new ExponentialReconnectionPolicy(conf.minReconnectionDelayMillis(), conf.maxReconnectionDelayMillis()))
                .withLoadBalancingPolicy(
                        new LocalNodeFirstLoadBalancingPolicy(conf.hosts(), conf.localDC(), true))
                .withAuthProvider(conf.authConf().authProvider())
                .withSocketOptions(socketOptions)
                .withCompression(conf.compression());

        if (conf.cassandraSSLConf().enabled()) {
            SSLOptions options = createSSLOPtions(conf.cassandraSSLConf());
            if (null != options) {
                builder = builder.withSSL(options);
            } else {
                builder = builder.withSSL();
            }
        }
        return builder;
    }

    SSLOptions createSSLOPtions(CassandraConnectorConf.CassandraSSLConf conf)
            throws KeyStoreException, IOException, CertificateException,
                   NoSuchAlgorithmException, KeyManagementException {
        if (conf.trustStorePath().isEmpty()) {
            return null;
        }
        // Read the truststore as a classpath resource (e.g. bundled inside the assembled jar)
        try (InputStream trustStore = this.getClass().getClassLoader()
                .getResourceAsStream(conf.trustStorePath().get())) {
            KeyStore keyStore = KeyStore.getInstance(conf.trustStoreType());
            keyStore.load(trustStore, conf.trustStorePassword().isDefined()
                    ? conf.trustStorePassword().get().toCharArray() : null);

            TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(keyStore);

            SSLContext context = SSLContext.getInstance(conf.protocol());
            context.init(null, tmf.getTrustManagers(), new SecureRandom());

            ClassTag<String> tag = scala.reflect.ClassTag$.MODULE$.apply(String.class);
            return JdkSSLOptions.builder()
                    .withSSLContext(context)
                    .withCipherSuites((String[]) conf.enabledAlgorithms().toArray(tag))
                    .build();
        }
    }
}
Copy the trust store file downloaded in the earlier step to the cassandra-count/src/main/resources directory.
The following additional properties are needed to set up the SSL connection to Cassandra:
Property Name | Description
spark.cassandra.connection.ssl.enabled | Boolean switch indicating whether the connection to Cassandra should use SSL
spark.cassandra.connection.ssl.trustStore.password | The password for the truststore
spark.cassandra.connection.ssl.trustStore.path | The path to the truststore file. With the custom factory in this example, this is a resource path instead.
spark.cassandra.connection.factory | Overrides the default connection behaviour of the Spark Cassandra Connector. When used, it should be the name of a class that implements CassandraConnectionFactory. Details of this class can be found on the DataStax Spark Cassandra Connector page on GitHub.
Create a file called cassandra-count.conf in the cassandra-count directory (this file contains the configuration that will be used when we submit the job):
spark.master spark://<spark_master_private_IP1>:7077,<spark_master_private_IP2>:7077,<spark_master_private_IP3>:7077
spark.executor.memory 1g
spark.cassandra.connection.host <private ip of cassandra>
spark.cassandra.auth.username iccassandra
spark.cassandra.auth.password <iccassandra password>
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled true
spark.eventLog.dir .
spark.cassandra.connection.ssl.enabled true
spark.cassandra.connection.ssl.trustStore.password <trust store password>
spark.cassandra.connection.ssl.trustStore.path truststore.jks
spark.cassandra.connection.factory CustomCassandraConnectionFactory
Build the job (from cassandra-count directory):
sbt assembly
Submit the job (from cassandra-count directory):
~/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --properties-file cassandra-count.conf --class cassandraCount target/scala-2.10/cassandra-count-assembly-1.0.jar
You should see a lot of log messages with the row count message about 15 messages from the end.
Using Spark Shell
Connecting to Cassandra via SSL when using Spark Shell works the same way as with spark-submit. The jar containing the custom connection factory and the truststore resource must be added to the list of jar files, and the same configuration properties used to set up the SSL connection must be specified. Below is an example Spark Shell command:
cd ~/spark-1.6.0-bin-hadoop2.6/bin
./spark-shell --master spark://<spark_master_private_IP1>:7077,<spark_master_private_IP2>:7077,<spark_master_private_IP3>:7077 \
  --conf spark.cassandra.connection.host=<private ip of cassandra> \
  --conf spark.cassandra.auth.username=iccassandra \
  --conf spark.cassandra.auth.password=<iccassandra password> \
  --jars ~/spark-cassandra-connector-assembly-1.6.0-M1.jar,$HOME/examples/cassandra-count/target/scala-2.10/cassandra-count-assembly-1.0.jar \
  --conf spark.cassandra.connection.ssl.enabled=true \
  --conf spark.cassandra.connection.ssl.trustStore.password=instaclustr \
  --conf spark.cassandra.connection.ssl.trustStore.path=truststore.jks \
  --conf spark.cassandra.connection.factory=CustomCassandraConnectionFactory \
  --files truststore.jks
Further Resources
You can find the source code used in this guide at this page.
This article provides a step-by-step example of using Apache Spark MLlib to do linear regression, illustrating some more advanced concepts of using Spark and Cassandra together. The programming environment for this example is Zeppelin and the programming language is Scala. We assume that Zeppelin and the cluster have been set up and provisioned as shown in our previous tutorials: Getting started with Instaclustr Spark and Cassandra and Zeppelin with Instaclustr Spark & Cassandra Tutorial.
1. Prepare Data
(1) Determine features and target
The first thing we need to do is to prepare the data we want to use. But before we can do that, we must determine what data points to use as modeling features and what to use as our target. Our sample data set for this case is a monitoring data set with over 2000 metrics. Using all of these as features is not practical. In this example, we only use three metrics:
service 1: CPU_Percentage (feature 1)
service 2: /var/lib/instaclustr disk-free-percent (feature 2)
service 3: /cassandra/metrics/type=Keyspace/keyspace=instaclustr/name=WriteLatency/max (target)
(2) Transform data
In MLlib, all the features must be put into a special data structure called Vector; for supervised learning such as linear regression, each feature vector is paired with its target value (label) in a LabeledPoint. The structure we need to construct is <feature 1: [CPU Percentage], feature 2: [Disk Free], target/label: [Write Latency]>, with one entry for every host/time combination in the data set. The source Cassandra table, instametrics.events_raw, has the schema <host, service, time, metric> shown in Table 1. The two structures are not compatible, so we need to transform the data we get from the Cassandra table to fit the vector structure.
Table 1: Cassandra table instametrics.events_raw
host | service | time | metric
host1 | service1 | time1 | [value]
host1 | service1 | time2 | [value]
host1 | service1 | time3 | [value]
...... | ...... | ...... | ......
host2 | service2 | time1 | [value]
host2 | service2 | time2 | [value]
host2 | service2 | time3 | [value]
...... | ...... | ...... | ......
host3 | service3 | time1 | [value]
host3 | service3 | time2 | [value]
host3 | service3 | time3 | [value]
...... | ...... | ...... | ......
We need to combine the entries whose service is service 1, service 2 or service 3 and that share the same host and time values to construct the feature vector <service 1, service 2, service 3>. The easy way to do this is a join in Spark. However, a Spark join can be extremely expensive, especially for large datasets, because it requires shuffling data among cluster nodes. So we need a more efficient way to do the join.
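To make the required reshaping concrete, here is a minimal plain-Scala sketch that groups rows by (host, time) and keeps only the pairs that have a value for all three services. In Spark, grouping like this is exactly the kind of operation that triggers a shuffle. `RawEvent`, `FeatureRow`, `Pivot` and the service-name parameters are hypothetical names used for illustration.

```scala
// Hypothetical in-memory versions of an events_raw row and a combined feature row.
final case class RawEvent(host: String, service: String, time: Long, metric: Double)
final case class FeatureRow(host: String, time: Long, cpu: Double, diskFree: Double, writeLatencyMax: Double)

object Pivot {
  def toFeatureRows(events: Seq[RawEvent], cpuService: String, diskService: String,
                    latencyService: String): Seq[FeatureRow] =
    events
      .groupBy(e => (e.host, e.time)) // one group per (host, time) pair
      .toSeq
      .flatMap { case ((host, time), group) =>
        val byService = group.map(e => e.service -> e.metric).toMap
        // Keep only (host, time) pairs that have all three services.
        for {
          cpu <- byService.get(cpuService)
          disk <- byService.get(diskService)
          lat <- byService.get(latencyService)
        } yield FeatureRow(host, time, cpu, disk, lat)
      }
      .sortBy(r => (r.host, r.time))
}
```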
As the required feature vector is a combination of service values, our first step in transforming the data was to save the data for each service into a separate Cassandra table. This is good for repeated use: when you want to change the structure of the feature vectors, you can pick the desired service tables and combine them.
Before writing any code, we use cqlsh to create the corresponding schemas with (host, time) as the partition key, because we need to join these service tables, and joining on the partition key is much faster than a general join. We can use the following code to create the schemas in cqlsh:
CREATE KEYSPACE features WITH REPLICATION={'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE features;
CREATE TABLE cpuPercent(host text, time timestamp, metric double, primary key ((host,time)));
CREATE TABLE wLatencyMax(host text, time timestamp, metric double, primary key ((host,time)));
CREATE TABLE diskFree(host text, time timestamp, metric double, primary key ((host,time)));
After that, we can use the following code to save data to Cassandra tables. It is worth mentioning that the partition key of Cassandra table events_raw is (host, service) and we have anther table called host only contains a unique list of Host IDs. In the following, we iterate hosts and use where condition to fetch needed data from table events_raw by partition key, which is more efficient than fetching the whole table and then using a filter operation in Spark.
%dep
// Run this %dep paragraph on its own, before the Spark interpreter is started.
z.load("<full path to the spark-cassandra-connector-assembly-1.6.0-M1.jar>")
// Spark (Scala) paragraph
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.joda.time.DateTime
import com.datastax.spark.connector._
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionModel
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}
// Unique host IDs from the instametrics.host table.
val hosts = sc.cassandraTable("instametrics", "host").select("host").as((h: String) => h).collect
// CQL stores the unquoted table names cpuPercent, wLatencyMax and diskFree in lower case.
for (host <- hosts) {
  // Each where clause restricts the full partition key (host, service) of events_raw.
  sc.cassandraTable("instametrics", "events_raw").select("host", "time", "metric")
    .where("host = ? AND service = ?", host, "<service 1>")
    .saveToCassandra("features", "cpupercent", SomeColumns("host", "time", "metric"))
  sc.cassandraTable("instametrics", "events_raw").select("host", "time", "metric")
    .where("host = ? AND service = ?", host, "<service 2>")
    .saveToCassandra("features", "wlatencymax", SomeColumns("host", "time", "metric"))
  sc.cassandraTable("instametrics", "events_raw").select("host", "time", "metric")
    .where("host = ? AND service = ?", host, "<service 3>")
    .saveToCassandra("features", "diskfree", SomeColumns("host", "time", "metric"))
}
We then have three feature tables with the following structure:
Table 2: service 1       Table 3: service 2       Table 4: service 3
host1  time1  [value]    host1  time1  [value]    host1  time1  [value]
host1  time2  [value]    host1  time2  [value]    host1  time2  [value]
host1  time3  [value]    host1  time3  [value]    host1  time3  [value]
......                   ......                   ......
host2  time1  [value]    host2  time1  [value]    host2  time1  [value]
host2  time2  [value]    host2  time2  [value]    host2  time2  [value]
host2  time3  [value]    host2  time3  [value]    host2  time3  [value]
......                   ......                   ......
Finally, we use a method provided by the spark-cassandra-connector called joinWithCassandraTable, which joins an RDD with a Cassandra table by looking up the matching partition keys directly in Cassandra, so no shuffle is required.
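// Join the feature tables on their shared partition key (host, time); table names are lower case
// because CQL stores the unquoted names cpuPercent, diskFree and wLatencyMax that way.
// Label: the wLatencyMax value; features: the cpuPercent and diskFree values.
val data = sc.cassandraTable[(String, DateTime, Double)]("features", "cpupercent")
  .joinWithCassandraTable[(String, DateTime, Double)]("features", "diskfree")
  .map { case ((h1, t1, s1), (h2, t2, s2)) => (h1, t1, s1, s2) }
  .joinWithCassandraTable[(String, DateTime, Double)]("features", "wlatencymax")
  .map { case ((h1, t1, s1, s2), (h3, t3, s3)) => LabeledPoint(s3, Vectors.dense(s1, s2)) }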
The end result is an RDD of LabeledPoint values with the vector structure we need as input to the Spark MLlib regression process:
Table 5: Labeled Feature Vector
Service 1 (feature 1)    Service 2 (feature 2)    Service 3 (target/label)
h1s1t1                   h1s2t1                   h1s3t1
h1s1t2                   h1s2t2                   h1s3t2
h1s1t3                   h1s2t3                   h1s3t3
......                   ......                   ......
h2s1t1                   h2s2t1                   h2s3t1
h2s1t2                   h2s2t2                   h2s3t2
h2s1t3                   h2s2t3                   h2s3t3
......                   ......                   ......
2. Train and Test Model
With the data prepared, we can start feeding it to MLlib. Linear regression is a supervised machine learning algorithm with a training phase and a prediction phase, so we need two datasets, one for each. The following code splits the dataset we got from Part 1 into two subsets: training and test.
val splits=data.randomSplit(Array (0.8,0.2));
val training=splits(0).cache;
val test=splits(1).cache;
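Note that randomSplit does not cut the data at an exact 80/20 boundary: each element is assigned to one subset at random, with probability proportional to the weights, so the subset sizes are only approximately 80% and 20%. The minimal Python sketch below illustrates that idea with a fixed seed so the split is reproducible; the function name is hypothetical and it is an illustration, not Spark's implementation. In Spark, a seed can likewise be passed as the second argument, for example data.randomSplit(Array(0.8, 0.2), 11L).

```python
import random


def random_split(items, weights=(0.8, 0.2), seed=42):
    """Randomly assign each item to one subset, with probability proportional to weights."""
    total = float(sum(weights))
    # Cumulative upper bounds, e.g. [0.8, 1.0] for an 80/20 split.
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = [[] for _ in weights]
    for item in items:
        r = rng.random()
        for i, bound in enumerate(bounds):
            if r < bound:
                splits[i].append(item)
                break
        else:
            splits[-1].append(item)  # guard against floating-point rounding of the last bound
    return splits


if __name__ == "__main__":
    training, test = random_split(range(1000))
    print(len(training), len(test))  # roughly 800 and 200
```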
Finally, we can define the linear regression algorithm, then train and test the model. MLlib provides settings to adjust the algorithm to your needs, but we will use the defaults for the purposes of this example.
val algorithm = new LinearRegressionWithSGD();
val model = algorithm.run(training);
val prediction = model.predict(test.map(_.features));
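LinearRegressionWithSGD fits the weights by stochastic gradient descent on the squared error. To show what the training step does, here is a deliberately simplified Python sketch that swaps in plain full-batch gradient descent with a fixed step size, instead of MLlib's mini-batch sampling and decaying step; the function and parameter names are hypothetical. Like the MLlib default, it fits no intercept term.

```python
def train_linear_gd(points, num_iterations=100, step_size=0.5):
    """Fit weights w minimising mean((w . x - y)^2) / 2 by full-batch gradient descent.

    points: list of (label, features) pairs, where features is a sequence of floats.
    No intercept is fitted, so add a constant 1.0 feature if one is needed.
    """
    n = len(points)
    num_features = len(points[0][1])
    w = [0.0] * num_features
    for _ in range(num_iterations):
        grad = [0.0] * num_features
        for label, x in points:
            error = predict(w, x) - label
            for j in range(num_features):
                grad[j] += error * x[j]
        # Step against the average gradient.
        w = [wj - step_size * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Linear prediction w . x."""
    return sum(wj * xj for wj, xj in zip(w, x))


if __name__ == "__main__":
    # Labels generated exactly by y = 2*x1 + 3*x2.
    data = [(2 * a + 3 * b, (a, b)) for a, b in [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]]
    weights = train_linear_gd(data, num_iterations=200)
    print(weights)  # converges close to [2.0, 3.0]
```

A step size that is too large makes this iteration diverge, which is one reason MLlib exposes the step size and the number of iterations as settings.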
3. Analysis Results
After we get the predictions for the test dataset, we can evaluate the model we built. In this example we use the root mean squared error (RMSE) to quantify the accuracy of the model.
val predictionAndTarget = prediction.zip(test.map(_.label));
val RMSE = math.sqrt(predictionAndTarget.map{case(p,t)=>math.pow((p-t),2)}.mean());
println("RMSE: " + RMSE);
Normally, the smaller the RMSE is, the more accurate the model is. But we also need to be very careful to avoid overfitting.
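The RMSE line above computes sqrt((1/n) Σ (p_i - t_i)^2) over the n test points. The same calculation as a small, self-contained Python function (the name rmse is our own) makes the formula explicit and is handy for sanity-checking results on a handful of values:

```python
import math


def rmse(predictions, targets):
    """Root mean squared error: sqrt(mean((p - t)^2)) over paired values."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    print(rmse([2.0, 4.0], [1.0, 5.0]))  # errors of 1 and -1 give an RMSE of 1.0
```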
4. Save and Load Model
The linear regression model consists of weights and an intercept. Using the following code, we can save the model we built to disk for future use; the model file can be found in the Zeppelin installation folder.
val fos = new FileOutputStream("myModelPath");
val oos = new ObjectOutputStream(fos);
oos.writeObject(model);
oos.close;
We can load the model from disk using the following code.
val fis = new FileInputStream("myModelPath");
val ois = new ObjectInputStream(fis);
val sameModel = ois.readObject().asInstanceOf[LinearRegressionModel];
ois.close();
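Java serialization can tie the saved file to the JVM classes that wrote it. Because a linear regression model is just a weight vector and an intercept, an alternative is to store those numbers in a neutral format. The Python sketch below shows the idea with JSON; the file layout and function names are hypothetical, and a JVM reader would need matching parsing code. (Spark MLlib also provides model.save(sc, path) and LinearRegressionModel.load(sc, path) for its own format.)

```python
import json


def model_to_json(weights, intercept):
    """Serialise a linear model (weights and intercept) to a JSON string."""
    return json.dumps({"weights": [float(w) for w in weights], "intercept": float(intercept)})


def model_from_json(text):
    """Parse a JSON string produced by model_to_json back into (weights, intercept)."""
    data = json.loads(text)
    return data["weights"], data["intercept"]


def save_model(path, weights, intercept):
    """Write the model to a text file at path."""
    with open(path, "w") as f:
        f.write(model_to_json(weights, intercept))


def load_model(path):
    """Read a model written by save_model."""
    with open(path) as f:
        return model_from_json(f.read())
```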
List
The Partition Size: Average column displays the estimated average size on disk of partitions read from the table.
The Partition Size: Max column displays the estimated size on disk of the largest partition read from the table.
Reading large partitions can have a detrimental impact on query performance and may indicate that data is not being evenly spread around the cluster. Contact [email protected] if you need further assistance in dealing with partition size issues.
Graph
Under the Table Info group, the Partition Size: Max and Partition Size: Average graphs display the estimated maximum and average size on disk of partitions read from each table.
List
The OS Load: Average column displays the average number of processes using or waiting for the CPU on your node over a period of time.
An average OS load larger than the number of cores indicates that the CPU was temporarily overloaded and reached the limits of its processing capacity. If you are experiencing consistently high OS load and not reaching the desired throughput on your Cassandra cluster, you may need to tune your data model or add nodes to your cluster to increase processing capacity. Contact [email protected] if you need further assistance in dealing with OS load issues.
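The rule of thumb above compares the load average with the number of CPU cores. On any Unix-like machine you can apply the same check locally; the Python sketch below uses the standard library's os.getloadavg() (Unix only) and os.cpu_count(). It illustrates the comparison only and is not how the Instaclustr console computes its metric. Keep in mind that on Linux the load average also counts processes blocked on disk I/O, so a high value does not always mean the CPU itself is saturated.

```python
import os


def is_cpu_overloaded(load_average, cores):
    """True when the average number of running/waiting processes exceeds the core count."""
    return load_average > cores


if __name__ == "__main__":
    if hasattr(os, "getloadavg"):  # not available on Windows
        one_min, five_min, fifteen_min = os.getloadavg()
        cores = os.cpu_count() or 1
        print(f"5-minute load {five_min:.2f} on {cores} cores, "
              f"overloaded: {is_cpu_overloaded(five_min, cores)}")
    else:
        print("os.getloadavg() is not available on this platform")
```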
Graph
Under the OS Load group, the OS Load: Average graph displays the average amount of work the computer system performs on each node.
List
The Synthetic Transaction Latency: Read (ms) column displays the average time in milliseconds of local read requests processed by a node in the cluster.
The Synthetic Transaction Latency: Write (ms) column displays the average time in milliseconds of local write requests processed by a node in the cluster.
Synthetic transaction read or write latency is the average time to complete a simple read or write operation against an Instaclustr-controlled table on each node in the cluster. Reads and writes occur at QUORUM consistency, and the latency is the total operation time measured at the synthetic transaction client on the node.
Synthetic transactions are designed to separate issues related to client applications, data models or data conditions from basic cluster health issues. Synthetic transactions will not be affected by client or data issues unless those issues become severe enough to impact the general health of the cluster.
Graph
Under the Synthetic Transaction Latency group, the Synthetic Transaction Latency: Read (ms) and Synthetic Transaction Latency: Write (ms) graphs display the average time in milliseconds of local read or write requests processed by each node.
1. To access monitoring tools, log in to the Instaclustr console.
2. Click Monitoring from the Manage Cluster menu of your cluster. This opens the cluster Summary page.
3. The cluster Summary page displays monitoring information, such as CPU usage and disk usage, for all the nodes in the cluster.
4. Click the drop-down menu on the Monitoring tab to access other monitoring pages, such as Metrics Lists, Metrics Graphs and Cluster Health. Detailed monitoring information about the cluster can be accessed from these pages.
The following roles may be assigned to users in an account. A user can only have one role at a time. Users with Owner level access can change the roles of other users.
Role Name        Description
Owner            An account owner has full permission to execute all actions in an account. Each account must have at least one owner.
Cluster Admin    Can create, delete and modify cluster settings; essentially all functions except adding, removing or changing users for an account and changing billing information.
Read Only        Can view all information in an account but not change any settings.
Billing          Can view all information and change billing settings.
Deny access      Has no access to the account (use this role to remove users).
This article shows the step-by-step process to sign up for an Instaclustr console account. Once you have signed up and verified your email address, you will be able to view detailed pricing information, provision clusters and access our support area.
1. Go to the Sign Up page.
2. Enter the required information and click the Sign Up button. Choose a secure password as it will be used to manage all aspects of your cluster account, including cluster deletion.
3. You should receive a confirmation email at the email address that you used for sign up.
Note: If you have not yet received a confirmation email, please wait a minute before checking again. If you still have not received a confirmation email, click the Resend Verification E-mail button.
4. Check your email inbox and open the Instaclustr welcome message, which contains a link to verify your email address and Instaclustr account. Click the Verify My Email Address button. Note, you cannot start a free trial cluster until you have verified your email address.
5. After clicking the Verify My Email Address button you will be redirected to a page confirming your email address verification. You will also be automatically logged into the console.
Note: While you can initiate the creation of your clusters immediately after signing up with us, your clusters will not be provisioned until you have verified your account. The 14-day free trial allows one small cluster to run at a time. To create additional clusters, you will need to submit your billing information.
To enter billing information, click the Home link.
Click the Account tab and then the Billing tab.
6. Once you have completed the Billing form, click the Save Payment Details button. You may notice a temporary $1.00 pre-authorisation on your account; this is necessary to verify the authenticity of your account details, but the charge will not be processed.
7. You have now completed the registration process and are ready to create a Cassandra cluster! Refer to our support articles on creating and connecting to a cluster.
Note: You may be subject to provisioning limits, but can be authorised for additional nodes by contacting us at:
When you first sign up with Instaclustr, our system will automatically create both a user and an account. Any clusters you create will be owned by the account and you can invite other users to access your account, either to view information such as monitoring or to be able to create and manage clusters under a single set of billing information. The full list of roles available for user access is documented here.
Account details, including adding users, can be managed from the Account tab of the Instaclustr console.
From this tab you can:
change the account name;
set support contact emails for your account (for example, if you have a dev-ops group email you would like us to use to contact you with support matters);
manage the billing information for the account (credit card, contacts);
view and generate API keys for the account;
view and add Encryption keys for use with Amazon's EBS at-rest encryption;
view users who currently have access to the account; and
change user access including inviting and removing users.
An individual user (email address) can have access to multiple accounts (for example, test and production accounts). If you have access to multiple accounts you will be prompted to select an account when you log in and can change your selected account by using the menu on the top right of the screen.
Two-factor authentication uses both your password (something you know) and a one-time password generated by an app on your smartphone (something you possess) to secure your Instaclustr account.
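The codes these apps produce are time-based one-time passwords (TOTP, RFC 6238): an HMAC of the current 30-second time step, computed with a shared secret that the app reads from the QR code. The minimal Python sketch below implements that scheme with the common defaults (HMAC-SHA1, 30-second step, 6 digits); those parameters are assumptions for illustration, not a statement of Instaclustr's configuration, and the function names are our own.

```python
import base64
import hashlib
import hmac
import struct
import time


def hotp(key, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    msg = struct.pack(">Q", counter)  # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key, for_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    if for_time is None:
        for_time = time.time()
    return hotp(key, int(for_time // step), digits)


def key_from_base32(secret):
    """Decode the base32 secret an authenticator app reads from the QR code."""
    cleaned = secret.replace(" ", "").upper()
    return base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))


if __name__ == "__main__":
    # Example secret only; a real secret comes from the QR code shown when enabling 2FA.
    print(totp(key_from_base32("JBSWY3DPEHPK3PXP")))
```

Because both sides derive the code from the clock, the phone's time must be roughly correct; verifiers commonly also accept the code from an adjacent time step to tolerate small clock drift.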
Enabling
Before you start, you will need an OTP (one-time password) app. We suggest one of the following:
Suggested Apps
iOS (App Store):
OTP Auth (Roland Moers)
Google Authenticator (Google, Inc.)
Tofu Authenticator (Calle Erlandsson)
Sophos Authenticator (Sophos GmbH)
Search for TOTP on the App Store for more apps. Compatible with iPhone & iPod Touch. Compatible with iPad. Offers Apple Watch app.
Android (Google Play Store):
Google Authenticator (Google, Inc.)
FreeOTP Authenticator (Red Hat, Inc.)
Authy (Authy)
Sophos Authenticator (Sophos GmbH)
Search for TOTP on Google Play for more apps. For Android devices not using Google Play services, search for TOTP on the device manufacturer's store.
Windows 10 Mobile (Windows Store):
Authenticator (Microsoft Corporation)
Search for TOTP on the Windows Store for more apps.
Steps to Enable:
1. Log in to the console and select the User tab:
2. Click the Enable button next to Two-factor Authentication:
3. Scan the QR code with your favourite TOTP app, enter the generated code and click the Verify & Enable button to confirm.
4. If 2FA is successfully enabled on your account a confirmation message will appear:
Logging in
Once 2FA is enabled on your account, you will need to enter both your password AND the code generated by your OTP app:
Sign up
Go to our AWS Marketplace product listing and subscribe. You will be redirected to our login page. If you don't have an Instaclustr login, follow the "Sign Up" link.
Billing
All cluster and add-on costs are displayed in "Standard Node" units. Consult the pricing on our AWS Marketplace product listing for conversion to USD. The charges for the usage of our services will be included in your monthly AWS bill.
Un-subscribe
You can unsubscribe via Amazon Marketplace at any time. When you do, all the clusters in the account will be deleted.
Limitations
You can only create clusters on AWS infrastructure. To use other supported providers, you can create a new account which is not linked to your AWS subscription (you will need to provide credit card details).
Our technical support is backed by extensive expertise in Apache Cassandra, ScyllaDB, Elasticsearch, Elassandra, Apache Spark, Apache Zeppelin, Kibana and Apache Lucene, and a deep knowledge of NoSQL technologies and big data solutions.
The Instaclustr support desk is a 24-hour, 365 days a year service that allows any Instaclustr customer with a cluster that is eligible for support to report issues.
Support Contacts
Access method    Description
Email            Lodge through [email protected]
Web              Lodge through the Instaclustr self-service portal at https://support.instaclustr.com
For more information about our support offerings, including our contracts and policy, please email us at [email protected]. You can also drop us a message via https://www.instaclustr.com/company/contact/
Support levels
Please see the SLA policy section of our website for details of our support levels and SLAs: https://www.instaclustr.com/support/policies/
Consulting
Instaclustr offers extensive expertise in Apache Cassandra and related technologies. Consultation is available to all customers interested in modelling large data sets or curious about best-practice usage.
If you're setting up your first Cassandra cluster, or you're a veteran of Big Data looking for fresh ideas, please feel free to contact us for advice or ideas on how to proceed.
DevCenter is a visual CQL (Cassandra Query Language) query tool provided by DataStax. This article describes how you can set up DevCenter to interact with clusters in Instaclustr. In this article, we assume that your cluster has been set up and provisioned properly, as shown in our previous tutorial Creating a Cluster.
1. Install DevCenter
Ensure Java has been installed on the local machine, because DevCenter is an Eclipse RCP-based application. The following command can be used to check whether Java is installed.
java -version
If Java is installed, the output will look something like: java version "1.8.0_40". If Java is not installed on your machine, please refer to https://www.java.com/en/download/help/index_installing.xml to install it first.
DevCenter can be downloaded from DataStax. If you are using a Mac, before you download the application, go to System Preferences -> Security & Privacy and select the Anywhere option under Allow apps downloaded from. After downloading, you can change back to your original setting.
Unzip the downloaded zip file and launch DevCenter by double-clicking DevCenter.exe or DevCenter.app. You will then see the following window.
2. Connect to a Cluster
Before trying to connect to a cluster, the IP address of your local machine must be added to the Cassandra Allowed Addresses in the Settings panel of the Instaclustr console.
A connection to a cluster can be set up by the following steps:
In DevCenter, choose File -> New -> Connection or click Create a new connection button in the Connections panel to create a new Connection. Then you will see the following window.
Set up the Connection name and input the public IP address of the nodes you want to connect to into Contact hosts. You can find the public IPs of the nodes in Instaclustr console at the location marked by the red frame in the following picture:
Click the Add button to add the IP address to connection settings. If authentication and encryption are not enabled in your cluster, you can click Finish button and the connection is ready. Otherwise, click Next button and then you will see the following window.
If your cluster has authentication enabled, select the This cluster requires credentials option and enter the username and password of your cluster into Login and Password respectively. This information can be found on the Connection Info page of the Instaclustr console, as shown in the following picture:
If your cluster has encryption enabled, you must install Java Cryptography Extension (JCE) on your local machine first. You need to download the version that matches your installed JVM and copy local_policy.jar and US_export_policy.jar to the java installation directory (Note: these two jars will be already there so you have to overwrite them):
Mac OS X: /Library/Java/JavaVirtualMachines/jdk1.major.minor_update/Contents/Home/jre/lib/security
Linux: /usr/lib/jvm/jdk1.major.minor_update/jre/lib/security
Windows: \Program Files\Java\jre7\lib\security
You can use the following command to copy the files.
For Mac and Linux:
cp <source file> <destination directory>
For Windows:
copy <source file> <destination directory>
Select This cluster requires SSL option in the above connection configuration panel and download the ca-certificates file from Instaclustr console at Connection Info shown in the following picture.
You can find the truststore file truststore.jks in the downloaded ca-certificates folder and truststore password (default KeyStore password is instaclustr) in the Read Me.txt file. Enter the full path to (or navigate to) the truststore file you downloaded and truststore password. Finally, you can click Finish button to complete the connection setup.
Your connection will be listed in the Connections panel of DevCenter. To start connecting to your cluster, select your connection in the Connections panel and click Open Selected Connection button OR right-click on your connection and click Open Connection.
3. Basic interaction with DevCenter
Once your connection has been set up successfully, you can view keyspaces and Cassandra tables of your cluster in the Schema panel of DevCenter.
After selecting the connection and keyspace you want to interact with, you can write CQL statements in the CQL editor. You need to click the green button on the top of the CQL editor panel to run your CQL. Here is a simple example:
Instaclustr's automated provisioning system, with some help from our technical operations team, makes provisioning a multi-datacenter cluster easy. However, there are still several steps to co-ordinate; this article provides a step-by-step guide. (This article uses the abbreviation dc for datacenter from here on.)
New Clusters
If you are creating a cluster and plan to use multiple dcs from the beginning, then the process is simple:
Create a single-dc cluster using the Instaclustr console. Refer to our support articles on how to sign up for an Instaclustr account and create a cluster.
Use the "Add Datacenter" button on the cluster details page of the console to configure and request an additional dc (for a step-by-step guide, see Expanding your cluster).
Instaclustr Support will receive the request, verify with you that your cluster is prepared and then allow the provisioning system to provision the new dc.
When creating your schema and application:
Ensure that you use NetworkTopologyStrategy as the replication strategy when creating keyspaces. Specify the number of replicas that Cassandra should maintain in each dc.
When connecting to Cassandra, consider the appropriate consistency level for your use case; the difference between QUORUM and LOCAL_QUORUM (or ONE and LOCAL_ONE) becomes significant once you have multiple dcs.
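The difference between QUORUM and LOCAL_QUORUM comes down to simple arithmetic: a quorum is floor(RF / 2) + 1 replicas, where for QUORUM the replication factor is summed over all dcs, and for LOCAL_QUORUM only the local dc's replication factor counts. A small Python sketch, assuming a hypothetical mapping of dc names to replica counts:

```python
def quorum_size(replica_count):
    """Number of replicas that must respond: floor(replica_count / 2) + 1."""
    return replica_count // 2 + 1


def consistency_requirements(replication, local_dc):
    """Return (QUORUM, LOCAL_QUORUM) replica counts for a NetworkTopologyStrategy keyspace.

    replication: mapping of dc name -> replication factor in that dc (hypothetical names).
    """
    total = sum(replication.values())
    return quorum_size(total), quorum_size(replication[local_dc])


if __name__ == "__main__":
    replication = {"dc1": 3, "dc2": 3}
    q, local_q = consistency_requirements(replication, "dc1")
    # QUORUM needs 4 of 6 replicas, so at least one response must come from dc2;
    # LOCAL_QUORUM needs only 2 of the 3 replicas in dc1.
    print(q, local_q)
```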
Existing Clusters
If you wish to add a dc to a cluster that is in use, there are a few more steps required to set up the new replica while minimising the impact on your existing cluster:
Ensure all keyspaces are configured with NetworkTopologyStrategy and replication for the existing dc.
Ensure your application is using a LOCAL_* consistency level when connecting to your existing dc (even if you plan to use cross-dc consistency at a later date, you probably don't want cross-dc queries to start until the new dc is fully set up).
Use the "Add Datacenter" button on the cluster details page of the console to configure and request an additional dc. (for a step by step guide see Expanding your cluster).
Instaclustr Support will receive the request, verify with you that your cluster is prepared and then allow the provisioning system to provision the new dc. Support will confirm that the system keyspaces are correctly synced to the new dc.
When advised by Instaclustr Support that the new dc is ready, alter the replication strategy for your keyspaces to specify the number of replicas in the new dc and advise Instaclustr Support when complete.
Instaclustr Support will run nodetool rebuild on each node in the new dc to stream data from the existing dc.
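When updating the replication settings, keep in mind that ALTER KEYSPACE replaces the whole replication map, so the existing dc must be listed alongside the new one or it would stop holding replicas. The following Python sketch builds such a statement from a mapping of dc names to replica counts; the keyspace and dc names are hypothetical, so use the dc names shown for your cluster.

```python
def alter_keyspace_replication(keyspace, replication):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    replication: mapping of dc name -> replica count, including the existing dc(s).
    """
    if not replication:
        raise ValueError("replication must list at least one dc")
    # Sort dc names so the generated statement is deterministic.
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(replication.items()))
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {options}}};")


if __name__ == "__main__":
    print(alter_keyspace_replication("my_keyspace", {"existing_dc": 3, "new_dc": 3}))
```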
Network Usage
Be aware that cross-region and cross-provider replication will count towards your network usage under Instaclustr's fair use policy and may incur additional charges when running in Instaclustr's cloud provider account. If you are planning to use multiple dcs in this configuration, please contact Instaclustr Support to assist with capacity planning.
Further Questions
As always, Instaclustr Support is available to provide additional information and guide you through this process.
Datacenters running on Amazon's EBS infrastructure can be encrypted with an AWS KMS key. This will encrypt both your EBS volumes and S3 backups. This involves a few steps to set up:
In your AWS account:
1. Go to IAM > Encryption Keys.
2. Create or view an AWS Encryption Key in the datacenter's intended region.
3. At this stage, you need to grant key access to a role you created earlier. Details on how to set up this role are in the Instaclustr AWS Setup Guide, in 'Configure IAM role for cross-account access'. By default, the role is called 'instaclustr'. In the 'Key users' section, under 'This account', add this role.
4. Add Instaclustr's account (624537489435) as an External Account.
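Adding an external account in step 4 adds statements to the key policy that let that account use the key. As an illustration only, the Python sketch below builds statements in the shape the AWS KMS console typically generates for an external account: one allowing cryptographic use of the key, and one allowing grants for AWS resources such as EBS volumes. Treat the exact action list as an assumption and compare it with the policy the console produces for your key.

```python
import json

# Instaclustr's AWS account ID, as given in step 4.
INSTACLUSTR_ACCOUNT_ID = "624537489435"


def external_account_statements(account_id):
    """Key policy statements granting another AWS account use of a KMS key (illustrative)."""
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return [
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": principal,
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey",
            ],
            "Resource": "*",
        },
        {
            "Sid": "Allow attachment of persistent resources",
            "Effect": "Allow",
            "Principal": principal,
            "Action": ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"],
            "Resource": "*",
            "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
        },
    ]


if __name__ == "__main__":
    print(json.dumps(external_account_statements(INSTACLUSTR_ACCOUNT_ID), indent=2))
```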
In your Instaclustr account:
1. Go to Account > Encryption Keys to add encryption keys.
You'll need the AWS key's ARN, found in the key's details after key creation.
The alias will identify this key in other parts of the Instaclustr console.
2. When you Create a cluster or Add a datacenter:
Select an EBS-based Node Size, and
Under EBS Encryption, select Encrypt data at rest and select a key from the dropdown. The keys listed will be those that have been previously added and are in the same region as the datacenter being requested.
3. Finish the Create a cluster or Add a datacenter process to provision the encrypted datacenter.
That's it! Encryption and decryption will be handled transparently by AWS Key Management Service, so use the datacenter as you would an unencrypted datacenter.
For more information regarding Amazon's encryption service see
Share Custom Encryption Keys More Securely Between Accounts by Using AWS Key Management Service
Amazon EBS Encryption
Enabling this feature on an existing cluster
Most clusters will require a DC migration to move to encrypted EBS.
Set up your AWS Encryption keys as per the process above, and email [email protected] to request adding this on your existing cluster.
Further Questions
We are available to provide additional information and guide you through this process. Please email or raise a new ticket.
Cqlsh is a utility for running simple CQL (Cassandra Query Language) commands on a local or remote Cassandra cluster. This article describes how cqlsh can be used to connect to clusters in Instaclustr. In this article, we assume that your cluster has been set up and provisioned properly as shown in our previous tutorial Creating a Cluster.
1. Prerequisites
Python 2.7 or later needs to be installed. If you don't have Python installed on your local machine, please refer to the appendices at the end of this article.
You also need the Cassandra binaries, which can be downloaded from http://archive.apache.org/dist/cassandra/. We recommend Cassandra 2.1 or later. You don't have to install Cassandra after downloading if you only want to use cqlsh.
The public IP address of your machine must be added to the Cassandra Allowed Addresses in the Settings tab of your cluster in the Instaclustr console (refer to this support article).
Username, password and certificate file can be found on the Connection Info page of your cluster. Certificate files are required to connect to your cluster with SSL.
Python for Windows can be downloaded from https://www.python.org/downloads/windows/.
2. Connecting to Instaclustr without SSL
If encryption is not enabled in your cluster, you can connect to it using cqlsh without SSL.
a) For Mac/Linux:
Open your terminal and use the following command to connect to your cluster. Note: if authentication is not enabled in your cluster, you don't need the -u and -p options.
For tarball installation:
Cassandra/bin/cqlsh public_ip_of_your_node 9042 -u your_username -p your_password
For package installation:
cqlsh public_ip_of_your_node 9042 -u your_username -p your_password
For binary/source download:
Cassandra/bin/cqlsh public_ip_of_your_node 9042 -u your_username -p your_password
b) For Windows:
Run cmd.exe as administrator and change to the directory (for example, your user home directory) that contains the downloaded Cassandra folder.
For tarball installation and binary/source download:
python Cassandra/bin/cqlsh public_ip_of_your_node 9042 -u your_username -p your_password
For package installation:
python cqlsh public_ip_of_your_node 9042 -u your_username -p your_password
3. Connecting to Instaclustr with SSL
If encryption is enabled in your cluster, SSL is needed to connect to it, and a cqlshrc file is used to configure SSL encryption.
a) For Mac/Linux
Open your terminal and create a .cassandra/cqlshrc file in your user home directory using the following commands.
cd
mkdir -p .cassandra
touch .cassandra/cqlshrc
Open the empty cqlshrc file using the following command.
vi .cassandra/cqlshrc
Copy the following content and paste it into the empty cqlshrc file. Then save the file.
[authentication]
username = your_username
password = your_password
[cql]
version = 3.2.1
[connection]
hostname = public_ip_of_your_node
port = 9042
[tracing]
max_trace_wait = 10.0
[ssl]
certfile = full_path_to_cluster-ca-certificate.pem
validate = true
factory = cqlshlib.ssl.ssl_transport_factory
Now you can start cqlsh with the --ssl option.
For tarball installation and binary download:
Cassandra/bin/cqlsh --ssl public_ip_of_your_node
For package installation:
cqlsh --ssl
For source download:
Cassandra/bin/cqlsh --ssl
b) For Windows
Open Notepad, create a new file and name it cqlshrc (select All Files as the file type so that Notepad does not append .txt). Copy the following content, paste it into the file and save the file in the .cassandra directory under your user home directory.
[authentication]
username = your_username
password = your_password
[cql]
version = 3.2.1
[connection]
hostname = public_ip_of_your_node
port = 9042
[tracing]
max_trace_wait = 10.0
[ssl]
certfile = full_path_to_cluster-ca-certificate.pem
validate = true
factory = cqlshlib.ssl.ssl_transport_factory
Now you can start cqlsh with the --ssl option.
For tarball installation or binary download:
python Cassandra/bin/cqlsh --ssl public_ip_of_your_node
For package installation:
python cqlsh --ssl
For source download:
python Cassandra/bin/cqlsh --ssl
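If you prefer to generate the cqlshrc file rather than edit it by hand, the Python sketch below writes the same sections and keys as the file shown above using the standard configparser module; cqlsh reads ~/.cassandra/cqlshrc by default. The function names and arguments are hypothetical. Interpolation is disabled so that a password containing % is written verbatim, and the file permissions are restricted because the file contains your password.

```python
import configparser
import os


def build_cqlshrc(username, password, hostname, certfile, port=9042):
    """Return a ConfigParser holding the cqlshrc settings used for SSL connections."""
    config = configparser.ConfigParser(interpolation=None)  # keep '%' in passwords literal
    config["authentication"] = {"username": username, "password": password}
    config["cql"] = {"version": "3.2.1"}
    config["connection"] = {"hostname": hostname, "port": str(port)}
    config["tracing"] = {"max_trace_wait": "10.0"}
    config["ssl"] = {
        "certfile": certfile,
        "validate": "true",
        "factory": "cqlshlib.ssl.ssl_transport_factory",
    }
    return config


def write_cqlshrc(config, path=os.path.join("~", ".cassandra", "cqlshrc")):
    """Write the config to ~/.cassandra/cqlshrc (or path), creating the directory if needed."""
    path = os.path.expanduser(path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        config.write(f)
    os.chmod(path, 0o600)  # owner read/write only; on Windows only the read-only flag is affected
    return path
```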
4. Troubleshooting
If you encounter a cql version error like: "cql_version '3.3.1' is not supported by remote (w/ native protocol). Supported versions: [u'3.2.1']", run cqlsh with an extra option --cqlversion=3.2.1. You can follow the example below:
For tarball installation and binary/source download:
Cassandra/bin/cqlsh --cqlversion=3.2.1
For package installation:
cqlsh --cqlversion=3.2.1
Use the same option for connecting to ssl enabled clusters:
cqlsh --ssl --cqlversion=3.2.1
5. Additional Resources
For additional information on CQL, refer to the following resources:
http://cassandra.apache.org/doc/latest/cql/index.html
Appendices
Appendix A: Install Python on Windows
Download the Python installer for Windows; we recommend version 2.7 or later.
Navigate to the download location on your computer, double-click the Python MSI file and press the "Run" button when the dialog box pops up. You will then see the following window.
If you only have one user account on your computer, you can select the Install for all users option. If you have multiple accounts on your computer and don't want to install Python for all of them, select the Install just for me option. Press the Next button; you will then see the following window.
If you want to change the install location, feel free to do so. However, it is best to leave it as it is. Then press the "Next" button and you will see the following window.
Scroll down in the window, find the Add Python.exe to Path, click on the small red x button and choose the Will be installed on local hard drive option. Press the Next button, then the installation starts. You will notice that the installation will bring up a command prompt window. Please wait and do nothing until you see the following window.
Click the "Finish" button to exit the installer.
If you chose Python 3.4.1 or a later version, you will not need to manually add Python to the system Path variable and can skip the following procedure.
Once you have successfully installed Python, it is time to add it to the system Path variable. Doing this will allow Python to run scripts on your computer without any conflicts or problems.
Open the Start menu, search for environment and select the option called Edit the system environment variables.
After the System Properties window pops up, select "Advanced" panel and click on Environment Variables.
After the "Environment Variables" window pops up, click the New button in the "System variables" group to create a new variable for Python as shown in the following picture. Then Press the "OK" button to save the changes.
Appendix B: Install Python on Mac
If you already have Homebrew installed on your computer, you can skip this step. Otherwise, use the following commands to install Homebrew.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
Once homebrew is installed, you can install Python using the following command.
brew install python
Appendix C: Install Python on Linux
Download Python and extract it using the following commands.
wget --no-check-certificate https://www.python.org/ftp/python/2.7.11/Python-2.7.11.tgz
tar -xzf Python-2.7.11.tgz
Build and install Python with the following commands.
cd Python-2.7.11
./configure
make
sudo make install
Now that you've created a cluster, it's time to connect your application to your new Cassandra cluster. In addition to setting your firewall rules, you can use the examples Instaclustr provides (customised to your cluster) for a selection of popular client libraries that Cassandra supports.
1. After your Cluster has finished provisioning, navigate to your Cluster Details page. All your nodes should be in a Running state with no errors listed. Some infrastructure providers allocate both public and private (data centre local) IP addresses to cluster nodes. If both are available, they will be listed with their respective nodes on this page.
Note: If your application is running within the same data centre as your cluster, we suggest configuring your client to connect to the private addresses. Traffic over private addresses may be priced differently from public traffic; refer to your provider's pricing documentation.
2. Instaclustr manages the firewall permissions for the nodes in your cluster. Each cluster node only allows connections from one or more trusted IP addresses. You can add one or more trusted IP addresses to the cluster firewall on the Cluster Settings page for the cluster.
Click Cluster Settings from the Manage Cluster menu.
3. Under the Firewall Rules section, add any additional IP addresses you wish to trust to the Cassandra Allowed Addresses list.
Note: If your cluster is running on AWS and using VPC peering and you would prefer that an AWS security group be allowed to connect rather than individual addresses, please raise a support request.
Click the Save Cluster Settings button when you are finished.
Note: this settings page contains two other options under the Cluster section. First, you can enter a description of your cluster. Second, the Two-factor Delete option provides additional protection against accidentally deleting a cluster: if it is enabled, a member of our Support team will confirm a delete request via a designated e-mail address or telephone number before your cluster is deleted.
4. Instaclustr provides connection information and examples on the Connection Info page which can be accessed by clicking Connection Details from the Manage Cluster menu.
5. The Connection Info page contains a list of your node addresses, authentication credentials to connect to your cluster and a few connection examples for popular clients Cassandra supports.
Currently we provide custom examples for:
CQLSH
Java
Python
Ruby
6. We recommend reviewing the following support articles as a next step:
Connecting to your cluster using CQLSH
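Steps 2 and 3 describe the cluster firewall as an allow-list: a node accepts a connection only when its source address matches a trusted entry. The sketch below models that check with Python's standard `ipaddress` module. It assumes entries may be either single addresses or CIDR ranges. `is_trusted` is a hypothetical helper for illustration only. It is not part of any Instaclustr API and does not reflect how the firewall is actually implemented.

```python
import ipaddress
from typing import Iterable


def is_trusted(client_ip: str, allowed: Iterable[str]) -> bool:
    """Return True if client_ip matches any allowed address or CIDR range.

    Hypothetical helper: models an allow-list check, not Instaclustr's firewall.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in allowed:
        # strict=False tolerates host bits in ranges such as "203.0.113.7/24";
        # a bare address like "198.51.100.23" is treated as a single-host /32.
        network = ipaddress.ip_network(entry, strict=False)
        # Membership is simply False when IP versions differ (IPv4 vs IPv6).
        if addr in network:
            return True
    return False


if __name__ == "__main__":
    trusted = ["203.0.113.0/24", "198.51.100.23"]
    print(is_trusted("203.0.113.42", trusted))   # True
    print(is_trusted("198.51.100.24", trusted))  # False
```

If a client cannot connect, checking its public address against the Cassandra Allowed Addresses list in this way is a quick first diagnostic. This is especially useful behind web proxies, which may present a different address.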
1. To get started, after setting up your user account, navigate to the Clusters Overview page and click the Create Cassandra Cluster button. This takes you to the Create Cassandra Cluster page.
2. On the Create Cassandra Cluster page, choose a cluster configuration matching your performance and pricing requirements. Instaclustr recommends that cluster nodes are allocated across all racks within a data centre, and that the allocation be evenly distributed. This ensures stability, fault-tolerance and consistent performance.
Enter an appropriate name and network address block for your cluster. Refer to our support article on Network Address Allocation to understand how we divide the specified network range to determine the node IP addresses.
Instaclustr will automatically add the IP address of your computer to the cluster firewall. Additional addresses may be added or removed at any time.
Note: Instaclustr detects the IP address of the computer used to access the Dashboard. Certain web proxies may interfere with this mechanism and Instaclustr will see their IP address instead. We suggest you verify the detected address.
3. Under the Applications section, select the appropriate application version and any add-ons you require.
4. Under the Data Centre section, select your Infrastructure Provider, Region, Custom Name (a logical name for the data centre within Cassandra), Node Size and EBS Encryption option.
5. Under the Cassandra Options section, select your Network and Security settings. The Summary section displays a brief summary of your cluster configuration and pricing details. Click the Terms and Conditions link to open the Instaclustr Terms and Conditions and other policy documents. After reading them, select the checkbox to accept the Terms and Conditions. Once you are happy with the cluster configuration and have accepted the terms and conditions, click the Create Cluster button to start creating the cluster.
6. Provisioning a cluster can take some time depending on the responsiveness of the underlying cloud provider. All status messages will be displayed along with the progress of the cluster creation on the Clusters page.
7. You have now finished creating your Cassandra cluster, congratulations!
All your clusters will be listed on the Clusters Overview page. You can view details of your cluster by clicking Cluster Details button under each cluster.
8. We recommend reviewing the following support articles as a next step:
Connecting to your cluster
Connecting to your cluster using CQLSH
Contact us at if you have any issues provisioning your cluster.
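Step 2 recommends allocating nodes across all racks in a data centre, with the allocation evenly distributed. The sketch below shows that arithmetic, assuming "even" means per-rack node counts that differ by at most one. `allocate_nodes` and the rack names are hypothetical and for illustration only; the console performs its own allocation.

```python
from typing import Dict, List


def allocate_nodes(node_count: int, racks: List[str]) -> Dict[str, int]:
    """Spread node_count nodes over racks so per-rack counts differ by at most one.

    Hypothetical helper for illustration; the console performs its own allocation.
    """
    if not racks:
        raise ValueError("at least one rack is required")
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    base, extra = divmod(node_count, len(racks))
    # The first `extra` racks each receive one additional node.
    return {rack: base + (1 if i < extra else 0) for i, rack in enumerate(racks)}


if __name__ == "__main__":
    racks = ["us-west-2a", "us-west-2b", "us-west-2c"]
    print(allocate_nodes(6, racks))  # {'us-west-2a': 2, 'us-west-2b': 2, 'us-west-2c': 2}
    print(allocate_nodes(4, racks))  # {'us-west-2a': 2, 'us-west-2b': 1, 'us-west-2c': 1}
```

Choosing a node count that is a multiple of the number of racks (for example, 6 nodes over 3 racks) gives every rack the same number of nodes. Any other count leaves some racks with one more node than the rest.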
There's a lot to explore with Instaclustr's offerings. Here's a collection of tutorials and other information to help you get started:
Creating a cluster
Connecting to Instaclustr Managed Cassandra with CQLSH
Getting started with Instaclustr Managed Spark and Cassandra
Using Apache Zeppelin with Instaclustr Spark and Cassandra Tutorial
This article includes a log of recent significant changes to the Instaclustr Managed Service.
October 2017
Date
Item
30 Oct
- Released support for Cassandra 3.11.1, 3.0.15, 2.2.11 and 2.1.19
25 Oct
- Bug fix: Monitoring API metrics returned a value of 0.0 instead of NaN
12 Oct
- New Feature: Private Network Cluster (Beta)
10 Oct
- Minor console UI changes
September 2017
Date
Item
26 Sep
- Changes to allow Data Centre networks to be from any valid private address space (removes requirement for DC networks to be in same range)
- Deprecate region.defaultNetwork from Provisioning API
22 Sep
- New Feature: AWS Marketplace integration
21 Sep
- Fixed failed deletion of user firewall rules on GCP clusters
11 Sep
- Minor fix to Free Trials
- Improved node security
4 Sep
- Minor bug fixes to backups
August 2017
Date
Item
31 Aug
Resizeable nodes improvements
- Updates to default node provisioning limits
- M4-l resizeable nodes removed
- Resizeable nodes added to EU_WEST_2
Console Improvements
- Fixed autocompleting 2FA/TOTP codes on login page.
30 Aug
- Updated branding on selected customer communications.
28 Aug
- Enabled Broadcast Private IP for Azure RIYOA (Run In Your Own Account) clusters
28 Aug
- Removed redundant Free Trial information when clusters are provisioned with RIYOA
25 Aug
- Fixed storage permission bugs in the backup process
24 Aug
- Added options to choose custom subnet when creating or adding a data centre
22 Aug
Monitoring API updates
- Changed the maximum number of metrics in a single Monitoring API call to 20
- Restored missing metrics incorrectly removed in previous deployment
21 Aug
Console and Monitoring API improvements
- Fixed a bug preventing display of metrics for materialised views
- Fixed a bug preventing display of certain latency metrics in Cassandra 3.11
17 Aug
Monitoring improvements
- Added monitoring for Kibana
15 Aug
- Minor improvements to restore process
2 Aug
Cassandra version update
- Added support for Cassandra 3.0.14
July 2017
Date
Item
31 July
Console Improvements
- Handle lack of certificates on the console
- Handle bad Zeppelin link
- Properly handle bundle versions selection
27 July
Certificate Management
- Fix in certificate renewal process
24 July
Scylla 1.7.2
- Added support as a preview release
20 July
Certificate Management
- Fix in certificate revocation process
19 July
Metric Collection
- Added configurable options for SLA metric collections
14 July
API Request Rate Limiting
- Implemented a rate limit of 70 requests per second per user
13 July
Cassandra version update
- Added support for Cassandra 3.11, Cassandra 2.1.18 and Cassandra 2.2.10
11 July
Console Graph Improvements
- Customers can now multi-select nodes on the Cluster Summary and Metric Graphs pages
- Tooltip values are sorted, showing you the highest and lowest 5 nodes for a metric at any point in time
- Nodes in large clusters are consistently coloured until you refresh the page
6 July
Spark Update
- Released Spark 2.1.1 with Scala 2.11
- Updated Spark-jobserver 2.0.0 with Spark 2.1.1
- Released a new image for Zeppelin 0.7.1 with Scala 2.11 and Spark 2.1.1
4 July
Added AWS Availability Zone Support
- Customers can now provision clusters across three racks (availability zones) in the Frankfurt (eu-central-1) and São Paulo (sa-east-1) regions.
June 2017
Date
Item
19 June
Automated Repair improvements
- Improved handling of long running repairs
- Improved handling of potential repair failure cases
- Fixed an issue causing too many concurrent repairs
15 June
Elassandra/Kibana - Beta Release
- Elassandra and Kibana are now available as options for automatic provisioning through the Instaclustr management console (currently offered as a beta release without SLAs).
May 2017
Date
Item
25 May
Cluster Data Centre Resize Support
- Adds the facility to vertically scale an entire data centre by adjusting the number of CPU cores and Memory quota allocated to each node.
19 May
EU (London) Region Support on AWS
- Customers can now provision clusters using the AWS EU_WEST_2 (London) region either in Instaclustr's AWS account or their own account.
8 May
Console Update
- Console/dashboard now indicates when Cassandra fails to report some metrics.
4 May
Console Update
- Added Spark Cassandra Connector support to console
- Fixed problem with CSS of Spark UI
3 May
Console Update
- Corrected reported CPU instance details for AWS T2.Mediums
1 May
Console Update
- Enforce restrictions on weak passwords
April 2017
Date
Item
24 Apr
Metrics collection
- Improvements to make metric collection more resilient to Cassandra exceptions
18 Apr
AWS EBS Optimisation
- Fixed a bug that caused some AWS instances not to be EBS-optimized.
11 Apr
Spark Update
- Released Spark 2.0.2 with Scala 2.11
- Updated Spark-jobserver 2.0.0 with Scala 2.11
Zeppelin Update
- Released Zeppelin 0.7.1 with Scala 2.11
Console Improvements
- Updated Spark-Cassandra connector (Spark 2.0.2 for Scala 2.11)
7 Apr
GCP Offering
- Google Cloud Platform (GCP) no longer in beta
6 Apr
Console Update
- Corrected role description for cluster_admin role
March 2017
Date
Item
30 Mar
Support new Cassandra version
- Added support for the new Instaclustr Cassandra LTS release 3.7 (v3)
14 Mar
Enable free trial on GCP
- n1-standard-1 is now available for a 14 day free trial with Google Cloud Platform.
February 2017
Date
Item
21 Feb
Support new Cassandra versions
- Added support for Cassandra 3.0.10 and Cassandra 3.10.
15 Feb
Support new instance type on AWS: R4 XLarge
- Added support for R4 XLarge node size
- Added 2 configurations: himem bulk (R4 with 2000 GB disk space) and himem balanced (R4 with 1200 GB disk space)
9 Feb
Zeppelin Update
- Released version 0.6.2 of Zeppelin
Azure Improvement
- Reduced memory footprint of backup process
2 Feb
Console Improvements
- Fixed DAG Visualization display on Spark console
January 2017
Date
Item
11 Jan
Console Improvements
- Fixed 30-day graph navigation
- Cassandra Lucene plugin pricing change
- Improved custom AWS tag application
9 Jan
New region support for AWS and Azure
- US East (Ohio) and Asia Pacific (Mumbai) for AWS
- Australia East (NSW) and Australia South East (Victoria) for Azure
4 Jan
Console Improvements
- Added CSRF token check to logout path
Monitoring API
- Added unit field to each metric
December 2016
Date
Item
28 Dec
Console Improvements
- Added reCaptcha challenge on password reset requests
22 Dec
Cassandra Lucene Index
- Added support for Cassandra Lucene Index Plugin as a preview release.
Azure Improvements
- Fixed some issues related to provisioning in existing resource groups
AWS RIYOA Tagging
- Added the ability to add custom tags to RIYOA instances
15 Dec
Console Improvements
- Fixed Spark UI issues when using Jobserver
13 Dec
Azure Improvements and Monitoring API enhancements
- Enabled provisioning in existing Azure resource groups
- Removed Azure classic provisioner
- Added the Cassandra data centre name of each node to the returned monitoring information.
November 2016
Date
Item
30 Nov
Support for Google Cloud Platform (GCP) - Beta
- Added support for GCP. Customers are now able to provision Cassandra & Spark clusters on GCP either in Instaclustr's or their own account. For more details, please see this blog post.
22 Nov
Run in your own account security improvements
- Changes to standard run in your own account cross-account trust configurations for improved security
21 Nov
Cassandra version update
- Apache Cassandra ver. 3.7 (patched v2) and Datastax Enterprise 5.0.3, 5.0.4 added to provisioning options
14 Nov
Console improvements
- Updated Spark-Cassandra connector (Spark 1.6.x for Scala 2.10)
- Changed date format from dd/MM/yyyy to MMM d, YYYY
3 Nov
Console improvements
- Removed availability of Datastax Enterprise as a distribution option. Please contact [email protected] for DSE Support
- Added a missing CSRF check to API key management form
2 Nov
Console/API and Provisioning infrastructure improvements
- Changes to Instaclustr management architecture for improved security and reliability
October 2016
Date
Item
26 Oct
Console Improvements
- Added previous/next period controls to graphs
25 Oct
Two-Factor Authentication
- 2FA authentication added as an optional feature for all Console users.
Console Improvements
- Fixed a graphing bug that displayed an error page immediately after a cluster has been provisioned.
18 Oct
Cassandra version update
- Apache Cassandra ver. 3.9, 3.7 (patched), 3.0.9, 2.2.8 and 2.1.16 added to provisioning options
Console Improvements
- SLA Latency metrics re-named to Synthetic Transactions
- Reduced pressure on browser memory
- Graph refresh is now much less annoying
6 Oct
Synthetic Transaction metrics
- Synthetic transaction read and write latencies are available from the dashboard and API.
September 2016
Date
Item
29 Sep
Console & Provisioning API Improvements
- General minor improvements to the UI.
- Provisioning API cluster info now exposes the network and whether private address broadcasting is used.
28 Sep
Zeppelin Update
- Released version 0.6.1 of Zeppelin with Spark Cassandra Connector 1.6.2
23 Sep
Console Improvements
- Rack allocation displayed in cluster creation, and data centre and node addition forms.
- Cluster details page now displays whether Private Node Discovery is enabled or disabled.
23 Sep
Added third rack to AWS Sydney region
- Clusters in AWS Asia Pacific (Sydney) now support 3 racks (availability zones).
22 Sep
OS Load now available through console and API monitoring
- OS Load metric is now available in the console monitoring view and through our monitoring API.
22 Sep
Add node process improvements
- Automated setting of some configurations when adding nodes and data centres to a cluster, to reduce the manual intervention required.
22 Sep
Changes to provisioning statuses
- Node now moves to "provisioned" as soon as the AWS instance makes contact with our system. "Joining" state added for nodes where Cassandra is running but in joining state. (Billing commences on joining or running state.)
21 Sep
C* 3.7 patch
- Updated our C* 3.7 offering with a patch to prevent segfaults with secondary indexes. See: https://issues.apache.org/jira/browse/CASSANDRA-12590. Customers will be informed as individual clusters are updated.
8 Sep
Encrypted EBS Support
- See: https://www.instaclustr.com/blog/2016/09/08/encrypted-ebs-support/
2 Sep
Cluster Health Page
- See: https://www.instaclustr.com/blog/2016/09/02/instaclustr-cluster-health/
1 Sep
Monitoring Graph Improvements
- Ability to disable auto refresh
- Graphs in a fixed order
- Sort keyspace and table names alphabetically
1 Sep
Make 3.7 the default C* version for new clusters and remove 2.2.x from available versions for new clusters
There are some specific conditions regarding Instaclustr's support for IBM SoftLayer as an infrastructure provider that potential customers on this platform should be aware of. These conditions arise because our SoftLayer offering is based on bare metal rather than virtual servers, so manual work is required to provision new servers in the SoftLayer environment (whereas AWS and Azure provisioning is fully automated).
The major conditions to be aware of are:
provisioning of a new cluster will take up to 3 working days from receipt of the request (generally less than 2);
SoftLayer clusters are not eligible for a free trial;
nodes included in, or added to, a SoftLayer cluster must be purchased for a minimum of three months (billed monthly in arrears).
During the beta period, SLA penalties will not apply to SoftLayer clusters. Exceptions may be negotiated via [email protected].
If you are interested in evaluating Instaclustr's offering on IBM SoftLayer, please contact [email protected], who will be happy to discuss options for undertaking evaluations given these conditions.