Github Enterprise Setup (AWS)

Tool: Github Enterprise

Target Environment / Platform: AWS

Deployment type: PaaS (Github provided machine image — AMI)

Use case: Provide developers with a Source Code Management (SCM) tool.

Synopsis:

We first looked at using github.com as our SCM so that we could rely on the SaaS-based GitHub offering. However, as of this writing, github.com does not provide Active Directory (SSO) integration, which meant users would have had to use local github.com user IDs. That is why we decided to go with GitHub Enterprise.

Deployment on AWS was a breeze: there is an existing AMI on AWS Marketplace that can be used to deploy and install GitHub Enterprise:

https://help.github.com/enterprise/2.9/admin/guides/installation/installing-github-enterprise-on-aws/

Architecture:

3 servers (ec2 servers) in total: 1 github master, 1 github server in standby mode, 1 backup server.

1 AWS RDS database (PostgreSQL) that is cross-replicated/Highly-available (HA).

NOTE: We deployed our GitHub instance to a ‘shared VPC’ which is in turn ‘peered’ to our DEV and PROD VPCs; this allows GitHub Enterprise to be shared between the two environments.

NOTE 2: We also have a scheduled AWS job (Systems Manager Maintenance Windows) that fires off the backup utility on the backup server every 12 hours to create a backup and upload it to an S3 bucket.

NOTE 3: Our S3 bucket is also cross-region replicated (and encrypted) to another S3 bucket, and there is a lifecycle policy to ‘expire/delete’ objects.
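The expiry rule from Note 3 could be expressed roughly as follows. This is only a sketch: the 30-day retention, the `backups/` prefix, and the rule ID are hypothetical values, not what we actually configured. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` call accepts.

```python
import json


def build_backup_expiry_rule(prefix, days):
    """Return an S3 lifecycle configuration that expires objects under `prefix`."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-old-ghe-backups",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Hypothetical values: keep backups under "backups/" for 30 days.
lifecycle = build_backup_expiry_rule("backups/", 30)
print(json.dumps(lifecycle, indent=2))

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="YOUR_BUCKET", LifecycleConfiguration=lifecycle)
```

Keeping the rule in a small function makes it easy to reuse the same policy for the replica bucket in the second region.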

NOTE 4: On our backup server (CentOS), I attached a separate ‘EBS’ volume to use as a backup volume — this can be encrypted additionally at EBS level.

NOTE 5: github backup utils can be found at: https://github.com/github/backup-utils

NOTE 6: For monitoring, we simply used CloudWatch alarms to watch disk space and connectivity.

NOTE 7: The SSL certificate was a pain for our DNS admin to configure. He mentioned that the GitHub certificate requirements were different, and he ended up asking GitHub Enterprise support to create the cert and pass it to us (if I remember correctly, the problem was that GitHub Enterprise SSL certificate installation requires a ‘PEM’ file).

OK, now for some admin notes (majority is copied/pasted from github guide):

My future plans for github Enterprise:

– Create terraform script to deploy github enterprise to EC2 server and github backup server.

– Create Ansible playbook to configure github enterprise servers (especially the backup server).

– Build out more monitoring with Splunk.

– Currently, we are letting developers define and utilize their own gitflow / workflows on git — monitor this and action on this strategy as needed.

admin changes

1. Git LFS – disabled

2. users can create organizations – disabled.

3. GitHub Pages – disabled — decision taken after discussing with clients.

messages

1. sign in message:

Welcome to XXXX’s very own internally hosted Github Enterprise!

Managed and Supported by: <xxxx@yyyy.com>

2. suspended user message:

Your account is suspended. Please try to login again – once you have successfully logged in using your XXXX credentials, you will be un-suspended automatically.

If you still encounter issues, please contact: <xxxx@yyyy.com> with subject (**Github: User suspended**) _and_ provide your username in the email.

setup console:

new setup page password: ********* https://xxxxxx:8443/setup

email SMTP server: mail.xxxx.yyyy.com | port xx

licensing:

You must have a GitHub Enterprise license file. To download an existing license file or request a trial license, visit enterprise.github.com.

Hardware

Based on your seat count, github recommends this hardware configuration:

Seats | vCPUs | Memory | Attached Storage | Root Storage

10-500 | 2 | 16 GB | 100 GB | 80 GB

500-3000 | 4 | 32 GB | 250 GB | 80 GB

Supported Instance Types

GitHub Enterprise is supported on the following EC2 instance types:

m3.xlarge m3.2xlarge m4.xlarge m4.2xlarge c3.2xlarge c3.4xlarge c3.8xlarge c4.2xlarge c4.4xlarge c4.8xlarge r3.large r3.xlarge r3.2xlarge r3.4xlarge r3.8xlarge

Recommended instance types

Based on your seat count, GitHub recommends these instance types:

Seat Range | Recommended Type

10 – 500 | r3.large

500 – 3000 | r3.xlarge

3000 – 5000 | r3.2xlarge

5000 – 8000 | r3.4xlarge

8000 – 10000+ | r3.8xlarge
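The table above is easy to turn into a lookup, for example to sanity-check a Terraform variable. This is a minimal sketch; the function name is my own, and because the published ranges overlap at their boundaries I am assuming that the upper bound of each range is inclusive (so exactly 500 seats maps to r3.large).

```python
# Upper seat bound and instance type for each row of the table above.
# Assumption: upper bounds are inclusive.
RECOMMENDED_TYPES = [
    (500, "r3.large"),
    (3000, "r3.xlarge"),
    (5000, "r3.2xlarge"),
    (8000, "r3.4xlarge"),
]
LARGEST_TYPE = "r3.8xlarge"  # 8000 - 10000+ seats


def recommend_instance_type(seats):
    """Return the recommended EC2 instance type for a seat count."""
    if seats < 1:
        raise ValueError("seats must be a positive number")
    for upper_bound, instance_type in RECOMMENDED_TYPES:
        if seats <= upper_bound:
            return instance_type
    return LARGEST_TYPE


print(recommend_instance_type(250))
print(recommend_instance_type(4200))
```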

Creating a Security Group

If you’re setting up your AMI for the first time, you will need to create a security group. From the EC2 Management Console, create an entry for each port in the table below:

Port | Service | Description

22 | SSH | Git over SSH access. Clone, fetch, and push operations to public/private repositories supported.

25 | SMTP | SMTP with encryption (STARTTLS) support.

80 | HTTP | Web application access. Note that all requests are redirected to the HTTPS port when SSL is enabled.

122 | SSH | Instance shell access. Note that the default SSH port (22) is dedicated to application git+ssh network traffic.

161/UDP | SNMP | Required for network monitoring protocol operation.

443 | HTTPS | Web application and Git over HTTPS access.

1194/UDP | VPN | Secure replication network tunnel in High Availability configuration.

8080 | HTTP | Plain-text web based Management Console. Not required unless SSL is disabled manually.

8443 | HTTPS | Secure web based Management Console. Required for basic installation and configuration.

9418 | Git | Simple Git protocol port. Clone and fetch operations to public repositories only. Unencrypted network communication.
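Instead of clicking through the console for each port, the same table can be kept as data and turned into security group rules. The sketch below only builds the rule list; the CIDR range is a hypothetical internal network, and the helper name is my own. The resulting structure matches the `IpPermissions` argument of boto3's `authorize_security_group_ingress`.

```python
# (port, protocol, service) rows taken from the port table above.
GHE_PORTS = [
    (22, "tcp", "SSH (Git over SSH)"),
    (25, "tcp", "SMTP"),
    (80, "tcp", "HTTP"),
    (122, "tcp", "SSH (instance shell)"),
    (161, "udp", "SNMP"),
    (443, "tcp", "HTTPS"),
    (1194, "udp", "VPN (HA replication)"),
    (8080, "tcp", "HTTP Management Console"),
    (8443, "tcp", "HTTPS Management Console"),
    (9418, "tcp", "Git protocol"),
]


def build_ip_permissions(cidr, ports=GHE_PORTS):
    """Build one ingress rule per port, all restricted to a single CIDR range."""
    return [
        {
            "IpProtocol": protocol,
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": service}],
        }
        for port, protocol, service in ports
    ]


# Hypothetical internal range; replace with your own network.
permissions = build_ip_permissions("10.0.0.0/8")
for rule in permissions:
    print(rule["IpProtocol"], rule["FromPort"])

# With boto3 installed, the rules could then be applied with:
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="YOUR_GROUP_ID", IpPermissions=permissions)
```

In practice you would probably drop rows you do not need (for example 8080 and 9418) before applying the rules.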

Enabling EBS encryption

An encrypted data volume provides an extra level of security by ensuring that any data you write to your instance is protected. There’s a slight performance impact when using encrypted disks. If you decide to encrypt your volume, we strongly recommend doing so before starting your instance for the first time. For more information, see the guide on Amazon EBS encryption.

If you decide to enable encryption after you’ve configured your instance, you will need to migrate your data to the encrypted volume, which will incur some downtime for your users.

Administrative shell access limitations

Administrative shell access is permitted for troubleshooting and performing documented operations procedures only. Modifying system and application files, running programs, or installing unsupported software packages may void your support contract. Please contact GitHub Enterprise Support if you have a question about the activities allowed by your support contract.

Maintenance mode

Some standard maintenance procedures, such as upgrading your GitHub Enterprise instance or restoring backups, require that the GitHub Enterprise instance be taken offline for normal use, or “put into maintenance mode”.

The following types of operations require a maintenance window:

Upgrading to a new version of GitHub Enterprise.

Increasing CPU, memory, or storage resources allocated to the virtual machine.

Migrating data from one virtual machine to another.

Restoring data from a GitHub Enterprise Backup Utilities snapshot.

Troubleshooting certain types of critical application issues.

 

While you can choose to put an instance into maintenance mode immediately, we recommend you schedule a maintenance window for 30 minutes, one, or two hours in the future in order to give users time to prepare.

Additionally, you can use the Management Console API to schedule maintenance for different times or dates.

Note: The https://<hostname>/status url will return status code 503 (Service Unavailable) when the appliance is in maintenance mode. To enable or schedule maintenance mode, visit the Management Console Maintenance page at https://<host>/setup/maintenance
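The 503 behaviour described in the note makes it simple to script a maintenance check, for example from a CI job that should wait before pushing. This is a rough sketch using only the standard library; the function names are my own and the hostname you pass in is whatever your instance uses.

```python
import urllib.error
import urllib.request


def interpret_status(status_code):
    """Map the HTTP status of the /status URL to a simple state."""
    if status_code == 503:
        return "maintenance"
    if 200 <= status_code < 300:
        return "available"
    return "unknown"


def check_instance(hostname, timeout=5):
    """Request https://HOSTNAME/status and report the instance state."""
    url = "https://%s/status" % hostname
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses, including the 503
        # returned in maintenance mode.
        return interpret_status(err.code)


# Example (requires network access to your instance):
#   print(check_instance("github.example.com"))
```

Connection failures (DNS errors, timeouts) are deliberately not caught here, so a caller can tell "in maintenance" apart from "unreachable".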

During the actual maintenance window, all normal HTTP and Git access is refused. Visiting the site in a browser displays a maintenance page.

Git fetch, clone, and push operations are also rejected with an error message indicating that the site is temporarily unavailable.

privacy / security:

Warning: If you add an image attachment to a pull request or issue comment, anyone can view the anonymized image URL without authentication, even if the pull request is in a private repository, or if private mode is enabled. To keep sensitive images private, serve them from a private network or server that requires authentication.

Note: New repositories created through the API will still be publicly visible by default because these default visibility settings only apply to new repositories created on the “Create a new repository” page.

What is GitHub Pages? GitHub Pages is a static site hosting service.

GitHub Pages is designed to host your personal, organization, or project pages directly from a GitHub repository. To learn more about the different types of GitHub Pages sites, see “User, organization, and project pages.”

You can create and publish GitHub Pages online using the Jekyll Theme Chooser. If you prefer to work locally, you can use GitHub Desktop or the command line.

GitHub Pages is a static site hosting service and doesn’t support server-side code such as PHP, Ruby, or Python.

 

no-reply email: noreply@xxxxx.com

support email: xxxx@yyyy.com 

disable ‘Enable SNMP’

built-in firewall rules:

https://help.github.com/enterprise/2.9/admin/guides/installation/configuring-built-in-firewall-rules/

Refer to this guide whenever you need to block things such as excessive SSH connections, git’s port, etc.

firewall used by the gitserver is Ubuntu’s UFW (uncomplicated firewall).

network ports:

https://help.github.com/enterprise/2.9/admin/guides/installation/network-ports-to-open/

Troubleshooting common scenarios

Scenario | Possible cause(s) | Recommendations

High CPU usage | VM contention from other services or programs running on the same host. | If possible, reconfigure other services or programs to use fewer CPU resources. To increase total CPU resources for the VM, see “Increasing CPU or Memory Resources”.

High memory usage | VM contention from other services or programs running on the same host. | If possible, reconfigure other services or programs to use less memory. To increase the total memory available on the VM, follow the steps for your platform in “Increasing CPU or Memory Resources”.

Low disk space availability | Large binaries or log files consuming disk space. | If possible, host large binaries on a separate server, and compress or archive log files. If necessary, increase disk space on the VM by following the steps for your platform in “Increasing storage capacity”.

Higher than usual response times | Often caused by one of the above issues. | Identify and fix the underlying issues. If response times remain high, contact GitHub Enterprise Support.

Elevated error rates | Software issues. | Contact GitHub Enterprise Support and include your Support Bundle.

Note: Because regularly polling your GitHub Enterprise instance with continuous integration (CI) or build servers can effectively cause a Denial of Service attack that results in one or more of the above problems, we recommend using webhooks to push updates. For more information, see “About webhooks”.

External monitoring and statistics collection

GitHub Enterprise includes support for monitoring basic system resources via two popular monitoring and statistics collection protocols:

SNMP – A widely supported method of monitoring network devices and servers. SNMP is disabled by default but can be enabled via the Management Console settings page at https://<hostname>/setup/settings. You will also need to make sure UDP port 161 is open and reachable from your network management station. See “Monitoring using SNMP” for more information.

collectd – An open source statistics collection and reporting daemon with built-in support for writing to RRD files. Statistics on CPU utilization, memory and disk consumption, network interface traffic and errors, and system load can be forwarded to an external collectd server where graphs, analysis, and alerting may be configured using a wide range of available tools and plugins. To enable and configure collectd forwarding, see “Configuring collectd”.

Both SNMP and collectd forwarding are suitable for use in monitoring basic system resource use. Additionally, the monitoring tools built into underlying virtualization platforms, Amazon CloudWatch and VMware vSphere Monitoring, may also be used for basic monitoring and alerting of system resources.

Recommended alerting thresholds

Storage

Monitoring of both the root and user storage devices should be configured with values that allow for plenty of time to respond when available disk space runs low. We recommend the following alerting thresholds as a starting point:

Severity | Threshold

Warning | Disk use exceeds 70% of total available.

Critical | Disk use exceeds 90% of total available.

It may be necessary to adjust these values based on the total amount of storage allocated, historical growth patterns, and expected time to respond. If possible, we recommend over-allocating storage resources to allow for growth over time and to prevent the need for maintenance/downtime required to allocate additional physical storage.
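The storage thresholds above translate into a very small check. The sketch below is something that could run on any host where the volume of interest is mounted (for example the backup server's EBS volume); the mount path in the usage comment is hypothetical, and "exceeds" is read as strictly greater than.

```python
import shutil

WARNING_PERCENT = 70   # starting-point thresholds from the text above
CRITICAL_PERCENT = 90


def disk_severity(used_bytes, total_bytes):
    """Classify disk use against the recommended alerting thresholds."""
    percent_used = 100.0 * used_bytes / total_bytes
    if percent_used > CRITICAL_PERCENT:
        return "critical"
    if percent_used > WARNING_PERCENT:
        return "warning"
    return "ok"


def check_path(path):
    """Measure the filesystem containing `path` and classify its usage."""
    usage = shutil.disk_usage(path)
    return disk_severity(usage.used, usage.total)


# Example with a hypothetical mount point:
#   print(check_path("/data/backups"))
```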

CPU and load average

Alerting on CPU utilization can be tricky due to normal fluctuations in CPU use created by resource intense Git operations. Temporary spikes are an expected pattern but prolonged heavy CPU utilization means that the instance is likely under-provisioned. At the very least, we recommend monitoring the 15 minute system load average for values nearing or exceeding the number of CPU cores allocated to the virtual machine.

Severity | Threshold

Warning | 15 minute load average exceeds 1x CPU cores.

Critical | 15 minute load average exceeds 2x CPU cores.

For example, a virtual machine with 4 vCPUs would reflect these thresholds:

Severity | Threshold

Warning | 15 minute load average exceeds 4

Critical | 15 minute load average exceeds 8
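The same rule can be written so that the thresholds scale with the core count instead of being hard-coded. This is a minimal sketch with names of my own choosing; `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_severity(load_15min, cpu_cores):
    """Classify the 15 minute load average relative to the number of cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    if load_15min > 2 * cpu_cores:
        return "critical"
    if load_15min > cpu_cores:
        return "warning"
    return "ok"


def current_load_severity():
    """Read the local 15 minute load average and classify it (Unix only)."""
    _, _, load_15min = os.getloadavg()
    return load_severity(load_15min, os.cpu_count() or 1)


# The 4 vCPU example from the table above:
print(load_severity(4.5, 4))
print(load_severity(8.5, 4))
```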

 

While the use of system load average as a single metric for CPU utilization is somewhat simplistic and can also indicate issues with the IO subsystem, sustained system load exceeding these values is a good indication that application performance is suffering from lack of compute resources and that increasing the number and/or speed of CPU cores will improve application responsiveness.

Ideally, user, system, and nice CPU utilization is available in graph form to get a clearer picture of where CPU is being consumed and so that historical patterns in utilization can be weighed in decision making about allocating additional resources.

It’s also important to monitor virtualization “steal” time to ensure that other virtual machines running on the same host system are not starving the instance of compute resources.

Memory

The amount of physical memory allocated to your GitHub Enterprise instance can have a large impact on overall performance and application responsiveness. The system is designed to make heavy use of kernel disk cache to speed many types of Git operations. As such, we recommend that the normal RSS working set fit within 50% of total available RAM at peak usage.

Severity | Threshold

Warning | Sustained RSS usage exceeds 50% of total available memory.

Critical | Sustained RSS usage exceeds 70% of total available memory.

 

It’s also important to note that your GitHub Enterprise instance does not make use of a swap partition for low memory conditions. If memory is exhausted, the kernel OOM killer will attempt to free memory resources by forcibly killing RAM heavy application processes, which could result in disruption of service. For these reasons, we recommend allocating significantly more memory to the virtual machine than is required in the normal course of operations.
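As a rough illustration of the memory thresholds, the sketch below parses `/proc/meminfo`-style text and estimates how much memory is in use outside the kernel cache. Treat this as an approximation only: "total minus free, buffers, and cache" is my own stand-in for the RSS working set, the sample numbers are made up, and a single reading says nothing about whether the usage is sustained.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_severity(meminfo):
    """Classify non-cache memory use against the 50% / 70% thresholds."""
    total = meminfo["MemTotal"]
    # Approximation: memory that is neither free nor used as disk cache.
    in_use = (total - meminfo["MemFree"]
              - meminfo.get("Buffers", 0) - meminfo.get("Cached", 0))
    percent_used = 100.0 * in_use / total
    if percent_used > 70:
        return "critical"
    if percent_used > 50:
        return "warning"
    return "ok"


# Made-up sample resembling a 16 GB instance.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
Buffers:          500000 kB
Cached:          4500000 kB
"""

print(memory_severity(parse_meminfo(SAMPLE)))
```

On a real Linux host the text would come from reading `/proc/meminfo`, sampled repeatedly so that only sustained usage triggers an alert.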

Github HA commands and strategy:

https://help.github.com/enterprise/2.9/admin/guides/installation/high-availability-configuration/

About Webhooks

Webhooks provide a way for notifications to be delivered to an external web server whenever certain actions occur on a repository or organization.

Tip: Only members with owner privileges for an organization or admin privileges for a repository can manage webhooks for an organization. For more information, see “Permission levels for an organization.”

Webhooks can be triggered whenever a variety of actions are performed on a repository or an organization. For example, you can configure a webhook to execute whenever:

A repository is pushed to

A pull request is opened

A GitHub Pages site is built

A new member is added to a team

Using the GitHub Enterprise API, you can make these webhooks update an external issue tracker, trigger CI builds, update a backup mirror, or even deploy to your production server.
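If a webhook is going to trigger CI builds or deployments, the receiving server should check that the request really came from your GitHub instance. When a secret is configured on the webhook, GitHub sends an `X-Hub-Signature` header containing `sha1=` followed by the HMAC-SHA1 hex digest of the request body. The sketch below shows only that verification step; the function names and the sample secret and payload are my own, and the web server around it is left out.

```python
import hashlib
import hmac


def sign_payload(secret, payload):
    """Compute the signature string for a raw request body (both bytes)."""
    digest = hmac.new(secret, payload, hashlib.sha1).hexdigest()
    return "sha1=" + digest


def is_valid_signature(secret, payload, header_value):
    """Compare the received X-Hub-Signature header against the expected value."""
    expected = sign_payload(secret, payload)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, header_value)


# Hypothetical secret and payload, for illustration only.
secret = b"my-webhook-secret"
body = b'{"ref": "refs/heads/master"}'
header = sign_payload(secret, body)

print(is_valid_signature(secret, body, header))
print(is_valid_signature(secret, b'{"ref": "tampered"}', header))
```

The important detail is to sign the raw bytes of the request body exactly as received, before any JSON parsing.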

 
