Automation of Extracting JIRA Issues and Loading to Hive Table Using Python and Shell Script

Orchestrate the end-to-end flow of extracting JIRA issues with custom fields using Python and loading them into Hive using a shell script.

Requirements:

Pull All Issues From JIRA for Project Specified
Specify Fields Mapping
CSV and Column Re-Order
Create External Table
Cron Scheduler
from jira import JIRA
import pandas as pd

# Specify your company base URL for the JIRA board
URL = 'https://xxx.jira.com'

# Authenticate to JIRA using basic authentication (username and password)
jira = JIRA(server=URL, basic_auth=('username', 'password'))

# Pull all the available issues in JIRA for the specified project,
# paging through the results in blocks
blockSize = 1000
blockNo = 0

# Initialize the list for collecting the issues
jiraIssues = []
while True:
    startIDX = blockNo * blockSize
    projectAllTkts = jira.search_issues('project=<project_code>', startIDX, blockSize)
    if len(projectAllTkts) == 0:
        break
    blockNo += 1
    for tkt in projectAllTkts:
        jiraIssues.append(tkt)

# Display the count of issues extracted
print(len(jiraIssues))

# Collect one row per issue, then build the DataFrame once at the end
# (per-row DataFrame.append is deprecated in recent pandas versions)
rows = []
for tkt in jiraIssues:
    # Specify a dictionary to extract the required fields from the JIRA board;
    # custom fields can be added the same way, e.g. tkt.fields.customfield_<id>
    fieldsDict = {
        'PROJECT_CODE': tkt.fields.project.key,    # 4-char project code in key
        'JIRA_TKT': tkt.key,                       # 4-char project code + number
        'ISSUE_TYPE': tkt.fields.issuetype.name,   # Bug, Story, etc.
        'ASSIGNEE': tkt.fields.assignee,
        'CREATOR': tkt.fields.creator,
        'CREATED_DATE': tkt.fields.created,        # yyyy-mm-ddTHH:MM:SS.SSS+0000 format
        'REPORTER': tkt.fields.reporter,
        'SUMMARY': tkt.fields.summary,
        'PRIORITY': tkt.fields.priority.name,
        'STATUS': tkt.fields.status.name,
        'LAST_UPDATED_DATE': tkt.fields.updated    # yyyy-mm-ddTHH:MM:SS.SSS+0000 format
    }
    rows.append(fieldsDict)
tktsDF = pd.DataFrame(rows)

# Use columns to organize in the required order if needed
#colORDER = ['CREATED_DATE','PROJECT_CODE','JIRA_TKT','ISSUE_TYPE','STATUS','PRIORITY','CREATOR','REPORTER','ASSIGNEE','SUMMARY','LAST_UPDATED_DATE']
#tktsDF = tktsDF.reindex(columns=colORDER)

# Convert CREATED_DATE and LAST_UPDATED_DATE to date format
tktsDF['CREATED_DATE'] = tktsDF['CREATED_DATE'].apply(lambda field: pd.Timestamp(field).strftime('%Y-%m-%d'))
tktsDF['LAST_UPDATED_DATE'] = tktsDF['LAST_UPDATED_DATE'].apply(lambda field: pd.Timestamp(field).strftime('%Y-%m-%d'))

# Write the pipe-delimited CSV that the shell script will load into Hive
tktsDF.to_csv("all_jira_tkts.csv", encoding='utf-8', header=True, index=False, sep='|')
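The block-wise pagination loop above can be sanity-checked without a live JIRA server by stubbing out `search_issues` with a function that slices a list; `fake_search_issues` and the 25 stub issues here are hypothetical stand-ins, not part of the jira library:

```python
# Sketch of the block-wise pagination used above, with a stand-in for
# jira.search_issues so it runs without a live JIRA server (hypothetical stub).
def fake_search_issues(jql, start_at, max_results, _issues=list(range(25))):
    # Mimics search_issues(jql, startAt, maxResults): returns one page of results
    return _issues[start_at:start_at + max_results]

blockSize = 10
blockNo = 0
jiraIssues = []
while True:
    startIDX = blockNo * blockSize
    page = fake_search_issues('project=<project_code>', startIDX, blockSize)
    if len(page) == 0:   # an empty page means we are past the last issue
        break
    blockNo += 1
    jiraIssues.extend(page)

print(len(jiraIssues))  # 25 -- all stub issues collected across three pages
```

The loop stops on the first empty page, so it makes exactly one extra call beyond the data; the same termination condition drives the real extraction above.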
#!/bin/ksh
TODAY=`date +'%Y%m%d'`
echo "starting shell script ${0} on ${TODAY}"

# Remove the previous run's files
rm /output/all_jira_tkts.csv

# Activate the conda environment
export PATH="/app/minconda3/bin:$PATH"
source activate py_env37

# Start of Python script
python <python_script.py>
if [ $? != 0 ] ; then
    echo "ALERT: Encountered error in Python script execution"
    exit 1
else
    echo "SUCCESS: Python script completed"
fi

# Hive command to drop the existing table
hive -S -e "DROP TABLE IF EXISTS schema.tbl_name;"
echo "Dropped table"

# Remove the HDFS all_jira_tkts.csv file left over from the previous run
hdfs dfs -rm -skipTrash hdfs://<cluster>/user/jira/all_jira_tkts.csv

# Copy the created csv file from local to HDFS
chmod 777 /output/all_jira_tkts.csv
hdfs dfs -copyFromLocal /output/all_jira_tkts.csv hdfs://<cluster>/user/jira

# Verify the file copied successfully
hadoop fs -test -f hdfs://<cluster>/user/jira/all_jira_tkts.csv
if [ $? != 0 ] ; then
    echo "ALERT: Encountered error in previous command; csv file not present"
    exit 2
else
    echo "SUCCESS: Copied file from local to HDFS"
fi

# Create the external table by specifying the schema for the all_jira_tkts.csv file
hive -S -e "CREATE EXTERNAL TABLE IF NOT EXISTS schema.tbl_name (
    created_date date,
    project_code string,
    jira_issue string,
    issue_type string,
    status string,
    priority string,
    creator string,
    reporter string,
    assignee string,
    summary string,
    last_updated_date date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://<cluster>/user/jira'
tblproperties ('skip.header.line.count'='1');"
if [ $? != 0 ] ; then
    echo "ALERT: Encountered error in creating external table"
    exit 3
else
    echo "SUCCESS: Created external table"
fi

echo "Completed execution of script ${0}"
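The "Cron Scheduler" requirement can be met by registering the wrapper script in the user's crontab. A minimal sketch, assuming a daily 06:00 run; the script path and log location are hypothetical and should be adjusted to your environment:

```shell
# Hypothetical crontab entry: run the extraction script every day at 06:00
# and append stdout/stderr to a log file (paths are assumptions).
# Install it with: crontab -e
0 6 * * * /app/jira/extract_jira_to_hive.ksh >> /app/jira/logs/extract_jira_to_hive.log 2>&1
```

Redirecting `2>&1` into the log captures the ALERT/SUCCESS messages the script echoes, so failed runs (non-zero exits 1-3) can be diagnosed from the log.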