Jan M. Kortenkamp

I am an MS in Information Systems - Data Analytics graduate, currently working at the Rudolf Steiner Fellowship Foundation.

About

Review my Education, Skills and Experience

Jan M. Kortenkamp

Hi, my name is Jan Mauritz Kortenkamp.
I graduated with an MS in Information Systems - Data Analytics from Baruch College. My undergraduate degree is a Bachelor of Business Administration, specializing in Operations Management.
Through my academic journey I have obtained a robust skillset in business process analysis and optimization, enabling me to streamline operations and enhance efficiency. Complementing my analytical expertise, I am proficient in several programming languages, which empowers me to leverage data-driven insights and develop innovative solutions. I am passionate about integrating technology and business strategy to drive impactful results.

If you would like to know more, please see my full resume below!

  • Place of Study: Baruch College
  • Degrees:

  • Master of Science

  • Information Systems - Data Analytics
  • Bachelor of Business Administration

  • Operations Management - Consulting

Education

  • 2023 - 2024

    Baruch College

    MS in Information Systems - Data Analytics
    GPA: 3.6, cum laude
  • 2021 - 2023

    Baruch College

    BBA in Operations Management - Consulting
    GPA: 3.7, cum laude
  • 2019 - 2021

    Rockland Community College

    Associate's Degree - Business Administration

Experience

  • 2019 - present

    2016 - 2017

    Rudolf Steiner Fellowship Foundation

    General Management

Skills

  • Data Analysis
  • Natural Language Processing
  • Data Visualization
  • Regression and Classification Modelling
  • Object Oriented Programming
  • Database Management
  • Business Process Analysis

Coding and Software

  • Python, highly proficient
  • C++ and VBA, moderately proficient
  • HTML and CSS, moderately proficient
  • SQL, somewhat proficient
  • Hardware and IT infrastructure, highly proficient
  • Microsoft 365 including Access and Power Automate, highly proficient

Other Proficiencies

  • Native German Speaker
  • Fluent in Italian, B2 CELI 3
  • Pearson LCCI Certificate
My Projects

Explore my Solutions

Contact

Get in Touch


Review Sentiment Analysis and Verification Status Classification


The below paper is the final term project for the fall 2024 CIS 9665 - Applied Natural Language Processing course at Baruch College.

Using a dataset of categorized Amazon reviews hosted by the University of California San Diego Rady School of Management, our team created a binary classification model to predict customer review verification status. Given the increasing importance of online marketplaces, we sought to create a solution with a wide range of applications.
Applying several natural language processing techniques, such as sentiment analysis and the classification of review phrasing and contents, we built a Random Forest classification model with 86% accuracy. Since the model was created on a subset of the data, the preprocessing steps and model creation can be easily reproduced and applied in academic and commercial settings.
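The project's own pipeline is not reproduced here, but a minimal sketch of the general approach (text features feeding a Random Forest classifier, using scikit-learn and invented toy reviews rather than the project's actual data) might look like:

```python
# Hypothetical sketch: predicting review verification status from text.
# The reviews and labels below are invented for illustration only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "Great product, exactly as described, fast shipping",
    "Terrible quality, broke after one day",
    "Works fine, would buy again",
    "Do not buy this, total waste of money",
    "Excellent value and easy to use",
    "Arrived damaged and the seller never responded",
]
verified = [1, 0, 1, 0, 1, 0]  # toy labels: 1 = verified purchase

# turn raw text into TF-IDF features (a stand-in for the fuller
# sentiment and phrasing features used in the actual project)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, verified)

new_review = vectorizer.transform(["Fantastic purchase, highly recommend"])
print(model.predict(new_review))
```

In the actual project, engineered sentiment and phrasing features would replace or augment the plain TF-IDF matrix.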


Team member names have been redacted for privacy considerations.

Citibike Data Mining and Visualization

As an avid CitiBike rider, I created this project to examine CitiBike ridership statistics and key metrics around cycling in New York City. The main source of information is the public CitiBike archive for ridership statistics; additional sources are the NYC public health data portal for air quality and commuting method data. The resulting Tableau story can be found above, while the Python code used to mine the CitiBike archive is displayed below.

Inspired by this visualization created by Ben Oldenburg.

All operations were performed in a Google Colab environment.

To import and manage the data, the following libraries are used:


#data import and folder management libraries
import requests
import os
import shutil
import zipfile
import re
from google.colab import files

#data libraries
import pandas as pd

Data Import into Local Directory

The following functions are created to import and unzip the archive data:


#import files stored in CitiBike archive
def ImportFiles(year):
	try:
		url = f"https://s3.amazonaws.com/tripdata/20{year}-citibike-tripdata.zip"
		!wget $url
		print(f'Imported: {url}\n')
	except:
		print(f'Unable to import: {url}\n')

#unzip files into local directory
def UnzipFiles(year):
	try:
		location = f"/content/20{year}-citibike-tripdata.zip"
		!unzip $location
		#from 2020 onwards the archive nests monthly zip files inside the yearly zip
		if year >= 20:
			for month in range(1, 13):
				#the CitiBike archive omits the leading 0 for some months without a clear pattern,
				#so different filenames need to be accounted for
				if month < 10:
					sublocation = f"/content/20{year}-citibike-tripdata/20{year}0{month}-citibike-tripdata.zip"
				else:
					sublocation = f"/content/20{year}-citibike-tripdata/20{year}{month}-citibike-tripdata.zip"
				!unzip $sublocation
				os.remove(sublocation)
	except:
		print(f'Unable to unzip: {location}\n')

#remove zip files from contents
def RemoveZips(year):
	try:
		location = f"/content/20{year}-citibike-tripdata.zip"
		os.remove(location)
		print(f'Removed: {location}\n')
	except:
		print(f'Unable to remove: {location}\n')

#remove incorrectly set up directories 
#only needed if specific directories were created incorrectly
#not required for normal program flow
def RemoveDirectory(year):
	try:
		shutil.rmtree(f'/content/20{year}-citibike-tripdata')
		print("Directory deleted successfully.")
	except FileNotFoundError:
		print("Directory not found.")


The import functions are then called for specific year ranges:
E.g., range(13,25) will import the entire archive while range(23,24) will only import 2023 data. Depending on the intended analysis or computational limitations, importing a subset of the data might be preferable.


for i in range(13,24):
	ImportFiles(i)

for i in range(13,24):
	UnzipFiles(i)

for i in range(13,24):
	RemoveZips(i)

Extraction of Monthly Trips by Year

To extract the total number of trips for any given year, the following function is created:


#this function takes a two digit year in the function call and extracts the total trips by month for the given year
#due to changes and errors in file names and paths, several edge cases are included in the function
def MonthlyTrips(year):
	#for simplified loops and clearer structure the following lists are created, "months" takes into account the CitiBike Archive naming convention  
	months = ['1_January', '2_February', '3_March', '4_April', '5_May', '6_June', '7_July', '8_August', '9_September', '10_October', '11_November', '12_December']
	monthNumber = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
	monthName = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

	#empty list to store extracted data
	data = []

	#loop through each month, month number, and month name
	for month, month_num, month_name in zip(months, monthNumber, monthName):
		monthlyTotal = 0  #initialize monthly total to accumulate trip counts

		#load files for each month (with index j from 1 to 9)
		for j in range(1, 10):
			try:
				#first possible file path format
				file = f'/content/20{year}-citibike-tripdata/{month}/20{year}{month_num}-citibike-tripdata_{j}.csv'

				#check if the file exists in the directory --> different naming conventions can exist within the same year
				if os.path.exists(file):
					df = pd.read_csv(file)
					monthlyTotal += len(df)  #add the number of rows to the monthly total
				elif os.path.exists(f'/content/20{year}-citibike-tripdata/{month}/20{year}{month_num}-citibike-tripdata.csv_{j}.csv'):
					firstAlternative = f'/content/20{year}-citibike-tripdata/{month}/20{year}{month_num}-citibike-tripdata.csv_{j}.csv'
					df = pd.read_csv(firstAlternative, low_memory=False)  #low_memory=False to address different types in the same df column
					monthlyTotal += len(df)
				elif os.path.exists(f'/content/20{year}{month_num}-citibike-tripdata_{j}.csv'):
					secondAlternative = f'/content/20{year}{month_num}-citibike-tripdata_{j}.csv'
					df = pd.read_csv(secondAlternative, low_memory=False)
					monthlyTotal += len(df)
			except FileNotFoundError:  #if no path format is found, print a warning and continue to the next iteration
				print(f'Unable to load: {file}')
				continue
		#append year, month name, and total trips for each month, taking year from function call
		data.append([year, month_name, monthlyTotal])

	#create a DataFrame with Year, Month, and Trips columns
	totals = pd.DataFrame(data, columns=['Year', 'Month', 'Trips'])

	#set the Year as the index
	totals.set_index('Year', inplace=True)

	#return given year as df
	return totals

Next, the function is called and the outputs are verified:


#create empty dataframe to be appended
tripsByYear = pd.DataFrame()

#loop through given range of years 
for i in range(13, 25):
	totalTrips = MonthlyTrips(i) #run function with year as the argument 
	tripsByYear = pd.concat([tripsByYear, totalTrips])

#optional: add 2000 to years in index column since 2 digit year was used in funct. call 
tripsByYear.index = tripsByYear.index + 2000

#verify results
for i in range(2013, 2025):
	print(tripsByYear[tripsByYear.index == i])

The following lines convert the extracted data into a .xlsx file and prompt the download:


tripsByYear.to_excel('NYC Total Monthly Citibike Trips 2013-23.xlsx', index=True)
if input('Type y to confirm download: ') == 'y':
	files.download('NYC Total Monthly Citibike Trips 2013-23.xlsx')
	print("File downloaded successfully!")
else:
	print("Export cancelled.")

Extract Station Locations by Year

To map the expansion of stations by year, the following function is created and called for a given year. It loops through all existing September .csv files for that year and extracts all stations with their latitude and longitude.


def UniqueStations(year):
	#month and month number for September
	month = '9_September'
	month_num = '09'

	#initialize a DataFrame to store data
	uniqueStations = pd.DataFrame()

	#loop through all possible files for the month of September (looping through 1 to 9 .csv files)
	for j in range(1, 10):
		try:
			#generate first possible file path format
			file = f'/content/20{year}-citibike-tripdata/{month}/20{year}{month_num}-citibike-tripdata_{j}.csv'

			#check if the file exists and read it
			if os.path.exists(file):
				df = pd.read_csv(file)
			elif os.path.exists(f'/content/20{year}-citibike-tripdata/{month}/20{year}{month_num}-citibike-tripdata.csv_{j}.csv'):
				file = f'/content/20{year}-citibike-tripdata/{month}/20{year}{month_num}-citibike-tripdata.csv_{j}.csv'
				df = pd.read_csv(file, low_memory=False)
			elif os.path.exists(f'/content/20{year}{month_num}-citibike-tripdata_{j}.csv'):
				file = f'/content/20{year}{month_num}-citibike-tripdata_{j}.csv'
				df = pd.read_csv(file, low_memory=False)
			else:
				continue  #if file does not exist, move to the next iteration

			#the naming convention for columns was changed after 2020, kept columns need to be adjusted accordingly
			#keep only the necessary columns and use .copy() to avoid SettingWithCopyWarning
			if year < 20:
				dfUnique = df[['start station name', 'start station latitude', 'start station longitude']].copy()
			else:
				dfUnique = df[['start_station_name', 'start_lat', 'start_lng']].copy()

			#assign the 'Year' column using .loc
			dfUnique.loc[:, 'Year'] = year + 2000 #add 2000 since only the last two digits are used as the function argument
			dfUnique.set_index('Year', inplace=True)

			#rename the columns
			dfUnique.columns = ['Station Name', 'Latitude', 'Longitude']

			#append the unique data to uniqueStations
			uniqueStations = pd.concat([uniqueStations, dfUnique])

		except FileNotFoundError:
			print(f'Unable to load: {file}')
			continue

	#drop duplicates based on 'Station Name', keeping the first instance of each station for the year
	uniqueStations = uniqueStations.drop_duplicates(subset=['Station Name'], keep='first')

	return uniqueStations

The function is then called and the station data verified:


stationsByYear = pd.DataFrame()

#call function for a custom year range and add to a dataframe
for i in range(13, 24):
	yearData = UniqueStations(i)
	stationsByYear = pd.concat([stationsByYear, yearData])

#verify extracted data
for i in range(2013, 2024):
	year_data = stationsByYear[stationsByYear.index == i]
	if not year_data.empty:  #check if the DataFrame is empty
		print(f"\n\nYear: {i}")
		print(year_data.value_counts())
	else:
		print(f"No data found for year: {i}")

stationsByYear.shape

Prompt .xlsx export of data:


stationsByYear.to_excel('NYC Unique Citibike Stations 2013-23.xlsx', index=True)

if input('Type y to confirm download: ') == 'y':
	files.download('NYC Unique Citibike Stations 2013-23.xlsx')
	print("File downloaded successfully!")
else:
	print("Export cancelled.")

Analysis of a Single Month

Since the functions that extract data for entire years can be computationally demanding and time-consuming, investigating single months or years for specific features might be more convenient. The following code showcases extracting data for the top ten starting stations in August 2023. Out of close to four million total trips, a sheet of more than 100,000 trips is exported, allowing for investigation of specific trends or features. The code can easily be adapted to investigate other time periods or features of the dataset.


#initialize an empty dataframe, then add data from all four files for August 2023
august23 = pd.DataFrame()
for i in range(1,5):
	df2 = pd.read_csv(f'/content/202308-citibike-tripdata_{i}.csv', low_memory=False)
	august23 = pd.concat([august23, df2])

#verify outputs
august23.sample(5)

august23.shape

august23['start_station_id'].value_counts()

Filter data by top ten starting stations:


top10Stations = august23['start_station_id'].value_counts().head(10).index
top10Stations = august23[august23['start_station_id'].isin(top10Stations)]
top10Stations

Export data to .xlsx:


top10Stations.to_excel('August 2023 Top 10 CitiBike Stations.xlsx', index=False)

Medication Counting Program

This program can be used with a high-accuracy scale to count pills, pellets, capsules, etc. By averaging the weight of n samples of three items each and dividing the total weight by the resulting per-item average, the program calculates the item count to an accuracy of ±3 per 100. This is especially useful for larger quantities of small items.

Front-end and downloadable .exe still pending.
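The underlying math can be illustrated with a short, hypothetical Python sketch (the program itself is written in C++; the sample weights below are invented):

```python
# Hypothetical sketch of the counting math used by the program:
# average the weight of several three-pill samples, then divide the
# bottle's total weight by the per-pill average.
def pill_count(three_pill_weights, total_weight):
    """Estimate pill count from sample weights of three pills each."""
    avg_three = sum(three_pill_weights) / len(three_pill_weights)
    avg_per_pill = avg_three / 3
    return round(total_weight / avg_per_pill)

# e.g. three samples of three pills weighing ~1.5 g each in a 50 g bottle
print(pill_count([1.49, 1.51, 1.50], 50.0))  # → 100
```

Averaging several samples smooths out per-pill weight variation, which is what keeps the estimate within the stated ±3 per 100 tolerance.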

#include <iostream>
#include <string>
#include <vector>
#include <limits>
using namespace std;

bool ContProg();
/*Precondition: User is prompted for choice to continue program.
Postcondition: true/false return depending on user input.*/

int WeightToCount();
/*Precondition: User enters sample weights and the total weight of pills.
Postcondition: Returns the count of pills.*/

string MedInput();
/*Precondition: None.
Postcondition: Returns the medication name entered by the user.*/


bool ContProg() {
	string yesNoResponse;

	//Prompt user input.
	cin >> yesNoResponse;

	//Run while loop until correct input is given.
	while ((yesNoResponse != "yes") && (yesNoResponse != "no")) {
		cout << "Incorrect response, please enter \'yes\' or \'no\': ";
		cin >> yesNoResponse;
	}

	/* Remove remaining newline characters that could cause
	issues with the input if the main function runs again. */
	cin.ignore();

	//Return condition based on input.
	if (yesNoResponse == "yes") {
		return true;
	}
	else {
		return false;
	}
}

int WeightToCount() {
	int counter = 0;
	double input, threePillWeight = 0, pillCount, avgWeight,
		totalWeight;

	do {
		cout << "Enter weight of three pills: ";
		while (!(cin >> input)) {
			cout << "Invalid input. Please enter a decimal or integer value: ";
			cin.clear();
			cin.ignore(numeric_limits<streamsize>::max(), '\n');
		}
		threePillWeight += input;
		counter++;


		if (counter < 2) {
			cout << "You have entered " << counter
				<< " weight. Type \'yes\' to enter another weight?\n";
		}
		else {
			cout << "You have entered " << counter
				<< " weights. Type \'yes\' to enter another weight?\n";
		}
	} while (ContProg());

	cout << "Enter total weight of pills in bottle: ";
	cin >> totalWeight;

	avgWeight = threePillWeight / counter;
	pillCount = (totalWeight / avgWeight) * 3;

	return pillCount;
}

string MedInput() {
	string input;

	cout << "Enter medication name: ";
	getline(cin, input);

	return input;
}



int main() {
	//Initialize empty vectors
	vector<string> medName;
	vector<int> medCount;
	int count = 0;
	char end = 'n';

	do
	{
		medName.push_back(MedInput());
		medCount.push_back(WeightToCount());
		count++;

		if (count < 2) {
			cout << "You have entered " << count
				<< " medication. Type \'yes\' to enter another medication?\n";
		}
		else {
			cout << "You have entered " << count
				<< " medications. Type \'yes\' to enter another medication?\n";
		} //Correct input is evaluated by function call in while() call
	} while (ContProg());

	for (int i = 0; i < medCount.size(); i++) {
		cout << medName[i] << " has " << medCount[i] <<
			" pills left.\n";
	}

	do {
		cout << "Type 'yes' to terminate the program: \n";
		cin >> end;
	} while (end != 'y');

	cout << "Thank you for using this program!\n\n";

	return 0;
}

Business Strategy Recommendation for Pfizer Inc.


The below presentation is the final term project for the spring 2023 interdisciplinary BPL 5100 - Business Policy course at Baruch College.

With the goal of solving strategic issues for a specific US company, our team examined Pfizer Inc.'s 2023 corporate finances and product pipeline and recommended the acquisition of a smaller biotech firm. Given the extensive cash reserves resulting from COVID-19 vaccine revenues as well as the pending patent expiration of several high-revenue drugs, we recommended an investment in either Seagen Inc. or BeiGene Inc. for strategic positioning. Both firms had extensive portfolios with several oncology pharmaceuticals in the later development stages, and as such offered to complement Pfizer's strategic position perfectly.

In December of 2023, Pfizer completed the acquisition of Seagen Inc.


Team member names have been redacted for privacy considerations.

Operations Management Case Solution


The below presentation is the final term project for the spring 2023 OPM 4500 - Cases in Operations Management capstone course at Baruch College.

The "Bottlenecks and batching in dragon fruit jam production" case published by the Singapore Management University, outlines the challenges faces by a small artisanal jam producer that has developed the need to produce at scale, while maintaining product differentiation factors and high quality.
By reducing variability, batching and creating buffers, our team successfully identified removed the bottleneck and improved service quality overall.


Team member names have been redacted for privacy considerations.