Introdução¶
A Fórmula 1 é a principal categoria do automobilismo mundial, os 20 melhores pilotos do planeta disputam pelo título de Campeão do Mundo anualmente. A F1 (como também é conhecida), é de interesse global e, portanto, prever seus resultados é do interesse de milhares de espectadores espalhados pelo planeta; aqui, propõe-se a utilização do resultado de corridas anteriores para prever a corrida seguinte, o que representa uma característica importante da F1: a diferença entre os carros.
A Fórmula 1 não é uma categoria de especificação (como são chamadas as categorias em que os carros são, para todos os propósitos, idênticos, como a Fórmula 2 e a Fórmula 3, por exemplo), assim, não são poucos os casos em que um determinado carro é objetivamente melhor que o de outra equipe, assim, partindo do pressuposto que um piloto corra todas as corridas de uma temporada por uma mesma equipe, é possível intuir que os seus resultado fiquem sempre em torno de uma mesma faixa (a depender de atualizações que ocorrem durante a temporada, que já transformaram equipes que estavam no fundo do pelotão em ponteiros em questão de algumas corridas).
Desta forma, serão utilizados para previsão o resultado das corridas anteriores bem como a sua média (com tamanhos a serem definidos no momento da seleção de variáveis), eventualmente, serão adicionados dados do grid de largada, classificação, treinos livres e resultados de corridas sprint, sempre verificando se as variáveis realmente auxiliam o modelo.
Bibliotecas¶
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import ruptures as rpt
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.naive_bayes import CategoricalNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.svm import LinearSVC
from sklearn.svm import NuSVC
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
Funções¶
def changepoint(data, x, y): # a série tem o "patamar novo" do ponto detectado em diante
model = "rbf"
algo = rpt.Pelt(model=model).fit(data[y].values)
result = algo.predict(pen = 2 * np.log(len(data[x])))
fig = px.line(data, x=x, y=y)
fig.add_vline(x=data.iloc[0][x])
for resul in result:
fig.add_vline(x=data.iloc[resul-1][x])
fig.show()
def plot(pred, rf, title):
for i in pred.values:
y_pred = rf.predict_proba([i]) # prevê
print(np.dot(list(range(1, len(y_pred[0]))), y_pred[0][:-1]))
fig = go.Figure()
l = list(range(1, len(y_pred[0].tolist())))
fig.add_trace(go.Bar(
x=l,
y=y_pred[0],
text=np.round(y_pred[0] * 100, decimals = 2).astype(str)
))
fig.update_traces(textposition='outside')
fig.update_layout(title=title,
xaxis_title='Position',
yaxis_title='Percentage',
xaxis_type='category',
autosize=False,
height=700,
width=1000
)
fig.show()# plota
Pré-processamento¶
# importa a base
# cria coluna com a corrida anterior (cria bases diferentes indo até 10 corridas anteriores)
# seleciona o ano, a rodada, a equipe e o piloto
# se for a primeira rodada, ou o piloto não participou da rodada anterior na mesma equipe, continua
# se não, procura a posição do piloto na rodada anterior e adiciona
# para lags maiores que 1, calcula a média das rodadas anteriores
# exemplo: lag 4, calcula a média de 2, 3 e 4 rodadas
# testa diferentes modelos com as diferentes bases
# faz a seleção de variáveis para encontrar o melhor modelo em cada caso
df = pd.read_csv("D:\\Esportes\\F1\\F1 db\\f1db-csv\\f1db-races-race-results.csv")
df["positionNumber"].fillna(100, inplace=True)
#df.dropna(subset=["positionNumber"], inplace=True)
df["ID"] = df["driverId"] + df["constructorId"] + df["engineManufacturerId"]
df["race-1"] = np.nan
df.drop_duplicates(subset=["year", "round", "ID"], inplace=True)
df = df[df["year"] >= 1989] # resultado da análise de ponto de mudança
# 15 anos de treino, 4 de teste, 2 de validação
df
C:\Users\yan_k\AppData\Local\Temp\ipykernel_11024\3717127998.py:10: DtypeWarning: Columns (31,32) have mixed types. Specify dtype option on import or set low_memory=False. df = pd.read_csv("D:\\Esportes\\F1\\F1 db\\f1db-csv\\f1db-races-race-results.csv") C:\Users\yan_k\AppData\Local\Temp\ipykernel_11024\3717127998.py:11: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. df["positionNumber"].fillna(100, inplace=True)
raceId | year | round | positionDisplayOrder | positionNumber | positionText | driverNumber | driverId | constructorId | engineManufacturerId | ... | qualificationPositionText | gridPositionNumber | gridPositionText | positionsGained | pitStops | fastestLap | driverOfTheDay | grandSlam | ID | race-1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
11852 | 469 | 1989 | 1 | 1 | 1.0 | 1 | 27 | nigel-mansell | ferrari | ferrari | ... | 6 | 6.0 | 6 | 5.0 | NaN | False | NaN | False | nigel-mansellferrariferrari | NaN |
11853 | 469 | 1989 | 1 | 2 | 2.0 | 2 | 2 | alain-prost | mclaren | honda | ... | 5 | 5.0 | 5 | 3.0 | NaN | False | NaN | False | alain-prostmclarenhonda | NaN |
11854 | 469 | 1989 | 1 | 3 | 3.0 | 3 | 15 | mauricio-gugelmin | march | judd | ... | 12 | 12.0 | 12 | 9.0 | NaN | False | NaN | False | mauricio-gugelminmarchjudd | NaN |
11855 | 469 | 1989 | 1 | 4 | 4.0 | 4 | 20 | johnny-herbert | benetton | ford | ... | 10 | 10.0 | 10 | 6.0 | NaN | False | NaN | False | johnny-herbertbenettonford | NaN |
11856 | 469 | 1989 | 1 | 5 | 5.0 | 5 | 9 | derek-warwick | arrows | ford | ... | 8 | 8.0 | 8 | 3.0 | NaN | False | NaN | False | derek-warwickarrowsford | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
27126 | 1141 | 2025 | 16 | 16 | 16.0 | 16 | 10 | pierre-gasly | alpine | renault | ... | 19 | NaN | PL | 4.0 | 1.0 | False | False | False | pierre-gaslyalpinerenault | NaN |
27127 | 1141 | 2025 | 16 | 17 | 17.0 | 17 | 43 | franco-colapinto | alpine | renault | ... | 18 | 17.0 | 17 | 0.0 | 1.0 | False | False | False | franco-colapintoalpinerenault | NaN |
27128 | 1141 | 2025 | 16 | 18 | 18.0 | 18 | 18 | lance-stroll | aston-martin | mercedes | ... | 17 | 16.0 | 16 | -2.0 | 1.0 | False | False | False | lance-strollaston-martinmercedes | NaN |
27129 | 1141 | 2025 | 16 | 19 | 100.0 | DNF | 14 | fernando-alonso | aston-martin | mercedes | ... | 9 | 8.0 | 8 | NaN | 1.0 | False | False | False | fernando-alonsoaston-martinmercedes | NaN |
27130 | 1141 | 2025 | 16 | 20 | 100.0 | DNS | 27 | nico-hulkenberg | kick-sauber | ferrari | ... | 12 | NaN | NaN | NaN | 0.0 | False | False | False | nico-hulkenbergkick-sauberferrari | NaN |
15279 rows × 36 columns
# escrever para mais corrida anteriores
for ano in df["year"].unique():
for id in df[df["year"] == ano]["ID"].unique():
if len(df[(df["year"] == ano) & (df["ID"] == id)]) == 1:
df = df.drop(df[(df["year"] == ano) & (df["ID"] == id)].index)
else:
df.sort_values("round", inplace=True, ascending=True)
for i in range(len(df[(df["year"] == ano) & (df["ID"] == id)]) - 1):
l = len(df[(df["year"] == ano) & (df["ID"] == id)]) - 1
round = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 1)), 2].tolist()
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-1"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 2)), 4]
if i < l - 1: # 2 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-2"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 3)), 4]
if i < l - 2: # 3 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-3"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 4)), 4]
if i < l - 3: # 4 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-4"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 5)), 4]
if i < l - 4: # 5 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-5"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 6)), 4]
if i < l - 5: # 6 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-6"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 7)), 4]
if i < l - 6: # 7 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-7"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 8)), 4]
if i < l - 7: # 8 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-8"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 9)), 4]
if i < l - 8: # 9 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-9"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 10)), 4]
if i < l - 9: # 10 corridas
df.loc[df[(df["year"] == ano) & (df["ID"] == id) & (df["round"] == round)].index, "race-10"] = df[(df["year"] == ano) & (df["ID"] == id)].iloc[int(-1 * (i + 11)), 4]
# pega a primeira linha, adiciona o resultado da segunda
# repete até a penultima linha
#df.dropna(subset=["race-1"], inplace=True)
data = df[df["raceId"] == max(df["raceId"])]
df = df[(df["positionNumber"] != 100) & (df["raceId"] != max(df["raceId"]))]
df = pd.concat([df, data])
df["ma2"] = (df["race-1"] + df["race-2"])/2
df["ma3"] = (df["race-1"] + df["race-2"] + df["race-3"])/3
df["ma4"] = (df["race-1"] + df["race-2"] + df["race-3"] + df["race-4"])/4
df["ma5"] = (df["race-1"] + df["race-2"] + df["race-3"] + df["race-4"] + df["race-5"])/5
df["ma6"] = (df["race-1"] + df["race-2"] + df["race-3"] + df["race-4"] + df["race-5"] + df["race-6"])/6
df["ma7"] = (df["race-1"] + df["race-2"] + df["race-3"] + df["race-4"] + df["race-5"] + df["race-6"] + df["race-7"])/7
df["ma8"] = (df["race-1"] + df["race-2"] + df["race-3"] + df["race-4"] + df["race-5"] + df["race-6"] + df["race-7"] + df["race-8"])/8
df["ma9"] = (df["race-1"] + df["race-2"] + df["race-3"] + df["race-4"] + df["race-5"] + df["race-6"] + df["race-7"] + df["race-8"] + df["race-9"])/9
df["ma10"] = (df["race-1"] + df["race-2"] + df["race-3"] + df["race-4"] + df["race-5"] + df["race-6"] + df["race-7"] + df["race-8"] + df["race-9"] + df["race-10"])/10
Análise exploratória¶
# balanceamento das classes
fig = px.histogram(df, x="positionNumber")
fig.update_layout(xaxis_type='category')
fig.show()
corr = df[["qualificationPositionNumber", "gridPositionNumber", "positionNumber", "race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9", "race-10", "ma2", "ma3", "ma4", "ma5", "ma6", "ma7", "ma8", "ma9", "ma10"]].corr(method = "spearman")
fig = px.imshow(corr, text_auto=True)
fig.show()
l = []
for ano in df["year"].unique():
l.append([ano, df[df["year"] == ano][["positionNumber", "qualificationPositionNumber"]].corr(method="spearman")["qualificationPositionNumber"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "gridPositionNumber"]].corr(method="spearman")["gridPositionNumber"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-1"]].corr(method="spearman")["race-1"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-2"]].corr(method="spearman")["race-2"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-3"]].corr(method="spearman")["race-3"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-4"]].corr(method="spearman")["race-4"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-5"]].corr(method="spearman")["race-5"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-6"]].corr(method="spearman")["race-6"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-7"]].corr(method="spearman")["race-7"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-8"]].corr(method="spearman")["race-8"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-9"]].corr(method="spearman")["race-9"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "race-10"]].corr(method="spearman")["race-10"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma2"]].corr(method="spearman")["ma2"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma3"]].corr(method="spearman")["ma3"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma4"]].corr(method="spearman")["ma4"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma5"]].corr(method="spearman")["ma5"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma6"]].corr(method="spearman")["ma6"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma7"]].corr(method="spearman")["ma7"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma8"]].corr(method="spearman")["ma8"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma9"]].corr(method="spearman")["ma9"].values.tolist()[0]
, df[df["year"] == ano][["positionNumber", "ma10"]].corr(method="spearman")["ma10"].values.tolist()[0]])
corrDf = pd.DataFrame(l, columns=["Ano", "Qualy", "Grid", "AR1", "AR2", "AR3", "AR4", "AR5", "AR6", "AR7", "AR8", "AR9", "AR10", "MA2", "MA3", "MA4", "MA5", "MA6", "MA7", "MA8", "MA9", "MA10"])
corrDf.sort_values("Ano", inplace=True)
fig = px.line(corrDf, x="Ano", y=["Qualy", "Grid", "AR1", "AR2", "AR3", "AR4", "AR5", "AR6", "AR7", "AR8", "AR9", "AR10", "MA2", "MA3", "MA4", "MA5", "MA6", "MA7", "MA8", "MA9", "MA10"])
fig.show()
l = ["Qualy", "Grid", "AR1", "AR2", "AR3", "AR4", "AR5", "AR6", "AR7", "AR8", "AR9", "AR10", "MA2", "MA3", "MA4", "MA5", "MA6", "MA7", "MA8", "MA9", "MA10"]
for i in l:
changepoint(corrDf, "Ano", y=i)
Modelagem¶
Processo Gaussiano¶
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = GaussianProcessClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = GaussianProcessClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags 0.1830178474851271 0.14548404542996216 3-lags 0.34033418426549084 0.15293571594337432 4-lags 0.5524510415367344 0.16265435948609205 5-lags 0.6558283864004317 0.167835941716136 6-lags 0.6978375219170077 0.17372881355932204 7-lags 0.7092651757188498 0.17028753993610224 8-lags 0.706784140969163 0.17762114537444934 9-lags 0.6969519343493552 0.16744822196170378 10-lags 0.6853808894760017 0.162703654777631
Regressão Logistica (Multinomial)¶
Melhor modelo: corridas 1 a 10
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = LogisticRegression(solver='newton-cg')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = LogisticRegression(solver='newton-cg')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags 0.07961060032449974 0.07917793401838832 3-lags 0.08273381294964029 0.09097238338361568 4-lags 0.08781339653236872 0.08694025196457528 5-lags 0.09079870480302213 0.083917970858068 6-lags 0.09395090590298072 0.08372296902396259 7-lags 0.09281150159744408 0.08594249201277955 8-lags 0.09550660792951542 0.08334801762114537 9-lags 0.09788980070339977 0.08401719421649081 10-lags 0.0959929546455306 0.0845442536327609
Naive Bayes¶
Melhor modelo: corridas 1 a 10
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = CategoricalNB()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = CategoricalNB()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags 0.15305570578691186 0.14256354786371012 3-lags 0.16500348108609886 0.15235553492689718 4-lags 0.18797555195210178 0.15928651615317452 5-lags 0.19913653534808418 0.16473286562331355 6-lags 0.21274108708357686 0.16817650496785505 7-lags 0.22124600638977635 0.17012779552715654 8-lags 0.23541850220264318 0.17162995594713656 9-lags 0.2534193044157874 0.1652989449003517 10-lags 0.25891677675033026 0.16160281814178776
Radius-Neighbors¶
melhorMa = 0
melhorAr = 0
for w in ['uniform', 'distance']:
for d in ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean']:
for k in np.arange(0.1, 10, 0.1):
for i in range(1, 11):
#print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
try:
rf = RadiusNeighborsClassifier(radius=k, weights=w, metric=d)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
except:
continue
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(w)
print(d)
print(k)
print(acc)
print("\n")
if i == 1:
continue
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
try:
rf = RadiusNeighborsClassifier(radius=k, weights=w, metric=d)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
except:
continue
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(w)
print(d)
print(k)
print(acc)
print("\n")
1-lags (AR) uniform cityblock 0.1 0.13008706215833166 2-lags (AR) uniform cityblock 1.0 0.16127636560302866 2-lags (MA) uniform cityblock 1.0 0.13672255273120607 2-lags (MA) uniform cityblock 1.5000000000000002 0.13780421849648458 4-lags (AR) uniform cityblock 7.0 0.17425470874391918 5-lags (AR) uniform euclidean 7.9 0.17943874797625473 6-lags (AR) uniform euclidean 8.8 0.2020748100526008 6-lags (AR) uniform euclidean 8.9 0.20295149035651666 6-lags (AR) uniform euclidean 9.0 0.2033898305084746 2-lags (MA) distance cityblock 1.0 0.14883720930232558 3-lags (AR) distance cityblock 5.0 0.3832675794847993 3-lags (MA) distance cityblock 5.0 0.16488744488280344 3-lags (AR) distance cityblock 6.0 0.38373172429798097 4-lags (AR) distance cityblock 7.0 0.5802669327678682 4-lags (MA) distance cityblock 7.0 0.18473244355744045 4-lags (AR) distance cityblock 8.0 0.5806411375826369 4-lags (AR) distance cityblock 9.0 0.5810153423974055 5-lags (AR) distance cosine 0.1 0.6606853750674582 6-lags (AR) distance cosine 0.1 0.695353594389246 7-lags (AR) distance cosine 0.1 0.7057507987220447 7-lags (AR) distance cosine 0.2 0.7062300319488818 5-lags (MA) distance euclidean 7.9 0.19724770642201836 6-lags (AR) distance euclidean 8.8 0.7067504383401519 6-lags (MA) distance euclidean 8.8 0.21566335476329632 6-lags (AR) distance euclidean 8.9 0.7079193454120397
K-Vizinhos¶
Melhor modelo: 1 vizinho, corridas 1 a 4 (o resto literalmente não importa)
melhorMa = 0
melhorAr = 0
for w in ['uniform', 'distance']:
for d in ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan', 'nan_euclidean']:
for k in range(1, 11):
for i in range(1, 11):
#print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = KNeighborsClassifier(n_neighbors=k, weights=w, metric=d)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(w)
print(d)
print(k)
print(acc)
print("\n")
if i == 1:
continue
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = KNeighborsClassifier(n_neighbors=k, weights=w, metric=d)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(w)
print(d)
print(k)
print(acc)
print("\n")
1-lags (AR) uniform cityblock 1 0.0845312816359587 2-lags (AR) uniform cityblock 1 0.13358572201189833 2-lags (MA) uniform cityblock 1 0.09724175229853975 3-lags (AR) uniform cityblock 1 0.3469482478533302 3-lags (MA) uniform cityblock 1 0.11058250174054304 4-lags (AR) uniform cityblock 1 0.564300860671074 4-lags (MA) uniform cityblock 1 0.1299738056629662 5-lags (AR) uniform cityblock 1 0.6612250404749056 5-lags (MA) uniform cityblock 1 0.13667026443604965 6-lags (AR) uniform cityblock 1 0.6991525423728814 6-lags (MA) uniform cityblock 1 0.16072472238457042 7-lags (AR) uniform cityblock 1 0.7084664536741214 7-lags (MA) uniform cityblock 1 0.17268370607028755 8-lags (MA) uniform cityblock 1 0.19365638766519824 9-lags (MA) uniform cityblock 1 0.20808909730363423 10-lags (MA) uniform cityblock 1 0.22567151034786437 10-lags (MA) uniform cityblock 2 0.2289740202553941 10-lags (MA) uniform cityblock 6 0.23029502421840597 7-lags (AR) uniform euclidean 1 0.7094249201277956 8-lags (AR) uniform euclidean 1 0.7103083700440529 10-lags (MA) uniform nan_euclidean 5 0.23183619550858653 10-lags (MA) distance cityblock 2 0.23756054601497137 10-lags (MA) distance cityblock 3 0.24064288859533245 10-lags (MA) distance cityblock 4 0.24856891237340378 7-lags (AR) distance cityblock 5 0.7115015974440895 8-lags (AR) distance cityblock 5 0.7117180616740089 10-lags (MA) distance cityblock 5 0.25847644209599296 7-lags (AR) distance cityblock 6 0.713258785942492 10-lags (MA) distance cityblock 6 0.26442095992954645 7-lags (AR) distance cityblock 7 0.7148562300319489 10-lags (MA) distance cityblock 7 0.26574196389255833 7-lags (AR) distance cityblock 8 0.7156549520766773 10-lags (MA) distance cityblock 8 0.26794363716424485 10-lags (MA) distance cityblock 9 0.26926464112725673 7-lags (AR) distance cityblock 10 0.7159744408945687 10-lags (MA) distance cityblock 10 0.27080581241743723 7-lags (AR) distance euclidean 9 0.7164536741214057 7-lags (AR) distance euclidean 10 0.7167731629392972 7-lags (AR) distance nan_euclidean 10 0.7170926517571885
Histogram Gradient Boosting¶
melhorMa = 0
melhorAr = 0
for c in ['log_loss']:
for f in range(1, 2):
for i in range(2, 11):
print(f"rodada {f}")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = GradientBoostingClassifier(loss = c)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(c)
print(f)
print(acc)
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = GradientBoostingClassifier(loss = c)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(c)
print(f)
print(acc)
rodada 1 2-lags (AR) log_loss 1 0.17414818820984315 2-lags (MA) log_loss 1 0.14894537587885343 rodada 1 3-lags (AR) log_loss 1 0.2104896727779067 3-lags (MA) log_loss 1 0.16326293803666744 rodada 1 4-lags (AR) log_loss 1 0.2543345391044031 4-lags (MA) log_loss 1 0.1837345640513908 rodada 1 5-lags (AR) log_loss 1 0.2892606583917971 5-lags (MA) log_loss 1 0.19603345925526175 rodada 1 6-lags (AR) log_loss 1 0.31545879602571597 6-lags (MA) log_loss 1 0.20689655172413793 rodada 1 7-lags (AR) log_loss 1 0.3539936102236422 7-lags (MA) log_loss 1 0.21964856230031948 rodada 1 8-lags (AR) log_loss 1 0.3950660792951542 8-lags (MA) log_loss 1 0.23330396475770926 rodada 1 9-lags (AR) log_loss 1 0.4271199687377882 9-lags (MA) log_loss 1 0.235052754982415 rodada 1 10-lags (AR) log_loss 1 0.47512109202994274 10-lags (MA) log_loss 1 0.24856891237340378
NuSVC¶
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
try:
rf = NuSVC(nu=0.9)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
except:
continue
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
try:
rf = NuSVC(nu=0.9)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
except:
continue
2-lags 3-lags 4-lags 5-lags 6-lags 7-lags 8-lags 9-lags 10-lags
SVC¶
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = SVC()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = SVC()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags 0.10968090859924283 0.11238507301243916 3-lags 0.11441169644929218 0.10385240194940822 4-lags 0.11725084196083323 0.08581763752026943 5-lags 0.11656772800863464 0.08432271991365353 6-lags 0.1260958503798948 0.08284628872004676 7-lags 0.13722044728434504 0.0878594249201278 8-lags 0.14801762114537445 0.09762114537444934 9-lags 0.16451738960531456 0.08909730363423213 10-lags 0.18339938353148394 0.09731395860854249
SVC Linear¶
melhorMa = 0
melhorAr = 0
for c in ['hinge', 'squared_hinge']:
for f in ['l1', 'l2']:
for i in range(2, 11):
if f == 'l1' and c == 'hinge':
continue
print(f"rodada {f}")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = LinearSVC(loss = c, penalty = f)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(c)
print(f)
print(acc)
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = LinearSVC(loss = c, penalty = f)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(c)
print(f)
print(acc)
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
2-lags (AR) hinge l2 0.06327744726879395
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
2-lags (MA) hinge l2 0.06327744726879395 rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l2
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\svm\_base.py:1235: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
rodada l1 2-lags (AR) squared_hinge l1 0.07993510005408329 2-lags (MA) squared_hinge l1 0.07917793401838832 rodada l1 3-lags (AR) squared_hinge l1 0.08540264562543513 3-lags (MA) squared_hinge l1 0.09190067300997912 rodada l1 4-lags (AR) squared_hinge l1 0.08856180616190595 rodada l1 rodada l1 6-lags (AR) squared_hinge l1 0.09088252483927528 rodada l1 7-lags (AR) squared_hinge l1 0.09249201277955271 rodada l1 8-lags (AR) squared_hinge l1 0.09427312775330396 rodada l1 9-lags (AR) squared_hinge l1 0.09574052364204767 rodada l1 10-lags (AR) squared_hinge l1 0.0959929546455306 rodada l2 rodada l2 rodada l2 rodada l2 rodada l2 rodada l2 rodada l2 rodada l2 rodada l2
Gradient Boosting¶
melhorMa = 0
melhorAr = 0
for c in ['log_loss']:
for f in ['friedman_mse', 'squared_error']:
for i in range(2, 11):
print(f"rodada {f}")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = GradientBoostingClassifier(loss = c, criterion = f)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(c)
print(f)
print(acc)
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = GradientBoostingClassifier(loss = c, criterion = f)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(c)
print(f)
print(acc)
rodada friedman_mse 2-lags (AR) log_loss friedman_mse 0.17414818820984315 2-lags (MA) log_loss friedman_mse 0.14894537587885343 rodada friedman_mse 3-lags (AR) log_loss friedman_mse 0.2104896727779067 3-lags (MA) log_loss friedman_mse 0.16326293803666744 rodada friedman_mse 4-lags (AR) log_loss friedman_mse 0.2543345391044031 4-lags (MA) log_loss friedman_mse 0.1837345640513908 rodada friedman_mse 5-lags (AR) log_loss friedman_mse 0.2892606583917971 5-lags (MA) log_loss friedman_mse 0.19603345925526175 rodada friedman_mse 6-lags (AR) log_loss friedman_mse 0.31545879602571597 6-lags (MA) log_loss friedman_mse 0.20689655172413793 rodada friedman_mse 7-lags (AR) log_loss friedman_mse 0.3539936102236422 7-lags (MA) log_loss friedman_mse 0.21964856230031948 rodada friedman_mse 8-lags (AR) log_loss friedman_mse 0.3950660792951542 8-lags (MA) log_loss friedman_mse 0.23330396475770926 rodada friedman_mse 9-lags (AR) log_loss friedman_mse 0.4271199687377882 9-lags (MA) log_loss friedman_mse 0.235052754982415 rodada friedman_mse 10-lags (AR) log_loss friedman_mse 0.47512109202994274 10-lags (MA) log_loss friedman_mse 0.24856891237340378 rodada squared_error rodada squared_error rodada squared_error rodada squared_error rodada squared_error rodada squared_error rodada squared_error rodada squared_error rodada squared_error
Extra Trees¶
melhorMa = 0
melhorAr = 0
for c in ['gini', 'entropy', 'log_loss']:
for f in range(100, 101):
for i in range(2, 11):
print(f"rodada {f}")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = ExtraTreesClassifier(n_estimators = f, criterion = c)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(c)
print(f)
print(acc)
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = ExtraTreesClassifier(n_estimators = f, criterion = c)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(c)
print(f)
print(acc)
rodada 100 2-lags (AR) gini 100 0.18994050838290968 2-lags (MA) gini 100 0.14883720930232558 rodada 100 3-lags (AR) gini 100 0.38280343467161754 3-lags (MA) gini 100 0.16488744488280344 rodada 100 4-lags (AR) gini 100 0.5777722340027441 4-lags (MA) gini 100 0.18473244355744045 rodada 100 5-lags (AR) gini 100 0.6691851052347545 5-lags (MA) gini 100 0.19724770642201836 rodada 100 6-lags (AR) gini 100 0.7042665108123904 6-lags (MA) gini 100 0.21595558153126826 rodada 100 7-lags (AR) gini 100 0.7119808306709265 7-lags (MA) gini 100 0.2343450479233227 rodada 100 8-lags (MA) gini 100 0.24933920704845816 rodada 100 9-lags (MA) gini 100 0.2557639703008988 rodada 100 10-lags (MA) gini 100 0.2712461470717745 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 7-lags (AR) entropy 100 0.7137380191693291 rodada 100 rodada 100 9-lags (AR) entropy 100 0.7139507620164126 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 10-lags (MA) log_loss 100 0.2714663143989432
Floresta Aleatória¶
Corridas 1 a 4, log-loss
melhorMa = 0
melhorAr = 0
for c in ['gini', 'entropy', 'log_loss']:
for f in range(100, 101):
for i in range(2, 11):
print(f"rodada {f}")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = RandomForestClassifier(n_estimators = f, criterion = c)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(c)
print(f)
print(acc)
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = RandomForestClassifier(n_estimators = f, criterion = c)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(c)
print(f)
print(acc)
rodada 100 2-lags (AR) gini 100 0.19048134126554894 2-lags (MA) gini 100 0.15002704164413197 rodada 100 3-lags (AR) gini 100 0.38233928985843585 3-lags (MA) gini 100 0.1643072638663263 rodada 100 4-lags (AR) gini 100 0.5773980291879756 4-lags (MA) gini 100 0.18498191343395284 rodada 100 5-lags (AR) gini 100 0.6664867781975176 5-lags (MA) gini 100 0.1988667026443605 rodada 100 6-lags (AR) gini 100 0.7038281706604325 6-lags (MA) gini 100 0.21712448860315606 rodada 100 7-lags (AR) gini 100 0.710223642172524 7-lags (MA) gini 100 0.23610223642172523 rodada 100 8-lags (AR) gini 100 0.7129515418502202 8-lags (MA) gini 100 0.2481057268722467 rodada 100 9-lags (AR) gini 100 0.7135599843688941 9-lags (MA) gini 100 0.25908558030480655 rodada 100 10-lags (MA) gini 100 0.27300748568912375 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 8-lags (AR) entropy 100 0.7138325991189427 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 rodada 100 7-lags (AR) log_loss 100 0.7140575079872205 rodada 100 rodada 100 rodada 100
Rede Neural¶
Resultados piores, nem vou perder tempo
melhorMa = 0
melhorAr = 0
for hl1 in range(24, 25):
for act in ['identity', 'logistic', 'tanh', 'relu']:
for i in range(2, 11):
#print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = MLPClassifier(hidden_layer_sizes=(hl1, hl1,), activation=act)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorAr:
melhorAr = acc
print(f"{i}-lags (AR)")
print(hl1)
print(act)
print(acc)
print("\n")
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = MLPClassifier(hidden_layer_sizes=(hl1, hl1,), activation=act)
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
acc = accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values))
if acc > melhorMa:
melhorMa = acc
print(f"{i}-lags (MA)")
print(hl1)
print(act)
print(acc)
print("\n")
2-lags (AR) 24 identity 0.08339643050297459 2-lags (MA) 24 identity 0.0711736073553272 3-lags (MA) 24 identity 0.08876769552100255 7-lags (AR) 24 identity 0.09297124600638977 8-lags (MA) 24 identity 0.08951541850220264 9-lags (MA) 24 identity 0.08987885892926925
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
2-lags (AR) 24 logistic 0.13585722011898324
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
2-lags (MA) 24 logistic 0.11151974040021634
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
3-lags (AR) 24 logistic 0.14284056625667207
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
3-lags (MA) 24 logistic 0.11580413088883731
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
4-lags (AR) 24 logistic 0.15192715479605837
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
5-lags (AR) 24 logistic 0.1536697247706422
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
6-lags (AR) 24 logistic 0.16379310344827586
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
8-lags (AR) 24 logistic 0.1651101321585903
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
2-lags (MA) 24 tanh 0.12028123309897241
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
3-lags (MA) 24 tanh 0.12044557902065445
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
6-lags (AR) 24 tanh 0.1724137931034483
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
3-lags (MA) 24 relu 0.12079368763054073
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet. C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Classificador Passivo Agressivo¶
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = PassiveAggressiveClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = PassiveAggressiveClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags 0.0936722552731206 0.08177393185505678 3-lags 0.07426317010907403 0.09422139707588768 4-lags 0.09305226393912935 0.06398902332543345 5-lags 0.0791958985429034 0.07366432811656773 6-lags 0.06984219754529515 0.0793395675043834 7-lags 0.0755591054313099 0.07939297124600639 8-lags 0.0720704845814978 0.06748898678414098 9-lags 0.07737397420867527 0.07561547479484174 10-lags 0.0785997357992074 0.06472919418758256
Ridge¶
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = RidgeClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = RidgeClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags 0.07993510005408329 0.07917793401838832 3-lags 0.08540264562543513 0.09236481782316083 4-lags 0.08806286640888113 0.07920668579269054 5-lags 0.08837021046950891 0.0806799784133837 6-lags 0.09073641145528931 0.0875219170075979 7-lags 0.0926517571884984 0.08067092651757188 8-lags 0.09533039647577092 0.07823788546255507 9-lags 0.09495896834701055 0.07874169597499023 10-lags 0.095112285336856 0.08102157639806253
Ada Boost¶
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = AdaBoostClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = AdaBoostClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.08090859924283396
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.07906976744186046 3-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.08586679043861685
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.10362032954281736 4-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.10352999875265062
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.0987900710989148 5-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.11130599028602267
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.0871559633027523 6-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.12054354178842781
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.08810637054354178 7-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.12619808306709265
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.07779552715654953 8-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.12458149779735683
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.09004405286343613 9-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.12211801484955061
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.08909730363423213 10-lags
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.11448701012769705
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:527: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
0.09026860413914575
Bagging¶
for i in range(2, 11):
print(f"{i}-lags")
l = []
for j in range(i, 0, -1):
#if j > 1:
# l.append(f"ma{j}")
l.append(f"race-{j}")
#print(l)
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = BaggingClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
l = [f"ma{i}"]
df2 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2016)]
dfTest = df1[(df1["year"] > 2016) | (df1["year"] <= 2022)]
dfVal = df1[(df1["year"] > 2022) | (df1["year"] <= 2024)]
rf = BaggingClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfTest[l].values)))
2-lags 0.18745267712276906 0.1474310438074635 3-lags 0.3752610814574147 0.1646553724762126 4-lags 0.5656729449918922 0.18398403392790322 5-lags 0.654749055585537 0.1968429573664328 6-lags 0.6911163062536528 0.21244886031560492 7-lags 0.7054313099041534 0.23067092651757187 8-lags 0.7043171806167401 0.24052863436123348 9-lags 0.6998827667057445 0.25068386088315747 10-lags 0.6957287538529282 0.26486129458388374
Escolha final de modelo¶
l = ["gridPositionNumber", "qualificationPositionNumber", "race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9", "race-10"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = LogisticRegression(solver='newton-cg')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["gridPositionNumber", "qualificationPositionNumber", "race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9", "race-10"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = CategoricalNB()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["gridPositionNumber", "qualificationPositionNumber", "race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = KNeighborsClassifier(n_neighbors=9, weights='distance', metric='euclidean')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["gridPositionNumber", "qualificationPositionNumber", "race-1", "race-2", "race-3", "race-4", "race-5", "race-6"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = RandomForestClassifier(criterion = 'entropy')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
0.17874069058903183
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\utils\optimize.py:318: ConvergenceWarning: newton-cg failed to converge at loss = 2.4419924086040696. Increase the number of iterations.
0.3107650643195667
0.8218663180258212
0.820140655394284
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9", "race-10"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = LogisticRegression(solver='newton-cg')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9", "race-10"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = CategoricalNB()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = GaussianProcessClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = RadiusNeighborsClassifier(radius=8.9, weights='distance', metric='euclidean')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = KNeighborsClassifier(n_neighbors=10, weights='distance', metric='nan_euclidean')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = RandomForestClassifier(criterion = 'entropy')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9", "race-10"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = GradientBoostingClassifier(loss = 'log_loss')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9", "race-10"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = GradientBoostingClassifier(loss = 'log_loss', criterion = 'friedman_mse')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7", "race-8", "race-9"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = ExtraTreesClassifier(criterion = 'entropy')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
l = ["race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = BaggingClassifier()
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
print(accuracy_score(dfTest["positionNumber"].values, rf.predict(dfVal[l].values)))
conf = confusion_matrix(dfTest["positionNumber"].values, rf.predict(dfVal[l].values))
fig = px.imshow(conf, text_auto=True)
fig.show()
0.09995596653456627
C:\Users\yan_k\anaconda3\Lib\site-packages\sklearn\utils\optimize.py:318: ConvergenceWarning: newton-cg failed to converge at loss = 2.8452094281203193. Increase the number of iterations.
0.2600176133861735
0.8012779552715655
0.7964640561075395
0.8083067092651757
0.8035143769968051
0.48789079700572435
0.48789079700572435
0.8067604533020711
0.7945686900958466
Predição¶
Próxima corrida¶
Sem Classificação¶
l = ["positionNumber", "race-1", "race-2", "race-3", "race-4", "race-5", "race-6", "race-7"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = KNeighborsClassifier(n_neighbors=10, weights='distance', metric='nan_euclidean')
rf.fit(dfTrain[l[1:-1]].values, dfTrain["positionNumber"].values)
dfPred = df1[df1["year"] == 2025]
dfPred = dfPred[(dfPred["round"] == max(dfPred["round"]))]
#print(dfPred)
#dfPred.to_excel("pred.xlsx")
for piloto in dfPred["ID"].unique():
plot(dfPred[dfPred["ID"] == piloto][l[:-2]], rf, dfPred[dfPred["ID"] == piloto]["driverId"].values.tolist()[0])
# seleciona a ultima corrida
# roda piloto-a-piloto
13.196246680697346
3.852285279656953
12.69095901427238
8.91376312744927
11.327965935994463
10.278125634017961
7.975417197517713
5.293784384141869
13.20925114787314
7.573242400619545
15.132299938011492
6.465590871785482
3.489897948556636
11.642852429753416
9.484401136346834
9.45090648794505
6.5268942901965605
11.111707861252519
7.77217862037867
10.141498789039641
Com Classificação¶
pred = pd.read_excel("pred.xlsx")
l = ["gridPositionNumber", "qualificationPositionNumber", "race-1", "race-2", "race-3", "race-4", "race-5", "race-6"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = RandomForestClassifier(criterion = 'gini')
rf.fit(dfTrain[l[1:-1]].values, dfTrain["positionNumber"].values)
for piloto in dfPred["ID"].unique():
plot(dfPred[dfPred["ID"] == piloto][l[:-2]], rf, dfPred[dfPred["ID"] == piloto]["driverId"].values.tolist()[0])
#plot() # forçar a barra aqui
# seleciona a ultima corrida
# roda piloto-a-piloto
11.370000000000001
12.67
5.5600000000000005
6.39
1.8598333333333337
4.68
11.95
10.879999999999999
8.146666666666667
11.17
15.090000000000002
11.530000000000001
10.34
10.5
13.43
14.13
9.370000000000001
5.870000000000001
7.05
6.33
Campeonato (Sem Sprint)¶
l = ["race-1", "race-2", "race-3", "race-4", "race-5"]
df1 = df.dropna(subset=l)
dfTrain = df1[(df1["year"] <= 2019)]
dfTest = df1[(df1["year"] > 2019) | (df1["year"] <= 2023)]
dfVal = df1[(df1["year"] > 2023) | (df1["year"] <= 2024)]
rf = RandomForestClassifier(criterion = 'gini')
rf.fit(dfTrain[l].values, dfTrain["positionNumber"].values)
#plot
# seleciona a ultima corrida
# roda piloto-a-piloto
# calcula a média de posição da previsão e determina o ranking
# o ranking passa a ser a posição prevista
RandomForestClassifier()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
RandomForestClassifier()
Campeonato (Com Sprint)¶