Overview

Dataset statistics

Number of variables: 46
Number of observations: 98053
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 34.4 MiB
Average record size in memory: 368.0 B

Variable types

Numeric: 12
Categorical: 30
Boolean: 4

Warnings

examide has constant value "False" Constant
citoglipton has constant value "False" Constant
metformin-rosiglitazone has constant value "False" Constant
diag_1 has a high cardinality: 713 distinct values High cardinality
diag_2 has a high cardinality: 740 distinct values High cardinality
diag_3 has a high cardinality: 786 distinct values High cardinality
tolbutamide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
insulin is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
A1Cresult is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
diabetesMed is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
max_glu_serum is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
glyburide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
metformin is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
metformin-pioglitazone is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
metformin-rosiglitazone is highly correlated with tolbutamide and 29 other fields High correlation
race is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
miglitol is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
gender is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
chlorpropamide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
repaglinide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
examide is highly correlated with tolbutamide and 29 other fields High correlation
acetohexamide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
age is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
troglitazone is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
change is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
glipizide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
readmitted is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
acarbose is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
glimepiride-pioglitazone is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
pioglitazone is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
glimepiride is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
citoglipton is highly correlated with tolbutamide and 29 other fields High correlation
glipizide-metformin is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
glyburide-metformin is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
tolazamide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
rosiglitazone is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
nateglinide is highly correlated with metformin-rosiglitazone and 2 other fields High correlation
number_emergency is highly skewed (γ1 = 22.71034016) Skewed
df_index has unique values Unique
num_procedures has 44574 (45.5%) zeros Zeros
number_outpatient has 81680 (83.3%) zeros Zeros
number_emergency has 86846 (88.6%) zeros Zeros
number_inpatient has 64634 (65.9%) zeros Zeros
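
The Constant, Unique, High cardinality, Skewed and Zeros warnings above can be read as simple threshold rules on per-column summaries. The sketch below is one possible reading, not the profiler's implementation: the `ColumnSummary` class, the `flag_column` function and all three thresholds are hypothetical and were chosen only so that the output agrees with the warnings listed in this report. High correlation warnings are not covered.

```python
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Minimal per-column summary; the field names are hypothetical."""
    name: str
    n: int                 # number of observations
    n_distinct: int        # number of distinct values
    n_zeros: int = 0       # number of zero entries (numeric columns)
    skewness: float = 0.0  # sample skewness (numeric columns)
    categorical: bool = False


def flag_column(col, cardinality_threshold=50, skewness_threshold=20.0,
                zeros_threshold=0.10):
    """Return the warning labels that apply to one column.

    The thresholds are illustrative assumptions, not values read from
    the report's config.yaml.
    """
    flags = []
    if col.n_distinct == 1:
        flags.append("Constant")
    if col.n_distinct == col.n:
        flags.append("Unique")
    if col.categorical and col.n_distinct > cardinality_threshold:
        flags.append("High cardinality")
    if abs(col.skewness) > skewness_threshold:
        flags.append("Skewed")
    if col.n_zeros / col.n > zeros_threshold:
        flags.append("Zeros")
    return flags


N = 98053  # number of observations from the overview
columns = [
    ColumnSummary("examide", N, 1, categorical=True),
    ColumnSummary("diag_1", N, 713, categorical=True),
    ColumnSummary("df_index", N, 98053),
    ColumnSummary("number_emergency", N, 33, n_zeros=86846, skewness=22.71034016),
    ColumnSummary("number_outpatient", N, 39, n_zeros=81680, skewness=8.78170539),
]
for col in columns:
    print(col.name, flag_column(col))
```

With these assumed thresholds, number_outpatient (skewness 8.78) is flagged only for zeros, while number_emergency (skewness 22.71) is flagged as both skewed and zero-heavy, which matches the list above.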

Reproduction

Analysis started: 2021-05-05 21:22:20.387978
Analysis finished: 2021-05-05 21:23:16.876767
Duration: 56.49 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
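
As a quick consistency check, the reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2021-05-05 21:22:20.387978")
finished = datetime.fromisoformat("2021-05-05 21:23:16.876767")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```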

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 98053
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51115.56242
Minimum: 1
Maximum: 101765
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5180.6
Q1: 25575
Median: 51369
Q3: 76379
95-th percentile: 96683.4
Maximum: 101765
Range: 101764
Interquartile range (IQR): 50804

Descriptive statistics

Standard deviation: 29307.25248
Coefficient of variation (CV): 0.573352832
Kurtosis: -1.191414224
Mean: 51115.56242
Median Absolute Deviation (MAD): 25399
Skewness: -0.01478077774
Sum: 5012034242
Variance: 858915047.7
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
2047                  1      < 0.1%
80562                 1      < 0.1%
29339                 1      < 0.1%
19100                 1      < 0.1%
17053                 1      < 0.1%
23198                 1      < 0.1%
21151                 1      < 0.1%
101028                1      < 0.1%
98981                 1      < 0.1%
76464                 1      < 0.1%
Other values (98043)  98043  > 99.9%

Value   Count  Frequency (%)
1       1      < 0.1%
2       1      < 0.1%
3       1      < 0.1%
4       1      < 0.1%
5       1      < 0.1%

Value   Count  Frequency (%)
101765  1      < 0.1%
101764  1      < 0.1%
101763  1      < 0.1%
101762  1      < 0.1%
101761  1      < 0.1%
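
Several of the df_index statistics are derived from others: the range and IQR come from the quantiles, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below recomputes them from the reported figures; the variable names are illustrative, and the small differences in the last digits come from the rounding of the printed inputs.

```python
import math

# Values copied from the df_index statistics above.
n = 98053
minimum, maximum = 1, 101765
q1, q3 = 25575, 76379
mean = 51115.56242
std = 29307.25248
total = 5012034242

derived = {
    "range": maximum - minimum,   # Maximum - Minimum
    "iqr": q3 - q1,               # Q3 - Q1
    "cv": std / mean,             # coefficient of variation
    "variance": std ** 2,         # squared standard deviation
    "mean_from_sum": total / n,   # Sum / number of observations
}
for key, value in derived.items():
    print(key, value)
```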

race
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

Caucasian        75079
AfricanAmerican  18881
Hispanic         1984
Other            1484
Asian            625

Length

Max length: 15
Median length: 9
Mean length: 10.0490857
Min length: 5

Characters and Unicode

Total characters: 985343
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
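
The character statistics for race can be reproduced from the category counts alone. The sketch below is an illustration using Python's standard `unicodedata` module, which exposes the general category of each code point (`Ll` for lowercase letters, `Lu` for uppercase letters); it does not cover scripts or blocks, and the variable names are illustrative.

```python
import unicodedata
from collections import Counter

# Category counts copied from the race variable in this report.
race_counts = {
    "Caucasian": 75079,
    "AfricanAmerican": 18881,
    "Hispanic": 1984,
    "Other": 1484,
    "Asian": 625,
}

# Weight every character of a label by the number of rows carrying that label.
char_counts = Counter()
for label, count in race_counts.items():
    for ch in label:
        char_counts[ch] += count

# Aggregate character counts by Unicode general category.
category_counts = Counter()
for ch, count in char_counts.items():
    category_counts[unicodedata.category(ch)] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(race_counts.values())
print(total_chars, len(char_counts), dict(category_counts), mean_length)
```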

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Caucasian
2nd row: AfricanAmerican
3rd row: Caucasian
4th row: Caucasian
5th row: Caucasian

Value            Count  Frequency (%)
Caucasian        75079  76.6%
AfricanAmerican  18881  19.3%
Hispanic         1984   2.0%
Other            1484   1.5%
Asian            625    0.6%

Histogram of lengths of the category

Value            Count  Frequency (%)
caucasian        75079  76.6%
africanamerican  18881  19.3%
hispanic         1984   2.0%
other            1484   1.5%
asian            625    0.6%

Most occurring characters

Value             Count   Frequency (%)
a                 265608  27.0%
i                 117434  11.9%
n                 115450  11.7%
c                 114825  11.7%
s                 77688   7.9%
C                 75079   7.6%
u                 75079   7.6%
r                 39246   4.0%
A                 38387   3.9%
e                 20365   2.1%
Other values (7)  46182   4.7%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  868409  88.1%
Uppercase Letter  116934  11.9%

Most frequent character per category

Lowercase Letter

Value             Count   Frequency (%)
a                 265608  30.6%
i                 117434  13.5%
n                 115450  13.3%
c                 114825  13.2%
s                 77688   8.9%
u                 75079   8.6%
r                 39246   4.5%
e                 20365   2.3%
f                 18881   2.2%
m                 18881   2.2%
Other values (3)  4952    0.6%

Uppercase Letter

Value  Count  Frequency (%)
C      75079  64.2%
A      38387  32.8%
H      1984   1.7%
O      1484   1.3%

Most occurring scripts

Value  Count   Frequency (%)
Latin  985343  100.0%

Most frequent character per script

Value             Count   Frequency (%)
a                 265608  27.0%
i                 117434  11.9%
n                 115450  11.7%
c                 114825  11.7%
s                 77688   7.9%
C                 75079   7.6%
u                 75079   7.6%
r                 39246   4.0%
A                 38387   3.9%
e                 20365   2.1%
Other values (7)  46182   4.7%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  985343  100.0%

Most frequent character per block

Value             Count   Frequency (%)
a                 265608  27.0%
i                 117434  11.9%
n                 115450  11.7%
c                 114825  11.7%
s                 77688   7.9%
C                 75079   7.6%
u                 75079   7.6%
r                 39246   4.0%
A                 38387   3.9%
e                 20365   2.1%
Other values (7)  46182   4.7%

gender
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

Female           52833
Male             45219
Unknown/Invalid  1

Length

Max length: 15
Median length: 6
Mean length: 5.077753868
Min length: 4

Characters and Unicode

Total characters: 497889
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Female
2nd row: Female
3rd row: Male
4th row: Male
5th row: Male

Value            Count  Frequency (%)
Female           52833  53.9%
Male             45219  46.1%
Unknown/Invalid  1      < 0.1%

Histogram of lengths of the category

Value            Count  Frequency (%)
female           52833  53.9%
male             45219  46.1%
unknown/invalid  1      < 0.1%
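
The Frequency (%) column shows each count as a share of the 98053 observations, with the edge cases "< 0.1%" and "> 99.9%" for shares that would otherwise print as 0.0% or 100.0%. The function below is a sketch of a formatting rule that is consistent with the values shown in this report; `format_share` is a hypothetical name, and the profiler's own formatter is not shown in the report.

```python
def format_share(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place.

    Non-zero shares that round to 0 at three decimals are shown as
    "< 0.1%"; incomplete shares that round to 1 are shown as "> 99.9%".
    """
    share = count / total
    if count > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if count < total and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"


total = 98053  # number of observations
for label, count in [("Female", 52833), ("Male", 45219), ("Unknown/Invalid", 1)]:
    print(label, count, format_share(count, total))
```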

Most occurring characters

Value             Count   Frequency (%)
e                 150885  30.3%
a                 98053   19.7%
l                 98053   19.7%
F                 52833   10.6%
m                 52833   10.6%
M                 45219   9.1%
n                 4       < 0.1%
U                 1       < 0.1%
k                 1       < 0.1%
o                 1       < 0.1%
Other values (6)  6       < 0.1%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   399834  80.3%
Uppercase Letter   98054   19.7%
Other Punctuation  1       < 0.1%

Most frequent character per category

Lowercase Letter

Value  Count   Frequency (%)
e      150885  37.7%
a      98053   24.5%
l      98053   24.5%
m      52833   13.2%
n      4       < 0.1%
k      1       < 0.1%
o      1       < 0.1%
w      1       < 0.1%
v      1       < 0.1%
i      1       < 0.1%

Uppercase Letter

Value  Count  Frequency (%)
F      52833  53.9%
M      45219  46.1%
U      1      < 0.1%
I      1      < 0.1%

Other Punctuation

Value  Count  Frequency (%)
/      1      100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   497888  > 99.9%
Common  1       < 0.1%

Most frequent character per script

Latin

Value             Count   Frequency (%)
e                 150885  30.3%
a                 98053   19.7%
l                 98053   19.7%
F                 52833   10.6%
m                 52833   10.6%
M                 45219   9.1%
n                 4       < 0.1%
U                 1       < 0.1%
k                 1       < 0.1%
o                 1       < 0.1%
Other values (5)  5       < 0.1%

Common

Value  Count  Frequency (%)
/      1      100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  497889  100.0%

Most frequent character per block

Value             Count   Frequency (%)
e                 150885  30.3%
a                 98053   19.7%
l                 98053   19.7%
F                 52833   10.6%
m                 52833   10.6%
M                 45219   9.1%
n                 4       < 0.1%
U                 1       < 0.1%
k                 1       < 0.1%
o                 1       < 0.1%
Other values (6)  6       < 0.1%

age
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

[70-80)           25306
[60-70)           21809
[80-90)           16702
[50-60)           16697
[40-50)           9265
Other values (5)  8274

Length

Max length: 8
Median length: 7
Mean length: 7.027046597
Min length: 6

Characters and Unicode

Total characters: 689023
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: [10-20)
2nd row: [20-30)
3rd row: [30-40)
4th row: [40-50)
5th row: [50-60)

Value     Count  Frequency (%)
[70-80)   25306  25.8%
[60-70)   21809  22.2%
[80-90)   16702  17.0%
[50-60)   16697  17.0%
[40-50)   9265   9.4%
[30-40)   3548   3.6%
[90-100)  2717   2.8%
[20-30)   1478   1.5%
[10-20)   466    0.5%
[0-10)    65     0.1%

Histogram of lengths of the category

Value   Count  Frequency (%)
70-80   25306  25.8%
60-70   21809  22.2%
80-90   16702  17.0%
50-60   16697  17.0%
40-50   9265   9.4%
30-40   3548   3.6%
90-100  2717   2.8%
20-30   1478   1.5%
10-20   466    0.5%
0-10    65     0.1%

Most occurring characters

Value             Count   Frequency (%)
0                 198823  28.9%
[                 98053   14.2%
-                 98053   14.2%
)                 98053   14.2%
7                 47115   6.8%
8                 42008   6.1%
6                 38506   5.6%
5                 25962   3.8%
9                 19419   2.8%
4                 12813   1.9%
Other values (3)  10218   1.5%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     394864  57.3%
Open Punctuation   98053   14.2%
Dash Punctuation   98053   14.2%
Close Punctuation  98053   14.2%

Most frequent character per category

Decimal Number

Value  Count   Frequency (%)
0      198823  50.4%
7      47115   11.9%
8      42008   10.6%
6      38506   9.8%
5      25962   6.6%
9      19419   4.9%
4      12813   3.2%
3      5026    1.3%
1      3248    0.8%
2      1944    0.5%

Open Punctuation

Value  Count  Frequency (%)
[      98053  100.0%

Dash Punctuation

Value  Count  Frequency (%)
-      98053  100.0%

Close Punctuation

Value  Count  Frequency (%)
)      98053  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  689023  100.0%

Most frequent character per script

Value             Count   Frequency (%)
0                 198823  28.9%
[                 98053   14.2%
-                 98053   14.2%
)                 98053   14.2%
7                 47115   6.8%
8                 42008   6.1%
6                 38506   5.6%
5                 25962   3.8%
9                 19419   2.8%
4                 12813   1.9%
Other values (3)  10218   1.5%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  689023  100.0%

Most frequent character per block

Value             Count   Frequency (%)
0                 198823  28.9%
[                 98053   14.2%
-                 98053   14.2%
)                 98053   14.2%
7                 47115   6.8%
8                 42008   6.1%
6                 38506   5.6%
5                 25962   3.8%
9                 19419   2.8%
4                 12813   1.9%
Other values (3)  10218   1.5%

admission_type_id
Real number (ℝ≥0)

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.025812571
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 3
95-th percentile: 6
Maximum: 8
Range: 7
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.450117239
Coefficient of variation (CV): 0.7158200418
Kurtosis: 1.912034312
Mean: 2.025812571
Median Absolute Deviation (MAD): 0
Skewness: 1.587030686
Sum: 198637
Variance: 2.102840007
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value  Count  Frequency (%)
1      52178  53.2%
3      18194  18.6%
2      17543  17.9%
6      5135   5.2%
5      4661   4.8%
8      312    0.3%
7      20     < 0.1%
4      10     < 0.1%

Value  Count  Frequency (%)
1      52178  53.2%
2      17543  17.9%
3      18194  18.6%
4      10     < 0.1%
5      4661   4.8%

Value  Count  Frequency (%)
8      312    0.3%
7      20     < 0.1%
6      5135   5.2%
5      4661   4.8%
4      10     < 0.1%

discharge_disposition_id
Real number (ℝ≥0)

Distinct: 26
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.753368076
Minimum: 1
Maximum: 28
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 18
Maximum: 28
Range: 27
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 5.309391793
Coefficient of variation (CV): 1.414567313
Kurtosis: 5.838277851
Mean: 3.753368076
Median Absolute Deviation (MAD): 0
Skewness: 2.534770768
Sum: 368029
Variance: 28.18964121
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)
Value              Count  Frequency (%)
1                  57610  58.8%
3                  13564  13.8%
6                  12626  12.9%
18                 3624   3.7%
2                  2049   2.1%
22                 1970   2.0%
11                 1606   1.6%
5                  1127   1.1%
25                 941    1.0%
4                  756    0.8%
Other values (16)  2180   2.2%

Value  Count  Frequency (%)
1      57610  58.8%
2      2049   2.1%
3      13564  13.8%
4      756    0.8%
5      1127   1.1%

Value  Count  Frequency (%)
28     137    0.1%
27     5      < 0.1%
25     941    1.0%
24     48     < 0.1%
23     400    0.4%

admission_source_id
Real number (ℝ≥0)

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.776692197
Minimum: 1
Maximum: 25
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 7
Q3: 7
95-th percentile: 17
Maximum: 25
Range: 24
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.071639573
Coefficient of variation (CV): 0.7048392807
Kurtosis: 1.733779297
Mean: 5.776692197
Median Absolute Deviation (MAD): 0
Skewness: 1.027278586
Sum: 566422
Variance: 16.57824881
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value             Count  Frequency (%)
7                 55951  57.1%
1                 28356  28.9%
17                6602   6.7%
4                 2945   3.0%
6                 1893   1.9%
2                 1031   1.1%
5                 846    0.9%
3                 179    0.2%
20                160    0.2%
9                 49     < 0.1%
Other values (7)  41     < 0.1%

Value  Count  Frequency (%)
1      28356  28.9%
2      1031   1.1%
3      179    0.2%
4      2945   3.0%
5      846    0.9%

Value  Count  Frequency (%)
25     2      < 0.1%
22     12     < 0.1%
20     160    0.2%
17     6602   6.7%
14     2      < 0.1%

time_in_hospital
Real number (ℝ≥0)

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.42197587
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 11
Maximum: 14
Range: 13
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.993074463
Coefficient of variation (CV): 0.6768635902
Kurtosis: 0.817949426
Mean: 4.42197587
Median Absolute Deviation (MAD): 2
Skewness: 1.123569649
Sum: 433588
Variance: 8.958494742
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value             Count  Frequency (%)
3                 17049  17.4%
2                 16441  16.8%
1                 13490  13.8%
4                 13434  13.7%
5                 9699   9.9%
6                 7320   7.5%
7                 5694   5.8%
8                 4276   4.4%
9                 2928   3.0%
10                2287   2.3%
Other values (4)  5435   5.5%

Value  Count  Frequency (%)
1      13490  13.8%
2      16441  16.8%
3      17049  17.4%
4      13434  13.7%
5      9699   9.9%

Value  Count  Frequency (%)
14     1017   1.0%
13     1185   1.2%
12     1424   1.5%
11     1809   1.8%
10     2287   2.3%
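
For time_in_hospital the frequency tables above list all 14 distinct values (the four "Other values" are 11 to 14), so the summary statistics can be recomputed from the counts. The sketch below does this with the standard library only. The `quantile` helper is a simplified assumption: it returns the smallest value whose cumulative count reaches the requested share, which coincides with the reported quantiles here but is not an interpolating estimator.

```python
# Full frequency table for time_in_hospital, copied from the tables above.
freq = {
    1: 13490, 2: 16441, 3: 17049, 4: 13434, 5: 9699, 6: 7320, 7: 5694,
    8: 4276, 9: 2928, 10: 2287, 11: 1809, 12: 1424, 13: 1185, 14: 1017,
}

n = sum(freq.values())                        # number of observations
total = sum(v * c for v, c in freq.items())   # the "Sum" statistic
mean = total / n
# Sample variance (divides by n - 1).
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)


def quantile(q: float) -> int:
    """Smallest value whose cumulative count reaches q * n (no interpolation)."""
    target = q * n
    running = 0
    for value in sorted(freq):
        running += freq[value]
        if running >= target:
            return value
    return max(freq)


print(n, total, mean, variance)
print([quantile(q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)])
```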

num_lab_procedures
Real number (ℝ≥0)

Distinct: 118
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.14807298
Minimum: 1
Maximum: 132
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 31
Median: 44
Q3: 57
95-th percentile: 73
Maximum: 132
Range: 131
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 19.71203294
Coefficient of variation (CV): 0.4568461945
Kurtosis: -0.2451976515
Mean: 43.14807298
Median Absolute Deviation (MAD): 13
Skewness: -0.2355346172
Sum: 4230798
Variance: 388.5642427
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                   3096   3.2%
43                  2724   2.8%
44                  2414   2.5%
45                  2306   2.4%
38                  2131   2.2%
46                  2120   2.2%
40                  2113   2.2%
41                  2046   2.1%
42                  2031   2.1%
47                  2028   2.1%
Other values (108)  75044  76.5%

Value  Count  Frequency (%)
1      3096   3.2%
2      1062   1.1%
3      647    0.7%
4      364    0.4%
5      277    0.3%

Value  Count  Frequency (%)
132    1      < 0.1%
129    1      < 0.1%
126    1      < 0.1%
121    1      < 0.1%
120    1      < 0.1%

num_procedures
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.350749085
Minimum: 0
Maximum: 6
Zeros: 44574
Zeros (%): 45.5%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.708505881
Coefficient of variation (CV): 1.264858071
Kurtosis: 0.8236555027
Mean: 1.350749085
Median Absolute Deviation (MAD): 1
Skewness: 1.303916989
Sum: 132445
Variance: 2.918992346
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
0      44574  45.5%
1      20029  20.4%
2      12383  12.6%
3      9210   9.4%
6      4811   4.9%
4      4076   4.2%
5      2970   3.0%

Value  Count  Frequency (%)
0      44574  45.5%
1      20029  20.4%
2      12383  12.6%
3      9210   9.4%
4      4076   4.2%

Value  Count  Frequency (%)
6      4811   4.9%
5      2970   3.0%
4      4076   4.2%
3      9210   9.4%
2      12383  12.6%
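
Because all seven values of num_procedures appear in the tables above, the skewness can be recomputed from the counts. The sketch below uses the bias-adjusted Fisher-Pearson coefficient; that this is the estimator behind the reported figure is an assumption, although at this sample size the adjustment changes the result only far beyond the printed precision.

```python
import math

# Full frequency table for num_procedures, copied from the tables above.
freq = {0: 44574, 1: 20029, 2: 12383, 3: 9210, 4: 4076, 5: 2970, 6: 4811}

n = sum(freq.values())
mean = sum(v * c for v, c in freq.items()) / n

# Second and third central moments (population form, dividing by n).
m2 = sum(c * (v - mean) ** 2 for v, c in freq.items()) / n
m3 = sum(c * (v - mean) ** 3 for v, c in freq.items()) / n

g1 = m3 / m2 ** 1.5                                # moment coefficient of skewness
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)   # bias-adjusted version

zero_share = freq[0] / n
print(round(skewness, 4), f"{zero_share * 100:.1f}% zeros")
```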

num_medications
Real number (ℝ≥0)

Distinct: 75
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.11964958
Minimum: 1
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 11
Median: 15
Q3: 20
95-th percentile: 31
Maximum: 81
Range: 80
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 8.108475918
Coefficient of variation (CV): 0.5030181257
Kurtosis: 3.493505174
Mean: 16.11964958
Median Absolute Deviation (MAD): 5
Skewness: 1.332695065
Sum: 1580580
Variance: 65.74738171
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
13                 5885   6.0%
12                 5816   5.9%
15                 5621   5.7%
11                 5592   5.7%
14                 5520   5.6%
16                 5271   5.4%
10                 5167   5.3%
17                 4783   4.9%
9                  4711   4.8%
18                 4399   4.5%
Other values (65)  45288  46.2%

Value  Count  Frequency (%)
1      236    0.2%
2      397    0.4%
3      785    0.8%
4      1269   1.3%
5      1835   1.9%

Value  Count  Frequency (%)
81     1      < 0.1%
79     1      < 0.1%
75     2      < 0.1%
74     1      < 0.1%
72     3      < 0.1%

number_outpatient
Real number (ℝ≥0)

ZEROS

Distinct: 39
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3763780812
Minimum: 0
Maximum: 42
Zeros: 81680
Zeros (%): 83.3%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 42
Range: 42
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.28335944
Coefficient of variation (CV): 3.409761365
Kurtosis: 145.5912818
Mean: 0.3763780812
Median Absolute Deviation (MAD): 0
Skewness: 8.78170539
Sum: 36905
Variance: 1.647011452
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=39)
Value              Count  Frequency (%)
0                  81680  83.3%
1                  8340   8.5%
2                  3514   3.6%
3                  2005   2.0%
4                  1078   1.1%
5                  521    0.5%
6                  297    0.3%
7                  153    0.2%
8                  98     0.1%
9                  83     0.1%
Other values (29)  284    0.3%

Value  Count  Frequency (%)
0      81680  83.3%
1      8340   8.5%
2      3514   3.6%
3      2005   2.0%
4      1078   1.1%

Value  Count  Frequency (%)
42     1      < 0.1%
40     1      < 0.1%
39     1      < 0.1%
38     1      < 0.1%
37     1      < 0.1%

number_emergency
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 33
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2024619339
Minimum: 0
Maximum: 76
Zeros: 86846
Zeros (%): 88.6%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 76
Range: 76
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.94289229
Coefficient of variation (CV): 4.657133675
Kurtosis: 1171.637565
Mean: 0.2024619339
Median Absolute Deviation (MAD): 0
Skewness: 22.71034016
Sum: 19852
Variance: 0.8890458704
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=33)
Value              Count  Frequency (%)
0                  86846  88.6%
1                  7550   7.7%
2                  2011   2.1%
3                  716    0.7%
4                  372    0.4%
5                  190    0.2%
6                  93     0.1%
7                  72     0.1%
8                  50     0.1%
10                 34     < 0.1%
Other values (23)  119    0.1%

Value  Count  Frequency (%)
0      86846  88.6%
1      7550   7.7%
2      2011   2.1%
3      716    0.7%
4      372    0.4%

Value  Count  Frequency (%)
76     1      < 0.1%
64     1      < 0.1%
63     1      < 0.1%
54     1      < 0.1%
46     1      < 0.1%

number_inpatient
Real number (ℝ≥0)

ZEROS

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6468644509
Minimum: 0
Maximum: 21
Zeros: 64634
Zeros (%): 65.9%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 21
Range: 21
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.271020492
Coefficient of variation (CV): 1.964894639
Kurtosis: 19.94332538
Mean: 0.6468644509
Median Absolute Deviation (MAD): 0
Skewness: 3.554829592
Sum: 63427
Variance: 1.61549309
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Value              Count  Frequency (%)
0                  64634  65.9%
1                  19067  19.4%
2                  7421   7.6%
3                  3346   3.4%
4                  1597   1.6%
5                  802    0.8%
6                  474    0.5%
7                  266    0.3%
8                  147    0.1%
9                  111    0.1%
Other values (10)  188    0.2%

Value  Count  Frequency (%)
0      64634  65.9%
1      19067  19.4%
2      7421   7.6%
3      3346   3.4%
4      1597   1.6%

Value  Count  Frequency (%)
21     1      < 0.1%
19     2      < 0.1%
18     1      < 0.1%
16     5      < 0.1%
15     8      < 0.1%

diag_1
Categorical

HIGH CARDINALITY

Distinct: 713
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

428                 6730
414                 6374
786                 3900
410                 3514
486                 3412
Other values (708)  74123

Length

Max length: 6
Median length: 3
Mean length: 3.162167399
Min length: 1

Characters and Unicode

Total characters: 310060
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 87
Unique (%): 0.1%

Sample

1st row: 276
2nd row: 648
3rd row: 8
4th row: 197
5th row: 414

Value               Count  Frequency (%)
428                 6730   6.9%
414                 6374   6.5%
786                 3900   4.0%
410                 3514   3.6%
486                 3412   3.5%
427                 2701   2.8%
491                 2210   2.3%
715                 2073   2.1%
434                 1983   2.0%
780                 1976   2.0%
Other values (703)  63180  64.4%

Histogram of lengths of the category

Value               Count  Frequency (%)
428                 6730   6.9%
414                 6374   6.5%
786                 3900   4.0%
410                 3514   3.6%
486                 3412   3.5%
427                 2701   2.8%
491                 2210   2.3%
715                 2073   2.1%
434                 1983   2.0%
780                 1976   2.0%
Other values (703)  63180  64.4%

Most occurring characters

Value             Count  Frequency (%)
4                 53841  17.4%
2                 37924  12.2%
8                 36767  11.9%
5                 35509  11.5%
7                 27739  8.9%
1                 26791  8.6%
0                 23459  7.6%
6                 22453  7.2%
9                 19352  6.2%
3                 16850  5.4%
Other values (3)  9375   3.0%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     300685  97.0%
Other Punctuation  7774    2.5%
Uppercase Letter   1601    0.5%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
4      53841  17.9%
2      37924  12.6%
8      36767  12.2%
5      35509  11.8%
7      27739  9.2%
1      26791  8.9%
0      23459  7.8%
6      22453  7.5%
9      19352  6.4%
3      16850  5.6%

Uppercase Letter

Value  Count  Frequency (%)
V      1600   99.9%
E      1      0.1%

Other Punctuation

Value  Count  Frequency (%)
.      7774   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  308459  99.5%
Latin   1601    0.5%

Most frequent character per script

Common

Value  Count  Frequency (%)
4      53841  17.5%
2      37924  12.3%
8      36767  11.9%
5      35509  11.5%
7      27739  9.0%
1      26791  8.7%
0      23459  7.6%
6      22453  7.3%
9      19352  6.3%
3      16850  5.5%

Latin

Value  Count  Frequency (%)
V      1600   99.9%
E      1      0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  310060  100.0%

Most frequent character per block

Value             Count  Frequency (%)
4                 53841  17.4%
2                 37924  12.2%
8                 36767  11.9%
5                 35509  11.5%
7                 27739  8.9%
1                 26791  8.6%
0                 23459  7.6%
6                 22453  7.2%
9                 19352  6.2%
3                 16850  5.4%
Other values (3)  9375   3.0%

diag_2
Categorical

HIGH CARDINALITY

Distinct: 740
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

428                 6517
276                 6513
250                 5412
427                 4919
401                 3613
Other values (735)  71079

Length

Max length: 6
Median length: 3
Mean length: 3.172274178
Min length: 1

Characters and Unicode

Total characters: 311051
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 120
Unique (%): 0.1%

Sample

1st row: 250.01
2nd row: 250
3rd row: 250.43
4th row: 157
5th row: 411

Value               Count  Frequency (%)
428                 6517   6.6%
276                 6513   6.6%
250                 5412   5.5%
427                 4919   5.0%
401                 3613   3.7%
496                 3233   3.3%
599                 3225   3.3%
403                 2781   2.8%
414                 2574   2.6%
411                 2496   2.5%
Other values (730)  56770  57.9%

Histogram of lengths of the category

Value               Count  Frequency (%)
428                 6517   6.6%
276                 6513   6.6%
250                 5412   5.5%
427                 4919   5.0%
401                 3613   3.7%
496                 3233   3.3%
599                 3225   3.3%
403                 2781   2.8%
414                 2574   2.6%
411                 2496   2.5%
Other values (730)  56770  57.9%

Most occurring characters

Value             Count  Frequency (%)
4                 49919  16.0%
2                 47802  15.4%
5                 36582  11.8%
0                 32414  10.4%
8                 27942  9.0%
7                 27749  8.9%
1                 25358  8.2%
9                 21289  6.8%
6                 19412  6.2%
3                 13683  4.4%
Other values (3)  8901   2.9%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     302150  97.1%
Other Punctuation  6450    2.1%
Uppercase Letter   2451    0.8%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
4      49919  16.5%
2      47802  15.8%
5      36582  12.1%
0      32414  10.7%
8      27942  9.2%
7      27749  9.2%
1      25358  8.4%
9      21289  7.0%
6      19412  6.4%
3      13683  4.5%

Uppercase Letter

Value  Count  Frequency (%)
V      1735   70.8%
E      716    29.2%

Other Punctuation

Value  Count  Frequency (%)
.      6450   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  308600  99.2%
Latin   2451    0.8%

Most frequent character per script

Common

Value  Count  Frequency (%)
4      49919  16.2%
2      47802  15.5%
5      36582  11.9%
0      32414  10.5%
8      27942  9.1%
7      27749  9.0%
1      25358  8.2%
9      21289  6.9%
6      19412  6.3%
3      13683  4.4%

Latin

Value  Count  Frequency (%)
V      1735   70.8%
E      716    29.2%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  311051  100.0%

Most frequent character per block

Value             Count  Frequency (%)
4                 49919  16.0%
2                 47802  15.4%
5                 36582  11.8%
0                 32414  10.4%
8                 27942  9.0%
7                 27749  8.9%
1                 25358  8.2%
9                 21289  6.8%
6                 19412  6.2%
3                 13683  4.4%
Other values (3)  8901   2.9%

diag_3
Categorical

HIGH CARDINALITY

Distinct: 786
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

250                 11208
401                 8090
276                 5097
428                 4491
427                 3865
Other values (781)  65302

Length

Max length: 6
Median length: 3
Mean length: 3.142188408
Min length: 1

Characters and Unicode

Total characters: 308101
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 126
Unique (%): 0.1%

Sample

1st row: 255
2nd row: V27
3rd row: 403
4th row: 250
5th row: 250

Value               Count  Frequency (%)
250                 11208  11.4%
401                 8090   8.3%
276                 5097   5.2%
428                 4491   4.6%
427                 3865   3.9%
414                 3567   3.6%
496                 2552   2.6%
403                 2322   2.4%
585                 1949   2.0%
272                 1910   1.9%
Other values (776)  53002  54.1%

Histogram of lengths of the category

Value               Count  Frequency (%)
250                 11208  11.4%
401                 8090   8.3%
276                 5097   5.2%
428                 4491   4.6%
427                 3865   3.9%
414                 3567   3.6%
496                 2552   2.6%
403                 2322   2.4%
585                 1949   2.0%
272                 1910   1.9%
Other values (776)  53002  54.1%

Most occurring characters

Value             Count  Frequency (%)
2                 50082  16.3%
4                 48157  15.6%
5                 40276  13.1%
0                 38773  12.6%
7                 25936  8.4%
1                 24108  7.8%
8                 23281  7.6%
9                 16938  5.5%
6                 16112  5.2%
3                 13976  4.5%
Other values (3)  10462  3.4%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     297639  96.6%
Other Punctuation  5488    1.8%
Uppercase Letter   4974    1.6%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
2      50082  16.8%
4      48157  16.2%
5      40276  13.5%
0      38773  13.0%
7      25936  8.7%
1      24108  8.1%
8      23281  7.8%
9      16938  5.7%
6      16112  5.4%
3      13976  4.7%

Uppercase Letter

Value  Count  Frequency (%)
V      3757   75.5%
E      1217   24.5%

Other Punctuation

Value  Count  Frequency (%)
.      5488   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  303127  98.4%
Latin   4974    1.6%

Most frequent character per script

Common

Value  Count  Frequency (%)
2      50082  16.5%
4      48157  15.9%
5      40276  13.3%
0      38773  12.8%
7      25936  8.6%
1      24108  8.0%
8      23281  7.7%
9      16938  5.6%
6      16112  5.3%
3      13976  4.6%

Latin

Value  Count  Frequency (%)
V      3757   75.5%
E      1217   24.5%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  308101  100.0%

Most frequent character per block

Value             Count  Frequency (%)
2                 50082  16.3%
4                 48157  15.6%
5                 40276  13.1%
0                 38773  12.6%
7                 25936  8.4%
1                 24108  7.8%
8                 23281  7.6%
9                 16938  5.5%
6                 16112  5.2%
3                 13976  4.5%
Other values (3)  10462  3.4%

number_diagnoses
Real number (ℝ≥0)

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.512059804
Minimum: 3
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 766.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 4
Q1: 6
Median: 8
Q3: 9
95-th percentile: 9
Maximum: 16
Range: 13
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.832496822
Coefficient of variation (CV): 0.2439406593
Kurtosis: -0.3451201144
Mean: 7.512059804
Median Absolute Deviation (MAD): 1
Skewness: -0.8175023377
Sum: 736580
Variance: 3.358044602
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value             Count  Frequency (%)
9                 48687  49.7%
5                 10592  10.8%
8                 10388  10.6%
7                 10179  10.4%
6                 9988   10.2%
4                 5361   5.5%
3                 2751   2.8%
16                40     < 0.1%
13                16     < 0.1%
10                16     < 0.1%
Other values (4)  35     < 0.1%

Value  Count  Frequency (%)
3      2751   2.8%
4      5361   5.5%
5      10592  10.8%
6      9988   10.2%
7      10179  10.4%

Value  Count  Frequency (%)
16     40     < 0.1%
15     8      < 0.1%
14     7      < 0.1%
13     16     < 0.1%
12     9      < 0.1%

max_glu_serum
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

None  92845
Norm  2532
>200  1449
>300  1227

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 392212
Distinct characters: 10
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None
2nd row: None
3rd row: None
4th row: None
5th row: None

Value  Count  Frequency (%)
None   92845  94.7%
Norm   2532   2.6%
>200   1449   1.5%
>300   1227   1.3%

Histogram of lengths of the category

Value  Count  Frequency (%)
none   92845  94.7%
norm   2532   2.6%
200    1449   1.5%
300    1227   1.3%

Most occurring characters

Value  Count  Frequency (%)
N      95377  24.3%
o      95377  24.3%
n      92845  23.7%
e      92845  23.7%
0      5352   1.4%
>      2676   0.7%
r      2532   0.6%
m      2532   0.6%
2      1449   0.4%
3      1227   0.3%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  286131  73.0%
Uppercase Letter  95377   24.3%
Decimal Number    8028    2.0%
Math Symbol       2676    0.7%

Most frequent character per category

Lowercase Letter

Value  Count  Frequency (%)
o      95377  33.3%
n      92845  32.4%
e      92845  32.4%
r      2532   0.9%
m      2532   0.9%

Decimal Number

Value  Count  Frequency (%)
0      5352   66.7%
2      1449   18.0%
3      1227   15.3%

Uppercase Letter

Value  Count  Frequency (%)
N      95377  100.0%

Math Symbol

Value  Count  Frequency (%)
>      2676   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   381508  97.3%
Common  10704   2.7%

Most frequent character per script

Latin

Value  Count  Frequency (%)
N      95377  25.0%
o      95377  25.0%
n      92845  24.3%
e      92845  24.3%
r      2532   0.7%
m      2532   0.7%

Common

Value  Count  Frequency (%)
0      5352   50.0%
>      2676   25.0%
2      1449   13.5%
3      1227   11.5%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  392212  100.0%

Most frequent character per block

Value  Count  Frequency (%)
N      95377  24.3%
o      95377  24.3%
n      92845  23.7%
e      92845  23.7%
0      5352   1.4%
>      2676   0.7%
r      2532   0.6%
m      2532   0.6%
2      1449   0.4%
3      1227   0.3%

A1Cresult
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 766.2 KiB

None  81860
>8    7631
Norm  4854
>7    3708

Length

Max length: 4
Median length: 4
Mean length: 3.768716918
Min length: 2

Characters and Unicode

Total characters: 369534
Distinct characters: 9
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None
2nd row: None
3rd row: None
4th row: None
5th row: None

Value  Count  Frequency (%)
None   81860  83.5%
>8     7631   7.8%
Norm   4854   5.0%
>7     3708   3.8%

Histogram of lengths of the category