---
output: github_document
bibliography: paper.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  fig.width = 7.5,
  fig.height = 4
)
```
# memr
```{r setup, include = FALSE}
library(memr)
```
## Medical records embeddings
The ``memr`` (Multisource Embeddings for Medical Records) package for R creates embeddings, i.e. vector
representations, of free-text medical records written by doctors. It also
provides a range of tools for data visualization and for the segmentation of
medical visits. These tools aim to support computer-aided medicine by
facilitating the analysis and interpretation of medical data. The package can be used in
applications such as predicting recommendations or clustering patients, which
can aid doctors in their practice.
## Installation & Dependencies
``memr`` is written in R and is based on the following packages:
* dplyr
* ggplot2
* ggrepel
* Rtsne
* text2vec
To install ``memr``, run the following in an R console (the `devtools` package must be installed first, e.g. with `install.packages('devtools')`):
```{r, eval = FALSE}
devtools::install_git("https://github.com/MI2DataLab/memr")
```
## Usage
### Example datasets
We demonstrate the package on example datasets. They are entirely artificial, but their structure reflects the structure of real data collected from Polish health centers. The results of the research on the real data are described in @dobrakowski2019patients.

For every visit, we may have the ICD-10 code of the diagnosed disease
and the ID and specialty of the doctor:
```{r}
knitr::kable(visits)
```
For the visits we also have the descriptions of the interviews,
together with the extracted medical terms:
```{r}
knitr::kable(interviews)
```
Descriptions of examinations of patients:
```{r}
knitr::kable(examinations)
```
And descriptions of recommendations prescribed by doctors to the patients:
```{r}
knitr::kable(recommendations)
```
Each medical term has one or more categories:
```{r}
knitr::kable(terms_categories)
```
### Medical terms embeddings
First, we compute the embeddings of terms:
```{r}
embedding_size <- 5
interview_term_vectors <- embed_terms(merged_terms = interviews,
                                      embedding_size = embedding_size,
                                      term_count_min = 1L)
examination_term_vectors <- embed_terms(merged_terms = examinations,
                                        embedding_size = embedding_size,
                                        term_count_min = 1L)
knitr::kable(interview_term_vectors[1:5, ])
```
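To give an intuition for what a term embedding is, here is a minimal base R sketch. It is not the implementation of `embed_terms`: it assumes a hypothetical toy set of comma-separated descriptions (`toy_descriptions`), counts how often two terms appear in the same description, and factorizes the log-scaled counts with SVD. All `toy_*` names are made up for illustration.

```r
# Toy descriptions (hypothetical data, not shipped with memr)
toy_descriptions <- c("cough, fever", "cough, fever, headache",
                      "knee, pain", "knee, pain, swelling")

# Split each description into its terms and collect the vocabulary
term_lists <- strsplit(toy_descriptions, ", ", fixed = TRUE)
vocab <- sort(unique(unlist(term_lists)))

# Symmetric term-term co-occurrence counts within a description
cooc <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
for (terms in term_lists) {
  cooc[terms, terms] <- cooc[terms, terms] + 1
}
diag(cooc) <- 0

# Factorize the log-scaled counts and keep the first two dimensions
toy_embedding_size <- 2
decomposition <- svd(log1p(cooc))
toy_term_vectors <- decomposition$u[, 1:toy_embedding_size] %*%
  diag(sqrt(decomposition$d[1:toy_embedding_size]))
rownames(toy_term_vectors) <- vocab

round(toy_term_vectors, 2)
```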
Terms from the chosen category can be visualized:
```{r}
visualize_term_embeddings(terms_categories, interview_term_vectors, c("anatomic"), method = "PCA")
```
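The `method = "PCA"` argument suggests that the term vectors are projected onto two dimensions before plotting. The sketch below shows such a projection with `prcomp` from base R on hypothetical random vectors (`toy_vectors`); it is an illustration of the idea, not the code of `visualize_term_embeddings`. The resulting `coords` could then be drawn with `plot()` and `text()`.

```r
# Hypothetical 5-dimensional vectors for six terms (random, for illustration only)
set.seed(1)
toy_vectors <- matrix(
  rnorm(6 * 5), nrow = 6,
  dimnames = list(c("head", "neck", "knee", "arm", "leg", "back"), NULL)
)

# Project the centered vectors onto the first two principal components
pca <- prcomp(toy_vectors, center = TRUE, scale. = FALSE)
coords <- pca$x[, 1:2]

round(coords, 2)
```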
To validate the quality of the embeddings,
we can perform the term analogy task
(see `?analogy_task` for details). The package provides
an analogy test set.
```{r}
knitr::kable(evaluate_term_embeddings(examination_term_vectors, n = 5, terms_pairs_test))
```
For each type of analogy we compute the mean accuracy.
Analogies can be plotted to see if
the connection lines are parallel:
```{r}
visualize_analogies(examination_term_vectors, terms_pairs_test$person, find_analogies = TRUE, n = 10)
```
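The idea behind an analogy task can be sketched with plain vector arithmetic: for a pair "a : b" and a term "c", the answer is the term whose vector is most similar to `b - a + c`. The example below uses hypothetical 2-dimensional vectors and a hypothetical helper `cosine_to_rows`; it does not reproduce how `evaluate_term_embeddings` scores the test set.

```r
# Cosine similarity between a vector and every row of a matrix
cosine_to_rows <- function(v, m) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# Hypothetical 2-d embeddings in which the "pediatric" offset is constant
toy_vectors <- rbind(
  cardiologist = c(1, 0),
  pediatric_cardiologist = c(1, 1),
  neurologist = c(3, 0),
  pediatric_neurologist = c(3, 1),
  surgeon = c(5, 0.2)
)

# Solve "cardiologist : pediatric_cardiologist = neurologist : ?"
query <- toy_vectors["pediatric_cardiologist", ] -
  toy_vectors["cardiologist", ] + toy_vectors["neurologist", ]

# Rank the remaining terms, excluding the three terms used in the query
candidates <- setdiff(
  rownames(toy_vectors),
  c("cardiologist", "pediatric_cardiologist", "neurologist")
)
similarities <- cosine_to_rows(query, toy_vectors[candidates, , drop = FALSE])
names(similarities) <- candidates
best_match <- names(which.max(similarities))
best_match
```

In this toy setup the connection lines of both pairs are exactly parallel, which is the pattern one hopes to see in the analogy plot.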
### Visits embeddings
Having the embeddings of terms, we can compute
embeddings of visits:
```{r}
visits_vectors <- embed_list_visits(interviews, examinations, interview_term_vectors, examination_term_vectors)
knitr::kable(visits_vectors[1:5, ])
```
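One simple way to turn term vectors into a vector for a whole visit is to average the vectors of the terms in each description and to concatenate the interview and examination averages. Whether `embed_list_visits` does exactly this is not stated here, so the following is only a sketch under that assumption; `average_terms` and the `toy_*` objects are hypothetical.

```r
# Hypothetical term vectors (rows = terms) for interviews and examinations
toy_interview_vectors <- rbind(cough = c(1, 0), headache = c(0, 1))
toy_examination_vectors <- rbind(fever = c(2, 2), rash = c(0, 4))

# Average the vectors of the comma-separated terms found in the vocabulary
average_terms <- function(description, term_vectors) {
  terms <- strsplit(description, ", ", fixed = TRUE)[[1]]
  known <- intersect(terms, rownames(term_vectors))
  if (length(known) == 0) {
    return(rep(NA_real_, ncol(term_vectors)))
  }
  colMeans(term_vectors[known, , drop = FALSE])
}

# Visit vector = interview average followed by examination average
toy_visit_vector <- c(
  average_terms("cough, headache", toy_interview_vectors),
  average_terms("fever", toy_examination_vectors)
)
toy_visit_vector
```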
Now we can plot the visits and color them by the doctors' IDs:
```{r}
visualize_visit_embeddings(visits_vectors, visits, color_by = "doctor",
                           spec = "internist")
```
or by ICD-10 code:
```{r}
visualize_visit_embeddings(visits_vectors, visits, color_by = "icd10",
                           spec = "internist")
```
### Clustering
On the visits' embeddings we can run the k-means algorithm:
```{r}
clusters <- cluster_visits(visits_vectors, visits, spec = "internist", cluster_number = 2)
```
and plot the clusters:
```{r}
visualize_visit_embeddings(visits_vectors, visits, color_by = "cluster",
                           spec = "internist", clusters = clusters)
```
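For readers unfamiliar with k-means, the sketch below runs `kmeans` from the base `stats` package on hypothetical 2-dimensional visit vectors (`toy_visits`). It only illustrates the algorithm; the internals and return value of `cluster_visits` are not shown here and may differ.

```r
# Hypothetical 2-d visit vectors forming two well-separated groups
toy_visits <- rbind(
  c(0.0, 0.1), c(0.2, 0.0), c(0.1, 0.2),
  c(5.0, 5.1), c(5.2, 4.9), c(4.9, 5.0)
)
rownames(toy_visits) <- paste0("visit_", 1:6)

# k-means from the stats package; nstart > 1 reduces sensitivity to initialization
set.seed(42)
toy_clusters <- kmeans(toy_visits, centers = 2, nstart = 10)

toy_clusters$cluster   # cluster label per visit
toy_clusters$centers   # one centroid per cluster
```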
For every cluster we can see the most
frequent recommendations from chosen categories:
```{r}
rec_tables <- get_cluster_recommendations(recommendations, clusters,
                                          category = "recommendation",
                                          recom_table = terms_categories)
rec_tables
```
or from all categories:
```{r}
rec_tables <- get_cluster_recommendations(recommendations, clusters, category = "all")
rec_tables
```
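Conceptually, such tables are frequency counts of recommendation terms within each cluster. The base R sketch below computes them for hypothetical data frames (`toy_assignments`, `toy_recommendations`) with made-up column names; the actual input and output formats of `get_cluster_recommendations` may differ.

```r
# Hypothetical cluster assignments and recommendation terms per visit
toy_assignments <- data.frame(
  visit_id = 1:5,
  cluster = c(1, 1, 1, 2, 2)
)
toy_recommendations <- data.frame(
  visit_id = 1:5,
  terms = c("rest, fluids", "rest", "rest, antipyretic", "x-ray", "x-ray, rest"),
  stringsAsFactors = FALSE
)

merged <- merge(toy_assignments, toy_recommendations, by = "visit_id")

# For each cluster, count the terms and sort them by decreasing frequency
toy_rec_tables <- lapply(split(merged$terms, merged$cluster), function(descriptions) {
  terms <- unlist(strsplit(descriptions, ", ", fixed = TRUE))
  sort(table(terms), decreasing = TRUE)
})
toy_rec_tables
```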
If we have a new visit, we can assign it
to the most appropriate cluster:
```{r}
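# Use collapse (not sep) so that a vector of several terms is joined into one string
inter_descr <- paste(c("cough"), collapse = ", ")
exam_descr <- paste(c("fever"), collapse = ", ")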
visit_description <- c(inter_descr, exam_descr)
names(visit_description) <- c("inter", "exam")
cl <- assign_visit_to_cluster(visit_description, clusters, interview_term_vectors, examination_term_vectors)
cl
```
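With k-means clusters, a natural rule for a new visit is to pick the cluster whose centroid is closest to the visit's vector. How `assign_visit_to_cluster` decides is not described here, so the sketch below is an assumption; `nearest_cluster`, `toy_centers` and `new_visit_vector` are hypothetical.

```r
# Hypothetical cluster centroids in the visit-embedding space
toy_centers <- rbind("1" = c(0, 0), "2" = c(5, 5))

# Assign a vector to the cluster whose centroid is closest (Euclidean distance)
nearest_cluster <- function(visit_vector, centers) {
  distances <- sqrt(rowSums(sweep(centers, 2, visit_vector)^2))
  names(which.min(distances))
}

new_visit_vector <- c(4.5, 5.5)
nearest_cluster(new_visit_vector, toy_centers)
```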
Finally, we can visualize
the embeddings of ICD-10 codes:
```{r}
visualize_icd10(visits_vectors, visits)
```
# Acknowledgements
The package was created as part of research financially supported by the Polish Centre for Research and Development
(Grant POIR.01.01.01-00-0328/17).
# References