-
Notifications
You must be signed in to change notification settings - Fork 28
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #4 in PGX/pgx-samples from feature/paper-classific…
…ation to master * commit 'ec86daa8e47382432535e1b444948d6a0876d33f': Add Cora example
- Loading branch information
Showing
12 changed files
with
443 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
Note: this example makes use of the new `SupervisedGraphWise` classifer, which is available in PGX 19.4.0. | ||
# Research Paper Classification - Cora Dataset | ||
Graph analytics often allows you to leverage relational information that other, classical models simply have no | ||
effective way of capturing. One example of such data is the citation network of research papers. | ||
|
||
## The Cora dataset | ||
The Cora dataset consists of 2708 machine-learning research papers which have been labeled with one of seven topics. | ||
Each paper has a 1433 dimensional binary feature vector, where 0/1 indicates the absence/presence of selected words from | ||
the dictionary. Additionally, the dataset provides the citation graph, where each edge represents a citation between two | ||
research papers. | ||
|
||
To run this example, download [the dataset](https://linqs-data.soe.ucsc.edu/public/lbc/cora.tgz) and extract the | ||
files `cora.cites` and `cora.content` into the `data/graph/cora` directory. | ||
`cora.content` contains the vertex (research paper) feature vectors, and `cora.cites` contains the edges. | ||
|
||
## Running the example | ||
The example can be run using the command `./gradlew run`. This will perform the following steps: | ||
1. Load the Cora graph into the PGX server | ||
2. Create a train-test split | ||
3. Train a `SupervisedGraphWise` model on the training graph | ||
4. Evaluate the trained model on the unseen test nodes | ||
5. Infer and save embeddings for all vertices into `embeddings.csv` | ||
|
||
## Evaluating the results | ||
We can see that the model performs quite well on the test set: the following is example evaluation output: | ||
|
||
|--------------------|--------------------|--------------------|--------------------| | ||
| Accuracy | Precision | Recall | F1-Score | | ||
|--------------------|--------------------|--------------------|--------------------| | ||
| 0.869 | 0.867 | 0.846 | 0.854 | | ||
|--------------------|--------------------|--------------------|--------------------| | ||
|
||
The hyper parameters of the model have been left to the default settings, but can of course be tuned to optimize the | ||
performance. | ||
|
||
## Visualizing the Embeddings | ||
The embeddings themselves can be used for other downstream tasks, such as cluster or link prediction. To visualize them | ||
in an understandable manner, we can generate and plot t-SNE vectors for the vertices. The following figure is the result. | ||
|
||
![Cora Embeddings t-SNE plot](cora_tsne.png "Cora Embeddings TNSE plot") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
apply plugin: 'java' | ||
jar.enabled = false | ||
|
||
ext { | ||
mainClassName = 'Apps.CoraApp' | ||
} | ||
|
||
def pgxDir = "$projectDir/../libs/pgx-19.4.0-SNAPSHOT" | ||
def commonDeps = "$pgxDir/shared-lib/common" | ||
def embeddedDeps = "$pgxDir/shared-lib/embedded" | ||
def serverDeps = "$pgxDir/shared-lib/server" | ||
def thirdPartyDeps = "$pgxDir/shared-lib/third-party" | ||
def smCommonDeps = "$pgxDir/shared-memory/common" | ||
def smClientDeps = "$pgxDir/shared-memory/client" | ||
def smEmbeddedDeps = "$pgxDir/shared-memory/embedded" | ||
def smThirdPartyDeps = "$pgxDir/shared-memory/third-party" | ||
|
||
dependencies { | ||
compile fileTree(dir: commonDeps, includes: ['*.jar']) | ||
compile fileTree(dir: embeddedDeps, includes: ['*.jar']) | ||
compile fileTree(dir: serverDeps, includes: ['*.jar']) | ||
compile fileTree(dir: thirdPartyDeps, includes: ['*.jar']) | ||
compile fileTree(dir: smCommonDeps, includes: ['*.jar']) | ||
compile fileTree(dir: smClientDeps, includes: ['*.jar']) | ||
compile fileTree(dir: smEmbeddedDeps, includes: ['*.jar']) | ||
compile fileTree(dir: smThirdPartyDeps, includes: ['*.jar']) | ||
} | ||
|
||
task run(type: JavaExec, dependsOn: jar) { | ||
jvmArgs = ['-Dpgx_conf=conf/pgx.conf'] | ||
classpath = sourceSets.main.runtimeClasspath | ||
main = mainClassName | ||
args = [project.projectDir] | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{ | ||
"datasource_dir_whitelist" : ["PUT_THE_ABSOLUTE_PATH_TO_DATA_DIRECTORY_HERE"], | ||
"allow_local_filesystem": true, | ||
"tmp_dir": "tmp_data" | ||
} |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Place the files `cora.content` and `cora.cites` here. See `README.md` for more information. |
Binary file not shown.
5 changes: 5 additions & 0 deletions
5
paper-classification/gradle/wrapper/gradle-wrapper.properties
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
distributionBase=GRADLE_USER_HOME | ||
distributionPath=wrapper/dists | ||
zipStoreBase=GRADLE_USER_HOME | ||
zipStorePath=wrapper/dists | ||
distributionUrl=https\://services.gradle.org/distributions/gradle-5.2.1-bin.zip |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,172 @@ | ||
#!/usr/bin/env sh | ||
|
||
############################################################################## | ||
## | ||
## Gradle start up script for UN*X | ||
## | ||
############################################################################## | ||
|
||
# Attempt to set APP_HOME | ||
# Resolve links: $0 may be a link | ||
PRG="$0" | ||
# Need this for relative symlinks. | ||
while [ -h "$PRG" ] ; do | ||
ls=`ls -ld "$PRG"` | ||
link=`expr "$ls" : '.*-> \(.*\)$'` | ||
if expr "$link" : '/.*' > /dev/null; then | ||
PRG="$link" | ||
else | ||
PRG=`dirname "$PRG"`"/$link" | ||
fi | ||
done | ||
SAVED="`pwd`" | ||
cd "`dirname \"$PRG\"`/" >/dev/null | ||
APP_HOME="`pwd -P`" | ||
cd "$SAVED" >/dev/null | ||
|
||
APP_NAME="Gradle" | ||
APP_BASE_NAME=`basename "$0"` | ||
|
||
# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. | ||
DEFAULT_JVM_OPTS="" | ||
|
||
# Use the maximum available, or set MAX_FD != -1 to use that value. | ||
MAX_FD="maximum" | ||
|
||
warn () { | ||
echo "$*" | ||
} | ||
|
||
die () { | ||
echo | ||
echo "$*" | ||
echo | ||
exit 1 | ||
} | ||
|
||
# OS specific support (must be 'true' or 'false'). | ||
cygwin=false | ||
msys=false | ||
darwin=false | ||
nonstop=false | ||
case "`uname`" in | ||
CYGWIN* ) | ||
cygwin=true | ||
;; | ||
Darwin* ) | ||
darwin=true | ||
;; | ||
MINGW* ) | ||
msys=true | ||
;; | ||
NONSTOP* ) | ||
nonstop=true | ||
;; | ||
esac | ||
|
||
CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar | ||
|
||
# Determine the Java command to use to start the JVM. | ||
if [ -n "$JAVA_HOME" ] ; then | ||
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then | ||
# IBM's JDK on AIX uses strange locations for the executables | ||
JAVACMD="$JAVA_HOME/jre/sh/java" | ||
else | ||
JAVACMD="$JAVA_HOME/bin/java" | ||
fi | ||
if [ ! -x "$JAVACMD" ] ; then | ||
die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME | ||
Please set the JAVA_HOME variable in your environment to match the | ||
location of your Java installation." | ||
fi | ||
else | ||
JAVACMD="java" | ||
which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. | ||
Please set the JAVA_HOME variable in your environment to match the | ||
location of your Java installation." | ||
fi | ||
|
||
# Increase the maximum file descriptors if we can. | ||
if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then | ||
MAX_FD_LIMIT=`ulimit -H -n` | ||
if [ $? -eq 0 ] ; then | ||
if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then | ||
MAX_FD="$MAX_FD_LIMIT" | ||
fi | ||
ulimit -n $MAX_FD | ||
if [ $? -ne 0 ] ; then | ||
warn "Could not set maximum file descriptor limit: $MAX_FD" | ||
fi | ||
else | ||
warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT" | ||
fi | ||
fi | ||
|
||
# For Darwin, add options to specify how the application appears in the dock | ||
if $darwin; then | ||
GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\"" | ||
fi | ||
|
||
# For Cygwin, switch paths to Windows format before running java | ||
if $cygwin ; then | ||
APP_HOME=`cygpath --path --mixed "$APP_HOME"` | ||
CLASSPATH=`cygpath --path --mixed "$CLASSPATH"` | ||
JAVACMD=`cygpath --unix "$JAVACMD"` | ||
|
||
# We build the pattern for arguments to be converted via cygpath | ||
ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null` | ||
SEP="" | ||
for dir in $ROOTDIRSRAW ; do | ||
ROOTDIRS="$ROOTDIRS$SEP$dir" | ||
SEP="|" | ||
done | ||
OURCYGPATTERN="(^($ROOTDIRS))" | ||
# Add a user-defined pattern to the cygpath arguments | ||
if [ "$GRADLE_CYGPATTERN" != "" ] ; then | ||
OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)" | ||
fi | ||
# Now convert the arguments - kludge to limit ourselves to /bin/sh | ||
i=0 | ||
for arg in "$@" ; do | ||
CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -` | ||
CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option | ||
|
||
if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition | ||
eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"` | ||
else | ||
eval `echo args$i`="\"$arg\"" | ||
fi | ||
i=$((i+1)) | ||
done | ||
case $i in | ||
(0) set -- ;; | ||
(1) set -- "$args0" ;; | ||
(2) set -- "$args0" "$args1" ;; | ||
(3) set -- "$args0" "$args1" "$args2" ;; | ||
(4) set -- "$args0" "$args1" "$args2" "$args3" ;; | ||
(5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;; | ||
(6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;; | ||
(7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;; | ||
(8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;; | ||
(9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;; | ||
esac | ||
fi | ||
|
||
# Escape application args | ||
save () { | ||
for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done | ||
echo " " | ||
} | ||
APP_ARGS=$(save "$@") | ||
|
||
# Collect all arguments for the java command, following the shell quoting and substitution rules | ||
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS" | ||
|
||
# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong | ||
if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then | ||
cd "$(dirname "$0")" | ||
fi | ||
|
||
exec "$JAVACMD" "$@" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
@if "%DEBUG%" == "" @echo off | ||
@rem ########################################################################## | ||
@rem | ||
@rem Gradle startup script for Windows | ||
@rem | ||
@rem ########################################################################## | ||
|
||
@rem Set local scope for the variables with windows NT shell | ||
if "%OS%"=="Windows_NT" setlocal | ||
|
||
set DIRNAME=%~dp0 | ||
if "%DIRNAME%" == "" set DIRNAME=. | ||
set APP_BASE_NAME=%~n0 | ||
set APP_HOME=%DIRNAME% | ||
|
||
@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. | ||
set DEFAULT_JVM_OPTS= | ||
|
||
@rem Find java.exe | ||
if defined JAVA_HOME goto findJavaFromJavaHome | ||
|
||
set JAVA_EXE=java.exe | ||
%JAVA_EXE% -version >NUL 2>&1 | ||
if "%ERRORLEVEL%" == "0" goto init | ||
|
||
echo. | ||
echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. | ||
echo. | ||
echo Please set the JAVA_HOME variable in your environment to match the | ||
echo location of your Java installation. | ||
|
||
goto fail | ||
|
||
:findJavaFromJavaHome | ||
set JAVA_HOME=%JAVA_HOME:"=% | ||
set JAVA_EXE=%JAVA_HOME%/bin/java.exe | ||
|
||
if exist "%JAVA_EXE%" goto init | ||
|
||
echo. | ||
echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% | ||
echo. | ||
echo Please set the JAVA_HOME variable in your environment to match the | ||
echo location of your Java installation. | ||
|
||
goto fail | ||
|
||
:init | ||
@rem Get command-line arguments, handling Windows variants | ||
|
||
if not "%OS%" == "Windows_NT" goto win9xME_args | ||
|
||
:win9xME_args | ||
@rem Slurp the command line arguments. | ||
set CMD_LINE_ARGS= | ||
set _SKIP=2 | ||
|
||
:win9xME_args_slurp | ||
if "x%~1" == "x" goto execute | ||
|
||
set CMD_LINE_ARGS=%* | ||
|
||
:execute | ||
@rem Setup the command line | ||
|
||
set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar | ||
|
||
@rem Execute Gradle | ||
"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS% | ||
|
||
:end | ||
@rem End local scope for the variables with windows NT shell | ||
if "%ERRORLEVEL%"=="0" goto mainEnd | ||
|
||
:fail | ||
rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of | ||
rem the _cmd.exe /c_ return code! | ||
if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1 | ||
exit /b 1 | ||
|
||
:mainEnd | ||
if "%OS%"=="Windows_NT" endlocal | ||
|
||
:omega |
Oops, something went wrong.