Processing¶
Generating the databases for SciELO website¶
Directories¶
The bases-work subdirectory holds one subdirectory for each database while it is being processed, in addition to an individual directory for each journal.
The serial subdirectory contains one directory per journal, each holding the original issues used in processing (this data may be discarded once processing has been carried out and approved).
The bases subdirectory holds the databases used by the website.
The proc subdirectory holds the processing scripts and other files related to processing.
GeraPadrao.bat¶
It is a script which generates the databases for the SciELO website.
- INPUTS:
- serial folder’s content:
- databases generated by Converter from the markup and body folders and sent to the Linux server by the EnviaBasesScieloPadrao.bat script on the local server.
- scilista.lst, which contains the list of journal issues to be added, replaced, or deleted.
- folders and files located in /var/www/scielo/bases-work/
- log filename. e.g.: /var/www/scielo/proc/log/GeraPadrao.log
- OUTPUTS:
- folders and files located in /var/www/scielo/bases-work/
- folders and files located in /var/www/scielo/bases/
- log file. e.g.: /var/www/scielo/proc/log/GeraPadrao.log
Edit GeraPadrao.bat to set the parameters used to generate the website’s databases. The parameters are those of the GeraScielo.bat call:
./GeraScielo.bat <serial_parent_path> <proc_parent_path> <log filename> <cria>
where
- Parameter 1: serial_parent_path: e.g.: /var/www/scielo/
- Parameter 2: proc_parent_path: e.g.: /var/www/scielo/
- Parameter 3: log filename: e.g.: /var/www/scielo/proc/log/GeraPadrao.log
- Parameter 4: optional. Use the value cria to reset the log file; otherwise, new entries are appended to the existing log file.
Warning
It is possible to use relative paths for parameters 1 to 3.
Examples:
./GeraScielo.bat .. .. log/GeraPadrao.log
./GeraScielo.bat .. .. log/GeraPadrao.log cria
Execute GeraPadrao.bat
Go to /var/www/scielo/proc
Edit/check scilista.lst which contains the list of journal issues of the website.
vi ../serial/scilista.lst
Execute
./GeraPadrao.bat
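Because the run is driven entirely by scilista.lst, it can help to confirm that the list is present and not empty before starting. The following is only a sketch: the function name `scilista_has_entries` is hypothetical and is not part of the SciELO scripts.

```shell
# Hypothetical helper: succeed only if the scilista file exists and
# contains at least one non-blank line.
scilista_has_entries() {
    [ -f "$1" ] && grep -q '[^[:space:]]' "$1"
}

# Intended use from /var/www/scielo/proc (paths taken from the steps above):
#   scilista_has_entries ../serial/scilista.lst && ./GeraPadrao.bat
```

The check does not validate the content of each line; it only guards against launching the processing with a missing or empty list.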
Exporting the databases for SciELO Network processing¶
This process is carried out with the Paperboy utility. Paperboy is a Python utility developed to replace the scripts:
- Envia2MedlinePadrao.bat
- static_files_catalog.sh
Installing Paperboy¶
Install guide: https://github.com/scieloorg/paperboy
Configuring Paperboy¶
After installing Paperboy, you must create a config.ini file to configure the source and destination resources and the SSH account that will be used to send data to the server.
Creating config.ini file¶
Access the directory /var/www/scielo/proc
cd /var/www/scielo/proc
Create a text file named paperboy_envia_to_scielo_config.ini in the proc directory. The file must follow the format below:
Note
You may also use a file name of your preference for the config file, as long as you replace the name of the config file in the instructions that follow.
[app:main]
source_dir=c:/var/www/scielo
cisis_dir=c:/var/www/scielo/proc/cisis
server=localhost
port=21
user=anonymous
password=anonymous
source_dir: Absolute path to the directory where the SciELO website was installed.
cisis_dir: Absolute path to the directory where the CISIS utility was installed.
server: Domain of the server where the SciELO Site was installed
port: The FTP port (default 21)
user: A valid FTP username
password: A valid FTP password for the given username
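A quick way to catch a mistyped or missing setting is to check the file for the keys shown in the example config above. This is a minimal sketch: `check_paperboy_config` is a hypothetical helper, and the key list is taken from that example rather than from the Paperboy source.

```shell
# Hypothetical helper: report every expected key that is missing from the
# ini file. Returns 0 when all keys are present, 1 otherwise.
check_paperboy_config() {
    ini_file=$1
    missing=0
    for key in source_dir cisis_dir server port user password; do
        if ! grep -q "^${key}=" "$ini_file"; then
            echo "missing key: ${key}"
            missing=1
        fi
    done
    return "$missing"
}

# Intended use:
#   check_paperboy_config /var/www/scielo/proc/paperboy_envia_to_scielo_config.ini
```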
Tip
Ask the SciELO team for your FTP credentials.
Creating envia.sh file¶
Create a text file named paperboy_envia_to_scielo.sh in the proc directory.
Note
You may also use a file name of your preference for the script file, as long as you replace the name of the script file in the instructions that follow.
The content of the .sh file must be:
export PAPERBOY_SETTINGS_FILE=/var/www/scielo/proc/paperboy_envia_to_scielo_config.ini
paperboy_delivery_to_scielo
Running¶
Run the script paperboy_envia_to_scielo.sh to send the databases and reports to SciELO.
./paperboy_envia_to_scielo.sh > /var/www/scielo/proc/log/paperboy_envia_to_scielo_config.log
Notes¶
- Ask the SciELO team for your SSH credentials.
- You must configure a cron job to run the processing periodically (preferably weekly, or after every database update).
- The log files are:
- /var/www/scielo/proc/log/paperboy_envia_to_scielo_config.log
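The cron job mentioned in the notes could be set up along the following lines. This is a sketch under assumptions: the schedule (Sundays at 03:00) is only an illustration of "weekly", and the script name is the one created in the proc directory above.

```shell
# Assumed schedule: every Sunday at 03:00 (the five crontab time fields).
cron_schedule='0 3 * * 0'
# Script and log paths as used in the steps above; output is appended to the log.
cron_command='cd /var/www/scielo/proc && ./paperboy_envia_to_scielo.sh >> /var/www/scielo/proc/log/paperboy_envia_to_scielo_config.log 2>&1'
cron_line="$cron_schedule $cron_command"
printf '%s\n' "$cron_line"

# To install the entry for the current user (left commented out so that
# nothing is changed by accident):
# ( crontab -l 2>/dev/null; printf '%s\n' "$cron_line" ) | crontab -
```

Appending with `>>` keeps the history of previous runs; use `>` instead if only the latest run should be kept.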
CrossRef: Deposit with budget control¶
This processing selects the articles and generates the XML files to deposit on CrossRef, according to the following conditions:
- budget
- current articles price
- backfiles price
- articles publication date
- priority order: most recent to older, or older to most recent
Check the deposit fees at http://www.crossref.org/02publishers/20pub_fees.html
Configuration¶
Configure proc/scielo_crs/shs/crossRef_config.sh¶
# CrossRef connection
crossrefUserName=
crossrefPassword=
depositor_institution=
depositor_prefix=
depositor_email=
depositor_url=
# BUDGET
# current articles fee
RECENT_FEE=
# First year of articles considered current
# E.g.: if the current records cover 2007-2009, use 2007
FIRST_YEAR_OF_RECENT_FEE=
# backfiles fee
BACKFILES_FEE=
crossrefUserName | username given by CrossRef |
crossrefPassword | password given by CrossRef |
depositor_institution | depositor institution name |
depositor_prefix | depositor prefix given by CrossRef |
depositor_email | e-mail to receive processing results from CrossRef |
depositor_url | SciELO Website URL |
RECENT_FEE | check at http://www.crossref.org/02publishers/20pub_fees.html E.g.: 1.0, for $1.00 |
FIRST_YEAR_OF_RECENT_FEE | check at http://www.crossref.org/02publishers/20pub_fees.html E.g.: 2014 for (2014-2016) |
BACKFILES_FEE | check at http://www.crossref.org/02publishers/20pub_fees.html E.g.: 0.25, for $0.25 |
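To see how the three budget settings interact, the sketch below picks the fee for one article and estimates how many deposits a budget covers. It assumes that an article counts as current when its publication year is greater than or equal to FIRST_YEAR_OF_RECENT_FEE, and that each deposit consumes exactly one fee; both helper names are hypothetical and the real selection logic lives in the scielo_crs scripts.

```shell
# Hypothetical helper: print the fee that applies to one article.
# Arguments: publication year, FIRST_YEAR_OF_RECENT_FEE, RECENT_FEE, BACKFILES_FEE
fee_for_year() {
    if [ "$1" -ge "$2" ]; then
        echo "$3"
    else
        echo "$4"
    fi
}

# Hypothetical helper: number of articles a budget covers at a given fee.
# The small epsilon guards against floating-point results such as 2.9999999.
articles_within_budget() {
    awk -v budget="$1" -v fee="$2" 'BEGIN { printf "%d\n", budget / fee + 1e-9 }'
}
```

With the example values from the table, `fee_for_year 2015 2014 1.0 0.25` selects the current fee and `fee_for_year 2010 2014 1.0 0.25` selects the backfiles fee.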
Configure proc/scielo_crs/shs/xref.cip¶
Create xref.cip from the template proc/scielo_crs/shs/xref.cip.template, or edit the existing xref.cip.
Replace every occurrence of /home/scielo/www/proc with your proc path. E.g.: /var/www/scielo/proc
BIREME_TABS_GCHARENT.*=/home/scielo/www/proc/scielo_crs/databases/tabs/gcharent.*
Y.*=/home/scielo/www/bases/title/title.*
ARTICLE_DB.*=/home/scielo/www/bases/artigo/artigo.*
ARTIGO_DB.*=/home/scielo/www/bases/artigo/artigo.*
DB_BILL.*=/home/scielo/www/proc/scielo_crs/databases/budget/bill.*
DB_BILL_BKP.*=/home/scielo/www/proc/scielo_crs/databases/budget/bill_BKP.*
DB_BG.*=/home/scielo/www/proc/scielo_crs/databases/crossref/budget.*
XREF_DOI_REPORT.*=/home/scielo/www/proc/scielo_crs/databases/crossref/crossref_DOIReport.*
DB_PRESUPUESTOS.*=/home/scielo/www/proc/scielo_crs/databases/budget/presupuestos.*
DB_BATCH_RUN_BUDGET.*=/home/scielo/www/proc/scielo_crs/databases/budget/batch_run_budget.*
DB_BATCH_RUN.*=/home/scielo/www/proc/scielo_crs/databases/budget/batch_run.*
DB_CTRL_BG.*=/home/scielo/www/proc/scielo_crs/databases/budget/budgetctrl.*
Configure proc/scielo_crs/shs/db_presupuestos.txt¶
It is a table in which each line is a budget.
Keep the first line, which is a comment.
Use a SPACE character to separate the columns.
This file must be edited whenever there is a new budget.
- First column:
- ID - unique identifier
- Second column:
- budget amount
- Third column:
- budget ISO date (YYYYMMDD, that is, 4 digits year, 2 digits month, 2 digits day)
E.g.:
On Jan 4, 2015, there is $150.00 (one hundred fifty dollars) and on Feb 4, 2015, there is $250.00 (two hundred fifty dollars):
1 150.00 20150104
2 250.00 20150204
On March 10, 2015, a new budget of $100.00 is added:
3 100.00 20150310
db_presupuestos.txt contents:
1 150.00 20150104
2 250.00 20150204
3 100.00 20150310
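Since the file is edited by hand, a small format check can catch mistakes before a run. The sketch below follows the column rules described above (three space-separated columns, unique ID, amount, YYYYMMDD date) and skips the first line as the comment line; `check_presupuestos` is a hypothetical helper, not part of the scielo_crs scripts.

```shell
# Hypothetical helper: validate the budget table. Prints one message per
# problem found and returns a non-zero status if there was any.
check_presupuestos() {
    awk '
        NR == 1 { next }    # the first line is the comment line
        NF == 0 { next }    # ignore blank lines
        NF != 3 { bad = 1; print "line " NR ": expected 3 columns"; next }
        $2 !~ /^[0-9]+(\.[0-9]+)?$/ { bad = 1; print "line " NR ": bad amount" }
        length($3) != 8 || $3 !~ /^[0-9]+$/ { bad = 1; print "line " NR ": date is not YYYYMMDD" }
        seen[$1]++ { bad = 1; print "line " NR ": duplicate ID " $1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Intended use from proc/scielo_crs/shs:
#   check_presupuestos db_presupuestos.txt
```

The check only looks at the shape of each value; it does not verify that the date is a real calendar date.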
When to execute¶
Execute it ONLY after finishing GeraPadrao.bat.
How to execute¶
Go to the corresponding path. E.g.:
cd /var/www/scielo/proc/scielo_crs/shs/
Execute:
./xref_run_budget.sh <budget ID> <Order> <processing mode> <Count> <ISSNYEAR>
Parameters description:
- <budget ID>
- ID of the budget that will be spent
- <Order>
- Descending = from the most recent to the oldest articles
- Ascending = from the oldest to the most recent articles
- <processing mode>
- ALL = select all the articles, including the articles previously processed.
- ONLY_NEVER_PROCESSED = select the articles never processed before.
- ONLY_NEVER_SUBMITTED = select the articles which failed to submit the XML or failed to register the DOI.
- <Count>
- Limits the number of articles to be processed. Use a number to set the limit, or use ALL to process all of them.
- <ISSNYEAR>
- Optional. Selects the articles by ISSN and year. Use ALL for all the articles, or an ISSN followed by a year for a specific selection. E.g.: 1020-30402008
Examples:
./xref_run_budget.sh 2 Descending ONLY_NEVER_PROCESSED 100
./xref_run_budget.sh 2 Descending ONLY_NEVER_PROCESSED ALL 1020-30402008
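Because a run spends budget, it may be worth checking the arguments before calling the script. The sketch below checks only the values listed in the parameter description above; `check_budget_args` is a hypothetical wrapper, and it does not verify that the budget ID exists.

```shell
# Hypothetical pre-flight check; arguments are given in the same order as
# for xref_run_budget.sh. Returns 0 if they look valid, 1 otherwise.
check_budget_args() {
    if [ "$#" -lt 4 ] || [ "$#" -gt 5 ]; then
        echo "expected 4 or 5 arguments"
        return 1
    fi
    case $2 in
        Descending|Ascending) ;;
        *) echo "invalid order: $2"; return 1 ;;
    esac
    case $3 in
        ALL|ONLY_NEVER_PROCESSED|ONLY_NEVER_SUBMITTED) ;;
        *) echo "invalid processing mode: $3"; return 1 ;;
    esac
    case $4 in
        ALL) ;;
        ''|*[!0-9]*) echo "invalid count: $4"; return 1 ;;
    esac
    return 0
}

# Intended use:
#   check_budget_args 2 Descending ONLY_NEVER_PROCESSED 100 &&
#       ./xref_run_budget.sh 2 Descending ONLY_NEVER_PROCESSED 100
```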
Results¶
In proc/scielo_crs/databases/budget, there are:
- presupuestos – database generated from db_presupuestos.txt (budget registration)
- budgetctrl – database which registers the budget consumption
- bill – database which registers the expenses of each article
- batch_run_budget – database which registers the data of each execution
bill database¶
../../cisis/mx ../databases/budget/bill
Contents:
mfn= 2
880 "S0717-73562009000100008"
65 "20090600"
4 "requested"
2 "1.0"
3 "20090714 110457 2 194"
1 "1"
121 "000001"
100 "20090714_110450_2_194"
30 "new^xcrossRef_sent_200907141104S0717-73562009000100008.log"
Description:
880 | article PID |
65 | article publication date |
4 | status: requested (success) or dont (not registered, failure) |
2 | DOI price |
3 | processing date/time |
1 | budget ID |
121 | processing order number |
100 | execution ID |
30 | status; same as v30 and v930 of CrossRef_DOIReport database |
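Judging from the examples in this document, the article PID in field 880 starts with "S", followed by the 9-character ISSN and the 4-digit year, which is the same form the optional <ISSNYEAR> parameter takes. Under that assumption, the selector for reprocessing the issue year of a given article can be derived as sketched below; `pid_to_issnyear` is a hypothetical helper.

```shell
# Hypothetical helper: derive the <ISSNYEAR> selector from an article PID,
# assuming the PID is "S" + 9-character ISSN + 4-digit year + further digits.
pid_to_issnyear() {
    printf '%s\n' "$1" | cut -c2-14
}

# Intended use:
#   pid_to_issnyear S0717-73562009000100008
```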
batch_run_budget database¶
One record is added for each execution.
../../cisis/mx ../databases/budget/batch_run_budget
Contents:
mfn= 1
1 "1"
100 "20090714_110450_2_194"
190 "20090714 110450 2 194"
102 "0"
200 "2007"
201 "1.0"
202 "0.15"
121 "000001"
2 "1.00"
90 "20090714 110457 2 194"
Description:
1 | budget ID |
100 | execution ID |
190 | start date and time |
90 | finish date and time |
102 | initial budget, before the execution |
200 | initial year of current articles |
201 | DOI price for current articles |
202 | DOI price for backfiles |
121 | quantity of selected articles in this execution |
2 | expenses in this execution |
XML for DOI deposit¶
The XML files are generated in proc/scielo_crs/output/crossref/.
LOG¶
proc/scielo_crs/output/crossref/report_error.txt contains the processing errors.
Example:
PID=S0717-73562009000100001 log file: ../output/crossref/log/validationErrors_200907151502S0717-73562009000100001.log data de processamento: 2009000100001
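To get a list of the articles that need attention, the PIDs can be pulled out of report_error.txt. This sketch assumes each error entry contains a `PID=...` token at most once per line, as in the example above; `list_failed_pids` is a hypothetical helper.

```shell
# Hypothetical helper: print the distinct PIDs mentioned in the error report,
# one per line, sorted.
list_failed_pids() {
    sed -n 's/.*PID=\(S[0-9X-]*\).*/\1/p' "$1" | sort -u
}

# Intended use from proc/scielo_crs/shs:
#   list_failed_pids ../output/crossref/report_error.txt
```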
crossref_DOIReport¶
proc/scielo_crs/databases/crossref/crossref_DOIReport contains the result of the processing of each article / DOI.
../../cisis/mx ../databases/crossref/crossref_DOIReport
Contents:
mfn= 2
30 "new"
930 "crossref_sent_200907141104S0717-73562009000100008.log"
880 "S0717-73562009000100008"
10 "20090714 110457 2 194"
Description:
30 | Status of the registration |
930 | DTD validation result |
880 | article PID |
10 | date and time of the registration |
CrossRef: Deposit without budget control¶
This processing generates CrossRef deposit XML files and submits them to register the articles' DOIs.
Configuration¶
Configure proc/scielo_crs/shs/crossRef_config.sh¶
# CrossRef connection
crossrefUserName=
crossrefPassword=
depositor_institution=
depositor_prefix=
depositor_email=
depositor_url=
crossrefUserName | username given by CrossRef |
crossrefPassword | password given by CrossRef |
depositor_institution | depositor institution name |
depositor_prefix | depositor prefix given by CrossRef |
depositor_email | e-mail to receive processing results from CrossRef |
depositor_url | SciELO Website URL |
Configure proc/scielo_crs/shs/xref.cip¶
Create xref.cip from the template proc/scielo_crs/shs/xref.cip.template, or edit the existing xref.cip.
Replace every occurrence of /home/scielo/www/proc with your proc path. E.g.: /var/www/scielo/proc
BIREME_TABS_GCHARENT.*=/home/scielo/www/proc/scielo_crs/databases/tabs/gcharent.*
Y.*=/home/scielo/www/bases/title/title.*
ARTICLE_DB.*=/home/scielo/www/bases/artigo/artigo.*
ARTIGO_DB.*=/home/scielo/www/bases/artigo/artigo.*
XREF_DOI_REPORT.*=/home/scielo/www/proc/scielo_crs/databases/crossref/crossref_DOIReport.*
When to execute¶
Execute it ONLY after finishing GeraPadrao.bat.
How to execute¶
Go to the corresponding path. E.g.:
cd /var/www/scielo/proc/scielo_crs/shs/
Execute:
./xref_run.sh <ISSN_OR_PID>
Parameters description:
- <ISSN_OR_PID>
- Optional. Use no value to process all the articles which have not been processed before. Otherwise, use the PID of an issue or an article, or the ISSN of a journal.
Examples:
./xref_run.sh
./xref_run.sh 1020-30402008
Results¶
XML for DOI deposit¶
The XML files are generated in proc/scielo_crs/output/crossref/.
LOG¶
proc/scielo_crs/output/crossref/report_error.txt contains the processing errors.
Example:
PID=S0717-73562009000100001 log file: ../output/crossref/log/validationErrors_200907151502S0717-73562009000100001.log data de processamento: 2009000100001
crossref_DOIReport¶
proc/scielo_crs/databases/crossref/crossref_DOIReport contains the result of the processing of each article / DOI.
../../cisis/mx ../databases/crossref/crossref_DOIReport
Contents:
mfn= 2
30 "new"
930 "crossref_sent_200907141104S0717-73562009000100008.log"
880 "S0717-73562009000100008"
10 "20090714 110457 2 194"
Description:
30 | Status of the registration |
930 | DTD validation result |
880 | article PID |
10 | date and time of the registration |
CrossRef - Display DOI on SciELO Website¶
This processing generates, for each journal issue, one database which the SciELO Website uses to display the articles' DOIs.
Input: crossref_DOIReport database
When to execute¶
Execute after DOI deposit processing.
How to execute¶
1. scilista creation¶
This preprocessing step identifies the records in the crossref_DOIReport database whose status is not “error” and generates the scilista file in the following format:
Example:
neuro v19n6 S1130-147320080006
neuro v20n1 S1130-147320090001
Attention
Last line must be empty
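If the generated scilista is ever edited by hand, its lines can be checked against the format shown in the example. The sketch assumes three space-separated columns whose last column is an 18-character issue PID ("S" + ISSN + year + 4 digits), as in the example lines; `check_doi_scilista` is a hypothetical helper.

```shell
# Hypothetical helper: report lines that do not look like
# "<acronym> <issue> <issue PID>". Blank lines are accepted, which covers
# the required empty last line.
check_doi_scilista() {
    awk '
        NF == 0 { next }
        NF != 3 || length($3) != 18 || $3 !~ /^S[0-9][0-9][0-9][0-9]-/ {
            bad = 1
            print "line " NR ": unexpected format: " $0
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Intended use from the proc directory:
#   check_doi_scilista scilista_doi.txt
```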
- Go to proc directory
- Execute the command:
Example:
./doi/scilista/scilista4art.bat scilista_doi.txt
2. doi database creation¶
- Go to proc directory
- Execute the command:
./doi/create/doi4art.bat <scilista>
Example:
./doi/create/doi4art.bat scilista_doi.txt
Questions about cisis and wxis versions¶
The commands must display the “same version”:
cisis/whatwxis hello
Migration from Lind to LindG4¶
If the cisis and wxis versions were migrated from Lind to LindG4, the files whose extension is .iy0 must be deleted; otherwise, the indexes will be generated but will not be read properly.
The file extensions that must be kept are:
- index files: *.cnt, *.iyp, *.ly1, *.ly2, *.n01, *.n02
- database files: *.mst, *.xrf
Attention
The *.iy0 files must be removed from the public server too.
Find the files to delete¶
find . -name "*.iy0"
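Once the list has been reviewed, the same search can be reused to delete the files. The sketch below lists by default and deletes only when asked explicitly; `clean_iy0` is a hypothetical helper, and the directory to pass is whichever one holds the databases.

```shell
# Hypothetical helper: list *.iy0 files under a directory; delete them only
# when the second argument is the word "delete".
clean_iy0() {
    if [ "${2:-}" = "delete" ]; then
        find "$1" -type f -name '*.iy0' -exec rm -f {} +
    else
        find "$1" -type f -name '*.iy0' -print
    fi
}

# Intended use:
#   clean_iy0 .           # review the list first
#   clean_iy0 . delete    # then remove the files
```

Files with the extensions that must be kept (*.cnt, *.iyp, *.ly1, *.ly2, *.n01, *.n02, *.mst, *.xrf) are not matched by the pattern and are left untouched.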