Social Network Analysis Forum

Original poster: snachina

EAT The Edinburgh Associative Thesaurus

Posted on 2017-7-18 21:40:29
EAT
The Edinburgh Associative Thesaurus
Dataset: eat
Description
eatRS.net directed network with 23219 vertices and 325624 arcs (564 loops); stimulus X is associated with response Y N times.
eatSR.net directed network with 23219 vertices and 325589 arcs (564 loops); response X is associated with stimulus Y N times.
The SR network appears to be incomplete: it should be the inverse (transpose) of the RS network.
EATnew.net directed network with 23219 vertices and 325593 arcs (564 loops); stimulus X is associated with response Y N times. Combines eatRS and eatSR, with the 32 duplicated arcs removed.
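The relationship between the three files can be checked directly on arc lists. The following is a minimal sketch, assuming each network has already been loaded from its .net file into a list of `(from_label, to_label, weight)` triples (the loading step is not shown, and every function name here is hypothetical). It counts loops, compares SR with the transpose t(RS), and builds a combined stimulus-to-response network in which parallel arcs with equal weights collapse to one arc.

```python
Arc = tuple[str, str, int]  # (from_label, to_label, weight N)


def count_loops(arcs: list[Arc]) -> int:
    """A loop is an arc whose two endpoints are the same word."""
    return sum(1 for x, y, _ in arcs if x == y)


def transpose(arcs: list[Arc]) -> list[Arc]:
    """Reverse every arc: (x, y, w) -> (y, x, w)."""
    return [(y, x, w) for x, y, w in arcs]


def arc_difference(a: list[Arc], b: list[Arc]) -> list[Arc]:
    """Weighted arcs present in a but not in b; parallel copies count once."""
    return sorted(set(a) - set(b))


def combine(rs: list[Arc], sr: list[Arc]) -> list[Arc]:
    """Union of RS and t(SR), both oriented stimulus -> response.

    Parallel arcs collapse to a single arc when their weights agree;
    conflicting weights raise an error instead of being silently resolved.
    """
    merged: dict[tuple[str, str], int] = {}
    for x, y, w in rs + transpose(sr):
        if merged.get((x, y), w) != w:
            raise ValueError(f"conflicting weights for arc {x} -> {y}")
        merged[(x, y)] = w
    return sorted((x, y, w) for (x, y), w in merged.items())


# Toy data (hypothetical): RS holds one duplicated arc; SR holds one arc missing from RS.
rs_arcs = [("CAT", "DOG", 5), ("CAT", "DOG", 5), ("DOG", "DOG", 1)]
sr_arcs = [("DOG", "CAT", 5), ("DOG", "DOG", 1), ("MOUSE", "CAT", 2)]
print(count_loops(rs_arcs))                         # 1
print(arc_difference(sr_arcs, transpose(rs_arcs)))  # [('MOUSE', 'CAT', 2)]
print(arc_difference(transpose(rs_arcs), sr_arcs))  # []
print(combine(rs_arcs, sr_arcs))  # [('CAT', 'DOG', 5), ('CAT', 'MOUSE', 2), ('DOG', 'DOG', 1)]
```

The arc totals above are consistent with this kind of merge if the 32 duplicated arcs are counted in RS: 325,624 RS arcs, minus 32 duplicates, plus the one arc found only in SR, gives 325,593.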
Background
The Edinburgh Associative Thesaurus (EAT) is a set of word association norms giving the counts of word associations as collected from subjects. It is not a developed semantic network such as WordNet (3), but empirical association data.
The traditional way to collect word association norms is to show or say a word to several people and ask each of them to give the first word that comes to mind on receiving the stimulus. The link established between the stimulus and the response is not semantically labelled (e.g. as synonym, antonym or by a case relation) and can only be regarded as an association.
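Because each link is only an observed stimulus/response pairing, the norms reduce to counts. A minimal sketch of turning raw answers into the weighted arcs used above (stimulus X elicited response Y N times); the record layout, the example words and the upper-case normalisation are assumptions for illustration, not details taken from the EAT files.

```python
from collections import Counter


def association_counts(records):
    """Aggregate raw answers into association norms.

    records: iterable of (stimulus, response) pairs, one per subject answer.
    Returns a Counter mapping (stimulus, response) -> N, the number of
    subjects who gave that response to that stimulus.
    """
    counts = Counter()
    for stimulus, response in records:
        if not response or not response.strip():
            continue  # a blank answer carries no association
        # Assumption: labels are normalised to upper case (as in ULCER, BELLOW).
        counts[(stimulus.strip().upper(), response.strip().upper())] += 1
    return counts


answers = [("cat", "dog"), ("cat", "Dog"), ("cat", "mouse"), ("cat", "")]
print(association_counts(answers).most_common())
# [(('CAT', 'DOG'), 2), (('CAT', 'MOUSE'), 1)]
```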
The Edinburgh association norms were collected by growing the network from a nucleus set of words. Responses were collected for the words in this nucleus set, then these responses were used as stimuli to obtain further responses, and so on. In fact, the cycle was repeated about three times, since by then the number of different responses was so large that they could not all be re-used as stimuli. Data collection stopped when 8,400 stimulus words had been used. Each stimulus word was presented to 100 different subjects, each of whom received 100 words. This gave rise to a total of 55,732 nodes in the Thesaurus network.
The subjects were mostly undergraduates from a wide variety of British universities. The age range of the subjects was from 17 to 22 with a mode of 19. The sex distribution was 64 per cent male and 36 per cent female. The data was collected between June 1968 and May 1971.
The database consists of two files: the SR (stimulus-response) file and the RS (response-stimulus) file. Where words have been truncated to 19 characters to save space, a per cent character (%) has been placed in the 20th position.
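Under this convention a stored entry of exactly 20 characters ending in `%` stands for a longer word. A minimal sketch of the rule, assuming every word longer than 19 characters was cut (the text does not say how words of exactly 20 characters were handled); the function names are hypothetical.

```python
KEEP = 19  # characters retained from a long word; '%' occupies position 20


def truncate_word(word: str, keep: int = KEEP) -> str:
    """Cut a word longer than `keep` characters and mark the cut with '%'."""
    if len(word) <= keep:
        return word
    return word[:keep] + "%"


def is_truncated(entry: str, keep: int = KEEP) -> bool:
    """True if a stored entry carries the truncation marker in position keep + 1."""
    return len(entry) == keep + 1 and entry.endswith("%")


print(truncate_word("ANTIDISESTABLISHMENTARIANISM"))  # ANTIDISESTABLISHMEN%
print(is_truncated("ANTIDISESTABLISHMEN%"))            # True
```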
The version of the EAT used here is the one included in the MRC Psycholinguistic Database (4), for use with the other measures available there.
EAT Data Collection Procedure (1)
Stimulus words
Since the objective was to obtain a reasonably large, complete mapping of the associative network for a large set of words, a systematic procedure of 'growing' the network from a small nucleus was followed. First, responses were obtained for this nucleus set; then these responses were used as stimuli to obtain further responses, and so on. In fact, this cycle was repeated about three times, since by then the number of different responses was so large that they could not all be re-used as stimuli.
The nucleus set was derived from (a) the 200 stimuli used in the Palermo and Jenkins (1964) norms, (b) the 1,000 most frequent words of the Thorndike and Lorge (1944) word frequency count, and (c) the Basic English vocabulary of Ogden (1954).
Data collection was stopped when 8,400 stimulus words had been used. Only a minimal amount of selection of stimuli was applied in each cycle of the data collection. Effectively all responses which were English words or meaningful verbal units were included, including some phrasal forms and numerals. The data cover a wide range of grammatical form classes and inflexional forms.
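The growing procedure is essentially a snowball expansion with a stopping budget. A minimal sketch, assuming a hypothetical `collect_responses` callback that stands in for one round of data collection with human subjects; the defaults mirror the figures in the text (about three cycles, 8,400 stimuli), and "minimal selection" is modelled by accepting every new response as a candidate stimulus.

```python
def grow_stimulus_set(nucleus, collect_responses, max_stimuli=8400, max_cycles=3):
    """Snowball a stimulus list outward from a nucleus of seed words.

    collect_responses(batch) is a hypothetical stand-in for one round of data
    collection: it returns the responses elicited by the words in `batch`.
    Growth stops after `max_cycles` rounds or once `max_stimuli` words are used.
    """
    used = []            # stimuli in the order they were first presented
    seen = set()
    frontier = list(dict.fromkeys(nucleus))  # de-duplicate, keep order
    for _ in range(max_cycles):
        budget = max_stimuli - len(used)
        batch = [w for w in frontier if w not in seen][:budget]
        if not batch:
            break
        used.extend(batch)
        seen.update(batch)
        # Every distinct response becomes a candidate stimulus for the next cycle.
        frontier = list(dict.fromkeys(collect_responses(batch)))
    return used


toy_norms = {"CAT": ["DOG", "MOUSE"], "DOG": ["CAT", "BONE"], "MOUSE": ["CHEESE"]}


def collect(batch):
    return [r for w in batch for r in toy_norms.get(w, [])]


print(grow_stimulus_set(["CAT"], collect))  # ['CAT', 'DOG', 'MOUSE', 'BONE', 'CHEESE']
```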
Procedure
Each stimulus word was presented to 100 different subjects. Each subject received a computer-printed sheet with 100 stimuli in a randomised arrangement (to minimise priming effects). The total contribution of each subject was thus 100 responses, and the verbal environment of each word was different for each subject. The instructions asked the subject to write down against each stimulus the first word it made him think of, working as quickly as possible. The total time spent on this task was measured; most subjects completed the sheet in five to ten minutes.
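The text does not say how the printed sheets were composed. One hypothetical scheme that satisfies the stated constraints (each word seen by 100 different subjects, 100 words per sheet, randomised order) is sketched below: in each round every stimulus is shuffled and the order is cut into consecutive sheets. The function name and scheme are assumptions, not the original procedure.

```python
import random


def make_sheets(stimuli, subjects_per_word=100, words_per_sheet=100, seed=None):
    """Build randomised stimulus sheets, one sheet per subject.

    Hypothetical scheme: in each of `subjects_per_word` rounds, all stimuli are
    shuffled into a fresh random order and the order is cut into consecutive
    sheets. Each word therefore lands on exactly `subjects_per_word` different
    sheets, and its neighbours on a sheet usually change from round to round.
    If len(stimuli) is not a multiple of words_per_sheet, the last sheet of
    each round is shorter.
    """
    rng = random.Random(seed)
    sheets = []
    for _ in range(subjects_per_word):
        order = list(stimuli)
        rng.shuffle(order)
        for start in range(0, len(order), words_per_sheet):
            sheets.append(order[start:start + words_per_sheet])
    return sheets


sheets = make_sheets([f"WORD{i}" for i in range(8400)], seed=1)
print(len(sheets), sum(len(s) for s in sheets))  # 8400 840000
```

At full scale this gives 8,400 × 100 / 100 = 8,400 sheets, i.e. 840,000 responses if every stimulus receives exactly 100 answers.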
Most of the data was collected in a classroom setting under supervision. Sheets which had more than 25 percent blank responses were rejected and fresh data was collected.
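The rejection rule is a simple threshold on blank answers. A minimal sketch, assuming a sheet is represented as a list of response strings with `None` or whitespace for a blank; `accept_sheet` is a hypothetical name.

```python
def accept_sheet(responses, max_blank_fraction=0.25):
    """Keep a sheet unless more than max_blank_fraction of its answers are blank."""
    blanks = sum(1 for r in responses if r is None or not r.strip())
    return blanks <= max_blank_fraction * len(responses)


print(accept_sheet(["WORD"] * 75 + [""] * 25))  # True  (exactly 25 per cent blank)
print(accept_sheet(["WORD"] * 74 + [""] * 26))  # False (26 per cent blank)
```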
New version
The network SR should be equal to the transposed (mirror) version t(RS) of RS, but this does not hold. There are some differences:
  SR - t(RS):
    999.BELLOW       1
  t(RS) - SR:
    30.=*=          17
    ULCER.=*=        1
    THIRTY.=*=       1
    PERIOD.=*=       1
There were also 32 multiple lines (parallel arcs). Since the weights on the parallel arcs were the same, we treated them as duplicates and preserved only a single arc. The 'corrected' version is saved in EATnew.net.
History
  • Original EAT: George Kiss, Christine Armstrong, Robert Milroy and J.R.I. Piper (collected between June 1968 and May 1971).
  • MRC Psycholinguistic Database Version modified by: Max Coltheart, S. James, J. Ramshaw, B.M. Philip, B. Reid, J. Benyon-Tinker and E. Doctor; made available by: Philip Quinlan.
  • The present version was re-structured and documented by Michael Wilson at the Rutherford Appleton Laboratory in 1988 (2).
  • Transformed into Pajek format: V. Batagelj, 31 July 2003.
  • Combined RS and SR versions and removed duplicates: V. Batagelj, 12 August 2013.

References

© 2001-2017 snachina.com.
