Takhatub Forums: A Meeting Place for Philosophers, Linguists, Philologists, Writers and Intellectuals

 Corpora, computers and lexicography

Post by a guest, 2011-10-07, 11:29

The most significant developments in lexicography in the past two decades have involved more extensive corpora of spoken and written language and the creation of sophisticated computer-based tools for accessing such corpora. The greatest innovations have been stimulated by the COBUILD project at the University of Birmingham, UK, and the influence of such work can be measured by the fact that, by the late 1990s, all major English-language learner-dictionary projects had incorporated reference to extensive language corpora and developed computational techniques for extracting lexicographically significant information from them.
The COBUILD project
COBUILD is one of the largest and most ambitious lexical research projects ever undertaken. COBUILD stands for Collins Birmingham University International Language Database, and the project is largely funded by the publisher William Collins (now HarperCollins). It is based in the School of English at the University of Birmingham under the direction of Professor John Sinclair, who, in addition to having major responsibility for lexical and lexico-grammatical research, is editor-in-chief of COBUILD's major lexicographic and related publications. These began in 1987 with the ground-breaking Collins COBUILD English Language Dictionary (CCELD); the latest edition is the Collins COBUILD English Dictionary (CCED), published in 1995. The COBUILD corpus, previously termed the Birmingham Collection of English Text (BCOET), was renamed the Bank of English in 1991 and, at the time of writing (1997), stands at 320 million words.
The principal aim underlying COBUILD research is to investigate in as much detail as possible how the English language is actually used, in both speech and writing, at a given moment in time, and to allow such evidence to inform publications aimed at learners of English. As the project developed through the 1980s, it became clear that such evidence could only be made available by building a multi-million-word corpus.

Reply by a guest, 2011-10-16, 13:34

Lexicography
While studies of grammatical constructions can be reliably conducted on corpora of varying length, to obtain valid information on vocabulary items, it is necessary to analyze corpora that are very large. To understand why this is the case, one need only investigate the frequency patterns of vocabulary in shorter corpora, such as the one-million-word LOB corpus.
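A rough sketch of the frequency check this paragraph has in mind: even in a tiny text, a large share of word types occur only once (hapax legomena), so a short corpus offers too few examples of most words to say much about them. The function name `frequency_profile`, the simple regular-expression tokenizer, and the toy sentence are assumptions for illustration; nothing here is taken from the LOB corpus.

```python
import re
from collections import Counter


def frequency_profile(text: str) -> dict:
    """Count tokens, word types, and hapax legomena (types occurring once)."""
    # Simplistic tokenizer (an assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "hapaxes": hapaxes,
        "hapax_share": len(hapaxes) / len(counts) if counts else 0.0,
    }


sample = "the cat sat on the mat and the dog sat on the rug"
profile = frequency_profile(sample)
print(profile["tokens"], profile["types"], profile["hapaxes"])
# prints: 13 8 ['and', 'cat', 'dog', 'mat', 'rug']
```

Five of the eight types here occur only once; the same skew toward rare words persists in much larger corpora, which is why vocabulary studies need very large ones.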
The Bank of English Corpus has many potential uses, but it was designed primarily to help in the creation of dictionaries. Sections of the corpus were used as the basis of the BBC English Dictionary, a dictionary that was intended to reflect the type of vocabulary used in news broadcasts such as those on the BBC (Sinclair 1992). Consequently, the vocabulary included in the dictionary was based on sections of the Bank of English Corpus containing transcriptions of broadcasts on the BBC (70 million words) and on National Public Radio in Washington, DC (10 million words). The Bank of English Corpus was also used as the basis for a more general-purpose dictionary, the Collins COBUILD English Dictionary, and for a range of other dictionaries on such topics as idioms and phrasal verbs. Other projects have used similar corpora for other types of dictionaries. The Cambridge Language Survey has developed two corpora, the Cambridge International Corpus and the Cambridge Learners' Corpus, to assist in the writing of a number of dictionaries, including the Cambridge International Dictionary of English. The publisher Longman assembled a large corpus of spoken and written American English to serve as the basis of the Longman Dictionary of American English, and used the British National Corpus as the basis of the Longman Dictionary of Contemporary English.
To understand why dictionaries are increasingly being based on corpora, it is instructive to review precisely how corpora, and the software designed to analyze them, can not only automate the process of creating a dictionary but also improve the information it contains. A typical dictionary, as Landau (1984: 76f.) observes, provides its users with various kinds of information about words: their meaning, pronunciation, etymology, part of speech, and status (e.g. whether the word is considered "colloquial" or "non-standard"). In addition, dictionaries contain example sentences that illustrate, in a meaningful context, the various meanings a given word has.
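The kinds of information Landau lists can be pictured as the fields of a single entry record. The sketch below is only an illustration: the class names `Sense` and `DictionaryEntry`, the field names, and the sample definition are hypothetical and do not reproduce any real dictionary's data model. The example sentence is one of Fillmore's risk examples quoted later in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    definition: str
    # Example sentences illustrating this sense in a meaningful context.
    examples: List[str] = field(default_factory=list)


@dataclass
class DictionaryEntry:
    headword: str
    pronunciation: str
    part_of_speech: str
    etymology: str = ""
    # Usage status label such as "colloquial" or "non-standard"; None if unmarked.
    status: Optional[str] = None
    senses: List[Sense] = field(default_factory=list)


entry = DictionaryEntry(
    headword="risk",
    pronunciation="/rɪsk/",
    part_of_speech="verb",
    senses=[
        Sense(
            # Illustrative paraphrase, not a definition quoted from a dictionary.
            definition="to expose someone or something to the chance of harm",
            examples=["If you stay here you risk getting shot."],
        )
    ],
)
print(entry.headword, entry.part_of_speech, len(entry.senses))  # risk verb 1
```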
Before computer corpora were introduced into lexicography, all of this information had to be collected manually, and as a consequence it took years to create a dictionary. For instance, the most comprehensive dictionary of English, the Oxford English Dictionary (originally entitled New English Dictionary), took fifty years to complete, largely because of the many stages of production that the dictionary went through. Landau (1984: 69) notes that the 5 million citations included in the OED had to be "painstakingly collected … subsorted … analyzed by assistant editors and defined, with representative citations chosen for inclusion; and checked and redefined by [James A. H.] Murray [main editor of the OED] or one of the other supervising editors." Of course, less ambitious dictionaries than the OED took less time to create, but the creation of a dictionary is still a lengthy and arduous process.
Because so much text is now available in computer-readable form, many stages of dictionary creation can be automated. Using a relatively inexpensive piece of software called a concordancing program (cf. section 5.3.2), the lexicographer can go through the stages of dictionary production described above and, instead of spending hours or weeks obtaining information on words, can obtain this information automatically from a computerized corpus. In a matter of seconds, a concordancing program can count the frequency of words in a corpus and rank them from most to least frequent. In addition, some concordancing programs can detect prefixes, suffixes, and irregular forms and sort words by "lemmas": words such as runs, running, and ran are then counted not as separate entries but as variant forms of the lemma run.
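The two concordancer functions described above, a ranked frequency list and grouping by lemma, can be sketched in a few lines. This is an assumption-laden toy, not how any particular concordancing program works: the lemma table `LEMMAS` is a hypothetical hand-built dictionary (real tools use morphological rules or a full lexicon), and the function names are illustrative.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, hand-built lemma table for illustration only; a real
# concordancing program would use morphological rules or a full lexicon.
LEMMAS: Dict[str, str] = {"runs": "run", "running": "run", "ran": "run"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ranked_frequencies(text: str, lemmatize: bool = False) -> List[Tuple[str, int]]:
    """Return (word, count) pairs ranked from most to least frequent."""
    tokens = tokenize(text)
    if lemmatize:
        # Map each inflected form to its lemma; unknown words are kept as-is.
        tokens = [LEMMAS.get(token, token) for token in tokens]
    counts = Counter(tokens)
    # Highest count first; ties broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


text = "She runs daily. He ran yesterday. They are running now. I run too."
print(ranked_frequencies(text)[:3])                  # [('are', 1), ('daily', 1), ('he', 1)]
print(ranked_frequencies(text, lemmatize=True)[:3])  # [('run', 4), ('are', 1), ('daily', 1)]
```

Without lemmatization, runs, ran, running, and run each appear once; with it, they are pooled under run, which moves to the top of the list.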
To study the meanings of individual words, the lexicographer can have a word displayed in KWIC (key word in context) format and easily view the varying contexts in which the word occurs and the meanings it has in those contexts. If the lexicographer wants a copy of the sentence in which a word occurs, it can be automatically extracted from the text and stored in a file, making the handwritten citation slip stored in a filing cabinet obsolete. If each word in a corpus has been tagged (i.e. assigned a tag designating its word class; cf. section 4.3), the part of speech of each word can be determined automatically. In short, the computer corpus and its associated software have completely revolutionized the creation of dictionaries.
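A minimal sketch of a KWIC display and of automatic citation extraction, assuming plain untagged text. The function names `kwic` and `citations`, the 30-character context width, the whole-word matching rule, and the naive sentence splitter are all illustrative choices, not features of a specific concordancer. The toy corpus partly adapts the risk examples discussed below.

```python
import re
from typing import List


def _word_pattern(keyword: str):
    # Whole-word, case-insensitive match (so "risk" does not match "risking").
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def kwic(text: str, keyword: str, width: int = 30) -> List[str]:
    """Return one key-word-in-context line per occurrence of keyword."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    lines = []
    for match in _word_pattern(keyword).finditer(text):
        left = text[max(0, match.start() - width):match.start()].rstrip()
        right = text[match.end():match.end() + width].lstrip()
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


def citations(text: str, keyword: str) -> List[str]:
    """Extract each sentence containing keyword, like a digital citation slip."""
    text = " ".join(text.split())
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pattern = _word_pattern(keyword)
    return [s for s in sentences if pattern.search(s)]


corpus = ("If you stay here you risk getting shot. I had no idea I was "
          "risking my life. Risk is part of everyday life.")
for line in kwic(corpus, "risk"):
    print(line)
print(citations(corpus, "risk"))
```

Matching whole words keeps risking out of the results; a concordancer that sorts by lemma, as described earlier, would instead include it.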
In addition to making the process of creating a dictionary easier, corpora can improve the kinds of information about words contained in dictionaries and address some of the deficiencies inherent in many of them. One of the criticisms of the OED, Landau (1984: 71) notes, is that it contains relatively little information on scientific vocabulary. But as the BBC English Dictionary illustrates, if a truly "representative" corpus of a given kind of English is created (in this case, broadcast English), it becomes quite possible to produce a dictionary of any type of English (cf. section 2.5 for a discussion of representativeness in corpus design). And with the vast amount of scientific English available in computerized form, it would now be relatively easy to create a corpus-based dictionary of scientific English.
Dictionaries have also been criticized for the unscientific manner in which they define words, a shortcoming that stems from the fact that many of the more traditional dictionaries were created at a time when well-defined theories of lexical meaning did not exist. But this situation is changing as semanticists turn to corpora to develop theories of lexical meaning based on the use of words in real contexts. Working within the theory of "frame" semantics, Fillmore (1992: 39-45) analyzed the meaning of the word risk in a 25-million-word corpus of written English created by the American Printing House for the Blind. Fillmore (1992: 40) began his analysis of risk in this corpus from the assumption that all uses of risk fit into a general frame of meaning: that "there is a probability, greater than zero and less than one, that something bad will happen to someone or something." Within this general frame were three "frame elements," i.e. differing variations on the main meaning of risk, depending on whether the "risk" is not caused by "someone's action" (e.g. if you stay here you risk getting shot), whether the "risk" is due in some part to what is termed "the Protagonist's Deed" (e.g. I had no idea when I stepped into that bar that I was risking my life), or whether the "risk" results from "the Protagonist's decision to perform the Deed" (e.g. I know I might lose everything, but what the hell, I'm going to risk this week's wages on my favorite horse) (Fillmore 1992: 41-2).
In a survey of ten monolingual dictionaries, Fillmore (1992: 39-40) found great variation in the meanings given for risk. In the 25-million-word corpus he was working with, Fillmore (1992) found that most of the 1,743 instances of risk he identified had one of the three meanings. However, some examples did not fit into the risk frame, and it is these examples that Fillmore (1992: 43) finds significant, since without having examined a corpus, "we would not have thought of them on our own." Fillmore's (1992) analysis of the various meanings of the word risk in a corpus effectively illustrates the value of basing a dictionary on actual uses of a particular word. As Fillmore (1992: 39) correctly observes, "the citation slips the lexicographers observed were largely limited to examples that somebody happened to notice … " By consulting a corpus, however, the lexicographer can be more confident that the results obtained more accurately reflect the actual meaning of a particular word.
Source: Charles F. Meyer, University of Massachusetts at Boston. English Corpus Linguistics: An Introduction. 2002.
