Asp Forum - URGENT help : sorting lines of a file

Rajshekhar

8/11/2008 4:06:00 AM

Problem :

Given a text file as input, sort the lines and store in the specified
output file. The sorting need not make any differenciation with
regards to the characters (whether it is alpha / numeric /
alphanumeric / special). It can compare them using the ASCII value.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)

how do i make sure that , i cant hardcore the no.of lines and what
kind of data structure to use ?

please help me .........its urgent.

Thanks :)

8 Answers

Barry Schwarz

8/11/2008 4:30:00 AM

On Sun, 10 Aug 2008 21:06:03 -0700 (PDT), Rajshekhar
<rajshekhar3@gmail.com> wrote:

>Problem :
>
>Given a text file as input, sort the lines and store in the specified
>output file. The sorting need not make any differenciation with
>regards to the characters (whether it is alpha / numeric /
>alphanumeric / special). It can compare them using the ASCII value.
>
>IMP *********** The program should not hardcode the number of lines in
>the file in ANY manner. In other words, the program should keep
>dynamically allocating memory as it reads from the file.
>
>IMP *********** You should *not* assume and hardcode the number of
>words in the file. (a convenient usage of a huge array of structures
>is strictly prohibited)

This makes no sense since you are not interested in the words, only in
the characters. Were the instructions to not assume the maximum
number of characters and you copied it wrong for us.

>
>
>how do i make sure that , i cant hardcore the no.of lines and what
>kind of data structure to use ?
>
>
>please help me .........its urgent.

No it is not urgent. It is homework.

One approach is to use a linked list. Each node would contain a
pointer to char that points to a dynamically acquired buffer.

Another approach is to use a dynamic (and therefore adjustable) array
of pointers to char where each pointer points to a dynamically
acquired buffer. Hint: this approach will make it easier to use
qsort.

--
Remove del for email

Eric Sosman

8/11/2008 12:32:00 PM

Rajshekhar wrote:
> Problem :
>
> Given a text file as input, sort the lines and store in the specified
> output file. The sorting need not make any differenciation with
> regards to the characters (whether it is alpha / numeric /
> alphanumeric / special). It can compare them using the ASCII value.
>
> IMP *********** The program should not hardcode the number of lines in
> the file in ANY manner.

Remark in passing: This is too strong a condition for
any computer with finite resources.

> In other words, the program should keep
> dynamically allocating memory as it reads from the file.
>
> IMP *********** You should *not* assume and hardcode the number of
> words in the file. (a convenient usage of a huge array of structures
> is strictly prohibited)
>
>
> how do i make sure that , i cant hardcore the no.of lines and what
> kind of data structure to use ?

The problem statement hints about using a dynamically-
allocated memory area and expanding it if it turns out to
be too small. Do you know how to do this? If not, review
your notes from earlier classes or ask your teacher.

One approach would be to read and store the lines[*] and
also to maintain a dynamically-allocated array of `char*', each
pointing to the start of a line. Each time the array fills up,
expand it to make room for more `char*' pointers. When you've
read all the lines, use qsort() to sort the array of `char*'
and then write out the lines in the order indicated by the
sorted array.

[*] How do you read a line of unknown length? Same idea:
Allocate some memory to hold its characters, then read and
store until you've found the '\n' at the end of the line. If
the line is too long for the memory area, expand it and keep
on reading.

--
Eric Sosman
esosman@ieee-dot-org.invalid

osmium

8/11/2008 4:38:00 PM

"Rajshekhar"

> Problem :
>
> Given a text file as input, sort the lines and store in the specified
> output file. The sorting need not make any differenciation with
> regards to the characters (whether it is alpha / numeric /
> alphanumeric / special). It can compare them using the ASCII value.
>
> IMP *********** The program should not hardcode the number of lines in
> the file in ANY manner. In other words, the program should keep
> dynamically allocating memory as it reads from the file.
>
> IMP *********** You should *not* assume and hardcode the number of
> words in the file. (a convenient usage of a huge array of structures
> is strictly prohibited)
>
>
> how do i make sure that , i cant hardcore the no.of lines and what
> kind of data structure to use ?

Make a tree. When you get done reading and inserting the file is already
sorted! A fundamental question is "How do you handle duplicates lines?" One
answer, for a student exercise, is to assume that won't happen. In that
case each node simply corresponds to the text of a line. Write a routine to
read a line of unknown length and then put it in a properly sized array.
Each non-null leaf in the tree points at one of these arrays.

The hardest part of writing a tree is handling the removal of nodes, and you
don't have to do that in this exercise.

Modify if, as and when you encounter problems.

Ed Prochak

8/11/2008 4:55:00 PM

On Aug 11, 7:31 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> Rajshekhar wrote:
> > Problem :
>
> > Given a text file as input, sort the lines and store in the specified
> > output file. The sorting need not make any differenciation with
> > regards to the characters (whether it is alpha / numeric /
> > alphanumeric / special). It can compare them using the ASCII value.
>
> > IMP *********** The program should not hardcode the number of lines in
> > the file in ANY manner.
>
> Remark in passing: This is too strong a condition for
> any computer with finite resources.

No it isn't. What is missing is a description of how to fail when the
limit is reached. But the limit is in the OS, not in the application
program. (E.g. think of the cp command in UNIX. it is limited by the
size of the file system, but internally it contains no such limit
value.)

[remaining good advice kept]
> The problem statement hints about using a dynamically-
> allocated memory area and expanding it if it turns out to
> be too small. Do you know how to do this? If not, review
> your notes from earlier classes or ask your teacher.
>
> One approach would be to read and store the lines[*] and
> also to maintain a dynamically-allocated array of `char*', each
> pointing to the start of a line. Each time the array fills up,
> expand it to make room for more `char*' pointers. When you've
> read all the lines, use qsort() to sort the array of `char*'
> and then write out the lines in the order indicated by the
> sorted array.
>
> [*] How do you read a line of unknown length? Same idea:
> Allocate some memory to hold its characters, then read and
> store until you've found the '\n' at the end of the line. If
> the line is too long for the memory area, expand it and keep
> on reading.
>
> --
> Eric Sosman
> esos...@ieee-dot-org.invalid

Ed

gordonb.wpydi

8/11/2008 6:04:00 PM

>Given a text file as input, sort the lines and store in the specified
>output file. The sorting need not make any differenciation with
>regards to the characters (whether it is alpha / numeric /
>alphanumeric / special). It can compare them using the ASCII value.

If this is not homework, consider using the UNIX sort utility.
There are ports of this program to Windows. The source code to
the utility is available with various distributions of FreeBSD,
Linux, OpenBSD, etc. Oh, and the versions of the utility I have
seen do not assume that the data can fit in memory.

roberson

8/11/2008 6:08:00 PM

In article <7dbdc09e-6a25-49e2-bca4-a1093c75792a@d45g2000hsc.googlegroups.com>,
Ed Prochak <edprochak@gmail.com> wrote:

>No it isn't. What is missing is a description of how to fail when the
>limit is reached. But the limit is in the OS, not in the application
>program. (E.g. think of the cp command in UNIX. it is limited by the
>size of the file system, but internally it contains no such limit
>value.)

Any implementation of cp must stat (or fstat) the names given
as parameters, in order to determine whether those names are regular
files or are directories. That limits the cp command to be used
with filesystems whose attributes can be represented within the
defined limits of struct stat; in particular, the file size field
is type off_t, which POSIX.1 requires (section 2.5) to be
a signed arithmetic type. Unless the system implements extensions
beyond standard C, that limits file sizes to long (C89) or long long (C99).
One could have filesystems (e.g., networked file systems) whose size
limits exceeded the local off_t limitations. The limitation would
then be internal to cp (and the OS interfaces), rather than
being limited to the size of the file system.

Are you referring to the official Unix 'cp' as documented at
http://www.opengroup.org/onlinepubs/009695399/utiliti...
or the 'cp' varients as commonly implemented on actual Unix and Unix-like
(e.g., Linux) systems?

If you are referring to cp as implemented on actual systems, then
support for sparse files is not uncommon, and the implementation
then gets more complicated, with lseek() calls being made to create
the "holes", again invoking compiler and OS limits rather than
necessarily hitting the filesystem limits.
--
"No sincere artist was ever completely satisfied with his labour."
-- Walter J. Phillips

CBFalconer

8/11/2008 10:41:00 PM

Rajshekhar wrote:
>
> Given a text file as input, sort the lines and store in the specified
> output file. The sorting need not make any differenciation with
> regards to the characters (whether it is alpha / numeric /
> alphanumeric / special). It can compare them using the ASCII value.
>
> IMP *********** The program should not hardcode the number of lines in
> the file in ANY manner. In other words, the program should keep
> dynamically allocating memory as it reads from the file.
>
> IMP *********** You should *not* assume and hardcode the number of
> words in the file. (a convenient usage of a huge array of structures
> is strictly prohibited)
>
> how do i make sure that , i cant hardcore the no.of lines and what
> kind of data structure to use ?

The easiest thing is to read lines into a linked list. Then sort
the list. If you use my ggets (see below) getting the list is dead
easy. See freverse in the published code. Sorting is also dead
easy if you use mergesort. I have published a suitable
implementation in this newsgroup in the past. Search for
'listops'.

ggets is available in source form, and released to public domain,
at:

<http://cbfalconer.home.att.net/download/gge...

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.a...
Try the download section.

Ed Prochak

8/12/2008 3:34:00 AM

On Aug 11, 2:08 pm, rober...@ibd.nrc-cnrc.gc.ca (Walter Roberson)
wrote:
> In article <7dbdc09e-6a25-49e2-bca4-a1093c757...@d45g2000hsc.googlegroups..com>,
> Ed Prochak <edproc...@gmail.com> wrote:
>
> >No it isn't. What is missing is a description of how to fail when the
> >limit is reached. But the limit is in the OS, not in the application
> >program. (E.g. think of the cp command in UNIX. it is limited by the
> >size of the file system, but internally it contains no such limit
> >value.)
>
> Any implementation of cp must stat (or fstat) the names given
> as parameters, in order to determine whether those names are regular
> files or are directories. That limits the cp command to be used
> with filesystems whose attributes can be represented within the
> defined limits of struct stat; in particular, the file size field
> is type off_t, which POSIX.1 requires (section 2.5) to be
> a signed arithmetic type. Unless the system implements extensions
> beyond standard C, that limits file sizes to long (C89) or long long (C99).
> One could have filesystems (e.g., networked file systems) whose size
> limits exceeded the local off_t limitations. The limitation would
> then be internal to cp (and the OS interfaces), rather than
> being limited to the size of the file system.

Exactly. it is an OS interface issue, not a WAG number.

>
> Are you referring to the official Unix 'cp' as documented athttp://www.opengroup.org/onlinepubs/009695399/utiliti...
> or the 'cp' varients as commonly implemented on actual Unix and Unix-like
> (e.g., Linux) systems?
>
> If you are referring to cp as implemented on actual systems, then
> support for sparse files is not uncommon, and the implementation
> then gets more complicated, with lseek() calls being made to create
> the "holes", again invoking compiler and OS limits rather than
> necessarily hitting the filesystem limits.

The point is the limits are outside the program.

Looking back at your original remark, the issue is that the constraint
was not to put an arbitrary limit on the program, while you remarked
there has to be a limit on a computer system. Basically I think you
extended the constraint needlessly and then tried to shoot it down.

I think we all agree there are finite resopurces. At leat no one I
know of is using a Turing machine with the infinite tape. 8^)

> --
> "No sincere artist was ever completely satisfied with his labour."
> -- Walter J. Phillips

nice quote.
ed

comp.lang.c

URGENT help : sorting lines of a file

Rajshekhar

Barry Schwarz

Eric Sosman

osmium

Ed Prochak

gordonb.wpydi

roberson

CBFalconer

Ed Prochak

x Login to ForumsZone