comp.programming

[hardware] memory bandwidth limitations (IMPORTANT)

kenobi

1/31/2015 11:37:00 AM

for me it seems the most crucial thing in today's hardware related to optimization:

what does it matter if we can do arithmetic in 8-channel simd, one step faster, etc, when it is all limited by the memory bandwidth, which is in the range of reading 1 integer or float per nanosecond

i saw many talks on how the cpu is faster etc, but it just seems that the overall speed is not so much dependent on the cpu; it is dependent on memory bandwidth, i mean the time in which memory is transferred from ram to the cpu (this is probably the level-1 cache to cpu transfer time; many people say how important cache misses are, but they also seem unimportant to me, as the code i observe nearly always fits the cache and i do not see any cache misses; it is just always limited by the time of these cached movs, which as i said is on the level of 1 int/ns)
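
to show roughly what i mean, a minimal sketch of such a measurement (plain C, POSIX clock; the 4 KB array size and the exact figures are just assumptions, every machine and compiler will differ):

/* sum a small array that should stay in L1 (here 4 KB of ints) many
   times, then divide ints read by elapsed nanoseconds;
   compile with optimizations, e.g. -O2 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define N    1024        /* 1024 ints = 4 KB, comfortably L1-resident */
#define REPS 1000000

int main(void) {
    static int32_t a[N];
    for (int i = 0; i < N; i++) a[i] = i;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    int64_t sum = 0;
    for (int r = 0; r < REPS; r++) {
        for (int i = 0; i < N; i++)
            sum += a[i];
        /* barrier (gcc/clang extension) so the outer loop is not folded away */
        __asm__ volatile("" ::: "memory");
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.2f ints read per ns (sum=%lld)\n",
           (double)N * REPS / ns, (long long)sum);
    return 0;
}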

So the question is WHY it cannot be sped up - cannot this value of the level-1 cache to cpu throughput (in both read and write) be increased? what is the real technical reason that it stays fixed and does not improve? (to me this seems probably not dependent on raw ram speed, nor on cpu arithmetic, nor on cache speed - it seems to be dependent just on the level-1 cache to cpu stream); why can it not be, for example, simd-like parallelized the way arithmetic is parallelized?

Hope the question is understandable



2 Answers

Mark Carroll

1/31/2015 12:16:00 PM


fir <profesor.fir@gmail.com> writes:

> So the question is WHY it cannot be sped up - cannot this value of the
> level-1 cache to cpu throughput (in both read and write) be increased?
> what is the real technical reason that it stays fixed and does not improve?

I'd be surprised if it hadn't been improving, but I'd guess that we have
three main technical limitations for general CPU design:

* As you start using low power in microscopically tiny wiring you start
running into quantum effects such as tunnelling.

* It takes energy to change the voltages on the wires. The faster you do
this, the more energy you use per second, so the hotter the chip gets.
So, you run into thermal issues as power dissipates as heat.

* When voltage changes are fast enough, you also run into high-frequency
analog issues, such as inductance with the changing magnetic fields
inducing current in nearby wiring.

With regard to the cache issue, I'd also wonder if, in terms of moving
data around between sections of the chip, such as between cache and
computation, even with multi-layer stacked circuits there are also
physical space limits regarding wiring the buses in and out of
everywhere, and, with logic gates on the path in the circuit (to route or
whatever), there may well be propagation delays in the signal that are
limiting at high computation speeds.

I'd be interested to be corrected by somebody who knows rather more than
I about modern CPU design; after all, the last hardware course I took
was last century.

-- Mark

kenobi

1/31/2015 2:03:00 PM


On Saturday, 31 January 2015 at 13:16:18 UTC+1, Mark Carroll wrote:
> [snip]

they parallelized fpu arithmetic (like 8-way float simd muls, divs), WHY can they not "parallelize" movs?

there must be some reason, and as i said this is an absolutely critical thing, so why are they not doing that?
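
to make it concrete, what i imagine by a "parallelized mov" would look roughly like the sketch below - x86 AVX intrinsics, assuming an AVX-capable cpu and a compiler flag like -mavx; a single vector load already moves 8 floats per instruction, so the puzzle is why the sustained cache-to-cpu stream still ends up being the limit:

/* an "8-wide mov": one AVX load brings 8 floats from (hopefully
   L1-resident) memory, one AVX add processes them */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    float b[8] = {10, 10, 10, 10, 10, 10, 10, 10};
    float c[8];

    __m256 va = _mm256_loadu_ps(a);    /* one instruction, 8 floats in */
    __m256 vb = _mm256_loadu_ps(b);    /* another 8 floats in */
    __m256 vc = _mm256_add_ps(va, vb); /* 8 adds in one instruction */
    _mm256_storeu_ps(c, vc);           /* 8 floats out */

    for (int i = 0; i < 8; i++) printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}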